Repository: woahdae/simple_xlsx_reader
Branch: master
Commit: 22d783c88a75
Files: 38
Total size: 92.0 KB

Directory structure:
gitextract_v1o319sx/

├── .github/
│   ├── dependabot.yml
│   └── workflows/
│       └── ruby.yml
├── .gitignore
├── .travis.yml
├── CHANGELOG.md
├── Gemfile
├── LICENSE.txt
├── README.md
├── Rakefile
├── lib/
│   ├── simple_xlsx_reader/
│   │   ├── document.rb
│   │   ├── hyperlink.rb
│   │   ├── loader/
│   │   │   ├── shared_strings_parser.rb
│   │   │   ├── sheet_parser.rb
│   │   │   ├── style_types_parser.rb
│   │   │   └── workbook_parser.rb
│   │   ├── loader.rb
│   │   └── version.rb
│   └── simple_xlsx_reader.rb
├── simple_xlsx_reader.gemspec
└── test/
    ├── chunky_utf8.xlsx
    ├── date1904.xlsx
    ├── date1904_test.rb
    ├── datetime_test.rb
    ├── datetimes.xlsx
    ├── gdocs_sheet.xlsx
    ├── gdocs_sheet_test.rb
    ├── lower_case_sharedstrings.xlsx
    ├── lower_case_sharedstrings_test.rb
    ├── misc_numbers.xlsx
    ├── namespaces_and_missing_atts_test.rb
    ├── percentages_n_currencies.xlsx
    ├── performance_test.rb
    ├── sesame_street_blog.xlsx
    ├── shared_strings.xml
    ├── simple_xlsx_reader_test.rb
    ├── styles.xml
    ├── test_helper.rb
    └── test_xlsx_builder.rb

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/dependabot.yml
================================================
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"


================================================
FILE: .github/workflows/ruby.yml
================================================
# This workflow uses actions that are not certified by GitHub.
# They are provided by a third-party and are governed by
# separate terms of service, privacy policy, and support
# documentation.
# This workflow will download a prebuilt Ruby version, install dependencies and run tests with Rake
# For more information see: https://github.com/marketplace/actions/setup-ruby-jruby-and-truffleruby

name: Ruby

on:
  push:
    branches: [ "master" ]
  pull_request:
    branches: [ "master" ]

permissions:
  contents: read

jobs:
  test:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        ruby-version: ['2.6', '2.7', '3.0', '3.1', '3.2', '3.3']

    steps:
    - uses: actions/checkout@v4
    - name: Set up Ruby
      uses: ruby/setup-ruby@v1
      with:
        ruby-version: ${{ matrix.ruby-version }}
        bundler-cache: true # runs 'bundle install' and caches installed gems automatically
    - name: Run tests
      run: bundle exec rake


================================================
FILE: .gitignore
================================================
*.gem
*.rbc
.bundle
.config
.yardoc
Gemfile.lock
InstalledFiles
_yardoc
coverage
doc/
lib/bundler/man
pkg
rdoc
spec/reports
test/tmp
test/version_tmp
tmp


================================================
FILE: .travis.yml
================================================
language: ruby
cache: bundler
before_install:
  - gem update bundler
rvm:
  - 2.5.8
  - 2.7.2
  - 3.0.0


================================================
FILE: CHANGELOG.md
================================================
### 5.1.0

* Parse sheets containing namespaces and no 'r' att (@skipchris)
* Fix Zlib error when loading from string (@myabc)
* Prevent a SimpleXlsxReader::CellLoadError (no implicit conversion of Integer
  into String) when the casted value (friendly name) is not a string (@tsdbrown)
* Accidental 25% performance improvement while experimenting with namespace
  support (see #53f5a9).

### 5.0.0

* Change SimpleXlsxReader::Hyperlink to default to the visible cell value
  instead of the hyperlink URL, which in the case of mailto hyperlinks is
  surprising.
* Fix blank content when parsing docs from string (@codemole)

### 4.0.1

* Fix nil error when handling some inline strings

  Inline strings are almost exclusively used by non-Excel XLSX
  implementations, but are valid, and sometimes have nil chunks.

  Also, inline strings weren't preserving whitespace if Nokogiri is
  parsing the string in chunks, as it does when encountering escaped
  characters. Fixed.

### 4.0.0

* Fix percentage rounding errors. Previously we were dividing by 100, when we
  actually don't need to, so percentage types were 100x too small. Fixes #21.
  Major bump because workarounds might have been implemented for previous
  incorrect behavior.
* Fix small oddity in one currency format where round numbers would be cast
  to an integer instead of a float.

### 3.0.1

* Fix parsing "chunky" UTF-8 workbooks. Closes issues #39 and #45. See ce67f0d4.

### 3.0.0

* Change the way we typecast cells in the General format. This probably won't
  break anything in your app, but it's a change in behavior that theoretically
  could.

  Previously, we were treating cells using the General format as strings, when
  according to the Office XML standard, they should be treated as numbers. We
  now attempt to cast such cells as numbers, and fall back to strings if number
  casting fails.

  Thanks @jrodrigosm
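As a rough illustration of the 3.0.0 behavior (not the gem's actual casting code), a General-format value can be tried as an integer, then as a float, and kept as a string only when neither parse succeeds. The `cast_general` name and the integer-before-float order are assumptions made for this sketch.

```ruby
# Hypothetical sketch of "number first, string as the fallback" casting.
def cast_general(raw)
  # Base 10 keeps strings like "010" from being read as octal
  Integer(raw, 10)
rescue ArgumentError
  begin
    Float(raw)
  rescue ArgumentError
    raw # not a number: keep the original string
  end
end

cast_general('42')   # => 42
cast_general('3.5')  # => 3.5
cast_general('Hero') # => "Hero"
```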

### 2.0.1

* Restore ability to parse IO strings (@robbevp)
* Add Ruby 3.1 and 3.2 to CI (@taichi-ishitani)

### 2.0.0

* SPEED
  * Reimplement internals in terms of a SAX parser
  * Change `SimpleXlsxReader::Sheet#rows` to be a `RowsProxy` that streams `#each`
* Convenience - use `rows#each(headers: true)` to get header names while enumerating rows

### 1.0.5

* Support string or io input via `SimpleXlsxReader#parse` (@kalsan, @til)

### 1.0.4

* Fix Windows + RubyZip 1.2.1 bug preventing files from being read
* Add ability to parse hyperlinks
* Support files exported from Google Docs (@Strnadj)

### 1.0.3

Broken on Ruby 1.9; yanked.

### 1.0.2

* Fix Ruby 1.9.3-specific bug preventing parsing most sheets [middagj, eritiro]
* Better support for non-excel-generated xlsx files [bwlang]
  * You don't always have a numFmtId column, and that's OK
  * Sometimes 'sharedStrings.xml' can be 'sharedstrings.xml'
* Fixed parsing times very close to 12/30/1899 [Valeriy Utyaganov]
* Be more flexible with custom formats using a numFmtId < 164

### 1.0.1

* Add support for the 1904 date system [zilverline]

### 1.0.0

No changes since 1.0.0.pre. Releasing 1.0.0 since the project has seen a
few months of stability in terms of bug fix requests, and the API is not
going to change.

### 1.0.0.pre

* Handle files with blank rows [Brian Hoffman]
* Preserve seconds when casting datetimes [Rob Newbould]
* Preserve empty rows (previously they were omitted)
* Speed up parsing by ~55%

### 0.9.8

* Rubyzip 1.0 compatibility

### 0.9.7

* Fix cell parsing where cells have a type, but no content
* Add a speed test; parsing performs in linear time, but a relatively
  slow line :/

### 0.9.6

* Fix worksheet indexes when worksheets have been deleted

### 0.9.5

* Fix inlineStr support (broken by formula support commit)

### 0.9.4

* Formula support. Formulas used to cause things to blow up, now they don't!
* Support number types styled as dates. Previously, the type was honored
  above the style, which is incorrect for dates; date-numbers now parse as
  dates.
* Error-free parsing of empty sheets
* Fix custom styles w/ numFmtId == 164. Custom style types are delineated
  starting *at* numFmtId 164, not greater than 164.

### 0.9.3

* Support 1.8.7 (tests pass). Ongoing support will depend on ease.

### 0.9.2

* Support reading files written by ex. simple_xlsx_writer that don't
  specify sheet dimensions explicitly (which Excel does).

### 0.9.1

* Fixed an important parse bug that ignored empty 'Generic' cells

### 0.9.0

* Initial release. 0.9 version number is meant to reflect the near-stable
  public api, yet still prerelease status of the project.


================================================
FILE: Gemfile
================================================
source 'https://rubygems.org'

# Specify your gem's dependencies in simple_xlsx_reader.gemspec
gemspec


================================================
FILE: LICENSE.txt
================================================
Copyright (c) 2013 Woody Peterson

MIT License

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

================================================
FILE: README.md
================================================
# SimpleXlsxReader

A [fast](#performance) xlsx reader for Ruby that parses xlsx cell values into
plain ruby primitives and dates/times.

This is *not* a rewrite of excel in Ruby. Cell styles, for
example, are parsed to determine whether a cell is a number or a date,
then forgotten. We just want to get the data, and get out!

## Summary (now with stream parsing):

```ruby
doc = SimpleXlsxReader.open('/path/to/workbook.xlsx')
doc.sheets # => [<#SXR::Sheet>, ...]
doc.sheets.first.name # 'Sheet1'
rows = doc.sheets.first.rows # <SXR::Document::RowsProxy>
rows.each # an <Enumerator> ready to chain or stream
rows.each {} # Streams the rows to your block
rows.each(headers: true) {} # Streams row-hashes
rows.each(headers: {id: /ID/}) {} # finds & maps headers, streams
rows.slurp # Slurps rows into memory as a 2D array
```

That's the gist of it!

See also the [Document](https://github.com/woahdae/simple_xlsx_reader/blob/2.0.0-pre/lib/simple_xlsx_reader/document.rb) object.

## Why?

### Accurate

This project was started years ago, primarily because other Ruby xlsx parsers
didn't import data with the correct types. Numbers as strings, dates as numbers,
[hyperlinks](https://github.com/woahdae/simple_xlsx_reader/blob/master/lib/simple_xlsx_reader/hyperlink.rb)
with inaccessible URLs, or - subtly buggy - simple dates as DateTime
objects. If your app uses a timezone offset, depending on what timezone and
what time of day you load the xlsx file, your dates might end up a day off!
SimpleXlsxReader understands all these correctly.
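To make the date pitfall concrete, here is a minimal sketch (not the gem's implementation) of turning an Excel serial number into a Ruby value. It assumes the usual epochs of 1899-12-30 for the 1900 date system and 1904-01-01 for the 1904 system, and the helper name `excel_serial_to_value` is hypothetical. Whole-number serials become plain `Date`s, so there is no time of day for a timezone offset to shift; only serials with a fractional part become `Time` values.

```ruby
require 'date'

# Assumed epochs for the two Excel date systems
EPOCH_1900 = Date.new(1899, 12, 30)
EPOCH_1904 = Date.new(1904, 1, 1)

# Hypothetical helper: whole days become a Date, anything with a
# time-of-day component becomes a UTC Time.
def excel_serial_to_value(serial, epoch = EPOCH_1900)
  days = serial.floor
  date = epoch + days
  fraction = serial - days
  return date if fraction.zero?

  # Round to whole seconds so float noise doesn't leak into the result
  seconds = (fraction * 86_400).round
  Time.utc(date.year, date.month, date.day) + seconds
end

excel_serial_to_value(44927)          # a Date: 2023-01-01
excel_serial_to_value(44927.5)        # a Time: 2023-01-01 12:00:00 UTC
excel_serial_to_value(0, EPOCH_1904)  # a Date: 1904-01-01
```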

### Idiomatic

Many Ruby xlsx parsers seem to be inspired more by Excel than Ruby, frankly.
SimpleXlsxReader strives to be fairly idiomatic Ruby:

```ruby
# quick example having fun w/ ruby
doc = SimpleXlsxReader.open(file_path) # or SimpleXlsxReader.parse(string_or_io)
doc.sheets.first.rows.each(headers: {id: /ID/})
  .with_index.with_object({}) do |(row, index), acc|
    acc[row[:id]] = index
end
```

### Now faster

Finally, as of v2.0, SimpleXlsxReader is the fastest and most
memory-efficient parser. Previously this project couldn't reasonably load
anything over ~10k rows. Other parsers could load 100k+ rows, but were still
taking ~1gb RSS to do so, even "streaming," which seemed excessive. So a SAX
implementation was born. See [performance](#performance) for details.

## Usage

### Streaming

SimpleXlsxReader is performant by default: if you use
`rows.each {|row| ...}` it will stream the XLSX rows to your block without
loading either the sheet XML or the full sheet data into memory.

You can also chain `rows.each` with other Enumerable functions without
triggering a slurp, and you have lots of ways to find and map headers while
streaming.
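The chaining works because of a standard Ruby idiom, visible in `RowsProxy#each`: called without a block, `#each` returns `to_enum(:each, ...)`, so chained Enumerable calls pull rows one at a time instead of building an array first. The toy class below is a hypothetical stand-in (it yields from an in-memory array and counts how many rows it has produced) rather than the gem's `RowsProxy`, but it shows why `rows.each.first(2)` stops early.

```ruby
# Hypothetical stand-in for a streaming rows proxy; a real one would pull
# rows from a SAX parser instead of an in-memory array.
class ToyRowsProxy
  include Enumerable

  attr_reader :rows_produced

  def initialize(source_rows)
    @source_rows = source_rows
    @rows_produced = 0
  end

  # With a block: yield rows one at a time. Without one: return an
  # Enumerator so callers can chain Enumerable methods.
  def each(&block)
    return to_enum(:each) unless block_given?

    @source_rows.each do |row|
      @rows_produced += 1
      block.call(row)
    end
  end
end

rows = ToyRowsProxy.new([[1, 'a'], [2, 'b'], [3, 'c'], [4, 'd']])
rows.each.first(2)  # => [[1, "a"], [2, "b"]]
rows.rows_produced  # => 2; rows 3 and 4 were never produced
```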

If you had an excel sheet representing this data:

```
| Hero ID | Hero Name  | Location     |
| 13576   | Samus Aran | Planet Zebes |
| 117     | John Halo  | Ring World   |
| 9704133 | Iron Man   | Planet Earth |
```

Get a handle on the rows proxy:

```ruby
rows = SimpleXlsxReader.open('suited_heroes.xlsx').sheets.first.rows
```

Simple streaming (kinda boring):

```ruby
rows.each { |row| ... }
```

Streaming with headers, and how about a little enumerable chaining:

```ruby
# Map of hero names by ID: { 117 => 'John Halo', ... }

rows.each(headers: true).with_object({}) do |row, acc|
  acc[row['Hero ID']] = row['Hero Name']
end
```

Sometimes though you have some junk at the top of your spreadsheet:

```
| Unofficial Report  |                        |              |
| Dont tell Nintendo | Yes "John Halo" I know |              |
|                    |                        |              |
| Hero ID            | Hero Name              | Location     |
| 13576              | Samus Aran             | Planet Zebes |
| 117                | John Halo              | Ring World   |
| 9704133            | Iron Man               | Planet Earth |
```

For this, `headers` can be a hash whose keys replace headers and whose values
help find the correct header row:

```ruby
# Same map of hero names by ID: { 117 => 'John Halo', ... }

rows.each(headers: {id: /ID/, name: /Name/}).with_object({}) do |row, acc|
  acc[row[:id]] = row[:name]
end
```
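Roughly, the header search mirrors `test_headers_hash_against_current_row` in the sheet parser: each row's cells are checked against the hash's values (a `String` must match exactly, anything else is tried with `match?`), the first row with at least one match becomes the header row with matched cells replaced by their keys, and every later row is zipped against it. The sketch below is a simplified, in-memory version with a hypothetical `rows_with_header_hash` method, not the gem's SAX-driven code; as in the gem, unmatched header cells keep their original text.

```ruby
# Hypothetical, simplified header-hash matching over plain arrays.
# Rows before the header row are skipped; later rows become hashes.
def rows_with_header_hash(rows, header_map)
  headers = nil

  rows.each_with_object([]) do |row, results|
    if headers
      results << headers.zip(row).to_h
    else
      found = false
      candidate = row.map do |cell|
        key, _search = header_map.find do |_key, search|
          search.is_a?(String) ? cell == search : cell.is_a?(String) && cell.match?(search)
        end
        next cell unless key # unmatched cells keep their sheet text

        found = true
        key
      end
      headers = candidate if found
    end
  end
end

sheet = [
  ['Unofficial Report', nil, nil],
  ['Dont tell Nintendo', 'Yes "John Halo" I know', nil],
  [nil, nil, nil],
  ['Hero ID', 'Hero Name', 'Location'],
  [13576, 'Samus Aran', 'Planet Zebes'],
  [117, 'John Halo', 'Ring World'],
  [9704133, 'Iron Man', 'Planet Earth']
]

rows_with_header_hash(sheet, { id: /ID/, name: /Name/ })
  .map { |row| [row[:id], row[:name]] }.to_h
# => { 13576 => 'Samus Aran', 117 => 'John Halo', 9704133 => 'Iron Man' }
```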

If your header-to-attribute mapping is more complicated than key/value, you
can do the mapping elsewhere, but use a block to find the header row:

```ruby
# Example roughly analogous to some production code mapping a single spreadsheet
# across many objects. Might be a simpler way now that we have the headers-hash
# feature.

object_map = { Hero => { id: 'Hero ID', name: 'Hero Name', location: 'Location' } }

HEADERS = ['Hero ID', 'Hero Name', 'Location']

rows.each(headers: ->(row) { (HEADERS & row).any? }) do |row|
  object_map.each_pair do |klass, attribute_map|
    attributes =
      attribute_map.each_pair.with_object({}) do |(key, header), attrs|
        attrs[key] = row[header]
      end

    klass.new(attributes)
  end
end
```

### Slurping

To make SimpleXlsxReader rows act like an array, for use with legacy
SimpleXlsxReader apps or otherwise, we still support slurping the whole array
into memory. The good news is even when doing this, the xlsx worksheet & shared
string files are never loaded as a (big) Nokogiri doc, so that's nice.

By default, to prevent accidental slurping, `<RowsProxy>` will throw an exception
if you try to access it with array methods like `[]` and `shift` without
explicitly slurping first. You can slurp either by calling `rows.slurp` or
globally by setting `SimpleXlsxReader.configuration.auto_slurp = true`.

Once slurped, enumerable methods on `rows` will use the slurped data
(i.e. not re-parse the sheet), and those Array-like methods will work.

We don't support all Array methods, just the few we have used in real projects,
as we transition towards streaming instead.
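The guard behind that exception is memoization plus a check before any Array-style access, as in `RowsProxy#check_slurped`. Here's a minimal sketch of the pattern; `ToySlurpableRows` and its `auto_slurp:` keyword are hypothetical, standing in for the gem's `RowsProxy` and the global `SimpleXlsxReader.configuration.auto_slurp` setting.

```ruby
# Hypothetical illustration of the "explicit slurp" guard pattern.
class ToySlurpableRows
  include Enumerable

  def initialize(source_rows, auto_slurp: false)
    @source_rows = source_rows # stands in for a streaming sheet parser
    @auto_slurp = auto_slurp
    @slurped = nil
  end

  # Stream from the source until slurped, then reuse the in-memory copy
  def each(&block)
    return to_enum(:each) unless block_given?

    (@slurped || @source_rows).each(&block)
  end

  # Read every row into memory once; later calls return the same array
  def slurp
    @slurped ||= to_a
  end

  def slurped?
    !@slurped.nil?
  end

  # Array-style access only works on slurped data
  def [](*args)
    slurp if @auto_slurp
    raise 'Call rows.slurp before using array methods' unless slurped?

    @slurped[*args]
  end
end

rows = ToySlurpableRows.new([%w[a b], %w[c d]])
begin
  rows[0]
rescue RuntimeError => e
  puts e.message # prints: Call rows.slurp before using array methods
end
rows.slurp
rows[1] # => ["c", "d"]
```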

### Load Errors

By default, cell load errors (ex. if a date cell contains the string
'hello') result in a SimpleXlsxReader::CellLoadError.

If you would like to provide better error feedback to your users, you
can set `SimpleXlsxReader.configuration.catch_cell_load_errors =
true`, and load errors will instead be inserted into Sheet#load_errors keyed
by [rownum, colnum]:

```ruby
{
  [rownum, colnum] => '[error]'
}
```
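Judging by the sheet parser's rescue branch, the keys are zero-based indexes derived from the cell's A1-style reference (for example `B3` becomes `[2, 1]`), and the uncast string is kept as the cell value. The sketch below shows that conversion and the catch-and-record step on a hypothetical list of cells; the helper names are illustrative, and `Integer()` stands in for the gem's real type casting.

```ruby
# Hypothetical helper: "B3" -> [2, 1] (zero-based row index, column index).
# Columns are bijective base-26: A=1 .. Z=26, AA=27, and so on.
# Assumes a well-formed reference like "AA10".
def cell_ref_to_indexes(ref)
  letters, digits = ref.match(/\A([A-Z]+)([0-9]+)\z/).captures
  col_number = letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - 'A'.ord + 1) }
  [digits.to_i - 1, col_number - 1]
end

# Cast each [ref, raw] pair, recording failures instead of raising
# (the equivalent of catch_cell_load_errors = true).
def cast_cells(cells)
  load_errors = {}
  values = cells.map do |ref, raw|
    begin
      Integer(raw)
    rescue ArgumentError => e
      load_errors[cell_ref_to_indexes(ref)] = e.message
      raw # keep the uncast string
    end
  end
  [values, load_errors]
end

values, errors = cast_cells([['A1', '42'], ['B3', 'hello'], ['AA10', '7']])
values      # => [42, "hello", 7]
errors.keys # => [[2, 1]]
```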

### Performance

SimpleXlsxReader is (as of this writing) the fastest and most memory efficient
Ruby xlsx parser.

Recent updates here have focused on large spreadsheets with a high proportion of
non-unique strings in sheets using the xlsx shared strings feature
(Excel-generated spreadsheets always use this). Other projects have implemented
streaming parsers for the sheet data, but currently none stream while loading
the shared strings file, which is the second-largest file in an xlsx archive
and can represent millions of strings in large files.
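To show what streaming the shared strings file means, here's a sketch of the SAX-callback approach used by `SharedStringsParser`, driven by hand-written events instead of Nokogiri so it runs without dependencies. Each `<si>` item becomes one string, text from every `<t>` inside it (rich-text runs, or text the parser delivers in chunks) is appended, and a shared-string cell then stores only an index into the result. The `ToySharedStrings` name and the event list are illustrative assumptions.

```ruby
# Hypothetical, dependency-free SAX-style handler for sharedStrings.xml.
# Only the current string is buffered; finished strings go into an array.
class ToySharedStrings
  attr_reader :result

  def initialize
    @result = []
    @extract = false
    @current_string = nil
  end

  def start_element(name)
    case name
    when 'si' then @current_string = +'' # mutable buffer for this item
    when 't' then @extract = true        # only text inside <t> counts
    end
  end

  def characters(text)
    @current_string << text if @extract
  end

  def end_element(name)
    case name
    when 't' then @extract = false
    when 'si' then @result << @current_string
    end
  end
end

# Events a SAX parser might emit for two <si> items, the second being
# rich text split into two runs: <r><t>Hello </t></r><r><t>World</t></r>
events = [
  [:start_element, 'si'], [:start_element, 't'], [:characters, 'Hero ID'],
  [:end_element, 't'], [:end_element, 'si'],
  [:start_element, 'si'], [:start_element, 'r'], [:start_element, 't'],
  [:characters, 'Hello '], [:end_element, 't'], [:end_element, 'r'],
  [:start_element, 'r'], [:start_element, 't'], [:characters, 'World'],
  [:end_element, 't'], [:end_element, 'r'], [:end_element, 'si']
]

parser = ToySharedStrings.new
events.each { |event, arg| parser.public_send(event, arg) }
parser.result    # => ["Hero ID", "Hello World"]

# A cell like <c r="A1" t="s"><v>1</v></c> only stores the index
parser.result[1] # => "Hello World"
```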

For more details, see [my fork of @shkm's excel benchmark project](https://github.com/woahdae/excel-parsing-benchmarks), but here's the summary:

1mb excel file, 10,000 rows of sample "sales records" with a fair amount of
non-unique strings (ran on an M1 Macbook Pro):

| Gem                | Parses/second | RSS Increase | Allocated Mem | Retained Mem | Allocated Objects | Retained Objects |
|--------------------|---------------|--------------|---------------|--------------|-------------------|------------------|
| simple_xlsx_reader | 1.13          | 36.94mb      | 614.51mb      | 1.13kb       | 8796275           | 3                |
| roo                | 0.75          | 74.0mb       | 164.47mb      | 2.18kb       | 2128396           | 4                |
| creek              | 0.65          | 107.55mb     | 581.38mb      | 3.3kb        | 7240760           | 16               |
| xsv                | 0.61          | 75.66mb      | 2127.42mb     | 3.66kb       | 5922563           | 10               |
| rubyxl             | 0.27          | 373.52mb     | 716.7mb       | 2.18kb       | 10612577          | 4                |

Here is a benchmark for the "worst" file I've seen, a 26mb file whose shared
strings represent 10% of the archive (note, MemoryProfiler has too much
overhead to reasonably measure allocations so that analysis was left off, and
we just measure total time for one parse):

| Gem                | Time    | RSS Increase |
|--------------------|---------|--------------|
| simple_xlsx_reader | 28.71s  | 148.77mb     |
| roo                | 40.25s  | 1322.08mb    |
| xsv                | 45.82s  | 391.27mb     |
| creek              | 60.63s  | 886.81mb     |
| rubyxl             | 238.68s | 9136.3mb     |

## Installation

Add this line to your application's Gemfile:

    gem 'simple_xlsx_reader'

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install simple_xlsx_reader

## Versioning

This project follows [semantic versioning 1.0](http://semver.org/spec/v1.0.0.html)

## Contributing

Remember to write tests, think about edge cases, and run the existing
suite.

The full suite contains a performance test that on an M1 MBP runs the final
large file in about five seconds. Check out that test before & after your
change to check for performance changes.

Then, the standard stuff:

1. Fork this project
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new Pull Request


================================================
FILE: Rakefile
================================================
# frozen_string_literal: true

require "bundler/gem_tasks"

require 'rake/testtask'
Rake::TestTask.new do |t|
  t.pattern = "test/**/*_test.rb"
  t.libs << 'test'
end

task default: [:test]


================================================
FILE: lib/simple_xlsx_reader/document.rb
================================================
# frozen_string_literal: true

require 'forwardable'

module SimpleXlsxReader

  ##
  # Main class for the public API. See the README for usage examples,
  # or read the code, it's pretty friendly.
  class Document
    attr_reader :string_or_io

    def initialize(legacy_file_path = nil, file_path: nil, string_or_io: nil)
      fail(ArgumentError, 'either file_path or string_or_io must be provided') if legacy_file_path.nil? && file_path.nil? && string_or_io.nil?
  
      @string_or_io = string_or_io || File.new(legacy_file_path || file_path)
    end

    def sheets
      @sheets ||= Loader.new(string_or_io).init_sheets
    end

    # Expensive because it slurps all the sheets into memory,
    # probably only appropriate for testing
    def to_hash
      sheets.each_with_object({}) { |sheet, acc| acc[sheet.name] = sheet.rows.to_a }
    end

    # `rows` is a RowsProxy that responds to #each
    class Sheet
      extend Forwardable

      attr_reader :name, :rows

      def_delegators :rows, :load_errors, :slurp

      def initialize(name:, sheet_parser:)
        @name = name
        @rows = RowsProxy.new(sheet_parser: sheet_parser)
      end

      # Legacy - consider `rows.each(headers: true)` for better performance
      def headers
        rows.slurped![0]
      end

      # Legacy - consider `rows` or `rows.each(headers: true)` for better
      # performance
      def data
        rows.slurped![1..-1]
      end
    end

    # Waits until we call #each with a block to parse the rows
    class RowsProxy
      include Enumerable

      attr_reader :slurped, :load_errors

      def initialize(sheet_parser:)
        @sheet_parser = sheet_parser
        @slurped = nil
        @load_errors = {}
      end

      # By default, #each streams the rows to the provided block, either as
      # arrays, or as header => cell value pairs if provided a `headers:`
      # argument.
      #
      # `headers` can be:
      #
      # * `true` - simply takes the first row as the header row
      # * block - calls the block with successive rows until the block returns
      #   true, at which point it uses that row for the headers. All data prior to
      #   finding the headers is ignored.
      # * hash - transforms the header row by replacing cells with keys matched
      #   by value, ex. `{id: /ID|Identity/, name: /Name/i, date: 'Date'}` would
      #   potentially yield the row `{id: 5, name: 'Jane', date: [Date object]}`
      #   instead of the headers from the sheet. It would also search for the
      #   row that matches at least one header, in case the header row isn't the
      #   first.
      #
      # If rows have been slurped, #each will iterate the slurped rows instead.
      #
      # Note, calls to this after slurping will raise if given the `headers:`
      # argument, as that's handled by the sheet parser. If this is important
      # to someone, speak up and we could potentially support it.
      def each(headers: false, &block)
        if slurped?
          raise '#each does not support headers with slurped rows' if headers

          slurped.each(&block)
        elsif block_given?
          # It's possible to slurp while yielding to the block, which would
          # null out @sheet_parser, so let's just keep track of it here too
          sheet_parser = @sheet_parser
          @sheet_parser.parse(headers: headers, &block).tap do
            @load_errors = sheet_parser.load_errors
          end
        else
          to_enum(:each, headers: headers)
        end
      end

      # Mostly for legacy support, I'm not aware of a use case for doing this
      # when you don't have to.
      #
      # Note that #each will use slurped results if available, and since we're
      # leveraging Enumerable, all the other Enumerable methods will too.
      def slurp
        # possibly release sheet parser from memory on next GC run;
        # untested, but it can hold a lot of stuff, so worth a try
        @slurped ||= to_a.tap { @sheet_parser = nil }
      end

      def slurped?
        !!@slurped
      end

      def slurped!
        check_slurped

        slurped
      end

      def [](*args)
        check_slurped

        slurped[*args]
      end

      def shift(*args)
        check_slurped

        slurped.shift(*args)
      end

      private

      def check_slurped
        slurp if SimpleXlsxReader.configuration.auto_slurp
        return if slurped?

        raise 'Called a slurp-y method without explicitly slurping;'\
          ' use #each or call rows.slurp first'
      end
    end
  end
end


================================================
FILE: lib/simple_xlsx_reader/hyperlink.rb
================================================
# frozen_string_literal: true

module SimpleXlsxReader
  # We support hyperlinks as a "type" even though they're technically
  # represented either as a function or an external reference in the xlsx spec.
  #
  # In practice, hyperlinks are usually a link or a mailto. In the case of a
  # link, we probably want to follow it to download something, but in the case
  # of an email, we probably just want the email and not the mailto. So we
  # represent a hyperlink primarily as it is seen by the user, following the
  # principle of least surprise, but the url is accessible via #url.
  #
  # Microsoft calls the visible part of a hyperlink cell the "friendly name,"
  # so we expose that as a method too, in case you want to be explicit about
  # how you're accessing it.
  #
  # See MS documentation on the HYPERLINK function for some background:
  # https://support.office.com/en-us/article/HYPERLINK-function-333c7ce6-c5ae-4164-9c47-7de9b76f577f
  class Hyperlink < String
    attr_reader :friendly_name
    attr_reader :url

    def initialize(url, friendly_name = nil)
      @url = url
      @friendly_name = friendly_name&.to_s
      super(@friendly_name || @url)
    end
  end
end


================================================
FILE: lib/simple_xlsx_reader/loader/shared_strings_parser.rb
================================================
# frozen_string_literal: true

module SimpleXlsxReader
  class Loader
    # For performance reasons, excel uses an optional SpreadsheetML feature
    # that puts all strings in a separate xml file, and then references
    # them by their index in that file.
    #
    # http://msdn.microsoft.com/en-us/library/office/gg278314.aspx
    class SharedStringsParser < Nokogiri::XML::SAX::Document
      def self.parse(file)
        new.tap do |parser|
          Nokogiri::XML::SAX::Parser.new(parser).parse(file)
        end.result
      end

      def initialize
        @result = []
        @composite = false
        @extract = false
      end

      attr_reader :result

      def start_element(name, _attrs = [])
        case name
        when 'si' then @current_string = +"" # UTF-8 variant of String.new
        when 't' then @extract = true
        end
      end

      def characters(string)
        return unless @extract

        @current_string << string
      end

      def end_element(name)
        case name
        when 't' then @extract = false
        when 'si' then @result << @current_string
        end
      end
    end
  end
end


================================================
FILE: lib/simple_xlsx_reader/loader/sheet_parser.rb
================================================
# frozen_string_literal: true

require 'forwardable'

module SimpleXlsxReader
  class Loader
    class SheetParser < Nokogiri::XML::SAX::Document
      extend Forwardable

      attr_accessor :xrels_file
      attr_accessor :hyperlinks_by_cell

      attr_reader :load_errors

      def_delegators :@loader, :style_types, :shared_strings, :base_date

      def initialize(file_io:, loader:)
        @file_io = file_io
        @loader = loader
      end

      def parse(headers: false, &block)
        raise 'parse called without a block; what should this do?'\
          unless block_given?

        @headers = headers
        @each_callback = block
        @load_errors = {}
        @current_row_num = nil
        @last_seen_row_idx = 0
        @url = nil # silence warnings
        @function = nil # silence warnings
        @capture = nil # silence warnings
        @captured = nil # silence warnings
        @dimension = nil # silence warnings
        @column_index = 0

        @file_io.rewind # if it's IO from IO.read, we need to rewind it

        # In this project this is only used for GUI-made hyperlinks (as opposed
        # to FUNCTION-based hyperlinks). Unfortunately they're needed to parse
        # the spreadsheet, and they come AFTER the sheet data. So, solution is
        # to just stream-parse the file twice, first for the hyperlinks at the
        # bottom of the file, then for the file itself. In the future it would
        # be clever to use grep to extract the xml into its own smaller file.
        if xrels_file
          if xrels_file.grep(/hyperlink/).any?
            xrels_file.rewind
            load_gui_hyperlinks # represented as hyperlinks_by_cell
          end
          @file_io.rewind # we've already parsed this once
        end

        Nokogiri::XML::SAX::Parser.new(self).parse(@file_io)
      end

      ###
      # SAX document hooks

      def start_element_namespace(name, attrs = [], _prefix, _uri, _ns)
        case name
        when 'dimension'
          @dimension = attrs.last.value
        when 'row'
          @current_row_num = attrs.find {|attr| attr.localname == 'r'}&.value&.to_i
          @current_row = Array.new(column_length)
          @column_index = 0
        when 'c'
          attrs = attrs.inject({}) {|acc, attr| acc[attr.localname] = attr.value; acc}
          @cell_name = attrs['r'] || column_number_to_letter(@column_index)
          @type = attrs['t']
          @style = attrs['s'] && style_types[attrs['s'].to_i]
          @column_index += 1
        when 'f' then @function = true
        when 'v', 't' then @capture = true
        end
      end

      def characters(string)
        if @function
          # the only "function" we support is a hyperlink
          @url = string.slice(/HYPERLINK\("(.*?)"/, 1)
        end

        return unless @capture

        captured =
          begin
            SimpleXlsxReader::Loader.cast(
              string, @type, @style,
              url: @url || hyperlinks_by_cell&.[](@cell_name),
              shared_strings: shared_strings,
              base_date: base_date
            )
          rescue StandardError => e
            column, row = @cell_name.match(/([A-Z]+)([0-9]+)/).captures
            col_idx = column_letter_to_number(column) - 1
            row_idx = row.to_i - 1

            if !SimpleXlsxReader.configuration.catch_cell_load_errors
              error = CellLoadError.new(
                "Row #{row_idx}, Col #{col_idx}: #{e.message}"
              )
              error.set_backtrace(e.backtrace)
              raise error
            else
              @load_errors[[row_idx, col_idx]] = e.message

              string
            end
          end

        # For some reason I can't figure out in a reasonable timeframe,
        # SAX parsing some workbooks captures separate strings in the same cell
        # when we encounter UTF-8, although I can't get workbooks made in my
        # own version of excel to repro it. Our fix is just to keep building
        # the string in this case, although maybe there's a setting in Nokogiri
        # to make it not do this (looked, couldn't find it).
        #
        # Loading the workbook test/chunky_utf8.xlsx repros the issue.
        @captured = @captured ? @captured + (captured || '') : captured
      end

      def end_element_namespace(name, _prefix, _uri)
        case name
        when 'row'
          if @headers == true # ya a little funky
            @headers = @current_row
          elsif @headers.is_a?(Hash)
            test_headers_hash_against_current_row
            # in case there were empty rows before finding the header
            @last_seen_row_idx = @current_row_num - 1
          elsif @headers.respond_to?(:call)
            @headers = @current_row if @headers.call(@current_row)
            # in case there were empty rows before finding the header
            @last_seen_row_idx = @current_row_num - 1
          elsif @headers
            possibly_yield_empty_rows(headers: true)
            yield_row(@current_row, headers: true)
          else
            possibly_yield_empty_rows(headers: false)
            yield_row(@current_row, headers: false)
          end

          @last_seen_row_idx += 1

          # Note that excel writes a '/worksheet/dimension' node we can get
          # this from, but some libs (ex. simple_xlsx_writer) don't record it.
          # In that case, we assume the data is of uniform column length and
          # store the column name of the last header row we see. Obviously this
          # isn't the most robust strategy, but it likely fits 99% of use cases
          # considering it's not a problem with actual excel docs.
          @dimension = "A1:#{@cell_name}" if @dimension.nil?
        when 'v', 't'
          @current_row[cell_idx] = @captured
          @capture = false
          @captured = nil
        when 'f' then @function = false
        when 'c' then @url = nil
        end
      end

      ###
      # /End SAX hooks

      def test_headers_hash_against_current_row
        found = false

        @current_row.each_with_index do |cell, cell_idx|
          @headers.each_pair do |key, search|
            if search.is_a?(String) ? cell == search : cell&.match?(search)
              found = true
              @current_row[cell_idx] = key
            end
          end
        end

        @headers = @current_row if found
      end

      def possibly_yield_empty_rows(headers:)
        while @current_row_num && @current_row_num > @last_seen_row_idx + 1
          @last_seen_row_idx += 1
          yield_row(Array.new(column_length), headers: headers)
        end
      end

      def yield_row(row, headers:)
        if headers
          @each_callback.call(Hash[@headers.zip(row)])
        else
          @each_callback.call(row)
        end
      end

      # This sax-parses the whole sheet, just to extract hyperlink refs at the end.
      def load_gui_hyperlinks
        self.hyperlinks_by_cell =
          HyperlinksParser.parse(@file_io, xrels: xrels)
      end

      class HyperlinksParser < Nokogiri::XML::SAX::Document
        def initialize(file_io, xrels:)
          @file_io = file_io
          @xrels = xrels
        end

        def self.parse(file_io, xrels:)
          new(file_io, xrels: xrels).parse
        end

        def parse
          @hyperlinks_by_cell = {}
          Nokogiri::XML::SAX::Parser.new(self).parse(@file_io)
          @hyperlinks_by_cell
        end

        def start_element_namespace(name, attrs, _prefix, _uri, _ns)
          case name
          when 'hyperlink'
            attrs = attrs.inject({}) {|acc, attr| acc[attr.localname] = attr.value; acc}
            id = attrs['id'] || attrs['r:id']

            @hyperlinks_by_cell[attrs['ref']] =
              @xrels.at_xpath(%(//*[@Id="#{id}"])).attr('Target')
          end
        end
      end

      def xrels
        @xrels ||= Nokogiri::XML(xrels_file.read) if xrels_file
      end

      def column_length
        return 0 unless @dimension

        @column_length ||= column_letter_to_number(last_cell_letter)
      end

      def cell_idx
        column_letter_to_number(@cell_name.scan(/[A-Z]+/).first) - 1
      end

      ##
      # Returns the last column name, ex. 'E'
      def last_cell_letter
        return unless @dimension

        @dimension.scan(/:([A-Z]+)/)&.first&.first || 'A'
      end

      # Column names are bijective base-26 numerals (digits 'A'..'Z' = 1..26,
      # with no zero digit), so the formula expands as:
      # 'A'   = 1
      # 'B'   = 2
      # 'Z'   = 26
      # 'AA'  = 26 * 1  + 1
      # 'AZ'  = 26 * 1  + 26
      # 'BA'  = 26 * 2  + 1
      # 'ZA'  = 26 * 26 + 1
      # 'ZZ'  = 26 * 26 + 26
      # 'AAA' = 26 * 26 * 1 + 26 * 1  + 1
      # 'AAZ' = 26 * 26 * 1 + 26 * 1  + 26
      # 'ABA' = 26 * 26 * 1 + 26 * 2  + 1
      # 'BZA' = 26 * 26 * 2 + 26 * 26 + 1
      def column_letter_to_number(column_letter)
        pow = column_letter.length - 1
        result = 0
        column_letter.each_byte do |b|
          result += 26**pow * (b - 64)
          pow -= 1
        end
        result
      end

      def column_number_to_letter(n)
        result = []
        loop do
          result.unshift((n % 26 + 65).chr)
          n = (n / 26) - 1
          break if n < 0
        end
        result.join
      end
    end
  end
end
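The column-name arithmetic in `column_letter_to_number` and `column_number_to_letter` above is easiest to see in isolation. Below is a minimal standalone sketch. The helper names `letters_to_number` and `number_to_letters` are hypothetical and not part of the gem. It treats column names as bijective base-26 numerals and accumulates digits with Horner's rule instead of explicit powers of 26:

```ruby
# Hypothetical standalone helpers; letters are bijective base-26 digits,
# 'A' = 1 .. 'Z' = 26, with no zero digit.

# 'A' => 1, 'Z' => 26, 'AA' => 27 (1-indexed, like column_letter_to_number)
def letters_to_number(letters)
  # Horner's rule: shift the running total one base-26 place, then add the
  # next digit (byte 65 is 'A', so b - 64 maps 'A'..'Z' to 1..26).
  letters.each_byte.reduce(0) { |acc, b| acc * 26 + (b - 64) }
end

# 0 => 'A', 25 => 'Z', 26 => 'AA' (0-indexed, like column_number_to_letter)
def number_to_letters(n)
  result = +''
  loop do
    result.prepend((n % 26 + 65).chr)
    # Subtract 1 because there is no zero digit to carry into.
    n = n / 26 - 1
    break if n.negative?
  end
  result
end

puts letters_to_number('BZA')  # prints 2029
puts number_to_letters(16_383) # prints XFD, Excel's last column
```

Both helpers keep the gem's indexing conventions. Letters map to a 1-indexed number, but numbers map back from a 0-indexed one, so a round trip from `i` yields `i + 1`.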


================================================
FILE: lib/simple_xlsx_reader/loader/style_types_parser.rb
================================================
# frozen_string_literal: true

module SimpleXlsxReader
  class Loader
    StyleTypesParser = Struct.new(:file_io) do
      def self.parse(file_io)
        new(file_io).tap(&:parse).style_types
      end

      # Map of non-custom numFmtId to casting symbol
      NumFmtMap = {
        0 => :string,        # General
        1 => :fixnum,        # 0
        2 => :float,         # 0.00
        3 => :fixnum,        # #,##0
        4 => :float,         # #,##0.00
        5 => :unsupported,   # $#,##0_);($#,##0)
        6 => :unsupported,   # $#,##0_);[Red]($#,##0)
        7 => :unsupported,   # $#,##0.00_);($#,##0.00)
        8 => :unsupported,   # $#,##0.00_);[Red]($#,##0.00)
        9 => :percentage,    # 0%
        10 => :percentage,   # 0.00%
        11 => :bignum,       # 0.00E+00
        12 => :unsupported,  # # ?/?
        13 => :unsupported,  # # ??/??
        14 => :date,         # mm-dd-yy
        15 => :date,         # d-mmm-yy
        16 => :date,         # d-mmm
        17 => :date,         # mmm-yy
        18 => :time,         # h:mm AM/PM
        19 => :time,         # h:mm:ss AM/PM
        20 => :time,         # h:mm
        21 => :time,         # h:mm:ss
        22 => :date_time,    # m/d/yy h:mm
        37 => :unsupported,  # #,##0 ;(#,##0)
        38 => :unsupported,  # #,##0 ;[Red](#,##0)
        39 => :unsupported,  # #,##0.00;(#,##0.00)
        40 => :unsupported,  # #,##0.00;[Red](#,##0.00)
        44 => :float,        # some odd currency format ?from Office 2007?
        45 => :time,         # mm:ss
        46 => :time,         # [h]:mm:ss
        47 => :time,         # mmss.0
        48 => :bignum,       # ##0.0E+0
        49 => :unsupported   # @
      }.freeze

      def parse
        @xml = Nokogiri::XML(file_io.read).remove_namespaces!
      end

      # Excel doesn't record types for some cells, only its display style, so
      # we have to back out the type from that style.
      #
      # Some of these styles can be determined from a known set (see NumFmtMap),
      # while others are 'custom' and we have to make a best guess.
      #
      # This is the array of types corresponding to the styles a spreadsheet
      # uses, and includes both the known style types and the custom styles.
      #
      # Note that the xml sheet cells that use this don't reference the
      # numFmtId, but instead the array index of a style in the stored list of
      # only the styles used in the spreadsheet (which can be either known or
      # custom). Hence this style types array, rather than a map of numFmtId to
      # type.
      def style_types
        @xml.xpath('/styleSheet/cellXfs/xf').map do |xstyle|
          style_type_by_num_fmt_id(
            xstyle.attributes['numFmtId']&.value
          )
        end
      end

      # Finds the type we think a style is. For example, numFmtId 14 is a date
      # style, so this would return :date.
      #
      # Note, custom styles usually (are supposed to?) have a numFmtId >= 164,
      # but in practice can sometimes be simply out of the usual "Any Language"
      # id range that goes up to 49. For example, I have seen a numFmtId of
      # 59 specified as a date. In Thai, 59 is a number format, so this seems
      # like a bad idea, but we try to be flexible and just go with it.
      def style_type_by_num_fmt_id(id)
        return nil if id.nil?

        id = id.to_i
        NumFmtMap[id] || custom_style_types[id]
      end

      # Map of (numFmtId >= 164) (custom styles) to our best guess at the type
      # ex. {164 => :date_time}
      def custom_style_types
        @custom_style_types ||=
          @xml.xpath('/styleSheet/numFmts/numFmt')
            .each_with_object({}) do |xstyle, acc|
              acc[xstyle.attributes['numFmtId'].value.to_i] =
                determine_custom_style_type(xstyle.attributes['formatCode'].value)
            end
      end

      # This is the least deterministic part of reading xlsx files. Due to
      # custom styles, you can't know for sure when a date is a date other than
      # by looking at its format and guessing. It's not impossible to guess
      # right, though.
      #
      # http://stackoverflow.com/questions/4948998/determining-if-an-xlsx-cell-is-date-formatted-for-excel-2007-spreadsheets
      def determine_custom_style_type(string)
        return :float if string[0] == '_'
        return :float if string.start_with?(' 0')

        # Looks for one of ymdhis outside of meta-stuff like [Red]
        return :date_time if string =~ /(^|\])[^\[]*[ymdhis]/i

        :unsupported
      end
    end
  end
end
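The following is a hedged standalone sketch of how `determine_custom_style_type` classifies custom format codes. The name `guess_custom_style_type` is hypothetical, and this version checks the two-character `' 0'` prefix with `String#start_with?`:

```ruby
# Hypothetical standalone version of the custom-format heuristic; a best
# guess, not a full parser of Excel format codes.
def guess_custom_style_type(format_code)
  # Accounting-style codes beginning with '_' or ' 0' are treated as floats.
  return :float if format_code.start_with?('_', ' 0')

  # A y/m/d/h/i/s letter reached from the start of the string or from a
  # closing ']' with no '[' in between (so letters inside [Red], [$-409],
  # etc. are not reached) suggests a date/time.
  return :date_time if format_code =~ /(^|\])[^\[]*[ymdhis]/i

  :unsupported
end

p guess_custom_style_type('dd MMMM yyyy') # prints :date_time
p guess_custom_style_type('[Red]0.00')    # prints :unsupported
```

As written, the pattern does not skip quoted literal text. A code such as `0 "pcs"` is therefore classified as `:date_time` because of its `s`, which is one reason the comment above calls the approach a guess.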


================================================
FILE: lib/simple_xlsx_reader/loader/workbook_parser.rb
================================================
# frozen_string_literal: true

module SimpleXlsxReader
  class Loader
    WorkbookParser = Struct.new(:file_io) do
      def self.parse(file_io)
        parser = new(file_io).tap(&:parse)
        [parser.sheet_toc, parser.base_date]
      end

      def parse
        @xml = Nokogiri::XML(file_io.read).remove_namespaces!
      end

      # Table of contents for the sheets, ex. {'Authors' => 0, ...}
      def sheet_toc
        @xml.xpath('/workbook/sheets/sheet')
          .each_with_object({}) do |sheet, acc|
            acc[sheet.attributes['name'].value] =
              sheet.attributes['sheetId'].value.to_i - 1 # keep things 0-indexed
          end
      end

      ## Returns the base_date from which to calculate dates.
      # Defaults to 1900 (minus two days due to excel quirk), but use 1904 if
      # it's set in the Workbook's workbookPr.
      # http://msdn.microsoft.com/en-us/library/ff530155(v=office.12).aspx
      def base_date
        return DATE_SYSTEM_1900 if @xml.nil?

        @xml.xpath('//workbook/workbookPr[@date1904]').each do |workbookPr|
          return DATE_SYSTEM_1904 if workbookPr['date1904'] =~ /true|1/i
        end

        DATE_SYSTEM_1900
      end
    end
  end
end
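Here is a minimal sketch of what `base_date` feeds into: a whole-day serial is added to the epoch of the workbook's date system. The helper `serial_to_date` is hypothetical. The epoch values are copied from `DATE_SYSTEM_1900` and `DATE_SYSTEM_1904` in `lib/simple_xlsx_reader.rb`:

```ruby
require 'date'

# Epoch values copied from DATE_SYSTEM_1900 / DATE_SYSTEM_1904.
EPOCH_1900 = Date.new(1899, 12, 30)
EPOCH_1904 = Date.new(1904, 1, 1)

# Hypothetical helper: converts a whole-day serial to a Date in the chosen
# date system (date1904: true mirrors a workbookPr date1904="1" flag).
def serial_to_date(serial, date1904: false)
  (date1904 ? EPOCH_1904 : EPOCH_1900) + serial.to_i
end

puts serial_to_date(41_760)                 # prints 2014-05-01
puts serial_to_date(40_298, date1904: true) # prints 2014-05-01
```

The same calendar day has serials that differ by 1,462 between the two systems. The 1899-12-30 epoch, rather than 1900-01-01, absorbs both Excel's 1-based counting and its fictitious 1900-02-29, so in the 1900 system it yields correct dates for serials from 61 (1900-03-01) onward.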


================================================
FILE: lib/simple_xlsx_reader/loader.rb
================================================
# frozen_string_literal: true

module SimpleXlsxReader
  class Loader < Struct.new(:string_or_io)
    attr_accessor :shared_strings, :sheet_parsers, :sheet_toc, :style_types, :base_date

    def init_sheets
      ZipReader.new(
        string_or_io: string_or_io,
        loader: self
      ).read

      sheet_toc.each_with_index.map do |(sheet_name, _sheet_number), i|
        # sheet_number is *not* the index into sheet_parsers
        SimpleXlsxReader::Document::Sheet.new(
          name: sheet_name,
          sheet_parser: sheet_parsers[i]
        )
      end
    end

    ZipReader = Struct.new(:string_or_io, :loader, keyword_init: true) do
      attr_reader :zip

      def initialize(*args)
        super
        @zip = SimpleXlsxReader::Zip.open_buffer(string_or_io)
      end

      def read
        entry_at('xl/workbook.xml') do |file_io|
          loader.sheet_toc, loader.base_date = *WorkbookParser.parse(file_io)
        end

        entry_at('xl/styles.xml') do |file_io|
          loader.style_types = StyleTypesParser.parse(file_io)
        end

        # optional feature used by excel,
        # but not often used by xlsx generation libraries
        if (ss_entry = entry_at('xl/sharedStrings.xml'))
          ss_entry.get_input_stream do |file|
            loader.shared_strings = SharedStringsParser.parse(file)
          end
        else
          loader.shared_strings = []
        end

        loader.sheet_parsers = []

        # Sometimes there's a zero-index sheet.xml, ex.
        # Google Docs creates:
        # xl/worksheets/sheet.xml
        # xl/worksheets/sheet1.xml
        # xl/worksheets/sheet2.xml
        # While Excel creates:
        # xl/worksheets/sheet1.xml
        # xl/worksheets/sheet2.xml
        add_sheet_parser_at_index(nil)

        i = 1
        while add_sheet_parser_at_index(i)
          i += 1
        end
      end

      def entry_at(path, &block)
        # Older and newer (post-mid-2021) RubyZip versions normalize pathnames,
        # but unfortunately there was a window in between where they didn't.
        # Rather than require a specific version, let's just be flexible.
        entry =
          zip.find_entry(path) || # *nix-generated
          zip.find_entry(path.tr('/', '\\')) || # Windows-generated
          zip.find_entry(path.downcase) || # Sometimes it's lowercase
          zip.find_entry(path.tr('/', '\\').downcase) # Windows-generated, lowercase

        if block
          entry.get_input_stream(&block)
        else
          entry
        end
      end

      def add_sheet_parser_at_index(i)
        sheet_file_name = "xl/worksheets/sheet#{i}.xml"
        return unless (entry = entry_at(sheet_file_name))

        parser =
          SheetParser.new(
            file_io: entry.get_input_stream,
            loader: loader
          )

        relationship_file_name = "xl/worksheets/_rels/sheet#{i}.xml.rels"
        if (rel = entry_at(relationship_file_name))
          parser.xrels_file = rel.get_input_stream
        end

        loader.sheet_parsers << parser
      end
    end

    ##
    # The heart of typecasting. The ruby type is determined either explicitly
    # from the cell xml or implicitly from the cell style, and this
    # method expects that work to have been done already. This, then,
    # takes the type we determined it to be and casts the cell value
    # to that type.
    #
    # types:
    # - s: shared string (see #shared_string)
    # - n: number (cast to a float)
    # - b: boolean
    # - str: string
    # - inlineStr: string
    # - ruby symbol: for when type has been determined by style
    #
    # options:
    # - shared_strings: needed for 's' (shared string) type
    def self.cast(value, type, style, options = {})
      return nil if value.nil? || value.empty?

      # Sometimes the type is dictated by the style alone
      if type.nil? ||
         (type == 'n' && %i[date time date_time].include?(style))
        type = style
      end

      casted =
        case type

        ##
        # There are a few built-in types
        ##

        when 's' # shared string
          options[:shared_strings][value.to_i]
        when 'n' # number
          value.to_f
        when 'b'
          value.to_i == 1
        when 'str'
          value
        when 'inlineStr'
          value

        ##
        # Type can also be determined by a style,
        # detected earlier and cast here by its standardized symbol
        ##

        # With no type encoded and the General format, try an Integer, then a
        # Float, and fall back to the raw string.
        when nil, :string
          retval = Integer(value, exception: false)
          retval ||= Float(value, exception: false)
          retval ||= value
          retval
        when :unsupported
          value
        when :fixnum
          value.to_i
        when :float
          value.to_f
        when :percentage
          value.to_f
        # The trickiest. Note that all these formats can vary on whether they
        # actually contain a date, time, or datetime: a whole-number serial is
        # returned as a Date, a serial with a fractional day as a UTC Time.
        when :date, :time, :date_time
          value = Float(value)
          days_since_date_system_start = value.to_i
          fraction_of_24 = value - days_since_date_system_start

          # http://stackoverflow.com/questions/10559767/how-to-convert-ms-excel-date-from-float-to-date-format-in-ruby
          date = options.fetch(:base_date, DATE_SYSTEM_1900) + days_since_date_system_start

          if fraction_of_24 > 0 # there is a time associated
            seconds = (fraction_of_24 * 86_400).round
            return Time.utc(date.year, date.month, date.day) + seconds
          else
            return date
          end
        when :bignum
          if defined?(BigDecimal)
            BigDecimal(value)
          else
            value.to_f
          end

        ##
        # Beats me
        ##

        else
          value
        end

      if options[:url]
        Hyperlink.new(options[:url], casted)
      else
        casted
      end
    end
  end
end
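The `:date`, `:time` and `:date_time` branch of `Loader.cast` splits a serial into whole days and a fraction of a day. The sketch below reproduces that logic so the rounding can be checked in isolation. The name `serial_to_date_or_time` is hypothetical, and it takes a numeric serial where `cast` parses a string with `Float()`:

```ruby
require 'date'

# Hypothetical helper mirroring the date/time branch of Loader.cast:
# whole serials become Dates, serials with a fractional day UTC Times.
def serial_to_date_or_time(serial, base_date: Date.new(1899, 12, 30))
  days = serial.to_i
  fraction_of_day = serial - days
  date = base_date + days
  return date unless fraction_of_day > 0

  # Rounding to whole seconds hides float noise such as 10:59:59.9999.
  seconds = (fraction_of_day * 86_400).round
  Time.utc(date.year, date.month, date.day) + seconds
end

puts serial_to_date_or_time(37_257)             # prints 2002-01-01
puts serial_to_date_or_time(37_257 + 11.0 / 24) # prints 2002-01-01 11:00:00 UTC
```

The checks use timestamps that also appear as expectations in `test/simple_xlsx_reader_test.rb` (2002-01-01 11:00:00 UTC) and `test/datetime_test.rb` (00:30 on the 1899-12-30 epoch day).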



================================================
FILE: lib/simple_xlsx_reader/version.rb
================================================
# frozen_string_literal: true

module SimpleXlsxReader
  VERSION = '5.1.0'
end


================================================
FILE: lib/simple_xlsx_reader.rb
================================================
# frozen_string_literal: true

require 'nokogiri'
require 'date'

require 'simple_xlsx_reader/version'
require 'simple_xlsx_reader/hyperlink'
require 'simple_xlsx_reader/document'
require 'simple_xlsx_reader/loader'
require 'simple_xlsx_reader/loader/workbook_parser'
require 'simple_xlsx_reader/loader/shared_strings_parser'
require 'simple_xlsx_reader/loader/sheet_parser'
require 'simple_xlsx_reader/loader/style_types_parser'


# Rubyzip 1.0 only changed naming; everything else is the same, so let's be
# flexible so we don't force people into a dependency hell w/ other gems.
begin
  # Try loading rubyzip < 1.0
  require 'zip/zip'
  require 'zip/zipfilesystem'
  SimpleXlsxReader::Zip = Zip::ZipFile
rescue LoadError
  # Try loading rubyzip >= 1.0
  require 'zip'
  require 'zip/filesystem'
  SimpleXlsxReader::Zip = Zip::File
end

module SimpleXlsxReader
  DATE_SYSTEM_1900 = Date.new(1899, 12, 30)
  DATE_SYSTEM_1904 = Date.new(1904, 1, 1)

  class CellLoadError < StandardError; end

  class << self
    def configuration
      @configuration ||= Struct.new(:catch_cell_load_errors, :auto_slurp).new.tap do |c|
        c.catch_cell_load_errors = false
        c.auto_slurp = false
      end
    end

    def open(file_path)
      Document.new(file_path: file_path).tap(&:sheets)
    end
    
    def parse(string_or_io)
      Document.new(string_or_io: string_or_io).tap(&:sheets)
    end
  end
end
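The error handling in `SheetParser` does one of two things when a cell fails to cast. By default it re-raises the failure as `CellLoadError` with row/column context. When `SimpleXlsxReader.configuration.catch_cell_load_errors` is true, it records the message and keeps the raw string. A standalone sketch of that pattern follows. `SketchCellLoadError` and `cast_row` are hypothetical names, not the gem's API:

```ruby
# Hypothetical stand-in for SimpleXlsxReader::CellLoadError.
class SketchCellLoadError < StandardError; end

# Casts each raw cell with the caller's block. With catch_errors false, a
# failure is re-raised with row/column context and the original backtrace;
# with catch_errors true, the message is stored in load_errors under
# [row_idx, col_idx] and the uncast string is kept.
def cast_row(row_idx, raw_cells, load_errors, catch_errors:)
  raw_cells.each_with_index.map do |raw, col_idx|
    begin
      yield raw
    rescue StandardError => e
      unless catch_errors
        error = SketchCellLoadError.new("Row #{row_idx}, Col #{col_idx}: #{e.message}")
        error.set_backtrace(e.backtrace)
        raise error
      end

      load_errors[[row_idx, col_idx]] = e.message
      raw
    end
  end
end

errors = {}
row = cast_row(0, %w[1 x 3], errors, catch_errors: true) { |v| Integer(v) }
p row         # prints [1, "x", 3]
p errors.keys # prints [[0, 1]]
```

Calling `set_backtrace` with the original backtrace keeps the stack pointing at the cast that failed rather than at the re-raise.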


================================================
FILE: simple_xlsx_reader.gemspec
================================================
# -*- encoding: utf-8 -*-
lib = File.expand_path('../lib', __FILE__)
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
require 'simple_xlsx_reader/version'

Gem::Specification.new do |gem|
  gem.name          = "simple_xlsx_reader"
  gem.version       = SimpleXlsxReader::VERSION
  gem.authors       = ["Woody Peterson"]
  gem.email         = ["woody.peterson@gmail.com"]
  gem.description   = %q{Read xlsx data the Ruby way}
  gem.summary       = %q{Read xlsx data the Ruby way}
  gem.homepage      = ""
  gem.license       = "MIT"

  gem.add_dependency 'nokogiri'
  gem.add_dependency 'rubyzip'

  gem.add_development_dependency 'minitest', '>= 5.0'
  gem.add_development_dependency 'rake'
  gem.add_development_dependency 'pry'

  gem.files         = `git ls-files`.split($/)
  gem.executables   = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
  gem.test_files    = gem.files.grep(%r{^test/})
  gem.require_paths = ["lib"]
end


================================================
FILE: test/date1904_test.rb
================================================
# frozen_string_literal: true

require_relative 'test_helper'

describe SimpleXlsxReader do
  let(:date1904_file) { File.join(File.dirname(__FILE__), 'date1904.xlsx') }
  let(:subject) { SimpleXlsxReader::Document.new(date1904_file) }

  it 'supports converting dates with the 1904 date system' do
    _(subject.to_hash).must_equal(
      'date1904' => [[Date.parse('2014-05-01')]]
    )
  end
end


================================================
FILE: test/datetime_test.rb
================================================
# frozen_string_literal: true

require_relative 'test_helper'

describe SimpleXlsxReader do
  let(:datetimes_file) do
    File.join(
      File.dirname(__FILE__),
      'datetimes.xlsx'
    )
  end

  let(:subject) { SimpleXlsxReader::Document.new(datetimes_file) }

  it 'converts date_times with the correct precision' do
    _(subject.to_hash).must_equal(
      'Datetimes' =>
        [
          [Time.parse('2013-08-19 18:29:59 UTC')],
          [Time.parse('2013-08-19 18:30:00 UTC')],
          [Time.parse('2013-08-19 18:30:01 UTC')],
          [Time.parse('1899-12-30 00:30:00 UTC')]
        ]
    )
  end
end


================================================
FILE: test/gdocs_sheet_test.rb
================================================
# frozen_string_literal: true

require_relative 'test_helper'
require 'time'

describe SimpleXlsxReader do
  let(:one_sheet_file) { File.join(File.dirname(__FILE__), 'gdocs_sheet.xlsx') }
  let(:subject) { SimpleXlsxReader::Document.new(one_sheet_file) }

  it 'able to load file from google docs' do
    _(subject.to_hash).must_equal(
      'List 1' => [['Empty gdocs list 1']],
      'List 2' => [['Empty gdocs list 2']]
    )
  end
end


================================================
FILE: test/lower_case_sharedstrings_test.rb
================================================
# frozen_string_literal: true

require_relative 'test_helper'

describe SimpleXlsxReader do
  let(:lower_case_shared_strings) do
    File.join(
      File.dirname(__FILE__),
      'lower_case_sharedstrings.xlsx'
    )
  end

  let(:subject) { SimpleXlsxReader::Document.new(lower_case_shared_strings) }

  describe '#to_hash' do
    it 'should have the word Well in the first row' do
      _(subject.sheets.first.rows.to_a[0]).must_include('Well')
    end
  end
end


================================================
FILE: test/namespaces_and_missing_atts_test.rb
================================================
# frozen_string_literal: true

require_relative 'test_helper'

describe SimpleXlsxReader do
  # Based on a real-world sheet possibly generated by PowerBI, where the xml
  # has namespacing and rows are missing the 'r' attribute.
  let(:sheet) do
    <<~XML
      <?xml version="1.0" encoding="utf-8"?>
      <x:worksheet xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
        <x:sheetData>
          <x:row>
            <x:c s="2" t="inlineStr">
              <x:is>
                <x:t>Salmon</x:t>
              </x:is>
            </x:c>
            <x:c s="2" t="inlineStr">
              <x:is>
                <x:t>Trout</x:t>
              </x:is>
            </x:c>
          </x:row>
          <x:row>
            <x:c s="2" t="inlineStr">
              <x:is>
                <x:t>Cat</x:t>
              </x:is>
            </x:c>
            <x:c s="2" t="inlineStr">
              <x:is>
                <x:t>Dog</x:t>
              </x:is>
            </x:c>
          </x:row>
        </x:sheetData>
      </x:worksheet>
    XML
  end

  let(:styles) do
    <<~XML
      <?xml version="1.0" encoding="utf-8"?><x:styleSheet xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><x:numFmts><x:numFmt numFmtId="181" formatCode="0" /><x:numFmt numFmtId="182" formatCode="m/d/yyyy h:mm:ss AM/PM" /><x:numFmt numFmtId="183" formatCode="dd MMMM yyyy" /></x:numFmts><x:fonts><x:font /><x:font><x:b /></x:font></x:fonts><x:fills><x:fill><x:patternFill patternType="none" /></x:fill><x:fill><x:patternFill patternType="gray125" /></x:fill></x:fills><x:borders><x:border /><x:border><x:bottom style="thin" /></x:border><x:border><x:right style="thin" /></x:border></x:borders><x:cellXfs><x:xf /><x:xf fontId="1" /><x:xf borderId="1" /><x:xf fontId="1" borderId="1" /><x:xf borderId="2" /><x:xf fontId="1" borderId="2" /><x:xf><x:alignment vertical="top" /></x:xf><x:xf fontId="1"><x:alignment vertical="top" /></x:xf><x:xf numFmtId="181" /><x:xf numFmtId="182" /><x:xf numFmtId="183" /><x:xf numFmtId="182" fontId="1" /><x:xf numFmtId="181" fontId="1" /><x:xf numFmtId="183" fontId="1" /></x:cellXfs></x:styleSheet>
    XML
  end

  let(:wonky_file) do
    TestXlsxBuilder.new(
      sheets: [sheet],
      styles: styles
    )
  end

  let(:subject) { SimpleXlsxReader::Document.new(wonky_file.archive.path) }

  describe '#to_hash' do
    it 'should extract values from namespaced cells missing "r" attributes' do
      _(subject.sheets.first.rows.to_a[0]).must_include('Salmon')
      _(subject.sheets.first.rows.to_a[1]).must_include('Dog')
    end
  end
end


================================================
FILE: test/performance_test.rb
================================================
# frozen_string_literal: true

require_relative 'test_helper'
require 'minitest/benchmark'

describe 'SimpleXlsxReader Benchmark' do
  # n is 0-indexed for us, then converted to 1-indexed for excel
  def sheet_with_n_rows(row_count)
    acc = +""
    acc <<
      <<~XML
        <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
          <sheetData>
      XML

    row_count.times.each do |n|
      n += 1
      acc <<
        <<~XML
          <row>
            <c r='A#{n}' s='0'>
              <v>Cell A#{n}</v>
            </c>
            <c r='B#{n}' s='1'>
              <v>2.4</v>
            </c>
            <c r='C#{n}' s='2'>
              <v>30687</v>
            </c>
            <c r='D#{n}' t='inlineStr' s='0'>
              <is><t>Cell D#{n}</t></is>
            </c>

            <c r='E#{n}' s='0'>
              <v>Cell E#{n}</v>
            </c>
            <c r='F#{n}' s='1'>
              <v>2.4</v>
            </c>
            <c r='G#{n}' s='2'>
              <v>30687</v>
            </c>
            <c r='H#{n}' t='inlineStr' s='0'>
              <is><t>Cell H#{n}</t></is>
            </c>

            <c r='I#{n}' s='0'>
              <v>Cell I#{n}</v>
            </c>
            <c r='J#{n}' s='1'>
              <v>2.4</v>
            </c>
            <c r='K#{n}' s='2'>
              <v>30687</v>
            </c>
            <c r='L#{n}' t='inlineStr' s='0'>
              <is><t>Cell L#{n}</t></is>
            </c>
          </row>
        XML
    end

    acc <<
      <<~XML
          </sheetData>
        </worksheet>
      XML
  end

  let(:styles) do
    # s='0' above refers to the value of numFmtId at cellXfs index 0,
    # which is in this case 'General' type
    _styles =
      <<-XML
        <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
          <cellXfs count="1">
            <xf numFmtId="0" />
            <xf numFmtId="2" />
            <xf numFmtId="14" />
          </cellXfs>
        </styleSheet>
      XML
  end

  before do
    @xlsxs = {}

    # Every new sheet has one more row
    self.class.bench_range.each do |num_rows|
      @xlsxs[num_rows] =
        TestXlsxBuilder.new(
          sheets: [sheet_with_n_rows(num_rows)],
          styles: styles
        ).archive
    end
  end

  def self.bench_range
    # Works out to a max just shy of 265k rows, which takes ~20s on my M1 Mac.
    # Second-largest is ~65k rows @ ~5s.
    max = ENV['BIG_PERF_TEST'] ? 265_000 : 66_000
    bench_exp(100, max, 4)
  end

  bench_performance_linear 'parses sheets in linear time', 0.999 do |n|
    SimpleXlsxReader.open(@xlsxs[n].path).sheets[0].rows.each(headers: true) {|_row| }
  end
end
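The row counts quoted in `bench_range` follow from how `bench_exp` builds its range. The sketch below assumes that it floors both bounds to integer powers of the base, which is our reading of minitest's implementation. `exp_range` is a hypothetical stand-in, not minitest's code:

```ruby
# Hypothetical stand-in for an exponential benchmark range: floor each
# bound to an integer exponent of `base`, then list the powers in between.
def exp_range(min, max, base = 10)
  lo = (Math.log10(min) / Math.log10(base)).to_i
  hi = (Math.log10(max) / Math.log10(base)).to_i
  (lo..hi).map { |exponent| base**exponent }
end

p exp_range(100, 66_000, 4) # prints [64, 256, 1024, 4096, 16384, 65536]
```

With base 4, the default maximum of 66,000 rounds down to 4^8 = 65,536 rows. The `BIG_PERF_TEST` maximum of 265,000 rounds down to 4^9 = 262,144, which matches the "~65k" and "just shy of 265k" figures in the comment. The minimum of 100 rounds down to 4^3 = 64 rows.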


================================================
FILE: test/shared_strings.xml
================================================
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="6" uniqueCount="5">
    <si>
        <t>Cell A1</t>
    </si>
    <si>
        <t>Cell B1</t>
    </si>
    <si>
        <t>My Cell</t>
    </si>
    <si>
        <r>
            <rPr>
                <sz val="11"/>
                <color rgb="FFFF0000"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t>Cell</t>
        </r>
        <r>
            <rPr>
                <sz val="11"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t xml:space="preserve"> </t>
        </r>
        <r>
            <rPr>
                <b/>
                <sz val="11"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t>A2</t>
        </r>
    </si>
    <si>
        <r>
            <rPr>
                <sz val="11"/>
                <color rgb="FF00B0F0"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t>Cell</t>
        </r>
        <r>
            <rPr>
                <sz val="11"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t xml:space="preserve"> </t>
        </r>
        <r>
            <rPr>
                <i/>
                <sz val="11"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t>B2</t>
        </r>
    </si>
    <si>
        <t>Cell Fmt</t>
    </si>
    <si>
      <t>’ When it sees a unicode character (such as the fancy apostrophe starting this sentence), it starts chunking the stream for at least the current node, and we have to keep consuming the characters until we hit the end of the text. We can't assume that the string first given by the SAX callback us is the whole shared string content. It only happens with both unicode *and* really long text.
      </t>
    </si>
</sst>


================================================
FILE: test/simple_xlsx_reader_test.rb
================================================
# frozen_string_literal: true

require_relative 'test_helper'
require 'time'

SXR = SimpleXlsxReader

describe SimpleXlsxReader do
  let(:sesame_street_blog_file) do
    File.join(File.dirname(__FILE__), 'sesame_street_blog.xlsx')
  end

  let(:document) { SimpleXlsxReader.open(sesame_street_blog_file) }

  ##
  # A high-level acceptance test testing misc features such as date/time parsing,
  # hyperlinks (both function and ref kinds), formula dates, empty rows, etc.

  let(:sesame_street_blog_file_path) { File.join(File.dirname(__FILE__), 'sesame_street_blog.xlsx') }
  let(:sesame_street_blog_io) { File.new(sesame_street_blog_file_path) }
  let(:sesame_street_blog_string) { IO.read(sesame_street_blog_file_path) }

  let(:expected_result) do
    {
      'Authors' =>
      [
        ['Name', 'Occupation'],
        ['Big Bird', 'Teacher']
      ],
      'Posts' =>
      [
        ['Author Name', 'Title', 'Body', 'Created At', 'Comment Count', 'URL'],
        ['Big Bird', 'The Number 1', 'The Greatest', Time.parse('2002-01-01 11:00:00 UTC'), 1, SXR::Hyperlink.new('http://www.example.com/hyperlink-function', 'This uses the HYPERLINK() function')],
        ['Big Bird', 'The Number 2', 'Second Best', Time.parse('2002-01-02 14:00:00 UTC'), 2, SXR::Hyperlink.new('http://www.example.com/hyperlink-gui', 'This uses the hyperlink GUI option')],
        ['Big Bird', 'Formula Dates', 'Tricky tricky', Time.parse('2002-01-03 14:00:00 UTC'), 0, nil],
        ['Empty Eagress', nil, 'The title, date, and comment have types, but no values', nil, nil, nil]
      ]
    }
  end

  describe SimpleXlsxReader do
    describe 'load from file path' do
      let(:subject) { SimpleXlsxReader.open(sesame_street_blog_file_path) }

      it 'reads an xlsx file into a hash of {[sheet name] => [data]}' do
        _(subject.to_hash).must_equal(expected_result)
      end
    end

    describe 'load from buffer' do
      let(:subject) { SimpleXlsxReader.parse(sesame_street_blog_io) }

      it 'reads an xlsx buffer into a hash of {[sheet name] => [data]}' do
        _(subject.to_hash).must_equal(expected_result)
      end
    end

    describe 'load from string' do
      let(:subject) { SimpleXlsxReader.parse(sesame_street_blog_string) }

      it 'reads an xlsx string into a hash of {[sheet name] => [data]}' do
        _(subject.to_hash).must_equal(expected_result)
      end
    end

    it 'outputs strings in UTF-8 encoding' do
      document = SimpleXlsxReader.parse(sesame_street_blog_io)
      _(document.sheets[0].rows.to_a.flatten.map(&:encoding).uniq)
        .must_equal [Encoding::UTF_8]
    end

    it 'can use all our enumerable niceties without slurping' do
      document = SimpleXlsxReader.parse(sesame_street_blog_io)

      headers = {
        name: 'Author Name',
        title: 'Title',
        body: 'Body',
        created_at: 'Created At',
        count: /Count/
      }

      rows = document.sheets[1].rows
      result =
        rows.each(headers: headers).with_index.with_object({}) do |(row, i), acc|
          acc[i] = row
        end

      _(result[0]).must_equal(
        name: 'Big Bird',
        title: 'The Number 1',
        body: 'The Greatest',
        created_at: Time.parse('2002-01-01 11:00:00 UTC'),
        count: 1,
        "URL" => 'This uses the HYPERLINK() function'
      )

      _(rows.slurped?).must_equal false
    end
  end

  ##
  # For more fine-grained unit tests, we sometimes build our own workbook via
  # Nokogiri. TestXlsxBuilder has some defaults, and this let-style lets us
  # concisely override them in nested describe blocks.

  let(:shared_strings) { nil }
  let(:styles) { nil }
  let(:sheet) { nil }
  let(:workbook) { nil }
  let(:rels) { nil }

  let(:xlsx) do
    TestXlsxBuilder.new(
      shared_strings: shared_strings,
      styles: styles,
      sheets: sheet && [sheet],
      workbook: workbook,
      rels: rels
    )
  end

  let(:reader) { SimpleXlsxReader.open(xlsx.archive.path) }

  describe 'when parsing escaped characters' do
    let(:escaped_content) do
      '&lt;a href="https://www.example.com"&gt;Link A&lt;/a&gt; &amp;bull; &lt;a href="https://www.example.com"&gt;Link B&lt;/a&gt;'
    end

    let(:unescaped_content) do
      '<a href="https://www.example.com">Link A</a> &bull; <a href="https://www.example.com">Link B</a>'
    end

    let(:sheet) do
      <<~XML
        <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
          <dimension ref="A1:B1" />
          <sheetData>
            <row r="1">
              <c r="A1" s="1" t="s">
                <v>0</v>
              </c>
              <c r='B1' s='0'>
                <v>#{escaped_content}</v>
              </c>
            </row>
          </sheetData>
        </worksheet>
      XML
    end

    let(:shared_strings) do
      <<~XML
        <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
          <si>
            <t>#{escaped_content}</t>
          </si>
        </sst>
      XML
    end

    it 'loads correctly using inline strings' do
      _(reader.sheets[0].rows.slurp[0][1]).must_equal(unescaped_content)
    end

    it 'loads correctly using shared strings' do
      _(reader.sheets[0].rows.slurp[0][0]).must_equal(unescaped_content)
    end
  end

  describe 'Sheet#rows#each(headers: true)' do
    let(:sheet) do
      <<~XML
        <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
          <dimension ref="A1:B3" />
          <sheetData>
            <row r="1">
              <c r="A1" s="0">
                <v>Header 1</v>
              </c>
              <c r="B1" s="0">
                <v>Header 2</v>
              </c>
            </row>
            <row r="2">
              <c r="A2" s="0">
                <v>Data 1-A</v>
              </c>
              <c r="B2" s="0">
                <v>Data 1-B</v>
              </c>
            </row>
            <row r="4">
              <c r="A4" s="0">
                <v>Data 2-A</v>
              </c>
              <c r="B4" s="0">
                <v>Data 2-B</v>
              </c>
            </row>
          </sheetData>
        </worksheet>
      XML
    end

    it 'yields rows as hashes' do
      acc = []

      reader.sheets[0].rows.each(headers: true) do |row|
        acc << row
      end

      _(acc).must_equal(
        [
          { 'Header 1' => 'Data 1-A', 'Header 2' => 'Data 1-B' },
          { 'Header 1' => nil, 'Header 2' => nil },
          { 'Header 1' => 'Data 2-A', 'Header 2' => 'Data 2-B' }
        ]
      )
    end
  end

  describe 'Sheet#rows#each(headers: ->(row) {...})' do
    let(:sheet) do
      <<~XML
        <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
          <dimension ref="A1:B7" />
          <sheetData>
            <row r="1">
              <c r="A1" s="0">
                <v>a chart or something</v>
              </c>
              <c r="B1" s="0">
                <v>Rabble rabble</v>
              </c>
            </row>
            <row r="2">
              <c r="A2" s="0">
                <v>Chatty junk</v>
              </c>
              <c r="B2" s="0">
                <v></v>
              </c>
            </row>
            <row r="4">
              <c r="A4" s="0">
                <v>Header 1</v>
              </c>
              <c r="B4" s="0">
                <v>Header 2</v>
              </c>
            </row>
            <row r="5">
              <c r="A5" s="0">
                <v>Data 1-A</v>
              </c>
              <c r="B5" s="0">
                <v>Data 1-B</v>
              </c>
            </row>
            <row r="7">
              <c r="A7" s="0">
                <v>Data 2-A</v>
              </c>
              <c r="B7" s="0">
                <v>Data 2-B</v>
              </c>
            </row>
          </sheetData>
        </worksheet>
      XML
    end

    it 'yields rows as hashes' do
      acc = []

      finder = ->(row) { row.find {|c| c&.match(/Header/)} }
      reader.sheets[0].rows.each(headers: finder) do |row|
        acc << row
      end

      _(acc).must_equal(
        [
          { 'Header 1' => 'Data 1-A', 'Header 2' => 'Data 1-B' },
          { 'Header 1' => nil, 'Header 2' => nil },
          { 'Header 1' => 'Data 2-A', 'Header 2' => 'Data 2-B' }
        ]
      )
    end
  end

  describe "Sheet#rows#each(headers: a_hash)" do
    let(:sheet) do
      Nokogiri::XML(
        <<~XML
          <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
            <dimension ref="A1:C7" />
            <sheetData>
              <row r="1">
                <c r="A1" s="0">
                  <v>a chart or something</v>
                </c>
                <c r="B1" s="0">
                  <v>Rabble rabble</v>
                </c>
                <c r="C1" s="0">
                  <v>Rabble rabble</v>
                </c>
              </row>
              <row r="2">
                <c r="A2" s="0">
                  <v>Chatty junk</v>
                </c>
                <c r="B2" s="0">
                  <v></v>
                </c>
                <c r="C2" s="0">
                  <v></v>
                </c>
              </row>
              <row r="4">
                <c r="A4" s="0">
                  <v>ID Number</v>
                </c>
                <c r="B4" s="0">
                  <v>ExacT</v>
                </c>
                <c r="C4" s="0">
                  <v>FOO Name</v>
                </c>

              </row>
              <row r="5">
                <c r="A5" s="0">
                  <v>ID 1-A</v>
                </c>
                <c r="B5" s="0">
                  <v>Exact 1-B</v>
                </c>
                <c r="C5" s="0">
                  <v>Name 1-C</v>
                </c>
              </row>
              <row r="7">
                <c r="A7" s="0">
                  <v>ID 2-A</v>
                </c>
                <c r="B7" s="0">
                  <v>Exact 2-B</v>
                </c>
                <c r="C7" s="0">
                  <v>Name 2-C</v>
                </c>
              </row>
            </sheetData>
          </worksheet>
        XML
      )
    end

    it 'transforms headers into symbols based on the header map' do
      header_map = {id: /ID/, name: /foo/i, exact: 'ExacT'}
      result = reader.sheets[0].rows.each(headers: header_map).to_a

      _(result).must_equal(
        [
          { id: 'ID 1-A', exact: 'Exact 1-B', name: 'Name 1-C' },
          { id: nil, exact: nil, name: nil },
          { id: 'ID 2-A', exact: 'Exact 2-B', name: 'Name 2-C' },
        ]
      )
    end

    it "if a match isn't found, uses the unmatched header name" do
      sheet.xpath("//*[text() = 'ExacT']")
        .first.children.first.content = 'not ExacT'

      header_map = {id: /ID/, name: /foo/i, exact: 'ExacT'}
      result = reader.sheets[0].rows.each(headers: header_map).to_a

      _(result).must_equal(
        [
          { id: 'ID 1-A', 'not ExacT' => 'Exact 1-B', name: 'Name 1-C' },
          { id: nil, 'not ExacT' => nil, name: nil },
          { id: 'ID 2-A', 'not ExacT' => 'Exact 2-B', name: 'Name 2-C' },
        ]
      )
    end
  end

  describe 'Sheet#rows[]' do
    it 'raises a RuntimeError if rows not slurped yet' do
      _(-> { reader.sheets[0].rows[1] }).must_raise(RuntimeError)
    end

    it 'works if the rows have been slurped' do
      _(reader.sheets[0].rows.tap(&:slurp)[0]).must_equal(
        ['Cell A', 'Cell B', 'Cell C']
      )
    end

    it 'works if the config allows auto slurping' do
      SimpleXlsxReader.configuration.auto_slurp = true

      _(reader.sheets[0].rows[0]).must_equal(
        ['Cell A', 'Cell B', 'Cell C']
      )
    ensure
      SimpleXlsxReader.configuration.auto_slurp = false
    end
  end

  describe 'Sheet#rows#slurp' do
    let(:rows) { reader.sheets[0].rows.tap(&:slurp) }

    it 'loads the sheet parser results into memory' do
      _(rows.slurped).must_equal(
        [['Cell A', 'Cell B', 'Cell C']]
      )
    end

    it '#each and #map use slurped results' do
      _(rows.map(&:reverse)).must_equal(
        [['Cell C', 'Cell B', 'Cell A']]
      )
    end
  end

  describe 'Sheet#rows#each' do
    let(:sheet) do
      <<~XML
        <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
          <dimension ref="A1:B3" />
          <sheetData>
            <row r="1">
              <c r="A1" s="0">
                <v>Header 1</v>
              </c>
              <c r="B1" s="0">
                <v>Header 2</v>
              </c>
            </row>
            <row r="2">
              <c r="A2" s="0">
                <v>Data 1-A</v>
              </c>
              <c r="B2" s="0">
                <v>Data 1-B</v>
              </c>
            </row>
            <row r="4">
              <c r="A4" s="0">
                <v>Data 2-A</v>
              </c>
              <c r="B4" s="0">
                <v>Data 2-B</v>
              </c>
            </row>
          </sheetData>
        </worksheet>
      XML
    end

    let(:rows) { reader.sheets[0].rows }

    it 'with no block, returns an enumerator when not slurped' do
      _(rows.each.class).must_equal Enumerator
    end

    it 'with no block, passes on header argument in enumerator' do
      _(rows.each(headers: true).inspect).must_match 'headers: true'
    end

    it 'with no block, returns an enumerator when slurped' do
      rows.slurp
      _(rows.each.class).must_equal Enumerator
    end
  end

  describe 'Sheet#rows#map' do
    let(:sheet) do
      <<~XML
        <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
          <dimension ref="A1:B3" />
          <sheetData>
            <row r="1">
              <c r="A1" s="0">
                <v>Header 1</v>
              </c>
              <c r="B1" s="0">
                <v>Header 2</v>
              </c>
            </row>
            <row r="2">
              <c r="A2" s="0">
                <v>Data 1-A</v>
              </c>
              <c r="B2" s="0">
                <v>Data 1-B</v>
              </c>
            </row>
            <row r="4">
              <c r="A4" s="0">
                <v>Data 2-A</v>
              </c>
              <c r="B4" s="0">
                <v>Data 2-B</v>
              </c>
            </row>
          </sheetData>
        </worksheet>
      XML
    end

    let(:rows) { reader.sheets[0].rows }

    it 'does not slurp' do
      _(rows.map(&:first)).must_equal(
        ["Header 1", "Data 1-A", nil, "Data 2-A"]
      )
      _(rows.slurped?).must_equal false
    end
  end

  describe 'Sheet#headers' do
    let(:doc_sheet) { reader.sheets[0] }

    it 'raises a RuntimeError if rows not slurped yet' do
      _(-> { doc_sheet.headers }).must_raise(RuntimeError)
    end

    it 'returns first row if slurped' do
      _(doc_sheet.tap(&:slurp).headers).must_equal(
        ['Cell A', 'Cell B', 'Cell C']
      )
    end

    it 'returns first row if auto_slurp' do
      SimpleXlsxReader.configuration.auto_slurp = true

      _(doc_sheet.headers).must_equal(
        ['Cell A', 'Cell B', 'Cell C']
      )
    ensure
      SimpleXlsxReader.configuration.auto_slurp = false
    end
  end

  describe SimpleXlsxReader::Loader do
    let(:described_class) { SimpleXlsxReader::Loader }

    describe '::cast' do
      it 'reads type s as a shared string' do
        _(described_class.cast('1', 's', nil, shared_strings: %w[a b c]))
          .must_equal 'b'
      end

      it 'reads type inlineStr as a string' do
        _(described_class.cast('the value', nil, 'inlineStr'))
          .must_equal 'the value'
      end

      it 'reads date styles' do
        _(described_class.cast('41505', nil, :date))
          .must_equal Date.parse('2013-08-19')
      end

      it 'reads time styles' do
        _(described_class.cast('41505.77083', nil, :time))
          .must_equal Time.parse('2013-08-19 18:30 UTC')
      end

      it 'reads date_time styles' do
        _(described_class.cast('41505.77083', nil, :date_time))
          .must_equal Time.parse('2013-08-19 18:30 UTC')
      end

      it 'reads number types styled as dates' do
        _(described_class.cast('41505', 'n', :date))
          .must_equal Date.parse('2013-08-19')
      end

      it 'reads number types styled as times' do
        _(described_class.cast('41505.77083', 'n', :time))
          .must_equal Time.parse('2013-08-19 18:30 UTC')
      end

      it 'reads less-than-one number types in scientific notation styled as times' do
        _(described_class.cast('6.25E-2', 'n', :time))
          .must_equal Time.parse('1899-12-30 01:30:00 UTC')
      end

      it 'reads number types styled as date_times' do
        _(described_class.cast('41505.77083', 'n', :date_time))
          .must_equal Time.parse('2013-08-19 18:30 UTC')
      end

      it 'raises when date-styled values are not numerical' do
        _(-> { described_class.cast('14 is not a valid date', nil, :date) })
          .must_raise(ArgumentError)
      end

      describe 'with the url option' do
        let(:url) { 'http://www.example.com/hyperlink' }
        it 'creates a hyperlink with a string type' do
          _(described_class.cast('A link', 'str', :string, url: url))
            .must_equal SXR::Hyperlink.new(url, 'A link')
        end

        it 'creates a hyperlink with a shared string type' do
          _(described_class.cast('2', 's', nil, shared_strings: %w[a b c], url: url))
            .must_equal SXR::Hyperlink.new(url, 'c')
        end

        it 'creates a hyperlink with a fixnum friendly_name' do
          _(described_class.cast('123', nil, :fixnum, url: url))
            .must_equal SXR::Hyperlink.new(url, '123')
        end
      end
    end

    describe 'shared_strings' do
      let(:xml) do
        File.open(File.join(File.dirname(__FILE__), 'shared_strings.xml'))
      end

      let(:ss) { SimpleXlsxReader::Loader::SharedStringsParser.parse(xml) }

      it 'parses strings formatted at the cell level' do
        _(ss[0..2]).must_equal ['Cell A1', 'Cell B1', 'My Cell']
      end

      it 'parses strings formatted at the character level' do
        _(ss[3..5]).must_equal ['Cell A2', 'Cell B2', 'Cell Fmt']
      end

      it 'parses looong strings containing unicode' do
        _(ss[6]).must_include 'It only happens with both unicode *and* really long text.'
      end
    end

    describe 'style_types' do
      let(:xml_file) do
        File.open(File.join(File.dirname(__FILE__), 'styles.xml'))
      end

      let(:parser) do
        SimpleXlsxReader::Loader::StyleTypesParser.new(xml_file).tap(&:parse)
      end

      it 'reads custom formatted styles (numFmtId >= 164)' do
        _(parser.style_types[1]).must_equal :date_time
        _(parser.custom_style_types[164]).must_equal :date_time
      end

      # something I've seen in the wild; don't think it's correct, but let's be flexible.
      it 'reads custom formatted styles given an id < 164, but not explicitly defined in the SpreadsheetML spec' do
        _(parser.style_types[2]).must_equal :date_time
        _(parser.custom_style_types[59]).must_equal :date_time
      end
    end

    describe '#last_cell_letter' do
      # Note: this is not a valid sheet, since the last cell is actually D1 but
      # the dimension specifies C1. This is just for testing.
      let(:sheet) do
        Nokogiri::XML(
          <<-XML
          <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
            <dimension ref="A1:C1" />
            <sheetData>
              <row>
                <c r='A1' s='0'>
                  <v>Cell A</v>
                </c>
                <c r='C1' s='0'>
                  <v>Cell C</v>
                </c>
                <c r='D1' s='0'>
                  <v>Cell D</v>
                </c>
              </row>
            </sheetData>
          </worksheet>
          XML
        ).remove_namespaces!
      end

      let(:loader) do
        SimpleXlsxReader::Loader.new(nil).tap do |l|
          l.shared_strings = []
          l.sheet_toc = { 'Sheet1': 0 }
          l.style_types = []
          l.base_date = SimpleXlsxReader::DATE_SYSTEM_1900
        end
      end

      let(:sheet_parser) do
        tempfile = Tempfile.new(['sheet', '.xml'])
        tempfile.write(sheet)
        tempfile.rewind

        SimpleXlsxReader::Loader::SheetParser.new(
          file_io: tempfile,
          loader: loader
        ).tap { |parser| parser.parse {} }
      end

      it 'uses /worksheet/dimension if available' do
        _(sheet_parser.last_cell_letter).must_equal 'C'
      end

      it 'uses the last header cell if /worksheet/dimension is missing' do
        sheet.at_xpath('/worksheet/dimension').remove
        _(sheet_parser.last_cell_letter).must_equal 'D'
      end

      it 'returns "A" if the dimension is just one cell' do
        sheet.xpath('/worksheet/sheetData/row').remove
        sheet.xpath('/worksheet/dimension').attr('ref', 'A1')
        _(sheet_parser.last_cell_letter).must_equal 'A'
      end

      it 'returns nil if the sheet is just one cell, but /worksheet/dimension is missing' do
        sheet.xpath('/worksheet/sheetData/row').remove
        sheet.xpath('/worksheet/dimension').remove
        _(sheet_parser.last_cell_letter).must_be_nil
      end
    end

    describe '#column_letter_to_number' do
      let(:subject) { SXR::Loader::SheetParser.new(file_io: nil, loader: nil) }

      [
        ['A',   1],
        ['B',   2],
        ['Z',   26],
        ['AA',  27],
        ['AB',  28],
        ['AZ',  52],
        ['BA',  53],
        ['BZ',  78],
        ['ZZ',  702],
        ['AAA', 703],
        ['AAZ', 728],
        ['ABA', 729],
        ['ABZ', 754],
        ['AZZ', 1378],
        ['ZZZ', 18_278]
      ].each do |(letter, number)|
        it "converts #{letter} to #{number}" do
          _(subject.column_letter_to_number(letter)).must_equal number
        end
      end
    end
  end

  describe 'parse errors' do
    after do
      SimpleXlsxReader.configuration.catch_cell_load_errors = false
    end

    let(:sheet) do
      Nokogiri::XML(
        <<-XML
          <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
            <dimension ref="A1:A1" />
            <sheetData>
              <row>
                <c r='A1' s='0'>
                  <v>14 is a date style; this is not a date</v>
                </c>
              </row>
            </sheetData>
          </worksheet>
        XML
      ).remove_namespaces!
    end

    let(:styles) do
      # s='0' above refers to the value of numFmtId at cellXfs index 0
      Nokogiri::XML(
        <<-XML
          <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
            <cellXfs count="1">
              <xf numFmtId="14" />
            </cellXfs>
          </styleSheet>
        XML
      ).remove_namespaces!
    end

    it 'raises if not configuration.catch_cell_load_errors' do
      SimpleXlsxReader.configuration.catch_cell_load_errors = false

      _(-> { SimpleXlsxReader.open(xlsx.archive.path).to_hash })
        .must_raise(SimpleXlsxReader::CellLoadError)
    end

    it 'records a load error if configuration.catch_cell_load_errors' do
      SimpleXlsxReader.configuration.catch_cell_load_errors = true

      sheet = SimpleXlsxReader.open(xlsx.archive.path).sheets[0].tap(&:slurp)
      _(sheet.load_errors).must_equal(
        [0, 0] => 'invalid value for Float(): "14 is a date style; this is not a date"'
      )
    end
  end

  describe 'missing numFmtId attributes' do
    let(:sheet) do
      Nokogiri::XML(
        <<-XML
          <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
            <dimension ref="A1:A1" />
            <sheetData>
              <row>
                <c r='A1' s='s'>
                  <v>some content</v>
                </c>
              </row>
            </sheetData>
          </worksheet>
        XML
      ).remove_namespaces!
    end

    let(:styles) do
      Nokogiri::XML(
        <<-XML
          <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">

          </styleSheet>
        XML
      ).remove_namespaces!
    end

    before do
      @row = SimpleXlsxReader.open(xlsx.archive.path).sheets[0].rows.to_a[0]
    end

    it 'continues even when cells are missing numFmtId attributes' do
      _(@row[0]).must_equal 'some content'
    end
  end

  describe 'parsing types' do
    let(:sheet) do
      Nokogiri::XML(
        <<-XML
          <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
            <dimension ref="A1:G1" />
            <sheetData>
              <row>
                <c r='A1' s='0'>
                  <v>Cell A1</v>
                </c>

                <c r='C1' s='1'>
                  <v>2.4</v>
                </c>
                <c r='D1' s='1' />

                <c r='E1' s='2'>
                  <v>30687</v>
                </c>
                <c r='F1' s='2' />

                <c r='G1' t='inlineStr' s='0'>
                  <is><t>Cell G1</t></is>
                </c>

                <c r='H1' s='0'>
                  <f>HYPERLINK("http://www.example.com/hyperlink-function", "HYPERLINK function")</f>
                  <v>HYPERLINK function</v>
                </c>

                <c r='I1' s='0'>
                  <v>GUI-made hyperlink</v>
                </c>

                <c r='J1' s='0'>
                  <v>1</v>
                </c>
              </row>
            </sheetData>

            <hyperlinks>
              <hyperlink ref="I1" id="rId1"/>
            </hyperlinks>
          </worksheet>
        XML
      ).remove_namespaces!
    end

    let(:styles) do
      # s='0' above refers to the value of numFmtId at cellXfs index 0,
      # which in this case is the 'General' type; s='1' maps to numFmtId 2
      # ('0.00') and s='2' to numFmtId 14 (a built-in date format).
      Nokogiri::XML(
        <<-XML
          <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
            <cellXfs count="3">
              <xf numFmtId="0" />
              <xf numFmtId="2" />
              <xf numFmtId="14" />
            </cellXfs>
          </styleSheet>
        XML
      ).remove_namespaces!
    end

    # Although not a "type" or "style" according to xlsx spec,
    # it sure could/should be, so let's test it with the rest of our
    # typecasting code.
    let(:rels) do
      [
        Nokogiri::XML(
          <<-XML
            <Relationships>
              <Relationship
                Id="rId1"
                Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
                Target="http://www.example.com/hyperlink-gui"
                TargetMode="External"
              />
            </Relationships>
          XML
        ).remove_namespaces!
      ]
    end

    before do
      @row = SimpleXlsxReader.open(xlsx.archive.path).sheets[0].rows.to_a[0]
    end

    it "reads 'General' cells as strings" do
      _(@row[0]).must_equal 'Cell A1'
    end

    it "reads empty 'General' cells as nil" do
      _(@row[1]).must_be_nil
    end

    # We could expand on these type tests, but a couple are enough to
    # demonstrate that everything is wired together. Type-specific tests
    # belong with the ::cast tests.

    it 'reads floats' do
      _(@row[2]).must_equal 2.4
    end

    it 'reads empty floats as nil' do
      _(@row[3]).must_be_nil
    end

    it 'reads dates' do
      _(@row[4]).must_equal Date.parse('Jan 6, 1984')
    end

    it 'reads empty date cells as nil' do
      _(@row[5]).must_be_nil
    end

    it 'reads strings formatted as inlineStr' do
      _(@row[6]).must_equal 'Cell G1'
    end

    it 'reads hyperlinks created via HYPERLINK()' do
      _(@row[7]).must_equal(
        SXR::Hyperlink.new(
          'http://www.example.com/hyperlink-function', 'HYPERLINK function'
        )
      )
    end

    it 'reads hyperlinks created via the GUI' do
      _(@row[8]).must_equal(
        SXR::Hyperlink.new(
          'http://www.example.com/hyperlink-gui', 'GUI-made hyperlink'
        )
      )
    end

    it "reads 'General' cells with numbers as numbers" do
      _(@row[9]).must_equal 1
    end
  end

  describe 'parsing documents with blank rows' do
    let(:sheet) do
      Nokogiri::XML(
        <<-XML
          <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
            <dimension ref="A1:D7" />
            <sheetData>
              <row r="2" spans="1:1">
                <c r="A2" s="0">
                  <v>a</v>
                </c>
              </row>
              <row r="4" spans="1:1">
                <c r="B4" s="0">
                  <v>1</v>
                </c>
              </row>
              <row r="5" spans="1:1">
                <c r="C5" s="0">
                  <v>2</v>
                </c>
              </row>
              <row r="7" spans="1:1">
                <c r="D7" s="0">
                  <v>3</v>
                </c>
              </row>
            </sheetData>
          </worksheet>
        XML
      ).remove_namespaces!
    end

    before do
      @rows = SimpleXlsxReader.open(xlsx.archive.path).sheets[0].rows.to_a
    end

    it 'reads row data despite gaps in row numbering' do
      _(@rows).must_equal [
        [nil, nil, nil, nil],
        ['a', nil, nil, nil],
        [nil, nil, nil, nil],
        [nil, 1, nil, nil],
        [nil, nil, 2, nil],
        [nil, nil, nil, nil],
        [nil, nil, nil, 3]
      ]
    end
  end

  describe 'parsing documents with non-hyperlinked rels' do
    let(:rels) do
      [
        Nokogiri::XML(
          <<~XML
          <?xml version="1.0" encoding="UTF-8"?>
          <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"></Relationships>
          XML
        ).remove_namespaces!
      ]
    end

    describe 'when document is opened as path' do
      before do
        @row = SimpleXlsxReader.open(xlsx.archive.path).sheets[0].rows.to_a[0]
      end

      it 'reads cell content' do
        _(@row[0]).must_equal 'Cell A'
      end
    end

    describe 'when document is parsed as a String' do
      before do
        output = File.binread(xlsx.archive.path)
        @row = SimpleXlsxReader.parse(output).sheets[0].rows.to_a[0]
      end

      it 'reads cell content' do
        _(@row[0]).must_equal 'Cell A'
      end
    end

    describe 'when document is parsed as StringIO' do
      before do
        stream = StringIO.new(File.binread(xlsx.archive.path), 'rb')
        @row = SimpleXlsxReader.parse(stream).sheets[0].rows.to_a[0]
        stream.close
      end

      it 'reads cell content' do
        _(@row[0]).must_equal 'Cell A'
      end
    end
  end

  # https://support.microsoft.com/en-us/office/available-number-formats-in-excel-0afe8f52-97db-41f1-b972-4b46e9f1e8d2
  describe 'numeric fields styled as "General"' do
    let(:misc_numbers_path) do
      File.join(File.dirname(__FILE__), 'misc_numbers.xlsx')
    end

    let(:sheet) { SimpleXlsxReader.open(misc_numbers_path).sheets[0] }

    it 'reads medium sized integers as integers' do
      _(sheet.rows.slurp[1][0]).must_equal 98070
    end

    it 'reads large (>12 char) integers as integers' do
      _(sheet.rows.slurp[1][1]).must_equal 1234567890123
    end
  end

  describe 'with mysteriously chunky UTF-8 text' do
    let(:chunky_utf8_path) do
      File.join(File.dirname(__FILE__), 'chunky_utf8.xlsx')
    end

    let(:sheet) { SimpleXlsxReader.open(chunky_utf8_path).sheets[0] }

    it 'reads the whole cell text' do
      _(sheet.rows.slurp[1]).must_equal(
        ["sample-company-1", "Korntal-Münchingen", "Bronholmer straße"]
      )
    end
  end

  describe 'when using percentages & currencies' do
    let(:pnc_path) do
      # This file provided by a GitHub user having parse errors in these fields
      File.join(File.dirname(__FILE__), 'percentages_n_currencies.xlsx')
    end

    let(:sheet) { SimpleXlsxReader.open(pnc_path).sheets[0] }

    it 'reads percentages as floats of the form 0.XX' do
      _(sheet.rows.slurp[1][2]).must_equal(0.87)
    end

    it 'reads currencies as floats' do
      _(sheet.rows.slurp[1][4]).must_equal(300.0)
    end
  end
end
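The `Sheet#rows#each(headers: a_hash)` tests above expect header cells matched by a Regexp, or equal to a String, to be renamed to the corresponding key, while unmatched headers keep their original text. A minimal sketch of that mapping, assuming "first matching key wins"; `map_headers` and `row_to_hash` are hypothetical helpers, not SimpleXlsxReader's API:

```ruby
# Hypothetical helpers; a sketch, not SimpleXlsxReader's implementation.

# Maps each header cell to the first key in `header_map` whose matcher fits:
# a Regexp must match somewhere in the cell, a String must equal it exactly.
# Cells that match nothing keep their original text.
def map_headers(header_row, header_map)
  header_row.map do |cell|
    pair = header_map.find do |_key, matcher|
      matcher.is_a?(Regexp) ? matcher.match?(cell.to_s) : matcher == cell
    end
    pair ? pair.first : cell
  end
end

# Pairs the mapped keys with one data row; missing cells become nil.
def row_to_hash(keys, row)
  keys.zip(row).to_h
end

header_map = { id: /ID/, name: /foo/i, exact: 'ExacT' }
keys = map_headers(['ID Number', 'ExacT', 'FOO Name'], header_map)
# keys == [:id, :exact, :name]
hash = row_to_hash(keys, ['ID 1-A', 'Exact 1-B', 'Name 1-C'])
# hash == { id: 'ID 1-A', exact: 'Exact 1-B', name: 'Name 1-C' }
```

Zipping the keys against each row means a blank row yields a hash of nils, which is the shape the tests expect for skipped row numbers.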

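The `::cast` tests above treat date-styled numbers as Excel serial dates: the integer part counts days and the fractional part is the time of day, so 41505 is 2013-08-19 and 41505.77083 is 18:30 on that day. A minimal sketch of that arithmetic, assuming an epoch of 1899-12-30 for the 1900 date system (correct for serials from 61, i.e. 1900-03-01, onward, since it absorbs Excel's nonexistent 1900-02-29) and rounding to whole seconds; `serial_to_date`, `serial_to_time`, and the epoch constants are hypothetical names:

```ruby
require 'date'

# Hypothetical helpers; a sketch of Excel serial-date arithmetic, not
# SimpleXlsxReader's code.
EPOCH_1900 = Date.new(1899, 12, 30) # correct for serials >= 61 (1900-03-01)
EPOCH_1904 = Date.new(1904, 1, 1)   # serial 0 in the 1904 date system
SECONDS_PER_DAY = 86_400

# Integer part of the serial = whole days since the epoch.
# Float() raises ArgumentError for non-numeric strings.
def serial_to_date(serial, epoch: EPOCH_1900)
  epoch + Float(serial).floor
end

# Fractional part = fraction of a day, rounded to whole seconds so that
# values stored with limited precision (0.77083) land on 18:30:00.
def serial_to_time(serial, epoch: EPOCH_1900)
  days = Float(serial)
  date = epoch + days.floor
  seconds = ((days - days.floor) * SECONDS_PER_DAY).round
  Time.utc(date.year, date.month, date.day) + seconds
end

date = serial_to_date('41505')       # 2013-08-19
time = serial_to_time('41505.77083') # 2013-08-19 18:30:00 UTC
small = serial_to_time('6.25E-2')    # 1899-12-30 01:30:00 UTC
```

Rounding to seconds is a design choice of this sketch; a reader that needs sub-second precision would keep the fractional seconds instead.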

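The `#column_letter_to_number` cases above follow bijective base-26 numbering (A=1 through Z=26, then AA=27), which has no zero digit, and the blank-rows test expects gaps in `r` numbering to come back as all-nil rows padded to the sheet width. A minimal sketch of both ideas; `column_to_number`, `number_to_column`, `parse_ref`, and `densify` are hypothetical names, not SimpleXlsxReader's SheetParser methods:

```ruby
# Hypothetical helpers; a sketch, not SimpleXlsxReader's SheetParser.

# Bijective base-26: each letter contributes 1..26, so "AA" = 1 * 26 + 1 = 27.
def column_to_number(letters)
  letters.upcase.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - 'A'.ord + 1) }
end

# Inverse: subtract 1 before each divmod because there is no zero digit.
def number_to_column(number)
  letters = +''
  while number.positive?
    number, rem = (number - 1).divmod(26)
    letters.prepend((rem + 'A'.ord).chr)
  end
  letters
end

# Splits a cell reference such as "B4" into [row, column], both 1-based.
def parse_ref(ref)
  letters, digits = ref.match(/\A([A-Za-z]+)(\d+)\z/).captures
  [Integer(digits, 10), column_to_number(letters)]
end

# Builds a dense grid from sparse { "B4" => value } cells: rows missing from
# the XML come back as all-nil arrays, and every row is padded to `width`.
def densify(cells, width:)
  coords = cells.map { |ref, value| [*parse_ref(ref), value] }
  last_row = coords.map(&:first).max || 0
  grid = Array.new(last_row) { Array.new(width) }
  coords.each { |row, col, value| grid[row - 1][col - 1] = value }
  grid
end

grid = densify({ 'A2' => 'a', 'B4' => 1, 'C5' => 2, 'D7' => 3 }, width: 4)
# grid has 7 rows; rows 1, 3, and 6 are [nil, nil, nil, nil]
```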
================================================
FILE: test/styles.xml
================================================
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" mc:Ignorable="x14ac">
  <numFmts count="2">
    <numFmt numFmtId="59" formatCode="dd/mm/yyyy"/>
    <numFmt numFmtId="164" formatCode="[$-409]m/d/yy\ h:mm\ AM/PM;@"/>
  </numFmts>
  <fonts count="3" x14ac:knownFonts="1">
    <font>
      <sz val="12"/>
      <color theme="1"/>
      <name val="Calibri"/>
      <family val="2"/>
      <scheme val="minor"/>
    </font>
    <font>
      <u/>
      <sz val="12"/>
      <color theme="10"/>
      <name val="Calibri"/>
      <family val="2"/>
      <scheme val="minor"/>
    </font>
    <font>
      <u/>
      <sz val="12"/>
      <color theme="11"/>
      <name val="Calibri"/>
      <family val="2"/>
      <scheme val="minor"/>
    </font>
  </fonts>
  <fills count="2">
    <fill>
      <patternFill patternType="none"/>
    </fill>
    <fill>
      <patternFill patternType="gray125"/>
    </fill>
  </fills>
  <borders count="1">
    <border>
      <left/>
      <right/>
      <top/>
      <bottom/>
      <diagonal/>
    </border>
  </borders>
  <cellStyleXfs count="3">
    <xf numFmtId="0" fontId="0" fillId="0" borderId="0"/>
    <xf numFmtId="0" fontId="1" fillId="0" borderId="0" applyNumberFormat="0" applyFill="0" applyBorder="0" applyAlignment="0" applyProtection="0"/>
    <xf numFmtId="0" fontId="2" fillId="0" borderId="0" applyNumberFormat="0" applyFill="0" applyBorder="0" applyAlignment="0" applyProtection="0"/>
  </cellStyleXfs>
  <cellXfs count="4">
    <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>
    <xf numFmtId="164" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
    <xf numFmtId="59" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
    <xf numFmtId="1" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
  </cellXfs>
  <cellStyles count="3">
    <cellStyle name="Followed Hyperlink" xfId="2" builtinId="9" hidden="1"/>
    <cellStyle name="Hyperlink" xfId="1" builtinId="8" hidden="1"/>
    <cellStyle name="Normal" xfId="0" builtinId="0"/>
  </cellStyles>
  <dxfs count="0"/>
  <tableStyles count="0" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleMedium4"/>
</styleSheet>
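In this fixture, a cell's `s` attribute indexes `<cellXfs>`, and each `xf` carries a `numFmtId` that resolves either to a format declared in `<numFmts>` (59 and 164 here) or to a built-in one. The sketch below walks that chain to cast a raw `<v>` string. It is a hypothetical illustration, not the gem's `StyleTypesParser` or `Loader.cast`. The helper names, the tiny built-in table, and the date/time token heuristic are all assumptions. It also assumes the usual day zero of 1899-12-30 for the 1900 date system and 1904-01-01 for 1904 workbooks.

```ruby
require 'date'

# Hypothetical sketch, not the gem's StyleTypesParser or Loader.cast.
# The values below are copied from test/styles.xml.

# Custom formats declared in <numFmts>, keyed by numFmtId.
CUSTOM_NUM_FMTS = {
  59 => 'dd/mm/yyyy',
  164 => '[$-409]m/d/yy\ h:mm\ AM/PM;@'
}.freeze

# numFmtId of each <cellXfs><xf>, in document order; a cell's s="N" indexes this.
CELL_XF_NUM_FMT_IDS = [0, 164, 59, 1].freeze

# Assumed, deliberately tiny subset of built-in formats: 0 is "General", 1 is "0".
BUILTIN_TYPES = { 0 => :general, 1 => :integer }.freeze

# Heuristic: drop [...] sections (locale/color codes such as [$-409]), quoted
# literals and backslash-escaped characters, then look for date (d, y) and
# time (h, s) tokens. "m" is ignored because it can mean month or minute.
def classify_format(code)
  tokens = code.gsub(/\[[^\]]*\]/, '').gsub(/"[^"]*"/, '').gsub(/\\./, '').downcase
  date = tokens.match?(/[dy]/)
  time = tokens.match?(/[hs]/)
  if date && time then :date_time
  elsif date then :date
  elsif time then :time
  else :general
  end
end

# Resolves a cell's s="N" index to a coarse type symbol.
def style_type(style_index)
  fmt_id = CELL_XF_NUM_FMT_IDS.fetch(style_index)
  return classify_format(CUSTOM_NUM_FMTS[fmt_id]) if CUSTOM_NUM_FMTS.key?(fmt_id)

  BUILTIN_TYPES.fetch(fmt_id, :general)
end

# Excel stores dates as day counts. Counting from 1899-12-30 (1900 system)
# compensates for Excel's fictitious 1900-02-29, so it is only correct for
# serials >= 61; 1904 workbooks count from 1904-01-01.
def serial_to_time(serial, date1904: false)
  epoch = date1904 ? Time.utc(1904, 1, 1) : Time.utc(1899, 12, 30)
  epoch + (serial * 86_400).round # whole seconds, in UTC
end

# Casts a raw <v> string using the cell's style index.
def cast(raw, style_index, date1904: false)
  case style_type(style_index)
  when :date then serial_to_time(Float(raw), date1904: date1904).to_date
  when :date_time, :time then serial_to_time(Float(raw), date1904: date1904)
  when :integer then Integer(Float(raw))
  else raw
  end
end
```

For example, `cast('45000', 2)` resolves style 2 to `numFmtId` 59 (`dd/mm/yyyy`) and returns `Date.new(2023, 3, 15)`.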


================================================
FILE: test/test_helper.rb
================================================
# frozen_string_literal: true

gem 'minitest'
require 'minitest/autorun'
require 'minitest/spec'
require 'pry'
require 'time'
require_relative 'test_xlsx_builder'

$LOAD_PATH.unshift File.expand_path('../lib', __dir__)
require 'simple_xlsx_reader'


================================================
FILE: test/test_xlsx_builder.rb
================================================
# frozen_string_literal: true

%w[nokogiri tempfile zip].each { |lib| require lib }

TestXlsxBuilder = Struct.new(:shared_strings, :styles, :sheets, :workbook, :rels, keyword_init: true) do

  DEFAULTS = {
    workbook:
      Nokogiri::XML(
        <<-XML
          <workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
            <sheets>
              <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
            </sheets>
          </workbook>
        XML
      ).remove_namespaces!,

    styles:
      Nokogiri::XML(
        <<-XML
          <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
            <cellXfs count="1">
              <xf numFmtId="0" />
            </cellXfs>
          </styleSheet>
        XML
      ).remove_namespaces!,

    sheet:
      Nokogiri::XML(
        <<-XML
        <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
          <dimension ref="A1:C1" />
          <sheetData>
            <row>
              <c r='A1' s='0'>
                <v>Cell A</v>
              </c>
              <c r='B1' s='0'>
                <v>Cell B</v>
              </c>
              <c r='C1' s='0'>
                <v>Cell C</v>
              </c>
            </row>
          </sheetData>
        </worksheet>
        XML
      ).remove_namespaces!
  }

  def initialize(*args)
    super

    self.workbook ||= DEFAULTS[:workbook]
    self.styles ||= DEFAULTS[:styles]
    self.sheets ||= [DEFAULTS[:sheet]]
    self.rels ||= []
  end

  def archive
    tmpfile = Tempfile.new(['workbook', '.xlsx'])
    tmpfile.binmode
    tmpfile.rewind

    Zip::File.open(tmpfile.path, create: true) do |zip|
      zip.mkdir('xl')

      zip.get_output_stream('xl/workbook.xml') do |wb_file|
        wb_file.write(workbook)
      end

      zip.get_output_stream('xl/styles.xml') do |styles_file|
        styles_file.write(styles)
      end

      if shared_strings
        zip.get_output_stream('xl/sharedStrings.xml') do |ss_file|
          ss_file.write(shared_strings)
        end
      end

      zip.mkdir('xl/worksheets')

      sheets.each_with_index do |sheet, i|
        zip.get_output_stream("xl/worksheets/sheet#{i + 1}.xml") do |sf|
          sf.write(sheet)
        end

        if rels[i]
          zip.mkdir('xl/worksheets/_rels') unless zip.find_entry('xl/worksheets/_rels')
          zip.get_output_stream("xl/worksheets/_rels/sheet#{i + 1}.xml.rels") do |rf|
            rf.write(rels[i])
          end
        end
      end
    end

    tmpfile
  end
end
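`TestXlsxBuilder#archive` writes whatever it receives for `sheets:` straight into `xl/worksheets/sheetN.xml`, so a test can supply its own worksheet XML in place of the single default sheet. A hypothetical helper such as `sheet_xml` below (not part of the repository) could generate that XML from an array of rows. Like `DEFAULTS[:sheet]`, it writes every value as a bare `<v>` with `s='0'`; this mirrors the default fixture, not how Excel usually stores strings (shared strings via `t="s"`, or `t="inlineStr"`).

```ruby
# Hypothetical helper (not in the repository): builds worksheet XML shaped like
# DEFAULTS[:sheet] in TestXlsxBuilder from an array of rows.
def sheet_xml(rows)
  body = rows.each_with_index.map do |row, r|
    col = 'A'
    cells = row.map do |value|
      # Same shape as the default sheet: plain <v> text, style index 0.
      # String#encode(xml: :text) escapes &, < and > for element content.
      cell = "<c r='#{col}#{r + 1}' s='0'><v>#{value.to_s.encode(xml: :text)}</v></c>"
      col = col.succ # 'Z'.succ == 'AA', matching spreadsheet column order
      cell
    end
    "<row>#{cells.join}</row>"
  end

  # <dimension> spans from A1 to the widest row's last column and the last row.
  width = [rows.map(&:size).max.to_i, 1].max
  last_col = 'A'
  (width - 1).times { last_col = last_col.succ }

  <<~XML
    <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
      <dimension ref="A1:#{last_col}#{[rows.size, 1].max}" />
      <sheetData>#{body.join}</sheetData>
    </worksheet>
  XML
end
```

With the builder loaded, something like `TestXlsxBuilder.new(sheets: [sheet_xml([%w[a b], %w[c d]])]).archive` would return a `Tempfile` holding a one-sheet workbook.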

SYMBOL INDEX (84 symbols across 11 files)

FILE: lib/simple_xlsx_reader.rb
  type SimpleXlsxReader (line 30) | module SimpleXlsxReader
    class CellLoadError (line 34) | class CellLoadError < StandardError; end
    function configuration (line 37) | def configuration
    function open (line 44) | def open(file_path)
    function parse (line 48) | def parse(string_or_io)

FILE: lib/simple_xlsx_reader/document.rb
  type SimpleXlsxReader (line 5) | module SimpleXlsxReader
    class Document (line 10) | class Document
      method initialize (line 13) | def initialize(legacy_file_path = nil, file_path: nil, string_or_io:...
      method sheets (line 19) | def sheets
      method to_hash (line 25) | def to_hash
      class Sheet (line 30) | class Sheet
        method initialize (line 37) | def initialize(name:, sheet_parser:)
        method headers (line 43) | def headers
        method data (line 49) | def data
      class RowsProxy (line 55) | class RowsProxy
        method initialize (line 60) | def initialize(sheet_parser:)
        method each (line 88) | def each(headers: false, &block)
        method slurp (line 110) | def slurp
        method slurped? (line 116) | def slurped?
        method slurped! (line 120) | def slurped!
        method [] (line 126) | def [](*args)
        method shift (line 132) | def shift(*args)
        method check_slurped (line 140) | def check_slurped

FILE: lib/simple_xlsx_reader/hyperlink.rb
  type SimpleXlsxReader (line 3) | module SimpleXlsxReader
    class Hyperlink (line 19) | class Hyperlink < String
      method initialize (line 23) | def initialize(url, friendly_name = nil)

FILE: lib/simple_xlsx_reader/loader.rb
  type SimpleXlsxReader (line 3) | module SimpleXlsxReader
    class Loader (line 4) | class Loader < Struct.new(:string_or_io)
      method init_sheets (line 7) | def init_sheets
      method initialize (line 25) | def initialize(*args)
      method read (line 30) | def read
      method entry_at (line 67) | def entry_at(path, &block)
      method add_sheet_parser_at_index (line 84) | def add_sheet_parser_at_index(i)
      method cast (line 120) | def self.cast(value, type, style, options = {})

FILE: lib/simple_xlsx_reader/loader/shared_strings_parser.rb
  type SimpleXlsxReader (line 3) | module SimpleXlsxReader
    class Loader (line 4) | class Loader
      class SharedStringsParser (line 10) | class SharedStringsParser < Nokogiri::XML::SAX::Document
        method parse (line 11) | def self.parse(file)
        method initialize (line 17) | def initialize
        method start_element (line 25) | def start_element(name, _attrs = [])
        method characters (line 32) | def characters(string)
        method end_element (line 38) | def end_element(name)

FILE: lib/simple_xlsx_reader/loader/sheet_parser.rb
  type SimpleXlsxReader (line 5) | module SimpleXlsxReader
    class Loader (line 6) | class Loader
      class SheetParser (line 7) | class SheetParser < Nokogiri::XML::SAX::Document
        method initialize (line 17) | def initialize(file_io:, loader:)
        method parse (line 22) | def parse(headers: false, &block)
        method start_element_namespace (line 60) | def start_element_namespace(name, attrs = [], _prefix, _uri, _ns)
        method characters (line 79) | def characters(string)
        method end_element_namespace (line 124) | def end_element_namespace(name, _prefix, _uri)
        method test_headers_hash_against_current_row (line 166) | def test_headers_hash_against_current_row
        method possibly_yield_empty_rows (line 181) | def possibly_yield_empty_rows(headers:)
        method yield_row (line 188) | def yield_row(row, headers:)
        method load_gui_hyperlinks (line 197) | def load_gui_hyperlinks
        class HyperlinksParser (line 202) | class HyperlinksParser < Nokogiri::XML::SAX::Document
          method initialize (line 203) | def initialize(file_io, xrels:)
          method parse (line 208) | def self.parse(file_io, xrels:)
          method parse (line 212) | def parse
          method start_element_namespace (line 218) | def start_element_namespace(name, attrs, _prefix, _uri, _ns)
        method xrels (line 230) | def xrels
        method column_length (line 234) | def column_length
        method cell_idx (line 240) | def cell_idx
        method last_cell_letter (line 246) | def last_cell_letter
        method column_letter_to_number (line 265) | def column_letter_to_number(column_letter)
        method column_number_to_letter (line 275) | def column_number_to_letter(n)

FILE: lib/simple_xlsx_reader/loader/style_types_parser.rb
  type SimpleXlsxReader (line 3) | module SimpleXlsxReader
    class Loader (line 4) | class Loader
      method parse (line 6) | def self.parse(file_io)
      method parse (line 47) | def parse
      method style_types (line 65) | def style_types
      method style_type_by_num_fmt_id (line 81) | def style_type_by_num_fmt_id(id)
      method custom_style_types (line 90) | def custom_style_types
      method determine_custom_style_type (line 105) | def determine_custom_style_type(string)

FILE: lib/simple_xlsx_reader/loader/workbook_parser.rb
  type SimpleXlsxReader (line 3) | module SimpleXlsxReader
    class Loader (line 4) | class Loader
      method parse (line 6) | def self.parse(file_io)
      method parse (line 11) | def parse
      method sheet_toc (line 16) | def sheet_toc
      method base_date (line 28) | def base_date

FILE: lib/simple_xlsx_reader/version.rb
  type SimpleXlsxReader (line 3) | module SimpleXlsxReader

FILE: test/performance_test.rb
  function sheet_with_n_rows (line 8) | def sheet_with_n_rows(row_count)
  function bench_range (line 98) | def self.bench_range

FILE: test/test_xlsx_builder.rb
  function initialize (line 53) | def initialize(*args)
  function archive (line 62) | def archive
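The index lists `column_letter_to_number` and `column_number_to_letter` in `sheet_parser.rb`. Spreadsheet column letters form a bijective base-26 system (A = 1, Z = 26, AA = 27, with no zero digit), which is why a plain base conversion gives the wrong answer. The following is an independent sketch of that conversion, plus a hypothetical `split_cell_ref` for references like `AB12`. The gem's own helpers may use different indexing or signatures.

```ruby
# Independent sketch of bijective base-26 column conversion (1-based),
# not copied from sheet_parser.rb.
def column_letter_to_number(letters)
  letters.upcase.each_char.reduce(0) do |acc, ch|
    acc * 26 + (ch.ord - 'A'.ord + 1) # 'A' contributes 1, 'Z' contributes 26
  end
end

def column_number_to_letter(n)
  raise ArgumentError, 'column numbers start at 1' if n < 1

  letters = +''
  while n.positive?
    # Subtract 1 first so that 26 maps to 'Z' instead of carrying into 'A0'.
    n, rem = (n - 1).divmod(26)
    letters.prepend((rem + 'A'.ord).chr)
  end
  letters
end

# Hypothetical helper: splits a cell reference such as "AB12" into
# [column_number, row_number].
def split_cell_ref(ref)
  m = /\A([A-Za-z]+)(\d+)\z/.match(ref) or raise ArgumentError, "bad cell ref: #{ref}"
  [column_letter_to_number(m[1]), Integer(m[2], 10)] # base 10, so "012" is not octal
end
```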
