Full Code of ankane/informers for AI

master 14e74907c83d cached
30 files
199.8 KB
51.9k tokens
497 symbols
1 requests
Download .txt
Showing preview only (210K chars total). Download the full file or copy to clipboard to get everything.
Repository: ankane/informers
Branch: master
Commit: 14e74907c83d
Files: 30
Total size: 199.8 KB

Directory structure:
gitextract_hoze518r/

├── .github/
│   └── workflows/
│       └── build.yml
├── .gitignore
├── CHANGELOG.md
├── Gemfile
├── LICENSE.txt
├── README.md
├── Rakefile
├── informers.gemspec
├── lib/
│   ├── informers/
│   │   ├── backends/
│   │   │   └── onnx.rb
│   │   ├── configs.rb
│   │   ├── env.rb
│   │   ├── model.rb
│   │   ├── models.rb
│   │   ├── pipelines.rb
│   │   ├── processors.rb
│   │   ├── tokenizers.rb
│   │   ├── utils/
│   │   │   ├── audio.rb
│   │   │   ├── core.rb
│   │   │   ├── dtypes.rb
│   │   │   ├── ffmpeg.rb
│   │   │   ├── generation.rb
│   │   │   ├── hub.rb
│   │   │   ├── image.rb
│   │   │   ├── math.rb
│   │   │   └── tensor.rb
│   │   └── version.rb
│   └── informers.rb
└── test/
    ├── model_test.rb
    ├── pipeline_test.rb
    └── test_helper.rb

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/build.yml
================================================
name: build
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: ruby/setup-ruby@v1
        with:
          ruby-version: "4.0"
          bundler-cache: true
      - uses: actions/cache@v5
        with:
          path: ~/.cache/informers
          key: informers
      - run: sudo apt-get update && sudo apt-get install libvips
      - run: bundle exec rake download:files
      - run: bundle exec rake test


================================================
FILE: .gitignore
================================================
/.bundle/
/.yardoc
/_yardoc/
/coverage/
/doc/
/pkg/
/spec/reports/
/test/support/
/tmp/
*.lock


================================================
FILE: CHANGELOG.md
================================================
## 1.3.0 (unreleased)

- Dropped support for Ruby < 3.3

## 1.2.1 (2025-02-01)

- Fixed error when terminal width is zero

## 1.2.0 (2024-11-14)

- Added support for models with external data
- Added `device` option
- Added `dtype` option
- Added `session_options` option

## 1.1.1 (2024-10-14)

- Added `audio-classification` pipeline
- Fixed error with `sentence-transformers/all-MiniLM-L6-v2`

## 1.1.0 (2024-09-17)

- Added more pipelines

## 1.0.3 (2024-08-29)

- Added `model_output` option
- Improved `model_file_name` option

## 1.0.2 (2024-08-28)

- Added `embedding` pipeline
- Added experimental `reranking` pipeline
- Added support for `nomic-ai/nomic-embed-text-v1`

## 1.0.1 (2024-08-27)

- Added support for `Supabase/gte-small` to `Model`
- Fixed error with downloads

## 1.0.0 (2024-08-26)

- Replaced task classes with `pipeline` method
- Added `Model` class
- Dropped support for Ruby < 3.1

## 0.2.0 (2022-09-06)

- Added support for `optimum` and `transformers.onnx` models
- Dropped support for Ruby < 2.7

## 0.1.3 (2021-09-25)

- Added text generation
- Added fill mask

## 0.1.2 (2020-11-24)

- Added feature extraction

## 0.1.1 (2020-10-05)

- Fixed question answering for Ruby < 2.7

## 0.1.0 (2020-10-01)

- First release


================================================
FILE: Gemfile
================================================
source "https://rubygems.org"

gemspec

gem "rake"
gem "minitest"
gem "ruby-vips"


================================================
FILE: LICENSE.txt
================================================

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: README.md
================================================
# Informers

:fire: Fast [transformer](https://github.com/huggingface/transformers.js) inference for Ruby

For non-ONNX models, check out [Transformers.rb](https://github.com/ankane/transformers-ruby) :slightly_smiling_face:

[![Build Status](https://github.com/ankane/informers/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/informers/actions)

## Installation

Add this line to your application’s Gemfile:

```ruby
gem "informers"
```

## Getting Started

- [Models](#models)
- [Pipelines](#pipelines)

## Models

Embedding

- [sentence-transformers/all-MiniLM-L6-v2](#sentence-transformersall-MiniLM-L6-v2)
- [sentence-transformers/multi-qa-MiniLM-L6-cos-v1](#sentence-transformersmulti-qa-MiniLM-L6-cos-v1)
- [sentence-transformers/all-mpnet-base-v2](#sentence-transformersall-mpnet-base-v2)
- [sentence-transformers/paraphrase-MiniLM-L6-v2](#sentence-transformersparaphrase-minilm-l6-v2)
- [mixedbread-ai/mxbai-embed-large-v1](#mixedbread-aimxbai-embed-large-v1)
- [Supabase/gte-small](#supabasegte-small)
- [intfloat/e5-base-v2](#intfloate5-base-v2)
- [nomic-ai/nomic-embed-text-v1](#nomic-ainomic-embed-text-v1)
- [BAAI/bge-base-en-v1.5](#baaibge-base-en-v15)
- [jinaai/jina-embeddings-v2-base-en](#jinaaijina-embeddings-v2-base-en)
- [Snowflake/snowflake-arctic-embed-m-v1.5](#snowflakesnowflake-arctic-embed-m-v15)

Reranking

- [mixedbread-ai/mxbai-rerank-base-v1](#mixedbread-aimxbai-rerank-base-v1)
- [jinaai/jina-reranker-v1-turbo-en](#jinaaijina-reranker-v1-turbo-en)
- [BAAI/bge-reranker-base](#baaibge-reranker-base)
- [Xenova/ms-marco-MiniLM-L-6-v2](#xenovams-marco-minilm-l-6-v2)

### sentence-transformers/all-MiniLM-L6-v2

[Docs](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)

```ruby
sentences = ["This is an example sentence", "Each sentence is converted"]

model = Informers.pipeline("embedding", "sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.(sentences)
```

### sentence-transformers/multi-qa-MiniLM-L6-cos-v1

[Docs](https://huggingface.co/Xenova/multi-qa-MiniLM-L6-cos-v1)

```ruby
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]

model = Informers.pipeline("embedding", "sentence-transformers/multi-qa-MiniLM-L6-cos-v1")
query_embedding = model.(query)
doc_embeddings = model.(docs)
scores = doc_embeddings.map { |e| e.zip(query_embedding).sum { |d, q| d * q } }
doc_score_pairs = docs.zip(scores).sort_by { |d, s| -s }
```

### sentence-transformers/all-mpnet-base-v2

[Docs](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)

```ruby
sentences = ["This is an example sentence", "Each sentence is converted"]

model = Informers.pipeline("embedding", "sentence-transformers/all-mpnet-base-v2")
embeddings = model.(sentences)
```

### sentence-transformers/paraphrase-MiniLM-L6-v2

[Docs](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2)

```ruby
sentences = ["This is an example sentence", "Each sentence is converted"]

model = Informers.pipeline("embedding", "sentence-transformers/paraphrase-MiniLM-L6-v2")
embeddings = model.(sentences, normalize: false)
```

### mixedbread-ai/mxbai-embed-large-v1

[Docs](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1)

```ruby
query_prefix = "Represent this sentence for searching relevant passages: "

input = [
  "The dog is barking",
  "The cat is purring",
  query_prefix + "puppy"
]

model = Informers.pipeline("embedding", "mixedbread-ai/mxbai-embed-large-v1")
embeddings = model.(input)
```

### Supabase/gte-small

[Docs](https://huggingface.co/Supabase/gte-small)

```ruby
sentences = ["That is a happy person", "That is a very happy person"]

model = Informers.pipeline("embedding", "Supabase/gte-small")
embeddings = model.(sentences)
```

### intfloat/e5-base-v2

[Docs](https://huggingface.co/intfloat/e5-base-v2)

```ruby
doc_prefix = "passage: "
query_prefix = "query: "

input = [
  doc_prefix + "Ruby is a programming language created by Matz",
  query_prefix + "Ruby creator"
]

model = Informers.pipeline("embedding", "intfloat/e5-base-v2")
embeddings = model.(input)
```

### nomic-ai/nomic-embed-text-v1

[Docs](https://huggingface.co/nomic-ai/nomic-embed-text-v1)

```ruby
doc_prefix = "search_document: "
query_prefix = "search_query: "

input = [
  doc_prefix + "The dog is barking",
  doc_prefix + "The cat is purring",
  query_prefix + "puppy"
]

model = Informers.pipeline("embedding", "nomic-ai/nomic-embed-text-v1")
embeddings = model.(input)
```

### BAAI/bge-base-en-v1.5

[Docs](https://huggingface.co/BAAI/bge-base-en-v1.5)

```ruby
query_prefix = "Represent this sentence for searching relevant passages: "

input = [
  "The dog is barking",
  "The cat is purring",
  query_prefix + "puppy"
]

model = Informers.pipeline("embedding", "BAAI/bge-base-en-v1.5")
embeddings = model.(input)
```

### jinaai/jina-embeddings-v2-base-en

[Docs](https://huggingface.co/jinaai/jina-embeddings-v2-base-en)

```ruby
sentences = ["How is the weather today?", "What is the current weather like today?"]

model = Informers.pipeline("embedding", "jinaai/jina-embeddings-v2-base-en", model_file_name: "../model")
embeddings = model.(sentences)
```

### Snowflake/snowflake-arctic-embed-m-v1.5

[Docs](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5)

```ruby
query_prefix = "Represent this sentence for searching relevant passages: "

input = [
  "The dog is barking",
  "The cat is purring",
  query_prefix + "puppy"
]

model = Informers.pipeline("embedding", "Snowflake/snowflake-arctic-embed-m-v1.5")
embeddings = model.(input, model_output: "sentence_embedding", pooling: "none")
```

### mixedbread-ai/mxbai-rerank-base-v1

[Docs](https://huggingface.co/mixedbread-ai/mxbai-rerank-base-v1)

```ruby
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]

model = Informers.pipeline("reranking", "mixedbread-ai/mxbai-rerank-base-v1")
result = model.(query, docs)
```

### jinaai/jina-reranker-v1-turbo-en

[Docs](https://huggingface.co/jinaai/jina-reranker-v1-turbo-en)

```ruby
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]

model = Informers.pipeline("reranking", "jinaai/jina-reranker-v1-turbo-en")
result = model.(query, docs)
```

### BAAI/bge-reranker-base

[Docs](https://huggingface.co/BAAI/bge-reranker-base)

```ruby
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]

model = Informers.pipeline("reranking", "BAAI/bge-reranker-base")
result = model.(query, docs)
```

### Xenova/ms-marco-MiniLM-L-6-v2

[Docs](https://huggingface.co/Xenova/ms-marco-MiniLM-L-6-v2)

```ruby
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]

model = Informers.pipeline("reranking", "Xenova/ms-marco-MiniLM-L-6-v2")
result = model.(query, docs)
```

### Other

The model must include a `.onnx` file ([example](https://huggingface.co/Xenova/all-MiniLM-L6-v2/tree/main/onnx)). If the file is not at `onnx/model.onnx`, use the `model_file_name` option to specify the location.

## Pipelines

- [Text](#text)
- [Vision](#vision)
- [Audio](#audio)
- [Multimodel](#multimodal)

### Text

Embedding

```ruby
embed = Informers.pipeline("embedding")
embed.("We are very happy to show you the 🤗 Transformers library.")
```

Reranking

```ruby
rerank = Informers.pipeline("reranking")
rerank.("Who created Ruby?", ["Matz created Ruby", "Another doc"])
```

Named-entity recognition

```ruby
ner = Informers.pipeline("ner")
ner.("Ruby is a programming language created by Matz")
```

Sentiment analysis

```ruby
classifier = Informers.pipeline("sentiment-analysis")
classifier.("We are very happy to show you the 🤗 Transformers library.")
```

Question answering

```ruby
qa = Informers.pipeline("question-answering")
qa.("Who invented Ruby?", "Ruby is a programming language created by Matz")
```

Zero-shot classification

```ruby
classifier = Informers.pipeline("zero-shot-classification")
classifier.("text", ["label1", "label2", "label3"])
```

Text generation

```ruby
generator = Informers.pipeline("text-generation")
generator.("I enjoy walking with my cute dog,")
```

Text-to-text generation

```ruby
text2text = Informers.pipeline("text2text-generation")
text2text.("translate from English to French: I'm very happy")
```

Translation

```ruby
translator = Informers.pipeline("translation", "Xenova/nllb-200-distilled-600M")
translator.("जीवन एक चॉकलेट बॉक्स की तरह है।", src_lang: "hin_Deva", tgt_lang: "fra_Latn")
```

Summarization

```ruby
summarizer = Informers.pipeline("summarization")
summarizer.("Many paragraphs of text")
```

Fill mask

```ruby
unmasker = Informers.pipeline("fill-mask")
unmasker.("Paris is the [MASK] of France.")
```

Feature extraction

```ruby
extractor = Informers.pipeline("feature-extraction")
extractor.("We are very happy to show you the 🤗 Transformers library.")
```

### Vision

Note: [ruby-vips](https://github.com/libvips/ruby-vips) is required to load images

Image classification

```ruby
classifier = Informers.pipeline("image-classification")
classifier.("image.jpg")
```

Zero-shot image classification

```ruby
classifier = Informers.pipeline("zero-shot-image-classification")
classifier.("image.jpg", ["label1", "label2", "label3"])
```

Image segmentation

```ruby
segmenter = Informers.pipeline("image-segmentation")
segmenter.("image.jpg")
```

Object detection

```ruby
detector = Informers.pipeline("object-detection")
detector.("image.jpg")
```

Zero-shot object detection

```ruby
detector = Informers.pipeline("zero-shot-object-detection")
detector.("image.jpg", ["label1", "label2", "label3"])
```

Depth estimation

```ruby
estimator = Informers.pipeline("depth-estimation")
estimator.("image.jpg")
```

Image-to-image

```ruby
upscaler = Informers.pipeline("image-to-image")
upscaler.("image.jpg")
```

Image feature extraction

```ruby
extractor = Informers.pipeline("image-feature-extraction")
extractor.("image.jpg")
```

### Audio

Note: [ffmpeg](https://www.ffmpeg.org/) is required to load audio files

Audio classification

```ruby
classifier = Informers.pipeline("audio-classification")
classifier.("audio.wav")
```

### Multimodal

Image captioning

```ruby
captioner = Informers.pipeline("image-to-text")
captioner.("image.jpg")
```

Document question answering

```ruby
qa = Informers.pipeline("document-question-answering")
qa.("image.jpg", "What is the invoice number?")
```

## Reference

Specify a variant of the model if available (`fp32`, `fp16`, `int8`, `uint8`, `q8`, `q4`, `q4f16`, or `bnb4`)

```ruby
Informers.pipeline("embedding", "Xenova/all-MiniLM-L6-v2", dtype: "fp16")
```

Specify a device (`cpu`, `cuda`, or `coreml`)

```ruby
Informers.pipeline("embedding", device: "cuda")
```

Note: Follow [these instructions](https://github.com/ankane/onnxruntime-ruby?tab=readme-ov-file#gpu-support) for `cuda`

Specify ONNX Runtime [session options](https://github.com/ankane/onnxruntime-ruby?tab=readme-ov-file#session-options)

```ruby
Informers.pipeline("embedding", session_options: {log_severity_level: 2})
```

## Credits

This library was ported from [Transformers.js](https://github.com/huggingface/transformers.js) and is available under the same license.

## History

View the [changelog](https://github.com/ankane/informers/blob/master/CHANGELOG.md)

## Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

- [Report bugs](https://github.com/ankane/informers/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/informers/pulls)
- Write, clarify, or fix documentation
- Suggest or add new features

To get started with development:

```sh
git clone https://github.com/ankane/informers.git
cd informers
bundle install
bundle exec rake download:files
bundle exec rake test
```


================================================
FILE: Rakefile
================================================
require "bundler/gem_tasks"
require "rake/testtask"

Rake::TestTask.new do |t|
  t.pattern = FileList["test/**/*_test.rb"].exclude("test/model_test.rb")
end

task default: :test

def download_file(url)
  require "open-uri"

  file = File.basename(url)
  puts "Downloading #{file}..."
  dest = "test/support/#{file}"
  File.binwrite(dest, URI.parse(url).read)
  puts "Saved #{dest}"
end

namespace :download do
  task :files do
    Dir.mkdir("test/support") unless Dir.exist?("test/support")

    download_file("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg")
    download_file("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_1.png")
  end
end


================================================
FILE: informers.gemspec
================================================
require_relative "lib/informers/version"

Gem::Specification.new do |spec|
  spec.name          = "informers"
  spec.version       = Informers::VERSION
  spec.summary       = "Fast transformer inference for Ruby"
  spec.homepage      = "https://github.com/ankane/informers"
  spec.license       = "Apache-2.0"

  spec.author        = "Andrew Kane"
  spec.email         = "andrew@ankane.org"

  spec.files         = Dir["*.{md,txt}", "{lib}/**/*"]
  spec.require_path  = "lib"

  spec.required_ruby_version = ">= 3.3"

  spec.add_dependency "onnxruntime", ">= 0.9"
  spec.add_dependency "tokenizers", ">= 0.5.3"
end


================================================
FILE: lib/informers/backends/onnx.rb
================================================
module Informers
  module Backends
    module Onnx
      def self.device_to_execution_providers(device)
        case device&.to_s
        when "cpu", nil
          []
        when "cuda"
          ["CUDAExecutionProvider"]
        when "coreml"
          ["CoreMLExecutionProvider"]
        else
          supported_devices = ["cpu", "cuda", "coreml"]
          raise ArgumentError, "Unsupported device: #{device}. Should be one of: #{supported_devices.join(", ")}"
        end
      end
    end
  end
end


================================================
FILE: lib/informers/configs.rb
================================================
module Informers
  class PretrainedConfig
    def initialize(config_json)
      @config_json = config_json.to_h
    end

    def [](key)
      @config_json[key.to_s]
    end

    def []=(key, value)
      @config_json[key.to_s] = value
    end

    def to_h
      @config_json.to_h
    end

    def self.from_pretrained(
      pretrained_model_name_or_path,
      progress_callback: nil,
      config: nil,
      cache_dir: nil,
      local_files_only: false,
      revision: "main",
      **kwargs
    )
      data = config || load_config(
        pretrained_model_name_or_path,
        progress_callback:,
        config:,
        cache_dir:,
        local_files_only:,
        revision:
      )
      new(data)
    end

    def self.load_config(pretrained_model_name_or_path, **options)
      info = Utils::Hub.get_model_json(pretrained_model_name_or_path, "config.json", true, **options)
      info
    end
  end

  class AutoConfig
    def self.from_pretrained(...)
      PretrainedConfig.from_pretrained(...)
    end
  end
end


================================================
FILE: lib/informers/env.rb
================================================
module Informers
  CACHE_HOME = ENV.fetch("XDG_CACHE_HOME", File.join(ENV.fetch("HOME"), ".cache"))
  DEFAULT_CACHE_DIR = File.expand_path(File.join(CACHE_HOME, "informers"))

  class << self
    attr_accessor :allow_remote_models, :remote_host, :remote_path_template, :cache_dir
  end

  self.allow_remote_models = ENV["INFORMERS_OFFLINE"].to_s.empty?
  self.remote_host = "https://huggingface.co/"
  self.remote_path_template = "{model}/resolve/{revision}/"

  self.cache_dir = DEFAULT_CACHE_DIR
end


================================================
FILE: lib/informers/model.rb
================================================
module Informers
  # TODO remove in 2.0
  class Model
    def initialize(model_id, quantized: false)
      @model = Informers.pipeline("embedding", model_id, quantized: quantized)
      @options = model_id == "mixedbread-ai/mxbai-embed-large-v1" ? {pooling: "cls", normalize: false} : {}
    end

    def embed(texts)
      @model.(texts, **@options)
    end
  end
end


================================================
FILE: lib/informers/models.rb
================================================
module Informers
  MODEL_TYPES = {
    EncoderOnly: 0,
    EncoderDecoder: 1,
    Seq2Seq: 2,
    Vision2Seq: 3,
    DecoderOnly: 4,
    MaskGeneration: 5
  }

  # NOTE: These will be populated fully later
  MODEL_TYPE_MAPPING = {}
  MODEL_NAME_TO_CLASS_MAPPING = {}
  MODEL_CLASS_TO_NAME_MAPPING = {}

  class PretrainedMixin
    def self.from_pretrained(
      pretrained_model_name_or_path,
      quantized: true,
      progress_callback: nil,
      config: nil,
      cache_dir: nil,
      local_files_only: false,
      revision: "main",
      device: nil,
      dtype: nil,
      model_file_name: nil,
      session_options: {}
    )
      options = {
        quantized:,
        progress_callback:,
        config:,
        cache_dir:,
        local_files_only:,
        revision:,
        device:,
        dtype:,
        model_file_name:,
        session_options:
      }
      config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **options)
      if options[:config].nil?
        # If no config was passed, reuse this config for future processing
        options[:config] = config
      end

      if !const_defined?(:MODEL_CLASS_MAPPINGS)
        raise Error, "`MODEL_CLASS_MAPPINGS` not implemented for this type of `AutoClass`: #{name}"
      end

      const_get(:MODEL_CLASS_MAPPINGS).each do |model_class_mapping|
        model_info = model_class_mapping[config[:model_type]]
        if !model_info
          next # Item not found in this mapping
        end
        return model_info[1].from_pretrained(pretrained_model_name_or_path, **options)
      end

      if const_defined?(:BASE_IF_FAIL)
        warn "Unknown model class #{config[:model_type].inspect}, attempting to construct from base class."
        PreTrainedModel.from_pretrained(pretrained_model_name_or_path, **options)
      else
        raise Error, "Unsupported model type: #{config[:model_type]}"
      end
    end
  end

  class PreTrainedModel
    MAIN_INPUT_NAME = :input_ids

    attr_reader :config

    def initialize(config, session)
      super()

      @config = config
      @session = session

      @output_names = nil

      model_name = MODEL_CLASS_TO_NAME_MAPPING[self.class]
      model_type = MODEL_TYPE_MAPPING[model_name]

      case model_type
      when MODEL_TYPES[:DecoderOnly]
        @can_generate = true

        @run_beam = method(:decoder_run_beam)
        @get_start_beams = method(:decoder_start_beams)
        @update_beam = method(:decoder_update_beam)
        @forward = method(:decoder_forward)

      when MODEL_TYPES[:Seq2Seq], MODEL_TYPES[:Vision2Seq]
        @can_generate = true

        @run_beam = method(:seq2seq_run_beam)
        @get_start_beams = method(:seq2seq_start_beams)
        @update_beam = method(:seq2seq_update_beam)
        @forward = method(:seq2seq_forward)

      when MODEL_TYPES[:EncoderDecoder]
        @forward = method(:encoder_forward)

      else
        @forward = method(:encoder_forward)
      end
    end

    def self.from_pretrained(
      pretrained_model_name_or_path,
      quantized: true,
      progress_callback: nil,
      config: nil,
      cache_dir: nil,
      local_files_only: false,
      revision: "main",
      device: nil,
      dtype: nil,
      model_file_name: nil,
      session_options: {}
    )
      options = {
        quantized:,
        progress_callback:,
        config:,
        cache_dir:,
        local_files_only:,
        revision:,
        device:,
        dtype:,
        model_file_name:,
        session_options:
      }

      model_name = MODEL_CLASS_TO_NAME_MAPPING[self]
      model_type = MODEL_TYPE_MAPPING[model_name]

      config ||= AutoConfig.from_pretrained(pretrained_model_name_or_path, **options)

      if model_type == MODEL_TYPES[:DecoderOnly]
        info = [
          construct_session(pretrained_model_name_or_path, options[:model_file_name] || "decoder_model_merged", **options),
          Utils::Hub.get_model_json(pretrained_model_name_or_path, "generation_config.json", false, **options)
        ]

      elsif model_type == MODEL_TYPES[:Seq2Seq] || model_type == MODEL_TYPES[:Vision2Seq]
        info = [
          construct_session(pretrained_model_name_or_path, "encoder_model", **options),
          construct_session(pretrained_model_name_or_path, "decoder_model_merged", **options),
          Utils::Hub.get_model_json(pretrained_model_name_or_path, "generation_config.json", false, **options)
        ]

      elsif model_type == MODEL_TYPES[:MaskGeneration]
        info = [
          construct_session(pretrained_model_name_or_path, "vision_encoder", **options),
          construct_session(pretrained_model_name_or_path, "prompt_encoder_mask_decoder", **options)
        ]

      elsif model_type == MODEL_TYPES[:EncoderDecoder]
        info = [
          construct_session(pretrained_model_name_or_path, "encoder_model", **options),
          construct_session(pretrained_model_name_or_path, "decoder_model_merged", **options)
        ]

      else
        if model_type != MODEL_TYPES[:EncoderOnly]
          warn "Model type for '#{model_name || config[:model_type]}' not found, assuming encoder-only architecture. Please report this."
        end
        info = [
          construct_session(pretrained_model_name_or_path, options[:model_file_name] || "model", **options)
        ]
      end

      new(config, *info)
    end

    def self.construct_session(pretrained_model_name_or_path, file_name, **options)
      prefix = "onnx/"
      if file_name.start_with?("../")
        prefix = ""
        file_name = file_name[3..]
      elsif file_name.start_with?("/")
        prefix = ""
        file_name = file_name[1..]
      end
      dtype = options[:dtype] || (options[:quantized] ? "q8" : "fp32")
      suffix = Utils::DEFAULT_DTYPE_SUFFIX_MAPPING[dtype.to_sym]
      if !suffix
        raise ArgumentError, "Invalid dtype: #{dtype}. Should be one of: #{Utils::DEFAULT_DTYPE_SUFFIX_MAPPING.keys.join(", ")}"
      end
      model_file_name = "#{prefix}#{file_name}#{suffix}.onnx"
      path = Utils::Hub.get_model_file(pretrained_model_name_or_path, model_file_name, true, **options)

      session_options = {
        providers: Backends::Onnx.device_to_execution_providers(options[:device]),
        log_severity_level: 4
      }.merge(options[:session_options] || {})

      begin
        OnnxRuntime::InferenceSession.new(path, **session_options)
      rescue OnnxRuntime::Error => e
        raise e unless e.message.include?("No such file or directory") && e.message.include?(".onnx_data")

        Utils::Hub.get_model_file(pretrained_model_name_or_path, "#{model_file_name}_data", true, **options)
        OnnxRuntime::InferenceSession.new(path, **session_options)
      end
    end

    def call(model_inputs, **kwargs)
      @forward.(model_inputs, **kwargs)
    end

    def generate(inputs, generation_config = nil, logits_processor = nil, inputs_attention_mask: nil)
      if !@can_generate
        model_name = MODEL_CLASS_TO_NAME_MAPPING[self.class]
        error_message = "The current model class (#{model_name}) is not compatible with `.generate()`, as it doesn't have a language model head."
        raise Error, error_message
      end

      if !inputs.is_a?(Array)
        raise ArgumentError, "`inputs` must be an Array, but is #{inputs.class.name}"
      end

      if @config[:is_encoder_decoder]
        # Generating from the encoder outputs
        input_ids_seq_length = 0
      else
        input_ids_seq_length = inputs.length

        # decoder-only
        if input_ids_seq_length == 0
          raise Error, "Must supply a non-empty array of input token ids."
        end
      end

      # Update generation config with defaults
      generation_config = get_generation_config(generation_config)

      logits_processor ||= Utils::LogitsProcessorList.new

      # Update logits processor
      logits_processor = get_logits_processor(
        generation_config,
        input_ids_seq_length,
        logits_processor
      )

      eos_token_ids = generation_config[:eos_token_id]
      if !eos_token_ids.nil? && !eos_token_ids.is_a?(Array)
        eos_token_ids = [eos_token_ids]
      end

      num_output_tokens = 1
      max_output_tokens = num_output_tokens + (generation_config[:max_new_tokens] || Float::INFINITY)

      # Only use max length if max_new_tokens is not provided
      use_max_length = generation_config[:max_length].is_a?(Integer) && generation_config[:max_new_tokens].nil?
      sampler = Utils::Sampler.get_sampler(generation_config)

      beams = get_start_beams(inputs, generation_config, num_output_tokens, inputs_attention_mask)

      while beams.any? { |x| !x[:done] } && num_output_tokens < max_output_tokens
        newest_beams = []
        beams.each do |beam|
          if beam[:done]
            # Add this beam back into the pool
            newest_beams << beam
            next
          end
          if use_max_length && beam[:output_token_ids].length >= generation_config["max_length"]
            # Set this beam to done and add it back into the pool
            beam[:done] = true
            newest_beams << beam
            next
          end

          output = run_beam(beam)

          # add attentions/scores to beam only if user requested
          if generation_config["output_attentions"]
            add_attentions_to_beam(beam, output)
          end

          # Logits are of the form [batch_size, out_seq_length, vocab_size]
          # In most cases, this will be [batch_size, 1, vocab_size]
          # So, we select the last token's logits:
          # (equivalent to `logits = outputs.logits[:, -1, :]`)
          logits = output["logits"].map { |v| v[-1] }

          # Apply logits processor
          logits_processor.(beam[:output_token_ids], logits)

          sampled_tokens = sampler.(logits)
          sampled_tokens.each do |new_token_id, log_prob|
            # use previous beam as a starting point
            new_beam = beam.dup

            # update new beam
            update_beam(new_beam, new_token_id)

            new_beam[:score] += log_prob

            if eos_token_ids && eos_token_ids.include?(new_token_id)
              new_beam[:done] = true
            end

            newest_beams << new_beam
          end
        end
        num_output_tokens += 1

        # Next, we get the best beams, per ID
        newest_beams =
          group_beams(newest_beams).map do |group|
            group.sort_by { |v| -v[:score] }[0...generation_config["num_beams"]]
          end

        # Flatten beams
        beams = newest_beams.flatten(1)

        # Run callback
        if generation_config["callback_function"]
          generation_config["callback_function"].(beams)
        end
      end

      # TODO: Ensure that we can return non-batched outputs

      grouped_beams = group_beams(beams)

      get_flattened = lambda do |key|
        grouped_beams.flat_map do |batch|
          if generation_config["num_return_sequences"] > 1
            raise Todo
          else
            [batch[0][key]]
          end
        end
      end

      sequences = get_flattened.(:output_token_ids) # [1, seqLength]

      if generation_config["return_dict_in_generate"]
        raise Todo
      else
        sequences
      end
    end

    private

    def get_logits_processor(
      generation_config,
      input_ids_seq_length,
      logits_processor = nil
    )
      processors = Utils::LogitsProcessorList.new

      if !generation_config["repetition_penalty"].nil? && generation_config["repetition_penalty"] != 1.0
        processors.push(Utils::RepetitionPenaltyLogitsProcessor.new(generation_config["repetition_penalty"]))
      end

      if !generation_config["no_repeat_ngram_size"].nil? && generation_config["no_repeat_ngram_size"] > 0
        processors.push(Utils::NoRepeatNGramLogitsProcessor.new(generation_config["no_repeat_ngram_size"]))
      end

      if !generation_config["bad_words_ids"].nil?
        processors.push(Utils::NoBadWordsLogitsProcessor.new(generation_config["bad_words_ids"], generation_config["eos_token_id"]))
      end

      if !generation_config["min_length"].nil? && !generation_config["eos_token_id"].nil? && generation_config["min_length"] > 0
        processors.push(Utils::MinLengthLogitsProcessor.new(generation_config["min_length"], generation_config["eos_token_id"]))
      end

      if !generation_config["min_new_tokens"].nil? && !generation_config["eos_token_id"].nil? && generation_config["min_new_tokens"] > 0
        processors.push(Utils::MinNewTokensLengthLogitsProcessor.new(
          input_ids_seq_length,
          generation_config["min_new_tokens"],
          generation_config["eos_token_id"]
        ))
      end

      if !generation_config["forced_bos_token_id"].nil?
        processors.push(Utils::ForcedBOSTokenLogitsProcessor.new(generation_config["forced_bos_token_id"]))
      end

      if !generation_config["forced_eos_token_id"].nil?
        processors.push(Utils::ForcedEOSTokenLogitsProcessor.new(
          generation_config["max_length"],
          generation_config["forced_eos_token_id"]
        ))
      end

      if !generation_config["begin_suppress_tokens"].nil?
        raise Todo
      end

      if !generation_config["forced_decoder_ids"].nil?
        processors.push(Utils::ForceTokensLogitsProcessor.new(generation_config["forced_decoder_ids"]))
      end

      if !logits_processor.nil?
        processors.concat(logits_processor)
      end

      processors
    end

    def get_generation_config(generation_config)
      # Create empty generation config (contains defaults)
      # We pass `@config` so that if `eos_token_id` or `bos_token_id` exist in the model's config, we will use them
      gen_config = Utils::GenerationConfig.new(@config.to_h)

      # Apply model's generation config, if it exists
      if @generation_config
        gen_config.merge!(@generation_config)
      end

      # Finally, use any generation config specified by the user
      # when calling `generate`
      if !generation_config.nil?
        gen_config.merge!(generation_config)
      end

      gen_config
    end

    def seq2seq_forward(model_inputs)
      encoder_outputs = model_inputs[:encoder_outputs]
      past_key_values = model_inputs[:past_key_values]

      if !encoder_outputs
        # Encoder outputs are not given, so we must compute them.
        encoder_outputs = encoder_forward(model_inputs)[0]
      end
      decoder_feeds = {
        input_ids: model_inputs[:decoder_input_ids],
        encoder_hidden_states: encoder_outputs
      }
      use_cache_branch = !!past_key_values

      if @decoder_merged_session.inputs.map { |v| v[:name] }.include?("use_cache_branch")
        decoder_feeds[:use_cache_branch] = [use_cache_branch]
      end

      if @decoder_merged_session.inputs.map { |v| v[:name] }.include?("encoder_attention_mask")
        decoder_feeds[:encoder_attention_mask] = model_inputs[:attention_mask]
      end

      prepare_position_ids(@decoder_merged_session, decoder_feeds, use_cache_branch)
      add_past_key_values(decoder_feeds, past_key_values)

      decoder_results = session_run(@decoder_merged_session, decoder_feeds)
      decoder_results = @decoder_merged_session.outputs.map { |v| v[:name] }.zip(decoder_results).to_h
      logits = decoder_results["logits"]
      past_key_values = get_past_key_values(decoder_results, past_key_values)

      # Get cross attention and/or decoder attentions if they are present
      attns = get_attentions(decoder_results)

      Seq2SeqLMOutput.new(logits, past_key_values, encoder_outputs, attns["decoder_attentions"], attns["cross_attentions"])
    end

    def prepare_position_ids(session, feeds, use_cache_branch)
      if !session.inputs.map { |v| v[:name] }.include?("position_ids")
        return
      end

      raise Todo
    end

    def get_past_key_values(decoder_results, past_key_values)
      pkvs = {}

      decoder_results.each_key do |name|
        if name.start_with?("present")
          new_name = name.sub("present", "past_key_values")

          if past_key_values && name.include?("encoder")
            # Optimization introduced by optimum to reuse past key values. So, we just replace the constant
            # outputs with the previous past key values.
            # https://github.com/huggingface/optimum/blob/0bf2c05fb7e1182b52d21b703cfc95fd9e4ea3dc/optimum/onnxruntime/base.py#L677-L704
            pkvs[new_name] = past_key_values[new_name]
          else
            pkvs[new_name] = decoder_results[name]
          end
        end
      end
      pkvs
    end

    def get_attentions(decoder_results)
      attns = {}

      ["cross_attentions", "decoder_attentions"].each do |attn_name|
        result = []
        decoder_results.each_key do |name|
          if name.start_with?(attn_name)
            index = name.split(".").pop
            result[index] = decoder_results[name]
          end
        end
        attns[attn_name] = result
      end
      attns
    end

    def add_past_key_values(decoder_feeds, past_key_values)
      if past_key_values
        decoder_feeds.merge!(past_key_values)
      else
        # TODO support batches (i.e., batch_size > 1)
        batch_size = 1

        if @config[:is_encoder_decoder] && (!@add_encoder_pkv.nil? ? @add_encoder_pkv : true)
          _encoder_dims = [batch_size, @num_encoder_heads, 0, @encoder_dim_kv]
          _decoder_dims = [batch_size, @num_decoder_heads, 0, @decoder_dim_kv]
          @num_decoder_layers.times do |i|
            # decoder_feeds["past_key_values.#{i}.encoder.key"] = OnnxRuntime::OrtValue.from_shape_and_type(encoder_dims, :float)
            # decoder_feeds["past_key_values.#{i}.encoder.value"] = OnnxRuntime::OrtValue.from_shape_and_type(encoder_dims, :float)
            # decoder_feeds["past_key_values.#{i}.decoder.key"] = OnnxRuntime::OrtValue.from_shape_and_type(decoder_dims, :float)
            # decoder_feeds["past_key_values.#{i}.decoder.value"] = OnnxRuntime::OrtValue.from_shape_and_type(decoder_dims, :float)
          end
        elsif @config[:model_type] == "falcon"
          raise Todo
        elsif @config[:multi_query]
          raise Todo
        elsif @config[:model_type] == "bloom"
          raise Todo
        else
          _dims = [batch_size, @num_heads, 0, @dim_kv]
          @num_layers.times do |i|
            # decoder_feeds["past_key_values.#{i}.key"] = OnnxRuntime::OrtValue.from_shape_and_type(dims, :float)
            # decoder_feeds["past_key_values.#{i}.value"] = OnnxRuntime::OrtValue.from_shape_and_type(dims, :float)
          end
        end
      end
    end

    def seq2seq_start_beams(input_token_ids, generation_config, num_output_tokens, inputs_attention_mask = nil)
      beams = []
      beam_id = 0

      requires_attention_mask = !@requires_attention_mask.nil? ? @requires_attention_mask : true

      # decoder_input_ids == output_token_ids
      decoder_input_ids =
        generation_config["decoder_input_ids"] ||
        generation_config["decoder_start_token_id"] ||
        generation_config["bos_token_id"] ||
        generation_config["eos_token_id"]

      if !decoder_input_ids.is_a?(Array)
        decoder_input_ids = [decoder_input_ids]
      end

      input_token_ids.each do |tokens|
        # TODO: Improve
        # Currently, just add back batch dimension.
        # In future, allow for true parallel execution
        tokens = [tokens]

        # Create beam
        start = {
          inputs: tokens,
          encoder_outputs: nil,
          prev_model_outputs: nil,

          output_token_ids: decoder_input_ids,
          done: false,
          score: 0,
          id: beam_id # assign unique id to beams
        }
        beam_id += 1

        if requires_attention_mask
          start[:attention_mask] = prepare_attention_mask(tokens)
        end

        beams << start
      end

      beams
    end

    def prepare_attention_mask(tokens)
      # Prepare attention mask
      pad_token_id = @config["pad_token_id"]
      eos_token_id = @config["eos_token_id"]
      if eos_token_id.is_a?(Integer)
        eos_token_id = [eos_token_id]
      end

      is_pad_token_in_inputs = !tokens.index(pad_token_id).nil?
      is_pad_token_not_equal_to_eos_token_id = eos_token_id.nil? || !eos_token_id.include?(pad_token_id)

      if is_pad_token_in_inputs && is_pad_token_not_equal_to_eos_token_id
        raise Todo
      else
        Utils.ones_like(tokens)
      end
    end

    def seq2seq_run_beam(beam)
      input_name = self.class.const_get(:MAIN_INPUT_NAME)

      decoder_input_ids = beam[:output_token_ids]
      if beam[:prev_model_outputs]
        # After the first step, `prev_model_outputs` won't be null.
        # So, we cut decoder_input_ids if past is used
        decoder_input_ids = [decoder_input_ids[-1]]
      end

      # 1. Prepare
      model_inputs = {
        input_name => beam[:inputs],
        decoder_input_ids: [decoder_input_ids],
        encoder_outputs: beam[:encoder_outputs],
        past_key_values: beam[:prev_model_outputs] && beam[:prev_model_outputs][:past_key_values]
      }
      if beam[:attention_mask]
        model_inputs[:attention_mask] = beam[:attention_mask]
      end

      # 2. Run
      output = @forward.(model_inputs)

      # 3. Update
      beam[:prev_model_outputs] = output
      beam[:encoder_outputs] = output[:encoder_outputs]

      output
    end

    def seq2seq_update_beam(beam, new_token_id)
      beam[:output_token_ids] += [new_token_id]
    end

    def group_beams(beams)
      # Group beams by their ids
      groups = {}
      beams.each do |obj|
        if !groups[obj[:id]]
          groups[obj[:id]] = [obj]
        else
          groups[obj[:id]] << obj
        end
      end
      groups.values
    end

    def encoder_forward(model_inputs, output_names: nil)
      encoder_feeds = {}
      @session.inputs.each do |input|
        key = input[:name].to_sym
        encoder_feeds[key] = model_inputs[key]
      end
      if @session.inputs.any? { |v| v[:name] == "token_type_ids" } && !encoder_feeds[:token_type_ids]
        raise Todo
      end
      session_run(@session, encoder_feeds, output_names:)
    end

    def decoder_forward(model_inputs)
      input_ids, past_key_values, attention_mask =
        model_inputs.values_at(:input_ids, :past_key_values, :attention_mask)
      decoder_feeds = {
        input_ids: input_ids,
        attention_mask: attention_mask || prepare_attention_mask(input_ids)
      }
      use_cache_branch = !!past_key_values

      if @session.inputs.map { |v| v[:name] }.include?("use_cache_branch")
        decoder_feeds[:use_cache_branch] = [use_cache_branch]
      end

      prepare_position_ids(@session, decoder_feeds, use_cache_branch)

      add_past_key_values(decoder_feeds, past_key_values)

      decoder_results = session_run(@session, decoder_feeds)
      decoder_results = @session.outputs.map { |v| v[:name] }.zip(decoder_results).to_h

      logits = decoder_results["logits"]

      past_key_values = get_past_key_values(decoder_results, past_key_values)
      {"logits" => logits, past_key_values: past_key_values}
    end

    def decoder_start_beams(input_token_ids, generation_config, num_output_tokens, inputs_attention_mask)
      beams = []

      beam_id = 0
      input_token_ids.each do |tokens|
        output_token_ids = tokens.dup

        # TODO: Improve
        # Currently, just add back batch dimension.
        # In future, allow for true parallel execution
        tokens = [tokens]

        if inputs_attention_mask
          attn_mask = inputs_attention_mask[beam_id]
          attn_mask = [attn_mask]
        else
          attn_mask = prepare_attention_mask(tokens)
        end

        start = {
          input: tokens,
          model_input_ids: tokens,
          attention_mask: attn_mask,
          prev_model_outputs: nil,

          output_token_ids: output_token_ids,
          num_output_tokens: num_output_tokens,

          done: false,
          score: 0,
          id: beam_id # assign unique id to beams
        }
        beam_id += 1

        beams << start
      end
      beams
    end

    def decoder_run_beam(beam)
      attn_mask_data = Array.new(beam[:output_token_ids].length, 1)

      # 1. Prepare
      model_inputs = {
        input_ids: beam[:model_input_ids],
        attention_mask: [attn_mask_data],
        past_key_values: beam[:prev_model_outputs] && beam[:prev_model_outputs][:past_key_values]
      }

      # 2. Run
      output = @forward.(model_inputs)

      # 3. Update
      beam[:prev_model_outputs] = output

      output
    end

    def decoder_update_beam(beam, new_token_id)
      beam[:output_token_ids] += [new_token_id]
      beam[:model_input_ids] = [[new_token_id]]
    end

    def session_run(session, inputs, output_names: nil)
      checked_inputs = validate_inputs(session, inputs)
      begin
        output = session.run(output_names || @output_names, checked_inputs)
        output = replace_tensors(output)
        output
      rescue => e
        raise e
      end
    end

    # TODO
    def replace_tensors(obj)
      obj
    end

    # TODO
    def validate_inputs(session, inputs)
      inputs
    end

    def get_start_beams(input_token_ids, generation_config, num_output_tokens, inputs_attention_mask)
      @get_start_beams.(input_token_ids, generation_config, num_output_tokens, inputs_attention_mask)
    end

    def run_beam(beam)
      @run_beam.(beam)
    end

    def update_beam(beam, new_token_id)
      @update_beam.(beam, new_token_id)
    end
  end

  class BertPreTrainedModel < PreTrainedModel
  end

  class BertModel < BertPreTrainedModel
  end

  class BertForMaskedLM < BertPreTrainedModel
    def call(model_inputs)
      MaskedLMOutput.new(*super(model_inputs))
    end
  end

  class BertForSequenceClassification < BertPreTrainedModel
    def call(model_inputs)
      SequenceClassifierOutput.new(*super(model_inputs))
    end
  end

  class BertForTokenClassification < BertPreTrainedModel
    def call(model_inputs)
      TokenClassifierOutput.new(*super(model_inputs))
    end
  end

  class ModernBertPreTrainedModel < PreTrainedModel
  end

  class ModernBertModel < ModernBertPreTrainedModel
  end

  class ModernBertForMaskedLM < ModernBertPreTrainedModel
    def call(model_inputs)
      MaskedLMOutput.new(*super(model_inputs))
    end
  end

  class ModernBertForSequenceClassification < ModernBertPreTrainedModel
    def call(model_inputs)
      SequenceClassifierOutput.new(*super(model_inputs))
    end
  end

  class ModernBertForTokenClassification < ModernBertPreTrainedModel
    def call(model_inputs)
      TokenClassifierOutput.new(*super(model_inputs))
    end
  end

  class NomicBertPreTrainedModel < PreTrainedModel
  end

  class NomicBertModel < NomicBertPreTrainedModel
  end

  class ConvBertPreTrainedModel < PreTrainedModel
  end

  class ConvBertModel < ConvBertPreTrainedModel
  end

  class ElectraPreTrainedModel < PreTrainedModel
  end

  # TODO add ElectraForPreTraining
  class ElectraModel < ElectraPreTrainedModel
  end

  class DebertaV2PreTrainedModel < PreTrainedModel
  end

  class DebertaV2Model < DebertaV2PreTrainedModel
  end

  class DistilBertPreTrainedModel < PreTrainedModel
  end

  class DistilBertModel < DistilBertPreTrainedModel
  end

  class DistilBertForSequenceClassification < DistilBertPreTrainedModel
    def call(model_inputs)
      SequenceClassifierOutput.new(*super(model_inputs))
    end
  end

  class DistilBertForQuestionAnswering < DistilBertPreTrainedModel
    def call(model_inputs)
      QuestionAnsweringModelOutput.new(*super(model_inputs))
    end
  end

  class MPNetPreTrainedModel < PreTrainedModel
  end

  class MPNetModel < MPNetPreTrainedModel
  end

  class T5PreTrainedModel < PreTrainedModel
  end

  class T5Model < T5PreTrainedModel
  end

  class T5ForConditionalGeneration < T5PreTrainedModel
    def initialize(config, session, decoder_merged_session, generation_config)
      super(config, session)
      @decoder_merged_session = decoder_merged_session
      @generation_config = generation_config

      @num_decoder_layers = @config[:num_decoder_layers]
      @num_decoder_heads = @config[:num_heads]
      @decoder_dim_kv = @config[:d_kv]

      @num_encoder_layers = @config[:num_layers]
      @num_encoder_heads = @config[:num_heads]
      @encoder_dim_kv = @config[:d_kv]
    end
  end

  class BartPretrainedModel < PreTrainedModel
  end

  class BartModel < BartPretrainedModel
  end

  class BartForConditionalGeneration < BartPretrainedModel
    def initialize(config, session, decoder_merged_session, generation_config)
      super(config, session)
      @decoder_merged_session = decoder_merged_session
      @generation_config = generation_config

      @num_decoder_layers = @config["decoder_layers"]
      @num_decoder_heads = @config["decoder_attention_heads"]
      @decoder_dim_kv = @config["d_model"] / @num_decoder_heads.to_f

      @num_encoder_layers = @config["encoder_layers"]
      @num_encoder_heads = @config["encoder_attention_heads"]
      @encoder_dim_kv = @config["d_model"] / @num_encoder_heads
    end
  end

  class BartForSequenceClassification < BartPretrainedModel
    def call(model_inputs)
      SequenceClassifierOutput.new(*super(model_inputs))
    end
  end

  class MBartPreTrainedModel < PreTrainedModel
  end

  class MBartModel < MBartPreTrainedModel
  end

  class MBartForCausalLM < MBartPreTrainedModel
    attr_reader :num_decoder_layers, :num_decoder_heads, :decoder_dim_kv,
      :num_encoder_layers, :num_encoder_heads, :encoder_dim_kv

    def initialize(config, decoder_merged_session, generation_config)
      super(config, decoder_merged_session)
      @generation_config = generation_config

      @num_decoder_layers = @config["decoder_layers"]
      @num_decoder_heads = @config["decoder_attention_heads"]
      @decoder_dim_kv = @config["d_model"] / @num_decoder_heads.to_f

      @num_encoder_layers = @config["encoder_layers"]
      @num_encoder_heads = @config["encoder_attention_heads"]
      @encoder_dim_kv = @config["d_model"] / @num_encoder_heads.to_f
    end
  end

  class M2M100PreTrainedModel < PreTrainedModel
  end

  class M2M100Model < M2M100PreTrainedModel
  end

  class M2M100ForConditionalGeneration < M2M100PreTrainedModel
    def initialize(config, session, decoder_merged_session, generation_config)
      super(config, session)
      @decoder_merged_session = decoder_merged_session
      @generation_config = generation_config

      @num_decoder_layers = @config["decoder_layers"]
      @num_decoder_heads = @config["decoder_attention_heads"]
      @decoder_dim_kv = @config["d_model"] / @num_decoder_heads.to_f

      @num_encoder_layers = @config["encoder_layers"]
      @num_encoder_heads = @config["encoder_attention_heads"]
      @encoder_dim_kv = @config["d_model"] / @num_encoder_heads.to_f
    end
  end

  class Wav2Vec2PreTrainedModel < PreTrainedModel
  end

  class Wav2Vec2Model < Wav2Vec2PreTrainedModel
  end

  class Wav2Vec2ForSequenceClassification < Wav2Vec2PreTrainedModel
    def call(model_inputs)
      SequenceClassifierOutput.new(*super(model_inputs))
    end
  end

  class RobertaPreTrainedModel < PreTrainedModel
  end

  class RobertaModel < RobertaPreTrainedModel
  end

  class RobertaForMaskedLM < RobertaPreTrainedModel
    def call(model_inputs)
      MaskedLMOutput.new(*super(model_inputs))
    end
  end

  class RobertaForTokenClassification <  RobertaPreTrainedModel
    def call(model_inputs)
      TokenClassifierOutput.new(*super(model_inputs))
    end
  end

  class RobertaForSequenceClassification < RobertaPreTrainedModel
    def call(model_inputs)
      SequenceClassifierOutput.new(*super(model_inputs))
    end
  end

  class XLMRobertaPreTrainedModel < PreTrainedModel
  end

  class XLMRobertaModel < XLMRobertaPreTrainedModel
  end

  class XLMRobertaForSequenceClassification < XLMRobertaPreTrainedModel
    def call(model_inputs)
      SequenceClassifierOutput.new(*super(model_inputs))
    end
  end

  class ViTPreTrainedModel < PreTrainedModel
  end

  class ViTModel < ViTPreTrainedModel
  end

  class ViTForImageClassification < ViTPreTrainedModel
    def call(model_inputs)
      SequenceClassifierOutput.new(*super(model_inputs))
    end
  end

  class CLIPPreTrainedModel < PreTrainedModel
  end

  class CLIPModel < CLIPPreTrainedModel
  end

  class GPT2PreTrainedModel < PreTrainedModel
    attr_reader :num_heads, :num_layers, :dim_kv

    def initialize(config, session, generation_config)
      super(config, session)
      @generation_config = generation_config

      # config doesn't contain pad_token_id, so we assume it is the eos_token_id
      @config["pad_token_id"] = @config["eos_token_id"]

      @num_heads = @config["n_head"]
      @num_layers = @config["n_layer"]
      @dim_kv = @config["n_embd"] / @num_heads.to_f
    end
  end

  class GPT2Model < GPT2PreTrainedModel
  end

  class GPT2LMHeadModel < GPT2PreTrainedModel
  end

  class OwlViTPreTrainedModel < PreTrainedModel
  end

  class OwlViTModel < OwlViTPreTrainedModel
  end

  class OwlViTForObjectDetection < OwlViTPreTrainedModel
  end

  class DetrPreTrainedModel < PreTrainedModel
  end

  class DetrModel < DetrPreTrainedModel
  end

  class DetrForObjectDetection < DetrPreTrainedModel
    def call(model_inputs)
      DetrObjectDetectionOutput.new(*super(model_inputs))
    end
  end

  class DetrForSegmentation < DetrPreTrainedModel
    def call(model_inputs)
      DetrSegmentationOutput.new(*super(model_inputs))
    end
  end

  class Swin2SRPreTrainedModel < PreTrainedModel
  end

  class Swin2SRModel < Swin2SRPreTrainedModel
  end

  class Swin2SRForImageSuperResolution < Swin2SRPreTrainedModel
  end

  class DPTPreTrainedModel < PreTrainedModel
  end

  class DPTModel < DPTPreTrainedModel
  end

  class DPTForDepthEstimation < DPTPreTrainedModel
  end

  class VisionEncoderDecoderModel < PreTrainedModel
    MAIN_INPUT_NAME = :pixel_values

    def initialize(config, session, decoder_merged_session, generation_config)
      super(config, session)
      @decoder_merged_session = decoder_merged_session
      @generation_config = generation_config

      # Extract configs
      encoder_config = @config["encoder"]
      decoder_config = @config["decoder"]

      # Validate encoder
      encoder_model_type = encoder_config["model_type"]
      encoder_model = MODEL_MAPPING_NAMES_ENCODER_ONLY[encoder_model_type] || MODEL_MAPPING_NAMES_ENCODER_DECODER[encoder_model_type]
      if !encoder_model
        warn "Model type for encoder '#{encoder_model_type}' not found, assuming encoder-only architecture. Please report this."
      end

      # Validate decoder
      decoder_model = MODEL_WITH_LM_HEAD_MAPPING_NAMES[decoder_config["model_type"]]
      if !decoder_model
        raise Error, "Unable to construct `VisionEncoderDecoder` due to unsupported decoder: \"#{decoder_config["model_type"]}\""
      end

      decoder_model_class = decoder_model[1]
      decoder = decoder_model_class.new(decoder_config, decoder_merged_session, generation_config)

      @add_encoder_pkv = decoder.respond_to?(:num_decoder_layers)
      if @add_encoder_pkv
        # Decoder is part of an encoder-decoder model
        @num_decoder_layers = decoder.num_decoder_layers
        @num_decoder_heads = decoder.num_decoder_heads
        @decoder_dim_kv = decoder.decoder_dim_kv

        @num_encoder_layers = decoder.num_encoder_layers
        @num_encoder_heads = decoder.num_encoder_heads
        @encoder_dim_kv = decoder.encoder_dim_kv
      else
        # Decoder is a decoder-only model
        @num_layers = decoder.num_layers
        @num_heads = decoder.num_heads
        @dim_kv = decoder.dim_kv
      end
    end
  end

  class DonutSwinPreTrainedModel < PreTrainedModel
  end

  class DonutSwinModel < DonutSwinPreTrainedModel
  end

  class WhisperPreTrainedModel < PreTrainedModel
  end

  class WhisperModel < WhisperPreTrainedModel
  end

  class WhisperForConditionalGeneration < WhisperPreTrainedModel
    REQUIRES_ATTENTION_MASK = false
    MAIN_INPUT_NAME = :input_features

    def initialize(config, session, decoder_merged_session, generation_config)
      super(config, session)
      @decoder_merged_session = decoder_merged_session
      @generation_config = generation_config

      @num_decoder_layers = @config["decoder_layers"]
      @num_decoder_heads = @config["decoder_attention_heads"]
      @decoder_dim_kv = @config["d_model"] / @num_decoder_heads.to_f

      @num_encoder_layers = @config["encoder_layers"]
      @num_encoder_heads = @config["encoder_attention_heads"]
      @encoder_dim_kv = @config["d_model"] / @num_encoder_heads.to_f
    end

    def generate(inputs, generation_config = nil, logits_processor = nil)
      raise Todo
    end
  end

  class VitsPreTrainedModel < PreTrainedModel
  end

  class VitsModel < VitsPreTrainedModel
    def call(model_inputs)
      VitsModelOutput.new(*super(model_inputs))
    end
  end

  class SpeechT5PreTrainedModel < PreTrainedModel
  end

  class SpeechT5Model < SpeechT5PreTrainedModel
  end

  class SpeechT5ForSpeechToText < SpeechT5PreTrainedModel
  end

  class SpeechT5ForTextToSpeech < SpeechT5PreTrainedModel
  end

  class ClapPreTrainedModel < PreTrainedModel
  end

  class ClapModel < ClapPreTrainedModel
  end

  MODEL_MAPPING_NAMES_ENCODER_ONLY = {
    "bert" => ["BertModel", BertModel],
    "modernbert" => ["ModernBertModel", ModernBertModel],
    "nomic_bert" => ["NomicBertModel", NomicBertModel],
    "electra" => ["ElectraModel", ElectraModel],
    "convbert" => ["ConvBertModel", ConvBertModel],
    "deberta-v2" => ["DebertaV2Model", DebertaV2Model],
    "mpnet" => ["MPNetModel", MPNetModel],
    "distilbert" => ["DistilBertModel", DistilBertModel],
    "roberta" => ["RobertaModel", RobertaModel],
    "xlm-roberta" => ["XLMRobertaModel", XLMRobertaModel],
    "clap" => ["ClapModel", ClapModel],
    "clip" => ["CLIPModel", CLIPModel],
    "detr" => ["DetrModel", DetrModel],
    "vit" => ["ViTModel", ViTModel],
    "owlvit" => ["OwlViTModel", OwlViTModel],
    "donut-swin" => ["DonutSwinModel", DonutSwinModel]
  }

  MODEL_MAPPING_NAMES_ENCODER_DECODER = {
    "bart" => ["BartModel", BartModel]
  }

  MODEL_MAPPING_NAMES_DECODER_ONLY = {
  }

  MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES = {
    "whisper" => ["WhisperForConditionalGeneration", WhisperForConditionalGeneration]
  }

  MODEL_FOR_TEXT_TO_SPECTROGRAM_MAPPING_NAMES = {
    "speecht5" => ["SpeechT5ForTextToSpeech", SpeechT5ForTextToSpeech]
  }

  MODEL_FOR_TEXT_TO_WAVEFORM_MAPPING_NAMES = {
    "vits" => ["VitsModel", VitsModel]
  }

  MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = {
    "bert" => ["BertForSequenceClassification", BertForSequenceClassification],
    "modernbert" => ["ModernBertForSequenceClassification", ModernBertForSequenceClassification],
    "distilbert" => ["DistilBertForSequenceClassification", DistilBertForSequenceClassification],
    "roberta" => ["RobertaForSequenceClassification", RobertaForSequenceClassification],
    "xlm-roberta" => ["XLMRobertaForSequenceClassification", XLMRobertaForSequenceClassification],
    "bart" => ["BartForSequenceClassification", BartForSequenceClassification]
  }

  MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = {
    "bert" => ["BertForTokenClassification", BertForTokenClassification],
    "modernbert" => ["ModernBertForTokenClassification", ModernBertForTokenClassification],
    "roberta" => ["RobertaForTokenClassification", RobertaForTokenClassification]
  }

  MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES = {
    "t5" => ["T5ForConditionalGeneration", T5ForConditionalGeneration],
    "bart" => ["BartForConditionalGeneration", BartForConditionalGeneration],
    "m2m_100" => ["M2M100ForConditionalGeneration", M2M100ForConditionalGeneration]
  }

  MODEL_WITH_LM_HEAD_MAPPING_NAMES = {
    "gpt2" => ["GPT2LMHeadModel", GPT2LMHeadModel],
    "mbart" => ["MBartForCausalLM", MBartForCausalLM]
  }

  MODEL_FOR_MASKED_LM_MAPPING_NAMES = {
    "bert" => ["BertForMaskedLM", BertForMaskedLM],
    "modernbert" => ["ModernBertForMaskedLM", ModernBertForMaskedLM],
    "roberta" => ["RobertaForMaskedLM", RobertaForMaskedLM]
  }

  MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = {
    "distilbert" => ["DistilBertForQuestionAnswering", DistilBertForQuestionAnswering]
  }

  MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES = {
    "vision-encoder-decoder" => ["VisionEncoderDecoderModel", VisionEncoderDecoderModel]
  }

  MODEL_FOR_DOCUMENT_QUESTION_ANSWERING_MAPPING_NAMES = {
    "vision-encoder-decoder" => ["VisionEncoderDecoderModel", VisionEncoderDecoderModel]
  }

  MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES = {
    "vit" => ["ViTForImageClassification", ViTForImageClassification]
  }

  MODEL_FOR_OBJECT_DETECTION_MAPPING_NAMES = {
    "detr" => ["DetrForObjectDetection", DetrForObjectDetection]
  }

  MODEL_FOR_ZERO_SHOT_OBJECT_DETECTION_MAPPING_NAMES = {
    "owlvit" => ["OwlViTForObjectDetection", OwlViTForObjectDetection]
  }

  MODEL_FOR_IMAGE_SEGMENTATION_MAPPING_NAMES = {
    "detr" => ["DetrForSegmentation", DetrForSegmentation]
  }

  MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING_NAMES = {
  }

  MODEL_FOR_MASK_GENERATION_MAPPING_NAMES = {
  }

  MODEL_FOR_CTC_MAPPING_NAMES = {
  }

  MODEL_FOR_AUDIO_CLASSIFICATION_MAPPING_NAMES = {
    "wav2vec2" => ["Wav2Vec2ForSequenceClassification", Wav2Vec2ForSequenceClassification]
  }

  MODEL_FOR_AUDIO_XVECTOR_MAPPING_NAMES = {
  }

  MODEL_FOR_AUDIO_FRAME_CLASSIFICATION_MAPPING_NAMES = {
  }

  MODEL_FOR_IMAGE_MATTING_MAPPING_NAMES = {
  }

  MODEL_FOR_IMAGE_TO_IMAGE_MAPPING_NAMES = {
    "swin2sr" => ["Swin2SRForImageSuperResolution", Swin2SRForImageSuperResolution]
  }

  MODEL_FOR_DEPTH_ESTIMATION_MAPPING_NAMES = {
    "dpt" => ["DPTForDepthEstimation", DPTForDepthEstimation]
  }

  MODEL_FOR_IMAGE_FEATURE_EXTRACTION_MAPPING_NAMES = {
  }

  MODEL_CLASS_TYPE_MAPPING = [
    [MODEL_MAPPING_NAMES_ENCODER_ONLY, MODEL_TYPES[:EncoderOnly]],
    [MODEL_MAPPING_NAMES_ENCODER_DECODER, MODEL_TYPES[:EncoderDecoder]],
    [MODEL_MAPPING_NAMES_DECODER_ONLY, MODEL_TYPES[:DecoderOnly]],
    [MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES, MODEL_TYPES[:Seq2Seq]],
    [MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES, MODEL_TYPES[:Seq2Seq]],
    [MODEL_WITH_LM_HEAD_MAPPING_NAMES, MODEL_TYPES[:DecoderOnly]],
    [MODEL_FOR_MASKED_LM_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES, MODEL_TYPES[:Vision2Seq]],
    [MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_IMAGE_SEGMENTATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_IMAGE_MATTING_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_IMAGE_TO_IMAGE_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_DEPTH_ESTIMATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_OBJECT_DETECTION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_ZERO_SHOT_OBJECT_DETECTION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_MASK_GENERATION_MAPPING_NAMES, MODEL_TYPES[:MaskGeneration]],
    [MODEL_FOR_CTC_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_AUDIO_CLASSIFICATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_TEXT_TO_SPECTROGRAM_MAPPING_NAMES, MODEL_TYPES[:Seq2Seq]],
    [MODEL_FOR_TEXT_TO_WAVEFORM_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_AUDIO_XVECTOR_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_AUDIO_FRAME_CLASSIFICATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
    [MODEL_FOR_IMAGE_FEATURE_EXTRACTION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]]
  ]

  MODEL_CLASS_TYPE_MAPPING.each do |mappings, type|
    mappings.values.each do |name, model|
      MODEL_TYPE_MAPPING[name] = type
      MODEL_CLASS_TO_NAME_MAPPING[model] = name
      MODEL_NAME_TO_CLASS_MAPPING[name] = model
    end
  end

  class AutoModel < PretrainedMixin
    MODEL_CLASS_MAPPINGS = MODEL_CLASS_TYPE_MAPPING.map { |x| x[0] }
    BASE_IF_FAIL = true
  end

  class AutoModelForSequenceClassification < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES]
  end

  class AutoModelForTokenClassification < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES]
  end

  class AutoModelForSeq2SeqLM < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES]
  end

  class AutoModelForSpeechSeq2Seq < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES]
  end

  class AutoModelForTextToSpectrogram < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_TEXT_TO_SPECTROGRAM_MAPPING_NAMES]
  end

  class AutoModelForTextToWaveform < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_TEXT_TO_WAVEFORM_MAPPING_NAMES]
  end

  class AutoModelForCausalLM < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_WITH_LM_HEAD_MAPPING_NAMES]
  end

  class AutoModelForMaskedLM < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_MASKED_LM_MAPPING_NAMES]
  end

  class AutoModelForQuestionAnswering < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES]
  end

  class AutoModelForVision2Seq < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES]
  end

  class AutoModelForImageClassification < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES]
  end

  class AutoModelForImageSegmentation < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_IMAGE_SEGMENTATION_MAPPING_NAMES]
  end

  class AutoModelForSemanticSegmentation < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING_NAMES]
  end

  class AutoModelForObjectDetection < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_OBJECT_DETECTION_MAPPING_NAMES]
  end

  class AutoModelForZeroShotObjectDetection < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_ZERO_SHOT_OBJECT_DETECTION_MAPPING_NAMES]
  end

  class AutoModelForMaskGeneration < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_MASK_GENERATION_MAPPING_NAMES]
  end

  class AutoModelForCTC < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_CTC_MAPPING_NAMES]
  end

  class AutoModelForAudioClassification < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_AUDIO_CLASSIFICATION_MAPPING_NAMES]
  end

  class AutoModelForXVector < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_AUDIO_XVECTOR_MAPPING_NAMES]
  end

  class AutoModelForAudioFrameClassification < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_AUDIO_FRAME_CLASSIFICATION_MAPPING_NAMES]
  end

  class AutoModelForDocumentQuestionAnswering < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_DOCUMENT_QUESTION_ANSWERING_MAPPING_NAMES]
  end

  class AutoModelForImageMatting < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_IMAGE_MATTING_MAPPING_NAMES]
  end

  class AutoModelForImageToImage < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_IMAGE_TO_IMAGE_MAPPING_NAMES]
  end

  class AutoModelForDepthEstimation < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_DEPTH_ESTIMATION_MAPPING_NAMES]
  end

  class AutoModelForImageFeatureExtraction < PretrainedMixin
    MODEL_CLASS_MAPPINGS = [MODEL_FOR_IMAGE_FEATURE_EXTRACTION_MAPPING_NAMES]
  end

  class ModelOutput
    def [](key)
      instance_variable_get("@#{key}")
    end
  end

  class Seq2SeqLMOutput < ModelOutput
    def initialize(logits, past_key_values, encoder_outputs, decoder_attentions = nil, cross_attentions = nil)
      super()
      @logits = logits
      @past_key_values = past_key_values
      @encoder_outputs = encoder_outputs
      @decoder_attentions = decoder_attentions
      @cross_attentions = cross_attentions
    end
  end

  class SequenceClassifierOutput < ModelOutput
    attr_reader :logits

    def initialize(logits)
      super()
      @logits = logits
    end
  end

  class TokenClassifierOutput < ModelOutput
    attr_reader :logits

    def initialize(logits)
      super()
      @logits = logits
    end
  end

  class MaskedLMOutput < ModelOutput
    attr_reader :logits

    def initialize(logits)
      super()
      @logits = logits
    end
  end

  class QuestionAnsweringModelOutput < ModelOutput
    attr_reader :start_logits, :end_logits

    def initialize(start_logits, end_logits)
      super()
      @start_logits = start_logits
      @end_logits = end_logits
    end
  end

  class DetrObjectDetectionOutput < ModelOutput
    attr_reader :logits, :pred_boxes

    def initialize(logits, pred_boxes)
      super()
      @logits = logits
      @pred_boxes = pred_boxes
    end
  end

  class DetrSegmentationOutput < ModelOutput
    attr_reader :logits, :pred_boxes, :pred_masks

    def initialize(logits, pred_boxes, pred_masks)
      super()
      @logits = logits
      @pred_boxes = pred_boxes
      @pred_masks = pred_masks
    end
  end
end


================================================
FILE: lib/informers/pipelines.rb
================================================
module Informers
  class Pipeline
    def initialize(task:, model:, tokenizer: nil, processor: nil)
      super()
      @task = task
      @model = model
      @tokenizer = tokenizer
      @processor = processor
    end

    private

    def prepare_images(images)
      if !images.is_a?(Array)
        images = [images]
      end

      # Possibly convert any non-images to images
      images.map { |x| Utils::RawImage.read(x) }
    end

    def prepare_audios(audios, sampling_rate)
      if !audios.is_a?(Array)
        audios = [audios]
      end

      audios.map do |x|
        if x.is_a?(String) || x.is_a?(URI)
          Utils.read_audio(x, sampling_rate)
        else
          x
        end
      end
    end

    def get_bounding_box(box, as_integer)
      if as_integer
        box = box.map { |x| x.to_i }
      end
      xmin, ymin, xmax, ymax = box

      {xmin:, ymin:, xmax:, ymax:}
    end
  end

  class TextClassificationPipeline < Pipeline
    def call(texts, top_k: 1)
      # Run tokenization
      model_inputs = @tokenizer.(texts,
        padding: true,
        truncation: true
      )

      # Run model
      outputs = @model.(model_inputs)

      function_to_apply =
        if @model.config[:problem_type] == "multi_label_classification"
          ->(batch) { Utils.sigmoid(batch) }
        else
          ->(batch) { Utils.softmax(batch) } # single_label_classification (default)
        end

      id2label = @model.config[:id2label]

      to_return = []
      outputs.logits.each do |batch|
        output = function_to_apply.(batch)
        scores = Utils.get_top_items(output, top_k)

        vals = scores.map do |x|
          {
            label: id2label[x[0].to_s],
            score: x[1]
          }
        end
        if top_k == 1
          to_return.concat(vals)
        else
          to_return << vals
        end
      end

      texts.is_a?(Array) ? to_return : to_return[0]
    end
  end

  class TokenClassificationPipeline < Pipeline
    def call(
      texts,
      ignore_labels: ["O"],
      aggregation_strategy: "simple"
    )
      is_batched = texts.is_a?(Array)

      # Run tokenization
      model_inputs = @tokenizer.(is_batched ? texts : [texts],
        padding: true,
        truncation: true,
        return_offsets: true
      )

      # Run model
      outputs = @model.(model_inputs)

      logits = outputs.logits
      id2label = @model.config[:id2label]

      to_return = []
      logits.length.times do |i|
        ids = model_inputs[:input_ids][i]
        batch = logits[i]
        offsets = model_inputs[:offsets][i]

        # List of tokens that aren't ignored
        tokens = []
        batch.length.times do |j|
          token_data = batch[j]
          top_score_index = Utils.max(token_data)[1]

          entity = id2label ? id2label[top_score_index.to_s] : "LABEL_#{top_score_index}"
          if ignore_labels.include?(entity)
            # We predicted a token that should be ignored. So, we skip it.
            next
          end

          # TODO add option to keep special tokens?
          word = @tokenizer.decode([ids[j]], skip_special_tokens: true)
          if word == ""
            # Was a special token. So, we skip it.
            next
          end

          scores = Utils.softmax(token_data)

          tokens << {
            entity: entity,
            score: scores[top_score_index],
            index: j,
            word: word,
            start: offsets[j][0],
            end: offsets[j][1]
          }
        end

        case aggregation_strategy
        when "simple"
          tokens = group_entities(tokens)
        when "none"
          # do nothing
        else
          raise ArgumentError, "Invalid aggregation_strategy"
        end

        to_return << tokens
      end
      is_batched ? to_return : to_return[0]
    end

    def group_sub_entities(entities)
      # Get the first entity in the entity group
      entity = entities[0][:entity].split("-", 2)[-1]
      scores = entities.map { |entity| entity[:score] }
      tokens = entities.map { |entity| entity[:word] }

      entity_group = {
        entity_group: entity,
        score: scores.sum / scores.count.to_f,
        word: @tokenizer.convert_tokens_to_string(tokens),
        start: entities[0][:start],
        end: entities[-1][:end]
      }
      entity_group
    end

    def get_tag(entity_name)
      if entity_name.start_with?("B-")
        bi = "B"
        tag = entity_name[2..]
      elsif entity_name.start_with?("I-")
        bi = "I"
        tag = entity_name[2..]
      else
        # It's not in B-, I- format
        # Default to I- for continuation.
        bi = "I"
        tag = entity_name
      end
      [bi, tag]
    end

    def group_entities(entities)
      entity_groups = []
      entity_group_disagg = []

      entities.each do |entity|
        if entity_group_disagg.empty?
          entity_group_disagg << entity
          next
        end

        # If the current entity is similar and adjacent to the previous entity,
        # append it to the disaggregated entity group
        # The split is meant to account for the "B" and "I" prefixes
        # Shouldn't merge if both entities are B-type
        bi, tag = get_tag(entity[:entity])
        _last_bi, last_tag = get_tag(entity_group_disagg[-1][:entity])

        if tag == last_tag && bi != "B"
          # Modify subword type to be previous_type
          entity_group_disagg << entity
        else
          # If the current entity is different from the previous entity
          # aggregate the disaggregated entity group
          entity_groups << group_sub_entities(entity_group_disagg)
          entity_group_disagg = [entity]
        end
      end
      if entity_group_disagg.any?
        # it's the last entity, add it to the entity groups
        entity_groups << group_sub_entities(entity_group_disagg)
      end

      entity_groups
    end
  end

  class QuestionAnsweringPipeline < Pipeline
    def call(question, context, top_k: 1)
      # Run tokenization
      inputs = @tokenizer.(question,
        text_pair: context,
        padding: true,
        truncation: true,
        return_offsets: true
      )

      output = @model.(inputs)

      to_return = []
      output.start_logits.length.times do |j|
        ids = inputs[:input_ids][j]
        sep_index = ids.index(@tokenizer.sep_token_id)
        offsets = inputs[:offsets][j]

        s1 = Utils.softmax(output.start_logits[j])
          .map.with_index
          .select { |x| x[1] > sep_index }
        e1 = Utils.softmax(output.end_logits[j])
          .map.with_index
          .select { |x| x[1] > sep_index }

        options = s1.product(e1)
          .select { |x| x[0][1] <= x[1][1] }
          .map { |x| [x[0][1], x[1][1], x[0][0] * x[1][0]] }
          .sort_by { |v| -v[2] }

        [options.length, top_k].min.times do |k|
          start, end_, score = options[k]

          answer_tokens = ids.slice(start, end_ + 1)

          answer = @tokenizer.decode(answer_tokens,
            skip_special_tokens: true
          )

          to_return << {
            answer:,
            score:,
            start: offsets[start][0],
            end: offsets[end_][1]
          }
        end
      end

      question.is_a?(Array) ? to_return : to_return[0]
    end
  end

  class FillMaskPipeline < Pipeline
    def call(texts, top_k: 5)
      model_inputs = @tokenizer.(texts, padding: true, truncation: true)
      outputs = @model.(model_inputs)

      to_return = []
      model_inputs[:input_ids].each_with_index do |ids, i|
        mask_token_index = ids.index(@tokenizer.mask_token_id)

        if mask_token_index.nil?
          raise ArgumentError, "Mask token (#{@tokenizer.mask_token}) not found in text."
        end
        logits = outputs.logits[i]
        item_logits = logits[mask_token_index]

        scores = Utils.get_top_items(Utils.softmax(item_logits), top_k)

        to_return <<
          scores.map do |x|
            sequence = ids.dup
            sequence[mask_token_index] = x[0]

            {
              score: x[1],
              token: x[0],
              token_str: @tokenizer.id_to_token(x[0]),
              sequence: @tokenizer.decode(sequence, skip_special_tokens: true)
            }
          end
      end
      texts.is_a?(Array) ? to_return : to_return[0]
    end
  end

  class Text2TextGenerationPipeline < Pipeline
    KEY = :generated_text

    def call(texts, **generate_kwargs)
      if !texts.is_a?(Array)
        texts = [texts]
      end

      # Add global prefix, if present
      if @model.config[:prefix]
        texts = texts.map { |x| @model.config[:prefix] + x }
      end

      # Handle task specific params:
      task_specific_params = @model.config[:task_specific_params]
      if task_specific_params && task_specific_params[@task]
        # Add prefixes, if present
        if task_specific_params[@task]["prefix"]
          texts = texts.map { |x| task_specific_params[@task]["prefix"] + x }
        end

        # TODO update generation config
      end

      tokenizer = @tokenizer
      tokenizer_options = {
        padding: true,
        truncation: true
      }
      if is_a?(TranslationPipeline) && tokenizer.respond_to?(:_build_translation_inputs)
        input_ids = tokenizer._build_translation_inputs(texts, tokenizer_options, generate_kwargs)[:input_ids]
      else
        input_ids = tokenizer.(texts, **tokenizer_options)[:input_ids]
      end

      output_token_ids = @model.generate(input_ids, generate_kwargs)

      tokenizer.batch_decode(output_token_ids, skip_special_tokens: true)
        .map { |text| {self.class.const_get(:KEY) => text} }
    end
  end

  class SummarizationPipeline < Text2TextGenerationPipeline
    KEY = :summary_text
  end

  class TranslationPipeline < Text2TextGenerationPipeline
    KEY = :translation_text
  end

  class TextGenerationPipeline < Pipeline
    def call(texts, **generate_kwargs)
      is_batched = false
      is_chat_input = false

      # Normalize inputs
      if texts.is_a?(String)
        texts = [texts]
        inputs = texts
      else
        raise Todo
      end

      # By default, do not add special tokens
      add_special_tokens = generate_kwargs[:add_special_tokens] || false

      # /By default, return full text
      return_full_text =
        if is_chat_input
          false
        else
          generate_kwargs[:return_full_text] || true
        end

      @tokenizer.padding_side = "left"
      input_ids, attention_mask =
        @tokenizer.(inputs, add_special_tokens:, padding: true, truncation: true)
          .values_at(:input_ids, :attention_mask)

      output_token_ids =
        @model.generate(
          input_ids, generate_kwargs, nil, inputs_attention_mask: attention_mask
        )

      decoded = @tokenizer.batch_decode(output_token_ids, skip_special_tokens: true)

      if !return_full_text && Utils.dims(input_ids)[-1] > 0
        prompt_lengths = @tokenizer.batch_decode(input_ids, skip_special_tokens: true).map { |x| x.length }
      end

      to_return = Array.new(texts.length) { [] }
      decoded.length.times do |i|
        text_index = (i / output_token_ids.length.to_i * texts.length).floor

        if prompt_lengths
          raise Todo
        end
        # TODO is_chat_input
        to_return[text_index] << {
          generated_text: decoded[i]
        }
      end
      !is_batched && to_return.length == 1 ? to_return[0] : to_return
    end
  end

  class ZeroShotClassificationPipeline < Pipeline
    def initialize(**options)
      super(**options)

      @label2id = @model.config[:label2id].transform_keys(&:downcase)

      @entailment_id = @label2id["entailment"]
      if @entailment_id.nil?
        warn "Could not find 'entailment' in label2id mapping. Using 2 as entailment_id."
        @entailment_id = 2
      end

      @contradiction_id = @label2id["contradiction"] || @label2id["not_entailment"]
      if @contradiction_id.nil?
        warn "Could not find 'contradiction' in label2id mapping. Using 0 as contradiction_id."
        @contradiction_id = 0
      end
    end

    def call(texts, candidate_labels, hypothesis_template: "This example is {}.", multi_label: false)
      is_batched = texts.is_a?(Array)
      if !is_batched
        texts = [texts]
      end
      if !candidate_labels.is_a?(Array)
        candidate_labels = [candidate_labels]
      end

      # Insert labels into hypothesis template
      hypotheses = candidate_labels.map { |x| hypothesis_template.sub("{}", x) }

      # How to perform the softmax over the logits:
      #  - true:  softmax over the entailment vs. contradiction dim for each label independently
      #  - false: softmax the "entailment" logits over all candidate labels
      softmax_each = multi_label || candidate_labels.length == 1

      to_return = []
      texts.each do |premise|
        entails_logits = []

        hypotheses.each do |hypothesis|
          inputs = @tokenizer.(
            premise,
            text_pair: hypothesis,
            padding: true,
            truncation: true
          )
          outputs = @model.(inputs)

          if softmax_each
            entails_logits << [
              outputs.logits[0][@contradiction_id],
              outputs.logits[0][@entailment_id]
            ]
          else
            entails_logits << outputs.logits[0][@entailment_id]
          end
        end

        scores =
          if softmax_each
            entails_logits.map { |x| Utils.softmax(x)[1] }
          else
            Utils.softmax(entails_logits)
          end

        # Sort by scores (desc) and return scores with indices
        scores_sorted = scores.map.with_index { |x, i| [x, i] }.sort_by { |v| -v[0] }

        to_return << {
          sequence: premise,
          labels: scores_sorted.map { |x| candidate_labels[x[1]] },
          scores: scores_sorted.map { |x| x[0] }
        }
      end
      is_batched ? to_return : to_return[0]
    end
  end

  class ImageToTextPipeline < Pipeline
    def call(images, **generate_kwargs)
      is_batched = images.is_a?(Array)
      prepared_images = prepare_images(images)

      pixel_values = @processor.(prepared_images)[:pixel_values]

      to_return = []
      pixel_values.each do |batch|
        batch = [batch]
        output = @model.generate(batch, **generate_kwargs)
        decoded = @tokenizer
          .batch_decode(output, skip_special_tokens: true)
          .map { |x| {generated_text: x.strip} }
        to_return << decoded
      end

      is_batched ? to_return : to_return[0]
    end
  end

  class ImageClassificationPipeline < Pipeline
    def call(images, top_k: 1)
      is_batched = images.is_a?(Array)
      prepared_images = prepare_images(images)

      pixel_values = @processor.(prepared_images)[:pixel_values]
      output = @model.({pixel_values: pixel_values})

      id2label = @model.config[:id2label]
      to_return = []
      output.logits.each do |batch|
        scores = Utils.get_top_items(Utils.softmax(batch), top_k)

        vals =
          scores.map do |x|
            {
              label: id2label[x[0].to_s],
              score: x[1]
            }
          end
        if top_k == 1
          to_return.push(*vals)
        else
          to_return << vals
        end
      end

      is_batched || top_k == 1 ? to_return : to_return[0]
    end
  end

  class ImageSegmentationPipeline < Pipeline
    def initialize(**options)
      super(**options)

      @subtasks_mapping = {
        "panoptic" => "post_process_panoptic_segmentation",
        "instance" => "post_process_instance_segmentation",
        "semantic" => "post_process_semantic_segmentation"
      }
    end

    def call(
      images,
      threshold: 0.5,
      mask_threshold: 0.5,
      overlap_mask_area_threshold: 0.8,
      label_ids_to_fuse: nil,
      target_sizes: nil,
      subtask: nil
    )
      is_batched = images.is_a?(Array)

      if is_batched && images.length != 1
        raise Error, "Image segmentation pipeline currently only supports a batch size of 1."
      end

      prepared_images = prepare_images(images)
      image_sizes = prepared_images.map { |x| [x.height, x.width] }

      model_inputs = @processor.(prepared_images).slice(:pixel_values, :pixel_mask)
      output = @model.(model_inputs)

      if !subtask.nil?
        fn = @subtasks_mapping[subtask]
      else
        @subtasks_mapping.each do |task, func|
          if @processor.feature_extractor.respond_to?(func)
            fn = @processor.feature_extractor.method(func)
            subtask = task
            break
          end
        end
      end

      id2label = @model.config[:id2label]

      annotation = []
      if subtask == "panoptic" || subtask == "instance"
        processed = fn.(
          output,
          threshold:,
          mask_threshold:,
          overlap_mask_area_threshold:,
          label_ids_to_fuse:,
          target_sizes: target_sizes || image_sizes, # TODO FIX?
        )[0]

        _segmentation = processed[:segmentation]

        processed[:segments_info].each do |segment|
          annotation << {
            label: id2label[segment[:label_id].to_s],
            score: segment[:score]
            # TODO mask
          }
        end
      elsif subtask == "semantic"
        raise Todo
      else
        raise Error, "Subtask #{subtask} not supported."
      end

      annotation
    end
  end

  class ZeroShotImageClassificationPipeline < Pipeline
    def call(images, candidate_labels, hypothesis_template: "This is a photo of {}")
      is_batched = images.is_a?(Array)
      prepared_images = prepare_images(images)

      # Insert label into hypothesis template
      texts = candidate_labels.map { |x| hypothesis_template.sub("{}", x) }

      #  Run tokenization
      text_inputs = @tokenizer.(texts,
        padding: @model.config[:model_type] == "siglip" ? "max_length" : true,
        truncation: true
      )

      # Run processor
      pixel_values = @processor.(prepared_images)[:pixel_values]

      # Run model with both text and pixel inputs
      output = @model.(text_inputs.merge(pixel_values: pixel_values))

      function_to_apply =
        if @model.config[:model_type] == "siglip"
          ->(batch) { Utils.sigmoid(batch) }
        else
          ->(batch) { Utils.softmax(batch) }
        end

      # Compare each image with each candidate label
      to_return = []
      output[0].each do |batch|
        # Compute softmax per image
        probs = function_to_apply.(batch)

        result = probs
          .map.with_index { |x, i| {label: candidate_labels[i], score: x} }
          .sort_by { |v| -v[:score] }

        to_return << result
      end

      is_batched ? to_return : to_return[0]
    end
  end

  class ObjectDetectionPipeline < Pipeline
    def call(images, threshold: 0.9, percentage: false)
      is_batched = images.is_a?(Array)

      if is_batched && images.length != 1
        raise Error, "Object detection pipeline currently only supports a batch size of 1."
      end
      prepared_images = prepare_images(images)

      image_sizes = percentage ? nil : prepared_images.map { |x| [x.height, x.width] }

      model_inputs = @processor.(prepared_images).slice(:pixel_values, :pixel_mask)
      output = @model.(model_inputs)

      processed = @processor.feature_extractor.post_process_object_detection(output, threshold, image_sizes)

      # Add labels
      id2label = @model.config[:id2label]

      # Format output
      result =
        processed.map do |batch|
          batch[:boxes].map.with_index do |box, i|
            {
              label: id2label[batch[:classes][i].to_s],
              score: batch[:scores][i],
              box: get_bounding_box(box, !percentage)
            }
          end.sort_by { |v| -v[:score] }
        end

      is_batched ? result : result[0]
    end
  end

  class ZeroShotObjectDetectionPipeline < Pipeline
    def call(
      images,
      candidate_labels,
      threshold: 0.1,
      top_k: nil,
      percentage: false
    )
      is_batched = images.is_a?(Array)
      prepared_images = prepare_images(images)

      # Run tokenization
      text_inputs = @tokenizer.(candidate_labels,
        padding: true,
        truncation: true
      )

      # Run processor
      model_inputs = @processor.(prepared_images)

      # Since non-maximum suppression is performed for exporting, we need to
      # process each image separately. For more information, see:
      # https://github.com/huggingface/optimum/blob/e3b7efb1257c011db907ef40ab340e795cc5684c/optimum/exporters/onnx/model_configs.py#L1028-L1032
      to_return = []
      prepared_images.length.times do |i|
        image = prepared_images[i]
        image_size = percentage ? nil : [[image.height, image.width]]
        pixel_values = [model_inputs[:pixel_values][i]]

        # Run model with both text and pixel inputs
        output = @model.(text_inputs.merge(pixel_values: pixel_values))
        # TODO remove
        output = @model.instance_variable_get(:@session).outputs.map { |v| v[:name].to_sym }.zip(output).to_h

        processed = @processor.feature_extractor.post_process_object_detection(output, threshold, image_size, true)[0]
        result =
          processed[:boxes].map.with_index do |box, i|
            {
              label: candidate_labels[processed[:classes][i]],
              score: processed[:scores][i],
              box: get_bounding_box(box, !percentage)
            }
          end
        result.sort_by! { |v| -v[:score] }
        if !top_k.nil?
          result = result[0...topk]
        end
        to_return << result
      end

      is_batched ? to_return : to_return[0]
    end
  end

  class DocumentQuestionAnsweringPipeline < Pipeline
    def call(image, question, **generate_kwargs)
      # NOTE: For now, we only support a batch size of 1

      # Preprocess image
      prepared_image = prepare_images(image)[0]
      pixel_values = @processor.(prepared_image)[:pixel_values]

      # Run tokenization
      task_prompt = "<s_docvqa><s_question>#{question}</s_question><s_answer>"
      decoder_input_ids =
        @tokenizer.(
          task_prompt,
          add_special_tokens: false,
          padding: true,
          truncation: true
        )[:input_ids]

      # Run model
      output =
        @model.generate(
          pixel_values,
          generate_kwargs.merge(
            decoder_input_ids: decoder_input_ids[0],
            max_length: @model.config["decoder"]["max_position_embeddings"]
          ).transform_keys(&:to_s)
        )

      # Decode output
      decoded = @tokenizer.batch_decode(output, skip_special_tokens: false)[0]

      # Parse answer
      match = decoded.match(/<s_answer>(.*?)<\/s_answer>/)
      answer = nil
      if match && match.length >= 2
        answer = match[1].strip
      end
      [{answer:}]
    end
  end

  class TextToAudioPipeline < Pipeline
    DEFAULT_VOCODER_ID = "Xenova/speecht5_hifigan"

    def initialize(**options)
      super(**options)

      # TODO: Find a better way for `pipeline` to set the default vocoder
      @vocoder = options[:vocoder]
    end

    def call(text_inputs, speaker_embeddings: nil)
      # If this.processor is not set, we are using a `AutoModelForTextToWaveform` model
      if @processor
        call_text_to_spectrogram(text_inputs, speaker_embeddings:)
      else
        call_text_to_waveform(text_inputs)
      end
    end
  end

  class FeatureExtractionPipeline < Pipeline
    def call(
      texts,
      pooling: "none",
      normalize: false,
      quantize: false,
      precision: "binary",
      model_output: nil
    )
      # Run tokenization
      model_inputs = @tokenizer.(texts,
        padding: true,
        truncation: true
      )
      model_options = {}

      if !model_output.nil?
        model_options[:output_names] = Array(model_output)
      elsif @model.instance_variable_get(:@output_names) == ["token_embeddings"] && pooling == "mean" && normalize
        # optimization for previous revision of sentence-transformers/all-MiniLM-L6-v2
        model_options[:output_names] = ["sentence_embedding"]
        pooling = "none"
        normalize = false
      end

      # Run model
      outputs = @model.(model_inputs, **model_options)

      # TODO improve
      result =
        if outputs.is_a?(Array)
          # TODO show returned instead of all
          output_names = @model.instance_variable_get(:@session).outputs.map { |v| v[:name] }
          raise Error, "unexpected outputs: #{output_names}" if outputs.size != 1
          outputs[0]
        else
          outputs.logits
        end

      case pooling
      when "none"
        # Skip pooling
      when "mean"
        result = Utils.mean_pooling(result, model_inputs[:attention_mask])
      when "cls"
        result = result.map(&:first)
      else
        # TODO raise ArgumentError in 2.0
        raise Error, "Pooling method '#{pooling}' not supported."
      end

      if normalize
        result = Utils.normalize(result)
      end

      if quantize
        result = quantize_embeddings(result, precision)
      end

      texts.is_a?(Array) ? result : result[0]
    end
  end

  class ImageFeatureExtractionPipeline < Pipeline
    def call(images)
      prepared_images = prepare_images(images)
      pixel_values = @processor.(prepared_images)[:pixel_values]
      outputs = @model.({pixel_values: pixel_values})

      result = outputs[0]
      result
    end
  end

  class AudioClassificationPipeline < Pipeline
    def call(audio, top_k: nil)
      single = !audio.is_a?(Array)

      sampling_rate = @processor.feature_extractor.config["sampling_rate"]
      prepared_audios = prepare_audios(audio, sampling_rate)

      id2label = @model.config[:id2label]

      to_return = []
      prepared_audios.each do |aud|
        inputs = @processor.(aud)
        output = @model.(inputs)
        logits = output.logits[0]

        scores = Utils.get_top_items(Utils.softmax(logits), top_k)

        vals =
          scores.map do |x|
            {
              label: id2label[x[0].to_s],
              score: x[1]
            }
          end

        if top_k == 1
          to_return.concat(vals)
        else
          to_return << vals
        end
      end
      !single || top_k == 1 ? to_return : to_return[0]
    end
  end

  class ZeroShotAudioClassificationPipeline < Pipeline
    def call(audio, candidate_labels, hypothesis_template: "This is a sound of {}.")
      single = !audio.is_a?(Array)
      if single
        audio = [audio]
      end

      # Insert label into hypothesis template
      texts = candidate_labels.map { |x| hypothesis_template.sub("{}", x) }

      # Run tokenization
      text_inputs =
        @tokenizer.(
          texts,
          padding: true,
          truncation: true
        )

      sampling_rate = @processor.feature_extractor.config["sampling_rate"]
      prepared_audios = prepare_audios(audio, sampling_rate)

      to_return = []
      prepared_audios.each do |aud|
        audio_inputs = @processor.(aud)

        # Run model with both text and audio inputs
        output = @model.(text_inputs.merge(audio_inputs))

        # Compute softmax per audio
        probs = Utils.softmax(output.logits_per_audio.data)

        to_return <<
          probs.map.with_index do |x, i|
            {
              label: candidate_labels[i],
              score: x
            }
          end
      end
      single ? to_return[0] : to_return
    end
  end

  class AutomaticSpeechRecognitionPipeline < Pipeline
    def call(audio, **kwargs)
      case @model.config["model_type"]
      when "whisper"
        call_whisper(audio, **kwargs)
      else
        raise Error, "AutomaticSpeechRecognitionPipeline does not support model type '#{@model.config["model_type"]}'."
      end
    end

    private

    def call_whisper(audio, **kwargs)
      raise Todo
    end
  end

  class ImageToImagePipeline < Pipeline
    def call(images)
      prepared_images = prepare_images(images)
      inputs = @processor.(prepared_images)
      outputs = @model.(inputs)

      to_return = []
      outputs[0].each do |batch|
        # TODO flatten first
        output =
          batch.map do |v|
            v.map do |v2|
              v2.map do |v3|
                (v3.clamp(0, 1) * 255).round
              end
            end
          end
        to_return << Utils::RawImage.from_array(output).image
      end

      to_return.length > 1 ? to_return : to_return[0]
    end
  end

  class DepthEstimationPipeline < Pipeline
    def call(images)
      prepared_images = prepare_images(images)

      inputs = @processor.(prepared_images)
      predicted_depth = @model.(inputs)[0]

      to_return = []
      prepared_images.length.times do |i|
        prediction = Utils.interpolate(predicted_depth[i], prepared_images[i].size.reverse, "bilinear", false)
        max_prediction = Utils.max(prediction.flatten)[0]
        formatted =
          prediction.map do |v|
            v.map do |v2|
              v2.map do |v3|
                (v3 * 255 / max_prediction).round
              end
            end
          end
        to_return << {
          predicted_depth: predicted_depth[i],
          depth: Utils::RawImage.from_array(formatted).image
        }
      end
      to_return.length > 1 ? to_return : to_return[0]
    end
  end

  class EmbeddingPipeline < FeatureExtractionPipeline
    def call(
      texts,
      pooling: "mean",
      normalize: true,
      model_output: nil
    )
      super(texts, pooling:, normalize:, model_output:)
    end
  end

  class RerankingPipeline < Pipeline
    def call(
      query,
      documents,
      return_documents: false,
      top_k: nil
    )
      model_inputs = @tokenizer.([query] * documents.size,
        text_pair: documents,
        padding: true,
        truncation: true
      )

      outputs = @model.(model_inputs)

      result =
        Utils.sigmoid(outputs[0].map(&:first))
          .map.with_index { |s, i| {doc_id: i, score: s} }
          .sort_by { |v| -v[:score] }

      if return_documents
        result.each do |v|
          v[:text] = documents[v[:doc_id]]
        end
      end

      top_k ? result.first(top_k) : result
    end
  end

  SUPPORTED_TASKS = {
    "text-classification" => {
      tokenizer: AutoTokenizer,
      pipeline: TextClassificationPipeline,
      model: AutoModelForSequenceClassification,
      default: {
        model: "Xenova/distilbert-base-uncased-finetuned-sst-2-english"
      },
      type: "text"
    },
    "token-classification" => {
      tokenizer: AutoTokenizer,
      pipeline: TokenClassificationPipeline,
      model: AutoModelForTokenClassification,
      default: {
        model: "Xenova/bert-base-multilingual-cased-ner-hrl"
      },
      type: "text"
    },
    "question-answering" => {
      tokenizer: AutoTokenizer,
      pipeline: QuestionAnsweringPipeline,
      model: AutoModelForQuestionAnswering,
      default: {
        model: "Xenova/distilbert-base-cased-distilled-squad"
      },
      type: "text"
    },
    "fill-mask" => {
      tokenizer: AutoTokenizer,
      pipeline: FillMaskPipeline,
      model: AutoModelForMaskedLM,
      default: {
        model: "Xenova/bert-base-uncased"
      },
      type: "text"
    },
    "summarization" => {
      tokenizer: AutoTokenizer,
      pipeline: SummarizationPipeline,
      model: AutoModelForSeq2SeqLM,
      default: {
        model: "Xenova/distilbart-cnn-6-6"
      },
      type: "text"
    },
    "translation" => {
      tokenizer: AutoTokenizer,
      pipeline: TranslationPipeline,
      model: AutoModelForSeq2SeqLM,
      default: {
        model: "Xenova/t5-small"
      },
      type: "text"
    },
    "text2text-generation" => {
      tokenizer: AutoTokenizer,
      pipeline: Text2TextGenerationPipeline,
      model: AutoModelForSeq2SeqLM,
      default: {
        model: "Xenova/flan-t5-small"
      },
      type: "text"
    },
    "text-generation" => {
      tokenizer: AutoTokenizer,
      pipeline: TextGenerationPipeline,
      model: AutoModelForCausalLM,
      default: {
        model: "Xenova/gpt2"
      },
      type: "text"
    },
    "zero-shot-classification" => {
      tokenizer: AutoTokenizer,
      pipeline: ZeroShotClassificationPipeline,
      model: AutoModelForSequenceClassification,
      default: {
        model: "Xenova/distilbert-base-uncased-mnli"
      },
      type: "text"
    },
    "audio-classification" => {
      pipeline: AudioClassificationPipeline,
      model: AutoModelForAudioClassification,
      processor: AutoProcessor,
      default: {
        model: "Xenova/wav2vec2-base-superb-ks"
      },
      type: "audio"
    },
    # TODO
    # "zero-shot-audio-classification" => {
    #   tokenizer: AutoTokenizer,
    #   pipeline: ZeroShotAudioClassificationPipeline,
    #   model: AutoModel,
    #   processor: AutoProcessor,
    #   default: {
    #      model: "Xenova/clap-htsat-unfused"
    #   },
    #   type: "multimodal"
    # },
    # TODO
    # "automatic-speech-recognition" => {
    #   tokenizer: AutoTokenizer,
    #   pipeline: AutomaticSpeechRecognitionPipeline,
    #   model: [AutoModelForSpeechSeq2Seq, AutoModelForCTC],
    #   processor: AutoProcessor,
    #   default: {
    #     model: "Xenova/whisper-tiny.en"
    #   },
    #   type: "multimodal"
    # },
    "text-to-audio" => {
      tokenizer: AutoTokenizer,
      pipeline: TextToAudioPipeline,
      model: [AutoModelForTextToWaveform, AutoModelForTextToSpectrogram],
      processor: [AutoProcessor, nil],
      default: {
        model: "Xenova/speecht5_tts"
      },
      type: "text"
    },
    "image-to-text" => {
      tokenizer: AutoTokenizer,
      pipeline: ImageToTextPipeline,
      model: AutoModelForVision2Seq,
      processor: AutoProcessor,
      default: {
        model: "Xenova/vit-gpt2-image-captioning"
      },
      type: "multimodal"
    },
    "image-classification" => {
      pipeline: ImageClassificationPipeline,
      model: AutoModelForImageClassification,
      processor: AutoProcessor,
      default: {
        model: "Xenova/vit-base-patch16-224"
      },
      type: "multimodal"
    },
    "image-segmentation" => {
      pipeline: ImageSegmentationPipeline,
      model: [AutoModelForImageSegmentation, AutoModelForSemanticSegmentation],
      processor: AutoProcessor,
      default: {
        model: "Xenova/detr-resnet-50-panoptic"
      },
      type: "multimodal"
    },
    "zero-shot-image-classification" => {
      tokenizer: AutoTokenizer,
      pipeline: ZeroShotImageClassificationPipeline,
      model: AutoModel,
      processor: AutoProcessor,
      default: {
        model: "Xenova/clip-vit-base-patch32"
      },
      type: "multimodal"
    },
    "object-detection" => {
      pipeline: ObjectDetectionPipeline,
      model: AutoModelForObjectDetection,
      processor: AutoProcessor,
      default: {
        model: "Xenova/detr-resnet-50"
      },
      type: "multimodal"
    },
    "zero-shot-object-detection" => {
      tokenizer: AutoTokenizer,
      pipeline: ZeroShotObjectDetectionPipeline,
      model: AutoModelForZeroShotObjectDetection,
      processor: AutoProcessor,
      default: {
        model: "Xenova/owlvit-base-patch32"
      },
      type: "multimodal"
    },
    "document-question-answering" => {
      tokenizer: AutoTokenizer,
      pipeline: DocumentQuestionAnsweringPipeline,
      model: AutoModelForDocumentQuestionAnswering,
      processor: AutoProcessor,
      default: {
        model: "Xenova/donut-base-finetuned-docvqa"
      },
      type: "multimodal"
    },
    "image-to-image" => {
      pipeline: ImageToImagePipeline,
      model: AutoModelForImageToImage,
      processor: AutoProcessor,
      default: {
        model: "Xenova/swin2SR-classical-sr-x2-64"
      },
      type: "image"
    },
    "depth-estimation" => {
      pipeline: DepthEstimationPipeline,
      model: AutoModelForDepthEstimation,
      processor: AutoProcessor,
      default: {
        model: "Xenova/dpt-large"
      },
      type: "image"
    },
    "feature-extraction" => {
      tokenizer: AutoTokenizer,
      pipeline: FeatureExtractionPipeline,
      model: AutoModel,
      default: {
        model: "Xenova/all-MiniLM-L6-v2"
      },
      type: "text"
    },
    "image-feature-extraction" => {
      processor: AutoProcessor,
      pipeline: ImageFeatureExtractionPipeline,
      model: [AutoModelForImageFeatureExtraction, AutoModel],
      default: {
        model: "Xenova/vit-base-patch16-224"
      },
      type: "image"
    },
    "embedding" => {
      tokenizer: AutoTokenizer,
      pipeline: EmbeddingPipeline,
      model: AutoModel,
      default: {
        model: "sentence-transformers/all-MiniLM-L6-v2"
      },
      type: "text"
    },
    "reranking" => {
      tokenizer: AutoTokenizer,
      pipeline: RerankingPipeline,
      model: AutoModel,
      default: {
        model: "mixedbread-ai/mxbai-rerank-base-v1"
      },
      type: "text"
    }
  }

  TASK_ALIASES = {
    "sentiment-analysis" => "text-classification",
    "ner" => "token-classification",
    "text-to-speech" => "text-to-audio"
  }

  DEFAULT_PROGRESS_CALLBACK = lambda do |msg|
    stream = $stderr
    tty = stream.tty?
    width = tty ? stream.winsize[1] : 80
    width = 80 if width == 0

    if msg[:status] == "progress" && tty
      stream.print "\r#{Utils::Hub.display_progress(msg[:file], width, msg[:size], msg[:total_size])}"
    elsif msg[:status] == "done" && !msg[:cache_hit]
      if tty
        stream.puts
      else
        stream.puts Utils::Hub.display_progress(msg[:file], width, 1, 1)
      end
    end
  end

  NO_DEFAULT = Object.new

  class << self
    def pipeline(
      task,
      model = nil,
      quantized: NO_DEFAULT,
      progress_callback: DEFAULT_PROGRESS_CALLBACK,
      config: nil,
      cache_dir: nil,
      local_files_only: false,
      revision: "main",
      device: nil,
      dtype: nil,
      model_file_name: nil,
      session_options: {}
    )
      # Apply aliases
      task = TASK_ALIASES[task] || task

      if quantized == NO_DEFAULT
        # TODO no quantization by default in 2.0
        quantized = ["text-classification", "token-classification", "question-answering", "feature-extraction"].include?(task)
      end

      # Get pipeline info
      pipeline_info = SUPPORTED_TASKS[task.split("_", 1)[0]]
      if !pipeline_info
        raise Error, "Unsupported pipeline: #{task}. Must be one of #{SUPPORTED_TASKS.keys}"
      end

      # Use model if specified, otherwise, use default
      if !model
        model = pipeline_info[:default][:model]
        warn "No model specified. Using default model: #{model.inspect}."
      end

      pretrained_options = {
        quantized:,
        progress_callback:,
        config:,
        cache_dir:,
        local_files_only:,
        revision:,
        device:,
        dtype:,
        model_file_name:,
        session_options:
      }

      classes = {
        tokenizer: pipeline_info[:tokenizer],
        model: pipeline_info[:model],
        processor: pipeline_info[:processor]
      }

      # Load model, tokenizer, and processor (if they exist)
      results = load_items(classes, model, pretrained_options)
      results[:task] = task

      # for previous revision of sentence-transformers/all-MiniLM-L6-v2
      if model == "sentence-transformers/all-MiniLM-L6-v2" && results[:model].instance_variable_get(:@session).outputs.any? { |v| v[:name] == "token_embeddings" }
        results[:model].instance_variable_set(:@output_names, ["token_embeddings"])
      end

      Utils.dispatch_callback(progress_callback, {
        status: "ready",
        task: task,
        model: model
      })

      pipeline_class = pipeline_info.fetch(:pipeline)
      pipeline_class.new(**results)
    end

    private

    def load_items(mapping, model, pretrained_options)
      result = {}

      mapping.each do |name, cls|
        next if !cls

        if cls.is_a?(Array)
          e = nil
          cls.each do |c|
            begin
              result[name] = c.from_pretrained(model, **pretrained_options)
            rescue => err
              e = err
            end
          end
          raise e unless result[name]
        else
          result[name] = cls.from_pretrained(model, **pretrained_options)
        end
      end

      result
    end
  end
end


================================================
FILE: lib/informers/processors.rb
================================================
module Informers
  class FeatureExtractor
    attr_reader :config

    def initialize(config)
      super()
      @config = config
    end
  end

  class ImageFeatureExtractor < FeatureExtractor
    def initialize(config)
      super(config)

      @image_mean = @config["image_mean"] || @config["mean"]
      @image_std = @config["image_std"] || @config["std"]

      @resample = @config["resample"] || 2 # 2 => bilinear
      @do_rescale = @config.fetch("do_rescale", true)
      @rescale_factor = @config["rescale_factor"] || (1 / 255.0)
      @do_normalize = @config["do_normalize"]

      @do_resize = @config["do_resize"]
      @do_thumbnail = @config["do_thumbnail"]
      @size = @config["size"]
      @size_divisibility = @config["size_divisibility"] || @config["size_divisor"]

      @do_center_crop = @config["do_center_crop"]
      @crop_size = @config["crop_size"]
      @do_convert_rgb = @config.fetch("do_convert_rgb", true)
      @do_crop_margin = @config["do_crop_margin"]

      @pad_size = @config["pad_size"]
      @do_pad = @config["do_pad"]

      if @do_pad && !@pad_size && @size && !@size["width"].nil? && !@size["height"].nil?
        # Should pad, but no pad size specified
        # We infer the pad size from the resize size
        @pad_size = @size
      end

      @do_flip_channel_order = @config["do_flip_channel_order"] || false
    end

    def thumbnail(image, size, resample = 2)
      input_height = image.height
      input_width = image.width

      output_height = size["height"]
      output_width = size["width"]

      # We always resize to the smallest of either the input or output size.
      height = [input_height, output_height].min
      width = [input_width, output_width].min

      if height == input_height && width == input_width
        return image
      end
      if input_height > input_width
        width = (input_width * height / input_height).floor
      elsif input_width > input_height
        height = (input_height * width / input_width).floor
      end
      image.resize(width, height, resample:)
    end

    def pad_image(
      pixel_data,
      img_dims,
      pad_size,
      mode: "constant",
      center: false,
      constant_values: 0
    )
      image_height, image_width, image_channels = img_dims

      if pad_size.is_a?(Numeric)
        padded_image_width = pad_size
        padded_image_height = pad_size
      else
        padded_image_width = pad_size[:width] || pad_size["width"]
        padded_image_height = pad_size[:height] || pad_size["height"]
      end

      # Only add padding if there is a difference in size
      if padded_image_width != image_width || padded_image_height != image_height
        padded_pixel_data = Array.new(padded_image_width * padded_image_height * image_channels)
        if constant_values.is_a?(Array)
          # Fill with constant values, cycling through the array
          padded_pixel_data.length.times do |i|
            padded_pixel_data[i] = constant_values[i % image_channels]
          end
        elsif constant_values != 0
          padded_pixel_data.fill(constant_values)
        end

        left, top =
          if center
            [((padded_image_width - image_width) / 2.0).floor, ((padded_image_height - image_height) / 2.0).floor]
          else
            [0, 0]
          end

        # Copy the original image into the padded image
        image_height.times do |i|
          a = (i + top) * padded_image_width
          b = i * image_width
          image_width.times do |j|
            c = (a + j + left) * image_channels
            d = (b + j) * image_channels
            image_channels.times do |k|
              padded_pixel_data[c + k] = pixel_data[d + k]
            end
          end
        end

        if mode == "symmetric"
          if center
            raise Error, "`center` padding is not supported when `mode` is set to `symmetric`."
          end
          h1 = image_height - 1
          w1 = image_width - 1
          padded_image_height.times do |i|
            a = i * padded_image_width
            b = Utils.calculate_reflect_offset(i, h1) * image_width

            padded_image_width.times do |j|
              next if i < image_height && j < image_width # Do not overwrite original image
              c = (a + j) * image_channels
              d = (b + Utils.calculate_reflect_offset(j, w1)) * image_channels

              # Copy channel-wise
              image_channels.times do |k|
                padded_pixel_data[c + k] = pixel_data[d + k]
              end
            end
          end
        end

        # Update pixel data and image dimensions
        pixel_data = padded_pixel_data
        img_dims = [padded_image_height, padded_image_width, image_channels]
      end
      [pixel_data, img_dims]
    end

    def rescale(pixel_data)
      pixel_data.length.times do |i|
        pixel_data[i] *= @rescale_factor
      end
    end

    def get_resize_output_image_size(image, size)
      src_width, src_height = image.size

      if @do_thumbnail
        # NOTE: custom logic for `Donut` models
        height = size["height"]
        width = size["width"]
        shortest_edge = [height, width].min
      elsif size.is_a?(Numeric)
        shortest_edge = size
        longest_edge = @config["max_size"] || shortest_edge
      elsif !size.nil?
        # Extract known properties from `size`
        shortest_edge = size["shortest_edge"]
        longest_edge = size["longest_edge"]
      end

      if !shortest_edge.nil? || !longest_edge.nil?
        # http://opensourcehacker.com/2011/12/01/calculate-aspect-ratio-conserving-resize-for-images-in-javascript/
        # Try resize so that shortest edge is `shortest_edge` (target)
        short_resize_factor =
          if shortest_edge.nil?
            1 # If `shortest_edge` is not set, don't upscale
          else
            [shortest_edge / src_width.to_f, shortest_edge / src_height.to_f].max
          end

        new_width = src_width * short_resize_factor
        new_height = src_height * short_resize_factor

        # The new width and height might be greater than `longest_edge`, so
        # we downscale again to ensure the largest dimension is `longest_edge`
        long_resize_factor =
          if longest_edge.nil?
            1 # If `longest_edge` is not set, don't downscale
          else
            [longest_edge / new_width.to_f, longest_edge / new_height.to_f].min
          end

        # To avoid certain floating point precision issues, we round to 2 decimal places
        final_width = (new_width * long_resize_factor).round(2).floor
        final_height = (new_height * long_resize_factor).round(2).floor

        if !@size_divisibility.nil?
          raise Todo
        end
        [final_width, final_height]
      elsif !size.nil? && !size["width"].nil? && !size["height"].nil?
        new_width = size["width"]
        new_height = size["height"]

        if @config["keep_aspect_ratio"] && @config["ensure_multiple_of"]
          raise Todo
        end

        [new_width, new_height]
      else
        raise Todo
      end
    end

    def resize(image)
      new_width, new_height = get_resize_output_image_size(image, @size)
      image.resize(new_width, new_height, resample: @resample)
    end

    def preprocess(
      image,
      do_normalize: nil,
      do_pad: nil,
      do_convert_rgb: nil,
      do_convert_grayscale: nil,
      do_flip_channel_order: nil
    )
      if @do_crop_margin
        # NOTE: Specific to nougat processors. This is done before resizing,
        # and can be interpreted as a pre-preprocessing step.
        image = crop_margin(image)
      end

      src_width, src_height = image.size # original image size

      # Convert image to RGB if specified in config.
      if !do_convert_rgb.nil? ? do_convert_rgb : @do_convert_rgb
        image = image.rgb
      elsif do_convert_grayscale
        image = image.grayscale
      end

      # Resize all images
      if @do_resize
        image = resize(image)
      end

      # Resize the image using thumbnail method.
      if @do_thumbnail
        image = thumbnail(image, @size, @resample)
      end

      if @do_center_crop
        if @crop_size.is_a?(Integer)
          crop_width = @crop_size
          crop_height = @crop_size
        else
          crop_width = @crop_size["width"]
          crop_height = @crop_size["height"]
        end
        image = image.center_crop(crop_width, crop_height)
      end

      reshaped_input_size = [image.height, image.width]

      # NOTE: All pixel-level manipulation (i.e., modifying `pixelData`)
      # occurs with data in the hwc format (height, width, channels),
      # to emulate the behavior of the original Python code (w/ numpy).
      pixel_data = image.data
      img_dims = [image.height, image.width, image.channels]

      if @do_rescale
        rescale(pixel_data)
      end

      if !do_normalize.nil? ? do_normalize : @do_normalize
        image_mean = @image_mean
        if !@image_mean.is_a?(Array)
          image_mean = new Array(image.channels) { image_mean }
        end

        image_std = @image_std
        if !@image_std.is_a?(Array)
          image_std = new Array(image.channels) { image_std }
        end

        if image_mean.length != image.channels || image_std.length != image.channels
          raise Error, "When set to arrays, the length of `image_mean` (#{image_mean.length}) and `image_std` (#{image_std.length}) must match the number of channels in the image (#{image.channels})."
        end

        i = 0
        while i < pixel_data.length
          image.channels.times do |j|
            pixel_data[i + j] = (pixel_data[i + j] - image_mean[j]) / image_std[j]
          end
          i += image.channels
        end
      end

      # do padding after rescaling/normalizing
      if !do_pad.nil? ? do_pad : @do_pad
        if @pad_size
          padded = pad_image(pixel_data, [image.height, image.width, image.channels], @pad_size)
          pixel_data, img_dims = padded # Update pixel data and image dimensions
        elsif @size_divisibility
          raise Todo
        end
      end

      if !do_flip_channel_order.nil? ? do_flip_channel_order : @do_flip_channel_order
        raise Todo
      end

      # convert to channel dimension format (hwc -> chw)
      h, w, c = img_dims
      pixel_values =
        c.times.map do |ci|
          h.times.map do |hi|
            w.times.map do |wi|
              index = (hi * w * c) + (wi * c) + ci
              pixel_data[index]
            end
          end
        end

      {
        original_size: [src_height, src_width],
        reshaped_input_size: reshaped_input_size,
        pixel_values: pixel_values
      }
    end

    def call(images, *args)
      if !images.is_a?(Array)
        images = [images]
      end

      image_data = images.map { |x| preprocess(x) }

      # Stack pixel values
      pixel_values = Utils.stack(image_data.map { |x| x[:pixel_values] }, 0)

      {
        pixel_values: pixel_values,

        # Original sizes of images
        original_sizes: image_data.map { |x| x[:original_size] },

        # Reshaped sizes of images, before padding or cropping
        reshaped_input_sizes: image_data.map { |x| x[:reshaped_input_size] }
      }
    end
  end

  class CLIPFeatureExtractor < ImageFeatureExtractor
  end

  class DPTFeatureExtractor < ImageFeatureExtractor
  end

  class ViTFeatureExtractor < ImageFeatureExtractor
  end

  class OwlViTFeatureExtractor < ImageFeatureExtractor
    def post_process_object_detection(*args)
      Utils.post_process_object_detection(*args)
    end
  end

  class Swin2SRImageProcessor < ImageFeatureExtractor
    def pad_image(pixel_data, img_dims, pad_size, **options)
      # NOTE: In this case, `padSize` represents the size of the sliding window for the local attention.
      # In other words, the image is padded so that its width and height are multiples of `padSize`.
      image_height, image_width, _image_channels = img_dims

      super(
        pixel_data,
        img_dims,
        {
          # NOTE: For Swin2SR models, the original python implementation adds padding even when the image's width/height is already
          # a multiple of `pad_size`. However, this is most likely a bug (PR: https://github.com/mv-lab/swin2sr/pull/19).
          # For this reason, we only add padding when the image's width/height is not a multiple of `pad_size`.
          width: image_width + (pad_size - image_width % pad_size) % pad_size,
          height: image_height + (pad_size - image_height % pad_size) % pad_size
        },
        mode: "symmetric",
        center: false,
        constant_values: -1,
        **options
      )
    end
  end

  class DonutFeatureExtractor < ImageFeatureExtractor
    def pad_image(pixel_data, img_dims, pad_size, **options)
      _image_height, _image_width, image_channels = img_dims

      image_mean = @image_mean
      if !image_mean.is_a?(Array)
        image_mean = new Array(image_channels, image_mean)
      end

      image_std = @image_std
      if !image_std.is_a?(Array)
        image_std = new Array(image_channels, image_std)
      end

      constant_values = image_mean.map.with_index { |x, i| -x / image_std[i] }

      super(
        pixel_data,
        img_dims,
        pad_size,
        center: true,
        # Since normalization is done after padding, we need to use certain constant values to ensure the same behaviour is observed.
        # For more information, see https://github.com/huggingface/transformers/blob/main/src/transformers/models/donut/image_processing_donut.py#L433-L451
        constant_values: constant_values,
        **options
      )
    end
  end

  class DetrFeatureExtractor < ImageFeatureExtractor
    def call(images)
      result = super(images)

      # TODO support differently-sized images, for now assume all images are the same size.
      # TODO support different mask sizes (not just 64x64)
      # Currently, just fill pixel mask with 1s
      mask_size = [result[:pixel_values].size, 64, 64]
      pixel_mask =
        mask_size[0].times.map do
          mask_size[1].times.map do
            mask_size[2].times.map do
              1
            end
          end
        end

      result.merge(pixel_mask: pixel_mask)
    end

    def post_process_object_detection(*args)
      Utils.post_process_object_detection(*args)
    end

    def remove_low_and_no_objects(class_logits, mask_logits, object_mask_threshold, num_labels)
      mask_probs_item = []
      pred_scores_item = []
      pred_labels_item = []

      class_logits.size.times do |j|
        cls = class_logits[j]
        mask = mask_logits[j]

        pred_label = Utils.max(cls)[1]
        if pred_label == num_labels
          # Is the background, so we ignore it
          next
        end

        scores = Utils.softmax(cls)
        pred_score = scores[pred_label]
        if pred_score > object_mask_threshold
          mask_probs_item << mask
          pred_scores_item << pred_score
          pred_labels_item << pred_label
        end
      end

      [mask_probs_item, pred_scores_item, pred_labels_item]
    end

    def check_segment_validity(
      mask_labels,
      mask_probs,
      k,
      mask_threshold = 0.5,
      overlap_mask_area_threshold = 0.8
    )
      # mask_k is a 1D array of indices, indicating where the mask is equal to k
      mask_k = []
      mask_k_area = 0
      original_area = 0

      mask_probs_k_data = mask_probs[k].flatten

      # Compute the area of all the stuff in query k
      mask_labels.length.times do |i|
        if mask_labels[i] == k
          mask_k << i
          mask_k_area += 1
        end

        if mask_probs_k_data[i] >= mask_threshold
          original_area += 1
        end
      end
      mask_exists = mask_k_area > 0 && original_area > 0

      # Eliminate disconnected tiny segments
      if mask_exists
        # Perform additional check
        area_ratio = mask_k_area / original_area
        mask_exists = area_ratio > overlap_mask_area_threshold
      end

      [mask_exists, mask_k]
    end

    def compute_segments(
      mask_probs,
      pred_scores,
      pred_labels,
      mask_threshold,
      overlap_mask_area_threshold,
      label_ids_to_fuse = nil,
      target_size = nil
    )
      height, width = target_size || Utils.dims(mask_probs[0])

      segmentation = Array.new(height * width)
      segments = []

      # 1. If target_size is not null, we need to resize the masks to the target size
      if !target_size.nil?
        # resize the masks to the target size
        mask_probs.length.times do |i|
          mask_probs[i] = Utils.interpolate(mask_probs[i], target_size, "bilinear", false)
        end
      end

      # 2. Weigh each mask by its prediction score
      # NOTE: `mask_probs` is updated in-place
      #
      # Temporary storage for the best label/scores for each pixel ([height, width]):
      mask_labels = Array.new(mask_probs[0].flatten.length)
      best_scores = Array.new(mask_probs[0].flatten.length, 0)

      mask_probs.length.times do |i|
        score = pred_scores[i]

        mask_probs_i_data = mask_probs[i].flatten
        mask_probs_i_dims = Utils.dims(mask_probs[i])

        mask_probs_i_data.length.times do |j|
          mask_probs_i_data[j] *= score
          if mask_probs_i_data[j] > best_scores[j]
            mask_labels[j] = i
            best_scores[j] = mask_probs_i_data[j]
          end
        end

        mask_probs[i] = Utils.reshape(mask_probs_i_data, mask_probs_i_dims)
      end

      current_segment_id = 0

      # stuff_memory_list = {}
      pred_labels.length.times do |k|
        pred_class = pred_labels[k]

        # TODO add `should_fuse`
        # should_fuse = label_ids_to_fuse.include?(pred_class)

        # Check if mask exists and large enough to be a segment
        mask_exists, mask_k = check_segment_validity(
          mask_labels,
          mask_probs,
          k,
          mask_threshold,
          overlap_mask_area_threshold
        )

        if !mask_exists
          # Nothing to see here
          next
        end

        current_segment_id += 1

        # Add current object segment to final segmentation map
        mask_k.each do |index|
          segmentation[index] = current_segment_id
        end

        segments << {
          id: current_segment_id,
          label_id: pred_class,
          score: pred_scores[k]
        }
      end

      segmentation = Utils.reshape(segmentation, [height, width])

      [segmentation, segments]
    end

    def post_process_panoptic_segmentation(
      outputs,
      threshold: 0.5,
      mask_threshold: 0.5,
      overlap_mask_area_threshold: 0.8,
      label_ids_to_fuse: nil,
      target_sizes: nil
    )
      if label_ids_to_fuse.nil?
        warn "`label_ids_to_fuse` unset. No instance will be fused."
        label_ids_to_fuse = Set.new
      end

      class_queries_logits = outputs[:logits] # [batch_size, num_queries, num_classes+1]
      masks_queries_logits = outputs[:pred_masks] # [batch_size, num_queries, height, width]

      mask_probs = Utils.sigmoid(masks_queries_logits) # [batch_size, num_queries, height, width]

      batch_size, _num_queries, num_labels = class_queries_logits.size, class_queries_logits[0].size, class_queries_logits[0][0].size
      num_labels -= 1 # Remove last class (background)

      if !target_sizes.nil? && target_sizes.length != batch_size
        raise Error, "Make sure that you pass in as many target sizes as the batch dimension of the logits"
      end

      to_return = []
      batch_size.times do |i|
        target_size = !target_sizes.nil? ? target_sizes[i] : nil

        class_logits = class_queries_logits[i]
        mask_logits = mask_probs[i]

        mask_probs_item, pred_scores_item, pred_labels_item = remove_low_and_no_objects(class_logits, mask_logits, threshold, num_labels)

        if pred_labels_item.length == 0
          raise Todo
        end

        # Get segmentation map and segment information of batch item
        segmentation, segments = compute_segments(
          mask_probs_item,
          pred_scores_item,
          pred_labels_item,
          mask_threshold,
          overlap_mask_area_threshold,
          label_ids_to_fuse,
          target_size
        )

        to_return << {
          segmentation: segmentation,
          segments_info: segments
        }
      end

      to_return
    end
  end

  module Utils
    def self.center_to_corners_format(v)
      centerX, centerY, width, height = v
      [
        centerX - width / 2.0,
        centerY - height / 2.0,
        centerX + width / 2.0,
        centerY + height / 2.0
      ]
    end

    def self.post_process_object_detection(outputs, threshold = 0.5, target_sizes = nil, is_zero_shot = false)
      out_logits = outputs[:logits]
      out_bbox = outputs[:pred_boxes]
      batch_size, num_boxes, num_classes = out_logits.size, out_logits[0].size, out_logits[0][0].size

      if !target_sizes.nil? && target_sizes.length != batch_size
        raise Error, "Make sure that you pass in as many target sizes as the batch dimension of the logits"
      end
      to_return = []
      batch_size.times do |i|
        target_size = !target_sizes.nil? ? target_sizes[i] : nil
        info = {
          boxes: [],
          classes: [],
          scores: []
        }
        logits = out_logits[i]
        bbox = out_bbox[i]

        num_boxes.times do |j|
          logit = logits[j]

          indices = []
          if is_zero_shot
            # Get indices of classes with high enough probability
            probs = Utils.sigmoid(logit)
            probs.length.times do |k|
              if probs[k] > threshold
                indices << k
              end
            end
          else
            # Get most probable class
            max_index = Utils.max(logit)[1]

            if max_index == num_classes - 1
              # This is the background class, skip it
              next
            end
            indices << max_index

            # Compute softmax over classes
            probs = Utils.softmax(logit)
          end

          indices.each do |index|
            box = bbox[j]

            # convert to [x0, y0, x1, y1] format
            box = center_to_corners_format(box)
            if !target_size.nil?
              box = box.map.with_index { |x, i| x * target_size[(i + 1) % 2] }
            end

            info[:boxes] << box
            info[:classes] << index
            info[:scores] << probs[index]
          end
        end
        to_return << info
      end
      to_return
    end
  end

  class WhisperFeatureExtractor < FeatureExtractor
    def initialize(config)
      super(config)

      raise Todo
    end

    def _extract_fbank_features(waveform)
      raise Todo
    end

    def call(audio)
      raise Todo
    end
  end

  class Wav2Vec2FeatureExtractor < FeatureExtractor
    def _zero_mean_unit_var_norm(input_values)
      sum = input_values.sum
      mean = sum / input_values.length.to_f
      variance = input_values.sum { |b| (b - mean) ** 2 } / input_values.length.to_f
      input_values.map { |x| (x - mean) / Math.sqrt(variance + 1e-7) }
    end

    def call(audio)
      # TODO
      # validate_audio_inputs(audio, 'Wav2Vec2FeatureExtractor')

      input_values = audio

      # zero-mean and unit-variance normalization
      if @config["do_normalize"]
        input_values = _zero_mean_unit_var_norm(input_values)
      end

      # TODO: allow user to pass in attention mask
      {
        input_values: [input_values],
        attention_mask: [Array.new(input_values.length, 1)]
      }
    end
  end

  class ClapFeatureExtractor < FeatureExtractor
    def initialize(config)
      super(config)

      # TODO
    end

    def call(audio, max_length: nil)
      raise Todo
    end
  end

  class Processor
    attr_reader :feature_extractor

    def initialize(feature_extractor)
      @feature_extractor = feature_extractor
    end

    def call(input, *args)
      @feature_extractor.(input, *args)
    end
  end

  class AutoProcessor
    FEATURE_EXTRACTOR_CLASS_MAPPING = {
      "ViTFeatureExtractor" => ViTFeatureExtractor,
      "OwlViTFeatureExtractor" => OwlViTFeatureExtractor,
      "CLIPFeatureExtractor" => CLIPFeatureExtractor,
      "DPTFeatureExtractor" => DPTFeatureExtractor,
      "DetrFeatureExtractor" => DetrFeatureExtractor,
      "Swin2SRImageProcessor" => Swin2SRImageProcessor,
      "DonutFeatureExtractor" => DonutFeatureExtractor,
      "WhisperFeatureExtractor" => WhisperFeatureExtractor,
      "Wav2Vec2FeatureExtractor" => Wav2Vec2FeatureExtractor,
      "ClapFeatureExtractor" => ClapFeatureExtractor
    }

    PROCESSOR_CLASS_MAPPING = {}

    def self.from_pretrained(
      pretrained_model_name_or_path,
      progress_callback: nil,
      config: nil,
      cache_dir: nil,
      local_files_only: false,
      revision: "main",
      **kwargs
    )
      preprocessor_config = config || Utils::Hub.get_model_json(pretrained_model_name_or_path, "preprocessor_config.json", true,
        progress_callback:,
        config:,
        cache_dir:,
        local_files_only:,
        revision:
      )

      # Determine feature extractor class
      # TODO: Ensure backwards compatibility with old configs
      key = preprocessor_config["feature_extractor_type"] || preprocessor_config["image_processor_type"]
      feature_extractor_class = FEATURE_EXTRACTOR_CLASS_MAPPING[key]

      if !feature_extractor_class
        if preprocessor_config["size"]
          # Assume ImageFeatureExtractor
          warn "Feature extractor type #{key.inspect} not found, assuming ImageFeatureExtractor due to size parameter in config."
          feature_extractor_class = ImageFeatureExtractor
        else
          raise Error, "Unknown Feature Extractor type: #{key}"
        end
      end

      # If no associated processor class, use default
      processor_class = PROCESSOR_CLASS_MAPPING[preprocessor_config["processor_class"]] || Processor

      # Instantiate processor and feature extractor
      feature_extractor = feature_extractor_class.new(preprocessor_config)
      processor_class.new(feature_extractor)
    end
  end
end


================================================
FILE: lib/informers/tokenizers.rb
================================================
module Informers
  class PreTrainedTokenizer
    attr_reader :mask_token, :mask_token_id, :sep_token_id

    def initialize(tokenizer_json, tokenizer_config)
      super()

      @tokenizer_config = tokenizer_config

      @tokenizer = Tokenizers::Tokenizer.from_file(tokenizer_json)

      # Add added_tokens to model
      @special_tokens = []
      @all_special_ids = []

      @added_tokens = []
      @tokenizer.added_tokens_decoder.each do |id, token|
        @added_tokens << token

        if token.special
          @special_tokens << token.content
          @all_special_ids << id
        end
      end

      # Update additional_special_tokens
      @additional_special_tokens = tokenizer_config["additional_special_tokens"] || []
      @special_tokens.concat(@additional_special_tokens)

      @mask_token = get_token("mask_token")
      @mask_token_id = @tokenizer.token_to_id(@mask_token) if @mask_token

      @sep_token = get_token("sep_token")
      @sep_token_id = @tokenizer.token_to_id(@sep_token) if @sep_token

      @model_max_length = tokenizer_config["model_max_length"]

      # for donut-base-finetuned-docvqa
      if @model_max_length && @model_max_length > (1 << 63)
        @model_max_length = 1 << 63
      end
    end

    def get_token(*keys)
      keys.each do |key|
        item = @tokenizer_config[key]
        if !item
          next
        end

        if item.is_a?(Hash)
          if item["__type"] == "AddedToken"
            return item["content"]
          else
            raise Error, "Unknown token: #{item}"
          end
        else
          return item
        end
      end

      nil
    end

    def call(
      text,
      text_pair: nil,
      add_special_tokens: true,
      padding: false,
      truncation: nil,
      max_length: nil,
      return_tensor: true,
      return_token_type_ids: true, # TODO change default
      return_offsets: false
    )
      is_batched = text.is_a?(Array)

      if is_batched
        if text.length == 0
          raise Error, "text array must be non-empty"
        end

        if !text_pair.nil?
          if !text_pair.is_a?(Array)
            raise Error, "text_pair must also be an array"
          elsif text.length != text_pair.length
            raise Error, "text and text_pair must have the same length"
          end
        end
      end

      if padding
        @tokenizer.enable_padding
      else
        @tokenizer.no_padding
      end

      if truncation
        @tokenizer.enable_truncation(max_length || @model_max_length)
      else
        @tokenizer.no_truncation
      end

      if is_batched
        input = text_pair ? text.zip(text_pair) : text
        encoded = @tokenizer.encode_batch(input, add_special_tokens: add_special_tokens)
      else
        encoded = [@tokenizer.encode(text, text_pair, add_special_tokens: add_special_tokens)]
      end

      result = {input_ids: encoded.map(&:ids), attention_mask: encoded.map(&:attention_mask)}
      if return_token_type_ids
        result[:token_type_ids] = encoded.map(&:type_ids)
      end
      if return_offsets
        result[:offsets] = encoded.map(&:offsets)
      end
      result
    end

    def decode(tokens, skip_special_tokens:)
      @tokenizer.decode(tokens, skip_special_tokens: skip_special_tokens)
    end

    def convert_tokens_to_string(tokens)
      @tokenizer.decoder.decode(tokens)
    end

    def convert_tokens_to_ids(tokens)
      tokens.map { |t| @tokenizer.token_to_id(t) }
    end

    def id_to_token(id)
      @tokenizer.id_to_token(id)
    end

    def batch_decode(batch, **decode_args)
      @tokenizer.decode_batch(batch, **decode_args)
    end

    def padding_side=(side)
      @tokenizer.enable_padding(direction: side)
    end
  end

  class BertTokenizer < PreTrainedTokenizer
    # TODO
    # self.return_token_type_ids = true
  end

  class DebertaV2Tokenizer < PreTrainedTokenizer
    # TODO
    # self.return_token_type_ids = true
  end

  class DistilBertTokenizer < PreTrainedTokenizer
  end

  class T5Tokenizer < PreTrainedTokenizer
  end

  class GPT2Tokenizer < PreTrainedTokenizer
    # _default_chat_template = `{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}`
  end

  class BartTokenizer < PreTrainedTokenizer
  end

  class RobertaTokenizer < PreTrainedTokenizer
  end

  class XLMRobertaTokenizer < PreTrainedTokenizer
  end

  class MPNetTokenizer < PreTrainedTokenizer
  end

  class CLIPTokenizer < PreTrainedTokenizer
  end

  class NllbTokenizer < PreTrainedTokenizer
    attr_reader :language_regex, :language_codes, :lang_to_token

    def initialize(tokenizer_json, tokenizer_config)
      super(tokenizer_json, tokenizer_config)

      @language_regex = /^[a-z]{3}_[A-Z][a-z]{3}$/
      @language_codes = @special_tokens.filter { |x| @language_regex.match?(x) }
      @lang_to_token = ->(x) { x } # Identity function
    end

    def _build_translation_inputs(raw_inputs, tokenizer_options, generate_kwargs)
      Utils._build_translation_inputs(self, raw_inputs, tokenizer_options, generate_kwargs)
    end
  end

  class M2M100Tokenizer < PreTrainedTokenizer
    attr_reader :language_regex, :language_codes, :lang_to_token

    def initialize(tokenizer_json, tokenizer_config)
      super(tokenizer_json, tokenizer_config)

      @language_regex = /^__[a-z]{2,3}__$/
      @language_codes = @special_tokens
        .filter { |x| @language_regex.match?(x) }
        .map { |x| x.slice(2, -2) }
      @lang_to_token = ->(x) { "__#{x}__" }
    end

    def _build_translation_inputs(raw_inputs, tokenizer_options, generate_kwargs)
      Utils._build_translation_inputs(self, raw_inputs, tokenizer_options, generate_kwargs)
    end
  end

  module Utils
    def self._build_translation_inputs(slf, raw_inputs, tokenizer_options, generate_kwargs)
      if !slf.respond_to?(:language_codes) || !slf.language_codes.is_a?(Array)
        raise Error, "Tokenizer must have `language_codes` attribute set and it should be an array of language ids."
      end
      if !slf.respond_to?(:language_regex) || !slf.language_regex.is_a?(Regexp)
        raise Error, "Tokenizer must have `language_regex` attribute set and it should be a regular expression."
      end
      if !slf.respond_to?(:lang_to_token) || !slf.lang_to_token.respond_to?(:call)
        raise Error, "Tokenizer must have `lang_to_token` attribute set and it should be a function."
      end
      src_lang_token = generate_kwargs[:src_lang]
      tgt_lang_token = generate_kwargs[:tgt_lang]

      if !slf.language_codes.include?(tgt_lang_token)
        raise Error, "Target language code #{tgt_lang_token.inspect} is not valid. Must be one of: #{slf.language_codes.join(", ")}"
      end

      if !src_lang_token.nil?
        # Check that the source language is valid:
        if !slf.language_codes.include?(src_lang_token)
          raise Error, "Source language code #{src_lang_token.inspect} is not valid. Must be one of: #{slf.language_codes.join(", ")}"
        end
      end

      # Override the `forced_bos_token_id` to force the correct language
      generate_kwargs["forced_bos_token_id"] = slf.convert_tokens_to_ids([slf.lang_to_token.(tgt_lang_token)])[0]

      slf.(raw_inputs, **tokenizer_options)
    end
  end

  class SpeechT5Tokenizer < PreTrainedTokenizer
  end

  class AutoTokenizer
    TOKENIZER_CLASS_MAPPING = {
      "T5Tokenizer" => T5Tokenizer,
      "BertTokenizer" => BertTokenizer,
      "DebertaV2Tokenizer" => DebertaV2Tokenizer,
      "DistilBertTokenizer" => DistilBertTokenizer,
      "BartTokenizer" => BartTokenizer,
      "RobertaTokenizer" => RobertaTokenizer,
      "XLMRobertaTokenizer" => XLMRobertaTokenizer,
      "MPNetTokenizer" => MPNetTokenizer,
      "CLIPTokenizer" => CLIPTokenizer,
      "GPT2Tokenizer" => GPT2Tokenizer,
      "NllbTokenizer" => NllbTokenizer,
      "M2M100Tokenizer" => M2M100Tokenizer,
      "SpeechT5Tokenizer" => SpeechT5Tokenizer,
      "PreTrainedTokenizer" => PreTrainedTokenizer
    }

    def self.from_pretrained(
      pretrained_model_name_or_path,
      quantized: true,
      progress_callback: nil,
      config: nil,
      cache_dir: nil,
      local_files_only: false,
      revision: "main",
      legacy: nil,
      **kwargs
    )
      tokenizer_json, tokenizer_config = load_tokenizer(
        pretrained_model_name_or_path,
        quantized:,
        progress_callback:,
        config:,
        cache_dir:,
        local_files_only:,
        revision:,
        legacy:
      )

      # Some tokenizers are saved with the "Fast" suffix, so we remove that if present.
      tokenizer_name = tokenizer_config["tokenizer_class"]&.delete_suffix("Fast") || "PreTrainedTokenizer"

      cls = TOKENIZER_CLASS_MAPPING[tokenizer_name]
      if !cls
        warn "Unknown tokenizer class #{tokenizer_name.inspect}, attempting to construct from base class."
        cls = PreTrainedTokenizer
      end
      cls.new(tokenizer_json, tokenizer_config)
    end

    def self.load_tokenizer(pretrained_model_name_or_path, **options)
      info = [
        Utils::Hub.get_model_file(pretrained_model_name_or_path, "tokenizer.json", true, **options),
        Utils::Hub.get_model_json(pretrained_model_name_or_path, "tokenizer_config.json", true, **options)
      ]

      # Override legacy option if `options.legacy` is not null
      if !options[:legacy].nil?
        info[1]["legacy"] = options[:legacy]
      end
      info
    end
  end
end


================================================
FILE: lib/informers/utils/audio.rb
================================================
module Informers
  module Utils
    def self.read_audio(input, sampling_rate)
      data =
        if input.is_a?(URI)
          require "open-uri"

          input.read
        elsif input.is_a?(String)
          File.binread(input)
        else
          raise ArgumentError, "Unsupported input type: #{input.class.name}"
        end

      ffmpeg_read(data, sampling_rate)
    end
  end
end


================================================
FILE: lib/informers/utils/core.rb
================================================
module Informers
  module Utils
    def self.dispatch_callback(progress_callback, data)
      progress_callback.(data) if progress_callback
    end

    def self.calculate_reflect_offset(i, w)
      ((i + w) % (2 * w) - w).abs
    end
  end
end


================================================
FILE: lib/informers/utils/dtypes.rb
================================================
module Informers
  module Utils
    DEFAULT_DTYPE_SUFFIX_MAPPING = {
      fp32: "",
      fp16: "_fp16",
      int8: "_int8",
      uint8: "_uint8",
      q8: "_quantized",
      q4: "_q4",
      q4f16: "_q4f16",
      bnb4: "_bnb4"
    }
  end
end


================================================
FILE: lib/informers/utils/ffmpeg.rb
================================================
# Copyright 2021 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

module Informers
  module Utils
    # from the Transformers Python library
    def self.ffmpeg_read(data, sampling_rate)
      ar = "#{sampling_rate}"
      ac = "1"
      format_for_conversion = "f32le"
      ffmpeg_command = [
        "ffmpeg",
        "-i",
        "pipe:0",
        "-ac",
        ac,
        "-ar",
        ar,
        "-f",
        format_for_conversion,
        "-hide_banner",
        "-loglevel",
        "quiet",
        "pipe:1"
      ]

      stdout, status = Open3.capture2(*ffmpeg_command, stdin_data: data)
      if !status.success?
        raise Error, "ffmpeg was not found but is required to load audio files from filename"
      end
      stdout.unpack("e*")
    end
  end
end


================================================
FILE: lib/informers/utils/generation.rb
================================================
module Informers
  module Utils
    class GenerationConfig
      def initialize(kwargs)
        @config = {}

        # Parameters that control the length of the output
        @config["max_length"] = kwargs["max_length"] || 20
        @config["max_new_tokens"] = kwargs["max_new_tokens"]
        @config["min_length"] = kwargs["min_length"] || 0
        @config["min_new_tokens"] = kwargs["min_new_tokens"]
        @config["early_stopping"] = kwargs["early_stopping"] || false
        @config["max_time"] = kwargs["max_time"]

        # Parameters that control the generation strategy used
        @config["do_sample"] = kwargs["do_sample"] || false
        @config["num_beams"] = kwargs["num_beams"] || 1
        @config["num_beam_groups"] = kwargs["num_beam_groups"] || 1
        @config["penalty_alpha"] = kwargs["penalty_alpha"]
        @config["use_cache"] = kwargs.fetch("use_cache", true)

        # Parameters for manipulation of the model output logits
        @config["temperature"] = kwargs["temperature"] || 1.0
        @config["top_k"] = kwargs["top_k"] || 50
        @config["top_p"] = kwargs["top_p"] || 1.0
        @config["typical_p"] = kwargs["typical_p"] || 1.0
        @config["epsilon_cutoff"] = kwargs["epsilon_cutoff"] || 0.0
        @config["eta_cutoff"] = kwargs["eta_cutoff"] || 0.0
        @config["diversity_penalty"] = kwargs["diversity_penalty"] || 0.0
        @config["repetition_penalty"] = kwargs["repetition_penalty"] || 1.0
        @config["encoder_repetition_penalty"] = kwargs["encoder_repetition_penalty"] || 1.0
        @config["length_penalty"] = kwargs["length_penalty"] || 1.0
        @config["no_repeat_ngram_size"] = kwargs["no_repeat_ngram_size"] || 0
        @config["bad_words_ids"] = kwargs["bad_words_ids"]
        @config["force_words_ids"] = kwargs["force_words_ids"]
        @config["renormalize_logits"] = kwargs["renormalize_logits"] || false
        @config["constraints"] = kwargs["constraints"]
        @config["forced_bos_token_id"] = kwargs["forced_bos_token_id"]
        @config["forced_eos_token_id"] = kwargs["forced_eos_token_id"]
        @config["remove_invalid_values"] = kwargs["remove_invalid_values"] || false
        @config["exponential_decay_length_penalty"] = kwargs["exponential_decay_length_penalty"]
        @config["suppress_tokens"] = kwargs["suppress_tokens"]
        @config["begin_suppress_tokens"] = kwargs["begin_suppress_tokens"]
        @config["forced_decoder_ids"] = kwargs["forced_decoder_ids"]

        # Parameters that define the output variables of `generate`
        @config["num_return_sequences"] = kwargs["num_return_sequences"] || 1
        @config["output_attentions"] = kwargs["output_attentions"] || false
        @config["output_hidden_states"] = kwargs["output_hidden_states"] || false
        @config["output_scores"] = kwargs["output_scores"] || false
        @config["return_dict_in_generate"] = kwargs["return_dict_in_generate"] || false

        # Special tokens that can be used at generation time
        @config["pad_token_id"] = kwargs["pad_token_id"]
        @config["bos_token_id"] = kwargs["bos_token_id"]
        @config["eos_token_id"] = kwargs["eos_token_id"]

        # Generation parameters exclusive to encoder-decoder models
        @config["encoder_no_repeat_ngram_size"] = kwargs["encoder_no_repeat_ngram_size"] || 0
        @config["decoder_start_token_id"] = kwargs["decoder_start_token_id"]

        # Wild card
        @generation_kwargs = kwargs["generation_kwargs"] || {}
      end

      def [](key)
        @config[key.to_s]
      end

      def merge!(config)
        @config.merge!(config)
      end
    end

    class Sampler
      def initialize(generation_config)
        super()
        @generation_config = generation_config
      end

      def call(logits, index = -1)
        # Sample from logits, of dims [batch, sequence_length, vocab_size].
        # If index is specified, sample from [batch, index, vocab_size].
        sample(logits, index)
      end

      def get_logits(logits, index)
        vocab_size = Utils.dims(logits)[-1]

        logs = logits.flatten

        if index == -1
          logs = logs.last(vocab_size)
        else
          raise Todo
        end

        # add temperature
        if @generation_config["temperature"] > 0
          logs = logs.map { |x| x / @generation_config["temperature"] }
        end
        logs
      end

      def self.get_sampler(generation_config)
        if generation_config[:do_sample]
          MultinomialSampler.new(generation_config)
        elsif generation_config[:num_beams] > 1
          BeamSearchSampler.new(generation_config)
        else
          if generation_config[:num_return_sequences] > 1
            raise Error, "num_return_sequences has to be 1 when doing greedy search, but is #{generation_config[:num_return_sequences]}."
          end
          GreedySampler.new(generation_config)
        end
      end
    end

    class GreedySampler < Sampler
      def sample(logits, index = -1)
        # NOTE: no need to do log_softmax here since we only take the maximum
        logs = get_logits(logits, index)
        argmax = Utils.max(logs)[1]

        # Note: score is meaningless in this context, since we are performing
        # greedy search (p = 1 => log(p) = 0)
        [
          [argmax, 0]
        ]
      end
    end

    class BeamSearchSampler < Sampler
      def sample(logits, index = -1)
        k = Utils.dims(logits)[-1] # defaults to vocab size
        if @generation_config["top_k"] > 0
          k = [@generation_config["top_k"], k].min
        end

        # Get logits of nth token
        logs = get_logits(logits, index)

        # Get top k tokens
        top_logits = Utils.get_top_items(logs, k)

        # Compute softmax over logits
        probabilities = Utils.softmax(top_logits.map { |x| x[1] })

        Array.new(@generation_config["num_beams"]) do |i|
          [
            top_logits[i][0],
            Math.log(probabilities[i])
          ]
        end
      end
    end

    class LogitsProcessorList
      def initialize
        super
        @processors = []
      end

      def push(item)
        @processors << item
      end

      def concat(items)
        @processors.concat(items)
      end

      def call(input_ids, batched_logits)
        # NOTE: This is different from the Python code, since vanilla Ruby does not support vectorized operations.
        # As a result, we apply each processor to each item in the batch.
        batched_logits.each do |logits|
          # Modifies logits inplace
          @processors.each do |func|
            func.(input_ids, logits)
          end
        end
      end

      def to_ary
        @processors
      end
    end

    class LogitsProcessor
    end

    class NoRepeatNGramLogitsProcessor < LogitsProcessor
      def initialize(no_repeat_ngram_size)
        super()
        @no_repeat_ngram_size = no_repeat_ngram_size
      end

      def get_ngrams(prev_input_ids)
        cur_len = prev_input_ids.length

        ngrams = []
        j = 0
        while j < cur_len + 1 - @no_repeat_ngram_size
          ngram = []
          @no_repeat_ngram_size.times do |k|
            ngram << prev_input_ids[j + k]
          end
          ngrams << ngram
          j += 1
        end

        generated_ngram = {}
        ngrams.each do |ngram|
          prev_ngram = ngram.slice(0, ngram.length - 1)
          prev_ngram_key = JSON.generate(prev_ngram)
          prev_ngram_value = generated_ngram[prev_ngram_key] || []
          prev_ngram_value << ngram[ngram.length - 1]
          generated_ngram[prev_ngram_key] = prev_ngram_value
        end
        generated_ngram
      end

      def get_generated_ngrams(banned_ngrams, prev_input_ids)
        ngram_idx = prev_input_ids.slice(prev_input_ids.length + 1 - @no_repeat_ngram_size, prev_input_ids.length)
        banned = banned_ngrams[JSON.generate(ngram_idx)] || []
        banned
      end

      def calc_banned_ngram_tokens(prev_input_ids)
        banned_tokens = []
        if prev_input_ids.length + 1 < @no_repeat_ngram_size
          # return no banned tokens if we haven't generated no_repeat_ngram_size tokens yet
          banned_tokens
        else
          generated_ngrams = get_ngrams(prev_input_ids)
          banned_tokens = get_generated_ngrams(generated_ngrams, prev_input_ids)
          banned_tokens
        end
      end

      def call(input_ids, logits)
        banned_tokens = calc_banned_ngram_tokens(input_ids)

        banned_tokens.each do |token|
          logits[token] = -Float::INFINITY
        end
        logits
      end
    end

    class MinLengthLogitsProcessor < LogitsProcessor
      def initialize(min_length, eos_token_id)
        super()
        @min_length = min_length
        @eos_token_id = eos_token_id.is_a?(Array) ? eos_token_id : [eos_token_id]
      end

      def call(input_ids, logits)
        if input_ids.length < @min_length
          @eos_token_id.each do |eos_token|
            logits[eos_token] = -Float::INFINITY
          end
        end

        logits
      end
    end

    class ForcedBOSTokenLogitsProcessor < LogitsProcessor
      def initialize(bos_token_id)
        super()
        @bos_token_id = bos_token_id
      end

      def call(input_ids, logits)
        if input_ids.length == 1
          logits.map! { -Float::INFINITY }
          logits[@bos_token_id] = 0
        end
        logits
      end
    end

    class ForcedEOSTokenLogitsProcessor < LogitsProcessor
      def initialize(max_length, forced_eos_token_id)
        super()
        @max_length = max_length
        @forced_eos_token_id = forced_eos_token_id
      end

      def call(input_ids, logits)
      end
    end
  end
end


================================================
FILE: lib/informers/utils/hub.rb
================================================
module Informers
  module Utils
    module Hub
      class FileResponse
        attr_reader :exists, :status

        def initialize(file_path)
          @file_path = file_path

          @exists = File.exist?(file_path)
          if @exists
            @status = ["200", "OK"]
          else
            @status = ["404", "Not Found"]
          end
        end

        def read
          File.binread(@file_path)
        end
      end

      def self.is_valid_url(string, protocols = nil, valid_hosts = nil)
        begin
          url = URI.parse(string)
        rescue
          return false
        end
        if protocols && !protocols.include?(url.scheme)
          return false
        end
        if valid_hosts && !valid_hosts.include?(url.host)
          return false
        end
        true
      end

      def self.get_file(url_or_path, progress_callback = nil, progress_info = {})
        if !is_valid_url(url_or_path, ["http", "https"])
          raise Error, "Invalid url"
        else
          headers = {}
          headers["User-Agent"] = "informers/#{VERSION};"

          # Check whether we are making a request to the Hugging Face Hub.
          is_hfurl = is_valid_url(url_or_path, ["http", "https"], ["huggingface.co", "hf.co"])
          if is_hfurl
            # If an access token is present in the environment variables,
            # we add it to the request headers.
            token = ENV["HF_TOKEN"]
            if token
              headers["Authorization"] = "Bearer #{token}"
            end
          end
          options = {}
          if progress_callback
            total_size = nil
            options[:content_length_proc] = lambda do |size|
              total_size = size
              Utils.dispatch_callback(progress_callback, {status: "download"}.merge(progress_info).merge(total_size: size))
            end
            options[:progress_proc] = lambda do |size|
              Utils.dispatch_callback(progress_callback, {status: "progress"}.merge(progress_info).merge(size: size, total_size: total_size))
            end
          end
          URI.parse(url_or_path).open(**headers, **options)
        end
      end

      class FileCache
        attr_reader :path

        def initialize(path)
          @path = path
        end

        def match(request)
          file_path = resolve_path(request)
          file = FileResponse.new(file_path)

          file if file.exists
        end

        def put(request, response)
          output_path = resolve_path(request)

          begin
            tmp_path = "#{output_path}.incomplete"
            FileUtils.mkdir_p(File.dirname(output_path))
            File.open(tmp_path, "wb") do |f|
              while !response.eof?
                f.write(response.read(1024 * 1024))
              end
            end
            FileUtils.move(tmp_path, output_path)
          rescue => e
            warn "An error occurred while writing the file to cache: #{e}"
          end
        end

        def resolve_path(request)
          File.join(@path, request)
        end
      end

      def self.try_cache(cache, *names)
        names.each do |name|
          begin
            result = cache.match(name)
            return result if result
          rescue
            next
          end
        end
        nil
      end

      def self.get_model_file(path_or_repo_id, filename, fatal = true, **options)
        # Initiate file retrieval
        Utils.dispatch_callback(options[:progress_callback], {
          status: "initiate",
          name: path_or_repo_id,
          file: filename
        })

        # If `cache_dir` is not specified, use the default cache directory
        cache = FileCache.new(options[:cache_dir] || Informers.cache_dir)

        revision = options[:revision] || "main"

        request_url = path_join(path_or_repo_id, filename)

        remote_url = path_join(
          Informers.remote_host,
          Informers.remote_path_template
            .gsub("{model}", path_or_repo_id)
            .gsub("{revision}", URI.encode_www_form_component(revision)),
          filename
        )

        # Choose cache key for filesystem cache
        # When using the main revision (default), we use the request URL as the cache key.
        # If a specific revision is requested, we account for this in the cache key.
        fs_cache_key = revision == "main" ? request_url : path_join(path_or_repo_id, revision, filename)

        proposed_cache_key = fs_cache_key

        resolved_path = cache.resolve_path(proposed_cache_key)

        # Whether to cache the final response in the end.
        to_cache_response = false

        # A caching system is available, so we try to get the file from it.
        response = try_cache(cache, proposed_cache_key)

        cache_hit = !response.nil?

        if response.nil?
          # File is not cached, so we perform the request

          if response.nil? || response.status[0] == "404"
            # File not found locally. This means either:
            # - The user has disabled local file access (`Informers.allow_local_models = false`)
            # - the path is a valid HTTP url (`response.nil?`)
            # - the path is not a valid HTTP url and the file is not present on the file system or local server (`response.status[0] == "404"`)

            if options[:local_files_only] || !Informers.allow_remote_models
              # User requested local files only, but the file is not found locally.
              if fatal
                raise Error, "`local_files_only: true` or `Informers.allow_remote_models = false` and file was not found locally at #{resolved_path.inspect}."
              else
                # File not found, but this file is optional.
                # TODO in future, cache the response?
                return nil
              end
            end

            progress_info = {
              name: path_or_repo_id,
              file: filename
            }

            # File not found locally, so we try to download it from the remote server
            response = get_file(remote_url, options[:progress_callback], progress_info)

            if response.status[0] != "200"
              # should not happen
              raise Todo
            end

            # Success! We use the proposed cache key from earlier
            cache_key = proposed_cache_key
          end

          to_cache_response = cache && !response.is_a?(FileResponse) && response.status[0] == "200"
        end

        if to_cache_response && cache_key && cache.match(cache_key).nil?
          cache.put(cache_key, response)
        end

        Utils.dispatch_callback(options[:progress_callback], {
          status: "done",
          name: path_or_repo_id,
          file: filename,
          cache_hit: cache_hit
        })

        resolved_path
      end

      def self.get_model_json(model_path, file_name, fatal = true, **options)
        buffer = get_model_file(model_path, file_name, fatal, **options)
        if buffer.nil?
          # Return empty object
          return {}
        end

        JSON.load_file(buffer)
      end

      def self.path_join(*parts)
        parts = parts.map.with_index do |part, index|
          if index != 0
            part = part.delete_prefix("/")
          end
          if index != parts.length - 1
            part = part.delete_suffix("/")
          end
          part
        end
        parts.join("/")
      end

      def self.display_progress(filename, width, size, expected_size)
        bar_width = [width - (filename.length + 3), 1].max
        progress = expected_size && expected_size > 0 ? size / expected_size.to_f : 0
        done = (progress * bar_width).round
        not_done = bar_width - done
        "#{filename} |#{"█" * done}#{" " * not_done}|"
      end
    end
  end
end


================================================
FILE: lib/informers/utils/image.rb
================================================
module Informers
  module Utils
    class RawImage
      RESAMPLING_MAPPING = {
        0 => "nearest",
        1 => "lanczos",
        2 => "bilinear",
        3 => "bicubic",
        4 => "box",
        5 => "hamming"
      }

      attr_reader :image, :width, :height, :channels

      def initialize(image)
        @image = image
        @width = image.width
        @height = image.height
        @channels = image.bands
      end

      def data
        @image.write_to_memory.unpack("C*")
      end

      def size
        [@width, @height]
      end

      def resize(width, height, resample: 2)
        resample_method = RESAMPLING_MAPPING[resample] || resample

        case resample_method
        when "bilinear", "bicubic"
          img =
            @image.affine(
              [width / @width.to_f, 0, 0, height / @height.to_f],
              interpolate: Vips::Interpolate.new(resample_method.to_sym)
            )
        else
          raise Todo
        end

        RawImage.new(img)
      end

      def center_crop(crop_width, crop_height)
        # If the image is already the desired size, return it
        if @width == crop_width && @height == crop_height
          return self
        end

        # Determine bounds of the image in the new canvas
        width_offset = (@width - crop_width) / 2.0
        height_offset = (@height - crop_height) / 2.0

        if width_offset >= 0 && height_offset >= 0
          # Cropped image lies entirely within the original image
          img = @image.crop(
            width_offset.floor,
            height_offset.floor,
            crop_width,
            crop_height
          )
        elsif width_offset <= 0 && height_offset <= 0
          raise Todo
        else
          raise Todo
        end

        RawImage.new(img)
      end

      def rgb
        if @channels == 3
          return self
        end

        raise Todo
      end

      def save(path)
        @image.write_to_file(path)
      end

      def self.read(input)
        if input.is_a?(RawImage)
          input
        elsif input.is_a?(URI)
          require "open-uri"

          RawImage.new(Vips::Image.new_from_buffer(input.read, ""))
        elsif input.is_a?(String)
          RawImage.new(Vips::Image.new_from_file(input))
        else
          raise ArgumentError, "Unsupported input type: #{input.class.name}"
        end
      end

      def self.from_array(input)
        c, h, w = Utils.dims(input)
        pixel_data = Array.new(w * h * c)

        input.each_with_index do |cv, ci|
          cv.each_with_index do |hv, hi|
            hv.each_with_index do |v, wi|
              pixel_data[(hi * w * c) + (wi * c) + ci] = v
            end
          end
        end

        RawImage.new(Vips::Image.new_from_memory_copy(pixel_data.pack("C*"), w, h, c, :uchar))
      end
    end
  end
end


================================================
FILE: lib/informers/utils/math.rb
================================================
module Informers
  module Utils
    def self.interpolate_data(input, in_shape, out_shape, mode = "bilinear", align_corners = false)
      in_channels, in_height, in_width = in_shape
      out_height, out_width = out_shape

      # TODO use mode and align_corners

      # Output image dimensions
      x_scale = out_width / in_width.to_f
      y_scale = out_height / in_height.to_f

      # Output image
      out_img = Array.new(out_height * out_width * in_channels)

      # Pre-calculate strides
      in_stride = in_height * in_width
      out_stride = out_height * out_width

      out_height.times do |i|
        out_width.times do |j|
          # Calculate output offset
          out_offset = i * out_width + j

          # Calculate input pixel coordinates
          x = (j + 0.5) / x_scale - 0.5
          y = (i + 0.5) / y_scale - 0.5

          # Calculate the four nearest input pixels
          # We also check if the input pixel coordinates are within the image bounds
          x1 = x.floor
          y1 = y.floor
          x2 = [x1 + 1, in_width - 1].min
          y2 = [y1 + 1, in_height - 1].min

          x1 = [x1, 0].max
          y1 = [y1, 0].max

          # Calculate the fractional distances between the input pixel and the four nearest pixels
          s = x - x1
          t = y - y1

          # Perform bilinear interpolation
          w1 = (1 - s) * (1 - t)
          w2 = s * (1 - t)
          w3 = (1 - s) * t
          w4 = s * t

          # Calculate the four nearest input pixel indices
          y_stride = y1 * in_width
          x_stride = y2 * in_width
          idx1 = y_stride + x1
          idx2 = y_stride + x2
          idx3 = x_stride + x1
          idx4 = x_stride + x2

          in_channels.times do |k|
            # Calculate channel offset
            c_offset = k * in_stride

            out_img[k * out_stride + out_offset] =
              w1 * input[c_offset + idx1] +
              w2 * input[c_offset + idx2] +
              w3 * input[c_offset + idx3] +
              w4 * input[c_offset + idx4]
          end
        end
      end

      out_img
    end

    def self.softmax(arr)
      # Compute the maximum value in the array
      max_val = arr.max

      #  Compute the exponentials of the array values
      exps = arr.map { |x| Math.exp(x - max_val) }

      # Compute the sum of the exponentials
      sum_exps = exps.sum

      # Compute the softmax values
      softmax_arr = exps.map { |x| x / sum_exps }

      softmax_arr
    end

    def self.sigmoid(arr)
      if arr[0].is_a?(Array)
        return arr.map { |a| sigmoid(a) }
      end
      arr.map { |v| 1 / (1 + Math.exp(-v)) }
    end

    def self.get_top_items(items, top_k = 0)
      # if top == 0, return all

      items = items
        .map.with_index { |x, i| [i, x] } # Get indices ([index, score])
        .sort_by { |v| -v[1] }            # Sort by log probabilities

      if !top_k.nil? && top_k > 0
        items = items.slice(0, top_k)     # Get top k items
      end

      items
    end

    def self.max(arr)
      if arr.length == 0
        raise Error, "Array must not be empty"
      end
      arr.map.with_index.max_by { |v, _| v }
    end
  end
end


================================================
FILE: lib/informers/utils/tensor.rb
================================================
module Informers
  module Utils
    def self.mean_pooling(last_hidden_state, attention_mask)
      last_hidden_state.zip(attention_mask).map do |state, mask|
        state[0].size.times.map do |k|
          sum = 0.0
          count = 0

          state.zip(mask) do |s, m|
            count += m
            sum += s[k] * m
          end

          sum / count
        end
      end
    end

    def self.normalize(result)
      result.map do |row|
        norm = Math.sqrt(row.sum { |v| v * v })
        row.map { |v| v / norm }
      end
    end

    def self.stack(tensors, dim = 0)
      tensors
    end

    def self.ones_like(tensor)
      if tensor[0].is_a?(Array)
        return tensor.map { |v| ones_like(v) }
      end
      tensor.map { |_| 1 }
    end

    def self.dims(tensor)
      dims = []
      while tensor.is_a?(Array)
        dims << tensor.size
        tensor = tensor[0]
      end
      dims
    end

    def self.interpolate(input, shape, mode = "bilinear", align_corners = false)
      out_height, out_width = shape

      # Input image dimensions
      in_channels = dims(input)[-3] || 1
      in_height = dims(input)[-2]
      in_width = dims(input)[-1]

      output = interpolate_data(
        input.flatten,
        [in_channels, in_height, in_width],
        [out_height, out_width],
        mode,
        align_corners
      )
      reshape(output, [in_channels, out_height, out_width])
    end

    def self.reshape(arr, dims)
      arr = arr.flatten
      dims[1..-1].reverse_each do |dim|
        arr = arr.each_slice(dim)
      end
      arr.to_a
    end
  end
end


================================================
FILE: lib/informers/version.rb
================================================
module Informers
  VERSION = "1.2.1"
end


================================================
FILE: lib/informers.rb
================================================
# dependencies
require "onnxruntime"
require "tokenizers"

# stdlib
require "io/console"
require "json"
require "open-uri"
require "open3"
require "stringio"
require "uri"

# modules
require_relative "informers/backends/onnx"
require_relative "informers/utils/audio"
require_relative "informers/utils/core"
require_relative "informers/utils/dtypes"
require_relative "informers/utils/generation"
require_relative "informers/utils/ffmpeg"
require_relative "informers/utils/hub"
require_relative "informers/utils/image"
require_relative "informers/utils/math"
require_relative "informers/utils/tensor"
require_relative "informers/configs"
require_relative "informers/env"
require_relative "informers/model"
require_relative "informers/models"
require_relative "informers/processors"
require_relative "informers/tokenizers"
require_relative "informers/version"
require_relative "informers/pipelines"

module Informers
  class Error < StandardError; end

  class Todo < Error
    def message
      "not implemented yet"
    end
  end
end


================================================
FILE: test/model_test.rb
================================================
require_relative "test_helper"

class ModelTest < Minitest::Test
  # https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
  def test_all_minilm
    sentences = ["This is an example sentence", "Each sentence is converted"]

    model = Informers.pipeline("embedding", "sentence-transformers/all-MiniLM-L6-v2")
    embeddings = model.(sentences)

    assert_elements_in_delta [0.067657, 0.063496, 0.048713], embeddings[0][..2]
    assert_elements_in_delta [0.086439, 0.10276, 0.0053946], embeddings[1][..2]
  end

  # https://huggingface.co/Xenova/all-MiniLM-L6-v2
  def test_all_minilm_xenova
    sentences = ["This is an example sentence", "Each sentence is converted"]

    model = Informers.pipeline("embedding", "Xenova/all-MiniLM-L6-v2", dtype: "q8")
    embeddings = model.(sentences)

    assert_elements_in_delta [0.045927, 0.07328, 0.05401], embeddings[0][..2]
    assert_elements_in_delta [0.081881, 0.1076, -0.01324], embeddings[1][..2]
  end

  # https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1
  def test_multi_qa_minilm
    query = "How many people live in London?"
    docs = ["Around 9 Million people live in London", "London is known for its financial district"]

    model = Informers.pipeline("embedding", "sentence-transformers/multi-qa-MiniLM-L6-cos-v1")
    query_embedding = model.(query)
    doc_embeddings = model.(docs)
    scores = doc_embeddings.map { |e| e.zip(query_embedding).sum { |d, q| d * q } }
    doc_score_pairs = docs.zip(scores).sort_by { |d, s| -s }

    assert_equal "Around 9 Million people live in London", doc_score_pairs[0][0]
    assert_in_delta 0.9156, doc_score_pairs[0][1]
    assert_equal "London is known for its financial district", doc_score_pairs[1][0]
    assert_in_delta 0.4948, doc_score_pairs[1][1]
  end

  # https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2
  def test_paraphrase_minilm
    sentences = ["This is an example sentence", "Each sentence is converted"]

    model = Informers.pipeline("embedding", "sentence-transformers/paraphrase-MiniLM-L6-v2")
    embeddings = model.(sentences, normalize: false)

    assert_elements_in_delta [0.067359, 0.783935, 0.270018], embeddings[0][..2]
    assert_elements_in_delta [0.122117, 0.670228, 0.317166], embeddings[1][..2]
  end

  # https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1
  def test_mxbai_embed
    query_prefix = "Represent this sentence for searching relevant passages: "

    input = [
      "The dog is barking",
      "The cat is purring",
      query_prefix + "puppy"
    ]

    model = Informers.pipeline("embedding", "mixedbread-ai/mxbai-embed-large-v1")
    embeddings = model.(input, pooling: "cls", normalize: false)

    assert_elements_in_delta [-0.61227727, 1.4060247, -0.04079155], embeddings[1][..2]
    assert_elements_in_delta [-0.00624076, 0.12864432, 0.5248165], embeddings[-1][..2]
  end

  # https://huggingface.co/Supabase/gte-small
  def test_gte_small
    sentences = ["That is a happy person", "That is a very happy person"]

    model = Informers.pipeline("embedding", "Supabase/gte-small")
    embeddings = model.(sentences)

    assert_elements_in_delta [-0.05316979, 0.01044252, 0.06194701], embeddings[0][..2]
    assert_elements_in_delta [-0.05246907, 0.03752426, 0.07344585], embeddings[-1][..2]
  end

  # https://huggingface.co/intfloat/e5-base-v2
  def test_e5_base
    doc_prefix = "passage: "
    query_prefix = "query: "

    input = [
      doc_prefix + "Ruby is a programming language created by Matz",
      query_prefix + "Ruby creator"
    ]

    model = Informers.pipeline("embedding", "intfloat/e5-base-v2")
    embeddings = model.(input)

    assert_elements_in_delta [-0.00596662, -0.03730119, -0.0703470], embeddings[0][..2]
    assert_elements_in_delta [0.00298353, -0.04421991, -0.0591884], embeddings[-1][..2]
  end

  # https://huggingface.co/nomic-ai/nomic-embed-text-v1
  def test_nomic_embed
    doc_prefix = "search_document: "
    query_prefix = "search_query: "

    input = [
      doc_prefix + "The dog is barking",
      query_prefix + "puppy"
    ]

    model = Informers.pipeline("embedding", "nomic-ai/nomic-embed-text-v1")
    embeddings = model.(input)

    assert_elements_in_delta [-0.00645858, 0.01145126, 0.0099767], embeddings[0][..2]
    assert_elements_in_delta [-0.01173127, 0.04957652, -0.0176401], embeddings[-1][..2]
  end

  # https://huggingface.co/BAAI/bge-base-en-v1.5
  def test_bge_base
    query_prefix = "Represent this sentence for searching relevant passages: "

    input = [
      "The dog is barking",
      "The cat is purring",
      query_prefix + "puppy"
    ]

    model = Informers.pipeline("embedding", "BAAI/bge-base-en-v1.5")
    embeddings = model.(input)

    assert_elements_in_delta [-0.07482512, -0.0770234, 0.03398684], embeddings[1][..2]
    assert_elements_in_delta [0.00029264, -0.0619305, -0.06199387], embeddings[-1][..2]
  end

  # https://huggingface.co/jinaai/jina-embeddings-v2-base-en
  def test_jina_embeddings
    sentences = ["How is the weather today?", "What is the current weather like today?"]

    model = Informers.pipeline("embedding", "jinaai/jina-embeddings-v2-base-en", model_file_name: "../model")
    embeddings = model.(sentences)

    assert_elements_in_delta [-0.02488641, -0.0429398, 0.04303398], embeddings[0][..2]
    assert_elements_in_delta [-0.0081194, -0.06225249, 0.03116853], embeddings[1][..2]
  end

  # https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5
  def test_snowflake_arctic_embed
    query_prefix = "Represent this sentence for searching relevant passages: "

    input = [
      "The dog is barking",
      "The cat is purring",
      query_prefix + "puppy"
    ]

    model = Informers.pipeline("embedding", "Snowflake/snowflake-arctic-embed-m-v1.5")
    embeddings = model.(input, model_output: "sentence_embedding", pooling: "none")

    assert_elements_in_delta [0.03239886, 0.0009998, 0.08401278], embeddings[0][..2]
    assert_elements_in_delta [-0.02530634, -0.02715422, 0.01218867], embeddings[-1][..2]

    embeddings = model.(input, model_output: "token_embeddings", pooling: "cls")

    assert_elements_in_delta [0.03239886, 0.0009998, 0.08401278], embeddings[0][..2]
    assert_elements_in_delta [-0.02530634, -0.02715422, 0.01218867], embeddings[-1][..2]
  end

  # https://huggingface.co/sentence-transformers/all-mpnet-base-v2
  def test_all_mpnet
    sentences = ["This is an example sentence", "Each sentence is converted"]

    model = Informers.pipeline("embedding", "sentence-transformers/all-mpnet-base-v2")
    embeddings = model.(sentences)

    assert_elements_in_delta [0.02250263, -0.07829167, -0.02303071], embeddings[0][..2]
    assert_elements_in_delta [0.04170236, 0.00109747, -0.01553415], embeddings[1][..2]
  end

  # https://huggingface.co/BAAI/bge-m3
  def test_bge_m3
    sentences = ["This is an example sentence", "Each sentence is converted"]

    model = Informers.pipeline("embedding", "BAAI/bge-m3")
    model.(sentences, model_output: "token_embeddings")
  end

  # https://huggingface.co/mixedbread-ai/mxbai-rerank-base-v1
  def test_mxbai_rerank
    query = "How many people live in London?"
    docs = ["Around 9 Million people live in London", "London is known for its financial district"]

    model = Informers.pipeline("reranking", "mixedbread-ai/mxbai-rerank-base-v1")
    result = model.(query, docs, return_documents: true)

    assert_equal 0, result[0][:doc_id]
    assert_in_delta 0.984, result[0][:score]
    assert_equal docs[0], result[0][:text]

    assert_equal 1, result[1][:doc_id]
    assert_in_delta 0.139, result[1][:score]
    assert_equal docs[1], result[1][:text]
  end

  # https://huggingface.co/jinaai/jina-reranker-v1-turbo-en
  def test_jina_reranker
    query = "How many people live in London?"
    docs = ["Around 9 Million people live in London", "London is known for its financial district"]

    model = Informers.pipeline("reranking", "jinaai/jina-reranker-v1-turbo-en")
    result = model.(query, docs, return_documents: true)

    assert_equal 0, result[0][:doc_id]
    assert_in_delta 0.912, result[0][:score]
    assert_equal docs[0], result[0][:text]

    assert_equal 1, result[1][:doc_id]
    assert_in_delta 0.0555, result[1][:score]
    assert_equal docs[1], result[1][:text]
  end

  # https://huggingface.co/BAAI/bge-reranker-base
  def test_bge_reranker
    query = "How many people live in London?"
    docs = ["Around 9 Million people live in London", "London is known for its financial district"]

    model = Informers.pipeline("reranking", "BAAI/bge-reranker-base")
    result = model.(query, docs, return_documents: true)

    assert_equal 0, result[0][:doc_id]
    assert_in_delta 0.996, result[0][:score]
    assert_equal docs[0], result[0][:text]

    assert_equal 1, result[1][:doc_id]
    assert_in_delta 0.000158, result[1][:score], 0.000001
    assert_equal docs[1], result[1][:text]
  end

  # https://huggingface.co/Xenova/ms-marco-MiniLM-L-6-v2
  def test_ms_marco_minilm
    query = "How many people live in London?"
    docs = ["Around 9 Million people live in London", "London is known for its financial district"]

    model = Informers.pipeline("reranking", "Xenova/ms-marco-MiniLM-L-6-v2")
    result = model.(query, docs, return_documents: true)

    assert_equal 0, result[0][:doc_id]
    assert_in_delta 1, result[0][:score]
    assert_equal docs[0], result[0][:text]

    assert_equal 1, result[1][:doc_id]
    assert_in_delta 0.0067, result[1][:score]
    assert_equal docs[1], result[1][:text]
  end
end


================================================
FILE: test/pipeline_test.rb
================================================
require_relative "test_helper"

class PipelineTest < Minitest::Test
  def test_ner
    ner = Informers.pipeline("ner")
    result = ner.("Ruby is a programming language created by Matz")
    assert_equal 1, result.size
    assert_equal "PER", result[0][:entity_group]
    assert_in_delta 0.994, result[0][:score]
    assert_equal "Matz", result[0][:word]
    assert_equal 42, result[0][:start]
    assert_equal 46, result[0][:end]
  end

  def test_ner_aggregation_strategy
    ner = Informers.pipeline("ner")
    result = ner.("Ruby is a programming language created by Matz", aggregation_strategy: "none")
    assert_equal 2, result.size
    assert_equal "B-PER", result[0][:entity]
    assert_in_delta 0.996, result[0][:score]
    assert_equal 8, result[0][:index]
    assert_equal "Mat", result[0][:word]
    assert_equal 42, result[0][:start]
    assert_equal 45, result[0][:end]
  end

  def test_sentiment_analysis
    classifier = Informers.pipeline("sentiment-analysis")
    result = classifier.("I love transformers!")
    assert_equal "POSITIVE", result[:label]
    assert_in_delta 0.9997887, result[:score], 0.0000001

    result = classifier.("This is super cool")
    assert_equal "POSITIVE", result[:label]
    assert_in_delta 0.9998608, result[:score], 0.0000001

    result = classifier.(["This is super cool", "I didn't like it"])
    assert_equal "POSITIVE", result[0][:label]
    assert_in_delta 0.9998600, result[0][:score], 0.0000001
    assert_equal "NEGATIVE", result[1][:label]
    assert_in_delta 0.9985375, result[1][:score], 0.0000001
  end

  def test_question_answering
    qa = Informers.pipeline("question-answering")
    result = qa.("Who invented Ruby?", "Ruby is a programming language created by Matz")
    assert_in_delta 0.998, result[:score]
    assert_equal "Matz", result[:answer]
    assert_equal 42, result[:start]
    assert_equal 46, result[:end]
  end

  def test_zero_shot_classification
    classifier = Informers.pipeline("zero-shot-classification")
    text = "Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app."
    labels = ["mobile", "billing", "website", "account access"]
    result = classifier.(text, labels)
    assert_equal text, result[:sequence]
    assert_equal ["mobile", "billing", "account access", "website"], result[:labels]
    assert_elements_in_delta [0.633, 0.134, 0.121, 0.111], result[:scores]
  end

  def test_text2text_generation
    text2text = Informers.pipeline("text2text-generation")
    result = text2text.("translate from English to French: I'm very happy")
    assert_equal "Je suis très heureux.", result[0][:generated_text]
  end

  def test_translation
    translator = Informers.pipeline("translation", "Xenova/nllb-200-distilled-600M")
    result = translator.("जीवन एक चॉकलेट बॉक्स की तरह है।", src_lang: "hin_Deva", tgt_lang: "fra_Latn")
    assert_equal "La vie est comme une boîte à chocolat.", result[0][:translation_text]
  end

  def test_text_generation
    generator = Informers.pipeline("text-generation")
    result = generator.("I enjoy walking with my cute dog,")
    assert_equal "I enjoy walking with my cute dog, but I'm not sure if I'll ever be able to", result[0][:generated_text]
Download .txt
gitextract_hoze518r/

├── .github/
│   └── workflows/
│       └── build.yml
├── .gitignore
├── CHANGELOG.md
├── Gemfile
├── LICENSE.txt
├── README.md
├── Rakefile
├── informers.gemspec
├── lib/
│   ├── informers/
│   │   ├── backends/
│   │   │   └── onnx.rb
│   │   ├── configs.rb
│   │   ├── env.rb
│   │   ├── model.rb
│   │   ├── models.rb
│   │   ├── pipelines.rb
│   │   ├── processors.rb
│   │   ├── tokenizers.rb
│   │   ├── utils/
│   │   │   ├── audio.rb
│   │   │   ├── core.rb
│   │   │   ├── dtypes.rb
│   │   │   ├── ffmpeg.rb
│   │   │   ├── generation.rb
│   │   │   ├── hub.rb
│   │   │   ├── image.rb
│   │   │   ├── math.rb
│   │   │   └── tensor.rb
│   │   └── version.rb
│   └── informers.rb
└── test/
    ├── model_test.rb
    ├── pipeline_test.rb
    └── test_helper.rb
Download .txt
SYMBOL INDEX (497 symbols across 22 files)

FILE: lib/informers.rb
  type Informers (line 33) | module Informers
    class Error (line 34) | class Error < StandardError; end
    class Todo (line 36) | class Todo < Error
      method message (line 37) | def message

FILE: lib/informers/backends/onnx.rb
  type Informers (line 1) | module Informers
    type Backends (line 2) | module Backends
      type Onnx (line 3) | module Onnx
        function device_to_execution_providers (line 4) | def self.device_to_execution_providers(device)

FILE: lib/informers/configs.rb
  type Informers (line 1) | module Informers
    class PretrainedConfig (line 2) | class PretrainedConfig
      method initialize (line 3) | def initialize(config_json)
      method [] (line 7) | def [](key)
      method []= (line 11) | def []=(key, value)
      method to_h (line 15) | def to_h
      method from_pretrained (line 19) | def self.from_pretrained(
      method load_config (line 39) | def self.load_config(pretrained_model_name_or_path, **options)
    class AutoConfig (line 45) | class AutoConfig
      method from_pretrained (line 46) | def self.from_pretrained(...)

FILE: lib/informers/env.rb
  type Informers (line 1) | module Informers

FILE: lib/informers/model.rb
  type Informers (line 1) | module Informers
    class Model (line 3) | class Model
      method initialize (line 4) | def initialize(model_id, quantized: false)
      method embed (line 9) | def embed(texts)

FILE: lib/informers/models.rb
  type Informers (line 1) | module Informers
    class PretrainedMixin (line 16) | class PretrainedMixin
      method from_pretrained (line 17) | def self.from_pretrained(
    class PreTrainedModel (line 69) | class PreTrainedModel
      method initialize (line 74) | def initialize(config, session)
      method from_pretrained (line 110) | def self.from_pretrained(
      method construct_session (line 178) | def self.construct_session(pretrained_model_name_or_path, file_name,...
      method call (line 210) | def call(model_inputs, **kwargs)
      method generate (line 214) | def generate(inputs, generation_config = nil, logits_processor = nil...
      method get_logits_processor (line 353) | def get_logits_processor(
      method get_generation_config (line 410) | def get_generation_config(generation_config)
      method seq2seq_forward (line 429) | def seq2seq_forward(model_inputs)
      method prepare_position_ids (line 465) | def prepare_position_ids(session, feeds, use_cache_branch)
      method get_past_key_values (line 473) | def get_past_key_values(decoder_results, past_key_values)
      method get_attentions (line 493) | def get_attentions(decoder_results)
      method add_past_key_values (line 509) | def add_past_key_values(decoder_feeds, past_key_values)
      method seq2seq_start_beams (line 541) | def seq2seq_start_beams(input_token_ids, generation_config, num_outp...
      method prepare_attention_mask (line 587) | def prepare_attention_mask(tokens)
      method seq2seq_run_beam (line 605) | def seq2seq_run_beam(beam)
      method seq2seq_update_beam (line 636) | def seq2seq_update_beam(beam, new_token_id)
      method group_beams (line 640) | def group_beams(beams)
      method encoder_forward (line 653) | def encoder_forward(model_inputs, output_names: nil)
      method decoder_forward (line 665) | def decoder_forward(model_inputs)
      method decoder_start_beams (line 691) | def decoder_start_beams(input_token_ids, generation_config, num_outp...
      method decoder_run_beam (line 730) | def decoder_run_beam(beam)
      method decoder_update_beam (line 749) | def decoder_update_beam(beam, new_token_id)
      method session_run (line 754) | def session_run(session, inputs, output_names: nil)
      method replace_tensors (line 766) | def replace_tensors(obj)
      method validate_inputs (line 771) | def validate_inputs(session, inputs)
      method get_start_beams (line 775) | def get_start_beams(input_token_ids, generation_config, num_output_t...
      method run_beam (line 779) | def run_beam(beam)
      method update_beam (line 783) | def update_beam(beam, new_token_id)
    class BertPreTrainedModel (line 788) | class BertPreTrainedModel < PreTrainedModel
    class BertModel (line 791) | class BertModel < BertPreTrainedModel
    class BertForMaskedLM (line 794) | class BertForMaskedLM < BertPreTrainedModel
      method call (line 795) | def call(model_inputs)
    class BertForSequenceClassification (line 800) | class BertForSequenceClassification < BertPreTrainedModel
      method call (line 801) | def call(model_inputs)
    class BertForTokenClassification (line 806) | class BertForTokenClassification < BertPreTrainedModel
      method call (line 807) | def call(model_inputs)
    class ModernBertPreTrainedModel (line 812) | class ModernBertPreTrainedModel < PreTrainedModel
    class ModernBertModel (line 815) | class ModernBertModel < ModernBertPreTrainedModel
    class ModernBertForMaskedLM (line 818) | class ModernBertForMaskedLM < ModernBertPreTrainedModel
      method call (line 819) | def call(model_inputs)
    class ModernBertForSequenceClassification (line 824) | class ModernBertForSequenceClassification < ModernBertPreTrainedModel
      method call (line 825) | def call(model_inputs)
    class ModernBertForTokenClassification (line 830) | class ModernBertForTokenClassification < ModernBertPreTrainedModel
      method call (line 831) | def call(model_inputs)
    class NomicBertPreTrainedModel (line 836) | class NomicBertPreTrainedModel < PreTrainedModel
    class NomicBertModel (line 839) | class NomicBertModel < NomicBertPreTrainedModel
    class ConvBertPreTrainedModel (line 842) | class ConvBertPreTrainedModel < PreTrainedModel
    class ConvBertModel (line 845) | class ConvBertModel < ConvBertPreTrainedModel
    class ElectraPreTrainedModel (line 848) | class ElectraPreTrainedModel < PreTrainedModel
    class ElectraModel (line 852) | class ElectraModel < ElectraPreTrainedModel
    class DebertaV2PreTrainedModel (line 855) | class DebertaV2PreTrainedModel < PreTrainedModel
    class DebertaV2Model (line 858) | class DebertaV2Model < DebertaV2PreTrainedModel
    class DistilBertPreTrainedModel (line 861) | class DistilBertPreTrainedModel < PreTrainedModel
    class DistilBertModel (line 864) | class DistilBertModel < DistilBertPreTrainedModel
    class DistilBertForSequenceClassification (line 867) | class DistilBertForSequenceClassification < DistilBertPreTrainedModel
      method call (line 868) | def call(model_inputs)
    class DistilBertForQuestionAnswering (line 873) | class DistilBertForQuestionAnswering < DistilBertPreTrainedModel
      method call (line 874) | def call(model_inputs)
    class MPNetPreTrainedModel (line 879) | class MPNetPreTrainedModel < PreTrainedModel
    class MPNetModel (line 882) | class MPNetModel < MPNetPreTrainedModel
    class T5PreTrainedModel (line 885) | class T5PreTrainedModel < PreTrainedModel
    class T5Model (line 888) | class T5Model < T5PreTrainedModel
    class T5ForConditionalGeneration (line 891) | class T5ForConditionalGeneration < T5PreTrainedModel
      method initialize (line 892) | def initialize(config, session, decoder_merged_session, generation_c...
    class BartPretrainedModel (line 907) | class BartPretrainedModel < PreTrainedModel
    class BartModel (line 910) | class BartModel < BartPretrainedModel
    class BartForConditionalGeneration (line 913) | class BartForConditionalGeneration < BartPretrainedModel
      method initialize (line 914) | def initialize(config, session, decoder_merged_session, generation_c...
    class BartForSequenceClassification (line 929) | class BartForSequenceClassification < BartPretrainedModel
      method call (line 930) | def call(model_inputs)
    class MBartPreTrainedModel (line 935) | class MBartPreTrainedModel < PreTrainedModel
    class MBartModel (line 938) | class MBartModel < MBartPreTrainedModel
    class MBartForCausalLM (line 941) | class MBartForCausalLM < MBartPreTrainedModel
      method initialize (line 945) | def initialize(config, decoder_merged_session, generation_config)
    class M2M100PreTrainedModel (line 959) | class M2M100PreTrainedModel < PreTrainedModel
    class M2M100Model (line 962) | class M2M100Model < M2M100PreTrainedModel
    class M2M100ForConditionalGeneration (line 965) | class M2M100ForConditionalGeneration < M2M100PreTrainedModel
      method initialize (line 966) | def initialize(config, session, decoder_merged_session, generation_c...
    class Wav2Vec2PreTrainedModel (line 981) | class Wav2Vec2PreTrainedModel < PreTrainedModel
    class Wav2Vec2Model (line 984) | class Wav2Vec2Model < Wav2Vec2PreTrainedModel
    class Wav2Vec2ForSequenceClassification (line 987) | class Wav2Vec2ForSequenceClassification < Wav2Vec2PreTrainedModel
      method call (line 988) | def call(model_inputs)
    class RobertaPreTrainedModel (line 993) | class RobertaPreTrainedModel < PreTrainedModel
    class RobertaModel (line 996) | class RobertaModel < RobertaPreTrainedModel
    class RobertaForMaskedLM (line 999) | class RobertaForMaskedLM < RobertaPreTrainedModel
      method call (line 1000) | def call(model_inputs)
    class RobertaForTokenClassification (line 1005) | class RobertaForTokenClassification <  RobertaPreTrainedModel
      method call (line 1006) | def call(model_inputs)
    class RobertaForSequenceClassification (line 1011) | class RobertaForSequenceClassification < RobertaPreTrainedModel
      method call (line 1012) | def call(model_inputs)
    class XLMRobertaPreTrainedModel (line 1017) | class XLMRobertaPreTrainedModel < PreTrainedModel
    class XLMRobertaModel (line 1020) | class XLMRobertaModel < XLMRobertaPreTrainedModel
    class XLMRobertaForSequenceClassification (line 1023) | class XLMRobertaForSequenceClassification < XLMRobertaPreTrainedModel
      method call (line 1024) | def call(model_inputs)
    class ViTPreTrainedModel (line 1029) | class ViTPreTrainedModel < PreTrainedModel
    class ViTModel (line 1032) | class ViTModel < ViTPreTrainedModel
    class ViTForImageClassification (line 1035) | class ViTForImageClassification < ViTPreTrainedModel
      method call (line 1036) | def call(model_inputs)
    class CLIPPreTrainedModel (line 1041) | class CLIPPreTrainedModel < PreTrainedModel
    class CLIPModel (line 1044) | class CLIPModel < CLIPPreTrainedModel
    class GPT2PreTrainedModel (line 1047) | class GPT2PreTrainedModel < PreTrainedModel
      method initialize (line 1050) | def initialize(config, session, generation_config)
    class GPT2Model (line 1063) | class GPT2Model < GPT2PreTrainedModel
    class GPT2LMHeadModel (line 1066) | class GPT2LMHeadModel < GPT2PreTrainedModel
    class OwlViTPreTrainedModel (line 1069) | class OwlViTPreTrainedModel < PreTrainedModel
    class OwlViTModel (line 1072) | class OwlViTModel < OwlViTPreTrainedModel
    class OwlViTForObjectDetection (line 1075) | class OwlViTForObjectDetection < OwlViTPreTrainedModel
    class DetrPreTrainedModel (line 1078) | class DetrPreTrainedModel < PreTrainedModel
    class DetrModel (line 1081) | class DetrModel < DetrPreTrainedModel
    class DetrForObjectDetection (line 1084) | class DetrForObjectDetection < DetrPreTrainedModel
      method call (line 1085) | def call(model_inputs)
    class DetrForSegmentation (line 1090) | class DetrForSegmentation < DetrPreTrainedModel
      method call (line 1091) | def call(model_inputs)
    class Swin2SRPreTrainedModel (line 1096) | class Swin2SRPreTrainedModel < PreTrainedModel
    class Swin2SRModel (line 1099) | class Swin2SRModel < Swin2SRPreTrainedModel
    class Swin2SRForImageSuperResolution (line 1102) | class Swin2SRForImageSuperResolution < Swin2SRPreTrainedModel
    class DPTPreTrainedModel (line 1105) | class DPTPreTrainedModel < PreTrainedModel
    class DPTModel (line 1108) | class DPTModel < DPTPreTrainedModel
    class DPTForDepthEstimation (line 1111) | class DPTForDepthEstimation < DPTPreTrainedModel
    class VisionEncoderDecoderModel (line 1114) | class VisionEncoderDecoderModel < PreTrainedModel
      method initialize (line 1117) | def initialize(config, session, decoder_merged_session, generation_c...
    class DonutSwinPreTrainedModel (line 1161) | class DonutSwinPreTrainedModel < PreTrainedModel
    class DonutSwinModel (line 1164) | class DonutSwinModel < DonutSwinPreTrainedModel
    class WhisperPreTrainedModel (line 1167) | class WhisperPreTrainedModel < PreTrainedModel
    class WhisperModel (line 1170) | class WhisperModel < WhisperPreTrainedModel
    class WhisperForConditionalGeneration (line 1173) | class WhisperForConditionalGeneration < WhisperPreTrainedModel
      method initialize (line 1177) | def initialize(config, session, decoder_merged_session, generation_c...
      method generate (line 1191) | def generate(inputs, generation_config = nil, logits_processor = nil)
    class VitsPreTrainedModel (line 1196) | class VitsPreTrainedModel < PreTrainedModel
    class VitsModel (line 1199) | class VitsModel < VitsPreTrainedModel
      method call (line 1200) | def call(model_inputs)
    class SpeechT5PreTrainedModel (line 1205) | class SpeechT5PreTrainedModel < PreTrainedModel
    class SpeechT5Model (line 1208) | class SpeechT5Model < SpeechT5PreTrainedModel
    class SpeechT5ForSpeechToText (line 1211) | class SpeechT5ForSpeechToText < SpeechT5PreTrainedModel
    class SpeechT5ForTextToSpeech (line 1214) | class SpeechT5ForTextToSpeech < SpeechT5PreTrainedModel
    class ClapPreTrainedModel (line 1217) | class ClapPreTrainedModel < PreTrainedModel
    class ClapModel (line 1220) | class ClapModel < ClapPreTrainedModel
    class AutoModel (line 1392) | class AutoModel < PretrainedMixin
    class AutoModelForSequenceClassification (line 1397) | class AutoModelForSequenceClassification < PretrainedMixin
    class AutoModelForTokenClassification (line 1401) | class AutoModelForTokenClassification < PretrainedMixin
    class AutoModelForSeq2SeqLM (line 1405) | class AutoModelForSeq2SeqLM < PretrainedMixin
    class AutoModelForSpeechSeq2Seq (line 1409) | class AutoModelForSpeechSeq2Seq < PretrainedMixin
    class AutoModelForTextToSpectrogram (line 1413) | class AutoModelForTextToSpectrogram < PretrainedMixin
    class AutoModelForTextToWaveform (line 1417) | class AutoModelForTextToWaveform < PretrainedMixin
    class AutoModelForCausalLM (line 1421) | class AutoModelForCausalLM < PretrainedMixin
    class AutoModelForMaskedLM (line 1425) | class AutoModelForMaskedLM < PretrainedMixin
    class AutoModelForQuestionAnswering (line 1429) | class AutoModelForQuestionAnswering < PretrainedMixin
    class AutoModelForVision2Seq (line 1433) | class AutoModelForVision2Seq < PretrainedMixin
    class AutoModelForImageClassification (line 1437) | class AutoModelForImageClassification < PretrainedMixin
    class AutoModelForImageSegmentation (line 1441) | class AutoModelForImageSegmentation < PretrainedMixin
    class AutoModelForSemanticSegmentation (line 1445) | class AutoModelForSemanticSegmentation < PretrainedMixin
    class AutoModelForObjectDetection (line 1449) | class AutoModelForObjectDetection < PretrainedMixin
    class AutoModelForZeroShotObjectDetection (line 1453) | class AutoModelForZeroShotObjectDetection < PretrainedMixin
    class AutoModelForMaskGeneration (line 1457) | class AutoModelForMaskGeneration < PretrainedMixin
    class AutoModelForCTC (line 1461) | class AutoModelForCTC < PretrainedMixin
    class AutoModelForAudioClassification (line 1465) | class AutoModelForAudioClassification < PretrainedMixin
    class AutoModelForXVector (line 1469) | class AutoModelForXVector < PretrainedMixin
    class AutoModelForAudioFrameClassification (line 1473) | class AutoModelForAudioFrameClassification < PretrainedMixin
    class AutoModelForDocumentQuestionAnswering (line 1477) | class AutoModelForDocumentQuestionAnswering < PretrainedMixin
    class AutoModelForImageMatting (line 1481) | class AutoModelForImageMatting < PretrainedMixin
    class AutoModelForImageToImage (line 1485) | class AutoModelForImageToImage < PretrainedMixin
    class AutoModelForDepthEstimation (line 1489) | class AutoModelForDepthEstimation < PretrainedMixin
    class AutoModelForImageFeatureExtraction (line 1493) | class AutoModelForImageFeatureExtraction < PretrainedMixin
    class ModelOutput (line 1497) | class ModelOutput
      method [] (line 1498) | def [](key)
    class Seq2SeqLMOutput (line 1503) | class Seq2SeqLMOutput < ModelOutput
      method initialize (line 1504) | def initialize(logits, past_key_values, encoder_outputs, decoder_att...
    class SequenceClassifierOutput (line 1514) | class SequenceClassifierOutput < ModelOutput
      method initialize (line 1517) | def initialize(logits)
    class TokenClassifierOutput (line 1523) | class TokenClassifierOutput < ModelOutput
      method initialize (line 1526) | def initialize(logits)
    class MaskedLMOutput (line 1532) | class MaskedLMOutput < ModelOutput
      method initialize (line 1535) | def initialize(logits)
    class QuestionAnsweringModelOutput (line 1541) | class QuestionAnsweringModelOutput < ModelOutput
      method initialize (line 1544) | def initialize(start_logits, end_logits)
    class DetrObjectDetectionOutput (line 1551) | class DetrObjectDetectionOutput < ModelOutput
      method initialize (line 1554) | def initialize(logits, pred_boxes)
    class DetrSegmentationOutput (line 1561) | class DetrSegmentationOutput < ModelOutput
      method initialize (line 1564) | def initialize(logits, pred_boxes, pred_masks)

FILE: lib/informers/pipelines.rb
  type Informers (line 1) | module Informers
    class Pipeline (line 2) | class Pipeline
      method initialize (line 3) | def initialize(task:, model:, tokenizer: nil, processor: nil)
      method prepare_images (line 13) | def prepare_images(images)
      method prepare_audios (line 22) | def prepare_audios(audios, sampling_rate)
      method get_bounding_box (line 36) | def get_bounding_box(box, as_integer)
    class TextClassificationPipeline (line 46) | class TextClassificationPipeline < Pipeline
      method call (line 47) | def call(texts, top_k: 1)
    class TokenClassificationPipeline (line 88) | class TokenClassificationPipeline < Pipeline
      method call (line 89) | def call(
      method group_sub_entities (line 160) | def group_sub_entities(entities)
      method get_tag (line 176) | def get_tag(entity_name)
      method group_entities (line 192) | def group_entities(entities)
    class QuestionAnsweringPipeline (line 228) | class QuestionAnsweringPipeline < Pipeline
      method call (line 229) | def call(question, context, top_k: 1)
    class FillMaskPipeline (line 280) | class FillMaskPipeline < Pipeline
      method call (line 281) | def call(texts, top_k: 5)
    class Text2TextGenerationPipeline (line 314) | class Text2TextGenerationPipeline < Pipeline
      method call (line 317) | def call(texts, **generate_kwargs)
    class SummarizationPipeline (line 356) | class SummarizationPipeline < Text2TextGenerationPipeline
    class TranslationPipeline (line 360) | class TranslationPipeline < Text2TextGenerationPipeline
    class TextGenerationPipeline (line 364) | class TextGenerationPipeline < Pipeline
      method call (line 365) | def call(texts, **generate_kwargs)
    class ZeroShotClassificationPipeline (line 420) | class ZeroShotClassificationPipeline < Pipeline
      method initialize (line 421) | def initialize(**options)
      method call (line 439) | def call(texts, candidate_labels, hypothesis_template: "This example...
    class ImageToTextPipeline (line 499) | class ImageToTextPipeline < Pipeline
      method call (line 500) | def call(images, **generate_kwargs)
    class ImageClassificationPipeline (line 520) | class ImageClassificationPipeline < Pipeline
      method call (line 521) | def call(images, top_k: 1)
    class ImageSegmentationPipeline (line 551) | class ImageSegmentationPipeline < Pipeline
      method initialize (line 552) | def initialize(**options)
      method call (line 562) | def call(
    class ZeroShotImageClassificationPipeline (line 627) | class ZeroShotImageClassificationPipeline < Pipeline
      method call (line 628) | def call(images, candidate_labels, hypothesis_template: "This is a p...
    class ObjectDetectionPipeline (line 671) | class ObjectDetectionPipeline < Pipeline
      method call (line 672) | def call(images, threshold: 0.9, percentage: false)
    class ZeroShotObjectDetectionPipeline (line 706) | class ZeroShotObjectDetectionPipeline < Pipeline
      method call (line 707) | def call(
    class DocumentQuestionAnsweringPipeline (line 760) | class DocumentQuestionAnsweringPipeline < Pipeline
      method call (line 761) | def call(image, question, **generate_kwargs)
    class TextToAudioPipeline (line 801) | class TextToAudioPipeline < Pipeline
      method initialize (line 804) | def initialize(**options)
      method call (line 811) | def call(text_inputs, speaker_embeddings: nil)
    class FeatureExtractionPipeline (line 821) | class FeatureExtractionPipeline < Pipeline
      method call (line 822) | def call(
    class ImageFeatureExtractionPipeline (line 884) | class ImageFeatureExtractionPipeline < Pipeline
      method call (line 885) | def call(images)
    class AudioClassificationPipeline (line 895) | class AudioClassificationPipeline < Pipeline
      method call (line 896) | def call(audio, top_k: nil)
    class ZeroShotAudioClassificationPipeline (line 930) | class ZeroShotAudioClassificationPipeline < Pipeline
      method call (line 931) | def call(audio, candidate_labels, hypothesis_template: "This is a so...
    class AutomaticSpeechRecognitionPipeline (line 973) | class AutomaticSpeechRecognitionPipeline < Pipeline
      method call (line 974) | def call(audio, **kwargs)
      method call_whisper (line 985) | def call_whisper(audio, **kwargs)
    class ImageToImagePipeline (line 990) | class ImageToImagePipeline < Pipeline
      method call (line 991) | def call(images)
    class DepthEstimationPipeline (line 1014) | class DepthEstimationPipeline < Pipeline
      method call (line 1015) | def call(images)
    class EmbeddingPipeline (line 1042) | class EmbeddingPipeline < FeatureExtractionPipeline
      method call (line 1043) | def call(
    class RerankingPipeline (line 1053) | class RerankingPipeline < Pipeline
      method call (line 1054) | def call(
    function pipeline (line 1355) | def pipeline(
    function load_items (line 1429) | def load_items(mapping, model, pretrained_options)

FILE: lib/informers/processors.rb
  type Informers (line 1) | module Informers
    class FeatureExtractor (line 2) | class FeatureExtractor
      method initialize (line 5) | def initialize(config)
    class ImageFeatureExtractor (line 11) | class ImageFeatureExtractor < FeatureExtractor
      method initialize (line 12) | def initialize(config)
      method thumbnail (line 45) | def thumbnail(image, size, resample = 2)
      method pad_image (line 67) | def pad_image(
      method rescale (line 147) | def rescale(pixel_data)
      method get_resize_output_image_size (line 153) | def get_resize_output_image_size(image, size)
      method resize (line 214) | def resize(image)
      method preprocess (line 219) | def preprocess(
      method call (line 332) | def call(images, *args)
    class CLIPFeatureExtractor (line 354) | class CLIPFeatureExtractor < ImageFeatureExtractor
    class DPTFeatureExtractor (line 357) | class DPTFeatureExtractor < ImageFeatureExtractor
    class ViTFeatureExtractor (line 360) | class ViTFeatureExtractor < ImageFeatureExtractor
    class OwlViTFeatureExtractor (line 363) | class OwlViTFeatureExtractor < ImageFeatureExtractor
      method post_process_object_detection (line 364) | def post_process_object_detection(*args)
    class Swin2SRImageProcessor (line 369) | class Swin2SRImageProcessor < ImageFeatureExtractor
      method pad_image (line 370) | def pad_image(pixel_data, img_dims, pad_size, **options)
    class DonutFeatureExtractor (line 393) | class DonutFeatureExtractor < ImageFeatureExtractor
      method pad_image (line 394) | def pad_image(pixel_data, img_dims, pad_size, **options)
    class DetrFeatureExtractor (line 422) | class DetrFeatureExtractor < ImageFeatureExtractor
      method call (line 423) | def call(images)
      method post_process_object_detection (line 442) | def post_process_object_detection(*args)
      method remove_low_and_no_objects (line 446) | def remove_low_and_no_objects(class_logits, mask_logits, object_mask...
      method check_segment_validity (line 473) | def check_segment_validity(
      method compute_segments (line 510) | def compute_segments(
      method post_process_panoptic_segmentation (line 598) | def post_process_panoptic_segmentation(
    type Utils (line 657) | module Utils
      function center_to_corners_format (line 658) | def self.center_to_corners_format(v)
      function post_process_object_detection (line 668) | def self.post_process_object_detection(outputs, threshold = 0.5, tar...
    class WhisperFeatureExtractor (line 733) | class WhisperFeatureExtractor < FeatureExtractor
      method initialize (line 734) | def initialize(config)
      method _extract_fbank_features (line 740) | def _extract_fbank_features(waveform)
      method call (line 744) | def call(audio)
    class Wav2Vec2FeatureExtractor (line 749) | class Wav2Vec2FeatureExtractor < FeatureExtractor
      method _zero_mean_unit_var_norm (line 750) | def _zero_mean_unit_var_norm(input_values)
      method call (line 757) | def call(audio)
    class ClapFeatureExtractor (line 776) | class ClapFeatureExtractor < FeatureExtractor
      method initialize (line 777) | def initialize(config)
      method call (line 783) | def call(audio, max_length: nil)
    class Processor (line 788) | class Processor
      method initialize (line 791) | def initialize(feature_extractor)
      method call (line 795) | def call(input, *args)
    class AutoProcessor (line 800) | class AutoProcessor
      method from_pretrained (line 816) | def self.from_pretrained(

FILE: lib/informers/tokenizers.rb
  type Informers (line 1) | module Informers
    class PreTrainedTokenizer (line 2) | class PreTrainedTokenizer
      method initialize (line 5) | def initialize(tokenizer_json, tokenizer_config)
      method get_token (line 44) | def get_token(*keys)
      method call (line 65) | def call(
      method decode (line 121) | def decode(tokens, skip_special_tokens:)
      method convert_tokens_to_string (line 125) | def convert_tokens_to_string(tokens)
      method convert_tokens_to_ids (line 129) | def convert_tokens_to_ids(tokens)
      method id_to_token (line 133) | def id_to_token(id)
      method batch_decode (line 137) | def batch_decode(batch, **decode_args)
      method padding_side= (line 141) | def padding_side=(side)
    class BertTokenizer (line 146) | class BertTokenizer < PreTrainedTokenizer
    class DebertaV2Tokenizer (line 151) | class DebertaV2Tokenizer < PreTrainedTokenizer
    class DistilBertTokenizer (line 156) | class DistilBertTokenizer < PreTrainedTokenizer
    class T5Tokenizer (line 159) | class T5Tokenizer < PreTrainedTokenizer
    class GPT2Tokenizer (line 162) | class GPT2Tokenizer < PreTrainedTokenizer
    class BartTokenizer (line 166) | class BartTokenizer < PreTrainedTokenizer
    class RobertaTokenizer (line 169) | class RobertaTokenizer < PreTrainedTokenizer
    class XLMRobertaTokenizer (line 172) | class XLMRobertaTokenizer < PreTrainedTokenizer
    class MPNetTokenizer (line 175) | class MPNetTokenizer < PreTrainedTokenizer
    class CLIPTokenizer (line 178) | class CLIPTokenizer < PreTrainedTokenizer
    class NllbTokenizer (line 181) | class NllbTokenizer < PreTrainedTokenizer
      method initialize (line 184) | def initialize(tokenizer_json, tokenizer_config)
      method _build_translation_inputs (line 192) | def _build_translation_inputs(raw_inputs, tokenizer_options, generat...
    class M2M100Tokenizer (line 197) | class M2M100Tokenizer < PreTrainedTokenizer
      method initialize (line 200) | def initialize(tokenizer_json, tokenizer_config)
      method _build_translation_inputs (line 210) | def _build_translation_inputs(raw_inputs, tokenizer_options, generat...
    type Utils (line 215) | module Utils
      function _build_translation_inputs (line 216) | def self._build_translation_inputs(slf, raw_inputs, tokenizer_option...
    class SpeechT5Tokenizer (line 247) | class SpeechT5Tokenizer < PreTrainedTokenizer
    class AutoTokenizer (line 250) | class AutoTokenizer
      method from_pretrained (line 268) | def self.from_pretrained(
      method load_tokenizer (line 301) | def self.load_tokenizer(pretrained_model_name_or_path, **options)

FILE: lib/informers/utils/audio.rb
  type Informers (line 1) | module Informers
    type Utils (line 2) | module Utils
      function read_audio (line 3) | def self.read_audio(input, sampling_rate)

FILE: lib/informers/utils/core.rb
  type Informers (line 1) | module Informers
    type Utils (line 2) | module Utils
      function dispatch_callback (line 3) | def self.dispatch_callback(progress_callback, data)
      function calculate_reflect_offset (line 7) | def self.calculate_reflect_offset(i, w)

FILE: lib/informers/utils/dtypes.rb
  type Informers (line 1) | module Informers
    type Utils (line 2) | module Utils

FILE: lib/informers/utils/ffmpeg.rb
  type Informers (line 15) | module Informers
    type Utils (line 16) | module Utils
      function ffmpeg_read (line 18) | def self.ffmpeg_read(data, sampling_rate)

FILE: lib/informers/utils/generation.rb
  type Informers (line 1) | module Informers
    type Utils (line 2) | module Utils
      class GenerationConfig (line 3) | class GenerationConfig
        method initialize (line 4) | def initialize(kwargs)
        method [] (line 66) | def [](key)
        method merge! (line 70) | def merge!(config)
      class Sampler (line 75) | class Sampler
        method initialize (line 76) | def initialize(generation_config)
        method call (line 81) | def call(logits, index = -1)
        method get_logits (line 87) | def get_logits(logits, index)
        method get_sampler (line 105) | def self.get_sampler(generation_config)
      class GreedySampler (line 119) | class GreedySampler < Sampler
        method sample (line 120) | def sample(logits, index = -1)
      class BeamSearchSampler (line 133) | class BeamSearchSampler < Sampler
        method sample (line 134) | def sample(logits, index = -1)
      class LogitsProcessorList (line 158) | class LogitsProcessorList
        method initialize (line 159) | def initialize
        method push (line 164) | def push(item)
        method concat (line 168) | def concat(items)
        method call (line 172) | def call(input_ids, batched_logits)
        method to_ary (line 183) | def to_ary
      class LogitsProcessor (line 188) | class LogitsProcessor
      class NoRepeatNGramLogitsProcessor (line 191) | class NoRepeatNGramLogitsProcessor < LogitsProcessor
        method initialize (line 192) | def initialize(no_repeat_ngram_size)
        method get_ngrams (line 197) | def get_ngrams(prev_input_ids)
        method get_generated_ngrams (line 222) | def get_generated_ngrams(banned_ngrams, prev_input_ids)
        method calc_banned_ngram_tokens (line 228) | def calc_banned_ngram_tokens(prev_input_ids)
        method call (line 240) | def call(input_ids, logits)
      class MinLengthLogitsProcessor (line 250) | class MinLengthLogitsProcessor < LogitsProcessor
        method initialize (line 251) | def initialize(min_length, eos_token_id)
        method call (line 257) | def call(input_ids, logits)
      class ForcedBOSTokenLogitsProcessor (line 268) | class ForcedBOSTokenLogitsProcessor < LogitsProcessor
        method initialize (line 269) | def initialize(bos_token_id)
        method call (line 274) | def call(input_ids, logits)
      class ForcedEOSTokenLogitsProcessor (line 283) | class ForcedEOSTokenLogitsProcessor < LogitsProcessor
        method initialize (line 284) | def initialize(max_length, forced_eos_token_id)
        method call (line 290) | def call(input_ids, logits)

FILE: lib/informers/utils/hub.rb
  type Informers (line 1) | module Informers
    type Utils (line 2) | module Utils
      type Hub (line 3) | module Hub
        class FileResponse (line 4) | class FileResponse
          method initialize (line 7) | def initialize(file_path)
          method read (line 18) | def read
        function is_valid_url (line 23) | def self.is_valid_url(string, protocols = nil, valid_hosts = nil)
        function get_file (line 38) | def self.get_file(url_or_path, progress_callback = nil, progress_i...
        class FileCache (line 70) | class FileCache
          method initialize (line 73) | def initialize(path)
          method match (line 77) | def match(request)
          method put (line 84) | def put(request, response)
          method resolve_path (line 101) | def resolve_path(request)
        function try_cache (line 106) | def self.try_cache(cache, *names)
        function get_model_file (line 118) | def self.get_model_file(path_or_repo_id, filename, fatal = true, *...
        function get_model_json (line 212) | def self.get_model_json(model_path, file_name, fatal = true, **opt...
        function path_join (line 222) | def self.path_join(*parts)
        function display_progress (line 235) | def self.display_progress(filename, width, size, expected_size)

FILE: lib/informers/utils/image.rb
  type Informers (line 1) | module Informers
    type Utils (line 2) | module Utils
      class RawImage (line 3) | class RawImage
        method initialize (line 15) | def initialize(image)
        method data (line 22) | def data
        method size (line 26) | def size
        method resize (line 30) | def resize(width, height, resample: 2)
        method center_crop (line 47) | def center_crop(crop_width, crop_height)
        method rgb (line 74) | def rgb
        method save (line 82) | def save(path)
        method read (line 86) | def self.read(input)
        method from_array (line 100) | def self.from_array(input)

FILE: lib/informers/utils/math.rb
  type Informers (line 1) | module Informers
    type Utils (line 2) | module Utils
      function interpolate_data (line 3) | def self.interpolate_data(input, in_shape, out_shape, mode = "biline...
      function softmax (line 73) | def self.softmax(arr)
      function sigmoid (line 89) | def self.sigmoid(arr)
      function get_top_items (line 96) | def self.get_top_items(items, top_k = 0)
      function max (line 110) | def self.max(arr)

FILE: lib/informers/utils/tensor.rb
  type Informers (line 1) | module Informers
    type Utils (line 2) | module Utils
      function mean_pooling (line 3) | def self.mean_pooling(last_hidden_state, attention_mask)
      function normalize (line 19) | def self.normalize(result)
      function stack (line 26) | def self.stack(tensors, dim = 0)
      function ones_like (line 30) | def self.ones_like(tensor)
      function dims (line 37) | def self.dims(tensor)
      function interpolate (line 46) | def self.interpolate(input, shape, mode = "bilinear", align_corners ...
      function reshape (line 64) | def self.reshape(arr, dims)

FILE: lib/informers/version.rb
  type Informers (line 1) | module Informers

FILE: test/model_test.rb
  class ModelTest (line 3) | class ModelTest < Minitest::Test
    method test_all_minilm (line 5) | def test_all_minilm
    method test_all_minilm_xenova (line 16) | def test_all_minilm_xenova
    method test_multi_qa_minilm (line 27) | def test_multi_qa_minilm
    method test_paraphrase_minilm (line 44) | def test_paraphrase_minilm
    method test_mxbai_embed (line 55) | def test_mxbai_embed
    method test_gte_small (line 72) | def test_gte_small
    method test_e5_base (line 83) | def test_e5_base
    method test_nomic_embed (line 100) | def test_nomic_embed
    method test_bge_base (line 117) | def test_bge_base
    method test_jina_embeddings (line 134) | def test_jina_embeddings
    method test_snowflake_arctic_embed (line 145) | def test_snowflake_arctic_embed
    method test_all_mpnet (line 167) | def test_all_mpnet
    method test_bge_m3 (line 178) | def test_bge_m3
    method test_mxbai_rerank (line 186) | def test_mxbai_rerank
    method test_jina_reranker (line 203) | def test_jina_reranker
    method test_bge_reranker (line 220) | def test_bge_reranker
    method test_ms_marco_minilm (line 237) | def test_ms_marco_minilm

FILE: test/pipeline_test.rb
  class PipelineTest (line 3) | class PipelineTest < Minitest::Test
    method test_ner (line 4) | def test_ner
    method test_ner_aggregation_strategy (line 15) | def test_ner_aggregation_strategy
    method test_sentiment_analysis (line 27) | def test_sentiment_analysis
    method test_question_answering (line 44) | def test_question_answering
    method test_zero_shot_classification (line 53) | def test_zero_shot_classification
    method test_text2text_generation (line 63) | def test_text2text_generation
    method test_translation (line 69) | def test_translation
    method test_text_generation (line 75) | def test_text_generation
    method test_summarization (line 81) | def test_summarization
    method test_fill_mask (line 89) | def test_fill_mask
    method test_fill_mask_no_mask_token (line 99) | def test_fill_mask_no_mask_token
    method test_feature_extraction (line 107) | def test_feature_extraction
    method test_embedding (line 115) | def test_embedding
    method test_reranking (line 123) | def test_reranking
    method test_image_classification (line 135) | def test_image_classification
    method test_zero_shot_image_classification (line 144) | def test_zero_shot_image_classification
    method test_object_detection (line 156) | def test_object_detection
    method test_zero_shot_object_detection (line 176) | def test_zero_shot_object_detection
    method test_depth_estimation (line 196) | def test_depth_estimation
    method test_image_to_text (line 204) | def test_image_to_text
    method test_image_to_image (line 210) | def test_image_to_image
    method test_image_segmentation (line 219) | def test_image_segmentation
    method test_image_feature_extraction (line 232) | def test_image_feature_extraction
    method test_progress_callback (line 238) | def test_progress_callback
    method test_device (line 252) | def test_device
    method test_device_invalid (line 262) | def test_device_invalid
    method test_dtype (line 269) | def test_dtype
    method test_dtype_invalid (line 277) | def test_dtype_invalid
    method test_session_options (line 284) | def test_session_options

FILE: test/test_helper.rb
  class Minitest::Test (line 5) | class Minitest::Test
    method assert_elements_in_delta (line 6) | def assert_elements_in_delta(expected, actual, delta = 0.001)
    method mac? (line 13) | def mac?
Condensed preview — 30 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (215K chars).
[
  {
    "path": ".github/workflows/build.yml",
    "chars": 481,
    "preview": "name: build\non: [push, pull_request]\njobs:\n  build:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout"
  },
  {
    "path": ".gitignore",
    "chars": 95,
    "preview": "/.bundle/\n/.yardoc\n/_yardoc/\n/coverage/\n/doc/\n/pkg/\n/spec/reports/\n/test/support/\n/tmp/\n*.lock\n"
  },
  {
    "path": "CHANGELOG.md",
    "chars": 1251,
    "preview": "## 1.3.0 (unreleased)\n\n- Dropped support for Ruby < 3.3\n\n## 1.2.1 (2025-02-01)\n\n- Fixed error when terminal width is zer"
  },
  {
    "path": "Gemfile",
    "chars": 82,
    "preview": "source \"https://rubygems.org\"\n\ngemspec\n\ngem \"rake\"\ngem \"minitest\"\ngem \"ruby-vips\"\n"
  },
  {
    "path": "LICENSE.txt",
    "chars": 11358,
    "preview": "\n                                 Apache License\n                           Version 2.0, January 2004\n                  "
  },
  {
    "path": "README.md",
    "chars": 12065,
    "preview": "# Informers\n\n:fire: Fast [transformer](https://github.com/huggingface/transformers.js) inference for Ruby\n\nFor non-ONNX "
  },
  {
    "path": "Rakefile",
    "chars": 771,
    "preview": "require \"bundler/gem_tasks\"\nrequire \"rake/testtask\"\n\nRake::TestTask.new do |t|\n  t.pattern = FileList[\"test/**/*_test.rb"
  },
  {
    "path": "informers.gemspec",
    "chars": 615,
    "preview": "require_relative \"lib/informers/version\"\n\nGem::Specification.new do |spec|\n  spec.name          = \"informers\"\n  spec.ver"
  },
  {
    "path": "lib/informers/backends/onnx.rb",
    "chars": 506,
    "preview": "module Informers\n  module Backends\n    module Onnx\n      def self.device_to_execution_providers(device)\n        case dev"
  },
  {
    "path": "lib/informers/configs.rb",
    "chars": 1033,
    "preview": "module Informers\n  class PretrainedConfig\n    def initialize(config_json)\n      @config_json = config_json.to_h\n    end\n"
  },
  {
    "path": "lib/informers/env.rb",
    "chars": 502,
    "preview": "module Informers\n  CACHE_HOME = ENV.fetch(\"XDG_CACHE_HOME\", File.join(ENV.fetch(\"HOME\"), \".cache\"))\n  DEFAULT_CACHE_DIR "
  },
  {
    "path": "lib/informers/model.rb",
    "chars": 369,
    "preview": "module Informers\n  # TODO remove in 2.0\n  class Model\n    def initialize(model_id, quantized: false)\n      @model = Info"
  },
  {
    "path": "lib/informers/models.rb",
    "chars": 49385,
    "preview": "module Informers\n  MODEL_TYPES = {\n    EncoderOnly: 0,\n    EncoderDecoder: 1,\n    Seq2Seq: 2,\n    Vision2Seq: 3,\n    Dec"
  },
  {
    "path": "lib/informers/pipelines.rb",
    "chars": 40596,
    "preview": "module Informers\n  class Pipeline\n    def initialize(task:, model:, tokenizer: nil, processor: nil)\n      super()\n      "
  },
  {
    "path": "lib/informers/processors.rb",
    "chars": 26340,
    "preview": "module Informers\n  class FeatureExtractor\n    attr_reader :config\n\n    def initialize(config)\n      super()\n      @confi"
  },
  {
    "path": "lib/informers/tokenizers.rb",
    "chars": 9467,
    "preview": "module Informers\n  class PreTrainedTokenizer\n    attr_reader :mask_token, :mask_token_id, :sep_token_id\n\n    def initial"
  },
  {
    "path": "lib/informers/utils/audio.rb",
    "chars": 394,
    "preview": "module Informers\n  module Utils\n    def self.read_audio(input, sampling_rate)\n      data =\n        if input.is_a?(URI)\n "
  },
  {
    "path": "lib/informers/utils/core.rb",
    "chars": 245,
    "preview": "module Informers\n  module Utils\n    def self.dispatch_callback(progress_callback, data)\n      progress_callback.(data) i"
  },
  {
    "path": "lib/informers/utils/dtypes.rb",
    "chars": 250,
    "preview": "module Informers\n  module Utils\n    DEFAULT_DTYPE_SUFFIX_MAPPING = {\n      fp32: \"\",\n      fp16: \"_fp16\",\n      int8: \"_"
  },
  {
    "path": "lib/informers/utils/ffmpeg.rb",
    "chars": 1320,
    "preview": "# Copyright 2021 The HuggingFace Team. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lic"
  },
  {
    "path": "lib/informers/utils/generation.rb",
    "chars": 9746,
    "preview": "module Informers\n  module Utils\n    class GenerationConfig\n      def initialize(kwargs)\n        @config = {}\n\n        # "
  },
  {
    "path": "lib/informers/utils/hub.rb",
    "chars": 7842,
    "preview": "module Informers\n  module Utils\n    module Hub\n      class FileResponse\n        attr_reader :exists, :status\n\n        de"
  },
  {
    "path": "lib/informers/utils/image.rb",
    "chars": 2855,
    "preview": "module Informers\n  module Utils\n    class RawImage\n      RESAMPLING_MAPPING = {\n        0 => \"nearest\",\n        1 => \"la"
  },
  {
    "path": "lib/informers/utils/math.rb",
    "chars": 3200,
    "preview": "module Informers\n  module Utils\n    def self.interpolate_data(input, in_shape, out_shape, mode = \"bilinear\", align_corne"
  },
  {
    "path": "lib/informers/utils/tensor.rb",
    "chars": 1602,
    "preview": "module Informers\n  module Utils\n    def self.mean_pooling(last_hidden_state, attention_mask)\n      last_hidden_state.zip"
  },
  {
    "path": "lib/informers/version.rb",
    "chars": 41,
    "preview": "module Informers\n  VERSION = \"1.2.1\"\nend\n"
  },
  {
    "path": "lib/informers.rb",
    "chars": 1033,
    "preview": "# dependencies\nrequire \"onnxruntime\"\nrequire \"tokenizers\"\n\n# stdlib\nrequire \"io/console\"\nrequire \"json\"\nrequire \"open-ur"
  },
  {
    "path": "test/model_test.rb",
    "chars": 9573,
    "preview": "require_relative \"test_helper\"\n\nclass ModelTest < Minitest::Test\n  # https://huggingface.co/sentence-transformers/all-Mi"
  },
  {
    "path": "test/pipeline_test.rb",
    "chars": 11253,
    "preview": "require_relative \"test_helper\"\n\nclass PipelineTest < Minitest::Test\n  def test_ner\n    ner = Informers.pipeline(\"ner\")\n "
  },
  {
    "path": "test/test_helper.rb",
    "chars": 365,
    "preview": "require \"bundler/setup\"\nBundler.require(:default)\nrequire \"minitest/autorun\"\n\nclass Minitest::Test\n  def assert_elements"
  }
]

About this extraction

This page contains the full source code of the ankane/informers GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 30 files (199.8 KB), approximately 51.9k tokens, and a symbol index with 497 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Copied to clipboard!