Repository: ankane/informers
Branch: master
Commit: 14e74907c83d
Files: 30
Total size: 199.8 KB
Directory structure:
gitextract_hoze518r/
├── .github/
│   └── workflows/
│       └── build.yml
├── .gitignore
├── CHANGELOG.md
├── Gemfile
├── LICENSE.txt
├── README.md
├── Rakefile
├── informers.gemspec
├── lib/
│   ├── informers/
│   │   ├── backends/
│   │   │   └── onnx.rb
│   │   ├── configs.rb
│   │   ├── env.rb
│   │   ├── model.rb
│   │   ├── models.rb
│   │   ├── pipelines.rb
│   │   ├── processors.rb
│   │   ├── tokenizers.rb
│   │   ├── utils/
│   │   │   ├── audio.rb
│   │   │   ├── core.rb
│   │   │   ├── dtypes.rb
│   │   │   ├── ffmpeg.rb
│   │   │   ├── generation.rb
│   │   │   ├── hub.rb
│   │   │   ├── image.rb
│   │   │   ├── math.rb
│   │   │   └── tensor.rb
│   │   └── version.rb
│   └── informers.rb
└── test/
    ├── model_test.rb
    ├── pipeline_test.rb
    └── test_helper.rb
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/workflows/build.yml
================================================
name: build
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: ruby/setup-ruby@v1
        with:
          ruby-version: "4.0"
          bundler-cache: true
      - uses: actions/cache@v5
        with:
          path: ~/.cache/informers
          key: informers
      - run: sudo apt-get update && sudo apt-get install libvips
      - run: bundle exec rake download:files
      - run: bundle exec rake test
================================================
FILE: .gitignore
================================================
/.bundle/
/.yardoc
/_yardoc/
/coverage/
/doc/
/pkg/
/spec/reports/
/test/support/
/tmp/
*.lock
================================================
FILE: CHANGELOG.md
================================================
## 1.3.0 (unreleased)
- Dropped support for Ruby < 3.3
## 1.2.1 (2025-02-01)
- Fixed error when terminal width is zero
## 1.2.0 (2024-11-14)
- Added support for models with external data
- Added `device` option
- Added `dtype` option
- Added `session_options` option
## 1.1.1 (2024-10-14)
- Added `audio-classification` pipeline
- Fixed error with `sentence-transformers/all-MiniLM-L6-v2`
## 1.1.0 (2024-09-17)
- Added more pipelines
## 1.0.3 (2024-08-29)
- Added `model_output` option
- Improved `model_file_name` option
## 1.0.2 (2024-08-28)
- Added `embedding` pipeline
- Added experimental `reranking` pipeline
- Added support for `nomic-ai/nomic-embed-text-v1`
## 1.0.1 (2024-08-27)
- Added support for `Supabase/gte-small` to `Model`
- Fixed error with downloads
## 1.0.0 (2024-08-26)
- Replaced task classes with `pipeline` method
- Added `Model` class
- Dropped support for Ruby < 3.1
## 0.2.0 (2022-09-06)
- Added support for `optimum` and `transformers.onnx` models
- Dropped support for Ruby < 2.7
## 0.1.3 (2021-09-25)
- Added text generation
- Added fill mask
## 0.1.2 (2020-11-24)
- Added feature extraction
## 0.1.1 (2020-10-05)
- Fixed question answering for Ruby < 2.7
## 0.1.0 (2020-10-01)
- First release
================================================
FILE: Gemfile
================================================
source "https://rubygems.org"
gemspec
gem "rake"
gem "minitest"
gem "ruby-vips"
================================================
FILE: LICENSE.txt
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: README.md
================================================
# Informers
:fire: Fast [transformer](https://github.com/huggingface/transformers.js) inference for Ruby
For non-ONNX models, check out [Transformers.rb](https://github.com/ankane/transformers-ruby) :slightly_smiling_face:
## Installation
Add this line to your application’s Gemfile:
```ruby
gem "informers"
```
## Getting Started
- [Models](#models)
- [Pipelines](#pipelines)
## Models
Embedding
- [sentence-transformers/all-MiniLM-L6-v2](#sentence-transformersall-MiniLM-L6-v2)
- [sentence-transformers/multi-qa-MiniLM-L6-cos-v1](#sentence-transformersmulti-qa-MiniLM-L6-cos-v1)
- [sentence-transformers/all-mpnet-base-v2](#sentence-transformersall-mpnet-base-v2)
- [sentence-transformers/paraphrase-MiniLM-L6-v2](#sentence-transformersparaphrase-minilm-l6-v2)
- [mixedbread-ai/mxbai-embed-large-v1](#mixedbread-aimxbai-embed-large-v1)
- [Supabase/gte-small](#supabasegte-small)
- [intfloat/e5-base-v2](#intfloate5-base-v2)
- [nomic-ai/nomic-embed-text-v1](#nomic-ainomic-embed-text-v1)
- [BAAI/bge-base-en-v1.5](#baaibge-base-en-v15)
- [jinaai/jina-embeddings-v2-base-en](#jinaaijina-embeddings-v2-base-en)
- [Snowflake/snowflake-arctic-embed-m-v1.5](#snowflakesnowflake-arctic-embed-m-v15)
Reranking
- [mixedbread-ai/mxbai-rerank-base-v1](#mixedbread-aimxbai-rerank-base-v1)
- [jinaai/jina-reranker-v1-turbo-en](#jinaaijina-reranker-v1-turbo-en)
- [BAAI/bge-reranker-base](#baaibge-reranker-base)
- [Xenova/ms-marco-MiniLM-L-6-v2](#xenovams-marco-minilm-l-6-v2)
### sentence-transformers/all-MiniLM-L6-v2
[Docs](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
```ruby
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Informers.pipeline("embedding", "sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.(sentences)
```
### sentence-transformers/multi-qa-MiniLM-L6-cos-v1
[Docs](https://huggingface.co/Xenova/multi-qa-MiniLM-L6-cos-v1)
```ruby
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Informers.pipeline("embedding", "sentence-transformers/multi-qa-MiniLM-L6-cos-v1")
query_embedding = model.(query)
doc_embeddings = model.(docs)
scores = doc_embeddings.map { |e| e.zip(query_embedding).sum { |d, q| d * q } }
doc_score_pairs = docs.zip(scores).sort_by { |d, s| -s }
```
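The scoring step above is a plain dot product between each document embedding and the query embedding, followed by a descending sort. The sketch below reproduces that step on its own, using made-up three-dimensional vectors in place of real model output (the values and the `dot` helper are illustrative assumptions, not part of Informers). For unit-length vectors, the dot product equals cosine similarity.

```ruby
# Dot product of two equal-length vectors.
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Toy unit-length vectors standing in for real embeddings (hypothetical values).
query_embedding = [1.0, 0.0, 0.0]
docs = ["doc A", "doc B", "doc C"]
doc_embeddings = [
  [0.6, 0.8, 0.0],
  [1.0, 0.0, 0.0],
  [0.0, 0.0, 1.0]
]

# Score every document against the query, then sort highest score first.
scores = doc_embeddings.map { |e| dot(e, query_embedding) }
ranked = docs.zip(scores).sort_by { |_doc, score| -score }

ranked.each { |doc, score| puts "#{doc}: #{score.round(2)}" }
```

Here `doc B` ranks first because its vector points in the same direction as the query.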
### sentence-transformers/all-mpnet-base-v2
[Docs](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
```ruby
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Informers.pipeline("embedding", "sentence-transformers/all-mpnet-base-v2")
embeddings = model.(sentences)
```
### sentence-transformers/paraphrase-MiniLM-L6-v2
[Docs](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2)
```ruby
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Informers.pipeline("embedding", "sentence-transformers/paraphrase-MiniLM-L6-v2")
embeddings = model.(sentences, normalize: false)
```
### mixedbread-ai/mxbai-embed-large-v1
[Docs](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1)
```ruby
query_prefix = "Represent this sentence for searching relevant passages: "
input = [
  "The dog is barking",
  "The cat is purring",
  query_prefix + "puppy"
]
model = Informers.pipeline("embedding", "mixedbread-ai/mxbai-embed-large-v1")
embeddings = model.(input)
```
### Supabase/gte-small
[Docs](https://huggingface.co/Supabase/gte-small)
```ruby
sentences = ["That is a happy person", "That is a very happy person"]
model = Informers.pipeline("embedding", "Supabase/gte-small")
embeddings = model.(sentences)
```
### intfloat/e5-base-v2
[Docs](https://huggingface.co/intfloat/e5-base-v2)
```ruby
doc_prefix = "passage: "
query_prefix = "query: "
input = [
  doc_prefix + "Ruby is a programming language created by Matz",
  query_prefix + "Ruby creator"
]
model = Informers.pipeline("embedding", "intfloat/e5-base-v2")
embeddings = model.(input)
```
### nomic-ai/nomic-embed-text-v1
[Docs](https://huggingface.co/nomic-ai/nomic-embed-text-v1)
```ruby
doc_prefix = "search_document: "
query_prefix = "search_query: "
input = [
  doc_prefix + "The dog is barking",
  doc_prefix + "The cat is purring",
  query_prefix + "puppy"
]
model = Informers.pipeline("embedding", "nomic-ai/nomic-embed-text-v1")
embeddings = model.(input)
```
### BAAI/bge-base-en-v1.5
[Docs](https://huggingface.co/BAAI/bge-base-en-v1.5)
```ruby
query_prefix = "Represent this sentence for searching relevant passages: "
input = [
  "The dog is barking",
  "The cat is purring",
  query_prefix + "puppy"
]
model = Informers.pipeline("embedding", "BAAI/bge-base-en-v1.5")
embeddings = model.(input)
```
### jinaai/jina-embeddings-v2-base-en
[Docs](https://huggingface.co/jinaai/jina-embeddings-v2-base-en)
```ruby
sentences = ["How is the weather today?", "What is the current weather like today?"]
model = Informers.pipeline("embedding", "jinaai/jina-embeddings-v2-base-en", model_file_name: "../model")
embeddings = model.(sentences)
```
### Snowflake/snowflake-arctic-embed-m-v1.5
[Docs](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5)
```ruby
query_prefix = "Represent this sentence for searching relevant passages: "
input = [
  "The dog is barking",
  "The cat is purring",
  query_prefix + "puppy"
]
model = Informers.pipeline("embedding", "Snowflake/snowflake-arctic-embed-m-v1.5")
embeddings = model.(input, model_output: "sentence_embedding", pooling: "none")
```
### mixedbread-ai/mxbai-rerank-base-v1
[Docs](https://huggingface.co/mixedbread-ai/mxbai-rerank-base-v1)
```ruby
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Informers.pipeline("reranking", "mixedbread-ai/mxbai-rerank-base-v1")
result = model.(query, docs)
```
### jinaai/jina-reranker-v1-turbo-en
[Docs](https://huggingface.co/jinaai/jina-reranker-v1-turbo-en)
```ruby
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Informers.pipeline("reranking", "jinaai/jina-reranker-v1-turbo-en")
result = model.(query, docs)
```
### BAAI/bge-reranker-base
[Docs](https://huggingface.co/BAAI/bge-reranker-base)
```ruby
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Informers.pipeline("reranking", "BAAI/bge-reranker-base")
result = model.(query, docs)
```
### Xenova/ms-marco-MiniLM-L-6-v2
[Docs](https://huggingface.co/Xenova/ms-marco-MiniLM-L-6-v2)
```ruby
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Informers.pipeline("reranking", "Xenova/ms-marco-MiniLM-L-6-v2")
result = model.(query, docs)
```
### Other
The model must include a `.onnx` file ([example](https://huggingface.co/Xenova/all-MiniLM-L6-v2/tree/main/onnx)). If the file is not at `onnx/model.onnx`, use the `model_file_name` option to specify the location.
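As a rough illustration of how `model_file_name` relates to the default location, the sketch below resolves a name relative to the `onnx/` directory and appends the `.onnx` extension. This is inferred only from the default `onnx/model.onnx` and the `model_file_name: "../model"` example earlier in this README; `onnx_relative_path` is a hypothetical helper, not an Informers method, and it ignores any variant suffix the library may add to the file name.

```ruby
require "pathname"

# Hypothetical helper (not part of Informers): treat model_file_name as a
# path relative to the onnx/ directory, without the .onnx extension.
def onnx_relative_path(model_file_name = "model")
  (Pathname.new("onnx") + "#{model_file_name}.onnx").cleanpath.to_s
end

puts onnx_relative_path              # default location inside onnx/
puts onnx_relative_path("../model")  # file stored at the repository root
puts onnx_relative_path("sub/model") # file in a subdirectory of onnx/
```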
## Pipelines
- [Text](#text)
- [Vision](#vision)
- [Audio](#audio)
- [Multimodal](#multimodal)
### Text
Embedding
```ruby
embed = Informers.pipeline("embedding")
embed.("We are very happy to show you the 🤗 Transformers library.")
```
Reranking
```ruby
rerank = Informers.pipeline("reranking")
rerank.("Who created Ruby?", ["Matz created Ruby", "Another doc"])
```
Named-entity recognition
```ruby
ner = Informers.pipeline("ner")
ner.("Ruby is a programming language created by Matz")
```
Sentiment analysis
```ruby
classifier = Informers.pipeline("sentiment-analysis")
classifier.("We are very happy to show you the 🤗 Transformers library.")
```
Question answering
```ruby
qa = Informers.pipeline("question-answering")
qa.("Who invented Ruby?", "Ruby is a programming language created by Matz")
```
Zero-shot classification
```ruby
classifier = Informers.pipeline("zero-shot-classification")
classifier.("text", ["label1", "label2", "label3"])
```
Text generation
```ruby
generator = Informers.pipeline("text-generation")
generator.("I enjoy walking with my cute dog,")
```
Text-to-text generation
```ruby
text2text = Informers.pipeline("text2text-generation")
text2text.("translate from English to French: I'm very happy")
```
Translation
```ruby
translator = Informers.pipeline("translation", "Xenova/nllb-200-distilled-600M")
translator.("जीवन एक चॉकलेट बॉक्स की तरह है।", src_lang: "hin_Deva", tgt_lang: "fra_Latn")
```
Summarization
```ruby
summarizer = Informers.pipeline("summarization")
summarizer.("Many paragraphs of text")
```
Fill mask
```ruby
unmasker = Informers.pipeline("fill-mask")
unmasker.("Paris is the [MASK] of France.")
```
Feature extraction
```ruby
extractor = Informers.pipeline("feature-extraction")
extractor.("We are very happy to show you the 🤗 Transformers library.")
```
### Vision
Note: [ruby-vips](https://github.com/libvips/ruby-vips) is required to load images
Image classification
```ruby
classifier = Informers.pipeline("image-classification")
classifier.("image.jpg")
```
Zero-shot image classification
```ruby
classifier = Informers.pipeline("zero-shot-image-classification")
classifier.("image.jpg", ["label1", "label2", "label3"])
```
Image segmentation
```ruby
segmenter = Informers.pipeline("image-segmentation")
segmenter.("image.jpg")
```
Object detection
```ruby
detector = Informers.pipeline("object-detection")
detector.("image.jpg")
```
Zero-shot object detection
```ruby
detector = Informers.pipeline("zero-shot-object-detection")
detector.("image.jpg", ["label1", "label2", "label3"])
```
Depth estimation
```ruby
estimator = Informers.pipeline("depth-estimation")
estimator.("image.jpg")
```
Image-to-image
```ruby
upscaler = Informers.pipeline("image-to-image")
upscaler.("image.jpg")
```
Image feature extraction
```ruby
extractor = Informers.pipeline("image-feature-extraction")
extractor.("image.jpg")
```
### Audio
Note: [ffmpeg](https://www.ffmpeg.org/) is required to load audio files
Audio classification
```ruby
classifier = Informers.pipeline("audio-classification")
classifier.("audio.wav")
```
### Multimodal
Image captioning
```ruby
captioner = Informers.pipeline("image-to-text")
captioner.("image.jpg")
```
Document question answering
```ruby
qa = Informers.pipeline("document-question-answering")
qa.("image.jpg", "What is the invoice number?")
```
## Reference
Specify a variant of the model if available (`fp32`, `fp16`, `int8`, `uint8`, `q8`, `q4`, `q4f16`, or `bnb4`)
```ruby
Informers.pipeline("embedding", "Xenova/all-MiniLM-L6-v2", dtype: "fp16")
```
Specify a device (`cpu`, `cuda`, or `coreml`)
```ruby
Informers.pipeline("embedding", device: "cuda")
```
Note: Follow [these instructions](https://github.com/ankane/onnxruntime-ruby?tab=readme-ov-file#gpu-support) for `cuda`
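In this repository, the `device` option is translated into ONNX Runtime execution providers by `Informers::Backends::Onnx.device_to_execution_providers` in `lib/informers/backends/onnx.rb`. The standalone sketch below mirrors that mapping so it can be run without loading the gem; the method name `execution_providers_for` is made up for this example.

```ruby
# Standalone copy of the device-to-provider mapping, for illustration only.
def execution_providers_for(device)
  case device&.to_s
  when "cpu", nil then []                    # empty list: no extra provider requested
  when "cuda" then ["CUDAExecutionProvider"]
  when "coreml" then ["CoreMLExecutionProvider"]
  else
    raise ArgumentError, "Unsupported device: #{device}. Should be one of: cpu, cuda, coreml"
  end
end

p execution_providers_for(nil)
p execution_providers_for(:cuda)
p execution_providers_for("coreml")

# Any other value is rejected with an ArgumentError.
begin
  execution_providers_for("tpu")
rescue ArgumentError => e
  puts e.message
end
```

Symbols and strings are both accepted because the value is converted with `to_s` before matching.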
Specify ONNX Runtime [session options](https://github.com/ankane/onnxruntime-ruby?tab=readme-ov-file#session-options)
```ruby
Informers.pipeline("embedding", session_options: {log_severity_level: 2})
```
## Credits
This library was ported from [Transformers.js](https://github.com/huggingface/transformers.js) and is available under the same license.
## History
View the [changelog](https://github.com/ankane/informers/blob/master/CHANGELOG.md)
## Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- [Report bugs](https://github.com/ankane/informers/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/informers/pulls)
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
```sh
git clone https://github.com/ankane/informers.git
cd informers
bundle install
bundle exec rake download:files
bundle exec rake test
```
================================================
FILE: Rakefile
================================================
require "bundler/gem_tasks"
require "rake/testtask"

Rake::TestTask.new do |t|
  t.pattern = FileList["test/**/*_test.rb"].exclude("test/model_test.rb")
end

task default: :test

def download_file(url)
  require "open-uri"

  file = File.basename(url)
  puts "Downloading #{file}..."
  dest = "test/support/#{file}"
  File.binwrite(dest, URI.parse(url).read)
  puts "Saved #{dest}"
end

namespace :download do
  task :files do
    Dir.mkdir("test/support") unless Dir.exist?("test/support")

    download_file("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg")
    download_file("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_1.png")
  end
end
================================================
FILE: informers.gemspec
================================================
require_relative "lib/informers/version"

Gem::Specification.new do |spec|
  spec.name = "informers"
  spec.version = Informers::VERSION
  spec.summary = "Fast transformer inference for Ruby"
  spec.homepage = "https://github.com/ankane/informers"
  spec.license = "Apache-2.0"

  spec.author = "Andrew Kane"
  spec.email = "andrew@ankane.org"

  spec.files = Dir["*.{md,txt}", "{lib}/**/*"]
  spec.require_path = "lib"

  spec.required_ruby_version = ">= 3.3"

  spec.add_dependency "onnxruntime", ">= 0.9"
  spec.add_dependency "tokenizers", ">= 0.5.3"
end
================================================
FILE: lib/informers/backends/onnx.rb
================================================
module Informers
  module Backends
    module Onnx
      def self.device_to_execution_providers(device)
        case device&.to_s
        when "cpu", nil
          []
        when "cuda"
          ["CUDAExecutionProvider"]
        when "coreml"
          ["CoreMLExecutionProvider"]
        else
          supported_devices = ["cpu", "cuda", "coreml"]
          raise ArgumentError, "Unsupported device: #{device}. Should be one of: #{supported_devices.join(", ")}"
        end
      end
    end
  end
end
================================================
FILE: lib/informers/configs.rb
================================================
module Informers
  class PretrainedConfig
    def initialize(config_json)
      @config_json = config_json.to_h
    end

    def [](key)
      @config_json[key.to_s]
    end

    def []=(key, value)
      @config_json[key.to_s] = value
    end

    def to_h
      @config_json.to_h
    end

    def self.from_pretrained(
      pretrained_model_name_or_path,
      progress_callback: nil,
      config: nil,
      cache_dir: nil,
      local_files_only: false,
      revision: "main",
      **kwargs
    )
      data = config || load_config(
        pretrained_model_name_or_path,
        progress_callback:,
        config:,
        cache_dir:,
        local_files_only:,
        revision:
      )
      new(data)
    end

    def self.load_config(pretrained_model_name_or_path, **options)
      info = Utils::Hub.get_model_json(pretrained_model_name_or_path, "config.json", true, **options)
      info
    end
  end

  class AutoConfig
    def self.from_pretrained(...)
      PretrainedConfig.from_pretrained(...)
    end
  end
end
================================================
FILE: lib/informers/env.rb
================================================
module Informers
  CACHE_HOME = ENV.fetch("XDG_CACHE_HOME", File.join(ENV.fetch("HOME"), ".cache"))
  DEFAULT_CACHE_DIR = File.expand_path(File.join(CACHE_HOME, "informers"))

  class << self
    attr_accessor :allow_remote_models, :remote_host, :remote_path_template, :cache_dir
  end

  self.allow_remote_models = ENV["INFORMERS_OFFLINE"].to_s.empty?
  self.remote_host = "https://huggingface.co/"
  self.remote_path_template = "{model}/resolve/{revision}/"

  self.cache_dir = DEFAULT_CACHE_DIR
end
================================================
FILE: lib/informers/model.rb
================================================
module Informers
  # TODO remove in 2.0
  class Model
    def initialize(model_id, quantized: false)
      @model = Informers.pipeline("embedding", model_id, quantized: quantized)
      @options = model_id == "mixedbread-ai/mxbai-embed-large-v1" ? {pooling: "cls", normalize: false} : {}
    end

    def embed(texts)
      @model.(texts, **@options)
    end
  end
end
================================================
FILE: lib/informers/models.rb
================================================
module Informers
  MODEL_TYPES = {
    EncoderOnly: 0,
    EncoderDecoder: 1,
    Seq2Seq: 2,
    Vision2Seq: 3,
    DecoderOnly: 4,
    MaskGeneration: 5
  }

  # NOTE: These will be populated fully later
  MODEL_TYPE_MAPPING = {}
  MODEL_NAME_TO_CLASS_MAPPING = {}
  MODEL_CLASS_TO_NAME_MAPPING = {}

  class PretrainedMixin
    def self.from_pretrained(
      pretrained_model_name_or_path,
      quantized: true,
      progress_callback: nil,
      config: nil,
      cache_dir: nil,
      local_files_only: false,
      revision: "main",
      device: nil,
      dtype: nil,
      model_file_name: nil,
      session_options: {}
    )
      options = {
        quantized:,
        progress_callback:,
        config:,
        cache_dir:,
        local_files_only:,
        revision:,
        device:,
        dtype:,
        model_file_name:,
        session_options:
      }
      config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **options)
      if options[:config].nil?
        # If no config was passed, reuse this config for future processing
        options[:config] = config
      end

      if !const_defined?(:MODEL_CLASS_MAPPINGS)
        raise Error, "`MODEL_CLASS_MAPPINGS` not implemented for this type of `AutoClass`: #{name}"
      end

      const_get(:MODEL_CLASS_MAPPINGS).each do |model_class_mapping|
        model_info = model_class_mapping[config[:model_type]]
        if !model_info
          next # Item not found in this mapping
        end
        return model_info[1].from_pretrained(pretrained_model_name_or_path, **options)
      end

      if const_defined?(:BASE_IF_FAIL)
        warn "Unknown model class #{config[:model_type].inspect}, attempting to construct from base class."
        PreTrainedModel.from_pretrained(pretrained_model_name_or_path, **options)
      else
        raise Error, "Unsupported model type: #{config[:model_type]}"
      end
    end
  end

  class PreTrainedModel
    MAIN_INPUT_NAME = :input_ids

    attr_reader :config

    def initialize(config, session)
      super()

      @config = config
      @session = session

      @output_names = nil

      model_name = MODEL_CLASS_TO_NAME_MAPPING[self.class]
      model_type = MODEL_TYPE_MAPPING[model_name]

      case model_type
      when MODEL_TYPES[:DecoderOnly]
        @can_generate = true

        @run_beam = method(:decoder_run_beam)
        @get_start_beams = method(:decoder_start_beams)
        @update_beam = method(:decoder_update_beam)
        @forward = method(:decoder_forward)

      when MODEL_TYPES[:Seq2Seq], MODEL_TYPES[:Vision2Seq]
        @can_generate = true

        @run_beam = method(:seq2seq_run_beam)
        @get_start_beams = method(:seq2seq_start_beams)
        @update_beam = method(:seq2seq_update_beam)
        @forward = method(:seq2seq_forward)

      when MODEL_TYPES[:EncoderDecoder]
        @forward = method(:encoder_forward)

      else
        @forward = method(:encoder_forward)
      end
    end

    def self.from_pretrained(
      pretrained_model_name_or_path,
      quantized: true,
      progress_callback: nil,
      config: nil,
      cache_dir: nil,
      local_files_only: false,
      revision: "main",
      device: nil,
      dtype: nil,
      model_file_name: nil,
      session_options: {}
    )
      options = {
        quantized:,
        progress_callback:,
        config:,
        cache_dir:,
        local_files_only:,
        revision:,
        device:,
        dtype:,
        model_file_name:,
        session_options:
      }

      model_name = MODEL_CLASS_TO_NAME_MAPPING[self]
      model_type = MODEL_TYPE_MAPPING[model_name]

      config ||= AutoConfig.from_pretrained(pretrained_model_name_or_path, **options)

      if model_type == MODEL_TYPES[:DecoderOnly]
        info = [
          construct_session(pretrained_model_name_or_path, options[:model_file_name] || "decoder_model_merged", **options),
          Utils::Hub.get_model_json(pretrained_model_name_or_path, "generation_config.json", false, **options)
        ]

      elsif model_type == MODEL_TYPES[:Seq2Seq] || model_type == MODEL_TYPES[:Vision2Seq]
        info = [
          construct_session(pretrained_model_name_or_path, "encoder_model", **options),
          construct_session(pretrained_model_name_or_path, "decoder_model_merged", **options),
          Utils::Hub.get_model_json(pretrained_model_name_or_path, "generation_config.json", false, **options)
        ]

      elsif model_type == MODEL_TYPES[:MaskGeneration]
        info = [
          construct_session(pretrained_model_name_or_path, "vision_encoder", **options),
          construct_session(pretrained_model_name_or_path, "prompt_encoder_mask_decoder", **options)
        ]

      elsif model_type == MODEL_TYPES[:EncoderDecoder]
        info = [
          construct_session(pretrained_model_name_or_path, "encoder_model", **options),
          construct_session(pretrained_model_name_or_path, "decoder_model_merged", **options)
        ]

      else
        if model_type != MODEL_TYPES[:EncoderOnly]
          warn "Model type for '#{model_name || config[:model_type]}' not found, assuming encoder-only architecture. Please report this."
        end
        info = [
          construct_session(pretrained_model_name_or_path, options[:model_file_name] || "model", **options)
        ]
      end

      new(config, *info)
end
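# Builds an ONNX Runtime session for one model file, resolved through Utils::Hub.get_model_file.
# A leading "../" or "/" in file_name skips the default "onnx/" prefix.
def self.construct_session(pretrained_model_name_or_path, file_name, **options)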
prefix = "onnx/"
if file_name.start_with?("../")
prefix = ""
file_name = file_name[3..]
elsif file_name.start_with?("/")
prefix = ""
file_name = file_name[1..]
end
dtype = options[:dtype] || (options[:quantized] ? "q8" : "fp32")
suffix = Utils::DEFAULT_DTYPE_SUFFIX_MAPPING[dtype.to_sym]
if !suffix
raise ArgumentError, "Invalid dtype: #{dtype}. Should be one of: #{Utils::DEFAULT_DTYPE_SUFFIX_MAPPING.keys.join(", ")}"
end
model_file_name = "#{prefix}#{file_name}#{suffix}.onnx"
path = Utils::Hub.get_model_file(pretrained_model_name_or_path, model_file_name, true, **options)
session_options = {
providers: Backends::Onnx.device_to_execution_providers(options[:device]),
log_severity_level: 4
}.merge(options[:session_options] || {})
begin
OnnxRuntime::InferenceSession.new(path, **session_options)
rescue OnnxRuntime::Error => e
# Models with external data need a companion "<file>.onnx_data" file; fetch it and retry once
raise e unless e.message.include?("No such file or directory") && e.message.include?(".onnx_data")
Utils::Hub.get_model_file(pretrained_model_name_or_path, "#{model_file_name}_data", true, **options)
OnnxRuntime::InferenceSession.new(path, **session_options)
end
end
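=begin
Illustrative sketch, not part of the library: how `construct_session` above derives the ONNX
file path from `file_name`, `dtype`, and `quantized`. The suffix table here is an assumption
for illustration only; the real values come from `Utils::DEFAULT_DTYPE_SUFFIX_MAPPING`, which
is defined elsewhere.

```ruby
# Assumed suffix table (hypothetical values; see Utils::DEFAULT_DTYPE_SUFFIX_MAPPING for the real ones).
SKETCH_DTYPE_SUFFIXES = {fp32: "", fp16: "_fp16", q8: "_quantized"}.freeze

# Returns the relative path of the ONNX file for the given options.
def sketch_onnx_file_name(file_name, dtype: nil, quantized: true)
  prefix = "onnx/"
  if file_name.start_with?("../")
    # "../name" means the file sits next to the "onnx/" directory
    prefix = ""
    file_name = file_name[3..]
  elsif file_name.start_with?("/")
    # "/name" means the file sits at the repository root
    prefix = ""
    file_name = file_name[1..]
  end
  # An explicit dtype wins; otherwise `quantized` picks between q8 and fp32
  dtype ||= quantized ? "q8" : "fp32"
  suffix = SKETCH_DTYPE_SUFFIXES[dtype.to_sym]
  raise ArgumentError, "Invalid dtype: #{dtype}" unless suffix
  "#{prefix}#{file_name}#{suffix}.onnx"
end
```

Under the assumed table, `sketch_onnx_file_name("model")` resolves to a quantized file under
`onnx/`, while `sketch_onnx_file_name("../model", quantized: false)` resolves to `model.onnx`.
=end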
def call(model_inputs, **kwargs)
@forward.(model_inputs, **kwargs)
end
def generate(inputs, generation_config = nil, logits_processor = nil, inputs_attention_mask: nil)
if !@can_generate
model_name = MODEL_CLASS_TO_NAME_MAPPING[self.class]
error_message = "The current model class (#{model_name}) is not compatible with `.generate()`, as it doesn't have a language model head."
raise Error, error_message
end
if !inputs.is_a?(Array)
raise ArgumentError, "`inputs` must be an Array, but is #{inputs.class.name}"
end
if @config[:is_encoder_decoder]
# Generating from the encoder outputs
input_ids_seq_length = 0
else
input_ids_seq_length = inputs.length
# decoder-only
if input_ids_seq_length == 0
raise Error, "Must supply a non-empty array of input token ids."
end
end
# Update generation config with defaults
generation_config = get_generation_config(generation_config)
logits_processor ||= Utils::LogitsProcessorList.new
# Update logits processor
logits_processor = get_logits_processor(
generation_config,
input_ids_seq_length,
logits_processor
)
eos_token_ids = generation_config[:eos_token_id]
if !eos_token_ids.nil? && !eos_token_ids.is_a?(Array)
eos_token_ids = [eos_token_ids]
end
num_output_tokens = 1
max_output_tokens = num_output_tokens + (generation_config[:max_new_tokens] || Float::INFINITY)
# Only use max length if max_new_tokens is not provided
use_max_length = generation_config[:max_length].is_a?(Integer) && generation_config[:max_new_tokens].nil?
sampler = Utils::Sampler.get_sampler(generation_config)
beams = get_start_beams(inputs, generation_config, num_output_tokens, inputs_attention_mask)
while beams.any? { |x| !x[:done] } && num_output_tokens < max_output_tokens
newest_beams = []
beams.each do |beam|
if beam[:done]
# Add this beam back into the pool
newest_beams << beam
next
end
if use_max_length && beam[:output_token_ids].length >= generation_config["max_length"]
# Set this beam to done and add it back into the pool
beam[:done] = true
newest_beams << beam
next
end
output = run_beam(beam)
# add attentions/scores to beam only if user requested
if generation_config["output_attentions"]
add_attentions_to_beam(beam, output)
end
# Logits are of the form [batch_size, out_seq_length, vocab_size]
# In most cases, this will be [batch_size, 1, vocab_size]
# So, we select the last token's logits:
# (equivalent to `logits = outputs.logits[:, -1, :]`)
logits = output["logits"].map { |v| v[-1] }
# Apply logits processor
logits_processor.(beam[:output_token_ids], logits)
sampled_tokens = sampler.(logits)
sampled_tokens.each do |new_token_id, log_prob|
# use previous beam as a starting point
new_beam = beam.dup
# update new beam
update_beam(new_beam, new_token_id)
new_beam[:score] += log_prob
if eos_token_ids && eos_token_ids.include?(new_token_id)
new_beam[:done] = true
end
newest_beams << new_beam
end
end
num_output_tokens += 1
# Next, we get the best beams, per ID
newest_beams =
group_beams(newest_beams).map do |group|
group.sort_by { |v| -v[:score] }[0...generation_config["num_beams"]]
end
# Flatten beams
beams = newest_beams.flatten(1)
# Run callback
if generation_config["callback_function"]
generation_config["callback_function"].(beams)
end
end
# TODO: Ensure that we can return non-batched outputs
grouped_beams = group_beams(beams)
get_flattened = lambda do |key|
grouped_beams.flat_map do |batch|
if generation_config["num_return_sequences"] > 1
raise Todo
else
[batch[0][key]]
end
end
end
sequences = get_flattened.(:output_token_ids) # [1, seqLength]
if generation_config["return_dict_in_generate"]
raise Todo
else
sequences
end
end
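=begin
Illustrative sketch, not part of the library: the stopping rules of the loop in `generate`
above, reduced to a single greedy beam. The `next_token` block is a hypothetical stand-in for
running the model and sampling; only the token budget and end-of-sequence handling mirror the
method above.

```ruby
# Greedy single-beam loop with the same token budget as `generate`:
# the counter starts at 1 and the loop stops once it reaches 1 + max_new_tokens.
def sketch_greedy_generate(prompt, eos_token_id:, max_new_tokens: nil, &next_token)
  output = prompt.dup
  num_output_tokens = 1
  max_output_tokens = num_output_tokens + (max_new_tokens || Float::INFINITY)
  done = false
  while !done && num_output_tokens < max_output_tokens
    token = next_token.call(output)
    output << token
    # A beam is finished as soon as it emits the end-of-sequence token
    done = token == eos_token_id
    num_output_tokens += 1
  end
  output
end
```

With no `max_new_tokens`, the budget is infinite and only the end-of-sequence token stops the
loop, so the block must eventually return it.
=end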
private
def get_logits_processor(
generation_config,
input_ids_seq_length,
logits_processor = nil
)
processors = Utils::LogitsProcessorList.new
if !generation_config["repetition_penalty"].nil? && generation_config["repetition_penalty"] != 1.0
processors.push(Utils::RepetitionPenaltyLogitsProcessor.new(generation_config["repetition_penalty"]))
end
if !generation_config["no_repeat_ngram_size"].nil? && generation_config["no_repeat_ngram_size"] > 0
processors.push(Utils::NoRepeatNGramLogitsProcessor.new(generation_config["no_repeat_ngram_size"]))
end
if !generation_config["bad_words_ids"].nil?
processors.push(Utils::NoBadWordsLogitsProcessor.new(generation_config["bad_words_ids"], generation_config["eos_token_id"]))
end
if !generation_config["min_length"].nil? && !generation_config["eos_token_id"].nil? && generation_config["min_length"] > 0
processors.push(Utils::MinLengthLogitsProcessor.new(generation_config["min_length"], generation_config["eos_token_id"]))
end
if !generation_config["min_new_tokens"].nil? && !generation_config["eos_token_id"].nil? && generation_config["min_new_tokens"] > 0
processors.push(Utils::MinNewTokensLengthLogitsProcessor.new(
input_ids_seq_length,
generation_config["min_new_tokens"],
generation_config["eos_token_id"]
))
end
if !generation_config["forced_bos_token_id"].nil?
processors.push(Utils::ForcedBOSTokenLogitsProcessor.new(generation_config["forced_bos_token_id"]))
end
if !generation_config["forced_eos_token_id"].nil?
processors.push(Utils::ForcedEOSTokenLogitsProcessor.new(
generation_config["max_length"],
generation_config["forced_eos_token_id"]
))
end
if !generation_config["begin_suppress_tokens"].nil?
raise Todo
end
if !generation_config["forced_decoder_ids"].nil?
processors.push(Utils::ForceTokensLogitsProcessor.new(generation_config["forced_decoder_ids"]))
end
if !logits_processor.nil?
processors.concat(logits_processor)
end
processors
end
def get_generation_config(generation_config)
# Create empty generation config (contains defaults)
# We pass `@config` so that if `eos_token_id` or `bos_token_id` exist in the model's config, we will use them
gen_config = Utils::GenerationConfig.new(@config.to_h)
# Apply model's generation config, if it exists
if @generation_config
gen_config.merge!(@generation_config)
end
# Finally, use any generation config specified by the user
# when calling `generate`
if !generation_config.nil?
gen_config.merge!(generation_config)
end
gen_config
end
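=begin
Illustrative sketch, not part of the library: the precedence used by `get_generation_config`
above, shown with plain hashes instead of `Utils::GenerationConfig`. The keys and values are
hypothetical.

```ruby
# Later layers override earlier ones: defaults < model's generation config < caller's config.
def sketch_layered_config(defaults, model_generation_config, user_generation_config)
  config = defaults.dup
  config.merge!(model_generation_config) if model_generation_config
  config.merge!(user_generation_config) if user_generation_config
  config
end
```

For example, a `"max_new_tokens"` value passed by the caller replaces the one from the model's
`generation_config.json`, which in turn replaces the default.
=end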
def seq2seq_forward(model_inputs)
encoder_outputs = model_inputs[:encoder_outputs]
past_key_values = model_inputs[:past_key_values]
if !encoder_outputs
# Encoder outputs are not given, so we must compute them.
encoder_outputs = encoder_forward(model_inputs)[0]
end
decoder_feeds = {
input_ids: model_inputs[:decoder_input_ids],
encoder_hidden_states: encoder_outputs
}
use_cache_branch = !!past_key_values
decoder_input_names = @decoder_merged_session.inputs.map { |v| v[:name] }
if decoder_input_names.include?("use_cache_branch")
decoder_feeds[:use_cache_branch] = [use_cache_branch]
end
if decoder_input_names.include?("encoder_attention_mask")
decoder_feeds[:encoder_attention_mask] = model_inputs[:attention_mask]
end
prepare_position_ids(@decoder_merged_session, decoder_feeds, use_cache_branch)
add_past_key_values(decoder_feeds, past_key_values)
decoder_results = session_run(@decoder_merged_session, decoder_feeds)
decoder_results = @decoder_merged_session.outputs.map { |v| v[:name] }.zip(decoder_results).to_h
logits = decoder_results["logits"]
past_key_values = get_past_key_values(decoder_results, past_key_values)
# Get cross attention and/or decoder attentions if they are present
attns = get_attentions(decoder_results)
Seq2SeqLMOutput.new(logits, past_key_values, encoder_outputs, attns["decoder_attentions"], attns["cross_attentions"])
end
def prepare_position_ids(session, feeds, use_cache_branch)
if !session.inputs.map { |v| v[:name] }.include?("position_ids")
return
end
raise Todo
end
def get_past_key_values(decoder_results, past_key_values)
pkvs = {}
decoder_results.each_key do |name|
if name.start_with?("present")
new_name = name.sub("present", "past_key_values")
if past_key_values && name.include?("encoder")
# Optimization introduced by optimum to reuse past key values. So, we just replace the constant
# outputs with the previous past key values.
# https://github.com/huggingface/optimum/blob/0bf2c05fb7e1182b52d21b703cfc95fd9e4ea3dc/optimum/onnxruntime/base.py#L677-L704
pkvs[new_name] = past_key_values[new_name]
else
pkvs[new_name] = decoder_results[name]
end
end
end
pkvs
end
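=begin
Illustrative sketch, not part of the library: the renaming done by `get_past_key_values`
above, written as a standalone function over plain hashes. The output names and symbol values
are hypothetical placeholders for real tensors.

```ruby
# Turns "present.*" decoder outputs into "past_key_values.*" inputs for the next step.
# Once a cache exists, encoder entries are taken from it instead of from the new outputs.
def sketch_next_past_key_values(decoder_results, past_key_values)
  decoder_results.each_with_object({}) do |(name, value), pkvs|
    next unless name.start_with?("present")
    new_name = name.sub("present", "past_key_values")
    reuse_cached = past_key_values && name.include?("encoder")
    pkvs[new_name] = reuse_cached ? past_key_values[new_name] : value
  end
end
```

On the first step there is no cache, so every `present.*` output is kept; on later steps only
the decoder entries change.
=end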
def get_attentions(decoder_results)
attns = {}
["cross_attentions", "decoder_attentions"].each do |attn_name|
result = []
decoder_results.each_key do |name|
if name.start_with?(attn_name)
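# The suffix after the last "." is the layer index; convert it to an Integer so it can index the Array
index = name.split(".").pop.to_i
result[index] = decoder_results[name]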
end
end
attns[attn_name] = result
end
attns
end
def add_past_key_values(decoder_feeds, past_key_values)
if past_key_values
decoder_feeds.merge!(past_key_values)
else
# TODO support batches (i.e., batch_size > 1)
batch_size = 1
if @config[:is_encoder_decoder] && (!@add_encoder_pkv.nil? ? @add_encoder_pkv : true)
_encoder_dims = [batch_size, @num_encoder_heads, 0, @encoder_dim_kv]
_decoder_dims = [batch_size, @num_decoder_heads, 0, @decoder_dim_kv]
@num_decoder_layers.times do |i|
# decoder_feeds["past_key_values.#{i}.encoder.key"] = OnnxRuntime::OrtValue.from_shape_and_type(encoder_dims, :float)
# decoder_feeds["past_key_values.#{i}.encoder.value"] = OnnxRuntime::OrtValue.from_shape_and_type(encoder_dims, :float)
# decoder_feeds["past_key_values.#{i}.decoder.key"] = OnnxRuntime::OrtValue.from_shape_and_type(decoder_dims, :float)
# decoder_feeds["past_key_values.#{i}.decoder.value"] = OnnxRuntime::OrtValue.from_shape_and_type(decoder_dims, :float)
end
elsif @config[:model_type] == "falcon"
raise Todo
elsif @config[:multi_query]
raise Todo
elsif @config[:model_type] == "bloom"
raise Todo
else
_dims = [batch_size, @num_heads, 0, @dim_kv]
@num_layers.times do |i|
# decoder_feeds["past_key_values.#{i}.key"] = OnnxRuntime::OrtValue.from_shape_and_type(dims, :float)
# decoder_feeds["past_key_values.#{i}.value"] = OnnxRuntime::OrtValue.from_shape_and_type(dims, :float)
end
end
end
end
def seq2seq_start_beams(input_token_ids, generation_config, num_output_tokens, inputs_attention_mask = nil)
beams = []
beam_id = 0
requires_attention_mask = !@requires_attention_mask.nil? ? @requires_attention_mask : true
# decoder_input_ids == output_token_ids
decoder_input_ids =
generation_config["decoder_input_ids"] ||
generation_config["decoder_start_token_id"] ||
generation_config["bos_token_id"] ||
generation_config["eos_token_id"]
if !decoder_input_ids.is_a?(Array)
decoder_input_ids = [decoder_input_ids]
end
input_token_ids.each do |tokens|
# TODO: Improve
# Currently, just add back batch dimension.
# In future, allow for true parallel execution
tokens = [tokens]
# Create beam
start = {
inputs: tokens,
encoder_outputs: nil,
prev_model_outputs: nil,
output_token_ids: decoder_input_ids,
done: false,
score: 0,
id: beam_id # assign unique id to beams
}
beam_id += 1
if requires_attention_mask
start[:attention_mask] = prepare_attention_mask(tokens)
end
beams << start
end
beams
end
def prepare_attention_mask(tokens)
# Prepare attention mask
pad_token_id = @config["pad_token_id"]
eos_token_id = @config["eos_token_id"]
if eos_token_id.is_a?(Integer)
eos_token_id = [eos_token_id]
end
is_pad_token_in_inputs = !tokens.index(pad_token_id).nil?
is_pad_token_not_equal_to_eos_token_id = eos_token_id.nil? || !eos_token_id.include?(pad_token_id)
if is_pad_token_in_inputs && is_pad_token_not_equal_to_eos_token_id
raise Todo
else
Utils.ones_like(tokens)
end
end
def seq2seq_run_beam(beam)
input_name = self.class.const_get(:MAIN_INPUT_NAME)
decoder_input_ids = beam[:output_token_ids]
if beam[:prev_model_outputs]
# After the first step, `prev_model_outputs` won't be nil.
# So, we only pass the last token of decoder_input_ids when past key values are used
decoder_input_ids = [decoder_input_ids[-1]]
end
# 1. Prepare
model_inputs = {
input_name => beam[:inputs],
decoder_input_ids: [decoder_input_ids],
encoder_outputs: beam[:encoder_outputs],
past_key_values: beam[:prev_model_outputs] && beam[:prev_model_outputs][:past_key_values]
}
if beam[:attention_mask]
model_inputs[:attention_mask] = beam[:attention_mask]
end
# 2. Run
output = @forward.(model_inputs)
# 3. Update
beam[:prev_model_outputs] = output
beam[:encoder_outputs] = output[:encoder_outputs]
output
end
def seq2seq_update_beam(beam, new_token_id)
beam[:output_token_ids] += [new_token_id]
end
def group_beams(beams)
# Group beams by their ids
groups = {}
beams.each do |obj|
if !groups[obj[:id]]
groups[obj[:id]] = [obj]
else
groups[obj[:id]] << obj
end
end
groups.values
end
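=begin
Illustrative sketch, not part of the library: how grouping by beam id combines with the
per-group pruning step in `generate`. The beams below are hypothetical hashes that carry only
the `:id` and `:score` keys.

```ruby
# Groups beams by the id of the input they belong to, preserving first-seen order.
def sketch_group_beams(beams)
  beams.group_by { |beam| beam[:id] }.values
end

# Keeps the `num_beams` highest-scoring beams per group and flattens the result.
def sketch_prune_beams(beams, num_beams)
  sketch_group_beams(beams).flat_map do |group|
    group.sort_by { |beam| -beam[:score] }.first(num_beams)
  end
end
```

Pruning per group, rather than globally, ensures every input in a batch keeps at least one
candidate sequence.
=end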
def encoder_forward(model_inputs, output_names: nil)
encoder_feeds = {}
@session.inputs.each do |input|
key = input[:name].to_sym
encoder_feeds[key] = model_inputs[key]
end
if @session.inputs.any? { |v| v[:name] == "token_type_ids" } && !encoder_feeds[:token_type_ids]
raise Todo
end
session_run(@session, encoder_feeds, output_names:)
end
def decoder_forward(model_inputs)
input_ids, past_key_values, attention_mask =
model_inputs.values_at(:input_ids, :past_key_values, :attention_mask)
decoder_feeds = {
input_ids: input_ids,
attention_mask: attention_mask || prepare_attention_mask(input_ids)
}
use_cache_branch = !!past_key_values
if @session.inputs.map { |v| v[:name] }.include?("use_cache_branch")
decoder_feeds[:use_cache_branch] = [use_cache_branch]
end
prepare_position_ids(@session, decoder_feeds, use_cache_branch)
add_past_key_values(decoder_feeds, past_key_values)
decoder_results = session_run(@session, decoder_feeds)
decoder_results = @session.outputs.map { |v| v[:name] }.zip(decoder_results).to_h
logits = decoder_results["logits"]
past_key_values = get_past_key_values(decoder_results, past_key_values)
{"logits" => logits, past_key_values: past_key_values}
end
def decoder_start_beams(input_token_ids, generation_config, num_output_tokens, inputs_attention_mask)
beams = []
beam_id = 0
input_token_ids.each do |tokens|
output_token_ids = tokens.dup
# TODO: Improve
# Currently, just add back batch dimension.
# In future, allow for true parallel execution
tokens = [tokens]
if inputs_attention_mask
attn_mask = inputs_attention_mask[beam_id]
attn_mask = [attn_mask]
else
attn_mask = prepare_attention_mask(tokens)
end
start = {
input: tokens,
model_input_ids: tokens,
attention_mask: attn_mask,
prev_model_outputs: nil,
output_token_ids: output_token_ids,
num_output_tokens: num_output_tokens,
done: false,
score: 0,
id: beam_id # assign unique id to beams
}
beam_id += 1
beams << start
end
beams
end
def decoder_run_beam(beam)
attn_mask_data = Array.new(beam[:output_token_ids].length, 1)
# 1. Prepare
model_inputs = {
input_ids: beam[:model_input_ids],
attention_mask: [attn_mask_data],
past_key_values: beam[:prev_model_outputs] && beam[:prev_model_outputs][:past_key_values]
}
# 2. Run
output = @forward.(model_inputs)
# 3. Update
beam[:prev_model_outputs] = output
output
end
def decoder_update_beam(beam, new_token_id)
beam[:output_token_ids] += [new_token_id]
beam[:model_input_ids] = [[new_token_id]]
end
def session_run(session, inputs, output_names: nil)
checked_inputs = validate_inputs(session, inputs)
output = session.run(output_names || @output_names, checked_inputs)
replace_tensors(output)
end
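# TODO: post-process session outputs; currently returns them unchanged
def replace_tensors(obj)
obj
end
# TODO: validate inputs against the session's expected inputs; currently returns them unchanged
def validate_inputs(session, inputs)
inputs
end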
def get_start_beams(input_token_ids, generation_config, num_output_tokens, inputs_attention_mask)
@get_start_beams.(input_token_ids, generation_config, num_output_tokens, inputs_attention_mask)
end
def run_beam(beam)
@run_beam.(beam)
end
def update_beam(beam, new_token_id)
@update_beam.(beam, new_token_id)
end
end
class BertPreTrainedModel < PreTrainedModel
end
class BertModel < BertPreTrainedModel
end
class BertForMaskedLM < BertPreTrainedModel
def call(model_inputs)
MaskedLMOutput.new(*super(model_inputs))
end
end
class BertForSequenceClassification < BertPreTrainedModel
def call(model_inputs)
SequenceClassifierOutput.new(*super(model_inputs))
end
end
class BertForTokenClassification < BertPreTrainedModel
def call(model_inputs)
TokenClassifierOutput.new(*super(model_inputs))
end
end
class ModernBertPreTrainedModel < PreTrainedModel
end
class ModernBertModel < ModernBertPreTrainedModel
end
class ModernBertForMaskedLM < ModernBertPreTrainedModel
def call(model_inputs)
MaskedLMOutput.new(*super(model_inputs))
end
end
class ModernBertForSequenceClassification < ModernBertPreTrainedModel
def call(model_inputs)
SequenceClassifierOutput.new(*super(model_inputs))
end
end
class ModernBertForTokenClassification < ModernBertPreTrainedModel
def call(model_inputs)
TokenClassifierOutput.new(*super(model_inputs))
end
end
class NomicBertPreTrainedModel < PreTrainedModel
end
class NomicBertModel < NomicBertPreTrainedModel
end
class ConvBertPreTrainedModel < PreTrainedModel
end
class ConvBertModel < ConvBertPreTrainedModel
end
class ElectraPreTrainedModel < PreTrainedModel
end
# TODO add ElectraForPreTraining
class ElectraModel < ElectraPreTrainedModel
end
class DebertaV2PreTrainedModel < PreTrainedModel
end
class DebertaV2Model < DebertaV2PreTrainedModel
end
class DistilBertPreTrainedModel < PreTrainedModel
end
class DistilBertModel < DistilBertPreTrainedModel
end
class DistilBertForSequenceClassification < DistilBertPreTrainedModel
def call(model_inputs)
SequenceClassifierOutput.new(*super(model_inputs))
end
end
class DistilBertForQuestionAnswering < DistilBertPreTrainedModel
def call(model_inputs)
QuestionAnsweringModelOutput.new(*super(model_inputs))
end
end
class MPNetPreTrainedModel < PreTrainedModel
end
class MPNetModel < MPNetPreTrainedModel
end
class T5PreTrainedModel < PreTrainedModel
end
class T5Model < T5PreTrainedModel
end
class T5ForConditionalGeneration < T5PreTrainedModel
def initialize(config, session, decoder_merged_session, generation_config)
super(config, session)
@decoder_merged_session = decoder_merged_session
@generation_config = generation_config
@num_decoder_layers = @config[:num_decoder_layers]
@num_decoder_heads = @config[:num_heads]
@decoder_dim_kv = @config[:d_kv]
@num_encoder_layers = @config[:num_layers]
@num_encoder_heads = @config[:num_heads]
@encoder_dim_kv = @config[:d_kv]
end
end
class BartPretrainedModel < PreTrainedModel
end
class BartModel < BartPretrainedModel
end
class BartForConditionalGeneration < BartPretrainedModel
def initialize(config, session, decoder_merged_session, generation_config)
super(config, session)
@decoder_merged_session = decoder_merged_session
@generation_config = generation_config
@num_decoder_layers = @config["decoder_layers"]
@num_decoder_heads = @config["decoder_attention_heads"]
@decoder_dim_kv = @config["d_model"] / @num_decoder_heads.to_f
@num_encoder_layers = @config["encoder_layers"]
@num_encoder_heads = @config["encoder_attention_heads"]
@encoder_dim_kv = @config["d_model"] / @num_encoder_heads.to_f
end
end
class BartForSequenceClassification < BartPretrainedModel
def call(model_inputs)
SequenceClassifierOutput.new(*super(model_inputs))
end
end
class MBartPreTrainedModel < PreTrainedModel
end
class MBartModel < MBartPreTrainedModel
end
class MBartForCausalLM < MBartPreTrainedModel
attr_reader :num_decoder_layers, :num_decoder_heads, :decoder_dim_kv,
:num_encoder_layers, :num_encoder_heads, :encoder_dim_kv
def initialize(config, decoder_merged_session, generation_config)
super(config, decoder_merged_session)
@generation_config = generation_config
@num_decoder_layers = @config["decoder_layers"]
@num_decoder_heads = @config["decoder_attention_heads"]
@decoder_dim_kv = @config["d_model"] / @num_decoder_heads.to_f
@num_encoder_layers = @config["encoder_layers"]
@num_encoder_heads = @config["encoder_attention_heads"]
@encoder_dim_kv = @config["d_model"] / @num_encoder_heads.to_f
end
end
class M2M100PreTrainedModel < PreTrainedModel
end
class M2M100Model < M2M100PreTrainedModel
end
class M2M100ForConditionalGeneration < M2M100PreTrainedModel
def initialize(config, session, decoder_merged_session, generation_config)
super(config, session)
@decoder_merged_session = decoder_merged_session
@generation_config = generation_config
@num_decoder_layers = @config["decoder_layers"]
@num_decoder_heads = @config["decoder_attention_heads"]
@decoder_dim_kv = @config["d_model"] / @num_decoder_heads.to_f
@num_encoder_layers = @config["encoder_layers"]
@num_encoder_heads = @config["encoder_attention_heads"]
@encoder_dim_kv = @config["d_model"] / @num_encoder_heads.to_f
end
end
class Wav2Vec2PreTrainedModel < PreTrainedModel
end
class Wav2Vec2Model < Wav2Vec2PreTrainedModel
end
class Wav2Vec2ForSequenceClassification < Wav2Vec2PreTrainedModel
def call(model_inputs)
SequenceClassifierOutput.new(*super(model_inputs))
end
end
class RobertaPreTrainedModel < PreTrainedModel
end
class RobertaModel < RobertaPreTrainedModel
end
class RobertaForMaskedLM < RobertaPreTrainedModel
def call(model_inputs)
MaskedLMOutput.new(*super(model_inputs))
end
end
class RobertaForTokenClassification < RobertaPreTrainedModel
def call(model_inputs)
TokenClassifierOutput.new(*super(model_inputs))
end
end
class RobertaForSequenceClassification < RobertaPreTrainedModel
def call(model_inputs)
SequenceClassifierOutput.new(*super(model_inputs))
end
end
class XLMRobertaPreTrainedModel < PreTrainedModel
end
class XLMRobertaModel < XLMRobertaPreTrainedModel
end
class XLMRobertaForSequenceClassification < XLMRobertaPreTrainedModel
def call(model_inputs)
SequenceClassifierOutput.new(*super(model_inputs))
end
end
class ViTPreTrainedModel < PreTrainedModel
end
class ViTModel < ViTPreTrainedModel
end
class ViTForImageClassification < ViTPreTrainedModel
def call(model_inputs)
SequenceClassifierOutput.new(*super(model_inputs))
end
end
class CLIPPreTrainedModel < PreTrainedModel
end
class CLIPModel < CLIPPreTrainedModel
end
class GPT2PreTrainedModel < PreTrainedModel
attr_reader :num_heads, :num_layers, :dim_kv
def initialize(config, session, generation_config)
super(config, session)
@generation_config = generation_config
# config doesn't contain pad_token_id, so we assume it is the eos_token_id
@config["pad_token_id"] = @config["eos_token_id"]
@num_heads = @config["n_head"]
@num_layers = @config["n_layer"]
@dim_kv = @config["n_embd"] / @num_heads.to_f
end
end
class GPT2Model < GPT2PreTrainedModel
end
class GPT2LMHeadModel < GPT2PreTrainedModel
end
class OwlViTPreTrainedModel < PreTrainedModel
end
class OwlViTModel < OwlViTPreTrainedModel
end
class OwlViTForObjectDetection < OwlViTPreTrainedModel
end
class DetrPreTrainedModel < PreTrainedModel
end
class DetrModel < DetrPreTrainedModel
end
class DetrForObjectDetection < DetrPreTrainedModel
def call(model_inputs)
DetrObjectDetectionOutput.new(*super(model_inputs))
end
end
class DetrForSegmentation < DetrPreTrainedModel
def call(model_inputs)
DetrSegmentationOutput.new(*super(model_inputs))
end
end
class Swin2SRPreTrainedModel < PreTrainedModel
end
class Swin2SRModel < Swin2SRPreTrainedModel
end
class Swin2SRForImageSuperResolution < Swin2SRPreTrainedModel
end
class DPTPreTrainedModel < PreTrainedModel
end
class DPTModel < DPTPreTrainedModel
end
class DPTForDepthEstimation < DPTPreTrainedModel
end
class VisionEncoderDecoderModel < PreTrainedModel
MAIN_INPUT_NAME = :pixel_values
def initialize(config, session, decoder_merged_session, generation_config)
super(config, session)
@decoder_merged_session = decoder_merged_session
@generation_config = generation_config
# Extract configs
encoder_config = @config["encoder"]
decoder_config = @config["decoder"]
# Validate encoder
encoder_model_type = encoder_config["model_type"]
encoder_model = MODEL_MAPPING_NAMES_ENCODER_ONLY[encoder_model_type] || MODEL_MAPPING_NAMES_ENCODER_DECODER[encoder_model_type]
if !encoder_model
warn "Model type for encoder '#{encoder_model_type}' not found, assuming encoder-only architecture. Please report this."
end
# Validate decoder
decoder_model = MODEL_WITH_LM_HEAD_MAPPING_NAMES[decoder_config["model_type"]]
if !decoder_model
raise Error, "Unable to construct `VisionEncoderDecoder` due to unsupported decoder: \"#{decoder_config["model_type"]}\""
end
decoder_model_class = decoder_model[1]
decoder = decoder_model_class.new(decoder_config, decoder_merged_session, generation_config)
@add_encoder_pkv = decoder.respond_to?(:num_decoder_layers)
if @add_encoder_pkv
# Decoder is part of an encoder-decoder model
@num_decoder_layers = decoder.num_decoder_layers
@num_decoder_heads = decoder.num_decoder_heads
@decoder_dim_kv = decoder.decoder_dim_kv
@num_encoder_layers = decoder.num_encoder_layers
@num_encoder_heads = decoder.num_encoder_heads
@encoder_dim_kv = decoder.encoder_dim_kv
else
# Decoder is a decoder-only model
@num_layers = decoder.num_layers
@num_heads = decoder.num_heads
@dim_kv = decoder.dim_kv
end
end
end
class DonutSwinPreTrainedModel < PreTrainedModel
end
class DonutSwinModel < DonutSwinPreTrainedModel
end
class WhisperPreTrainedModel < PreTrainedModel
end
class WhisperModel < WhisperPreTrainedModel
end
class WhisperForConditionalGeneration < WhisperPreTrainedModel
REQUIRES_ATTENTION_MASK = false
MAIN_INPUT_NAME = :input_features
def initialize(config, session, decoder_merged_session, generation_config)
super(config, session)
@decoder_merged_session = decoder_merged_session
@generation_config = generation_config
@num_decoder_layers = @config["decoder_layers"]
@num_decoder_heads = @config["decoder_attention_heads"]
@decoder_dim_kv = @config["d_model"] / @num_decoder_heads.to_f
@num_encoder_layers = @config["encoder_layers"]
@num_encoder_heads = @config["encoder_attention_heads"]
@encoder_dim_kv = @config["d_model"] / @num_encoder_heads.to_f
end
def generate(inputs, generation_config = nil, logits_processor = nil)
raise Todo
end
end
class VitsPreTrainedModel < PreTrainedModel
end
class VitsModel < VitsPreTrainedModel
def call(model_inputs)
VitsModelOutput.new(*super(model_inputs))
end
end
class SpeechT5PreTrainedModel < PreTrainedModel
end
class SpeechT5Model < SpeechT5PreTrainedModel
end
class SpeechT5ForSpeechToText < SpeechT5PreTrainedModel
end
class SpeechT5ForTextToSpeech < SpeechT5PreTrainedModel
end
class ClapPreTrainedModel < PreTrainedModel
end
class ClapModel < ClapPreTrainedModel
end
MODEL_MAPPING_NAMES_ENCODER_ONLY = {
"bert" => ["BertModel", BertModel],
"modernbert" => ["ModernBertModel", ModernBertModel],
"nomic_bert" => ["NomicBertModel", NomicBertModel],
"electra" => ["ElectraModel", ElectraModel],
"convbert" => ["ConvBertModel", ConvBertModel],
"deberta-v2" => ["DebertaV2Model", DebertaV2Model],
"mpnet" => ["MPNetModel", MPNetModel],
"distilbert" => ["DistilBertModel", DistilBertModel],
"roberta" => ["RobertaModel", RobertaModel],
"xlm-roberta" => ["XLMRobertaModel", XLMRobertaModel],
"clap" => ["ClapModel", ClapModel],
"clip" => ["CLIPModel", CLIPModel],
"detr" => ["DetrModel", DetrModel],
"vit" => ["ViTModel", ViTModel],
"owlvit" => ["OwlViTModel", OwlViTModel],
"donut-swin" => ["DonutSwinModel", DonutSwinModel]
}
MODEL_MAPPING_NAMES_ENCODER_DECODER = {
"bart" => ["BartModel", BartModel]
}
MODEL_MAPPING_NAMES_DECODER_ONLY = {
}
MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES = {
"whisper" => ["WhisperForConditionalGeneration", WhisperForConditionalGeneration]
}
MODEL_FOR_TEXT_TO_SPECTROGRAM_MAPPING_NAMES = {
"speecht5" => ["SpeechT5ForTextToSpeech", SpeechT5ForTextToSpeech]
}
MODEL_FOR_TEXT_TO_WAVEFORM_MAPPING_NAMES = {
"vits" => ["VitsModel", VitsModel]
}
MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = {
"bert" => ["BertForSequenceClassification", BertForSequenceClassification],
"modernbert" => ["ModernBertForSequenceClassification", ModernBertForSequenceClassification],
"distilbert" => ["DistilBertForSequenceClassification", DistilBertForSequenceClassification],
"roberta" => ["RobertaForSequenceClassification", RobertaForSequenceClassification],
"xlm-roberta" => ["XLMRobertaForSequenceClassification", XLMRobertaForSequenceClassification],
"bart" => ["BartForSequenceClassification", BartForSequenceClassification]
}
MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = {
"bert" => ["BertForTokenClassification", BertForTokenClassification],
"modernbert" => ["ModernBertForTokenClassification", ModernBertForTokenClassification],
"roberta" => ["RobertaForTokenClassification", RobertaForTokenClassification]
}
MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES = {
"t5" => ["T5ForConditionalGeneration", T5ForConditionalGeneration],
"bart" => ["BartForConditionalGeneration", BartForConditionalGeneration],
"m2m_100" => ["M2M100ForConditionalGeneration", M2M100ForConditionalGeneration]
}
MODEL_WITH_LM_HEAD_MAPPING_NAMES = {
"gpt2" => ["GPT2LMHeadModel", GPT2LMHeadModel],
"mbart" => ["MBartForCausalLM", MBartForCausalLM]
}
MODEL_FOR_MASKED_LM_MAPPING_NAMES = {
"bert" => ["BertForMaskedLM", BertForMaskedLM],
"modernbert" => ["ModernBertForMaskedLM", ModernBertForMaskedLM],
"roberta" => ["RobertaForMaskedLM", RobertaForMaskedLM]
}
MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = {
"distilbert" => ["DistilBertForQuestionAnswering", DistilBertForQuestionAnswering]
}
MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES = {
"vision-encoder-decoder" => ["VisionEncoderDecoderModel", VisionEncoderDecoderModel]
}
MODEL_FOR_DOCUMENT_QUESTION_ANSWERING_MAPPING_NAMES = {
"vision-encoder-decoder" => ["VisionEncoderDecoderModel", VisionEncoderDecoderModel]
}
MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES = {
"vit" => ["ViTForImageClassification", ViTForImageClassification]
}
MODEL_FOR_OBJECT_DETECTION_MAPPING_NAMES = {
"detr" => ["DetrForObjectDetection", DetrForObjectDetection]
}
MODEL_FOR_ZERO_SHOT_OBJECT_DETECTION_MAPPING_NAMES = {
"owlvit" => ["OwlViTForObjectDetection", OwlViTForObjectDetection]
}
MODEL_FOR_IMAGE_SEGMENTATION_MAPPING_NAMES = {
"detr" => ["DetrForSegmentation", DetrForSegmentation]
}
MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING_NAMES = {
}
MODEL_FOR_MASK_GENERATION_MAPPING_NAMES = {
}
MODEL_FOR_CTC_MAPPING_NAMES = {
}
MODEL_FOR_AUDIO_CLASSIFICATION_MAPPING_NAMES = {
"wav2vec2" => ["Wav2Vec2ForSequenceClassification", Wav2Vec2ForSequenceClassification]
}
MODEL_FOR_AUDIO_XVECTOR_MAPPING_NAMES = {
}
MODEL_FOR_AUDIO_FRAME_CLASSIFICATION_MAPPING_NAMES = {
}
MODEL_FOR_IMAGE_MATTING_MAPPING_NAMES = {
}
MODEL_FOR_IMAGE_TO_IMAGE_MAPPING_NAMES = {
"swin2sr" => ["Swin2SRForImageSuperResolution", Swin2SRForImageSuperResolution]
}
MODEL_FOR_DEPTH_ESTIMATION_MAPPING_NAMES = {
"dpt" => ["DPTForDepthEstimation", DPTForDepthEstimation]
}
MODEL_FOR_IMAGE_FEATURE_EXTRACTION_MAPPING_NAMES = {
}
MODEL_CLASS_TYPE_MAPPING = [
[MODEL_MAPPING_NAMES_ENCODER_ONLY, MODEL_TYPES[:EncoderOnly]],
[MODEL_MAPPING_NAMES_ENCODER_DECODER, MODEL_TYPES[:EncoderDecoder]],
[MODEL_MAPPING_NAMES_DECODER_ONLY, MODEL_TYPES[:DecoderOnly]],
[MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES, MODEL_TYPES[:Seq2Seq]],
[MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES, MODEL_TYPES[:Seq2Seq]],
[MODEL_WITH_LM_HEAD_MAPPING_NAMES, MODEL_TYPES[:DecoderOnly]],
[MODEL_FOR_MASKED_LM_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES, MODEL_TYPES[:Vision2Seq]],
[MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_IMAGE_SEGMENTATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_IMAGE_MATTING_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_IMAGE_TO_IMAGE_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_DEPTH_ESTIMATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_OBJECT_DETECTION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_ZERO_SHOT_OBJECT_DETECTION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_MASK_GENERATION_MAPPING_NAMES, MODEL_TYPES[:MaskGeneration]],
[MODEL_FOR_CTC_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_AUDIO_CLASSIFICATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_TEXT_TO_SPECTROGRAM_MAPPING_NAMES, MODEL_TYPES[:Seq2Seq]],
[MODEL_FOR_TEXT_TO_WAVEFORM_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_AUDIO_XVECTOR_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_AUDIO_FRAME_CLASSIFICATION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]],
[MODEL_FOR_IMAGE_FEATURE_EXTRACTION_MAPPING_NAMES, MODEL_TYPES[:EncoderOnly]]
]
MODEL_CLASS_TYPE_MAPPING.each do |mappings, type|
mappings.values.each do |name, model|
MODEL_TYPE_MAPPING[name] = type
MODEL_CLASS_TO_NAME_MAPPING[model] = name
MODEL_NAME_TO_CLASS_MAPPING[name] = model
end
end
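# Illustration only (hypothetical :encoder_only type and a stand-in class, not
# part of the library): the loop above flattens each mapping's [name, class]
# pairs into lookup tables keyed by class name. A plain-Ruby sketch of the idea:
#
#   stub = Class.new
#   mapping = {"bert" => ["BertModel", stub]}
#   type_by_name = {}
#   mapping.values.each { |name, _model| type_by_name[name] = :encoder_only }
#   type_by_name["BertModel"] # => :encoder_only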
class AutoModel < PretrainedMixin
MODEL_CLASS_MAPPINGS = MODEL_CLASS_TYPE_MAPPING.map { |x| x[0] }
BASE_IF_FAIL = true
end
class AutoModelForSequenceClassification < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES]
end
class AutoModelForTokenClassification < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES]
end
class AutoModelForSeq2SeqLM < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES]
end
class AutoModelForSpeechSeq2Seq < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES]
end
class AutoModelForTextToSpectrogram < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_TEXT_TO_SPECTROGRAM_MAPPING_NAMES]
end
class AutoModelForTextToWaveform < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_TEXT_TO_WAVEFORM_MAPPING_NAMES]
end
class AutoModelForCausalLM < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_WITH_LM_HEAD_MAPPING_NAMES]
end
class AutoModelForMaskedLM < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_MASKED_LM_MAPPING_NAMES]
end
class AutoModelForQuestionAnswering < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES]
end
class AutoModelForVision2Seq < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES]
end
class AutoModelForImageClassification < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES]
end
class AutoModelForImageSegmentation < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_IMAGE_SEGMENTATION_MAPPING_NAMES]
end
class AutoModelForSemanticSegmentation < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING_NAMES]
end
class AutoModelForObjectDetection < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_OBJECT_DETECTION_MAPPING_NAMES]
end
class AutoModelForZeroShotObjectDetection < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_ZERO_SHOT_OBJECT_DETECTION_MAPPING_NAMES]
end
class AutoModelForMaskGeneration < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_MASK_GENERATION_MAPPING_NAMES]
end
class AutoModelForCTC < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_CTC_MAPPING_NAMES]
end
class AutoModelForAudioClassification < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_AUDIO_CLASSIFICATION_MAPPING_NAMES]
end
class AutoModelForXVector < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_AUDIO_XVECTOR_MAPPING_NAMES]
end
class AutoModelForAudioFrameClassification < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_AUDIO_FRAME_CLASSIFICATION_MAPPING_NAMES]
end
class AutoModelForDocumentQuestionAnswering < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_DOCUMENT_QUESTION_ANSWERING_MAPPING_NAMES]
end
class AutoModelForImageMatting < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_IMAGE_MATTING_MAPPING_NAMES]
end
class AutoModelForImageToImage < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_IMAGE_TO_IMAGE_MAPPING_NAMES]
end
class AutoModelForDepthEstimation < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_DEPTH_ESTIMATION_MAPPING_NAMES]
end
class AutoModelForImageFeatureExtraction < PretrainedMixin
MODEL_CLASS_MAPPINGS = [MODEL_FOR_IMAGE_FEATURE_EXTRACTION_MAPPING_NAMES]
end
class ModelOutput
def [](key)
instance_variable_get("@#{key}")
end
end
class Seq2SeqLMOutput < ModelOutput
def initialize(logits, past_key_values, encoder_outputs, decoder_attentions = nil, cross_attentions = nil)
super()
@logits = logits
@past_key_values = past_key_values
@encoder_outputs = encoder_outputs
@decoder_attentions = decoder_attentions
@cross_attentions = cross_attentions
end
end
class SequenceClassifierOutput < ModelOutput
attr_reader :logits
def initialize(logits)
super()
@logits = logits
end
end
class TokenClassifierOutput < ModelOutput
attr_reader :logits
def initialize(logits)
super()
@logits = logits
end
end
class MaskedLMOutput < ModelOutput
attr_reader :logits
def initialize(logits)
super()
@logits = logits
end
end
class QuestionAnsweringModelOutput < ModelOutput
attr_reader :start_logits, :end_logits
def initialize(start_logits, end_logits)
super()
@start_logits = start_logits
@end_logits = end_logits
end
end
class DetrObjectDetectionOutput < ModelOutput
attr_reader :logits, :pred_boxes
def initialize(logits, pred_boxes)
super()
@logits = logits
@pred_boxes = pred_boxes
end
end
class DetrSegmentationOutput < ModelOutput
attr_reader :logits, :pred_boxes, :pred_masks
def initialize(logits, pred_boxes, pred_masks)
super()
@logits = logits
@pred_boxes = pred_boxes
@pred_masks = pred_masks
end
end
end
================================================
FILE: lib/informers/pipelines.rb
================================================
module Informers
class Pipeline
def initialize(task:, model:, tokenizer: nil, processor: nil)
super()
@task = task
@model = model
@tokenizer = tokenizer
@processor = processor
end
private
def prepare_images(images)
if !images.is_a?(Array)
images = [images]
end
# Possibly convert any non-images to images
images.map { |x| Utils::RawImage.read(x) }
end
def prepare_audios(audios, sampling_rate)
if !audios.is_a?(Array)
audios = [audios]
end
audios.map do |x|
if x.is_a?(String) || x.is_a?(URI)
Utils.read_audio(x, sampling_rate)
else
x
end
end
end
def get_bounding_box(box, as_integer)
if as_integer
box = box.map { |x| x.to_i }
end
xmin, ymin, xmax, ymax = box
{xmin:, ymin:, xmax:, ymax:}
end
end
class TextClassificationPipeline < Pipeline
def call(texts, top_k: 1)
# Run tokenization
model_inputs = @tokenizer.(texts,
padding: true,
truncation: true
)
# Run model
outputs = @model.(model_inputs)
function_to_apply =
if @model.config[:problem_type] == "multi_label_classification"
->(batch) { Utils.sigmoid(batch) }
else
->(batch) { Utils.softmax(batch) } # single_label_classification (default)
end
id2label = @model.config[:id2label]
to_return = []
outputs.logits.each do |batch|
output = function_to_apply.(batch)
scores = Utils.get_top_items(output, top_k)
vals = scores.map do |x|
{
label: id2label[x[0].to_s],
score: x[1]
}
end
if top_k == 1
to_return.concat(vals)
else
to_return << vals
end
end
texts.is_a?(Array) ? to_return : to_return[0]
end
end
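# Illustration only (hypothetical labels, plain Ruby): with
# aggregation_strategy: "simple", TokenClassificationPipeline merges a token into
# the previous group when both share a tag and the token is not a "B-" token.
# A sketch of that rule for labels that all carry a "B-" or "I-" prefix:
#
#   labels = ["B-PER", "I-PER", "B-PER", "I-LOC"]
#   parts = ->(label) { label.split("-", 2) } # "B-PER" => ["B", "PER"]
#   labels.slice_when do |prev, cur|
#     parts.(cur)[1] != parts.(prev)[1] || parts.(cur)[0] == "B"
#   end.to_a
#   # => [["B-PER", "I-PER"], ["B-PER"], ["I-LOC"]]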
class TokenClassificationPipeline < Pipeline
def call(
texts,
ignore_labels: ["O"],
aggregation_strategy: "simple"
)
is_batched = texts.is_a?(Array)
# Run tokenization
model_inputs = @tokenizer.(is_batched ? texts : [texts],
padding: true,
truncation: true,
return_offsets: true
)
# Run model
outputs = @model.(model_inputs)
logits = outputs.logits
id2label = @model.config[:id2label]
to_return = []
logits.length.times do |i|
ids = model_inputs[:input_ids][i]
batch = logits[i]
offsets = model_inputs[:offsets][i]
# List of tokens that aren't ignored
tokens = []
batch.length.times do |j|
token_data = batch[j]
top_score_index = Utils.max(token_data)[1]
entity = id2label ? id2label[top_score_index.to_s] : "LABEL_#{top_score_index}"
if ignore_labels.include?(entity)
# We predicted a token that should be ignored. So, we skip it.
next
end
# TODO add option to keep special tokens?
word = @tokenizer.decode([ids[j]], skip_special_tokens: true)
if word == ""
# Was a special token. So, we skip it.
next
end
scores = Utils.softmax(token_data)
tokens << {
entity: entity,
score: scores[top_score_index],
index: j,
word: word,
start: offsets[j][0],
end: offsets[j][1]
}
end
case aggregation_strategy
when "simple"
tokens = group_entities(tokens)
when "none"
# do nothing
else
raise ArgumentError, "Invalid aggregation_strategy"
end
to_return << tokens
end
is_batched ? to_return : to_return[0]
end
def group_sub_entities(entities)
# Get the first entity in the entity group
entity = entities[0][:entity].split("-", 2)[-1]
scores = entities.map { |e| e[:score] }
tokens = entities.map { |e| e[:word] }
entity_group = {
entity_group: entity,
score: scores.sum / scores.count.to_f,
word: @tokenizer.convert_tokens_to_string(tokens),
start: entities[0][:start],
end: entities[-1][:end]
}
entity_group
end
def get_tag(entity_name)
if entity_name.start_with?("B-")
bi = "B"
tag = entity_name[2..]
elsif entity_name.start_with?("I-")
bi = "I"
tag = entity_name[2..]
else
# It's not in B-, I- format
# Default to I- for continuation.
bi = "I"
tag = entity_name
end
[bi, tag]
end
def group_entities(entities)
entity_groups = []
entity_group_disagg = []
entities.each do |entity|
if entity_group_disagg.empty?
entity_group_disagg << entity
next
end
# If the current entity is similar and adjacent to the previous entity,
# append it to the disaggregated entity group
# The split is meant to account for the "B" and "I" prefixes
# Shouldn't merge if both entities are B-type
bi, tag = get_tag(entity[:entity])
_last_bi, last_tag = get_tag(entity_group_disagg[-1][:entity])
if tag == last_tag && bi != "B"
# Modify subword type to be previous_type
entity_group_disagg << entity
else
# If the current entity is different from the previous entity
# aggregate the disaggregated entity group
entity_groups << group_sub_entities(entity_group_disagg)
entity_group_disagg = [entity]
end
end
if entity_group_disagg.any?
# it's the last entity, add it to the entity groups
entity_groups << group_sub_entities(entity_group_disagg)
end
entity_groups
end
end
class QuestionAnsweringPipeline < Pipeline
def call(question, context, top_k: 1)
# Run tokenization
inputs = @tokenizer.(question,
text_pair: context,
padding: true,
truncation: true,
return_offsets: true
)
output = @model.(inputs)
to_return = []
output.start_logits.length.times do |j|
ids = inputs[:input_ids][j]
sep_index = ids.index(@tokenizer.sep_token_id)
offsets = inputs[:offsets][j]
s1 = Utils.softmax(output.start_logits[j])
.map.with_index
.select { |x| x[1] > sep_index }
e1 = Utils.softmax(output.end_logits[j])
.map.with_index
.select { |x| x[1] > sep_index }
options = s1.product(e1)
.select { |x| x[0][1] <= x[1][1] }
.map { |x| [x[0][1], x[1][1], x[0][0] * x[1][0]] }
.sort_by { |v| -v[2] }
[options.length, top_k].min.times do |k|
start, end_, score = options[k]
# Inclusive token range; Array#slice(start, length) would take a length, not an end index
answer_tokens = ids[start..end_]
answer = @tokenizer.decode(answer_tokens,
skip_special_tokens: true
)
to_return << {
answer:,
score:,
start: offsets[start][0],
end: offsets[end_][1]
}
end
end
question.is_a?(Array) ? to_return : to_return[0]
end
end
class FillMaskPipeline < Pipeline
def call(texts, top_k: 5)
model_inputs = @tokenizer.(texts, padding: true, truncation: true)
outputs = @model.(model_inputs)
to_return = []
model_inputs[:input_ids].each_with_index do |ids, i|
mask_token_index = ids.index(@tokenizer.mask_token_id)
if mask_token_index.nil?
raise ArgumentError, "Mask token (#{@tokenizer.mask_token}) not found in text."
end
logits = outputs.logits[i]
item_logits = logits[mask_token_index]
scores = Utils.get_top_items(Utils.softmax(item_logits), top_k)
to_return <<
scores.map do |x|
sequence = ids.dup
sequence[mask_token_index] = x[0]
{
score: x[1],
token: x[0],
token_str: @tokenizer.id_to_token(x[0]),
sequence: @tokenizer.decode(sequence, skip_special_tokens: true)
}
end
end
texts.is_a?(Array) ? to_return : to_return[0]
end
end
class Text2TextGenerationPipeline < Pipeline
KEY = :generated_text
def call(texts, **generate_kwargs)
if !texts.is_a?(Array)
texts = [texts]
end
# Add global prefix, if present
if @model.config[:prefix]
texts = texts.map { |x| @model.config[:prefix] + x }
end
# Handle task specific params:
task_specific_params = @model.config[:task_specific_params]
if task_specific_params && task_specific_params[@task]
# Add prefixes, if present
if task_specific_params[@task]["prefix"]
texts = texts.map { |x| task_specific_params[@task]["prefix"] + x }
end
# TODO update generation config
end
tokenizer = @tokenizer
tokenizer_options = {
padding: true,
truncation: true
}
if is_a?(TranslationPipeline) && tokenizer.respond_to?(:_build_translation_inputs)
input_ids = tokenizer._build_translation_inputs(texts, tokenizer_options, generate_kwargs)[:input_ids]
else
input_ids = tokenizer.(texts, **tokenizer_options)[:input_ids]
end
output_token_ids = @model.generate(input_ids, generate_kwargs)
tokenizer.batch_decode(output_token_ids, skip_special_tokens: true)
.map { |text| {self.class.const_get(:KEY) => text} }
end
end
class SummarizationPipeline < Text2TextGenerationPipeline
KEY = :summary_text
end
class TranslationPipeline < Text2TextGenerationPipeline
KEY = :translation_text
end
class TextGenerationPipeline < Pipeline
def call(texts, **generate_kwargs)
is_batched = false
is_chat_input = false
# Normalize inputs
if texts.is_a?(String)
texts = [texts]
inputs = texts
else
raise Todo
end
# By default, do not add special tokens
add_special_tokens = generate_kwargs[:add_special_tokens] || false
# By default, return full text
return_full_text =
if is_chat_input
false
else
# fetch keeps an explicit false (`|| true` would always yield true)
generate_kwargs.fetch(:return_full_text, true)
end
@tokenizer.padding_side = "left"
input_ids, attention_mask =
@tokenizer.(inputs, add_special_tokens:, padding: true, truncation: true)
.values_at(:input_ids, :attention_mask)
output_token_ids =
@model.generate(
input_ids, generate_kwargs, nil, inputs_attention_mask: attention_mask
)
decoded = @tokenizer.batch_decode(output_token_ids, skip_special_tokens: true)
if !return_full_text && Utils.dims(input_ids)[-1] > 0
prompt_lengths = @tokenizer.batch_decode(input_ids, skip_special_tokens: true).map { |x| x.length }
end
to_return = Array.new(texts.length) { [] }
decoded.length.times do |i|
text_index = (i / output_token_ids.length.to_f * texts.length).floor
if prompt_lengths
raise Todo
end
# TODO is_chat_input
to_return[text_index] << {
generated_text: decoded[i]
}
end
!is_batched && to_return.length == 1 ? to_return[0] : to_return
end
end
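# Illustration only (hypothetical logits; softmax below is a local stand-in, not
# Utils.softmax): how the two scoring modes of ZeroShotClassificationPipeline
# differ for two candidate labels.
#
#   softmax = ->(xs) { m = xs.max; e = xs.map { |x| Math.exp(x - m) }; e.map { |x| x / e.sum } }
#   entailment = [2.0, 1.0]    # one entailment logit per candidate label
#   contradiction = [0.5, 3.0] # one contradiction logit per candidate label
#
#   # multi_label: true (or a single label): each label is scored on its own
#   contradiction.zip(entailment).map { |pair| softmax.(pair)[1] }
#
#   # multi_label: false: entailment logits compete across labels and sum to 1
#   softmax.(entailment)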
class ZeroShotClassificationPipeline < Pipeline
def initialize(**options)
super(**options)
@label2id = @model.config[:label2id].transform_keys(&:downcase)
@entailment_id = @label2id["entailment"]
if @entailment_id.nil?
warn "Could not find 'entailment' in label2id mapping. Using 2 as entailment_id."
@entailment_id = 2
end
@contradiction_id = @label2id["contradiction"] || @label2id["not_entailment"]
if @contradiction_id.nil?
warn "Could not find 'contradiction' in label2id mapping. Using 0 as contradiction_id."
@contradiction_id = 0
end
end
def call(texts, candidate_labels, hypothesis_template: "This example is {}.", multi_label: false)
is_batched = texts.is_a?(Array)
if !is_batched
texts = [texts]
end
if !candidate_labels.is_a?(Array)
candidate_labels = [candidate_labels]
end
# Insert labels into hypothesis template
hypotheses = candidate_labels.map { |x| hypothesis_template.sub("{}", x) }
# How to perform the softmax over the logits:
# - true: softmax over the entailment vs. contradiction dim for each label independently
# - false: softmax the "entailment" logits over all candidate labels
softmax_each = multi_label || candidate_labels.length == 1
to_return = []
texts.each do |premise|
entails_logits = []
hypotheses.each do |hypothesis|
inputs = @tokenizer.(
premise,
text_pair: hypothesis,
padding: true,
truncation: true
)
outputs = @model.(inputs)
if softmax_each
entails_logits << [
outputs.logits[0][@contradiction_id],
outputs.logits[0][@entailment_id]
]
else
entails_logits << outputs.logits[0][@entailment_id]
end
end
scores =
if softmax_each
entails_logits.map { |x| Utils.softmax(x)[1] }
else
Utils.softmax(entails_logits)
end
# Sort by scores (desc) and return scores with indices
scores_sorted = scores.map.with_index { |x, i| [x, i] }.sort_by { |v| -v[0] }
to_return << {
sequence: premise,
labels: scores_sorted.map { |x| candidate_labels[x[1]] },
scores: scores_sorted.map { |x| x[0] }
}
end
is_batched ? to_return : to_return[0]
end
end
class ImageToTextPipeline < Pipeline
def call(images, **generate_kwargs)
is_batched = images.is_a?(Array)
prepared_images = prepare_images(images)
pixel_values = @processor.(prepared_images)[:pixel_values]
to_return = []
pixel_values.each do |batch|
batch = [batch]
output = @model.generate(batch, generate_kwargs)
decoded = @tokenizer
.batch_decode(output, skip_special_tokens: true)
.map { |x| {generated_text: x.strip} }
to_return << decoded
end
is_batched ? to_return : to_return[0]
end
end
class ImageClassificationPipeline < Pipeline
def call(images, top_k: 1)
is_batched = images.is_a?(Array)
prepared_images = prepare_images(images)
pixel_values = @processor.(prepared_images)[:pixel_values]
output = @model.({pixel_values: pixel_values})
id2label = @model.config[:id2label]
to_return = []
output.logits.each do |batch|
scores = Utils.get_top_items(Utils.softmax(batch), top_k)
vals =
scores.map do |x|
{
label: id2label[x[0].to_s],
score: x[1]
}
end
if top_k == 1
to_return.push(*vals)
else
to_return << vals
end
end
is_batched || top_k == 1 ? to_return : to_return[0]
end
end
class ImageSegmentationPipeline < Pipeline
def initialize(**options)
super(**options)
@subtasks_mapping = {
"panoptic" => "post_process_panoptic_segmentation",
"instance" => "post_process_instance_segmentation",
"semantic" => "post_process_semantic_segmentation"
}
end
def call(
images,
threshold: 0.5,
mask_threshold: 0.5,
overlap_mask_area_threshold: 0.8,
label_ids_to_fuse: nil,
target_sizes: nil,
subtask: nil
)
is_batched = images.is_a?(Array)
if is_batched && images.length != 1
raise Error, "Image segmentation pipeline currently only supports a batch size of 1."
end
prepared_images = prepare_images(images)
image_sizes = prepared_images.map { |x| [x.height, x.width] }
model_inputs = @processor.(prepared_images).slice(:pixel_values, :pixel_mask)
output = @model.(model_inputs)
if !subtask.nil?
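# Resolve the post-processing method by name so that fn is callable below
fn_name = @subtasks_mapping[subtask]
fn = @processor.feature_extractor.method(fn_name) if fn_name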
else
@subtasks_mapping.each do |task, func|
if @processor.feature_extractor.respond_to?(func)
fn = @processor.feature_extractor.method(func)
subtask = task
break
end
end
end
id2label = @model.config[:id2label]
annotation = []
if subtask == "panoptic" || subtask == "instance"
processed = fn.(
output,
threshold:,
mask_threshold:,
overlap_mask_area_threshold:,
label_ids_to_fuse:,
target_sizes: target_sizes || image_sizes, # TODO FIX?
)[0]
_segmentation = processed[:segmentation]
processed[:segments_info].each do |segment|
annotation << {
label: id2label[segment[:label_id].to_s],
score: segment[:score]
# TODO mask
}
end
elsif subtask == "semantic"
raise Todo
else
raise Error, "Subtask #{subtask} not supported."
end
annotation
end
end
class ZeroShotImageClassificationPipeline < Pipeline
def call(images, candidate_labels, hypothesis_template: "This is a photo of {}")
is_batched = images.is_a?(Array)
prepared_images = prepare_images(images)
# Insert label into hypothesis template
texts = candidate_labels.map { |x| hypothesis_template.sub("{}", x) }
# Run tokenization
text_inputs = @tokenizer.(texts,
padding: @model.config[:model_type] == "siglip" ? "max_length" : true,
truncation: true
)
# Run processor
pixel_values = @processor.(prepared_images)[:pixel_values]
# Run model with both text and pixel inputs
output = @model.(text_inputs.merge(pixel_values: pixel_values))
function_to_apply =
if @model.config[:model_type] == "siglip"
->(batch) { Utils.sigmoid(batch) }
else
->(batch) { Utils.softmax(batch) }
end
# Compare each image with each candidate label
to_return = []
output[0].each do |batch|
# Compute probabilities per image (sigmoid for SigLIP, softmax otherwise)
probs = function_to_apply.(batch)
result = probs
.map.with_index { |x, i| {label: candidate_labels[i], score: x} }
.sort_by { |v| -v[:score] }
to_return << result
end
is_batched ? to_return : to_return[0]
end
end
class ObjectDetectionPipeline < Pipeline
def call(images, threshold: 0.9, percentage: false)
is_batched = images.is_a?(Array)
if is_batched && images.length != 1
raise Error, "Object detection pipeline currently only supports a batch size of 1."
end
prepared_images = prepare_images(images)
image_sizes = percentage ? nil : prepared_images.map { |x| [x.height, x.width] }
model_inputs = @processor.(prepared_images).slice(:pixel_values, :pixel_mask)
output = @model.(model_inputs)
processed = @processor.feature_extractor.post_process_object_detection(output, threshold, image_sizes)
# Add labels
id2label = @model.config[:id2label]
# Format output
result =
processed.map do |batch|
batch[:boxes].map.with_index do |box, i|
{
label: id2label[batch[:classes][i].to_s],
score: batch[:scores][i],
box: get_bounding_box(box, !percentage)
}
end.sort_by { |v| -v[:score] }
end
is_batched ? result : result[0]
end
end
class ZeroShotObjectDetectionPipeline < Pipeline
def call(
images,
candidate_labels,
threshold: 0.1,
top_k: nil,
percentage: false
)
is_batched = images.is_a?(Array)
prepared_images = prepare_images(images)
# Run tokenization
text_inputs = @tokenizer.(candidate_labels,
padding: true,
truncation: true
)
# Run processor
model_inputs = @processor.(prepared_images)
# Since non-maximum suppression is performed for exporting, we need to
# process each image separately. For more information, see:
# https://github.com/huggingface/optimum/blob/e3b7efb1257c011db907ef40ab340e795cc5684c/optimum/exporters/onnx/model_configs.py#L1028-L1032
to_return = []
prepared_images.length.times do |i|
image = prepared_images[i]
image_size = percentage ? nil : [[image.height, image.width]]
pixel_values = [model_inputs[:pixel_values][i]]
# Run model with both text and pixel inputs
output = @model.(text_inputs.merge(pixel_values: pixel_values))
# TODO remove
output = @model.instance_variable_get(:@session).outputs.map { |v| v[:name].to_sym }.zip(output).to_h
processed = @processor.feature_extractor.post_process_object_detection(output, threshold, image_size, true)[0]
result =
processed[:boxes].map.with_index do |box, j|
{
label: candidate_labels[processed[:classes][j]],
score: processed[:scores][j],
box: get_bounding_box(box, !percentage)
}
end
result.sort_by! { |v| -v[:score] }
if !top_k.nil?
result = result[0...top_k]
end
to_return << result
end
is_batched ? to_return : to_return[0]
end
end
class DocumentQuestionAnsweringPipeline < Pipeline
def call(image, question, **generate_kwargs)
# NOTE: For now, we only support a batch size of 1
# Preprocess image
prepared_image = prepare_images(image)[0]
pixel_values = @processor.(prepared_image)[:pixel_values]
# Run tokenization
task_prompt = "<s_docvqa><s_question>#{question}</s_question><s_answer>"
decoder_input_ids =
@tokenizer.(
task_prompt,
add_special_tokens: false,
padding: true,
truncation: true
)[:input_ids]
# Run model
output =
@model.generate(
pixel_values,
generate_kwargs.merge(
decoder_input_ids: decoder_input_ids[0],
max_length: @model.config["decoder"]["max_position_embeddings"]
).transform_keys(&:to_s)
)
# Decode output
decoded = @tokenizer.batch_decode(output, skip_special_tokens: false)[0]
# Parse answer
match = decoded.match(/<s_answer>(.*?)<\/s_answer>/)
answer = nil
if match && match.length >= 2
answer = match[1].strip
end
[{answer:}]
end
end
class TextToAudioPipeline < Pipeline
DEFAULT_VOCODER_ID = "Xenova/speecht5_hifigan"
def initialize(**options)
super(**options)
# TODO: Find a better way for `pipeline` to set the default vocoder
@vocoder = options[:vocoder]
end
def call(text_inputs, speaker_embeddings: nil)
# If @processor is not set, we are using an `AutoModelForTextToWaveform` model
if @processor
call_text_to_spectrogram(text_inputs, speaker_embeddings:)
else
call_text_to_waveform(text_inputs)
end
end
end
class FeatureExtractionPipeline < Pipeline
def call(
texts,
pooling: "none",
normalize: false,
quantize: false,
precision: "binary",
model_output: nil
)
# Run tokenization
model_inputs = @tokenizer.(texts,
padding: true,
truncation: true
)
model_options = {}
if !model_output.nil?
model_options[:output_names] = Array(model_output)
elsif @model.instance_variable_get(:@output_names) == ["token_embeddings"] && pooling == "mean" && normalize
# optimization for previous revision of sentence-transformers/all-MiniLM-L6-v2
model_options[:output_names] = ["sentence_embedding"]
pooling = "none"
normalize = false
end
# Run model
outputs = @model.(model_inputs, **model_options)
# TODO improve
result =
if outputs.is_a?(Array)
# TODO show returned instead of all
output_names = @model.instance_variable_get(:@session).outputs.map { |v| v[:name] }
raise Error, "unexpected outputs: #{output_names}" if outputs.size != 1
outputs[0]
else
outputs.logits
end
case pooling
when "none"
# Skip pooling
when "mean"
result = Utils.mean_pooling(result, model_inputs[:attention_mask])
when "cls"
result = result.map(&:first)
else
# TODO raise ArgumentError in 2.0
raise Error, "Pooling method '#{pooling}' not supported."
end
if normalize
result = Utils.normalize(result)
end
if quantize
result = quantize_embeddings(result, precision)
end
texts.is_a?(Array) ? result : result[0]
end
end
class ImageFeatureExtractionPipeline < Pipeline
def call(images)
prepared_images = prepare_images(images)
pixel_values = @processor.(prepared_images)[:pixel_values]
outputs = @model.({pixel_values: pixel_values})
result = outputs[0]
result
end
end
class AudioClassificationPipeline < Pipeline
def call(audio, top_k: nil)
single = !audio.is_a?(Array)
sampling_rate = @processor.feature_extractor.config["sampling_rate"]
prepared_audios = prepare_audios(audio, sampling_rate)
id2label = @model.config[:id2label]
to_return = []
prepared_audios.each do |aud|
inputs = @processor.(aud)
output = @model.(inputs)
logits = output.logits[0]
scores = Utils.get_top_items(Utils.softmax(logits), top_k)
vals =
scores.map do |x|
{
label: id2label[x[0].to_s],
score: x[1]
}
end
if top_k == 1
to_return.concat(vals)
else
to_return << vals
end
end
!single || top_k == 1 ? to_return : to_return[0]
end
end
class ZeroShotAudioClassificationPipeline < Pipeline
def call(audio, candidate_labels, hypothesis_template: "This is a sound of {}.")
single = !audio.is_a?(Array)
if single
audio = [audio]
end
# Insert label into hypothesis template
texts = candidate_labels.map { |x| hypothesis_template.sub("{}", x) }
# Run tokenization
text_inputs =
@tokenizer.(
texts,
padding: true,
truncation: true
)
sampling_rate = @processor.feature_extractor.config["sampling_rate"]
prepared_audios = prepare_audios(audio, sampling_rate)
to_return = []
prepared_audios.each do |aud|
audio_inputs = @processor.(aud)
# Run model with both text and audio inputs
output = @model.(text_inputs.merge(audio_inputs))
# Compute softmax per audio
probs = Utils.softmax(output.logits_per_audio.data)
to_return <<
probs.map.with_index do |x, i|
{
label: candidate_labels[i],
score: x
}
end
end
single ? to_return[0] : to_return
end
end
class AutomaticSpeechRecognitionPipeline < Pipeline
def call(audio, **kwargs)
case @model.config["model_type"]
when "whisper"
call_whisper(audio, **kwargs)
else
raise Error, "AutomaticSpeechRecognitionPipeline does not support model type '#{@model.config["model_type"]}'."
end
end
private
def call_whisper(audio, **kwargs)
raise Todo
end
end
class ImageToImagePipeline < Pipeline
def call(images)
prepared_images = prepare_images(images)
inputs = @processor.(prepared_images)
outputs = @model.(inputs)
to_return = []
outputs[0].each do |batch|
# TODO flatten first
output =
batch.map do |v|
v.map do |v2|
v2.map do |v3|
(v3.clamp(0, 1) * 255).round
end
end
end
to_return << Utils::RawImage.from_array(output).image
end
to_return.length > 1 ? to_return : to_return[0]
end
end
class DepthEstimationPipeline < Pipeline
def call(images)
prepared_images = prepare_images(images)
inputs = @processor.(prepared_images)
predicted_depth = @model.(inputs)[0]
to_return = []
prepared_images.length.times do |i|
prediction = Utils.interpolate(predicted_depth[i], prepared_images[i].size.reverse, "bilinear", false)
max_prediction = Utils.max(prediction.flatten)[0]
formatted =
prediction.map do |v|
v.map do |v2|
v2.map do |v3|
(v3 * 255 / max_prediction).round
end
end
end
to_return << {
predicted_depth: predicted_depth[i],
depth: Utils::RawImage.from_array(formatted).image
}
end
to_return.length > 1 ? to_return : to_return[0]
end
end
class EmbeddingPipeline < FeatureExtractionPipeline
def call(
texts,
pooling: "mean",
normalize: true,
model_output: nil
)
super(texts, pooling:, normalize:, model_output:)
end
end
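# Illustration only (hypothetical logits, plain Ruby): RerankingPipeline turns
# one logit per document into a sigmoid score and sorts in descending order.
# Sigmoid is monotonic, so the ranking follows the raw logits.
#
#   logits = [1.2, -0.3, 0.4]
#   scores = logits.map { |x| 1.0 / (1.0 + Math.exp(-x)) }
#   ranked = scores.each_with_index.map { |s, i| {doc_id: i, score: s} }
#   ranked = ranked.sort_by { |v| -v[:score] }
#   ranked.map { |v| v[:doc_id] } # => [0, 2, 1]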
class RerankingPipeline < Pipeline
def call(
query,
documents,
return_documents: false,
top_k: nil
)
model_inputs = @tokenizer.([query] * documents.size,
text_pair: documents,
padding: true,
truncation: true
)
outputs = @model.(model_inputs)
result =
Utils.sigmoid(outputs[0].map(&:first))
.map.with_index { |s, i| {doc_id: i, score: s} }
.sort_by { |v| -v[:score] }
if return_documents
result.each do |v|
v[:text] = documents[v[:doc_id]]
end
end
top_k ? result.first(top_k) : result
end
end
SUPPORTED_TASKS = {
"text-classification" => {
tokenizer: AutoTokenizer,
pipeline: TextClassificationPipeline,
model: AutoModelForSequenceClassification,
default: {
model: "Xenova/distilbert-base-uncased-finetuned-sst-2-english"
},
type: "text"
},
"token-classification" => {
tokenizer: AutoTokenizer,
pipeline: TokenClassificationPipeline,
model: AutoModelForTokenClassification,
default: {
model: "Xenova/bert-base-multilingual-cased-ner-hrl"
},
type: "text"
},
"question-answering" => {
tokenizer: AutoTokenizer,
pipeline: QuestionAnsweringPipeline,
model: AutoModelForQuestionAnswering,
default: {
model: "Xenova/distilbert-base-cased-distilled-squad"
},
type: "text"
},
"fill-mask" => {
tokenizer: AutoTokenizer,
pipeline: FillMaskPipeline,
model: AutoModelForMaskedLM,
default: {
model: "Xenova/bert-base-uncased"
},
type: "text"
},
"summarization" => {
tokenizer: AutoTokenizer,
pipeline: SummarizationPipeline,
model: AutoModelForSeq2SeqLM,
default: {
model: "Xenova/distilbart-cnn-6-6"
},
type: "text"
},
"translation" => {
tokenizer: AutoTokenizer,
pipeline: TranslationPipeline,
model: AutoModelForSeq2SeqLM,
default: {
model: "Xenova/t5-small"
},
type: "text"
},
"text2text-generation" => {
tokenizer: AutoTokenizer,
pipeline: Text2TextGenerationPipeline,
model: AutoModelForSeq2SeqLM,
default: {
model: "Xenova/flan-t5-small"
},
type: "text"
},
"text-generation" => {
tokenizer: AutoTokenizer,
pipeline: TextGenerationPipeline,
model: AutoModelForCausalLM,
default: {
model: "Xenova/gpt2"
},
type: "text"
},
"zero-shot-classification" => {
tokenizer: AutoTokenizer,
pipeline: ZeroShotClassificationPipeline,
model: AutoModelForSequenceClassification,
default: {
model: "Xenova/distilbert-base-uncased-mnli"
},
type: "text"
},
"audio-classification" => {
pipeline: AudioClassificationPipeline,
model: AutoModelForAudioClassification,
processor: AutoProcessor,
default: {
model: "Xenova/wav2vec2-base-superb-ks"
},
type: "audio"
},
# TODO
# "zero-shot-audio-classification" => {
# tokenizer: AutoTokenizer,
# pipeline: ZeroShotAudioClassificationPipeline,
# model: AutoModel,
# processor: AutoProcessor,
# default: {
# model: "Xenova/clap-htsat-unfused"
# },
# type: "multimodal"
# },
# TODO
# "automatic-speech-recognition" => {
# tokenizer: AutoTokenizer,
# pipeline: AutomaticSpeechRecognitionPipeline,
# model: [AutoModelForSpeechSeq2Seq, AutoModelForCTC],
# processor: AutoProcessor,
# default: {
# model: "Xenova/whisper-tiny.en"
# },
# type: "multimodal"
# },
"text-to-audio" => {
tokenizer: AutoTokenizer,
pipeline: TextToAudioPipeline,
model: [AutoModelForTextToWaveform, AutoModelForTextToSpectrogram],
processor: [AutoProcessor, nil],
default: {
model: "Xenova/speecht5_tts"
},
type: "text"
},
"image-to-text" => {
tokenizer: AutoTokenizer,
pipeline: ImageToTextPipeline,
model: AutoModelForVision2Seq,
processor: AutoProcessor,
default: {
model: "Xenova/vit-gpt2-image-captioning"
},
type: "multimodal"
},
"image-classification" => {
pipeline: ImageClassificationPipeline,
model: AutoModelForImageClassification,
processor: AutoProcessor,
default: {
model: "Xenova/vit-base-patch16-224"
},
type: "multimodal"
},
"image-segmentation" => {
pipeline: ImageSegmentationPipeline,
model: [AutoModelForImageSegmentation, AutoModelForSemanticSegmentation],
processor: AutoProcessor,
default: {
model: "Xenova/detr-resnet-50-panoptic"
},
type: "multimodal"
},
"zero-shot-image-classification" => {
tokenizer: AutoTokenizer,
pipeline: ZeroShotImageClassificationPipeline,
model: AutoModel,
processor: AutoProcessor,
default: {
model: "Xenova/clip-vit-base-patch32"
},
type: "multimodal"
},
"object-detection" => {
pipeline: ObjectDetectionPipeline,
model: AutoModelForObjectDetection,
processor: AutoProcessor,
default: {
model: "Xenova/detr-resnet-50"
},
type: "multimodal"
},
"zero-shot-object-detection" => {
tokenizer: AutoTokenizer,
pipeline: ZeroShotObjectDetectionPipeline,
model: AutoModelForZeroShotObjectDetection,
processor: AutoProcessor,
default: {
model: "Xenova/owlvit-base-patch32"
},
type: "multimodal"
},
"document-question-answering" => {
tokenizer: AutoTokenizer,
pipeline: DocumentQuestionAnsweringPipeline,
model: AutoModelForDocumentQuestionAnswering,
processor: AutoProcessor,
default: {
model: "Xenova/donut-base-finetuned-docvqa"
},
type: "multimodal"
},
"image-to-image" => {
pipeline: ImageToImagePipeline,
model: AutoModelForImageToImage,
processor: AutoProcessor,
default: {
model: "Xenova/swin2SR-classical-sr-x2-64"
},
type: "image"
},
"depth-estimation" => {
pipeline: DepthEstimationPipeline,
model: AutoModelForDepthEstimation,
processor: AutoProcessor,
default: {
model: "Xenova/dpt-large"
},
type: "image"
},
"feature-extraction" => {
tokenizer: AutoTokenizer,
pipeline: FeatureExtractionPipeline,
model: AutoModel,
default: {
model: "Xenova/all-MiniLM-L6-v2"
},
type: "text"
},
"image-feature-extraction" => {
processor: AutoProcessor,
pipeline: ImageFeatureExtractionPipeline,
model: [AutoModelForImageFeatureExtraction, AutoModel],
default: {
model: "Xenova/vit-base-patch16-224"
},
type: "image"
},
"embedding" => {
tokenizer: AutoTokenizer,
pipeline: EmbeddingPipeline,
model: AutoModel,
default: {
model: "sentence-transformers/all-MiniLM-L6-v2"
},
type: "text"
},
"reranking" => {
tokenizer: AutoTokenizer,
pipeline: RerankingPipeline,
model: AutoModel,
default: {
model: "mixedbread-ai/mxbai-rerank-base-v1"
},
type: "text"
}
}
TASK_ALIASES = {
"sentiment-analysis" => "text-classification",
"ner" => "token-classification",
"text-to-speech" => "text-to-audio"
}
DEFAULT_PROGRESS_CALLBACK = lambda do |msg|
stream = $stderr
tty = stream.tty?
width = tty ? stream.winsize[1] : 80
width = 80 if width == 0
if msg[:status] == "progress" && tty
stream.print "\r#{Utils::Hub.display_progress(msg[:file], width, msg[:size], msg[:total_size])}"
elsif msg[:status] == "done" && !msg[:cache_hit]
if tty
stream.puts
else
stream.puts Utils::Hub.display_progress(msg[:file], width, 1, 1)
end
end
end
NO_DEFAULT = Object.new
class << self
def pipeline(
task,
model = nil,
quantized: NO_DEFAULT,
progress_callback: DEFAULT_PROGRESS_CALLBACK,
config: nil,
cache_dir: nil,
local_files_only: false,
revision: "main",
device: nil,
dtype: nil,
model_file_name: nil,
session_options: {}
)
# Apply aliases
task = TASK_ALIASES[task] || task
if quantized == NO_DEFAULT
# TODO no quantization by default in 2.0
quantized = ["text-classification", "token-classification", "question-answering", "feature-extraction"].include?(task)
end
# Get pipeline info
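# String#split with a limit of 1 returns the whole string, so use a limit of 2
# to isolate the part of the task name before the first underscore
pipeline_info = SUPPORTED_TASKS[task.split("_", 2)[0]]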
if !pipeline_info
raise Error, "Unsupported pipeline: #{task}. Must be one of #{SUPPORTED_TASKS.keys}"
end
# Use model if specified, otherwise, use default
if !model
model = pipeline_info[:default][:model]
warn "No model specified. Using default model: #{model.inspect}."
end
pretrained_options = {
quantized:,
progress_callback:,
config:,
cache_dir:,
local_files_only:,
revision:,
device:,
dtype:,
model_file_name:,
session_options:
}
classes = {
tokenizer: pipeline_info[:tokenizer],
model: pipeline_info[:model],
processor: pipeline_info[:processor]
}
# Load model, tokenizer, and processor (if they exist)
results = load_items(classes, model, pretrained_options)
results[:task] = task
# for previous revision of sentence-transformers/all-MiniLM-L6-v2
if model == "sentence-transformers/all-MiniLM-L6-v2" && results[:model].instance_variable_get(:@session).outputs.any? { |v| v[:name] == "token_embeddings" }
results[:model].instance_variable_set(:@output_names, ["token_embeddings"])
end
Utils.dispatch_callback(progress_callback, {
status: "ready",
task: task,
model: model
})
pipeline_class = pipeline_info.fetch(:pipeline)
pipeline_class.new(**results)
end
private
def load_items(mapping, model, pretrained_options)
result = {}
mapping.each do |name, cls|
next if !cls
if cls.is_a?(Array)
e = nil
cls.each do |c|
begin
result[name] = c.from_pretrained(model, **pretrained_options)
# Stop at the first class that loads successfully
break
rescue => err
e = err
end
end
raise e unless result[name]
else
result[name] = cls.from_pretrained(model, **pretrained_options)
end
end
result
end
end
end
================================================
FILE: lib/informers/processors.rb
================================================
module Informers
class FeatureExtractor
attr_reader :config
def initialize(config)
super()
@config = config
end
end
class ImageFeatureExtractor < FeatureExtractor
def initialize(config)
super(config)
@image_mean = @config["image_mean"] || @config["mean"]
@image_std = @config["image_std"] || @config["std"]
@resample = @config["resample"] || 2 # 2 => bilinear
@do_rescale = @config.fetch("do_rescale", true)
@rescale_factor = @config["rescale_factor"] || (1 / 255.0)
@do_normalize = @config["do_normalize"]
@do_resize = @config["do_resize"]
@do_thumbnail = @config["do_thumbnail"]
@size = @config["size"]
@size_divisibility = @config["size_divisibility"] || @config["size_divisor"]
@do_center_crop = @config["do_center_crop"]
@crop_size = @config["crop_size"]
@do_convert_rgb = @config.fetch("do_convert_rgb", true)
@do_crop_margin = @config["do_crop_margin"]
@pad_size = @config["pad_size"]
@do_pad = @config["do_pad"]
if @do_pad && !@pad_size && @size && !@size["width"].nil? && !@size["height"].nil?
# Should pad, but no pad size specified
# We infer the pad size from the resize size
@pad_size = @size
end
@do_flip_channel_order = @config["do_flip_channel_order"] || false
end
def thumbnail(image, size, resample = 2)
input_height = image.height
input_width = image.width
output_height = size["height"]
output_width = size["width"]
# We always resize to the smallest of either the input or output size.
height = [input_height, output_height].min
width = [input_width, output_width].min
if height == input_height && width == input_width
return image
end
if input_height > input_width
width = (input_width * height / input_height).floor
elsif input_width > input_height
height = (input_height * width / input_width).floor
end
image.resize(width, height, resample:)
end
def pad_image(
pixel_data,
img_dims,
pad_size,
mode: "constant",
center: false,
constant_values: 0
)
image_height, image_width, image_channels = img_dims
if pad_size.is_a?(Numeric)
padded_image_width = pad_size
padded_image_height = pad_size
else
padded_image_width = pad_size[:width] || pad_size["width"]
padded_image_height = pad_size[:height] || pad_size["height"]
end
# Only add padding if there is a difference in size
if padded_image_width != image_width || padded_image_height != image_height
# Zero-fill so that padding with the default `constant_values` of 0 does not leave nil entries
padded_pixel_data = Array.new(padded_image_width * padded_image_height * image_channels, 0)
if constant_values.is_a?(Array)
# Fill with constant values, cycling through the array
padded_pixel_data.length.times do |i|
padded_pixel_data[i] = constant_values[i % image_channels]
end
elsif constant_values != 0
padded_pixel_data.fill(constant_values)
end
left, top =
if center
[((padded_image_width - image_width) / 2.0).floor, ((padded_image_height - image_height) / 2.0).floor]
else
[0, 0]
end
# Copy the original image into the padded image
image_height.times do |i|
a = (i + top) * padded_image_width
b = i * image_width
image_width.times do |j|
c = (a + j + left) * image_channels
d = (b + j) * image_channels
image_channels.times do |k|
padded_pixel_data[c + k] = pixel_data[d + k]
end
end
end
if mode == "symmetric"
if center
raise Error, "`center` padding is not supported when `mode` is set to `symmetric`."
end
h1 = image_height - 1
w1 = image_width - 1
padded_image_height.times do |i|
a = i * padded_image_width
b = Utils.calculate_reflect_offset(i, h1) * image_width
padded_image_width.times do |j|
next if i < image_height && j < image_width # Do not overwrite original image
c = (a + j) * image_channels
d = (b + Utils.calculate_reflect_offset(j, w1)) * image_channels
# Copy channel-wise
image_channels.times do |k|
padded_pixel_data[c + k] = pixel_data[d + k]
end
end
end
end
# Update pixel data and image dimensions
pixel_data = padded_pixel_data
img_dims = [padded_image_height, padded_image_width, image_channels]
end
[pixel_data, img_dims]
end
def rescale(pixel_data)
pixel_data.length.times do |i|
pixel_data[i] *= @rescale_factor
end
end
def get_resize_output_image_size(image, size)
src_width, src_height = image.size
if @do_thumbnail
# NOTE: custom logic for `Donut` models
height = size["height"]
width = size["width"]
shortest_edge = [height, width].min
elsif size.is_a?(Numeric)
shortest_edge = size
longest_edge = @config["max_size"] || shortest_edge
elsif !size.nil?
# Extract known properties from `size`
shortest_edge = size["shortest_edge"]
longest_edge = size["longest_edge"]
end
if !shortest_edge.nil? || !longest_edge.nil?
# http://opensourcehacker.com/2011/12/01/calculate-aspect-ratio-conserving-resize-for-images-in-javascript/
# Try resize so that shortest edge is `shortest_edge` (target)
short_resize_factor =
if shortest_edge.nil?
1 # If `shortest_edge` is not set, don't upscale
else
[shortest_edge / src_width.to_f, shortest_edge / src_height.to_f].max
end
new_width = src_width * short_resize_factor
new_height = src_height * short_resize_factor
# The new width and height might be greater than `longest_edge`, so
# we downscale again to ensure the largest dimension is `longest_edge`
long_resize_factor =
if longest_edge.nil?
1 # If `longest_edge` is not set, don't downscale
else
[longest_edge / new_width.to_f, longest_edge / new_height.to_f].min
end
# To avoid certain floating point precision issues, we round to 2 decimal places
final_width = (new_width * long_resize_factor).round(2).floor
final_height = (new_height * long_resize_factor).round(2).floor
if !@size_divisibility.nil?
raise Todo
end
[final_width, final_height]
elsif !size.nil? && !size["width"].nil? && !size["height"].nil?
new_width = size["width"]
new_height = size["height"]
if @config["keep_aspect_ratio"] && @config["ensure_multiple_of"]
raise Todo
end
[new_width, new_height]
else
raise Todo
end
end
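=begin
Illustrative sketch, not part of Informers: the shortest/longest edge arithmetic from get_resize_output_image_size as a standalone function. The name `aspect_preserving_size` and its keyword arguments are hypothetical; the thumbnail and fixed width/height branches are left out.

```ruby
# Hypothetical helper: compute an aspect-ratio-preserving target size.
def aspect_preserving_size(src_width, src_height, shortest_edge: nil, longest_edge: nil)
  # Scale so the shorter side reaches `shortest_edge` (no scaling if unset)
  short_factor = shortest_edge ? [shortest_edge / src_width.to_f, shortest_edge / src_height.to_f].max : 1
  new_width = src_width * short_factor
  new_height = src_height * short_factor
  # Scale again so the longer side equals `longest_edge` (no scaling if unset)
  long_factor = longest_edge ? [longest_edge / new_width.to_f, longest_edge / new_height.to_f].min : 1
  # Round to 2 decimal places before flooring to sidestep floating point noise
  [(new_width * long_factor).round(2).floor, (new_height * long_factor).round(2).floor]
end

p aspect_preserving_size(400, 200, shortest_edge: 100, longest_edge: 150)
```

In this example the first pass gives 200x100, and the second pass shrinks it to 150x75 so the longer side fits `longest_edge`.
=end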
def resize(image)
new_width, new_height = get_resize_output_image_size(image, @size)
image.resize(new_width, new_height, resample: @resample)
end
def preprocess(
image,
do_normalize: nil,
do_pad: nil,
do_convert_rgb: nil,
do_convert_grayscale: nil,
do_flip_channel_order: nil
)
if @do_crop_margin
# NOTE: Specific to nougat processors. This is done before resizing,
# and can be interpreted as a pre-preprocessing step.
image = crop_margin(image)
end
src_width, src_height = image.size # original image size
# Convert image to RGB if specified in config.
if !do_convert_rgb.nil? ? do_convert_rgb : @do_convert_rgb
image = image.rgb
elsif do_convert_grayscale
image = image.grayscale
end
# Resize all images
if @do_resize
image = resize(image)
end
# Resize the image using thumbnail method.
if @do_thumbnail
image = thumbnail(image, @size, @resample)
end
if @do_center_crop
if @crop_size.is_a?(Integer)
crop_width = @crop_size
crop_height = @crop_size
else
crop_width = @crop_size["width"]
crop_height = @crop_size["height"]
end
image = image.center_crop(crop_width, crop_height)
end
reshaped_input_size = [image.height, image.width]
# NOTE: All pixel-level manipulation (i.e., modifying `pixel_data`)
# occurs with data in the hwc format (height, width, channels),
# to emulate the behavior of the original Python code (w/ numpy).
pixel_data = image.data
img_dims = [image.height, image.width, image.channels]
if @do_rescale
rescale(pixel_data)
end
if !do_normalize.nil? ? do_normalize : @do_normalize
image_mean = @image_mean
if !@image_mean.is_a?(Array)
# Repeat the scalar mean for every channel
image_mean = Array.new(image.channels, @image_mean)
end
image_std = @image_std
if !@image_std.is_a?(Array)
# Repeat the scalar standard deviation for every channel
image_std = Array.new(image.channels, @image_std)
end
if image_mean.length != image.channels || image_std.length != image.channels
raise Error, "When set to arrays, the length of `image_mean` (#{image_mean.length}) and `image_std` (#{image_std.length}) must match the number of channels in the image (#{image.channels})."
end
i = 0
while i < pixel_data.length
image.channels.times do |j|
pixel_data[i + j] = (pixel_data[i + j] - image_mean[j]) / image_std[j]
end
i += image.channels
end
end
# do padding after rescaling/normalizing
if !do_pad.nil? ? do_pad : @do_pad
if @pad_size
padded = pad_image(pixel_data, [image.height, image.width, image.channels], @pad_size)
pixel_data, img_dims = padded # Update pixel data and image dimensions
elsif @size_divisibility
raise Todo
end
end
if !do_flip_channel_order.nil? ? do_flip_channel_order : @do_flip_channel_order
raise Todo
end
# convert to channel dimension format (hwc -> chw)
h, w, c = img_dims
pixel_values =
c.times.map do |ci|
h.times.map do |hi|
w.times.map do |wi|
index = (hi * w * c) + (wi * c) + ci
pixel_data[index]
end
end
end
{
original_size: [src_height, src_width],
reshaped_input_size: reshaped_input_size,
pixel_values: pixel_values
}
end
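=begin
Illustrative sketch, not part of Informers: the final step of preprocess converts a flat height-width-channel (hwc) buffer into nested channel-height-width (chw) arrays. The helper name `hwc_to_chw` is hypothetical.

```ruby
# Hypothetical helper: reorder a flat hwc buffer into nested chw arrays.
def hwc_to_chw(pixel_data, height, width, channels)
  Array.new(channels) do |c|
    Array.new(height) do |h|
      # In hwc layout, the channels of one pixel are stored next to each other
      Array.new(width) { |w| pixel_data[(h * width + w) * channels + c] }
    end
  end
end

# One row of two RGB pixels: (1, 2, 3) and (4, 5, 6)
p hwc_to_chw([1, 2, 3, 4, 5, 6], 1, 2, 3)
```

Each inner array holds a single channel, which is the layout the stacked `pixel_values` use.
=end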
def call(images, *args)
if !images.is_a?(Array)
images = [images]
end
image_data = images.map { |x| preprocess(x) }
# Stack pixel values
pixel_values = Utils.stack(image_data.map { |x| x[:pixel_values] }, 0)
{
pixel_values: pixel_values,
# Original sizes of images
original_sizes: image_data.map { |x| x[:original_size] },
# Sizes of images after resizing and cropping, but before padding
reshaped_input_sizes: image_data.map { |x| x[:reshaped_input_size] }
}
end
end
class CLIPFeatureExtractor < ImageFeatureExtractor
end
class DPTFeatureExtractor < ImageFeatureExtractor
end
class ViTFeatureExtractor < ImageFeatureExtractor
end
class OwlViTFeatureExtractor < ImageFeatureExtractor
def post_process_object_detection(*args)
Utils.post_process_object_detection(*args)
end
end
class Swin2SRImageProcessor < ImageFeatureExtractor
def pad_image(pixel_data, img_dims, pad_size, **options)
# NOTE: In this case, `pad_size` represents the size of the sliding window for the local attention.
# In other words, the image is padded so that its width and height are multiples of `pad_size`.
image_height, image_width, _image_channels = img_dims
super(
pixel_data,
img_dims,
{
# NOTE: For Swin2SR models, the original python implementation adds padding even when the image's width/height is already
# a multiple of `pad_size`. However, this is most likely a bug (PR: https://github.com/mv-lab/swin2sr/pull/19).
# For this reason, we only add padding when the image's width/height is not a multiple of `pad_size`.
width: image_width + (pad_size - image_width % pad_size) % pad_size,
height: image_height + (pad_size - image_height % pad_size) % pad_size
},
mode: "symmetric",
center: false,
constant_values: -1,
**options
)
end
end
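=begin
Illustrative sketch, not part of Informers: the arithmetic Swin2SRImageProcessor uses to round a dimension up to the next multiple of `pad_size`. The helper name `padded_length` is hypothetical.

```ruby
# Hypothetical helper: round `length` up to the nearest multiple of `multiple`.
# The outer modulo keeps lengths that are already a multiple unchanged.
def padded_length(length, multiple)
  length + (multiple - length % multiple) % multiple
end

p padded_length(30, 8)
p padded_length(32, 8)
```

Without the outer modulo, a length of 32 would be padded to 40, which is the behaviour the comment in `pad_image` describes as most likely a bug in the original Python implementation.
=end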
class DonutFeatureExtractor < ImageFeatureExtractor
def pad_image(pixel_data, img_dims, pad_size, **options)
_image_height, _image_width, image_channels = img_dims
image_mean = @image_mean
if !image_mean.is_a?(Array)
# Repeat the scalar mean for every channel
image_mean = Array.new(image_channels, image_mean)
end
image_std = @image_std
if !image_std.is_a?(Array)
# Repeat the scalar standard deviation for every channel
image_std = Array.new(image_channels, image_std)
end
constant_values = image_mean.map.with_index { |x, i| -x / image_std[i] }
super(
pixel_data,
img_dims,
pad_size,
center: true,
# Padding is applied after normalization here, so pad with the normalized value of a zero pixel (-mean / std).
# This mirrors the reference implementation, which pads with zeros before normalizing.
# For more information, see https://github.com/huggingface/transformers/blob/main/src/transformers/models/donut/image_processing_donut.py#L433-L451
constant_values: constant_values,
**options
)
end
end
class DetrFeatureExtractor < ImageFeatureExtractor
def call(images)
result = super(images)
# TODO support differently-sized images, for now assume all images are the same size.
# TODO support different mask sizes (not just 64x64)
# Currently, just fill pixel mask with 1s
mask_size = [result[:pixel_values].size, 64, 64]
pixel_mask =
mask_size[0].times.map do
mask_size[1].times.map do
mask_size[2].times.map do
1
end
end
end
result.merge(pixel_mask: pixel_mask)
end
def post_process_object_detection(*args)
Utils.post_process_object_detection(*args)
end
def remove_low_and_no_objects(class_logits, mask_logits, object_mask_threshold, num_labels)
mask_probs_item = []
pred_scores_item = []
pred_labels_item = []
class_logits.size.times do |j|
cls = class_logits[j]
mask = mask_logits[j]
pred_label = Utils.max(cls)[1]
if pred_label == num_labels
# Is the background, so we ignore it
next
end
scores = Utils.softmax(cls)
pred_score = scores[pred_label]
if pred_score > object_mask_threshold
mask_probs_item << mask
pred_scores_item << pred_score
pred_labels_item << pred_label
end
end
[mask_probs_item, pred_scores_item, pred_labels_item]
end
def check_segment_validity(
mask_labels,
mask_probs,
k,
mask_threshold = 0.5,
overlap_mask_area_threshold = 0.8
)
# mask_k is a 1D array of indices, indicating where the mask is equal to k
mask_k = []
mask_k_area = 0
original_area = 0
mask_probs_k_data = mask_probs[k].flatten
# Compute the area of all the stuff in query k
mask_labels.length.times do |i|
if mask_labels[i] == k
mask_k << i
mask_k_area += 1
end
if mask_probs_k_data[i] >= mask_threshold
original_area += 1
end
end
mask_exists = mask_k_area > 0 && original_area > 0
# Eliminate disconnected tiny segments
if mask_exists
# Perform additional check
# Use float division; integer division would truncate the ratio
area_ratio = mask_k_area / original_area.to_f
mask_exists = area_ratio > overlap_mask_area_threshold
end
[mask_exists, mask_k]
end
def compute_segments(
mask_probs,
pred_scores,
pred_labels,
mask_threshold,
overlap_mask_area_threshold,
label_ids_to_fuse = nil,
target_size = nil
)
height, width = target_size || Utils.dims(mask_probs[0])
segmentation = Array.new(height * width)
segments = []
# 1. If target_size is not nil, we need to resize the masks to the target size
if !target_size.nil?
# resize the masks to the target size
mask_probs.length.times do |i|
mask_probs[i] = Utils.interpolate(mask_probs[i], target_size, "bilinear", false)
end
end
# 2. Weigh each mask by its prediction score
# NOTE: `mask_probs` is updated in-place
#
# Temporary storage for the best label/scores for each pixel ([height, width]):
mask_labels = Array.new(mask_probs[0].flatten.length)
best_scores = Array.new(mask_probs[0].flatten.length, 0)
mask_probs.length.times do |i|
score = pred_scores[i]
mask_probs_i_data = mask_probs[i].flatten
mask_probs_i_dims = Utils.dims(mask_probs[i])
mask_probs_i_data.length.times do |j|
mask_probs_i_data[j] *= score
if mask_probs_i_data[j] > best_scores[j]
mask_labels[j] = i
best_scores[j] = mask_probs_i_data[j]
end
end
mask_probs[i] = Utils.reshape(mask_probs_i_data, mask_probs_i_dims)
end
current_segment_id = 0
# stuff_memory_list = {}
pred_labels.length.times do |k|
pred_class = pred_labels[k]
# TODO add `should_fuse`
# should_fuse = label_ids_to_fuse.include?(pred_class)
# Check if mask exists and large enough to be a segment
mask_exists, mask_k = check_segment_validity(
mask_labels,
mask_probs,
k,
mask_threshold,
overlap_mask_area_threshold
)
if !mask_exists
# Nothing to see here
next
end
current_segment_id += 1
# Add current object segment to final segmentation map
mask_k.each do |index|
segmentation[index] = current_segment_id
end
segments << {
id: current_segment_id,
label_id: pred_class,
score: pred_scores[k]
}
end
segmentation = Utils.reshape(segmentation, [height, width])
[segmentation, segments]
end
def post_process_panoptic_segmentation(
outputs,
threshold: 0.5,
mask_threshold: 0.5,
overlap_mask_area_threshold: 0.8,
label_ids_to_fuse: nil,
target_sizes: nil
)
if label_ids_to_fuse.nil?
warn "`label_ids_to_fuse` unset. No instance will be fused."
label_ids_to_fuse = Set.new
end
class_queries_logits = outputs[:logits] # [batch_size, num_queries, num_classes+1]
masks_queries_logits = outputs[:pred_masks] # [batch_size, num_queries, height, width]
mask_probs = Utils.sigmoid(masks_queries_logits) # [batch_size, num_queries, height, width]
batch_size, _num_queries, num_labels = class_queries_logits.size, class_queries_logits[0].size, class_queries_logits[0][0].size
num_labels -= 1 # Remove last class (background)
if !target_sizes.nil? && target_sizes.length != batch_size
raise Error, "Make sure that you pass in as many target sizes as the batch dimension of the logits"
end
to_return = []
batch_size.times do |i|
target_size = !target_sizes.nil? ? target_sizes[i] : nil
class_logits = class_queries_logits[i]
mask_logits = mask_probs[i]
mask_probs_item, pred_scores_item, pred_labels_item = remove_low_and_no_objects(class_logits, mask_logits, threshold, num_labels)
if pred_labels_item.length == 0
raise Todo
end
# Get segmentation map and segment information of batch item
segmentation, segments = compute_segments(
mask_probs_item,
pred_scores_item,
pred_labels_item,
mask_threshold,
overlap_mask_area_threshold,
label_ids_to_fuse,
target_size
)
to_return << {
segmentation: segmentation,
segments_info: segments
}
end
to_return
end
end
module Utils
def self.center_to_corners_format(v)
center_x, center_y, width, height = v
[
center_x - width / 2.0,
center_y - height / 2.0,
center_x + width / 2.0,
center_y + height / 2.0
]
end
def self.post_process_object_detection(outputs, threshold = 0.5, target_sizes = nil, is_zero_shot = false)
out_logits = outputs[:logits]
out_bbox = outputs[:pred_boxes]
batch_size, num_boxes, num_classes = out_logits.size, out_logits[0].size, out_logits[0][0].size
if !target_sizes.nil? && target_sizes.length != batch_size
raise Error, "Make sure that you pass in as many target sizes as the batch dimension of the logits"
end
to_return = []
batch_size.times do |i|
target_size = !target_sizes.nil? ? target_sizes[i] : nil
info = {
boxes: [],
classes: [],
scores: []
}
logits = out_logits[i]
bbox = out_bbox[i]
num_boxes.times do |j|
logit = logits[j]
indices = []
if is_zero_shot
# Get indices of classes with high enough probability
probs = Utils.sigmoid(logit)
probs.length.times do |k|
if probs[k] > threshold
indices << k
end
end
else
# Get most probable class
max_index = Utils.max(logit)[1]
if max_index == num_classes - 1
# This is the background class, skip it
next
end
indices << max_index
# Compute softmax over classes
probs = Utils.softmax(logit)
end
indices.each do |index|
box = bbox[j]
# convert to [x0, y0, x1, y1] format
box = center_to_corners_format(box)
if !target_size.nil?
box = box.map.with_index { |x, i| x * target_size[(i + 1) % 2] }
end
info[:boxes] << box
info[:classes] << index
info[:scores] << probs[index]
end
end
to_return << info
end
to_return
end
end
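=begin
Illustrative sketch, not part of Informers: how a normalized [center_x, center_y, width, height] box becomes pixel corner coordinates in post_process_object_detection. The helper names are hypothetical, and `target_size` is assumed to be [height, width], matching the `original_size` layout produced by ImageFeatureExtractor#preprocess.

```ruby
# Hypothetical helper: [center_x, center_y, width, height] -> [x0, y0, x1, y1]
def center_to_corners(box)
  cx, cy, w, h = box
  [cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0]
end

# Hypothetical helper: scale normalized corners to pixels.
# x coordinates (even indices) use the width, y coordinates use the height.
def scale_box(box, target_size)
  height, width = target_size
  box.each_with_index.map { |v, i| v * (i.even? ? width : height) }
end

corners = center_to_corners([0.5, 0.5, 0.5, 0.25])
p corners
p scale_box(corners, [80, 160])
```

The `(i + 1) % 2` index in the method above selects the same dimension as `i.even? ? width : height` here.
=end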
class WhisperFeatureExtractor < FeatureExtractor
def initialize(config)
super(config)
raise Todo
end
def _extract_fbank_features(waveform)
raise Todo
end
def call(audio)
raise Todo
end
end
class Wav2Vec2FeatureExtractor < FeatureExtractor
def _zero_mean_unit_var_norm(input_values)
sum = input_values.sum
mean = sum / input_values.length.to_f
variance = input_values.sum { |b| (b - mean) ** 2 } / input_values.length.to_f
input_values.map { |x| (x - mean) / Math.sqrt(variance + 1e-7) }
end
def call(audio)
# TODO
# validate_audio_inputs(audio, 'Wav2Vec2FeatureExtractor')
input_values = audio
# zero-mean and unit-variance normalization
if @config["do_normalize"]
input_values = _zero_mean_unit_var_norm(input_values)
end
# TODO: allow user to pass in attention mask
{
input_values: [input_values],
attention_mask: [Array.new(input_values.length, 1)]
}
end
end
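=begin
Illustrative sketch, not part of Informers: zero-mean, unit-variance normalization as applied by Wav2Vec2FeatureExtractor, written as a standalone function. The name `zero_mean_unit_var` and the `eps` keyword are hypothetical; the 1e-7 default mirrors the constant used above.

```ruby
# Hypothetical helper: shift samples to mean 0 and scale to (roughly) variance 1.
def zero_mean_unit_var(values, eps: 1e-7)
  mean = values.sum / values.length.to_f
  variance = values.sum { |v| (v - mean)**2 } / values.length.to_f
  # eps avoids division by zero for constant (silent) input
  scale = Math.sqrt(variance + eps)
  values.map { |v| (v - mean) / scale }
end

p zero_mean_unit_var([1.0, 2.0, 3.0, 4.0])
```

Computing the scale once outside the `map` avoids repeating the square root for every sample.
=end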
class ClapFeatureExtractor < FeatureExtractor
def initialize(config)
super(config)
# TODO
end
def call(audio, max_length: nil)
raise Todo
end
end
class Processor
attr_reader :feature_extractor
def initialize(feature_extractor)
@feature_extractor = feature_extractor
end
def call(input, *args)
@feature_extractor.(input, *args)
end
end
class AutoProcessor
FEATURE_EXTRACTOR_CLASS_MAPPING = {
"ViTFeatureExtractor" => ViTFeatureExtractor,
"OwlViTFeatureExtractor" => OwlViTFeatureExtractor,
"CLIPFeatureExtractor" => CLIPFeatureExtractor,
"DPTFeatureExtractor" => DPTFeatureExtractor,
"DetrFeatureExtractor" => DetrFeatureExtractor,
"Swin2SRImageProcessor" => Swin2SRImageProcessor,
"DonutFeatureExtractor" => DonutFeatureExtractor,
"WhisperFeatureExtractor" => WhisperFeatureExtractor,
"Wav2Vec2FeatureExtractor" => Wav2Vec2FeatureExtractor,
"ClapFeatureExtractor" => ClapFeatureExtractor
}
PROCESSOR_CLASS_MAPPING = {}
def self.from_pretrained(
pretrained_model_name_or_path,
progress_callback: nil,
config: nil,
cache_dir: nil,
local_files_only: false,
revision: "main",
**kwargs
)
preprocessor_config = config || Utils::Hub.get_model_json(pretrained_model_name_or_path, "preprocessor_config.json", true,
progress_callback:,
config:,
cache_dir:,
local_files_only:,
revision:
)
# Determine feature extractor class
# TODO: Ensure backwards compatibility with old configs
key = preprocessor_config["feature_extractor_type"] || preprocessor_config["image_processor_type"]
feature_extractor_class = FEATURE_EXTRACTOR_CLASS_MAPPING[key]
if !feature_extractor_class
if preprocessor_config["size"]
# Assume ImageFeatureExtractor
warn "Feature extractor type #{key.inspect} not found, assuming ImageFeatureExtractor due to size parameter in config."
feature_extractor_class = ImageFeatureExtractor
else
raise Error, "Unknown Feature Extractor type: #{key}"
end
end
# If no associated processor class, use default
processor_class = PROCESSOR_CLASS_MAPPING[preprocessor_config["processor_class"]] || Processor
# Instantiate processor and feature extractor
feature_extractor = feature_extractor_class.new(preprocessor_config)
processor_class.new(feature_extractor)
end
end
end
================================================
FILE: lib/informers/tokenizers.rb
================================================
module Informers
class PreTrainedTokenizer
attr_reader :mask_token, :mask_token_id, :sep_token_id
def initialize(tokenizer_json, tokenizer_config)
super()
@tokenizer_config = tokenizer_config
@tokenizer = Tokenizers::Tokenizer.from_file(tokenizer_json)
# Add added_tokens to model
@special_tokens = []
@all_special_ids = []
@added_tokens = []
@tokenizer.added_tokens_decoder.each do |id, token|
@added_tokens << token
if token.special
@special_tokens << token.content
@all_special_ids << id
end
end
# Update additional_special_tokens
@additional_special_tokens = tokenizer_config["additional_special_tokens"] || []
@special_tokens.concat(@additional_special_tokens)
@mask_token = get_token("mask_token")
@mask_token_id = @tokenizer.token_to_id(@mask_token) if @mask_token
@sep_token = get_token("sep_token")
@sep_token_id = @tokenizer.token_to_id(@sep_token) if @sep_token
@model_max_length = tokenizer_config["model_max_length"]
# for donut-base-finetuned-docvqa
if @model_max_length && @model_max_length > (1 << 63)
@model_max_length = 1 << 63
end
end
def get_token(*keys)
keys.each do |key|
item = @tokenizer_config[key]
if !item
next
end
if item.is_a?(Hash)
if item["__type"] == "AddedToken"
return item["content"]
else
raise Error, "Unknown token: #{item}"
end
else
return item
end
end
nil
end
def call(
text,
text_pair: nil,
add_special_tokens: true,
padding: false,
truncation: nil,
max_length: nil,
return_tensor: true,
return_token_type_ids: true, # TODO change default
return_offsets: false
)
is_batched = text.is_a?(Array)
if is_batched
if text.length == 0
raise Error, "text array must be non-empty"
end
if !text_pair.nil?
if !text_pair.is_a?(Array)
raise Error, "text_pair must also be an array"
elsif text.length != text_pair.length
raise Error, "text and text_pair must have the same length"
end
end
end
if padding
@tokenizer.enable_padding
else
@tokenizer.no_padding
end
if truncation
@tokenizer.enable_truncation(max_length || @model_max_length)
else
@tokenizer.no_truncation
end
if is_batched
input = text_pair ? text.zip(text_pair) : text
encoded = @tokenizer.encode_batch(input, add_special_tokens: add_special_tokens)
else
encoded = [@tokenizer.encode(text, text_pair, add_special_tokens: add_special_tokens)]
end
result = {input_ids: encoded.map(&:ids), attention_mask: encoded.map(&:attention_mask)}
if return_token_type_ids
result[:token_type_ids] = encoded.map(&:type_ids)
end
if return_offsets
result[:offsets] = encoded.map(&:offsets)
end
result
end
def decode(tokens, skip_special_tokens:)
@tokenizer.decode(tokens, skip_special_tokens: skip_special_tokens)
end
def convert_tokens_to_string(tokens)
@tokenizer.decoder.decode(tokens)
end
def convert_tokens_to_ids(tokens)
tokens.map { |t| @tokenizer.token_to_id(t) }
end
def id_to_token(id)
@tokenizer.id_to_token(id)
end
def batch_decode(batch, **decode_args)
@tokenizer.decode_batch(batch, **decode_args)
end
def padding_side=(side)
@tokenizer.enable_padding(direction: side)
end
end
class BertTokenizer < PreTrainedTokenizer
# TODO
# self.return_token_type_ids = true
end
class DebertaV2Tokenizer < PreTrainedTokenizer
# TODO
# self.return_token_type_ids = true
end
class DistilBertTokenizer < PreTrainedTokenizer
end
class T5Tokenizer < PreTrainedTokenizer
end
class GPT2Tokenizer < PreTrainedTokenizer
# _default_chat_template = `{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}`
end
class BartTokenizer < PreTrainedTokenizer
end
class RobertaTokenizer < PreTrainedTokenizer
end
class XLMRobertaTokenizer < PreTrainedTokenizer
end
class MPNetTokenizer < PreTrainedTokenizer
end
class CLIPTokenizer < PreTrainedTokenizer
end
class NllbTokenizer < PreTrainedTokenizer
attr_reader :language_regex, :language_codes, :lang_to_token
def initialize(tokenizer_json, tokenizer_config)
super(tokenizer_json, tokenizer_config)
@language_regex = /^[a-z]{3}_[A-Z][a-z]{3}$/
@language_codes = @special_tokens.filter { |x| @language_regex.match?(x) }
@lang_to_token = ->(x) { x } # Identity function
end
def _build_translation_inputs(raw_inputs, tokenizer_options, generate_kwargs)
Utils._build_translation_inputs(self, raw_inputs, tokenizer_options, generate_kwargs)
end
end
class M2M100Tokenizer < PreTrainedTokenizer
attr_reader :language_regex, :language_codes, :lang_to_token
def initialize(tokenizer_json, tokenizer_config)
super(tokenizer_json, tokenizer_config)
@language_regex = /^__[a-z]{2,3}__$/
@language_codes = @special_tokens
.filter { |x| @language_regex.match?(x) }
.map { |x| x[2...-2] } # strip the surrounding "__" (String#slice with a negative length returns nil)
@lang_to_token = ->(x) { "__#{x}__" }
end
def _build_translation_inputs(raw_inputs, tokenizer_options, generate_kwargs)
Utils._build_translation_inputs(self, raw_inputs, tokenizer_options, generate_kwargs)
end
end
module Utils
def self._build_translation_inputs(slf, raw_inputs, tokenizer_options, generate_kwargs)
if !slf.respond_to?(:language_codes) || !slf.language_codes.is_a?(Array)
raise Error, "Tokenizer must have `language_codes` attribute set and it should be an array of language ids."
end
if !slf.respond_to?(:language_regex) || !slf.language_regex.is_a?(Regexp)
raise Error, "Tokenizer must have `language_regex` attribute set and it should be a regular expression."
end
if !slf.respond_to?(:lang_to_token) || !slf.lang_to_token.respond_to?(:call)
raise Error, "Tokenizer must have `lang_to_token` attribute set and it should be a function."
end
src_lang_token = generate_kwargs[:src_lang]
tgt_lang_token = generate_kwargs[:tgt_lang]
if !slf.language_codes.include?(tgt_lang_token)
raise Error, "Target language code #{tgt_lang_token.inspect} is not valid. Must be one of: #{slf.language_codes.join(", ")}"
end
if !src_lang_token.nil?
# Check that the source language is valid:
if !slf.language_codes.include?(src_lang_token)
raise Error, "Source language code #{src_lang_token.inspect} is not valid. Must be one of: #{slf.language_codes.join(", ")}"
end
end
# Override the `forced_bos_token_id` to force the correct language
generate_kwargs["forced_bos_token_id"] = slf.convert_tokens_to_ids([slf.lang_to_token.(tgt_lang_token)])[0]
slf.(raw_inputs, **tokenizer_options)
end
end
class SpeechT5Tokenizer < PreTrainedTokenizer
end
class AutoTokenizer
TOKENIZER_CLASS_MAPPING = {
"T5Tokenizer" => T5Tokenizer,
"BertTokenizer" => BertTokenizer,
"DebertaV2Tokenizer" => DebertaV2Tokenizer,
"DistilBertTokenizer" => DistilBertTokenizer,
"BartTokenizer" => BartTokenizer,
"RobertaTokenizer" => RobertaTokenizer,
"XLMRobertaTokenizer" => XLMRobertaTokenizer,
"MPNetTokenizer" => MPNetTokenizer,
"CLIPTokenizer" => CLIPTokenizer,
"GPT2Tokenizer" => GPT2Tokenizer,
"NllbTokenizer" => NllbTokenizer,
"M2M100Tokenizer" => M2M100Tokenizer,
"SpeechT5Tokenizer" => SpeechT5Tokenizer,
"PreTrainedTokenizer" => PreTrainedTokenizer
}
def self.from_pretrained(
pretrained_model_name_or_path,
quantized: true,
progress_callback: nil,
config: nil,
cache_dir: nil,
local_files_only: false,
revision: "main",
legacy: nil,
**kwargs
)
tokenizer_json, tokenizer_config = load_tokenizer(
pretrained_model_name_or_path,
quantized:,
progress_callback:,
config:,
cache_dir:,
local_files_only:,
revision:,
legacy:
)
# Some tokenizers are saved with the "Fast" suffix, so we remove that if present.
tokenizer_name = tokenizer_config["tokenizer_class"]&.delete_suffix("Fast") || "PreTrainedTokenizer"
cls = TOKENIZER_CLASS_MAPPING[tokenizer_name]
if !cls
warn "Unknown tokenizer class #{tokenizer_name.inspect}, attempting to construct from base class."
cls = PreTrainedTokenizer
end
cls.new(tokenizer_json, tokenizer_config)
end
def self.load_tokenizer(pretrained_model_name_or_path, **options)
info = [
Utils::Hub.get_model_file(pretrained_model_name_or_path, "tokenizer.json", true, **options),
Utils::Hub.get_model_json(pretrained_model_name_or_path, "tokenizer_config.json", true, **options)
]
# Override the legacy option if `options[:legacy]` is not nil
if !options[:legacy].nil?
info[1]["legacy"] = options[:legacy]
end
info
end
end
end
================================================
FILE: lib/informers/utils/audio.rb
================================================
module Informers
module Utils
def self.read_audio(input, sampling_rate)
data =
if input.is_a?(URI)
require "open-uri"
input.read
elsif input.is_a?(String)
File.binread(input)
else
raise ArgumentError, "Unsupported input type: #{input.class.name}"
end
ffmpeg_read(data, sampling_rate)
end
end
end
================================================
FILE: lib/informers/utils/core.rb
================================================
module Informers
module Utils
def self.dispatch_callback(progress_callback, data)
progress_callback.(data) if progress_callback
end
def self.calculate_reflect_offset(i, w)
((i + w) % (2 * w) - w).abs
end
end
end
================================================
FILE: lib/informers/utils/dtypes.rb
================================================
module Informers
module Utils
DEFAULT_DTYPE_SUFFIX_MAPPING = {
fp32: "",
fp16: "_fp16",
int8: "_int8",
uint8: "_uint8",
q8: "_quantized",
q4: "_q4",
q4f16: "_q4f16",
bnb4: "_bnb4"
}
end
end
================================================
FILE: lib/informers/utils/ffmpeg.rb
================================================
# Copyright 2021 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
module Informers
module Utils
# from the Transformers Python library
def self.ffmpeg_read(data, sampling_rate)
ar = "#{sampling_rate}"
ac = "1"
format_for_conversion = "f32le"
ffmpeg_command = [
"ffmpeg",
"-i",
"pipe:0",
"-ac",
ac,
"-ar",
ar,
"-f",
format_for_conversion,
"-hide_banner",
"-loglevel",
"quiet",
"pipe:1"
]
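begin
stdout, status = Open3.capture2(*ffmpeg_command, stdin_data: data)
rescue Errno::ENOENT
# Open3 raises Errno::ENOENT when the ffmpeg executable is missing
raise Error, "ffmpeg was not found but is required to load audio files from filename"
end
if !status.success?
raise Error, "ffmpeg failed to decode the audio data (exit status #{status.exitstatus})"
end
stdout.unpack("e*")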
end
end
end
================================================
FILE: lib/informers/utils/generation.rb
================================================
module Informers
module Utils
class GenerationConfig
def initialize(kwargs)
@config = {}
# Parameters that control the length of the output
@config["max_length"] = kwargs["max_length"] || 20
@config["max_new_tokens"] = kwargs["max_new_tokens"]
@config["min_length"] = kwargs["min_length"] || 0
@config["min_new_tokens"] = kwargs["min_new_tokens"]
@config["early_stopping"] = kwargs["early_stopping"] || false
@config["max_time"] = kwargs["max_time"]
# Parameters that control the generation strategy used
@config["do_sample"] = kwargs["do_sample"] || false
@config["num_beams"] = kwargs["num_beams"] || 1
@config["num_beam_groups"] = kwargs["num_beam_groups"] || 1
@config["penalty_alpha"] = kwargs["penalty_alpha"]
@config["use_cache"] = kwargs.fetch("use_cache", true)
# Parameters for manipulation of the model output logits
@config["temperature"] = kwargs["temperature"] || 1.0
@config["top_k"] = kwargs["top_k"] || 50
@config["top_p"] = kwargs["top_p"] || 1.0
@config["typical_p"] = kwargs["typical_p"] || 1.0
@config["epsilon_cutoff"] = kwargs["epsilon_cutoff"] || 0.0
@config["eta_cutoff"] = kwargs["eta_cutoff"] || 0.0
@config["diversity_penalty"] = kwargs["diversity_penalty"] || 0.0
@config["repetition_penalty"] = kwargs["repetition_penalty"] || 1.0
@config["encoder_repetition_penalty"] = kwargs["encoder_repetition_penalty"] || 1.0
@config["length_penalty"] = kwargs["length_penalty"] || 1.0
@config["no_repeat_ngram_size"] = kwargs["no_repeat_ngram_size"] || 0
@config["bad_words_ids"] = kwargs["bad_words_ids"]
@config["force_words_ids"] = kwargs["force_words_ids"]
@config["renormalize_logits"] = kwargs["renormalize_logits"] || false
@config["constraints"] = kwargs["constraints"]
@config["forced_bos_token_id"] = kwargs["forced_bos_token_id"]
@config["forced_eos_token_id"] = kwargs["forced_eos_token_id"]
@config["remove_invalid_values"] = kwargs["remove_invalid_values"] || false
@config["exponential_decay_length_penalty"] = kwargs["exponential_decay_length_penalty"]
@config["suppress_tokens"] = kwargs["suppress_tokens"]
@config["begin_suppress_tokens"] = kwargs["begin_suppress_tokens"]
@config["forced_decoder_ids"] = kwargs["forced_decoder_ids"]
# Parameters that define the output variables of `generate`
@config["num_return_sequences"] = kwargs["num_return_sequences"] || 1
@config["output_attentions"] = kwargs["output_attentions"] || false
@config["output_hidden_states"] = kwargs["output_hidden_states"] || false
@config["output_scores"] = kwargs["output_scores"] || false
@config["return_dict_in_generate"] = kwargs["return_dict_in_generate"] || false
# Special tokens that can be used at generation time
@config["pad_token_id"] = kwargs["pad_token_id"]
@config["bos_token_id"] = kwargs["bos_token_id"]
@config["eos_token_id"] = kwargs["eos_token_id"]
# Generation parameters exclusive to encoder-decoder models
@config["encoder_no_repeat_ngram_size"] = kwargs["encoder_no_repeat_ngram_size"] || 0
@config["decoder_start_token_id"] = kwargs["decoder_start_token_id"]
# Wild card
@generation_kwargs = kwargs["generation_kwargs"] || {}
end
def [](key)
@config[key.to_s]
end
def merge!(config)
@config.merge!(config)
end
end
class Sampler
def initialize(generation_config)
super()
@generation_config = generation_config
end
def call(logits, index = -1)
# Sample from logits, of dims [batch, sequence_length, vocab_size].
# If index is specified, sample from [batch, index, vocab_size].
sample(logits, index)
end
def get_logits(logits, index)
vocab_size = Utils.dims(logits)[-1]
logs = logits.flatten
if index == -1
logs = logs.last(vocab_size)
else
raise Todo
end
# add temperature
if @generation_config["temperature"] > 0
logs = logs.map { |x| x / @generation_config["temperature"] }
end
logs
end
def self.get_sampler(generation_config)
if generation_config[:do_sample]
MultinomialSampler.new(generation_config)
elsif generation_config[:num_beams] > 1
BeamSearchSampler.new(generation_config)
else
if generation_config[:num_return_sequences] > 1
raise Error, "num_return_sequences has to be 1 when doing greedy search, but is #{generation_config[:num_return_sequences]}."
end
GreedySampler.new(generation_config)
end
end
end
class GreedySampler < Sampler
def sample(logits, index = -1)
# NOTE: no need to do log_softmax here since we only take the maximum
logs = get_logits(logits, index)
argmax = Utils.max(logs)[1]
# Note: score is meaningless in this context, since we are performing
# greedy search (p = 1 => log(p) = 0)
[
[argmax, 0]
]
end
end
class BeamSearchSampler < Sampler
def sample(logits, index = -1)
k = Utils.dims(logits)[-1] # defaults to vocab size
if @generation_config["top_k"] > 0
k = [@generation_config["top_k"], k].min
end
# Get logits of nth token
logs = get_logits(logits, index)
# Get top k tokens
top_logits = Utils.get_top_items(logs, k)
# Compute softmax over logits
probabilities = Utils.softmax(top_logits.map { |x| x[1] })
Array.new(@generation_config["num_beams"]) do |i|
[
top_logits[i][0],
Math.log(probabilities[i])
]
end
end
end
class LogitsProcessorList
def initialize
super
@processors = []
end
def push(item)
@processors << item
end
def concat(items)
@processors.concat(items)
end
def call(input_ids, batched_logits)
# NOTE: This is different from the Python code, since vanilla Ruby does not support vectorized operations.
# As a result, we apply each processor to each item in the batch.
batched_logits.each do |logits|
# Modifies logits inplace
@processors.each do |func|
func.(input_ids, logits)
end
end
end
def to_ary
@processors
end
end
class LogitsProcessor
end
class NoRepeatNGramLogitsProcessor < LogitsProcessor
def initialize(no_repeat_ngram_size)
super()
@no_repeat_ngram_size = no_repeat_ngram_size
end
def get_ngrams(prev_input_ids)
cur_len = prev_input_ids.length
ngrams = []
j = 0
while j < cur_len + 1 - @no_repeat_ngram_size
ngram = []
@no_repeat_ngram_size.times do |k|
ngram << prev_input_ids[j + k]
end
ngrams << ngram
j += 1
end
generated_ngram = {}
ngrams.each do |ngram|
prev_ngram = ngram.slice(0, ngram.length - 1)
prev_ngram_key = JSON.generate(prev_ngram)
prev_ngram_value = generated_ngram[prev_ngram_key] || []
prev_ngram_value << ngram[ngram.length - 1]
generated_ngram[prev_ngram_key] = prev_ngram_value
end
generated_ngram
end
def get_generated_ngrams(banned_ngrams, prev_input_ids)
ngram_idx = prev_input_ids.slice(prev_input_ids.length + 1 - @no_repeat_ngram_size, prev_input_ids.length)
banned = banned_ngrams[JSON.generate(ngram_idx)] || []
banned
end
def calc_banned_ngram_tokens(prev_input_ids)
banned_tokens = []
if prev_input_ids.length + 1 < @no_repeat_ngram_size
# return no banned tokens if we haven't generated no_repeat_ngram_size tokens yet
banned_tokens
else
generated_ngrams = get_ngrams(prev_input_ids)
banned_tokens = get_generated_ngrams(generated_ngrams, prev_input_ids)
banned_tokens
end
end
def call(input_ids, logits)
banned_tokens = calc_banned_ngram_tokens(input_ids)
banned_tokens.each do |token|
logits[token] = -Float::INFINITY
end
logits
end
end
class MinLengthLogitsProcessor < LogitsProcessor
def initialize(min_length, eos_token_id)
super()
@min_length = min_length
@eos_token_id = eos_token_id.is_a?(Array) ? eos_token_id : [eos_token_id]
end
def call(input_ids, logits)
if input_ids.length < @min_length
@eos_token_id.each do |eos_token|
logits[eos_token] = -Float::INFINITY
end
end
logits
end
end
class ForcedBOSTokenLogitsProcessor < LogitsProcessor
def initialize(bos_token_id)
super()
@bos_token_id = bos_token_id
end
def call(input_ids, logits)
if input_ids.length == 1
logits.map! { -Float::INFINITY }
logits[@bos_token_id] = 0
end
logits
end
end
class ForcedEOSTokenLogitsProcessor < LogitsProcessor
def initialize(max_length, forced_eos_token_id)
super()
@max_length = max_length
@forced_eos_token_id = forced_eos_token_id
end
def call(input_ids, logits)
# TODO not implemented yet; logits are returned unchanged
logits
end
end
end
end
================================================
FILE: lib/informers/utils/hub.rb
================================================
module Informers
module Utils
module Hub
class FileResponse
attr_reader :exists, :status
def initialize(file_path)
@file_path = file_path
@exists = File.exist?(file_path)
if @exists
@status = ["200", "OK"]
else
@status = ["404", "Not Found"]
end
end
def read
File.binread(@file_path)
end
end
def self.is_valid_url(string, protocols = nil, valid_hosts = nil)
begin
url = URI.parse(string)
rescue
return false
end
if protocols && !protocols.include?(url.scheme)
return false
end
if valid_hosts && !valid_hosts.include?(url.host)
return false
end
true
end
def self.get_file(url_or_path, progress_callback = nil, progress_info = {})
if !is_valid_url(url_or_path, ["http", "https"])
raise Error, "Invalid url"
else
headers = {}
headers["User-Agent"] = "informers/#{VERSION};"
# Check whether we are making a request to the Hugging Face Hub.
is_hfurl = is_valid_url(url_or_path, ["http", "https"], ["huggingface.co", "hf.co"])
if is_hfurl
# If an access token is present in the environment variables,
# we add it to the request headers.
token = ENV["HF_TOKEN"]
if token
headers["Authorization"] = "Bearer #{token}"
end
end
options = {}
if progress_callback
total_size = nil
options[:content_length_proc] = lambda do |size|
total_size = size
Utils.dispatch_callback(progress_callback, {status: "download"}.merge(progress_info).merge(total_size: size))
end
options[:progress_proc] = lambda do |size|
Utils.dispatch_callback(progress_callback, {status: "progress"}.merge(progress_info).merge(size: size, total_size: total_size))
end
end
URI.parse(url_or_path).open(**headers, **options)
end
end
class FileCache
attr_reader :path
def initialize(path)
@path = path
end
def match(request)
file_path = resolve_path(request)
file = FileResponse.new(file_path)
file if file.exists
end
def put(request, response)
output_path = resolve_path(request)
begin
tmp_path = "#{output_path}.incomplete"
FileUtils.mkdir_p(File.dirname(output_path))
File.open(tmp_path, "wb") do |f|
while !response.eof?
f.write(response.read(1024 * 1024))
end
end
FileUtils.move(tmp_path, output_path)
rescue => e
warn "An error occurred while writing the file to cache: #{e}"
end
end
def resolve_path(request)
File.join(@path, request)
end
end
def self.try_cache(cache, *names)
names.each do |name|
begin
result = cache.match(name)
return result if result
rescue
next
end
end
nil
end
def self.get_model_file(path_or_repo_id, filename, fatal = true, **options)
# Initiate file retrieval
Utils.dispatch_callback(options[:progress_callback], {
status: "initiate",
name: path_or_repo_id,
file: filename
})
# If `cache_dir` is not specified, use the default cache directory
cache = FileCache.new(options[:cache_dir] || Informers.cache_dir)
revision = options[:revision] || "main"
request_url = path_join(path_or_repo_id, filename)
remote_url = path_join(
Informers.remote_host,
Informers.remote_path_template
.gsub("{model}", path_or_repo_id)
.gsub("{revision}", URI.encode_www_form_component(revision)),
filename
)
# Choose cache key for filesystem cache
# When using the main revision (default), we use the request URL as the cache key.
# If a specific revision is requested, we account for this in the cache key.
fs_cache_key = revision == "main" ? request_url : path_join(path_or_repo_id, revision, filename)
proposed_cache_key = fs_cache_key
resolved_path = cache.resolve_path(proposed_cache_key)
# Whether to cache the final response in the end.
to_cache_response = false
# A caching system is available, so we try to get the file from it.
response = try_cache(cache, proposed_cache_key)
cache_hit = !response.nil?
if response.nil?
# File is not cached, so we perform the request
if response.nil? || response.status[0] == "404"
# File not found locally. This means either:
# - The user has disabled local file access (`Informers.allow_local_models = false`)
# - the path is a valid HTTP url (`response.nil?`)
# - the path is not a valid HTTP url and the file is not present on the file system or local server (`response.status[0] == "404"`)
if options[:local_files_only] || !Informers.allow_remote_models
# User requested local files only, but the file is not found locally.
if fatal
raise Error, "`local_files_only: true` or `Informers.allow_remote_models = false` and file was not found locally at #{resolved_path.inspect}."
else
# File not found, but this file is optional.
# TODO in future, cache the response?
return nil
end
end
progress_info = {
name: path_or_repo_id,
file: filename
}
# File not found locally, so we try to download it from the remote server
response = get_file(remote_url, options[:progress_callback], progress_info)
if response.status[0] != "200"
# should not happen
raise Todo
end
# Success! We use the proposed cache key from earlier
cache_key = proposed_cache_key
end
to_cache_response = cache && !response.is_a?(FileResponse) && response.status[0] == "200"
end
if to_cache_response && cache_key && cache.match(cache_key).nil?
cache.put(cache_key, response)
end
Utils.dispatch_callback(options[:progress_callback], {
status: "done",
name: path_or_repo_id,
file: filename,
cache_hit: cache_hit
})
resolved_path
end
def self.get_model_json(model_path, file_name, fatal = true, **options)
buffer = get_model_file(model_path, file_name, fatal, **options)
if buffer.nil?
# Return empty object
return {}
end
JSON.load_file(buffer)
end
def self.path_join(*parts)
parts = parts.map.with_index do |part, index|
if index != 0
part = part.delete_prefix("/")
end
if index != parts.length - 1
part = part.delete_suffix("/")
end
part
end
parts.join("/")
end
def self.display_progress(filename, width, size, expected_size)
bar_width = [width - (filename.length + 3), 1].max
progress = expected_size && expected_size > 0 ? size / expected_size.to_f : 0
done = (progress * bar_width).round
not_done = bar_width - done
"#{filename} |#{"█" * done}#{" " * not_done}|"
end
end
end
end
================================================
FILE: lib/informers/utils/image.rb
================================================
module Informers
module Utils
class RawImage
RESAMPLING_MAPPING = {
0 => "nearest",
1 => "lanczos",
2 => "bilinear",
3 => "bicubic",
4 => "box",
5 => "hamming"
}
attr_reader :image, :width, :height, :channels
def initialize(image)
@image = image
@width = image.width
@height = image.height
@channels = image.bands
end
def data
@image.write_to_memory.unpack("C*")
end
def size
[@width, @height]
end
def resize(width, height, resample: 2)
resample_method = RESAMPLING_MAPPING[resample] || resample
case resample_method
when "bilinear", "bicubic"
img =
@image.affine(
[width / @width.to_f, 0, 0, height / @height.to_f],
interpolate: Vips::Interpolate.new(resample_method.to_sym)
)
else
raise Todo
end
RawImage.new(img)
end
def center_crop(crop_width, crop_height)
# If the image is already the desired size, return it
if @width == crop_width && @height == crop_height
return self
end
# Determine bounds of the image in the new canvas
width_offset = (@width - crop_width) / 2.0
height_offset = (@height - crop_height) / 2.0
if width_offset >= 0 && height_offset >= 0
# Cropped image lies entirely within the original image
img = @image.crop(
width_offset.floor,
height_offset.floor,
crop_width,
crop_height
)
elsif width_offset <= 0 && height_offset <= 0
raise Todo
else
raise Todo
end
RawImage.new(img)
end
def rgb
if @channels == 3
return self
end
raise Todo
end
def save(path)
@image.write_to_file(path)
end
def self.read(input)
if input.is_a?(RawImage)
input
elsif input.is_a?(URI)
require "open-uri"
RawImage.new(Vips::Image.new_from_buffer(input.read, ""))
elsif input.is_a?(String)
RawImage.new(Vips::Image.new_from_file(input))
else
raise ArgumentError, "Unsupported input type: #{input.class.name}"
end
end
def self.from_array(input)
c, h, w = Utils.dims(input)
pixel_data = Array.new(w * h * c)
input.each_with_index do |cv, ci|
cv.each_with_index do |hv, hi|
hv.each_with_index do |v, wi|
pixel_data[(hi * w * c) + (wi * c) + ci] = v
end
end
end
RawImage.new(Vips::Image.new_from_memory_copy(pixel_data.pack("C*"), w, h, c, :uchar))
end
end
end
end
================================================
FILE: lib/informers/utils/math.rb
================================================
module Informers
module Utils
def self.interpolate_data(input, in_shape, out_shape, mode = "bilinear", align_corners = false)
in_channels, in_height, in_width = in_shape
out_height, out_width = out_shape
# TODO use mode and align_corners
# Output image dimensions
x_scale = out_width / in_width.to_f
y_scale = out_height / in_height.to_f
# Output image
out_img = Array.new(out_height * out_width * in_channels)
# Pre-calculate strides
in_stride = in_height * in_width
out_stride = out_height * out_width
out_height.times do |i|
out_width.times do |j|
# Calculate output offset
out_offset = i * out_width + j
# Calculate input pixel coordinates
x = (j + 0.5) / x_scale - 0.5
y = (i + 0.5) / y_scale - 0.5
# Calculate the four nearest input pixels
# We also check if the input pixel coordinates are within the image bounds
x1 = x.floor
y1 = y.floor
x2 = [x1 + 1, in_width - 1].min
y2 = [y1 + 1, in_height - 1].min
x1 = [x1, 0].max
y1 = [y1, 0].max
# Calculate the fractional distances between the input pixel and the four nearest pixels
s = x - x1
t = y - y1
# Perform bilinear interpolation
w1 = (1 - s) * (1 - t)
w2 = s * (1 - t)
w3 = (1 - s) * t
w4 = s * t
# Calculate the four nearest input pixel indices
y_stride = y1 * in_width
x_stride = y2 * in_width
idx1 = y_stride + x1
idx2 = y_stride + x2
idx3 = x_stride + x1
idx4 = x_stride + x2
in_channels.times do |k|
# Calculate channel offset
c_offset = k * in_stride
out_img[k * out_stride + out_offset] =
w1 * input[c_offset + idx1] +
w2 * input[c_offset + idx2] +
w3 * input[c_offset + idx3] +
w4 * input[c_offset + idx4]
end
end
end
out_img
end
def self.softmax(arr)
# Compute the maximum value in the array
max_val = arr.max
# Compute the exponentials of the array values
exps = arr.map { |x| Math.exp(x - max_val) }
# Compute the sum of the exponentials
sum_exps = exps.sum
# Compute the softmax values
softmax_arr = exps.map { |x| x / sum_exps }
softmax_arr
end
def self.sigmoid(arr)
if arr[0].is_a?(Array)
return arr.map { |a| sigmoid(a) }
end
arr.map { |v| 1 / (1 + Math.exp(-v)) }
end
def self.get_top_items(items, top_k = 0)
# if top_k is nil or 0, return all items
items = items
.map.with_index { |x, i| [i, x] } # Get indices ([index, score])
.sort_by { |v| -v[1] } # Sort by log probabilities
if !top_k.nil? && top_k > 0
items = items.slice(0, top_k) # Get top k items
end
items
end
def self.max(arr)
if arr.length == 0
raise Error, "Array must not be empty"
end
arr.map.with_index.max_by { |v, _| v }
end
end
end
================================================
FILE: lib/informers/utils/tensor.rb
================================================
module Informers
module Utils
def self.mean_pooling(last_hidden_state, attention_mask)
last_hidden_state.zip(attention_mask).map do |state, mask|
state[0].size.times.map do |k|
sum = 0.0
count = 0
state.zip(mask) do |s, m|
count += m
sum += s[k] * m
end
sum / count
end
end
end
def self.normalize(result)
result.map do |row|
norm = Math.sqrt(row.sum { |v| v * v })
row.map { |v| v / norm }
end
end
def self.stack(tensors, dim = 0)
tensors
end
def self.ones_like(tensor)
if tensor[0].is_a?(Array)
return tensor.map { |v| ones_like(v) }
end
tensor.map { |_| 1 }
end
def self.dims(tensor)
dims = []
while tensor.is_a?(Array)
dims << tensor.size
tensor = tensor[0]
end
dims
end
def self.interpolate(input, shape, mode = "bilinear", align_corners = false)
out_height, out_width = shape
# Input image dimensions
in_channels = dims(input)[-3] || 1
in_height = dims(input)[-2]
in_width = dims(input)[-1]
output = interpolate_data(
input.flatten,
[in_channels, in_height, in_width],
[out_height, out_width],
mode,
align_corners
)
reshape(output, [in_channels, out_height, out_width])
end
def self.reshape(arr, dims)
arr = arr.flatten
dims[1..-1].reverse_each do |dim|
arr = arr.each_slice(dim)
end
arr.to_a
end
end
end
================================================
FILE: lib/informers/version.rb
================================================
module Informers
VERSION = "1.2.1"
end
================================================
FILE: lib/informers.rb
================================================
# dependencies
require "onnxruntime"
require "tokenizers"
# stdlib
require "io/console"
require "json"
require "open-uri"
require "open3"
require "stringio"
require "uri"
# modules
require_relative "informers/backends/onnx"
require_relative "informers/utils/audio"
require_relative "informers/utils/core"
require_relative "informers/utils/dtypes"
require_relative "informers/utils/generation"
require_relative "informers/utils/ffmpeg"
require_relative "informers/utils/hub"
require_relative "informers/utils/image"
require_relative "informers/utils/math"
require_relative "informers/utils/tensor"
require_relative "informers/configs"
require_relative "informers/env"
require_relative "informers/model"
require_relative "informers/models"
require_relative "informers/processors"
require_relative "informers/tokenizers"
require_relative "informers/version"
require_relative "informers/pipelines"
module Informers
class Error < StandardError; end
class Todo < Error
def message
"not implemented yet"
end
end
end
================================================
FILE: test/model_test.rb
================================================
require_relative "test_helper"
class ModelTest < Minitest::Test
# https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
def test_all_minilm
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Informers.pipeline("embedding", "sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.(sentences)
assert_elements_in_delta [0.067657, 0.063496, 0.048713], embeddings[0][..2]
assert_elements_in_delta [0.086439, 0.10276, 0.0053946], embeddings[1][..2]
end
# https://huggingface.co/Xenova/all-MiniLM-L6-v2
def test_all_minilm_xenova
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Informers.pipeline("embedding", "Xenova/all-MiniLM-L6-v2", dtype: "q8")
embeddings = model.(sentences)
assert_elements_in_delta [0.045927, 0.07328, 0.05401], embeddings[0][..2]
assert_elements_in_delta [0.081881, 0.1076, -0.01324], embeddings[1][..2]
end
# https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1
def test_multi_qa_minilm
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Informers.pipeline("embedding", "sentence-transformers/multi-qa-MiniLM-L6-cos-v1")
query_embedding = model.(query)
doc_embeddings = model.(docs)
scores = doc_embeddings.map { |e| e.zip(query_embedding).sum { |d, q| d * q } }
doc_score_pairs = docs.zip(scores).sort_by { |d, s| -s }
assert_equal "Around 9 Million people live in London", doc_score_pairs[0][0]
assert_in_delta 0.9156, doc_score_pairs[0][1]
assert_equal "London is known for its financial district", doc_score_pairs[1][0]
assert_in_delta 0.4948, doc_score_pairs[1][1]
end
# https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2
def test_paraphrase_minilm
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Informers.pipeline("embedding", "sentence-transformers/paraphrase-MiniLM-L6-v2")
embeddings = model.(sentences, normalize: false)
assert_elements_in_delta [0.067359, 0.783935, 0.270018], embeddings[0][..2]
assert_elements_in_delta [0.122117, 0.670228, 0.317166], embeddings[1][..2]
end
# https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1
def test_mxbai_embed
query_prefix = "Represent this sentence for searching relevant passages: "
input = [
"The dog is barking",
"The cat is purring",
query_prefix + "puppy"
]
model = Informers.pipeline("embedding", "mixedbread-ai/mxbai-embed-large-v1")
embeddings = model.(input, pooling: "cls", normalize: false)
assert_elements_in_delta [-0.61227727, 1.4060247, -0.04079155], embeddings[1][..2]
assert_elements_in_delta [-0.00624076, 0.12864432, 0.5248165], embeddings[-1][..2]
end
# https://huggingface.co/Supabase/gte-small
def test_gte_small
sentences = ["That is a happy person", "That is a very happy person"]
model = Informers.pipeline("embedding", "Supabase/gte-small")
embeddings = model.(sentences)
assert_elements_in_delta [-0.05316979, 0.01044252, 0.06194701], embeddings[0][..2]
assert_elements_in_delta [-0.05246907, 0.03752426, 0.07344585], embeddings[-1][..2]
end
# https://huggingface.co/intfloat/e5-base-v2
def test_e5_base
doc_prefix = "passage: "
query_prefix = "query: "
input = [
doc_prefix + "Ruby is a programming language created by Matz",
query_prefix + "Ruby creator"
]
model = Informers.pipeline("embedding", "intfloat/e5-base-v2")
embeddings = model.(input)
assert_elements_in_delta [-0.00596662, -0.03730119, -0.0703470], embeddings[0][..2]
assert_elements_in_delta [0.00298353, -0.04421991, -0.0591884], embeddings[-1][..2]
end
# https://huggingface.co/nomic-ai/nomic-embed-text-v1
def test_nomic_embed
doc_prefix = "search_document: "
query_prefix = "search_query: "
input = [
doc_prefix + "The dog is barking",
query_prefix + "puppy"
]
model = Informers.pipeline("embedding", "nomic-ai/nomic-embed-text-v1")
embeddings = model.(input)
assert_elements_in_delta [-0.00645858, 0.01145126, 0.0099767], embeddings[0][..2]
assert_elements_in_delta [-0.01173127, 0.04957652, -0.0176401], embeddings[-1][..2]
end
# https://huggingface.co/BAAI/bge-base-en-v1.5
def test_bge_base
query_prefix = "Represent this sentence for searching relevant passages: "
input = [
"The dog is barking",
"The cat is purring",
query_prefix + "puppy"
]
model = Informers.pipeline("embedding", "BAAI/bge-base-en-v1.5")
embeddings = model.(input)
assert_elements_in_delta [-0.07482512, -0.0770234, 0.03398684], embeddings[1][..2]
assert_elements_in_delta [0.00029264, -0.0619305, -0.06199387], embeddings[-1][..2]
end
# https://huggingface.co/jinaai/jina-embeddings-v2-base-en
def test_jina_embeddings
sentences = ["How is the weather today?", "What is the current weather like today?"]
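# "../model" resolves relative to the default onnx/ directory, so this loads model.onnx from the repository root
model = Informers.pipeline("embedding", "jinaai/jina-embeddings-v2-base-en", model_file_name: "../model")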
embeddings = model.(sentences)
assert_elements_in_delta [-0.02488641, -0.0429398, 0.04303398], embeddings[0][..2]
assert_elements_in_delta [-0.0081194, -0.06225249, 0.03116853], embeddings[1][..2]
end
# https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5
def test_snowflake_arctic_embed
query_prefix = "Represent this sentence for searching relevant passages: "
input = [
"The dog is barking",
"The cat is purring",
query_prefix + "puppy"
]
model = Informers.pipeline("embedding", "Snowflake/snowflake-arctic-embed-m-v1.5")
# use the model's own sentence_embedding output without further pooling
embeddings = model.(input, model_output: "sentence_embedding", pooling: "none")
assert_elements_in_delta [0.03239886, 0.0009998, 0.08401278], embeddings[0][..2]
assert_elements_in_delta [-0.02530634, -0.02715422, 0.01218867], embeddings[-1][..2]
# CLS pooling over token_embeddings is expected to give the same values
embeddings = model.(input, model_output: "token_embeddings", pooling: "cls")
assert_elements_in_delta [0.03239886, 0.0009998, 0.08401278], embeddings[0][..2]
assert_elements_in_delta [-0.02530634, -0.02715422, 0.01218867], embeddings[-1][..2]
end
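# Sketch, not part of the original suite: roughly what pooling: "cls"
# followed by normalization amounts to for a single input. The helper names
# cls_pool and l2_normalize are hypothetical, and this is an assumption
# about the technique, not a copy of the pipeline's implementation in
# lib/informers/pipelines.rb, which may differ.
def cls_pool(token_embeddings)
  # the CLS token is the first token, so take the first row
  token_embeddings.first
end

def l2_normalize(vector)
  norm = Math.sqrt(vector.sum { |v| v * v })
  # leave an all-zero vector unchanged to avoid dividing by zero
  norm.zero? ? vector : vector.map { |v| v / norm }
end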
# https://huggingface.co/sentence-transformers/all-mpnet-base-v2
def test_all_mpnet
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Informers.pipeline("embedding", "sentence-transformers/all-mpnet-base-v2")
embeddings = model.(sentences)
assert_elements_in_delta [0.02250263, -0.07829167, -0.02303071], embeddings[0][..2]
assert_elements_in_delta [0.04170236, 0.00109747, -0.01553415], embeddings[1][..2]
end
# https://huggingface.co/BAAI/bge-m3
def test_bge_m3
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Informers.pipeline("embedding", "BAAI/bge-m3")
# smoke test: no assertions, only checks that the call completes without raising
model.(sentences, model_output: "token_embeddings")
end
# https://huggingface.co/mixedbread-ai/mxbai-rerank-base-v1
def test_mxbai_rerank
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Informers.pipeline("reranking", "mixedbread-ai/mxbai-rerank-base-v1")
result = model.(query, docs, return_documents: true)
assert_equal 0, result[0][:doc_id]
assert_in_delta 0.984, result[0][:score]
assert_equal docs[0], result[0][:text]
assert_equal 1, result[1][:doc_id]
assert_in_delta 0.139, result[1][:score]
assert_equal docs[1], result[1][:text]
end
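# Sketch, not part of the original suite: one way per-document relevance
# scores could be assembled into the result shape asserted above (:doc_id,
# :score and, when documents are returned, :text, sorted by descending
# score). The helper name rank_documents is hypothetical, and the scores are
# assumed to be already computed; how the pipeline derives them is not shown.
def rank_documents(scores, docs = nil)
  results = scores.each_with_index.map { |score, i| {doc_id: i, score: score} }
  # attach the original text only when documents were supplied
  results.each { |r| r[:text] = docs[r[:doc_id]] } if docs
  results.sort_by { |r| -r[:score] }
end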
# https://huggingface.co/jinaai/jina-reranker-v1-turbo-en
def test_jina_reranker
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Informers.pipeline("reranking", "jinaai/jina-reranker-v1-turbo-en")
result = model.(query, docs, return_documents: true)
assert_equal 0, result[0][:doc_id]
assert_in_delta 0.912, result[0][:score]
assert_equal docs[0], result[0][:text]
assert_equal 1, result[1][:doc_id]
assert_in_delta 0.0555, result[1][:score]
assert_equal docs[1], result[1][:text]
end
# https://huggingface.co/BAAI/bge-reranker-base
def test_bge_reranker
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Informers.pipeline("reranking", "BAAI/bge-reranker-base")
result = model.(query, docs, return_documents: true)
assert_equal 0, result[0][:doc_id]
assert_in_delta 0.996, result[0][:score]
assert_equal docs[0], result[0][:text]
assert_equal 1, result[1][:doc_id]
assert_in_delta 0.000158, result[1][:score], 0.000001
assert_equal docs[1], result[1][:text]
end
# https://huggingface.co/Xenova/ms-marco-MiniLM-L-6-v2
def test_ms_marco_minilm
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Informers.pipeline("reranking", "Xenova/ms-marco-MiniLM-L-6-v2")
result = model.(query, docs, return_documents: true)
assert_equal 0, result[0][:doc_id]
assert_in_delta 1, result[0][:score]
assert_equal docs[0], result[0][:text]
assert_equal 1, result[1][:doc_id]
assert_in_delta 0.0067, result[1][:score]
assert_equal docs[1], result[1][:text]
end
end
================================================
FILE: test/pipeline_test.rb
================================================
require_relative "test_helper"
class PipelineTest < Minitest::Test
def test_ner
ner = Informers.pipeline("ner")
result = ner.("Ruby is a programming language created by Matz")
assert_equal 1, result.size
assert_equal "PER", result[0][:entity_group]
assert_in_delta 0.994, result[0][:score]
assert_equal "Matz", result[0][:word]
assert_equal 42, result[0][:start]
assert_equal 46, result[0][:end]
end
def test_ner_aggregation_strategy
ner = Informers.pipeline("ner")
result = ner.("Ruby is a programming language created by Matz", aggregation_strategy: "none")
assert_equal 2, result.size
assert_equal "B-PER", result[0][:entity]
assert_in_delta 0.996, result[0][:score]
assert_equal 8, result[0][:index]
assert_equal "Mat", result[0][:word]
assert_equal 42, result[0][:start]
assert_equal 45, result[0][:end]
end
def test_sentiment_analysis
classifier = Informers.pipeline("sentiment-analysis")
result = classifier.("I love transformers!")
assert_equal "POSITIVE", result[:label]
assert_in_delta 0.9997887, result[:score], 0.0000001
result = classifier.("This is super cool")
assert_equal "POSITIVE", result[:label]
assert_in_delta 0.9998608, result[:score], 0.0000001
result = classifier.(["This is super cool", "I didn't like it"])
assert_equal "POSITIVE", result[0][:label]
assert_in_delta 0.9998600, result[0][:score], 0.0000001
assert_equal "NEGATIVE", result[1][:label]
assert_in_delta 0.9985375, result[1][:score], 0.0000001
end
def test_question_answering
qa = Informers.pipeline("question-answering")
result = qa.("Who invented Ruby?", "Ruby is a programming language created by Matz")
assert_in_delta 0.998, result[:score]
assert_equal "Matz", result[:answer]
assert_equal 42, result[:start]
assert_equal 46, result[:end]
end
def test_zero_shot_classification
classifier = Informers.pipeline("zero-shot-classification")
text = "Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app."
labels = ["mobile", "billing", "website", "account access"]
result = classifier.(text, labels)
assert_equal text, result[:sequence]
assert_equal ["mobile", "billing", "account access", "website"], result[:labels]
assert_elements_in_delta [0.633, 0.134, 0.121, 0.111], result[:scores]
end
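# Sketch, not part of the original suite: the scores asserted above sum to
# roughly 1, which is consistent with a softmax over one raw score per
# candidate label. That reading is an assumption, and the helper name
# softmax is hypothetical; see lib/informers/pipelines.rb for the pipeline's
# actual logic.
def softmax(values)
  # subtract the maximum before exponentiating for numerical stability
  max = values.max
  exps = values.map { |v| Math.exp(v - max) }
  total = exps.sum
  exps.map { |e| e / total }
end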
def test_text2text_generation
text2text = Informers.pipeline("text2text-generation")
result = text2text.("translate from English to French: I'm very happy")
assert_equal "Je suis très heureux.", result[0][:generated_text]
end
def test_translation
translator = Informers.pipeline("translation", "Xenova/nllb-200-distilled-600M")
result = translator.("जीवन एक चॉकलेट बॉक्स की तरह है।", src_lang: "hin_Deva", tgt_lang: "fra_Latn")
assert_equal "La vie est comme une boîte à chocolat.", result[0][:translation_text]
end
def test_text_generation
generator = Informers.pipeline("text-generation")
result = generator.("I enjoy walking with my cute dog,")
assert_equal "I enjoy walking with my cute dog, but I'm not sure if I'll ever be able to", result[0][:generated_text]
SYMBOL INDEX (497 symbols across 22 files)
FILE: lib/informers.rb
type Informers (line 33) | module Informers
class Error (line 34) | class Error < StandardError; end
class Todo (line 36) | class Todo < Error
method message (line 37) | def message
FILE: lib/informers/backends/onnx.rb
type Informers (line 1) | module Informers
type Backends (line 2) | module Backends
type Onnx (line 3) | module Onnx
function device_to_execution_providers (line 4) | def self.device_to_execution_providers(device)
FILE: lib/informers/configs.rb
type Informers (line 1) | module Informers
class PretrainedConfig (line 2) | class PretrainedConfig
method initialize (line 3) | def initialize(config_json)
method [] (line 7) | def [](key)
method []= (line 11) | def []=(key, value)
method to_h (line 15) | def to_h
method from_pretrained (line 19) | def self.from_pretrained(
method load_config (line 39) | def self.load_config(pretrained_model_name_or_path, **options)
class AutoConfig (line 45) | class AutoConfig
method from_pretrained (line 46) | def self.from_pretrained(...)
FILE: lib/informers/env.rb
type Informers (line 1) | module Informers
FILE: lib/informers/model.rb
type Informers (line 1) | module Informers
class Model (line 3) | class Model
method initialize (line 4) | def initialize(model_id, quantized: false)
method embed (line 9) | def embed(texts)
FILE: lib/informers/models.rb
type Informers (line 1) | module Informers
class PretrainedMixin (line 16) | class PretrainedMixin
method from_pretrained (line 17) | def self.from_pretrained(
class PreTrainedModel (line 69) | class PreTrainedModel
method initialize (line 74) | def initialize(config, session)
method from_pretrained (line 110) | def self.from_pretrained(
method construct_session (line 178) | def self.construct_session(pretrained_model_name_or_path, file_name,...
method call (line 210) | def call(model_inputs, **kwargs)
method generate (line 214) | def generate(inputs, generation_config = nil, logits_processor = nil...
method get_logits_processor (line 353) | def get_logits_processor(
method get_generation_config (line 410) | def get_generation_config(generation_config)
method seq2seq_forward (line 429) | def seq2seq_forward(model_inputs)
method prepare_position_ids (line 465) | def prepare_position_ids(session, feeds, use_cache_branch)
method get_past_key_values (line 473) | def get_past_key_values(decoder_results, past_key_values)
method get_attentions (line 493) | def get_attentions(decoder_results)
method add_past_key_values (line 509) | def add_past_key_values(decoder_feeds, past_key_values)
method seq2seq_start_beams (line 541) | def seq2seq_start_beams(input_token_ids, generation_config, num_outp...
method prepare_attention_mask (line 587) | def prepare_attention_mask(tokens)
method seq2seq_run_beam (line 605) | def seq2seq_run_beam(beam)
method seq2seq_update_beam (line 636) | def seq2seq_update_beam(beam, new_token_id)
method group_beams (line 640) | def group_beams(beams)
method encoder_forward (line 653) | def encoder_forward(model_inputs, output_names: nil)
method decoder_forward (line 665) | def decoder_forward(model_inputs)
method decoder_start_beams (line 691) | def decoder_start_beams(input_token_ids, generation_config, num_outp...
method decoder_run_beam (line 730) | def decoder_run_beam(beam)
method decoder_update_beam (line 749) | def decoder_update_beam(beam, new_token_id)
method session_run (line 754) | def session_run(session, inputs, output_names: nil)
method replace_tensors (line 766) | def replace_tensors(obj)
method validate_inputs (line 771) | def validate_inputs(session, inputs)
method get_start_beams (line 775) | def get_start_beams(input_token_ids, generation_config, num_output_t...
method run_beam (line 779) | def run_beam(beam)
method update_beam (line 783) | def update_beam(beam, new_token_id)
class BertPreTrainedModel (line 788) | class BertPreTrainedModel < PreTrainedModel
class BertModel (line 791) | class BertModel < BertPreTrainedModel
class BertForMaskedLM (line 794) | class BertForMaskedLM < BertPreTrainedModel
method call (line 795) | def call(model_inputs)
class BertForSequenceClassification (line 800) | class BertForSequenceClassification < BertPreTrainedModel
method call (line 801) | def call(model_inputs)
class BertForTokenClassification (line 806) | class BertForTokenClassification < BertPreTrainedModel
method call (line 807) | def call(model_inputs)
class ModernBertPreTrainedModel (line 812) | class ModernBertPreTrainedModel < PreTrainedModel
class ModernBertModel (line 815) | class ModernBertModel < ModernBertPreTrainedModel
class ModernBertForMaskedLM (line 818) | class ModernBertForMaskedLM < ModernBertPreTrainedModel
method call (line 819) | def call(model_inputs)
class ModernBertForSequenceClassification (line 824) | class ModernBertForSequenceClassification < ModernBertPreTrainedModel
method call (line 825) | def call(model_inputs)
class ModernBertForTokenClassification (line 830) | class ModernBertForTokenClassification < ModernBertPreTrainedModel
method call (line 831) | def call(model_inputs)
class NomicBertPreTrainedModel (line 836) | class NomicBertPreTrainedModel < PreTrainedModel
class NomicBertModel (line 839) | class NomicBertModel < NomicBertPreTrainedModel
class ConvBertPreTrainedModel (line 842) | class ConvBertPreTrainedModel < PreTrainedModel
class ConvBertModel (line 845) | class ConvBertModel < ConvBertPreTrainedModel
class ElectraPreTrainedModel (line 848) | class ElectraPreTrainedModel < PreTrainedModel
class ElectraModel (line 852) | class ElectraModel < ElectraPreTrainedModel
class DebertaV2PreTrainedModel (line 855) | class DebertaV2PreTrainedModel < PreTrainedModel
class DebertaV2Model (line 858) | class DebertaV2Model < DebertaV2PreTrainedModel
class DistilBertPreTrainedModel (line 861) | class DistilBertPreTrainedModel < PreTrainedModel
class DistilBertModel (line 864) | class DistilBertModel < DistilBertPreTrainedModel
class DistilBertForSequenceClassification (line 867) | class DistilBertForSequenceClassification < DistilBertPreTrainedModel
method call (line 868) | def call(model_inputs)
class DistilBertForQuestionAnswering (line 873) | class DistilBertForQuestionAnswering < DistilBertPreTrainedModel
method call (line 874) | def call(model_inputs)
class MPNetPreTrainedModel (line 879) | class MPNetPreTrainedModel < PreTrainedModel
class MPNetModel (line 882) | class MPNetModel < MPNetPreTrainedModel
class T5PreTrainedModel (line 885) | class T5PreTrainedModel < PreTrainedModel
class T5Model (line 888) | class T5Model < T5PreTrainedModel
class T5ForConditionalGeneration (line 891) | class T5ForConditionalGeneration < T5PreTrainedModel
method initialize (line 892) | def initialize(config, session, decoder_merged_session, generation_c...
class BartPretrainedModel (line 907) | class BartPretrainedModel < PreTrainedModel
class BartModel (line 910) | class BartModel < BartPretrainedModel
class BartForConditionalGeneration (line 913) | class BartForConditionalGeneration < BartPretrainedModel
method initialize (line 914) | def initialize(config, session, decoder_merged_session, generation_c...
class BartForSequenceClassification (line 929) | class BartForSequenceClassification < BartPretrainedModel
method call (line 930) | def call(model_inputs)
class MBartPreTrainedModel (line 935) | class MBartPreTrainedModel < PreTrainedModel
class MBartModel (line 938) | class MBartModel < MBartPreTrainedModel
class MBartForCausalLM (line 941) | class MBartForCausalLM < MBartPreTrainedModel
method initialize (line 945) | def initialize(config, decoder_merged_session, generation_config)
class M2M100PreTrainedModel (line 959) | class M2M100PreTrainedModel < PreTrainedModel
class M2M100Model (line 962) | class M2M100Model < M2M100PreTrainedModel
class M2M100ForConditionalGeneration (line 965) | class M2M100ForConditionalGeneration < M2M100PreTrainedModel
method initialize (line 966) | def initialize(config, session, decoder_merged_session, generation_c...
class Wav2Vec2PreTrainedModel (line 981) | class Wav2Vec2PreTrainedModel < PreTrainedModel
class Wav2Vec2Model (line 984) | class Wav2Vec2Model < Wav2Vec2PreTrainedModel
class Wav2Vec2ForSequenceClassification (line 987) | class Wav2Vec2ForSequenceClassification < Wav2Vec2PreTrainedModel
method call (line 988) | def call(model_inputs)
class RobertaPreTrainedModel (line 993) | class RobertaPreTrainedModel < PreTrainedModel
class RobertaModel (line 996) | class RobertaModel < RobertaPreTrainedModel
class RobertaForMaskedLM (line 999) | class RobertaForMaskedLM < RobertaPreTrainedModel
method call (line 1000) | def call(model_inputs)
class RobertaForTokenClassification (line 1005) | class RobertaForTokenClassification < RobertaPreTrainedModel
method call (line 1006) | def call(model_inputs)
class RobertaForSequenceClassification (line 1011) | class RobertaForSequenceClassification < RobertaPreTrainedModel
method call (line 1012) | def call(model_inputs)
class XLMRobertaPreTrainedModel (line 1017) | class XLMRobertaPreTrainedModel < PreTrainedModel
class XLMRobertaModel (line 1020) | class XLMRobertaModel < XLMRobertaPreTrainedModel
class XLMRobertaForSequenceClassification (line 1023) | class XLMRobertaForSequenceClassification < XLMRobertaPreTrainedModel
method call (line 1024) | def call(model_inputs)
class ViTPreTrainedModel (line 1029) | class ViTPreTrainedModel < PreTrainedModel
class ViTModel (line 1032) | class ViTModel < ViTPreTrainedModel
class ViTForImageClassification (line 1035) | class ViTForImageClassification < ViTPreTrainedModel
method call (line 1036) | def call(model_inputs)
class CLIPPreTrainedModel (line 1041) | class CLIPPreTrainedModel < PreTrainedModel
class CLIPModel (line 1044) | class CLIPModel < CLIPPreTrainedModel
class GPT2PreTrainedModel (line 1047) | class GPT2PreTrainedModel < PreTrainedModel
method initialize (line 1050) | def initialize(config, session, generation_config)
class GPT2Model (line 1063) | class GPT2Model < GPT2PreTrainedModel
class GPT2LMHeadModel (line 1066) | class GPT2LMHeadModel < GPT2PreTrainedModel
class OwlViTPreTrainedModel (line 1069) | class OwlViTPreTrainedModel < PreTrainedModel
class OwlViTModel (line 1072) | class OwlViTModel < OwlViTPreTrainedModel
class OwlViTForObjectDetection (line 1075) | class OwlViTForObjectDetection < OwlViTPreTrainedModel
class DetrPreTrainedModel (line 1078) | class DetrPreTrainedModel < PreTrainedModel
class DetrModel (line 1081) | class DetrModel < DetrPreTrainedModel
class DetrForObjectDetection (line 1084) | class DetrForObjectDetection < DetrPreTrainedModel
method call (line 1085) | def call(model_inputs)
class DetrForSegmentation (line 1090) | class DetrForSegmentation < DetrPreTrainedModel
method call (line 1091) | def call(model_inputs)
class Swin2SRPreTrainedModel (line 1096) | class Swin2SRPreTrainedModel < PreTrainedModel
class Swin2SRModel (line 1099) | class Swin2SRModel < Swin2SRPreTrainedModel
class Swin2SRForImageSuperResolution (line 1102) | class Swin2SRForImageSuperResolution < Swin2SRPreTrainedModel
class DPTPreTrainedModel (line 1105) | class DPTPreTrainedModel < PreTrainedModel
class DPTModel (line 1108) | class DPTModel < DPTPreTrainedModel
class DPTForDepthEstimation (line 1111) | class DPTForDepthEstimation < DPTPreTrainedModel
class VisionEncoderDecoderModel (line 1114) | class VisionEncoderDecoderModel < PreTrainedModel
method initialize (line 1117) | def initialize(config, session, decoder_merged_session, generation_c...
class DonutSwinPreTrainedModel (line 1161) | class DonutSwinPreTrainedModel < PreTrainedModel
class DonutSwinModel (line 1164) | class DonutSwinModel < DonutSwinPreTrainedModel
class WhisperPreTrainedModel (line 1167) | class WhisperPreTrainedModel < PreTrainedModel
class WhisperModel (line 1170) | class WhisperModel < WhisperPreTrainedModel
class WhisperForConditionalGeneration (line 1173) | class WhisperForConditionalGeneration < WhisperPreTrainedModel
method initialize (line 1177) | def initialize(config, session, decoder_merged_session, generation_c...
method generate (line 1191) | def generate(inputs, generation_config = nil, logits_processor = nil)
class VitsPreTrainedModel (line 1196) | class VitsPreTrainedModel < PreTrainedModel
class VitsModel (line 1199) | class VitsModel < VitsPreTrainedModel
method call (line 1200) | def call(model_inputs)
class SpeechT5PreTrainedModel (line 1205) | class SpeechT5PreTrainedModel < PreTrainedModel
class SpeechT5Model (line 1208) | class SpeechT5Model < SpeechT5PreTrainedModel
class SpeechT5ForSpeechToText (line 1211) | class SpeechT5ForSpeechToText < SpeechT5PreTrainedModel
class SpeechT5ForTextToSpeech (line 1214) | class SpeechT5ForTextToSpeech < SpeechT5PreTrainedModel
class ClapPreTrainedModel (line 1217) | class ClapPreTrainedModel < PreTrainedModel
class ClapModel (line 1220) | class ClapModel < ClapPreTrainedModel
class AutoModel (line 1392) | class AutoModel < PretrainedMixin
class AutoModelForSequenceClassification (line 1397) | class AutoModelForSequenceClassification < PretrainedMixin
class AutoModelForTokenClassification (line 1401) | class AutoModelForTokenClassification < PretrainedMixin
class AutoModelForSeq2SeqLM (line 1405) | class AutoModelForSeq2SeqLM < PretrainedMixin
class AutoModelForSpeechSeq2Seq (line 1409) | class AutoModelForSpeechSeq2Seq < PretrainedMixin
class AutoModelForTextToSpectrogram (line 1413) | class AutoModelForTextToSpectrogram < PretrainedMixin
class AutoModelForTextToWaveform (line 1417) | class AutoModelForTextToWaveform < PretrainedMixin
class AutoModelForCausalLM (line 1421) | class AutoModelForCausalLM < PretrainedMixin
class AutoModelForMaskedLM (line 1425) | class AutoModelForMaskedLM < PretrainedMixin
class AutoModelForQuestionAnswering (line 1429) | class AutoModelForQuestionAnswering < PretrainedMixin
class AutoModelForVision2Seq (line 1433) | class AutoModelForVision2Seq < PretrainedMixin
class AutoModelForImageClassification (line 1437) | class AutoModelForImageClassification < PretrainedMixin
class AutoModelForImageSegmentation (line 1441) | class AutoModelForImageSegmentation < PretrainedMixin
class AutoModelForSemanticSegmentation (line 1445) | class AutoModelForSemanticSegmentation < PretrainedMixin
class AutoModelForObjectDetection (line 1449) | class AutoModelForObjectDetection < PretrainedMixin
class AutoModelForZeroShotObjectDetection (line 1453) | class AutoModelForZeroShotObjectDetection < PretrainedMixin
class AutoModelForMaskGeneration (line 1457) | class AutoModelForMaskGeneration < PretrainedMixin
class AutoModelForCTC (line 1461) | class AutoModelForCTC < PretrainedMixin
class AutoModelForAudioClassification (line 1465) | class AutoModelForAudioClassification < PretrainedMixin
class AutoModelForXVector (line 1469) | class AutoModelForXVector < PretrainedMixin
class AutoModelForAudioFrameClassification (line 1473) | class AutoModelForAudioFrameClassification < PretrainedMixin
class AutoModelForDocumentQuestionAnswering (line 1477) | class AutoModelForDocumentQuestionAnswering < PretrainedMixin
class AutoModelForImageMatting (line 1481) | class AutoModelForImageMatting < PretrainedMixin
class AutoModelForImageToImage (line 1485) | class AutoModelForImageToImage < PretrainedMixin
class AutoModelForDepthEstimation (line 1489) | class AutoModelForDepthEstimation < PretrainedMixin
class AutoModelForImageFeatureExtraction (line 1493) | class AutoModelForImageFeatureExtraction < PretrainedMixin
class ModelOutput (line 1497) | class ModelOutput
method [] (line 1498) | def [](key)
class Seq2SeqLMOutput (line 1503) | class Seq2SeqLMOutput < ModelOutput
method initialize (line 1504) | def initialize(logits, past_key_values, encoder_outputs, decoder_att...
class SequenceClassifierOutput (line 1514) | class SequenceClassifierOutput < ModelOutput
method initialize (line 1517) | def initialize(logits)
class TokenClassifierOutput (line 1523) | class TokenClassifierOutput < ModelOutput
method initialize (line 1526) | def initialize(logits)
class MaskedLMOutput (line 1532) | class MaskedLMOutput < ModelOutput
method initialize (line 1535) | def initialize(logits)
class QuestionAnsweringModelOutput (line 1541) | class QuestionAnsweringModelOutput < ModelOutput
method initialize (line 1544) | def initialize(start_logits, end_logits)
class DetrObjectDetectionOutput (line 1551) | class DetrObjectDetectionOutput < ModelOutput
method initialize (line 1554) | def initialize(logits, pred_boxes)
class DetrSegmentationOutput (line 1561) | class DetrSegmentationOutput < ModelOutput
method initialize (line 1564) | def initialize(logits, pred_boxes, pred_masks)
FILE: lib/informers/pipelines.rb
type Informers (line 1) | module Informers
class Pipeline (line 2) | class Pipeline
method initialize (line 3) | def initialize(task:, model:, tokenizer: nil, processor: nil)
method prepare_images (line 13) | def prepare_images(images)
method prepare_audios (line 22) | def prepare_audios(audios, sampling_rate)
method get_bounding_box (line 36) | def get_bounding_box(box, as_integer)
class TextClassificationPipeline (line 46) | class TextClassificationPipeline < Pipeline
method call (line 47) | def call(texts, top_k: 1)
class TokenClassificationPipeline (line 88) | class TokenClassificationPipeline < Pipeline
method call (line 89) | def call(
method group_sub_entities (line 160) | def group_sub_entities(entities)
method get_tag (line 176) | def get_tag(entity_name)
method group_entities (line 192) | def group_entities(entities)
class QuestionAnsweringPipeline (line 228) | class QuestionAnsweringPipeline < Pipeline
method call (line 229) | def call(question, context, top_k: 1)
class FillMaskPipeline (line 280) | class FillMaskPipeline < Pipeline
method call (line 281) | def call(texts, top_k: 5)
class Text2TextGenerationPipeline (line 314) | class Text2TextGenerationPipeline < Pipeline
method call (line 317) | def call(texts, **generate_kwargs)
class SummarizationPipeline (line 356) | class SummarizationPipeline < Text2TextGenerationPipeline
class TranslationPipeline (line 360) | class TranslationPipeline < Text2TextGenerationPipeline
class TextGenerationPipeline (line 364) | class TextGenerationPipeline < Pipeline
method call (line 365) | def call(texts, **generate_kwargs)
class ZeroShotClassificationPipeline (line 420) | class ZeroShotClassificationPipeline < Pipeline
method initialize (line 421) | def initialize(**options)
method call (line 439) | def call(texts, candidate_labels, hypothesis_template: "This example...
class ImageToTextPipeline (line 499) | class ImageToTextPipeline < Pipeline
method call (line 500) | def call(images, **generate_kwargs)
class ImageClassificationPipeline (line 520) | class ImageClassificationPipeline < Pipeline
method call (line 521) | def call(images, top_k: 1)
class ImageSegmentationPipeline (line 551) | class ImageSegmentationPipeline < Pipeline
method initialize (line 552) | def initialize(**options)
method call (line 562) | def call(
class ZeroShotImageClassificationPipeline (line 627) | class ZeroShotImageClassificationPipeline < Pipeline
method call (line 628) | def call(images, candidate_labels, hypothesis_template: "This is a p...
class ObjectDetectionPipeline (line 671) | class ObjectDetectionPipeline < Pipeline
method call (line 672) | def call(images, threshold: 0.9, percentage: false)
class ZeroShotObjectDetectionPipeline (line 706) | class ZeroShotObjectDetectionPipeline < Pipeline
method call (line 707) | def call(
class DocumentQuestionAnsweringPipeline (line 760) | class DocumentQuestionAnsweringPipeline < Pipeline
method call (line 761) | def call(image, question, **generate_kwargs)
class TextToAudioPipeline (line 801) | class TextToAudioPipeline < Pipeline
method initialize (line 804) | def initialize(**options)
method call (line 811) | def call(text_inputs, speaker_embeddings: nil)
class FeatureExtractionPipeline (line 821) | class FeatureExtractionPipeline < Pipeline
method call (line 822) | def call(
class ImageFeatureExtractionPipeline (line 884) | class ImageFeatureExtractionPipeline < Pipeline
method call (line 885) | def call(images)
class AudioClassificationPipeline (line 895) | class AudioClassificationPipeline < Pipeline
method call (line 896) | def call(audio, top_k: nil)
class ZeroShotAudioClassificationPipeline (line 930) | class ZeroShotAudioClassificationPipeline < Pipeline
method call (line 931) | def call(audio, candidate_labels, hypothesis_template: "This is a so...
class AutomaticSpeechRecognitionPipeline (line 973) | class AutomaticSpeechRecognitionPipeline < Pipeline
method call (line 974) | def call(audio, **kwargs)
method call_whisper (line 985) | def call_whisper(audio, **kwargs)
class ImageToImagePipeline (line 990) | class ImageToImagePipeline < Pipeline
method call (line 991) | def call(images)
class DepthEstimationPipeline (line 1014) | class DepthEstimationPipeline < Pipeline
method call (line 1015) | def call(images)
class EmbeddingPipeline (line 1042) | class EmbeddingPipeline < FeatureExtractionPipeline
method call (line 1043) | def call(
class RerankingPipeline (line 1053) | class RerankingPipeline < Pipeline
method call (line 1054) | def call(
function pipeline (line 1355) | def pipeline(
function load_items (line 1429) | def load_items(mapping, model, pretrained_options)
FILE: lib/informers/processors.rb
type Informers (line 1) | module Informers
class FeatureExtractor (line 2) | class FeatureExtractor
method initialize (line 5) | def initialize(config)
class ImageFeatureExtractor (line 11) | class ImageFeatureExtractor < FeatureExtractor
method initialize (line 12) | def initialize(config)
method thumbnail (line 45) | def thumbnail(image, size, resample = 2)
method pad_image (line 67) | def pad_image(
method rescale (line 147) | def rescale(pixel_data)
method get_resize_output_image_size (line 153) | def get_resize_output_image_size(image, size)
method resize (line 214) | def resize(image)
method preprocess (line 219) | def preprocess(
method call (line 332) | def call(images, *args)
class CLIPFeatureExtractor (line 354) | class CLIPFeatureExtractor < ImageFeatureExtractor
class DPTFeatureExtractor (line 357) | class DPTFeatureExtractor < ImageFeatureExtractor
class ViTFeatureExtractor (line 360) | class ViTFeatureExtractor < ImageFeatureExtractor
class OwlViTFeatureExtractor (line 363) | class OwlViTFeatureExtractor < ImageFeatureExtractor
method post_process_object_detection (line 364) | def post_process_object_detection(*args)
class Swin2SRImageProcessor (line 369) | class Swin2SRImageProcessor < ImageFeatureExtractor
method pad_image (line 370) | def pad_image(pixel_data, img_dims, pad_size, **options)
class DonutFeatureExtractor (line 393) | class DonutFeatureExtractor < ImageFeatureExtractor
method pad_image (line 394) | def pad_image(pixel_data, img_dims, pad_size, **options)
class DetrFeatureExtractor (line 422) | class DetrFeatureExtractor < ImageFeatureExtractor
method call (line 423) | def call(images)
method post_process_object_detection (line 442) | def post_process_object_detection(*args)
method remove_low_and_no_objects (line 446) | def remove_low_and_no_objects(class_logits, mask_logits, object_mask...
method check_segment_validity (line 473) | def check_segment_validity(
method compute_segments (line 510) | def compute_segments(
method post_process_panoptic_segmentation (line 598) | def post_process_panoptic_segmentation(
type Utils (line 657) | module Utils
function center_to_corners_format (line 658) | def self.center_to_corners_format(v)
function post_process_object_detection (line 668) | def self.post_process_object_detection(outputs, threshold = 0.5, tar...
class WhisperFeatureExtractor (line 733) | class WhisperFeatureExtractor < FeatureExtractor
method initialize (line 734) | def initialize(config)
method _extract_fbank_features (line 740) | def _extract_fbank_features(waveform)
method call (line 744) | def call(audio)
class Wav2Vec2FeatureExtractor (line 749) | class Wav2Vec2FeatureExtractor < FeatureExtractor
method _zero_mean_unit_var_norm (line 750) | def _zero_mean_unit_var_norm(input_values)
method call (line 757) | def call(audio)
class ClapFeatureExtractor (line 776) | class ClapFeatureExtractor < FeatureExtractor
method initialize (line 777) | def initialize(config)
method call (line 783) | def call(audio, max_length: nil)
class Processor (line 788) | class Processor
method initialize (line 791) | def initialize(feature_extractor)
method call (line 795) | def call(input, *args)
class AutoProcessor (line 800) | class AutoProcessor
method from_pretrained (line 816) | def self.from_pretrained(
FILE: lib/informers/tokenizers.rb
type Informers (line 1) | module Informers
class PreTrainedTokenizer (line 2) | class PreTrainedTokenizer
method initialize (line 5) | def initialize(tokenizer_json, tokenizer_config)
method get_token (line 44) | def get_token(*keys)
method call (line 65) | def call(
method decode (line 121) | def decode(tokens, skip_special_tokens:)
method convert_tokens_to_string (line 125) | def convert_tokens_to_string(tokens)
method convert_tokens_to_ids (line 129) | def convert_tokens_to_ids(tokens)
method id_to_token (line 133) | def id_to_token(id)
method batch_decode (line 137) | def batch_decode(batch, **decode_args)
method padding_side= (line 141) | def padding_side=(side)
class BertTokenizer (line 146) | class BertTokenizer < PreTrainedTokenizer
class DebertaV2Tokenizer (line 151) | class DebertaV2Tokenizer < PreTrainedTokenizer
class DistilBertTokenizer (line 156) | class DistilBertTokenizer < PreTrainedTokenizer
class T5Tokenizer (line 159) | class T5Tokenizer < PreTrainedTokenizer
class GPT2Tokenizer (line 162) | class GPT2Tokenizer < PreTrainedTokenizer
class BartTokenizer (line 166) | class BartTokenizer < PreTrainedTokenizer
class RobertaTokenizer (line 169) | class RobertaTokenizer < PreTrainedTokenizer
class XLMRobertaTokenizer (line 172) | class XLMRobertaTokenizer < PreTrainedTokenizer
class MPNetTokenizer (line 175) | class MPNetTokenizer < PreTrainedTokenizer
class CLIPTokenizer (line 178) | class CLIPTokenizer < PreTrainedTokenizer
class NllbTokenizer (line 181) | class NllbTokenizer < PreTrainedTokenizer
method initialize (line 184) | def initialize(tokenizer_json, tokenizer_config)
method _build_translation_inputs (line 192) | def _build_translation_inputs(raw_inputs, tokenizer_options, generat...
class M2M100Tokenizer (line 197) | class M2M100Tokenizer < PreTrainedTokenizer
method initialize (line 200) | def initialize(tokenizer_json, tokenizer_config)
method _build_translation_inputs (line 210) | def _build_translation_inputs(raw_inputs, tokenizer_options, generat...
type Utils (line 215) | module Utils
function _build_translation_inputs (line 216) | def self._build_translation_inputs(slf, raw_inputs, tokenizer_option...
class SpeechT5Tokenizer (line 247) | class SpeechT5Tokenizer < PreTrainedTokenizer
class AutoTokenizer (line 250) | class AutoTokenizer
method from_pretrained (line 268) | def self.from_pretrained(
method load_tokenizer (line 301) | def self.load_tokenizer(pretrained_model_name_or_path, **options)
FILE: lib/informers/utils/audio.rb
type Informers (line 1) | module Informers
type Utils (line 2) | module Utils
function read_audio (line 3) | def self.read_audio(input, sampling_rate)
FILE: lib/informers/utils/core.rb
type Informers (line 1) | module Informers
type Utils (line 2) | module Utils
function dispatch_callback (line 3) | def self.dispatch_callback(progress_callback, data)
function calculate_reflect_offset (line 7) | def self.calculate_reflect_offset(i, w)
FILE: lib/informers/utils/dtypes.rb
type Informers (line 1) | module Informers
type Utils (line 2) | module Utils
FILE: lib/informers/utils/ffmpeg.rb
type Informers (line 15) | module Informers
type Utils (line 16) | module Utils
function ffmpeg_read (line 18) | def self.ffmpeg_read(data, sampling_rate)
FILE: lib/informers/utils/generation.rb
type Informers (line 1) | module Informers
type Utils (line 2) | module Utils
class GenerationConfig (line 3) | class GenerationConfig
method initialize (line 4) | def initialize(kwargs)
method [] (line 66) | def [](key)
method merge! (line 70) | def merge!(config)
class Sampler (line 75) | class Sampler
method initialize (line 76) | def initialize(generation_config)
method call (line 81) | def call(logits, index = -1)
method get_logits (line 87) | def get_logits(logits, index)
method get_sampler (line 105) | def self.get_sampler(generation_config)
class GreedySampler (line 119) | class GreedySampler < Sampler
method sample (line 120) | def sample(logits, index = -1)
class BeamSearchSampler (line 133) | class BeamSearchSampler < Sampler
method sample (line 134) | def sample(logits, index = -1)
class LogitsProcessorList (line 158) | class LogitsProcessorList
method initialize (line 159) | def initialize
method push (line 164) | def push(item)
method concat (line 168) | def concat(items)
method call (line 172) | def call(input_ids, batched_logits)
method to_ary (line 183) | def to_ary
class LogitsProcessor (line 188) | class LogitsProcessor
class NoRepeatNGramLogitsProcessor (line 191) | class NoRepeatNGramLogitsProcessor < LogitsProcessor
method initialize (line 192) | def initialize(no_repeat_ngram_size)
method get_ngrams (line 197) | def get_ngrams(prev_input_ids)
method get_generated_ngrams (line 222) | def get_generated_ngrams(banned_ngrams, prev_input_ids)
method calc_banned_ngram_tokens (line 228) | def calc_banned_ngram_tokens(prev_input_ids)
method call (line 240) | def call(input_ids, logits)
class MinLengthLogitsProcessor (line 250) | class MinLengthLogitsProcessor < LogitsProcessor
method initialize (line 251) | def initialize(min_length, eos_token_id)
method call (line 257) | def call(input_ids, logits)
class ForcedBOSTokenLogitsProcessor (line 268) | class ForcedBOSTokenLogitsProcessor < LogitsProcessor
method initialize (line 269) | def initialize(bos_token_id)
method call (line 274) | def call(input_ids, logits)
class ForcedEOSTokenLogitsProcessor (line 283) | class ForcedEOSTokenLogitsProcessor < LogitsProcessor
method initialize (line 284) | def initialize(max_length, forced_eos_token_id)
method call (line 290) | def call(input_ids, logits)
FILE: lib/informers/utils/hub.rb
type Informers (line 1) | module Informers
type Utils (line 2) | module Utils
type Hub (line 3) | module Hub
class FileResponse (line 4) | class FileResponse
method initialize (line 7) | def initialize(file_path)
method read (line 18) | def read
function is_valid_url (line 23) | def self.is_valid_url(string, protocols = nil, valid_hosts = nil)
function get_file (line 38) | def self.get_file(url_or_path, progress_callback = nil, progress_i...
class FileCache (line 70) | class FileCache
method initialize (line 73) | def initialize(path)
method match (line 77) | def match(request)
method put (line 84) | def put(request, response)
method resolve_path (line 101) | def resolve_path(request)
function try_cache (line 106) | def self.try_cache(cache, *names)
function get_model_file (line 118) | def self.get_model_file(path_or_repo_id, filename, fatal = true, *...
function get_model_json (line 212) | def self.get_model_json(model_path, file_name, fatal = true, **opt...
function path_join (line 222) | def self.path_join(*parts)
function display_progress (line 235) | def self.display_progress(filename, width, size, expected_size)
FILE: lib/informers/utils/image.rb
type Informers (line 1) | module Informers
type Utils (line 2) | module Utils
class RawImage (line 3) | class RawImage
method initialize (line 15) | def initialize(image)
method data (line 22) | def data
method size (line 26) | def size
method resize (line 30) | def resize(width, height, resample: 2)
method center_crop (line 47) | def center_crop(crop_width, crop_height)
method rgb (line 74) | def rgb
method save (line 82) | def save(path)
method read (line 86) | def self.read(input)
method from_array (line 100) | def self.from_array(input)
FILE: lib/informers/utils/math.rb
type Informers (line 1) | module Informers
type Utils (line 2) | module Utils
function interpolate_data (line 3) | def self.interpolate_data(input, in_shape, out_shape, mode = "biline...
function softmax (line 73) | def self.softmax(arr)
function sigmoid (line 89) | def self.sigmoid(arr)
function get_top_items (line 96) | def self.get_top_items(items, top_k = 0)
function max (line 110) | def self.max(arr)
FILE: lib/informers/utils/tensor.rb
type Informers (line 1) | module Informers
type Utils (line 2) | module Utils
function mean_pooling (line 3) | def self.mean_pooling(last_hidden_state, attention_mask)
function normalize (line 19) | def self.normalize(result)
function stack (line 26) | def self.stack(tensors, dim = 0)
function ones_like (line 30) | def self.ones_like(tensor)
function dims (line 37) | def self.dims(tensor)
function interpolate (line 46) | def self.interpolate(input, shape, mode = "bilinear", align_corners ...
function reshape (line 64) | def self.reshape(arr, dims)
FILE: lib/informers/version.rb
type Informers (line 1) | module Informers
FILE: test/model_test.rb
class ModelTest (line 3) | class ModelTest < Minitest::Test
method test_all_minilm (line 5) | def test_all_minilm
method test_all_minilm_xenova (line 16) | def test_all_minilm_xenova
method test_multi_qa_minilm (line 27) | def test_multi_qa_minilm
method test_paraphrase_minilm (line 44) | def test_paraphrase_minilm
method test_mxbai_embed (line 55) | def test_mxbai_embed
method test_gte_small (line 72) | def test_gte_small
method test_e5_base (line 83) | def test_e5_base
method test_nomic_embed (line 100) | def test_nomic_embed
method test_bge_base (line 117) | def test_bge_base
method test_jina_embeddings (line 134) | def test_jina_embeddings
method test_snowflake_arctic_embed (line 145) | def test_snowflake_arctic_embed
method test_all_mpnet (line 167) | def test_all_mpnet
method test_bge_m3 (line 178) | def test_bge_m3
method test_mxbai_rerank (line 186) | def test_mxbai_rerank
method test_jina_reranker (line 203) | def test_jina_reranker
method test_bge_reranker (line 220) | def test_bge_reranker
method test_ms_marco_minilm (line 237) | def test_ms_marco_minilm
FILE: test/pipeline_test.rb
class PipelineTest (line 3) | class PipelineTest < Minitest::Test
method test_ner (line 4) | def test_ner
method test_ner_aggregation_strategy (line 15) | def test_ner_aggregation_strategy
method test_sentiment_analysis (line 27) | def test_sentiment_analysis
method test_question_answering (line 44) | def test_question_answering
method test_zero_shot_classification (line 53) | def test_zero_shot_classification
method test_text2text_generation (line 63) | def test_text2text_generation
method test_translation (line 69) | def test_translation
method test_text_generation (line 75) | def test_text_generation
method test_summarization (line 81) | def test_summarization
method test_fill_mask (line 89) | def test_fill_mask
method test_fill_mask_no_mask_token (line 99) | def test_fill_mask_no_mask_token
method test_feature_extraction (line 107) | def test_feature_extraction
method test_embedding (line 115) | def test_embedding
method test_reranking (line 123) | def test_reranking
method test_image_classification (line 135) | def test_image_classification
method test_zero_shot_image_classification (line 144) | def test_zero_shot_image_classification
method test_object_detection (line 156) | def test_object_detection
method test_zero_shot_object_detection (line 176) | def test_zero_shot_object_detection
method test_depth_estimation (line 196) | def test_depth_estimation
method test_image_to_text (line 204) | def test_image_to_text
method test_image_to_image (line 210) | def test_image_to_image
method test_image_segmentation (line 219) | def test_image_segmentation
method test_image_feature_extraction (line 232) | def test_image_feature_extraction
method test_progress_callback (line 238) | def test_progress_callback
method test_device (line 252) | def test_device
method test_device_invalid (line 262) | def test_device_invalid
method test_dtype (line 269) | def test_dtype
method test_dtype_invalid (line 277) | def test_dtype_invalid
method test_session_options (line 284) | def test_session_options
FILE: test/test_helper.rb
class Minitest::Test (line 5) | class Minitest::Test
method assert_elements_in_delta (line 6) | def assert_elements_in_delta(expected, actual, delta = 0.001)
method mac? (line 13) | def mac?
Condensed preview: 30 files, each showing path, character count, and a content snippet (the full structured content is 215K chars).
[
{
"path": ".github/workflows/build.yml",
"chars": 481,
"preview": "name: build\non: [push, pull_request]\njobs:\n build:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout"
},
{
"path": ".gitignore",
"chars": 95,
"preview": "/.bundle/\n/.yardoc\n/_yardoc/\n/coverage/\n/doc/\n/pkg/\n/spec/reports/\n/test/support/\n/tmp/\n*.lock\n"
},
{
"path": "CHANGELOG.md",
"chars": 1251,
"preview": "## 1.3.0 (unreleased)\n\n- Dropped support for Ruby < 3.3\n\n## 1.2.1 (2025-02-01)\n\n- Fixed error when terminal width is zer"
},
{
"path": "Gemfile",
"chars": 82,
"preview": "source \"https://rubygems.org\"\n\ngemspec\n\ngem \"rake\"\ngem \"minitest\"\ngem \"ruby-vips\"\n"
},
{
"path": "LICENSE.txt",
"chars": 11358,
"preview": "\n Apache License\n Version 2.0, January 2004\n "
},
{
"path": "README.md",
"chars": 12065,
"preview": "# Informers\n\n:fire: Fast [transformer](https://github.com/huggingface/transformers.js) inference for Ruby\n\nFor non-ONNX "
},
{
"path": "Rakefile",
"chars": 771,
"preview": "require \"bundler/gem_tasks\"\nrequire \"rake/testtask\"\n\nRake::TestTask.new do |t|\n t.pattern = FileList[\"test/**/*_test.rb"
},
{
"path": "informers.gemspec",
"chars": 615,
"preview": "require_relative \"lib/informers/version\"\n\nGem::Specification.new do |spec|\n spec.name = \"informers\"\n spec.ver"
},
{
"path": "lib/informers/backends/onnx.rb",
"chars": 506,
"preview": "module Informers\n module Backends\n module Onnx\n def self.device_to_execution_providers(device)\n case dev"
},
{
"path": "lib/informers/configs.rb",
"chars": 1033,
"preview": "module Informers\n class PretrainedConfig\n def initialize(config_json)\n @config_json = config_json.to_h\n end\n"
},
{
"path": "lib/informers/env.rb",
"chars": 502,
"preview": "module Informers\n CACHE_HOME = ENV.fetch(\"XDG_CACHE_HOME\", File.join(ENV.fetch(\"HOME\"), \".cache\"))\n DEFAULT_CACHE_DIR "
},
{
"path": "lib/informers/model.rb",
"chars": 369,
"preview": "module Informers\n # TODO remove in 2.0\n class Model\n def initialize(model_id, quantized: false)\n @model = Info"
},
{
"path": "lib/informers/models.rb",
"chars": 49385,
"preview": "module Informers\n MODEL_TYPES = {\n EncoderOnly: 0,\n EncoderDecoder: 1,\n Seq2Seq: 2,\n Vision2Seq: 3,\n Dec"
},
{
"path": "lib/informers/pipelines.rb",
"chars": 40596,
"preview": "module Informers\n class Pipeline\n def initialize(task:, model:, tokenizer: nil, processor: nil)\n super()\n "
},
{
"path": "lib/informers/processors.rb",
"chars": 26340,
"preview": "module Informers\n class FeatureExtractor\n attr_reader :config\n\n def initialize(config)\n super()\n @confi"
},
{
"path": "lib/informers/tokenizers.rb",
"chars": 9467,
"preview": "module Informers\n class PreTrainedTokenizer\n attr_reader :mask_token, :mask_token_id, :sep_token_id\n\n def initial"
},
{
"path": "lib/informers/utils/audio.rb",
"chars": 394,
"preview": "module Informers\n module Utils\n def self.read_audio(input, sampling_rate)\n data =\n if input.is_a?(URI)\n "
},
{
"path": "lib/informers/utils/core.rb",
"chars": 245,
"preview": "module Informers\n module Utils\n def self.dispatch_callback(progress_callback, data)\n progress_callback.(data) i"
},
{
"path": "lib/informers/utils/dtypes.rb",
"chars": 250,
"preview": "module Informers\n module Utils\n DEFAULT_DTYPE_SUFFIX_MAPPING = {\n fp32: \"\",\n fp16: \"_fp16\",\n int8: \"_"
},
{
"path": "lib/informers/utils/ffmpeg.rb",
"chars": 1320,
"preview": "# Copyright 2021 The HuggingFace Team. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lic"
},
{
"path": "lib/informers/utils/generation.rb",
"chars": 9746,
"preview": "module Informers\n module Utils\n class GenerationConfig\n def initialize(kwargs)\n @config = {}\n\n # "
},
{
"path": "lib/informers/utils/hub.rb",
"chars": 7842,
"preview": "module Informers\n module Utils\n module Hub\n class FileResponse\n attr_reader :exists, :status\n\n de"
},
{
"path": "lib/informers/utils/image.rb",
"chars": 2855,
"preview": "module Informers\n module Utils\n class RawImage\n RESAMPLING_MAPPING = {\n 0 => \"nearest\",\n 1 => \"la"
},
{
"path": "lib/informers/utils/math.rb",
"chars": 3200,
"preview": "module Informers\n module Utils\n def self.interpolate_data(input, in_shape, out_shape, mode = \"bilinear\", align_corne"
},
{
"path": "lib/informers/utils/tensor.rb",
"chars": 1602,
"preview": "module Informers\n module Utils\n def self.mean_pooling(last_hidden_state, attention_mask)\n last_hidden_state.zip"
},
{
"path": "lib/informers/version.rb",
"chars": 41,
"preview": "module Informers\n VERSION = \"1.2.1\"\nend\n"
},
{
"path": "lib/informers.rb",
"chars": 1033,
"preview": "# dependencies\nrequire \"onnxruntime\"\nrequire \"tokenizers\"\n\n# stdlib\nrequire \"io/console\"\nrequire \"json\"\nrequire \"open-ur"
},
{
"path": "test/model_test.rb",
"chars": 9573,
"preview": "require_relative \"test_helper\"\n\nclass ModelTest < Minitest::Test\n # https://huggingface.co/sentence-transformers/all-Mi"
},
{
"path": "test/pipeline_test.rb",
"chars": 11253,
"preview": "require_relative \"test_helper\"\n\nclass PipelineTest < Minitest::Test\n def test_ner\n ner = Informers.pipeline(\"ner\")\n "
},
{
"path": "test/test_helper.rb",
"chars": 365,
"preview": "require \"bundler/setup\"\nBundler.require(:default)\nrequire \"minitest/autorun\"\n\nclass Minitest::Test\n def assert_elements"
}
]
About this extraction
This page contains the full source code of the ankane/informers GitHub repository, extracted and formatted as plain text. The extraction includes 30 files (199.8 KB), approximately 51.9k tokens, and a symbol index with 497 extracted functions, classes, methods, constants, and types.