Full Code of kyegomez/ScreenAI for AI

main 119cecb2e6b3 cached
34 files
36.2 KB
10.2k tokens
25 symbols
1 requests
Download .txt
Repository: kyegomez/ScreenAI
Branch: main
Commit: 119cecb2e6b3
Files: 34
Total size: 36.2 KB

Directory structure:
gitextract_vttoceip/

├── .github/
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   ├── PULL_REQUEST_TEMPLATE.yml
│   ├── dependabot.yml
│   ├── labeler.yml
│   └── workflows/
│       ├── code_quality_control.yml
│       ├── cos_integration.yml
│       ├── docs.yml
│       ├── docs_test.yml
│       ├── label.yml
│       ├── lints.yml
│       ├── pr_request_checks.yml
│       ├── pull-request-links.yml
│       ├── pylint.yml
│       ├── python-publish.yml
│       ├── quality.yml
│       ├── ruff.yml
│       ├── run_test.yml
│       ├── stale.yml
│       ├── test.yml
│       ├── testing.yml
│       ├── unit-test.yml
│       └── welcome.yml
├── .gitignore
├── .pre-commit-config.yaml
├── .readthedocs.yml
├── LICENSE
├── README.md
├── example.py
├── pyproject.toml
├── requirements.txt
└── screenai/
    ├── __init__.py
    └── main.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/FUNDING.yml
================================================
# These are supported funding model platforms

github: [kyegomez]
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
custom: #Nothing


================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a detailed report on the bug and it's root cause. Conduct root cause error analysis
title: "[BUG] "
labels: bug
assignees: kyegomez

---

**Describe the bug**
A clear and concise description of what the bug is and what the main root cause error is. Test very thoroughly before submitting.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Additional context**
Add any other context about the problem here.


================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: 'kyegomez'

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.


================================================
FILE: .github/PULL_REQUEST_TEMPLATE.yml
================================================
<!-- Thank you for contributing to Zeta!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
  - Tag maintainer: for a quicker response, tag the relevant maintainer (see below),
  - Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
  1. a test for the integration, preferably unit tests that do not rely on network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - nn / Misc / if you don't know who to tag: kye@apac.ai
  - tokenizers: kye@apac.ai
  - training / Prompts: kye@apac.ai
  - models: kye@apac.ai

If no one reviews your PR within a few days, feel free to kye@apac.ai

See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/kyegomez/zeta

================================================
FILE: .github/dependabot.yml
================================================
# https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated-automatically/configuration-options-for-dependency-updates

version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"

  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"



================================================
FILE: .github/labeler.yml
================================================
# this is a config file for the github action labeler

# Add 'label1' to any changes within 'example' folder or any subfolders
example_change:
- example/**

# Add 'label2' to any file changes within 'example2' folder
example2_change: example2/*

# Add label3 to any change to .txt files within the entire repository. Quotation marks are required for the leading asterisk
text_files:
- '**/*.txt'


================================================
FILE: .github/workflows/code_quality_control.yml
================================================
name: Linting and Formatting

on:
  push:
    branches:
      - main

jobs:
  lint_and_format:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install --no-cache-dir -r requirements.txt

      - name: Find Python files
        run: find swarms_torch -name "*.py" -type f -exec autopep8 --in-place --aggressive --aggressive {} +

      - name: Push changes
        uses: ad-m/github-push-action@master
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}

================================================
FILE: .github/workflows/cos_integration.yml
================================================
name: Continuous Integration

on:
  push:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install --no-cache-dir -r requirements.txt

      - name: Run unit tests
        run: pytest tests/unit

      - name: Run integration tests
        run: pytest tests/integration

      - name: Run code coverage
        run: pytest --cov=swarms tests/

      - name: Run linters
        run: pylint swarms

      - name: Build documentation
        run: make docs

      - name: Validate documentation
        run: sphinx-build -b linkcheck docs build/docs

      - name: Run performance tests
        run: pytest tests/performance

================================================
FILE: .github/workflows/docs.yml
================================================
name: Docs WorkFlow

on:
  push:
    branches:
      - master
      - main
      - develop
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - run: pip install mkdocs-material
      - run: pip install "mkdocstrings[python]"
      - run: mkdocs gh-deploy --force

================================================
FILE: .github/workflows/docs_test.yml
================================================
name: Documentation Tests

on:
  push:
    branches:
      - master

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install --no-cache-dir -r requirements.txt

      - name: Build documentation
        run: make docs

      - name: Validate documentation
        run: sphinx-build -b linkcheck docs build/docs

================================================
FILE: .github/workflows/label.yml
================================================
# This workflow will triage pull requests and apply a label based on the
# paths that are modified in the pull request.
#
# To use this workflow, you will need to set up a .github/labeler.yml
# file with configuration.  For more information, see:
# https://github.com/actions/labeler

name: Labeler
on: [pull_request_target]

jobs:
  label:

    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write

    steps:
    - uses: actions/labeler@v5.0.0
      with:
        repo-token: "${{ secrets.GITHUB_TOKEN }}"


================================================
FILE: .github/workflows/lints.yml
================================================
name: Linting

on:
  push:
    branches:
      - master

jobs:
  lint:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install --no-cache-dir -r requirements.txt

      - name: Run linters
        run: pylint swarms_torch

================================================
FILE: .github/workflows/pr_request_checks.yml
================================================
name: Pull Request Checks

on:
  pull_request:
    branches:
      - master

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install --no-cache-dir -r requirements.txt

      - name: Run tests and checks
        run: |
          pytest tests/
          pylint swarms_torch

================================================
FILE: .github/workflows/pull-request-links.yml
================================================
name: readthedocs/actions
on:
  pull_request_target:
    types:
      - opened
    paths:
      - "docs/**"

permissions:
  pull-requests: write

jobs:
  pull-request-links:
    runs-on: ubuntu-latest
    steps:
      - uses: readthedocs/actions/preview@v1
        with:
          project-slug: swarms_torch

================================================
FILE: .github/workflows/pylint.yml
================================================
name: Pylint

on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10"]
    steps:
    - uses: actions/checkout@v4
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --no-cache-dir --upgrade pip
        pip install pylint
    - name: Analysing the code with pylint
      run: |
        pylint $(git ls-files '*.py')


================================================
FILE: .github/workflows/python-publish.yml
================================================

name: Upload Python Package

on:
  release:
    types: [published]

permissions:
  contents: read

jobs:
  deploy:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.10'
    - name: Install dependencies
      run: |
        python -m pip install --no-cache-dir --upgrade pip
        pip install build
    - name: Build package
      run: python -m build
    - name: Publish package
      uses: pypa/gh-action-pypi-publish@2f6f737ca5f74c637829c0f5c3acd0e29ea5e8bf
      with:
        user: __token__
        password: ${{ secrets.PYPI_API_TOKEN }}

================================================
FILE: .github/workflows/quality.yml
================================================
name: Quality

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  lint:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
    steps:
      - name: Checkout actions
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Init environment 
        uses: ./.github/actions/init-environment 
      - name: Run linter
        run: |
          pylint `git diff --name-only --diff-filter=d origin/main HEAD | grep -E '\.py$' | tr '\n' ' '`

================================================
FILE: .github/workflows/ruff.yml
================================================
name: Ruff
on: [ push, pull_request ]
jobs:
  ruff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: chartboost/ruff-action@v1


================================================
FILE: .github/workflows/run_test.yml
================================================
name: Python application test

on: [push]

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4
    - name: Set up Python 3.10
      uses: actions/setup-python@v5
      with:
        python-version: '3.10'
    - name: Install dependencies
      run: |
        python -m pip install --no-cache-dir --upgrade pip
        pip install pytest
        if [ -f requirements.txt ]; then pip install --no-cache-dir -r requirements.txt; fi
    - name: Run tests with pytest
      run: |
        pytest tests/


================================================
FILE: .github/workflows/stale.yml
================================================
# This workflow warns and then closes issues and PRs that have had no activity for a specified amount of time.
#
# You can adjust the behavior by modifying this file.
# For more information, see:
# https://github.com/actions/stale
name: Mark stale issues and pull requests

on:
  schedule:
  - cron: '26 12 * * *'

jobs:
  stale:

    runs-on: ubuntu-latest
    permissions:
      issues: write
      pull-requests: write

    steps:
    - uses: actions/stale@v9
      with:
        repo-token: ${{ secrets.GITHUB_TOKEN }}
        stale-issue-message: 'Stale issue message'
        stale-pr-message: 'Stale pull request message'
        stale-issue-label: 'no-issue-activity'
        stale-pr-label: 'no-pr-activity'

================================================
FILE: .github/workflows/test.yml
================================================
name: test

on:
  push:
    branches: [master]
  pull_request:
  workflow_dispatch:

env:
  POETRY_VERSION: "1.4.2"

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version:
          - "3.9"
          - "3.10"
          - "3.11"
        test_type:
          - "core"
          - "extended"
    name: Python ${{ matrix.python-version }} ${{ matrix.test_type }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: "./.github/actions/poetry_setup"
        with:
          python-version: ${{ matrix.python-version }}
          poetry-version: "1.4.2"
          cache-key: ${{ matrix.test_type }}
          install-command: |
              if [ "${{ matrix.test_type }}" == "core" ]; then
                echo "Running core tests, installing dependencies with poetry..."
                poetry install
              else
                echo "Running extended tests, installing dependencies with poetry..."
                poetry install -E extended_testing
              fi
      - name: Run ${{matrix.test_type}} tests
        run: |
          if [ "${{ matrix.test_type }}" == "core" ]; then
            make test
          else
            make extended_tests
          fi
        shell: bash

================================================
FILE: .github/workflows/testing.yml
================================================
name: Unit Tests

on:
  push:
    branches:
      - master

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install --no-cache-dir -r requirements.txt

      - name: Run unit tests
        run: pytest tests/

================================================
FILE: .github/workflows/unit-test.yml
================================================
name: build

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:

  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4

    - name: Setup Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.10'

    - name: Install dependencies
      run: pip install --no-cache-dir -r requirements.txt

    - name: Run Python unit tests
      run: python3 -m unittest tests/

    - name: Verify that the Docker image for the action builds
      run: docker build . --file Dockerfile

    - name: Verify integration test results
      run: python3 -m unittest tests/


================================================
FILE: .github/workflows/welcome.yml
================================================
name: Welcome WorkFlow

on:
  issues:
    types: [opened]
  pull_request_target:
    types: [opened]

jobs:
  build:
    name: 👋 Welcome
    permissions: write-all
    runs-on: ubuntu-latest
    steps:
      - uses: actions/first-interaction@v1.3.0
        with:
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          issue-message: "Hello there, thank you for opening an Issue ! 🙏🏻 The team was notified and they will get back to you asap."
          pr-message:  "Hello there, thank you for opening an PR ! 🙏🏻 The team was notified and they will get back to you asap."

================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so
.vscode/
.vscode

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
.ruff_cache/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/


================================================
FILE: .pre-commit-config.yaml
================================================
repos:
  - repo: https://github.com/ambv/black
    rev: 22.3.0
    hooks:
    - id: black
  - repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: 'v0.0.255'
    hooks:
      - id: ruff
        args: [--fix]
  - repo: https://github.com/nbQA-dev/nbQA
    rev: 1.6.3
    hooks:
    - id: nbqa-black
      additional_dependencies: [ipython==8.12, black]
    - id: nbqa-ruff 
      args: ["--ignore=I001"]
      additional_dependencies: [ipython==8.12, ruff]

================================================
FILE: .readthedocs.yml
================================================
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.11"

mkdocs:
  configuration: mkdocs.yml

python:
   install:
   - requirements: requirements.txt

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2023 Eternal Reclaimer

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# Screen AI
Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding". The flow is:
img + text -> patch sizes -> vit -> embed + concat -> attn + ffn -> cross attn + ffn + self attn -> to out. [PAPER LINK: ](https://arxiv.org/abs/2402.04615)

## Install
`pip3 install screenai`

## Usage
```python

import torch
from screenai.main import ScreenAI

# Create a tensor for the image
image = torch.rand(1, 3, 224, 224)

# Create a tensor for the text
text = torch.randn(1, 1, 512)

# Create an instance of the ScreenAI model with specified parameters
model = ScreenAI(
    patch_size=16,
    image_size=224,
    dim=512,
    depth=6,
    heads=8,
    vit_depth=4,
    multi_modal_encoder_depth=4,
    llm_decoder_depth=4,
    mm_encoder_ff_mult=4,
)

# Perform forward pass of the model with the given text and image tensors
out = model(text, image)

# Print the shape of the output tensor
print(out)


```

# License
MIT


## Citation
```bibtex

@misc{baechler2024screenai,
    title={ScreenAI: A Vision-Language Model for UI and Infographics Understanding}, 
    author={Gilles Baechler and Srinivas Sunkara and Maria Wang and Fedir Zubach and Hassan Mansoor and Vincent Etter and Victor Cărbune and Jason Lin and Jindong Chen and Abhanshu Sharma},
    year={2024},
    eprint={2402.04615},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```

# Todo
- [ ] Implement the nn.ModuleList([]) in the encoder and decoder


================================================
FILE: example.py
================================================
import torch
from screenai.main import ScreenAI

# Create a tensor for the image
image = torch.rand(1, 3, 224, 224)

# Create a tensor for the text
text = torch.randint(0, 20000, (1, 1028))

# Create an instance of the ScreenAI model with specified parameters
model = ScreenAI(
    num_tokens = 20000,
    max_seq_len = 1028,
    patch_size=16,
    image_size=224,
    dim=512,
    depth=6,
    heads=8,
    vit_depth=4,
    multi_modal_encoder_depth=4,
    llm_decoder_depth=4,
    mm_encoder_ff_mult=4,
)

# Perform forward pass of the model with the given text and image tensors
out = model(text, image)

# Print the shape of the output tensor
print(out)


================================================
FILE: pyproject.toml
================================================
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "screenai"
version = "0.0.8"
description = "Screen AI - Pytorch"
license = "MIT"
authors = ["Kye Gomez <kye@apac.ai>"]
homepage = "https://github.com/kyegomez/ScreenAI"
documentation = "https://github.com/kyegomez/ScreenAI" 
readme = "README.md"  # Assuming you have a README.md
repository = "https://github.com/kyegomez/ScreenAI"
keywords = ["artificial intelligence", "deep learning", "optimizers", "Prompt Engineering"]
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Developers",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3.9"
]

[tool.poetry.dependencies]
python = "^3.6"
swarms = "*"
zetascale = "*"
einops = "*"
torch = "*"
torchvision = "*"


[tool.poetry.group.lint.dependencies]
ruff = "^0.1.6"
types-toml = "^0.10.8.1"
types-redis = "^4.3.21.6"
types-pytz = "^2023.3.0.0"
black = "^23.1.0"
types-chardet = "^5.0.4.6"
mypy-protobuf = "^3.0.0"


[tool.autopep8]
max_line_length = 80
ignore = "E501,W6"  # or ["E501", "W6"]
in-place = true
recursive = true
aggressive = 3


[tool.ruff]
line-length = 70

[tool.black]
line-length = 70
target-version = ['py38']
preview = true


================================================
FILE: requirements.txt
================================================
torch
zetascale
einops
torch 


================================================
FILE: screenai/__init__.py
================================================
from screenai.main import (
    CrossAttention,
    MultiModalEncoder,
    MultiModalDecoder,
    ScreenAI,
)


__all__ = [
    "CrossAttention",
    "MultiModalEncoder",
    "MultiModalDecoder",
    "ScreenAI",
]


================================================
FILE: screenai/main.py
================================================
import torch
import torch.distributed as dist
import torch.nn.functional as F
from einops import rearrange
from torch import Tensor, einsum, nn
from torch.autograd import Function
from zeta.nn import (
    SwiGLU,
    FeedForward,
    Attention,
)
from zeta.structs import (
    Encoder,
    ViTransformerWrapper,
)

# helper functions


def exists(val):
    return val is not None


def default(val, d):
    return val if exists(val) else d


def pair(val):
    return (val, val) if not isinstance(val, tuple) else val


def divisible_by(numer, denom):
    return (numer % denom) == 0


def dynamic_patching(x, patch_size, image_size):
    # Calculate the patch size based off the image
    patch_size = pair(patch_size)
    image_size = pair(image_size)

    # Get the height and width of the image
    h, w = image_size

    # Use einops to rearrange the image
    x = rearrange(
        x,
        "b c (h p1) (w p2) -> b (h w) (p1 p2 c)",
        p1=patch_size[0],
        p2=patch_size[1],
    )

    return x


# distributed


def pad_dim_to(t, length, dim=0):
    pad_length = length - t.shape[dim]
    zero_pairs = (-dim - 1) if dim < 0 else (t.ndim - dim - 1)
    return F.pad(t, (*((0, 0) * zero_pairs), 0, pad_length))


def all_gather_variable_batch(t):
    device, rank, world_size = (
        t.device,
        dist.get_rank(),
        dist.get_world_size(),
    )

    size = torch.tensor(t.shape[0], device=device, dtype=torch.long)
    sizes = [
        torch.empty_like(size, device=device, dtype=torch.long)
        for i in range(world_size)
    ]
    dist.all_gather(sizes, size)

    sizes = torch.stack(sizes)
    max_size = sizes.amax().item()

    padded_t = pad_dim_to(t, max_size, dim=0)
    gathered_tensors = [
        torch.empty_like(
            padded_t, device=device, dtype=padded_t.dtype
        )
        for i in range(world_size)
    ]
    dist.all_gather(gathered_tensors, padded_t)

    gathered_tensor = torch.cat(gathered_tensors)
    seq = torch.arange(max_size, device=device)

    mask = rearrange(seq, "j -> 1 j") < rearrange(sizes, "i -> i 1")
    mask = rearrange(mask, "i j -> (i j)")

    gathered_tensor = gathered_tensor[mask]
    sizes = sizes.tolist()

    return gathered_tensor, sizes


class AllGather(Function):
    @staticmethod
    def forward(ctx, x):
        assert dist.is_initialized() and dist.get_world_size() > 1
        x, batch_sizes = all_gather_variable_batch(x)
        ctx.batch_sizes = batch_sizes
        return x

    @staticmethod
    def backward(ctx, grads):
        batch_sizes, rank = ctx.batch_sizes, dist.get_rank()
        grads_by_rank = grads.split(batch_sizes, dim=0)
        return grads_by_rank[rank]


all_gather = AllGather.apply


# normalization
# they use layernorm without bias, something that pytorch does not offer


# to latents


class EmbedToLatents(nn.Module):
    def __init__(self, dim, dim_latents):
        super().__init__()
        self.to_latents = nn.Linear(dim, dim_latents, bias=False)

    def forward(self, x):
        latents = self.to_latents(x)
        return F.normalize(latents, dim=-1)


# parallel attention and feedforward with residual
# cross attention - using multi-query + one-headed key / values as in PaLM w/ optional parallel feedforward


class CrossAttention(nn.Module):
    """
    Initializes the ScreenAI model.

    Args:
    dim (int): The input dimension.
    context_dim (int, optional): The dimension of the context. Defaults to None.
    dim_head (int, optional): The dimension of each head. Defaults to 64.
    heads (int, optional): The number of attention heads. Defaults to 8.
    parallel_ff (bool, optional): Whether to use parallel feedforward. Defaults to False.
    ff_mult (int, optional): The multiplier for the feedforward inner dimension. Defaults to 4.
    norm_context (bool, optional): Whether to apply layer normalization to the context. Defaults to False.
    """

    def __init__(
        self,
        dim,
        *,
        context_dim=None,
        dim_head=64,
        heads=8,
        parallel_ff=False,
        ff_mult=4,
        norm_context=False,
    ):
        super().__init__()
        self.heads = heads
        self.scale = dim_head**-0.5
        inner_dim = heads * dim_head
        context_dim = default(context_dim, dim)

        self.norm = nn.LayerNorm(dim)
        self.context_norm = (
            nn.LayerNorm(context_dim)
            if norm_context
            else nn.Identity()
        )

        self.to_q = nn.Linear(dim, inner_dim, bias=False)
        self.to_kv = nn.Linear(context_dim, dim_head * 2, bias=False)
        self.to_out = nn.Linear(inner_dim, dim, bias=False)

        # whether to have parallel feedforward

        ff_inner_dim = ff_mult * dim

        self.ff = (
            nn.Sequential(
                nn.Linear(dim, ff_inner_dim * 2, bias=False),
                SwiGLU(),
                nn.Linear(ff_inner_dim, dim, bias=False),
            )
            if parallel_ff
            else None
        )

    def forward(self, x, context):
        """
        einstein notation
        b - batch
        h - heads
        n, i, j - sequence length (base sequence length, source, target)
        d - feature dimension
        """

        # pre-layernorm, for queries and context

        x = self.norm(x)
        context = self.context_norm(context)

        # get queries

        q = self.to_q(x)
        q = rearrange(q, "b n (h d) -> b h n d", h=self.heads)

        # scale

        q = q * self.scale

        # get key / values

        k, v = self.to_kv(context).chunk(2, dim=-1)

        # query / key similarity

        sim = einsum("b h i d, b j d -> b h i j", q, k)

        # attention

        attn = sim.softmax(dim=-1)

        # aggregate

        out = einsum("b h i j, b j d -> b h i d", attn, v)

        # merge and combine heads

        out = rearrange(out, "b h n d -> b n (h d)")
        out = self.to_out(out)

        # add parallel feedforward (for multimodal layers)

        if exists(self.ff):
            out = out + self.ff(x)

        return out


class MultiModalEncoder(nn.Module):
    """
    MultiModalEncoder class is responsible for encoding multi-modal inputs using self-attention mechanism.

    Args:
        dim (int): The dimension of the input and output tensors. Default is 512.
        depth (int): The number of layers in the encoder. Default is 6.
        dim_head (int): The dimension of each head in the self-attention mechanism. Default is 64.
        heads (int): The number of attention heads. Default is 8.
        *args: Variable length argument list.
        **kwargs: Arbitrary keyword arguments.

    Attributes:
        dim (int): The dimension of the input and output tensors.
        depth (int): The number of layers in the encoder.
        heads (int): The number of attention heads.
        dim_head (int): The dimension of each head in the self-attention mechanism.
        layers (list): List of attention and feedforward layers.

    """

    def __init__(
        self,
        dim: int = 512,
        depth: int = 6,
        dim_head: int = 64,
        heads: int = 8,
        *args,
        **kwargs,
    ):
        super().__init__()
        self.dim = dim
        self.depth = depth
        self.heads = heads
        self.dim_head = dim_head

        self.flash = "cuda" if torch.cuda.is_available() else "cpu"

        self.attn = Attention(
            dim,
            dim_head,
            heads,
            causal=True,
            qk_norm=True,
            flash=self.flash,
        )
        self.ffn = FeedForward(dim, dim, 4, *args, **kwargs)

    def forward(self, x: Tensor) -> Tensor:
        """
        Forward pass of the MultiModalEncoder.

        Args:
            x (Tensor): The input tensor.

        Returns:
            Tensor: The encoded tensor.

        """
        skip = x
        x, _ = self.attn(x)
        x = x + skip
        x = self.ffn(x) + x

        return x + skip


class MultiModalDecoder(nn.Module):
    """
    MultiModalDecoder module for decoding multi-modal inputs.

    Args:
        dim (int): The dimension of the input.
        depth (int): The number of layers in the decoder.
        dim_head (int): The dimension of each attention head.
        heads (int): The number of attention heads.
        *args: Variable length argument list.
        **kwargs: Arbitrary keyword arguments.

    Attributes:
        dim (int): The dimension of the input.
        depth (int): The number of layers in the decoder.
        heads (int): The number of attention heads.
        dim_head (int): The dimension of each attention head.
        layers (nn.ModuleList): List of decoder layers.

    """

    def __init__(
        self,
        dim: int = 512,
        depth: int = 6,
        dim_head: int = 64,
        heads: int = 8,
        *args,
        **kwargs,
    ):
        super().__init__()
        self.dim = dim
        self.depth = depth
        self.heads = heads
        self.dim_head = dim_head
        self.flash = "cuda" if torch.cuda.is_available() else "cpu"
        self.cross_attn = CrossAttention(
            dim,
            dim_head=dim_head,
            heads=heads,
            parallel_ff=True,
        )

        self.attn = Attention(
            dim,
            dim_head,
            heads,
            causal=True,
            qk_norm=True,
            flash=self.flash,
        )

    def forward(self, x: Tensor) -> Tensor:
        skip = x
        x = self.cross_attn(x, x) + x
        x, _ = self.attn(x)

        return x + skip


class ScreenAI(nn.Module):
    """
    ScreenAI module for multimodal learning.

    Args:
        patch_size (int): Size of the image patches.
        image_size (int): Size of the input image.
        dim (int): Dimension of the model.
        depth (int): Depth of the model.
        dim_head (int): Dimension of the attention head.
        heads (int): Number of attention heads.
        vit_depth (int): Depth of the ViT transformer.
        multi_modal_encoder_depth (int): Depth of the multimodal encoder.
        llm_decoder_depth (int): Depth of the LLM decoder.
        mm_encoder_ff_mult (int): Multiplier for the feed-forward dimension in the multimodal encoder.
        *args: Variable length argument list.
        **kwargs: Arbitrary keyword arguments.

    Attributes:
        patch_size (int): Size of the image patches.
        image_size (int): Size of the input image.
        dim (int): Dimension of the model.
        depth (int): Depth of the model.
        heads (int): Number of attention heads.
        vit_depth (int): Depth of the ViT transformer.
        multi_modal_encoder_depth (int): Depth of the multimodal encoder.
        llm_decoder_depth (int): Depth of the LLM decoder.
        patch_embedding (nn.Conv2d): Patch embedding layer.
        vit (ViTransformerWrapper): ViT transformer layer.
        image_embedding (nn.Linear): Image embedding layer.
        to_out (nn.Sequential): Output layer.
        flash (str): Device to use for computation.
        encoder (MultiModalEncoder): Multimodal encoder layer.
        decoder (MultiModalDecoder): LLM decoder layer.
    """

    def __init__(
        self,
        num_tokens: int,
        max_seq_len: int,
        patch_size: int,
        image_size: int = 224,
        dim: int = 512,
        depth: int = 6,
        dim_head: int = 64,
        heads: int = 8,
        vit_depth: int = 4,
        multi_modal_encoder_depth: int = 4,
        llm_decoder_depth: int = 4,
        channels: int = 3,
        *args,
        **kwargs,
    ):
        super().__init__()
        self.num_tokens = num_tokens
        self.max_seq_len = max_seq_len
        self.patch_size = patch_size
        self.image_size = image_size
        self.dim = dim
        self.depth = depth
        self.heads = heads
        self.vit_depth = vit_depth
        self.multi_modal_encoder_depth = multi_modal_encoder_depth
        self.llm_decoder_depth = llm_decoder_depth
        
        # ViTransformerWrapper
        self.vit = ViTransformerWrapper(
            image_size=image_size,
            patch_size=patch_size,
            post_emb_norm=True,
            attn_layers=Encoder(
                dim=dim, depth=vit_depth, heads=heads
            ),
        )

        # Image embedding
        self.image_embedding = nn.Linear(dim, dim)

        # To out
        self.to_out = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, dim), nn.Softmax(dim=-1)
        )

        # If cuda is avaialble then cuda
        self.flash = "cuda" if torch.cuda.is_available() else "cpu"

        # MultiModal Encoder layers
        self.encoder = MultiModalEncoder(
            dim,
            multi_modal_encoder_depth,
            dim_head,
            heads,
        )

        # LLM Layer / T5
        self.decoder = MultiModalDecoder(
            dim,
            llm_decoder_depth,
            dim_head,
            heads,
        )
        self.to_patch_embedding = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim),
            nn.LayerNorm(dim),
        )
        
        
        # Embedding for the tokens
        self.embedding = nn.Embedding(num_tokens, dim)

    def forward(self, text: Tensor, img: Tensor) -> Tensor:
        """
        Forward pass of the ScreenAI module.

        Args:
            text (Tensor): Input text tensor.
            img (Tensor): Input image tensor.

        Returns:
            Tensor: Output tensor.
        """
        text = self.embedding(text)
        # Aspect ratio preserving grid with max e.g 25 patches, output needs to be 4
        x = rearrange(
            img,
            "b c (h p1) (w p2) -> b c (h p1) (w p2)",
            p1=self.patch_size,
            p2=self.patch_size,
        )

        # vit
        img = self.vit(img, return_embeddings=True)
        print(f"Image shape: {img.shape}")

        # Embed image
        # img = self.image_embedding(img)
        img = self.to_patch_embedding(img)

        # Concatenate image and text
        x = torch.cat((img, text), dim=1)
        print(x.shape)

        # T5 Multimodal encoder
        x = self.encoder(x)

        # Pass the k, v values into the cross attention of llm
        x = self.decoder(x)

        # To out
        x = self.to_out(x)

        return x
Download .txt
gitextract_vttoceip/

├── .github/
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   ├── PULL_REQUEST_TEMPLATE.yml
│   ├── dependabot.yml
│   ├── labeler.yml
│   └── workflows/
│       ├── code_quality_control.yml
│       ├── cos_integration.yml
│       ├── docs.yml
│       ├── docs_test.yml
│       ├── label.yml
│       ├── lints.yml
│       ├── pr_request_checks.yml
│       ├── pull-request-links.yml
│       ├── pylint.yml
│       ├── python-publish.yml
│       ├── quality.yml
│       ├── ruff.yml
│       ├── run_test.yml
│       ├── stale.yml
│       ├── test.yml
│       ├── testing.yml
│       ├── unit-test.yml
│       └── welcome.yml
├── .gitignore
├── .pre-commit-config.yaml
├── .readthedocs.yml
├── LICENSE
├── README.md
├── example.py
├── pyproject.toml
├── requirements.txt
└── screenai/
    ├── __init__.py
    └── main.py
Download .txt
SYMBOL INDEX (25 symbols across 1 files)

FILE: screenai/main.py
  function exists (line 20) | def exists(val):
  function default (line 24) | def default(val, d):
  function pair (line 28) | def pair(val):
  function divisible_by (line 32) | def divisible_by(numer, denom):
  function dynamic_patching (line 36) | def dynamic_patching(x, patch_size, image_size):
  function pad_dim_to (line 58) | def pad_dim_to(t, length, dim=0):
  function all_gather_variable_batch (line 64) | def all_gather_variable_batch(t):
  class AllGather (line 102) | class AllGather(Function):
    method forward (line 104) | def forward(ctx, x):
    method backward (line 111) | def backward(ctx, grads):
  class EmbedToLatents (line 127) | class EmbedToLatents(nn.Module):
    method __init__ (line 128) | def __init__(self, dim, dim_latents):
    method forward (line 132) | def forward(self, x):
  class CrossAttention (line 141) | class CrossAttention(nn.Module):
    method __init__ (line 155) | def __init__(
    method forward (line 197) | def forward(self, x, context):
  class MultiModalEncoder (line 249) | class MultiModalEncoder(nn.Module):
    method __init__ (line 270) | def __init__(
    method forward (line 297) | def forward(self, x: Tensor) -> Tensor:
  class MultiModalDecoder (line 316) | class MultiModalDecoder(nn.Module):
    method __init__ (line 337) | def __init__(
    method forward (line 368) | def forward(self, x: Tensor) -> Tensor:
  class ScreenAI (line 376) | class ScreenAI(nn.Module):
    method __init__ (line 412) | def __init__(
    method forward (line 487) | def forward(self, text: Tensor, img: Tensor) -> Tensor:
Condensed preview — 34 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (41K chars).
[
  {
    "path": ".github/FUNDING.yml",
    "chars": 673,
    "preview": "# These are supported funding model platforms\n\ngithub: [kyegomez]\npatreon: # Replace with a single Patreon username\nopen"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "chars": 682,
    "preview": "---\nname: Bug report\nabout: Create a detailed report on the bug and it's root cause. Conduct root cause error analysis\nt"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "chars": 603,
    "preview": "---\nname: Feature request\nabout: Suggest an idea for this project\ntitle: ''\nlabels: ''\nassignees: 'kyegomez'\n\n---\n\n**Is "
  },
  {
    "path": ".github/PULL_REQUEST_TEMPLATE.yml",
    "chars": 1004,
    "preview": "<!-- Thank you for contributing to Zeta!\n\nReplace this comment with:\n  - Description: a description of the change, \n  - "
  },
  {
    "path": ".github/dependabot.yml",
    "chars": 366,
    "preview": "# https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated-automatically/configu"
  },
  {
    "path": ".github/labeler.yml",
    "chars": 396,
    "preview": "# this is a config file for the github action labeler\n\n# Add 'label1' to any changes within 'example' folder or any subf"
  },
  {
    "path": ".github/workflows/code_quality_control.yml",
    "chars": 686,
    "preview": "name: Linting and Formatting\n\non:\n  push:\n    branches:\n      - main\n\njobs:\n  lint_and_format:\n    runs-on: ubuntu-lates"
  },
  {
    "path": ".github/workflows/cos_integration.yml",
    "chars": 882,
    "preview": "name: Continuous Integration\n\non:\n  push:\n    branches:\n      - main\n\njobs:\n  test:\n    runs-on: ubuntu-latest\n    steps"
  },
  {
    "path": ".github/workflows/docs.yml",
    "chars": 390,
    "preview": "name: Docs WorkFlow\n\non:\n  push:\n    branches:\n      - master\n      - main\n      - develop\njobs:\n  deploy:\n    runs-on: "
  },
  {
    "path": ".github/workflows/docs_test.yml",
    "chars": 544,
    "preview": "name: Documentation Tests\n\non:\n  push:\n    branches:\n      - master\n\njobs:\n  test:\n    runs-on: ubuntu-latest\n\n    steps"
  },
  {
    "path": ".github/workflows/label.yml",
    "chars": 543,
    "preview": "# This workflow will triage pull requests and apply a label based on the\n# paths that are modified in the pull request.\n"
  },
  {
    "path": ".github/workflows/lints.yml",
    "chars": 441,
    "preview": "name: Linting\n\non:\n  push:\n    branches:\n      - master\n\njobs:\n  lint:\n    runs-on: ubuntu-latest\n\n    steps:\n      - na"
  },
  {
    "path": ".github/workflows/pr_request_checks.yml",
    "chars": 506,
    "preview": "name: Pull Request Checks\n\non:\n  pull_request:\n    branches:\n      - master\n\njobs:\n  test:\n    runs-on: ubuntu-latest\n\n "
  },
  {
    "path": ".github/workflows/pull-request-links.yml",
    "chars": 307,
    "preview": "name: readthedocs/actions\non:\n  pull_request_target:\n    types:\n      - opened\n    paths:\n      - \"docs/**\"\n\npermissions"
  },
  {
    "path": ".github/workflows/pylint.yml",
    "chars": 561,
    "preview": "name: Pylint\n\non: [push]\n\njobs:\n  build:\n    runs-on: ubuntu-latest\n    strategy:\n      matrix:\n        python-version: "
  },
  {
    "path": ".github/workflows/python-publish.yml",
    "chars": 669,
    "preview": "\nname: Upload Python Package\n\non:\n  release:\n    types: [published]\n\npermissions:\n  contents: read\n\njobs:\n  deploy:\n\n   "
  },
  {
    "path": ".github/workflows/quality.yml",
    "chars": 513,
    "preview": "name: Quality\n\non:\n  push:\n    branches: [ \"main\" ]\n  pull_request:\n    branches: [ \"main\" ]\n\njobs:\n  lint:\n    runs-on:"
  },
  {
    "path": ".github/workflows/ruff.yml",
    "chars": 164,
    "preview": "name: Ruff\non: [ push, pull_request ]\njobs:\n  ruff:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout"
  },
  {
    "path": ".github/workflows/run_test.yml",
    "chars": 533,
    "preview": "name: Python application test\n\non: [push]\n\njobs:\n  build:\n\n    runs-on: ubuntu-latest\n\n    steps:\n    - uses: actions/ch"
  },
  {
    "path": ".github/workflows/stale.yml",
    "chars": 716,
    "preview": "# This workflow warns and then closes issues and PRs that have had no activity for a specified amount of time.\n#\n# You c"
  },
  {
    "path": ".github/workflows/test.yml",
    "chars": 1294,
    "preview": "name: test\n\non:\n  push:\n    branches: [master]\n  pull_request:\n  workflow_dispatch:\n\nenv:\n  POETRY_VERSION: \"1.4.2\"\n\njob"
  },
  {
    "path": ".github/workflows/testing.yml",
    "chars": 441,
    "preview": "name: Unit Tests\n\non:\n  push:\n    branches:\n      - master\n\njobs:\n  test:\n    runs-on: ubuntu-latest\n\n    steps:\n      -"
  },
  {
    "path": ".github/workflows/unit-test.yml",
    "chars": 637,
    "preview": "name: build\n\non:\n  push:\n    branches: [ main ]\n  pull_request:\n    branches: [ main ]\n\njobs:\n\n  build:\n\n    runs-on: ub"
  },
  {
    "path": ".github/workflows/welcome.yml",
    "chars": 575,
    "preview": "name: Welcome WorkFlow\n\non:\n  issues:\n    types: [opened]\n  pull_request_target:\n    types: [opened]\n\njobs:\n  build:\n   "
  },
  {
    "path": ".gitignore",
    "chars": 3108,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n.vscode/\n.vscode\n\n# Distr"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 466,
    "preview": "repos:\n  - repo: https://github.com/ambv/black\n    rev: 22.3.0\n    hooks:\n    - id: black\n  - repo: https://github.com/c"
  },
  {
    "path": ".readthedocs.yml",
    "chars": 159,
    "preview": "version: 2\n\nbuild:\n  os: ubuntu-22.04\n  tools:\n    python: \"3.11\"\n\nmkdocs:\n  configuration: mkdocs.yml\n\npython:\n   insta"
  },
  {
    "path": "LICENSE",
    "chars": 1074,
    "preview": "MIT License\n\nCopyright (c) 2023 Eternal Reclaimer\n\nPermission is hereby granted, free of charge, to any person obtaining"
  },
  {
    "path": "README.md",
    "chars": 1547,
    "preview": "[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)\n\n# Screen AI\nImplementation of the ScreenAI model fr"
  },
  {
    "path": "example.py",
    "chars": 658,
    "preview": "import torch\nfrom screenai.main import ScreenAI\n\n# Create a tensor for the image\nimage = torch.rand(1, 3, 224, 224)\n\n# C"
  },
  {
    "path": "pyproject.toml",
    "chars": 1337,
    "preview": "[build-system]\nrequires = [\"poetry-core>=1.0.0\"]\nbuild-backend = \"poetry.core.masonry.api\"\n\n[tool.poetry]\nname = \"screen"
  },
  {
    "path": "requirements.txt",
    "chars": 30,
    "preview": "torch\nzetascale\neinops\ntorch \n"
  },
  {
    "path": "screenai/__init__.py",
    "chars": 214,
    "preview": "from screenai.main import (\n    CrossAttention,\n    MultiModalEncoder,\n    MultiModalDecoder,\n    ScreenAI,\n)\n\n\n__all__ "
  },
  {
    "path": "screenai/main.py",
    "chars": 14380,
    "preview": "import torch\nimport torch.distributed as dist\nimport torch.nn.functional as F\nfrom einops import rearrange\nfrom torch im"
  }
]

About this extraction

This page contains the full source code of the kyegomez/ScreenAI GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 34 files (36.2 KB), approximately 10.2k tokens, and a symbol index with 25 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Copied to clipboard!