[
  {
    "path": ".github/FUNDING.yml",
    "content": "# These are supported funding model platforms\n\ngithub: [kyegomez]\npatreon: # Replace with a single Patreon username\nopen_collective: # Replace with a single Open Collective username\nko_fi: # Replace with a single Ko-fi username\ntidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel\ncommunity_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry\nliberapay: # Replace with a single Liberapay username\nissuehunt: # Replace with a single IssueHunt username\notechie: # Replace with a single Otechie username\nlfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry\ncustom: #Nothing\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "content": "---\nname: Bug report\nabout: Create a detailed report on the bug and it's root cause. Conduct root cause error analysis\ntitle: \"[BUG] \"\nlabels: bug\nassignees: kyegomez\n\n---\n\n**Describe the bug**\nA clear and concise description of what the bug is and what the main root cause error is. Test very thoroughly before submitting.\n\n**To Reproduce**\nSteps to reproduce the behavior:\n1. Go to '...'\n2. Click on '....'\n3. Scroll down to '....'\n4. See error\n\n**Expected behavior**\nA clear and concise description of what you expected to happen.\n\n**Screenshots**\nIf applicable, add screenshots to help explain your problem.\n\n**Additional context**\nAdd any other context about the problem here.\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "content": "---\nname: Feature request\nabout: Suggest an idea for this project\ntitle: ''\nlabels: ''\nassignees: 'kyegomez'\n\n---\n\n**Is your feature request related to a problem? Please describe.**\nA clear and concise description of what the problem is. Ex. I'm always frustrated when [...]\n\n**Describe the solution you'd like**\nA clear and concise description of what you want to happen.\n\n**Describe alternatives you've considered**\nA clear and concise description of any alternative solutions or features you've considered.\n\n**Additional context**\nAdd any other context or screenshots about the feature request here.\n"
  },
  {
    "path": ".github/PULL_REQUEST_TEMPLATE.yml",
    "content": "<!-- Thank you for contributing to Zeta!\n\nReplace this comment with:\n  - Description: a description of the change, \n  - Issue: the issue # it fixes (if applicable),\n  - Dependencies: any dependencies required for this change,\n  - Tag maintainer: for a quicker response, tag the relevant maintainer (see below),\n  - Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out!\n\nIf you're adding a new integration, please include:\n  1. a test for the integration, preferably unit tests that do not rely on network access,\n  2. an example notebook showing its use.\n\nMaintainer responsibilities:\n  - nn / Misc / if you don't know who to tag: kye@apac.ai\n  - tokenizers: kye@apac.ai\n  - training / Prompts: kye@apac.ai\n  - models: kye@apac.ai\n\nIf no one reviews your PR within a few days, feel free to kye@apac.ai\n\nSee contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/kyegomez/zeta"
  },
  {
    "path": ".github/dependabot.yml",
    "content": "# https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated-automatically/configuration-options-for-dependency-updates\n\nversion: 2\nupdates:\n  - package-ecosystem: \"github-actions\"\n    directory: \"/\"\n    schedule:\n      interval: \"weekly\"\n\n  - package-ecosystem: \"pip\"\n    directory: \"/\"\n    schedule:\n      interval: \"weekly\"\n\n"
  },
  {
    "path": ".github/labeler.yml",
    "content": "# this is a config file for the github action labeler\n\n# Add 'root' label to any root file changes\n# Quotation marks are required for the leading asterisk\nroot:\n- changed-files:\n  - any-glob-to-any-file: '*'\n\n# Add 'Documentation' label to any changes within 'docs' folder or any subfolders\nDocumentation:\n- changed-files:\n  - any-glob-to-any-file: docs/**\n\n# Add 'Tests' label to any file changes within 'docs' folder\nTests:\n- changed-files:\n  - any-glob-to-any-file: tests/*\n\n# Add 'Documentation' label to any file changes within 'docs' or 'guides' folders\nghactions:\n- changed-files:\n  - any-glob-to-any-file:\n    - .github/workflows/*\n    - .github/*\n\n# Add 'Scripts' label to any file changes within 'docs' folder\nScripts:\n- changed-files:\n  - any-glob-to-any-file: scripts/*\n  \n## Equivalent of the above mentioned configuration using another syntax\nDocumentation:\n- changed-files:\n  - any-glob-to-any-file: ['docs/*', 'guides/*']\n\n# Add 'Documentation' label to any change to .md files within the entire repository \nDocumentation:\n- changed-files:\n  - any-glob-to-any-file: '**/*.md'\n\n# Add 'source' label to any change to src files within the source dir EXCEPT for the docs sub-folder\nsource:\n- all:\n  - changed-files:\n    - any-glob-to-any-file: 'src/**/*'\n    - all-globs-to-all-files: '!src/docs/*'\n\n# Add 'feature' label to any PR where the head branch name starts with `feature` or has a `feature` section in the name\nfeature:\n - head-branch: ['^feature', 'feature']\n\n# Add 'release' label to any PR that is opened against the `main` branch\nrelease:\n - base-branch: 'main'\n"
  },
  {
    "path": ".github/workflows/code_quality_control.yml",
    "content": "name: Linting and Formatting\n\non:\n  push:\n    branches:\n      - main\n\njobs:\n  lint_and_format:\n    runs-on: ubuntu-latest\n\n    steps:\n      - name: Checkout code\n        uses: actions/checkout@v4\n\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.10'\n\n      - name: Install dependencies\n        run: pip install --no-cache-dir -r requirements.txt\n\n      - name: Find Python files\n        run: find swarms_torch -name \"*.py\" -type f -exec autopep8 --in-place --aggressive --aggressive {} +\n\n      - name: Push changes\n        uses: ad-m/github-push-action@master\n        with:\n          github_token: ${{ secrets.GITHUB_TOKEN }}"
  },
  {
    "path": ".github/workflows/cos_integration.yml",
    "content": "name: Continuous Integration\n\non:\n  push:\n    branches:\n      - main\n\njobs:\n  test:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Checkout code\n        uses: actions/checkout@v4\n\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.10'\n\n      - name: Install dependencies\n        run: pip install --no-cache-dir -r requirements.txt\n\n      - name: Run unit tests\n        run: pytest tests/unit\n\n      - name: Run integration tests\n        run: pytest tests/integration\n\n      - name: Run code coverage\n        run: pytest --cov=swarms tests/\n\n      - name: Run linters\n        run: pylint swarms\n\n      - name: Build documentation\n        run: make docs\n\n      - name: Validate documentation\n        run: sphinx-build -b linkcheck docs build/docs\n\n      - name: Run performance tests\n        run: pytest tests/performance"
  },
  {
    "path": ".github/workflows/docs.yml",
    "content": "name: Docs WorkFlow\n\non:\n  push:\n    branches:\n      - master\n      - main\n      - develop\njobs:\n  deploy:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n      - uses: actions/setup-python@v5\n        with:\n          python-version: '3.10'\n      - run: pip install mkdocs-material\n      - run: pip install \"mkdocstrings[python]\"\n      - run: mkdocs gh-deploy --force"
  },
  {
    "path": ".github/workflows/docs_test.yml",
    "content": "name: Documentation Tests\n\non:\n  push:\n    branches:\n      - master\n\njobs:\n  test:\n    runs-on: ubuntu-latest\n\n    steps:\n      - name: Checkout code\n        uses: actions/checkout@v4\n\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.10'\n\n      - name: Install dependencies\n        run: pip install --no-cache-dir -r requirements.txt\n\n      - name: Build documentation\n        run: make docs\n\n      - name: Validate documentation\n        run: sphinx-build -b linkcheck docs build/docs"
  },
  {
    "path": ".github/workflows/label.yml",
    "content": "# This workflow will triage pull requests and apply a label based on the\n# paths that are modified in the pull request.\n#\n# To use this workflow, you will need to set up a .github/labeler.yml\n# file with configuration.  For more information, see:\n# https://github.com/actions/labeler\n\nname: Labeler\non: [pull_request_target]\n\njobs:\n  label:\n\n    runs-on: ubuntu-latest\n    permissions:\n      contents: read\n      pull-requests: write\n\n    steps:\n    - uses: actions/labeler@v5.0.0\n      with:\n        repo-token: \"${{ secrets.GITHUB_TOKEN }}\"\n"
  },
  {
    "path": ".github/workflows/lints.yml",
    "content": "name: Linting\n\non:\n  push:\n    branches:\n      - master\n\njobs:\n  lint:\n    runs-on: ubuntu-latest\n\n    steps:\n      - name: Checkout code\n        uses: actions/checkout@v4\n\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.10'\n\n      - name: Install dependencies\n        run: pip install --no-cache-dir -r requirements.txt\n\n      - name: Run linters\n        run: pylint swarms_torch"
  },
  {
    "path": ".github/workflows/pr_request_checks.yml",
    "content": "name: Pull Request Checks\n\non:\n  pull_request:\n    branches:\n      - master\n\njobs:\n  test:\n    runs-on: ubuntu-latest\n\n    steps:\n      - name: Checkout code\n        uses: actions/checkout@v4\n\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.10'\n\n      - name: Install dependencies\n        run: pip install --no-cache-dir -r requirements.txt\n\n      - name: Run tests and checks\n        run: |\n          pytest tests/\n          pylint swarms_torch"
  },
  {
    "path": ".github/workflows/pull-request-links.yml",
    "content": "name: readthedocs/actions\non:\n  pull_request_target:\n    types:\n      - opened\n    paths:\n      - \"docs/**\"\n\npermissions:\n  pull-requests: write\n\njobs:\n  pull-request-links:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: readthedocs/actions/preview@v1\n        with:\n          project-slug: swarms_torch"
  },
  {
    "path": ".github/workflows/pylint.yml",
    "content": "name: Pylint\n\non: [push]\n\njobs:\n  build:\n    runs-on: ubuntu-latest\n    strategy:\n      matrix:\n        python-version: [\"3.9\", \"3.10\"]\n    steps:\n    - uses: actions/checkout@v4\n    - name: Set up Python ${{ matrix.python-version }}\n      uses: actions/setup-python@v5\n      with:\n        python-version: ${{ matrix.python-version }}\n    - name: Install dependencies\n      run: |\n        python -m pip install --no-cache-dir --upgrade pip\n        pip install pylint\n    - name: Analysing the code with pylint\n      run: |\n        pylint $(git ls-files '*.py')\n"
  },
  {
    "path": ".github/workflows/python-publish.yml",
    "content": "\nname: Upload Python Package\n\non:\n  release:\n    types: [published]\n\npermissions:\n  contents: read\n\njobs:\n  deploy:\n\n    runs-on: ubuntu-latest\n\n    steps:\n    - uses: actions/checkout@v4\n    - name: Set up Python\n      uses: actions/setup-python@v5\n      with:\n        python-version: '3.10'\n    - name: Install dependencies\n      run: |\n        python -m pip install --no-cache-dir --upgrade pip\n        pip install build\n    - name: Build package\n      run: python -m build\n    - name: Publish package\n      uses: pypa/gh-action-pypi-publish@ec4db0b4ddc65acdf4bff5fa45ac92d78b56bdf0\n      with:\n        user: __token__\n        password: ${{ secrets.PYPI_API_TOKEN }}"
  },
  {
    "path": ".github/workflows/quality.yml",
    "content": "name: Quality\n\non:\n  push:\n    branches: [ \"main\" ]\n  pull_request:\n    branches: [ \"main\" ]\n\njobs:\n  lint:\n    runs-on: ubuntu-latest\n    strategy:\n      fail-fast: false\n    steps:\n      - name: Checkout actions\n        uses: actions/checkout@v4\n        with:\n          fetch-depth: 0\n      - name: Init environment \n        uses: ./.github/actions/init-environment \n      - name: Run linter\n        run: |\n          pylint `git diff --name-only --diff-filter=d origin/main HEAD | grep -E '\\.py$' | tr '\\n' ' '`"
  },
  {
    "path": ".github/workflows/ruff.yml",
    "content": "name: Ruff\non: [ push, pull_request ]\njobs:\n  ruff:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n      - uses: chartboost/ruff-action@v1\n"
  },
  {
    "path": ".github/workflows/run_test.yml",
    "content": "name: Python application test\n\non: [push]\n\njobs:\n  build:\n\n    runs-on: ubuntu-latest\n\n    steps:\n    - uses: actions/checkout@v4\n    - name: Set up Python 3.10\n      uses: actions/setup-python@v5\n      with:\n        python-version: '3.10'\n    - name: Install dependencies\n      run: |\n        python -m pip install --no-cache-dir --upgrade pip\n        pip install pytest\n        if [ -f requirements.txt ]; then pip install --no-cache-dir -r requirements.txt; fi\n    - name: Run tests with pytest\n      run: |\n        pytest tests/\n"
  },
  {
    "path": ".github/workflows/stale.yml",
    "content": "# This workflow warns and then closes issues and PRs that have had no activity for a specified amount of time.\n#\n# You can adjust the behavior by modifying this file.\n# For more information, see:\n# https://github.com/actions/stale\nname: Mark stale issues and pull requests\n\non:\n  schedule:\n  - cron: '26 12 * * *'\n\njobs:\n  stale:\n\n    runs-on: ubuntu-latest\n    permissions:\n      issues: write\n      pull-requests: write\n\n    steps:\n    - uses: actions/stale@v9\n      with:\n        repo-token: ${{ secrets.GITHUB_TOKEN }}\n        stale-issue-message: 'Stale issue message'\n        stale-pr-message: 'Stale pull request message'\n        stale-issue-label: 'no-issue-activity'\n        stale-pr-label: 'no-pr-activity'"
  },
  {
    "path": ".github/workflows/test.yml",
    "content": "name: test\n\non:\n  push:\n    branches: [master]\n  pull_request:\n  workflow_dispatch:\n\nenv:\n  POETRY_VERSION: \"1.4.2\"\n\njobs:\n  build:\n    runs-on: ubuntu-latest\n    strategy:\n      matrix:\n        python-version:\n          - \"3.9\"\n          - \"3.10\"\n          - \"3.11\"\n        test_type:\n          - \"core\"\n          - \"extended\"\n    name: Python ${{ matrix.python-version }} ${{ matrix.test_type }}\n    steps:\n      - uses: actions/checkout@v4\n      - name: Set up Python ${{ matrix.python-version }}\n        uses: \"./.github/actions/poetry_setup\"\n        with:\n          python-version: ${{ matrix.python-version }}\n          poetry-version: \"1.4.2\"\n          cache-key: ${{ matrix.test_type }}\n          install-command: |\n              if [ \"${{ matrix.test_type }}\" == \"core\" ]; then\n                echo \"Running core tests, installing dependencies with poetry...\"\n                poetry install\n              else\n                echo \"Running extended tests, installing dependencies with poetry...\"\n                poetry install -E extended_testing\n              fi\n      - name: Run ${{matrix.test_type}} tests\n        run: |\n          if [ \"${{ matrix.test_type }}\" == \"core\" ]; then\n            make test\n          else\n            make extended_tests\n          fi\n        shell: bash"
  },
  {
    "path": ".github/workflows/testing.yml",
    "content": "name: Unit Tests\n\non:\n  push:\n    branches:\n      - master\n\njobs:\n  test:\n    runs-on: ubuntu-latest\n\n    steps:\n      - name: Checkout code\n        uses: actions/checkout@v4\n\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.10'\n\n      - name: Install dependencies\n        run: pip install --no-cache-dir -r requirements.txt\n\n      - name: Run unit tests\n        run: pytest tests/"
  },
  {
    "path": ".github/workflows/unit-test.yml",
    "content": "name: build\n\non:\n  push:\n    branches: [ main ]\n  pull_request:\n    branches: [ main ]\n\njobs:\n\n  build:\n\n    runs-on: ubuntu-latest\n\n    steps:\n    - uses: actions/checkout@v4\n\n    - name: Setup Python\n      uses: actions/setup-python@v5\n      with:\n        python-version: '3.10'\n\n    - name: Install dependencies\n      run: pip install --no-cache-dir -r requirements.txt\n\n    - name: Run Python unit tests\n      run: python3 -m unittest tests/\n\n    - name: Verify that the Docker image for the action builds\n      run: docker build . --file Dockerfile\n\n    - name: Verify integration test results\n      run: python3 -m unittest tests/\n"
  },
  {
    "path": ".github/workflows/welcome.yml",
    "content": "name: Welcome WorkFlow\n\non:\n  issues:\n    types: [opened]\n  pull_request_target:\n    types: [opened]\n\njobs:\n  build:\n    name: 👋 Welcome\n    permissions: write-all\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/first-interaction@v1.3.0\n        with:\n          repo-token: ${{ secrets.GITHUB_TOKEN }}\n          issue-message: \"Hello there, thank you for opening an Issue ! 🙏🏻 The team was notified and they will get back to you asap.\"\n          pr-message:  \"Hello there, thank you for opening an PR ! 🙏🏻 The team was notified and they will get back to you asap.\""
  },
  {
    "path": ".gitignore",
    "content": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n.vscode/\n.vscode\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\n.ruff_cache/\nsdist/\nvar/\nwheels/\nshare/python-wheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.nox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n*.py,cover\n.hypothesis/\n.pytest_cache/\ncover/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\ndb.sqlite3\ndb.sqlite3-journal\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\n.pybuilder/\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# IPython\nprofile_default/\nipython_config.py\n\n# pyenv\n#   For a library or package, you might want to ignore these files since the code is\n#   intended to run in multiple environments; otherwise, check them in:\n# .python-version\n\n# pipenv\n#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.\n#   However, in case of collaboration, if having platform-specific dependencies or dependencies\n#   having no cross-platform support, pipenv may install dependencies that don't work, or not\n#   install all needed dependencies.\n#Pipfile.lock\n\n# poetry\n#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.\n#   This is especially recommended for binary packages to ensure reproducibility, and is more\n#   commonly ignored for libraries.\n#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control\n#poetry.lock\n\n# pdm\n#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.\n#pdm.lock\n#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it\n#   in version control.\n#   https://pdm.fming.dev/#use-with-ide\n.pdm.toml\n\n# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm\n__pypackages__/\n\n# Celery stuff\ncelerybeat-schedule\ncelerybeat.pid\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n.dmypy.json\ndmypy.json\n\n# Pyre type checker\n.pyre/\n\n# pytype static type analyzer\n.pytype/\n\n# Cython debug symbols\ncython_debug/\n\n# PyCharm\n#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can\n#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore\n#  and can be added to the global gitignore or merged into this file.  For a more nuclear\n#  option (not recommended) you can uncomment the following to ignore the entire idea folder.\n#.idea/\n"
  },
  {
    "path": ".pre-commit-config.yaml",
    "content": "repos:\n  - repo: https://github.com/ambv/black\n    rev: 22.3.0\n    hooks:\n    - id: black\n  - repo: https://github.com/charliermarsh/ruff-pre-commit\n    rev: 'v0.0.255'\n    hooks:\n      - id: ruff\n        args: [--fix]\n  - repo: https://github.com/nbQA-dev/nbQA\n    rev: 1.6.3\n    hooks:\n    - id: nbqa-black\n      additional_dependencies: [ipython==8.12, black]\n    - id: nbqa-ruff \n      args: [\"--ignore=I001\"]\n      additional_dependencies: [ipython==8.12, ruff]"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2023 Eternal Reclaimer\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "\n# Liquid Foundation Models [LFMs]\n\n[![Join our Discord](https://img.shields.io/badge/Discord-Join%20our%20server-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/agora-999382051935506503) [![Subscribe on YouTube](https://img.shields.io/badge/YouTube-Subscribe-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/@kyegomez3242) [![Connect on LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/kye-g-38759a207/) [![Follow on X.com](https://img.shields.io/badge/X.com-Follow-1DA1F2?style=for-the-badge&logo=x&logoColor=white)](https://x.com/kyegomezb)\n\nThis is an attempt to make an open source implementation of LFMs, this is obviously not the official repository because it's closed source. I link papers below which I am using as a referrence.\n[Discover more about the model from the original article](https://www.liquid.ai/liquid-foundation-models)\n\n## Installation\n```bash\n$ pip3 install -U lfm-torch\n```\n\n## Usage\n\n```python\nimport torch\nfrom lfm_torch.model import LFModel\nfrom loguru import logger\n\n# Instantiate and test the model\nif __name__ == \"__main__\":\n    batch_size, seq_length, embedding_dim = 32, 128, 512\n    token_dim, channel_dim, expert_dim, adapt_dim, num_experts = (\n        embedding_dim,\n        embedding_dim,\n        embedding_dim,\n        128,\n        4,\n    )\n    model = LFModel(\n        token_dim, channel_dim, expert_dim, adapt_dim, num_experts\n    )\n\n    input_tensor = torch.randn(\n        batch_size, seq_length, embedding_dim\n    )  # 3D text tensor\n    output = model(input_tensor)\n    logger.info(\"Model forward pass complete.\")\n```\n\n\n## Liquid Transformer \nA novel neural architecture combining Liquid Neural Networks, Transformer attention mechanisms, and Mixture of Experts (MoE) for enhanced adaptive processing and dynamic state updates. Very experimental and early! We're working on a training script [here](./liquid_transformer_train.py). It still needs an actual tokenizer like llama's tokenizer but it's getting there. 
If you can help with this then let me know.\n\n\n### Architecture Overview\n\n```mermaid\nflowchart TB\n    subgraph \"Liquid Transformer\"\n        Input[\"Input Sequence\"] --> TL[\"Transformer Layer\"]\n        \n        subgraph \"Transformer Layer\"\n            direction TB\n            MHA[\"Multi-Head Attention\"] --> LC[\"Liquid Cell\"]\n            LC --> MOE[\"Mixture of Experts\"]\n            MOE --> LN[\"Layer Norm + Residual\"]\n        end\n        \n        subgraph \"Liquid Cell Details\"\n            direction LR\n            HS[\"Hidden State\"] --> WH[\"W_h Linear\"]\n            Input2[\"Input\"] --> WI[\"W_in Linear\"]\n            WH --> Add((+))\n            WI --> Add\n            Add --> Act[\"Activation\"]\n            Act --> LN2[\"LayerNorm\"]\n            LN2 --> DO[\"Dropout\"]\n        end\n        \n        subgraph \"MoE Details\"\n            direction TB\n            Input3[\"Input\"] --> Gate[\"Gating Network\"]\n            Input3 --> E1[\"Expert 1\"]\n            Input3 --> E2[\"Expert 2\"]\n            Input3 --> E3[\"Expert N\"]\n            Gate --> Comb[\"Weighted Combination\"]\n            E1 --> Comb\n            E2 --> Comb\n            E3 --> Comb\n        end\n        \n        TL --> Output[\"Output Sequence\"]\n    end\n```\n\n\n\n```python\nimport torch\nfrom loguru import logger\n\nfrom lfm_torch.liquid_t_moe import LiquidTransformer\n\n# Example usage\nif __name__ == \"__main__\":\n    seq_len, batch_size, embed_size = 10, 2, 64\n    num_heads, num_experts, expert_size, num_layers = 8, 4, 64, 6\n\n    # Create the model\n    model = LiquidTransformer(embed_size, num_heads, num_experts, expert_size, num_layers)\n\n    # Example input tensor\n    x = torch.randn(seq_len, batch_size, embed_size)\n\n    # Forward pass\n    output = model(x)\n    logger.info(f\"Model output shape: {output.shape}\")\n```\n\n\n# Citations\n- All credit for the liquid transformer architecture goes to the original authors from liquid.ai\n- https://arxiv.org/abs/2209.12951\n- \n\n# License\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n"
  },
  {
    "path": "example.py",
    "content": "import torch\nfrom lfm_torch.model import LFModel\nfrom loguru import logger\n\n# Instantiate and test the model\nif __name__ == \"__main__\":\n    batch_size, seq_length, embedding_dim = 32, 128, 512\n    token_dim, channel_dim, expert_dim, adapt_dim, num_experts = (\n        embedding_dim,\n        embedding_dim,\n        embedding_dim,\n        128,\n        4,\n    )\n    model = LFModel(\n        token_dim, channel_dim, expert_dim, adapt_dim, num_experts\n    )\n\n    input_tensor = torch.randn(\n        batch_size, seq_length, embedding_dim\n    )  # 3D text tensor\n    output = model(input_tensor)\n    logger.info(\"Model forward pass complete.\")\n"
  },
  {
    "path": "lfm_torch/__init__.py",
    "content": "from lfm_torch.model import LFModel\nfrom lfm_torch.liquid_t_moe import LiquidTransformer\n\n__all__ = [\"LFModel\", \"LiquidTransformer\"]\n"
  },
  {
    "path": "lfm_torch/liquid_t_moe.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom torch import Tensor\nfrom loguru import logger\n\n# from zeta import MixtureOfExperts, FeedForward\n\n# Logging Configuration\nlogger.add(\n    \"liquid_transformer.log\",\n    format=\"{time} {level} {message}\",\n    level=\"DEBUG\",\n)\n\n\nclass LiquidCell(nn.Module):\n    \"\"\"\n    Liquid Neural Network Cell with enhanced production-readiness.\n\n    This liquid cell dynamically updates its hidden state with input features and\n    continuously adapts the internal state over time using non-linear updates.\n\n    Args:\n        input_size (int): The size of the input features.\n        hidden_size (int): The size of the hidden state.\n        dropout (float): Dropout rate for regularization.\n        layer_norm (bool): Whether to apply layer normalization to stabilize updates.\n    \"\"\"\n\n    def __init__(\n        self,\n        input_size: int,\n        hidden_size: int,\n        dropout: float = 0.1,\n        layer_norm: bool = True,\n    ):\n        super(LiquidCell, self).__init__()\n        self.input_size = input_size\n        self.hidden_size = hidden_size\n        self.dropout = nn.Dropout(dropout)\n\n        # Linear layers for input-to-hidden and hidden-to-hidden connections\n        self.w_in = nn.Linear(input_size, hidden_size)\n        self.w_h = nn.Linear(hidden_size, hidden_size)\n\n        # Optionally add layer normalization\n        self.layer_norm = (\n            nn.LayerNorm(hidden_size) if layer_norm else None\n        )\n\n        # Stable non-linear activation (can switch to ReLU or GELU)\n        self.activation = nn.Tanh()\n\n        logger.info(\n            f\"Initialized LiquidCell with input_size={input_size}, hidden_size={hidden_size}, dropout={dropout}\"\n        )\n\n    def forward(self, x: Tensor, h: Tensor) -> Tensor:\n        \"\"\"\n        Forward pass of the LiquidCell.\n\n        Args:\n            x (Tensor): Input tensor of shape (batch_size, input_size).\n            h (Tensor): Hidden state tensor of shape (batch_size, hidden_size).\n\n        Returns:\n            Tensor: Updated hidden state of shape (batch_size, hidden_size).\n        \"\"\"\n        logger.debug(\n            f\"Input shape: {x.shape}, Hidden state shape: {h.shape}\"\n        )\n\n        # Update hidden state with dynamic input and previous hidden state\n        new_h = self.activation(self.w_in(x) + self.w_h(h))\n\n        # Optionally apply layer normalization\n        if self.layer_norm:\n            new_h = self.layer_norm(new_h)\n\n        # Apply dropout for regularization\n        new_h = self.dropout(new_h)\n\n        logger.debug(f\"Updated hidden state shape: {new_h.shape}\")\n        return new_h\n\n    def initialize_hidden_state(\n        self, batch_size: int, device: torch.device\n    ) -> Tensor:\n        \"\"\"\n        Initialize the hidden state dynamically for the given batch size and device.\n\n        Args:\n            batch_size (int): The batch size for which the hidden state is initialized.\n            device (torch.device): The device (CPU or GPU) where the hidden state should reside.\n\n        Returns:\n            Tensor: Initialized hidden state of shape (batch_size, hidden_size).\n        \"\"\"\n        hidden_state = torch.zeros(batch_size, self.hidden_size).to(\n            device\n        )\n        logger.info(\n            f\"Initialized hidden state of shape {hidden_state.shape} on {device}\"\n        )\n        return 
hidden_state\n\n\nclass MixtureOfExperts(nn.Module):\n    \"\"\"\n    Mixture of Experts (MoE) Layer\n\n    Args:\n        num_experts (int): Number of experts.\n        expert_size (int): Size of each expert layer.\n        output_size (int): Output size for gating network.\n    \"\"\"\n\n    def __init__(\n        self, num_experts: int, expert_size: int, output_size: int\n    ):\n        super(MixtureOfExperts, self).__init__()\n        self.experts = nn.ModuleList(\n            [\n                nn.Linear(expert_size, output_size)\n                for _ in range(num_experts)\n            ]\n        )\n        self.gate = nn.Linear(expert_size, num_experts)\n\n    def forward(self, x: Tensor) -> Tensor:\n        \"\"\"\n        Forward pass of the MoE layer.\n\n        Args:\n            x (Tensor): Input tensor of shape (batch_size, input_size).\n\n        Returns:\n            Tensor: Output of the mixture of experts.\n        \"\"\"\n        gate_outputs = F.softmax(self.gate(x), dim=1)\n        logger.debug(f\"Gate outputs: {gate_outputs}\")\n\n        expert_outputs = torch.stack(\n            [expert(x) for expert in self.experts], dim=1\n        )\n        logger.debug(f\"Expert outputs: {expert_outputs}\")\n\n        output = torch.einsum(\n            \"be,bec->bc\", gate_outputs, expert_outputs\n        )\n        return output\n\n\nclass TransformerLayerWithLiquid(nn.Module):\n    \"\"\"\n    A single transformer block integrated with a Liquid Neural Network Cell and Mixture of Experts.\n\n    Args:\n        embed_size (int): Size of embedding.\n        num_heads (int): Number of attention heads.\n        num_experts (int): Number of experts in the MoE layer.\n        expert_size (int): Size of each expert layer.\n    \"\"\"\n\n    def __init__(\n        self,\n        embed_size: int,\n        num_heads: int,\n        num_experts: int,\n        expert_size: int,\n    ):\n        super(TransformerLayerWithLiquid, self).__init__()\n        self.attention = nn.MultiheadAttention(embed_size, num_heads)\n        self.liquid_cell = LiquidCell(embed_size, embed_size)\n        self.moe = MixtureOfExperts(\n            num_experts, embed_size, embed_size\n        )\n        # self.moe = MixtureOfExperts(\n        #     dim = embed_size,\n        #     num_experts=num_experts,\n        # )\n        self.layernorm = nn.LayerNorm(embed_size)\n\n    def forward(self, x: Tensor, hidden_state: Tensor) -> Tensor:\n        \"\"\"\n        Forward pass of the Transformer layer with Liquid Cell and Mixture of Experts.\n\n        Args:\n            x (Tensor): Input tensor of shape (seq_len, batch_size, embed_size).\n            hidden_state (Tensor): Hidden state tensor for the liquid cell (batch_size, embed_size).\n\n        Returns:\n            Tensor: Output of the transformer layer.\n        \"\"\"\n        logger.debug(\n            f\"Input shape to TransformerLayerWithLiquid: {x.shape}\"\n        )\n\n        # Self-attention\n        attention_output, _ = self.attention(x, x, x)\n        logger.debug(\n            f\"Attention output shape: {attention_output.shape}\"\n        )\n\n        # Liquid Neural Network Cell\n        hidden_state = self.liquid_cell(\n            attention_output.mean(dim=0), hidden_state\n        )\n        logger.debug(\n            f\"Updated hidden state from LiquidCell: {hidden_state.shape}\"\n        )\n\n        # Mixture of Experts\n        moe_output = self.moe(hidden_state)\n        logger.debug(f\"MoE output shape: {moe_output.shape}\")\n\n        
# Layer Norm and Residual Connection\n        output = self.layernorm(\n            attention_output + moe_output.unsqueeze(0)\n        )\n        return output\n\n\nclass LiquidTransformer(nn.Module):\n    \"\"\"\n    Transformer with multiple layers of liquid neural network cells and mixture of experts.\n\n    Args:\n        embed_size (int): Size of embedding.\n        num_heads (int): Number of attention heads.\n        num_experts (int): Number of experts in each MoE layer.\n        expert_size (int): Size of each expert.\n        num_layers (int): Number of transformer layers.\n    \"\"\"\n\n    def __init__(\n        self,\n        embed_size: int,\n        num_heads: int,\n        num_experts: int,\n        expert_size: int,\n        num_layers: int,\n    ):\n        super(LiquidTransformer, self).__init__()\n        self.layers = nn.ModuleList(\n            [\n                TransformerLayerWithLiquid(\n                    embed_size, num_heads, num_experts, expert_size\n                )\n                for _ in range(num_layers)\n            ]\n        )\n        self.hidden_state = torch.zeros(1, embed_size)\n\n    def forward(self, x: Tensor) -> Tensor:\n        \"\"\"\n        Forward pass of the Liquid Transformer.\n\n        Args:\n            x (Tensor): Input tensor of shape (seq_len, batch_size, embed_size).\n\n        Returns:\n            Tensor: Output of the transformer network.\n        \"\"\"\n        for layer in self.layers:\n            x = layer(x, self.hidden_state)\n        return x\n\n\n# # Example usage\n# if __name__ == \"__main__\":\n#     seq_len, batch_size, embed_size = 10, 2, 64\n#     num_heads, num_experts, expert_size, num_layers = 8, 4, 64, 6\n\n#     # Create the model\n#     model = LiquidTransformer(embed_size, num_heads, num_experts, expert_size, num_layers)\n\n#     # Example input tensor\n#     x = torch.randn(seq_len, batch_size, embed_size)\n\n#     # Forward pass\n#     output = model(x)\n#     logger.info(f\"Model output shape: {output.shape}\")\n"
  },
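  {
    "path": "examples/liquid_cell_moe_example.py",
    "content": "# Hypothetical usage sketch -- this file is not part of the original repo.\n# It exercises the LiquidCell and MixtureOfExperts building blocks from\n# lfm_torch/liquid_t_moe.py in isolation, using the signatures defined there.\nimport torch\n\nfrom lfm_torch.liquid_t_moe import LiquidCell, MixtureOfExperts\n\nif __name__ == \"__main__\":\n    batch_size, input_size, hidden_size = 4, 32, 64\n\n    # One liquid state update: h <- dropout(norm(tanh(W_in x + W_h h)))\n    cell = LiquidCell(input_size, hidden_size)\n    h = cell.initialize_hidden_state(batch_size, torch.device(\"cpu\"))\n    x = torch.randn(batch_size, input_size)\n    h = cell(x, h)  # (batch_size, hidden_size)\n\n    # Gate-weighted combination of per-expert linear projections\n    moe = MixtureOfExperts(\n        num_experts=4, expert_size=hidden_size, output_size=hidden_size\n    )\n    y = moe(h)  # (batch_size, hidden_size)\n    print(h.shape, y.shape)\n"
  },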
  {
    "path": "lfm_torch/model.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom loguru import logger\nfrom typing import Optional, Tuple\nfrom torch.nn.functional as F\n\nclass AdaptiveLinear(nn.Module):\n    \"\"\"\n    Adaptive Linear layer whose weight and bias adapt based on input.\n    \"\"\"\n\n    def __init__(\n        self, in_features: int, out_features: int, adapt_dim: int\n    ):\n        super(AdaptiveLinear, self).__init__()\n        self.in_features = in_features\n        self.out_features = out_features\n\n        self.weight = nn.Parameter(\n            torch.randn(out_features, in_features)\n        )\n        self.bias = nn.Parameter(torch.randn(out_features))\n\n        # Linear transformation for adapting the weight based on input\n        self.adapt = nn.Linear(adapt_dim, out_features * in_features)\n\n    def forward(\n        self, x: torch.Tensor, adapt_input: torch.Tensor\n    ) -> torch.Tensor:\n        adapt_weight = self.adapt(adapt_input).view(\n            self.out_features, self.in_features\n        )\n        weight = self.weight + adapt_weight\n        return F.linear(x, weight, self.bias)\n\n\nclass TokenMixing(nn.Module):\n    \"\"\"\n    Token mixing layer that performs token-wise interactions using adaptive linear layers.\n    Operates across the sequence dimension (sequence_length).\n    \"\"\"\n\n    def __init__(self, token_dim: int, adapt_dim: int):\n        super(TokenMixing, self).__init__()\n        self.token_mixing = AdaptiveLinear(\n            token_dim, token_dim, adapt_dim\n        )\n\n    def forward(\n        self, x: torch.Tensor, adapt_input: torch.Tensor\n    ) -> torch.Tensor:\n        # x: [batch_size, sequence_length, embedding_dim]\n        batch_size, seq_length, embed_dim = x.shape\n        x = x.view(\n            batch_size * seq_length, embed_dim\n        )  # Flatten sequence for linear transformation\n        x_mixed = self.token_mixing(x, adapt_input)\n        return x_mixed.view(batch_size, seq_length, embed_dim)\n\n\nclass ChannelMixing(nn.Module):\n    \"\"\"\n    Channel mixing layer that performs cross-channel interactions using adaptive linear layers.\n    Operates across the embedding dimension (embedding_dim).\n    \"\"\"\n\n    def __init__(self, channel_dim: int, adapt_dim: int):\n        super(ChannelMixing, self).__init__()\n        self.channel_mixing = AdaptiveLinear(\n            channel_dim, channel_dim, adapt_dim\n        )\n\n    def forward(\n        self, x: torch.Tensor, adapt_input: torch.Tensor\n    ) -> torch.Tensor:\n        # x: [batch_size, sequence_length, embedding_dim]\n        return self.channel_mixing(x, adapt_input)\n\n\nclass MixtureOfExperts(nn.Module):\n    \"\"\"\n    Mixture of Experts (MoE) module that dynamically selects experts based on input.\n    Operates after channel and token mixing.\n    \"\"\"\n\n    def __init__(\n        self, expert_dim: int, num_experts: int, adapt_dim: int\n    ):\n        super(MixtureOfExperts, self).__init__()\n        self.experts = nn.ModuleList(\n            [\n                AdaptiveLinear(expert_dim, expert_dim, adapt_dim)\n                for _ in range(num_experts)\n            ]\n        )\n        self.gating = nn.Linear(adapt_dim, num_experts)\n\n    def forward(\n        self, x: torch.Tensor, adapt_input: torch.Tensor\n    ) -> torch.Tensor:\n        gate_scores = F.softmax(self.gating(adapt_input), dim=-1)\n        output = sum(\n            gate_scores[:, i].unsqueeze(1) * expert(x, adapt_input)\n            for i, expert 
in enumerate(self.experts)\n        )\n        return output\n\n\nclass LFModel(nn.Module):\n    \"\"\"\n    Custom LF Model architecture combining token mixing, channel mixing, and MoE.\n    Accepts 3D input tensor: [batch_size, sequence_length, embedding_dim].\n    \"\"\"\n\n    def __init__(\n        self,\n        token_dim: int,\n        channel_dim: int,\n        expert_dim: int,\n        adapt_dim: int,\n        num_experts: int,\n    ):\n        super(LFModel, self).__init__()\n        self.featurizer = nn.Linear(token_dim, adapt_dim)\n        self.token_mixer = TokenMixing(token_dim, adapt_dim)\n        self.channel_mixer = ChannelMixing(channel_dim, adapt_dim)\n        self.moe = MixtureOfExperts(\n            expert_dim, num_experts, adapt_dim\n        )\n        self.output_layer = nn.Linear(expert_dim, token_dim)\n\n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        logger.info(\"Input shape: {}\", x.shape)\n\n        # Featurization stage\n        batch_size, seq_length, embed_dim = x.shape\n        adapt_input = self.featurizer(\n            x.mean(dim=1)\n        )  # Aggregate across sequence for adaptation\n        logger.info(\n            \"Featurization complete. Shape: {}\", adapt_input.shape\n        )\n\n        # Token Mixing\n        token_mixed = self.token_mixer(x, adapt_input)\n        logger.info(\n            \"Token mixing complete. Shape: {}\", token_mixed.shape\n        )\n\n        # Channel Mixing\n        channel_mixed = self.channel_mixer(token_mixed, adapt_input)\n        logger.info(\n            \"Channel mixing complete. Shape: {}\", channel_mixed.shape\n        )\n\n        # Mixture of Experts\n        expert_output = self.moe(channel_mixed, adapt_input)\n        logger.info(\n            \"Mixture of Experts complete. Shape: {}\",\n            expert_output.shape,\n        )\n\n        # Final Output\n        output = self.output_layer(expert_output)\n        logger.info(\"Output shape: {}\", output.shape)\n        return output\n\n"
  },
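  {
    "path": "examples/adaptive_linear_example.py",
    "content": "# Hypothetical usage sketch -- this file is not part of the original repo.\n# It shows how AdaptiveLinear from lfm_torch/model.py conditions its weight\n# on a separate adaptation vector, using the signatures defined in that module.\nimport torch\n\nfrom lfm_torch.model import AdaptiveLinear\n\nif __name__ == \"__main__\":\n    batch_size, in_features, out_features, adapt_dim = 8, 16, 16, 4\n\n    layer = AdaptiveLinear(in_features, out_features, adapt_dim)\n\n    x = torch.randn(batch_size, in_features)\n    # e.g. pooled sequence features, as produced by LFModel's featurizer\n    adapt_input = torch.randn(batch_size, adapt_dim)\n\n    # Effective weight = base weight + batch-averaged adaptation delta\n    y = layer(x, adapt_input)\n    print(y.shape)  # torch.Size([8, 16])\n"
  },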
  {
    "path": "lfm_torch/rnn.py",
    "content": "import torch\nimport torch.nn as nn\nfrom loguru import logger\n\nlogger.add(\"liquid_neural_net.log\", rotation=\"500 MB\", level=\"INFO\")\n\n\nclass LiquidNeuron(nn.Module):\n    \"\"\"\n    A single neuron in a liquid neural network with time-varying dynamics.\n\n    Attributes:\n        input_size (int): Size of the input.\n        hidden_size (int): Size of the hidden state.\n        tau (float): Time constant to control the neuron dynamics.\n    \"\"\"\n\n    def __init__(\n        self, input_size: int, hidden_size: int, tau: float = 0.1\n    ):\n        \"\"\"\n        Initialize the LiquidNeuron with the given input and hidden size.\n\n        Args:\n            input_size (int): Size of the input.\n            hidden_size (int): Size of the hidden state.\n            tau (float): Time constant that controls the update speed of the neuron state.\n        \"\"\"\n        super(LiquidNeuron, self).__init__()\n        self.input_size = input_size\n        self.hidden_size = hidden_size\n        self.tau = tau  # Time constant for neuron dynamics\n\n        # Parameters: weights and biases for input and hidden connections\n        self.W_input = nn.Parameter(\n            torch.randn(hidden_size, input_size)\n        )\n        self.W_hidden = nn.Parameter(\n            torch.randn(hidden_size, hidden_size)\n        )\n        self.bias = nn.Parameter(torch.zeros(hidden_size))\n\n        # Initial hidden state (zero-initialized)\n        self.state = torch.zeros(hidden_size)\n\n    def forward(\n        self, x: torch.Tensor, previous_state: torch.Tensor\n    ) -> torch.Tensor:\n        \"\"\"\n        Forward pass through the liquid neuron.\n\n        The state of the neuron evolves dynamically based on the input and the previous state.\n\n        Equation: s(t+1) = (1 - tau) * s(t) + tau * tanh(W_input * x(t) + W_hidden * s(t) + b)\n        Reference: Hasani, Ramin, et al. 
\"Liquid time-constant networks\" (2021).\n\n        Args:\n            x (torch.Tensor): Input tensor of shape (batch_size, input_size).\n            previous_state (torch.Tensor): The previous state of the neuron.\n\n        Returns:\n            torch.Tensor: The updated state of the neuron.\n        \"\"\"\n        # Dynamic state update based on a differential equation for liquid neuron behavior\n        new_state = (\n            1 - self.tau\n        ) * previous_state + self.tau * torch.tanh(\n            torch.matmul(x, self.W_input.T)\n            + torch.matmul(previous_state, self.W_hidden.T)\n            + self.bias\n        )\n        return new_state\n\n\nclass LiquidRNN(nn.Module):\n    \"\"\"\n    A recurrent neural network (RNN) built using liquid neurons.\n\n    Attributes:\n        input_size (int): Size of the input.\n        hidden_size (int): Size of the hidden state.\n        output_size (int): Size of the output (vocabulary size).\n        tau (float): Time constant for neuron dynamics.\n    \"\"\"\n\n    def __init__(\n        self,\n        input_size: int,\n        hidden_size: int,\n        output_size: int,\n        tau: float = 0.1,\n    ):\n        \"\"\"\n        Initialize the LiquidRNN with the given input size, hidden size, and output size.\n\n        Args:\n            input_size (int): Size of the input.\n            hidden_size (int): Size of the hidden state.\n            output_size (int): Size of the output (vocabulary size).\n            tau (float): Time constant for neuron dynamics (controls neuron update speed).\n        \"\"\"\n        super(LiquidRNN, self).__init__()\n        self.hidden_size = hidden_size\n        self.input_size = input_size\n        self.output_size = output_size\n\n        # Liquid neuron layer\n        self.liquid_neuron = LiquidNeuron(\n            input_size, hidden_size, tau\n        )\n\n        # Output layer\n        self.output_layer = nn.Linear(hidden_size, output_size)\n\n        # Initialize hidden state\n        self.hidden_state = torch.zeros(hidden_size)\n\n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        \"\"\"\n        Forward pass through the LiquidRNN.\n\n        Processes each timestep sequentially, evolving hidden states based on the liquid neuron dynamics.\n\n        Args:\n            x (torch.Tensor): Input tensor of shape (batch_size, sequence_length, input_size).\n\n        Returns:\n            torch.Tensor: Output tensor of shape (batch_size, sequence_length, output_size).\n        \"\"\"\n        batch_size, seq_len, _ = x.shape\n        outputs = []\n        hidden_state = self.hidden_state\n\n        logger.info(\n            f\"Starting forward pass with batch_size: {batch_size}, sequence_length: {seq_len}\"\n        )\n\n        for t in range(seq_len):\n            hidden_state = self.liquid_neuron(\n                x[:, t, :], hidden_state\n            )\n            output = self.output_layer(hidden_state)\n            outputs.append(output)\n\n        return torch.stack(outputs, dim=1)\n\n    def generate_text(\n        self, start_token: torch.Tensor, max_len: int = 100\n    ) -> str:\n        \"\"\"\n        Generates text using the trained LiquidRNN model.\n\n        Args:\n            start_token (torch.Tensor): The starting token for text generation.\n            max_len (int): Maximum length of the generated sequence.\n\n        Returns:\n            str: The generated text as a string of tokens.\n        \"\"\"\n        generated_tokens = [start_token.item()]\n   
     hidden_state = self.hidden_state.unsqueeze(0)\n\n        logger.info(f\"Generating text with max length {max_len}\")\n\n        # Generate text by predicting one token at a time\n        for _ in range(max_len - 1):\n            output = self(\n                start_token.unsqueeze(0).unsqueeze(0)\n            )  # Add batch and sequence dimensions\n            next_token = torch.argmax(output, dim=-1)\n            generated_tokens.append(next_token.item())\n            start_token = next_token.squeeze(0)\n\n        return \"\".join(map(str, generated_tokens))\n\n\n# Assuming the LiquidRNN class has been defined as shown earlier\n# Here is a simple forward pass on CPU without using GPUs.\n\n\ndef cpu_forward_pass_example():\n    \"\"\"\n    Performs a forward pass with the LiquidRNN model using a CPU.\n    \"\"\"\n    logger.info(\"Starting forward pass on CPU...\")\n\n    # Example configuration\n    input_size = 128  # Input size (e.g., embedding dimension or one-hot encoding size)\n    hidden_size = 256  # Size of the hidden state\n    output_size = 128  # Output size (e.g., vocabulary size)\n\n    # Create a dummy input tensor (batch_size=2, sequence_length=10, input_size=128)\n    batch_size = 2\n    sequence_length = 10\n    dummy_input = torch.randn(batch_size, sequence_length, input_size)\n\n    # Initialize the LiquidRNN model\n    model = LiquidRNN(input_size, hidden_size, output_size)\n\n    # Move the model to CPU (this is already the default)\n    device = torch.device(\"cpu\")\n    model = model.to(device)\n\n    # Perform the forward pass on the dummy input\n    output = model(dummy_input)\n\n    # Log output information\n    logger.info(\n        f\"Output shape: {output.shape}\"\n    )  # Output shape should be (batch_size, sequence_length, output_size)\n    logger.info(\"Forward pass on CPU completed.\")\n\n    return output\n\n\n# Run the CPU forward pass example\noutput = cpu_forward_pass_example()\n\n# Output will be printed in the logs\n"
  },
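  {
    "path": "examples/liquid_neuron_example.py",
    "content": "# Hypothetical usage sketch -- this file is not part of the original repo.\n# It steps a single LiquidNeuron from lfm_torch/rnn.py once and checks the\n# time-constant update s(t+1) = (1 - tau) * s(t) + tau * tanh(W_in x + W_h s + b).\nimport torch\n\nfrom lfm_torch.rnn import LiquidNeuron\n\nif __name__ == \"__main__\":\n    batch_size, input_size, hidden_size, tau = 2, 8, 16, 0.1\n\n    neuron = LiquidNeuron(input_size, hidden_size, tau=tau)\n    x = torch.randn(batch_size, input_size)\n    s = torch.zeros(batch_size, hidden_size)\n\n    new_s = neuron(x, s)\n\n    # With s(0) = 0 the update reduces to tau * tanh(W_in x + b)\n    expected = tau * torch.tanh(x @ neuron.W_input.T + neuron.bias)\n    print(torch.allclose(new_s, expected))  # True\n"
  },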
  {
    "path": "liquid_transformer_example.py",
    "content": "import torch\nfrom loguru import logger\n\nfrom lfm_torch.liquid_t_moe import LiquidTransformer\n\n# Example usage\nif __name__ == \"__main__\":\n    seq_len, batch_size, embed_size = 10, 2, 64\n    num_heads, num_experts, expert_size, num_layers = 8, 4, 64, 6\n\n    # Create the model\n    model = LiquidTransformer(embed_size, num_heads, num_experts, expert_size, num_layers)\n\n    # Example input tensor\n    x = torch.randn(seq_len, batch_size, embed_size)\n\n    # Forward pass\n    output = model(x)\n    logger.info(f\"Model output shape: {output.shape}\")\n"
  },
  {
    "path": "liquid_transformer_train.py",
    "content": "import os\nimport torch\nimport torch.nn as nn\nfrom torch.utils.data import IterableDataset, DataLoader\nfrom torch.optim import AdamW\nfrom torch.optim.lr_scheduler import CosineAnnealingLR\nfrom datasets import load_dataset\nfrom transformers import AutoTokenizer\nfrom typing import Dict, List, Optional, Tuple, Union, Iterator\nfrom dataclasses import dataclass\nfrom loguru import logger\nimport wandb\nfrom tqdm.auto import tqdm\nimport numpy as np\nfrom pathlib import Path\nfrom lfm_torch.liquid_t_moe import LiquidTransformer\n\n# Set tokenizer parallelism\nos.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n\n# Configure logging\nlogger.add(\n    \"training.log\",\n    rotation=\"500 MB\",\n    retention=\"10 days\",\n    level=\"INFO\",\n    format=\"{time:YYYY-MM-DD HH:mm:ss} | {level} | {message}\"\n)\n\n@dataclass\nclass TrainingConfig:\n    \"\"\"Training configuration parameters.\"\"\"\n    \n    # Model parameters\n    embed_size: int = 768  # Match BERT embedding size\n    num_heads: int = 8\n    num_experts: int = 4\n    expert_size: int = 768  # Match embed_size\n    num_layers: int = 6\n    \n    # Training parameters\n    batch_size: int = 16\n    learning_rate: float = 1e-4\n    max_steps: int = 100000\n    warmup_steps: int = 1000\n    max_grad_norm: float = 1.0\n    weight_decay: float = 0.01\n    \n    # Data parameters\n    max_length: int = 512\n    vocab_size: int = 30522  # BERT vocab size\n    \n    # System parameters\n    device: str = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n    num_workers: int = 0  # Avoid multiprocessing issues with streaming\n    seed: int = 42\n    \n    # Logging parameters\n    wandb_project: str = \"liquid-transformer\"\n    checkpoint_dir: str = \"checkpoints\"\n    checkpoint_steps: int = 1000\n    log_steps: int = 10\n\nclass ArXivDataset(IterableDataset):\n    \"\"\"Dataset class for arXiv papers.\"\"\"\n    \n    def __init__(\n        self,\n        tokenizer: AutoTokenizer,\n        max_length: int = 512,\n    ):\n        super().__init__()\n        self.tokenizer = tokenizer\n        self.max_length = max_length\n        self.dataset = load_dataset(\"neuralwork/arxiver\", split=\"train\", streaming=True)\n        logger.info(f\"Initialized streaming dataset\")\n    \n    def preprocess_text(self, text: str) -> str:\n        \"\"\"Clean and preprocess text.\"\"\"\n        return text.strip().replace('\\n', ' ')\n\n    def __iter__(self) -> Iterator[Dict[str, torch.Tensor]]:\n        \"\"\"Iterate over the dataset.\"\"\"\n        iterator = iter(self.dataset)\n        while True:\n            try:\n                item = next(iterator)\n                text = f\"Title: {self.preprocess_text(item['title'])} Abstract: {self.preprocess_text(item['abstract'])}\"\n                \n                encoded = self.tokenizer(\n                    text,\n                    max_length=self.max_length,\n                    padding=\"max_length\",\n                    truncation=True,\n                    return_tensors=\"pt\"\n                )\n                \n                # Keep as long tensor for input ids\n                yield {\n                    \"input_ids\": encoded[\"input_ids\"][0],\n                    \"attention_mask\": encoded[\"attention_mask\"][0]\n                }\n            except StopIteration:\n                iterator = iter(self.dataset)  # Restart iteration\n                continue\n\nclass Trainer:\n    \"\"\"Trainer class for Liquid Transformer.\"\"\"\n    \n    def __init__(\n 
       self,\n        model: nn.Module,\n        config: TrainingConfig,\n        tokenizer: AutoTokenizer\n    ):\n        self.model = model.to(config.device)\n        self.config = config\n        self.tokenizer = tokenizer\n        \n        # Initialize hidden state\n        self.model.hidden_state = torch.zeros(\n            config.batch_size,\n            config.embed_size,\n            device=config.device\n        )\n        \n        # Create embedding layer for input tokens\n        self.embedding = nn.Embedding(\n            config.vocab_size,\n            config.embed_size\n        ).to(config.device)\n        \n        self.optimizer = AdamW(\n            list(model.parameters()) + list(self.embedding.parameters()),\n            lr=config.learning_rate,\n            weight_decay=config.weight_decay\n        )\n        \n        self.scheduler = CosineAnnealingLR(\n            self.optimizer,\n            T_max=config.max_steps\n        )\n        \n        wandb.init(project=config.wandb_project, config=vars(config))\n        os.makedirs(config.checkpoint_dir, exist_ok=True)\n        logger.info(\"Trainer initialized successfully\")\n    \n    def train_step(\n        self,\n        batch: Dict[str, torch.Tensor]\n    ) -> float:\n        \"\"\"Perform a single training step.\"\"\"\n        try:\n            self.model.train()\n            \n            # Move batch to device\n            input_ids = batch[\"input_ids\"].to(self.config.device)\n            attention_mask = batch[\"attention_mask\"].to(self.config.device)\n            \n            # Convert input tokens to embeddings\n            embedded_input = self.embedding(input_ids)  # [batch_size, seq_len, embed_size]\n            \n            # Add sequence dimension expected by transformer\n            embedded_input = embedded_input.unsqueeze(0)  # [1, batch_size, seq_len, embed_size]\n            \n            # Update hidden state size if batch size changed\n            if self.model.hidden_state.size(0) != embedded_input.size(1):\n                self.model.hidden_state = self.model.hidden_state.new_zeros(\n                    embedded_input.size(1),\n                    self.config.embed_size\n                )\n            \n            # Forward pass\n            outputs = self.model(embedded_input)\n            \n            # Compute reconstruction loss\n            loss = nn.MSELoss()(outputs, embedded_input)\n            \n            # Backward pass\n            loss.backward()\n            torch.nn.utils.clip_grad_norm_(\n                list(self.model.parameters()) + list(self.embedding.parameters()),\n                self.config.max_grad_norm\n            )\n            \n            self.optimizer.step()\n            self.optimizer.zero_grad()\n            \n            return loss.item()\n            \n        except Exception as e:\n            logger.error(f\"Error in train_step: {str(e)}\")\n            raise\n    \n    def save_checkpoint(\n        self,\n        step: int,\n        loss: Optional[float] = None,\n    ):\n        \"\"\"Save model checkpoint.\"\"\"\n        checkpoint = {\n            \"step\": step,\n            \"model_state_dict\": self.model.state_dict(),\n            \"embedding_state_dict\": self.embedding.state_dict(),\n            \"optimizer_state_dict\": self.optimizer.state_dict(),\n            \"scheduler_state_dict\": self.scheduler.state_dict(),\n            \"loss\": loss if loss is not None else float('inf'),\n            \"config\": self.config\n        }\n       
 \n        path = Path(self.config.checkpoint_dir)\n        checkpoint_path = path / f\"checkpoint_step_{step}.pt\"\n        torch.save(checkpoint, checkpoint_path)\n        logger.info(f\"Saved checkpoint at step {step} to {checkpoint_path}\")\n    \n    def train(\n        self,\n        train_dataset: ArXivDataset,\n    ):\n        \"\"\"Train the model.\"\"\"\n        logger.info(\"Starting training\")\n        \n        train_loader = DataLoader(\n            train_dataset,\n            batch_size=self.config.batch_size,\n            num_workers=self.config.num_workers\n        )\n        \n        global_step = 0\n        running_loss = 0.0\n        current_loss = None\n        \n        progress_bar = tqdm(total=self.config.max_steps, desc=\"Training\")\n        \n        try:\n            for batch in train_loader:\n                if global_step >= self.config.max_steps:\n                    break\n                \n                current_loss = self.train_step(batch)\n                running_loss += current_loss\n                global_step += 1\n                \n                # Update progress bar\n                progress_bar.update(1)\n                progress_bar.set_postfix({\n                    \"loss\": f\"{current_loss:.4f}\",\n                    \"step\": global_step\n                })\n                \n                # Log metrics\n                if global_step % self.config.log_steps == 0:\n                    avg_loss = running_loss / self.config.log_steps\n                    wandb.log({\n                        \"train_loss\": avg_loss,\n                        \"learning_rate\": self.scheduler.get_last_lr()[0],\n                        \"global_step\": global_step\n                    })\n                    running_loss = 0.0\n                \n                # Save checkpoint if needed\n                if global_step % self.config.checkpoint_steps == 0:\n                    self.save_checkpoint(global_step, current_loss)\n                \n                # Update learning rate\n                self.scheduler.step()\n                \n        except KeyboardInterrupt:\n            logger.info(\"Training interrupted by user\")\n            self.save_checkpoint(global_step, current_loss)\n        except Exception as e:\n            logger.error(f\"Training error: {str(e)}\")\n            self.save_checkpoint(global_step, current_loss)\n            raise\n        finally:\n            progress_bar.close()\n            # Save final checkpoint\n            self.save_checkpoint(global_step, current_loss)\n            logger.info(f\"Training completed after {global_step} steps\")\n\ndef main():\n    \"\"\"Main training function.\"\"\"\n    try:\n        # Set random seeds\n        config = TrainingConfig()\n        torch.manual_seed(config.seed)\n        np.random.seed(config.seed)\n        \n        # Initialize tokenizer\n        tokenizer = AutoTokenizer.from_pretrained(\"bert-base-uncased\")\n        \n        # Create dataset\n        train_dataset = ArXivDataset(\n            tokenizer=tokenizer,\n            max_length=config.max_length,\n        )\n        \n        # Initialize model\n        model = LiquidTransformer(\n            embed_size=config.embed_size,\n            num_heads=config.num_heads,\n            num_experts=config.num_experts,\n            expert_size=config.expert_size,\n            num_layers=config.num_layers\n        )\n        \n        # Initialize trainer\n        trainer = Trainer(model, config, tokenizer)\n        \n       
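# Note: Trainer.train runs until config.max_steps, logging to wandb every\n        # log_steps and writing a checkpoint every checkpoint_steps.\n       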
 # Start training\n        trainer.train(train_dataset)\n        \n    except Exception as e:\n        logger.error(f\"Training failed with error: {str(e)}\")\n        raise\n    finally:\n        wandb.finish()\n\nif __name__ == \"__main__\":\n    main()"
  },
  {
    "path": "pyproject.toml",
    "content": "[build-system]\nrequires = [\"poetry-core>=1.0.0\"]\nbuild-backend = \"poetry.core.masonry.api\"\n\n[tool.poetry]\nname = \"lfm-torch\"\nversion = \"0.0.3\"\ndescription = \"lfm - Pytorch\"\nlicense = \"MIT\"\nauthors = [\"Kye Gomez <kye@apac.ai>\"]\nhomepage = \"https://github.com/kyegomez/lfm\"\ndocumentation = \"https://github.com/kyegomez/lfm\"  # Add this if you have documentation.\nreadme = \"README.md\"  # Assuming you have a README.md\nrepository = \"https://github.com/kyegomez/lfm\"\nkeywords = [\"artificial intelligence\", \"deep learning\", \"optimizers\", \"Prompt Engineering\"]\nclassifiers = [\n    \"Development Status :: 4 - Beta\",\n    \"Intended Audience :: Developers\",\n    \"Topic :: Scientific/Engineering :: Artificial Intelligence\",\n    \"License :: OSI Approved :: MIT License\",\n    \"Programming Language :: Python :: 3.9\"\n]\n\n[tool.poetry.dependencies]\npython = \"^3.10\"\ntorch = \"*\"\nloguru = \"*\"\n\n\n\n[tool.poetry.group.lint.dependencies]\nruff = \"^0.1.6\"\ntypes-toml = \"^0.10.8.1\"\ntypes-redis = \"^4.3.21.6\"\ntypes-pytz = \"^2023.3.0.0\"\nblack = \"^23.1.0\"\ntypes-chardet = \"^5.0.4.6\"\nmypy-protobuf = \"^3.0.0\"\n\n\n[tool.autopep8]\nmax_line_length = 80\nignore = \"E501,W6\"  # or [\"E501\", \"W6\"]\nin-place = true\nrecursive = true\naggressive = 3\n\n\n[tool.ruff]\nline-length = 70\n\n[tool.black]\nline-length = 70\ntarget-version = ['py38']\npreview = true\n"
  },
  {
    "path": "requirements.txt",
    "content": "torch\n"
  },
  {
    "path": "research/bench.py",
    "content": "import torch\nimport torch.nn as nn\nimport numpy as np\nfrom tqdm import tqdm\nfrom loguru import logger\nfrom typing import List, Tuple\n\n\nclass ScalingBenchmark:\n    def __init__(\n        self,\n        models: List[nn.Module],\n        scaling_factor: float = 1.1,\n        input_size_start: int = 16,\n        num_tests: int = 10,\n    ):\n        \"\"\"\n        Initialize the benchmark.\n\n        :param models: A list of models to test.\n        :param scaling_factor: How much to increase input size each iteration.\n        :param input_size_start: Starting size of input.\n        :param num_tests: Number of tests to run.\n        \"\"\"\n        logger.info(\n            f\"Initializing ScalingBenchmark with {len(models)} models\"\n        )\n\n        self.models = models\n        self.scaling_factor = scaling_factor\n        self.input_size_start = input_size_start\n        self.num_tests = num_tests\n\n    def _generate_input(self, input_size: int) -> torch.Tensor:\n        \"\"\"\n        Generates random input tensor of a given size.\n\n        :param input_size: Size of the input tensor to generate.\n        :return: Random tensor of shape (input_size, input_size).\n        \"\"\"\n        logger.debug(f\"Generating input tensor of size {input_size}\")\n        return torch.randn(input_size, input_size)\n\n    def _test_model(\n        self, model: nn.Module, input_size: int\n    ) -> Tuple[float, float]:\n        \"\"\"\n        Test a model with a specific input size and measure the forward pass time and output.\n\n        :param model: The model to test.\n        :param input_size: Size of the input tensor.\n        :return: The time taken for the forward pass and the model's output mean.\n        \"\"\"\n        logger.debug(f\"Testing model with input size {input_size}\")\n\n        input_tensor = self._generate_input(input_size)\n\n        model.eval()\n        with torch.no_grad():\n            start_time = torch.cuda.Event(enable_timing=True)\n            end_time = torch.cuda.Event(enable_timing=True)\n\n            start_time.record()\n            output = model(input_tensor)\n            end_time.record()\n\n            # Waits for everything to finish running\n            torch.cuda.synchronize()\n\n            elapsed_time = start_time.elapsed_time(\n                end_time\n            )  # in milliseconds\n            output_mean = output.mean().item()\n\n            logger.debug(\n                f\"Model test completed: elapsed time {elapsed_time} ms, output mean {output_mean}\"\n            )\n\n            return elapsed_time, output_mean\n\n    def run_benchmark(self) -> None:\n        \"\"\"\n        Run the scaling benchmark on all models.\n        Categorizes the models as linear, quadratic, or sub-linear based on performance scaling.\n        \"\"\"\n        logger.info(\"Starting benchmark tests\")\n\n        performance_data = {model: [] for model in self.models}\n\n        for i in tqdm(range(self.num_tests), desc=\"Benchmarking\"):\n            current_input_size = int(\n                self.input_size_start * (self.scaling_factor**i)\n            )\n            logger.info(\n                f\"Running test {i + 1}/{self.num_tests} with input size {current_input_size}\"\n            )\n\n            for model in self.models:\n                elapsed_time, output_mean = self._test_model(\n                    model, current_input_size\n                )\n                performance_data[model].append(\n                    
\n    def run_benchmark(self) -> None:\n        \"\"\"\n        Run the scaling benchmark on all models.\n        Categorizes the models as linear, quadratic, or sub-linear based on performance scaling.\n        \"\"\"\n        logger.info(\"Starting benchmark tests\")\n\n        performance_data = {model: [] for model in self.models}\n\n        for i in tqdm(range(self.num_tests), desc=\"Benchmarking\"):\n            current_input_size = int(\n                self.input_size_start * (self.scaling_factor**i)\n            )\n            logger.info(\n                f\"Running test {i + 1}/{self.num_tests} with input size {current_input_size}\"\n            )\n\n            for model in self.models:\n                elapsed_time, output_mean = self._test_model(\n                    model, current_input_size\n                )\n                performance_data[model].append(\n                    (current_input_size, elapsed_time)\n                )\n\n        self._categorize_models(performance_data)\n\n    def _categorize_models(self, performance_data: dict) -> None:\n        \"\"\"\n        Categorize models based on how their performance scales with input size.\n\n        A quadratic polynomial always fits at least as well as a straight\n        line (it has an extra degree of freedom), so comparing raw residuals\n        would label every model quadratic. Instead, fit a line to the\n        log-log data: its slope is the exponent of the power law\n        time ~ input_size ** slope.\n\n        :param performance_data: Dictionary containing performance data for each model.\n        \"\"\"\n        logger.info(\"Categorizing models based on scaling behavior\")\n\n        for model, data in performance_data.items():\n            input_sizes, times = zip(*data)\n            log_sizes = np.log(np.array(input_sizes, dtype=np.float64))\n            # Clamp times away from zero before taking logs\n            log_times = np.log(\n                np.maximum(np.array(times, dtype=np.float64), 1e-9)\n            )\n\n            slope = np.polyfit(log_sizes, log_times, 1)[0]\n\n            logger.info(\n                f\"Model {model.__class__.__name__} log-log slope: {slope:.4f}\"\n            )\n\n            if slope > 1.5:\n                logger.success(\n                    f\"Model {model.__class__.__name__} scales quadratically.\"\n                )\n            elif slope >= 0.9:\n                logger.success(\n                    f\"Model {model.__class__.__name__} scales linearly.\"\n                )\n            else:\n                logger.success(\n                    f\"Model {model.__class__.__name__} scales sub-linearly.\"\n                )\n"
  },
  {
    "path": "research/sss_linear.py",
    "content": "import torch\nimport torch.nn as nn\nfrom torch import Tensor\nfrom loguru import logger\nimport time\nfrom typing import List\n\nlogger.info(\"Setting up Sub-Sub-Linear LLM Model\")\n\n\nclass SparseDynamicLayer(nn.Module):\n    \"\"\"\n    A layer that dynamically selects a subset of tokens for processing.\n\n    Attributes:\n        input_dim (int): The input embedding dimension.\n        output_dim (int): The output embedding dimension.\n        dropout (float): Dropout rate for token selection.\n    \"\"\"\n\n    def __init__(\n        self, input_dim: int, output_dim: int, dropout: float = 0.1\n    ):\n        super(SparseDynamicLayer, self).__init__()\n        self.input_dim = input_dim\n        self.output_dim = output_dim\n        self.dropout = nn.Dropout(dropout)\n        self.fc = nn.Linear(input_dim, output_dim)\n\n    def forward(self, x: Tensor) -> Tensor:\n        \"\"\"\n        Forward pass through the sparse dynamic layer.\n\n        Args:\n            x (Tensor): Input tensor of shape (batch_size, sequence_length, input_dim).\n\n        Returns:\n            Tensor: Output tensor after sparse selection and transformation.\n        \"\"\"\n        # Dynamic sparse token selection (probability-driven)\n        token_selection_prob = torch.sigmoid(\n            self.fc(x)\n        )  # Shape (batch_size, seq_len, output_dim)\n        selected_tokens = self.dropout(token_selection_prob)\n\n        logger.info(\n            f\"Selected {selected_tokens.sum()} tokens for processing out of {x.shape[1]} total tokens.\"\n        )\n        return selected_tokens\n\n\nclass HierarchicalSubstructureLayer(nn.Module):\n    \"\"\"\n    A layer that processes the input sequence hierarchically, by splitting the sequence into substructures\n    and processing relevant portions.\n\n    Attributes:\n        input_dim (int): The input embedding dimension.\n    \"\"\"\n\n    def __init__(self, input_dim: int):\n        super(HierarchicalSubstructureLayer, self).__init__()\n        self.input_dim = input_dim\n        self.fc = nn.Linear(input_dim, input_dim)\n\n    def forward(self, x: Tensor) -> Tensor:\n        \"\"\"\n        Forward pass for hierarchical substructure processing.\n\n        Args:\n            x (Tensor): Input tensor of shape (batch_size, sequence_length, input_dim).\n\n        Returns:\n            Tensor: Output tensor after hierarchical substructure processing.\n        \"\"\"\n        batch_size, seq_len, _ = x.size()\n        logger.info(\n            f\"Processing {seq_len} tokens into hierarchical substructures.\"\n        )\n\n        # Hierarchical substructure processing\n        # For simplicity, we'll break the input sequence into 2 substructures.\n        substructure_1 = x[:, : seq_len // 2, :]\n        substructure_2 = x[:, seq_len // 2 :, :]\n\n        # Processing each substructure independently\n        processed_1 = self.fc(substructure_1)\n        processed_2 = self.fc(substructure_2)\n\n        # Reassemble the processed structures\n        processed = torch.cat([processed_1, processed_2], dim=1)\n\n        return processed\n\n\nclass ProbabilisticMemoryCompressionLayer(nn.Module):\n    \"\"\"\n    A layer that performs probabilistic memory compression to reduce the amount of information passed to subsequent layers.\n\n    Attributes:\n        input_dim (int): The input embedding dimension.\n        output_dim (int): The output embedding dimension (should match hidden_dim of next layer).\n    \"\"\"\n\n    def __init__(self, input_dim: 
int, output_dim: int):\n        super(ProbabilisticMemoryCompressionLayer, self).__init__()\n        self.input_dim = input_dim\n        self.output_dim = output_dim\n        self.fc = nn.Linear(\n            input_dim, output_dim\n        )  # Directly output hidden_dim size\n\n    def forward(self, x: Tensor) -> Tensor:\n        \"\"\"\n        Forward pass for probabilistic memory compression.\n\n        Args:\n            x (Tensor): Input tensor of shape (batch_size, sequence_length, input_dim).\n\n        Returns:\n            Tensor: Compressed memory output.\n        \"\"\"\n        batch_size, seq_len, _ = x.size()\n        logger.info(f\"Compressing memory from {seq_len} tokens.\")\n\n        # Apply the compression to match the output_dim (hidden_dim).\n        # Note: this compresses the embedding dimension, not the token count.\n        compressed = self.fc(x)\n\n        logger.info(\n            f\"Compressed embedding dimension to {compressed.shape[-1]} across {compressed.shape[1]} tokens.\"\n        )\n        return compressed\n\n\nclass SubSubLinearLLM(nn.Module):\n    \"\"\"\n    Sub-Sub-Linear LLM Model that scales sub-sub-linearly while maintaining learning ability.\n\n    Attributes:\n        input_dim (int): Dimension of input embeddings.\n        hidden_dim (int): Dimension of hidden layers.\n        output_dim (int): Dimension of the output embeddings.\n    \"\"\"\n\n    def __init__(\n        self, input_dim: int, hidden_dim: int, output_dim: int\n    ):\n        super(SubSubLinearLLM, self).__init__()\n        self.sparse_layer = SparseDynamicLayer(input_dim, hidden_dim)\n        self.hierarchical_layer = HierarchicalSubstructureLayer(\n            hidden_dim\n        )\n        self.compression_layer = ProbabilisticMemoryCompressionLayer(\n            hidden_dim, hidden_dim\n        )  # Ensure output is hidden_dim\n        self.fc_output = nn.Linear(hidden_dim, output_dim)\n\n    def forward(self, x: Tensor) -> Tensor:\n        \"\"\"\n        Forward pass of the Sub-Sub-Linear LLM Model.\n\n        Args:\n            x (Tensor): Input tensor of shape (batch_size, sequence_length, input_dim).\n\n        Returns:\n            Tensor: Final output tensor of shape (batch_size, output_dim).\n        \"\"\"\n        # Step 1: Sparse dynamic selection\n        x = self.sparse_layer(x)\n\n        # Step 2: Hierarchical processing\n        x = self.hierarchical_layer(x)\n\n        # Step 3: Probabilistic memory compression\n        x = self.compression_layer(x)\n\n        # Final output layer\n        # Perform mean pooling along the sequence dimension (dim=1), resulting in shape (batch_size, hidden_dim)\n        x = x.mean(dim=1)\n\n        # Now x has shape (batch_size, hidden_dim), which matches fc_output\n        output = self.fc_output(x)\n        return output\n\n\nimport matplotlib.pyplot as plt\nimport numpy as np\nfrom scipy.stats import linregress\n\n\ndef benchmark_model(\n    model: nn.Module,\n    input_dim: int,\n    seq_lengths: List[int],\n    batch_size: int = 32,\n    runs: int = 5,\n):\n    \"\"\"\n    Benchmark the model on different sequence lengths and log the results.\n\n    Args:\n        model (nn.Module): The model to benchmark.\n        input_dim (int): Input dimensionality.\n        seq_lengths (List[int]): List of sequence lengths to test.\n        batch_size (int): Batch size for testing.\n        runs (int): Number of runs to average for each sequence length.\n\n    Returns:\n        dict: A dictionary with sequence lengths as keys and average times as values.\n    \"\"\"\n    model.eval()\n    times = []\n\n    for seq_len in 
seq_lengths:\n        logger.info(f\"Benchmarking sequence length {seq_len}\")\n\n        # Generate random input for the given sequence length\n        x = torch.randn(batch_size, seq_len, input_dim)\n\n        # Measure time for several runs and average\n        elapsed_times = []\n        for _ in range(runs):\n            start_time = time.time()\n\n            with torch.no_grad():\n                output = model(x)\n\n            end_time = time.time()\n            elapsed_times.append(end_time - start_time)\n\n        avg_time = np.mean(elapsed_times)\n        times.append(avg_time)\n\n        logger.info(\n            f\"Average time for sequence length {seq_len}: {avg_time:.6f} seconds\"\n        )\n\n    return {\n        seq_len: time for seq_len, time in zip(seq_lengths, times)\n    }\n\n\ndef detect_scaling_regime(\n    seq_lengths: List[int], times: List[float]\n) -> float:\n    \"\"\"\n    Detect the scaling regime by fitting a line to the log-log data and computing the slope.\n\n    Args:\n        seq_lengths (List[int]): Sequence lengths.\n        times (List[float]): Times corresponding to each sequence length.\n\n    Returns:\n        float: The slope of the log-log plot indicating the scaling regime.\n    \"\"\"\n    log_seq_lengths = np.log(seq_lengths)\n    log_times = np.log(times)\n\n    # Fit a linear regression to the log-log data\n    slope, intercept, r_value, p_value, std_err = linregress(\n        log_seq_lengths, log_times\n    )\n\n    logger.info(f\"Slope of the log-log plot: {slope:.4f}\")\n    return slope\n\n\ndef plot_benchmark_results(results: dict, slope: float):\n    \"\"\"\n    Plot the benchmark results to analyze scaling behavior.\n\n    Args:\n        results (dict): A dictionary with sequence lengths as keys and average times as values.\n        slope (float): The slope of the log-log plot for scaling regime detection.\n    \"\"\"\n    seq_lengths = list(results.keys())\n    times = list(results.values())\n\n    # Plot the results\n    plt.figure(figsize=(10, 6))\n    plt.plot(\n        seq_lengths, times, marker=\"o\", label=f\"Slope: {slope:.2f}\"\n    )\n    plt.title(\"Model Benchmark: Time vs Sequence Length\")\n    plt.xlabel(\"Sequence Length\")\n    plt.ylabel(\"Average Time (seconds)\")\n    plt.grid(True)\n    plt.xscale(\"log\")\n    plt.yscale(\n        \"log\"\n    )  # Use log-log scale to detect power-law relationships\n    plt.legend()\n    plt.show()\n\n    logger.info(\"Benchmark plot generated.\")\n\n\nif __name__ == \"__main__\":\n    input_dim = 512\n    hidden_dim = 256\n    output_dim = 128\n    seq_lengths = [\n        128,\n        256,\n        512,\n        1024,\n        2048,\n    ]  # Varying sequence lengths\n    batch_size = 32\n    runs = 5  # Average over 5 runs for each sequence length\n\n    model = SubSubLinearLLM(\n        input_dim=input_dim,\n        hidden_dim=hidden_dim,\n        output_dim=output_dim,\n    )\n\n    # Run benchmark and get results\n    benchmark_results = benchmark_model(\n        model, input_dim, seq_lengths, batch_size, runs\n    )\n\n    # Extract sequence lengths and times\n    seq_lengths = list(benchmark_results.keys())\n    times = list(benchmark_results.values())\n\n    # Detect scaling regime (slope of log-log plot)\n    slope = detect_scaling_regime(seq_lengths, times)\n\n    # Plot the benchmark results and scaling regime\n    plot_benchmark_results(benchmark_results, slope)\n\n    # Automatically detect and print scaling regime\n    if slope > 1.5:\n        logger.info(\n    
        f\"The model scales **quadratically** (slope: {slope:.2f})\"\n        )\n    elif 0.9 <= slope <= 1.5:\n        logger.info(\n            f\"The model scales **linearly** (slope: {slope:.2f})\"\n        )\n    elif 0.5 <= slope < 0.9:\n        logger.info(\n            f\"The model scales **sub-linearly** (slope: {slope:.2f})\"\n        )\n    else:\n        logger.info(\n            f\"The model scales **sub-sub-linearly** (slope: {slope:.2f})\"\n        )\n"
  },
  {
    "path": "research/ssub.py",
    "content": "# hcen.py\n\nimport torch\nimport torch.nn as nn\nfrom torch import Tensor\nfrom loguru import logger\n\nlogger.add(\"hcen.log\", rotation=\"1 MB\")  # Log file configuration\n\n\nclass EncodingFunction(nn.Module):\n    \"\"\"\n    Encoding function f that maps sequences of varying lengths to a fixed-dimensional vector space.\n    \"\"\"\n\n    def __init__(self, input_dim: int, hidden_dim: int):\n        super(EncodingFunction, self).__init__()\n        self.encoder = nn.Linear(input_dim, hidden_dim)\n\n    def forward(self, x: Tensor) -> Tensor:\n        \"\"\"\n        Forward pass of the encoding function.\n\n        Args:\n            x (Tensor): Input tensor of shape (batch_size, seq_len, input_dim).\n\n        Returns:\n            Tensor: Encoded tensor of shape (batch_size, hidden_dim).\n        \"\"\"\n        # Simple mean pooling followed by a linear layer\n        x = x.mean(dim=1)  # Shape: (batch_size, input_dim)\n        encoded = self.encoder(x)  # Shape: (batch_size, hidden_dim)\n        return encoded\n\n\nclass ImportanceScoring(nn.Module):\n    \"\"\"\n    Importance scoring function I(C_l) to select the most informative segments.\n    \"\"\"\n\n    def __init__(self, hidden_dim: int):\n        super(ImportanceScoring, self).__init__()\n        self.scorer = nn.Linear(hidden_dim, 1)\n\n    def forward(self, x: Tensor) -> Tensor:\n        \"\"\"\n        Compute importance scores for each compressed representation.\n\n        Args:\n            x (Tensor): Tensor of shape (batch_size, num_segments, hidden_dim).\n\n        Returns:\n            Tensor: Importance scores of shape (batch_size, num_segments).\n        \"\"\"\n        scores = self.scorer(x).squeeze(\n            -1\n        )  # Shape: (batch_size, num_segments)\n        return scores\n\n\nclass AggregationFunction(nn.Module):\n    \"\"\"\n    Aggregation function g to combine two compressed representations.\n    \"\"\"\n\n    def __init__(self, hidden_dim: int):\n        super(AggregationFunction, self).__init__()\n        self.aggregator = nn.Linear(hidden_dim * 2, hidden_dim)\n\n    def forward(self, x1: Tensor, x2: Tensor) -> Tensor:\n        \"\"\"\n        Aggregate two compressed representations.\n\n        Args:\n            x1 (Tensor): Tensor of shape (batch_size, hidden_dim).\n            x2 (Tensor): Tensor of shape (batch_size, hidden_dim).\n\n        Returns:\n            Tensor: Aggregated tensor of shape (batch_size, hidden_dim).\n        \"\"\"\n        combined = torch.cat(\n            [x1, x2], dim=-1\n        )  # Shape: (batch_size, hidden_dim * 2)\n        aggregated = self.aggregator(\n            combined\n        )  # Shape: (batch_size, hidden_dim)\n        return aggregated\n\n\nclass OutputFunction(nn.Module):\n    \"\"\"\n    Output function h to produce the final output from the root compressed representation.\n    \"\"\"\n\n    def __init__(self, hidden_dim: int, output_dim: int):\n        super(OutputFunction, self).__init__()\n        self.output_layer = nn.Linear(hidden_dim, output_dim)\n\n    def forward(self, x: Tensor) -> Tensor:\n        \"\"\"\n        Compute the final output.\n\n        Args:\n            x (Tensor): Root compressed representation of shape (batch_size, hidden_dim).\n\n        Returns:\n            Tensor: Final output tensor of shape (batch_size, output_dim).\n        \"\"\"\n        output = self.output_layer(\n            x\n        )  # Shape: (batch_size, output_dim)\n        return output\n\n\nclass HCEN(nn.Module):\n    
\"\"\"\n    Hierarchical Compressed Encoding Network (HCEN).\n    \"\"\"\n\n    def __init__(\n        self, input_dim: int, hidden_dim: int, output_dim: int, k: int\n    ):\n        super(HCEN, self).__init__()\n        self.input_dim = input_dim\n        self.hidden_dim = hidden_dim\n        self.k = k  # Number of segments to select at each level\n        self.encoding_function = EncodingFunction(\n            input_dim, hidden_dim\n        )\n        self.importance_scoring = ImportanceScoring(hidden_dim)\n        self.aggregation_function = AggregationFunction(hidden_dim)\n        self.output_function = OutputFunction(hidden_dim, output_dim)\n\n    def forward(self, x: Tensor) -> Tensor:\n        \"\"\"\n        Forward pass of HCEN.\n\n        Args:\n            x (Tensor): Input tensor of shape (batch_size, seq_len, input_dim).\n\n        Returns:\n            Tensor: Final output tensor of shape (batch_size, output_dim).\n        \"\"\"\n        batch_size, seq_len, _ = x.size()\n        logger.info(f\"Input shape: {x.shape}\")\n\n        # Initialize segments with the entire sequence\n        segments = [\n            x\n        ]  # List of tensors of shape (batch_size, seq_len_i, input_dim)\n        level = 0\n\n        while True:\n            logger.info(\n                f\"Processing level {level} with {len(segments)} segments\"\n            )\n\n            compressed_reps = []\n            # Encode each segment\n            for segment in segments:\n                if segment.size(-1) == self.input_dim:\n                    # Segment is unencoded, so encode it\n                    encoded = self.encoding_function(\n                        segment\n                    )  # Shape: (batch_size, hidden_dim)\n                    compressed_reps.append(encoded)\n                elif segment.size(-1) == self.hidden_dim:\n                    # Segment is already encoded\n                    compressed_reps.append(segment)\n                else:\n                    raise ValueError(\n                        f\"Unexpected segment size: {segment.size()}\"\n                    )\n\n            compressed_reps = torch.stack(\n                compressed_reps, dim=1\n            )  # Shape: (batch_size, num_segments, hidden_dim)\n\n            # If only one compressed representation remains, we can stop\n            if compressed_reps.size(1) == 1:\n                root_representation = compressed_reps.squeeze(\n                    1\n                )  # Shape: (batch_size, hidden_dim)\n                break\n\n            # Compute importance scores\n            importance_scores = self.importance_scoring(\n                compressed_reps\n            )  # Shape: (batch_size, num_segments)\n            logger.debug(\n                f\"Importance scores shape: {importance_scores.shape}\"\n            )\n\n            # Select top-k segments based on importance scores\n            k = min(self.k, compressed_reps.size(1))\n            _, indices = torch.topk(\n                importance_scores, k, dim=1\n            )  # Indices of top-k segments\n            logger.info(f\"Selected top-{k} segments at level {level}\")\n\n            # Gather selected compressed representations\n            batch_indices = (\n                torch.arange(batch_size).unsqueeze(-1).expand(-1, k)\n            )\n            selected_reps = compressed_reps[\n                batch_indices, indices\n            ]  # Shape: (batch_size, k, hidden_dim)\n\n            # Aggregate selected representations 
pairwise\n            aggregated_reps = []\n            i = 0\n            while i < selected_reps.size(1):\n                x1 = selected_reps[\n                    :, i, :\n                ]  # Shape: (batch_size, hidden_dim)\n                if i + 1 < selected_reps.size(1):\n                    x2 = selected_reps[\n                        :, i + 1, :\n                    ]  # Shape: (batch_size, hidden_dim)\n                    aggregated = self.aggregation_function(x1, x2)\n                else:\n                    # If there's an odd number of representations, carry the last one forward\n                    aggregated = x1\n                aggregated_reps.append(aggregated)\n                i += 2\n\n            # Prepare for next level\n            segments = aggregated_reps  # Each segment is a tensor of shape (batch_size, hidden_dim)\n\n            level += 1\n\n        # Final output\n        output = self.output_function(\n            root_representation\n        )  # Shape: (batch_size, output_dim)\n        logger.info(f\"Output shape: {output.shape}\")\n        return output\n\n\n# test_hcen.py\n\n# import torch\n# # from hcen import HCEN\n# import time\n# import matplotlib.pyplot as plt\n\n# # def test_hcen_sublinear_scaling():\n# #     \"\"\"\n# #     Test the HCEN model to verify sub-linear computational complexity.\n# #     \"\"\"\n# #     input_dim = 128\n# #     hidden_dim = 64\n# #     output_dim = 10\n# #     k = 5  # Number of segments to select at each level\n# #     batch_size = 32\n\n# #     sequence_lengths = [2 ** i for i in range(5, 15)]  # Sequence lengths from 32 to 16384\n# #     times = []\n\n# #     for seq_len in sequence_lengths:\n# #         model = HCEN(input_dim, hidden_dim, output_dim, k)\n# #         x = torch.randn(batch_size, seq_len, input_dim)\n\n# #         start_time = time.time()\n# #         output = model(x)\n# #         end_time = time.time()\n\n# #         elapsed_time = end_time - start_time\n# #         times.append(elapsed_time)\n# #         print(f\"Sequence Length: {seq_len}, Time Taken: {elapsed_time:.6f} seconds\")\n\n# #     # Plotting the results\n# #     plt.figure(figsize=(10, 6))\n# #     plt.plot(sequence_lengths, times, marker='o')\n# #     plt.xlabel('Sequence Length (N)')\n# #     plt.ylabel('Time Taken (seconds)')\n# #     plt.title('HCEN Computational Time vs Sequence Length')\n# #     plt.xscale('log')\n# #     plt.yscale('log')\n# #     plt.grid(True)\n# #     plt.show()\n\n# # if __name__ == \"__main__\":\n# #     test_hcen_sublinear_scaling()\n\n\n# # # Transformer Model (Quadratic Scaling)\n# # class TransformerModel(nn.Module):\n# #     def __init__(self, input_dim: int, num_heads: int, num_layers: int, output_dim: int):\n# #         super(TransformerModel, self).__init__()\n# #         self.transformer = nn.Transformer(\n# #             d_model=input_dim,\n# #             nhead=num_heads,\n# #             num_encoder_layers=num_layers,\n# #             num_decoder_layers=num_layers,\n# #             dim_feedforward=4 * input_dim,\n# #             batch_first=True,\n# #         )\n# #         self.output_layer = nn.Linear(input_dim, output_dim)\n\n# #     def forward(self, x: torch.Tensor) -> torch.Tensor:\n# #         # Transformer requires both src and tgt; for simplicity, we'll use the same input\n# #         output = self.transformer(x, x)\n# #         # Take the mean across the sequence length\n# #         output = output.mean(dim=1)\n# #         output = self.output_layer(output)\n# #         return 
output\n\n# # # RNN Model (Linear Scaling)\n# # class RNNModel(nn.Module):\n# #     def __init__(self, input_dim: int, hidden_dim: int, num_layers: int, output_dim: int):\n# #         super(RNNModel, self).__init__()\n# #         self.rnn = nn.RNN(\n# #             input_size=input_dim,\n# #             hidden_size=hidden_dim,\n# #             num_layers=num_layers,\n# #             batch_first=True,\n# #         )\n# #         self.output_layer = nn.Linear(hidden_dim, output_dim)\n\n# #     def forward(self, x: torch.Tensor) -> torch.Tensor:\n# #         # RNN returns output and hidden state; we'll use the final hidden state\n# #         _, hn = self.rnn(x)\n# #         # hn shape: (num_layers, batch_size, hidden_dim)\n# #         hn = hn[-1]  # Take the output from the last layer\n# #         output = self.output_layer(hn)\n# #         return output\n\n# # def benchmark_models():\n# #     \"\"\"\n# #     Benchmark HCEN, Transformer, and RNN models to compare computational scaling.\n# #     \"\"\"\n# #     input_dim = 128\n# #     hidden_dim = 64\n# #     output_dim = 10\n# #     k = 5  # Number of segments to select at each level in HCEN\n# #     num_heads = 8\n# #     num_layers = 2\n# #     batch_size = 32\n\n# #     sequence_lengths = [2 ** i for i in range(5, 14)]  # Sequence lengths from 32 to 8192\n# #     hcen_times = []\n# #     transformer_times = []\n# #     rnn_times = []\n\n# #     for seq_len in sequence_lengths:\n# #         x = torch.randn(batch_size, seq_len, input_dim)\n\n# #         # HCEN Model\n# #         hcen_model = HCEN(input_dim, hidden_dim, output_dim, k)\n# #         start_time = time.time()\n# #         hcen_output = hcen_model(x)\n# #         end_time = time.time()\n# #         hcen_elapsed = end_time - start_time\n# #         hcen_times.append(hcen_elapsed)\n\n# #         # Transformer Model\n# #         # transformer_model = TransformerModel(input_dim, num_heads, num_layers, output_dim)\n# #         # start_time = time.time()\n# #         # transformer_output = transformer_model(x)\n# #         # end_time = time.time()\n# #         # transformer_elapsed = end_time - start_time\n# #         # transformer_times.append(transformer_elapsed)\n\n# #         # RNN Model\n# #         rnn_model = RNNModel(input_dim, hidden_dim, num_layers, output_dim)\n# #         start_time = time.time()\n# #         rnn_output = rnn_model(x)\n# #         end_time = time.time()\n# #         rnn_elapsed = end_time - start_time\n# #         rnn_times.append(rnn_elapsed)\n\n# #         print(f\"Sequence Length: {seq_len}, HCEN Time: {hcen_elapsed:.6f}s, \"\n# #               f\"RNN Time: {rnn_elapsed:.6f}s\")\n\n# #     # Plotting the results\n# #     plt.figure(figsize=(12, 8))\n# #     plt.plot(sequence_lengths, hcen_times, marker='o', label='HCEN (Sub-Linear)')\n# #     # plt.plot(sequence_lengths, transformer_times, marker='o', label='Transformer (Quadratic)')\n# #     plt.plot(sequence_lengths, rnn_times, marker='o', label='RNN (Linear)')\n\n# #     # Reference lines for O(N), O(N log N), O(N^2)\n# #     N = np.array(sequence_lengths)\n# #     plt.plot(N, N / N.max() * max(hcen_times + transformer_times + rnn_times), 'k--', label='O(N)')\n# #     plt.plot(N, np.log(N) / np.log(N.max()) * max(hcen_times + transformer_times + rnn_times), 'g--', label='O(log N)')\n# #     plt.plot(N, (N ** 2) / (N.max() ** 2) * max(hcen_times + transformer_times + rnn_times), 'r--', label='O(N^2)')\n\n# #     plt.xlabel('Sequence Length (N)')\n# #     plt.ylabel('Time Taken (seconds)')\n# #     
plt.title('Model Computational Time vs Sequence Length')\n# #     plt.xscale('log')\n# #     plt.yscale('log')\n# #     plt.legend()\n# #     plt.grid(True)\n# #     plt.show()\n\n# # if __name__ == \"__main__\":\n# #     benchmark_models()\n"
  },
  {
    "path": "research/sub_linear.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom torch import Tensor\nfrom typing import List, Tuple\nfrom loguru import logger\nfrom pydantic import BaseModel\n\nlogger.add(\n    \"model_log.log\", format=\"{time} {level} {message}\", level=\"DEBUG\"\n)\n\n\n# Helper class for managing configuration using Pydantic\nclass ModelConfig(BaseModel):\n    input_dim: int\n    num_layers: int\n    sparsity: float\n    cluster_size: int\n    hidden_dim: int\n    num_clusters: int\n    num_classes: int\n    memory_size: int\n\n\n# Sparse Information Extraction Layer\nclass SparseInformationExtraction(nn.Module):\n    \"\"\"\n    This layer selects a sparse subset of tokens based on their importance.\n    \"\"\"\n\n    def __init__(self, input_dim: int, sparsity: float):\n        \"\"\"\n        Initializes the sparse selection layer.\n        Args:\n            input_dim (int): Dimension of input tokens.\n            sparsity (float): Fraction of tokens to select (between 0 and 1).\n        \"\"\"\n        super(SparseInformationExtraction, self).__init__()\n        self.input_dim = input_dim\n        self.sparsity = sparsity\n\n    def forward(self, x: Tensor) -> Tensor:\n        \"\"\"\n        Select a sparse subset of tokens based on their magnitudes.\n        Args:\n            x (Tensor): Input token embeddings of shape (batch_size, seq_len, input_dim).\n        Returns:\n            Tensor: Sparsely selected tokens.\n        \"\"\"\n        logger.debug(f\"Original input shape: {x.shape}\")\n\n        # Compute the L2 norm across the token embeddings\n        token_norms = torch.norm(x, p=2, dim=-1)\n        logger.debug(f\"Token norms shape: {token_norms.shape}\")\n\n        # Select top-k tokens based on sparsity value\n        k = int(self.sparsity * x.size(1))\n        _, topk_indices = torch.topk(token_norms, k, dim=1)\n\n        # Gather the top-k tokens\n        sparse_x = torch.gather(\n            x,\n            1,\n            topk_indices.unsqueeze(-1).expand(-1, -1, self.input_dim),\n        )\n        logger.debug(f\"Sparse input shape: {sparse_x.shape}\")\n\n        return sparse_x\n\n\n# Hierarchical Clustering Layer\nclass HierarchicalClustering(nn.Module):\n    \"\"\"\n    Hierarchically clusters the input tokens into fewer groups.\n    \"\"\"\n\n    def __init__(self, cluster_size: int):\n        \"\"\"\n        Initializes the clustering layer.\n        Args:\n            cluster_size (int): Number of clusters to group tokens into.\n        \"\"\"\n        super(HierarchicalClustering, self).__init__()\n        self.cluster_size = cluster_size\n\n    def forward(self, x: Tensor) -> Tensor:\n        \"\"\"\n        Cluster tokens hierarchically by reshaping and reducing their dimension.\n        Args:\n            x (Tensor): Sparse input tokens of shape (batch_size, seq_len, input_dim).\n        Returns:\n            Tensor: Clustered tokens.\n        \"\"\"\n        logger.debug(f\"Input before clustering: {x.shape}\")\n        batch_size, seq_len, input_dim = x.shape\n        num_clusters = seq_len // self.cluster_size\n        x = x.view(\n            batch_size, num_clusters, self.cluster_size * input_dim\n        )\n        logger.debug(f\"Input after clustering: {x.shape}\")\n        return x\n\n\n# Dynamic Activation Layer\nclass DynamicMaskingActivation(nn.Module):\n    \"\"\"\n    Activates only a subset of neurons based on dynamic masking.\n    \"\"\"\n\n    def __init__(\n        self, input_dim: int, hidden_dim: int, 
mask_fraction: float\n    ):\n        \"\"\"\n        Initializes the dynamic activation layer.\n        Args:\n            input_dim (int): Dimension of input layer (matches the output of clustering layer).\n            hidden_dim (int): Dimension of hidden layer.\n            mask_fraction (float): Fraction of neurons to activate.\n        \"\"\"\n        super(DynamicMaskingActivation, self).__init__()\n        self.input_dim = input_dim\n        self.hidden_dim = hidden_dim\n        self.mask_fraction = mask_fraction\n        self.fc = nn.Linear(\n            input_dim, hidden_dim\n        )  # Adjusted to take input_dim\n\n    def forward(self, x: Tensor) -> Tensor:\n        \"\"\"\n        Apply dynamic masking to the hidden layer.\n        Args:\n            x (Tensor): Input tensor of shape (batch_size, seq_len, input_dim).\n        Returns:\n            Tensor: Masked activation output.\n        \"\"\"\n        logger.debug(f\"Input before dynamic masking: {x.shape}\")\n        batch_size, seq_len, input_dim = x.shape\n\n        # Compute the number of neurons to activate\n        k = int(self.mask_fraction * self.hidden_dim)\n\n        # Apply linear transformation\n        x = self.fc(x)\n\n        # Mask out a random subset of neurons\n        mask = torch.zeros_like(x).bernoulli_(self.mask_fraction)\n        x = x * mask\n        logger.debug(f\"Masked output shape: {x.shape}\")\n\n        return F.relu(x)\n\n\n# Sparse Recursion-Based Memory Layer\nclass SparseMemory(nn.Module):\n    \"\"\"\n    Implements a recursive memory mechanism for sequence compression.\n    \"\"\"\n\n    def __init__(self, input_dim: int, memory_size: int):\n        \"\"\"\n        Initializes the memory mechanism.\n        Args:\n            input_dim (int): Dimension of input embeddings.\n            memory_size (int): Size of memory (number of stored representations).\n        \"\"\"\n        super(SparseMemory, self).__init__()\n        self.memory_size = memory_size\n        self.fc = nn.Linear(input_dim, memory_size)\n\n    def forward(\n        self, x: Tensor, memory: Tensor\n    ) -> Tuple[Tensor, Tensor]:\n        \"\"\"\n        Update and compress the memory state.\n        Args:\n            x (Tensor): Current input tensor of shape (batch_size, seq_len, input_dim).\n            memory (Tensor): Previous memory state of shape (batch_size, memory_size).\n        Returns:\n            Tuple[Tensor, Tensor]: Updated input and memory state.\n        \"\"\"\n        logger.debug(f\"Input before memory update: {x.shape}\")\n\n        # Compress sequence length to match memory size (batch_size, memory_size)\n        x_compressed = torch.mean(\n            x, dim=1\n        )  # Compress along the sequence dimension\n        logger.debug(f\"Compressed input shape: {x_compressed.shape}\")\n\n        # Update the memory state by combining previous memory and new compressed input\n        updated_memory = F.relu(self.fc(x_compressed) + memory)\n        logger.debug(f\"Updated memory shape: {updated_memory.shape}\")\n\n        return x_compressed, updated_memory\n\n\n# Main SDCI Model Architecture\nclass SDCIModel(nn.Module):\n    \"\"\"\n    Main model combining Sparse Information Extraction, Clustering, Masking, and Memory.\n    \"\"\"\n\n    def __init__(self, config: ModelConfig):\n        \"\"\"\n        Initializes the SDCI model.\n        Args:\n            config (ModelConfig): Configuration object containing model parameters.\n        \"\"\"\n        super(SDCIModel, self).__init__()\n       
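# The clustered tokens arrive with dimension cluster_size * input_dim,\n        # which is why DynamicMaskingActivation below is constructed with that\n        # input size.\n       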
 self.sparse_extraction = SparseInformationExtraction(\n            config.input_dim, config.sparsity\n        )\n        self.clustering = HierarchicalClustering(config.cluster_size)\n        self.dynamic_activation = DynamicMaskingActivation(\n            input_dim=config.cluster_size\n            * config.input_dim,  # Match the clustered output\n            hidden_dim=config.hidden_dim,\n            mask_fraction=config.sparsity,\n        )\n        self.memory = SparseMemory(\n            config.hidden_dim, config.memory_size\n        )\n        self.fc_out = nn.Linear(\n            config.memory_size, config.num_classes\n        )\n\n    def forward(\n        self, x: Tensor, memory: Tensor\n    ) -> Tuple[Tensor, Tensor]:\n        \"\"\"\n        Forward pass through the model.\n        Args:\n            x (Tensor): Input tensor of shape (batch_size, seq_len, input_dim).\n            memory (Tensor): Memory tensor of shape (batch_size, memory_size).\n        Returns:\n            Tuple[Tensor, Tensor]: Output predictions and updated memory.\n        \"\"\"\n        logger.debug(\"Starting forward pass of the model.\")\n\n        # Step 1: Sparse information extraction\n        x = self.sparse_extraction(x)\n\n        # Step 2: Hierarchical clustering\n        x = self.clustering(x)\n\n        # Step 3: Dynamic masking and activation\n        x = self.dynamic_activation(x)\n\n        # Step 4: Recursive memory update\n        x, memory = self.memory(x, memory)\n\n        # Step 5: Output layer for classification\n        output = self.fc_out(memory)\n\n        return output, memory\n\n\nimport time\nimport matplotlib.pyplot as plt\n\n# Example configuration\nconfig = ModelConfig(\n    input_dim=128,\n    num_layers=4,\n    sparsity=0.5,\n    cluster_size=4,\n    hidden_dim=256,\n    num_clusters=16,\n    num_classes=10,\n    memory_size=128,\n)\n\n# Initialize the model and memory\nmodel = SDCIModel(config)\n\n\n# Function to benchmark the model with different input sizes\ndef benchmark_model(\n    model: nn.Module, input_sizes: List[int], batch_size: int = 32\n):\n    times = []\n    memory = torch.zeros(\n        batch_size, config.memory_size\n    )  # Initialize memory\n\n    for input_size in input_sizes:\n        input_tensor = torch.randn(\n            batch_size, input_size, config.input_dim\n        )  # Generate random input\n\n        # Measure time for forward pass\n        start_time = time.time()\n        with torch.no_grad():  # Disable gradients for benchmarking\n            _ = model(input_tensor, memory)\n        elapsed_time = time.time() - start_time\n\n        times.append(elapsed_time)\n        logger.info(\n            f\"Input size {input_size} - Elapsed time: {elapsed_time:.6f} seconds\"\n        )\n\n    return times\n\n\n# Define the range of input sizes to test\ninput_sizes = [128, 256, 512, 1024, 2048]\n\n# Run the benchmark\nexecution_times = benchmark_model(model, input_sizes)\n\n# Plotting the results\nplt.figure(figsize=(10, 6))\nplt.plot(\n    input_sizes,\n    execution_times,\n    label=\"Model Execution Time\",\n    marker=\"o\",\n)\n# Scale the O(N) and O(N^2) reference curves through the first measured\n# point; plotting raw N and N**2 against seconds would put them orders of\n# magnitude above the measurements and make the comparison meaningless.\nbase_time = execution_times[0]\nplt.plot(\n    input_sizes,\n    [base_time * size / input_sizes[0] for size in input_sizes],\n    label=\"Linear Time (O(N))\",\n    linestyle=\"--\",\n)\nplt.plot(\n    input_sizes,\n    [\n        base_time * (size / input_sizes[0]) ** 2\n        for size in input_sizes\n    ],\n    label=\"Quadratic Time (O(N^2))\",\n    linestyle=\"--\",\n)\nplt.xlabel(\"Input Sequence Length (N)\")\nplt.ylabel(\"Execution Time (seconds)\")\nplt.title(\"Benchmark: Model Execution Time vs Input 
Size\")\nplt.legend()\nplt.grid(True)\nplt.show()\n"
  },
  {
    "path": "scripts/code_quality.sh",
    "content": "#!/bin/bash\n\n# Navigate to the directory containing the 'package' folder\n# cd /path/to/your/code/directory\n\n# Run autopep8 with max aggressiveness (-aaa) and in-place modification (-i)\n# on all Python files (*.py) under the 'package' directory.\nautopep8 --in-place --aggressive --aggressive --recursive --experimental --list-fixes package/\n\n# Run black with default settings, since black does not have an aggressiveness level.\n# Black will format all Python files it finds in the 'package' directory.\nblack --experimental-string-processing package/\n\n# Run ruff on the 'package' directory.\n# Add any additional flags if needed according to your version of ruff.\nruff --unsafe_fix\n\n# YAPF\nyapf --recursive --in-place --verbose --style=google --parallel package\n"
  },
  {
    "path": "scripts/merge_all_prs.sh",
    "content": "#!/bin/bash\n\n# Check if we are inside a Git repository\nif ! git rev-parse --git-dir > /dev/null 2>&1; then\n    echo \"Error: Must be run inside a Git repository.\"\n    exit 1\nfi\n\n# Fetch all open pull requests\necho \"Fetching open PRs...\"\nprs=$(gh pr list --state open --json number --jq '.[].number')\n\n# Check if there are PRs to merge\nif [ -z \"$prs\" ]; then\n    echo \"No open PRs to merge.\"\n    exit 0\nfi\n\necho \"Found PRs: $prs\"\n\n# Loop through each pull request number and merge it\nfor pr in $prs; do\n    echo \"Attempting to merge PR #$pr\"\n    merge_output=$(gh pr merge $pr --auto --merge)\n    merge_status=$?\n    if [ $merge_status -ne 0 ]; then\n        echo \"Failed to merge PR #$pr. Error: $merge_output\"\n    else\n        echo \"Successfully merged PR #$pr\"\n    fi\ndone\n\necho \"Processing complete.\"\n"
  },
  {
    "path": "scripts/test_name.sh",
    "content": "find ./tests -name \"*.py\" -type f | while read file\ndo\n  filename=$(basename \"$file\")\n  dir=$(dirname \"$file\")\n  if [[ $filename != test_* ]]; then\n    mv \"$file\" \"$dir/test_$filename\"\n  fi\ndone"
  },
  {
    "path": "scripts/tests.sh",
    "content": "find ./tests -name '*.py' -exec pytest {} \\;"
  }
]