Repository: Mozilla-Ocho/Memory-Cache
Branch: main
Commit: 8b54e4e7daa8
Files: 115
Total size: 320.2 KB
Directory structure:
gitextract_ahfxg4dk/
├── .gitignore
├── ATTRIBUTIONS.md
├── LICENSE
├── README.md
├── docs/
│   ├── .gitignore
│   ├── 404.html
│   ├── CNAME
│   ├── Gemfile
│   ├── _config.yml
│   ├── _includes/
│   │   ├── footer.html
│   │   └── header.html
│   ├── _layouts/
│   │   ├── default.html
│   │   ├── home.html
│   │   ├── page.html
│   │   └── post.html
│   ├── _posts/
│   │   ├── 2023-11-06-introducing-memory-cache.markdown
│   │   ├── 2023-11-30-we-have-a-website.markdown
│   │   ├── 2024-03-01-memory-cache-and-ai-privacy.markdown
│   │   ├── 2024-03-06-designlog-update.markdown
│   │   ├── 2024-03-07-devlog.markdown
│   │   ├── 2024-03-15-devlog.markdown
│   │   └── 2024-04-19-memory-cache-hub.markdown
│   ├── _sass/
│   │   ├── memorycache.scss
│   │   └── minima.scss
│   ├── about.markdown
│   ├── assets/
│   │   └── main.scss
│   ├── faq.md
│   ├── index.markdown
│   └── readme.md
├── extension/
│   ├── content-script.js
│   ├── manifest.json
│   └── popup/
│       ├── marked.esm.js
│       ├── memory_cache.html
│       ├── memory_cache.js
│       └── styles.css
├── scratch/
│   ├── backend/
│   │   ├── hub/
│   │   │   ├── .gitignore
│   │   │   ├── PLAN.md
│   │   │   ├── README.md
│   │   │   ├── docker/
│   │   │   │   ├── Dockerfile.hub-builder-gnu-linux
│   │   │   │   ├── Dockerfile.hub-builder-old-gnu-linux
│   │   │   │   ├── Dockerfile.hub-dev
│   │   │   │   └── Dockerfile.hub-dev-cuda
│   │   │   ├── requirements/
│   │   │   │   ├── hub-base.txt
│   │   │   │   ├── hub-builder.txt
│   │   │   │   └── hub-cpu.txt
│   │   │   └── src/
│   │   │       ├── api/
│   │   │       │   ├── llamafile_api.py
│   │   │       │   └── thread_api.py
│   │   │       ├── async_utils.py
│   │   │       ├── chat.py
│   │   │       ├── chat2.py
│   │   │       ├── chat3.py
│   │   │       ├── fastapi_app.py
│   │   │       ├── gradio_app.py
│   │   │       ├── hub.py
│   │   │       ├── hub_build_gnu_linux.py
│   │   │       ├── hub_build_macos.py
│   │   │       ├── hub_build_windows.py
│   │   │       ├── llamafile_infos.json
│   │   │       ├── llamafile_infos.py
│   │   │       ├── llamafile_manager.py
│   │   │       └── static/
│   │   │           └── index.html
│   │   ├── langserve-demo/
│   │   │   ├── .gitignore
│   │   │   ├── Dockerfile.cpu
│   │   │   ├── README.md
│   │   │   ├── client.py
│   │   │   ├── requirements-cpu.txt
│   │   │   ├── requirements.txt
│   │   │   └── serve.py
│   │   └── python-llamafile-manager/
│   │       ├── .gitignore
│   │       ├── Dockerfile.plm
│   │       ├── Dockerfile.plm-builder-gnu-linux
│   │       ├── README.md
│   │       ├── build_gnu_linux.py
│   │       ├── manager.py
│   │       └── requirements.txt
│   ├── browser-client/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── package.json
│   │   ├── src/
│   │   │   ├── index.html
│   │   │   ├── main.js
│   │   │   ├── styleguide.html
│   │   │   └── styles.css
│   │   └── webpack.config.js
│   └── hub-browser-client/
│       ├── .gitignore
│       ├── README.md
│       ├── config/
│       │   ├── env.js
│       │   ├── getHttpsConfig.js
│       │   ├── jest/
│       │   │   ├── babelTransform.js
│       │   │   ├── cssTransform.js
│       │   │   └── fileTransform.js
│       │   ├── modules.js
│       │   ├── paths.js
│       │   ├── webpack/
│       │   │   └── persistentCache/
│       │   │       └── createEnvironmentHash.js
│       │   ├── webpack.config.js
│       │   └── webpackDevServer.config.js
│       ├── package.json
│       ├── public/
│       │   ├── index.html
│       │   ├── manifest.json
│       │   └── robots.txt
│       ├── scripts/
│       │   ├── build.js
│       │   ├── start.js
│       │   └── test.js
│       ├── src/
│       │   ├── App.css
│       │   ├── App.test.tsx
│       │   ├── App.tsx
│       │   ├── api/
│       │   │   └── llamafile_api.ts
│       │   ├── components/
│       │   │   └── llamafile_details.tsx
│       │   ├── index.css
│       │   ├── index.tsx
│       │   ├── react-app-env.d.ts
│       │   ├── reportWebVitals.ts
│       │   ├── setupTests.ts
│       │   └── types.ts
│       └── tsconfig.json
└── scripts/
    └── run_ingest.sh
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
.DS_Store
extension/.web-extension-id
extension/web-ext-artifacts/
================================================
FILE: ATTRIBUTIONS.md
================================================
## Icons
brain_24.png icon licensed under [CC-by 3.0 Unported](https://creativecommons.org/licenses/by/3.0/) from user 'Howcolour' on www.iconfinder.com
save-icon-16.png icon licensed under [CC-by 3.0](https://creativecommons.org/licenses/by/3.0/) from user 'Bhuvan' from Noun Project
file-icon-16.png icon licensed under [CC-by 3.0](https://creativecommons.org/licenses/by/3.0/) from user 'Mas Dhimas' from Noun Project
## Helpful Links
CSS Gradient tool used: https://cssgradient.io/
Pastel Rainbow color palette by user allyasdf on color-hex: https://www.color-hex.com/color-palette/5361
CSS Trick - border-top-linear-gradient solution from https://michaelharley.net/posts/2021/01/12/how-to-create-a-border-top-linear-gradient/
================================================
FILE: LICENSE
================================================
Mozilla Public License Version 2.0
==================================
1. Definitions
--------------
1.1. "Contributor"
means each individual or legal entity that creates, contributes to
the creation of, or owns Covered Software.
1.2. "Contributor Version"
means the combination of the Contributions of others (if any) used
by a Contributor and that particular Contributor's Contribution.
1.3. "Contribution"
means Covered Software of a particular Contributor.
1.4. "Covered Software"
means Source Code Form to which the initial Contributor has attached
the notice in Exhibit A, the Executable Form of such Source Code
Form, and Modifications of such Source Code Form, in each case
including portions thereof.
1.5. "Incompatible With Secondary Licenses"
means
(a) that the initial Contributor has attached the notice described
in Exhibit B to the Covered Software; or
(b) that the Covered Software was made available under the terms of
version 1.1 or earlier of the License, but not also under the
terms of a Secondary License.
1.6. "Executable Form"
means any form of the work other than Source Code Form.
1.7. "Larger Work"
means a work that combines Covered Software with other material, in
a separate file or files, that is not Covered Software.
1.8. "License"
means this document.
1.9. "Licensable"
means having the right to grant, to the maximum extent possible,
whether at the time of the initial grant or subsequently, any and
all of the rights conveyed by this License.
1.10. "Modifications"
means any of the following:
(a) any file in Source Code Form that results from an addition to,
deletion from, or modification of the contents of Covered
Software; or
(b) any new file in Source Code Form that contains any Covered
Software.
1.11. "Patent Claims" of a Contributor
means any patent claim(s), including without limitation, method,
process, and apparatus claims, in any patent Licensable by such
Contributor that would be infringed, but for the grant of the
License, by the making, using, selling, offering for sale, having
made, import, or transfer of either its Contributions or its
Contributor Version.
1.12. "Secondary License"
means either the GNU General Public License, Version 2.0, the GNU
Lesser General Public License, Version 2.1, the GNU Affero General
Public License, Version 3.0, or any later versions of those
licenses.
1.13. "Source Code Form"
means the form of the work preferred for making modifications.
1.14. "You" (or "Your")
means an individual or a legal entity exercising rights under this
License. For legal entities, "You" includes any entity that
controls, is controlled by, or is under common control with You. For
purposes of this definition, "control" means (a) the power, direct
or indirect, to cause the direction or management of such entity,
whether by contract or otherwise, or (b) ownership of more than
fifty percent (50%) of the outstanding shares or beneficial
ownership of such entity.
2. License Grants and Conditions
--------------------------------
2.1. Grants
Each Contributor hereby grants You a world-wide, royalty-free,
non-exclusive license:
(a) under intellectual property rights (other than patent or trademark)
Licensable by such Contributor to use, reproduce, make available,
modify, display, perform, distribute, and otherwise exploit its
Contributions, either on an unmodified basis, with Modifications, or
as part of a Larger Work; and
(b) under Patent Claims of such Contributor to make, use, sell, offer
for sale, have made, import, and otherwise transfer either its
Contributions or its Contributor Version.
2.2. Effective Date
The licenses granted in Section 2.1 with respect to any Contribution
become effective for each Contribution on the date the Contributor first
distributes such Contribution.
2.3. Limitations on Grant Scope
The licenses granted in this Section 2 are the only rights granted under
this License. No additional rights or licenses will be implied from the
distribution or licensing of Covered Software under this License.
Notwithstanding Section 2.1(b) above, no patent license is granted by a
Contributor:
(a) for any code that a Contributor has removed from Covered Software;
or
(b) for infringements caused by: (i) Your and any other third party's
modifications of Covered Software, or (ii) the combination of its
Contributions with other software (except as part of its Contributor
Version); or
(c) under Patent Claims infringed by Covered Software in the absence of
its Contributions.
This License does not grant any rights in the trademarks, service marks,
or logos of any Contributor (except as may be necessary to comply with
the notice requirements in Section 3.4).
2.4. Subsequent Licenses
No Contributor makes additional grants as a result of Your choice to
distribute the Covered Software under a subsequent version of this
License (see Section 10.2) or under the terms of a Secondary License (if
permitted under the terms of Section 3.3).
2.5. Representation
Each Contributor represents that the Contributor believes its
Contributions are its original creation(s) or it has sufficient rights
to grant the rights to its Contributions conveyed by this License.
2.6. Fair Use
This License is not intended to limit any rights You have under
applicable copyright doctrines of fair use, fair dealing, or other
equivalents.
2.7. Conditions
Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted
in Section 2.1.
3. Responsibilities
-------------------
3.1. Distribution of Source Form
All distribution of Covered Software in Source Code Form, including any
Modifications that You create or to which You contribute, must be under
the terms of this License. You must inform recipients that the Source
Code Form of the Covered Software is governed by the terms of this
License, and how they can obtain a copy of this License. You may not
attempt to alter or restrict the recipients' rights in the Source Code
Form.
3.2. Distribution of Executable Form
If You distribute Covered Software in Executable Form then:
(a) such Covered Software must also be made available in Source Code
Form, as described in Section 3.1, and You must inform recipients of
the Executable Form how they can obtain a copy of such Source Code
Form by reasonable means in a timely manner, at a charge no more
than the cost of distribution to the recipient; and
(b) You may distribute such Executable Form under the terms of this
License, or sublicense it under different terms, provided that the
license for the Executable Form does not attempt to limit or alter
the recipients' rights in the Source Code Form under this License.
3.3. Distribution of a Larger Work
You may create and distribute a Larger Work under terms of Your choice,
provided that You also comply with the requirements of this License for
the Covered Software. If the Larger Work is a combination of Covered
Software with a work governed by one or more Secondary Licenses, and the
Covered Software is not Incompatible With Secondary Licenses, this
License permits You to additionally distribute such Covered Software
under the terms of such Secondary License(s), so that the recipient of
the Larger Work may, at their option, further distribute the Covered
Software under the terms of either this License or such Secondary
License(s).
3.4. Notices
You may not remove or alter the substance of any license notices
(including copyright notices, patent notices, disclaimers of warranty,
or limitations of liability) contained within the Source Code Form of
the Covered Software, except that You may alter any license notices to
the extent required to remedy known factual inaccuracies.
3.5. Application of Additional Terms
You may choose to offer, and to charge a fee for, warranty, support,
indemnity or liability obligations to one or more recipients of Covered
Software. However, You may do so only on Your own behalf, and not on
behalf of any Contributor. You must make it absolutely clear that any
such warranty, support, indemnity, or liability obligation is offered by
You alone, and You hereby agree to indemnify every Contributor for any
liability incurred by such Contributor as a result of warranty, support,
indemnity or liability terms You offer. You may include additional
disclaimers of warranty and limitations of liability specific to any
jurisdiction.
4. Inability to Comply Due to Statute or Regulation
---------------------------------------------------
If it is impossible for You to comply with any of the terms of this
License with respect to some or all of the Covered Software due to
statute, judicial order, or regulation then You must: (a) comply with
the terms of this License to the maximum extent possible; and (b)
describe the limitations and the code they affect. Such description must
be placed in a text file included with all distributions of the Covered
Software under this License. Except to the extent prohibited by statute
or regulation, such description must be sufficiently detailed for a
recipient of ordinary skill to be able to understand it.
5. Termination
--------------
5.1. The rights granted under this License will terminate automatically
if You fail to comply with any of its terms. However, if You become
compliant, then the rights granted under this License from a particular
Contributor are reinstated (a) provisionally, unless and until such
Contributor explicitly and finally terminates Your grants, and (b) on an
ongoing basis, if such Contributor fails to notify You of the
non-compliance by some reasonable means prior to 60 days after You have
come back into compliance. Moreover, Your grants from a particular
Contributor are reinstated on an ongoing basis if such Contributor
notifies You of the non-compliance by some reasonable means, this is the
first time You have received notice of non-compliance with this License
from such Contributor, and You become compliant prior to 30 days after
Your receipt of the notice.
5.2. If You initiate litigation against any entity by asserting a patent
infringement claim (excluding declaratory judgment actions,
counter-claims, and cross-claims) alleging that a Contributor Version
directly or indirectly infringes any patent, then the rights granted to
You by any and all Contributors for the Covered Software under Section
2.1 of this License shall terminate.
5.3. In the event of termination under Sections 5.1 or 5.2 above, all
end user license agreements (excluding distributors and resellers) which
have been validly granted by You or Your distributors under this License
prior to termination shall survive termination.
************************************************************************
* *
* 6. Disclaimer of Warranty *
* ------------------------- *
* *
* Covered Software is provided under this License on an "as is" *
* basis, without warranty of any kind, either expressed, implied, or *
* statutory, including, without limitation, warranties that the *
* Covered Software is free of defects, merchantable, fit for a *
* particular purpose or non-infringing. The entire risk as to the *
* quality and performance of the Covered Software is with You. *
* Should any Covered Software prove defective in any respect, You *
* (not any Contributor) assume the cost of any necessary servicing, *
* repair, or correction. This disclaimer of warranty constitutes an *
* essential part of this License. No use of any Covered Software is *
* authorized under this License except under this disclaimer. *
* *
************************************************************************
************************************************************************
* *
* 7. Limitation of Liability *
* -------------------------- *
* *
* Under no circumstances and under no legal theory, whether tort *
* (including negligence), contract, or otherwise, shall any *
* Contributor, or anyone who distributes Covered Software as *
* permitted above, be liable to You for any direct, indirect, *
* special, incidental, or consequential damages of any character *
* including, without limitation, damages for lost profits, loss of *
* goodwill, work stoppage, computer failure or malfunction, or any *
* and all other commercial damages or losses, even if such party *
* shall have been informed of the possibility of such damages. This *
* limitation of liability shall not apply to liability for death or *
* personal injury resulting from such party's negligence to the *
* extent applicable law prohibits such limitation. Some *
* jurisdictions do not allow the exclusion or limitation of *
* incidental or consequential damages, so this exclusion and *
* limitation may not apply to You. *
* *
************************************************************************
8. Litigation
-------------
Any litigation relating to this License may be brought only in the
courts of a jurisdiction where the defendant maintains its principal
place of business and such litigation shall be governed by laws of that
jurisdiction, without reference to its conflict-of-law provisions.
Nothing in this Section shall prevent a party's ability to bring
cross-claims or counter-claims.
9. Miscellaneous
----------------
This License represents the complete agreement concerning the subject
matter hereof. If any provision of this License is held to be
unenforceable, such provision shall be reformed only to the extent
necessary to make it enforceable. Any law or regulation which provides
that the language of a contract shall be construed against the drafter
shall not be used to construe this License against a Contributor.
10. Versions of the License
---------------------------
10.1. New Versions
Mozilla Foundation is the license steward. Except as provided in Section
10.3, no one other than the license steward has the right to modify or
publish new versions of this License. Each version will be given a
distinguishing version number.
10.2. Effect of New Versions
You may distribute the Covered Software under the terms of the version
of the License under which You originally received the Covered Software,
or under the terms of any subsequent version published by the license
steward.
10.3. Modified Versions
If you create software not governed by this License, and you want to
create a new license for such software, you may create and use a
modified version of this License if you rename the license and remove
any references to the name of the license steward (except to note that
such modified license differs from this License).
10.4. Distributing Source Code Form that is Incompatible With Secondary
Licenses
If You choose to distribute Source Code Form that is Incompatible With
Secondary Licenses under the terms of this version of the License, the
notice described in Exhibit B of this License must be attached.
Exhibit A - Source Code Form License Notice
-------------------------------------------
This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.
If it is not possible or desirable to put the notice in a particular
file, then You may include the notice in a location (such as a LICENSE
file in a relevant directory) where a recipient would be likely to look
for such a notice.
You may add additional accurate notices of copyright ownership.
Exhibit B - "Incompatible With Secondary Licenses" Notice
---------------------------------------------------------
This Source Code Form is "Incompatible With Secondary Licenses", as
defined by the Mozilla Public License, v. 2.0.
================================================
FILE: README.md
================================================
# Memory Cache
Memory Cache is a project that lets you save the webpage you're browsing in Firefox as a PDF to a synchronized folder, which can be used in conjunction with privateGPT to augment a local language model.
| ⚠️: This setup uses the primordial version of privateGPT. I'm working from a fork that can be found [here](https://github.com/misslivirose/privateGPT). |
| ---------------------------------------------------------------------------------------------------------------------- |
## Prerequisites
1. Set up [privateGPT](https://github.com/imartinez/privateGPT) - either using the primordial checkpoint, or from my fork.
2. Create a symlink between a 'MemoryCache' subdirectory in your default Downloads folder and a 'MemoryCache' directory created at /PrivateGPT/source_documents/MemoryCache
3. Apply patch to Firefox to add the `printerSettings.silentMode` property to the Tabs API. [See wiki page for instructions](https://github.com/Mozilla-Ocho/Memory-Cache/wiki/Modifying-Firefox-to-Save-PDF-files-automagically-to-MemoryCache)
4. Copy /scripts/run_ingest.sh into your privateGPT directory and run it to start `inotifywait` watching your downloads directory for new content
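The symlink in step 2 can be created from a terminal roughly as follows. The paths here are illustrative assumptions, not the project's canonical locations; substitute wherever your Downloads folder and privateGPT checkout actually live.

```shell
# Illustrative paths - adjust to your own setup.
DOWNLOADS="$HOME/Downloads/MemoryCache"
SOURCE_DOCS="$HOME/privateGPT/source_documents/MemoryCache"

# Make sure the Downloads subfolder and the source_documents parent exist.
mkdir -p "$DOWNLOADS" "$(dirname "$SOURCE_DOCS")"

# Link privateGPT's source_documents entry to the Downloads folder, so PDFs
# saved by the extension land where the ingest watcher can pick them up.
ln -sfn "$DOWNLOADS" "$SOURCE_DOCS"
```

After this, any PDF the extension drops into `~/Downloads/MemoryCache` is visible to privateGPT through the symlinked `source_documents/MemoryCache` entry.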
## Setting up the Extension
1. Clone the Memory-Cache GitHub repository to your local machine
2. In Firefox, navigate to `about:debugging` and click on 'This Firefox'
3. Click 'Load Temporary Add-on' and open the `extension/manifest.json` file in the cloned Memory-Cache directory
## Using the Extension
1. Under the 'Extensions' menu, add the Memory Cache extension to the toolbar
2. When you want to save a page to your Memory Cache, click the icon and select the 'Save' button. This will save the file silently as a PDF if you are using a Firefox build with the `printerSettings.silentMode` property addition.
================================================
FILE: docs/.gitignore
================================================
_site
.sass-cache
.jekyll-cache
.jekyll-metadata
vendor
================================================
FILE: docs/404.html
================================================
---
permalink: /404.html
layout: default
---
404
Page not found :(
The requested page could not be found.
================================================
FILE: docs/CNAME
================================================
memorycache.ai
================================================
FILE: docs/Gemfile
================================================
source "https://rubygems.org"
# Hello! This is where you manage which Jekyll version is used to run.
# When you want to use a different version, change it below, save the
# file and run `bundle install`. Run Jekyll with `bundle exec`, like so:
#
# bundle exec jekyll serve
#
# This will help ensure the proper Jekyll version is running.
# Happy Jekylling!
# This is the default theme for new Jekyll sites. You may change this to anything you like.
gem "minima", "~> 2.5"
# If you want to use GitHub Pages, remove the "gem "jekyll"" above and
# uncomment the line below. To upgrade, run `bundle update github-pages`.
gem "github-pages", "~> 228", group: :jekyll_plugins
# If you have any plugins, put them here!
group :jekyll_plugins do
gem "jekyll-feed", "~> 0.12"
end
# Windows and JRuby does not include zoneinfo files, so bundle the tzinfo-data gem
# and associated library.
platforms :mingw, :x64_mingw, :mswin, :jruby do
gem "tzinfo", ">= 1", "< 3"
gem "tzinfo-data"
end
# Performance-booster for watching directories on Windows
gem "wdm", "~> 0.1.1", :platforms => [:mingw, :x64_mingw, :mswin]
# Lock `http_parser.rb` gem to `v0.6.x` on JRuby builds since newer versions of the gem
# do not have a Java counterpart.
gem "http_parser.rb", "~> 0.6.0", :platforms => [:jruby]
gem "webrick", "~> 1.8"
================================================
FILE: docs/_config.yml
================================================
# Welcome to Jekyll!
#
# This config file is meant for settings that affect your whole blog, values
# which you are expected to set up once and rarely edit after that. If you find
# yourself editing this file very often, consider using Jekyll's data files
# feature for the data you need to update frequently.
#
# For technical reasons, this file is *NOT* reloaded automatically when you use
# 'bundle exec jekyll serve'. If you change this file, please restart the server process.
#
# If you need help with YAML syntax, here are some quick references for you:
# https://learn-the-web.algonquindesign.ca/topics/markdown-yaml-cheat-sheet/#yaml
# https://learnxinyminutes.com/docs/yaml/
#
# Site settings
# These are used to personalize your new site. If you look in the HTML files,
# you will see them accessed via {{ site.title }}, {{ site.email }}, and so on.
# You can create any custom variable you would like, and they will be accessible
# in the templates via {{ site.myvariable }}.
title: MemoryCache
email: oerickson@mozilla.com
description: MemoryCache is an experimental developer project to turn a local desktop environment into an on-device AI agent.
# baseurl: "/Memory-Cache" # the subpath of your site, e.g. /blog
url: "https://memorycache.ai/" # the base hostname & protocol for your site, e.g. http://example.com
github_username: Mozilla-Ocho
# Build settings
theme: minima
plugins:
- jekyll-feed
# Exclude from processing.
# The following items will not be processed, by default.
# Any item listed under the `exclude:` key here will be automatically added to
# the internal "default list".
#
# Excluded items can be processed by explicitly listing the directories or
# their entries' file path in the `include:` list.
#
# exclude:
# - .sass-cache/
# - .jekyll-cache/
# - gemfiles/
# - Gemfile
# - Gemfile.lock
# - node_modules/
# - vendor/bundle/
# - vendor/cache/
# - vendor/gems/
# - vendor/ruby/
================================================
FILE: docs/_includes/footer.html
================================================
================================================
FILE: docs/_includes/header.html
================================================
MemoryCache is an experimental development project to turn a local desktop environment into an on-device AI agent.
Every human is unique. The original vision of the personal computer was as a companion tool for creating intelligence, and the internet was born as a way to connect people and data together around the world. Today, artificial intelligence is upending the way that we interact with data and information, but control of these systems is most often provided through an API endpoint, run in the cloud, and abstracting away deep personal agency in favor of productivity.
MemoryCache, a Mozilla Innovation Project, is an experimental AI Firefox add-on that partners with privateGPT to quickly save your browser history to your local machine and have a local AI model ingest these - and any other local files you give it - to augment responses to a chat interface that comes built-in with privateGPT. We have an ambition to use MemoryCache to move beyond the chat interface, and find a way to utilize idle compute time to generate net new insights that reflect what you've actually read and learned - not the entirety of the internet at scale.
Design mockup of a future interface idea for MemoryCache
We're not breaking ground on AI innovation (in fact, we're using an old, "deprecated" file format from a whole six months ago), by design. MemoryCache is a project that allows us to sow some seeds of exploration into creating a deeply personalized AI experience that returns to the original vision of the computer as a companion for our own thought. With MemoryCache, weirdness and unpredictability is part of the charm.
We're a small team working on MemoryCache as a part-time project within Mozilla's innovation group, looking at ways that our personal data and files are used to form insights and new neural connections for our own creative purpose. We're working in the open not because we have answers, but because we want to contribute our way of thinking to one another in a way where others can join in.
{%- if site.disqus.shortname -%}
{%- include disqus_comments.html -%}
{%- endif -%}
================================================
FILE: docs/_posts/2023-11-06-introducing-memory-cache.markdown
================================================
---
layout: post
title: "Introducing Memory Cache"
date: 2023-11-06 14:47:57 -0500
categories: developer-blog
---
Most AI development today is centered around services. Companies offer tailored insights and powerful agents that can replicate all aspects of the human experience. AI is supposedly "passing the bar exam", diagnosing medical issues, and everything in-between. What is the role of a human being in an increasingly online world?
In practice, AI is a complex web of big data sources (e.g. the entirety of the internet). Pairing massive amounts of data with increasingly powerful cloud computing capabilities has resulted in unprecedented software development capabilities. Adopting naming practices and principles from science fiction stories, Silicon Valley is racing down a path towards a fictional idea of a "sentient computer" with AGI. Artificial intelligence is a field that gives us more modalities and capabilities to use with computers, and what really matters is how we (as humans) use the technology at hand.
Not that long ago, computing was grounded in the idea that digital literacy was a skill to be adopted and used in service of greater problems to be solved. We, as individuals, had control over our data, our files, our thoughts. Over the past several years, Big Tech has traded us systems of addictive social media sites for yottabytes of our personal data in the service of "personalization" - a.k.a targeted advertising.
Memory Cache is an exploration into human-first artificial intelligence, starting with the actual idea of the personal computer. The project is an experiment in local, on-premise AI: what you can do with a standard gaming desktop that sits in your home, and actually works for you. It bridges your browser history with your local file system, so that you can use the power of openly licensed AI and open source code to inspect, query, and tinker with an AI that is under your own control.
================================================
FILE: docs/_posts/2023-11-30-we-have-a-website.markdown
================================================
---
layout: post
title: "We have a website! And other MemoryCache Updates"
date: 2023-11-30 11:47:00 -0800
categories: developer-blog
---
We've been continuing our work on MemoryCache over the past several weeks, and are excited to have our [landing page](https://memorycache.ai) up and running. The updated design for MemoryCache has been something we've been iterating on, and it's been a fun process to talk about what sort of emotions we want to seed MemoryCache tools with. We've settled on an initial design style guide and have recently landed several contributions to enable a vanilla Firefox version of the extension.
Our team is working on MemoryCache as a sandbox for exploring concepts related to small, local, and patient AI. We're a small project with big ambitions, and look forward to continuing down several areas of exploration in the coming weeks, including:
* Building an app experience to automatically generate insights from newly added data, to act as a gentle reminder of places that can grow from your recently added content
* Updating the project website to include more details about the philosophy, design, and thinking around the project and how we envision it growing
* Competitive and secondary research reporting that we can publish that shares our insights and findings on how people think about recall and note-taking
* Understanding how to evaluate and generate personal insights outside of the chat interface model
* Exploring a social layer to easily distill and share insights within a trusted network of people
Follow along with us on our [GitHub repo](https://github.com/misslivirose/Memory-Cache) - we'd love to see you there!
================================================
FILE: docs/_posts/2024-03-01-memory-cache-and-ai-privacy.markdown
================================================
---
layout: post
title: "MemoryCache and AI Privacy"
date: 2024-03-01 07:08:57 -0500
categories: developer-blog
---
_Author: Liv Erickson_
It's been an exciting few months since we first shared [MemoryCache](https://memorycache.ai/developer-blog/2023/11/06/introducing-memory-cache.html), a home for experimenting with local-first AI that learns what you learn. While we'll be sharing more updates about what the team has been working on in the coming weeks by way of more regular development blogs, I wanted to share a podcast that I recently recorded with [Schalk Neethling for the Mechanical Ink Podcast](https://schalkneethling.substack.com/p/privacy-ai-and-an-ai-digital-memory) that goes in-depth about the principles behind MemoryCache.
[Watch the episode on YouTube](https://www.youtube.com/watch?v=CGdxLfcU9TU)
In this podcast, we go into the motivations behind the project and how it was originally created, as well as the broader challenges posed by the growing creation of synthetic content and the difficulty of preserving authentic connections online in an era of unprecedented, generated personalization.
At its core, MemoryCache is a project exploring what it means to urgently, yet collaboratively, envision futures where AI technologies enable us to build a more authentic relationship with information and ourselves through small acts of insight and [embracing friction](https://foundation.mozilla.org/en/blog/speculative-friction-in-generative-ai/) as it presents novel outcomes.
More coming soon!
================================================
FILE: docs/_posts/2024-03-06-designlog-update.markdown
================================================
---
layout: post
title: "MemoryCache March Design Update"
date: 2024-03-06 08:00:00 -0500
categories: developer-blog
---
_Author: Kate Taylor_
Hi! My name is Kate and I am a designer working on MemoryCache in the Mozilla Innovation organization. My official title is Design Technologist, which describes a focus on the intersection between humans and technology. Humans is an important word in this context because it is not specific to a group of people, but a recognition that the choices we make as technologists ripple out to humans who may or may not be aware of what happens with their information. Information is powerful in the world of AI, and the handling of it deserves genuine respect, which in turn builds an atmosphere of trust and safety.
MemoryCache serves as an open and safe testbed to explore the ideas of what humans need to both benefit from an AI agent while maintaining control over their information and the technical processes involved. As a designer on this project, my work is intended to create an environment that feels like a true augmentation of your creative thought work.
Early concepts and notes exploring the idea of what safety in an AI Agent experience looks like
When we started the project last year, the world was in a different place. Reflecting back on the time along with the goings-on in the AI space is bringing to mind a lot of big personal feelings as well as acknowledgment of the generally fearful vibes. I tend to pay attention to feelings because they are what allow us (as people) to find the things that matter most. It’s difficult to not sense an amount of fear when people are faced with a lot of change. This especially becomes clear when comparing conversations with people in and out of the tech field. This fear has brought a lot of very meaningful interpersonal conversations about what this technology means -- what job does it do well, what jobs do we (as people) do well and want to keep, how did we get to this point in time. These conversations are the motivation for contributing to Open Source AI because the power of understanding the world around you is boundless, and the barriers to this knowledge are mysterious but significant.
Awareness of the barriers to entry to an experience is where designers do their best work. We strive to find meaningful, lasting solutions to problems. Acknowledging the emotional barrier is the foundation of how we are thinking about MemoryCache. Depending on who you are, interacting with a chatbot carries baggage - in the same way that social situations differ across individuals. There are social aspects, language considerations, articulation differences, historical contexts, etc. This is A LOT to deal with as a user of a system. When approaching the design work for MemoryCache, our guiding light is to take into account the unique humanity of each person and allow for the ability to use the technology to create an environment that nurtures human needs rather than profiting from them.
Exploratory work for interactions that combine with the chat interface to interact with the agent in personalized ways for various use cases
Design mockups and explorations for the UI exploring themes and personalization in combination with input methods for interaction
This philosophy is core to the work we are doing with MemoryCache. We believe that personalized experiences for interacting with your own information provides a safe space for working with your thought materials with a lot of flexibility and personalized modularity.
When thinking about what safety means in relation to AI computing in 2024, this is a complicated subject. We are not just speaking about access to information, but access to people and the very things that make us human. Painting, drawing, and tinkering have always been the safe space in my life, personally. The process of making things provides the opportunity to explore your thoughts and experiences without the judgment or unsolicited opinions of others. This time spent reflecting tends to be where the most valuable ideas come to light in other areas of life (similar to the idea of “shower thoughts”). We are iterating on this concept with the idea that the agent could work with the person to create more of those shower-thought moments - flipping the interaction from asking the agent for a task to the agent offering insights that you could find valuable. The mockups below demonstrate our current thinking for an interface that we can start building and working with as needs evolve.
Latest design mockup for MemoryCache agent

We are excited about the potential of where MemoryCache can go. Part of the magic of developing in the open is that we will learn along the way what matters most to people and evolve from there. The logo and visual design we are running with for MemoryCache visualizes our hope for celebration of individuality and personal empowerment through technology advancements. The world can be a messy place, but your individual context is yours to control. In our next round of work, we are building on this philosophy to expose the Agent’s capabilities in meaningful interactions that support the ability to augment your thought work in the way that your brain thinks.
================================================
FILE: docs/_posts/2024-03-07-devlog.markdown
================================================
---
layout: post
title: "Memory Cache Dev Log March 7 2024"
date: 2024-03-07 08:00:00 -0500
categories: developer-blog
---
_Author: John Shaughnessy_
# Memory Cache Dev Log, March 7 2024
A couple months ago [we introduced Memory Cache](https://future.mozilla.org/blog/introducing-memorycache/):
> Memory Cache, a Mozilla Innovation Project, is an early exploration project that augments an on-device, personal model with local files saved from the browser to reflect a more personalized and tailored experience through the lens of privacy and agency
Since then we've been quiet.... _too quiet_.
## New phone, who dis?
It's my first time writing on this blog, so I want to introduce myself. My name is John Shaughnessy. I'm a software engineer at Mozilla.
I got involved in Memory Cache a few months ago by resolving an issue on the GitHub repo, and a couple weeks ago I started building Memory Cache V2.
## Why V2?
Memory Cache V1 was a browser extension and that made it convenient to collect and feed documents to an LLM-based program called `privateGPT`. PrivateGPT would break the documents into fragments, save those fragments in a vector database, and let you perform similarity search on those documents via a command line interface. We were running an old version of PrivateGPT based on LangChain.
There were several big, obvious technical gaps between Memory Cache V1 and what we'd need in order to do the kind of investigative research and product development we wanted to do.
It seemed to me that if we really wanted to explore the design space, we'd need to roll our own backend and ship a friendly client UI alongside it. We'd need to speed up inference and we'd need more control over how it ingested documents, inserted context, batched background tasks and presented information to you.
We also needed to fix the "getting started" experience. Setting up V1 required users to be comfortable working on the command line, managing python environments, and in general understanding their file system. As far as I'm aware, there are only three of us who have gone through the steps to actually set up and run V1. We were inspired by [Llamafile](https://github.com/Mozilla-Ocho/llamafile/) and [cosmopolitan](https://justine.lol/ape.html), which create executables that you just download and run on many platforms.
And lastly, we're excited about multiplayer opportunities. Could insights that my LLM generates become useful to my teammates? Under what circumstances would I want to share my data with others? How should I separate what's private, semi-private, or public?
### Running LLMs for Inference
I wasn't very familiar with running LLMs, and I certainly hadn't written an application that did "Retrieval-Augmented Generation" (RAG), which was what we wanted Memory Cache to do. So I started down a long, winding path.
Liv and I chatted with Iván Martínez who wrote `privateGPT`. He was super helpful! And it was exciting to talk to someone who'd built something that let us prototype what we wanted to do so quickly.
Mozilla had just announced [Llamafile](https://github.com/Mozilla-Ocho/llamafile), which seemed like a great way to package an LLM and serve it on many platforms. I wasn't familiar with either [Llama.cpp](https://github.com/ggerganov/llama.cpp) or [cosmo](https://cosmo.zip/), so there was a lot to learn. [Justine](https://github.com/jart) and [Stephen](https://github.com/stlhood) were incredibly helpful and generous with their time. I didn't contribute much back to the project other than trying to write accurate reports of a couple issues ([#214](https://github.com/Mozilla-Ocho/llamafile/issues/214), [#232](https://github.com/Mozilla-Ocho/llamafile/issues/232)) I ran into along the way.
Initially when I was looking into `Llamafile`, I wanted to repackage `privateGPT` as a `Llamafile` so that we could distribute it as a standalone executable. I eventually realized this wasn't a good idea. `Llamafile` bundles `Llama.cpp` programs as executables. `Cosmopolitan` _can_ also bundle things like a python interpreter, but tracking down platform-specific dependencies of `privateGPT` and handling them in a way that was compatible with cosmo was not going to be straightforward. It's just not what the project was designed to do.
Once I worked through the issues I was having with my GPU, I was amazed and excited to see how fast LLMs can run. I made a short comparison video that shows the difference: [llamafile CPU vs GPU](https://www.youtube.com/watch?v=G9wBw8jLJwU).
I thought I might extend the simple HTTP server that's baked into `Llamafile` with all the capabilities we'd want in Memory Cache. Justine helped me get some "hello world" programs up and running, and I started reading some examples of C++ servers. I'm not much of a C++ programmer, and I was not feeling very confident that this was the direction I really wanted to go.
I like working in Rust, and I knew that Rust had some kind of story for getting `C` and `C++` bindings working, so I wrote a kind of LLM "hello world" program using [rustformers/llm](https://github.com/rustformers/llm). But after about a week of fiddling with Llamafile, Llama.cpp, and rustformers, I felt like I was going down a bit of a rabbit hole, and I wanted to pull myself back up to the problem at hand.
### Langchain and Langserve
Ok. So if we weren't going to build out a C++ or Rust server, what _should_ we be doing? `PrivateGPT` was a python project, and the basic functionality was similar to what I'd done in some simple programs I'd written with hugging face's `transformers` library. (I mentioned these in a [blog post](https://johnshaughnessy.com/blog/posts/osai-kube) and [talk](https://www.youtube.com/watch?v=AHd3jCMQQLs) about upskilling in AI.)
It seemed like `LangChain` and `LlamaIndex` were the two popular python libraries / frameworks for building RAG apps, so I wrote a "hello world" with LangChain. It was... fine. But it seemed like a _lot_ more functionality (and abstraction, complexity, and code) than I wanted.
I ended up dropping the framework after reading the docs for `ChromaDB` and `FastAPI`.
`ChromaDB` is a vector database for storing document fragments and running similarity search over them (the fundamentals of a RAG system).
I needed to choose a database, and I chose this one arbitrarily. Langchain had an official "integration" for Chroma, but I felt like Chroma was so simple that I couldn't imagine an "integration" being helpful.
`FastAPI` is a python library for setting up http servers, and is "batteries included" in some very convenient ways:
- It's compatible with [pydantic](https://docs.pydantic.dev/) which lets you define types, validate user input against them, and generate `OpenAPI` spec from them.
- It comes with [swagger-ui](https://github.com/swagger-api/swagger-ui) which gives an interactive browser interface to your APIs.
- It's compatible with a bunch of other random helpful things like [python-multipart](https://github.com/Kludex/python-multipart).
The other thing to know about `FastAPI` is that as far as HTTP libraries go, it's very easy to use. I was reading documentation about `Langserve`, which seemed like a kind of fancy server for `Langchain` apps, until I realized that `FastAPI`, `pydantic`, `swagger-ui`, et al. were doing all the heavy lifting.
So, I dropped LangChain and Langserve and decided I'd wait until I encountered an actually hard problem before picking up another framework. (And who knows -- such a problem might be right around the corner!)
It helped to read the LangChain docs and code to figure out what RAG even is. After that I was able to get a basic RAG app working (without the framework). I felt pretty good about it.
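The skeleton of such a basic RAG app fits in a few lines. A toy sketch, using word-overlap in place of real embedding similarity and leaving the LLM call out:

```python
def top_fragments(query: str, fragments: list[str], k: int = 2) -> list[str]:
    # Toy "similarity": count shared lowercase words. A real app would
    # compare embedding vectors stored in a vector database instead.
    q = set(query.lower().split())
    ranked = sorted(fragments, key=lambda f: len(q & set(f.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str, fragments: list[str]) -> str:
    # Retrieval-augmented generation: stuff the retrieved fragments into
    # the prompt, then hand the whole thing to the LLM.
    context = "\n".join(top_fragments(query, fragments))
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The answer then comes from sending `build_prompt(...)` to whatever inference server you're running.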
### Inference
I still needed to decide how to run the LLM. I had explored Llamafiles and Hugging face's `transformers` library. The other popular option seemed to be `ollama`, so I gave that a shot.
`Ollama` ended up being very easy to get up and running. I don't know very much about the project. So far I'm a fan. But I didn't want users of Memory Cache to have to download and run an LLM inference server/process by themselves. It just feels like a very clunky user experience. I want to distribute ONE executable that does everything.
Maybe I'm out of the loop, but I didn't feel very good about any of the options. Like, what I really wanted was to write a python program that handled RAG, project files, and generating various artifacts by talking to an LLM; I also wanted it to run the LLM itself, and to be an HTTP server that serves a browser client. I suppose that's a complex list of requirements, but it seemed like a reasonable approach for Memory Cache. And I didn't find any examples of people doing this.
I had the idea of using Llamafiles for inference and a python web server for the rest of the "brains", which could also serve a static client. That way, the python code stays simple (it doesn't bring with it any of the transformers / hugging face / llm / cuda code).
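A sketch of that split: the Python process launches a llamafile as a subprocess and talks to it over plain HTTP. The flags and the `/completion` endpoint below come from llama.cpp's built-in server; treat them as assumptions to check against your llamafile's `--help`:

```python
import json
import subprocess
import urllib.request

def llamafile_command(path: str, port: int = 8080) -> list[str]:
    # Flags assumed from llama.cpp's server; verify against your llamafile.
    return [path, "--server", "--nobrowser", "--port", str(port)]

def start_llamafile(path: str, port: int = 8080) -> subprocess.Popen:
    # The hub keeps the handle so it can terminate the server on shutdown.
    return subprocess.Popen(llamafile_command(path, port))

def complete(prompt: str, port: int = 8080, n_predict: int = 128) -> str:
    # llama.cpp's server exposes a JSON /completion endpoint.
    req = urllib.request.Request(
        f"http://localhost:{port}/completion",
        data=json.dumps({"prompt": prompt, "n_predict": n_predict}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

Because the Python side only speaks HTTP, none of the GPU/CUDA complexity leaks into its dependency list.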
### Memory Cache Hub
I did a series of short spikes to piece together exactly how such an app could work. I wrote a bit about each one in this [PR](https://github.com/Mozilla-Ocho/Memory-Cache/pull/58).
In the end, I landed on a (technical) design that I'm pretty happy with. I'm putting the pieces together in a repo called [Memory Cache Hub](https://github.com/johnshaughnessy/Memory-Cache-Hub/) (which will graduate to the Mozilla-Ocho org when it's ready). The [README.md](https://github.com/johnshaughnessy/Memory-Cache-Hub/blob/main/README.md) has more details, but here's the high level:
```plaintext
Memory Cache Hub is a core component of Memory Cache:
- It exposes APIs used by the browser extension, browser client, and plugins.
- It serves static files including the browser client and various project artifacts.
- It downloads and runs llamafiles as subprocesses.
- It ingests and retrieves document fragments with the help of a vector database.
- It generates various artifacts using prompt templates and large language models.
Memory Cache Hub is designed to run on your own machine. All of your data is stored locally and is never uploaded to any server.
To use Memory Cache Hub:
- Download the latest release for your platform (Windows, MacOS, or GNU/Linux)
- Run the release executable. It will open a new tab in your browser showing the Memory Cache GUI.
- If the GUI does not open automatically, you can navigate to http://localhost:4444 in your browser.
Each release build of Memory Cache Hub is a standalone executable that includes the browser client and all necessary assets. By "standalone", we mean that you do not need to install any additional software to use Memory Cache.
A Firefox browser extension for Memory Cache that extends its functionality is also available. More information can be found in the main Memory Cache repository.
```
There are two key ideas here:
- Inference is provided by llamafiles that the hub downloads and runs.
- We use `PyInstaller` to bundle the hub and the browser client into a single executable that we can release.
The rest of the requirements are handled easily in python because of the great libraries and tools that are available (`fastapi`, `pydantic`, `chromadb`, etc).
Getting these two key ideas to work was challenging. I'm not a python expert, so figuring out the `asyncio` and `subprocess` stuff to download and run llamafiles was tricky. And `PyInstaller` has a long list of "gotchas" and "beware" warnings in its docs. I'm still not convinced I'm using it correctly, even though the executables I'm producing seem to be doing the right thing.
## The Front End
By this time I had built three measly, unimpressive browser clients for Memory Cache. The first was compatible with `privateGPT`, the second with some early versions of the Memory Cache Hub. I built the third with `gradio` but quickly decided that it did not spark joy.
And none of these felt like good starting points for a designer to jump into the building process.
I've started working on a kind of "hello world" dashboard for Memory Cache using `tailwindcss`. I want to avoid reinventing the wheel and make sure the basic interactions feel good.
I've exposed most of the Hub's APIs in the client interface by now. It doesn't look or feel good yet, but it's good to have the basic capabilities working.
## What We're Aiming For
The technical pieces have started to fall into place. We're aiming to have the next iteration of Memory Cache - one that you can easily download and run on your own machine - in a matter of weeks. In an ideal world, we'd ship by the end of Q1, which is a few weeks away.
It won't be perfect, but it'll be far enough along that the feedback we get will be much more valuable, and will help shape the next steps.
================================================
FILE: docs/_posts/2024-03-15-devlog.markdown
================================================
---
layout: post
title: "Memory Cache Dev Log March 15 2024"
date: 2024-03-15 08:00:00 -0500
categories: developer-blog
---
_Author: John Shaughnessy_
# Memory Cache Dev Log, March 15 2024
Last Friday, during a casual weekly engineering call, a colleague asked how LLM libraries (llama.cpp, llamafile, langchain, llamaindex, HF transformers, ollama, etc) handle the different chat templates and special tokens that models train on. It was a good question, and none of us seemed to have a complete answer.
The subsequent discussion and research made me realize that a naive approach towards writing a model-agnostic application would have unfortunate limitations. Insofar as the differences between models actually matter to the use case, application developers should write model-specific code.
## Text Summarization
I thought about the capabilities that are important for an application like Memory Cache, and which models would be good at providing those capabilities. The first obvious one was text summarization. I had "hacked" summarization by asking an assistant-type model (llamafile) to summarize text, but a model trained specifically for text summarization would be a better fit.
I tested a popular text summarization model with HF transformers, since I didn't find any relevant llamafiles. (If they're out there, I don't know how to find them.) I wanted to make sure that HF code could be built and bundled into a native application with PyInstaller, since that's how we want to build and bundle Memory Cache as a standalone executable. I verified that it could with a [small test project](https://github.com/johnshaughnessy/summarization-test).
Bundling HF dependencies like pytorch increases the complexity of the release process because we'd go from 3 build targets (MacOS, Windows, Linux) to 8 build targets (assuming support for every platform that pytorch supports):
- `Linux` + `CUDA 11.8`
- `Linux` + `CUDA 12.1`
- `Linux` + `ROCm 5.7`
- `Linux` + `CPU`
- `Mac` + `CPU`
- `Windows` + `CUDA 11.8`
- `Windows` + `CUDA 12.1`
- `Windows` + `CPU`
It's good to know that this is possible, but since our near-term goal for Memory Cache is just to prove out the technical bits mentioned in the [previous dev log](https://memorycache.ai/developer-blog/2024/03/07/devlog.html), we'll likely stick with the text summarization "hack" for now.
## Training Agents
Text summarization is still a simple task (in terms of inference inputs and outputs), so models trained to summarize are likely interchangeable for the most part (modulo input/output lengths). However, once we start looking at more complicated types of tasks (like tool use / function calling / memory), the differences between models will be exaggerated.
Consider an example like [this dataset](https://huggingface.co/datasets/smangrul/assistant_chatbot_dataset) meant to help train a model to act with agentic intentions, beliefs (memory), actions, and chat:
```
Context:
<|begincontext|><|beginlastuserutterance|>I am feeling hungry so I would like to find a place to eat.<|endlastuserutterance|><|endcontext|>
```
```
Target:
<|begintarget|><|begindsts|><|begindst|><|beginintent|>FindRestaurants<|endintent|><|beginbelief|><|endbelief|><|enddst|><|enddsts|><|beginuseraction|>INFORM_INTENT->Restaurants^intent~FindRestaurants<|enduseraction|><|beginaction|>REQUEST->Restaurants^city~<|endaction|><|beginresponse|>Do you have a specific which you want the eating place to be located at?<|endresponse|><|endtarget|>
```
Here, we can see that there are _many_ special tokens that the application developer would need to be aware of:
```
- <|begincontext|> / <|endcontext|>
- <|beginlastuserutterance|> / <|endlastuserutterance|>
- <|begintarget|> / <|endtarget|>
- <|begindsts|>, <|begindst|>, <|beginintent|>, <|beginbelief|>, <|beginuseraction|>, <|beginaction|>, <|beginresponse|> (and their matching end tokens)
```
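Concretely, application code targeting this model has to assemble prompts out of those exact token strings. A minimal sketch for the `Context` side, with token names taken from the dataset example above:

```python
def wrap_context(last_user_utterance: str) -> str:
    # These token names come from the dataset example above; a different
    # model would use entirely different special tokens.
    return (
        "<|begincontext|><|beginlastuserutterance|>"
        + last_user_utterance
        + "<|endlastuserutterance|><|endcontext|>"
    )
```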
Research on how to train these types of models is still rapidly evolving. I suspect attempting to abstract away these differences will lead to leaky or nerfed abstractions in libraries and toolkits. For now, my guess is that it's better to write application code targeting the specific models you want to use.
## Conclusion
Even if a dedicated text summarization model doesn't make it into the upcoming release, this was a valuable excursion. These are the exact types of problems I hoped to stumble over along the way.
================================================
FILE: docs/_posts/2024-04-19-memory-cache-hub.markdown
================================================
---
layout: post
title: "Memory Cache Hub"
date: 2024-04-19 08:00:00 -0500
categories: developer-blog
---
_Author: John Shaughnessy_
# Memory Cache Hub
In a [dev log](https://memorycache.ai/developer-blog/2024/03/07/devlog.html) last month, I explained why we were building Memory Cache Hub. We wanted:
- our own backend to learn about and play around with [`RAG`](https://python.langchain.com/docs/expression_language/cookbook/retrieval),
- a friendly browser-based UI,
- to experiment with `llamafile`s,
- to experiment with bundling python files with `PyInstaller`.
The work outlined in that blog post is done. You can try Memory Cache Hub by following the [installation instructions in the README](https://github.com/Mozilla-Ocho/Memory-Cache-Hub?tab=readme-ov-file#installation).
My goal for Memory Cache Hub was just to get the project to this point. It was useful to build and I learned a lot, but there are no plans to continue development.
For the rest of this post, I would like to share some thoughts/takeaways from working on the project.
- Client/Server architecture is convenient, especially with OpenAPI specs.
- Browser clients are great until they're not.
- Llamafiles are relatively painless.
- Python and PyInstaller pros/cons.
- Github Actions and large files.
- There's a lot of regular (non-AI) work that needs doing.
## The Client / Server Architecture is convenient, especially with OpenAPI specs.
This is probably a boring point to start with, and it's old news. Still, I thought it'd be worth mentioning a couple of ways that it turned out to be nice to have the main guts of the application implemented behind an HTTP server.
When I was testing out `llamafiles`, I wanted to try enabling GPU acceleration, but my main development machine had some compatibility issues. Since Memory Cache was built as a separate client/server, I could just run the server on another machine (with a compatible GPU) and run the client on my main development machine. It was super painless.
We built Memory Cache with on-device AI in mind, but another form that could make sense is to run AI workloads on a dedicated homelab server (e.g. "an Xbox for AI") or in a private cloud. If the AI apps expose everything over HTTP APIs, it's easy to play around with these kinds of setups.
Another time I was glad to have the server implemented separately from the client was when I wanted to build an emacs plugin that leveraged RAG + LLMs for my programming projects. I wrote about the project [on my blog](https://www.johnshaughnessy.com/blog/posts/acorn_pal_emacs). As I was building the plugin I realized that I could probably just plug in to Memory Cache Hub instead of building another RAG/LLM app. It ended up working great!
Unfortunately, I stopped working on the emacs plugin and left it in an unfinished state, mostly because I couldn't generate `elisp` client code from Memory Cache Hub's [OpenAPI](https://openapi-generator.tech/docs/generators) spec. By the way, if anyone wants to write an elisp generator for OpenAPI, that would be really great!
I ended up generating typescript code from the OpenAPI spec for use in the [Memory Cache Browser Client](https://github.com/Mozilla-Ocho/Memory-Cache-Browser-Client). The relevant bit of code was this:
```sh
# Download the openapi.json spec from the server
curl http://localhost:4444/openapi.json > $PROJECT_ROOT/openapi.json
# Generate typescript code
yarn openapi-generator-cli generate -i $PROJECT_ROOT/openapi.json -g typescript-fetch -o $PROJECT_ROOT/src/api/
```
## Browser Clients Are Great, Until They're Not
I am familiar with web front end tools -- React, javascript, parcel, css, canvas, etc. So, I liked the idea of building a front end for Memory Cache in the browser. No need to bundle things with [electron](https://www.electronjs.org/), and no need to train other developers I might be working with (who were also mostly familiar with web development).
For the most part, this worked out great. While the UI isn't "beautiful" or "breathtaking", it was painless and quick to build, and it'd be easy for someone who really cared about it to come in and improve things.
That said, there were a couple of areas where working in the browser was pretty frustrating:
1. You can't specify directories via a file picker.
2. You can't directly send the user to a file URL.
### No file picker for me
The way Memory Cache works is that the user specifies directories in their filesystem whose files they want to add to their caches. The server will make its own copies of the files in those directories for ingestion and such. The problem is that while browsers have built-in support for a file upload window, there's no way to tell the browser that we want the user to specify full paths to directories on their hard drive.
It's not surprising that browsers don't support this. This isn't really what they're made for. But it means that for this initial version of Memory Cache Hub, I settled for telling the user to type complete file paths into an input field rather than having a file picker UI. This feels really bad and particularly unpolished, even for a demo app.
### No file previews
The browser acts as a file viewer if you specify the path of a file prefixed with `file://` in the address bar. This is convenient, because I wanted to let users easily view the files in their cache.
Unfortunately, due to security concerns, the browser disallows redirects to `file://` links. This means that the best I could do for Memory Cache was provide a "copy" button that puts the `file://` URI onto the user's clipboard. Then, they can open a new tab, paste the URL and preview the file. This is a much worse experience.
My client could also have provided file previews (e.g. of PDFs) directly, with the server sending the file contents to the client, but I didn't end up going down this route.
Again, this isn't surprising because I'm mostly using the browser as an application UI toolkit and that's not really what it's for. Electron (or something like it) would have been a better choice here.
## Llamafiles are (relatively) painless
Using `llamafiles` for inference turned out to be an easy win. The dependencies of my python application stayed pretty simple because I didn't need to bring in hugging face / pytorch dependencies (and further separate platforms along `CUDA`/`ROCm`/`CPU` boundaries).
There are some "gotchas" with using `llamafiles`, most of which are documented in the [`llamafile README`](https://github.com/Mozilla-Ocho/llamafile?tab=readme-ov-file). For example, I didn't end up enabling GPU support because I didn't spend time on handling the errors that can occur if `llamafile` fails to move model weights to the GPU for whatever reason. There are also still some platform-specific troubleshooting tips you need to follow if the `llamafile` server fails to start.
Still, my overall feeling was that this was a pretty nice way to bundle an inference server with an application, and I hope to see more models bundled as `llamafile`s in the future.
## Python and PyInstaller Pros and Cons
I'm not very deeply embedded in the Python world, so figuring out how people built end-user programs with Python was new to me. For example, I know that [Blender](https://www.blender.org/) has a lot of python code, but as far as I can tell, the core is built with C and C++.
I found `PyInstaller` and had success building standalone executables with it (as described in [this previous blog post](https://memorycache.ai/developer-blog/2024/03/07/devlog.html) and [this one too](https://memorycache.ai/developer-blog/2024/03/15/devlog.html)).
It worked, which is great. But there were some hurdles and downsides.
The first complaint is about the way `single-file` builds work. At startup, they need to unpack their supporting files. In our case, we had roughly 10,000 supporting files (which is probably our fault, not `PyInstaller`'s) that get unpacked to a temporary directory. This takes ~30 seconds of basically just waiting around, with no progress indicator or anything of that nature. `PyInstaller` has an experimental feature for [adding a splash screen](https://pyinstaller.org/en/stable/usage.html#splash-screen-experimental), but I didn't end up trying it because of the disclaimer at the top explaining that the feature doesn't work on macOS. So, the single-file executable version of Memory Cache Hub appears to hang for 30 seconds when you start it, before eventually finishing the unpacking process.
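For context, PyInstaller's documentation describes the standard pattern for locating those unpacked files at runtime: a frozen app reads them from the temporary directory `sys._MEIPASS`. A sketch (the `resource_path` name is mine, though the pattern is from PyInstaller's docs):

```python
import os
import sys

def resource_path(relative: str) -> str:
    """Locate a bundled data file, both in development and in a PyInstaller build."""
    if getattr(sys, "frozen", False):
        # In a single-file build the archive has been unpacked here --
        # that unpacking is the ~30-second startup delay described above.
        base = getattr(sys, "_MEIPASS")
    else:
        base = os.path.abspath(".")
    return os.path.join(base, relative)

print(resource_path(os.path.join("static", "index.html")))
```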
The second complaint is not really about `PyInstaller` and more about using Python at all: in the end, we're still running a Python interpreter at runtime. There's no real "compile to bytecode/machine code" step (except for those dependencies written in something like [Cython](https://cython.org/)). Python seems to be the best-supported ecosystem for ML / AI developer tools, but part of me wishes I were spending my time in C or Rust. Not that I'm excellent with those languages, but considering that I get better at whatever I spend time doing, I'd rather be getting better at things that give me more control over what the computer is actually doing.
Nothing is stopping me from choosing different tools for my next project, and after all - `llama.cpp` is pretty darn popular and I'm looking forward to trying the (rust-based) [burn](https://github.com/tracel-ai/burn) project.
## GitHub Actions and Large Files
Ok, so here's another problem with my gigantic bundled python executables with 10,000 files... My build pipeline takes 2+ hours to finish!
Uploading the build artifacts from the runner to GitHub takes a long time -- especially for the zips that have over 10,000 files in them. This feels pretty terrible. Again, I think the problem is not with GitHub or PyInstaller or anything like that -- the problem is thinking that shipping 10,000 files was a good idea. It wasn't -- I regret it. haha.
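One mitigation I could have reached for (a sketch of the idea, not what our workflow actually does) is compressing the build output into a single archive before handing it to the upload step, so the runner transfers one big file instead of thousands of small ones:

```python
import shutil
import tempfile
from pathlib import Path

# Stand-in for the PyInstaller output directory full of small files.
workdir = Path(tempfile.mkdtemp())
dist = workdir / "dist"
dist.mkdir()
for i in range(5):  # imagine ~10,000 of these
    (dist / f"support-{i}.dat").write_text("data")

# One zip per platform means a single artifact upload instead of
# thousands of per-file transfers.
archive = shutil.make_archive(str(workdir / "memory-cache-hub"), "zip", root_dir=dist)
print(archive)
```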
## There's a lot of non-AI work to be done
90% of the effort I put into this project was completely unrelated to AI, machine learning, RAG, etc. It was all, "How do I build a Python server?", "How do we want this browser client to work?", "What's PyInstaller?", "How do we set up GitHub Actions?", etc.
The idea was that once we had all of this groundwork out of the way, we'd have a (user-facing) playground to experiment with whatever AI stuff we wanted. That's all well and good, but I'm not sure how much time we'll actually spend in that experimentation phase, since in the meantime, many other projects have been vying for our attention.
My thoughts about this at the moment are twofold.
First, if your main goal is to experiment and learn something about AI or ML -- Don't bother trying to wrap it in an end-user application. Just write your python program or Jupyter notebook or whatever and do the learning. Don't worry if it doesn't work on other platforms or only supports whatever kind of GPU you happen to be running -- none of that changes the math / AI / deep learning / ML stuff that you were actually interested in. All of that other stuff is a distraction if all you wanted was the core thing.
However -- if your goal is to experiment with an AI or ML thing that you want people to use -- then get those people on board using your thing as fast as possible, even if that means getting on a call with them, having them share their desktop, and following your instructions to set up a Python environment. Whatever you need to do to get them actually running your code, using your thing, and giving you feedback -- that's the hurdle you should cross. That doesn't mean you have to ship to the whole world. Maybe you know your thing is not ready for that. But if you have a particular user in mind and you want them involved and giving you constant feedback, it's good to bring them in early.
## What Now?
Learning the build / deploy side of things was genuinely useful. I'd never built a Python application like this one before, and I enjoyed myself along the way.
There's been some interest in connecting Memory Cache more directly with the browser history, with Slack, with emails, and other document/information sources. That direction is probably pretty useful -- and a lot of other people are exploring that space too.
However, I'll likely leave that to others. My next projects will be unrelated to Memory Cache. There are a lot of simple ideas I want to play around with in the space just to deepen my understanding of LLMs, and there are a lot of projects external to Mozilla that I'd like to learn more about and maybe contribute to.
## Screenshots
- Files
- About Memory Cache
- Vector Search
- Chat depends on a model running
- The model selection page
- Retrieval augmented chat
================================================
FILE: docs/_sass/memorycache.scss
================================================
@import url('https://fonts.googleapis.com/css2?family=Work+Sans:wght@300;400;500;600&display=swap');
body {
  font-family: 'Work Sans';
}

.site-title {
  font-family: 'Work Sans';
  color: coral;
}

.introduction {
  background-image: url('../assets/images/header-background.png');
  background-repeat: no-repeat;
  background-size: 675px 210.75px;
  background-position: right;
  width: 100%;
  min-height: 201px;
  border-bottom-width: 1px;
  border-bottom-style: solid;
  border-bottom-color: #F0EFEF;
}

.page-content {
  background-color: white;
}

a {
  color: #180AB8;
}

.site-logo {
  width: 166px;
  height: 36px;
}

.site-header {
  border-top-width: 5px;
  border-top-style: solid;
  border-image: linear-gradient(90deg, #FFB3BB 0%, #FFDFBA 26.56%, #FFFFBA 50.52%, #87EBDA 76.04%, #BAE1FF 100%) 1 0 0 0;
}

.callout-left {
  width: 75%;
  padding-top: 5%;
}

.page-content {
  border-top-width: 1px;
  border-top-style: solid;
  border-top-color: #F0EFEF;
}

.post-list-heading {
  font-size: 16px;
  padding-top: 5px;
}

.post-link {
  font-size: 16px;
}

.moz-logo {
  width: 128px;
  float: right;
}

.detailed-overview {
  padding-top: 5px;
  border-top-width: 1px;
  border-bottom-width: 1px;
  border-top-style: solid;
  border-bottom-style: solid;
  border-top-color: #F0EFEF;
  border-bottom-color: #F0EFEF;
}

figcaption {
  text-align: center;
  font-style: italic;
  margin: 1em 0 3em 0;
}
================================================
FILE: docs/_sass/minima.scss
================================================
@charset "utf-8";
// Define defaults for each variable.
$base-font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol" !default;
$base-font-size: 16px !default;
$base-font-weight: 400 !default;
$small-font-size: $base-font-size * 0.875 !default;
$base-line-height: 1.5 !default;
$spacing-unit: 30px !default;
$text-color: #111 !default;
$background-color: #fdfdfd !default;
$brand-color: #2a7ae2 !default;
$grey-color: #828282 !default;
$grey-color-light: lighten($grey-color, 40%) !default;
$grey-color-dark: darken($grey-color, 25%) !default;
$table-text-align: left !default;
// Width of the content area
$content-width: 800px !default;
$on-palm: 600px !default;
$on-laptop: 800px !default;
// Use media queries like this:
// @include media-query($on-palm) {
//   .wrapper {
//     padding-right: $spacing-unit / 2;
//     padding-left: $spacing-unit / 2;
//   }
// }
@mixin media-query($device) {
  @media screen and (max-width: $device) {
    @content;
  }
}

@mixin relative-font-size($ratio) {
  font-size: $base-font-size * $ratio;
}

// Import partials.
@import
  "minima/base",
  "minima/layout",
  "minima/syntax-highlighting"
;
================================================
FILE: docs/about.markdown
================================================
---
layout: page
title: About
permalink: /about/
---
Memory Cache is an exploration into synthesis, discovery, and sharing of insights more effectively through the use of technology.
Unlike most explorations into artificial intelligence, Memory Cache is designed to be completely personalized, on-device, and private. It is meant to explore the nuances of how individuals think, from the perspective of learning alongside us over time from the articles we read and save.
================================================
FILE: docs/assets/main.scss
================================================
---
---
@import "minima";
@import "memorycache";
================================================
FILE: docs/faq.md
================================================
---
layout: default
title: FAQ
---
# Frequently Asked Questions
**Q: How do I try MemoryCache?**
Right now (as of December 7, 2023), MemoryCache requires a few manual steps to set up the end-to-end workflow. There are three components: a) a Firefox extension, b) a local instance of privateGPT, and c) a symlinked folder between privateGPT and your local Downloads folder. There is also an optional configuration of a private build of Firefox to save files to your local machine as PDF files instead of HTML files. Check out the [GitHub repository](https://github.com/Mozilla-Ocho/Memory-Cache) for more detailed instructions. We are looking into ways to streamline the deployment of MemoryCache so it requires less manual configuration, but if you're here at this stage, you're at the very earliest stages of our explorations.
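As a sketch of step (c), the symlink amounts to a single operation. The folder names below are illustrative stand-ins (the repository's instructions give the real paths):

```python
import tempfile
from pathlib import Path

# Stand-in directories; in a real setup these would be your Downloads
# folder and privateGPT's document-ingestion folder.
workdir = Path(tempfile.mkdtemp())
downloads = workdir / "Downloads"
downloads.mkdir()

link = workdir / "privateGPT" / "source_documents"
link.parent.mkdir()
link.symlink_to(downloads, target_is_directory=True)

print(link.is_symlink())  # True
```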
**Q: Does MemoryCache send my data anywhere?**
No. One of the core principles of MemoryCache is that you have full control over the system, and that it all stays on your device. If you're a developer or someone who just likes to tinker with your computer applications, and you want to cloud-ify this, feel free! But we're looking to stay entirely local.
**Q: Why is MemoryCache using an old language model and primordial privateGPT?**
MemoryCache uses an old language model ([Nomic AI's gpt4all-j v1.3 groovy.ggml](https://huggingface.co/nomic-ai/gpt4all-j)) and primordial privateGPT because, right now, this combo is the one that passes our criteria for the type of responses it generates. This tech is almost a year old, and there have been many advancements in local AI that we'll integrate over time, but we're a small team exploring a lot of different subsets of this problem space, and the quality of the insight generated is a sweet spot we want to preserve. This is a temporary tradeoff, but we want to be careful to keep a consistent benchmark for insight generation.
GPT-J was trained on the 'Pile' dataset, and the versions between the 1.0 release and the 1.3 release also added the ShareGPT and Dolly datasets. The Databricks Dolly dataset is licensed under the Creative Commons license, with human contributions and Wikipedia entries. The ShareGPT dataset consists of human prompts and ChatGPT responses that were submitted by human users.
**Q: What kind of tasks would I use MemoryCache for?**
MemoryCache is ultimately leaning into the weird and creative parts of human insight. The goal with MemoryCache is to "learn what you learn", which is why you are in control of which files you want to augment the application with. This can be helpful for research, brainstorming, creative writing, and synthesizing new ideas -- connecting seemingly unrelated topics to find new insights and learnings from the body of knowledge that matters most to you.
**Q: Is this a Firefox project?**
No. MemoryCache is a hackathon-style project by the Mozilla Innovation Ecosystem team, not a Firefox project. While the project uses a Firefox extension as a way of collecting information, it is a set of scripts and tools to augment privateGPT, a native AI application. It's meant to streamline the process of saving information that you read in the browser into your own personal library of information.
================================================
FILE: docs/index.markdown
================================================
---
# Feel free to add content and custom Front Matter to this file.
# To modify the layout, see https://jekyllrb.com/docs/themes/#overriding-theme-defaults
layout: home
---
================================================
FILE: docs/readme.md
================================================
Placeholder
================================================
FILE: extension/content-script.js
================================================
function getPageText() {
const head = document.head.innerHTML;
const body = document.body.innerText;
return `\n\n${head}\n\n\n${body}\n\n