Repository: cohere-ai/cohere-terrarium
Branch: main
Commit: d1e20d6be3e9
Files: 31
Total size: 54.9 KB
Directory structure:
gitextract_h93r42wc/
├── CHANGELOG.md
├── CODEOWNERS
├── Dockerfile
├── LICENSE
├── README.md
├── default_python_home/
│ ├── README.md
│ └── matplotlibrc
├── example-clients/
│ └── python/
│ ├── requirements.txt
│ └── terrarium_client.py
├── nodemon.json
├── package.json
├── src/
│ ├── index.ts
│ ├── services/
│ │ └── python-interpreter/
│ │ ├── service.ts
│ │ └── types.ts
│ └── utils/
│ └── async-utils.ts
├── tests/
│ ├── file_io/
│ │ ├── _outputs/
│ │ │ └── test_file_output.json
│ │ ├── replay_inputs.py
│ │ ├── simple_matplotlib.py
│ │ ├── simple_matplotlib_barchart.py
│ │ └── test_file_input.json
│ ├── functionality/
│ │ ├── error_missing_import.py
│ │ ├── error_syntax_error.py
│ │ ├── error_wrong_param.py
│ │ ├── numpy_simple.py
│ │ ├── super_long_python_file.py
│ │ └── sympy_simple.py
│ └── security/
│ ├── create_dir.py
│ ├── cve_2026_5752_proto_escape.py
│ ├── list_dirs.py
│ └── subprocess.py
└── tsconfig.json
================================================
FILE CONTENTS
================================================
================================================
FILE: CHANGELOG.md
================================================
# Changelog
## 1.0.1 — 2026-04-22
### Security
* Fix **CVE-2026-5752** (CVSS 9.3, critical): sandbox escape via JavaScript
prototype chain traversal in `src/services/python-interpreter/service.ts`.
Mock `document` / `ImageData` / DOM stub objects exposed to Pyodide via
`jsglobals` were plain object literals that inherited from
`Object.prototype`, allowing sandboxed Python to walk
`.constructor.constructor` to the host `Function` constructor, obtain
host `globalThis`, and reach `require` for arbitrary code execution as
root. Every exposed object is now built with `Object.create(null)`;
read-only mocks are additionally frozen. See `SECURITY.md` and
[VU#414811](https://kb.cert.org/vuls/id/414811).
* Add regression test
`tests/security/cve_2026_5752_proto_escape.py`.
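The escape primitive and the hardening can be sketched in a few lines of TypeScript (illustrative only; the real exploit path went through objects reachable from Pyodide's `import js` bridge):

```typescript
// A plain object literal inherits from Object.prototype, so its
// constructor chain reaches the host Function constructor, which can
// evaluate arbitrary code in the host realm.
const plainMock: any = {};
const hostGlobal = plainMock.constructor.constructor("return globalThis")();
// hostGlobal is now the real host globalThis: a full escape primitive.

// The fix: objects built with a null prototype have no chain to walk.
const hardenedMock: any = Object.create(null);
// hardenedMock.constructor is undefined, so the traversal dead-ends.
```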
### Notes
This project remains unmaintained beyond this security release. Users are
encouraged to migrate to a maintained sandbox.
================================================
FILE: CODEOWNERS
================================================
# Explicit All
* @cohere-ai/rag
## Security can approve changes to CODEOWNERS:
* @cohere-ai/security
================================================
FILE: Dockerfile
================================================
FROM node:21-alpine3.18
WORKDIR /usr/src/app
COPY package*.json ./
RUN apk --no-cache add curl
RUN npm install
RUN npm i -g typescript ts-node
RUN npm prune --production
COPY . .
EXPOSE 8080
ENV ENV_RUN_AS "docker"
HEALTHCHECK --interval=1s --timeout=10s --retries=2 \
CMD curl -m 10 -f http://localhost:8080/health || kill 1
ENTRYPOINT [ "ts-node" , "src/index.ts"]
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2024 Cohere
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# ⚠️ No longer maintained: This repository is archived and no longer supported. If you wish to continue development, please fork the project.
# Terrarium - A Simple Python Sandbox
Terrarium is a relatively low-latency, easy-to-use, and economical Python sandbox - to be used as a Docker-deployed container, for example in GCP Cloud Run - for executing untrusted user- or LLM-generated ``python`` code.
- **Terrarium is fast:** 900 ms runtime to generate a 200 dpi PNG with a simple matplotlib barchart, 500 ms for an SVG version (hosted on GCP Cloud Run)
- **Terrarium is cheap:** We spent less than $30 a month hosting terrarium on GCP during internal annotations (2GB mem + 1vCPU and at least 1 alive instance + autoscale on demand)
- **Terrarium is fully compartmentalized:** The sandbox gets completely recycled after every invocation. No state whatsoever is carried over between calls. *Cohere does not give any guarantees for the sandbox integrity.*
- **Terrarium supports native input & output files:** You can send any number & type of files as part of the request and we put them in the Python filesystem. After the code execution we gather up all generated files and return them with the response.
- **Terrarium supports many common packages:** Terrarium runs on [Pyodide](https://pyodide.org/en/stable/index.html), therefore it supports numpy, pandas, matplotlib, sympy, and other standard python packages.
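The output-file collection mentioned above can be sketched as a simple name diff against the request's input files (a minimal TypeScript sketch; the function and parameter names are ours, not the service's):

```typescript
// After execution, every file under the sandbox home dir that was not
// part of the request's input files is returned as an output file.
function collectNewFiles(
  allPaths: string[],    // absolute paths found in the sandbox FS
  homeDir: string,       // e.g. "/home/earth"
  inputNames: string[],  // filenames sent with the request
): string[] {
  return allPaths
    .map(p => p.slice(homeDir.length + 1)) // strip "<homeDir>/" prefix
    .filter(name => !inputNames.includes(name));
}
```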
## Using Terrarium
Using the deployed Cloud Run is super easy - just call it with the `code` to run & authorization bearer (if so configured) as follows:
```bash
curl -X POST --url <name of your deployed gcp cloud run> \
-H "Authorization: bearer $(gcloud auth print-identity-token)" \
-H "Content-Type: application/json" \
--no-buffer \
--data-raw '{"code": "1 + 1"}'
```
which returns:
```json
{"output_files":[],"final_expression":2,"success":true,"std_out":"","std_err":"","code_runtime":16}
```
The authentication `gcloud auth print-identity-token` needs to be renewed every hour.
See `terrarium_client.py` for an easy-to-use python function to call the service - including file input & output functionality via base64 encoded files.
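The request and response shapes can also be exercised without a live server. A minimal sketch, assuming the JSON body format shown above (the helper names are ours):

```typescript
interface B64FileData { filename: string; b64_data: string; }

// Build the JSON body the service expects: `code` plus optional `files`.
function buildRequestBody(code: string, files?: B64FileData[]): string {
  const body: { code: string; files?: B64FileData[] } = { code };
  if (files !== undefined && files.length > 0) body.files = files;
  return JSON.stringify(body);
}

// The server writes one JSON object terminated by "\n" and then keeps
// the connection open while it recycles the interpreter, so clients
// should parse only up to the first newline.
function parseFirstLine(chunk: string): any {
  return JSON.parse(chunk.split("\n")[0]);
}
```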
## Sandbox Design
The sandbox is composed of multiple layers:
1. Parse, compile, & execute python code inside a node.js process - via CPython compiled to webassembly, not running natively - with https://pyodide.org/en/stable/index.html. This approach restricts the untrusted code's abilities:
- NO access to the filesystem (pyodide provides a compartmentalized memory only guest filesystem)
- NO threading & multiprocessing
- NO ability to call a subprocess
- NO access to any of our hosts memory
- NO access to other call states: we recycle the full pyodide environment (including the virtual file system, global state, loaded libs ... the works) after every call
- NO network nor internet access (this is a current design choice and could be changed in the future)
2. Deploy the node.js host into a GCP Cloud Run container, which restricts:
- runtime
- decouples the node.js host (in case of a breakout) from the rest of our network
---
The following packages are supported out of the box:
https://pyodide.org/en/stable/usage/packages-in-pyodide.html including, but not limited to:
- numpy
- pandas
- sympy
- beautifulsoup4
- matplotlib (plt.show() is not supported, but plt.savefig() works like a charm - most of the time)
- python-sat
- scikit-learn
- scipy
- sqlite3 (not enabled by default, but we could load it as well)
## Development
You need node.js installed on your system. To install dependencies run:
```bash
npm install
mkdir pyodide_cache
```
run the server & function locally:
```bash
npm run dev
```
execute code in the terrarium:
```bash
curl -X POST -H "Content-Type: application/json" \
--url http://localhost:8080 \
--data-raw '{"code": "1 + 1"}' \
--no-buffer
```
run a set of test files (all .py files in ``/tests``) through the endpoint with:
```bash
python example-clients/python/terrarium_client.py http://localhost:8080
```
## Deployment
### Deploy as Docker container
To run in docker:
**Build:**
```bash
docker build -t terrarium .
```
**Run:**
```bash
docker run -p 8080:8080 terrarium
```
**Stop:**
```bash
docker ps
```
to get the container id and then
```bash
docker stop {container_id}
```
### Deploy to GCP Cloud Run
Allocate more resources to speed up run time and limit concurrency from Cloud Run:
```bash
gcloud run deploy <insert name of your deployment here> \
--region=us-central1 \
--source . \
--concurrency=1 \
--min-instances=3 \
--max-instances=100 \
--cpu=2 \
--memory=4Gi \
--no-cpu-throttling \
--cpu-boost \
--timeout=100
```
### Handling timeouts
Pyodide currently runs on the Node.js main process and can block Node.js from responding. Pyodide recommends using a Worker if we need to interrupt execution; however, the Worker interface to Pyodide works through message passing, and it doesn't support matplotlib, among other libraries.
Example request that triggers a timeout:
```bash
curl -m 110 -X POST <insert name of your deployment here> \
-H "Authorization: bearer $(gcloud auth print-identity-token)" \
-H "Content-Type: application/json" \
-d '{
"code": "import time\ntime.sleep(200)"
}'
```
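Internally, the service prepares for interruption through Pyodide's interrupt buffer: a `SharedArrayBuffer` the host writes to while Python runs. A minimal sketch, assuming Pyodide's documented `setInterruptBuffer` API (writing signal value 2, SIGINT, surfaces as `KeyboardInterrupt` in the sandbox):

```typescript
// A 4-byte shared buffer; Pyodide periodically checks byte 0 and raises
// the corresponding signal inside the sandboxed Python when non-zero.
const interruptBuffer = new SharedArrayBuffer(4);
const interrupt = new Uint8Array(interruptBuffer);
// pyodide.setInterruptBuffer(interrupt);  // wire up before running code
interrupt[0] = 2;  // request interruption of the running Python code
```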
Cloud Run doesn't support Dockerfile healthcheck. Once the service is deployed for the first time, you need to grab the service.yaml file and add the liveness probe.
`gcloud run services describe <insert name of your deployment here> --format export > service.yaml`
Add [livenessProbe](https://cloud.google.com/run/docs/configuring/healthchecks#yaml_3) after the `image` definition
```
livenessProbe:
failureThreshold: 1
httpGet:
path: /health
port: 8080
periodSeconds: 100
timeoutSeconds: 1
```
Run `gcloud run services replace service.yaml`
This is only needed once per new Cloud Run service deployed.
Docker itself does not appear to support auto-restarts based on HEALTHCHECK. The process with PID `1` is protected and cannot be killed from inside the container, so you would need to spin up a separate monitoring service like so: https://github.com/willfarrell/docker-autoheal
## Limitations
### Ability to install packages
Installing additional packages at runtime is not supported; only Pyodide's built-in packages (loaded via `loadPackagesFromImports`) are available.
### Network access
Sandboxed code has no network or internet access - a deliberate design choice (see Sandbox Design above).
### Complex operations
For large & complex computations we sometimes observe untraceable "RangeError: Maximum call stack size exceeded" exceptions in Pyodide.
- This increasingly happens when we set too high a dpi parameter on PNG saves for matplotlib figures
- Or highly complex pandas operations
See also: https://blog.pyodide.org/posts/function-pointer-cast-handling/
================================================
FILE: default_python_home/README.md
================================================
Hey there 👋 If you found this file via the sandbox - why not apply at https://cohere.com/careers 🤷?
================================================
FILE: default_python_home/matplotlibrc
================================================
#
# cohere default style for matplotlib
#
figure.figsize: 6.4, 4.0 # make it a bit more portrait mode vs default
# general figure styles
axes.spines.top: False
axes.spines.right: False
axes.edgecolor: 9a9a9a
axes.linewidth: 1
xtick.color: 9a9a9a
xtick.labelcolor: black
ytick.color: 9a9a9a
ytick.labelcolor: black
xtick.major.width: 1
ytick.major.width: 1
lines.linewidth: 2
lines.dash_capstyle: round
lines.solid_capstyle: round
# not sure if we should activate this by default, it leads to wide margins in some cases
# axes.autolimit_mode: round_numbers
# axes.xmargin: 0.05 # x margin.
# axes.ymargin: 0.05 # y margin.
# green, orange, violet from cohere logo + default matplotlib colors
axes.prop_cycle: cycler('color', ['39594d','ff7759','d18ee2','1f77b4', 'ff7f0e', '2ca02c', 'd62728', '9467bd', '8c564b', 'e377c2']) + cycler('linestyle', ['-',':',(0, (5, 7)), '-.', '-', '--', '-.','-', '--', '-.'])
# we could also add default markers, but that can look too messy
# + cycler('marker', ['o', 's', '^','o', 's', '^','o', 's', '^','o'])
# this is a fiddly number - setting it too high and the sandbox rendering of a png breaks with a call stack error :shrug: when we also use the constrained layout
figure.dpi: 128
# auto set the constrained layout - this is important to make sure the text is in bounds
# alternative: autolayout can be a bit thorny, but could be activated with #figure.autolayout: True
figure.constrained_layout.use: True
# by default remove the background
savefig.transparent: True
# svg.fonttype: none # activate if frontends support font overrides
# show faded grid
axes.grid: True
grid.linestyle: dotted
grid.alpha: 0.5
# make sure to put the grid lowest on the z-axis
axes.axisbelow: True
================================================
FILE: example-clients/python/requirements.txt
================================================
requests
typing_extensions
google-auth
================================================
FILE: example-clients/python/terrarium_client.py
================================================
import glob
from sys import argv
from typing import List
from typing_extensions import TypedDict
import requests
import json
import time
import google.auth
import google.auth.transport.requests
#
# credentials needed if connecting to a gcp cloud run / function deployment
#
creds, project = google.auth.default()
def get_bearer():
auth_req = google.auth.transport.requests.Request()
if creds.expired or not creds.valid:
print("refreshing creds")
creds.refresh(auth_req)
return creds.id_token.strip()
class B64_FileData(TypedDict):
b64_data: str
filename: str
def run_terrarium(server_url:str, code:str, file_data:List[B64_FileData] = None):
"""
Executes the given code in the terrarium environment and returns the result.
Args:
server_url (str): The URL of the terrarium server.
code (str): The code to be executed in the terrarium environment.
file_data (dict, optional): Additional file data to be passed to the terrarium server. Defaults to None.
Returns:
dict: The result of executing the code in the terrarium environment.
The result is a dictionary with the following:
- success: A boolean indicating whether the code was executed successfully.
- error: An error object containing the type and message of the error, if any.
- std_out: The standard output stream as single string of the code execution.
- std_err: The standard error stream as single string of the code execution.
- code_runtime: The inner runtime of the code in milliseconds (excluding networking, auth, et al.).
Raises:
RuntimeError: If there is an error when parsing the response content.
"""
headers = {"Content-Type": "application/json",
"Authorization":"bearer " + get_bearer()}
data = {"code": code}
if file_data is not None:
data["files"] = file_data
result = requests.post(server_url, headers=headers, json=data, stream=True)
if result.status_code != 200:
return {"success": False,
"error": {
"type": "HTTPError",
"message": f"Error: {result.status_code} - {result.text}"
},
"std_out": "",
"std_err": "",
"code_runtime": 0}
#
# Explanation for this contorted parsing (made possible by stream=True):
#
# The terrarium server needs to recycle the python interpreter environment either before or after each request.
# We are doing it after to save on latency for the next request.
# BUT the annoying thing is that gcp cloud functions and optionally cloud run terminate all CPU cycles as soon as the response content is closed !!
# With this trick we can parse the response content, return from this function, but crucially don't have to close the connection,
# and then the server can recycle the python interpreter.
#
res_string = ""
try:
for c in result.iter_content(decode_unicode=True):
if c == "\n":
break
res_string+=c
return json.loads(res_string)
except json.decoder.JSONDecodeError as e:
raise RuntimeError("Error when parsing: "+ res_string, e)
import base64
import os
def file_to_base64(file_path):
try:
# Read the file in binary mode
with open(file_path, 'rb') as file:
# Read the content of the file
file_content = file.read()
# Convert the binary content to base64 encoding
base64_content = base64.b64encode(file_content)
# Decode the base64 bytes to a UTF-8 string
base64_string = base64_content.decode('utf-8')
return base64_string
except FileNotFoundError:
print(f"Error: File not found - {file_path}")
except Exception as e:
print(f"Error: {e}")
if __name__ == "__main__":
# get url from command line argument
if len(argv) < 2:
print("Usage: python terrarium_client.py <server_url>")
exit(1)
server_url = argv[1]
current_directory = os.path.dirname(os.path.realpath(__file__))
test_files = glob.glob(os.path.join(current_directory, "../../tests/**/*.py"),recursive=True)
print("Testing files:",test_files)
for file in test_files:
file_data = None
if "file_io" in file:
# load all test_file_input* files
input_files = glob.glob(os.path.join(current_directory, "../../tests/file_io/test_file_*"))
file_data = []
for f in input_files:
file_data.append({"filename": os.path.basename(f), "b64_data": file_to_base64(f)})
print(file)
print("---------")
with open(file) as f:
code = "".join(f.readlines())
print(code)
print("---------")
start = time.time()
#
# run the code in the terrarium environment
#
result = run_terrarium(server_url, code, file_data)
if "output_files" in result:
os.makedirs("tests/file_io/_outputs",exist_ok=True)
for of in result["output_files"]:
print(of["filename"],of["b64_data"][:20]+"...")
with open(os.path.join("tests/file_io/_outputs",of["filename"]),mode="wb") as f2:
f2.write(base64.b64decode(of["b64_data"]))
del result["output_files"]
print(json.dumps(result,indent=2,ensure_ascii=False))
print("response parsed after:",time.time() - start)
print("\n***********************\n")
# let the server recycle the python interpreter (useful for local testing to see true speed)
# disable this for load testing / testing scalability
time.sleep(15)
================================================
FILE: nodemon.json
================================================
{
"watch": ["src"],
"ext": "ts",
"exec": "ts-node ./src/index.ts"
}
================================================
FILE: package.json
================================================
{
"name": "cohere-terrarium",
"version": "1.0.1",
"description": "A simple Python sandbox for helpful LLM data agents",
"repository": {
"type": "git",
"url": "https://github.com/cohere-ai/cohere-terrarium.git"
},
"license": "MIT",
"main": "dist/index.js",
"dependencies": {
"@types/express": "^4.17.21",
"@types/node": "^20.11.30",
"clean": "^4.0.2",
"express": "^4.19.2",
"pyodide": "^0.24.1",
"typescript": "^5.4.3"
},
"scripts": {
"build": "tsc",
"start": "ts-node ./src/index.ts",
"gcp-build": "npm run build",
"gcp-start": "npm run start",
"dev": "nodemon src/index.ts"
},
"devDependencies": {
"node-fetch": "^3.3.2",
"nodemon": "^3.1.0",
"ts-node": "^10.9.2"
}
}
================================================
FILE: src/index.ts
================================================
import express, { Express, Request, Response } from "express";
import { PyodidePythonEnvironment } from './services/python-interpreter/service';
import { PythonEnvironment } from './services/python-interpreter/types';
import { doWithLock } from './utils/async-utils';
const pythonEnvironment: PythonEnvironment = new PyodidePythonEnvironment();
// prepare python env before a request comes in
pythonEnvironment.init()
//
// The main http endpoint
//
// Can create more express apps if we need multiple services.
const terrariumApp: Express = express();
terrariumApp.use(express.json({ limit: '100mb' }));
async function runRequest(req: any, res: any): Promise<void> {
res.setHeader("Content-Type", "application/json");
// make sure pyodide is loaded
await pythonEnvironment.waitForReady();
//
// parse the request body (code & files)
//
const code = req.body.code
if (code == undefined || code.trim() == "") {
res.send(JSON.stringify({ "success": false, "error": { "type": "parsing", "message": "no code provided" } }) + "\n");
return
}
let files: any[] = [] // { "filename": "file.txt", "b64_data": "dGhlc..." }]
if (req.body.files != undefined) {
files = req.body.files
console.log("Got " + files.length + " input files")
console.log(files.map(f => f.filename + " " + f.b64_data.slice(0, 10) + "... " + f.b64_data.length))
}
const result = await pythonEnvironment.runCode(code, files);
// write out the answer, but do not close the response yet - otherwise gcp cloud functions terminate the cpu cycles and hibernate the recycling
res.write(JSON.stringify(result) + "\n");
console.log("Reloading pyodide");
// run the recycle background process'
// see https://cloud.google.com/functions/docs/bestpractices/tips#do_not_start_background_activities
await pythonEnvironment.terminate();
await pythonEnvironment.cleanup();
// to make gcp run it until the promise resolves & only now close the response connection
res.end()
}
terrariumApp.post('', async (req, res) => {
// queue 1 request at a time - this might be better handled in express.js middleware if we run into issues (example: https://www.npmjs.com/package/express-queue, though not maintained)
await doWithLock('python-request', () => runRequest(req, res));
});
terrariumApp.get('/health', (req, res) => {
res.send("hi!");
});
const server = terrariumApp.listen(8080, () => {
console.log("Server is running on port 8080");
});
================================================
FILE: src/services/python-interpreter/service.ts
================================================
import { PyodideInterface, loadPyodide } from "pyodide";
import { waitFor } from "../../utils/async-utils";
import { promises as fs } from 'fs';
import * as path from 'path';
import { CodeExecutionResponse, FileData, PythonEnvironment } from "./types";
const pythonEnvironmentHomeDir = "/home/earth";
const defaultDirectoryOuterPath = 'default_python_home';
// CVE-2026-5752 hardening:
// Plain `{}` literals inherit from Object.prototype, which lets sandboxed
// Pyodide code walk the prototype chain (e.g. `({}).constructor.constructor`)
// to reach the host Function constructor, obtain the host `globalThis`, and
// from there reach Node.js internals such as `require`. Building every object
// exposed to the sandbox with a NULL prototype (`Object.create(null)`)
// removes the chain so `.constructor` resolves to undefined.
//
// Some mocks must remain writable because matplotlib-pyodide assigns to them
// during figure init (e.g. `style_element.id = "..."`,
// `el.style.display = "none"`). Freezing those objects causes Pyodide to
// throw a TypeError mid-`plt.subplots()` and breaks all matplotlib output.
// `nullProto` keeps the mock writable; `sealed` also freezes for objects
// that are only read or only have their methods called.
function nullProto<T extends object>(props: T): T {
return Object.assign(Object.create(null) as T, props);
}
function sealed<T extends object>(props: T): Readonly<T> {
return Object.freeze(nullProto(props));
}
const noop = () => { /* no-op DOM stub */ };
// `elementStub` and its `style` are NOT frozen: matplotlib-pyodide writes
// `.id`, `.textContent`, `.style.display`, etc. on returned elements.
const elementStub = () => nullProto({
addEventListener: noop,
style: nullProto({}),
classList: sealed({ add: noop, remove: noop }),
setAttribute: noop,
appendChild: noop,
remove: noop,
});
export class PyodidePythonEnvironment implements PythonEnvironment {
out_string = ""
err_string = ""
default_files: any[] = []
default_file_names = new Set()
pyodide?: PyodideInterface;
interruptBufferPyodide = new SharedArrayBuffer(4);
interrupt = new Uint8Array(this.interruptBufferPyodide);
async prepareEnvironment() {
console.log("Preparing Pyodide environment");
const files = await fs.readdir(defaultDirectoryOuterPath);
const filePromises = files.map(file => {
const filePath = path.join(defaultDirectoryOuterPath, file);
return this.readHostFileAsync(filePath);
});
const filesData = await Promise.all(filePromises);
filesData.forEach(({ filename, data }) => {
this.default_files.push({ "filename": filename, "byte_data": new Uint8Array(data) })
this.default_file_names.add(filename)
});
}
async loadEnvironment(): Promise<void> {
console.log("Loading Pyodide environment");
this.interrupt[0] = 0;
this.out_string = ""
this.err_string = ""
this.pyodide = await loadPyodide({
packageCacheDir: "pyodide_cache", // allows us to cache the packages in the cloud function deployment
stdout: msg => { this.out_string += msg + "\n" },
stderr: msg => { this.err_string += msg + "\n" },
// we need to provide fake ImageData & document objects to pyodide, because matplotlib-pyodide polyfills try to access them when initializing
// BUT luckily for us matplotlib-pyodide does not actually use them for .savefig rendering (only for .show()), so we can just provide empty objects
//
// SECURITY (CVE-2026-5752): every object **the Python sandbox can
// reach via `import js`** MUST be built with `Object.create(null)`
// (via `sealed`) so the sandbox cannot walk
// Object.prototype -> Function -> globalThis -> require.
//
// The outer `jsglobals` container is intentionally a plain object:
// Pyodide writes its own bookkeeping into the globals at runtime,
// and freezing the container silently drops those writes (which
// later manifests as `'hiwire_call_bound' in undefined` when
// Pyodide tries to walk a JS error stack). It is the *values*
// exposed to the sandbox that need null prototypes, not the
// container Pyodide owns.
jsglobals: {
clearInterval, clearTimeout, setInterval, setTimeout,
ImageData: Object.freeze(Object.create(null)),
document: sealed({
getElementById: (id: any) => {
if (id.includes("canvas")) return null; // lol don't ask ... this is needed! https://github.com/pyodide/matplotlib-pyodide/blob/61935f72718c0754a9b94e1569a685ad3c50ae91/matplotlib_pyodide/wasm_backend.py#L48
return elementStub();
},
createElement: () => elementStub(),
createTextNode: () => elementStub(),
body: sealed({ appendChild: noop }),
}),
},
env: { "HOME": pythonEnvironmentHomeDir } // using a non-descriptive home dir
});
let pyodide = this.pyodide!;
// write the default files from default_python_home to the pyodide file system
this.default_files.forEach((f) => {
pyodide.FS.writeFile(pyodide?.PATH.join2(pythonEnvironmentHomeDir, f.filename), f.byte_data);
})
// load the packages we commonly use to avoid the latency hit during the user req
await pyodide.loadPackage(["numpy", "matplotlib", "pandas"])
// set interrupt buffer to allow for termination
pyodide.setInterruptBuffer(this.interrupt);
// second part of the import (also takes a latency hit), its ok to re-import packages
await pyodide.runPythonAsync("import matplotlib.pyplot as plt\nimport pandas as pd\nimport numpy as np")
console.log("Pyodide is loaded with packages imported")
return Promise.resolve();
}
async init(): Promise<void> {
await this.prepareEnvironment();
await this.loadEnvironment();
}
async waitForReady(): Promise<void> {
//TODO won't need this in 2nd iteration
if (!this.pyodide) {
let max_tries = 0
while (max_tries < 100 && this.pyodide == null) {
await waitFor(100);
max_tries++;
}
}
if (this.pyodide == null) {
console.error("pyodide is still not loaded after waiting")
return Promise.reject("pyodide is still not loaded after waiting")
}
return Promise.resolve();
}
async terminate(): Promise<void> {
// terminating to avoid leak (noticed packages are loaded twice with loadEnvironment the second time)
this.interrupt[0] = 1;
}
async cleanup(): Promise<void> {
return this.loadEnvironment();
}
/**
* Simple helper function to read a file asynchronously.
* @param {string} filePath - The path of the file to be read.
* @returns {Promise<{ filename: string, data: Buffer }>} - A promise that resolves to an object containing the filename and the file data.
* @throws {Error} - If there is an error reading the file.
*/
async readHostFileAsync(filePath: any): Promise<FileData> {
const buffer = await fs.readFile(filePath);
return { filename: path.basename(filePath), data: buffer };
}
/**
* Function to recursively list files in the pyodide file system from the given directory.
* @param {string} dir
* @returns list of file paths
*/
listFilesRecursive(dir: string) {
var files: any[] = [];
var entries = this.pyodide?.FS.readdir(dir);
for (var i = 0; i < entries.length; i++) {
var entry = entries[i];
if (entry === '.' || entry === '..') {
// Skip the special current and parent directory entries
continue;
}
if (this.default_file_names.has(entry)) {
// Skip default files
continue;
}
var fullPath = this.pyodide?.PATH.join2(dir, entry);
if (this.pyodide?.FS.isDir(this.pyodide.FS.stat(fullPath).mode)) {
// If it's a directory, recursively list files in that directory
files = files.concat(this.listFilesRecursive(fullPath));
} else {
// If it's a file, add it to the list
files.push(fullPath);
}
}
return files;
}
/**
* Reads a file from the pyodide file system from the given file path and returns its content as a base64 encoded string.
* @param {string} filePath - The path of the file to be read.
* @returns {string} - The base64 encoded content of the file.
*/
readFileAsBase64(filePath: string) {
var fileData = this.pyodide!.FS.readFile(filePath, { encoding: 'binary' });
return this.bytesToBase64(fileData);
}
/**
* Transforms a byte array into a base64 encoded string.
* @param {Uint8Array} bytes the raw bytes to encode as base64
* @returns base64 encoded string
*/
bytesToBase64(bytes: any) {
const binString = String.fromCodePoint(...bytes);
return btoa(binString);
}
/**
* transforms a base64 encoded string into a byte array.
* @param {string} base64
* @returns Uint8Array of bytes
*/
base64ToBytes(base64: any) {
const binString = atob(base64);
return (Uint8Array as any).from(binString, (m: any) => m.codePointAt(0));
}
async runCode(code: string, files: any[]): Promise<CodeExecutionResponse> {
const startCode = Date.now();
let pyodide = this.pyodide!;
let result: CodeExecutionResponse = { success: true };
try {
// load available and needed packages - only supports pyodide built-in packages
await pyodide.loadPackagesFromImports(code)
//
// write the input files to the pyodide file system
//
files.forEach((f) => {
if (f.filename == undefined || f.b64_data == undefined) {
result.success = false;
result.error = { type: "parsing", message: "file data is missing for: " + JSON.stringify(f) }
return result;
}
// TODO make sure to create subdirectories if the file is in a subdirectory path
pyodide.FS.writeFile(pyodide?.PATH.join2(pythonEnvironmentHomeDir, f.filename), this.base64ToBytes(f.b64_data));
})
//
// !! here is where the code is actually executed !!
//
let interpreterResult = await pyodide.runPythonAsync(code);
//
// soak up newly created files and return them as output
//
var allFiles = this.listFilesRecursive(pythonEnvironmentHomeDir);
// get only the new files (not in the input files) and read as base64
let input_file_names = files.map(f => f.filename)
let new_files = allFiles
.filter(f => !input_file_names.includes(f.slice(pythonEnvironmentHomeDir.length + 1)))
.map(f => {
return { "filename": f.slice(pythonEnvironmentHomeDir.length + 1), "b64_data": this.readFileAsBase64(f) } //"content": decodeBase64ToText(readFileAsBase64(f))
});
console.log("output files:", new_files.map(f => f.filename + " " + f.b64_data.slice(0, 10) + "... " + f.b64_data.length));
result.output_files = new_files
let result_reporting = ""
if (interpreterResult != undefined) {
result_reporting = interpreterResult.toString().replace(/\n/g, '\\n');
}
console.log("[Success] Code:", (code as any).replace(/\n/g, '\\n'),
"final_expression:", result_reporting,
"stdout:", this.out_string.replace(/\n/g, '\\n'),
"stderr:", this.err_string.replace(/\n/g, '\\n'));
result.final_expression = interpreterResult;
result.success = true
}
catch (error: any) {
// enrich error message with more code context
let errorMsg = error.toString()
// check for File "<exec>", line N, in <module> and extract the line number
let lineMatch = errorMsg.match(/File "<exec>", line (\d+)/)
console.log("lineMatch", lineMatch)
if (lineMatch != null) {
let lineNum = parseInt(lineMatch[1])
let codeLines = code.split("\n")
let startLine = Math.max(1, lineNum - 4)
let endLine = Math.min(codeLines.length, lineNum + 4)
let codeContext = codeLines.slice(startLine - 1, endLine)
.map((line, idx) => { return (startLine + idx) + ": " + line })
.join("\n")
errorMsg = errorMsg + "\n\nCode context:\n" + codeContext
}
console.error("[Failure] Code:", code.replace(/\n/g, '\\n'),
"Error:", errorMsg.replace(/\n/g, '\\n'));
result.error = { "type": error.type, "message": errorMsg };
result.success = false
}
result.std_out = this.out_string;
result.std_err = this.err_string;
result.code_runtime = (Date.now() - startCode)
return result;
}
}
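The catch block in runCode above enriches Python tracebacks by regex-matching `File "<exec>", line N` and attaching the surrounding four lines of code on each side. The same technique can be sketched in Python; `add_code_context` is a hypothetical helper written for illustration, not part of this repo.

```python
import re

def add_code_context(error_msg: str, code: str, window: int = 4) -> str:
    """Append the +/- `window` lines around the failing line, mirroring
    the context enrichment done in runCode's catch block."""
    m = re.search(r'File "<exec>", line (\d+)', error_msg)
    if m is None:
        # no recognizable traceback line, return the message unchanged
        return error_msg
    line_num = int(m.group(1))
    lines = code.split("\n")
    start = max(1, line_num - window)
    end = min(len(lines), line_num + window)
    context = "\n".join(
        f"{start + i}: {line}" for i, line in enumerate(lines[start - 1:end])
    )
    return error_msg + "\n\nCode context:\n" + context
```

Like the TypeScript version, this numbers context lines starting from 1 so they line up with the line number reported in the traceback.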
================================================
FILE: src/services/python-interpreter/types.ts
================================================
export interface CodeExecutionResponse {
success: boolean;
final_expression?: any;
output_files?: any[];
error?: {
type: string;
message: string;
};
std_out?: string;
std_err?: string;
code_runtime?: number;
}
export interface FileData {
filename: string;
data: Buffer;
}
export interface PythonEnvironment {
init(): Promise<void>;
waitForReady(): Promise<void>;
runCode(code: string, files: any[]): Promise<CodeExecutionResponse>;
cleanup(): Promise<void>;
terminate() : Promise<void>;
}
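A client consuming this API has to branch on `success` and base64-decode each entry in `output_files`. A minimal Python sketch, assuming the server returns the JSON shape of `CodeExecutionResponse` above; `handle_response` is an illustrative helper, not the repo's real client (see `example-clients/python/terrarium_client.py` for that):

```python
import base64

def handle_response(resp: dict) -> dict[str, bytes]:
    """Decode a CodeExecutionResponse-shaped dict (field names as in
    types.ts) into {filename: raw file bytes}; raise on failure."""
    if not resp.get("success"):
        err = resp.get("error") or {}
        raise RuntimeError(f"{err.get('type')}: {err.get('message')}")
    # each output file carries its payload base64-encoded in b64_data
    return {
        f["filename"]: base64.b64decode(f["b64_data"])
        for f in resp.get("output_files") or []
    }
```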
================================================
FILE: src/utils/async-utils.ts
================================================
// From gist: https://gist.github.com/Justin-Credible/693529fa4672a0d97963b95a26897812#file-async-utils-ts
/**
* A wrapper around setTimeout which returns a promise. Useful for waiting for an amount of
* time from an async function, e.g. await waitFor(1000);
*
* @param milliseconds The amount of time to wait, in milliseconds.
* @returns A promise that resolves once the given number of milliseconds has elapsed.
*/
export function waitFor(milliseconds: number): Promise<void> {
return new Promise((resolve) => {
setTimeout(resolve, milliseconds);
});
}
/**
* Used by doWithLock() to keep track of each "stack" of locks for a given lock name.
*/
const locksByName: Record<string, Promise<any>[]> = {};
/**
* Used to ensure that only a single task for the given lock name can be executed at once.
* While JS is generally single threaded, this method is useful when running asynchronous
* tasks that interact with external systems (HTTP API calls, React Native plugins, etc.),
* since those tasks yield control back to the event loop mid-execution. By using the same
* lock name for a group of tasks you can ensure that only one of them will ever be in
* progress at a given time.
*
* @param lockName The name of the lock to be obtained.
* @param task The task to execute.
* @returns The value returned by the task.
*/
export async function doWithLock<T>(lockName: string, task: () => Promise<T>): Promise<T> {
// Ensure array present for the given lock name.
if (!locksByName[lockName]) {
locksByName[lockName] = [];
}
// Obtain the stack (array) of locks (promises) for the given lock name.
// The lock at the bottom of the stack (index 0) is for the currently executing task.
const locks = locksByName[lockName];
// Determine if this is the first/only task for the given lock name.
const isFirst = locks.length === 0;
// Create the lock, which is simply a promise. Obtain the promise's resolve method which
// we can use to "unlock" the lock, which signals to the next task in line that it can start.
let unlock = () => {};
const newLock = new Promise<void>((resolve) => {
unlock = resolve;
});
locks.push(newLock);
// If this is the first task for a given lock, there is nothing to wait on. All other tasks
// need to wait for the immediately preceding task to finish executing before continuing.
if (!isFirst) {
const predecessorLock = locks[locks.length - 2];
await predecessorLock;
}
// Now that it's our turn, execute the task. We use a finally block here to ensure that we unlock
// the lock so the next task can start, even if our task throws an error.
try {
return await task();
} finally {
// Ensure that our lock is removed from the stack.
locks.splice(0, 1);
// Invoke unlock to signal to the next waiting task to start.
unlock();
}
}
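The promise-chaining idea behind `doWithLock` (queue tasks per lock name and run them strictly one at a time, in arrival order) can be expressed more directly in Python with `asyncio.Lock`, whose waiters are woken in FIFO order. A rough analogue for illustration, not part of this repo:

```python
import asyncio

# one Lock per lock name, analogous to locksByName in async-utils.ts
_locks_by_name: dict[str, asyncio.Lock] = {}

async def do_with_lock(lock_name: str, task):
    """Serialize tasks sharing lock_name: only one runs at a time,
    in arrival order; the task's return value is passed through."""
    lock = _locks_by_name.setdefault(lock_name, asyncio.Lock())
    async with lock:
        # the lock is released even if task() raises, like the finally
        # block in doWithLock()
        return await task()
```

`task` is a zero-argument callable returning an awaitable, mirroring the `() => Promise<T>` signature of the TypeScript version.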
================================================
FILE: tests/file_io/_outputs/test_file_output.json
================================================
{
"company": "ABC Corporation",
"address": "123 Main Street",
"city": "New York",
"state": "NY",
"zipcode": "10001",
"employees": [
{
"name": "John Doe",
"position": "Manager",
"salary": 50000
},
{
"name": "Jane Smith ❤️🔥",
"position": "Developer",
"salary": 60000
},
{
"name": "Mike Johnson",
"position": "Sales Representative - äöüß",
"salary": 40000
}
]
}
================================================
FILE: tests/file_io/replay_inputs.py
================================================
import os
directory = os.path.expanduser("~")
# Get a list of all files in the directory
files = os.listdir(directory)
# Print the list of files
for file in files:
    print(file)
    # build the full path; a bare name is only valid relative to the cwd
    path = os.path.join(directory, file)
    # check if the path is a directory (we are only interested in files)
    if not os.path.isdir(path):
        with open(path, mode="rb") as f, open(path.replace("_input", "_output"), mode="wb") as f2:
            f2.write(f.read())
================================================
FILE: tests/file_io/simple_matplotlib.py
================================================
import matplotlib.pyplot as plt
import numpy as np
# Generate some sample data
x = np.linspace(0, 10, 100) # Create an array of 100 values from 0 to 10
y = np.sin(x) # Compute the sine values for each x
# Create a line plot
plt.plot(x, y, label='Sin(x)')
y = np.cos(x) # Compute the cos values for each x
# Create a line plot
plt.plot(x, y, label='Cos(x)')
plt.plot(x, y+1, label='Cos(x)+1')
plt.plot(x, y+2, label='Cos(x)+2')
# Add labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Matplotlib Example')
# Add a legend
plt.legend()
# Show the plot
#plt.tight_layout()
plt.savefig("plot.png")
plt.savefig("plot.pdf")
plt.savefig("plot.svg")
================================================
FILE: tests/file_io/simple_matplotlib_barchart.py
================================================
import matplotlib.pyplot as plt
# Data for plot
data = {
'Olympus Mons': 72000,
'Ascraeus Mons': 11200,
'Arsia Mons': 10000
}
# Create horizontal bar chart
width = 10
colors = ['#FFA500', '#FFC000', '#FFB74D']
for i, (mountain, height) in enumerate(data.items()):
    plt.barh(mountain, height, color=colors[i], edgecolor='black')
plt.xlabel('Height (feet)')
plt.savefig('mars_mountains.png')
================================================
FILE: tests/file_io/test_file_input.json
================================================
{
"company": "ABC Corporation",
"address": "123 Main Street",
"city": "New York",
"state": "NY",
"zipcode": "10001",
"employees": [
{
"name": "John Doe",
"position": "Manager",
"salary": 50000
},
{
"name": "Jane Smith ❤️🔥",
"position": "Developer",
"salary": 60000
},
{
"name": "Mike Johnson",
"position": "Sales Representative - äöüß",
"salary": 40000
}
]
}
================================================
FILE: tests/functionality/error_missing_import.py
================================================
# disabled import
#from sympy import Symbol
v = Symbol('v')
================================================
FILE: tests/functionality/error_syntax_error.py
================================================
import matplotlib.pyplot as plt
import numpy as np
# Generate some sample data
x = np.linspace(0, 10, 100) # Create an array of 100 values from 0 to 10
y = np.sin(x) # Compute the sine values for each x
# Create a line plot
plt.plot(x, y, label='Sin(x)')
y = np.cos(x) # Compute the cos values for each x
# Create a line plot
plt.plot(x, y, label='Cos(x)')
plt.plot(x, y+1, label=Cos(x)+ one) # syntax error here
plt.plot(x, y+2, label='Cos(x)+2')
# Add labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Matplotlib Example')
# Add a legend
plt.legend()
# Show the plot
#plt.tight_layout()
plt.savefig("plot.png")
================================================
FILE: tests/functionality/error_wrong_param.py
================================================
import matplotlib.pyplot as plt
import numpy as np
# Generate some sample data
x = np.linspace(0, 10, 100) # Create an array of 100 values from 0 to 10
y = np.sin(x) # Compute the sine values for each x
# Create a line plot
plt.plot(x, y, label='Sin(x)')
y = np.cos(x) # Compute the cos values for each x
# Create a line plot
plt.plot(x, y, label='Cos(x)')
plt.plotter(x, y+1, labels='Cos(x)+1') # param error here
plt.plot(x, y+2, label='Cos(x)+2')
# Add labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Matplotlib Example')
# Add a legend
plt.legend()
# Show the plot
plt.savefig("plot.png")
================================================
FILE: tests/functionality/numpy_simple.py
================================================
import numpy as np
a = np.arange(15).reshape(3, 5)
print(a)
print(a + 20)
================================================
FILE: tests/functionality/super_long_python_file.py
================================================
import pandas as pd
from datetime import datetime
dora_data = [
{
"componentsperday": 332,
"meantimetoresolve": 0,
"leadtimetochange": 9.921686746987952,
"Date": 1691433000000
},
{
"componentsperday": 156,
"meantimetoresolve": 0,
"leadtimetochange": 3.8333333333333337,
"Date": 1691519400000
},
{
"componentsperday": 179,
"meantimetoresolve": 0,
"leadtimetochange": 7.033519553072626,
"Date": 1691605800000
},
{
"componentsperday": 135,
"meantimetoresolve": 3,
"leadtimetochange": 4.733333333333333,
"Date": 1691692200000
},
{
"componentsperday": 69,
"meantimetoresolve": 0,
"leadtimetochange": 1.1014492753623189,
"Date": 1691778600000
},
{
"componentsperday": 17,
"meantimetoresolve": 0,
"leadtimetochange": 2.0588235294117647,
"Date": 1691865000000
},
{
"componentsperday": 304,
"meantimetoresolve": 443,
"leadtimetochange": 4.434210526315789,
"Date": 1691951400000
},
{
"componentsperday": 208,
"meantimetoresolve": 0,
"leadtimetochange": 2.264423076923077,
"Date": 1692037800000
},
{
"componentsperday": 271,
"meantimetoresolve": 0,
"leadtimetochange": 5.409594095940959,
"Date": 1692124200000
},
{
"componentsperday": 431,
"meantimetoresolve": 0,
"leadtimetochange": 17.08584686774942,
"Date": 1692210600000
},
{
"componentsperday": 433,
"meantimetoresolve": 0,
"leadtimetochange": 9.030023094688222,
"Date": 1692297000000
},
{
"componentsperday": 162,
"meantimetoresolve": 0,
"leadtimetochange": 0.4506172839506173,
"Date": 1692383400000
},
{
"componentsperday": 7,
"meantimetoresolve": 0,
"leadtimetochange": 0,
"Date": 1692469800000
},
{
"componentsperday": 365,
"meantimetoresolve": 1055,
"leadtimetochange": 4.780821917808219,
"Date": 1692556200000
},
{
"componentsperday": 353,
"meantimetoresolve": 216,
"leadtimetochange": 4.9405099150141649,
"Date": 1692642600000
},
{
"componentsperday": 336,
"meantimetoresolve": 0,
"leadtimetochange": 5.4523809523809529,
"Date": 1692729000000
},
{
"componentsperday": 157,
"meantimetoresolve": 0,
"leadtimetochange": 2.515923566878981,
"Date": 1692815400000
},
{
"componentsperday": 227,
"meantimetoresolve": 0,
"leadtimetochange": 3.788546255506608,
"Date": 1692901800000
},
{
"componentsperday": 40,
"meantimetoresolve": 0,
"leadtimetochange": 15.925,
"Date": 1692988200000
},
{
"componentsperday": 21,
"meantimetoresolve": 0,
"leadtimetochange": 1.8095238095238096,
"Date": 1693074600000
},
{
"componentsperday": 134,
"meantimetoresolve": 0,
"leadtimetochange": 6.41044776119403,
"Date": 1693161000000
},
{
"componentsperday": 265,
"meantimetoresolve": 0,
"leadtimetochange": 13.818867924528302,
"Date": 1693247400000
},
{
"componentsperday": 158,
"meantimetoresolve": 0,
"leadtimetochange": 4.987341772151899,
"Date": 1693333800000
},
{
"componentsperday": 451,
"meantimetoresolve": 0,
"leadtimetochange": 9.279379157427938,
"Date": 1693420200000
},
{
"componentsperday": 331,
"meantimetoresolve": 24,
"leadtimetochange": 7.637462235649547,
"Date": 1693506600000
},
{
"componentsperday": 144,
"meantimetoresolve": 473,
"leadtimetochange": 4.826388888888889,
"Date": 1693593000000
},
{
"componentsperday": 4,
"meantimetoresolve": 0,
"leadtimetochange": 0.5,
"Date": 1693679400000
},
{
"componentsperday": 344,
"meantimetoresolve": 744,
"leadtimetochange": 5.77906976744186,
"Date": 1693765800000
},
{
"componentsperday": 367,
"meantimetoresolve": 23,
"leadtimetochange": 5.615803814713897,
"Date": 1693852200000
},
{
"componentsperday": 281,
"meantimetoresolve": 0,
"leadtimetochange": 18.416370106761567,
"Date": 1693938600000
},
{
"componentsperday": 457,
"meantimetoresolve": 0,
"leadtimetochange": 23.792122538293218,
"Date": 1694025000000
},
{
"componentsperday": 240,
"meantimetoresolve": 0,
"leadtimetochange": 1.825,
"Date": 1694111400000
},
{
"componentsperday": 57,
"meantimetoresolve": 0,
"leadtimetochange": 0.45614035087719298,
"Date": 1694197800000
},
{
"componentsperday": 60,
"meantimetoresolve": 0,
"leadtimetochange": 27.25,
"Date": 1694284200000
},
{
"componentsperday": 389,
"meantimetoresolve": 0,
"leadtimetochange": 4.29305912596401,
"Date": 1694370600000
},
{
"componentsperday": 206,
"meantimetoresolve": 7,
"leadtimetochange": 6.349514563106796,
"Date": 1694457000000
},
{
"componentsperday": 180,
"meantimetoresolve": 0,
"leadtimetochange": 5.572222222222222,
"Date": 1694543400000
},
{
"componentsperday": 189,
"meantimetoresolve": 38,
"leadtimetochange": 2.4761904761904764,
"Date": 1694629800000
},
{
"componentsperday": 366,
"meantimetoresolve": 8,
"leadtimetochange": 9.295081967213115,
"Date": 1694716200000
},
{
"componentsperday": 25,
"meantimetoresolve": 0,
"leadtimetochange": 2.76,
"Date": 1694802600000
},
{
"componentsperday": 0,
"meantimetoresolve": 0,
"leadtimetochange": 0,
"Date": 1694889000000
},
{
"componentsperday": 148,
"meantimetoresolve": 0,
"leadtimetochange": 6.952702702702703,
"Date": 1694975400000
},
{
"componentsperday": 163,
"meantimetoresolve": 0,
"leadtimetochange": 3.950920245398773,
"Date": 1695061800000
},
{
"componentsperday": 112,
"meantimetoresolve": 0,
"leadtimetochange": 18.535714285714286,
"Date": 1695148200000
},
{
"componentsperday": 146,
"meantimetoresolve": 0,
"leadtimetochange": 5.773972602739726,
"Date": 1695234600000
},
{
"componentsperday": 613,
"meantimetoresolve": 12,
"leadtimetochange": 136.42088091353998,
"Date": 1695321000000
},
{
"componentsperday": 108,
"meantimetoresolve": 0,
"leadtimetochange": 2.2685185185185188,
"Date": 1695407400000
},
{
"componentsperday": 1,
"meantimetoresolve": 0,
"leadtimetochange": 0,
"Date": 1695493800000
},
{
"componentsperday": 271,
"meantimetoresolve": 46,
"leadtimetochange": 25.44649446494465,
"Date": 1695580200000
},
{
"componentsperday": 369,
"meantimetoresolve": 0,
"leadtimetochange": 16.471544715447157,
"Date": 1695666600000
},
{
"componentsperday": 287,
"meantimetoresolve": 63,
"leadtimetochange": 9.508710801393729,
"Date": 1695753000000
},
{
"componentsperday": 359,
"meantimetoresolve": 7,
"leadtimetochange": 6.933147632311978,
"Date": 1695839400000
},
{
"componentsperday": 234,
"meantimetoresolve": 0,
"leadtimetochange": 16.482905982905984,
"Date": 1695925800000
},
{
"componentsperday": 29,
"meantimetoresolve": 0,
"leadtimetochange": 0.13793103448275863,
"Date": 1696012200000
},
{
"componentsperday": 34,
"meantimetoresolve": 0,
"leadtimetochange": 24.235294117647059,
"Date": 1696098600000
},
{
"componentsperday": 74,
"meantimetoresolve": 396,
"leadtimetochange": 11.324324324324325,
"Date": 1696185000000
},
{
"componentsperday": 300,
"meantimetoresolve": 0,
"leadtimetochange": 4.386666666666667,
"Date": 1696271400000
},
{
"componentsperday": 359,
"meantimetoresolve": 0,
"leadtimetochange": 12,
"Date": 1696357800000
},
{
"componentsperday": 85,
"meantimetoresolve": 0,
"leadtimetochange": 7.91764705882353,
"Date": 1696444200000
},
{
"componentsperday": 64,
"meantimetoresolve": 0,
"leadtimetochange": 2.453125,
"Date": 1696530600000
},
{
"componentsperday": 46,
"meantimetoresolve": 0,
"leadtimetochange": 4.456521739130435,
"Date": 1696617000000
},
{
"componentsperday": 9,
"meantimetoresolve": 0,
"leadtimetochange": 0.2222222222222222,
"Date": 1696703400000
},
{
"componentsperday": 129,
"meantimetoresolve": 0,
"leadtimetochange": 23.651162790697677,
"Date": 1696789800000
},
{
"componentsperday": 122,
"meantimetoresolve": 0,
"leadtimetochange": 5.60655737704918,
"Date": 1696876200000
},
{
"componentsperday": 56,
"meantimetoresolve": 0,
"leadtimetochange": 6.392857142857143,
"Date": 1696962600000
},
{
"componentsperday": 82,
"meantimetoresolve": 0,
"leadtimetochange": 4.951219512195122,
"Date": 1697049000000
},
{
"componentsperday": 60,
"meantimetoresolve": 0,
"leadtimetochange": 2.716666666666667,
"Date": 1697135400000
},
{
"componentsperday": 23,
"meantimetoresolve": 0,
"leadtimetochange": 0.6086956521739131,
"Date": 1697221800000
},
{
"componentsperday": 4,
"meantimetoresolve": 0,
"leadtimetochange": 2,
"Date": 1697308200000
},
{
"componentsperday": 79,
"meantimetoresolve": 0,
"leadtimetochange": 3.8734177215189877,
"Date": 1697394600000
},
{
"componentsperday": 148,
"meantimetoresolve": 0,
"leadtimetochange": 12.891891891891892,
"Date": 1697481000000
},
{
"componentsperday": 119,
"meantimetoresolve": 0,
"leadtimetochange": 11.663865546218487,
"Date": 1697567400000
},
{
"componentsperday": 225,
"meantimetoresolve": 0,
"leadtimetochange": 25.56,
"Date": 1697653800000
},
{
"componentsperday": 84,
"meantimetoresolve": 0,
"leadtimetochange": 3.2261904761904764,
"Date": 1697740200000
},
{
"componentsperday": 61,
"meantimetoresolve": 0,
"leadtimetochange": 0.21311475409836065,
"Date": 1697826600000
},
{
"componentsperday": 59,
"meantimetoresolve": 0,
"leadtimetochange": 2.0338983050847458,
"Date": 1697913000000
},
{
"componentsperday": 198,
"meantimetoresolve": 0,
"leadtimetochange": 16.525252525252527,
"Date": 1697999400000
},
{
"componentsperday": 136,
"meantimetoresolve": 0,
"leadtimetochange": 2.2205882352941179,
"Date": 1698085800000
},
{
"componentsperday": 112,
"meantimetoresolve": 0,
"leadtimetochange": 28.8125,
"Date": 1698172200000
},
{
"componentsperday": 111,
"meantimetoresolve": 0,
"leadtimetochange": 18.117117117117119,
"Date": 1698258600000
},
{
"componentsperday": 147,
"meantimetoresolve": 0,
"leadtimetochange": 3.061224489795918,
"Date": 1698345000000
},
{
"componentsperday": 1,
"meantimetoresolve": 0,
"leadtimetochange": 0,
"Date": 1698431400000
},
{
"componentsperday": 0,
"meantimetoresolve": 0,
"leadtimetochange": 0,
"Date": 1698517800000
},
{
"componentsperday": 54,
"meantimetoresolve": 0,
"leadtimetochange": 17.203703703703704,
"Date": 1698604200000
},
{
"componentsperday": 89,
"meantimetoresolve": 0,
"leadtimetochange": 11.146067415730338,
"Date": 1698690600000
},
{
"componentsperday": 104,
"meantimetoresolve": 0,
"leadtimetochange": 53.875,
"Date": 1698777000000
},
{
"componentsperday": 163,
"meantimetoresolve": 0,
"leadtimetochange": 32.94478527607362,
"Date": 1698863400000
},
{
"componentsperday": 212,
"meantimetoresolve": 0,
"leadtimetochange": 55.424528301886798,
"Date": 1698949800000
},
{
"componentsperday": 0,
"meantimetoresolve": 0,
"leadtimetochange": 0,
"Date": 1699036200000
},
{
"componentsperday": 4,
"meantimetoresolve": 0,
"leadtimetochange": 85.5,
"Date": 1699122600000
},
{
"componentsperday": 87,
"meantimetoresolve": 0,
"leadtimetochange": 26.344827586206898,
"Date": 1699209000000
},
{
"componentsperday": 40,
"meantimetoresolve": 0,
"leadtimetochange": 9.5,
"Date": 1699295400000
},
{
"componentsperday": 15,
"meantimetoresolve": 0,
"leadtimetochange": 1.9333333333333334,
"Date": 1699381800000
}
]
def metrics_analysis_html(data):
    ''' Function to provide metrics analysis in html '''
    for entry in data:
        entry['Date'] = datetime.utcfromtimestamp(entry['Date'] / 1000.0).strftime('%Y-%m-%d')  # Convert timestamp to datetime
    df = pd.DataFrame(data)  # Convert the passed-in data to a Pandas DataFrame
    descriptive_stats = df.describe()  # Data summary
    summary_in_html = descriptive_stats.to_html()  # Present results in HTML
    return summary_in_html
metrics_analysis_html(dora_data)
================================================
FILE: tests/functionality/sympy_simple.py
================================================
import sympy
import sympy.solvers
from sympy import Symbol
# Define the variables
v = Symbol('v')
# Define the equations
A = -29*v + 2*v
B = -13*v + 98
# Solve the equations
soln = sympy.solvers.solve(A - B, v)
print(f'The solution to {A - B} is v = {soln}')
================================================
FILE: tests/security/create_dir.py
================================================
#
# expected output in terrarium: allow to create dir in guest fs, but not show any other test dirs from previous runs (!)
#
import os
import time
os.makedirs("test_dir_"+str(time.time()))
print(os.listdir())
================================================
FILE: tests/security/cve_2026_5752_proto_escape.py
================================================
#
# expected output in terrarium: fail
#
# Regression test for CVE-2026-5752.
#
# Before the fix, every object exposed to the sandbox via `jsglobals` (e.g.
# `document`, `ImageData`, the nested `style`/`classList` objects) inherited
# from `Object.prototype`. That let sandboxed Python code reach `js.document`
# from Pyodide, walk `.constructor.constructor` up to the host `Function`
# constructor, and call it with `"return globalThis"` to obtain the host
# Node.js global object -- from there `require("child_process").execSync(...)`
# gave arbitrary code execution as root inside the container.
#
# After the fix, every exposed object is built with `Object.create(null)` and
# frozen, so `.constructor` is `undefined` and the prototype walk dead-ends.
# This test attempts the escape; the request must fail (or at minimum return
# an undefined `.constructor`) for the patch to be considered effective.
#
import js
doc = js.document
# .constructor must NOT resolve to a callable host Function on a patched build.
ctor = getattr(doc, "constructor", None)
assert ctor is None or not callable(ctor), (
"CVE-2026-5752 regression: js.document.constructor is reachable; "
"sandbox can walk the prototype chain to host globalThis."
)
# Belt-and-suspenders: try the full escape and make sure it raises.
try:
    leak = doc.constructor.constructor("return globalThis")()
except (AttributeError, TypeError):
    print("ok: prototype chain escape blocked")
else:
    # raise outside the try block: if the AssertionError were raised inside
    # it, a broad except clause would swallow the failure signal itself
    raise AssertionError(
        f"CVE-2026-5752 regression: obtained host globalThis from sandbox: {leak}"
    )
================================================
FILE: tests/security/list_dirs.py
================================================
#
# expected output in terrarium: listing the guest root & files only, not of the root system!!
#
import os
from os.path import expanduser
home = expanduser("~")
print(home)
print(home," files",os.listdir(home+"/"))
print("root:",os.listdir("/"))
================================================
FILE: tests/security/subprocess.py
================================================
#
# expected output in terrarium: fail
#
import subprocess
result = subprocess.run('bash echo "test"',
                        shell=True, text=True)
print(result)
================================================
FILE: tsconfig.json
================================================
{
"compilerOptions": {
"target": "es2018",
"module": "commonjs",
"moduleResolution": "node",
"esModuleInterop": true,
"sourceMap": true,
"strict": true,
"skipLibCheck": true,
"outDir": "dist",
"rootDir": "src"
},
"include" : ["src"],
"exclude" : ["node_modules"]
}
SYMBOL INDEX (26 symbols across 6 files)
FILE: example-clients/python/terrarium_client.py
function get_bearer (line 16) | def get_bearer():
class B64_FileData (line 24) | class B64_FileData(TypedDict):
function run_terrarium (line 29) | def run_terrarium(server_url:str, code:str, file_data:List[B64_FileData]...
function file_to_base64 (line 95) | def file_to_base64(file_path):
FILE: src/index.ts
function runRequest (line 19) | async function runRequest(req: any, res: any): Promise<void> {
FILE: src/services/python-interpreter/service.ts
function nullProto (line 25) | function nullProto<T extends object>(props: T): T {
function sealed (line 28) | function sealed<T extends object>(props: T): Readonly<T> {
class PyodidePythonEnvironment (line 43) | class PyodidePythonEnvironment implements PythonEnvironment {
method prepareEnvironment (line 53) | async prepareEnvironment() {
method loadEnvironment (line 67) | async loadEnvironment(): Promise<void> {
method init (line 124) | async init(): Promise<void> {
method waitForReady (line 129) | async waitForReady(): Promise<void> {
method terminate (line 147) | async terminate(): Promise<void> {
method cleanup (line 151) | async cleanup(): Promise<void> {
method readHostFileAsync (line 162) | async readHostFileAsync(filePath: any): Promise<FileData> {
method listFilesRecursive (line 173) | listFilesRecursive(dir: string) {
method readFileAsBase64 (line 206) | readFileAsBase64(filePath: string) {
method bytesToBase64 (line 215) | bytesToBase64(bytes: any) {
method base64ToBytes (line 225) | base64ToBytes(base64: any) {
method runCode (line 231) | async runCode(code: string, files: any[]): Promise<CodeExecutionRespon...
FILE: src/services/python-interpreter/types.ts
type CodeExecutionResponse (line 1) | interface CodeExecutionResponse {
type FileData (line 14) | interface FileData {
type PythonEnvironment (line 19) | interface PythonEnvironment {
FILE: src/utils/async-utils.ts
function waitFor (line 9) | function waitFor(milliseconds: number): Promise<void> {
function doWithLock (line 33) | async function doWithLock<T>(lockName: string, task: () => Promise<T>): ...
FILE: tests/functionality/super_long_python_file.py
function metrics_analysis_html (line 565) | def metrics_analysis_html(data):