Repository: databricks-demos/dbdemos
Branch: main
Commit: fa3f7f981d75
Files: 65
Total size: 4.1 MB
Directory structure:
gitextract_ps7v2nt2/
├── .claude/
│   └── commands/
│       └── release.md
├── .cursorignore
├── .gitignore
├── .vscode/
│   ├── launch.json
│   └── settings.json
├── CLAUDE.md
├── LICENSE
├── MANIFEST.in
├── NOTICE
├── README.md
├── README_AIBI.md
├── SECURITY.md
├── ai_release/
│   ├── __init__.py
│   ├── bundle.py
│   ├── compute.py
│   ├── inspect_jobs.py
│   ├── jobs.py
│   ├── run_remote.py
│   ├── run_state.py
│   └── runs/
│       └── .gitignore
├── build-and-distribute.sh
├── build.sh
├── dbdemos/
│   ├── __init__.py
│   ├── conf.py
│   ├── dbdemos.py
│   ├── exceptions/
│   │   ├── __init__.py
│   │   └── dbdemos_exception.py
│   ├── installer.py
│   ├── installer_dashboard.py
│   ├── installer_genie.py
│   ├── installer_report.py
│   ├── installer_repos.py
│   ├── installer_workflows.py
│   ├── job_bundler.py
│   ├── notebook_parser.py
│   ├── packager.py
│   ├── resources/
│   │   ├── default_cluster_config-AWS.json
│   │   ├── default_cluster_config-AZURE.json
│   │   ├── default_cluster_config-GCP.json
│   │   ├── default_cluster_config.json
│   │   ├── default_cluster_job_config.json
│   │   └── default_test_job_conf.json
│   ├── sql_query.py
│   ├── template/
│   │   ├── LICENSE.html
│   │   ├── NOTICE.html
│   │   ├── README.html
│   │   ├── code_viewer.html
│   │   └── index.html
│   └── tracker.py
├── docs/
│   ├── CNAME
│   └── index.html
├── main.py
├── requirements.in
├── requirements.txt
├── setup.py
├── test/
│   ├── __init__.py
│   ├── test2.html
│   ├── test_installer.py
│   ├── test_installer_genie.py
│   ├── test_job_bundler.py
│   ├── test_list_demos.html
│   ├── test_list_demos2.html
│   └── test_notebook_parser.py
├── test_demo.py
└── test_list_html.html
================================================
FILE CONTENTS
================================================
================================================
FILE: .claude/commands/release.md
================================================
# DBDemos Release Workflow
You are helping with the dbdemos release process. This involves bundling demos from the `dbdemos-notebooks` repository, testing them, fixing any issues, and preparing for release.
## ⛔ CRITICAL WARNINGS
1. **NEVER run a release to PyPI by yourself** - Only the human can trigger `./build-and-distribute.sh`
2. **NEVER commit secrets** - PAT tokens, GitHub tokens must never appear in commits or outputs
3. **NEVER push directly to main** - Always use feature branches and PRs
4. **NEVER cleanup workspace resources yourself** - Always ask the human to do cleanup
## 📚 Notebook Code Quality Principles (CRITICAL)
**The dbdemos-notebooks are state-of-the-art examples that customers will reuse.** Code must be:
1. **Clean and minimal** - No unnecessary code, no hacks, no workarounds
2. **Simple and readable** - Easy to understand for learning purposes
3. **Safe to re-run** - Notebooks must work when run multiple times (idempotent)
4. **No error handling hacks** - Don't add try/except blocks to work around specific errors
5. **No comments explaining errors** - Don't add comments like "handles BudgetPolicy error"
### What NOT to do:
```python
# BAD - Don't add error handling for specific workspace issues
try:
    agents.deploy(...)
except NotFound as e:
    if "BudgetPolicy" in str(e):
        # cleanup and retry...
```
### What TO do instead:
- If a job fails due to stale data/resources, **ASK THE HUMAN** to clean up the workspace
- Never attempt cleanup yourself - the human must do it
- Fix the root cause in the code, not the symptom
### Handling Stale Resource Errors:
**Note:** The bundler now automatically cleans up schemas before running (via `DROP SCHEMA CASCADE`).
This should prevent most stale resource errors. If you still encounter issues:
- `BudgetPolicy not found` → Schema cleanup should fix this, or ask human to delete the serving endpoint
- `Model version already exists` → Should be fixed by schema cleanup
- `Endpoint already exists` → Should be fixed by schema cleanup
- `Table already exists` → Should be fixed by schema cleanup
If automatic cleanup fails or you need manual intervention, **ask the human** to run:
```sql
DROP SCHEMA IF EXISTS main__build.<schema_name> CASCADE;
```
Or delete specific resources via the Databricks UI/API.
## Overview
The dbdemos package bundles notebooks from the `dbdemos-notebooks` repository. The bundling process:
1. Creates/updates jobs in a Databricks workspace that run the notebooks
2. Waits for job completion
3. Downloads executed notebooks with outputs
4. Packages them into the `dbdemos/bundles/` directory
## Environment Setup
Before starting, verify these are available:
- `DATABRICKS_TOKEN` or token in `local_conf_E2TOOL.json`
- `GITHUB_TOKEN` or token in `local_conf_E2TOOL.json`
- Workspace: `https://e2-demo-tools.cloud.databricks.com/`
- dbdemos-notebooks repo at: `../dbdemos-notebooks` (configurable)
- Test cluster: Matches `cluster_name_pattern` in config (default: "quentin")
## AI Release Tools Location
All AI-powered release tools are in `ai_release/`:
- `ai_release/bundle.py` - Bundle and test demos
- `ai_release/run_remote.py` - Execute code on Databricks clusters
- `ai_release/compute.py` - Remote execution library
- `ai_release/run_state.py` - Persistent state tracking for runs
- `ai_release/jobs.py` - Job inspection library (uses Databricks SDK)
- `ai_release/inspect_jobs.py` - CLI for job inspection
## ⏱️ Important: Job Run Times
**Bundle jobs typically take 15-30 minutes to complete.** Each job runs all notebooks in a demo on a Databricks cluster.
- Do NOT wait synchronously for jobs to complete
- Start the job, then work on other tasks or let the user know to check back later
- Use `--status` to check job progress without blocking
- The state tracking system persists progress across sessions
---
## Part 0: Run State Tracking
The AI release workflow tracks state persistently in `ai_release/runs/`:
```
ai_release/runs/
  <commit_id>/
    state.json             # Overall run state
    <demo_name>/
      status.json          # Demo-specific status
      errors.json          # Extracted errors from failed runs
      fix_attempts.json    # History of fix attempts
      job_output.log       # Raw job output
      notes.md             # AI notes and observations
```
### Using Run State in Python
```python
from ai_release.run_state import get_run_state, get_latest_run
# Get or create state for current commit
state = get_run_state()
# Update demo status
state.update_demo_status("ai-agent", "running", job_id=123, run_id=456)
# Save errors
state.save_errors("ai-agent", [{"cell": 5, "error": "ImportError..."}])
# Record a fix attempt
state.add_fix_attempt("ai-agent", "Remove protobuf constraint", "ai-fix-ai-agent-pip", ["01_create_first_billing_agent.py"])
# Add notes
state.add_note("ai-agent", "The pip install fails due to protobuf<5 conflict with grpcio-status")
# Get summary
print(state.get_summary())
# Resume from previous session
state = get_latest_run()
```
### When to Use State Tracking
- Before starting a bundle job: `state.update_demo_status(demo, "running", ...)`
- After job completes: `state.update_demo_status(demo, "success")` or `"failed"`
- When extracting errors: `state.save_errors(demo, errors)`
- When making a fix: `state.add_fix_attempt(demo, description, branch, files)`
- To add context for future sessions: `state.add_note(demo, note)`
---
## Part 1: Remote Code Execution (Testing Fixes)
Before committing a fix to dbdemos-notebooks, test it interactively on a cluster.
### List Available Clusters
```bash
python ai_release/run_remote.py --list-clusters
```
### Check/Start the Test Cluster
```bash
# Check status
python ai_release/run_remote.py --cluster-status
# Start if not running (will ask for confirmation)
python ai_release/run_remote.py --start-cluster --wait-for-cluster
```
### Execute Code for Testing
```bash
# Execute Python code
python ai_release/run_remote.py --code "print(spark.version)"
# Execute SQL
python ai_release/run_remote.py --code "SELECT current_catalog()" --language sql
# Execute a file
python ai_release/run_remote.py --file path/to/test_script.py
# With longer timeout (default 300s)
python ai_release/run_remote.py --code "long_running_code()" --timeout 600
```
### Context Reuse (Faster Follow-up Commands)
```bash
# First command - save context
python ai_release/run_remote.py --code "x = spark.range(100)" --save-context
# Follow-up commands reuse context (faster, keeps variables)
python ai_release/run_remote.py --code "x.count()" --load-context
# Clear context when done
python ai_release/run_remote.py --clear-context
```
---
## Part 2: Bundling Commands
### Check Configuration
```bash
python ai_release/bundle.py --check-config
```
### Check Status of a Demo
```bash
python ai_release/bundle.py --demo <demo-name> --status
```
This shows recent job runs, task status, and error details.
### Bundle a Specific Demo (from main)
```bash
python ai_release/bundle.py --demo <demo-name>
```
### Bundle from a Feature Branch
```bash
python ai_release/bundle.py --demo <demo-name> --branch <branch-name>
```
### Force Re-run (ignore diff optimization)
```bash
python ai_release/bundle.py --demo <demo-name> --force
```
### Repair Failed Job (re-run only failed tasks)
```bash
python ai_release/bundle.py --demo <demo-name> --repair
```
Use this for quick iteration when debugging. After fixing, always do a full re-run.
Add `--wait` to wait for completion:
```bash
python ai_release/bundle.py --demo <demo-name> --repair --wait
```
### Schema Cleanup (Default: Enabled)
By default, the bundler automatically drops the demo schema (`main__build.<schema>`) before running.
This ensures a clean state and avoids stale resource errors.
```bash
# Cleanup is enabled by default - these are equivalent:
python ai_release/bundle.py --demo <demo-name>
python ai_release/bundle.py --demo <demo-name> --cleanup-schema
# To skip cleanup (not recommended unless debugging):
python ai_release/bundle.py --demo <demo-name> --no-cleanup-schema
```
### Bundle All Demos
```bash
python ai_release/bundle.py --all
```
This uses the GitHub diff API to run only the demos with changed files.
### List Available Demos
```bash
python ai_release/bundle.py --list-demos
```
---
## Part 3: Fixing a Failed Demo - Complete Workflow
When a demo fails, follow this workflow:
### Step 1: Identify the Error
```bash
# Get job status with auto-extracted errors from notebook cells
python ai_release/inspect_jobs.py --demo <demo-name>
# For full error traces and failing code
python ai_release/inspect_jobs.py --demo <demo-name> --errors
# List all failed jobs
python ai_release/inspect_jobs.py --list --failed-only
```
The inspection tool automatically:
- Fetches the job run details
- Exports the notebook HTML
- Extracts cell-level errors with traceback
- Shows the exact code that failed
- Suggests a fix workflow
Common issues:
- Missing/incompatible dependencies (pip install failures)
- API changes in Databricks
- Data schema changes
- Cluster configuration issues
### Step 2: Test the Fix Interactively (Optional but Recommended)
Before touching the notebooks, test your fix on a cluster:
```bash
# Start cluster if needed
python ai_release/run_remote.py --start-cluster --wait-for-cluster
# Test your fix code
python ai_release/run_remote.py --code "
# Your fix code here
df = spark.read.table('your_table')
# ...
"
```
### Step 3: Create a Fix Branch in dbdemos-notebooks
```bash
cd ../dbdemos-notebooks
git checkout main
git pull origin main
git checkout -b ai-fix-<demo-name>-<issue>
```
### Step 4: Make the Fix
Edit the notebook files in `../dbdemos-notebooks`. The notebooks are `.py` files using Databricks notebook format.
### Step 5: Commit and Push
```bash
cd ../dbdemos-notebooks
git add .
git commit -m "fix: <description of fix>"
git push origin ai-fix-<demo-name>-<issue>
```
### Step 6: Test the Fix (Full Re-run)
```bash
cd ../dbdemos
python ai_release/bundle.py --demo <demo-name> --branch ai-fix-<demo-name>-<issue> --force
```
### Step 7: If Still Failing - Iterate
```bash
# Make more fixes in dbdemos-notebooks
cd ../dbdemos-notebooks
# ... edit files ...
git add . && git commit -m "fix: additional fixes" && git push
# Quick test with repair (faster, but use full re-run for final verification)
cd ../dbdemos
python ai_release/bundle.py --demo <demo-name> --repair --wait
# Or full re-run if dependencies changed
python ai_release/bundle.py --demo <demo-name> --branch ai-fix-<demo-name>-<issue> --force
```
### Step 8: Create PR (When Tests Pass)
```bash
cd ../dbdemos-notebooks
gh pr create --title "fix: <description>" --body "## Summary
- Fixed <issue>
## Testing
- Bundling job passed: <link to job run>
🤖 Generated with Claude Code"
```
### Step 9: After PR is Merged - Final Verification
Wait for the human to merge the PR, then:
```bash
cd ../dbdemos
python ai_release/bundle.py --demo <demo-name> --force
```
Report the result to the human.
---
## Part 4: Full Release Workflow
When all demos are working and you're asked to prepare a release:
### Step 1: Bundle All Demos from Main
```bash
python ai_release/bundle.py --all --force
```
### Step 2: Verify All Passed
Check output for any failures. If any failed, fix them first.
### Step 3: Report to Human
Tell the human:
- All demos bundled successfully
- Any changes made
- Ready for PyPI release
### Step 4: Human Runs Release
**The human will run:** `./build-and-distribute.sh`
**You must NEVER run this yourself.**
---
## Useful Information
### Demo Path Structure
Demos are located in paths like:
- `product_demos/Delta-Lake/delta-lake`
- `demo-retail/lakehouse-retail-c360`
- `aibi/aibi-marketing-campaign`
### Job Naming Convention
Jobs are named: `field-bundle_<demo-name>`
### Bundle Config Location
Each demo has a config at: `<demo-path>/_resources/bundle_config`
### Workspace URLs
- Jobs: `https://e2-demo-tools.cloud.databricks.com/#job/<job_id>`
- Runs: `https://e2-demo-tools.cloud.databricks.com/#job/<job_id>/run/<run_id>`
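The URL patterns above can be built programmatically. A minimal helper, assuming the `e2-demo-tools` workspace from this doc (the function itself is illustrative, not part of the repo's tooling):

```python
# Hypothetical helper building job/run URLs from the patterns documented above.
WORKSPACE = "https://e2-demo-tools.cloud.databricks.com"

def job_url(job_id, run_id=None):
    """Return the workspace URL for a job, or a specific run if run_id is given."""
    url = f"{WORKSPACE}/#job/{job_id}"
    if run_id is not None:
        url = f"{url}/run/{run_id}"
    return url

print(job_url(123))        # job page
print(job_url(123, 456))   # specific run page
```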
### Package Versioning Rules (IMPORTANT)
When fixing `%pip install` lines in notebooks, follow these rules for Databricks packages:
**Always use latest (no version pin):**
- `databricks-langchain` - use latest
- `databricks-agents` - use latest
- `databricks-feature-engineering` - use latest (NOT pinned like `==0.12.1`)
- `databricks-sdk` - use latest
- `databricks-mcp` - use latest
**Use minimum version (`>=`):**
- `mlflow>=3.10.1` - minimum version constraint is OK
**Never pin these constraints (they cause conflicts):**
- `protobuf<5` - REMOVE, conflicts with grpcio-status
- `cryptography<43` - REMOVE, unnecessary constraint
**Example - BAD:**
```
%pip install mlflow>=3.10.1 databricks-feature-engineering==0.12.1 protobuf<5 cryptography<43
```
**Example - GOOD:**
```
%pip install mlflow>=3.10.1 databricks-langchain databricks-agents databricks-feature-engineering
```
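The versioning rules above can be checked mechanically. A sketch of such a linter, assuming notebooks are plain `.py` files whose pip cells start with `%pip install` (the function name and rule encoding are this example's own, not existing tooling in the repo):

```python
import re

def check_pip_line(line):
    """Return a list of versioning-rule violations found on a %pip install line."""
    issues = []
    if not line.strip().startswith("%pip install"):
        return issues
    # Constraints that must be removed entirely (they cause dependency conflicts)
    for constraint in ("protobuf<5", "cryptography<43"):
        if constraint in line:
            issues.append(f"remove '{constraint}'")
    # Databricks packages must not be pinned with ==
    for pin in re.findall(r"databricks-[\w-]+==\S+", line):
        issues.append(f"unpin '{pin}'")
    return issues

bad = "%pip install mlflow>=3.10.1 databricks-feature-engineering==0.12.1 protobuf<5 cryptography<43"
for issue in check_pip_line(bad):
    print(issue)
```

Running the checker over every `%pip install` line in a demo before bundling would catch these conflicts before a 15-30 minute job run fails on them.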
### Common Errors and Fixes
1. **"couldn't get notebook for run... You probably did a run repair"**
- Solution: Do a full re-run with `--force`
2. **"last job failed for demo X. Can't package"**
- Solution: Fix the failing notebook, then re-run
3. **API rate limits (429 errors)**
- The script auto-retries. If persistent, wait a few minutes.
4. **"Couldn't pull the repo"**
- Git conflicts in workspace. May need manual resolution.
5. **Cluster not running**
- Use `python ai_release/run_remote.py --start-cluster --wait-for-cluster`
6. **pip install CalledProcessError with protobuf/cryptography conflicts**
- Remove `protobuf<5` and `cryptography<43` constraints
- Remove pinned versions like `databricks-feature-engineering==0.12.1`
- See "Package Versioning Rules" above
---
## Files Reference
- `ai_release/inspect_jobs.py` - Job inspection CLI (auto-extracts errors from notebooks)
- `ai_release/jobs.py` - Job inspection library (uses Databricks SDK)
- `ai_release/bundle.py` - Main CLI for bundling
- `ai_release/run_remote.py` - Remote code execution CLI
- `ai_release/compute.py` - Remote execution library
- `ai_release/run_state.py` - Persistent state tracking for runs
- `ai_release/runs/` - Directory containing run state (gitignored)
- `dbdemos/job_bundler.py` - Job creation and execution
- `dbdemos/packager.py` - Packaging executed notebooks
- `local_conf_E2TOOL.json` - Local configuration (gitignored)
## SDK Documentation
Databricks SDK for Python: https://databricks-sdk-py.readthedocs.io/en/latest/
- `../dbdemos-notebooks/` - Source notebooks repository
================================================
FILE: .cursorignore
================================================
dbdemos/exceptions/__pycache__
__pycache__
local_conf_awsevent.json
local_conf_cse2.json
local_conf_gcp.json
local_conf_ioannis.json
local_conf.json
local_conf*
.eggs
.DS_Store
build
dbdemos/minisite/
dbdemos/bundles
dist
*.egg-info/
dbdemos/__pycache__
.idea
send_to_e2.sh
test_package.py
dbdemos/resources/local_conf.json
field-demo
venv
databricks-demos.iml
dist
conf.json
.DS_Store
__pycache__
.idea
config.json
================================================
FILE: .gitignore
================================================
dbdemos/exceptions/__pycache__
__pycache__
update-minisite-dbdemos-website.sh
local_conf_awsevent.json
local_conf_cse2.json
local_conf_gcp.json
local_conf_ioannis.json
local_conf.json
local_conf*
.eggs
.DS_Store
build
dbdemos/minisite/
dbdemos/bundles
dist
*.egg-info/
dbdemos/__pycache__
.idea
send_to_e2.sh
test_package.py
dbdemos/resources/local_conf.json
field-demo
venv
databricks-demos.iml
local_conf_azure.json
================================================
FILE: .vscode/launch.json
================================================
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Launch main",
            "type": "debugpy",
            "request": "launch",
            "program": "${workspaceFolder}/main.py",
            "console": "integratedTerminal"
        }
    ]
}
================================================
FILE: .vscode/settings.json
================================================
{
    "python.testing.unittestArgs": [
        "-v",
        "-s",
        "./test",
        "-p",
        "test_*.py"
    ],
    "python.testing.pytestEnabled": false,
    "python.testing.unittestEnabled": true,
    "python-envs.defaultEnvManager": "ms-python.python:conda",
    "python-envs.defaultPackageManager": "ms-python.python:conda",
    "python-envs.pythonProjects": []
}
================================================
FILE: CLAUDE.md
================================================
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## ⛔ CRITICAL: NEVER Release to PyPI
**Claude Code must NEVER run `./build-and-distribute.sh` or release to PyPI.**
Only the human maintainer can trigger a PyPI release. When demos are ready:
1. Report to the human that bundling is complete
2. Wait for the human to run the release script manually
## ⛔ CRITICAL: Never Commit Secrets
**NEVER include PAT tokens, GitHub tokens, or any credentials in:**
- Commits
- PR descriptions
- Tool outputs shown to user
- Log messages
Tokens are stored in `local_conf.json` (gitignored) or environment variables.
## CRITICAL: Do Not Modify Bundle and Minisite Directories
**NEVER search, read, edit, or modify files under these directories unless explicitly asked:**
- `dbdemos/bundles/` - Contains packaged demo bundles (generated artifacts)
- `dbdemos/minisite/` - Contains generated minisite content
These directories contain packaged/generated demo content that should only be modified through the bundling workflow (`job_bundler.py` → `packager.py`). Direct edits to these files will be overwritten during the next bundling process and can break demo installations.
**Work on the source code in the core modules instead** (`installer.py`, `packager.py`, `job_bundler.py`, etc.) or on the source repository (`dbdemos-notebooks`).
## Project Overview
`dbdemos` is a Python toolkit for installing and packaging Databricks demos. It automates deployment of complete demo environments including notebooks, Spark Declarative Pipeline (SDP) pipelines, DBSQL dashboards, workflows, ML models, and AI/BI Genie spaces. The project serves two main purposes:
1. **End-user library**: Users install demos via `pip install dbdemos` and call `dbdemos.install('demo-name')`
2. **Demo packaging system**: Maintainers package demos from source repositories (usually `dbdemos-notebooks`) into distributable bundles
## Architecture
### Core Components
- **installer.py**: Main installation engine that deploys demos to Databricks workspaces
- Creates clusters, SDP pipelines, workflows, dashboards, and ML models
- Handles resource templating (replacing {{CURRENT_USER}}, {{DEMO_FOLDER}}, etc.)
- Manages demo lifecycle from download to deployment
- **job_bundler.py**: Manages the demo bundling workflow
- Scans repositories for demos with `_resources/bundle_config` files
- Executes pre-run jobs to generate notebook outputs
- Tracks execution state and commit history to avoid redundant runs
- **packager.py**: Packages demos into distributable bundles
- Downloads notebooks (with or without pre-run results)
- Extracts Lakeview dashboards from workspace
- Processes notebook content (removes build tags, updates paths)
- Generates minisite HTML for [dbdemos.ai](https://www.dbdemos.ai)
- **dbdemos.py**: User-facing API layer providing `help()`, `list_demos()`, `install()` functions
- **conf.py**: Configuration management including `DBClient` for Databricks REST API calls
- **installer_*.py modules**: Specialized installers for different resource types:
- `installer_workflows.py`: Job/workflow deployment
- `installer_dashboard.py`: DBSQL dashboard installation
- `installer_genie.py`: AI/BI Genie space setup
- `installer_repos.py`: Repository management
- **notebook_parser.py**: Parses and transforms notebook JSON/HTML content
### Demo Bundle Structure
Each demo lives in `dbdemos/bundles/{demo-name}/` with:
- `_resources/bundle_config`: JSON configuration defining demo metadata, notebooks, pipelines, workflows, dashboards
- Notebook files (`.html` format, pre-run with cell outputs)
- `_resources/dashboards/*.lvdash.json`: Dashboard definitions
Bundle configs use template keys that get replaced during installation:
- `{{CURRENT_USER}}`: Installing user's email
- `{{CURRENT_USER_NAME}}`: Sanitized username
- `{{DEMO_FOLDER}}`: Installation path
- `{{DEMO_NAME}}`: Demo identifier
- `{{TODAY}}`: Current date
Demos are sourced from external repositories (typically `databricks-demos/dbdemos-notebooks`) and bundled into this package for distribution.
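The template substitution described above can be sketched as a simple placeholder replacement. This is a minimal illustration of the mechanism, not the installer's actual implementation (the function name and the behavior of leaving unknown keys untouched are assumptions):

```python
import re

def render_template(text, values):
    """Replace {{KEY}} placeholders with their values; keep unknown keys as-is."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values.get(m.group(1), m.group(0)), text)

conf = '{"path": "{{DEMO_FOLDER}}/{{DEMO_NAME}}", "owner": "{{CURRENT_USER}}"}'
print(render_template(conf, {
    "DEMO_FOLDER": "/Users/user@example.com/dbdemos",
    "DEMO_NAME": "lakehouse-retail-c360",
    "CURRENT_USER": "user@example.com",
}))
```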
## Common Development Commands
### Building the Package
```bash
# Build wheel distribution
python setup.py clean --all bdist_wheel
# Build script (used locally)
./build.sh
```
### Testing
```bash
# Run all tests
pytest
# Run specific test file
pytest test/test_installer.py
# Run specific test
pytest test/test_installer.py::TestInstaller::test_method_name
```
### Bundling Demos (Maintainer Workflow)
Create a `local_conf.json` file with workspace credentials (see `local_conf_example.json`):
```json
{
  "username": "user@example.com",
  "url": "https://workspace.cloud.databricks.com",
  "org_id": "1234567890",
  "pat_token": "dapi...",
  "repo_staging_path": "/Repos/user@example.com",
  "repo_name": "dbdemos-notebooks",
  "repo_url": "https://github.com/databricks-demos/dbdemos-notebooks",
  "branch": "master",
  "github_token": "ghp_..."
}
```
Then use `main.py` to bundle demos:
```python
from dbdemos.job_bundler import JobBundler
from dbdemos.packager import Packager
bundler = JobBundler(conf)
bundler.reset_staging_repo(skip_pull=False)
bundler.add_bundle("product_demos/delta-lake") # or use load_bundles_conf() to discover all
bundler.start_and_wait_bundle_jobs(force_execution=False)
packager = Packager(conf, bundler)
packager.package_all()
```
See `test_demo.py` for a complete bundling example.
### Distribution and Release
```bash
# Full release process (bumps version, builds, uploads to PyPI, creates GitHub releases)
./build-and-distribute.sh
```
This script:
1. Verifies GitHub CLI authentication and repository access
2. Auto-increments version in `setup.py` and `dbdemos/__init__.py`
3. Builds wheel package
4. Uploads to PyPI via `twine`
5. Creates release branch and pull request
6. Creates GitHub releases on multiple repositories (`dbdemos`, `dbdemos-notebooks`, `dbdemos-dataset`, `dbdemos-resources`)
## Key Implementation Details
### Dynamic Link Replacement
Notebooks contain special attributes in HTML links that get replaced during installation:
- `dbdemos-pipeline-id="pipeline-id"`: Links to SDP pipelines
- `dbdemos-workflow-id="workflow-id"`: Links to workflows
- `dbdemos-dashboard-id="dashboard-id"`: Links to dashboards
The installer updates these links with actual resource IDs/URLs after creation.
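The link-rewriting step can be sketched as below. The `dbdemos-pipeline-id` attribute name comes from this doc; the function, the fixed attribute ordering in the regex, and the URL mapping are illustrative assumptions, not the installer's real code:

```python
import re

def update_pipeline_links(html, pipeline_urls):
    """Rewrite the href of <a> tags carrying a dbdemos-pipeline-id attribute.

    Assumes href precedes the dbdemos-pipeline-id attribute, as a simplification.
    """
    pattern = r'<a href="(?P<href>[^"]*)" dbdemos-pipeline-id="(?P<pid>[^"]*)"'

    def repl(m):
        url = pipeline_urls.get(m.group("pid"))
        if url is None:
            return m.group(0)  # unknown id: leave the link untouched
        return f'<a href="{url}" dbdemos-pipeline-id="{m.group("pid")}"'

    return re.sub(pattern, repl, html)

html = '<a href="#" dbdemos-pipeline-id="pipeline-id">Open the pipeline</a>'
print(update_pipeline_links(html, {"pipeline-id": "https://ws/#joblist/pipelines/abc"}))
```

The same pattern would apply to `dbdemos-workflow-id` and `dbdemos-dashboard-id` with their respective resource URLs.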
### Resource Creation Flow
1. Parse bundle configuration
2. Create/update Git repo if specified
3. Create demo cluster (with auto-termination)
4. Install notebooks to workspace
5. Create SDP pipelines
6. Create workflows
7. Create DBSQL dashboards
8. Create Genie spaces (for AI/BI demos)
9. Update notebook links to point to created resources
10. Track installation metrics
### Cluster Configuration
Default cluster configs are in `dbdemos/resources/`:
- `default_cluster_config.json`: Standard demo cluster
- `default_test_job_conf.json`: Job cluster configuration
- Cloud-specific variants for AWS/Azure/GCP
Demos can override cluster settings in their bundle config under the `cluster` key.
### Multi-Cloud Support
The project supports AWS, Azure, and GCP. Cloud-specific configurations include:
- Instance type selection
- Storage paths (S3/ADLS/GCS)
- Authentication mechanisms
- DBR version selection
Cloud is detected automatically from workspace or specified via `cloud` parameter in `install()`.
### Serverless Support
Some demos support serverless compute. Set `serverless=True` when installing to use:
- Serverless SDP pipelines
- Serverless SQL warehouses
- Serverless notebooks (where supported)
## Testing Considerations
- Tests use local configuration files (see `local_conf_*.json` examples)
- Tests require a Databricks workspace with appropriate permissions
- Most tests are in the `test/` directory
- `test_demo.py` in root is for bundling workflow testing
## Data Collection
By default, dbdemos collects usage metrics (views, installations) to improve demo quality. This can be disabled by setting `Tracker.enable_tracker = False` in `tracker.py`. No PII is collected; only aggregate usage data and org IDs.
## Important Constraints
- Users need cluster creation, SDP pipeline creation, and DBSQL dashboard permissions
- Unity Catalog demos require a UC metastore
- Some demos have resource quotas (compute, storage)
- Pre-run notebooks require job execution in staging workspace
- Dashboard API has rate limits (especially on GCP workspaces)
## Claude Code Release Workflow
For the full release workflow, use the `/release` command which provides detailed instructions.
### Quick Reference
**Remote Execution** (`ai_release/run_remote.py`) - Test fixes before committing:
```bash
# List clusters
python ai_release/run_remote.py --list-clusters
# Start/check cluster
python ai_release/run_remote.py --start-cluster --wait-for-cluster
# Execute code
python ai_release/run_remote.py --code "print(spark.version)"
# Execute SQL
python ai_release/run_remote.py --code "SELECT 1" --language sql
```
**Job Inspection** (`ai_release/inspect_jobs.py`) - Auto-extracts errors from notebook:
```bash
# List all jobs with status
python ai_release/inspect_jobs.py --list
# List only failed jobs
python ai_release/inspect_jobs.py --list --failed-only
# Get details for a demo (auto-fetches errors if failed)
python ai_release/inspect_jobs.py --demo <name>
# Show full error traces and code
python ai_release/inspect_jobs.py --demo <name> --errors
```
**Bundle CLI** (`ai_release/bundle.py`):
```bash
# Check demo status
python ai_release/bundle.py --demo <name> --status
# Bundle from main
python ai_release/bundle.py --demo <name>
# Bundle from feature branch
python ai_release/bundle.py --demo <name> --branch <branch>
# Repair failed job (quick iteration)
python ai_release/bundle.py --demo <name> --repair
# Force full re-run
python ai_release/bundle.py --demo <name> --force
# Bundle all demos
python ai_release/bundle.py --all
```
**Fix Workflow Summary**:
1. Inspect errors: `python ai_release/inspect_jobs.py --demo <name> --errors`
2. Test fix interactively: `python ai_release/run_remote.py --code "..."`
3. Create fix branch in `../dbdemos-notebooks`: `ai-fix-<demo>-<issue>`
4. Make fix, commit, push
5. Test: `--branch ai-fix-... --force`
6. If fails, iterate with `--repair` or `--force`
7. Create PR when green
8. Human merges PR
9. Final verification from main: `--force`
10. Human runs `./build-and-distribute.sh`
**GitHub CLI Account Switch** - Use the public account for PRs:
```bash
# List accounts
gh auth status
# Switch to public account (for creating PRs on public repos)
gh auth switch --user QuentinAmbard
# Switch to enterprise account (quentin-ambard_data is EMU, can't create PRs on public repos)
gh auth switch --user quentin-ambard_data
```
**PR Status Verification** - Always check PR state before assuming:
```bash
# Check if PR is merged before running tests from main
gh pr view <PR_NUMBER> --json state,mergedAt
# Never assume a PR is not merged - always verify first
```
### Environment
- **Workspace**: `https://e2-demo-tools.cloud.databricks.com/`
- **Config**: `local_conf_E2TOOL.json` (primary) or environment variables
- **Notebooks repo**: `../dbdemos-notebooks` (configurable)
- **Test cluster**: Matches `cluster_name_pattern` in config (default: "quentin")
### Key Files
- `ai_release/inspect_jobs.py` - Job inspection CLI (auto-extracts errors)
- `ai_release/jobs.py` - Job inspection library (uses Databricks SDK)
- `ai_release/bundle.py` - Bundle CLI for demo packaging
- `ai_release/run_remote.py` - Remote code execution on clusters
- `ai_release/compute.py` - Remote execution library
- `.claude/commands/release.md` - Full release workflow documentation
- `dbdemos/job_bundler.py` - Job creation and execution logic
- `dbdemos/packager.py` - Packaging logic
- `local_conf.json` - Local configuration (gitignored, contains secrets)
### Databricks SDK
All Databricks API operations use the Python SDK: https://databricks-sdk-py.readthedocs.io/en/latest/
================================================
FILE: LICENSE
================================================
Copyright (2022) Databricks, Inc.
This library (the "Software") may not be used except in connection with the Licensee's use of the Databricks Platform Services pursuant to an Agreement (defined below) between Licensee (defined below) and Databricks, Inc. ("Databricks"). The Object Code version of the Software shall be deemed part of the Downloadable Services under the Agreement, or if the Agreement does not define Downloadable Services, Subscription Services, or if neither are defined then the term in such Agreement that refers to the applicable Databricks Platform Services (as defined below) shall be substituted herein for “Downloadable Services.” Licensee's use of the Software must comply at all times with any restrictions applicable to the Downloadable Services and Subscription Services, generally, and must be used in accordance with any applicable documentation. For the avoidance of doubt, the Software constitutes Databricks Confidential Information under the Agreement.
Additionally, and notwithstanding anything in the Agreement to the contrary:
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
you may view, make limited copies of, and may compile the Source Code version of the Software into an Object Code version of the Software. For the avoidance of doubt, you may not make derivative works of the Software (or make any changes to the Source Code version of the Software unless you have agreed to separate terms with Databricks permitting such modifications (e.g., a contribution license agreement)).
If you have not agreed to an Agreement or otherwise do not agree to these terms, you may not use the Software or view, copy or compile the Source Code of the Software.
This license terminates automatically upon the termination of the Agreement or Licensee's breach of these terms. Additionally, Databricks may terminate this license at any time on notice. Upon termination, you must permanently delete the Software and all copies thereof (including the Source Code).
Agreement: the agreement between Databricks and Licensee governing the use of the Databricks Platform Services, which shall be, with respect to Databricks, the Databricks Terms of Service located at www.databricks.com/termsofservice, and with respect to Databricks Community Edition, the Community Edition Terms of Service located at www.databricks.com/ce-termsofuse, in each case unless Licensee has entered into a separate written agreement with Databricks governing the use of the applicable Databricks Platform Services.
Databricks Platform Services: the Databricks services or the Databricks Community Edition services, according to where the Software is used.
Licensee: the user of the Software, or, if the Software is being used on behalf of a company, the company.
Object Code: the version of the Software produced when an interpreter or a compiler translates the Source Code into recognizable and executable machine code.
Source Code: the human readable portion of the Software.
================================================
FILE: MANIFEST.in
================================================
recursive-include dbdemos/bundles *
recursive-include dbdemos/template *
recursive-include dbdemos/resources *
================================================
FILE: NOTICE
================================================
Copyright (2022) Databricks, Inc.
## License
This Software includes software developed at Databricks (https://www.databricks.com/) and its use is subject to the included LICENSE file.
This Software contains code from the following open source projects, licensed under the Apache 2.0 license:
psf/requests - https://github.com/psf/requests
Copyright 2019 Kenneth Reitz
## Data collection
To improve user experience and dbdemos asset quality, dbdemos reports usage and captures views in the installed notebooks (usually in the first cell) and other assets like dashboards. This information is collected for product improvement only, not for marketing purposes, and does not contain PII. By using `dbdemos` and the assets it provides, you consent to this data collection. If you wish to disable it, set `Tracker.enable_tracker` to False in the `tracker.py` file.
## Resource creation
To simplify your experience, `dbdemos` will create and start resources for you. For example, a demo could start (not exhaustive):
- A cluster to run your demo
- A Delta Live Table Pipeline to ingest data
- A DBSQL endpoint to run DBSQL dashboard
- An ML model
While `dbdemos` does its best to limit the consumption and enforce resource auto-termination, you remain responsible for the resources created and the potential consumption associated.
## Catalog/Database created
dbdemos will try to create catalogs & databases (schemas). Demos use the hive_metastore or UC catalogs; dbdemos will use the dbdemos catalog when possible.
Permissions / ownership can be granted to all users (account users) in these datasets.
## Support
Databricks does not offer official support for `dbdemos` and the associated assets.
================================================
FILE: README.md
================================================
# dbdemos
DBDemos is a toolkit to easily install Lakehouse demos for Databricks.
**Looking for the dbdemos notebooks and content?** See [databricks-demos/dbdemos-notebooks](https://github.com/databricks-demos/dbdemos-notebooks).
Simply deploy & share demos on any workspace. dbdemos is packaged with a list of demos:
- Lakehouse, end-to-end demos (ex: Lakehouse Retail Churn)
- Product demos (ex: Delta Live Table, CDC, ML, DBSQL Dashboard, MLOps...)
**Please visit [dbdemos.ai](https://www.dbdemos.ai) to explore all our demos.**
## Installation
**Do not clone the repo, just pip install dbdemos wheel:**
```
%pip install dbdemos
```
## Usage within Databricks
See [demo video](https://drive.google.com/file/d/12Iu50r7hlawVN01eE_GoUKBQ4kvUrR56/view?usp=sharing)
```
import dbdemos
dbdemos.help()
dbdemos.list_demos()
dbdemos.install('lakehouse-retail-c360', path='./', overwrite = True)
```

## Requirements
`dbdemos` requires the current user to have:
* Cluster creation permission
* SDP Pipeline creation permission
* DBSQL dashboard & query creation permission
* For UC demos: a Unity Catalog metastore must be available (otherwise the demo will be installed but won't work)
## Features
* Load demo notebooks (pre-run) to the given path
* Start job to load dataset based on demo requirement
* Start demo cluster customized for the demo & the current user
* Setup SDP pipelines
* Setup DBSQL dashboard
* Create ML Model
* Demo links are updated with resources created for an easy navigation
## Feedback
Demo not working? Can't use dbdemos? Please open a github issue. <br/>
Make sure you mention the name of the demo.
# DBDemos Developer options
## Adding an AI/BI demo to dbdemos
open [README_AIBI.md](README_AIBI.md) for more details on how to contribute & add an AI/BI demo.
Read the following if you want to add a new demo bundle.
## Packaging a demo with dbdemos
Your demo must contain a `_resources` folder where you include all initialization scripts and your bundle configuration file.
### Links & tags
dbdemos will dynamically rewrite links to point to the resources it creates.
**Always use links relative to the local path to support multiple workspaces. Do not add the workspace id.**
#### SDP pipelines:
Your SDP pipeline must be added in the bundle file (see below).
Within your notebook, identify your pipeline through its id in the bundle file by adding `dbdemos-pipeline-id="<id>"` as follows:
`<a dbdemos-pipeline-id="sdp-churn" href="#joblist/pipelines/a6ba1d12-74d7-4e2d-b9b7-ca53b655f39d" target="_blank">Spark Declarative Pipeline</a>`
#### Workflows:
Your workflows must be added in the bundle file (see below).
Within your notebook, identify your workflow through its id in the bundle file by adding `dbdemos-workflow-id="<id>"` as follows:
`<a dbdemos-workflow-id="credit-job" href="#joblist/pipelines/a6ba1d12-74d7-4e2d-b9b7-ca53b655f39d" target="_blank">Access your workflow</a>`
#### DBSQL dashboards:
Similar to workflows, your dashboard id must match the one in the bundle file.
Dashboard definitions should be added to the `_dashboards` folder (make sure the file name matches the dashboard id: `churn-prediction.lvdash.json`).
` <a dbdemos-dashboard-id="churn-prediction" href="/sql/dashboardsv3/19394330-2274-4b4b-90ce-d415a7ff2130" target="_blank">Churn Analysis Dashboard</a>`
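Conceptually, this link rewriting amounts to a small attribute-driven substitution: any `<a>` tag carrying a `dbdemos-*-id` attribute gets its href repointed to the resource created at install time. The sketch below is illustrative only (function name and regex are assumptions, not the actual dbdemos implementation):

```python
import re

def rewrite_links(html: str, resource_urls: dict) -> str:
    """Replace the href of any <a> tag carrying a dbdemos-*-id attribute.

    resource_urls maps bundle ids (e.g. 'sdp-churn') to the URLs of the
    resources actually created at install time. Illustrative sketch only.
    """
    pattern = r'(dbdemos-[a-z]+-id="(?P<id>[^"]+)"\s+href=")(?P<href>[^"]+)"'

    def repl(match: re.Match) -> str:
        # Fall back to the original href when the id is unknown.
        new_url = resource_urls.get(match.group("id"), match.group("href"))
        return match.group(1) + new_url + '"'

    return re.sub(pattern, repl, html)
```

For example, `rewrite_links(cell_html, {"churn-prediction": "/sql/dashboardsv3/<installed-id>"})` would repoint the dashboard link shown above while leaving unknown ids untouched.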
### bundle_config
The demo must contain a `./_resources/bundle_config` file containing your bundle definition.
This needs to be a notebook, not a .json file (due to a current API limitation).
```json
{
"name": "<Demo name, used in dbdemos.install('xxx')>",
"category": "<Category, like data-engineering>",
"title": "<Title>.",
"description": "<Description>",
"bundle": <Will bundle when True, skip when False>,
"tags": [{"sdp": "Spark Declarative Pipeline"}],
"notebooks": [
{
"path": "<notebook path from the demo folder (ex: resources/00-load-data)>",
"pre_run": <Will start a job to run it before packaging to get the cells results>,
"publish_on_website": <Will add the notebook in the public website (with the results if it's pre_run=True)>,
"add_cluster_setup_cell": <if True, add a cell with the name of the demo cluster>,
"title": "<Title>",
"description": "<Description (will be in minisite also)>",
"parameters": {"<key>": "<value. Will be sent to the pre_run job>"}
}
],
"init_job": {
"settings": {
"name": "demos_sdp_cdc_init_{{CURRENT_USER_NAME}}",
"email_notifications": {
"no_alert_for_skipped_runs": False
},
"timeout_seconds": 0,
"max_concurrent_runs": 1,
"tasks": [
{
"task_key": "init_data",
"notebook_task": {
"notebook_path": "{{DEMO_FOLDER}}/_resources/01-load-data-quality-dashboard",
"source": "WORKSPACE"
},
"job_cluster_key": "Shared_job_cluster",
"timeout_seconds": 0,
"email_notifications": {}
}
]
.... Full standard job definition
}
},
"pipelines": <list of SDP pipelines if any>
[
{
"id": "sdp-cdc", <id, used in the notebook links to go to the generated notebook: <a dbdemos-pipeline-id="sdp-cdc" href="#joblist/pipelines/xxxx">installed SDP pipeline</a> >
"run_after_creation": True,
"definition": {
... Any SDP pipeline configuration...
"libraries": [
{
"notebook": {
"path": "{{DEMO_FOLDER}}/_resources/00-Data_CDC_Generator"
}
}
],
"name": "demos_sdp_cdc_{{CURRENT_USER_NAME}}",
"storage": "/demos/sdp/cdc/{{CURRENT_USER_NAME}}",
"target": "demos_sdp_cdc_{{CURRENT_USER_NAME}}"
}
}
],
"workflows": [{
"start_on_install": False,
"id": "credit-job",
"definition": {
"settings": {
... full job settings
}
}}],
"dashboards": [{"name": "[dbdemos] Retail Churn Prediction Dashboard", "id": "churn-prediction"}]
}
```
dbdemos will replace the values defined as {{<KEY>}} based on who installs the demo. Supported keys:
* TODAY
* CURRENT_USER (email)
* CURRENT_USER_NAME (derived from email)
* DEMO_NAME
* DEMO_FOLDER
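As a sketch, the substitution amounts to a simple string replacement per key. The helper below is illustrative, not part of the dbdemos API, and the derivation of CURRENT_USER_NAME from the email is an assumption:

```python
from datetime import date

def resolve_template(text: str, username: str, demo_name: str, demo_folder: str) -> str:
    """Replace the supported {{KEY}} placeholders with per-install values."""
    values = {
        "TODAY": date.today().isoformat(),
        "CURRENT_USER": username,                     # full email
        "CURRENT_USER_NAME": username.split("@")[0],  # assumed derivation from email
        "DEMO_NAME": demo_name,
        "DEMO_FOLDER": demo_folder,
    }
    for key, value in values.items():
        text = text.replace("{{" + key + "}}", value)
    return text

print(resolve_template("demos_sdp_cdc_{{CURRENT_USER_NAME}}",
                       "jane.doe@databricks.com", "sdp-cdc", "/demos/sdp-cdc"))
# -> demos_sdp_cdc_jane.doe
```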
# DBDemo Installer configuration
The following describes how to package the demos created.
The installer needs to fetch data from a workspace & start jobs. To do so, it requires the information in `local_conf.json`:
```json
{
"pat_token": "xxx",
"username": "xx.xx@databricks.com",
"url": "https://xxx.databricks.com",
"repo_staging_path": "/Repos/xx.xx@databricks.com",
"repo_name": "dbdemos-notebooks",
"repo_url": "<your fork of https://github.com/databricks-demos/dbdemos-notebooks.git>",
"branch": "master",
"current_folder": "<Used to mock the current folder outside of a notebook, ex: /Users/quentin.ambard@databricks.com/test_install_demo>"
}
```
### Creating the bundles:
```python
bundler = JobBundler(conf)
# the bundler will use a staging repo dir in the workspace to analyze & run content.
bundler.reset_staging_repo(skip_pull=False)
# Discover bundles from repo:
bundler.load_bundles_conf()
# Or manually add bundle to run faster:
#bundler.add_bundle("product_demos/Auto-Loader (cloudFiles)")
# Run the jobs (only if there is a new commit since the last time, or failure, or force execution)
bundler.start_and_wait_bundle_jobs(force_execution = False)
packager = Packager(conf, bundler)
packager.package_all()
```
## License
See LICENSE file.
## Data collection
To improve user experience and dbdemos asset quality, dbdemos reports usage and captures views in the installed notebooks (usually in the first cell) and dashboards. This information is collected for product improvement only, not for marketing purposes, and does not contain PII. By using `dbdemos` and the assets it provides, you consent to this data collection. If you wish to disable it, set `Tracker.enable_tracker` to False in the `tracker.py` file.
## Resource creation
To simplify your experience, `dbdemos` will create and start resources for you. For example, a demo could start (not exhaustive):
- A cluster to run your demo
- A Delta Live Table Pipeline to ingest data
- A DBSQL endpoint to run DBSQL dashboard
- An ML model
While `dbdemos` does its best to limit the consumption and enforce resource auto-termination, you remain responsible for the resources created and the potential consumption associated.
## Support
Databricks does not offer official support for `dbdemos` and the associated assets.
For any issue with `dbdemos` or the demos installed, please open an issue and the demo team will take a look on a best-effort basis.
================================================
FILE: README_AIBI.md
================================================
# Adding an AI-BI demo to dbdemos
*Note: Adding new content from external contributors requires special terms approval. Please open an issue if you'd like to contribute and are not part of the Databricks team.*
*Note: if you're part of the Databricks team, please reach out on the demo team Slack channel before starting the process, for alignment and to avoid duplicating work.*
## Fork dbdemos-notebooks
The actual AI-BI demo content is in the [dbdemos-notebooks repository](https://github.com/databricks-demos/dbdemos-notebooks).
Start by forking the repository and create a new branch there with your changes.
## Create the demo
Start by creating your dataset (must be all crafted/generated with DBRX to avoid any license issues), dashboard and genie space.
Once you're ready, add your dbdemos to the [aibi folder](https://github.com/databricks-demos/dbdemos-notebooks/tree/main/aibi).
For that, clone the [aibi-marketing-campaign folder](https://github.com/databricks-demos/dbdemos-notebooks/tree/main/aibi/aibi-marketing-campaign) and replace the content with your own.
Make sure the folder name follows the pattern: `aibi-<use-case>`.
## Data Transformation and Table Structure
### Start with your story first
Think about what would make a good Dashboard + Genie story. Ideally the dashboard shows a business outcome with a spike somewhere; you then open Genie to ask a follow-up question.
### Dataset
Once your story is ready, work backward to generate your dataset: think about the gold tables required, then about the raw datasets you'll clean to create these tables.
**Your dataset must be entirely crafted with tools like faker / DBRX. Double check any dataset license. Add a NOTICE file in your dataset folder explaining where the data is coming from / how it was created.**
Datasets are stored in the [dbdemos-datasets repository](https://github.com/databricks-demos/dbdemos-datasets), and then mirrored in the dbdemos-dataset S3 bucket. Fork this repository and add your data in the `aibi` folder.
### Defining the dbdemos genie room setup
All the configuration should go in the bundle file. See this example: [https://github.com/databricks-demos/dbdemos-notebooks/blob/main/aibi/aibi-marketing-campaign/_resources/bundle_config.py](https://github.com/databricks-demos/dbdemos-notebooks/blob/main/aibi/aibi-marketing-campaign/_resources/bundle_config.py)
Here is what your bundle should look like:
```json
{
"name": "aibi-marketing-campaign",
"category": "AI-BI",
"title": "AI/BI: Marketing Campaign effectiveness",
"custom_schema_supported": True,
"default_catalog": "main",
"default_schema": "dbdemos_aibi_cme_marketing_campaign",
"description": "Analyze your Marketing Campaign effectiveness leveraging AI/BI Dashboard. Deep dive into your data and metrics, asking plain question through Genie Room.",
"bundle": True,
"notebooks": [
{
"path": "AI-BI-Marketing-campaign",
"pre_run": False,
"publish_on_website": True,
"add_cluster_setup_cell": False,
"title": "AI BI: Campaign effectiveness",
"description": "Discover Databricks Intelligence Data Platform capabilities."
}
],
"init_job": {},
"cluster": {},
"pipelines": [],
"dashboards": [{"name": "[dbdemos] AI/BI - Marketing Campaign", "id": "web-marketing"}
],
"data_folders":[
{"source_folder":"aibi/dbdemos_aibi_cme_marketing_campaign/raw_campaigns", "source_format": "parquet", "target_volume_folder":"raw_campaigns", "target_format":"parquet"}],
"sql_queries": [
[
"CREATE OR REPLACE TABLE `{{CATALOG}}`.`{{SCHEMA}}`.raw_campaigns TBLPROPERTIES (delta.autooptimize.optimizewrite = TRUE, delta.autooptimize.autocompact = TRUE ) COMMENT 'This is the bronze table for campaigns created from parquet files' AS SELECT * FROM read_files('/Volumes/{{CATALOG}}/{{SCHEMA}}/dbdemos_raw_data/raw_campaigns', format => 'parquet', pathGlobFilter => '*.parquet')"
],
[
"... queries in here will be executed in parallel", " ... (don't forget to add comments) on the table, and PK/FK"
],
["CREATE OR REPLACE FUNCTION {{CATALOG}}.{{SCHEMA}}.my_ai_forecast(input_table STRING, target_column STRING, time_column STRING, periods INT) RETURN TABLE ..."]
],
"genie_rooms":[
{
"id": "marketing-campaign",
"display_name": "DBDemos - AI/BI - Marketing Campaign",
"description": "Analyze your Marketing Campaign effectiveness leveraging AI/BI Dashboard. Deep dive into your data and metrics.",
"table_identifiers": ["{{CATALOG}}.{{SCHEMA}}.campaigns", "..."],
"sql_instructions": [
{
"title": "Compute rolling metrics",
"content": "select date, unique_clicks, sum(unique_clicks) OVER (ORDER BY date RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS clicks_t7d, sum(total_delivered) OVER (ORDER BY date RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS delivered_t7d, sum(unique_clicks) OVER (ORDER BY date RANGE BETWEEN 27 PRECEDING AND CURRENT ROW) AS clicks_t28d, sum(total_delivered) OVER (ORDER BY date RANGE BETWEEN 27 PRECEDING AND CURRENT ROW) AS delivered_t28d, sum(unique_clicks) OVER (ORDER BY date RANGE BETWEEN 90 PRECEDING AND CURRENT ROW) AS clicks_t91d, sum(total_delivered) OVER (ORDER BY date RANGE BETWEEN 90 PRECEDING AND CURRENT ROW) AS delivered_t91d, unique_clicks / total_delivered as ctr, total_delivered / total_sent AS delivery_rate, total_optouts / total_delivered AS optout_rate, total_spam / total_delivered AS spam_rate, clicks_t7d / delivered_t7d as ctr_t7d, clicks_t28d / delivered_t28d as ctr_t28d, clicks_t91d / delivered_t91d as ctr_t91d from {{CATALOG}}.{{SCHEMA}}.metrics_daily_rolling"
}
],
"instructions": "If a customer asks for a forecast, leverage the SQL function ai_forecast",
"function_names": [
"{{CATALOG}}.{{SCHEMA}}.my_ai_forecast"
],
"curated_questions": [
"How has the total number of emails sent, delivered, and the unique clicks evolved over the last six months?", "..."
]
}
]
}
```
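The `sql_queries` nesting above encodes an execution order: the outer list runs batch after batch, while the queries inside one inner list can run in parallel (so tables exist before the constraints or comments that reference them). A hypothetical sketch of that scheduling, with illustrative names and query strings:

```python
from concurrent.futures import ThreadPoolExecutor

def run_sql_batches(batches, execute):
    """Run each inner list in parallel; start the next batch only when done."""
    for batch in batches:  # outer list: sequential batches
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # inner list: parallel queries within the batch
            list(pool.map(execute, batch))

executed = []
run_sql_batches(
    [["CREATE TABLE raw_a ...", "CREATE TABLE raw_b ..."],      # batch 1
     ["ALTER TABLE raw_a ADD CONSTRAINT pk PRIMARY KEY ..."]],  # batch 2
    executed.append)
```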
### Data Loading and Transformation
AIBI demos should start with raw data files in a volume and implement a few transformation steps to showcase data lineage. This helps demonstrate the end-to-end data workflow and provides a more comprehensive view of Databricks' capabilities.
**Important:** Avoid using Materialized Views (MVs) for transformations as they can slow down the dbdemos installation process. Instead, use standard SQL transformations in your demo for now (we'll revisit soon).
Example transformation flow:
1. Start with raw data in volume, typically 3+ sources
2. Create bronze table(s) directly from the volume files (~3+ tables)
3. [optional] Create silver table(s) with basic transformations (cleaning, type conversion, etc.)
4. Create gold table(s) with business-specific transformations and potentially a few joins (we want to keep at least 2 or 3 tables in the genie room)
### Gold Table Requirements
- Gold tables (used in the Genie room) should have PK and FK defined
- Gold tables should include comprehensive comments on all fields. This improves the Genie experience by providing context for each column and helps users understand the data model.
Example gold table creation with comments directly in the CREATE statement:
```sql
CREATE OR REPLACE TABLE {{CATALOG}}.{{SCHEMA}}.customer_gold (
id STRING COMMENT 'Unique customer identifier' PRIMARY KEY,
first_name STRING COMMENT 'Customer first name',
last_name STRING COMMENT 'Customer last name',
email STRING COMMENT 'Customer email address',
signup_date DATE COMMENT 'Date when customer created their account',
last_activity_date DATE COMMENT 'Most recent date of customer activity',
customer_segment STRING COMMENT 'Customer segmentation category (New, Loyal, At-Risk, Churned)',
lifetime_value DOUBLE COMMENT 'Calculated total customer spend in USD'
)
AS
SELECT
id, first_name, last_name, email, signup_date, last_activity_date, customer_segment, lifetime_value FROM {{CATALOG}}.{{SCHEMA}}.customer_silver;
```
This approach is more concise and ensures all column comments are created in a single SQL statement.
## SQL AI Functions
We need help implementing SQL AI Functions in the installer_genie.py file and the JSON configuration. These functions enhance the AI capabilities of the Genie room and enable more sophisticated queries.
The AI functions should be added as part of the SQL statement. Don't forget to add comments on them (at the function level and function param level).
Once created, you can add them to the genie room under `"function_names": ["{{CATALOG}}.{{SCHEMA}}.ai_forecast", "..."]`
**Note: this isn't yet implemented. If you're interested in contributing to DBDemos, reach out to the demo team. The implementation should go in the `InstallerGenie` class to create these functions during demo installation and make them available in the Genie room.**
## Update the Main notebook
Update the notebook cloned from the folder above, with your use-case.
### Present the use-case
Rename & Update the main notebook, detailing your use-case, what is the data and the insights you want to show.
### Update tracking
Update the demo name in the first cell in the tracker pixel, and the notebook name.
### Update the dashboard link
Put your dashboard in the dashboards folder. In the dashboard json, make sure you use the same catalog and schema as the ones in the bundle configuration file, typically `main.dbdemos_aibi_xxxxxx`.
You can then reference the dashboard like this:
```html
<a dbdemos-dashboard-id="web-marketing" href='/sql/dashboardsv3/02ef00cc36721f9e1f2028ee75723cc1' target="_blank">your dashboard</a>
```
The ID here `web-marketing` must match the ID in the bundle configuration (and the dashboard file name):
```
"dashboards": [{"name": "[dbdemos] AI/BI - Marketing Campaign", "id": "web-marketing"}]
```
### Update the Genie Room link
```html
<a dbdemos-genie-id="marketing-campaign" href='/genie/rooms/01ef775474091f7ba11a8a9d2075eb58' target="_blank">your genie space</a>
```
The ID here `marketing-campaign` must match the ID in the bundle configuration:
```
"genie_rooms":[
{
"id": "marketing-campaign",
"display_name": "DBDemos - AI/BI - Marketing Campaign",
...
}
]
```
## Update the bundle configuration
- Make sure the dashboard ID and genie room ID match your links as above. The dashboard ID must match the dashboard file name in the dashboards folder (see below).
- Keep the `default_catalog` to main, `default_schema` should follow the naming convention `dbdemos_aibi_<industry>_<use-case>`.
- Make sure you add a sql instruction in the genie room, with curated questions, descriptions and an instruction.
- Dataset folders must be the path in the dbdemos-datasets repository (see below).
## Add your dashboard under the dashboards folder
- Create a json file in the dashboards folder. Make sure it's formatted so it's easy to read/diff.
- your catalog.schema in the queries must match the `default_catalog` and `default_schema` in the bundle configuration.
- your dashboard name must match the id in the bundle configuration.
- don't forget to update the dashboard tracking (add the tracker in the MD at the end of the dashboard, match the demo name)
## Add your images
Images are stored in the [dbdemos-resources repository](https://github.com/databricks-demos/dbdemos-resources).
To add an image, fork the repo and send a PR.
You need at least 2 images:
- the miniature for the demo list: `https://github.com/databricks-demos/dbdemos-resources/raw/main/icon/<demo_name>.jpg`
- the screenshot of the dashboard: `https://www.dbdemos.ai/assets/img/dbdemos/<demo_name>-dashboard-0.png` (1 per dashboard you add)
Reach out to the demo team for a demo miniature:
https://www.dbdemos.ai/assets/img/dbdemos/aibi-marketing-campaign-dashboard-0.png
https://github.com/databricks-demos/dbdemos-resources/raw/main/icon/aibi-marketing-campaign.jpg
# Packaging & testing your demo
Open the `test_demo.py` file. Update the conf to match your dbdemos-notebooks repo fork/branch in the config json file.
dbdemos needs a workspace and a repo to package the demo; make sure you configure them in the conf json file (use your fork).
Make sure you update the bundle folder to match your demo:
```
bundle(conf, "aibi/aibi-marketing-campaign")
```
================================================
FILE: SECURITY.md
================================================
# Security Policy
## Reporting a Vulnerability
Please email bugbounty@databricks.com to report any security vulnerabilities. We will acknowledge receipt of your vulnerability and strive to send you regular updates about our progress. If you're curious about the status of your disclosure please feel free to email us again. If you want to encrypt your disclosure email, you can use [this PGP key](https://keybase.io/arikfr/key.asc).
================================================
FILE: ai_release/__init__.py
================================================
"""
AI Release Tools for DBDemos
This module provides tools for Claude Code to:
1. Execute code remotely on Databricks clusters for testing
2. Bundle and test demos
3. Fix issues in dbdemos-notebooks
CRITICAL: Never release to PyPI - only the human can do that!
"""
__version__ = "0.1.0"
================================================
FILE: ai_release/bundle.py
================================================
#!/usr/bin/env python3
"""
DBDemos Bundle CLI - For bundling and testing demos
This script is designed to be run by Claude Code for the release workflow.
It supports bundling specific demos, running from feature branches, and job repair.
Usage:
# Bundle a specific demo from main branch
python ai_release/bundle.py --demo lakehouse-retail-c360
# Bundle from a feature branch
python ai_release/bundle.py --demo lakehouse-retail-c360 --branch fix/retail-bug
# Bundle all demos (uses GitHub diff to only run changed ones)
python ai_release/bundle.py --all
# Repair a failed job (re-run only failed tasks)
python ai_release/bundle.py --demo lakehouse-retail-c360 --repair
# Force full re-run (ignore commit diff optimization)
python ai_release/bundle.py --demo lakehouse-retail-c360 --force
# Get job status and error details
python ai_release/bundle.py --demo lakehouse-retail-c360 --status
# List all available demos
python ai_release/bundle.py --list-demos
Environment Variables:
DATABRICKS_HOST: Workspace URL (default: https://e2-demo-tools.cloud.databricks.com/)
DATABRICKS_TOKEN: PAT token for Databricks
GITHUB_TOKEN: GitHub token for API access
DBDEMOS_NOTEBOOKS_PATH: Path to dbdemos-notebooks repo (default: ../dbdemos-notebooks)
Config File:
Can also use local_conf.json in the repo root for configuration.
"""
import argparse
import json
import os
import sys
from pathlib import Path
# Add parent directory to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent))
from dbdemos.conf import Conf, DemoConf
from dbdemos.job_bundler import JobBundler
from dbdemos.packager import Packager
def load_config(args):
"""Load configuration from environment variables and/or local_conf.json"""
config = {}
# Try to load from local_conf.json files (E2TOOL is primary for bundling)
repo_root = Path(__file__).parent.parent
conf_files = [
repo_root / "local_conf_E2TOOL.json", # Primary for bundling/testing
repo_root / "local_conf.json",
]
for conf_file in conf_files:
if conf_file.exists():
with open(conf_file, "r") as f:
config = json.load(f)
print(f"Loaded config from {conf_file}")
break
# dbdemos-notebooks path
default_notebooks_path = str(repo_root.parent / "dbdemos-notebooks")
notebooks_path = os.environ.get("DBDEMOS_NOTEBOOKS_PATH", config.get("dbdemos_notebooks_path", default_notebooks_path))
config["dbdemos_notebooks_path"] = notebooks_path
# Branch override from CLI
if args.branch:
config["branch"] = args.branch
elif "branch" not in config:
config["branch"] = "main"
# Validate required fields
required = ["pat_token", "github_token", "url"]
missing = [f for f in required if not config.get(f)]
if missing:
print(f"ERROR: Missing required config: {missing}")
print("Set via environment variables or local_conf.json")
sys.exit(1)
return config
def load_cluster_templates():
"""Load default cluster configuration templates"""
repo_root = Path(__file__).parent.parent
with open(repo_root / "dbdemos/resources/default_cluster_config.json", "r") as f:
default_cluster_template = f.read()
with open(repo_root / "dbdemos/resources/default_test_job_conf.json", "r") as f:
default_cluster_job_template = f.read()
return default_cluster_template, default_cluster_job_template
def create_conf(config):
"""Create Conf object from config dict"""
default_cluster_template, default_cluster_job_template = load_cluster_templates()
# Strip .git from repo_url if present (Conf doesn't allow it)
repo_url = config.get("repo_url", "https://github.com/databricks-demos/dbdemos-notebooks")
if repo_url.endswith(".git"):
repo_url = repo_url[:-4]
return Conf(
username=config.get("username", "claude-code@databricks.com"),
workspace_url=config["url"],
org_id=config.get("org_id", ""),
pat_token=config["pat_token"],
default_cluster_template=default_cluster_template,
default_cluster_job_template=default_cluster_job_template,
repo_staging_path=config.get("repo_staging_path", "/Repos/quentin.ambard@databricks.com"),
repo_name=config.get("repo_name", "dbdemos-notebooks"),
repo_url=repo_url,
branch=config["branch"],
github_token=config["github_token"],
run_test_as_username=config.get("run_test_as_username", "quentin.ambard@databricks.com")
)
def list_demos(bundler: JobBundler):
"""List all available demos"""
print("Scanning for available demos...")
bundler.reset_staging_repo(skip_pull=False)
bundler.load_bundles_conf()
print(f"\nFound {len(bundler.bundles)} demos:\n")
for path, demo_conf in sorted(bundler.bundles.items()):
print(f" - {demo_conf.name:<40} ({path})")
return bundler.bundles
def get_job_status(bundler: JobBundler, demo_name: str):
"""Get detailed job status for a demo"""
# Find the demo
bundler.reset_staging_repo(skip_pull=True)
# Try to find the job
job_name = f"field-bundle_{demo_name}"
job = bundler.db.find_job(job_name)
if not job:
print(f"No job found for demo: {demo_name}")
return None
job_id = job["job_id"]
print(f"\n{'='*80}")
print(f"Job: {job_name}")
print(f"Job ID: {job_id}")
print(f"URL: {bundler.conf.workspace_url}/#job/{job_id}")
print(f"{'='*80}\n")
# Get recent runs
runs = bundler.db.get("2.1/jobs/runs/list", {"job_id": job_id, "limit": 5, "expand_tasks": "true"})
if "runs" not in runs or len(runs["runs"]) == 0:
print("No runs found for this job.")
return None
for i, run in enumerate(runs["runs"]):
run_id = run["run_id"]
state = run["state"]
status = run.get("status", {})
print(f"\n--- Run {i+1}: {run_id} ---")
print(f"State: {state.get('life_cycle_state', 'N/A')} / {state.get('result_state', 'N/A')}")
print(f"URL: {bundler.conf.workspace_url}/#job/{job_id}/run/{run_id}")
if "termination_details" in status:
print(f"Termination: {status['termination_details']}")
# Show task details for the most recent run
if i == 0 and "tasks" in run:
print(f"\nTasks ({len(run['tasks'])} total):")
for task in run["tasks"]:
task_key = task["task_key"]
task_state = task.get("state", {})
task_result = task_state.get("result_state", "PENDING")
# Get error info if failed
error_info = ""
if task_result == "FAILED":
# Try to get run output for error details
task_run_id = task.get("run_id")
if task_run_id:
task_output = bundler.db.get("2.1/jobs/runs/get-output", {"run_id": task_run_id})
if "error" in task_output:
error_info = f"\n Error: {task_output['error'][:200]}..."
if "error_trace" in task_output:
error_info += f"\n Trace: {task_output['error_trace'][:500]}..."
status_icon = "✓" if task_result == "SUCCESS" else "✗" if task_result == "FAILED" else "○"
print(f" {status_icon} {task_key}: {task_result}{error_info}")
return runs["runs"][0] if runs["runs"] else None
def wait_for_run(bundler: JobBundler, job_id: int, run_id: int):
"""Wait for a job run to complete"""
import time
print(f"Waiting for job completion...")
print(f"URL: {bundler.conf.workspace_url}/#job/{job_id}/run/{run_id}")
i = 0
while True:
run = bundler.db.get("2.1/jobs/runs/get", {"run_id": run_id})
state = run.get("state", {})
life_cycle = state.get("life_cycle_state", "UNKNOWN")
if life_cycle not in ["RUNNING", "PENDING"]:
result = state.get("result_state", "UNKNOWN")
print(f"\nJob finished: {life_cycle} / {result}")
return result == "SUCCESS"
if i % 60 == 0: # Print every 5 minutes
print(f" Still running... ({i * 5}s elapsed)")
i += 1
time.sleep(5)
def repair_job(bundler: JobBundler, demo_name: str, wait: bool = False):
"""Repair a failed job (re-run only failed tasks)"""
job_name = f"field-bundle_{demo_name}"
job = bundler.db.find_job(job_name)
if not job:
print(f"No job found for demo: {demo_name}")
return False
job_id = job["job_id"]
# Get the most recent run
runs = bundler.db.get("2.1/jobs/runs/list", {"job_id": job_id, "limit": 1, "expand_tasks": "true"})
if "runs" not in runs or len(runs["runs"]) == 0:
print("No runs found to repair.")
return False
latest_run = runs["runs"][0]
run_id = latest_run["run_id"]
# Check if run is in a repairable state
state = latest_run["state"]
if state.get("life_cycle_state") != "TERMINATED":
print(f"Run is not terminated (state: {state.get('life_cycle_state')}). Cannot repair.")
return False
if state.get("result_state") == "SUCCESS":
print("Run already succeeded. No repair needed.")
return True
# Find failed tasks
failed_tasks = []
for task in latest_run.get("tasks", []):
task_state = task.get("state", {})
if task_state.get("result_state") in ["FAILED", "CANCELED", "TIMEDOUT"]:
failed_tasks.append(task["task_key"])
if not failed_tasks:
print("No failed tasks found to repair.")
return True
print(f"Repairing run {run_id} - re-running tasks: {failed_tasks}")
# Call repair API
repair_response = bundler.db.post("2.1/jobs/runs/repair", {
"run_id": run_id,
"rerun_tasks": failed_tasks
})
if "repair_id" in repair_response:
print(f"Repair started. Repair ID: {repair_response['repair_id']}")
print(f"URL: {bundler.conf.workspace_url}/#job/{job_id}/run/{run_id}")
if wait:
return wait_for_run(bundler, job_id, run_id)
return True
else:
print(f"Failed to repair: {repair_response}")
return False
def cleanup_demo_schema(bundler: JobBundler, demo_conf):
"""Drop the demo schema to ensure clean state before running.
Uses main__build as the catalog (bundling catalog) and the demo's default_schema.
Uses Databricks SDK: w.schemas.delete(full_name=schema_full_name, force=True)
"""
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import NotFound
# Bundling uses main__build catalog
catalog = "main__build"
schema = demo_conf.default_schema
if not schema:
print(f" No default_schema defined for {demo_conf.name}, skipping cleanup")
return
full_schema = f"{catalog}.{schema}"
print(f" Cleaning up schema: {full_schema}")
try:
w = WorkspaceClient(
host=bundler.conf.workspace_url,
token=bundler.conf.pat_token
)
# force=True is equivalent to CASCADE
w.schemas.delete(full_name=full_schema, force=True)
print(f" ✓ Schema {full_schema} dropped successfully")
except NotFound:
print(f" ✓ Schema {full_schema} does not exist (nothing to clean)")
except Exception as e:
print(f" WARNING: Error during schema cleanup: {e}")
def bundle_demo(bundler: JobBundler, demo_path: str, force: bool = False, skip_packaging: bool = False, cleanup_schema: bool = True):
"""Bundle a specific demo"""
print(f"\nBundling demo: {demo_path}")
print(f"Branch: {bundler.conf.branch}")
bundler.reset_staging_repo(skip_pull=False)
bundler.add_bundle(demo_path)
if len(bundler.bundles) == 0:
print(f"ERROR: Demo not found or not configured for bundling: {demo_path}")
return False
# Clean up schema before running if requested
if cleanup_schema:
print("\nCleaning up demo schemas...")
for path, demo_conf in bundler.bundles.items():
cleanup_demo_schema(bundler, demo_conf)
# Run the job
bundler.start_and_wait_bundle_jobs(
force_execution=force,
skip_execution=False,
recreate_jobs=False
)
# Check results
for path, demo_conf in bundler.bundles.items():
if demo_conf.run_id:
run = bundler.db.get("2.1/jobs/runs/get", {"run_id": demo_conf.run_id})
result_state = run.get("state", {}).get("result_state", "UNKNOWN")
if result_state == "SUCCESS":
print(f"\n✓ Job succeeded for {demo_conf.name}")
if not skip_packaging:
print("Packaging demo...")
packager = Packager(bundler.conf, bundler)
packager.package_all()
print(f"✓ Demo packaged successfully")
return True
else:
print(f"\n✗ Job failed for {demo_conf.name}: {result_state}")
print(f"Check: {bundler.conf.workspace_url}/#job/{demo_conf.job_id}/run/{demo_conf.run_id}")
return False
    print("\nNo run was started for this demo (it may have been skipped); nothing to package.")
    return False
def bundle_all(bundler: JobBundler, force: bool = False, cleanup_schema: bool = True):
"""Bundle all demos (uses diff optimization)"""
print("\nBundling all demos...")
print(f"Branch: {bundler.conf.branch}")
bundler.reset_staging_repo(skip_pull=False)
bundler.load_bundles_conf()
print(f"Found {len(bundler.bundles)} demos")
# Clean up schemas before running if requested
if cleanup_schema:
print("\nCleaning up demo schemas...")
for path, demo_conf in bundler.bundles.items():
cleanup_demo_schema(bundler, demo_conf)
# Run jobs (will skip unchanged demos unless force=True)
bundler.start_and_wait_bundle_jobs(
force_execution=force,
skip_execution=False,
recreate_jobs=False
)
# Check results
success_count = 0
fail_count = 0
skip_count = 0
for path, demo_conf in bundler.bundles.items():
if demo_conf.run_id:
run = bundler.db.get("2.1/jobs/runs/get", {"run_id": demo_conf.run_id})
result_state = run.get("state", {}).get("result_state", "UNKNOWN")
if result_state == "SUCCESS":
success_count += 1
else:
fail_count += 1
print(f"✗ {demo_conf.name} failed: {result_state}")
else:
skip_count += 1
print(f"\nResults: {success_count} succeeded, {fail_count} failed, {skip_count} skipped")
if fail_count == 0:
print("\nPackaging all demos...")
packager = Packager(bundler.conf, bundler)
packager.package_all()
print("✓ All demos packaged successfully")
return True
else:
print("\n✗ Some jobs failed. Fix errors before packaging.")
return False
def find_demo_path(bundler: JobBundler, demo_name: str) -> str:
"""Find the full path for a demo by name"""
bundler.reset_staging_repo(skip_pull=True)
bundler.load_bundles_conf()
# Check if it's already a path
if demo_name in bundler.bundles:
return demo_name
# Search by demo name
for path, demo_conf in bundler.bundles.items():
if demo_conf.name == demo_name:
return path
# Partial match
matches = []
for path, demo_conf in bundler.bundles.items():
if demo_name in demo_conf.name or demo_name in path:
matches.append((path, demo_conf.name))
if len(matches) == 1:
return matches[0][0]
elif len(matches) > 1:
print(f"Multiple matches for '{demo_name}':")
for path, name in matches:
print(f" - {name} ({path})")
print("\nPlease be more specific.")
return None
print(f"Demo not found: {demo_name}")
return None
def main():
parser = argparse.ArgumentParser(
description="DBDemos Bundle CLI - Bundle and test demos",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=__doc__
)
# Actions
parser.add_argument("--demo", "-d", help="Demo name or path to bundle")
parser.add_argument("--all", "-a", action="store_true", help="Bundle all demos")
parser.add_argument("--list-demos", "-l", action="store_true", help="List all available demos")
parser.add_argument("--status", "-s", action="store_true", help="Get job status for a demo")
# Options
parser.add_argument("--branch", "-b", help="Git branch to use (overrides config)")
parser.add_argument("--force", "-f", action="store_true", help="Force re-run (ignore diff optimization)")
parser.add_argument("--repair", "-r", action="store_true", help="Repair failed job (re-run failed tasks only)")
parser.add_argument("--skip-packaging", action="store_true", help="Skip packaging step (useful for debugging)")
parser.add_argument("--check-config", action="store_true", help="Verify configuration without running anything")
parser.add_argument("--wait", "-w", action="store_true", help="Wait for job/repair completion")
    parser.add_argument("--no-cleanup-schema", action="store_true", help="Skip schema cleanup (cleanup is enabled by default)")
args = parser.parse_args()
# Load config
config = load_config(args)
conf = create_conf(config)
bundler = JobBundler(conf)
print(f"Workspace: {conf.workspace_url}")
print(f"Branch: {conf.branch}")
# Check config only
if args.check_config:
print(f"\n✓ Configuration valid")
print(f" - Username: {conf.username}")
print(f" - Repo: {conf.repo_url}")
print(f" - Repo path: {conf.get_repo_path()}")
print(f" - Notebooks path: {config.get('dbdemos_notebooks_path', 'N/A')}")
return 0
# Execute action
if args.list_demos:
list_demos(bundler)
return 0
if args.status:
if not args.demo:
print("ERROR: --status requires --demo")
return 1
get_job_status(bundler, args.demo)
return 0
if args.repair:
if not args.demo:
print("ERROR: --repair requires --demo")
return 1
success = repair_job(bundler, args.demo, wait=args.wait)
return 0 if success else 1
# Determine cleanup_schema setting (--no-cleanup-schema disables it)
cleanup_schema = not args.no_cleanup_schema
if args.demo:
demo_path = find_demo_path(bundler, args.demo)
if not demo_path:
return 1
# Reset bundler after find_demo_path used it
bundler = JobBundler(conf)
success = bundle_demo(bundler, demo_path, force=args.force, skip_packaging=args.skip_packaging, cleanup_schema=cleanup_schema)
return 0 if success else 1
if args.all:
success = bundle_all(bundler, force=args.force, cleanup_schema=cleanup_schema)
return 0 if success else 1
parser.print_help()
return 1
if __name__ == "__main__":
sys.exit(main())
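The repair path in `repair_job` re-runs only tasks whose latest result is FAILED, CANCELED, or TIMEDOUT. A standalone sketch of that selection rule (the `select_repair_tasks` helper and the sample task data are illustrative, not part of bundle.py):

```python
# Sketch of repair_job's task selection: only tasks whose result_state is
# FAILED, CANCELED, or TIMEDOUT are passed as rerun_tasks to jobs/runs/repair.
RERUN_STATES = {"FAILED", "CANCELED", "TIMEDOUT"}

def select_repair_tasks(tasks):
    """Return task_keys eligible for the rerun_tasks field of a repair request."""
    return [t["task_key"] for t in tasks
            if t.get("state", {}).get("result_state") in RERUN_STATES]

run_tasks = [
    {"task_key": "init_data", "state": {"result_state": "SUCCESS"}},
    {"task_key": "train_model", "state": {"result_state": "FAILED"}},
    {"task_key": "eval_model", "state": {"result_state": "TIMEDOUT"}},
]
print(select_repair_tasks(run_tasks))  # ['train_model', 'eval_model']
```

Tasks that are still pending (no `result_state` yet) are intentionally excluded, matching the `TERMINATED`-only guard in `repair_job`.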
================================================
FILE: ai_release/compute.py
================================================
"""
Remote Code Execution on Databricks Clusters
This module provides functions to execute code on Databricks clusters for testing
notebook fixes before committing them to dbdemos-notebooks.
Based on databricks-tools-core from ai-dev-kit.
"""
import datetime
import json
import time
from pathlib import Path
from typing import Optional, List, Dict, Any
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import (
CommandStatus,
ClusterSource,
Language,
ListClustersFilterBy,
State,
)
class ExecutionResult:
"""Result from code execution on a Databricks cluster."""
def __init__(
self,
success: bool,
output: Optional[str] = None,
error: Optional[str] = None,
cluster_id: Optional[str] = None,
cluster_name: Optional[str] = None,
context_id: Optional[str] = None,
context_destroyed: bool = True,
):
self.success = success
self.output = output
self.error = error
self.cluster_id = cluster_id
self.cluster_name = cluster_name
self.context_id = context_id
self.context_destroyed = context_destroyed
if success and context_id and not context_destroyed:
self.message = (
f"Execution successful. Reuse context_id='{context_id}' with "
f"cluster_id='{cluster_id}' for follow-up commands."
)
elif success:
self.message = "Execution successful."
else:
self.message = f"Execution failed: {error}"
def __repr__(self):
if self.success:
return f"ExecutionResult(success=True, output={repr(self.output[:100] if self.output else None)}...)"
return f"ExecutionResult(success=False, error={repr(self.error)})"
def to_dict(self) -> Dict[str, Any]:
return {
"success": self.success,
"output": self.output,
"error": self.error,
"cluster_id": self.cluster_id,
"cluster_name": self.cluster_name,
"context_id": self.context_id,
"context_destroyed": self.context_destroyed,
"message": self.message,
}
_LANGUAGE_MAP = {
"python": Language.PYTHON,
"scala": Language.SCALA,
"sql": Language.SQL,
"r": Language.R,
}
def get_workspace_client(host: str, token: str) -> WorkspaceClient:
"""Create a WorkspaceClient with explicit credentials."""
return WorkspaceClient(
host=host,
token=token,
auth_type="pat",
product="dbdemos-ai-release",
product_version="0.1.0",
)
def list_clusters(client: WorkspaceClient, include_terminated: bool = False) -> List[Dict[str, Any]]:
"""List user-created clusters in the workspace."""
clusters = []
# Only list user-created clusters
user_sources = [ClusterSource.UI, ClusterSource.API]
# Running clusters
running_filter = ListClustersFilterBy(
cluster_sources=user_sources,
cluster_states=[State.RUNNING, State.PENDING, State.RESIZING, State.RESTARTING],
)
for cluster in client.clusters.list(filter_by=running_filter):
clusters.append({
"cluster_id": cluster.cluster_id,
"cluster_name": cluster.cluster_name,
"state": cluster.state.value if cluster.state else None,
"creator_user_name": cluster.creator_user_name,
})
if include_terminated:
terminated_filter = ListClustersFilterBy(
cluster_sources=user_sources,
cluster_states=[State.TERMINATED, State.TERMINATING, State.ERROR],
)
for cluster in client.clusters.list(filter_by=terminated_filter):
clusters.append({
"cluster_id": cluster.cluster_id,
"cluster_name": cluster.cluster_name,
"state": cluster.state.value if cluster.state else None,
"creator_user_name": cluster.creator_user_name,
})
return clusters
def find_cluster_by_name(client: WorkspaceClient, name_pattern: str) -> Optional[Dict[str, Any]]:
"""
Find a cluster by name pattern (case-insensitive).
Args:
client: WorkspaceClient
name_pattern: Pattern to match (e.g., "quentin")
Returns:
Cluster info dict or None
"""
clusters = list_clusters(client, include_terminated=True)
pattern_lower = name_pattern.lower()
# First try running clusters
for cluster in clusters:
if cluster["state"] == "RUNNING" and pattern_lower in cluster["cluster_name"].lower():
return cluster
# Then try any cluster
for cluster in clusters:
if pattern_lower in cluster["cluster_name"].lower():
return cluster
return None
def start_cluster(client: WorkspaceClient, cluster_id: str) -> Dict[str, Any]:
"""Start a terminated cluster."""
cluster = client.clusters.get(cluster_id)
cluster_name = cluster.cluster_name or cluster_id
current_state = cluster.state.value if cluster.state else "UNKNOWN"
if current_state == "RUNNING":
return {
"cluster_id": cluster_id,
"cluster_name": cluster_name,
"state": "RUNNING",
"message": f"Cluster '{cluster_name}' is already running.",
}
if current_state not in ("TERMINATED", "ERROR"):
return {
"cluster_id": cluster_id,
"cluster_name": cluster_name,
"state": current_state,
"message": f"Cluster '{cluster_name}' is in state {current_state}.",
}
client.clusters.start(cluster_id)
return {
"cluster_id": cluster_id,
"cluster_name": cluster_name,
"previous_state": current_state,
"state": "PENDING",
"message": f"Cluster '{cluster_name}' is starting (3-8 minutes).",
}
def get_cluster_status(client: WorkspaceClient, cluster_id: str) -> Dict[str, Any]:
"""Get cluster status."""
cluster = client.clusters.get(cluster_id)
return {
"cluster_id": cluster_id,
"cluster_name": cluster.cluster_name or cluster_id,
"state": cluster.state.value if cluster.state else "UNKNOWN",
}
def wait_for_cluster(client: WorkspaceClient, cluster_id: str, timeout: int = 600) -> bool:
"""Wait for cluster to reach RUNNING state."""
start_time = time.time()
while time.time() - start_time < timeout:
status = get_cluster_status(client, cluster_id)
state = status["state"]
if state == "RUNNING":
print(f"✓ Cluster '{status['cluster_name']}' is running")
return True
elif state in ("TERMINATED", "ERROR"):
print(f"✗ Cluster '{status['cluster_name']}' is {state}")
return False
print(f" Cluster state: {state}... waiting")
time.sleep(30)
print(f"✗ Timeout waiting for cluster")
return False
def create_context(client: WorkspaceClient, cluster_id: str, language: str = "python") -> str:
"""Create an execution context on a cluster."""
lang_enum = _LANGUAGE_MAP.get(language.lower(), Language.PYTHON)
result = client.command_execution.create(
cluster_id=cluster_id, language=lang_enum
).result()
return result.id
def destroy_context(client: WorkspaceClient, cluster_id: str, context_id: str) -> None:
"""Destroy an execution context."""
client.command_execution.destroy(cluster_id=cluster_id, context_id=context_id)
def execute_command(
client: WorkspaceClient,
code: str,
cluster_id: str,
context_id: Optional[str] = None,
language: str = "python",
timeout: int = 300,
destroy_context_on_completion: bool = False,
) -> ExecutionResult:
"""
Execute code on a Databricks cluster.
Args:
client: WorkspaceClient
code: Code to execute
cluster_id: Cluster ID
context_id: Optional existing context ID (for state preservation)
language: "python", "scala", "sql", or "r"
timeout: Timeout in seconds
destroy_context_on_completion: Whether to destroy context after execution
Returns:
ExecutionResult
"""
# Get cluster name for better output
try:
cluster_info = client.clusters.get(cluster_id)
cluster_name = cluster_info.cluster_name
except Exception:
cluster_name = cluster_id
# Create context if not provided
context_created = False
if context_id is None:
context_id = create_context(client, cluster_id, language)
context_created = True
lang_enum = _LANGUAGE_MAP.get(language.lower(), Language.PYTHON)
try:
result = client.command_execution.execute(
cluster_id=cluster_id,
context_id=context_id,
language=lang_enum,
command=code,
).result(timeout=datetime.timedelta(seconds=timeout))
        if result.status == CommandStatus.FINISHED:
            # A FINISHED command can still carry an in-cell error in its results;
            # assign exec_result (rather than returning early) so the
            # destroy_context_on_completion cleanup below still runs.
            if result.results and result.results.result_type and result.results.result_type.value == "error":
                error_msg = result.results.cause if result.results.cause else "Unknown error"
                exec_result = ExecutionResult(
                    success=False,
                    error=error_msg,
                    cluster_id=cluster_id,
                    cluster_name=cluster_name,
                    context_id=context_id,
                    context_destroyed=False,
                )
            else:
                output = result.results.data if result.results and result.results.data else "Success (no output)"
                exec_result = ExecutionResult(
                    success=True,
                    output=str(output),
                    cluster_id=cluster_id,
                    cluster_name=cluster_name,
                    context_id=context_id,
                    context_destroyed=False,
                )
elif result.status in [CommandStatus.ERROR, CommandStatus.CANCELLED]:
error_msg = result.results.cause if result.results and result.results.cause else "Unknown error"
exec_result = ExecutionResult(
success=False,
error=error_msg,
cluster_id=cluster_id,
cluster_name=cluster_name,
context_id=context_id,
context_destroyed=False,
)
else:
exec_result = ExecutionResult(
success=False,
error=f"Unexpected status: {result.status}",
cluster_id=cluster_id,
cluster_name=cluster_name,
context_id=context_id,
context_destroyed=False,
)
# Destroy context if requested
if destroy_context_on_completion:
try:
destroy_context(client, cluster_id, context_id)
exec_result.context_destroyed = True
except Exception:
pass
return exec_result
except TimeoutError:
return ExecutionResult(
success=False,
error=f"Command timed out after {timeout}s",
cluster_id=cluster_id,
cluster_name=cluster_name,
context_id=context_id,
context_destroyed=False,
)
    except Exception as e:
        # Mirror the success path: honor destroy_context_on_completion regardless
        # of who created the context, and report context_destroyed accurately.
        destroyed = False
        if destroy_context_on_completion:
            try:
                destroy_context(client, cluster_id, context_id)
                destroyed = True
            except Exception:
                pass
        return ExecutionResult(
            success=False,
            error=str(e),
            cluster_id=cluster_id,
            cluster_name=cluster_name,
            context_id=None if destroyed else context_id,
            context_destroyed=destroyed,
        )
def execute_file(
client: WorkspaceClient,
file_path: str,
cluster_id: str,
context_id: Optional[str] = None,
timeout: int = 600,
destroy_context_on_completion: bool = False,
) -> ExecutionResult:
"""
Execute a local Python file on a Databricks cluster.
Args:
client: WorkspaceClient
file_path: Path to the Python file
cluster_id: Cluster ID
context_id: Optional existing context ID
timeout: Timeout in seconds
destroy_context_on_completion: Whether to destroy context after
Returns:
ExecutionResult
"""
try:
with open(file_path, "r", encoding="utf-8") as f:
code = f.read()
except FileNotFoundError:
return ExecutionResult(success=False, error=f"File not found: {file_path}")
except Exception as e:
return ExecutionResult(success=False, error=f"Failed to read file: {e}")
if not code.strip():
return ExecutionResult(success=False, error=f"File is empty: {file_path}")
return execute_command(
client=client,
code=code,
cluster_id=cluster_id,
context_id=context_id,
language="python",
timeout=timeout,
destroy_context_on_completion=destroy_context_on_completion,
)
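`find_cluster_by_name` prefers a RUNNING cluster over any other match by scanning the list twice. A self-contained sketch of that two-pass preference (the `pick_cluster` helper and the sample fleet are illustrative, not part of compute.py):

```python
def pick_cluster(clusters, pattern):
    """Two-pass match: prefer a RUNNING cluster, then fall back to any match."""
    p = pattern.lower()
    # First pass: running clusters only
    for c in clusters:
        if c["state"] == "RUNNING" and p in c["cluster_name"].lower():
            return c["cluster_name"]
    # Second pass: any cluster, regardless of state
    for c in clusters:
        if p in c["cluster_name"].lower():
            return c["cluster_name"]
    return None

fleet = [
    {"cluster_name": "quentin-dev", "state": "TERMINATED"},
    {"cluster_name": "quentin-shared", "state": "RUNNING"},
]
print(pick_cluster(fleet, "quentin"))  # quentin-shared
```

With the pattern "quentin", the running `quentin-shared` wins even though `quentin-dev` appears first; a pattern matching only terminated clusters still resolves via the second pass.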
================================================
FILE: ai_release/inspect_jobs.py
================================================
#!/usr/bin/env python3
"""
Job Inspection CLI for DBDemos
Inspect bundle jobs, check their status, and get detailed failure information.
Automatically extracts errors from notebook HTML when the API doesn't provide them.
Usage:
# List all bundle jobs with their status
python ai_release/inspect_jobs.py --list
# List only failed jobs
python ai_release/inspect_jobs.py --list --failed-only
# Get detailed info for a specific demo (auto-fetches errors)
python ai_release/inspect_jobs.py --demo ai-agent
# Get detailed failure info with fix suggestions
python ai_release/inspect_jobs.py --demo ai-agent --errors
# Export notebook path for the failed task
python ai_release/inspect_jobs.py --demo ai-agent --notebook-path
# Check if job is up-to-date with HEAD commit
python ai_release/inspect_jobs.py --demo ai-agent --check-commit
# Get task output for debugging
python ai_release/inspect_jobs.py --task-output <task_run_id>
# Export failure summary to file
python ai_release/inspect_jobs.py --demo ai-agent --errors --output errors.txt
"""
import argparse
import json
import sys
from datetime import datetime
from pathlib import Path
# Add parent directory to path
sys.path.insert(0, str(Path(__file__).parent.parent))
from ai_release.jobs import JobInspector, load_inspector_from_config, JobInfo
def format_timestamp(ts: int) -> str:
"""Format a millisecond timestamp to human-readable string."""
if not ts:
return "N/A"
return datetime.fromtimestamp(ts / 1000).strftime("%Y-%m-%d %H:%M:%S")
def format_duration(start: int, end: int) -> str:
"""Format duration between two timestamps."""
if not start or not end:
return "N/A"
duration_sec = (end - start) / 1000
if duration_sec < 60:
return f"{duration_sec:.0f}s"
elif duration_sec < 3600:
return f"{duration_sec / 60:.1f}m"
else:
return f"{duration_sec / 3600:.1f}h"
def print_job_list(jobs: list, failed_only: bool = False):
"""Print a formatted list of jobs."""
if failed_only:
jobs = [j for j in jobs if j.latest_run and j.latest_run.failed]
if not jobs:
print("No jobs found.")
return
    print(f"\n{'Demo Name':<40} {'Result':<12} {'Run Time':<12} {'Commit':<10}")
    print("=" * 80)
for job in sorted(jobs, key=lambda j: j.demo_name):
run = job.latest_run
if run:
state_icon = "🟢" if run.succeeded else "🔴" if run.failed else "🟡" if run.running else "⚪"
result = run.result_state or run.state
duration = format_duration(run.start_time, run.end_time)
commit = (run.used_commit or "")[:8]
else:
state_icon = "⚪"
result = "NO RUNS"
duration = "N/A"
commit = "N/A"
print(f"{state_icon} {job.demo_name:<38} {result:<12} {duration:<12} {commit:<10}")
# Summary
total = len(jobs)
succeeded = len([j for j in jobs if j.latest_run and j.latest_run.succeeded])
failed = len([j for j in jobs if j.latest_run and j.latest_run.failed])
running = len([j for j in jobs if j.latest_run and j.latest_run.running])
print(f"\nTotal: {total} | ✓ Succeeded: {succeeded} | ✗ Failed: {failed} | ◐ Running: {running}")
def print_fix_workflow(job: JobInfo, inspector: JobInspector):
"""Print suggested fix workflow for a failed job."""
print("\n" + "=" * 80)
print("SUGGESTED FIX WORKFLOW")
print("=" * 80)
demo_name = job.demo_name
run = job.latest_run
# Get the notebook path from the first failed task
notebook_path = None
if run and run.failed_tasks:
notebook_path = run.failed_tasks[0].notebook_path
print(f"""
1. TEST FIX INTERACTIVELY (optional but recommended):
python ai_release/run_remote.py --start-cluster --wait-for-cluster
python ai_release/run_remote.py --code "# test your fix code here"
2. CREATE FIX BRANCH in dbdemos-notebooks:
cd ../dbdemos-notebooks
git checkout main && git pull origin main
git checkout -b ai-fix-{demo_name}-<issue>
3. EDIT THE NOTEBOOK:
{notebook_path or 'Check the failed task notebook path above'}
4. COMMIT AND PUSH:
git add . && git commit -m "fix: <description>" && git push origin ai-fix-{demo_name}-<issue>
5. TEST THE FIX:
cd ../dbdemos
python ai_release/bundle.py --demo {demo_name} --branch ai-fix-{demo_name}-<issue> --force
6. IF STILL FAILING - iterate with repair (faster):
python ai_release/bundle.py --demo {demo_name} --repair --wait
7. CREATE PR when tests pass:
cd ../dbdemos-notebooks
gh pr create --title "fix: {demo_name} <issue>" --body "Fixed <issue>"
8. AFTER PR MERGED - final verification:
cd ../dbdemos
python ai_release/bundle.py --demo {demo_name} --force
""")
def print_job_details(job: JobInfo, inspector: JobInspector, show_errors: bool = False,
check_commit: bool = False, show_workflow: bool = True):
"""Print detailed information about a job."""
print(f"\n{'=' * 80}")
print(f"Demo: {job.demo_name}")
print(f"Job ID: {job.job_id}")
print(f"Job URL: {inspector.get_job_url(job.job_id)}")
print(f"{'=' * 80}")
run = job.latest_run
if not run:
print("\nNo runs found for this job.")
return
print(f"\nLatest Run: {run.run_id}")
print(f"Run URL: {inspector.get_job_url(job.job_id, run.run_id)}")
print(f"State: {run.state}")
print(f"Result: {run.result_state or 'N/A'}")
print(f"Started: {format_timestamp(run.start_time)}")
print(f"Ended: {format_timestamp(run.end_time)}")
print(f"Duration: {format_duration(run.start_time, run.end_time)}")
print(f"Git Commit: {run.used_commit or 'N/A'}")
if run.state_message:
print(f"Message: {run.state_message}")
# Check commit status
if check_commit:
print(f"\n--- Git Commit Check ---")
head = inspector.get_head_commit()
if head:
print(f"HEAD Commit: {head}")
if run.used_commit:
if run.used_commit == head:
print("✓ Job is UP-TO-DATE with HEAD")
else:
print("✗ Job is OUTDATED - HEAD has newer commits")
else:
print("? Cannot determine - no commit info in job run")
else:
print("Could not fetch HEAD commit from GitHub")
# Print tasks
print(f"\n--- Tasks ({len(run.tasks)} total) ---")
for task in run.tasks:
icon = "✓" if task.state == "SUCCESS" else "✗" if task.failed else "○"
notebook = task.notebook_path.split("/")[-1] if task.notebook_path else "N/A"
print(f" {icon} {task.task_key}: {task.state} ({notebook})")
# Print errors if job failed (always show for failed jobs, more detail with --errors)
if run.failed_tasks:
print(f"\n{'=' * 80}")
print("FAILURE DETAILS")
print(f"{'=' * 80}")
for task in run.failed_tasks:
print(f"\n--- Task: {task.task_key} ---")
if task.notebook_path:
print(f"Notebook: {task.notebook_path}")
# Show error summary
if task.error_message:
print(f"\nError: {task.error_message}")
# Show notebook errors if available
if task.notebook_errors:
print(f"\n--- Notebook Cell Errors ({len(task.notebook_errors)} found) ---")
for err in task.notebook_errors:
print(f"\n[Cell {err.cell_index}] {err.error_name}: {err.error_message}")
if err.cell_source and show_errors:
# Show the code that caused the error
src = err.cell_source
if len(src) > 500:
src = src[:500] + "\n... (truncated)"
print(f"\nCode:\n{src}")
if err.error_trace and show_errors:
trace = err.error_trace
if len(trace) > 2000:
trace = trace[:2000] + "\n... (truncated)"
print(f"\nTraceback:\n{trace}")
# Fallback to API trace if no notebook errors
elif task.error_trace and show_errors:
trace = task.error_trace
if len(trace) > 3000:
trace = trace[:3000] + "\n... (truncated, use --task-output for full trace)"
print(f"\nStack Trace:\n{trace}")
# Show fix workflow for failed jobs
if show_workflow:
print_fix_workflow(job, inspector)
def print_task_output(inspector: JobInspector, task_run_id: int):
"""Print the full output from a task run, including exported notebook errors."""
print(f"\n{'=' * 80}")
print(f"Task Run ID: {task_run_id}")
print(f"{'=' * 80}")
# First try standard API output
output = inspector.get_task_output(task_run_id)
if output:
if output.get("error"):
print(f"\nAPI Error:\n{output['error']}")
if output.get("error_trace"):
print(f"\nAPI Stack Trace:\n{output['error_trace']}")
if output.get("notebook_output"):
print(f"\nNotebook Output:\n{output['notebook_output']}")
# Also export and parse notebook HTML for cell-level errors
print("\n--- Extracting errors from notebook HTML ---")
html = inspector.export_notebook_html(task_run_id)
if html:
errors = inspector.extract_errors_from_html(html)
if errors:
print(f"Found {len(errors)} error(s) in notebook cells:")
for err in errors:
print(f"\n[Cell {err.cell_index}] {err.error_name}: {err.error_message}")
if err.cell_source:
print(f"\nCode:\n{err.cell_source}")
if err.error_trace:
print(f"\nTraceback:\n{err.error_trace}")
else:
print("No cell errors found in notebook HTML")
else:
print("Could not export notebook HTML")
def main():
parser = argparse.ArgumentParser(
description="Inspect DBDemos bundle jobs",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=__doc__
)
# Main actions
parser.add_argument("--list", "-l", action="store_true", help="List all bundle jobs")
parser.add_argument("--demo", "-d", help="Get details for a specific demo")
parser.add_argument("--task-output", type=int, help="Get output from a specific task run ID")
# Options
parser.add_argument("--failed-only", "-f", action="store_true", help="Only show failed jobs")
parser.add_argument("--errors", "-e", action="store_true", help="Show detailed error traces and code")
parser.add_argument("--check-commit", "-c", action="store_true", help="Check if job is up-to-date with HEAD")
parser.add_argument("--no-workflow", action="store_true", help="Don't show fix workflow suggestions")
parser.add_argument("--notebook-path", action="store_true", help="Print only the notebook path for the first failed task")
parser.add_argument("--output", "-o", help="Write output to file")
parser.add_argument("--json", action="store_true", help="Output as JSON")
args = parser.parse_args()
# Load inspector
try:
inspector = load_inspector_from_config()
print(f"Workspace: {inspector.host}")
except Exception as e:
print(f"Error loading config: {e}")
return 1
# Redirect output to file if requested
output_file = None
if args.output:
output_file = open(args.output, "w")
sys.stdout = output_file
try:
# List jobs
if args.list:
print("\nFetching bundle jobs...")
jobs = inspector.list_bundle_jobs(include_run_details=True)
print_job_list(jobs, failed_only=args.failed_only)
return 0
# Get demo details
if args.demo:
print(f"\nFetching job for demo: {args.demo}")
job = inspector.find_job(args.demo)
if not job:
print(f"No job found for demo: {args.demo}")
return 1
# Always get full details for failed jobs (to get errors)
if job.latest_run and (job.latest_run.failed or args.errors or args.check_commit):
print("Fetching error details...")
job.latest_run = inspector.get_job_run_details(job.job_id, job.latest_run.run_id)
# Just print notebook path if requested
if args.notebook_path:
if job.latest_run and job.latest_run.failed_tasks:
for task in job.latest_run.failed_tasks:
if task.notebook_path:
print(task.notebook_path)
return 0
if args.json:
# Output as JSON for programmatic use
data = {
"demo_name": job.demo_name,
"job_id": job.job_id,
"job_url": inspector.get_job_url(job.job_id),
}
if job.latest_run:
data["latest_run"] = {
"run_id": job.latest_run.run_id,
"state": job.latest_run.state,
"result_state": job.latest_run.result_state,
"used_commit": job.latest_run.used_commit,
"failed_tasks": [
{
"task_key": t.task_key,
"run_id": t.run_id,
"notebook_path": t.notebook_path,
"error_message": t.error_message,
"error_trace": t.error_trace,
"notebook_errors": [
{
"cell_index": e.cell_index,
"error_name": e.error_name,
"error_message": e.error_message,
"cell_source": e.cell_source,
}
for e in t.notebook_errors
] if t.notebook_errors else []
}
for t in job.latest_run.failed_tasks
]
}
print(json.dumps(data, indent=2))
else:
print_job_details(job, inspector,
show_errors=args.errors,
check_commit=args.check_commit,
show_workflow=not args.no_workflow)
return 0
# Get task output
if args.task_output:
print_task_output(inspector, args.task_output)
return 0
# No action specified
parser.print_help()
return 1
finally:
if output_file:
output_file.close()
sys.stdout = sys.__stdout__
print(f"Output written to: {args.output}")
if __name__ == "__main__":
sys.exit(main())
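The `--json` branch above emits a stable payload meant for programmatic use. A minimal sketch of that shape, built and round-tripped with the stdlib; the demo name, IDs, paths, and error strings are made-up sample values, not real job output:

```python
import json

# Illustrative payload mirroring the --json branch of inspect_jobs.py.
# All field values here are fabricated examples.
data = {
    "demo_name": "lakehouse-retail-c360",
    "job_id": 123,
    "job_url": "https://example.cloud.databricks.com/#job/123",
    "latest_run": {
        "run_id": 456,
        "state": "TERMINATED",
        "result_state": "FAILED",
        "used_commit": "abc1234",
        "failed_tasks": [
            {
                "task_key": "init_data",
                "run_id": 789,
                "notebook_path": "/Repos/demo/01-setup",
                "error_message": "NameError: name 'spark' is not defined",
                "error_trace": None,
                "notebook_errors": [],
            }
        ],
    },
}

# Serialize the same way the CLI does, then parse it back as a consumer would.
payload = json.dumps(data, indent=2)
parsed = json.loads(payload)
print(parsed["latest_run"]["failed_tasks"][0]["task_key"])
```

A consumer can rely on `latest_run` being absent when the job has never run, and on `notebook_errors` being an empty list rather than `null`.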
================================================
FILE: ai_release/jobs.py
================================================
"""
Job Inspection Module for DBDemos
Provides functions to inspect bundle jobs, get failure details, and compare git commits.
Uses the Databricks SDK for all API operations.
SDK Documentation: https://databricks-sdk-py.readthedocs.io/en/latest/
"""
import json
import re
import requests
import urllib.parse
from dataclasses import dataclass, field
from typing import Optional, List, Dict, Any
from pathlib import Path
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import ViewsToExport
@dataclass
class NotebookError:
"""Error extracted from a notebook cell."""
cell_index: int
cell_type: str # "code", "markdown"
error_name: str # e.g., "NameError", "ValueError"
error_message: str
error_trace: Optional[str] = None
cell_source: Optional[str] = None # The code that caused the error
@dataclass
class TaskResult:
"""Result from a single task in a job run."""
task_key: str
run_id: int
state: str # SUCCESS, FAILED, SKIPPED, etc.
notebook_path: Optional[str] = None
error_message: Optional[str] = None
error_trace: Optional[str] = None
used_commit: Optional[str] = None
notebook_errors: List[NotebookError] = field(default_factory=list)
@property
def failed(self) -> bool:
return self.state in ("FAILED", "TIMEDOUT", "CANCELED")
def get_error_summary(self) -> str:
"""Get a summary of all errors for this task."""
if not self.failed:
return "Task succeeded"
lines = []
if self.error_message:
lines.append(f"Error: {self.error_message}")
if self.error_trace:
lines.append(f"Trace: {self.error_trace[:500]}...")
for err in self.notebook_errors:
lines.append(f"\n[Cell {err.cell_index}] {err.error_name}: {err.error_message}")
if err.cell_source:
# Show first 200 chars of source
src = err.cell_source[:200]
if len(err.cell_source) > 200:
src += "..."
lines.append(f"Code: {src}")
if err.error_trace:
lines.append(f"Traceback:\n{err.error_trace}")
return "\n".join(lines) if lines else "Unknown error"
@dataclass
class JobRunResult:
"""Result from a job run with all task details."""
job_id: int
job_name: str
run_id: int
state: str # RUNNING, TERMINATED, etc.
result_state: Optional[str] = None # SUCCESS, FAILED, etc.
state_message: Optional[str] = None
start_time: Optional[int] = None
end_time: Optional[int] = None
tasks: List[TaskResult] = field(default_factory=list)
used_commit: Optional[str] = None # Commit used by the run's tasks
@property
def succeeded(self) -> bool:
return self.result_state == "SUCCESS"
@property
def failed(self) -> bool:
return self.result_state in ("FAILED", "TIMEDOUT", "CANCELED")
@property
def running(self) -> bool:
return self.state == "RUNNING"
@property
def failed_tasks(self) -> List[TaskResult]:
return [t for t in self.tasks if t.failed]
def get_failure_summary(self) -> str:
"""Get a human-readable summary of failures."""
if self.succeeded:
return "Job succeeded"
lines = [f"Job {self.job_name} FAILED"]
if self.state_message:
lines.append(f"Message: {self.state_message}")
for task in self.failed_tasks:
lines.append(f"\n--- Task: {task.task_key} ---")
if task.notebook_path:
lines.append(f"Notebook: {task.notebook_path}")
if task.error_message:
lines.append(f"Error: {task.error_message}")
if task.error_trace:
# Truncate long traces
trace = task.error_trace
if len(trace) > 2000:
trace = trace[:2000] + "\n... (truncated)"
lines.append(f"Trace:\n{trace}")
return "\n".join(lines)
@dataclass
class JobInfo:
"""Information about a bundle job."""
job_id: int
job_name: str
demo_name: str
latest_run: Optional[JobRunResult] = None
head_commit: Optional[str] = None
is_up_to_date: Optional[bool] = None # True if latest run used HEAD commit
class JobInspector:
"""
Inspects bundle jobs and retrieves detailed failure information.
Uses the Databricks SDK for all API operations.
Usage:
inspector = JobInspector(host, token, github_token, repo_url)
# List all bundle jobs
jobs = inspector.list_bundle_jobs()
# Get detailed failure info
result = inspector.get_job_run_details(job_id, run_id)
print(result.get_failure_summary())
# Get task error output
output = inspector.get_task_output(task_run_id)
"""
# Both prefixes are used - field-demos_ for demos, field-bundle_ for bundling
JOB_PREFIXES = ["field-demos_", "field-bundle_"]
def __init__(self, host: str, token: str, github_token: str = None, repo_url: str = None):
self.host = host.rstrip("/")
self.token = token
self.github_token = github_token
self.repo_url = repo_url
# Create Databricks SDK client
self.ws = WorkspaceClient(
host=host,
token=token,
auth_type="pat",
product="dbdemos-ai-release",
product_version="0.1.0"
)
def _github_get(self, path: str) -> dict:
"""Make a GET request to the GitHub API."""
if not self.github_token:
raise ValueError("GitHub token required for this operation")
headers = {
"Accept": "application/vnd.github.v3+json",
"Authorization": f"token {self.github_token}"
}
url = f"https://api.github.com/{path}"
resp = requests.get(url, headers=headers, timeout=60)
resp.raise_for_status()
return resp.json()
def list_bundle_jobs(self, include_run_details: bool = True) -> List[JobInfo]:
"""
List all bundle jobs (jobs whose name starts with one of the known prefixes, 'field-demos_' or 'field-bundle_').
Args:
include_run_details: If True, fetches latest run details for each job
Returns:
List of JobInfo objects
"""
jobs = []
# List all jobs using SDK
for job in self.ws.jobs.list():
name = job.settings.name if job.settings else None
if not name:
continue
# Check all known prefixes
demo_name = None
for prefix in self.JOB_PREFIXES:
if name.startswith(prefix):
demo_name = name[len(prefix):]
break
if demo_name:
job_info = JobInfo(
job_id=job.job_id,
job_name=name,
demo_name=demo_name
)
if include_run_details:
# Get latest run using SDK
runs = list(self.ws.jobs.list_runs(job_id=job.job_id, limit=1, expand_tasks=True))
if runs:
job_info.latest_run = self._parse_run(runs[0], name)
jobs.append(job_info)
return jobs
def find_job(self, demo_name: str) -> Optional[JobInfo]:
"""Find a bundle job by demo name. Tries all known prefixes."""
# Try each prefix
for prefix in self.JOB_PREFIXES:
job_name = f"{prefix}{demo_name}"
# Search with name filter using SDK
for job in self.ws.jobs.list(name=job_name):
if job.settings and job.settings.name == job_name:
job_info = JobInfo(
job_id=job.job_id,
job_name=job_name,
demo_name=demo_name
)
# Get latest run
runs = list(self.ws.jobs.list_runs(job_id=job.job_id, limit=1, expand_tasks=True))
if runs:
job_info.latest_run = self._parse_run(runs[0], job_name)
return job_info
return None
def get_job_run_details(self, job_id: int, run_id: int = None) -> Optional[JobRunResult]:
"""
Get detailed information about a job run.
Args:
job_id: The job ID
run_id: Specific run ID. If None, gets the latest run.
Returns:
JobRunResult with full task details and errors
"""
if run_id is None:
# Get latest run
runs = list(self.ws.jobs.list_runs(job_id=job_id, limit=1, expand_tasks=True))
if not runs:
return None
run = runs[0]
else:
run = self.ws.jobs.get_run(run_id=run_id)
# Get job name
job = self.ws.jobs.get(job_id=job_id)
job_name = job.settings.name if job.settings else f"job_{job_id}"
result = self._parse_run(run, job_name)
# For failed tasks, get detailed error output (API + notebook HTML)
for task in result.failed_tasks:
self.get_task_errors(task)
return result
def _parse_run(self, run, job_name: str) -> JobRunResult:
"""Parse a run object from SDK into a JobRunResult."""
# Get state info
state = run.state
lifecycle_state = state.life_cycle_state.value if state and state.life_cycle_state else "UNKNOWN"
result_state = state.result_state.value if state and state.result_state else None
state_message = state.state_message if state else None
# Parse tasks
tasks = []
most_recent_commit = None
for task in (run.tasks or []):
task_state = task.state
task_result_state = task_state.result_state.value if task_state and task_state.result_state else "UNKNOWN"
# Get commit from git_source. SHA strings have no chronological order, so
# keep the first commit seen -- all tasks in a run share the same snapshot.
used_commit = None
if task.git_source and task.git_source.git_snapshot:
used_commit = task.git_source.git_snapshot.used_commit
if used_commit and not most_recent_commit:
most_recent_commit = used_commit
notebook_path = None
if task.notebook_task:
notebook_path = task.notebook_task.notebook_path
task_result = TaskResult(
task_key=task.task_key or "unknown",
run_id=task.run_id or 0,
state=task_result_state,
notebook_path=notebook_path,
used_commit=used_commit
)
tasks.append(task_result)
return JobRunResult(
job_id=run.job_id or 0,
job_name=job_name,
run_id=run.run_id or 0,
state=lifecycle_state,
result_state=result_state,
state_message=state_message,
start_time=run.start_time,
end_time=run.end_time,
tasks=tasks,
used_commit=most_recent_commit
)
def get_task_output(self, task_run_id: int) -> Optional[Dict[str, Any]]:
"""
Get the output/error from a specific task run.
Args:
task_run_id: The task's run_id (not the job run_id)
Returns:
Dict with 'error' and 'error_trace' if available
"""
try:
output = self.ws.jobs.get_run_output(run_id=task_run_id)
return {
"error": output.error,
"error_trace": output.error_trace,
"metadata": str(output.metadata) if output.metadata else None,
"notebook_output": str(output.notebook_output) if output.notebook_output else None
}
except Exception as e:
return {"error": str(e)}
def export_notebook_html(self, task_run_id: int) -> Optional[str]:
"""
Export the notebook HTML from a task run.
Args:
task_run_id: The task's run_id
Returns:
HTML content of the notebook with outputs, or None if failed
"""
try:
export = self.ws.jobs.export_run(run_id=task_run_id, views_to_export=ViewsToExport.ALL)
if export.views and len(export.views) > 0:
return export.views[0].content
return None
except Exception as e:
print(f"Failed to export notebook: {e}")
return None
def extract_errors_from_html(self, html_content: str) -> List[NotebookError]:
"""
Parse notebook HTML and extract error information from failed cells.
The HTML contains a base64+URL encoded JSON model with command details.
Args:
html_content: The HTML content from export_notebook_html
Returns:
List of NotebookError objects for each failed cell
"""
import base64
errors = []
# Find the notebook model in the HTML - it's base64 then URL encoded JSON
match = re.search(r'__DATABRICKS_NOTEBOOK_MODEL = \'([^\']+)\'', html_content)
if not match:
return errors
try:
encoded = match.group(1)
# Decode: base64 -> URL encoding -> JSON
decoded_bytes = base64.b64decode(encoded)
url_encoded = decoded_bytes.decode('utf-8')
json_str = urllib.parse.unquote(url_encoded)
model = json.loads(json_str)
except Exception as e:
print(f"Failed to parse notebook model: {e}")
return errors
# Extract errors from commands
for idx, cmd in enumerate(model.get('commands', [])):
state = cmd.get('state')
error_summary = cmd.get('errorSummary')
error = cmd.get('error')
# Skip non-error commands and "Command skipped" errors (not the root cause)
if not (error or error_summary) or state != 'error':
continue
if error_summary == 'Command skipped':
continue
# Get command source
cell_source = cmd.get('command', '')
# Parse error name and message
error_name = "Error"
error_message = error_summary or "Unknown error"
error_trace = None
# Try to parse Python exception from error_summary
exc_match = re.search(r'(\w+Error|\w+Exception):\s*(.+)', error_summary or '')
if exc_match:
error_name = exc_match.group(1)
error_message = exc_match.group(2).strip()
# Clean ANSI codes from error trace
if error:
# Remove ANSI escape codes
error_trace = re.sub(r'\x1b\[[0-9;]*m', '', str(error))
errors.append(NotebookError(
cell_index=idx,
cell_type="code",
error_name=error_name,
error_message=error_message,
error_trace=error_trace,
cell_source=cell_source[:500] if cell_source else None # Truncate source
))
return errors
def get_task_errors(self, task: TaskResult) -> TaskResult:
"""
Get comprehensive error information for a failed task.
First tries API, then falls back to exporting and parsing notebook HTML.
Args:
task: TaskResult to enrich with error information
Returns:
The same TaskResult with error fields populated
"""
# First try the standard API
output = self.get_task_output(task.run_id)
if output:
task.error_message = output.get("error")
task.error_trace = output.get("error_trace")
# If no error from API, export and parse the notebook
if not task.error_message and not task.error_trace:
html = self.export_notebook_html(task.run_id)
if html:
errors = self.extract_errors_from_html(html)
task.notebook_errors = errors
# Set primary error from first notebook error
if errors:
first_err = errors[0]
task.error_message = f"{first_err.error_name}: {first_err.error_message}"
task.error_trace = first_err.error_trace
return task
def get_head_commit(self) -> Optional[str]:
"""Get the HEAD commit SHA from the GitHub repo."""
if not self.repo_url or not self.github_token:
return None
# Extract owner/repo from URL
match = re.search(r'github\.com[/:]([^/]+)/([^/\.]+)', self.repo_url)
if not match:
return None
owner, repo = match.groups()
resp = self._github_get(f"repos/{owner}/{repo}/commits/HEAD")
return resp.get("sha")
def check_job_up_to_date(self, job_info: JobInfo) -> bool:
"""
Check if a job's latest run used the HEAD commit.
Args:
job_info: JobInfo with latest_run populated
Returns:
True if the job was run with the latest commit
"""
if not job_info.latest_run or not job_info.latest_run.used_commit:
return False
head_commit = self.get_head_commit()
if not head_commit:
return False
job_info.head_commit = head_commit
job_info.is_up_to_date = job_info.latest_run.used_commit == head_commit
return job_info.is_up_to_date
def get_failed_jobs(self) -> List[JobInfo]:
"""Get all bundle jobs that have a failed latest run."""
all_jobs = self.list_bundle_jobs(include_run_details=True)
return [j for j in all_jobs if j.latest_run and j.latest_run.failed]
def get_job_url(self, job_id: int, run_id: int = None) -> str:
"""Get the workspace URL for a job or run."""
if run_id:
return f"{self.host}/#job/{job_id}/run/{run_id}"
return f"{self.host}/#job/{job_id}"
def load_inspector_from_config() -> JobInspector:
"""Load a JobInspector using the local config file."""
repo_root = Path(__file__).parent.parent
conf_files = [
repo_root / "local_conf_E2TOOL.json",
repo_root / "local_conf.json",
]
config = None
for conf_file in conf_files:
if conf_file.exists():
with open(conf_file, "r") as f:
config = json.load(f)
break
if not config:
raise FileNotFoundError("No config file found")
# Clean repo_url
repo_url = config.get("repo_url", "")
if repo_url.endswith(".git"):
repo_url = repo_url[:-4]
return JobInspector(
host=config["url"],
token=config["pat_token"],
github_token=config.get("github_token"),
repo_url=repo_url
)
================================================
FILE: ai_release/run_remote.py
================================================
#!/usr/bin/env python3
"""
Remote Code Execution CLI for DBDemos
Execute Python code on a Databricks cluster for testing notebook fixes.
Usage:
# Execute code directly
python ai_release/run_remote.py --code "print('Hello from Databricks!')"
# Execute a file
python ai_release/run_remote.py --file path/to/script.py
# Execute SQL
python ai_release/run_remote.py --code "SELECT 1" --language sql
# List available clusters
python ai_release/run_remote.py --list-clusters
# Start a cluster
python ai_release/run_remote.py --start-cluster
# Check cluster status
python ai_release/run_remote.py --cluster-status
# Reuse context for faster follow-up commands
python ai_release/run_remote.py --code "x = 1" --save-context
python ai_release/run_remote.py --code "print(x)" --load-context
Environment Variables / Config:
Uses local_conf_E2TOOL.json for credentials.
Cluster is auto-selected by matching "cluster_name_pattern" (default: "quentin")
"""
import argparse
import json
import os
import sys
from pathlib import Path
# Add parent directory to path
sys.path.insert(0, str(Path(__file__).parent.parent))
from ai_release.compute import (
get_workspace_client,
list_clusters,
find_cluster_by_name,
start_cluster,
get_cluster_status,
wait_for_cluster,
execute_command,
execute_file,
)
CONTEXT_FILE = Path(__file__).parent / ".execution_context.json"
def load_config():
"""Load configuration from local_conf_E2TOOL.json"""
repo_root = Path(__file__).parent.parent
conf_files = [
repo_root / "local_conf_E2TOOL.json",
repo_root / "local_conf.json",
]
for conf_file in conf_files:
if conf_file.exists():
with open(conf_file, "r") as f:
config = json.load(f)
print(f"Loaded config from {conf_file.name}")
return config
print("ERROR: No config file found (local_conf_E2TOOL.json or local_conf.json)")
sys.exit(1)
def save_context(cluster_id: str, context_id: str):
"""Save execution context for reuse."""
with open(CONTEXT_FILE, "w") as f:
json.dump({"cluster_id": cluster_id, "context_id": context_id}, f)
print(f"Context saved to {CONTEXT_FILE}")
def load_context():
"""Load saved execution context."""
if CONTEXT_FILE.exists():
with open(CONTEXT_FILE, "r") as f:
return json.load(f)
return None
def clear_context():
"""Clear saved context."""
if CONTEXT_FILE.exists():
CONTEXT_FILE.unlink()
print("Context cleared")
def main():
parser = argparse.ArgumentParser(
description="Execute code on Databricks clusters",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=__doc__,
)
# Execution options
parser.add_argument("--code", "-c", help="Code to execute")
parser.add_argument("--file", "-f", help="Python file to execute")
parser.add_argument("--language", "-l", default="python", choices=["python", "sql", "scala", "r"])
parser.add_argument("--timeout", "-t", type=int, default=300, help="Timeout in seconds")
# Cluster management
parser.add_argument("--list-clusters", action="store_true", help="List available clusters")
parser.add_argument("--start-cluster", action="store_true", help="Start the configured cluster")
parser.add_argument("--cluster-status", action="store_true", help="Check cluster status")
parser.add_argument("--wait-for-cluster", action="store_true", help="Wait for cluster to be running")
parser.add_argument("--cluster-name", help="Cluster name pattern to match (default: from config or 'quentin')")
# Context management
parser.add_argument("--save-context", action="store_true", help="Save context for reuse")
parser.add_argument("--load-context", action="store_true", help="Reuse saved context")
parser.add_argument("--clear-context", action="store_true", help="Clear saved context")
parser.add_argument("--destroy-context", action="store_true", help="Destroy context after execution")
args = parser.parse_args()
# Load config
config = load_config()
host = config.get("url", os.environ.get("DATABRICKS_HOST"))
token = config.get("pat_token", os.environ.get("DATABRICKS_TOKEN"))
cluster_pattern = args.cluster_name or config.get("cluster_name_pattern", "quentin")
if not host or not token:
print("ERROR: Missing workspace URL or token")
sys.exit(1)
# Create client
client = get_workspace_client(host, token)
print(f"Workspace: {host}")
# Clear context
if args.clear_context:
clear_context()
return 0
# List clusters
if args.list_clusters:
clusters = list_clusters(client, include_terminated=True)
print(f"\nFound {len(clusters)} clusters:\n")
for c in clusters:
state_icon = "🟢" if c["state"] == "RUNNING" else "🔴" if c["state"] == "TERMINATED" else "🟡"
print(f" {state_icon} {c['cluster_name']:<40} {c['state']:<12} {c['cluster_id']}")
return 0
# Find cluster
cluster = find_cluster_by_name(client, cluster_pattern)
if not cluster:
print(f"ERROR: No cluster found matching '{cluster_pattern}'")
print("Use --list-clusters to see available clusters")
return 1
print(f"Cluster: {cluster['cluster_name']} ({cluster['state']})")
cluster_id = cluster["cluster_id"]
# Cluster status
if args.cluster_status:
status = get_cluster_status(client, cluster_id)
print(f" State: {status['state']}")
return 0
# Start cluster
if args.start_cluster:
result = start_cluster(client, cluster_id)
print(f" {result['message']}")
if args.wait_for_cluster and result.get("state") != "RUNNING":
wait_for_cluster(client, cluster_id)
return 0
# Wait for cluster
if args.wait_for_cluster:
success = wait_for_cluster(client, cluster_id)
return 0 if success else 1
# Execute code
if args.code or args.file:
# Check cluster is running
if cluster["state"] != "RUNNING":
print(f"ERROR: Cluster is {cluster['state']}, not RUNNING")
print("Use --start-cluster --wait-for-cluster to start it")
return 1
# Load context if requested
context_id = None
if args.load_context:
saved = load_context()
if saved and saved.get("cluster_id") == cluster_id:
context_id = saved.get("context_id")
print(f"Reusing context: {context_id}")
else:
print("No saved context found or cluster changed, creating new context")
# Execute
if args.file:
print(f"\nExecuting file: {args.file}")
result = execute_file(
client=client,
file_path=args.file,
cluster_id=cluster_id,
context_id=context_id,
timeout=args.timeout,
destroy_context_on_completion=args.destroy_context,
)
else:
print(f"\nExecuting {args.language} code...")
result = execute_command(
client=client,
code=args.code,
cluster_id=cluster_id,
context_id=context_id,
language=args.language,
timeout=args.timeout,
destroy_context_on_completion=args.destroy_context,
)
# Print result
print("\n" + "=" * 60)
if result.success:
print("✓ SUCCESS")
print("=" * 60)
print(result.output)
else:
print("✗ FAILED")
print("=" * 60)
print(result.error)
# Save context if requested
if args.save_context and result.context_id and not result.context_destroyed:
save_context(cluster_id, result.context_id)
print("\nContext saved. Use --load-context to reuse.")
return 0 if result.success else 1
parser.print_help()
return 1
if __name__ == "__main__":
sys.exit(main())
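`load_config` above tries `local_conf_E2TOOL.json` first and falls back to `local_conf.json`: the first existing file wins. A minimal sketch of that precedence in a temp directory; the file names match the CLI, the contents are placeholder values rather than real credentials:

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Both candidate files exist; the E2TOOL one should take precedence.
    (root / "local_conf.json").write_text(json.dumps({"url": "https://fallback"}))
    (root / "local_conf_E2TOOL.json").write_text(json.dumps({"url": "https://primary"}))

    config = None
    for conf_file in [root / "local_conf_E2TOOL.json", root / "local_conf.json"]:
        if conf_file.exists():
            config = json.loads(conf_file.read_text())
            break
    print(config["url"])
```

Keeping the candidate list ordered (rather than globbing) is what makes the precedence deterministic.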
================================================
FILE: ai_release/run_state.py
================================================
"""
Run state management for AI release workflow.
Tracks job runs, errors, and fixes in a persistent folder structure:
ai_release/runs/
<commit_id>/
state.json - Overall run state
<demo_name>/
status.json - Demo-specific status
errors.json - Extracted errors from failed runs
fix_attempts.json - History of fix attempts
job_output.log - Raw job output
notes.md - AI notes and observations
"""
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime
from pathlib import Path
from typing import Optional, List, Dict, Any
RUNS_DIR = Path(__file__).parent / "runs"
@dataclass
class DemoRunState:
"""State for a single demo run."""
demo_name: str
status: str = "pending" # pending, running, success, failed, fixing
job_id: Optional[int] = None
run_id: Optional[int] = None
branch: Optional[str] = None
started_at: Optional[str] = None
completed_at: Optional[str] = None
error_summary: Optional[str] = None
fix_attempts: List[Dict[str, Any]] = field(default_factory=list)
def to_dict(self) -> dict:
return asdict(self)
@classmethod
def from_dict(cls, data: dict) -> "DemoRunState":
return cls(**data)
@dataclass
class RunState:
"""Overall state for a release run."""
commit_id: str
branch: str = "main"
started_at: str = field(default_factory=lambda: datetime.now().isoformat())
demos: Dict[str, DemoRunState] = field(default_factory=dict)
def to_dict(self) -> dict:
return {
"commit_id": self.commit_id,
"branch": self.branch,
"started_at": self.started_at,
"demos": {k: v.to_dict() for k, v in self.demos.items()}
}
@classmethod
def from_dict(cls, data: dict) -> "RunState":
demos = {k: DemoRunState.from_dict(v) for k, v in data.get("demos", {}).items()}
return cls(
commit_id=data["commit_id"],
branch=data.get("branch", "main"),
started_at=data.get("started_at", ""),
demos=demos
)
class RunStateManager:
"""Manages persistent run state for AI release workflow."""
def __init__(self, commit_id: Optional[str] = None):
"""Initialize with a specific commit or auto-detect from git."""
if commit_id is None:
commit_id = self._get_current_commit()
self.commit_id = commit_id
self.run_dir = RUNS_DIR / commit_id
self.run_dir.mkdir(parents=True, exist_ok=True)
self.state = self._load_or_create_state()
def _get_current_commit(self) -> str:
"""Get current git commit from dbdemos-notebooks."""
import subprocess
try:
result = subprocess.run(
["git", "rev-parse", "--short", "HEAD"],
cwd=Path(__file__).parent.parent.parent / "dbdemos-notebooks",
capture_output=True, text=True
)
if result.returncode == 0:
return result.stdout.strip()
except Exception:
pass
return datetime.now().strftime("%Y%m%d_%H%M%S")
def _load_or_create_state(self) -> RunState:
"""Load existing state or create new one."""
state_file = self.run_dir / "state.json"
if state_file.exists():
with open(state_file) as f:
return RunState.from_dict(json.load(f))
return RunState(commit_id=self.commit_id)
def save(self):
"""Save current state to disk."""
state_file = self.run_dir / "state.json"
with open(state_file, "w") as f:
json.dump(self.state.to_dict(), f, indent=2)
def get_demo_dir(self, demo_name: str) -> Path:
"""Get or create directory for a demo."""
demo_dir = self.run_dir / demo_name
demo_dir.mkdir(parents=True, exist_ok=True)
return demo_dir
def get_demo_state(self, demo_name: str) -> DemoRunState:
"""Get state for a specific demo."""
if demo_name not in self.state.demos:
self.state.demos[demo_name] = DemoRunState(demo_name=demo_name)
return self.state.demos[demo_name]
def update_demo_status(self, demo_name: str, status: str, **kwargs):
"""Update demo status and save."""
demo_state = self.get_demo_state(demo_name)
demo_state.status = status
for key, value in kwargs.items():
if hasattr(demo_state, key):
setattr(demo_state, key, value)
if status == "running" and not demo_state.started_at:
demo_state.started_at = datetime.now().isoformat()
elif status in ("success", "failed"):
demo_state.completed_at = datetime.now().isoformat()
self.save()
self._save_demo_status(demo_name, demo_state)
def _save_demo_status(self, demo_name: str, state: DemoRunState):
"""Save demo-specific status file."""
demo_dir = self.get_demo_dir(demo_name)
with open(demo_dir / "status.json", "w") as f:
json.dump(state.to_dict(), f, indent=2)
def save_errors(self, demo_name: str, errors: List[Dict[str, Any]]):
"""Save extracted errors for a demo."""
demo_dir = self.get_demo_dir(demo_name)
with open(demo_dir / "errors.json", "w") as f:
json.dump({
"extracted_at": datetime.now().isoformat(),
"errors": errors
}, f, indent=2)
def save_job_output(self, demo_name: str, output: str):
"""Save raw job output."""
demo_dir = self.get_demo_dir(demo_name)
with open(demo_dir / "job_output.log", "w") as f:
f.write(output)
def add_fix_attempt(self, demo_name: str, description: str, branch: str, files_changed: List[str]):
"""Record a fix attempt."""
demo_state = self.get_demo_state(demo_name)
attempt = {
"timestamp": datetime.now().isoformat(),
"description": description,
"branch": branch,
"files_changed": files_changed,
"result": "pending"
}
demo_state.fix_attempts.append(attempt)
self.save()
# Also save to fix_attempts.json
demo_dir = self.get_demo_dir(demo_name)
with open(demo_dir / "fix_attempts.json", "w") as f:
json.dump(demo_state.fix_attempts, f, indent=2)
def update_fix_result(self, demo_name: str, result: str):
"""Update the result of the latest fix attempt."""
demo_state = self.get_demo_state(demo_name)
if demo_state.fix_attempts:
demo_state.fix_attempts[-1]["result"] = result
self.save()
def add_note(self, demo_name: str, note: str):
"""Add a note to the demo's notes.md file."""
demo_dir = self.get_demo_dir(demo_name)
notes_file = demo_dir / "notes.md"
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
with open(notes_file, "a") as f:
f.write(f"\n## {timestamp}\n\n{note}\n")
def get_summary(self) -> str:
"""Get a summary of all demo states."""
lines = [
f"# Release Run: {self.commit_id}",
f"Branch: {self.state.branch}",
f"Started: {self.state.started_at}",
"",
"## Demo Status",
""
]
for name, demo in sorted(self.state.demos.items()):
status_emoji = {
"pending": "⏳",
"running": "🔄",
"success": "✅",
"failed": "❌",
"fixing": "🔧"
}.get(demo.status, "❓")
line = f"- {status_emoji} **{name}**: {demo.status}"
if demo.error_summary:
line += f" - {demo.error_summary[:50]}..."
if demo.fix_attempts:
line += f" ({len(demo.fix_attempts)} fix attempts)"
lines.append(line)
return "\n".join(lines)
@classmethod
def list_runs(cls) -> List[str]:
"""List all existing run directories."""
if not RUNS_DIR.exists():
return []
return sorted([d.name for d in RUNS_DIR.iterdir() if d.is_dir()])
@classmethod
def get_latest_run(cls) -> Optional["RunStateManager"]:
"""Get the most recent run state manager (by directory mtime; commit ids don't sort chronologically)."""
if not RUNS_DIR.exists():
return None
run_dirs = [d for d in RUNS_DIR.iterdir() if d.is_dir()]
if not run_dirs:
return None
latest = max(run_dirs, key=lambda d: d.stat().st_mtime)
return cls(latest.name)
# Convenience functions
def get_run_state(commit_id: Optional[str] = None) -> RunStateManager:
"""Get or create a run state manager."""
return RunStateManager(commit_id)
def get_latest_run() -> Optional[RunStateManager]:
"""Get the latest run state."""
return RunStateManager.get_latest_run()
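The module docstring above describes the on-disk layout the manager maintains. A sketch that builds that layout in a temp directory; the commit id, demo name, and status values are illustrative placeholders:

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # runs/<commit_id>/state.json plus one per-demo folder, as in the docstring.
    run_dir = Path(tmp) / "runs" / "abc1234"
    demo_dir = run_dir / "lakehouse-retail-c360"
    demo_dir.mkdir(parents=True)

    (run_dir / "state.json").write_text(json.dumps(
        {"commit_id": "abc1234", "branch": "main", "demos": {}}))
    (demo_dir / "status.json").write_text(json.dumps(
        {"demo_name": "lakehouse-retail-c360", "status": "failed"}))
    (demo_dir / "errors.json").write_text(json.dumps({"errors": []}))

    layout = sorted(p.relative_to(tmp).as_posix() for p in run_dir.rglob("*.json"))
    print(layout)
```

Because every write goes through small per-file JSON dumps, a crashed run leaves the other files intact and the state can be re-loaded partially.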
================================================
FILE: ai_release/runs/.gitignore
================================================
*.log
*/
!.gitignore
================================================
FILE: build-and-distribute.sh
================================================
#!/bin/bash
# Check for pending changes before doing anything
if ! git diff --quiet || ! git diff --cached --quiet; then
echo "Error: You have uncommitted changes."
echo "Please commit and push your changes before running a release."
echo ""
git status --short
exit 1
fi
if [ -n "$(git log origin/main..HEAD 2>/dev/null)" ]; then
echo "Error: You have unpushed commits."
echo "Please push your changes before running a release."
echo ""
git log origin/main..HEAD --oneline
exit 1
fi
# Check if gh CLI is installed
if ! command -v gh &> /dev/null; then
echo "Error: GitHub CLI (gh) is not installed. Please install it first."
echo "Visit: https://cli.github.com/"
exit 1
fi
# Check if pip-compile is installed (from pip-tools)
if ! command -v pip-compile &> /dev/null; then
echo "Error: pip-compile is not installed. Please install pip-tools first."
echo "Run: pip install pip-tools"
exit 1
fi
# Check authentication status
if ! gh auth status &> /dev/null; then
echo "GitHub CLI not authenticated. Please login..."
gh auth login
fi
# Check if active account is Enterprise Managed User (ends with _data)
ACTIVE_ACCOUNT=$(gh auth status | grep "Logged in to" | head -1 | sed 's/.*Logged in to github.com account \([^ ]*\).*/\1/')
if [[ "$ACTIVE_ACCOUNT" == *"_data" ]]; then
echo "Warning: Active account '$ACTIVE_ACCOUNT' appears to be an Enterprise Managed User"
echo "Switching to regular account..."
# Get list of available accounts by parsing auth status output
AVAILABLE_ACCOUNTS=$(gh auth status | grep "Logged in to" | sed 's/.*Logged in to github.com account \([^ ]*\).*/\1/')
# Find first account that doesn't end with _data
REGULAR_ACCOUNT=""
while IFS= read -r account; do
if [[ "$account" != *"_data" ]]; then
REGULAR_ACCOUNT="$account"
break
fi
done <<< "$AVAILABLE_ACCOUNTS"
if [[ -n "$REGULAR_ACCOUNT" ]]; then
echo "Switching to regular account: $REGULAR_ACCOUNT"
gh auth switch --user "$REGULAR_ACCOUNT" || {
echo "Error: Failed to switch to regular account"
exit 1
}
else
echo "Error: No regular account found. Please add a regular GitHub account:"
echo "gh auth login"
exit 1
fi
fi
# Check access to required repositories
echo "Checking access to required repositories..."
REPOS=("databricks-demos/dbdemos" "databricks-demos/dbdemos-notebooks" "databricks-demos/dbdemos-dataset" "databricks-demos/dbdemos-resources")
for repo in "${REPOS[@]}"; do
if ! gh api "repos/$repo" &> /dev/null; then
echo "Error: No access to repository $repo"
echo "Please ensure you have the necessary permissions or try logging in again:"
echo "gh auth login"
exit 1
fi
echo "✓ Access confirmed for $repo"
done
# Switch to main and pull latest
git checkout main || exit 1
git pull || exit 1
# Get current version from setup.py
CURRENT_VERSION=$(grep "version=" setup.py | sed "s/.*version='\([^']*\)'.*/\1/")
echo "Current version: $CURRENT_VERSION"
# Bump version (patch increment)
IFS='.' read -ra VERSION_PARTS <<< "$CURRENT_VERSION"
NEW_PATCH=$((VERSION_PARTS[2] + 1))
NEW_VERSION="${VERSION_PARTS[0]}.${VERSION_PARTS[1]}.$NEW_PATCH"
echo "New version: $NEW_VERSION"
# Update version in setup.py
sed -i.bak "s/version='[^']*'/version='$NEW_VERSION'/" setup.py
rm setup.py.bak
# Update version in __init__.py
sed -i.bak "s/__version__ = \"[^\"]*\"/__version__ = \"$NEW_VERSION\"/" dbdemos/__init__.py
rm dbdemos/__init__.py.bak
# Generate requirements.txt with hashes from trusted private index
echo "Generating requirements.txt with hashes..."
# Extract dependencies from setup.py and write to requirements.in
python3 -c "
import ast
import sys
with open('setup.py', 'r') as f:
content = f.read()
# Parse the setup.py file
tree = ast.parse(content)
# Find the setup() call and extract install_requires
for node in ast.walk(tree):
if isinstance(node, ast.Call) and getattr(node.func, 'id', None) == 'setup':
for keyword in node.keywords:
if keyword.arg == 'install_requires':
# Extract the list of dependencies
                deps = ast.literal_eval(keyword.value)
for dep in deps:
print(dep)
sys.exit(0)
print('Error: Could not extract install_requires from setup.py', file=sys.stderr)
sys.exit(1)
" > requirements.in
if [ $? -ne 0 ]; then
echo "Error: Failed to extract dependencies from setup.py"
exit 1
fi
echo "Extracted dependencies:"
cat requirements.in
# Run pip-compile with private index to get trusted hashes
PRIVATE_INDEX="https://pypi-proxy.dev.databricks.com/simple/"
pip-compile --generate-hashes --index-url="$PRIVATE_INDEX" --output-file=requirements.txt requirements.in
if [ $? -ne 0 ]; then
echo "Error: pip-compile failed"
exit 1
fi
# Remove the private index URL from requirements.txt (keep hashes, they're content-based)
sed -i.bak '/^--index-url/d' requirements.txt
# Also clean up the comment that references the private index
sed -i.bak "s|--index-url=$PRIVATE_INDEX ||g" requirements.txt
rm requirements.txt.bak
echo "requirements.txt generated with hashes (private index removed)"
# Use the version we just bumped
VERSION=$NEW_VERSION
echo "Using bumped version: $VERSION"
#package
rm -rf ./dist/*
rm -rf ./dbdemos/bundles/.DS_Store
python3 setup.py clean --all bdist_wheel
echo "Package built under dist/ - updating pypi with new version..."
ls -alh ./dist
if ! twine upload dist/*; then
echo "Error: Failed to upload package to PyPI"
exit 1
fi
echo "Upload ok - available as pip install dbdemos"
# Create or switch to release branch and commit the bumped version
echo "Creating/updating release branch with bumped version..."
git checkout -b release/v$VERSION 2>/dev/null || git checkout release/v$VERSION
git add setup.py dbdemos/__init__.py requirements.in requirements.txt
git commit -m "Bump version to $VERSION"
git push origin release/v$VERSION
# Create PR to main branch
echo "Creating pull request to main branch..."
if gh pr create --title "Release v$VERSION" --body "Automated release for version $VERSION" --base main --head release/v$VERSION; then
echo "Pull request created successfully"
else
echo "Warning: Failed to create pull request (may already exist)"
fi
# Also update main with the version bump so it doesn't get lost
echo "Syncing version bump to main..."
git checkout main
git add setup.py dbdemos/__init__.py requirements.in requirements.txt
git commit -m "Bump version to $VERSION"
git push origin main
# Find the wheel file
WHL_FILE=$(find ./dist -name "*.whl" | head -n 1)
if [ -z "$WHL_FILE" ]; then
echo "Error: No wheel file found in ./dist directory"
exit 1
fi
echo "Found wheel file: $WHL_FILE"
# Extract version from wheel filename (format: dbdemos-0.6.12-py3-none-any.whl)
VERSION=$(basename "$WHL_FILE" | sed -E 's/dbdemos-([0-9]+\.[0-9]+\.[0-9]+).*/\1/')
echo "Extracted version from wheel file: $VERSION"
# Function to create a release and upload asset using gh CLI
create_release_with_asset() {
local repo=$1
local tag_name="v$VERSION"
local release_name="v$VERSION"
echo "Creating release $release_name for $repo..."
# Create the release using gh CLI
if gh release create "$tag_name" "$WHL_FILE" --repo "$repo" --title "$release_name" --notes "Release version $VERSION"; then
echo "Release created and asset uploaded successfully for $repo"
return 0
else
echo "Error creating release for $repo"
return 1
fi
}
# Create releases with assets on all repositories
echo "Creating releases for version v$VERSION..."
create_release_with_asset "databricks-demos/dbdemos"
create_release_with_asset "databricks-demos/dbdemos-notebooks"
create_release_with_asset "databricks-demos/dbdemos-dataset"
create_release_with_asset "databricks-demos/dbdemos-resources"
echo "Release process completed for v$VERSION!"
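The IFS-based patch bump and the sed extraction of the version from the wheel filename can be sketched in Python for clarity (the helper names below are illustrative, not part of the release tooling):

```python
import re

def bump_patch(version: str) -> str:
    """Increment the patch component of an X.Y.Z version string."""
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"

def version_from_wheel(filename: str) -> str:
    """Extract X.Y.Z from a wheel name like dbdemos-0.6.12-py3-none-any.whl."""
    match = re.match(r"dbdemos-(\d+\.\d+\.\d+)", filename)
    return match.group(1) if match else None

assert bump_patch("0.6.34") == "0.6.35"
assert version_from_wheel("dbdemos-0.6.12-py3-none-any.whl") == "0.6.12"
```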
================================================
FILE: build.sh
================================================
python3 setup.py clean --all bdist_wheel
conda activate test_dbdemos
#pip3 install dist/dbdemos-0.3.0-py3-none-any.whl --force
#python3 test_package.py
#cp dist/dbdemos-* release/
================================================
FILE: dbdemos/__init__.py
================================================
__version__ = "0.6.34"
from .dbdemos import list_demos, install, create_cluster, help, install_all, check_status_all, check_status, get_html_list_demos
================================================
FILE: dbdemos/conf.py
================================================
import json
from pathlib import Path
from typing import List
import requests
import urllib
from datetime import date
import re
import threading
from requests import Response
def merge_dict(a, b, path=None, override = True):
    """Merges dict b into a (mutates a). Nested dicts are merged recursively; scalar values from b replace those in a only when override is True."""
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge_dict(a[key], b[key], path + [str(key)], override)
            elif override:
                a[key] = b[key]
        else:
            a[key] = b[key]
class Conf():
def __init__(self, username: str, workspace_url: str, org_id: str, pat_token: str, default_cluster_template: str = None, default_cluster_job_template = None,
repo_staging_path: str = None, repo_name: str = None, repo_url: str = None, branch: str = "master", github_token = None, run_test_as_username="quentin.ambard@databricks.com"):
self.username = username
name = self.username[:self.username.rfind('@')]
self.name = re.sub("[^A-Za-z0-9]", '_', name)
self.workspace_url = workspace_url
self.org_id = org_id
self.pat_token = pat_token
self.headers = {"Authorization": "Bearer " + pat_token, 'Content-type': 'application/json', 'User-Agent': 'dbdemos'}
self.default_cluster_template = default_cluster_template
self.default_cluster_job_template = default_cluster_job_template
self.repo_staging_path = repo_staging_path
self.repo_name = repo_name
assert repo_url is None or ".git" not in repo_url, "repo_url should not contain .git"
self.repo_url = repo_url
self.branch = branch
self.github_token = github_token
self.run_test_as_username = run_test_as_username
def get_repo_path(self):
return self.repo_staging_path+"/"+self.repo_name
#Add internal pool id to accelerate our demos & unit tests
def get_demo_pool(self):
if self.org_id == "1444828305810485" or "e2-demo-field-eng" in self.workspace_url:
return "0727-104344-hauls13-pool-uftxk0r6"
if self.org_id == "1660015457675682" or self.is_dev_env():
return "1025-140806-yup112-pool-yz565bma"
if self.org_id == "5206439413157315":
return "1010-172835-slues66-pool-7dhzc23j"
if self.org_id == "984752964297111":
return "1010-173019-honor44-pool-ksw4stjz"
if self.org_id == "2556758628403379":
return "1010-173021-dance560-pool-hl7wefwy"
return None
def is_dev_env(self):
return "e2-demo-tools" in self.workspace_url or "local" in self.workspace_url
    def is_demo_env(self):
        return "e2-demo-field-eng" in self.workspace_url or "eastus2" in self.workspace_url or self.org_id in ["1444828305810485"]
def is_fe_env(self):
return "e2-demo-field-eng" in self.workspace_url or "eastus2" in self.workspace_url or \
self.org_id in ["5206439413157315", "984752964297111", "local", "1444828305810485", "2556758628403379"]
class DBClient():
def __init__(self, conf: Conf):
self.conf = conf
def clean_path(self, path):
if path.startswith("http"):
raise Exception(f"Wrong path {path}, use with api path directly (no http://xxx..xxx).")
if path.startswith("/"):
path = path[1:]
if path.startswith("api/"):
path = path[len("api/"):]
return path
def post(self, path: str, json: dict = {}, retry = 0):
url = self.conf.workspace_url+"/api/"+self.clean_path(path)
with requests.post(url, headers = self.conf.headers, json=json, timeout=60) as r:
if r.status_code == 429 and retry < 2:
import time
import random
wait_time = 15 * (retry+1) + random.randint(2*retry, 10*retry)
print(f'WARN: hitting api request limit 429 error: {path}. Sleeping {wait_time}sec and retrying...')
time.sleep(wait_time)
print('Retrying call.')
return self.post(path, json, retry+1)
else:
return self.get_json_result(url, r)
def put(self, path: str, json: dict = None, data: bytes = None):
url = self.conf.workspace_url+"/api/"+self.clean_path(path)
headers = self.conf.headers
if data is not None:
files = {'file': ('file', data, 'application/octet-stream')}
with requests.put(url, headers=headers, files=files, timeout=60) as r:
return self.get_json_result(url, r)
else:
with requests.put(url, headers=headers, json=json, timeout=60) as r:
return self.get_json_result(url, r)
def patch(self, path: str, json: dict = {}):
url = self.conf.workspace_url+"/api/"+self.clean_path(path)
with requests.patch(url, headers = self.conf.headers, json=json, timeout=60) as r:
return self.get_json_result(url, r)
def get(self, path: str, params: dict = {}, print_auth_error = True):
url = self.conf.workspace_url+"/api/"+self.clean_path(path)
with requests.get(url, headers = self.conf.headers, params=params, timeout=60) as r:
return self.get_json_result(url, r, print_auth_error)
def delete(self, path: str, params: dict = {}):
url = self.conf.workspace_url+"/api/"+self.clean_path(path)
with requests.delete(url, headers = self.conf.headers, params=params, timeout=60) as r:
return self.get_json_result(url, r)
def get_json_result(self, url: str, r: Response, print_auth_error = True):
if r.status_code == 403:
if print_auth_error:
print(f"Unauthorized call. Check your PAT token {r.text} - {r.url} - {url}")
try:
return r.json()
except Exception as e:
print(f"API CALL ERROR - can't read json. status: {r.status_code} {r.text} - URL: {url} - {e}")
raise e
    def search_cluster(self, cluster_name: str, tags: dict):
        clusters = self.get("2.1/clusters/list")
        for c in clusters.get('clusters', []):
            if c['cluster_name'] == cluster_name:
                match = True
                #Check that all the required tags are set on the cluster
                for k, v in tags.items():
                    if k not in c.get('custom_tags', {}) or c['custom_tags'][k] != v:
match = False
if match:
return c
return None
def find_job(self, name, offset = 0, limit = 25):
        #Pass the raw name: requests URL-encodes query params, so pre-quoting would double-encode it
        r = self.get("2.1/jobs/list", {"limit": limit, "offset": offset, "name": name})
if 'jobs' in r:
for job in r['jobs']:
if job["settings"]["name"] == name:
return job
if r['has_more']:
return self.find_job(name, offset+limit, limit)
return None
class GenieRoom():
def __init__(self, id: str, display_name: str, description: str, table_identifiers: List[str], curated_questions: List[str], instructions: str, sql_instructions: List[dict], function_names: List[str], benchmarks:List[dict]):
self.display_name = display_name
self.id = id
self.description = description
self.instructions = instructions
self.table_identifiers = table_identifiers
self.sql_instructions = sql_instructions
self.curated_questions = curated_questions
self.function_names = function_names
self.benchmarks= benchmarks
class DataFolder():
def __init__(self, source_folder: str, source_format: str, target_table_name: str = None, target_volume_folder_name: str = None, target_format: str = "delta"):
        assert target_volume_folder_name or target_table_name, "Error: a data folder must have either target_table_name or target_volume_folder_name set"
self.source_folder = source_folder
self.source_format = source_format
self.target_table_name = target_table_name
self.target_format = target_format
self.target_volume_folder_name = target_volume_folder_name
class DemoNotebook():
def __init__(self, path: str, title: str, description: str, pre_run: bool = False, publish_on_website: bool = False,
add_cluster_setup_cell: bool = False, parameters: dict = {}, depends_on_previous: bool = True, libraries: list = [], warehouse_id = None, object_type = None):
self.path = path
self.title = title
self.description = description
self.pre_run = pre_run
self.publish_on_website = publish_on_website
self.add_cluster_setup_cell = add_cluster_setup_cell
self.parameters = parameters
self.depends_on_previous = depends_on_previous
self.libraries = libraries
self.warehouse_id = warehouse_id
self.object_type = object_type
def __repr__(self):
return self.path
    def get_folder(self):
        #Returns the folder containing the notebook within the demo
        return str(Path(self.get_clean_path()).parent)
def get_clean_path(self):
        #Some notebook paths are relative, like ../../demo-retail/lakehouse-retail/_resources/xxx
        #This function removes the leading parents and returns _resources/xxx
p = Path(self.path)
parent_count = p.parts.count('..')
if parent_count > 0:
return str(p.relative_to(*p.parts[:parent_count*2-1]))
return self.path
def toJSON(self):
return json.dumps(self, default=lambda o: o.__dict__)
class DemoConf():
def __init__(self, path: str, json_conf: dict, catalog:str = None, schema: str = None):
self.json_conf = json_conf
self.notebooks = []
self.cluster = json_conf.get('cluster', {})
self.cluster_libraries = json_conf.get('cluster_libraries', [])
self.workflows = json_conf.get('workflows', [])
self.pipelines = json_conf.get('pipelines', [])
self.repos = json_conf.get('repos', [])
self.serverless_supported = json_conf.get('serverless_supported', False)
self.init_job = json_conf.get('init_job', {})
self.job_id = None
self.run_id = None
if path.startswith('/'):
path = path[1:]
self.path = path
self.name = json_conf['name']
self.category = json_conf['category']
self.title = json_conf['title']
self.description = json_conf['description']
self.tags = json_conf.get('tags', [])
self.custom_schema_supported = json_conf.get('custom_schema_supported', False)
self.schema = schema
self.catalog = catalog
self.default_schema = json_conf.get('default_schema', "")
self.default_catalog = json_conf.get('default_catalog', "")
self.custom_message = json_conf.get('custom_message', "")
self.create_cluster = json_conf.get('create_cluster', True)
self.dashboards = json_conf.get('dashboards', [])
self.sql_queries = json_conf.get('sql_queries', [])
self.bundle = json_conf.get('bundle', False)
self.env_version = json_conf.get('env_version', 2)
self.data_folders: List[DataFolder] = []
for data_folder in json_conf.get('data_folders', []):
            self.data_folders.append(DataFolder(data_folder['source_folder'], data_folder['source_format'], data_folder.get('target_table_name', None),
                                                data_folder.get('target_volume_folder', None), data_folder.get('target_format', 'delta')))
self.genie_rooms: List[GenieRoom] = []
for genie_room in json_conf.get('genie_rooms', []):
self.genie_rooms.append(GenieRoom(genie_room['id'], genie_room.get('display_name', None), genie_room.get('description', None),
genie_room['table_identifiers'], genie_room.get('curated_questions', []),
genie_room.get('instructions', None), genie_room.get('sql_instructions', []),
genie_room.get('function_names', []),genie_room.get('benchmarks', [])))
for n in json_conf.get('notebooks', []):
add_cluster_setup_cell = n.get('add_cluster_setup_cell', False)
params = n.get('parameters', {})
depends_on_previous = n.get('depends_on_previous', True)
libraries = n.get('libraries', [])
warehouse_id = n.get('warehouse_id', None)
self.notebooks.append(DemoNotebook(n['path'], n['title'], n['description'], n['pre_run'], n['publish_on_website'],
add_cluster_setup_cell, params, depends_on_previous, libraries, warehouse_id, n.get('object_type', None)))
self._notebook_lock = threading.Lock()
def __repr__(self):
return self.path + "("+str(self.notebooks)+")"
def update_notebook_object_type(self, notebook: DemoNotebook, object_type: str):
with self._notebook_lock:
for n in self.json_conf['notebooks']:
if n['path'] == notebook.path:
n['object_type'] = object_type
break
def add_notebook(self, notebook):
self.notebooks.append(notebook)
#TODO: this isn't clean, need a better solution
self.json_conf["notebooks"].append(notebook.__dict__)
def set_pipeline_id(self, id, uid):
j = json.dumps(self.init_job)
j = j.replace("{{DYNAMIC_SDP_ID_"+id+"}}", uid)
self.init_job = json.loads(j)
j = json.dumps(self.workflows)
j = j.replace("{{DYNAMIC_SDP_ID_"+id+"}}", uid)
self.workflows = json.loads(j)
def get_job_name(self):
return "field-bundle_"+self.name
def get_notebooks_to_run(self):
return [n for n in self.notebooks if n.pre_run]
def get_notebooks_to_publish(self) -> List[DemoNotebook]:
return [n for n in self.notebooks if n.publish_on_website]
def get_bundle_path(self):
return self.get_bundle_root_path() + "/install_package"
def get_bundle_dashboard_path(self):
return self.get_bundle_root_path() + "/dashboards"
def get_bundle_root_path(self):
return "dbdemos/bundles/"+self.name
def get_minisite_path(self):
return "dbdemos/minisite/"+self.name
class ConfTemplate:
def __init__(self, username, demo_name, catalog = None, schema = None, demo_folder = ""):
self.catalog = catalog
self.schema = schema
self.username = username
self.demo_name = demo_name
self.demo_folder = demo_folder
def template_TODAY(self):
return date.today().strftime("%Y-%m-%d")
def template_CURRENT_USER(self):
return self.username
def template_CATALOG(self):
return self.catalog
def template_SCHEMA(self):
return self.schema
def template_CURRENT_USER_NAME(self):
name = self.username[:self.username.rfind('@')]
name = re.sub("[^A-Za-z0-9]", '_', name)
return name
def template_DEMO_NAME(self):
return self.demo_name
def template_DEMO_FOLDER(self):
return self.demo_folder
def template_SHARED_WAREHOUSE_ID(self):
return self.demo_folder
def replace_template_key(self, text: str):
for key in set(re.findall(r'\{\{(.*?)\}\}', text)):
if "Drift_detection" not in key: #TODO need to improve that, mlops demo has {{}} in the product like tasks.Drift_detection.values.all_violations_count
if not key.startswith("DYNAMIC") and not key.startswith("SHARED_WAREHOUSE"):
func = getattr(self, f"template_{key}")
replacement = func()
text = text.replace("{{"+key+"}}", replacement)
return text
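The `merge_dict` helper above recurses through nested dicts so that a demo's custom cluster config can override only the keys it sets. A minimal standalone sketch of the same semantics (`merge` is an illustrative name, not the repo's function):

```python
# Nested dicts are merged recursively, scalar values from b win only when
# override=True, and keys missing from a are always added.
def merge(a: dict, b: dict, override: bool = True) -> dict:
    for key, value in b.items():
        if key in a and isinstance(a[key], dict) and isinstance(value, dict):
            merge(a[key], value, override)
        elif key not in a or override:
            a[key] = value
    return a

base = {"cluster": {"num_workers": 2, "spark_version": "14.3.x"}, "name": "demo"}
merge(base, {"cluster": {"num_workers": 4}, "tags": ["test"]})
assert base["cluster"] == {"num_workers": 4, "spark_version": "14.3.x"}

kept = merge({"name": "demo"}, {"name": "other"}, override=False)
assert kept["name"] == "demo"  # existing scalar preserved when override=False
```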
================================================
FILE: dbdemos/dbdemos.py
================================================
from .exceptions.dbdemos_exception import TokenException
from .installer import Installer
from collections import defaultdict
from .installer_report import InstallerReport
CSS_LIST = """
<style>
.dbdemo {
font-family: -apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica Neue,Arial,Noto Sans,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol,Noto Color Emoji,FontAwesome;
color: #3b3b3b;
padding: 0px 0px 20px 0px;
}
.dbdemo_box {
width: 400px;
padding: 10px;
box-shadow: 0 .15rem 1.15rem 0 rgba(58,59,69,.15)!important;
float: left;
min-height: 170px;
margin: 0px 20px 20px 20px;
}
.dbdemo_category {
clear: both;
}
.category {
margin-left: 20px;
margin-bottom: 5px;
}
.dbdemo_logo {
width: 100%;
height: 225px;
}
.code {
padding: 5px;
border: 1px solid #e4e4e4;
font-family: monospace;
background-color: #f5f5f5;
margin: 5px 0px 0px 0px;
}
.dbdemo_description {
height: 100px;
}
.menu_button {
font-size: 15px;
cursor: pointer;
border: 0px;
padding: 10px 20px 10px 20px;
margin-right: 10px;
background-color: rgb(238, 237, 233);
border-radius: 20px;
}
.menu_button:hover {
background-color: rgb(245, 244, 242)
}
.menu_button.selected {
background-color: rgb(158, 214, 196)
}
.new_tag {
background-color: red;
color: white;
font-size: 13px;
padding: 2px 7px;
border-radius: 3px;
margin-right: 5px;
}
</style>
"""
JS_LIST = """<script>
const buttons = document.querySelectorAll('.menu_button');
const sections = document.querySelectorAll('.dbdemo_category');
buttons.forEach(button => {
button.addEventListener('click', () => {
const selectedCategory = button.getAttribute('category');
sections.forEach(section => {
if (section.id === `category-${selectedCategory}`) {
section.style.display = 'block';
} else {
section.style.display = 'none';
}
});
buttons.forEach(btn => {
if (btn === button) {
btn.classList.add('selected');
} else {
btn.classList.remove('selected');
}
});
});
});
</script>"""
def help():
installer = Installer()
if installer.report.displayHTML_available():
from dbruntime.display import displayHTML
displayHTML("""<style>
.dbdemos_install{
font-family: -apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica Neue,Arial,Noto Sans,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol,Noto Color Emoji,FontAwesome;
color: #3b3b3b;
box-shadow: 0 .15rem 1.15rem 0 rgba(58,59,69,.15)!important;
padding: 10px;
margin: 10px;
}
.code {
padding: 0px 5px;
border: 1px solid #e4e4e4;
font-family: monospace;
background-color: #f5f5f5;
margin: 5px 0px 0px 0px;
display: inline;
}
</style>
<div class="dbdemos_install">
<h1>DBDemos</h1>
<i>Install databricks demos: notebooks, Delta Live Table Pipeline, DBSQL Dashboards, ML Models etc.</i>
<ul>
<li>
<div class="code">dbdemos.help()</div>: display help.<br/><br/>
</li>
<li>
<div class="code">dbdemos.list_demos(category: str = None)</div>: list all demos available, can filter per category (ex: 'governance').<br/><br/>
</li>
<li>
<div class="code">dbdemos.install(demo_name: str, path: str = "./", overwrite: bool = False, use_current_cluster = False, username: str = None, pat_token: str = None, workspace_url: str = None, skip_dashboards: bool = False, cloud: str = "AWS", catalog: str = None, schema: str = None, serverless: bool = None, warehouse_name: str = None, skip_genie_rooms: bool = False, dlt_policy_id: str = None, dlt_compute_settings: dict = None)</div>: install the given demo to the given path.<br/><br/>
<ul>
<li>If overwrite is True, dbdemos will delete the given path folder and re-install the notebooks.</li>
<li>use_current_cluster = True will not start a new cluster to init the demo but use the current cluster instead. <strong>Set it to True if you don't have cluster creation permission</strong>.</li>
<li>skip_dashboards = True will not load the DBSQL dashboard if any (faster, use it if the dashboard generation creates some issue).</li>
<li>If no authentication is provided, dbdemos will use the current user's credentials, workspace and cloud to install the demo.</li>
<li>catalog and schema options let you choose where to load the data and other assets.</li>
<li>Dashboards require a warehouse, you can specify it with the warehouse_name='xx' option.</li>
<li>Dbdemos will detect serverless compute and use the current cluster when you're running serverless. You can force it with the serverless=True option.</li>
<li>Genie rooms are in beta. You can skip the genie room installation with skip_genie_rooms = True.</li>
<li>dlt_policy_id will be used in the dlt (example: "0003963E5B551CE4"). Use it with dlt_compute_settings = {"autoscale": {"min_workers": 1, "max_workers": 5}} to respect the policy requirements.</li>
</ul><br/>
</li>
<li>
<div class="code">dbdemos.create_cluster(demo_name: str)</div>: install or update the interactive cluster for the demo (scoped to the user).<br/><br/>
</li>
<li>
<div class="code">dbdemos.install_all(path: str = "./", overwrite: bool = False, username: str = None, pat_token: str = None, workspace_url: str = None, skip_dashboards: bool = False, cloud: str = "AWS")</div>: install all the demos to the given path.<br/><br/>
</li>
</ul>
</div>""")
else:
print("------------ DBDemos ------------------")
print("""dbdemos.help(): display help.""")
print("""dbdemos.list_demos(category: str = None): list all demos available, can filter per category (ex: 'governance').""")
print("""dbdemos.install(demo_name: str, path: str = "./", overwrite: bool = False, username: str = None, pat_token: str = None, workspace_url: str = None, skip_dashboards: bool = False, cloud: str = "AWS"): install the given demo to the given path.""")
    print("""dbdemos.create_cluster(demo_name: str): install or update the interactive cluster for the demo (scoped to the user).""")
    print("""dbdemos.install_all(path: str = "./", overwrite: bool = False, username: str = None, pat_token: str = None, workspace_url: str = None, skip_dashboards: bool = False, cloud: str = "AWS"): install all the demos to the given path.""")
def list_demos(category = None, installer = None, pat_token = None):
check_version()
deprecated_demos = ["uc-04-audit-log", "llm-dolly-chatbot"]
    if installer is None:
installer = Installer(pat_token=pat_token)
installer.tracker.track_list()
demos = defaultdict(lambda: [])
#Define category order
demos["lakehouse"] = []
demos["data-engineering"] = []
demos["governance"] = []
demos["DBSQL"] = []
demos["data-science"] = []
demos["AI-BI"] = []
for demo in installer.get_demos_available():
conf = installer.get_demo_conf(demo)
if (category is None or conf.category == category.lower()) and conf.name not in deprecated_demos:
demos[conf.category].append(conf)
if installer.report.displayHTML_available():
content = get_html_list_demos(demos)
from dbruntime.display import displayHTML
displayHTML(content)
else:
list_console(demos)
def get_html_list_demos(demos):
categories = list(demos.keys())
content = f"""{CSS_LIST}<div class="dbdemo">
<div style="padding: 10px 0px 20px 20px">"""
for i, cat in enumerate(categories):
content += f"""<button category="{cat}" class="menu_button {"selected" if i == 0 else ""}" type="button">{f'<span class="new_tag">NEW!</span>' if cat == 'AI-BI' else ''}<span>{cat.capitalize()}</span></button>"""
content += """</div>"""
for i, cat in enumerate(categories):
content += f"""<div class="dbdemo_category" style="min-height: 200px; display: {"block" if i == 0 else "none"}" id="category-{cat}">"""
ds = list(demos[cat])
ds.sort(key=lambda d: d.name)
for demo in ds:
content += f"""
<div class="dbdemo_box">
<img class="dbdemo_logo" src="https://github.com/databricks-demos/dbdemos-resources/raw/main/icon/{demo.name}.jpg" />
<div class="dbdemo_description">
<h2>{demo.title}</h2>
{demo.description}
</div>
<div class="code">
dbdemos.install('{demo.name}')
</div>
</div>"""
content += """</div>"""
content += f"""</div>{JS_LIST}"""
return content
def list_console(demos):
print("----------------------------------------------------")
print("----------------- Demos Available ------------------")
print("----------------------------------------------------")
categories = list(demos.keys())
for cat in categories:
print(f"{cat.capitalize()}")
ds = list(demos[cat])
ds.sort(key=lambda d: d.name)
for demo in ds:
print(f" - {demo.name}: {demo.title} ({demo.description}) => dbdemos.install('{demo.name}')")
print("")
print("----------------------------------------------------")
def list_delta_live_tables(category = None):
pass
def list_dashboards(category = None):
pass
def install(demo_name, path = None, overwrite = False, username = None, pat_token = None, workspace_url = None, skip_dashboards = False, cloud = "AWS", start_cluster: bool = None,
use_current_cluster: bool = False, current_cluster_id = None, warehouse_name = None, debug = False, catalog = None, schema = None, serverless=None, skip_genie_rooms=False,
create_schema=True, dlt_policy_id = None, dlt_compute_settings = None):
check_version()
if demo_name == "llm-fine-tuning" :
print("ERROR: llm-fine-tuning is deprecated and has been removed. You can restore it from an older dbdemos version: %pip install dbdemos==0.6.28")
return
elif demo_name == "chatbot-rag-llm" or demo_name == "llm-tools-functions" or demo_name == "llm-rag-chatbot":
print(f"ERROR: {demo_name} is deprecated and has been removed. You can restore it from an older dbdemos version: %pip install dbdemos==0.6.28")
        print("We will install the new ai-agent demo instead")
demo_name = "ai-agent"
elif demo_name == "dlt-loans" or demo_name == "dlt-loan" :
print("ERROR: dlt-loans is deprecated and has been removed. You can restore it from an older dbdemos version: %pip install dbdemos==0.6.28")
        print("We will install the new pipeline-bike demo instead")
demo_name = "pipeline-bike"
elif demo_name == "dlt-unit-test":
print("WARN: dlt-unit-test has been renamed to declarative-pipeline-unit-test")
demo_name = "declarative-pipeline-unit-test"
elif demo_name == "dlt-cdc":
print("WARN: dlt-cdc has been renamed to declarative-pipeline-cdc")
demo_name = "declarative-pipeline-cdc"
elif demo_name == "lakehouse-retail-churn":
print("WARN: lakehouse-retail-churn has been renamed to lakehouse-retail-c360")
demo_name = "lakehouse-retail-c360"
elif demo_name == "identity-pk-fk":
print("WARN: identity-pk-fk has been renamed to sql-warehouse")
demo_name = "sql-warehouse"
elif demo_name == "auto-loader":
print("WARN: auto-loader has been renamed to data-ingestion")
demo_name = "data-ingestion"
try:
installer = Installer(username, pat_token, workspace_url, cloud, current_cluster_id = current_cluster_id)
    except TokenException as e:
        report = InstallerReport(workspace_url)
        report.display_token_error(e, demo_name)
        return
if not installer.test_premium_pricing():
#Force dashboard skip as dbsql isn't available to avoid any error.
skip_dashboards = True
installer.install_demo(demo_name, path, overwrite, skip_dashboards = skip_dashboards, start_cluster = start_cluster, use_current_cluster = use_current_cluster,
debug = debug, catalog = catalog, schema = schema, serverless = serverless, warehouse_name=warehouse_name, skip_genie_rooms=skip_genie_rooms, create_schema=create_schema, dlt_policy_id = dlt_policy_id, dlt_compute_settings = dlt_compute_settings)
def install_all(path = None, overwrite = False, username = None, pat_token = None, workspace_url = None, skip_dashboards = False, cloud = "AWS", start_cluster = None, use_current_cluster = False, catalog = None, schema = None, dlt_policy_id = None, dlt_compute_settings = None):
"""
Install all the bundle demos.
"""
installer = Installer(username, pat_token, workspace_url, cloud)
for demo_name in installer.get_demos_available():
installer.install_demo(demo_name, path, overwrite, skip_dashboards = skip_dashboards, start_cluster = start_cluster, use_current_cluster = use_current_cluster, catalog = catalog, schema = schema, dlt_policy_id = dlt_policy_id, dlt_compute_settings = dlt_compute_settings)
def check_status_all(username = None, pat_token = None, workspace_url = None, cloud = "AWS"):
"""
Check all dbdemos bundle demos installation status (see #check_status)
"""
installer = Installer(username, pat_token, workspace_url, cloud)
for demo_name in installer.get_demos_available():
check_status(demo_name, username, pat_token, workspace_url, cloud)
def check_status(demo_name:str, username = None, pat_token = None, workspace_url = None, cloud = "AWS", catalog = None, schema = None):
"""
    Check the status of the given demo installation. Polls the installation job if any and waits for its completion.
    Throws an error if the job wasn't successful.
"""
installer = Installer(username, pat_token, workspace_url, cloud)
demo_conf = installer.get_demo_conf(demo_name, catalog, schema)
if schema is None:
schema = demo_conf.default_schema
if catalog is None:
catalog = demo_conf.default_catalog
if "settings" in demo_conf.init_job:
job_name = demo_conf.init_job["settings"]["name"]
existing_job = installer.db.find_job(job_name)
if existing_job is None:
raise Exception(f"Couldn't find job for demo {demo_name}. Did you install it first?")
installer.installer_workflow.wait_for_run_completion(existing_job['job_id'], debug=True)
runs = installer.db.get("2.1/jobs/runs/list", {"job_id": existing_job['job_id'], "limit": 1})
if runs['runs'][0]['state']['result_state'] != "SUCCESS":
raise Exception(f"Job {existing_job['job_id']} for demo {demo_name} failed: {installer.db.conf.workspace_url}/#job/{existing_job['job_id']}/run/{runs['runs'][0]['run_id']} - {runs}")
def create_cluster(demo_name, username = None, pat_token = None, workspace_url = None, cloud = "AWS"):
installer = Installer(username, pat_token, workspace_url, cloud = cloud)
installer.check_demo_name(demo_name)
print(f"Updating cluster for demo {demo_name}...")
demo_conf = installer.get_demo_conf(demo_name)
installer.tracker.track_create_cluster(demo_conf.category, demo_name)
cluster_id, cluster_name = installer.load_demo_cluster(demo_name, demo_conf, True)
installer.report.display_install_result(demo_name, demo_conf.description, demo_conf.title, cluster_id = cluster_id, cluster_name = cluster_name)
def check_version():
"""
Check if a newer version of dbdemos is available on PyPI.
Prints a warning if the installed version is outdated.
"""
try:
import pkg_resources
import requests
import json
# Get installed version
installed_version = pkg_resources.get_distribution('dbdemos').version
# Get latest version from PyPI
pypi_response = requests.get("https://pypi.org/pypi/dbdemos/json")
latest_version = json.loads(pypi_response.text)['info']['version']
# Compare versions
if pkg_resources.parse_version(latest_version) > pkg_resources.parse_version(installed_version):
print(f"\nWARNING: You are using dbdemos version {installed_version}, however version {latest_version} is available. You should consider upgrading:")
print("%pip install --upgrade dbdemos")
print("dbutils.library.restartPython()")
except Exception as e:
# Silently handle any errors during version check
pass
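A minimal sketch of the ordering that check_version above relies on, assuming setuptools' pkg_resources is importable; the version strings are hypothetical:

```python
# Sketch of the comparison used by check_version (hypothetical versions).
# pkg_resources.parse_version compares release segments numerically,
# so "0.10.0" is newer than "0.9.9" even though a string compare disagrees.
from pkg_resources import parse_version

installed_version = "0.5.0"   # hypothetical installed version
latest_version = "0.6.9"      # hypothetical version fetched from PyPI

if parse_version(latest_version) > parse_version(installed_version):
    print(f"WARNING: dbdemos {installed_version} is outdated, {latest_version} is available.")
```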
================================================
FILE: dbdemos/exceptions/__init__.py
================================================
================================================
FILE: dbdemos/exceptions/dbdemos_exception.py
================================================
class TokenException(Exception):
def __init__(self, message):
super().__init__(message)
self.message = message
class ClusterException(Exception):
def __init__(self, message, cluster_conf, response):
super().__init__(message)
self.response = response
self.cluster_conf = cluster_conf
class ClusterPermissionException(ClusterException):
def __init__(self, message, cluster_conf, response):
super().__init__(message, cluster_conf, response)
class ClusterCreationException(ClusterException):
def __init__(self, message, cluster_conf, response):
super().__init__(message, cluster_conf, response)
class GenieCreationException(Exception):
def __init__(self, message, genie_conf, response):
super().__init__(message)
self.response = response
self.genie_conf = genie_conf
class ExistingResourceException(Exception):
def __init__(self, install_path, response):
super().__init__(f"Folder {install_path} isn't empty.")
self.install_path = install_path
self.response = response
class SQLQueryException(Exception):
def __init__(self, message):
super().__init__(message)
class DataLoaderException(Exception):
def __init__(self, message):
super().__init__(message)
class FolderDeletionException(Exception):
def __init__(self, install_path, response):
super().__init__(f"Can't delete folder {install_path}.")
self.install_path = install_path
self.response = response
class FolderCreationException(Exception):
def __init__(self, install_path, response):
super().__init__(f"Can't create folder {install_path}.")
self.install_path = install_path
self.response = response
class SDPException(Exception):
def __init__(self, message, description, pipeline_conf, response):
super().__init__(message)
self.description = description
self.pipeline_conf = pipeline_conf
self.response = response
class SDPNotAvailableException(SDPException):
def __init__(self, message, pipeline_conf, response):
super().__init__("SDP not available", message, pipeline_conf, response)
class SDPCreationException(SDPException):
def __init__(self, message, pipeline_conf, response):
super().__init__("SDP creation failure", message, pipeline_conf, response)
class WorkflowException(Exception):
def __init__(self, message, details, job_config, response):
super().__init__(message)
self.details = details
self.job_config = job_config
self.response = response
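A self-contained sketch of why the hierarchy above is useful: one `except ClusterException` handler covers both the permission and the creation subclasses while still exposing `cluster_conf` and `response`. The classes are re-declared here in trimmed form so the snippet runs on its own, and the failure payload is hypothetical:

```python
# Trimmed re-declaration of the hierarchy above, just for a runnable demo.
class ClusterException(Exception):
    def __init__(self, message, cluster_conf, response):
        super().__init__(message)
        self.cluster_conf = cluster_conf
        self.response = response

class ClusterCreationException(ClusterException):
    pass

# Hypothetical failure payload: a single handler catches any cluster error.
try:
    raise ClusterCreationException("node type unavailable",
                                   {"num_workers": 1},
                                   {"error_code": "INVALID_PARAMETER_VALUE"})
except ClusterException as e:
    print(e.response["error_code"])  # prints INVALID_PARAMETER_VALUE
```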
================================================
FILE: dbdemos/installer.py
================================================
import collections
import pkg_resources
from .conf import DBClient, DemoConf, Conf, ConfTemplate, merge_dict, DemoNotebook
from .exceptions.dbdemos_exception import ClusterPermissionException, ClusterCreationException, ClusterException, \
ExistingResourceException, FolderDeletionException, SDPNotAvailableException, SDPCreationException, SDPException, \
FolderCreationException, TokenException
from .installer_report import InstallerReport
from .installer_genie import InstallerGenie
from .installer_dashboard import InstallerDashboard
from .tracker import Tracker
from .notebook_parser import NotebookParser
from .installer_workflows import InstallerWorkflow
from .installer_repos import InstallerRepo
from pathlib import Path
import time
import json
import re
import base64
from concurrent.futures import ThreadPoolExecutor
from datetime import date
import urllib
import threading
from dbdemos.sql_query import SQLQueryExecutor
from databricks.sdk import WorkspaceClient
class Installer:
def __init__(self, username = None, pat_token = None, workspace_url = None, cloud = "AWS", org_id: str = None, current_cluster_id: str = None):
self.cloud = cloud
self.dbutils = None
if username is None:
username = self.get_current_username()
if workspace_url is None:
workspace_url = self.get_current_url()
if pat_token is None:
pat_token = self.get_current_pat_token()
if org_id is None:
org_id = self.get_org_id()
self.current_cluster_id = current_cluster_id
if self.current_cluster_id is None:
self.current_cluster_id = self.get_current_cluster_id()
conf = Conf(username, workspace_url, org_id, pat_token)
self.tracker = Tracker(org_id, self.get_uid(), username)
self.db = DBClient(conf)
self.report = InstallerReport(self.db.conf.workspace_url)
self.installer_workflow = InstallerWorkflow(self)
self.installer_repo = InstallerRepo(self)
self.installer_dashboard = InstallerDashboard(self)
self.installer_genie = InstallerGenie(self)
self.sql_query_executor = SQLQueryExecutor()
#Slow down on GCP as the dashboard API is very sensitive to back-pressure.
# 1 dashboard at a time to reduce import pressure as it seems to be creating new errors.
self.max_workers = 1 if self.get_current_cloud() == "GCP" else 1
def get_dbutils(self):
if self.dbutils is None:
try:
from pyspark.sql import SparkSession
spark = SparkSession.getActiveSession()
from pyspark.dbutils import DBUtils
self.dbutils = DBUtils(spark)
except:
try:
import IPython
self.dbutils = IPython.get_ipython().user_ns["dbutils"]
except:
#Can't get dbutils (local run)
return None
return self.dbutils
def get_current_url(self):
try:
return "https://"+self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().browserHostName().get()
except:
try:
return "https://"+self.get_dbutils_tags_safe()['browserHostName']
except:
return "local"
def get_dbutils_tags_safe(self):
import json
return json.loads(self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().safeToJson())['attributes']
def get_current_cluster_id(self):
try:
return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().tags().apply('clusterId')
except:
try:
return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().clusterId().get()
except:
try:
return self.get_dbutils_tags_safe()['clusterId']
except:
return "local"
def get_org_id(self):
try:
return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().tags().apply('orgId')
except:
try:
return self.get_dbutils_tags_safe()['orgId']
except:
return "local"
def get_uid(self):
try:
return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().tags().apply('userId')
except:
return "local"
def get_current_folder(self):
try:
current_notebook = self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
return current_notebook[:current_notebook.rfind("/")]
except:
try:
current_notebook = self.get_dbutils_tags_safe()['notebook_path']
return current_notebook[:current_notebook.rfind("/")]
except:
return "local"
def get_workspace_id(self):
try:
return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().workspaceId().get()
except:
try:
return self.get_dbutils_tags_safe()['orgId']
except:
return "local"
def get_current_pat_token(self):
try:
token = self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
except Exception as e:
raise TokenException("Couldn't get a PAT Token: "+str(e)+". If you're installing locally or from a batch job, please pass the pat_token='xxx' parameter instead, ideally loaded from a secret.")
if len(token) == 0:
raise TokenException("Empty PAT Token.")
return token
def get_current_username(self):
try:
return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().tags().apply('user')
except Exception as e2:
try:
return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().userName().get()
except Exception as e:
try:
return self.get_dbutils_tags_safe()['user']
except:
print(f"WARN: couldn't get current username. This shouldn't happen - unpredictable behavior - 2 errors: {e2} - {e} - will return 'unknown'")
return "unknown"
def get_current_cloud(self):
try:
hostname = self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().browserHostName().get()
except:
print(f"WARNING: Can't get cloud from dbutils. Fallback to default local cloud {self.cloud}")
return self.cloud
if "gcp" in hostname:
return "GCP"
elif "azure" in hostname:
return "AZURE"
else:
return "AWS"
def get_current_cluster_id(self):
try:
return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().tags().apply('clusterId')
except:
try:
return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().clusterId().get()
except:
try:
return self.get_dbutils_tags_safe()['clusterId']
except:
return "local"
def get_workspace_url(self):
try:
workspace_url = "https://"+self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().browserHostName().get()
except Exception as e:
raise Exception("Couldn't get workspace URL: "+str(e))
return workspace_url
def check_demo_name(self, demo_name):
demos = collections.defaultdict(lambda: [])
#Define category order
demos["lakehouse"] = []
demo_availables = self.get_demos_available()
if demo_name not in demo_availables:
for demo in demo_availables:
conf = self.get_demo_conf(demo)
demos[conf.category].append(conf)
self.report.display_demo_name_error(demo_name, demos)
def get_demos_available(self):
return set(pkg_resources.resource_listdir("dbdemos", "bundles"))
def get_demo_conf(self, demo_name:str, catalog:str = None, schema:str = None, demo_folder: str = ""):
demo = self.get_resource(f"bundles/{demo_name}/conf.json")
raw_demo = json.loads(demo)
catalog = catalog if catalog is not None else raw_demo.get('default_catalog', None)
schema = schema if schema is not None else raw_demo.get('default_schema', None)
conf_template = ConfTemplate(self.db.conf.username, demo_name, catalog, schema, demo_folder)
return DemoConf(demo_name, json.loads(conf_template.replace_template_key(demo)), catalog, schema)
def get_resource(self, path, decode=True):
resource = pkg_resources.resource_string("dbdemos", path)
return resource.decode('UTF-8') if decode else resource
def resource_isdir(self, path):
return pkg_resources.resource_isdir("dbdemos", path)
def test_premium_pricing(self):
try:
w = self.db.get("2.0/sql/config/warehouses", {"limit": 1}, print_auth_error = False)
if "error_code" in w and (w["error_code"] == "FEATURE_DISABLED" or w["error_code"] == "ENDPOINT_NOT_FOUND"):
self.report.display_non_premium_warn(Exception("DBSQL isn't available, either at the workspace level or due to a missing user entitlement."), w)
return False
return True
except Exception as e:
print(e)
self.report.display_non_premium_warn(Exception(f"DBSQL not available"), str(e))
return False
def cluster_is_serverless(self):
try:
cluster_details = self.db.get("2.0/clusters/get", {"cluster_id": self.get_current_cluster_id()})
return cluster_details.get("enable_serverless_compute", False)
except Exception as e:
print(f"Couldn't get cluster serverless status. Will consider it False. {e}")
return False
def create_or_check_schema(self, demo_conf: DemoConf, create_schema: bool, debug=True):
"""Create or verify schema exists based on create_schema parameter"""
ws = WorkspaceClient(token=self.db.conf.pat_token, host=self.db.conf.workspace_url)
try:
catalog = ws.catalogs.get(demo_conf.catalog)
except Exception as e:
if create_schema:
if debug:
print(f"Can't describe catalog {demo_conf.catalog}. Will now try to create it. Error: {e}")
try:
print(f"Catalog {demo_conf.catalog} doesn't exist. Creating it. You can set create_schema=False to avoid catalog and schema creation, or install in another catalog with catalog=<catalog_name>.")
self.sql_query_executor.execute_query(ws, f"CREATE CATALOG IF NOT EXISTS {demo_conf.catalog}")
#note: ws.catalogs.create(demo_conf.catalog) this doesn't work properly in serverless workspaces with default storage for now (Metastore storage root URL does not exist error)
except Exception as e:
self.report.display_schema_creation_error(e, demo_conf)
else:
self.report.display_schema_not_found_error(e, demo_conf)
schema_full_name = f"{demo_conf.catalog}.{demo_conf.schema}"
try:
schema = ws.schemas.get(schema_full_name)
except Exception as e:
if create_schema:
if debug:
print(f"Can't describe schema {schema_full_name}. Will now try to create it. Error: {e}")
try:
schema = ws.schemas.create(demo_conf.schema, catalog_name=demo_conf.catalog)
except Exception as e:
self.report.display_schema_creation_error(e, demo_conf)
else:
self.report.display_schema_not_found_error(e, demo_conf)
def install_demo(self, demo_name, install_path, overwrite=False, update_cluster_if_exists = True, skip_dashboards = False, start_cluster = None,
use_current_cluster = False, debug = False, catalog = None, schema = None, serverless=False, warehouse_name = None, skip_genie_rooms=False,
create_schema=True, dlt_policy_id = None, dlt_compu
SYMBOL INDEX (376 symbols across 26 files)
FILE: ai_release/bundle.py
function load_config (line 54) | def load_config(args):
function load_cluster_templates (line 94) | def load_cluster_templates():
function create_conf (line 107) | def create_conf(config):
function list_demos (line 132) | def list_demos(bundler: JobBundler):
function get_job_status (line 145) | def get_job_status(bundler: JobBundler, demo_name: str):
function wait_for_run (line 210) | def wait_for_run(bundler: JobBundler, job_id: int, run_id: int):
function repair_job (line 233) | def repair_job(bundler: JobBundler, demo_name: str, wait: bool = False):
function cleanup_demo_schema (line 295) | def cleanup_demo_schema(bundler: JobBundler, demo_conf):
function bundle_demo (line 329) | def bundle_demo(bundler: JobBundler, demo_path: str, force: bool = False...
function bundle_all (line 378) | def bundle_all(bundler: JobBundler, force: bool = False, cleanup_schema:...
function find_demo_path (line 432) | def find_demo_path(bundler: JobBundler, demo_name: str) -> str:
function main (line 465) | def main():
FILE: ai_release/compute.py
class ExecutionResult (line 26) | class ExecutionResult:
method __init__ (line 29) | def __init__(
method __repr__ (line 57) | def __repr__(self):
method to_dict (line 62) | def to_dict(self) -> Dict[str, Any]:
function get_workspace_client (line 83) | def get_workspace_client(host: str, token: str) -> WorkspaceClient:
function list_clusters (line 94) | def list_clusters(client: WorkspaceClient, include_terminated: bool = Fa...
function find_cluster_by_name (line 130) | def find_cluster_by_name(client: WorkspaceClient, name_pattern: str) -> ...
function start_cluster (line 157) | def start_cluster(client: WorkspaceClient, cluster_id: str) -> Dict[str,...
function get_cluster_status (line 190) | def get_cluster_status(client: WorkspaceClient, cluster_id: str) -> Dict...
function wait_for_cluster (line 200) | def wait_for_cluster(client: WorkspaceClient, cluster_id: str, timeout: ...
function create_context (line 221) | def create_context(client: WorkspaceClient, cluster_id: str, language: s...
function destroy_context (line 230) | def destroy_context(client: WorkspaceClient, cluster_id: str, context_id...
function execute_command (line 235) | def execute_command(
function execute_file (line 359) | def execute_file(
FILE: ai_release/inspect_jobs.py
function format_timestamp (line 46) | def format_timestamp(ts: int) -> str:
function format_duration (line 53) | def format_duration(start: int, end: int) -> str:
function print_job_list (line 66) | def print_job_list(jobs: list, failed_only: bool = False):
function print_fix_workflow (line 102) | def print_fix_workflow(job: JobInfo, inspector: JobInspector):
function print_job_details (line 149) | def print_job_details(job: JobInfo, inspector: JobInspector, show_errors...
function print_task_output (line 242) | def print_task_output(inspector: JobInspector, task_run_id: int):
function main (line 277) | def main():
FILE: ai_release/jobs.py
class NotebookError (line 25) | class NotebookError:
class TaskResult (line 36) | class TaskResult:
method failed (line 48) | def failed(self) -> bool:
method get_error_summary (line 51) | def get_error_summary(self) -> str:
class JobRunResult (line 77) | class JobRunResult:
method succeeded (line 91) | def succeeded(self) -> bool:
method failed (line 95) | def failed(self) -> bool:
method running (line 99) | def running(self) -> bool:
method failed_tasks (line 103) | def failed_tasks(self) -> List[TaskResult]:
method get_failure_summary (line 106) | def get_failure_summary(self) -> str:
class JobInfo (line 132) | class JobInfo:
class JobInspector (line 142) | class JobInspector:
method __init__ (line 164) | def __init__(self, host: str, token: str, github_token: str = None, re...
method _github_get (line 179) | def _github_get(self, path: str) -> dict:
method list_bundle_jobs (line 191) | def list_bundle_jobs(self, include_run_details: bool = True) -> List[J...
method find_job (line 233) | def find_job(self, demo_name: str) -> Optional[JobInfo]:
method get_job_run_details (line 257) | def get_job_run_details(self, job_id: int, run_id: int = None) -> Opti...
method _parse_run (line 289) | def _parse_run(self, run, job_name: str) -> JobRunResult:
method get_task_output (line 338) | def get_task_output(self, task_run_id: int) -> Optional[Dict[str, Any]]:
method export_notebook_html (line 359) | def export_notebook_html(self, task_run_id: int) -> Optional[str]:
method extract_errors_from_html (line 378) | def extract_errors_from_html(self, html_content: str) -> List[Notebook...
method get_task_errors (line 451) | def get_task_errors(self, task: TaskResult) -> TaskResult:
method get_head_commit (line 483) | def get_head_commit(self) -> Optional[str]:
method check_job_up_to_date (line 497) | def check_job_up_to_date(self, job_info: JobInfo) -> bool:
method get_failed_jobs (line 518) | def get_failed_jobs(self) -> List[JobInfo]:
method get_job_url (line 523) | def get_job_url(self, job_id: int, run_id: int = None) -> str:
function load_inspector_from_config (line 530) | def load_inspector_from_config() -> JobInspector:
FILE: ai_release/run_remote.py
function load_config (line 58) | def load_config():
function save_context (line 77) | def save_context(cluster_id: str, context_id: str):
function load_context (line 84) | def load_context():
function clear_context (line 92) | def clear_context():
function main (line 99) | def main():
FILE: ai_release/run_state.py
class DemoRunState (line 28) | class DemoRunState:
method to_dict (line 40) | def to_dict(self) -> dict:
method from_dict (line 44) | def from_dict(cls, data: dict) -> "DemoRunState":
class RunState (line 49) | class RunState:
method to_dict (line 56) | def to_dict(self) -> dict:
method from_dict (line 65) | def from_dict(cls, data: dict) -> "RunState":
class RunStateManager (line 75) | class RunStateManager:
method __init__ (line 78) | def __init__(self, commit_id: Optional[str] = None):
method _get_current_commit (line 87) | def _get_current_commit(self) -> str:
method _load_or_create_state (line 102) | def _load_or_create_state(self) -> RunState:
method save (line 110) | def save(self):
method get_demo_dir (line 116) | def get_demo_dir(self, demo_name: str) -> Path:
method get_demo_state (line 122) | def get_demo_state(self, demo_name: str) -> DemoRunState:
method update_demo_status (line 128) | def update_demo_status(self, demo_name: str, status: str, **kwargs):
method _save_demo_status (line 144) | def _save_demo_status(self, demo_name: str, state: DemoRunState):
method save_errors (line 150) | def save_errors(self, demo_name: str, errors: List[Dict[str, Any]]):
method save_job_output (line 159) | def save_job_output(self, demo_name: str, output: str):
method add_fix_attempt (line 165) | def add_fix_attempt(self, demo_name: str, description: str, branch: st...
method update_fix_result (line 183) | def update_fix_result(self, demo_name: str, result: str):
method add_note (line 190) | def add_note(self, demo_name: str, note: str):
method get_summary (line 199) | def get_summary(self) -> str:
method list_runs (line 229) | def list_runs(cls) -> List[str]:
method get_latest_run (line 236) | def get_latest_run(cls) -> Optional["RunStateManager"]:
function get_run_state (line 245) | def get_run_state(commit_id: Optional[str] = None) -> RunStateManager:
function get_latest_run (line 250) | def get_latest_run() -> Optional[RunStateManager]:
FILE: dbdemos/conf.py
function merge_dict (line 13) | def merge_dict(a, b, path=None, override = True):
class Conf (line 25) | class Conf():
method __init__ (line 26) | def __init__(self, username: str, workspace_url: str, org_id: str, pat...
method get_repo_path (line 45) | def get_repo_path(self):
method get_demo_pool (line 49) | def get_demo_pool(self):
method is_dev_env (line 62) | def is_dev_env(self):
method is_demo_env (line 65) | def is_demo_env(self):
method is_fe_env (line 68) | def is_fe_env(self):
class DBClient (line 72) | class DBClient():
method __init__ (line 73) | def __init__(self, conf: Conf):
method clean_path (line 76) | def clean_path(self, path):
method post (line 85) | def post(self, path: str, json: dict = {}, retry = 0):
method put (line 99) | def put(self, path: str, json: dict = None, data: bytes = None):
method patch (line 110) | def patch(self, path: str, json: dict = {}):
method get (line 115) | def get(self, path: str, params: dict = {}, print_auth_error = True):
method delete (line 120) | def delete(self, path: str, params: dict = {}):
method get_json_result (line 125) | def get_json_result(self, url: str, r: Response, print_auth_error = Tr...
method search_cluster (line 135) | def search_cluster(self, cluster_name: str, tags: dict):
method find_job (line 148) | def find_job(self, name, offset = 0, limit = 25):
class GenieRoom (line 158) | class GenieRoom():
method __init__ (line 159) | def __init__(self, id: str, display_name: str, description: str, table...
class DataFolder (line 170) | class DataFolder():
method __init__ (line 171) | def __init__(self, source_folder: str, source_format: str, target_tabl...
class DemoNotebook (line 179) | class DemoNotebook():
method __init__ (line 180) | def __init__(self, path: str, title: str, description: str, pre_run: b...
method __repr__ (line 194) | def __repr__(self):
method get_folder (line 197) | def get_folder(self):
method get_clean_path (line 201) | def get_clean_path(self):
method toJSON (line 211) | def toJSON(self):
class DemoConf (line 214) | class DemoConf():
method __init__ (line 215) | def __init__(self, path: str, json_conf: dict, catalog:str = None, sch...
method __repr__ (line 270) | def __repr__(self):
method update_notebook_object_type (line 273) | def update_notebook_object_type(self, notebook: DemoNotebook, object_t...
method add_notebook (line 280) | def add_notebook(self, notebook):
method set_pipeline_id (line 285) | def set_pipeline_id(self, id, uid):
method get_job_name (line 293) | def get_job_name(self):
method get_notebooks_to_run (line 296) | def get_notebooks_to_run(self):
method get_notebooks_to_publish (line 299) | def get_notebooks_to_publish(self) -> List[DemoNotebook]:
method get_bundle_path (line 302) | def get_bundle_path(self):
method get_bundle_dashboard_path (line 305) | def get_bundle_dashboard_path(self):
method get_bundle_root_path (line 308) | def get_bundle_root_path(self):
method get_minisite_path (line 311) | def get_minisite_path(self):
class ConfTemplate (line 315) | class ConfTemplate:
method __init__ (line 316) | def __init__(self, username, demo_name, catalog = None, schema = None,...
method template_TODAY (line 323) | def template_TODAY(self):
method template_CURRENT_USER (line 326) | def template_CURRENT_USER(self):
method template_CATALOG (line 329) | def template_CATALOG(self):
method template_SCHEMA (line 332) | def template_SCHEMA(self):
method template_CURRENT_USER_NAME (line 335) | def template_CURRENT_USER_NAME(self):
method template_DEMO_NAME (line 340) | def template_DEMO_NAME(self):
method template_DEMO_FOLDER (line 343) | def template_DEMO_FOLDER(self):
method template_SHARED_WAREHOUSE_ID (line 346) | def template_SHARED_WAREHOUSE_ID(self):
method replace_template_key (line 349) | def replace_template_key(self, text: str):
FILE: dbdemos/dbdemos.py
function help (line 96) | def help():
function list_demos (line 157) | def list_demos(category = None, installer = None, pat_token = None):
function get_html_list_demos (line 182) | def get_html_list_demos(demos):
function list_console (line 210) | def list_console(demos):
function list_delta_live_tables (line 224) | def list_delta_live_tables(category = None):
function list_dashboards (line 227) | def list_dashboards(category = None):
function install (line 230) | def install(demo_name, path = None, overwrite = False, username = None, ...
function install_all (line 277) | def install_all(path = None, overwrite = False, username = None, pat_tok...
function check_status_all (line 285) | def check_status_all(username = None, pat_token = None, workspace_url = ...
function check_status (line 293) | def check_status(demo_name:str, username = None, pat_token = None, works...
function create_cluster (line 315) | def create_cluster(demo_name, username = None, pat_token = None, workspa...
function check_version (line 325) | def check_version():
FILE: dbdemos/exceptions/dbdemos_exception.py
class TokenException (line 2) | class TokenException(Exception):
method __init__ (line 3) | def __init__(self, message):
class ClusterException (line 7) | class ClusterException(Exception):
method __init__ (line 8) | def __init__(self, message, cluster_conf, response):
class ClusterPermissionException (line 14) | class ClusterPermissionException(ClusterException):
method __init__ (line 15) | def __init__(self, message, cluster_conf, response):
class ClusterCreationException (line 19) | class ClusterCreationException(ClusterException):
method __init__ (line 20) | def __init__(self, message, cluster_conf, response):
class GenieCreationException (line 24) | class GenieCreationException(Exception):
method __init__ (line 25) | def __init__(self, message, genie_conf, response):
class ExistingResourceException (line 31) | class ExistingResourceException(Exception):
method __init__ (line 32) | def __init__(self, install_path, response):
class SQLQueryException (line 37) | class SQLQueryException(Exception):
method __init__ (line 38) | def __init__(self, message):
class DataLoaderException (line 41) | class DataLoaderException(Exception):
method __init__ (line 42) | def __init__(self, message):
class FolderDeletionException (line 45) | class FolderDeletionException(Exception):
method __init__ (line 46) | def __init__(self, install_path, response):
class FolderCreationException (line 51) | class FolderCreationException(Exception):
method __init__ (line 52) | def __init__(self, install_path, response):
class SDPException (line 59) | class SDPException(Exception):
method __init__ (line 60) | def __init__(self, message, description, pipeline_conf, response):
class SDPNotAvailableException (line 66) | class SDPNotAvailableException(SDPException):
method __init__ (line 67) | def __init__(self, message, pipeline_conf, response):
class SDPCreationException (line 70) | class SDPCreationException(SDPException):
method __init__ (line 71) | def __init__(self, message, pipeline_conf, response):
class WorkflowException (line 74) | class WorkflowException(Exception):
method __init__ (line 75) | def __init__(self, message, details, job_config, response):
FILE: dbdemos/installer.py
class Installer (line 29) | class Installer:
method __init__ (line 30) | def __init__(self, username = None, pat_token = None, workspace_url = ...
method get_dbutils (line 58) | def get_dbutils(self):
method get_current_url (line 75) | def get_current_url(self):
method get_dbutils_tags_safe (line 84) | def get_dbutils_tags_safe(self):
method get_current_cluster_id (line 88) | def get_current_cluster_id(self):
method get_org_id (line 100) | def get_org_id(self):
method get_uid (line 109) | def get_uid(self):
method get_current_folder (line 115) | def get_current_folder(self):
method get_workspace_id (line 125) | def get_workspace_id(self):
method get_current_pat_token (line 133) | def get_current_pat_token(self):
method get_current_username (line 142) | def get_current_username(self):
method get_current_cloud (line 155) | def get_current_cloud(self):
method get_current_cluster_id (line 168) | def get_current_cluster_id(self):
method get_workspace_url (line 180) | def get_workspace_url(self):
method check_demo_name (line 187) | def check_demo_name(self, demo_name):
method get_demos_available (line 198) | def get_demos_available(self):
method get_demo_conf (line 201) | def get_demo_conf(self, demo_name:str, catalog:str = None, schema:str ...
method get_resource (line 209) | def get_resource(self, path, decode=True):
method resource_isdir (line 213) | def resource_isdir(self, path):
method test_premium_pricing (line 216) | def test_premium_pricing(self):
method cluster_is_serverless (line 228) | def cluster_is_serverless(self):
method create_or_check_schema (line 236) | def create_or_check_schema(self, demo_conf: DemoConf, create_schema: b...
method install_demo (line 268) | def install_demo(self, demo_name, install_path, overwrite=False, updat...
method get_demo_datasource (line 332) | def get_demo_datasource(self, warehouse_name = None):
method get_or_create_endpoint (line 356) | def get_or_create_endpoint(self, username: str, demo_conf: DemoConf, d...
method check_if_install_folder_exists (line 403) | def check_if_install_folder_exists(self, demo_name: str, install_path:...
method install_notebooks (line 417) | def install_notebooks(self, demo_name: str, install_path: str, demo_co...
method load_demo_pipelines (line 487) | def load_demo_pipelines(self, demo_name, demo_conf: DemoConf, debug=Fa...
method load_demo_cluster (line 559) | def load_demo_cluster(self, demo_name, demo_conf: DemoConf, update_clu...
method wait_for_cluster_to_stop (line 628) | def wait_for_cluster_to_stop(self, cluster_conf, cluster):
method find_cluster (line 644) | def find_cluster(self, cluster_name):
method get_pipeline (line 652) | def get_pipeline(self, name):
method add_cluster_setup_cell (line 665) | def add_cluster_setup_cell(self, parser: NotebookParser, demo_name, cl...
method add_extra_cell (line 673) | def add_extra_cell(self, html, cell_content, position = 0):
method get_notebook_content (line 687) | def get_notebook_content(self, html):
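Several `Installer` accessors (`get_dbutils_tags_safe`, `get_org_id`, `get_current_cluster_id`) read values from the notebook context tags, which only exist inside a Databricks runtime. A hypothetical sketch of the "safe lookup with fallback" idea behind them — the function name and defaults here are illustrative, not the package's API:

```python
# Hypothetical sketch of a safe context-tag lookup: return the tag value if
# present, otherwise a fallback, never raising when running outside Databricks.
def get_tag_safe(tags: dict, key: str, default: str = "") -> str:
    try:
        value = tags.get(key)
        return value if value is not None else default
    except Exception:
        # tags may not even be a dict-like object outside a notebook
        return default

tags = {"orgId": "123456", "clusterId": "0101-demo"}
org_id = get_tag_safe(tags, "orgId", default="local")
host = get_tag_safe(tags, "browserHostName", default="localhost")
```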
FILE: dbdemos/installer_dashboard.py
class InstallerDashboard (line 9) | class InstallerDashboard:
method __init__ (line 10) | def __init__(self, installer: 'Installer'):
method install_dashboards (line 14) | def install_dashboards(self, demo_conf: DemoConf, install_path, wareho...
method replace_dashboard_schema (line 29) | def replace_dashboard_schema(self, demo_conf: DemoConf, definition: str):
method load_lakeview_dashboard (line 39) | def load_lakeview_dashboard(self, demo_conf: DemoConf, install_path, d...
FILE: dbdemos/installer_genie.py
class InstallerGenie (line 16) | class InstallerGenie:
method __init__ (line 19) | def __init__(self, installer: 'Installer'):
method install_genies (line 24) | def install_genies(self, demo_conf: DemoConf, install_path: str, wareh...
method install_genie (line 46) | def install_genie(self, room: GenieRoom, genie_path, warehouse_id, deb...
method create_temp_table_for_genie_creation (line 105) | def create_temp_table_for_genie_creation(self, ws: WorkspaceClient, ro...
method delete_temp_table_for_genie_creation (line 113) | def delete_temp_table_for_genie_creation(self, ws, room: GenieRoom, de...
method load_genie_data (line 120) | def load_genie_data(self, demo_conf: DemoConf, warehouse_id, debug=True):
method run_sql_queries (line 135) | def run_sql_queries(self, ws: WorkspaceClient, demo_conf: DemoConf, wa...
method get_current_cluster_id (line 147) | def get_current_cluster_id(self):
method load_data (line 150) | def load_data(self, ws: WorkspaceClient, data_folder: DataFolder, ware...
method create_raw_data_volume (line 176) | def create_raw_data_volume(self, ws: WorkspaceClient, demo_conf: DemoC...
method load_data_through_volume (line 200) | def load_data_through_volume(self, ws: WorkspaceClient, data_folders: ...
method load_data_to_volume (line 215) | def load_data_to_volume(self, ws: WorkspaceClient, data_folder: DataFo...
method create_table_from_volume (line 276) | def create_table_from_volume(self, ws: WorkspaceClient, data_folder: D...
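`installer_genie.py` imports `ThreadPoolExecutor` (visible in the file preview below), suggesting `load_data_through_volume` fans out one upload per data file. A hedged sketch of that fan-out pattern — `upload_file` is a stand-in for the real volume upload, not the package's API:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Stand-in for the real per-file volume upload.
def upload_file(path: str) -> str:
    return f"/Volumes/demo/raw/{path}"

# Sketch of the parallel-load pattern: submit one task per file, collect
# results as they complete, and return them in a deterministic order.
def load_files_in_parallel(paths, max_workers=4):
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(upload_file, p): p for p in paths}
        for future in as_completed(futures):
            results.append(future.result())
    return sorted(results)

loaded = load_files_in_parallel(["a.csv", "b.csv", "c.csv"])
```

Collecting via `as_completed` keeps the loop responsive to the fastest uploads; sorting afterwards restores a stable order for reporting.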
FILE: dbdemos/installer_report.py
class InstallerReport (line 7) | class InstallerReport:
method __init__ (line 78) | def __init__(self, workspace_url: str):
method displayHTML_available (line 81) | def displayHTML_available(self):
method display_cluster_creation_warn (line 88) | def display_cluster_creation_warn(self, exception: ClusterCreationExce...
method display_serverless_warn (line 98) | def display_serverless_warn(self, exception: Exception, demo_conf: Dem...
method display_custom_schema_not_supported_error (line 103) | def display_custom_schema_not_supported_error(self, exception: Excepti...
method display_custom_schema_missing_error (line 108) | def display_custom_schema_missing_error(self, exception: Exception, de...
method display_incorrect_schema_error (line 112) | def display_incorrect_schema_error(self, exception: Exception, demo_co...
method display_warehouse_creation_error (line 116) | def display_warehouse_creation_error(self, exception: Exception, demo_...
method display_unknow_warehouse_error (line 121) | def display_unknow_warehouse_error(self, exception: Exception, demo_co...
method display_genie_room_creation_error (line 126) | def display_genie_room_creation_error(self, exception: Exception, demo...
method display_dashboard_error (line 131) | def display_dashboard_error(self, exception: Exception, demo_conf: Dem...
method display_folder_already_existing (line 139) | def display_folder_already_existing(self, exception: ExistingResourceE...
method display_folder_permission (line 146) | def display_folder_permission(self, exception: FolderDeletionException...
method display_folder_creation_error (line 152) | def display_folder_creation_error(self, exception: FolderCreationExcep...
method display_non_premium_warn (line 160) | def display_non_premium_warn(self, exception: Exception, response):
method display_pipeline_error (line 167) | def display_pipeline_error(self, exception: SDPException):
method display_pipeline_error_migration (line 174) | def display_pipeline_error_migration(self, exception: SDPException):
method display_workflow_error (line 183) | def display_workflow_error(self, exception: WorkflowException, demo_na...
method display_token_error (line 191) | def display_token_error(self, exception: TokenException, demo_name: str):
method display_demo_name_error (line 204) | def display_demo_name_error(self, name, demos):
method display_error (line 217) | def display_error(self, exception, message, raise_error = True, warnin...
method display_install_info (line 232) | def display_install_info(self, demo_conf: DemoConf, install_path, cata...
method display_info (line 265) | def display_info(self, info: str, title: str=""):
method display_install_result (line 279) | def display_install_result(self, demo_name, description, title, instal...
method get_install_result_html (line 286) | def get_install_result_html(self, demo_name, description, title, insta...
method display_install_result_html (line 375) | def display_install_result_html(self, demo_name, description, title, i...
method display_install_result_console (line 381) | def display_install_result_console(self, demo_name, description, title...
method display_schema_creation_error (line 437) | def display_schema_creation_error(self, exception: Exception, demo_con...
method display_schema_not_found_error (line 443) | def display_schema_not_found_error(self, exception: Exception, demo_co...
FILE: dbdemos/installer_repos.py
class InstallerRepo (line 8) | class InstallerRepo:
method __init__ (line 9) | def __init__(self, installer: 'Installer'):
method install_repos (line 14) | def install_repos(self, demo_conf: DemoConf, debug = False):
method get_repos (line 27) | def get_repos(self, path_prefix):
method update_or_create_repo (line 32) | def update_or_create_repo(self, repo):
FILE: dbdemos/installer_workflows.py
class InstallerWorkflow (line 11) | class InstallerWorkflow:
method __init__ (line 12) | def __init__(self, installer: 'Installer'):
method install_workflows (line 17) | def install_workflows(self, demo_conf: DemoConf, use_cluster_id = None...
method create_demo_init_job (line 33) | def create_demo_init_job(self, demo_conf: DemoConf, use_cluster_id = N...
method start_demo_init_job (line 44) | def start_demo_init_job(self, demo_conf: DemoConf, init_job, debug = F...
method create_or_replace_job (line 54) | def create_or_replace_job(self, demo_conf: DemoConf, definition: dict,...
method replace_warehouse_id (line 144) | def replace_warehouse_id(self, demo_conf: DemoConf, definition, wareho...
method wait_for_run_completion (line 157) | def wait_for_run_completion(self, job_id, max_retry=10, debug = False):
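`wait_for_run_completion(job_id, max_retry=10, ...)` implies a bounded polling loop over the Jobs API. A hedged sketch of that loop with a stubbed status function (`get_status` stands in for the real API call; the state names are illustrative):

```python
import time

# Sketch of a bounded run-completion poll: check the run state until it is
# terminal or the retry budget is exhausted.
def wait_for_completion(get_status, max_retry=10, sleep_s=0):
    for _ in range(max_retry):
        state = get_status()
        if state in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            return state
        time.sleep(sleep_s)
    return "TIMEOUT"

# Simulate a run that finishes on the third poll.
states = iter(["PENDING", "RUNNING", "TERMINATED"])
result = wait_for_completion(lambda: next(states), max_retry=5)
```

Bounding the loop with `max_retry` matters in an installer: a stuck init job should surface as an error report rather than hang the install.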
FILE: dbdemos/job_bundler.py
class JobBundler (line 10) | class JobBundler:
method __init__ (line 11) | def __init__(self, conf: Conf):
method get_cluster_conf (line 18) | def get_cluster_conf(self, demo_conf: DemoConf):
method load_bundles_conf (line 27) | def load_bundles_conf(self):
method add_bundle_from_config (line 54) | def add_bundle_from_config(self, bundle_config_paths):
method ignore_bundle (line 61) | def ignore_bundle(self, bundle_path):
method add_bundle (line 68) | def add_bundle(self, bundle_path, config_path: str = "_resources/bundl...
method reset_staging_repo (line 108) | def reset_staging_repo(self, skip_pull = False):
method start_and_wait_bundle_jobs (line 129) | def start_and_wait_bundle_jobs(self, force_execution: bool = False, sk...
method create_or_update_bundle_jobs (line 134) | def create_or_update_bundle_jobs(self, recreate_jobs: bool = False):
method get_head_commit (line 141) | def get_head_commit(self):
method run_bundle_jobs (line 153) | def run_bundle_jobs(self, force_execution: bool = False, skip_executio...
method wait_for_bundle_jobs_completion (line 194) | def wait_for_bundle_jobs_completion(self):
method wait_for_bundle_job_completion (line 199) | def wait_for_bundle_job_completion(self, demo_conf: DemoConf):
method create_bundle_job (line 209) | def create_bundle_job(self, demo_conf: DemoConf, recreate_jobs: bool =...
method create_or_update_job (line 276) | def create_or_update_job(self, demo_conf: DemoConf, job_conf: dict, re...
method check_if_demo_file_changed_since_commit (line 295) | def check_if_demo_file_changed_since_commit(self, demo_conf: DemoConf,...
method get_changed_files_since_commit (line 302) | def get_changed_files_since_commit(self, owner, repo, base_commit, las...
method cancel_job_run (line 320) | def cancel_job_run(self, demo_conf: DemoConf, run):
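`check_if_demo_file_changed_since_commit` and `get_changed_files_since_commit` suggest the bundler only rebuilds demos whose folder contains a changed file. A hypothetical sketch of that prefix check (function and variable names are assumptions):

```python
# Hypothetical sketch: a demo needs rebundling when any file changed since
# the last bundled commit sits under the demo's folder in the repo.
def demo_changed(changed_files, demo_path: str) -> bool:
    prefix = demo_path.rstrip("/") + "/"
    return any(f == demo_path or f.startswith(prefix) for f in changed_files)

changed = ["demo-retail/01-ingest.py", "README.md"]
retail_changed = demo_changed(changed, "demo-retail")
fsi_changed = demo_changed(changed, "demo-fsi")
```

Appending the trailing `/` before the prefix test avoids false positives such as `demo-retail-v2/` matching `demo-retail`.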
FILE: dbdemos/notebook_parser.py
class NotebookParser (line 9) | class NotebookParser:
method __init__ (line 11) | def __init__(self, html):
method get_notebook_content (line 15) | def get_notebook_content(self, html):
method get_html (line 22) | def get_html(self):
method contains (line 31) | def contains(self, str):
method remove_static_settings (line 34) | def remove_static_settings(self):
method set_tracker_tag (line 38) | def set_tracker_tag(self, org_id, uid, category, demo_name, notebook, ...
method remove_uncomment_tag (line 56) | def remove_uncomment_tag(self):
method remove_dbdemos_build (line 61) | def remove_dbdemos_build(self):
method remove_robots_meta (line 64) | def remove_robots_meta(self):
method add_cell_as_html_for_seo (line 68) | def add_cell_as_html_for_seo(self):
method _replace_with_optional_escaped_quotes (line 96) | def _replace_with_optional_escaped_quotes(content: str, old: str, new:...
method replace_schema_in_content (line 114) | def replace_schema_in_content(content: str, demo_conf: DemoConf) -> str:
method replace_schema (line 150) | def replace_schema(self, demo_conf: DemoConf):
method replace_in_notebook (line 154) | def replace_in_notebook(self, old, new, regex = False):
method add_extra_cell (line 160) | def add_extra_cell(self, cell_content, position = 1):
method remove_automl_result_links (line 174) | def remove_automl_result_links(self):
method change_relative_links_for_minisite (line 187) | def change_relative_links_for_minisite(self):
method add_javascript_to_minisite_relative_links (line 192) | def add_javascript_to_minisite_relative_links(self, notebook_path):
method set_environement_metadata (line 327) | def set_environement_metadata(self, client_version: str = "3"):
method hide_commands_and_results (line 339) | def hide_commands_and_results(self):
method remove_delete_cell (line 353) | def remove_delete_cell(self):
method replace_dynamic_links (line 358) | def replace_dynamic_links(self, items, name, link_path):
method replace_dynamic_links_workflow (line 373) | def replace_dynamic_links_workflow(self, workflows):
method replace_dynamic_links_repo (line 379) | def replace_dynamic_links_repo(self, repos):
method replace_dynamic_links_pipeline (line 388) | def replace_dynamic_links_pipeline(self, pipelines_id):
method replace_dynamic_links_lakeview_dashboards (line 395) | def replace_dynamic_links_lakeview_dashboards(self, dashboards_id):
method replace_dynamic_links_genie (line 402) | def replace_dynamic_links_genie(self, genie_rooms):
FILE: dbdemos/packager.py
class Packager (line 17) | class Packager:
method __init__ (line 19) | def __init__(self, conf: Conf, jobBundler: JobBundler):
method package_all (line 23) | def package_all(self, iframe_root_src = "./"):
method clean_bundle (line 35) | def clean_bundle(self, demo_conf: DemoConf):
method extract_lakeview_dashboards (line 40) | def extract_lakeview_dashboards(self, demo_conf: DemoConf):
method process_file_content (line 55) | def process_file_content(self, file, destination_path, extension = ""):
method process_notebook_content (line 61) | def process_notebook_content(self, demo_conf: DemoConf, html, full_path):
method package_demo (line 80) | def package_demo(self, demo_conf: DemoConf):
method get_file_icon_svg (line 154) | def get_file_icon_svg(self, file_path: str) -> str:
method build_tree_structure (line 175) | def build_tree_structure(self, notebooks_to_publish):
method render_tree_html (line 203) | def render_tree_html(self, tree, iframe_root_src="./", level=0):
method generate_html_from_code_file (line 260) | def generate_html_from_code_file(self, code_file_path: str, output_htm...
method build_minisite (line 310) | def build_minisite(self, demo_conf: DemoConf, iframe_root_src = "./"):
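`build_tree_structure` followed by `render_tree_html` suggests the packager folds flat notebook paths into a nested structure before rendering the minisite's folder tree. A minimal sketch of that fold (the dict-of-dicts shape is an assumption; the real code may attach metadata at each node):

```python
# Sketch of folding flat paths into a nested dict so folders can be
# rendered recursively as a tree.
def build_tree(paths):
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return tree

tree = build_tree(["01-intro.py", "setup/00-load.py", "setup/01-conf.py"])
```

A renderer can then walk the dict depth-first, emitting a nested list item per key, which matches the `render_tree_html(tree, ..., level=0)` signature above.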
FILE: dbdemos/sql_query.py
class SQLQueryExecutor (line 10) | class SQLQueryExecutor:
method __init__ (line 11) | def __init__(self):
method get_or_create_shared_warehouse (line 14) | def get_or_create_shared_warehouse(self, ws: WorkspaceClient) -> str:
method execute_query_as_list (line 39) | def execute_query_as_list(self, ws: WorkspaceClient, query: str, timeo...
method execute_query (line 43) | def execute_query(self, ws: WorkspaceClient, query: str, timeout: int ...
method get_results_formatted_as_list (line 85) | def get_results_formatted_as_list(self, result_data: ResultData, resul...
FILE: dbdemos/tracker.py
class Tracker (line 5) | class Tracker:
method __init__ (line 10) | def __init__(self, org_id, uid, email = None):
method track_install (line 20) | def track_install(self, category, demo_name):
method track_create_cluster (line 23) | def track_create_cluster(self, category, demo_name):
method track_list (line 26) | def track_list(self):
method get_user_hash (line 29) | def get_user_hash(self):
method get_track_url (line 34) | def get_track_url(self, category, demo_name, event, notebook = ""):
method get_track_params (line 38) | def get_track_params(self, category, demo_name, event, notebook =""):
method track (line 56) | def track(self, category, demo_name, event):
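`tracker.py` imports `hashlib` (per the file preview below), and `get_user_hash` suggests the tracker reports a stable anonymous identifier rather than the raw email. A hypothetical sketch of that pattern — the digest choice and normalization here are assumptions, not the package's actual implementation:

```python
import hashlib

# Hypothetical sketch: derive a stable, anonymous user identifier by
# hashing a normalized email. The real code may use a different digest
# or add a salt; this only illustrates the pattern.
def get_user_hash(email: str) -> str:
    if not email:
        return ""
    return hashlib.sha256(email.lower().encode("utf-8")).hexdigest()

h1 = get_user_hash("User@Example.com")
h2 = get_user_hash("user@example.com")
```

Lower-casing before hashing makes the identifier insensitive to how the workspace happens to capitalize the email.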
FILE: main.py
function bundle (line 20) | def bundle():
FILE: test/test_installer.py
function test_html (line 9) | def test_html():
function test_list (line 47) | def test_list():
function test_list_html (line 51) | def test_list_html():
FILE: test/test_installer_genie.py
function test_room_install (line 10) | def test_room_install():
function test_load_genie_data (line 27) | def test_load_genie_data():
function test_schema_creation (line 43) | def test_schema_creation():
function load_data_to_volume (line 59) | def load_data_to_volume():
function test_load_data (line 74) | def test_load_data():
FILE: test/test_job_bundler.py
class TestJobBundler (line 6) | class TestJobBundler(unittest.TestCase):
method setUp (line 7) | def setUp(self):
method test_get_changed_files_since_commit (line 20) | def test_get_changed_files_since_commit(self):
method test_check_if_demo_file_changed_since_commit (line 46) | def test_check_if_demo_file_changed_since_commit(self):
FILE: test/test_notebook_parser.py
function test_close_cell (line 9) | def test_close_cell():
function test_automl (line 17) | def test_automl():
function test_change_relative_links_for_minisite (line 28) | def test_change_relative_links_for_minisite():
function test_parser_contains (line 35) | def test_parser_contains():
function test_parser_notebook (line 44) | def test_parser_notebook():
FILE: test_demo.py
function load_conf (line 7) | def load_conf(conf_path):
function bundle (line 19) | def bundle(conf, demo_path_in_repo):
Condensed preview — 65 files, each showing path, character count, and a content snippet (full structured content: 4,671K chars).
[
{
"path": ".claude/commands/release.md",
"chars": 14600,
"preview": "# DBDemos Release Workflow\n\nYou are helping with the dbdemos release process. This involves bundling demos from the `dbd"
},
{
"path": ".cursorignore",
"chars": 416,
"preview": "bdemos/exceptions/__pycache__\n__pycache__\nlocal_conf_awsevent.json\nlocal_conf_cse2.json\nlocal_conf_gcp.json\nlocal_conf_i"
},
{
"path": ".gitignore",
"chars": 418,
"preview": "dbdemos/exceptions/__pycache__\n__pycache__\nupdate-minisite-dbdemos-website.sh\nlocal_conf_awsevent.json\nlocal_conf_cse2.j"
},
{
"path": ".vscode/launch.json",
"chars": 491,
"preview": "{\n // Use IntelliSense to learn about possible attributes.\n // Hover to view descriptions of existing attributes.\n"
},
{
"path": ".vscode/settings.json",
"chars": 381,
"preview": "{\n \"python.testing.unittestArgs\": [\n \"-v\",\n \"-s\",\n \"./test\",\n \"-p\",\n \"test_*.py\"\n "
},
{
"path": "CLAUDE.md",
"chars": 12092,
"preview": "# CLAUDE.md\n\nThis file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.\n\n## "
},
{
"path": "LICENSE",
"chars": 3391,
"preview": "Copyright (2022) Databricks, Inc.\n\nThis library (the \"Software\") may not be used except in connection with the Licensee'"
},
{
"path": "MANIFEST.in",
"chars": 111,
"preview": "recursive-include dbdemos/bundles *\nrecursive-include dbdemos/template *\nrecursive-include dbdemos/resources *\n"
},
{
"path": "NOTICE",
"chars": 1736,
"preview": "Copyright (2022) Databricks, Inc.\n## License\nThis Software includes software developed at Databricks (https://www.databr"
},
{
"path": "README.md",
"chars": 8922,
"preview": "# dbdemos\n\nDBDemos is a toolkit to easily install Lakehouse demos for Databricks.\n\n**Looking for the dbdemos notebooks a"
},
{
"path": "README_AIBI.md",
"chars": 12493,
"preview": "# Adding an AI-BI demo to dbdemos\n\n*Note: Adding new content from external contributors required special terms approval."
},
{
"path": "SECURITY.md",
"chars": 436,
"preview": "# Security Policy\n\n## Reporting a Vulnerability\n\nPlease email bugbounty@databricks.com to report any security vulnerabil"
},
{
"path": "ai_release/__init__.py",
"chars": 291,
"preview": "\"\"\"\nAI Release Tools for DBDemos\n\nThis module provides tools for Claude Code to:\n1. Execute code remotely on Databricks "
},
{
"path": "ai_release/bundle.py",
"chars": 19379,
"preview": "#!/usr/bin/env python3\n\"\"\"\nDBDemos Bundle CLI - For bundling and testing demos\n\nThis script is designed to be run by Cla"
},
{
"path": "ai_release/compute.py",
"chars": 13156,
"preview": "\"\"\"\nRemote Code Execution on Databricks Clusters\n\nThis module provides functions to execute code on Databricks clusters "
},
{
"path": "ai_release/inspect_jobs.py",
"chars": 15300,
"preview": "#!/usr/bin/env python3\n\"\"\"\nJob Inspection CLI for DBDemos\n\nInspect bundle jobs, check their status, and get detailed fai"
},
{
"path": "ai_release/jobs.py",
"chars": 19029,
"preview": "\"\"\"\nJob Inspection Module for DBDemos\n\nProvides functions to inspect bundle jobs, get failure details, and compare git c"
},
{
"path": "ai_release/run_remote.py",
"chars": 8237,
"preview": "#!/usr/bin/env python3\n\"\"\"\nRemote Code Execution CLI for DBDemos\n\nExecute Python code on a Databricks cluster for testin"
},
{
"path": "ai_release/run_state.py",
"chars": 8829,
"preview": "\"\"\"\nRun state management for AI release workflow.\n\nTracks job runs, errors, and fixes in a persistent folder structure:\n"
},
{
"path": "ai_release/runs/.gitignore",
"chars": 21,
"preview": "*.log\n*/\n!.gitignore\n"
},
{
"path": "build-and-distribute.sh",
"chars": 8104,
"preview": "#!/bin/bash\n\n# Check for pending changes before doing anything\nif ! git diff --quiet || ! git diff --cached --quiet; the"
},
{
"path": "build.sh",
"chars": 181,
"preview": "python3 setup.py clean --all bdist_wheel\nconda activate test_dbdemos\n#pip3 install dist/dbdemos-0.3.0-py3-none-any.whl -"
},
{
"path": "dbdemos/__init__.py",
"chars": 154,
"preview": "__version__ = \"0.6.34\"\n\nfrom .dbdemos import list_demos, install, create_cluster, help, install_all, check_status_all, c"
},
{
"path": "dbdemos/conf.py",
"chars": 15816,
"preview": "import json\nfrom pathlib import Path\nfrom typing import List\nimport requests\nimport urllib\nfrom datetime import date\nimp"
},
{
"path": "dbdemos/dbdemos.py",
"chars": 17318,
"preview": "from .exceptions.dbdemos_exception import TokenException\nfrom .installer import Installer\nfrom collections import defaul"
},
{
"path": "dbdemos/exceptions/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "dbdemos/exceptions/dbdemos_exception.py",
"chars": 2633,
"preview": "\nclass TokenException(Exception):\n def __init__(self, message):\n super().__init__(message)\n self.messag"
},
{
"path": "dbdemos/installer.py",
"chars": 39605,
"preview": "import collections\n\nimport pkg_resources\n\n\nfrom .conf import DBClient, DemoConf, Conf, ConfTemplate, merge_dict, DemoNot"
},
{
"path": "dbdemos/installer_dashboard.py",
"chars": 4451,
"preview": "from .conf import DemoConf\nimport pkg_resources\n\nfrom typing import TYPE_CHECKING\nif TYPE_CHECKING:\n from .installer "
},
{
"path": "dbdemos/installer_genie.py",
"chars": 17396,
"preview": "import json\nfrom concurrent.futures import ThreadPoolExecutor, as_completed\nfrom databricks.sdk import WorkspaceClient\nf"
},
{
"path": "dbdemos/installer_report.py",
"chars": 32562,
"preview": "from .conf import DemoConf\nfrom .exceptions.dbdemos_exception import ClusterCreationException, ExistingResourceException"
},
{
"path": "dbdemos/installer_repos.py",
"chars": 2431,
"preview": "from .conf import DemoConf\n\nfrom typing import TYPE_CHECKING\nif TYPE_CHECKING:\n from .installer import Installer\n\n\ncl"
},
{
"path": "dbdemos/installer_workflows.py",
"chars": 9465,
"preview": "from .conf import DemoConf, merge_dict, ConfTemplate\nimport json\nimport time\n\nfrom .exceptions.dbdemos_exception import "
},
{
"path": "dbdemos/job_bundler.py",
"chars": 18385,
"preview": "from .conf import DBClient, DemoConf, Conf, ConfTemplate, merge_dict\nimport time\nimport json\nimport re\nimport base64\nfro"
},
{
"path": "dbdemos/notebook_parser.py",
"chars": 20956,
"preview": "from dbdemos.conf import DemoConf\n\nfrom .tracker import Tracker\nimport urllib\nimport re\nimport base64\nimport json\n\nclass"
},
{
"path": "dbdemos/packager.py",
"chars": 18972,
"preview": "import pkg_resources\nfrom pathlib import Path\nfrom .conf import DBClient, DemoConf, Conf, DemoNotebook\nfrom .notebook_pa"
},
{
"path": "dbdemos/resources/default_cluster_config-AWS.json",
"chars": 223,
"preview": "{\n \"node_type_id\": \"i3.xlarge\",\n \"aws_attributes\": {\n \"first_on_demand\": 1,\n \"availability\": \"SPOT_WITH_FALLBACK"
},
{
"path": "dbdemos/resources/default_cluster_config-AZURE.json",
"chars": 163,
"preview": "{\n \"node_type_id\": \"Standard_D8ds_v4\",\n \"azure_attributes\": {\n \"first_on_demand\": 1,\n \"availability\": \"ON_DEMAND"
},
{
"path": "dbdemos/resources/default_cluster_config-GCP.json",
"chars": 161,
"preview": "{\n \"node_type_id\": \"n1-standard-8\",\n \"gcp_attributes\": {\n \"use_preemptible_executors\": false,\n \"availability\": \""
},
{
"path": "dbdemos/resources/default_cluster_config.json",
"chars": 493,
"preview": "{\n \"autoscale\": {\n \"min_workers\": 4,\n \"max_workers\": 4\n },\n \"cluster_name\": \"dbdemos-{{DEMO_NAME}}-{{CURRENT_US"
},
{
"path": "dbdemos/resources/default_cluster_job_config.json",
"chars": 200,
"preview": "{\n \"spark_version\": \"16.4.x-cpu-ml-scala2.12\",\n \"spark_conf\": {\n \"spark.databricks.dataLineage.enabled\": \"true\"\n }"
},
{
"path": "dbdemos/resources/default_test_job_conf.json",
"chars": 946,
"preview": "{\n \"name\": \"field-demos_{{DEMO_NAME}}\",\n \"email_notifications\": {\n \"no_alert_for_skipped_runs\": false\n },\n \"timeo"
},
{
"path": "dbdemos/sql_query.py",
"chars": 4248,
"preview": "import logging\nfrom databricks.sdk import WorkspaceClient\nfrom databricks.sdk.service.sql import StatementState, Execute"
},
{
"path": "dbdemos/template/LICENSE.html",
"chars": 1270843,
"preview": "<!DOCTYPE html>\n<html>\n<head>\n <meta name=\"databricks-html-version\" content=\"1\">\n<title>LICENSE - Databricks</title>\n\n<"
},
{
"path": "dbdemos/template/NOTICE.html",
"chars": 1264610,
"preview": "<!DOCTYPE html>\n<html>\n<head>\n <meta name=\"databricks-html-version\" content=\"1\">\n<title>NOTICE - Databricks</title>\n\n<m"
},
{
"path": "dbdemos/template/README.html",
"chars": 1262806,
"preview": "<!DOCTYPE html>\n<html>\n<head>\n <meta name=\"databricks-html-version\" content=\"1\">\n<title>README - Databricks</title>\n\n<m"
},
{
"path": "dbdemos/template/code_viewer.html",
"chars": 3883,
"preview": "<!doctype html>\n<html lang=\"en\">\n<head>\n <meta charset=\"utf-8\">\n <meta name=\"viewport\" content=\"width=device-width"
},
{
"path": "dbdemos/template/index.html",
"chars": 11550,
"preview": "<!doctype html>\n<html lang=\"en\">\n<head>\n <!-- Google tag (gtag.js) -->\n <!--\n <script async src=\"https://www.go"
},
{
"path": "dbdemos/tracker.py",
"chars": 3339,
"preview": "import requests\nimport urllib.parse\nimport hashlib\n\nclass Tracker:\n #Set this value to false to disable dbdemo toolki"
},
{
"path": "docs/CNAME",
"chars": 10,
"preview": "dbdemos.ai"
},
{
"path": "docs/index.html",
"chars": 1177,
"preview": "<!doctype html>\n<html lang=\"en\">\n<body>\n\ndbdemos.ai - Demos for Databricks\n<strong>dbdemos.ai has a new home under datab"
},
{
"path": "main.py",
"chars": 11759,
"preview": "import json\nfrom dbdemos.conf import Conf, DemoConf\nfrom dbdemos.installer import Installer\nfrom dbdemos.job_bundler imp"
},
{
"path": "requirements.in",
"chars": 75,
"preview": "# Direct dependencies from setup.py\nrequests\npandas\ndatabricks-sdk>=0.38.0\n"
},
{
"path": "requirements.txt",
"chars": 36079,
"preview": "#\n# This file is autogenerated by pip-compile with Python 3.11\n# by the following command:\n#\n# pip-compile --generate"
},
{
"path": "setup.py",
"chars": 1027,
"preview": "from setuptools import setup, find_packages\n\n#python setup.py clean --all bdist_wheel\nsetup(\n #this will be the packa"
},
{
"path": "test/__init__.py",
"chars": 55,
"preview": "# Empty file to mark the directory as a Python package "
},
{
"path": "test/test2.html",
"chars": 13146,
"preview": "\n <style>\n .dbdemos_install{\n font-family: -apple-system,BlinkMacSystemFont,Segoe UI,Roboto"
},
{
"path": "test/test_installer.py",
"chars": 6270,
"preview": "import dbdemos\nfrom dbdemos.conf import DemoNotebook\n\nfrom dbdemos.installer import Installer\nfrom dbdemos.installer_rep"
},
{
"path": "test/test_installer_genie.py",
"chars": 4864,
"preview": "import dbdemos\nfrom dbdemos.conf import DemoNotebook, DemoConf, DataFolder\n\nfrom dbdemos.installer import Installer\nfrom"
},
{
"path": "test/test_job_bundler.py",
"chars": 3668,
"preview": "import unittest\nfrom dbdemos.job_bundler import JobBundler\nfrom dbdemos.conf import Conf\nimport json\n\nclass TestJobBundl"
},
{
"path": "test/test_list_demos.html",
"chars": 18965,
"preview": "<style>\n.dbdemo {\n font-family: -apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica Neue,Arial,Noto Sans,sans-se"
},
{
"path": "test/test_list_demos2.html",
"chars": 17086,
"preview": " <style>\n.dbdemo {\n font-family: -apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica Neue,Arial,Noto Sans,san"
},
{
"path": "test/test_notebook_parser.py",
"chars": 4201,
"preview": "import re\nimport base64\nimport urllib.parse\nimport json\nfrom dbdemos.notebook_parser import NotebookParser\n\n\n\ndef test_c"
},
{
"path": "test_demo.py",
"chars": 2993,
"preview": "import json\nfrom dbdemos.conf import Conf\nfrom dbdemos.job_bundler import JobBundler\nfrom dbdemos.packager import Packag"
},
{
"path": "test_list_html.html",
"chars": 6706,
"preview": "\n<style>\n.dbdemo {\n font-family: -apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica Neue,Arial,Noto Sans,sans-s"
}
]
About this extraction
This document contains the source code of the databricks-demos/dbdemos GitHub repository, extracted and formatted as plain text. The extraction includes 65 files (4.1 MB), approximately 1.1M tokens, and a symbol index of 376 extracted functions, classes, methods, constants, and types.
Extracted by GitExtract, built by Nikandr Surkov.