Repository: databricks-demos/dbdemos Branch: main Commit: fa3f7f981d75 Files: 65 Total size: 4.1 MB Directory structure: gitextract_ps7v2nt2/ ├── .claude/ │ └── commands/ │ └── release.md ├── .cursorignore ├── .gitignore ├── .vscode/ │ ├── launch.json │ └── settings.json ├── CLAUDE.md ├── LICENSE ├── MANIFEST.in ├── NOTICE ├── README.md ├── README_AIBI.md ├── SECURITY.md ├── ai_release/ │ ├── __init__.py │ ├── bundle.py │ ├── compute.py │ ├── inspect_jobs.py │ ├── jobs.py │ ├── run_remote.py │ ├── run_state.py │ └── runs/ │ └── .gitignore ├── build-and-distribute.sh ├── build.sh ├── dbdemos/ │ ├── __init__.py │ ├── conf.py │ ├── dbdemos.py │ ├── exceptions/ │ │ ├── __init__.py │ │ └── dbdemos_exception.py │ ├── installer.py │ ├── installer_dashboard.py │ ├── installer_genie.py │ ├── installer_report.py │ ├── installer_repos.py │ ├── installer_workflows.py │ ├── job_bundler.py │ ├── notebook_parser.py │ ├── packager.py │ ├── resources/ │ │ ├── default_cluster_config-AWS.json │ │ ├── default_cluster_config-AZURE.json │ │ ├── default_cluster_config-GCP.json │ │ ├── default_cluster_config.json │ │ ├── default_cluster_job_config.json │ │ └── default_test_job_conf.json │ ├── sql_query.py │ ├── template/ │ │ ├── LICENSE.html │ │ ├── NOTICE.html │ │ ├── README.html │ │ ├── code_viewer.html │ │ └── index.html │ └── tracker.py ├── docs/ │ ├── CNAME │ └── index.html ├── main.py ├── requirements.in ├── requirements.txt ├── setup.py ├── test/ │ ├── __init__.py │ ├── test2.html │ ├── test_installer.py │ ├── test_installer_genie.py │ ├── test_job_bundler.py │ ├── test_list_demos.html │ ├── test_list_demos2.html │ └── test_notebook_parser.py ├── test_demo.py └── test_list_html.html ================================================ FILE CONTENTS ================================================ ================================================ FILE: .claude/commands/release.md ================================================ # DBDemos Release Workflow You are helping with the dbdemos 
release process. This involves bundling demos from the `dbdemos-notebooks` repository, testing them, fixing any issues, and preparing for release. ## ⛔ CRITICAL WARNINGS 1. **NEVER run a release to PyPI by yourself** - Only the human can trigger `./build-and-distribute.sh` 2. **NEVER commit secrets** - PAT tokens, GitHub tokens must never appear in commits or outputs 3. **NEVER push directly to main** - Always use feature branches and PRs 4. **NEVER cleanup workspace resources yourself** - Always ask the human to do cleanup ## 📚 Notebook Code Quality Principles (CRITICAL) **The dbdemos-notebooks are state-of-the-art examples that customers will reuse.** Code must be: 1. **Clean and minimal** - No unnecessary code, no hacks, no workarounds 2. **Simple and readable** - Easy to understand for learning purposes 3. **Safe to re-run** - Notebooks must work when run multiple times (idempotent) 4. **No error handling hacks** - Don't add try/except blocks to work around specific errors 5. **No comments explaining errors** - Don't add comments like "handles BudgetPolicy error" ### What NOT to do: ```python # BAD - Don't add error handling for specific workspace issues try: agents.deploy(...) except NotFound as e: if "BudgetPolicy" in str(e): # cleanup and retry... ``` ### What TO do instead: - If a job fails due to stale data/resources, **ASK THE HUMAN** to clean up the workspace - Never attempt cleanup yourself - the human must do it - Fix the root cause in the code, not the symptom ### Handling Stale Resource Errors: **Note:** The bundler now automatically cleans up schemas before running (via `DROP SCHEMA CASCADE`). This should prevent most stale resource errors. 
If you still encounter issues:

- `BudgetPolicy not found` → Schema cleanup should fix this, or ask the human to delete the serving endpoint
- `Model version already exists` → Should be fixed by schema cleanup
- `Endpoint already exists` → Should be fixed by schema cleanup
- `Table already exists` → Should be fixed by schema cleanup

If automatic cleanup fails or you need manual intervention, **ask the human** to run:

```sql
DROP SCHEMA IF EXISTS main__build.<demo_name> CASCADE;
```

Or delete specific resources via the Databricks UI/API.

## Overview

The dbdemos package bundles notebooks from the `dbdemos-notebooks` repository. The bundling process:

1. Creates/updates jobs in a Databricks workspace that run the notebooks
2. Waits for job completion
3. Downloads executed notebooks with outputs
4. Packages them into the `dbdemos/bundles/` directory

## Environment Setup

Before starting, verify these are available:

- `DATABRICKS_TOKEN` or token in `local_conf_E2TOOL.json`
- `GITHUB_TOKEN` or token in `local_conf_E2TOOL.json`
- Workspace: `https://e2-demo-tools.cloud.databricks.com/`
- dbdemos-notebooks repo at: `../dbdemos-notebooks` (configurable)
- Test cluster: Matches `cluster_name_pattern` in config (default: "quentin")

## AI Release Tools Location

All AI-powered release tools are in `ai_release/`:

- `ai_release/bundle.py` - Bundle and test demos
- `ai_release/run_remote.py` - Execute code on Databricks clusters
- `ai_release/compute.py` - Remote execution library
- `ai_release/run_state.py` - Persistent state tracking for runs
- `ai_release/jobs.py` - Job inspection library (uses Databricks SDK)
- `ai_release/inspect_jobs.py` - CLI for job inspection

## ⏱️ Important: Job Run Times

**Bundle jobs typically take 15-30 minutes to complete.** Each job runs all notebooks in a demo on a Databricks cluster.
- Do NOT wait synchronously for jobs to complete - Start the job, then work on other tasks or let the user know to check back later - Use `--status` to check job progress without blocking - The state tracking system persists progress across sessions --- ## Part 0: Run State Tracking The AI release workflow tracks state persistently in `ai_release/runs/`: ``` ai_release/runs/ / state.json # Overall run state / status.json # Demo-specific status errors.json # Extracted errors from failed runs fix_attempts.json # History of fix attempts job_output.log # Raw job output notes.md # AI notes and observations ``` ### Using Run State in Python ```python from ai_release.run_state import get_run_state, get_latest_run # Get or create state for current commit state = get_run_state() # Update demo status state.update_demo_status("ai-agent", "running", job_id=123, run_id=456) # Save errors state.save_errors("ai-agent", [{"cell": 5, "error": "ImportError..."}]) # Record a fix attempt state.add_fix_attempt("ai-agent", "Remove protobuf constraint", "ai-fix-ai-agent-pip", ["01_create_first_billing_agent.py"]) # Add notes state.add_note("ai-agent", "The pip install fails due to protobuf<5 conflict with grpcio-status") # Get summary print(state.get_summary()) # Resume from previous session state = get_latest_run() ``` ### When to Use State Tracking - Before starting a bundle job: `state.update_demo_status(demo, "running", ...)` - After job completes: `state.update_demo_status(demo, "success")` or `"failed"` - When extracting errors: `state.save_errors(demo, errors)` - When making a fix: `state.add_fix_attempt(demo, description, branch, files)` - To add context for future sessions: `state.add_note(demo, note)` --- ## Part 1: Remote Code Execution (Testing Fixes) Before committing a fix to dbdemos-notebooks, test it interactively on a cluster. 
### List Available Clusters

```bash
python ai_release/run_remote.py --list-clusters
```

### Check/Start the Test Cluster

```bash
# Check status
python ai_release/run_remote.py --cluster-status

# Start if not running (will ask for confirmation)
python ai_release/run_remote.py --start-cluster --wait-for-cluster
```

### Execute Code for Testing

```bash
# Execute Python code
python ai_release/run_remote.py --code "print(spark.version)"

# Execute SQL
python ai_release/run_remote.py --code "SELECT current_catalog()" --language sql

# Execute a file
python ai_release/run_remote.py --file path/to/test_script.py

# With longer timeout (default 300s)
python ai_release/run_remote.py --code "long_running_code()" --timeout 600
```

### Context Reuse (Faster Follow-up Commands)

```bash
# First command - save context
python ai_release/run_remote.py --code "x = spark.range(100)" --save-context

# Follow-up commands reuse context (faster, keeps variables)
python ai_release/run_remote.py --code "x.count()" --load-context

# Clear context when done
python ai_release/run_remote.py --clear-context
```

---

## Part 2: Bundling Commands

### Check Configuration

```bash
python ai_release/bundle.py --check-config
```

### Check Status of a Demo

```bash
python ai_release/bundle.py --demo <demo_name> --status
```

This shows recent job runs, task status, and error details.

### Bundle a Specific Demo (from main)

```bash
python ai_release/bundle.py --demo <demo_name>
```

### Bundle from a Feature Branch

```bash
python ai_release/bundle.py --demo <demo_name> --branch <branch_name>
```

### Force Re-run (ignore diff optimization)

```bash
python ai_release/bundle.py --demo <demo_name> --force
```

### Repair Failed Job (re-run only failed tasks)

```bash
python ai_release/bundle.py --demo <demo_name> --repair
```

Use this for quick iteration when debugging. After fixing, always do a full re-run.
Add `--wait` to wait for completion:

```bash
python ai_release/bundle.py --demo <demo_name> --repair --wait
```

### Schema Cleanup (Default: Enabled)

By default, the bundler automatically drops the demo schema (`main__build.<demo_name>`) before running. This ensures a clean state and avoids stale resource errors.

```bash
# Cleanup is enabled by default - these are equivalent:
python ai_release/bundle.py --demo <demo_name>
python ai_release/bundle.py --demo <demo_name> --cleanup-schema

# To skip cleanup (not recommended unless debugging):
python ai_release/bundle.py --demo <demo_name> --no-cleanup-schema
```

### Bundle All Demos

```bash
python ai_release/bundle.py --all
```

This uses the GitHub diff API to only run demos with changed files.

### List Available Demos

```bash
python ai_release/bundle.py --list-demos
```

---

## Part 3: Fixing a Failed Demo - Complete Workflow

When a demo fails, follow this workflow:

### Step 1: Identify the Error

```bash
# Get job status with auto-extracted errors from notebook cells
python ai_release/inspect_jobs.py --demo <demo_name>

# For full error traces and failing code
python ai_release/inspect_jobs.py --demo <demo_name> --errors

# List all failed jobs
python ai_release/inspect_jobs.py --list --failed-only
```

The inspection tool automatically:
- Fetches the job run details
- Exports the notebook HTML
- Extracts cell-level errors with traceback
- Shows the exact code that failed
- Suggests a fix workflow

Common issues:
- Missing/incompatible dependencies (pip install failures)
- API changes in Databricks
- Data schema changes
- Cluster configuration issues

### Step 2: Test the Fix Interactively (Optional but Recommended)

Before touching the notebooks, test your fix on a cluster:

```bash
# Start cluster if needed
python ai_release/run_remote.py --start-cluster --wait-for-cluster

# Test your fix code
python ai_release/run_remote.py --code "
# Your fix code here
df = spark.read.table('your_table')
# ...
" ``` ### Step 3: Create a Fix Branch in dbdemos-notebooks ```bash cd ../dbdemos-notebooks git checkout main git pull origin main git checkout -b ai-fix-- ``` ### Step 4: Make the Fix Edit the notebook files in `../dbdemos-notebooks`. The notebooks are `.py` files using Databricks notebook format. ### Step 5: Commit and Push ```bash cd ../dbdemos-notebooks git add . git commit -m "fix: " git push origin ai-fix-- ``` ### Step 6: Test the Fix (Full Re-run) ```bash cd ../dbdemos python ai_release/bundle.py --demo --branch ai-fix-- --force ``` ### Step 7: If Still Failing - Iterate ```bash # Make more fixes in dbdemos-notebooks cd ../dbdemos-notebooks # ... edit files ... git add . && git commit -m "fix: additional fixes" && git push # Quick test with repair (faster, but use full re-run for final verification) cd ../dbdemos python ai_release/bundle.py --demo --repair --wait # Or full re-run if dependencies changed python ai_release/bundle.py --demo --branch ai-fix-- --force ``` ### Step 8: Create PR (When Tests Pass) ```bash cd ../dbdemos-notebooks gh pr create --title "fix: " --body "## Summary - Fixed ## Testing - Bundling job passed: 🤖 Generated with Claude Code" ``` ### Step 9: After PR is Merged - Final Verification Wait for the human to merge the PR, then: ```bash cd ../dbdemos python ai_release/bundle.py --demo --force ``` Report the result to the human. --- ## Part 4: Full Release Workflow When all demos are working and you're asked to prepare a release: ### Step 1: Bundle All Demos from Main ```bash python ai_release/bundle.py --all --force ``` ### Step 2: Verify All Passed Check output for any failures. If any failed, fix them first. 
### Step 3: Report to Human

Tell the human:
- All demos bundled successfully
- Any changes made
- Ready for PyPI release

### Step 4: Human Runs Release

**The human will run:** `./build-and-distribute.sh`

**You must NEVER run this yourself.**

---

## Useful Information

### Demo Path Structure

Demos are located in paths like:
- `product_demos/Delta-Lake/delta-lake`
- `demo-retail/lakehouse-retail-c360`
- `aibi/aibi-marketing-campaign`

### Job Naming Convention

Jobs are named: `field-bundle_<demo_name>`

### Bundle Config Location

Each demo has a config at: `<demo_path>/_resources/bundle_config`

### Workspace URLs

- Jobs: `https://e2-demo-tools.cloud.databricks.com/#job/<job_id>`
- Runs: `https://e2-demo-tools.cloud.databricks.com/#job/<job_id>/run/<run_id>`

### Package Versioning Rules (IMPORTANT)

When fixing `%pip install` lines in notebooks, follow these rules for Databricks packages:

**Always use latest (no version pin):**
- `databricks-langchain` - use latest
- `databricks-agents` - use latest
- `databricks-feature-engineering` - use latest (NOT pinned like `==0.12.1`)
- `databricks-sdk` - use latest
- `databricks-mcp` - use latest

**Use minimum version (`>=`):**
- `mlflow>=3.10.1` - minimum version constraint is OK

**Never pin these constraints (they cause conflicts):**
- `protobuf<5` - REMOVE, conflicts with grpcio-status
- `cryptography<43` - REMOVE, unnecessary constraint

**Example - BAD:**
```
%pip install mlflow>=3.10.1 databricks-feature-engineering==0.12.1 protobuf<5 cryptography<43
```

**Example - GOOD:**
```
%pip install mlflow>=3.10.1 databricks-langchain databricks-agents databricks-feature-engineering
```

### Common Errors and Fixes

1. **"couldn't get notebook for run... You probably did a run repair"**
   - Solution: Do a full re-run with `--force`
2. **"last job failed for demo X. Can't package"**
   - Solution: Fix the failing notebook, then re-run
3. **API rate limits (429 errors)**
   - The script auto-retries. If persistent, wait a few minutes.
4.
**"Couldn't pull the repo"** - Git conflicts in workspace. May need manual resolution. 5. **Cluster not running** - Use `python ai_release/run_remote.py --start-cluster --wait-for-cluster` 6. **pip install CalledProcessError with protobuf/cryptography conflicts** - Remove `protobuf<5` and `cryptography<43` constraints - Remove pinned versions like `databricks-feature-engineering==0.12.1` - See "Package Versioning Rules" above --- ## Files Reference - `ai_release/inspect_jobs.py` - Job inspection CLI (auto-extracts errors from notebooks) - `ai_release/jobs.py` - Job inspection library (uses Databricks SDK) - `ai_release/bundle.py` - Main CLI for bundling - `ai_release/run_remote.py` - Remote code execution CLI - `ai_release/compute.py` - Remote execution library - `ai_release/run_state.py` - Persistent state tracking for runs - `ai_release/runs/` - Directory containing run state (gitignored) - `dbdemos/job_bundler.py` - Job creation and execution - `dbdemos/packager.py` - Packaging executed notebooks - `local_conf_E2TOOL.json` - Local configuration (gitignored) ## SDK Documentation Databricks SDK for Python: https://databricks-sdk-py.readthedocs.io/en/latest/ - `../dbdemos-notebooks/` - Source notebooks repository ================================================ FILE: .cursorignore ================================================ bdemos/exceptions/__pycache__ __pycache__ local_conf_awsevent.json local_conf_cse2.json local_conf_gcp.json local_conf_ioannis.json local_conf.json local_conf* .eggs .DS_Store build dbdemos/minisite/ dbdemos/bundles dist *.egg-info/ dbdemos/__pycache__ .idea send_to_e2.sh test_package.py dbdemos/resources/local_conf.json field-demo venv databricks-demos.iml dist conf.json .DS_Store __pycache__ .idea config.json ================================================ FILE: .gitignore ================================================ dbdemos/exceptions/__pycache__ __pycache__ update-minisite-dbdemos-website.sh local_conf_awsevent.json 
local_conf_cse2.json local_conf_gcp.json local_conf_ioannis.json local_conf.json local_conf* .eggs .DS_Store build dbdemos/minisite/ dbdemos/bundles dist *.egg-info/ dbdemos/__pycache__ .idea send_to_e2.sh test_package.py dbdemos/resources/local_conf.json field-demo venv databricks-demos.iml local_conf_azure.json ================================================ FILE: .vscode/launch.json ================================================ { // Use IntelliSense to learn about possible attributes. // Hover to view descriptions of existing attributes. // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387 "version": "0.2.0", "configurations": [ { "name": "Python Debugger: Launch main", "type": "debugpy", "request": "launch", "program": "${workspaceFolder}/main.py", "console": "integratedTerminal" } ] } ================================================ FILE: .vscode/settings.json ================================================ { "python.testing.unittestArgs": [ "-v", "-s", "./test", "-p", "test_*.py" ], "python.testing.pytestEnabled": false, "python.testing.unittestEnabled": true, "python-envs.defaultEnvManager": "ms-python.python:conda", "python-envs.defaultPackageManager": "ms-python.python:conda", "python-envs.pythonProjects": [] } ================================================ FILE: CLAUDE.md ================================================ # CLAUDE.md This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. ## ⛔ CRITICAL: NEVER Release to PyPI **Claude Code must NEVER run `./build-and-distribute.sh` or release to PyPI.** Only the human maintainer can trigger a PyPI release. When demos are ready: 1. Report to the human that bundling is complete 2. 
Wait for the human to run the release script manually ## ⛔ CRITICAL: Never Commit Secrets **NEVER include PAT tokens, GitHub tokens, or any credentials in:** - Commits - PR descriptions - Tool outputs shown to user - Log messages Tokens are stored in `local_conf.json` (gitignored) or environment variables. ## CRITICAL: Do Not Modify Bundle and Minisite Directories **NEVER search, read, edit, or modify files under these directories unless explicitly asked:** - `dbdemos/bundles/` - Contains packaged demo bundles (generated artifacts) - `dbdemos/minisite/` - Contains generated minisite content These directories contain packaged/generated demo content that should only be modified through the bundling workflow (`job_bundler.py` → `packager.py`). Direct edits to these files will be overwritten during the next bundling process and can break demo installations. **Work on the source code in the core modules instead** (`installer.py`, `packager.py`, `job_bundler.py`, etc.) or on the source repository (`dbdemos-notebooks`). ## Project Overview `dbdemos` is a Python toolkit for installing and packaging Databricks demos. It automates deployment of complete demo environments including notebooks, Spark Declarative Pipeline (SDP) pipelines, DBSQL dashboards, workflows, ML models, and AI/BI Genie spaces. The project serves two main purposes: 1. **End-user library**: Users install demos via `pip install dbdemos` and call `dbdemos.install('demo-name')` 2. **Demo packaging system**: Maintainers package demos from source repositories (usually `dbdemos-notebooks`) into distributable bundles ## Architecture ### Core Components - **installer.py**: Main installation engine that deploys demos to Databricks workspaces - Creates clusters, SDP pipelines, workflows, dashboards, and ML models - Handles resource templating (replacing {{CURRENT_USER}}, {{DEMO_FOLDER}}, etc.) 
- Manages demo lifecycle from download to deployment - **job_bundler.py**: Manages the demo bundling workflow - Scans repositories for demos with `_resources/bundle_config` files - Executes pre-run jobs to generate notebook outputs - Tracks execution state and commit history to avoid redundant runs - **packager.py**: Packages demos into distributable bundles - Downloads notebooks (with or without pre-run results) - Extracts Lakeview dashboards from workspace - Processes notebook content (removes build tags, updates paths) - Generates minisite HTML for [dbdemos.ai](https://www.dbdemos.ai) - **dbdemos.py**: User-facing API layer providing `help()`, `list_demos()`, `install()` functions - **conf.py**: Configuration management including `DBClient` for Databricks REST API calls - **installer_*.py modules**: Specialized installers for different resource types: - `installer_workflows.py`: Job/workflow deployment - `installer_dashboard.py`: DBSQL dashboard installation - `installer_genie.py`: AI/BI Genie space setup - `installer_repos.py`: Repository management - **notebook_parser.py**: Parses and transforms notebook JSON/HTML content ### Demo Bundle Structure Each demo lives in `dbdemos/bundles/{demo-name}/` with: - `_resources/bundle_config`: JSON configuration defining demo metadata, notebooks, pipelines, workflows, dashboards - Notebook files (`.html` format, pre-run with cell outputs) - `_resources/dashboards/*.lvdash.json`: Dashboard definitions Bundle configs use template keys that get replaced during installation: - `{{CURRENT_USER}}`: Installing user's email - `{{CURRENT_USER_NAME}}`: Sanitized username - `{{DEMO_FOLDER}}`: Installation path - `{{DEMO_NAME}}`: Demo identifier - `{{TODAY}}`: Current date Demos are sourced from external repositories (typically `databricks-demos/dbdemos-notebooks`) and bundled into this package for distribution. 
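As a rough illustration of how these template keys could be substituted at install time, here is a hypothetical helper (the real logic lives in `installer.py`; the exact sanitization rule for `{{CURRENT_USER_NAME}}` is an assumption made for this sketch):

```python
import re
from datetime import date

# Hypothetical sketch of bundle-config template-key replacement.
# The key names come from the bundle documentation above; the
# username-sanitization rule is an illustrative assumption.
def replace_template_keys(content: str, user_email: str,
                          demo_name: str, demo_folder: str) -> str:
    replacements = {
        "{{CURRENT_USER}}": user_email,
        # Sanitized username: lowercased, non [a-z0-9_] chars become "_"
        "{{CURRENT_USER_NAME}}": re.sub(r"[^a-z0-9_]", "_",
                                        user_email.split("@")[0].lower()),
        "{{DEMO_FOLDER}}": demo_folder,
        "{{DEMO_NAME}}": demo_name,
        "{{TODAY}}": date.today().isoformat(),
    }
    for key, value in replacements.items():
        content = content.replace(key, value)
    return content
```

For instance, `replace_template_keys("/Users/{{CURRENT_USER}}/{{DEMO_NAME}}", "jane@corp.com", "delta-lake", "/demos")` would yield `/Users/jane@corp.com/delta-lake`.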
## Common Development Commands ### Building the Package ```bash # Build wheel distribution python setup.py clean --all bdist_wheel # Build script (used locally) ./build.sh ``` ### Testing ```bash # Run all tests pytest # Run specific test file pytest test/test_installer.py # Run specific test pytest test/test_installer.py::TestInstaller::test_method_name ``` ### Bundling Demos (Maintainer Workflow) Create a `local_conf.json` file with workspace credentials (see `local_conf_example.json`): ```json { "username": "user@example.com", "url": "https://workspace.cloud.databricks.com", "org_id": "1234567890", "pat_token": "dapi...", "repo_staging_path": "/Repos/user@example.com", "repo_name": "dbdemos-notebooks", "repo_url": "https://github.com/databricks-demos/dbdemos-notebooks", "branch": "master", "github_token": "ghp_..." } ``` Then use `main.py` to bundle demos: ```python from dbdemos.job_bundler import JobBundler from dbdemos.packager import Packager bundler = JobBundler(conf) bundler.reset_staging_repo(skip_pull=False) bundler.add_bundle("product_demos/delta-lake") # or use load_bundles_conf() to discover all bundler.start_and_wait_bundle_jobs(force_execution=False) packager = Packager(conf, bundler) packager.package_all() ``` See `test_demo.py` for a complete bundling example. ### Distribution and Release ```bash # Full release process (bumps version, builds, uploads to PyPI, creates GitHub releases) ./build-and-distribute.sh ``` This script: 1. Verifies GitHub CLI authentication and repository access 2. Auto-increments version in `setup.py` and `dbdemos/__init__.py` 3. Builds wheel package 4. Uploads to PyPI via `twine` 5. Creates release branch and pull request 6. 
Creates GitHub releases on multiple repositories (`dbdemos`, `dbdemos-notebooks`, `dbdemos-dataset`, `dbdemos-resources`) ## Key Implementation Details ### Dynamic Link Replacement Notebooks contain special attributes in HTML links that get replaced during installation: - `dbdemos-pipeline-id="pipeline-id"`: Links to SDP pipelines - `dbdemos-workflow-id="workflow-id"`: Links to workflows - `dbdemos-dashboard-id="dashboard-id"`: Links to dashboards The installer updates these links with actual resource IDs/URLs after creation. ### Resource Creation Flow 1. Parse bundle configuration 2. Create/update Git repo if specified 3. Create demo cluster (with auto-termination) 4. Install notebooks to workspace 5. Create SDP pipelines 6. Create workflows 7. Create DBSQL dashboards 8. Create Genie spaces (for AI/BI demos) 9. Update notebook links to point to created resources 10. Track installation metrics ### Cluster Configuration Default cluster configs are in `dbdemos/resources/`: - `default_cluster_config.json`: Standard demo cluster - `default_test_job_conf.json`: Job cluster configuration - Cloud-specific variants for AWS/Azure/GCP Demos can override cluster settings in their bundle config under the `cluster` key. ### Multi-Cloud Support The project supports AWS, Azure, and GCP. Cloud-specific configurations include: - Instance type selection - Storage paths (S3/ADLS/GCS) - Authentication mechanisms - DBR version selection Cloud is detected automatically from workspace or specified via `cloud` parameter in `install()`. ### Serverless Support Some demos support serverless compute. 
Set `serverless=True` when installing to use: - Serverless SDP pipelines - Serverless SQL warehouses - Serverless notebooks (where supported) ## Testing Considerations - Tests use local configuration files (see `local_conf_*.json` examples) - Tests require a Databricks workspace with appropriate permissions - Most tests are in the `test/` directory - `test_demo.py` in root is for bundling workflow testing ## Data Collection By default, dbdemos collects usage metrics (views, installations) to improve demo quality. This can be disabled by setting `Tracker.enable_tracker = False` in `tracker.py`. No PII is collected; only aggregate usage data and org IDs. ## Important Constraints - Users need cluster creation, SDP pipeline creation, and DBSQL dashboard permissions - Unity Catalog demos require a UC metastore - Some demos have resource quotas (compute, storage) - Pre-run notebooks require job execution in staging workspace - Dashboard API has rate limits (especially on GCP workspaces) ## Claude Code Release Workflow For the full release workflow, use the `/release` command which provides detailed instructions. 
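The dynamic link replacement described above under Key Implementation Details could be sketched roughly as follows. This is a simplified, hypothetical regex-based helper, not dbdemos' actual implementation (which lives in `installer.py` and the notebook parser):

```python
import re

# Simplified sketch: rewrite the href of <a> tags carrying a
# dbdemos-<kind>-id attribute (kind: "pipeline", "workflow", "dashboard"),
# pointing them at the resource URLs created during installation.
# Illustrative only - the real installer logic differs.
def rewrite_demo_links(html: str, kind: str, resource_urls: dict) -> str:
    pattern = re.compile(
        rf'(<a[^>]*dbdemos-{kind}-id="(?P<rid>[^"]+)"[^>]*href=")[^"]*(")'
    )
    def repl(match):
        url = resource_urls.get(match.group("rid"), "#")
        return match.group(1) + url + match.group(3)
    return pattern.sub(repl, html)
```

Given `<a dbdemos-pipeline-id="churn" href="#">Open pipeline</a>` and a mapping `{"churn": "/pipelines/abc123"}`, the sketch rewrites the `href` while leaving the tagging attribute in place so the link can be re-targeted on reinstall.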
### Quick Reference

**Remote Execution** (`ai_release/run_remote.py`) - Test fixes before committing:

```bash
# List clusters
python ai_release/run_remote.py --list-clusters

# Start/check cluster
python ai_release/run_remote.py --start-cluster --wait-for-cluster

# Execute code
python ai_release/run_remote.py --code "print(spark.version)"

# Execute SQL
python ai_release/run_remote.py --code "SELECT 1" --language sql
```

**Job Inspection** (`ai_release/inspect_jobs.py`) - Auto-extracts errors from notebooks:

```bash
# List all jobs with status
python ai_release/inspect_jobs.py --list

# List only failed jobs
python ai_release/inspect_jobs.py --list --failed-only

# Get details for a demo (auto-fetches errors if failed)
python ai_release/inspect_jobs.py --demo <demo_name>

# Show full error traces and code
python ai_release/inspect_jobs.py --demo <demo_name> --errors
```

**Bundle CLI** (`ai_release/bundle.py`):

```bash
# Check demo status
python ai_release/bundle.py --demo <demo_name> --status

# Bundle from main
python ai_release/bundle.py --demo <demo_name>

# Bundle from feature branch
python ai_release/bundle.py --demo <demo_name> --branch <branch_name>

# Repair failed job (quick iteration)
python ai_release/bundle.py --demo <demo_name> --repair

# Force full re-run
python ai_release/bundle.py --demo <demo_name> --force

# Bundle all demos
python ai_release/bundle.py --all
```

**Fix Workflow Summary**:
1. Inspect errors: `python ai_release/inspect_jobs.py --demo <demo_name> --errors`
2. Test fix interactively: `python ai_release/run_remote.py --code "..."`
3. Create fix branch in `../dbdemos-notebooks`: `ai-fix-<demo>-<description>`
4. Make fix, commit, push
5. Test: `--branch ai-fix-... --force`
6. If fails, iterate with `--repair` or `--force`
7. Create PR when green
8. Human merges PR
9. Final verification from main: `--force`
10.
Human runs `./build-and-distribute.sh` **GitHub CLI Account Switch** - Use the public account for PRs: ```bash # List accounts gh auth status # Switch to public account (for creating PRs on public repos) gh auth switch --user QuentinAmbard # Switch to enterprise account (quentin-ambard_data is EMU, can't create PRs on public repos) gh auth switch --user quentin-ambard_data ``` **PR Status Verification** - Always check PR state before assuming: ```bash # Check if PR is merged before running tests from main gh pr view --json state,mergedAt # Never assume a PR is not merged - always verify first ``` ### Environment - **Workspace**: `https://e2-demo-tools.cloud.databricks.com/` - **Config**: `local_conf_E2TOOL.json` (primary) or environment variables - **Notebooks repo**: `../dbdemos-notebooks` (configurable) - **Test cluster**: Matches `cluster_name_pattern` in config (default: "quentin") ### Key Files - `ai_release/inspect_jobs.py` - Job inspection CLI (auto-extracts errors) - `ai_release/jobs.py` - Job inspection library (uses Databricks SDK) - `ai_release/bundle.py` - Bundle CLI for demo packaging - `ai_release/run_remote.py` - Remote code execution on clusters - `ai_release/compute.py` - Remote execution library - `.claude/commands/release.md` - Full release workflow documentation - `dbdemos/job_bundler.py` - Job creation and execution logic - `dbdemos/packager.py` - Packaging logic - `local_conf.json` - Local configuration (gitignored, contains secrets) ### Databricks SDK All Databricks API operations use the Python SDK: https://databricks-sdk-py.readthedocs.io/en/latest/ ================================================ FILE: LICENSE ================================================ Copyright (2022) Databricks, Inc. This library (the "Software") may not be used except in connection with the Licensee's use of the Databricks Platform Services pursuant to an Agreement (defined below) between Licensee (defined below) and Databricks, Inc. ("Databricks"). 
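As an illustration of the kind of cell-level error extraction `inspect_jobs.py` performs on exported notebook HTML, here is a heavily simplified sketch. The `<div class="ansiout">` structure is an assumption made for this example, not the actual Databricks export format:

```python
import re

# Hypothetical sketch: scan exported notebook HTML for cell output blocks
# containing a traceback. The tag/class names are illustrative assumptions;
# real exports differ, and inspect_jobs.py handles the actual format.
def extract_cell_errors(notebook_html: str) -> list:
    errors = []
    cell_outputs = re.finditer(r'<div class="ansiout">(.*?)</div>',
                               notebook_html, re.S)
    for cell_index, match in enumerate(cell_outputs):
        output = match.group(1)
        if "Traceback" in output or "Error:" in output:
            errors.append({"cell": cell_index, "error": output.strip()})
    return errors
```

The idea is the same as the real tool's: walk the exported cell outputs in order, keep only the ones that look like failures, and report them with their cell position so the failing code can be located quickly.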
The Object Code version of the Software shall be deemed part of the Downloadable Services under the Agreement, or if the Agreement does not define Downloadable Services, Subscription Services, or if neither are defined then the term in such Agreement that refers to the applicable Databricks Platform Services (as defined below) shall be substituted herein for “Downloadable Services.” Licensee's use of the Software must comply at all times with any restrictions applicable to the Downloadable Services and Subscription Services, generally, and must be used in accordance with any applicable documentation. For the avoidance of doubt, the Software constitutes Databricks Confidential Information under the Agreement.

Additionally, and notwithstanding anything in the Agreement to the contrary: THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

You may view, make limited copies of, and may compile the Source Code version of the Software into an Object Code version of the Software. For the avoidance of doubt, you may not make derivative works of the Software (or make any changes to the Source Code version of the Software) unless you have agreed to separate terms with Databricks permitting such modifications (e.g., a contribution license agreement). If you have not agreed to an Agreement or otherwise do not agree to these terms, you may not use the Software or view, copy or compile the Source Code of the Software.

This license terminates automatically upon the termination of the Agreement or Licensee's breach of these terms.
Additionally, Databricks may terminate this license at any time on notice. Upon termination, you must permanently delete the Software and all copies thereof (including the Source Code). Agreement: the agreement between Databricks and Licensee governing the use of the Databricks Platform Services, which shall be, with respect to Databricks, the Databricks Terms of Service located at www.databricks.com/termsofservice, and with respect to Databricks Community Edition, the Community Edition Terms of Service located at www.databricks.com/ce-termsofuse, in each case unless Licensee has entered into a separate written agreement with Databricks governing the use of the applicable Databricks Platform Services. Databricks Platform Services: the Databricks services or the Databricks Community Edition services, according to where the Software is used. Licensee: the user of the Software, or, if the Software is being used on behalf of a company, the company. Object Code: the version of the Software produced when an interpreter or a compiler translates the Source Code into recognizable and executable machine code. Source Code: the human-readable portion of the Software. ================================================ FILE: MANIFEST.in ================================================ recursive-include dbdemos/bundles * recursive-include dbdemos/template * recursive-include dbdemos/resources * ================================================ FILE: NOTICE ================================================ Copyright (2022) Databricks, Inc. ## License This Software includes software developed at Databricks (https://www.databricks.com/) and its use is subject to the included LICENSE file.
This Software contains code from the following open source projects, licensed under the Apache 2.0 license: psf/requests - https://github.com/psf/requests Copyright 2019 Kenneth Reitz ## Data collection To improve the user experience and dbdemos asset quality, dbdemos reports usage and captures views in the installed notebooks (usually in the first cell) and in other assets like dashboards. This information is captured for product improvement only and not for marketing purposes, and doesn't contain any PII. By using `dbdemos` and the assets it provides, you consent to this data collection. If you wish to disable it, you can set `Tracker.enable_tracker` to False in the `tracker.py` file. ## Resource creation To simplify your experience, `dbdemos` will create and start resources for you. For example, a demo could start (non-exhaustive list): - A cluster to run your demo - A Delta Live Table Pipeline to ingest data - A DBSQL endpoint to run DBSQL dashboard - An ML model While `dbdemos` does its best to limit the consumption and enforce resource auto-termination, you remain responsible for the resources created and the associated consumption. ## Catalog/Database created dbdemos will try to create catalogs & databases (schemas). Demos use the hive_metastore or UC catalogs. dbdemos will try to use the dbdemos catalog when possible. Permissions / ownership can be granted to all users (account users) in these datasets. ## Support Databricks does not offer official support for `dbdemos` and the associated assets. ================================================ FILE: README.md ================================================ # dbdemos DBDemos is a toolkit to easily install Lakehouse demos for Databricks. **Looking for the dbdemos notebooks and content?** Access [https://github.com/databricks-demos/dbdemos-notebooks](https://github.com/databricks-demos/dbdemos-notebooks). Simply deploy & share demos on any workspace.
dbdemos is packaged with a list of demos: - Lakehouse, end-to-end demos (ex: Lakehouse Retail Churn) - Product demos (ex: Delta Live Table, CDC, ML, DBSQL Dashboard, MLOps...) **Please visit [dbdemos.ai](https://www.dbdemos.ai) to explore all our demos.** ## Installation **Do not clone the repo, just pip install the dbdemos wheel:** ``` %pip install dbdemos ``` ## Usage within Databricks See [demo video](https://drive.google.com/file/d/12Iu50r7hlawVN01eE_GoUKBQ4kvUrR56/view?usp=sharing) ``` import dbdemos dbdemos.help() dbdemos.list_demos() dbdemos.install('lakehouse-retail-c360', path='./', overwrite = True) ``` ![Dbdemos install](https://github.com/databricks-demos/dbdemos/raw/main/resources/dbdemos-screenshot.png) ## Requirements `dbdemos` requires the current user to have: * Cluster creation permission * SDP Pipeline creation permission * DBSQL dashboard & query creation permission * For UC demos: a Unity Catalog metastore must be available (otherwise the demo will be installed but won't work) ## Features * Load demo notebooks (pre-run) to the given path * Start a job to load datasets based on demo requirements * Start a demo cluster customized for the demo & the current user * Set up SDP pipelines * Set up DBSQL dashboards * Create ML models * Demo links are updated with the created resources for easy navigation ## Feedback Demo not working? Can't use dbdemos? Please open a GitHub issue.
Make sure you mention the name of the demo. # DBDemos Developer options ## Adding an AI/BI demo to dbdemos Open [README_AIBI.md](README_AIBI.md) for more details on how to contribute & add an AI/BI demo. Read the following if you want to add a new demo bundle. ## Packaging a demo with dbdemos Your demo must contain a `_resources` folder where you include all initialization scripts and your bundle configuration file. ### Links & tags dbdemos will dynamically override the links to point to the created resources. **Always use links relative to the local path to support multiple workspaces. Do not add the workspace id.** #### SDP pipelines: Your SDP pipeline must be added in the bundle file (see below). Within your notebook, to identify your pipeline using the id in the bundle file, specify the id `dbdemos-pipeline-id="<id>"` as follows: `<a dbdemos-pipeline-id="sdp-cdc" href="#joblist/pipelines/xxxx">Spark Declarative Pipeline</a>` #### Workflows: Your workflows must be added in the bundle file (see below). Within your notebook, to identify your workflow using the id in the bundle file, specify the id `dbdemos-workflow-id="<id>"` as follows: `<a dbdemos-workflow-id="credit-job" href="#job/xxxx">Access your workflow</a>` #### DBSQL dashboards: Similar to workflows, your dashboard id must match the one in the bundle file. Dashboard definitions should be added to the _dashboards folder (make sure the file name matches the dashboard id: `churn-prediction.lvdash.json`). `<a dbdemos-dashboard-id="churn-prediction" href="xxxx">Churn Analysis Dashboard</a>` ### bundle_config The demo must contain a `./_resources/bundle_config` file containing your bundle definition. This needs to be a notebook & not a .json file (due to a current API limitation).
```json { "name": "<name>", "category": "<category>", "title": "<Title>.", "description": "<Description>", "bundle": <Will bundle when True, skip when False>, "tags": [{"sdp": "Spark Declarative Pipeline"}], "notebooks": [ { "path": "<notebook path from the demo folder (ex: resources/00-load-data)>", "pre_run": <Will start a job to run it before packaging to get the cells results>, "publish_on_website": <Will add the notebook in the public website (with the results if it's pre_run=True)>, "add_cluster_setup_cell": <if True, add a cell with the name of the demo cluster>, "title": "<Title>", "description": "<Description (will be in minisite also)>", "parameters": {"<key>": "<value. Will be sent to the pre_run job>"} } ], "init_job": { "settings": { "name": "demos_sdp_cdc_init_{{CURRENT_USER_NAME}}", "email_notifications": { "no_alert_for_skipped_runs": False }, "timeout_seconds": 0, "max_concurrent_runs": 1, "tasks": [ { "task_key": "init_data", "notebook_task": { "notebook_path": "{{DEMO_FOLDER}}/_resources/01-load-data-quality-dashboard", "source": "WORKSPACE" }, "job_cluster_key": "Shared_job_cluster", "timeout_seconds": 0, "email_notifications": {} } ] .... Full standard job definition } }, "pipelines": <list of SDP pipelines if any> [ { "id": "sdp-cdc", <id, used in the notebook links to go to the generated notebook: <a dbdemos-pipeline-id="sdp-cdc" href="#joblist/pipelines/xxxx">installed SDP pipeline</a> > "run_after_creation": True, "definition": { ... Any SDP pipeline configuration... "libraries": [ { "notebook": { "path": "{{DEMO_FOLDER}}/_resources/00-Data_CDC_Generator" } } ], "name": "demos_sdp_cdc_{{CURRENT_USER_NAME}}", "storage": "/demos/sdp/cdc/{{CURRENT_USER_NAME}}", "target": "demos_sdp_cdc_{{CURRENT_USER_NAME}}" } } ], "workflows": [{ "start_on_install": False, "id": "credit-job", "definition": { "settings": { ...
full pipeline settings } }], "dashboards": [{"name": "[dbdemos] Retail Churn Prediction Dashboard", "id": "churn-prediction"}] } ``` dbdemos will replace the values defined as {{<KEY>}} based on who installs the demo. Supported keys: * TODAY * CURRENT_USER (email) * CURRENT_USER_NAME (derived from email) * DEMO_NAME * DEMO_FOLDER # DBDemo Installer configuration The following describes how to package the demos created. The installer needs to fetch data from a workspace & start jobs. To do so, it requires the information in `local_conf.json`: ```json { "pat_token": "xxx", "username": "xx.xx@databricks.com", "url": "https://xxx.databricks.com", "repo_staging_path": "/Repos/xx.xx@databricks.com", "repo_name": "dbdemos-notebooks", "repo_url": "https://github.com/databricks-demos/dbdemos-notebooks.git", # put your clone here "branch": "master", "current_folder": "<Used to mock the current folder outside of a notebook, ex: /Users/quentin.ambard@databricks.com/test_install_demo>" } ``` ### Creating the bundles: ```python bundler = JobBundler(conf) # the bundler will use a staging repo dir in the workspace to analyze & run content. bundler.reset_staging_repo(skip_pull=False) # Discover bundles from repo: bundler.load_bundles_conf() # Or manually add bundle to run faster: #bundler.add_bundle("product_demos/Auto-Loader (cloudFiles)") # Run the jobs (only if there is a new commit since the last time, or failure, or force execution) bundler.start_and_wait_bundle_jobs(force_execution = False) packager = Packager(conf, bundler) packager.package_all() ``` ## License See LICENSE file. ## Data collection To improve the user experience and dbdemos asset quality, dbdemos reports usage and captures views in the installed notebooks (usually in the first cell) and dashboards. This information is captured for product improvement only and not for marketing purposes, and doesn't contain any PII. By using `dbdemos` and the assets it provides, you consent to this data collection.
If you wish to disable it, you can set `Tracker.enable_tracker` to False in the `tracker.py` file. ## Resource creation To simplify your experience, `dbdemos` will create and start resources for you. For example, a demo could start (non-exhaustive list): - A cluster to run your demo - A Delta Live Table Pipeline to ingest data - A DBSQL endpoint to run DBSQL dashboard - An ML model While `dbdemos` does its best to limit the consumption and enforce resource auto-termination, you remain responsible for the resources created and the associated consumption. ## Support Databricks does not offer official support for `dbdemos` and the associated assets. For any issue with `dbdemos` or the demos installed, please open an issue and the demo team will have a look on a best-effort basis. ================================================ FILE: README_AIBI.md ================================================ # Adding an AI-BI demo to dbdemos *Note: Adding new content from external contributors requires special terms approval. Please open an issue if you'd like to contribute and are not part of the Databricks team.* *Note: if you're part of the Databricks team, please reach out on the demo team Slack channel before starting the process, for alignment and to avoid duplicating work.* ## Fork dbdemos-notebooks The actual AI-BI demo content is in the [dbdemos-notebooks repository](https://github.com/databricks-demos/dbdemos-notebooks). Start by forking the repository and creating a new branch there with your changes. ## Create the demo Start by creating your dataset (it must be entirely crafted/generated with DBRX to avoid any license issues), dashboard and genie space. Once you're ready, add your dbdemos to the [aibi folder](https://github.com/databricks-demos/dbdemos-notebooks/tree/main/aibi). For that, clone the [aibi-marketing-campaign folder](https://github.com/databricks-demos/dbdemos-notebooks/tree/main/aibi/aibi-marketing-campaign) and replace the content with your own.
Make sure the folder name follows the pattern `aibi-<use-case>`. ## Data Transformation and Table Structure ### Start with your story first Think about what would make a good Dashboard+Genie combination. Ideally you want to show some business outcome in the dashboard where you see a spike somewhere; you then open Genie to ask a follow-up question. ### Dataset Once your story is ready, work backward to generate your dataset. Think about the gold tables required, and then the raw datasets that you'll clean to create these tables. **Your dataset must be entirely crafted with tools like faker / DBRX. Double check any dataset license. Add a NOTICE file in your dataset folder explaining where the data is coming from / how it was created.** Datasets are stored in the [dbdemos-datasets repository](https://github.com/databricks-demos/dbdemos-datasets), and then mirrored in the dbdemos-dataset S3 bucket. Fork this repository and add your data in the `aibi` folder. ### Defining the dbdemos genie room setup All the configuration should go in the bundle file. See this example: [https://github.com/databricks-demos/dbdemos-notebooks/blob/main/aibi/aibi-marketing-campaign/_resources/bundle_config.py](https://github.com/databricks-demos/dbdemos-notebooks/blob/main/aibi/aibi-marketing-campaign/_resources/bundle_config.py) Here is what your bundle should look like: ```json { "name": "aibi-marketing-campaign", "category": "AI-BI", "title": "AI/BI: Marketing Campaign effectiveness", "custom_schema_supported": True, "default_catalog": "main", "default_schema": "dbdemos_aibi_cme_marketing_campaign", "description": "Analyze your Marketing Campaign effectiveness leveraging AI/BI Dashboard.
Deep dive into your data and metrics, asking plain question through Genie Room.", "bundle": True, "notebooks": [ { "path": "AI-BI-Marketing-campaign", "pre_run": False, "publish_on_website": True, "add_cluster_setup_cell": False, "title": "AI BI: Campaign effectiveness", "description": "Discover Databricks Intelligence Data Platform capabilities." } ], "init_job": {}, "cluster": {}, "pipelines": [], "dashboards": [{"name": "[dbdemos] AI/BI - Marketing Campaign", "id": "web-marketing"} ], "data_folders":[ {"source_folder":"aibi/dbdemos_aibi_cme_marketing_campaign/raw_campaigns", "source_format": "parquet", "target_volume_folder":"raw_campaigns", "target_format":"parquet"}], "sql_queries": [ [ "CREATE OR REPLACE TABLE `{{CATALOG}}`.`{{SCHEMA}}`.raw_campaigns TBLPROPERTIES (delta.autooptimize.optimizewrite = TRUE, delta.autooptimize.autocompact = TRUE ) COMMENT 'This is the bronze table for campaigns created from parquet files' AS SELECT * FROM read_files('/Volumes/{{CATALOG}}/{{SCHEMA}}/dbdemos_raw_data/raw_campaigns', format => 'parquet', pathGlobFilter => '*.parquet')" ], [ "... queries in here will be executed in parallel", " ... (don't forget to add comments) on the table, and PK/FK" ], ["CREATE OR REPLACE FUNCTION {{CATALOG}}.{{SCHEMA}}.my_ai_forecast(input_table STRING, target_column STRING, time_column STRING, periods INT) RETURN TABLE ..."] ], "genie_rooms":[ { "id": "marketing-campaign", "display_name": "DBDemos - AI/BI - Marketing Campaign", "description": "Analyze your Marketing Campaign effectiveness leveraging AI/BI Dashboard. 
Deep dive into your data and metrics.", "table_identifiers": ["{{CATALOG}}.{{SCHEMA}}.campaigns", "..."], "sql_instructions": [ { "title": "Compute rolling metrics", "content": "select date, unique_clicks, sum(unique_clicks) OVER (ORDER BY date RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS clicks_t7d, sum(total_delivered) OVER (ORDER BY date RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS delivered_t7d, sum(unique_clicks) OVER (ORDER BY date RANGE BETWEEN 27 PRECEDING AND CURRENT ROW) AS clicks_t28d, sum(total_delivered) OVER (ORDER BY date RANGE BETWEEN 27 PRECEDING AND CURRENT ROW) AS delivered_t28d, sum(unique_clicks) OVER (ORDER BY date RANGE BETWEEN 90 PRECEDING AND CURRENT ROW) AS clicks_t91d, sum(total_delivered) OVER (ORDER BY date RANGE BETWEEN 90 PRECEDING AND CURRENT ROW) AS delivered_t91d, unique_clicks / total_delivered as ctr, total_delivered / total_sent AS delivery_rate, total_optouts / total_delivered AS optout_rate, total_spam / total_delivered AS spam_rate, clicks_t7d / delivered_t7d as ctr_t7d, clicks_t28d / delivered_t28d as ctr_t28d, clicks_t91d / delivered_t91d as ctr_t91d from {{CATALOG}}.{{SCHEMA}}.metrics_daily_rolling" } ], "instructions": "If a customer asks for a forecast, leverage the SQL function ai_forecast", "function_names": [ "{{CATALOG}}.{{SCHEMA}}.my_ai_forecast" ], "curated_questions": [ "How has the total number of emails sent, delivered, and the unique clicks evolved over the last six months?", "..." ] } ] } ``` ### Data Loading and Transformation AIBI demos should start with raw data files in a volume and implement a few transformation steps to showcase data lineage. This helps demonstrate the end-to-end data workflow and provides a more comprehensive view of Databricks' capabilities. **Important:** Avoid using Materialized Views (MVs) for transformations as they can slow down the dbdemos installation process. Instead, use standard SQL transformations in your demo for now (we'll revisit soon). Example transformation flow: 1.
Start with raw data in a volume, typically 3+ sources 2. Create bronze table(s) directly from the volume files (~3+ tables) 3. [optional] Create silver table(s) with basic transformations (cleaning, type conversion, etc.) 4. Create gold table(s) with business-specific transformations and potentially a few joins (we want to keep at least 2 or 3 tables in the genie room) ### Gold Table Requirements - Gold tables (used in the Genie room) should have PK and FK defined - Gold tables should include comprehensive comments on all fields. This improves the Genie experience by providing context for each column and helps users understand the data model. Example gold table creation with comments directly in the CREATE statement: ```sql CREATE OR REPLACE TABLE {{CATALOG}}.{{SCHEMA}}.customer_gold ( id STRING NOT NULL COMMENT 'Unique customer identifier' PRIMARY KEY, first_name STRING COMMENT 'Customer first name', last_name STRING COMMENT 'Customer last name', email STRING COMMENT 'Customer email address', signup_date DATE COMMENT 'Date when customer created their account', last_activity_date DATE COMMENT 'Most recent date of customer activity', customer_segment STRING COMMENT 'Customer segmentation category (New, Loyal, At-Risk, Churned)', lifetime_value DOUBLE COMMENT 'Calculated total customer spend in USD' ) AS SELECT id, first_name, last_name, email, signup_date, last_activity_date, customer_segment, lifetime_value FROM {{CATALOG}}.{{SCHEMA}}.customer_silver; ``` This approach is more concise and ensures all column comments are created in a single SQL statement. ## SQL AI Functions We need help implementing SQL AI Functions in the installer_genie.py file and the JSON configuration. These functions enhance the AI capabilities of the Genie room and enable more sophisticated queries. The AI functions should be added as part of the SQL statement. Don't forget to add comments on them (at the function level and function param level).
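As an illustration of the expected commenting style, a function for the Genie room could look like the sketch below. This is a hypothetical example only: the function name, table, and columns (`revenue_per_segment`, `orders_gold`, `customer_segment`, `amount`) are made up for illustration; adapt them to your use case.

```sql
-- Hypothetical example: a commented table function for the Genie room.
-- Comments at the function level and on each parameter help Genie
-- understand when and how to call the function.
CREATE OR REPLACE FUNCTION {{CATALOG}}.{{SCHEMA}}.revenue_per_segment(
  segment STRING COMMENT 'Customer segment to filter on (e.g. New, Loyal, At-Risk, Churned)'
)
RETURNS TABLE (
  month DATE COMMENT 'First day of the month',
  revenue DOUBLE COMMENT 'Total revenue in USD for the month'
)
COMMENT 'Returns the monthly revenue for the given customer segment'
RETURN
  SELECT DATE_TRUNC('MONTH', order_date) AS month, SUM(amount) AS revenue
  FROM {{CATALOG}}.{{SCHEMA}}.orders_gold
  WHERE customer_segment = segment
  GROUP BY ALL;
```

A function defined this way can then be listed under `function_names` in the genie room configuration, as shown in the bundle example above.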
Once created, you can add them to the genie room under `"function_names": ["{{CATALOG}}.{{SCHEMA}}.ai_forecast", "..."]` **Note: this isn't yet implemented. If you're interested in contributing to DBDemos, reach out to the demo team. The implementation should go in the `InstallerGenie` class to create these functions during demo installation and make them available in the Genie room.** ## Update the Main notebook Update the notebook cloned from the folder above with your use-case. ### Present the use-case Rename & update the main notebook, detailing your use-case, the data, and the insights you want to show. ### Update tracking Update the demo name in the tracker pixel in the first cell, and the notebook name. ### Update the dashboard link Put your dashboard in the dashboards folder. In the dashboard json, make sure you use the same catalog and schema as the ones in the bundle configuration file, typically `main.dbdemos_aibi_xxxxxx` You can then reference the dashboard like this: ```html <a dbdemos-dashboard-id="web-marketing" href='/sql/dashboardsv3/02ef00cc36721f9e1f2028ee75723cc1' target="_blank">your dashboard</a> ``` The ID here `web-marketing` must match the ID in the bundle configuration (and the dashboard file name): ``` "dashboards": [{"name": "[dbdemos] AI/BI - Marketing Campaign", "id": "web-marketing"}] ``` ### Update the Genie Room link ```html <a dbdemos-genie-id="marketing-campaign" href='/genie/rooms/01ef775474091f7ba11a8a9d2075eb58' target="_blank">your genie space</a> ``` The ID here `marketing-campaign` must match the ID in the bundle configuration: ``` "genie_rooms":[ { "id": "marketing-campaign", "display_name": "DBDemos - AI/BI - Marketing Campaign", ... } ] ``` ## Update the bundle configuration - Make sure the dashboard ID and genie room ID match your links as above. The dashboard ID must match the dashboard file name in the dashboards folder (see below).
- Keep `default_catalog` set to `main`; `default_schema` should follow the naming convention `dbdemos_aibi_<industry>_<use-case>`. - Make sure you add a SQL instruction in the genie room, with curated questions, descriptions and an instruction. - Dataset folders must be the path in the dbdemos-datasets repository (see below). ## Add your dashboard under the dashboards folder - Create a JSON file in the dashboards folder. Make sure you format it correctly so it's easy to read/diff. - Your `catalog.schema` in the queries must match the `default_catalog` and `default_schema` in the bundle configuration. - Your dashboard name must match the id in the bundle configuration. - Don't forget to update the dashboard tracking (add the tracker in the MD at the end of the dashboard, match the demo name) ## Add your images Images are stored in the [dbdemos-resources repository](https://github.com/databricks-demos/dbdemos-resources). To add an image, fork the repo and send a PR. You need at least 2 images: - the miniature for the demo list: `https://github.com/databricks-demos/dbdemos-resources/raw/main/icon/<demo_name>.jpg` - the screenshot of the dashboard: `https://www.dbdemos.ai/assets/img/dbdemos/<demo_name>-dashboard-0.png` (1 per dashboard you add) Reach out to the demo team for a demo miniature: https://www.dbdemos.ai/assets/img/dbdemos/aibi-marketing-campaign-dashboard-0.png https://github.com/databricks-demos/dbdemos-resources/raw/main/icon/aibi-marketing-campaign.jpg # Packaging & testing your demo Open the `test_demo.py` file. Update the conf to match your dbdemos-notebooks repo fork/branch in the config json file. dbdemos needs a workspace and a repo to package the demo; make sure you configure it in the conf json file (use your fork).
Make sure you update the bundle folder to match your demo: ``` bundle(conf, "aibi/aibi-marketing-campaign") ``` ================================================ FILE: SECURITY.md ================================================ # Security Policy ## Reporting a Vulnerability Please email bugbounty@databricks.com to report any security vulnerabilities. We will acknowledge receipt of your vulnerability and strive to send you regular updates about our progress. If you're curious about the status of your disclosure please feel free to email us again. If you want to encrypt your disclosure email, you can use [this PGP key](https://keybase.io/arikfr/key.asc). ================================================ FILE: ai_release/__init__.py ================================================ """ AI Release Tools for DBDemos This module provides tools for Claude Code to: 1. Execute code remotely on Databricks clusters for testing 2. Bundle and test demos 3. Fix issues in dbdemos-notebooks CRITICAL: Never release to PyPI - only the human can do that! """ __version__ = "0.1.0" ================================================ FILE: ai_release/bundle.py ================================================ #!/usr/bin/env python3 """ DBDemos Bundle CLI - For bundling and testing demos This script is designed to be run by Claude Code for the release workflow. It supports bundling specific demos, running from feature branches, and job repair. 
Usage: # Bundle a specific demo from main branch python ai_release/bundle.py --demo lakehouse-retail-c360 # Bundle from a feature branch python ai_release/bundle.py --demo lakehouse-retail-c360 --branch fix/retail-bug # Bundle all demos (uses GitHub diff to only run changed ones) python ai_release/bundle.py --all # Repair a failed job (re-run only failed tasks) python ai_release/bundle.py --demo lakehouse-retail-c360 --repair # Force full re-run (ignore commit diff optimization) python ai_release/bundle.py --demo lakehouse-retail-c360 --force # Get job status and error details python ai_release/bundle.py --demo lakehouse-retail-c360 --status # List all available demos python ai_release/bundle.py --list-demos Environment Variables: DATABRICKS_HOST: Workspace URL (default: https://e2-demo-tools.cloud.databricks.com/) DATABRICKS_TOKEN: PAT token for Databricks GITHUB_TOKEN: GitHub token for API access DBDEMOS_NOTEBOOKS_PATH: Path to dbdemos-notebooks repo (default: ../dbdemos-notebooks) Config File: Can also use local_conf.json in the repo root for configuration. 
""" import argparse import json import os import sys from pathlib import Path # Add parent directory to path for imports sys.path.insert(0, str(Path(__file__).parent.parent)) from dbdemos.conf import Conf, DemoConf from dbdemos.job_bundler import JobBundler from dbdemos.packager import Packager def load_config(args): """Load configuration from environment variables and/or local_conf.json""" config = {} # Try to load from local_conf.json files (E2TOOL is primary for bundling) repo_root = Path(__file__).parent.parent conf_files = [ repo_root / "local_conf_E2TOOL.json", # Primary for bundling/testing repo_root / "local_conf.json", ] for conf_file in conf_files: if conf_file.exists(): with open(conf_file, "r") as f: config = json.load(f) print(f"Loaded config from {conf_file}") break # dbdemos-notebooks path default_notebooks_path = str(repo_root.parent / "dbdemos-notebooks") notebooks_path = os.environ.get("DBDEMOS_NOTEBOOKS_PATH", config.get("dbdemos_notebooks_path", default_notebooks_path)) config["dbdemos_notebooks_path"] = notebooks_path # Branch override from CLI if args.branch: config["branch"] = args.branch elif "branch" not in config: config["branch"] = "main" # Validate required fields required = ["pat_token", "github_token", "url"] missing = [f for f in required if not config.get(f)] if missing: print(f"ERROR: Missing required config: {missing}") print("Set via environment variables or local_conf.json") sys.exit(1) return config def load_cluster_templates(): """Load default cluster configuration templates""" repo_root = Path(__file__).parent.parent with open(repo_root / "dbdemos/resources/default_cluster_config.json", "r") as f: default_cluster_template = f.read() with open(repo_root / "dbdemos/resources/default_test_job_conf.json", "r") as f: default_cluster_job_template = f.read() return default_cluster_template, default_cluster_job_template def create_conf(config): """Create Conf object from config dict""" default_cluster_template, 
default_cluster_job_template = load_cluster_templates() # Strip .git from repo_url if present (Conf doesn't allow it) repo_url = config.get("repo_url", "https://github.com/databricks-demos/dbdemos-notebooks") if repo_url.endswith(".git"): repo_url = repo_url[:-4] return Conf( username=config.get("username", "claude-code@databricks.com"), workspace_url=config["url"], org_id=config.get("org_id", ""), pat_token=config["pat_token"], default_cluster_template=default_cluster_template, default_cluster_job_template=default_cluster_job_template, repo_staging_path=config.get("repo_staging_path", "/Repos/quentin.ambard@databricks.com"), repo_name=config.get("repo_name", "dbdemos-notebooks"), repo_url=repo_url, branch=config["branch"], github_token=config["github_token"], run_test_as_username=config.get("run_test_as_username", "quentin.ambard@databricks.com") ) def list_demos(bundler: JobBundler): """List all available demos""" print("Scanning for available demos...") bundler.reset_staging_repo(skip_pull=False) bundler.load_bundles_conf() print(f"\nFound {len(bundler.bundles)} demos:\n") for path, demo_conf in sorted(bundler.bundles.items()): print(f" - {demo_conf.name:<40} ({path})") return bundler.bundles def get_job_status(bundler: JobBundler, demo_name: str): """Get detailed job status for a demo""" # Find the demo bundler.reset_staging_repo(skip_pull=True) # Try to find the job job_name = f"field-bundle_{demo_name}" job = bundler.db.find_job(job_name) if not job: print(f"No job found for demo: {demo_name}") return None job_id = job["job_id"] print(f"\n{'='*80}") print(f"Job: {job_name}") print(f"Job ID: {job_id}") print(f"URL: {bundler.conf.workspace_url}/#job/{job_id}") print(f"{'='*80}\n") # Get recent runs runs = bundler.db.get("2.1/jobs/runs/list", {"job_id": job_id, "limit": 5, "expand_tasks": "true"}) if "runs" not in runs or len(runs["runs"]) == 0: print("No runs found for this job.") return None for i, run in enumerate(runs["runs"]): run_id = run["run_id"] state = 
run["state"] status = run.get("status", {}) print(f"\n--- Run {i+1}: {run_id} ---") print(f"State: {state.get('life_cycle_state', 'N/A')} / {state.get('result_state', 'N/A')}") print(f"URL: {bundler.conf.workspace_url}/#job/{job_id}/run/{run_id}") if "termination_details" in status: print(f"Termination: {status['termination_details']}") # Show task details for the most recent run if i == 0 and "tasks" in run: print(f"\nTasks ({len(run['tasks'])} total):") for task in run["tasks"]: task_key = task["task_key"] task_state = task.get("state", {}) task_result = task_state.get("result_state", "PENDING") # Get error info if failed error_info = "" if task_result == "FAILED": # Try to get run output for error details task_run_id = task.get("run_id") if task_run_id: task_output = bundler.db.get("2.1/jobs/runs/get-output", {"run_id": task_run_id}) if "error" in task_output: error_info = f"\n Error: {task_output['error'][:200]}..." if "error_trace" in task_output: error_info += f"\n Trace: {task_output['error_trace'][:500]}..." status_icon = "✓" if task_result == "SUCCESS" else "✗" if task_result == "FAILED" else "○" print(f" {status_icon} {task_key}: {task_result}{error_info}") return runs["runs"][0] if runs["runs"] else None def wait_for_run(bundler: JobBundler, job_id: int, run_id: int): """Wait for a job run to complete""" import time print(f"Waiting for job completion...") print(f"URL: {bundler.conf.workspace_url}/#job/{job_id}/run/{run_id}") i = 0 while True: run = bundler.db.get("2.1/jobs/runs/get", {"run_id": run_id}) state = run.get("state", {}) life_cycle = state.get("life_cycle_state", "UNKNOWN") if life_cycle not in ["RUNNING", "PENDING"]: result = state.get("result_state", "UNKNOWN") print(f"\nJob finished: {life_cycle} / {result}") return result == "SUCCESS" if i % 60 == 0: # Print every 5 minutes print(f" Still running... 
({i * 5}s elapsed)") i += 1 time.sleep(5) def repair_job(bundler: JobBundler, demo_name: str, wait: bool = False): """Repair a failed job (re-run only failed tasks)""" job_name = f"field-bundle_{demo_name}" job = bundler.db.find_job(job_name) if not job: print(f"No job found for demo: {demo_name}") return False job_id = job["job_id"] # Get the most recent run runs = bundler.db.get("2.1/jobs/runs/list", {"job_id": job_id, "limit": 1, "expand_tasks": "true"}) if "runs" not in runs or len(runs["runs"]) == 0: print("No runs found to repair.") return False latest_run = runs["runs"][0] run_id = latest_run["run_id"] # Check if run is in a repairable state state = latest_run["state"] if state.get("life_cycle_state") != "TERMINATED": print(f"Run is not terminated (state: {state.get('life_cycle_state')}). Cannot repair.") return False if state.get("result_state") == "SUCCESS": print("Run already succeeded. No repair needed.") return True # Find failed tasks failed_tasks = [] for task in latest_run.get("tasks", []): task_state = task.get("state", {}) if task_state.get("result_state") in ["FAILED", "CANCELED", "TIMEDOUT"]: failed_tasks.append(task["task_key"]) if not failed_tasks: print("No failed tasks found to repair.") return True print(f"Repairing run {run_id} - re-running tasks: {failed_tasks}") # Call repair API repair_response = bundler.db.post("2.1/jobs/runs/repair", { "run_id": run_id, "rerun_tasks": failed_tasks }) if "repair_id" in repair_response: print(f"Repair started. Repair ID: {repair_response['repair_id']}") print(f"URL: {bundler.conf.workspace_url}/#job/{job_id}/run/{run_id}") if wait: return wait_for_run(bundler, job_id, run_id) return True else: print(f"Failed to repair: {repair_response}") return False def cleanup_demo_schema(bundler: JobBundler, demo_conf): """Drop the demo schema to ensure clean state before running. Uses main__build as the catalog (bundling catalog) and the demo's default_schema. 
    Uses Databricks SDK: w.schemas.delete(full_name=schema_full_name, force=True)
    """
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.errors import NotFound

    # Bundling uses main__build catalog
    catalog = "main__build"
    schema = demo_conf.default_schema
    if not schema:
        print(f"  No default_schema defined for {demo_conf.name}, skipping cleanup")
        return
    full_schema = f"{catalog}.{schema}"
    print(f"  Cleaning up schema: {full_schema}")
    try:
        w = WorkspaceClient(
            host=bundler.conf.workspace_url,
            token=bundler.conf.pat_token
        )
        # force=True is equivalent to CASCADE
        w.schemas.delete(full_name=full_schema, force=True)
        print(f"  ✓ Schema {full_schema} dropped successfully")
    except NotFound:
        print(f"  ✓ Schema {full_schema} does not exist (nothing to clean)")
    except Exception as e:
        print(f"  WARNING: Error during schema cleanup: {e}")


def bundle_demo(bundler: JobBundler, demo_path: str, force: bool = False,
                skip_packaging: bool = False, cleanup_schema: bool = True):
    """Bundle a specific demo"""
    print(f"\nBundling demo: {demo_path}")
    print(f"Branch: {bundler.conf.branch}")
    bundler.reset_staging_repo(skip_pull=False)
    bundler.add_bundle(demo_path)
    if len(bundler.bundles) == 0:
        print(f"ERROR: Demo not found or not configured for bundling: {demo_path}")
        return False

    # Clean up schema before running if requested
    if cleanup_schema:
        print("\nCleaning up demo schemas...")
        for path, demo_conf in bundler.bundles.items():
            cleanup_demo_schema(bundler, demo_conf)

    # Run the job
    bundler.start_and_wait_bundle_jobs(
        force_execution=force,
        skip_execution=False,
        recreate_jobs=False
    )

    # Check results
    for path, demo_conf in bundler.bundles.items():
        if demo_conf.run_id:
            run = bundler.db.get("2.1/jobs/runs/get", {"run_id": demo_conf.run_id})
            result_state = run.get("state", {}).get("result_state", "UNKNOWN")
            if result_state == "SUCCESS":
                print(f"\n✓ Job succeeded for {demo_conf.name}")
                if not skip_packaging:
                    print("Packaging demo...")
                    packager = Packager(bundler.conf, bundler)
                    packager.package_all()
                    print("✓ Demo packaged successfully")
                return True
            else:
                print(f"\n✗ Job failed for {demo_conf.name}: {result_state}")
                print(f"Check: {bundler.conf.workspace_url}/#job/{demo_conf.job_id}/run/{demo_conf.run_id}")
                return False
    return False


def bundle_all(bundler: JobBundler, force: bool = False, cleanup_schema: bool = True):
    """Bundle all demos (uses diff optimization)"""
    print("\nBundling all demos...")
    print(f"Branch: {bundler.conf.branch}")
    bundler.reset_staging_repo(skip_pull=False)
    bundler.load_bundles_conf()
    print(f"Found {len(bundler.bundles)} demos")

    # Clean up schemas before running if requested
    if cleanup_schema:
        print("\nCleaning up demo schemas...")
        for path, demo_conf in bundler.bundles.items():
            cleanup_demo_schema(bundler, demo_conf)

    # Run jobs (will skip unchanged demos unless force=True)
    bundler.start_and_wait_bundle_jobs(
        force_execution=force,
        skip_execution=False,
        recreate_jobs=False
    )

    # Check results
    success_count = 0
    fail_count = 0
    skip_count = 0
    for path, demo_conf in bundler.bundles.items():
        if demo_conf.run_id:
            run = bundler.db.get("2.1/jobs/runs/get", {"run_id": demo_conf.run_id})
            result_state = run.get("state", {}).get("result_state", "UNKNOWN")
            if result_state == "SUCCESS":
                success_count += 1
            else:
                fail_count += 1
                print(f"✗ {demo_conf.name} failed: {result_state}")
        else:
            skip_count += 1

    print(f"\nResults: {success_count} succeeded, {fail_count} failed, {skip_count} skipped")
    if fail_count == 0:
        print("\nPackaging all demos...")
        packager = Packager(bundler.conf, bundler)
        packager.package_all()
        print("✓ All demos packaged successfully")
        return True
    else:
        print("\n✗ Some jobs failed. Fix errors before packaging.")
        return False


def find_demo_path(bundler: JobBundler, demo_name: str) -> str:
    """Find the full path for a demo by name"""
    bundler.reset_staging_repo(skip_pull=True)
    bundler.load_bundles_conf()
    # Check if it's already a path
    if demo_name in bundler.bundles:
        return demo_name
    # Search by demo name
    for path, demo_conf in bundler.bundles.items():
        if demo_conf.name == demo_name:
            return path
    # Partial match
    matches = []
    for path, demo_conf in bundler.bundles.items():
        if demo_name in demo_conf.name or demo_name in path:
            matches.append((path, demo_conf.name))
    if len(matches) == 1:
        return matches[0][0]
    elif len(matches) > 1:
        print(f"Multiple matches for '{demo_name}':")
        for path, name in matches:
            print(f"  - {name} ({path})")
        print("\nPlease be more specific.")
        return None
    print(f"Demo not found: {demo_name}")
    return None


def main():
    parser = argparse.ArgumentParser(
        description="DBDemos Bundle CLI - Bundle and test demos",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__
    )
    # Actions
    parser.add_argument("--demo", "-d", help="Demo name or path to bundle")
    parser.add_argument("--all", "-a", action="store_true", help="Bundle all demos")
    parser.add_argument("--list-demos", "-l", action="store_true", help="List all available demos")
    parser.add_argument("--status", "-s", action="store_true", help="Get job status for a demo")
    # Options
    parser.add_argument("--branch", "-b", help="Git branch to use (overrides config)")
    parser.add_argument("--force", "-f", action="store_true", help="Force re-run (ignore diff optimization)")
    parser.add_argument("--repair", "-r", action="store_true", help="Repair failed job (re-run failed tasks only)")
    parser.add_argument("--skip-packaging", action="store_true", help="Skip packaging step (useful for debugging)")
    parser.add_argument("--check-config", action="store_true", help="Verify configuration without running anything")
    parser.add_argument("--wait", "-w", action="store_true", help="Wait for job/repair completion")
    # Schema cleanup is on by default; only the negative flag is exposed.
    # (A separate --cleanup-schema flag with action="store_true" and default=True
    # would be a no-op, so it is intentionally omitted.)
    parser.add_argument("--no-cleanup-schema", action="store_true", help="Skip schema cleanup (default: cleanup enabled)")

    args = parser.parse_args()

    # Load config
    config = load_config(args)
    conf = create_conf(config)
    bundler = JobBundler(conf)
    print(f"Workspace: {conf.workspace_url}")
    print(f"Branch: {conf.branch}")

    # Check config only
    if args.check_config:
        print("\n✓ Configuration valid")
        print(f"  - Username: {conf.username}")
        print(f"  - Repo: {conf.repo_url}")
        print(f"  - Repo path: {conf.get_repo_path()}")
        print(f"  - Notebooks path: {config.get('dbdemos_notebooks_path', 'N/A')}")
        return 0

    # Execute action
    if args.list_demos:
        list_demos(bundler)
        return 0

    if args.status:
        if not args.demo:
            print("ERROR: --status requires --demo")
            return 1
        get_job_status(bundler, args.demo)
        return 0

    if args.repair:
        if not args.demo:
            print("ERROR: --repair requires --demo")
            return 1
        success = repair_job(bundler, args.demo, wait=args.wait)
        return 0 if success else 1

    # Determine cleanup_schema setting (--no-cleanup-schema disables it)
    cleanup_schema = not args.no_cleanup_schema

    if args.demo:
        demo_path = find_demo_path(bundler, args.demo)
        if not demo_path:
            return 1
        # Reset bundler after find_demo_path used it
        bundler = JobBundler(conf)
        success = bundle_demo(bundler, demo_path, force=args.force,
                              skip_packaging=args.skip_packaging, cleanup_schema=cleanup_schema)
        return 0 if success else 1

    if args.all:
        success = bundle_all(bundler, force=args.force, cleanup_schema=cleanup_schema)
        return 0 if success else 1

    parser.print_help()
    return 1


if __name__ == "__main__":
    sys.exit(main())



================================================
FILE: ai_release/compute.py
================================================
"""
Remote Code Execution on Databricks Clusters

This module provides functions to execute code on Databricks clusters
for testing notebook fixes
before committing them to dbdemos-notebooks.

Based on databricks-tools-core from ai-dev-kit.
"""
import datetime
import json
import time
from pathlib import Path
from typing import Optional, List, Dict, Any

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import (
    CommandStatus,
    ClusterSource,
    Language,
    ListClustersFilterBy,
    State,
)


class ExecutionResult:
    """Result from code execution on a Databricks cluster."""

    def __init__(
        self,
        success: bool,
        output: Optional[str] = None,
        error: Optional[str] = None,
        cluster_id: Optional[str] = None,
        cluster_name: Optional[str] = None,
        context_id: Optional[str] = None,
        context_destroyed: bool = True,
    ):
        self.success = success
        self.output = output
        self.error = error
        self.cluster_id = cluster_id
        self.cluster_name = cluster_name
        self.context_id = context_id
        self.context_destroyed = context_destroyed
        if success and context_id and not context_destroyed:
            self.message = (
                f"Execution successful. Reuse context_id='{context_id}' with "
                f"cluster_id='{cluster_id}' for follow-up commands."
            )
        elif success:
            self.message = "Execution successful."
        else:
            self.message = f"Execution failed: {error}"

    def __repr__(self):
        if self.success:
            return f"ExecutionResult(success=True, output={repr(self.output[:100] if self.output else None)}...)"
        return f"ExecutionResult(success=False, error={repr(self.error)})"

    def to_dict(self) -> Dict[str, Any]:
        return {
            "success": self.success,
            "output": self.output,
            "error": self.error,
            "cluster_id": self.cluster_id,
            "cluster_name": self.cluster_name,
            "context_id": self.context_id,
            "context_destroyed": self.context_destroyed,
            "message": self.message,
        }


_LANGUAGE_MAP = {
    "python": Language.PYTHON,
    "scala": Language.SCALA,
    "sql": Language.SQL,
    "r": Language.R,
}


def get_workspace_client(host: str, token: str) -> WorkspaceClient:
    """Create a WorkspaceClient with explicit credentials."""
    return WorkspaceClient(
        host=host,
        token=token,
        auth_type="pat",
        product="dbdemos-ai-release",
        product_version="0.1.0",
    )


def list_clusters(client: WorkspaceClient, include_terminated: bool = False) -> List[Dict[str, Any]]:
    """List user-created clusters in the workspace."""
    clusters = []
    # Only list user-created clusters
    user_sources = [ClusterSource.UI, ClusterSource.API]

    # Running clusters
    running_filter = ListClustersFilterBy(
        cluster_sources=user_sources,
        cluster_states=[State.RUNNING, State.PENDING, State.RESIZING, State.RESTARTING],
    )
    for cluster in client.clusters.list(filter_by=running_filter):
        clusters.append({
            "cluster_id": cluster.cluster_id,
            "cluster_name": cluster.cluster_name,
            "state": cluster.state.value if cluster.state else None,
            "creator_user_name": cluster.creator_user_name,
        })

    if include_terminated:
        terminated_filter = ListClustersFilterBy(
            cluster_sources=user_sources,
            cluster_states=[State.TERMINATED, State.TERMINATING, State.ERROR],
        )
        for cluster in client.clusters.list(filter_by=terminated_filter):
            clusters.append({
                "cluster_id": cluster.cluster_id,
                "cluster_name": cluster.cluster_name,
                "state": cluster.state.value if cluster.state else None,
                "creator_user_name":
                    cluster.creator_user_name,
            })
    return clusters


def find_cluster_by_name(client: WorkspaceClient, name_pattern: str) -> Optional[Dict[str, Any]]:
    """
    Find a cluster by name pattern (case-insensitive).

    Args:
        client: WorkspaceClient
        name_pattern: Pattern to match (e.g., "quentin")

    Returns:
        Cluster info dict or None
    """
    clusters = list_clusters(client, include_terminated=True)
    pattern_lower = name_pattern.lower()
    # First try running clusters
    for cluster in clusters:
        if cluster["state"] == "RUNNING" and pattern_lower in cluster["cluster_name"].lower():
            return cluster
    # Then try any cluster
    for cluster in clusters:
        if pattern_lower in cluster["cluster_name"].lower():
            return cluster
    return None


def start_cluster(client: WorkspaceClient, cluster_id: str) -> Dict[str, Any]:
    """Start a terminated cluster."""
    cluster = client.clusters.get(cluster_id)
    cluster_name = cluster.cluster_name or cluster_id
    current_state = cluster.state.value if cluster.state else "UNKNOWN"

    if current_state == "RUNNING":
        return {
            "cluster_id": cluster_id,
            "cluster_name": cluster_name,
            "state": "RUNNING",
            "message": f"Cluster '{cluster_name}' is already running.",
        }
    if current_state not in ("TERMINATED", "ERROR"):
        return {
            "cluster_id": cluster_id,
            "cluster_name": cluster_name,
            "state": current_state,
            "message": f"Cluster '{cluster_name}' is in state {current_state}.",
        }

    client.clusters.start(cluster_id)
    return {
        "cluster_id": cluster_id,
        "cluster_name": cluster_name,
        "previous_state": current_state,
        "state": "PENDING",
        "message": f"Cluster '{cluster_name}' is starting (3-8 minutes).",
    }


def get_cluster_status(client: WorkspaceClient, cluster_id: str) -> Dict[str, Any]:
    """Get cluster status."""
    cluster = client.clusters.get(cluster_id)
    return {
        "cluster_id": cluster_id,
        "cluster_name": cluster.cluster_name or cluster_id,
        "state": cluster.state.value if cluster.state else "UNKNOWN",
    }


def wait_for_cluster(client: WorkspaceClient, cluster_id: str, timeout: int = 600) -> bool:
    """Wait for cluster to
    reach RUNNING state."""
    start_time = time.time()
    while time.time() - start_time < timeout:
        status = get_cluster_status(client, cluster_id)
        state = status["state"]
        if state == "RUNNING":
            print(f"✓ Cluster '{status['cluster_name']}' is running")
            return True
        elif state in ("TERMINATED", "ERROR"):
            print(f"✗ Cluster '{status['cluster_name']}' is {state}")
            return False
        print(f"  Cluster state: {state}... waiting")
        time.sleep(30)
    print("✗ Timeout waiting for cluster")
    return False


def create_context(client: WorkspaceClient, cluster_id: str, language: str = "python") -> str:
    """Create an execution context on a cluster."""
    lang_enum = _LANGUAGE_MAP.get(language.lower(), Language.PYTHON)
    result = client.command_execution.create(
        cluster_id=cluster_id,
        language=lang_enum
    ).result()
    return result.id


def destroy_context(client: WorkspaceClient, cluster_id: str, context_id: str) -> None:
    """Destroy an execution context."""
    client.command_execution.destroy(cluster_id=cluster_id, context_id=context_id)


def execute_command(
    client: WorkspaceClient,
    code: str,
    cluster_id: str,
    context_id: Optional[str] = None,
    language: str = "python",
    timeout: int = 300,
    destroy_context_on_completion: bool = False,
) -> ExecutionResult:
    """
    Execute code on a Databricks cluster.
    Args:
        client: WorkspaceClient
        code: Code to execute
        cluster_id: Cluster ID
        context_id: Optional existing context ID (for state preservation)
        language: "python", "scala", "sql", or "r"
        timeout: Timeout in seconds
        destroy_context_on_completion: Whether to destroy context after execution

    Returns:
        ExecutionResult
    """
    # Get cluster name for better output
    try:
        cluster_info = client.clusters.get(cluster_id)
        cluster_name = cluster_info.cluster_name
    except Exception:
        cluster_name = cluster_id

    # Create context if not provided
    context_created = False
    if context_id is None:
        context_id = create_context(client, cluster_id, language)
        context_created = True

    lang_enum = _LANGUAGE_MAP.get(language.lower(), Language.PYTHON)
    try:
        result = client.command_execution.execute(
            cluster_id=cluster_id,
            context_id=context_id,
            language=lang_enum,
            command=code,
        ).result(timeout=datetime.timedelta(seconds=timeout))

        if result.status == CommandStatus.FINISHED:
            # Check for error in results. Build a failure result instead of
            # returning early so the destroy_context_on_completion cleanup
            # below still runs (an early return would leak the context).
            if result.results and result.results.result_type and result.results.result_type.value == "error":
                error_msg = result.results.cause if result.results.cause else "Unknown error"
                exec_result = ExecutionResult(
                    success=False,
                    error=error_msg,
                    cluster_id=cluster_id,
                    cluster_name=cluster_name,
                    context_id=context_id,
                    context_destroyed=False,
                )
            else:
                output = result.results.data if result.results and result.results.data else "Success (no output)"
                exec_result = ExecutionResult(
                    success=True,
                    output=str(output),
                    cluster_id=cluster_id,
                    cluster_name=cluster_name,
                    context_id=context_id,
                    context_destroyed=False,
                )
        elif result.status in [CommandStatus.ERROR, CommandStatus.CANCELLED]:
            error_msg = result.results.cause if result.results and result.results.cause else "Unknown error"
            exec_result = ExecutionResult(
                success=False,
                error=error_msg,
                cluster_id=cluster_id,
                cluster_name=cluster_name,
                context_id=context_id,
                context_destroyed=False,
            )
        else:
            exec_result = ExecutionResult(
                success=False,
                error=f"Unexpected status: {result.status}",
                cluster_id=cluster_id,
                cluster_name=cluster_name,
                context_id=context_id,
                context_destroyed=False,
            )

        # Destroy context if requested
        if destroy_context_on_completion:
            try:
                destroy_context(client, cluster_id, context_id)
                exec_result.context_destroyed = True
            except Exception:
                pass
        return exec_result

    except TimeoutError:
        return ExecutionResult(
            success=False,
            error=f"Command timed out after {timeout}s",
            cluster_id=cluster_id,
            cluster_name=cluster_name,
            context_id=context_id,
            context_destroyed=False,
        )
    except Exception as e:
        if context_created and destroy_context_on_completion:
            try:
                destroy_context(client, cluster_id, context_id)
            except Exception:
                pass
        return ExecutionResult(
            success=False,
            error=str(e),
            cluster_id=cluster_id,
            cluster_name=cluster_name,
            context_id=context_id if not destroy_context_on_completion else None,
            context_destroyed=destroy_context_on_completion,
        )


def execute_file(
    client: WorkspaceClient,
    file_path: str,
    cluster_id: str,
    context_id: Optional[str] = None,
    timeout: int = 600,
    destroy_context_on_completion: bool = False,
) -> ExecutionResult:
    """
    Execute a local Python file on a Databricks cluster.
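    Before anything is sent to the cluster, the file is read locally and empty
    files are rejected. A pure-local sketch of that pre-check (the temp file
    path is illustrative, not part of this module):

    ```python
    import tempfile
    from pathlib import Path

    # Write a throwaway snippet, then apply the same read + empty-check
    # that execute_file performs before handing the code to execute_command.
    path = Path(tempfile.mkdtemp()) / "snippet.py"
    path.write_text("print('hello from cluster')\n", encoding="utf-8")
    code = path.read_text(encoding="utf-8")
    if not code.strip():
        raise ValueError(f"File is empty: {path}")
    print(code.strip())
    ```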
    Args:
        client: WorkspaceClient
        file_path: Path to the Python file
        cluster_id: Cluster ID
        context_id: Optional existing context ID
        timeout: Timeout in seconds
        destroy_context_on_completion: Whether to destroy context after

    Returns:
        ExecutionResult
    """
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            code = f.read()
    except FileNotFoundError:
        return ExecutionResult(success=False, error=f"File not found: {file_path}")
    except Exception as e:
        return ExecutionResult(success=False, error=f"Failed to read file: {e}")

    if not code.strip():
        return ExecutionResult(success=False, error=f"File is empty: {file_path}")

    return execute_command(
        client=client,
        code=code,
        cluster_id=cluster_id,
        context_id=context_id,
        language="python",
        timeout=timeout,
        destroy_context_on_completion=destroy_context_on_completion,
    )



================================================
FILE: ai_release/inspect_jobs.py
================================================
#!/usr/bin/env python3
"""
Job Inspection CLI for DBDemos

Inspect bundle jobs, check their status, and get detailed failure information.
Automatically extracts errors from notebook HTML when API doesn't provide them.
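Timestamps returned by the Jobs API are epoch milliseconds; the
format_timestamp/format_duration helpers in this module convert them. A
standalone sketch of that conversion (values are illustrative):

```python
from datetime import datetime

# Jobs API start/end times arrive as epoch milliseconds (illustrative values).
start_ms, end_ms = 1_700_000_000_000, 1_700_000_950_000
started = datetime.fromtimestamp(start_ms / 1000).strftime("%Y-%m-%d %H:%M:%S")
duration_s = (end_ms - start_ms) / 1000
# Minutes vs hours bucketing, similar to format_duration below
duration = f"{duration_s / 60:.1f}m" if duration_s < 3600 else f"{duration_s / 3600:.1f}h"
print(started, duration)
```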
Usage:
    # List all bundle jobs with their status
    python ai_release/inspect_jobs.py --list

    # List only failed jobs
    python ai_release/inspect_jobs.py --list --failed-only

    # Get detailed info for a specific demo (auto-fetches errors)
    python ai_release/inspect_jobs.py --demo ai-agent

    # Get detailed failure info with fix suggestions
    python ai_release/inspect_jobs.py --demo ai-agent --errors

    # Export notebook path for the failed task
    python ai_release/inspect_jobs.py --demo ai-agent --notebook-path

    # Check if job is up-to-date with HEAD commit
    python ai_release/inspect_jobs.py --demo ai-agent --check-commit

    # Get task output for debugging
    python ai_release/inspect_jobs.py --task-output <task_run_id>

    # Export failure summary to file
    python ai_release/inspect_jobs.py --demo ai-agent --errors --output errors.txt
"""
import argparse
import json
import sys
from datetime import datetime
from pathlib import Path

# Add parent directory to path
sys.path.insert(0, str(Path(__file__).parent.parent))

from ai_release.jobs import JobInspector, load_inspector_from_config, JobInfo


def format_timestamp(ts: int) -> str:
    """Format a millisecond timestamp to human-readable string."""
    if not ts:
        return "N/A"
    return datetime.fromtimestamp(ts / 1000).strftime("%Y-%m-%d %H:%M:%S")


def format_duration(start: int, end: int) -> str:
    """Format duration between two timestamps."""
    if not start or not end:
        return "N/A"
    duration_sec = (end - start) / 1000
    if duration_sec < 60:
        return f"{duration_sec:.0f}s"
    elif duration_sec < 3600:
        return f"{duration_sec / 60:.1f}m"
    else:
        return f"{duration_sec / 3600:.1f}h"


def print_job_list(jobs: list, failed_only: bool = False):
    """Print a formatted list of jobs."""
    if failed_only:
        jobs = [j for j in jobs if j.latest_run and j.latest_run.failed]
    if not jobs:
        print("No jobs found.")
        return

    # Header columns match the row format below: state icon + name, result, run time, commit
    print(f"\n  {'Demo Name':<38} {'Result':<12} {'Run Time':<12} {'Commit':<10}")
    print("=" * 90)
    for job in sorted(jobs, key=lambda j: j.demo_name):
        run = job.latest_run
        if run:
            state_icon = "🟢" if run.succeeded else "🔴" if run.failed else "🟡" if run.running else "⚪"
            result = run.result_state or run.state
            duration = format_duration(run.start_time, run.end_time)
            commit = (run.used_commit or "")[:8]
        else:
            state_icon = "⚪"
            result = "NO RUNS"
            duration = "N/A"
            commit = "N/A"
        print(f"{state_icon} {job.demo_name:<38} {result:<12} {duration:<12} {commit:<10}")

    # Summary
    total = len(jobs)
    succeeded = len([j for j in jobs if j.latest_run and j.latest_run.succeeded])
    failed = len([j for j in jobs if j.latest_run and j.latest_run.failed])
    running = len([j for j in jobs if j.latest_run and j.latest_run.running])
    print(f"\nTotal: {total} | ✓ Succeeded: {succeeded} | ✗ Failed: {failed} | ◐ Running: {running}")


def print_fix_workflow(job: JobInfo, inspector: JobInspector):
    """Print suggested fix workflow for a failed job."""
    print("\n" + "=" * 80)
    print("SUGGESTED FIX WORKFLOW")
    print("=" * 80)

    demo_name = job.demo_name
    run = job.latest_run
    # Get the notebook path from the first failed task
    notebook_path = None
    if run and run.failed_tasks:
        notebook_path = run.failed_tasks[0].notebook_path

    print(f"""
1. TEST FIX INTERACTIVELY (optional but recommended):
   python ai_release/run_remote.py --start-cluster --wait-for-cluster
   python ai_release/run_remote.py --code "# test your fix code here"

2. CREATE FIX BRANCH in dbdemos-notebooks:
   cd ../dbdemos-notebooks
   git checkout main && git pull origin main
   git checkout -b ai-fix-{demo_name}-<issue>

3. EDIT THE NOTEBOOK:
   {notebook_path or 'Check the failed task notebook path above'}

4. COMMIT AND PUSH:
   git add . && git commit -m "fix: <description>" && git push origin ai-fix-{demo_name}-<issue>

5. TEST THE FIX:
   cd ../dbdemos
   python ai_release/bundle.py --demo {demo_name} --branch ai-fix-{demo_name}-<issue> --force

6. IF STILL FAILING - iterate with repair (faster):
   python ai_release/bundle.py --demo {demo_name} --repair --wait

7.
   CREATE PR when tests pass:
   cd ../dbdemos-notebooks
   gh pr create --title "fix: {demo_name} <issue>" --body "Fixed <issue>"

8. AFTER PR MERGED - final verification:
   cd ../dbdemos
   python ai_release/bundle.py --demo {demo_name} --force
""")


def print_job_details(job: JobInfo, inspector: JobInspector, show_errors: bool = False,
                      check_commit: bool = False, show_workflow: bool = True):
    """Print detailed information about a job."""
    print(f"\n{'=' * 80}")
    print(f"Demo: {job.demo_name}")
    print(f"Job ID: {job.job_id}")
    print(f"Job URL: {inspector.get_job_url(job.job_id)}")
    print(f"{'=' * 80}")

    run = job.latest_run
    if not run:
        print("\nNo runs found for this job.")
        return

    print(f"\nLatest Run: {run.run_id}")
    print(f"Run URL: {inspector.get_job_url(job.job_id, run.run_id)}")
    print(f"State: {run.state}")
    print(f"Result: {run.result_state or 'N/A'}")
    print(f"Started: {format_timestamp(run.start_time)}")
    print(f"Ended: {format_timestamp(run.end_time)}")
    print(f"Duration: {format_duration(run.start_time, run.end_time)}")
    print(f"Git Commit: {run.used_commit or 'N/A'}")
    if run.state_message:
        print(f"Message: {run.state_message}")

    # Check commit status
    if check_commit:
        print("\n--- Git Commit Check ---")
        head = inspector.get_head_commit()
        if head:
            print(f"HEAD Commit: {head}")
            if run.used_commit:
                if run.used_commit == head:
                    print("✓ Job is UP-TO-DATE with HEAD")
                else:
                    print("✗ Job is OUTDATED - HEAD has newer commits")
            else:
                print("? Cannot determine - no commit info in job run")
        else:
            print("Could not fetch HEAD commit from GitHub")

    # Print tasks
    print(f"\n--- Tasks ({len(run.tasks)} total) ---")
    for task in run.tasks:
        icon = "✓" if task.state == "SUCCESS" else "✗" if task.failed else "○"
        notebook = task.notebook_path.split("/")[-1] if task.notebook_path else "N/A"
        print(f"  {icon} {task.task_key}: {task.state} ({notebook})")

    # Print errors if job failed (always show for failed jobs, more detail with --errors)
    if run.failed_tasks:
        print(f"\n{'=' * 80}")
        print("FAILURE DETAILS")
        print(f"{'=' * 80}")
        for task in run.failed_tasks:
            print(f"\n--- Task: {task.task_key} ---")
            if task.notebook_path:
                print(f"Notebook: {task.notebook_path}")
            # Show error summary
            if task.error_message:
                print(f"\nError: {task.error_message}")
            # Show notebook errors if available
            if task.notebook_errors:
                print(f"\n--- Notebook Cell Errors ({len(task.notebook_errors)} found) ---")
                for err in task.notebook_errors:
                    print(f"\n[Cell {err.cell_index}] {err.error_name}: {err.error_message}")
                    if err.cell_source and show_errors:
                        # Show the code that caused the error
                        src = err.cell_source
                        if len(src) > 500:
                            src = src[:500] + "\n... (truncated)"
                        print(f"\nCode:\n{src}")
                    if err.error_trace and show_errors:
                        trace = err.error_trace
                        if len(trace) > 2000:
                            trace = trace[:2000] + "\n... (truncated)"
                        print(f"\nTraceback:\n{trace}")
            # Fallback to API trace if no notebook errors
            elif task.error_trace and show_errors:
                trace = task.error_trace
                if len(trace) > 3000:
                    trace = trace[:3000] + "\n... (truncated, use --task-output for full trace)"
                print(f"\nStack Trace:\n{trace}")

    # Show fix workflow for failed jobs
    if show_workflow:
        print_fix_workflow(job, inspector)


def print_task_output(inspector: JobInspector, task_run_id: int):
    """Print the full output from a task run, including exported notebook errors."""
    print(f"\n{'=' * 80}")
    print(f"Task Run ID: {task_run_id}")
    print(f"{'=' * 80}")

    # First try standard API output
    output = inspector.get_task_output(task_run_id)
    if output:
        if output.get("error"):
            print(f"\nAPI Error:\n{output['error']}")
        if output.get("error_trace"):
            print(f"\nAPI Stack Trace:\n{output['error_trace']}")
        if output.get("notebook_output"):
            print(f"\nNotebook Output:\n{output['notebook_output']}")

    # Also export and parse notebook HTML for cell-level errors
    print("\n--- Extracting errors from notebook HTML ---")
    html = inspector.export_notebook_html(task_run_id)
    if html:
        errors = inspector.extract_errors_from_html(html)
        if errors:
            print(f"Found {len(errors)} error(s) in notebook cells:")
            for err in errors:
                print(f"\n[Cell {err.cell_index}] {err.error_name}: {err.error_message}")
                if err.cell_source:
                    print(f"\nCode:\n{err.cell_source}")
                if err.error_trace:
                    print(f"\nTraceback:\n{err.error_trace}")
        else:
            print("No cell errors found in notebook HTML")
    else:
        print("Could not export notebook HTML")


def main():
    parser = argparse.ArgumentParser(
        description="Inspect DBDemos bundle jobs",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__
    )
    # Main actions
    parser.add_argument("--list", "-l", action="store_true", help="List all bundle jobs")
    parser.add_argument("--demo", "-d", help="Get details for a specific demo")
    parser.add_argument("--task-output", type=int, help="Get output from a specific task run ID")
    # Options
    parser.add_argument("--failed-only", "-f", action="store_true", help="Only show failed jobs")
    parser.add_argument("--errors", "-e", action="store_true", help="Show detailed error traces and code")
    parser.add_argument("--check-commit", "-c", action="store_true", help="Check if job is up-to-date with HEAD")
    parser.add_argument("--no-workflow", action="store_true", help="Don't show fix workflow suggestions")
    parser.add_argument("--notebook-path", action="store_true", help="Print only the notebook path for the first failed task")
    parser.add_argument("--output", "-o", help="Write output to file")
    parser.add_argument("--json", action="store_true", help="Output as JSON")

    args = parser.parse_args()

    # Load inspector
    try:
        inspector = load_inspector_from_config()
        print(f"Workspace: {inspector.host}")
    except Exception as e:
        print(f"Error loading config: {e}")
        return 1

    # Redirect output to file if requested
    output_file = None
    if args.output:
        output_file = open(args.output, "w")
        sys.stdout = output_file

    try:
        # List jobs
        if args.list:
            print("\nFetching bundle jobs...")
            jobs = inspector.list_bundle_jobs(include_run_details=True)
            print_job_list(jobs, failed_only=args.failed_only)
            return 0

        # Get demo details
        if args.demo:
            print(f"\nFetching job for demo: {args.demo}")
            job = inspector.find_job(args.demo)
            if not job:
                print(f"No job found for demo: {args.demo}")
                return 1

            # Always get full details for failed jobs (to get errors)
            if job.latest_run and (job.latest_run.failed or args.errors or args.check_commit):
                print("Fetching error details...")
                job.latest_run = inspector.get_job_run_details(job.job_id, job.latest_run.run_id)

            # Just print notebook path if requested
            if args.notebook_path:
                if job.latest_run and job.latest_run.failed_tasks:
                    for task in job.latest_run.failed_tasks:
                        if task.notebook_path:
                            print(task.notebook_path)
                return 0

            if args.json:
                # Output as JSON for programmatic use
                data = {
                    "demo_name": job.demo_name,
                    "job_id": job.job_id,
                    "job_url": inspector.get_job_url(job.job_id),
                }
                if job.latest_run:
                    data["latest_run"] = {
                        "run_id": job.latest_run.run_id,
                        "state": job.latest_run.state,
                        "result_state": job.latest_run.result_state,
                        "used_commit": job.latest_run.used_commit,
                        "failed_tasks": [
                            {
                                "task_key": t.task_key,
                                "run_id": t.run_id,
                                "notebook_path": t.notebook_path,
                                "error_message": t.error_message,
                                "error_trace": t.error_trace,
                                "notebook_errors": [
                                    {
                                        "cell_index": e.cell_index,
                                        "error_name": e.error_name,
                                        "error_message": e.error_message,
                                        "cell_source": e.cell_source,
                                    } for e in t.notebook_errors
                                ] if t.notebook_errors else []
                            } for t in job.latest_run.failed_tasks
                        ]
                    }
                print(json.dumps(data, indent=2))
            else:
                print_job_details(job, inspector, show_errors=args.errors,
                                  check_commit=args.check_commit,
                                  show_workflow=not args.no_workflow)
            return 0

        # Get task output
        if args.task_output:
            print_task_output(inspector, args.task_output)
            return 0

        # No action specified
        parser.print_help()
        return 1

    finally:
        if output_file:
            output_file.close()
            sys.stdout = sys.__stdout__
            print(f"Output written to: {args.output}")


if __name__ == "__main__":
    sys.exit(main())



================================================
FILE: ai_release/jobs.py
================================================
"""
Job Inspection Module for DBDemos

Provides functions to inspect bundle jobs, get failure details, and compare git commits.
Uses the Databricks SDK for all API operations.
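The failure-state plumbing in this module is plain dataclasses. A
self-contained sketch (simplified stand-ins mirroring TaskResult and
JobRunResult, not the full definitions below) of how task states roll up:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified stand-ins for this module's TaskResult/JobRunResult, keeping
# only the fields and properties needed to show failure propagation.
@dataclass
class MiniTask:
    task_key: str
    state: str

    @property
    def failed(self) -> bool:
        return self.state in ("FAILED", "TIMEDOUT", "CANCELED")

@dataclass
class MiniRun:
    result_state: Optional[str] = None
    tasks: List[MiniTask] = field(default_factory=list)

    @property
    def failed(self) -> bool:
        return self.result_state in ("FAILED", "TIMEDOUT", "CANCELED")

    @property
    def failed_tasks(self) -> List[MiniTask]:
        return [t for t in self.tasks if t.failed]

run = MiniRun(result_state="FAILED",
              tasks=[MiniTask("init", "SUCCESS"), MiniTask("load", "FAILED")])
print([t.task_key for t in run.failed_tasks])
```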
SDK Documentation: https://databricks-sdk-py.readthedocs.io/en/latest/
"""
import json
import re
import requests
import urllib.parse
from bs4 import BeautifulSoup
from dataclasses import dataclass, field
from typing import Optional, List, Dict, Any
from pathlib import Path
from html import unescape

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import RunResultState, RunLifeCycleState, ViewsToExport


@dataclass
class NotebookError:
    """Error extracted from a notebook cell."""
    cell_index: int
    cell_type: str  # "code", "markdown"
    error_name: str  # e.g., "NameError", "ValueError"
    error_message: str
    error_trace: Optional[str] = None
    cell_source: Optional[str] = None  # The code that caused the error


@dataclass
class TaskResult:
    """Result from a single task in a job run."""
    task_key: str
    run_id: int
    state: str  # SUCCESS, FAILED, SKIPPED, etc.
    notebook_path: Optional[str] = None
    error_message: Optional[str] = None
    error_trace: Optional[str] = None
    used_commit: Optional[str] = None
    notebook_errors: List[NotebookError] = field(default_factory=list)

    @property
    def failed(self) -> bool:
        return self.state in ("FAILED", "TIMEDOUT", "CANCELED")

    def get_error_summary(self) -> str:
        """Get a summary of all errors for this task."""
        if not self.failed:
            return "Task succeeded"
        lines = []
        if self.error_message:
            lines.append(f"Error: {self.error_message}")
        if self.error_trace:
            lines.append(f"Trace: {self.error_trace[:500]}...")
        for err in self.notebook_errors:
            lines.append(f"\n[Cell {err.cell_index}] {err.error_name}: {err.error_message}")
            if err.cell_source:
                # Show first 200 chars of source
                src = err.cell_source[:200]
                if len(err.cell_source) > 200:
                    src += "..."
lines.append(f"Code: {src}") if err.error_trace: lines.append(f"Traceback:\n{err.error_trace}") return "\n".join(lines) if lines else "Unknown error" @dataclass class JobRunResult: """Result from a job run with all task details.""" job_id: int job_name: str run_id: int state: str # RUNNING, TERMINATED, etc. result_state: Optional[str] = None # SUCCESS, FAILED, etc. state_message: Optional[str] = None start_time: Optional[int] = None end_time: Optional[int] = None tasks: List[TaskResult] = field(default_factory=list) used_commit: Optional[str] = None # Most recent commit from tasks @property def succeeded(self) -> bool: return self.result_state == "SUCCESS" @property def failed(self) -> bool: return self.result_state in ("FAILED", "TIMEDOUT", "CANCELED") @property def running(self) -> bool: return self.state == "RUNNING" @property def failed_tasks(self) -> List[TaskResult]: return [t for t in self.tasks if t.failed] def get_failure_summary(self) -> str: """Get a human-readable summary of failures.""" if self.succeeded: return "Job succeeded" lines = [f"Job {self.job_name} FAILED"] if self.state_message: lines.append(f"Message: {self.state_message}") for task in self.failed_tasks: lines.append(f"\n--- Task: {task.task_key} ---") if task.notebook_path: lines.append(f"Notebook: {task.notebook_path}") if task.error_message: lines.append(f"Error: {task.error_message}") if task.error_trace: # Truncate long traces trace = task.error_trace if len(trace) > 2000: trace = trace[:2000] + "\n... (truncated)" lines.append(f"Trace:\n{trace}") return "\n".join(lines) @dataclass class JobInfo: """Information about a bundle job.""" job_id: int job_name: str demo_name: str latest_run: Optional[JobRunResult] = None head_commit: Optional[str] = None is_up_to_date: Optional[bool] = None # True if latest run used HEAD commit class JobInspector: """ Inspects bundle jobs and retrieves detailed failure information. Uses the Databricks SDK for all API operations. 
Usage:
        inspector = JobInspector(host, token, github_token, repo_url)

        # List all bundle jobs
        jobs = inspector.list_bundle_jobs()

        # Get detailed failure info
        result = inspector.get_job_run_details(job_id, run_id)
        print(result.get_failure_summary())

        # Get task error output
        output = inspector.get_task_output(task_run_id)
    """

    # Both prefixes are used - field-demos_ for demos, field-bundle_ for bundling
    JOB_PREFIXES = ["field-demos_", "field-bundle_"]

    def __init__(self, host: str, token: str, github_token: str = None, repo_url: str = None):
        self.host = host.rstrip("/")
        self.token = token
        self.github_token = github_token
        self.repo_url = repo_url
        # Create Databricks SDK client
        self.ws = WorkspaceClient(
            host=host,
            token=token,
            auth_type="pat",
            product="dbdemos-ai-release",
            product_version="0.1.0"
        )

    def _github_get(self, path: str) -> dict:
        """Make a GET request to the GitHub API."""
        if not self.github_token:
            raise ValueError("GitHub token required for this operation")
        headers = {
            "Accept": "application/vnd.github.v3+json",
            "Authorization": f"token {self.github_token}"
        }
        url = f"https://api.github.com/{path}"
        resp = requests.get(url, headers=headers, timeout=60)
        # Surface HTTP errors (401/403/404) instead of silently returning the error payload
        resp.raise_for_status()
        return resp.json()

    def list_bundle_jobs(self, include_run_details: bool = True) -> List[JobInfo]:
        """
        List all bundle jobs (jobs with 'field-bundle_' prefix).
Args: include_run_details: If True, fetches latest run details for each job Returns: List of JobInfo objects """ jobs = [] # List all jobs using SDK for job in self.ws.jobs.list(): name = job.settings.name if job.settings else None if not name: continue # Check all known prefixes demo_name = None for prefix in self.JOB_PREFIXES: if name.startswith(prefix): demo_name = name[len(prefix):] break if demo_name: job_info = JobInfo( job_id=job.job_id, job_name=name, demo_name=demo_name ) if include_run_details: # Get latest run using SDK runs = list(self.ws.jobs.list_runs(job_id=job.job_id, limit=1, expand_tasks=True)) if runs: job_info.latest_run = self._parse_run(runs[0], name) jobs.append(job_info) return jobs def find_job(self, demo_name: str) -> Optional[JobInfo]: """Find a bundle job by demo name. Tries all known prefixes.""" # Try each prefix for prefix in self.JOB_PREFIXES: job_name = f"{prefix}{demo_name}" # Search with name filter using SDK for job in self.ws.jobs.list(name=job_name): if job.settings and job.settings.name == job_name: job_info = JobInfo( job_id=job.job_id, job_name=job_name, demo_name=demo_name ) # Get latest run runs = list(self.ws.jobs.list_runs(job_id=job.job_id, limit=1, expand_tasks=True)) if runs: job_info.latest_run = self._parse_run(runs[0], job_name) return job_info return None def get_job_run_details(self, job_id: int, run_id: int = None) -> Optional[JobRunResult]: """ Get detailed information about a job run. Args: job_id: The job ID run_id: Specific run ID. If None, gets the latest run. 
Returns:
            JobRunResult with full task details and errors
        """
        if run_id is None:
            # Get latest run
            runs = list(self.ws.jobs.list_runs(job_id=job_id, limit=1, expand_tasks=True))
            if not runs:
                return None
            run = runs[0]
        else:
            run = self.ws.jobs.get_run(run_id=run_id)

        # Get job name
        job = self.ws.jobs.get(job_id=job_id)
        job_name = job.settings.name if job.settings else f"job_{job_id}"

        result = self._parse_run(run, job_name)

        # For failed tasks, get detailed error output (API + notebook HTML)
        for task in result.failed_tasks:
            self.get_task_errors(task)

        return result

    def _parse_run(self, run, job_name: str) -> JobRunResult:
        """Parse a run object from SDK into a JobRunResult."""
        # Get state info
        state = run.state
        lifecycle_state = state.life_cycle_state.value if state and state.life_cycle_state else "UNKNOWN"
        result_state = state.result_state.value if state and state.result_state else None
        state_message = state.state_message if state else None

        # Parse tasks
        tasks = []
        most_recent_commit = None
        for task in (run.tasks or []):
            task_state = task.state
            task_result_state = task_state.result_state.value if task_state and task_state.result_state else "UNKNOWN"

            # Get commit from git_source
            used_commit = None
            if task.git_source and task.git_source.git_snapshot:
                used_commit = task.git_source.git_snapshot.used_commit
                # Note: '>' compares SHA strings lexically, so this picks a stable
                # representative commit across tasks, not a chronologically newer one.
                if used_commit and (not most_recent_commit or used_commit > most_recent_commit):
                    most_recent_commit = used_commit

            notebook_path = None
            if task.notebook_task:
                notebook_path = task.notebook_task.notebook_path

            task_result = TaskResult(
                task_key=task.task_key or "unknown",
                run_id=task.run_id or 0,
                state=task_result_state,
                notebook_path=notebook_path,
                used_commit=used_commit
            )
            tasks.append(task_result)

        return JobRunResult(
            job_id=run.job_id or 0,
            job_name=job_name,
            run_id=run.run_id or 0,
            state=lifecycle_state,
            result_state=result_state,
            state_message=state_message,
            start_time=run.start_time,
            end_time=run.end_time,
            tasks=tasks,
            used_commit=most_recent_commit
        )

    def
get_task_output(self, task_run_id: int) -> Optional[Dict[str, Any]]: """ Get the output/error from a specific task run. Args: task_run_id: The task's run_id (not the job run_id) Returns: Dict with 'error' and 'error_trace' if available """ try: output = self.ws.jobs.get_run_output(run_id=task_run_id) return { "error": output.error, "error_trace": output.error_trace, "metadata": str(output.metadata) if output.metadata else None, "notebook_output": str(output.notebook_output) if output.notebook_output else None } except Exception as e: return {"error": str(e)} def export_notebook_html(self, task_run_id: int) -> Optional[str]: """ Export the notebook HTML from a task run. Args: task_run_id: The task's run_id Returns: HTML content of the notebook with outputs, or None if failed """ try: export = self.ws.jobs.export_run(run_id=task_run_id, views_to_export=ViewsToExport.ALL) if export.views and len(export.views) > 0: return export.views[0].content return None except Exception as e: print(f"Failed to export notebook: {e}") return None def extract_errors_from_html(self, html_content: str) -> List[NotebookError]: """ Parse notebook HTML and extract error information from failed cells. The HTML contains a base64+URL encoded JSON model with command details. 
Args: html_content: The HTML content from export_notebook_html Returns: List of NotebookError objects for each failed cell """ import base64 errors = [] # Find the notebook model in the HTML - it's base64 then URL encoded JSON match = re.search(r'__DATABRICKS_NOTEBOOK_MODEL = \'([^\']+)\'', html_content) if not match: return errors try: encoded = match.group(1) # Decode: base64 -> URL encoding -> JSON decoded_bytes = base64.b64decode(encoded) url_encoded = decoded_bytes.decode('utf-8') json_str = urllib.parse.unquote(url_encoded) model = json.loads(json_str) except Exception as e: print(f"Failed to parse notebook model: {e}") return errors # Extract errors from commands for idx, cmd in enumerate(model.get('commands', [])): state = cmd.get('state') error_summary = cmd.get('errorSummary') error = cmd.get('error') # Skip non-error commands and "Command skipped" errors (not the root cause) if not (error or error_summary) or state != 'error': continue if error_summary == 'Command skipped': continue # Get command source cell_source = cmd.get('command', '') # Parse error name and message error_name = "Error" error_message = error_summary or "Unknown error" error_trace = None # Try to parse Python exception from error_summary exc_match = re.search(r'(\w+Error|\w+Exception):\s*(.+)', error_summary or '') if exc_match: error_name = exc_match.group(1) error_message = exc_match.group(2).strip() # Clean ANSI codes from error trace if error: # Remove ANSI escape codes error_trace = re.sub(r'\x1b\[[0-9;]*m', '', str(error)) errors.append(NotebookError( cell_index=idx, cell_type="code", error_name=error_name, error_message=error_message, error_trace=error_trace, cell_source=cell_source[:500] if cell_source else None # Truncate source )) return errors def get_task_errors(self, task: TaskResult) -> TaskResult: """ Get comprehensive error information for a failed task. First tries API, then falls back to exporting and parsing notebook HTML. 
Args: task: TaskResult to enrich with error information Returns: The same TaskResult with error fields populated """ # First try the standard API output = self.get_task_output(task.run_id) if output: task.error_message = output.get("error") task.error_trace = output.get("error_trace") # If no error from API, export and parse the notebook if not task.error_message and not task.error_trace: html = self.export_notebook_html(task.run_id) if html: errors = self.extract_errors_from_html(html) task.notebook_errors = errors # Set primary error from first notebook error if errors: first_err = errors[0] task.error_message = f"{first_err.error_name}: {first_err.error_message}" task.error_trace = first_err.error_trace return task def get_head_commit(self) -> Optional[str]: """Get the HEAD commit SHA from the GitHub repo.""" if not self.repo_url or not self.github_token: return None # Extract owner/repo from URL match = re.search(r'github\.com[/:]([^/]+)/([^/\.]+)', self.repo_url) if not match: return None owner, repo = match.groups() resp = self._github_get(f"repos/{owner}/{repo}/commits/HEAD") return resp.get("sha") def check_job_up_to_date(self, job_info: JobInfo) -> bool: """ Check if a job's latest run used the HEAD commit. 
Args: job_info: JobInfo with latest_run populated Returns: True if the job was run with the latest commit """ if not job_info.latest_run or not job_info.latest_run.used_commit: return False head_commit = self.get_head_commit() if not head_commit: return False job_info.head_commit = head_commit job_info.is_up_to_date = job_info.latest_run.used_commit == head_commit return job_info.is_up_to_date def get_failed_jobs(self) -> List[JobInfo]: """Get all bundle jobs that have a failed latest run.""" all_jobs = self.list_bundle_jobs(include_run_details=True) return [j for j in all_jobs if j.latest_run and j.latest_run.failed] def get_job_url(self, job_id: int, run_id: int = None) -> str: """Get the workspace URL for a job or run.""" if run_id: return f"{self.host}/#job/{job_id}/run/{run_id}" return f"{self.host}/#job/{job_id}" def load_inspector_from_config() -> JobInspector: """Load a JobInspector using the local config file.""" repo_root = Path(__file__).parent.parent conf_files = [ repo_root / "local_conf_E2TOOL.json", repo_root / "local_conf.json", ] config = None for conf_file in conf_files: if conf_file.exists(): with open(conf_file, "r") as f: config = json.load(f) break if not config: raise FileNotFoundError("No config file found") # Clean repo_url repo_url = config.get("repo_url", "") if repo_url.endswith(".git"): repo_url = repo_url[:-4] return JobInspector( host=config["url"], token=config["pat_token"], github_token=config.get("github_token"), repo_url=repo_url ) ================================================ FILE: ai_release/run_remote.py ================================================ #!/usr/bin/env python3 """ Remote Code Execution CLI for DBDemos Execute Python code on a Databricks cluster for testing notebook fixes. 
Usage: # Execute code directly python ai_release/run_remote.py --code "print('Hello from Databricks!')" # Execute a file python ai_release/run_remote.py --file path/to/script.py # Execute SQL python ai_release/run_remote.py --code "SELECT 1" --language sql # List available clusters python ai_release/run_remote.py --list-clusters # Start a cluster python ai_release/run_remote.py --start-cluster # Check cluster status python ai_release/run_remote.py --cluster-status # Reuse context for faster follow-up commands python ai_release/run_remote.py --code "x = 1" --save-context python ai_release/run_remote.py --code "print(x)" --load-context Environment Variables / Config: Uses local_conf_E2TOOL.json for credentials. Cluster is auto-selected by matching "cluster_name_pattern" (default: "quentin") """ import argparse import json import os import sys from pathlib import Path # Add parent directory to path sys.path.insert(0, str(Path(__file__).parent.parent)) from ai_release.compute import ( get_workspace_client, list_clusters, find_cluster_by_name, start_cluster, get_cluster_status, wait_for_cluster, execute_command, execute_file, ) CONTEXT_FILE = Path(__file__).parent / ".execution_context.json" def load_config(): """Load configuration from local_conf_E2TOOL.json""" repo_root = Path(__file__).parent.parent conf_files = [ repo_root / "local_conf_E2TOOL.json", repo_root / "local_conf.json", ] for conf_file in conf_files: if conf_file.exists(): with open(conf_file, "r") as f: config = json.load(f) print(f"Loaded config from {conf_file.name}") return config print("ERROR: No config file found (local_conf_E2TOOL.json or local_conf.json)") sys.exit(1) def save_context(cluster_id: str, context_id: str): """Save execution context for reuse.""" with open(CONTEXT_FILE, "w") as f: json.dump({"cluster_id": cluster_id, "context_id": context_id}, f) print(f"Context saved to {CONTEXT_FILE}") def load_context(): """Load saved execution context.""" if CONTEXT_FILE.exists(): with 
open(CONTEXT_FILE, "r") as f: return json.load(f) return None def clear_context(): """Clear saved context.""" if CONTEXT_FILE.exists(): CONTEXT_FILE.unlink() print("Context cleared") def main(): parser = argparse.ArgumentParser( description="Execute code on Databricks clusters", formatter_class=argparse.RawDescriptionHelpFormatter, epilog=__doc__, ) # Execution options parser.add_argument("--code", "-c", help="Code to execute") parser.add_argument("--file", "-f", help="Python file to execute") parser.add_argument("--language", "-l", default="python", choices=["python", "sql", "scala", "r"]) parser.add_argument("--timeout", "-t", type=int, default=300, help="Timeout in seconds") # Cluster management parser.add_argument("--list-clusters", action="store_true", help="List available clusters") parser.add_argument("--start-cluster", action="store_true", help="Start the configured cluster") parser.add_argument("--cluster-status", action="store_true", help="Check cluster status") parser.add_argument("--wait-for-cluster", action="store_true", help="Wait for cluster to be running") parser.add_argument("--cluster-name", help="Cluster name pattern to match (default: from config or 'quentin')") # Context management parser.add_argument("--save-context", action="store_true", help="Save context for reuse") parser.add_argument("--load-context", action="store_true", help="Reuse saved context") parser.add_argument("--clear-context", action="store_true", help="Clear saved context") parser.add_argument("--destroy-context", action="store_true", help="Destroy context after execution") args = parser.parse_args() # Load config config = load_config() host = config.get("url", os.environ.get("DATABRICKS_HOST")) token = config.get("pat_token", os.environ.get("DATABRICKS_TOKEN")) cluster_pattern = args.cluster_name or config.get("cluster_name_pattern", "quentin") if not host or not token: print("ERROR: Missing workspace URL or token") sys.exit(1) # Create client client = 
get_workspace_client(host, token) print(f"Workspace: {host}") # Clear context if args.clear_context: clear_context() return 0 # List clusters if args.list_clusters: clusters = list_clusters(client, include_terminated=True) print(f"\nFound {len(clusters)} clusters:\n") for c in clusters: state_icon = "🟢" if c["state"] == "RUNNING" else "🔴" if c["state"] == "TERMINATED" else "🟡" print(f" {state_icon} {c['cluster_name']:<40} {c['state']:<12} {c['cluster_id']}") return 0 # Find cluster cluster = find_cluster_by_name(client, cluster_pattern) if not cluster: print(f"ERROR: No cluster found matching '{cluster_pattern}'") print("Use --list-clusters to see available clusters") return 1 print(f"Cluster: {cluster['cluster_name']} ({cluster['state']})") cluster_id = cluster["cluster_id"] # Cluster status if args.cluster_status: status = get_cluster_status(client, cluster_id) print(f" State: {status['state']}") return 0 # Start cluster if args.start_cluster: result = start_cluster(client, cluster_id) print(f" {result['message']}") if args.wait_for_cluster and result.get("state") != "RUNNING": wait_for_cluster(client, cluster_id) return 0 # Wait for cluster if args.wait_for_cluster: success = wait_for_cluster(client, cluster_id) return 0 if success else 1 # Execute code if args.code or args.file: # Check cluster is running if cluster["state"] != "RUNNING": print(f"ERROR: Cluster is {cluster['state']}, not RUNNING") print("Use --start-cluster --wait-for-cluster to start it") return 1 # Load context if requested context_id = None if args.load_context: saved = load_context() if saved and saved.get("cluster_id") == cluster_id: context_id = saved.get("context_id") print(f"Reusing context: {context_id}") else: print("No saved context found or cluster changed, creating new context") # Execute if args.file: print(f"\nExecuting file: {args.file}") result = execute_file( client=client, file_path=args.file, cluster_id=cluster_id, context_id=context_id, timeout=args.timeout, 
destroy_context_on_completion=args.destroy_context, ) else: print(f"\nExecuting {args.language} code...") result = execute_command( client=client, code=args.code, cluster_id=cluster_id, context_id=context_id, language=args.language, timeout=args.timeout, destroy_context_on_completion=args.destroy_context, ) # Print result print("\n" + "=" * 60) if result.success: print("✓ SUCCESS") print("=" * 60) print(result.output) else: print("✗ FAILED") print("=" * 60) print(result.error) # Save context if requested if args.save_context and result.context_id and not result.context_destroyed: save_context(cluster_id, result.context_id) print(f"\nContext saved. Use --load-context to reuse.") return 0 if result.success else 1 parser.print_help() return 1 if __name__ == "__main__": sys.exit(main()) ================================================ FILE: ai_release/run_state.py ================================================ """ Run state management for AI release workflow. Tracks job runs, errors, and fixes in a persistent folder structure: ai_release/runs/ <commit_id>/ state.json - Overall run state <demo_name>/ status.json - Demo-specific status errors.json - Extracted errors from failed runs fix_attempts.json - History of fix attempts job_output.log - Raw job output notes.md - AI notes and observations """ import json import os from dataclasses import dataclass, field, asdict from datetime import datetime from pathlib import Path from typing import Optional, List, Dict, Any RUNS_DIR = Path(__file__).parent / "runs" @dataclass class DemoRunState: """State for a single demo run.""" demo_name: str status: str = "pending" # pending, running, success, failed, fixing job_id: Optional[int] = None run_id: Optional[int] = None branch: Optional[str] = None started_at: Optional[str] = None completed_at: Optional[str] = None error_summary: Optional[str] = None fix_attempts: List[Dict[str, Any]] = field(default_factory=list) def to_dict(self) -> dict: return asdict(self) @classmethod def 
from_dict(cls, data: dict) -> "DemoRunState": return cls(**data) @dataclass class RunState: """Overall state for a release run.""" commit_id: str branch: str = "main" started_at: str = field(default_factory=lambda: datetime.now().isoformat()) demos: Dict[str, DemoRunState] = field(default_factory=dict) def to_dict(self) -> dict: return { "commit_id": self.commit_id, "branch": self.branch, "started_at": self.started_at, "demos": {k: v.to_dict() for k, v in self.demos.items()} } @classmethod def from_dict(cls, data: dict) -> "RunState": demos = {k: DemoRunState.from_dict(v) for k, v in data.get("demos", {}).items()} return cls( commit_id=data["commit_id"], branch=data.get("branch", "main"), started_at=data.get("started_at", ""), demos=demos ) class RunStateManager: """Manages persistent run state for AI release workflow.""" def __init__(self, commit_id: Optional[str] = None): """Initialize with a specific commit or auto-detect from git.""" if commit_id is None: commit_id = self._get_current_commit() self.commit_id = commit_id self.run_dir = RUNS_DIR / commit_id self.run_dir.mkdir(parents=True, exist_ok=True) self.state = self._load_or_create_state() def _get_current_commit(self) -> str: """Get current git commit from dbdemos-notebooks.""" import subprocess try: result = subprocess.run( ["git", "rev-parse", "--short", "HEAD"], cwd=Path(__file__).parent.parent.parent / "dbdemos-notebooks", capture_output=True, text=True ) if result.returncode == 0: return result.stdout.strip() except Exception: pass return datetime.now().strftime("%Y%m%d_%H%M%S") def _load_or_create_state(self) -> RunState: """Load existing state or create new one.""" state_file = self.run_dir / "state.json" if state_file.exists(): with open(state_file) as f: return RunState.from_dict(json.load(f)) return RunState(commit_id=self.commit_id) def save(self): """Save current state to disk.""" state_file = self.run_dir / "state.json" with open(state_file, "w") as f: json.dump(self.state.to_dict(), f, 
indent=2) def get_demo_dir(self, demo_name: str) -> Path: """Get or create directory for a demo.""" demo_dir = self.run_dir / demo_name demo_dir.mkdir(parents=True, exist_ok=True) return demo_dir def get_demo_state(self, demo_name: str) -> DemoRunState: """Get state for a specific demo.""" if demo_name not in self.state.demos: self.state.demos[demo_name] = DemoRunState(demo_name=demo_name) return self.state.demos[demo_name] def update_demo_status(self, demo_name: str, status: str, **kwargs): """Update demo status and save.""" demo_state = self.get_demo_state(demo_name) demo_state.status = status for key, value in kwargs.items(): if hasattr(demo_state, key): setattr(demo_state, key, value) if status == "running" and not demo_state.started_at: demo_state.started_at = datetime.now().isoformat() elif status in ("success", "failed"): demo_state.completed_at = datetime.now().isoformat() self.save() self._save_demo_status(demo_name, demo_state) def _save_demo_status(self, demo_name: str, state: DemoRunState): """Save demo-specific status file.""" demo_dir = self.get_demo_dir(demo_name) with open(demo_dir / "status.json", "w") as f: json.dump(state.to_dict(), f, indent=2) def save_errors(self, demo_name: str, errors: List[Dict[str, Any]]): """Save extracted errors for a demo.""" demo_dir = self.get_demo_dir(demo_name) with open(demo_dir / "errors.json", "w") as f: json.dump({ "extracted_at": datetime.now().isoformat(), "errors": errors }, f, indent=2) def save_job_output(self, demo_name: str, output: str): """Save raw job output.""" demo_dir = self.get_demo_dir(demo_name) with open(demo_dir / "job_output.log", "w") as f: f.write(output) def add_fix_attempt(self, demo_name: str, description: str, branch: str, files_changed: List[str]): """Record a fix attempt.""" demo_state = self.get_demo_state(demo_name) attempt = { "timestamp": datetime.now().isoformat(), "description": description, "branch": branch, "files_changed": files_changed, "result": "pending" } 
demo_state.fix_attempts.append(attempt) self.save() # Also save to fix_attempts.json demo_dir = self.get_demo_dir(demo_name) with open(demo_dir / "fix_attempts.json", "w") as f: json.dump(demo_state.fix_attempts, f, indent=2) def update_fix_result(self, demo_name: str, result: str): """Update the result of the latest fix attempt.""" demo_state = self.get_demo_state(demo_name) if demo_state.fix_attempts: demo_state.fix_attempts[-1]["result"] = result self.save() def add_note(self, demo_name: str, note: str): """Add a note to the demo's notes.md file.""" demo_dir = self.get_demo_dir(demo_name) notes_file = demo_dir / "notes.md" timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S") with open(notes_file, "a") as f: f.write(f"\n## {timestamp}\n\n{note}\n") def get_summary(self) -> str: """Get a summary of all demo states.""" lines = [ f"# Release Run: {self.commit_id}", f"Branch: {self.state.branch}", f"Started: {self.state.started_at}", "", "## Demo Status", "" ] for name, demo in sorted(self.state.demos.items()): status_emoji = { "pending": "⏳", "running": "🔄", "success": "✅", "failed": "❌", "fixing": "🔧" }.get(demo.status, "❓") line = f"- {status_emoji} **{name}**: {demo.status}" if demo.error_summary: line += f" - {demo.error_summary[:50]}..." 
if demo.fix_attempts: line += f" ({len(demo.fix_attempts)} fix attempts)" lines.append(line) return "\n".join(lines) @classmethod def list_runs(cls) -> List[str]: """List all existing run directories.""" if not RUNS_DIR.exists(): return [] return sorted([d.name for d in RUNS_DIR.iterdir() if d.is_dir()]) @classmethod def get_latest_run(cls) -> Optional["RunStateManager"]: """Get the most recent run state manager.""" runs = cls.list_runs() if not runs: return None return cls(runs[-1]) # Convenience functions def get_run_state(commit_id: Optional[str] = None) -> RunStateManager: """Get or create a run state manager.""" return RunStateManager(commit_id) def get_latest_run() -> Optional[RunStateManager]: """Get the latest run state.""" return RunStateManager.get_latest_run() ================================================ FILE: ai_release/runs/.gitignore ================================================ *.log */ !.gitignore ================================================ FILE: build-and-distribute.sh ================================================ #!/bin/bash # Check for pending changes before doing anything if ! git diff --quiet || ! git diff --cached --quiet; then echo "Error: You have uncommitted changes." echo "Please commit and push your changes before running a release." echo "" git status --short exit 1 fi if [ -n "$(git log origin/main..HEAD 2>/dev/null)" ]; then echo "Error: You have unpushed commits." echo "Please push your changes before running a release." echo "" git log origin/main..HEAD --oneline exit 1 fi # Check if gh CLI is installed if ! command -v gh &> /dev/null; then echo "Error: GitHub CLI (gh) is not installed. Please install it first." echo "Visit: https://cli.github.com/" exit 1 fi # Check if pip-compile is installed (from pip-tools) if ! command -v pip-compile &> /dev/null; then echo "Error: pip-compile is not installed. Please install pip-tools first." echo "Run: pip install pip-tools" exit 1 fi # Check authentication status if ! 
gh auth status &> /dev/null; then echo "GitHub CLI not authenticated. Please login..." gh auth login fi # Check if active account is Enterprise Managed User (ends with _data) ACTIVE_ACCOUNT=$(gh auth status | grep "Logged in to" | head -1 | sed 's/.*Logged in to github.com account \([^ ]*\).*/\1/') if [[ "$ACTIVE_ACCOUNT" == *"_data" ]]; then echo "Warning: Active account '$ACTIVE_ACCOUNT' appears to be an Enterprise Managed User" echo "Switching to regular account..." # Get list of available accounts by parsing auth status output AVAILABLE_ACCOUNTS=$(gh auth status | grep "Logged in to" | sed 's/.*Logged in to github.com account \([^ ]*\).*/\1/') # Find first account that doesn't end with _data REGULAR_ACCOUNT="" while IFS= read -r account; do if [[ "$account" != *"_data" ]]; then REGULAR_ACCOUNT="$account" break fi done <<< "$AVAILABLE_ACCOUNTS" if [[ -n "$REGULAR_ACCOUNT" ]]; then echo "Switching to regular account: $REGULAR_ACCOUNT" gh auth switch --user "$REGULAR_ACCOUNT" || { echo "Error: Failed to switch to regular account" exit 1 } else echo "Error: No regular account found. Please add a regular GitHub account:" echo "gh auth login" exit 1 fi fi # Check access to required repositories echo "Checking access to required repositories..." REPOS=("databricks-demos/dbdemos" "databricks-demos/dbdemos-notebooks" "databricks-demos/dbdemos-dataset" "databricks-demos/dbdemos-resources") for repo in "${REPOS[@]}"; do if ! gh api "repos/$repo" &> /dev/null; then echo "Error: No access to repository $repo" echo "Please ensure you have the necessary permissions or try logging in again:" echo "gh auth login" exit 1 fi echo "✓ Access confirmed for $repo" done # Switch to main and pull latest git checkout main || exit 1 git pull || exit 1 # Get current version from setup.py CURRENT_VERSION=$(grep "version=" setup.py | sed "s/.*version='\([^']*\)'.*/\1/") echo "Current version: $CURRENT_VERSION" # Bump version (patch increment) IFS='.' 
read -ra VERSION_PARTS <<< "$CURRENT_VERSION"
NEW_PATCH=$((VERSION_PARTS[2] + 1))
NEW_VERSION="${VERSION_PARTS[0]}.${VERSION_PARTS[1]}.$NEW_PATCH"
echo "New version: $NEW_VERSION"

# Update version in setup.py
sed -i.bak "s/version='[^']*'/version='$NEW_VERSION'/" setup.py
rm setup.py.bak

# Update version in __init__.py
sed -i.bak "s/__version__ = \"[^\"]*\"/__version__ = \"$NEW_VERSION\"/" dbdemos/__init__.py
rm dbdemos/__init__.py.bak

# Generate requirements.txt with hashes from trusted private index
echo "Generating requirements.txt with hashes..."

# Extract dependencies from setup.py and write to requirements.in
python3 -c "
import ast
import sys

with open('setup.py', 'r') as f:
    content = f.read()

# Parse the setup.py file
tree = ast.parse(content)

# Find the setup() call and extract install_requires
for node in ast.walk(tree):
    if isinstance(node, ast.Call) and getattr(node.func, 'id', None) == 'setup':
        for keyword in node.keywords:
            if keyword.arg == 'install_requires':
                # ast.literal_eval accepts an AST node directly; passing it a
                # compiled code object (as before) raises ValueError
                deps = ast.literal_eval(keyword.value)
                for dep in deps:
                    print(dep)
                sys.exit(0)
print('Error: Could not extract install_requires from setup.py', file=sys.stderr)
sys.exit(1)
" > requirements.in

if [ $? -ne 0 ]; then
    echo "Error: Failed to extract dependencies from setup.py"
    exit 1
fi

echo "Extracted dependencies:"
cat requirements.in

# Run pip-compile with private index to get trusted hashes
PRIVATE_INDEX="https://pypi-proxy.dev.databricks.com/simple/"
pip-compile --generate-hashes --index-url="$PRIVATE_INDEX" --output-file=requirements.txt requirements.in

if [ $?
-ne 0 ]; then echo "Error: pip-compile failed" exit 1 fi # Remove the private index URL from requirements.txt (keep hashes, they're content-based) sed -i.bak '/^--index-url/d' requirements.txt # Also clean up the comment that references the private index sed -i.bak "s|--index-url=$PRIVATE_INDEX ||g" requirements.txt rm requirements.txt.bak echo "requirements.txt generated with hashes (private index removed)" # Use the version we just bumped VERSION=$NEW_VERSION echo "Using bumped version: $VERSION" #package rm -rf ./dist/* rm -rf ./dbdemos/bundles/.DS_Store python3 setup.py clean --all bdist_wheel echo "Package built under dist/ - updating pypi with new version..." ls -alh ./dist if ! twine upload dist/*; then echo "Error: Failed to upload package to PyPI" exit 1 fi echo "Upload ok - available as pip install dbdemos" # Create or switch to release branch and commit the bumped version echo "Creating/updating release branch with bumped version..." git checkout -b release/v$VERSION 2>/dev/null || git checkout release/v$VERSION git add setup.py dbdemos/__init__.py requirements.in requirements.txt git commit -m "Bump version to $VERSION" git push origin release/v$VERSION # Create PR to main branch echo "Creating pull request to main branch..." if gh pr create --title "Release v$VERSION" --body "Automated release for version $VERSION" --base main --head release/v$VERSION; then echo "Pull request created successfully" else echo "Warning: Failed to create pull request (may already exist)" fi # Also update main with the version bump so it doesn't get lost echo "Syncing version bump to main..." 
git checkout main git add setup.py dbdemos/__init__.py requirements.in requirements.txt git commit -m "Bump version to $VERSION" git push origin main # Find the wheel file WHL_FILE=$(find ./dist -name "*.whl" | head -n 1) if [ -z "$WHL_FILE" ]; then echo "Error: No wheel file found in ./dist directory" exit 1 fi echo "Found wheel file: $WHL_FILE" # Extract version from wheel filename (format: dbdemos-0.6.12-py3-none-any.whl) VERSION=$(basename "$WHL_FILE" | sed -E 's/dbdemos-([0-9]+\.[0-9]+\.[0-9]+).*/\1/') echo "Extracted version from wheel file: $VERSION" # Function to create a release and upload asset using gh CLI create_release_with_asset() { local repo=$1 local tag_name="v$VERSION" local release_name="v$VERSION" echo "Creating release $release_name for $repo..." # Create the release using gh CLI if gh release create "$tag_name" "$WHL_FILE" --repo "$repo" --title "$release_name" --notes "Release version $VERSION"; then echo "Release created and asset uploaded successfully for $repo" return 0 else echo "Error creating release for $repo" return 1 fi } # Create releases with assets on all repositories echo "Creating releases for version v$VERSION..." create_release_with_asset "databricks-demos/dbdemos" create_release_with_asset "databricks-demos/dbdemos-notebooks" create_release_with_asset "databricks-demos/dbdemos-dataset" create_release_with_asset "databricks-demos/dbdemos-resources" echo "Release process completed for v$VERSION!" 
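The script above does two version manipulations in bash/sed: it bumps the patch component of the `MAJOR.MINOR.PATCH` version, and it later recovers the version string back out of the built wheel's filename. A minimal Python sketch of the same two operations is shown below for reference; the function names (`bump_patch`, `version_from_wheel`) are illustrative, not part of the actual release tooling.

```python
import re


def bump_patch(version: str) -> str:
    """Increment the patch component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"


def version_from_wheel(wheel_name: str) -> str:
    """Extract the version from a wheel filename like dbdemos-0.6.12-py3-none-any.whl,
    mirroring the sed expression used in the script."""
    m = re.match(r"dbdemos-(\d+\.\d+\.\d+)", wheel_name)
    if m is None:
        raise ValueError(f"Unexpected wheel filename: {wheel_name}")
    return m.group(1)
```

For example, `bump_patch("0.6.12")` yields `"0.6.13"`, and the wheel built for that version round-trips back through `version_from_wheel`.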
================================================
FILE: build.sh
================================================
python3 setup.py clean --all bdist_wheel
conda activate test_dbdemos
#pip3 install dist/dbdemos-0.3.0-py3-none-any.whl --force
#python3 test_package.py
#cp dist/dbdemos-* release/


================================================
FILE: dbdemos/__init__.py
================================================
__version__ = "0.6.34"

from .dbdemos import list_demos, install, create_cluster, help, install_all, check_status_all, check_status, get_html_list_demos


================================================
FILE: dbdemos/conf.py
================================================
import json
from pathlib import Path
from typing import List
import requests
import urllib
from datetime import date
import re
import threading

from requests import Response


def merge_dict(a, b, path=None, override=True):
    """Merges dict b into a. Mutates a."""
    if path is None:
        path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge_dict(a[key], b[key], path + [str(key)])
            elif override:
                a[key] = b[key]
        else:
            a[key] = b[key]


class Conf():
    def __init__(self, username: str, workspace_url: str, org_id: str, pat_token: str,
                 default_cluster_template: str = None, default_cluster_job_template=None,
                 repo_staging_path: str = None, repo_name: str = None, repo_url: str = None,
                 branch: str = "master", github_token=None,
                 run_test_as_username="quentin.ambard@databricks.com"):
        self.username = username
        name = self.username[:self.username.rfind('@')]
        self.name = re.sub("[^A-Za-z0-9]", '_', name)
        self.workspace_url = workspace_url
        self.org_id = org_id
        self.pat_token = pat_token
        self.headers = {"Authorization": "Bearer " + pat_token, 'Content-type': 'application/json', 'User-Agent': 'dbdemos'}
        self.default_cluster_template = default_cluster_template
        self.default_cluster_job_template = default_cluster_job_template
        self.repo_staging_path = repo_staging_path
        self.repo_name = repo_name
        assert repo_url is None or ".git" not in repo_url, "repo_url should not contain .git"
        self.repo_url = repo_url
        self.branch = branch
        self.github_token = github_token
        self.run_test_as_username = run_test_as_username

    def get_repo_path(self):
        return self.repo_staging_path + "/" + self.repo_name

    # Add internal pool id to accelerate our demos & unit tests
    def get_demo_pool(self):
        if self.org_id == "1444828305810485" or "e2-demo-field-eng" in self.workspace_url:
            return "0727-104344-hauls13-pool-uftxk0r6"
        if self.org_id == "1660015457675682" or self.is_dev_env():
            return "1025-140806-yup112-pool-yz565bma"
        if self.org_id == "5206439413157315":
            return "1010-172835-slues66-pool-7dhzc23j"
        if self.org_id == "984752964297111":
            return "1010-173019-honor44-pool-ksw4stjz"
        if self.org_id == "2556758628403379":
            return "1010-173021-dance560-pool-hl7wefwy"
        return None

    def is_dev_env(self):
        return "e2-demo-tools" in self.workspace_url or "local" in self.workspace_url

    def is_demo_env(self):
        # Fixed: the membership test on "eastus2" was missing ("... or 'eastus2' or ..."), making the expression always truthy.
        return "e2-demo-field-eng" in self.workspace_url or "eastus2" in self.workspace_url or self.org_id in ["1444828305810485"]

    def is_fe_env(self):
        return "e2-demo-field-eng" in self.workspace_url or "eastus2" in self.workspace_url or \
               self.org_id in ["5206439413157315", "984752964297111", "local", "1444828305810485", "2556758628403379"]


class DBClient():
    def __init__(self, conf: Conf):
        self.conf = conf

    def clean_path(self, path):
        if path.startswith("http"):
            raise Exception(f"Wrong path {path}, use with api path directly (no http://xxx..xxx).")
        if path.startswith("/"):
            path = path[1:]
        if path.startswith("api/"):
            path = path[len("api/"):]
        return path

    def post(self, path: str, json: dict = {}, retry=0):
        url = self.conf.workspace_url + "/api/" + self.clean_path(path)
        with requests.post(url, headers=self.conf.headers, json=json, timeout=60) as r:
            if r.status_code == 429 and retry < 2:
                import time
                import random
                wait_time = 15 * (retry + 1) + random.randint(2 * retry, 10 * retry)
                print(f'WARN: hitting api request limit 429 error: {path}. Sleeping {wait_time}sec and retrying...')
                time.sleep(wait_time)
                print('Retrying call.')
                return self.post(path, json, retry + 1)
            else:
                return self.get_json_result(url, r)

    def put(self, path: str, json: dict = None, data: bytes = None):
        url = self.conf.workspace_url + "/api/" + self.clean_path(path)
        headers = self.conf.headers
        if data is not None:
            files = {'file': ('file', data, 'application/octet-stream')}
            with requests.put(url, headers=headers, files=files, timeout=60) as r:
                return self.get_json_result(url, r)
        else:
            with requests.put(url, headers=headers, json=json, timeout=60) as r:
                return self.get_json_result(url, r)

    def patch(self, path: str, json: dict = {}):
        url = self.conf.workspace_url + "/api/" + self.clean_path(path)
        with requests.patch(url, headers=self.conf.headers, json=json, timeout=60) as r:
            return self.get_json_result(url, r)

    def get(self, path: str, params: dict = {}, print_auth_error=True):
        url = self.conf.workspace_url + "/api/" + self.clean_path(path)
        with requests.get(url, headers=self.conf.headers, params=params, timeout=60) as r:
            return self.get_json_result(url, r, print_auth_error)

    def delete(self, path: str, params: dict = {}):
        url = self.conf.workspace_url + "/api/" + self.clean_path(path)
        with requests.delete(url, headers=self.conf.headers, params=params, timeout=60) as r:
            return self.get_json_result(url, r)

    def get_json_result(self, url: str, r: Response, print_auth_error=True):
        if r.status_code == 403:
            if print_auth_error:
                print(f"Unauthorized call. Check your PAT token {r.text} - {r.url} - {url}")
        try:
            return r.json()
        except Exception as e:
            print(f"API CALL ERROR - can't read json. status: {r.status_code} {r.text} - URL: {url} - {e}")
            raise e

    def search_cluster(self, cluster_name: str, tags: dict):
        # Fixed: this was calling self.db.get, but DBClient has no .db attribute; self.get is intended.
        clusters = self.get("2.1/clusters/list")
        for c in clusters:
            if c['cluster_name'] == cluster_name:
                match = True
                # Check if all the tags are in the cluster conf
                for k, v in tags.items():
                    if k not in c['custom_tags'] or c['custom_tags'][k] != v:
                        match = False
                if match:
                    return c
        return None

    def find_job(self, name, offset=0, limit=25):
        r = self.get("2.1/jobs/list", {"limit": limit, "offset": offset, "name": urllib.parse.quote_plus(name)})
        if 'jobs' in r:
            for job in r['jobs']:
                if job["settings"]["name"] == name:
                    return job
            if r['has_more']:
                return self.find_job(name, offset + limit, limit)
        return None


class GenieRoom():
    def __init__(self, id: str, display_name: str, description: str, table_identifiers: List[str],
                 curated_questions: List[str], instructions: str, sql_instructions: List[dict],
                 function_names: List[str], benchmarks: List[dict]):
        self.display_name = display_name
        self.id = id
        self.description = description
        self.instructions = instructions
        self.table_identifiers = table_identifiers
        self.sql_instructions = sql_instructions
        self.curated_questions = curated_questions
        self.function_names = function_names
        self.benchmarks = benchmarks


class DataFolder():
    def __init__(self, source_folder: str, source_format: str, target_table_name: str = None,
                 target_volume_folder_name: str = None, target_format: str = "delta"):
        assert target_volume_folder_name or target_table_name, "Error, data folder should either have target_table_name or target_volume_folder_name set"
        self.source_folder = source_folder
        self.source_format = source_format
        self.target_table_name = target_table_name
        self.target_format = target_format
        self.target_volume_folder_name = target_volume_folder_name


class DemoNotebook():
    def __init__(self, path: str, title: str, description: str, pre_run: bool = False,
                 publish_on_website: bool = False, add_cluster_setup_cell: bool = False,
                 parameters: dict = {}, depends_on_previous: bool = True, libraries: list = [],
                 warehouse_id=None, object_type=None):
        self.path = path
        self.title = title
        self.description = description
        self.pre_run = pre_run
        self.publish_on_website = publish_on_website
        self.add_cluster_setup_cell = add_cluster_setup_cell
        self.parameters = parameters
        self.depends_on_previous = depends_on_previous
        self.libraries = libraries
        self.warehouse_id = warehouse_id
        self.object_type = object_type

    def __repr__(self):
        return self.path

    def get_folder(self):
        # Fixed: the parts were computed but never returned.
        p = Path(self.get_clean_path())
        return p.parts

    def get_clean_path(self):
        # Some notebook paths are relative, like ../../demo-retail/lakehouse-retail/_resources/xxx
        # This function removes the relative prefix and returns _resources/xxx
        p = Path(self.path)
        parent_count = p.parts.count('..')
        if parent_count > 0:
            return str(p.relative_to(*p.parts[:parent_count * 2 - 1]))
        return self.path

    def toJSON(self):
        return json.dumps(self, default=lambda o: o.__dict__)


class DemoConf():
    def __init__(self, path: str, json_conf: dict, catalog: str = None, schema: str = None):
        self.json_conf = json_conf
        self.notebooks = []
        self.cluster = json_conf.get('cluster', {})
        self.cluster_libraries = json_conf.get('cluster_libraries', [])
        self.workflows = json_conf.get('workflows', [])
        self.pipelines = json_conf.get('pipelines', [])
        self.repos = json_conf.get('repos', [])
        self.serverless_supported = json_conf.get('serverless_supported', False)
        self.init_job = json_conf.get('init_job', {})
        self.job_id = None
        self.run_id = None
        if path.startswith('/'):
            path = path[1:]
        self.path = path
        self.name = json_conf['name']
        self.category = json_conf['category']
        self.title = json_conf['title']
        self.description = json_conf['description']
        self.tags = json_conf.get('tags', [])
        self.custom_schema_supported = json_conf.get('custom_schema_supported', False)
        self.schema = schema
        self.catalog = catalog
        self.default_schema = json_conf.get('default_schema', "")
        self.default_catalog = json_conf.get('default_catalog', "")
        self.custom_message = json_conf.get('custom_message', "")
        self.create_cluster = json_conf.get('create_cluster', True)
        self.dashboards = json_conf.get('dashboards', [])
        self.sql_queries = json_conf.get('sql_queries', [])
        self.bundle = json_conf.get('bundle', False)
        self.env_version = json_conf.get('env_version', 2)
        self.data_folders: List[DataFolder] = []
        for data_folder in json_conf.get('data_folders', []):
            self.data_folders.append(DataFolder(data_folder['source_folder'], data_folder['source_format'],
                                                data_folder.get('target_table_name', None),
                                                data_folder.get('target_volume_folder', None),
                                                data_folder['target_format']))
        self.genie_rooms: List[GenieRoom] = []
        for genie_room in json_conf.get('genie_rooms', []):
            self.genie_rooms.append(GenieRoom(genie_room['id'], genie_room.get('display_name', None),
                                              genie_room.get('description', None), genie_room['table_identifiers'],
                                              genie_room.get('curated_questions', []), genie_room.get('instructions', None),
                                              genie_room.get('sql_instructions', []), genie_room.get('function_names', []),
                                              genie_room.get('benchmarks', [])))
        for n in json_conf.get('notebooks', []):
            add_cluster_setup_cell = n.get('add_cluster_setup_cell', False)
            params = n.get('parameters', {})
            depends_on_previous = n.get('depends_on_previous', True)
            libraries = n.get('libraries', [])
            warehouse_id = n.get('warehouse_id', None)
            self.notebooks.append(DemoNotebook(n['path'], n['title'], n['description'], n['pre_run'],
                                               n['publish_on_website'], add_cluster_setup_cell, params,
                                               depends_on_previous, libraries, warehouse_id,
                                               n.get('object_type', None)))
        self._notebook_lock = threading.Lock()

    def __repr__(self):
        return self.path + "(" + str(self.notebooks) + ")"

    def update_notebook_object_type(self, notebook: DemoNotebook, object_type: str):
        with self._notebook_lock:
            for n in self.json_conf['notebooks']:
                if n['path'] == notebook.path:
                    n['object_type'] = object_type
                    break

    def add_notebook(self, notebook):
        self.notebooks.append(notebook)
        # TODO: this isn't clean, need a better solution
        self.json_conf["notebooks"].append(notebook.__dict__)

    def set_pipeline_id(self, id, uid):
        j = json.dumps(self.init_job)
        j = j.replace("{{DYNAMIC_SDP_ID_" + id + "}}", uid)
        self.init_job = json.loads(j)
        j = json.dumps(self.workflows)
        j = j.replace("{{DYNAMIC_SDP_ID_" + id + "}}", uid)
        self.workflows = json.loads(j)

    def get_job_name(self):
        return "field-bundle_" + self.name

    def get_notebooks_to_run(self):
        return [n for n in self.notebooks if n.pre_run]

    def get_notebooks_to_publish(self) -> List[DemoNotebook]:
        return [n for n in self.notebooks if n.publish_on_website]

    def get_bundle_path(self):
        return self.get_bundle_root_path() + "/install_package"

    def get_bundle_dashboard_path(self):
        return self.get_bundle_root_path() + "/dashboards"

    def get_bundle_root_path(self):
        return "dbdemos/bundles/" + self.name

    def get_minisite_path(self):
        return "dbdemos/minisite/" + self.name


class ConfTemplate:
    def __init__(self, username, demo_name, catalog=None, schema=None, demo_folder=""):
        self.catalog = catalog
        self.schema = schema
        self.username = username
        self.demo_name = demo_name
        self.demo_folder = demo_folder

    def template_TODAY(self):
        return date.today().strftime("%Y-%m-%d")

    def template_CURRENT_USER(self):
        return self.username

    def template_CATALOG(self):
        return self.catalog

    def template_SCHEMA(self):
        return self.schema

    def template_CURRENT_USER_NAME(self):
        name = self.username[:self.username.rfind('@')]
        name = re.sub("[^A-Za-z0-9]", '_', name)
        return name

    def template_DEMO_NAME(self):
        return self.demo_name

    def template_DEMO_FOLDER(self):
        return self.demo_folder

    def template_SHARED_WAREHOUSE_ID(self):
        return self.demo_folder

    def replace_template_key(self, text: str):
        for key in set(re.findall(r'\{\{(.*?)\}\}', text)):
            # TODO: need to improve this, the mlops demo has {{}} in the product, like tasks.Drift_detection.values.all_violations_count
            if "Drift_detection" not in key:
                if not key.startswith("DYNAMIC") and not key.startswith("SHARED_WAREHOUSE"):
                    func = getattr(self, f"template_{key}")
                    replacement = func()
                    text = text.replace("{{" + key + "}}", replacement)
        return text


================================================
FILE: dbdemos/dbdemos.py
================================================
from .exceptions.dbdemos_exception import TokenException
from .installer import Installer
from collections import defaultdict
from .installer_report import InstallerReport

CSS_LIST = """
<style>
.dbdemo {
    font-family: -apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica Neue,Arial,Noto Sans,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol,Noto Color Emoji,FontAwesome;
    color: #3b3b3b;
    padding: 0px 0px 20px 0px;
}
.dbdemo_box {
    width: 400px;
    padding: 10px;
    box-shadow: 0 .15rem 1.15rem 0 rgba(58,59,69,.15)!important;
    float: left;
    min-height: 170px;
    margin: 0px 20px 20px 20px;
}
.dbdemo_category {
    clear: both;
}
.category {
    margin-left: 20px;
    margin-bottom: 5px;
}
.dbdemo_logo {
    width: 100%;
    height: 225px;
}
.code {
    padding: 5px;
    border: 1px solid #e4e4e4;
    font-family: monospace;
    background-color: #f5f5f5;
    margin: 5px 0px 0px 0px;
}
.dbdemo_description {
    height: 100px;
}
.menu_button {
    font-size: 15px;
    cursor: pointer;
    border: 0px;
    padding: 10px 20px 10px 20px;
    margin-right: 10px;
    background-color: rgb(238, 237, 233);
    border-radius: 20px;
}
.menu_button:hover {
    background-color: rgb(245, 244, 242)
}
.menu_button.selected {
    background-color: rgb(158, 214, 196)
}
.new_tag {
    background-color: red;
    color: white;
    font-size: 13px;
    padding: 2px 7px;
    border-radius: 3px;
    margin-right: 5px;
}
</style>
"""

JS_LIST = """<script>
const buttons = document.querySelectorAll('.menu_button');
const sections = document.querySelectorAll('.dbdemo_category');
buttons.forEach(button => {
    button.addEventListener('click', () => {
        const selectedCategory = button.getAttribute('category');
        sections.forEach(section => {
            if (section.id === `category-${selectedCategory}`) {
                section.style.display = 'block';
            } else {
                section.style.display = 'none';
            }
        });
        buttons.forEach(btn => {
            if (btn === button) {
                btn.classList.add('selected');
            } else {
                btn.classList.remove('selected');
            }
        });
    });
});
</script>"""


def help():
    installer = Installer()
    if installer.report.displayHTML_available():
        from dbruntime.display import displayHTML
        displayHTML("""<style>
.dbdemos_install{
    font-family: -apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica Neue,Arial,Noto Sans,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol,Noto Color Emoji,FontAwesome;
    color: #3b3b3b;
    box-shadow: 0 .15rem 1.15rem 0 rgba(58,59,69,.15)!important;
    padding: 10px;
    margin: 10px;
}
.code {
    padding: 0px 5px;
    border: 1px solid #e4e4e4;
    font-family: monospace;
    background-color: #f5f5f5;
    margin: 5px 0px 0px 0px;
    display: inline;
}
</style>
<div class="dbdemos_install">
  <h1>DBDemos</h1>
  <i>Install databricks demos: notebooks, Delta Live Table Pipeline, DBSQL Dashboards, ML Models etc.</i>
  <ul>
    <li>
      <div class="code">dbdemos.help()</div>: display help.<br/><br/>
    </li>
    <li>
      <div class="code">dbdemos.list_demos(category: str = None)</div>: list all demos available, can filter per category (ex: 'governance').<br/><br/>
    </li>
    <li>
      <div class="code">dbdemos.install(demo_name: str, path: str = "./", overwrite: bool = False, use_current_cluster = False, username: str = None, pat_token: str = None, workspace_url: str = None, skip_dashboards: bool = False, cloud: str = "AWS", catalog: str = None, schema: str = None, serverless: bool = None, warehouse_name: str = None, skip_genie_rooms: bool = False, dlt_policy_id: str = None, dlt_compute_settings: dict = None)</div>: install the given demo to the given path.<br/><br/>
      <ul>
        <li>If overwrite is True, dbdemos will delete the given path folder and re-install the notebooks.</li>
        <li>use_current_cluster = True will not start a new cluster to init the demo but use the current cluster instead. <strong>Set it to True if you don't have cluster creation permission</strong>.</li>
        <li>skip_dashboards = True will not load the DBSQL dashboard if any (faster, use it if the dashboard generation creates some issue).</li>
        <li>If no authentication is provided, dbdemos will use the current user credentials & workspace + cloud to install the demo.</li>
        <li>catalog and schema options let you choose where to load the data and other assets.</li>
        <li>Dashboards require a warehouse, you can specify it with the warehouse_name='xx' option.</li>
        <li>Dbdemos will detect serverless compute and use the current cluster when you're running serverless. You can force it with the serverless=True option.</li>
        <li>Genie rooms are in beta. You can skip the genie room installation with skip_genie_rooms = True.</li>
        <li>dlt_policy_id will be used in the dlt (example: "0003963E5B551CE4"). Use it with dlt_compute_settings = {"autoscale": {"min_workers": 1, "max_workers": 5}} to respect the policy requirements.</li>
      </ul><br/>
    </li>
    <li>
      <div class="code">dbdemos.create_cluster(demo_name: str)</div>: install or update the interactive cluster for the demo (scoped to the user).<br/><br/>
    </li>
    <li>
      <div class="code">dbdemos.install_all(path: str = "./", overwrite: bool = False, username: str = None, pat_token: str = None, workspace_url: str = None, skip_dashboards: bool = False, cloud: str = "AWS")</div>: install all the demos to the given path.<br/><br/>
    </li>
  </ul>
</div>""")
    else:
        print("------------ DBDemos ------------------")
        print("""dbdemos.help(): display help.""")
        print("""dbdemos.list_demos(category: str = None): list all demos available, can filter per category (ex: 'governance').""")
        print("""dbdemos.install(demo_name: str, path: str = "./", overwrite: bool = False, username: str = None, pat_token: str = None, workspace_url: str = None, skip_dashboards: bool = False, cloud: str = "AWS"): install the given demo to the given path.""")
        print("""dbdemos.create_cluster(demo_name: str): install or update the interactive cluster for the demo (scoped to the user).""")
        print("""dbdemos.install_all(path: str = "./", overwrite: bool = False, username: str = None, pat_token: str = None, workspace_url: str = None, skip_dashboards: bool = False, cloud: str = "AWS"): install all the demos to the given path.""")


def list_demos(category=None, installer=None, pat_token=None):
    check_version()
    deprecated_demos = ["uc-04-audit-log", "llm-dolly-chatbot"]
    if installer is None:
        installer = Installer(pat_token=pat_token)
    installer.tracker.track_list()
    demos = defaultdict(lambda: [])
    # Define category order
    demos["lakehouse"] = []
    demos["data-engineering"] = []
    demos["governance"] = []
    demos["DBSQL"] = []
    demos["data-science"] = []
    demos["AI-BI"] = []
    for demo in installer.get_demos_available():
        conf = installer.get_demo_conf(demo)
        if (category is None or conf.category == category.lower()) and conf.name not in deprecated_demos:
            demos[conf.category].append(conf)
    if installer.report.displayHTML_available():
        content = get_html_list_demos(demos)
        from dbruntime.display import displayHTML
        displayHTML(content)
    else:
        list_console(demos)


def get_html_list_demos(demos):
    categories = list(demos.keys())
    content = f"""{CSS_LIST}<div class="dbdemo">
    <div style="padding: 10px 0px 20px 20px">"""
    for i, cat in enumerate(categories):
        content += f"""<button category="{cat}" class="menu_button {"selected" if i == 0 else ""}" type="button">{f'<span class="new_tag">NEW!</span>' if cat == 'AI-BI' else ''}<span>{cat.capitalize()}</span></button>"""
    content += """</div>"""
    for i, cat in enumerate(categories):
        content += f"""<div class="dbdemo_category" style="min-height: 200px; display: {"block" if i == 0 else "none"}" id="category-{cat}">"""
        ds = list(demos[cat])
        ds.sort(key=lambda d: d.name)
        for demo in ds:
            content += f"""
            <div class="dbdemo_box">
                <img class="dbdemo_logo" src="https://github.com/databricks-demos/dbdemos-resources/raw/main/icon/{demo.name}.jpg" />
                <div class="dbdemo_description">
                    <h2>{demo.title}</h2>
                    {demo.description}
                </div>
                <div class="code">
                    dbdemos.install('{demo.name}')
                </div>
            </div>"""
        content += """</div>"""
    content += f"""</div>{JS_LIST}"""
    return content


def list_console(demos):
    print("----------------------------------------------------")
    print("----------------- Demos Available ------------------")
    print("----------------------------------------------------")
    categories = list(demos.keys())
    for cat in categories:
        print(f"{cat.capitalize()}")
        ds = list(demos[cat])
        ds.sort(key=lambda d: d.name)
        for demo in ds:
            print(f"   - {demo.name}: {demo.title} ({demo.description}) => dbdemos.install('{demo.name}')")
        print("")
    print("----------------------------------------------------")


def list_delta_live_tables(category=None):
    pass


def list_dashboards(category=None):
    pass


def install(demo_name, path=None, overwrite=False, username=None, pat_token=None, workspace_url=None,
            skip_dashboards=False, cloud="AWS", start_cluster: bool = None, use_current_cluster: bool = False,
            current_cluster_id=None, warehouse_name=None, debug=False, catalog=None, schema=None,
            serverless=None, skip_genie_rooms=False, create_schema=True, dlt_policy_id=None, dlt_compute_settings=None):
    check_version()
    if demo_name == "llm-fine-tuning":
        print("ERROR: llm-fine-tuning is deprecated and has been removed. You can restore it from an older dbdemos version: %pip install dbdemos==0.6.28")
        return
    elif demo_name in ("chatbot-rag-llm", "llm-tools-functions", "llm-rag-chatbot"):
        print(f"ERROR: {demo_name} is deprecated and has been removed. You can restore it from an older dbdemos version: %pip install dbdemos==0.6.28")
        print("We will instead install the new ai-agent demo")
        demo_name = "ai-agent"
    elif demo_name == "dlt-loans" or demo_name == "dlt-loan":
        print("ERROR: dlt-loans is deprecated and has been removed. You can restore it from an older dbdemos version: %pip install dbdemos==0.6.28")
        print("We will instead install the new pipeline-bike demo")
        demo_name = "pipeline-bike"
    elif demo_name == "dlt-unit-test":
        print("WARN: dlt-unit-test has been renamed to declarative-pipeline-unit-test")
        demo_name = "declarative-pipeline-unit-test"
    elif demo_name == "dlt-cdc":
        print("WARN: dlt-cdc has been renamed to declarative-pipeline-cdc")
        demo_name = "declarative-pipeline-cdc"
    elif demo_name == "lakehouse-retail-churn":
        print("WARN: lakehouse-retail-churn has been renamed to lakehouse-retail-c360")
        demo_name = "lakehouse-retail-c360"
    elif demo_name == "identity-pk-fk":
        print("WARN: identity-pk-fk has been renamed to sql-warehouse")
        demo_name = "sql-warehouse"
    elif demo_name == "auto-loader":
        print("WARN: auto-loader has been renamed to data-ingestion")
        demo_name = "data-ingestion"
    try:
        installer = Installer(username, pat_token, workspace_url, cloud, current_cluster_id=current_cluster_id)
    except TokenException as e:
        report = InstallerReport(workspace_url)
        report.display_token_error(e, demo_name)
    if not installer.test_premium_pricing():
        # Force dashboard skip as dbsql isn't available, to avoid any error.
        skip_dashboards = True
    installer.install_demo(demo_name, path, overwrite, skip_dashboards=skip_dashboards, start_cluster=start_cluster,
                           use_current_cluster=use_current_cluster, debug=debug, catalog=catalog, schema=schema,
                           serverless=serverless, warehouse_name=warehouse_name, skip_genie_rooms=skip_genie_rooms,
                           create_schema=create_schema, dlt_policy_id=dlt_policy_id, dlt_compute_settings=dlt_compute_settings)


def install_all(path=None, overwrite=False, username=None, pat_token=None, workspace_url=None,
                skip_dashboards=False, cloud="AWS", start_cluster=None, use_current_cluster=False,
                catalog=None, schema=None, dlt_policy_id=None, dlt_compute_settings=None):
    """
    Install all the bundled demos.
    """
    installer = Installer(username, pat_token, workspace_url, cloud)
    for demo_name in installer.get_demos_available():
        installer.install_demo(demo_name, path, overwrite, skip_dashboards=skip_dashboards,
                               start_cluster=start_cluster, use_current_cluster=use_current_cluster,
                               catalog=catalog, schema=schema, dlt_policy_id=dlt_policy_id,
                               dlt_compute_settings=dlt_compute_settings)


def check_status_all(username=None, pat_token=None, workspace_url=None, cloud="AWS"):
    """
    Check the installation status of all dbdemos bundle demos (see #check_status).
    """
    installer = Installer(username, pat_token, workspace_url, cloud)
    for demo_name in installer.get_demos_available():
        check_status(demo_name, username, pat_token, workspace_url, cloud)


def check_status(demo_name: str, username=None, pat_token=None, workspace_url=None, cloud="AWS", catalog=None, schema=None):
    """
    Check the status of the given demo installation. Will poll the installation job if any and wait for its completion.
    Throws an error if the job wasn't successful.
    """
    installer = Installer(username, pat_token, workspace_url, cloud)
    demo_conf = installer.get_demo_conf(demo_name, catalog, schema)
    if schema is None:
        schema = demo_conf.default_schema
    if catalog is None:
        catalog = demo_conf.default_catalog
    if "settings" in demo_conf.init_job:
        job_name = demo_conf.init_job["settings"]["name"]
        existing_job = installer.db.find_job(job_name)
        if existing_job is None:
            raise Exception(f"Couldn't find job for demo {demo_name}. Did you install it first?")
        installer.installer_workflow.wait_for_run_completion(existing_job['job_id'], debug=True)
        runs = installer.db.get("2.1/jobs/runs/list", {"job_id": existing_job['job_id'], "limit": 1})
        if runs['runs'][0]['state']['result_state'] != "SUCCESS":
            raise Exception(f"Job {existing_job['job_id']} for demo {demo_name} failed: {installer.db.conf.workspace_url}/#job/{existing_job['job_id']}/run/{runs['runs'][0]['run_id']} - {runs}")


def create_cluster(demo_name, username=None, pat_token=None, workspace_url=None, cloud="AWS"):
    installer = Installer(username, pat_token, workspace_url, cloud=cloud)
    installer.check_demo_name(demo_name)
    print(f"Updating cluster for demo {demo_name}...")
    demo_conf = installer.get_demo_conf(demo_name)
    installer.tracker.track_create_cluster(demo_conf.category, demo_name)
    cluster_id, cluster_name = installer.load_demo_cluster(demo_name, demo_conf, True)
    installer.report.display_install_result(demo_name, demo_conf.description, demo_conf.title,
                                            cluster_id=cluster_id, cluster_name=cluster_name)


def check_version():
    """
    Check if a newer version of dbdemos is available on PyPI.
    Prints a warning if the installed version is outdated.
    """
    try:
        import pkg_resources
        import requests
        import json
        # Get installed version
        installed_version = pkg_resources.get_distribution('dbdemos').version
        # Get latest version from PyPI
        pypi_response = requests.get("https://pypi.org/pypi/dbdemos/json")
        latest_version = json.loads(pypi_response.text)['info']['version']
        # Compare versions
        if pkg_resources.parse_version(latest_version) > pkg_resources.parse_version(installed_version):
            print(f"\nWARNING: You are using dbdemos version {installed_version}, however version {latest_version} is available. You should consider upgrading:")
            print("%pip install --upgrade dbdemos")
            print("dbutils.library.restartPython()")
    except Exception:
        # Silently ignore any errors during the version check
        pass


================================================
FILE: dbdemos/exceptions/__init__.py
================================================


================================================
FILE: dbdemos/exceptions/dbdemos_exception.py
================================================
class TokenException(Exception):
    def __init__(self, message):
        super().__init__(message)
        self.message = message

class ClusterException(Exception):
    def __init__(self, message, cluster_conf, response):
        super().__init__(message)
        self.response = response
        self.cluster_conf = cluster_conf

class ClusterPermissionException(ClusterException):
    def __init__(self, message, cluster_conf, response):
        super().__init__(message, cluster_conf, response)

class ClusterCreationException(ClusterException):
    def __init__(self, message, cluster_conf, response):
        super().__init__(message, cluster_conf, response)

class GenieCreationException(Exception):
    def __init__(self, message, genie_conf, response):
        super().__init__(message)
        self.response = response
        self.genie_conf = genie_conf

class ExistingResourceException(Exception):
    def __init__(self, install_path, response):
        super().__init__(f"Folder {install_path} isn't empty.")
        self.install_path = install_path
        self.response = response

class SQLQueryException(Exception):
    def __init__(self, message):
        super().__init__(message)

class DataLoaderException(Exception):
    def __init__(self, message):
        super().__init__(message)

class FolderDeletionException(Exception):
    def __init__(self, install_path, response):
        super().__init__(f"Can't delete folder {install_path}.")
        self.install_path = install_path
        self.response = response

class FolderCreationException(Exception):
    def __init__(self, install_path, response):
        super().__init__(f"Can't load notebook {install_path}.")
        self.install_path = install_path
self.response = response class SDPException(Exception): def __init__(self, message, description, pipeline_conf, response): super().__init__(message) self.description = description self.pipeline_conf = pipeline_conf self.response = response class SDPNotAvailableException(SDPException): def __init__(self, message, pipeline_conf, response): super().__init__("SDP not available", message, pipeline_conf, response) class SDPCreationException(SDPException): def __init__(self, message, pipeline_conf, response): super().__init__("SDP creation failure", message, pipeline_conf, response) class WorkflowException(Exception): def __init__(self, message, details, job_config, response): super().__init__(message) self.details = details self.job_config = job_config self.response = response ================================================ FILE: dbdemos/installer.py ================================================ import collections import pkg_resources from .conf import DBClient, DemoConf, Conf, ConfTemplate, merge_dict, DemoNotebook from .exceptions.dbdemos_exception import ClusterPermissionException, ClusterCreationException, ClusterException, \ ExistingResourceException, FolderDeletionException, SDPNotAvailableException, SDPCreationException, SDPException, \ FolderCreationException, TokenException from .installer_report import InstallerReport from .installer_genie import InstallerGenie from .installer_dashboard import InstallerDashboard from .tracker import Tracker from .notebook_parser import NotebookParser from .installer_workflows import InstallerWorkflow from .installer_repos import InstallerRepo from pathlib import Path import time import json import re import base64 from concurrent.futures import ThreadPoolExecutor from datetime import date import urllib import threading from dbdemos.sql_query import SQLQueryExecutor from databricks.sdk import WorkspaceClient class Installer: def __init__(self, username = None, pat_token = None, workspace_url = None, cloud = "AWS", org_id: 
str = None, current_cluster_id: str = None): self.cloud = cloud self.dbutils = None if username is None: username = self.get_current_username() if workspace_url is None: workspace_url = self.get_current_url() if pat_token is None: pat_token = self.get_current_pat_token() if org_id is None: org_id = self.get_org_id() self.current_cluster_id = current_cluster_id if self.current_cluster_id is None: self.current_cluster_id = self.get_current_cluster_id() conf = Conf(username, workspace_url, org_id, pat_token) self.tracker = Tracker(org_id, self.get_uid(), username) self.db = DBClient(conf) self.report = InstallerReport(self.db.conf.workspace_url) self.installer_workflow = InstallerWorkflow(self) self.installer_repo = InstallerRepo(self) self.installer_dashboard = InstallerDashboard(self) self.installer_genie = InstallerGenie(self) self.sql_query_executor = SQLQueryExecutor() #Slow down on GCP as the dashboard API is very sensitive to back-pressure # 1 dashboard at a time to reduce import pressure, as parallelism seems to create new errors.
self.max_workers = 1 if self.get_current_cloud() == "GCP" else 1 def get_dbutils(self): if self.dbutils is None: try: from pyspark.sql import SparkSession spark = SparkSession.getActiveSession() from pyspark.dbutils import DBUtils self.dbutils = DBUtils(spark) except: try: import IPython self.dbutils = IPython.get_ipython().user_ns["dbutils"] except: #Can't get dbutils (local run) return None return self.dbutils def get_current_url(self): try: return "https://"+self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().browserHostName().get() except: try: return "https://"+self.get_dbutils_tags_safe()['browserHostName'] except: return "local" def get_dbutils_tags_safe(self): import json return json.loads(self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().safeToJson())['attributes'] def get_current_cluster_id(self): try: return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().tags().apply('clusterId') except: try: return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().clusterId().get() except: try: return self.get_dbutils_tags_safe()['clusterId'] except: return "local" def get_org_id(self): try: return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().tags().apply('orgId') except: try: return self.get_dbutils_tags_safe()['orgId'] except: return "local" def get_uid(self): try: return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().tags().apply('userId') except: return "local" def get_current_folder(self): try: current_notebook = self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get() return current_notebook[:current_notebook.rfind("/")] except: try: current_notebook = self.get_dbutils_tags_safe()['notebook_path'] return current_notebook[:current_notebook.rfind("/")] except: return "local" def get_workspace_id(self): try: return 
self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().workspaceId().get() except: try: return self.get_dbutils_tags_safe()['orgId'] except: return "local" def get_current_pat_token(self): try: token = self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().apiToken().get() except Exception as e: raise TokenException("Couldn't get a PAT Token: "+str(e)+". If you're installing it locally or from a batch, please use the pat_token='xxx' parameter instead, ideally stored in a secret.") if len(token) == 0: raise TokenException("Empty PAT Token.") return token def get_current_username(self): try: return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().tags().apply('user') except Exception as e2: try: return self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().userName().get() except Exception as e: try: return self.get_dbutils_tags_safe()['user'] except: print(f"WARN: couldn't get current username. This shouldn't happen - unpredictable behavior - 2 errors: {e2} - {e} - will return 'unknown'") return "unknown" def get_current_cloud(self): try: hostname = self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().browserHostName().get() except: print(f"WARNING: Can't get cloud from dbutils.
Fallback to default local cloud {self.cloud}") return self.cloud if "gcp" in hostname: return "GCP" elif "azure" in hostname: return "AZURE" else: return "AWS" def get_workspace_url(self): try: workspace_url = "https://"+self.get_dbutils().notebook.entry_point.getDbutils().notebook().getContext().browserHostName().get() except Exception as e: raise Exception("Couldn't get workspace URL: "+str(e)) return workspace_url def check_demo_name(self, demo_name): demos = collections.defaultdict(lambda: []) #Define category order demos["lakehouse"] = [] demo_availables = self.get_demos_available() if demo_name not in demo_availables: for demo in demo_availables: conf = self.get_demo_conf(demo) demos[conf.category].append(conf) self.report.display_demo_name_error(demo_name, demos) def get_demos_available(self): return set(pkg_resources.resource_listdir("dbdemos", "bundles")) def get_demo_conf(self, demo_name:str, catalog:str = None, schema:str = None, demo_folder: str = ""): demo = self.get_resource(f"bundles/{demo_name}/conf.json") raw_demo = json.loads(demo) catalog = catalog if catalog is not None else raw_demo.get('default_catalog', None) schema = schema if schema is not None else raw_demo.get('default_schema', None) conf_template = ConfTemplate(self.db.conf.username, demo_name, catalog, schema, demo_folder) return DemoConf(demo_name, json.loads(conf_template.replace_template_key(demo)), catalog, schema) def get_resource(self, path, decode=True): resource = pkg_resources.resource_string("dbdemos", path) return resource.decode('UTF-8') if decode else resource def resource_isdir(self, path): return
pkg_resources.resource_isdir("dbdemos", path) def test_premium_pricing(self): try: w = self.db.get("2.0/sql/config/warehouses", {"limit": 1}, print_auth_error = False) if "error_code" in w and (w["error_code"] == "FEATURE_DISABLED" or w["error_code"] == "ENDPOINT_NOT_FOUND"): self.report.display_non_premium_warn(Exception(f"DBSQL not available, either at workspace level or user entitlement."), w) return False return True except Exception as e: print(e) self.report.display_non_premium_warn(Exception(f"DBSQL not available"), str(e)) return False def cluster_is_serverless(self): try: cluster_details = self.db.get("2.0/clusters/get", {"cluster_id": self.get_current_cluster_id()}) return cluster_details.get("enable_serverless_compute", False) except Exception as e: print(f"Couldn't get cluster serverless status. Will consider it False. {e}") return False def create_or_check_schema(self, demo_conf: DemoConf, create_schema: bool, debug=True): """Create or verify schema exists based on create_schema parameter""" ws = WorkspaceClient(token=self.db.conf.pat_token, host=self.db.conf.workspace_url) try: catalog = ws.catalogs.get(demo_conf.catalog) except Exception as e: if create_schema: if debug: print(f"Can't describe catalog {demo_conf.catalog}. Will now try to create it. Error: {e}") try: print(f"Catalog {demo_conf.catalog} doesn't exist. Creating it. 
You can set create_schema=False to avoid catalog and schema creation, or install in another catalog with catalog=<catalog_name>.") self.sql_query_executor.execute_query(ws, f"CREATE CATALOG IF NOT EXISTS {demo_conf.catalog}") #note: ws.catalogs.create(demo_conf.catalog) this doesn't work properly in serverless workspaces with default storage for now (Metastore storage root URL does not exist error) except Exception as e: self.report.display_schema_creation_error(e, demo_conf) else: self.report.display_schema_not_found_error(e, demo_conf) schema_full_name = f"{demo_conf.catalog}.{demo_conf.schema}" try: schema = ws.schemas.get(schema_full_name) except Exception as e: if create_schema: if debug: print(f"Can't describe schema {schema_full_name}. Will now try to create it. Error: {e}") try: schema = ws.schemas.create(demo_conf.schema, catalog_name=demo_conf.catalog) except Exception as e: self.report.display_schema_creation_error(e, demo_conf) else: self.report.display_schema_not_found_error(e, demo_conf) def install_demo(self, demo_name, install_path, overwrite=False, update_cluster_if_exists = True, skip_dashboards = False, start_cluster = None, use_current_cluster = False, debug = False, catalog = None, schema = None, serverless=False, warehouse_name = None, skip_genie_rooms=False, create_schema=True, dlt_policy_id = None, dlt_compute_settings = None): # first get the demo conf. 
if install_path is None: install_path = self.get_current_folder() elif install_path.startswith("./"): install_path = self.get_current_folder()+"/"+install_path[2:] elif not install_path.startswith("/"): install_path = self.get_current_folder()+"/"+install_path if install_path.endswith("/"): install_path = install_path[:-1] if serverless is None: serverless = self.cluster_is_serverless() self.check_demo_name(demo_name) demo_conf = self.get_demo_conf(demo_name, catalog, schema, install_path+"/"+demo_name) if (schema is not None or catalog is not None) and not demo_conf.custom_schema_supported: self.report.display_custom_schema_not_supported_error(Exception(f'Custom schema not supported for {demo_conf.name}'), demo_conf) if schema is None: schema = demo_conf.default_schema if catalog is None: catalog = demo_conf.default_catalog if "-" in schema or "-" in catalog: self.report.display_incorrect_schema_error(Exception('Please use a valid schema/catalog name.'), demo_conf) # Add schema validation/creation after demo_conf initialization if demo_conf.custom_schema_supported: self.create_or_check_schema(demo_conf, create_schema, debug) if demo_name.startswith("aibi"): use_current_cluster = True if serverless: use_current_cluster = True if not demo_conf.serverless_supported: self.report.display_serverless_warn(Exception('This DBDemo content is not yet updated to Serverless/Express!'), demo_conf) self.report.display_install_info(demo_conf, install_path, catalog, schema) self.tracker.track_install(demo_conf.category, demo_name) use_cluster_id = self.current_cluster_id if use_current_cluster else None try: cluster_id, cluster_name = self.load_demo_cluster(demo_name, demo_conf, update_cluster_if_exists, start_cluster, use_cluster_id) except ClusterException as e: #Fallback to current cluster if we can't create a cluster. 
cluster_id = self.current_cluster_id self.report.display_cluster_creation_warn(e, demo_conf) cluster_name = "Current Cluster" self.check_if_install_folder_exists(demo_name, install_path, demo_conf, overwrite, debug) pipeline_ids = self.load_demo_pipelines(demo_name, demo_conf, debug, serverless, dlt_policy_id, dlt_compute_settings) # Create Genie rooms before dashboards so we can optionally inject their uid into dashboards genie_rooms = self.installer_genie.install_genies(demo_conf, install_path, warehouse_name, skip_genie_rooms, debug) dashboards = [] if skip_dashboards else self.installer_dashboard.install_dashboards(demo_conf, install_path, warehouse_name, debug, genie_rooms) repos = self.installer_repo.install_repos(demo_conf, debug) workflows = self.installer_workflow.install_workflows(demo_conf, use_cluster_id, warehouse_name, serverless, debug) init_job = self.installer_workflow.create_demo_init_job(demo_conf, use_cluster_id, warehouse_name, serverless, debug) all_workflows = workflows if init_job["id"] is None else workflows + [init_job] notebooks = self.install_notebooks(demo_name, install_path, demo_conf, cluster_name, cluster_id, pipeline_ids, dashboards, all_workflows, repos, overwrite, use_current_cluster, genie_rooms, debug) self.installer_workflow.start_demo_init_job(demo_conf, init_job, debug) for pipeline in pipeline_ids: if "run_after_creation" in pipeline and pipeline["run_after_creation"]: self.db.post(f"2.0/pipelines/{pipeline['uid']}/updates", { "full_refresh": True }) self.report.display_install_result(demo_name, demo_conf.description, demo_conf.title, install_path, notebooks, init_job['uid'], init_job['run_id'], serverless, cluster_id, cluster_name, pipeline_ids, dashboards, workflows, genie_rooms) def get_demo_datasource(self, warehouse_name = None): data_sources = self.db.get("2.0/preview/sql/data_sources") if warehouse_name is not None: for source in data_sources: if source['name'] == warehouse_name: return source raise 
Exception(f"""Error creating the dashboard: cannot find warehouse with warehouse_name='{warehouse_name}' to load your dashboards. Use a different name and make sure the endpoint exists.""") for source in data_sources: if source['name'] == "dbdemos-shared-endpoint": return source #Try to fallback to an existing shared endpoint. for source in data_sources: #Default serverless warehouse in express workspaces if "serverless starter warehouse" in source['name'].lower(): return source for source in data_sources: if "shared-sql-endpoint" in source['name'].lower(): return source for source in data_sources: if "shared" in source['name'].lower(): return source return None def get_or_create_endpoint(self, username: str, demo_conf: DemoConf, default_endpoint_name: str ="dbdemos-shared-endpoint", warehouse_name: str = None, throw_error: bool = False): try: ds = self.get_demo_datasource(warehouse_name) except Exception as e: self.report.display_unknow_warehouse_error(e, demo_conf, warehouse_name) if ds is not None: return ds def get_definition(serverless, name): return { "name": name, "cluster_size": "Small", "min_num_clusters": 1, "max_num_clusters": 1, "tags": { "project": "dbdemos" }, "spot_instance_policy": "COST_OPTIMIZED", "warehouse_type": "PRO", "enable_photon": "true", "enable_serverless_compute": serverless, "channel": { "name": "CHANNEL_NAME_CURRENT" } } def try_create_endpoint(serverless): w = self.db.post("2.0/sql/warehouses", json=get_definition(serverless, default_endpoint_name)) if "message" in w and "already exists" in w['message']: w = self.db.post("2.0/sql/warehouses", json=get_definition(serverless, default_endpoint_name + "-" + username)) if "id" in w: return w if serverless: print(f"WARN: Couldn't create serverless warehouse ({default_endpoint_name}). Will fallback to standard SQL warehouse. Creation response: {w}") else: print(f"WARN: Couldn't create warehouse: {default_endpoint_name} and {default_endpoint_name}-{username}. Creation response: {w}. 
Use another warehouse to view your dashboard.") return None if try_create_endpoint(True) is None: #Try to fall back to a classic endpoint try_create_endpoint(False) ds = self.get_demo_datasource() if ds is not None: return ds print(f"ERROR: Couldn't create endpoint. Use the option warehouse_name={warehouse_name} to specify a different warehouse during the installation.") if throw_error: self.report.display_warehouse_creation_error(Exception("Couldn't create endpoint - see WARNINGS for more details."), demo_conf) return None #Check if the folder already exists, and delete it if needed. def check_if_install_folder_exists(self, demo_name: str, install_path: str, demo_conf: DemoConf, overwrite=False, debug=False): install_path = install_path+"/"+demo_name s = self.db.get("2.0/workspace/get-status", {"path": install_path}) if 'object_type' in s: if not overwrite: self.report.display_folder_already_existing(ExistingResourceException(install_path, s), demo_conf) if debug: print(f" Folder {install_path} already exists. Deleting the existing content...") assert install_path.lower() not in ['/users', '/repos', '/shared', '/workspace', '/workspace/shared', '/workspace/users'],\ "Demo name is missing, this shouldn't happen. Failing to prevent root folder deletion." d = self.db.post("2.0/workspace/delete", {"path": install_path, 'recursive': True}) if 'error_code' in d: self.report.display_folder_permission(FolderDeletionException(install_path, d), demo_conf) def install_notebooks(self, demo_name: str, install_path: str, demo_conf: DemoConf, cluster_name: str, cluster_id: str, pipeline_ids, dashboards, workflows, repos, overwrite=False, use_current_cluster=False, genie_rooms = [], debug=False): assert len(demo_name) > 4, "Wrong demo name; failing to prevent potential delete errors." if debug: print(f' Installing notebooks') install_path = install_path+"/"+demo_name folders_created = set() #Avoid multiple mkdirs in parallel as it creates errors.
folders_created_lock = threading.Lock() def load_notebook(notebook): return load_notebook_path(notebook, "bundles/"+demo_name+"/install_package/"+notebook.get_clean_path()) def load_notebook_path(notebook: DemoNotebook, template_path): parent = str(Path(install_path+"/"+notebook.get_clean_path()).parent) with folders_created_lock: if parent not in folders_created: r = self.db.post("2.0/workspace/mkdirs", {"path": parent}) folders_created.add(parent) if 'error_code' in r: if r['error_code'] == "RESOURCE_ALREADY_EXISTS": self.report.display_folder_creation_error(FolderCreationException(install_path, r), demo_conf) if notebook.object_type == "FILE": file = self.get_resource(template_path, decode=False) # Decode file content, replace schema, then re-encode file_content = file.decode('utf-8') file_content = NotebookParser.replace_schema_in_content(file_content, demo_conf) file_encoded = base64.b64encode(file_content.encode('utf-8')).decode("utf-8") r = self.db.post(f"2.0/workspace/import", {"path": install_path+"/"+notebook.get_clean_path(), "content": file_encoded, "format": "AUTO", "overwrite": False}) if 'error_code' in r: self.report.display_folder_creation_error(FolderCreationException(f"{install_path}/{notebook.get_clean_path()}", r), demo_conf) elif notebook.object_type == "DIRECTORY": zip_folder = self.get_resource(template_path+".zip", decode=False) zip_folder_encoded = base64.b64encode(zip_folder).decode("utf-8") r = self.db.post(f"2.0/workspace/import", {"path": install_path+"/"+notebook.get_clean_path()+".zip", "content": zip_folder_encoded, "format": "AUTO", "overwrite": False}) if 'error_code' in r: self.report.display_folder_creation_error(FolderCreationException(f"{install_path}/{notebook.get_clean_path()}", r), demo_conf) else: html = self.get_resource(template_path+".html") parser = NotebookParser(html) if notebook.add_cluster_setup_cell and not use_current_cluster: self.add_cluster_setup_cell(parser, demo_name, cluster_name, cluster_id, 
self.db.conf.workspace_url) parser.replace_dynamic_links_lakeview_dashboards(dashboards) parser.replace_dynamic_links_genie(genie_rooms) parser.remove_automl_result_links() parser.replace_schema(demo_conf) parser.replace_dynamic_links_pipeline(pipeline_ids) parser.replace_dynamic_links_repo(repos) parser.remove_delete_cell() parser.replace_dynamic_links_workflow(workflows) parser.set_tracker_tag(self.get_org_id(), self.get_uid(), demo_conf.category, demo_name, notebook.get_clean_path(), self.db.conf.username) content = parser.get_html() content = base64.b64encode(content.encode("utf-8")).decode("utf-8") r = self.db.post("2.0/workspace/import", {"path": install_path+"/"+notebook.get_clean_path(), "content": content, "format": "HTML"}) if 'error_code' in r: self.report.display_folder_creation_error(FolderCreationException(f"{install_path}/{notebook.get_clean_path()}", r), demo_conf) return notebook #Always adds the licence notebooks with ThreadPoolExecutor(max_workers=self.max_workers) as executor: notebooks = [ DemoNotebook("_resources/LICENSE", "LICENSE", "Demo License"), DemoNotebook("_resources/NOTICE", "NOTICE", "Demo Notice"), DemoNotebook("_resources/README", "README", "Readme") ] def load_notebook_template(notebook): load_notebook_path(notebook, f"template/{notebook.title}") collections.deque(executor.map(load_notebook_template, notebooks)) with ThreadPoolExecutor(max_workers=self.max_workers) as executor: return [n for n in executor.map(load_notebook, demo_conf.notebooks)] def load_demo_pipelines(self, demo_name, demo_conf: DemoConf, debug=False, serverless=False, dlt_policy_id = None, dlt_compute_settings = None): #default cluster conf pipeline_ids = [] for pipeline in demo_conf.pipelines: definition = pipeline["definition"] if "event_log" not in definition: definition["event_log"] = {"catalog": demo_conf.catalog, "schema": demo_conf.schema, "name": "dlt_event_log_"} if "target" in definition: definition["schema"] = definition["target"] del 
definition["target"] #target is deprecated now (https://docs.databricks.com/api/workspace/pipelines/create#schema) #Force channel to CURRENT due to an issue with PREVIEW on serverless with the Python version definition["channel"] = "CURRENT" today = date.today().strftime("%Y-%m-%d") #modify cluster definitions if serverless if serverless: if "clusters" in definition: del definition['clusters'] definition['photon'] = True definition['serverless'] = True if dlt_policy_id is not None: self.report.display_pipeline_error(SDPCreationException(f"Policy ID is not supported for serverless pipelines, {dlt_policy_id}", definition, None)) else: #enforce demo tagging in the cluster for cluster in definition["clusters"]: merge_dict(cluster, {"custom_tags": {"project": "dbdemos", "demo": demo_name, "demo_install_date": today}}) if dlt_policy_id is not None: cluster["dlt_policy_id"] = dlt_policy_id if self.db.conf.get_demo_pool() is not None: cluster["instance_pool_id"] = self.db.conf.get_demo_pool() if "node_type_id" in cluster: del cluster["node_type_id"] if "enable_elastic_disk" in cluster: del cluster["enable_elastic_disk"] if "aws_attributes" in cluster: del cluster["aws_attributes"] if dlt_compute_settings is not None: merge_dict(cluster, dlt_compute_settings) existing_pipeline = self.get_pipeline(definition["name"]) if debug: print(f' Installing pipeline {definition["name"]}') if existing_pipeline is None: p = self.db.post("2.0/pipelines", definition) if 'error_code' in p and p['error_code'] == 'FEATURE_DISABLED': message = 'SDP pipelines are not available in this workspace. Only Premium workspaces are supported on Azure.'
pipeline_ids.append({"name": pipeline["definition"]["name"], "uid": "INSTALLATION_ERROR", "id": pipeline["id"], "error": True}) self.report.display_pipeline_error(SDPNotAvailableException(message, definition, p)) continue if 'error_code' in p: pipeline_ids.append({"name": pipeline["definition"]["name"], "uid": "INSTALLATION_ERROR", "id": pipeline["id"], "error": True}) self.report.display_pipeline_error(SDPCreationException(f"Error creating the SDP pipeline: {p['error_code']}", definition, p)) continue id = p['pipeline_id'] else: if debug: print(" Updating existing pipeline with last configuration") id = existing_pipeline['pipeline_id'] p = self.db.put("2.0/pipelines/"+id, definition) if 'error_code' in p: pipeline_ids.append({"name": pipeline["definition"]["name"], "uid": "INSTALLATION_ERROR", "id": pipeline["id"], "error": True}) if 'complete the migration' in str(p).lower() or 'CANNOT_SET_SCHEMA_FOR_EXISTING_PIPELINE' in str(p): self.report.display_pipeline_error_migration(SDPCreationException(f"Please delete the existing SDP pipeline id {id} before re-installing this demo.", definition, p)) else: self.report.display_pipeline_error(SDPCreationException(f"Error updating the SDP pipeline {id}: {p['error_code']}", definition, p)) continue permissions = self.db.patch(f"2.0/preview/permissions/pipelines/{id}", { "access_control_list": [{"group_name": "users", "permission_level": "CAN_MANAGE"}] }) if 'error_code' in permissions: print(f"WARN: Couldn't update the pipeline permission for all users to access: {permissions}. 
Try deleting the pipeline first?") pipeline_ids.append({"name": definition['name'], "uid": id, "id": pipeline["id"], "run_after_creation": pipeline["run_after_creation"]}) #Update the demo conf tags {{}} with the actual id (to be loaded as a job for example) demo_conf.set_pipeline_id(pipeline["id"], id) return pipeline_ids def load_demo_cluster(self, demo_name, demo_conf: DemoConf, update_cluster_if_exists, start_cluster = None, use_cluster_id = None): if use_cluster_id is not None: return (use_cluster_id, "Interactive cluster you used for installation - make sure the cluster configuration matches.") if demo_conf.create_cluster == False: return (None, "This demo doesn't require cluster") #Do not start clusters by default in Databricks FE clusters to avoid costs as we have shared clusters for demos if start_cluster is None: start_cluster = not (self.db.conf.is_dev_env() or self.db.conf.is_fe_env()) #default cluster conf conf_template = ConfTemplate(self.db.conf.username, demo_name) cluster_conf = self.get_resource("resources/default_cluster_config.json") cluster_conf = json.loads(conf_template.replace_template_key(cluster_conf)) #add cloud specific setup cloud = self.get_current_cloud() cluster_conf_cloud = self.get_resource(f"resources/default_cluster_config-{cloud}.json") cluster_conf_cloud = json.loads(conf_template.replace_template_key(cluster_conf_cloud)) merge_dict(cluster_conf, cluster_conf_cloud) merge_dict(cluster_conf, demo_conf.cluster) if "driver_node_type_id" in cluster_conf: if cloud not in cluster_conf["driver_node_type_id"] or cloud not in cluster_conf["node_type_id"]: raise Exception(f"""ERROR CREATING CLUSTER FOR DEMO {demo_name}. 
You need to specify the cloud type for all clouds: "node_type_id": {"AWS": "g5.4xlarge", "AZURE": "Standard_NC8as_T4_v3", "GCP": "a2-highgpu-1g"} and "driver_node_type_id" """) cluster_conf["node_type_id"] = cluster_conf["node_type_id"][cloud] cluster_conf["driver_node_type_id"] = cluster_conf["driver_node_type_id"][cloud] if "spark.databricks.cluster.profile" in cluster_conf["spark_conf"] and cluster_conf["spark_conf"]["spark.databricks.cluster.profile"] == "singleNode": del cluster_conf["autoscale"] cluster_conf["num_workers"] = 0 existing_cluster = self.find_cluster(cluster_conf["cluster_name"]) if existing_cluster is None: cluster = self.db.post("2.0/clusters/create", json = cluster_conf) if "error_code" in cluster and cluster["error_code"] == "PERMISSION_DENIED": raise ClusterPermissionException(f"Can't create cluster for demo {demo_name}", cluster_conf, cluster) if "cluster_id" not in cluster or "error_code" in cluster: print(f" WARN: couldn't create the cluster for the demo: {cluster}") raise ClusterCreationException(f"Can't create cluster for demo {demo_name}", cluster_conf, cluster) else: cluster_conf["cluster_id"] = cluster["cluster_id"] else: cluster_conf["cluster_id"] = existing_cluster["cluster_id"] cluster = self.db.get("2.0/clusters/get", params = {"cluster_id": cluster_conf["cluster_id"]}) self.wait_for_cluster_to_stop(cluster_conf, cluster) if update_cluster_if_exists: cluster = self.db.post("2.0/clusters/edit", json = cluster_conf) if "error_code" in cluster and cluster["error_code"] != "INVALID_STATE": raise ClusterCreationException(f"couldn't edit the cluster conf for {demo_name}", cluster_conf, cluster) self.wait_for_cluster_to_stop(cluster_conf, cluster) if len(demo_conf.cluster_libraries) > 0: install = self.db.post("2.0/libraries/install", json = {"cluster_id": cluster_conf["cluster_id"], "libraries": demo_conf.cluster_libraries}) if "error_code" in install: print(f"WARN: Couldn't install the libs: {cluster_conf},
libraries={demo_conf.cluster_libraries}") # Only start if the cluster already exists (it's starting by default for new cluster) if existing_cluster is not None and start_cluster: start = self.db.post("2.0/clusters/start", json = {"cluster_id": cluster_conf["cluster_id"]}) if "error_code" in start: if start["error_code"] == "INVALID_STATE" and \ ("unexpected state Pending" in start["message"] or "unexpected state Restarting" in start["message"]): print(f"INFO: looks like the cluster is already starting... full answer: {start}") else: raise ClusterCreationException(f"Couldn't start the cluster for {demo_name}: {start['error_code']} - {start['message']}", cluster_conf, start) return cluster_conf['cluster_id'], cluster_conf['cluster_name'] def wait_for_cluster_to_stop(self, cluster_conf, cluster): if "error_code" in cluster and cluster["error_code"] == "INVALID_STATE": print(f" Demo cluster {cluster_conf['cluster_name']} in invalid state. Stopping it...") cluster = self.db.post("2.0/clusters/delete", json = {"cluster_id": cluster_conf["cluster_id"]}) i = 0 while i < 30: i += 1 cluster = self.db.get("2.0/clusters/get", params = {"cluster_id": cluster_conf["cluster_id"]}) if cluster["state"] == "TERMINATED": print(" Cluster properly stopped.") break time.sleep(2) if cluster["state"] != "TERMINATED": print(f" WARNING: Couldn't stop the demo cluster properly. Unknown state. 
Please stop your cluster {cluster_conf['cluster_name']} before.") #return the cluster with the given name or none def find_cluster(self, cluster_name): clusters = self.db.get("2.0/clusters/list") if "clusters" in clusters: for c in clusters["clusters"]: if c["cluster_name"] == cluster_name: return c return None def get_pipeline(self, name): def get_pipelines(token = None): r = self.db.get("2.0/pipelines", {"max_results": 100, "page_token": token}) if "statuses" in r: for p in r["statuses"]: if p["name"] == name: return p if "next_page_token" in r: return get_pipelines(r["next_page_token"]) return None return get_pipelines() def add_cluster_setup_cell(self, parser: NotebookParser, demo_name, cluster_name, cluster_id, env_url): content = """%md \n### A cluster has been created for this demo\nTo run this demo, just select the cluster `{{CLUSTER_NAME}}` from the dropdown menu ([open cluster configuration]({{ENV_URL}}/#setting/clusters/{{CLUSTER_ID}}/configuration)). <br />\n*Note: If the cluster was deleted after 30 days, you can re-create it with `dbdemos.create_cluster('{{DEMO_NAME}}')` or re-install the demo: `dbdemos.install('{{DEMO_NAME}}')`*""" content = content.replace("{{DEMO_NAME}}", demo_name) \ .replace("{{ENV_URL}}", env_url) \ .replace("{{CLUSTER_NAME}}", cluster_name) \ .replace("{{CLUSTER_ID}}", cluster_id) parser.add_extra_cell(content) def add_extra_cell(self, html, cell_content, position = 0): command = { "version": "CommandV1", "subtype": "command", "commandType": "auto", "position": 1, "command": cell_content } raw_content, content = self.get_notebook_content(html) content = json.loads(urllib.parse.unquote(content)) content["commands"].insert(position, command) content = urllib.parse.quote(json.dumps(content), safe="()*''") return html.replace(raw_content, base64.b64encode(content.encode('utf-8')).decode('utf-8')) def get_notebook_content(self, html): match = re.search(r'__DATABRICKS_NOTEBOOK_MODEL = \'(.*?)\'', html) raw_content = match.group(1) 
return raw_content, base64.b64decode(raw_content).decode('utf-8') ================================================ FILE: dbdemos/installer_dashboard.py ================================================ from .conf import DemoConf import pkg_resources from typing import TYPE_CHECKING if TYPE_CHECKING: from .installer import Installer class InstallerDashboard: def __init__(self, installer: 'Installer'): self.installer = installer self.db = installer.db def install_dashboards(self, demo_conf: DemoConf, install_path, warehouse_name = None, debug = True, genie_rooms = None): if len(demo_conf.dashboards) > 0: try: if debug: print(f'installing {len(demo_conf.dashboards)} dashboards...') installed_dash = [self.load_lakeview_dashboard(demo_conf, install_path, d, warehouse_name, genie_rooms) for d in demo_conf.dashboards] if debug: print(f'dashboards installed: {installed_dash}') return installed_dash except Exception as e: self.installer.report.display_dashboard_error(e, demo_conf) elif "dashboards" in pkg_resources.resource_listdir("dbdemos", "bundles/"+demo_conf.name): raise Exception("Old dashboards are not supported anymore. This shouldn't happen - please file a bug") return [] def replace_dashboard_schema(self, demo_conf: DemoConf, definition: str): import re #main__build is used during the build process to avoid collision with the default main catalog. #main_build is used because agents don't support __ in their catalog name. definition = re.sub(r"`?main[_]{1,2}build`", "main", definition) definition = re.sub(r"main[_]{1,2}build\.", "main.", definition) definition = re.sub(r"`main[_]{1,2}build`\.", "`main`.", definition) if demo_conf.custom_schema_supported: return re.sub(r"`?" + re.escape(demo_conf.default_catalog) + r"`?\.`?"
+ re.escape(demo_conf.default_schema) + r"`?", f"`{demo_conf.catalog}`.`{demo_conf.schema}`", definition) return definition def load_lakeview_dashboard(self, demo_conf: DemoConf, install_path, dashboard, warehouse_name = None, genie_rooms = None): endpoint = self.installer.get_or_create_endpoint(self.db.conf.name, demo_conf, warehouse_name = warehouse_name) try: definition = self.installer.get_resource(f"bundles/{demo_conf.name}/install_package/_resources/dashboards/{dashboard['id']}.lvdash.json") definition = self.replace_dashboard_schema(demo_conf, definition) except Exception as e: raise Exception(f"Can't load dashboard {dashboard} in demo {demo_conf.name}. Check bundle configuration under dashboards: [..]. " f"The dashboard id should match the file name under the _resources/dashboards/<dashboard> folder. {e}") # If dashboard['genie_room_id'] matches a created Genie, set overrideId to that Genie's UID. try: target_room_uid = None mapped_room_id = dashboard.get("genie_room_id") if mapped_room_id and genie_rooms: for room in genie_rooms: if room.get("id") == mapped_room_id: target_room_uid = room.get("uid") break if target_room_uid: import re as _re pattern = r'"overrideId"\s*:\s*""' if _re.search(pattern, definition): definition = _re.sub(pattern, f'"overrideId": "{target_room_uid}"', definition, count=1) # If the mapping is missing or the UID isn't found, skip the injection silently.
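The `overrideId` injection above is a plain regex substitution on the serialized dashboard JSON rather than a JSON parse/rewrite. A standalone sketch with a made-up dashboard fragment and Genie UID (both hypothetical values for illustration):

```python
import re

# Hypothetical serialized dashboard fragment; a real .lvdash.json is much larger.
definition = '{"datasets": [], "genie": {"overrideId": ""}}'
target_room_uid = "01ef-made-up-uid"  # assumed: the UID of the Genie room just created

# Match an empty overrideId field, tolerating whitespace around the colon.
pattern = r'"overrideId"\s*:\s*""'
if re.search(pattern, definition):
    # Replace only the first empty overrideId with the Genie room's UID.
    definition = re.sub(pattern, f'"overrideId": "{target_room_uid}"', definition, count=1)
```

Operating on the raw string keeps the rest of the serialized dashboard byte-for-byte identical, which a parse-and-re-serialize round trip would not guarantee.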
except Exception: pass dashboard_path = f"{install_path}/{demo_conf.name}/_dashboards" #Make sure the dashboard folder exists f = self.db.post("2.0/workspace/mkdirs", {"path": dashboard_path}) if "error_code" in f: raise Exception(f"ERROR - wrong install path, can't save dashboard here: {f} - {dashboard_path}") #Avoid issue with / in the dashboard name (such as AI/BI) dashboard['name'] = dashboard['name'].replace('/', '') dashboard_creation = self.db.post(f"2.0/lakeview/dashboards", { "display_name": dashboard['name'], "warehouse_id": endpoint['warehouse_id'], "serialized_dashboard": definition, "parent_path": dashboard_path }) dashboard['uid'] = dashboard_creation['dashboard_id'] dashboard['is_lakeview'] = True return dashboard ================================================ FILE: dbdemos/installer_genie.py ================================================ import json from concurrent.futures import ThreadPoolExecutor, as_completed from databricks.sdk import WorkspaceClient from databricks.sdk.service.catalog import VolumeType from databricks.sdk.service.sql import StatementState from dbdemos.sql_query import SQLQueryExecutor from .conf import DataFolder, DemoConf, GenieRoom from .exceptions.dbdemos_exception import GenieCreationException, DataLoaderException, SQLQueryException from typing import TYPE_CHECKING if TYPE_CHECKING: from .installer import Installer class InstallerGenie: VOLUME_NAME = "dbdemos_raw_data" def __init__(self, installer: 'Installer'): self.installer = installer self.db = installer.db self.sql_query_executor = SQLQueryExecutor() def install_genies(self, demo_conf: DemoConf, install_path: str, warehouse_name: str, skip_genie_rooms: bool, debug=True): rooms = [] if len(demo_conf.genie_rooms) > 0 or len(demo_conf.data_folders) > 0: warehouse = self.installer.get_or_create_endpoint(self.db.conf.name, demo_conf, warehouse_name = warehouse_name, throw_error=True) try: warehouse_id = warehouse['endpoint_id'] self.load_genie_data(demo_conf, 
warehouse_id, debug) if not skip_genie_rooms and len(demo_conf.genie_rooms) > 0: if debug: print(f"Installing genie rooms {demo_conf.genie_rooms}") genie_path = f"{install_path}/{demo_conf.name}/_genie_spaces" #Make sure the genie folder exists self.db.post("2.0/workspace/mkdirs", {"path": genie_path}) path = self.db.get("2.0/workspace/get-status", {"path": genie_path}) if "error_code" in path: raise Exception(f"ERROR - wrong install path, can't save genie spaces here: {path}") for room in demo_conf.genie_rooms: rooms.append(self.install_genie(room, path, warehouse_id, debug)) except Exception as e: self.installer.report.display_genie_room_creation_error(e, demo_conf) return rooms def install_genie(self, room: GenieRoom, genie_path, warehouse_id, debug=True): # Genie room names don't allow '/' anymore ws = WorkspaceClient(token=self.installer.db.conf.pat_token, host=self.installer.db.conf.workspace_url) self.create_temp_table_for_genie_creation(ws, room, warehouse_id, debug) room.display_name = room.display_name.replace("/", "-") room_payload = { "display_name": room.display_name, "description": room.description, "warehouse_id": warehouse_id, "table_identifiers": room.table_identifiers, "parent_folder": f'folders/{genie_path["object_id"]}', "run_as_type": "VIEWER" } created_room = self.db.post("2.0/data-rooms", json=room_payload) if 'id' not in created_room: raise GenieCreationException(f"Error creating room {room_payload} - {created_room}", room, created_room) if debug: print(f"Genie room created: {created_room} - {room_payload}") actions = [{ "action_type": "CREATE", "curated_question": { "data_room_id": created_room['id'], "question_text": q, "question_type": "SAMPLE_QUESTION" } } for q in room.curated_questions] questions = self.db.post(f"2.0/data-rooms/{created_room['id']}/curated-questions/batch-actions", {"actions": actions}) if debug: print(f"Genie room questions created: {questions}") if room.instructions: instructions =
self.db.post(f"2.0/data-rooms/{created_room['id']}/instructions", {"title": "Notes", "content": room.instructions, "instruction_type": "TEXT_INSTRUCTION"}) if debug: print(f"genie room instructions: {instructions}") for function_name in room.function_names: instructions = self.db.post(f"2.0/data-rooms/{created_room['id']}/instructions", {"title": "SQL Function", "content": function_name, "instruction_type": "CERTIFIED_ANSWER"}) if debug: print(f"genie room function: {instructions}") for sql in room.sql_instructions: instructions = self.db.post(f"2.0/data-rooms/{created_room['id']}/instructions", {"title": sql['title'], "content": sql['content'], "instruction_type": "SQL_INSTRUCTION"}) if debug: print(f"genie room SQL instructions: {instructions}") for b in room.benchmarks: benchmark = { "question_text": b["question_text"], "question_type":"BENCHMARK", "answer_text": b["answer_text"], "is_deprecated": False, "updatable_fields_mask":[] } instructions = self.db.post(f"2.0/data-rooms/{created_room['id']}/curated-questions", {"curated_question": benchmark, "data_room_id":created_room['id']}) if debug: print(f"genie room benchmarks: {instructions}") self.delete_temp_table_for_genie_creation(ws, room, debug) return {"id": room.id, "uid": created_room['id'], 'name': room.display_name} # we need to have the table existing before creating the genie room, however they're created in SDP which is in a job and not yet available. # This is a workaround to create a temp table with a property that will be used to delete it once the genie room is created so that the SDP table can run without issue. 
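The workaround described in the comment above can be sketched in isolation: create a placeholder table tagged with a marker table property so the Genie room can reference it, then, once the room exists, drop only tables still carrying that marker (a real table created by the pipeline in the meantime won't have it). A minimal sketch, assuming a hypothetical table identifier and not wired to an actual workspace:

```python
# Hypothetical table identifier referenced by a Genie room.
table = "main.dbdemos_demo.sales"

# 1) Placeholder DDL: an empty table tagged as a dbdemos mock, so the Genie
#    room creation API can validate the table reference before the pipeline runs.
create_sql = (f"CREATE TABLE IF NOT EXISTS {table} "
              "TBLPROPERTIES ('dbdemos.mock_table_for_genie' = 1);")

# 2) Cleanup predicate: only drop a table if it still carries the mock marker,
#    leaving any real table (created by the pipeline since) untouched.
def should_drop(properties: dict) -> bool:
    return 'dbdemos.mock_table_for_genie' in properties
```

The table property acts as an ownership tag: it distinguishes "placeholder we created and may safely delete" from "real data the pipeline produced under the same name".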
def create_temp_table_for_genie_creation(self, ws: WorkspaceClient, room: GenieRoom, warehouse_id, debug=False): for table in room.table_identifiers: if not ws.tables.exists(table).table_exists: sql_query = f"CREATE TABLE IF NOT EXISTS {table} TBLPROPERTIES ('dbdemos.mock_table_for_genie' = 1);" if debug: print(f"Creating temp genie table {table}: {sql_query}") self.sql_query_executor.execute_query(ws, sql_query, warehouse_id=warehouse_id, debug=debug) def delete_temp_table_for_genie_creation(self, ws, room: GenieRoom, debug=False): for table in room.table_identifiers: if ws.tables.exists(table).table_exists and 'dbdemos.mock_table_for_genie' in ws.tables.get(table).properties: if debug: print(f'Deleting temp genie table {table}') ws.tables.delete(table) def load_genie_data(self, demo_conf: DemoConf, warehouse_id, debug=True): if demo_conf.data_folders: print(f"Loading data in your schema {demo_conf.catalog}.{demo_conf.schema} using warehouse {warehouse_id}, this might take a few seconds (you can use another warehouse with the option: warehouse_name='xxx')...") ws = WorkspaceClient(token=self.installer.db.conf.pat_token, host=self.installer.db.conf.workspace_url) if any(d.target_volume_folder_name is not None for d in demo_conf.data_folders): self.create_raw_data_volume(ws, demo_conf, debug) with ThreadPoolExecutor(max_workers=3) as executor: futures = [executor.submit(self.load_data, ws, data_folder, warehouse_id, demo_conf, debug) for data_folder in demo_conf.data_folders] for future in futures: future.result() if demo_conf.sql_queries: self.run_sql_queries(ws, demo_conf, warehouse_id, debug) def run_sql_queries(self, ws: WorkspaceClient, demo_conf: DemoConf, warehouse_id, debug=True): for batch in demo_conf.sql_queries: with ThreadPoolExecutor(max_workers=5) as ex: futures = [ex.submit(self.sql_query_executor.execute_query, ws, q, warehouse_id=warehouse_id, debug=debug) for q in batch] for f in as_completed(futures): try: f.result() except SQLQueryException as 
e: if "tag name" in str(e).lower(): print(f"Warn - SQL error on tag ignored - probably free edition: {e}") else: raise def get_current_cluster_id(self): return json.loads(self.installer.get_dbutils_tags_safe()['clusterId']) def load_data(self, ws: WorkspaceClient, data_folder: DataFolder, warehouse_id, conf: DemoConf, debug=True): # Load table to a table if data_folder.target_table_name: try: sql_query = f"""CREATE TABLE IF NOT EXISTS {conf.catalog}.{conf.schema}.{data_folder.target_table_name} as SELECT * FROM read_files('s3://dbdemos-dataset/{data_folder.source_folder}', format => '{data_folder.source_format}', pathGlobFilter => '*.{data_folder.source_format}')""" if debug: print(f"Loading data {data_folder}: {sql_query}") self.sql_query_executor.execute_query(ws, sql_query, warehouse_id=warehouse_id, debug=debug) except Exception as e: if "com.amazonaws.auth.BasicSessionCredentials" in str(e): print("INFO: Basic Credential error detected downloading the files from our demo bucket. Will try to load data to volume first, please wait as this is a slower workflow...") self.create_raw_data_volume(ws, conf, debug) self.load_data_to_volume(ws, data_folder, conf, debug) self.create_table_from_volume(ws, data_folder, warehouse_id, conf, debug) else: raise DataLoaderException(f"Error loading data from S3: {str(e)}") else: self.load_data_to_volume(ws, data_folder, conf, debug) # Class-level lock for volume creation import threading _volume_creation_lock = threading.Lock() def create_raw_data_volume(self, ws: WorkspaceClient, demo_conf: DemoConf, debug=True): with InstallerGenie._volume_creation_lock: full_volume_name = f"{demo_conf.catalog}/{demo_conf.schema}/{InstallerGenie.VOLUME_NAME}" try: ws.volumes.read(f"{demo_conf.catalog}.{demo_conf.schema}.{InstallerGenie.VOLUME_NAME}") except Exception as e: if debug: print(f"Volume {full_volume_name} doesn't seem to exist, creating it - {e}") try: ws.volumes.create( catalog_name=demo_conf.catalog, schema_name=demo_conf.schema, 
name=InstallerGenie.VOLUME_NAME, volume_type=VolumeType.MANAGED ) except Exception as e: raise DataLoaderException(f"Can't create volume {full_volume_name} to load the demo data, and it doesn't seem to exist. <br/>" f"Please create the volume or grant yourself USAGE/READ permissions, or install the demo in another catalog: dbdemos.install(xxx, catalog=xxx, schema=xxx, warehouse_id=xx).<br/>" f" {e}") # -------------------------------------------------------------------------------------------------------------------------------------------- # Experimental: first upload data to the volume, as some warehouses don't have direct access to the S3 bucket when instance profiles exist. # -------------------------------------------------------------------------------------------------------------------------------------------- def load_data_through_volume(self, ws: WorkspaceClient, data_folders: list[DataFolder], warehouse_id: str, demo_conf: DemoConf, debug=True): print('INFO: Basic Credential error detected downloading the files from our demo S3 bucket.
Will try to load data to volume first, please wait as this might take a while...') self.create_raw_data_volume(ws, demo_conf, debug) def load_data_and_create_table(ws: WorkspaceClient, data_folder: DataFolder, warehouse_id: str, demo_conf: DemoConf, debug=True): self.load_data_to_volume(ws, demo_conf, data_folder, debug) self.create_table_from_volume(ws, data_folder, warehouse_id, demo_conf, debug) with ThreadPoolExecutor(max_workers=3) as executor: futures = [executor.submit(load_data_and_create_table, ws, data_folder, warehouse_id, demo_conf, debug) for data_folder in data_folders] for future in futures: future.result() def load_data_to_volume(self, ws: WorkspaceClient, data_folder: DataFolder, demo_conf: DemoConf, debug=True): assert data_folder.source_format in ["csv", "json", "parquet"], "data loader through volume only support csv, json and parquet" import requests import collections dbutils = self.installer.get_dbutils() try: folder = data_folder.target_volume_folder_name if data_folder.target_volume_folder_name else data_folder.source_folder #first try with a dbutils copy if available copied_successfully = False if debug: print(f"Copying {data_folder.source_folder} to {f'/Volumes/{demo_conf.catalog}/{demo_conf.schema}/{InstallerGenie.VOLUME_NAME}/{folder}'} using dbutils fs.cp") if dbutils is not None: try: dbutils.fs.cp(f"s3://dbdemos-dataset/{data_folder.source_folder}", f"/Volumes/{demo_conf.catalog}/{demo_conf.schema}/{InstallerGenie.VOLUME_NAME}/{folder}", recurse=True) copied_successfully = True except Exception as e: copied_successfully = False if debug: print(f"Error copying {data_folder.source_folder} to {f'/Volumes/{demo_conf.catalog}/{demo_conf.schema}/{InstallerGenie.VOLUME_NAME}/{folder}'} using dbutils fs.cp: {e}") if copied_successfully and debug: print(f"Copied {data_folder.source_folder} to {f'/Volumes/{demo_conf.catalog}/{demo_conf.schema}/{InstallerGenie.VOLUME_NAME}/{folder}'} using dbutils fs.cp") if not copied_successfully: # Get list 
of files from GitHub API, to avoid adding a S3 boto dependency just for this github_path = f"https://api.github.com/repos/databricks-demos/dbdemos-dataset/contents/{data_folder.source_folder}" if debug: print(f"Getting files from {github_path}") files = requests.get(github_path).json() if 'message' in files: print(f"Error getting files from {github_path}: {files}") files = [f['download_url'] for f in files] if debug: print(f"Found {len(files)} files in GitHub repo for {data_folder.source_folder}") def copy_file(file_url): if not file_url.endswith('/'): file_name = file_url.split('/')[-1] target_path = f"/Volumes/{demo_conf.catalog}/{demo_conf.schema}/{InstallerGenie.VOLUME_NAME}/{folder}/{file_name}" s3_url = file_url.replace("https://raw.githubusercontent.com/databricks-demos/dbdemos-dataset/main/", "https://dbdemos-dataset.s3.amazonaws.com/") if debug: print(f"Copying {s3_url} to {target_path}") response = requests.get(s3_url) response.raise_for_status() if debug: print(f"File {file_name} in memory. 
sending to volume...") import io buffer = io.BytesIO(response.content) ws.files.upload(target_path, buffer, overwrite=True) if debug: print(f"File {file_name} in volume!") with ThreadPoolExecutor(max_workers=5) as executor: collections.deque(executor.map(copy_file, files)) except Exception as e: raise DataLoaderException(f"Error loading data from S3: {str(e)}") def create_table_from_volume(self, ws: WorkspaceClient, data_folder: DataFolder, warehouse_id, conf: DemoConf, debug=True): self.sql_query_executor.execute_query(ws, f"""CREATE TABLE IF NOT EXISTS {conf.catalog}.{conf.schema}.{data_folder.target_table_name} as SELECT * FROM read_files('/Volumes/{conf.catalog}/{conf.schema}/{InstallerGenie.VOLUME_NAME}/{data_folder.source_folder}', format => '{data_folder.source_format}', pathGlobFilter => '*.{data_folder.source_format}')""", warehouse_id=warehouse_id, debug=debug) ================================================ FILE: dbdemos/installer_report.py ================================================ from .conf import DemoConf from .exceptions.dbdemos_exception import ClusterCreationException, ExistingResourceException, FolderDeletionException, \ SDPException, WorkflowException, FolderCreationException, TokenException from pathlib import Path import json class InstallerReport: NOTEBOOK_SVG = """<svg width="1em" height="1em" viewBox="0 0 16 16" fill="none" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" class=""><path fill-rule="evenodd" clip-rule="evenodd" d="M3 1.75A.75.75 0 013.75 1h10.5a.75.75 0 01.75.75v12.5a.75.75 0 01-.75.75H3.75a.75.75 0 01-.75-.75V12.5H1V11h2V8.75H1v-1.5h2V5H1V3.5h2V1.75zm1.5.75v11H6v-11H4.5zm3 0v11h6v-11h-6z" fill="currentColor"></path></svg>""" FOLDER_SVG = """<svg width="1em" height="1em" viewBox="0 0 16 16" fill="none" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" class=""><path d="M.75 2a.75.75 0 00-.75.75v10.5c0 .414.336.75.75.75h14.5a.75.75 0 00.75-.75v-8.5a.75.75 0 
00-.75-.75H7.81L6.617 2.805A2.75 2.75 0 004.672 2H.75z" fill="currentColor"></path></svg>""" DASHBOARD_SVG = """<svg width="1em" height="1em" viewBox="0 0 16 16" fill="none" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" class=""><path fill-rule="evenodd" clip-rule="evenodd" d="M1 1.75A.75.75 0 011.75 1h12.5a.75.75 0 01.75.75v12.5a.75.75 0 01-.75.75H1.75a.75.75 0 01-.75-.75V1.75zm1.5 8.75v3h4.75v-3H2.5zm0-1.5h4.75V2.5H2.5V9zm6.25-6.5v3h4.75v-3H8.75zm0 11V7h4.75v6.5H8.75z" fill="currentColor"></path></svg>""" GENIE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="1em" height="1em" fill="none" viewBox="0 0 16 16" aria-hidden="true" focusable="false" class=""><path fill="currentColor" fill-rule="evenodd" d="M0 2.75A.75.75 0 0 1 .75 2H8v1.5H1.5v9h13V10H16v3.25a.75.75 0 0 1-.75.75H.75a.75.75 0 0 1-.75-.75zm12.987-.14a.75.75 0 0 0-1.474 0l-.137.728a1.93 1.93 0 0 1-1.538 1.538l-.727.137a.75.75 0 0 0 0 1.474l.727.137c.78.147 1.39.758 1.538 1.538l.137.727a.75.75 0 0 0 1.474 0l.137-.727c.147-.78.758-1.39 1.538-1.538l.727-.137a.75.75 0 0 0 0-1.474l-.727-.137a1.93 1.93 0 0 1-1.538-1.538z" clip-rule="evenodd"></path></svg>""" CSS_REPORT = """ <style> .dbdemos_install{ font-family: -apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica Neue,Arial,Noto Sans,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol,Noto Color Emoji,FontAwesome; color: #3b3b3b; box-shadow: 0 .15rem 1.15rem 0 rgba(58,59,69,.15)!important; padding: 10px 20px 20px 20px; margin: 10px; } .dbdemos_block{ display: block !important; } .update_container { display: flex; gap: 20px; padding: 10px; margin: 10px 0; } .update_box { flex: 1; background-color: #f3fff9; padding: 15px; border-radius: 5px; box-shadow: 0 .15rem 1.15rem 0 rgba(58, 59, 69, .15); overflow: hidden; } .update_title { font-weight: bold; color: #34a853; margin-bottom: 10px; font-size: 1.4em; } .code { padding: 5px; border: 1px solid #e4e4e4; font-family: monospace; background-color: #f5f5f5; margin: 5px 
0px 0px 0px; display: inline; } .update_image { float: right; width: 200px; margin: 0 0 10px 10px; border-radius: 5px; } .subfolder { padding-left: 30px; } .notebook { margin-bottom: 3px; } .dbdemos_install a { color: #3835a4; } .container_dbdemos { padding-left: 20px; } .path_desc { color: #928e9b; font-style: oblique; } </style>""" def __init__(self, workspace_url: str): self.workspace_url = workspace_url def displayHTML_available(self): try: from dbruntime.display import displayHTML return True except: return False def display_cluster_creation_warn(self, exception: ClusterCreationException, demo_conf: DemoConf): self.display_error(exception, f"By default, dbdemos tries to create a new cluster for your demo with the proper settings. <br/>" f"dbdemos couldn't create a cluster for you (probably due to your permissions). Instead we will use the current cluster to run the setup job and load the data.<br/>" f"For the demo to run properly, <strong>make sure your cluster has UC enabled and is using Databricks Runtime (DBR) version {exception.cluster_conf['spark_version']}</strong>.<br/>" f"<i>Note: you can avoid this message by setting `use_current_cluster`:</i><br/>" f"""<div class="code dbdemos_block">dbdemos.install('{demo_conf.name}', use_current_cluster = True)</div><br/>""" f"<strong>Cluster creation details</strong><br/>" f"""Full cluster configuration: <div class="code dbdemos_block">{json.dumps(exception.cluster_conf)}.</div><br/>""" f"""Full error: <div class="code dbdemos_block">{json.dumps(exception.response)}</div>""", raise_error=False, warning=True) def display_serverless_warn(self, exception: Exception, demo_conf: DemoConf): self.display_error(exception, f"This demo might not fully work on Serverless and Databricks Test Drive!<br/>" f"We're actively working to update this content to fully work on serverless.<br/>" f"Some of the notebooks might not work as expected as they are tested with DBRML; we'll be releasing a new version very shortly, stay tuned!<br/>",
raise_error=False, warning=True) def display_custom_schema_not_supported_error(self, exception: Exception, demo_conf: DemoConf): self.display_error(exception, f"This demo doesn't support custom catalog/schema yet.<br/>" f"Please open a GitHub issue to accelerate the support for this demo.<br/>" f"Remove the 'catalog' and 'schema' options from your installation.<br/>") def display_custom_schema_missing_error(self, exception: Exception, demo_conf: DemoConf): self.display_error(exception, f"Both the schema and catalog options must be defined.<br/>" f"""<div class="code dbdemos_block">dbdemos.install('{demo_conf.name}', catalog = 'xxx', schema = 'xxx')</div><br/>""") def display_incorrect_schema_error(self, exception: Exception, demo_conf: DemoConf): self.display_error(exception, f"Incorrect schema/catalog name.<br/>" f"""Please use a correct catalog/schema name. Use '_' instead of '-'.""") def display_warehouse_creation_error(self, exception: Exception, demo_conf: DemoConf): self.display_error(exception, f"""This demo requires a warehouse, and dbdemos couldn't find or create one!<br/> You can specify the SQL warehouse you would like to use to load the dashboard with warehouse_name = 'xxx': <div class="code dbdemos_block">dbdemos.install('{demo_conf.name}', warehouse_name = 'xxx')</div><br/>""") def display_unknow_warehouse_error(self, exception: Exception, demo_conf: DemoConf, warehouse_name: str): self.display_error(exception, f"""Can't find your warehouse!<br/> The warehouse you specified ({warehouse_name}) can't be found. Make sure it exists and you have access to it. <div class="code dbdemos_block">dbdemos.install('{demo_conf.name}', warehouse_name = 'xxx')</div><br/>""") def display_genie_room_creation_error(self, exception: Exception, demo_conf: DemoConf): self.display_error(exception, f"""This demo couldn't install the genie room properly.<br/> Genie room support for DBDemos is in beta.
You can skip the genie room installation with skip_genie_rooms = True: <div class="code dbdemos_block">dbdemos.install('{demo_conf.name}', skip_genie_rooms = True)</div><br/>""") def display_dashboard_error(self, exception: Exception, demo_conf: DemoConf): self.display_error(exception, f"""Couldn't create or update a dashboard. <br/> If this is a permission error, we recommend searching for the existing dashboard and deleting it manually.<br/> You can skip the dashboard installation with skip_dashboards = True: <div class="code dbdemos_block">dbdemos.install('{demo_conf.name}', skip_dashboards = True)</div><br/> You can also specify the SQL warehouse you'd like to use to load the dashboard with warehouse_name = 'xxx': <div class="code dbdemos_block">dbdemos.install('{demo_conf.name}', warehouse_name = 'xxx')</div><br/>""") def display_folder_already_existing(self, exception: ExistingResourceException, demo_conf: DemoConf): self.display_error(exception, f"""Please install the demo with overwrite=True to replace the existing folder content under {exception.install_path}: <div class="code dbdemos_block">dbdemos.install('{demo_conf.name}', overwrite=True, path='{exception.install_path}')</div><br/> All content under {exception.install_path} will be deleted.<br/><br/> <strong>Details</strong><br/> Folder list response: <div class="code dbdemos_block">{json.dumps(exception.response)}</div>""") def display_folder_permission(self, exception: FolderDeletionException, demo_conf: DemoConf): self.display_error(exception, f"""Can't delete the folder {exception.install_path}. <br/> Do you have read/write permissions?<br/><br/> <strong>Details</strong><br/> Delete response: <div class="code dbdemos_block">{json.dumps(exception.response)}</div>""") def display_folder_creation_error(self, exception: FolderCreationException, demo_conf: DemoConf): self.display_error(exception, f"""Couldn't install the demo in the current folder. Do you have permissions to write in {exception.install_path}?
Please install the demo with overwrite=True to replace the existing folder: <div class="code dbdemos_block">dbdemos.install('{demo_conf.name}', overwrite=True, path='{exception.install_path}')</div><br/> All content under {exception.install_path} will be deleted.<br/><br/> <strong>Details</strong><br/> Folder list response: <div class="code dbdemos_block">{json.dumps(exception.response)}</div>""") def display_non_premium_warn(self, exception: Exception, response): self.display_error(exception, f"""DBSQL isn't available in this workspace. Only Premium/Enterprise workspaces are supported.<br/> dbdemos will try its best to install the demo and load the notebooks, but some components won't be available (SDP pipelines, dashboards, etc.).<br/> Forcing skip_dashboards = True and continuing.<br/><br/> <strong>Details</strong><br/> API response: <div class="code dbdemos_block">{json.dumps(response)}</div>""", raise_error=False, warning=True) def display_pipeline_error(self, exception: SDPException): self.display_error(exception, f"""{exception.description}. <br/> Skipping pipelines. Your demo will be installed without SDP pipelines.<br/><br/> <strong>Details</strong><br/> Pipeline configuration: <div class="code dbdemos_block">{json.dumps(exception.pipeline_conf)}</div> API response: <div class="code dbdemos_block">{json.dumps(exception.response)}</div>""", raise_error=False, warning=True) def display_pipeline_error_migration(self, exception: SDPException): self.display_error(exception, f"""{exception.description}. <br/> Skipping pipelines. Your demo will be installed without SDP pipelines.<br/><br/> <strong>Details</strong><br/> DBDemos updated its API to use the latest SDP features.
You installed your pipeline on an older version which needs to be updated.<br/> The easiest fix is to delete the existing pipeline and re-install the demo to get the latest SDP features.<br/> Pipeline configuration: <div class="code dbdemos_block">{json.dumps(exception.pipeline_conf)}</div> API response: <div class="code dbdemos_block">{json.dumps(exception.response)}</div>""", raise_error=False, warning=True) def display_workflow_error(self, exception: WorkflowException, demo_name: str): self.display_error(exception, f"""{exception.details}. <br/> dbdemos creates jobs to load your demo data. If you don't have cluster creation permission, you can start the job using the current cluster. <div class="code dbdemos_block">dbdemos.install('{demo_name}', use_current_cluster=True)</div><br/> <strong>Details</strong><br/> Pipeline configuration: <div class="code dbdemos_block">{json.dumps(exception.job_config)}</div> API response: <div class="code dbdemos_block">{json.dumps(exception.response)}</div>""") def display_token_error(self, exception: TokenException, demo_name: str): self.display_error(exception, f"""dbdemos couldn't programmatically acquire a PAT token to call the API to install the demo.<br/> This can be due to the following: <ul><li>Legacy cluster being used with admin protection for "<a href="https://docs.databricks.com/administration-guide/account-settings/no-isolation-shared.html">No isolation shared</a>" (account-level setting)</li> <li>Restrictions on Shared clusters</li></ul> Please use a cluster with Access mode set to Single User and re-run your dbdemos command.<br/> Alternatively, you can use a PAT token in the install: <div class="code dbdemos_block"># Get a PAT token from the UI and save it as a secret.<br/> pat_token = dbutils.secrets.get(scope="my_scope", key="dbdemos_token")<br/> dbdemos.install('{demo_name}', pat_token=pat_token)</div><br/> <strong>Details</strong><br/> Error: <div class="code
dbdemos_block">{exception.message}</div>""") def display_demo_name_error(self, name, demos): html = "<h2>Demos available:</h2>" for cat in demos: html += f"<strong>{cat}</strong>" for demo in demos[cat]: html += f"""<div style="padding-left: 40px">{demo.name}: <span class="path_desc">{demo.description}</span></div>""" self.display_error(Exception(f"Demo '{name}' doesn't exist"), f"""This demo doesn't exist, please check your demo name and re-run the installation.<br/> {html} <br/><br/> To get a full demo list, please run <div class="code dbdemos_block">dbdemos.list_demos()</div>""") def display_error(self, exception, message, raise_error = True, warning = False): color = "#d18b2a" if warning else "#eb0707" level = "warning" if warning else "error" error = f"""{InstallerReport.CSS_REPORT}<div class="dbdemos_install"> <h1 style="color: {color}">Installation {level}: {exception}</h1> {message} </div>""" if self.displayHTML_available(): from dbruntime.display import displayHTML displayHTML(error) else: print(error) if raise_error: raise exception def display_install_info(self, demo_conf: DemoConf, install_path, catalog: str, schema: str): print(f"Installing demo {demo_conf.name} under {install_path}, please wait...") print(f"""Help us improving dbdemos, share your feedback or create an issue if something isn't working: https://github.com/databricks-demos/dbdemos""") # ----------------------------------------- # Update the new demo here # ----------------------------------------- info = """ <div class="update_container"> <div class="update_box"> <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/icon/declarative-pipelines.jpg" class="update_image"> <div class="update_title">Discover our Spark Declarative Pipelines demo!</div> <p>Discover how Lakeflow SDP simplifies batch and streaming ETL with automated reliability and built-in data quality:<br><br> <span class="code">dbdemos.install('pipeline-bike')</span> </p> </div> <div class="update_box"> <img 
src="https://github.com/databricks-demos/dbdemos-resources/raw/main/icon/aibi-marketing-campaign.jpg" class="update_image">
                <div class="update_title">New AI Agent demo!</div>
                <p>Discover how to build, package and evaluate a multi-agent system with Databricks and MLflow 3.0!<br><br>
                <span class="code">dbdemos.install('ai-agent')</span></p>
            </div>
        </div>
        """
        if demo_conf.custom_schema_supported:
            if not catalog:
                info += f"""This demo supports custom Unity Catalog schemas! The default schema is {demo_conf.default_catalog}.{demo_conf.default_schema}. To install it somewhere else, run <div class="code dbdemos_block">dbdemos.install('{demo_conf.name}', catalog='xxx', schema='xxx')</div><br/>"""
            else:
                info += f"""This demo content will be installed in the schema `{catalog}`.`{schema}`<br/>"""
        if len(demo_conf.custom_message) > 0:
            info += "<br/>"+demo_conf.custom_message+"<br/>"
        self.display_info(info, "Installation in progress...")

    def display_info(self, info: str, title: str=""):
        if len(info) > 0:
            if len(title) > 0:
                title = f"""<h2 style="color: #4875c2">{title}</h2>"""
            html = f"""{InstallerReport.CSS_REPORT}
            <div class="dbdemos_install">{title}
                {info}
            </div>"""
            if self.displayHTML_available():
                from dbruntime.display import displayHTML
                displayHTML(html)
            else:
                print(html)

    def display_install_result(self, demo_name, description, title, install_path = None, notebooks = [], job_id = None, run_id = None, serverless = False, cluster_id = None, cluster_name = None, pipelines_ids = [], dashboards = [], workflows = [], genie_rooms = []):
        if self.displayHTML_available():
            self.display_install_result_html(demo_name, description, title, install_path, notebooks, job_id, run_id, serverless, cluster_id, cluster_name, pipelines_ids, dashboards, workflows, genie_rooms)
        else:
            self.display_install_result_console(demo_name, description, title, install_path, notebooks, job_id, run_id, serverless, cluster_id, cluster_name, pipelines_ids, dashboards, workflows, genie_rooms)

    def
get_install_result_html(self, demo_name, description, title, install_path = None, notebooks = [], job_id = None, run_id = None, serverless = False, cluster_id = None, cluster_name = None, pipelines_ids = [], dashboards = [], workflows = [], genie_rooms = []): html = f"""{InstallerReport.CSS_REPORT} <div class="dbdemos_install"> <img style="float:right; width: 180px; padding: 10px" src="https://github.com/databricks-demos/dbdemos-resources/raw/main/icon/{demo_name}.jpg" /> <h1>Your demo: '{title}' is ready!</h1> <i>{description}</i><br/><br/> """ if not serverless and cluster_id is not None: cluster_section = f""" <h2>Interactive cluster for the demo:</h2> <a href="{self.workspace_url}/#setting/clusters/{cluster_id}/configuration">{cluster_name}</a>. You can refresh your demo cluster with: <div class="code"> dbdemos.create_cluster('{demo_name}') </div>""" cluster_instruction = f' using the cluster <a href="{self.workspace_url}/#setting/clusters/{cluster_id}/configuration">{cluster_name}</a>' else: cluster_section = "" cluster_instruction = "" if len(notebooks) > 0: first = list(filter(lambda n: "/" not in n.get_clean_path(), notebooks)) if len(first) == 0: first = list(filter(lambda n: "resources" not in n.get_clean_path(), notebooks)) first.sort(key=lambda n: n.get_clean_path()) html += f"""Start with the first notebook {InstallerReport.NOTEBOOK_SVG} <a href="{self.workspace_url}/#workspace{install_path}/{demo_name}/{first[0].get_clean_path()}">{demo_name}/{first[0].get_clean_path()}</a>{cluster_instruction}\n""" html += """<h2>Notebook installed:</h2><div class="container_dbdemos">\n """ if len(pipelines_ids)>0 or len(dashboards)>0: html += """<div style="float: right; width: 300px">""" if len(pipelines_ids)>0: html += f"""<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/icon/{demo_name}-dlt-0.png?raw=true" style="width: 300px; margin-bottom: 10px">""" if len(dashboards)>0: html += f"""<img 
src="https://github.com/databricks-demos/dbdemos-resources/blob/main/icon/{demo_name}-dashboard-0.png?raw=true" style="width: 300px">""" html += """</div>""" previous_folder = "" for n in notebooks: if "_resources" not in n.get_clean_path(): #from pathlib import Path parts = Path(n.get_clean_path()).parts path = n.get_clean_path() if len(parts) > 1 : path = str(Path(*parts[1:])) if previous_folder != parts[0]: div_class = "subfolder" html += f"""<div class="notebook">{InstallerReport.FOLDER_SVG} {parts[0]}</div>\n""" previous_folder = parts[0] elif len(parts) == 1: div_class = "" html += f"""<div class="notebook {div_class}">{InstallerReport.NOTEBOOK_SVG} <a href="{self.workspace_url}/#workspace{install_path}/{demo_name}/{n.get_clean_path()}">{path}</a>: <span class="path_desc">{n.title}</span></div>""" html += """</div>""" if len(pipelines_ids) > 0: html += f"""<h2>Spark Declarative Pipelines</h2><ul>""" for p in pipelines_ids: if 'error' in p: html += f"""<li>{p['name']}: Installation error</li>""" else: html += f"""<li><a href="{self.workspace_url}/#joblist/pipelines/{p['uid']}">{p['name']}</a></li>""" html +="</ul>" if len(dashboards) > 0: html += f"""<h2>Databricks AI/BI Dashboards</h2><div class="container_dbdemos">""" for d in dashboards: if "error" in d: error_already_installed = "" html += f"""<div>ERROR INSTALLING DASHBOARD {d['name']}: {d['error']}. 
The Import/Export API must be enabled.{error_already_installed}</div>""" else: html += f"""<div>{InstallerReport.DASHBOARD_SVG} <a href="{self.workspace_url}/sql/dashboardsv3/{d['uid']}">{d['name']}</a></div>""" html +="</div>" if len(genie_rooms) > 0: html += f"""<h2>Databricks AI/BI Genie Spaces: Talk to your data</h2><div class="container_dbdemos">""" for g in genie_rooms: html += f"""<div>{InstallerReport.GENIE_SVG} <a href="{self.workspace_url}/genie/rooms/{g['uid']}">{g['name']}</a></div>""" html +="</div>" if len(workflows) > 0: html += f"""<h2>Workflows</h2><ul>""" for w in workflows: if w['run_id'] is not None: html += f"""We created and started a <a href="{self.workspace_url}/#job/{w['uid']}/run/{w['run_id']}">workflow</a> as part of your demo !""" else: html += f"""We created a <a href="{self.workspace_url}/#job/{w['uid']}">workflow</a> as part of your demo !""" html +="</ul>" if job_id is not None: html += f"""<h2>Initialization job started</h2> <div style="background-color: #e8f1ff; padding: 10px"> We started a <a href="{self.workspace_url}/#job/{job_id}/run/{run_id}">job to initialize your demo data</a> (for DBSQL Dashboards & Delta Live Table). 
<strong>Please wait for the job completion to be able to access the dataset & dashboards...</strong> </div>""" html += cluster_section+"</div>" return html def display_install_result_html(self, demo_name, description, title, install_path = None, notebooks = [], job_id = None, run_id = None, serverless = False, cluster_id = None, cluster_name = None, pipelines_ids = [], dashboards = [], workflows = [], genie_rooms = []): from dbruntime.display import displayHTML html = self.get_install_result_html(demo_name, description, title, install_path, notebooks, job_id, run_id, serverless, cluster_id, cluster_name, pipelines_ids, dashboards, workflows, genie_rooms) displayHTML(html) def display_install_result_console(self, demo_name, description, title, install_path = None, notebooks = [], job_id = None, run_id = None, serverless = False, cluster_id = None, cluster_name = None, pipelines_ids = [], dashboards = [], workflows = [], genie_rooms = []): if len(notebooks) > 0: print("----------------------------------------------------") print("-------------- Notebook installed: -----------------") for n in notebooks: if "_resources" not in n.get_clean_path(): print(f" - {n.title}: {self.workspace_url}/#workspace{install_path}/{demo_name}/{n.get_clean_path()}") if job_id is not None: print("----------------------------------------------------") print("--- Job initialization started (load demo data): ---") print(f" - Job run available under: {self.workspace_url}/#job/{job_id}/run/{run_id}") if not serverless and cluster_id is not None: print("----------------------------------------------------") print("------------ Demo interactive cluster: -------------") print(f" - {cluster_name}: {self.workspace_url}/#setting/clusters/{cluster_id}/configuration") cluster_instruction = f" using the cluster {cluster_name}" else: cluster_instruction = "" if len(pipelines_ids) > 0: print("----------------------------------------------------") print("------------ Spark Declarative Pipelines 
available: -----------") for p in pipelines_ids: if 'error' in p: print(f" - {p['name']}: Installation error") else: print(f" - {p['name']}: {self.workspace_url}/#joblist/pipelines/{p['uid']}") if len(dashboards) > 0: print("----------------------------------------------------") print("------------- DBSQL Dashboard available: -----------") for d in dashboards: error_already_installed = "" if "error" in d: print(f" - ERROR INSTALLING DASHBOARD {d['name']}: {d['error']}. The Import/Export API must be enabled.{error_already_installed}") else: print(f" - {d['name']}: {self.workspace_url}/sql/dashboardsv3/{d['uid']}") if len(genie_rooms) > 0: print("----------------------------------------------------") print("------------- Genie Spaces available: -----------") for g in genie_rooms: print(f" - {g['name']}: {self.workspace_url}/genie/rooms/{g['uid']}") if len(workflows) > 0: print("----------------------------------------------------") print("-------------------- Workflows: --------------------") for w in workflows: if w['run_id'] is not None: print(f"""We created and started a workflow as part of your demo: {self.workspace_url}/#job/{w['uid']}/run/{w['run_id']}""") else: print(f"""We created a workflow as part of your demo: {self.workspace_url}/#job/{w['uid']}/tasks""") print("----------------------------------------------------") print(f"Your demo {title} is ready! ") first = list(filter(lambda n: "/" not in n.get_clean_path(), notebooks)) if len(first) > 0: first.sort(key=lambda n: n.get_clean_path()) print(f"Start with the first notebook {demo_name}/{first[0].get_clean_path()}{cluster_instruction}: {self.workspace_url}/#workspace{install_path}/{demo_name}/{first[0].get_clean_path()}.") def display_schema_creation_error(self, exception: Exception, demo_conf: DemoConf): self.display_error(exception, f"""Can't create catalog/schema `{demo_conf.catalog}`.`{demo_conf.schema}`. 
<br/>
            Please verify you have the proper permissions to create catalogs and schemas, or install the demo in another location:<br/>
            <div class="code dbdemos_block">dbdemos.install('{demo_conf.name}', catalog='{demo_conf.catalog}', schema='{demo_conf.schema}', create_schema=True)</div><br/>
            Error details: {str(exception)}""")

    def display_schema_not_found_error(self, exception: Exception, demo_conf: DemoConf):
        self.display_error(exception, f"""The catalog/schema `{demo_conf.catalog}`.`{demo_conf.schema}` doesn't exist.<br/>
            Either create it manually, or set create_schema=True to let dbdemos create it for you:<br/>
            <div class="code dbdemos_block">dbdemos.install('{demo_conf.name}', catalog='{demo_conf.catalog}', schema='{demo_conf.schema}', create_schema=True)</div><br/>
            Error details: {str(exception)}""")


================================================
FILE: dbdemos/installer_repos.py
================================================

from .conf import DemoConf
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from .installer import Installer

class InstallerRepo:
    def __init__(self, installer: 'Installer'):
        self.installer = installer
        self.db = installer.db

    #Clone the demo repos if any
    def install_repos(self, demo_conf: DemoConf, debug = False):
        repos = []
        if len(demo_conf.repos) > 0:
            if debug:
                print(f"    Loading demo repos")
            #Install each repo defined in the demo conf
            for repo in demo_conf.repos:
                repo_id = self.update_or_create_repo(repo)
                #returns the path as id as that's what we'll change in the URL (#/workspace/<the/repo/path>/README.md)
                # see notebook_parser.replace_dynamic_links_repo for more details.
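`update_or_create_repo` (just below) derives the parent folder with `rfind('/')` and strips a trailing slash before cloning. Note that the original computes the folder *before* stripping the slash, so a path ending in `/` would yield the repo path itself as its "parent". The following hedged sketch (`normalize_repo_path` is a hypothetical helper, not part of dbdemos) strips the slash first to avoid that edge case:

```python
def normalize_repo_path(repo_path: str) -> tuple:
    """Return (parent_folder, cleaned_path): trailing slash removed first,
    then parent folder = everything before the last '/'."""
    if repo_path.endswith('/'):
        repo_path = repo_path[:-1]
    folder = repo_path[:repo_path.rfind('/')]
    return folder, repo_path

# A trailing slash no longer changes which folder gets created with workspace/mkdirs
folder, path = normalize_repo_path("/Repos/dbdemos/my-demo/")
```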
                repos.append({"uid": repo['path'], "id": repo['id'], "repo_id": repo_id})
        return repos

    def get_repos(self, path_prefix):
        assert len(path_prefix) > 2
        return self.db.get("/2.0/repos", {"path_prefix": path_prefix})

    def update_or_create_repo(self, repo):
        repo_path = repo['path']
        folder = repo_path[:repo_path.rfind('/')]
        r = self.get_repos(repo_path)
        #No repo yet, clone it
        if 'repos' not in r:
            if repo_path.endswith('/'):
                repo_path = repo_path[:-1]
            f = self.installer.db.post("/2.0/workspace/mkdirs", json = {"path": folder})
            data = {
                "url": repo['url'],
                "branch": repo['branch'],
                "provider": repo['provider'],
                "path": repo_path
            }
            r = self.db.post("/2.0/repos", data)
            if 'error_code' in r:
                error = f"ERROR - Could not clone the repo {repo['url']} under {repo_path}: {r}"
                raise Exception(error)
            r = self.get_repos(repo_path)
            try:
                return r['repos'][0]["id"]
            except (KeyError, IndexError):
                raise Exception(f"couldn't properly create the repository {data} - {r}")
        repo_id = r['repos'][0]["id"]
        r = self.db.patch(f"/2.0/repos/{repo_id}", {"branch": repo["branch"]})
        if 'error_code' in r and r['error_code'] == 'GIT_CONFLICT':
            print(f"Error during repo pull {repo_path}: Git conflict.
Please resolve the conflict manually to get the latest version.")
        return repo_id


================================================
FILE: dbdemos/installer_workflows.py
================================================

from .conf import DemoConf, merge_dict, ConfTemplate
import json
import time
from .exceptions.dbdemos_exception import WorkflowException
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from .installer import Installer

class InstallerWorkflow:
    def __init__(self, installer: 'Installer'):
        self.installer = installer
        self.db = installer.db

    #Install the demo workflows if any
    def install_workflows(self, demo_conf: DemoConf, use_cluster_id = None, warehouse_name: str = None, serverless = False, debug = False):
        workflows = []
        if len(demo_conf.workflows) > 0:
            if debug:
                print(f"    Loading demo workflows")
            for workflow in demo_conf.workflows:
                definition = workflow['definition']
                job_name = definition["settings"]["name"]
                # add cloud specific setup
                job_id, run_id = self.create_or_replace_job(demo_conf, definition, job_name, workflow['start_on_install'], use_cluster_id, warehouse_name, serverless, debug)
                # print(f"    Demo workflow available: {self.installer.db.conf.workspace_url}/#job/{job_id}/tasks")
                workflows.append({"uid": job_id, "run_id": run_id, "id": workflow['id']})
        return workflows

    #create or update the init job if it exists
    def create_demo_init_job(self, demo_conf: DemoConf, use_cluster_id = None, warehouse_name: str = None, serverless = False, debug = False):
        if "settings" in demo_conf.init_job:
            job_name = demo_conf.init_job["settings"]["name"]
            if debug:
                print(f"    Searching for existing demo initialisation job {job_name}")
            #We have an init job definition
            job_id, run_id = self.create_or_replace_job(demo_conf, demo_conf.init_job, job_name, False, use_cluster_id, warehouse_name, serverless, debug)
            return {"uid": job_id, "run_id": run_id, "id": "init-job"}
        return {"uid": None, "run_id": None, "id": None}

    #Start the init job if it exists.
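`start_demo_init_job` and `create_or_replace_job` (below) share a convention used throughout this codebase: the Databricks REST reply is a JSON dict, and failures are detected by the presence of an `error_code` key rather than an exception from the HTTP layer. A minimal sketch of that convention with a stubbed client (`FakeDBClient` and `start_job` are illustrative names, not dbdemos APIs):

```python
class FakeDBClient:
    """Stand-in for dbdemos' DBClient: returns pre-canned JSON replies per endpoint."""
    def __init__(self, replies):
        self.replies = replies

    def post(self, path, payload):
        return self.replies[path]

def start_job(db, job_id):
    # POST run-now, then inspect the JSON body for an error_code key
    r = db.post("2.1/jobs/run-now", {"job_id": job_id})
    if "error_code" in r:
        raise RuntimeError(f"Can't start job {job_id}: {r}")
    return r["run_id"]

db = FakeDBClient({"2.1/jobs/run-now": {"run_id": 42}})
run_id = start_job(db, 123)  # → 42
```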
def start_demo_init_job(self, demo_conf: DemoConf, init_job, debug = False): if init_job['uid'] is not None: j = self.installer.db.post("2.1/jobs/run-now", {"job_id": init_job['uid']}) if debug: print(f'Starting init job {init_job}: {j}') if "error_code" in j: self.installer.report.display_workflow_error(WorkflowException("Can't start the workflow", {"job_id": init_job['uid']}, init_job, j), demo_conf.name) init_job['run_id'] = j['run_id'] return j['run_id'] def create_or_replace_job(self, demo_conf: DemoConf, definition: dict, job_name: str, run_now: bool, use_cluster_id = None, warehouse_name: str = None, serverless = False, debug = False): cloud = self.installer.get_current_cloud() conf_template = ConfTemplate(self.db.conf.username, demo_conf.name) cluster_conf = self.installer.get_resource("resources/default_cluster_job_config.json") cluster_conf = json.loads(conf_template.replace_template_key(cluster_conf)) cluster_conf_cloud = json.loads(self.installer.get_resource(f"resources/default_cluster_config-{cloud}.json")) merge_dict(cluster_conf, cluster_conf_cloud) definition = self.replace_warehouse_id(demo_conf, definition, warehouse_name) #Use a given interactive cluster, change the job setting accordingly. if use_cluster_id is not None: del definition["settings"]["job_clusters"] for task in definition["settings"]["tasks"]: if "job_cluster_key" in task: del task["job_cluster_key"] task["existing_cluster_id"] = use_cluster_id #otherwise set the job properties based on the definition & add pool for our dev workspace. 
        else:
            for cluster in definition["settings"]["job_clusters"]:
                if "new_cluster" in cluster:
                    merge_dict(cluster["new_cluster"], cluster_conf, override=False)
                    #Let's make sure we add our dev pool for faster startup
                    if self.db.conf.get_demo_pool() is not None:
                        cluster["new_cluster"]["instance_pool_id"] = self.db.conf.get_demo_pool()
                        cluster["new_cluster"].pop("node_type_id", None)
                        cluster["new_cluster"].pop("enable_elastic_disk", None)
                        cluster["new_cluster"].pop("aws_attributes", None)
            # Add support for cluster-specific tasks
            for task in definition["settings"]["tasks"]:
                if "new_cluster" in task:
                    merge_dict(task["new_cluster"], cluster_conf, override=False)
        # if we're installing from a serverless cluster, update the job to be fully serverless
        if serverless:
            environments = []
            for task in definition["settings"]["tasks"]:
                task.pop("new_cluster", None)
                task.pop("job_cluster_key", None)
                task.pop("existing_cluster_id", None)
                # Serverless doesn't support libraries. Instead, they have environments and we can link these envs to each task.
                # Extract libraries if they exist and convert to environment for serverless compute.
if "libraries" in task: env_key = "env_" + task["task_key"] dependencies = [] for lib in task["libraries"]: if "pypi" in lib and "package" in lib["pypi"]: dependencies.append(lib["pypi"]["package"]) if dependencies: environments.append({ "environment_key": env_key, "spec": { "client": "2", "dependencies": dependencies } }) task["environment_key"] = env_key task.pop("libraries", None) definition["settings"].pop("job_clusters", None) if environments: definition["settings"]["environments"] = environments existing_job = self.installer.db.find_job(job_name) if existing_job is not None: job_id = existing_job["job_id"] self.installer.db.post("/2.1/jobs/runs/cancel-all", {"job_id": job_id}) self.wait_for_run_completion(job_id, debug=debug) if debug: print(" Updating existing job") job_config = {"job_id": job_id, "new_settings": definition["settings"]} r = self.installer.db.post("2.1/jobs/reset", job_config) if "error_code" in r: self.installer.report.display_workflow_error(WorkflowException("Can't update the workflow", f"error resetting the workflow, do you have permission?.", job_config, r), demo_conf.name) else: if debug: print(" Creating a new job for demo initialization (data & table setup).") r_jobs = self.installer.db.post("2.1/jobs/create", definition["settings"]) if "error_code" in r_jobs: self.installer.report.display_workflow_error(WorkflowException("Can't create the workflow", {}, definition["settings"], r_jobs), demo_conf.name) job_id = r_jobs["job_id"] if run_now: j = self.installer.db.post("2.1/jobs/run-now", {"job_id": job_id}) if "error_code" in j: self.installer.report.display_workflow_error(WorkflowException("Can't start the workflow", {"job_id": job_id}, j), demo_conf.name) return job_id, j['run_id'] return job_id, None def replace_warehouse_id(self, demo_conf: DemoConf, definition, warehouse_name: str = None): # Jobs need a warehouse ID. Let's replace it with the one created. TODO: should be in the template? 
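The serverless branch above can be read as a pure transformation per task: drop all cluster references, then turn pypi libraries into an environment spec keyed off the task. A sketch under that reading (`to_serverless_task` is a hypothetical helper, not part of dbdemos):

```python
def to_serverless_task(task):
    """Strip cluster references from a job task and convert its pypi libraries
    into a serverless environment spec. Returns (rewritten_task, env_or_None)."""
    task = dict(task)  # shallow copy is enough for this sketch
    for key in ("new_cluster", "job_cluster_key", "existing_cluster_id"):
        task.pop(key, None)
    env = None
    # Only pypi libraries can be mapped to a serverless environment; others are dropped
    deps = [lib["pypi"]["package"] for lib in task.get("libraries", [])
            if "pypi" in lib and "package" in lib["pypi"]]
    if deps:
        env_key = "env_" + task["task_key"]
        env = {"environment_key": env_key, "spec": {"client": "2", "dependencies": deps}}
        task["environment_key"] = env_key
    task.pop("libraries", None)
    return task, env

task = {"task_key": "load", "job_cluster_key": "main",
        "libraries": [{"pypi": {"package": "mlflow"}}, {"jar": "x.jar"}]}
new_task, env = to_serverless_task(task)
```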
        if "{{SHARED_WAREHOUSE_ID}}" in json.dumps(definition):
            endpoint = self.installer.get_or_create_endpoint(self.db.conf.name, demo_conf, warehouse_name = warehouse_name)
            if endpoint is None:
                print("ERROR: couldn't create or get a SQL endpoint for dbdemos. Do you have permission? Your workflow won't be able to execute the task.")
                #TODO: quick & dirty, need to improve
                definition = json.loads(json.dumps(definition).replace(""", "warehouse_id": "{{SHARED_WAREHOUSE_ID}}"}""", ""))
            else:
                definition = json.loads(json.dumps(definition).replace("{{SHARED_WAREHOUSE_ID}}", endpoint['warehouse_id']))
        return definition

    def wait_for_run_completion(self, job_id, max_retry=10, debug = False):
        def is_still_running(job_id):
            runs = self.installer.db.get("2.1/jobs/runs/list", {"job_id": job_id, "active_only": "true"})
            return "runs" in runs and len(runs["runs"]) > 0
        i = 0
        while i <= max_retry and is_still_running(job_id):
            if debug:
                print(f"    A run is still running for job {job_id}, waiting for termination...")
            i += 1  # increment the retry counter, otherwise max_retry is never enforced
            time.sleep(5)


================================================
FILE: dbdemos/job_bundler.py
================================================

from .conf import DBClient, DemoConf, Conf, ConfTemplate, merge_dict
import time
import json
import re
import base64
from concurrent.futures import ThreadPoolExecutor
import collections
import requests

class JobBundler:
    def __init__(self, conf: Conf):
        self.bundles = {}
        self.staging_reseted = False
        self.head_commit_id = None
        self.conf = conf
        self.db = DBClient(conf)

    def get_cluster_conf(self, demo_conf: DemoConf):
        conf_template = ConfTemplate(self.conf.username, demo_conf.name)
        #default conf
        cluster_conf = json.loads(conf_template.replace_template_key(self.conf.default_cluster_template))
        #demo specific
        demo_cluster_conf = json.loads(conf_template.replace_template_key(json.dumps(demo_conf.cluster)))
        merge_dict(cluster_conf, demo_cluster_conf)
        return cluster_conf

    def load_bundles_conf(self):
        #if not self.staging_reseted: #
self.reset_staging_repo() print("scanning folder for bundles...") from threading import Lock bundle_set = set() bundle_lock = Lock() def find_conf_files(path, depth = 0): objects = self.db.get("2.0/workspace/list", {"path": path}) if "objects" not in objects: return objects = objects["objects"] with ThreadPoolExecutor(max_workers=3 if depth <= 2 else 1) as executor: params = [(o['path'], depth+1) for o in objects if o['object_type'] == 'DIRECTORY'] for _ in executor.map(lambda args, f=find_conf_files: f(*args), params): pass for o in objects: if o['object_type'] == 'NOTEBOOK' and o['path'].endswith("/bundle_config"): with bundle_lock: bundle_set.add(o['path']) find_conf_files(self.conf.get_repo_path()) with ThreadPoolExecutor(max_workers=5) as executor: collections.deque(executor.map(self.add_bundle_from_config, bundle_set)) def add_bundle_from_config(self, bundle_config_paths): #Remove the /Repos/xxx from the path (we need it from the repo root) path = bundle_config_paths[len(self.conf.get_repo_path()):] path = path[:-len("_resources/bundle_config")-1] print(f"add bundle under {path}") self.add_bundle(path) def ignore_bundle(self, bundle_path): print("WARNING --------------------------------------------------------------------------------") print(f"TEMPORARY DISABLING DEMO - {bundle_path}") print("WARNING --------------------------------------------------------------------------------") print(self.bundles) del self.bundles[bundle_path] def add_bundle(self, bundle_path, config_path: str = "_resources/bundle_config"): if not self.staging_reseted: self.reset_staging_repo() #Let's get the demo conf from the demo folder. config_path = self.conf.get_repo_path()+"/"+bundle_path+"/"+config_path file = self.db.get("2.0/workspace/export", {"path": config_path, "format": "SOURCE", "direct_download": False}) if "content" not in file: raise Exception(f"Couldn't download bundle file: {config_path}. 
Check your bundle path if you added it manually.")
        content = base64.b64decode(file['content']).decode('utf8')
        #TODO not great, we can't download a file so need to use a notebook. We could use eval() to eval the cell but it's not super safe.
        #Need to wait for file support via api (Q3)
        lines = [l for l in content.split('\n') if not l.startswith("#") and len(l) > 0]
        j = "\n".join(lines)
        j = re.sub(r'[:\s*]True', ' true', j)
        j = re.sub(r'[:\s*]False', ' false', j)
        # Handle triple-quoted multi-line strings by converting them to valid JSON strings
        pattern = r'"""([\s\S]*?)"""'
        def replace_multiline(match):
            # Get the content of the multi-line string
            string_content = match.group(1)
            # Escape newlines and other special characters for JSON
            escaped = json.dumps(string_content)
            # Return the escaped string (without the outer quotes that dumps adds)
            return escaped
        j = re.sub(pattern, replace_multiline, j)
        try:
            json_conf = json.loads(j)
        except Exception as e:
            raise Exception(f"incorrect json setting for {config_path}: {e}. The cell should contain a python object.
Please use double quotes.\n {j}")
        demo_conf = DemoConf(bundle_path, json_conf)
        if not demo_conf.bundle:
            print(f'SKIPPING DEMO {demo_conf.name} as it is not flagged for bundle.')
        else:
            self.bundles[bundle_path] = demo_conf

    def reset_staging_repo(self, skip_pull = False):
        repo_path = self.conf.get_repo_path()
        print(f"Cloning repo {self.conf.repo_url} and pulling the latest content under {repo_path}...")
        repos_response = self.db.get("2.0/repos", {"path_prefix": repo_path})
        print(repos_response)
        if len(repos_response) == 0:
            print(f"creating repo under {repo_path}")
            self.db.post("2.0/repos", {"url": self.conf.repo_url, "provider": "gitHub", "path": repo_path})
            repos = self.db.get("2.0/repos", {"path_prefix": repo_path})['repos']
        else:
            repos = repos_response['repos']
        if skip_pull:
            self.head_commit_id = repos[0]['head_commit_id']
        else:
            print(f"Pulling the latest content from branch {self.conf.branch}")
            r = self.db.patch(f"2.0/repos/{repos[0]['id']}", {"branch": self.conf.branch})
            if 'error_code' in r:
                raise Exception(f"Couldn't pull the repo: {r}.
Please resolve the conflict or delete the repo first.")
            self.head_commit_id = r['head_commit_id']
        self.staging_reseted = True

    def start_and_wait_bundle_jobs(self, force_execution: bool = False, skip_execution: bool = False, recreate_jobs: bool = False):
        self.create_or_update_bundle_jobs(recreate_jobs)
        self.run_bundle_jobs(force_execution, skip_execution)
        self.wait_for_bundle_jobs_completion()

    def create_or_update_bundle_jobs(self, recreate_jobs: bool = False):
        with ThreadPoolExecutor(max_workers=10) as executor:
            confs = [c[1] for c in self.bundles.items()]
            def create_bundle_job(demo_conf):
                demo_conf.job_id = self.create_bundle_job(demo_conf, recreate_jobs)
            collections.deque(executor.map(create_bundle_job, confs))

    def get_head_commit(self):
        owner, repo = self.conf.repo_url.split('/')[-2:]
        headers = {
            "Accept": "application/vnd.github.v3+json",
            "Authorization": f"token {self.conf.github_token}"
        }
        # Get the latest commit (head) from the default branch
        response = requests.get(f"https://api.github.com/repos/{owner}/{repo}/commits/HEAD", headers=headers)
        if response.status_code != 200:
            raise Exception(f"Error fetching head commit: {response.status_code}, {response.text}. Please check your github token in your conf file, or get a new one at https://github.com/settings/tokens.")
        return response.json()['sha']

    def run_bundle_jobs(self, force_execution: bool = False, skip_execution = False):
        if not force_execution:
            head_commit = self.get_head_commit()
        with ThreadPoolExecutor(max_workers=10) as executor:
            def run_job(demo_conf):
                if demo_conf.job_id is not None:
                    execute = True
                    runs = self.db.get("2.1/jobs/runs/list", {"job_id": demo_conf.job_id, 'limit': 2, 'expand_tasks': "true"})
                    #Last run was successful
                    if 'runs' in runs and len(runs['runs']) > 0:
                        run = runs['runs'][0]
                        if run["status"]["state"] != "TERMINATED":
                            run = self.cancel_job_run(demo_conf, run)
                        if not force_execution:
                            if "termination_details" not in run["status"]:
                                raise Exception(f"termination_details missing, should not happen.
Job {demo_conf.name} status is {run['status']}") elif run["status"]["termination_details"]["code"] == "SUCCESS": print(f"Job {demo_conf.name} status is {run['status']['termination_details']}...") if skip_execution: execute = False demo_conf.run_id = run['run_id'] print(f"skipping job execution {demo_conf.name} as it was already run and skip_execution=True.") else: #last run was using the same commit version. most_recent_commit = '' for task in run['tasks']: # Safely get the commit if git_source and git_snapshot exist task_commit = task.get('git_source', {}).get('git_snapshot', {}).get('used_commit', '') if task_commit > most_recent_commit: most_recent_commit = task_commit if not self.check_if_demo_file_changed_since_commit(demo_conf, most_recent_commit, head_commit) and most_recent_commit != '': execute = False demo_conf.run_id = run['run_id'] print(f"skipping job execution for {demo_conf.name} as no files changed since last run. run with force_execution=true to override this check.") if execute: run = self.db.post("2.1/jobs/run-now", {"job_id": demo_conf.job_id}) demo_conf.run_id = run["run_id"] collections.deque(executor.map(run_job, [c[1] for c in self.bundles.items()])) def wait_for_bundle_jobs_completion(self): for _, demo_conf in self.bundles.items(): if demo_conf.run_id is not None: self.wait_for_bundle_job_completion(demo_conf) def wait_for_bundle_job_completion(self, demo_conf: DemoConf): if demo_conf.run_id is not None: i = 0 while self.db.get("2.1/jobs/runs/get", {"run_id": demo_conf.run_id})["state"]["life_cycle_state"] == "RUNNING": if i % 200 == 0: print(f"Waiting for {demo_conf.get_job_name()} completion... " f"{self.conf.workspace_url}/#job/{demo_conf.job_id}/run/{demo_conf.run_id}") i += 1 time.sleep(5) def create_bundle_job(self, demo_conf: DemoConf, recreate_jobs: bool = False): notebooks_to_run = demo_conf.get_notebooks_to_run() if len(notebooks_to_run) == 0: return None else: #default job conf. TODO: add specific job setup per demo if required? 
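`create_bundle_job` (below) leans on `merge_dict` from `dbdemos/conf.py` to layer cluster defaults without clobbering demo-specific values (`override=False`). The real implementation isn't shown in this chunk, so the following is only a plausible recursive merge with those semantics, not the actual dbdemos function:

```python
def merge_dict(dst: dict, src: dict, override: bool = True) -> dict:
    """Recursively merge src into dst in place.
    With override=False, keys already present in dst win; nested dicts are merged."""
    for key, value in src.items():
        if key in dst and isinstance(dst[key], dict) and isinstance(value, dict):
            merge_dict(dst[key], value, override)
        elif override or key not in dst:
            dst[key] = value
    return dst

# Demo-specific values survive, defaults fill the gaps, nested tags are merged
cluster = {"spark_version": "15.4.x-scala2.12", "custom_tags": {"project": "dbdemos"}}
defaults = {"spark_version": "14.3.x-scala2.12", "num_workers": 2, "custom_tags": {"demo": "x"}}
merge_dict(cluster, defaults, override=False)
```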
conf_template = ConfTemplate(self.conf.username, demo_conf.name) default_job_conf = json.loads(conf_template.replace_template_key(self.conf.default_cluster_job_template)) default_job_conf["git_source"]["git_url"] = self.conf.repo_url default_job_conf["git_source"]["git_branch"] = self.conf.branch cluster_conf = self.get_cluster_conf(demo_conf) #Update the job cluster with the specific demo setup if any for job_cluster in default_job_conf["job_clusters"]: merge_dict(job_cluster["new_cluster"], cluster_conf) job_cluster["new_cluster"]["single_user_name"] = self.conf.run_test_as_username # Custom instance (ex: gpu), not i3, remove the pool # expected format: {"AWS": "g5.4xlarge", "AZURE": "Standard_NC8as_T4_v3", "GCP": "a2-highgpu-1g"} if "node_type_id" in job_cluster["new_cluster"] and "AWS" in job_cluster["new_cluster"]["node_type_id"]: job_cluster["new_cluster"].pop('instance_pool_id', None) job_cluster["new_cluster"]["node_type_id"] = job_cluster["new_cluster"]["node_type_id"]["AWS"] job_cluster["new_cluster"]["driver_node_type_id"] = job_cluster["new_cluster"]["driver_node_type_id"]["AWS"] elif self.db.conf.get_demo_pool() is not None: job_cluster["new_cluster"]["instance_pool_id"] = self.db.conf.get_demo_pool() job_cluster["new_cluster"].pop("node_type_id", None) job_cluster["new_cluster"].pop("enable_elastic_disk", None) job_cluster["new_cluster"].pop("aws_attributes", None) elif 'instance_pool_id' in job_cluster["new_cluster"]: job_cluster["new_cluster"].pop('node_type_id', None) job_cluster["new_cluster"].pop("enable_elastic_disk", None) job_cluster["new_cluster"].pop("aws_attributes", None) job_cluster["new_cluster"].pop('cluster_name', None) job_cluster["new_cluster"].pop('autotermination_minutes', None) if job_cluster["new_cluster"]["spark_conf"].get("spark.databricks.cluster.profile", "") == "singleNode": del job_cluster["new_cluster"]["autoscale"] job_cluster["new_cluster"]["num_workers"] = 0 default_job_conf['tasks'] = [] # Added for unit testing 
01/13/2025. Enforcing single user to reduce # complexity of testing. default_job_conf["run_as"] = {"user_name": self.conf.run_test_as_username} for i, notebook in enumerate(notebooks_to_run): task = { "task_key": f"bundle_{demo_conf.name}_{i}", "notebook_task": { "notebook_path": demo_conf.path+"/"+notebook.path, "base_parameters": {"reset_all_data": "false"}, "source": "GIT" }, "libraries": notebook.libraries, "job_cluster_key": default_job_conf["job_clusters"][0]["job_cluster_key"], "timeout_seconds": 0, "email_notifications": {}} merge_dict(task["notebook_task"]["base_parameters"], notebook.parameters) if notebook.warehouse_id: del task["job_cluster_key"] task["notebook_task"]["warehouse_id"] = notebook.warehouse_id if notebook.depends_on_previous: task["depends_on"] = [{"task_key": f"bundle_{demo_conf.name}_{i-1}"}] default_job_conf['tasks'].append(task) if "depends_on" in default_job_conf['tasks'][0]: del default_job_conf['tasks'][0]["depends_on"] return self.create_or_update_job(demo_conf, default_job_conf, recreate_jobs) def create_or_update_job(self, demo_conf: DemoConf, job_conf: dict, recreate_jobs: bool = False): print(f'searching for job {job_conf["name"]}') existing_job = self.db.find_job(job_conf["name"]) if recreate_jobs: self.db.post("2.1/jobs/delete", {'job_id': existing_job['job_id']}) existing_job = None if existing_job is not None: # update the job print(f"test job {existing_job['job_id']} already existing for {demo_conf.name}, updating it with last config") self.db.post("2.1/jobs/reset", {'job_id': existing_job['job_id'], 'new_settings': job_conf}) return existing_job['job_id'] else: # create the job from scratch print(f"test job doesn't exist for {demo_conf.name}, creating a new one") r = self.db.post("2.1/jobs/create", job_conf) if 'job_id' not in r: raise Exception(f"Error starting the job for demo {demo_conf.name}: {r}. 
Please check your cluster/job setup {job_conf}") return r['job_id'] def check_if_demo_file_changed_since_commit(self, demo_conf: DemoConf, base_commit, last_commit = None): if base_commit is None or base_commit == '': return True owner, repo = self.conf.repo_url.split('/')[-2:] files = self.get_changed_files_since_commit(owner, repo, base_commit, last_commit) return any(f.startswith(demo_conf.path) for f in files) def get_changed_files_since_commit(self, owner, repo, base_commit, last_commit = None): headers = { "Accept": "application/vnd.github.v3+json", "Authorization": f"token {self.conf.github_token}" } if last_commit is None: last_commit = self.get_head_commit() # Compare the base commit with the latest commit compare_url = f"https://api.github.com/repos/{owner}/{repo}/compare/{base_commit}...{last_commit}" compare_response = requests.get(compare_url, headers=headers) if compare_response.status_code == 200: data = compare_response.json() files = data['files'] return [file['filename'] for file in files] else: raise Exception(f"Error fetching latest commit: {compare_response.status_code}, {compare_response.text}") def cancel_job_run(self, demo_conf: DemoConf, run): """Cancel a running job and wait for termination""" print(f"Job {demo_conf.name} status is {run['status']['state']}, cancelling it...") self.db.post("2.1/jobs/runs/cancel-all", {"job_id": demo_conf.job_id}) time.sleep(5) while True: run = self.db.get("2.1/jobs/runs/get", {"run_id": run['run_id']}) if run["status"]["state"] == "TERMINATED": break print(f"Waiting for job {demo_conf.name} to be terminated after cancellation...") time.sleep(10) return run ================================================ FILE: dbdemos/notebook_parser.py ================================================ from dbdemos.conf import DemoConf from .tracker import Tracker import urllib import re import base64 import json class NotebookParser: def __init__(self, html): self.html = html self.raw_content, self.content = 
self.get_notebook_content(html)

    def get_notebook_content(self, html):
        match = re.search(r'__DATABRICKS_NOTEBOOK_MODEL = \'(.*?)\'', html)
        raw_content = match.group(1)
        content = base64.b64decode(raw_content).decode('utf-8')
        content = urllib.parse.unquote(content)
        return raw_content, content

    def get_html(self):
        content = json.loads(self.content)
        #force the position to avoid a bug during import
        for i in range(len(content["commands"])):
            content["commands"][i]['position'] = i
        content = json.dumps(content)
        content = urllib.parse.quote(content, safe="()*''")
        return self.html.replace(self.raw_content, base64.b64encode(content.encode('utf-8')).decode('utf-8'))

    def contains(self, str):
        return str in self.content

    def remove_static_settings(self):
        #Remove the static settings tag, as it's too big & unnecessary to repeat in each notebook.
        self.html = re.sub("""<script>\s?window\.__STATIC_SETTINGS__.*</script>""", "", self.html)

    def set_tracker_tag(self, org_id, uid, category, demo_name, notebook, username):
        #Replace internal tags with dbdemos
        if Tracker.enable_tracker:
            tracker = Tracker(org_id, uid, username)
            #Our demos in the repo already have tags used when we clone the notebook directly.
            #We need to update the tracker with the demo configuration & dbdemos setup.
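To make the pixel rewrite in `set_tracker_tag` concrete, here is a standalone sketch. The `tracker_url` value is a made-up placeholder for what `tracker.get_track_url(...)` returns, and the pattern below is simplified: the real one also tolerates JSON-escaped quotes (`\"`) since the notebook content may still be in raw JSON form.

```python
import re

# Sketch of the tracker pixel rewrite: capture the <img> prefix and suffix,
# and swap only the analytics URL in the middle for the dbdemos tracker URL.
content = '<img width="1px" src="https://ppxrzfxige.execute-api.us-west-2.amazonaws.com/v1/analytics?demo=lakehouse"/>'
tracker_url = "https://example.com/track?demo=lakehouse&event=VIEW"  # placeholder for tracker.get_track_url(...)
pattern = r'(<img\s*width="1px"\s*src=")(https:\/\/ppxrzfxige\.execute-api\.us-west-2\.amazonaws\.com\/v1\/analytics.*?)("\s?\/?>)'
content = re.sub(pattern, rf'\1{tracker_url}\3', content)
print(content)
```

Keeping groups 1 and 3 intact means the surrounding `<img>` markup survives whatever quoting style the notebook uses.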
            tracker_url = tracker.get_track_url(category, demo_name, "VIEW", notebook)
            r = r"""(<img\s*width=\\?"1px\\?"\s*src=\\?")(https:\/\/ppxrzfxige\.execute-api\.us-west-2\.amazonaws\.com\/v1\/analytics.*?)(\\?"\s?\/?>)"""
            self.content = re.sub(r, rf'\1{tracker_url}\3', self.content)
            #old legacy tracker, to be migrated & removed
            r = r"""(<img\s*width=\\?"1px\\?"\s*src=\\?")(https:\/\/www\.google-analytics\.com\/collect.*?)(\\?"\s?\/?>)"""
            self.content = re.sub(r, rf'\1{tracker_url}\3', self.content)
        else:
            #Remove all the trackers from the notebook
            self.replace_in_notebook(r"""<img\s*width=\\?"1px\\?"\s*src=\\?"https:\/\/www\.google-analytics\.com\/collect.*?\\?"\s?\/?>""", "", True)
            self.replace_in_notebook(r"""<img\s*width=\\?"1px\\?"\s*src=\\?"https:\/\/ppxrzfxige\.execute-api\.us-west-2\.amazonaws\.com\/v1\/analytics.*?\\?"\s?\/?>""", "", True)

    def remove_uncomment_tag(self):
        self.replace_in_notebook('[#-]{1,2}\s*UNCOMMENT_FOR_DEMO ?', '', True)

    ##Remove the __build to avoid catalog conflict during build vs test
    # TODO: improve build and get a separate metastore for tests vs build.
    def remove_dbdemos_build(self):
        self.replace_in_notebook('dbdemos__build', 'dbdemos')

    def remove_robots_meta(self):
        #Drop the noindex tag
        self.html = self.html.replace('<meta name="robots" content="nofollow, noindex">', '')

    def add_cell_as_html_for_seo(self):
        #Add div as hidden HTML for SEO to capture the main information in the page.
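To illustrate what `add_cell_as_html_for_seo` produces, here is a standalone sketch of its core transformation: markdown cells are rendered as plain HTML and gathered into a hidden div injected right after `<body>`. The notebook JSON below is a minimal made-up example, not real demo content.

```python
import re

# Minimal sketch: only %md commands are kept, headings become <hN> tags,
# newlines become <br/>, and the result is hidden behind a no_js_render div.
notebook = {"commands": [{"command": "%md\n# Churn demo\nTrain a model"},
                         {"command": "print('code cell, ignored')"}]}
html_page = "<html><body><p>notebook</p></body></html>"

seo_html = ""
for c in notebook["commands"]:
    if c["command"].startswith("%md"):
        text = c["command"][len("%md"):]
        for i in reversed(range(1, 6)):
            text = re.sub(rf'\s*{"#"*i}\s*(.*)', rf'<h{i}>\1</h{i}>', text)
        seo_html += "<div>" + text.replace("\n", "<br/>") + "</div>"

html_page = html_page.replace("<body>", f"<body><div id='no_js_render' style='display: none'>{seo_html}</div>")
print(html_page)
```

The div stays hidden for regular visitors; the injected script in the real code only reveals it when the user agent looks like a crawler.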
        def md_to_html(text):
            if text.startswith('%md-sandbox'):
                text = text[len('%md-sandbox'):]
            if text.startswith('%md'):
                text = text[len('%md'):]
            #quick translation to html for seo
            for i in reversed(range(1,6)):
                tag = "#"*i
                text = re.sub(rf'\s*{tag}\s*(.*)', rf'<h{i}>\1</h{i}>', text)
            text = text.replace('\n', '<br/>')
            return text
        #Collect the markdown cells and add them as a hidden div for crawlers
        content = json.loads(self.content)
        html = ""
        for c in content["commands"]:
            if c['command'].startswith('%md'):
                html += '<div>'+md_to_html(c['command'])+'</div>'
        if len(html) > 0:
            self.html = self.html.replace('<body>', f'''<body><div id='no_js_render' style='display: none'>{html}</div>''')
            self.html = self.html.replace('<script>', "<script>window.addEventListener('load', function(event) { "
                                                      "if (/bot|google|baidu|bing|msn|teoma|slurp|yandex/i.test(navigator.userAgent)) {"
                                                      "document.getElementById('no_js_render').style.display = 'block';"
                                                      "};"
                                                      "});", 1)

    @staticmethod
    def _replace_with_optional_escaped_quotes(content: str, old: str, new: str) -> str:
        """
        Helper to replace text handling both escaped and unescaped quotes.
        In JSON content, quotes are escaped as \", but in parsed content they're not.
        We handle both by trying replacements with escaped quotes first, then unescaped.
        This is much faster than using regex.
        """
        # Try with escaped quotes first (JSON format: \")
        old_escaped = old.replace('"', '\\"')
        new_escaped = new.replace('"', '\\"')
        content = content.replace(old_escaped, new_escaped)
        # Then try with unescaped quotes (plain text format: ")
        content = content.replace(old, new)
        return content

    @staticmethod
    def replace_schema_in_content(content: str, demo_conf: DemoConf) -> str:
        """
        Static method to replace schema/catalog references in any content string.
        Used for both notebook content and FILE object types.
        """
        #main__build is used during the build process to avoid collision with default main.
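The quote-handling helper above can be exercised on its own; here is a minimal sketch (function renamed locally so it runs outside the class, sample strings are made up). The point is that the same logical replacement has to hit both the raw notebook JSON, where quotes arrive escaped, and already-parsed command text, where they don't.

```python
# Sketch of _replace_with_optional_escaped_quotes: run the replacement once
# with JSON-escaped quotes (\"), then once with plain quotes.
def replace_with_optional_escaped_quotes(content, old, new):
    content = content.replace(old.replace('"', '\\"'), new.replace('"', '\\"'))
    return content.replace(old, new)

# Raw notebook JSON escapes quotes; parsed command text does not.
raw_json = '{"command": "catalog = \\"main__build\\""}'
plain = 'catalog = "main__build"'
print(replace_with_optional_escaped_quotes(raw_json, 'catalog = "main__build"', 'catalog = "main"'))
print(replace_with_optional_escaped_quotes(plain, 'catalog = "main__build"', 'catalog = "main"'))
```

Plain `str.replace` is used here deliberately: as the docstring notes, it is much faster than a regex over a large notebook payload.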
        #main_build is used because agents don't support __ in their catalog name - TODO: should improve this and move everything to main_build
        content = NotebookParser._replace_with_optional_escaped_quotes(content, 'catalog = "main__build"', 'catalog = "main"')
        content = NotebookParser._replace_with_optional_escaped_quotes(content, 'catalog = "main_build"', 'catalog = "main"')
        content = content.replace(f'main__build.{demo_conf.default_schema}', f'main.{demo_conf.default_schema}')
        content = content.replace(f'main_build.{demo_conf.default_schema}', f'main.{demo_conf.default_schema}')
        content = content.replace('Volumes/main__build', 'Volumes/main')
        content = content.replace('Volumes/main_build', 'Volumes/main')
        #TODO: we need to unify this across all demos.
        if demo_conf.custom_schema_supported:
            content = re.sub(r"\$catalog=[0-9a-z_]*\s{1,3}\$schema=[0-9a-z_]*", f"$catalog={demo_conf.catalog} $schema={demo_conf.schema}", content)
            content = re.sub(r"\$catalog=[0-9a-z_]*\s{1,3}\$db=[0-9a-z_]*", f"$catalog={demo_conf.catalog} $db={demo_conf.schema}", content)
            content = content.replace(f"{demo_conf.default_catalog}.{demo_conf.default_schema}", f"{demo_conf.catalog}.{demo_conf.schema}")
            content = NotebookParser._replace_with_optional_escaped_quotes(content, f'dbutils.widgets.text("catalog", "{demo_conf.default_catalog}"', f'dbutils.widgets.text("catalog", "{demo_conf.catalog}"')
            content = NotebookParser._replace_with_optional_escaped_quotes(content, f'dbutils.widgets.text("schema", "{demo_conf.default_schema}"', f'dbutils.widgets.text("schema", "{demo_conf.schema}"')
            content = NotebookParser._replace_with_optional_escaped_quotes(content, f'dbutils.widgets.text("db", "{demo_conf.default_schema}"', f'dbutils.widgets.text("db", "{demo_conf.schema}"')
            content = content.replace(f'Volumes/{demo_conf.default_catalog}/{demo_conf.default_schema}', f'Volumes/{demo_conf.catalog}/{demo_conf.schema}')
            content = NotebookParser._replace_with_optional_escaped_quotes(content, f'catalog = 
"{demo_conf.default_catalog}"', f'catalog = "{demo_conf.catalog}"') content = NotebookParser._replace_with_optional_escaped_quotes(content, f'dbName = db = "{demo_conf.default_schema}"', f'dbName = db = "{demo_conf.schema}"') content = NotebookParser._replace_with_optional_escaped_quotes(content, f'schema = dbName = db = "{demo_conf.default_schema}"', f'schema = dbName = db = "{demo_conf.schema}"') content = NotebookParser._replace_with_optional_escaped_quotes(content, f'db = "{demo_conf.default_schema}"', f'db = "{demo_conf.schema}"') content = NotebookParser._replace_with_optional_escaped_quotes(content, f'schema = "{demo_conf.default_schema}"', f'schema = "{demo_conf.schema}"') content = content.replace(f'USE SCHEMA {demo_conf.default_schema}', f'USE SCHEMA {demo_conf.schema}') content = content.replace(f'USE CATALOG {demo_conf.default_catalog}', f'USE CATALOG {demo_conf.catalog}') content = content.replace(f'CREATE CATALOG IF NOT EXISTS {demo_conf.default_catalog}', f'CREATE CATALOG IF NOT EXISTS {demo_conf.catalog}') content = content.replace(f'CREATE SCHEMA IF NOT EXISTS {demo_conf.default_schema}', f'CREATE SCHEMA IF NOT EXISTS {demo_conf.schema}') return content def replace_schema(self, demo_conf: DemoConf): """Replace schema/catalog in notebook content""" self.content = NotebookParser.replace_schema_in_content(self.content, demo_conf) def replace_in_notebook(self, old, new, regex = False): if regex: self.content = re.sub(old, new, self.content) else: self.content = self.content.replace(old, new) def add_extra_cell(self, cell_content, position = 1): command = { "version": "CommandV1", "bindings": {}, "subtype": "command", "commandType": "auto", "position": position, "command": cell_content } content = json.loads(self.content) content["commands"].insert(position, command) self.content = json.dumps(content) #as auto ml links are unique per workspace, we have to delete them def remove_automl_result_links(self): if "display_automl_" in self.content: content = 
json.loads(self.content)
        for c in content["commands"]:
            if re.search('display_automl_[a-zA-Z]*_link', c["command"]):
                if 'results' in c and c['results'] is not None and 'data' in c['results'] and c['results']['data'] is not None and len(c['results']['data']) > 0:
                    contains_exp_link = len([d for d in c['results']['data'] if 'Data exploration notebook' in d['data']]) > 0
                    if contains_exp_link:
                        c['results']['data'] = [{'type': 'ansi', 'data': 'Please run the notebook cells to get your AutoML links (from the beginning)', 'name': None, 'arguments': {}, 'addedWidgets': {}, 'removedWidgets': [], 'datasetInfos': [], 'metadata': {}}]
        self.content = json.dumps(content)

    #Will change the content to turn relative links into minisite .html links
    def change_relative_links_for_minisite(self):
        #self.replace_in_notebook("""<a\s*(?:target="_blank")?\s*(?:rel="noopener noreferrer")?\s*href="\$\.\/(.*)">""", """<a href="./$1">""", True)
        self.replace_in_notebook("""\]\(\$\.\/(.*?)\)""", """](./\g<1>.html)""", True)

    def add_javascript_to_minisite_relative_links(self, notebook_path):
        # Add JavaScript to the HTML (not content) that intercepts link clicks
        # This is much more reliable than trying to modify the notebook content
        # Get the notebook's directory (remove filename)
        notebook_dir = '/'.join(notebook_path.split('/')[:-1])
        script = f"""
        <script type="text/javascript">
        (function() {{
            const NOTEBOOK_PATH = '{notebook_path}';
            const NOTEBOOK_DIR = '{notebook_dir}';
            function resolvePath(relativePath) {{
                // If path starts with /, it's absolute from root
                if (relativePath.startsWith('/')) {{
                    return relativePath.substring(1);
                }}
                // Otherwise, resolve relative to current notebook's directory
                if (!NOTEBOOK_DIR || NOTEBOOK_DIR === '') {{
                    return relativePath;
                }}
                // Combine directory with relative path
                let parts = NOTEBOOK_DIR.split('/').filter(p => p !== '');
                let pathParts = relativePath.split('/').filter(p => p !== '');
                for (let part of pathParts) {{
                    if (part === '..') {{
                        parts.pop();
                    }} else if (part !== '.') {{
                        parts.push(part);
                    }}
                }}
                return 
parts.join('/'); }} function setupMinisiteLinks() {{ // Find all links in the page const links = document.querySelectorAll('a[href]'); links.forEach(function(link) {{ const href = link.getAttribute('href'); if (!href) return; // Check if link has $ (internal demo link) if (href.includes('$')) {{ // All $ links are relative to the current notebook directory // Remove various prefixes: /$./ $./ /$. $. let relativePath = href; // Remove /$./ if (relativePath.includes('/$' + './')) {{ relativePath = relativePath.replace('/$' + './', ''); }} // Remove $./ else if (relativePath.includes('$' + './')) {{ relativePath = relativePath.replace('$' + './', ''); }} // Remove /$. else if (relativePath.includes('/$' + '.')) {{ relativePath = relativePath.replace('/$' + '.', ''); }} // Remove $. else if (relativePath.includes('$' + '.')) {{ relativePath = relativePath.replace('$' + '.', ''); }} // Just remove $ else {{ relativePath = relativePath.replace('$', ''); }} // Remove only a single leading ./ if present (preserve ../ for navigation) if (relativePath.startsWith('./')) {{ relativePath = relativePath.substring(2); }} // Always resolve against notebook directory let targetPath = resolvePath(relativePath) + '.html'; // Remove target and rel attributes immediately link.removeAttribute('target'); link.removeAttribute('rel'); // Change href to # immediately to prevent navigation link.setAttribute('href', '#'); // Store targetPath in data attribute for debugging link.setAttribute('data-target-path', targetPath); // Add click handler with capture phase link.addEventListener('click', function(e) {{ e.preventDefault(); e.stopPropagation(); // Send message to parent window if (window.parent && window.parent !== window) {{ window.parent.postMessage({{ type: 'dbdemos-navigate', targetPath: targetPath }}, '*'); }} else {{ window.location.href = targetPath; }} return false; }}, true); }} else if (!href.startsWith('#') && !href.startsWith('http')) {{ // Remove non-demo internal links 
(convert to plain text) const text = document.createTextNode(link.textContent); link.parentNode.replaceChild(text, link); }} }}); }} // Run when DOM is ready if (document.readyState === 'loading') {{ document.addEventListener('DOMContentLoaded', setupMinisiteLinks); }} else {{ setupMinisiteLinks(); }} // Also run after a short delay to catch dynamically loaded content setTimeout(setupMinisiteLinks, 500); }})(); </script> """ # Insert the script before </body> self.html = self.html.replace('</body>', script + '</body>') #Set the environment metadata to the notebook. # TODO: might want to re-evaluate this once we move to ipynb format as it'll be set in the ipynb file, as metadata. def set_environement_metadata(self, client_version: str = "3"): content = json.loads(self.content) env_metadata = content.get("environmentMetadata", {}) if env_metadata is None: env_metadata = {} if ("client" not in env_metadata or env_metadata["client"] is None or int(env_metadata["client"]) < int(client_version)): env_metadata["client"] = str(client_version) content["environmentMetadata"] = env_metadata self.content = json.dumps(content) def hide_commands_and_results(self): # self.replace_in_notebook('e2-demo-tools', 'xxxx', True) content = json.loads(self.content) for c in content["commands"]: if "#hide_this_code" in c["command"].lower(): c["hideCommandCode"] = True if "%run " in c["command"]: c["hideCommandResult"] = True if "results" in c and c["results"] is not None and "data" in c["results"] and c["results"]["data"] is not None and \ c["results"]["type"] == "table" and len(c["results"]["data"])>0 and str(c["results"]["data"][0][0]).startswith("This Delta Live Tables query is syntactically valid"): c["hideCommandResult"] = True self.content = json.dumps(content) def remove_delete_cell(self): content = json.loads(self.content) content["commands"] = [c for c in content["commands"] if "#dbdemos__delete_this_cell" not in c["command"].lower()] self.content = json.dumps(content) def 
replace_dynamic_links(self, items, name, link_path): if len(items) == 0: return matches = re.finditer(rf'<a\s*dbdemos-{name}-id=\\?[\'"](?P<item_id>.*?)\\?[\'"]\s*href=\\?[\'"].*?\/?{link_path}\/(?P<item_uid>[a-zA-Z0-9_-]*).*?>', self.content) for match in matches: item_id = match.groupdict()["item_id"] installed = False for i in items: if i["id"] == item_id: installed = True self.content = self.content.replace(match.groupdict()["item_uid"], str(i['uid'])) if not installed: print(f'''ERROR: couldn't find {name} with dbdemos-{name}-id={item_id}''') def replace_dynamic_links_workflow(self, workflows): """ Replace the links in the notebook with the workflow installed if any """ self.replace_dynamic_links(workflows, "workflow", "#job") def replace_dynamic_links_repo(self, repos): for r in repos: if r["uid"].startswith("/"): r["uid"] = r["uid"][1:] """ Replace the links in the notebook with the repos installed if any """ self.replace_dynamic_links(repos, "repo", "#workspace") def replace_dynamic_links_pipeline(self, pipelines_id): """ Replace the links in the notebook with the SDP pipeline installed if any """ self.replace_dynamic_links(pipelines_id, "pipeline", "#joblist/pipelines") def replace_dynamic_links_lakeview_dashboards(self, dashboards_id): """ Replace the links in the notebook with the Lakeview dashboard installed if any """ self.replace_dynamic_links(dashboards_id, "dashboard", "/sql/dashboardsv3") def replace_dynamic_links_genie(self, genie_rooms): """ Replace the links in the notebook with the Genie room installed if any """ self.replace_dynamic_links(genie_rooms, "genie", "/genie/rooms") ================================================ FILE: dbdemos/packager.py ================================================ import pkg_resources from pathlib import Path from .conf import DBClient, DemoConf, Conf, DemoNotebook from .notebook_parser import NotebookParser import json import os import re import shutil import base64 from .job_bundler import JobBundler from 
concurrent.futures import ThreadPoolExecutor import collections import zipfile import io class Packager: DASHBOARD_IMPORT_API = "_import_api" def __init__(self, conf: Conf, jobBundler: JobBundler): self.db = DBClient(conf) self.jobBundler = jobBundler def package_all(self, iframe_root_src = "./"): def package_demo(demo_conf: DemoConf): self.clean_bundle(demo_conf) self.package_demo(demo_conf) if len(demo_conf.dashboards) > 0: self.extract_lakeview_dashboards(demo_conf) self.build_minisite(demo_conf, iframe_root_src) confs = [demo_conf for _, demo_conf in self.jobBundler.bundles.items()] with ThreadPoolExecutor(max_workers=3) as executor: collections.deque(executor.map(package_demo, confs)) def clean_bundle(self, demo_conf: DemoConf): if Path(demo_conf.get_bundle_root_path()).exists(): shutil.rmtree(demo_conf.get_bundle_root_path()) def extract_lakeview_dashboards(self, demo_conf: DemoConf): for d in demo_conf.dashboards: repo_path = self.jobBundler.conf.get_repo_path()+"/"+demo_conf.path+"/_resources/dashboards/"+d['id']+".lvdash.json" repo_path = os.path.realpath(repo_path) dashboard_file = self.db.get("2.0/workspace/export", {"path": repo_path, "format": "SOURCE", "direct_download": False}) if 'error_code' in dashboard_file: raise Exception(f"Couldn't find dashboard {repo_path} in repo. Check repo ID in bundle conf file and make sure the dashboard is here. 
" f"{dashboard_file['error_code']} - {dashboard_file['message']}") dashboard_file = base64.b64decode(dashboard_file['content']).decode('utf-8') full_path = demo_conf.get_bundle_path()+"/_resources/dashboards/"+d['id']+".lvdash.json" Path(full_path[:full_path.rindex("/")]).mkdir(parents=True, exist_ok=True) with open(full_path, "w") as f: f.write(dashboard_file) def process_file_content(self, file, destination_path, extension = ""): # Decode base64 content from the folder dict file_content = base64.b64decode(file['content']) with open(destination_path + extension, "wb") as f: f.write(file_content) def process_notebook_content(self, demo_conf: DemoConf, html, full_path): #Replace notebook content. parser = NotebookParser(html) parser.remove_uncomment_tag() parser.set_environement_metadata(demo_conf.env_version) parser.remove_dbdemos_build() #parser.remove_static_settings() parser.hide_commands_and_results() #Moving away from the initial 00-global-setup, remove it once migration is completed requires_global_setup_v2 = False if parser.contains("00-global-setup-v2"): parser.replace_in_notebook('(?:\.\.\/)*_resources\/00-global-setup-v2', './00-global-setup-v2', True) requires_global_setup_v2 = True elif parser.contains("00-global-setup"): raise Exception("00-global-setup is deprecated. 
Please use 00-global-setup-v2 instead.") with open(full_path, "w") as f: f.write(parser.get_html()) return requires_global_setup_v2 def package_demo(self, demo_conf: DemoConf): print(f"packaging demo {demo_conf.name} ({demo_conf.path})") if len(demo_conf.get_notebooks_to_publish()) > 0 and not self.jobBundler.staging_reseted: self.jobBundler.reset_staging_repo() if len(demo_conf.get_notebooks_to_run()) > 0: run = self.db.get("2.1/jobs/runs/get", {"run_id": demo_conf.run_id, "include_history": False}) if 'state' not in run: raise Exception(f"Can't get the last job {self.db.conf.workspace_url}/#job/{demo_conf.job_id}/run/{demo_conf.run_id} state for demo {demo_conf.name}: {run}") if run['state']['result_state'] != 'SUCCESS': raise Exception(f"last job {self.db.conf.workspace_url}/#job/{demo_conf.job_id}/run/{demo_conf.run_id} failed for demo {demo_conf.name}. Can't package the demo. {run['state']}") def download_notebook_html(notebook: DemoNotebook): full_path = demo_conf.get_bundle_path()+"/"+notebook.get_clean_path() print(f"downloading {notebook.path} to {full_path}") Path(full_path[:full_path.rindex("/")]).mkdir(parents=True, exist_ok=True) if not notebook.pre_run: repo_path = self.jobBundler.conf.get_repo_path()+"/"+demo_conf.path+"/"+notebook.path repo_path = os.path.realpath(repo_path) #print(f"downloading from repo {repo_path}") status = self.db.get("2.0/workspace/get-status", {"path": repo_path}) if 'error_code' in status: raise Exception(f"Couldn't find file {repo_path} in workspace. Check notebook path in bundle conf file. {status['error_code']} - {status['message']}") #We add the type of the object in the conf to know how to load it back. demo_conf.update_notebook_object_type(notebook, status['object_type']) if status['object_type'] == 'NOTEBOOK': file = self.db.get("2.0/workspace/export", {"path": repo_path, "format": "HTML", "direct_download": False}) if 'error_code' in file: raise Exception(f"Couldn't find file {repo_path} in workspace. 
Check notebook path in bundle conf file. {file['error_code']} - {file['message']}")
                    html = base64.b64decode(file['content']).decode('utf-8')
                    return self.process_notebook_content(demo_conf, html, full_path+".html")
                elif status['object_type'] == 'DIRECTORY':
                    folder = self.db.get("2.0/workspace/export", {"path": repo_path, "format": "AUTO", "direct_download": True})
                    return self.process_file_content(folder, full_path, ".zip")
                elif status['object_type'] == 'FILE':
                    file = self.db.get("2.0/workspace/export", {"path": repo_path, "format": "AUTO", "direct_download": True})
                    return self.process_file_content(file, full_path)
                else:
                    raise Exception(f"Unsupported object type {status['object_type']} for {repo_path}")
            else:
                tasks = [t for t in run['tasks'] if t['notebook_task']['notebook_path'].endswith(notebook.get_clean_path())]
                if len(tasks) == 0:
                    raise Exception(f"couldn't find task for notebook {notebook.path}. Please re-run the job & make sure the staging git repo is synced / reset.")
                #print(f"Exporting notebook from job run {tasks[0]['run_id']}")
                notebook_result = self.db.get("2.1/jobs/runs/export", {'run_id': tasks[0]['run_id'], 'views_to_export': 'ALL'})
                if "views" not in notebook_result:
                    raise Exception(f"couldn't get notebook for run {tasks[0]['run_id']} - {notebook.path}. {demo_conf.name}. You probably did a run repair. Please re-run the job. 
- {notebook_result}")
            html = notebook_result["views"][0]["content"]
            return self.process_notebook_content(demo_conf, html, full_path+".html")

        requires_global_setup_v2 = False
        # Process notebooks in parallel with max 10 workers
        with ThreadPoolExecutor(max_workers=10) as executor:
            # Submit all notebooks for processing and collect futures
            futures = [executor.submit(download_notebook_html, notebook) for notebook in demo_conf.notebooks]
            # Process results as they complete
            for future in futures:
                rv1 = future.result()
                if rv1:
                    requires_global_setup_v2 = True
        #Add the global notebook if required
        if requires_global_setup_v2:
            init_notebook = DemoNotebook("_resources/00-global-setup-v2", "Global init", "Global init")
            demo_conf.add_notebook(init_notebook)
            file = self.db.get("2.0/workspace/export", {"path": self.jobBundler.conf.get_repo_path() +"/"+ init_notebook.path, "format": "HTML", "direct_download": False})
            if 'error_code' in file:
                raise Exception(f"Couldn't find file '{self.jobBundler.conf.get_repo_path()}/{init_notebook.path}' in workspace. Check notebook path in bundle conf file. 
{file['error_code']} - {file['message']}") html = base64.b64decode(file['content']).decode('utf-8') with open(demo_conf.get_bundle_path() + "/" + init_notebook.path+".html", "w") as f: f.write(html) def get_file_icon_svg(self, file_path: str) -> str: """ Get the appropriate SVG icon based on file type Args: file_path: Path to the file Returns: SVG icon as HTML string """ # Notebook icon (for .html notebook files) notebook_icon = '''<svg class="file-icon" xmlns="http://www.w3.org/2000/svg" width="1em" height="1em" fill="none" viewBox="0 0 16 16" aria-hidden="true" focusable="false"><path fill="currentColor" fill-rule="evenodd" d="M3 1.75A.75.75 0 0 1 3.75 1h10.5a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75H3.75a.75.75 0 0 1-.75-.75V12.5H1V11h2V8.75H1v-1.5h2V5H1V3.5h2zm1.5.75v11H6v-11zm3 0v11h6v-11z" clip-rule="evenodd"></path></svg>''' # Generic file icon (for .py, .sql, and other code files) file_icon = '''<svg class="file-icon" xmlns="http://www.w3.org/2000/svg" width="1em" height="1em" fill="none" viewBox="0 0 16 16" aria-hidden="true" focusable="false"><path fill="currentColor" fill-rule="evenodd" d="M2 1.75A.75.75 0 0 1 2.75 1h6a.75.75 0 0 1 .53.22l4.5 4.5c.141.14.22.331.22.53v9a.75.75 0 0 1-.75.75H2.75a.75.75 0 0 1-.75-.75zm1.5.75v12h9V7H8.75A.75.75 0 0 1 8 6.25V2.5zm6 1.06 1.94 1.94H9.5z" clip-rule="evenodd"></path></svg>''' if file_path.endswith(('.py', '.sql')): return file_icon else: return notebook_icon def build_tree_structure(self, notebooks_to_publish): """ Build a hierarchical tree structure from notebook paths Returns a nested dict representing the folder/file hierarchy """ tree = {} for notebook in notebooks_to_publish: parts = notebook.get_clean_path().split('/') current = tree # Navigate/create the folder structure for i, part in enumerate(parts[:-1]): if part not in current: current[part] = {'__type__': 'folder', '__children__': {}} current = current[part]['__children__'] # Add the file at the end filename = parts[-1] current[filename] = { 
'__type__': 'file',
                '__notebook__': notebook,
                '__path__': notebook.get_clean_path()
            }
        return tree

    def render_tree_html(self, tree, iframe_root_src="./", level=0):
        """
        Recursively render the tree structure as HTML with CSS-based tree lines
        Args:
            tree: The tree structure dict
            iframe_root_src: Root path for iframe sources
            level: Current depth level
        """
        html = ""
        items = sorted(tree.items(), key=lambda x: (x[1].get('__type__') != 'folder', x[0]))
        for idx, (name, node) in enumerate(items):
            is_last = (idx == len(items) - 1)
            if node['__type__'] == 'folder':
                # Render folder
                folder_id = f"folder_{level}_{idx}_{name.replace(' ', '_')}"
                html += f'''
                <div class="tree-item tree-folder {'tree-last' if is_last else ''}">
                    <div class="tree-item-row folder-row expanded" data-folder-id="{folder_id}" onclick="toggleFolder('{folder_id}')">
                        <svg class="folder-icon" xmlns="http://www.w3.org/2000/svg" width="1em" height="1em" fill="none" viewBox="0 0 16 16">
                            <path fill="currentColor" d="M.75 2a.75.75 0 0 0-.75.75v10.5c0 .414.336.75.75.75h14.5a.75.75 0 0 0 .75-.75v-8.5a.75.75 0 0 0-.75-.75H7.81L6.617 2.805A2.75 2.75 0 0 0 4.672 2z"></path>
                        </svg>
                        <span class="folder-name">{name}</span>
                    </div>
                    <div class="tree-children" id="{folder_id}">
                '''
                # Recursively render children
                html += self.render_tree_html(node['__children__'], iframe_root_src, level + 1)
                html += '''
                    </div>
                </div>
                '''
            else:
                # Render file
                notebook = node['__notebook__']
                path = node['__path__']
                file_icon = self.get_file_icon_svg(path)
                notebook_link = iframe_root_src + path + ".html"
                # Use the filename (last part of the path)
                filename = path.split('/')[-1]
                html += f'''
                <div class="tree-item tree-file {'tree-last' if is_last else ''}">
                    <a href="#" class="tree-item-row file-row _left_menu" iframe-src="{notebook_link}">
                        {file_icon}
                        <span class="file-name">{filename}</span>
                    </a>
                </div>
                '''
        return html

    def generate_html_from_code_file(self, code_file_path: str, output_html_path: str, demo_name: str):
        """
        Generate HTML file from .py or 
.sql code file with syntax highlighting Args: code_file_path: Path to the source code file (.py or .sql) output_html_path: Path where the HTML file should be saved demo_name: Name of the demo (for metadata) """ import html # Determine file type and language file_extension = code_file_path.split('.')[-1] file_name = os.path.basename(code_file_path) file_path_display = code_file_path if file_extension == 'py': language = 'python' file_type = 'Python' elif file_extension == 'sql': language = 'sql' file_type = 'SQL' else: raise ValueError(f"Unsupported file type: {file_extension}. Only .py and .sql are supported.") # Read the code file with open(code_file_path, 'r', encoding='utf-8') as f: code_content = f.read() # HTML escape the code content to prevent XSS code_content_escaped = html.escape(code_content) # Load the code viewer template template = pkg_resources.resource_string("dbdemos", "template/code_viewer.html").decode('UTF-8') # Replace placeholders template = template.replace("{{FILE_NAME}}", file_name) template = template.replace("{{FILE_TYPE}}", file_type) template = template.replace("{{FILE_PATH}}", file_path_display) template = template.replace("{{LANGUAGE}}", language) template = template.replace("{{CODE_CONTENT}}", code_content_escaped) template = template.replace("{{DEMO_NAME}}", demo_name) # Write the HTML file with open(output_html_path, 'w', encoding='utf-8') as f: f.write(template) #Build HTML pages with index. 
# - If the notebook is pre-run, load them from the install_package folder
    # - If the notebook isn't pre-run, download them from the package workspace as HTML (ex: can't run SDP pipelines)
    def build_minisite(self, demo_conf: DemoConf, iframe_root_src = "./"):
        notebooks_to_publish = demo_conf.get_notebooks_to_publish()
        print(f"Build minisite for demo {demo_conf.name} ({demo_conf.path}) - {notebooks_to_publish}")
        minisite_path = demo_conf.get_minisite_path()
        for notebook in notebooks_to_publish:
            Path(minisite_path).mkdir(parents=True, exist_ok=True)
            full_path = minisite_path+"/"+notebook.get_clean_path()+".html"
            Path(full_path[:full_path.rindex("/")]).mkdir(parents=True, exist_ok=True)
            # Check if we have a code file (.py or .sql) or notebook HTML file
            # Code files are stored with their full extension in the bundle
            clean_path = notebook.get_clean_path()
            if notebook.path.endswith(('.py', '.sql')):
                # Code file - path already includes extension (.py or .sql)
                source_file_path = demo_conf.get_bundle_path() + "/" + clean_path
                file_type = clean_path.split('.')[-1].upper()
                print(f"  Generating HTML from {file_type} file: {source_file_path}")
                self.generate_html_from_code_file(source_file_path, full_path, demo_conf.name)
            else:
                # Standard notebook HTML file - append .html extension
                source_file_path = demo_conf.get_bundle_path() + "/" + clean_path + ".html"
                if not os.path.exists(source_file_path):
                    raise FileNotFoundError(f"Could not find notebook file: {source_file_path}")
                with open(source_file_path, "r") as f:
                    parser = NotebookParser(f.read())
                with open(full_path, "w") as f:
                    parser.remove_robots_meta()
                    parser.add_cell_as_html_for_seo()
                    parser.remove_delete_cell()
                    parser.add_javascript_to_minisite_relative_links(notebook.get_clean_path())
                    f.write(parser.get_html())
        # Build the tree structure from all notebooks
        tree = self.build_tree_structure(notebooks_to_publish)
        # Render the tree as HTML
        tree_html = self.render_tree_html(tree, iframe_root_src)
        # Create the index file
        template = 
pkg_resources.resource_string("dbdemos", "template/index.html").decode('UTF-8') template = template.replace("{{LEFT_MENU}}", tree_html) template = template.replace("{{TITLE}}", demo_conf.title) template = template.replace("{{DESCRIPTION}}", demo_conf.description) template = template.replace("{{DEMO_NAME}}", demo_conf.name) with open(minisite_path+"/index.html", "w") as f: f.write(template) #dump the conf with open(demo_conf.get_bundle_root_path()+"/conf.json", "w") as f: f.write(json.dumps(demo_conf.json_conf)) ================================================ FILE: dbdemos/resources/default_cluster_config-AWS.json ================================================ { "node_type_id": "i3.xlarge", "aws_attributes": { "first_on_demand": 1, "availability": "SPOT_WITH_FALLBACK", "instance_profile_arn": null, "spot_bid_price_percent": 100, "ebs_volume_count": 0 } } ================================================ FILE: dbdemos/resources/default_cluster_config-AZURE.json ================================================ { "node_type_id": "Standard_D8ds_v4", "azure_attributes": { "first_on_demand": 1, "availability": "ON_DEMAND_AZURE", "spot_bid_max_price": -1 } } ================================================ FILE: dbdemos/resources/default_cluster_config-GCP.json ================================================ { "node_type_id": "n1-standard-8", "gcp_attributes": { "use_preemptible_executors": false, "availability": "ON_DEMAND_GCP", "zone_id": "HA" } } ================================================ FILE: dbdemos/resources/default_cluster_config.json ================================================ { "autoscale": { "min_workers": 4, "max_workers": 4 }, "cluster_name": "dbdemos-{{DEMO_NAME}}-{{CURRENT_USER_NAME}}", "spark_version": "16.4.x-cpu-ml-scala2.12", "spark_conf": { "spark.databricks.dataLineage.enabled": "true" }, "ssh_public_keys": [], "custom_tags": { "project": "dbdemos", "demo": "{{DEMO_NAME}}" }, "spark_env_vars": {}, "autotermination_minutes": 60, 
    "cluster_source": "UI",
    "init_scripts": [],
    "runtime_engine": "STANDARD"
}

================================================
FILE: dbdemos/resources/default_cluster_job_config.json
================================================
{
    "spark_version": "16.4.x-cpu-ml-scala2.12",
    "spark_conf": {
        "spark.databricks.dataLineage.enabled": "true"
    },
    "custom_tags": {
        "project": "dbdemos",
        "demo": "{{DEMO_NAME}}"
    }
}

================================================
FILE: dbdemos/resources/default_test_job_conf.json
================================================
{
    "name": "field-demos_{{DEMO_NAME}}",
    "email_notifications": {
        "no_alert_for_skipped_runs": false
    },
    "timeout_seconds": 0,
    "max_concurrent_runs": 1,
    "tasks": [],
    "job_clusters": [
        {
            "job_cluster_key": "field_demo_test",
            "new_cluster": {
                "spark_version": "16.4.x-cpu-ml-scala2.12",
                "custom_tags": {
                    "project": "dbdemos",
                    "demo_bundle_job": "autoloader",
                    "demo": "autoloader"
                },
                "spark_conf": {
                    "spark.databricks.dataLineage.enabled": "true"
                },
                "instance_pool_id": "1025-140806-yup112-pool-yz565bma",
                "data_security_mode": "NONE",
                "runtime_engine": "STANDARD",
                "num_workers": 3
            }
        }
    ],
    "git_source": {
        "git_url": "WILL BE OVERRIDDEN BY THE LOCAL CONF SETUP",
        "git_provider": "gitHub",
        "git_branch": "WILL BE OVERRIDDEN BY THE LOCAL CONF SETUP"
    },
    "format": "MULTI_TASK"
}

================================================
FILE: dbdemos/sql_query.py
================================================
import logging
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import StatementState, ExecuteStatementRequestOnWaitTimeout
from databricks.sdk.service.sql import ResultData, ResultManifest
from typing import List, Dict, Any
import time

from dbdemos.exceptions.dbdemos_exception import SQLQueryException

class SQLQueryExecutor:
    def __init__(self):
        self.logger = logging.getLogger(__name__)

    def get_or_create_shared_warehouse(self, ws: WorkspaceClient) -> str:
        warehouses = ws.warehouses.list()
        # First, look for a shared warehouse
        for warehouse in warehouses:
            if 'shared' in warehouse.name.lower():
                return warehouse.id
        # If no shared warehouse, look for a running warehouse
        for warehouse in warehouses:
            if warehouse.state == 'RUNNING':
                return warehouse.id
        # If no running warehouse, return the first available warehouse
        if warehouses:
            return warehouses[0].id
        # If no warehouses at all, create a new one
        new_warehouse = ws.warehouses.create(
            name="shared-warehouse",
            cluster_size="Small",
            auto_stop_mins=10
        )
        return new_warehouse.id

    def execute_query_as_list(self, ws: WorkspaceClient, query: str, timeout: int = 50, warehouse_id: str = None, debug: bool = False) -> List[Dict[str, Any]]:
        data, manifest = self.execute_query(ws, query, timeout, warehouse_id, debug)
        return self.get_results_formatted_as_list(data, manifest)

    def execute_query(self, ws: WorkspaceClient, query: str, timeout: int = 50, warehouse_id: str = None, debug: bool = False) -> tuple[ResultData, ResultManifest]:
        if not warehouse_id:
            warehouse_id = self.get_or_create_shared_warehouse(ws)
        if debug:
            print(f"Executing query: {query} with warehouse {warehouse_id}")
        # Execute the query, waiting up to `timeout` seconds for it to complete
        statement = ws.statement_execution.execute_statement(
            warehouse_id=warehouse_id,
            statement=query,
            wait_timeout=f"{timeout}s",
            on_wait_timeout=ExecuteStatementRequestOnWaitTimeout.CONTINUE
        )
        # If the statement is not completed within the wait_timeout, poll for results
        while statement.status.state in [StatementState.PENDING, StatementState.RUNNING]:
            time.sleep(1)
            statement = ws.statement_execution.get_statement(statement.statement_id)
        if statement.status.state == StatementState.FAILED:
            raise SQLQueryException(f"Query execution failed: {statement.status.error}")
        # Fetch initial results
        results = ws.statement_execution.get_statement(statement.statement_id)
        # Initialize combined result data
        combined_data = ResultData(data_array=[])
        if results.result and results.result.data_array:
            combined_data.data_array.extend(results.result.data_array)
        # Fetch additional chunks if they exist
        if results.manifest and results.manifest.chunks:
            chunk_index = results.manifest.chunks[0].chunk_index + 1
            while chunk_index < results.manifest.total_chunk_count:
                chunk = ws.statement_execution.get_statement_result_chunk_n(
                    statement_id=statement.statement_id,
                    chunk_index=chunk_index
                )
                if chunk.data_array:
                    combined_data.data_array.extend(chunk.data_array)
                chunk_index += 1
        return combined_data, results.manifest

    def get_results_formatted_as_list(self, result_data: ResultData, result_manifest: ResultManifest) -> List[Dict[str, Any]]:
        column_names = [col.name for col in result_manifest.schema.columns]
        result_list = []
        if result_data.data_array:
            for row in result_data.data_array:
                result_dict = {column_names[i]: value for i, value in enumerate(row)}
                result_list.append(result_dict)
        return result_list

================================================
FILE: dbdemos/template/LICENSE.html
================================================
<!DOCTYPE html> <html> <head> <meta name="databricks-html-version" content="1"> <title>LICENSE - Databricks

================================================
FILE: dbdemos/template/NOTICE.html
================================================
NOTICE - Databricks

================================================
FILE: dbdemos/template/README.html
================================================
README - Databricks

================================================
FILE: dbdemos/template/code_viewer.html
================================================
{{FILE_NAME}}

{{FILE_NAME}} {{FILE_TYPE}}

{{FILE_PATH}}
{{CODE_CONTENT}}
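The placeholders above are filled by plain string substitution in `Packager.generate_html_from_code_file` (shown earlier in `packager.py`). A minimal self-contained sketch of that flow; the inline `TEMPLATE` here is a hypothetical stand-in for the real `template/code_viewer.html` markup:

```python
import html

# Hypothetical stand-in for template/code_viewer.html; the real template
# wraps these placeholders in a full page layout.
TEMPLATE = "<h1>{{FILE_NAME}} ({{FILE_TYPE}})</h1><pre class='{{LANGUAGE}}'>{{CODE_CONTENT}}</pre>"

def render_code_viewer(file_name: str, file_type: str, language: str, code: str) -> str:
    # Escape the code before substitution so user content can't inject markup
    page = TEMPLATE.replace("{{FILE_NAME}}", file_name)
    page = page.replace("{{FILE_TYPE}}", file_type)
    page = page.replace("{{LANGUAGE}}", language)
    return page.replace("{{CODE_CONTENT}}", html.escape(code))

print(render_code_viewer("demo.sql", "SQL", "sql", "SELECT 1 WHERE a < b"))
```

Escaping `{{CODE_CONTENT}}` last (and only that placeholder) mirrors the packager: metadata values are trusted, while file contents are arbitrary and must not be interpreted as HTML.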
================================================
FILE: dbdemos/template/index.html
================================================
{{TITLE}}
%pip install dbdemos
import dbdemos
dbdemos.install('{{DEMO_NAME}}')
Try this demo in your workspace!
Run the following in your notebook:
License | Notice
databricks-logo

{{TITLE}}

{{DESCRIPTION}}
{{LEFT_MENU}}
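`{{LEFT_MENU}}` is produced by `Packager.build_tree_structure` and `render_tree_html`, which this extract does not include. A hypothetical sketch of how such a menu could be derived from notebook paths, assuming the tree is keyed by path segments (all names here are illustrative, not the packager's actual API):

```python
def build_tree(paths):
    # Fold "a/b/c" paths into nested dicts: {"a": {"b": {"c": {}}}}
    tree = {}
    for p in paths:
        node = tree
        for part in p.split("/"):
            node = node.setdefault(part, {})
    return tree

def render_menu(tree, iframe_root_src="./", prefix=""):
    # Leaves link to the generated <name>.html pages; folders nest <ul> lists
    items = []
    for name, children in sorted(tree.items()):
        if children:
            inner = render_menu(children, iframe_root_src, prefix + name + "/")
            items.append(f"<li>{name}<ul>{inner}</ul></li>")
        else:
            items.append(f"<li><a href='{iframe_root_src}{prefix}{name}.html'>{name}</a></li>")
    return "".join(items)

tree = build_tree(["01-intro", "02-advanced/02.1-setup", "02-advanced/02.2-run"])
menu_html = render_menu(tree)
```

The `iframe_root_src` parameter plays the same role as the `iframe_root_src` argument of `build_minisite`: it lets the generated links resolve relative to wherever the minisite is hosted.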
================================================
FILE: dbdemos/tracker.py
================================================
import requests
import urllib.parse
import hashlib

class Tracker:
    # Set this value to False to disable the dbdemos toolkit tracker.
    enable_tracker = True
    URL = "https://ppxrzfxige.execute-api.us-west-2.amazonaws.com/v1/analytics"

    def __init__(self, org_id, uid, email = None):
        self.org_id = org_id
        self.uid = uid
        # This is aggregating user behavior within Databricks at the org level to better understand dbdemos usage and improve the product.
        # We are not collecting any email/PII data. Please reach out to the demo team if you have any questions.
        if email is not None and email.endswith("@databricks.com"):
            self.email = email
        else:
            self.email = None

    def track_install(self, category, demo_name):
        self.track(category, demo_name, "INSTALL")

    def track_create_cluster(self, category, demo_name):
        self.track(category, demo_name, "CREATE_CLUSTER")

    def track_list(self):
        self.track("list_demos", "list_demos", "LIST")

    def get_user_hash(self):
        if self.email is None or not self.email.endswith("@databricks.com"):
            return None
        return hashlib.sha256(self.email.encode()).hexdigest()

    def get_track_url(self, category, demo_name, event, notebook = ""):
        params = self.get_track_params(category, demo_name, event, notebook)
        return Tracker.URL+"?"+urllib.parse.urlencode(params)

    def get_track_params(self, category, demo_name, event, notebook = ""):
        if not Tracker.enable_tracker:
            return {}
        if len(notebook) > 0:
            notebook = '/'+notebook
        params = {"category": category,
                  "org_id": self.org_id, #legacy "cid" -- ignore
                  "uid": self.uid,
                  "notebook": notebook,
                  "demo_name": demo_name,
                  "event": event,
                  "path": f"/_dbdemos/{category}/{demo_name}{notebook}", #legacy tracking "dp"
                  "version": 1}
        user_hash = self.get_user_hash()
        if user_hash is not None:
            params["user_hash"] = user_hash
        return params

    def track(self, category, demo_name, event):
        if self.org_id == "1660015457675682":
            print("skipping tracker for test / dev")
        elif Tracker.enable_tracker:
            headers = {"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
                       "accept-encoding": "gzip, deflate, br",
                       "accept-language": "en-US,en;q=0.9",
                       "cache-control": "max-age=0",
                       "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"}
            try:
                with requests.post(Tracker.URL, json = self.get_track_params(category, demo_name, event), headers=headers, timeout=5) as t:
                    if t.status_code != 200:
                        print(f"Info - Usage report error (internet access not available?). See the readme to disable it; you can safely ignore this. Details: {t.text}")
            except Exception as e:
                print("Usage report error. See the readme to disable it. "+(str(e)))

================================================
FILE: docs/CNAME
================================================
dbdemos.ai

================================================
FILE: docs/index.html
================================================
dbdemos.ai - Demos for Databricks
dbdemos.ai has a new home under databricks.com!
Redirecting to https://www.databricks.com/resources/demos/tutorials..

================================================
FILE: main.py
================================================
import json
from dbdemos.conf import Conf, DemoConf
from dbdemos.installer import Installer
from dbdemos.job_bundler import JobBundler
from dbdemos.packager import Packager
import traceback

with open("./local_conf_E2TOOL.json", "r") as r:
    c = json.loads(r.read())
with open("./dbdemos/resources/default_cluster_config.json", "r") as cc:
    default_cluster_template = cc.read()
with open("./dbdemos/resources/default_test_job_conf.json", "r") as cc:
    default_cluster_job_template = cc.read()
conf = Conf(c['username'], c['url'], c['org_id'], c['pat_token'],
            default_cluster_template, default_cluster_job_template,
            c['repo_staging_path'], c['repo_name'], c['repo_url'], c['branch'],
            github_token=c['github_token'])

def bundle():
    bundler = JobBundler(conf)
    # The bundler will use a staging repo dir in the workspace to analyze & run content.
    bundler.reset_staging_repo(skip_pull=False)
    """
    bundler.load_bundles_conf()
    #bundler.add_bundle("product_demos/Unity-Catalog/uc-05-upgrade")
    # Or manually add bundles to run faster:
    bundler.add_bundle("product_demos/Unity-Catalog/05-Upgrade-to-UC")
    bundler.add_bundle("product_demos/Unity-Catalog/02-External-location")
    #bundler.load_bundles_conf()
    """
    bundler.load_bundles_conf()
    #bundler.ignore_bundle("/product_demos/delta-sharing-airlines")
    #bundler.add_bundle("aibi/aibi-marketing-campaign")
    #bundler.add_bundle("demo-retail/lakehouse-retail-c360")
    # Run the jobs (only if there is a new commit since the last run, a previous failure, or forced execution)
    bundler.start_and_wait_bundle_jobs(force_execution = False, skip_execution=False, recreate_jobs=False)
    packager = Packager(conf, bundler)
    packager.package_all()

bundle()

# Load the conf to install
with open("local_conf_E2FE.json", "r") as r:
    c = json.loads(r.read())

from dbdemos.installer import Installer
import dbdemos

#dbdemos.list_demos(pat_token=c['pat_token'])
#dbdemos.install_all("/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], cloud="AWS", start_cluster = False, skip_dashboards=False, catalog='main_test_quentin')
#dbdemos.check_status_all()
#dbdemos.install("ai-agent", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='test_quentin_dbdemos_ai_agent', cloud="AWS", start_cluster = False, skip_dashboards=False, serverless=True)
#dbdemos.install("computer-vision-pcb", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test2', cloud="AWS", start_cluster = False, skip_dashboards=False)
#dbdemos.install("uc-01-acl", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], start_cluster = False, schema='test_quentin_acl', catalog='dbdemos')
#dbdemos.install("uc-02-external-location", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], cloud="AWS")
#dbdemos.install("uc-03-data-lineage", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], cloud="AWS")
#dbdemos.install("uc-05-upgrade", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'])
#dbdemos.install("dlt-cdc", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test2', cloud="AWS", start_cluster = False, debug=True, dlt_policy_id = "0003963E5B551CE4", dlt_compute_settings = {"autoscale": {"min_workers": 1, "max_workers": 5}})
#dbdemos.install("dlt-loans", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test2', cloud="AWS", start_cluster = False, debug=True)
#dbdemos.install("declarative-pipeline-unit-test", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='test_quentin_pipeline_unit_test', cloud="AWS", start_cluster = False, debug=True)
#dbdemos.install("llm-tools-functions", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], cloud="AWS", use_current_cluster=False, current_cluster_id=c["current_cluster_id"], schema='test_quentin_rag', catalog='dbdemos', debug=True)
#dbdemos.install("declarative-pipeline-unit-test", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test_unit_test', cloud="AWS", start_cluster = False, debug=True)
#dbdemos.install("auto-loader", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test2', cloud="AWS", start_cluster = False, debug=True)
""" """
#dbdemos.install_all("/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], cloud="AWS", start_cluster = False, skip_dashboards = True)
#dbdemos.check_status_all(c['username'], c['pat_token'], c['url'], cloud="AWS")
#dbdemos.install("delta-lake", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], cloud="AWS", start_cluster = True, skip_dashboards=False)
#dbdemos.install("delta-lake", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], cloud="AWS", start_cluster = True, skip_dashboards=False)
#installer = Installer()
#for d in installer.get_demos_available():
#    dbdemos.install(d, "/Users/quentin.ambard@databricks.com/test_dbdemos", True, c['username'], c['pat_token'], c['url'], cloud="AWS")
#dbdemos.list_demos(None)
#dbdemos.install("llm-rag-chatbot", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test_rag', cloud="AWS", start_cluster = False, skip_dashboards=True, use_current_cluster=True, debug = True)
#dbdemos.install("lakehouse-monitoring", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test_lhm', cloud="AWS", start_cluster = False, skip_dashboards=False, use_current_cluster=True, debug = True)
#dbdemos.install("uc-04-system-tables", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test_sys', cloud="AWS", start_cluster = False, debug = True, serverless=True)
#dbdemos.install("declarative-pipeline-cdc", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test2', cloud="AWS", start_cluster = False, skip_dashboards=False)
#dbdemos.install("lakehouse-retail-c360", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main_test_quentin2', schema='quentin_test2', cloud="AWS", start_cluster = False, skip_dashboards=False, create_schema=True, debug=True)
#dbdemos.install("lakehouse-fsi-smart-claims", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test2', cloud="AWS", start_cluster = False, skip_dashboards=False)
#dbdemos.install("lakehouse-fsi-credit", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test2', cloud="AWS", start_cluster = False, skip_dashboards=False)
#dbdemos.install("lakehouse-fsi-fraud", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test2', cloud="AWS", start_cluster = False, skip_dashboards=False)
#dbdemos.install("lakehouse-iot-platform", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test3', cloud="AWS", start_cluster = False)
#dbdemos.install("pipeline-bike", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test3', cloud="AWS", start_cluster = False)
#dbdemos.install("feature-store", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], cloud="AWS", use_current_cluster=False, current_cluster_id=c["current_cluster_id"])
#dbdemos.install("delta-lake", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], cloud="GCP")
#dbdemos.install("delta-lake", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], cloud="Azure", use_current_cluster=True, current_cluster_id=c["current_cluster_id"])
#dbdemos.install("mlops-end2end", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], cloud="AWS", skip_dashboards=True, schema='test_quentin_dbdemos', catalog='dbdemos')
#dbdemos.install("pandas-on-spark", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], cloud="AWS")
#dbdemos.install("delta-sharing-airlines", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'])
#dbdemos.install("dlt-loans", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'])
#dbdemos.install("dlt-unit-test", "/Users/quentin.ambard@databricks.com/test_install", True, c['username'], c['pat_token'], c['url'])
#dbdemos.create_cluster("uc-05-upgrade", c['username'], c['pat_token'], c['url'], "GCP")
dbdemos.install("aibi-marketing-campaign", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test_sys', cloud="AWS", start_cluster = False, debug = True)
#dbdemos.install("aibi-portfolio-assistant", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test_sys', cloud="AWS", start_cluster = False, debug = True)
#dbdemos.install("aibi-supply-chain-forecasting", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test_sys', cloud="AWS", start_cluster = False, debug = True)
#dbdemos.install("aibi-sales-pipeline-review", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test_sys', cloud="AWS", start_cluster = False, debug = True)
#dbdemos.install("aibi-patient-genomics", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test_sys', cloud="AWS", start_cluster = False, debug = True)
#dbdemos.install("aibi-customer-support", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test_sys', cloud="AWS", start_cluster = False, debug = True)
#dbdemos.install("dbt-on-databricks", "/Users/quentin.ambard@databricks.com/test_install_quentin", True, c['username'], c['pat_token'], c['url'], catalog='main', schema='quentin_test_sys', cloud="AWS", start_cluster = False, debug = True, serverless=True)

================================================
FILE: requirements.in
================================================
# Direct dependencies from setup.py
requests
pandas
databricks-sdk>=0.38.0

================================================
FILE: requirements.txt
================================================
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
#    pip-compile --generate-hashes --output-file=requirements.txt requirements.in
#
certifi==2026.2.25 \
    --hash=sha256:027692e4402ad994f1c42e52a4997a9763c646b73e4096e4d5d6db8af1d6f0fa \
    --hash=sha256:e887ab5cee78ea814d3472169153c2d12cd43b14bd03329a39a9c6e2e80bfba7
    # via requests
cffi==2.0.0 \
    --hash=sha256:00bdf7acc5f795150faa6957054fbbca2439db2f775ce831222b66f192f03beb \
    --hash=sha256:07b271772c100085dd28b74fa0cd81c8fb1a3ba18b21e03d7c27f3436a10606b \
    --hash=sha256:087067fa8953339c723661eda6b54bc98c5625757ea62e95eb4898ad5e776e9f \
    --hash=sha256:0a1527a803f0a659de1af2e1fd700213caba79377e27e4693648c2923da066f9 \
    --hash=sha256:0cf2d91ecc3fcc0625c2c530fe004f82c110405f101548512cce44322fa8ac44 \
--hash=sha256:0f6084a0ea23d05d20c3edcda20c3d006f9b6f3fefeac38f59262e10cef47ee2 \ --hash=sha256:12873ca6cb9b0f0d3a0da705d6086fe911591737a59f28b7936bdfed27c0d47c \ --hash=sha256:19f705ada2530c1167abacb171925dd886168931e0a7b78f5bffcae5c6b5be75 \ --hash=sha256:1cd13c99ce269b3ed80b417dcd591415d3372bcac067009b6e0f59c7d4015e65 \ --hash=sha256:1e3a615586f05fc4065a8b22b8152f0c1b00cdbc60596d187c2a74f9e3036e4e \ --hash=sha256:1f72fb8906754ac8a2cc3f9f5aaa298070652a0ffae577e0ea9bd480dc3c931a \ --hash=sha256:1fc9ea04857caf665289b7a75923f2c6ed559b8298a1b8c49e59f7dd95c8481e \ --hash=sha256:203a48d1fb583fc7d78a4c6655692963b860a417c0528492a6bc21f1aaefab25 \ --hash=sha256:2081580ebb843f759b9f617314a24ed5738c51d2aee65d31e02f6f7a2b97707a \ --hash=sha256:21d1152871b019407d8ac3985f6775c079416c282e431a4da6afe7aefd2bccbe \ --hash=sha256:24b6f81f1983e6df8db3adc38562c83f7d4a0c36162885ec7f7b77c7dcbec97b \ --hash=sha256:256f80b80ca3853f90c21b23ee78cd008713787b1b1e93eae9f3d6a7134abd91 \ --hash=sha256:28a3a209b96630bca57cce802da70c266eb08c6e97e5afd61a75611ee6c64592 \ --hash=sha256:2c8f814d84194c9ea681642fd164267891702542f028a15fc97d4674b6206187 \ --hash=sha256:2de9a304e27f7596cd03d16f1b7c72219bd944e99cc52b84d0145aefb07cbd3c \ --hash=sha256:38100abb9d1b1435bc4cc340bb4489635dc2f0da7456590877030c9b3d40b0c1 \ --hash=sha256:3925dd22fa2b7699ed2617149842d2e6adde22b262fcbfada50e3d195e4b3a94 \ --hash=sha256:3e17ed538242334bf70832644a32a7aae3d83b57567f9fd60a26257e992b79ba \ --hash=sha256:3e837e369566884707ddaf85fc1744b47575005c0a229de3327f8f9a20f4efeb \ --hash=sha256:3f4d46d8b35698056ec29bca21546e1551a205058ae1a181d871e278b0b28165 \ --hash=sha256:44d1b5909021139fe36001ae048dbdde8214afa20200eda0f64c068cac5d5529 \ --hash=sha256:45d5e886156860dc35862657e1494b9bae8dfa63bf56796f2fb56e1679fc0bca \ --hash=sha256:4647afc2f90d1ddd33441e5b0e85b16b12ddec4fca55f0d9671fef036ecca27c \ --hash=sha256:4671d9dd5ec934cb9a73e7ee9676f9362aba54f7f34910956b84d727b0d73fb6 \ 
--hash=sha256:53f77cbe57044e88bbd5ed26ac1d0514d2acf0591dd6bb02a3ae37f76811b80c \ --hash=sha256:5eda85d6d1879e692d546a078b44251cdd08dd1cfb98dfb77b670c97cee49ea0 \ --hash=sha256:5fed36fccc0612a53f1d4d9a816b50a36702c28a2aa880cb8a122b3466638743 \ --hash=sha256:61d028e90346df14fedc3d1e5441df818d095f3b87d286825dfcbd6459b7ef63 \ --hash=sha256:66f011380d0e49ed280c789fbd08ff0d40968ee7b665575489afa95c98196ab5 \ --hash=sha256:6824f87845e3396029f3820c206e459ccc91760e8fa24422f8b0c3d1731cbec5 \ --hash=sha256:6c6c373cfc5c83a975506110d17457138c8c63016b563cc9ed6e056a82f13ce4 \ --hash=sha256:6d02d6655b0e54f54c4ef0b94eb6be0607b70853c45ce98bd278dc7de718be5d \ --hash=sha256:6d50360be4546678fc1b79ffe7a66265e28667840010348dd69a314145807a1b \ --hash=sha256:730cacb21e1bdff3ce90babf007d0a0917cc3e6492f336c2f0134101e0944f93 \ --hash=sha256:737fe7d37e1a1bffe70bd5754ea763a62a066dc5913ca57e957824b72a85e205 \ --hash=sha256:74a03b9698e198d47562765773b4a8309919089150a0bb17d829ad7b44b60d27 \ --hash=sha256:7553fb2090d71822f02c629afe6042c299edf91ba1bf94951165613553984512 \ --hash=sha256:7a66c7204d8869299919db4d5069a82f1561581af12b11b3c9f48c584eb8743d \ --hash=sha256:7cc09976e8b56f8cebd752f7113ad07752461f48a58cbba644139015ac24954c \ --hash=sha256:81afed14892743bbe14dacb9e36d9e0e504cd204e0b165062c488942b9718037 \ --hash=sha256:8941aaadaf67246224cee8c3803777eed332a19d909b47e29c9842ef1e79ac26 \ --hash=sha256:89472c9762729b5ae1ad974b777416bfda4ac5642423fa93bd57a09204712322 \ --hash=sha256:8ea985900c5c95ce9db1745f7933eeef5d314f0565b27625d9a10ec9881e1bfb \ --hash=sha256:8eca2a813c1cb7ad4fb74d368c2ffbbb4789d377ee5bb8df98373c2cc0dee76c \ --hash=sha256:92b68146a71df78564e4ef48af17551a5ddd142e5190cdf2c5624d0c3ff5b2e8 \ --hash=sha256:9332088d75dc3241c702d852d4671613136d90fa6881da7d770a483fd05248b4 \ --hash=sha256:94698a9c5f91f9d138526b48fe26a199609544591f859c870d477351dc7b2414 \ --hash=sha256:9a67fc9e8eb39039280526379fb3a70023d77caec1852002b4da7e8b270c4dd9 \ 
--hash=sha256:9de40a7b0323d889cf8d23d1ef214f565ab154443c42737dfe52ff82cf857664 \ --hash=sha256:a05d0c237b3349096d3981b727493e22147f934b20f6f125a3eba8f994bec4a9 \ --hash=sha256:afb8db5439b81cf9c9d0c80404b60c3cc9c3add93e114dcae767f1477cb53775 \ --hash=sha256:b18a3ed7d5b3bd8d9ef7a8cb226502c6bf8308df1525e1cc676c3680e7176739 \ --hash=sha256:b1e74d11748e7e98e2f426ab176d4ed720a64412b6a15054378afdb71e0f37dc \ --hash=sha256:b21e08af67b8a103c71a250401c78d5e0893beff75e28c53c98f4de42f774062 \ --hash=sha256:b4c854ef3adc177950a8dfc81a86f5115d2abd545751a304c5bcf2c2c7283cfe \ --hash=sha256:b882b3df248017dba09d6b16defe9b5c407fe32fc7c65a9c69798e6175601be9 \ --hash=sha256:baf5215e0ab74c16e2dd324e8ec067ef59e41125d3eade2b863d294fd5035c92 \ --hash=sha256:c649e3a33450ec82378822b3dad03cc228b8f5963c0c12fc3b1e0ab940f768a5 \ --hash=sha256:c654de545946e0db659b3400168c9ad31b5d29593291482c43e3564effbcee13 \ --hash=sha256:c6638687455baf640e37344fe26d37c404db8b80d037c3d29f58fe8d1c3b194d \ --hash=sha256:c8d3b5532fc71b7a77c09192b4a5a200ea992702734a2e9279a37f2478236f26 \ --hash=sha256:cb527a79772e5ef98fb1d700678fe031e353e765d1ca2d409c92263c6d43e09f \ --hash=sha256:cf364028c016c03078a23b503f02058f1814320a56ad535686f90565636a9495 \ --hash=sha256:d48a880098c96020b02d5a1f7d9251308510ce8858940e6fa99ece33f610838b \ --hash=sha256:d68b6cef7827e8641e8ef16f4494edda8b36104d79773a334beaa1e3521430f6 \ --hash=sha256:d9b29c1f0ae438d5ee9acb31cadee00a58c46cc9c0b2f9038c6b0b3470877a8c \ --hash=sha256:d9b97165e8aed9272a6bb17c01e3cc5871a594a446ebedc996e2397a1c1ea8ef \ --hash=sha256:da68248800ad6320861f129cd9c1bf96ca849a2771a59e0344e88681905916f5 \ --hash=sha256:da902562c3e9c550df360bfa53c035b2f241fed6d9aef119048073680ace4a18 \ --hash=sha256:dbd5c7a25a7cb98f5ca55d258b103a2054f859a46ae11aaf23134f9cc0d356ad \ --hash=sha256:dd4f05f54a52fb558f1ba9f528228066954fee3ebe629fc1660d874d040ae5a3 \ --hash=sha256:de8dad4425a6ca6e4e5e297b27b5c824ecc7581910bf9aee86cb6835e6812aa7 \ 
--hash=sha256:e11e82b744887154b182fd3e7e8512418446501191994dbf9c9fc1f32cc8efd5 \ --hash=sha256:e6e73b9e02893c764e7e8d5bb5ce277f1a009cd5243f8228f75f842bf937c534 \ --hash=sha256:f73b96c41e3b2adedc34a7356e64c8eb96e03a3782b535e043a986276ce12a49 \ --hash=sha256:f93fd8e5c8c0a4aa1f424d6173f14a892044054871c771f8566e4008eaa359d2 \ --hash=sha256:fc33c5141b55ed366cfaad382df24fe7dcbc686de5be719b207bb248e3053dc5 \ --hash=sha256:fc7de24befaeae77ba923797c7c87834c73648a05a4bde34b3b7e5588973a453 \ --hash=sha256:fe562eb1a64e67dd297ccc4f5addea2501664954f2692b69a76449ec7913ecbf # via cryptography charset-normalizer==3.4.6 \ --hash=sha256:06a7e86163334edfc5d20fe104db92fcd666e5a5df0977cb5680a506fe26cc8e \ --hash=sha256:0c173ce3a681f309f31b87125fecec7a5d1347261ea11ebbb856fa6006b23c8c \ --hash=sha256:0e28d62a8fc7a1fa411c43bd65e346f3bce9716dc51b897fbe930c5987b402d5 \ --hash=sha256:0e901eb1049fdb80f5bd11ed5ea1e498ec423102f7a9b9e4645d5b8204ff2815 \ --hash=sha256:11afb56037cbc4b1555a34dd69151e8e069bee82e613a73bef6e714ce733585f \ --hash=sha256:150b8ce8e830eb7ccb029ec9ca36022f756986aaaa7956aad6d9ec90089338c0 \ --hash=sha256:172985e4ff804a7ad08eebec0a1640ece87ba5041d565fff23c8f99c1f389484 \ --hash=sha256:197c1a244a274bb016dd8b79204850144ef77fe81c5b797dc389327adb552407 \ --hash=sha256:1ae6b62897110aa7c79ea2f5dd38d1abca6db663687c0b1ad9aed6f6bae3d9d6 \ --hash=sha256:1cf0a70018692f85172348fe06d3a4b63f94ecb055e13a00c644d368eb82e5b8 \ --hash=sha256:1ed80ff870ca6de33f4d953fda4d55654b9a2b340ff39ab32fa3adbcd718f264 \ --hash=sha256:22c6f0c2fbc31e76c3b8a86fba1a56eda6166e238c29cdd3d14befdb4a4e4815 \ --hash=sha256:231d4da14bcd9301310faf492051bee27df11f2bc7549bc0bb41fef11b82daa2 \ --hash=sha256:259695e2ccc253feb2a016303543d691825e920917e31f894ca1a687982b1de4 \ --hash=sha256:2a24157fa36980478dd1770b585c0f30d19e18f4fb0c47c13aa568f871718579 \ --hash=sha256:2b1a63e8224e401cafe7739f77efd3f9e7f5f2026bda4aead8e59afab537784f \ --hash=sha256:2bd9d128ef93637a5d7a6af25363cf5dec3fa21cf80e68055aad627f280e8afa \ 
--hash=sha256:2e1d8ca8611099001949d1cdfaefc510cf0f212484fe7c565f735b68c78c3c95 \ --hash=sha256:2ef7fedc7a6ecbe99969cd09632516738a97eeb8bd7258bf8a0f23114c057dab \ --hash=sha256:2f7fdd9b6e6c529d6a2501a2d36b240109e78a8ceaef5687cfcfa2bbe671d297 \ --hash=sha256:30f445ae60aad5e1f8bdbb3108e39f6fbc09f4ea16c815c66578878325f8f15a \ --hash=sha256:31215157227939b4fb3d740cd23fe27be0439afef67b785a1eb78a3ae69cba9e \ --hash=sha256:34315ff4fc374b285ad7f4a0bf7dcbfe769e1b104230d40f49f700d4ab6bbd84 \ --hash=sha256:3516bbb8d42169de9e61b8520cbeeeb716f12f4ecfe3fd30a9919aa16c806ca8 \ --hash=sha256:3778fd7d7cd04ae8f54651f4a7a0bd6e39a0cf20f801720a4c21d80e9b7ad6b0 \ --hash=sha256:39f5068d35621da2881271e5c3205125cc456f54e9030d3f723288c873a71bf9 \ --hash=sha256:404a1e552cf5b675a87f0651f8b79f5f1e6fd100ee88dc612f89aa16abd4486f \ --hash=sha256:419a9d91bd238052642a51938af8ac05da5b3343becde08d5cdeab9046df9ee1 \ --hash=sha256:423fb7e748a08f854a08a222b983f4df1912b1daedce51a72bd24fe8f26a1843 \ --hash=sha256:4482481cb0572180b6fd976a4d5c72a30263e98564da68b86ec91f0fe35e8565 \ --hash=sha256:461598cd852bfa5a61b09cae2b1c02e2efcd166ee5516e243d540ac24bfa68a7 \ --hash=sha256:47955475ac79cc504ef2704b192364e51d0d473ad452caedd0002605f780101c \ --hash=sha256:48696db7f18afb80a068821504296eb0787d9ce239b91ca15059d1d3eaacf13b \ --hash=sha256:4be9f4830ba8741527693848403e2c457c16e499100963ec711b1c6f2049b7c7 \ --hash=sha256:4d1d02209e06550bdaef34af58e041ad71b88e624f5d825519da3a3308e22687 \ --hash=sha256:4f41da960b196ea355357285ad1316a00099f22d0929fe168343b99b254729c9 \ --hash=sha256:517ad0e93394ac532745129ceabdf2696b609ec9f87863d337140317ebce1c14 \ --hash=sha256:51fb3c322c81d20567019778cb5a4a6f2dc1c200b886bc0d636238e364848c89 \ --hash=sha256:5273b9f0b5835ff0350c0828faea623c68bfa65b792720c453e22b25cc72930f \ --hash=sha256:530d548084c4a9f7a16ed4a294d459b4f229db50df689bfe92027452452943a0 \ --hash=sha256:530e8cebeea0d76bdcf93357aa5e41336f48c3dc709ac52da2bb167c5b8271d9 \ 
--hash=sha256:54fae94be3d75f3e573c9a1b5402dc593de19377013c9a0e4285e3d402dd3a2a \ --hash=sha256:572d7c822caf521f0525ba1bce1a622a0b85cf47ffbdae6c9c19e3b5ac3c4389 \ --hash=sha256:58c948d0d086229efc484fe2f30c2d382c86720f55cd9bc33591774348ad44e0 \ --hash=sha256:5d11595abf8dd942a77883a39d81433739b287b6aa71620f15164f8096221b30 \ --hash=sha256:5f8ddd609f9e1af8c7bd6e2aca279c931aefecd148a14402d4e368f3171769fd \ --hash=sha256:5feb91325bbceade6afab43eb3b508c63ee53579fe896c77137ded51c6b6958e \ --hash=sha256:60c74963d8350241a79cb8feea80e54d518f72c26db618862a8f53e5023deaf9 \ --hash=sha256:613f19aa6e082cf96e17e3ffd89383343d0d589abda756b7764cf78361fd41dc \ --hash=sha256:659a1e1b500fac8f2779dd9e1570464e012f43e580371470b45277a27baa7532 \ --hash=sha256:695f5c2823691a25f17bc5d5ffe79fa90972cc34b002ac6c843bb8a1720e950d \ --hash=sha256:69dd852c2f0ad631b8b60cfbe25a28c0058a894de5abb566619c205ce0550eae \ --hash=sha256:6cceb5473417d28edd20c6c984ab6fee6c6267d38d906823ebfe20b03d607dc2 \ --hash=sha256:71be7e0e01753a89cf024abf7ecb6bca2c81738ead80d43004d9b5e3f1244e64 \ --hash=sha256:74119174722c4349af9708993118581686f343adc1c8c9c007d59be90d077f3f \ --hash=sha256:74a2e659c7ecbc73562e2a15e05039f1e22c75b7c7618b4b574a3ea9118d1557 \ --hash=sha256:7504e9b7dc05f99a9bbb4525c67a2c155073b44d720470a148b34166a69c054e \ --hash=sha256:79090741d842f564b1b2827c0b82d846405b744d31e84f18d7a7b41c20e473ff \ --hash=sha256:7a6967aaf043bceabab5412ed6bd6bd26603dae84d5cb75bf8d9a74a4959d398 \ --hash=sha256:7bda6eebafd42133efdca535b04ccb338ab29467b3f7bf79569883676fc628db \ --hash=sha256:7edbed096e4a4798710ed6bc75dcaa2a21b68b6c356553ac4823c3658d53743a \ --hash=sha256:7f9019c9cb613f084481bd6a100b12e1547cf2efe362d873c2e31e4035a6fa43 \ --hash=sha256:802168e03fba8bbc5ce0d866d589e4b1ca751d06edee69f7f3a19c5a9fe6b597 \ --hash=sha256:80d0a5615143c0b3225e5e3ef22c8d5d51f3f72ce0ea6fb84c943546c7b25b6c \ --hash=sha256:82060f995ab5003a2d6e0f4ad29065b7672b6593c8c63559beefe5b443242c3e \ 
--hash=sha256:836ab36280f21fc1a03c99cd05c6b7af70d2697e374c7af0b61ed271401a72a2 \ --hash=sha256:8761ac29b6c81574724322a554605608a9960769ea83d2c73e396f3df896ad54 \ --hash=sha256:87725cfb1a4f1f8c2fc9890ae2f42094120f4b44db9360be5d99a4c6b0e03a9e \ --hash=sha256:899d28f422116b08be5118ef350c292b36fc15ec2daeb9ea987c89281c7bb5c4 \ --hash=sha256:8bc5f0687d796c05b1e28ab0d38a50e6309906ee09375dd3aff6a9c09dd6e8f4 \ --hash=sha256:8bea55c4eef25b0b19a0337dc4e3f9a15b00d569c77211fa8cde38684f234fb7 \ --hash=sha256:8e5a94886bedca0f9b78fecd6afb6629142fd2605aa70a125d49f4edc6037ee6 \ --hash=sha256:90ca27cd8da8118b18a52d5f547859cc1f8354a00cd1e8e5120df3e30d6279e5 \ --hash=sha256:92734d4d8d187a354a556626c221cd1a892a4e0802ccb2af432a1d85ec012194 \ --hash=sha256:947cf925bc916d90adba35a64c82aace04fa39b46b52d4630ece166655905a69 \ --hash=sha256:95b52c68d64c1878818687a473a10547b3292e82b6f6fe483808fb1468e2f52f \ --hash=sha256:97d0235baafca5f2b09cf332cc275f021e694e8362c6bb9c96fc9a0eb74fc316 \ --hash=sha256:9ca4c0b502ab399ef89248a2c84c54954f77a070f28e546a85e91da627d1301e \ --hash=sha256:9cc4fc6c196d6a8b76629a70ddfcd4635a6898756e2d9cac5565cf0654605d73 \ --hash=sha256:9cc6e6d9e571d2f863fa77700701dae73ed5f78881efc8b3f9a4398772ff53e8 \ --hash=sha256:a056d1ad2633548ca18ffa2f85c202cfb48b68615129143915b8dc72a806a923 \ --hash=sha256:a26611d9987b230566f24a0a125f17fe0de6a6aff9f25c9f564aaa2721a5fb88 \ --hash=sha256:a4474d924a47185a06411e0064b803c68be044be2d60e50e8bddcc2649957c1f \ --hash=sha256:a4ea868bc28109052790eb2b52a9ab33f3aa7adc02f96673526ff47419490e21 \ --hash=sha256:a9e68c9d88823b274cf1e72f28cb5dc89c990edf430b0bfd3e2fb0785bfeabf4 \ --hash=sha256:aa9cccf4a44b9b62d8ba8b4dd06c649ba683e4bf04eea606d2e94cfc2d6ff4d6 \ --hash=sha256:ab30e5e3e706e3063bc6de96b118688cb10396b70bb9864a430f67df98c61ecc \ --hash=sha256:ac2393c73378fea4e52aa56285a3d64be50f1a12395afef9cce47772f60334c2 \ --hash=sha256:ad8faf8df23f0378c6d527d8b0b15ea4a2e23c89376877c598c4870d1b2c7866 \ 
--hash=sha256:b35b200d6a71b9839a46b9b7fff66b6638bb52fc9658aa58796b0326595d3021 \ --hash=sha256:b3694e3f87f8ac7ce279d4355645b3c878d24d1424581b46282f24b92f5a4ae2 \ --hash=sha256:b4ff1d35e8c5bd078be89349b6f3a845128e685e751b6ea1169cf2160b344c4d \ --hash=sha256:bbc8c8650c6e51041ad1be191742b8b421d05bbd3410f43fa2a00c8db87678e8 \ --hash=sha256:bc72863f4d9aba2e8fd9085e63548a324ba706d2ea2c83b260da08a59b9482de \ --hash=sha256:bf625105bb9eef28a56a943fec8c8a98aeb80e7d7db99bd3c388137e6eb2d237 \ --hash=sha256:c2274ca724536f173122f36c98ce188fd24ce3dad886ec2b7af859518ce008a4 \ --hash=sha256:c45a03a4c69820a399f1dda9e1d8fbf3562eda46e7720458180302021b08f778 \ --hash=sha256:c8ae56368f8cc97c7e40a7ee18e1cedaf8e780cd8bc5ed5ac8b81f238614facb \ --hash=sha256:c907cdc8109f6c619e6254212e794d6548373cc40e1ec75e6e3823d9135d29cc \ --hash=sha256:ca0276464d148c72defa8bb4390cce01b4a0e425f3b50d1435aa6d7a18107602 \ --hash=sha256:cd5e2801c89992ed8c0a3f0293ae83c159a60d9a5d685005383ef4caca77f2c4 \ --hash=sha256:d08ec48f0a1c48d75d0356cea971921848fb620fdeba805b28f937e90691209f \ --hash=sha256:d1a2ee9c1499fc8f86f4521f27a973c914b211ffa87322f4ee33bb35392da2c5 \ --hash=sha256:d5f5d1e9def3405f60e3ca8232d56f35c98fb7bf581efcc60051ebf53cb8b611 \ --hash=sha256:d60377dce4511655582e300dc1e5a5f24ba0cb229005a1d5c8d0cb72bb758ab8 \ --hash=sha256:d73beaac5e90173ac3deb9928a74763a6d230f494e4bfb422c217a0ad8e629bf \ --hash=sha256:d7de2637729c67d67cf87614b566626057e95c303bc0a55ffe391f5205e7003d \ --hash=sha256:dad6e0f2e481fffdcf776d10ebee25e0ef89f16d691f1e5dee4b586375fdc64b \ --hash=sha256:dda86aba335c902b6149a02a55b38e96287157e609200811837678214ba2b1db \ --hash=sha256:df01808ee470038c3f8dc4f48620df7225c49c2d6639e38f96e6d6ac6e6f7b0e \ --hash=sha256:e1f6e2f00a6b8edb562826e4632e26d063ac10307e80f7461f7de3ad8ef3f077 \ --hash=sha256:e25369dc110d58ddf29b949377a93e0716d72a24f62bad72b2b39f155949c1fd \ --hash=sha256:e3c701e954abf6fc03a49f7c579cc80c2c6cc52525340ca3186c41d3f33482ef \ 
--hash=sha256:e5bcc1a1ae744e0bb59641171ae53743760130600da8db48cbb6e4918e186e4e \ --hash=sha256:e68c14b04827dd76dcbd1aeea9e604e3e4b78322d8faf2f8132c7138efa340a8 \ --hash=sha256:e8aeb10fcbe92767f0fa69ad5a72deca50d0dca07fbde97848997d778a50c9fe \ --hash=sha256:e985a16ff513596f217cee86c21371b8cd011c0f6f056d0920aa2d926c544058 \ --hash=sha256:ecbbd45615a6885fe3240eb9db73b9e62518b611850fdf8ab08bd56de7ad2b17 \ --hash=sha256:ee4ec14bc1680d6b0afab9aea2ef27e26d2024f18b24a2d7155a52b60da7e833 \ --hash=sha256:ef5960d965e67165d75b7c7ffc60a83ec5abfc5c11b764ec13ea54fbef8b4421 \ --hash=sha256:f0cdaecd4c953bfae0b6bb64910aaaca5a424ad9c72d85cb88417bb9814f7550 \ --hash=sha256:f1ce721c8a7dfec21fcbdfe04e8f68174183cf4e8188e0645e92aa23985c57ff \ --hash=sha256:f50498891691e0864dc3da965f340fada0771f6142a378083dc4608f4ea513e2 \ --hash=sha256:f5ea69428fa1b49573eef0cc44a1d43bebd45ad0c611eb7d7eac760c7ae771bc \ --hash=sha256:f61aa92e4aad0be58eb6eb4e0c21acf32cf8065f4b2cae5665da756c4ceef982 \ --hash=sha256:f6e4333fb15c83f7d1482a76d45a0818897b3d33f00efd215528ff7c51b8e35d \ --hash=sha256:f820f24b09e3e779fe84c3c456cb4108a7aa639b0d1f02c28046e11bfcd088ed \ --hash=sha256:f98059e4fcd3e3e4e2d632b7cf81c2faae96c43c60b569e9c621468082f1d104 \ --hash=sha256:fcce033e4021347d80ed9c66dcf1e7b1546319834b74445f561d2e2221de5659 # via requests cryptography==46.0.6 \ --hash=sha256:02fad249cb0e090b574e30b276a3da6a149e04ee2f049725b1f69e7b8351ec70 \ --hash=sha256:063b67749f338ca9c5a0b7fe438a52c25f9526b851e24e6c9310e7195aad3b4d \ --hash=sha256:12cae594e9473bca1a7aceb90536060643128bb274fcea0fc459ab90f7d1ae7a \ --hash=sha256:12f0fa16cc247b13c43d56d7b35287ff1569b5b1f4c5e87e92cc4fcc00cd10c0 \ --hash=sha256:22259338084d6ae497a19bae5d4c66b7ca1387d3264d1c2c0e72d9e9b6a77b97 \ --hash=sha256:26031f1e5ca62fcb9d1fcb34b2b60b390d1aacaa15dc8b895a9ed00968b97b30 \ --hash=sha256:27550628a518c5c6c903d84f637fbecf287f6cb9ced3804838a1295dc1fd0759 \ --hash=sha256:2b417edbe8877cda9022dde3a008e2deb50be9c407eef034aeeb3a8b11d9db3c \ 
--hash=sha256:2ea0f37e9a9cf0df2952893ad145fd9627d326a59daec9b0802480fa3bcd2ead \ --hash=sha256:2ef9e69886cbb137c2aef9772c2e7138dc581fad4fcbcf13cc181eb5a3ab6275 \ --hash=sha256:341359d6c9e68834e204ceaf25936dffeafea3829ab80e9503860dcc4f4dac58 \ --hash=sha256:380343e0653b1c9d7e1f55b52aaa2dbb2fdf2730088d48c43ca1c7c0abb7cc2f \ --hash=sha256:3c21d92ed15e9cfc6eb64c1f5a0326db22ca9c2566ca46d845119b45b4400361 \ --hash=sha256:3dfa6567f2e9e4c5dceb8ccb5a708158a2a871052fa75c8b78cb0977063f1507 \ --hash=sha256:456b3215172aeefb9284550b162801d62f5f264a081049a3e94307fe20792cfa \ --hash=sha256:4668298aef7cddeaf5c6ecc244c2302a2b8e40f384255505c22875eebb47888b \ --hash=sha256:50575a76e2951fe7dbd1f56d181f8c5ceeeb075e9ff88e7ad997d2f42af06e7b \ --hash=sha256:639301950939d844a9e1c4464d7e07f902fe9a7f6b215bb0d4f28584729935d8 \ --hash=sha256:64235194bad039a10bb6d2d930ab3323baaec67e2ce36215fd0952fad0930ca8 \ --hash=sha256:6617f67b1606dfd9fe4dbfa354a9508d4a6d37afe30306fe6c101b7ce3274b72 \ --hash=sha256:67177e8a9f421aa2d3a170c3e56eca4e0128883cf52a071a7cbf53297f18b175 \ --hash=sha256:6728c49e3b2c180ef26f8e9f0a883a2c585638db64cf265b49c9ba10652d430e \ --hash=sha256:6739d56300662c468fddb0e5e291f9b4d084bead381667b9e654c7dd81705124 \ --hash=sha256:69cf0056d6947edc6e6760e5f17afe4bea06b56a9ac8a06de9d2bd6b532d4f3a \ --hash=sha256:760997a4b950ff00d418398ad73fbc91aa2894b5c1db7ccb45b4f68b42a63b3c \ --hash=sha256:79e865c642cfc5c0b3eb12af83c35c5aeff4fa5c672dc28c43721c2c9fdd2f0f \ --hash=sha256:7e6142674f2a9291463e5e150090b95a8519b2fb6e6aaec8917dd8d094ce750d \ --hash=sha256:7f417f034f91dcec1cb6c5c35b07cdbb2ef262557f701b4ecd803ee8cefed4f4 \ --hash=sha256:7f6690b6c55e9c5332c0b59b9c8a3fb232ebf059094c17f9019a51e9827df91c \ --hash=sha256:8927ccfbe967c7df312ade694f987e7e9e22b2425976ddbf28271d7e58845290 \ --hash=sha256:8ce35b77aaf02f3b59c90b2c8a05c73bac12cea5b4e8f3fbece1f5fddea5f0ca \ --hash=sha256:8e7304c4f4e9490e11efe56af6713983460ee0780f16c63f219984dab3af9d2d \ 
--hash=sha256:90e5f0a7b3be5f40c3a0a0eafb32c681d8d2c181fc2a1bdabe9b3f611d9f6b1a \ --hash=sha256:97c8115b27e19e592a05c45d0dd89c57f81f841cc9880e353e0d3bf25b2139ed \ --hash=sha256:9a693028b9cbe51b5a1136232ee8f2bc242e4e19d456ded3fa7c86e43c713b4a \ --hash=sha256:9a9c42a2723999a710445bc0d974e345c32adfd8d2fac6d8a251fa829ad31cfb \ --hash=sha256:a3e84d5ec9ba01f8fd03802b2147ba77f0c8f2617b2aff254cedd551844209c8 \ --hash=sha256:aad75154a7ac9039936d50cf431719a2f8d4ed3d3c277ac03f3339ded1a5e707 \ --hash=sha256:b12c6b1e1651e42ab5de8b1e00dc3b6354fdfd778e7fa60541ddacc27cd21410 \ --hash=sha256:b928a3ca837c77a10e81a814a693f2295200adb3352395fad024559b7be7a736 \ --hash=sha256:bcb87663e1f7b075e48c3be3ecb5f0b46c8fc50b50a97cf264e7f60242dca3f2 \ --hash=sha256:c797e2517cb7880f8297e2c0f43bb910e91381339336f75d2c1c2cbf811b70b4 \ --hash=sha256:c89eb37fae9216985d8734c1afd172ba4927f5a05cfd9bf0e4863c6d5465b013 \ --hash=sha256:cdcd3edcbc5d55757e5f5f3d330dd00007ae463a7e7aa5bf132d1f22a4b62b19 \ --hash=sha256:d24c13369e856b94892a89ddf70b332e0b70ad4a5c43cf3e9cb71d6d7ffa1f7b \ --hash=sha256:d4e4aadb7fc1f88687f47ca20bb7227981b03afaae69287029da08096853b738 \ --hash=sha256:d9528b535a6c4f8ff37847144b8986a9a143585f0540fbcb1a98115b543aa463 \ --hash=sha256:ed3775295fb91f70b4027aeba878d79b3e55c0b3e97eaa4de71f8f23a9f2eb77 \ --hash=sha256:ed418c37d095aeddf5336898a132fba01091f0ac5844e3e8018506f014b6d2c4 # via google-auth databricks-sdk==0.102.0 \ --hash=sha256:75d1253276ee8f3dd5e7b00d62594b7051838435e618f74a8570a6dbd723ec12 \ --hash=sha256:8fa5f82317ee27cc46323c6e2543d2cfefb4468653f92ba558271043c6f72fb9 # via -r requirements.in google-auth==2.49.1 \ --hash=sha256:16d40da1c3c5a0533f57d268fe72e0ebb0ae1cc3b567024122651c045d879b64 \ --hash=sha256:195ebe3dca18eddd1b3db5edc5189b76c13e96f29e73043b923ebcf3f1a860f7 # via databricks-sdk idna==3.11 \ --hash=sha256:771a87f49d9defaf64091e6e6fe9c18d4833f140bd19464795bc32d966ca37ea \ --hash=sha256:795dafcc9c04ed0c1fb032c2aa73654d8e8c5023a7df64a53f39190ada629902 # via requests 
numpy==2.4.3 \ --hash=sha256:0200b25c687033316fb39f0ff4e3e690e8957a2c3c8d22499891ec58c37a3eb5 \ --hash=sha256:0448e7f9caefb34b4b7dd2b77f21e8906e5d6f0365ad525f9f4f530b13df2afc \ --hash=sha256:0a195f4216be9305a73c0e91c9b026a35f2161237cf1c6de9b681637772ea657 \ --hash=sha256:0a60e17a14d640f49146cb38e3f105f571318db7826d9b6fef7e4dce758faecd \ --hash=sha256:120df8c0a81ebbf5b9020c91439fccd85f5e018a927a39f624845be194a2be02 \ --hash=sha256:148d59127ac95979d6f07e4d460f934ebdd6eed641db9c0db6c73026f2b2101a \ --hash=sha256:1ec84fd7c8e652b0f4aaaf2e6e9cc8eaa9b1b80a537e06b2e3a2fb176eedcb26 \ --hash=sha256:22654fe6be0e5206f553a9250762c653d3698e46686eee53b399ab90da59bd92 \ --hash=sha256:22c31dc07025123aedf7f2db9e91783df13f1776dc52c6b22c620870dc0fab22 \ --hash=sha256:23b46bb6d8ecb68b58c09944483c135ae5f0e9b8d8858ece5e4ead783771d2a9 \ --hash=sha256:2629289168f4897a3c4e23dc98d6f1731f0fc0fe52fb9db19f974041e4cc12b9 \ --hash=sha256:26952e18d82a1dbbc2f008d402021baa8d6fc8e84347a2072a25e08b46d698b9 \ --hash=sha256:29363fbfa6f8ee855d7569c96ce524845e3d726d6c19b29eceec7dd555dab152 \ --hash=sha256:297837823f5bc572c5f9379b0c9f3a3365f08492cbdc33bcc3af174372ebb168 \ --hash=sha256:2abad5c7fef172b3377502bde47892439bae394a71bc329f31df0fd829b41a9e \ --hash=sha256:2b3f8d2c4589b1a2028d2a770b0fc4d1f332fb5e01521f4de3199a896d158ddd \ --hash=sha256:2ddb7919366ee468342b91dea2352824c25b55814a987847b6c52003a7c97f15 \ --hash=sha256:2e03c05abaee1f672e9d67bc858f300b5ccba1c21397211e8d77d98350972093 \ --hash=sha256:32e3bef222ad6b052280311d1d60db8e259e4947052c3ae7dd6817451fc8a4c5 \ --hash=sha256:33b3bf58ee84b172c067f56aeadc7ee9ab6de69c5e800ab5b10295d54c581adb \ --hash=sha256:45f003dbdffb997a03da2d1d0cb41fbd24a87507fb41605c0420a3db5bd4667b \ --hash=sha256:483a201202b73495f00dbc83796c6ae63137a9bdade074f7648b3e32613412dd \ --hash=sha256:48da3a4ee1336454b07497ff7ec83903efa5505792c4e6d9bf83d99dc07a1e18 \ --hash=sha256:4b42639cdde6d24e732ff823a3fa5b701d8acad89c4142bc1d0bd6dc85200ba5 \ 
--hash=sha256:4bd4741a6a676770e0e97fe9ab2e51de01183df3dcbcec591d26d331a40de950 \ --hash=sha256:4d382735cecd7bcf090172489a525cd7d4087bc331f7df9f60ddc9a296cf208e \ --hash=sha256:52077feedeff7c76ed7c9f1a0428558e50825347b7545bbb8523da2cd55c547a \ --hash=sha256:54f29b877279d51e210e0c80709ee14ccbbad647810e8f3d375561c45ef613dd \ --hash=sha256:5884ce5c7acfae1e4e1b6fde43797d10aa506074d25b531b4f54bde33c0c31d4 \ --hash=sha256:5e10da9e93247e554bb1d22f8edc51847ddd7dde52d85ce31024c1b4312bfba0 \ --hash=sha256:61b0cbabbb6126c8df63b9a3a0c4b1f44ebca5e12ff6997b80fcf267fb3150ef \ --hash=sha256:65f3c2455188f09678355f5cae1f959a06b778bc66d535da07bf2ef20cd319d5 \ --hash=sha256:679f2a834bae9020f81534671c56fd0cc76dd7e5182f57131478e23d0dc59e24 \ --hash=sha256:6bd06731541f89cdc01b261ba2c9e037f1543df7472517836b78dfb15bd6e476 \ --hash=sha256:715de7f82e192e8cae5a507a347d97ad17598f8e026152ca97233e3666daaa71 \ --hash=sha256:737f630a337364665aba3b5a77e56a68cc42d350edd010c345d65a3efa3addcc \ --hash=sha256:7395e69ff32526710748f92cd8c9849b361830968ea3e24a676f272653e8983e \ --hash=sha256:76dbb9d4e43c16cf9aa711fcd8de1e2eeb27539dcefb60a1d5e9f12fae1d1ed8 \ --hash=sha256:76f0f283506c28b12bba319c0fab98217e9f9b54e6160e9c79e9f7348ba32e9c \ --hash=sha256:77e76d932c49a75617c6d13464e41203cd410956614d0a0e999b25e9e8d27eec \ --hash=sha256:7aa4e54f6469300ebca1d9eb80acd5253cdfa36f2c03d79a35883687da430875 \ --hash=sha256:7d1ce23cce91fcea443320a9d0ece9b9305d4368875bab09538f7a5b4131938a \ --hash=sha256:7e58765ad74dcebd3ef0208a5078fba32dc8ec3578fe84a604432950cd043d79 \ --hash=sha256:7f3408ff897f8ab07a07fbe2823d7aee6ff644c097cc1f90382511fe982f647f \ --hash=sha256:8ba7b51e71c05aa1f9bc3641463cd82308eab40ce0d5c7e1fd4038cbf9938147 \ --hash=sha256:8e236dbda4e1d319d681afcbb136c0c4a8e0f1a5c58ceec2adebb547357fe857 \ --hash=sha256:94f3c4a151a2e529adf49c1d54f0f57ff8f9b233ee4d44af623a81553ab86368 \ --hash=sha256:9684823a78a6cd6ad7511fc5e25b07947d1d5b5e2812c93fe99d7d4195130720 \ 
--hash=sha256:a016db5c5dba78fa8fe9f5d80d6708f9c42ab087a739803c0ac83a43d686a470 \ --hash=sha256:a111698b4a3f8dcbe54c64a7708f049355abd603e619013c346553c1fd4ca90b \ --hash=sha256:a1988292870c7cb9d0ebb4cc96b4d447513a9644801de54606dc7aabf2b7d920 \ --hash=sha256:a315e5234d88067f2d97e1f2ef670a7569df445d55400f1e33d117418d008d52 \ --hash=sha256:a749547700de0a20a6718293396ec237bb38218049cfce788e08fcb716e8cf73 \ --hash=sha256:a97cbf7e905c435865c2d939af3d93f99d18eaaa3cabe4256f4304fb51604349 \ --hash=sha256:abdce0f71dcb4a00e4e77f3faf05e4616ceccfe72ccaa07f47ee79cda3b7b0f4 \ --hash=sha256:b346845443716c8e542d54112966383b448f4a3ba5c66409771b8c0889485dd3 \ --hash=sha256:b44fd60341c4d9783039598efadd03617fa28d041fc37d22b62d08f2027fa0e7 \ --hash=sha256:bb2e3cf95854233799013779216c57e153c1ee67a0bf92138acca0e429aefaee \ --hash=sha256:bc71942c789ef415a37f0d4eab90341425a00d538cd0642445d30b41023d3395 \ --hash=sha256:be3b8487d725a77acccc9924f65fd8bce9af7fac8c9820df1049424a2115af6c \ --hash=sha256:c59020932feb24ed49ffd03704fbab89f22aa9c0d4b180ff45542fe8918f5611 \ --hash=sha256:c6b124bfcafb9e8d3ed09130dbee44848c20b3e758b6bbf006e641778927c028 \ --hash=sha256:c9619741e9da2059cd9c3f206110b97583c7152c1dc9f8aafd4beb450ac1c89d \ --hash=sha256:cd32fbacb9fd1bf041bf8e89e4576b6f00b895f06d00914820ae06a616bdfef7 \ --hash=sha256:d1b90d840b25874cf5cd20c219af10bac3667db3876d9a495609273ebe679070 \ --hash=sha256:d213c7e6e8d211888cc359bab7199670a00f5b82c0978b9d1c75baf1eddbeac0 \ --hash=sha256:d5f51900414fc9204a0e0da158ba2ac52b75656e7dce7e77fb9f84bfa343b4cc \ --hash=sha256:d71e379452a2f670ccb689ec801b1218cd3983e253105d6e83780967e899d687 \ --hash=sha256:d84f0f881cb2225c2dfd7f78a10a5645d487a496c6668d6cc39f0f114164f3d0 \ --hash=sha256:decb0eb8a53c3b009b0962378065589685d66b23467ef5dac16cbe818afde27f \ --hash=sha256:e7dd01a46700b1967487141a66ac1a3cf0dd8ebf1f08db37d46389401512ca97 \ --hash=sha256:eb610595dd91560905c132c709412b512135a60f1851ccbd2c959e136431ff67 # via pandas pandas==3.0.1 \ 
--hash=sha256:06aff2ad6f0b94a17822cf8b83bbb563b090ed82ff4fe7712db2ce57cd50d9b8 \ --hash=sha256:0ab749dfba921edf641d4036c4c21c0b3ea70fea478165cb98a998fb2a261955 \ --hash=sha256:0f463ebfd8de7f326d38037c7363c6dacb857c5881ab8961fb387804d6daf2f7 \ --hash=sha256:108dd1790337a494aa80e38def654ca3f0968cf4f362c85f44c15e471667102d \ --hash=sha256:15860b1fdb1973fffade772fdb931ccf9b2f400a3f5665aef94a00445d7d8dd5 \ --hash=sha256:1849f0bba9c8a2fb0f691d492b834cc8dadf617e29015c66e989448d58d011ee \ --hash=sha256:1ff8cf1d2896e34343197685f432450ec99a85ba8d90cce2030c5eee2ef98791 \ --hash=sha256:24ba315ba3d6e5806063ac6eb717504e499ce30bd8c236d8693a5fd3f084c796 \ --hash=sha256:331ca75a2f8672c365ae25c0b29e46f5ac0c6551fdace8eec4cd65e4fac271ff \ --hash=sha256:356e5c055ed9b0da1580d465657bc7d00635af4fd47f30afb23025352ba764d1 \ --hash=sha256:3b66857e983208654294bb6477b8a63dee26b37bdd0eb34d010556e91261784f \ --hash=sha256:406ce835c55bac912f2a0dcfaf27c06d73c6b04a5dde45f1fd3169ce31337389 \ --hash=sha256:4186a699674af418f655dbd420ed87f50d56b4cd6603784279d9eef6627823c8 \ --hash=sha256:44f1364411d5670efa692b146c748f4ed013df91ee91e9bec5677fb1fd58b937 \ --hash=sha256:476f84f8c20c9f5bc47252b66b4bb25e1a9fc2fa98cead96744d8116cb85771d \ --hash=sha256:4a68773d5a778afb31d12e34f7dd4612ab90de8c6fb1d8ffe5d4a03b955082a1 \ --hash=sha256:4e1b677accee34a09e0dc2ce5624e4a58a1870ffe56fc021e9caf7f23cd7668f \ --hash=sha256:5272627187b5d9c20e55d27caf5f2cd23e286aba25cadf73c8590e432e2b7262 \ --hash=sha256:532527a701281b9dd371e2f582ed9094f4c12dd9ffb82c0c54ee28d8ac9520c4 \ --hash=sha256:536232a5fe26dd989bd633e7a0c450705fdc86a207fec7254a55e9a22950fe43 \ --hash=sha256:56cf59638bf24dc9bdf2154c81e248b3289f9a09a6d04e63608c159022352749 \ --hash=sha256:58eeb1b2e0fb322befcf2bbc9ba0af41e616abadb3d3414a6bc7167f6cbfce32 \ --hash=sha256:5ae2ab1f166668b41e770650101e7090824fd34d17915dd9cd479f5c5e0065e9 \ --hash=sha256:661e0f665932af88c7877f31da0dc743fe9c8f2524bdffe23d24fdcb67ef9d56 \ 
--hash=sha256:6bf0603c2e30e2cafac32807b06435f28741135cb8697eae8b28c7d492fc7d76 \ --hash=sha256:6c426422973973cae1f4a23e51d4ae85974f44871b24844e4f7de752dd877098 \ --hash=sha256:75e6e292ff898679e47a2199172593d9f6107fd2dd3617c22c2946e97d5df46e \ --hash=sha256:830994d7e1f31dd7e790045235605ab61cff6c94defc774547e8b7fdfbff3dc7 \ --hash=sha256:84f0904a69e7365f79a0c77d3cdfccbfb05bf87847e3a51a41e1426b0edb9c79 \ --hash=sha256:85fe4c4df62e1e20f9db6ebfb88c844b092c22cd5324bdcf94bfa2fc1b391221 \ --hash=sha256:93325b0fe372d192965f4cca88d97667f49557398bbf94abdda3bf1b591dbe66 \ --hash=sha256:94f87a04984d6b63788327cd9f79dda62b7f9043909d2440ceccf709249ca988 \ --hash=sha256:97ca08674e3287c7148f4858b01136f8bdfe7202ad25ad04fec602dd1d29d132 \ --hash=sha256:9832c2c69da24b602c32e0c7b1b508a03949c18ba08d4d9f1c1033426685b447 \ --hash=sha256:99d0f92ed92d3083d140bf6b97774f9f13863924cf3f52a70711f4e7588f9d0a \ --hash=sha256:9d810036895f9ad6345b8f2a338dd6998a74e8483847403582cab67745bff821 \ --hash=sha256:9fea306c783e28884c29057a1d9baa11a349bbf99538ec1da44c8476563d1b25 \ --hash=sha256:a64ce8b0f2de1d2efd2ae40b0abe7f8ae6b29fbfb3812098ed5a6f8e235ad9bf \ --hash=sha256:a8d37a43c52917427e897cb2e429f67a449327394396a81034a4449b99afda59 \ --hash=sha256:a9cabbdcd03f1b6cd254d6dda8ae09b0252524be1592594c00b7895916cb1324 \ --hash=sha256:b03f91ae8c10a85c1613102c7bef5229b5379f343030a3ccefeca8a33414cf35 \ --hash=sha256:b8e36891080b87823aff3640c78649b91b8ff6eea3c0d70aeabd72ea43ab069b \ --hash=sha256:c1a9f55e0f46951874b863d1f3906dcb57df2d9be5c5847ba4dfb55b2c815249 \ --hash=sha256:c3d288439e11b5325b02ae6e9cc83e6805a62c40c5a6220bea9beb899c073b1c \ --hash=sha256:cd9af1276b5ca9e298bd79a26bda32fa9cc87ed095b2a9a60978d2ca058eaf87 \ --hash=sha256:d54855f04f8246ed7b6fc96b05d4871591143c46c0b6f4af874764ed0d2d6f06 \ --hash=sha256:de09668c1bf3b925c07e5762291602f0d789eca1b3a781f99c1c78f6cac0e7ea \ --hash=sha256:eca8b4510f6763f3d37359c2105df03a7a221a508f30e396a51d0713d462e68a # via -r requirements.in protobuf==6.33.6 \ 
--hash=sha256:0cd27b587afca21b7cfa59a74dcbd48a50f0a6400cfb59391340ad729d91d326 \ --hash=sha256:77179e006c476e69bf8e8ce866640091ec42e1beb80b213c3900006ecfba6901 \ --hash=sha256:7d29d9b65f8afef196f8334e80d6bc1d5d4adedb449971fefd3723824e6e77d3 \ --hash=sha256:9720e6961b251bde64edfdab7d500725a2af5280f3f4c87e57c0208376aa8c3a \ --hash=sha256:a6768d25248312c297558af96a9f9c929e8c4cee0659cb07e780731095f38135 \ --hash=sha256:bd56799fb262994b2c2faa1799693c95cc2e22c62f56fb43af311cae45d26f0e \ --hash=sha256:c96c37eec15086b79762ed265d59ab204dabc53056e3443e702d2681f4b39ce3 \ --hash=sha256:e2afbae9b8e1825e3529f88d514754e094278bb95eadc0e199751cdd9a2e82a2 \ --hash=sha256:e9db7e292e0ab79dd108d7f1a94fe31601ce1ee3f7b79e0692043423020b0593 \ --hash=sha256:f443a394af5ed23672bc6c486be138628fbe5c651ccbc536873d7da23d1868cf # via databricks-sdk pyasn1==0.6.3 \ --hash=sha256:697a8ecd6d98891189184ca1fa05d1bb00e2f84b5977c481452050549c8a72cf \ --hash=sha256:a80184d120f0864a52a073acc6fc642847d0be408e7c7252f31390c0f4eadcde # via pyasn1-modules pyasn1-modules==0.4.2 \ --hash=sha256:29253a9207ce32b64c3ac6600edc75368f98473906e8fd1043bd6b5b1de2c14a \ --hash=sha256:677091de870a80aae844b1ca6134f54652fa2c8c5a52aa396440ac3106e941e6 # via google-auth pycparser==3.0 \ --hash=sha256:600f49d217304a5902ac3c37e1281c9fe94e4d0489de643a9504c5cdfdfc6b29 \ --hash=sha256:b727414169a36b7d524c1c3e31839a521725078d7b2ff038656844266160a992 # via cffi python-dateutil==2.9.0.post0 \ --hash=sha256:37dd54208da7e1cd875388217d5e00ebd4179249f90fb72437e91a35459a0ad3 \ --hash=sha256:a8b2bc7bffae282281c8140a97d3aa9c14da0b136dfe83f850eea9a5f7470427 # via pandas requests==2.33.0 \ --hash=sha256:3324635456fa185245e24865e810cecec7b4caf933d7eb133dcde67d48cee69b \ --hash=sha256:c7ebc5e8b0f21837386ad0e1c8fe8b829fa5f544d8df3b2253bff14ef29d7652 # via # -r requirements.in # databricks-sdk six==1.17.0 \ --hash=sha256:4721f391ed90541fddacab5acf947aa0d3dc7d27b2e1e8eda2be8970586c3274 \ 
--hash=sha256:ff70335d468e7eb6ec65b95b99d3a2836546063f63acc5171de367e834932a81
    # via python-dateutil
urllib3==2.6.3 \
    --hash=sha256:1b62b6884944a57dbe321509ab94fd4d3b307075e0c2eae991ac71ee15ad38ed \
    --hash=sha256:bf272323e553dfb2e87d9bfd225ca7b0f467b919d7bbd355436d3fd37cb0acd4
    # via requests

================================================
FILE: setup.py
================================================
from setuptools import setup, find_packages

# Build with: python setup.py clean --all bdist_wheel
setup(
    # This will be the package name you will see, e.g. in the output of 'conda list' in anaconda prompt
    name='dbdemos',
    # Some version number you may wish to add - increment this after every update
    version='0.6.34',
    author="Databricks",
    author_email=["quentin.ambard@databricks.com", "cal.reynolds@databricks.com"],
    description="Install databricks demos: notebooks, Delta Live Table Pipeline, DBSQL Dashboards, ML Models etc.",
    long_description=open("README.md").read(),
    long_description_content_type="text/markdown",
    url="https://github.com/databricks-demos/dbdemos",
    packages=find_packages(exclude=["tests", "tests.*"]),
    setup_requires=["wheel"],
    include_package_data=True,
    install_requires=["requests", "pandas", "databricks-sdk>=0.38.0"],
    license="Databricks License",
    license_files=('LICENSE',),
    tests_require=["pytest"],
    python_requires=">=3.7"
)

================================================
FILE: test/__init__.py
================================================
# Empty file to mark the directory as a Python package

================================================
FILE: test/test2.html
================================================

Your demo Lakehouse - Banking Fraud is ready!

Build your Banking platform and detect Fraud in real-time

Start with the first notebook lakehouse-fsi-fraud/00-FSI-fraud-detection-introduction-lakehouse using the cluster dbdemos-lakehouse-fsi-fraud-quentin_ambard

Notebook installed:

00-FSI-fraud-detection-introduction-lakehouse: Lakehouse - Fraud introduction
01-Data-ingestion
01.1-DLT-fraud-detection-SQL: Ingest data with Delta Live Table
02-Data-governance
02-UC-data-governance-ACL-fsi-fraud: Governance with Unity Catalog
03-BI-data-warehousing
03-BI-Datawarehousing-fraud: Datawarehousing & BI / Dashboarding
04-Data-Science-ML
04.1-AutoML-FSI-fraud: Build Fraud prediction model (AutoML)
04.2-automl-generated-notebook-fraud: Explore Fraud Prediction generated model
04.3-Model-serving-realtime-inference-fraud: Infer Fraud in realtime - serverless API
04.5-AB-testing-model-serving-fraud: Roll-out our new model with A/B testing.
05-Workflow-orchestration
05-Workflow-orchestration-fsi-fraud: Orchestrate churn prevention with Workflow

Delta Live Table Pipelines

DBSQL Dashboards

Initialization job started

We started a job to initialize your demo data (for DBSQL Dashboards & Delta Live Table). Please wait for the job to complete before accessing the dataset & dashboards...

Interactive cluster for the demo:

dbdemos-lakehouse-fsi-fraud-quentin_ambard. You can refresh your demo cluster with:
dbdemos.create_cluster('lakehouse-fsi-fraud')
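The install report above (test/test2.html) is the HTML that `InstallerReport.get_install_result_html` renders in test/test_installer.py, where each notebook entry starts life as a plain dict and is converted into a `DemoNotebook` dataclass before being passed to the report. A minimal self-contained sketch of that dict-to-dataclass conversion pattern, using a hypothetical `Notebook` stand-in (field names taken from the test fixture; the defaults are assumptions, not the real `DemoNotebook` signature):

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for dbdemos.conf.DemoNotebook, only to illustrate
# the conversion pattern used in test_installer.py.
@dataclass
class Notebook:
    path: str
    title: str
    description: str
    pre_run: bool = False
    publish_on_website: bool = False
    add_cluster_setup_cell: bool = False
    parameters: dict = field(default_factory=dict)
    depends_on_previous: bool = True

raw = [{"path": "00-FSI-fraud-detection-introduction-lakehouse",
        "title": "Lakehouse - Fraud introduction",
        "description": "Start here to explore the Lakehouse.",
        "pre_run": False, "publish_on_website": True,
        "add_cluster_setup_cell": False, "parameters": {},
        "depends_on_previous": True}]

# Positional construction, mirroring the list comprehension in the test file.
notebooks = [Notebook(n["path"], n["title"], n["description"], n["pre_run"],
                      n["publish_on_website"], n["add_cluster_setup_cell"],
                      n["parameters"], n["depends_on_previous"]) for n in raw]
print(notebooks[0].title)  # -> Lakehouse - Fraud introduction
```

Keeping the fixtures as dicts and converting at the boundary keeps the test data copy-pasteable from the demo bundle JSON while the installer code works with typed objects.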
================================================
FILE: test/test_installer.py
================================================
import dbdemos
from dbdemos.conf import DemoNotebook
from dbdemos.installer import Installer
from dbdemos.installer_report import InstallerReport

def test_html():
    installer = InstallerReport("http://localhost")
    demo_name = "lakehouse-fsi-fraud"
    description = "Build your Banking platform and detect Fraud in real-time"
    title = "Lakehouse - Banking Fraud"
    install_path = "/Users/quentin.ambard@databricks.com/test_install_quentin"
    notebooks = [
        {"path": "_resources/00-setup", "title": "Prep data", "description": "Helpers & setup.",
         "pre_run": False, "publish_on_website": False, "add_cluster_setup_cell": False, "parameters": {}, "depends_on_previous": True},
        {"path": "_resources/01-load-data", "title": "Prep data", "description": "Prep data for demo.",
         "pre_run": False, "publish_on_website": False, "add_cluster_setup_cell": False, "parameters": {}, "depends_on_previous": True},
        {"path": "00-FSI-fraud-detection-introduction-lakehouse", "title": "Lakehouse - Fraud introduction", "description": "Start here to explore the Lakehouse.",
         "pre_run": False, "publish_on_website": True, "add_cluster_setup_cell": False, "parameters": {}, "depends_on_previous": True},
        {"path": "01-Data-ingestion/01.1-DLT-fraud-detection-SQL", "title": "Ingest data with Delta Live Table", "description": "SQL DLT pipeline to ingest data & build clean tables.",
         "pre_run": True, "publish_on_website": True, "add_cluster_setup_cell": False, "parameters": {}, "depends_on_previous": True},
        {"path": "02-Data-governance/02-UC-data-governance-ACL-fsi-fraud", "title": "Governance with Unity Catalog", "description": "Secure your tables, lineage, auditlog...",
         "pre_run": True, "publish_on_website": True, "add_cluster_setup_cell": True, "parameters": {}, "depends_on_previous": True},
        {"path": "03-BI-data-warehousing/03-BI-Datawarehousing-fraud", "title": "Datawarehousing & BI / Dashboarding", "description": "Run interactive queries on top of your data",
         "pre_run": False, "publish_on_website": True, "add_cluster_setup_cell": False, "parameters": {}, "depends_on_previous": True},
        {"path": "04-Data-Science-ML/04.1-AutoML-FSI-fraud", "title": "Build Fraud prediction model (AutoML)", "description": "Leverage Databricks AutoML to create a Fraud model in a few clicks",
         "pre_run": True, "publish_on_website": True, "add_cluster_setup_cell": True, "parameters": {}, "depends_on_previous": True},
        {"path": "04-Data-Science-ML/04.2-automl-generated-notebook-fraud", "title": "Explore Fraud Prediction generated model", "description": "Explore the best Fraud model generated by AutoML and deploy it in production.",
         "pre_run": True, "publish_on_website": True, "add_cluster_setup_cell": True, "parameters": {"shap_enabled": "true"}, "depends_on_previous": True},
        {"path": "04-Data-Science-ML/04.3-Model-serving-realtime-inference-fraud", "title": "Infer Fraud in realtime - serverless API", "description": "Once your model is deployed, run low latency inferences.",
         "pre_run": True, "publish_on_website": True, "add_cluster_setup_cell": True, "parameters": {}, "depends_on_previous": True},
        {"path": "04-Data-Science-ML/04.4-Upgrade-to-imbalance-and-xgboost-model-fraud", "title": "Upgrade our model to XGboost", "description": "Improve AutoML model to handle imbalanced data.",
         "pre_run": True, "publish_on_website": True, "add_cluster_setup_cell": True, "parameters": {}, "depends_on_previous": True},
        {"path": "04-Data-Science-ML/04.5-AB-testing-model-serving-fraud", "title": "Roll-out our new model with A/B testing.", "description": "Deploy the new model comparing its performance with the previous one.",
         "pre_run": True, "publish_on_website": True, "add_cluster_setup_cell": True, "parameters": {}, "depends_on_previous": True},
        {"path": "05-Workflow-orchestration/05-Workflow-orchestration-fsi-fraud", "title": "Orchestrate churn prevention with Workflow", "description": "Orchestrate all tasks in a job and schedule your data/model refresh",
         "pre_run": False, "publish_on_website": True, "add_cluster_setup_cell": False, "parameters": {}, "depends_on_previous": True},
        {"path": "_resources/00-global-setup", "title": "Global init", "description": "Global init",
         "pre_run": False, "publish_on_website": False, "add_cluster_setup_cell": False, "parameters": {}, "depends_on_previous": True}]
    notebooks = [DemoNotebook(n['path'], n['title'], n['description'], n['pre_run'], n['publish_on_website'], n['add_cluster_setup_cell'], n['parameters'], n['depends_on_previous']) for n in notebooks]
    job_id = 720149424775058
    run_id = 120544375
    cluster_id = "0320-175126-exzit1ks"
    cluster_name = "dbdemos-lakehouse-fsi-fraud-quentin_ambard"
    pipelines_ids = [{"name": "dbdemos-fsi-fraud-detection", "uid": "30177e65-8729-4363-9e9c-7bff51caddc3", "id": "dlt-fsi-fraud", "run_after_creation": True}]
    dashboards = [{"id": "9fc6a3bb-ff36-4e06-b5f9-912d7e77dc05", "name": "FSI Fraud Detection - dbdemos", "uid": "e1fa43f0-865d-4f5b-b884-64806fe2526a"}]
    workflows = []
    from pathlib import Path
    html = installer.get_install_result_html(demo_name, description, title, install_path, notebooks, job_id, run_id, cluster_id, cluster_name, pipelines_ids, dashboards, workflows)
    with open("./test2.html", "w") as text_file:
        text_file.write(html)
    print(html)

def test_list():
    i = Installer("test", "test", "test", "test", "test", "test")
    dbdemos.list_demos(None, i)

def test_list_html():
    deprecated_demos = ["uc-04-audit-log", "llm-dolly-chatbot"]
    installer = Installer("http://localhost", pat_token="test")
    from collections import defaultdict
    demos = defaultdict(lambda: [])
    # Define category order
    demos["lakehouse"] = []
    for demo in installer.get_demos_available():
        conf = installer.get_demo_conf(demo)
        if len(demos[conf.category]) == 0:
            demos[conf.category].append(conf)
    content = dbdemos.get_html_list_demos(demos)
    with open("./test_list_html.html", "w") as text_file:
        text_file.write(content)
    print(content)
#test_list_html()
#test_list()
#test_html()
================================================ FILE: test/test_installer_genie.py ================================================
import dbdemos
from dbdemos.conf import DemoNotebook, DemoConf, DataFolder
from dbdemos.installer import Installer
from dbdemos.installer_genie import InstallerGenie
from databricks.sdk import WorkspaceClient

def test_room_install():
    room = [{'display_name': 'test quentin API',
             'description': 'test Desc',
             'table_identifiers': ['main.quentin_test.*'],
             'sql_instructions': [{"title": "test", "content": "select * from test"}],
             'instructions': 'This is a description',
             'curated_questions': ["What's the Churn?", "How many turbines do I have?"]}]
    demo_conf = {"genie_rooms": room, "name": "test", "category": "test", "title": "title", "description": "description", "bundle": True}
    conf = DemoConf(path="/Users/quentin.ambard@databricks.com/test_install_quentin", json_conf=demo_conf)
    with open("../local_conf_E2FE.json", "r") as r:
        import json
        c = json.loads(r.read())
    installer = Installer(c['username'], c['pat_token'], c['url'], cloud="AWS")
    genie_installer = InstallerGenie(installer)
    genie_installer.install_genies(conf, "/Users/quentin.ambard@databricks.com/test_install_quentin", warehouse_id="475b94ddc7cd5211", debug=True)

def test_load_genie_data():
    data_folders = [
        {"source_folder": "fsi/fraud-transaction/customers", "source_format": "parquet", "target_table_name": "iot_parts", "target_format": "delta"},
        {"source_folder": "fsi/fraud-transaction/fraud_report", "source_format": "parquet", "target_table_name": "iot_turbines", "target_format": "delta"}
    ]
    demo_conf = {"data_folders": data_folders, "name": "test", "category": "test", "title": "title", "description": "description", "bundle": True}
    conf = DemoConf(path="/Users/quentin.ambard@databricks.com/test_install_quentin", json_conf=demo_conf)
    with open("../local_conf_E2FE.json", "r") as r:
        import json
        c = json.loads(r.read())
    installer = Installer(c['username'], c['pat_token'], c['url'], cloud="AWS")
    genie_installer = InstallerGenie(installer)
    conf.catalog = 'dbdemos'
    conf.schema = 'test_quentin'
    genie_installer.load_genie_data(conf, warehouse_id="475b94ddc7cd5211", debug=True)

def test_schema_creation():
    demo_conf = {"name": "test", "category": "test", "title": "title", "description": "description", "bundle": True}
    conf = DemoConf(path="/Users/quentin.ambard@databricks.com/test_install_quentin", json_conf=demo_conf)
    with open("../local_conf_E2FE.json", "r") as r:
        import json
        c = json.loads(r.read())
    installer = Installer(c['username'], c['pat_token'], c['url'], cloud="AWS")
    genie_installer = InstallerGenie(installer)
    conf.catalog = 'dbdemos'
    conf.schema = 'test_quentin2'
    ws = WorkspaceClient(token=installer.db.conf.pat_token, host=installer.db.conf.workspace_url)
    genie_installer.create_schema(ws, conf, debug=True)

def load_data_to_volume():
    demo_conf = {"name": "test", "category": "test", "title": "title", "description": "description", "bundle": True}
    conf = DemoConf(path="/Users/quentin.ambard@databricks.com/test_install_quentin", json_conf=demo_conf)
    with open("../local_conf_E2FE.json", "r") as r:
        import json
        c = json.loads(r.read())
    installer = Installer(c['username'], c['pat_token'], c['url'], cloud="AWS")
    genie_installer = InstallerGenie(installer)
    conf.catalog = 'dbdemos'
    conf.schema = 'test_quentin2'
    ws = WorkspaceClient(token=installer.db.conf.pat_token, host=installer.db.conf.workspace_url)
    data_folder = DataFolder(source_folder="fsi/fraud-transaction/customers", source_format="parquet", target_table_name="iot_parts", target_format="delta")
    genie_installer.load_data_to_volume(ws, data_folder, warehouse_id="475b94ddc7cd5211", conf=conf, debug=True)

def test_load_data():
    demo_conf = {"name": "test", "category": "test", "title": "title", "description": "description", "bundle": True}
    conf = DemoConf(path="/Users/quentin.ambard@databricks.com/test_install_quentin", json_conf=demo_conf)
    with open("../local_conf_E2FE.json", "r") as r:
        import json
        c = json.loads(r.read())
    installer = Installer(c['username'], c['pat_token'], c['url'], cloud="AWS")
    genie_installer = InstallerGenie(installer)
    conf.catalog = 'dbdemos'
    conf.schema = 'test_quentin2'
    ws = WorkspaceClient(token=installer.db.conf.pat_token, host=installer.db.conf.workspace_url)
    data_folder = DataFolder(source_folder="fsi/fraud-transaction/customers", source_format="parquet", target_table_name="dbdemos_aibi_customers", target_format="delta")
    genie_installer.load_data(ws, data_folder, warehouse_id="475b94ddc7cd5211", conf=conf, debug=True)

#test_load_data()
test_room_install()
#test_schema_creation()
#test_html()
================================================ FILE: test/test_job_bundler.py ================================================
import unittest
from dbdemos.job_bundler import JobBundler
from dbdemos.conf import Conf
import json

class TestJobBundler(unittest.TestCase):
    def setUp(self):
        with open('local_conf.json', 'r') as f:
            local_conf = json.load(f)
        self.conf = Conf(username="test_user@test.com",
                         workspace_url="https://test.cloud.databricks.com",
                         org_id="1234567890",
                         pat_token="test_token",
                         repo_url="https://github.com/databricks-demos/dbdemos-notebooks",
                         github_token=local_conf.get('github_token'),
                         branch="main")
        self.job_bundler = JobBundler(self.conf)

    def test_get_changed_files_since_commit(self):
        # Test with a known repository and commit
        owner = "databricks-demos"
        repo = "dbdemos-notebooks"
        base_commit = "0652f84e30f3ea2e6802dbe5f36538a30f0d8aa1"  # or use a specific commit SHA
        last_commit = "b65796d63f628eacc32f7160033181c3477997dd"
        files = self.job_bundler.get_changed_files_since_commit(owner, repo, base_commit, last_commit)
        # Check that we got a list of files back
        self.assertIsInstance(files, list, "Should return a list of changed files")
        self.assertEqual(files, ['demo-FSI/lakehouse-fsi-smart-claims/02-Data-Science-ML/02.1-Model-Training.py'])
        # Check that file paths are strings
        if files:
            self.assertIsInstance(files[0], str, "File paths should be strings")
        # Test with last_commit = None to get changes since base commit up to HEAD
        files_to_head = self.job_bundler.get_changed_files_since_commit(owner, repo, base_commit)
        # Check that we got a non-empty list back
        self.assertIsInstance(files_to_head, list, "Should return a list of changed files")
        self.assertGreater(len(files_to_head), 0, "Should have at least one changed file")
        # Check that file paths are strings
        self.assertIsInstance(files_to_head[0], str, "File paths should be strings")

    def test_check_if_demo_file_changed_since_commit(self):
        from dbdemos.conf import DemoConf
        # Create a demo config for testing with required name field
        demo_conf = DemoConf("demo-FSI/lakehouse-fsi-smart-claims", {
            "name": "lakehouse-fsi-smart-claims",
            "title": "FSI Smart Claims",
            "category": "test",
            "description": "description",
            "bundle": True
        })
        # Test with known commits where we know a file changed in that demo
        base_commit = "0652f84e30f3ea2e6802dbe5f36538a30f0d8aa1"
        last_commit = "b65796d63f628eacc32f7160033181c3477997dd"
        # This should return True as we know a file changed in this demo between these commits
        has_changes = self.job_bundler.check_if_demo_file_changed_since_commit(demo_conf, base_commit, last_commit)
        self.assertTrue(has_changes, "Should detect changes in demo files")
        # Test with a demo path that we know didn't change
        demo_conf_no_changes = DemoConf("demo-retail", {
            "name": "demo-retail",
            "title": "Demo Retail",
            "category": "test",
            "description": "description",
            "bundle": True
        })
        has_changes = self.job_bundler.check_if_demo_file_changed_since_commit(demo_conf_no_changes, base_commit, last_commit)
        self.assertFalse(has_changes, "Should not detect changes in unmodified demo")

if __name__ == '__main__':
    unittest.main()
================================================ FILE: test/test_list_demos.html ================================================
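The demo-change check exercised in `test_job_bundler.py` above reduces to a path-prefix test over the list of files changed between two commits. A minimal sketch of that idea (`demo_changed` is a hypothetical helper written for illustration, not the actual `JobBundler` internals):

```python
def demo_changed(changed_files, demo_path):
    """Return True if any changed file lives under the demo's folder,
    e.g. demo_path='demo-FSI/lakehouse-fsi-smart-claims'."""
    prefix = demo_path.rstrip("/") + "/"
    return any(f.startswith(prefix) for f in changed_files)

# Mirrors the expectations in the test above: the smart-claims demo changed,
# demo-retail did not.
changed = ["demo-FSI/lakehouse-fsi-smart-claims/02-Data-Science-ML/02.1-Model-Training.py"]
print(demo_changed(changed, "demo-FSI/lakehouse-fsi-smart-claims"))  # True
print(demo_changed(changed, "demo-retail"))  # False
```

In the real bundler the list of changed files would come from comparing two commits on GitHub; the prefix check is what decides whether a demo needs rebundling.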

Lakehouse - Credit Decisioning

Build your banking data platform and identify creditworthy customers
dbdemos.install('lakehouse-fsi-credit')

Retail Banking - Fraud Detection

Build your Banking platform and detect Fraud in real-time. End 2 End demo, with Model Serving & realtime fraud inference A/B testing.
dbdemos.install('lakehouse-fsi-fraud')

Lakehouse for HLS: Patient readmission

Build your data platform and personalized health care to reduce readmission risk
dbdemos.install('lakehouse-hls-readmission')

Lakehouse for IoT & Predictive Maintenance

Detect faulty wind turbines: Ingestion (DLT), BI, Predictive Maintenance (ML), Governance (UC), Orchestration
dbdemos.install('lakehouse-iot-platform')

Lakehouse for C360: Reducing Customer Churn

Centralize customer data and reduce churn: Ingestion (DLT), BI, Churn prediction (ML), Governance (UC), Orchestration.
dbdemos.install('lakehouse-retail-c360')
================================================ FILE: test/test_list_demos2.html ================================================

Lakehouse - Credit Decisioning

Build your banking data platform and identify creditworthy customers
dbdemos.install('lakehouse-fsi-credit')

Retail Banking - Fraud Detection

Build your Banking platform and detect Fraud in real-time. End 2 End demo, with Model Serving & realtime fraud inference A/B testing.
dbdemos.install('lakehouse-fsi-fraud')

Lakehouse for HLS: Patient readmission

Build your data platform and personalized health care to reduce readmission risk
dbdemos.install('lakehouse-hls-readmission')

Lakehouse for IoT & Predictive Maintenance

Detect faulty wind turbines: Ingestion (DLT), BI, Predictive Maintenance (ML), Governance (UC), Orchestration
dbdemos.install('lakehouse-iot-platform')

Lakehouse for C360: Reducing Customer Churn

Centralize customer data and reduce churn: Ingestion (DLT), BI, Churn prediction (ML), Governance (UC), Orchestration.
dbdemos.install('lakehouse-retail-c360')

Databricks Autoloader (cloudfile)

Incremental ingestion on your cloud storage folder.
dbdemos.install('auto-loader')

CDC Pipeline with Delta

Process CDC data to build an entire pipeline and materialize your operational tables in your lakehouse.
dbdemos.install('cdc-pipeline')

Orchestrate and run your dbt jobs

Launch your dbt pipelines in production using a SQL Warehouse. Leverage Databricks Workflow (orchestration) and add a dbt task in your transformation pipeline.
dbdemos.install('dbt-on-databricks')

Delta Lake

Store your table with Delta Lake & discover how Delta Lake can simplify your Data Pipelines.
dbdemos.install('delta-lake')

CDC pipeline with Delta Live Table.

Ingest a Change Data Capture flow with APPLY CHANGES INTO and simplify your SCD2 implementation.
dbdemos.install('dlt-cdc')

Full Delta Live Tables Pipeline - Loan

Ingest loan data and implement a DLT pipeline with quarantine.
dbdemos.install('dlt-loans')

Unit Testing Delta Live Table (DLT) for production-grade pipelines

Deploy robust Delta Live Table pipelines with unit tests leveraging expectation.
dbdemos.install('dlt-unit-test')

Spark Streaming - Advanced

Deep dive on Spark Streaming with Delta to build webapp user sessions from clicks, with custom aggregation state management.
dbdemos.install('streaming-sessionization')

Delta Sharing - Airlines

Share your data to external organization using Delta Sharing.
dbdemos.install('delta-sharing-airlines')

Table ACL & Dynamic Views with UC

Discover how to GRANT permissions on your table with UC and implement more advanced controls such as data masking at row level, based on each user.
dbdemos.install('uc-01-acl')

Access data on External Location

Discover how you can secure files/table in external location (cloud storage like S3/ADLS/GCS) with simple GRANT command.
dbdemos.install('uc-02-external-location')

Data Lineage with Unity Catalog

Discover data lineage with Unity Catalog: table to table and column to column
dbdemos.install('uc-03-data-lineage')

Audit-log with Databricks

[DEPRECATED - prefer uc-04-system-tables] Track usage with Audit-log.
dbdemos.install('uc-04-audit-log')

System Tables: Billing Forecast, Usage and Audit

Track and analyze usage, billing & access with UC System tables.
dbdemos.install('uc-04-system-tables')

Upgrade table to Unity Catalog

Discover how to upgrade your hive_metastore tables to Unity Catalog to benefit from UC capabilities: Security/ACL/Row-level/Lineage/Audit...
dbdemos.install('uc-05-upgrade')

Image classification - Defect detection

Deep Learning using Databricks Lakehouse: detect defects in PCBs with Hugging Face transformers and PyTorch Lightning.
dbdemos.install('computer-vision-pcb')

Feature Store and Online Inference

Leverage Databricks Feature Store with streaming and online store.
dbdemos.install('feature-store')

Build your Chat Bot with Dolly

Democratizing the magic of ChatGPT with open models and Databricks Lakehouse (starts GPU)
dbdemos.install('llm-dolly-chatbot')

MLOps - End 2 end pipeline

Automate your model deployment with MLFlow webhook & repo, end 2 end!
dbdemos.install('mlops-end2end')

Pandas API with spark backend (Koalas)

Let your Data Science team scale to TBs of data while working with the Pandas API, without having to learn & move to another framework.
dbdemos.install('pandas-on-spark')

Data Warehousing with Identity, Primary Key & Foreign Key

Define your schema with auto incremental column and Primary + Foreign Key. Ideal for Data Warehouse & BI support!
dbdemos.install('identity-pk-fk')

AI Functions: query LLM with DBSQL

Call Azure OpenAI's model from your Lakehouse data using AI_GENERATE_TEXT()
dbdemos.install('sql-ai-functions')
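Both demo-list fixtures above end each entry with a `dbdemos.install('...')` line. As a quick illustration, the demo names can be pulled out of such a listing with a one-line regex (`extract_demo_names` is a hypothetical helper written for this example, not part of dbdemos):

```python
import re

def extract_demo_names(text):
    """Return every demo name referenced by a dbdemos.install('...') call."""
    return re.findall(r"dbdemos\.install\('([^']+)'\)", text)

sample = """Delta Lake

Store your table with Delta Lake & discover how Delta Lake can simplify your Data Pipelines.
dbdemos.install('delta-lake')
"""
print(extract_demo_names(sample))  # ['delta-lake']
```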
================================================ FILE: test/test_notebook_parser.py ================================================
import re
import base64
import urllib.parse
import json
from dbdemos.notebook_parser import NotebookParser

def test_close_cell():
    with open("../dbdemos/template/LICENSE.html", "r") as f:
        p = NotebookParser(f.read())
    p.hide_commands_and_results()
    #print(p.get_html())
    #p.hide_command_result(0)

def test_automl():
    with open("../dbdemos/bundles/mlops-end2end/install_package/01_feature_engineering.html", "r") as f:
        p = NotebookParser(f.read())
    assert "Data exploration notebook" in p.content
    assert "Please run the notebook cells to get your AutoML links" not in p.content
    p.remove_automl_result_links()
    assert "Data exploration notebook" not in p.content
    assert "Please run the notebook cells to get your AutoML links" in p.content
    #print(p.get_html())
    #p.hide_command_result(0)

def test_change_relative_links_for_minisite():
    with open("../dbdemos/bundles/llm-dolly-chatbot/install_package/01-Dolly-Introduction.html", "r") as f:
        p = NotebookParser(f.read())
    assert p.contains("""n the next [03-Q&A-prompt-engineering-for-dolly]($./03-Q&A-prompt-engineering-for-dolly) not""")
    p.change_relative_links_for_minisite()
    assert p.contains("""n the next [03-Q&A-prompt-engineering-for-dolly](./03-Q&A-prompt-engineering-for-dolly.html) not""")

def test_parser_contains():
    with open("../dbdemos/bundles/mlops-end2end/install_package/_resources/00-setup.html", "r") as f:
        p = NotebookParser(f.read())
    assert p.contains("00-global-setup")
    p.replace_in_notebook('00-global-setup', './00-global-setup-test', True)
    assert p.contains("./00-global-setup-test")
    #print(p.get_html())
    #p.hide_command_result(0)

def test_parser_notebook():
    with open("../dbdemos/bundles/lakehouse-retail-c360/install_package/01-Data-ingestion/01.1-DLT-churn-SQL.html", "r") as f:
        p = NotebookParser(f.read())
    assert p.contains("""""")
    p.replace_dynamic_links_pipeline([{"id": "dlt-churn", "uid": "uuuiduuu"}])
    assert p.contains("""""")
    with open("../dbdemos/bundles/dlt-cdc/install_package/01-Retail_DLT_CDC_SQL.html", "r") as f:
        p = NotebookParser(f.read())
    assert p.contains("""""")
    p.replace_dynamic_links_pipeline([{"id": "dlt-cdc", "uid": "uuuiduuu"}])
    assert p.contains("""""")
    with open("../dbdemos/bundles/dbt-on-databricks/install_package/00-DBT-on-databricks.html", "r") as f:
        p = NotebookParser(f.read())
    p.replace_dynamic_links_pipeline([{"id": "dlt-test", "uid": "uuuiduuu"}])
    #assert """Delta Live Table Pipeline for unit-test demo""")
    p.replace_dynamic_links_workflow([{'uid': 450396635732004, 'run_id': 3426479, 'id': 'dbt'}])
    assert p.contains("""""")
    assert p.contains("""""")
    p.replace_dynamic_links_repo([{'uid': '/Repos/quentin.ambard@databricks.com/dbdemos-dbt-databricks-c360', 'id': 'dbt-databricks-c360', 'repo_id': 3891038073826409}])
    assert p.contains("""""")

test_automl()
test_close_cell()
test_automl()
test_parser_contains()
test_parser_notebook()
test_change_relative_links_for_minisite()
================================================ FILE: test_demo.py ================================================
import json
from dbdemos.conf import Conf
from dbdemos.job_bundler import JobBundler
from dbdemos.packager import Packager

def load_conf(conf_path):
    with open(conf_path, "r") as r:
        c = json.loads(r.read())
    with open("./dbdemos/resources/default_cluster_config.json", "r") as cc:
        default_cluster_template = cc.read()
    with open("./dbdemos/resources/default_test_job_conf.json", "r") as cc:
        default_cluster_job_template = cc.read()
    return Conf(c['username'], c['url'], c['org_id'], c['pat_token'], default_cluster_template, default_cluster_job_template, c['repo_staging_path'], c['repo_name'], c['repo_url'], c['branch'], github_token=c['github_token'])

def bundle(conf, demo_path_in_repo):
    bundler = JobBundler(conf)
    # the bundler will use a staging repo dir in the workspace to analyze & run content.
    bundler.reset_staging_repo(skip_pull=False)
    bundler.add_bundle(demo_path_in_repo)
    # Run the jobs (only if there is a new commit since the last time, or failure, or force execution)
    bundler.start_and_wait_bundle_jobs(force_execution=False, skip_execution=True)
    packager = Packager(conf, bundler)
    packager.package_all()

#Loads conf (your workspace url & token): local_conf.json.
# Conf file example. This is what will be used as repo content to build the package. You can use the repo you're working on.
"""
{
    "cloud": "AWS",
    "pat_token": "xxx",
    "username": "xx.xx@databricks.com",
    "url": "https://e2-demo-field-eng.cloud.databricks.com/",
    "org_id": "1444828305810485",
    "repo_staging_path": "/Repos/xx.xxx@databricks.com",
    "repo_name": "field-demos",
    "repo_url": "",
    "branch": "master",
    "github_token": "ghp_xxx"
}
"""
conf = load_conf("local_conf_azure.json")

#This will create the bundle and save it in the local ./bundles and ./minisite folder.
#change the path with your demo path in the https://github.com/databricks/field-demo repo (your fork)
try:
    bundle(conf, "aibi/aibi-marketing-campaign")
except Exception as e:
    print(f"Failure building the job: {e}")
    raise e

# Now that your demo is packaged, we can install it & test.
# We recommend testing in a new workspace so that you have a fresh install
# Load the conf for the workspace where you want to install the demo:
import dbdemos
try:
    #Install your demo in a given folder:
    dbdemos.install("aibi-marketing-campaign", "/Users/andrea.picasso@databricks.com/test_install_aibidemo", True, conf.username, conf.pat_token, conf.workspace_url, cloud="AWS", start_cluster=False)
    #Check if the init job is successful:
    #dbdemos.check_status("sql-ai-functions", conf.username, conf.pat_token, conf.workspace_url, cloud="AWS")
    print("looking good! Ready to send your PR with your new demo!")
except Exception as e:
    print(f"Failure installing the demo: {e}")
    raise e
================================================ FILE: test_list_html.html ================================================
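`test_demo.py` above starts by loading a local JSON conf whose expected keys are shown in its docstring example. A minimal, hypothetical sketch of that loading step with a fail-fast check for missing keys (the key list is taken from the docstring; `load_local_conf` is not a dbdemos function):

```python
import json

# Keys the local conf example in test_demo.py documents.
REQUIRED_KEYS = ["username", "url", "org_id", "pat_token", "repo_staging_path",
                 "repo_name", "repo_url", "branch", "github_token"]

def load_local_conf(raw_json):
    """Parse the local conf and raise early if a documented key is absent."""
    c = json.loads(raw_json)
    missing = [k for k in REQUIRED_KEYS if k not in c]
    if missing:
        raise ValueError(f"local conf is missing keys: {missing}")
    return c

conf = load_local_conf('{"username": "u", "url": "https://x", "org_id": "1", '
                       '"pat_token": "t", "repo_staging_path": "/Repos/u", '
                       '"repo_name": "field-demos", "repo_url": "", '
                       '"branch": "master", "github_token": "ghp_xxx"}')
print(conf["branch"])  # master
```

Failing fast here is cheaper than discovering a missing `github_token` halfway through a bundle run.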

Lakehouse for Retail Banking: Credit Decisioning

Build your banking data platform and identify creditworthy customers
dbdemos.install('lakehouse-fsi-credit')