Full Code of totalhack/zillion for AI

Repository: totalhack/zillion
Branch: master
Commit: c670be726770
Files: 117
Total size: 2.7 MB

Directory structure:
gitextract_rw4myeyk/

├── .gitattributes
├── .github/
│   └── FUNDING.yml
├── .gitignore
├── .pre-commit-config.yaml
├── .pylintrc
├── .readthedocs.yml
├── AUTHORS.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── Makefile
├── README.md
├── dev_config.yml
├── docker-compose-nlp.yml
├── docker-compose.yml
├── docs/
│   ├── build_markdown.py
│   ├── markdown/
│   │   ├── contributing.md
│   │   ├── readme_badges.md
│   │   ├── readme_contents.md
│   │   ├── readme_docs.md
│   │   ├── readme_how_to_contribute.md
│   │   ├── readme_intro.md
│   │   └── readme_toc.md
│   ├── mkdocs/
│   │   ├── api.md
│   │   ├── contributing.md
│   │   ├── css/
│   │   │   └── extra.css
│   │   ├── index.md
│   │   ├── zillion.configs.md
│   │   ├── zillion.core.md
│   │   ├── zillion.datasource.md
│   │   ├── zillion.dialects.md
│   │   ├── zillion.field.md
│   │   ├── zillion.model.md
│   │   ├── zillion.nlp.md
│   │   ├── zillion.report.md
│   │   ├── zillion.scripts.md
│   │   ├── zillion.sql_utils.md
│   │   ├── zillion.version.md
│   │   └── zillion.warehouse.md
│   ├── mkdocs_index.md
│   ├── readme.md
│   └── requirements.txt
├── examples/
│   ├── baseball_warehouse.json
│   ├── example_wh_config.json
│   ├── minimal_example.py
│   └── sample_config.yaml
├── mkdocs.yml
├── pyproject.toml
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── dma_zip.csv
│   ├── dma_zip.html
│   ├── dma_zip.json
│   ├── dma_zip.xlsx
│   ├── pytest.ini
│   ├── setup/
│   │   ├── campaigns.csv
│   │   ├── common.sqlite.sql
│   │   ├── create_testdb2_sqlite.py
│   │   ├── duckdb/
│   │   │   ├── load.sql
│   │   │   └── schema.sql
│   │   ├── init_mysql_data.sh
│   │   ├── init_postgres_data.sh
│   │   ├── leads.csv
│   │   ├── partner_sibling.csv
│   │   ├── partners.csv
│   │   ├── sales.csv
│   │   ├── testdb1.sqlite.sql
│   │   ├── zillion_db.sqlite.sql
│   │   ├── zillion_test.mysql.sql
│   │   └── zillion_test.postgres.sql
│   ├── test_adhoc_ds_config.json
│   ├── test_config.yaml
│   ├── test_core.py
│   ├── test_duckdb.py
│   ├── test_duckdb_wh_config.json
│   ├── test_example_wh_config.py
│   ├── test_include_wh_config.json
│   ├── test_mysql.py
│   ├── test_mysql_ds_config.json
│   ├── test_nlp.py
│   ├── test_performance.py
│   ├── test_postgresql.py
│   ├── test_postgresql_ds_config.json
│   ├── test_reports.py
│   ├── test_scripts.py
│   ├── test_sqlite_ds_config.json
│   ├── test_table_config.json
│   ├── test_utils.py
│   ├── test_wh_config.json
│   ├── testdb1
│   ├── testdb2
│   ├── zillion_test_0.7.duckdb
│   └── zillion_test_1.x.duckdb
└── zillion/
    ├── __init__.py
    ├── configs.py
    ├── core.py
    ├── datasource.py
    ├── dialects/
    │   ├── __init__.py
    │   ├── conversions.py
    │   ├── duckdb.py
    │   ├── mysql.py
    │   ├── postgresql.py
    │   └── sqlite.py
    ├── field.py
    ├── model.py
    ├── nlp.py
    ├── report.py
    ├── scripts/
    │   ├── __init__.py
    │   ├── bootstrap_datasource_config.py
    │   ├── json_to_yaml.py
    │   ├── load_config.py
    │   ├── run_report.py
    │   └── yaml_to_json.py
    ├── sql_utils.py
    ├── version.py
    ├── warehouse.py
    └── zillion_test

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitattributes
================================================
# This is a hack to get linguist to ignore the SQL dumps that are causing
# this repo to not be viewed as a Python project.
tests/* linguist-documentation

================================================
FILE: .github/FUNDING.yml
================================================
github: [totalhack]

================================================
FILE: .gitignore
================================================
*.egg-info
build
docs/site
*.pyc
*.out
dist
.DS_Store
.idea/
tests/test_wh_config.yaml
volumes
.python-version
GEMINI.md
test-all.sh


================================================
FILE: .pre-commit-config.yaml
================================================
repos:
-   repo: https://github.com/psf/black
    rev: stable
    hooks:
    - id: black
      language_version: python3.7


================================================
FILE: .pylintrc
================================================
[MASTER]

# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code.
extension-pkg-whitelist=

# Add files or directories to the blacklist. They should be base names, not
# paths.
ignore=CVS

# Add files or directories matching the regex patterns to the blacklist. The
# regex matches against base names, not paths.
ignore-patterns=

# Python code to execute, usually for sys.path manipulation such as
# pygtk.require().
#init-hook=

# Use multiple processes to speed up Pylint. Specifying 0 will auto-detect the
# number of processors available to use.
jobs=1

# Control the amount of potential inferred values when inferring a single
# object. This can help the performance when dealing with large functions or
# complex, nested conditions.
limit-inference-results=100

# List of plugins (as comma separated values of python modules names) to load,
# usually to register additional checkers.
load-plugins=

# Pickle collected data for later comparisons.
persistent=yes

# Specify a configuration file.
#rcfile=

# When enabled, pylint would attempt to guess common misconfiguration and emit
# user-friendly hints instead of false-positive error messages.
suggestion-mode=yes

# Allow loading of arbitrary C extensions. Extensions are imported into the
# active Python interpreter and may run arbitrary code.
unsafe-load-any-extension=no


[MESSAGES CONTROL]

# Only show warnings with the listed confidence levels. Leave empty to show
# all. Valid levels: HIGH, INFERENCE, INFERENCE_FAILURE, UNDEFINED.
confidence=

# Disable the message, report, category or checker with the given id(s). You
# can either give multiple identifiers separated by comma (,) or put this
# option multiple times (only on the command line, not in the configuration
# file where it should appear only once). You can also use "--disable=all" to
# disable everything first and then reenable specific checks. For example, if
# you want to run only the similarities checker, you can use "--disable=all
# --enable=similarities". If you want to run only the classes checker, but have
# no Warning level messages displayed, use "--disable=all --enable=classes
# --disable=W".
disable=print-statement,
        parameter-unpacking,
        unpacking-in-except,
        old-raise-syntax,
        backtick,
        long-suffix,
        old-ne-operator,
        old-octal-literal,
        import-star-module-level,
        non-ascii-bytes-literal,
        raw-checker-failed,
        bad-inline-option,
        locally-disabled,
        file-ignored,
        suppressed-message,
        useless-suppression,
        deprecated-pragma,
        use-symbolic-message-instead,
        apply-builtin,
        basestring-builtin,
        buffer-builtin,
        cmp-builtin,
        coerce-builtin,
        execfile-builtin,
        file-builtin,
        long-builtin,
        raw_input-builtin,
        reduce-builtin,
        standarderror-builtin,
        unicode-builtin,
        xrange-builtin,
        coerce-method,
        delslice-method,
        getslice-method,
        setslice-method,
        no-absolute-import,
        old-division,
        dict-iter-method,
        dict-view-method,
        next-method-called,
        metaclass-assignment,
        indexing-exception,
        raising-string,
        reload-builtin,
        oct-method,
        hex-method,
        nonzero-method,
        cmp-method,
        input-builtin,
        round-builtin,
        intern-builtin,
        unichr-builtin,
        map-builtin-not-iterating,
        zip-builtin-not-iterating,
        range-builtin-not-iterating,
        filter-builtin-not-iterating,
        using-cmp-argument,
        eq-without-hash,
        div-method,
        idiv-method,
        rdiv-method,
        exception-message-attribute,
        invalid-str-codec,
        sys-max-int,
        bad-python3-import,
        deprecated-string-function,
        deprecated-str-translate-call,
        deprecated-itertools-function,
        deprecated-types-field,
        next-method-defined,
        dict-items-not-iterating,
        dict-keys-not-iterating,
        dict-values-not-iterating,
        deprecated-operator-function,
        deprecated-urllib-function,
        xreadlines-attribute,
        deprecated-sys-function,
        exception-escape,
        comprehension-escape,
        bad-continuation,
        too-many-lines,
        redefined-builtin,
        too-few-public-methods,
        too-many-ancestors,
        invalid-name,
        no-member,
        unnecessary-pass,
        abstract-method,
        arguments-differ,
        no-self-use,
        protected-access,
        too-many-arguments,
        unused-argument,
        attribute-defined-outside-init,
        broad-except,
        return-in-init,
        bare-except,
        unused-wildcard-import,
        too-many-locals,
        too-many-branches,
        duplicate-code

# Enable the message, report, category or checker with the given id(s). You can
# either give multiple identifier separated by comma (,) or put this option
# multiple time (only on the command line, not in the configuration file where
# it should appear only once). See also the "--disable" option for examples.
enable=c-extension-no-member


[REPORTS]

# Python expression which should return a note less than 10 (10 is the highest
# note). You have access to the variables errors warning, statement which
# respectively contain the number of errors / warnings messages and the total
# number of statements analyzed. This is used by the global evaluation report
# (RP0004).
evaluation=10.0 - ((float(5 * error + warning + refactor + convention) / statement) * 10)

# Template used to display messages. This is a python new-style format string
# used to format the message information. See doc for all details.
#msg-template=

# Set the output format. Available formats are text, parseable, colorized, json
# and msvs (visual studio). You can also give a reporter class, e.g.
# mypackage.mymodule.MyReporterClass.
output-format=text

# Tells whether to display a full report or only the messages.
reports=no

# Activate the evaluation score.
score=yes


[REFACTORING]

# Maximum number of nested blocks for function / method body
max-nested-blocks=5

# Complete name of functions that never returns. When checking for
# inconsistent-return-statements if a never returning function is called then
# it will be considered as an explicit return statement and no message will be
# printed.
never-returning-functions=sys.exit


[BASIC]

# Naming style matching correct argument names.
argument-naming-style=snake_case

# Regular expression matching correct argument names. Overrides argument-
# naming-style.
#argument-rgx=

# Naming style matching correct attribute names.
attr-naming-style=snake_case

# Regular expression matching correct attribute names. Overrides attr-naming-
# style.
#attr-rgx=

# Bad variable names which should always be refused, separated by a comma.
bad-names=foo,
          bar,
          baz,
          toto,
          tutu,
          tata

# Naming style matching correct class attribute names.
class-attribute-naming-style=any

# Regular expression matching correct class attribute names. Overrides class-
# attribute-naming-style.
#class-attribute-rgx=

# Naming style matching correct class names.
class-naming-style=PascalCase

# Regular expression matching correct class names. Overrides class-naming-
# style.
#class-rgx=

# Naming style matching correct constant names.
const-naming-style=UPPER_CASE

# Regular expression matching correct constant names. Overrides const-naming-
# style.
#const-rgx=

# Minimum line length for functions/classes that require docstrings, shorter
# ones are exempt.
docstring-min-length=-1

# Naming style matching correct function names.
function-naming-style=snake_case

# Regular expression matching correct function names. Overrides function-
# naming-style.
#function-rgx=

# Good variable names which should always be accepted, separated by a comma.
good-names=i,
           j,
           k,
           ex,
           Run,
           _

# Include a hint for the correct naming format with invalid-name.
include-naming-hint=no

# Naming style matching correct inline iteration names.
inlinevar-naming-style=any

# Regular expression matching correct inline iteration names. Overrides
# inlinevar-naming-style.
#inlinevar-rgx=

# Naming style matching correct method names.
method-naming-style=snake_case

# Regular expression matching correct method names. Overrides method-naming-
# style.
#method-rgx=

# Naming style matching correct module names.
module-naming-style=snake_case

# Regular expression matching correct module names. Overrides module-naming-
# style.
#module-rgx=

# Colon-delimited sets of names that determine each other's naming style when
# the name regexes allow several styles.
name-group=

# Regular expression which should only match function or class names that do
# not require a docstring.
no-docstring-rgx=^_

# List of decorators that produce properties, such as abc.abstractproperty. Add
# to this list to register other decorators that produce valid properties.
# These decorators are taken in consideration only for invalid-name.
property-classes=abc.abstractproperty

# Naming style matching correct variable names.
variable-naming-style=snake_case

# Regular expression matching correct variable names. Overrides variable-
# naming-style.
#variable-rgx=


[FORMAT]

# Expected format of line ending, e.g. empty (any line ending), LF or CRLF.
expected-line-ending-format=

# Regexp for a line that is allowed to be longer than the limit.
ignore-long-lines=^\s*(# )?<?https?://\S+>?$

# Number of spaces of indent required inside a hanging or continued line.
indent-after-paren=4

# String used as indentation unit. This is usually "    " (4 spaces) or "\t" (1
# tab).
indent-string='    '

# Maximum number of characters on a single line.
max-line-length=100

# Maximum number of lines in a module.
max-module-lines=1000

# List of optional constructs for which whitespace checking is disabled. `dict-
# separator` is used to allow tabulation in dicts, etc.: {1  : 1,\n222: 2}.
# `trailing-comma` allows a space between comma and closing bracket: (a, ).
# `empty-line` allows space-only lines.
no-space-check=trailing-comma,
               dict-separator

# Allow the body of a class to be on the same line as the declaration if body
# contains single statement.
single-line-class-stmt=no

# Allow the body of an if to be on the same line as the test if there is no
# else.
single-line-if-stmt=no


[LOGGING]

# Format style used to check logging format string. `old` means using %
# formatting, while `new` is for `{}` formatting.
logging-format-style=old

# Logging modules to check that the string format arguments are in logging
# function parameter format.
logging-modules=logging


[MISCELLANEOUS]

# List of note tags to take in consideration, separated by a comma.
notes=FIXME,
      XXX,
      TODO


[SIMILARITIES]

# Ignore comments when computing similarities.
ignore-comments=yes

# Ignore docstrings when computing similarities.
ignore-docstrings=yes

# Ignore imports when computing similarities.
ignore-imports=no

# Minimum lines number of a similarity.
min-similarity-lines=4


[SPELLING]

# Limits count of emitted suggestions for spelling mistakes.
max-spelling-suggestions=4

# Spelling dictionary name. Available dictionaries: none. To make it working
# install python-enchant package..
spelling-dict=

# List of comma separated words that should not be checked.
spelling-ignore-words=

# A path to a file that contains private dictionary; one word per line.
spelling-private-dict-file=

# Tells whether to store unknown words to indicated private dictionary in
# --spelling-private-dict-file option instead of raising a message.
spelling-store-unknown-words=no


[STRING]

# This flag controls whether the implicit-str-concat-in-sequence should
# generate a warning on implicit string concatenation in sequences defined over
# several lines.
check-str-concat-over-line-jumps=no


[TYPECHECK]

# List of decorators that produce context managers, such as
# contextlib.contextmanager. Add to this list to register other decorators that
# produce valid context managers.
contextmanager-decorators=contextlib.contextmanager

# List of members which are set dynamically and missed by pylint inference
# system, and so shouldn't trigger E1101 when accessed. Python regular
# expressions are accepted.
generated-members=

# Tells whether missing members accessed in mixin class should be ignored. A
# mixin class is detected if its name ends with "mixin" (case insensitive).
ignore-mixin-members=yes

# Tells whether to warn about missing members when the owner of the attribute
# is inferred to be None.
ignore-none=yes

# This flag controls whether pylint should warn about no-member and similar
# checks whenever an opaque object is returned when inferring. The inference
# can return multiple potential results while evaluating a Python object, but
# some branches might not be evaluated, which results in partial inference. In
# that case, it might be useful to still emit no-member and other checks for
# the rest of the inferred objects.
ignore-on-opaque-inference=yes

# List of class names for which member attributes should not be checked (useful
# for classes with dynamically set attributes). This supports the use of
# qualified names.
ignored-classes=optparse.Values,thread._local,_thread._local

# List of module names for which member attributes should not be checked
# (useful for modules/projects where namespaces are manipulated during runtime
# and thus existing member attributes cannot be deduced by static analysis. It
# supports qualified module names, as well as Unix pattern matching.
ignored-modules=

# Show a hint with possible names when a member name was not found. The aspect
# of finding the hint is based on edit distance.
missing-member-hint=yes

# The minimum edit distance a name should have in order to be considered a
# similar match for a missing member name.
missing-member-hint-distance=1

# The total number of similar names that should be taken in consideration when
# showing a hint for a missing member.
missing-member-max-choices=1


[VARIABLES]

# List of additional names supposed to be defined in builtins. Remember that
# you should avoid defining new builtins when possible.
additional-builtins=

# Tells whether unused global variables should be treated as a violation.
allow-global-unused-variables=yes

# List of strings which can identify a callback function by name. A callback
# name must start or end with one of those strings.
callbacks=cb_,
          _cb

# A regular expression matching the name of dummy variables (i.e. expected to
# not be used).
dummy-variables-rgx=_+$|(_[a-zA-Z0-9_]*[a-zA-Z0-9]+?$)|dummy|^ignored_|^unused_

# Argument names that match this expression will be ignored. Default to name
# with leading underscore.
ignored-argument-names=_.*|^ignored_|^unused_

# Tells whether we should check for unused import in __init__ files.
init-import=no

# List of qualified module names which can have objects that can redefine
# builtins.
redefining-builtins-modules=six.moves,past.builtins,future.builtins,builtins,io


[CLASSES]

# List of method names used to declare (i.e. assign) instance attributes.
defining-attr-methods=__init__,
                      __new__,
                      setUp

# List of member names, which should be excluded from the protected access
# warning.
exclude-protected=_asdict,
                  _fields,
                  _replace,
                  _source,
                  _make

# List of valid names for the first argument in a class method.
valid-classmethod-first-arg=cls

# List of valid names for the first argument in a metaclass class method.
valid-metaclass-classmethod-first-arg=cls


[DESIGN]

# Maximum number of arguments for function / method.
max-args=5

# Maximum number of attributes for a class (see R0902).
max-attributes=7

# Maximum number of boolean expressions in an if statement.
max-bool-expr=5

# Maximum number of branch for function / method body.
max-branches=12

# Maximum number of locals for function / method body.
max-locals=15

# Maximum number of parents for a class (see R0901).
max-parents=7

# Maximum number of public methods for a class (see R0904).
max-public-methods=20

# Maximum number of return / yield for function / method body.
max-returns=6

# Maximum number of statements in function / method body.
max-statements=50

# Minimum number of public methods for a class (see R0903).
min-public-methods=2


[IMPORTS]

# Allow wildcard imports from modules that define __all__.
allow-wildcard-with-all=no

# Analyse import fallback blocks. This can be used to support both Python 2 and
# 3 compatible code, which means that the block might have code that exists
# only in one or another interpreter, leading to false positives when analysed.
analyse-fallback-blocks=no

# Deprecated modules which should not be used, separated by a comma.
deprecated-modules=optparse,tkinter.tix

# Create a graph of external dependencies in the given file (report RP0402 must
# not be disabled).
ext-import-graph=

# Create a graph of every (i.e. internal and external) dependencies in the
# given file (report RP0402 must not be disabled).
import-graph=

# Create a graph of internal dependencies in the given file (report RP0402 must
# not be disabled).
int-import-graph=

# Force import order to recognize a module as part of the standard
# compatibility libraries.
known-standard-library=

# Force import order to recognize a module as part of a third party library.
known-third-party=enchant


[EXCEPTIONS]

# Exceptions that will emit a warning when being caught. Defaults to
# "BaseException, Exception".
overgeneral-exceptions=BaseException,
                       Exception


================================================
FILE: .readthedocs.yml
================================================
mkdocs:
  configuration: mkdocs.yml

python:
  version: 3.7
  install:
    - requirements: docs/requirements.txt


================================================
FILE: AUTHORS.md
================================================
Zillion is written and maintained by [@totalhack](https://github.com/totalhack). Contributors welcome!

# **Core Contributors**

- [@totalhack](https://github.com/totalhack)

# **Patches and Suggestions**

Please see the [contributing](https://github.com/totalhack/zillion/blob/master/CONTRIBUTING.md)
guide for more information.


================================================
FILE: CODE_OF_CONDUCT.md
================================================
[This](https://www.kennethreitz.org/essays/be-cordial-or-be-on-your-way) is a good starting point.


================================================
FILE: CONTRIBUTING.md
================================================
Your help and feedback are greatly appreciated. Whether it's supporting/testing
a new datasource type, finding bugs, or suggesting features, every little bit
helps `Zillion` reach its potential.

Please also consider manicuring or configuring datasets that others may find
useful. With as little as a CSV and a short JSON configuration file you can
give back to the community. You can host these shared datasources easily with
GitHub.

## **How to Contribute**

1.  Check for open issues or open a new issue to start a discussion around a
    feature idea or a bug.
2.  Fork [the repository](https://github.com/totalhack/zillion) on GitHub to
    start making your changes to the **master** branch (or branch off of it).
3.  Write a test which shows that the bug was fixed or that the feature works
    as expected.
4.  Send a [pull request](https://help.github.com/en/articles/creating-a-pull-request-from-a-fork). Add yourself to
    [AUTHORS](https://github.com/totalhack/zillion/blob/master/AUTHORS.md).

## **Development Setup**

```shell
# Clone this repo
git clone https://github.com/totalhack/zillion.git
cd zillion

# Install dependencies
# Note: activate your venv first if desired!
pip install ".[dev]"

# Bring up test databases -- test data will init the first time
# You can optionally run these DBs directly on your machine instead
docker-compose up

# Run tests
export ZILLION_CONFIG=$(pwd)/tests/test_config.yaml
cd tests
pytest
```

## **Good Bug Reports**

Please be aware of the following things when filing bug reports:

1. Avoid raising duplicate issues. *Please* use the GitHub issue search feature
   to check whether your bug report or feature request has been mentioned in
   the past. Duplicate bug reports and feature requests are a huge maintenance
   burden on the limited resources of the project. If it is clear from your
   report that you would have struggled to find the original, that's ok, but
   if searching for a selection of words in your issue title would have found
   the duplicate then the issue will likely be closed.
2. When filing bug reports about exceptions or tracebacks, please include the
   *complete* traceback. Partial tracebacks, or just the exception text, are
   not helpful. Issues that do not contain complete tracebacks may be closed
   without warning.
3. Make sure you provide a suitable amount of information to work with.

## **Questions**

The GitHub issue tracker is for *bug reports* and *feature requests*. Please do
not use it to ask questions about how to use Zillion.


================================================
FILE: LICENSE
================================================
The MIT License (MIT)

Copyright (c) 2019-present, Kurt Matarese

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.


================================================
FILE: Makefile
================================================
PY := $(shell which python)
ENV := $(abspath $(dir $(PY))/..)
PIP := $(ENV)/bin/pip
UV := $(ENV)/bin/uv

PACKAGE_NAME := zillion
TEST_ENV := /tmp/zillion_pip_test

VERSION = $(shell $(PY) -c "import re,sys;print(re.search(r'__version__\\s*=\\s*[\\\"\\']([^\\\"\\']+)[\\\"\\']', open('zillion/version.py').read()).group(1))")

all: develop

# bootstrap venv and tooling
bootstrap:
	python -m venv $(ENV)
	$(PIP) install -U pip setuptools wheel build twine uv

clean:
	rm -rf build dist *.egg-info pinned-requirements.txt

docs:
	cd docs && $(PY) build_markdown.py

deploy_docs:
	$(PIP) install -U mkdocs mkdocs-material mkdocs-material-extensions
	cd docs && mkdocs gh-deploy

lock:
	$(UV) lock

sync-dev:
	$(UV) sync --dev --active --extra dev --extra mysql --extra postgres --extra duckdb

sync-ci:
	$(UV) sync --locked --active

sync-runtime:
	$(UV) sync --locked --active

# build wheel/sdist via pyproject
build:
	$(PY) -m build

# export pinned requirements (optional, for pip-only CI/Docker)
requirements:
	$(UV) export --format requirements-txt -o pinned-requirements.txt

# install built wheel into env
install-wheel: build
	$(PIP) install --force-reinstall dist/$(PACKAGE_NAME)-$(VERSION)-py3-none-any.whl

uninstall:
	if $(PIP) freeze 2>&1 | grep -q "^$(PACKAGE_NAME)=="; then \
	  $(PIP) uninstall -y $(PACKAGE_NAME); \
	else \
	  echo "No installed package found!"; \
	fi

dist: clean build

upload:
	$(PY) -m twine upload dist/*

test_upload:
	$(PY) -m twine upload --repository-url "https://test.pypi.org/legacy/" dist/*

# create a clean test venv; if uv.lock exists, optionally use uv inside the test venv to sync
test_env:
	rm -rf $(TEST_ENV)
	$(PY) -m venv $(TEST_ENV)
	$(TEST_ENV)/bin/pip install -U pip
	if [ -f uv.lock ]; then \
	  $(TEST_ENV)/bin/pip install uv; \
	  $(TEST_ENV)/bin/uv sync --locked; \
	fi

# publish to PyPI then smoke-test install in a clean venv
pip: dist upload test_env
	sleep 30
	$(TEST_ENV)/bin/pip install -U $(PACKAGE_NAME)==$(VERSION)
	$(TEST_ENV)/bin/python -c "import $(PACKAGE_NAME)"

# publish to TestPyPI then smoke-test install from TestPyPI in a clean venv
test_pip: dist test_upload test_env
	sleep 30
	$(TEST_ENV)/bin/pip install -i "https://test.pypi.org/simple/" --extra-index-url "https://pypi.org/simple/" $(PACKAGE_NAME)==$(VERSION)
	$(TEST_ENV)/bin/python -c "import $(PACKAGE_NAME)"

.PHONY: all bootstrap clean docs deploy_docs develop build export-pinned install uninstall dist upload test_upload test_env pip test_pip


================================================
FILE: README.md
================================================
Zillion: Make sense of it all
=============================

[![Generic badge](https://img.shields.io/badge/Status-Alpha-yellow.svg)](https://shields.io/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
![License: MIT](https://img.shields.io/badge/license-MIT-blue)
![Python 3.6+](https://img.shields.io/badge/python-3.6%2B-blue)
[![Downloads](https://static.pepy.tech/badge/zillion)](https://pepy.tech/project/zillion)


**Introduction**
----------------

`Zillion` is a data modeling and analytics tool that allows combining and
analyzing data from multiple datasources through a simple API. It acts as a semantic layer
on top of your data, writes SQL so you don't have to, and easily bolts onto existing
database infrastructure via SQLAlchemy Core. The `Zillion` NLP extension has experimental
support for AI-powered natural language querying and warehouse configuration.

With `Zillion` you can:

* Define a warehouse that contains a variety of SQL and/or file-like
  datasources
* Define or reflect metrics, dimensions, and relationships in your data
* Run multi-datasource reports and combine the results in a DataFrame
* Flexibly aggregate your data with multi-level rollups and table pivots
* Customize or combine fields with formulas
* Apply technical transformations including rolling, cumulative, and rank
  statistics
* Apply automatic type conversions - e.g. get a "year" dimension for free
  from a "date" column
* Save and share report specifications
* Utilize ad hoc or public datasources, tables, and fields to enrich reports
* Query your warehouse with natural language (NLP extension)
* Leverage AI to bootstrap your warehouse configurations (NLP extension)


**Table of Contents**
---------------------

* [Installation](#installation)
* [Primer](#primer)
    * [Metrics and Dimensions](#metrics-and-dimensions)
    * [Warehouse Theory](#warehouse-theory)
    * [Query Layers](#query-layers)
    * [Warehouse Creation](#warehouse-creation)
    * [Executing Reports](#executing-reports)
    * [Natural Language Querying](#natural-language-querying)
    * [Zillion Configuration](#zillion-configuration)
* [Example - Sales Analytics](#example-sales-analytics)
    * [Warehouse Configuration](#example-warehouse-config)
    * [Reports](#example-reports)
* [Advanced Topics](#advanced-topics)
    * [Subreports](#subreports)
    * [FormulaMetrics](#formula-metrics)
    * [Divisor Metrics](#divisor-metrics)
    * [Aggregation Variants](#aggregation-variants)
    * [FormulaDimensions](#formula-dimensions)
    * [DataSource Formulas](#datasource-formulas)
    * [Type Conversions](#type-conversions)
    * [AdHocMetrics](#adhoc-metrics)
    * [AdHocDimensions](#adhoc-dimensions)
    * [AdHocDataTables](#adhoc-data-tables)
    * [Technicals](#technicals)
    * [Config Variables](#config-variables)
    * [DataSource Priority](#datasource-priority)
* [Supported DataSources](#supported-datasources)
* [Multiprocess Considerations](#multiprocess-considerations)
* [Demo UI / Web API](#demo-ui)
* [Docs](#documentation)
* [How to Contribute](#how-to-contribute)


<a name="installation"></a>

**Installation**
----------------

> **Warning**: This project is in an alpha state and is subject to change. Please test carefully for production usage and report any issues.

```shell
$ pip install zillion

or

$ pip install zillion[nlp]
```

---

<a name="primer"></a>

**Primer**
----------

The following is meant to give a quick overview of some theory and
nomenclature used in data warehousing with `Zillion`, which will be useful
if you are newer to this area. You can also skip ahead to the usage [example](#example-sales-analytics) or the warehouse/datasource creation [quickstart](#warehouse-creation) options.

In short: `Zillion` writes SQL for you and makes data accessible through a very simple API:

```python
result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "=", "Partner A")
    ]
)
```

<a name="metrics-and-dimensions"></a>

### **Metrics and Dimensions**

In `Zillion` there are two main types of `Fields` that will be used in
your report requests:

1. `Dimensions`: attributes of data used for labelling, grouping, and filtering
2. `Metrics`: facts and measures that may be broken down along dimensions

A `Field` encapsulates the concept of a column in your data. For example, you
may have a `Field` called "revenue". That `Field` may occur across several
datasources or possibly in multiple tables within a single datasource. `Zillion` 
understands that all of those columns represent the same concept, and it can try 
to use any of them to satisfy reports requesting "revenue".

Likewise there are two main types of tables used to structure your warehouse:

1. `Dimension Tables`: reference/attribute tables containing only related
dimensions
2. `Metric Tables`: fact tables that may contain metrics and some related
dimensions/attributes

Dimension tables are often static or slowly growing in terms of row count and contain
attributes tied to a primary key. Some common examples would be lists of US Zip Codes or
company/partner directories.

Metric tables are generally more transactional in nature. Some common examples
would be records for web requests, ecommerce sales, or stock market price history.

<a name="warehouse-theory"></a>

### **Warehouse Theory**

If you really want to go deep on dimensional modeling and the drill-across
querying technique `Zillion` employs, I recommend reading Ralph Kimball's
[book](https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-toolkit/) on data warehousing.

To summarize, [drill-across
querying](https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/drilling-across/)
forms one or more queries to satisfy a report request for `metrics` that may
exist across multiple datasources and/or tables at a particular `dimension` grain.

`Zillion` supports flexible warehouse setups such as
[snowflake](https://en.wikipedia.org/wiki/Snowflake_schema) or
[star](https://en.wikipedia.org/wiki/Star_schema) schemas, though it isn't
picky about it. You can specify table relationships through a parent-child
lineage, and `Zillion` can also infer acceptable joins based on the presence
of dimension table primary keys. `Zillion` does not support many-to-many relationships at this time, though most analytics-focused scenarios should be able to work around that by adding views to the model if needed.

<a name="query-layers"></a>

### **Query Layers**

`Zillion` reports can be thought of as running in two layers:

1. `DataSource Layer`: SQL queries against the warehouse's datasources
2. `Combined Layer`: A final SQL query against the combined data from the
DataSource Layer

The Combined Layer is just another SQL database (in-memory SQLite by default)
that is used to tie the datasource data together and apply a few additional
features such as rollups, row filters, row limits, sorting, pivots, and technical computations.

<a name="warehouse-creation"></a>

### **Warehouse Creation**

There are multiple ways to quickly initialize a warehouse from a local or remote file:

```python
# Path/link to a CSV, XLSX, XLS, JSON, HTML, or Google Sheet
# This builds a single-table Warehouse for quick/ad-hoc analysis.
url = "https://raw.githubusercontent.com/totalhack/zillion/master/tests/dma_zip.xlsx"
wh = Warehouse.from_data_file(url, ["Zip_Code"]) # Second arg is primary key

# Path/link to a sqlite database
# This can build a single or multi-table Warehouse
url = "https://github.com/totalhack/zillion/blob/master/tests/testdb1?raw=true"
wh = Warehouse.from_db_file(url)

# Path/link to a WarehouseConfigSchema (or pass a dict)
# This is the recommended production approach!
config = "https://raw.githubusercontent.com/totalhack/zillion/master/examples/example_wh_config.json"
wh = Warehouse(config=config)
```

Zillion also provides a helper script to bootstrap a DataSource configuration file for an existing database. See `zillion.scripts.bootstrap_datasource_config.py`. The bootstrap script requires a connection/database url and output file as arguments. See `--help` output for more options, including the optional `--nlp` flag that leverages OpenAI to infer configuration information such as column types, table types, and table relationships. The NLP feature requires the NLP extension to be installed as well as the following set in your `Zillion` config file:

* OPENAI_MODEL
* OPENAI_API_KEY

<a name="executing-reports"></a>

### **Executing Reports**

The main purpose of `Zillion` is to execute reports against a `Warehouse`.
At a high level you will be crafting reports as follows:

```python
result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "=", "Partner A")
    ]
)
print(result.df) # Pandas DataFrame
```

When comparing to writing SQL, it's helpful to think of the dimensions as the
target columns of a **group by** SQL statement. Think of the metrics as the
columns you are **aggregating**. Think of the criteria as the **where
clause**. Your criteria are applied in the DataSource Layer SQL queries.

The `ReportResult` has a Pandas DataFrame with the dimensions as the index and
the metrics as the columns.

A `Report` is said to have a `grain`, which defines the dimensions each metric
must be able to join to in order to satisfy the `Report` requirements. The
`grain` is a combination of **all** dimensions, including those referenced in
criteria or in metric formulas. In the example above, the `grain` would be
`{date, partner}`. Both "revenue" and "leads" must be able to join to those
dimensions for this report to be possible.

These concepts can take time to sink in and obviously vary with the specifics
of your data model, but you will become more familiar with them as you start
putting together reports against your data warehouses.

<a name="natural-language-querying"></a>

### **Natural Language Querying**

With the NLP extension `Zillion` has experimental support for natural language querying of your data warehouse. For example:

```python
result = warehouse.execute_text("revenue and leads by date last month")
print(result.df) # Pandas DataFrame
```

This NLP feature requires a running instance of Qdrant (vector database) and the following values set in your `Zillion` config file:

* QDRANT_HOST
* OPENAI_API_KEY

Embeddings will be produced and stored in both Qdrant and a local cache. The
vector database will be initialized the first time you try to use this by
analyzing all fields in your warehouse. An example docker file to run Qdrant is provided in the root of this repo.

You have some control over how fields get embedded. Namely, in the configuration for any field you can choose whether to exclude it from embeddings or override which embedding text maps to that field. All fields are
included by default. The following example would exclude the `net_revenue` field from being embedded and map `revenue` metric requests to the `gross_revenue` field.

```javascript
{
    "name": "gross_revenue",
    "type": "numeric(10,2)",
    "aggregation": "sum",
    "rounding": 2,
    "meta": {
        "nlp": {
            // enabled defaults to true
            "embedding_text": "revenue" // str or list of str
        }
    }
},
{
    "name": "net_revenue",
    "type": "numeric(10,2)",
    "aggregation": "sum",
    "rounding": 2,
    "meta": {
        "nlp": {
            "enabled": false
        }
    }
},
```

Additionally you may also exclude fields via the following warehouse-level configuration settings:

```javascript
{
    "meta": {
        "nlp": {
            "field_disabled_patterns": [
                // list of regex patterns to exclude
                "rpl_ma_5"
            ],
            "field_disabled_groups": [
                // list of "groups" to exclude, assuming you have
                // set group value in the field's meta dict.
                "No NLP"
            ]
        }
    },
    ...
}
```

If a field is disabled at any of the aforementioned levels it will be ignored. This type of control becomes useful as your data model gets more complex and you want to guide the NLP logic in cases where it could confuse similarly named fields. Any time you adjust which fields are excluded you will want to force recreation of your embeddings collection using the `force_recreate` flag on `Warehouse.init_embeddings`.
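
For example, after adjusting exclusions you might refresh the embeddings along these lines (a sketch, assuming `init_embeddings` is called on your `Warehouse` instance as referenced above):

```python
# Rebuild the embeddings collection after changing NLP field exclusions.
# force_recreate drops the existing collection and re-analyzes all fields.
wh.init_embeddings(force_recreate=True)
```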

> *Note:* This feature is in its infancy. Its usefulness will depend on the
quality of both the input query and your data model (i.e. good field names)!

<a name="zillion-configuration"></a>

### **Zillion Configuration**

In addition to configuring the structure of your `Warehouse`, which will be
discussed further below, `Zillion` has a global configuration to control some
basic settings. The `ZILLION_CONFIG` environment var can point to a yaml config file. See `examples/sample_config.yaml` for more details on what values can be set. Environment vars prefixed with ZILLION_ can override config settings (e.g. ZILLION_DB_URL will override DB_URL).

The database used to store Zillion report specs can be configured by setting the DB_URL value in your `Zillion` config to a valid database connection string. By default a SQLite DB in /tmp is used.

---

<a name="example-sales-analytics"></a>

**Example - Sales Analytics**
-----------------------------

Below we will walk through a simple hypothetical sales data model that
demonstrates basic `DataSource` and `Warehouse` configuration and then shows
some sample [reports](#example-reports). The data is a simple SQLite database
that is part of the `Zillion` test code. For reference, the schema is as
follows:

```sql
CREATE TABLE partners (
  id INTEGER PRIMARY KEY,
  name VARCHAR NOT NULL UNIQUE,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE campaigns (
  id INTEGER PRIMARY KEY,
  name VARCHAR NOT NULL UNIQUE,
  category VARCHAR NOT NULL,
  partner_id INTEGER NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE leads (
  id INTEGER PRIMARY KEY,
  name VARCHAR NOT NULL,
  campaign_id INTEGER NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE sales (
  id INTEGER PRIMARY KEY,
  item VARCHAR NOT NULL,
  quantity INTEGER NOT NULL,
  revenue DECIMAL(10, 2),
  lead_id INTEGER NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

<a name="example-warehouse-config"></a>

### **Warehouse Configuration**

A `Warehouse` may be created from a JSON or YAML configuration that defines
its fields, datasources, and tables. The code below shows how it can be done in as little as one line of code if you have a pointer to a JSON/YAML `Warehouse` config.

```python
from zillion import Warehouse

wh = Warehouse(config="https://raw.githubusercontent.com/totalhack/zillion/master/examples/example_wh_config.json")
```

This example config uses a `data_url` in its `DataSource` `connect` info that
tells `Zillion` to dynamically download that data and connect to it as a
SQLite database. This is useful for quick examples or analysis, though in most
scenarios you would put a connection string to an existing database like you
see
[here](https://raw.githubusercontent.com/totalhack/zillion/master/tests/test_mysql_ds_config.json)

The basics of `Zillion's` warehouse configuration structure are as follows:

A `Warehouse` config has the following main sections:

* `metrics`: optional list of metric configs for global metrics
* `dimensions`: optional list of dimension configs for global dimensions
* `datasources`: mapping of datasource names to datasource configs or config URLs

A `DataSource` config has the following main sections:

* `connect`: database connection url or dict of connect params
* `metrics`: optional list of metric configs specific to this datasource
* `dimensions`: optional list of dimension configs specific to this datasource
* `tables`: mapping of table names to table configs or config URLs

> Tip: datasource and table configs may also be replaced with a URL that points
to a local or remote config file.

In this example all four tables in our database are included in the config,
two as dimension tables and two as metric tables. The tables are linked
through a parent->child relationship: partners to campaigns, and leads to
sales.  Some tables also utilize the `create_fields` flag to automatically
create `Fields` on the datasource from column definitions. Other metrics and
dimensions are defined explicitly.
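
The overall shape of such a config, expressed as a Python dict, looks roughly like the sketch below. This is a trimmed illustration rather than the exact schema of the linked example config; table-level keys not mentioned above (such as `type` and `columns`) are assumptions, so consult the config docs for the authoritative schema.

```python
from zillion import Warehouse

# Rough sketch of a warehouse config as a Python dict (illustrative only).
config = {
    "metrics": [
        {"name": "revenue", "type": "numeric(10,2)", "aggregation": "sum", "rounding": 2},
    ],
    "dimensions": [
        {"name": "partner_name", "type": "string(50)"},
    ],
    "datasources": {
        "my_datasource": {
            "connect": "sqlite:////path/to/my.db",  # or a dict of connect params
            "tables": {
                "main.partners": {
                    "type": "dimension",      # assumed key: dimension vs. metric table
                    "create_fields": True,    # auto-create fields from column definitions
                    "columns": {
                        "name": {"fields": ["partner_name"]},  # assumed column->field mapping
                    },
                },
            },
        },
    },
}

wh = Warehouse(config=config)
```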

To view the structure of this `Warehouse` after init you can use the `print_info`
method which shows all metrics, dimensions, tables, and columns that are part
of your data warehouse:

```python
wh.print_info() # Formatted print of the Warehouse structure
```

For a deeper dive of the config schema please see the full
[docs](https://totalhack.github.io/zillion/zillion.configs/).

<a name="example-reports"></a>

### **Reports**

**Example:** Get sales, leads, and revenue by partner:

```python
result = wh.execute(
    metrics=["sales", "leads", "revenue"],
    dimensions=["partner_name"]
)

print(result.df)
"""
              sales  leads  revenue
partner_name
Partner A        11      4    165.0
Partner B         2      2     19.0
Partner C         5      1    118.5
"""
```

**Example:** Let's limit to Partner A and break down by its campaigns:

```python
result = wh.execute(
    metrics=["sales", "leads", "revenue"],
    dimensions=["campaign_name"],
    criteria=[("partner_name", "=", "Partner A")]
)

print(result.df)
"""
               sales  leads  revenue
campaign_name
Campaign 1A        5      2       83
Campaign 2A        6      2       82
"""
```

**Example:** The output below shows rollups at the campaign level within each
partner, and also a rollup of totals at the partner and campaign level.

> *Note:* the output contains a special character to mark DataFrame rollup rows
that were added to the result. The
[ReportResult](https://totalhack.github.io/zillion/zillion.report/#reportresult)
object contains some helper attributes to automatically access or filter
rollups, as well as a `df_display` attribute that returns the result with
friendlier display values substituted for special characters. The
under-the-hood special character is left here for illustration, but may not
render the same in all scenarios.

```python
from zillion import RollupTypes

result = wh.execute(
    metrics=["sales", "leads", "revenue"],
    dimensions=["partner_name", "campaign_name"],
    rollup=RollupTypes.ALL
)

print(result.df)
"""
                            sales  leads  revenue
partner_name campaign_name
Partner A    Campaign 1A      5.0    2.0     83.0
             Campaign 2A      6.0    2.0     82.0
             􏿿               11.0    4.0    165.0
Partner B    Campaign 1B      1.0    1.0      6.0
             Campaign 2B      1.0    1.0     13.0
             􏿿                2.0    2.0     19.0
Partner C    Campaign 1C      5.0    1.0    118.5
             􏿿                5.0    1.0    118.5
􏿿            􏿿               18.0    7.0    302.5
"""
```

See the `Report`
[docs](https://totalhack.github.io/zillion/zillion.report/#report) for more
information on supported rollup behavior.

**Example:** Save a report spec (not the data):

First you must make sure you have saved your `Warehouse`, as saved reports
are scoped to a particular `Warehouse` ID. To save a `Warehouse`
you must provide a URL that points to the complete config.

```python
name = "My Unique Warehouse Name"
config_url = <some url pointing to a complete warehouse config>
wh.save(name, config_url) # wh.id is populated after this

spec_id = wh.save_report(
    metrics=["sales", "leads", "revenue"],
    dimensions=["partner_name"]
)
```

> *Note*: If you built your `Warehouse` in python from a list of `DataSources`,
or passed in a `dict` for the `config` param on init, there currently is not
a built-in way to output a complete config to a file for reference when saving.

**Example:** Load and run a report from a spec ID:

```python
result = wh.execute_id(spec_id)
```

This assumes you have saved this report ID previously in the database specified by the DB_URL in your `Zillion` yaml configuration.

**Example:** Unsupported Grain

If you attempt an impossible report, you will get an
`UnsupportedGrainException`. The report below is impossible because it
attempts to break down the leads metric by a dimension that only exists
in a child table. Generally speaking, child tables can join back up to
parents (and "siblings" of parents) to find dimensions, but not the other
way around.

```python
# Fails with UnsupportedGrainException
result = wh.execute(
    metrics=["leads"],
    dimensions=["sale_id"]
)
```

---

<a name="advanced-topics"></a>

**Advanced Topics**
-------------------

<a name="subreports"></a>

### **Subreports**

Sometimes you need subquery-like functionality in order to filter one
report to the results of some other (that perhaps required a different grain).
Zillion provides a simplistic way of doing that by using the `in report` or `not in report`
criteria operations. There are two supported ways to specify the subreport: passing a
report spec ID or passing a dict of report params.

```python
# Assuming you have saved report 1234 and it has "partner" as a dimension:

result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "in report", 1234)
    ]
)

# Or with a dict:

result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "in report", dict(
            metrics=[...],
            dimension=["partner"],
            criteria=[...]
        ))
    ]
)
```

The criteria field used in `in report` or `not in report` must be a dimension
in the subreport. Note that subreports are executed at `Report` object initialization
time instead of during `execute` -- as such they can not be killed using `Report.kill`.
This may change down the road.

<a name="formula-metrics"></a>

### **Formula Metrics**

In our example above our config included a formula-based metric called "rpl",
which is simply `revenue / leads`. A `FormulaMetric` combines other metrics
and/or dimensions to calculate a new metric at the Combined Layer of
querying. The syntax must match your Combined Layer database, which is SQLite
in our example.

```json
{
    "name": "rpl",
    "aggregation": "mean",
    "rounding": 2,
    "formula": "{revenue}/{leads}"
}
```

<a name="divisor-metrics"></a>

### **Divisor Metrics**

As a convenience, rather than having to repeatedly define formula metrics for
rate variants of a core metric, you can specify a divisor metric configuration on a non-formula metric. As an example, say you have a `revenue` metric and want to create variants for `revenue_per_lead` and `revenue_per_sale`. You can define your revenue metric as follows:

```json
{
    "name": "revenue",
    "type": "numeric(10,2)",
    "aggregation": "sum",
    "rounding": 2,
    "divisors": {
        "metrics": [
            "leads",
            "sales"
        ]
    }
}
```

See `zillion.configs.DivisorsConfigSchema` for more details on configuration options, such as overriding naming templates, formula templates, and rounding.
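
With that config the warehouse exposes the rate variants alongside the base metric, so a report can request them directly (names below assume the default naming template):

```python
# revenue_per_lead and revenue_per_sale are generated from the divisors config
result = wh.execute(
    metrics=["revenue", "revenue_per_lead", "revenue_per_sale"],
    dimensions=["partner_name"],
)
print(result.df)
```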

<a name="aggregation-variants"></a>

### **Aggregation Variants**

Another minor convenience feature is the ability to automatically generate variants of metrics for different aggregation types in a single field configuration instead of across multiple fields in your config file. As an example, say you have a `sales` column in your data and want to create variants for `sales_mean` and `sales_sum`. You can define your metric as follows:

```json
{
    "name": "sales",
    "aggregation": {
        "mean": {
            "type": "numeric(10,2)",
            "rounding": 2
        },
        "sum": {
            "type": "integer"
        }
    }
}
```

The resulting warehouse would not have a `sales` metric, but would instead have `sales_mean` and `sales_sum`. Note that you can further customize the settings for the generated fields, such as setting a custom name, by specifying that in the nested settings for that aggregation type. In practice this is not a big efficiency gain over just defining the metrics separately, but some may prefer this approach.

<a name="formula-dimensions"></a>

### **Formula Dimensions**

Experimental support exists for `FormulaDimension` fields as well. A `FormulaDimension` can only use other dimensions as part of its formula, and it also gets evaluated in the Combined Layer database. As an additional restriction, a `FormulaDimension` can not be used in report criteria as those filters are evaluated at the DataSource Layer. The following example assumes a SQLite Combined Layer database:


```json
{
    "name": "partner_is_a",
    "formula": "{partner_name} = 'Partner A'"
}
```

<a name="datasource-formulas"></a>

### **DataSource Formulas**

Our example also includes a metric "sales" whose value is calculated via
formula at the DataSource Layer of querying. Note the following in the
`fields` list for the "id" param in the "main.sales" table. These formulas are
in the syntax of the particular `DataSource` database technology, which also
happens to be SQLite in our example.

```json
"fields": [
    "sale_id",
    {"name":"sales", "ds_formula": "COUNT(DISTINCT sales.id)"}
]
```

<a name="type-conversions"></a>

### **Type Conversions**

Our example also automatically created a handful of dimensions from the
"created_at" columns of the leads and sales tables. Support for automatic type
conversions is limited, but for date/datetime columns in supported
`DataSource` technologies you can get a variety of dimensions for free this
way.

The output of `wh.print_info` will show the added dimensions, which are
prefixed with "lead_" or "sale_" as specified by the optional
`type_conversion_prefix` in the config for each table. Some examples of
auto-generated dimensions in our example warehouse include sale_hour,
sale_day_name, sale_month, sale_year, etc. 
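
These auto-generated dimensions can be used in reports like any explicitly defined dimension. A small sketch against the example warehouse above:

```python
# sale_year is derived automatically from the created_at column of the sales table
result = wh.execute(
    metrics=["revenue"],
    dimensions=["sale_year"],
)
print(result.df)
```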

As an optimization in the where clause of underlying report queries, `Zillion` 
will try to apply conversions to criteria values instead of columns. For example, 
it is generally more efficient to query as `my_datetime > '2020-01-01' and my_datetime < '2020-01-02'`
instead of `DATE(my_datetime) == '2020-01-01'`, because the latter can prevent index
usage in many database technologies. The ability to apply conversions to values
instead of columns varies by field and `DataSource` technology as well. 

To prevent type conversions, set `skip_conversion_fields` to `true` on your
`DataSource` config.

See `zillion.field.TYPE_ALLOWED_CONVERSIONS` and `zillion.field.DIALECT_CONVERSIONS`
for more details on currently supported conversions.

<a name="adhoc-metrics"></a>

### **Ad Hoc Metrics**

You may also define metrics "ad hoc" with each report request. Below is an
example that creates a revenue-per-lead metric on the fly. These only exist
within the scope of the report, and the name can not conflict with any existing
fields:

```python
result = wh.execute(
    metrics=[
        "leads",
        {"formula": "{revenue}/{leads}", "name": "my_rpl"}
    ],
    dimensions=["partner_name"]
)
```

<a name="adhoc-dimensions"></a>

### **Ad Hoc Dimensions**

You may also define dimensions "ad hoc" with each report request. Below is an
example that creates a dimension that partitions on a particular dimension value on the fly. Ad Hoc Dimensions are a subclass of `FormulaDimension`s and therefore have the same restrictions, such as not being able to use a metric as a formula field. These only exist within the scope of the report, and the name can not conflict with any existing fields:

```python
result = wh.execute(
    metrics=["leads"],
    dimensions=[{"name": "partner_is_a", "formula": "{partner_name} = 'Partner A'"]
)
```

<a name="adhoc-tables"></a>

### **Ad Hoc Tables**

`Zillion` also supports creation or syncing of ad hoc tables in your database
during `DataSource` or `Warehouse` init. An example of a table config that
does this is shown
[here](https://github.com/totalhack/zillion/blob/master/tests/test_adhoc_ds_config.json).
It uses the table config's `data_url` and `if_exists` params to control the
syncing and/or creation of the "main.dma_zip" table from a remote CSV in a
SQLite database.  The same can be done in other database types too.
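
A rough sketch of the relevant parts of such a table config is shown below. This is not the exact linked file, and the `if_exists` value in particular is an assumption for illustration; see the config schema docs for the accepted modes:

```python
# Sketch of an ad hoc table config (loosely mirroring the linked example).
# "data_url" points at the remote CSV to download; "if_exists" controls
# behavior when the table already exists (the value shown is an assumption).
adhoc_table_config = {
    "type": "dimension",
    "create_fields": True,
    "primary_key": ["Zip_Code"],
    "data_url": "https://raw.githubusercontent.com/totalhack/zillion/master/tests/dma_zip.csv",
    "if_exists": "ignore",
}
```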

The potential performance drawbacks to such an approach should be obvious,
particularly if you are initializing your warehouse often or if the remote
data file is large. It is often better to sync and create your data ahead of
time so you have complete schema control, but this method can be very useful
in certain scenarios.

> **Warning**: be careful not to overwrite existing tables in your database!

<a name="technicals"></a>

### **Technicals**

There are a variety of technical computations that can be applied to metrics to
compute rolling, cumulative, or rank statistics. For example, to compute a 5-point
moving average on revenue one might define a new metric as follows:

```json
{
    "name": "revenue_ma_5",
    "type": "numeric(10,2)",
    "aggregation": "sum",
    "rounding": 2,
    "technical": "mean(5)"
}
```

Technical computations are computed at the Combined Layer, whereas the "aggregation"
is done at the DataSource Layer (hence needing to define both above). 

For more info on how shorthand technical strings are parsed, see the
[parse_technical_string](https://totalhack.github.io/zillion/zillion.configs/#parse_technical_string)
code. For a full list of supported technical types see
`zillion.core.TechnicalTypes`.

Technicals also support two modes: "group" and "all". The mode controls how to
apply the technical computation across the data's dimensions. In "group" mode,
it computes the technical across the last dimension, whereas in "all" mode it
computes the technical across all data without any regard for dimensions.

The point of this becomes more clear if you try to do a "cumsum" technical
across data broken down by something like ["partner_name", "date"]. If "group"
mode is used (the default in most cases) it will do cumulative sums *within*
each partner over the date ranges. If "all" mode is used, it will do a
cumulative sum across every data row. You can be explicit about the mode by
appending it to the technical string: i.e. "cumsum:all" or "mean(5):group".
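
To make the modes concrete, here is a sketch of a report requesting the metric defined above alongside plain revenue, broken down by ["partner_name", "date"]; with "group" mode (the default in most cases) the moving average is computed within each partner:

```python
# With "group" mode the 5-point moving average restarts for each
# partner_name; with "mean(5):all" it would run across every row.
result = wh.execute(
    metrics=["revenue", "revenue_ma_5"],
    dimensions=["partner_name", "date"]
)
```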

---

<a name="config-variables"></a>

### **Config Variables**

If you'd like to avoid putting sensitive connection information directly in
your `DataSource` configs you can leverage config variables. In your `Zillion`
yaml config you can specify a `DATASOURCE_CONTEXTS` section as follows:

```yaml
DATASOURCE_CONTEXTS:
  my_ds_name:
    user: user123
    pass: goodpassword
    host: 127.0.0.1
    schema: reporting
```

Then when your `DataSource` config for the datasource named "my_ds_name" is
read, it can use this context to populate variables in your connection url:

```json
"datasources": {
    "my_ds_name": {
        "connect": "mysql+pymysql://{user}:{pass}@{host}/{schema}"
        ...
    }
}
```

<a name="datasource-priority"></a>

### **DataSource Priority**

On `Warehouse` init you can specify a default priority order for datasources
by name. This will come into play when a report could be satisfied by multiple
datasources. `DataSources` earlier in the list will be higher priority. This
would be useful if you wanted to favor a set of faster, aggregate tables that
are grouped in a `DataSource`.

```python
wh = Warehouse(config=config, ds_priority=["aggr_ds", "raw_ds", ...])
```

<a name="supported-datasources"></a>

**Supported DataSources**
-------------------------

`Zillion's` goal is to support any database technology that SQLAlchemy
supports (pictured below). That said, the support and testing levels in `Zillion` vary at the
moment. In particular, the ability to do type conversions, database
reflection, and kill running queries all require some database-specific code
for support. The following list summarizes known support levels. Your mileage
may vary with untested database technologies that SQLAlchemy supports (it
might work just fine, just hasn't been tested yet). Please report bugs and
help add more support!

* SQLite: supported
* MySQL: supported
* PostgreSQL: supported
* DuckDB: supported
* BigQuery, Redshift, Snowflake, SingleStore, PlanetScale, etc: not tested but would like to support these

SQLAlchemy has connectors to many popular databases. The barrier to supporting many of these is likely
pretty low given the simple nature of the SQL operations `Zillion` uses.

![SQLAlchemy Connectors](https://github.com/totalhack/zillion/blob/master/docs/images/sqlalchemy_connectors.webp?raw=true)

Note that the above is different from the database support for the Combined Layer
database. Currently only SQLite is supported there; that should be sufficient for
most use cases but more options will be added down the road.

<a name="multiprocess-considerations"></a>

**Multiprocess Considerations**
-------------------------------

If you plan to run `Zillion` in a multiprocess scenario, whether on a single
node or across multiple nodes, there are a couple of things to consider:

* SQLite DataSources do not scale well and may run into locking issues with multiple processes trying to access them on the same node.
* Any file-based database technology that isn't centrally accessible would be challenging when using multiple nodes.
* Ad Hoc DataSource and Ad Hoc Table downloads should be avoided as they may conflict/repeat across each process. Offload this to an external
ETL process that is better suited to manage those data flows in a scalable production scenario.

Note that you can still use the default SQLite in-memory Combined Layer DB without issues, as that is made on the fly with each report request and
requires no coordination/communication with other processes or nodes.

<a name="demo-ui"></a>

**Demo UI / Web API**
--------------------

[Zillion Web UI](https://github.com/totalhack/zillion-web) is a demo UI and web API for Zillion that also includes an experimental ChatGPT plugin. See the README there for more info on installation and project structure. Please note that the code is light on testing and polish, but is expected to work in modern browsers. Also ChatGPT plugins are quite slow at the moment, so currently that is mostly for fun and not that useful.

---

<a name="documentation"></a>

**Documentation**
-----------------

More thorough documentation can be found [here](https://totalhack.github.io/zillion/).
You can supplement your knowledge by perusing the [tests](https://github.com/totalhack/zillion/tree/master/tests) directory
or the [API reference](https://totalhack.github.io/zillion/).


---

<a name="how-to-contribute"></a>

**How to Contribute**
---------------------

Please see the
[contributing](https://github.com/totalhack/zillion/blob/master/CONTRIBUTING.md)
guide for more information. If you are looking for inspiration, adding support and tests for additional database technologies would be a great help.






================================================
FILE: dev_config.yml
================================================
DEBUG: false
LOG_LEVEL: WARNING
LOAD_TABLE_CHUNK_SIZE: 5000
DB_URL: sqlite:////tmp/zillion.db
ADHOC_DATASOURCE_DIRECTORY: /tmp

# OPENAI_API_KEY: xyz
OPENAI_MODEL: gpt-3.5-turbo
QDRANT_HOST: localhost

DATASOURCE_QUERY_MODE: "sequential"
DATASOURCE_QUERY_TIMEOUT: null
DATASOURCE_QUERY_WORKERS: 4

DB_ENGINE_POOL_SIZE: 7

DATASOURCE_CONTEXTS:
  testdb2:
    schema: testdb2
  test_adhoc_db:
    user: totalhack
  duckdb:
    schema: zillion_test
  mysql:
    user: root
    host: 127.0.0.1
    schema: zillion_test
  postgresql:
    user: postgres
    host: 127.0.0.1
    schema: zillion_test

TEST:
  MySQLHost: 127.0.0.1
  MySQLPort: 3306
  MySQLUser: root
  MySQLTestSchema: zillion_test
  PostgreSQLHost: 127.0.0.1
  PostgreSQLPort: 5432
  PostgreSQLUser: postgres
  PostgreSQLTestSchema: zillion_test
  DuckDBTestSchemaBase: zillion_test


================================================
FILE: docker-compose-nlp.yml
================================================
version: "3.8"
services:    
  qdrant:
    image: qdrant/qdrant
    ports:
      - 6333:6333
      - 6334:6334
    environment:
      - TZ=America/New_York
    volumes:
      - ./volumes/qdrant:/qdrant/storage


================================================
FILE: docker-compose.yml
================================================
version: "3.8"
services:

  mysql:
    image: mysql:8.0.32
    ports:
      - 3306:3306
    command: ['--default-authentication-plugin=mysql_native_password', '--character-set-server=utf8mb4', '--collation-server=utf8mb4_unicode_ci']
    environment:
      - MYSQL_ROOT_PASSWORD=
      - MYSQL_DATABASE=zillion_test
      - MYSQL_ALLOW_EMPTY_PASSWORD=1
      - TZ=America/New_York
    volumes:
      - test-mysql8-data:/var/lib/mysql/

  postgres:
    image: postgres:15
    ports:
      - 5432:5432
    environment:
      - PGDATA=/var/lib/postgresql/data/pgdata_test
      - POSTGRES_SERVER=127.0.0.1
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=
      - POSTGRES_HOST_AUTH_METHOD=trust      
      - POSTGRES_DB=zillion_test
    volumes:
      - test-pg15-data:/var/lib/postgresql/data/pgdata_test

volumes:
  test-pg15-data:
  test-mysql8-data:


================================================
FILE: docs/build_markdown.py
================================================
import enum
import importlib
import inspect
import os
import pkgutil
import shutil

import markdown
from tlbx import st

import zillion


CWD = os.path.dirname(os.path.abspath(__file__))

OPTS = dict(
    extensions=["pymdownx.snippets", "admonition"],
    extension_configs={"pymdownx.snippets": {"base_path": CWD}},
)


def get_classes(module):
    return set(
        [
            x
            for x in inspect.getmembers(module, inspect.isclass)
            if (not x[0].startswith("_"))
            and x[1].__module__ == module.__name__
            and not type(x[1]) is enum.EnumMeta
        ]
    )


def get_funcs(module):
    return set(
        [
            x
            for x in inspect.getmembers(module, inspect.isfunction)
            if (not x[0].startswith("_")) and x[1].__module__ == module.__name__
        ]
    )


def get_object_attributes(obj):
    return set(
        [y[0] for y in inspect.getmembers(obj, lambda x: not inspect.isroutine(x))]
    )


def get_zillion_members(obj):
    member_set = set()
    for cls in obj.mro():
        if not cls.__module__.startswith("zillion"):
            break
        member_set |= cls.__dict__.keys()
    member_set = {x for x in member_set if (not x.startswith("_"))}
    member_set -= get_object_attributes(obj)
    return sorted(member_set)


def process_markdown(infile, outfile, **opts):
    with open(infile, "r") as f:
        text = f.read()
    md = markdown.Markdown(**opts)
    md.convert(text)
    md = u"\n".join(md.lines)
    with open(outfile, "w") as f:
        f.write(md)


def linkcode_resolve(obj):
    try:
        fn = inspect.getsourcefile(inspect.unwrap(obj))
    except TypeError:
        fn = None
    if not fn:
        return None

    try:
        source, lineno = inspect.getsourcelines(obj)
    except OSError:
        lineno = None

    if lineno:
        linespec = "#L{:d}-L{:d}".format(lineno, lineno + len(source) - 1)
    else:
        linespec = ""

    fn = os.path.relpath(fn, start=os.path.dirname(zillion.__file__))

    return "https://github.com/totalhack/zillion/blob/master/zillion/" "{}{}".format(
        fn, linespec
    )


def create_module_file(fullname):
    module = importlib.import_module(fullname)
    classes = get_classes(module)
    funcs = get_funcs(module)

    out = "[//]: # (This is an auto-generated file. Do not edit)\n"
    out += "# Module %s\n\n" % fullname

    for name, obj in sorted(classes):
        if issubclass(obj, Exception):
            # These cause errors in inspect.signature call in mkautodoc
            continue

        codelink = linkcode_resolve(obj)
        if codelink:
            out += "\n## [%s](%s)\n\n" % (name, codelink)
        else:
            out += "\n## %s\n\n" % name

        if obj.__bases__ and obj.__bases__ != (object,):
            base_names = ", ".join(
                [x.__module__ + "." + x.__name__ for x in obj.__bases__]
            )
            out += "*Bases*: %s\n\n" % base_names

        members = get_zillion_members(obj)
        if members:
            members = ":members: " + " ".join(members)
        else:
            members = ""

        out += CLASS_TEMPLATE % dict(name=fullname + "." + name, members=members)
        out += "\n"

    for name, obj in sorted(funcs):
        codelink = linkcode_resolve(obj)
        if codelink:
            out += "\n## [%s](%s)\n\n" % (name, codelink)
        else:
            out += "\n## %s\n\n" % name
        out += FUNC_TEMPLATE % dict(name=fullname + "." + name)
        out += "\n"

    filename = "%s/mkdocs/%s" % (CWD, fullname + ".md")
    print("Writing %s" % filename)
    with open(filename, "w") as f:
        f.write(out)


# -------- Build main README


INPUT_FILE = "%s/readme.md" % CWD
OUTPUT_FILE = "%s/../README.md" % CWD
print("Building %s from %s" % (OUTPUT_FILE, INPUT_FILE))
process_markdown(INPUT_FILE, OUTPUT_FILE, **OPTS)


# -------- CONTRIBUTING.md


INPUT_FILE = "%s/markdown/contributing.md" % CWD
OUTPUT_FILE = "%s/../CONTRIBUTING.md" % CWD
print("Building %s from %s" % (OUTPUT_FILE, INPUT_FILE))
shutil.copyfile(INPUT_FILE, OUTPUT_FILE)


INPUT_FILE = "%s/markdown/contributing.md" % CWD
OUTPUT_FILE = "%s/mkdocs/contributing.md" % CWD
print("Building %s from %s" % (OUTPUT_FILE, INPUT_FILE))
shutil.copyfile(INPUT_FILE, OUTPUT_FILE)


# -------- Build mkdocs index


INPUT_FILE = "%s/mkdocs_index.md" % CWD
OUTPUT_FILE = "%s/mkdocs/index.md" % CWD
print("Building %s from %s" % (OUTPUT_FILE, INPUT_FILE))
process_markdown(INPUT_FILE, OUTPUT_FILE, **OPTS)


# -------- Build API docs


API_FILE = "%s/mkdocs/api.md" % CWD
out = "# API Reference\n"

CLASS_TEMPLATE = """::: %(name)s
    :docstring:
    %(members)s
"""

FUNC_TEMPLATE = """::: %(name)s
    :docstring:
"""

walk = pkgutil.walk_packages(["../zillion"])

for module in walk:
    fullname = "zillion." + module.name
    path = fullname + ".md"
    md = "* [%s](%s)" % (fullname, path)
    out += "\n%s" % md
    create_module_file(fullname)

with open(API_FILE, "w") as f:
    f.write(out)


================================================
FILE: docs/markdown/contributing.md
================================================
Your help and feedback are greatly appreciated. Whether it's supporting/testing
a new datasource type, finding bugs, or suggesting features, every little bit
helps make `Zillion` reach its potential. 

Please also consider manicuring or configuring datasets that others may find
useful. With as little as a CSV and a short JSON configuration file you can
give back to the community. You can host these shared datasources easily with
GitHub.

## **How to Contribute**

1.  Check for open issues or open a new issue to start a discussion around a
    feature idea or a bug.
2.  Fork [the repository](https://github.com/totalhack/zillion) on GitHub to
    start making your changes to the **master** branch (or branch off of it).
3.  Write a test which shows that the bug was fixed or that the feature works
    as expected.
4.  Send a [pull request](https://help.github.com/en/articles/creating-a-pull-request-from-a-fork). Add yourself to
    [AUTHORS](https://github.com/totalhack/zillion/blob/master/AUTHORS.md).

## **Development Setup**

```shell
# Clone this repo
git clone https://github.com/totalhack/zillion.git
cd zillion

# Install dependencies
# Note: activate your venv first if desired!
pip install ".[dev]"

# Bring up test databases -- test data will init the first time
# You can optionally run these DBs directly on your machine instead
docker-compose up

# Run tests
export ZILLION_CONFIG=$(pwd)/tests/test_config.yaml
cd tests
pytest
```

## **Good Bug Reports**

Please be aware of the following things when filing bug reports:

1. Avoid raising duplicate issues. *Please* use the GitHub issue search feature
   to check whether your bug report or feature request has been mentioned in
   the past. Duplicate bug reports and feature requests are a huge maintenance
   burden on the limited resources of the project. If it is clear from your
   report that you would have struggled to find the original, that's ok, but
   if searching for a selection of words in your issue title would have found
   the duplicate then the issue will likely be closed.
2. When filing bug reports about exceptions or tracebacks, please include the
   *complete* traceback. Partial tracebacks, or just the exception text, are
   not helpful. Issues that do not contain complete tracebacks may be closed
   without warning.
3. Make sure you provide a suitable amount of information to work with.

## **Questions**

The GitHub issue tracker is for *bug reports* and *feature requests*. Please do
not use it to ask questions about how to use Zillion.


================================================
FILE: docs/markdown/readme_badges.md
================================================
[![Generic badge](https://img.shields.io/badge/Status-Alpha-yellow.svg)](https://shields.io/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
![License: MIT](https://img.shields.io/badge/license-MIT-blue)
![Python 3.6+](https://img.shields.io/badge/python-3.6%2B-blue)
[![Downloads](https://static.pepy.tech/badge/zillion)](https://pepy.tech/project/zillion)


================================================
FILE: docs/markdown/readme_contents.md
================================================
<a name="installation"></a>

**Installation**
----------------

> **Warning**: This project is in an alpha state and is subject to change. Please test carefully for production usage and report any issues.

```shell
$ pip install zillion

or

$ pip install zillion[nlp]
```

---

<a name="primer"></a>

**Primer**
----------

The following is meant to give a quick overview of some theory and
nomenclature used in data warehousing with `Zillion` which will be useful
if you are newer to this area. You can also skip below for a usage [example](#example-sales-analytics) or warehouse/datasource creation [quickstart](#warehouse-creation) options.

In short: `Zillion` writes SQL for you and makes data accessible through a very simple API:

```python
result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "=", "Partner A")
    ]
)
```

<a name="metrics-and-dimensions"></a>

### **Metrics and Dimensions**

In `Zillion` there are two main types of `Fields` that will be used in
your report requests:

1. `Dimensions`: attributes of data used for labelling, grouping, and filtering
2. `Metrics`: facts and measures that may be broken down along dimensions

A `Field` encapsulates the concept of a column in your data. For example, you
may have a `Field` called "revenue". That `Field` may occur across several
datasources or possibly in multiple tables within a single datasource. `Zillion` 
understands that all of those columns represent the same concept, and it can try 
to use any of them to satisfy reports requesting "revenue".

Likewise there are two main types of tables used to structure your warehouse:

1. `Dimension Tables`: reference/attribute tables containing only related
dimensions
2. `Metric Tables`: fact tables that may contain metrics and some related
dimensions/attributes

Dimension tables are often static or slowly growing in terms of row count and contain
attributes tied to a primary key. Some common examples would be lists of US Zip Codes or
company/partner directories.

Metric tables are generally more transactional in nature. Some common examples
would be records for web requests, ecommerce sales, or stock market price history.

<a name="warehouse-theory"></a>

### **Warehouse Theory**

If you really want to go deep on dimensional modeling and the drill-across
querying technique `Zillion` employs, I recommend reading Ralph Kimball's
[book](https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-toolkit/) on data warehousing.

To summarize, [drill-across
querying](https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/drilling-across/)
forms one or more queries to satisfy a report request for `metrics` that may
exist across multiple datasources and/or tables at a particular `dimension` grain.

`Zillion` supports flexible warehouse setups such as
[snowflake](https://en.wikipedia.org/wiki/Snowflake_schema) or
[star](https://en.wikipedia.org/wiki/Star_schema) schemas, though it isn't
picky about it. You can specify table relationships through a parent-child
lineage, and `Zillion` can also infer acceptable joins based on the presence
of dimension table primary keys. `Zillion` does not support many-to-many relationships at this time, though most analytics-focused scenarios should be able to work around that by adding views to the model if needed.

<a name="query-layers"></a>

### **Query Layers**

`Zillion` reports can be thought of as running in two layers:

1. `DataSource Layer`: SQL queries against the warehouse's datasources
2. `Combined Layer`: A final SQL query against the combined data from the
DataSource Layer

The Combined Layer is just another SQL database (in-memory SQLite by default)
that is used to tie the datasource data together and apply a few additional
features such as rollups, row filters, row limits, sorting, pivots, and technical computations.

<a name="warehouse-creation"></a>

### **Warehouse Creation**

There are multiple ways to quickly initialize a warehouse from a local or remote file:

```python
# Path/link to a CSV, XLSX, XLS, JSON, HTML, or Google Sheet
# This builds a single-table Warehouse for quick/ad-hoc analysis.
url = "https://raw.githubusercontent.com/totalhack/zillion/master/tests/dma_zip.xlsx"
wh = Warehouse.from_data_file(url, ["Zip_Code"]) # Second arg is primary key

# Path/link to a sqlite database
# This can build a single or multi-table Warehouse
url = "https://github.com/totalhack/zillion/blob/master/tests/testdb1?raw=true"
wh = Warehouse.from_db_file(url)

# Path/link to a WarehouseConfigSchema (or pass a dict)
# This is the recommended production approach!
config = "https://raw.githubusercontent.com/totalhack/zillion/master/examples/example_wh_config.json"
wh = Warehouse(config=config)
```

Zillion also provides a helper script to bootstrap a DataSource configuration file for an existing database. See `zillion.scripts.bootstrap_datasource_config.py`. The bootstrap script requires a connection/database url and output file as arguments. See `--help` output for more options, including the optional `--nlp` flag that leverages OpenAI to infer configuration information such as column types, table types, and table relationships. The NLP feature requires the NLP extension to be installed as well as the following set in your `Zillion` config file:

* OPENAI_MODEL
* OPENAI_API_KEY

<a name="executing-reports"></a>

### **Executing Reports**

The main purpose of `Zillion` is to execute reports against a `Warehouse`.
At a high level you will be crafting reports as follows:

```python
result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "=", "Partner A")
    ]
)
print(result.df) # Pandas DataFrame
```

When comparing to writing SQL, it's helpful to think of the dimensions as the
target columns of a **group by** SQL statement. Think of the metrics as the
columns you are **aggregating**. Think of the criteria as the **where
clause**. Your criteria are applied in the DataSource Layer SQL queries.

The `ReportResult` has a Pandas DataFrame with the dimensions as the index and
the metrics as the columns.

A `Report` is said to have a `grain`, which defines the dimensions each metric
must be able to join to in order to satisfy the `Report` requirements. The
`grain` is a combination of **all** dimensions, including those referenced in
criteria or in metric formulas. In the example above, the `grain` would be
`{date, partner}`. Both "revenue" and "leads" must be able to join to those
dimensions for this report to be possible.

These concepts can take time to sink in and obviously vary with the specifics
of your data model, but you will become more familiar with them as you start
putting together reports against your data warehouses.

<a name="natural-language-querying"></a>

### **Natural Language Querying**

With the NLP extension `Zillion` has experimental support for natural language querying of your data warehouse. For example:

```python
result = warehouse.execute_text("revenue and leads by date last month")
print(result.df) # Pandas DataFrame
```

This NLP feature requires a running instance of Qdrant (vector database) and the following values set in your `Zillion` config file:

* QDRANT_HOST
* OPENAI_API_KEY

Embeddings will be produced and stored in both Qdrant and a local cache. The
vector database will be initialized the first time you try to use this by
analyzing all fields in your warehouse. An example docker file to run Qdrant is provided in the root of this repo.

You have some control over how fields get embedded. Namely in the configuration for any field you can choose whether to exclude a field from embeddings or override which embeddings map to that field. All fields are
included by default. The following example would exclude the `net_revenue` field from being embedded and map `revenue` metric requests to the `gross_revenue` field.

```javascript
{
    "name": "gross_revenue",
    "type": "numeric(10,2)",
    "aggregation": "sum",
    "rounding": 2,
    "meta": {
        "nlp": {
            // enabled defaults to true
            "embedding_text": "revenue" // str or list of str
        }
    }
},
{
    "name": "net_revenue",
    "type": "numeric(10,2)",
    "aggregation": "sum",
    "rounding": 2,
    "meta": {
        "nlp": {
            "enabled": false
        }
    }
},
```

Additionally you may also exclude fields via the following warehouse-level configuration settings:

```javascript
{
    "meta": {
        "nlp": {
            "field_disabled_patterns": [
                // list of regex patterns to exclude
                "rpl_ma_5"
            ],
            "field_disabled_groups": [
                // list of "groups" to exclude, assuming you have
                // set group value in the field's meta dict.
                "No NLP"
            ]
        }
    },
    ...
}
```

If a field is disabled at any of the aforementioned levels it will be ignored. This type of control becomes useful as your data model gets more complex and you want to guide the NLP logic in cases where it could confuse similarly named fields. Any time you adjust which fields are excluded you will want to force recreation of your embeddings collection using the `force_recreate` flag on `Warehouse.init_embeddings`.
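
A minimal sketch of doing that after changing your NLP exclusions:

```python
# Rebuild the embeddings collection so field exclusions take effect.
wh.init_embeddings(force_recreate=True)
```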

> *Note:* This feature is in its infancy. Its usefulness will depend on the
quality of both the input query and your data model (i.e. good field names)!

<a name="zillion-configuration"></a>

### **Zillion Configuration**

In addition to configuring the structure of your `Warehouse`, which will be
discussed further below, `Zillion` has a global configuration to control some
basic settings. The `ZILLION_CONFIG` environment var can point to a yaml config file. See `examples/sample_config.yaml` for more details on what values can be set. Environment vars prefixed with ZILLION_ can override config settings (i.e. ZILLION_DB_URL will override DB_URL).

The database used to store Zillion report specs can be configured by setting the DB_URL value in your `Zillion` config to a valid database connection string. By default a SQLite DB in /tmp is used.
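
As a minimal sketch, the config file location and individual overrides are just environment variables. The paths and URLs below are placeholders, and these are typically set in the shell environment before your process starts so `Zillion` sees them when it loads its config:

```python
import os

# Point Zillion at a yaml config file and override its DB_URL setting.
# These are usually exported in the shell before running your process;
# os.environ is shown here purely for illustration.
os.environ["ZILLION_CONFIG"] = "/path/to/zillion_config.yaml"
os.environ["ZILLION_DB_URL"] = "sqlite:////tmp/zillion.db"
```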

---

<a name="example-sales-analytics"></a>

**Example - Sales Analytics**
-----------------------------

Below we will walk through a simple hypothetical sales data model that
demonstrates basic `DataSource` and `Warehouse` configuration and then shows
some sample [reports](#example-reports). The data is a simple SQLite database
that is part of the `Zillion` test code. For reference, the schema is as
follows:

```sql
CREATE TABLE partners (
  id INTEGER PRIMARY KEY,
  name VARCHAR NOT NULL UNIQUE,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE campaigns (
  id INTEGER PRIMARY KEY,
  name VARCHAR NOT NULL UNIQUE,
  category VARCHAR NOT NULL,
  partner_id INTEGER NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE leads (
  id INTEGER PRIMARY KEY,
  name VARCHAR NOT NULL,
  campaign_id INTEGER NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE sales (
  id INTEGER PRIMARY KEY,
  item VARCHAR NOT NULL,
  quantity INTEGER NOT NULL,
  revenue DECIMAL(10, 2),
  lead_id INTEGER NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

<a name="example-warehouse-config"></a>

### **Warehouse Configuration**

A `Warehouse` may be created from a JSON or YAML configuration that defines
its fields, datasources, and tables. The code below shows how it can be done in as little as one line of code if you have a pointer to a JSON/YAML `Warehouse` config.

```python
from zillion import Warehouse

wh = Warehouse(config="https://raw.githubusercontent.com/totalhack/zillion/master/examples/example_wh_config.json")
```

This example config uses a `data_url` in its `DataSource` `connect` info that
tells `Zillion` to dynamically download that data and connect to it as a
SQLite database. This is useful for quick examples or analysis, though in most
scenarios you would put a connection string to an existing database like you
see
[here](https://raw.githubusercontent.com/totalhack/zillion/master/tests/test_mysql_ds_config.json)

The basics of `Zillion's` warehouse configuration structure are as follows:

A `Warehouse` config has the following main sections:

* `metrics`: optional list of metric configs for global metrics
* `dimensions`: optional list of dimension configs for global dimensions
* `datasources`: mapping of datasource names to datasource configs or config URLs

A `DataSource` config has the following main sections:

* `connect`: database connection url or dict of connect params
* `metrics`: optional list of metric configs specific to this datasource
* `dimensions`: optional list of dimension configs specific to this datasource
* `tables`: mapping of table names to table configs or config URLs

> Tip: datasource and table configs may also be replaced with a URL that points
to a local or remote config file.

In this example all four tables in our database are included in the config,
two as dimension tables and two as metric tables. The tables are linked
through a parent->child relationship: partners to campaigns, and leads to
sales.  Some tables also utilize the `create_fields` flag to automatically
create `Fields` on the datasource from column definitions. Other metrics and
dimensions are defined explicitly.

To view the structure of this `Warehouse` after init you can use the `print_info`
method which shows all metrics, dimensions, tables, and columns that are part
of your data warehouse:

```python
wh.print_info() # Formatted print of the Warehouse structure
```

For a deeper dive of the config schema please see the full
[docs](https://totalhack.github.io/zillion/zillion.configs/).

<a name="example-reports"></a>

### **Reports**

**Example:** Get sales, leads, and revenue by partner:

```python
result = wh.execute(
    metrics=["sales", "leads", "revenue"],
    dimensions=["partner_name"]
)

print(result.df)
"""
              sales  leads  revenue
partner_name
Partner A        11      4    165.0
Partner B         2      2     19.0
Partner C         5      1    118.5
"""
```

**Example:** Let's limit to Partner A and break down by its campaigns:

```python
result = wh.execute(
    metrics=["sales", "leads", "revenue"],
    dimensions=["campaign_name"],
    criteria=[("partner_name", "=", "Partner A")]
)

print(result.df)
"""
               sales  leads  revenue
campaign_name
Campaign 1A        5      2       83
Campaign 2A        6      2       82
"""
```

**Example:** The output below shows rollups at the campaign level within each
partner, and also a rollup of totals at the partner and campaign level.

> *Note:* the output contains a special character to mark DataFrame rollup rows
that were added to the result. The
[ReportResult](https://totalhack.github.io/zillion/zillion.report/#reportresult)
object contains some helper attributes to automatically access or filter
rollups, as well as a `df_display` attribute that returns the result with
friendlier display values substituted for special characters. The
under-the-hood special character is left here for illustration, but may not
render the same in all scenarios.

```python
from zillion import RollupTypes

result = wh.execute(
    metrics=["sales", "leads", "revenue"],
    dimensions=["partner_name", "campaign_name"],
    rollup=RollupTypes.ALL
)

print(result.df)
"""
                            sales  leads  revenue
partner_name campaign_name
Partner A    Campaign 1A      5.0    2.0     83.0
             Campaign 2A      6.0    2.0     82.0
             􏿿               11.0    4.0    165.0
Partner B    Campaign 1B      1.0    1.0      6.0
             Campaign 2B      1.0    1.0     13.0
             􏿿                2.0    2.0     19.0
Partner C    Campaign 1C      5.0    1.0    118.5
             􏿿                5.0    1.0    118.5
􏿿            􏿿               18.0    7.0    302.5
"""
```

See the `Report`
[docs](https://totalhack.github.io/zillion/zillion.report/#report) for more
information on supported rollup behavior.

**Example:** Save a report spec (not the data):

First you must make sure you have saved your `Warehouse`, as saved reports
are scoped to a particular `Warehouse` ID. To save a `Warehouse`
you must provide a URL that points to the complete config.

```python
name = "My Unique Warehouse Name"
config_url = <some url pointing to a complete warehouse config>
wh.save(name, config_url) # wh.id is populated after this

spec_id = wh.save_report(
    metrics=["sales", "leads", "revenue"],
    dimensions=["partner_name"]
)
```

> *Note*: If you built your `Warehouse` in python from a list of `DataSources`,
or passed in a `dict` for the `config` param on init, there currently is not
a built-in way to output a complete config to a file for reference when saving.

**Example:** Load and run a report from a spec ID:

```python
result = wh.execute_id(spec_id)
```

This assumes you have saved this report ID previously in the database specified by the DB_URL in your `Zillion` yaml configuration.

**Example:** Unsupported Grain

If you attempt an impossible report, you will get an
`UnsupportedGrainException`. The report below is impossible because it
attempts to break down the leads metric by a dimension that only exists
in a child table. Generally speaking, child tables can join back up to
parents (and "siblings" of parents) to find dimensions, but not the other
way around.

```python
# Fails with UnsupportedGrainException
result = wh.execute(
    metrics=["leads"],
    dimensions=["sale_id"]
)
```

---

<a name="advanced-topics"></a>

**Advanced Topics**
-------------------

<a name="subreports"></a>

### **Subreports**

Sometimes you need subquery-like functionality in order to filter one
report to the results of some other (that perhaps required a different grain).
Zillion provides a simplistic way of doing that by using the `in report` or `not in report`
criteria operations. There are two supported ways to specify the subreport: passing a
report spec ID or passing a dict of report params.

```python
# Assuming you have saved report 1234 and it has "partner" as a dimension:

result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "in report", 1234)
    ]
)

# Or with a dict:

result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "in report", dict(
            metrics=[...],
            dimensions=["partner"],
            criteria=[...]
        ))
    ]
)
```

The criteria field used in `in report` or `not in report` must be a dimension
in the subreport. Note that subreports are executed at `Report` object initialization
time instead of during `execute` -- as such they can not be killed using `Report.kill`.
This may change down the road.

<a name="formula-metrics"></a>

### **Formula Metrics**

In our example above our config included a formula-based metric called "rpl",
which is simply `revenue / leads`. A `FormulaMetric` combines other metrics
and/or dimensions to calculate a new metric at the Combined Layer of
querying. The syntax must match your Combined Layer database, which is SQLite
in our example.

```json
{
    "name": "rpl",
    "aggregation": "mean",
    "rounding": 2,
    "formula": "{revenue}/{leads}"
}
```

<a name="divisor-metrics"></a>

### **Divisor Metrics**

As a convenience, rather than having to repeatedly define formula metrics for
rate variants of a core metric, you can specify a divisor metric configuration on a non-formula metric. As an example, say you have a `revenue` metric and want to create variants for `revenue_per_lead` and `revenue_per_sale`. You can define your revenue metric as follows:

```json
{
    "name": "revenue",
    "type": "numeric(10,2)",
    "aggregation": "sum",
    "rounding": 2,
    "divisors": {
        "metrics": [
            "leads",
            "sales"
        ]
    }
}
```

See `zillion.configs.DivisorsConfigSchema` for more details on configuration options, such as overriding naming templates, formula templates, and rounding.

<a name="aggregation-variants"></a>

### **Aggregation Variants**

Another minor convenience feature is the ability to automatically generate variants of metrics for different aggregation types in a single field configuration instead of across multiple fields in your config file. As an example, say you have a `sales` column in your data and want to create variants for `sales_mean` and `sales_sum`. You can define your metric as follows:

```json
{
    "name": "sales",
    "aggregation": {
        "mean": {
            "type": "numeric(10,2)",
            "rounding": 2
        },
        "sum": {
            "type": "integer"
        }
    }
}
```

The resulting warehouse would not have a `sales` metric, but would instead have `sales_mean` and `sales_sum`. Note that you can further customize the settings for the generated fields, such as setting a custom name, by specifying that in the nested settings for that aggregation type. In practice this is not a big efficiency gain over just defining the metrics separately, but some may prefer this approach.

<a name="formula-dimensions"></a>

### **Formula Dimensions**

Experimental support exists for `FormulaDimension` fields as well. A `FormulaDimension` can only use other dimensions as part of its formula, and it also gets evaluated in the Combined Layer database. As an additional restriction, a `FormulaDimension` can not be used in report criteria as those filters are evaluated at the DataSource Layer. The following example assumes a SQLite Combined Layer database:


```json
{
    "name": "partner_is_a",
    "formula": "{partner_name} = 'Partner A'"
}
```

<a name="datasource-formulas"></a>

### **DataSource Formulas**

Our example also includes a metric "sales" whose value is calculated via
formula at the DataSource Layer of querying. Note the following in the
`fields` list for the "id" param in the "main.sales" table. These formulas are
in the syntax of the particular `DataSource` database technology, which also
happens to be SQLite in our example.

```json
"fields": [
    "sale_id",
    {"name":"sales", "ds_formula": "COUNT(DISTINCT sales.id)"}
]
```

<a name="type-conversions"></a>

### **Type Conversions**

Our example also automatically created a handful of dimensions from the
"created_at" columns of the leads and sales tables. Support for automatic type
conversions is limited, but for date/datetime columns in supported
`DataSource` technologies you can get a variety of dimensions for free this
way.

The output of `wh.print_info` will show the added dimensions, which are
prefixed with "lead_" or "sale_" as specified by the optional
`type_conversion_prefix` in the config for each table. Some examples of
auto-generated dimensions in our example warehouse include sale_hour,
sale_day_name, sale_month, sale_year, etc. 

As an optimization in the where clause of underlying report queries, `Zillion` 
will try to apply conversions to criteria values instead of columns. For example, 
it is generally more efficient to query as `my_datetime > '2020-01-01' and my_datetime < '2020-01-02'`
instead of `DATE(my_datetime) == '2020-01-01'`, because the latter can prevent index
usage in many database technologies. The ability to apply conversions to values
instead of columns varies by field and `DataSource` technology as well. 

To prevent type conversions, set `skip_conversion_fields` to `true` on your
`DataSource` config.

See `zillion.field.TYPE_ALLOWED_CONVERSIONS` and `zillion.field.DIALECT_CONVERSIONS`
for more details on currently supported conversions.

<a name="adhoc-metrics"></a>

### **Ad Hoc Metrics**

You may also define metrics "ad hoc" with each report request. Below is an
example that creates a revenue-per-lead metric on the fly. These only exist
within the scope of the report, and the name can not conflict with any existing
fields:

```python
result = wh.execute(
    metrics=[
        "leads",
        {"formula": "{revenue}/{leads}", "name": "my_rpl"}
    ],
    dimensions=["partner_name"]
)
```

<a name="adhoc-dimensions"></a>

### **Ad Hoc Dimensions**

You may also define dimensions "ad hoc" with each report request. Below is an
example that creates a dimension that partitions on a particular dimension value on the fly. Ad Hoc Dimensions are a subclass of `FormulaDimension`s and therefore have the same restrictions, such as not being able to use a metric as a formula field. These only exist within the scope of the report, and the name can not conflict with any existing fields:

```python
result = wh.execute(
    metrics=["leads"],
    dimensions=[{"name": "partner_is_a", "formula": "{partner_name} = 'Partner A'"]
)
```

<a name="adhoc-tables"></a>

### **Ad Hoc Tables**

`Zillion` also supports creation or syncing of ad hoc tables in your database
during `DataSource` or `Warehouse` init. An example of a table config that
does this is shown
[here](https://github.com/totalhack/zillion/blob/master/tests/test_adhoc_ds_config.json).
It uses the table config's `data_url` and `if_exists` params to control the
syncing and/or creation of the "main.dma_zip" table from a remote CSV in a
SQLite database.  The same can be done in other database types too.

The potential performance drawbacks to such an approach should be obvious,
particularly if you are initializing your warehouse often or if the remote
data file is large. It is often better to sync and create your data ahead of
time so you have complete schema control, but this method can be very useful
in certain scenarios.

> **Warning**: be careful not to overwrite existing tables in your database!

<a name="technicals"></a>

### **Technicals**

There are a variety of technical computations that can be applied to metrics to
compute rolling, cumulative, or rank statistics. For example, to compute a 5-point
moving average on revenue one might define a new metric as follows:

```json
{
    "name": "revenue_ma_5",
    "type": "numeric(10,2)",
    "aggregation": "sum",
    "rounding": 2,
    "technical": "mean(5)"
}
```

Technical computations are computed at the Combined Layer, whereas the "aggregation"
is done at the DataSource Layer (hence needing to define both above). 

For more info on how shorthand technical strings are parsed, see the
[parse_technical_string](https://totalhack.github.io/zillion/zillion.configs/#parse_technical_string)
code. For a full list of supported technical types see
`zillion.core.TechnicalTypes`.

Technicals also support two modes: "group" and "all". The mode controls how to
apply the technical computation across the data's dimensions. In "group" mode,
it computes the technical across the last dimension, whereas in "all" mode it
computes the technical across all data without any regard for dimensions.

The point of this becomes more clear if you try to do a "cumsum" technical
across data broken down by something like ["partner_name", "date"]. If "group"
mode is used (the default in most cases) it will do cumulative sums *within*
each partner over the date ranges. If "all" mode is used, it will do a
cumulative sum across every data row. You can be explicit about the mode by
appending it to the technical string: i.e. "cumsum:all" or "mean(5):group".

---

<a name="config-variables"></a>

### **Config Variables**

If you'd like to avoid putting sensitive connection information directly in
your `DataSource` configs you can leverage config variables. In your `Zillion`
yaml config you can specify a `DATASOURCE_CONTEXTS` section as follows:

```yaml
DATASOURCE_CONTEXTS:
  my_ds_name:
    user: user123
    pass: goodpassword
    host: 127.0.0.1
    schema: reporting
```

Then when your `DataSource` config for the datasource named "my_ds_name" is
read, it can use this context to populate variables in your connection url:

```json
"datasources": {
    "my_ds_name": {
        "connect": "mysql+pymysql://{user}:{pass}@{host}/{schema}"
        ...
    }
}
```

<a name="datasource-priority"></a>

### **DataSource Priority**

On `Warehouse` init you can specify a default priority order for datasources
by name. This will come into play when a report could be satisfied by multiple
datasources. `DataSources` earlier in the list will be higher priority. This
would be useful if you wanted to favor a set of faster, aggregate tables that
are grouped in a `DataSource`.

```python
wh = Warehouse(config=config, ds_priority=["aggr_ds", "raw_ds", ...])
```

<a name="supported-datasources"></a>

**Supported DataSources**
-------------------------

`Zillion's` goal is to support any database technology that SQLAlchemy
supports (pictured below). That said, the support and testing levels in `Zillion` vary at the
moment. In particular, the ability to do type conversions, database
reflection, and kill running queries all require some database-specific code
for support. The following list summarizes known support levels. Your mileage
may vary with untested database technologies that SQLAlchemy supports (it
might work just fine, just hasn't been tested yet). Please report bugs and
help add more support!

* SQLite: supported
* MySQL: supported
* PostgreSQL: supported
* DuckDB: supported
* BigQuery, Redshift, Snowflake, SingleStore, PlanetScale, etc: not tested but would like to support these

SQLAlchemy has connectors to many popular databases. The barrier to supporting many of these is likely
pretty low given the simple nature of the SQL operations `Zillion` uses.

![SQLAlchemy Connectors](https://github.com/totalhack/zillion/blob/master/docs/images/sqlalchemy_connectors.webp?raw=true)

Note that the above is different from the database support for the Combined Layer
database. Currently only SQLite is supported there; that should be sufficient for
most use cases but more options will be added down the road.

<a name="multiprocess-considerations"></a>

**Multiprocess Considerations**
-------------------------------

If you plan to run `Zillion` in a multiprocess scenario, whether on a single
node or across multiple nodes, there are a couple of things to consider:

* SQLite DataSources do not scale well and may run into locking issues with multiple processes trying to access them on the same node.
* Any file-based database technology that isn't centrally accessible would be challenging when using multiple nodes.
* Ad Hoc DataSource and Ad Hoc Table downloads should be avoided as they may conflict/repeat across each process. Offload this to an external
ETL process that is better suited to manage those data flows in a scalable production scenario.

Note that you can still use the default SQLite in-memory Combined Layer DB without issues, as that is made on the fly with each report request and
requires no coordination/communication with other processes or nodes.

<a name="demo-ui"></a>

**Demo UI / Web API**
--------------------

[Zillion Web UI](https://github.com/totalhack/zillion-web) is a demo UI and web API for Zillion that also includes an experimental ChatGPT plugin. See the README there for more info on installation and project structure. Please note that the code is light on testing and polish, but is expected to work in modern browsers. Also ChatGPT plugins are quite slow at the moment, so currently that is mostly for fun and not that useful.

================================================
FILE: docs/markdown/readme_docs.md
================================================
<a name="documentation"></a>

**Documentation**
-----------------

More thorough documentation can be found [here](https://totalhack.github.io/zillion/).
You can supplement your knowledge by perusing the [tests](https://github.com/totalhack/zillion/tree/master/tests) directory
or the [API reference](https://totalhack.github.io/zillion/).


================================================
FILE: docs/markdown/readme_how_to_contribute.md
================================================
<a name="how-to-contribute"></a>

**How to Contribute**
---------------------

Please see the
[contributing](https://github.com/totalhack/zillion/blob/master/CONTRIBUTING.md)
guide for more information. If you are looking for inspiration, adding support and tests for additional database technologies would be a great help.



================================================
FILE: docs/markdown/readme_intro.md
================================================
**Introduction**
----------------

`Zillion` is a data modeling and analytics tool that allows combining and
analyzing data from multiple datasources through a simple API. It acts as a semantic layer
on top of your data, writes SQL so you don't have to, and easily bolts onto existing
database infrastructure via SQLAlchemy Core. The `Zillion` NLP extension has experimental
support for AI-powered natural language querying and warehouse configuration.

With `Zillion` you can:

* Define a warehouse that contains a variety of SQL and/or file-like
  datasources
* Define or reflect metrics, dimensions, and relationships in your data
* Run multi-datasource reports and combine the results in a DataFrame
* Flexibly aggregate your data with multi-level rollups and table pivots
* Customize or combine fields with formulas
* Apply technical transformations including rolling, cumulative, and rank
  statistics
* Apply automatic type conversions - i.e. get a "year" dimension for free
  from a "date" column
* Save and share report specifications
* Utilize ad hoc or public datasources, tables, and fields to enrich reports
* Query your warehouse with natural language (NLP extension)
* Leverage AI to bootstrap your warehouse configurations (NLP extension)


================================================
FILE: docs/markdown/readme_toc.md
================================================
**Table of Contents**
---------------------

* [Installation](#installation)
* [Primer](#primer)
    * [Metrics and Dimensions](#metrics-and-dimensions)
    * [Warehouse Theory](#warehouse-theory)
    * [Query Layers](#query-layers)
    * [Warehouse Creation](#warehouse-creation)
    * [Executing Reports](#executing-reports)
    * [Natural Language Querying](#natural-language-querying)
    * [Zillion Configuration](#zillion-configuration)
* [Example - Sales Analytics](#example-sales-analytics)
    * [Warehouse Configuration](#example-warehouse-config)
    * [Reports](#example-reports)
* [Advanced Topics](#advanced-topics)
    * [Subreports](#subreports)
    * [Formula Metrics](#formula-metrics)
    * [Divisor Metrics](#divisor-metrics)
    * [Aggregation Variants](#aggregation-variants)
    * [Formula Dimensions](#formula-dimensions)
    * [DataSource Formulas](#datasource-formulas)
    * [Type Conversions](#type-conversions)
    * [Ad Hoc Metrics](#adhoc-metrics)
    * [Ad Hoc Dimensions](#adhoc-dimensions)
    * [Ad Hoc Tables](#adhoc-tables)
    * [Technicals](#technicals)
    * [Config Variables](#config-variables)
    * [DataSource Priority](#datasource-priority)
* [Supported DataSources](#supported-datasources)
* [Multiprocess Considerations](#multiprocess-considerations)
* [Demo UI / Web API](#demo-ui)
* [Docs](#documentation)
* [How to Contribute](#how-to-contribute)


================================================
FILE: docs/mkdocs/api.md
================================================
# API Reference

* [zillion.configs](zillion.configs.md)
* [zillion.core](zillion.core.md)
* [zillion.datasource](zillion.datasource.md)
* [zillion.dialects](zillion.dialects.md)
* [zillion.field](zillion.field.md)
* [zillion.model](zillion.model.md)
* [zillion.nlp](zillion.nlp.md)
* [zillion.report](zillion.report.md)
* [zillion.scripts](zillion.scripts.md)
* [zillion.sql_utils](zillion.sql_utils.md)
* [zillion.version](zillion.version.md)
* [zillion.warehouse](zillion.warehouse.md)

================================================
FILE: docs/mkdocs/contributing.md
================================================
Your help and feedback are greatly appreciated. Whether it's supporting/testing
a new datasource type, finding bugs, or suggesting features, every little bit
helps make `Zillion` reach its potential. 

Please also consider manicuring or configuring datasets that others may find
useful. With as little as a CSV and a short JSON configuration file you can
give back to the community. You can host these shared datasources easily with
GitHub.

## **How to Contribute**

1.  Check for open issues or open a new issue to start a discussion around a
    feature idea or a bug.
2.  Fork [the repository](https://github.com/totalhack/zillion) on GitHub to
    start making your changes to the **master** branch (or branch off of it).
3.  Write a test which shows that the bug was fixed or that the feature works
    as expected.
4.  Send a [pull request](https://help.github.com/en/articles/creating-a-pull-request-from-a-fork). Add yourself to
    [AUTHORS](https://github.com/totalhack/zillion/blob/master/AUTHORS.md).

## **Development Setup**

```shell
# Clone this repo
git clone https://github.com/totalhack/zillion.git
cd zillion

# Install dependencies
# Note: activate your venv first if desired!
pip install ".[dev]"

# Bring up test databases -- test data will init the first time
# You can optionally run these DBs directly on your machine instead
docker-compose up

# Run tests
export ZILLION_CONFIG=$(pwd)/tests/test_config.yaml
cd tests
pytest
```

## **Good Bug Reports**

Please be aware of the following things when filing bug reports:

1. Avoid raising duplicate issues. *Please* use the GitHub issue search feature
   to check whether your bug report or feature request has been mentioned in
   the past. Duplicate bug reports and feature requests are a huge maintenance
   burden on the limited resources of the project. If it is clear from your
   report that you would have struggled to find the original, that's ok, but
   if searching for a selection of words in your issue title would have found
   the duplicate then the issue will likely be closed.
2. When filing bug reports about exceptions or tracebacks, please include the
   *complete* traceback. Partial tracebacks, or just the exception text, are
   not helpful. Issues that do not contain complete tracebacks may be closed
   without warning.
3. Make sure you provide a suitable amount of information to work with.

## **Questions**

The GitHub issue tracker is for *bug reports* and *feature requests*. Please do
not use it to ask questions about how to use Zillion.


================================================
FILE: docs/mkdocs/css/extra.css
================================================
pre { color: white !important; }

.md-clipboard:before {
    color: rgb(255, 255, 255);
}

.codehilite:hover .md-clipboard:before,.md-typeset .highlight:hover .md-clipboard:before,pre:hover .md-clipboard:before {
    color: rgba(255, 255, 255, 0.54) !important
}

.md-typeset code {
    font-size: 0.87em;
}

.md-typeset pre > code {
    background-color: transparent;
    color: #cccccc;
}

.md-typeset p > code {
    color: black;
}

.md-typeset li > code {
    color: black;
}

/* mkautodoc */

div.autodoc-docstring {
  padding-left: 20px;
  margin-bottom: 30px;
  border-left: 5px solid rgba(230, 230, 230);
}

div.autodoc-members {
  padding-left: 20px;
  margin-bottom: 15px;
}

div.autodoc-signature {
    color: black;
}

.autodoc-signature code {
    color: black;
}

.autodoc-param {
    font-size: 0.95em;
}


================================================
FILE: docs/mkdocs/index.md
================================================
[![Generic badge](https://img.shields.io/badge/Status-Alpha-yellow.svg)](https://shields.io/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
![License: MIT](https://img.shields.io/badge/license-MIT-blue)
![Python 3.6+](https://img.shields.io/badge/python-3.6%2B-blue)
[![Downloads](https://static.pepy.tech/badge/zillion)](https://pepy.tech/project/zillion)


**Introduction**
----------------

`Zillion` is a data modeling and analytics tool that allows combining and
analyzing data from multiple datasources through a simple API. It acts as a semantic layer
on top of your data, writes SQL so you don't have to, and easily bolts onto existing
database infrastructure via SQLAlchemy Core. The `Zillion` NLP extension has experimental
support for AI-powered natural language querying and warehouse configuration.

With `Zillion` you can:

* Define a warehouse that contains a variety of SQL and/or file-like
  datasources
* Define or reflect metrics, dimensions, and relationships in your data
* Run multi-datasource reports and combine the results in a DataFrame
* Flexibly aggregate your data with multi-level rollups and table pivots
* Customize or combine fields with formulas
* Apply technical transformations including rolling, cumulative, and rank
  statistics
* Apply automatic type conversions - i.e. get a "year" dimension for free
  from a "date" column
* Save and share report specifications
* Utilize ad hoc or public datasources, tables, and fields to enrich reports
* Query your warehouse with natural language (NLP extension)
* Leverage AI to bootstrap your warehouse configurations (NLP extension)


---

<a name="installation"></a>

**Installation**
----------------

> **Warning**: This project is in an alpha state and is subject to change. Please test carefully for production usage and report any issues.

```shell
$ pip install zillion

or

$ pip install zillion[nlp]
```

---

<a name="primer"></a>

**Primer**
----------

The following is meant to give a quick overview of some theory and
nomenclature used in data warehousing with `Zillion`, which will be useful
if you are newer to this area. You can also skip ahead to the usage [example](#example-sales-analytics) or the warehouse/datasource creation [quickstart](#warehouse-creation) options.

In short: `Zillion` writes SQL for you and makes data accessible through a very simple API:

```python
result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "=", "Partner A")
    ]
)
```

<a name="metrics-and-dimensions"></a>

### **Metrics and Dimensions**

In `Zillion` there are two main types of `Fields` that will be used in
your report requests:

1. `Dimensions`: attributes of data used for labelling, grouping, and filtering
2. `Metrics`: facts and measures that may be broken down along dimensions

A `Field` encapsulates the concept of a column in your data. For example, you
may have a `Field` called "revenue". That `Field` may occur across several
datasources or possibly in multiple tables within a single datasource. `Zillion` 
understands that all of those columns represent the same concept, and it can try 
to use any of them to satisfy reports requesting "revenue".

Likewise there are two main types of tables used to structure your warehouse:

1. `Dimension Tables`: reference/attribute tables containing only related
dimensions
2. `Metric Tables`: fact tables that may contain metrics and some related
dimensions/attributes

Dimension tables are often static or slowly growing in terms of row count and contain
attributes tied to a primary key. Some common examples would be lists of US Zip Codes or
company/partner directories.

Metric tables are generally more transactional in nature. Some common examples
would be records for web requests, ecommerce sales, or stock market price history.

<a name="warehouse-theory"></a>

### **Warehouse Theory**

If you really want to go deep on dimensional modeling and the drill-across
querying technique `Zillion` employs, I recommend reading Ralph Kimball's
[book](https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-toolkit/) on data warehousing.

To summarize, [drill-across
querying](https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/drilling-across/)
forms one or more queries to satisfy a report request for `metrics` that may
exist across multiple datasources and/or tables at a particular `dimension` grain.

`Zillion` supports flexible warehouse setups such as
[snowflake](https://en.wikipedia.org/wiki/Snowflake_schema) or
[star](https://en.wikipedia.org/wiki/Star_schema) schemas, though it isn't
picky about it. You can specify table relationships through a parent-child
lineage, and `Zillion` can also infer acceptable joins based on the presence
of dimension table primary keys. `Zillion` does not support many-to-many relationships at this time, though most analytics-focused scenarios should be able to work around that by adding views to the model if needed.

<a name="query-layers"></a>

### **Query Layers**

`Zillion` reports can be thought of as running in two layers:

1. `DataSource Layer`: SQL queries against the warehouse's datasources
2. `Combined Layer`: A final SQL query against the combined data from the
DataSource Layer

The Combined Layer is just another SQL database (in-memory SQLite by default)
that is used to tie the datasource data together and apply a few additional
features such as rollups, row filters, row limits, sorting, pivots, and technical computations.

<a name="warehouse-creation"></a>

### **Warehouse Creation**

There are multiple ways to quickly initialize a warehouse from a local or remote file:

```python
from zillion import Warehouse

# Path/link to a CSV, XLSX, XLS, JSON, HTML, or Google Sheet
# This builds a single-table Warehouse for quick/ad-hoc analysis.
url = "https://raw.githubusercontent.com/totalhack/zillion/master/tests/dma_zip.xlsx"
wh = Warehouse.from_data_file(url, ["Zip_Code"]) # Second arg is primary key

# Path/link to a sqlite database
# This can build a single or multi-table Warehouse
url = "https://github.com/totalhack/zillion/blob/master/tests/testdb1?raw=true"
wh = Warehouse.from_db_file(url)

# Path/link to a WarehouseConfigSchema (or pass a dict)
# This is the recommended production approach!
config = "https://raw.githubusercontent.com/totalhack/zillion/master/examples/example_wh_config.json"
wh = Warehouse(config=config)
```

Zillion also provides a helper script to bootstrap a DataSource configuration file for an existing database. See `zillion.scripts.bootstrap_datasource_config.py`. The bootstrap script requires a connection/database URL and an output file as arguments. See the `--help` output for more options, including the optional `--nlp` flag that leverages OpenAI to infer configuration information such as column types, table types, and table relationships. The NLP feature requires the NLP extension to be installed as well as the following set in your `Zillion` config file:

* OPENAI_MODEL
* OPENAI_API_KEY

<a name="executing-reports"></a>

### **Executing Reports**

The main purpose of `Zillion` is to execute reports against a `Warehouse`.
At a high level you will be crafting reports as follows:

```python
result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "=", "Partner A")
    ]
)
print(result.df) # Pandas DataFrame
```

When comparing to writing SQL, it's helpful to think of the dimensions as the
target columns of a **group by** SQL statement. Think of the metrics as the
columns you are **aggregating**. Think of the criteria as the **where
clause**. Your criteria are applied in the DataSource Layer SQL queries.

The `ReportResult` has a Pandas DataFrame with the dimensions as the index and
the metrics as the columns.

A `Report` is said to have a `grain`, which defines the dimensions each metric
must be able to join to in order to satisfy the `Report` requirements. The
`grain` is a combination of **all** dimensions, including those referenced in
criteria or in metric formulas. In the example above, the `grain` would be
`{date, partner}`. Both "revenue" and "leads" must be able to join to those
dimensions for this report to be possible.

These concepts can take time to sink in and obviously vary with the specifics
of your data model, but you will become more familiar with them as you start
putting together reports against your data warehouses.

<a name="natural-language-querying"></a>

### **Natural Language Querying**

With the NLP extension `Zillion` has experimental support for natural language querying of your data warehouse. For example:

```python
result = warehouse.execute_text("revenue and leads by date last month")
print(result.df) # Pandas DataFrame
```

This NLP feature requires a running instance of Qdrant (vector database) and the following values set in your `Zillion` config file:

* QDRANT_HOST
* OPENAI_API_KEY

Embeddings will be produced and stored in both Qdrant and a local cache. The
vector database will be initialized the first time you try to use this by
analyzing all fields in your warehouse. An example docker compose file for running Qdrant is provided in the root of this repo.

You have some control over how fields get embedded. In the configuration for any field you can choose whether to exclude it from embeddings or override which embedding text maps to that field. All fields are
included by default. The following example would exclude the `net_revenue` field from being embedded and map `revenue` metric requests to the `gross_revenue` field.

```javascript
{
    "name": "gross_revenue",
    "type": "numeric(10,2)",
    "aggregation": "sum",
    "rounding": 2,
    "meta": {
        "nlp": {
            // enabled defaults to true
            "embedding_text": "revenue" // str or list of str
        }
    }
},
{
    "name": "net_revenue",
    "type": "numeric(10,2)",
    "aggregation": "sum",
    "rounding": 2,
    "meta": {
        "nlp": {
            "enabled": false
        }
    }
},
```

Additionally you may also exclude fields via the following warehouse-level configuration settings:

```javascript
{
    "meta": {
        "nlp": {
            "field_disabled_patterns": [
                // list of regex patterns to exclude
                "rpl_ma_5"
            ],
            "field_disabled_groups": [
                // list of "groups" to exclude, assuming you have
                // set group value in the field's meta dict.
                "No NLP"
            ]
        }
    },
    ...
}
```

If a field is disabled at any of the aforementioned levels it will be ignored. This type of control becomes useful as your data model gets more complex and you want to guide the NLP logic in cases where it could confuse similarly named fields. Any time you adjust which fields are excluded you will want to force recreation of your embeddings collection using the `force_recreate` flag on `Warehouse.init_embeddings`.
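For example, a minimal sketch (assuming `wh` is an initialized `Warehouse` with the NLP extension configured):

```python
# Rebuild the embeddings collection after changing which fields are excluded
wh.init_embeddings(force_recreate=True)
```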

> *Note:* This feature is in its infancy. Its usefulness will depend on the
quality of both the input query and your data model (i.e. good field names)!

<a name="zillion-configuration"></a>

### **Zillion Configuration**

In addition to configuring the structure of your `Warehouse`, which will be
discussed further below, `Zillion` has a global configuration to control some
basic settings. The `ZILLION_CONFIG` environment var can point to a yaml config file. See `examples/sample_config.yaml` for more details on what values can be set. Environment vars prefixed with ZILLION_ can override config settings (e.g. ZILLION_DB_URL will override DB_URL).

The database used to store Zillion report specs can be configured by setting the DB_URL value in your `Zillion` config to a valid database connection string. By default a SQLite DB in /tmp is used.
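As a sketch of how these settings interact (paths and values here are hypothetical), you could point `Zillion` at a config file and override a single setting via the environment:

```python
import os

# Set these before importing zillion so they are picked up when the config loads
os.environ["ZILLION_CONFIG"] = "/path/to/zillion_config.yaml"

# Any config value can be overridden with a ZILLION_-prefixed env var,
# e.g. override DB_URL (the database used to store report specs)
os.environ["ZILLION_DB_URL"] = "sqlite:////tmp/zillion_reports.db"
```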

---

<a name="example-sales-analytics"></a>

**Example - Sales Analytics**
-----------------------------

Below we will walk through a simple hypothetical sales data model that
demonstrates basic `DataSource` and `Warehouse` configuration and then shows
some sample [reports](#example-reports). The data is a simple SQLite database
that is part of the `Zillion` test code. For reference, the schema is as
follows:

```sql
CREATE TABLE partners (
  id INTEGER PRIMARY KEY,
  name VARCHAR NOT NULL UNIQUE,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE campaigns (
  id INTEGER PRIMARY KEY,
  name VARCHAR NOT NULL UNIQUE,
  category VARCHAR NOT NULL,
  partner_id INTEGER NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE leads (
  id INTEGER PRIMARY KEY,
  name VARCHAR NOT NULL,
  campaign_id INTEGER NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE sales (
  id INTEGER PRIMARY KEY,
  item VARCHAR NOT NULL,
  quantity INTEGER NOT NULL,
  revenue DECIMAL(10, 2),
  lead_id INTEGER NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

<a name="example-warehouse-config"></a>

### **Warehouse Configuration**

A `Warehouse` may be created from a JSON or YAML configuration that defines
its fields, datasources, and tables. The code below shows how it can be done in as little as one line of code if you have a pointer to a JSON/YAML `Warehouse` config.

```python
from zillion import Warehouse

wh = Warehouse(config="https://raw.githubusercontent.com/totalhack/zillion/master/examples/example_wh_config.json")
```

This example config uses a `data_url` in its `DataSource` `connect` info that
tells `Zillion` to dynamically download that data and connect to it as a
SQLite database. This is useful for quick examples or analysis, though in most
scenarios you would put a connection string to an existing database like you
see
[here](https://raw.githubusercontent.com/totalhack/zillion/master/tests/test_mysql_ds_config.json).

The basics of `Zillion's` warehouse configuration structure are as follows:

A `Warehouse` config has the following main sections:

* `metrics`: optional list of metric configs for global metrics
* `dimensions`: optional list of dimension configs for global dimensions
* `datasources`: mapping of datasource names to datasource configs or config URLs

A `DataSource` config has the following main sections:

* `connect`: database connection url or dict of connect params
* `metrics`: optional list of metric configs specific to this datasource
* `dimensions`: optional list of dimension configs specific to this datasource
* `tables`: mapping of table names to table configs or config URLs

> Tip: datasource and table configs may also be replaced with a URL that points
to a local or remote config file.
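
As a rough sketch (field and connection details here are hypothetical; see the linked example config for a complete, working version), a config dict following this structure might look like:

```python
from zillion import Warehouse

config = {
    "metrics": [
        # global metric configs
        {"name": "revenue", "type": "numeric(10,2)", "aggregation": "sum"},
    ],
    "dimensions": [
        # global dimension configs
        {"name": "partner_name", "type": "string(50)"},
    ],
    "datasources": {
        "my_ds_name": {
            # database connection url or dict of connect params
            "connect": "sqlite:////path/to/my.db",
            "tables": {
                # mapping of table names to table configs or config URLs
            },
        }
    },
}

my_wh = Warehouse(config=config)
```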

In the linked example config, all four tables in our database are included,
two as dimension tables and two as metric tables. The tables are linked
through a parent->child relationship: partners to campaigns, and leads to
sales.  Some tables also utilize the `create_fields` flag to automatically
create `Fields` on the datasource from column definitions. Other metrics and
dimensions are defined explicitly.

To view the structure of this `Warehouse` after init you can use the `print_info`
method which shows all metrics, dimensions, tables, and columns that are part
of your data warehouse:

```python
wh.print_info() # Formatted print of the Warehouse structure
```

For a deeper dive of the config schema please see the full
[docs](https://totalhack.github.io/zillion/zillion.configs/).

<a name="example-reports"></a>

### **Reports**

**Example:** Get sales, leads, and revenue by partner:

```python
result = wh.execute(
    metrics=["sales", "leads", "revenue"],
    dimensions=["partner_name"]
)

print(result.df)
"""
              sales  leads  revenue
partner_name
Partner A        11      4    165.0
Partner B         2      2     19.0
Partner C         5      1    118.5
"""
```

**Example:** Let's limit to Partner A and break down by its campaigns:

```python
result = wh.execute(
    metrics=["sales", "leads", "revenue"],
    dimensions=["campaign_name"],
    criteria=[("partner_name", "=", "Partner A")]
)

print(result.df)
"""
               sales  leads  revenue
campaign_name
Campaign 1A        5      2       83
Campaign 2A        6      2       82
"""
```

**Example:** The output below shows rollups at the campaign level within each
partner, and also a rollup of totals at the partner and campaign level.

> *Note:* the output contains a special character to mark DataFrame rollup rows
that were added to the result. The
[ReportResult](https://totalhack.github.io/zillion/zillion.report/#reportresult)
object contains some helper attributes to automatically access or filter
rollups, as well as a `df_display` attribute that returns the result with
friendlier display values substituted for special characters. The
under-the-hood special character is left here for illustration, but may not
render the same in all scenarios.

```python
from zillion import RollupTypes

result = wh.execute(
    metrics=["sales", "leads", "revenue"],
    dimensions=["partner_name", "campaign_name"],
    rollup=RollupTypes.ALL
)

print(result.df)
"""
                            sales  leads  revenue
partner_name campaign_name
Partner A    Campaign 1A      5.0    2.0     83.0
             Campaign 2A      6.0    2.0     82.0
             􏿿               11.0    4.0    165.0
Partner B    Campaign 1B      1.0    1.0      6.0
             Campaign 2B      1.0    1.0     13.0
             􏿿                2.0    2.0     19.0
Partner C    Campaign 1C      5.0    1.0    118.5
             􏿿                5.0    1.0    118.5
􏿿            􏿿               18.0    7.0    302.5
"""
```

See the `Report`
[docs](https://totalhack.github.io/zillion/zillion.report/#report) for more
information on supported rollup behavior.

**Example:** Save a report spec (not the data):

First you must make sure you have saved your `Warehouse`, as saved reports
are scoped to a particular `Warehouse` ID. To save a `Warehouse`
you must provide a URL that points to the complete config.

```python
name = "My Unique Warehouse Name"
config_url = <some url pointing to a complete warehouse config>
wh.save(name, config_url) # wh.id is populated after this

spec_id = wh.save_report(
    metrics=["sales", "leads", "revenue"],
    dimensions=["partner_name"]
)
```

> *Note*: If you built your `Warehouse` in Python from a list of `DataSources`,
or passed in a `dict` for the `config` param on init, there currently is not
a built-in way to output a complete config to a file for reference when saving.

**Example:** Load and run a report from a spec ID:

```python
result = wh.execute_id(spec_id)
```

This assumes you have saved this report ID previously in the database specified by the DB_URL in your `Zillion` yaml configuration.

**Example:** Unsupported Grain

If you attempt an impossible report, you will get an
`UnsupportedGrainException`. The report below is impossible because it
attempts to break down the leads metric by a dimension that only exists
in a child table. Generally speaking, child tables can join back up to
parents (and "siblings" of parents) to find dimensions, but not the other
way around.

```python
# Fails with UnsupportedGrainException
result = wh.execute(
    metrics=["leads"],
    dimensions=["sale_id"]
)
```

---

<a name="advanced-topics"></a>

**Advanced Topics**
-------------------

<a name="subreports"></a>

### **Subreports**

Sometimes you need subquery-like functionality in order to filter one
report to the results of another (that perhaps required a different grain).
Zillion provides a simple way of doing that using the `in report` or `not in report`
criteria operations. There are two supported ways to specify the subreport: passing a
report spec ID or passing a dict of report params.

```python
# Assuming you have saved report 1234 and it has "partner" as a dimension:

result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "in report", 1234)
    ]
)

# Or with a dict:

result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "in report", dict(
            metrics=[...],
            dimension=["partner"],
            criteria=[...]
        ))
    ]
)
```

The criteria field used in `in report` or `not in report` must be a dimension
in the subreport. Note that subreports are executed at `Report` object initialization
time instead of during `execute` -- as such they cannot be killed using `Report.kill`.
This may change down the road.

<a name="formula-metrics"></a>

### **Formula Metrics**

In our example above our config included a formula-based metric called "rpl",
which is simply `revenue / leads`. A `FormulaMetric` combines other metrics
and/or dimensions to calculate a new metric at the Combined Layer of
querying. The syntax must match your Combined Layer database, which is SQLite
in our example.

```json
{
    "name": "rpl",
    "aggregation": "mean",
    "rounding": 2,
    "formula": "{revenue}/{leads}"
}
```

<a name="divisor-metrics"></a>

### **Divisor Metrics**

As a convenience, rather than having to repeatedly define formula metrics for
rate variants of a core metric, you can specify a divisor metric configuration on a non-formula metric. As an example, say you have a `revenue` metric and want to create variants for `revenue_per_lead` and `revenue_per_sale`. You can define your revenue metric as follows:

```json
{
    "name": "revenue",
    "type": "numeric(10,2)",
    "aggregation": "sum",
    "rounding": 2,
    "divisors": {
        "metrics": [
            "leads",
            "sales"
        ]
    }
}
```

See `zillion.configs.DivisorsConfigSchema` for more details on configuration options, such as overriding naming templates, formula templates, and rounding.
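
The generated variants can then be requested like any other metric. A hypothetical report, assuming a warehouse configured with the metric above and the default naming templates:

```python
# "revenue_per_lead" and "revenue_per_sale" are the generated divisor variants
result = wh.execute(
    metrics=["revenue", "revenue_per_lead", "revenue_per_sale"],
    dimensions=["partner_name"]
)
print(result.df)
```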

<a name="aggregation-variants"></a>

### **Aggregation Variants**

Another minor convenience feature is the ability to automatically generate variants of metrics for different aggregation types in a single field configuration instead of across multiple fields in your config file. As an example, say you have a `sales` column in your data and want to create variants for `sales_mean` and `sales_sum`. You can define your metric as follows:

```json
{
    "name": "sales",
    "aggregation": {
        "mean": {
            "type": "numeric(10,2)",
            "rounding": 2
        },
        "sum": {
            "type": "integer"
        }
    }
}
```

The resulting warehouse would not have a `sales` metric, but would instead have `sales_mean` and `sales_sum`. Note that you can further customize the settings for the generated fields, such as setting a custom name, by specifying that in the nested settings for that aggregation type. In practice this is not a big efficiency gain over just defining the metrics separately, but some may prefer this approach.
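
The generated metrics can then be used directly in reports. A sketch, assuming a warehouse configured with the metric above:

```python
# The "sales" metric itself is not created; request the generated variants
result = wh.execute(
    metrics=["sales_mean", "sales_sum"],
    dimensions=["partner_name"]
)
print(result.df)
```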

<a name="formula-dimensions"></a>

### **Formula Dimensions**

Experimental support exists for `FormulaDimension` fields as well. A `FormulaDimension` can only use other dimensions as part of its formula, and it is also evaluated in the Combined Layer database. As an additional restriction, a `FormulaDimension` cannot be used in report criteria, as those filters are evaluated at the DataSource Layer. The following example assumes a SQLite Combined Layer database:


```json
{
    "name": "partner_is_a",
    "formula": "{partner_name} = 'Partner A'"
}
```
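
Such a dimension can then be used for grouping like any other dimension. A sketch against the example warehouse above:

```python
# Group metrics by whether or not the row belongs to "Partner A"
result = wh.execute(
    metrics=["leads", "revenue"],
    dimensions=["partner_is_a"]
)
print(result.df)
```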

<a name="datasource-formulas"></a>

### **DataSource Formulas**

Our example also includes a metric "sales" whose value is calculated via
formula at the DataSource Layer of querying. Note the following entry in the
`fields` list for the "id" column of the "main.sales" table. These formulas are
written in the syntax of the particular `DataSource` database technology, which also
happens to be SQLite in our example.

```json
"fields": [
    "sale_id",
    {"name":"sales", "ds_formula": "COUNT(DISTINCT sales.id)"}
]
```

<a name="type-conversions"></a>

### **Type Conversions**

Our example also automatically created a handful of dimensions from the
"created_at" columns of the leads and sales tables. Support for automatic type
conversions is limited, but for date/datetime columns in supported
`DataSource` technologies you can get a variety of dimensions for free this
way.

The output of `wh.print_info` will show the added dimensions, which are
prefixed with "lead_" or "sale_" as specified by the optional
`type_conversion_prefix` in the config for each table. Some examples of
auto-generated dimensions in our example warehouse include sale_hour,
sale_day_name, sale_month, sale_year, etc. 
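
For example, a report grouping by a couple of the auto-generated date dimensions (a sketch against the example warehouse):

```python
result = wh.execute(
    metrics=["sales", "revenue"],
    dimensions=["sale_year", "sale_month"]
)
print(result.df)
```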

As an optimization in the where clause of underlying report queries, `Zillion` 
will try to apply conversions to criteria values instead of columns. For example, 
it is generally more efficient to query as `my_datetime > '2020-01-01' and my_datetime < '2020-01-02'`
instead of `DATE(my_datetime) = '2020-01-01'`, because the latter can prevent index
usage in many database technologies. The ability to apply conversions to values
instead of columns varies by field and `DataSource` technology as well. 

To prevent type conversions, set `skip_conversion_fields` to `true` on your
`DataSource` config.

See `zillion.field.TYPE_ALLOWED_CONVERSIONS` and `zillion.field.DIALECT_CONVERSIONS`
for more details on currently supported conversions.

<a name="adhoc-metrics"></a>

### **Ad Hoc Metrics**

You may also define metrics "ad hoc" with each report request. Below is an
example that creates a revenue-per-lead metric on the fly. These only exist
within the scope of the report, and the name cannot conflict with any existing
fields:

```python
result = wh.execute(
    metrics=[
        "leads",
        {"formula": "{revenue}/{leads}", "name": "my_rpl"}
    ],
    dimensions=["partner_name"]
)
```

<a name="adhoc-dimensions"></a>

### **Ad Hoc Dimensions**

You may also define dimensions "ad hoc" with each report request. Below is an
example that creates, on the fly, a dimension that partitions on a particular dimension value. Ad hoc dimensions are a subclass of `FormulaDimension` and therefore have the same restrictions, such as not being able to use a metric as a formula field. These only exist within the scope of the report, and the name cannot conflict with any existing fields:

```python
result = wh.execute(
    metrics=["leads"],
    dimensions=[{"name": "partner_is_a", "formula": "{partner_name} = 'Partner A'"]
)
```

<a name="adhoc-tables"></a>

### **Ad Hoc Tables**

`Zillion` also supports creation or syncing of ad hoc tables in your database
during `DataSource` or `Warehouse` init. An example of a table config that
does this is shown
[here](https://github.com/totalhack/zillion/blob/master/tests/test_adhoc_ds_config.json).
It uses the table config's `data_url` and `if_exists` params to control the
syncing and/or creation of the "main.dma_zip" table from a remote CSV in a
SQLite database.  The same can be done in other database types too.
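
As a rough, hedged sketch of what such a table config entry might look like (the keys and values here are illustrative; see the linked test config for a working version):

```python
# Hypothetical config entry for a "main.dma_zip" table built from a remote CSV
adhoc_table_config = {
    "type": "dimension",
    "create_fields": True,
    "primary_key": ["Zip_Code"],
    # remote data to download and load into the table
    "data_url": "https://raw.githubusercontent.com/totalhack/zillion/master/tests/dma_zip.csv",
    # controls creation/sync behavior if the table already exists
    "if_exists": "replace",
}
```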

The potential performance drawbacks to such an approach should be obvious,
particularly if you are initializing your warehouse often or if the remote
data file is large. It is often better to sync and create your data ahead of
time so you have complete schema control, but this method can be very useful
in certain scenarios.

> **Warning**: be careful not to overwrite existing tables in your database!

<a name="technicals"></a>

### **Technicals**

There are a variety of technical computations that can be applied to metrics to
compute rolling, cumulative, or rank statistics. For example, to compute a 5-point
moving average on revenue one might define a new metric as follows:

```json
{
    "name": "revenue_ma_5",
    "type": "numeric(10,2)",
    "aggregation": "sum",
    "rounding": 2,
    "technical": "mean(5)"
}
```

Technical computations are computed at the Combined Layer, whereas the "aggregation"
is done at the DataSource Layer (hence needing to define both above). 

For more info on how shorthand technical strings are parsed, see the
[parse_technical_string](https://totalhack.github.io/zillion/zillion.configs/#parse_technical_string)
code. For a full list of supported technical types see
`zillion.core.TechnicalTypes`.

Technicals also support two modes: "group" and "all". The mode controls how to
apply the technical computation across the data's dimensions. In "group" mode,
it computes the technical across the last dimension, whereas in "all" mode it
computes the technical across all data without any regard for dimensions.

The point of this becomes more clear if you try to do a "cumsum" technical
across data broken down by something like ["partner_name", "date"]. If "group"
mode is used (the default in most cases) it will do cumulative sums *within*
each partner over the date ranges. If "all" mode is used, it will do a
cumulative sum across every data row. You can be explicit about the mode by
appending it to the technical string, e.g. "cumsum:all" or "mean(5):group".
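
For example, a hedged sketch of a metric config (shown as a Python dict) that pins the mode explicitly:

```python
# Hypothetical metric: cumulative revenue computed across all rows,
# ignoring dimension groupings
revenue_cumsum = {
    "name": "revenue_cumsum",
    "type": "numeric(10,2)",
    "aggregation": "sum",
    "rounding": 2,
    "technical": "cumsum:all",
}
```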

---

<a name="config-variables"></a>

### **Config Variables**

If you'd like to avoid putting sensitive connection information directly in
your `DataSource` configs you can leverage config variables. In your `Zillion`
yaml config you can specify a `DATASOURCE_CONTEXTS` section as follows:

```yaml
DATASOURCE_CONTEXTS:
  my_ds_name:
    user: user123
    pass: goodpassword
    host: 127.0.0.1
    schema: reporting
```

Then when your `DataSource` config for the datasource named "my_ds_name" is
read, it can use this context to populate variables in your connection url:

```json
"datasources": {
    "my_ds_name": {
        "connect": "mysql+pymysql://{user}:{pass}@{host}/{schema}"
        ...
    }
}
```

<a name="datasource-priority"></a>

### **DataSource Priority**

On `Warehouse` init you can specify a default priority order for datasources
by name. This will come into play when a report could be satisfied by multiple
datasources. `DataSources` earlier in the list will be higher priority. This
would be useful if you wanted to favor a set of faster, aggregate tables that
are grouped in a `DataSource`.

```python
wh = Warehouse(config=config, ds_priority=["aggr_ds", "raw_ds", ...])
```

<a name="supported-datasources"></a>

**Supported DataSources**
-------------------------

`Zillion's` goal is to support any database technology that SQLAlchemy
supports (pictured below). That said, the support and testing levels in `Zillion` vary at the
moment. In particular, the ability to do type conversions, database
reflection, and kill running queries all require some database-specific code
for support. The following list summarizes known support levels. Your mileage
may vary with untested database technologies that SQLAlchemy supports (it
might work just fine, just hasn't been tested yet). Please report bugs and
help add more support!

* SQLite: supported
* MySQL: supported
* PostgreSQL: supported
* DuckDB: supported
* BigQuery, Redshift, Snowflake, SingleStore, PlanetScale, etc: not tested but would like to support these

SQLAlchemy has connectors to many popular databases. The barrier to supporting many of these is likely
pretty low given the simple nature of the SQL operations `Zillion` uses.

![SQLAlchemy Connectors](https://github.com/totalhack/zillion/blob/master/docs/images/sqlalchemy_connectors.webp?raw=true)

Note that the above is different than the database support for the Combined Layer
database. Currently only SQLite is supported there; that should be sufficient for
most use cases but more options will be added down the road.

<a name="multiprocess-considerations"></a>

**Multiprocess Considerations**
-------------------------------

If you plan to run `Zillion` in a multiprocess scenario, whether on a single
node or across multiple nodes, there are a couple of things to consider:

* SQLite DataSources do not scale well and may run into locking issues with multiple processes trying to access them on the same node.
* Any file-based database technology that isn't centrally accessible would be challenging when using multiple nodes.
* Ad Hoc DataSource and Ad Hoc Table downloads should be avoided as they may conflict/repeat across each process. Offload this to an external
ETL process that is better suited to manage those data flows in a scalable production scenario.

Note that you can still use the default SQLite in-memory Combined Layer DB without issues, as that is made on the fly with each report request and
requires no coordination/communication with other processes or nodes.

<a name="demo-ui"></a>

**Demo UI / Web API**
--------------------

[Zillion Web UI](https://github.com/totalhack/zillion-web) is a demo UI and web API for Zillion that also includes an experimental ChatGPT plugin. See the README there for more info on installation and project structure. Please note that the code is light on testing and polish, but is expected to work in modern browsers. Also note that ChatGPT plugins are quite slow at the moment, so that integration is currently more for fun than practical use.




================================================
FILE: docs/mkdocs/zillion.configs.md
================================================
[//]: # (This is an auto-generated file. Do not edit)
# Module zillion.configs


## [AdHocFieldSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L982-L985)

*Bases*: zillion.configs.FormulaFieldConfigSchema

::: zillion.configs.AdHocFieldSchema
    :docstring:
    


## [AdHocMetricSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L988-L1015)

*Bases*: zillion.configs.AdHocFieldSchema

::: zillion.configs.AdHocMetricSchema
    :docstring:
    


## [BaseSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L541-L556)

*Bases*: marshmallow.schema.Schema

::: zillion.configs.BaseSchema
    :docstring:
    


## [BollingerTechnical](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1609-L1637)

*Bases*: zillion.configs.RollingTechnical

::: zillion.configs.BollingerTechnical
    :docstring:
    :members: apply get_default_mode parse_technical_string_params


## [ColumnConfigSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L648-L651)

*Bases*: zillion.configs.ColumnInfoSchema

::: zillion.configs.ColumnConfigSchema
    :docstring:
    


## [ColumnFieldConfigField](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L610-L615)

*Bases*: marshmallow.fields.Field

::: zillion.configs.ColumnFieldConfigField
    :docstring:
    


## [ColumnFieldConfigSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L594-L607)

*Bases*: zillion.configs.BaseSchema

::: zillion.configs.ColumnFieldConfigSchema
    :docstring:
    


## [ColumnInfo](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1324-L1410)

*Bases*: zillion.configs.ZillionInfo, tlbx.logging_utils.PrintMixin

::: zillion.configs.ColumnInfo
    :docstring:
    :members: add_field create field_ds_formula get_criteria_conversion get_field get_field_names get_fields has_field has_field_ds_formula schema_load schema_validate


## [ColumnInfoSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L618-L645)

*Bases*: zillion.configs.BaseSchema

::: zillion.configs.ColumnInfoSchema
    :docstring:
    


## [ConfigMixin](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1228-L1245)

::: zillion.configs.ConfigMixin
    :docstring:
    :members: from_config to_config


## [DataSourceConfigField](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1124-L1130)

*Bases*: marshmallow.fields.Field

::: zillion.configs.DataSourceConfigField
    :docstring:
    


## [DataSourceConfigSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1069-L1121)

*Bases*: zillion.configs.BaseSchema

::: zillion.configs.DataSourceConfigSchema
    :docstring:
    


## [DataSourceConnectField](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1036-L1041)

*Bases*: marshmallow.fields.Field

::: zillion.configs.DataSourceConnectField
    :docstring:
    


## [DataSourceConnectSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1026-L1033)

*Bases*: zillion.configs.BaseSchema

::: zillion.configs.DataSourceConnectSchema
    :docstring:
    


## [DataSourceCriteriaConversionsField](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L575-L583)

*Bases*: marshmallow.fields.Field

::: zillion.configs.DataSourceCriteriaConversionsField
    :docstring:
    


## [DiffTechnical](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1561-L1576)

*Bases*: zillion.configs.PandasTechnical

::: zillion.configs.DiffTechnical
    :docstring:
    :members: apply get_default_mode parse_technical_string_params


## [DimensionConfigSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L968-L971)

*Bases*: zillion.configs.FieldConfigSchema, zillion.configs.DimensionConfigSchemaMixin

::: zillion.configs.DimensionConfigSchema
    :docstring:
    


## [DimensionConfigSchemaMixin](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L945-L965)

::: zillion.configs.DimensionConfigSchemaMixin
    :docstring:
    


## [DimensionValuesField](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L586-L591)

*Bases*: marshmallow.fields.Field

::: zillion.configs.DimensionValuesField
    :docstring:
    


## [DivisorsConfigField](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L914-L919)

*Bases*: marshmallow.fields.Field

::: zillion.configs.DivisorsConfigField
    :docstring:
    


## [DivisorsConfigSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L892-L911)

*Bases*: zillion.configs.BaseSchema

::: zillion.configs.DivisorsConfigSchema
    :docstring:
    


## [FieldConfigSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L796-L819)

*Bases*: zillion.configs.BaseSchema

::: zillion.configs.FieldConfigSchema
    :docstring:
    


## [FieldMetaNLPConfigSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L765-L779)

*Bases*: zillion.configs.BaseSchema

::: zillion.configs.FieldMetaNLPConfigSchema
    :docstring:
    


## [FormulaDimensionConfigSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L974-L979)

*Bases*: zillion.configs.FormulaFieldConfigSchema, zillion.configs.DimensionConfigSchemaMixin

::: zillion.configs.FormulaDimensionConfigSchema
    :docstring:
    


## [FormulaFieldConfigSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L822-L847)

*Bases*: zillion.configs.BaseSchema

::: zillion.configs.FormulaFieldConfigSchema
    :docstring:
    


## [FormulaMetricConfigSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L939-L942)

*Bases*: zillion.configs.FormulaFieldConfigSchema, zillion.configs.MetricConfigSchemaMixin

::: zillion.configs.FormulaMetricConfigSchema
    :docstring:
    


## [MetricConfigSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L922-L936)

*Bases*: zillion.configs.FieldConfigSchema, zillion.configs.MetricConfigSchemaMixin

::: zillion.configs.MetricConfigSchema
    :docstring:
    


## [MetricConfigSchemaMixin](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L850-L889)

::: zillion.configs.MetricConfigSchemaMixin
    :docstring:
    


## [NLPEmbeddingTextField](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L757-L762)

*Bases*: marshmallow.fields.Field

::: zillion.configs.NLPEmbeddingTextField
    :docstring:
    


## [PandasTechnical](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1530-L1542)

*Bases*: zillion.configs.Technical

::: zillion.configs.PandasTechnical
    :docstring:
    :members: apply get_default_mode parse_technical_string_params


## [PolyNested](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L520-L538)

*Bases*: marshmallow.fields.Nested

::: zillion.configs.PolyNested
    :docstring:
    


## [RankTechnical](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1545-L1558)

*Bases*: zillion.configs.PandasTechnical

::: zillion.configs.RankTechnical
    :docstring:
    :members: apply get_default_mode parse_technical_string_params


## [RollingTechnical](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1579-L1606)

*Bases*: zillion.configs.Technical

::: zillion.configs.RollingTechnical
    :docstring:
    :members: apply get_default_mode parse_technical_string_params


## [TableConfigSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L715-L754)

*Bases*: zillion.configs.TableInfoSchema

::: zillion.configs.TableConfigSchema
    :docstring:
    


## [TableInfo](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1316-L1321)

*Bases*: zillion.configs.ZillionInfo, tlbx.logging_utils.PrintMixin

::: zillion.configs.TableInfo
    :docstring:
    :members: create schema_load schema_validate


## [TableInfoSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L662-L712)

*Bases*: zillion.configs.BaseSchema

::: zillion.configs.TableInfoSchema
    :docstring:
    


## [TableNameField](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1018-L1023)

*Bases*: marshmallow.fields.String

::: zillion.configs.TableNameField
    :docstring:
    


## [TableTypeField](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L654-L659)

*Bases*: marshmallow.fields.Field

::: zillion.configs.TableTypeField
    :docstring:
    


## [Technical](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1416-L1527)

*Bases*: tlbx.object_utils.MappingMixin, tlbx.logging_utils.PrintMixin

::: zillion.configs.Technical
    :docstring:
    :members: apply get_default_mode parse_technical_string_params


## [TechnicalField](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L567-L572)

*Bases*: marshmallow.fields.Field

::: zillion.configs.TechnicalField
    :docstring:
    


## [TechnicalInfoSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L559-L564)

*Bases*: zillion.configs.BaseSchema

::: zillion.configs.TechnicalInfoSchema
    :docstring:
    


## [WarehouseConfigSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1154-L1225)

*Bases*: zillion.configs.BaseSchema

::: zillion.configs.WarehouseConfigSchema
    :docstring:
    


## [WarehouseMetaNLPConfigSchema](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1133-L1151)

*Bases*: zillion.configs.BaseSchema

::: zillion.configs.WarehouseMetaNLPConfigSchema
    :docstring:
    


## [ZillionInfo](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1248-L1313)

*Bases*: tlbx.object_utils.MappingMixin

::: zillion.configs.ZillionInfo
    :docstring:
    :members: create schema_load schema_validate


## [check_field_meta_nlp_config](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L782-L793)

::: zillion.configs.check_field_meta_nlp_config
    :docstring:


## [check_metric_configs](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1044-L1066)

::: zillion.configs.check_metric_configs
    :docstring:


## [create_technical](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1721-L1745)

::: zillion.configs.create_technical
    :docstring:


## [default_field_display_name](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L217-L231)

::: zillion.configs.default_field_display_name
    :docstring:


## [default_field_name](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L201-L214)

::: zillion.configs.default_field_name
    :docstring:


## [field_safe_name](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L182-L198)

::: zillion.configs.field_safe_name
    :docstring:


## [get_aggregation_metrics](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L503-L516)

::: zillion.configs.get_aggregation_metrics
    :docstring:


## [get_divisor_metrics](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L457-L500)

::: zillion.configs.get_divisor_metrics
    :docstring:


## [has_valid_sqlalchemy_type_values](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L300-L307)

::: zillion.configs.has_valid_sqlalchemy_type_values
    :docstring:


## [is_active](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L234-L240)

::: zillion.configs.is_active
    :docstring:


## [is_valid_aggregation](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L310-L319)

::: zillion.configs.is_valid_aggregation
    :docstring:


## [is_valid_column_field_config](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L322-L330)

::: zillion.configs.is_valid_column_field_config
    :docstring:


## [is_valid_connect_type](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L408-L416)

::: zillion.configs.is_valid_connect_type
    :docstring:


## [is_valid_datasource_config](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L430-L436)

::: zillion.configs.is_valid_datasource_config
    :docstring:


## [is_valid_datasource_connect](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L419-L427)

::: zillion.configs.is_valid_datasource_connect
    :docstring:


## [is_valid_datasource_criteria_conversions](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L365-L405)

::: zillion.configs.is_valid_datasource_criteria_conversions
    :docstring:


## [is_valid_dimension_values](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L356-L362)

::: zillion.configs.is_valid_dimension_values
    :docstring:


## [is_valid_divisors_config](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L448-L454)

::: zillion.configs.is_valid_divisors_config
    :docstring:


## [is_valid_field_display_name](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L278-L287)

::: zillion.configs.is_valid_field_display_name
    :docstring:


## [is_valid_field_name](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L264-L275)

::: zillion.configs.is_valid_field_name
    :docstring:


## [is_valid_field_nlp_embedding_text_config](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L439-L445)

::: zillion.configs.is_valid_field_nlp_embedding_text_config
    :docstring:


## [is_valid_if_exists](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L257-L261)

::: zillion.configs.is_valid_if_exists
    :docstring:


## [is_valid_sqlalchemy_type](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L290-L297)

::: zillion.configs.is_valid_sqlalchemy_type
    :docstring:


## [is_valid_table_name](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L250-L254)

::: zillion.configs.is_valid_table_name
    :docstring:


## [is_valid_table_type](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L243-L247)

::: zillion.configs.is_valid_table_type
    :docstring:


## [is_valid_technical](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L347-L353)

::: zillion.configs.is_valid_technical
    :docstring:


## [is_valid_technical_mode](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L340-L344)

::: zillion.configs.is_valid_technical_mode
    :docstring:


## [is_valid_technical_type](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L333-L337)

::: zillion.configs.is_valid_technical_type
    :docstring:


## [load_datasource_config](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L137-L153)

::: zillion.configs.load_datasource_config
    :docstring:


## [load_datasource_config_from_env](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L156-L160)

::: zillion.configs.load_datasource_config_from_env
    :docstring:


## [load_warehouse_config](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L111-L127)

::: zillion.configs.load_warehouse_config
    :docstring:


## [load_warehouse_config_from_env](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L130-L134)

::: zillion.configs.load_warehouse_config_from_env
    :docstring:


## [parse_schema_file](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L86-L108)

::: zillion.configs.parse_schema_file
    :docstring:


## [parse_technical_string](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L1690-L1718)

::: zillion.configs.parse_technical_string
    :docstring:


## [table_safe_name](https://github.com/totalhack/zillion/blob/master/zillion/configs.py#L163-L179)

::: zillion.configs.table_safe_name
    :docstring:



================================================
FILE: docs/mkdocs/zillion.core.md
================================================
[//]: # (This is an auto-generated file. Do not edit)
# Module zillion.core


## [AggregationTypes](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L162-L171)

::: zillion.core.AggregationTypes
    :docstring:
    


## [DataSourceQueryModes](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L223-L227)

::: zillion.core.DataSourceQueryModes
    :docstring:
    


## [ExecutionState](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L230-L235)

::: zillion.core.ExecutionState
    :docstring:
    


## [FieldTypes](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L148-L152)

::: zillion.core.FieldTypes
    :docstring:
    


## [IfExistsModes](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L238-L248)

::: zillion.core.IfExistsModes
    :docstring:
    


## [IfFileExistsModes](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L251-L258)

*Bases*: zillion.core.IfExistsModes

::: zillion.core.IfFileExistsModes
    :docstring:
    


## [OrderByTypes](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L216-L220)

::: zillion.core.OrderByTypes
    :docstring:
    


## [RollupTypes](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L209-L213)

::: zillion.core.RollupTypes
    :docstring:
    


## [TableTypes](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L155-L159)

::: zillion.core.TableTypes
    :docstring:
    


## [TechnicalModes](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L194-L206)

::: zillion.core.TechnicalModes
    :docstring:
    


## [TechnicalTypes](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L174-L191)

::: zillion.core.TechnicalTypes
    :docstring:
    


## [dbg](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L507-L511)

::: zillion.core.dbg
    :docstring:


## [dbgsql](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L514-L518)

::: zillion.core.dbgsql
    :docstring:


## [dictmerge](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L380-L407)

::: zillion.core.dictmerge
    :docstring:


## [download_file](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L310-L320)

::: zillion.core.download_file
    :docstring:


## [error](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L535-L539)

::: zillion.core.error
    :docstring:


## [get_modified_time](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L323-L325)

::: zillion.core.get_modified_time
    :docstring:


## [get_time_since_modified](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L328-L330)

::: zillion.core.get_time_since_modified
    :docstring:


## [get_zillion_config_log_level](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L477-L478)

::: zillion.core.get_zillion_config_log_level
    :docstring:


## [igetattr](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L284-L291)

::: zillion.core.igetattr
    :docstring:


## [info](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L521-L525)

::: zillion.core.info
    :docstring:


## [load_json_or_yaml_from_str](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L340-L377)

::: zillion.core.load_json_or_yaml_from_str
    :docstring:


## [load_yaml](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L333-L337)

::: zillion.core.load_yaml
    :docstring:


## [load_zillion_config](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L410-L471)

::: zillion.core.load_zillion_config
    :docstring:


## [powerset](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L264-L269)

::: zillion.core.powerset
    :docstring:


## [raiseif](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L272-L275)

::: zillion.core.raiseif
    :docstring:


## [raiseifnot](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L278-L281)

::: zillion.core.raiseifnot
    :docstring:


## [read_filepath_or_buffer](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L294-L307)

::: zillion.core.read_filepath_or_buffer
    :docstring:


## [set_log_level](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L501-L504)

::: zillion.core.set_log_level
    :docstring:


## [set_log_level_from_config](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L484-L495)

::: zillion.core.set_log_level_from_config
    :docstring:


## [warn](https://github.com/totalhack/zillion/blob/master/zillion/core.py#L528-L532)

::: zillion.core.warn
    :docstring:



================================================
FILE: docs/mkdocs/zillion.datasource.md
================================================
[//]: # (This is an auto-generated file. Do not edit)
# Module zillion.datasource


## [AdHocDataTable](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L1652-L1870)

*Bases*: tlbx.logging_utils.PrintMixin

::: zillion.datasource.AdHocDataTable
    :docstring:
    :members: get_dataframe table_exists to_sql
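
The subclasses below (CSV, Excel, JSON, HTML, Google Sheets, SQLite) cover the common adhoc sources: a table config points at a remote data file and the rows are materialized into a local table. The sketch below is illustrative only; the `data_url` key name and column layout are assumptions based on the adhoc test configs in this repo.

```python
# Hypothetical table config fragment: a dimension table whose rows are
# pulled from a remote CSV when the datasource is built.
dma_zip_table = {
    "type": "dimension",
    "data_url": "https://example.com/dma_zip.csv",  # assumed key name
    "primary_key": ["zip"],
    "columns": {
        "zip": {"fields": ["zip"]},
        "dma_name": {"fields": ["dma_name"]},
    },
}
```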


## [CSVDataTable](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L1889-L1899)

*Bases*: zillion.datasource.AdHocDataTable

::: zillion.datasource.CSVDataTable
    :docstring:
    :members: get_dataframe table_exists to_sql


## [DataSource](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L494-L1649)

*Bases*: zillion.field.FieldManagerMixin, tlbx.logging_utils.PrintMixin

::: zillion.datasource.DataSource
    :docstring:
    :members: add_dimension add_metric apply_config directly_has_dimension directly_has_field directly_has_metric find_descendent_tables find_neighbor_tables find_possible_table_sets from_data_file from_datatables from_db_file get_child_field_managers get_columns_with_field get_dialect_name get_dim_tables_with_dim get_dimension get_dimension_configs get_dimension_names get_dimensions get_direct_dimension_configs get_direct_dimensions get_direct_fields get_direct_metric_configs get_direct_metrics get_field get_field_instances get_field_managers get_field_names get_fields get_metric get_metric_configs get_metric_names get_metric_tables_with_metric get_metrics get_params get_possible_joins get_table get_tables_with_field has_dimension has_field has_metric has_table print_dimensions print_info print_metrics
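
A hedged sketch of building a `DataSource` from a config mapping. The overall shape (a `connect` URL plus a `tables` mapping) mirrors the `datasources` block in `examples/baseball_warehouse.json` later in this dump; the constructor signature shown is an assumption.

```python
from zillion.datasource import DataSource

# Assumes a SQLite file already exists at this path so the tables can be reflected.
ds_config = {
    "connect": "sqlite:////tmp/baseball.db",
    "tables": {
        "main.people": {
            "type": "dimension",
            "create_fields": True,
            "primary_key": ["player_id"],
        }
    },
}

# Assumption: the constructor accepts a datasource name and a config mapping.
ds = DataSource("baseball_data_bank", config=ds_config)
ds.print_info()
```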


## [ExcelDataTable](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L1902-L1914)

*Bases*: zillion.datasource.AdHocDataTable

::: zillion.datasource.ExcelDataTable
    :docstring:
    :members: get_dataframe table_exists to_sql


## [GoogleSheetsDataTable](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L1944-L1966)

*Bases*: zillion.datasource.AdHocDataTable

::: zillion.datasource.GoogleSheetsDataTable
    :docstring:
    :members: get_dataframe table_exists to_sql


## [HTMLDataTable](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L1928-L1941)

*Bases*: zillion.datasource.AdHocDataTable

::: zillion.datasource.HTMLDataTable
    :docstring:
    :members: get_dataframe table_exists to_sql


## [JSONDataTable](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L1917-L1925)

*Bases*: zillion.datasource.AdHocDataTable

::: zillion.datasource.JSONDataTable
    :docstring:
    :members: get_dataframe table_exists to_sql


## [Join](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L341-L451)

*Bases*: tlbx.logging_utils.PrintMixin

::: zillion.datasource.Join
    :docstring:
    :members: add_field add_fields add_join_part_tables combine get_covered_fields join_fields_for_table join_parts_for_table


## [JoinPart](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L331-L338)

*Bases*: tlbx.logging_utils.PrintMixin

::: zillion.datasource.JoinPart
    :docstring:
    


## [NeighborTable](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L484-L491)

*Bases*: tlbx.logging_utils.PrintMixin

::: zillion.datasource.NeighborTable
    :docstring:
    


## [SQLiteDataTable](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L1873-L1886)

*Bases*: zillion.datasource.AdHocDataTable

::: zillion.datasource.SQLiteDataTable
    :docstring:
    :members: get_dataframe table_exists to_sql


## [TableSet](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L280-L328)

*Bases*: tlbx.logging_utils.PrintMixin

::: zillion.datasource.TableSet
    :docstring:
    :members: get_covered_fields get_covered_metrics


## [connect_url_to_metadata](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L88-L101)

::: zillion.datasource.connect_url_to_metadata
    :docstring:


## [data_url_to_metadata](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L132-L166)

::: zillion.datasource.data_url_to_metadata
    :docstring:


## [datatable_from_config](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L1969-L2015)

::: zillion.datasource.datatable_from_config
    :docstring:


## [entity_name_from_file](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L56-L57)

::: zillion.datasource.entity_name_from_file
    :docstring:


## [get_adhoc_datasource_filename](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L219-L222)

::: zillion.datasource.get_adhoc_datasource_filename
    :docstring:


## [get_adhoc_datasource_url](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L225-L227)

::: zillion.datasource.get_adhoc_datasource_url
    :docstring:


## [get_ds_config_context](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L60-L62)

::: zillion.datasource.get_ds_config_context
    :docstring:


## [get_engine_extra_kwargs](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L73-L85)

::: zillion.datasource.get_engine_extra_kwargs
    :docstring:


## [join_from_path](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L454-L481)

::: zillion.datasource.join_from_path
    :docstring:


## [metadata_from_connect](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L169-L189)

::: zillion.datasource.metadata_from_connect
    :docstring:


## [parse_replace_after](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L104-L129)

::: zillion.datasource.parse_replace_after
    :docstring:


## [populate_url_context](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L65-L70)

::: zillion.datasource.populate_url_context
    :docstring:


## [reflect_metadata](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L192-L216)

::: zillion.datasource.reflect_metadata
    :docstring:


## [url_connect](https://github.com/totalhack/zillion/blob/master/zillion/datasource.py#L230-L277)

::: zillion.datasource.url_connect
    :docstring:



================================================
FILE: docs/mkdocs/zillion.dialects.md
================================================
[//]: # (This is an auto-generated file. Do not edit)
# Module zillion.dialects



================================================
FILE: docs/mkdocs/zillion.field.md
================================================
[//]: # (This is an auto-generated file. Do not edit)
# Module zillion.field


## [AdHocDimension](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L765-L780)

*Bases*: zillion.field.FormulaDimension

::: zillion.field.AdHocDimension
    :docstring:
    :members: copy create from_config get_all_raw_fields get_ds_expression get_final_select_clause get_formula_fields to_config


## [AdHocField](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L677-L690)

*Bases*: zillion.field.FormulaField

::: zillion.field.AdHocField
    :docstring:
    :members: copy create from_config get_all_raw_fields get_ds_expression get_final_select_clause get_formula_fields to_config


## [AdHocMetric](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L693-L762)

*Bases*: zillion.field.FormulaMetric

::: zillion.field.AdHocMetric
    :docstring:
    :members: copy create from_config get_all_raw_fields get_ds_expression get_final_select_clause get_formula_fields to_config


## [Dimension](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L333-L455)

*Bases*: zillion.field.Field

::: zillion.field.Dimension
    :docstring:
    :members: copy from_config get_all_raw_fields get_ds_expression get_final_select_clause get_formula_fields get_values is_valid_value sort to_config


## [Field](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L37-L170)

*Bases*: zillion.configs.ConfigMixin, tlbx.logging_utils.PrintMixin

::: zillion.field.Field
    :docstring:
    :members: copy from_config get_all_raw_fields get_ds_expression get_final_select_clause get_formula_fields to_config


## [FieldManagerMixin](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L815-L1167)

::: zillion.field.FieldManagerMixin
    :docstring:
    :members: add_dimension add_metric directly_has_dimension directly_has_field directly_has_metric get_child_field_managers get_dimension get_dimension_configs get_dimension_names get_dimensions get_direct_dimension_configs get_direct_dimensions get_direct_fields get_direct_metric_configs get_direct_metrics get_field get_field_instances get_field_managers get_field_names get_fields get_metric get_metric_configs get_metric_names get_metrics has_dimension has_field has_metric print_dimensions print_metrics


## [FormulaDimension](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L575-L591)

*Bases*: zillion.field.FormulaField

::: zillion.field.FormulaDimension
    :docstring:
    :members: copy from_config get_all_raw_fields get_ds_expression get_final_select_clause get_formula_fields to_config


## [FormulaField](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L458-L572)

*Bases*: zillion.field.Field

::: zillion.field.FormulaField
    :docstring:
    :members: copy from_config get_all_raw_fields get_ds_expression get_final_select_clause get_formula_fields to_config


## [FormulaMetric](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L594-L674)

*Bases*: zillion.field.FormulaField

::: zillion.field.FormulaMetric
    :docstring:
    :members: copy from_config get_all_raw_fields get_ds_expression get_final_select_clause get_formula_fields to_config


## [Metric](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L173-L330)

*Bases*: zillion.field.Field

::: zillion.field.Metric
    :docstring:
    :members: copy from_config get_all_raw_fields get_ds_expression get_final_select_clause get_formula_fields to_config


## [create_dimension](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L799-L812)

::: zillion.field.create_dimension
    :docstring:


## [create_metric](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L783-L796)

::: zillion.field.create_metric
    :docstring:


## [get_conversions_for_type](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L1350-L1366)

::: zillion.field.get_conversions_for_type
    :docstring:


## [get_dialect_type_conversions](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L1381-L1434)

::: zillion.field.get_dialect_type_conversions
    :docstring:


## [get_table_dimensions](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L1196-L1219)

::: zillion.field.get_table_dimensions
    :docstring:


## [get_table_field_column](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L1244-L1265)

::: zillion.field.get_table_field_column
    :docstring:


## [get_table_fields](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L1222-L1241)

::: zillion.field.get_table_fields
    :docstring:


## [get_table_metrics](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L1170-L1193)

::: zillion.field.get_table_metrics
    :docstring:


## [replace_non_named_formula_args](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L1369-L1378)

::: zillion.field.replace_non_named_formula_args
    :docstring:


## [sort_by_value_order](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L1325-L1347)

::: zillion.field.sort_by_value_order
    :docstring:


## [table_field_allows_grain](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L1268-L1285)

::: zillion.field.table_field_allows_grain
    :docstring:


## [values_from_db](https://github.com/totalhack/zillion/blob/master/zillion/field.py#L1288-L1322)

::: zillion.field.values_from_db
    :docstring:



================================================
FILE: docs/mkdocs/zillion.model.md
================================================
[//]: # (This is an auto-generated file. Do not edit)
# Module zillion.model



================================================
FILE: docs/mkdocs/zillion.nlp.md
================================================
[//]: # (This is an auto-generated file. Do not edit)
# Module zillion.nlp


## [build_chain](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L708-L733)

::: zillion.nlp.build_chain
    :docstring:


## [build_llm](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L678-L705)

::: zillion.nlp.build_llm
    :docstring:


## [field_name_to_embedding_text](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L558-L560)

::: zillion.nlp.field_name_to_embedding_text
    :docstring:


## [get_dimensions_prompt_str](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L1092-L1093)

::: zillion.nlp.get_dimensions_prompt_str
    :docstring:


## [get_field_fuzzy](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L972-L1020)

::: zillion.nlp.get_field_fuzzy
    :docstring:


## [get_field_name_variants](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L943-L964)

::: zillion.nlp.get_field_name_variants
    :docstring:


## [get_fields_prompt_str](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L1074-L1085)

::: zillion.nlp.get_fields_prompt_str
    :docstring:


## [get_metrics_prompt_str](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L1088-L1089)

::: zillion.nlp.get_metrics_prompt_str
    :docstring:


## [get_nlp_table_info](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L1270-L1296)

::: zillion.nlp.get_nlp_table_info
    :docstring:


## [get_nlp_table_relationships](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L1178-L1224)

::: zillion.nlp.get_nlp_table_relationships
    :docstring:


## [get_openai_class](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L654-L657)

::: zillion.nlp.get_openai_class
    :docstring:


## [get_openai_model_context_size](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L660-L675)

::: zillion.nlp.get_openai_model_context_size
    :docstring:


## [get_warehouse_collection_name](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L563-L575)

::: zillion.nlp.get_warehouse_collection_name
    :docstring:


## [hash_text](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L46-L48)

::: zillion.nlp.hash_text
    :docstring:


## [init_warehouse_embeddings](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L600-L651)

::: zillion.nlp.init_warehouse_embeddings
    :docstring:


## [map_warehouse_report_params](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L1023-L1071)

::: zillion.nlp.map_warehouse_report_params
    :docstring:


## [parse_nlp_table_info](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L1244-L1264)

::: zillion.nlp.parse_nlp_table_info
    :docstring:


## [parse_nlp_table_relationships](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L1151-L1175)

::: zillion.nlp.parse_nlp_table_relationships
    :docstring:


## [parse_text_to_report_json_output](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L870-L888)

::: zillion.nlp.parse_text_to_report_json_output
    :docstring:


## [text_to_report_params](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L1100-L1135)

::: zillion.nlp.text_to_report_params
    :docstring:


## [warehouse_field_nlp_enabled](https://github.com/totalhack/zillion/blob/master/zillion/nlp.py#L578-L597)

::: zillion.nlp.warehouse_field_nlp_enabled
    :docstring:



================================================
FILE: docs/mkdocs/zillion.report.md
================================================
[//]: # (This is an auto-generated file. Do not edit)
# Module zillion.report


## [BaseCombinedResult](https://github.com/totalhack/zillion/blob/master/zillion/report.py#L707-L833)

::: zillion.report.BaseCombinedResult
    :docstring:
    :members: add_warning clean_up create_table get_conn get_cursor get_final_result get_metric_clause ifnull_clause load_table


## [DataSourceQuery](https://github.com/totalhack/zillion/blob/master/zillion/report.py#L162-L647)

*Bases*: zillion.report.ExecutionStateMixin, tlbx.logging_utils.PrintMixin

::: zillion.report.DataSourceQuery
    :docstring:
    :members: add_metric covers_field covers_metric execute get_conn get_datasource get_datasource_name get_dialect_name get_tables kill


## [DataSourceQueryResult](https://github.com/totalhack/zillion/blob/master/zillion/report.py#L688-L704)

*Bases*: tlbx.logging_utils.PrintMixin

::: zillion.report.DataSourceQueryResult
    :docstring:
    


## [DataSourceQuerySummary](https://github.com/totalhack/zillion/blob/master/zillion/report.py#L650-L685)

*Bases*: tlbx.logging_utils.PrintMixin

::: zillion.report.DataSourceQuerySummary
    :docstring:
    :members: format


## [ExecutionStateMixin](https://github.com/totalhack/zillion/blob/master/zillion/report.py#L50-L159)

::: zillion.report.ExecutionStateMixin
    :docstring:
    


## [Report](https://github.com/totalhack/zillion/blob/master/zillion/report.py#L1448-L2365)

*Bases*: zillion.report.ExecutionStateMixin

::: zillion.report.Report
    :docstring:
    :members: delete execute from_params from_text get_dimension_grain get_grain get_json get_params kill load load_warehouse_id_for_report save
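
A hedged sketch of running a `Report` directly rather than through `Warehouse.execute`. The keyword arguments mirror the report parameters used throughout this repo's tests; the exact constructor signature and the config path and field names are assumptions.

```python
from zillion.report import Report
from zillion.warehouse import Warehouse

# Hypothetical warehouse config path and illustrative field names.
wh = Warehouse(config="/path/to/warehouse_config.json")

# Assumption: Report takes the warehouse plus report params as keyword args.
report = Report(
    wh,
    metrics=["revenue", "leads"],
    dimensions=["partner_name"],
    criteria=[("campaign_name", "!=", "Campaign 2B")],
    limit=10,
)
result = report.execute()
print(result.df)
```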


## [ReportResult](https://github.com/totalhack/zillion/blob/master/zillion/report.py#L2368-L2454)

*Bases*: tlbx.logging_utils.PrintMixin

::: zillion.report.ReportResult
    :docstring:
    


## [SQLiteMemoryCombinedResult](https://github.com/totalhack/zillion/blob/master/zillion/report.py#L836-L1445)

*Bases*: zillion.report.BaseCombinedResult

::: zillion.report.SQLiteMemoryCombinedResult
    :docstring:
    :members: add_warning clean_up create_table get_conn get_cursor get_final_result get_metric_clause ifnull_clause load_table



================================================
FILE: docs/mkdocs/zillion.scripts.md
================================================
[//]: # (This is an auto-generated file. Do not edit)
# Module zillion.scripts



================================================
FILE: docs/mkdocs/zillion.sql_utils.md
================================================
[//]: # (This is an auto-generated file. Do not edit)
# Module zillion.sql_utils


## [aggregation_to_sqla_func](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L250-L252)

::: zillion.sql_utils.aggregation_to_sqla_func
    :docstring:


## [check_metadata_url](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L440-L453)

::: zillion.sql_utils.check_metadata_url
    :docstring:


## [column_fullname](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L316-L336)

::: zillion.sql_utils.column_fullname
    :docstring:


## [comment](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L456-L459)

::: zillion.sql_utils.comment
    :docstring:


## [contains_aggregation](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L110-L140)

::: zillion.sql_utils.contains_aggregation
    :docstring:


## [contains_sql_keywords](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L83-L107)

::: zillion.sql_utils.contains_sql_keywords
    :docstring:


## [filter_dialect_schemas](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L521-L548)

::: zillion.sql_utils.filter_dialect_schemas
    :docstring:


## [get_postgres_pid](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L562-L566)

::: zillion.sql_utils.get_postgres_pid
    :docstring:


## [get_postgres_schemas](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L551-L559)

::: zillion.sql_utils.get_postgres_schemas
    :docstring:


## [get_schema_and_table_name](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L339-L348)

::: zillion.sql_utils.get_schema_and_table_name
    :docstring:


## [get_schemas](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L490-L493)

::: zillion.sql_utils.get_schemas
    :docstring:


## [get_sqla_criterion_expr](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L351-L437)

::: zillion.sql_utils.get_sqla_criterion_expr
    :docstring:


## [infer_aggregation_and_rounding](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L213-L247)

::: zillion.sql_utils.infer_aggregation_and_rounding
    :docstring:


## [is_numeric_type](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L255-L260)

::: zillion.sql_utils.is_numeric_type
    :docstring:


## [is_probably_metric](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L263-L293)

::: zillion.sql_utils.is_probably_metric
    :docstring:


## [printexpr](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L311-L313)

::: zillion.sql_utils.printexpr
    :docstring:


## [sqla_compile](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L296-L308)

::: zillion.sql_utils.sqla_compile
    :docstring:


## [to_duckdb_type](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L514-L518)

::: zillion.sql_utils.to_duckdb_type
    :docstring:


## [to_generic_sa_type](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L195-L210)

::: zillion.sql_utils.to_generic_sa_type
    :docstring:


## [to_mysql_type](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L499-L501)

::: zillion.sql_utils.to_mysql_type
    :docstring:


## [to_postgresql_type](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L504-L506)

::: zillion.sql_utils.to_postgresql_type
    :docstring:


## [to_sqlite_type](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L509-L511)

::: zillion.sql_utils.to_sqlite_type
    :docstring:


## [type_string_to_sa_type](https://github.com/totalhack/zillion/blob/master/zillion/sql_utils.py#L143-L192)

::: zillion.sql_utils.type_string_to_sa_type
    :docstring:
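
Field configs in this repo use type strings such as `"integer"`, `"date"`, and `"string(32)"` (see `examples/baseball_warehouse.json`). A hedged sketch of converting one of those strings; the exact return value is an assumption.

```python
from zillion.sql_utils import type_string_to_sa_type

# Assumption: returns a SQLAlchemy type object built from the type string.
sa_type = type_string_to_sa_type("string(32)")
print(sa_type)  # expected to be a String-like type with length 32
```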



================================================
FILE: docs/mkdocs/zillion.version.md
================================================
[//]: # (This is an auto-generated file. Do not edit)
# Module zillion.version



================================================
FILE: docs/mkdocs/zillion.warehouse.md
================================================
[//]: # (This is an auto-generated file. Do not edit)
# Module zillion.warehouse


## [Warehouse](https://github.com/totalhack/zillion/blob/master/zillion/warehouse.py#L16-L1209)

*Bases*: zillion.field.FieldManagerMixin

::: zillion.warehouse.Warehouse
    :docstring:
    :members: add_datasource add_dimension add_metric apply_config delete delete_report directly_has_dimension directly_has_field directly_has_metric execute execute_id execute_text from_data_file from_db_file get_child_field_managers get_datasource get_dimension get_dimension_configs get_dimension_names get_dimension_table_set get_dimensions get_direct_dimension_configs get_direct_dimensions get_direct_fields get_direct_metric_configs get_direct_metrics get_field get_field_instances get_field_managers get_field_names get_fields get_metric get_metric_configs get_metric_names get_metric_table_set get_metrics has_dimension has_field has_metric init_embeddings load load_report load_report_and_warehouse load_warehouse_for_report print_dimensions print_info print_metrics remove_datasource run_integrity_checks save save_report
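
A short usage sketch for the `Warehouse` API, based on the sample configs in this repo's `examples/` directory; the field names and criteria below are illustrative.

```python
from zillion.warehouse import Warehouse

# Build a warehouse from a config file or URL
# (examples/example_wh_config.json in this repo is the canonical sample).
wh = Warehouse(config="examples/example_wh_config.json")

# Run a report: metrics are aggregated at the requested dimension grain and
# filtered by criteria tuples of (field, operator, value).
result = wh.execute(
    metrics=["revenue", "leads"],
    dimensions=["partner_name"],
    criteria=[("campaign_name", "like", "Campaign%")],
)
print(result.df)  # combined result as a pandas DataFrame
```

The member list above also includes `execute_text`, which, when the NLP support in `zillion.nlp` is configured, maps a natural language request to report parameters before executing.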



================================================
FILE: docs/mkdocs_index.md
================================================
--8<-- "markdown/readme_badges.md"

--8<-- "markdown/readme_intro.md"

---

--8<-- "markdown/readme_contents.md"


================================================
FILE: docs/readme.md
================================================
Zillion: Make sense of it all
=============================

--8<-- "markdown/readme_badges.md"

--8<-- "markdown/readme_intro.md"

--8<-- "markdown/readme_toc.md"

--8<-- "markdown/readme_contents.md"

---

--8<-- "markdown/readme_docs.md"

---

--8<-- "markdown/readme_how_to_contribute.md"


================================================
FILE: docs/requirements.txt
================================================
zillion
mkdocs==1.1.2
mkdocs-material==5.2.1
mkdocs-material-extensions==1.0
mkdocs-minify-plugin==0.3.0
mkautodoc==0.1.0


================================================
FILE: examples/baseball_warehouse.json
================================================
{
    "metrics": [
        {
            "name": "games",
            "display_name": "G",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "at_bats",
            "display_name": "AB",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "runs",
            "display_name": "R",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "hits",
            "display_name": "H",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "singles",
            "display_name": "1B",
            "aggregation": "sum",
            "formula": "{hits} - {doubles} - {triples} - {home_runs}"
        },
        {
            "name": "doubles",
            "display_name": "2B",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "triples",
            "display_name": "3B",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "home_runs",
            "display_name": "HR",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "runs_batted_in",
            "display_name": "RBI",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "stolen_bases",
            "display_name": "SB",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "caught_stealing",
            "display_name": "CS",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "walks",
            "display_name": "BB",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "strikeouts",
            "display_name": "SO",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "intentional_walks",
            "display_name": "IBB",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "hit_by_pitch",
            "display_name": "HBP",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "sacrifice_hits",
            "display_name": "SH",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "sacrifice_flies",
            "display_name": "SF",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "grounded_into_double_plays",
            "display_name": "GIDP",
            "type": "integer",
            "aggregation": "sum"
        },
        {
            "name": "batting_average",
            "display_name": "AVG",
            "aggregation": "mean",
            "rounding": 3,
            "formula": "1.0*{hits}/{at_bats}",
            "description": "Hits per At Bat"
        },
        {
            "name": "on_base_percentage",
            "display_name": "OBP",
            "aggregation": "mean",
            "rounding": 3,
            "formula": "1.0*({hits} + {walks} + {hit_by_pitch})/({at_bats} + {walks} + {hit_by_pitch} + {sacrifice_flies})",
            "description": "(Hits + Walks + Hit By Pitch) / (At Bats + Walks + Hit By Pitch + Sacrifice Flies)"
        },
        {
            "name": "slugging_percentage",
            "display_name": "SLG",
            "aggregation": "mean",
            "rounding": 3,
            "formula": "1.0*({singles} + 2*{doubles} + 3*{triples} + 4*{home_runs})/{at_bats}",
            "description": "(Singles + 2xDoubles + 3xTriples + 4xHome Runs) / At Bats"
        },
        {
            "name": "on_base_plus_slugging",
            "display_name": "OPS",
            "aggregation": "mean",
            "rounding": 3,
            "formula": "1.0*({on_base_percentage} + {slugging_percentage})",
            "description": "OBP + SLG"
        }
    ],
    "dimensions": [
        {
            "name": "player_id",
            "display_name": "Player ID",
            "type": "string(10)",
            "description": "A unique code assigned to each player"
        },
        {
            "name": "year",
            "display_name": "Year",
            "type": "integer",
            "description": "Year"
        },
        {
            "name": "stint",
            "display_name": "Stint",
            "type": "integer",
            "description": "Player's stint (order of appearances within a season)"
        },
        {
            "name": "team_id",
            "display_name": "Team ID",
            "type": "string(3)",
            "description": "A unique ID for a franchise/year combination"
        },
        {
            "name": "league_id",
            "display_name": "League ID",
            "type": "string(2)",
            "description": "Unique ID for the league"
        },
        {
            "name": "franchise_id",
            "display_name": "Franchise ID",
            "type": "string(3)",
            "description": "A unique ID for the franchise"
        },
        {
            "name": "franchise_name",
            "display_name": "Franchise Name",
            "type": "string(50)",
            "description": "Full name of the franchise"
        },
        {
            "name": "ballpark",
            "display_name": "Ballpark",
            "type": "string(50)",
            "description": "Name of the ballpark"
        },
        {
            "name": "birth_year",
            "display_name": "Birth Year",
            "type": "integer",
            "description": "Year player was born"
        },
        {
            "name": "birth_country",
            "display_name": "Birth Country",
            "type": "string(32)",
            "description": "Country where player was born"
        },
        {
            "name": "birth_state",
            "display_name": "Birth State",
            "type": "string(3)",
            "description": "State where player was born"
        },
        {
            "name": "first_name",
            "display_name": "First Name",
            "type": "string(32)",
            "description": "Player's first name"
        },
        {
            "name": "last_name",
            "display_name": "Last Name",
            "type": "string(32)",
            "description": "Player's last name"
        },
        {
            "name": "weight",
            "display_name": "Weight",
            "type": "integer",
            "description": "Player's weight in pounds"
        },
        {
            "name": "height",
            "display_name": "Height",
            "type": "integer",
            "description": "Player's height in inches"
        },
        {
            "name": "bats",
            "display_name": "Bats",
            "type": "string(5)",
            "description": "Player's batting hand (left, right, or both)"
        },
        {
            "name": "throws",
            "display_name": "Throws",
            "type": "string(5)",
            "description": "Player's throwing hand (left or right)"
        },
        {
            "name": "debut_date",
            "display_name": "Debut Date",
            "type": "date",
            "description": "Date that player made first major league appearance"
        }
    ],
    "datasources": {
        "baseball_data_bank": {
            "connect": "sqlite:////tmp/baseball.db",
            "tables": {
                "main.people": {
         
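
A hedged, illustrative sketch of how the config above would be used: load it into a `Warehouse` and report on a couple of the formula metrics it defines. It assumes a SQLite database already exists at the `/tmp/baseball.db` path named in the config's `connect` URL.

```python
from zillion.warehouse import Warehouse

# Assumes /tmp/baseball.db exists and contains the tables the config references.
wh = Warehouse(config="examples/baseball_warehouse.json")

# batting_average and on_base_percentage are formula metrics defined in the
# config above; hits and at_bats are the raw metrics they are built from.
result = wh.execute(
    metrics=["hits", "at_bats", "batting_average", "on_base_percentage"],
    dimensions=["franchise_name"],
    criteria=[("year", "=", 1998)],
)
print(result.df.head())
```
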
SYMBOL INDEX (969 symbols across 33 files)

FILE: docs/build_markdown.py
  function get_classes (line 22) | def get_classes(module):
  function get_funcs (line 34) | def get_funcs(module):
  function get_object_attributes (line 44) | def get_object_attributes(obj):
  function get_zillion_members (line 50) | def get_zillion_members(obj):
  function process_markdown (line 61) | def process_markdown(infile, outfile, **opts):
  function linkcode_resolve (line 71) | def linkcode_resolve(obj):
  function create_module_file (line 96) | def create_module_file(fullname):

FILE: tests/conftest.py
  function pytest_addoption (line 11) | def pytest_addoption(parser):
  function pytest_configure (line 28) | def pytest_configure(config):
  function mysql_setup (line 39) | def mysql_setup():
  function postgresql_setup (line 53) | def postgresql_setup():
  function duckdb_setup (line 71) | def duckdb_setup():
  function config (line 85) | def config():
  function ds_config (line 90) | def ds_config():
  function adhoc_config (line 96) | def adhoc_config():
  function wh (line 101) | def wh(config):
  function saved_wh (line 106) | def saved_wh():
  function adhoc_ds (line 119) | def adhoc_ds(config):
  function mysql_ds_config (line 124) | def mysql_ds_config(mysql_setup):
  function mysql_ds (line 129) | def mysql_ds(mysql_ds_config):
  function mysql_wh (line 134) | def mysql_wh(mysql_ds):
  function postgresql_ds_config (line 139) | def postgresql_ds_config(postgresql_setup):
  function postgresql_ds (line 144) | def postgresql_ds(postgresql_ds_config):
  function postgresql_wh (line 149) | def postgresql_wh(postgresql_ds):
  function duckdb_wh (line 154) | def duckdb_wh():
  function pymysql_conn (line 169) | def pymysql_conn():
  function sqlalchemy_mysql_conn (line 179) | def sqlalchemy_mysql_conn():

FILE: tests/setup/common.sqlite.sql
  type partners (line 2) | CREATE TABLE IF NOT EXISTS partners (
  type partner_sibling (line 10) | CREATE TABLE IF NOT EXISTS partner_sibling (
  type campaigns (line 16) | CREATE TABLE IF NOT EXISTS campaigns (

FILE: tests/setup/testdb1.sqlite.sql
  type leads (line 2) | CREATE TABLE IF NOT EXISTS leads (
  type sales (line 10) | CREATE TABLE IF NOT EXISTS sales (

FILE: tests/setup/zillion_db.sqlite.sql
  type warehouses (line 2) | CREATE TABLE IF NOT EXISTS warehouses (
  type report_specs (line 12) | CREATE TABLE IF NOT EXISTS report_specs (

FILE: tests/setup/zillion_test.mysql.sql
  type `campaign_cost` (line 33) | CREATE TABLE `campaign_cost` (
  type `campaigns` (line 58) | CREATE TABLE `campaigns` (
  type `partners` (line 86) | CREATE TABLE `partners` (
  type `campaign_transactions` (line 120) | CREATE TABLE `campaign_transactions` (
  type `campaigns` (line 146) | CREATE TABLE `campaigns` (

FILE: tests/setup/zillion_test.postgres.sql
  type zillion_test (line 36) | CREATE TABLE zillion_test.campaign_cost (
  type zillion_test (line 49) | CREATE TABLE zillion_test.campaign_transactions (
  type zillion_test (line 82) | CREATE TABLE zillion_test.campaigns (
  type zillion_test (line 118) | CREATE TABLE zillion_test.partners (
  type idx_16390_idx_name (line 99273) | CREATE UNIQUE INDEX idx_16390_idx_name ON zillion_test.campaigns USING b...
  type idx_16400_idx_campaign_id (line 99280) | CREATE INDEX idx_16400_idx_campaign_id ON zillion_test.campaign_transact...
  type idx_16406_idx_name (line 99287) | CREATE UNIQUE INDEX idx_16406_idx_name ON zillion_test.partners USING bt...

FILE: tests/test_core.py
  function test_zillion_config (line 18) | def test_zillion_config():
  function test_wh_config_init (line 40) | def test_wh_config_init(config):
  function test_wh_config_include (line 44) | def test_wh_config_include(config):
  function test_datasource_config_init (line 52) | def test_datasource_config_init(ds_config):
  function test_load_remote_wh_config (line 59) | def test_load_remote_wh_config():
  function test_wh_from_url_wh_config (line 63) | def test_wh_from_url_wh_config():
  function test_wh_from_yaml_url_wh_config (line 68) | def test_wh_from_yaml_url_wh_config():
  function test_load_wh_config_from_env (line 78) | def test_load_wh_config_from_env():
  function test_wh_save_and_load (line 84) | def test_wh_save_and_load():
  function test_datasource_metadata_init (line 94) | def test_datasource_metadata_init(ds_config):
  function test_datasource_metadata_and_config_init (line 102) | def test_datasource_metadata_and_config_init(ds_config):
  function test_datasource_from_config (line 112) | def test_datasource_from_config(ds_config):
  function test_datasource_skip_conversion_fields (line 119) | def test_datasource_skip_conversion_fields(ds_config):
  function test_datasource_config_data_url (line 125) | def test_datasource_config_data_url(ds_config):
  function test_datasource_config_data_url_replace_after (line 138) | def test_datasource_config_data_url_replace_after(ds_config):
  function test_parse_replace_after (line 158) | def test_parse_replace_after():
  function test_datasource_from_db_file (line 173) | def test_datasource_from_db_file(ds_config):
  function test_datasource_config_table_data_url (line 190) | def test_datasource_config_table_data_url(adhoc_config):
  function test_datasource_metadata_and_table_data_url (line 198) | def test_datasource_metadata_and_table_data_url(ds_config, adhoc_config):
  function test_datasource_apply_config_table_data_url (line 218) | def test_datasource_apply_config_table_data_url(ds_config, adhoc_config):
  function test_warehouse_init (line 240) | def test_warehouse_init(config):
  function test_warehouse_no_config (line 246) | def test_warehouse_no_config(ds_config):
  function test_warehouse_has_zillion_info_no_config (line 253) | def test_warehouse_has_zillion_info_no_config(ds_config):
  function test_field_display_name (line 261) | def test_field_display_name(config):
  function test_field_description (line 273) | def test_field_description(config):
  function test_field_meta (line 286) | def test_field_meta(config):
  function test_reserved_field_name (line 292) | def test_reserved_field_name(config):
  function test_field_name_starts_with_number (line 301) | def test_field_name_starts_with_number(config):
  function test_dimension_to_config (line 309) | def test_dimension_to_config(config):
  function test_metric_to_config (line 315) | def test_metric_to_config(config):
  function test_formula_metric_to_config (line 323) | def test_formula_metric_to_config(config):
  function test_get_metric_configs (line 331) | def test_get_metric_configs(config):
  function test_get_dimension_configs (line 337) | def test_get_dimension_configs(config):
  function test_dimension_copy (line 343) | def test_dimension_copy(config):
  function test_metric_copy (line 349) | def test_metric_copy(config):
  function test_formula_metric_copy (line 355) | def test_formula_metric_copy(config):
  function test_warehouse_technical_within_formula (line 361) | def test_warehouse_technical_within_formula(config):
  function test_warehouse_metric_divisor (line 374) | def test_warehouse_metric_divisor(config):
  function test_warehouse_metric_multiple_aggregations (line 381) | def test_warehouse_metric_multiple_aggregations(config):
  function test_warehouse_remote_datasource_config (line 388) | def test_warehouse_remote_datasource_config(config):
  function test_warehouse_remote_csv_table (line 396) | def test_warehouse_remote_csv_table(adhoc_config):
  function test_warehouse_remote_google_sheet (line 405) | def test_warehouse_remote_google_sheet(adhoc_config):
  function test_warehouse_remote_xlsx_table (line 414) | def test_warehouse_remote_xlsx_table(adhoc_config):
  function test_warehouse_remote_json_table (line 425) | def test_warehouse_remote_json_table(adhoc_config):
  function test_warehouse_remote_html_table (line 436) | def test_warehouse_remote_html_table(adhoc_config):
  function test_reuse_existing_remote_table (line 448) | def test_reuse_existing_remote_table(adhoc_config):
  function test_bad_table_data_url (line 465) | def test_bad_table_data_url(ds_config):
  function test_warehouse_from_data_file (line 471) | def test_warehouse_from_data_file():
  function test_warehouse_from_db_file (line 480) | def test_warehouse_from_db_file():
  function test_column_config_override (line 486) | def test_column_config_override(config):
  function test_table_config_override (line 493) | def test_table_config_override(config):
  function test_no_create_fields_no_columns (line 500) | def test_no_create_fields_no_columns(config):
  function test_no_create_fields_has_columns (line 508) | def test_no_create_fields_has_columns(config):
  function test_no_create_fields_field_exists_has_columns (line 518) | def test_no_create_fields_field_exists_has_columns(config):
  function test_create_fields_no_columns (line 528) | def test_create_fields_no_columns(config):
  function test_create_fields_has_columns (line 539) | def test_create_fields_has_columns(config):
  function test_get_dimension_table_set (line 552) | def test_get_dimension_table_set(wh):
  function test_get_metric_table_set (line 568) | def test_get_metric_table_set(wh):
  function test_get_supported_dimensions (line 585) | def test_get_supported_dimensions(wh):
  function test_contains_aggregation (line 592) | def test_contains_aggregation():
  function test_contains_sql_keyword (line 617) | def test_contains_sql_keyword():
  function test_adhoc_datatable_no_columns (line 644) | def test_adhoc_datatable_no_columns():
  function test_adhoc_datatable_has_columns (line 668) | def test_adhoc_datatable_has_columns():
  function test_csv_datatable (line 693) | def test_csv_datatable():
  function test_excel_datatable (line 716) | def test_excel_datatable():
  function test_json_datatable (line 740) | def test_json_datatable():
  function test_html_datatable (line 758) | def test_html_datatable():

FILE: tests/test_duckdb.py
  function test_duckdb_datasource (line 9) | def test_duckdb_datasource(duckdb_wh):
  function test_duckdb_date_dimension_conversions (line 46) | def test_duckdb_date_dimension_conversions(duckdb_wh):
  function test_duckdb_where_criteria_conversions (line 59) | def test_duckdb_where_criteria_conversions(duckdb_wh):

FILE: tests/test_example_wh_config.py
  function test_example_wh_init (line 9) | def test_example_wh_init():
  function test_example_wh_report1 (line 14) | def test_example_wh_report1():
  function test_example_wh_report2 (line 23) | def test_example_wh_report2():
  function test_example_wh_report3 (line 34) | def test_example_wh_report3():

FILE: tests/test_mysql.py
  function test_mysql_datasource (line 8) | def test_mysql_datasource(mysql_wh):
  function test_mysql_table_data_url (line 16) | def test_mysql_table_data_url(mysql_ds_config, adhoc_config):
  function test_mysql_ignore_table_data_url (line 25) | def test_mysql_ignore_table_data_url(mysql_ds_config, adhoc_config):
  function test_mysql_report_repeat_criteria (line 35) | def test_mysql_report_repeat_criteria(wh):
  function test_mysql_sequential_timeout (line 44) | def test_mysql_sequential_timeout(mysql_wh):
  function test_mysql_multithreaded_timeout (line 57) | def test_mysql_multithreaded_timeout(mysql_wh):
  function test_mysql_date_dimension_conversions (line 70) | def test_mysql_date_dimension_conversions(mysql_wh):
  function test_mysql_where_criteria_conversions (line 82) | def test_mysql_where_criteria_conversions(mysql_wh):

FILE: tests/test_nlp.py
  function n_days_ago (line 9) | def n_days_ago(n):
  function test_text_to_report_no_fields (line 48) | def test_text_to_report_no_fields():
  function test_text_to_report_all_fields (line 68) | def test_text_to_report_all_fields(config):
  function test_text_to_report_dimension_fields (line 89) | def test_text_to_report_dimension_fields(config):
  function test_init_warehouse_embeddings (line 101) | def test_init_warehouse_embeddings(config):
  function test_openai_embeddings_cached (line 126) | def test_openai_embeddings_cached():
  function test_map_warehouse_report_params (line 138) | def test_map_warehouse_report_params(config):
  function test_warehouse_execute_text (line 153) | def test_warehouse_execute_text(config):
  function test_nlp_datasource_from_db_file (line 161) | def test_nlp_datasource_from_db_file(ds_config):

FILE: tests/test_performance.py
  function profiled (line 13) | def profiled(pattern=None):
  function get_adhoc_ds (line 28) | def get_adhoc_ds(size):
  function test_performance_adhoc_ds (line 68) | def test_performance_adhoc_ds(wh):
  function test_performance_multi_rollup (line 79) | def test_performance_multi_rollup(wh):

FILE: tests/test_postgresql.py
  function test_postgresql_datasource (line 8) | def test_postgresql_datasource(postgresql_wh):
  function test_postgresql_sequential_timeout (line 16) | def test_postgresql_sequential_timeout(postgresql_wh):
  function test_postgresql_multithreaded_timeout (line 30) | def test_postgresql_multithreaded_timeout(postgresql_wh):
  function test_postgresql_date_dimension_conversions (line 43) | def test_postgresql_date_dimension_conversions(postgresql_wh):
  function test_postgresql_where_criteria_conversions (line 55) | def test_postgresql_where_criteria_conversions(postgresql_wh):

FILE: tests/test_reports.py
  function test_basic_report (line 13) | def test_basic_report(wh):
  function test_report_none_criteria (line 24) | def test_report_none_criteria(wh):
  function test_report_invalid_criteria_value (line 33) | def test_report_invalid_criteria_value(wh):
  function test_report_criteria_values_from_callable (line 41) | def test_report_criteria_values_from_callable(wh):
  function test_report_sequential_timeout (line 85) | def test_report_sequential_timeout(wh):
  function test_report_multithreaded_timeout (line 101) | def test_report_multithreaded_timeout(wh):
  function test_report_one_worker (line 117) | def test_report_one_worker(wh):
  function test_report_reuse_after_timeout (line 130) | def test_report_reuse_after_timeout(wh):
  function test_report_kill (line 151) | def test_report_kill(wh):
  function test_report_reuse_after_kill (line 171) | def test_report_reuse_after_kill(wh):
  function test_report_timeout_then_kill (line 200) | def test_report_timeout_then_kill(wh):
  function test_impossible_report (line 228) | def test_impossible_report(wh):
  function test_report_count_aggr (line 236) | def test_report_count_aggr(wh):
  function test_report_criteria_between (line 244) | def test_report_criteria_between(wh):
  function test_report_criteria_in (line 257) | def test_report_criteria_in(wh):
  function test_report_criteria_like (line 274) | def test_report_criteria_like(wh):
  function test_report_repeat_criteria (line 283) | def test_report_repeat_criteria(wh):
  function test_report_between_date_criteria (line 292) | def test_report_between_date_criteria(wh):
  function test_row_filter_single_dimension (line 301) | def test_row_filter_single_dimension(wh):
  function test_row_filter_invalid_type (line 311) | def test_row_filter_invalid_type(wh):
  function test_row_filter_formula_metric (line 321) | def test_row_filter_formula_metric(wh):
  function test_report_pivot (line 331) | def test_report_pivot(wh):
  function test_report_order_by (line 345) | def test_report_order_by(wh):
  function test_report_order_by_only_dims (line 355) | def test_report_order_by_only_dims(wh):
  function test_report_order_by_custom_sort (line 364) | def test_report_order_by_custom_sort(wh):
  function test_report_custom_sort_no_order_by (line 376) | def test_report_custom_sort_no_order_by(wh):
  function test_report_limit (line 414) | def test_report_limit(wh):
  function test_report_limit_only_dims (line 423) | def test_report_limit_only_dims(wh):
  function test_report_order_by_and_limit (line 431) | def test_report_order_by_and_limit(wh):
  function test_report_limit_first (line 442) | def test_report_limit_first(wh):
  function test_report_df_display (line 453) | def test_report_df_display(wh):
  function test_report_df_display_no_dims (line 463) | def test_report_df_display_no_dims(wh):
  function test_report_df_display_no_metrics (line 471) | def test_report_df_display_no_metrics(wh):
  function test_report_technical_ma (line 479) | def test_report_technical_ma(wh):
  function test_report_adhoc_technicals (line 492) | def test_report_adhoc_technicals(wh):
  function test_report_no_dimension_technical (line 533) | def test_report_no_dimension_technical(wh):
  function test_report_technical_formula_ma (line 564) | def test_report_technical_formula_ma(wh):
  function test_report_technical_rolling_sum (line 577) | def test_report_technical_rolling_sum(wh):
  function test_report_technical_cumsum (line 589) | def test_report_technical_cumsum(wh):
  function test_report_technical_diff (line 601) | def test_report_technical_diff(wh):
  function test_report_technical_pct_diff (line 613) | def test_report_technical_pct_diff(wh):
  function test_report_technical_bollinger (line 625) | def test_report_technical_bollinger(wh):
  function test_report_no_dimensions (line 681) | def test_report_no_dimensions(wh):
  function test_report_no_metrics (line 689) | def test_report_no_metrics(wh):
  function test_report_null_criteria (line 697) | def test_report_null_criteria(wh):
  function test_report_incomplete_dimensions (line 724) | def test_report_incomplete_dimensions(config):
  function test_report_inactive_table (line 742) | def test_report_inactive_table(config):
  function test_report_count_metric (line 756) | def test_report_count_metric(wh):
  function test_report_alias_metric (line 764) | def test_report_alias_metric(wh):
  function test_report_alias_dimension (line 772) | def test_report_alias_dimension(wh):
  function test_report_sibling_dimension (line 780) | def test_report_sibling_dimension(wh):
  function test_report_multiple_queries (line 788) | def test_report_multiple_queries(wh):
  function test_report_formula_metric (line 796) | def test_report_formula_metric(wh):
  function test_report_formula_metric_divisor (line 804) | def test_report_formula_metric_divisor(wh):
  function test_report_nested_formula_metric (line 812) | def test_report_nested_formula_metric(wh):
  function test_report_ds_dimension_formula (line 820) | def test_report_ds_dimension_formula(wh):
  function test_report_ds_metric_formula (line 828) | def test_report_ds_metric_formula(wh):
  function test_report_where_ds_formula (line 836) | def test_report_where_ds_formula(wh):
  function test_report_metric_formula_with_dim (line 845) | def test_report_metric_formula_with_dim(config):
  function test_report_only_dimensions_ds_formula (line 861) | def test_report_only_dimensions_ds_formula(wh):
  function test_report_non_existent_metric (line 869) | def test_report_non_existent_metric(wh):
  function test_report_metric_required_grain (line 876) | def test_report_metric_required_grain(wh):
  function test_report_metric_formula_required_grain (line 887) | def test_report_metric_formula_required_grain(wh):
  function test_report_metric_formula_field_required_grain (line 894) | def test_report_metric_formula_field_required_grain(wh):
  function test_report_partial_grain (line 901) | def test_report_partial_grain(wh):
  function test_report_metric_ifnull (line 925) | def test_report_metric_ifnull(wh):
  function test_report_weighted_formula_metric (line 931) | def test_report_weighted_formula_metric(wh):
  function test_report_weighted_ds_metric_formula (line 941) | def test_report_weighted_ds_metric_formula(wh):
  function test_report_weighted_metric (line 949) | def test_report_weighted_metric(wh):
  function test_report_multiple_weighted_metrics (line 958) | def test_report_multiple_weighted_metrics(wh):
  function test_report_repeat_weighted_metrics (line 967) | def test_report_repeat_weighted_metrics(wh):
  function test_report_weighted_rollup (line 976) | def test_report_weighted_rollup(wh):
  function test_report_weighted_multi_rollup (line 986) | def test_report_weighted_multi_rollup(wh):
  function test_report_multi_dimension (line 997) | def test_report_multi_dimension(wh):
  function test_report_rollup (line 1005) | def test_report_rollup(wh):
  function test_report_multi_rollup (line 1017) | def test_report_multi_rollup(wh):
  function test_report_all_rollup (line 1029) | def test_report_all_rollup(wh):
  function test_report_rollup_order_null (line 1041) | def test_report_rollup_order_null(wh):
  function test_report_multi_rollup_pivot (line 1076) | def test_report_multi_rollup_pivot(wh):
  function test_report_adhoc_metric (line 1087) | def test_report_adhoc_metric(wh):
  function test_report_adhoc_metric_display_name (line 1095) | def test_report_adhoc_metric_display_name(wh):
  function test_report_adhoc_nested_metric (line 1111) | def test_report_adhoc_nested_metric(wh):
  function test_report_adhoc_aggregation (line 1123) | def test_report_adhoc_aggregation(wh):
  function test_report_adhoc_weighting (line 1136) | def test_report_adhoc_weighting(wh):
  function test_report_adhoc_dimension (line 1154) | def test_report_adhoc_dimension(wh):
  function test_report_formula_dimension (line 1227) | def test_report_formula_dimension(wh):
  function test_report_where_criteria_conversions (line 1235) | def test_report_where_criteria_conversions(config):
  function test_report_sqlite_date_conversions (line 1252) | def test_report_sqlite_date_conversions(config):
  function test_report_datasource_priority (line 1269) | def test_report_datasource_priority(wh):
  function test_report_table_priority (line 1276) | def test_report_table_priority(config):
  function test_report_disabled_tables (line 1291) | def test_report_disabled_tables(wh):
  function test_report_multi_datasource (line 1322) | def test_report_multi_datasource(wh):
  function test_report_save_and_load (line 1332) | def test_report_save_and_load(saved_wh):
  function test_report_save_with_meta (line 1351) | def test_report_save_with_meta(saved_wh):
  function test_report_adhoc_metric_save_and_load (line 1363) | def test_report_adhoc_metric_save_and_load(saved_wh):
  function test_report_load_invalid_id (line 1376) | def test_report_load_invalid_id(saved_wh):
  function test_report_sub_report_by_id (line 1381) | def test_report_sub_report_by_id(saved_wh):
  function test_report_sub_report_with_params (line 1438) | def test_report_sub_report_with_params(saved_wh):
  function test_report_adhoc_datasource (line 1492) | def test_report_adhoc_datasource(wh, adhoc_ds):
  function test_report_save_and_load_adhoc_datasource (line 1500) | def test_report_save_and_load_adhoc_datasource(saved_wh, adhoc_ds):
  function test_report_missing_adhoc_datasource_save_and_load (line 1515) | def test_report_missing_adhoc_datasource_save_and_load(saved_wh, adhoc_ds):
  function test_report_invalid_adhoc_datasource (line 1529) | def test_report_invalid_adhoc_datasource(wh, adhoc_ds):
  function test_regular_datasource_adhoc (line 1540) | def test_regular_datasource_adhoc(config):
  function test_only_adhoc_datasource (line 1551) | def test_only_adhoc_datasource(adhoc_ds):
  function test_no_use_full_column_names (line 1560) | def test_no_use_full_column_names(config):
  function test_report_column_required_grain (line 1570) | def test_report_column_required_grain(config):
  function test_ds_metric_formula_sql_injection (line 1579) | def test_ds_metric_formula_sql_injection(config):
  function test_ds_dim_formula_sql_injection (line 1596) | def test_ds_dim_formula_sql_injection(config):
  function test_metric_formula_sql_injection (line 1613) | def test_metric_formula_sql_injection(config):
  function test_weighting_metric_sql_injection (line 1630) | def test_weighting_metric_sql_injection(config):
  function test_adhoc_metric_sql_injection (line 1648) | def test_adhoc_metric_sql_injection(wh):
  function test_criteria_sql_injection (line 1658) | def test_criteria_sql_injection(wh):
  function test_row_filter_sql_injection (line 1686) | def test_row_filter_sql_injection(wh):
  function test_pivot_sql_injection (line 1694) | def test_pivot_sql_injection(wh):
  function test_metric_name_sql_injection (line 1702) | def test_metric_name_sql_injection(config):
  function test_dimension_name_sql_injection (line 1717) | def test_dimension_name_sql_injection(config):
  function test_type_conversion_prefix_sql_injection (line 1725) | def test_type_conversion_prefix_sql_injection(config):
  function test_table_name_sql_injection (line 1738) | def test_table_name_sql_injection(config):
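
  Example (illustrative sketch, not from the repo): the style of report these tests exercise, assuming a Warehouse fixture `wh` like the ones above and field names ("revenue", "partner_name", "campaign_name") assumed from the test warehouse config.

    from zillion.core import RollupTypes

    result = wh.execute(
        metrics=["revenue"],
        dimensions=["partner_name", "campaign_name"],
        rollup=RollupTypes.TOTALS,           # totals rollup, as in test_report_rollup
        row_filters=[("revenue", ">", 10)],  # post-aggregation row filter
    )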

FILE: tests/test_scripts.py
  function test_bootstrap_datasource_config (line 7) | def test_bootstrap_datasource_config():
  function test_nlp_bootstrap_datasource_config (line 18) | def test_nlp_bootstrap_datasource_config():

FILE: tests/test_utils.py
  function update_zillion_config (line 59) | def update_zillion_config(updates):
  function mysql_data_init (line 69) | def mysql_data_init():
  function postgresql_data_init (line 79) | def postgresql_data_init():
  function duckdb_data_init (line 91) | def duckdb_data_init(conn):
  function get_pymysql_conn (line 99) | def get_pymysql_conn():
  function get_sqlalchemy_mysql_engine (line 116) | def get_sqlalchemy_mysql_engine():
  function get_sqlalchemy_mysql_conn (line 133) | def get_sqlalchemy_mysql_conn():
  function get_sqlalchemy_postgresql_engine (line 138) | def get_sqlalchemy_postgresql_engine():
  function get_sqlalchemy_postgresql_conn (line 148) | def get_sqlalchemy_postgresql_conn():
  function get_sqlalchemy_duckdb_engine (line 153) | def get_sqlalchemy_duckdb_engine():
  function get_sqlalchemy_duckdb_conn (line 164) | def get_sqlalchemy_duckdb_conn():
  function get_sql (line 170) | def get_sql(sql):
  function create_test_metadata (line 180) | def create_test_metadata(ds_config):
  function drop_metadata_table_if_exists (line 198) | def drop_metadata_table_if_exists(metadata, table_name):
  function create_adhoc_data (line 206) | def create_adhoc_data(column_types, size):
  function create_adhoc_datatable (line 228) | def create_adhoc_datatable(name, table_config, primary_key, column_types...
  function get_adhoc_table_config (line 247) | def get_adhoc_table_config():
  function get_dma_zip_table_config (line 260) | def get_dma_zip_table_config():
  function get_adhoc_datasource (line 273) | def get_adhoc_datasource(size=10, name="adhoc_table1", reuse=False):
  function get_testdb_url (line 306) | def get_testdb_url(dbname=DEFAULT_TEST_DB):
  function get_date_conversion_test_params (line 335) | def get_date_conversion_test_params():
  function wh_execute_args (line 404) | def wh_execute_args(d):
  function wh_execute (line 424) | def wh_execute(wh, d):

FILE: zillion/configs.py
  class PatchedField (line 22) | class PatchedField(original_mfields.Field):
    method __init__ (line 23) | def __init__(self, *args, **kwargs):
  function parse_schema_file (line 86) | def parse_schema_file(f, schema):
  function load_warehouse_config (line 111) | def load_warehouse_config(cfg):
  function load_warehouse_config_from_env (line 130) | def load_warehouse_config_from_env(var):
  function load_datasource_config (line 137) | def load_datasource_config(cfg):
  function load_datasource_config_from_env (line 156) | def load_datasource_config_from_env(var):
  function table_safe_name (line 163) | def table_safe_name(name):
  function field_safe_name (line 182) | def field_safe_name(name):
  function default_field_name (line 201) | def default_field_name(column):
  function default_field_display_name (line 217) | def default_field_display_name(name):
  function is_active (line 234) | def is_active(obj):
  function is_valid_table_type (line 243) | def is_valid_table_type(val):
  function is_valid_table_name (line 250) | def is_valid_table_name(val):
  function is_valid_if_exists (line 257) | def is_valid_if_exists(val):
  function is_valid_field_name (line 264) | def is_valid_field_name(val):
  function is_valid_field_display_name (line 278) | def is_valid_field_display_name(val):
  function is_valid_sqlalchemy_type (line 290) | def is_valid_sqlalchemy_type(val):
  function has_valid_sqlalchemy_type_values (line 300) | def has_valid_sqlalchemy_type_values(val):
  function is_valid_aggregation (line 310) | def is_valid_aggregation(val):
  function is_valid_column_field_config (line 322) | def is_valid_column_field_config(val):
  function is_valid_technical_type (line 333) | def is_valid_technical_type(val):
  function is_valid_technical_mode (line 340) | def is_valid_technical_mode(val):
  function is_valid_technical (line 347) | def is_valid_technical(val):
  function is_valid_dimension_values (line 356) | def is_valid_dimension_values(val):
  function is_valid_datasource_criteria_conversions (line 365) | def is_valid_datasource_criteria_conversions(val):
  function is_valid_connect_type (line 408) | def is_valid_connect_type(val):
  function is_valid_datasource_connect (line 419) | def is_valid_datasource_connect(val):
  function is_valid_datasource_config (line 430) | def is_valid_datasource_config(val):
  function is_valid_field_nlp_embedding_text_config (line 439) | def is_valid_field_nlp_embedding_text_config(val):
  function is_valid_divisors_config (line 448) | def is_valid_divisors_config(val):
  function get_divisor_metrics (line 457) | def get_divisor_metrics(metric):
  function get_aggregation_metrics (line 503) | def get_aggregation_metrics(metric):
  class PolyNested (line 520) | class PolyNested(mfields.Nested):
    method _deserialize (line 523) | def _deserialize(self, value, attr, data, partial=None, **kwargs):
  class BaseSchema (line 541) | class BaseSchema(Schema):
    class Meta (line 553) | class Meta:
  class TechnicalInfoSchema (line 559) | class TechnicalInfoSchema(BaseSchema):
  class TechnicalField (line 567) | class TechnicalField(mfields.Field):
    method _validate (line 570) | def _validate(self, value):
  class DataSourceCriteriaConversionsField (line 575) | class DataSourceCriteriaConversionsField(mfields.Field):
    method _validate (line 581) | def _validate(self, value):
  class DimensionValuesField (line 586) | class DimensionValuesField(mfields.Field):
    method _validate (line 589) | def _validate(self, value):
  class ColumnFieldConfigSchema (line 594) | class ColumnFieldConfigSchema(BaseSchema):
  class ColumnFieldConfigField (line 610) | class ColumnFieldConfigField(mfields.Field):
    method _validate (line 613) | def _validate(self, value):
  class ColumnInfoSchema (line 618) | class ColumnInfoSchema(BaseSchema):
  class ColumnConfigSchema (line 648) | class ColumnConfigSchema(ColumnInfoSchema):
  class TableTypeField (line 654) | class TableTypeField(mfields.Field):
    method _validate (line 657) | def _validate(self, value):
  class TableInfoSchema (line 662) | class TableInfoSchema(BaseSchema):
  class TableConfigSchema (line 715) | class TableConfigSchema(TableInfoSchema):
  class NLPEmbeddingTextField (line 757) | class NLPEmbeddingTextField(mfields.Field):
    method _validate (line 760) | def _validate(self, value):
  class FieldMetaNLPConfigSchema (line 765) | class FieldMetaNLPConfigSchema(BaseSchema):
  function check_field_meta_nlp_config (line 782) | def check_field_meta_nlp_config(data):
  class FieldConfigSchema (line 796) | class FieldConfigSchema(BaseSchema):
    method _check_meta (line 817) | def _check_meta(self, data, **kwargs):
  class FormulaFieldConfigSchema (line 822) | class FormulaFieldConfigSchema(BaseSchema):
    method _check_meta (line 845) | def _check_meta(self, data, **kwargs):
  class MetricConfigSchemaMixin (line 850) | class MetricConfigSchemaMixin:
    method _validate_weighting_aggregation (line 881) | def _validate_weighting_aggregation(self, data):
  class DivisorsConfigSchema (line 892) | class DivisorsConfigSchema(BaseSchema):
  class DivisorsConfigField (line 914) | class DivisorsConfigField(mfields.Field):
    method _validate (line 917) | def _validate(self, value):
  class MetricConfigSchema (line 922) | class MetricConfigSchema(FieldConfigSchema, MetricConfigSchemaMixin):
  class FormulaMetricConfigSchema (line 939) | class FormulaMetricConfigSchema(FormulaFieldConfigSchema, MetricConfigSc...
  class DimensionConfigSchemaMixin (line 945) | class DimensionConfigSchemaMixin:
  class DimensionConfigSchema (line 968) | class DimensionConfigSchema(FieldConfigSchema, DimensionConfigSchemaMixin):
  class FormulaDimensionConfigSchema (line 974) | class FormulaDimensionConfigSchema(
  class AdHocFieldSchema (line 982) | class AdHocFieldSchema(FormulaFieldConfigSchema):
  class AdHocMetricSchema (line 988) | class AdHocMetricSchema(AdHocFieldSchema):
  class TableNameField (line 1018) | class TableNameField(mfields.Str):
    method _validate (line 1021) | def _validate(self, value):
  class DataSourceConnectSchema (line 1026) | class DataSourceConnectSchema(BaseSchema):
  class DataSourceConnectField (line 1036) | class DataSourceConnectField(mfields.Field):
    method _validate (line 1039) | def _validate(self, value):
  function check_metric_configs (line 1044) | def check_metric_configs(data):
  class DataSourceConfigSchema (line 1069) | class DataSourceConfigSchema(BaseSchema):
    method _check_table_refs (line 1103) | def _check_table_refs(self, data, **kwargs):
    method _check_metrics (line 1118) | def _check_metrics(self, data, **kwargs):
  class DataSourceConfigField (line 1124) | class DataSourceConfigField(mfields.Field):
    method _validate (line 1128) | def _validate(self, value):
  class WarehouseMetaNLPConfigSchema (line 1133) | class WarehouseMetaNLPConfigSchema(BaseSchema):
  class WarehouseConfigSchema (line 1154) | class WarehouseConfigSchema(BaseSchema):
    method _check_meta (line 1180) | def _check_meta(self, data, **kwargs):
    method _check_includes (line 1194) | def _check_includes(self, data, **kwargs):
    method _check_ds_refs (line 1207) | def _check_ds_refs(self, data, **kwargs):
    method _check_metrics (line 1222) | def _check_metrics(self, data, **kwargs):
  class ConfigMixin (line 1228) | class ConfigMixin:
    method __init__ (line 1233) | def __init__(self, *args, **kwargs):
    method to_config (line 1238) | def to_config(self):
    method from_config (line 1243) | def from_config(cls, config):
  class ZillionInfo (line 1248) | class ZillionInfo(MappingMixin):
    method __init__ (line 1270) | def __init__(self, **kwargs):
    method schema_validate (line 1275) | def schema_validate(cls, zillion_info, unknown=RAISE):
    method schema_load (line 1288) | def schema_load(cls, zillion_info, unknown=RAISE):
    method create (line 1305) | def create(cls, zillion_info, unknown=RAISE):
  class TableInfo (line 1316) | class TableInfo(ZillionInfo, PrintMixin):
  class ColumnInfo (line 1324) | class ColumnInfo(ZillionInfo, PrintMixin):
    method __init__ (line 1331) | def __init__(self, **kwargs):
    method has_field (line 1337) | def has_field(self, field):
    method add_field (line 1345) | def add_field(self, field):
    method get_field (line 1350) | def get_field(self, name):
    method get_fields (line 1367) | def get_fields(self):
    method get_field_names (line 1371) | def get_field_names(self):
    method field_ds_formula (line 1375) | def field_ds_formula(self, name):
    method has_field_ds_formula (line 1382) | def has_field_ds_formula(self, name):
    method get_criteria_conversion (line 1390) | def get_criteria_conversion(self, field_name, operation):
    method _add_field_to_map (line 1400) | def _add_field_to_map(self, field):
  class Technical (line 1416) | class Technical(MappingMixin, PrintMixin):
    method __init__ (line 1437) | def __init__(self, type, params, mode=None):
    method _check_params (line 1446) | def _check_params(cls, params):
    method parse_technical_string_params (line 1455) | def parse_technical_string_params(cls, val):
    method get_default_mode (line 1465) | def get_default_mode(cls):
    method _apply (line 1469) | def _apply(self, df_slice, column, rounding=None):
    method apply (line 1474) | def apply(self, df, column, rounding=None):
  class PandasTechnical (line 1530) | class PandasTechnical(Technical):
    method _apply (line 1533) | def _apply(self, df_or_series_slice, column, rounding=None):
  class RankTechnical (line 1545) | class RankTechnical(PandasTechnical):
    method _apply (line 1548) | def _apply(self, df_or_series_slice, column, rounding=None):
  class DiffTechnical (line 1561) | class DiffTechnical(PandasTechnical):
    method parse_technical_string_params (line 1567) | def parse_technical_string_params(cls, val):
  class RollingTechnical (line 1579) | class RollingTechnical(Technical):
    method _apply (line 1584) | def _apply(self, df_or_series_slice, column, rounding=None):
    method parse_technical_string_params (line 1596) | def parse_technical_string_params(cls, val):
  class BollingerTechnical (line 1609) | class BollingerTechnical(RollingTechnical):
    method _apply (line 1613) | def _apply(self, df_slice, column, rounding=None):
  function _extract_technical_string_parts (line 1665) | def _extract_technical_string_parts(val):
  function parse_technical_string (line 1690) | def parse_technical_string(val):
  function create_technical (line 1721) | def create_technical(info):
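
  Example (illustrative sketch, not from the repo): loading and validating a warehouse config with the helpers above; the path is an assumption, and any filename, URL, or buffer accepted by load_warehouse_config should work.

    from zillion.configs import load_warehouse_config

    # Returns the config as a dict after validation against WarehouseConfigSchema
    wh_config = load_warehouse_config("examples/example_wh_config.json")
    print(sorted(wh_config.keys()))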

FILE: zillion/core.py
  class ZillionException (line 84) | class ZillionException(Exception):
  class InvalidTechnicalException (line 88) | class InvalidTechnicalException(ZillionException):
  class WarehouseException (line 92) | class WarehouseException(ZillionException):
  class InvalidWarehouseIdException (line 96) | class InvalidWarehouseIdException(ZillionException):
  class ReportException (line 100) | class ReportException(ZillionException):
  class InvalidReportIdException (line 104) | class InvalidReportIdException(ZillionException):
  class UnsupportedGrainException (line 108) | class UnsupportedGrainException(ZillionException):
  class UnsupportedKillException (line 112) | class UnsupportedKillException(ZillionException):
  class FailedKillException (line 116) | class FailedKillException(ZillionException):
  class DataSourceQueryTimeoutException (line 120) | class DataSourceQueryTimeoutException(ZillionException):
  class ExecutionKilledException (line 124) | class ExecutionKilledException(ZillionException):
  class ExecutionLockException (line 128) | class ExecutionLockException(ZillionException):
  class InvalidFieldException (line 132) | class InvalidFieldException(ZillionException):
  class InvalidDimensionValueException (line 136) | class InvalidDimensionValueException(ZillionException):
  class DisallowedSQLException (line 140) | class DisallowedSQLException(ZillionException):
  class MaxFormulaDepthException (line 144) | class MaxFormulaDepthException(ZillionException):
  class FieldTypes (line 148) | class FieldTypes(metaclass=ClassValueContainsMeta):
  class TableTypes (line 155) | class TableTypes(metaclass=ClassValueContainsMeta):
  class AggregationTypes (line 162) | class AggregationTypes(metaclass=ClassValueContainsMeta):
  class TechnicalTypes (line 174) | class TechnicalTypes(metaclass=ClassValueContainsMeta):
  class TechnicalModes (line 194) | class TechnicalModes(metaclass=ClassValueContainsMeta):
  class RollupTypes (line 209) | class RollupTypes(metaclass=ClassValueContainsMeta):
  class OrderByTypes (line 216) | class OrderByTypes(metaclass=ClassValueContainsMeta):
  class DataSourceQueryModes (line 223) | class DataSourceQueryModes(metaclass=ClassValueContainsMeta):
  class ExecutionState (line 230) | class ExecutionState:
  class IfExistsModes (line 238) | class IfExistsModes(metaclass=ClassValueContainsMeta):
  class IfFileExistsModes (line 251) | class IfFileExistsModes(IfExistsModes):
  function powerset (line 264) | def powerset(iterable, max_combo_len=None):
  function raiseif (line 272) | def raiseif(cond, msg="", exc=ZillionException):
  function raiseifnot (line 278) | def raiseifnot(cond, msg="", exc=ZillionException):
  function igetattr (line 284) | def igetattr(obj, attr, *args):
  function read_filepath_or_buffer (line 294) | def read_filepath_or_buffer(f, open_flags="r", compression=None):
  function download_file (line 310) | def download_file(url, outfile=None):
  function get_modified_time (line 323) | def get_modified_time(fname):
  function get_time_since_modified (line 328) | def get_time_since_modified(fname):
  function load_yaml (line 333) | def load_yaml(fname):
  function load_json_or_yaml_from_str (line 340) | def load_json_or_yaml_from_str(string, f=None, schema=None):
  function dictmerge (line 380) | def dictmerge(x, y, path=None, overwrite=False, extend=False):
  function load_zillion_config (line 410) | def load_zillion_config():
  function get_zillion_config_log_level (line 477) | def get_zillion_config_log_level():
  function set_log_level_from_config (line 484) | def set_log_level_from_config(cfg):
  function set_log_level (line 501) | def set_log_level(level):
  function dbg (line 507) | def dbg(msg, **kwargs):
  function dbgsql (line 514) | def dbgsql(msg, **kwargs):
  function info (line 521) | def info(msg, **kwargs):
  function warn (line 528) | def warn(msg, **kwargs):
  function error (line 535) | def error(msg, **kwargs):
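
  Example (illustrative sketch, not from the repo): one possible use of the raiseif/raiseifnot assertion helpers and logging wrappers listed above.

    from zillion.core import raiseifnot, ReportException, info

    def check_metrics(metrics):
        # Raise ReportException when no metrics were requested
        raiseifnot(metrics, msg="No metrics specified", exc=ReportException)
        info("Running report with %d metrics" % len(metrics))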

FILE: zillion/datasource.py
  function entity_name_from_file (line 57) | def entity_name_from_file(filename):
  function get_ds_config_context (line 61) | def get_ds_config_context(name):
  function populate_url_context (line 66) | def populate_url_context(url, ds_name):
  function get_engine_extra_kwargs (line 74) | def get_engine_extra_kwargs(url):
  function connect_url_to_metadata (line 89) | def connect_url_to_metadata(url, ds_name=None):
  function parse_replace_after (line 105) | def parse_replace_after(replace_after):
  function data_url_to_metadata (line 133) | def data_url_to_metadata(
  function metadata_from_connect (line 170) | def metadata_from_connect(connect, ds_name):
  function reflect_metadata (line 193) | def reflect_metadata(metadata, reflect_only=None):
  function get_adhoc_datasource_filename (line 220) | def get_adhoc_datasource_filename(ds_name):
  function get_adhoc_datasource_url (line 226) | def get_adhoc_datasource_url(ds_name):
  function url_connect (line 231) | def url_connect(
  class TableSet (line 281) | class TableSet(PrintMixin):
    method __init__ (line 300) | def __init__(self, datasource, ds_table, join, grain, target_fields):
    method get_covered_metrics (line 303) | def get_covered_metrics(self, wh):
    method get_covered_fields (line 322) | def get_covered_fields(self):
    method __len__ (line 326) | def __len__(self):
  class JoinPart (line 332) | class JoinPart(PrintMixin):
    method __init__ (line 338) | def __init__(self, datasource, table_names, join_fields):
  class Join (line 342) | class Join(PrintMixin):
    method __init__ (line 358) | def __init__(self, join_parts, field_map):
    method __key (line 365) | def __key(self):
    method __hash__ (line 368) | def __hash__(self):
    method __eq__ (line 371) | def __eq__(self, other):
    method __len__ (line 374) | def __len__(self):
    method add_join_part_tables (line 377) | def add_join_part_tables(self, join_part):
    method get_covered_fields (line 392) | def get_covered_fields(self):
    method add_field (line 401) | def add_field(self, field):
    method add_fields (line 415) | def add_fields(self, fields):
    method join_parts_for_table (line 421) | def join_parts_for_table(self, table_name):
    method join_fields_for_table (line 425) | def join_fields_for_table(self, table_name):
    method combine (line 434) | def combine(cls, join1, join2):
  function join_from_path (line 455) | def join_from_path(ds, path, field_map=None):
  class NeighborTable (line 485) | class NeighborTable(PrintMixin):
    method __init__ (line 491) | def __init__(self, table, join_fields):
  class DataSource (line 495) | class DataSource(FieldManagerMixin, PrintMixin):
    method __init__ (line 513) | def __init__(self, name, metadata=None, config=None, nlp=False):
    method metric_tables (line 551) | def metric_tables(self):
    method dimension_tables (line 560) | def dimension_tables(self):
    method has_table (line 568) | def has_table(self, table, check_active=True):
    method get_table (line 592) | def get_table(self, fullname, check_active=True):
    method get_tables_with_field (line 611) | def get_tables_with_field(self, field_name, table_type=None):
    method get_metric_tables_with_metric (line 634) | def get_metric_tables_with_metric(self, metric_name):
    method get_dim_tables_with_dim (line 638) | def get_dim_tables_with_dim(self, dim_name):
    method get_columns_with_field (line 642) | def get_columns_with_field(self, field_name):
    method apply_config (line 656) | def apply_config(self, config, reflect=False, nlp=False):
    method find_neighbor_tables (line 713) | def find_neighbor_tables(self, table):
    method find_descendent_tables (line 786) | def find_descendent_tables(self, table):
    method get_possible_joins (line 790) | def get_possible_joins(self, table, grain):
    method find_possible_table_sets (line 834) | def find_possible_table_sets(
    method get_dialect_name (line 879) | def get_dialect_name(self):
    method get_params (line 883) | def get_params(self):
    method print_info (line 889) | def print_info(self):
    method _load_adhoc_tables (line 916) | def _load_adhoc_tables(self, config):
    method _apply_table_configs (line 946) | def _apply_table_configs(self, table_configs):
    method _ensure_metadata_info (line 983) | def _ensure_metadata_info(self):
    method _add_conversion_fields (line 1040) | def _add_conversion_fields(self):
    method _add_metric_column (line 1116) | def _add_metric_column(self, column, field, aggregation=None, rounding...
    method _add_dimension_column (line 1133) | def _add_dimension_column(self, column, field):
    method _add_metric_table_fields (line 1143) | def _add_metric_table_fields(self, table, nlp=False):
    method _add_dimension_table_fields (line 1180) | def _add_dimension_table_fields(self, table):
    method _populate_fields (line 1203) | def _populate_fields(self, config, nlp=False):
    method _build_graph (line 1217) | def _build_graph(self):
    method _invert_field_joins (line 1237) | def _invert_field_joins(self, field_joins):
    method _populate_max_join_field_coverage (line 1247) | def _populate_max_join_field_coverage(self, join_fields, grain):
    method _eliminate_redundant_joins (line 1257) | def _eliminate_redundant_joins(self, sorted_join_fields, main_table):
    method _find_join_combinations (line 1315) | def _find_join_combinations(self, sorted_join_fields, grain):
    method _combine_orthogonal_joins (line 1395) | def _combine_orthogonal_joins(self, candidates):
    method _choose_best_join_combination (line 1426) | def _choose_best_join_combination(self, candidates):
    method _consolidate_field_joins (line 1440) | def _consolidate_field_joins(self, table, grain, field_joins):
    method _find_joins_to_dimension (line 1466) | def _find_joins_to_dimension(self, table, dimension):
    method from_db_file (line 1512) | def from_db_file(
    method from_datatables (line 1565) | def from_datatables(cls, name, datatables, config=None, nlp=False):
    method from_data_file (line 1609) | def from_data_file(
    method _check_or_create_name (line 1656) | def _check_or_create_name(cls, name):
  class AdHocDataTable (line 1672) | class AdHocDataTable(PrintMixin):
    method __init__ (line 1705) | def __init__(
    method fullname (line 1761) | def fullname(self):
    method get_dataframe (line 1767) | def get_dataframe(self):
    method table_exists (line 1779) | def table_exists(self, engine):
    method to_sql (line 1783) | def to_sql(self, engine, method="multi", chunksize=int(1e3)):
  class SQLiteDataTable (line 1896) | class SQLiteDataTable(AdHocDataTable):
    method get_dataframe (line 1905) | def get_dataframe(self):
    method to_sql (line 1908) | def to_sql(self, engine, **kwargs):
  class CSVDataTable (line 1912) | class CSVDataTable(AdHocDataTable):
    method get_dataframe (line 1915) | def get_dataframe(self):
  class ExcelDataTable (line 1925) | class ExcelDataTable(AdHocDataTable):
    method get_dataframe (line 1928) | def get_dataframe(self):
  class JSONDataTable (line 1940) | class JSONDataTable(AdHocDataTable):
    method get_dataframe (line 1943) | def get_dataframe(self, orient="table"):
  class HTMLDataTable (line 1951) | class HTMLDataTable(AdHocDataTable):
    method get_dataframe (line 1956) | def get_dataframe(self):
  class GoogleSheetsDataTable (line 1967) | class GoogleSheetsDataTable(AdHocDataTable):
    method get_dataframe (line 1970) | def get_dataframe(self):
  function datatable_from_config (line 1992) | def datatable_from_config(name, config, schema=None, **kwargs):
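
  Example (illustrative sketch, not from the repo): building a DataSource directly from a config dict. The connect URL, table name, and column/field names are assumptions (an existing SQLite database with a main.sales table); the accepted keys are defined by DataSourceConfigSchema and TableConfigSchema in zillion/configs.py.

    from zillion.datasource import DataSource

    ds_config = {
        "connect": "sqlite:////tmp/example.db",  # assumed local SQLite database
        "tables": {
            "main.sales": {
                "type": "metric",
                "create_fields": True,
                "primary_key": ["sale_id"],
                "columns": {
                    "sale_id": {"fields": ["sale_id"]},
                    "revenue": {"fields": ["revenue"]},
                },
            }
        },
    }

    ds = DataSource("example_sqlite_ds", config=ds_config)
    ds.print_info()  # prints the tables and fields that were populated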

FILE: zillion/dialects/conversions.py
  class DialectDateConversions (line 4) | class DialectDateConversions:
    method f (line 6) | def f(cls, func, i=0):
    method raw_value (line 12) | def raw_value(cls, v):
    method date_year_start (line 16) | def date_year_start(cls, v):
    method date_year_plus_year (line 20) | def date_year_plus_year(cls, v):
    method datetime_year_end (line 24) | def datetime_year_end(cls, x):
    method date_month_start (line 28) | def date_month_start(cls, x):
    method date_month_plus_month (line 32) | def date_month_plus_month(cls, x):
    method datetime_month_end (line 36) | def datetime_month_end(cls, x):
    method date_plus_day (line 40) | def date_plus_day(cls, x):
    method datetime_day_end (line 44) | def datetime_day_end(cls, x):
    method datetime_hour_plus_hour (line 48) | def datetime_hour_plus_hour(cls, x):
    method datetime_hour_end (line 52) | def datetime_hour_end(cls, x):
    method datetime_minute_plus_minute (line 56) | def datetime_minute_plus_minute(cls, x):
    method datetime_minute_end (line 60) | def datetime_minute_end(cls, x):
    method get_year_criteria_conversions (line 64) | def get_year_criteria_conversions(cls):
    method get_month_criteria_conversions (line 92) | def get_month_criteria_conversions(cls):
    method get_date_criteria_conversions (line 121) | def get_date_criteria_conversions(cls):
    method get_hour_criteria_conversions (line 136) | def get_hour_criteria_conversions(cls):
    method get_minute_criteria_conversions (line 154) | def get_minute_criteria_conversions(cls):
    method get_datetime_criteria_conversions (line 175) | def get_datetime_criteria_conversions(cls):

FILE: zillion/dialects/duckdb.py
  class DuckDBDialectDateConversions (line 14) | class DuckDBDialectDateConversions(DialectDateConversions):
    method date_year_start (line 16) | def date_year_start(cls, x):
    method date_year_plus_year (line 20) | def date_year_plus_year(cls, x):
    method datetime_year_end (line 26) | def datetime_year_end(cls, x):
    method date_month_start (line 34) | def date_month_start(cls, x):
    method date_month_plus_month (line 40) | def date_month_plus_month(cls, x):
    method datetime_month_end (line 50) | def datetime_month_end(cls, x):
    method date_plus_day (line 58) | def date_plus_day(cls, x):
    method datetime_day_end (line 62) | def datetime_day_end(cls, x):
    method datetime_hour_plus_hour (line 70) | def datetime_hour_plus_hour(cls, x):
    method datetime_hour_end (line 74) | def datetime_hour_end(cls, x):
    method datetime_minute_plus_minute (line 82) | def datetime_minute_plus_minute(cls, x):
    method datetime_minute_end (line 86) | def datetime_minute_end(cls, x):

FILE: zillion/dialects/mysql.py
  class MySQLDialectDateConversions (line 6) | class MySQLDialectDateConversions(DialectDateConversions):
    method date_year_start (line 8) | def date_year_start(cls, x):
    method date_year_plus_year (line 12) | def date_year_plus_year(cls, x):
    method datetime_year_end (line 16) | def datetime_year_end(cls, x):
    method date_month_start (line 23) | def date_month_start(cls, x):
    method date_month_plus_month (line 27) | def date_month_plus_month(cls, x):
    method datetime_month_end (line 31) | def datetime_month_end(cls, x):
    method date_plus_day (line 38) | def date_plus_day(cls, x):
    method datetime_day_end (line 42) | def datetime_day_end(cls, x):
    method datetime_hour_plus_hour (line 48) | def datetime_hour_plus_hour(cls, x):
    method datetime_hour_end (line 52) | def datetime_hour_end(cls, x):
    method datetime_minute_plus_minute (line 58) | def datetime_minute_plus_minute(cls, x):
    method datetime_minute_end (line 62) | def datetime_minute_end(cls, x):

FILE: zillion/dialects/postgresql.py
  function get_interval (line 8) | def get_interval(n, t):
  class PostgreSQLDialectDateConversions (line 12) | class PostgreSQLDialectDateConversions(DialectDateConversions):
    method date_year_start (line 14) | def date_year_start(cls, x):
    method date_year_plus_year (line 18) | def date_year_plus_year(cls, x):
    method datetime_year_end (line 22) | def datetime_year_end(cls, x):
    method date_month_start (line 30) | def date_month_start(cls, x):
    method date_month_plus_month (line 34) | def date_month_plus_month(cls, x):
    method datetime_month_end (line 38) | def datetime_month_end(cls, x):
    method date_plus_day (line 46) | def date_plus_day(cls, x):
    method datetime_day_end (line 50) | def datetime_day_end(cls, x):
    method datetime_hour_plus_hour (line 58) | def datetime_hour_plus_hour(cls, x):
    method datetime_hour_end (line 62) | def datetime_hour_end(cls, x):
    method datetime_minute_plus_minute (line 70) | def datetime_minute_plus_minute(cls, x):
    method datetime_minute_end (line 76) | def datetime_minute_end(cls, x):

FILE: zillion/dialects/sqlite.py
  class SQLiteDialectDateConversions (line 7) | class SQLiteDialectDateConversions(DialectDateConversions):
    method date_year_start (line 9) | def date_year_start(cls, x):
    method date_year_plus_year (line 13) | def date_year_plus_year(cls, x):
    method datetime_year_end (line 17) | def datetime_year_end(cls, x):
    method date_month_start (line 21) | def date_month_start(cls, x):
    method date_month_plus_month (line 25) | def date_month_plus_month(cls, x):
    method datetime_month_end (line 29) | def datetime_month_end(cls, x):
    method date_plus_day (line 33) | def date_plus_day(cls, x):
    method datetime_day_end (line 37) | def datetime_day_end(cls, x):
    method datetime_hour_plus_hour (line 41) | def datetime_hour_plus_hour(cls, x):
    method datetime_hour_end (line 45) | def datetime_hour_end(cls, x):
    method datetime_minute_plus_minute (line 49) | def datetime_minute_plus_minute(cls, x):
    method datetime_minute_end (line 53) | def datetime_minute_end(cls, x):
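
  Example (illustrative sketch, not from the repo): the pattern the dialect modules above follow: subclass DialectDateConversions and override the date helpers to emit that engine's SQL. The class name and SQL fragment below are made up for illustration.

    from zillion.dialects.conversions import DialectDateConversions

    class MyDialectDateConversions(DialectDateConversions):
        @classmethod
        def date_year_start(cls, x):
            # Hypothetical fragment; the real dialects return engine-specific date math
            return f"DATE_TRUNC('year', {x})"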

FILE: zillion/field.py
  class Field (line 37) | class Field(ConfigMixin, PrintMixin):
    method __init__ (line 71) | def __init__(
    method copy (line 83) | def copy(self):
    method get_all_raw_fields (line 87) | def get_all_raw_fields(self, warehouse, adhoc_fms=None):
    method get_formula_fields (line 106) | def get_formula_fields(self, warehouse, depth=0, adhoc_fms=None):
    method get_ds_expression (line 127) | def get_ds_expression(self, column, label=True, ignore_formula=False):
    method get_final_select_clause (line 157) | def get_final_select_clause(self, *args, **kwargs):
    method __key (line 163) | def __key(self):
    method __hash__ (line 166) | def __hash__(self):
    method __eq__ (line 169) | def __eq__(self, other):
  class Metric (line 173) | class Metric(Field):
    method __init__ (line 204) | def __init__(
    method get_all_raw_fields (line 244) | def get_all_raw_fields(self, warehouse, adhoc_fms=None):
    method get_ds_expression (line 266) | def get_ds_expression(self, column, label=True):
    method get_final_select_clause (line 326) | def get_final_select_clause(self, *args, ifnull_clause=None, **kwargs):
  class Dimension (line 333) | class Dimension(Field):
    method __init__ (line 358) | def __init__(
    method get_values (line 384) | def get_values(self, warehouse_id, refresh=False):
    method is_valid_value (line 410) | def is_valid_value(self, warehouse_id, value, ignore_none=True):
    method sort (line 440) | def sort(self, warehouse_id, values):
  class FormulaField (line 458) | class FormulaField(Field):
    method __init__ (line 472) | def __init__(self, name, formula, **kwargs):
    method get_formula_fields (line 475) | def get_formula_fields(self, warehouse, depth=0, adhoc_fms=None):
    method get_ds_expression (line 533) | def get_ds_expression(self, *args, **kwargs):
    method get_final_select_clause (line 537) | def get_final_select_clause(self, warehouse, adhoc_fms=None, **kwargs):
    method _check_formula_fields (line 562) | def _check_formula_fields(self, warehouse, adhoc_fms=None):
  class FormulaDimension (line 575) | class FormulaDimension(FormulaField):
    method _check_formula_fields (line 580) | def _check_formula_fields(self, warehouse, adhoc_fms=None):
  class FormulaMetric (line 594) | class FormulaMetric(FormulaField):
    method __init__ (line 623) | def __init__(
    method get_all_raw_fields (line 654) | def get_all_raw_fields(self, warehouse, adhoc_fms=None):
  class AdHocField (line 677) | class AdHocField(FormulaField):
    method create (line 681) | def create(cls, obj):
  class AdHocMetric (line 693) | class AdHocMetric(FormulaMetric):
    method __init__ (line 719) | def __init__(
    method create (line 747) | def create(cls, obj):
  class AdHocDimension (line 765) | class AdHocDimension(FormulaDimension):
    method create (line 771) | def create(cls, obj):
  function create_metric (line 783) | def create_metric(metric_def):
  function create_dimension (line 799) | def create_dimension(dim_def):
  class FieldManagerMixin (line 815) | class FieldManagerMixin:
    method get_child_field_managers (line 831) | def get_child_field_managers(self):
    method get_field_managers (line 835) | def get_field_managers(self, adhoc_fms=None):
    method get_direct_metrics (line 839) | def get_direct_metrics(self):
    method get_direct_dimensions (line 843) | def get_direct_dimensions(self):
    method directly_has_metric (line 847) | def directly_has_metric(self, name):
    method directly_has_dimension (line 851) | def directly_has_dimension(self, name):
    method directly_has_field (line 855) | def directly_has_field(self, name):
    method print_metrics (line 861) | def print_metrics(self, indent=None):
    method print_dimensions (line 865) | def print_dimensions(self, indent=None):
    method has_metric (line 871) | def has_metric(self, name, adhoc_fms=None):
    method has_dimension (line 880) | def has_dimension(self, name, adhoc_fms=None):
    method has_field (line 889) | def has_field(self, name, adhoc_fms=None):
    method get_metric (line 898) | def get_metric(self, obj, adhoc_fms=None):
    method get_dimension (line 920) | def get_dimension(self, obj, adhoc_fms=None):
    method get_field (line 942) | def get_field(self, obj, adhoc_fms=None):
    method get_field_instances (line 962) | def get_field_instances(self, field, adhoc_fms=None):
    method get_metrics (line 978) | def get_metrics(self, adhoc_fms=None):
    method get_dimensions (line 987) | def get_dimensions(self, adhoc_fms=None):
    method get_fields (line 996) | def get_fields(self, adhoc_fms=None):
    method get_direct_fields (line 1006) | def get_direct_fields(self):
    method get_direct_metric_configs (line 1013) | def get_direct_metric_configs(self):
    method get_direct_dimension_configs (line 1018) | def get_direct_dimension_configs(self):
    method get_metric_configs (line 1023) | def get_metric_configs(self, adhoc_fms=None):
    method get_dimension_configs (line 1032) | def get_dimension_configs(self, adhoc_fms=None):
    method get_metric_names (line 1041) | def get_metric_names(self, adhoc_fms=None):
    method get_dimension_names (line 1045) | def get_dimension_names(self, adhoc_fms=None):
    method get_field_names (line 1049) | def get_field_names(self, adhoc_fms=None):
    method add_metric (line 1053) | def add_metric(self, metric, force=False):
    method add_dimension (line 1064) | def add_dimension(self, dimension, force=False):
    method _add_default_display_names (line 1076) | def _add_default_display_names(self, adhoc_fms=None, display_names=None):
    method _populate_global_fields (line 1093) | def _populate_global_fields(self, config, force=False):
    method _find_field_sources (line 1146) | def _find_field_sources(self, field, adhoc_fms=None):
  function get_table_metrics (line 1170) | def get_table_metrics(fm, table, adhoc_fms=None):
  function get_table_dimensions (line 1196) | def get_table_dimensions(fm, table, adhoc_fms=None):
  function get_table_fields (line 1222) | def get_table_fields(table):
  function get_table_field_column (line 1244) | def get_table_field_column(table, field_name):
  function table_field_allows_grain (line 1268) | def table_field_allows_grain(table, field, grain):
  function values_from_db (line 1288) | def values_from_db(warehouse_id, field):
  function sort_by_value_order (line 1325) | def sort_by_value_order(warehouse_id, field, values):
  function get_conversions_for_type (line 1350) | def get_conversions_for_type(coltype):
  function replace_non_named_formula_args (line 1369) | def replace_non_named_formula_args(formula, column):
  function get_dialect_type_conversions (line 1381) | def get_dialect_type_conversions(dialect, column):
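
  Example (illustrative sketch, not from the repo): creating field objects from config dicts with the factory helpers above. The field names and type strings are assumptions; valid keys come from MetricConfigSchema / DimensionConfigSchema in zillion/configs.py.

    from zillion.field import create_metric, create_dimension

    revenue = create_metric(
        {"name": "revenue", "type": "numeric(10,2)", "aggregation": "sum", "rounding": 2}
    )
    partner = create_dimension({"name": "partner_name", "type": "string(64)"})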

FILE: zillion/nlp.py
  function hash_text (line 46) | def hash_text(text):
  class EmbeddingsCache (line 57) | class EmbeddingsCache:
    method __init__ (line 72) | def __init__(
    method get_text_hash (line 87) | def get_text_hash(cls, text):
    method cache (line 92) | def cache(self):
    method cache (line 98) | def cache(self, value):
    method decode (line 101) | def decode(self, blob):
    method encode (line 105) | def encode(self, values):
    method init_cache (line 109) | def init_cache(self):
    method _get_key (line 121) | def _get_key(self, text):
    method __getitem__ (line 126) | def __getitem__(self, key):
    method __setitem__ (line 157) | def __setitem__(self, key, value):
    method __delitem__ (line 192) | def __delitem__(self, key):
  class OpenAIEmbeddingsCached (line 207) | class OpenAIEmbeddingsCached(OpenAIEmbeddings):
    class Config (line 211) | class Config:
    method __init__ (line 216) | def __init__(self, *args, **kwargs):
    method embed_query (line 225) | def embed_query(self, query):
    method embed_documents (line 234) | def embed_documents(self, documents):
  class QdrantCustom (line 244) | class QdrantCustom(Qdrant):
    method get_id (line 248) | def get_id(cls, text):
    method add_texts (line 252) | def add_texts(self, texts, metadatas=None, bulk_embedder=None):
    method similarity_search_with_score (line 291) | def similarity_search_with_score(self, query, k=4, **kwargs):
  class EmbeddingsAPI (line 323) | class EmbeddingsAPI:
    method __init__ (line 326) | def __init__(self):
    method connect (line 333) | def connect(self):
    method ensure_client (line 349) | def ensure_client(self):
    method embed_documents (line 354) | def embed_documents(self, rows):
    method embed_query (line 358) | def embed_query(self, query):
    method recreate_collection (line 362) | def recreate_collection(
    method create_collection_if_necessary (line 399) | def create_collection_if_necessary(
    method add_texts (line 442) | def add_texts(
    method similarity_search_with_score (line 463) | def similarity_search_with_score(self, collection_name, query, **kwargs):
    method get_collection (line 472) | def get_collection(self, name):
    method delete_collection (line 477) | def delete_collection(self, name):
    method get_embeddings (line 482) | def get_embeddings(
    method delete_embeddings (line 517) | def delete_embeddings(self, collection_name, texts):
    method upsert_embedding (line 533) | def upsert_embedding(self, collection_name, text, payload):
  function field_name_to_embedding_text (line 558) | def field_name_to_embedding_text(name):
  function get_warehouse_collection_name (line 563) | def get_warehouse_collection_name(warehouse):
  function warehouse_field_nlp_enabled (line 578) | def warehouse_field_nlp_enabled(warehouse, field_def):
  function init_warehouse_embeddings (line 600) | def init_warehouse_embeddings(warehouse, force_recreate=False):
  function get_openai_class (line 654) | def get_openai_class(model=None):
  function get_openai_model_context_size (line 660) | def get_openai_model_context_size(model):
  function build_llm (line 678) | def build_llm(model=None, max_tokens=None, request_timeout=LLM_REQUEST_T...
  function build_chain (line 708) | def build_chain(
  function parse_text_to_report_json_output (line 870) | def parse_text_to_report_json_output(output):
  function get_field_name_variants (line 943) | def get_field_name_variants(name):
  function get_field_fuzzy (line 972) | def get_field_fuzzy(warehouse, name, field_type=None):
  function map_warehouse_report_params (line 1023) | def map_warehouse_report_params(warehouse, report):
  function get_fields_prompt_str (line 1074) | def get_fields_prompt_str(warehouse, fields):
  function get_metrics_prompt_str (line 1088) | def get_metrics_prompt_str(warehouse):
  function get_dimensions_prompt_str (line 1092) | def get_dimensions_prompt_str(warehouse):
  class MaxTokensException (line 1096) | class MaxTokensException(Exception):
  function text_to_report_params (line 1100) | def text_to_report_params(query, warehouse=None, prompt_version="no_fiel...
  function parse_nlp_table_relationships (line 1151) | def parse_nlp_table_relationships(output):
  function get_nlp_table_relationships (line 1178) | def get_nlp_table_relationships(metadata, table_names):
  function parse_nlp_table_info (line 1244) | def parse_nlp_table_info(output):
  function get_nlp_table_info (line 1270) | def get_nlp_table_info(table):
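
  Example (illustrative sketch, not from the repo): turning a natural-language query into report params with text_to_report_params. This assumes the NLP/LLM settings (OpenAI key, vector store) are configured in the zillion config and that `wh` is an existing Warehouse with embeddings initialized.

    from zillion.nlp import text_to_report_params

    params = text_to_report_params("revenue by partner name this year", warehouse=wh)
    # Warehouse.execute_text (see zillion/warehouse.py) wraps this flow end to end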

FILE: zillion/report.py
  class ExecutionStateMixin (line 50) | class ExecutionStateMixin:
    method __init__ (line 53) | def __init__(self):
    method _ready (line 58) | def _ready(self):
    method _querying (line 63) | def _querying(self):
    method _killed (line 68) | def _killed(self):
    method _get_lock (line 73) | def _get_lock(self, timeout=None):
    method _raise_if_killed (line 93) | def _raise_if_killed(self, timeout=None):
    method _get_state (line 106) | def _get_state(self):
    method _set_state (line 110) | def _set_state(
  class DataSourceQuery (line 162) | class DataSourceQuery(ExecutionStateMixin, PrintMixin):
    method __init__ (line 182) | def __init__(self, warehouse, metrics, dimensions, criteria, table_set):
    method get_datasource (line 191) | def get_datasource(self):
    method get_datasource_name (line 195) | def get_datasource_name(self):
    method get_tables (line 199) | def get_tables(self):
    method get_dialect_name (line 206) | def get_dialect_name(self):
    method covers_metric (line 210) | def covers_metric(self, metric):
    method covers_field (line 226) | def covers_field(self, field):
    method add_metric (line 242) | def add_metric(self, metric, adhoc_fms=None):
    method get_conn (line 260) | def get_conn(self):
    method execute (line 266) | def execute(self, timeout=None, label=None):
    method kill (line 338) | def kill(self, main_thread=None):
    method _format_query (line 387) | def _format_query(self):
    method _get_bind (line 391) | def _get_bind(self):
    method _add_prefix_with (line 400) | def _add_prefix_with(self, select):
    method _build_select (line 416) | def _build_select(self):
    method _get_field (line 435) | def _get_field(self, name):
    method _column_for_field (line 459) | def _column_for_field(self, field, table=None):
    method _get_field_expression (line 493) | def _get_field_expression(self, field, label=True):
    method _get_join (line 511) | def _get_join(self):
    method _convert_criteria (line 563) | def _convert_criteria(self, field, conversion, value):
    method _add_where (line 594) | def _add_where(self, select):
    method _add_group_by (line 632) | def _add_group_by(self, select):
    method _add_order_by (line 640) | def _add_order_by(self, select, asc=True):
  class DataSourceQuerySummary (line 650) | class DataSourceQuerySummary(PrintMixin):
    method __init__ (line 663) | def __init__(self, query, data, duration):
    method format (line 671) | def format(self):
    method _format_query (line 683) | def _format_query(self):
  class DataSourceQueryResult (line 688) | class DataSourceQueryResult(PrintMixin):
    method __init__ (line 701) | def __init__(self, query, data, duration):
  class BaseCombinedResult (line 707) | class BaseCombinedResult:
    method __init__ (line 722) | def __init__(
    method get_conn (line 740) | def get_conn(self):
    method get_cursor (line 744) | def get_cursor(self, conn):
    method create_table (line 748) | def create_table(self):
    method load_table (line 752) | def load_table(self):
    method clean_up (line 756) | def clean_up(self):
    method add_warning (line 760) | def add_warning(self, msg, log=True):
    method ifnull_clause (line 765) | def ifnull_clause(self, column_clause, ifnull_value):
    method get_metric_clause (line 769) | def get_metric_clause(self, metric, has_formula_dims):
    method get_final_result (line 773) | def get_final_result(
    method _get_row_hash (line 787) | def _get_row_hash(self, row):
    method _get_fields (line 812) | def _get_fields(self):
    method _get_field_names (line 830) | def _get_field_names(self):
  class SQLiteMemoryCombinedResult (line 836) | class SQLiteMemoryCombinedResult(BaseCombinedResult):
    method get_conn (line 839) | def get_conn(self):
    method get_cursor (line 843) | def get_cursor(self, conn):
    method create_table (line 848) | def create_table(self):
    method load_table (line 879) | def load_table(self):
    method ifnull_clause (line 887) | def ifnull_clause(self, column_clause, ifnull_value):
    method get_metric_clause (line 891) | def get_metric_clause(self, metric, has_formula_dims):
    method get_final_result (line 933) | def get_final_result(
    method clean_up (line 1071) | def clean_up(self):
    method _sort (line 1078) | def _sort(self, series, rollup):
    method _wavg (line 1092) | def _wavg(self, d, w, raise_on_zero_div_error=False):
    method _select_all (line 1103) | def _select_all(self):
    method _get_final_select_sql (line 1108) | def _get_final_select_sql(self, columns, dimension_aliases, formula_di...
    method _get_bulk_insert_sql (line 1143) | def _get_bulk_insert_sql(self, rows):
    method _apply_row_filters (line 1193) | def _apply_row_filters(self, df, row_filters, metrics, dimensions):
    method _get_multi_rollup_df (line 1262) | def _get_multi_rollup_df(self, df, rollup, dimensions, aggrs, wavgs):
    method _apply_rollup (line 1327) | def _apply_rollup(self, df, rollup, metrics, dimensions):
    method _apply_technicals (line 1398) | def _apply_technicals(self, df, technicals, rounding):
    method _apply_limits (line 1420) | def _apply_limits(self, df, row_filters, limit, metrics, dimensions):
  class Report (line 1448) | class Report(ExecutionStateMixin):
    method __init__ (line 1520) | def __init__(
    method get_params (line 1639) | def get_params(self):
    method get_json (line 1660) | def get_json(self):
    method save (line 1664) | def save(self, meta=None):
    method execute (line 1697) | def execute(self):
    method kill (line 1759) | def kill(self, soft=False, raise_if_failed=False):
    method get_grain (line 1805) | def get_grain(self):
    method get_dimension_grain (line 1818) | def get_dimension_grain(self):
    method _get_fields_dict (line 1824) | def _get_fields_dict(self, names, field_type, adhoc_datasources=None):
    method _populate_criteria_fields (line 1849) | def _populate_criteria_fields(self, criteria, adhoc_datasources=None):
    method _process_subreports (line 1868) | def _process_subreports(self, criteria, adhoc_datasources=None):
    method _check_order_by (line 1914) | def _check_order_by(self, order_by):
    method _add_ds_fields (line 1934) | def _add_ds_fields(self, field):
    method _get_query_label (line 1972) | def _get_query_label(self, query_label):
    method _execute_ds_queries_sequential (line 1976) | def _execute_ds_queries_sequential(self, queries):
    method _execute_ds_queries_multithread (line 1997) | def _execute_ds_queries_multithread(self, queries):
    method _execute_ds_queries (line 2031) | def _execute_ds_queries(self, queries):
    method _check_required_grain (line 2054) | def _check_required_grain(self):
    method _build_ds_queries (line 2079) | def _build_ds_queries(self, allow_partial=False, disabled_tables=None):
    method _create_combined_result (line 2149) | def _create_combined_result(self, ds_query_results):
    method from_params (line 2169) | def from_params(cls, warehouse, params, adhoc_datasources=None, report...
    method from_text (line 2197) | def from_text(
    method load (line 2243) | def load(cls, warehouse, spec_id, adhoc_datasources=None, report_depth...
    method load_warehouse_id_for_report (line 2272) | def load_warehouse_id_for_report(cls, spec_id):
    method delete (line 2296) | def delete(cls, warehouse, spec_id):
    method _load_report_spec (line 2316) | def _load_report_spec(cls, warehouse, spec_id):
    method _load_params (line 2347) | def _load_params(cls, warehouse, spec_id):
  class ReportResult (line 2368) | class ReportResult(PrintMixin):
    method __init__ (line 2390) | def __init__(
    method rollup_mask (line 2408) | def rollup_mask(self):
    method rollup_rows (line 2420) | def rollup_rows(self):
    method non_rollup_rows (line 2425) | def non_rollup_rows(self):
    method display_name_map (line 2430) | def display_name_map(self):
    method df_display (line 2444) | def df_display(self):
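
  Example (illustrative sketch, not from the repo): building and executing a Report directly instead of going through Warehouse.execute. `wh` is assumed to be an existing Warehouse, and the field names come from an assumed config.

    from zillion.report import Report

    params = {
        "metrics": ["revenue"],
        "dimensions": ["partner_name"],
        "order_by": [("revenue", "desc")],
    }
    report = Report.from_params(wh, params)
    result = report.execute()  # returns a ReportResult (see above)
    print(result.df)           # assumed DataFrame attribute on the result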

FILE: zillion/scripts/bootstrap_datasource_config.py
  function get_primary_key (line 61) | def get_primary_key(table, full_names=False):
  function get_field_name (line 70) | def get_field_name(table, column, full_names=False):
  function get_foreign_key_relationships (line 77) | def get_foreign_key_relationships(metadata, table_configs):
  function infer_table_relationships (line 111) | def infer_table_relationships(metadata, table_configs, nlp=False):
  function get_configs (line 208) | def get_configs(
  class SecureAction (line 324) | class SecureAction(argparse.Action):
    method __call__ (line 325) | def __call__(self, parser, namespace, values, option_string=None):
  function main (line 383) | def main(

FILE: zillion/scripts/load_config.py
  function main (line 27) | def main(

FILE: zillion/scripts/run_report.py
  function main (line 45) | def main(

FILE: zillion/sql_utils.py
  class InvalidSQLAlchemyTypeString (line 79) | class InvalidSQLAlchemyTypeString(Exception):
  function contains_sql_keywords (line 83) | def contains_sql_keywords(sql):
  function contains_aggregation (line 110) | def contains_aggregation(sql):
  function type_string_to_sa_type (line 143) | def type_string_to_sa_type(type_string):
  function to_generic_sa_type (line 195) | def to_generic_sa_type(type):
  function infer_aggregation_and_rounding (line 213) | def infer_aggregation_and_rounding(column):
  function aggregation_to_sqla_func (line 250) | def aggregation_to_sqla_func(aggregation):
  function is_numeric_type (line 255) | def is_numeric_type(type):
  function is_probably_metric (line 263) | def is_probably_metric(column, formula=None, nlp_column_info=None):
  function sqla_compile (line 296) | def sqla_compile(expr):
  function printexpr (line 311) | def printexpr(expr):
  function column_fullname (line 316) | def column_fullname(column, prefix=None):
  function get_schema_and_table_name (line 339) | def get_schema_and_table_name(table):
  function get_sqla_criterion_expr (line 351) | def get_sqla_criterion_expr(column, criterion, negate=False):
  function check_metadata_url (line 440) | def check_metadata_url(url, confirm_exists=False):
  function comment (line 456) | def comment(self, c):
  function _compile_element (line 466) | def _compile_element(elem, prepend_newline=False):
  function get_schemas (line 490) | def get_schemas(engine):
  function to_mysql_type (line 499) | def to_mysql_type(type):
  function to_postgresql_type (line 504) | def to_postgresql_type(type):
  function to_sqlite_type (line 509) | def to_sqlite_type(type):
  function to_duckdb_type (line 514) | def to_duckdb_type(type):
  function filter_dialect_schemas (line 521) | def filter_dialect_schemas(schemas, dialect):
  function get_postgres_schemas (line 551) | def get_postgres_schemas(conn):
  function get_postgres_pid (line 562) | def get_postgres_pid(conn):
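
  Example (illustrative sketch, not from the repo): two of the helpers above. type_string_to_sa_type parses config type strings into SQLAlchemy types, and contains_aggregation checks whether a formula already aggregates.

    from zillion.sql_utils import type_string_to_sa_type, contains_aggregation

    sa_type = type_string_to_sa_type("numeric(10,2)")  # -> a SQLAlchemy Numeric type
    print(contains_aggregation("sum(revenue)"))        # expected: True
    print(contains_aggregation("revenue / 100.0"))     # expected: False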

FILE: zillion/warehouse.py
  class Warehouse (line 16) | class Warehouse(FieldManagerMixin):
    method __init__ (line 40) | def __init__(self, config=None, datasources=None, ds_priority=None, nl...
    method __repr__ (line 82) | def __repr__(self):
    method datasources (line 90) | def datasources(self):
    method datasource_names (line 95) | def datasource_names(self):
    method print_info (line 99) | def print_info(self):
    method get_datasource (line 111) | def get_datasource(self, name, adhoc_datasources=None):
    method get_child_field_managers (line 134) | def get_child_field_managers(self):
    method add_datasource (line 138) | def add_datasource(self, ds, skip_integrity_checks=False):
    method remove_datasource (line 154) | def remove_datasource(self, ds, skip_integrity_checks=False):
    method apply_config (line 170) | def apply_config(self, config, skip_integrity_checks=False, nlp=False):
    method run_integrity_checks (line 199) | def run_integrity_checks(self, adhoc_datasources=None):
    method load_report (line 243) | def load_report(self, spec_id, adhoc_datasources=None):
    method delete_report (line 263) | def delete_report(self, spec_id):
    method save_report (line 277) | def save_report(self, meta=None, **kwargs):
    method save (line 300) | def save(self, name, config_url, meta=None):
    method execute (line 345) | def execute(
    method execute_id (line 389) | def execute_id(self, spec_id, adhoc_datasources=None):
    method execute_text (line 409) | def execute_text(self, text, adhoc_datasources=None, allow_partial=Fal...
    method get_metric_table_set (line 440) | def get_metric_table_set(
    method get_dimension_table_set (line 489) | def get_dimension_table_set(
    method init_embeddings (line 538) | def init_embeddings(self, force_recreate=False):
    method _get_embeddings_collection_name (line 551) | def _get_embeddings_collection_name(self):
    method _set_embeddings_collection_name (line 555) | def _set_embeddings_collection_name(self, name):
    method _create_or_update_datasources (line 560) | def _create_or_update_datasources(
    method _clear_supported_dimension_cache (line 586) | def _clear_supported_dimension_cache(self):
    method _check_reserved_field_names (line 590) | def _check_reserved_field_names(self, adhoc_datasources=None):
    method _check_conflicting_fields (line 598) | def _check_conflicting_fields(self, adhoc_datasources=None):
    method _check_fields_have_type (line 643) | def _check_fields_have_type(self, adhoc_datasources=None):
    method _check_primary_key_dimensions (line 668) | def _check_primary_key_dimensions(self, adhoc_datasources=None):
    method _check_weighting_metrics (line 695) | def _check_weighting_metrics(self, adhoc_datasources=None):
    method _check_required_grain (line 722) | def _check_required_grain(self, adhoc_datasources=None):
    method _check_incomplete_dimensions (line 757) | def _check_incomplete_dimensions(self, adhoc_datasources=None):
    method _check_valid_table_parents (line 778) | def _check_valid_table_parents(self, adhoc_datasources=None):
    method _get_supported_dimensions_for_metric (line 805) | def _get_supported_dimensions_for_metric(
    method _get_supported_dimensions (line 863) | def _get_supported_dimensions(
    method _get_ds_tables_with_metric (line 892) | def _get_ds_tables_with_metric(
    method _get_ds_dim_tables_with_dim (line 927) | def _get_ds_dim_tables_with_dim(
    method _get_ds_table_sets (line 961) | def _get_ds_table_sets(
    method _choose_best_datasource (line 997) | def _choose_best_datasource(self, ds_names):
    method _choose_best_table_set (line 1021) | def _choose_best_table_set(self, ds_table_sets):
    method _generate_unsupported_grain_msg (line 1062) | def _generate_unsupported_grain_msg(
    method load (line 1100) | def load(cls, id):
    method load_warehouse_for_report (line 1127) | def load_warehouse_for_report(cls, spec_id):
    method load_report_and_warehouse (line 1144) | def load_report_and_warehouse(cls, spec_id):
    method delete (line 1161) | def delete(cls, id):
    method _load_warehouse (line 1178) | def _load_warehouse(cls, id):
    method from_db_file (line 1200) | def from_db_file(cls, *args, **kwargs):
    method from_data_file (line 1206) | def from_data_file(cls, *args, **kwargs):
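
Based on the Warehouse methods above and the repo's examples/minimal_example.py, the typical flow is to build a Warehouse from a config and call execute. A minimal sketch — the exact execute keyword arguments, field names, and result attributes are assumptions drawn from the example config, not verified here:

from zillion import Warehouse

# Remote warehouse config from the repo's examples/ directory
config = "https://raw.githubusercontent.com/totalhack/zillion/master/examples/example_wh_config.json"
wh = Warehouse(config=config)

# Assumed usage: request metrics at a dimension grain; result.df is
# presumed to be a pandas DataFrame holding the report output.
result = wh.execute(metrics=["revenue"], dimensions=["partner_name"])
print(result.df)
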
Condensed preview — 117 files, each showing path, character count, and a content snippet; the full structured content (3,043K chars) is available via the .json download. A short parsing sketch follows the listing.
[
  {
    "path": ".gitattributes",
    "chars": 156,
    "preview": "# This is a hack to get lingquist to ignore the SQL dumps that are causing\n# this repo to not be viewed as a pythong pro"
  },
  {
    "path": ".github/FUNDING.yml",
    "chars": 19,
    "preview": "github: [totalhack]"
  },
  {
    "path": ".gitignore",
    "chars": 133,
    "preview": "*.egg-info\nbuild\ndocs/site\n*.pyc\n*.out\ndist\n.DS_Store\n.idea/\ntests/test_wh_config.yaml\nvolumes\n.python-version\nGEMINI.md"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 123,
    "preview": "repos:\n-   repo: https://github.com/psf/black\n    rev: stable\n    hooks:\n    - id: black\n      language_version: python3"
  },
  {
    "path": ".pylintrc",
    "chars": 18051,
    "preview": "[MASTER]\n\n# A comma-separated list of package or module names from where C extensions may\n# be loaded. Extensions are lo"
  },
  {
    "path": ".readthedocs.yml",
    "chars": 113,
    "preview": "mkdocs:\n  configuration: mkdocs.yml\n\npython:\n  version: 3.7\n  install:\n    - requirements: docs/requirements.txt\n"
  },
  {
    "path": "AUTHORS.md",
    "chars": 330,
    "preview": "Zillion is written and maintained by [@totalhack](https://github.com/totalhack). Contributors welcome!\n\n# **Core Contrib"
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "chars": 99,
    "preview": "[This](https://www.kennethreitz.org/essays/be-cordial-or-be-on-your-way) is a good starting point.\n"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 2547,
    "preview": "Your help and feedback are greatly appreciated. Whether it's supporting/testing\na new datasource type, finding bugs, or "
  },
  {
    "path": "LICENSE",
    "chars": 1089,
    "preview": "The MIT License (MIT)\n\nCopyright (c) 2019-present, Kurt Matarese\n\nPermission is hereby granted, free of charge, to any p"
  },
  {
    "path": "Makefile",
    "chars": 2487,
    "preview": "PY := $(shell which python)\nENV := $(abspath $(dir $(PY))/..)\nPIP := $(ENV)/bin/pip\nUV := $(ENV)/bin/uv\n\nPACKAGE_NAME :="
  },
  {
    "path": "README.md",
    "chars": 35559,
    "preview": "Zillion: Make sense of it all\n=============================\n\n[![Generic badge](https://img.shields.io/badge/Status-Alpha"
  },
  {
    "path": "dev_config.yml",
    "chars": 843,
    "preview": "DEBUG: false\nLOG_LEVEL: WARNING\nLOAD_TABLE_CHUNK_SIZE: 5000\nDB_URL: sqlite:////tmp/zillion.db\nADHOC_DATASOURCE_DIRECTORY"
  },
  {
    "path": "docker-compose-nlp.yml",
    "chars": 210,
    "preview": "version: \"3.8\"\nservices:    \n  qdrant:\n    image: qdrant/qdrant\n    ports:\n      - 6333:6333\n      - 6334:6334\n    envir"
  },
  {
    "path": "docker-compose.yml",
    "chars": 861,
    "preview": "version: \"3.8\"\nservices:\n\n  mysql:\n    image: mysql:8.0.32\n    ports:\n      - 3306:3306\n    command: ['--default-authent"
  },
  {
    "path": "docs/build_markdown.py",
    "chars": 5030,
    "preview": "import enum\nimport importlib\nimport inspect\nimport os\nimport pkgutil\nimport shutil\n\nimport markdown\nfrom tlbx import st\n"
  },
  {
    "path": "docs/markdown/contributing.md",
    "chars": 2547,
    "preview": "Your help and feedback are greatly appreciated. Whether it's supporting/testing\na new datasource type, finding bugs, or "
  },
  {
    "path": "docs/markdown/readme_badges.md",
    "chars": 424,
    "preview": "[![Generic badge](https://img.shields.io/badge/Status-Alpha-yellow.svg)](https://shields.io/)\n[![Code style: black](http"
  },
  {
    "path": "docs/markdown/readme_contents.md",
    "chars": 31732,
    "preview": "<a name=\"installation\"></a>\n\n**Installation**\n----------------\n\n> **Warning**: This project is in an alpha state and is "
  },
  {
    "path": "docs/markdown/readme_docs.md",
    "chars": 340,
    "preview": "<a name=\"documentation\"></a>\n\n**Documentation**\n-----------------\n\nMore thorough documentation can be found [here](https"
  },
  {
    "path": "docs/markdown/readme_how_to_contribute.md",
    "chars": 325,
    "preview": "<a name=\"how-to-contribute\"></a>\n\n**How to Contribute**\n---------------------\n\nPlease See the\n[contributing](https://git"
  },
  {
    "path": "docs/markdown/readme_intro.md",
    "chars": 1255,
    "preview": "**Introduction**\n----------------\n\n`Zillion` is a data modeling and analytics tool that allows combining and\nanalyzing d"
  },
  {
    "path": "docs/markdown/readme_toc.md",
    "chars": 1399,
    "preview": "**Table of Contents**\n---------------------\n\n* [Installation](#installation)\n* [Primer](#primer)\n    * [Metrics and Dime"
  },
  {
    "path": "docs/mkdocs/api.md",
    "chars": 488,
    "preview": "# API Reference\n\n* [zillion.configs](zillion.configs.md)\n* [zillion.core](zillion.core.md)\n* [zillion.datasource](zillio"
  },
  {
    "path": "docs/mkdocs/contributing.md",
    "chars": 2547,
    "preview": "Your help and feedback are greatly appreciated. Whether it's supporting/testing\na new datasource type, finding bugs, or "
  },
  {
    "path": "docs/mkdocs/css/extra.css",
    "chars": 820,
    "preview": "pre { color: white !important; }\n\n.md-clipboard:before {\n    color: rgb(255, 255, 255);\n}\n\n.codehilite:hover .md-clipboa"
  },
  {
    "path": "docs/mkdocs/index.md",
    "chars": 33423,
    "preview": "[![Generic badge](https://img.shields.io/badge/Status-Alpha-yellow.svg)](https://shields.io/)\n[![Code style: black](http"
  },
  {
    "path": "docs/mkdocs/zillion.configs.md",
    "chars": 15947,
    "preview": "[//]: # (This is an auto-generated file. Do not edit)\n# Module zillion.configs\n\n\n## [AdHocFieldSchema](https://github.co"
  },
  {
    "path": "docs/mkdocs/zillion.core.md",
    "chars": 4680,
    "preview": "[//]: # (This is an auto-generated file. Do not edit)\n# Module zillion.core\n\n\n## [AggregationTypes](https://github.com/t"
  },
  {
    "path": "docs/mkdocs/zillion.datasource.md",
    "chars": 6230,
    "preview": "[//]: # (This is an auto-generated file. Do not edit)\n# Module zillion.datasource\n\n\n## [AdHocDataTable](https://github.c"
  },
  {
    "path": "docs/mkdocs/zillion.dialects.md",
    "chars": 81,
    "preview": "[//]: # (This is an auto-generated file. Do not edit)\n# Module zillion.dialects\n\n"
  },
  {
    "path": "docs/mkdocs/zillion.field.md",
    "chars": 5474,
    "preview": "[//]: # (This is an auto-generated file. Do not edit)\n# Module zillion.field\n\n\n## [AdHocDimension](https://github.com/to"
  },
  {
    "path": "docs/mkdocs/zillion.model.md",
    "chars": 78,
    "preview": "[//]: # (This is an auto-generated file. Do not edit)\n# Module zillion.model\n\n"
  },
  {
    "path": "docs/mkdocs/zillion.nlp.md",
    "chars": 3476,
    "preview": "[//]: # (This is an auto-generated file. Do not edit)\n# Module zillion.nlp\n\n\n## [build_chain](https://github.com/totalha"
  },
  {
    "path": "docs/mkdocs/zillion.report.md",
    "chars": 2204,
    "preview": "[//]: # (This is an auto-generated file. Do not edit)\n# Module zillion.report\n\n\n## [BaseCombinedResult](https://github.c"
  },
  {
    "path": "docs/mkdocs/zillion.scripts.md",
    "chars": 80,
    "preview": "[//]: # (This is an auto-generated file. Do not edit)\n# Module zillion.scripts\n\n"
  },
  {
    "path": "docs/mkdocs/zillion.sql_utils.md",
    "chars": 3858,
    "preview": "[//]: # (This is an auto-generated file. Do not edit)\n# Module zillion.sql_utils\n\n\n## [aggregation_to_sqla_func](https:/"
  },
  {
    "path": "docs/mkdocs/zillion.version.md",
    "chars": 80,
    "preview": "[//]: # (This is an auto-generated file. Do not edit)\n# Module zillion.version\n\n"
  },
  {
    "path": "docs/mkdocs/zillion.warehouse.md",
    "chars": 1104,
    "preview": "[//]: # (This is an auto-generated file. Do not edit)\n# Module zillion.warehouse\n\n\n## [Warehouse](https://github.com/tot"
  },
  {
    "path": "docs/mkdocs_index.md",
    "chars": 113,
    "preview": "--8<-- \"markdown/readme_badges.md\"\n\n--8<-- \"markdown/readme_intro.md\"\n\n---\n\n--8<-- \"markdown/readme_contents.md\"\n"
  },
  {
    "path": "docs/readme.md",
    "chars": 293,
    "preview": "Zillion: Make sense of it all\n=============================\n\n--8<-- \"markdown/readme_badges.md\"\n\n--8<-- \"markdown/readme"
  },
  {
    "path": "docs/requirements.txt",
    "chars": 122,
    "preview": "zillion\nmkdocs==1.1.2\nmkdocs-material==5.2.1\nmkdocs-material-extensions==1.0\nmkdocs-minify-plugin==0.3.0\nmkautodoc==0.1."
  },
  {
    "path": "examples/baseball_warehouse.json",
    "chars": 16521,
    "preview": "{\n    \"metrics\": [\n        {\n            \"name\": \"games\",\n            \"display_name\": \"G\",\n            \"type\": \"integer\""
  },
  {
    "path": "examples/example_wh_config.json",
    "chars": 4219,
    "preview": "{\n    \"metrics\": [\n        {\n            \"name\": \"revenue\",\n            \"type\": \"numeric(10,2)\",\n            \"aggregatio"
  },
  {
    "path": "examples/minimal_example.py",
    "chars": 251,
    "preview": "from zillion import Warehouse\n\nconfig = \"https://raw.githubusercontent.com/totalhack/zillion/master/examples/example_wh_"
  },
  {
    "path": "examples/sample_config.yaml",
    "chars": 1515,
    "preview": "# Note: env var substitution is supported via $FOO or ${FOO} syntax\n\n# Turn on debug logging\nDEBUG: false\n# Control the "
  },
  {
    "path": "mkdocs.yml",
    "chars": 1418,
    "preview": "site_name: Zillion\nsite_description: Make sense of it all.\nsite_author: totalhack\nsite_url: https://totalhack.github.io/"
  },
  {
    "path": "pyproject.toml",
    "chars": 2453,
    "preview": "[build-system]\nrequires = [\"setuptools>=61\", \"wheel\"]\nbuild-backend = \"setuptools.build_meta\"\n\n[project]\nname = \"zillion"
  },
  {
    "path": "tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "tests/conftest.py",
    "chars": 4378,
    "preview": "import pytest\nimport sys\nimport time\n\nfrom tlbx import json\nfrom zillion.core import zillion_config\n\nfrom .test_utils im"
  },
  {
    "path": "tests/dma_zip.csv",
    "chars": 30464,
    "preview": "Zip_Code,DMA_Code,DMA_Description\n501,501,NEW YORK\n544,501,NEW YORK\n1001,543,SPRINGFIELD - HOLYOKE\n1002,543,SPRINGFIELD "
  },
  {
    "path": "tests/dma_zip.html",
    "chars": 94642,
    "preview": "<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th>Zip_Code</th>\n      <th>DMA"
  },
  {
    "path": "tests/dma_zip.json",
    "chars": 74634,
    "preview": "{\"schema\": {\"fields\":[{\"name\":\"Zip_Code\",\"type\":\"integer\"},{\"name\":\"DMA_Code\",\"type\":\"integer\"},{\"name\":\"DMA_Description"
  },
  {
    "path": "tests/pytest.ini",
    "chars": 120,
    "preview": "[pytest]\naddopts = -p no:warnings\nmarkers =\n    longrun: long running test\n    nlp: tests that require the nlp extension"
  },
  {
    "path": "tests/setup/campaigns.csv",
    "chars": 333,
    "preview": "id,name,category,partner_id,created_at\n1,\"Campaign 1A\",fruits,1,\"2019-03-26 21:02:15\"\n2,\"Campaign 2A\",vegetables,1,\"2019"
  },
  {
    "path": "tests/setup/common.sqlite.sql",
    "chars": 582,
    "preview": "DROP TABLE IF EXISTS partners;\nCREATE TABLE IF NOT EXISTS partners (\n  id INTEGER PRIMARY KEY,\n  name VARCHAR NOT NULL U"
  },
  {
    "path": "tests/setup/create_testdb2_sqlite.py",
    "chars": 1253,
    "preview": "from sqlite3 import connect, Row\n\nfrom test_utils import get_testdb_url\nfrom tlbx import st, rmfile, shell\n\nCREATE_AGGRE"
  },
  {
    "path": "tests/setup/duckdb/load.sql",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "tests/setup/duckdb/schema.sql",
    "chars": 12,
    "preview": "\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "tests/setup/init_mysql_data.sh",
    "chars": 221,
    "preview": "#! /usr/bin/env bash\n\n# Exit in case of error\nset -e\n\ndocker exec -i `docker-compose ps -q mysql | xargs docker inspect "
  },
  {
    "path": "tests/setup/init_postgres_data.sh",
    "chars": 243,
    "preview": "#! /usr/bin/env bash\n\n# Exit in case of error\nset -e\n\ndocker exec -i -u postgres `docker-compose ps -q postgres | xargs "
  },
  {
    "path": "tests/setup/leads.csv",
    "chars": 320,
    "preview": "id,name,campaign_id,created_at\n1,\"John Doe\",1,\"2020-04-30 23:24:11\"\n2,\"Jane Doe\",1,\"2020-04-30 23:24:11\"\n3,\"Jim Doe\",2,\""
  },
  {
    "path": "tests/setup/partner_sibling.csv",
    "chars": 75,
    "preview": "partner_id,sibling_dim\n1,\"Partner A Sibling Dim\"\n2,\"Partner B Sibling Dim\"\n"
  },
  {
    "path": "tests/setup/partners.csv",
    "chars": 127,
    "preview": "id,name,created_at\n1,\"Partner A\",\"2019-03-26 21:02:15\"\n2,\"Partner B\",\"2019-03-26 21:02:15\"\n3,\"Partner C\",\"2019-03-26 21:"
  },
  {
    "path": "tests/setup/sales.csv",
    "chars": 756,
    "preview": "id,item,quantity,revenue,lead_id,created_at\n1,apple,10,10,1,\"2020-04-30 23:24:11\"\n2,orange,15,7,1,\"2020-04-30 23:24:11\"\n"
  },
  {
    "path": "tests/setup/testdb1.sqlite.sql",
    "chars": 4742,
    "preview": "DROP TABLE IF EXISTS leads;\nCREATE TABLE IF NOT EXISTS leads (\n  id INTEGER PRIMARY KEY,\n  name VARCHAR DEFAULT NULL,\n  "
  },
  {
    "path": "tests/setup/zillion_db.sqlite.sql",
    "chars": 421,
    "preview": "\nCREATE TABLE IF NOT EXISTS warehouses (\n  id INTEGER NOT NULL,\n  name VARCHAR(128) NOT NULL,\n  params TEXT NOT NULL,\n  "
  },
  {
    "path": "tests/setup/zillion_test.mysql.sql",
    "chars": 984889,
    "preview": "-- MySQL dump 10.13  Distrib 5.7.28, for osx10.14 (x86_64)\n--\n-- Host: localhost    Database: zillion_test\n-- ----------"
  },
  {
    "path": "tests/setup/zillion_test.postgres.sql",
    "chars": 787969,
    "preview": "--\n-- PostgreSQL database dump\n--\n\n-- Dumped from database version 12.2\n-- Dumped by pg_dump version 12.2\n\nSET statement"
  },
  {
    "path": "tests/test_adhoc_ds_config.json",
    "chars": 830,
    "preview": "{\n  \"datasources\": {\n    \"test_adhoc_db\": {\n      \"connect\": \"sqlite:////tmp/test_adhoc_db\",\n      \"tables\": {\n        \""
  },
  {
    "path": "tests/test_config.yaml",
    "chars": 827,
    "preview": "DEBUG: false\nLOG_LEVEL: WARNING\nLOAD_TABLE_CHUNK_SIZE: 5000\nZILLION_DB_URL: sqlite:////tmp/zillion.db\n\n# OPENAI_API_KEY:"
  },
  {
    "path": "tests/test_core.py",
    "chars": 23835,
    "preview": "from collections import OrderedDict\nimport os\nimport pytest\nimport time\n\nfrom marshmallow import ValidationError\nfrom tl"
  },
  {
    "path": "tests/test_duckdb.py",
    "chars": 2308,
    "preview": "import pytest\n\nfrom .test_utils import *\nfrom zillion.core import *\nfrom zillion.datasource import *\n\n\n@pytest.mark.skip"
  },
  {
    "path": "tests/test_duckdb_wh_config.json",
    "chars": 13427,
    "preview": "{\n    \"metrics\": [\n        {\n            \"name\": \"rpl\",\n            \"display_name\": \"Revenue/Lead\",\n            \"aggrega"
  },
  {
    "path": "tests/test_example_wh_config.py",
    "chars": 1044,
    "preview": "from zillion.configs import load_warehouse_config\nfrom zillion.core import RollupTypes, info\nfrom zillion.warehouse impo"
  },
  {
    "path": "tests/test_include_wh_config.json",
    "chars": 969,
    "preview": "{\n    \"metrics\": [\n        {\n            \"name\": \"rpl_include\",\n            \"aggregation\": \"mean\",\n            \"rounding"
  },
  {
    "path": "tests/test_mysql.py",
    "chars": 3050,
    "preview": "import pytest\n\nfrom .test_utils import *\nfrom zillion.core import *\nfrom zillion.datasource import *\n\n\ndef test_mysql_da"
  },
  {
    "path": "tests/test_mysql_ds_config.json",
    "chars": 3433,
    "preview": "{\n    \"connect\": \"mysql+pymysql://{user}@{host}/{schema}\",\n    \"prefix_with\": \"STRAIGHT_JOIN\",\n    \"metrics\": [\n        "
  },
  {
    "path": "tests/test_nlp.py",
    "chars": 5000,
    "preview": "from datetime import datetime, timedelta\n\nimport pytest\n\nfrom .test_utils import *\nfrom zillion.nlp import *\n\n\ndef n_day"
  },
  {
    "path": "tests/test_performance.py",
    "chars": 2478,
    "preview": "import contextlib\nimport cProfile\nimport pstats\nimport pytest\nimport time\n\nfrom .test_utils import *\nfrom zillion.core i"
  },
  {
    "path": "tests/test_postgresql.py",
    "chars": 2069,
    "preview": "import pytest\n\nfrom .test_utils import *\nfrom zillion.core import *\nfrom zillion.datasource import *\n\n\ndef test_postgres"
  },
  {
    "path": "tests/test_postgresql_ds_config.json",
    "chars": 3479,
    "preview": "{\n    \"connect\": \"postgresql+psycopg2://{user}@{host}/{schema}\",\n    \"metrics\": [\n        {\n            \"name\": \"clicks\""
  },
  {
    "path": "tests/test_reports.py",
    "chars": 54616,
    "preview": "import pytest\nimport threading\n\nimport pandas as pd\n\nfrom .test_utils import *\nfrom zillion.configs import zillion_confi"
  },
  {
    "path": "tests/test_scripts.py",
    "chars": 766,
    "preview": "import pytest\nfrom unittest.mock import patch\n\nTEST_DB_URL = \"https://github.com/totalhack/zillion/blob/master/tests/tes"
  },
  {
    "path": "tests/test_sqlite_ds_config.json",
    "chars": 2472,
    "preview": "{\n    \"connect\": {\n        \"func\": \"zillion.datasource.url_connect\",\n        \"params\": {\n            \"connect_url\": \"sql"
  },
  {
    "path": "tests/test_table_config.json",
    "chars": 856,
    "preview": "{\n    \"type\": \"metric\",\n    \"priority\": 1,\n    \"create_fields\": true,\n    \"primary_key\": [\n        \"lead_id\"\n    ],\n    "
  },
  {
    "path": "tests/test_utils.py",
    "chars": 13249,
    "preview": "\"\"\"\nTo setup testing against local databases:\n\nmysql -u root -h 127.0.0.1 < zillion_test.mysql.sql\npsql -h 127.0.0.1 -U "
  },
  {
    "path": "tests/test_wh_config.json",
    "chars": 9957,
    "preview": "{\n  \"meta\": {\n    \"nlp\": {\n      \"collection_name\": null,\n      \"field_disabled_patterns\": [\"rpl_ma_5\"],\n      \"field_di"
  },
  {
    "path": "zillion/__init__.py",
    "chars": 507,
    "preview": "\"\"\"Zillion package\"\"\"\n\nfrom .version import __version__\nfrom .core import (\n    FieldTypes,\n    TableTypes,\n    Aggregat"
  },
  {
    "path": "zillion/configs.py",
    "chars": 59527,
    "preview": "import sys\nfrom collections import OrderedDict, defaultdict\nimport os\nimport string\n\nfrom marshmallow import (\n    Schem"
  },
  {
    "path": "zillion/core.py",
    "chars": 14452,
    "preview": "# pylint: disable=unused-import,missing-class-docstring\nfrom collections.abc import MutableMapping\nfrom itertools import"
  },
  {
    "path": "zillion/datasource.py",
    "chars": 75992,
    "preview": "from collections import defaultdict\nimport datetime\nimport os\nimport random\nfrom urllib.parse import urlparse, urlunpars"
  },
  {
    "path": "zillion/dialects/__init__.py",
    "chars": 91,
    "preview": "from .duckdb import *\nfrom .mysql import *\nfrom .postgresql import *\nfrom .sqlite import *\n"
  },
  {
    "path": "zillion/dialects/conversions.py",
    "chars": 6313,
    "preview": "import sqlalchemy as sa\n\n\nclass DialectDateConversions:\n    @classmethod\n    def f(cls, func, i=0):\n        \"\"\"Generate "
  },
  {
    "path": "zillion/dialects/duckdb.py",
    "chars": 5016,
    "preview": "import sys\nimport sqlalchemy as sa\nfrom sqlalchemy import func\nfrom sqlalchemy.dialects.postgresql import DATE, TIMESTAM"
  },
  {
    "path": "zillion/dialects/mysql.py",
    "chars": 3765,
    "preview": "from sqlalchemy import func, text\n\nfrom zillion.dialects.conversions import DialectDateConversions\n\n\nclass MySQLDialectD"
  },
  {
    "path": "zillion/dialects/postgresql.py",
    "chars": 4601,
    "preview": "from sqlalchemy import func, text\nfrom sqlalchemy.dialects.postgresql import INTERVAL\nfrom sqlalchemy.sql.functions impo"
  },
  {
    "path": "zillion/dialects/sqlite.py",
    "chars": 4710,
    "preview": "from sqlalchemy import func\nfrom tlbx import st\n\nfrom zillion.dialects.conversions import DialectDateConversions\n\n\nclass"
  },
  {
    "path": "zillion/field.py",
    "chars": 53577,
    "preview": "import sqlalchemy as sa\n\nfrom zillion.configs import (\n    ConfigMixin,\n    FieldConfigSchema,\n    FormulaFieldConfigSch"
  },
  {
    "path": "zillion/model.py",
    "chars": 1718,
    "preview": "nlp_installed = False\n\nimport sqlalchemy as sa\n\nfrom zillion.core import zillion_config, nlp_installed\n\nzillion_engine ="
  },
  {
    "path": "zillion/nlp.py",
    "chars": 41964,
    "preview": "from datetime import datetime, timedelta\nimport hashlib\nimport re\nimport struct\nimport time\n\ntry:\n    from qdrant_client"
  },
  {
    "path": "zillion/report.py",
    "chars": 88048,
    "preview": "from collections import OrderedDict\nfrom concurrent.futures import as_completed, ThreadPoolExecutor\nfrom contextlib impo"
  },
  {
    "path": "zillion/scripts/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "zillion/scripts/bootstrap_datasource_config.py",
    "chars": 15096,
    "preview": "\"\"\"\nThis is a helper script to bootstrap the creation of a config for a datasource.\nThis is useful when you have an exis"
  },
  {
    "path": "zillion/scripts/json_to_yaml.py",
    "chars": 121,
    "preview": "#!/usr/bin/env python\n\nimport sys, yaml, json\n\nprint(yaml.dump(json.loads(sys.stdin.read()), indent=2, sort_keys=False))"
  },
  {
    "path": "zillion/scripts/load_config.py",
    "chars": 1241,
    "preview": "#!/usr/bin/env python\n\"\"\"Helper script to load a config for testing/inspection. This will drop \nyou into a PDB shell wit"
  },
  {
    "path": "zillion/scripts/run_report.py",
    "chars": 2962,
    "preview": "#!/usr/bin/env python\n\"\"\"Helper script to run reports in the command line\"\"\"\n\nimport ast\nimport logging\n\nfrom tlbx impor"
  },
  {
    "path": "zillion/scripts/yaml_to_json.py",
    "chars": 109,
    "preview": "#!/usr/bin/env python\n\nimport sys, yaml, json\n\nprint(json.dumps(yaml.safe_load(sys.stdin.read()), indent=4))\n"
  },
  {
    "path": "zillion/sql_utils.py",
    "chars": 16832,
    "preview": "import ast\nimport os\nimport re\n\nimport sqlalchemy as sa\nfrom sqlalchemy.dialects.mysql import dialect as mysql_dialect\nf"
  },
  {
    "path": "zillion/version.py",
    "chars": 46,
    "preview": "\"\"\"Package version\"\"\"\n\n__version__ = \"0.14.0\"\n"
  },
  {
    "path": "zillion/warehouse.py",
    "chars": 42411,
    "preview": "from collections import defaultdict, OrderedDict\nimport time\n\nimport sqlalchemy as sa\n\nfrom zillion.configs import load_"
  }
]

// ... and 6 more files (download for full content)
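
The condensed preview above is a JSON array of objects with path, chars, and preview keys. A minimal sketch for loading it and ranking files by size, assuming the array has been saved locally as preview.json (a hypothetical filename):

import json

# Load a local copy of the condensed preview array shown above
with open("preview.json") as f:
    files = json.load(f)

# Largest files by character count
for entry in sorted(files, key=lambda e: e["chars"], reverse=True)[:5]:
    print(f'{entry["chars"]:>8} chars  {entry["path"]}')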

About this extraction

This page contains the full source code of the totalhack/zillion GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 117 files (2.7 MB), approximately 700.7k tokens, and a symbol index with 969 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.