Full Code of alteryx/compose for AI

main 87953ceaab69 cached

96 files

740.1 KB

390.3k tokens

221 symbols

1 requests

Download .txt

Showing preview only (776K chars total). Download the full file or copy to clipboard to get everything.

Repository: alteryx/compose
Branch: main
Commit: 87953ceaab69
Files: 96
Total size: 740.1 KB

Directory structure:
gitextract_0qfvetgz/

├── .codecov.yml
├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── blank_issue.md
│   │   ├── bug_report.md
│   │   ├── config.yml
│   │   ├── documentation_improvement.md
│   │   └── feature_request.md
│   ├── auto_assign.yml
│   └── workflows/
│       ├── auto_approve_dependency_PRs.yml
│       ├── build_docs.yml
│       ├── create_feedstock_pr.yaml
│       ├── install_test.yml
│       ├── latest_dependency_checker.yml
│       ├── lint_check.yml
│       ├── release.yml
│       ├── release_notes_updated.yml
│       └── unit_tests_with_latest_deps.yml
├── .gitignore
├── .pre-commit-config.yaml
├── .readthedocs.yaml
├── LICENSE
├── Makefile
├── README.md
├── composeml/
│   ├── __init__.py
│   ├── conftest.py
│   ├── data_slice/
│   │   ├── __init__.py
│   │   ├── extension.py
│   │   ├── generator.py
│   │   └── offset.py
│   ├── demos/
│   │   ├── __init__.py
│   │   └── transactions.csv
│   ├── label_maker.py
│   ├── label_search.py
│   ├── label_times/
│   │   ├── __init__.py
│   │   ├── description.py
│   │   ├── deserialize.py
│   │   ├── object.py
│   │   └── plots.py
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── requirement_files/
│   │   │   ├── latest_core_dependencies.txt
│   │   │   ├── minimum_core_requirements.txt
│   │   │   └── minimum_test_requirements.txt
│   │   ├── test_data_slice/
│   │   │   ├── __init__.py
│   │   │   ├── test_extension.py
│   │   │   └── test_offset.py
│   │   ├── test_datasets.py
│   │   ├── test_featuretools.py
│   │   ├── test_label_maker.py
│   │   ├── test_label_plots.py
│   │   ├── test_label_serialization.py
│   │   ├── test_label_times.py
│   │   ├── test_label_transforms/
│   │   │   ├── __init__.py
│   │   │   ├── test_bin.py
│   │   │   ├── test_lead.py
│   │   │   ├── test_sample.py
│   │   │   └── test_threshold.py
│   │   ├── test_version.py
│   │   └── utils.py
│   ├── update_checker.py
│   └── version.py
├── contributing.md
├── docs/
│   ├── Makefile
│   ├── make.bat
│   └── source/
│       ├── _static/
│       │   └── style.css
│       ├── _templates/
│       │   ├── class.rst
│       │   └── layout.html
│       ├── api_reference.rst
│       ├── conf.py
│       ├── examples/
│       │   ├── demo/
│       │   │   ├── __init__.py
│       │   │   ├── chicago_bike/
│       │   │   │   ├── __init__.py
│       │   │   │   └── sample.csv
│       │   │   ├── next_purchase/
│       │   │   │   ├── __init__.py
│       │   │   │   └── sample.csv
│       │   │   ├── turbofan_degredation/
│       │   │   │   ├── __init__.py
│       │   │   │   └── sample.csv
│       │   │   └── utils.py
│       │   ├── predict_bike_trips.ipynb
│       │   ├── predict_next_purchase.ipynb
│       │   └── predict_turbofan_degredation.ipynb
│       ├── images/
│       │   ├── innovation_labs.xml
│       │   ├── label-maker.xml
│       │   ├── labeling-function.xml
│       │   └── workflow.xml
│       ├── index.rst
│       ├── install.md
│       ├── release_notes.rst
│       ├── resources/
│       │   ├── faq.ipynb
│       │   └── help.rst
│       ├── resources.rst
│       ├── start.ipynb
│       ├── tutorials.rst
│       ├── user_guide/
│       │   ├── controlling_cutoff_times.ipynb
│       │   ├── data_slice_generator.ipynb
│       │   └── using_label_transforms.ipynb
│       └── user_guide.rst
├── pyproject.toml
└── release.md

================================================
FILE CONTENTS
================================================

================================================
FILE: .codecov.yml
================================================
codecov:
  notify:
    require_ci_to_pass: yes

comment:
  layout: "diff, files"

coverage:
  precision: 2
  round: down
  range: 90..100
  status:
    project:
      default:
        target: 100%
    patch:
      default:
        target: 100%
    changes: no

ignore:
  - "composeml/update_checker.py"


================================================
FILE: .github/ISSUE_TEMPLATE/blank_issue.md
================================================
---
name: Blank Issue
about: Create a blank issue
title: ''
labels: ''
assignees: ''

---


================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug Report
about: Create a bug report to help us improve Compose
title: ''
labels: 'bug'
assignees: ''

---

[A clear and concise description of what the bug is.]

#### Code Sample, a copy-pastable example to reproduce your bug.

```python
# Your code here

```


================================================
FILE: .github/ISSUE_TEMPLATE/config.yml
================================================
blank_issues_enabled: true
contact_links:
  - name: General Technical Question
    about: "If you have a question like *How should I create my label times?* you can ask on StackOverflow using the #compose-ml tag."
    url: https://stackoverflow.com/questions/tagged/compose-ml
  - name: Real-time chat
    url: https://join.slack.com/t/alteryx-oss/shared_invite/zt-182tyvuxv-NzIn6eiCEf8TBziuKp0bNA
    about: "If you want to meet others in the community and chat about all things Alteryx OSS then check out our Slack."


================================================
FILE: .github/ISSUE_TEMPLATE/documentation_improvement.md
================================================
---
name: Documentation Improvement
about: Suggest an idea for improving the documentation
title: ''
labels: 'documentation'
assignees: ''

---

[a description of what documentation you believe needs to be fixed/improved]


================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature Request
about: Suggest an idea for this project
title: ''
labels: 'new feature'
assignees: ''

---

- As a [user/developer], I wish I could use Compose to ...

#### Code Example

```python
# Your code here, if applicable

```


================================================
FILE: .github/auto_assign.yml
================================================
# Set to author to set pr creator as assignee
addAssignees: author


================================================
FILE: .github/workflows/auto_approve_dependency_PRs.yml
================================================
name: Auto Approve Dependency PRs
on:
  schedule:
      - cron: '*/30 * * * *'
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Find dependency PRs
        id: find_prs
        run: |
          gh auth status
          gh pr list --repo "${{ github.repository }}" --assignee "machineFL" --base main --state open --search "status:success review:required" --limit 1 --json number > dep_PRs_waiting_approval.json
          dep_pull_request=$(cat dep_PRs_waiting_approval.json | grep -Eo "[0-9]*")
          echo ::set-output name=dep_pull_request::${dep_pull_request}
        env:
          GITHUB_TOKEN: ${{ secrets.AUTO_APPROVE_TOKEN }}
      - name: Approve dependency PRs and enable auto-merge
        if: ${{ steps.find_prs.outputs.dep_pull_request > 1 }}
        run: |
          gh pr review --repo "${{ github.repository }}" --comment --body "auto approve" ${{ steps.find_prs.outputs.dep_pull_request }}
          gh pr review --repo "${{ github.repository }}" --approve ${{ steps.find_prs.outputs.dep_pull_request }}
          gh pr merge --repo "${{ github.repository }}" --auto --squash --delete-branch ${{ steps.find_prs.outputs.dep_pull_request }}
        env:
          GITHUB_TOKEN: ${{ secrets.AUTO_APPROVE_TOKEN }}


================================================
FILE: .github/workflows/build_docs.yml
================================================
on:
  pull_request:
    types: [opened, synchronize]
  push:
    branches:
      - main

name: Build Docs
jobs:
  doc_tests:
    name: Doc Tests / Python 3.8
    runs-on: ubuntu-latest
    steps:
      - name: Set up Python 3.8
        uses: actions/setup-python@v4
        with:
          python-version: 3.8
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.head.ref }}
          repository: ${{ github.event.pull_request.head.repo.full_name }}
      - name: Build source distribution
        run: make package
      - name: Install package with doc requirements
        run: |
          python -m pip config --site set global.progress_bar off
          python -m pip install unpacked_sdist/
          python -m pip install unpacked_sdist/[dev]
          python -m pip install unpacked_sdist/[docs]
          python -m pip check
          sudo apt install -q -y pandoc
          sudo apt install -q -y graphviz
      - name: Run doc tests
        run: make -C docs/ -e "SPHINXOPTS=-W" clean html


================================================
FILE: .github/workflows/create_feedstock_pr.yaml
================================================
name: Create Feedstock PR
on:
  workflow_dispatch:
    inputs:
      version:
        description: 'released PyPI version to use (ex - v1.11.1)'
        required: true
jobs:
  create_feedstock_pr:
    name: Create Feedstock PR
    runs-on: ubuntu-latest
    steps:
      - name: Checkout inputted version
        uses: actions/checkout@v3
        with:
          repository: ${{ github.event.pull_request.head.repo.full_name }}
          ref: ${{ github.event.inputs.version }}
          path: "./compose"
      - name: Pull latest from upstream for user forked feedstock
        run: |
          gh auth status
          gh repo sync alteryx/composeml-feedstock --branch main --source conda-forge/composeml-feedstock --force
        env:
          GITHUB_TOKEN: ${{ secrets.AUTO_APPROVE_TOKEN }}
      - uses: actions/checkout@v3
        with:
          repository: alteryx/composeml-feedstock
          ref: main
          path: "./composeml-feedstock"
          fetch-depth: '0'
      - name: Run Create Feedstock meta YAML
        id: create-feedstock-meta
        uses: alteryx/create-feedstock-meta-yaml@v4
        with:
          project: "composeml"
          pypi_version: ${{ github.event.inputs.version }}
          project_metadata_filepath: "compose/pyproject.toml"
          meta_yaml_filepath: "composeml-feedstock/recipe/meta.yaml"
      - name: View updated meta yaml
        run: cat composeml-feedstock/recipe/meta.yaml
      - name: Push updated yaml
        run: |
          cd composeml-feedstock
          git config --unset-all http.https://github.com/.extraheader
          git config --global user.email "machineOSS@alteryx.com"
          git config --global user.name "machineAYX Bot"
          git remote set-url origin https://${{ secrets.AUTO_APPROVE_TOKEN }}@github.com/alteryx/composeml-feedstock
          git checkout -b ${{ github.event.inputs.version }}
          git add recipe/meta.yaml
          git commit -m "${{ github.event.inputs.version }}"
          git push origin ${{ github.event.inputs.version }}
      - name: Adding URL to job output
        run: |
          echo "Conda Feedstock Pull Request: https://github.com/alteryx/composeml-feedstock/pull/new/${{ github.event.inputs.version }}" >> $GITHUB_STEP_SUMMARY


================================================
FILE: .github/workflows/install_test.yml
================================================
on:
  pull_request:
    types: [opened, synchronize]
  push:
    branches:
      - main

name: Install Test
jobs:
  install_cm_complete:
    name: ${{ matrix.os }} - ${{ matrix.python_version }} install compose
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest]
        python_version: ["3.8", "3.9", "3.10", "3.11"]
    runs-on: ${{ matrix.os }}
    steps:
      - name: Set up python ${{ matrix.python_version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python_version }}
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Build compose package
        run: make package
      - name: Install compose complete from sdist
        run: |
          pip config --site set global.progress_bar off
          python -m pip install "unpacked_sdist/[complete]"
      - name: Test by importing packages
        run: |
          python -c "import alteryx_open_src_update_checker"
        env:
          ALTERYX_OPEN_SRC_UPDATE_CHECKER: False
      - name: Check package conflicts
        run: |
          python -m pip check


================================================
FILE: .github/workflows/latest_dependency_checker.yml
================================================
# This workflow will install dependenies and if any critical dependencies have changed a pull request
# will be created which will trigger a CI run with the new dependencies.

name: Latest Dependency Checker
on:
  workflow_dispatch:
  schedule:
    - cron: '0 * * * *'
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python 3.8
      uses: actions/setup-python@v4
      with:
        python-version: '3.8.x'

    - name: Install pip and virtualenv
      run: |
        python -m pip install --upgrade pip
        python -m pip install virtualenv
    - name: Update latest core dependencies
      run: |
        python -m virtualenv venv_core
        source venv_core/bin/activate
        python -m pip install --upgrade pip
        python -m pip install .[test]
        make checkdeps OUTPUT_FILEPATH=composeml/tests/requirement_files/latest_core_dependencies.txt

    - name: Create Pull Request
      uses: peter-evans/create-pull-request@v3
      with:
        token: ${{ secrets.REPO_SCOPED_TOKEN }}
        commit-message: Update latest dependencies
        title: Automated Latest Dependency Updates
        author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
        body: "This is an auto-generated PR with **latest** dependency updates.
               Please do not delete the `latest-dep-update` branch because it's needed by the auto-dependency bot."
        branch: latest-dep-update
        branch-suffix: short-commit-hash
        base: main
        assignees: machineFL
        reviewers: machineAYX


================================================
FILE: .github/workflows/lint_check.yml
================================================
on:
  pull_request:
    types: [opened, synchronize]
  push:
    branches:
      - main

name: Lint Check
jobs:
  lint_test:
    name: ${{ matrix.python_version }} lint check
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python_version: ["3.11"]
    steps:
      - name: Set up python ${{ matrix.python_version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python_version }}
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.head.ref }}
          repository: ${{ github.event.pull_request.head.repo.full_name }}
      - name: Build compose package
        run: make package
      - name: Install compose with dev, and test requirements
        run: |
          pip config --site set global.progress_bar off
          python -m pip install --upgrade pip
          python -m pip install .[dev]
      - name: Run lint test
        run: make lint


================================================
FILE: .github/workflows/release.yml
================================================
on:
  release:
    types: [published]

name: Release
jobs:
  pypi:
    name: Release to PyPI
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Remove docs and tests before release
      run: |
        rm -rf docs/
    - name: Upload to PyPI
      uses: FeatureLabs/gh-action-pypi-upload@v2
      env:
        PYPI_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
        PYPI_USERNAME: ${{ secrets.PYPI_USERNAME }}
        TEST_PYPI_USERNAME: ${{ secrets.TEST_PYPI_USERNAME }}
        TEST_PYPI_PASSWORD: ${{ secrets.TEST_PYPI_PASSWORD }}


================================================
FILE: .github/workflows/release_notes_updated.yml
================================================
name: Release Notes Updated

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  release_notes_updated:
    name: release notes updated
    runs-on: ubuntu-latest
    steps:
      - name: Check for development branch
        id: branch
        shell: python
        run: |
          from re import compile
          main = '^main$'
          release = '^release_v\d+\.\d+\.\d+$'
          dep_update = '^latest-dep-update-[a-f0-9]{7}$'
          min_dep_update = '^min-dep-update-[a-f0-9]{7}$'
          regex = main, release, dep_update, min_dep_update
          patterns = list(map(compile, regex))
          ref = "${{ github.event.pull_request.head.ref }}"
          is_dev = not any(pattern.match(ref) for pattern in patterns)
          print('::set-output name=is_dev::' + str(is_dev))

      - name: Checkout repository
        if: ${{ steps.branch.outputs.is_dev == 'True' }}
        uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.head.ref }}
          repository: ${{ github.event.pull_request.head.repo.full_name }}

      - name: Check if release notes were updated
        if: ${{ steps.branch.outputs.is_dev == 'True' }}
        run: cat docs/source/release_notes.rst | grep ":pr:\`${{ github.event.number }}\`"


================================================
FILE: .github/workflows/unit_tests_with_latest_deps.yml
================================================
on:
  pull_request:
    types: [opened, synchronize]
  push:
    branches:
      - main

name: Unit Tests - Latest Dependencies
jobs:
  unit_tests:
    name: Unit Tests / Python ${{ matrix.python-version }}
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10", "3.11"]
    steps:
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.head.ref }}
          repository: ${{ github.event.pull_request.head.repo.full_name }}
      - name: Build source distribution
        run: make package
      - name: Install package with test requirements
        run: |
          python -m pip config --site set global.progress_bar off
          python -m pip install --upgrade pip
          python -m pip install unpacked_sdist/[test]
      - if: ${{ matrix.python-version == 3.8 }}
        name: Run unit tests with code coverage
        run: |
          coverage erase
          cd unpacked_sdist/
          pytest composeml/ -n auto --cov=composeml --cov-config=../pyproject.toml --cov-report=xml:../coverage.xml
      - if: ${{ matrix.python-version != 3.8 }}
        name: Run unit tests with no code coverage
        run: |
          cd unpacked_sdist/
          pytest composeml/ -n auto
      - if: ${{ matrix.python-version == 3.8 }}
        name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          fail_ci_if_error: true
          files: ${{ github.workspace }}/coverage.xml
          verbose: true


================================================
FILE: .gitignore
================================================
cb_model.json
.DS_Store

# IDE
.vscode
docs/source/examples/demo/*/download

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/
docs/source/generated

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/


================================================
FILE: .pre-commit-config.yaml
================================================
exclude: |
  (?x)
  .html$|.csv$|.svg$|.md$|.txt$|.json$|.xml$|.pickle$|^.github/|
  (LICENSE.*|README.*)
default_stages: [commit]
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: 'v4.3.0'
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/kynan/nbstripout
    rev: 0.5.0
    hooks:
      - id: nbstripout
        entry: nbstripout
        language: python
        types: [jupyter]
  - repo: https://github.com/MarcoGorelli/absolufy-imports
    rev: 'v0.3.1'
    hooks:
      - id: absolufy-imports
        files: ^composeml/
  - repo: https://github.com/asottile/add-trailing-comma
    rev: v2.2.3
    hooks:
      - id: add-trailing-comma
        name: Add trailing comma
  - repo: https://github.com/python/black
    rev: 22.12.0
    hooks:
      - id: black
        args:
          - --config=./pyproject.toml
        additional_dependencies: [".[jupyter]"]
        types_or: [python, jupyter]
  - repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: 'v0.0.231'
    hooks:
      - id: ruff
        args:
          - --config=./pyproject.toml
          - --fix


================================================
FILE: .readthedocs.yaml
================================================
# .readthedocs.yml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/source/conf.py
  fail_on_warning: true

# Optionally build your docs in additional formats such as PDF and ePub
formats: []

# Optionally set the version of Python and requirements required to build your docs
python:
  version: "3.8"
  install:
    - method: pip
      path: .
      extra_requirements:
        - dev
        - docs


================================================
FILE: LICENSE
================================================
BSD 3-Clause License

Copyright (c) 2017, Feature Labs, Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


================================================
FILE: Makefile
================================================
.PHONY: clean
clean:
	find . -name '*.pyo' -delete
	find . -name '*.pyc' -delete
	find . -name __pycache__ -delete
	find . -name '*~' -delete
	find . -name '.coverage.*' -delete

.PHONY: lint
lint:
	black . --check --config=./pyproject.toml
	ruff . --config=./pyproject.toml

.PHONY: lint-fix
lint-fix:
	black . --config=./pyproject.toml
	ruff . --fix --config=./pyproject.toml

.PHONY: test
test:
	python -m pytest composeml/ -n auto

.PHONY: testcoverage
testcoverage:
	python -m pytest composeml/ --cov=composeml -n auto

.PHONY: installdeps
installdeps: upgradepip
	pip install -e ".[dev]"

.PHONY: checkdeps
checkdeps:
	$(eval allow_list='matplotlib|pandas|seaborn|woodwork|featuretools|evalml|tqdm')
	pip freeze | grep -v "alteryx/compose.git" | grep -E $(allow_list) > $(OUTPUT_FILEPATH)

.PHONY: upgradepip
upgradepip:
	python -m pip install --upgrade pip

.PHONY: upgradebuild
upgradebuild:
	python -m pip install --upgrade build

.PHONY: upgradesetuptools
upgradesetuptools:
	python -m pip install --upgrade setuptools

.PHONY: package
package: upgradepip upgradebuild upgradesetuptools
	python -m build
	$(eval PACKAGE=$(shell python -c 'import setuptools; setuptools.setup()' --version))
	tar -zxvf "dist/composeml-${PACKAGE}.tar.gz"
	mv "composeml-${PACKAGE}" unpacked_sdist


================================================
FILE: README.md
================================================
<p align="center"><img width=50% src="https://raw.githubusercontent.com/alteryx/compose/main/docs/source/images/compose.png" alt="Compose" /></p>
<p align="center"><i>"Build better training examples in a fraction of the time."</i></p>
<p align="center">
    <a href="https://github.com/alteryx/compose/actions?query=workflow%3ATests" target="_blank">
        <img src="https://github.com/alteryx/compose/workflows/Tests/badge.svg" alt="Tests" />
    </a>
    <a href="https://codecov.io/gh/alteryx/compose">
        <img src="https://codecov.io/gh/alteryx/compose/branch/main/graph/badge.svg?token=mDz4ueTUEO"/>
    </a>
    <a href="https://compose.alteryx.com/en/stable/?badge=stable" target="_blank">
        <img src="https://readthedocs.com/projects/feature-labs-inc-compose/badge/?version=stable&token=5c3ace685cdb6e10eb67828a4dc74d09b20bb842980c8ee9eb4e9ed168d05b00"
            alt="ReadTheDocs" />
    </a>
    <a href="https://badge.fury.io/py/composeml" target="_blank">
        <img src="https://badge.fury.io/py/composeml.svg?maxAge=2592000" alt="PyPI Version" />
    </a>
    <a href="https://stackoverflow.com/questions/tagged/compose-ml" target="_blank">
        <img src="https://img.shields.io/badge/questions-on_stackoverflow-blue.svg?" alt="StackOverflow" />
    </a>
    <a href="https://pepy.tech/project/composeml" target="_blank">
        <img src="https://pepy.tech/badge/composeml/month" alt="PyPI Downloads" />
    </a>
</p>
<hr>

[Compose](https://compose.alteryx.com) is a machine learning tool for automated prediction engineering. It allows you to structure prediction problems and generate labels for supervised learning. An end user defines an outcome of interest by writing a *labeling function*, then runs a search to automatically extract training examples from historical data. Its result is then provided to [Featuretools](https://docs.featuretools.com/) for automated feature engineering and subsequently to [EvalML](https://evalml.alteryx.com/) for automated machine learning. The workflow of an applied machine learning engineer then becomes:

<br><p align="center"><img width=90% src="https://raw.githubusercontent.com/alteryx/compose/main/docs/source/images/workflow.png" alt="Compose" /></p><br>

By automating the early stage of the machine learning pipeline, our end user can easily define a task and solve it. See the [documentation](https://compose.alteryx.com) for more information.

## Installation
Install with pip

```
python -m pip install composeml
```

or from the Conda-forge channel on [conda](https://anaconda.org/conda-forge/composeml):

```
conda install -c conda-forge composeml
```

### Add-ons

**Update checker** - Receive automatic notifications of new Compose releases

```
python -m pip install "composeml[update_checker]"
```

## Example
> Will a customer spend more than 300 in the next hour of transactions?

In this example, we automatically generate new training examples from a historical dataset of transactions.

```python
import composeml as cp
df = cp.demos.load_transactions()
df = df[df.columns[:7]]
df.head()
```

<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>transaction_id</th>
      <th>session_id</th>
      <th>transaction_time</th>
      <th>product_id</th>
      <th>amount</th>
      <th>customer_id</th>
      <th>device</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>298</td>
      <td>1</td>
      <td>2014-01-01 00:00:00</td>
      <td>5</td>
      <td>127.64</td>
      <td>2</td>
      <td>desktop</td>
    </tr>
    <tr>
      <td>10</td>
      <td>1</td>
      <td>2014-01-01 00:09:45</td>
      <td>5</td>
      <td>57.39</td>
      <td>2</td>
      <td>desktop</td>
    </tr>
    <tr>
      <td>495</td>
      <td>1</td>
      <td>2014-01-01 00:14:05</td>
      <td>5</td>
      <td>69.45</td>
      <td>2</td>
      <td>desktop</td>
    </tr>
    <tr>
      <td>460</td>
      <td>10</td>
      <td>2014-01-01 02:33:50</td>
      <td>5</td>
      <td>123.19</td>
      <td>2</td>
      <td>tablet</td>
    </tr>
    <tr>
      <td>302</td>
      <td>10</td>
      <td>2014-01-01 02:37:05</td>
      <td>5</td>
      <td>64.47</td>
      <td>2</td>
      <td>tablet</td>
    </tr>
  </tbody>
</table>

First, we represent the prediction problem with a labeling function and a label maker.

```python
def total_spent(ds):
    return ds['amount'].sum()

label_maker = cp.LabelMaker(
    target_dataframe_index="customer_id",
    time_index="transaction_time",
    labeling_function=total_spent,
    window_size="1h",
)
```

Then, we run a search to automatically generate the training examples.

```python
label_times = label_maker.search(
    df.sort_values('transaction_time'),
    num_examples_per_instance=2,
    minimum_data='2014-01-01',
    drop_empty=False,
    verbose=False,
)

label_times = label_times.threshold(300)
label_times.head()
```

<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>customer_id</th>
      <th>time</th>
      <th>total_spent</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>2014-01-01 00:00:00</td>
      <td>True</td>
    </tr>
    <tr>
      <td>1</td>
      <td>2014-01-01 01:00:00</td>
      <td>True</td>
    </tr>
    <tr>
      <td>2</td>
      <td>2014-01-01 00:00:00</td>
      <td>False</td>
    </tr>
    <tr>
      <td>2</td>
      <td>2014-01-01 01:00:00</td>
      <td>False</td>
    </tr>
    <tr>
      <td>3</td>
      <td>2014-01-01 00:00:00</td>
      <td>False</td>
    </tr>
  </tbody>
</table>

We now have labels that are ready to use in [Featuretools](https://docs.featuretools.com/) to generate features.

## Support

The Innovation Labs open source community is happy to provide support to users of Compose. Project support can be found in three places depending on the type of question:

1. For usage questions, use [Stack Overflow](https://stackoverflow.com/questions/tagged/compose-ml) with the `composeml` tag.
2. For bugs, issues, or feature requests start a Github [issue](https://github.com/alteryx/compose/issues/new).
3. For discussion regarding development on the core library, use [Slack](https://join.slack.com/t/alteryx-oss/shared_invite/zt-182tyvuxv-NzIn6eiCEf8TBziuKp0bNA).
4. For everything else, the core developers can be reached by email at open_source_support@alteryx.com

## Citing Compose
Compose is built upon a newly defined part of the machine learning process — prediction engineering. If you use Compose, please consider citing this paper:
James Max Kanter, Gillespie, Owen, Kalyan Veeramachaneni. [Label, Segment,Featurize: a cross domain framework for prediction engineering.](https://dai.lids.mit.edu/wp-content/uploads/2017/10/Pred_eng1.pdf) IEEE DSAA 2016.

BibTeX entry:

```bibtex
@inproceedings{kanter2016label,
  title={Label, segment, featurize: a cross domain framework for prediction engineering},
  author={Kanter, James Max and Gillespie, Owen and Veeramachaneni, Kalyan},
  booktitle={2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)},
  pages={430--439},
  year={2016},
  organization={IEEE}
}
```

## Acknowledgements 

The open source development has been supported in part by DARPA's Data driven discovery of models program (D3M). 

## Alteryx

**Compose** is an open source project maintained by [Alteryx](https://www.alteryx.com). We developed Compose to enable flexible definition of the machine learning task. To see the other open source projects we’re working on visit [Alteryx Open Source](https://www.alteryx.com/open-source). If building impactful data science pipelines is important to you or your business, please get in touch.

<p align="center">
  <a href="https://www.alteryx.com/open-source">
    <img src="https://alteryx-oss-web-images.s3.amazonaws.com/OpenSource_Logo-01.png" alt="Alteryx Open Source" width="800"/>
  </a>
</p>


================================================
FILE: composeml/__init__.py
================================================
# flake8:noqa
from composeml.version import __version__
from composeml import demos, update_checker
from composeml.label_maker import LabelMaker
from composeml.label_times import LabelTimes, read_label_times


================================================
FILE: composeml/conftest.py
================================================
import pandas as pd
import pytest

from composeml import LabelTimes
from composeml.tests.utils import read_csv


@pytest.fixture(scope="session")
def transactions():
    df = read_csv(
        data=[
            "time,amount,customer_id",
            "2019-01-01 08:00:00,1,0",
            "2019-01-01 08:30:00,1,0",
            "2019-01-01 09:00:00,1,1",
            "2019-01-01 09:30:00,1,1",
            "2019-01-01 10:00:00,1,1",
            "2019-01-01 10:30:00,1,2",
            "2019-01-01 11:00:00,1,2",
            "2019-01-01 11:30:00,1,2",
            "2019-01-01 12:00:00,1,2",
            "2019-01-01 12:30:00,1,3",
        ],
    )
    return df


@pytest.fixture(scope="session")
def total_spent_fn():
    def total_spent(df):
        value = df.amount.sum()
        return value

    return total_spent


@pytest.fixture(scope="session")
def unique_amounts_fn():
    def unique_amounts(df):
        return df.amount.nunique()

    return unique_amounts


@pytest.fixture
def total_spent():
    data = [
        "customer_id,time,total_spent",
        "0,2019-01-01 08:00:00,9",
        "0,2019-01-01 08:30:00,8",
        "1,2019-01-01 09:00:00,7",
        "1,2019-01-01 09:30:00,6",
        "1,2019-01-01 10:00:00,5",
        "2,2019-01-01 10:30:00,4",
        "2,2019-01-01 11:00:00,3",
        "2,2019-01-01 11:30:00,2",
        "2,2019-01-01 12:00:00,1",
        "3,2019-01-01 12:30:00,0",
    ]

    data = read_csv(data, parse_dates=["time"])

    kwargs = {
        "data": data,
        "target_columns": ["total_spent"],
        "target_dataframe_index": "customer_id",
        "search_settings": {
            "num_examples_per_instance": -1,
        },
    }

    label_times = LabelTimes(**kwargs)
    return label_times


@pytest.fixture
def labels():
    records = [
        {
            "label_id": 0,
            "customer_id": 1,
            "time": "2014-01-01 00:45:00",
            "my_labeling_function": 226.92999999999998,
        },
        {
            "label_id": 1,
            "customer_id": 1,
            "time": "2014-01-01 00:48:00",
            "my_labeling_function": 47.95,
        },
        {
            "label_id": 2,
            "customer_id": 2,
            "time": "2014-01-01 00:01:00",
            "my_labeling_function": 283.46000000000004,
        },
        {
            "label_id": 3,
            "customer_id": 2,
            "time": "2014-01-01 00:04:00",
            "my_labeling_function": 31.54,
        },
    ]

    dtype = {"time": "datetime64[ns]"}
    values = pd.DataFrame(records).astype(dtype).set_index("label_id")
    values = values[["customer_id", "time", "my_labeling_function"]]
    values = LabelTimes(
        values,
        target_columns=["my_labeling_function"],
        target_dataframe_index="customer_id",
    )
    return values


@pytest.fixture(autouse=True)
def add_labels(doctest_namespace, labels):
    doctest_namespace["labels"] = labels


================================================
FILE: composeml/data_slice/__init__.py
================================================
# flake8:noqa
from composeml.data_slice.generator import DataSliceGenerator


================================================
FILE: composeml/data_slice/extension.py
================================================
import pandas as pd

from composeml.data_slice.offset import DataSliceOffset, DataSliceStep


class DataSliceContext:
    """Tracks contextual attributes about a data slice."""

    def __init__(
        self,
        slice_number=0,
        slice_start=None,
        slice_stop=None,
        next_start=None,
    ):
        """Creates the data slice context.

        Args:
            slice_number (int): The latest count of data slices.
            slice_start (int or Timestamp): When the data slice starts.
            slice_stop (int or Timestamp): When the data slice stops.
            next_start (int or Timestamp): When the next data slice starts.
        """
        self.next_start = next_start
        self.slice_stop = slice_stop
        self.slice_start = slice_start
        self.slice_number = slice_number

    def __repr__(self):
        """Represents the data slice context as a string."""
        return self._series.fillna("").to_string()

    @property
    def _series(self):
        """Represents the data slice context as a pandas series."""
        keys = reversed(list(vars(self)))
        attrs = {key: getattr(self, key) for key in keys}
        context = pd.Series(attrs, name="context")
        return context

    @property
    def count(self):
        """Alias for the data slice number."""
        return self.slice_number

    @property
    def start(self):
        """Alias for the start point of a data slice."""
        return self.slice_start

    @property
    def stop(self):
        """Alias for the stopping point of a data slice."""
        return self.slice_stop


class DataSliceFrame(pd.DataFrame):
    """Subclasses pandas data frame for data slice."""

    _metadata = ["context"]

    @property
    def _constructor(self):
        return DataSliceFrame

    @property
    def ctx(self):
        """Alias for the data slice context."""
        return self.context


@pd.api.extensions.register_dataframe_accessor("slice")
class DataSliceExtension:
    def __init__(self, df):
        self._df = df

    def __call__(self, size=None, start=None, stop=None, step=None, drop_empty=True):
        """Returns a data slice generator based on the data frame.

        Args:
            size (int or str): The size of each data slice. A string represents a timedelta or frequency.
                An integer represents the number of rows. The default value is the length of the data frame.
            start (int or str): Where to start the first data slice.
            stop (int or str): Where to stop generating data slices.
            step (int or str): The step size between data slices. The default value is the data slice size.
            drop_empty (bool): Whether to drop empty data slices. The default value is True.

        Returns:
            ds (generator): Returns a generator of data slices.
        """
        self._check_index()
        offsets = self._check_offsets(size, start, stop, step)
        generator = self._apply(*offsets, drop_empty=drop_empty)
        return generator

    def __getitem__(self, offset):
        """Generates data slices from a slice object."""
        if not isinstance(offset, slice):
            raise TypeError("must be a slice object")
        return self(size=offset.step, start=offset.start, stop=offset.stop)

    def _apply(self, size, start, stop, step, drop_empty=True):
        """Generates data slices based on the data frame."""
        df = self._apply_start(self._df, start, step)
        if df.empty and drop_empty:
            return df

        df, slice_number = DataSliceFrame(df), 1
        while start.value and start.value <= stop.value:
            if df.empty and drop_empty:
                break
            ds = self._apply_size(df, start, size)
            df = self._apply_step(df, start, step)
            if ds.empty and drop_empty:
                continue
            ds.context.next_start = start.value
            ds.context.slice_number = slice_number
            slice_number += 1
            yield ds

    def _apply_size(self, df, start, size):
        """Returns a data slice calculated by the offsets."""
        if size._is_offset_position:
            index = self._get_index(df, size.value)
            stop = index or self._last_index
            ds = df.iloc[: size.value]
        else:
            stop = start.value + size.value
            ds = df[:stop]

            # Pandas includes both endpoints when slicing by time.
            # This results in the right endpoint overlapping in consecutive data slices.
            # Resolved by making the right endpoint exclusive.
            # https://pandas.pydata.org/pandas-docs/version/0.19/gotchas.html#endpoints-are-inclusive

            if not ds.empty:
                overlap = ds.index == stop
                if overlap.any():
                    ds = ds[~overlap]

        ds.context = DataSliceContext(slice_start=start.value, slice_stop=stop)
        return ds

    def _apply_start(self, df, start, step):
        """Removes data before the index calculated by the offset."""
        inplace = start.value == self._first_index
        if start._is_offset_position and not inplace:
            df = df.iloc[start.value :]
            first_index = df.first_valid_index()
            start.value = self._first_index = first_index

        if start._is_offset_timestamp and not inplace:
            df = df[df.index >= start.value]
            if step._is_offset_position:
                first_index = df.first_valid_index()
                start.value = self._first_index = first_index

        return df

    def _apply_step(self, df, start, step):
        """Strides the first index by the offset."""
        if step._is_offset_position:
            df = df.iloc[step.value :]
            first_index = df.first_valid_index()
            start.value = first_index
        else:
            start.value += step.value
            df = df[start.value :]

        return df

    def _check_index(self):
        """Checks if index values are null or unsorted."""
        null = self._df.index.isnull().any()
        assert not null, "index contains null values"
        assert self._is_sorted, "data frame must be sorted chronologically"
        self._first_index = self._df.first_valid_index()
        self._last_index = self._df.last_valid_index()

    def _check_offsets(self, size, start, stop, step):
        """Checks for valid data slice offsets."""
        size = self._check_size(size or len(self._df))
        start = self._check_start(start or self._first_index)
        stop = self._check_stop(stop or self._last_index)
        step = self._check_step(step or size)
        offsets = size, start, stop, step

        if any(offset._is_offset_frequency for offset in offsets):
            info = "offset by frequency requires a time index"
            assert self._is_time_index, info

        return offsets

    def _check_size(self, size):
        """Checks for valid offset size."""
        if not isinstance(size, DataSliceStep):
            size = DataSliceStep(size)

        assert size._is_positive, "offset must be positive"
        return size

    def _check_start(self, start):
        """Checks for valid offset start."""
        if not isinstance(start, DataSliceOffset):
            start = DataSliceOffset(start)

        if start._is_offset_frequency:
            start.value += self._first_index

        return start

    def _check_step(self, step):
        """Checks for valid offset step."""
        if not isinstance(step, DataSliceStep):
            step = DataSliceStep(step)

        assert step._is_positive, "offset must be positive"
        return step

    def _check_stop(self, stop):
        """Checks for valid offset stop."""
        if not isinstance(stop, DataSliceOffset):
            stop = DataSliceOffset(stop)

        if stop._is_offset_frequency:
            base = "first" if stop._is_positive else "last"
            value = getattr(self, f"_{base}_index")
            stop.value += value

        inplace = stop.value == self._last_index
        if stop._is_offset_position and not inplace:
            index = self._get_index(self._df, stop.value)
            stop.value = index or self._last_index

        return stop

    def _get_index(self, df, i):
        """Helper function for getting index values."""
        if i < df.index.size and df.index.size > 0:
            return df.index[i]

    @property
    def _is_sorted(self):
        """Whether index values are sorted."""
        return self._df.index.is_monotonic_increasing

    @property
    def _is_time_index(self):
        """Whether the data frame has a time index type."""
        return pd.api.types.is_datetime64_any_dtype(self._df.index)


================================================
FILE: composeml/data_slice/generator.py
================================================
from composeml.data_slice.extension import DataSliceContext, DataSliceFrame


class DataSliceGenerator:
    """Generates data slices for the lable maker."""

    def __init__(
        self,
        window_size,
        gap=None,
        min_data=None,
        max_data=None,
        drop_empty=True,
    ):
        self.window_size = window_size
        self.gap = gap
        self.min_data = min_data
        self.max_data = max_data
        self.drop_empty = drop_empty

    def __call__(self, df):
        """Applies the data slice generator to the data frame."""
        is_column = self.window_size in df
        method = "column" if is_column else "time"
        attr = f"_slice_by_{method}"
        return getattr(self, attr)(df)

    def _slice_by_column(self, df):
        """Slices the data frame by an existing column."""
        slices = df.groupby(self.window_size, sort=False)
        slice_number = 1

        for group, ds in slices:
            ds = DataSliceFrame(ds)
            ds.context = DataSliceContext(
                slice_number=slice_number,
                slice_start=ds.first_valid_index(),
                slice_stop=ds.last_valid_index(),
            )
            setattr(ds.context, self.window_size, group)
            del ds.context.next_start
            slice_number += 1
            yield ds

    def _slice_by_time(self, df):
        """Slices the data frame along the time index."""
        data_slices = df.slice(
            size=self.window_size,
            start=self.min_data,
            stop=self.max_data,
            step=self.gap,
            drop_empty=self.drop_empty,
        )

        for ds in data_slices:
            yield ds


================================================
FILE: composeml/data_slice/offset.py
================================================
import re

import pandas as pd


class DataSliceOffset:
    """Offsets for calculating data slice indices."""

    def __init__(self, value):
        self.value = value
        self._check()

    def _check(self):
        """Checks if the value is a valid offset."""
        if isinstance(self.value, str):
            self._parse_value()
        assert self._is_valid_offset, self._invalid_offset_error

    @property
    def _is_offset_base(self):
        """Whether offset is a base type."""
        return issubclass(type(self.value), pd.tseries.offsets.BaseOffset)

    @property
    def _is_offset_position(self):
        """Whether offset is integer-location based."""
        return pd.api.types.is_integer(self.value)

    @property
    def _is_offset_timedelta(self):
        """Whether offset is a timedelta."""
        return isinstance(self.value, pd.Timedelta)

    @property
    def _is_offset_timestamp(self):
        """Whether offset is a timestamp."""
        return isinstance(self.value, pd.Timestamp)

    @property
    def _is_offset_frequency(self):
        """Whether offset is a base type or timedelta."""
        value = self._is_offset_base
        value |= self._is_offset_timedelta
        return value

    def __int__(self):
        """Typecasts offset value to an integer."""
        if self._is_offset_position:
            return self.value
        elif self._is_offset_base:
            return self.value.n
        elif self._is_offset_timedelta:
            return self.value.value
        else:
            raise TypeError("offset must be position or frequency based")

    def __float__(self):
        """Typecasts offset value to a float."""
        if self._is_offset_timestamp:
            return self.value.timestamp()
        else:
            raise TypeError("offset must be a timestamp")

    @property
    def _is_positive(self):
        """Whether the offset value is positive."""
        timestamp = self._is_offset_timestamp
        numeric = float if timestamp else int
        return numeric(self) > 0

    @property
    def _is_valid_offset(self):
        """Whether offset is a valid type."""
        value = self._is_offset_position
        value |= self._is_offset_frequency
        value |= self._is_offset_timestamp
        return value

    @property
    def _invalid_offset_error(self):
        """Returns message for invalid offset."""
        info = "offset must be position or time based\n\n"
        info += "\tFor information about offset aliases, visit the link below.\n"
        info += (
            "\thttps://pandas.pydata.org/docs/user_guide/timeseries.html#offset-aliases"
        )
        return info

    def _parse_offset_alias(self, alias):
        """Parses an alias to an offset."""
        value = self._parse_offset_alias_phrase(alias)
        value = value or pd.tseries.frequencies.to_offset(alias)
        return value

    def _parse_offset_alias_phrase(self, value):
        """Parses an alias phrase to an offset."""
        pattern = re.compile("until start of next (?P<unit>[a-z]+)")
        match = pattern.search(value.lower())

        if match:
            match = match.groupdict()
            unit = match["unit"]

            if unit == "month":
                return pd.offsets.MonthBegin()

            if unit == "year":
                return pd.offsets.YearBegin()

    def _parse_value(self):
        """Parses the value to an offset."""
        for parser in self._parsers:
            try:
                value = parser(self.value)
                if value is not None:
                    self.value = value
                    break
            except Exception:
                continue

    @property
    def _parsers(self):
        """Returns the value parsers."""
        return pd.Timestamp, self._parse_offset_alias, pd.Timedelta


class DataSliceStep(DataSliceOffset):
    @property
    def _is_valid_offset(self):
        """Whether offset is a valid type."""
        value = self._is_offset_position
        value |= self._is_offset_frequency
        return value

    @property
    def _parsers(self):
        """Returns the value parsers."""
        return self._parse_offset_alias, pd.Timedelta


================================================
FILE: composeml/demos/__init__.py
================================================
import os

import pandas as pd

DATA = os.path.join(os.path.dirname(__file__))


def load_transactions():
    path = os.path.join(DATA, "transactions.csv")
    df = pd.read_csv(path, parse_dates=["transaction_time"])
    return df


================================================
FILE: composeml/demos/transactions.csv
================================================
transaction_id,session_id,transaction_time,product_id,amount,customer_id,device,session_start,zip_code,join_date,date_of_birth,brand
298,1,2014-01-01 00:00:00,5,127.64,2,desktop,2014-01-01 00:00:00,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
10,1,2014-01-01 00:09:45,5,57.39,2,desktop,2014-01-01 00:00:00,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
495,1,2014-01-01 00:14:05,5,69.45,2,desktop,2014-01-01 00:00:00,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
460,10,2014-01-01 02:33:50,5,123.19,2,tablet,2014-01-01 02:31:40,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
302,10,2014-01-01 02:37:05,5,64.47,2,tablet,2014-01-01 02:31:40,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
212,10,2014-01-01 02:41:25,5,52.28,2,tablet,2014-01-01 02:31:40,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
440,10,2014-01-01 02:44:40,5,50.45,2,tablet,2014-01-01 02:31:40,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
405,15,2014-01-01 03:42:05,5,47.39,2,desktop,2014-01-01 03:41:00,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
180,15,2014-01-01 03:48:35,5,146.81,2,desktop,2014-01-01 03:41:00,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
220,16,2014-01-01 03:55:05,5,135.48,2,desktop,2014-01-01 03:49:40,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
253,17,2014-01-01 04:00:30,5,41.95,2,tablet,2014-01-01 04:00:30,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
340,17,2014-01-01 04:08:05,5,100.99,2,tablet,2014-01-01 04:00:30,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
301,31,2014-01-01 07:49:05,5,66.86,2,mobile,2014-01-01 07:42:35,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
346,31,2014-01-01 07:51:15,5,18.81,2,mobile,2014-01-01 07:42:35,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
161,31,2014-01-01 07:55:35,5,75.96,2,mobile,2014-01-01 07:42:35,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
420,31,2014-01-01 07:59:55,5,66.1,2,mobile,2014-01-01 07:42:35,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
468,33,2014-01-01 08:11:50,5,46.99,2,mobile,2014-01-01 08:10:45,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
281,33,2014-01-01 08:15:05,5,86.81,2,mobile,2014-01-01 08:10:45,13244,2012-04-15 23:31:04,1986-08-18 00:00:00,A
270,2,2014-01-01 00:18:25,5,123.53,5,mobile,2014-01-01 00:17:20,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
453,2,2014-01-01 00:19:30,5,9.32,5,mobile,2014-01-01 00:17:20,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
74,2,2014-01-01 00:23:50,5,90.69,5,mobile,2014-01-01 00:17:20,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
207,2,2014-01-01 00:24:55,5,48.27,5,mobile,2014-01-01 00:17:20,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
122,2,2014-01-01 00:27:05,5,13.81,5,mobile,2014-01-01 00:17:20,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
40,20,2014-01-01 04:46:00,5,53.22,5,desktop,2014-01-01 04:46:00,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
377,20,2014-01-01 05:01:10,5,83.33,5,desktop,2014-01-01 04:46:00,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
206,24,2014-01-01 05:48:50,5,61.3,5,tablet,2014-01-01 05:44:30,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
94,24,2014-01-01 05:55:20,5,100.42,5,tablet,2014-01-01 05:44:30,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
84,24,2014-01-01 05:57:30,5,75.75,5,tablet,2014-01-01 05:44:30,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
256,28,2014-01-01 06:51:40,5,101.39,5,mobile,2014-01-01 06:50:35,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
292,28,2014-01-01 06:53:50,5,138.17,5,mobile,2014-01-01 06:50:35,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
490,28,2014-01-01 07:07:55,5,149.02,5,mobile,2014-01-01 06:50:35,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
154,28,2014-01-01 07:09:00,5,44.11,5,mobile,2014-01-01 06:50:35,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
73,30,2014-01-01 07:29:35,5,42.94,5,desktop,2014-01-01 07:27:25,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
240,30,2014-01-01 07:40:25,5,59.71,5,desktop,2014-01-01 07:27:25,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
297,32,2014-01-01 08:04:15,5,20.65,5,mobile,2014-01-01 08:02:05,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
391,32,2014-01-01 08:07:30,5,57.88,5,mobile,2014-01-01 08:02:05,60091,2010-07-17 05:27:50,1984-07-28 00:00:00,A
461,3,2014-01-01 00:39:00,5,102.76,4,mobile,2014-01-01 00:28:10,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
44,3,2014-01-01 00:43:20,5,147.73,4,mobile,2014-01-01 00:28:10,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
327,5,2014-01-01 01:12:35,5,20.06,4,mobile,2014-01-01 01:11:30,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
48,5,2014-01-01 01:14:45,5,131.29,4,mobile,2014-01-01 01:11:30,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
442,5,2014-01-01 01:16:55,5,97.18,4,mobile,2014-01-01 01:11:30,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
285,8,2014-01-01 01:55:55,5,51.69,4,tablet,2014-01-01 01:55:55,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
225,8,2014-01-01 02:06:45,5,124.13,4,tablet,2014-01-01 01:55:55,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
254,11,2014-01-01 02:52:15,5,118.51,4,mobile,2014-01-01 02:47:55,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
465,11,2014-01-01 02:56:35,5,66.95,4,mobile,2014-01-01 02:47:55,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
487,11,2014-01-01 03:03:05,5,27.02,4,mobile,2014-01-01 02:47:55,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
117,12,2014-01-01 03:04:10,5,101.84,4,desktop,2014-01-01 03:04:10,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
196,12,2014-01-01 03:05:15,5,29.37,4,desktop,2014-01-01 03:04:10,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
226,21,2014-01-01 05:07:40,5,77.78,4,desktop,2014-01-01 05:02:15,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
288,21,2014-01-01 05:09:50,5,55.74,4,desktop,2014-01-01 05:02:15,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
494,21,2014-01-01 05:10:55,5,109.3,4,desktop,2014-01-01 05:02:15,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
380,21,2014-01-01 05:14:10,5,57.09,4,desktop,2014-01-01 05:02:15,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
236,21,2014-01-01 05:17:25,5,69.62,4,desktop,2014-01-01 05:02:15,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
87,21,2014-01-01 05:18:30,5,7.93,4,desktop,2014-01-01 05:02:15,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
109,22,2014-01-01 05:30:25,5,82.69,4,desktop,2014-01-01 05:21:45,60091,2011-04-08 20:08:14,2006-08-15 00:00:00,A
275,4,2014-01-01 00:45:30,5,108.11,1,mobile,2014-01-01 00:44:25,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
101,4,2014-01-01 00:46:35,5,112.53,1,mobile,2014-01-01 00:44:25,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
80,4,2014-01-01 00:47:40,5,6.29,1,mobile,2014-01-01 00:44:25,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
163,4,2014-01-01 00:52:00,5,31.37,1,mobile,2014-01-01 00:44:25,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
293,4,2014-01-01 00:53:05,5,82.88,1,mobile,2014-01-01 00:44:25,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
103,4,2014-01-01 00:57:25,5,20.79,1,mobile,2014-01-01 00:44:25,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
488,4,2014-01-01 01:03:55,5,129.0,1,mobile,2014-01-01 00:44:25,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
413,4,2014-01-01 01:05:00,5,119.98,1,mobile,2014-01-01 00:44:25,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
191,6,2014-01-01 01:31:00,5,139.23,1,tablet,2014-01-01 01:23:25,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
372,6,2014-01-01 01:37:30,5,114.84,1,tablet,2014-01-01 01:23:25,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
387,6,2014-01-01 01:38:35,5,49.71,1,tablet,2014-01-01 01:23:25,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
287,9,2014-01-01 02:28:25,5,50.94,1,desktop,2014-01-01 02:15:25,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
190,14,2014-01-01 03:29:05,5,110.52,1,tablet,2014-01-01 03:28:00,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
7,14,2014-01-01 03:39:55,5,107.42,1,tablet,2014-01-01 03:28:00,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
19,18,2014-01-01 04:14:35,5,133.49,1,desktop,2014-01-01 04:14:35,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
392,18,2014-01-01 04:17:50,5,72.67,1,desktop,2014-01-01 04:14:35,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
398,26,2014-01-01 06:18:05,5,27.95,1,tablet,2014-01-01 06:17:00,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
152,26,2014-01-01 06:26:45,5,42.81,1,tablet,2014-01-01 06:17:00,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
221,26,2014-01-01 06:31:05,5,7.08,1,tablet,2014-01-01 06:17:00,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
403,27,2014-01-01 06:35:25,5,28.26,1,mobile,2014-01-01 06:34:20,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
368,27,2014-01-01 06:36:30,5,139.43,1,mobile,2014-01-01 06:34:20,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
334,27,2014-01-01 06:38:40,5,54.26,1,mobile,2014-01-01 06:34:20,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
333,27,2014-01-01 06:44:05,5,103.2,1,mobile,2014-01-01 06:34:20,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
339,27,2014-01-01 06:45:10,5,26.56,1,mobile,2014-01-01 06:34:20,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
43,27,2014-01-01 06:47:20,5,55.26,1,mobile,2014-01-01 06:34:20,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
199,27,2014-01-01 06:48:25,5,5.91,1,mobile,2014-01-01 06:34:20,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
355,29,2014-01-01 07:11:10,5,110.68,1,mobile,2014-01-01 07:10:05,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
352,29,2014-01-01 07:13:20,5,92.43,1,mobile,2014-01-01 07:10:05,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
182,29,2014-01-01 07:16:35,5,125.73,1,mobile,2014-01-01 07:10:05,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
177,29,2014-01-01 07:19:50,5,55.11,1,mobile,2014-01-01 07:10:05,60091,2011-04-17 10:48:33,1994-07-18 00:00:00,A
259,7,2014-01-01 01:45:05,5,32.85,3,tablet,2014-01-01 01:39:40,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A
274,7,2014-01-01 01:46:10,5,14.45,3,tablet,2014-01-01 01:39:40,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A
214,7,2014-01-01 01:51:35,5,101.58,3,tablet,2014-01-01 01:39:40,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A
441,19,2014-01-01 04:30:50,5,9.34,3,desktop,2014-01-01 04:27:35,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A
146,19,2014-01-01 04:38:25,5,126.74,3,desktop,2014-01-01 04:27:35,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A
483,19,2014-01-01 04:43:50,5,60.17,3,desktop,2014-01-01 04:27:35,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A
159,23,2014-01-01 05:32:35,5,43.69,3,desktop,2014-01-01 05:32:35,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A
186,23,2014-01-01 05:40:10,5,128.26,3,desktop,2014-01-01 05:32:35,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A
378,25,2014-01-01 06:15:55,5,131.83,3,desktop,2014-01-01 05:59:40,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A
110,34,2014-01-01 08:24:50,5,145.74,3,desktop,2014-01-01 08:24:50,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A
497,34,2014-01-01 08:29:10,5,148.86,3,desktop,2014-01-01 08:24:50,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A
467,34,2014-01-01 08:32:25,5,145.19,3,desktop,2014-01-01 08:24:50,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A
267,34,2014-01-01 08:38:55,5,58.47,3,desktop,2014-01-01 08:24:50,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A
493,35,2014-01-01 08:48:40,5,132.94,3,mobile,2014-01-01 08:44:20,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A
338,35,2014-01-01 08:51:55,5,93.71,3,mobile,2014-01-01 08:44:20,13244,2011-08-13 15:42:34,2003-11-21 00:00:00,A


================================================
FILE: composeml/label_maker.py
================================================
from sys import stdout

from pandas import Series
from pandas.api.types import is_categorical_dtype
from tqdm import tqdm

from composeml.data_slice import DataSliceGenerator
from composeml.label_search import ExampleSearch, LabelSearch
from composeml.label_times import LabelTimes


class LabelMaker:
    """Automatically makes labels for prediction problems."""

    def __init__(
        self,
        target_dataframe_index,
        time_index,
        labeling_function=None,
        window_size=None,
    ):
        """Creates an instance of label maker.

        Args:
            target_dataframe_index (str): The index of the target dataframe, from which labels will be created.
            time_index (str): Name of time column in the data frame.
            labeling_function (function or list(function) or dict(str=function)): Function, list of functions, or dictionary of functions that transform a data slice.
                When set as a dictionary, the key is used as the name of the labeling function.
            window_size (str or int): Size of the data slices. As a string, the value can be a timedelta or a column in the data frame to group by.
                As an integer, the value can be the number of rows. Default value is all future data.
        """
        self.labeling_function = labeling_function or {}
        self.target_dataframe_index = target_dataframe_index
        self.time_index = time_index
        self.window_size = window_size

    def _name_labeling_function(self, function):
        """Gets the names of the labeling functions."""
        has_name = hasattr(function, "__name__")
        return function.__name__ if has_name else type(function).__name__

    def _check_labeling_function(self, function, name=None):
        """Checks whether the labeling function is callable."""
        assert callable(function), "labeling function must be callabe"
        return function

    @property
    def labeling_function(self):
        """Gets the labeling function(s)."""
        return self._labeling_function

    @labeling_function.setter
    def labeling_function(self, value):
        """Sets and formats the intial labeling function(s).

        Args:
            value (function or list(function) or dict(str=function)): Function that transforms a data slice to a label.
        """
        if isinstance(value, dict):
            for name, function in value.items():
                self._check_labeling_function(function)
                assert isinstance(name, str), "labeling function name must be string"

        if callable(value):
            value = [value]

        if isinstance(value, (tuple, list)):
            value = {
                self._name_labeling_function(function): self._check_labeling_function(
                    function,
                )
                for function in value
            }

        assert isinstance(value, dict), "value type for labeling function not supported"
        self._labeling_function = value

    def _check_cutoff_time(self, value):
        if isinstance(value, Series):
            if value.index.is_unique:
                return value.to_dict()
            else:
                raise ValueError("more than one cutoff time exists for a target group")
        else:
            return value

    def slice(
        self,
        df,
        num_examples_per_instance,
        minimum_data=None,
        maximum_data=None,
        gap=None,
        drop_empty=True,
    ):
        """Generates data slices of target dataframe.

        Args:
            df (DataFrame): Data frame to create slices on.
            num_examples_per_instance (int): Number of examples per unique instance of target dataframe.
            minimum_data (int or str or Series): The amount of data needed before starting the search. Defaults to the first value in the time index.
                The value can be a datetime string to directly set the first cutoff time or a timedelta string to denote the amount of data needed before
                the first cutoff time. The value can also be an integer to denote the number of rows needed before the first cutoff time.
                If a Series, minimum_data should be datetime string, timedelta string, or integer values with a unique set of target groups as the corresponding index.
            maximum_data (str): Maximum data before stopping the search. Default value is last time of index.
            gap (str or int): Time between examples. Default value is window size.
                If an integer, search will start on the first event after the minimum data.
            drop_empty (bool): Whether to drop empty slices. Default value is True.

        Returns:
            ds (generator): Returns a generator of data slices.
        """
        self._check_example_count(num_examples_per_instance, gap)
        df = self.set_index(df)
        target_groups = df.groupby(self.target_dataframe_index)
        num_examples_per_instance = ExampleSearch._check_number(
            num_examples_per_instance,
        )

        minimum_data = self._check_cutoff_time(minimum_data)
        minimum_data_varies = isinstance(minimum_data, dict)

        for group_key, df in target_groups:
            if minimum_data_varies:
                if group_key not in minimum_data:
                    continue
                min_data_for_group = minimum_data[group_key]
            else:
                min_data_for_group = minimum_data

            generator = DataSliceGenerator(
                window_size=self.window_size,
                min_data=min_data_for_group,
                max_data=maximum_data,
                drop_empty=drop_empty,
                gap=gap,
            )

            for ds in generator(df):
                setattr(ds.context, self.target_dataframe_index, group_key)
                yield ds

                if ds.context.slice_number >= num_examples_per_instance:
                    break

    @property
    def _bar_format(self):
        """Template to format the progress bar during a label search."""
        value = "Elapsed: {elapsed} | "
        value += "Remaining: {remaining} | "
        value += "Progress: {l_bar}{bar}| "
        value += self.target_dataframe_index + ": {n}/{total} "
        return value

    def _check_example_count(self, num_examples_per_instance, gap):
        """Checks whether example count corresponds to data slices."""
        if self.window_size is None and gap is None:
            more_than_one = num_examples_per_instance > 1
            assert (
                not more_than_one
            ), "must specify gap if num_examples > 1 and window size = none"

    def search(
        self,
        df,
        num_examples_per_instance,
        minimum_data=None,
        maximum_data=None,
        gap=None,
        drop_empty=True,
        verbose=True,
        *args,
        **kwargs,
    ):
        """Searches the data to calculates labels.

        Args:
            df (DataFrame): Data frame to search and extract labels.
            num_examples_per_instance (int or dict): The expected number of examples to return from each dataframe group.
                A dictionary can be used to further specify the expected number of examples to return from each label.
            minimum_data (int or str or Series): The amount of data needed before starting the search. Defaults to the first value in the time index.
                The value can be a datetime string to directly set the first cutoff time or a timedelta string to denote the amount of data needed before
                the first cutoff time. The value can also be an integer to denote the number of rows needed before the first cutoff time.
                If a Series, minimum_data should be datetime string, timedelta string, or integer values with a unique set of target groups as the corresponding index.
            maximum_data (str): Maximum data before stopping the search. Defaults to the last value in the time index.
            gap (str or int): Time between examples. Default value is window size.
                If an integer, search will start on the first event after the minimum data.
            drop_empty (bool): Whether to drop empty slices. Default value is True.
            verbose (bool): Whether to render progress bar. Default value is True.
            *args: Positional arguments for labeling function.
            **kwargs: Keyword arguments for labeling function.

        Returns:
            lt (LabelTimes): Calculated labels with cutoff times.
        """
        assert self.labeling_function, "missing labeling function(s)"
        self._check_example_count(num_examples_per_instance, gap)
        is_label_search = isinstance(num_examples_per_instance, dict)
        search = (LabelSearch if is_label_search else ExampleSearch)(
            num_examples_per_instance,
        )

        # check minimum data cutoff time
        minimum_data = self._check_cutoff_time(minimum_data)
        minimum_data_varies = isinstance(minimum_data, dict)

        df = self.set_index(df)
        total = search.expected_count if search.is_finite else 1
        # If the target is categorical, make sure there are no unused categories
        if is_categorical_dtype(df[self.target_dataframe_index]):
            df[self.target_dataframe_index] = df[
                self.target_dataframe_index
            ].cat.remove_unused_categories()
        target_groups = df.groupby(self.target_dataframe_index)
        total *= target_groups.ngroups

        progress_bar = tqdm(
            total=total,
            file=stdout,
            disable=not verbose,
            bar_format=self._bar_format,
        )

        records = []
        for group_count, (group_key, df) in enumerate(target_groups, start=1):
            if minimum_data_varies:
                if group_key not in minimum_data:
                    continue
                min_data_for_group = minimum_data[group_key]
            else:
                min_data_for_group = minimum_data

            generator = DataSliceGenerator(
                window_size=self.window_size,
                min_data=min_data_for_group,
                max_data=maximum_data,
                drop_empty=drop_empty,
                gap=gap,
            )

            for ds in generator(df):
                setattr(ds.context, self.target_dataframe_index, group_key)

                items = self.labeling_function.items()
                labels = {name: lf(ds, *args, **kwargs) for name, lf in items}
                valid_labels = search.is_valid_labels(labels)
                if not valid_labels:
                    continue

                records.append(
                    {
                        self.target_dataframe_index: group_key,
                        "time": ds.context.slice_start,
                        **labels,
                    },
                )

                search.update_count(labels)
                # if finite search, update progress bar for the example found
                if search.is_finite:
                    progress_bar.update(n=1)
                if search.is_complete:
                    break

            # if finite search, update progress bar for missing examples
            if search.is_finite:
                progress_bar.update(
                    n=group_count * search.expected_count - progress_bar.n,
                )
            else:
                progress_bar.update(
                    n=1,
                )  # otherwise, update progress bar once for each group
            search.reset_count()

        total -= progress_bar.n
        progress_bar.update(n=total)
        progress_bar.close()

        lt = LabelTimes(
            data=records,
            target_columns=list(self.labeling_function),
            target_dataframe_index=self.target_dataframe_index,
            search_settings={
                "num_examples_per_instance": num_examples_per_instance,
                "minimum_data": minimum_data,
                "maximum_data": str(maximum_data),
                "window_size": str(self.window_size),
                "gap": str(gap),
            },
        )

        return lt

    def set_index(self, df):
        """Sets the time index in a data frame (if not already set).

        Args:
            df (DataFrame): Data frame to set time index in.

        Returns:
            df (DataFrame): Data frame with time index set.
        """
        if df.index.name != self.time_index:
            df = df.set_index(self.time_index)

        if "time" not in str(df.index.dtype):
            df.index = df.index.astype("datetime64[ns]")

        return df


================================================
FILE: composeml/label_search.py
================================================
from collections import Counter

from pandas import isnull


class ExampleSearch:
    """A label search based on the number of examples.

    Args:
        expected_count (int): The expected number of examples to find.
    """

    def __init__(self, expected_count):
        self.expected_count = self._check_number(expected_count)
        self.reset_count()

    @staticmethod
    def _check_number(n):
        """Checks and formats the expected number of examples."""
        if n == -1 or n == "inf":
            return float("inf")
        else:
            info = "expected count must be numeric"
            assert isinstance(n, (int, float)), info
            return n

    @staticmethod
    def _is_finite_number(n):
        """Checks if a number if finite."""
        return n > 0 and abs(n) != float("inf")

    @property
    def is_complete(self):
        """Whether the search has found the expected number of examples."""
        return self.actual_count >= self.expected_count

    @property
    def is_finite(self):
        """Whether the expected number of examples is a finite number."""
        return self._is_finite_number(self.expected_count)

    def is_valid_labels(self, labels):
        """Whether the label values are not null."""
        return not any(map(isnull, labels.values()))

    def reset_count(self):
        """Reset the internal count of actual labels."""
        self.actual_count = 0

    def update_count(self, labels):
        """Update the internal count of actual labels."""
        self.actual_count += 1


class LabelSearch(ExampleSearch):
    """A label search based on the number of examples for each label.

    Args:
        expected_label_counts (dict): The expected number of examples to be find for each label.
            The dictionary should map a label to the number of examples to find for the label.
    """

    def __init__(self, expected_label_counts):
        items = expected_label_counts.items()
        self.expected_label_counts = Counter(
            {label: self._check_number(count) for label, count in items},
        )
        self.expected_count = sum(self.expected_label_counts.values())
        self.actual_label_counts = Counter()

    @property
    def is_complete(self):
        """Whether the search has found the expected number of examples for each label."""
        return len(self.expected_label_counts - self.actual_label_counts) == 0

    def is_complete_label(self, label):
        """Whether the search has found the expected number of examples for a label."""
        return (
            self.actual_label_counts.get(label, 0) >= self.expected_label_counts[label]
        )

    def is_valid_labels(self, labels):
        """Whether label values meet the search criteria.

        The search criteria is defined as label values that are not null, expected by the user, and have not reached the expected count.
        When these conditions are met by any label value, the labels are set to return to the user.
        This includes the other label values which share the same cutoff time.

        Args:
            labels (dict): The actual label values found during a search.

        Returns:
            value (bool): The value is True when valid, otherwise False.
        """
        label_values = labels.values()
        not_null = super().is_valid_labels(labels)
        is_expected = not_null and any(
            label in self.expected_label_counts for label in label_values
        )
        value = is_expected and any(
            not self.is_complete_label(label) for label in label_values
        )
        return value

    def reset_count(self):
        """Reset the internal count of actual labels."""
        self.actual_label_counts.clear()

    def update_count(self, labels):
        """Update the internal count of the actual labels.

        Args:
            labels (dict): The actual label values found during a search.
        """
        self.actual_label_counts.update(labels.values())


================================================
FILE: composeml/label_times/__init__.py
================================================
# flake8:noqa
from composeml.label_times.deserialize import read_label_times
from composeml.label_times.object import LabelTimes


================================================
FILE: composeml/label_times/description.py
================================================
import pandas as pd


def describe_label_times(label_times):
    """Prints out label info with transform settings that reproduce labels."""
    target_column = label_times.target_columns[0]
    is_discrete = label_times.is_discrete[target_column]

    if is_discrete:
        distribution = label_times[target_column].value_counts()
        distribution.sort_index(inplace=True)
        distribution.index = distribution.index.astype("str")
        distribution["Total:"] = distribution.sum()
    else:
        distribution = label_times[target_column].describe()

    print("Label Distribution\n" + "-" * 18, end="\n")
    print(distribution.to_string(), end="\n\n\n")

    metadata = label_times.settings
    target_column = metadata["label_times"]["target_columns"][0]
    target_type = metadata["label_times"]["target_types"][target_column]
    target_dataframe_index = metadata["label_times"]["target_dataframe_index"]

    settings = {
        "target_column": target_column,
        "target_dataframe_index": target_dataframe_index,
        "target_type": target_type,
    }

    settings.update(metadata["label_times"]["search_settings"])
    settings = pd.Series(settings)

    print("Settings\n" + "-" * 8, end="\n")
    settings.sort_index(inplace=True)
    print(settings.to_string(), end="\n\n\n")

    print("Transforms\n" + "-" * 10, end="\n")
    transforms = metadata["label_times"]["transforms"]
    for step, transform in enumerate(transforms):
        transform = pd.Series(transform)
        transform.sort_index(inplace=True)
        name = transform.pop("transform")
        transform = transform.add_prefix("  - ")
        transform = transform.add_suffix(":")
        transform = transform.to_string()
        header = "{}. {}\n".format(step + 1, name)
        print(header + transform, end="\n\n")

    if len(transforms) == 0:
        print("No transforms applied", end="\n\n")


================================================
FILE: composeml/label_times/deserialize.py
================================================
import json
import os

import pandas as pd

from composeml.label_times.object import LabelTimes


def read_config(path):
    """Reads config file from disk."""
    file = os.path.join(path, "settings.json")
    assert os.path.exists(file), "settings not found: '%s'" % file

    with open(file, "r") as file:
        settings = json.load(file)
        return settings


def read_data(path):
    """Reads data file from disk."""
    file = ""
    for file in os.listdir(path):
        if file.startswith("data"):
            break

    assert file.startswith("data"), "data not found"
    extension = os.path.splitext(file)[1].lstrip(".")
    info = "file extension must be csv, parquet, or pickle"
    assert extension in ["csv", "parquet", "pickle"], info

    read = getattr(pd, "read_%s" % extension)
    data = read(os.path.join(path, file))
    return data


def read_label_times(path, load_settings=True):
    """Reads label times from disk.

    Args:
        path (str): Directory where label times is stored.

    Returns:
        lt (LabelTimes): Deserialized label times.
    """
    kwargs = {}
    data = read_data(path)

    if load_settings:
        config = read_config(path)
        data = data.astype(config["dtypes"])
        kwargs.update(config["label_times"])

    lt = LabelTimes(data=data, **kwargs)
    return lt


================================================
FILE: composeml/label_times/object.py
================================================
import json
import os

import pandas as pd

from composeml.label_times.description import describe_label_times
from composeml.label_times.plots import LabelPlots
from composeml.version import __version__

SCHEMA_VERSION = "0.1.0"


class LabelTimes(pd.DataFrame):
    """The data frame that contains labels and cutoff times for the target dataframe."""

    def __init__(
        self,
        data=None,
        target_dataframe_index=None,
        target_types=None,
        target_columns=None,
        search_settings=None,
        transforms=None,
        *args,
        **kwargs,
    ):
        super().__init__(data=data, *args, **kwargs)
        self.target_dataframe_index = target_dataframe_index
        self.target_columns = target_columns or []
        self.target_types = target_types or {}
        self.search_settings = search_settings or {}
        self.transforms = transforms or []
        self.plot = LabelPlots(self)

        if not self.empty:
            self._check_label_times()

    def _assert_single_target(self):
        """Asserts that the label times object contains a single target."""
        info = "must first select an individual target"
        assert self._is_single_target, info

    def _check_target_columns(self):
        """Validates the target columns."""
        if not self.target_columns:
            self.target_columns = self._infer_target_columns()
        else:
            for target in self.target_columns:
                info = 'target "%s" not found in data frame'
                assert target in self.columns, info % target

    def _check_target_types(self):
        """Validates the target types."""
        if isinstance(self.target_types, dict):
            self.target_types = pd.Series(self.target_types, dtype="object")

        if self.target_types.empty:
            self.target_types = self._infer_target_types()
        else:
            target_names = self.target_types.index.tolist()
            match = target_names == self.target_columns
            assert match, "target names in types must match target columns"

    def _check_label_times(self):
        """Validates the lables times object."""
        self._check_target_columns()
        self._check_target_types()

    def _infer_target_columns(self):
        """Infers the names of the targets in the data frame.

        Returns:
            value (list): A list of the target names.
        """
        not_targets = [self.target_dataframe_index, "time"]
        target_columns = self.columns.difference(not_targets)
        assert not target_columns.empty, "target columns not found"
        value = target_columns.tolist()
        return value

    @property
    def _is_single_target(self):
        return len(self.target_columns) == 1

    def _get_target_type(self, dtype):
        is_discrete = pd.api.types.is_bool_dtype(dtype)
        is_discrete |= pd.api.types.is_categorical_dtype(dtype)
        is_discrete |= pd.api.types.is_object_dtype(dtype)
        value = "discrete" if is_discrete else "continuous"
        return value

    def _infer_target_types(self):
        """Infers the target type from the data type.

        Returns:
            types (Series): Inferred label type. Either "continuous" or "discrete".
        """
        dtypes = self.dtypes[self.target_columns]
        types = dtypes.apply(self._get_target_type)
        return types

    def select(self, target):
        """Selects one of the target variables.

        Args:
            target (str): The name of the target column.

        Returns:
            lt (LabelTimes): A label times object that contains a single target.

        Examples:
            Create a label times object that contains multiple target variables.

            >>> entity = [0, 0, 1, 1]
            >>> labels = [True, False, True, False]
            >>> time = ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04']
            >>> data = {'entity': entity, 'time': time, 'A': labels, 'B': labels}
            >>> lt = LabelTimes(data=data, target_dataframe_index='entity', target_columns=['A', 'B'])
            >>> lt
               entity        time      A      B
            0       0  2020-01-01   True   True
            1       0  2020-01-02  False  False
            2       1  2020-01-03   True   True
            3       1  2020-01-04  False  False

            Select a single target from the label times.

            >>> lt.select('B')
               entity        time      B
            0       0  2020-01-01   True
            1       0  2020-01-02  False
            2       1  2020-01-03   True
            3       1  2020-01-04  False
        """
        assert not self._is_single_target, "only one target exists"
        if not isinstance(target, str):
            raise TypeError("target name must be string")
        assert target in self.target_columns, 'target "%s" not found' % target

        lt = self.copy()
        lt.target_columns = [target]
        lt.target_types = lt.target_types[[target]]
        lt = lt[[self.target_dataframe_index, "time", target]]
        return lt

    @property
    def settings(self):
        """Returns metadata about the label times."""
        return {
            "compose_version": __version__,
            "schema_version": SCHEMA_VERSION,
            "label_times": {
                "target_dataframe_index": self.target_dataframe_index,
                "target_columns": self.target_columns,
                "target_types": self.target_types.to_dict(),
                "search_settings": self.search_settings,
                "transforms": self.transforms,
            },
        }

    @property
    def is_discrete(self):
        """Whether labels are discrete."""
        return self.target_types.eq("discrete")

    @property
    def distribution(self):
        """Returns label distribution if labels are discrete."""
        self._assert_single_target()
        target_column = self.target_columns[0]

        if self.is_discrete[target_column]:
            labels = self.assign(count=1)
            labels = labels.groupby(target_column)
            distribution = labels["count"].count()
            return distribution
        else:
            return self[target_column].describe()

    @property
    def count(self):
        """Returns label count per instance."""
        self._assert_single_target()
        count = self.groupby(self.target_dataframe_index)
        count = count[self.target_columns[0]].count()
        count = count.to_frame("count")
        return count

    @property
    def count_by_time(self):
        """Returns label count across cutoff times."""
        self._assert_single_target()
        target_column = self.target_columns[0]

        if self.is_discrete[target_column]:
            keys = ["time", target_column]
            value = self.groupby(keys).time.count()
            value = value.unstack(target_column).fillna(0)
        else:
            value = self.groupby("time")
            value = value[target_column].count()

        value = (
            value.cumsum()
        )  # In Python 3.5, these values automatically convert to float.
        value = value.astype("int")
        return value

    def describe(self):
        """Prints out the settings used to make the label times."""
        if not self.empty:
            self._assert_single_target()
            describe_label_times(self)

    def copy(self, deep=True):
        """Make a copy of this object's indices and data.

        Args:
            deep (bool): Make a deep copy, including a copy of the data and the indices.
                With ``deep=False`` neither the indices nor the data are copied. Default is True.

        Returns:
            lt (LabelTimes): A copy of the label times object.
        """
        lt = super().copy(deep=deep)
        lt.target_dataframe_index = self.target_dataframe_index
        lt.target_columns = self.target_columns
        lt.target_types = self.target_types.copy()
        lt.search_settings = self.search_settings.copy()
        lt.transforms = self.transforms.copy()
        return lt

    def threshold(self, value, inplace=False):
        """Creates binary labels by testing if labels are above threshold.

        Args:
            value (float) : Value of threshold.
            inplace (bool) : Modify labels in place.

        Returns:
            labels (LabelTimes) : Instance of labels.
        """
        self._assert_single_target()
        target_column = self.target_columns[0]
        labels = self if inplace else self.copy()
        labels[target_column] = labels[target_column].gt(value)
        labels.target_types[target_column] = "discrete"

        transform = {"transform": "threshold", "value": value}
        labels.transforms.append(transform)

        if not inplace:
            return labels

    def apply_lead(self, value, inplace=False):
        """Shifts the label times earlier for predicting in advance.

        Args:
            value (str) : Time to shift earlier.
            inplace (bool) : Modify labels in place.

        Returns:
            labels (LabelTimes) : Instance of labels.
        """
        labels = self if inplace else self.copy()
        labels["time"] = labels["time"].sub(pd.Timedelta(value))

        transform = {"transform": "apply_lead", "value": value}
        labels.transforms.append(transform)

        if not inplace:
            return labels

    def bin(self, bins, quantiles=False, labels=None, right=True, precision=3):
        """Bin labels into discrete intervals.

        Args:
            bins (int or array): The criteria to bin by.
                As an integer, the value can be the number of equal-width or quantile-based bins.
                If :code:`quantiles` is False, the value is defined as the number of equal-width bins.
                The range is extended by .1% on each side to include the minimum and maximum values.
                If :code:`quantiles` is True, the value is defined as the number of quantiles (e.g. 10 for deciles, 4 for quartiles, etc.)
                As an array, the value can be custom or quantile-based edges.
                If :code:`quantiles` is False, the value is defined as bin edges allowing for non-uniform width. No extension is done.
                If :code:`quantiles` is True, the value is defined as bin edges usings an array of quantiles (e.g. [0, .25, .5, .75, 1.] for quartiles)

            quantiles (bool): Determines whether to use a quantile-based discretization function.
            labels (array): Specifies the labels for the returned bins. Must be the same length as the resulting bins.
            right (bool) : Indicates whether bins includes the rightmost edge or not. Does not apply to quantile-based bins.
            precision (int): The precision at which to store and display the bins labels. Default value is 3.

        Returns:
            LabelTimes : Instance of labels.

        Examples:
            These are the target values for the examples.

            >>> data = [226.93, 47.95, 283.46, 31.54]
            >>> lt = LabelTimes({'target': data})
            >>> lt
               target
            0  226.93
            1   47.95
            2  283.46
            3   31.54

            Bin values using equal-widths.

            >>> lt.bin(2)
                        target
            0  (157.5, 283.46]
            1  (31.288, 157.5]
            2  (157.5, 283.46]
            3  (31.288, 157.5]

            Bin values using custom-widths.

            >>> lt.bin([0, 200, 400])
                   target
            0  (200, 400]
            1    (0, 200]
            2  (200, 400]
            3    (0, 200]

            Bin values using infinite edges.

            >>> lt.bin(['-inf', 100, 'inf'])
                      target
            0   (100.0, inf]
            1  (-inf, 100.0]
            2   (100.0, inf]
            3  (-inf, 100.0]

            Bin values using quartiles.

            >>> lt.bin(4, quantiles=True)
                                     target
            0             (137.44, 241.062]
            1              (43.848, 137.44]
            2             (241.062, 283.46]
            3  (31.538999999999998, 43.848]

            Bin values using custom quantiles with precision.

            >>> lt.bin([0, .5, 1], quantiles=True, precision=1)
                       target
            0  (137.4, 283.5]
            1   (31.4, 137.4]
            2  (137.4, 283.5]
            3   (31.4, 137.4]

            Assign labels to bins.

            >>> lt.bin(2, labels=['low', 'high'])
              target
            0   high
            1    low
            2   high
            3    low
        """  # noqa
        self._assert_single_target()
        target_column = self.target_columns[0]
        values = self[target_column].values

        if quantiles:
            values = pd.qcut(values, q=bins, labels=labels, precision=precision)

        else:
            if isinstance(bins, list):
                for i, edge in enumerate(bins):
                    if edge in ["-inf", "inf"]:
                        bins[i] = float(edge)

            values = pd.cut(
                values,
                bins=bins,
                labels=labels,
                right=right,
                precision=precision,
            )

        transform = {
            "transform": "bin",
            "bins": bins,
            "quantiles": quantiles,
            "labels": labels,
            "right": right,
            "precision": precision,
        }

        lt = self.copy()
        lt[target_column] = values
        lt.transforms.append(transform)
        lt.target_types[target_column] = "discrete"
        return lt

    def _sample(self, key, value, settings, random_state=None, replace=False):
        """Returns a random sample of labels.

        Args:
            key (str) : Determines the sampling method. Can either be 'n' or 'frac'.
            value (int or float) : Quantity to sample.
            settings (dict) : Transform settings used for sampling.
            random_state (int) : Seed for the random number generator.
            replace (bool) : Sample with or without replacement. Default value is False.

        Returns:
            LabelTimes : Random sample of labels.
        """
        sample = super().sample(
            random_state=random_state, replace=replace, **{key: value}
        )
        return sample

    def _sample_per_label(self, key, value, settings, random_state=None, replace=False):
        """Returns a random sample per label.

        Args:
            key (str) : Determines the sampling method. Can either be 'n' or 'frac'.
            value (dict) : Quantity to sample per label.
            settings (dict) : Transform settings used for sampling.
            random_state (int) : Seed for the random number generator.
            replace (bool) : Sample with or without replacement. Default value is False.

        Returns:
            LabelTimes : Random sample per label.
        """
        sample_per_label = []
        target_column = self.target_columns[0]

        for (
            label,
            value,
        ) in value.items():
            label = self[self[target_column] == label]
            sample = label._sample(
                key,
                value,
                settings,
                random_state=random_state,
                replace=replace,
            )
            sample_per_label.append(sample)

        sample = pd.concat(sample_per_label, axis=0, sort=False)
        return sample

    def sample(
        self,
        n=None,
        frac=None,
        random_state=None,
        replace=False,
        per_instance=False,
    ):
        """Return a random sample of labels.

        Args:
            n (int or dict) : Sample number of labels. A dictionary returns
                the number of samples to each label. Cannot be used with frac.
            frac (float or dict) : Sample fraction of labels. A dictionary returns
                the sample fraction to each label. Cannot be used with n.
            random_state (int) : Seed for the random number generator.
            replace (bool) : Sample with or without replacement. Default value is False.
            per_instance (bool): Whether to apply sampling to each group. Default is False.

        Returns:
            LabelTimes : Random sample of labels.

        Examples:
            Create a label times object.

            >>> entity = [0, 0, 1, 1]
            >>> labels = [True, False, True, False]
            >>> data = {'entity': entity, 'labels': labels}
            >>> lt = LabelTimes(data=data, target_dataframe_index='entity', target_columns=['labels'])
            >>> lt
               entity  labels
            0       0    True
            1       0   False
            2       1    True
            3       1   False

            Sample a number of the examples.

            >>> lt.sample(n=3, random_state=0)
               entity  labels
            1       0   False
            2       1    True
            3       1   False

            Sample a fraction of the examples.

            >>> lt.sample(frac=.25, random_state=0)
               entity  labels
            2       1    True

            Sample a number of the examples for specific labels.

            >>> n = {True: 1, False: 1}
            >>> lt.sample(n=n, random_state=0)
               entity  labels
            2       1    True
            3       1   False

            Sample a fraction of the examples for specific labels.

            >>> frac = {True: .5, False: .5}
            >>> lt.sample(frac=frac, random_state=0)
               entity  labels
            2       1    True
            3       1   False

            Sample a number of the examples from each entity group.

            >>> lt.sample(n={True: 1}, per_instance=True, random_state=0)
               entity  labels
            0       0    True
            2       1    True

            Sample a fraction of the examples from each entity group.

            >>> lt.sample(frac=.5, per_instance=True, random_state=0)
               entity  labels
            1       0   False
            3       1   False
        """  # noqa
        self._assert_single_target()

        settings = {
            "transform": "sample",
            "n": n,
            "frac": frac,
            "random_state": random_state,
            "replace": replace,
            "per_instance": per_instance,
        }

        key, value = ("n", n) if n else ("frac", frac)
        assert value, "must set value for 'n' or 'frac'"

        per_label = isinstance(value, dict)
        method = "_sample_per_label" if per_label else "_sample"

        def transform(lt):
            sample = getattr(lt, method)(
                key=key,
                value=value,
                settings=settings,
                random_state=random_state,
                replace=replace,
            )
            return sample

        if per_instance:
            groupby = self.groupby(self.target_dataframe_index, group_keys=False)
            sample = groupby.apply(transform)
        else:
            sample = transform(self)

        sample = sample.copy()
        sample.sort_index(inplace=True)
        sample.transforms.append(settings)
        return sample

    def equals(self, other, **kwargs):
        """Determines if two label time objects are the same.

        Args:
            other (LabelTimes) : Other label time object for comparison.
            **kwargs: Keyword arguments to pass to underlying pandas.DataFrame.equals method

        Returns:
            bool : Whether label time objects are the same.
        """
        is_equal = super().equals(other, **kwargs)
        is_equal &= self.settings == other.settings
        return is_equal

    def _save_settings(self, path):
        """Write the settings in json format to disk.

        Args:
            path (str) : Directory on disk to write to.
        """
        settings = self.settings
        dtypes = self.dtypes.astype("str")
        settings["dtypes"] = dtypes.to_dict()

        file = os.path.join(path, "settings.json")
        with open(file, "w") as file:
            json.dump(settings, file)

    def to_csv(self, path, save_settings=True, **kwargs):
        """Write label times in csv format to disk.

        Args:
            path (str) : Location on disk to write to (will be created as a directory).
            save_settings (bool) : Whether to save the settings used to make the label times.
            **kwargs: Keyword arguments to pass to underlying pandas.DataFrame.to_csv method
        """
        os.makedirs(path, exist_ok=True)
        file = os.path.join(path, "data.csv")
        super().to_csv(file, index=False, **kwargs)

        if save_settings:
            self._save_settings(path)

    def to_parquet(self, path, save_settings=True, **kwargs):
        """Write label times in parquet format to disk.

        Args:
            path (str) : Location on disk to write to (will be created as a directory).
            save_settings (bool) : Whether to save the settings used to make the label times.
            **kwargs: Keyword arguments to pass to underlying pandas.DataFrame.to_parquet method
        """
        os.makedirs(path, exist_ok=True)
        file = os.path.join(path, "data.parquet")
        super().to_parquet(file, compression=None, engine="auto", **kwargs)

        if save_settings:
            self._save_settings(path)

    def to_pickle(self, path, save_settings=True, **kwargs):
        """Write label times in pickle format to disk.

        Args:
            path (str) : Location on disk to write to (will be created as a directory).
            save_settings (bool) : Whether to save the settings used to make the label times.
            **kwargs: Keyword arguments to pass to underlying pandas.DataFrame.to_pickle method
        """
        os.makedirs(path, exist_ok=True)
        file = os.path.join(path, "data.pickle")
        super().to_pickle(file, **kwargs)

        if save_settings:
            self._save_settings(path)

    # ----------------------------------------
    # Subclassing Pandas Data Frame
    # ----------------------------------------

    _metadata = [
        "search_settings",
        "target_columns",
        "target_dataframe_index",
        "target_types",
        "transforms",
    ]

    def __finalize__(self, other, method=None, **kwargs):
        """Propagate metadata from other label times data frames.

        Args:
            other (LabelTimes) : The label times from which to get the attributes from.
            method (str) : A passed method name for optionally taking different types of propagation actions based on this value.
        """
        if method == "concat":
            other = other.objs[0]

            for key in self._metadata:
                value = getattr(other, key, None)
                setattr(self, key, value)

            return self

        return super().__finalize__(other=other, method=method, **kwargs)

    @property
    def _constructor(self):
        return LabelTimes


================================================
FILE: composeml/label_times/plots.py
================================================
import matplotlib as mpl  # isort:skip
import pandas as pd
import seaborn as sns

# Raises an import error on OSX if not included.
# https://matplotlib.org/3.1.0/faq/osx_framework.html#working-with-matplotlib-on-osx
mpl.use("agg")  # noqa
pd.plotting.register_matplotlib_converters()
sns.set_context("notebook")
sns.set_style("darkgrid")
COLOR = sns.color_palette("Set1", n_colors=100, desat=0.75)


class LabelPlots:
    """Creates plots for Label Times."""

    def __init__(self, label_times):
        """Initializes Label Plots.

        Args:
            label_times (LabelTimes) : instance of Label Times
        """
        self._label_times = label_times

    def count_by_time(self, ax=None, **kwargs):
        """Plots the label distribution across cutoff times."""
        count_by_time = self._label_times.count_by_time
        count_by_time.sort_index(inplace=True)
        target_column = self._label_times.target_columns[0]

        ax = ax or mpl.pyplot.axes(label=id(self))
        vmin = count_by_time.index.min()
        vmax = count_by_time.index.max()
        ax.set_xlim(vmin, vmax)

        locator = mpl.dates.AutoDateLocator()
        formatter = mpl.dates.AutoDateFormatter(locator)
        ax.xaxis.set_major_locator(locator)
        ax.xaxis.set_major_formatter(formatter)
        for label in ax.get_xticklabels():
            label.set_rotation(30)

        if len(count_by_time.shape) > 1:
            ax.stackplot(
                count_by_time.index,
                count_by_time.values.T,
                labels=count_by_time.columns,
                colors=COLOR,
                alpha=0.9,
                **kwargs,
            )

            ax.legend(
                loc="upper left",
                title=target_column,
                facecolor="w",
                framealpha=0.9,
            )

            ax.set_title("Label Count vs. Cutoff Times")
            ax.set_ylabel("Count")
            ax.set_xlabel("Time")

        else:
            ax.fill_between(
                count_by_time.index,
                count_by_time.values.T,
                color=COLOR[1],
            )

            ax.set_title("Label vs. Cutoff Times")
            ax.set_ylabel(target_column)
            ax.set_xlabel("Time")

        return ax

    @property
    def dist(self):
        """Alias for distribution."""
        return self.distribution

    def distribution(self, **kwargs):
        """Plots the label distribution."""
        self._label_times._assert_single_target()
        target_column = self._label_times.target_columns[0]
        dist = self._label_times[target_column]
        is_discrete = self._label_times.is_discrete[target_column]

        if is_discrete:
            ax = sns.countplot(x=dist, palette=COLOR, **kwargs)
        else:
            ax = sns.histplot(x=dist, kde=True, color=COLOR[1], **kwargs)

        ax.set_title("Label Distribution")
        ax.set_ylabel("Count")
        return ax


================================================
FILE: composeml/tests/__init__.py
================================================


================================================
FILE: composeml/tests/requirement_files/latest_core_dependencies.txt
================================================
featuretools==1.27.0
matplotlib==3.7.2
pandas==2.0.3
seaborn==0.12.2
tqdm==4.66.1
woodwork==0.25.1


================================================
FILE: composeml/tests/requirement_files/minimum_core_requirements.txt
================================================
matplotlib==3.3.3
pandas==2.0.0
seaborn==0.12.2
tqdm==4.32.0


================================================
FILE: composeml/tests/requirement_files/minimum_test_requirements.txt
================================================
featuretools==1.27.0
matplotlib==3.3.3
pandas==2.0.0
pip==21.3.1
pyarrow==7.0.0
pytest-cov==3.0.0
pytest-xdist==2.5.0
pytest==7.1.2
seaborn==0.12.2
tqdm==4.32.0
wheel==0.33.1
woodwork==0.25.1


================================================
FILE: composeml/tests/test_data_slice/__init__.py
================================================


================================================
FILE: composeml/tests/test_data_slice/test_extension.py
================================================
import pandas as pd
from pytest import fixture, mark, raises

from composeml import LabelMaker


@fixture
def data_slice(transactions):
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        window_size="1h",
    )
    ds = next(lm.slice(transactions, num_examples_per_instance=1))
    return ds


def test_context(data_slice):
    print(data_slice.context)
    context = str(data_slice.context)
    actual = context.splitlines()

    expected = [
        "customer_id                       0",
        "slice_number                      1",
        "slice_start     2019-01-01 08:00:00",
        "slice_stop      2019-01-01 09:00:00",
        "next_start      2019-01-01 09:00:00",
    ]

    assert actual == expected


def test_context_aliases(data_slice):
    assert data_slice.context == data_slice.ctx
    assert data_slice.context.slice_number == data_slice.ctx.count
    assert data_slice.context.slice_start == data_slice.ctx.start
    assert data_slice.context.slice_stop == data_slice.ctx.stop


@mark.parametrize(
    "time_based,offsets",
    argvalues=[
        [False, (2, 4, 2)],
        [False, (2, -6, 2)],
        [True, (pd.Timedelta("1h"), pd.Timedelta("2h"), pd.Timedelta("1h"))],
        [True, (pd.Timedelta("1h"), pd.Timedelta("-2h30min"), pd.Timedelta("1h"))],
        [True, ("2019-01-01 09:00:00", "2019-01-01 10:00:00", pd.Timedelta("1h"))],
    ],
)
def test_subscriptable_slices(transactions, time_based, offsets):
    if time_based:
        dtypes = {"time": "datetime64[ns]"}
        transactions = transactions.astype(dtypes)
        transactions.set_index("time", inplace=True)

    start, stop, size = offsets
    slices = transactions.slice[start:stop:size]
    actual = tuple(map(len, slices))
    assert actual == (2, 2)


def test_subscriptable_error(transactions):
    with raises(TypeError, match="must be a slice object"):
        transactions.slice[0]


def test_time_index_error(transactions):
    match = "offset by frequency requires a time index"
    with raises(AssertionError, match=match):
        transactions.slice[::"1h"]


def test_minimum_data_per_group(transactions):
    lm = LabelMaker(
        "customer_id",
        labeling_function=len,
        time_index="time",
        window_size="1h",
    )
    minimum_data = {1: "2019-01-01 09:00:00", 3: "2019-01-01 12:00:00"}
    lengths = [len(ds) for ds in lm.slice(transactions, 1, minimum_data=minimum_data)]
    assert lengths == [2, 1]


def test_drop_empty(transactions):
    df = transactions.astype({"time": "datetime64[ns]"})
    df.set_index("time", inplace=True)
    df.sort_index(inplace=True)

    ds = df.slice(
        size="1h",
        drop_empty=True,
        stop="2019-01-01 15:00:00",
        start="2019-01-01 08:00:00",
    )

    assert len(list(ds)) == 5


================================================
FILE: composeml/tests/test_data_slice/test_offset.py
================================================
from pytest import raises

from composeml.data_slice.offset import DataSliceOffset


def test_numeric_typecast():
    assert int(DataSliceOffset("1 nanosecond")) == 1
    assert float(DataSliceOffset("1970-01-01")) == 0.0


def test_numeric_typecast_errors():
    match = "offset must be position or frequency based"
    with raises(TypeError, match=match):
        int(DataSliceOffset("1970-01-01"))

    match = "offset must be a timestamp"
    with raises(TypeError, match=match):
        float(DataSliceOffset("1 nanosecond"))


def test_invalid_value():
    match = "offset must be position or time based"
    with raises(AssertionError, match=match):
        DataSliceOffset(None)


def test_alias_phrase():
    phrase = "until start of next month"
    actual = DataSliceOffset(phrase).value
    expected = DataSliceOffset("MS").value
    assert actual == expected

    phrase = "until start of next year"
    actual = DataSliceOffset(phrase).value
    expected = DataSliceOffset("YS").value
    assert actual == expected


================================================
FILE: composeml/tests/test_datasets.py
================================================
import pytest

from composeml import demos


@pytest.fixture
def transactions():
    return demos.load_transactions()


def test_transactions(transactions):
    assert len(transactions) == 100


================================================
FILE: composeml/tests/test_featuretools.py
================================================
import featuretools as ft
import pytest

from composeml import LabelMaker


def total_spent(df):
    total = df.amount.sum()
    return total


@pytest.fixture
def labels():
    df = ft.demo.load_mock_customer(return_single_table=True, random_seed=0)
    df = df[["transaction_time", "customer_id", "amount"]]
    df.sort_values("transaction_time", inplace=True)

    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="transaction_time",
        labeling_function=total_spent,
        window_size="1h",
    )

    lt = lm.search(
        df,
        minimum_data="10min",
        num_examples_per_instance=2,
        gap="30min",
        drop_empty=True,
        verbose=False,
    )

    lt = lt.threshold(1250)
    return lt


def test_dfs(labels):
    target_column = labels.target_columns[0]
    es = ft.demo.load_mock_customer(return_entityset=True, random_seed=0)
    feature_matrix, _ = ft.dfs(
        entityset=es,
        target_dataframe_name="customers",
        cutoff_time=labels,
        cutoff_time_in_index=True,
    )
    assert target_column in feature_matrix

    columns = ["customer_id", "time", target_column]
    given_labels = feature_matrix.reset_index()[columns]
    given_labels = given_labels.sort_values(["customer_id", "time"])
    given_labels = given_labels.reset_index(drop=True)
    given_labels = given_labels.rename_axis("label_id")

    assert given_labels.equals(labels)


================================================
FILE: composeml/tests/test_label_maker.py
================================================
import pandas as pd
import pytest

from composeml import LabelMaker
from composeml.tests.utils import to_csv


def test_search_default(transactions, total_spent_fn):
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent_fn,
    )

    given_labels = lm.search(transactions, num_examples_per_instance=1)
    given_labels = to_csv(given_labels, index=False)

    labels = [
        "customer_id,time,total_spent",
        "0,2019-01-01 08:00:00,2",
        "1,2019-01-01 09:00:00,3",
        "2,2019-01-01 10:30:00,4",
        "3,2019-01-01 12:30:00,1",
    ]

    assert given_labels == labels


def test_search_examples_per_label(transactions, total_spent_fn):
    def total_spent(ds):
        return total_spent_fn(ds) > 2

    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent,
    )

    n_examples = {True: -1, False: 1}
    given_labels = lm.search(transactions, num_examples_per_instance=n_examples, gap=1)
    given_labels = to_csv(given_labels, index=False)

    labels = [
        "customer_id,time,total_spent",
        "0,2019-01-01 08:00:00,False",
        "1,2019-01-01 09:00:00,True",
        "1,2019-01-01 09:30:00,False",
        "2,2019-01-01 10:30:00,True",
        "2,2019-01-01 11:00:00,True",
        "2,2019-01-01 11:30:00,False",
        "3,2019-01-01 12:30:00,False",
    ]

    assert given_labels == labels


def test_search_with_undefined_labels(transactions, total_spent_fn):
    def total_spent(ds):
        return total_spent_fn(ds) % 3

    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent,
    )

    n_examples = {1: 1, 2: 1}
    given_labels = lm.search(transactions, num_examples_per_instance=n_examples, gap=1)
    given_labels = to_csv(given_labels, index=False)

    labels = [
        "customer_id,time,total_spent",
        "0,2019-01-01 08:00:00,2",
        "0,2019-01-01 08:30:00,1",
        "1,2019-01-01 09:30:00,2",
        "1,2019-01-01 10:00:00,1",
        "2,2019-01-01 10:30:00,1",
        "2,2019-01-01 11:30:00,2",
        "3,2019-01-01 12:30:00,1",
    ]

    assert given_labels == labels


def test_search_with_multiple_targets(transactions, total_spent_fn, unique_amounts_fn):
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        window_size=2,
        labeling_function={
            "total_spent": total_spent_fn,
            "unique_amounts": unique_amounts_fn,
        },
    )

    expected = [
        "customer_id,time,total_spent,unique_amounts",
        "0,2019-01-01 08:00:00,2,1",
        "1,2019-01-01 09:00:00,2,1",
        "1,2019-01-01 10:00:00,1,1",
        "2,2019-01-01 10:30:00,2,1",
        "2,2019-01-01 11:30:00,2,1",
        "3,2019-01-01 12:30:00,1,1",
    ]

    lt = lm.search(transactions, num_examples_per_instance=-1)
    actual = lt.pipe(to_csv, index=False)
    info = "unexpected calculated values"
    assert actual == expected, info

    expected = [
        "customer_id,time,unique_amounts",
        "0,2019-01-01 08:00:00,1",
        "1,2019-01-01 09:00:00,1",
        "1,2019-01-01 10:00:00,1",
        "2,2019-01-01 10:30:00,1",
        "2,2019-01-01 11:30:00,1",
        "3,2019-01-01 12:30:00,1",
    ]

    actual = lt.select("unique_amounts")
    actual = actual.pipe(to_csv, index=False)
    info = "selected values differ from calculated values"
    assert actual == expected, info


def test_search_offset_mix_0(transactions, total_spent_fn):
    """
    Test offset mix with window_size (absolute), minimum_data (absolute), and gap (absolute).
    """
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent_fn,
        window_size="2h",
    )

    given_labels = lm.search(
        transactions,
        num_examples_per_instance=2,
        minimum_data="30min",
        gap="2h",
        drop_empty=True,
    )

    given_labels = to_csv(given_labels, index=False)

    labels = [
        "customer_id,time,total_spent",
        "0,2019-01-01 08:30:00,1",
        "1,2019-01-01 09:30:00,2",
        "2,2019-01-01 11:00:00,3",
    ]

    assert given_labels == labels


def test_search_offset_mix_1(transactions, total_spent_fn):
    """
    Test offset mix with window_size (relative), minimum_data (absolute), and gap (absolute).
    """
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent_fn,
        window_size=4,
    )

    given_labels = lm.search(
        transactions,
        num_examples_per_instance=2,
        minimum_data="2019-01-01 10:00:00",
        gap="4h",
    )

    given_labels = to_csv(given_labels, index=False)

    labels = [
        "customer_id,time,total_spent",
        "1,2019-01-01 10:00:00,1",
        "2,2019-01-01 10:00:00,4",
        "3,2019-01-01 10:00:00,1",
    ]

    assert given_labels == labels


def test_search_offset_mix_2(transactions, total_spent_fn):
    """
    Test offset mix with window_size (absolute), minimum_data (relative), and gap (absolute).
    """
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent_fn,
        window_size="30min",
    )

    given_labels = lm.search(
        transactions,
        num_examples_per_instance=2,
        minimum_data=2,
    )

    given_labels = to_csv(given_labels, index=False)

    labels = [
        "customer_id,time,total_spent",
        "1,2019-01-01 10:00:00,1",
        "2,2019-01-01 11:30:00,1",
        "2,2019-01-01 12:00:00,1",
    ]

    assert given_labels == labels


def test_search_offset_mix_3(transactions, total_spent_fn):
    """
    Test offset mix with window_size (absolute), minimum_data (absolute), and gap (relative).
    """
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent_fn,
        window_size="8h",
    )

    given_labels = lm.search(
        transactions,
        num_examples_per_instance=-1,
        minimum_data="2019-01-01 08:00:00",
        gap=1,
    )

    given_labels = to_csv(given_labels, index=False)

    labels = [
        "customer_id,time,total_spent",
        "0,2019-01-01 08:00:00,2",
        "0,2019-01-01 08:30:00,1",
        "1,2019-01-01 09:00:00,3",
        "1,2019-01-01 09:30:00,2",
        "1,2019-01-01 10:00:00,1",
        "2,2019-01-01 10:30:00,4",
        "2,2019-01-01 11:00:00,3",
        "2,2019-01-01 11:30:00,2",
        "2,2019-01-01 12:00:00,1",
        "3,2019-01-01 12:30:00,1",
    ]

    assert given_labels == labels


def test_search_offset_mix_4(transactions, total_spent_fn):
    """
    Test offset mix with window_size (relative), minimum_data (relative), and gap (absolute).
    """
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent_fn,
        window_size=1,
    )

    given_labels = lm.search(
        transactions,
        num_examples_per_instance=2,
        gap="30min",
    )

    given_labels = to_csv(given_labels, index=False)

    labels = [
        "customer_id,time,total_spent",
        "0,2019-01-01 08:00:00,1",
        "0,2019-01-01 08:30:00,1",
        "1,2019-01-01 09:00:00,1",
        "1,2019-01-01 09:30:00,1",
        "2,2019-01-01 10:30:00,1",
        "2,2019-01-01 11:00:00,1",
        "3,2019-01-01 12:30:00,1",
    ]

    assert given_labels == labels


def test_search_offset_mix_5(transactions, total_spent_fn):
    """
    Test offset mix with window_size (relative), minimum_data (absolute), and gap (relative).
    """
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent_fn,
        window_size=2,
    )

    labels = lm.search(
        transactions,
        num_examples_per_instance=2,
        minimum_data="1h",
        gap=2,
    )

    labels = to_csv(labels, index=False)

    expected_labels = [
        "customer_id,time,total_spent",
        "0,2019-01-01 08:00:00,2",
        "1,2019-01-01 09:00:00,2",
        "1,2019-01-01 10:00:00,1",
        "2,2019-01-01 10:30:00,2",
        "2,2019-01-01 11:30:00,2",
        "3,2019-01-01 12:30:00,1",
    ]
    assert labels == expected_labels


def test_search_offset_mix_6(transactions, total_spent_fn):
    """
    Test offset mix with window_size (absolute), minimum_data (relative), and gap (relative).
    """
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent_fn,
        window_size="1h",
    )

    given_labels = lm.search(
        transactions,
        num_examples_per_instance=1,
        minimum_data=3,
        gap=1,
    )

    given_labels = to_csv(given_labels, index=False)

    labels = [
        "customer_id,time,total_spent",
        "2,2019-01-01 12:00:00,1",
    ]

    assert given_labels == labels


def test_search_offset_mix_7(transactions, total_spent_fn):
    """
    Test offset mix with window_size (relative), minimum_data (relative), and gap (relative).
    """

    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent_fn,
        window_size=10,
    )

    given_labels = lm.search(
        transactions,
        num_examples_per_instance=float("inf"),
    )

    given_labels = to_csv(given_labels, index=False)

    labels = [
        "customer_id,time,total_spent",
        "0,2019-01-01 08:00:00,2",
        "1,2019-01-01 09:00:00,3",
        "2,2019-01-01 10:30:00,4",
        "3,2019-01-01 12:30:00,1",
    ]

    assert given_labels == labels


def test_search_offset_negative_0(transactions, total_spent_fn):
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=lambda: None,
        window_size=2,
    )

    match = "offset must be positive"
    with pytest.raises(AssertionError, match=match):
        lm.search(
            transactions,
            num_examples_per_instance=2,
            minimum_data=-1,
            gap=-1,
        )


def test_search_offset_negative_1(transactions, total_spent_fn):
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=lambda: None,
        window_size=2,
    )

    match = "offset must be positive"
    with pytest.raises(AssertionError, match=match):
        lm.search(
            transactions,
            num_examples_per_instance=2,
            minimum_data="-1h",
            gap="-1h",
        )


def test_search_invalid_n_examples(transactions, total_spent_fn):
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent_fn,
    )

    with pytest.raises(AssertionError, match="must specify gap"):
        next(lm.slice(transactions, num_examples_per_instance=2))

    with pytest.raises(AssertionError, match="must specify gap"):
        lm.search(transactions, num_examples_per_instance=2)


def test_column_based_windows(transactions, total_spent_fn):
    session_id = [1, 2, 3, 3, 4, 5, 5, 5, 6, 7]
    df = transactions.assign(session_id=session_id)

    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        window_size="session_id",
        labeling_function=total_spent_fn,
    )

    actual = lm.search(df, -1).pipe(to_csv, index=False)

    expected = [
        "customer_id,time,total_spent",
        "0,2019-01-01 08:00:00,1",
        "0,2019-01-01 08:30:00,1",
        "1,2019-01-01 09:00:00,2",
        "1,2019-01-01 10:00:00,1",
        "2,2019-01-01 10:30:00,3",
        "2,2019-01-01 12:00:00,1",
        "3,2019-01-01 12:30:00,1",
    ]

    assert actual == expected


def test_search_with_invalid_index(transactions, total_spent_fn):
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=lambda df: None,
        window_size=2,
    )

    df = transactions.sample(n=10, random_state=0)
    match = "data frame must be sorted chronologically"
    with pytest.raises(AssertionError, match=match):
        lm.search(df, num_examples_per_instance=2)

    df = transactions.assign(time=pd.NaT)
    match = "index contains null values"
    with pytest.raises(AssertionError, match=match):
        lm.search(df, num_examples_per_instance=2)


def test_search_on_empty_labels(transactions):
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=lambda ds: None,
        window_size=2,
    )

    given_labels = lm.search(
        transactions,
        minimum_data=1,
        num_examples_per_instance=2,
        gap=1,
    )

    assert given_labels.empty


def test_data_slice_overlap(transactions, total_spent_fn):
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent_fn,
        window_size="1h",
    )

    for ds in lm.slice(transactions, num_examples_per_instance=2):
        overlap = ds.index == ds.context.slice_stop
        assert not overlap.any()


def test_label_type(transactions, total_spent_fn):
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent_fn,
    )
    lt = lm.search(transactions, num_examples_per_instance=1)
    assert lt.target_types["total_spent"] == "continuous"
    assert lt.bin(2).target_types["total_spent"] == "discrete"


def test_search_with_maximum_data(transactions):
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=len,
        window_size="1h",
    )

    lt = lm.search(
        df=transactions.sort_values("time"),
        num_examples_per_instance=-1,
        minimum_data="2019-01-01 08:00:00",
        maximum_data="2019-01-01 09:00:00",
        drop_empty=False,
    )

    expected = [
        "customer_id,time,len",
        "0,2019-01-01 08:00:00,2",
        "0,2019-01-01 09:00:00,0",
        "1,2019-01-01 08:00:00,0",
        "1,2019-01-01 09:00:00,2",
        "2,2019-01-01 08:00:00,0",
        "2,2019-01-01 09:00:00,0",
        "3,2019-01-01 08:00:00,0",
        "3,2019-01-01 09:00:00,0",
    ]

    actual = lt.pipe(to_csv, index=False)
    assert actual == expected

    lt = lm.search(
        df=transactions.sort_values("time"),
        num_examples_per_instance=-1,
        maximum_data="30min",
        drop_empty=False,
        gap="30min",
    )

    expected = [
        "customer_id,time,len",
        "0,2019-01-01 08:00:00,2",
        "0,2019-01-01 08:30:00,1",
        "1,2019-01-01 09:00:00,2",
        "1,2019-01-01 09:30:00,2",
        "2,2019-01-01 10:30:00,2",
        "2,2019-01-01 11:00:00,2",
        "3,2019-01-01 12:30:00,1",
        "3,2019-01-01 13:00:00,0",
    ]

    actual = lt.pipe(to_csv, index=False)
    assert actual == expected


@pytest.mark.parametrize(
    "minimum_data",
    [
        {1: "2019-01-01 09:30:00", 2: "2019-01-01 11:30:00"},
        {1: pd.Timedelta("30min"), 2: pd.Timedelta("1h")},
        {1: 1, 2: 2},
    ],
)
def test_minimum_data_per_group(transactions, minimum_data):
    lm = LabelMaker(
        "customer_id",
        labeling_function=len,
        time_index="time",
        window_size="1h",
    )
    for supported_type in [minimum_data, pd.Series(minimum_data)]:
        lt = lm.search(transactions, 1, minimum_data=supported_type)
        actual = to_csv(lt, index=False)

        expected = [
            "customer_id,time,len",
            "1,2019-01-01 09:30:00,2",
            "2,2019-01-01 11:30:00,2",
        ]

        assert actual == expected


def test_minimum_data_per_group_error(transactions):
    lm = LabelMaker(
        "customer_id",
        labeling_function=len,
        time_index="time",
        window_size="1h",
    )
    data = ["2019-01-01 09:00:00", "2019-01-01 12:00:00"]
    minimum_data = pd.Series(data=data, index=[1, 1])
    match = "more than one cutoff time exists for a target group"

    with pytest.raises(ValueError, match=match):
        lm.search(transactions, 1, minimum_data=minimum_data)


def test_label_maker_categorical_target_with_missing_data(transactions, total_spent_fn):
    transactions = transactions.copy()
    transactions["customer_id"] = transactions["customer_id"].astype("category")
    lm = LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        window_size=3,
        labeling_function=total_spent_fn,
    )
    # use on only the first 8 rows so the df will not contain data for customer 3
    lm.search(transactions.head(8), -1)


================================================
FILE: composeml/tests/test_label_plots.py
================================================
from pytest import raises


def test_count_by_time_categorical(total_spent):
    total_spent = total_spent.bin(2, labels=range(2))
    title = total_spent.plot.count_by_time().get_title()
    assert title == "Label Count vs. Cutoff Times"


def test_count_by_time_continuous(total_spent):
    title = total_spent.plot.count_by_time().get_title()
    assert title == "Label vs. Cutoff Times"


def test_distribution_categorical(total_spent):
    ax = total_spent.bin(2, labels=range(2))
    title = ax.plot.dist().get_title()
    assert title == "Label Distribution"


def test_distribution_continuous(total_spent):
    title = total_spent.plot.dist().get_title()
    assert title == "Label Distribution"


def test_single_target(total_spent):
    lt = total_spent.copy()
    lt.target_columns.append("target_2")
    match = "must first select an individual target"

    with raises(AssertionError, match=match):
        lt.plot.dist()

    with raises(AssertionError, match=match):
        lt.plot.count_by_time()


================================================
FILE: composeml/tests/test_label_serialization.py
================================================
import os
import shutil

import pandas as pd
import pytest

import composeml as cp


@pytest.fixture
def path():
    pwd = os.path.dirname(__file__)
    path = os.path.join(pwd, ".cache")
    yield path
    shutil.rmtree(path)


@pytest.fixture
def total_spent(transactions, total_spent_fn):
    lm = cp.LabelMaker(
        target_dataframe_index="customer_id",
        time_index="time",
        labeling_function=total_spent_fn,
    )
    lt = lm.search(transactions, num_examples_per_instance=1, verbose=False)
    return lt


def test_csv(path, total_spent):
    total_spent.to_csv(path)
    total_spent_copy = cp.read_label_times(path)
    pd.testing.assert_frame_equal(total_spent, total_spent_copy)
    assert total_spent.equals(total_spent_copy)


def test_parquet(path, total_spent):
    total_spent.to_parquet(path)
    total_spent_copy = cp.read_label_times(path)
    pd.testing.assert_frame_equal(total_spent, total_spent_copy)
    assert total_spent.equals(total_spent_copy)


def test_pickle(path, total_spent):
    total_spent.to_pickle(path)
    total_spent_copy = cp.read_label_times(path)
    pd.testing.assert_frame_equal(total_spent, total_spent_copy)
    assert total_spent.equals(total_spent_copy)


================================================
FILE: composeml/tests/test_label_times.py
================================================
from pytest import raises

from composeml.label_times import LabelTimes
from composeml.tests.utils import to_csv


def test_count_by_time_categorical(total_spent):
    given_answer = total_spent.bin(2, labels=range(2))
    given_answer = to_csv(given_answer.count_by_time)

    answer = [
        "time,0,1",
        "2019-01-01 08:00:00,0,1",
        "2019-01-01 08:30:00,0,2",
        "2019-01-01 09:00:00,0,3",
        "2019-01-01 09:30:00,0,4",
        "2019-01-01 10:00:00,0,5",
        "2019-01-01 10:30:00,1,5",
        "2019-01-01 11:00:00,2,5",
        "2019-01-01 11:30:00,3,5",
        "2019-01-01 12:00:00,4,5",
        "2019-01-01 12:30:00,5,5",
    ]

    assert given_answer == answer


def test_count_by_time_continuous(total_spent):
    given_answer = total_spent.count_by_time
    given_answer = to_csv(given_answer, header=True, index=True)

    answer = [
        "time,total_spent",
        "2019-01-01 08:00:00,1",
        "2019-01-01 08:30:00,2",
        "2019-01-01 09:00:00,3",
        "2019-01-01 09:30:00,4",
        "2019-01-01 10:00:00,5",
        "2019-01-01 10:30:00,6",
        "2019-01-01 11:00:00,7",
        "2019-01-01 11:30:00,8",
        "2019-01-01 12:00:00,9",
        "2019-01-01 12:30:00,10",
    ]

    assert given_answer == answer


def test_sorted_distribution(capsys, total_spent):
    bins = [0, 5, 10, 20]
    total_spent.bin(bins).describe()
    captured = capsys.readouterr()

    out = "\n".join(
        [
            "Label Distribution",
            "------------------",
            "total_spent",
            "(0, 5]      5",
            "(5, 10]     4",
            "(10, 20]    0",
            "Total:      9",
            "",
            "",
            "Settings",
            "--------",
            "num_examples_per_instance             -1",
            "target_column                total_spent",
            "target_dataframe_index       customer_id",
            "target_type                     discrete",
            "",
            "",
            "Transforms",
            "----------",
            "1. bin",
            "  - bins:         [0, 5, 10, 20]",
            "  - labels:                 None",
            "  - precision:                 3",
            "  - quantiles:             False",
            "  - right:                  True",
            "",
            "",
        ],
    )
    assert captured.out == out


def test_describe_no_transforms(capsys):
    data = {"target": range(3)}
    LabelTimes(data).describe()
    captured = capsys.readouterr()
    out = "\n".join(
        [
            "Label Distribution",
            "------------------",
            "count    3.0",
            "mean     1.0",
            "std      1.0",
            "min      0.0",
            "25%      0.5",
            "50%      1.0",
            "75%      1.5",
            "max      2.0",
            "",
            "",
            "Settings",
            "--------",
            "target_column                 target",
            "target_dataframe_index          None",
            "target_type               continuous",
            "",
            "",
            "Transforms",
            "----------",
            "No transforms applied",
            "",
            "",
        ],
    )

    assert captured.out == out


def test_distribution_categorical(total_spent):
    labels = range(2)
    given_answer = total_spent.bin(2, labels=labels).distribution
    given_answer = to_csv(given_answer)

    answer = [
        "total_spent,count",
        "0,5",
        "1,5",
    ]

    assert given_answer == answer


def test_distribution_continous(total_spent):
    distribution = total_spent.distribution
    actual = to_csv(distribution.round(4))

    expected = [
        ",total_spent",
        "count,10.0",
        "mean,4.5",
        "std,3.0277",
        "min,0.0",
        "25%,2.25",
        "50%,4.5",
        "75%,6.75",
        "max,9.0",
    ]

    assert actual == expected


def test_target_type(total_spent):
    types = total_spent.target_types
    assert types["total_spent"] == "continuous"
    total_spent = total_spent.threshold(5)
    types = total_spent.target_types
    assert types["total_spent"] == "discrete"


def test_count(total_spent):
    given_answer = total_spent.count
    given_answer = to_csv(given_answer, index=True)

    answer = [
        "customer_id,count",
        "0,2",
        "1,3",
        "2,4",
        "3,1",
    ]

    assert given_answer == answer


def test_label_select_errors(total_spent):
    match = "only one target exists"
    with raises(AssertionError, match=match):
        total_spent.select("a")

    lt = total_spent.copy()
    lt.target_columns.append("b")

    match = "target name must be string"
    with raises(TypeError, match=match):
        total_spent.select(123)

    match = 'target "a" not found'
    with raises(AssertionError, match=match):
        lt.select("a")


================================================
FILE: composeml/tests/test_label_transforms/__init__.py
================================================


================================================
FILE: composeml/tests/test_label_transforms/test_bin.py
================================================
import pandas as pd
from pytest import raises


def test_bins(labels):
    given_labels = labels.bin(2)
    transform = given_labels.transforms[0]

    assert transform["transform"] == "bin"
    assert transform["bins"] == 2
    assert transform["quantiles"] is False
    assert transform["labels"] is None
    assert transform["right"] is True

    answer = [
        pd.Interval(157.5, 283.46, closed="right"),
        pd.Interval(31.288, 157.5, closed="right"),
        pd.Interval(157.5, 283.46, closed="right"),
        pd.Interval(31.288, 157.5, closed="right"),
    ]

    answer = pd.Categorical(answer, ordered=True)
    labels = labels.assign(my_labeling_function=answer)
    pd.testing.assert_frame_equal(given_labels, labels)


def test_quantile_bins(labels):
    given_labels = labels.bin(2, quantiles=True)
    transform = given_labels.transforms[0]

    assert transform["transform"] == "bin"
    assert transform["bins"] == 2
    assert transform["quantiles"] is True
    assert transform["labels"] is None
    assert transform["right"] is True

    answer = [
        pd.Interval(137.44, 283.46, closed="right"),
        pd.Interval(31.538999999999998, 137.44, closed="right"),
        pd.Interval(137.44, 283.46, closed="right"),
        pd.Interval(31.538999999999998, 137.44, closed="right"),
    ]

    answer = pd.Categorical(answer, ordered=True)
    labels = labels.assign(my_labeling_function=answer)
    pd.testing.assert_frame_equal(given_labels, labels)


def test_single_target(total_spent):
    lt = total_spent.copy()
    lt.target_columns.append("target_2")
    match = "must first select an individual target"
    with raises(AssertionError, match=match):
        lt.bin(2)


================================================
FILE: composeml/tests/test_label_transforms/test_lead.py
================================================
import pandas as pd


def test_lead(labels):
    labels = labels.apply_lead("10min")
    transform = labels.transforms[0]

    assert transform["transform"] == "apply_lead"
    assert transform["value"] == "10min"

    answer = [
        "2014-01-01 00:35:00",
        "2014-01-01 00:38:00",
        "2013-12-31 23:51:00",
        "2013-12-31 23:54:00",
    ]

    time = pd.Series(answer, name="time", dtype="datetime64[ns]")
    time = time.rename_axis("label_id")

    pd.testing.assert_series_equal(labels["time"], time)


================================================
FILE: composeml/tests/test_label_transforms/test_sample.py
================================================
import pytest

from composeml import LabelTimes
from composeml.tests.utils import read_csv, to_csv


@pytest.fixture
def labels(labels):
    return labels.threshold(100)


def test_sample_n_int(labels):
    given_answer = labels.sample(n=2, random_state=0)
    given_answer = given_answer.sort_index()
    given_answer = to_csv(given_answer, index=True)

    answer = [
        "label_id,customer_id,time,my_labeling_function",
        "2,2,2014-01-01 00:01:00,True",
        "3,2,2014-01-01 00:04:00,False",
    ]

    assert given_answer == answer


def test_sample_n_per_label(labels):
    n = {True: 1, False: 2}
    given_answer = labels.sample(n=n, random_state=0)
    given_answer = given_answer.sort_index()
    given_answer = to_csv(given_answer, index=True)

    answer = [
        "label_id,customer_id,time,my_labeling_function",
        "1,1,2014-01-01 00:48:00,False",
        "2,2,2014-01-01 00:01:00,True",
        "3,2,2014-01-01 00:04:00,False",
    ]

    assert given_answer == answer


def test_sample_frac_int(labels):
    given_answer = labels.sample(frac=0.25, random_state=0)
    given_answer = given_answer.sort_index()
    given_answer = to_csv(given_answer, index=True)

    answer = [
        "label_id,customer_id,time,my_labeling_function",
        "2,2,2014-01-01 00:01:00,True",
    ]

    assert given_answer == answer


def test_sample_frac_per_label(labels):
    frac = {True: 1.0, False: 0.5}
    given_answer = labels.sample(frac=frac, random_state=0)
    given_answer = given_answer.sort_index()
    given_answer = to_csv(given_answer, index=True)

    answer = [
        "label_id,customer_id,time,my_labeling_function",
        "0,1,2014-01-01 00:45:00,True",
        "2,2,2014-01-01 00:01:00,True",
        "3,2,2014-01-01 00:04:00,False",
    ]

    assert given_answer == answer


def test_sample_in_transforms(labels):
    n = {True: 2, False: 2}

    transform = {
        "transform": "sample",
        "n": n,
        "frac": None,
        "random_state": None,
        "replace": False,
        "per_instance": False,
    }

    sample = labels.sample(n=n)
    assert transform != labels.transforms[-1]
    assert transform == sample.transforms[-1]


def test_sample_with_replacement(labels):
    assert labels.shape[0] < 20
    n = {True: 10, False: 10}
    sample = labels.sample(n=n, replace=True)
    assert sample.shape[0] == 20


def test_single_target(total_spent):
    lt = total_spent.copy()
    lt.target_columns.append("target_2")
    match = "must first select an individual target"
    with pytest.raises(AssertionError, match=match):
        lt.sample(2)


def test_sample_n_per_instance():
    data = read_csv(
        [
            "target_dataframe_index,labels",
            "0,a",
            "0,b",
            "1,a",
            "1,b",
        ],
    )

    lt = LabelTimes(data=data, target_dataframe_index="target_dataframe_index")
    sample = lt.sample(n={"a": 1}, per_instance=True, random_state=0)
    actual = to_csv(sample, index=False)

    expected = [
        "target_dataframe_index,labels",
        "0,a",
        "1,a",
    ]

    assert expected == actual


def test_sample_frac_per_instance():
    data = read_csv(
        [
            "target_dataframe_index,labels",
            "0,a",
            "0,a",
            "0,a",
            "0,a",
            "1,a",
            "1,a",
        ],
    )

    lt = LabelTimes(data=data, target_dataframe_index="target_dataframe_index")
    sample = lt.sample(frac={"a": 0.5}, per_instance=True, random_state=0)
    actual = to_csv(sample, index=False)

    expected = [
        "target_dataframe_index,labels",
        "0,a",
        "0,a",
        "1,a",
    ]

    assert expected == actual


================================================
FILE: composeml/tests/test_label_transforms/test_threshold.py
================================================
from pytest import raises


def test_threshold(labels):
    labels = labels.threshold(200)
    transform = labels.transforms[0]

    assert transform["transform"] == "threshold"
    assert transform["value"] == 200

    answer = [True, False, True, False]
    target_column = labels.target_columns[0]
    given_answer = labels[target_column].values.tolist()
    assert given_answer == answer


def test_single_target(total_spent):
    lt = total_spent.copy()
    lt.target_columns.append("target_2")
    match = "must first select an individual target"
    with raises(AssertionError, match=match):
        lt.threshold(200)


================================================
FILE: composeml/tests/test_version.py
================================================
from composeml import __version__


def test_version():
    assert __version__ == "0.10.1"


================================================
FILE: composeml/tests/utils.py
================================================
from io import StringIO

import pandas as pd


def read_csv(data, **kwargs):
    """Helper function for creating a dataframe from in-memory CSV string (or list of strings).

    Args:
        data (str or list) : CSV string(s)

    Returns:
        DataFrame : Instance of a dataframe.
    """
    if isinstance(data, list):
        data = "\n".join(data)

    # This creates a file-like object for reading in CSV string.
    with StringIO(data) as data:
        df = pd.read_csv(data, **kwargs)

    return df


def to_csv(label_times, **kwargs):
    df = pd.DataFrame(label_times)
    csv = df.to_csv(**kwargs)
    return csv.splitlines()


================================================
FILE: composeml/update_checker.py
================================================
from pkg_resources import iter_entry_points

for entry_point in iter_entry_points("alteryx_open_src_initialize"):
    try:
        method = entry_point.load()
        if callable(method):
            method("composeml")
    except Exception:
        pass


================================================
FILE: composeml/version.py
================================================
__version__ = "0.10.1"


================================================
FILE: contributing.md
================================================
# Contributing to Compose

:+1::tada: First off, thank you for taking the time to contribute! :tada::+1:

Whether you are a novice or experienced software developer, all contributions and suggestions are welcome!

There are many ways to contribute to Compose, with the most common ones being contribution of code or documentation to the project.

**To contribute, you can:**
1. Help users on our [Slack channel](https://join.slack.com/t/alteryx-oss/shared_invite/zt-182tyvuxv-NzIn6eiCEf8TBziuKp0bNA). Answer questions under the compose tag on [Stack Overflow](https://stackoverflow.com/questions/tagged/composeml)

2. Submit a pull request for one of [Good First Issues](https://github.com/alteryx/compose/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+First+Issue%22)

3. Make changes to the codebase, see [Contributing to the codebase](#Contributing-to-the-Codebase).

4. Improve our documentation, which can be found under the [docs](docs/) directory or at https://compose.alteryx.com/en/stable/

5. [Report issues](#Report-issues) you're facing, and give a "thumbs up" on issues that others reported and that are relevant to you. Issues should be used for bugs, and feature requests only.

6. Spread the word: reference Compose from your blog and articles, link to it from your website, or simply star it in [Compose GitHub page](https://github.com/alteryx/compose) to say "I use it".

## Contributing to the Codebase

Before starting major work, you should touch base with the maintainers of Compose by filing an issue on GitHub or posting a message in the [#development channel on Slack](https://join.slack.com/t/alteryx-oss/shared_invite/zt-182tyvuxv-NzIn6eiCEf8TBziuKp0bNA). This will increase the likelihood your pull request will eventually get merged in.

#### 1. Fork and clone repo
* The code is hosted on GitHub, so you will need to use Git to fork the project and make changes to the codebase. To start, go to the [Compose GitHub page](https://github.com/alteryx/compose) and click the `Fork` button.
* After you have created the fork, you will want to clone the fork to your machine and connect your version of the project to the upstream Compose repo.
  ```bash
  git clone https://github.com/your-user-name/compose.git
  cd compose
  git remote add upstream https://github.com/alteryx/compose
  ```
* Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. You can run the following steps to create a separate virtual environment, and install Compose in editable mode. 
  ```bash
  python -m venv venv
  source venv/bin/activate
  make installdeps
  git checkout -b issue####-branch_name
  ```

#### 2. Implement your Pull Request

* Implement your pull request. If needed, add new tests or update the documentation.
* Before submitting to GitHub, verify the tests run and the code lints properly
  ```bash
  # runs linting
  make lint

  # will fix some common linting issues automatically
  make lint-fix

  # runs test
  make test
  ```
* If you made changes to the documentation, build the documentation locally.
  ```bash
  # go to docs and build
  cd docs
  make html

  # view docs locally
  open build/html/index.html
  ```

#### 3. Submit your Pull Request

* Once your changes are ready to be submitted, make sure to push your changes to GitHub before creating a pull request.
* If you need to update your code with the latest changes from the main Compose repo, you can do that by running the commands below, which will merge the latest changes from the Compose `main` branch into your current local branch. You may need to resolve merge conflicts if there are conflicts between your changes and the upstream changes. After the merge, you will need to push the updates to your forked repo after running these commands.
  ```bash
  git fetch upstream
  git merge upstream/main
  ```
* Create a pull request to merge the changes from your forked repo branch into the Compose `main` branch. Creating the pull request will automatically run our continuous integration.
* If this is your first contribution, you will need to sign the Contributor License Agreement as directed.
* Update the "Future Release" section of the release notes (`docs/source/release_notes.rst`) to include your pull request and add your github username to the list of contributors.  Add a description of your PR to the subsection that most closely matches your contribution:
    * Enhancements: new features or additions to Compose.
    * Fixes: things like bugfixes or adding more descriptive error messages.
    * Changes: modifications to an existing part of Compose.
    * Documentation Changes
    * Testing Changes

   Documentation or testing changes rarely warrant an individual release notes entry; the PR number can be added to their respective "Miscellaneous changes" entries.
* We will review your changes, and you will most likely be asked to make additional changes before it is finally ready to merge. However, once it's reviewed by a maintainer of Compose, passes continuous integration, we will merge it, and you will have successfully contributed to Compose!

## Report issues
When reporting issues please include as much detail as possible about your operating system, Compose version and python version. Whenever possible, please also include a brief, self-contained code example that demonstrates the problem.


================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)


================================================
FILE: docs/make.bat
================================================
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.http://sphinx-doc.org/
	exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%

:end
popd


================================================
FILE: docs/source/_static/style.css
================================================
.footer {
    background-color: #0D2345;
    padding-bottom: 40px;
    padding-top: 40px;
    width: 100%;
}

.footer-cell-1 {
    grid-row: 1;
    grid-column: 1 / 3;
}

.footer-cell-2 {
    grid-row: 1;
    grid-column: 4;
    margin-bottom: 15px;
    text-align: right;
}

.footer-cell-3 {
    grid-row: 2;
    grid-column: 1 / 5;
}

.footer-cell-4 {
    grid-row: 3;
    grid-column: 1 / 3;
}

.footer-container {
    display: grid;
    margin-left: 10%;
    margin-right: 10%;
}

.footer-image-alteryx {
    padding-top: 22px;
    width: 270px;
}

.footer-image-copyright {
    width: 180px;
}

.footer-image-github {
    width: 50px;
}

.footer-image-twitter {
    width: 60px;
}

.footer-line {
    border-top: 2px solid white;
    margin-left: 7px;
    margin-right: 15px;
}


================================================
FILE: docs/source/_templates/class.rst
================================================
{{ fullname | escape | underline}}

.. currentmodule:: {{ module }}

.. autoclass:: {{ objname }}

   {% block methods %}
   {% if methods %}
   .. rubric:: Methods

   .. autosummary::
      :nosignatures:
      :toctree: methods

   {% for item in methods %}
   {%- if item not in inherited_members %}
      ~{{ name }}.{{ item }}
   {%- endif %}
   {%- endfor %}
   {% endif %}
   {% endblock %}


================================================
FILE: docs/source/_templates/layout.html
================================================
{% extends "!layout.html" %}

{%- block extrahead %}

<script>
  !function () {
    var analytics = window.analytics = window.analytics || []; if (!analytics.initialize) if (analytics.invoked) window.console && console.error && console.error("Segment snippet included twice."); else {
      analytics.invoked = !0; analytics.methods = ["trackSubmit", "trackClick", "trackLink", "trackForm", "pageview", "identify", "reset", "group", "track", "ready", "alias", "debug", "page", "once", "off", "on"]; analytics.factory = function (t) { return function () { var e = Array.prototype.slice.call(arguments); e.unshift(t); analytics.push(e); return analytics } }; for (var t = 0; t < analytics.methods.length; t++) { var e = analytics.methods[t]; analytics[e] = analytics.factory(e) } analytics.load = function (t, e) { var n = document.createElement("script"); n.type = "text/javascript"; n.async = !0; n.src = "https://cdn.segment.com/analytics.js/v1/" + t + "/analytics.min.js"; var a = document.getElementsByTagName("script")[0]; a.parentNode.insertBefore(n, a); analytics._loadOptions = e }; analytics.SNIPPET_VERSION = "4.1.0";
      analytics.load("ze8imyBlahLiQl1WxZCnHzhNWgviYKOn");
      analytics.page();
    }
  }();
</script>

{% set image = 'https://alteryx-oss-web-images.s3.amazonaws.com/compose_open_graph.png' %}
{% set description = 'A machine learning tool for automated prediction engineering' %}
{% if meta is defined %}
    {% if meta.description is defined %}
        {% set description = meta.description %}
    {% endif %}
{% endif %}

<meta property="og:title" content="{{ title|striptags|e }}{{ titlesuffix }}">
<meta content="{{description}}" />
<meta property="og:description" content="{{description}}">
<meta property="og:image" content="{{image}}">
<meta property="twitter:image" content="{{image}}">
<meta name="twitter:card" content="summary_large_image">

{% endblock %}

{%- block footer %}

<footer class="footer">
  <div class="footer-container">
    <div class="footer-cell-1">
      <img class="footer-image-alteryx" src="{{ pathto('_static/images/alteryx_open_source.svg', 1) }}" alt="Alteryx Open Source">
    </div>
    <div class="footer-cell-2">
      <a href="https://github.com/alteryx/compose" target="_blank">
        <img class="footer-image-github" src="{{ pathto('_static/images/github.svg', 1) }}" alt="GitHub">
      </a>
      <a href="https://twitter.com/AlteryxOSS" target="_blank">
        <img class="footer-image-twitter" src="{{ pathto('_static/images/twitter.svg', 1) }}" alt="Twitter">
      </a>
    </div>
    <div class="footer-cell-3">
      <hr class="footer-line">
    </div>
    <div class="footer-cell-4">
      <img class="footer-image-copyright" src="{{ pathto('_static/images/copyright.svg', 1) }}" alt="Copyright">
    </div>
  </div>
</footer>

{% endblock %}


================================================
FILE: docs/source/api_reference.rst
================================================
.. currentmodule:: composeml

=============
API Reference
=============

Label Maker
===========

.. autosummary::
    :toctree: generated
    :template: class.rst
    :nosignatures:

    LabelMaker

Label Times
============

.. autosummary::
    :toctree: generated
    :template: class.rst
    :nosignatures:

    LabelTimes

Transform Methods
-----------------

.. autosummary::
    :nosignatures:

    LabelTimes.apply_lead
    LabelTimes.bin
    LabelTimes.sample
    LabelTimes.threshold

.. currentmodule:: composeml.label_times.plots

Label Plots
===========

.. autosummary::
    :toctree: generated
    :template: class.rst
    :nosignatures:

    LabelPlots

Plotting Methods
----------------

.. autosummary::
    :nosignatures:

    LabelPlots.count_by_time
    LabelPlots.distribution


================================================
FILE: docs/source/conf.py
================================================
# -*- coding: utf-8 -*-
#
# Configuration file for the Sphinx documentation builder.
#
# This file does only contain a selection of the most common options. For a
# full list see the documentation:
# http://www.sphinx-doc.org/en/master/config

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))

from composeml import __version__ as version

# -- Project information -----------------------------------------------------

project = "Compose"
copyright = "2020, Alteryx, Inc."
author = "Alteryx, Inc."

# The full version, including alpha/beta/rc tags
release = version


# -- General configuration ---------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    "nbsphinx",
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "sphinx.ext.intersphinx",
    "sphinx.ext.napoleon",
    "sphinx.ext.viewcode",
    "sphinx.ext.extlinks",
    "sphinx_inline_tabs",
    "sphinx_copybutton",
    "myst_parser",
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
source_suffix = ".rst"

# The master toctree document.
master_doc = "index"

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ["**.ipynb_checkpoints"]

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = None


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages.  See the documentation for
# a list of builtin themes.
#

html_theme = "pydata_sphinx_theme"
html_logo = "images/compose_nav2.png"
html_favicon = "images/favicon.ico"
html_theme_options = {
    "icon_links": [
        {
            "name": "GitHub",
            "url": "https://github.com/alteryx/compose",
            "icon": "fab fa-github-square",
            "type": "fontawesome",
        },
        {
            "name": "Twitter",
            "url": "https://twitter.com/AlteryxOSS",
            "icon": "fab fa-twitter-square",
            "type": "fontawesome",
        },
        {
            "name": "Slack",
            "url": "https://join.slack.com/t/alteryx-oss/shared_invite/zt-182tyvuxv-NzIn6eiCEf8TBziuKp0bNA",
            "icon": "fab fa-slack",
            "type": "fontawesome",
        },
    ],
    "collapse_navigation": False,
    "navigation_depth": 2,
}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# The default sidebars (for documents that don't match any pattern) are
# defined by theme itself.  Builtin themes are using these templates by
# default: ``['localtoc.html', 'relations.html', 'sourcelink.html',
# 'searchbox.html']``.
#
# html_sidebars = {}


# -- Options for HTMLHelp output ---------------------------------------------

# Output file base name for HTML help builder.
htmlhelp_basename = "Composedoc"


# -- Options for LaTeX output ------------------------------------------------

latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    #
    # 'papersize': 'letterpaper',
    # The font size ('10pt', '11pt' or '12pt').
    #
    # 'pointsize': '10pt',
    # Additional stuff for the LaTeX preamble.
    #
    # 'preamble': '',
    # Latex figure (float) alignment
    #
    # 'figure_align': 'htbp',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
#  author, documentclass [howto, manual, or own class]).
latex_documents = [
    (master_doc, "Compose.tex", "Compose Documentation", "Alteryx, Inc.", "manual"),
]


# -- Options for manual page output ------------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [(master_doc, "composeml", "Compose Documentation", [author], 1)]


# -- Options for Texinfo output ----------------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
#  dir menu entry, description, category)
texinfo_documents = [
    (
        master_doc,
        "Compose",
        "Compose Documentation",
        author,
        "Compose",
        "One line description of project.",
        "Miscellaneous",
    ),
]


# -- Options for Epub output -------------------------------------------------

# Bibliographic Dublin Core info.
epub_title = project

# The unique identifier of the text. This can be a ISBN number
# or the project homepage.
#
# epub_identifier = ''

# A unique identification for the text.
#
# epub_uid = ''

# A list of files that should not be packed into the epub file.
epub_exclude_files = ["search.html"]

# -- Options for Markdown files ----------------------------------------------

myst_admonition_enable = True
myst_deflist_enable = True
myst_heading_anchors = 3

# -- Options for Sphinx Copy Button ------------------------------------------

copybutton_prompt_text = "myinputprompt"
copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.\.\.: | {5,8}: "
copybutton_prompt_is_regexp = True


# -- Extension configuration -------------------------------------------------

extlinks = {
    "issue": ("https://github.com/alteryx/compose/issues/%s", "#"),
    "pr": ("https://github.com/alteryx/compose/pull/%s", "#"),
    "user": ("https://github.com/%s", "@"),
}

autosummary_generate = ["api_reference.rst"]
templates_path = ["_templates"]


def setup(app):
    app.add_css_file("style.css")


html_show_sphinx = False


================================================
FILE: docs/source/examples/demo/__init__.py
================================================
import os
import warnings

warnings.filterwarnings("ignore")
PWD = os.path.dirname(__file__)


================================================
FILE: docs/source/examples/demo/chicago_bike/__init__.py
================================================
from demo import PWD
from pandas import read_csv
from os.path import join

PWD = join(PWD, "chicago_bike")


def _read(file):
    return read_csv(
        join(PWD, file),
        parse_dates=["starttime", "stoptime"],
        index_col="trip_id",
    )


def load_sample():
    return _read("sample.csv")


================================================
FILE: docs/source/examples/demo/chicago_bike/sample.csv
================================================
trip_id,gender,starttime,stoptime,tripduration,temperature,events,from_station_id,dpcapacity_start,to_station_id,dpcapacity_end
2331610,Female,2014-06-29 13:35:00,2014-06-29 13:56:00,20.75,82.9,cloudy,178,15.0,76,39.0
2347603,Female,2014-06-30 12:07:00,2014-06-30 12:37:00,30.15,82.0,cloudy,211,19.0,177,15.0
2345120,Male,2014-06-30 08:36:00,2014-06-30 08:43:00,6.516666666666668,75.0,cloudy,340,15.0,67,15.0
2347527,Male,2014-06-30 12:00:00,2014-06-30 12:08:00,7.25,82.0,cloudy,56,19.0,56,19.0
2344421,Male,2014-06-30 08:04:00,2014-06-30 08:11:00,7.316666666666666,75.0,cloudy,77,23.0,37,19.0
2336431,Male,2014-06-29 16:31:00,2014-06-29 16:53:00,21.816666666666666,84.9,cloudy,349,15.0,13,19.0
2351574,Male,2014-06-30 16:55:00,2014-06-30 17:07:00,11.783333333333333,84.0,cloudy,190,15.0,93,15.0
2351672,Male,2014-06-30 16:58:00,2014-06-30 17:13:00,15.533333333333333,84.0,cloudy,37,19.0,289,19.0
2351751,Male,2014-06-30 17:00:00,2014-06-30 17:07:00,7.566666666666666,84.0,cloudy,169,15.0,134,19.0
2331505,Female,2014-06-29 13:29:00,2014-06-29 13:36:00,7.716666666666668,82.9,cloudy,181,31.0,106,27.0
2336748,Male,2014-06-29 16:44:00,2014-06-29 17:05:00,21.08333333333333,84.9,cloudy,268,15.0,232,23.0
2350914,Male,2014-06-30 16:27:00,2014-06-30 16:36:00,8.3,84.0,cloudy,49,27.0,191,23.0
2341339,Female,2014-06-29 20:48:00,2014-06-29 21:15:00,26.88333333333333,81.0,cloudy,99,19.0,62,27.0
2352857,Female,2014-06-30 17:36:00,2014-06-30 17:40:00,4.4,84.0,cloudy,304,15.0,303,15.0
2340830,Female,2014-06-29 20:11:00,2014-06-29 20:19:00,7.5166666666666675,81.0,cloudy,123,15.0,116,15.0
2343445,Male,2014-06-30 07:00:00,2014-06-30 07:03:00,3.083333333333333,73.0,cloudy,74,23.0,48,27.0
2338352,Male,2014-06-29 17:59:00,2014-06-29 18:11:00,11.7,84.9,cloudy,84,19.0,134,19.0
2338548,Male,2014-06-29 18:07:00,2014-06-29 18:41:00,33.9,84.2,cloudy,349,15.0,73,19.0
2338565,Male,2014-06-29 18:08:00,2014-06-29 18:25:00,17.183333333333334,84.2,cloudy,176,19.0,152,15.0
2340701,Male,2014-06-29 19:58:00,2014-06-29 20:28:00,30.36666666666667,82.0,cloudy,234,19.0,293,19.0
2338809,Male,2014-06-29 18:22:00,2014-06-29 18:43:00,21.233333333333334,84.2,cloudy,254,15.0,249,15.0
2349824,Male,2014-06-30 15:15:00,2014-06-30 15:29:00,13.45,84.9,cloudy,283,23.0,186,15.0
2352795,Male,2014-06-30 17:34:00,2014-06-30 17:51:00,17.233333333333334,84.0,cloudy,168,19.0,168,19.0
2350998,Male,2014-06-30 16:26:00,2014-06-30 16:35:00,9.2,84.0,cloudy,52,31.0,91,31.0
2349370,Male,2014-06-30 14:37:00,2014-06-30 14:48:00,11.433333333333335,87.1,cloudy,290,15.0,213,15.0
2354718,Male,2014-06-30 19:49:00,2014-06-30 20:07:00,18.733333333333334,73.0,tstorms,55,15.0,55,15.0
2351674,Male,2014-06-30 16:58:00,2014-06-30 17:06:00,8.1,84.0,cloudy,264,19.0,66,19.0
2338317,Male,2014-06-29 17:57:00,2014-06-29 18:34:00,37.3,84.9,cloudy,263,11.0,75,23.0
2342680,Male,2014-06-29 23:22:00,2014-06-29 23:28:00,5.416666666666668,78.1,cloudy,350,15.0,214,15.0
2341145,Male,2014-06-29 20:33:00,2014-06-29 21:02:00,28.766666666666666,81.0,cloudy,341,19.0,150,11.0
2352020,Male,2014-06-30 17:06:00,2014-06-30 17:15:00,9.116666666666667,84.0,cloudy,264,19.0,212,31.0
2334784,Male,2014-06-29 15:27:00,2014-06-29 15:54:00,26.91666666666667,82.9,cloudy,324,15.0,295,15.0
2344311,Male,2014-06-30 07:57:00,2014-06-30 08:11:00,14.133333333333333,73.0,cloudy,75,23.0,35,39.0
2343074,Male,2014-06-30 05:35:00,2014-06-30 05:41:00,5.9833333333333325,73.0,cloudy,332,15.0,327,19.0
2341720,Female,2014-06-29 21:15:00,2014-06-29 21:19:00,3.5166666666666666,79.0,cloudy,248,15.0,322,15.0
2353889,Male,2014-06-30 18:18:00,2014-06-30 18:21:00,2.9166666666666665,82.0,tstorms,309,11.0,158,15.0
2342764,Male,2014-06-29 23:50:00,2014-06-30 00:13:00,23.48333333333333,78.1,cloudy,244,19.0,303,15.0
2352400,Female,2014-06-30 17:20:00,2014-06-30 17:42:00,21.95,84.0,cloudy,81,39.0,340,15.0
2331873,Male,2014-06-29 13:46:00,2014-06-29 14:10:00,24.91666666666667,82.9,cloudy,156,15.0,94,19.0
2351727,Male,2014-06-30 16:59:00,2014-06-30 17:16:00,16.633333333333333,84.0,cloudy,134,19.0,69,19.0
2345935,Male,2014-06-30 09:22:00,2014-06-30 09:30:00,7.6,78.1,cloudy,255,31.0,90,35.0
2353412,Male,2014-06-30 17:56:00,2014-06-30 18:23:00,26.88333333333333,84.0,cloudy,51,31.0,223,15.0
2349890,Male,2014-06-30 15:22:00,2014-06-30 15:30:00,8.0,84.9,cloudy,181,31.0,106,27.0
2336202,Male,2014-06-29 16:22:00,2014-06-29 16:29:00,6.833333333333332,84.9,cloudy,268,15.0,143,15.0
2332401,Male,2014-06-29 14:04:00,2014-06-29 14:32:00,28.3,84.0,cloudy,312,15.0,94,19.0
2348888,Male,2014-06-30 13:56:00,2014-06-30 13:59:00,3.15,84.9,cloudy,60,19.0,93,15.0
2347827,Male,2014-06-30 12:20:00,2014-06-30 12:31:00,11.85,82.0,cloudy,287,27.0,291,19.0
2335752,Male,2014-06-29 16:05:00,2014-06-29 16:12:00,7.05,84.9,cloudy,144,15.0,87,19.0
2346051,Male,2014-06-30 09:33:00,2014-06-30 09:49:00,16.85,78.1,cloudy,120,15.0,51,31.0
2348804,Male,2014-06-30 13:49:00,2014-06-30 13:57:00,8.366666666666667,84.9,cloudy,51,31.0,26,31.0
2337626,Male,2014-06-29 17:22:00,2014-06-29 17:30:00,8.1,84.9,cloudy,16,11.0,309,11.0
2336628,Male,2014-06-29 16:39:00,2014-06-29 16:55:00,16.25,84.9,cloudy,154,15.0,69,19.0
2349449,Male,2014-06-30 14:44:00,2014-06-30 14:49:00,5.466666666666668,87.1,cloudy,185,11.0,290,15.0
2352026,Male,2014-06-30 17:08:00,2014-06-30 17:18:00,9.366666666666667,84.0,cloudy,195,31.0,91,31.0
2339115,Male,2014-06-29 18:36:00,2014-06-29 18:41:00,4.35,84.2,cloudy,114,27.0,232,23.0
2345532,Male,2014-06-30 08:56:00,2014-06-30 09:00:00,3.9166666666666665,75.0,cloudy,174,23.0,98,15.0
2332243,Male,2014-06-29 13:58:00,2014-06-29 14:24:00,26.33333333333333,82.9,cloudy,333,15.0,93,15.0
2346818,Male,2014-06-30 10:52:00,2014-06-30 10:59:00,6.833333333333332,78.1,cloudy,48,27.0,291,19.0
2353493,Male,2014-06-30 17:59:00,2014-06-30 18:17:00,17.933333333333334,84.0,cloudy,66,19.0,69,19.0
2337808,Female,2014-06-29 17:31:00,2014-06-29 17:47:00,16.416666666666668,84.9,cloudy,35,39.0,255,31.0
2354440,Male,2014-06-30 18:52:00,2014-06-30 18:56:00,4.166666666666667,82.0,tstorms,75,23.0,198,19.0
2344857,Male,2014-06-30 08:25:00,2014-06-30 08:35:00,10.116666666666667,75.0,cloudy,77,23.0,37,19.0
2338197,Male,2014-06-29 17:51:00,2014-06-29 18:09:00,18.766666666666666,84.9,cloudy,177,15.0,99,19.0
2349939,Male,2014-06-30 15:24:00,2014-06-30 15:30:00,5.4,84.9,cloudy,72,15.0,338,15.0
2347092,Male,2014-06-30 11:22:00,2014-06-30 11:32:00,9.733333333333333,79.0,cloudy,110,23.0,194,11.0
2347136,Female,2014-06-30 11:26:00,2014-06-30 11:31:00,4.966666666666667,79.0,cloudy,74,23.0,181,31.0
2342887,Male,2014-06-30 00:38:00,2014-06-30 00:43:00,5.616666666666666,78.1,cloudy,226,15.0,300,15.0
2344767,Female,2014-06-30 08:20:00,2014-06-30 08:41:00,21.116666666666667,75.0,cloudy,220,19.0,173,15.0
2351335,Male,2014-06-30 16:47:00,2014-06-30 16:53:00,6.1,84.0,cloudy,51,31.0,192,39.0
2348169,Male,2014-06-30 12:54:00,2014-06-30 13:02:00,8.4,82.0,cloudy,66,19.0,110,23.0
2354487,Male,2014-06-30 18:56:00,2014-06-30 19:05:00,8.833333333333334,82.0,tstorms,118,19.0,34,15.0
2345814,Male,2014-06-30 09:13:00,2014-06-30 09:30:00,16.916666666666668,78.1,cloudy,24,15.0,90,35.0
2344619,Male,2014-06-30 08:14:00,2014-06-30 08:38:00,23.66666666666667,75.0,cloudy,131,15.0,110,23.0
2336070,Female,2014-06-29 16:16:00,2014-06-29 16:36:00,19.983333333333334,84.9,cloudy,97,35.0,137,15.0
2353002,Male,2014-06-30 17:40:00,2014-06-30 17:46:00,5.133333333333334,84.0,cloudy,210,19.0,183,15.0
2334944,Male,2014-06-29 15:33:00,2014-06-29 16:09:00,35.65,82.9,cloudy,249,15.0,234,19.0
2352711,Male,2014-06-30 17:31:00,2014-06-30 17:41:00,9.916666666666666,84.0,cloudy,212,31.0,192,39.0
2346848,Male,2014-06-30 10:55:00,2014-06-30 11:02:00,7.116666666666666,78.1,cloudy,240,23.0,117,23.0
2334688,Male,2014-06-29 15:23:00,2014-06-29 15:46:00,23.466666666666665,82.9,cloudy,324,15.0,85,23.0
2346252,Female,2014-06-30 09:50:00,2014-06-30 09:58:00,8.016666666666667,78.1,cloudy,144,15.0,60,19.0
2343617,Male,2014-06-30 07:16:00,2014-06-30 07:22:00,5.85,73.0,cloudy,292,11.0,229,19.0
2344957,Male,2014-06-30 08:29:00,2014-06-30 08:40:00,11.433333333333335,75.0,cloudy,232,23.0,251,15.0
2339764,Male,2014-06-29 19:12:00,2014-06-29 19:28:00,16.116666666666667,82.0,cloudy,164,23.0,30,15.0
2339479,Female,2014-06-29 18:54:00,2014-06-29 19:13:00,18.666666666666668,84.2,cloudy,225,15.0,157,15.0
2344240,Female,2014-06-30 07:52:00,2014-06-30 08:01:00,9.35,73.0,cloudy,310,11.0,87,19.0
2351425,Female,2014-06-30 16:50:00,2014-06-30 16:52:00,2.083333333333333,84.0,cloudy,314,15.0,244,19.0
2351605,Male,2014-06-30 16:53:00,2014-06-30 17:05:00,11.95,84.0,cloudy,120,15.0,279,15.0
2334617,Male,2014-06-29 15:21:00,2014-06-29 15:59:00,38.05,82.9,cloudy,346,15.0,215,15.0
2338547,Male,2014-06-29 18:07:00,2014-06-29 18:21:00,14.466666666666667,84.2,cloudy,176,19.0,71,15.0
2340148,Male,2014-06-29 19:33:00,2014-06-29 19:41:00,7.466666666666668,82.0,cloudy,94,19.0,127,15.0
2354544,Male,2014-06-30 19:02:00,2014-06-30 19:09:00,6.716666666666668,73.0,tstorms,344,15.0,234,19.0
2354549,Male,2014-06-30 19:03:00,2014-06-30 19:09:00,6.65,73.0,tstorms,114,27.0,347,15.0
2343237,Male,2014-06-30 06:35:00,2014-06-30 06:52:00,17.6,73.0,cloudy,168,19.0,76,39.0
2353805,Male,2014-06-30 18:14:00,2014-06-30 18:19:00,5.633333333333334,82.0,tstorms,91,31.0,80,19.0
2345494,Female,2014-06-30 08:55:00,2014-06-30 09:05:00,9.933333333333334,75.0,cloudy,192,39.0,43,43.0
2345516,Male,2014-06-30 08:55:00,2014-06-30 09:00:00,5.166666666666667,75.0,cloudy,77,23.0,80,19.0
2349455,Female,2014-06-30 14:44:00,2014-06-30 15:04:00,19.85,87.1,cloudy,71,15.0,291,19.0
2352905,Male,2014-06-30 17:38:00,2014-06-30 18:01:00,23.4,84.0,cloudy,90,35.0,274,15.0
2349049,Male,2014-06-30 14:10:00,2014-06-30 14:24:00,13.366666666666667,87.1,cloudy,44,27.0,198,19.0
2333292,Male,2014-06-29 14:34:00,2014-06-29 14:49:00,15.1,84.0,cloudy,115,23.0,165,19.0
2342336,Male,2014-06-29 22:28:00,2014-06-29 22:38:00,9.45,78.1,cloudy,154,15.0,246,11.0
2345149,Male,2014-06-30 08:38:00,2014-06-30 08:47:00,9.3,75.0,cloudy,59,19.0,170,15.0
2338660,Female,2014-06-29 18:12:00,2014-06-29 18:43:00,31.0,84.2,cloudy,177,15.0,332,15.0
2341225,Male,2014-06-29 20:39:00,2014-06-29 21:14:00,35.016666666666666,81.0,cloudy,85,23.0,14,15.0
2343302,Female,2014-06-30 06:41:00,2014-06-30 06:48:00,6.8,73.0,cloudy,276,11.0,69,19.0
2344562,Male,2014-06-30 08:12:00,2014-06-30 08:18:00,6.65,75.0,cloudy,43,43.0,174,23.0
2343592,Female,2014-06-30 07:09:00,2014-06-30 07:36:00,26.65,73.0,cloudy,297,15.0,93,15.0
2344473,Male,2014-06-30 08:06:00,2014-06-30 08:13:00,6.45,75.0,cloudy,287,27.0,91,31.0
2350673,Male,2014-06-30 16:12:00,2014-06-30 16:25:00,12.133333333333333,84.0,cloudy,51,31.0,301,19.0
2353814,Male,2014-06-30 18:14:00,2014-06-30 18:29:00,15.216666666666667,82.0,tstorms,168,19.0,43,43.0
2332398,Female,2014-06-29 14:04:00,2014-06-29 14:10:00,5.766666666666668,84.0,cloudy,209,11.0,120,15.0
2334744,Male,2014-06-29 15:26:00,2014-06-29 15:34:00,8.25,82.9,cloudy,198,19.0,66,19.0
2348254,Male,2014-06-30 13:03:00,2014-06-30 13:14:00,11.716666666666667,84.9,cloudy,318,15.0,311,15.0
2343661,Male,2014-06-30 07:20:00,2014-06-30 07:30:00,10.516666666666667,73.0,cloudy,15,15.0,280,11.0
2353389,Male,2014-06-30 17:55:00,2014-06-30 18:02:00,6.8,84.0,cloudy,210,19.0,305,15.0
2348103,Female,2014-06-30 12:50:00,2014-06-30 12:56:00,5.9,82.0,cloudy,113,15.0,331,19.0
2353703,Male,2014-06-30 18:09:00,2014-06-30 18:32:00,23.48333333333333,82.0,tstorms,52,31.0,16,11.0
2343517,Male,2014-06-30 07:08:00,2014-06-30 07:18:00,10.016666666666667,73.0,cloudy,50,27.0,195,31.0
2351496,Female,2014-06-30 16:52:00,2014-06-30 16:59:00,6.55,84.0,cloudy,244,19.0,308,11.0
2332643,Female,2014-06-29 14:11:00,2014-06-29 14:20:00,9.166666666666666,84.0,cloudy,291,19.0,212,31.0
2352918,Female,2014-06-30 17:38:00,2014-06-30 17:49:00,11.366666666666667,84.0,cloudy,174,23.0,90,35.0
2345454,Male,2014-06-30 08:52:00,2014-06-30 08:58:00,6.0,75.0,cloudy,36,31.0,100,23.0
2352036,Female,2014-06-30 17:08:00,2014-06-30 17:13:00,4.733333333333333,84.0,cloudy,130,15.0,16,11.0
2340128,Male,2014-06-29 19:32:00,2014-06-29 19:49:00,17.583333333333332,82.0,cloudy,76,39.0,341,19.0
2352250,Male,2014-06-30 17:15:00,2014-06-30 17:20:00,5.15,84.0,cloudy,118,19.0,288,11.0
2344345,Male,2014-06-30 08:00:00,2014-06-30 08:07:00,7.216666666666668,75.0,cloudy,37,19.0,194,11.0
2338757,Male,2014-06-29 18:18:00,2014-06-29 18:34:00,15.5,84.2,cloudy,174,23.0,22,15.0
2343113,Male,2014-06-30 06:02:00,2014-06-30 06:06:00,4.033333333333333,73.0,cloudy,153,19.0,115,23.0
2337013,Female,2014-06-29 16:54:00,2014-06-29 17:20:00,25.7,84.9,cloudy,118,19.0,326,11.0
2339012,Female,2014-06-29 18:31:00,2014-06-29 19:10:00,38.38333333333333,84.2,cloudy,34,15.0,114,27.0
2343454,Male,2014-06-30 07:02:00,2014-06-30 07:06:00,4.633333333333334,73.0,cloudy,69,19.0,160,15.0
2332706,Female,2014-06-29 14:14:00,2014-06-29 14:17:00,3.4166666666666665,84.0,cloudy,302,19.0,152,15.0
2353305,Male,2014-06-30 17:51:00,2014-06-30 17:56:00,5.0,84.0,cloudy,195,31.0,81,39.0
2344660,Male,2014-06-30 08:16:00,2014-06-30 08:24:00,8.55,75.0,cloudy,191,23.0,181,31.0
2352761,Male,2014-06-30 17:33:00,2014-06-30 17:40:00,7.2,84.0,cloudy,71,15.0,75,23.0
2353277,Male,2014-06-30 17:50:00,2014-06-30 18:08:00,17.616666666666667,84.0,cloudy,198,19.0,183,15.0
2340101,Male,2014-06-29 19:30:00,2014-06-29 19:48:00,18.366666666666667,82.0,cloudy,286,23.0,130,15.0
2351677,Female,2014-06-30 16:57:00,2014-06-30 17:07:00,9.466666666666667,84.0,cloudy,261,15.0,21,15.0
2346809,Male,2014-06-30 10:52:00,2014-06-30 11:08:00,16.4,78.1,cloudy,303,15.0,238,15.0
2351366,Male,2014-06-30 16:48:00,2014-06-30 17:12:00,23.83333333333333,84.0,cloudy,126,15.0,158,15.0
2337456,Female,2014-06-29 17:14:00,2014-06-29 17:17:00,3.2666666666666666,84.9,cloudy,219,11.0,310,11.0
2345074,Male,2014-06-30 08:34:00,2014-06-30 08:38:00,3.966666666666667,75.0,cloudy,272,11.0,147,15.0
2347405,Male,2014-06-30 11:47:00,2014-06-30 11:54:00,6.4,79.0,cloudy,19,15.0,342,15.0
2339183,Male,2014-06-29 18:41:00,2014-06-29 19:00:00,19.233333333333334,84.2,cloudy,177,15.0,26,31.0
2352683,Female,2014-06-30 17:28:00,2014-06-30 17:47:00,19.683333333333334,84.0,cloudy,91,31.0,255,31.0
2347624,Male,2014-06-30 12:09:00,2014-06-30 12:20:00,11.8,82.0,cloudy,112,15.0,53,19.0
2338311,Female,2014-06-29 17:57:00,2014-06-29 18:11:00,14.2,84.9,cloudy,56,19.0,57,15.0
2351400,Male,2014-06-30 16:50:00,2014-06-30 17:04:00,14.583333333333336,84.0,cloudy,106,27.0,91,31.0
2351143,Male,2014-06-30 16:38:00,2014-06-30 16:45:00,7.216666666666668,84.0,cloudy,264,19.0,164,23.0
2339778,Female,2014-06-29 19:13:00,2014-06-29 19:21:00,8.016666666666667,82.0,cloudy,344,15.0,297,15.0
2334632,Male,2014-06-29 15:22:00,2014-06-29 15:37:00,15.033333333333333,82.9,cloudy,334,19.0,289,19.0
2350917,Male,2014-06-30 16:27:00,2014-06-30 16:29:00,2.066666666666667,84.0,cloudy,152,15.0,302,19.0
2341760,Female,2014-06-29 21:20:00,2014-06-29 21:26:00,6.216666666666668,79.0,cloudy,300,15.0,329,15.0
2332198,Male,2014-06-29 13:56:00,2014-06-29 14:12:00,15.716666666666667,82.9,cloudy,315,11.0,290,15.0
2344977,Male,2014-06-30 08:30:00,2014-06-30 08:34:00,3.95,75.0,cloudy,28,15.0,118,19.0
2352320,Female,2014-06-30 17:17:00,2014-06-30 17:45:00,27.73333333333333,84.0,cloudy,249,15.0,324,15.0
2338104,Male,2014-06-29 17:44:00,2014-06-29 18:04:00,19.9,84.9,cloudy,165,19.0,141,23.0
2341930,Female,2014-06-29 21:36:00,2014-06-29 21:47:00,11.2,79.0,cloudy,268,15.0,113,15.0
2348512,Female,2014-06-30 13:25:00,2014-06-30 13:36:00,10.166666666666666,84.9,cloudy,20,15.0,92,19.0
2352406,Male,2014-06-30 17:20:00,2014-06-30 17:25:00,4.733333333333333,84.0,cloudy,284,23.0,321,19.0
2338071,Male,2014-06-29 17:43:00,2014-06-29 18:08:00,25.08333333333333,84.9,cloudy,16,11.0,340,15.0
2352448,Male,2014-06-30 17:21:00,2014-06-30 17:48:00,27.0,84.0,cloudy,164,23.0,15,15.0
2343943,Male,2014-06-30 07:39:00,2014-06-30 07:50:00,11.316666666666665,73.0,cloudy,301,19.0,49,27.0
2352087,Male,2014-06-30 17:07:00,2014-06-30 17:23:00,16.05,84.0,cloudy,91,31.0,138,15.0
2339857,Male,2014-06-29 19:17:00,2014-06-29 19:36:00,19.533333333333328,82.0,cloudy,117,23.0,324,15.0
2332104,Female,2014-06-29 13:54:00,2014-06-29 14:17:00,23.5,82.9,cloudy,277,15.0,199,15.0
2350106,Male,2014-06-30 15:35:00,2014-06-30 15:51:00,16.366666666666667,84.9,cloudy,33,27.0,84,19.0
2344762,Male,2014-06-30 08:20:00,2014-06-30 08:29:00,8.566666666666666,75.0,cloudy,43,43.0,174,23.0
2351014,Female,2014-06-30 16:31:00,2014-06-30 16:43:00,12.25,84.0,cloudy,110,23.0,192,39.0
2338425,Female,2014-06-29 18:02:00,2014-06-29 18:14:00,11.483333333333333,84.2,cloudy,255,31.0,90,35.0
2343236,Female,2014-06-30 06:35:00,2014-06-30 06:40:00,5.016666666666667,73.0,cloudy,190,15.0,67,15.0
2339713,Female,2014-06-29 19:09:00,2014-06-29 19:21:00,11.316666666666665,82.0,cloudy,302,19.0,230,19.0
2331669,Female,2014-06-29 13:38:00,2014-06-29 13:52:00,13.916666666666664,82.9,cloudy,13,19.0,20,15.0
2353506,Male,2014-06-30 18:00:00,2014-06-30 18:09:00,9.416666666666666,82.0,tstorms,177,15.0,156,15.0
2352319,Female,2014-06-30 17:17:00,2014-06-30 17:26:00,9.35,84.0,cloudy,43,43.0,5,19.0
2352719,Male,2014-06-30 17:31:00,2014-06-30 17:37:00,5.866666666666666,84.0,cloudy,138,15.0,289,19.0
2336553,Female,2014-06-29 16:36:00,2014-06-29 16:48:00,12.416666666666664,84.9,cloudy,332,15.0,250,19.0
2353056,Female,2014-06-30 17:42:00,2014-06-30 17:51:00,8.766666666666667,84.0,cloudy,176,19.0,94,19.0
2335260,Male,2014-06-29 15:47:00,2014-06-29 16:13:00,26.1,82.9,cloudy,264,19.0,268,15.0
2351678,Female,2014-06-30 16:58:00,2014-06-30 17:01:00,3.333333333333333,84.0,cloudy,213,15.0,159,9.0
2340477,Male,2014-06-29 19:51:00,2014-06-29 20:05:00,13.916666666666664,82.0,cloudy,327,19.0,299,15.0
2341171,Male,2014-06-29 20:35:00,2014-06-29 20:46:00,10.983333333333333,81.0,cloudy,93,15.0,153,19.0
2350877,Female,2014-06-30 16:25:00,2014-06-30 16:40:00,14.716666666666667,84.0,cloudy,137,15.0,280,11.0
2351710,Male,2014-06-30 16:59:00,2014-06-30 17:03:00,4.633333333333334,84.0,cloudy,37,19.0,192,39.0
2334946,Male,2014-06-29 15:33:00,2014-06-29 15:47:00,13.3,82.9,cloudy,343,15.0,177,15.0
2346480,Male,2014-06-30 10:17:00,2014-06-30 10:42:00,24.88333333333333,78.1,cloudy,272,11.0,284,23.0
2335266,Female,2014-06-29 15:47:00,2014-06-29 15:57:00,9.633333333333333,82.9,cloudy,343,15.0,93,15.0
2351856,Male,2014-06-30 17:04:00,2014-06-30 17:13:00,9.333333333333334,84.0,cloudy,195,31.0,91,31.0
2339206,Female,2014-06-29 18:42:00,2014-06-29 19:05:00,23.65,84.2,cloudy,176,19.0,329,15.0
2343736,Male,2014-06-30 07:25:00,2014-06-30 07:37:00,11.983333333333333,73.0,cloudy,291,19.0,52,31.0
2342166,Female,2014-06-29 22:03:00,2014-06-29 22:10:00,6.6,78.1,cloudy,326,11.0,242,15.0
2347637,Male,2014-06-30 12:09:00,2014-06-30 12:22:00,12.55,82.0,cloudy,24,15.0,91,31.0
2351226,Male,2014-06-30 16:42:00,2014-06-30 17:00:00,17.266666666666666,84.0,cloudy,51,31.0,61,15.0
2346964,Male,2014-06-30 11:09:00,2014-06-30 11:13:00,4.066666666666666,79.0,cloudy,196,19.0,47,19.0
2346583,Male,2014-06-30 10:31:00,2014-06-30 10:46:00,14.5,78.1,cloudy,207,15.0,108,19.0
2346569,Male,2014-06-30 10:30:00,2014-06-30 10:56:00,25.73333333333333,78.1,cloudy,313,19.0,313,19.0
2340978,Male,2014-06-29 20:22:00,2014-06-29 20:41:00,19.016666666666666,81.0,cloudy,268,15.0,268,15.0
2336751,Male,2014-06-29 16:44:00,2014-06-29 16:53:00,8.683333333333334,84.9,cloudy,301,19.0,94,19.0
2338598,Male,2014-06-29 18:08:00,2014-06-29 18:17:00,8.316666666666666,84.2,cloudy,330,19.0,114,27.0
2333013,Male,2014-06-29 14:24:00,2014-06-29 14:33:00,8.733333333333333,84.0,cloudy,250,19.0,156,15.0
2345612,Male,2014-06-30 08:59:00,2014-06-30 09:28:00,29.11666666666667,75.0,cloudy,119,19.0,20,15.0
2341697,Female,2014-06-29 21:13:00,2014-06-29 21:23:00,10.133333333333333,79.0,cloudy,144,15.0,115,23.0
2346246,Male,2014-06-30 09:50:00,2014-06-30 10:04:00,14.4,78.1,cloudy,168,19.0,287,27.0
2352797,Male,2014-06-30 17:34:00,2014-06-30 17:43:00,8.516666666666667,84.0,cloudy,286,23.0,90,35.0
2351388,Male,2014-06-30 16:49:00,2014-06-30 17:09:00,19.866666666666667,84.0,cloudy,66,19.0,273,15.0
2342788,Male,2014-06-29 23:53:00,2014-06-30 00:06:00,12.8,78.1,cloudy,67,15.0,117,23.0
2331967,Male,2014-06-29 13:49:00,2014-06-29 13:53:00,4.633333333333334,82.9,cloudy,195,31.0,51,31.0
2343373,Male,2014-06-30 06:49:00,2014-06-30 07:00:00,10.533333333333333,73.0,cloudy,174,23.0,26,31.0
2345782,Male,2014-06-30 09:11:00,2014-06-30 09:23:00,11.333333333333336,78.1,cloudy,199,15.0,283,23.0
2351848,Male,2014-06-30 17:03:00,2014-06-30 17:26:00,23.266666666666666,84.0,cloudy,286,23.0,25,23.0
2351827,Male,2014-06-30 17:03:00,2014-06-30 17:06:00,3.6666666666666665,84.0,cloudy,134,19.0,192,39.0
2353711,Male,2014-06-30 18:09:00,2014-06-30 18:19:00,9.3,82.0,tstorms,69,19.0,123,15.0
2351929,Male,2014-06-30 17:06:00,2014-06-30 17:12:00,5.783333333333332,84.0,cloudy,158,15.0,16,11.0
2351752,Male,2014-06-30 16:57:00,2014-06-30 17:18:00,21.133333333333333,84.0,cloudy,75,23.0,305,15.0
2343398,Male,2014-06-30 06:53:00,2014-06-30 06:57:00,3.566666666666667,73.0,cloudy,192,39.0,283,23.0
2343962,Male,2014-06-30 07:38:00,2014-06-30 07:40:00,2.3,73.0,cloudy,93,15.0,60,19.0
2344075,Male,2014-06-30 07:47:00,2014-06-30 07:54:00,6.85,73.0,cloudy,66,19.0,47,19.0
2332299,Male,2014-06-29 14:00:00,2014-06-29 14:17:00,16.333333333333332,84.0,cloudy,177,15.0,249,15.0
2343187,Male,2014-06-30 06:28:00,2014-06-30 06:38:00,10.033333333333333,73.0,cloudy,190,15.0,20,15.0
2344731,Male,2014-06-30 08:20:00,2014-06-30 08:32:00,12.45,75.0,cloudy,75,23.0,51,31.0
2350115,Male,2014-06-30 15:35:00,2014-06-30 15:56:00,21.48333333333333,84.9,cloudy,36,31.0,350,15.0
2350788,Male,2014-06-30 16:20:00,2014-06-30 16:32:00,11.95,84.0,cloudy,149,11.0,149,11.0
2344797,Male,2014-06-30 08:22:00,2014-06-30 08:25:00,3.3,75.0,cloudy,316,19.0,344,15.0
2342324,Male,2014-06-29 22:26:00,2014-06-29 22:38:00,11.233333333333333,78.1,cloudy,120,15.0,280,11.0
2333178,Female,2014-06-29 14:30:00,2014-06-29 14:53:00,23.08333333333333,84.0,cloudy,150,11.0,247,15.0
2346655,Male,2014-06-30 10:37:00,2014-06-30 11:23:00,46.48333333333333,78.1,cloudy,157,15.0,164,23.0
2343777,Male,2014-06-30 07:29:00,2014-06-30 07:45:00,15.7,73.0,cloudy,192,39.0,120,15.0
2354481,Female,2014-06-30 18:55:00,2014-06-30 19:19:00,23.9,82.0,tstorms,232,23.0,156,15.0
2351837,Male,2014-06-30 17:03:00,2014-06-30 17:19:00,15.733333333333333,84.0,cloudy,28,15.0,350,15.0
2350694,Male,2014-06-30 16:14:00,2014-06-30 16:18:00,4.5,84.0,cloudy,181,31.0,111,19.0
2333345,Male,2014-06-29 14:36:00,2014-06-29 14:57:00,20.933333333333334,84.0,cloudy,177,15.0,312,15.0
2352571,Male,2014-06-30 17:26:00,2014-06-30 17:38:00,12.366666666666667,84.0,cloudy,93,15.0,258,19.0
2351868,Male,2014-06-30 17:04:00,2014-06-30 17:08:00,4.116666666666666,84.0,cloudy,195,31.0,43,43.0
2347621,Male,2014-06-30 12:09:00,2014-06-30 12:33:00,24.86666666666667,82.0,cloudy,77,23.0,301,19.0
2353457,Male,2014-06-30 17:58:00,2014-06-30 18:08:00,9.9,84.0,cloudy,152,15.0,227,15.0
2348064,Male,2014-06-30 12:47:00,2014-06-30 12:54:00,6.966666666666668,82.0,cloudy,198,19.0,84,19.0
2350322,Female,2014-06-30 15:50:00,2014-06-30 16:03:00,12.55,84.9,cloudy,100,23.0,186,15.0
2343897,Male,2014-06-30 07:37:00,2014-06-30 07:54:00,17.2,73.0,cloudy,220,19.0,53,19.0
2339298,Male,2014-06-29 18:46:00,2014-06-29 19:01:00,14.45,84.2,cloudy,260,19.0,130,15.0
2344057,Female,2014-06-30 07:46:00,2014-06-30 07:52:00,5.7,73.0,cloudy,239,15.0,344,15.0
2344853,Male,2014-06-30 08:25:00,2014-06-30 08:39:00,14.433333333333335,75.0,cloudy,48,27.0,134,19.0
2352075,Male,2014-06-30 17:10:00,2014-06-30 17:18:00,8.416666666666666,84.0,cloudy,43,43.0,174,23.0
2350492,Male,2014-06-30 16:01:00,2014-06-30 16:27:00,26.45,84.0,cloudy,100,23.0,127,15.0
2354454,Female,2014-06-30 18:53:00,2014-06-30 19:09:00,16.133333333333333,82.0,tstorms,17,15.0,305,15.0
2337105,Male,2014-06-29 16:59:00,2014-06-29 17:23:00,24.216666666666665,84.9,cloudy,177,15.0,177,15.0
2340733,Male,2014-06-29 20:04:00,2014-06-29 20:20:00,15.333333333333336,81.0,cloudy,35,39.0,45,15.0
2334807,Male,2014-06-29 15:28:00,2014-06-29 15:36:00,7.65,82.9,cloudy,205,15.0,14,15.0
2351822,Male,2014-06-30 17:03:00,2014-06-30 17:15:00,12.516666666666667,84.0,cloudy,261,15.0,77,23.0
2343839,Male,2014-06-30 07:34:00,2014-06-30 07:40:00,5.916666666666668,73.0,cloudy,165,19.0,117,23.0
2344144,Male,2014-06-30 07:50:00,2014-06-30 07:55:00,5.1,73.0,cloudy,130,15.0,213,15.0
2340841,Male,2014-06-29 20:12:00,2014-06-29 20:21:00,9.45,81.0,cloudy,174,23.0,22,15.0
2345813,Male,2014-06-30 09:13:00,2014-06-30 09:22:00,9.2,78.1,cloudy,66,19.0,48,27.0
2340464,Male,2014-06-29 19:50:00,2014-06-29 19:59:00,8.116666666666667,82.0,cloudy,110,23.0,26,31.0
2347532,Male,2014-06-30 12:01:00,2014-06-30 12:14:00,13.3,82.0,cloudy,51,31.0,255,31.0
2345015,Male,2014-06-30 08:31:00,2014-06-30 08:58:00,26.66666666666667,75.0,cloudy,94,19.0,174,23.0
2338571,Female,2014-06-29 18:08:00,2014-06-29 18:28:00,20.45,84.2,cloudy,258,19.0,289,19.0
2352362,Male,2014-06-30 17:19:00,2014-06-30 17:30:00,11.433333333333335,84.0,cloudy,100,23.0,175,19.0
2353464,Male,2014-06-30 17:58:00,2014-06-30 18:02:00,3.616666666666666,84.0,cloudy,69,19.0,315,11.0
2353853,Female,2014-06-30 18:16:00,2014-06-30 18:23:00,6.666666666666668,82.0,tstorms,113,15.0,144,15.0
2352361,Male,2014-06-30 17:18:00,2014-06-30 17:34:00,15.35,84.0,cloudy,287,27.0,28,15.0
2337845,Female,2014-06-29 17:32:00,2014-06-29 17:37:00,4.65,84.9,cloudy,17,15.0,183,15.0
2345287,Male,2014-06-30 08:43:00,2014-06-30 08:47:00,3.583333333333333,75.0,cloudy,343,15.0,67,15.0
2346267,Male,2014-06-30 09:53:00,2014-06-30 10:13:00,20.33333333333333,78.1,cloudy,141,23.0,37,19.0
2335819,Male,2014-06-29 16:08:00,2014-06-29 16:21:00,12.866666666666667,84.9,cloudy,289,19.0,152,15.0
2354722,Male,2014-06-30 19:53:00,2014-06-30 20:04:00,11.2,73.0,tstorms,135,11.0,278,15.0
2339292,Female,2014-06-29 18:46:00,2014-06-29 18:57:00,11.2,84.2,cloudy,274,15.0,57,15.0
2346354,Male,2014-06-30 10:05:00,2014-06-30 10:15:00,10.15,78.1,cloudy,146,11.0,50,27.0
2347951,Female,2014-06-30 12:37:00,2014-06-30 13:05:00,27.53333333333333,82.0,cloudy,196,19.0,349,15.0
2339736,Male,2014-06-29 19:11:00,2014-06-29 19:24:00,13.733333333333333,82.0,cloudy,198,19.0,261,15.0
2335941,Male,2014-06-29 16:12:00,2014-06-29 16:29:00,16.583333333333332,84.9,cloudy,97,35.0,35,39.0
2349387,Female,2014-06-30 14:38:00,2014-06-30 14:52:00,13.566666666666665,87.1,cloudy,37,19.0,59,19.0
2352072,Female,2014-06-30 17:10:00,2014-06-30 17:17:00,7.333333333333332,84.0,cloudy,98,15.0,321,19.0
2353726,Female,2014-06-30 18:06:00,2014-06-30 18:13:00,7.25,82.0,tstorms,58,19.0,210,19.0
2332642,Female,2014-06-29 14:11:00,2014-06-29 14:20:00,8.85,84.0,cloudy,315,11.0,128,15.0
2354460,Male,2014-06-30 18:54:00,2014-06-30 18:59:00,5.416666666666668,82.0,tstorms,51,31.0,52,31.0
2332204,Male,2014-06-29 13:57:00,2014-06-29 14:12:00,15.833333333333336,82.9,cloudy,291,19.0,13,19.0
2340677,Female,2014-06-29 20:02:00,2014-06-29 20:06:00,4.466666666666667,81.0,cloudy,181,31.0,110,23.0
2340045,Male,2014-06-29 19:27:00,2014-06-29 19:46:00,18.566666666666666,82.0,cloudy,233,15.0,61,15.0
2336433,Female,2014-06-29 16:31:00,2014-06-29 16:40:00,9.416666666666666,84.9,cloudy,260,19.0,259,15.0
2332489,Female,2014-06-29 14:07:00,2014-06-29 14:26:00,19.016666666666666,84.0,cloudy,76,39.0,273,15.0
2351029,Male,2014-06-30 16:32:00,2014-06-30 16:42:00,10.15,84.0,cloudy,110,23.0,91,31.0
2352926,Male,2014-06-30 17:38:00,2014-06-30 17:54:00,15.616666666666667,84.0,cloudy,199,15.0,60,19.0
2351736,Female,2014-06-30 17:00:00,2014-06-30 17:19:00,19.2,84.0,cloudy,100,23.0,340,15.0
2343715,Male,2014-06-30 07:24:00,2014-06-30 07:34:00,10.266666666666667,73.0,cloudy,91,31.0,195,31.0
2352847,Male,2014-06-30 17:36:00,2014-06-30 17:45:00,9.033333333333333,84.0,cloudy,286,23.0,91,31.0
2351793,Female,2014-06-30 17:01:00,2014-06-30 17:17:00,15.916666666666664,84.0,cloudy,69,19.0,228,11.0
2347166,Male,2014-06-30 11:29:00,2014-06-30 11:36:00,7.3,79.0,cloudy,236,15.0,48,27.0
2350257,Male,2014-06-30 15:46:00,2014-06-30 15:52:00,6.283333333333332,84.9,cloudy,250,19.0,115,23.0
2351478,Male,2014-06-30 16:52:00,2014-06-30 17:04:00,11.55,84.0,cloudy,48,27.0,192,39.0
2345866,Male,2014-06-30 09:16:00,2014-06-30 09:22:00,6.516666666666668,78.1,cloudy,77,23.0,37,19.0
2335695,Male,2014-06-29 15:51:00,2014-06-29 16:03:00,11.95,82.9,cloudy,254,15.0,256,15.0
2338304,Male,2014-06-29 17:56:00,2014-06-29 18:11:00,14.683333333333335,84.9,cloudy,51,31.0,268,15.0
2346323,Female,2014-06-30 09:59:00,2014-06-30 10:52:00,52.51666666666666,78.1,cloudy,294,15.0,35,39.0
2350321,Female,2014-06-30 15:50:00,2014-06-30 16:08:00,17.85,84.9,cloudy,173,15.0,91,31.0
2354762,Male,2014-06-30 20:08:00,2014-06-30 20:28:00,20.316666666666666,70.0,rain or snow,148,11.0,171,11.0
2342510,Male,2014-06-29 22:54:00,2014-06-29 23:07:00,12.75,78.1,cloudy,93,15.0,228,11.0
2347885,Male,2014-06-30 12:32:00,2014-06-30 12:46:00,13.916666666666664,82.0,cloudy,100,23.0,35,39.0
2337576,Male,2014-06-29 17:20:00,2014-06-29 17:46:00,26.18333333333333,84.9,cloudy,334,19.0,118,19.0
2349426,Male,2014-06-30 14:42:00,2014-06-30 14:56:00,14.7,87.1,cloudy,176,19.0,127,15.0
2353938,Male,2014-06-30 18:21:00,2014-06-30 18:33:00,11.95,82.0,tstorms,316,19.0,242,15.0
2342152,Male,2014-06-29 22:02:00,2014-06-29 22:18:00,16.533333333333335,78.1,cloudy,59,19.0,15,15.0
2344840,Male,2014-06-30 08:24:00,2014-06-30 08:32:00,8.516666666666667,75.0,cloudy,192,39.0,32,19.0
2345628,Female,2014-06-30 09:02:00,2014-06-30 09:16:00,14.4,78.1,cloudy,329,15.0,301,19.0
2351063,Female,2014-06-30 16:33:00,2014-06-30 16:43:00,10.05,84.0,cloudy,317,15.0,21,15.0
2338694,Male,2014-06-29 18:14:00,2014-06-29 18:26:00,12.116666666666667,84.2,cloudy,46,19.0,286,23.0
2354535,Male,2014-06-30 19:01:00,2014-06-30 19:08:00,6.2333333333333325,73.0,tstorms,81,39.0,181,31.0
2345366,Female,2014-06-30 08:47:00,2014-06-30 09:15:00,27.63333333333333,75.0,cloudy,157,15.0,100,23.0
2354086,Male,2014-06-30 18:30:00,2014-06-30 18:46:00,16.7,82.0,tstorms,177,15.0,251,15.0
2349564,Female,2014-06-30 14:55:00,2014-06-30 15:02:00,7.566666666666666,87.1,cloudy,175,19.0,283,23.0
2348380,Male,2014-06-30 13:13:00,2014-06-30 13:33:00,19.7,84.9,cloudy,160,15.0,160,15.0
2345755,Male,2014-06-30 09:10:00,2014-06-30 09:13:00,2.8,78.1,cloudy,289,19.0,118,19.0
2344779,Female,2014-06-30 08:21:00,2014-06-30 08:35:00,14.133333333333333,75.0,cloudy,46,19.0,37,19.0
2333051,Female,2014-06-29 14:25:00,2014-06-29 14:40:00,14.883333333333333,84.0,cloudy,177,15.0,232,23.0
2345070,Male,2014-06-30 08:33:00,2014-06-30 08:54:00,20.733333333333334,75.0,cloudy,334,19.0,106

Download .txt

gitextract_0qfvetgz/

├── .codecov.yml
├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── blank_issue.md
│   │   ├── bug_report.md
│   │   ├── config.yml
│   │   ├── documentation_improvement.md
│   │   └── feature_request.md
│   ├── auto_assign.yml
│   └── workflows/
│       ├── auto_approve_dependency_PRs.yml
│       ├── build_docs.yml
│       ├── create_feedstock_pr.yaml
│       ├── install_test.yml
│       ├── latest_dependency_checker.yml
│       ├── lint_check.yml
│       ├── release.yml
│       ├── release_notes_updated.yml
│       └── unit_tests_with_latest_deps.yml
├── .gitignore
├── .pre-commit-config.yaml
├── .readthedocs.yaml
├── LICENSE
├── Makefile
├── README.md
├── composeml/
│   ├── __init__.py
│   ├── conftest.py
│   ├── data_slice/
│   │   ├── __init__.py
│   │   ├── extension.py
│   │   ├── generator.py
│   │   └── offset.py
│   ├── demos/
│   │   ├── __init__.py
│   │   └── transactions.csv
│   ├── label_maker.py
│   ├── label_search.py
│   ├── label_times/
│   │   ├── __init__.py
│   │   ├── description.py
│   │   ├── deserialize.py
│   │   ├── object.py
│   │   └── plots.py
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── requirement_files/
│   │   │   ├── latest_core_dependencies.txt
│   │   │   ├── minimum_core_requirements.txt
│   │   │   └── minimum_test_requirements.txt
│   │   ├── test_data_slice/
│   │   │   ├── __init__.py
│   │   │   ├── test_extension.py
│   │   │   └── test_offset.py
│   │   ├── test_datasets.py
│   │   ├── test_featuretools.py
│   │   ├── test_label_maker.py
│   │   ├── test_label_plots.py
│   │   ├── test_label_serialization.py
│   │   ├── test_label_times.py
│   │   ├── test_label_transforms/
│   │   │   ├── __init__.py
│   │   │   ├── test_bin.py
│   │   │   ├── test_lead.py
│   │   │   ├── test_sample.py
│   │   │   └── test_threshold.py
│   │   ├── test_version.py
│   │   └── utils.py
│   ├── update_checker.py
│   └── version.py
├── contributing.md
├── docs/
│   ├── Makefile
│   ├── make.bat
│   └── source/
│       ├── _static/
│       │   └── style.css
│       ├── _templates/
│       │   ├── class.rst
│       │   └── layout.html
│       ├── api_reference.rst
│       ├── conf.py
│       ├── examples/
│       │   ├── demo/
│       │   │   ├── __init__.py
│       │   │   ├── chicago_bike/
│       │   │   │   ├── __init__.py
│       │   │   │   └── sample.csv
│       │   │   ├── next_purchase/
│       │   │   │   ├── __init__.py
│       │   │   │   └── sample.csv
│       │   │   ├── turbofan_degredation/
│       │   │   │   ├── __init__.py
│       │   │   │   └── sample.csv
│       │   │   └── utils.py
│       │   ├── predict_bike_trips.ipynb
│       │   ├── predict_next_purchase.ipynb
│       │   └── predict_turbofan_degredation.ipynb
│       ├── images/
│       │   ├── innovation_labs.xml
│       │   ├── label-maker.xml
│       │   ├── labeling-function.xml
│       │   └── workflow.xml
│       ├── index.rst
│       ├── install.md
│       ├── release_notes.rst
│       ├── resources/
│       │   ├── faq.ipynb
│       │   └── help.rst
│       ├── resources.rst
│       ├── start.ipynb
│       ├── tutorials.rst
│       ├── user_guide/
│       │   ├── controlling_cutoff_times.ipynb
│       │   ├── data_slice_generator.ipynb
│       │   └── using_label_transforms.ipynb
│       └── user_guide.rst
├── pyproject.toml
└── release.md

Download .txt

SYMBOL INDEX (221 symbols across 30 files)

FILE: composeml/conftest.py
  function transactions (line 9) | def transactions():
  function total_spent_fn (line 29) | def total_spent_fn():
  function unique_amounts_fn (line 38) | def unique_amounts_fn():
  function total_spent (line 46) | def total_spent():
  function labels (line 77) | def labels():
  function add_labels (line 117) | def add_labels(doctest_namespace, labels):

FILE: composeml/data_slice/extension.py
  class DataSliceContext (line 6) | class DataSliceContext:
    method __init__ (line 9) | def __init__(
    method __repr__ (line 29) | def __repr__(self):
    method _series (line 34) | def _series(self):
    method count (line 42) | def count(self):
    method start (line 47) | def start(self):
    method stop (line 52) | def stop(self):
  class DataSliceFrame (line 57) | class DataSliceFrame(pd.DataFrame):
    method _constructor (line 63) | def _constructor(self):
    method ctx (line 67) | def ctx(self):
  class DataSliceExtension (line 73) | class DataSliceExtension:
    method __init__ (line 74) | def __init__(self, df):
    method __call__ (line 77) | def __call__(self, size=None, start=None, stop=None, step=None, drop_e...
    method __getitem__ (line 96) | def __getitem__(self, offset):
    method _apply (line 102) | def _apply(self, size, start, stop, step, drop_empty=True):
    method _apply_size (line 121) | def _apply_size(self, df, start, size):
    method _apply_start (line 144) | def _apply_start(self, df, start, step):
    method _apply_step (line 160) | def _apply_step(self, df, start, step):
    method _check_index (line 172) | def _check_index(self):
    method _check_offsets (line 180) | def _check_offsets(self, size, start, stop, step):
    method _check_size (line 194) | def _check_size(self, size):
    method _check_start (line 202) | def _check_start(self, start):
    method _check_step (line 212) | def _check_step(self, step):
    method _check_stop (line 220) | def _check_stop(self, stop):
    method _get_index (line 237) | def _get_index(self, df, i):
    method _is_sorted (line 243) | def _is_sorted(self):
    method _is_time_index (line 248) | def _is_time_index(self):

FILE: composeml/data_slice/generator.py
  class DataSliceGenerator (line 4) | class DataSliceGenerator:
    method __init__ (line 7) | def __init__(
    method __call__ (line 21) | def __call__(self, df):
    method _slice_by_column (line 28) | def _slice_by_column(self, df):
    method _slice_by_time (line 45) | def _slice_by_time(self, df):

FILE: composeml/data_slice/offset.py
  class DataSliceOffset (line 6) | class DataSliceOffset:
    method __init__ (line 9) | def __init__(self, value):
    method _check (line 13) | def _check(self):
    method _is_offset_base (line 20) | def _is_offset_base(self):
    method _is_offset_position (line 25) | def _is_offset_position(self):
    method _is_offset_timedelta (line 30) | def _is_offset_timedelta(self):
    method _is_offset_timestamp (line 35) | def _is_offset_timestamp(self):
    method _is_offset_frequency (line 40) | def _is_offset_frequency(self):
    method __int__ (line 46) | def __int__(self):
    method __float__ (line 57) | def __float__(self):
    method _is_positive (line 65) | def _is_positive(self):
    method _is_valid_offset (line 72) | def _is_valid_offset(self):
    method _invalid_offset_error (line 80) | def _invalid_offset_error(self):
    method _parse_offset_alias (line 89) | def _parse_offset_alias(self, alias):
    method _parse_offset_alias_phrase (line 95) | def _parse_offset_alias_phrase(self, value):
    method _parse_value (line 110) | def _parse_value(self):
    method _parsers (line 122) | def _parsers(self):
  class DataSliceStep (line 127) | class DataSliceStep(DataSliceOffset):
    method _is_valid_offset (line 129) | def _is_valid_offset(self):
    method _parsers (line 136) | def _parsers(self):

FILE: composeml/demos/__init__.py
  function load_transactions (line 8) | def load_transactions():

FILE: composeml/label_maker.py
  class LabelMaker (line 12) | class LabelMaker:
    method __init__ (line 15) | def __init__(
    method _name_labeling_function (line 37) | def _name_labeling_function(self, function):
    method _check_labeling_function (line 42) | def _check_labeling_function(self, function, name=None):
    method labeling_function (line 48) | def labeling_function(self):
    method labeling_function (line 53) | def labeling_function(self, value):
    method _check_cutoff_time (line 78) | def _check_cutoff_time(self, value):
    method slice (line 87) | def slice(
    method _bar_format (line 147) | def _bar_format(self):
    method _check_example_count (line 155) | def _check_example_count(self, num_examples_per_instance, gap):
    method search (line 163) | def search(
    method set_index (line 295) | def set_index(self, df):

FILE: composeml/label_search.py
  class ExampleSearch (line 6) | class ExampleSearch:
    method __init__ (line 13) | def __init__(self, expected_count):
    method _check_number (line 18) | def _check_number(n):
    method _is_finite_number (line 28) | def _is_finite_number(n):
    method is_complete (line 33) | def is_complete(self):
    method is_finite (line 38) | def is_finite(self):
    method is_valid_labels (line 42) | def is_valid_labels(self, labels):
    method reset_count (line 46) | def reset_count(self):
    method update_count (line 50) | def update_count(self, labels):
  class LabelSearch (line 55) | class LabelSearch(ExampleSearch):
    method __init__ (line 63) | def __init__(self, expected_label_counts):
    method is_complete (line 72) | def is_complete(self):
    method is_complete_label (line 76) | def is_complete_label(self, label):
    method is_valid_labels (line 82) | def is_valid_labels(self, labels):
    method reset_count (line 105) | def reset_count(self):
    method update_count (line 109) | def update_count(self, labels):

FILE: composeml/label_times/description.py
  function describe_label_times (line 4) | def describe_label_times(label_times):

FILE: composeml/label_times/deserialize.py
  function read_config (line 9) | def read_config(path):
  function read_data (line 19) | def read_data(path):
  function read_label_times (line 36) | def read_label_times(path, load_settings=True):

FILE: composeml/label_times/object.py
  class LabelTimes (line 13) | class LabelTimes(pd.DataFrame):
    method __init__ (line 16) | def __init__(
    method _assert_single_target (line 38) | def _assert_single_target(self):
    method _check_target_columns (line 43) | def _check_target_columns(self):
    method _check_target_types (line 52) | def _check_target_types(self):
    method _check_label_times (line 64) | def _check_label_times(self):
    method _infer_target_columns (line 69) | def _infer_target_columns(self):
    method _is_single_target (line 82) | def _is_single_target(self):
    method _get_target_type (line 85) | def _get_target_type(self, dtype):
    method _infer_target_types (line 92) | def _infer_target_types(self):
    method select (line 102) | def select(self, target):
    method settings (line 147) | def settings(self):
    method is_discrete (line 162) | def is_discrete(self):
    method distribution (line 167) | def distribution(self):
    method count (line 181) | def count(self):
    method count_by_time (line 190) | def count_by_time(self):
    method describe (line 209) | def describe(self):
    method copy (line 215) | def copy(self, deep=True):
    method threshold (line 233) | def threshold(self, value, inplace=False):
    method apply_lead (line 255) | def apply_lead(self, value, inplace=False):
    method bin (line 274) | def bin(self, bins, quantiles=False, labels=None, right=True, precisio...
    method _sample (line 397) | def _sample(self, key, value, settings, random_state=None, replace=Fal...
    method _sample_per_label (line 415) | def _sample_per_label(self, key, value, settings, random_state=None, r...
    method sample (line 448) | def sample(
    method equals (line 566) | def equals(self, other, **kwargs):
    method _save_settings (line 580) | def _save_settings(self, path):
    method to_csv (line 594) | def to_csv(self, path, save_settings=True, **kwargs):
    method to_parquet (line 609) | def to_parquet(self, path, save_settings=True, **kwargs):
    method to_pickle (line 624) | def to_pickle(self, path, save_settings=True, **kwargs):
    method __finalize__ (line 651) | def __finalize__(self, other, method=None, **kwargs):
    method _constructor (line 670) | def _constructor(self):

FILE: composeml/label_times/plots.py
  class LabelPlots (line 14) | class LabelPlots:
    method __init__ (line 17) | def __init__(self, label_times):
    method count_by_time (line 25) | def count_by_time(self, ax=None, **kwargs):
    method dist (line 78) | def dist(self):
    method distribution (line 82) | def distribution(self, **kwargs):

FILE: composeml/tests/test_data_slice/test_extension.py
  function data_slice (line 8) | def data_slice(transactions):
  function test_context (line 18) | def test_context(data_slice):
  function test_context_aliases (line 34) | def test_context_aliases(data_slice):
  function test_subscriptable_slices (line 51) | def test_subscriptable_slices(transactions, time_based, offsets):
  function test_subscriptable_error (line 63) | def test_subscriptable_error(transactions):
  function test_time_index_error (line 68) | def test_time_index_error(transactions):
  function test_minimum_data_per_group (line 74) | def test_minimum_data_per_group(transactions):
  function test_drop_empty (line 86) | def test_drop_empty(transactions):

FILE: composeml/tests/test_data_slice/test_offset.py
  function test_numeric_typecast (line 6) | def test_numeric_typecast():
  function test_numeric_typecast_errors (line 11) | def test_numeric_typecast_errors():
  function test_invalid_value (line 21) | def test_invalid_value():
  function test_alias_phrase (line 27) | def test_alias_phrase():

FILE: composeml/tests/test_datasets.py
  function transactions (line 7) | def transactions():
  function test_transactions (line 11) | def test_transactions(transactions):

FILE: composeml/tests/test_featuretools.py
  function total_spent (line 7) | def total_spent(df):
  function labels (line 13) | def labels():
  function test_dfs (line 38) | def test_dfs(labels):

FILE: composeml/tests/test_label_maker.py
  function test_search_default (line 8) | def test_search_default(transactions, total_spent_fn):
  function test_search_examples_per_label (line 29) | def test_search_examples_per_label(transactions, total_spent_fn):
  function test_search_with_undefined_labels (line 57) | def test_search_with_undefined_labels(transactions, total_spent_fn):
  function test_search_with_multiple_targets (line 85) | def test_search_with_multiple_targets(transactions, total_spent_fn, uniq...
  function test_search_offset_mix_0 (line 127) | def test_search_offset_mix_0(transactions, total_spent_fn):
  function test_search_offset_mix_1 (line 158) | def test_search_offset_mix_1(transactions, total_spent_fn):
  function test_search_offset_mix_2 (line 188) | def test_search_offset_mix_2(transactions, total_spent_fn):
  function test_search_offset_mix_3 (line 217) | def test_search_offset_mix_3(transactions, total_spent_fn):
  function test_search_offset_mix_4 (line 254) | def test_search_offset_mix_4(transactions, total_spent_fn):
  function test_search_offset_mix_5 (line 287) | def test_search_offset_mix_5(transactions, total_spent_fn):
  function test_search_offset_mix_6 (line 319) | def test_search_offset_mix_6(transactions, total_spent_fn):
  function test_search_offset_mix_7 (line 347) | def test_search_offset_mix_7(transactions, total_spent_fn):
  function test_search_offset_negative_0 (line 377) | def test_search_offset_negative_0(transactions, total_spent_fn):
  function test_search_offset_negative_1 (line 395) | def test_search_offset_negative_1(transactions, total_spent_fn):
  function test_search_invalid_n_examples (line 413) | def test_search_invalid_n_examples(transactions, total_spent_fn):
  function test_column_based_windows (line 427) | def test_column_based_windows(transactions, total_spent_fn):
  function test_search_with_invalid_index (line 454) | def test_search_with_invalid_index(transactions, total_spent_fn):
  function test_search_on_empty_labels (line 473) | def test_search_on_empty_labels(transactions):
  function test_data_slice_overlap (line 491) | def test_data_slice_overlap(transactions, total_spent_fn):
  function test_label_type (line 504) | def test_label_type(transactions, total_spent_fn):
  function test_search_with_maximum_data (line 515) | def test_search_with_maximum_data(transactions):
  function test_minimum_data_per_group (line 578) | def test_minimum_data_per_group(transactions, minimum_data):
  function test_minimum_data_per_group_error (line 598) | def test_minimum_data_per_group_error(transactions):
  function test_label_maker_categorical_target_with_missing_data (line 613) | def test_label_maker_categorical_target_with_missing_data(transactions, ...

FILE: composeml/tests/test_label_plots.py
  function test_count_by_time_categorical (line 4) | def test_count_by_time_categorical(total_spent):
  function test_count_by_time_continuous (line 10) | def test_count_by_time_continuous(total_spent):
  function test_distribution_categorical (line 15) | def test_distribution_categorical(total_spent):
  function test_distribution_continuous (line 21) | def test_distribution_continuous(total_spent):
  function test_single_target (line 26) | def test_single_target(total_spent):

FILE: composeml/tests/test_label_serialization.py
  function path (line 11) | def path():
  function total_spent (line 19) | def total_spent(transactions, total_spent_fn):
  function test_csv (line 29) | def test_csv(path, total_spent):
  function test_parquet (line 36) | def test_parquet(path, total_spent):
  function test_pickle (line 43) | def test_pickle(path, total_spent):

FILE: composeml/tests/test_label_times.py
  function test_count_by_time_categorical (line 7) | def test_count_by_time_categorical(total_spent):
  function test_count_by_time_continuous (line 28) | def test_count_by_time_continuous(total_spent):
  function test_sorted_distribution (line 49) | def test_sorted_distribution(capsys, total_spent):
  function test_describe_no_transforms (line 88) | def test_describe_no_transforms(capsys):
  function test_distribution_categorical (line 124) | def test_distribution_categorical(total_spent):
  function test_distribution_continous (line 138) | def test_distribution_continous(total_spent):
  function test_target_type (line 157) | def test_target_type(total_spent):
  function test_count (line 165) | def test_count(total_spent):
  function test_label_select_errors (line 180) | def test_label_select_errors(total_spent):

FILE: composeml/tests/test_label_transforms/test_bin.py
  function test_bins (line 5) | def test_bins(labels):
  function test_quantile_bins (line 27) | def test_quantile_bins(labels):
  function test_single_target (line 49) | def test_single_target(total_spent):

FILE: composeml/tests/test_label_transforms/test_lead.py
  function test_lead (line 4) | def test_lead(labels):

FILE: composeml/tests/test_label_transforms/test_sample.py
  function labels (line 8) | def labels(labels):
  function test_sample_n_int (line 12) | def test_sample_n_int(labels):
  function test_sample_n_per_label (line 26) | def test_sample_n_per_label(labels):
  function test_sample_frac_int (line 42) | def test_sample_frac_int(labels):
  function test_sample_frac_per_label (line 55) | def test_sample_frac_per_label(labels):
  function test_sample_in_transforms (line 71) | def test_sample_in_transforms(labels):
  function test_sample_with_replacement (line 88) | def test_sample_with_replacement(labels):
  function test_single_target (line 95) | def test_single_target(total_spent):
  function test_sample_n_per_instance (line 103) | def test_sample_n_per_instance():
  function test_sample_frac_per_instance (line 127) | def test_sample_frac_per_instance():

FILE: composeml/tests/test_label_transforms/test_threshold.py
  function test_threshold (line 4) | def test_threshold(labels):
  function test_single_target (line 17) | def test_single_target(total_spent):

FILE: composeml/tests/test_version.py
  function test_version (line 4) | def test_version():

FILE: composeml/tests/utils.py
  function read_csv (line 6) | def read_csv(data, **kwargs):
  function to_csv (line 25) | def to_csv(label_times, **kwargs):

FILE: docs/source/conf.py
  function setup (line 228) | def setup(app):

FILE: docs/source/examples/demo/chicago_bike/__init__.py
  function _read (line 8) | def _read(file):
  function load_sample (line 16) | def load_sample():

FILE: docs/source/examples/demo/next_purchase/__init__.py
  function _add_time (line 12) | def _add_time(df, start="2015-01-01"):
  function _data (line 42) | def _data(nrows=1000000):
  function _read (line 62) | def _read(file):
  function load_sample (line 68) | def load_sample():

FILE: docs/source/examples/demo/turbofan_degredation/__init__.py
  function _download_data (line 9) | def _download_data():
  function _data (line 14) | def _data():
  function _read (line 27) | def _read(file):
  function load_sample (line 33) | def load_sample():

FILE: docs/source/examples/demo/utils.py
  function download (line 9) | def download(url, output="data"):
  function extract (line 22) | def extract(content, content_type, output):
  function extract_tarball (line 31) | def extract_tarball(content, output):
  function extract_zip (line 42) | def extract_zip(content, output):

Download .json

Condensed preview — 96 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (784K chars).

[
  {
    "path": ".codecov.yml",
    "chars": 303,
    "preview": "codecov:\n  notify:\n    require_ci_to_pass: yes\n\ncomment:\n  layout: \"diff, files\"\n\ncoverage:\n  precision: 2\n  round: down"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/blank_issue.md",
    "chars": 90,
    "preview": "---\nname: Blank Issue\nabout: Create a blank issue\ntitle: ''\nlabels: ''\nassignees: ''\n\n---\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "chars": 272,
    "preview": "---\nname: Bug Report\nabout: Create a bug report to help us improve Compose\ntitle: ''\nlabels: 'bug'\nassignees: ''\n\n---\n\n["
  },
  {
    "path": ".github/ISSUE_TEMPLATE/config.yml",
    "chars": 519,
    "preview": "blank_issues_enabled: true\ncontact_links:\n  - name: General Technical Question\n    about: \"If you have a question like *"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/documentation_improvement.md",
    "chars": 222,
    "preview": "---\nname: Documentation Improvement\nabout: Suggest an idea for improving the documentation\ntitle: ''\nlabels: 'documentat"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "chars": 244,
    "preview": "---\nname: Feature Request\nabout: Suggest an idea for this project\ntitle: ''\nlabels: 'new feature'\nassignees: ''\n\n---\n\n- "
  },
  {
    "path": ".github/auto_assign.yml",
    "chars": 67,
    "preview": "# Set to author to set pr creator as assignee\naddAssignees: author\n"
  },
  {
    "path": ".github/workflows/auto_approve_dependency_PRs.yml",
    "chars": 1268,
    "preview": "name: Auto Approve Dependency PRs\non:\n  schedule:\n      - cron: '*/30 * * * *'\n  workflow_dispatch:\njobs:\n  build:\n    r"
  },
  {
    "path": ".github/workflows/build_docs.yml",
    "chars": 1075,
    "preview": "on:\n  pull_request:\n    types: [opened, synchronize]\n  push:\n    branches:\n      - main\n\nname: Build Docs\njobs:\n  doc_te"
  },
  {
    "path": ".github/workflows/create_feedstock_pr.yaml",
    "chars": 2263,
    "preview": "name: Create Feedstock PR\non:\n  workflow_dispatch:\n    inputs:\n      version:\n        description: 'released PyPI versio"
  },
  {
    "path": ".github/workflows/install_test.yml",
    "chars": 1137,
    "preview": "on:\n  pull_request:\n    types: [opened, synchronize]\n  push:\n    branches:\n      - main\n\nname: Install Test\njobs:\n  inst"
  },
  {
    "path": ".github/workflows/latest_dependency_checker.yml",
    "chars": 1603,
    "preview": "# This workflow will install dependenies and if any critical dependencies have changed a pull request\n# will be created "
  },
  {
    "path": ".github/workflows/lint_check.yml",
    "chars": 1006,
    "preview": "on:\n  pull_request:\n    types: [opened, synchronize]\n  push:\n    branches:\n      - main\n\nname: Lint Check\njobs:\n  lint_t"
  },
  {
    "path": ".github/workflows/release.yml",
    "chars": 561,
    "preview": "on:\n  release:\n    types: [published]\n\nname: Release\njobs:\n  pypi:\n    name: Release to PyPI\n    runs-on: ubuntu-latest\n"
  },
  {
    "path": ".github/workflows/release_notes_updated.yml",
    "chars": 1273,
    "preview": "name: Release Notes Updated\n\non:\n  pull_request:\n    types: [opened, synchronize]\n\njobs:\n  release_notes_updated:\n    na"
  },
  {
    "path": ".github/workflows/unit_tests_with_latest_deps.yml",
    "chars": 1766,
    "preview": "on:\n  pull_request:\n    types: [opened, synchronize]\n  push:\n    branches:\n      - main\n\nname: Unit Tests - Latest Depen"
  },
  {
    "path": ".gitignore",
    "chars": 1302,
    "preview": "cb_model.json\n.DS_Store\n\n# IDE\n.vscode\ndocs/source/examples/demo/*/download\n\n# Byte-compiled / optimized / DLL files\n__p"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 1173,
    "preview": "exclude: |\n  (?x)\n  .html$|.csv$|.svg$|.md$|.txt$|.json$|.xml$|.pickle$|^.github/|\n  (LICENSE.*|README.*)\ndefault_stages"
  },
  {
    "path": ".readthedocs.yaml",
    "chars": 573,
    "preview": "# .readthedocs.yml\n# Read the Docs configuration file\n# See https://docs.readthedocs.io/en/stable/config-file/v2.html fo"
  },
  {
    "path": "LICENSE",
    "chars": 1518,
    "preview": "BSD 3-Clause License\n\nCopyright (c) 2017, Feature Labs, Inc.\nAll rights reserved.\n\nRedistribution and use in source and "
  },
  {
    "path": "Makefile",
    "chars": 1288,
    "preview": ".PHONY: clean\nclean:\n\tfind . -name '*.pyo' -delete\n\tfind . -name '*.pyc' -delete\n\tfind . -name __pycache__ -delete\n\tfind"
  },
  {
    "path": "README.md",
    "chars": 7904,
    "preview": "<p align=\"center\"><img width=50% src=\"https://raw.githubusercontent.com/alteryx/compose/main/docs/source/images/compose."
  },
  {
    "path": "composeml/__init__.py",
    "chars": 208,
    "preview": "# flake8:noqa\nfrom composeml.version import __version__\nfrom composeml import demos, update_checker\nfrom composeml.label"
  },
  {
    "path": "composeml/conftest.py",
    "chars": 2940,
    "preview": "import pandas as pd\nimport pytest\n\nfrom composeml import LabelTimes\nfrom composeml.tests.utils import read_csv\n\n\n@pytest"
  },
  {
    "path": "composeml/data_slice/__init__.py",
    "chars": 76,
    "preview": "# flake8:noqa\nfrom composeml.data_slice.generator import DataSliceGenerator\n"
  },
  {
    "path": "composeml/data_slice/extension.py",
    "chars": 8768,
    "preview": "import pandas as pd\n\nfrom composeml.data_slice.offset import DataSliceOffset, DataSliceStep\n\n\nclass DataSliceContext:\n  "
  },
  {
    "path": "composeml/data_slice/generator.py",
    "chars": 1689,
    "preview": "from composeml.data_slice.extension import DataSliceContext, DataSliceFrame\n\n\nclass DataSliceGenerator:\n    \"\"\"Generates"
  },
  {
    "path": "composeml/data_slice/offset.py",
    "chars": 4215,
    "preview": "import re\n\nimport pandas as pd\n\n\nclass DataSliceOffset:\n    \"\"\"Offsets for calculating data slice indices.\"\"\"\n\n    def _"
  },
  {
    "path": "composeml/demos/__init__.py",
    "chars": 231,
    "preview": "import os\n\nimport pandas as pd\n\nDATA = os.path.join(os.path.dirname(__file__))\n\n\ndef load_transactions():\n    path = os."
  },
  {
    "path": "composeml/demos/transactions.csv",
    "chars": 11348,
    "preview": "transaction_id,session_id,transaction_time,product_id,amount,customer_id,device,session_start,zip_code,join_date,date_of"
  },
  {
    "path": "composeml/label_maker.py",
    "chars": 12734,
    "preview": "from sys import stdout\n\nfrom pandas import Series\nfrom pandas.api.types import is_categorical_dtype\nfrom tqdm import tqd"
  },
  {
    "path": "composeml/label_search.py",
    "chars": 4006,
    "preview": "from collections import Counter\n\nfrom pandas import isnull\n\n\nclass ExampleSearch:\n    \"\"\"A label search based on the num"
  },
  {
    "path": "composeml/label_times/__init__.py",
    "chars": 129,
    "preview": "# flake8:noqa\nfrom composeml.label_times.deserialize import read_label_times\nfrom composeml.label_times.object import La"
  },
  {
    "path": "composeml/label_times/description.py",
    "chars": 1905,
    "preview": "import pandas as pd\n\n\ndef describe_label_times(label_times):\n    \"\"\"Prints out label info with transform settings that r"
  },
  {
    "path": "composeml/label_times/deserialize.py",
    "chars": 1338,
    "preview": "import json\nimport os\n\nimport pandas as pd\n\nfrom composeml.label_times.object import LabelTimes\n\n\ndef read_config(path):"
  },
  {
    "path": "composeml/label_times/object.py",
    "chars": 23278,
    "preview": "import json\nimport os\n\nimport pandas as pd\n\nfrom composeml.label_times.description import describe_label_times\nfrom comp"
  },
  {
    "path": "composeml/label_times/plots.py",
    "chars": 2963,
    "preview": "import matplotlib as mpl  # isort:skip\nimport pandas as pd\nimport seaborn as sns\n\n# Raises an import error on OSX if not"
  },
  {
    "path": "composeml/tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "composeml/tests/requirement_files/latest_core_dependencies.txt",
    "chars": 99,
    "preview": "featuretools==1.27.0\nmatplotlib==3.7.2\npandas==2.0.3\nseaborn==0.12.2\ntqdm==4.66.1\nwoodwork==0.25.1\n"
  },
  {
    "path": "composeml/tests/requirement_files/minimum_core_requirements.txt",
    "chars": 61,
    "preview": "matplotlib==3.3.3\npandas==2.0.0\nseaborn==0.12.2\ntqdm==4.32.0\n"
  },
  {
    "path": "composeml/tests/requirement_files/minimum_test_requirements.txt",
    "chars": 192,
    "preview": "featuretools==1.27.0\nmatplotlib==3.3.3\npandas==2.0.0\npip==21.3.1\npyarrow==7.0.0\npytest-cov==3.0.0\npytest-xdist==2.5.0\npy"
  },
  {
    "path": "composeml/tests/test_data_slice/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "composeml/tests/test_data_slice/test_extension.py",
    "chars": 2833,
    "preview": "import pandas as pd\nfrom pytest import fixture, mark, raises\n\nfrom composeml import LabelMaker\n\n\n@fixture\ndef data_slice"
  },
  {
    "path": "composeml/tests/test_data_slice/test_offset.py",
    "chars": 1028,
    "preview": "from pytest import raises\n\nfrom composeml.data_slice.offset import DataSliceOffset\n\n\ndef test_numeric_typecast():\n    as"
  },
  {
    "path": "composeml/tests/test_datasets.py",
    "chars": 193,
    "preview": "import pytest\n\nfrom composeml import demos\n\n\n@pytest.fixture\ndef transactions():\n    return demos.load_transactions()\n\n\n"
  },
  {
    "path": "composeml/tests/test_featuretools.py",
    "chars": 1442,
    "preview": "import featuretools as ft\nimport pytest\n\nfrom composeml import LabelMaker\n\n\ndef total_spent(df):\n    total = df.amount.s"
  },
  {
    "path": "composeml/tests/test_label_maker.py",
    "chars": 16972,
    "preview": "import pandas as pd\nimport pytest\n\nfrom composeml import LabelMaker\nfrom composeml.tests.utils import to_csv\n\n\ndef test_"
  },
  {
    "path": "composeml/tests/test_label_plots.py",
    "chars": 1014,
    "preview": "from pytest import raises\n\n\ndef test_count_by_time_categorical(total_spent):\n    total_spent = total_spent.bin(2, labels"
  },
  {
    "path": "composeml/tests/test_label_serialization.py",
    "chars": 1220,
    "preview": "import os\nimport shutil\n\nimport pandas as pd\nimport pytest\n\nimport composeml as cp\n\n\n@pytest.fixture\ndef path():\n    pwd"
  },
  {
    "path": "composeml/tests/test_label_times.py",
    "chars": 4929,
    "preview": "from pytest import raises\n\nfrom composeml.label_times import LabelTimes\nfrom composeml.tests.utils import to_csv\n\n\ndef t"
  },
  {
    "path": "composeml/tests/test_label_transforms/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "composeml/tests/test_label_transforms/test_bin.py",
    "chars": 1707,
    "preview": "import pandas as pd\nfrom pytest import raises\n\n\ndef test_bins(labels):\n    given_labels = labels.bin(2)\n    transform = "
  },
  {
    "path": "composeml/tests/test_label_transforms/test_lead.py",
    "chars": 525,
    "preview": "import pandas as pd\n\n\ndef test_lead(labels):\n    labels = labels.apply_lead(\"10min\")\n    transform = labels.transforms[0"
  },
  {
    "path": "composeml/tests/test_label_transforms/test_sample.py",
    "chars": 3725,
    "preview": "import pytest\n\nfrom composeml import LabelTimes\nfrom composeml.tests.utils import read_csv, to_csv\n\n\n@pytest.fixture\ndef"
  },
  {
    "path": "composeml/tests/test_label_transforms/test_threshold.py",
    "chars": 625,
    "preview": "from pytest import raises\n\n\ndef test_threshold(labels):\n    labels = labels.threshold(200)\n    transform = labels.transf"
  },
  {
    "path": "composeml/tests/test_version.py",
    "chars": 91,
    "preview": "from composeml import __version__\n\n\ndef test_version():\n    assert __version__ == \"0.10.1\"\n"
  },
  {
    "path": "composeml/tests/utils.py",
    "chars": 641,
    "preview": "from io import StringIO\n\nimport pandas as pd\n\n\ndef read_csv(data, **kwargs):\n    \"\"\"Helper function for creating a dataf"
  },
  {
    "path": "composeml/update_checker.py",
    "chars": 255,
    "preview": "from pkg_resources import iter_entry_points\n\nfor entry_point in iter_entry_points(\"alteryx_open_src_initialize\"):\n    tr"
  },
  {
    "path": "composeml/version.py",
    "chars": 23,
    "preview": "__version__ = \"0.10.1\"\n"
  },
  {
    "path": "contributing.md",
    "chars": 5501,
    "preview": "# Contributing to Compose\n\n:+1::tada: First off, thank you for taking the time to contribute! :tada::+1:\n\nWhether you ar"
  },
  {
    "path": "docs/Makefile",
    "chars": 585,
    "preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line.\nSPHINXOPTS    =\nSPHI"
  },
  {
    "path": "docs/make.bat",
    "chars": 756,
    "preview": "@ECHO OFF\n\npushd %~dp0\n\nREM Command file for Sphinx documentation\n\nif \"%SPHINXBUILD%\" == \"\" (\n\tset SPHINXBUILD=sphinx-bu"
  },
  {
    "path": "docs/source/_static/style.css",
    "chars": 783,
    "preview": ".footer {\n    background-color: #0D2345;\n    padding-bottom: 40px;\n    padding-top: 40px;\n    width: 100%;\n}\n\n.footer-ce"
  },
  {
    "path": "docs/source/_templates/class.rst",
    "chars": 399,
    "preview": "{{ fullname | escape | underline}}\n\n.. currentmodule:: {{ module }}\n\n.. autoclass:: {{ objname }}\n\n   {% block methods %"
  },
  {
    "path": "docs/source/_templates/layout.html",
    "chars": 2829,
    "preview": "{% extends \"!layout.html\" %}\n\n{%- block extrahead %}\n\n<script>\n  !function () {\n    var analytics = window.analytics = w"
  },
  {
    "path": "docs/source/api_reference.rst",
    "chars": 799,
    "preview": ".. currentmodule:: composeml\n\n=============\nAPI Reference\n=============\n\nLabel Maker\n===========\n\n.. autosummary::\n    :"
  },
  {
    "path": "docs/source/conf.py",
    "chars": 6860,
    "preview": "# -*- coding: utf-8 -*-\n#\n# Configuration file for the Sphinx documentation builder.\n#\n# This file does only contain a s"
  },
  {
    "path": "docs/source/examples/demo/__init__.py",
    "chars": 93,
    "preview": "import os\nimport warnings\n\nwarnings.filterwarnings(\"ignore\")\nPWD = os.path.dirname(__file__)\n"
  },
  {
    "path": "docs/source/examples/demo/chicago_bike/__init__.py",
    "chars": 306,
    "preview": "from demo import PWD\nfrom pandas import read_csv\nfrom os.path import join\n\nPWD = join(PWD, \"chicago_bike\")\n\n\ndef _read(f"
  },
  {
    "path": "docs/source/examples/demo/chicago_bike/sample.csv",
    "chars": 97265,
    "preview": "trip_id,gender,starttime,stoptime,tripduration,temperature,events,from_station_id,dpcapacity_start,to_station_id,dpcapac"
  },
  {
    "path": "docs/source/examples/demo/next_purchase/__init__.py",
    "chars": 2088,
    "preview": "import os\nimport pandas as pd\nimport requests\nimport tarfile\nfrom demo import PWD, utils\nfrom tqdm import tqdm\n\nURL = r\""
  },
  {
    "path": "docs/source/examples/demo/next_purchase/sample.csv",
    "chars": 22071,
    "preview": "id,order_id,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id,department,user_id,order_time\n24,"
  },
  {
    "path": "docs/source/examples/demo/turbofan_degredation/__init__.py",
    "chars": 932,
    "preview": "import os\nimport pandas as pd\nfrom demo import utils\n\nURL = r\"https://ti.arc.nasa.gov/c/6/\"\nPWD = os.path.dirname(__file"
  },
  {
    "path": "docs/source/examples/demo/turbofan_degredation/sample.csv",
    "chars": 80788,
    "preview": "id,engine_no,time_in_cycles,operational_setting_1,operational_setting_2,operational_setting_3,sensor_measurement_1,senso"
  },
  {
    "path": "docs/source/examples/demo/utils.py",
    "chars": 1493,
    "preview": "import os\nimport tarfile\nfrom zipfile import ZipFile\n\nimport requests\nfrom tqdm import tqdm\n\n\ndef download(url, output=\""
  },
  {
    "path": "docs/source/examples/predict_bike_trips.ipynb",
    "chars": 15528,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Predict Bike Trips\\n\",\n    \"\\n\",\n"
  },
  {
    "path": "docs/source/examples/predict_next_purchase.ipynb",
    "chars": 16450,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Predict Next Purchase\\n\",\n    \"\\n"
  },
  {
    "path": "docs/source/examples/predict_turbofan_degredation.ipynb",
    "chars": 17262,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\"\n   },\n   \"sou"
  },
  {
    "path": "docs/source/images/innovation_labs.xml",
    "chars": 59155,
    "preview": "<mxfile host=\"Electron\" modified=\"2020-08-28T18:11:21.195Z\" agent=\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) Apple"
  },
  {
    "path": "docs/source/images/label-maker.xml",
    "chars": 2402,
    "preview": "<mxfile modified=\"2019-07-02T21:30:54.163Z\" host=\"www.draw.io\" agent=\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWeb"
  },
  {
    "path": "docs/source/images/labeling-function.xml",
    "chars": 89882,
    "preview": "<mxfile modified=\"2019-07-02T21:15:56.017Z\" host=\"www.draw.io\" agent=\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWeb"
  },
  {
    "path": "docs/source/images/workflow.xml",
    "chars": 128987,
    "preview": "<mxfile modified=\"2020-07-16T17:20:08.201Z\" host=\"Electron\" agent=\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) Apple"
  },
  {
    "path": "docs/source/index.rst",
    "chars": 2017,
    "preview": "================\nWhat is Compose?\n================\n\n.. toctree::\n    :hidden:\n    :maxdepth: 1\n\n    install\n    start\n  "
  },
  {
    "path": "docs/source/install.md",
    "chars": 1329,
    "preview": "# Install\n\nCompose is available for Python 3.8, 3.9, 3.10, and 3.11. It can be installed from [PyPI](https://pypi.org/pr"
  },
  {
    "path": "docs/source/release_notes.rst",
    "chars": 10288,
    "preview": "Release Notes\n-------------\n\nFuture Release\n==============\n    * Enhancements\n    * Fixes\n    * Changes\n        * Remove"
  },
  {
    "path": "docs/source/resources/faq.ipynb",
    "chars": 5408,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# FAQ\\n\",\n    \"\\n\",\n    \"## I have "
  },
  {
    "path": "docs/source/resources/help.rst",
    "chars": 1695,
    "preview": "====\nHelp\n====\n\nCouldn't find what you were looking for? The Alteryx open source community is happy to provide support t"
  },
  {
    "path": "docs/source/resources.rst",
    "chars": 166,
    "preview": "=========\nResources\n=========\n\nFrequently asked questions and additional resources\n\n.. toctree::\n    :glob:\n    :maxdept"
  },
  {
    "path": "docs/source/start.ipynb",
    "chars": 6431,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"raw\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\"\n   },\n   \"source\":"
  },
  {
    "path": "docs/source/tutorials.rst",
    "chars": 173,
    "preview": "=========\nTutorials\n=========\n\nUse these tutorial to learn how to use Compose for building AutoML applications.\n\n.. toct"
  },
  {
    "path": "docs/source/user_guide/controlling_cutoff_times.ipynb",
    "chars": 6546,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"fcfef470\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Controlling "
  },
  {
    "path": "docs/source/user_guide/data_slice_generator.ipynb",
    "chars": 14261,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"raw\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\"\n   },\n   \"source\":"
  },
  {
    "path": "docs/source/user_guide/using_label_transforms.ipynb",
    "chars": 10464,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"raw\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\"\n   },\n   \"source\":"
  },
  {
    "path": "docs/source/user_guide.rst",
    "chars": 292,
    "preview": "==========\nUser Guide\n==========\n\nUse these guides to learn how to use label transformations and generate better trainin"
  },
  {
    "path": "pyproject.toml",
    "chars": 3643,
    "preview": "[project]\nname = \"composeml\"\nreadme = \"README.md\"\ndescription = \"a framework for automated prediction engineering\"\ndynam"
  },
  {
    "path": "release.md",
    "chars": 1113,
    "preview": "# Release Process\n## Prerequisites\nThe environment variables `PYPI_USERNAME` and `PYPI_PASSWORD` must be already set in "
  }
]

About this extraction

This page contains the full source code of the alteryx/compose GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 96 files (740.1 KB), approximately 390.3k tokens, and a symbol index with 221 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo