Repository: airbytehq/quickstarts
Branch: main
Commit: d00a63074425
Files: 753
Total size: 4.4 MB

Directory structure:
quickstarts/

├── .devcontainer/
│   ├── README.md
│   └── devcontainer.json
├── .gitignore
├── CONTRIBUTING.md
├── README.md
├── airbyte_dbt_airflow_bigquery/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analyses/
│   │   │   └── .gitkeep
│   │   ├── dbt_project.yml
│   │   ├── macros/
│   │   │   └── .gitkeep
│   │   ├── models/
│   │   │   └── ecommerce/
│   │   │       ├── marts/
│   │   │       │   ├── product_popularity.sql
│   │   │       │   ├── purchase_patterns.sql
│   │   │       │   ├── schema.yml
│   │   │       │   └── user_demographics.sql
│   │   │       ├── sources/
│   │   │       │   └── faker_sources.yml
│   │   │       └── staging/
│   │   │           ├── schema.yml
│   │   │           ├── stg_products.sql
│   │   │           ├── stg_purchases.sql
│   │   │           └── stg_users.sql
│   │   ├── profiles.yml
│   │   ├── seeds/
│   │   │   ├── .gitkeep
│   │   │   ├── raw_customers.csv
│   │   │   ├── raw_orders.csv
│   │   │   └── raw_payments.csv
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   └── airbyte/
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       ├── terraform.tfvars
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── .gitignore
│   │   ├── Dockerfile
│   │   ├── README.md
│   │   ├── airflow/
│   │   │   ├── config/
│   │   │   │   └── dbt_config.py
│   │   │   ├── dags/
│   │   │   │   └── elt_dag.py
│   │   │   └── plugins/
│   │   │       ├── custom_docs_plugin.py
│   │   │       ├── dbt_upload_docs.py
│   │   │       ├── static/
│   │   │       │   └── .gitkeep
│   │   │       └── templates/
│   │   │           └── dbt/
│   │   │               └── .gitkeep
│   │   ├── docker-compose.yaml
│   │   └── requirements.txt
│   └── setup.py
├── airbyte_dbt_airflow_snowflake/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analyses/
│   │   │   └── .gitkeep
│   │   ├── dbt_project.yml
│   │   ├── macros/
│   │   │   └── .gitkeep
│   │   ├── models/
│   │   │   ├── example/
│   │   │   │   ├── my_first_dbt_model.sql
│   │   │   │   ├── my_second_dbt_model.sql
│   │   │   │   └── schema.yml
│   │   │   └── sources.yml
│   │   ├── profiles.yml
│   │   ├── seeds/
│   │   │   └── .gitkeep
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── .gitignore
│   │   ├── Dockerfile
│   │   ├── README.md
│   │   ├── airflow/
│   │   │   ├── dags/
│   │   │   │   └── my_elt_dag.py
│   │   │   └── plugins/
│   │   │       ├── static/
│   │   │       │   └── .gitkeep
│   │   │       └── templates/
│   │   │           └── dbt/
│   │   │               └── .gitkeep
│   │   ├── docker-compose.yaml
│   │   └── requirements.txt
│   └── setup.py
├── airbyte_dbt_dagster/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analyses/
│   │   │   └── .gitkeep
│   │   ├── dbt_project.yml
│   │   ├── macros/
│   │   │   └── .gitkeep
│   │   ├── models/
│   │   │   └── example/
│   │   │       ├── my_first_dbt_model.sql
│   │   │       ├── my_second_dbt_model.sql
│   │   │       └── schema.yml
│   │   ├── profiles.yml
│   │   ├── seeds/
│   │   │   └── .gitkeep
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── .terraform.lock.hcl
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── orchestration/
│   │   │   ├── __init__.py
│   │   │   ├── assets.py
│   │   │   ├── constants.py
│   │   │   ├── definitions.py
│   │   │   └── schedules.py
│   │   ├── pyproject.toml
│   │   └── setup.py
│   └── setup.py
├── airbyte_dbt_dagster_snowflake/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analyses/
│   │   │   └── .gitkeep
│   │   ├── dbt_project.yml
│   │   ├── macros/
│   │   │   └── .gitkeep
│   │   ├── models/
│   │   │   ├── example/
│   │   │   │   ├── my_first_dbt_model.sql
│   │   │   │   ├── my_second_dbt_model.sql
│   │   │   │   └── schema.yml
│   │   │   └── sources.yml
│   │   ├── profiles.yml
│   │   ├── seeds/
│   │   │   └── .gitkeep
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── orchestration/
│   │   │   ├── __init__.py
│   │   │   ├── assets.py
│   │   │   ├── constants.py
│   │   │   ├── definitions.py
│   │   │   └── schedules.py
│   │   ├── pyproject.toml
│   │   └── setup.py
│   └── setup.py
├── airbyte_dbt_prefect_bigquery/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analyses/
│   │   │   └── .gitkeep
│   │   ├── dbt_project.yml
│   │   ├── macros/
│   │   │   └── .gitkeep
│   │   ├── models/
│   │   │   ├── marts/
│   │   │   │   ├── product_popularity.sql
│   │   │   │   ├── purchase_patterns.sql
│   │   │   │   └── user_demographics.sql
│   │   │   ├── sources/
│   │   │   │   └── faker_sources.yml
│   │   │   └── staging/
│   │   │       ├── stg_products.sql
│   │   │       ├── stg_purchases.sql
│   │   │       └── stg_users.sql
│   │   ├── profiles.yml
│   │   ├── seeds/
│   │   │   └── .gitkeep
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   └── my_elt_flow.py
│   └── setup.py
├── airbyte_dbt_prefect_snowflake/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analyses/
│   │   │   └── .gitkeep
│   │   ├── dbt_project.yml
│   │   ├── macros/
│   │   │   └── .gitkeep
│   │   ├── models/
│   │   │   ├── example/
│   │   │   │   ├── my_first_dbt_model.sql
│   │   │   │   ├── my_second_dbt_model.sql
│   │   │   │   └── schema.yml
│   │   │   └── sources.yml
│   │   ├── profiles.yml
│   │   ├── seeds/
│   │   │   └── .gitkeep
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   └── my_elt_flow.py
│   └── setup.py
├── airbyte_dbt_snowflake_looker/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analyses/
│   │   │   └── .gitkeep
│   │   ├── dbt_project.yml
│   │   ├── macros/
│   │   │   └── .gitkeep
│   │   ├── models/
│   │   │   ├── example/
│   │   │   │   ├── my_first_dbt_model.sql
│   │   │   │   ├── my_second_dbt_model.sql
│   │   │   │   └── schema.yml
│   │   │   └── sources.yml
│   │   ├── profiles.yml
│   │   ├── seeds/
│   │   │   └── .gitkeep
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── orchestration/
│   │   │   ├── __init__.py
│   │   │   ├── assets.py
│   │   │   ├── constants.py
│   │   │   ├── definitions.py
│   │   │   └── schedules.py
│   │   ├── pyproject.toml
│   │   └── setup.py
│   └── setup.py
├── airbyte_lib_notebooks/
│   ├── AirbyteLib_Basic_Features_Demo.ipynb
│   ├── AirbyteLib_CoinAPI_Demo.ipynb
│   ├── AirbyteLib_GA4_Demo.ipynb
│   ├── AirbyteLib_Github_Incremental_Demo.ipynb
│   ├── PyAirbyte_Postgres_Custom_Cache_Demo.ipynb
│   ├── PyAirbyte_Shopify_Demo.ipynb
│   └── README.md
├── airbyte_s3_pinecone_rag/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── dbt_project.yml
│   │   ├── models/
│   │   │   ├── marts/
│   │   │   │   └── purchase_data.sql
│   │   │   ├── sources/
│   │   │   │   └── s3.source.yml
│   │   │   └── staging/
│   │   │       └── stg_purchases.sql
│   │   └── profiles.yml
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── .terraform.lock.hcl
│   │       ├── main.tf
│   │       ├── output.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── query.py
│   ├── quickstart.md
│   └── setup.py
├── api_to_warehouse/
│   ├── .gitignore
│   ├── Readme.md
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── .terraform.lock.hcl
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   └── setup.py
├── customer_segmentation_analytics_shopify/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analyses/
│   │   │   ├── customer_activity_analysis.py
│   │   │   ├── purchase_pattern_segmentation_analysis.py
│   │   │   └── rfm_segmentation_analysis.py
│   │   ├── dbt_project.yml
│   │   ├── models/
│   │   │   ├── marts/
│   │   │   │   ├── customer_activity.sql
│   │   │   │   ├── purchase_pattern_segmentation.sql
│   │   │   │   └── rfm_segmentation.sql
│   │   │   ├── sources/
│   │   │   │   └── shopify_sources.yml
│   │   │   └── staging/
│   │   │       ├── stg_customers.sql
│   │   │       └── stg_transactions.sql
│   │   └── profiles.yml
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── .terraform.lock.hcl
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── orchestration/
│   │   │   ├── __init__.py
│   │   │   ├── assets.py
│   │   │   ├── constants.py
│   │   │   ├── definitions.py
│   │   │   └── schedules.py
│   │   ├── pyproject.toml
│   │   └── setup.py
│   └── setup.py
├── data_to_pinecone_llm/
│   ├── .gitignore
│   ├── .vscode/
│   │   └── quickstart.code-workspace
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analyses/
│   │   │   └── .gitkeep
│   │   ├── dbt_project.yml
│   │   ├── macros/
│   │   │   └── .gitkeep
│   │   ├── models/
│   │   │   ├── notion.source.yml
│   │   │   └── notion_data.sql
│   │   ├── profiles.yml
│   │   ├── seeds/
│   │   │   └── .gitkeep
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── .terraform.lock.hcl
│   │       ├── main.tf
│   │       ├── output.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── query.py
│   ├── quickstart.md
│   ├── secrets/
│   │   ├── .gitignore
│   │   └── README.md
│   └── setup.py
├── database_snapshot/
│   ├── .gitignore
│   ├── README.md
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── connections/
│   │       │   ├── main.tf
│   │       │   ├── provider.tf
│   │       │   └── variables.tf
│   │       ├── destinations/
│   │       │   ├── main.tf
│   │       │   ├── outputs.tf
│   │       │   ├── provider.tf
│   │       │   └── variables.tf
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       ├── sources/
│   │       │   ├── main.tf
│   │       │   ├── outputs.tf
│   │       │   ├── provider.tf
│   │       │   └── variables.tf
│   │       └── variables.tf
│   └── setup.py
├── developer_productivity_analytics_github/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── dbt_project.yml
│   │   ├── models/
│   │   │   ├── marts/
│   │   │   │   ├── avarage_time_to_merge_pr_analysis.sql
│   │   │   │   ├── commits_over_time_per_dev_analysis.sql
│   │   │   │   ├── dev_activity_by_day_of_week_analysis.sql
│   │   │   │   ├── dev_collaboration_network_analysis.sql
│   │   │   │   ├── freq_of_code_contribution_analysis.sql
│   │   │   │   ├── no_of_code_reviews_per_dev_analysis.sql
│   │   │   │   ├── no_of_commits_per_dev_per_repo_analysis.sql
│   │   │   │   ├── no_of_pr_per_dev_analysis.sql
│   │   │   │   ├── number_of_pr_open_or_closed.sql
│   │   │   │   ├── top_collaborators_by_repo_analysis.sql
│   │   │   │   └── track_issues_assigned_by_dev_analysis.sql
│   │   │   ├── sources/
│   │   │   │   └── github_source.yml
│   │   │   └── staging/
│   │   │       ├── stg_branches.sql
│   │   │       ├── stg_collaborators.sql
│   │   │       ├── stg_comments.sql
│   │   │       ├── stg_commits.sql
│   │   │       ├── stg_issues.sql
│   │   │       ├── stg_organizations.sql
│   │   │       ├── stg_pull_requests.sql
│   │   │       ├── stg_repositories.sql
│   │   │       ├── stg_review_comments.sql
│   │   │       ├── stg_reviews.sql
│   │   │       ├── stg_stargazers.sql
│   │   │       ├── stg_tags.sql
│   │   │       ├── stg_teams.sql
│   │   │       └── stg_users.sql
│   │   ├── profiles.yml
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── .terraform.lock.hcl
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── orchestration/
│   │   │   ├── __init__.py
│   │   │   ├── assets.py
│   │   │   ├── constants.py
│   │   │   ├── definitions.py
│   │   │   └── schedules.py
│   │   ├── pyproject.toml
│   │   └── setup.py
│   └── setup.py
├── ecommerce_analytics_bigquery/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analyses/
│   │   │   └── .gitkeep
│   │   ├── dbt_project.yml
│   │   ├── macros/
│   │   │   └── .gitkeep
│   │   ├── models/
│   │   │   ├── marts/
│   │   │   │   ├── product_popularity.sql
│   │   │   │   ├── purchase_patterns.sql
│   │   │   │   └── user_demographics.sql
│   │   │   ├── sources/
│   │   │   │   └── faker_sources.yml
│   │   │   └── staging/
│   │   │       ├── stg_products.sql
│   │   │       ├── stg_purchases.sql
│   │   │       └── stg_users.sql
│   │   ├── profiles.yml
│   │   ├── seeds/
│   │   │   └── .gitkeep
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── .terraform.lock.hcl
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── orchestration/
│   │   │   ├── __init__.py
│   │   │   ├── assets.py
│   │   │   ├── constants.py
│   │   │   ├── definitions.py
│   │   │   └── schedules.py
│   │   ├── pyproject.toml
│   │   └── setup.py
│   └── setup.py
├── elt_simplified_stack/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── dbt_project.yml
│   │   ├── models/
│   │   │   ├── marts/
│   │   │   │   ├── commits-per-repo.sql
│   │   │   │   ├── pr-per-dev.sql
│   │   │   │   └── pr-per-status.sql
│   │   │   ├── sources/
│   │   │   │   └── github_source.yml
│   │   │   └── staging/
│   │   │       ├── stg_commits.sql
│   │   │       └── stg_pull_requests.sql
│   │   ├── profiles.yml
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── .terraform.lock.hcl
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   └── my_elt_flow.py
│   └── setup.py
├── error_analysis_stack_sentry/
│   ├── Readme.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analyses/
│   │   │   └── .gitkeep
│   │   ├── dbt_project.yml
│   │   ├── macros/
│   │   │   └── .gitkeep
│   │   ├── models/
│   │   │   ├── example/
│   │   │   │   ├── Insight_Table.sql
│   │   │   │   └── schema.yml
│   │   │   └── sources.yml
│   │   ├── profiles.yml
│   │   ├── seeds/
│   │   │   └── .gitkeep
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── error_analysis_stack.egg-info/
│   │   ├── PKG-INFO
│   │   ├── SOURCES.txt
│   │   ├── dependency_links.txt
│   │   ├── requires.txt
│   │   └── top_level.txt
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── .terraform.lock.hcl
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── orchestration/
│   │   │   ├── __init__.py
│   │   │   ├── assets.py
│   │   │   ├── constants.py
│   │   │   ├── definitions.py
│   │   │   └── schedules.py
│   │   ├── orchestration.egg-info/
│   │   │   ├── PKG-INFO
│   │   │   ├── SOURCES.txt
│   │   │   ├── dependency_links.txt
│   │   │   ├── requires.txt
│   │   │   └── top_level.txt
│   │   ├── pyproject.toml
│   │   └── setup.py
│   └── setup.py
├── github_insight_stack/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analyses/
│   │   │   └── .gitkeep
│   │   ├── dbt_project.yml
│   │   ├── macros/
│   │   │   └── .gitkeep
│   │   ├── models/
│   │   │   ├── Readme.md
│   │   │   ├── sources.yml
│   │   │   └── test-models/
│   │   │       ├── code_quality.sql
│   │   │       ├── collaboration_patterns.sql
│   │   │       └── project_health.sql
│   │   ├── profiles.yml
│   │   ├── seeds/
│   │   │   └── .gitkeep
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── .terraform.lock.hcl
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── orchestration/
│   │   │   ├── __init__.py
│   │   │   ├── assets.py
│   │   │   ├── constants.py
│   │   │   ├── definitions.py
│   │   │   └── schedules.py
│   │   ├── pyproject.toml
│   │   └── setup.py
│   └── setup.py
├── low_latency_data_availability/
│   ├── .gitignore
│   ├── README.md
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   └── setup.py
├── mongodb_mysql_integration/
│   ├── .gitignore
│   ├── README.md
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   └── setup.py
├── multisource_aggregation/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analyses/
│   │   │   └── .gitkeep
│   │   ├── dbt_project.yml
│   │   ├── macros/
│   │   │   └── .gitkeep
│   │   ├── models/
│   │   │   ├── example/
│   │   │   │   ├── my_first_dbt_model.sql
│   │   │   │   ├── my_second_dbt_model.sql
│   │   │   │   └── schema.yml
│   │   │   └── sources.yml
│   │   ├── profiles.yml
│   │   ├── seeds/
│   │   │   └── .gitkeep
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── connections/
│   │       │   ├── main.tf
│   │       │   ├── provider.tf
│   │       │   └── variables.tf
│   │       ├── destination_warehouse/
│   │       │   ├── main.tf
│   │       │   ├── outputs.tf
│   │       │   ├── provider.tf
│   │       │   └── variables.tf
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       ├── source_databases/
│   │       │   ├── main.tf
│   │       │   ├── outputs.tf
│   │       │   ├── provider.tf
│   │       │   └── variables.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── orchestration/
│   │   │   ├── __init__.py
│   │   │   ├── assets.py
│   │   │   ├── constants.py
│   │   │   ├── definitions.py
│   │   │   └── schedules.py
│   │   ├── pyproject.toml
│   │   └── setup.py
│   └── setup.py
├── mysql_to_postgres_incremental_stack/
│   ├── .gitignore
│   ├── README.md
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   └── setup.py
├── outdoor_activity_analytics_recreation/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analyses/
│   │   │   ├── campsite_availability_analysis.py
│   │   │   ├── campsite_type_analysis.py
│   │   │   ├── count_recareas_by_activity_analysis.py
│   │   │   └── most_common_activities_in_recareas_analysis.py
│   │   ├── dbt_project.yml
│   │   ├── models/
│   │   │   ├── marts/
│   │   │   │   ├── campsite_availability_over_time.sql
│   │   │   │   ├── campsite_type_counts.sql
│   │   │   │   ├── count_recarea_by_activity_analysis.sql
│   │   │   │   └── most_common_activities_in_recareas.sql
│   │   │   ├── sources/
│   │   │   │   └── recreation_source.yml
│   │   │   └── staging/
│   │   │       ├── stg_activities.sql
│   │   │       ├── stg_campsites.sql
│   │   │       ├── stg_facilities.sql
│   │   │       └── stg_recreationareas.sql
│   │   └── profiles.yml
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── .terraform.lock.hcl
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── orchestration/
│   │   │   ├── __init__.py
│   │   │   ├── assets.py
│   │   │   ├── constants.py
│   │   │   ├── definitions.py
│   │   │   └── schedules.py
│   │   ├── pyproject.toml
│   │   └── setup.py
│   └── setup.py
├── postgres_data_replication/
│   ├── .gitignore
│   ├── README.md
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   └── setup.py
├── postgres_snowflake_integration/
│   ├── .gitignore
│   ├── README.md
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   └── setup.py
├── postgres_to_mysql_migration/
│   ├── .gitignore
│   ├── README.md
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   └── setup.py
├── pyairbyte_notebooks/
│   ├── AI ChatBot - 1.0 Launch Demo.ipynb
│   ├── Chatoverpolygonstockdata_langchain.ipynb
│   ├── PyAirbyte_Apify_Demo.ipynb
│   ├── PyAirbyte_Basic_Features_Demo.ipynb
│   ├── PyAirbyte_CoinAPI_Demo.ipynb
│   ├── PyAirbyte_Document_Creation_RAG_with_Langchain_Demo.ipynb
│   ├── PyAirbyte_GA4_Demo.ipynb
│   ├── PyAirbyte_Github_Incremental_Demo.ipynb
│   ├── PyAirbyte_Postgres_Custom_Cache_Demo.ipynb
│   ├── PyAirbyte_Shopify_Demo.ipynb
│   ├── PyAirbyte_Snowflake_Cortex_Github.ipynb
│   ├── PyAirbyte_Snowflake_Custom_Cache_Demo.ipynb
│   ├── PyAirbyte_as_an_Orchestrator_Demo.ipynb
│   ├── RAG_using_github_pyairbyte_chroma.ipynb
│   ├── README.md
│   ├── rag_using_gdrive_pyairbyte_pinecone.ipynb
│   ├── rag_using_github_pyairbyte_weaviate.ipynb
│   ├── rag_using_gitlab_pyairbyte_qdrant.ipynb
│   ├── rag_using_jira_pyairbyte_pinecone.ipynb
│   ├── rag_using_s3_pyairbyte_pinecone.ipynb
│   ├── rag_using_shopify_pyairbyte_langchain.ipynb
│   ├── rag_with_fb_marketing_milvus_lite.ipynb
│   ├── rag_with_pyairbyte_and_milvus_lite.ipynb
│   ├── sentiment_analysis_airbyte_gsheets_snowflakecortex.ipynb
│   └── using_langchain_airbyte_package.ipynb
├── satisfaction_analytics_zendesk_support/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── analysis/
│   │   │   └── .gitkeep
│   │   ├── dbt_project.yml
│   │   ├── models/
│   │   │   ├── marts/
│   │   │   │   ├── analyze_satisfaction_score_over_time.sql
│   │   │   │   ├── avarage_satisfaction_rating.sql
│   │   │   │   ├── feedback_analysis_for_low_score.sql
│   │   │   │   └── trend_analysis_by_score.sql
│   │   │   ├── sources/
│   │   │   │   └── zendesk_support_sources.yml
│   │   │   └── staging/
│   │   │       ├── stg_brands.sql
│   │   │       ├── stg_groups.sql
│   │   │       ├── stg_organizations.sql
│   │   │       ├── stg_satisfaction_ratings.sql
│   │   │       ├── stg_tags.sql
│   │   │       ├── stg_ticket_audits.sql
│   │   │       ├── stg_ticket_comments.sql
│   │   │       ├── stg_ticket_fields.sql
│   │   │       ├── stg_ticket_forms.sql
│   │   │       ├── stg_ticket_metric_events.sql
│   │   │       ├── stg_ticket_metrics.sql
│   │   │       ├── stg_tickets.sql
│   │   │       └── stg_users.sql
│   │   ├── profiles.yml
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── .terraform.lock.hcl
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── orchestration/
│   │   │   ├── __init__.py
│   │   │   ├── assets.py
│   │   │   ├── constants.py
│   │   │   ├── definitions.py
│   │   │   └── schedules.py
│   │   ├── pyproject.toml
│   │   ├── setup.py
│   │   ├── tmp3ks7pwhz/
│   │   │   └── storage/
│   │   │       ├── ad21fadd-c131-4a7c-98a7-fa5ad3a929de/
│   │   │       │   └── compute_logs/
│   │   │       │       ├── mdvhnoik.complete
│   │   │       │       ├── mdvhnoik.err
│   │   │       │       ├── mdvhnoik.out
│   │   │       │       ├── uzgmeijp.complete
│   │   │       │       ├── uzgmeijp.err
│   │   │       │       └── uzgmeijp.out
│   │   │       └── f7507115-918d-443f-ab91-a065e84fa403/
│   │   │           └── compute_logs/
│   │   │               ├── aeebjmfa.complete
│   │   │               ├── aeebjmfa.err
│   │   │               ├── aeebjmfa.out
│   │   │               ├── zqbkkiww.complete
│   │   │               ├── zqbkkiww.err
│   │   │               └── zqbkkiww.out
│   │   └── tmpb3ctnsbk/
│   │       └── storage/
│   │           ├── 0bc4e544-546d-44df-b79c-e75413c56ecb/
│   │           │   └── compute_logs/
│   │           │       ├── xozgecli.complete
│   │           │       ├── xozgecli.err
│   │           │       ├── xozgecli.out
│   │           │       ├── yyxjctam.complete
│   │           │       ├── yyxjctam.err
│   │           │       └── yyxjctam.out
│   │           └── 1eac78ed-12d1-4147-9c48-79b27dd586ed/
│   │               └── compute_logs/
│   │                   ├── iqvvuhde.complete
│   │                   ├── iqvvuhde.err
│   │                   ├── iqvvuhde.out
│   │                   ├── izklbfmq.complete
│   │                   ├── izklbfmq.err
│   │                   └── izklbfmq.out
│   └── setup.py
├── shopping_cart_analytics_shopify/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── dbt_project.yml
│   │   ├── models/
│   │   │   ├── marts/
│   │   │   │   ├── abandoned_checkout_ratio.sql
│   │   │   │   ├── location_based_abandoned_checkouts.sql
│   │   │   │   ├── most_abandoned_products.sql
│   │   │   │   └── time_based.sql
│   │   │   ├── sources/
│   │   │   │   └── shopify_source.yml
│   │   │   └── staging/
│   │   │       └── stg_abandoned_checkouts.sql
│   │   └── profiles.yml
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── .terraform.lock.hcl
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── orchestration/
│   │   │   ├── __init__.py
│   │   │   ├── assets.py
│   │   │   ├── constants.py
│   │   │   ├── definitions.py
│   │   │   └── schedules.py
│   │   ├── pyproject.toml
│   │   └── setup.py
│   └── setup.py
├── ticket_volume_analytics_zendesk_support/
│   ├── .gitignore
│   ├── README.md
│   ├── dbt_project/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── dbt_project.yml
│   │   ├── models/
│   │   │   ├── marts/
│   │   │   │   ├── busier_day_of_week_analysis.sql
│   │   │   │   ├── pattern_and_trend_analysis.sql
│   │   │   │   ├── seasonal_analysis.sql
│   │   │   │   ├── ticket_priority_analysis.sql
│   │   │   │   ├── ticket_resolution_time_analysis.sql
│   │   │   │   ├── ticket_source_analysis.sql
│   │   │   │   └── ticket_volume_analysis.sql
│   │   │   ├── sources/
│   │   │   │   └── zendesk_support_sources.yml
│   │   │   └── staging/
│   │   │       ├── stg_schedules.sql
│   │   │       ├── stg_ticket_metrics.sql
│   │   │       ├── stg_tickets.sql
│   │   │       └── stg_users.sql
│   │   ├── profiles.yml
│   │   ├── snapshots/
│   │   │   └── .gitkeep
│   │   └── tests/
│   │       └── .gitkeep
│   ├── infra/
│   │   ├── .gitignore
│   │   └── airbyte/
│   │       ├── .terraform.lock.hcl
│   │       ├── main.tf
│   │       ├── provider.tf
│   │       └── variables.tf
│   ├── orchestration/
│   │   ├── orchestration/
│   │   │   ├── __init__.py
│   │   │   ├── assets.py
│   │   │   ├── constants.py
│   │   │   ├── definitions.py
│   │   │   └── schedules.py
│   │   ├── pyproject.toml
│   │   └── setup.py
│   └── setup.py
├── vector_store_integration/
│   ├── AI_assistant_streamlit_app/
│   │   ├── .gitignore
│   │   ├── README.md
│   │   ├── app.py
│   │   └── requirements.txt
│   ├── RAG_using_PGVector.ipynb
│   ├── RAG_using_Snowflake_Cortex.ipynb
│   └── RAG_using_Vectara.ipynb
└── weather_data_stack/
    ├── .gitignore
    ├── README.md
    ├── dbt_project/
    │   ├── .gitignore
    │   ├── README.md
    │   ├── analyses/
    │   │   └── .gitkeep
    │   ├── dbt_project.yml
    │   ├── macros/
    │   │   └── .gitkeep
    │   ├── models/
    │   │   ├── marts/
    │   │   │   └── historial_weather_trends.sql
    │   │   ├── sources/
    │   │   │   └── weatherstack_source.yml
    │   │   └── staging/
    │   │       └── stg_current_weather.sql
    │   ├── profiles.yml
    │   ├── seeds/
    │   │   └── .gitkeep
    │   ├── snapshots/
    │   │   └── .gitkeep
    │   └── tests/
    │       └── .gitkeep
    ├── orchestration/
    │   ├── orchestration/
    │   │   ├── __init__.py
    │   │   ├── assets.py
    │   │   ├── constants.py
    │   │   ├── definitions.py
    │   │   └── schedules.py
    │   ├── pyproject.toml
    │   └── setup.py
    └── setup.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .devcontainer/README.md
================================================
# `.devcontainer` Config

This directory houses a set of Dev Container config files, which streamline contributions from team and community members.

## Developing in the Browser using Codespaces

GitHub Codespaces allows maintainers and contributors to launch directly into a web browser window that hosts the VS Code IDE.

## Container Prebuild Optimizations

Prebuilds of these dev containers can significantly speed up launch times.

## Sharing Codespace Links

Per the [GitHub Docs](https://docs.github.com/en/codespaces/setting-up-your-project-for-codespaces/setting-up-your-repository/facilitating-quick-creation-and-resumption-of-codespaces#creating-a-link-to-the-codespace-creation-page-for-your-repository), you can:

- Create a codespace for the default branch:
  - [`https://codespaces.new/airbytehq/quickstarts`](https://codespaces.new/airbytehq/quickstarts)
- Create a codespace for a specific branch of the repository:
  - `https://codespaces.new/airbytehq/quickstarts/tree/BRANCH-NAME`
  - `https://codespaces.new/FORK-NAME/quickstarts/tree/BRANCH-NAME`
  - E.g. https://codespaces.new/aaronsteers/quickstarts/tree/aj%2Ffeat%2Fdevcontainers
- Create a codespace for a pull request:
  - https://codespaces.new/airbytehq/quickstarts/pull/PR-NUMBER

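If you prefer the command line, the GitHub CLI can create a codespace without building a link by hand. This is a sketch assuming `gh` is installed and authenticated; `BRANCH-NAME` is a placeholder:

```bash
# Codespace on the default branch of this repo
gh codespace create --repo airbytehq/quickstarts

# Codespace on a specific branch
gh codespace create --repo airbytehq/quickstarts --branch BRANCH-NAME
```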

================================================
FILE: .devcontainer/devcontainer.json
================================================
// This is a generic devcontainer definition for working with Quickstarts.
//
// Included in this devcontainer:
// - Python (3.10)
// - Terraform CLI
// - dbt (BigQuery variant)
// - Docker-In-Docker support (DinD)
// - Various VS Code extensions supporting the above 👆

{
    "name": "Airbyte Quickstarts Dev Container (Generic)",

    // For general devcontainer config, see: https://aka.ms/devcontainer.json
    // For Python-specific options, see: https://github.com/devcontainers/templates/tree/main/src/python
    "image": "mcr.microsoft.com/devcontainers/python:0-3.10",

    "features": {
        // Features to add to the dev container.
        // More info: https://containers.dev/features.
        "ghcr.io/devcontainers/features/docker-in-docker:2": {},
        "ghcr.io/devcontainers-contrib/features/poetry:2": {},
        "ghcr.io/devcontainers/features/terraform": {},
        "ghcr.io/devcontainers-contrib/features/pipx-package:1": {
            "package": "dbt-bigquery",
            "version": "1.7.2",
            "interpreter": "python3",
            "includeDeps": true // ...because dbt-query doesn't directly surface the dbt CLI
        }
    },
    "overrideFeatureInstallOrder": [
        // Strict ordering gives best chance of cache reuse.
        // Put things that aren't changing at top of list:
        "ghcr.io/devcontainers/features/docker-in-docker:2",
        "ghcr.io/devcontainers/features/terraform",
        "ghcr.io/devcontainers-contrib/features/poetry:2",
        "ghcr.io/devcontainers-contrib/features/pipx-package:1"
    ],

    // Configure tool-specific properties.
    "customizations": {
        "vscode": {
            "extensions": [
                // Python extensions:
                "charliermarsh.ruff",
                "ms-python.black-formatter",
                "ms-python.mypy-type-checker",
                "ms-python.python",
                "ms-python.vscode-pylance",
                "ms-toolsai.jupyter",
                // Toml support:
                "tamasfe.even-better-toml",
                // Yaml and JSON Schema support:
                "redhat.vscode-yaml",
                // Contributing:
                "GitHub.vscode-pull-request-github"
            ],
            "settings": {
                "extensions.ignoreRecommendations": true,
                "git.autofetch": true,
                "git.openRepositoryInParentFolders": "always",
                "python.defaultInterpreterPath": ".venv/bin/python",
                "python.interpreter.infoVisibility": "always",
                "python.terminal.activateEnvironment": true,
                "python.testing.pytestEnabled": true
            }
        }
    },
    "containerEnv": {
        "POETRY_VIRTUALENVS_IN_PROJECT": "true"
    }

    // Use 'forwardPorts' to make a list of ports inside the container available locally.
    // "forwardPorts": [],
    // Use 'postCreateCommand' to run commands after the container is created.
    // "postCreateCommand": "pip3 install --user -r requirements.txt",
    // Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
    // "remoteUser": "root"
}


================================================
FILE: .gitignore
================================================
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

#Desktop Services Store
.DS_Store

# PyAirbyte caches and virtual environments
.cache
.venv*


================================================
FILE: CONTRIBUTING.md
================================================
# Contributing to Airbyte Quickstarts

Thank you for considering contributing to Airbyte Quickstarts! 🌟 It’s people like you that make this project valuable for the community. Whether it’s fixing bugs or adding new Quickstarts, we welcome your contributions.

## How Can I Contribute?

### 1. Reporting Bugs

- First, check the [Issues](https://github.com/airbytehq/quickstarts/issues) to see if the bug has already been reported.
- If it hasn’t, [open a new issue](https://github.com/airbytehq/quickstarts/issues/new), providing a descriptive title and a clear description.

### 2. Suggesting Enhancements

- Before suggesting enhancements, please read the [documentation](https://github.com/airbytehq/quickstarts/blob/main/README.md) and check the [Issues](https://github.com/airbytehq/quickstarts/issues) to see if it has been discussed before.
- If it hasn’t, [open a new issue](https://github.com/airbytehq/quickstarts/issues/new), providing a descriptive title, detailed description, and use case.

### 3. Submitting Changes and New Quickstarts

1. **Fork the Repository**: Create your own fork of the [quickstarts repository](https://github.com/airbytehq/quickstarts).
2. **Clone the Repository**: Clone your forked repository to your local machine.
   ```sh
   git clone https://github.com/your-username/quickstarts.git
   ```
3. **Create a Branch**: Create a new branch from `main` for your changes.
   ```sh
   git checkout -b feature/my-new-feature
   ```
4. **Make Changes**: Make your changes or additions to the new branch.
5. **Commit Changes**: Commit your changes with a clear and descriptive commit message.
   ```sh
   git commit -m "Add a new quickstart"
   ```
6. **Push Changes**: Push your changes to your fork on GitHub.
   ```sh
   git push origin feature/my-new-feature
   ```
7. **Submit a Pull Request**: Go to the [Pull Requests](https://github.com/airbytehq/quickstarts/pulls) of the original repository and create a new pull request. Provide a clear description of the changes and reference any related issues.

### 4. Notes for New Quickstarts

1. **Create it in a New Directory**: Each Quickstart should live in its own directory and be a standalone project. 
2. **Add a README.md**: All Quickstarts should have clear and detailed instructions about how to set them up.

## Style Guides

- Write clean and simple code, following the existing code structure and naming conventions.
- For Markdown files, adhere to [Markdown Guide](https://www.markdownguide.org/extended-syntax/).
- Include comments with clear explanations of your code.
- Update the documentation (README.md) if needed, to reflect the changes made.

## Review Process

Once your pull request is submitted, maintainers will review it. They may ask for additional changes or clarifications. Once the pull request is approved, it will be merged into the main branch.

## Contact

For questions or help with the contributing process, please reach out in the #hackathons channel in the [Airbyte Slack](https://airbytehq.slack.com/).

Thank you for contributing to Airbyte Quickstarts! 🚀

================================================
FILE: README.md
================================================
# Airbyte Quickstarts

Welcome to Airbyte Quickstarts! This repository provides various templates to help you quickly build your data stack tailored to different domains like Marketing, Product, Finance, Operations, and more.

## Objective

To empower data teams by providing ready-to-use code templates, enabling the swift and efficient deployment of data stacks with minimal configuration.

## How To Start?

1. **Choose a Template**: Navigate to the Quickstart that suits your needs. Each folder in this repository is a Quickstart and can be used as a standalone project.
2. **Follow Setup Instructions**: Each Quickstart contains a `README.md` file with step-by-step instructions to set up the stack.
3. **Customize**: Modify the Quickstart as needed to suit your specific requirements.
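For example, to pull down a single Quickstart without cloning the entire repository, you can use a sparse checkout. This mirrors the clone instructions inside each Quickstart's README, shown here for the ADA BigQuery stack:

```bash
git clone --filter=blob:none --sparse https://github.com/airbytehq/quickstarts.git
cd quickstarts
git sparse-checkout add airbyte_dbt_airflow_bigquery
```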

## List Of Available Quickstarts

- [Airbyte, dbt, Airflow and BigQuery E-commerce Stack](./airbyte_dbt_airflow_bigquery)
- [Airbyte, dbt, Airflow and Snowflake Basic Stack](./airbyte_dbt_airflow_snowflake)
- [Airbyte, dbt, Dagster and BigQuery Basic Stack](./airbyte_dbt_dagster)
- [Airbyte, dbt, Dagster and Snowflake Basic Stack](./airbyte_dbt_dagster_snowflake)
- [Airbyte, dbt, Prefect and BigQuery (PAD) Stack](./airbyte_dbt_prefect_bigquery)
- [Airbyte, dbt, Prefect and Snowflake Basic Stack](./airbyte_dbt_prefect_snowflake)
- [Airbyte, dbt, Snowflake and Looker Basic Stack](./airbyte_dbt_snowflake_looker)
- [API to Data Warehouse Integration Stack](./api_to_warehouse)
- [Customer Satisfaction Analytics Stack With Zendesk Support, Airbyte, dbt, Dagster and BigQuery](./satisfaction_analytics_zendesk_support)
- [Customer Ticket Volume Analytics Stack With Zendesk Support, Airbyte, dbt, Dagster and BigQuery](./ticket_volume_analytics_zendesk_support)
- [Database Snapshot Stack](./database_snapshot)
- [E-commerce Analytics with Airbyte, dbt, Dagster and BigQuery](./ecommerce_analytics_bigquery)
- [ELT Simplified Stack with Airbyte, dbt, Prefect, GitHub and BigQuery](./elt_simplified_stack)
- [Github Insight Stack with Airbyte, dbt, Dagster and BigQuery](./github_insight_stack)
- [Low-Latency Data Availability Stack](./low_latency_data_availability)
- [MongoDB MySQL Integration Stack](./mongodb_mysql_integration)
- [Multisource Database Aggregation Stack](./multisource_aggregation)
- [Postgres Data Replication Stack](./postgres_data_replication)
- [Postgres to MySQL Database Migration Stack](./postgres_to_mysql_migration)
- [Postgres to Snowflake Data Integration](./postgres_snowflake_integration)

## Contribution Guidelines

We highly encourage community contributions to help improve, expand and add new Quickstarts! Please read our [Contribution Guidelines](CONTRIBUTING.md) before making a submission.

If you're looking to contribute with a new Quickstart, you can look for inspiration in the [Issues](https://github.com/airbytehq/quickstarts/issues) tab. There, we keep a list of our most wanted Quickstarts and often offer rewards for contributions, for example, during our different Hackathons.

## Contact

For questions or help with the contributing process, please reach out in the #hackathons channel in the [Airbyte Slack](https://airbytehq.slack.com/).


================================================
FILE: airbyte_dbt_airflow_bigquery/.gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

#Desktop Services Store
.DS_Store

================================================
FILE: airbyte_dbt_airflow_bigquery/README.md
================================================
# E-commerce Analytics Stack with Airbyte, dbt, Airflow (ADA) and BigQuery

Welcome to the Airbyte, dbt and Airflow (ADA) Stack with BigQuery quickstart! This repo contains the code to show how to utilize Airbyte and dbt for data extraction and transformation, and how to implement Apache Airflow to orchestrate the data workflows, providing an end-to-end ELT pipeline. With this setup, you can pull fake e-commerce data, put it into BigQuery, and play around with it using dbt and Airflow.

Here's the diagram of the end-to-end data pipeline you will build, from the Airflow DAG Graph view:

![elt_dag](assets/elt_dag.png)

And here are the transformations happening when the dbt DAG is executed:

![ecommerce_dag](assets/ecommerce_dag.png)

## Table of Contents

- [Prerequisites](#prerequisites)
- [Setting an environment for your project](#1-setting-an-environment-for-your-project)
- [Setting Up BigQuery](#2-setting-up-bigquery)
- [Setting Up Airbyte Connectors](#3-setting-up-airbyte-connectors)
- [Setting Up the dbt Project](#4-setting-up-the-dbt-project)
- [Setting Up Airflow](#5-setting-up-airflow)
- [Orchestrating with Airflow](#6-orchestrating-with-airflow)
- [Next Steps](#7-next-steps)

## Prerequisites

Before you embark on this integration, ensure you have the following set up and ready:

1. **Python 3.10 or later**: If not installed, download and install it from [Python's official website](https://www.python.org/downloads/).

2. **Docker and Docker Compose (Docker Desktop)**: Install [Docker](https://docs.docker.com/get-docker/) following the official documentation for your specific OS.

3. **Airbyte OSS version**: Deploy the open-source version of Airbyte locally. Follow the installation instructions from the [Airbyte Documentation](https://docs.airbyte.com/quickstart/deploy-airbyte/).

4. **Terraform (Optional)**: Terraform will help you provision and manage the Airbyte resources. If you haven't installed it, follow the [official Terraform installation guide](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli). This is an optional step because you can also create and manage Airbyte resources via the UI. Both ways will be described below.

5. **Google Cloud account with BigQuery**: You will also need to add the necessary permissions to allow Airbyte and dbt to access the data in BigQuery. A step-by-step guide is provided [below](#2-setting-up-bigquery).

## 1. Setting an environment for your project

Get the project up and running on your local machine by following these steps:

1. **Clone the repository (Clone only this quickstart)**:  
   ```bash
   git clone --filter=blob:none --sparse  https://github.com/airbytehq/quickstarts.git
   ```

   ```bash
   cd quickstarts
   ```

   ```bash
   git sparse-checkout add airbyte_dbt_airflow_bigquery
   ```

2. **Navigate to the directory**:  
   ```bash
   cd airbyte_dbt_airflow_bigquery
   ```

   At this point you can view the code in your preferred IDE. 
   
   The next steps are only necessary if you want to develop or test the dbt models locally, since Airbyte and Airflow are running on Docker.

3. **Set up a virtual environment**:  
   
   You can use the following commands, just make sure to adapt to your specific python installation.

   - For Linux and Mac:
     ```bash
     python3 -m venv venv
     source venv/bin/activate
     ```

   - For Windows:
     ```bash
     python -m venv venv
     .\venv\Scripts\activate
     ```

4. **Install dependencies**: 

   ```bash
   pip install -e ".[dev]"
   ```

## 2. Setting up BigQuery

1. **Create a Google Cloud project**:
   - If you have a Google Cloud project, you can skip this step.
   - Go to the [Google Cloud Console](https://console.cloud.google.com/).
   - Click on the "Select a project" dropdown at the top right and select "New Project".
   - Give your project a name and follow the steps to create it.

2. **Create BigQuery datasets**:
   - In the Google Cloud Console, go to BigQuery.
   - Make two new datasets: `raw_data` for Airbyte and `transformed_data` for dbt.
     - If you pick different names, remember to change the names in the code too.
   
   **How to create a dataset:**
   - In the left sidebar, click on your project name.
   - Click “Create Dataset”.
   - Enter the dataset ID (either `raw_data` or `transformed_data`).
   - Click "Create Dataset".

3. **Create a Service Account and Assign Roles**:
   - Go to “IAM & Admin” > “Service accounts” in the Google Cloud Console.
   - Click “Create Service Account”.
   - Name your service account.
   - Assign the “BigQuery Data Editor” and “BigQuery Job User” roles to the service account.

   **How to create a service account and assign roles:**
   - While creating the service account, under the “Grant this service account access to project” section, click the “Role” dropdown.
   - Choose the “BigQuery Data Editor” and “BigQuery Job User” roles.
   - Finish the creation process.
   
4. **Generate a JSON key for the Service Account**:
   - Make a JSON key to let the service account sign in.
   
   **How to generate a JSON key:**
   - Find the service account in the “Service accounts” list.
   - Click on the service account name.
   - In the “Keys” section, click “Add Key” and pick JSON.
   - The key will download automatically. Keep it safe and don’t share it.
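If you prefer the command line over the console, the same setup can be sketched with the `gcloud` and `bq` CLIs. This assumes both CLIs are installed and authenticated; `YOUR_PROJECT_ID` and the service account name `airbyte-dbt-quickstart` are placeholders:

```bash
gcloud config set project YOUR_PROJECT_ID

# Datasets for Airbyte (raw) and dbt (transformed)
bq mk --dataset YOUR_PROJECT_ID:raw_data
bq mk --dataset YOUR_PROJECT_ID:transformed_data

# Service account with the two roles described above
gcloud iam service-accounts create airbyte-dbt-quickstart
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:airbyte-dbt-quickstart@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:airbyte-dbt-quickstart@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# JSON key used by both Airbyte and dbt
gcloud iam service-accounts keys create service_account.json \
  --iam-account="airbyte-dbt-quickstart@YOUR_PROJECT_ID.iam.gserviceaccount.com"
```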

## 3. Setting Up Airbyte Connectors

To set up your Airbyte connectors, you can choose to do it via Terraform, or the UI. Choose one of the two following options.

### 3.1. Setting Up Airbyte Connectors with Terraform

Airbyte allows you to create connectors for sources and destinations via Terraform, facilitating data synchronization between various platforms. Here's how you can set this up:

1. **Navigate to the Airbyte Configuration Directory**:

   ```bash
   cd infra/airbyte
   ```

2. **Modify Configuration Files**:

   Within the `infra/airbyte` directory, you'll find three crucial Terraform files:
    - `provider.tf`: Defines the Airbyte provider.
    - `main.tf`: Contains the main configuration for creating Airbyte resources.
    - `variables.tf`: Holds various variables, including credentials.

   Adjust the configurations in these files to suit your project's needs: 

   - Provide credentials for your BigQuery connection in the `main.tf` file.
      - `dataset_id`: The name of the BigQuery dataset where Airbyte will load data. In this case, enter “raw_data”.
      - `project_id`: Your BigQuery project ID.
      - `credentials_json`: The contents of the service account JSON file. This field expects a string, so you need to convert the JSON content to a string beforehand (see the snippet at the end of this step).
      - `workspace_id`: Your Airbyte workspace ID, which can be found in the webapp url. For example, in this url: http://localhost:8000/workspaces/910ab70f-0a67-4d25-a983-999e99e1e395/ the workspace id would be `910ab70f-0a67-4d25-a983-999e99e1e395`.

   - Alternatively, you can utilize the `variables.tf` file to manage these credentials:
      - You’ll be prompted to enter the credentials when you execute `terraform plan` and `terraform apply`. If going for this option, just move to the next step. If you don’t want to use variables, remove them from the file.
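
   Whichever option you choose, a quick way to flatten the downloaded key into a single-line string for pasting is the following sketch, assuming `jq` is installed and `service_account.json` is the key file you downloaded:

   ```bash
   jq -c . service_account.json
   ```

   Copy the output and paste it wherever the `credentials_json` value is expected.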

3. **Initialize Terraform**:
   
   This step prepares Terraform to create the resources defined in your configuration files.
   ```bash
   terraform init
   ```

4. **Review the Plan**:

   Before applying any changes, review the plan to understand what Terraform will do.
   ```bash
   terraform plan
   ```

5. **Apply Configuration**:

   After reviewing and confirming the plan, apply the Terraform configurations to create the necessary Airbyte resources.
   ```bash
   terraform apply
   ```

6. **Verify in Airbyte UI**:

   Once Terraform completes its tasks, navigate to the [Airbyte UI](http://localhost:8000/). Here, you should see your source and destination connectors, as well as the connection between them, set up and ready to go 🎉.

### 3.2. Setting Up Airbyte Connectors Using the UI

Start by launching the Airbyte UI by going to http://localhost:8000/ in your browser. Then:

1. **Create a source**:

   - Go to the Sources tab and click on `+ New source`.
   - Search for “faker” using the search bar and select `Sample Data (Faker)`.
   - Adjust the Count and optional fields as needed for your use case. You can also leave as is. 
   - Click on `Set up source`.

2. **Create a destination**:

   - Go to the Destinations tab and click on `+ New destination`.
   - Search for “bigquery” using the search bar and select `BigQuery`.
   - Enter the connection details as needed.
   - For simplicity, you can use `Standard Inserts` as the loading method.
   - In the `Service Account Key JSON` field, enter the contents of the JSON file. Yes, the full JSON.
   - Click on `Set up destination`.

3. **Create a connection**:

   - Go to the Connections tab and click on `+ New connection`.
   - Select the source and destination you just created.
   - Enter the connection details as needed.
   - Click on `Set up connection`.

That’s it! Your connection is set up and ready to go! 🎉 

## 4. Setting Up the dbt Project

[dbt (data build tool)](https://www.getdbt.com/) allows you to transform your data by writing, documenting, and executing SQL workflows. Setting up the dbt project requires specifying connection details for your data platform, in this case, BigQuery. Here’s a step-by-step guide to help you set this up:

1. **Navigate to the dbt Project Directory**:

   Move to the directory containing the dbt configuration:
   ```bash
   cd ../../dbt_project
   ```

2. **Update Connection Details**:

   - You'll find a `profiles.yml` file within the directory. This file contains configurations for dbt to connect with your data platform. Update this file with your BigQuery connection details. Specifically, you need to update the Service Account JSON file path, the dataset location and your BigQuery project ID.
   - Provide your BigQuery project ID in the `database` field of the `/models/ecommerce/sources/faker_sources.yml` file.

   If you want to avoid hardcoding credentials in the `profiles.yml` file, you can leverage environment variables. Here's an example: `keyfile: "{{ env_var('DBT_BIGQUERY_KEYFILE_PATH', '') }}"` (a shell example of this approach is shown at the end of this section).

3. **Test the Connection (Optional)**:
   You can test the connection to your BigQuery instance using the following command. Just take into account that you would need to provide the local path to your service account key file instead.
   
   ```bash
   dbt debug
   ```
   
   If everything is set up correctly, this command should report a successful connection to BigQuery 🎉.
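
If you went with the environment-variable approach from step 2, a minimal way to wire it up before testing looks like this; the key path below is a placeholder for the JSON key you downloaded earlier:

```bash
export DBT_BIGQUERY_KEYFILE_PATH="$HOME/keys/service_account.json"
dbt debug
```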

## 5. Setting Up Airflow

Let's set up Airflow for our project, following the steps below. We are basing our setup on the Running Airflow in Docker guide, with some customizations:

1. **Navigate to the Orchestration Directory**:

   ```bash
   cd ../orchestration
   ```

2. **Set Environment Variables**:

   - Open the `.env.example` file located in the `orchestration` directory.
   - Update the necessary fields, paying special attention to `GCP_SERVICE_ACCOUNT_PATH`, which should point to the local directory containing your service account JSON key.
   - Rename the file from `.env.example` to `.env` after filling in the details.
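
   For example, once the fields are updated:

   ```bash
   mv .env.example .env   # rename after setting GCP_SERVICE_ACCOUNT_PATH and the other fields
   ```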

3. **Build the custom Airflow image**:

   ```bash
   docker compose build
   ```

4. **Launch the Airflow container**:

   ```bash
   docker compose up
   ```

   This might take a few minutes initially as it sets up necessary databases and metadata.

5. **Setting up Airflow Connections**:

   To use both Airbyte and dbt from Airflow, we need to set up connections (if you prefer the command line over the UI, a CLI sketch follows this list):

   - Access the Airflow UI by navigating to `http://localhost:8080` in your browser. The default username and password are both `airflow`, unless you changed them in the `.env` file.
   - Go to the "Admin" > "Connections" tab.

   **5.1. Create Airbyte Connection**:

      Click on the `+` button to create a new connection and fill in the following details to create an Airbyte connection:

      - **Connection Id**: The name of the connection; it will be used in the DAGs responsible for triggering Airbyte syncs. Name it `airbyte_connection`.
      - **Connection Type**: The type of the connection. In this case, select `Airbyte`.
      - **Host**: The host of the Airbyte instance. Since we're running it locally, use `airbyte-proxy`, which is the name of the proxy container that sits in front of the Airbyte server in the default Docker Compose deployment. If you have a remote instance, use its URL instead.
      - **Port**: The port of the Airbyte instance. By default the API is exposed on port `8001`.
      - **Login**: If you're using the proxy (it's used by default in the official Airbyte Docker Compose file), this is required. By default it's `airbyte`.
      - **Password**: If you're using the proxy (it's used by default in the official Airbyte Docker Compose file), this is required. By default it's `password`.

      Click on the `Test` button, and make sure you get a `Connection successfully tested` message at the top. Then, you can `Save` the connection.

   **5.2. Create Google Cloud (BigQuery) connection**:

      Click on the `+` button to create a new connection and fill in the following details to create a Google Cloud connection:

      - **Connection Id**: The name of the connection; this one will be used in the DAGs responsible for triggering dbt runs. Name it `dbt_file_connection`.
      - **Connection Type**: The type of the connection. Select `Google Cloud` from the drop-down menu.
      - **Project ID**: The Google Cloud project ID. 
      - **Keyfile path**: The path to the service account key file. In this case, it's mounted to `/opt/airflow/service_accounts/[your-service-account-key-file].json`. 
         - Alternatively, you can use the **Keyfile JSON** field and paste the contents of the key file.

      Click on the `Test` button, and make sure you get a `Connection successfully tested` message at the top. Then, you can `Save` the connection.

6. **Integrate dbt with Airflow**:

   We use [Astronomer Cosmos](https://astronomer.github.io/astronomer-cosmos/) to integrate dbt with Airflow. This library parses DAGs and Task Groups from dbt models, and allows us to use Airflow connections instead of dbt profiles. Additionally, it runs tests automatically after each model is completed. To set it up, we've created the file `orchestration/airflow/config/dbt_config.py` with the necessary configurations.

   Update the following in the `dbt_config.py` file, if necessary:

   - The `location` key inside `google_config` with the location of your BigQuery `transformed_data` dataset, if it's not `US`.
   - The profile mapping used to create `google_config`. The code uses `GoogleCloudServiceAccountFileProfileMapping`, assuming that the Google Cloud connection in Airflow was created using the *Keyfile Path*. If you used the *Keyfile JSON* instead, switch to `GoogleCloudServiceAccountDictProfileMapping`.

7. **Link Airbyte connection to the Airflow DAG**:

   The last step before being able to execute the DAG in Airflow is to include the `connection_id` from Airbyte:

   - Visit the Airbyte UI at http://localhost:8000/.
   - In the "Connections" tab, select the "Faker to BigQuery" connection and copy its connection id from the URL.
   - Update the `connection_id` in the `extract_data` task within `orchestration/airflow/dags/elt_dag.py` with this id.

   That's it! Airflow has been configured to work with dbt and Airbyte. 🎉 
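
As an alternative to creating the Airbyte connection through the UI (step 5.1), you can register it with the Airflow CLI from inside the webserver container. This is only a sketch: it assumes the `airflow-webserver` service name from the provided `docker-compose.yaml` and the default proxy credentials; the `airbyte` connection type comes from the Airbyte provider package. The Google Cloud connection is easier to create through the UI because of its keyfile fields.

```bash
docker compose exec airflow-webserver airflow connections add airbyte_connection \
    --conn-type airbyte \
    --conn-host airbyte-proxy \
    --conn-port 8001 \
    --conn-login airbyte \
    --conn-password password
```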

## 6. Orchestrating with Airflow
Now that everything is set up, it's time to run your data pipeline!

- In the Airflow UI, go to the "DAGs" section.
- Locate `elt_dag` and click on "Trigger DAG" under the "Actions" column.

This will initiate the complete data pipeline, starting with the Airbyte sync from Faker to BigQuery, followed by dbt transforming the raw data into `staging` and `marts` models. As the last step, it generates dbt docs.
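
If you prefer the command line, you can unpause and trigger the same DAG from inside the webserver container. This is a sketch that assumes the `airflow-webserver` service name from the provided `docker-compose.yaml`:

```bash
# DAGs are paused at creation in this setup, so unpause it first
docker compose exec airflow-webserver airflow dags unpause elt_dag

# Trigger a run of the ELT DAG
docker compose exec airflow-webserver airflow dags trigger elt_dag
```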

- Confirm the sync status in the Airbyte UI.
- After the dbt jobs complete, check the BigQuery console to see the newly created views in the `transformed_data` dataset.
- Once the dbt pipeline completes, you can check the dbt docs from the Airflow UI by going to the "Custom Docs" > "dbt" tab.

Congratulations! You've successfully run an end-to-end workflow with Airflow, dbt and Airbyte. 🎉

## 7. Next Steps

Once you've gone through the steps above, you should have a working Airbyte, dbt and Airflow (ADA) Stack with BigQuery. You can use this as a starting point for your project, and adapt it to your needs. There are lots of things you can do beyond this point, and these tools are evolving fast and adding new features almost every week. Here are some ideas to continue your project:

1. **Expand your data sources**:

   This quickstart uses a very simple data source. Airbyte provides hundreds of sources that can be integrated into your pipeline. Besides configuring and orchestrating them, don't forget to add them as sources in your dbt project, so you keep a lineage graph like the one shown at the beginning of this document.

2. **Dive into dbt and improve your transformations**:

   dbt is a very powerful tool with many features that can help you improve your transformations. You can find more details in the [dbt Documentation](https://docs.getdbt.com/). It's especially important to understand the types of materializations and incremental models, as well as models, sources, metrics, and everything else that dbt provides.

3. **Apply Data Quality to your pipeline**

   dbt provides a simple test framework that is a good starting point, but there is a lot more you can do to ensure your data is correct (a quick way to run this project's existing dbt tests by hand is sketched after this list). You can use Airflow to run data quality checks, using [Sensors](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/sensors.html) or operators that run custom queries. You can also use specialized tools such as [Great Expectations](https://greatexpectations.io/) to create more complex data quality checks.

4. **Monitoring and alerts**

   Airflow's UI is a good start for simple monitoring, but as your pipelines scale it might be useful to have a more robust monitoring solution. You can use tools such as [Prometheus](https://prometheus.io/) and [Grafana](https://grafana.com/) to create dashboards and alerts for your pipelines, and you can also set up [notifications using Airflow](https://airflow.apache.org/docs/apache-airflow/2.6.0/howto/notifications.html) or use other tools such as [re_data](https://docs.getre.io/latest/docs/re_data/introduction/whatis_data/).

5. **Contribute to the community**

   All tools mentioned here are open-source and have very active communities. You can contribute to them by creating issues, suggesting features, or even opening pull requests. You can also contribute to the Airbyte community by creating [connectors](https://docs.airbyte.io/connector-development) for new sources and destinations.
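
As a starting point for the data quality ideas above, the schema tests already defined for this quickstart's staging and marts models can be run by hand. This is a sketch, assuming you run it from the `dbt_project` directory with a working BigQuery profile:

```bash
# Run only the tests attached to the ecommerce models
dbt test --select path:models/ecommerce --profiles-dir .
```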


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/.gitignore
================================================

target/
dbt_packages/
logs/

#Desktop Services Store
.DS_Store

#User cookie
.user.yml

================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/README.md
================================================
Welcome to your new dbt project!

### Using the starter project

Try running the following commands:
- dbt run
- dbt test


### Resources:
- Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
- Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers
- Join the [chat](https://community.getdbt.com/) on Slack for live discussions and support
- Find [dbt events](https://events.getdbt.com) near you
- Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices

## This project

We've created two dbt models: example (which contains the default dbt example from jaffle-shop) and ecommerce, which uses data from the dataset extracted via Airbyte using the Faker source.

This project is being orchestrated via Apache Airflow using the [Astronomer Cosmos](https://astronomer.github.io/astronomer-cosmos/) project. For more details on orchestrating dbt models with Airflow, you can check the `orchestration` folder in this quickstart.

The ecommerce dbt model was forked and updated from the [Ecommerce Analytics Bigquery Quickstart](https://github.com/airbytehq/quickstarts/tree/main/ecommerce_analytics_bigquery).

================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/analyses/.gitkeep
================================================


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/dbt_project.yml
================================================
# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'dbt_project'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'dbt_project'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ['models']
analysis-paths: ['analyses']
test-paths: ['tests']
seed-paths: ['seeds']
macro-paths: ['macros']
snapshot-paths: ['snapshots']

clean-targets: # directories to be removed by `dbt clean`
    - 'target'
    - 'dbt_packages'

# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
    dbt_project:
        # Config indicated by + and applies to all files under models/ecommerce/
        ecommerce:
            +materialized: view
            staging:
                +materialized: view
            marts:
                +materialized: view


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/macros/.gitkeep
================================================


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/models/ecommerce/marts/product_popularity.sql
================================================
WITH base AS (
  SELECT 
    product_id,
    COUNT(id) AS purchase_count
  FROM {{ ref('stg_purchases') }}
  GROUP BY 1
)

SELECT 
  p.id,
  p.make,
  p.model,
  b.purchase_count
FROM {{ ref('stg_products') }} p
LEFT JOIN base b ON p.id = b.product_id
ORDER BY b.purchase_count DESC


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/models/ecommerce/marts/purchase_patterns.sql
================================================
SELECT 
  user_id,
  product_id,
  purchased_at,
  added_to_cart_at,
  TIMESTAMP_DIFF(purchased_at, added_to_cart_at, SECOND) AS time_to_purchase_seconds,
  returned_at
FROM {{ ref('stg_purchases') }}


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/models/ecommerce/marts/schema.yml
================================================
version: 2

models:
    - name: product_popularity
      columns:
          - name: id
            tests:
                - unique
          - name: make
            tests:
                - not_null
          - name: model
            tests:
                - not_null
          - name: purchase_count
            tests:
                - not_null

    - name: purchase_patterns
      columns:
          - name: user_id
            tests:
                - not_null
          - name: product_id
            tests:
                - not_null
          - name: purchased_at
          - name: added_to_cart_at
            tests:
                - not_null
          - name: time_to_purchase_seconds
          - name: returned_at

    - name: user_demographics
      columns:
          - name: gender
            tests:
                - not_null
          - name: academic_degree
            tests:
                - not_null
          - name: nationality
            tests:
                - not_null
          - name: average_age
            tests:
                - not_null
          - name: user_count
            tests:
                - not_null


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/models/ecommerce/marts/user_demographics.sql
================================================
WITH base AS (
  SELECT 
    id AS user_id,
    gender,
    academic_degree,
    nationality,
    age
  FROM {{ ref('stg_users') }}
)

SELECT 
  gender,
  academic_degree,
  nationality,
  AVG(age) AS average_age,
  COUNT(user_id) AS user_count
FROM base
GROUP BY 1, 2, 3


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/models/ecommerce/sources/faker_sources.yml
================================================
version: 2

sources:
    - name: faker
      project: your_project_id # Update this field with your BigQuery project ID
      dataset: raw_data
      tables:
          - name: users
            description: 'Simulated user data from the Faker connector.'
            columns:
                - name: id
                  description: 'Unique identifier for the user.'
                - name: address
                - name: occupation
                - name: gender
                - name: academic_degree
                - name: weight
                - name: created_at
                - name: language
                - name: telephone
                - name: title
                - name: updated_at
                - name: nationality
                - name: blood_type
                - name: name
                - name: age
                - name: email
                - name: height
                - name: _airbyte_raw_id
                - name: _airbyte_extracted_at
                - name: _airbyte_meta

          - name: products
            description: 'Simulated product data from the Faker connector.'
            columns:
                - name: id
                  description: 'Unique identifier for the product.'
                - name: updated_at
                - name: year
                - name: price
                - name: created_at
                - name: model
                - name: make
                - name: _airbyte_raw_id
                - name: _airbyte_extracted_at
                - name: _airbyte_meta

          - name: purchases
            description: 'Simulated purchase data from the Faker connector.'
            columns:
                - name: id
                  description: 'Unique identifier for the purchase.'
                - name: updated_at
                - name: purchased_at
                - name: user_id
                - name: returned_at
                - name: product_id
                - name: created_at
                - name: added_to_cart_at
                - name: _airbyte_raw_id
                - name: _airbyte_extracted_at
                - name: _airbyte_meta


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/models/ecommerce/staging/schema.yml
================================================
version: 2

models:
    - name: stg_users
      columns:
          - name: id
            tests:
                - unique
          - name: gender
            tests:
                - not_null
          - name: academic_degree
            tests:
                - not_null
          - name: title
            tests:
                - not_null
          - name: nationality
            tests:
                - not_null
          - name: age
            tests:
                - not_null
          - name: name
            tests:
                - not_null
          - name: email
            tests:
                - not_null
          - name: created_at
            tests:
                - not_null
          - name: updated_at
            tests:
                - not_null
          - name: _airbyte_extracted_at
            tests:
                - not_null

    - name: stg_purchases
      columns:
          - name: id
            tests:
                - unique
          - name: user_id
            tests:
                - not_null
          - name: product_id
            tests:
                - not_null
          - name: updated_at
            tests:
                - not_null
          - name: purchased_at
          - name: returned_at
          - name: created_at
            tests:
                - not_null
          - name: added_to_cart_at
          - name: _airbyte_extracted_at
            tests:
                - not_null

    - name: stg_products
      columns:
          - name: id
            tests:
                - unique
          - name: year
            tests:
                - not_null
          - name: price
            tests:
                - not_null
          - name: model
            tests:
                - not_null
          - name: make
            tests:
                - not_null
          - name: created_at
            tests:
                - not_null
          - name: updated_at
            tests:
                - not_null
          - name: _airbyte_extracted_at
            tests:
                - not_null


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/models/ecommerce/staging/stg_products.sql
================================================
select
    id,
    year,
    price,
    model,
    make,
    created_at,
    updated_at,
    _airbyte_extracted_at
from {{ source('faker', 'products') }}


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/models/ecommerce/staging/stg_purchases.sql
================================================
select
    id,
    user_id,
    product_id,
    updated_at,
    purchased_at,
    returned_at,
    created_at,
    added_to_cart_at,
    _airbyte_extracted_at
from {{ source('faker', 'purchases') }}


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/models/ecommerce/staging/stg_users.sql
================================================
select
    id,
    gender,
    academic_degree,
    title,
    nationality,
    age,
    name,
    email,
    created_at,
    updated_at,
    _airbyte_extracted_at,
from {{ source('faker', 'users') }}


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/profiles.yml
================================================
dbt_project:
  outputs:
    dev:
      dataset: transformed_data
      job_execution_timeout_seconds: 300
      job_retries: 1
      keyfile: /opt/airflow/service_accounts/your_keyfile_path.json # Update this field with your file name, example: /opt/airflow/service_accounts/airflow-***116-83db69931a10.json
      location: your_dataset_location # Update this field with your dataset location, example: US
      method: service-account
      priority: interactive
      project: your_project_id # Update this field with your BigQuery project ID
      threads: 1
      type: bigquery
  target: dev

  

================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/seeds/.gitkeep
================================================


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/seeds/raw_customers.csv
================================================
id,first_name,last_name
1,Michael,P.
2,Shawn,M.
3,Kathleen,P.
4,Jimmy,C.
5,Katherine,R.
6,Sarah,R.
7,Martin,M.
8,Frank,R.
9,Jennifer,F.
10,Henry,W.
11,Fred,S.
12,Amy,D.
13,Kathleen,M.
14,Steve,F.
15,Teresa,H.
16,Amanda,H.
17,Kimberly,R.
18,Johnny,K.
19,Virginia,F.
20,Anna,A.
21,Willie,H.
22,Sean,H.
23,Mildred,A.
24,David,G.
25,Victor,H.
26,Aaron,R.
27,Benjamin,B.
28,Lisa,W.
29,Benjamin,K.
30,Christina,W.
31,Jane,G.
32,Thomas,O.
33,Katherine,M.
34,Jennifer,S.
35,Sara,T.
36,Harold,O.
37,Shirley,J.
38,Dennis,J.
39,Louise,W.
40,Maria,A.
41,Gloria,C.
42,Diana,S.
43,Kelly,N.
44,Jane,R.
45,Scott,B.
46,Norma,C.
47,Marie,P.
48,Lillian,C.
49,Judy,N.
50,Billy,L.
51,Howard,R.
52,Laura,F.
53,Anne,B.
54,Rose,M.
55,Nicholas,R.
56,Joshua,K.
57,Paul,W.
58,Kathryn,K.
59,Adam,A.
60,Norma,W.
61,Timothy,R.
62,Elizabeth,P.
63,Edward,G.
64,David,C.
65,Brenda,W.
66,Adam,W.
67,Michael,H.
68,Jesse,E.
69,Janet,P.
70,Helen,F.
71,Gerald,C.
72,Kathryn,O.
73,Alan,B.
74,Harry,A.
75,Andrea,H.
76,Barbara,W.
77,Anne,W.
78,Harry,H.
79,Jack,R.
80,Phillip,H.
81,Shirley,H.
82,Arthur,D.
83,Virginia,R.
84,Christina,R.
85,Theresa,M.
86,Jason,C.
87,Phillip,B.
88,Adam,T.
89,Margaret,J.
90,Paul,P.
91,Todd,W.
92,Willie,O.
93,Frances,R.
94,Gregory,H.
95,Lisa,P.
96,Jacqueline,A.
97,Shirley,D.
98,Nicole,M.
99,Mary,G.
100,Jean,M.


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/seeds/raw_orders.csv
================================================
id,user_id,order_date,status
1,1,2018-01-01,returned
2,3,2018-01-02,completed
3,94,2018-01-04,completed
4,50,2018-01-05,completed
5,64,2018-01-05,completed
6,54,2018-01-07,completed
7,88,2018-01-09,completed
8,2,2018-01-11,returned
9,53,2018-01-12,completed
10,7,2018-01-14,completed
11,99,2018-01-14,completed
12,59,2018-01-15,completed
13,84,2018-01-17,completed
14,40,2018-01-17,returned
15,25,2018-01-17,completed
16,39,2018-01-18,completed
17,71,2018-01-18,completed
18,64,2018-01-20,returned
19,54,2018-01-22,completed
20,20,2018-01-23,completed
21,71,2018-01-23,completed
22,86,2018-01-24,completed
23,22,2018-01-26,return_pending
24,3,2018-01-27,completed
25,51,2018-01-28,completed
26,32,2018-01-28,completed
27,94,2018-01-29,completed
28,8,2018-01-29,completed
29,57,2018-01-31,completed
30,69,2018-02-02,completed
31,16,2018-02-02,completed
32,28,2018-02-04,completed
33,42,2018-02-04,completed
34,38,2018-02-06,completed
35,80,2018-02-08,completed
36,85,2018-02-10,completed
37,1,2018-02-10,completed
38,51,2018-02-10,completed
39,26,2018-02-11,completed
40,33,2018-02-13,completed
41,99,2018-02-14,completed
42,92,2018-02-16,completed
43,31,2018-02-17,completed
44,66,2018-02-17,completed
45,22,2018-02-17,completed
46,6,2018-02-19,completed
47,50,2018-02-20,completed
48,27,2018-02-21,completed
49,35,2018-02-21,completed
50,51,2018-02-23,completed
51,71,2018-02-24,completed
52,54,2018-02-25,return_pending
53,34,2018-02-26,completed
54,54,2018-02-26,completed
55,18,2018-02-27,completed
56,79,2018-02-28,completed
57,93,2018-03-01,completed
58,22,2018-03-01,completed
59,30,2018-03-02,completed
60,12,2018-03-03,completed
61,63,2018-03-03,completed
62,57,2018-03-05,completed
63,70,2018-03-06,completed
64,13,2018-03-07,completed
65,26,2018-03-08,completed
66,36,2018-03-10,completed
67,79,2018-03-11,completed
68,53,2018-03-11,completed
69,3,2018-03-11,completed
70,8,2018-03-12,completed
71,42,2018-03-12,shipped
72,30,2018-03-14,shipped
73,19,2018-03-16,completed
74,9,2018-03-17,shipped
75,69,2018-03-18,completed
76,25,2018-03-20,completed
77,35,2018-03-21,shipped
78,90,2018-03-23,shipped
79,52,2018-03-23,shipped
80,11,2018-03-23,shipped
81,76,2018-03-23,shipped
82,46,2018-03-24,shipped
83,54,2018-03-24,shipped
84,70,2018-03-26,placed
85,47,2018-03-26,shipped
86,68,2018-03-26,placed
87,46,2018-03-27,placed
88,91,2018-03-27,shipped
89,21,2018-03-28,placed
90,66,2018-03-30,shipped
91,47,2018-03-31,placed
92,84,2018-04-02,placed
93,66,2018-04-03,placed
94,63,2018-04-03,placed
95,27,2018-04-04,placed
96,90,2018-04-06,placed
97,89,2018-04-07,placed
98,41,2018-04-07,placed
99,85,2018-04-09,placed


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/seeds/raw_payments.csv
================================================
id,order_id,payment_method,amount
1,1,credit_card,1000
2,2,credit_card,2000
3,3,coupon,100
4,4,coupon,2500
5,5,bank_transfer,1700
6,6,credit_card,600
7,7,credit_card,1600
8,8,credit_card,2300
9,9,gift_card,2300
10,9,bank_transfer,0
11,10,bank_transfer,2600
12,11,credit_card,2700
13,12,credit_card,100
14,13,credit_card,500
15,13,bank_transfer,1400
16,14,bank_transfer,300
17,15,coupon,2200
18,16,credit_card,1000
19,17,bank_transfer,200
20,18,credit_card,500
21,18,credit_card,800
22,19,gift_card,600
23,20,bank_transfer,1500
24,21,credit_card,1200
25,22,bank_transfer,800
26,23,gift_card,2300
27,24,coupon,2600
28,25,bank_transfer,2000
29,25,credit_card,2200
30,25,coupon,1600
31,26,credit_card,3000
32,27,credit_card,2300
33,28,bank_transfer,1900
34,29,bank_transfer,1200
35,30,credit_card,1300
36,31,credit_card,1200
37,32,credit_card,300
38,33,credit_card,2200
39,34,bank_transfer,1500
40,35,credit_card,2900
41,36,bank_transfer,900
42,37,credit_card,2300
43,38,credit_card,1500
44,39,bank_transfer,800
45,40,credit_card,1400
46,41,credit_card,1700
47,42,coupon,1700
48,43,gift_card,1800
49,44,gift_card,1100
50,45,bank_transfer,500
51,46,bank_transfer,800
52,47,credit_card,2200
53,48,bank_transfer,300
54,49,credit_card,600
55,49,credit_card,900
56,50,credit_card,2600
57,51,credit_card,2900
58,51,credit_card,100
59,52,bank_transfer,1500
60,53,credit_card,300
61,54,credit_card,1800
62,54,bank_transfer,1100
63,55,credit_card,2900
64,56,credit_card,400
65,57,bank_transfer,200
66,58,coupon,1800
67,58,gift_card,600
68,59,gift_card,2800
69,60,credit_card,400
70,61,bank_transfer,1600
71,62,gift_card,1400
72,63,credit_card,2900
73,64,bank_transfer,2600
74,65,credit_card,0
75,66,credit_card,2800
76,67,bank_transfer,400
77,67,credit_card,1900
78,68,credit_card,1600
79,69,credit_card,1900
80,70,credit_card,2600
81,71,credit_card,500
82,72,credit_card,2900
83,73,bank_transfer,300
84,74,credit_card,3000
85,75,credit_card,1900
86,76,coupon,200
87,77,credit_card,0
88,77,bank_transfer,1900
89,78,bank_transfer,2600
90,79,credit_card,1800
91,79,credit_card,900
92,80,gift_card,300
93,81,coupon,200
94,82,credit_card,800
95,83,credit_card,100
96,84,bank_transfer,2500
97,85,bank_transfer,1700
98,86,coupon,2300
99,87,gift_card,3000
100,87,credit_card,2600
101,88,credit_card,2900
102,89,bank_transfer,2200
103,90,bank_transfer,200
104,91,credit_card,1900
105,92,bank_transfer,1500
106,92,coupon,200
107,93,gift_card,2600
108,94,coupon,700
109,95,coupon,2400
110,96,gift_card,1700
111,97,bank_transfer,1400
112,98,bank_transfer,1000
113,99,credit_card,2400


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/snapshots/.gitkeep
================================================


================================================
FILE: airbyte_dbt_airflow_bigquery/dbt_project/tests/.gitkeep
================================================


================================================
FILE: airbyte_dbt_airflow_bigquery/infra/.gitignore
================================================
# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# password, private keys, and other secrets. These should not be part of version 
# control as they are data points which are potentially sensitive and subject 
# to change depending on the environment.
*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc

================================================
FILE: airbyte_dbt_airflow_bigquery/infra/README.md
================================================
# Airbyte setup with terraform

This folder contains the terraform code to setup a source, destination and connection in Airbyte using terraform.

We're using the [airbyte official provider](https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs), and any details can be found in the documentation.

For this example we're using:
- [Airbyte Source Faker](https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs/resources/source_faker)
- [Airbyte Destination BigQuery](https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs/resources/destination_bigquery)
- [Airbyte Connection](https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs/resources/connection)

This is all optional, since part of the advantage of using Airbyte is setting up the sources and destinations via the UI. However, if you want to automate this process, you can use this terraform code as a starting point.

================================================
FILE: airbyte_dbt_airflow_bigquery/infra/airbyte/main.tf
================================================
// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs

// Sources
resource "airbyte_source_faker" "faker" {
  configuration = {
    always_updated    = false
    count             = 1000
    parallelism       = 9
    records_per_slice = 10
    seed              = 6
    source_type       = "faker"
  }
  name         = "Faker"
  workspace_id = var.workspace_id
}

// Destinations
resource "airbyte_destination_bigquery" "bigquery" {
  configuration = {
    dataset_id       = var.dataset_id
    dataset_location = "US"
    destination_type = "bigquery"
    project_id       = var.project_id
    credentials_json = file(var.credentials_json_path)
    loading_method = {
      destination_bigquery_loading_method_standard_inserts = {
        method = "Standard"
      }
    }
  }
  name         = "BigQuery"
  workspace_id = var.workspace_id
}

// Connections
resource "airbyte_connection" "faker_to_bigquery" {
  name           = "Faker to BigQuery"
  source_id      = airbyte_source_faker.faker.source_id
  destination_id = airbyte_destination_bigquery.bigquery.destination_id
  configurations = {
    streams = [
      {
        name = "users"
      },
      {
        name = "products"
      },
      {
        name = "purchases"
      },
    ]
  }
}


================================================
FILE: airbyte_dbt_airflow_bigquery/infra/airbyte/provider.tf
================================================
// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs

terraform {
  required_providers {
    airbyte = {
      source = "airbytehq/airbyte"
      version = "0.3.3"
    }
  }
}

provider "airbyte" {
  // If running locally (Airbyte OSS) with docker-compose using the airbyte-proxy, 
  // include the actual password/username you've set up (or use the defaults below)
  username = "airbyte"
  password = "password"
  
  // if running locally (Airbyte OSS), include the server url to the airbyte-api-server
  server_url = "http://localhost:8006/v1"
}

================================================
FILE: airbyte_dbt_airflow_bigquery/infra/airbyte/terraform.tfvars
================================================
workspace_id=""
dataset_id="sample_ecommerce"
project_id=""
credentials_json_path = ""

================================================
FILE: airbyte_dbt_airflow_bigquery/infra/airbyte/variables.tf
================================================
variable "workspace_id" {
  type = string
}

variable "dataset_id" {
  type = string
}

variable "project_id" {
  type = string
}

variable "credentials_json_path" {
  type = string
}




================================================
FILE: airbyte_dbt_airflow_bigquery/orchestration/.gitignore
================================================
logs
__pycache__

================================================
FILE: airbyte_dbt_airflow_bigquery/orchestration/Dockerfile
================================================
FROM apache/airflow:2.7.2-python3.11
COPY requirements.txt /
RUN pip install --no-cache-dir -r /requirements.txt

================================================
FILE: airbyte_dbt_airflow_bigquery/orchestration/README.md
================================================
# Airflow setup with Airbyte and DBT

This folder contains the code to setup Airflow with Airbyte and DBT.

## Setup

We're using the [Running Airflow in Docker](https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html) as a starting point.

We've downloaded the official `docker-compose.yaml` file provided by Airflow and adapted it to:
- Use some configurations from an .env file
- Add the Airbyte operator, dbt and astronomer-cosmos packages
- Mount our dbt project folder into the container image
- For running locally, we've set up the network to use the one deployed by the Airbyte container setup (from [Airbyte Local Deployment](https://docs.airbyte.com/deploying-airbyte/local-deployment))
- Admitting you're

## Features

- Providing dbt docs as a plugin from Airflow, and making it available in the UI (behind authentication)
- Example dag with the airbyte operator
- Example dag rendering dbt docs
- Example dag orchestrating specific dbt-models inside a dag with multiple tasks
- Example dag orchestrating specific dbt models as a dag

We're also using dataset aware schedules, and the airflow decorator to write the dag code.

================================================
FILE: airbyte_dbt_airflow_bigquery/orchestration/airflow/config/dbt_config.py
================================================
from cosmos.config import ProjectConfig, ProfileConfig
from cosmos.profiles import GoogleCloudServiceAccountDictProfileMapping, GoogleCloudServiceAccountFileProfileMapping


project_config = ProjectConfig(
    dbt_project_path="/opt/airflow/dbt_project",
)

google_config = GoogleCloudServiceAccountFileProfileMapping(
    conn_id="dbt_file_connection",
    profile_args={
        "dataset": "transformed_data",
        "location": "US", # Update if you're using a different location for your dataset
        "threads": 1,
        "retries": 1,
        "priority": "interactive",
    }
)

profile_config = ProfileConfig(
    profile_name="dbt_project",
    target_name="dev",
    profile_mapping=google_config
)

================================================
FILE: airbyte_dbt_airflow_bigquery/orchestration/airflow/dags/elt_dag.py
================================================
from pendulum import datetime
from airflow.decorators import dag
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

from cosmos import DbtDag
from cosmos.operators import DbtDocsOperator
from cosmos.config import RenderConfig

from dbt_config import project_config, profile_config  # type:ignore
from dbt_upload_docs import upload_docs  # type:ignore

# Define the ELT DAG
@dag(
    dag_id="elt_dag",
    start_date=datetime(2023, 10, 1),
    schedule="@daily",
    tags=["airbyte", "dbt", "bigquery", "ecommerce"],
    catchup=False,
)
def extract_and_transform():
    """
    Runs the connection "Faker to BigQuery" on Airbyte and then triggers the dbt DAG.
    """
    # Airbyte sync task
    extract_data = AirbyteTriggerSyncOperator(
        task_id="trigger_airbyte_faker_to_bigquery",
        airbyte_conn_id="airbyte_connection",
        connection_id="your_connection_id", # Update with your Airbyte connection ID
        asynchronous=False,
        timeout=3600,
        wait_seconds=3
    )

    # Trigger for dbt DAG
    trigger_dbt_dag = TriggerDagRunOperator(
        task_id="trigger_dbt_dag",
        trigger_dag_id="dbt_ecommerce",
        wait_for_completion=True,
        poke_interval=30,
    )

    render_dbt_docs = DbtDocsOperator(
        task_id="render_dbt_docs",
        profile_config=profile_config,
        project_dir="/opt/airflow/dbt_project",
        callback=upload_docs,
    )

    # Set the order of tasks
    extract_data >> trigger_dbt_dag >> render_dbt_docs

# Instantiate the ELT DAG
extract_and_transform_dag = extract_and_transform()

# Define the dbt DAG using DbtDag from the cosmos library
dbt_cosmos_dag = DbtDag(
    dag_id="dbt_ecommerce",
    start_date=datetime(2023, 10, 1),
    tags=["dbt", "ecommerce"],
    catchup=False,
    project_config=project_config,
    profile_config=profile_config,
    render_config=RenderConfig(select=["path:models/ecommerce"]),
)

# Instantiate the dbt DAG
dbt_cosmos_dag


================================================
FILE: airbyte_dbt_airflow_bigquery/orchestration/airflow/plugins/custom_docs_plugin.py
================================================
"""Plugins example"""
from __future__ import annotations

from flask import Blueprint
from flask_appbuilder import BaseView, expose

from airflow.plugins_manager import AirflowPlugin
from airflow.security import permissions
from airflow.www.auth import has_access

bp = Blueprint(
    "Docs Plugin",
    __name__,
    template_folder="templates",
    static_folder="static",
    static_url_path="/dbtdocspluginview",
)

class DbtDocsPluginView(BaseView):
    """Creating a Flask-AppBuilder View"""
    default_view = "index"
    @expose("/")
    @has_access(
        [
            (permissions.ACTION_CAN_READ, permissions.RESOURCE_WEBSITE),
        ]
    )
    def index(self):
        """Create default view"""
        return self.render_template("dbt/index.html", name="DBT")

# Creating a flask blueprint

class CustomDocsPlugin(AirflowPlugin):
    """Defining the plugin class"""

    name = "Docs Plugin"
    flask_blueprints = [bp]
    appbuilder_views = [{
        "name": "dbt",
        "category": "Custom Docs",
        "view": DbtDocsPluginView()
    }]

================================================
FILE: airbyte_dbt_airflow_bigquery/orchestration/airflow/plugins/dbt_upload_docs.py
================================================
import shutil, os, re
def fix_file():
    """Extract the inline <script> block from the generated dbt docs page into
    plugins/static/script.js, rewrite the HTML to load it from there, and save
    the result as the plugin's index.html template."""
    with open('/opt/airflow/plugins/templates/dbt/dbt_index.html') as f:
        html_contents = f.read()
    
    # Define a regular expression to match the script tag
    script_regex = r'<script type="text/javascript">(.*?)</script>'

    # Find the script tag that you want to extract
    script_match = re.search(script_regex, html_contents, re.DOTALL)

    # Get the contents of the script tag
    script_contents = script_match.group(1)

    # Write the script contents to a separate JavaScript file
    if not os.path.exists('/opt/airflow/plugins/static'):
      os.makedirs('/opt/airflow/plugins/static')
    with open('/opt/airflow/plugins/static/script.js', 'w') as f:
        f.write(script_contents)

    # Remove the script tag from the HTML contents
    html_contents = html_contents.replace(script_contents,"")

    # Add a new script tag to the head section of the HTML contents
    new_script_tag = f'<script type="text/javascript" src="./script.js"></script>'
    head_regex = r'<head>(.*?)</head>'
    head_match = re.search(head_regex, html_contents, re.DOTALL)
    head_contents = head_match.group(1)
    head_contents += new_script_tag
    html_contents = re.sub(head_regex, '<head>' + head_contents + '</head>', html_contents, flags=re.DOTALL)

    # Write the modified HTML contents to a new file
    with open('/opt/airflow/plugins/templates/dbt/index.html', 'w') as f:
        f.write(html_contents)

def upload_docs(project_dir):
    # upload docs to a storage of your choice
    # you only need to upload the following files:
    # - f"{project_dir}/target/index.html"
    # - f"{project_dir}/target/manifest.json"
    # - f"{project_dir}/target/graph.gpickle"
    # - f"{project_dir}/target/catalog.json"

    shutil.move(f"{project_dir}/target/index.html", "/opt/airflow/plugins/templates/dbt/dbt_index.html")
    shutil.move(f"{project_dir}/target/manifest.json", "/opt/airflow/plugins/static/manifest.json")
    shutil.move(f"{project_dir}/target/graph.gpickle", "/opt/airflow/plugins/static/graph.gpickle")
    shutil.move(f"{project_dir}/target/catalog.json", "/opt/airflow/plugins/static/catalog.json")
    fix_file()
    pass

================================================
FILE: airbyte_dbt_airflow_bigquery/orchestration/airflow/plugins/static/.gitkeep
================================================


================================================
FILE: airbyte_dbt_airflow_bigquery/orchestration/airflow/plugins/templates/dbt/.gitkeep
================================================


================================================
FILE: airbyte_dbt_airflow_bigquery/orchestration/docker-compose.yaml
================================================
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

# Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL.
#
# WARNING: This configuration is for local development. Do not use it in a production deployment.
#
# This configuration supports basic configuration using environment variables or an .env file
# The following variables are supported:
#
# AIRFLOW_IMAGE_NAME           - Docker image name used to run Airflow.
#                                Default: apache/airflow:2.7.2
# AIRFLOW_UID                  - User ID in Airflow containers
#                                Default: 50000
# AIRFLOW_PROJ_DIR             - Base path to which all the files will be volumed.
#                                Default: .
# Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
#
# _AIRFLOW_WWW_USER_USERNAME   - Username for the administrator account (if requested).
#                                Default: airflow
# _AIRFLOW_WWW_USER_PASSWORD   - Password for the administrator account (if requested).
#                                Default: airflow
# _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
#                                Use this option ONLY for quick checks. Installing requirements at container
#                                startup is done EVERY TIME the service is started.
#                                A better way is to build a custom image or extend the official image
#                                as described in https://airflow.apache.org/docs/docker-stack/build.html.
#                                Default: ''
#
# Feel free to modify this file to suit your needs.
---
version: '3.8'
x-airflow-common: &airflow-common
    # In order to add custom dependencies or upgrade provider packages you can use your extended image.
    # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
    # and uncomment the "build" line below, Then run `docker-compose build` to build the images.
    # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.7.2}
    build: .
    env_file:
        - ./.env
    environment: &airflow-common-env
        AIRFLOW__CORE__EXECUTOR: CeleryExecutor
        AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${POSTGRES_USER:-airflow}:${POSTGRES_PASSWORD:-airflow}@postgres/airflow
        # For backward compatibility, with Airflow <2.3
        AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${POSTGRES_USER:-airflow}:${POSTGRES_PASSWORD:-airflow}@postgres/airflow
        AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://${POSTGRES_USER:-airflow}:${POSTGRES_PASSWORD:-airflow}@postgres/airflow
        AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
        AIRFLOW__CORE__FERNET_KEY: ''
        AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
        AIRFLOW__CORE__LOAD_EXAMPLES: ${LOAD_EXAMPLES:-true}
        AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session'
        AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true'
        AIRFLOW__CORE__LAZY_LOAD_PLUGINS: 'false'
        # WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for a quick checks
        # for other purpose (development, test and especially production usage) build/extend Airflow image.
        _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
    volumes:
        - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
        - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
        - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
        - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
        - ${DBT_PROJ_DIR:-.}:/opt/airflow/dbt_project
        - ${GCP_SERVICE_ACCOUNT_PATH:-.}:/opt/airflow/service_accounts
    user: '${AIRFLOW_UID:-50000}:0'
    depends_on: &airflow-common-depends-on
        redis:
            condition: service_healthy
        postgres:
            condition: service_healthy
    networks:
        - airbyte_airbyte_public
services:
    postgres:
        image: postgres:15
        environment:
            POSTGRES_USER: ${POSTGRES_USER:-airflow}
            POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-airflow}
            POSTGRES_DB: airflow
        volumes:
            - postgres-db-volume:/var/lib/postgresql/data
        healthcheck:
            test: ['CMD', 'pg_isready', '-U', 'airflow']
            interval: 10s
            retries: 5
            start_period: 5s
        restart: always
        networks:
            - airbyte_airbyte_public

    redis:
        image: redis:latest
        expose:
            - 6379
        healthcheck:
            test: ['CMD', 'redis-cli', 'ping']
            interval: 10s
            timeout: 30s
            retries: 50
            start_period: 30s
        restart: always
        networks:
            - airbyte_airbyte_public

    airflow-webserver:
        <<: *airflow-common
        command: webserver
        ports:
            - '8080:8080'
        healthcheck:
            test: ['CMD', 'curl', '--fail', 'http://localhost:8080/health']
            interval: 30s
            timeout: 10s
            retries: 5
            start_period: 30s
        restart: always
        depends_on:
            <<: *airflow-common-depends-on
            airflow-init:
                condition: service_completed_successfully

    airflow-scheduler:
        <<: *airflow-common
        command: scheduler
        healthcheck:
            test: ['CMD', 'curl', '--fail', 'http://localhost:8974/health']
            interval: 30s
            timeout: 10s
            retries: 5
            start_period: 30s
        restart: always
        depends_on:
            <<: *airflow-common-depends-on
            airflow-init:
                condition: service_completed_successfully

    airflow-worker:
        <<: *airflow-common
        command: celery worker
        healthcheck:
            # yamllint disable rule:line-length
            test:
                - 'CMD-SHELL'
                - 'celery --app airflow.providers.celery.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}" || celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
            interval: 30s
            timeout: 10s
            retries: 5
            start_period: 30s
        environment:
            <<: *airflow-common-env
            # Required to handle warm shutdown of the celery workers properly
            # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
            DUMB_INIT_SETSID: '0'
        restart: always
        depends_on:
            <<: *airflow-common-depends-on
            airflow-init:
                condition: service_completed_successfully

    airflow-triggerer:
        <<: *airflow-common
        command: triggerer
        healthcheck:
            test:
                [
                    'CMD-SHELL',
                    'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"',
                ]
            interval: 30s
            timeout: 10s
            retries: 5
            start_period: 30s
        restart: always
        depends_on:
            <<: *airflow-common-depends-on
            airflow-init:
                condition: service_completed_successfully

    airflow-init:
        <<: *airflow-common
        entrypoint: /bin/bash
        # yamllint disable rule:line-length
        command:
            - -c
            - |
                function ver() {
                  printf "%04d%04d%04d%04d" $${1//./ }
                }
                airflow_version=$$(AIRFLOW__LOGGING__LOGGING_LEVEL=INFO && gosu airflow airflow version)
                airflow_version_comparable=$$(ver $${airflow_version})
                min_airflow_version=2.2.0
                min_airflow_version_comparable=$$(ver $${min_airflow_version})
                if (( airflow_version_comparable < min_airflow_version_comparable )); then
                  echo
                  echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m"
                  echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!"
                  echo
                  exit 1
                fi
                if [[ -z "${AIRFLOW_UID}" ]]; then
                  echo
                  echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
                  echo "If you are on Linux, you SHOULD follow the instructions below to set "
                  echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
                  echo "For other operating systems you can get rid of the warning with manually created .env file:"
                  echo "    See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user"
                  echo
                fi
                one_meg=1048576
                mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
                cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
                disk_available=$$(df / | tail -1 | awk '{print $$4}')
                warning_resources="false"
                if (( mem_available < 4000 )) ; then
                  echo
                  echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
                  echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
                  echo
                  warning_resources="true"
                fi
                if (( cpus_available < 2 )); then
                  echo
                  echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
                  echo "At least 2 CPUs recommended. You have $${cpus_available}"
                  echo
                  warning_resources="true"
                fi
                if (( disk_available < one_meg * 10 )); then
                  echo
                  echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
                  echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
                  echo
                  warning_resources="true"
                fi
                if [[ $${warning_resources} == "true" ]]; then
                  echo
                  echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
                  echo "Please follow the instructions to increase amount of resources available:"
                  echo "   https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin"
                  echo
                fi
                mkdir -p /sources/logs /sources/dags /sources/plugins
                chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
                exec /entrypoint airflow version
        # yamllint enable rule:line-length
        environment:
            <<: *airflow-common-env
            _AIRFLOW_DB_MIGRATE: 'true'
            _AIRFLOW_WWW_USER_CREATE: 'true'
            _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
            _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
            _PIP_ADDITIONAL_REQUIREMENTS: ''
        user: '0:0'
        volumes:
            - ${AIRFLOW_PROJ_DIR:-.}:/sources

    airflow-cli:
        <<: *airflow-common
        profiles:
            - debug
        environment:
            <<: *airflow-common-env
            CONNECTION_CHECK_MAX_COUNT: '0'
        # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
        command:
            - bash
            - -c
            - airflow

    # You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up
    # or by explicitly targeted on the command line e.g. docker-compose up flower.
    # See: https://docs.docker.com/compose/profiles/
    flower:
        <<: *airflow-common
        command: celery flower
        profiles:
            - flower
        ports:
            - '5555:5555'
        healthcheck:
            test: ['CMD', 'curl', '--fail', 'http://localhost:5555/']
            interval: 30s
            timeout: 10s
            retries: 5
            start_period: 30s
        restart: always
        depends_on:
            <<: *airflow-common-depends-on
            airflow-init:
                condition: service_completed_successfully
        networks:
            - airbyte_airbyte_public

volumes:
    postgres-db-volume:
networks:
    airbyte_airbyte_public:
        external: true


================================================
FILE: airbyte_dbt_airflow_bigquery/orchestration/requirements.txt
================================================
dbt-core~=1.6.0
astronomer-cosmos~=1.1.0
astronomer-cosmos[dbt-bigquery]~=1.1.0
apache-airflow-providers-google~=10.9.0
apache-airflow-providers-airbyte~=3.3.2

================================================
FILE: airbyte_dbt_airflow_bigquery/setup.py
================================================
from setuptools import find_packages, setup

setup(
    name="airbyte-dbt-airflow-bigquery",
    packages=find_packages(),
    install_requires=[
        "dbt-bigquery",
        "astronomer-cosmos[dbt-bigquery]",
        "apache-airflow-providers-google",
        "apache-airflow-providers-airbyte",
        "apache-airflow",
    ],
    extras_require={"dev": ["pytest"]},
)




================================================
FILE: airbyte_dbt_airflow_snowflake/.gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

#Desktop Services Store
.DS_Store

================================================
FILE: airbyte_dbt_airflow_snowflake/README.md
================================================
# Airbyte-dbt-Airflow-Snowflake Integration

Welcome to the "Airbyte-dbt-Airflow-Snowflake Integration" repository! This repo provides a quickstart template for building a full data stack using Airbyte, Airflow, dbt, and Snowflake. Easily extract data from Postgres and load it into Snowflake using Airbyte, and apply necessary transformations using dbt, all orchestrated seamlessly with Airflow. While this template doesn't delve into specific data or transformations, its goal is to showcase the synergy of these tools.

This quickstart is designed to minimize setup hassles and propel you forward.

## Table of Contents

- [Airbyte-dbt-Airflow-Snowflake Integration](#airbyte-dbt-airflow-snowflake-integration)
  - [Table of Contents](#table-of-contents)
  - [Infrastructure Layout](#infrastructure-layout)
  - [Airflow Pipeline DAG](#airflow-pipeline-dag)
  - [Prerequisites](#prerequisites)
  - [1. Setting an environment for your project](#1-setting-an-environment-for-your-project)
  - [2. Setting Up Airbyte Connectors with Terraform](#2-setting-up-airbyte-connectors-with-terraform)
  - [3. Setting Up the dbt Project](#3-setting-up-the-dbt-project)
  - [4. Orchestrating with Airflow](#4-orchestrating-with-airflow)
  - [Next Steps](#next-steps)

## Infrastructure Layout
![infrastructure layout](images/ada_snowflake_stack.png)

## Airflow Pipeline DAG
![pipeline dag](images/dag.png)

## Prerequisites

Before you embark on this integration, ensure you have the following set up and ready:

1. **Python 3.10 or later**: If not installed, download and install it from [Python's official website](https://www.python.org/downloads/).

2. **Docker and Docker Compose (Docker Desktop)**: Install [Docker](https://docs.docker.com/get-docker/) following the official documentation for your specific OS.

3. **Airbyte OSS version**: Deploy the open-source version of Airbyte. Follow the installation instructions from the [Airbyte Documentation](https://docs.airbyte.com/quickstart/deploy-airbyte/).

4. **Terraform**: Terraform will help you provision and manage the Airbyte resources. If you haven't installed it, follow the [official Terraform installation guide](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli).


## 1. Setting an environment for your project

Get the project up and running on your local machine by following these steps:

1. **Clone the repository (Clone only this quickstart)**:  
   ```bash
   git clone --filter=blob:none --sparse  https://github.com/airbytehq/quickstarts.git
   ```

   ```bash
   cd quickstarts
   ```

   ```bash
   git sparse-checkout add airbyte_dbt_airflow_snowflake
   ```

   
2. **Navigate to the directory**:  
   ```bash
   cd airbyte_dbt_airflow_snowflake
   ```

3. **Set Up a Virtual Environment**:  
   - For Mac:
     ```bash
     python3 -m venv venv
     source venv/bin/activate
     ```
   - For Windows:
     ```bash
     python -m venv venv
     .\venv\Scripts\activate
     ```

4. **Install Dependencies**:  
   ```bash
   pip install -e ".[dev]"
   ```

## 2. Setting Up Airbyte Connectors with Terraform

Airbyte allows you to create connectors for sources and destinations, facilitating data synchronization between various platforms. In this project, we're harnessing the power of Terraform to automate the creation of these connectors and the connections between them. Here's how you can set this up:

1. **Navigate to the Airbyte Configuration Directory**:
   
   Change to the relevant directory containing the Terraform configuration for Airbyte:
   ```bash
   cd infra/airbyte
   ```

2. **Modify Configuration Files**:

   Within the `infra/airbyte` directory, you'll find three crucial Terraform files:
    - `provider.tf`: Defines the Airbyte provider.
    - `main.tf`: Contains the main configuration for creating Airbyte resources.
    - `variables.tf`: Holds various variables, including credentials.

   Adjust the configurations in these files to suit your project's needs. Specifically, provide credentials for your Postgres and Snowflake connections. You can utilize the `variables.tf` file to manage these credentials.
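
   For instance, the `variables.tf` in this quickstart declares a `workspace_id` variable (you can declare additional variables for credentials in the same way). Rather than hardcoding values, Terraform can pick them up from `TF_VAR_`-prefixed environment variables, for example:

   ```bash
   # Placeholder value -- your workspace ID is visible in the Airbyte UI URL
   export TF_VAR_workspace_id="00000000-0000-0000-0000-000000000000"
   ```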

3. **Initialize Terraform**:
   
   This step prepares Terraform to create the resources defined in your configuration files.
   ```bash
   terraform init
   ```

4. **Review the Plan**:

   Before applying any changes, review the plan to understand what Terraform will do.
   ```bash
   terraform plan
   ```

5. **Apply Configuration**:

   After reviewing and confirming the plan, apply the Terraform configurations to create the necessary Airbyte resources.
   ```bash
   terraform apply
   ```

6. **Verify in Airbyte UI**:

   Once Terraform completes its tasks, navigate to the Airbyte UI. Here, you should see your source and destination connectors, as well as the connection between them, set up and ready to go.

## 3. Setting Up the dbt Project

[dbt (data build tool)](https://www.getdbt.com/) allows you to transform your data by writing, documenting, and executing SQL workflows. Setting up the dbt project requires specifying connection details for your data platform, in this case, Snowflake. Here’s a step-by-step guide to help you set this up:

1. **Navigate to the dbt Project Directory**:

   Change to the directory containing the dbt configuration:
   ```bash
   cd ../../dbt_project
   ```

2. **Update Connection Details**:

   You'll find a `profiles.yml` file within the directory. This file contains configurations for dbt to connect with your data platform. Update this file with your Snowflake connection details.

3. **Utilize Environment Variables (Optional but Recommended)**:

   To keep your credentials secure, you can leverage environment variables. An example is provided within the `profiles.yml` file.
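
   For example, the bundled `profiles.yml` reads the Snowflake account and password from environment variables, which you could export before running dbt (placeholder values shown):

   ```bash
   # Placeholder values -- replace with your Snowflake account identifier and password
   export DBT_SNOWFLAKE_ACCOUNT_ID="your_account_identifier"
   export DBT_SNOWFLAKE_PASSWORD="your_password"
   ```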

4. **Test the Connection**:

   Once you’ve updated the connection details, you can test the connection to your Snowflake instance using:
   ```bash
   dbt debug
   ```

   If everything is set up correctly, this command should report a successful connection to Snowflake.

## 4. Orchestrating with Airflow

[Airflow](https://airflow.apache.org/) is a modern data orchestrator designed to help you build, test, and monitor your data workflows. In this section, we'll walk you through setting up Airflow to oversee both the Airbyte and dbt workflows:

1. **Navigate to the Orchestration Directory**:

   Switch to the directory containing the Airflow orchestration configurations:
   ```bash
   cd ../orchestration
   ```

2. **Set Environment Variables**:

   The Airflow pipeline requires certain environment variables to run successfully. These are set through the `.env` file: populate `.env` with the contents of the provided `.env.example` file and modify the values to suit your use case.

   In particular, modify the `AIRFLOW_AIRBYTE_CONN` value, which is the connection URI that Airflow uses to connect to the Airbyte API. See [here](https://airflow.apache.org/docs/apache-airflow/2.0.2/howto/connection.html#connection-uri-format) for more details.

   Also modify the `AIRBYTE_CONN_ID` value, which is the ID of the connection you have set up in Airbyte.
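
   For instance, the DAG reads `AIRBYTE_CONN_ID` from the environment, so your `.env` needs a line like the following (placeholder UUID; the real value is visible in the Airbyte UI when you open the connection):

   ```bash
   # Placeholder -- replace with the UUID of your Airbyte connection
   AIRBYTE_CONN_ID=00000000-0000-0000-0000-000000000000
   ```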

3. **Build and Run Airflow Locally**:

   Build our Airflow image with the necessary packages and services
   ```bash
   docker compose build
   ```

   And then run it
   ```bash
   docker compose up
   ```

4. **Access Airflow in Your Browser**:

   Once the containers are up and running, you can access the Airflow UI at `http://127.0.0.1:8080`. The default username and password are both `airflow`, unless you changed them in the `.env` file.

   Here, you should see the DAG for the Extract, Load and Transform pipeline. To get an overview of the DAG, click on its name and select the Graph view. This gives you a clear picture of the pipeline's lineage and visualizes how the operation flows from extraction to transformation.

## Next Steps

Once you've set up and launched this initial integration, the real power lies in its adaptability and extensibility. Here’s a roadmap to help you customize this project and tailor it to your specific data needs:

1. **Create dbt Sources for Airbyte Data**:

   Your raw data extracted via Airbyte can be represented as sources in dbt. Start by [creating new dbt sources](https://docs.getdbt.com/docs/build/sources) to represent this data, allowing for structured transformations down the line.
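
   The starter `models/sources.yml` in `dbt_project` shows the minimal pattern. A hypothetical source for the tables Airbyte loads could look like this (source, schema, and table names are illustrative; adjust them to where Airbyte lands your data):

   ```bash
   # Hypothetical example: declare the raw tables loaded by Airbyte as a dbt source
   cat > models/airbyte_sources.yml <<'EOF'
   version: 2
   sources:
     - name: airbyte_raw
       schema: AIRBYTE_SCHEMA
       tables:
         - name: my_table_name_1
         - name: my_table_name_2
   EOF
   ```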

2. **Add Your dbt Transformations**:

   With your dbt sources in place, you can now build upon them. Add your custom SQL transformations in dbt, ensuring that you treat the sources as an upstream dependency. This ensures that your transformations work on the most up-to-date raw data.

3. **Execute the Pipeline in Airflow**:

   Navigate to the Airflow UI and trigger the DAG. This runs the entire pipeline, encompassing the extraction via Airbyte, the transformations via dbt, and any other subsequent steps. You can also modify the DAG's schedule to suit your use case.

4. **Extend the Project**:

   The real beauty of this integration is its extensibility. Whether you want to add more data sources, integrate additional tools, or enhance your transformation logic – the floor is yours. With the foundation set, the sky's the limit for how you want to extend and refine your data processes.

================================================
FILE: airbyte_dbt_airflow_snowflake/dbt_project/.gitignore
================================================

target/
dbt_packages/
logs/

#Desktop Services Store
.DS_Store

#User cookie
.user.yml

================================================
FILE: airbyte_dbt_airflow_snowflake/dbt_project/README.md
================================================
Welcome to your new dbt project!

### Using the starter project

Try running the following commands:
- dbt run
- dbt test


### Resources:
- Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
- Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers
- Join the [chat](https://community.getdbt.com/) on Slack for live discussions and support
- Find [dbt events](https://events.getdbt.com) near you
- Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices


================================================
FILE: airbyte_dbt_airflow_snowflake/dbt_project/analyses/.gitkeep
================================================


================================================
FILE: airbyte_dbt_airflow_snowflake/dbt_project/dbt_project.yml
================================================

# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'dbt_project'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'dbt_project'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  dbt_project:
    # Config indicated by + and applies to all files under models/example/
    example:
      +materialized: view


================================================
FILE: airbyte_dbt_airflow_snowflake/dbt_project/macros/.gitkeep
================================================


================================================
FILE: airbyte_dbt_airflow_snowflake/dbt_project/models/example/my_first_dbt_model.sql
================================================

/*
    Welcome to your first dbt model!
    Did you know that you can also configure models directly within SQL files?
    This will override configurations stated in dbt_project.yml

    Try changing "table" to "view" below
*/

{{ config(materialized='table') }}

with source_data as (

    select * from {{ source('snowflake', 'sample_table') }}

)

select *
from source_data

/*
    Uncomment the line below to remove records with null `id` values
*/

-- where id is not null


================================================
FILE: airbyte_dbt_airflow_snowflake/dbt_project/models/example/my_second_dbt_model.sql
================================================

-- Use the `ref` function to select from other models

select *
from {{ ref('my_first_dbt_model') }}
where id = 1


================================================
FILE: airbyte_dbt_airflow_snowflake/dbt_project/models/example/schema.yml
================================================

version: 2

models:
  - name: my_first_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null

  - name: my_second_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null


================================================
FILE: airbyte_dbt_airflow_snowflake/dbt_project/models/sources.yml
================================================
version: 2

sources:
  - name: snowflake
    tables:
      - name: sample_table


================================================
FILE: airbyte_dbt_airflow_snowflake/dbt_project/profiles.yml
================================================
dbt_project:
  outputs:
    dev:

      type: snowflake
      account: "{{ env_var('DBT_SNOWFLAKE_ACCOUNT_ID', '') }}"

      # User/password auth
      user: username
      password: "{{ env_var('DBT_SNOWFLAKE_PASSWORD', '') }}"

      role: user_role
      database: database_name
      warehouse: warehouse_name
      schema: dbt_schema
      threads: 1
      client_session_keep_alive: False
      query_tag: anything

      # optional
      connect_retries: 0 # default 0
      connect_timeout: 10 # default: 10
      retry_on_database_errors: False # default: false
      retry_all: False  # default: false
      reuse_connections: False # default: false (available v1.4+)

  target: dev

================================================
FILE: airbyte_dbt_airflow_snowflake/dbt_project/seeds/.gitkeep
================================================


================================================
FILE: airbyte_dbt_airflow_snowflake/dbt_project/snapshots/.gitkeep
================================================


================================================
FILE: airbyte_dbt_airflow_snowflake/dbt_project/tests/.gitkeep
================================================


================================================
FILE: airbyte_dbt_airflow_snowflake/infra/.gitignore
================================================
# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# password, private keys, and other secrets. These should not be part of version 
# control as they are data points which are potentially sensitive and subject 
# to change depending on the environment.
*.tfvars
*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc

================================================
FILE: airbyte_dbt_airflow_snowflake/infra/airbyte/main.tf
================================================
// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs

// Sources
resource "airbyte_source_postgres" "postgres" {
    configuration = {
        database = "...my_database..."
        host = "...my_host..."
        username = "...my_username..."
        password = "...my_password..."
        port = 5432
        source_type = "postgres"
        schemas = [
            "...my_schema..."
        ]
        ssl_mode = {
            allow = {}
        }
        tunnel_method = {
            no_tunnel = {}
        }
        replication_method = {
            scan_changes_with_user_defined_cursor = {}
        }
    }
    name = "Postgres"
    workspace_id = var.workspace_id
}

// Destinations
resource "airbyte_destination_snowflake" "snowflake" {
    configuration = {
        credentials = {
            destination_snowflake_authorization_method_key_pair_authentication = {
                auth_type            = "Key Pair Authentication"
                private_key          = "...my_private_key..."
                private_key_password = "...my_private_key_password..."
            }
        }
        database         = "AIRBYTE_DATABASE"
        destination_type = "snowflake"
        host             = "accountname.us-east-2.aws.snowflakecomputing.com"
        jdbc_url_params  = "...my_jdbc_url_params..."
        raw_data_schema  = "...my_raw_data_schema..."
        role             = "AIRBYTE_ROLE"
        schema           = "AIRBYTE_SCHEMA"
        username         = "AIRBYTE_USER"
        warehouse        = "AIRBYTE_WAREHOUSE"
    }
    name         = "Snowflake"
    workspace_id = var.workspace_id
}   

// Connections
resource "airbyte_connection" "postgres_to_snowflake" {
    name = "Postgres to Snowflake"
    source_id = airbyte_source_postgres.postgres.source_id
    destination_id = airbyte_destination_snowflake.snowflake.destination_id
    configurations = {
        streams = [
            {
                name = "...my_table_name_1..."
            },
            {
                name = "...my_table_name_2..."
            },
        ]
    }
}

================================================
FILE: airbyte_dbt_airflow_snowflake/infra/airbyte/provider.tf
================================================
// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs

terraform {
  required_providers {
    airbyte = {
      source = "airbytehq/airbyte"
      version = "0.3.4"
    }
  }
}

provider "airbyte" {
  // If running locally (Airbyte OSS) with docker-compose using the airbyte-proxy, 
  // include the actual password/username you've set up (or use the defaults below)
  username = "airbyte"
  password = "password"
  
  // if running locally (Airbyte OSS), include the server url to the airbyte-api-server
  server_url = "http://localhost:8006/v1"
}

================================================
FILE: airbyte_dbt_airflow_snowflake/infra/airbyte/variables.tf
================================================
variable "workspace_id" {
    type = string
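    // The Airbyte workspace where the source, destination, and connection are created;
    // you can find it in the Airbyte UI URL (e.g. http://localhost:8000/workspaces/<workspace-id>).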
}








================================================
FILE: airbyte_dbt_airflow_snowflake/orchestration/.gitignore
================================================
logs
__pycache__

================================================
FILE: airbyte_dbt_airflow_snowflake/orchestration/Dockerfile
================================================
FROM apache/airflow:2.7.2-python3.11
COPY requirements.txt /
RUN pip install --no-cache-dir -r /requirements.txt

================================================
FILE: airbyte_dbt_airflow_snowflake/orchestration/README.md
================================================
# Airflow setup with Airbyte and dbt

This folder contains the code to set up Airflow with Airbyte and dbt.

## Setup

We're using the [Running Airflow in Docker](https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html) guide as a starting point.

We've downloaded the official `docker-compose.yaml` file provided by Airflow and adapted it to:
- Use some configurations from a `.env` file
- Add the Airbyte operator
- Mount our dbt project folder into the containers
- For running locally, use the network deployed by the Airbyte container setup (from [Airbyte Local Deployment](https://docs.airbyte.com/deploying-airbyte/local-deployment))
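
Because that network is declared as `external`, it must already exist (i.e. Airbyte must be running) before you start this stack. A quick sanity check might look like this (network name as declared in `docker-compose.yaml`):

```bash
# Confirm the network created by the Airbyte deployment exists, then build and start Airflow
docker network ls | grep airbyte_airbyte_public
docker compose build
docker compose up
```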


================================================
FILE: airbyte_dbt_airflow_snowflake/orchestration/airflow/dags/my_elt_dag.py
================================================
import pendulum, os

from datetime import timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.empty import EmptyOperator
from airflow.sensors.python import PythonSensor

from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator
from airflow.providers.airbyte.sensors.airbyte import AirbyteJobSensor
from airflow.providers.airbyte.hooks.airbyte import AirbyteHook



AIRFLOW_AIRBYTE_CONN_ID = os.getenv("AIRFLOW_AIRBYTE_CONN") # The name of the Airflow connection to get connection information for Airbyte.
AIRBYTE_CONNECTION_ID = os.getenv("AIRBYTE_CONN_ID") # the Airbyte ConnectionId UUID between a source and destination.
DBT_DIR = "/opt/airflow/dbt_project"

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': pendulum.today('UTC').add(days=-1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    }

def check_airbyte_health():
    airbyte_hook = AirbyteHook(airbyte_conn_id=AIRFLOW_AIRBYTE_CONN_ID)
    is_healthy, message = airbyte_hook.test_connection()
    print(message)
    return is_healthy

with DAG(
    dag_id='ELT_DAG',
    default_args=default_args,
    schedule='@daily',
    ) as dag:

   start_pipeline_task = EmptyOperator(task_id="start_pipeline")
   end_pipeline_task = EmptyOperator(task_id="end_pipeline")

   airbyte_precheck_task = PythonSensor(
        task_id="check_airbyte_health",
        poke_interval=10,
        timeout=3600,
        mode="poke",
        python_callable=check_airbyte_health,
    )
   
   trigger_airbyte_sync_task = AirbyteTriggerSyncOperator(
       task_id='airbyte_trigger_sync',
       airbyte_conn_id=AIRFLOW_AIRBYTE_CONN_ID,
       connection_id=AIRBYTE_CONNECTION_ID,
       asynchronous=True
   )

   # Because the sync is triggered with asynchronous=True, this sensor polls the Airbyte job
   # (its id is pulled from XCom via `.output`) until the sync completes.
   wait_for_sync_completion_task = AirbyteJobSensor(
       task_id='airbyte_check_sync',
       airbyte_conn_id=AIRFLOW_AIRBYTE_CONN_ID,
       airbyte_job_id=trigger_airbyte_sync_task.output
   )

   run_dbt_check_task = BashOperator(
       task_id='run_dbt_precheck',
       bash_command='pwd && dbt debug && dbt list',
       cwd=DBT_DIR
   )

   run_dbt_model_task = BashOperator(
       task_id='run_dbt_model',
       bash_command='dbt run',
       cwd=DBT_DIR
   )

   start_pipeline_task >> airbyte_precheck_task >> trigger_airbyte_sync_task \
    >> [wait_for_sync_completion_task, run_dbt_check_task] \
        >> run_dbt_model_task >> end_pipeline_task


================================================
FILE: airbyte_dbt_airflow_snowflake/orchestration/airflow/plugins/static/.gitkeep
================================================


================================================
FILE: airbyte_dbt_airflow_snowflake/orchestration/airflow/plugins/templates/dbt/.gitkeep
================================================


================================================
FILE: airbyte_dbt_airflow_snowflake/orchestration/docker-compose.yaml
================================================
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

# Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL.
#
# WARNING: This configuration is for local development. Do not use it in a production deployment.
#
# This configuration supports basic configuration using environment variables or an .env file
# The following variables are supported:
#
# AIRFLOW_IMAGE_NAME           - Docker image name used to run Airflow.
#                                Default: apache/airflow:2.7.2
# AIRFLOW_UID                  - User ID in Airflow containers
#                                Default: 50000
# AIRFLOW_PROJ_DIR             - Base path to which all the files will be volumed.
#                                Default: .
# Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
#
# _AIRFLOW_WWW_USER_USERNAME   - Username for the administrator account (if requested).
#                                Default: airflow
# _AIRFLOW_WWW_USER_PASSWORD   - Password for the administrator account (if requested).
#                                Default: airflow
# _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
#                                Use this option ONLY for quick checks. Installing requirements at container
#                                startup is done EVERY TIME the service is started.
#                                A better way is to build a custom image or extend the official image
#                                as described in https://airflow.apache.org/docs/docker-stack/build.html.
#                                Default: ''
#
# Feel free to modify this file to suit your needs.
---
version: '3.8'
x-airflow-common: &airflow-common
    # In order to add custom dependencies or upgrade provider packages you can use your extended image.
    # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
    # and uncomment the "build" line below, Then run `docker-compose build` to build the images.
    # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.7.2}
    build: .
    env_file:
        - ./.env
    environment: &airflow-common-env
        AIRFLOW__CORE__EXECUTOR: CeleryExecutor
        AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${POSTGRES_USER:-airflow}:${POSTGRES_PASSWORD:-airflow}@postgres/airflow
        # For backward compatibility, with Airflow <2.3
        AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${POSTGRES_USER:-airflow}:${POSTGRES_PASSWORD:-airflow}@postgres/airflow
        AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://${POSTGRES_USER:-airflow}:${POSTGRES_PASSWORD:-airflow}@postgres/airflow
        AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
        AIRFLOW__CORE__FERNET_KEY: ''
        AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
        AIRFLOW__CORE__LOAD_EXAMPLES: ${LOAD_EXAMPLES:-true}
        AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session'
        AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true'
        AIRFLOW__CORE__LAZY_LOAD_PLUGINS: 'false'
        # WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for a quick checks
        # for other purpose (development, test and especially production usage) build/extend Airflow image.
        _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
    volumes:
        - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
        - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
        - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
        - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
        - ${DBT_PROJ_DIR:-.}:/opt/airflow/dbt_project
    user: '${AIRFLOW_UID:-50000}:0'
    depends_on: &airflow-common-depends-on
        redis:
            condition: service_healthy
        postgres:
            condition: service_healthy
    networks:
        - airbyte_airbyte_public
services:
    postgres:
        image: postgres:15
        environment:
            POSTGRES_USER: ${POSTGRES_USER:-airflow}
            POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-airflow}
            POSTGRES_DB: airflow
        volumes:
            - postgres-db-volume:/var/lib/postgresql/data
        healthcheck:
            test: ['CMD', 'pg_isready', '-U', 'airflow']
            interval: 10s
            retries: 5
            start_period: 5s
        restart: always
        networks:
            - airbyte_airbyte_public

    redis:
        image: redis:latest
        expose:
            - 6379
        healthcheck:
            test: ['CMD', 'redis-cli', 'ping']
            interval: 10s
            timeout: 30s
            retries: 50
            start_period: 30s
        restart: always
        networks:
            - airbyte_airbyte_public

    airflow-webserver:
        <<: *airflow-common
        command: webserver
        ports:
            - '8080:8080'
        healthcheck:
            test: ['CMD', 'curl', '--fail', 'http://localhost:8080/health']
            interval: 30s
            timeout: 10s
            retries: 5
            start_period: 30s
        restart: always
        depends_on:
            <<: *airflow-common-depends-on
            airflow-init:
                condition: service_completed_successfully

    airflow-scheduler:
        <<: *airflow-common
        command: scheduler
        healthcheck:
            test: ['CMD', 'curl', '--fail', 'http://localhost:8974/health']
            interval: 30s
            timeout: 10s
            retries: 5
            start_period: 30s
        restart: always
        depends_on:
            <<: *airflow-common-depends-on
            airflow-init:
                condition: service_completed_successfully

    airflow-worker:
        <<: *airflow-common
        command: celery worker
        healthcheck:
            # yamllint disable rule:line-length
            test:
                - 'CMD-SHELL'
                - 'celery --app airflow.providers.celery.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}" || celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
            interval: 30s
            timeout: 10s
            retries: 5
            start_period: 30s
        environment:
            <<: *airflow-common-env
            # Required to handle warm shutdown of the celery workers properly
            # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
            DUMB_INIT_SETSID: '0'
        restart: always
        depends_on:
            <<: *airflow-common-depends-on
            airflow-init:
                condition: service_completed_successfully

    airflow-triggerer:
        <<: *airflow-common
        command: triggerer
        healthcheck:
            test:
                [
                    'CMD-SHELL',
                    'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"',
                ]
            interval: 30s
            timeout: 10s
            retries: 5
            start_period: 30s
        restart: always
        depends_on:
            <<: *airflow-common-depends-on
            airflow-init:
                condition: service_completed_successfully

    airflow-init:
        <<: *airflow-common
        entrypoint: /bin/bash
        # yamllint disable rule:line-length
        command:
            - -c
            - |
                function ver() {
                  printf "%04d%04d%04d%04d" $${1//./ }
                }
                airflow_version=$$(AIRFLOW__LOGGING__LOGGING_LEVEL=INFO && gosu airflow airflow version)
                airflow_version_comparable=$$(ver $${airflow_version})
                min_airflow_version=2.2.0
                min_airflow_version_comparable=$$(ver $${min_airflow_version})
                if (( airflow_version_comparable < min_airflow_version_comparable )); then
                  echo
                  echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m"
                  echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!"
                  echo
                  exit 1
                fi
                if [[ -z "${AIRFLOW_UID}" ]]; then
                  echo
                  echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
                  echo "If you are on Linux, you SHOULD follow the instructions below to set "
                  echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
                  echo "For other operating systems you can get rid of the warning with manually created .env file:"
                  echo "    See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user"
                  echo
                fi
                one_meg=1048576
                mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
                cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
                disk_available=$$(df / | tail -1 | awk '{print $$4}')
                warning_resources="false"
                if (( mem_available < 4000 )) ; then
                  echo
                  echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
                  echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
                  echo
                  warning_resources="true"
                fi
                if (( cpus_available < 2 )); then
                  echo
                  echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
                  echo "At least 2 CPUs recommended. You have $${cpus_available}"
                  echo
                  warning_resources="true"
                fi
                if (( disk_available < one_meg * 10 )); then
                  echo
                  echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
                  echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
                  echo
                  warning_resources="true"
                fi
                if [[ $${warning_resources} == "true" ]]; then
                  echo
                  echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
                  echo "Please follow the instructions to increase amount of resources available:"
                  echo "   https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin"
                  echo
                fi
                mkdir -p /sources/logs /sources/dags /sources/plugins
                chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
                exec /entrypoint airflow version
        # yamllint enable rule:line-length
        environment:
            <<: *airflow-common-env
            _AIRFLOW_DB_MIGRATE: 'true'
            _AIRFLOW_WWW_USER_CREATE: 'true'
            _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
            _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
            _PIP_ADDITIONAL_REQUIREMENTS: ''
        user: '0:0'
        volumes:
            - ${AIRFLOW_PROJ_DIR:-.}:/sources

    airflow-cli:
        <<: *airflow-common
        profiles:
            - debug
        environment:
            <<: *airflow-common-env
            CONNECTION_CHECK_MAX_COUNT: '0'
        # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
        command:
            - bash
            - -c
            - airflow

    # You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up
    # or by explicitly targeted on the command line e.g. docker-compose up flower.
    # See: https://docs.docker.com/compose/profiles/
    flower:
        <<: *airflow-common
        command: celery flower
        profiles:
            - flower
        ports:
            - '5555:5555'
        healthcheck:
            test: ['CMD', 'curl', '--fail', 'http://localhost:5555/']
            interval: 30s
            timeout: 10s
            retries: 5
            start_period: 30s
        restart: always
        depends_on:
            <<: *airflow-common-depends-on
            airflow-init:
                condition: service_completed_successfully
        networks:
            - airbyte_airbyte_public

volumes:
    postgres-db-volume:
networks:
    airbyte_airbyte_public:
        external: true


================================================
FILE: airbyte_dbt_airflow_snowflake/orchestration/requirements.txt
================================================
dbt-core~=1.6.0
dbt-snowflake
apache-airflow-providers-airbyte~=3.3.2


================================================
FILE: airbyte_dbt_airflow_snowflake/setup.py
================================================
from setuptools import find_packages, setup

setup(
    name="airbyte-dbt-airflow-snowflake",
    packages=find_packages(),
    install_requires=[
        "dbt-snowflake",
        "apache-airflow[airbyte]",
        "apache-airflow",
    ],
    extras_require={"dev": ["pytest"]},
)

================================================
FILE: airbyte_dbt_dagster/.gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

#Desktop Services Store
.DS_Store

================================================
FILE: airbyte_dbt_dagster/README.md
================================================
# Airbyte-dbt-Dagster Integration

Welcome to the "Airbyte-dbt-Dagster Integration" repository! This repo provides a quickstart template for building a full data stack using Airbyte, Dagster, dbt, and BigQuery. Easily extract data from Postgres, load it into BigQuery, and apply necessary transformations using dbt, all orchestrated seamlessly with Dagster. While this template doesn't delve into specific data or transformations, its goal is to showcase the synergy of these tools.

This quickstart is designed to minimize setup hassles and propel you forward.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Setting an environment for your project](#1-setting-an-environment-for-your-project)
- [Setting Up Airbyte Connectors with Terraform](#2-setting-up-airbyte-connectors-with-terraform)
- [Setting Up the dbt Project](#3-setting-up-the-dbt-project)
- [Orchestrating with Dagster](#4-orchestrating-with-dagster)
- [Next Steps](#next-steps)

## Prerequisites

Before you embark on this integration, ensure you have the following set up and ready:

1. **Python 3.10 or later**: If not installed, download and install it from [Python's official website](https://www.python.org/downloads/).

2. **Docker and Docker Compose (Docker Desktop)**: Install [Docker](https://docs.docker.com/get-docker/) following the official documentation for your specific OS.

3. **Airbyte OSS version**: Deploy the open-source version of Airbyte. Follow the installation instructions from the [Airbyte Documentation](https://docs.airbyte.com/quickstart/deploy-airbyte/).

4. **Terraform**: Terraform will help you provision and manage the Airbyte resources. If you haven't installed it, follow the [official Terraform installation guide](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli).


## 1. Setting an environment for your project

Get the project up and running on your local machine by following these steps:

1. **Clone the repository (Clone only this quickstart)**:  
   ```bash
   git clone --filter=blob:none --sparse  https://github.com/airbytehq/quickstarts.git
   ```

   ```bash
   cd quickstarts
   ```

   ```bash
   git sparse-checkout add airbyte_dbt_dagster
   ```

   
2. **Navigate to the directory**:  
   ```bash
   cd airbyte_dbt_dagster
   ```

3. **Set Up a Virtual Environment**:  
   - For Mac:
     ```bash
     python3 -m venv venv
     source venv/bin/activate
     ```
   - For Windows:
     ```bash
     python -m venv venv
     .\venv\Scripts\activate
     ```

4. **Install Dependencies**:  
   ```bash
   pip install -e ".[dev]"
   ```

## 2. Setting Up Airbyte Connectors with Terraform

Airbyte allows you to create connectors for sources and destinations, facilitating data synchronization between various platforms. In this project, we're harnessing the power of Terraform to automate the creation of these connectors and the connections between them. Here's how you can set this up:

1. **Navigate to the Airbyte Configuration Directory**:
   
   Change to the relevant directory containing the Terraform configuration for Airbyte:
   ```bash
   cd infra/airbyte
   ```

2. **Modify Configuration Files**:

   Within the `infra/airbyte` directory, you'll find three crucial Terraform files:
    - `provider.tf`: Defines the Airbyte provider.
    - `main.tf`: Contains the main configuration for creating Airbyte resources.
    - `variables.tf`: Holds various variables, including credentials.

   Adjust the configurations in these files to suit your project's needs. Specifically, provide credentials for your Postgres and BigQuery connections. You can utilize the `variables.tf` file to manage these credentials.
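
   Variable values can also be supplied on the command line rather than edited into the files when you run the Terraform commands in the steps below. For example (assuming `variables.tf` declares a `workspace_id` variable, as the Airflow-based quickstarts do):

   ```bash
   # Placeholder value -- your workspace ID is visible in the Airbyte UI URL
   terraform plan -var="workspace_id=00000000-0000-0000-0000-000000000000"
   ```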

3. **Initialize Terraform**:
   
   This step prepares Terraform to create the resources defined in your configuration files.
   ```bash
   terraform init
   ```

4. **Review the Plan**:

   Before applying any changes, review the plan to understand what Terraform will do.
   ```bash
   terraform plan
   ```

5. **Apply Configuration**:

   After reviewing and confirming the plan, apply the Terraform configurations to create the necessary Airbyte resources.
   ```bash
   terraform apply
   ```

6. **Verify in Airbyte UI**:

   Once Terraform completes its tasks, navigate to the Airbyte UI. Here, you should see your source and destination connectors, as well as the connection between them, set up and ready to go.

## 3. Setting Up the dbt Project

[dbt (data build tool)](https://www.getdbt.com/) allows you to transform your data by writing, documenting, and executing SQL workflows. Setting up the dbt project requires specifying connection details for your data platform, in this case, BigQuery. Here’s a step-by-step guide to help you set this up:

1. **Navigate to the dbt Project Directory**:

   Change to the directory containing the dbt configuration:
   ```bash
   cd ../../dbt_project
   ```

2. **Update Connection Details**:

   You'll find a `profiles.yml` file within the directory. This file contains configurations for dbt to connect with your data platform. Update this file with your BigQuery connection details.

3. **Utilize Environment Variables (Optional but Recommended)**:

   To keep your credentials secure, you can leverage environment variables. An example is provided within the `profiles.yml` file.
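
   For example, the bundled `profiles.yml` reads the path to a BigQuery service account key file from an environment variable, which you could export before running dbt (the path is a placeholder):

   ```bash
   # Placeholder path -- point this at your BigQuery service account JSON key
   export DBT_BIGQUERY_KEYFILE_PATH="/path/to/service-account-key.json"
   ```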

4. **Test the Connection**:

   Once you’ve updated the connection details, you can test the connection to your BigQuery instance using:
   ```bash
   dbt debug
   ```

   If everything is set up correctly, this command should report a successful connection to BigQuery.

## 4. Orchestrating with Dagster

[Dagster](https://dagster.io/) is a modern data orchestrator designed to help you build, test, and monitor your data workflows. In this section, we'll walk you through setting up Dagster to oversee both the Airbyte and dbt workflows:

1. **Navigate to the Orchestration Directory**:

   Switch to the directory containing the Dagster orchestration configurations:
   ```bash
   cd ../orchestration
   ```

2. **Set Environment Variables**:

   Dagster requires certain environment variables to be set to interact with other tools like dbt and Airbyte. Set the following variables:

   ```bash
   export DAGSTER_DBT_PARSE_PROJECT_ON_LOAD=1
   export AIRBYTE_PASSWORD=password
   ```
   
   Note: The `AIRBYTE_PASSWORD` is set to `password` as a default for local Airbyte instances. If you've changed this during your Airbyte setup, ensure you use the appropriate password here.

3. **Launch the Dagster UI**:

   With the environment variables in place, kick-start the Dagster UI:
   ```bash
   dagster dev
   ```

4. **Access Dagster in Your Browser**:

   Open your browser and navigate to:
   ```
   http://127.0.0.1:3000
   ```

   Here, you should see assets for both Airbyte and dbt. To get an overview of how these assets interrelate, click on "view global asset lineage". This will give you a clear picture of the data lineage, visualizing how data flows between the tools.

## Next Steps

Once you've set up and launched this initial integration, the real power lies in its adaptability and extensibility. Here’s a roadmap to help you customize this project and tailor it to your specific data needs:

1. **Create dbt Sources for Airbyte Data**:

   Your raw data extracted via Airbyte can be represented as sources in dbt. Start by [creating new dbt sources](https://docs.getdbt.com/docs/build/sources) to represent this data, allowing for structured transformations down the line.

2. **Add Your dbt Transformations**:

   With your dbt sources in place, you can now build upon them. Add your custom SQL transformations in dbt, ensuring that you treat the sources as an upstream dependency. This ensures that your transformations work on the most up-to-date raw data.

3. **Execute the Pipeline in Dagster**:

   Navigate to the Dagster UI and click on "Materialize all". This triggers the entire pipeline, encompassing the extraction via Airbyte, transformations via dbt, and any other subsequent steps.

4. **Extend the Project**:

   The real beauty of this integration is its extensibility. Whether you want to add more data sources, integrate additional tools, or enhance your transformation logic – the floor is yours. With the foundation set, the sky's the limit for how you want to extend and refine your data processes.

================================================
FILE: airbyte_dbt_dagster/dbt_project/.gitignore
================================================

target/
dbt_packages/
logs/

#Desktop Services Store
.DS_Store

#User cookie
.user.yml

================================================
FILE: airbyte_dbt_dagster/dbt_project/README.md
================================================
Welcome to your new dbt project!

### Using the starter project

Try running the following commands:
- dbt run
- dbt test


### Resources:
- Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
- Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers
- Join the [chat](https://community.getdbt.com/) on Slack for live discussions and support
- Find [dbt events](https://events.getdbt.com) near you
- Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices


================================================
FILE: airbyte_dbt_dagster/dbt_project/analyses/.gitkeep
================================================


================================================
FILE: airbyte_dbt_dagster/dbt_project/dbt_project.yml
================================================

# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'dbt_project'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'dbt_project'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  dbt_project:
    # Config indicated by + and applies to all files under models/example/
    example:
      +materialized: view


================================================
FILE: airbyte_dbt_dagster/dbt_project/macros/.gitkeep
================================================


================================================
FILE: airbyte_dbt_dagster/dbt_project/models/example/my_first_dbt_model.sql
================================================

/*
    Welcome to your first dbt model!
    Did you know that you can also configure models directly within SQL files?
    This will override configurations stated in dbt_project.yml

    Try changing "table" to "view" below
*/

{{ config(materialized='table') }}

with source_data as (

    select 1 as id
    union all
    select null as id

)

select *
from source_data

/*
    Uncomment the line below to remove records with null `id` values
*/

-- where id is not null


================================================
FILE: airbyte_dbt_dagster/dbt_project/models/example/my_second_dbt_model.sql
================================================

-- Use the `ref` function to select from other models

select *
from {{ ref('my_first_dbt_model') }}
where id = 1


================================================
FILE: airbyte_dbt_dagster/dbt_project/models/example/schema.yml
================================================

version: 2

models:
  - name: my_first_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null

  - name: my_second_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null


================================================
FILE: airbyte_dbt_dagster/dbt_project/profiles.yml
================================================
dbt_project:
  outputs:
    dev:
      dataset: my_dataset
      job_execution_timeout_seconds: 300
      job_retries: 1
      keyfile: "{{ env_var('DBT_BIGQUERY_KEYFILE_PATH', '') }}"
      location: my_dataset_location
      method: service-account
      priority: interactive
      project: my_project_id
      threads: 1
      type: bigquery
  target: dev

================================================
FILE: airbyte_dbt_dagster/dbt_project/seeds/.gitkeep
================================================


================================================
FILE: airbyte_dbt_dagster/dbt_project/snapshots/.gitkeep
================================================


================================================
FILE: airbyte_dbt_dagster/dbt_project/tests/.gitkeep
================================================


================================================
FILE: airbyte_dbt_dagster/infra/.gitignore
================================================
# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# password, private keys, and other secrets. These should not be part of version 
# control as they are data points which are potentially sensitive and subject 
# to change depending on the environment.
*.tfvars
*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc

================================================
FILE: airbyte_dbt_dagster/infra/airbyte/.terraform.lock.hcl
================================================
# This file is maintained automatically by "terraform init".
# Manual edits may be lost in future updates.

provider "registry.terraform.io/airbytehq/airbyte" {
  version     = "0.3.3"
  constraints = "0.3.3"
  hashes = [
    "h1:a6g5uWP/pt1/popVNlKwnTssWNfdYY4KVFPMisN/yvU=",
    "zh:0efa470b34d9b912b47efe4469c51713bfc3c2413e52c17e1e903f2a3cddb2f6",
    "zh:1bddd69fa2c2d4f3e239d60555446df9bc4ce0c0cabbe7e092fe1d44989ab004",
    "zh:2e20540403a0010007b53456663fb037b24e30f6c8943f65da1bcf7fa4dfc8a6",
    "zh:2f415369ad884e8b7115a5c5ff229d052f7af1fca27abbfc8ebef379ed11aec4",
    "zh:46fd9a906f4b6461112dcc5a5aa01a3fcd7a19a72d4ad0b2e37790da37701fe1",
    "zh:83503ebb77bb6d6941c42ba323cf22380d08a1506554a2dcc8ac54e74c0886a1",
    "zh:890df766e9b839623b1f0437355032a3c006226a6c200cd911e15ee1a9014e9f",
    "zh:8fd770eff726826d3a63b9e3733c5455b5cde004027b04ee3f75888eb8538c90",
    "zh:b0fc890ed4f9b077bf70ed121cc3550e7a07d16e7798ad517623274aa62ad7b0",
    "zh:c2a01612362da9b73cd5958f281e1aa7ff09af42182e463097d11ed78e778e72",
    "zh:c64b2bb1887a0367d64ba3393d4b3a16c418cf5b1792e2e7aae7c0b5413eb334",
    "zh:ce14ebbf0ed91913ec62655a511763dec62b5779de9a209bd6f1c336640cddc0",
    "zh:e0662ca837eee10f7733ea9a501d995281f56bd9b410ae13ad03eb106011db14",
    "zh:e103d480fc6066004bc98e9e04a141a1f55b918cc2912716beebcc6fc4c872fb",
    "zh:e2507049098f0f1b21cb56870f4a5ef624bcf6d3959e5612eada1f8117341648",
  ]
}


================================================
FILE: airbyte_dbt_dagster/infra/airbyte/main.tf
================================================
// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs

// Sources
resource "airbyte_source_postgres" "postgres" {
    configuration = {
        database = "...my_database..."
        host = "...my_host..."
        username = "...my_username..."
        password = "...my_password..."
        port = 5432
        source_type = "postgres"
        schemas = [
            "...my_schema..."
        ]
        ssl_mode = {
            allow = {}
        }
        tunnel_method = {
            no_tunnel = {}
        }
        replication_method = {}
    }
    name = "Postgres"
    workspace_id = var.workspace_id
}

// Destinations
resource "airbyte_destination_bigquery" "bigquery" {
    configuration = {
        dataset_id = "...my_dataset_id..."
        dataset_location = "...my_dataset_location..."
        destination_type = "bigquery"
        project_id = "...my_project_id..."
        credentials_json = "...my_credentials_json_file_path..."
        loading_method = {
            destination_bigquery_loading_method_standard_inserts = {
                method = "Standard"
            }
        }
    }
    name = "BigQuery"
    workspace_id = var.workspace_id
}

// Connections
resource "airbyte_connection" "postgres_to_bigquery" {
    name = "Postgres to BigQuery"
    source_id = airbyte_source_postgres.postgres.source_id
    destination_id = airbyte_destination_bigquery.bigquery.destination_id
    configurations = {
        streams = [
            {
                name = "...my_table_name_1..."
            },
            {
                name = "...my_table_name_2..."
            },
        ]
    }
}

================================================
FILE: airbyte_dbt_dagster/infra/airbyte/provider.tf
================================================
// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs

terraform {
  required_providers {
    airbyte = {
      source = "airbytehq/airbyte"
      version = "0.3.3"
    }
  }
}

provider "airbyte" {
  // If running locally (Airbyte OSS) with docker-compose using the airbyte-proxy, 
  // include the actual password/username you've set up (or use the defaults below)
  username = "airbyte"
  password = "password"
  
  // if running locally (Airbyte OSS), include the server url to the airbyte-api-server
  server_url = "http://localhost:8006/v1"
}

================================================
FILE: airbyte_dbt_dagster/infra/airbyte/variables.tf
================================================
variable "workspace_id" {
    type = string
}








================================================
FILE: airbyte_dbt_dagster/orchestration/orchestration/__init__.py
================================================


================================================
FILE: airbyte_dbt_dagster/orchestration/orchestration/assets.py
================================================
import os
from dagster import OpExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets
from dagster_airbyte import AirbyteResource, load_assets_from_airbyte_instance

from .constants import dbt_manifest_path

@dbt_assets(manifest=dbt_manifest_path)
def dbt_project_dbt_assets(context: OpExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()

airbyte_instance = AirbyteResource(
    host="localhost",
    port="8000",
    # If using basic auth, include username and password:
    username="airbyte",
    password=os.getenv("AIRBYTE_PASSWORD")
)

airbyte_assets = load_assets_from_airbyte_instance(airbyte_instance)

================================================
FILE: airbyte_dbt_dagster/orchestration/orchestration/constants.py
================================================
import os
from pathlib import Path

from dagster_dbt import DbtCliResource

dbt_project_dir = Path(__file__).joinpath("..", "..", "..", "dbt_project").resolve()
dbt = DbtCliResource(project_dir=os.fspath(dbt_project_dir))

# If DAGSTER_DBT_PARSE_PROJECT_ON_LOAD is set, a manifest will be created at runtime.
# Otherwise, we expect a manifest to be present in the project's target directory.
if os.getenv("DAGSTER_DBT_PARSE_PROJECT_ON_LOAD"):
    dbt_parse_invocation = dbt.cli(["parse"], manifest={}).wait()
    dbt_manifest_path = dbt_parse_invocation.target_path.joinpath("manifest.json")
else:
    dbt_manifest_path = dbt_project_dir.joinpath("target", "manifest.json")

================================================
FILE: airbyte_dbt_dagster/orchestration/orchestration/definitions.py
================================================
import os

from dagster import Definitions
from dagster_dbt import DbtCliResource

from .assets import dbt_project_dbt_assets, airbyte_assets
from .constants import dbt_project_dir
from .schedules import schedules

defs = Definitions(
    assets=[dbt_project_dbt_assets, airbyte_assets],
    schedules=schedules,
    resources={
        "dbt": DbtCliResource(project_dir=os.fspath(dbt_project_dir)),
    },
)

================================================
FILE: airbyte_dbt_dagster/orchestration/orchestration/schedules.py
================================================
"""
To add a daily schedule that materializes your dbt assets, uncomment the following lines.
"""
from dagster_dbt import build_schedule_from_dbt_selection

from .assets import dbt_project_dbt_assets

schedules = [
#     build_schedule_from_dbt_selection(
#         [dbt_project_dbt_assets],
#         job_name="materialize_dbt_models",
#         cron_schedule="0 0 * * *",
#         dbt_select="fqn:*",
#     ),
]

================================================
FILE: airbyte_dbt_dagster/orchestration/pyproject.toml
================================================
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[tool.dagster]
module_name = "orchestration.definitions"
code_location_name = "orchestration"

================================================
FILE: airbyte_dbt_dagster/orchestration/setup.py
================================================
from setuptools import find_packages, setup

setup(
    name="orchestration",
    version="0.0.1",
    packages=find_packages(),
    install_requires=[
        "dagster",
        "dagster-cloud",
        "dagster-dbt",
        "dbt-core>=1.4.0",
        "dbt-bigquery",
    ],
    extras_require={
        "dev": [
            "dagster-webserver",
        ]
    },
)

================================================
FILE: airbyte_dbt_dagster/setup.py
================================================
from setuptools import find_packages, setup

setup(
    name="airbyte-dbt-dagster",
    packages=find_packages(),
    install_requires=[
        "dbt-bigquery",
        "dagster",
        "dagster-cloud",
        "dagster-dbt",
        "dagster-airbyte",
    ],
    extras_require={"dev": ["dagit", "pytest"]},
)

================================================
FILE: airbyte_dbt_dagster_snowflake/.gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

#Desktop Services Store
.DS_Store

================================================
FILE: airbyte_dbt_dagster_snowflake/README.md
================================================
# Airbyte-dbt-Dagster-Snowflake Integration

Welcome to the "Airbyte-dbt-Dagster-Snowflake Integration" repository! This repo provides a quickstart template for building a full data stack using Airbyte, Dagster, dbt, and Snowflake. Easily extract data from Postgres and load it into Snowflake using Airbyte, and apply necessary transformations using dbt, all orchestrated seamlessly with Dagster. While this template doesn't delve into specific data or transformations, its goal is to showcase the synergy of these tools.

This quickstart is designed to minimize setup hassles and propel you forward.

## Table of Contents

- [Airbyte-dbt-Dagster-Snowflake Integration](#airbyte-dbt-dagster-snowflake-integration)
  - [Table of Contents](#table-of-contents)
  - [Infrastructure Layout](#infrastructure-layout)
  - [Pipeline DAG](#pipeline-dag)
  - [Prerequisites](#prerequisites)
  - [1. Setting an environment for your project](#1-setting-an-environment-for-your-project)
  - [2. Setting Up Airbyte Connectors with Terraform](#2-setting-up-airbyte-connectors-with-terraform)
  - [3. Setting Up the dbt Project](#3-setting-up-the-dbt-project)
  - [4. Orchestrating with Dagster](#4-orchestrating-with-dagster)
  - [Next Steps](#next-steps)

## Infrastructure Layout
![infrastructure layout](images/dad_snowflake_stack.png)

## Pipeline DAG
![pipeline dag](images/dag.svg)

## Prerequisites

Before you embark on this integration, ensure you have the following set up and ready:

1. **Python 3.10 or later**: If not installed, download and install it from [Python's official website](https://www.python.org/downloads/).

2. **Docker and Docker Compose (Docker Desktop)**: Install [Docker](https://docs.docker.com/get-docker/) following the official documentation for your specific OS.

3. **Airbyte OSS version**: Deploy the open-source version of Airbyte. Follow the installation instructions from the [Airbyte Documentation](https://docs.airbyte.com/quickstart/deploy-airbyte/).

4. **Terraform**: Terraform will help you provision and manage the Airbyte resources. If you haven't installed it, follow the [official Terraform installation guide](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli).


## 1. Setting an environment for your project

Get the project up and running on your local machine by following these steps:

1. **Clone the repository (Clone only this quickstart)**:  
   ```bash
   git clone --filter=blob:none --sparse  https://github.com/airbytehq/quickstarts.git
   ```

   ```bash
   cd quickstarts
   ```

   ```bash
   git sparse-checkout add airbyte_dbt_dagster_snowflake
   ```

   
2. **Navigate to the directory**:  
   ```bash
   cd airbyte_dbt_dagster_snowflake
   ```

3. **Set Up a Virtual Environment**:  
   - For Mac:
     ```bash
     python3 -m venv venv
     source venv/bin/activate
     ```
   - For Windows:
     ```bash
     python -m venv venv
     .\venv\Scripts\activate
     ```

4. **Install Dependencies**:  
   ```bash
   pip install -e ".[dev]"
   ```

## 2. Setting Up Airbyte Connectors with Terraform

Airbyte allows you to create connectors for sources and destinations, facilitating data synchronization between various platforms. In this project, we're harnessing the power of Terraform to automate the creation of these connectors and the connections between them. Here's how you can set this up:

1. **Navigate to the Airbyte Configuration Directory**:
   
   Change to the relevant directory containing the Terraform configuration for Airbyte:
   ```bash
   cd infra/airbyte
   ```

2. **Modify Configuration Files**:

   Within the `infra/airbyte` directory, you'll find three crucial Terraform files:
    - `provider.tf`: Defines the Airbyte provider.
    - `main.tf`: Contains the main configuration for creating Airbyte resources.
    - `variables.tf`: Holds various variables, including credentials.

   Adjust the configurations in these files to suit your project's needs. Specifically, provide credentials for your Postgres and Snowflake connections. You can utilize the `variables.tf` file to manage these credentials.

3. **Initialize Terraform**:
   
   This step prepares Terraform to create the resources defined in your configuration files.
   ```bash
   terraform init
   ```

4. **Review the Plan**:

   Before applying any changes, review the plan to understand what Terraform will do.
   ```bash
   terraform plan
   ```

5. **Apply Configuration**:

   After reviewing and confirming the plan, apply the Terraform configurations to create the necessary Airbyte resources.
   ```bash
   terraform apply
   ```

6. **Verify in Airbyte UI**:

   Once Terraform completes its tasks, navigate to the Airbyte UI. Here, you should see your source and destination connectors, as well as the connection between them, set up and ready to go.

## 3. Setting Up the dbt Project

[dbt (data build tool)](https://www.getdbt.com/) allows you to transform your data by writing, documenting, and executing SQL workflows. Setting up the dbt project requires specifying connection details for your data platform, in this case, Snowflake. Here’s a step-by-step guide to help you set this up:

1. **Navigate to the dbt Project Directory**:

   Change to the directory containing the dbt configuration:
   ```bash
   cd ../../dbt_project
   ```

2. **Update Connection Details**:

   You'll find a `profiles.yml` file within the directory. This file contains configurations for dbt to connect with your data platform. Update this file with your Snowflake connection details.

3. **Utilize Environment Variables (Optional but Recommended)**:

   To keep your credentials secure, you can leverage environment variables. An example is provided within the `profiles.yml` file.

4. **Test the Connection**:

   Once you’ve updated the connection details, you can test the connection to your Snowflake instance using:
   ```bash
   dbt debug
   ```

   If everything is set up correctly, this command should report a successful connection to Snowflake.
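
   Since the Dagster orchestration in the next section will invoke dbt for you, it can also be handy to run the same check from Python. Below is a minimal sketch, assuming dbt-core 1.5 or later (the release that introduced the programmatic `dbtRunner` API); run it from the `dbt_project` directory:

   ```python
   # Rough programmatic equivalent of `dbt debug` (assumes dbt-core >= 1.5).
   from dbt.cli.main import dbtRunner

   result = dbtRunner().invoke(["debug", "--project-dir", ".", "--profiles-dir", "."])
   print("Connection OK" if result.success else "Connection failed")
   ```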

## 4. Orchestrating with Dagster

[Dagster](https://dagster.io/) is a modern data orchestrator designed to help you build, test, and monitor your data workflows. In this section, we'll walk you through setting up Dagster to oversee both the Airbyte and dbt workflows:

1. **Navigate to the Orchestration Directory**:

   Switch to the directory containing the Dagster orchestration configurations:
   ```bash
   cd ../../orchestration
   ```

2. **Set Environment Variables**:

   Dagster requires certain environment variables to be set to interact with other tools like dbt and Airbyte. Set the following variables:

   ```bash
   export DAGSTER_DBT_PARSE_PROJECT_ON_LOAD=1
   export AIRBYTE_PASSWORD=password
   ```
   
   Note: The `AIRBYTE_PASSWORD` is set to `password` as a default for local Airbyte instances. If you've changed this during your Airbyte setup, ensure you use the appropriate password here.

3. **Launch the Dagster UI**:

   With the environment variables in place, kick-start the Dagster UI:
   ```bash
   dagster dev
   ```

4. **Access Dagster in Your Browser**:

   Open your browser and navigate to:
   ```
   http://127.0.0.1:3000
   ```

   Here, you should see assets for both Airbyte and dbt. To get an overview of how these assets interrelate, click on "view global asset lineage". This will give you a clear picture of the data lineage, visualizing how data flows between the tools.
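
If you'd like Dagster to materialize the dbt models on a recurring schedule rather than only on demand, the commented-out example in `orchestration/orchestration/schedules.py` can be enabled. Uncommented, it looks like this:

```python
from dagster_dbt import build_schedule_from_dbt_selection

from .assets import dbt_project_dbt_assets

# Materialize every dbt model ("fqn:*") once a day at midnight.
schedules = [
    build_schedule_from_dbt_selection(
        [dbt_project_dbt_assets],
        job_name="materialize_dbt_models",
        cron_schedule="0 0 * * *",
        dbt_select="fqn:*",
    ),
]
```

Since `definitions.py` already passes `schedules` to `Definitions`, the schedule shows up in the Dagster UI, where you can toggle it on.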

## Next Steps

Once you've set up and launched this initial integration, the real power lies in its adaptability and extensibility. Here’s a roadmap to help you customize this project and tailor it to your specific data needs:

1. **Create dbt Sources for Airbyte Data**:

   Your raw data extracted via Airbyte can be represented as sources in dbt. Start by [creating new dbt sources](https://docs.getdbt.com/docs/build/sources) to represent this data, allowing for structured transformations down the line.

2. **Add Your dbt Transformations**:

   With your dbt sources in place, you can now build upon them. Add your custom SQL transformations in dbt, treating the sources as an upstream dependency. This ensures your transformations always work on the most up-to-date raw data.

3. **Execute the Pipeline in Dagster**:

   Navigate to the Dagster UI and click on "Materialize all". This triggers the entire pipeline, encompassing the extraction via Airbyte, transformations via dbt, and any other subsequent steps. A programmatic alternative is sketched after this list.

4. **Extend the Project**:

   The real beauty of this integration is its extensibility. Whether you want to add more data sources, integrate additional tools, or enhance your transformation logic – the floor is yours. With the foundation set, the sky's the limit for how you extend and refine your data processes.
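
As a programmatic complement to clicking "Materialize all" in step 3, you can also expose the whole pipeline as a named asset job. The sketch below shows one way `orchestration/orchestration/definitions.py` could be extended; the `elt_job` name is illustrative and not part of the quickstart:

```python
import os

from dagster import AssetSelection, Definitions, define_asset_job
from dagster_dbt import DbtCliResource

from .assets import dbt_project_dbt_assets, airbyte_assets
from .constants import dbt_project_dir
from .schedules import schedules

# A job that materializes every asset in this code location:
# the Airbyte syncs first, then the dbt models built on top of them.
elt_job = define_asset_job(name="elt_job", selection=AssetSelection.all())

defs = Definitions(
    assets=[dbt_project_dbt_assets, airbyte_assets],
    jobs=[elt_job],
    schedules=schedules,
    resources={"dbt": DbtCliResource(project_dir=os.fspath(dbt_project_dir))},
)
```

The job then shows up in the Dagster UI, where you can launch a run of the full pipeline in one click.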

================================================
FILE: airbyte_dbt_dagster_snowflake/dbt_project/.gitignore
================================================

target/
dbt_packages/
logs/

#Desktop Services Store
.DS_Store

#User cookie
.user.yml

================================================
FILE: airbyte_dbt_dagster_snowflake/dbt_project/README.md
================================================
Welcome to your new dbt project!

### Using the starter project

Try running the following commands:
- dbt run
- dbt test


### Resources:
- Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
- Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers
- Join the [chat](https://community.getdbt.com/) on Slack for live discussions and support
- Find [dbt events](https://events.getdbt.com) near you
- Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices


================================================
FILE: airbyte_dbt_dagster_snowflake/dbt_project/analyses/.gitkeep
================================================


================================================
FILE: airbyte_dbt_dagster_snowflake/dbt_project/dbt_project.yml
================================================

# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'dbt_project'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'dbt_project'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  dbt_project:
    # Config indicated by + and applies to all files under models/example/
    example:
      +materialized: view


================================================
FILE: airbyte_dbt_dagster_snowflake/dbt_project/macros/.gitkeep
================================================


================================================
FILE: airbyte_dbt_dagster_snowflake/dbt_project/models/example/my_first_dbt_model.sql
================================================

/*
    Welcome to your first dbt model!
    Did you know that you can also configure models directly within SQL files?
    This will override configurations stated in dbt_project.yml

    Try changing "table" to "view" below
*/

{{ config(materialized='table') }}

with source_data as (

    select * from {{ source('snowflake', 'sample_table') }}

)

select *
from source_data

/*
    Uncomment the line below to remove records with null `id` values
*/

-- where id is not null


================================================
FILE: airbyte_dbt_dagster_snowflake/dbt_project/models/example/my_second_dbt_model.sql
================================================

-- Use the `ref` function to select from other models

select *
from {{ ref('my_first_dbt_model') }}
where id = 1


================================================
FILE: airbyte_dbt_dagster_snowflake/dbt_project/models/example/schema.yml
================================================

version: 2

models:
  - name: my_first_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null

  - name: my_second_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null


================================================
FILE: airbyte_dbt_dagster_snowflake/dbt_project/models/sources.yml
================================================
version: 2

sources:
  - name: snowflake
    tables:
      - name: sample_table
        meta:
          dagster:
            asset_key: ["sample_table"] # This metadata specifies the corresponding Dagster asset for this dbt source.


================================================
FILE: airbyte_dbt_dagster_snowflake/dbt_project/profiles.yml
================================================
dbt_project:
  outputs:
    dev:

      type: snowflake
      account: "{{ env_var('DBT_SNOWFLAKE_ACCOUNT_ID', '') }}"

      # User/password auth
      user: username
      password: "{{ env_var('DBT_SNOWFLAKE_PASSWORD', '') }}"

      role: user_role
      database: database_name
      warehouse: warehouse_name
      schema: dbt_schema
      threads: 1
      client_session_keep_alive: False
      query_tag: anything

      # optional
      connect_retries: 0 # default 0
      connect_timeout: 10 # default: 10
      retry_on_database_errors: False # default: false
      retry_all: False  # default: false
      reuse_connections: False # default: false (available v1.4+)

  target: dev

================================================
FILE: airbyte_dbt_dagster_snowflake/dbt_project/seeds/.gitkeep
================================================


================================================
FILE: airbyte_dbt_dagster_snowflake/dbt_project/snapshots/.gitkeep
================================================


================================================
FILE: airbyte_dbt_dagster_snowflake/dbt_project/tests/.gitkeep
================================================


================================================
FILE: airbyte_dbt_dagster_snowflake/infra/.gitignore
================================================
# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# password, private keys, and other secrets. These should not be part of version 
# control as they are data points which are potentially sensitive and subject 
# to change depending on the environment.
*.tfvars
*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc

================================================
FILE: airbyte_dbt_dagster_snowflake/infra/airbyte/main.tf
================================================
// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs

// Sources
resource "airbyte_source_postgres" "postgres" {
    configuration = {
        database = "...my_database..."
        host = "...my_host..."
        username = "...my_username..."
        password = "...my_password..."
        port = 5432
        source_type = "postgres"
        schemas = [
            "...my_schema..."
        ]
        ssl_mode = {
            source_postgres_ssl_modes_allow = {
                mode = "allow"
            }
        }
        tunnel_method = {
            source_postgres_ssh_tunnel_method_no_tunnel = {
                tunnel_method = "NO_TUNNEL"
            }
        }
        replication_method = {
            source_postgres_update_method_scan_changes_with_user_defined_cursor = {
                method = "Standard"
            }
        }
    }
    name = "Postgres"
    workspace_id = var.workspace_id
}

// Destinations
resource "airbyte_destination_snowflake" "snowflake" {
    configuration = {
        credentials = {
            destination_snowflake_authorization_method_key_pair_authentication = {
                auth_type            = "Key Pair Authentication"
                private_key          = "...my_private_key..."
                private_key_password = "...my_private_key_password..."
            }
        }
        database         = "AIRBYTE_DATABASE"
        destination_type = "snowflake"
        host             = "accountname.us-east-2.aws.snowflakecomputing.com"
        jdbc_url_params  = "...my_jdbc_url_params..."
        raw_data_schema  = "...my_raw_data_schema..."
        role             = "AIRBYTE_ROLE"
        schema           = "AIRBYTE_SCHEMA"
        username         = "AIRBYTE_USER"
        warehouse        = "AIRBYTE_WAREHOUSE"
    }
    name         = "Snowflake"
    workspace_id = var.workspace_id
}   

// Connections
resource "airbyte_connection" "postgres_to_snowflake" {
    name = "Postgres to Snowflake"
    source_id = airbyte_source_postgres.postgres.source_id
    destination_id = airbyte_destination_snowflake.snowflake.destination_id
    configurations = {
        streams = [
            {
                name = "...my_table_name_1..."
            },
            {
                name = "...my_table_name_2..."
            },
        ]
    }
}

================================================
FILE: airbyte_dbt_dagster_snowflake/infra/airbyte/provider.tf
================================================
// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs

terraform {
  required_providers {
    airbyte = {
      source = "airbytehq/airbyte"
      version = "0.3.4"
    }
  }
}

provider "airbyte" {
  // If running locally (Airbyte OSS) with docker-compose using the airbyte-proxy, 
  // include the actual password/username you've set up (or use the defaults below)
  username = "airbyte"
  password = "password"
  
  // if running locally (Airbyte OSS), include the server url to the airbyte-api-server
  server_url = "http://localhost:8006/v1"
}

================================================
FILE: airbyte_dbt_dagster_snowflake/infra/airbyte/variables.tf
================================================
variable "workspace_id" {
    type = string
}








================================================
FILE: airbyte_dbt_dagster_snowflake/orchestration/orchestration/__init__.py
================================================


================================================
FILE: airbyte_dbt_dagster_snowflake/orchestration/orchestration/assets.py
================================================
import os
from dagster import OpExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets
from dagster_airbyte import AirbyteResource, load_assets_from_airbyte_instance

from .constants import dbt_manifest_path

@dbt_assets(manifest=dbt_manifest_path)
def dbt_project_dbt_assets(context: OpExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()

airbyte_instance = AirbyteResource(
    host="localhost",
    port="8000",
    # If using basic auth, include username and password:
    username="airbyte",
    password=os.getenv("AIRBYTE_PASSWORD")
)

airbyte_assets = load_assets_from_airbyte_instance(airbyte_instance)

================================================
FILE: airbyte_dbt_dagster_snowflake/orchestration/orchestration/constants.py
================================================
import os
from pathlib import Path

from dagster_dbt import DbtCliResource

dbt_project_dir = Path(__file__).joinpath("..", "..", "..", "dbt_project").resolve()
dbt = DbtCliResource(project_dir=os.fspath(dbt_project_dir))

# If DAGSTER_DBT_PARSE_PROJECT_ON_LOAD is set, a manifest will be created at runtime.
# Otherwise, we expect a manifest to be present in the project's target directory.
if os.getenv("DAGSTER_DBT_PARSE_PROJECT_ON_LOAD"):
    dbt_parse_invocation = dbt.cli(["parse"], manifest={}).wait()
    dbt_manifest_path = dbt_parse_invocation.target_path.joinpath("manifest.json")
else:
    dbt_manifest_path = dbt_project_dir.joinpath("target", "manifest.json")

================================================
FILE: airbyte_dbt_dagster_snowflake/orchestration/orchestration/definitions.py
================================================
import os

from dagster import Definitions
from dagster_dbt import DbtCliResource

from .assets import dbt_project_dbt_assets, airbyte_assets
from .constants import dbt_project_dir
from .schedules import schedules

defs = Definitions(
    assets=[dbt_project_dbt_assets, airbyte_assets],
    schedules=schedules,
    resources={
        "dbt": DbtCliResource(project_dir=os.fspath(dbt_project_dir)),
    },
)

================================================
FILE: airbyte_dbt_dagster_snowflake/orchestration/orchestration/schedules.py
================================================
"""
To add a daily schedule that materializes your dbt assets, uncomment the following lines.
"""
from dagster_dbt import build_schedule_from_dbt_selection

from .assets import dbt_project_dbt_assets

schedules = [
#     build_schedule_from_dbt_selection(
#         [dbt_project_dbt_assets],
#         job_name="materialize_dbt_models",
#         cron_schedule="0 0 * * *",
#         dbt_select="fqn:*",
#     ),
]

================================================
FILE: airbyte_dbt_dagster_snowflake/orchestration/pyproject.toml
================================================
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[tool.dagster]
module_name = "orchestration.definitions"
code_location_name = "orchestration"

================================================
FILE: airbyte_dbt_dagster_snowflake/orchestration/setup.py
================================================
from setuptools import find_packages, setup

setup(
    name="orchestration",
    version="0.0.1",
    packages=find_packages(),
    install_requires=[
        "dagster",
        "dagster-cloud",
        "dagster-dbt",
        "dbt-core>=1.4.0",
        "dbt-snowflake",
    ],
    extras_require={
        "dev": [
            "dagster-webserver",
        ]
    },
)

================================================
FILE: airbyte_dbt_dagster_snowflake/setup.py
================================================
from setuptools import find_packages, setup

setup(
    name="airbyte-dbt-dagster-snowflake",
    packages=find_packages(),
    install_requires=[
        "dbt-snowflake",
        "dagster",
        "dagster-cloud",
        "dagster-dbt",
        "dagster-airbyte",
    ],
    extras_require={"dev": ["dagit", "pytest"]},
)

================================================
FILE: airbyte_dbt_prefect_bigquery/.gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

#Desktop Services Store
.DS_Store

================================================
FILE: airbyte_dbt_prefect_bigquery/README.md
================================================
# Airbyte-dbt-Prefect-BigQuery Integration

Welcome to the Prefect, Airbyte, dbt (PAD) Stack with BigQuery quickstart! This repo contains the code to show how to use Airbyte and dbt for data extraction and transformation, and Prefect to orchestrate the data workflows, providing an end-to-end ELT pipeline. With this setup, you can pull fake e-commerce data, load it into BigQuery, and play around with it using dbt and Prefect.

## Infrastructure Layout
![infrastructure layout](images/pad_bigquery_stack.png)

## Pipeline DAG
![pipeline dag](images/dag.png)

## Table of Contents
   - [Prerequisites](#prerequisites)
   - [Setting an environment for your project](#1-setting-an-environment-for-your-project)
   - [Setting Up BigQuery to work with Airbyte and dbt](#2-setting-up-bigquery)
   - [Setting Up Airbyte Connectors](#3-setting-up-airbyte-connectors)
   - [Setting Up the dbt Project](#4-setting-up-the-dbt-project)
   - [Orchestrating with Prefect](#5-orchestrating-with-prefect)
   - [Next Steps](#next-steps)

## Prerequisites

Before you embark on this integration, ensure you have the following set up and ready:

1. **Python 3.10 or later**: If not installed, download and install it from [Python's official website](https://www.python.org/downloads/).

2. **Docker and Docker Compose (Docker Desktop)**: Install [Docker](https://docs.docker.com/get-docker/) following the official documentation for your specific OS.

3. **Airbyte OSS version**: Deploy the open-source version of Airbyte locally. Follow the installation instructions from the [Airbyte Documentation](https://docs.airbyte.com/quickstart/deploy-airbyte/).

4. **Terraform (Optional)**: Terraform will help you provision and manage the Airbyte resources. If you haven't installed it, follow the [official Terraform installation guide](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli). This is an optional step because you can also create and manage Airbyte resources via the UI. Both ways will be described below.

5. **Google Cloud account with BigQuery**: You will also need to add the necessary permissions to allow Airbyte and dbt to access the data in BigQuery. A step-by-step guide is provided [below](#2-setting-up-bigquery).


## 1. Setting an environment for your project

Get the project up and running on your local machine by following these steps:

1. **Clone the repository (Clone only this quickstart)**:  
   ```bash
   git clone --filter=blob:none --sparse  https://github.com/airbytehq/quickstarts.git
   ```

   ```bash
   cd quickstarts
   ```

   ```bash
   git sparse-checkout add airbyte_dbt_prefect_bigquery
   ```

   
2. **Navigate to the directory**:  
   ```bash
   cd airbyte_dbt_prefect_bigquery
   ```

   At this point you can view the code in your preferred IDE.

3. **Set Up a Virtual Environment**:  
   - For Mac:
     ```bash
     python3 -m venv venv
     source venv/bin/activate
     ```

   - For Windows:
     ```bash
     python -m venv venv
     .\venv\Scripts\activate
     ```

4. **Install Dependencies**:  
   ```bash
   pip install -e ".[dev]"
   ```

## 2. Setting up BigQuery
If you don't have a Google Cloud account, you can sign up and get free credits, which are more than enough to implement this project.

1. **Create a Google Cloud project**:
   - Go to the [Google Cloud Console](https://console.cloud.google.com/).
   - Click on the "Select a project" dropdown at the top right and select "New Project".
   - Give your project a name and follow the steps to create it.

2. **Create BigQuery datasets**:
   - In the Google Cloud Console, go to BigQuery.
   - Make two new datasets: `raw_data` for Airbyte and `transformed_data` for dbt.
     - If you pick different names, remember to change the names in the code too.
   
   **How to create a dataset:**
   - In the left sidebar, click on your project name.
   - Click “Create Dataset”.
   - Enter the dataset ID (either `raw_data` or `transformed_data`).
   - Click "Create Dataset".

3. **Create a Service Account and Assign Roles**:
   - Go to “IAM & Admin” > “Service accounts” in the Google Cloud Console.
   - Click “Create Service Account”.
   - Name your service account.
   - Assign the “BigQuery Data Editor” and “BigQuery Job User” roles to the service account.

   **How to create a service account and assign roles:**
   - While creating the service account, under the “Grant this service account access to project” section, click the “Role” dropdown.
   - Choose the “BigQuery Data Editor” and “BigQuery Job User” roles.
   - Finish the creation process.
   
4. **Generate a JSON key for the Service Account**:
   - Make a JSON key to let the service account sign in.
   
   **How to generate a JSON key:**
   - Find the service account in the “Service accounts” list.
   - Click on the service account name.
   - In the “Keys” section, click “Add Key” and pick JSON.
   - The key will download automatically. Keep it safe and don’t share it.

## 3. Setting Up Airbyte Connectors

To set up your Airbyte connectors, you can choose to do it via Terraform, or the UI. Choose one of the two following options.

### 3.1. Setting Up Airbyte Connectors with Terraform

Airbyte allows you to create connectors for sources and destinations via Terraform, facilitating data synchronization between various platforms. Here's how you can set this up:

1. **Navigate to the Airbyte Configuration Directory**:

   ```bash
   cd infra/airbyte
   ```

2. **Modify Configuration Files**:

   Within the `infra/airbyte` directory, you'll find three crucial Terraform files:
    - `provider.tf`: Defines the Airbyte provider.
    - `main.tf`: Contains the main configuration for creating Airbyte resources.
    - `variables.tf`: Holds various variables, including credentials.

   Adjust the configurations in these files to suit your project's needs: 

   - Provide credentials for your BigQuery connection in the `main.tf` file.
      - `dataset_id`: The name of the BigQuery dataset where Airbyte will load data. In this case, enter “raw_data”.
      - `project_id`: Your BigQuery project ID.
      - `credentials_json`: The contents of the service account JSON file. This value must be provided as a string, so convert the JSON content to a string beforehand.
      - `workspace_id`: Your Airbyte workspace ID, which can be found in the webapp url. For example, in this url: http://localhost:8000/workspaces/910ab70f-0a67-4d25-a983-999e99e1e395/ the workspace id would be `910ab70f-0a67-4d25-a983-999e99e1e395`.

   - Alternatively, you can utilize the `variables.tf` file to manage these credentials:
      - You’ll be prompted to enter the credentials when you execute `terraform plan` and `terraform apply`. If you go with this option, just move on to the next step. If you don’t want to use variables, remove them from the file.

3. **Initialize Terraform**:
   
   This step prepares Terraform to create the resources defined in your configuration files.
   ```bash
   terraform init
   ```

4. **Review the Plan**:

   Before applying any changes, review the plan to understand what Terraform will do.
   ```bash
   terraform plan
   ```

5. **Apply Configuration**:

   After reviewing and confirming the plan, apply the Terraform configurations to create the necessary Airbyte resources.
   ```bash
   terraform apply
   ```

6. **Verify in Airbyte UI**:

   Once Terraform completes its tasks, navigate to the [Airbyte UI](http://localhost:8000/). Here, you should see your source and destination connectors, as well as the connection between them, set up and ready to go 🎉.

### 3.2. Setting Up Airbyte Connectors Using the UI

Start by launching the Airbyte UI by going to http://localhost:8000/ in your browser. Then:

1. **Create a source**:

   - Go to the Sources tab and click on `+ New source`.
   - Search for “faker” using the search bar and select `Sample Data (Faker)`.
   - Adjust the Count and optional fields as needed for your use case. You can also leave them as is.
   - Click on `Set up source`.

2. **Create a destination**:

   - Go to the Destinations tab and click on `+ New destination`.
   - Search for “bigquery” using the search bar and select `BigQuery`.
   - Enter the connection details as needed.
   - For simplicity, you can use `Standard Inserts` as the loading method.
   - In the `Service Account Key JSON` field, enter the contents of the JSON file. Yes, the full JSON.
   - Click on `Set up destination`.

3. **Create a connection**:

   - Go to the Connections tab and click on `+ New connection`.
   - Select the source and destination you just created.
   - Enter the connection details as needed.
   - Click on `Set up connection`.

That’s it! Your connection is set up and ready to go! 🎉 

## 4. Setting Up the dbt Project

[dbt (data build tool)](https://www.getdbt.com/) allows you to transform your data by writing, documenting, and executing SQL workflows. Setting up the dbt project requires specifying connection details for your data platform, in this case, BigQuery. Here’s a step-by-step guide to help you set this up:

1. **Navigate to the dbt Project Directory**:

   Move to the directory containing the dbt configuration:
   ```bash
   cd ../../dbt_project
   ```

2. **Update Connection Details**:

   - You'll find a `profiles.yml` file within the directory. This file contains configurations for dbt to connect with your data platform. Update this file with your BigQuery connection details. Specifically, you need to update the Service Account JSON file path, the dataset location and your BigQuery project ID.
   - Provide your BigQuery project ID in the `database` field of the `/models/ecommerce/sources/faker_sources.yml` file.

   If you want to avoid hardcoding credentials in the `profiles.yml` file, you can leverage environment variables. Here's an example: `keyfile: "{{ env_var('DBT_BIGQUERY_KEYFILE_PATH', '') }}"`

3. **Test the Connection (Optional)**:
   You can test the connection to your BigQuery instance using the following command. Keep in mind that you'll need to provide the local path to your service account key file for this to work.
   
   ```bash
   dbt debug
   ```
   
   If everything is set up correctly, this command should report a successful connection to BigQuery 🎉.

## 5. Orchestrating with Prefect

[Prefect](https://prefect.io/) is a workflow orchestration tool that makes it easy to build, run, and monitor data workflows by writing Python code. In this section, we'll walk you through creating a Prefect flow that orchestrates both the Airbyte extract and load operations and the dbt transformations with Python:

1. **Navigate to the Orchestration Directory**:

   Switch to the directory containing the Prefect orchestration configurations:
   ```bash
   cd ../orchestration
   ```

2. **Update the Airbyte Connection ID**:

   Open the `my_elt_flow.py` Python script and update the `connection_id` key in the `airbyte_connection` object.

   To find your connection id go to the [Airbyte UI](http://localhost:8000/), then select the connection you want to use from the "Connections" tab and copy the ID from the URL (you'll find it after `/connections/`, i.e., `e3646db8-6612-4142-8edf-1e51932b6836`).

3. **Set Environment Variables**:

   Prefect requires certain environment variables to be set to interact with other tools, like Airbyte. Set the following variables:

   ```bash
   export AIRBYTE_PASSWORD=password
   ```

   Additionally, set the following environment variable to avoid unnecessary notifications from Prefect:

   ```bash
   export PREFECT_API_SERVICES_FLOW_RUN_NOTIFICATIONS_ENABLED=false
   ```
   
4. **Connect to Prefect's API**:

   Open a new terminal window. Start a local Prefect server instance in your virtual environment:

   ```bash
   prefect server start
   ```

5. **Deploy the Flow**:

   Go back to your previous terminal and execute the following python script:

   ```bash
   python my_elt_flow.py
   ```

   When you run the flow script, Prefect will automatically create a flow deployment that you can interact with via the UI and API. The script will stay running so that it can listen for scheduled or triggered runs of this flow; once a run is found, it will be executed within a subprocess. A minimal sketch of what such a flow can look like is shown after these steps.

6. **Access Prefect UI in Your Browser**:

   Open your browser and navigate to:
   ```
   http://127.0.0.1:4200
   ```
   You can now begin interacting with your newly created deployment! 🎉
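
For reference, here is a minimal sketch of what a flow like `my_elt_flow.py` can look like. It assumes the `prefect-airbyte` and `prefect-dbt` integrations are installed; the connection ID and paths are placeholders to replace with your own:

```python
import os

from prefect import flow
from prefect_airbyte.connections import AirbyteConnection
from prefect_airbyte.flows import run_connection_sync
from prefect_airbyte.server import AirbyteServer
from prefect_dbt.cli.commands import DbtCoreOperation

# Local Airbyte OSS instance; the password comes from the AIRBYTE_PASSWORD env var.
airbyte_server = AirbyteServer(
    server_host="localhost",
    server_port=8000,
    username="airbyte",
    password=os.getenv("AIRBYTE_PASSWORD", "password"),
)

# Replace with the connection ID copied from the Airbyte UI URL.
airbyte_connection = AirbyteConnection(
    airbyte_server=airbyte_server,
    connection_id="00000000-0000-0000-0000-000000000000",
)

@flow
def elt_flow():
    # 1. Trigger the Airbyte sync and wait for it to finish.
    run_connection_sync(airbyte_connection)
    # 2. Run the dbt transformations on the freshly loaded data.
    DbtCoreOperation(
        commands=["dbt run"],
        project_dir="../dbt_project",
        profiles_dir="../dbt_project",
    ).run()

if __name__ == "__main__":
    # Run the flow once. The quickstart's script instead keeps running by
    # registering a deployment, so it can listen for scheduled or triggered runs.
    elt_flow()
```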

## Next Steps

Congratulations on deploying and running this quickstart! 🎉 Here are some suggestions on what you can explore next to dive deeper and get more out of your project:

### 1. **Explore the Data and Insights**
   - Dive into the datasets in BigQuery, run some queries, and explore the data you've collected and transformed. This is your chance to uncover insights and understand the data better!

### 2. **Optimize Your dbt Models**
   - Review the transformations you’ve applied using dbt. Try optimizing the models or create new ones based on your evolving needs and insights you want to extract.

### 3. **Expand Your Data Sources**
   - Add more data sources to Airbyte. Explore different types of sources available, and see how they can enrich your existing datasets and broaden your analytical capabilities.

### 4. **Enhance Data Quality and Testing**
   - Implement data quality tests in dbt to ensure the reliability and accuracy of your transformations. Use dbt's testing features to validate your data and catch issues early on.

### 5. **Scale Your Setup**
   - Consider scaling your setup to handle more data, more sources, and more transformations. Optimize your configurations and resources to ensure smooth and efficient processing of larger datasets.

### 6. **Contribute to the Community**
   - Share your learnings, optimizations, and new configurations with the community. Contribute to the respective tool’s communities and help others learn and grow.

================================================
FILE: airbyte_dbt_prefect_bigquery/dbt_project/.gitignore
================================================

target/
dbt_packages/
logs/

#Desktop Services Store
.DS_Store

#User cookie
.user.yml

================================================
FILE: airbyte_dbt_prefect_bigquery/dbt_project/README.md
================================================
Welcome to your new dbt project!

### Using the starter project

Try running the following commands:
- dbt run
- dbt test


### Resources:
- Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
- Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers
- Join the [chat](https://community.getdbt.com/) on Slack for live discussions and support
- Find [dbt events](https://events.getdbt.com) near you
- Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices


================================================
FILE: airbyte_dbt_prefect_bigquery/dbt_project/analyses/.gitkeep
================================================


================================================
FILE: airbyte_dbt_prefect_bigquery/dbt_project/dbt_project.yml
================================================

# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'dbt_project'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'dbt_project'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  dbt_project:
    # Config indicated by + and applies to all files under models/example/
    example:
      +materialized: view


================================================
FILE: airbyte_dbt_prefect_bigquery/dbt_project/macros/.gitkeep
================================================


================================================
FILE: airbyte_dbt_prefect_bigquery/dbt_project/models/marts/product_popularity.sql
================================================
WITH base AS (
  SELECT 
    product_id,
    COUNT(id) AS purchase_count
  FROM {{ ref('stg_purchases') }}
  GROUP BY 1
)

SELECT 
  p.id,
  p.make,
  p.model,
  b.purchase_count
FROM {{ ref('stg_products') }} p
LEFT JOIN base b ON p.id = b.product_id
ORDER BY b.purchase_count DESC


================================================
FILE: airbyte_dbt_prefect_bigquery/dbt_project/models/marts/purchase_patterns.sql
================================================
SELECT 
  user_id,
  product_id,
  purchased_at,
  added_to_cart_at,
  TIMESTAMP_DIFF(purchased_at, added_to_cart_at, SECOND) AS time_to_purchase_seconds,
  returned_at
FROM {{ ref('stg_purchases') }}


================================================
FILE: airbyte_dbt_prefect_bigquery/dbt_project/models/marts/user_demographics.sql
================================================
WITH base AS (
  SELECT 
    id AS user_id,
    gender,
    academic_degree,
    nationality,
    age
  FROM {{ ref('stg_users') }}
)

SELECT 
  gender,
  academic_degree,
  nationality,
  AVG(age) AS average_age,
  COUNT(user_id) AS user_count
FROM base
GROUP BY 1, 2, 3


================================================
FILE: airbyte_dbt_prefect_bigquery/dbt_project/models/sources/faker_sources.yml
================================================
version: 2

sources:
  - name: faker
    database: your_bigquery_project_id # Update with your BigQuery project ID
    schema: raw_data

    tables:
      - name: users
        description: "Simulated user data from the Faker connector."
        columns:
          - name: id
            description: "Unique identifier for the user."
          - name: address
          - name: occupation
          - name: gender
          - name: academic_degree
          - name: weight
          - name: created_at
          - name: language
          - name: telephone
          - name: title
          - name: updated_at
          - name: nationality
          - name: bl
    "preview": "\ntarget/\ndbt_packages/\nlogs/\n\n#Desktop Services Store\n.DS_Store\n\n#User cookie\n.user.yml"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/dbt_project/README.md",
    "chars": 571,
    "preview": "Welcome to your new dbt project!\n\n### Using the starter project\n\nTry running the following commands:\n- dbt run\n- dbt tes"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/dbt_project/analyses/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/dbt_project/dbt_project.yml",
    "chars": 1265,
    "preview": "\n# Name your project! Project names should contain only lowercase characters\n# and underscores. A good package name shou"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/dbt_project/macros/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/dbt_project/models/example/my_first_dbt_model.sql",
    "chars": 480,
    "preview": "\n/*\n    Welcome to your first dbt model!\n    Did you know that you can also configure models directly within SQL files?\n"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/dbt_project/models/example/my_second_dbt_model.sql",
    "chars": 115,
    "preview": "\n-- Use the `ref` function to select from other models\n\nselect *\nfrom {{ ref('my_first_dbt_model') }}\nwhere id = 1\n"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/dbt_project/models/example/schema.yml",
    "chars": 437,
    "preview": "\nversion: 2\n\nmodels:\n  - name: my_first_dbt_model\n    description: \"A starter dbt model\"\n    columns:\n      - name: id\n "
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/dbt_project/models/sources.yml",
    "chars": 232,
    "preview": "version: 2\n\nsources:\n  - name: snowflake\n    tables:\n      - name: sample_table\n        meta:\n          dagster:\n       "
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/dbt_project/profiles.yml",
    "chars": 693,
    "preview": "dbt_project:\n  outputs:\n    dev:\n\n      type: snowflake\n      account: \"{{ env_var('DBT_SNOWFLAKE_ACCOUNT_ID', '') }}\"\n\n"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/dbt_project/seeds/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/dbt_project/snapshots/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/dbt_project/tests/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/infra/.gitignore",
    "chars": 883,
    "preview": "# Local .terraform directories\n**/.terraform/*\n\n# .tfstate files\n*.tfstate\n*.tfstate.*\n\n# Crash log files\ncrash.log\ncras"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/infra/airbyte/main.tf",
    "chars": 2378,
    "preview": "// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs\n\n// S"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/infra/airbyte/provider.tf",
    "chars": 609,
    "preview": "// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs\n\nterr"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/infra/airbyte/variables.tf",
    "chars": 52,
    "preview": "variable \"workspace_id\" {\n    type = string\n}\n\n\n\n\n\n\n"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/orchestration/orchestration/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/orchestration/orchestration/assets.py",
    "chars": 672,
    "preview": "import os\nfrom dagster import OpExecutionContext\nfrom dagster_dbt import DbtCliResource, dbt_assets\nfrom dagster_airbyte"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/orchestration/orchestration/constants.py",
    "chars": 673,
    "preview": "import os\nfrom pathlib import Path\n\nfrom dagster_dbt import DbtCliResource\n\ndbt_project_dir = Path(__file__).joinpath(\"."
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/orchestration/orchestration/definitions.py",
    "chars": 408,
    "preview": "import os\n\nfrom dagster import Definitions\nfrom dagster_dbt import DbtCliResource\n\nfrom .assets import dbt_project_dbt_a"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/orchestration/orchestration/schedules.py",
    "chars": 414,
    "preview": "\"\"\"\nTo add a daily schedule that materializes your dbt assets, uncomment the following lines.\n\"\"\"\nfrom dagster_dbt impor"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/orchestration/pyproject.toml",
    "chars": 175,
    "preview": "[build-system]\nrequires = [\"setuptools\"]\nbuild-backend = \"setuptools.build_meta\"\n\n[tool.dagster]\nmodule_name = \"orchestr"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/orchestration/setup.py",
    "chars": 367,
    "preview": "from setuptools import find_packages, setup\n\nsetup(\n    name=\"orchestration\",\n    version=\"0.0.1\",\n    packages=find_pac"
  },
  {
    "path": "airbyte_dbt_dagster_snowflake/setup.py",
    "chars": 323,
    "preview": "from setuptools import find_packages, setup\n\nsetup(\n    name=\"airbyte-dbt-dagster-snowflake\",\n    packages=find_packages"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/.gitignore",
    "chars": 3112,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/README.md",
    "chars": 13915,
    "preview": "# Airbyte-dbt-Prefect-BigQuery Integration\n\nWelcome to the Prefect, Airbyte, dbt (PAD) Stack with BigQuery quickstart! T"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/.gitignore",
    "chars": 87,
    "preview": "\ntarget/\ndbt_packages/\nlogs/\n\n#Desktop Services Store\n.DS_Store\n\n#User cookie\n.user.yml"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/README.md",
    "chars": 571,
    "preview": "Welcome to your new dbt project!\n\n### Using the starter project\n\nTry running the following commands:\n- dbt run\n- dbt tes"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/analyses/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/dbt_project.yml",
    "chars": 1265,
    "preview": "\n# Name your project! Project names should contain only lowercase characters\n# and underscores. A good package name shou"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/macros/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/models/marts/product_popularity.sql",
    "chars": 283,
    "preview": "WITH base AS (\n  SELECT \n    product_id,\n    COUNT(id) AS purchase_count\n  FROM {{ ref('stg_purchases') }}\n  GROUP BY 1\n"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/models/marts/purchase_patterns.sql",
    "chars": 201,
    "preview": "SELECT \n  user_id,\n  product_id,\n  purchased_at,\n  added_to_cart_at,\n  TIMESTAMP_DIFF(purchased_at, added_to_cart_at, SE"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/models/marts/user_demographics.sql",
    "chars": 272,
    "preview": "WITH base AS (\n  SELECT \n    id AS user_id,\n    gender,\n    academic_degree,\n    nationality,\n    age\n  FROM {{ ref('stg"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/models/sources/faker_sources.yml",
    "chars": 1839,
    "preview": "version: 2\n\nsources:\n  - name: faker\n    database: your_bigquery_project_id # Update with your BigQuery project ID\n    s"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/models/staging/stg_products.sql",
    "chars": 155,
    "preview": "select\n    id,\n    year,\n    price,\n    model,\n    make,\n    created_at,\n    updated_at,\n    _airbyte_extracted_at,\nfrom"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/models/staging/stg_purchases.sql",
    "chars": 200,
    "preview": "select\n    id,\n    user_id,\n    product_id,\n    updated_at,\n    purchased_at,\n    returned_at,\n    created_at,\n    added"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/models/staging/stg_users.sql",
    "chars": 201,
    "preview": "select\n    id,\n    gender,\n    academic_degree,\n    title,\n    nationality,\n    age,\n    name,\n    email,\n    created_at"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/profiles.yml",
    "chars": 365,
    "preview": "dbt_project:\n  outputs:\n    dev:\n      dataset: transformed_data\n      job_execution_timeout_seconds: 300\n      job_retr"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/seeds/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/snapshots/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/dbt_project/tests/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/infra/.gitignore",
    "chars": 883,
    "preview": "# Local .terraform directories\n**/.terraform/*\n\n# .tfstate files\n*.tfstate\n*.tfstate.*\n\n# Crash log files\ncrash.log\ncras"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/infra/airbyte/main.tf",
    "chars": 1310,
    "preview": "// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs\n\n// S"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/infra/airbyte/provider.tf",
    "chars": 609,
    "preview": "// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs\n\nterr"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/infra/airbyte/variables.tf",
    "chars": 146,
    "preview": "variable \"workspace_id\" {\n    type = string\n}\n\nvariable \"project_id\" {\n    type = string\n}\n\nvariable \"credentials_json_p"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/orchestration/my_elt_flow.py",
    "chars": 1800,
    "preview": "import os\n\nfrom prefect import flow, task\n\nfrom prefect_airbyte.server import AirbyteServer\nfrom prefect_airbyte.connect"
  },
  {
    "path": "airbyte_dbt_prefect_bigquery/setup.py",
    "chars": 287,
    "preview": "from setuptools import find_packages, setup\n\nsetup(\n    name=\"airbyte-dbt-prefect-bigquery\",\n    packages=find_packages("
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/.gitignore",
    "chars": 3113,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/README.md",
    "chars": 8869,
    "preview": "# Airbyte-dbt-Prefect-Snowflake Integration\n\nWelcome to the \"Airbyte-dbt-Prefect-Snowflake Integration\" repository! This"
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/dbt_project/.gitignore",
    "chars": 88,
    "preview": "\ntarget/\ndbt_packages/\nlogs/\n\n#Desktop Services Store\n.DS_Store\n\n#User cookie\n.user.yml\n"
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/dbt_project/README.md",
    "chars": 571,
    "preview": "Welcome to your new dbt project!\n\n### Using the starter project\n\nTry running the following commands:\n- dbt run\n- dbt tes"
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/dbt_project/analyses/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/dbt_project/dbt_project.yml",
    "chars": 1265,
    "preview": "\n# Name your project! Project names should contain only lowercase characters\n# and underscores. A good package name shou"
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/dbt_project/macros/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/dbt_project/models/example/my_first_dbt_model.sql",
    "chars": 480,
    "preview": "\n/*\n    Welcome to your first dbt model!\n    Did you know that you can also configure models directly within SQL files?\n"
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/dbt_project/models/example/my_second_dbt_model.sql",
    "chars": 115,
    "preview": "\n-- Use the `ref` function to select from other models\n\nselect *\nfrom {{ ref('my_first_dbt_model') }}\nwhere id = 1\n"
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/dbt_project/models/example/schema.yml",
    "chars": 437,
    "preview": "\nversion: 2\n\nmodels:\n  - name: my_first_dbt_model\n    description: \"A starter dbt model\"\n    columns:\n      - name: id\n "
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/dbt_project/models/sources.yml",
    "chars": 232,
    "preview": "version: 2\n\nsources:\n  - name: snowflake\n    tables:\n      - name: sample_table\n        meta:\n          dagster:\n       "
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/dbt_project/profiles.yml",
    "chars": 694,
    "preview": "dbt_project:\n  outputs:\n    dev:\n\n      type: snowflake\n      account: \"{{ env_var('DBT_SNOWFLAKE_ACCOUNT_ID', '') }}\"\n\n"
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/dbt_project/seeds/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/dbt_project/snapshots/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/dbt_project/tests/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/infra/.gitignore",
    "chars": 884,
    "preview": "# Local .terraform directories\n**/.terraform/*\n\n# .tfstate files\n*.tfstate\n*.tfstate.*\n\n# Crash log files\ncrash.log\ncras"
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/infra/airbyte/main.tf",
    "chars": 2139,
    "preview": "// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs\n\n// S"
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/infra/airbyte/provider.tf",
    "chars": 610,
    "preview": "// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs\n\nterr"
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/infra/airbyte/variables.tf",
    "chars": 46,
    "preview": "variable \"workspace_id\" {\n    type = string\n}\n"
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/orchestration/my_elt_flow.py",
    "chars": 1895,
    "preview": "import os\n\nfrom prefect import flow, task\n\nfrom prefect_airbyte.server import AirbyteServer\nfrom prefect_airbyte.connect"
  },
  {
    "path": "airbyte_dbt_prefect_snowflake/setup.py",
    "chars": 317,
    "preview": "from setuptools import find_packages, setup\n\nsetup(\n    name=\"airbyte-dbt-prefect-snowflake\",\n    packages=find_packages"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/.gitignore",
    "chars": 3112,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/README.md",
    "chars": 16021,
    "preview": "# Airbyte-dbt-Snowflake-Looker Integration\n\nWelcome to the \"Airbyte-dbt-Snowflake-Looker Integration\" repository! This r"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/dbt_project/.gitignore",
    "chars": 87,
    "preview": "\ntarget/\ndbt_packages/\nlogs/\n\n#Desktop Services Store\n.DS_Store\n\n#User cookie\n.user.yml"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/dbt_project/README.md",
    "chars": 571,
    "preview": "Welcome to your new dbt project!\n\n### Using the starter project\n\nTry running the following commands:\n- dbt run\n- dbt tes"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/dbt_project/analyses/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_snowflake_looker/dbt_project/dbt_project.yml",
    "chars": 1265,
    "preview": "\n# Name your project! Project names should contain only lowercase characters\n# and underscores. A good package name shou"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/dbt_project/macros/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_snowflake_looker/dbt_project/models/example/my_first_dbt_model.sql",
    "chars": 480,
    "preview": "\n/*\n    Welcome to your first dbt model!\n    Did you know that you can also configure models directly within SQL files?\n"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/dbt_project/models/example/my_second_dbt_model.sql",
    "chars": 115,
    "preview": "\n-- Use the `ref` function to select from other models\n\nselect *\nfrom {{ ref('my_first_dbt_model') }}\nwhere id = 1\n"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/dbt_project/models/example/schema.yml",
    "chars": 437,
    "preview": "\nversion: 2\n\nmodels:\n  - name: my_first_dbt_model\n    description: \"A starter dbt model\"\n    columns:\n      - name: id\n "
  },
  {
    "path": "airbyte_dbt_snowflake_looker/dbt_project/models/sources.yml",
    "chars": 232,
    "preview": "version: 2\n\nsources:\n  - name: snowflake\n    tables:\n      - name: sample_table\n        meta:\n          dagster:\n       "
  },
  {
    "path": "airbyte_dbt_snowflake_looker/dbt_project/profiles.yml",
    "chars": 693,
    "preview": "dbt_project:\n  outputs:\n    dev:\n\n      type: snowflake\n      account: \"{{ env_var('DBT_SNOWFLAKE_ACCOUNT_ID', '') }}\"\n\n"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/dbt_project/seeds/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_snowflake_looker/dbt_project/snapshots/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_snowflake_looker/dbt_project/tests/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_snowflake_looker/infra/.gitignore",
    "chars": 883,
    "preview": "# Local .terraform directories\n**/.terraform/*\n\n# .tfstate files\n*.tfstate\n*.tfstate.*\n\n# Crash log files\ncrash.log\ncras"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/infra/airbyte/main.tf",
    "chars": 2138,
    "preview": "// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs\n\n// S"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/infra/airbyte/provider.tf",
    "chars": 609,
    "preview": "// Airbyte Terraform provider documentation: https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs\n\nterr"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/infra/airbyte/variables.tf",
    "chars": 52,
    "preview": "variable \"workspace_id\" {\n    type = string\n}\n\n\n\n\n\n\n"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/orchestration/orchestration/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "airbyte_dbt_snowflake_looker/orchestration/orchestration/assets.py",
    "chars": 672,
    "preview": "import os\nfrom dagster import OpExecutionContext\nfrom dagster_dbt import DbtCliResource, dbt_assets\nfrom dagster_airbyte"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/orchestration/orchestration/constants.py",
    "chars": 673,
    "preview": "import os\nfrom pathlib import Path\n\nfrom dagster_dbt import DbtCliResource\n\ndbt_project_dir = Path(__file__).joinpath(\"."
  },
  {
    "path": "airbyte_dbt_snowflake_looker/orchestration/orchestration/definitions.py",
    "chars": 408,
    "preview": "import os\n\nfrom dagster import Definitions\nfrom dagster_dbt import DbtCliResource\n\nfrom .assets import dbt_project_dbt_a"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/orchestration/orchestration/schedules.py",
    "chars": 414,
    "preview": "\"\"\"\nTo add a daily schedule that materializes your dbt assets, uncomment the following lines.\n\"\"\"\nfrom dagster_dbt impor"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/orchestration/pyproject.toml",
    "chars": 175,
    "preview": "[build-system]\nrequires = [\"setuptools\"]\nbuild-backend = \"setuptools.build_meta\"\n\n[tool.dagster]\nmodule_name = \"orchestr"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/orchestration/setup.py",
    "chars": 367,
    "preview": "from setuptools import find_packages, setup\n\nsetup(\n    name=\"orchestration\",\n    version=\"0.0.1\",\n    packages=find_pac"
  },
  {
    "path": "airbyte_dbt_snowflake_looker/setup.py",
    "chars": 322,
    "preview": "from setuptools import find_packages, setup\n\nsetup(\n    name=\"airbyte-dbt-snowflake-looker\",\n    packages=find_packages("
  }
]

// ... and 553 more files (download for full content)

About this extraction

This page contains the full source code of the airbytehq/quickstarts GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 753 files (4.4 MB), approximately 1.2M tokens, and a symbol index with 37 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input.
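
The file index above is a plain JSON array of objects with "path", "chars", and "preview" fields, so it can be filtered programmatically before handing selected files to a model. A minimal sketch, assuming the index has been saved on its own as a standalone JSON file (the name files_index.json is hypothetical, not part of the extraction):

import json
from collections import Counter
from pathlib import Path

# Load the file index: a JSON array of {"path", "chars", "preview"} objects.
entries = json.loads(Path("files_index.json").read_text())

# Count files per quickstart (the top-level directory of each path).
by_quickstart = Counter(entry["path"].split("/")[0] for entry in entries)
for quickstart, count in by_quickstart.most_common():
    print(f"{quickstart}: {count} files")

# List only the Terraform configuration files with their sizes in characters.
terraform_files = [
    (entry["path"], entry["chars"])
    for entry in entries
    if entry["path"].endswith(".tf")
]
for path, chars in terraform_files:
    print(f"{chars:>6} chars  {path}")

Filtering on "path" and "chars" like this keeps the token budget down by selecting only the relevant quickstart's files, while "preview" gives enough context to decide whether a file is worth pulling in full.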

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
