Repository: microsoft/c9-python-getting-started
Branch: master
Commit: a74d0ea8451a
Files: 231
Total size: 149.7 MB

Directory structure:
gitextract_0t_49sca/

├── .gitignore
├── CODE_OF_CONDUCT.md
├── LICENSE
├── README.md
├── SECURITY.md
├── even-more-python-for-beginners-data-tools/
│   ├── 01 - Jupyter Notebooks/
│   │   └── README.md
│   ├── 02 - Introduction to Anaconda and Conda/
│   │   └── README.md
│   ├── 03 - Intro to Pandas/
│   │   ├── 03 - Pandas Series and DataFrame.ipynb
│   │   └── README.md
│   ├── 04 - Examining Pandas DataFrame contents/
│   │   ├── 04 - Exploring pandas DataFrame contents.ipynb
│   │   └── README.md
│   ├── 05 - Query a pandas Dataframe/
│   │   ├── 05 - Querying DataFrames.ipynb
│   │   └── README.md
│   ├── 06 - CSV Files and Jupyter Notebooks/
│   │   ├── README.md
│   │   └── airports.csv
│   ├── 07 - Read and write CSV files from pandas DataFrames/
│   │   ├── 07 - Read write CSV files.ipynb
│   │   ├── README.md
│   │   ├── airports.csv
│   │   ├── airportsBlankValues.csv
│   │   ├── airportsInvalidRows.csv
│   │   └── airportsNoHeaderRows.csv
│   ├── 08 - Removing and splitting DataFrame columns/
│   │   ├── 08 - Removing columns.ipynb
│   │   ├── README.md
│   │   └── flight_delays.csv
│   ├── 09 - Handling duplicates and rows with missing values/
│   │   ├── 09 - Removing rows.ipynb
│   │   ├── Lots_of_flight_data.csv
│   │   ├── README.md
│   │   └── airportsDuplicateRows.csv
│   ├── 10 - Splitting test and training data with scikit-learn/
│   │   ├── 10 - Train Test split.ipynb
│   │   ├── Lots_of_flight_data.csv
│   │   └── README.md
│   ├── 11 - Train a linear regression model with scikit-learn/
│   │   ├── 11 - Train a basic model.ipynb
│   │   ├── Lots_of_flight_data.csv
│   │   └── README.md
│   ├── 12 - Testing a model/
│   │   ├── 12 - Test a model.ipynb
│   │   ├── Lots_of_flight_data.csv
│   │   └── README.md
│   ├── 13 - Evaluating accuracy of a model using calculations/
│   │   ├── 13 - Evaluate accuracy.ipynb
│   │   ├── Lots_of_flight_data.csv
│   │   └── README.md
│   ├── 14 - NumPy vs Pandas/
│   │   ├── 14 - Working with numpy and pandas.ipynb
│   │   ├── Lots_of_flight_data.csv
│   │   └── README.md
│   ├── 15 - Visualizing data with Matplotlib/
│   │   ├── 15 - Visualizing correlations.ipynb
│   │   ├── Lots_of_flight_data.csv
│   │   └── README.md
│   ├── README.md
│   └── Slides/
│       ├── 01 - Jupyter Notebooks.pptx
│       ├── 02 - Intro to Anaconda and conda.pptx
│       ├── 03 - Pandas series and DataFrame.pptx
│       ├── 04 - Examining pandas DataFrame contents.pptx
│       ├── 05 - Query a pandas DataFrame.pptx
│       ├── 06 - CSV Files and Jupyter notebooks.pptx
│       ├── 07 - Read and write CSV files from DataFrames.pptx
│       ├── 08 - Remove columns from DataFrame.pptx
│       ├── 09 - Remove rows with missing values.pptx
│       ├── 10 - Splitting test and training data.pptx
│       ├── 11 - Train a linear regression model with scikitlearn.pptx
│       ├── 12 - Testing a model.pptx
│       ├── 13 - Evaluate accuracy of a model using calculations.pptx
│       ├── 14 - Working with numpy and pandas.pptx
│       └── 15 - Visualizing Data Correlations with Matplotlib.pptx
├── more-python-for-beginners/
│   ├── .gitignore
│   ├── 01 - Formatting and linting/
│   │   ├── .vscode/
│   │   │   └── settings.json
│   │   ├── README.md
│   │   ├── bad.py
│   │   └── good.py
│   ├── 02 - Lambdas/
│   │   ├── README.md
│   │   ├── failed_sort.py
│   │   ├── lambda_sorter.py
│   │   └── method_sorter.py
│   ├── 03 - Classes/
│   │   ├── README.md
│   │   ├── basic_class.py
│   │   └── properties_class.py
│   ├── 04 - Inheritance/
│   │   ├── README.md
│   │   └── demo.py
│   ├── 05 - Mixins/
│   │   ├── README.md
│   │   └── demo.py
│   ├── 06 - Managing the file system/
│   │   ├── README.md
│   │   ├── demo.txt
│   │   ├── directories.py
│   │   ├── files.py
│   │   └── paths.py
│   ├── 07 - Reading and writing files/
│   │   ├── README.md
│   │   ├── demo.txt
│   │   ├── manage.py
│   │   ├── read.py
│   │   └── write.py
│   ├── 08 - Managing external resources/
│   │   ├── README.md
│   │   ├── demo.py
│   │   └── output.txt
│   ├── 09 - Asynchronous programming/
│   │   ├── README.md
│   │   ├── async_demo.py
│   │   └── sync_demo.py
│   ├── README.md
│   ├── Slides/
│   │   ├── 01 - Formatting and linting.pptx
│   │   ├── 02 - Lambdas.pptx
│   │   ├── 03 - Classes.pptx
│   │   ├── 04 - Inhheritance.pptx
│   │   ├── 05 - Mixins (multiple inheritance).pptx
│   │   ├── 06 - Managing the file system.pptx
│   │   ├── 07 - Working with files.pptx
│   │   ├── 08 - Cleanup with with.pptx
│   │   └── 09 - Asynchronous operations.pptx
│   └── requirements.txt
└── python-for-beginners/
    ├── 02 - Print/
    │   ├── README.md
    │   ├── ask_for_input.py
    │   ├── coding_challenge.py
    │   ├── coding_challenge_solution.py
    │   ├── hello_world.py
    │   ├── print_blank_line.py
    │   └── single_or_double_quotes.py
    ├── 03 - Comments/
    │   ├── README.md
    │   ├── comments_are_not_executed.py
    │   ├── comments_for_debugging.py
    │   ├── enable_pin.py
    │   └── string_in_double_quotes.py
    ├── 04 - String variables/
    │   ├── README.md
    │   ├── code_challenge.py
    │   ├── code_challenge_solution.py
    │   ├── combine_strings.py
    │   ├── format_strings.py
    │   ├── string_functions.py
    │   └── strings_in_variables.py
    ├── 05 - Numeric variables/
    │   ├── README.md
    │   ├── code_challenge.py
    │   ├── code_challenge_solution.py
    │   ├── combining_strings_and_numbers.py
    │   ├── convert_strings_to_numbers_for_math.py
    │   ├── doing_math.py
    │   ├── numbers_treated_as_strings.py
    │   └── print_pi.py
    ├── 06 - Dates/
    │   ├── README.md
    │   ├── code_challenge.py
    │   ├── code_challenge_solution.py
    │   ├── date_functions.py
    │   ├── format_date.py
    │   ├── get_current_date.py
    │   └── input_date.py
    ├── 07 - Error handling/
    │   ├── README.md
    │   ├── logic.py
    │   ├── runtime.py
    │   └── syntax.py
    ├── 08 - Handling conditions/
    │   ├── README.md
    │   ├── add_else.py
    │   ├── add_else_different_indentation.py
    │   ├── case_insensitive_comparisons.py
    │   ├── check_tax.py
    │   ├── code_challenge.py
    │   ├── code_challenge_solution.py
    │   └── comparing_strings.py
    ├── 09 - Handling multiple conditions/
    │   ├── README.md
    │   ├── add_else_to_elif.py
    │   ├── code_challenge.py
    │   ├── code_challenge_solution.py
    │   ├── multiple_if_statements.py
    │   ├── nested_if.py
    │   ├── or_statements.py
    │   ├── use_elif.py
    │   └── use_in_statements.py
    ├── 10 - Complex conditon checks/
    │   ├── boolean_variables.py
    │   ├── code_challenge.py
    │   ├── code_challenge_solution.py
    │   ├── readme.md
    │   └── using_and.py
    ├── 11 - Collections/
    │   ├── README.md
    │   ├── arrays.py
    │   ├── common-operations.py
    │   ├── dictionaries.py
    │   ├── lists.py
    │   └── ranges.py
    ├── 12 - Loops/
    │   ├── README.md
    │   ├── for.py
    │   ├── number.py
    │   └── while.py
    ├── 13 - Functions/
    │   ├── README.md
    │   ├── code_challenge.py
    │   ├── code_challenge_solution.py
    │   ├── get_initails_function.py
    │   ├── get_initials.py
    │   ├── getting_clever_with_functions_harder_to_read.py
    │   ├── print_time_function.py
    │   ├── print_time_function_different_messages.py
    │   ├── print_time_function_fix_import.py
    │   ├── print_time_repeated_code.py
    │   └── print_time_with_message_parameter.py
    ├── 14 - Function parameters/
    │   ├── code_challenge.py
    │   ├── code_challenge_solution.py
    │   ├── get_initials_default_values.py
    │   ├── get_initials_function.py
    │   ├── get_initials_multiple_parameters.py
    │   ├── get_initials_named_parameters.py
    │   ├── named_parameters_make_code_readable.py
    │   └── readme.md
    ├── 15 - Packages/
    │   ├── README.md
    │   ├── color_import_demo.py
    │   ├── helpers.py
    │   ├── import_module.py
    │   └── requirements.txt
    ├── 16 - Calling APIs/
    │   ├── call_api.py
    │   ├── code_challenge.py
    │   └── readme.md
    ├── 17 - JSON/
    │   ├── create_json_from_dict.py
    │   ├── create_json_with_list.py
    │   ├── create_json_with_nested_dict.py
    │   ├── read_json.py
    │   ├── read_key_pair.py
    │   ├── read_key_pair_list.py
    │   ├── read_subkey.py
    │   └── readme.md
    ├── 18 - Decorators/
    │   ├── README.md
    │   └── creating_decorators.py
    ├── README.md
    └── Slides/
        ├── 0 - Intro.pptx
        ├── 1 - Getting started.pptx
        ├── 10 - ComplexConditionChecks.pptx
        ├── 11 - Collections.pptx
        ├── 12 - Loops.pptx
        ├── 13 - Functions.pptx
        ├── 14 - FunctionParameters.pptx
        ├── 15 - ModulesPackages.pptx
        ├── 16 - CallingAPI.pptx
        ├── 17 - JSON.pptx
        ├── 2 - Print.pptx
        ├── 3 - Comments.pptx
        ├── 4 - StringVariables.pptx
        ├── 5 - NumericVariables.pptx
        ├── 6 - Dates.pptx
        ├── 7 - ErrorHandling.pptx
        ├── 8 - Conditions.pptx
        └── 9 - MultipleConditions.pptx

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
env

# Created by https://www.gitignore.io/api/python,visualstudiocode
# Edit at https://www.gitignore.io/?templates=python,visualstudiocode

### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don’t work, or not
#   install all needed dependencies.
#Pipfile.lock

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

### VisualStudioCode ###
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json

### VisualStudioCode Patch ###
# Ignore all local history of files
.history

# End of https://www.gitignore.io/api/python,visualstudiocode


================================================
FILE: CODE_OF_CONDUCT.md
================================================
# Microsoft Open Source Code of Conduct

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/?WT.mc_id=python-c9-niner).

Resources:

- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/?WT.mc_id=python-c9-niner)
- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/?WT.mc_id=python-c9-niner)
- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com?WT.mc_id=python-c9-niner) with questions or concerns


================================================
FILE: LICENSE
================================================
    MIT License

    Copyright (c) Microsoft Corporation.

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to deal
    in the Software without restriction, including without limitation the rights
    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
    copies of the Software, and to permit persons to whom the Software is
    furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all
    copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
    SOFTWARE.


================================================
FILE: README.md
================================================
# Getting started with Python

## Overview

These three series on Channel 9 and YouTube are designed to help get you up to speed on Python. If you're a beginning developer looking to add Python to your quiver of languages, or trying to get started on a data science or web project that uses Python, these videos show you the foundations you need to walk through a tutorial or other quick start.

We do assume you are familiar with another programming language and some core programming concepts. For example, we highlight the syntax for boolean expressions and creating classes, but we don't dig into what a [boolean](https://en.wikipedia.org/wiki/Boolean_data_type) is or what [object-oriented design](https://en.wikipedia.org/wiki/Object-oriented_design) involves. We show you how to perform in Python the tasks you're already familiar with from other languages.

### What you'll learn

- The basics of Python
- Common syntax
- Popular packages

## Prerequisites

- Light experience with another programming language, such as [JavaScript](https://www.edx.org/course/javascript-introduction), [Java](https://www.java.com) or [C#](https://docs.microsoft.com/dotnet/csharp/)
- [An understanding of Git](https://git-scm.com/book/en/v1/Getting-Started)

## Courses

### Getting started

[Python for beginners](https://aka.ms/pythonbeginnerseries) is the perfect place to start. No Python experience is required! We'll show you how to set up [Visual Studio Code](https://code.visualstudio.com?WT.mc_id=python-c9-niner) as your code editor, and start creating Python code. You'll see how to create, structure and run your code, how to manage packages, and even make [REST calls](https://en.wikipedia.org/wiki/Representational_state_transfer).

### Dig a little deeper

[More Python for beginners](https://aka.ms/morepython) digs deeper into Python syntax. You'll explore how to create classes and mixins in Python, how to work with the file system, and get an introduction to `async/await`. This is the perfect next step if you're looking to see a bit more of what Python can do.

### Peek at data science tools

[Even more Python for beginners](https://aka.ms/evenmorepython) is a practical exploration of a couple of the most common packages and tools you'll use when working with data and machine learning. While we won't dig into why you choose particular machine learning models (that's another course), you will get hands-on with Jupyter Notebooks, and create and test models using scikit-learn and pandas.


## Next steps

The goal of these courses is to help get you up to speed on Python so you can work through a quick start. The next step after completing the videos is to follow a tutorial! Here are a few of our favorites:

- [Quickstart: Detect faces in an image using the Face REST API and Python](https://docs.microsoft.com/azure/cognitive-services/face/QuickStarts/Python?WT.mc_id=python-c9-niner)
- [Quickstart: Analyze a local image using the Computer Vision REST API and Python](https://docs.microsoft.com/azure/cognitive-services/computer-vision/quickstarts/python-disk?WT.mc_id=python-c9-niner)
- [Quickstart: Using the Python REST API to call the Text Analytics Cognitive Service](https://docs.microsoft.com/azure/cognitive-services/Text-Analytics/quickstarts/python?WT.mc_id=python-c9-niner)
- [Tutorial: Build a Flask app with Azure Cognitive Services](https://docs.microsoft.com/azure/cognitive-services/translator/tutorial-build-flask-app-translation-synthesis?WT.mc_id=python-c9-niner)
- [Flask tutorial in Visual Studio Code](https://code.visualstudio.com/docs/python/tutorial-flask?WT.mc_id=python-c9-niner)
- [Django tutorial in Visual Studio Code](https://code.visualstudio.com/docs/python/tutorial-django?WT.mc_id=python-c9-niner)
- [Predict flight delays by creating a machine learning model in Python](https://docs.microsoft.com/learn/modules/predict-flight-delays-with-python?WT.mc_id=python-c9-niner)

## Contributing

This project welcomes contributions and suggestions.  Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.


When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.


================================================
FILE: SECURITY.md
================================================
<!-- BEGIN MICROSOFT SECURITY.MD V0.0.2 BLOCK -->

## Security

Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [many more](https://opensource.microsoft.com/?WT.mc_id=python-c9-niner).

If you believe you have found a security vulnerability in any Microsoft-owned repository that meets Microsoft's [definition](https://docs.microsoft.com/previous-versions/tn-archive/cc751383(v=technet.10)?WT.mc_id=python-c9-niner) of a security vulnerability, please report it to us as described below.

## Reporting Security Issues

**Please do not report security vulnerabilities through public GitHub issues.**

Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report?WT.mc_id=python-c9-niner).

If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/msrc/pgp-key-msrc?WT.mc_id=python-c9-niner).

You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc?WT.mc_id=python-c9-niner).

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:

  * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
  * Full paths of source file(s) related to the manifestation of the issue
  * The location of the affected source code (tag/branch/commit or direct URL)
  * Any special configuration required to reproduce the issue
  * Step-by-step instructions to reproduce the issue
  * Proof-of-concept or exploit code (if possible)
  * Impact of the issue, including how an attacker might exploit the issue

This information will help us triage your report more quickly.

If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty?WT.mc_id=python-c9-niner) page for more details about our active programs.

## Preferred Languages

We prefer all communications to be in English.

## Policy

Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/msrc/cvd?WT.mc_id=python-c9-niner).

<!-- END MICROSOFT SECURITY.MD BLOCK -->


================================================
FILE: even-more-python-for-beginners-data-tools/01 - Jupyter Notebooks/README.md
================================================
# Jupyter Notebooks

Jupyter Notebook is an open source web application that allows you to create and share documents containing Python code. Notebooks are frequently used for data science. The code samples in this course are completed using Jupyter Notebooks, which have a .ipynb file extension.
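
An .ipynb file is a JSON document, so you can inspect one with nothing but the Python standard library. The sketch below is illustrative only: the `summarize_notebook` helper and the hand-written sample notebook are hypothetical and not part of the course material, and the code assumes the nbformat 4 layout, where cells sit in a top-level `cells` list.

```python
import json
from collections import Counter


def summarize_notebook(notebook_text):
    """Count the cells of each type in notebook JSON text (nbformat 4 layout assumed)."""
    notebook = json.loads(notebook_text)
    # Each cell is a dict with a "cell_type" such as "markdown" or "code".
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))


# A tiny hand-written notebook used only for illustration.
sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# pandas Series"]},
        {"cell_type": "code", "metadata": {}, "source": ["import pandas as pd"],
         "execution_count": None, "outputs": []},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
})

print(dict(summarize_notebook(sample)))
```

To try this on a real notebook, read the file's text with `open(path, encoding="utf-8").read()` and pass it to the helper.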

## Documentation

- [Jupyter](https://jupyter.org/) to install Jupyter so you can run Jupyter Notebooks locally on your computer
- [Jupyter Notebook viewer](https://nbviewer.jupyter.org/) to view Jupyter Notebooks in this GitHub repository without installing Jupyter
- [Azure Notebooks](https://notebooks.azure.com/) to create a free Azure Notebooks account to run Notebooks in the cloud
- [Create and run a notebook](https://docs.microsoft.com/azure/notebooks/tutorial-create-run-jupyter-notebook?WT.mc_id=python-c9-niner) is a tutorial that walks you through the process of using Azure Notebooks to create a complete Jupyter Notebook that demonstrates linear regression
- [How to create and clone projects](https://docs.microsoft.com/azure/notebooks/create-clone-jupyter-notebooks?WT.mc_id=python-c9-niner) to create a project
- [Manage and configure projects in Azure Notebooks](https://docs.microsoft.com/azure/notebooks/configure-manage-azure-notebooks-projects?WT.mc_id=python-c9-niner) to upload Notebooks to your project

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)


================================================
FILE: even-more-python-for-beginners-data-tools/02 - Introduction to Anaconda and Conda/README.md
================================================
# Anaconda

[Anaconda](https://www.anaconda.com/) is an open source distribution of Python and R for data science. It includes more than 1,500 packages, a graphical interface called Anaconda Navigator, a command line interface called Anaconda Prompt, and a tool called Conda.

## Conda

Python code often relies on external libraries stored in packages. Conda is an open source package management system and environment management system. Conda helps you manage environments and install packages for Jupyter Notebooks.
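
From inside Python you can check which environment is running your code and whether a package is available in it. This is a minimal sketch rather than part of the course samples: `describe_environment` is a hypothetical helper, and it assumes the `CONDA_DEFAULT_ENV` variable that `conda activate` normally sets (the variable is absent outside an activated Conda environment, in which case the value is `None`).

```python
import importlib.util
import os
import sys


def describe_environment(packages):
    """Report the active environment and which of the given packages are importable."""
    return {
        # Name of the activated Conda environment, or None outside Conda.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Directory of the Python installation (or environment) running this code.
        "prefix": sys.prefix,
        # find_spec returns None when a top-level package cannot be found.
        "installed": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


info = describe_environment(["json", "pandas"])
print(info)
```

If `pandas` shows up as not installed, the Jupyter kernel is probably running in a different environment from the one where you installed the package.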

## Documentation

- [Conda home page](https://docs.conda.io/)
- [Managing Conda environments](https://docs.conda.io/projects/conda/latest/user-guide/tasks/manage-environments.html) to find links and instructions for creating, activating, and deactivating Conda environments
- [Managing packages](https://docs.conda.io/projects/conda/latest/user-guide/getting-started.html#managing-packages) to learn how to install packages in a Conda environment
- [Conda cheat sheet](https://docs.conda.io/projects/conda/latest/user-guide/cheatsheet.html) is a handy quick reference of common Conda commands


================================================
FILE: even-more-python-for-beginners-data-tools/03 - Intro to Pandas/03 - Pandas Series and DataFrame.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# pandas Series and DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## pandas\n",
    "**pandas** is an open source library providing data structures and data analysis tools for Python programmers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Series\n",
    "The pandas **Series** is a one-dimensional array, similar to a Python list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     Seattle-Tacoma\n",
       "1             Dulles\n",
       "2    London Heathrow\n",
       "3           Schiphol\n",
       "4             Changi\n",
       "5            Pearson\n",
       "6             Narita\n",
       "dtype: object"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports = pd.Series([\n",
    "                      'Seattle-Tacoma', \n",
    "                      'Dulles', \n",
    "                      'London Heathrow', \n",
    "                      'Schiphol', \n",
    "                      'Changi', \n",
    "                      'Pearson', \n",
    "                      'Narita'\n",
    "                      ])\n",
    "\n",
    "# When using a notebook, you can call print(airports)\n",
    "# to examine the contents of a variable,\n",
    "# or display the value by ending the cell with the object name\n",
    "airports"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can reference an individual value in a Series using its index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'London Heathrow'"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports[2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use a loop to iterate through all the values in a Series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Seattle-Tacoma\n",
      "Dulles\n",
      "London Heathrow\n",
      "Schiphol\n",
      "Changi\n",
      "Pearson\n",
      "Narita\n"
     ]
    }
   ],
   "source": [
    "for value in airports:\n",
    "    print(value) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DataFrame\n",
    "Most of the time when we are working with pandas we are dealing with two-dimensional arrays\n",
    "\n",
    "The pandas **DataFrame** can store two-dimensional arrays"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seatte-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>London Heathrow</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Amsterdam</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 0           1               2\n",
       "0    Seatte-Tacoma     Seattle             USA\n",
       "1           Dulles  Washington             USA\n",
       "2  London Heathrow      London  United Kingdom\n",
       "3         Schiphol   Amsterdam     Netherlands\n",
       "4           Changi   Singapore       Singapore\n",
       "5          Pearson     Toronto          Canada\n",
       "6           Narita       Tokyo           Japan"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports = pd.DataFrame([\n",
    "                        ['Seatte-Tacoma', 'Seattle', 'USA'],\n",
    "                        ['Dulles', 'Washington', 'USA'],\n",
    "                        ['London Heathrow', 'London', 'United Kingdom'],\n",
    "                        ['Schiphol', 'Amsterdam', 'Netherlands'],\n",
    "                        ['Changi', 'Singapore', 'Singapore'],\n",
    "                        ['Pearson', 'Toronto', 'Canada'],\n",
    "                        ['Narita', 'Tokyo', 'Japan']\n",
    "                        ])\n",
    "\n",
    "airports"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the **columns** parameter to specify names for the columns when you create the DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seatte-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>London Heathrow</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Amsterdam</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Name        City         Country\n",
       "0    Seatte-Tacoma     Seattle             USA\n",
       "1           Dulles  Washington             USA\n",
       "2  London Heathrow      London  United Kingdom\n",
       "3         Schiphol   Amsterdam     Netherlands\n",
       "4           Changi   Singapore       Singapore\n",
       "5          Pearson     Toronto          Canada\n",
       "6           Narita       Tokyo           Japan"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports = pd.DataFrame([\n",
    "                        ['Seatte-Tacoma', 'Seattle', 'USA'],\n",
    "                        ['Dulles', 'Washington', 'USA'],\n",
    "                        ['London Heathrow', 'London', 'United Kingdom'],\n",
    "                        ['Schiphol', 'Amsterdam', 'Netherlands'],\n",
    "                        ['Changi', 'Singapore', 'Singapore'],\n",
    "                        ['Pearson', 'Toronto', 'Canada'],\n",
    "                        ['Narita', 'Tokyo', 'Japan']\n",
    "                        ],\n",
    "                        columns = ['Name', 'City', 'Country']\n",
    "                        )\n",
    "\n",
    "airports "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: even-more-python-for-beginners-data-tools/03 - Intro to Pandas/README.md
================================================
# pandas

[pandas](https://pandas.pydata.org) is an open source Python library that contains a number of high-performance data structures and tools for data analysis.

## Documentation

- [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) stores one-dimensional arrays
- [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) stores two-dimensional arrays and can contain different datatypes
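
As a minimal sketch of the two structures described above (the variable names and the shortened airport data are illustrative assumptions, not part of the lesson notebook), a Series and a DataFrame could be created like this:

```python
import pandas as pd

# A Series stores a one-dimensional array of values with an index.
cities = pd.Series(['Seattle', 'Washington', 'London'])

# A DataFrame stores two-dimensional data; each column can have its own datatype.
airports = pd.DataFrame(
    [['Dulles', 'Washington', 'USA'],
     ['Schiphol', 'Amsterdam', 'Netherlands']],
    columns=['Name', 'City', 'Country']
)

# Selecting a single column of a DataFrame gives back a Series.
city_column = airports['City']

print(cities.shape)    # one dimension
print(airports.shape)  # rows, columns
```

Selecting a single column with `airports['City']` returns a Series, which is one way to see how the two structures relate.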

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)


================================================
FILE: even-more-python-for-beginners-data-tools/04 - Examining Pandas DataFrame contents/04 - Exploring pandas DataFrame contents.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Examining pandas DataFrame contents\n",
    "It's useful to be able to quickly examine the contents of a DataFrame. \n",
    "\n",
    "Let's start by importing the pandas library and creating a DataFrame populated with information about airports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seatte-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Heathrow</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Amsterdam</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Name        City         Country\n",
       "0  Seatte-Tacoma     Seattle             USA\n",
       "1         Dulles  Washington             USA\n",
       "2       Heathrow      London  United Kingdom\n",
       "3       Schiphol   Amsterdam     Netherlands\n",
       "4         Changi   Singapore       Singapore\n",
       "5        Pearson     Toronto          Canada\n",
       "6         Narita       Tokyo           Japan"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports = pd.DataFrame([\n",
    "                        ['Seatte-Tacoma', 'Seattle', 'USA'],\n",
    "                        ['Dulles', 'Washington', 'USA'],\n",
    "                        ['Heathrow', 'London', 'United Kingdom'],\n",
    "                        ['Schiphol', 'Amsterdam', 'Netherlands'],\n",
    "                        ['Changi', 'Singapore', 'Singapore'],\n",
    "                        ['Pearson', 'Toronto', 'Canada'],\n",
    "                        ['Narita', 'Tokyo', 'Japan']\n",
    "                        ],\n",
    "                        columns = ['Name', 'City', 'Country']\n",
    "                        )\n",
    "\n",
    "airports "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Returning first *n* rows\n",
    "If you have thousands of rows, you might just want to look at the first few rows\n",
    "\n",
    "* **head**(*n*) returns the top *n* rows "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seatte-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Heathrow</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Name        City         Country\n",
       "0  Seatte-Tacoma     Seattle             USA\n",
       "1         Dulles  Washington             USA\n",
       "2       Heathrow      London  United Kingdom"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Returning last *n* rows\n",
    "Looking at the last rows in a DataFrame can be a good way to check that all your data loaded correctly\n",
    "* **tail**(*n*) returns the last *n* rows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name       City    Country\n",
       "4   Changi  Singapore  Singapore\n",
       "5  Pearson    Toronto     Canada\n",
       "6   Narita      Tokyo      Japan"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports.tail(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Checking number of rows and columns in DataFrame\n",
    "Sometimes you just need to know how much data you have in the DataFrame\n",
    "\n",
    "* **shape** returns the number of rows and columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(7, 3)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting more detailed information about DataFrame contents\n",
    "\n",
    "* **info**() returns more detailed information about the DataFrame\n",
    "\n",
    "Information returned includes:\n",
    "* The number of rows, and the range of index values\n",
    "* The number of columns\n",
    "* For each column: column name, number of non-null values, the datatype\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 7 entries, 0 to 6\n",
      "Data columns (total 3 columns):\n",
      "Name       7 non-null object\n",
      "City       7 non-null object\n",
      "Country    7 non-null object\n",
      "dtypes: object(3)\n",
      "memory usage: 148.0+ bytes\n"
     ]
    }
   ],
   "source": [
    "airports.info()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: even-more-python-for-beginners-data-tools/04 - Examining Pandas DataFrame contents/README.md
================================================
# Examining pandas DataFrame contents

The pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) is a structure for storing two-dimensional tabular data.

## Common functions

- [head](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) returns the first *n* rows from the DataFrame
- [tail](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) returns the last *n* rows from the DataFrame
- [shape](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html) returns the dimensions of the DataFrame (the number of rows and the number of columns)
- [info](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html) provides a summary of the DataFrame content including column names, their datatypes, and number of rows containing non-null values
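
The functions listed above can be tried together in a short sketch. The four-row DataFrame and the variable names here are illustrative assumptions; the lesson notebook uses a longer airport list.

```python
import pandas as pd

airports = pd.DataFrame(
    [['Dulles', 'Washington', 'USA'],
     ['Schiphol', 'Amsterdam', 'Netherlands'],
     ['Changi', 'Singapore', 'Singapore'],
     ['Narita', 'Tokyo', 'Japan']],
    columns=['Name', 'City', 'Country']
)

first_two = airports.head(2)  # the first two rows
last_two = airports.tail(2)   # the last two rows
dimensions = airports.shape   # a (rows, columns) tuple

print(first_two)
print(last_two)
print(dimensions)

# info() prints its summary directly instead of returning a DataFrame
airports.info()
```

Note that `shape` is an attribute, so it is written without parentheses, while `head`, `tail`, and `info` are methods that are called.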


================================================
FILE: even-more-python-for-beginners-data-tools/05 - Query a pandas Dataframe/05 - Querying DataFrames.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Query a pandas DataFrame \n",
    "\n",
    "Returning a portion of the data in a DataFrame is called slicing or dicing the data\n",
    "\n",
    "There are many different ways to query a pandas DataFrame. Here are a few to get you started."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seatte-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>London Heathrow</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Amsterdam</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Name        City         Country\n",
       "0    Seatte-Tacoma     Seattle             USA\n",
       "1           Dulles  Washington             USA\n",
       "2  London Heathrow      London  United Kingdom\n",
       "3         Schiphol   Amsterdam     Netherlands\n",
       "4           Changi   Singapore       Singapore\n",
       "5          Pearson     Toronto          Canada\n",
       "6           Narita       Tokyo           Japan"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports = pd.DataFrame([\n",
    "                        ['Seatte-Tacoma', 'Seattle', 'USA'],\n",
    "                        ['Dulles', 'Washington', 'USA'],\n",
    "                        ['London Heathrow', 'London', 'United Kingdom'],\n",
    "                        ['Schiphol', 'Amsterdam', 'Netherlands'],\n",
    "                        ['Changi', 'Singapore', 'Singapore'],\n",
    "                        ['Pearson', 'Toronto', 'Canada'],\n",
    "                        ['Narita', 'Tokyo', 'Japan']\n",
    "                        ],\n",
    "                        columns = ['Name', 'City', 'Country']\n",
    "                        )\n",
    "airports "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Return one column\n",
    "Specify the name of the column you want to return\n",
    "* *DataFrameName*['*columnName*']\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       Seattle\n",
       "1    Washington\n",
       "2        London\n",
       "3     Amsterdam\n",
       "4     Singapore\n",
       "5       Toronto\n",
       "6         Tokyo\n",
       "Name: City, dtype: object"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports['City']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Return multiple columns\n",
    "Provide a list of the columns you want to return\n",
    "* *DataFrameName*[['*FirstColumnName*','*SecondColumnName*',...]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seatte-Tacoma</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>London Heathrow</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Name         Country\n",
       "0    Seatte-Tacoma             USA\n",
       "1           Dulles             USA\n",
       "2  London Heathrow  United Kingdom\n",
       "3         Schiphol     Netherlands\n",
       "4           Changi       Singapore\n",
       "5          Pearson          Canada\n",
       "6           Narita           Japan"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports[['Name', 'Country']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using *iloc* to specify rows and columns to return\n",
    "**iloc**[*rows*,*columns*] allows you to access a group of rows or columns by row and column index positions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Specify the positions of the row and column you want returned\n",
    "* First row is row 0\n",
    "* First column is column 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Seatte-Tacoma'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return the value in the first row, first column\n",
    "airports.iloc[0,0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'United Kingdom'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return the value in the third row, third column\n",
    "airports.iloc[2,2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A value of *:* returns all rows or all columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seatte-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>London Heathrow</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Amsterdam</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Name        City         Country\n",
       "0    Seatte-Tacoma     Seattle             USA\n",
       "1           Dulles  Washington             USA\n",
       "2  London Heathrow      London  United Kingdom\n",
       "3         Schiphol   Amsterdam     Netherlands\n",
       "4           Changi   Singapore       Singapore\n",
       "5          Pearson     Toronto          Canada\n",
       "6           Narita       Tokyo           Japan"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports.iloc[:,:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can request a range of rows or a range of columns\n",
    "* [x:y] will return rows or columns from position x up to, but not including, position y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seatte-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Name        City Country\n",
       "0  Seatte-Tacoma     Seattle     USA\n",
       "1         Dulles  Washington     USA"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return the first two rows and display all columns \n",
    "airports.iloc[0:2,:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seatte-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>London Heathrow</td>\n",
       "      <td>London</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Amsterdam</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Name        City\n",
       "0    Seatte-Tacoma     Seattle\n",
       "1           Dulles  Washington\n",
       "2  London Heathrow      London\n",
       "3         Schiphol   Amsterdam\n",
       "4           Changi   Singapore\n",
       "5          Pearson     Toronto\n",
       "6           Narita       Tokyo"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return all rows and display the first two columns\n",
    "airports.iloc[:,0:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also request a list of row positions or a list of column positions\n",
    "* [x,y,z] will return the rows or columns at positions x, y, and z"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seatte-Tacoma</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>London Heathrow</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Name         Country\n",
       "0    Seatte-Tacoma             USA\n",
       "1           Dulles             USA\n",
       "2  London Heathrow  United Kingdom\n",
       "3         Schiphol     Netherlands\n",
       "4           Changi       Singapore\n",
       "5          Pearson          Canada\n",
       "6           Narita           Japan"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return all rows and display the first and third columns (positions 0 and 2)\n",
    "airports.iloc[:,[0,2]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using *loc* to specify columns by name\n",
    "To select columns by name rather than by position, use **loc** instead of **iloc**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seatte-Tacoma</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>London Heathrow</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Name         Country\n",
       "0    Seatte-Tacoma             USA\n",
       "1           Dulles             USA\n",
       "2  London Heathrow  United Kingdom\n",
       "3         Schiphol     Netherlands\n",
       "4           Changi       Singapore\n",
       "5          Pearson          Canada\n",
       "6           Narita           Japan"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Return all rows and display the Name and Country columns, selected by name\n",
    "airports.loc[:,['Name', 'Country']]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: even-more-python-for-beginners-data-tools/05 - Query a pandas Dataframe/README.md
================================================
# Query a pandas DataFrame

The pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) is a structure for storing two-dimensional tabular data.

## Common properties

- [loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) returns specific rows and columns selected by label, such as column names
- [iloc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html) returns specific rows and columns selected by integer position, starting at 0
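
As a minimal sketch of how the two properties differ, the example below builds a small DataFrame and selects the same columns both ways. The `airports` sample data and the variable names are assumptions made for illustration; they mirror the airport data used in this module's notebook rather than reproduce it.

```python
import pandas as pd

# Hypothetical sample data, modeled on the airports used in this module
airports = pd.DataFrame(
    [
        ["Seattle-Tacoma", "Seattle", "USA"],
        ["Dulles", "Washington", "USA"],
        ["Heathrow", "London", "United Kingdom"],
    ],
    columns=["Name", "City", "Country"],
)

# iloc selects by integer position: all rows, first and third columns
by_position = airports.iloc[:, [0, 2]]

# loc selects by label: all rows, the columns named Name and Country
by_label = airports.loc[:, ["Name", "Country"]]

# Slices behave differently: iloc excludes the end position,
# while loc includes the end label
first_two_rows = airports.iloc[0:2]        # positions 0 and 1
rows_through_label_2 = airports.loc[0:2]   # labels 0, 1, and 2

print(by_position.equals(by_label))
print(len(first_two_rows), len(rows_through_label_2))
```

With the default integer index, row labels and row positions happen to coincide, which is why the slicing difference is easy to miss.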

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)


================================================
FILE: even-more-python-for-beginners-data-tools/06 - CSV Files and Jupyter Notebooks/README.md
================================================
# CSV Files and Jupyter Notebooks

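CSV (comma-separated values) files are plain-text files in which each line is a row and commas separate the values in that row. CSV files are frequently used to store data. To access the data in a CSV file from a Jupyter Notebook, you must first upload the file so that the notebook can read it.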
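
To make the format concrete, here is a small sketch that parses CSV text with Python's standard-library `csv` module. The `csv_text` string is an assumption for illustration: it holds the first few rows of this lesson's airports.csv in memory so the example runs without uploading a file.

```python
import csv
import io

# Illustrative in-memory CSV text; the first line is the header row
csv_text = """Name,City,Country
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,United Kingdom
"""

# DictReader uses the header row as keys for each following row
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

print(reader.fieldnames)
print(rows[2]["Country"])
```

Later lessons in this series load CSV files with pandas instead, which reads the same format directly into a DataFrame.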

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)


================================================
FILE: even-more-python-for-beginners-data-tools/06 - CSV Files and Jupyter Notebooks/airports.csv
================================================
Name,City,Country
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,United Kingdom
Schiphol,Amsterdam,Netherlands
Changi,Singapore,Singapore
Pearson,Toronto,Canada
Narita,Tokyo,Japan


================================================
FILE: even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/07 - Read write CSV files.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Read and write CSV files with pandas DataFrames\n",
    "\n",
    "You can load data from a CSV file directly into a pandas DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reading a CSV file into a pandas DataFrame\n",
    "**read_csv** allows you to read the contents of a csv file into a DataFrame\n",
    "\n",
    "airports.csv contains the following:  \n",
    "\n",
    "Name,City,Country  \n",
    "Seattle-Tacoma,Seattle,USA  \n",
    "Dulles,Washington,USA  \n",
    "Heathrow,London,United Kingdom  \n",
    "Schiphol,Amsterdam,Netherlands  \n",
    "Changi,Singapore,Singapore  \n",
    "Pearson,Toronto,Canada  \n",
    "Narita,Tokyo,Japan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Heathrow</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Amsterdam</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Name        City         Country\n",
       "0  Seattle-Tacoma     Seattle             USA\n",
       "1          Dulles  Washington             USA\n",
       "2        Heathrow      London  United Kingdom\n",
       "3        Schiphol   Amsterdam     Netherlands\n",
       "4          Changi   Singapore       Singapore\n",
       "5         Pearson     Toronto          Canada\n",
       "6          Narita       Tokyo           Japan"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports_df = pd.read_csv('Data/airports.csv')\n",
    "airports_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Handling rows with errors\n",
    "By default, a row with an extra comma or another formatting problem causes **read_csv** to raise an error\n",
    "\n",
    "Note the extra comma in the Heathrow row of airportsInvalidRows.csv:  \n",
    "\n",
    "Name,City,Country  \n",
    "Seattle-Tacoma,Seattle,USA  \n",
    "Dulles,Washington,USA  \n",
    "Heathrow,London,,United Kingdom  \n",
    "Schiphol,Amsterdam,Netherlands  \n",
    "Changi,Singapore,Singapore  \n",
    "Pearson,Toronto,Canada  \n",
    "Narita,Tokyo,Japan  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "ename": "ParserError",
     "evalue": "Error tokenizing data. C error: Expected 3 fields in line 4, saw 4\n",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mParserError\u001b[0m                               Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-3-73bdf61a29e1>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mairports_df\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'Data/airportsInvalidRows.csv'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m      2\u001b[0m \u001b[0mairports_df\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36mparser_f\u001b[1;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)\u001b[0m\n\u001b[0;32m    683\u001b[0m         )\n\u001b[0;32m    684\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 685\u001b[1;33m         \u001b[1;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m    686\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    687\u001b[0m     \u001b[0mparser_f\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mname\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36m_read\u001b[1;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[0;32m    461\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    462\u001b[0m     \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 463\u001b[1;33m         \u001b[0mdata\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mparser\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m    464\u001b[0m     \u001b[1;32mfinally\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    465\u001b[0m         \u001b[0mparser\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mclose\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36mread\u001b[1;34m(self, nrows)\u001b[0m\n\u001b[0;32m   1152\u001b[0m     \u001b[1;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mNone\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1153\u001b[0m         \u001b[0mnrows\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0m_validate_integer\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"nrows\"\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1154\u001b[1;33m         \u001b[0mret\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   1155\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1156\u001b[0m         \u001b[1;31m# May alter columns / col_dict\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36mread\u001b[1;34m(self, nrows)\u001b[0m\n\u001b[0;32m   2046\u001b[0m     \u001b[1;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mNone\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   2047\u001b[0m         \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2048\u001b[1;33m             \u001b[0mdata\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_reader\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   2049\u001b[0m         \u001b[1;32mexcept\u001b[0m \u001b[0mStopIteration\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   2050\u001b[0m             \u001b[1;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_first_chunk\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader.read\u001b[1;34m()\u001b[0m\n",
      "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_low_memory\u001b[1;34m()\u001b[0m\n",
      "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_rows\u001b[1;34m()\u001b[0m\n",
      "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._tokenize_rows\u001b[1;34m()\u001b[0m\n",
      "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.raise_parser_error\u001b[1;34m()\u001b[0m\n",
      "\u001b[1;31mParserError\u001b[0m: Error tokenizing data. C error: Expected 3 fields in line 4, saw 4\n"
     ]
    }
   ],
   "source": [
    "airports_df = pd.read_csv('Data/airportsInvalidRows.csv')\n",
    "airports_df"
   ]
  },
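  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Specify **error_bad_lines=False** to skip any rows with errors. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; in those versions, use **on_bad_lines='skip'** instead"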
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "b'Skipping line 4: expected 3 fields, saw 4\\n'\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Amsterdam</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Name        City      Country\n",
       "0  Seattle-Tacoma     Seattle          USA\n",
       "1          Dulles  Washington          USA\n",
       "2        Schiphol   Amsterdam  Netherlands\n",
       "3          Changi   Singapore    Singapore\n",
       "4         Pearson     Toronto       Canada\n",
       "5          Narita       Tokyo        Japan"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports_df = pd.read_csv(\n",
    "                          'Data/airportsInvalidRows.csv', \n",
    "                           error_bad_lines=False\n",
    "                           )\n",
    "airports_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Handling files which do not contain column headers\n",
    "If your file does not have column headers in the first row, then by default the first row of data is treated as the headers\n",
    "\n",
    "airportsNoHeaderRows.csv contains airport data but does not have a row specifying the column headers:\n",
    "\n",
    "Seattle-Tacoma,Seattle,USA  \n",
    "Dulles,Washington,USA  \n",
    "Heathrow,London,United Kingdom  \n",
    "Schiphol,Amsterdam,Netherlands  \n",
    "Changi,Singapore,Singapore  \n",
    "Pearson,Toronto,Canada  \n",
    "Narita,Tokyo,Japan  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Seattle-Tacoma</th>\n",
       "      <th>Seattle</th>\n",
       "      <th>USA</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Heathrow</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Amsterdam</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Seattle-Tacoma     Seattle             USA\n",
       "0         Dulles  Washington             USA\n",
       "1       Heathrow      London  United Kingdom\n",
       "2       Schiphol   Amsterdam     Netherlands\n",
       "3         Changi   Singapore       Singapore\n",
       "4        Pearson     Toronto          Canada\n",
       "5         Narita       Tokyo           Japan"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports_df = pd.read_csv('Data/airportsNoHeaderRows.csv')\n",
    "airports_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Specify **header=None** if your file has no header row, so that the first row of data is not treated as column headers. The columns are then numbered starting at 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Heathrow</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Amsterdam</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                0           1               2\n",
       "0  Seattle-Tacoma     Seattle             USA\n",
       "1          Dulles  Washington             USA\n",
       "2        Heathrow      London  United Kingdom\n",
       "3        Schiphol   Amsterdam     Netherlands\n",
       "4          Changi   Singapore       Singapore\n",
       "5         Pearson     Toronto          Canada\n",
       "6          Narita       Tokyo           Japan"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports_df = pd.read_csv(\n",
    "                          'Data/airportsNoHeaderRows.csv', \n",
    "                           header=None\n",
    "                           )\n",
    "airports_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If your file has no header row, you can use the **names** parameter to specify the column names when the data is loaded"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Heathrow</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Amsterdam</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Name        City         Country\n",
       "0  Seattle-Tacoma     Seattle             USA\n",
       "1          Dulles  Washington             USA\n",
       "2        Heathrow      London  United Kingdom\n",
       "3        Schiphol   Amsterdam     Netherlands\n",
       "4          Changi   Singapore       Singapore\n",
       "5         Pearson     Toronto          Canada\n",
       "6          Narita       Tokyo           Japan"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports_df = pd.read_csv(\n",
    "                          'Data/airportsNoHeaderRows.csv', \n",
    "                          header=None, \n",
    "                          names=['Name', 'City', 'Country']\n",
    "                          )\n",
    "airports_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Missing values in Data files\n",
    "Missing values appear in DataFrames as **NaN**\n",
    "\n",
    "There is no city listed for Schiphol airport in airportsBlankValues.csv :\n",
    "\n",
    "Name,City,Country  \n",
    "Seattle-Tacoma,Seattle,USA  \n",
    "Dulles,Washington,USA  \n",
    "Heathrow,London,United Kingdom  \n",
    "Schiphol,,Netherlands  \n",
    "Changi,Singapore,Singapore  \n",
    "Pearson,Toronto,Canada  \n",
    "Narita,Tokyo,Japan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Heathrow</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Name        City         Country\n",
       "0  Seattle-Tacoma     Seattle             USA\n",
       "1          Dulles  Washington             USA\n",
       "2        Heathrow      London  United Kingdom\n",
       "3        Schiphol         NaN     Netherlands\n",
       "4          Changi   Singapore       Singapore\n",
       "5         Pearson     Toronto          Canada\n",
       "6          Narita       Tokyo           Japan"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports_df = pd.read_csv('Data/airportsBlankValues.csv')\n",
    "airports_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Writing DataFrame contents to a CSV file\n",
    "**to_csv** will write the contents of a pandas DataFrame to a CSV file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Heathrow</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Name        City         Country\n",
       "0  Seattle-Tacoma     Seattle             USA\n",
       "1          Dulles  Washington             USA\n",
       "2        Heathrow      London  United Kingdom\n",
       "3        Schiphol         NaN     Netherlands\n",
       "4          Changi   Singapore       Singapore\n",
       "5         Pearson     Toronto          Canada\n",
       "6          Narita       Tokyo           Japan"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "airports_df.to_csv('Data/MyNewCSVFile.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The index column is written to the csv file\n",
    "\n",
    "Specify **index=False** if you do not want the index column to be included in the csv file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "airports_df.to_csv(\n",
    "                   'Data/MyNewCSVFileNoIndex.csv', \n",
    "                    index=False\n",
    "                    )"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/README.md
================================================
# Read and write CSV files from pandas DataFrames

You can populate a DataFrame with the data in a CSV file.
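
As a minimal sketch of the idea, the snippet below reads a headerless CSV and locates a missing value. To stay self-contained it reads a small inline sample through `io.StringIO` instead of the lesson's data files; the sample rows and the `missing_city` variable name are assumptions made for illustration.

```python
import io

import pandas as pd

# Inline sample (illustrative): no header row, and Schiphol has a blank City.
csv_text = (
    "Seattle-Tacoma,Seattle,USA\n"
    "Schiphol,,Netherlands\n"
    "Narita,Tokyo,Japan\n"
)

# header=None tells pandas the first line is data, not column names;
# names supplies the column labels to use instead.
airports_df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["Name", "City", "Country"],
)

# The blank City field is loaded as NaN; isna() flags those rows.
missing_city = airports_df[airports_df["City"].isna()]
print(missing_city)
```

Passing a file path such as the notebook's `'Data/airportsNoHeaderRows.csv'` in place of the `StringIO` object should behave the same way.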

## Common functions and properties

- [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) reads a comma-separated values file into a DataFrame
- [to_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html) writes contents of a DataFrame to a comma-separated values file
- [NaN](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html) is the default representation of missing values
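
The effect of the `index` argument of `to_csv` can be seen with a short sketch. It builds a two-row DataFrame by hand (illustrative data, not read from the lesson files) and calls `to_csv` without a path, which returns the CSV text as a string so the output can be inspected directly.

```python
import pandas as pd

# Hand-built sample with one missing City value (illustrative data).
airports_df = pd.DataFrame(
    {
        "Name": ["Heathrow", "Schiphol"],
        "City": ["London", None],
        "Country": ["United Kingdom", "Netherlands"],
    }
)

# With no path argument, to_csv returns the CSV content as a string.
with_index = airports_df.to_csv()
without_index = airports_df.to_csv(index=False)

# First line of each result: the header row.
print(with_index.splitlines()[0])     # begins with an empty label for the index column
print(without_index.splitlines()[0])  # contains only the column names
```

Missing values are written as empty fields by default, so a file written this way reads back in with NaN in the same positions.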

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)


================================================
FILE: even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/airports.csv
================================================
Name,City,Country
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,United Kingdom
Schiphol,Amsterdam,Netherlands
Changi,Singapore,Singapore
Pearson,Toronto,Canada
Narita,Tokyo,Japan


================================================
FILE: even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/airportsBlankValues.csv
================================================
Name,City,Country
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,United Kingdom
Schiphol,,Netherlands
Changi,Singapore,Singapore
Pearson,Toronto,Canada
Narita,Tokyo,Japan


================================================
FILE: even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/airportsInvalidRows.csv
================================================
Name,City,Country
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,,United Kingdom
Schiphol,Amsterdam,Netherlands
Changi,Singapore,Singapore
Pearson,Toronto,Canada
Narita,Tokyo,Japan


================================================
FILE: even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/airportsNoHeaderRows.csv
================================================
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,United Kingdom
Schiphol,Amsterdam,Netherlands
Changi,Singapore,Singapore
Pearson,Toronto,Canada
Narita,Tokyo,Japan


================================================
FILE: even-more-python-for-beginners-data-tools/08 - Removing and splitting DataFrame columns/08 - Removing columns.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Removing and splitting pandas DataFrame columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When you are preparing to train machine learning models, you often need to delete specific columns, or split certain columns from your DataFrame into a new DataFrame.\n",
    "\n",
    "We need the pandas library and a DataFrame to explore"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's load a bigger csv file with more columns, **flight_delays.csv** provides information about flights and flight delays"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>FL_DATE</th>\n",
       "      <th>OP_UNIQUE_CARRIER</th>\n",
       "      <th>TAIL_NUM</th>\n",
       "      <th>OP_CARRIER_FL_NUM</th>\n",
       "      <th>ORIGIN</th>\n",
       "      <th>DEST</th>\n",
       "      <th>CRS_DEP_TIME</th>\n",
       "      <th>DEP_TIME</th>\n",
       "      <th>DEP_DELAY</th>\n",
       "      <th>CRS_ARR_TIME</th>\n",
       "      <th>ARR_TIME</th>\n",
       "      <th>ARR_DELAY</th>\n",
       "      <th>CRS_ELAPSED_TIME</th>\n",
       "      <th>ACTUAL_ELAPSED_TIME</th>\n",
       "      <th>AIR_TIME</th>\n",
       "      <th>DISTANCE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N221WN</td>\n",
       "      <td>802</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>BWI</td>\n",
       "      <td>905</td>\n",
       "      <td>903</td>\n",
       "      <td>-2</td>\n",
       "      <td>1450</td>\n",
       "      <td>1433</td>\n",
       "      <td>-17</td>\n",
       "      <td>225</td>\n",
       "      <td>210</td>\n",
       "      <td>197</td>\n",
       "      <td>1670</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N8329B</td>\n",
       "      <td>3744</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>BWI</td>\n",
       "      <td>1500</td>\n",
       "      <td>1458</td>\n",
       "      <td>-2</td>\n",
       "      <td>2045</td>\n",
       "      <td>2020</td>\n",
       "      <td>-25</td>\n",
       "      <td>225</td>\n",
       "      <td>202</td>\n",
       "      <td>191</td>\n",
       "      <td>1670</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N920WN</td>\n",
       "      <td>1019</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>DAL</td>\n",
       "      <td>1800</td>\n",
       "      <td>1802</td>\n",
       "      <td>2</td>\n",
       "      <td>2045</td>\n",
       "      <td>2032</td>\n",
       "      <td>-13</td>\n",
       "      <td>105</td>\n",
       "      <td>90</td>\n",
       "      <td>80</td>\n",
       "      <td>580</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N480WN</td>\n",
       "      <td>1499</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>DAL</td>\n",
       "      <td>950</td>\n",
       "      <td>947</td>\n",
       "      <td>-3</td>\n",
       "      <td>1235</td>\n",
       "      <td>1223</td>\n",
       "      <td>-12</td>\n",
       "      <td>105</td>\n",
       "      <td>96</td>\n",
       "      <td>81</td>\n",
       "      <td>580</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N227WN</td>\n",
       "      <td>3635</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>DAL</td>\n",
       "      <td>1150</td>\n",
       "      <td>1151</td>\n",
       "      <td>1</td>\n",
       "      <td>1430</td>\n",
       "      <td>1423</td>\n",
       "      <td>-7</td>\n",
       "      <td>100</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "      <td>580</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      FL_DATE OP_UNIQUE_CARRIER TAIL_NUM  OP_CARRIER_FL_NUM ORIGIN DEST  \\\n",
       "0  2018-10-01                WN   N221WN                802    ABQ  BWI   \n",
       "1  2018-10-01                WN   N8329B               3744    ABQ  BWI   \n",
       "2  2018-10-01                WN   N920WN               1019    ABQ  DAL   \n",
       "3  2018-10-01                WN   N480WN               1499    ABQ  DAL   \n",
       "4  2018-10-01                WN   N227WN               3635    ABQ  DAL   \n",
       "\n",
       "   CRS_DEP_TIME  DEP_TIME  DEP_DELAY  CRS_ARR_TIME  ARR_TIME  ARR_DELAY  \\\n",
       "0           905       903         -2          1450      1433        -17   \n",
       "1          1500      1458         -2          2045      2020        -25   \n",
       "2          1800      1802          2          2045      2032        -13   \n",
       "3           950       947         -3          1235      1223        -12   \n",
       "4          1150      1151          1          1430      1423         -7   \n",
       "\n",
       "   CRS_ELAPSED_TIME  ACTUAL_ELAPSED_TIME  AIR_TIME  DISTANCE  \n",
       "0               225                  210       197      1670  \n",
       "1               225                  202       191      1670  \n",
       "2               105                   90        80       580  \n",
       "3               105                   96        81       580  \n",
       "4               100                   92        80       580  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "delays_df = pd.read_csv('Data/flight_delays.csv')\n",
    "delays_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Removing a column from a DataFrame.\n",
    "\n",
    "When you are preparing your data for machine learning, you may need to delete specific columns from the DataFrame before training the model.\n",
    "\n",
    "For example:\n",
    "Imagine you are training a model to predict how many minutes late a flight will be (ARR_DELAY)\n",
    "\n",
    "If the model knew the scheduled arrival time (CRS_ARR_TIME) and the actual arrival time (ARR_TIME), the model would quickly figure out ARR_DELAY = ARR_TIME - CRS_ARR_TIME\n",
    "\n",
    "When we predict arrival times for future flights, we won't have a value for  arrival time (ARR_TIME). So we should remove this column from the DataFrame so it is not used as a feature when training the model to predict ARR_DELAY.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>FL_DATE</th>\n",
       "      <th>OP_UNIQUE_CARRIER</th>\n",
       "      <th>TAIL_NUM</th>\n",
       "      <th>OP_CARRIER_FL_NUM</th>\n",
       "      <th>ORIGIN</th>\n",
       "      <th>DEST</th>\n",
       "      <th>CRS_DEP_TIME</th>\n",
       "      <th>DEP_TIME</th>\n",
       "      <th>DEP_DELAY</th>\n",
       "      <th>CRS_ARR_TIME</th>\n",
       "      <th>ARR_DELAY</th>\n",
       "      <th>CRS_ELAPSED_TIME</th>\n",
       "      <th>ACTUAL_ELAPSED_TIME</th>\n",
       "      <th>AIR_TIME</th>\n",
       "      <th>DISTANCE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N221WN</td>\n",
       "      <td>802</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>BWI</td>\n",
       "      <td>905</td>\n",
       "      <td>903</td>\n",
       "      <td>-2</td>\n",
       "      <td>1450</td>\n",
       "      <td>-17</td>\n",
       "      <td>225</td>\n",
       "      <td>210</td>\n",
       "      <td>197</td>\n",
       "      <td>1670</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N8329B</td>\n",
       "      <td>3744</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>BWI</td>\n",
       "      <td>1500</td>\n",
       "      <td>1458</td>\n",
       "      <td>-2</td>\n",
       "      <td>2045</td>\n",
       "      <td>-25</td>\n",
       "      <td>225</td>\n",
       "      <td>202</td>\n",
       "      <td>191</td>\n",
       "      <td>1670</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N920WN</td>\n",
       "      <td>1019</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>DAL</td>\n",
       "      <td>1800</td>\n",
       "      <td>1802</td>\n",
       "      <td>2</td>\n",
       "      <td>2045</td>\n",
       "      <td>-13</td>\n",
       "      <td>105</td>\n",
       "      <td>90</td>\n",
       "      <td>80</td>\n",
       "      <td>580</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N480WN</td>\n",
       "      <td>1499</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>DAL</td>\n",
       "      <td>950</td>\n",
       "      <td>947</td>\n",
       "      <td>-3</td>\n",
       "      <td>1235</td>\n",
       "      <td>-12</td>\n",
       "      <td>105</td>\n",
       "      <td>96</td>\n",
       "      <td>81</td>\n",
       "      <td>580</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N227WN</td>\n",
       "      <td>3635</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>DAL</td>\n",
       "      <td>1150</td>\n",
       "      <td>1151</td>\n",
       "      <td>1</td>\n",
       "      <td>1430</td>\n",
       "      <td>-7</td>\n",
       "      <td>100</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "      <td>580</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      FL_DATE OP_UNIQUE_CARRIER TAIL_NUM  OP_CARRIER_FL_NUM ORIGIN DEST  \\\n",
       "0  2018-10-01                WN   N221WN                802    ABQ  BWI   \n",
       "1  2018-10-01                WN   N8329B               3744    ABQ  BWI   \n",
       "2  2018-10-01                WN   N920WN               1019    ABQ  DAL   \n",
       "3  2018-10-01                WN   N480WN               1499    ABQ  DAL   \n",
       "4  2018-10-01                WN   N227WN               3635    ABQ  DAL   \n",
       "\n",
       "   CRS_DEP_TIME  DEP_TIME  DEP_DELAY  CRS_ARR_TIME  ARR_DELAY  \\\n",
       "0           905       903         -2          1450        -17   \n",
       "1          1500      1458         -2          2045        -25   \n",
       "2          1800      1802          2          2045        -13   \n",
       "3           950       947         -3          1235        -12   \n",
       "4          1150      1151          1          1430         -7   \n",
       "\n",
       "   CRS_ELAPSED_TIME  ACTUAL_ELAPSED_TIME  AIR_TIME  DISTANCE  \n",
       "0               225                  210       197      1670  \n",
       "1               225                  202       191      1670  \n",
       "2               105                   90        80       580  \n",
       "3               105                   96        81       580  \n",
       "4               100                   92        80       580  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Remove the column ARR_TIME from the DataFrane delays_df\n",
    "\n",
    "#delays_df = delays_df.drop(['ARR_TIME'],axis=1)\n",
    "new_df = delays_df.drop(columns=['ARR_TIME'])\n",
    "new_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the **inplace** parameter to specify you want to drop the column from the original DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>FL_DATE</th>\n",
       "      <th>OP_UNIQUE_CARRIER</th>\n",
       "      <th>TAIL_NUM</th>\n",
       "      <th>OP_CARRIER_FL_NUM</th>\n",
       "      <th>ORIGIN</th>\n",
       "      <th>DEST</th>\n",
       "      <th>CRS_DEP_TIME</th>\n",
       "      <th>DEP_TIME</th>\n",
       "      <th>DEP_DELAY</th>\n",
       "      <th>CRS_ARR_TIME</th>\n",
       "      <th>ARR_DELAY</th>\n",
       "      <th>CRS_ELAPSED_TIME</th>\n",
       "      <th>ACTUAL_ELAPSED_TIME</th>\n",
       "      <th>AIR_TIME</th>\n",
       "      <th>DISTANCE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N221WN</td>\n",
       "      <td>802</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>BWI</td>\n",
       "      <td>905</td>\n",
       "      <td>903</td>\n",
       "      <td>-2</td>\n",
       "      <td>1450</td>\n",
       "      <td>-17</td>\n",
       "      <td>225</td>\n",
       "      <td>210</td>\n",
       "      <td>197</td>\n",
       "      <td>1670</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N8329B</td>\n",
       "      <td>3744</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>BWI</td>\n",
       "      <td>1500</td>\n",
       "      <td>1458</td>\n",
       "      <td>-2</td>\n",
       "      <td>2045</td>\n",
       "      <td>-25</td>\n",
       "      <td>225</td>\n",
       "      <td>202</td>\n",
       "      <td>191</td>\n",
       "      <td>1670</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N920WN</td>\n",
       "      <td>1019</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>DAL</td>\n",
       "      <td>1800</td>\n",
       "      <td>1802</td>\n",
       "      <td>2</td>\n",
       "      <td>2045</td>\n",
       "      <td>-13</td>\n",
       "      <td>105</td>\n",
       "      <td>90</td>\n",
       "      <td>80</td>\n",
       "      <td>580</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N480WN</td>\n",
       "      <td>1499</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>DAL</td>\n",
       "      <td>950</td>\n",
       "      <td>947</td>\n",
       "      <td>-3</td>\n",
       "      <td>1235</td>\n",
       "      <td>-12</td>\n",
       "      <td>105</td>\n",
       "      <td>96</td>\n",
       "      <td>81</td>\n",
       "      <td>580</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N227WN</td>\n",
       "      <td>3635</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>DAL</td>\n",
       "      <td>1150</td>\n",
       "      <td>1151</td>\n",
       "      <td>1</td>\n",
       "      <td>1430</td>\n",
       "      <td>-7</td>\n",
       "      <td>100</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "      <td>580</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      FL_DATE OP_UNIQUE_CARRIER TAIL_NUM  OP_CARRIER_FL_NUM ORIGIN DEST  \\\n",
       "0  2018-10-01                WN   N221WN                802    ABQ  BWI   \n",
       "1  2018-10-01                WN   N8329B               3744    ABQ  BWI   \n",
       "2  2018-10-01                WN   N920WN               1019    ABQ  DAL   \n",
       "3  2018-10-01                WN   N480WN               1499    ABQ  DAL   \n",
       "4  2018-10-01                WN   N227WN               3635    ABQ  DAL   \n",
       "\n",
       "   CRS_DEP_TIME  DEP_TIME  DEP_DELAY  CRS_ARR_TIME  ARR_DELAY  \\\n",
       "0           905       903         -2          1450        -17   \n",
       "1          1500      1458         -2          2045        -25   \n",
       "2          1800      1802          2          2045        -13   \n",
       "3           950       947         -3          1235        -12   \n",
       "4          1150      1151          1          1430         -7   \n",
       "\n",
       "   CRS_ELAPSED_TIME  ACTUAL_ELAPSED_TIME  AIR_TIME  DISTANCE  \n",
       "0               225                  210       197      1670  \n",
       "1               225                  202       191      1670  \n",
       "2               105                   90        80       580  \n",
       "3               105                   96        81       580  \n",
       "4               100                   92        80       580  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Remove the column ARR_TIME from the DataFrame delays_df\n",
    "\n",
    "#delays_df = delays_df.drop(['ARR_TIME'],axis=1)\n",
    "delays_df.drop(columns=['ARR_TIME'], inplace=True)\n",
    "delays_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use different techniques to predict based on quantititative values which are usually numeric values (e.g. distance, number of minutes, weight) and qualitative (descriptive) values which may not be numeric (e.g. what airport a flight left from, what airline operated the flight)\n",
    "\n",
    "Quantitative data may be moved into a separate DataFrame before training a model.\n",
    "\n",
    "You also need to put the value you want to predict, called the label (ARR_DELAY) in a separate DataFrame from the values you think can help you make the prediction, called the features\n",
    "\n",
    "We need to be able to create a new dataframe from the columns in an existing dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Create a new DataFrame called desc_df\n",
    "# include all rows\n",
    "# include the columns ORIGIN, DEST, OP_CARRIER_FL_NUM, OP_UNIQUE_CARRIER, TAIL_NUM\n",
    "\n",
    "desc_df = delays_df.loc[:,['ORIGIN', 'DEST', 'OP_CARRIER_FL_NUM', 'OP_UNIQUE_CARRIER', 'TAIL_NUM']]\n",
    "desc_df.head()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: even-more-python-for-beginners-data-tools/08 - Removing and splitting DataFrame columns/README.md
================================================
# Removing and splitting DataFrame columns

When preparing data for machine learning you may need to remove specific columns from the DataFrame.

## Common functions

- [drop](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html) deletes specified columns from a DataFrame
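
As a minimal sketch of how `drop` and column selection with `loc` fit together, the example below builds a three-row sample whose values are copied from `flight_delays.csv`. The variable name `without_arr_time` is only illustrative, and the sample is built inline so the snippet runs without the CSV file.

```python
import pandas as pd

# Three rows copied from flight_delays.csv, limited to a few columns
delays_df = pd.DataFrame({
    "ORIGIN": ["ABQ", "ABQ", "ALB"],
    "DEST": ["BWI", "DAL", "BWI"],
    "ARR_TIME": [1433, 2032, 911],
    "ARR_DELAY": [-17, -13, -9],
    "DISTANCE": [1670, 580, 289],
})

# drop returns a new DataFrame; delays_df itself is unchanged
# unless you pass inplace=True
without_arr_time = delays_df.drop(columns=["ARR_TIME"])

# Select all rows (:) and only the listed descriptive columns
desc_df = delays_df.loc[:, ["ORIGIN", "DEST"]]

print(list(without_arr_time.columns))  # ['ORIGIN', 'DEST', 'ARR_DELAY', 'DISTANCE']
print(list(desc_df.columns))           # ['ORIGIN', 'DEST']
```

Assigning the result to a new variable keeps the original DataFrame available, which can be useful when you later need the removed column, for example as a label.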

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)


================================================
FILE: even-more-python-for-beginners-data-tools/08 - Removing and splitting DataFrame columns/flight_delays.csv
================================================
FL_DATE,OP_UNIQUE_CARRIER,TAIL_NUM,OP_CARRIER_FL_NUM,ORIGIN,DEST,CRS_DEP_TIME,DEP_TIME,DEP_DELAY,CRS_ARR_TIME,ARR_TIME,ARR_DELAY,CRS_ELAPSED_TIME,ACTUAL_ELAPSED_TIME,AIR_TIME,DISTANCE
2018-10-01,WN,N221WN,802,ABQ,BWI,905,903,-2,1450,1433,-17,225,210,197,1670
2018-10-01,WN,N8329B,3744,ABQ,BWI,1500,1458,-2,2045,2020,-25,225,202,191,1670
2018-10-01,WN,N920WN,1019,ABQ,DAL,1800,1802,2,2045,2032,-13,105,90,80,580
2018-10-01,WN,N480WN,1499,ABQ,DAL,950,947,-3,1235,1223,-12,105,96,81,580
2018-10-01,WN,N227WN,3635,ABQ,DAL,1150,1151,1,1430,1423,-7,100,92,80,580
2018-10-01,WN,N243WN,3998,ABQ,DAL,655,652,-3,940,924,-16,105,92,83,580
2018-10-01,WN,N485WN,5432,ABQ,DAL,1340,1354,14,1625,1631,6,105,97,81,580
2018-10-01,WN,N229WN,4596,ABQ,DEN,1420,1444,24,1540,1552,12,80,68,55,349
2018-10-01,WN,N934WN,6013,ABQ,DEN,910,907,-3,1025,1027,2,75,80,52,349
2018-10-01,WN,N934WN,6015,ABQ,DEN,1735,1742,7,1845,1854,9,70,72,58,349
2018-10-01,WN,N8615E,2885,ABQ,HOU,1240,1239,-1,1540,1539,-1,120,120,108,759
2018-10-01,WN,N965WN,3939,ABQ,HOU,640,640,0,940,938,-2,120,118,103,759
2018-10-01,WN,N408WN,4025,ABQ,HOU,1555,1610,15,1850,1906,16,115,116,103,759
2018-10-01,WN,N913WN,1642,ABQ,LAS,1040,1037,-3,1115,1057,-18,95,80,69,486
2018-10-01,WN,N927WN,3271,ABQ,LAS,1615,1614,-1,1645,1646,1,90,92,75,486
2018-10-01,WN,N732SW,4816,ABQ,LAS,605,601,-4,635,628,-7,90,87,73,486
2018-10-01,WN,N496WN,6095,ABQ,LAS,2130,2123,-7,2155,2146,-9,85,83,70,486
2018-10-01,WN,N468WN,555,ABQ,LAX,1710,1708,-2,1815,1805,-10,125,117,92,677
2018-10-01,WN,N7751A,3858,ABQ,LAX,545,541,-4,645,638,-7,120,117,102,677
2018-10-01,WN,N435WN,5757,ABQ,MCI,1720,2119,239,2005,2357,232,105,98,87,718
2018-10-01,WN,N556WN,538,ABQ,MDW,1705,1756,51,2040,2114,34,155,138,129,1121
2018-10-01,WN,N410WN,4837,ABQ,MDW,705,708,3,1045,1032,-13,160,144,127,1121
2018-10-01,WN,N8726H,792,ABQ,OAK,815,809,-6,940,928,-12,145,139,123,889
2018-10-01,WN,N956WN,5673,ABQ,OAK,1125,1221,56,1250,1337,47,145,136,121,889
2018-10-01,WN,N739GB,5753,ABQ,OAK,1915,1915,0,2035,2029,-6,140,134,121,889
2018-10-01,WN,N7723E,5516,ABQ,PDX,1020,1017,-3,1215,1204,-11,175,167,157,1111
2018-10-01,WN,N770SA,1415,ABQ,PHX,945,944,-1,1005,949,-16,80,65,54,328
2018-10-01,WN,N730SW,2782,ABQ,PHX,1410,1424,14,1430,1431,1,80,67,55,328
2018-10-01,WN,N7725A,2863,ABQ,PHX,700,702,2,720,720,0,80,78,56,328
2018-10-01,WN,N450WN,4114,ABQ,PHX,1935,1931,-4,1950,1959,9,75,88,58,328
2018-10-01,WN,N8673F,5500,ABQ,PHX,1625,1630,5,1640,1636,-4,75,66,58,328
2018-10-01,WN,N948WN,6315,ABQ,PHX,1120,1126,6,1140,1138,-2,80,72,57,328
2018-10-01,WN,N566WN,19,ABQ,SAN,1505,1551,46,1555,1631,36,110,100,87,628
2018-10-01,WN,N957WN,4832,ABQ,SAN,610,616,6,700,658,-2,110,102,91,628
2018-10-01,WN,N8704Q,824,ALB,BWI,805,801,-4,920,911,-9,75,70,54,289
2018-10-01,WN,N903WN,1758,ALB,BWI,605,601,-4,720,706,-14,75,65,55,289
2018-10-01,WN,N8572X,2790,ALB,BWI,925,928,3,1040,1031,-9,75,63,53,289
2018-10-01,WN,N7701B,3292,ALB,BWI,1315,1308,-7,1435,1417,-18,80,69,58,289
2018-10-01,WN,N295WN,3376,ALB,BWI,1105,1101,-4,1220,1206,-14,75,65,53,289
2018-10-01,WN,N716SW,4898,ALB,BWI,1710,1707,-3,1825,1815,-10,75,68,56,289
2018-10-01,WN,N8674B,5153,ALB,DEN,1850,1849,-1,2050,2045,-5,240,236,223,1610
2018-10-01,WN,N8643A,390,ALB,MCO,705,705,0,955,954,-1,170,169,149,1073
2018-10-01,WN,N730SW,2776,ALB,MDW,630,625,-5,735,744,9,125,139,120,717
2018-10-01,WN,N798SW,4197,ALB,MDW,1655,1652,-3,1805,1815,10,130,143,112,717
2018-10-01,WN,N729SW,988,AMA,DAL,1605,1615,10,1720,1713,-7,75,58,48,323
2018-10-01,WN,N933WN,1913,AMA,DAL,605,603,-2,720,705,-15,75,62,50,323
2018-10-01,WN,N7706A,5226,AMA,DAL,1045,1047,2,1155,1156,1,70,69,52,323
2018-10-01,WN,N755SA,6984,AMA,DAL,1830,1825,-5,1940,1921,-19,70,56,48,323
2018-10-01,WN,N211WN,6822,AMA,LAS,1425,1429,4,1425,1438,13,120,129,107,758
2018-10-01,WN,N928WN,4261,ATL,AUS,1015,1011,-4,1140,1137,-3,145,146,123,813
2018-10-01,WN,N8581Z,4701,ATL,AUS,2030,2024,-6,2150,2133,-17,140,129,109,813
2018-10-01,WN,N950WN,5615,ATL,AUS,1645,1647,2,1810,1805,-5,145,138,112,813
2018-10-01,WN,N932WN,106,ATL,BNA,2215,2211,-4,2215,2205,-10,60,54,39,214
2018-10-01,WN,N739GB,2583,ATL,BNA,800,756,-4,755,752,-3,55,56,42,214
2018-10-01,WN,N454WN,3766,ATL,BNA,1955,1951,-4,2000,1948,-12,65,57,39,214
2018-10-01,WN,N7716A,4165,ATL,BNA,1225,1226,1,1235,1226,-9,70,60,41,214
2018-10-01,WN,N7822A,4501,ATL,BNA,1750,1745,-5,1745,1742,-3,55,57,40,214
2018-10-01,WN,N8324A,3360,ATL,BOS,1330,1500,90,1605,1737,92,155,157,126,946
2018-10-01,WN,N444WN,3987,ATL,BOS,2210,2204,-6,50,24,-26,160,140,118,946
2018-10-01,WN,N472WN,1031,ATL,BWI,1120,1119,-1,1310,1303,-7,110,104,81,577
2018-10-01,WN,N758SW,1526,ATL,BWI,800,757,-3,945,934,-11,105,97,81,577
2018-10-01,WN,N8642E,1922,ATL,BWI,1700,1656,-4,1850,1840,-10,110,104,88,577
2018-10-01,WN,N7838A,3991,ATL,BWI,2115,2202,47,2305,2339,34,110,97,80,577
2018-10-01,WN,N7839A,4436,ATL,BWI,1905,1904,-1,2100,2044,-16,115,100,85,577
2018-10-01,WN,N8509U,5150,ATL,BWI,1340,1416,36,1530,1548,18,110,92,79,577
2018-10-01,WN,N242WN,2574,ATL,CLE,835,833,-2,1015,1016,1,100,103,79,554
2018-10-01,WN,N961WN,5133,ATL,CLE,2200,2155,-5,2335,2334,-1,95,99,80,554
2018-10-01,WN,N8503A,2571,ATL,CMH,1540,1540,0,1710,1711,1,90,91,65,447
2018-10-01,WN,N282WN,6348,ATL,CMH,835,834,-1,1005,1005,0,90,91,67,447
2018-10-01,WN,N293WN,6661,ATL,CMH,2200,2208,8,2325,2332,7,85,84,67,447
2018-10-01,WN,N954WN,63,ATL,DAL,2010,2010,0,2125,2118,-7,135,128,101,721
2018-10-01,WN,N764SW,2838,ATL,DAL,720,719,-1,825,816,-9,125,117,103,721
2018-10-01,WN,N8549Z,3845,ATL,DAL,1740,1738,-2,1845,1838,-7,125,120,102,721
2018-10-01,WN,N8620H,5577,ATL,DAL,1045,1102,17,1200,1208,8,135,126,103,721
2018-10-01,WN,N8317M,6768,ATL,DAL,1330,1331,1,1440,1431,-9,130,120,101,721
2018-10-01,WN,N550WN,1347,ATL,DCA,1525,1542,17,1710,1724,14,105,102,79,547
2018-10-01,WN,N8648A,2600,ATL,DCA,715,716,1,855,851,-4,100,95,82,547
2018-10-01,WN,N726SW,4747,ATL,DCA,2025,2027,2,2210,2209,-1,105,102,83,547
2018-10-01,WN,N8665D,5207,ATL,DCA,1030,1105,35,1215,1241,26,105,96,80,547
2018-10-01,WN,N930WN,208,ATL,DEN,1835,1846,11,1945,1948,3,190,182,167,1199
2018-10-01,WN,N400WN,4133,ATL,DEN,610,607,-3,715,711,-4,185,184,170,1199
2018-10-01,WN,N7817J,4139,ATL,DEN,1430,1432,2,1540,1539,-1,190,187,167,1199
2018-10-01,WN,N8647A,5960,ATL,DEN,955,1008,13,1110,1112,2,195,184,162,1199
2018-10-01,WN,N212WN,115,ATL,DTW,835,832,-3,1025,1024,-1,110,112,86,594
2018-10-01,WN,N256WN,1896,ATL,DTW,2200,2200,0,2345,2349,4,105,109,86,594
2018-10-01,WN,N8556Z,5388,ATL,DTW,1530,1531,1,1730,1727,-3,120,116,86,594
2018-10-01,WN,N279WN,2775,ATL,FLL,1110,1111,1,1310,1250,-20,120,99,82,581
2018-10-01,WN,N945WN,3088,ATL,FLL,1340,1351,11,1535,1529,-6,115,98,82,581
2018-10-01,WN,N8699A,5459,ATL,FLL,650,645,-5,835,820,-15,105,95,80,581
2018-10-01,WN,N8691A,6191,ATL,FLL,1950,1958,8,2140,2146,6,110,108,83,581
2018-10-01,WN,N8548P,964,ATL,GSP,1540,1539,-1,1630,1628,-2,50,49,27,153
2018-10-01,WN,N258WN,5417,ATL,GSP,2205,2240,35,2255,2322,27,50,42,27,153
2018-10-01,WN,N8660A,6185,ATL,GSP,1105,1101,-4,1200,1145,-15,55,44,26,153
2018-10-01,WN,N274WN,343,ATL,HOU,1810,1808,-2,1920,1901,-19,130,113,100,696
2018-10-01,WN,N230WN,1176,ATL,HOU,1955,1955,0,2105,2057,-8,130,122,101,696
2018-10-01,WN,N786SW,1433,ATL,HOU,1130,1308,98,1235,1443,128,125,155,140,696
2018-10-01,WN,N452WN,2847,ATL,HOU,605,601,-4,710,659,-11,125,118,99,696
2018-10-01,WN,N8619F,5161,ATL,HOU,1340,1333,-7,1440,1503,23,120,150,136,696
2018-10-01,WN,N8513F,812,ATL,IAD,1535,1535,0,1725,1727,2,110,112,78,534


================================================
FILE: even-more-python-for-beginners-data-tools/09 - Handling duplicates and rows with missing values/09 - Removing rows.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Handling duplicate rows and rows with missing values\n",
    "\n",
    "Most machine learning algorithms will return an error if they encounter a missing value.  So, you often have to remove rows with missing values from your DataFrame.\n",
    "\n",
    "To learn how, we need to create a pandas DataFrame and load it with data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The flight delays data set contains information about flights and flight delays"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>FL_DATE</th>\n",
       "      <th>OP_UNIQUE_CARRIER</th>\n",
       "      <th>TAIL_NUM</th>\n",
       "      <th>OP_CARRIER_FL_NUM</th>\n",
       "      <th>ORIGIN</th>\n",
       "      <th>DEST</th>\n",
       "      <th>CRS_DEP_TIME</th>\n",
       "      <th>DEP_TIME</th>\n",
       "      <th>DEP_DELAY</th>\n",
       "      <th>CRS_ARR_TIME</th>\n",
       "      <th>ARR_TIME</th>\n",
       "      <th>ARR_DELAY</th>\n",
       "      <th>CRS_ELAPSED_TIME</th>\n",
       "      <th>ACTUAL_ELAPSED_TIME</th>\n",
       "      <th>AIR_TIME</th>\n",
       "      <th>DISTANCE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N221WN</td>\n",
       "      <td>802</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>BWI</td>\n",
       "      <td>905</td>\n",
       "      <td>903.0</td>\n",
       "      <td>-2.0</td>\n",
       "      <td>1450</td>\n",
       "      <td>1433.0</td>\n",
       "      <td>-17.0</td>\n",
       "      <td>225</td>\n",
       "      <td>210.0</td>\n",
       "      <td>197.0</td>\n",
       "      <td>1670</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N8329B</td>\n",
       "      <td>3744</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>BWI</td>\n",
       "      <td>1500</td>\n",
       "      <td>1458.0</td>\n",
       "      <td>-2.0</td>\n",
       "      <td>2045</td>\n",
       "      <td>2020.0</td>\n",
       "      <td>-25.0</td>\n",
       "      <td>225</td>\n",
       "      <td>202.0</td>\n",
       "      <td>191.0</td>\n",
       "      <td>1670</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N920WN</td>\n",
       "      <td>1019</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>DAL</td>\n",
       "      <td>1800</td>\n",
       "      <td>1802.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2045</td>\n",
       "      <td>2032.0</td>\n",
       "      <td>-13.0</td>\n",
       "      <td>105</td>\n",
       "      <td>90.0</td>\n",
       "      <td>80.0</td>\n",
       "      <td>580</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N480WN</td>\n",
       "      <td>1499</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>DAL</td>\n",
       "      <td>950</td>\n",
       "      <td>947.0</td>\n",
       "      <td>-3.0</td>\n",
       "      <td>1235</td>\n",
       "      <td>1223.0</td>\n",
       "      <td>-12.0</td>\n",
       "      <td>105</td>\n",
       "      <td>96.0</td>\n",
       "      <td>81.0</td>\n",
       "      <td>580</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2018-10-01</td>\n",
       "      <td>WN</td>\n",
       "      <td>N227WN</td>\n",
       "      <td>3635</td>\n",
       "      <td>ABQ</td>\n",
       "      <td>DAL</td>\n",
       "      <td>1150</td>\n",
       "      <td>1151.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1430</td>\n",
       "      <td>1423.0</td>\n",
       "      <td>-7.0</td>\n",
       "      <td>100</td>\n",
       "      <td>92.0</td>\n",
       "      <td>80.0</td>\n",
       "      <td>580</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      FL_DATE OP_UNIQUE_CARRIER TAIL_NUM  OP_CARRIER_FL_NUM ORIGIN DEST  \\\n",
       "0  2018-10-01                WN   N221WN                802    ABQ  BWI   \n",
       "1  2018-10-01                WN   N8329B               3744    ABQ  BWI   \n",
       "2  2018-10-01                WN   N920WN               1019    ABQ  DAL   \n",
       "3  2018-10-01                WN   N480WN               1499    ABQ  DAL   \n",
       "4  2018-10-01                WN   N227WN               3635    ABQ  DAL   \n",
       "\n",
       "   CRS_DEP_TIME  DEP_TIME  DEP_DELAY  CRS_ARR_TIME  ARR_TIME  ARR_DELAY  \\\n",
       "0           905     903.0       -2.0          1450    1433.0      -17.0   \n",
       "1          1500    1458.0       -2.0          2045    2020.0      -25.0   \n",
       "2          1800    1802.0        2.0          2045    2032.0      -13.0   \n",
       "3           950     947.0       -3.0          1235    1223.0      -12.0   \n",
       "4          1150    1151.0        1.0          1430    1423.0       -7.0   \n",
       "\n",
       "   CRS_ELAPSED_TIME  ACTUAL_ELAPSED_TIME  AIR_TIME  DISTANCE  \n",
       "0               225                210.0     197.0      1670  \n",
       "1               225                202.0     191.0      1670  \n",
       "2               105                 90.0      80.0       580  \n",
       "3               105                 96.0      81.0       580  \n",
       "4               100                 92.0      80.0       580  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv')\n",
    "delays_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**info**  will tell us how many rows are in the DataFrame and for each column how many of those rows contain non-null values. From this we can determine which columns (if any) contain null/missing values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 300000 entries, 0 to 299999\n",
      "Data columns (total 16 columns):\n",
      "FL_DATE                300000 non-null object\n",
      "OP_UNIQUE_CARRIER      300000 non-null object\n",
      "TAIL_NUM               299660 non-null object\n",
      "OP_CARRIER_FL_NUM      300000 non-null int64\n",
      "ORIGIN                 300000 non-null object\n",
      "DEST                   300000 non-null object\n",
      "CRS_DEP_TIME           300000 non-null int64\n",
      "DEP_TIME               296825 non-null float64\n",
      "DEP_DELAY              296825 non-null float64\n",
      "CRS_ARR_TIME           300000 non-null int64\n",
      "ARR_TIME               296574 non-null float64\n",
      "ARR_DELAY              295832 non-null float64\n",
      "CRS_ELAPSED_TIME       300000 non-null int64\n",
      "ACTUAL_ELAPSED_TIME    295832 non-null float64\n",
      "AIR_TIME               295832 non-null float64\n",
      "DISTANCE               300000 non-null int64\n",
      "dtypes: float64(6), int64(5), object(5)\n",
      "memory usage: 30.9+ MB\n"
     ]
    }
   ],
   "source": [
    "delays_df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "TAIL_NUM, DEP_TIME, DEP_DELAY, ARR_TIME, ARR_DELAY, ACTUAL_ELAPSED_TIME, and AIR_TIME all have rows with missing values."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are many techniques for dealing with missing values. The simplest is to delete the rows that contain them.\n",
    "\n",
    "**dropna** returns a new DataFrame without the rows that contain null/missing values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Int64Index: 295832 entries, 0 to 299999\n",
      "Data columns (total 16 columns):\n",
      "FL_DATE                295832 non-null object\n",
      "OP_UNIQUE_CARRIER      295832 non-null object\n",
      "TAIL_NUM               295832 non-null object\n",
      "OP_CARRIER_FL_NUM      295832 non-null int64\n",
      "ORIGIN                 295832 non-null object\n",
      "DEST                   295832 non-null object\n",
      "CRS_DEP_TIME           295832 non-null int64\n",
      "DEP_TIME               295832 non-null float64\n",
      "DEP_DELAY              295832 non-null float64\n",
      "CRS_ARR_TIME           295832 non-null int64\n",
      "ARR_TIME               295832 non-null float64\n",
      "ARR_DELAY              295832 non-null float64\n",
      "CRS_ELAPSED_TIME       295832 non-null int64\n",
      "ACTUAL_ELAPSED_TIME    295832 non-null float64\n",
      "AIR_TIME               295832 non-null float64\n",
      "DISTANCE               295832 non-null int64\n",
      "dtypes: float64(6), int64(5), object(5)\n",
      "memory usage: 32.7+ MB\n"
     ]
    }
   ],
   "source": [
    "delay_no_nulls_df = delays_df.dropna()   # Delete the rows with missing values\n",
    "delay_no_nulls_df.info()                 # Check the number of rows and number of rows with non-null values to confirm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you don't need to keep the original DataFrame, you can just delete the rows within the existing DataFrame instead of creating a new one\n",
    "\n",
    "**inplace=*True*** indicates you want to drop the rows in the specified DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Int64Index: 295832 entries, 0 to 299999\n",
      "Data columns (total 16 columns):\n",
      "FL_DATE                295832 non-null object\n",
      "OP_UNIQUE_CARRIER      295832 non-null object\n",
      "TAIL_NUM               295832 non-null object\n",
      "OP_CARRIER_FL_NUM      295832 non-null int64\n",
      "ORIGIN                 295832 non-null object\n",
      "DEST                   295832 non-null object\n",
      "CRS_DEP_TIME           295832 non-null int64\n",
      "DEP_TIME               295832 non-null float64\n",
      "DEP_DELAY              295832 non-null float64\n",
      "CRS_ARR_TIME           295832 non-null int64\n",
      "ARR_TIME               295832 non-null float64\n",
      "ARR_DELAY              295832 non-null float64\n",
      "CRS_ELAPSED_TIME       295832 non-null int64\n",
      "ACTUAL_ELAPSED_TIME    295832 non-null float64\n",
      "AIR_TIME               295832 non-null float64\n",
      "DISTANCE               295832 non-null int64\n",
      "dtypes: float64(6), int64(5), object(5)\n",
      "memory usage: 32.7+ MB\n"
     ]
    }
   ],
   "source": [
    "delays_df.dropna(inplace=True)\n",
    "delays_df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When data is loaded from multiple data sources you sometimes end up with duplicate records. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Heathrow</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Amsterdam</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Name        City         Country\n",
       "0  Seattle-Tacoma     Seattle             USA\n",
       "1          Dulles  Washington             USA\n",
       "2          Dulles  Washington             USA\n",
       "3        Heathrow      London  United Kingdom\n",
       "4        Schiphol   Amsterdam     Netherlands"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports_df = pd.read_csv('Data/airportsDuplicateRows.csv')\n",
    "airports_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use **duplicated** to find the duplicate rows.\n",
    "\n",
    "It returns **True** for each row that is a duplicate of a previous row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2     True\n",
       "3    False\n",
       "4    False\n",
       "5    False\n",
       "6    False\n",
       "7    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports_df.duplicated()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**drop_duplicates** will delete the duplicate rows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle-Tacoma</td>\n",
       "      <td>Seattle</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dulles</td>\n",
       "      <td>Washington</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Heathrow</td>\n",
       "      <td>London</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Schiphol</td>\n",
       "      <td>Amsterdam</td>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Changi</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Pearson</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Narita</td>\n",
       "      <td>Tokyo</td>\n",
       "      <td>Japan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Name        City         Country\n",
       "0  Seattle-Tacoma     Seattle             USA\n",
       "1          Dulles  Washington             USA\n",
       "3        Heathrow      London  United Kingdom\n",
       "4        Schiphol   Amsterdam     Netherlands\n",
       "5          Changi   Singapore       Singapore\n",
       "6         Pearson     Toronto          Canada\n",
       "7          Narita       Tokyo           Japan"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airports_df.drop_duplicates(inplace=True)\n",
    "airports_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: even-more-python-for-beginners-data-tools/09 - Handling duplicates and rows with missing values/Lots_of_flight_data.csv
================================================
[File too large to display: 21.4 MB]

================================================
FILE: even-more-python-for-beginners-data-tools/09 - Handling duplicates and rows with missing values/README.md
================================================
# Handling duplicates and rows with missing values

When preparing data for machine learning you need to remove duplicate rows and you may need to delete rows with missing values.

## Common functions

- [dropna](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html) removes rows with missing values
- [duplicated](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.duplicated.html) returns True or False for each row to indicate whether it is a duplicate of a previous row
- [drop_duplicates](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html) returns a DataFrame with duplicate rows removed
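
The three functions above can be combined as in the following sketch. It assumes a small inline sample based on `airportsDuplicateRows.csv`; the last row, with a missing `Name`, is a made-up addition so that `dropna` has something to remove.

```python
import pandas as pd

# First four rows match airportsDuplicateRows.csv;
# the final row (missing Name) is hypothetical
airports_df = pd.DataFrame({
    "Name": ["Seattle-Tacoma", "Dulles", "Dulles", "Heathrow", None],
    "City": ["Seattle", "Washington", "Washington", "London", "Amsterdam"],
    "Country": ["USA", "USA", "USA", "United Kingdom", "Netherlands"],
})

# Remove rows that contain any missing value (returns a new DataFrame)
no_nulls_df = airports_df.dropna()

# Flag rows that repeat an earlier row
duplicate_flags = no_nulls_df.duplicated()
print(duplicate_flags.tolist())  # [False, False, True, False]

# Keep only the first occurrence of each row
unique_df = no_nulls_df.drop_duplicates()
print(len(unique_df))  # 3
```

Note that the surviving rows keep their original index labels, so the index of `unique_df` has gaps where rows were removed.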

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)


================================================
FILE: even-more-python-for-beginners-data-tools/09 - Handling duplicates and rows with missing values/airportsDuplicateRows.csv
================================================
Name,City,Country
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Dulles,Washington,USA
Heathrow,London,United Kingdom
Schiphol,Amsterdam,Netherlands
Changi,Singapore,Singapore
Pearson,Toronto,Canada
Narita,Tokyo,Japan


================================================
FILE: even-more-python-for-beginners-data-tools/10 - Splitting test and training data with scikit-learn/10 - Train Test split.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Splitting test and training data\n",
    "When you train a model you may need to split your data into test and training data sets.\n",
    "\n",
    "To accomplish this task we will use the [scikit-learn](https://scikit-learn.org/stable/) library.\n",
    "\n",
    "scikit-learn is an open source, BSD-licensed data science library for preprocessing data and training models."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we can split our data into test and training sets, we need to do some data preparation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's load our csv file with information about flights and flight delays\n",
    "\n",
    "Use **shape** to find out how many rows and columns are in the original DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(300000, 16)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv')\n",
    "delays_df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Split data into features and labels\n",
    "Create a DataFrame called X containing only the features we want to use to train our model.\n",
    "\n",
    "**Note:** You can only use numeric values as features. If you have non-numeric values, you must first apply techniques such as one-hot encoding to convert them into numeric values before using them as features to train a model. Check out Data Science courses for more information on these techniques!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>DISTANCE</th>\n",
       "      <th>CRS_ELAPSED_TIME</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1670</td>\n",
       "      <td>225</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1670</td>\n",
       "      <td>225</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>580</td>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>580</td>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>580</td>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   DISTANCE  CRS_ELAPSED_TIME\n",
       "0      1670               225\n",
       "1      1670               225\n",
       "2       580               105\n",
       "3       580               105\n",
       "4       580               100"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = delays_df.loc[:,['DISTANCE', 'CRS_ELAPSED_TIME']]\n",
    "X.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a DataFrame called y containing only the value we want to predict with our model. \n",
    "\n",
    "In our case we want to predict how many minutes late a flight will arrive. This information is in the ARR_DELAY column. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ARR_DELAY</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-17.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-25.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-13.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-12.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-7.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   ARR_DELAY\n",
       "0      -17.0\n",
       "1      -25.0\n",
       "2      -13.0\n",
       "3      -12.0\n",
       "4       -7.0"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y = delays_df.loc[:,['ARR_DELAY']]\n",
    "y.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Split into test and training data\n",
    "Use **scikitlearn train_test_split** to move 30% of the rows into Test DataFrames\n",
    "\n",
    "The other 70% of the rows into DataFrames we can use to train our model\n",
    "\n",
    "NOTE: by specifying a value for *random_state* we ensure that if we run the code again the same rows will be moved into the test DataFrame. This makes our results repeatable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "                                                    X, \n",
    "                                                    y, \n",
    "                                                    test_size=0.3, \n",
    "                                                    random_state=42\n",
    "                                                   )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now have a DataFrame **X_train** which contains 70% of the rows\n",
    "\n",
    "We will use this DataFrame to train our model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(210000, 2)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_train.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The DataFrame **X_test** contains the remaining 30% of the rows\n",
    "\n",
    "We will use this DataFrame to test our trained model, so we can check it's accuracy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(90000, 2)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_test.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**X_train** and **X_test** contain our features\n",
    "\n",
    "The features are the columns we think can help us predict how late a flight will arrive: **DISTANCE** and **CRS_ELAPSED_TIME**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>DISTANCE</th>\n",
       "      <th>CRS_ELAPSED_TIME</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>186295</th>\n",
       "      <td>237</td>\n",
       "      <td>60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>127847</th>\n",
       "      <td>411</td>\n",
       "      <td>111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>274740</th>\n",
       "      <td>342</td>\n",
       "      <td>85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74908</th>\n",
       "      <td>1005</td>\n",
       "      <td>164</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11630</th>\n",
       "      <td>484</td>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        DISTANCE  CRS_ELAPSED_TIME\n",
       "186295       237                60\n",
       "127847       411               111\n",
       "274740       342                85\n",
       "74908       1005               164\n",
       "11630        484               100"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_train.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "scrolled": true
   },
   "source": [
    "The DataFrame **y_train**  contains 70% of the rows\n",
    "\n",
    "We will use this DataFrame to train our model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(210000, 1)"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_train.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The DataFrame **y_test** contains the remaining 30% of the rows\n",
    "\n",
    "We will use this DataFrame to test our trained model, so we can check it's accuracy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(90000, 1)"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_test.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**y_train** and **y_test** contain our label\n",
    "\n",
    "The label is the columns we want to predict with our trained model: **ARR_DELAY**\n",
    "\n",
    "**NOTE:**  a negative value for ARR_DELAY indicates a flight arrived early"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ARR_DELAY</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>186295</th>\n",
       "      <td>-7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>127847</th>\n",
       "      <td>-16.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>274740</th>\n",
       "      <td>-10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74908</th>\n",
       "      <td>-19.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11630</th>\n",
       "      <td>-13.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        ARR_DELAY\n",
       "186295       -7.0\n",
       "127847      -16.0\n",
       "274740      -10.0\n",
       "74908       -19.0\n",
       "11630       -13.0"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_train.head()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: even-more-python-for-beginners-data-tools/10 - Splitting test and training data with scikit-learn/Lots_of_flight_data.csv
================================================
[File too large to display: 21.4 MB]

================================================
FILE: even-more-python-for-beginners-data-tools/10 - Splitting test and training data with scikit-learn/README.md
================================================
# Splitting test and training data with scikit-learn

[scikit-learn](https://scikit-learn.org/) is a library of tools for predictive data analysis, which will allow you to prepare your data for machine learning and create models.

## Common functions

- [train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) splits arrays into random train and test subsets
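
The split can be tried without the large flight CSV. The following is a minimal sketch, assuming pandas and scikit-learn are installed; the ten-row `flights` DataFrame is made-up illustration data that only borrows the column names used in the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for Lots_of_flight_data.csv
flights = pd.DataFrame({
    "DISTANCE": [1670, 580, 580, 237, 411, 342, 1005, 484, 920, 150],
    "CRS_ELAPSED_TIME": [225, 105, 100, 60, 111, 85, 164, 100, 150, 50],
    "ARR_DELAY": [-17.0, -13.0, -7.0, -7.0, -16.0, -10.0, -19.0, -13.0, 4.0, 12.0],
})

X = flights[["DISTANCE", "CRS_ELAPSED_TIME"]]  # features
y = flights[["ARR_DELAY"]]                     # label

# Hold back 30% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Splitting again with the same random_state selects the same rows
X_train_again, _, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)

print(X_train.shape, X_test.shape)
print(X_train.index.equals(X_train_again.index))
```

The features and labels are split together, so each row in `X_train` keeps the same index as its label in `y_train`.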

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)


================================================
FILE: even-more-python-for-beginners-data-tools/11 - Train a linear regression model with scikit-learn/11 - Train a basic model.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train a linear regression model\n",
    "When you have your data prepared you can train a model.\n",
    "\n",
    "There are multiple libraries and methods you can call to train models. In this notebook we will use the **LinearRegression** model in the **scikit-learn** library"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need our DataFrame, with data loaded, all the rows with null values removed, and the features and labels split into the separate training and test data. So, we'll start by just rerunning the commands from the previous notebooks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load our data from the csv file\n",
    "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') \n",
    "\n",
    "# Remove rows with null values since those will crash our linear regression model training\n",
    "delays_df.dropna(inplace=True)\n",
    "\n",
    "# Move our features into the X DataFrame\n",
    "X = delays_df.loc[:,['DISTANCE', 'CRS_ELAPSED_TIME']]\n",
    "\n",
    "# Move our labels into the y DataFrame\n",
    "y = delays_df.loc[:,['ARR_DELAY']] \n",
    "\n",
    "# Split our data into test and training DataFrames\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "                                                    X, \n",
    "                                                    y, \n",
    "                                                    test_size=0.3, \n",
    "                                                    random_state=42\n",
    "                                                   )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use **Scikitlearn LinearRegression** *fit* method to train a linear regression model based on the training data stored in X_train and y_train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.linear_model import LinearRegression\n",
    "\n",
    "regressor = LinearRegression()     # Create a scikit learn LinearRegression object\n",
    "regressor.fit(X_train, y_train)    # Use the fit method to train the model using your training data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The *regressor* object now contains your trained Linear Regression model"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: even-more-python-for-beginners-data-tools/11 - Train a linear regression model with scikit-learn/Lots_of_flight_data.csv
================================================
[File too large to display: 21.4 MB]

================================================
FILE: even-more-python-for-beginners-data-tools/11 - Train a linear regression model with scikit-learn/README.md
================================================
# Train a linear regression model with scikit-learn

[Linear regression](https://en.wikipedia.org/wiki/Linear_regression) is a common algorithm for predicting values based on a given dataset.

## Common classes and functions

- [LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) fits a linear model
- [LinearRegression.fit](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html?highlight=linearregression#sklearn.linear_model.LinearRegression.fit) is used to fit the linear model based on training data
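
To see what `fit` learns, it can help to train on data whose answer is known in advance. This is a minimal sketch, assuming pandas and scikit-learn are installed; the column names `feature_a`, `feature_b`, and `target` are hypothetical, and the values were constructed so that `target = 2 * feature_a + 3 * feature_b + 1`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data built from target = 2 * feature_a + 3 * feature_b + 1
X_train = pd.DataFrame({
    "feature_a": [1, 2, 3, 4, 5, 6],
    "feature_b": [2, 1, 4, 3, 6, 5],
})
y_train = pd.DataFrame({"target": [9, 8, 19, 18, 29, 28]})

regressor = LinearRegression()   # Create the model object
regressor.fit(X_train, y_train)  # Learn one coefficient per feature plus an intercept

# The learned values should be close to the ones used to build the data
print(regressor.coef_)       # coefficients for feature_a and feature_b
print(regressor.intercept_)  # the constant term
```

Because the toy data is exactly linear, the coefficients come out close to 2 and 3 and the intercept close to 1. Real data such as flight delays will not fit this cleanly.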

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)


================================================
FILE: even-more-python-for-beginners-data-tools/12 - Testing a model/12 - Test a model.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Test a trained model\n",
    "Once you have trained a model, you can test it with the test data you put aside"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will start by rerunning the code from the previous notebook to create a trained model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.linear_model import LinearRegression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Load our data from the csv file\n",
    "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') \n",
    "\n",
    "# Remove rows with null values since those will crash our linear regression model training\n",
    "delays_df.dropna(inplace=True)\n",
    "\n",
    "# Move our features into the X DataFrame\n",
    "X = delays_df.loc[:,['DISTANCE', 'CRS_ELAPSED_TIME']]\n",
    "\n",
    "# Move our labels into the y DataFrame\n",
    "y = delays_df.loc[:,['ARR_DELAY']] \n",
    "\n",
    "# Split our data into test and training DataFrames\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "                                                    X, \n",
    "                                                    y, \n",
    "                                                    test_size=0.3, \n",
    "                                                    random_state=42\n",
    "                                                   )\n",
    "regressor = LinearRegression()     # Create a scikit learn LinearRegression object\n",
    "regressor.fit(X_train, y_train)    # Use the fit method to train the model using your training data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test the model\n",
    "Use **Scikitlearn LinearRegression predict** to have our trained model predict values for our test data\n",
    "\n",
    "We stored our test data in X_Test\n",
    "\n",
    "We will store the predicted results in  y_pred"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pred = regressor.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[3.47739078],\n",
       "       [5.89055919],\n",
       "       [4.33288464],\n",
       "       ...,\n",
       "       [5.84678979],\n",
       "       [6.05195889],\n",
       "       [5.66255414]])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_pred"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When we split our data into training and test data we stored the actual values for each row of test data in the DataFrame y_test\n",
    "\n",
    "We can compare the values in y_pred to the value in y_test to get a sense of how accurately our mdoel predicted arrival delays"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ARR_DELAY</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>291483</th>\n",
       "      <td>-5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>98997</th>\n",
       "      <td>-12.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23454</th>\n",
       "      <td>-9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>110802</th>\n",
       "      <td>-14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49449</th>\n",
       "      <td>-20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>94944</th>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>160885</th>\n",
       "      <td>-17.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47572</th>\n",
       "      <td>-20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>164800</th>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>62578</th>\n",
       "      <td>-9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196742</th>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>91166</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>171564</th>\n",
       "      <td>-9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>60706</th>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>240773</th>\n",
       "      <td>-6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32695</th>\n",
       "      <td>-13.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>98399</th>\n",
       "      <td>-23.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>167341</th>\n",
       "      <td>-11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>126191</th>\n",
       "      <td>-4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>188715</th>\n",
       "      <td>131.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>258610</th>\n",
       "      <td>-5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>215751</th>\n",
       "      <td>-20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41210</th>\n",
       "      <td>-15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>68090</th>\n",
       "      <td>-19.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>140794</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>178840</th>\n",
       "      <td>-14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>248071</th>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12770</th>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95948</th>\n",
       "      <td>40.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>172913</th>\n",
       "      <td>-13.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>200797</th>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36199</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70402</th>\n",
       "      <td>-37.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>285308</th>\n",
       "      <td>152.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>201508</th>\n",
       "      <td>-2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>154671</th>\n",
       "      <td>-5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>238535</th>\n",
       "      <td>-5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>133567</th>\n",
       "      <td>-9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3349</th>\n",
       "      <td>-8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>257254</th>\n",
       "      <td>-28.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>106572</th>\n",
       "      <td>-19.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>73023</th>\n",
       "      <td>-25.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>214699</th>\n",
       "      <td>-12.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>274435</th>\n",
       "      <td>-7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>67089</th>\n",
       "      <td>-10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>269917</th>\n",
       "      <td>-4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>164966</th>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>275120</th>\n",
       "      <td>-12.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>139292</th>\n",
       "      <td>-8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31106</th>\n",
       "      <td>-25.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>277799</th>\n",
       "      <td>17.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>293749</th>\n",
       "      <td>-7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>231114</th>\n",
       "      <td>35.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11645</th>\n",
       "      <td>-15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>252520</th>\n",
       "      <td>-12.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>209898</th>\n",
       "      <td>-20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22210</th>\n",
       "      <td>-9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>165727</th>\n",
       "      <td>-6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>260838</th>\n",
       "      <td>-33.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192546</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>88750 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        ARR_DELAY\n",
       "291483       -5.0\n",
       "98997       -12.0\n",
       "23454        -9.0\n",
       "110802      -14.0\n",
       "49449       -20.0\n",
       "...           ...\n",
       "209898      -20.0\n",
       "22210        -9.0\n",
       "165727       -6.0\n",
       "260838      -33.0\n",
       "192546        0.0\n",
       "\n",
       "[88750 rows x 1 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: even-more-python-for-beginners-data-tools/12 - Testing a model/Lots_of_flight_data.csv
================================================
[File too large to display: 21.4 MB]

================================================
FILE: even-more-python-for-beginners-data-tools/12 - Testing a model/README.md
================================================
# Testing a model

Once a model is built it can be used to predict values. You can provide new values to see where they would fall on the spectrum, and test the generated model.

## Common classes and functions

- [LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) fits a linear model
- [LinearRegression.predict](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html?highlight=linearregression#sklearn.linear_model.LinearRegression.predict) is used to predict outcomes for new data based on the trained linear model
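
As a minimal sketch of how the two entries above fit together, the example below trains `LinearRegression` on a tiny made-up dataset and then calls `predict` on a value the model has not seen. The arrays and variable names are illustrative assumptions, not the flight data used in the lesson.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data that follows y = 2x + 1 exactly
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])  # one feature per row
y_train = np.array([1.0, 3.0, 5.0, 7.0])

# Fit the linear model to the training data
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predict the outcome for a new, unseen feature value
X_new = np.array([[4.0]])
y_new = regressor.predict(X_new)
print(y_new)  # close to 9, since the made-up data is perfectly linear
```

Because the sample data is perfectly linear, the prediction lands on the line. With real data such as the flight delays, predictions will generally differ from the actual values.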

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)


================================================
FILE: even-more-python-for-beginners-data-tools/13 - Evaluating accuracy of a model using calculations/13 - Evaluate accuracy.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Evaluating accuracy of a model using calculations\n",
    "After you train a model, you need to get a sense of its accuracy. The accuracy of a model gives you an idea of how much confidence you can put in predictions made by the model.\n",
    "\n",
    "The **scikit-learn** and **numpy** libraries are both helpful for measuring model accuracy."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's start by recreating our trained linear regression model from the last lesson"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.linear_model import LinearRegression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load our data from the csv file\n",
    "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') \n",
    "\n",
    "# Remove rows with null values since those will crash our linear regression model training\n",
    "delays_df.dropna(inplace=True)\n",
    "\n",
    "# Move our features into the X DataFrame\n",
    "X = delays_df.loc[:,['DISTANCE', 'CRS_ELAPSED_TIME']]\n",
    "\n",
    "# Move our labels into the y DataFrame\n",
    "y = delays_df.loc[:,['ARR_DELAY']] \n",
    "\n",
    "# Split our data into test and training DataFrames\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "                                                    X, \n",
    "                                                    y, \n",
    "                                                    test_size=0.3, \n",
    "                                                    random_state=42\n",
    "                                                   )\n",
    "regressor = LinearRegression()     # Create a scikit learn LinearRegression object\n",
    "regressor.fit(X_train, y_train)    # Use the fit method to train the model using your training data\n",
    "\n",
    "y_pred = regressor.predict(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Measuring accuracy\n",
    "Now that we have a trained model, there are a number of metrics you can use to check its accuracy.\n",
    "\n",
    "All these metrics are based on mathematical calculations. The key takeaway is that you don't have to calculate everything yourself: scikit-learn and numpy do most of the work and perform well."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Mean Squared Error (MSE)\n",
    "MSE measures the average squared error the model makes when predicting the outcome for an observation.\n",
    "The lower the MSE, the better the model.\n",
    "\n",
    "Mathematically, MSE is the average squared difference between the observed outcome values and the values predicted by the model.\n",
    "\n",
    "MSE = mean((actuals - predicteds)^2)\n",
    "\n",
    "We could write code to loop through our records comparing actual and predicted values to perform this calculation, but we don't have to! Just use **mean_squared_error** from the **scikit-learn** library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Mean Squared Error: 2250.4445141530855\n"
     ]
    }
   ],
   "source": [
    "from sklearn import metrics\n",
    "print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Root Mean Squared Error (RMSE)\n",
    "RMSE measures the typical size of the error the model makes when predicting the outcome for an observation, in the same units as the outcome.\n",
    "The lower the RMSE, the better the model.\n",
    "\n",
    "Mathematically, the RMSE is the square root of the mean squared error.\n",
    "\n",
    "RMSE = sqrt(MSE)\n",
    "\n",
    "Older versions of scikit-learn do not have a dedicated function for RMSE, but since it's just the square root of MSE, we can use the numpy library, which contains lots of mathematical functions, to calculate the square root of the MSE."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Mean Absolute Error (MAE)\n",
    "MAE measures the average size of the prediction error. The lower the MAE, the better the model.\n",
    "\n",
    "Mathematically, it is the average absolute difference between observed and predicted outcomes.\n",
    "\n",
    "MAE = mean(abs(actuals - predicteds))\n",
    "\n",
    "MAE is less sensitive to outliers than RMSE. Calculate MAE using **mean_absolute_error** in the **scikit-learn** library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Mean absolute error: ',metrics.mean_absolute_error(y_test, y_pred))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### R^2 or R-Squared\n",
    "\n",
    "R-squared is the proportion of variation in the outcome that is explained by the predictor variables. It indicates how well the values passed to the model account for the observed outcome.\n",
    "\n",
    "The higher the R-squared, the better the model. Calculate R-squared using **r2_score** in the **scikit-learn** library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('R^2: ',metrics.r2_score(y_test, y_pred))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Different models have different ways to measure accuracy. Fortunately, **scikit-learn** and **numpy** provide a wide variety of functions to help measure accuracy."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: even-more-python-for-beginners-data-tools/13 - Evaluating accuracy of a model using calculations/Lots_of_flight_data.csv
================================================
[File too large to display: 21.4 MB]

================================================
FILE: even-more-python-for-beginners-data-tools/13 - Evaluating accuracy of a model using calculations/README.md
================================================
# Evaluating accuracy of a model using calculations

Playing with individual values isn't the best way to test a model. Fortunately, [scikit-learn](https://scikit-learn.org/) provides tools for automated testing and analysis.

## Common functions

- [metrics](https://scikit-learn.org/stable/modules/classes.html?highlight=metrics#module-sklearn.metrics) includes functions that can be used in data science, including measuring the accuracy of models
- [mean_squared_error](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error) returns the mean squared error, a metric used to gauge the accuracy of linear regression models
- [r2_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score) returns the R^2 regression score, a metric used to gauge the accuracy of linear regression models
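
The functions listed above can be tried on a handful of numbers before applying them to a real model. The sketch below uses small made-up lists of actual and predicted values (the data and variable names are assumptions for illustration only) and also calls `mean_absolute_error`, which the lesson notebook uses alongside the two functions listed here.

```python
from sklearn import metrics

# Hypothetical actual and predicted values, for illustration only
actuals = [3.0, -0.5, 2.0, 7.0]
predicteds = [2.5, 0.0, 2.0, 8.0]

mse = metrics.mean_squared_error(actuals, predicteds)   # mean of the squared errors
mae = metrics.mean_absolute_error(actuals, predicteds)  # mean of the absolute errors
r2 = metrics.r2_score(actuals, predicteds)              # share of variation explained

print('Mean Squared Error:', mse)
print('Mean absolute error:', mae)
print('R^2:', r2)
```

Lower values are better for the two error metrics, while a higher R^2 indicates a better fit.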

## NumPy

[NumPy](https://numpy.org/) is a package for scientific computing with Python.

### Common functions

- [sqrt](https://numpy.org/doc/1.18/reference/generated/numpy.sqrt.html?highlight=sqrt#numpy.sqrt) returns the square root of a value
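
One place `sqrt` is useful in this lesson is turning a mean squared error into a root mean squared error. The sketch below assumes small made-up arrays of actual and predicted values (hypothetical data, not the flight delays) and computes the MSE directly from its definition with NumPy before taking the square root.

```python
import numpy as np

# Hypothetical actual and predicted values, for illustration only
actuals = np.array([3.0, -0.5, 2.0, 7.0])
predicteds = np.array([2.5, 0.0, 2.0, 8.0])

# Mean squared error computed directly from its definition
mse = np.mean((actuals - predicteds) ** 2)

# RMSE is the square root of the MSE
rmse = np.sqrt(mse)
print('Root Mean Squared Error:', rmse)
```

Taking the square root brings the error back into the same units as the values being predicted, which often makes it easier to interpret than the MSE.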

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)


================================================
FILE: even-more-python-for-beginners-data-tools/14 - NumPy vs Pandas/14 - Working with numpy and pandas.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Moving data from numpy arrays to pandas DataFrames\n",
    "In our last notebook we trained a model and compared our actual and predicted results.\n",
    "\n",
    "What may not have been evident is that we were working with two different objects: a **numpy array** and a **pandas DataFrame**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To explore further, let's rerun the code from the previous notebook to create a trained model and get predicted values for our test data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.linear_model import LinearRegression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load our data from the csv file\n",
    "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') \n",
    "\n",
    "# Remove rows with null values since those will crash our linear regression model training\n",
    "delays_df.dropna(inplace=True)\n",
    "\n",
    "# Move our features into the X DataFrame\n",
    "X = delays_df.loc[:,['DISTANCE','CRS_ELAPSED_TIME']]\n",
    "\n",
    "# Move our labels into the y DataFrame\n",
    "y = delays_df.loc[:,['ARR_DELAY']] \n",
    "\n",
    "# Split our data into test and training DataFrames\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, \n",
    "                                                    y, \n",
    "                                                    test_size=0.3, \n",
    "                                                    random_state=42)\n",
    "regressor = LinearRegression()     # Create a scikit learn LinearRegression object\n",
    "regressor.fit(X_train, y_train)    # Use the fit method to train the model using your training data\n",
    "\n",
    "y_pred = regressor.predict(X_test)  # Generate predicted values for our test data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the last notebook, you might have noticed that the predicted values in y_pred and the actual values in y_test display differently."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[3.47739078],\n",
       "       [5.89055919],\n",
       "       [4.33288464],\n",
       "       ...,\n",
       "       [5.84678979],\n",
       "       [6.05195889],\n",
       "       [5.66255414]])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_pred"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ARR_DELAY</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>291483</th>\n",
       "      <td>-5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>98997</th>\n",
       "      <td>-12.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23454</th>\n",
       "      <td>-9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>110802</th>\n",
       "      <td>-14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49449</th>\n",
       "      <td>-20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>94944</th>\n",
       "      <td>14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>160885</th>\n",
       "      <td>-17.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47572</th>\n",
       "      <td>-20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>164800</th>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>62578</th>\n",
       "      <td>-9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196742</th>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>91166</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>171564</th>\n",
       "      <td>-9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>60706</th>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>240773</th>\n",
       "      <td>-6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32695</th>\n",
       "      <td>-13.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>98399</th>\n",
       "      <td>-23.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>167341</th>\n",
       "      <td>-11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>126191</th>\n",
       "      <td>-4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>188715</th>\n",
       "      <td>131.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>258610</th>\n",
       "      <td>-5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>215751</th>\n",
       "      <td>-20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41210</th>\n",
       "      <td>-15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>68090</th>\n",
       "      <td>-19.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>140794</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>178840</th>\n",
       "      <td>-14.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>248071</th>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12770</th>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95948</th>\n",
       "      <td>40.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>172913</th>\n",
       "      <td>-13.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>200797</th>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36199</th>\n",
       "      <td>0.0</td>\n",
 
SYMBOL INDEX (47 symbols across 24 files)

FILE: more-python-for-beginners/01 - Formatting and linting/bad.py
  function helper (line 7) | def helper(name='sample'):
  function another (line 10) | def another(name = 'sample'):

FILE: more-python-for-beginners/01 - Formatting and linting/good.py
  function print_hello (line 1) | def print_hello(name: str) -> str:

FILE: more-python-for-beginners/02 - Lambdas/method_sorter.py
  function sorter (line 1) | def sorter(item):

FILE: more-python-for-beginners/03 - Classes/basic_class.py
  class Presenter (line 1) | class Presenter():
    method __init__ (line 2) | def __init__(self, name):
    method say_hello (line 5) | def say_hello(self):

FILE: more-python-for-beginners/03 - Classes/properties_class.py
  class Presenter (line 1) | class Presenter():
    method __init__ (line 2) | def __init__(self, name):
    method name (line 7) | def name(self):
    method name (line 11) | def name(self, value):

FILE: more-python-for-beginners/04 - Inheritance/demo.py
  class Person (line 1) | class Person:
    method __init__ (line 2) | def __init__(self, name):
    method say_hello (line 4) | def say_hello(self):
  class Student (line 7) | class Student(Person):
    method __init__ (line 8) | def __init__(self, name, school):
    method sing_school_song (line 11) | def sing_school_song(self):

FILE: more-python-for-beginners/05 - Mixins/demo.py
  class Loggable (line 1) | class Loggable:
    method __init__ (line 2) | def __init__(self):
    method log (line 4) | def log(self):
  class Connection (line 7) | class Connection:
    method __init__ (line 8) | def __init__(self):
    method connect (line 10) | def connect(self):
  class SqlDatabase (line 13) | class SqlDatabase(Connection, Loggable):
    method __init__ (line 14) | def __init__(self):
  function framework (line 20) | def framework(item):

FILE: more-python-for-beginners/09 - Asynchronous programming/async_demo.py
  function load_data (line 5) | async def load_data(session, delay):
  function main (line 12) | async def main():

FILE: more-python-for-beginners/09 - Asynchronous programming/sync_demo.py
  function load_data (line 4) | def load_data(delay):
  function run_demo (line 9) | def run_demo():
  function main (line 18) | def main():

FILE: python-for-beginners/03 - Comments/enable_pin.py
  function enable_pin (line 6) | def enable_pin(user, pin):

FILE: python-for-beginners/13 - Functions/code_challenge_solution.py
  function calculator (line 10) | def calculator(first_number, second_number, operation):

FILE: python-for-beginners/13 - Functions/get_initails_function.py
  function get_initial (line 7) | def get_initial(name):

FILE: python-for-beginners/13 - Functions/getting_clever_with_functions_harder_to_read.py
  function get_initial (line 7) | def get_initial(name):

FILE: python-for-beginners/13 - Functions/print_time_function.py
  function print_time (line 4) | def print_time():

FILE: python-for-beginners/13 - Functions/print_time_function_fix_import.py
  function print_time (line 6) | def print_time():

FILE: python-for-beginners/13 - Functions/print_time_with_message_parameter.py
  function print_time (line 6) | def print_time(task_name):

FILE: python-for-beginners/14 - Function parameters/code_challenge_solution.py
  function calculator (line 9) | def calculator(first_number, second_number, operation='ADD'):

FILE: python-for-beginners/14 - Function parameters/get_initials_default_values.py
  function get_initial (line 7) | def get_initial(name, force_uppercase=True):

FILE: python-for-beginners/14 - Function parameters/get_initials_function.py
  function get_initial (line 7) | def get_initial(name):

FILE: python-for-beginners/14 - Function parameters/get_initials_multiple_parameters.py
  function get_initial (line 7) | def get_initial(name, force_uppercase):

FILE: python-for-beginners/14 - Function parameters/get_initials_named_parameters.py
  function get_initial (line 7) | def get_initial(name, force_uppercase):

FILE: python-for-beginners/14 - Function parameters/named_parameters_make_code_readable.py
  function error_logger (line 14) | def error_logger(error_code, error_severity, log_to_db, error_message, s...

FILE: python-for-beginners/15 - Packages/helpers.py
  function display (line 1) | def display(message, is_warning=False):

FILE: python-for-beginners/18 - Decorators/creating_decorators.py
  function color (line 5) | def color(color):
  function greeter (line 15) | def greeter():
Condensed preview: 231 files, each showing path, character count, and a content snippet (331K chars).
[
  {
    "path": ".gitignore",
    "chars": 2116,
    "preview": "env\n\n# Created by https://www.gitignore.io/api/python,visualstudiocode\n# Edit at https://www.gitignore.io/?templates=pyt"
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "chars": 553,
    "preview": "# Microsoft Open Source Code of Conduct\r\n\r\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://"
  },
  {
    "path": "LICENSE",
    "chars": 1162,
    "preview": "    MIT License\r\n\r\n    Copyright (c) Microsoft Corporation.\r\n\r\n    Permission is hereby granted, free of charge, to any "
  },
  {
    "path": "README.md",
    "chars": 5011,
    "preview": "# Getting started with Python\r\n\r\n## Overview\r\n\r\nThese three series on Channel 9 and YouTube are designed to help get you"
  },
  {
    "path": "SECURITY.md",
    "chars": 2967,
    "preview": "<!-- BEGIN MICROSOFT SECURITY.MD V0.0.2 BLOCK -->\r\n\r\n## Security\r\n\r\nMicrosoft takes the security of our software product"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/01 - Jupyter Notebooks/README.md",
    "chars": 1607,
    "preview": "# Jupyter Notebooks\r\n\r\nJupyter Notebooks are an open source web application that allows you to create and share Python c"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/02 - Introduction to Anaconda and Conda/README.md",
    "chars": 1146,
    "preview": "# Anaconda\r\n\r\n[Anaconda](https://www.anaconda.com/) is an open source distribution of Python and R for data science. It "
  },
  {
    "path": "even-more-python-for-beginners-data-tools/03 - Intro to Pandas/03 - Pandas Series and DataFrame.ipynb",
    "chars": 10571,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# pandas Series and DataFrame\"\n   ]"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/03 - Intro to Pandas/README.md",
    "chars": 750,
    "preview": "# pandas\r\n\r\n[pandas](https://pandas/pydata.org​) is an open source Python library contains a number of high performance "
  },
  {
    "path": "even-more-python-for-beginners-data-tools/04 - Examining Pandas DataFrame contents/04 - Exploring pandas DataFrame contents.ipynb",
    "chars": 10190,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Examining pandas DataFrame conten"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/04 - Examining Pandas DataFrame contents/README.md",
    "chars": 915,
    "preview": "# Examining pandas DataFrame contents\r\n\r\nThe pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/a"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/05 - Query a pandas Dataframe/05 - Querying DataFrames.ipynb",
    "chars": 21728,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Query a pandas DataFrame \\n\",\n   "
  },
  {
    "path": "even-more-python-for-beginners-data-tools/05 - Query a pandas Dataframe/README.md",
    "chars": 831,
    "preview": "# Query a pandas DataFrame\r\n\r\nThe pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.D"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/06 - CSV Files and Jupyter Notebooks/README.md",
    "chars": 514,
    "preview": "# CSV Files and Jupyter Notebooks\r\n\r\nCSV files are comma separated variable file. CSV files are frequently used to store"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/06 - CSV Files and Jupyter Notebooks/airports.csv",
    "chars": 207,
    "preview": "Name,City,Country\r\nSeattle-Tacoma,Seattle,USA\r\nDulles,Washington,USA\r\nHeathrow,London,United Kingdom\r\nSchiphol,Amsterda"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/07 - Read write CSV files.ipynb",
    "chars": 31603,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Read and write CSV files with pan"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/README.md",
    "chars": 890,
    "preview": "# Read and write CSV files from pandas DataFrames\r\n\r\nYou can populate a DataFrame with the data in a CSV file.\r\n\r\n## Com"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/airports.csv",
    "chars": 207,
    "preview": "Name,City,Country\r\nSeattle-Tacoma,Seattle,USA\r\nDulles,Washington,USA\r\nHeathrow,London,United Kingdom\r\nSchiphol,Amsterda"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/airportsBlankValues.csv",
    "chars": 198,
    "preview": "Name,City,Country\r\nSeattle-Tacoma,Seattle,USA\r\nDulles,Washington,USA\r\nHeathrow,London,United Kingdom\r\nSchiphol,,Netherl"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/airportsInvalidRows.csv",
    "chars": 208,
    "preview": "Name,City,Country\r\nSeattle-Tacoma,Seattle,USA\r\nDulles,Washington,USA\r\nHeathrow,London,,United Kingdom\r\nSchiphol,Amsterd"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/airportsNoHeaderRows.csv",
    "chars": 188,
    "preview": "Seattle-Tacoma,Seattle,USA\r\nDulles,Washington,USA\r\nHeathrow,London,United Kingdom\r\nSchiphol,Amsterdam,Netherlands\r\nChan"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/08 - Removing and splitting DataFrame columns/08 - Removing columns.ipynb",
    "chars": 21532,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Removing and splitting pandas Dat"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/08 - Removing and splitting DataFrame columns/README.md",
    "chars": 606,
    "preview": "# Removing and splitting DataFrame columns\r\n\r\nWhen preparing data for machine learning you may need to remove specific c"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/08 - Removing and splitting DataFrame columns/flight_delays.csv",
    "chars": 7538,
    "preview": "FL_DATE,OP_UNIQUE_CARRIER,TAIL_NUM,OP_CARRIER_FL_NUM,ORIGIN,DEST,CRS_DEP_TIME,DEP_TIME,DEP_DELAY,CRS_ARR_TIME,ARR_TIME,A"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/09 - Handling duplicates and rows with missing values/09 - Removing rows.ipynb",
    "chars": 19180,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Handling duplicate rows and rows "
  },
  {
    "path": "even-more-python-for-beginners-data-tools/09 - Handling duplicates and rows with missing values/README.md",
    "chars": 993,
    "preview": "# Handling duplicates and rows with missing values\r\n\r\nWhen preparing data for machine learning you need to remove duplic"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/09 - Handling duplicates and rows with missing values/airportsDuplicateRows.csv",
    "chars": 230,
    "preview": "Name,City,Country\r\nSeattle-Tacoma,Seattle,USA\r\nDulles,Washington,USA\r\nDulles,Washington,USA\r\nHeathrow,London,United Kin"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/10 - Splitting test and training data with scikit-learn/10 - Train Test split.ipynb",
    "chars": 14509,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Splitting test and training data\\"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/10 - Splitting test and training data with scikit-learn/README.md",
    "chars": 719,
    "preview": "# Splitting test and training data with scikit-learn\r\n\r\n[scikit-learn](https://scikit-learn.org/) is a library of tools "
  },
  {
    "path": "even-more-python-for-beginners-data-tools/11 - Train a linear regression model with scikit-learn/11 - Train a basic model.ipynb",
    "chars": 3400,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Train a linear regression model\\n"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/11 - Train a linear regression model with scikit-learn/README.md",
    "chars": 905,
    "preview": "# Train a linear regression model with scikit-learn\r\n\r\n[Linear regression](https://en.wikipedia.org/wiki/Linear_regressi"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/12 - Testing a model/12 - Test a model.ipynb",
    "chars": 12171,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Test a trained model\\n\",\n    \"Onc"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/12 - Testing a model/README.md",
    "chars": 917,
    "preview": "# Testing a model\r\n\r\nOnce a model is built it can be used to predict values. You can provide new values to see where it "
  },
  {
    "path": "even-more-python-for-beginners-data-tools/13 - Evaluating accuracy of a model using calculations/13 - Evaluate accuracy.ipynb",
    "chars": 6596,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Evaluating accuracy of a model us"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/13 - Evaluating accuracy of a model using calculations/README.md",
    "chars": 1474,
    "preview": "# Evaluating accuracy of a model using calculations\r\n\r\nPlaying with individual values isn't the best way to test a model"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/14 - NumPy vs Pandas/14 - Working with numpy and pandas.ipynb",
    "chars": 17998,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Moving data from numpy arrays to "
  },
  {
    "path": "even-more-python-for-beginners-data-tools/14 - NumPy vs Pandas/README.md",
    "chars": 1316,
    "preview": "# NumPy vs pandas\r\n\r\nThere are numerous libraries available for use for data scientists. NumPy and pandas are two of the"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/15 - Visualizing data with Matplotlib/15 - Visualizing correlations.ipynb",
    "chars": 3943,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"sourc"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/15 - Visualizing data with Matplotlib/README.md",
    "chars": 1141,
    "preview": "# Visualizing data with Matplotlib\r\n\r\n[Matplotlib](https://matplotlib.org/) gives you the ability to draw charts which c"
  },
  {
    "path": "even-more-python-for-beginners-data-tools/README.md",
    "chars": 2415,
    "preview": "# Even more Python for beginners - data tools\n\n## Overview\n\nData science and machine learning among the most popular fie"
  },
  {
    "path": "more-python-for-beginners/.gitignore",
    "chars": 2123,
    "preview": "\n# Created by https://www.gitignore.io/api/python,virtualenv,visualstudiocode\n# Edit at https://www.gitignore.io/?templa"
  },
  {
    "path": "more-python-for-beginners/01 - Formatting and linting/.vscode/settings.json",
    "chars": 262,
    "preview": "{\r\n    \"python.linting.pylintEnabled\": true,\r\n    \"python.linting.enabled\": true,\r\n    \"python.linting.flake8Enabled\": f"
  },
  {
    "path": "more-python-for-beginners/01 - Formatting and linting/README.md",
    "chars": 1156,
    "preview": "# Style Guidelines\r\n\r\n## Formatting\r\n\r\nFormatting makes code readable and easier to debug.\r\n\r\n## Documentation\r\n\r\n- [PEP"
  },
  {
    "path": "more-python-for-beginners/01 - Formatting and linting/bad.py",
    "chars": 146,
    "preview": "x = 12\nif x == 24:\n print('Is valid')\nelse:\n    print(\"Not valid\")\n\ndef helper(name='sample'):\n pass\n\ndef another(name ="
  },
  {
    "path": "more-python-for-beginners/01 - Formatting and linting/good.py",
    "chars": 180,
    "preview": "def print_hello(name: str) -> str:\n    \"\"\"\n    Greets the user by name\n\n\tParameters:\n\t\tname (str): The name of the user\n"
  },
  {
    "path": "more-python-for-beginners/02 - Lambdas/README.md",
    "chars": 477,
    "preview": "# Lambdas\r\n\r\nA [lambda](https://www.w3schools.com/python/python_lambda.asp) function is a small anonymous function. It c"
  },
  {
    "path": "more-python-for-beginners/02 - Lambdas/failed_sort.py",
    "chars": 251,
    "preview": "# This code will return an error because the sort method does not know\r\n# which presenter field to use when sorting\r\npre"
  },
  {
    "path": "more-python-for-beginners/02 - Lambdas/lambda_sorter.py",
    "chars": 348,
    "preview": "# Sort alphabetically\npresenters = [\n    {'name': 'Susan', 'age': 50},\n    {'name': 'Christopher', 'age': 47}\n]\n\npresent"
  },
  {
    "path": "more-python-for-beginners/02 - Lambdas/method_sorter.py",
    "chars": 180,
    "preview": "def sorter(item):\n    return item['name']\n\n\npresenters = [\n    {'name': 'Susan', 'age': 50},\n    {'name': 'Christopher',"
  },
  {
    "path": "more-python-for-beginners/03 - Classes/README.md",
    "chars": 456,
    "preview": "# Classes\r\n\r\n[Classes](https://docs.python.org/3/tutorial/classes.html) define data structures and behavior. Classes all"
  },
  {
    "path": "more-python-for-beginners/03 - Classes/basic_class.py",
    "chars": 229,
    "preview": "class Presenter():\n\tdef __init__(self, name):\n\t\t# Constructor\n\t\tself.name = name\n\tdef say_hello(self):\n\t\t# method\n\t\tprin"
  },
  {
    "path": "more-python-for-beginners/03 - Classes/properties_class.py",
    "chars": 360,
    "preview": "class Presenter():\n\tdef __init__(self, name):\n\t\t# Constructor\n\t\tself.name = name\n\n\t@property\n\tdef name(self):\n\t\tprint('R"
  },
  {
    "path": "more-python-for-beginners/04 - Inheritance/README.md",
    "chars": 607,
    "preview": "# Inheritance\r\n\r\n[Inheritance](https://docs.python.org/3/tutorial/classes.html#inheritance) allows you to define a class"
  },
  {
    "path": "more-python-for-beginners/04 - Inheritance/demo.py",
    "chars": 492,
    "preview": "class Person:\n\tdef __init__(self, name):\n\t\tself.name = name\n\tdef say_hello(self):\n\t\tprint('Hello, ' + self.name)\n\nclass "
  },
  {
    "path": "more-python-for-beginners/05 - Mixins/README.md",
    "chars": 829,
    "preview": "# Mixins (multiple inheritance)\r\n\r\nPython allows you to inherit from multiple classes. While the technical term for this"
  },
  {
    "path": "more-python-for-beginners/05 - Mixins/demo.py",
    "chars": 655,
    "preview": "class Loggable:\n    def __init__(self):\n        self.title = ''\n    def log(self):\n        print('Log message from ' + s"
  },
  {
    "path": "more-python-for-beginners/06 - Managing the file system/README.md",
    "chars": 319,
    "preview": "# Managing the file system\r\n\r\nPython's [pathlib](https://docs.python.org/3/library/pathlib.html) provides operations and"
  },
  {
    "path": "more-python-for-beginners/06 - Managing the file system/demo.txt",
    "chars": 11,
    "preview": "Lorem ipsum"
  },
  {
    "path": "more-python-for-beginners/06 - Managing the file system/directories.py",
    "chars": 382,
    "preview": "from pathlib import Path\ncwd = Path.cwd()\n\n# Get the parent directory\nparent = cwd.parent\n\n# Is this a directory?\nprint("
  },
  {
    "path": "more-python-for-beginners/06 - Managing the file system/files.py",
    "chars": 362,
    "preview": "from pathlib import Path\ncwd = Path.cwd()\n\ndemo_file = Path(Path.joinpath(cwd, 'demo.txt'))\n\n# Get the file name\nprint('"
  },
  {
    "path": "more-python-for-beginners/06 - Managing the file system/paths.py",
    "chars": 406,
    "preview": "# Python 3.6 or higher\n# Grab the library\nfrom pathlib import Path\n\n# What is the current working directory?\ncwd = Path."
  },
  {
    "path": "more-python-for-beginners/07 - Reading and writing files/README.md",
    "chars": 360,
    "preview": "# Working with files\r\n\r\nPython allows you to read and write from files. [io](https://docs.python.org/3/library/io.html) "
  },
  {
    "path": "more-python-for-beginners/07 - Reading and writing files/demo.txt",
    "chars": 108,
    "preview": "This is the first line of the file\nAnd this is the second line of the file\nThis is the last line of the file"
  },
  {
    "path": "more-python-for-beginners/07 - Reading and writing files/manage.py",
    "chars": 382,
    "preview": "# Open manage.txt file to write text\nstream = open('manage.txt', 'wt')\n\n#Write the word demo to the file stream\nstream.w"
  },
  {
    "path": "more-python-for-beginners/07 - Reading and writing files/read.py",
    "chars": 312,
    "preview": "# Open file demo.txt and read the contents\nstream = open('./demo.txt', 'rt')\nprint('\\nIs this readable? ' + str(stream.r"
  },
  {
    "path": "more-python-for-beginners/07 - Reading and writing files/write.py",
    "chars": 543,
    "preview": "# Open output.txt as a text file for writing\nstream = open('output.txt', 'wt')\n\nprint('\\nCan I write to this file? ' + s"
  },
  {
    "path": "more-python-for-beginners/08 - Managing external resources/README.md",
    "chars": 430,
    "preview": "# with\r\n\r\nThe [with](https://docs.python.org/3/reference/compound_stmts.html#with) statement allows you to simplify code"
  },
  {
    "path": "more-python-for-beginners/08 - Managing external resources/demo.py",
    "chars": 210,
    "preview": "try:\n\tstream = open('output.txt', 'wt')\n\tstream.write('Lorem ipsum dolar')\nfinally:\n\tstream.close() # THIS IS REALLY IMP"
  },
  {
    "path": "more-python-for-beginners/08 - Managing external resources/output.txt",
    "chars": 17,
    "preview": "Lorem ipsum dolar"
  },
  {
    "path": "more-python-for-beginners/09 - Asynchronous programming/README.md",
    "chars": 523,
    "preview": "# Asynchronous operations\r\n\r\nPython offers several options for managing long running operations asynchronously. [asyncio"
  },
  {
    "path": "more-python-for-beginners/09 - Asynchronous programming/async_demo.py",
    "chars": 1029,
    "preview": "from timeit import default_timer\nimport aiohttp\nimport asyncio\n\nasync def load_data(session, delay):\n    print(f'Startin"
  },
  {
    "path": "more-python-for-beginners/09 - Asynchronous programming/sync_demo.py",
    "chars": 480,
    "preview": "from timeit import default_timer\nimport requests\n\ndef load_data(delay):\n    print(f'Starting {delay} second timer')\n    "
  },
  {
    "path": "more-python-for-beginners/README.md",
    "chars": 2209,
    "preview": "# More Python for beginners\n\n## Overview\n\nWhen we created [Python for beginners](https://aka.ms/pythonbeginnerseries) we"
  },
  {
    "path": "more-python-for-beginners/requirements.txt",
    "chars": 16,
    "preview": "aiohttp\nrequests"
  },
  {
    "path": "python-for-beginners/02 - Print/README.md",
    "chars": 496,
    "preview": "# print\n\nThe print function allows you to send output to the terminal\n\n- [print](https://docs.python.org/3/library/funct"
  },
  {
    "path": "python-for-beginners/02 - Print/ask_for_input.py",
    "chars": 182,
    "preview": "# The input funciton allows you to prompt the user for a value\n# You need to declare a variable to hold the value entere"
  },
  {
    "path": "python-for-beginners/02 - Print/coding_challenge.py",
    "chars": 352,
    "preview": "# Here's a challenge for you to help you practice\n# See if you can fix the code below \n\n# print the message\nprint('Why w"
  },
  {
    "path": "python-for-beginners/02 - Print/coding_challenge_solution.py",
    "chars": 609,
    "preview": "# Here's a challenge for you to help you practice\n# See if you can fix the code below \n\n# print the message\n# There was "
  },
  {
    "path": "python-for-beginners/02 - Print/hello_world.py",
    "chars": 63,
    "preview": "# the print statement displays a message \nprint('Hello world')\n"
  },
  {
    "path": "python-for-beginners/02 - Print/print_blank_line.py",
    "chars": 351,
    "preview": "# Each print statements starts on a new line\nprint('Hello world')\n\n# If you pass nothing to the print statement you get "
  },
  {
    "path": "python-for-beginners/02 - Print/single_or_double_quotes.py",
    "chars": 162,
    "preview": "# Strings can be enclosed in single quotes\nprint('Hello world single quotes')\n\n# Strings can also be enclosed in double "
  },
  {
    "path": "python-for-beginners/03 - Comments/README.md",
    "chars": 225,
    "preview": "# Comments\n\nComments start with a hash character (#) and allow you to document your code.\nComments are ignored when code"
  },
  {
    "path": "python-for-beginners/03 - Comments/comments_are_not_executed.py",
    "chars": 124,
    "preview": "# This is a comment in my code it does nothing\n# print('Hello world')\n# print(\"Hello world\")\n# No output will be display"
  },
  {
    "path": "python-for-beginners/03 - Comments/comments_for_debugging.py",
    "chars": 59,
    "preview": "print('Hello world')\nprint('It's a small world after all')\n"
  },
  {
    "path": "python-for-beginners/03 - Comments/enable_pin.py",
    "chars": 435,
    "preview": "#The enable_pin method is not coded yet\n# I have created a dummy method so the code\n# will run without an error\n# Don't "
  },
  {
    "path": "python-for-beginners/03 - Comments/string_in_double_quotes.py",
    "chars": 129,
    "preview": "# Using double quotes for this string because \n# the string itself contains a single quote\nprint(\"It's a small world aft"
  },
  {
    "path": "python-for-beginners/04 - String variables/README.md",
    "chars": 597,
    "preview": "# Strings\n\nPython can store and manipulate strings. Strings can be enclosed in single or double quotes. There are a numb"
  },
  {
    "path": "python-for-beginners/04 - String variables/code_challenge.py",
    "chars": 328,
    "preview": "# ask a user to enter their first name and store it in a variable\n# ask a user to enter their last name and store it in "
  },
  {
    "path": "python-for-beginners/04 - String variables/code_challenge_solution.py",
    "chars": 484,
    "preview": "# ask a user to enter their first name and store it in a variable\nfirst_name = input('What is your first name? ')\n# ask "
  },
  {
    "path": "python-for-beginners/04 - String variables/combine_strings.py",
    "chars": 260,
    "preview": "# You can use the + operator to concatenate strings\nfirst_name = 'Susan'\nlast_name = 'Ibach'\nprint(first_name + last_nam"
  },
  {
    "path": "python-for-beginners/04 - String variables/format_strings.py",
    "chars": 342,
    "preview": "# Ask the user for their first and last name\nfirst_name = input('What is your first name? ')\nlast_name = input('What is "
  },
  {
    "path": "python-for-beginners/04 - String variables/string_functions.py",
    "chars": 569,
    "preview": "# There are a number of string functions you can use\n# on string variables\nsentence = 'The dog is named Sammy'\n\n# upper "
  },
  {
    "path": "python-for-beginners/04 - String variables/strings_in_variables.py",
    "chars": 128,
    "preview": "# You can store strings in variables\nfirst_name = 'Susan'\n\n# The variable can then be used later in your code\nprint(firs"
  },
  {
    "path": "python-for-beginners/05 - Numeric variables/README.md",
    "chars": 575,
    "preview": "# Numeric values\n\nPython can store and manipulate numbers. Python has two types of numeric values: integers (whole numbe"
  },
  {
    "path": "python-for-beginners/05 - Numeric variables/code_challenge.py",
    "chars": 250,
    "preview": "# Ask a user to enter a number\n# Ask a user to enter a second number\n# Calculate the total of the two numbers added toge"
  },
  {
    "path": "python-for-beginners/05 - Numeric variables/code_challenge_solution.py",
    "chars": 600,
    "preview": "# Ask a user to enter a number\nfirst_number = input('Enter a number: ')\n\n# Ask a user to enter a second number\nsecond_nu"
  },
  {
    "path": "python-for-beginners/05 - Numeric variables/combining_strings_and_numbers.py",
    "chars": 465,
    "preview": "days_in_feb = 28\n\n# The print function can accept numbers or strings\nprint(days_in_feb)\n\n# The + operator can either add"
  },
  {
    "path": "python-for-beginners/05 - Numeric variables/convert_strings_to_numbers_for_math.py",
    "chars": 446,
    "preview": "first_num = input('Enter first number ')\nsecond_num = input('Enter second number ')\n# If you have a string variable cont"
  },
  {
    "path": "python-for-beginners/05 - Numeric variables/doing_math.py",
    "chars": 452,
    "preview": "# Because the variables are assigned numeric values when created\n# Python knows they are numeric variables\nfirst_num = 6"
  },
  {
    "path": "python-for-beginners/05 - Numeric variables/numbers_treated_as_strings.py",
    "chars": 410,
    "preview": "# Python has to guess what datatype a variable should be\n\n# since the input function returns a string, the variables it "
  },
  {
    "path": "python-for-beginners/05 - Numeric variables/print_pi.py",
    "chars": 71,
    "preview": "# You can use variables to store numeric values\npi = 3.14159\nprint(pi)\n"
  },
  {
    "path": "python-for-beginners/06 - Dates/README.md",
    "chars": 668,
    "preview": "# Date values\n\nThe [datetime module](https://docs.python.org/3/library/datetime.html) contains a number of classes for m"
  },
  {
    "path": "python-for-beginners/06 - Dates/code_challenge.py",
    "chars": 122,
    "preview": "# print today's date\n# print yesterday's date\n# ask a user to enter a date\n# print the date one week from the date enter"
  },
  {
    "path": "python-for-beginners/06 - Dates/code_challenge_solution.py",
    "chars": 570,
    "preview": "from datetime import datetime, timedelta\n\n# print today's date\ncurrent_date = datetime.now()\nprint(current_date)\n\n# prin"
  },
  {
    "path": "python-for-beginners/06 - Dates/date_functions.py",
    "chars": 479,
    "preview": "#To get current date and time we need to use the datetime library\nfrom datetime import datetime, timedelta\n# The now fun"
  },
  {
    "path": "python-for-beginners/06 - Dates/format_date.py",
    "chars": 581,
    "preview": "#To get current date and time we need to use the datetime library\nfrom datetime import datetime\n\n# The now function retu"
  },
  {
    "path": "python-for-beginners/06 - Dates/get_current_date.py",
    "chars": 339,
    "preview": "#To get current date and time we need to use the datetime library\nfrom datetime import datetime\n\ncurrent_date = datetime"
  },
  {
    "path": "python-for-beginners/06 - Dates/input_date.py",
    "chars": 741,
    "preview": "# import the datetime and timedelta modules\nfrom datetime import datetime, timedelta\n\n# When you ask a user for a date t"
  },
  {
    "path": "python-for-beginners/07 - Error handling/README.md",
    "chars": 450,
    "preview": "# Error handling\n\nError handling in Python is managed through the use of [try/except/finally](https://docs.python.org/3."
  },
  {
    "path": "python-for-beginners/07 - Error handling/logic.py",
    "chars": 74,
    "preview": "x = 206\ny = 42\nif x < y:\n    print(str(x) + ' is greater than ' + str(y))\n"
  },
  {
    "path": "python-for-beginners/07 - Error handling/runtime.py",
    "chars": 251,
    "preview": "x = 42\ny = 0\ntry:\n    print(x / y)\nexcept ZeroDivisionError as e:\n    # Optionally, log e somewhere\n    print('Sorry, so"
  },
  {
    "path": "python-for-beginners/07 - Error handling/syntax.py",
    "chars": 45,
    "preview": "x = 42\ny = 206\nif x == y\n    print('Success')"
  },
  {
    "path": "python-for-beginners/08 - Handling conditions/README.md",
    "chars": 464,
    "preview": "# Handling conditions\n\nConditional execution can be completed using the [if](https://docs.python.org/3/reference/compoun"
  },
  {
    "path": "python-for-beginners/08 - Handling conditions/add_else.py",
    "chars": 405,
    "preview": "price = input('how much did you pay? ')\nprice = float(price)\n\nif price >= 1.00:\n\t# Anything that costs $1.00 or more is "
  },
  {
    "path": "python-for-beginners/08 - Handling conditions/add_else_different_indentation.py",
    "chars": 165,
    "preview": "price = 5.0\nif price >= 1.00:\n\ttax = .07\nelse:\n\ttax = 0\n# the print statement below is not indented so is executed after"
  },
  {
    "path": "python-for-beginners/08 - Handling conditions/case_insensitive_comparisons.py",
    "chars": 304,
    "preview": "country = 'CANADA'\n# by converting the string entered to lowercase and comparing it to a string\n# that is all lowercase "
  },
  {
    "path": "python-for-beginners/08 - Handling conditions/check_tax.py",
    "chars": 325,
    "preview": "#Calculate the tax\n# Anything purchased for more than $1.00 is charged a 7% tax\nprice = input('how much did you pay? ')\n"
  },
  {
    "path": "python-for-beginners/08 - Handling conditions/code_challenge.py",
    "chars": 412,
    "preview": "# Fix the mistakes in this code and test based on the description below\n# If I enter 2.00 I should see the message \"Tax "
  },
  {
    "path": "python-for-beginners/08 - Handling conditions/code_challenge_solution.py",
    "chars": 431,
    "preview": "# Fix the mistakes in this code and test using the following\n# If I enter 2.00 I should see the message \"Tax rate is: 0."
  },
  {
    "path": "python-for-beginners/08 - Handling conditions/comparing_strings.py",
    "chars": 251,
    "preview": "country = input('Enter the name of your home country: ')\nif country == 'canada':\n\t# string comparisons are case sensitiv"
  },
  {
    "path": "python-for-beginners/09 - Handling multiple conditions/README.md",
    "chars": 787,
    "preview": "# Handling conditions\n\nConditional execution can be completed using the [if](https://docs.python.org/3/reference/compoun"
  },
  {
    "path": "python-for-beginners/09 - Handling multiple conditions/add_else_to_elif.py",
    "chars": 206,
    "preview": "province = input(\"What province do you live in? \")\ntax = 0\n\nif province == 'Alberta':\n\ttax = 0.05\nelif province == 'Nuna"
  },
  {
    "path": "python-for-beginners/09 - Handling multiple conditions/code_challenge.py",
    "chars": 553,
    "preview": "# Ask a user their name\n# If their first name starts with A or B \n# tell them they go to room AB\n# IF their first name s"
  },
  {
    "path": "python-for-beginners/09 - Handling multiple conditions/code_challenge_solution.py",
    "chars": 1037,
    "preview": "# Assign people to different rooms when they check in based on their names\n# When you are done\n# Anna should be in room "
  },
  {
    "path": "python-for-beginners/09 - Handling multiple conditions/multiple_if_statements.py",
    "chars": 187,
    "preview": "\nprovince = input(\"What province do you live in? \")\ntax = 0\n\nif province == 'Alberta': \n\ttax = 0.05\nif province == 'Nuna"
  },
  {
    "path": "python-for-beginners/09 - Handling multiple conditions/nested_if.py",
    "chars": 298,
    "preview": "country = input(\"What country do you live in? \")\n\nif country.lower() == 'canada':\n\tprovince = input(\"What province/state"
  },
  {
    "path": "python-for-beginners/09 - Handling multiple conditions/or_statements.py",
    "chars": 196,
    "preview": "province = input(\"What province do you live in? \")\ntax = 0\nif province == 'Alberta' \\\n   or province == 'Nunavut':\n\ttax "
  },
  {
    "path": "python-for-beginners/09 - Handling multiple conditions/use_elif.py",
    "chars": 188,
    "preview": "province = input(\"What province do you live in? \")\ntax = 0\nif province == 'Alberta': \n\ttax = 0.05\nelif province == 'Nuna"
  },
  {
    "path": "python-for-beginners/09 - Handling multiple conditions/use_in_statements.py",
    "chars": 316,
    "preview": "province = input(\"What province do you live in? \")\ntax = 0\n# If multiple values cause the same output you can combine th"
  },
  {
    "path": "python-for-beginners/10 - Complex conditon checks/boolean_variables.py",
    "chars": 535,
    "preview": "# I check to see if the requirements for honour roll are met\ngpa = float(input('What was your Grade Point Average? '))\nl"
  },
  {
    "path": "python-for-beginners/10 - Complex conditon checks/code_challenge.py",
    "chars": 1028,
    "preview": "# When you join a hockey team you get your name on the back of the jersey\n# but the jersey may not be big enough to hold"
  },
  {
    "path": "python-for-beginners/10 - Complex conditon checks/code_challenge_solution.py",
    "chars": 1347,
    "preview": "# When you join a hockey team you get your name on the back of the jersey\n# but the jersey may not be big enough to hold"
  },
  {
    "path": "python-for-beginners/10 - Complex conditon checks/readme.md",
    "chars": 931,
    "preview": "# Complex condition checks\n\nConditional execution can be completed using the [if](https://docs.python.org/3/reference/co"
  },
  {
    "path": "python-for-beginners/10 - Complex conditon checks/using_and.py",
    "chars": 319,
    "preview": "# A student makes honour roll if their average is >=85\n# and their lowest grade is not below 70\ngpa = float(input('What "
  },
  {
    "path": "python-for-beginners/11 - Collections/README.md",
    "chars": 996,
    "preview": "# Collections\n\nCollections are groups of items. Python supports several types of collections. Three of the most common a"
  },
  {
    "path": "python-for-beginners/11 - Collections/arrays.py",
    "chars": 93,
    "preview": "from array import array\nscores = array('d')\nscores.append(97)\nscores.append(98)\nprint(scores)"
  },
  {
    "path": "python-for-beginners/11 - Collections/common-operations.py",
    "chars": 136,
    "preview": "names = ['Christopher', 'Susan']\nprint(len(names)) # Get the number of items\nnames.insert(0, 'Bill') # Insert before ind"
  },
  {
    "path": "python-for-beginners/11 - Collections/dictionaries.py",
    "chars": 98,
    "preview": "person = {'first': 'Christopher'}\nperson['last'] = 'Harrison'\nprint(person)\nprint(person['first'])"
  },
  {
    "path": "python-for-beginners/11 - Collections/lists.py",
    "chars": 108,
    "preview": "names = ['Christopher', 'Susan']\nscores = []\nscores.append(98)\nscores.append(99)\nprint(names)\nprint(scores)\n"
  },
  {
    "path": "python-for-beginners/11 - Collections/ranges.py",
    "chars": 172,
    "preview": "names = ['Susan', 'Christopher', 'Bill']\npresenters = names[0:2] # Get the first two items\n# Starting index and number o"
  },
  {
    "path": "python-for-beginners/12 - Loops/README.md",
    "chars": 597,
    "preview": "# Loops\n\n## For loops\n\n[For loops](https://docs.python.org/3/reference/compound_stmts.html#the-for-statement) takes each"
  },
  {
    "path": "python-for-beginners/12 - Loops/for.py",
    "chars": 51,
    "preview": "for name in ['Christopher', 'Susan']:\n\tprint(name)\n"
  },
  {
    "path": "python-for-beginners/12 - Loops/number.py",
    "chars": 178,
    "preview": "# range creates an array\n# First parameter is the starter\n# Second indicates the number of numbers to create\n# range(0, "
  },
  {
    "path": "python-for-beginners/12 - Loops/while.py",
    "chars": 135,
    "preview": "names = ['Christopher', 'Susan']\nindex = 0\nwhile index < len(names):\n\tprint(names[index])\n\t# Change the condition!!\n\tind"
  },
  {
    "path": "python-for-beginners/13 - Functions/README.md",
    "chars": 450,
    "preview": "# Functions\n\nFunctions allow you to take code that is repeated and move it to a module that can be called when needed. F"
  },
  {
    "path": "python-for-beginners/13 - Functions/code_challenge.py",
    "chars": 641,
    "preview": "# Create a calculator function\n# The function should accept three parameters:\n# first_number: a numeric value for the ma"
  },
  {
    "path": "python-for-beginners/13 - Functions/code_challenge_solution.py",
    "chars": 1095,
    "preview": "# Create a calculator function\n# The function should accept three parameters:\n# first_number: a numeric value for the ma"
  },
  {
    "path": "python-for-beginners/13 - Functions/get_initails_function.py",
    "chars": 884,
    "preview": "# Create function get_initial to accept a name and \n# return the first letter of the name in uppercase\n# Parameters:\n#  "
  },
  {
    "path": "python-for-beginners/13 - Functions/get_initials.py",
    "chars": 389,
    "preview": "# Ask for a name and return the initials\nfirst_name = input('Enter your first name: ')\nfirst_name_initial = first_name[0"
  },
  {
    "path": "python-for-beginners/13 - Functions/getting_clever_with_functions_harder_to_read.py",
    "chars": 675,
    "preview": "# Create function get_initial to accept a name and \n# return the first letter of the name in uppercase\n# Parameters:\n#  "
  },
  {
    "path": "python-for-beginners/13 - Functions/print_time_function.py",
    "chars": 417,
    "preview": "import datetime\n# Create a function called print_time\n# This function will print the message and current time\ndef print_"
  },
  {
    "path": "python-for-beginners/13 - Functions/print_time_function_different_messages.py",
    "chars": 278,
    "preview": "from datetime import datetime\n\n# What if we want different messages displayed?\n# Can we still use a function?\nfirst_name"
  },
  {
    "path": "python-for-beginners/13 - Functions/print_time_function_fix_import.py",
    "chars": 504,
    "preview": "# Import datetime class from datetime library to simplify calls to datetime.now()\nfrom datetime import datetime\n\n# Creat"
  },
  {
    "path": "python-for-beginners/13 - Functions/print_time_repeated_code.py",
    "chars": 292,
    "preview": "import datetime\n# print timestamps after each section of code\n# to see how long sections of code take to run\n\nfirst_name"
  },
  {
    "path": "python-for-beginners/13 - Functions/print_time_with_message_parameter.py",
    "chars": 588,
    "preview": "from datetime import datetime\n\n# Define a function to print the current time and task name\n# Function the following para"
  },
  {
    "path": "python-for-beginners/14 - Function parameters/code_challenge.py",
    "chars": 724,
    "preview": "# Create a calculator function\n# The function should accept three parameters:\n# first_number: a numeric value for the ma"
  },
  {
    "path": "python-for-beginners/14 - Function parameters/code_challenge_solution.py",
    "chars": 1097,
    "preview": "# Create a calculator function\n# The function should accept three parameters:\n# first_number: a numeric value for the ma"
  },
  {
    "path": "python-for-beginners/14 - Function parameters/get_initials_default_values.py",
    "chars": 726,
    "preview": "# Create a function to return the first initial of a name\n# Parameters:\n#   name: name of person\n#   force_uppercase: in"
  },
  {
    "path": "python-for-beginners/14 - Function parameters/get_initials_function.py",
    "chars": 521,
    "preview": "# This function will take a name and return the \n# Create a function to return the first initial of a name\n# Parameters:"
  },
  {
    "path": "python-for-beginners/14 - Function parameters/get_initials_multiple_parameters.py",
    "chars": 679,
    "preview": "# Create a function to return the first initial of a name\n# Parameters:\n#   name: name of person\n#   force_uppercase: in"
  },
  {
    "path": "python-for-beginners/14 - Function parameters/get_initials_named_parameters.py",
    "chars": 760,
    "preview": "# Create a function to return the first initial of a name\n# Parameters:\n#   name: name of person\n#   force_uppercase: in"
  },
  {
    "path": "python-for-beginners/14 - Function parameters/named_parameters_make_code_readable.py",
    "chars": 1581,
    "preview": "# Create a function to handle errors that occur during code execution\n# This will display a message to the user adn may "
  },
  {
    "path": "python-for-beginners/14 - Function parameters/readme.md",
    "chars": 1214,
    "preview": "# Function parameters\n\nFunctions allow you to take code that is repeated and move it to a module that can be called when"
  },
  {
    "path": "python-for-beginners/15 - Packages/README.md",
    "chars": 1191,
    "preview": "# Packages and modules\n\n## Modules\n\n[Modules](https://docs.python.org/3/tutorial/modules.html) allow you to store reusab"
  },
  {
    "path": "python-for-beginners/15 - Packages/color_import_demo.py",
    "chars": 209,
    "preview": "import colorama\n\ncolorama.init()\nprint(colorama.Fore.RED + 'This is red')\n\nfrom colorama import *\n\ninit()\nprint(Fore.BLU"
  },
  {
    "path": "python-for-beginners/15 - Packages/helpers.py",
    "chars": 105,
    "preview": "def display(message, is_warning=False):\n    if is_warning:\n        print('Warning!!')\n    print(message)\n"
  },
  {
    "path": "python-for-beginners/15 - Packages/import_module.py",
    "chars": 261,
    "preview": "# import module as namespace\nimport helpers\nhelpers.display('Not a warning')\n\n# import all into current namespace\nfrom h"
  },
  {
    "path": "python-for-beginners/15 - Packages/requirements.txt",
    "chars": 8,
    "preview": "colorama"
  },
  {
    "path": "python-for-beginners/16 - Calling APIs/call_api.py",
    "chars": 1994,
    "preview": "# This code will show you how to call the Computer Vision API from Python\n# You can find documentation on the Computer V"
  },
  {
    "path": "python-for-beginners/16 - Calling APIs/code_challenge.py",
    "chars": 1232,
    "preview": "# Challenge #1\n# Create an Azure Custom Vision Service \n# Analyze an image and return the JSON describing the image.\n# c"
  },
  {
    "path": "python-for-beginners/16 - Calling APIs/readme.md",
    "chars": 1467,
    "preview": "# Calling APIs\n\nYou can call functions called by other programs hosted on web servers. [Microsoft Azure Cognitive Servic"
  },
  {
    "path": "python-for-beginners/17 - JSON/create_json_from_dict.py",
    "chars": 296,
    "preview": "import json\n\n# Create a dictionary object\nperson_dict = {'first': 'Christopher', 'last':'Harrison'}\n# Add additional key"
  },
  {
    "path": "python-for-beginners/17 - JSON/create_json_with_list.py",
    "chars": 491,
    "preview": "import json\n\n# Create a dictionary object\nperson_dict = {'first': 'Christopher', 'last':'Harrison'}\n# Add additional key"
  },
  {
    "path": "python-for-beginners/17 - JSON/create_json_with_nested_dict.py",
    "chars": 434,
    "preview": "import json\n\n# Create a dictionary object\nperson_dict = {'first': 'Christopher', 'last':'Harrison'}\n# Add additional key"
  },
  {
    "path": "python-for-beginners/17 - JSON/read_json.py",
    "chars": 2252,
    "preview": "# This code will show you how to call the Computer Vision API from Python\n# You can find documentation on the Computer V"
  },
  {
    "path": "python-for-beginners/17 - JSON/read_key_pair.py",
    "chars": 2020,
    "preview": "# This code will show you how to call the Computer Vision API from Python\n# You can find documentation on the Computer V"
  },
  {
    "path": "python-for-beginners/17 - JSON/read_key_pair_list.py",
    "chars": 2182,
    "preview": "# This code will show you how to call the Computer Vision API from Python\n# You can find documentation on the Computer V"
  },
  {
    "path": "python-for-beginners/17 - JSON/read_subkey.py",
    "chars": 2072,
    "preview": "# This code will show you how to call the Computer Vision API from Python\n# You can find documentation on the Computer V"
  },
  {
    "path": "python-for-beginners/17 - JSON/readme.md",
    "chars": 642,
    "preview": "# JSON\n\nMany APIs return data in [JSON](https://json.org/), JavaScript Object Notation. JSON is a standard format that c"
  },
  {
    "path": "python-for-beginners/18 - Decorators/README.md",
    "chars": 527,
    "preview": "# Decorators\n\n[Decorators](https://www.python.org/dev/peps/pep-0318/) are similar to attributes in that they add meaning"
  },
  {
    "path": "python-for-beginners/18 - Decorators/creating_decorators.py",
    "chars": 399,
    "preview": "import functools\nfrom colorama import init, Fore\ninit()\n\ndef color(color):\n    def wrapper(func):\n        @functools.wra"
  },
  {
    "path": "python-for-beginners/README.md",
    "chars": 2582,
    "preview": "# Python for beginners\r\n\r\n## Overview\r\n\r\nGetting started with a new environment can be challenging, especially when you "
  }
]

// ... and 49 more files

