Repository: microsoft/c9-python-getting-started Branch: master Commit: a74d0ea8451a Files: 231 Total size: 149.7 MB Directory structure: gitextract_0t_49sca/ ├── .gitignore ├── CODE_OF_CONDUCT.md ├── LICENSE ├── README.md ├── SECURITY.md ├── even-more-python-for-beginners-data-tools/ │ ├── 01 - Jupyter Notebooks/ │ │ └── README.md │ ├── 02 - Introduction to Anaconda and Conda/ │ │ └── README.md │ ├── 03 - Intro to Pandas/ │ │ ├── 03 - Pandas Series and DataFrame.ipynb │ │ └── README.md │ ├── 04 - Examining Pandas DataFrame contents/ │ │ ├── 04 - Exploring pandas DataFrame contents.ipynb │ │ └── README.md │ ├── 05 - Query a pandas Dataframe/ │ │ ├── 05 - Querying DataFrames.ipynb │ │ └── README.md │ ├── 06 - CSV Files and Jupyter Notebooks/ │ │ ├── README.md │ │ └── airports.csv │ ├── 07 - Read and write CSV files from pandas DataFrames/ │ │ ├── 07 - Read write CSV files.ipynb │ │ ├── README.md │ │ ├── airports.csv │ │ ├── airportsBlankValues.csv │ │ ├── airportsInvalidRows.csv │ │ └── airportsNoHeaderRows.csv │ ├── 08 - Removing and splitting DataFrame columns/ │ │ ├── 08 - Removing columns.ipynb │ │ ├── README.md │ │ └── flight_delays.csv │ ├── 09 - Handling duplicates and rows with missing values/ │ │ ├── 09 - Removing rows.ipynb │ │ ├── Lots_of_flight_data.csv │ │ ├── README.md │ │ └── airportsDuplicateRows.csv │ ├── 10 - Splitting test and training data with scikit-learn/ │ │ ├── 10 - Train Test split.ipynb │ │ ├── Lots_of_flight_data.csv │ │ └── README.md │ ├── 11 - Train a linear regression model with scikit-learn/ │ │ ├── 11 - Train a basic model.ipynb │ │ ├── Lots_of_flight_data.csv │ │ └── README.md │ ├── 12 - Testing a model/ │ │ ├── 12 - Test a model.ipynb │ │ ├── Lots_of_flight_data.csv │ │ └── README.md │ ├── 13 - Evaluating accuracy of a model using calculations/ │ │ ├── 13 - Evaluate accuracy.ipynb │ │ ├── Lots_of_flight_data.csv │ │ └── README.md │ ├── 14 - NumPy vs Pandas/ │ │ ├── 14 - Working with numpy and pandas.ipynb │ │ ├── 
Lots_of_flight_data.csv │ │ └── README.md │ ├── 15 - Visualizing data with Matplotlib/ │ │ ├── 15 - Visualizing correlations.ipynb │ │ ├── Lots_of_flight_data.csv │ │ └── README.md │ ├── README.md │ └── Slides/ │ ├── 01 - Jupyter Notebooks.pptx │ ├── 02 - Intro to Anaconda and conda.pptx │ ├── 03 - Pandas series and DataFrame.pptx │ ├── 04 - Examining pandas DataFrame contents.pptx │ ├── 05 - Query a pandas DataFrame.pptx │ ├── 06 - CSV Files and Jupyter notebooks.pptx │ ├── 07 - Read and write CSV files from DataFrames.pptx │ ├── 08 - Remove columns from DataFrame.pptx │ ├── 09 - Remove rows with missing values.pptx │ ├── 10 - Splitting test and training data.pptx │ ├── 11 - Train a linear regression model with scikitlearn.pptx │ ├── 12 - Testing a model.pptx │ ├── 13 - Evaluate accuracy of a model using calculations.pptx │ ├── 14 - Working with numpy and pandas.pptx │ └── 15 - Visualizing Data Correlations with Matplotlib.pptx ├── more-python-for-beginners/ │ ├── .gitignore │ ├── 01 - Formatting and linting/ │ │ ├── .vscode/ │ │ │ └── settings.json │ │ ├── README.md │ │ ├── bad.py │ │ └── good.py │ ├── 02 - Lambdas/ │ │ ├── README.md │ │ ├── failed_sort.py │ │ ├── lambda_sorter.py │ │ └── method_sorter.py │ ├── 03 - Classes/ │ │ ├── README.md │ │ ├── basic_class.py │ │ └── properties_class.py │ ├── 04 - Inheritance/ │ │ ├── README.md │ │ └── demo.py │ ├── 05 - Mixins/ │ │ ├── README.md │ │ └── demo.py │ ├── 06 - Managing the file system/ │ │ ├── README.md │ │ ├── demo.txt │ │ ├── directories.py │ │ ├── files.py │ │ └── paths.py │ ├── 07 - Reading and writing files/ │ │ ├── README.md │ │ ├── demo.txt │ │ ├── manage.py │ │ ├── read.py │ │ └── write.py │ ├── 08 - Managing external resources/ │ │ ├── README.md │ │ ├── demo.py │ │ └── output.txt │ ├── 09 - Asynchronous programming/ │ │ ├── README.md │ │ ├── async_demo.py │ │ └── sync_demo.py │ ├── README.md │ ├── Slides/ │ │ ├── 01 - Formatting and linting.pptx │ │ ├── 02 - Lambdas.pptx │ │ ├── 03 - Classes.pptx │ │ 
├── 04 - Inhheritance.pptx │ │ ├── 05 - Mixins (multiple inheritance).pptx │ │ ├── 06 - Managing the file system.pptx │ │ ├── 07 - Working with files.pptx │ │ ├── 08 - Cleanup with with.pptx │ │ └── 09 - Asynchronous operations.pptx │ └── requirements.txt └── python-for-beginners/ ├── 02 - Print/ │ ├── README.md │ ├── ask_for_input.py │ ├── coding_challenge.py │ ├── coding_challenge_solution.py │ ├── hello_world.py │ ├── print_blank_line.py │ └── single_or_double_quotes.py ├── 03 - Comments/ │ ├── README.md │ ├── comments_are_not_executed.py │ ├── comments_for_debugging.py │ ├── enable_pin.py │ └── string_in_double_quotes.py ├── 04 - String variables/ │ ├── README.md │ ├── code_challenge.py │ ├── code_challenge_solution.py │ ├── combine_strings.py │ ├── format_strings.py │ ├── string_functions.py │ └── strings_in_variables.py ├── 05 - Numeric variables/ │ ├── README.md │ ├── code_challenge.py │ ├── code_challenge_solution.py │ ├── combining_strings_and_numbers.py │ ├── convert_strings_to_numbers_for_math.py │ ├── doing_math.py │ ├── numbers_treated_as_strings.py │ └── print_pi.py ├── 06 - Dates/ │ ├── README.md │ ├── code_challenge.py │ ├── code_challenge_solution.py │ ├── date_functions.py │ ├── format_date.py │ ├── get_current_date.py │ └── input_date.py ├── 07 - Error handling/ │ ├── README.md │ ├── logic.py │ ├── runtime.py │ └── syntax.py ├── 08 - Handling conditions/ │ ├── README.md │ ├── add_else.py │ ├── add_else_different_indentation.py │ ├── case_insensitive_comparisons.py │ ├── check_tax.py │ ├── code_challenge.py │ ├── code_challenge_solution.py │ └── comparing_strings.py ├── 09 - Handling multiple conditions/ │ ├── README.md │ ├── add_else_to_elif.py │ ├── code_challenge.py │ ├── code_challenge_solution.py │ ├── multiple_if_statements.py │ ├── nested_if.py │ ├── or_statements.py │ ├── use_elif.py │ └── use_in_statements.py ├── 10 - Complex conditon checks/ │ ├── boolean_variables.py │ ├── code_challenge.py │ ├── code_challenge_solution.py │ ├── 
readme.md │ └── using_and.py ├── 11 - Collections/ │ ├── README.md │ ├── arrays.py │ ├── common-operations.py │ ├── dictionaries.py │ ├── lists.py │ └── ranges.py ├── 12 - Loops/ │ ├── README.md │ ├── for.py │ ├── number.py │ └── while.py ├── 13 - Functions/ │ ├── README.md │ ├── code_challenge.py │ ├── code_challenge_solution.py │ ├── get_initails_function.py │ ├── get_initials.py │ ├── getting_clever_with_functions_harder_to_read.py │ ├── print_time_function.py │ ├── print_time_function_different_messages.py │ ├── print_time_function_fix_import.py │ ├── print_time_repeated_code.py │ └── print_time_with_message_parameter.py ├── 14 - Function parameters/ │ ├── code_challenge.py │ ├── code_challenge_solution.py │ ├── get_initials_default_values.py │ ├── get_initials_function.py │ ├── get_initials_multiple_parameters.py │ ├── get_initials_named_parameters.py │ ├── named_parameters_make_code_readable.py │ └── readme.md ├── 15 - Packages/ │ ├── README.md │ ├── color_import_demo.py │ ├── helpers.py │ ├── import_module.py │ └── requirements.txt ├── 16 - Calling APIs/ │ ├── call_api.py │ ├── code_challenge.py │ └── readme.md ├── 17 - JSON/ │ ├── create_json_from_dict.py │ ├── create_json_with_list.py │ ├── create_json_with_nested_dict.py │ ├── read_json.py │ ├── read_key_pair.py │ ├── read_key_pair_list.py │ ├── read_subkey.py │ └── readme.md ├── 18 - Decorators/ │ ├── README.md │ └── creating_decorators.py ├── README.md └── Slides/ ├── 0 - Intro.pptx ├── 1 - Getting started.pptx ├── 10 - ComplexConditionChecks.pptx ├── 11 - Collections.pptx ├── 12 - Loops.pptx ├── 13 - Functions.pptx ├── 14 - FunctionParameters.pptx ├── 15 - ModulesPackages.pptx ├── 16 - CallingAPI.pptx ├── 17 - JSON.pptx ├── 2 - Print.pptx ├── 3 - Comments.pptx ├── 4 - StringVariables.pptx ├── 5 - NumericVariables.pptx ├── 6 - Dates.pptx ├── 7 - ErrorHandling.pptx ├── 8 - Conditions.pptx └── 9 - MultipleConditions.pptx ================================================ FILE CONTENTS 
================================================ ================================================ FILE: .gitignore ================================================ env # Created by https://www.gitignore.io/api/python,visualstudiocode # Edit at https://www.gitignore.io/?templates=python,visualstudiocode ### Python ### # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] *$py.class # C extensions *.so # Distribution / packaging .Python build/ develop-eggs/ dist/ downloads/ eggs/ .eggs/ lib/ lib64/ parts/ sdist/ var/ wheels/ pip-wheel-metadata/ share/python-wheels/ *.egg-info/ .installed.cfg *.egg MANIFEST # PyInstaller # Usually these files are written by a python script from a template # before PyInstaller builds the exe, so as to inject date/other infos into it. *.manifest *.spec # Installer logs pip-log.txt pip-delete-this-directory.txt # Unit test / coverage reports htmlcov/ .tox/ .nox/ .coverage .coverage.* .cache nosetests.xml coverage.xml *.cover .hypothesis/ .pytest_cache/ # Translations *.mo *.pot # Django stuff: *.log local_settings.py db.sqlite3 # Flask stuff: instance/ .webassets-cache # Scrapy stuff: .scrapy # Sphinx documentation docs/_build/ # PyBuilder target/ # Jupyter Notebook .ipynb_checkpoints # IPython profile_default/ ipython_config.py # pyenv .python-version # pipenv # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. # However, in case of collaboration, if having platform-specific dependencies or dependencies # having no cross-platform support, pipenv may install dependencies that don’t work, or not # install all needed dependencies. 
#Pipfile.lock # celery beat schedule file celerybeat-schedule # SageMath parsed files *.sage.py # Environments .env .venv env/ venv/ ENV/ env.bak/ venv.bak/ # Spyder project settings .spyderproject .spyproject # Rope project settings .ropeproject # mkdocs documentation /site # mypy .mypy_cache/ .dmypy.json dmypy.json # Pyre type checker .pyre/ ### VisualStudioCode ### .vscode/* !.vscode/settings.json !.vscode/tasks.json !.vscode/launch.json !.vscode/extensions.json ### VisualStudioCode Patch ### # Ignore all local history of files .history # End of https://www.gitignore.io/api/python,visualstudiocode ================================================ FILE: CODE_OF_CONDUCT.md ================================================ # Microsoft Open Source Code of Conduct This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/?WT.mc_id=python-c9-niner). Resources: - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/?WT.mc_id=python-c9-niner) - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/?WT.mc_id=python-c9-niner) - Contact [opencode@microsoft.com](mailto:opencode@microsoft.com?WT.mc_id=python-c9-niner) with questions or concerns ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) Microsoft Corporation. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE ================================================ FILE: README.md ================================================ # Getting started with Python ## Overview These three series on Channel 9 and YouTube are designed to help get you up to speed on Python. If you're a beginning developer looking to add Python to your quiver of languages, or trying to get started on a data science or web project that uses Python, these videos show you the foundations you need to walk through a tutorial or other quick start. We do assume you are familiar with another programming language and some core programming concepts. For example, we highlight the syntax for boolean expressions and creating classes, but we don't dig into what a [boolean](https://en.wikipedia.org/wiki/Boolean_data_type) is or how [object oriented design](https://en.wikipedia.org/wiki/Object-oriented_design) works. We show you how to perform in Python the tasks you're familiar with from other languages. ### What you'll learn - The basics of Python - Common syntax - Popular packages ## Prerequisites - Light experience with another programming language, such as [JavaScript](https://www.edx.org/course/javascript-introduction), [Java](https://www.java.com) or [C#](https://docs.microsoft.com/dotnet/csharp/) - [An understanding of Git](https://git-scm.com/book/en/v1/Getting-Started) ## Courses ### Getting started [Python for beginners](https://aka.ms/pythonbeginnerseries) is the perfect place to start. No Python experience is required!
We'll show you how to set up [Visual Studio Code](https://code.visualstudio.com?WT.mc_id=python-c9-niner) as your code editor, and start creating Python code. You'll see how to create, structure and run your code, how to manage packages, and even make [REST calls](https://en.wikipedia.org/wiki/Representational_state_transfer). ### Dig a little deeper [More Python for beginners](https://aka.ms/morepython) digs deeper into Python syntax. You'll explore how to create classes and mixins in Python, how to work with the file system, and get an introduction to `async/await`. This is the perfect next step if you're looking to see a bit more of what Python can do. ### Peek at data science tools [Even more Python for beginners](https://aka.ms/evenmorepython) is a practical exploration of a couple of the most common packages and tools you'll use when working with data and machine learning. While we won't dig into why you choose particular machine learning models (that's another course), you will get hands-on with Jupyter Notebooks, and create and test models using scikit-learn and pandas. ## Next steps The goal of these courses is to help get you up to speed on Python so you can work through a quick start, so the next step after completing the videos is to follow a tutorial!
Here are a few of our favorites: - [Quickstart: Detect faces in an image using the Face REST API and Python](https://docs.microsoft.com/azure/cognitive-services/face/QuickStarts/Python?WT.mc_id=python-c9-niner?WT.mc_id=python-c9-niner) - [Quickstart: Analyze a local image using the Computer Vision REST API and Python](https://docs.microsoft.com/azure/cognitive-services/computer-vision/quickstarts/python-disk?WT.mc_id=python-c9-niner?WT.mc_id=python-c9-niner) - [Quickstart: Using the Python REST API to call the Text Analytics Cognitive Service](https://docs.microsoft.com/azure/cognitive-services/Text-Analytics/quickstarts/python?WT.mc_id=python-c9-niner?WT.mc_id=python-c9-niner) - [Tutorial: Build a Flask app with Azure Cognitive Services](https://docs.microsoft.com/azure/cognitive-services/translator/tutorial-build-flask-app-translation-synthesis?WT.mc_id=python-c9-niner) - [Flask tutorial in Visual Studio Code](https://code.visualstudio.com/docs/python/tutorial-flask?WT.mc_id=python-c9-niner) - [Django tutorial in Visual Studio Code](https://code.visualstudio.com/docs/python/tutorial-django?WT.mc_id=python-c9-niner) - [Predict flight delays by creating a machine learning model in Python](https://docs.microsoft.com/learn/modules/predict-flight-delays-with-python?WT.mc_id=python-c9-niner) ## Contributing This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com. When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA. 
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. ================================================ FILE: SECURITY.md ================================================ ## Security Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [many more](https://opensource.microsoft.com/?WT.mc_id=python-c9-niner). If you believe you have found a security vulnerability in any Microsoft-owned repository that meets Microsoft's [definition](https://docs.microsoft.com/previous-versions/tn-archive/cc751383(v=technet.10)?WT.mc_id=python-c9-niner) of a security vulnerability, please report it to us as described below. ## Reporting Security Issues **Please do not report security vulnerabilities through public GitHub issues.** Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report?WT.mc_id=python-c9-niner). If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/msrc/pgp-key-msrc?WT.mc_id=python-c9-niner). You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message.
Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc?WT.mc_id=python-c9-niner). Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue: * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.) * Full paths of source file(s) related to the manifestation of the issue * The location of the affected source code (tag/branch/commit or direct URL) * Any special configuration required to reproduce the issue * Step-by-step instructions to reproduce the issue * Proof-of-concept or exploit code (if possible) * Impact of the issue, including how an attacker might exploit the issue This information will help us triage your report more quickly. If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty?WT.mc_id=python-c9-niner) page for more details about our active programs. ## Preferred Languages We prefer all communications to be in English. ## Policy Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/msrc/cvd?WT.mc_id=python-c9-niner). ================================================ FILE: even-more-python-for-beginners-data-tools/01 - Jupyter Notebooks/README.md ================================================ # Jupyter Notebooks Jupyter Notebooks are an open source web application that allows you to create and share Python code. They are frequently used for data science. The code samples in this course are completed using Jupyter Notebooks which have a .ipynb file extension. 
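As the notebooks in this repository show, a `.ipynb` file is a JSON document whose `cells` list holds the markdown and code cells. The following is a minimal sketch of pulling the code cells out of such a file with Python's standard `json` module. The tiny notebook text and the `code_cells` helper are made up for illustration and are not part of the course samples.

```python
import json

# A tiny, hand-written notebook in the nbformat 4 JSON layout (illustrative only).
NOTEBOOK_JSON = """
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# pandas Series"]},
    {"cell_type": "code", "execution_count": null, "metadata": {},
     "outputs": [], "source": ["import pandas as pd\\n", "pd.Series([1, 2, 3])"]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 2
}
"""


def code_cells(notebook_text):
    """Return the source of each code cell as a single string (hypothetical helper)."""
    notebook = json.loads(notebook_text)
    return [
        "".join(cell["source"])  # "source" is stored as a list of lines
        for cell in notebook["cells"]
        if cell["cell_type"] == "code"
    ]


for source in code_cells(NOTEBOOK_JSON):
    print(source)
```

To try this on a real notebook, read the file's text with `open(path, encoding="utf-8").read()` and pass it to the same helper.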
## Documentation - [Jupyter](https://jupyter.org/) to install Jupyter so you can run Jupyter Notebooks locally on your computer - [Jupyter Notebook viewer](https://nbviewer.jupyter.org/) to view Jupyter Notebooks in this GitHub repository without installing Jupyter - [Azure Notebooks](https://notebooks.azure.com/) to create a free Azure Notebooks account to run Notebooks in the cloud - [Create and run a notebook](https://docs.microsoft.com/azure/notebooks/tutorial-create-run-jupyter-notebook?WT.mc_id=python-c9-niner) is a tutorial that walks you through the process of using Azure Notebooks to create a complete Jupyter Notebook that demonstrates linear regression - [How to create and clone projects](https://docs.microsoft.com/azure/notebooks/create-clone-jupyter-notebooks?WT.mc_id=python-c9-niner) to create a project - [Manage and configure projects in Azure Notebooks](https://docs.microsoft.com/azure/notebooks/configure-manage-azure-notebooks-projects?WT.mc_id=python-c9-niner) to upload Notebooks to your project ## Microsoft Learn Resources Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner). - [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner) ================================================ FILE: even-more-python-for-beginners-data-tools/02 - Introduction to Anaconda and Conda/README.md ================================================ # Anaconda [Anaconda](https://www.anaconda.com/) is an open source distribution of Python and R for data science. It includes more than 1500 packages, a graphical interface called Anaconda Navigator, a command line interface called Anaconda prompt and a tool called Conda. ## Conda Python code often relies on external libraries stored in packages. Conda is an open source package management system and environment management system. 
Conda helps you manage environments and install packages for Jupyter Notebooks. ## Documentation - [Conda home page](https://docs.conda.io/) - [Managing Conda environments](https://docs.conda.io/projects/conda/latest/user-guide/tasks/manage-environments.html) to find links and instructions for creating Conda environments, activating, and de-activating Conda environments - [Managing packages](https://docs.conda.io/projects/conda/latest/user-guide/getting-started.html#managing-packages) to learn how to install packages in a Conda environment - [Conda cheat sheet](https://docs.conda.io/projects/conda/latest/user-guide/cheatsheet.html) is a handy quick reference of common Conda commands ================================================ FILE: even-more-python-for-beginners-data-tools/03 - Intro to Pandas/03 - Pandas Series and DataFrame.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# pandas Series and DataFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## pandas\n", "**pandas** is an open source library providing data structures and data analysis tools for Python programmers" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Series\n", "The pandas **Series** is a one dimensional array, similar to a Python list" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Seattle-Tacoma\n", "1 Dulles\n", "2 London Heathrow\n", "3 Schiphol\n", "4 Changi\n", "5 Pearson\n", "6 Narita\n", "dtype: object" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports = pd.Series([\n", " 'Seattle-Tacoma', \n", " 'Dulles', \n", " 'London Heathrow', \n", " 'Schiphol', \n", " 'Changi', \n", " 'Pearson', \n", " 'Narita'\n", " ])\n", "\n", "# When using a notebook, you can 
use the print function\n", "# print(airports) to examine the contents of a variable\n", "# or you can print a value on the screen by just typing the object name\n", "airports" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can reference an individual value in a Series using its index" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'London Heathrow'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports[2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use a loop to iterate through all the values in a Series" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Seattle-Tacoma\n", "Dulles\n", "London Heathrow\n", "Schiphol\n", "Changi\n", "Pearson\n", "Narita\n" ] } ], "source": [ "for value in airports:\n", " print(value) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DataFrame\n", "Most of the time when we are working with pandas we are dealing with two-dimensional arrays\n", "\n", "The pandas **DataFrame** can store two-dimensional arrays" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2\n", "0 Seatte-Tacoma Seattle USA\n", "1 Dulles Washington USA\n", "2 London Heathrow London United Kingdom\n", "3 Schiphol Amsterdam Netherlands\n", "4 Changi Singapore Singapore\n", "5 Pearson Toronto Canada\n", "6 Narita Tokyo Japan" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports = pd.DataFrame([\n", " ['Seatte-Tacoma', 'Seattle', 'USA'],\n", " ['Dulles', 'Washington', 'USA'],\n", " ['London Heathrow', 'London', 'United Kingdom'],\n", " ['Schiphol', 'Amsterdam', 'Netherlands'],\n", " ['Changi', 'Singapore', 'Singapore'],\n", " ['Pearson', 'Toronto', 'Canada'],\n", " ['Narita', 'Tokyo', 'Japan']\n", " ])\n", "\n", "airports" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the **columns** parameter to specify names for the columns when you create the DataFrame" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name City Country\n", "0 Seatte-Tacoma Seattle USA\n", "1 Dulles Washington USA\n", "2 London Heathrow London United Kingdom\n", "3 Schiphol Amsterdam Netherlands\n", "4 Changi Singapore Singapore\n", "5 Pearson Toronto Canada\n", "6 Narita Tokyo Japan" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports = pd.DataFrame([\n", " ['Seatte-Tacoma', 'Seattle', 'USA'],\n", " ['Dulles', 'Washington', 'USA'],\n", " ['London Heathrow', 'London', 'United Kingdom'],\n", " ['Schiphol', 'Amsterdam', 'Netherlands'],\n", " ['Changi', 'Singapore', 'Singapore'],\n", " ['Pearson', 'Toronto', 'Canada'],\n", " ['Narita', 'Tokyo', 'Japan']\n", " ],\n", " columns = ['Name', 'City', 'Country']\n", " )\n", "\n", "airports " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: even-more-python-for-beginners-data-tools/03 - Intro to Pandas/README.md ================================================ # pandas [pandas](https://pandas/pydata.org​) is an open source Python library contains a number of high performance data structures and tools for data analysis. ## Documentation - [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) stores one dimensional arrays - [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) stores two dimensional arrays and can contain different datatypes ## Microsoft Learn Resources Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner). 
- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner) ================================================ FILE: even-more-python-for-beginners-data-tools/04 - Examining Pandas DataFrame contents/04 - Exploring pandas DataFrame contents.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Examining pandas DataFrame contents\n", "It's useful to be able to quickly examine the contents of a DataFrame. \n", "\n", "Let's start by importing the pandas library and creating a DataFrame populated with information about airports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name City Country\n", "0 Seatte-Tacoma Seattle USA\n", "1 Dulles Washington USA\n", "2 Heathrow London United Kingdom\n", "3 Schiphol Amsterdam Netherlands\n", "4 Changi Singapore Singapore\n", "5 Pearson Toronto Canada\n", "6 Narita Tokyo Japan" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports = pd.DataFrame([\n", " ['Seatte-Tacoma', 'Seattle', 'USA'],\n", " ['Dulles', 'Washington', 'USA'],\n", " ['Heathrow', 'London', 'United Kingdom'],\n", " ['Schiphol', 'Amsterdam', 'Netherlands'],\n", " ['Changi', 'Singapore', 'Singapore'],\n", " ['Pearson', 'Toronto', 'Canada'],\n", " ['Narita', 'Tokyo', 'Japan']\n", " ],\n", " columns = ['Name', 'City', 'Country']\n", " )\n", "\n", "airports " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Returning first *n* rows\n", "If you have thousands of rows, you might just want to look at the first few rows\n", "\n", "* **head**(*n*) returns the top *n* rows " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name City Country\n", "0 Seatte-Tacoma Seattle USA\n", "1 Dulles Washington USA\n", "2 Heathrow London United Kingdom" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Returning last *n* rows\n", "Looking at the last rows in a DataFrame can be a good way to check that all your data loaded correctly\n", "* **tail**(*n*) returns the last *n* rows" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name City Country\n", "4 Changi Singapore Singapore\n", "5 Pearson Toronto Canada\n", "6 Narita Tokyo Japan" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports.tail(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking number of rows and columns in DataFrame\n", "Sometimes you just need to know how much data you have in the DataFrame\n", "\n", "* **shape** returns the number of rows and columns" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(7, 3)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting more detailed information about DataFrame contents\n", "\n", "* **info**() returns more detailed information about the DataFrame\n", "\n", "Information returned includes:\n", "* The number of rows, and the range of index values\n", "* The number of columns\n", "* For each column: column name, number of non-null values, the datatype\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 7 entries, 0 to 6\n", "Data columns (total 3 columns):\n", "Name 7 non-null object\n", "City 7 non-null object\n", "Country 7 non-null object\n", "dtypes: object(3)\n", "memory usage: 148.0+ bytes\n" ] } ], "source": [ "airports.info()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }

================================================
FILE: even-more-python-for-beginners-data-tools/04 - Examining Pandas DataFrame contents/README.md
================================================
# Examining pandas DataFrame contents

The pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) is a structure for storing two-dimensional tabular data.

## Common functions

- [head](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) returns the first *n* rows from the DataFrame
- [tail](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) returns the last *n* rows from the DataFrame
- [shape](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html) returns the dimensions of the DataFrame (i.e. the number of rows and columns)
- [info](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html) provides a summary of the DataFrame content including column names, their datatypes, and number of rows containing non-null values

================================================
FILE: even-more-python-for-beginners-data-tools/05 - Query a pandas Dataframe/05 - Querying DataFrames.ipynb
================================================
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Query a pandas DataFrame \n", "\n", "Returning a portion of the data in a DataFrame is called slicing or dicing the data\n", "\n", "There are many different ways to query a pandas DataFrame; here are a few to get you started" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name City Country\n", "0 Seatte-Tacoma Seattle USA\n", "1 Dulles Washington USA\n", "2 London Heathrow London United Kingdom\n", "3 Schiphol Amsterdam Netherlands\n", "4 Changi Singapore Singapore\n", "5 Pearson Toronto Canada\n", "6 Narita Tokyo Japan" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports = pd.DataFrame([\n", " ['Seatte-Tacoma', 'Seattle', 'USA'],\n", " ['Dulles', 'Washington', 'USA'],\n", " ['London Heathrow', 'London', 'United Kingdom'],\n", " ['Schiphol', 'Amsterdam', 'Netherlands'],\n", " ['Changi', 'Singapore', 'Singapore'],\n", " ['Pearson', 'Toronto', 'Canada'],\n", " ['Narita', 'Tokyo', 'Japan']\n", " ],\n", " columns = ['Name', 'City', 'Country']\n", " )\n", "airports " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Return one column\n", "Specify the name of the column you want to return\n", "* *DataFrameName*['*columnName*']\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Seattle\n", "1 Washington\n", "2 London\n", "3 Amsterdam\n", "4 Singapore\n", "5 Toronto\n", "6 Tokyo\n", "Name: City, dtype: object" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports['City']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Return multiple columns\n", "Provide a list of the columns you want to return\n", "* *DataFrameName*[['*FirstColumnName*','*SecondColumnName*',...]]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Country\n", "0 Seatte-Tacoma USA\n", "1 Dulles USA\n", "2 London Heathrow United Kingdom\n", "3 Schiphol Netherlands\n", "4 Changi Singapore\n", "5 Pearson Canada\n", "6 Narita Japan" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports[['Name', 'Country']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using *iloc* to specify rows and columns to return\n", "**iloc**[*rows*,*columns*] allows you to access a group of rows or columns by row and column index positions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You specify the specific row and column you want returned\n", "* First row is row 0\n", "* First column is column 0" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Seatte-Tacoma'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Return the value in the first row, first column\n", "airports.iloc[0,0]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'United Kingdom'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Return the value in the third row, third column\n", "airports.iloc[2,2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A value of *:* returns all rows or all columns" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name City Country\n", "0 Seatte-Tacoma Seattle USA\n", "1 Dulles Washington USA\n", "2 London Heathrow London United Kingdom\n", "3 Schiphol Amsterdam Netherlands\n", "4 Changi Singapore Singapore\n", "5 Pearson Toronto Canada\n", "6 Narita Tokyo Japan" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports.iloc[:,:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can request a range of rows or a range of columns\n", "* [x:y] will return rows or columns x up to, but not including, y" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name City Country\n", "0 Seatte-Tacoma Seattle USA\n", "1 Dulles Washington USA" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Return the first two rows and display all columns \n", "airports.iloc[0:2,:]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name City\n", "0 Seatte-Tacoma Seattle\n", "1 Dulles Washington\n", "2 London Heathrow London\n", "3 Schiphol Amsterdam\n", "4 Changi Singapore\n", "5 Pearson Toronto\n", "6 Narita Tokyo" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Return all rows and display the first two columns\n", "airports.iloc[:,0:2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can request a list of rows or a list of columns\n", "* [x,y,z] will return rows or columns x,y, and z" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Country\n", "0 Seatte-Tacoma USA\n", "1 Dulles USA\n", "2 London Heathrow United Kingdom\n", "3 Schiphol Netherlands\n", "4 Changi Singapore\n", "5 Pearson Canada\n", "6 Narita Japan" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports.iloc[:,[0,2]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using *loc* to specify columns by name\n", "If you want to list the column names instead of the column positions use **loc** instead of **iloc**" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name Country\n", "0 Seatte-Tacoma USA\n", "1 Dulles USA\n", "2 London Heathrow United Kingdom\n", "3 Schiphol Netherlands\n", "4 Changi Singapore\n", "5 Pearson Canada\n", "6 Narita Japan" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports.loc[:,['Name', 'Country']]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }

================================================
FILE: even-more-python-for-beginners-data-tools/05 - Query a pandas Dataframe/README.md
================================================
# Query a pandas DataFrame

The pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) is a structure for storing two-dimensional tabular data.

## Common properties

- [loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) returns specific rows and columns by specifying column names
- [iloc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html) returns specific rows and columns by specifying column positions

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)

================================================
FILE: even-more-python-for-beginners-data-tools/06 - CSV Files and Jupyter Notebooks/README.md
================================================
# CSV Files and Jupyter Notebooks

CSV (comma-separated values) files are frequently used to store data. To access the data in a CSV file from a Jupyter Notebook, you must first upload the file.

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)

================================================
FILE: even-more-python-for-beginners-data-tools/06 - CSV Files and Jupyter Notebooks/airports.csv
================================================
Name,City,Country
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,United Kingdom
Schiphol,Amsterdam,Netherlands
Changi,Singapore,Singapore
Pearson,Toronto,Canada
Narita,Tokyo,Japan

================================================
FILE: even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/07 - Read write CSV files.ipynb
================================================
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Read and write CSV files with pandas DataFrames\n", "\n", "You can load data from a CSV file directly into a pandas DataFrame" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading a CSV file into a pandas DataFrame\n", "**read_csv** allows you to read the contents of a csv file into a DataFrame\n", "\n", "airports.csv contains the following: \n", "\n", "Name,City,Country \n", "Seattle-Tacoma,Seattle,USA \n", "Dulles,Washington,USA \n", "Heathrow,London,United Kingdom \n", "Schiphol,Amsterdam,Netherlands \n", "Changi,Singapore,Singapore \n", "Pearson,Toronto,Canada \n", "Narita,Tokyo,Japan" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name City Country\n", "0 Seattle-Tacoma Seattle USA\n", "1 Dulles Washington USA\n", "2 Heathrow London United Kingdom\n", "3 Schiphol Amsterdam Netherlands\n", "4 Changi Singapore Singapore\n", "5 Pearson Toronto Canada\n", "6 Narita Tokyo Japan" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports_df = pd.read_csv('Data/airports.csv')\n", "airports_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Handling rows with errors\n", "By default rows with an extra , or other issues cause an error\n", "\n", "Note the extra , in the row for Heathrow London in airportsInvalidRows.csv: \n", "\n", "Name,City,Country \n", "Seattle-Tacoma,Seattle,USA \n", "Dulles,Washington,USA \n", "Heathrow,London,,United Kingdom \n", "Schiphol,Amsterdam,Netherlands \n", "Changi,Singapore,Singapore \n", "Pearson,Toronto,Canada \n", "Narita,Tokyo,Japan " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "ename": "ParserError", "evalue": "Error tokenizing data. 
C error: Expected 3 fields in line 4, saw 4\n", "output_type": "error", "traceback": [ "---------------------------------------------------------------------------", "ParserError Traceback (most recent call last)", "----> 1 airports_df = pd.read_csv('Data/airportsInvalidRows.csv')\n 2 airports_df\n", "~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)\n 683 )\n 684 \n--> 685 return _read(filepath_or_buffer, kwds)\n 686 \n 687 parser_f.__name__ = name\n", "~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py in _read(filepath_or_buffer, kwds)\n 461 \n 462 try:\n--> 463 data = parser.read(nrows)\n 464 finally:\n 465 parser.close()\n", "~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py in read(self, nrows)\n 1152 def read(self, nrows=None):\n 1153 nrows = _validate_integer(\"nrows\", nrows)\n-> 1154 ret = self._engine.read(nrows)\n 1155 \n 1156 # May alter columns / col_dict\n", "~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py in read(self, nrows)\n 2046 def read(self, nrows=None):\n 2047 try:\n-> 2048 data = self._reader.read(nrows)\n 2049 except StopIteration:\n 2050 if self._first_chunk:\n", "pandas\\_libs\\parsers.pyx in pandas._libs.parsers.TextReader.read()\n", "pandas\\_libs\\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()\n", "pandas\\_libs\\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()\n", "pandas\\_libs\\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()\n", "pandas\\_libs\\parsers.pyx in pandas._libs.parsers.raise_parser_error()\n", "ParserError: Error tokenizing data. C error: Expected 3 fields in line 4, saw 4\n" ] } ], "source": [ "airports_df = pd.read_csv('Data/airportsInvalidRows.csv')\n", "airports_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Specify **error_bad_lines=False** to skip any rows with errors (in pandas 1.3 and later, use **on_bad_lines='skip'** instead)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "b'Skipping line 4: expected 3 fields, saw 4\\n'\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Name City Country\n", "0 Seattle-Tacoma Seattle USA\n", "1 Dulles Washington USA\n", "2 Schiphol Amsterdam Netherlands\n", "3 Changi Singapore Singapore\n", "4 Pearson Toronto Canada\n", "5 Narita Tokyo Japan" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports_df = pd.read_csv(\n", " 'Data/airportsInvalidRows.csv', \n", " error_bad_lines=False\n", " )\n", "airports_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Handling files which do not contain column headers\n", "By default, read_csv treats the first row of the file as column headers, so if your file has no header row, the first row of data is used as headers\n", "\n", "airportsNoHeaderRows.csv contains airport data but does not have a row specifying the column headers:\n", "\n", "Seattle-Tacoma,Seattle,USA \n", "Dulles,Washington,USA \n", "Heathrow,London,United Kingdom \n", "Schiphol,Amsterdam,Netherlands \n", "Changi,Singapore,Singapore \n", "Pearson,Toronto,Canada \n", "Narita,Tokyo,Japan " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Seattle-Tacoma Seattle USA\n", "0 Dulles Washington USA\n", "1 Heathrow London United Kingdom\n", "2 Schiphol Amsterdam Netherlands\n", "3 Changi Singapore Singapore\n", "4 Pearson Toronto Canada\n", "5 Narita Tokyo Japan" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports_df = pd.read_csv('Data/airportsNoHeaderRows.csv')\n", "airports_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Specify **header=None** if you do not have a Header row to avoid having the first row of data treated as a header row" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2\n", "0 Seattle-Tacoma Seattle USA\n", "1 Dulles Washington USA\n", "2 Heathrow London United Kingdom\n", "3 Schiphol Amsterdam Netherlands\n", "4 Changi Singapore Singapore\n", "5 Pearson Toronto Canada\n", "6 Narita Tokyo Japan" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports_df = pd.read_csv(\n", " 'Data/airportsNoHeaderRows.csv', \n", " header=None\n", " )\n", "airports_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you do not have a header row you can use the **names** parameter to specify column names when data is loaded" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name City Country\n", "0 Seattle-Tacoma Seattle USA\n", "1 Dulles Washington USA\n", "2 Heathrow London United Kingdom\n", "3 Schiphol Amsterdam Netherlands\n", "4 Changi Singapore Singapore\n", "5 Pearson Toronto Canada\n", "6 Narita Tokyo Japan" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports_df = pd.read_csv(\n", " 'Data/airportsNoHeaderRows.csv', \n", " header=None, \n", " names=['Name', 'City', 'Country']\n", " )\n", "airports_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Missing values in Data files\n", "Missing values appear in DataFrames as **NaN**\n", "\n", "There is no city listed for Schiphol airport in airportsBlankValues.csv :\n", "\n", "Name,City,Country \n", "Seattle-Tacoma,Seattle,USA \n", "Dulles,Washington,USA \n", "Heathrow,London,United Kingdom \n", "Schiphol,,Netherlands \n", "Changi,Singapore,Singapore \n", "Pearson,Toronto,Canada \n", "Narita,Tokyo,Japan" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name City Country\n", "0 Seattle-Tacoma Seattle USA\n", "1 Dulles Washington USA\n", "2 Heathrow London United Kingdom\n", "3 Schiphol NaN Netherlands\n", "4 Changi Singapore Singapore\n", "5 Pearson Toronto Canada\n", "6 Narita Tokyo Japan" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports_df = pd.read_csv('Data/airportsBlankValues.csv')\n", "airports_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Writing DataFrame contents to a CSV file\n", "**to_csv** will write the contents of a pandas DataFrame to a CSV file" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Name City Country\n", "0 Seattle-Tacoma Seattle USA\n", "1 Dulles Washington USA\n", "2 Heathrow London United Kingdom\n", "3 Schiphol NaN Netherlands\n", "4 Changi Singapore Singapore\n", "5 Pearson Toronto Canada\n", "6 Narita Tokyo Japan" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports_df" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "airports_df.to_csv('Data/MyNewCSVFile.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The index column is written to the csv file\n", "\n", "Specify **index=False** if you do not want the index column to be included in the csv file" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "airports_df.to_csv(\n", " 'Data/MyNewCSVFileNoIndex.csv', \n", " index=False\n", " )" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/README.md ================================================ # Read and write CSV files from pandas DataFrames You can populate a DataFrame with the data in a CSV file. 
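As a minimal sketch of the round trip described above, the snippet below reads CSV data into a DataFrame and writes it back out. It assumes pandas is installed. To stay self-contained it reads from an in-memory string through `io.StringIO` instead of a file on disk; the three sample rows are an abbreviated stand-in for `airportsBlankValues.csv`, and the variable names are illustrative only.

```python
import io

import pandas as pd

# Abbreviated stand-in for airportsBlankValues.csv: Schiphol has no city listed.
csv_text = """Name,City,Country
Seattle-Tacoma,Seattle,USA
Schiphol,,Netherlands
Narita,Tokyo,Japan
"""

# read_csv accepts a file path or a file-like object; the blank field is loaded as NaN.
airports_df = pd.read_csv(io.StringIO(csv_text))
missing_city_count = int(airports_df['City'].isna().sum())

# When no path is given, to_csv returns the CSV text as a string.
# By default the row index is written as an unnamed first column.
csv_with_index = airports_df.to_csv()
# index=False leaves the index column out.
csv_without_index = airports_df.to_csv(index=False)

print(missing_city_count)                 # number of rows with a missing City
print(csv_with_index.splitlines()[0])     # header row including the index column
print(csv_without_index.splitlines()[0])  # header row without the index column
```

Passing a path such as `'Data/MyNewCSVFile.csv'` to `to_csv` writes the same text to disk instead of returning it.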
## Common functions and properties

- [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) reads a comma-separated values file into a DataFrame
- [to_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html) writes contents of a DataFrame to a comma-separated values file
- [NaN](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html) is the default representation of missing values

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)

================================================
FILE: even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/airports.csv
================================================
Name,City,Country
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,United Kingdom
Schiphol,Amsterdam,Netherlands
Changi,Singapore,Singapore
Pearson,Toronto,Canada
Narita,Tokyo,Japan

================================================
FILE: even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/airportsBlankValues.csv
================================================
Name,City,Country
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,United Kingdom
Schiphol,,Netherlands
Changi,Singapore,Singapore
Pearson,Toronto,Canada
Narita,Tokyo,Japan

================================================
FILE: even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/airportsInvalidRows.csv
================================================
Name,City,Country
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,,United Kingdom
Schiphol,Amsterdam,Netherlands
Changi,Singapore,Singapore
Pearson,Toronto,Canada
Narita,Tokyo,Japan

================================================
FILE: even-more-python-for-beginners-data-tools/07 - Read and write CSV files from pandas DataFrames/airportsNoHeaderRows.csv
================================================
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Heathrow,London,United Kingdom
Schiphol,Amsterdam,Netherlands
Changi,Singapore,Singapore
Pearson,Toronto,Canada
Narita,Tokyo,Japan

================================================
FILE: even-more-python-for-beginners-data-tools/08 - Removing and splitting DataFrame columns/08 - Removing columns.ipynb
================================================
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Removing and splitting pandas DataFrame columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you are preparing to train machine learning models, you often need to delete specific columns, or split certain columns from your DataFrame into a new DataFrame.\n", "\n", "We need the pandas library and a DataFrame to explore" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's load a bigger CSV file with more columns. **flight_delays.csv** provides information about flights and flight delays." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " FL_DATE OP_UNIQUE_CARRIER TAIL_NUM OP_CARRIER_FL_NUM ORIGIN DEST \\\n", "0 2018-10-01 WN N221WN 802 ABQ BWI \n", "1 2018-10-01 WN N8329B 3744 ABQ BWI \n", "2 2018-10-01 WN N920WN 1019 ABQ DAL \n", "3 2018-10-01 WN N480WN 1499 ABQ DAL \n", "4 2018-10-01 WN N227WN 3635 ABQ DAL \n", "\n", " CRS_DEP_TIME DEP_TIME DEP_DELAY CRS_ARR_TIME ARR_TIME ARR_DELAY \\\n", "0 905 903 -2 1450 1433 -17 \n", "1 1500 1458 -2 2045 2020 -25 \n", "2 1800 1802 2 2045 2032 -13 \n", "3 950 947 -3 1235 1223 -12 \n", "4 1150 1151 1 1430 1423 -7 \n", "\n", " CRS_ELAPSED_TIME ACTUAL_ELAPSED_TIME AIR_TIME DISTANCE \n", "0 225 210 197 1670 \n", "1 225 202 191 1670 \n", "2 105 90 80 580 \n", "3 105 96 81 580 \n", "4 100 92 80 580 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "delays_df = pd.read_csv('Data/flight_delays.csv')\n", "delays_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Removing a column from a DataFrame.\n", "\n", "When you are preparing your data for machine learning, you may need to delete specific columns from the DataFrame before training the model.\n", "\n", "For example:\n", "Imagine you are training a model to predict how many minutes late a flight will be (ARR_DELAY)\n", "\n", "If the model knew the scheduled arrival time (CRS_ARR_TIME) and the actual arrival time (ARR_TIME), the model would quickly figure out ARR_DELAY = ARR_TIME - CRS_ARR_TIME\n", "\n", "When we predict arrival times for future flights, we won't have a value for arrival time (ARR_TIME). So we should remove this column from the DataFrame so it is not used as a feature when training the model to predict ARR_DELAY. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " FL_DATE OP_UNIQUE_CARRIER TAIL_NUM OP_CARRIER_FL_NUM ORIGIN DEST \\\n", "0 2018-10-01 WN N221WN 802 ABQ BWI \n", "1 2018-10-01 WN N8329B 3744 ABQ BWI \n", "2 2018-10-01 WN N920WN 1019 ABQ DAL \n", "3 2018-10-01 WN N480WN 1499 ABQ DAL \n", "4 2018-10-01 WN N227WN 3635 ABQ DAL \n", "\n", " CRS_DEP_TIME DEP_TIME DEP_DELAY CRS_ARR_TIME ARR_DELAY \\\n", "0 905 903 -2 1450 -17 \n", "1 1500 1458 -2 2045 -25 \n", "2 1800 1802 2 2045 -13 \n", "3 950 947 -3 1235 -12 \n", "4 1150 1151 1 1430 -7 \n", "\n", " CRS_ELAPSED_TIME ACTUAL_ELAPSED_TIME AIR_TIME DISTANCE \n", "0 225 210 197 1670 \n", "1 225 202 191 1670 \n", "2 105 90 80 580 \n", "3 105 96 81 580 \n", "4 100 92 80 580 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a new DataFrame without the column ARR_TIME; delays_df itself is not changed\n", "\n", "#delays_df = delays_df.drop(['ARR_TIME'],axis=1)\n", "new_df = delays_df.drop(columns=['ARR_TIME'])\n", "new_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the **inplace** parameter to specify you want to drop the column from the original DataFrame" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " FL_DATE OP_UNIQUE_CARRIER TAIL_NUM OP_CARRIER_FL_NUM ORIGIN DEST \\\n", "0 2018-10-01 WN N221WN 802 ABQ BWI \n", "1 2018-10-01 WN N8329B 3744 ABQ BWI \n", "2 2018-10-01 WN N920WN 1019 ABQ DAL \n", "3 2018-10-01 WN N480WN 1499 ABQ DAL \n", "4 2018-10-01 WN N227WN 3635 ABQ DAL \n", "\n", " CRS_DEP_TIME DEP_TIME DEP_DELAY CRS_ARR_TIME ARR_DELAY \\\n", "0 905 903 -2 1450 -17 \n", "1 1500 1458 -2 2045 -25 \n", "2 1800 1802 2 2045 -13 \n", "3 950 947 -3 1235 -12 \n", "4 1150 1151 1 1430 -7 \n", "\n", " CRS_ELAPSED_TIME ACTUAL_ELAPSED_TIME AIR_TIME DISTANCE \n", "0 225 210 197 1670 \n", "1 225 202 191 1670 \n", "2 105 90 80 580 \n", "3 105 96 81 580 \n", "4 100 92 80 580 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Remove the column ARR_TIME from the DataFrame delays_df\n", "\n", "#delays_df = delays_df.drop(['ARR_TIME'],axis=1)\n", "delays_df.drop(columns=['ARR_TIME'], inplace=True)\n", "delays_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use different techniques to predict based on quantitative values, which are usually numeric (e.g. distance, number of minutes, weight), and qualitative (descriptive) values, which may not be numeric (e.g.
what airport a flight left from, what airline operated the flight)\n", "\n", "Quantitative data may be moved into a separate DataFrame before training a model.\n", "\n", "You also need to put the value you want to predict, called the label (ARR_DELAY) in a separate DataFrame from the values you think can help you make the prediction, called the features\n", "\n", "We need to be able to create a new dataframe from the columns in an existing dataframe" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Create a new DataFrame called desc_df\n", "# include all rows\n", "# include the columns ORIGIN, DEST, OP_CARRIER_FL_NUM, OP_UNIQUE_CARRIER, TAIL_NUM\n", "\n", "desc_df = delays_df.loc[:,['ORIGIN', 'DEST', 'OP_CARRIER_FL_NUM', 'OP_UNIQUE_CARRIER', 'TAIL_NUM']]\n", "desc_df.head()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: even-more-python-for-beginners-data-tools/08 - Removing and splitting DataFrame columns/README.md ================================================ # Removing and splitting DataFrame columns When preparing data for machine learning you may need to remove specific columns from the DataFrame. ## Common functions - [drop](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html) deletes specified columns from a DataFrame ## Microsoft Learn Resources Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner). 
- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner) ================================================ FILE: even-more-python-for-beginners-data-tools/08 - Removing and splitting DataFrame columns/flight_delays.csv ================================================ FL_DATE,OP_UNIQUE_CARRIER,TAIL_NUM,OP_CARRIER_FL_NUM,ORIGIN,DEST,CRS_DEP_TIME,DEP_TIME,DEP_DELAY,CRS_ARR_TIME,ARR_TIME,ARR_DELAY,CRS_ELAPSED_TIME,ACTUAL_ELAPSED_TIME,AIR_TIME,DISTANCE 2018-10-01,WN,N221WN,802,ABQ,BWI,905,903,-2,1450,1433,-17,225,210,197,1670 2018-10-01,WN,N8329B,3744,ABQ,BWI,1500,1458,-2,2045,2020,-25,225,202,191,1670 2018-10-01,WN,N920WN,1019,ABQ,DAL,1800,1802,2,2045,2032,-13,105,90,80,580 2018-10-01,WN,N480WN,1499,ABQ,DAL,950,947,-3,1235,1223,-12,105,96,81,580 2018-10-01,WN,N227WN,3635,ABQ,DAL,1150,1151,1,1430,1423,-7,100,92,80,580 2018-10-01,WN,N243WN,3998,ABQ,DAL,655,652,-3,940,924,-16,105,92,83,580 2018-10-01,WN,N485WN,5432,ABQ,DAL,1340,1354,14,1625,1631,6,105,97,81,580 2018-10-01,WN,N229WN,4596,ABQ,DEN,1420,1444,24,1540,1552,12,80,68,55,349 2018-10-01,WN,N934WN,6013,ABQ,DEN,910,907,-3,1025,1027,2,75,80,52,349 2018-10-01,WN,N934WN,6015,ABQ,DEN,1735,1742,7,1845,1854,9,70,72,58,349 2018-10-01,WN,N8615E,2885,ABQ,HOU,1240,1239,-1,1540,1539,-1,120,120,108,759 2018-10-01,WN,N965WN,3939,ABQ,HOU,640,640,0,940,938,-2,120,118,103,759 2018-10-01,WN,N408WN,4025,ABQ,HOU,1555,1610,15,1850,1906,16,115,116,103,759 2018-10-01,WN,N913WN,1642,ABQ,LAS,1040,1037,-3,1115,1057,-18,95,80,69,486 2018-10-01,WN,N927WN,3271,ABQ,LAS,1615,1614,-1,1645,1646,1,90,92,75,486 2018-10-01,WN,N732SW,4816,ABQ,LAS,605,601,-4,635,628,-7,90,87,73,486 2018-10-01,WN,N496WN,6095,ABQ,LAS,2130,2123,-7,2155,2146,-9,85,83,70,486 2018-10-01,WN,N468WN,555,ABQ,LAX,1710,1708,-2,1815,1805,-10,125,117,92,677 2018-10-01,WN,N7751A,3858,ABQ,LAX,545,541,-4,645,638,-7,120,117,102,677 
2018-10-01,WN,N435WN,5757,ABQ,MCI,1720,2119,239,2005,2357,232,105,98,87,718 2018-10-01,WN,N556WN,538,ABQ,MDW,1705,1756,51,2040,2114,34,155,138,129,1121 2018-10-01,WN,N410WN,4837,ABQ,MDW,705,708,3,1045,1032,-13,160,144,127,1121 2018-10-01,WN,N8726H,792,ABQ,OAK,815,809,-6,940,928,-12,145,139,123,889 2018-10-01,WN,N956WN,5673,ABQ,OAK,1125,1221,56,1250,1337,47,145,136,121,889 2018-10-01,WN,N739GB,5753,ABQ,OAK,1915,1915,0,2035,2029,-6,140,134,121,889 2018-10-01,WN,N7723E,5516,ABQ,PDX,1020,1017,-3,1215,1204,-11,175,167,157,1111 2018-10-01,WN,N770SA,1415,ABQ,PHX,945,944,-1,1005,949,-16,80,65,54,328 2018-10-01,WN,N730SW,2782,ABQ,PHX,1410,1424,14,1430,1431,1,80,67,55,328 2018-10-01,WN,N7725A,2863,ABQ,PHX,700,702,2,720,720,0,80,78,56,328 2018-10-01,WN,N450WN,4114,ABQ,PHX,1935,1931,-4,1950,1959,9,75,88,58,328 2018-10-01,WN,N8673F,5500,ABQ,PHX,1625,1630,5,1640,1636,-4,75,66,58,328 2018-10-01,WN,N948WN,6315,ABQ,PHX,1120,1126,6,1140,1138,-2,80,72,57,328 2018-10-01,WN,N566WN,19,ABQ,SAN,1505,1551,46,1555,1631,36,110,100,87,628 2018-10-01,WN,N957WN,4832,ABQ,SAN,610,616,6,700,658,-2,110,102,91,628 2018-10-01,WN,N8704Q,824,ALB,BWI,805,801,-4,920,911,-9,75,70,54,289 2018-10-01,WN,N903WN,1758,ALB,BWI,605,601,-4,720,706,-14,75,65,55,289 2018-10-01,WN,N8572X,2790,ALB,BWI,925,928,3,1040,1031,-9,75,63,53,289 2018-10-01,WN,N7701B,3292,ALB,BWI,1315,1308,-7,1435,1417,-18,80,69,58,289 2018-10-01,WN,N295WN,3376,ALB,BWI,1105,1101,-4,1220,1206,-14,75,65,53,289 2018-10-01,WN,N716SW,4898,ALB,BWI,1710,1707,-3,1825,1815,-10,75,68,56,289 2018-10-01,WN,N8674B,5153,ALB,DEN,1850,1849,-1,2050,2045,-5,240,236,223,1610 2018-10-01,WN,N8643A,390,ALB,MCO,705,705,0,955,954,-1,170,169,149,1073 2018-10-01,WN,N730SW,2776,ALB,MDW,630,625,-5,735,744,9,125,139,120,717 2018-10-01,WN,N798SW,4197,ALB,MDW,1655,1652,-3,1805,1815,10,130,143,112,717 2018-10-01,WN,N729SW,988,AMA,DAL,1605,1615,10,1720,1713,-7,75,58,48,323 2018-10-01,WN,N933WN,1913,AMA,DAL,605,603,-2,720,705,-15,75,62,50,323 
2018-10-01,WN,N7706A,5226,AMA,DAL,1045,1047,2,1155,1156,1,70,69,52,323 2018-10-01,WN,N755SA,6984,AMA,DAL,1830,1825,-5,1940,1921,-19,70,56,48,323 2018-10-01,WN,N211WN,6822,AMA,LAS,1425,1429,4,1425,1438,13,120,129,107,758 2018-10-01,WN,N928WN,4261,ATL,AUS,1015,1011,-4,1140,1137,-3,145,146,123,813 2018-10-01,WN,N8581Z,4701,ATL,AUS,2030,2024,-6,2150,2133,-17,140,129,109,813 2018-10-01,WN,N950WN,5615,ATL,AUS,1645,1647,2,1810,1805,-5,145,138,112,813 2018-10-01,WN,N932WN,106,ATL,BNA,2215,2211,-4,2215,2205,-10,60,54,39,214 2018-10-01,WN,N739GB,2583,ATL,BNA,800,756,-4,755,752,-3,55,56,42,214 2018-10-01,WN,N454WN,3766,ATL,BNA,1955,1951,-4,2000,1948,-12,65,57,39,214 2018-10-01,WN,N7716A,4165,ATL,BNA,1225,1226,1,1235,1226,-9,70,60,41,214 2018-10-01,WN,N7822A,4501,ATL,BNA,1750,1745,-5,1745,1742,-3,55,57,40,214 2018-10-01,WN,N8324A,3360,ATL,BOS,1330,1500,90,1605,1737,92,155,157,126,946 2018-10-01,WN,N444WN,3987,ATL,BOS,2210,2204,-6,50,24,-26,160,140,118,946 2018-10-01,WN,N472WN,1031,ATL,BWI,1120,1119,-1,1310,1303,-7,110,104,81,577 2018-10-01,WN,N758SW,1526,ATL,BWI,800,757,-3,945,934,-11,105,97,81,577 2018-10-01,WN,N8642E,1922,ATL,BWI,1700,1656,-4,1850,1840,-10,110,104,88,577 2018-10-01,WN,N7838A,3991,ATL,BWI,2115,2202,47,2305,2339,34,110,97,80,577 2018-10-01,WN,N7839A,4436,ATL,BWI,1905,1904,-1,2100,2044,-16,115,100,85,577 2018-10-01,WN,N8509U,5150,ATL,BWI,1340,1416,36,1530,1548,18,110,92,79,577 2018-10-01,WN,N242WN,2574,ATL,CLE,835,833,-2,1015,1016,1,100,103,79,554 2018-10-01,WN,N961WN,5133,ATL,CLE,2200,2155,-5,2335,2334,-1,95,99,80,554 2018-10-01,WN,N8503A,2571,ATL,CMH,1540,1540,0,1710,1711,1,90,91,65,447 2018-10-01,WN,N282WN,6348,ATL,CMH,835,834,-1,1005,1005,0,90,91,67,447 2018-10-01,WN,N293WN,6661,ATL,CMH,2200,2208,8,2325,2332,7,85,84,67,447 2018-10-01,WN,N954WN,63,ATL,DAL,2010,2010,0,2125,2118,-7,135,128,101,721 2018-10-01,WN,N764SW,2838,ATL,DAL,720,719,-1,825,816,-9,125,117,103,721 2018-10-01,WN,N8549Z,3845,ATL,DAL,1740,1738,-2,1845,1838,-7,125,120,102,721 
2018-10-01,WN,N8620H,5577,ATL,DAL,1045,1102,17,1200,1208,8,135,126,103,721 2018-10-01,WN,N8317M,6768,ATL,DAL,1330,1331,1,1440,1431,-9,130,120,101,721 2018-10-01,WN,N550WN,1347,ATL,DCA,1525,1542,17,1710,1724,14,105,102,79,547 2018-10-01,WN,N8648A,2600,ATL,DCA,715,716,1,855,851,-4,100,95,82,547 2018-10-01,WN,N726SW,4747,ATL,DCA,2025,2027,2,2210,2209,-1,105,102,83,547 2018-10-01,WN,N8665D,5207,ATL,DCA,1030,1105,35,1215,1241,26,105,96,80,547 2018-10-01,WN,N930WN,208,ATL,DEN,1835,1846,11,1945,1948,3,190,182,167,1199 2018-10-01,WN,N400WN,4133,ATL,DEN,610,607,-3,715,711,-4,185,184,170,1199 2018-10-01,WN,N7817J,4139,ATL,DEN,1430,1432,2,1540,1539,-1,190,187,167,1199 2018-10-01,WN,N8647A,5960,ATL,DEN,955,1008,13,1110,1112,2,195,184,162,1199 2018-10-01,WN,N212WN,115,ATL,DTW,835,832,-3,1025,1024,-1,110,112,86,594 2018-10-01,WN,N256WN,1896,ATL,DTW,2200,2200,0,2345,2349,4,105,109,86,594 2018-10-01,WN,N8556Z,5388,ATL,DTW,1530,1531,1,1730,1727,-3,120,116,86,594 2018-10-01,WN,N279WN,2775,ATL,FLL,1110,1111,1,1310,1250,-20,120,99,82,581 2018-10-01,WN,N945WN,3088,ATL,FLL,1340,1351,11,1535,1529,-6,115,98,82,581 2018-10-01,WN,N8699A,5459,ATL,FLL,650,645,-5,835,820,-15,105,95,80,581 2018-10-01,WN,N8691A,6191,ATL,FLL,1950,1958,8,2140,2146,6,110,108,83,581 2018-10-01,WN,N8548P,964,ATL,GSP,1540,1539,-1,1630,1628,-2,50,49,27,153 2018-10-01,WN,N258WN,5417,ATL,GSP,2205,2240,35,2255,2322,27,50,42,27,153 2018-10-01,WN,N8660A,6185,ATL,GSP,1105,1101,-4,1200,1145,-15,55,44,26,153 2018-10-01,WN,N274WN,343,ATL,HOU,1810,1808,-2,1920,1901,-19,130,113,100,696 2018-10-01,WN,N230WN,1176,ATL,HOU,1955,1955,0,2105,2057,-8,130,122,101,696 2018-10-01,WN,N786SW,1433,ATL,HOU,1130,1308,98,1235,1443,128,125,155,140,696 2018-10-01,WN,N452WN,2847,ATL,HOU,605,601,-4,710,659,-11,125,118,99,696 2018-10-01,WN,N8619F,5161,ATL,HOU,1340,1333,-7,1440,1503,23,120,150,136,696 2018-10-01,WN,N8513F,812,ATL,IAD,1535,1535,0,1725,1727,2,110,112,78,534 ================================================ FILE: 
even-more-python-for-beginners-data-tools/09 - Handling duplicates and rows with missing values/09 - Removing rows.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Handling duplicate rows and rows with missing values\n", "\n", "Most machine learning algorithms will return an error if they encounter a missing value. So, you often have to remove rows with missing values from your DataFrame.\n", "\n", "To learn how, we need to create a pandas DataFrame and load it with data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The flight delays data set contains information about flights and flight delays" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " FL_DATE OP_UNIQUE_CARRIER TAIL_NUM OP_CARRIER_FL_NUM ORIGIN DEST \\\n", "0 2018-10-01 WN N221WN 802 ABQ BWI \n", "1 2018-10-01 WN N8329B 3744 ABQ BWI \n", "2 2018-10-01 WN N920WN 1019 ABQ DAL \n", "3 2018-10-01 WN N480WN 1499 ABQ DAL \n", "4 2018-10-01 WN N227WN 3635 ABQ DAL \n", "\n", " CRS_DEP_TIME DEP_TIME DEP_DELAY CRS_ARR_TIME ARR_TIME ARR_DELAY \\\n", "0 905 903.0 -2.0 1450 1433.0 -17.0 \n", "1 1500 1458.0 -2.0 2045 2020.0 -25.0 \n", "2 1800 1802.0 2.0 2045 2032.0 -13.0 \n", "3 950 947.0 -3.0 1235 1223.0 -12.0 \n", "4 1150 1151.0 1.0 1430 1423.0 -7.0 \n", "\n", " CRS_ELAPSED_TIME ACTUAL_ELAPSED_TIME AIR_TIME DISTANCE \n", "0 225 210.0 197.0 1670 \n", "1 225 202.0 191.0 1670 \n", "2 105 90.0 80.0 580 \n", "3 105 96.0 81.0 580 \n", "4 100 92.0 80.0 580 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv')\n", "delays_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**info** will tell us how many rows are in the DataFrame and for each column how many of those rows contain non-null values. 
From this we can determine which columns (if any) contain null/missing values" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 300000 entries, 0 to 299999\n", "Data columns (total 16 columns):\n", "FL_DATE 300000 non-null object\n", "OP_UNIQUE_CARRIER 300000 non-null object\n", "TAIL_NUM 299660 non-null object\n", "OP_CARRIER_FL_NUM 300000 non-null int64\n", "ORIGIN 300000 non-null object\n", "DEST 300000 non-null object\n", "CRS_DEP_TIME 300000 non-null int64\n", "DEP_TIME 296825 non-null float64\n", "DEP_DELAY 296825 non-null float64\n", "CRS_ARR_TIME 300000 non-null int64\n", "ARR_TIME 296574 non-null float64\n", "ARR_DELAY 295832 non-null float64\n", "CRS_ELAPSED_TIME 300000 non-null int64\n", "ACTUAL_ELAPSED_TIME 295832 non-null float64\n", "AIR_TIME 295832 non-null float64\n", "DISTANCE 300000 non-null int64\n", "dtypes: float64(6), int64(5), object(5)\n", "memory usage: 30.9+ MB\n" ] } ], "source": [ "delays_df.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "TAIL_NUM, DEP_TIME, DEP_DELAY, ARR_TIME, ARR_DELAY, ACTUAL_ELAPSED_TIME, and AIR_TIME all have rows with missing values."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many techniques for dealing with missing values; the simplest is to delete the rows that contain them.\n", "\n", "**dropna** will delete rows containing null/missing values" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "Int64Index: 295832 entries, 0 to 299999\n", "Data columns (total 16 columns):\n", "FL_DATE 295832 non-null object\n", "OP_UNIQUE_CARRIER 295832 non-null object\n", "TAIL_NUM 295832 non-null object\n", "OP_CARRIER_FL_NUM 295832 non-null int64\n", "ORIGIN 295832 non-null object\n", "DEST 295832 non-null object\n", "CRS_DEP_TIME 295832 non-null int64\n", "DEP_TIME 295832 non-null float64\n", "DEP_DELAY 295832 non-null float64\n", "CRS_ARR_TIME 295832 non-null int64\n", "ARR_TIME 295832 non-null float64\n", "ARR_DELAY 295832 non-null float64\n", "CRS_ELAPSED_TIME 295832 non-null int64\n", "ACTUAL_ELAPSED_TIME 295832 non-null float64\n", "AIR_TIME 295832 non-null float64\n", "DISTANCE 295832 non-null int64\n", "dtypes: float64(6), int64(5), object(5)\n", "memory usage: 32.7+ MB\n" ] } ], "source": [ "delay_no_nulls_df = delays_df.dropna() # Delete the rows with missing values\n", "delay_no_nulls_df.info() # Check the number of rows and number of rows with non-null values to confirm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you don't need to keep the original DataFrame, you can just delete the rows within the existing DataFrame instead of creating a new one\n", "\n", "**inplace=*True*** indicates you want to drop the rows in the specified DataFrame" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "Int64Index: 295832 entries, 0 to 299999\n", "Data columns (total 16 columns):\n", "FL_DATE 
295832 non-null object\n", "OP_UNIQUE_CARRIER 295832 non-null object\n", 
"TAIL_NUM 295832 non-null object\n", "OP_CARRIER_FL_NUM 295832 non-null int64\n", "ORIGIN 295832 non-null object\n", "DEST 295832 non-null object\n", "CRS_DEP_TIME 295832 non-null int64\n", "DEP_TIME 295832 non-null float64\n", "DEP_DELAY 295832 non-null float64\n", "CRS_ARR_TIME 295832 non-null int64\n", "ARR_TIME 295832 non-null float64\n", "ARR_DELAY 295832 non-null float64\n", "CRS_ELAPSED_TIME 295832 non-null int64\n", "ACTUAL_ELAPSED_TIME 295832 non-null float64\n", "AIR_TIME 295832 non-null float64\n", "DISTANCE 295832 non-null int64\n", "dtypes: float64(6), int64(5), object(5)\n", "memory usage: 32.7+ MB\n" ] } ], "source": [ "delays_df.dropna(inplace=True)\n", "delays_df.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When data is loaded from multiple data sources you sometimes end up with duplicate records. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name City Country
0 Seattle-Tacoma Seattle USA
1 Dulles Washington USA
2 Dulles Washington USA
3 Heathrow London United Kingdom
4 Schiphol Amsterdam Netherlands
\n", "
" ], "text/plain": [ " Name City Country\n", "0 Seattle-Tacoma Seattle USA\n", "1 Dulles Washington USA\n", "2 Dulles Washington USA\n", "3 Heathrow London United Kingdom\n", "4 Schiphol Amsterdam Netherlands" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports_df = pd.read_csv('Data/airportsDuplicateRows.csv')\n", "airports_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "use **duplicates** to find the duplicate rows.\n", "\n", "If a row is a duplicate of a previous row it returns **True**" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 False\n", "2 True\n", "3 False\n", "4 False\n", "5 False\n", "6 False\n", "7 False\n", "dtype: bool" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports_df.duplicated()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**drop_duplicates** will delete the duplicate rows" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name City Country
0 Seattle-Tacoma Seattle USA
1 Dulles Washington USA
3 Heathrow London United Kingdom
4 Schiphol Amsterdam Netherlands
5 Changi Singapore Singapore
6 Pearson Toronto Canada
7 Narita Tokyo Japan
\n", "
" ], "text/plain": [ " Name City Country\n", "0 Seattle-Tacoma Seattle USA\n", "1 Dulles Washington USA\n", "3 Heathrow London United Kingdom\n", "4 Schiphol Amsterdam Netherlands\n", "5 Changi Singapore Singapore\n", "6 Pearson Toronto Canada\n", "7 Narita Tokyo Japan" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "airports_df.drop_duplicates(inplace=True)\n", "airports_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: even-more-python-for-beginners-data-tools/09 - Handling duplicates and rows with missing values/Lots_of_flight_data.csv ================================================ [File too large to display: 21.4 MB] ================================================ FILE: even-more-python-for-beginners-data-tools/09 - Handling duplicates and rows with missing values/README.md ================================================ # Handling duplicates and rows with missing values When preparing data for machine learning you need to remove duplicate rows and you may need to delete rows with missing values. 
## Common functions

- [dropna](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html) removes rows with missing values
- [duplicated](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.duplicated.html) returns True or False for each row to indicate whether it is a duplicate of a previous row
- [drop_duplicates](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html) returns a DataFrame with duplicate rows removed

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)

================================================
FILE: even-more-python-for-beginners-data-tools/09 - Handling duplicates and rows with missing values/airportsDuplicateRows.csv
================================================
Name,City,Country
Seattle-Tacoma,Seattle,USA
Dulles,Washington,USA
Dulles,Washington,USA
Heathrow,London,United Kingdom
Schiphol,Amsterdam,Netherlands
Changi,Singapore,Singapore
Pearson,Toronto,Canada
Narita,Tokyo,Japan

================================================
FILE: even-more-python-for-beginners-data-tools/10 - Splitting test and training data with scikit-learn/10 - Train Test split.ipynb
================================================
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Splitting test and training data\n", "When you train a data model you may need to split up your data into test and training data sets\n", "\n", "To accomplish this task we will use the [scikit-learn](https://scikit-learn.org/stable/) library\n", "\n", "scikit-learn is an open source, BSD licensed library for data science for preprocessing and training models."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we can split our data into test and training data, we need to do some data preparation" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's load our CSV file with information about flights and flight delays\n", "\n", "Use **shape** to find out how many rows and columns are in the original DataFrame" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(300000, 16)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv')\n", "delays_df.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Split data into features and labels\n", "Create a DataFrame called X containing only the features we want to use to train our model.\n", "\n", "**Note** You can only use numeric values as features. If you have non-numeric values, you must first convert them into numeric values with a technique such as one-hot encoding before using them as features to train a model. Check out Data Science courses for more information on these techniques!" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DISTANCE CRS_ELAPSED_TIME
0 1670 225
1 1670 225
2 580 105
3 580 105
4 580 100
\n", "
" ], "text/plain": [ " DISTANCE CRS_ELAPSED_TIME\n", "0 1670 225\n", "1 1670 225\n", "2 580 105\n", "3 580 105\n", "4 580 100" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X = delays_df.loc[:,['DISTANCE', 'CRS_ELAPSED_TIME']]\n", "X.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a DataFrame called y containing only the value we want to predict with our model. \n", "\n", "In our case we want to predict how many minutes late a flight will arrive. This information is in the ARR_DELAY column. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ARR_DELAY
0 -17.0
1 -25.0
2 -13.0
3 -12.0
4 -7.0
\n", "
" ], "text/plain": [ " ARR_DELAY\n", "0 -17.0\n", "1 -25.0\n", "2 -13.0\n", "3 -12.0\n", "4 -7.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = delays_df.loc[:,['ARR_DELAY']]\n", "y.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Split into test and training data\n", "Use **scikitlearn train_test_split** to move 30% of the rows into Test DataFrames\n", "\n", "The other 70% of the rows into DataFrames we can use to train our model\n", "\n", "NOTE: by specifying a value for *random_state* we ensure that if we run the code again the same rows will be moved into the test DataFrame. This makes our results repeatable." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X, \n", " y, \n", " test_size=0.3, \n", " random_state=42\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have a DataFrame **X_train** which contains 70% of the rows\n", "\n", "We will use this DataFrame to train our model" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(210000, 2)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_train.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The DataFrame **X_test** contains the remaining 30% of the rows\n", "\n", "We will use this DataFrame to test our trained model, so we can check it's accuracy" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(90000, 2)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_test.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**X_train** and **X_test** contain our features\n", "\n", "The features are the columns we 
think can help us predict how late a flight will arrive: **DISTANCE** and **CRS_ELAPSED_TIME**" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DISTANCE CRS_ELAPSED_TIME
186295 237 60
127847 411 111
274740 342 85
74908 1005 164
11630 484 100
\n", "
" ], "text/plain": [ " DISTANCE CRS_ELAPSED_TIME\n", "186295 237 60\n", "127847 411 111\n", "274740 342 85\n", "74908 1005 164\n", "11630 484 100" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_train.head()" ] }, { "cell_type": "markdown", "metadata": { "scrolled": true }, "source": [ "The DataFrame **y_train** contains 70% of the rows\n", "\n", "We will use this DataFrame to train our model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you don't need to keep the original DataFrame, you can just delete the rows within the existing DataFrame instead of creating a new one\n", "**inplace=*True*** indicates you want to drop the rows in the specified DataFrame" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(210000, 1)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_train.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The DataFrame **y_test** contains the remaining 30% of the rows\n", "\n", "We will use this DataFrame to test our trained model, so we can check it's accuracy" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(90000, 1)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_test.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**y_train** and **y_test** contain our label\n", "\n", "The label is the columns we want to predict with our trained model: **ARR_DELAY**\n", "\n", "**NOTE:** a negative value for ARR_DELAY indicates a flight arrived early" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ARR_DELAY
186295 -7.0
127847 -16.0
274740 -10.0
74908 -19.0
11630 -13.0
\n", "
" ], "text/plain": [ " ARR_DELAY\n", "186295 -7.0\n", "127847 -16.0\n", "274740 -10.0\n", "74908 -19.0\n", "11630 -13.0" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_train.head()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: even-more-python-for-beginners-data-tools/10 - Splitting test and training data with scikit-learn/Lots_of_flight_data.csv ================================================ [File too large to display: 21.4 MB] ================================================ FILE: even-more-python-for-beginners-data-tools/10 - Splitting test and training data with scikit-learn/README.md ================================================ # Splitting test and training data with scikit-learn [scikit-learn](https://scikit-learn.org/) is a library of tools for predictive data analysis, which will allow you to prepare your data for machine learning and create models. ## Common functions - [train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) splits arrays into random train and test subsets ## Microsoft Learn Resources Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner). 
- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner) ================================================ FILE: even-more-python-for-beginners-data-tools/11 - Train a linear regression model with scikit-learn/11 - Train a basic model.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Train a linear regression model\n", "When you have your data prepared you can train a model.\n", "\n", "There are multiple libraries and methods you can call to train models. In this notebook we will use the **LinearRegression** model in the **scikit-learn** library" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need our DataFrame, with data loaded, all the rows with null values removed, and the features and labels split into the separate training and test data. So, we'll start by just rerunning the commands from the previous notebooks." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Load our data from the csv file\n", "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') \n", "\n", "# Remove rows with null values since those will crash our linear regression model training\n", "delays_df.dropna(inplace=True)\n", "\n", "# Move our features into the X DataFrame\n", "X = delays_df.loc[:,['DISTANCE', 'CRS_ELAPSED_TIME']]\n", "\n", "# Move our labels into the y DataFrame\n", "y = delays_df.loc[:,['ARR_DELAY']] \n", "\n", "# Split our data into test and training DataFrames\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X, \n", " y, \n", " test_size=0.3, \n", " random_state=42\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the *fit* method of scikit-learn's **LinearRegression** to train a linear regression model based on the training data stored in X_train and y_train" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.linear_model import LinearRegression\n", "\n", "regressor = LinearRegression() # Create a scikit-learn LinearRegression object\n", "regressor.fit(X_train, y_train) # Use the fit method to train the model using your training data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The *regressor* object now contains your trained Linear Regression model" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", 
"name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: even-more-python-for-beginners-data-tools/11 - Train a linear regression model with scikit-learn/Lots_of_flight_data.csv ================================================ [File too large to display: 21.4 MB] ================================================ FILE: even-more-python-for-beginners-data-tools/11 - Train a linear regression model with scikit-learn/README.md ================================================ # Train a linear regression model with scikit-learn [Linear regression](https://en.wikipedia.org/wiki/Linear_regression) is a common algorithm for predicting values based on a given dataset. ## Common classes and functions - [LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) fits a linear model - [LinearRegression.fit](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html?highlight=linearregression#sklearn.linear_model.LinearRegression.fit) is used to fit the linear model based on training data ## Microsoft Learn Resources Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner). 
- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner) ================================================ FILE: even-more-python-for-beginners-data-tools/12 - Testing a model/12 - Test a model.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Test a trained model\n", "Once you have trained a model, you can test it with the test data you put aside" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will start by rerunning the code from the previous notebook to create a trained model" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LinearRegression" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Load our data from the csv file\n", "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') \n", "\n", "# Remove rows with null values since those will crash our linear regression model training\n", "delays_df.dropna(inplace=True)\n", "\n", "# Move our features into the X DataFrame\n", "X = delays_df.loc[:,['DISTANCE', 'CRS_ELAPSED_TIME']]\n", "\n", "# Move our labels into the y DataFrame\n", "y = delays_df.loc[:,['ARR_DELAY']] \n", "\n", "# Split our data into test and training DataFrames\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X, \n", " y, \n", " test_size=0.3, \n", " random_state=42\n", " )\n", "regressor = LinearRegression() # Create a scikit learn LinearRegression object\n", "regressor.fit(X_train, y_train) # Use the fit method to train the model using your training data" ] 
}, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test the model\n", "Use the *predict* method of scikit-learn's **LinearRegression** to have our trained model predict values for our test data\n", "\n", "We stored our test data in X_test\n", "\n", "We will store the predicted results in y_pred" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "y_pred = regressor.predict(X_test)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[3.47739078],\n", " [5.89055919],\n", " [4.33288464],\n", " ...,\n", " [5.84678979],\n", " [6.05195889],\n", " [5.66255414]])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_pred" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we split our data into training and test data we stored the actual values for each row of test data in the DataFrame y_test\n", "\n", "We can compare the values in y_pred to the values in y_test to get a sense of how accurately our model predicted arrival delays" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ARR_DELAY
291483-5.0
98997-12.0
23454-9.0
110802-14.0
49449-20.0
9494414.0
160885-17.0
47572-20.0
16480020.0
62578-9.0
1967425.0
911660.0
171564-9.0
607066.0
240773-6.0
32695-13.0
98399-23.0
167341-11.0
126191-4.0
188715131.0
258610-5.0
215751-20.0
41210-15.0
68090-19.0
1407940.0
178840-14.0
24807121.0
127705.0
9594840.0
172913-13.0
......
20079721.0
361990.0
70402-37.0
285308152.0
201508-2.0
154671-5.0
238535-5.0
133567-9.0
3349-8.0
257254-28.0
106572-19.0
73023-25.0
214699-12.0
274435-7.0
67089-10.0
269917-4.0
16496670.0
275120-12.0
139292-8.0
31106-25.0
27779917.0
293749-7.0
23111435.0
11645-15.0
252520-12.0
209898-20.0
22210-9.0
165727-6.0
260838-33.0
1925460.0
\n", "

88750 rows × 1 columns

\n", "
" ], "text/plain": [ " ARR_DELAY\n", "291483 -5.0\n", "98997 -12.0\n", "23454 -9.0\n", "110802 -14.0\n", "49449 -20.0\n", "... ...\n", "209898 -20.0\n", "22210 -9.0\n", "165727 -6.0\n", "260838 -33.0\n", "192546 0.0\n", "\n", "[88750 rows x 1 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_test" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: even-more-python-for-beginners-data-tools/12 - Testing a model/Lots_of_flight_data.csv ================================================ [File too large to display: 21.4 MB] ================================================ FILE: even-more-python-for-beginners-data-tools/12 - Testing a model/README.md ================================================ # Testing a model Once a model is built it can be used to predict values. You can provide new values to see where it would fall on the spectrum, and test the generated model. ## Common classes and functions - [LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) fits a linear model - [LinearRegression.predict](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html?highlight=linearregression#sklearn.linear_model.LinearRegression.predict) is used to predict outcomes for new data based on the trained linear model ## Microsoft Learn Resources Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner). 
- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)

================================================
FILE: even-more-python-for-beginners-data-tools/13 - Evaluating accuracy of a model using calculations/13 - Evaluate accuracy.ipynb
================================================
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Evaluating accuracy of a model using calculations\n", "After you train a model, you need to get a sense of its accuracy. The accuracy of a model gives you an idea of how much confidence you can put in predictions made by the model.\n", "\n", "The **scikit-learn** and **numpy** libraries are both helpful for measuring model accuracy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's start by recreating our trained linear regression model from the last lesson" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LinearRegression" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Load our data from the csv file\n", "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') \n", "\n", "# Remove rows with null values since those will crash our linear regression model training\n", "delays_df.dropna(inplace=True)\n", "\n", "# Move our features into the X DataFrame\n", "X = delays_df.loc[:,['DISTANCE', 'CRS_ELAPSED_TIME']]\n", "\n", "# Move our labels into the y DataFrame\n", "y = delays_df.loc[:,['ARR_DELAY']] \n", "\n", "# Split our data into test and training DataFrames\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X, \n", " y, \n", " test_size=0.3, \n", " random_state=42\n", " )\n", "regressor = LinearRegression() # Create a scikit learn LinearRegression object\n", "regressor.fit(X_train, 
y_train) # Use the fit method to train the model using your training data\n", "\n", "y_pred = regressor.predict(X_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Measuring accuracy\n", "Now that we have a trained model there are a number of metrics you can use to check the accuracy of the model. \n", "\n", "All these metrics are based on mathematical calculations. The key take-away here is you don't have to calculate everything yourself. Scikit-learn and numpy will do most of the work and provide good performance." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mean Squared Error (MSE)\n", "The MSE measures the average squared error the model makes when predicting the outcome for an observation. \n", "The lower the MSE, the better the model.\n", "\n", "MSE is the average squared difference between the observed actual outcome values and the values predicted by the model.\n", "\n", "MSE = mean((actuals - predicteds)^2) \n", "\n", "We could write code to loop through our records comparing actual and predicted values to perform this calculation, but we don't have to! Just use **mean_squared_error** from the **scikit-learn** library" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean Squared Error: 2250.4445141530855\n" ] } ], "source": [ "from sklearn import metrics\n", "print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Root Mean Squared Error (RMSE)\n", "RMSE measures the typical size of the error the model makes when predicting the outcome for an observation, in the same units as the outcome. 
\n", "The lower the RMSE, the better the model.\n", "\n", "Mathematically, the RMSE is the square root of the mean squared error \n", "\n", "RMSE = sqrt(MSE)\n", "\n", "Skikit learn does not have a function for RMSE, but since it's just the square root of MSE, we can use the numpy library which contains lots of mathematical functions to calculate the square root of the MSE" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mean Absolute Error (MAE)\n", "MAE measures the prediction error. The lower the MAE the better the model\n", "\n", "Mathematically, it is the average absolute difference between observed and predicted outcomes\n", "\n", "MAE = mean(abs(actuals - predicteds)). \n", "\n", "MAE is less sensitive to outliers compared to RMSE. Calculate RMSE using **mean_absolute_error** in the **scikit-learn** library" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Mean absolute error: ',metrics.mean_absolute_error(y_test, y_pred))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# R^2 or R-Squared\n", "\n", "R squared is the proportion of variation in the outcome that is explained by the predictor variables. It is an indication of how much the values passed to the model influence the predicted value. \n", "\n", "The Higher the R-squared, the better the model. Calculate R-Squared using **r2_score** in the **scikit-learn** library." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('R^2: ',metrics.r2_score(y_test, y_pred))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Different models have different ways to measure accuracy. 
Fortunately, **scikit-learn** and **numpy** provide a wide variety of functions to help measure accuracy." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }

================================================
FILE: even-more-python-for-beginners-data-tools/13 - Evaluating accuracy of a model using calculations/Lots_of_flight_data.csv
================================================
[File too large to display: 21.4 MB]

================================================
FILE: even-more-python-for-beginners-data-tools/13 - Evaluating accuracy of a model using calculations/README.md
================================================
# Evaluating accuracy of a model using calculations

Playing with individual values isn't the best way to test a model. Fortunately, [scikit-learn](https://scikit-learn.org/) provides tools for automated testing and analysis.

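As a rough illustration of what these tools calculate, the sketch below works out the four metrics from this lesson by hand in plain Python. The `actuals` and `predicteds` values are made up for the example and are not taken from the flight data. In the notebook, the `sklearn.metrics` functions `mean_squared_error`, `mean_absolute_error` and `r2_score`, together with `numpy.sqrt`, do this work for you.

```python
import math

# Made-up values, for illustration only (not taken from the flight data)
actuals = [3.0, -0.5, 2.0, 7.0]
predicteds = [2.5, 0.0, 2.0, 8.0]

n = len(actuals)
errors = [a - p for a, p in zip(actuals, predicteds)]

# MSE = mean((actuals - predicteds)^2)
mse = sum(e ** 2 for e in errors) / n

# RMSE = sqrt(MSE)
rmse = math.sqrt(mse)

# MAE = mean(abs(actuals - predicteds))
mae = sum(abs(e) for e in errors) / n

# R^2 = 1 - (sum of squared errors / total sum of squares around the mean)
mean_actual = sum(actuals) / n
ss_total = sum((a - mean_actual) ** 2 for a in actuals)
r2 = 1 - sum(e ** 2 for e in errors) / ss_total

print('Mean Squared Error:', mse)
print('Root Mean Squared Error:', rmse)
print('Mean Absolute Error:', mae)
print('R^2:', r2)
```

Writing the loops once makes the formulas concrete. For real data sets the library functions are the better choice, since they are already tested and much faster.
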
## Common functions

- [metrics](https://scikit-learn.org/stable/modules/classes.html?highlight=metrics#module-sklearn.metrics) includes functions and metrics that can be used for data science, including measuring the accuracy of models
- [mean_squared_error](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error) returns the mean squared error, a metric for the accuracy of regression models such as linear regression
- [r2_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score) returns the R^2 regression score, a metric for the accuracy of regression models such as linear regression

## NumPy

[NumPy](https://numpy.org/) is a package for scientific computing with Python.

### Common functions

- [sqrt](https://numpy.org/doc/1.18/reference/generated/numpy.sqrt.html?highlight=sqrt#numpy.sqrt) returns the square root of a value

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner) ================================================ FILE: even-more-python-for-beginners-data-tools/14 - NumPy vs Pandas/14 - Working with numpy and pandas.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Moving data from numpy arrays to pandas DataFrames\n", "In our last notebook we trained a model and compared our actual and predicted results\n", "\n", "What may not have been evident was when we did this we were working with two different objects: a **numpy array** and a **pandas DataFrame**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To explore further let's rerun the code from the previous notebook to create a trained model and get predicted values for our test data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LinearRegression" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Load our data from the csv file\n", "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') \n", "\n", "# Remove rows with null values since those will crash our linear regression model training\n", "delays_df.dropna(inplace=True)\n", "\n", "# Move our features into the X DataFrame\n", "X = delays_df.loc[:,['DISTANCE','CRS_ELAPSED_TIME']]\n", "\n", "# Move our labels into the y DataFrame\n", "y = delays_df.loc[:,['ARR_DELAY']] \n", "\n", "# Split our data into test and training DataFrames\n", "X_train, X_test, y_train, y_test = train_test_split(X, \n", " y, \n", " test_size=0.3, \n", " random_state=42)\n", "regressor = LinearRegression() # Create a scikit learn LinearRegression object\n", "regressor.fit(X_train, y_train) # Use the fit 
method to train the model using your training data\n", "\n", "y_pred = regressor.predict(X_test) # Generate predicted values for our test data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the last Notebook, you might have noticed the output displays differently when you display the contents of the predicted values in y_pred and the actual values in y_test" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[3.47739078],\n", " [5.89055919],\n", " [4.33288464],\n", " ...,\n", " [5.84678979],\n", " [6.05195889],\n", " [5.66255414]])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_pred" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
[HTML table of y_test: a single ARR_DELAY column indexed by the original row number; 88750 rows × 1 columns]
" ], "text/plain": [ " ARR_DELAY\n", "291483 -5.0\n", "98997 -12.0\n", "23454 -9.0\n", "110802 -14.0\n", "49449 -20.0\n", "... ...\n", "209898 -20.0\n", "22210 -9.0\n", "165727 -6.0\n", "260838 -33.0\n", "192546 0.0\n", "\n", "[88750 rows x 1 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use **type()** to check the datatype of an object." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "numpy.ndarray" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(y_pred)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **y_pred** is a numpy array\n", "* **y_test** is a pandas DataFrame\n", "\n", "Another way you might discover this is if you try to use the **head** method on **y_pred**. 
\n", "\n", "This will return an error, because **head** is a method of the DataFrame class it is not a method of numpy arrays" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "ename": "AttributeError", "evalue": "'numpy.ndarray' object has no attribute 'head'", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mAttributeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0my_pred\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mhead\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mAttributeError\u001b[0m: 'numpy.ndarray' object has no attribute 'head'" ] } ], "source": [ "y_pred.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A one dimensional numpy array is similar to a pandas Series\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Pearson' 'Changi' 'Narita']\n", "Narita\n" ] } ], "source": [ "import numpy as np\n", "airports_array = np.array(['Pearson','Changi','Narita'])\n", "print(airports_array)\n", "print(airports_array[2])" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 Pearson\n", "1 Changi\n", "2 Narita\n", "dtype: object\n", "Narita\n" ] } ], "source": [ "airports_series = pd.Series(['Pearson','Changi','Narita'])\n", "print(airports_series)\n", "print(airports_series[2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A two dimensional numpy array is similar to a pandas DataFrame" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[['YYZ' 'Pearson']\n", " 
['SIN' 'Changi']\n", " ['NRT' 'Narita']]\n", "YYZ\n" ] } ], "source": [ "airports_array = np.array([\n", " ['YYZ','Pearson'],\n", " ['SIN','Changi'],\n", " ['NRT','Narita']])\n", "print(airports_array)\n", "print(airports_array[0,0])" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0 1\n", "0 YYZ Pearson\n", "1 SIN Changi\n", "2 NRT Narita\n", "YYZ\n" ] } ], "source": [ "airports_df = pd.DataFrame([['YYZ','Pearson'],['SIN','Changi'],['NRT','Narita']])\n", "print(airports_df)\n", "print(airports_df.iloc[0,0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you need the functionality of a DataFrame, you can move data from numpy objects to pandas objects and vice-versa.\n", "\n", "In the example below we use the DataFrame constructor to read the contents of the numpy array *y_pred* into a DataFrame called *predicted_df*\n", "\n", "Then we can use the functionality of the DataFrame object" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
[HTML table of predicted_df.head(): a single column named 0 with rows 0 to 4; the values are listed in the text/plain output]
" ], "text/plain": [ " 0\n", "0 3.477391\n", "1 5.890559\n", "2 4.332885\n", "3 3.447476\n", "4 5.072394" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "predicted_df = pd.DataFrame(y_pred)\n", "predicted_df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }

================================================
FILE: even-more-python-for-beginners-data-tools/14 - NumPy vs Pandas/Lots_of_flight_data.csv
================================================
[File too large to display: 21.4 MB]

================================================
FILE: even-more-python-for-beginners-data-tools/14 - NumPy vs Pandas/README.md
================================================
# NumPy vs pandas

There are numerous libraries available to data scientists. NumPy and pandas are two of the most common.

Some operations may return different data types. You can use the Python function [type](https://docs.python.org/3/library/functions.html#type) to determine the type of an object.

## NumPy

[NumPy](https://numpy.org/) is a Python package for scientific computing that includes array and dictionary-type objects for data analysis.

### Common object

- [array](https://numpy.org/doc/1.18/reference/generated/numpy.array.html?highlight=array#numpy.array) creates an N-dimensional array object

## pandas

[pandas](https://pandas.pydata.org/) is a Python package for data analysis that includes one-dimensional and two-dimensional array objects.

### Common objects

- [Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html) stores a one-dimensional array
- [DataFrame](https://pandas.pydata.org/docs/reference/frame.html) stores a two-dimensional array

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)

================================================
FILE: even-more-python-for-beginners-data-tools/15 - Visualizing data with Matplotlib/15 - Visualizing correlations.ipynb
================================================
{ "cells": [ { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Visualizing data with matplotlib" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Sometimes graphs provide the best way to visualize data.\n", "\n", "The **matplotlib** library allows you to draw graphs to help with visualization.\n", "\n", "If we want to visualize data, we will need to load some data into a DataFrame." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load our data from the csv file\n", "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') " ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "In order to display plots we need to import the **matplotlib** 
library" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A common plot used in data science is the scatter plot, which is used to check the relationship between two columns.\n", "If you see dots scattered everywhere, there is no correlation between the two columns.\n", "If you see something resembling a line, there is a correlation between the two columns.\n", "\n", "You can use the plot method of the DataFrame to draw the scatter plot:\n", "* kind - the type of graph to draw\n", "* x - value to plot as x\n", "* y - value to plot as y\n", "* color - color to use for the graph points\n", "* alpha - opacity - useful to show density of points in a scatter plot\n", "* title - title of the graph" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Check if there is a relationship between the distance of a flight and how late the flight arrives\n", "delays_df.plot(\n", " kind='scatter',\n", " x='DISTANCE',\n", " y='ARR_DELAY',\n", " color='blue',\n", " alpha=0.3,\n", " title='Correlation of arrival and distance'\n", " )\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Check if there is a relationship between how late the flight leaves and how late the flight arrives\n", "delays_df.plot(\n", " kind='scatter',\n", " x='DEP_DELAY',\n", " y='ARR_DELAY',\n", " color='blue',\n", " alpha=0.3,\n", " title='Correlation of arrival and departure delay'\n", " )\n", "plt.show()" ] }, { "cell_type": "markdown", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "The scatter plots suggest there is little or no correlation between distance and arrival delay, but a strong correlation between departure delay and arrival delay.\n" ] } ], "metadata": { "kernelspec": { 
"display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: even-more-python-for-beginners-data-tools/15 - Visualizing data with Matplotlib/Lots_of_flight_data.csv ================================================ [File too large to display: 21.4 MB] ================================================ FILE: even-more-python-for-beginners-data-tools/15 - Visualizing data with Matplotlib/README.md ================================================ # Visualizing data with Matplotlib [Matplotlib](https://matplotlib.org/) gives you the ability to draw charts which can be used to visualize data. ## Common tools and functions - [pyplot](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.html?highlight=pyplot#module-matplotlib.pyplot) provides the ability to draw plots similar to the MATLAB tool - [pyplot.plot](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot) plots a graph - [pyplot.show](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.show.html#matplotlib.pyplot.show) displays figures such as a graph - [pyplot.scatter](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.scatter.html?highlight=scatter%20plot#matplotlib.pyplot.scatter) is used to draw scatter plots, a diagram that shows the relationship between two sets of data ## Microsoft Learn Resources Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner). 
- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)

================================================
FILE: even-more-python-for-beginners-data-tools/README.md
================================================
# Even more Python for beginners - data tools

## Overview

Data science and machine learning are among the most popular fields today, and Python is one of the most popular languages used in them. As you might expect, there are several libraries and tools available to you. As you begin your journey into this field, it will help to be familiar with the most common frameworks and techniques. This is what we're here to help you with!

We're going to introduce [Jupyter notebooks](https://jupyter.org/), a common tool for data scientists. We're also going to show off [pandas](https://pandas.pydata.org/), which is used to help manage and explore data, and [scikit-learn](https://scikit-learn.org/) for incorporating machine learning. You'll see how to bring everything together and walk through a common scenario of loading data and running it through a particular algorithm.

Our goal is to help show you the tools you'll be using as you dig deeper into data science and machine learning. While we won't highlight the decision points of algorithms or collecting the data (there are other courses available for those topics), you will explore the techniques and libraries.

### What you'll learn - Jupyter notebooks - pandas DataFrame for managing data - NumPy for arrays - scikit-learn for machine learning ### What we don't cover - Theory behind machine learning - Algorithm selection - Managing big data ## Prerequisites - [An understanding of Git](https://git-scm.com/book/en/v1/Getting-Started) - [An understanding of Python](https://aka.ms/pythonbeginnerseries) ## Next steps As the goal of this course is to help get you up to speed on Python so you can work through a quick start, the next step after completing the videos is to follow a tutorial! Here's a few of our favorites: - [Predict flight delays by creating a machine learning model in Python](https://docs.microsoft.com/learn/modules/predict-flight-delays-with-python?WT.mc_id=python-c9-niner) - [Train a machine learning model with Azure Machine Learning](https://docs.microsoft.com/learn/modules/train-local-model-with-azure-mls?WT.mc_id=python-c9-niner) - [Analyze climate data](https://docs.microsoft.com/learn/modules/analyze-climate-data-with-azure-notebooks?WT.mc_id=python-c9-niner) - [Use unsupervised learning to analyze unlabeled data](https://docs.microsoft.com/learn/modules/introduction-to-unsupervised-learning?WT.mc_id=python-c9-niner) ================================================ FILE: more-python-for-beginners/.gitignore ================================================ # Created by https://www.gitignore.io/api/python,virtualenv,visualstudiocode # Edit at https://www.gitignore.io/?templates=python,virtualenv,visualstudiocode ### Python ### # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] *$py.class # C extensions *.so # Distribution / packaging .Python build/ develop-eggs/ dist/ downloads/ eggs/ .eggs/ lib/ lib64/ parts/ sdist/ var/ wheels/ pip-wheel-metadata/ share/python-wheels/ *.egg-info/ .installed.cfg *.egg MANIFEST # PyInstaller # Usually these files are written by a python script from a template # before PyInstaller builds the exe, so as to inject 
date/other infos into it. *.manifest *.spec # Installer logs pip-log.txt pip-delete-this-directory.txt # Unit test / coverage reports htmlcov/ .tox/ .nox/ .coverage .coverage.* .cache nosetests.xml coverage.xml *.cover .hypothesis/ .pytest_cache/ # Translations *.mo *.pot # Scrapy stuff: .scrapy # Sphinx documentation docs/_build/ # PyBuilder target/ # pyenv .python-version # pipenv # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. # However, in case of collaboration, if having platform-specific dependencies or dependencies # having no cross-platform support, pipenv may install dependencies that don't work, or not # install all needed dependencies. #Pipfile.lock # celery beat schedule file celerybeat-schedule # SageMath parsed files *.sage.py # Spyder project settings .spyderproject .spyproject # Rope project settings .ropeproject # Mr Developer .mr.developer.cfg .project .pydevproject # mkdocs documentation /site # mypy .mypy_cache/ .dmypy.json dmypy.json # Pyre type checker .pyre/ ### VirtualEnv ### # Virtualenv # http://iamzed.com/2009/05/07/a-primer-on-virtualenv/ pyvenv.cfg .env .venv env/ venv/ ENV/ env.bak/ venv.bak/ pip-selfcheck.json ### VisualStudioCode ### .vscode/* !.vscode/settings.json !.vscode/tasks.json !.vscode/launch.json !.vscode/extensions.json ### VisualStudioCode Patch ### # Ignore all local history of files .history # End of https://www.gitignore.io/api/python,virtualenv,visualstudiocode ================================================ FILE: more-python-for-beginners/01 - Formatting and linting/.vscode/settings.json ================================================ { "python.linting.pylintEnabled": true, "python.linting.enabled": true, "python.linting.flake8Enabled": false, "python.linting.banditEnabled": false, "python.linting.pycodestyleEnabled": false, "python.linting.mypyEnabled": false } ================================================ FILE: more-python-for-beginners/01 - Formatting and 
linting/README.md
================================================
# Style Guidelines

## Formatting

Formatting makes code readable and easier to debug.

## Documentation

- [PEP 8](https://pep8.org/) is a set of coding conventions for Python code
- [Docstring](https://www.python.org/dev/peps/pep-0257/) is the standard for documenting a module, function, class or method definition

## Linting

Linting helps you identify formatting and convention issues in your Python code.

- [Pylint](https://www.pylint.org/) is a linter for Python that helps enforce coding standards and check for errors in Python code
- [Linting Python in Visual Studio Code](https://code.visualstudio.com/docs/python/linting) will show you how to enable linters in VS Code
- [Type hints](https://docs.python.org/3/library/typing.html) allow some interactive development environments and linters to enforce types

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Set up your Python beginner development environment with Visual Studio Code](https://docs.microsoft.com/learn/languages/python-install-vscode/?WT.mc_id=python-c9-niner)

================================================
FILE: more-python-for-beginners/01 - Formatting and linting/bad.py
================================================
x = 12
if x == 24:
    print('Is valid')
else:
    print("Not valid")


def helper(name='sample'):
    pass


def another(name = 'sample'):
    pass

================================================
FILE: more-python-for-beginners/01 - Formatting and linting/good.py
================================================
def print_hello(name: str) -> str:
    """
    Greets the user by name

    Parameters:
        name (str): The name of the user
    Returns:
        str: The greeting
    """
    # Return the greeting so the function matches its type hint and docstring
    greeting = 'Hello, ' + name
    print(greeting)
    return greeting

================================================
FILE: more-python-for-beginners/02 - Lambdas/README.md
================================================
# Lambdas

A [lambda](https://www.w3schools.com/python/python_lambda.asp) function is a small anonymous function. It can take any number of arguments but can only execute one expression.

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Create reusable functionality with functions in Python](https://docs.microsoft.com/learn/languages/python-functions/?WT.mc_id=python-c9-niner)

================================================
FILE: more-python-for-beginners/02 - Lambdas/failed_sort.py
================================================
# This code will return an error because the sort method does not know
# which presenter field to use when sorting
presenters = [
    {'name': 'Susan', 'age': 50},
    {'name': 'Christopher', 'age': 47}
]

presenters.sort()
print(presenters)

================================================
FILE: more-python-for-beginners/02 - Lambdas/lambda_sorter.py
================================================
# Sort alphabetically
presenters = [
    {'name': 'Susan', 'age': 50},
    {'name': 'Christopher', 'age': 47}
]

presenters.sort(key=lambda item: item['name'])
print('-- alphabetically --')
print(presenters)

# Sort by length of name (shortest to longest)
presenters.sort(key=lambda item: len(item['name']))
print('-- length --')
print(presenters)

================================================
FILE: more-python-for-beginners/02 - Lambdas/method_sorter.py
================================================
def sorter(item):
    return item['name']


presenters = [
    {'name': 'Susan', 'age': 50},
    {'name': 'Christopher', 'age': 47}
]

presenters.sort(key=sorter)
print(presenters)

================================================
FILE: more-python-for-beginners/03 - Classes/README.md
================================================
# Classes

[Classes](https://docs.python.org/3/tutorial/classes.html) define data structures and behavior. Classes allow you to group data and functionality together.

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Object-oriented programming in Python](https://docs.microsoft.com/learn/modules/python-object-oriented-programming/?WT.mc_id=python-c9-niner)

================================================
FILE: more-python-for-beginners/03 - Classes/basic_class.py
================================================
class Presenter():
    def __init__(self, name):
        # Constructor
        self.name = name

    def say_hello(self):
        # method
        print('Hello, ' + self.name)


presenter = Presenter('Chris')
presenter.name = 'Christopher'
presenter.say_hello()

================================================
FILE: more-python-for-beginners/03 - Classes/properties_class.py
================================================
class Presenter():
    def __init__(self, name):
        # Constructor
        self.name = name

    @property
    def name(self):
        print('Retrieving name...')
        return self.__name

    @name.setter
    def name(self, value):
        # cool validation here
        print('Validating name...')
        self.__name = value


presenter = Presenter('Chris')
presenter.name = 'Christopher'
print(presenter.name)

================================================
FILE: more-python-for-beginners/04 - Inheritance/README.md
================================================
# Inheritance

[Inheritance](https://docs.python.org/3/tutorial/classes.html#inheritance) allows you to define a class that inherits all the methods and properties from another class. The parent or base class is the class being inherited from. The child or derived class is the class that inherits from another class.

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Object-oriented programming in Python](https://docs.microsoft.com/learn/modules/python-object-oriented-programming/?WT.mc_id=python-c9-niner)

================================================
FILE: more-python-for-beginners/04 - Inheritance/demo.py
================================================
class Person:
    def __init__(self, name):
        self.name = name

    def say_hello(self):
        print('Hello, ' + self.name)


class Student(Person):
    def __init__(self, name, school):
        super().__init__(name)
        self.school = school

    def sing_school_song(self):
        print('Ode to ' + self.school)


student = Student('Christopher', 'UVM')
student.say_hello()
student.sing_school_song()

# What are you?
print(isinstance(student, Student))
print(isinstance(student, Person))
print(issubclass(Student, Person))

================================================
FILE: more-python-for-beginners/05 - Mixins/README.md
================================================
# Mixins (multiple inheritance)

Python allows you to inherit from multiple classes. While the technical term for this is multiple inheritance, many developers refer to the use of more than one base class as adding a mixin. These are commonly used in frameworks such as [Django](https://www.djangoproject.com).

- [Multiple Inheritance](https://docs.python.org/3/tutorial/classes.html#multiple-inheritance)
- [super](https://docs.python.org/3/library/functions.html#super) is used to give access to methods and properties of a parent class

## Microsoft Learn Resources

Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).

- [Object-oriented programming in Python](https://docs.microsoft.com/learn/modules/python-object-oriented-programming/?WT.mc_id=python-c9-niner) ================================================ FILE: more-python-for-beginners/05 - Mixins/demo.py ================================================ class Loggable: def __init__(self): self.title = '' def log(self): print('Log message from ' + self.title) class Connection: def __init__(self): self.server = '' def connect(self): print('Connecting to database on ' + self.server) class SqlDatabase(Connection, Loggable): def __init__(self): super().__init__() self.title = 'Sql Connection Demo' self.server = 'Some_Server' def framework(item): if isinstance(item, Connection): item.connect() if isinstance(item, Loggable): item.log() sql_connection = SqlDatabase() framework(sql_connection) ================================================ FILE: more-python-for-beginners/06 - Managing the file system/README.md ================================================ # Managing the file system Python's [pathlib](https://docs.python.org/3/library/pathlib.html) provides operations and classes to access files and directories in the file system. ## Microsoft Learn Resources Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner) ================================================ FILE: more-python-for-beginners/06 - Managing the file system/demo.txt ================================================ Lorem ipsum ================================================ FILE: more-python-for-beginners/06 - Managing the file system/directories.py ================================================ from pathlib import Path cwd = Path.cwd() # Get the parent directory parent = cwd.parent # Is this a directory? print('\nIs this a directory? ' + str(parent.is_dir())) # Is this a file? print('\nIs this a file? 
' + str(parent.is_file())) # List child directories print('\n-----directory contents-----') for child in parent.iterdir(): if child.is_dir(): print(child) ================================================ FILE: more-python-for-beginners/06 - Managing the file system/files.py ================================================ from pathlib import Path cwd = Path.cwd() demo_file = Path(Path.joinpath(cwd, 'demo.txt')) # Get the file name print('\nfile name: ' + demo_file.name) # Get the extension print('\nfile suffix: ' + demo_file.suffix) # Get the folder print('\nfile folder: ' + demo_file.parent.name) # Get the size print('\nfile size: ' + str(demo_file.stat().st_size) + '\n') ================================================ FILE: more-python-for-beginners/06 - Managing the file system/paths.py ================================================ # Python 3.6 or higher # Grab the library from pathlib import Path # What is the current working directory? cwd = Path.cwd() print('\nCurrent working directory:\n' + str(cwd)) # Create full path name by joining path and filename new_file = Path.joinpath(cwd, 'new_file.txt') print('\nFull path:\n' + str(new_file)) # Check if file exists print('\nDoes that file exist? ' + str(new_file.exists()) + '\n') ================================================ FILE: more-python-for-beginners/07 - Reading and writing files/README.md ================================================ # Working with files Python allows you to read and write from files. 
[io](https://docs.python.org/3/library/io.html) is the module that provides Python capabilities for input/output (I/O), including text I/O from files ## Microsoft Learn Resources Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner) ================================================ FILE: more-python-for-beginners/07 - Reading and writing files/demo.txt ================================================ This is the first line of the file And this is the second line of the file This is the last line of the file ================================================ FILE: more-python-for-beginners/07 - Reading and writing files/manage.py ================================================ # Open manage.txt file to write text stream = open('manage.txt', 'wt') #Write the word demo to the file stream stream.write('demo!') # Move back to the start of the file stream stream.seek(0) #write the word cool to the file stream stream.write('cool') #Flush the file stream contents to the file buffer stream.flush() # Flush the file stream and close the file stream.close() ================================================ FILE: more-python-for-beginners/07 - Reading and writing files/read.py ================================================ # Open file demo.txt and read the contents stream = open('./demo.txt', 'rt') print('\nIs this readable? ' + str(stream.readable())) print('\nRead one character : ' + stream.read(1)) print('\nRead to end of line : ' + stream.readline()) print('\nRead all lines to end of file :\n' + str(stream.readlines())+ '\n') ================================================ FILE: more-python-for-beginners/07 - Reading and writing files/write.py ================================================ # Open output.txt as a text file for writing stream = open('output.txt', 'wt') print('\nCan I write to this file? 
' + str(stream.writable()) + '\n') stream.write('H') # Write a single string stream.writelines(['ello',' ','world']) # Write one or more strings stream.write('\n') # Write a new line names = ['Susan','Christopher'] stream.writelines(names) # Here's a neat trick to insert a new line between items in the list stream.write('\n') # Write a new line stream.writelines('\n'.join(names)) stream.close() #Flush stream and close ================================================ FILE: more-python-for-beginners/08 - Managing external resources/README.md ================================================ # with The [with](https://docs.python.org/3/reference/compound_stmts.html#with) statement allows you to simplify code in [try](https://docs.python.org/3/reference/compound_stmts.html#the-try-statement)/finally statements. It's considered best practice to use `with` for any operation which supports it. ## Microsoft Learn Resources Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner) ================================================ FILE: more-python-for-beginners/08 - Managing external resources/demo.py ================================================ try: stream = open('output.txt', 'wt') stream.write('Lorem ipsum dolar') finally: stream.close() # THIS IS REALLY IMPORTANT!! # with open('output.txt', 'wt') as stream: # stream.write('Lorem ipsum dolar') ================================================ FILE: more-python-for-beginners/08 - Managing external resources/output.txt ================================================ Lorem ipsum dolar ================================================ FILE: more-python-for-beginners/09 - Asynchronous programming/README.md ================================================ # Asynchronous operations Python offers several options for managing long-running operations asynchronously.
[asyncio](https://docs.python.org/3/library/asyncio.html) is the core library for supporting asynchronous operations, including [async](https://docs.python.org/3/reference/compound_stmts.html#async-def)/[await](https://docs.python.org/3/reference/expressions.html#await). ## Microsoft Learn Resources Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner). ================================================ FILE: more-python-for-beginners/09 - Asynchronous programming/async_demo.py ================================================ from timeit import default_timer import aiohttp import asyncio async def load_data(session, delay): print(f'Starting {delay} second timer') async with session.get(f'http://httpbin.org/delay/{delay}') as resp: text = await resp.text() print(f'Completed {delay} second timer') return text async def main(): # Start the timer start_time = default_timer() # Creating a single session async with aiohttp.ClientSession() as session: # Setup our tasks and get them running two_task = asyncio.create_task(load_data(session, 2)) three_task = asyncio.create_task(load_data(session, 3)) # Simulate other processing await asyncio.sleep(1) print('Doing other work') # Let's go get our values two_result = await two_task three_result = await three_task # Print our results elapsed_time = default_timer() - start_time print(f'The operation took {elapsed_time:.2} seconds') asyncio.run(main()) ================================================ FILE: more-python-for-beginners/09 - Asynchronous programming/sync_demo.py ================================================ from timeit import default_timer import requests def load_data(delay): print(f'Starting {delay} second timer') text = requests.get(f'http://httpbin.org/delay/{delay}').text print(f'Completed {delay} second timer') return text def run_demo(): start_time = default_timer() two_data = load_data(2) three_data = load_data(3) elapsed_time = default_timer() - start_time print(f'The
operation took {elapsed_time:.2} seconds') def main(): run_demo() main() ================================================ FILE: more-python-for-beginners/README.md ================================================ # More Python for beginners ## Overview When we created [Python for beginners](https://aka.ms/pythonbeginnerseries) we knew we wouldn't be able to cover everything in Python. We focused on the features which are core to getting started with the language. But, of course, we left some items off the list. Well, we're back for more! We created another set of videos to highlight more features, including a couple of "cutting edge" items like `async/await`. These skills will allow you to continue to grow as a Python developer. ### What you'll learn - Creating classes and objects - Asynchronous development - Working with the filesystem ### What we don't cover - Programming concepts like [object-oriented design](https://en.wikipedia.org/wiki/Object-oriented_design) - Database access ## Prerequisites - [An understanding of git](https://git-scm.com/book/en/v2) - [An understanding of Python](https://aka.ms/pythonbeginnerseries) - [Visual Studio Code](https://code.visualstudio.com?WT.mc_id=python-c9-niner) or another code editor ### Setup steps - [Create a virtual environment](https://docs.python.org/3/tutorial/venv.html) ``` bash # Windows python -m venv venv .\venv\Scripts\activate # Linux or macOS python3 -m venv venv . 
./venv/bin/activate ``` - Install the packages for Async/Await ``` bash # Windows pip install -r requirements.txt # Linux or macOS pip3 install -r requirements.txt ``` ## Next steps If you're looking to continue building, here are a couple of courses and quickstarts you might find of interest: - [Object-oriented programming in Python](https://docs.microsoft.com/learn/modules/python-object-oriented-programming?WT.mc_id=python-c9-niner) - [Build an AI web app using Python and Flask](https://docs.microsoft.com/learn/modules/python-flask-build-ai-web-app?WT.mc_id=python-c9-niner) - [Build Python Django apps with Microsoft Graph](https://docs.microsoft.com/graph/tutorials/python?WT.mc_id=python-c9-niner) - [Create a Python app in Azure App Service on Linux](https://docs.microsoft.com/azure/app-service/containers/quickstart-python?WT.mc_id=python-c9-niner) ================================================ FILE: more-python-for-beginners/requirements.txt ================================================ aiohttp requests ================================================ FILE: python-for-beginners/02 - Print/README.md ================================================ # print The print function allows you to send output to the terminal - [print](https://docs.python.org/3/library/functions.html#print) Strings can be enclosed in single quotes or double quotes - "this is a string" - 'this is also a string' The input function allows you to prompt a user for a value - [input](https://docs.python.org/3/library/functions.html#input) Parameters: - `prompt`: Message to display to the user Return value: - string value containing value entered by user ================================================ FILE: python-for-beginners/02 - Print/ask_for_input.py ================================================ # The input function allows you to prompt the user for a value # You need to declare a
variable to hold the value entered by the user name = input('What is your name? ') print(name) ================================================ FILE: python-for-beginners/02 - Print/coding_challenge.py ================================================ # Here's a challenge for you to help you practice # See if you can fix the code below # print the message print('Why won't this line of code print') # print the message prnit('This line fails too!') # print the message print "I think I know how to fix this one" # print the name entered by the user input('Please tell me your name: ') print(name) ================================================ FILE: python-for-beginners/02 - Print/coding_challenge_solution.py ================================================ # Here's a challenge for you to help you practice # See if you can fix the code below # print the message # There was a single quote inside the string! # Use double quotes to enclose the string print("Why won't this line of code print") # print the message # There was a mistake in the function name print('This line fails too!') # print the message # Need to add the () around the string print ("I think I know how to fix this one") # print the name entered by the user # You need to store the value returned by the input statement # in a variable name = input('Please tell me your name: ') print(name) ================================================ FILE: python-for-beginners/02 - Print/hello_world.py ================================================ # the print statement displays a message print('Hello world') ================================================ FILE: python-for-beginners/02 - Print/print_blank_line.py ================================================ # Each print statement starts on a new line print('Hello world') # If you pass nothing to the print statement you get a blank line print() print('Did you see that blank line?') # '\n' is a special character sequence that means print new line # you can use it to
break the output over multiple lines print('Blank line \nin the middle of string') ================================================ FILE: python-for-beginners/02 - Print/single_or_double_quotes.py ================================================ # Strings can be enclosed in single quotes print('Hello world single quotes') # Strings can also be enclosed in double quotes print("Hello world double quotes") ================================================ FILE: python-for-beginners/03 - Comments/README.md ================================================ # Comments Comments start with a hash character (#) and allow you to document your code. Comments are ignored when code is executed. - [Comments](https://docs.python.org/3/reference/lexical_analysis.html?highlight=comment) ================================================ FILE: python-for-beginners/03 - Comments/comments_are_not_executed.py ================================================ # This is a comment in my code it does nothing # print('Hello world') # print("Hello world") # No output will be displayed! 
================================================ FILE: python-for-beginners/03 - Comments/comments_for_debugging.py ================================================ print('Hello world') print('It's a small world after all') ================================================ FILE: python-for-beginners/03 - Comments/enable_pin.py ================================================ #The enable_pin method is not coded yet # I have created a dummy method so the code # will run without an error # Don't panic if you don't understand this part of the code # we cover methods in a separate module def enable_pin(user, pin): print('pin enabled') # Set current_user and pin to test values current_user = 'TEST123' pin = '123456' # Enable PIN check as listed in # security requirements enable_pin(current_user, pin) ================================================ FILE: python-for-beginners/03 - Comments/string_in_double_quotes.py ================================================ # Using double quotes for this string because # the string itself contains a single quote print("It's a small world after all") ================================================ FILE: python-for-beginners/04 - String variables/README.md ================================================ # Strings Python can store and manipulate strings. Strings can be enclosed in single or double quotes. 
There are a number of string methods you can use to manipulate and work with strings - [strings](https://docs.python.org/3/tutorial/introduction.html#strings) - [string methods](https://docs.python.org/3/library/stdtypes.html#string-methods) Converting to string values - [str](https://docs.python.org/3/library/functions.html#func-str) When naming variables follow the PEP-8 Style Guide for Python Code - [PEP-8 Style Guide](https://www.python.org/dev/peps/pep-0008/#naming-conventions) ================================================ FILE: python-for-beginners/04 - String variables/code_challenge.py ================================================ # ask a user to enter their first name and store it in a variable # ask a user to enter their last name and store it in a variable # print their full name # Make sure you have a space between first and last name # Make sure the first letter of first name and last name is uppercase # Make sure the rest of the name is lowercase ================================================ FILE: python-for-beginners/04 - String variables/code_challenge_solution.py ================================================ # ask a user to enter their first name and store it in a variable first_name = input('What is your first name? ') # ask a user to enter their last name and store it in a variable last_name = input('What is your last name? 
') # print their full name # Make sure you have a space between first and last name # Make sure the first letter of first name and last name is uppercase # Make sure the rest of the name is lowercase print(first_name.capitalize() + ' ' + last_name.capitalize()) ================================================ FILE: python-for-beginners/04 - String variables/combine_strings.py ================================================ # You can use the + operator to concatenate strings first_name = 'Susan' last_name = 'Ibach' print(first_name + last_name) # If you want a space between the strings you must include the space # within the string print('Hello ' + first_name + ' ' + last_name) ================================================ FILE: python-for-beginners/04 - String variables/format_strings.py ================================================ # Ask the user for their first and last name first_name = input('What is your first name? ') last_name = input('What is your last name? ') # the capitalize function will return the string with # the first letter uppercase and the rest of the word lowercase print ('Hello ' + first_name.capitalize() + ' ' \ + last_name.capitalize()) ================================================ FILE: python-for-beginners/04 - String variables/string_functions.py ================================================ # There are a number of string functions you can use # on string variables sentence = 'The dog is named Sammy' # upper will return the string in uppercase letters print(sentence.upper()) # lower will return the string in lowercase letters print(sentence.lower()) # capitalize will return the string with the first letter uppercase # and the rest of the string in lowercase print(sentence.capitalize()) # count will count the number of occurrences of the value specified # in the string, in this case how many times the letter 'a' appears print(sentence.count('a')) ================================================ FILE: python-for-beginners/04 - 
String variables/strings_in_variables.py ================================================ # You can store strings in variables first_name = 'Susan' # The variable can then be used later in your code print(first_name) ================================================ FILE: python-for-beginners/05 - Numeric variables/README.md ================================================ # Numeric values Python can store and manipulate numbers. Python has two types of numeric values: integers (whole numbers) or float (numbers with decimal places) - [numeric types](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex) When naming variables follow the PEP-8 Style Guide for Python Code - [PEP-8 Style Guide](https://www.python.org/dev/peps/pep-0008/#naming-conventions) Converting to numeric values - [int](https://docs.python.org/3/library/functions.html#int) - [float](https://docs.python.org/3/library/functions.html#float) ================================================ FILE: python-for-beginners/05 - Numeric variables/code_challenge.py ================================================ # Ask a user to enter a number # Ask a user to enter a second number # Calculate the total of the two numbers added together # Print 'first number + second number = answer' # For example if someone enters 4 and 6 the output should read # 4 + 6 = 10 ================================================ FILE: python-for-beginners/05 - Numeric variables/code_challenge_solution.py ================================================ # Ask a user to enter a number first_number = input('Enter a number: ') # Ask a user to enter a second number second_number = input('Enter another number: ') # Calculate the total of the two numbers added together answer = float(first_number) + float(second_number) # Print 'first number + second number = answer' # For example if someone enters 4 and 6 the output should read # 4 + 6 = 10 print(first_number + ' + ' + second_number + ' = ' + str(answer)) # If you 
do not want the decimal places you could round the answer print(first_number + ' + ' + second_number + ' = ' + str(round(answer))) ================================================ FILE: python-for-beginners/05 - Numeric variables/combining_strings_and_numbers.py ================================================ days_in_feb = 28 # The print function can accept numbers or strings print(days_in_feb) # The + operator can either add two numbers or it can concatenate two strings # it does not know what to do when you pass it one number and one string # This line of code will cause an error print(days_in_feb + ' days in February') # You need to convert the number to a string to display the value # This line of code will work print(str(days_in_feb) + ' days in February') ================================================ FILE: python-for-beginners/05 - Numeric variables/convert_strings_to_numbers_for_math.py ================================================ first_num = input('Enter first number ') second_num = input('Enter second number ') # If you have a string variable containing a number # And you want to treat it as a number # You must convert it to a numeric datatype # int() converts a string to an integer e.g. 5, 8, 416, 506 print(int(first_num) + int(second_num)) # float() converts a string to a decimal or float number e.g. 
3.14159, 89.5, 1.0 print(float(first_num) + float(second_num)) ================================================ FILE: python-for-beginners/05 - Numeric variables/doing_math.py ================================================ # Because the variables are assigned numeric values when created # Python knows they are numeric variables first_num = 6 second_num = 2 # You can perform a variety of math operations on numeric values print('addition') print(first_num + second_num) print('subtraction') print(first_num - second_num) print('multiplication') print(first_num * second_num) print('division') print(first_num / second_num) print ('exponent') print(first_num ** second_num) ================================================ FILE: python-for-beginners/05 - Numeric variables/numbers_treated_as_strings.py ================================================ # Python has to guess what datatype a variable should be # since the input function returns a string, the variables it populates # will hold string values first_num = input('Enter first number ') second_num = input('Enter second number ') # Because first_num and second_num are string variables the + operator # concatenates them just like concatenating first_name and last_name print(first_num + second_num) ================================================ FILE: python-for-beginners/05 - Numeric variables/print_pi.py ================================================ # You can use variables to store numeric values pi = 3.14159 print(pi) ================================================ FILE: python-for-beginners/06 - Dates/README.md ================================================ # Date values The [datetime module](https://docs.python.org/3/library/datetime.html) contains a number of classes for manipulating dates and times.
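
As a rough illustration of how these classes fit together, the sketch below builds a `datetime` from a `date` and a `time`, then shifts it with a `timedelta`. The variable names and the sample values are assumptions chosen only for this example; they are not part of the course files.

```python
from datetime import date, time, datetime, timedelta

# The values below are arbitrary sample data
launch_day = date(2020, 2, 29)
launch_time = time(9, 30, 0)

# combine() builds a datetime from a date and a time
launch = datetime.combine(launch_day, launch_time)

# Adding a timedelta to a datetime gives a new datetime
one_week_later = launch + timedelta(weeks=1)

# Convert to strings before concatenating
print('Launch: ' + str(launch))
print('One week later: ' + str(one_week_later))
```

Subtracting one `datetime` from another goes the other way and returns a `timedelta`.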
Date and time types: - `date` stores year, month, and day - `time` stores hour, minute, and second - `datetime` stores year, month, day, hour, minute, and second - `timedelta` a duration of time between two dates, times, or datetimes When naming variables follow the PEP-8 Style Guide for Python Code - [PEP-8 Style Guide](https://www.python.org/dev/peps/pep-0008/#naming-conventions) Converting from string to datetime - [strptime](https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior) ================================================ FILE: python-for-beginners/06 - Dates/code_challenge.py ================================================ # print today's date # print yesterday's date # ask a user to enter a date # print the date one week from the date entered ================================================ FILE: python-for-beginners/06 - Dates/code_challenge_solution.py ================================================ from datetime import datetime, timedelta # print today's date current_date = datetime.now() print(current_date) # print yesterday's date one_day = timedelta(days=1) yesterday = current_date - one_day print('Yesterday was: ' + str(yesterday)) # ask a user to enter a date date_entered = input('Please enter a date (dd/mm/yyyy): ') date_entered = datetime.strptime(date_entered, '%d/%m/%Y') # print the date one week from the date entered one_week = timedelta(weeks=1) one_week_later = date_entered + one_week print('One week later it will be: ' + str(one_week_later)) ================================================ FILE: python-for-beginners/06 - Dates/date_functions.py ================================================ #To get current date and time we need to use the datetime library from datetime import datetime, timedelta # The now function returns current date and time today = datetime.now() print('Today is: ' + str(today)) #You can use timedelta to add or remove days, or weeks to a date one_day = timedelta(days=1) yesterday = today - 
one_day print('Yesterday was: ' + str(yesterday)) one_week = timedelta(weeks=1) last_week = today - one_week print('Last week was: ' + str(last_week)) ================================================ FILE: python-for-beginners/06 - Dates/format_date.py ================================================ #To get current date and time we need to use the datetime library from datetime import datetime # The now function returns current date and time today = datetime.now() # use day, month, year, hour, minute, second functions # to display only part of the date # All these functions return integers # Convert them to strings before concatenating them to another string print('Day: ' + str(today.day)) print('Month: ' + str(today.month)) print('Year: ' + str(today.year)) print('Hour: ' + str(today.hour)) print('Minute: ' + str(today.minute)) print('Second: ' + str(today.second)) ================================================ FILE: python-for-beginners/06 - Dates/get_current_date.py ================================================ #To get current date and time we need to use the datetime library from datetime import datetime current_date = datetime.now() # The now function returns current date and time as a datetime object # You must convert the datetime object to a string # before you can concatenate it to another string print('Today is: ' + str(current_date)) ================================================ FILE: python-for-beginners/06 - Dates/input_date.py ================================================ # import the datetime and timedelta modules from datetime import datetime, timedelta # When you ask a user for a date tell them the desired date format birthday = input('When is your birthday (dd/mm/yyyy)? 
') # When you convert the string containing the date into a date object # you must specify the expected date format # if the date is not in the expected format Python will raise an exception birthday_date = datetime.strptime(birthday, '%d/%m/%Y') print ('Birthday: ' + str(birthday_date)) # Because we converted the string into a date object # We can use date and time functions such as timedelta with the object one_day = timedelta(days=1) birthday_eve = birthday_date - one_day print('Day before birthday: ' + str(birthday_eve)) ================================================ FILE: python-for-beginners/07 - Error handling/README.md ================================================ # Error handling Error handling in Python is managed through the use of [try/except/finally](https://docs.python.org/3.7/reference/compound_stmts.html#except) Python has numerous [built-in exceptions](https://docs.python.org/3.7/library/exceptions.html). When creating `except` blocks, they need to be created from most specific to most generic according to the [hierarchy](https://docs.python.org/3.7/library/exceptions.html#exception-hierarchy). 
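
As a minimal sketch of the ordering rule above, the hypothetical `safe_divide` helper below (an assumption for illustration, not part of the course files) lists `ZeroDivisionError` before its parent class `ArithmeticError`, with the generic `Exception` last. If the generic block came first, it would catch everything and the more specific blocks would never run.

```python
def safe_divide(x, y):
    # safe_divide is a hypothetical helper used only for illustration
    try:
        return x / y
    except ZeroDivisionError:
        # Most specific: raised when dividing by zero
        print('Cannot divide by zero')
    except ArithmeticError:
        # Parent class of ZeroDivisionError and OverflowError
        print('An arithmetic error occurred')
    except Exception:
        # Most generic: anything else that went wrong
        print('Something unexpected went wrong')
    finally:
        print('This always runs on success or failure')
    return None


result = safe_divide(42, 0)
```

Because the `finally` block runs in every case, its message is printed both when the division succeeds and when one of the `except` blocks handles an error.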
================================================ FILE: python-for-beginners/07 - Error handling/logic.py ================================================ x = 206 y = 42 if x < y: print(str(x) + ' is greater than ' + str(y)) ================================================ FILE: python-for-beginners/07 - Error handling/runtime.py ================================================ x = 42 y = 0 try: print(x / y) except ZeroDivisionError as e: # Optionally, log e somewhere print('Sorry, something went wrong') except: print('Something really went wrong') finally: print('This always runs on success or failure') ================================================ FILE: python-for-beginners/07 - Error handling/syntax.py ================================================ x = 42 y = 206 if x == y print('Success') ================================================ FILE: python-for-beginners/08 - Handling conditions/README.md ================================================ # Handling conditions Conditional execution can be completed using the [if](https://docs.python.org/3/reference/compound_stmts.html#the-if-statement) statement `if` syntax ```python if expression: # code to execute else: # code to execute ``` [Comparison operators](https://docs.python.org/3/library/stdtypes.html#comparisons) - < less than - \> greater than - == is equal to - \>= greater than or equal to - <= less than or equal to - != not equal to ================================================ FILE: python-for-beginners/08 - Handling conditions/add_else.py ================================================ price = input('how much did you pay? 
') price = float(price) if price >= 1.00: # Anything that costs $1.00 or more is charged 7% tax # All statements indented are only executed if price is >= 1 tax = .07 print('Tax rate is: ' + str(tax)) else: # Anything else we do not charge any tax # All statements indented are only executed if price is NOT >= 1 tax = 0 print('Tax rate is: ' + str(tax)) ================================================ FILE: python-for-beginners/08 - Handling conditions/add_else_different_indentation.py ================================================ price = 5.0 if price >= 1.00: tax = .07 else: tax = 0 # the print statement below is not indented so is executed after the if # statement is evaluated print(tax) ================================================ FILE: python-for-beginners/08 - Handling conditions/case_insensitive_comparisons.py ================================================ country = 'CANADA' # by converting the string entered to lowercase and comparing it to a string # that is all lowercase letters I make the comparison case-insensitive # If someone types in CANADA or Canada it will still be a match if country.lower() == 'canada': print('Hello eh') else: print('Hello') ================================================ FILE: python-for-beginners/08 - Handling conditions/check_tax.py ================================================ # Calculate the tax # Anything purchased for $1.00 or more is charged a 7% tax price = input('how much did you pay? 
')

# Convert the string to a number
price = float(price)

# Check if the price is greater than or equal to 1.00
if price >= 1.00:
    # Everything costing $1.00 or more is charged 7% tax
    tax = .07
    print('Tax rate is: ' + str(tax))

================================================
FILE: python-for-beginners/08 - Handling conditions/code_challenge.py
================================================
# Fix the mistakes in this code and test based on the description below
# If I enter 2.00 I should see the message "Tax rate is: 0.07"
# If I enter 1.00 I should see the message "Tax rate is: 0.07"
# If I enter 0.50 I should see the message "Tax rate is: 0"

price = input('how much did you pay? ')

if price > 1.00:
    tax = .07
    print('Tax rate is: ' + str(tax))
else
    tax = 0
    print('Tax rate is: ' + str(tax))

================================================
FILE: python-for-beginners/08 - Handling conditions/code_challenge_solution.py
================================================
# Fix the mistakes in this code and test using the following
# If I enter 2.00 I should see the message "Tax rate is: 0.07"
# If I enter 1.00 I should see the message "Tax rate is: 0.07"
# If I enter 0.50 I should see the message "Tax rate is: 0"

price = input('how much did you pay? 
')
price = float(price)

if price >= 1.00:
    tax = .07
    print('Tax rate is: ' + str(tax))
else:
    tax = 0
    print('Tax rate is: ' + str(tax))

================================================
FILE: python-for-beginners/08 - Handling conditions/comparing_strings.py
================================================
country = input('Enter the name of your home country: ')
if country == 'canada':
    # string comparisons are case sensitive
    # if you typed in CANADA or Canada it will not match
    print('So you must like hockey!')
else:
    print('You are not from Canada')

================================================
FILE: python-for-beginners/09 - Handling multiple conditions/README.md
================================================
# Handling multiple conditions

Conditional execution can be completed using the [if](https://docs.python.org/3/reference/compound_stmts.html#the-if-statement) statement. Adding `elif` allows you to check multiple conditions.

`if` syntax

```python
if expression:
    # code to execute
elif expression:
    # code to execute
else:
    # code to execute
```

[Boolean operators](https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not)
- **x *or* y** - If either x OR y is true, the expression is true

[Comparison operators](https://docs.python.org/3/library/stdtypes.html#comparisons)
- < less than
- > greater than
- == is equal to
- \>= greater than or equal to
- <= less than or equal to
- != not equal to
- **x *in* [a,b,c]** Does x match the value of a, b, or c

================================================
FILE: python-for-beginners/09 - Handling multiple conditions/add_else_to_elif.py
================================================
province = input("What province do you live in? ")
tax = 0

if province == 'Alberta':
    tax = 0.05
elif province == 'Nunavut':
    tax = 0.05
elif province == 'Ontario':
    tax = 0.13
else:
    tax = 0.15
print(tax)

================================================
FILE: python-for-beginners/09 - Handling multiple conditions/code_challenge.py
================================================
# Ask a user their name
# If their first name starts with A or B
# tell them they go to room AB
# If their first name starts with C
# tell them to go to room C
# If their first name starts with another letter, ask for their last name
# If their last name starts with Z, tell them to go to room Z
# if their last name starts with any other letter, tell them to go to room OTHER

# When you are done
# Anna should be in room AB
# Bob should be in room AB
# Charlie should be in room C
# Khalid Haque should be in room OTHER
# Xin Zhao should be in room Z

================================================
FILE: python-for-beginners/09 - Handling multiple conditions/code_challenge_solution.py
================================================
# Assign people to different rooms when they check in based on their names
# When you are done
# Anna should be in room AB
# Bob should be in room AB
# Charlie should be in room C
# Khalid Haque should be in room OTHER
# Xin Zhao should be in room Z

# Ask a user their first name
name = input('What is your name? ')

# If their first name starts with A or B
# tell them they go to room AB
first_letter = name[0:1]
if first_letter.upper() in ('A','B'):
    room = 'AB'
# If their first name starts with C
# tell them to go to room C
elif first_letter.upper() == 'C':
    room = 'C'
else:
    # If their first name starts with another letter, ask for their last name
    # If their last name starts with Z, tell them to go to room Z
    last_name = input('what is your last name? 
')
    last_name_first_letter = last_name[0:1]
    # if their last name starts with any other letter, tell them to go to room OTHER
    # upper() keeps this check case-insensitive, like the first-name checks
    if last_name_first_letter.upper() == 'Z':
        room = 'Z'
    else:
        room = 'OTHER'
print('Please go to room ' + room)

================================================
FILE: python-for-beginners/09 - Handling multiple conditions/multiple_if_statements.py
================================================
province = input("What province do you live in? ")
tax = 0

if province == 'Alberta':
    tax = 0.05
if province == 'Nunavut':
    tax = 0.05
if province == 'Ontario':
    tax = 0.13
print(tax)

================================================
FILE: python-for-beginners/09 - Handling multiple conditions/nested_if.py
================================================
country = input("What country do you live in? ")

if country.lower() == 'canada':
    province = input("What province/state do you live in? ")
    if province in ('Alberta', 'Nunavut', 'Yukon'):
        tax = 0.05
    elif province == 'Ontario':
        tax = 0.13
    else:
        tax = 0.15
else:
    tax = 0.0
print(tax)

================================================
FILE: python-for-beginners/09 - Handling multiple conditions/or_statements.py
================================================
province = input("What province do you live in? ")
tax = 0

if province == 'Alberta' \
    or province == 'Nunavut':
    tax = 0.05
elif province == 'Ontario':
    tax = 0.13
else:
    tax = 0.15
print(tax)

================================================
FILE: python-for-beginners/09 - Handling multiple conditions/use_elif.py
================================================
province = input("What province do you live in? ")
tax = 0

if province == 'Alberta':
    tax = 0.05
elif province == 'Nunavut':
    tax = 0.05
elif province == 'Ontario':
    tax = 0.13
print(tax)

================================================
FILE: python-for-beginners/09 - Handling multiple conditions/use_in_statements.py
================================================
province = input("What province do you live in? ")
tax = 0

# If multiple values cause the same output you can combine them by listing all
# values you want to check for with the in operator
if province in ('Alberta', 'Nunavut', 'Yukon'):
    tax = 0.05
elif province == 'Ontario':
    tax = 0.13
else:
    tax = 0.15
print(tax)

================================================
FILE: python-for-beginners/10 - Complex conditon checks/boolean_variables.py
================================================
# I check to see if the requirements for honour roll are met
gpa = float(input('What was your Grade Point Average? '))
lowest_grade = float(input('What was your lowest grade? '))

# Boolean variables allow you to remember a True/False value
if gpa >= .85 and lowest_grade >= .70:
    honour_roll = True
else:
    honour_roll = False

# Somewhere later in your code, if you need to check whether a student is
# on honour roll, all you need to do is check the boolean variable
# set earlier in the code
if honour_roll:
    print('You made honour roll')

================================================
FILE: python-for-beginners/10 - Complex conditon checks/code_challenge.py
================================================
# When you join a hockey team you get your name on the back of the jersey
# but the jersey may not be big enough to hold all the letters
# Ask the user for their first name
# Ask the user for their last name
# if first name is < 10 characters and last name is < 10 characters
# print first and last name on the jersey
# if first name >= 10 characters long and last name is < 10 characters
# print first initial of first name and the entire last name
# if first name < 10 characters long and last name is >= 10 characters
# print entire first name and first initial of last name
# if first name >= 10 characters long and last name is >= 10 characters
# print last name only

# Test with the following values
# first name: Susan last name: Ibach
# output: Susan Ibach
# first name: Susan last name: ReallyLongLastName
# output: Susan R.
# first name: ReallyLongFirstName last name: Ibach
# output: R. Ibach
# first name: ReallyLongFirstName last name: ReallyLongLastName
# output: ReallyLongLastName

================================================
FILE: python-for-beginners/10 - Complex conditon checks/code_challenge_solution.py
================================================
# When you join a hockey team you get your name on the back of the jersey
# but the jersey may not be big enough to hold all the letters
# Ask the user for their first name
first_name = input('Please enter your first name: ')
# Ask the user for their last name
last_name = input('Please enter your last name: ')

# if first name is < 10 characters and last name is < 10 characters
# print first and last name on the jersey
# if first name >= 10 characters long and last name is < 10 characters
# print first initial of first name and the entire last name
# if first name < 10 characters long and last name is >= 10 characters
# print entire first name and first initial of last name
# if first name >= 10 characters long and last name is >= 10 characters
# print last name only

# Check length of first name
if len(first_name) >= 10:
    long_first_name = True
else:
    long_first_name = False

# Check length of last name
if len(last_name) >= 10:
    long_last_name = True
else:
    long_last_name = False

# Evaluate possible jersey print combinations for different lengths
if long_first_name and long_last_name:
    print(last_name)
elif long_first_name:
    print(first_name[0:1] + '. ' + last_name)
elif long_last_name:
    print(first_name + ' ' + last_name[0:1] + '.')
else:
    print(first_name + ' ' + last_name)

================================================
FILE: python-for-beginners/10 - Complex conditon checks/readme.md
================================================
# Complex condition checks

Conditional execution can be completed using the [if](https://docs.python.org/3/reference/compound_stmts.html#the-if-statement) statement.
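
As a quick illustration of a complex condition check, here is a minimal sketch that combines two conditions in a single `if`. The variable names and values (`age`, `has_ticket`, `is_weekend`, `is_holiday`) are made up for this example and do not come from the course files.

```python
# Hypothetical values, chosen only for illustration
age = 15
has_ticket = True

# and: both conditions must be true
if age >= 13 and has_ticket:
    admitted = True
else:
    admitted = False

# or: at least one condition must be true
is_weekend = False
is_holiday = True
if is_weekend or is_holiday:
    day_type = 'day off'
else:
    day_type = 'work day'

print(admitted)
print(day_type)
```
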
`if` syntax

```python
if expression:
    # code to execute
elif expression:
    # code to execute
else:
    # code to execute
```

[Boolean values](https://docs.python.org/3/library/stdtypes.html#boolean-values) can be either `False` or `True`

[Boolean operators](https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not)
- **x *or* y** - If either x **OR** y is true, the expression is true
- **x *and* y** - If x **AND** y are both true, the expression is true

[Comparison operators](https://docs.python.org/3/library/stdtypes.html#comparisons)
- < less than
- > greater than
- == is equal to
- \>= greater than or equal to
- <= less than or equal to
- != not equal to
- **x *in* [a,b,c]** Does x match the value of a, b, or c

================================================
FILE: python-for-beginners/10 - Complex conditon checks/using_and.py
================================================
# A student makes honour roll if their average is >= 85%
# and their lowest grade is not below 70%
# The checks below compare against .85 and .70, so enter grades as decimals
gpa = float(input('What was your Grade Point Average? '))
lowest_grade = input('What was your lowest grade? ')
lowest_grade = float(lowest_grade)

if gpa >= .85 and lowest_grade >= .70:
    print('You made the honour roll')

================================================
FILE: python-for-beginners/11 - Collections/README.md
================================================
# Collections

Collections are groups of items. Python supports several types of collections. Three of the most common are dictionaries, lists and arrays.

## Lists

[Lists](https://docs.python.org/3/tutorial/introduction.html#lists) are a collection of items. Lists can be expanded or contracted as needed, and can contain any data type.
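
A minimal sketch of those two properties, mixed data types and resizing. The values are invented for this example.

```python
# A list can hold items of different data types
items = ['Christopher', 42, 3.5]

# Expand the list by adding an item to the end
items.append('Susan')

# Contract the list by removing an item by value
items.remove(42)

print(items)
print(len(items))
```
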
Lists are most commonly used to store a single column collection of information, however it is possible to [nest lists](https://docs.python.org/3/tutorial/datastructures.html#nested-list-comprehensions).

## Arrays

[Arrays](https://docs.python.org/3/library/array.html) are similar to lists, however are designed to store a uniform basic data type, such as integers or floating point numbers.

## Dictionaries

[Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) are collections of key/value pairs. Unlike a list, where items can only be accessed by their index or value, dictionaries use keys to identify each item.

================================================
FILE: python-for-beginners/11 - Collections/arrays.py
================================================
from array import array
scores = array('d')
scores.append(97)
scores.append(98)
print(scores)

================================================
FILE: python-for-beginners/11 - Collections/common-operations.py
================================================
names = ['Christopher', 'Susan']
print(len(names)) # Get the number of items
names.insert(0, 'Bill') # Insert before index
print(names)

================================================
FILE: python-for-beginners/11 - Collections/dictionaries.py
================================================
person = {'first': 'Christopher'}
person['last'] = 'Harrison'
print(person)
print(person['first'])

================================================
FILE: python-for-beginners/11 - Collections/lists.py
================================================
names = ['Christopher', 'Susan']
scores = []
scores.append(98)
scores.append(99)
print(names)
print(scores)

================================================
FILE: python-for-beginners/11 - Collections/ranges.py
================================================
names = ['Susan', 'Christopher', 'Bill']
presenters = names[0:2] # Get the first two items
# Starting index and ending index (the item at the ending index is not included)
print(names)
print(presenters)

================================================
FILE: python-for-beginners/12 - Loops/README.md
================================================
# Loops

## For loops

[For loops](https://docs.python.org/3/reference/compound_stmts.html#the-for-statement) take each item in an array or collection in order, and assign it to the variable you define.

``` python
names = ['Christopher', 'Susan']
for name in names:
    print(name)
```

## While loops

[While loops](https://docs.python.org/3/reference/compound_stmts.html#the-while-statement) perform an operation as long as a condition is true.

``` python
names = ['Christopher', 'Susan']
index = 0
while index < len(names):
    name = names[index]
    print(name)
    index = index + 1
```

================================================
FILE: python-for-beginners/12 - Loops/for.py
================================================
for name in ['Christopher', 'Susan']:
    print(name)

================================================
FILE: python-for-beginners/12 - Loops/number.py
================================================
# range creates a sequence of numbers
# First parameter is the starting value
# Second parameter is the stopping value, which is not included
# range(0, 2) produces 0, 1
for index in range(0, 2):
    print(index)

================================================
FILE: python-for-beginners/12 - Loops/while.py
================================================
names = ['Christopher', 'Susan']
index = 0
while index < len(names):
    print(names[index])
    # Change the condition!!
    index = index + 1

================================================
FILE: python-for-beginners/13 - Functions/README.md
================================================
# Functions

Functions allow you to take code that is repeated and move it to a named, reusable block that can be called when needed. Functions are defined with the `def` keyword and must be declared before the function is called in your code. Functions can accept parameters and return values.
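
A minimal sketch of those points: the function is defined first, accepts a parameter, returns a value, and is then called more than once. The name `format_greeting` is hypothetical and is not one of the course files.

```python
# Hypothetical helper for illustration only
def format_greeting(name):
    # Build and return a greeting for the given name
    return 'Hello, ' + name

# The function is defined above, so it can be called here
message = format_greeting('Susan')
print(message)

# The same function can be reused with a different argument
print(format_greeting('Christopher'))
```
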
- [Functions](https://docs.python.org/3/tutorial/controlflow.html#defining-functions)

```python
def functionname(parameter):
    # code to execute
    return value
```

================================================
FILE: python-for-beginners/13 - Functions/code_challenge.py
================================================
# Create a calculator function
# The function should accept three parameters:
# first_number: a numeric value for the math operation
# second_number: a numeric value for the math operation
# operation: the word 'add' or 'subtract'
# the function should return the result of the two numbers added or subtracted
# based on the value passed in for the operation
#
# Test your function with the values 6,4, add
# Should return 10
#
# Test your function with the values 6,4, subtract
# Should return 2
#
# BONUS: Test your function with the values 6, 4 and divide
# Have your function return an error message when invalid values are received

================================================
FILE: python-for-beginners/13 - Functions/code_challenge_solution.py
================================================
# Create a calculator function
# The function should accept three parameters:
# first_number: a numeric value for the math operation
# second_number: a numeric value for the math operation
# operation: the word 'add' or 'subtract'
# the function should return the result of the two numbers added or subtracted
# based on the value passed in for the operation
#
def calculator(first_number, second_number, operation):
    if operation.upper() == 'ADD':
        return(float(first_number) + float(second_number))
    elif operation.upper() == 'SUBTRACT':
        return(float(first_number) - float(second_number))
    else:
        return('Invalid operation please specify ADD or SUBTRACT')

# Test your function with the values 6,4, add
# Should return 10
#
print('Adding 6 + 4 = ' + str(calculator(6,4,'add')))

# Test your function with the values 6,4, subtract
# Should return 2
print('Subtracting 6 - 4 = ' + 
str(calculator(6,4,'subtract')))

# Test your function with the values 6,4, divide
# Should return some sort of error message
print('Dividing 6 / 4 = ' + str(calculator(6,4,'divide')))

================================================
FILE: python-for-beginners/13 - Functions/get_initails_function.py
================================================
# Create function get_initial to accept a name and
# return the first letter of the name in uppercase
# Parameters:
# name: the name of a person
# Return value:
# first letter of name passed in as a parameter in uppercase
def get_initial(name):
    initial = name[0:1].upper()
    return initial

# This program will ask for someone's name and return the initials
first_name = input('Enter your first name: ')
# Call get_initial to retrieve initial of name
first_name_initial = get_initial(first_name)

middle_name = input('Enter your middle name: ')
# Call get_initial to retrieve initial of name
middle_name_initial = get_initial(middle_name)

last_name = input('Enter your last name: ')
# Call get_initial to retrieve initial of name
last_name_initial = get_initial(last_name)

print('Your initials are: ' + first_name_initial \
    + middle_name_initial + last_name_initial)

================================================
FILE: python-for-beginners/13 - Functions/get_initials.py
================================================
# Ask for a name and return the initials
first_name = input('Enter your first name: ')
first_name_initial = first_name[0:1]

middle_name = input('Enter your middle name: ')
middle_name_initial = middle_name[0:1]

last_name = input('Enter your last name: ')
last_name_initial = last_name[0:1]

print('Your initials are: ' + first_name_initial \
    + middle_name_initial + last_name_initial)

================================================
FILE: python-for-beginners/13 - Functions/getting_clever_with_functions_harder_to_read.py
================================================
# Create function get_initial to accept a name and
# return 
# the first letter of the name in uppercase
# Parameters:
# name: the name of a person
# Return value:
# first letter of name passed in as a parameter in uppercase
def get_initial(name):
    initial = name[0:1].upper()
    return initial

# Ask for someone's name and return the initials
first_name = input('Enter your first name: ')
middle_name = input('Enter your middle name: ')
last_name = input('Enter your last name: ')

# Call get_initial function to return first letter of a name
print('Your initials are: ' \
    + get_initial(first_name) \
    + get_initial(middle_name) \
    + get_initial(last_name))

================================================
FILE: python-for-beginners/13 - Functions/print_time_function.py
================================================
import datetime
# Create a function called print_time
# This function will print the message and current time
def print_time():
    print('task completed')
    print(datetime.datetime.now())
    print()

first_name = 'Susan'
# Call print_time() function to display message and current time
print_time()

for x in range(0,10):
    print(x)
# Call print_time() function to display message and current time
print_time()

================================================
FILE: python-for-beginners/13 - Functions/print_time_function_different_messages.py
================================================
from datetime import datetime
# What if we want different messages displayed?
# Can we still use a function?
first_name = 'Susan'
print('first name assigned')
print(datetime.now())
print()

for x in range(0,10):
    print(x)
print('loop completed')
print(datetime.now())
print()

================================================
FILE: python-for-beginners/13 - Functions/print_time_function_fix_import.py
================================================
# Import datetime class from datetime module to simplify calls to datetime.now()
from datetime import datetime
# Create a function called print_time
# This function will print the message and current time
def print_time():
    print('task completed')
    print(datetime.now())
    print()

first_name = 'Susan'
# Call print_time() function to display message and current time
print_time()

for x in range(0,10):
    print(x)
# Call print_time() function to display message and current time
print_time()

================================================
FILE: python-for-beginners/13 - Functions/print_time_repeated_code.py
================================================
import datetime
# print timestamps after each section of code
# to see how long sections of code take to run
first_name = 'Susan'
print('task completed')
print(datetime.datetime.now())
print()

for x in range(0,10):
    print(x)
print('task completed')
print(datetime.datetime.now())
print()

================================================
FILE: python-for-beginners/13 - Functions/print_time_with_message_parameter.py
================================================
from datetime import datetime
# Define a function to print the current time and task name
# The function takes the following parameters:
# task_name: Name of the task to display to output screen
def print_time(task_name):
    print(task_name)
    print(datetime.now())
    print()

first_name = 'Susan'
# Call print_time() function to display message and current time
# pass in name of task completed
print_time('first name assigned')

for x in range(0,10):
    print(x)
# Call print_time() function to display message and current time
# pass in name of task completed
print_time('loop completed')

================================================
FILE: python-for-beginners/14 - Function parameters/code_challenge.py
================================================
# Create a calculator function
# The function should accept three parameters:
# first_number: a numeric value for the math operation
# second_number: a numeric value for the math operation
# operation: the word 'add' or 'subtract'. The default operation is 'add'
# the function should return the result of the two numbers added or subtracted
# based on the value passed in for the operation
#
# Test your function using named notation passing in only the numbers 6 and 4
# Should return 10
#
# Test your function using named notation with the values 6,4, subtract
# Should return 2
#
# BONUS: Test your function with the values 6, 4 and divide
# Have your function return an error message when invalid values are received

================================================
FILE: python-for-beginners/14 - Function parameters/code_challenge_solution.py
================================================
# Create a calculator function
# The function should accept three parameters:
# first_number: a numeric value for the math operation
# second_number: a numeric value for the math operation
# operation: the word 'add' or 'subtract'.
# The default operation is 'add'
# the function should return the result of the two numbers added or subtracted
# based on the value passed in for the operation
#
def calculator(first_number, second_number, operation='ADD'):
    if operation.upper() == 'ADD':
        return(float(first_number) + float(second_number))
    elif operation.upper() == 'SUBTRACT':
        return(float(first_number) - float(second_number))
    else:
        return('Invalid operation please specify ADD or SUBTRACT')

# Test your function using named notation passing in only the numbers 6 and 4
# Should return 10
print('Adding 6 + 4 = ' + str(calculator(first_number=6, second_number=4)))

# Test your function using named notation with the values 6,4, subtract
# Should return 2
#
print('Subtracting 6 - 4 = ' + str(calculator(first_number=6, second_number=4, operation='subtract')))

================================================
FILE: python-for-beginners/14 - Function parameters/get_initials_default_values.py
================================================
# Create a function to return the first initial of a name
# Parameters:
# name: name of person
# force_uppercase: indicates if you always want the initial to be in uppercase: default is True
# Return value
# first letter of name passed in
def get_initial(name, force_uppercase=True):
    if force_uppercase:
        initial = name[0:1].upper()
    else:
        initial = name[0:1]
    return initial

# Ask for someone's name and return the initial
first_name = input('Enter your first name: ')

# Call get_initial function to retrieve first letter of name
# not passing a value for force_uppercase so default value is used
first_name_initial = get_initial(first_name)

print('Your initial is: ' + first_name_initial)

================================================
FILE: python-for-beginners/14 - Function parameters/get_initials_function.py
================================================
# Create a function to return the first initial of a name
# Parameters:
# name: 
# name of person
# Return value
# first letter of name passed in
def get_initial(name):
    initial = name[0:1].upper()
    return initial

# Ask for someone's name and return the initials
first_name = input('Enter your first name: ')

# Call get_initial function to retrieve first letter of name
first_name_initial = get_initial(first_name)

print('Your initial is: ' + first_name_initial)

================================================
FILE: python-for-beginners/14 - Function parameters/get_initials_multiple_parameters.py
================================================
# Create a function to return the first initial of a name
# Parameters:
# name: name of person
# force_uppercase: indicates if you always want the initial to be in uppercase
# Return value
# first letter of name passed in
def get_initial(name, force_uppercase):
    if force_uppercase:
        initial = name[0:1].upper()
    else:
        initial = name[0:1]
    return initial

# Ask for someone's name and return the initial
first_name = input('Enter your first name: ')

# Call get_initial function to retrieve first letter of name
# Passing False for force_uppercase returns the initial exactly as it was typed
first_name_initial = get_initial(first_name, False)

print('Your initial is: ' + first_name_initial)

================================================
FILE: python-for-beginners/14 - Function parameters/get_initials_named_parameters.py
================================================
# Create a function to return the first initial of a name
# Parameters:
# name: name of person
# force_uppercase: indicates if you always want the initial to be in uppercase
# Return value
# first letter of name passed in
def get_initial(name, force_uppercase):
    if force_uppercase:
        initial = name[0:1].upper()
    else:
        initial = name[0:1]
    return initial

# Ask for someone's name and return the initial
first_name = input('Enter your first name: ')

# Call get_initial to retrieve first letter of name
# When you use named notation, you can specify parameters in any order
first_name_initial = get_initial(force_uppercase=True, \
    name=first_name)

print('Your initial is: ' + first_name_initial)

================================================
FILE: python-for-beginners/14 - Function parameters/named_parameters_make_code_readable.py
================================================
# Create a function to handle errors that occur during code execution
# This will display a message to the user and may log the error for the support team to
# help with debugging
#
# Parameters:
# error_code: Unique error code assigned to each type of error: e.g. 45 is datatype conversion error
# error_severity: 0 - fatal error should never occur
#                 1 - severe error code cannot continue
#                 2 - warning code can continue but may be missing information in records
# log_to_db: Should this error be logged to the database
# error_message: Error message to display to user and write to database
# source_module: Name of the python module that generated the error
def error_logger(error_code, error_severity, log_to_db, error_message, source_module):
    print('oh no error: ' + error_message)
    # Imagine code here that logs our error to a database or file

first_number = 10
second_number = 5
# This function call by itself is confusing, I have to look at the
# definition of the error_logger function to understand it
if first_number > second_number:
    error_logger(45,1,True,'Second number greater than first','adding_method')

if first_number > second_number:
    # This function call by itself is easier to understand because I can
    # see how the values I pass in map to the function parameters
    error_logger(error_code=45,
                 error_severity=1,
                 log_to_db=True,
                 error_message='Second number greater than first',
                 source_module='adding_method')

================================================
FILE: python-for-beginners/14 - Function parameters/readme.md
================================================
# Function parameters

Functions allow you to take code that is repeated and move it to a named, reusable block that can be called when 
needed. Functions are defined with the `def` keyword and must be declared before the function is called in your code. Functions can accept one or more parameters and return values.

- [Functions](https://docs.python.org/3/tutorial/controlflow.html#defining-functions)

```python
def function_name(parameter):
    # code to execute
    return value
```

Parameters can be assigned a [default value](https://docs.python.org/3/tutorial/controlflow.html#default-argument-values), making them optional when the function is called.

```python
def function_name(parameter=default):
    # code to execute
    return value
```

When you call a function you may specify the values for the parameters using positional or [named notation](https://docs.python.org/3/tutorial/controlflow.html#keyword-arguments).

```python
def function_name(parameter1, parameter2):
    # code to execute
    return value

# Positional notation: pass in arguments in the same order as the parameters are declared
result = function_name(value1,value2)

# Named notation
result = function_name(parameter1=value1, parameter2=value2)
```

================================================
FILE: python-for-beginners/15 - Packages/README.md
================================================
# Packages and modules

## Modules

[Modules](https://docs.python.org/3/tutorial/modules.html) allow you to store reusable blocks of code, such as functions, in separate files. They're referenced by using the `import` statement.

``` python
# import module as namespace
import helpers
helpers.display('Not a warning')

# import all into current namespace
from helpers import *
display('Not a warning')

# import specific items into current namespace
from helpers import display
display('Not a warning')
```

## Packages

[Distribution packages](https://packaging.python.org/glossary/#term-distribution-package) are external archive files which contain resources such as classes and functions. Nearly every application you create will make use of one or more packages.
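
To try importing without installing anything, here is a minimal sketch that uses `json` from the Python standard library. It is chosen only as a stand-in because it ships with Python; it is not a distribution package installed with pip, and the course files themselves use `colorama`.

```python
# json ships with Python, so nothing needs to be installed for this sketch
import json

# Convert a dictionary to a JSON string and back again
person = {'first': 'Christopher', 'last': 'Harrison'}
text = json.dumps(person)
print(text)

restored = json.loads(text)
print(restored['first'])

# Import a specific function into the current namespace
from json import dumps
print(dumps(['Susan', 'Bill']))
```
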
Imports from packages follow the same syntax as modules you've created. The [Python Package Index](https://pypi.org/) contains a full list of packages you can install using [pip](https://pip.pypa.io/en/stable/).

## Virtual environments

[Virtual environments](https://docs.python.org/3.7/tutorial/venv.html) allow you to install packages into an isolated folder. This lets you manage package versions separately for each project.

``` console
```

================================================
FILE: python-for-beginners/15 - Packages/color_import_demo.py
================================================
import colorama
colorama.init()
print(colorama.Fore.RED + 'This is red')

from colorama import *
init()
print(Fore.BLUE + 'This is blue')

from colorama import init, Fore
print(Fore.GREEN + 'This is green')

================================================
FILE: python-for-beginners/15 - Packages/helpers.py
================================================
def display(message, is_warning=False):
    if is_warning:
        print('Warning!!')
    print(message)

================================================
FILE: python-for-beginners/15 - Packages/import_module.py
================================================
# import module as namespace
import helpers
helpers.display('Not a warning')

# import all into current namespace
from helpers import *
display('Not a warning')

# import specific items into current namespace
from helpers import display
display('Not a warning')

================================================
FILE: python-for-beginners/15 - Packages/requirements.txt
================================================
colorama

================================================
FILE: python-for-beginners/16 - Calling APIs/call_api.py
================================================
# This code will show you how to call the Computer Vision API from Python
# You can find documentation on the Computer Vision Analyze Image method here
#
# https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fa

# Use the requests library to simplify making a REST API call from Python
import requests

# We will need the json library to read the data passed back
# by the web service
import json

# You need to update the SUBSCRIPTION_KEY to
# the key for your Computer Vision Service
SUBSCRIPTION_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxx"

# You need to update the vision_service_address to the address of
# your Computer Vision Service
vision_service_address = "https://canadacentral.api.cognitive.microsoft.com/vision/v2.0/"

# Add the name of the function you want to call to the address
address = vision_service_address + "analyze"

# According to the documentation for the analyze image function
# There are three optional parameters: language, details & visualFeatures
parameters = {'visualFeatures':'Description,Color',
              'language':'en'}

# Open the image file to get a file object containing the image to analyze
image_path = "./TestImages/PolarBear.jpg"
image_data = open(image_path, "rb").read()

# According to the documentation for the analyze image function
# we need to specify the subscription key and the content type
# in the HTTP header.
# Content-Type is application/octet-stream when you pass in an image directly
headers = {'Content-Type': 'application/octet-stream',
           'Ocp-Apim-Subscription-Key': SUBSCRIPTION_KEY}

# According to the documentation for the analyze image function
# we use HTTP POST to call this function
response = requests.post(address, headers=headers, params=parameters, data=image_data)

# Raise an exception if the call returns an error code
response.raise_for_status()

# Display the JSON results returned
results = response.json()
print(json.dumps(results))

================================================
FILE: python-for-beginners/16 - Calling APIs/code_challenge.py
================================================
# Challenge #1
# Create an Azure Computer Vision Service
# Analyze an image and return the JSON describing the image.

# call_api.py is a completed solution, but it will not run unless
# you do the following:
#    Create a Computer Vision Service in Azure
#    Update the Key
#    Update the address of your service
#    Update the image file and location

# Bonus Challenge
# Your skills are growing, it's time to start trying to figure things out on your own
# based on documentation. You have already called one function of the Computer Vision Service
# Now try calling another
# Instead of calling the analyze method of the service, try calling the OCR method
# Here is some helpful documentation
# https://docs.microsoft.com/azure/cognitive-services/Computer-vision/concept-recognizing-text#ocr-optical-character-recognition-api
# https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fc
# Use the documentation to figure out
#    The correct function name for the address
#    The parameters you need to pass the function
#    The HTTP Headers required
# Pass in an image containing text
# The JSON returned will contain the string of text in the image
# Good luck!
================================================
FILE: python-for-beginners/16 - Calling APIs/readme.md
================================================
# Calling APIs

You can call functions exposed by other programs hosted on web servers.

[Microsoft Azure Cognitive Services](https://docs.microsoft.com/azure/cognitive-services/?WT.mc_id=python-c9-niner) contain a number of APIs you can call from your code to add intelligence to your apps and websites.

In the code example you call the [Analyze Image](https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fa0) function of the [Computer Vision](https://docs.microsoft.com/azure/cognitive-services/computer-vision/?WT.mc_id=python-c9-niner?WT.mc_id=python-c9-niner) service.

Requirements for calling an API:

- [API Key](https://azure.microsoft.com/try/cognitive-services/?WT.mc_id=python-c9-niner) to give you permission to call the API
- Address or endpoint of the service
- Function name of the method to call, as listed in the [API documentation](https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fa?WT.mc_id=python-c9-niner?WT.mc_id=python-c9-niner)
- Function parameters, as listed in the [API documentation](https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fa?WT.mc_id=python-c9-niner)
- HTTP headers, as listed in the [API documentation](https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fa?WT.mc_id=python-c9-niner)

================================================
FILE: python-for-beginners/17 - JSON/create_json_from_dict.py
================================================
import json

# Create a dictionary object
person_dict = {'first': 'Christopher', 'last':'Harrison'}
# Add additional key pairs to dictionary as needed
person_dict['City']='Seattle'

# Convert dictionary to JSON object
person_json = json.dumps(person_dict)

# Print JSON object
print(person_json)

================================================
FILE: python-for-beginners/17 - JSON/create_json_with_list.py
================================================
import json

# Create a dictionary object
person_dict = {'first': 'Christopher', 'last':'Harrison'}
# Add additional key pairs to dictionary as needed
person_dict['City']='Seattle'

# Create a list object of programming languages
languages_list = ['CSharp','Python','JavaScript']

# Add list object to dictionary for the languages key
person_dict['languages']= languages_list

# Convert dictionary to JSON object
person_json = json.dumps(person_dict)

# Print JSON object
print(person_json)

================================================
FILE: python-for-beginners/17 - JSON/create_json_with_nested_dict.py
================================================
import json

# Create a dictionary object
person_dict = {'first': 'Christopher', 'last':'Harrison'}
# Add additional key pairs to dictionary as needed
person_dict['City']='Seattle'

# create a staff dictionary
# assign a person to a staff position of program manager
staff_dict ={}
staff_dict['Program Manager']=person_dict

# Convert dictionary to JSON object
staff_json = json.dumps(staff_dict)

# Print JSON object
print(staff_json)

================================================
FILE: python-for-beginners/17 - JSON/read_json.py
================================================
# This code will show you how to call the Computer Vision API from Python
# You can find documentation on the Computer Vision Analyze Image method here
# https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fa

# Use the requests library to simplify making a REST API call from Python
import requests

# We will need the json library to read the data passed back
# by the web service
import json

# We need the address of our Computer vision service
vision_service_address = "https://canadacentral.api.cognitive.microsoft.com/vision/v2.0/"

# Add the name of the function we want to call to the address
address = vision_service_address + "analyze"

# According to the documentation for the analyze image function
# There are three optional parameters: language, details & visualFeatures
parameters = {'visualFeatures':'Description,Color',
              'language':'en'}

# We need the key to access our Computer Vision Service
# Replace the placeholder with your own key; never commit a real key to source control
subscription_key = "xxxxxxxxxxxxxxxxxxxxxxx"

# Open the image file to get a file object containing the image to analyze
image_path = "./TestImages/PolarBear.jpg"
image_data = open(image_path, 'rb').read()

# According to the documentation for the analyze image function
# we need to specify the subscription key and the content type
# in the HTTP header. Content-Type is application/octet-stream when you pass in an image directly
headers = {'Content-Type': 'application/octet-stream',
           'Ocp-Apim-Subscription-Key': subscription_key}

# According to the documentation for the analyze image function
# we use HTTP POST to call this function
response = requests.post(address, headers=headers, params=parameters, data=image_data)

# Raise an exception if the call returns an error code
response.raise_for_status()

# Display the JSON results returned
results = response.json()
print(json.dumps(results))

print('requestId')
print(results['requestId'])

print('dominantColorBackground')
print(results['color']['dominantColorBackground'])

print('first_tag')
print(results['description']['tags'][0])

for item in results['description']['tags']:
    print(item)

print('caption text')
print(results['description']['captions'][0]['text'])

================================================
FILE: python-for-beginners/17 - JSON/read_key_pair.py
================================================
# This code will show you how to call the Computer Vision API from Python
# You can find documentation on the Computer Vision Analyze Image method here
#
# https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fa

# Use the requests library to simplify making a REST API call from Python
import requests

# We will need the json library to read the data passed back
# by the web service
import json

# We need the address of our Computer vision service
vision_service_address = "https://canadacentral.api.cognitive.microsoft.com/vision/v2.0/"

# Add the name of the function we want to call to the address
address = vision_service_address + "analyze"

# According to the documentation for the analyze image function
# There are three optional parameters: language, details & visualFeatures
parameters = {'visualFeatures':'Description,Color',
              'language':'en'}

# We need the key to access our Computer Vision Service
subscription_key = "xxxxxxxxxxxxxxxxxxxx"

# Open the image file to get a file object containing the image to analyze
image_path = "./TestImages/PolarBear.jpg"
image_data = open(image_path, 'rb').read()

# According to the documentation for the analyze image function
# we need to specify the subscription key and the content type
# in the HTTP header.
# Content-Type is application/octet-stream when you pass in an image directly
headers = {'Content-Type': 'application/octet-stream',
           'Ocp-Apim-Subscription-Key': subscription_key}

# According to the documentation for the analyze image function
# we use HTTP POST to call this function
response = requests.post(address, headers=headers, params=parameters, data=image_data)

# Raise an exception if the call returns an error code
response.raise_for_status()

# Display the raw JSON results returned
results = response.json()
print(json.dumps(results))

# print the value for requestId from the JSON output
print()
print('requestId')
print(results['requestId'])

================================================
FILE: python-for-beginners/17 - JSON/read_key_pair_list.py
================================================
# This code will show you how to call the Computer Vision API from Python
# You can find documentation on the Computer Vision Analyze Image method here
# https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fa

# Use the requests library to simplify making a REST API call from Python
import requests

# We will need the json library to read the data passed back
# by the web service
import json

# We need the address of our Computer vision service
vision_service_address = "https://canadacentral.api.cognitive.microsoft.com/vision/v2.0/"

# Add the name of the function we want to call to the address
address = vision_service_address + "analyze"

# According to the documentation for the analyze image function
# There are three optional parameters: language, details & visualFeatures
parameters = {'visualFeatures':'Description,Color',
              'language':'en'}

# We need the key to access our Computer Vision Service
subscription_key = "xxxxxxxxxxxxxxxxxxxxxxx"

# Open the image file to get a file object containing the image to analyze
image_path = "./TestImages/PolarBear.jpg"
image_data = open(image_path, 'rb').read()

# According to
# the documentation for the analyze image function
# we need to specify the subscription key and the content type
# in the HTTP header. Content-Type is application/octet-stream when you pass in an image directly
headers = {'Content-Type': 'application/octet-stream',
           'Ocp-Apim-Subscription-Key': subscription_key}

# According to the documentation for the analyze image function
# we use HTTP POST to call this function
response = requests.post(address, headers=headers, params=parameters, data=image_data)

# Raise an exception if the call returns an error code
response.raise_for_status()

# Display the JSON results returned in their raw JSON format
results = response.json()
print(json.dumps(results))

# Print out all the tags in the description
print()
print('all tags')
for item in results['description']['tags']:
    print(item)

# print out the first tag in the description
print()
print('first_tag')
print(results['description']['tags'][0])

================================================
FILE: python-for-beginners/17 - JSON/read_subkey.py
================================================
# This code will show you how to call the Computer Vision API from Python
# You can find documentation on the Computer Vision Analyze Image method here
# https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fa

# Use the requests library to simplify making a REST API call from Python
import requests

# We will need the json library to read the data passed back
# by the web service
import json

# We need the address of our Computer vision service
vision_service_address = "https://canadacentral.api.cognitive.microsoft.com/vision/v2.0/"

# Add the name of the function we want to call to the address
address = vision_service_address + "analyze"

# According to the documentation for the analyze image function
# There are three optional parameters: language, details & visualFeatures
parameters = {'visualFeatures':'Description,Color',
              'language':'en'}

# We need the key to access our Computer Vision Service
subscription_key = "xxxxxxxxxxxxxxxxxxxxxxx"

# Open the image file to get a file object containing the image to analyze
image_path = "./TestImages/PolarBear.jpg"
image_data = open(image_path, 'rb').read()

# According to the documentation for the analyze image function
# we need to specify the subscription key and the content type
# in the HTTP header. Content-Type is application/octet-stream when you pass in an image directly
headers = {'Content-Type': 'application/octet-stream',
           'Ocp-Apim-Subscription-Key': subscription_key}

# According to the documentation for the analyze image function
# we use HTTP POST to call this function
response = requests.post(address, headers=headers, params=parameters, data=image_data)

# Raise an exception if the call returns an error code
response.raise_for_status()

# Display the raw JSON results returned
results = response.json()
print(json.dumps(results))

# Print the value for dominantColorBackground from the color keys
print()
print('dominantColorBackground')
print(results['color']['dominantColorBackground'])

================================================
FILE: python-for-beginners/17 - JSON/readme.md
================================================
# JSON

Many APIs return data in [JSON](https://json.org/), JavaScript Object Notation. JSON is a standard format that is readable by humans and can be parsed or generated by code.

JSON is built on two structures:
- collections of key/value pairs
- lists of values

JSON linters will format JSON so it is easier for a human to read.
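
The two structures above map directly onto Python dictionaries and lists. The sketch below is only an illustration using the standard `json` module; the `person` data is a made-up sample, and `indent=2` is one possible way to get linter-style formatting from code.

```python
import json

# Hypothetical sample data: a dictionary (key/value pairs) that holds a list of values
person = {
    'first': 'Christopher',
    'last': 'Harrison',
    'languages': ['CSharp', 'Python', 'JavaScript'],
}

# Encode the dictionary as a JSON string; indent=2 adds line breaks and indentation
person_json = json.dumps(person, indent=2)
print(person_json)

# Decode the JSON string back into Python objects
decoded = json.loads(person_json)

# Key/value pairs come back as a dictionary, lists of values as a list
print(decoded['languages'][0])
```

Encoding and then decoding returns an equal dictionary, which is a quick way to check that the data survives the round trip.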
The following websites have JSON linters:

- [JSONLint](https://jsonlint.com/)
- [ConvertJson.com](http://www.convertjson.com/jsonlint.htm)
- [JSON schema linter](https://www.json-schema-linter.com/)

Python includes a [json](https://docs.python.org/2/library/json.html) module which helps you encode and decode JSON.

================================================
FILE: python-for-beginners/18 - Decorators/README.md
================================================
# Decorators

[Decorators](https://www.python.org/dev/peps/pep-0318/) are similar to attributes in that they add meaning or functionality to blocks of code in Python. They're frequently used in frameworks such as [Flask](http://flask.pocoo.org/) or [Django](https://www.djangoproject.com/). As a Python developer, you'll most often use existing decorators rather than create your own.

``` python
# Example decorator
@log(True)
def sample_function():
    print('this is a sample function')
```

================================================
FILE: python-for-beginners/18 - Decorators/creating_decorators.py
================================================
import functools
from colorama import init, Fore
init()

def color(color):
    def wrapper(func):
        @functools.wraps(func)
        def runner(*args, **kwargs):
            print(color + 'changing to blue')
            func(*args, **kwargs)
        return runner
    return wrapper

@color(color=Fore.BLUE)
def greeter():
    print('Hello, world!!')
    print('Just saying hi again')

greeter()

================================================
FILE: python-for-beginners/README.md
================================================
# Python for beginners

## Overview

Getting started with a new environment can be challenging, especially when you literally don't even speak the language. Fortunately, we created a set of videos to help get you up and running with the language, so you can focus on the task at hand - learning how to create applications using Python.
We don't dig into specific frameworks, but we help get you ready to start exploring on your own. We'll show you the core Python concepts you'll need as you begin your journey into web development on popular frameworks such as [Django](https://djangoproject.com) and [Flask](https://flask.palletsprojects.com/en/1.1.x/), use AI services such as [Cognitive Services](https://azure.microsoft.com/services/cognitive-services/), or even machine learning.

### What you'll learn

- The basics of Python
- Starting a project
- Common syntax
- Package management

### What we don't cover

- Class design and inheritance
- Asynchronous programming
- Basics of programming

## Prerequisites

- [An understanding of Git](https://git-scm.com/book/en/v1/Getting-Started)
- Light experience with another programming language, such as [JavaScript](https://www.edx.org/course/javascript-introduction)

## Next steps

As the goal of this course is to help get you up to speed on Python so you can work through a quick start, the next step after completing the videos is to follow a tutorial!
Here are a few of our favorites:

- [Quickstart: Detect faces in an image using the Face REST API and Python](https://docs.microsoft.com/azure/cognitive-services/face/QuickStarts/Python?WT.mc_id=python-c9-niner?WT.mc_id=python-c9-niner)
- [Quickstart: Analyze a local image using the Computer Vision REST API and Python](https://docs.microsoft.com/azure/cognitive-services/computer-vision/quickstarts/python-disk?WT.mc_id=python-c9-niner?WT.mc_id=python-c9-niner)
- [Quickstart: Using the Python REST API to call the Text Analytics Cognitive Service](https://docs.microsoft.com/azure/cognitive-services/Text-Analytics/quickstarts/python?WT.mc_id=python-c9-niner?WT.mc_id=python-c9-niner)
- [Tutorial: Build a Flask app with Azure Cognitive Services](https://docs.microsoft.com/azure/cognitive-services/translator/tutorial-build-flask-app-translation-synthesis?WT.mc_id=python-c9-niner)
- [Flask tutorial in Visual Studio Code](https://code.visualstudio.com/docs/python/tutorial-flask?WT.mc_id=python-c9-niner)
- [Django tutorial in Visual Studio Code](https://code.visualstudio.com/docs/python/tutorial-django?WT.mc_id=python-c9-niner)