Repository: minsuk-heo/pandas
Branch: master
Commit: 85331d806d59
Files: 6
Total size: 418.1 KB
Directory structure:
gitextract_520hdf_4/
├── Pandas_Cheatsheet.ipynb
├── data/
│ ├── friend_list.csv
│ ├── friend_list.txt
│ ├── friend_list_no_head.csv
│ └── friend_list_tab.txt
└── 팬더스_명령어_꿀팁.ipynb
================================================
FILE CONTENTS
================================================
================================================
FILE: Pandas_Cheatsheet.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What is Pandas?\n",
"python library for data manipulation and analysis"
]
},
{
"cell_type": "code",
"execution_count": 178,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"data_frame = pd.read_csv('data/friend_list.csv')"
]
},
{
"cell_type": "code",
"execution_count": 179,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Julia</td>\n",
" <td>40</td>\n",
" <td>dentist</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>45</td>\n",
" <td>manager</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Chris</td>\n",
" <td>25</td>\n",
" <td>intern</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher\n",
"3 Julia 40 dentist\n",
"4 Brian 45 manager\n",
"5 Chris 25 intern"
]
},
"execution_count": 179,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_frame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What is DataFrame?\n",
"dataframe is a 2-dimensional labeled data structure with columns"
]
},
{
"cell_type": "code",
"execution_count": 180,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Julia</td>\n",
" <td>40</td>\n",
" <td>dentist</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>45</td>\n",
" <td>manager</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher\n",
"3 Julia 40 dentist\n",
"4 Brian 45 manager"
]
},
"execution_count": 180,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_frame.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What is Series?\n",
"Every single column in dataframe is series"
]
},
{
"cell_type": "code",
"execution_count": 181,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"execution_count": 181,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(data_frame.job)"
]
},
{
"cell_type": "code",
"execution_count": 182,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>STUDENT</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>DEVELOPER</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>TEACHER</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Julia</td>\n",
" <td>40</td>\n",
" <td>DENTIST</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>45</td>\n",
" <td>MANAGER</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 STUDENT\n",
"1 Jenny 30 DEVELOPER\n",
"2 Nate 30 TEACHER\n",
"3 Julia 40 DENTIST\n",
"4 Brian 45 MANAGER"
]
},
"execution_count": 182,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_frame.job = data_frame.job.str.upper()\n",
"data_frame.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Series** is just wrapper for python list"
]
},
{
"cell_type": "code",
"execution_count": 183,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>num</th>\n",
" <th>word</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>one</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>two</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>three</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" num word\n",
"0 1 one\n",
"1 2 two\n",
"2 3 three"
]
},
"execution_count": 183,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s1 = pd.core.series.Series(['one', 'two', 'three'])\n",
"s2 = pd.core.series.Series([1, 2, 3])\n",
"pd.DataFrame(data=dict(word=s1, num=s2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Why Pandas?\n",
"\n",
"Very similar to Excel spreadsheet view, \n",
"support various functions for data manipulation and analysis. \n",
"Fast based on Numpy. \n",
"Easy to manipulate data for your purpose"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Read File to DataFrame\n",
"A **Data frame** is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"by default, pandas support csv format"
]
},
{
"cell_type": "code",
"execution_count": 184,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('data/friend_list.csv')"
]
},
{
"cell_type": "code",
"execution_count": 185,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Julia</td>\n",
" <td>40</td>\n",
" <td>dentist</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>45</td>\n",
" <td>manager</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Chris</td>\n",
" <td>25</td>\n",
" <td>intern</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher\n",
"3 Julia 40 dentist\n",
"4 Brian 45 manager\n",
"5 Chris 25 intern"
]
},
"execution_count": 185,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"you can read txt file like below, if the txt file data are comma separated"
]
},
{
"cell_type": "code",
"execution_count": 186,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('data/friend_list.txt')"
]
},
{
"cell_type": "code",
"execution_count": 187,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Julia</td>\n",
" <td>40</td>\n",
" <td>dentist</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>45</td>\n",
" <td>manager</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher\n",
"3 Julia 40 dentist\n",
"4 Brian 45 manager"
]
},
"execution_count": 187,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"if txt file delimiter is not comma, you can use define delimiter using keyword argument"
]
},
{
"cell_type": "code",
"execution_count": 188,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('data/friend_list_tab.txt', delimiter = \"\\t\")"
]
},
{
"cell_type": "code",
"execution_count": 189,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Julia</td>\n",
" <td>40</td>\n",
" <td>dentist</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>45</td>\n",
" <td>manager</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher\n",
"3 Julia 40 dentist\n",
"4 Brian 45 manager"
]
},
"execution_count": 189,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"if data file doesn't have header, \n",
"Use header = None like below, so first column not to be your column header"
]
},
{
"cell_type": "code",
"execution_count": 190,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('data/friend_list_no_head.csv', header = None)"
]
},
{
"cell_type": "code",
"execution_count": 191,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Julia</td>\n",
" <td>40</td>\n",
" <td>dentist</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>45</td>\n",
" <td>manager</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher\n",
"3 Julia 40 dentist\n",
"4 Brian 45 manager"
]
},
"execution_count": 191,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"you can add column header after you create dataframe"
]
},
{
"cell_type": "code",
"execution_count": 192,
"metadata": {},
"outputs": [],
"source": [
"df.columns = ['name', 'age', 'job']"
]
},
{
"cell_type": "code",
"execution_count": 193,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Julia</td>\n",
" <td>40</td>\n",
" <td>dentist</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>45</td>\n",
" <td>manager</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher\n",
"3 Julia 40 dentist\n",
"4 Brian 45 manager"
]
},
"execution_count": 193,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"you can create column header for no header data at once"
]
},
{
"cell_type": "code",
"execution_count": 194,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('data/friend_list_no_head.csv', header = None, names=['name', 'age', 'job'])"
]
},
{
"cell_type": "code",
"execution_count": 195,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Julia</td>\n",
" <td>40</td>\n",
" <td>dentist</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>45</td>\n",
" <td>manager</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher\n",
"3 Julia 40 dentist\n",
"4 Brian 45 manager"
]
},
"execution_count": 195,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create DataFrame\n",
"when you want to create dataframe from your python code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## from dictionary"
]
},
{
"cell_type": "code",
"execution_count": 196,
"metadata": {},
"outputs": [],
"source": [
"friend_dict_list = [{'name': 'Jone', 'age': 20, 'job': 'student'},\n",
" {'name': 'Jenny', 'age': 30, 'job': 'developer'},\n",
" {'name': 'Nate', 'age': 30, 'job': 'teacher'}]\n",
"df = pd.DataFrame(friend_dict_list)"
]
},
{
"cell_type": "code",
"execution_count": 197,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" <td>Jone</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" <td>Jenny</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" <td>Nate</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job name\n",
"0 20 student Jone\n",
"1 30 developer Jenny\n",
"2 30 teacher Nate"
]
},
"execution_count": 197,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"if you need fixed column order, you can adjust column order like below,"
]
},
{
"cell_type": "code",
"execution_count": 198,
"metadata": {},
"outputs": [],
"source": [
"df = df[['name', 'age', 'job']]"
]
},
{
"cell_type": "code",
"execution_count": 199,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Jone</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 Jone 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 199,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## from OrderedDict\n",
"OrderedDict helps you to have fixed column order at once"
]
},
{
"cell_type": "code",
"execution_count": 200,
"metadata": {},
"outputs": [],
"source": [
"from collections import OrderedDict"
]
},
{
"cell_type": "code",
"execution_count": 201,
"metadata": {},
"outputs": [],
"source": [
"friend_ordered_dict = OrderedDict([ ('name', ['John', 'Jenny', 'Nate']),\n",
" ('age', [20, 30, 30]),\n",
" ('job', ['student', 'developer', 'teacher']) ] )\n",
"df = pd.DataFrame.from_dict(friend_ordered_dict)"
]
},
{
"cell_type": "code",
"execution_count": 202,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 202,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## from list"
]
},
{
"cell_type": "code",
"execution_count": 203,
"metadata": {},
"outputs": [],
"source": [
"friend_list = [ ['John', 20, 'student'],['Jenny', 30, 'developer'],['Nate', 30, 'teacher'] ]\n",
"column_name = ['name', 'age', 'job']\n",
"df = pd.DataFrame.from_records(friend_list, columns=column_name)"
]
},
{
"cell_type": "code",
"execution_count": 204,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 204,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 205,
"metadata": {},
"outputs": [],
"source": [
"friend_list = [ \n",
" ['name',['John', 'Jenny', 'Nate']],\n",
" ['age',[20,30,30]],\n",
" ['job',['student', 'developer', 'teacher']] \n",
" ]\n",
"df = pd.DataFrame.from_items(friend_list)"
]
},
{
"cell_type": "code",
"execution_count": 206,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 206,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Write DataFrame to File"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"here is one dataframe example with header"
]
},
{
"cell_type": "code",
"execution_count": 207,
"metadata": {},
"outputs": [],
"source": [
"friend_list = [ \n",
" ['name',['John', 'Jenny', 'nate']],\n",
" ['age',[20,30,30]],\n",
" ['job',['student', 'developer', 'teacher']] \n",
" ]\n",
"df = pd.DataFrame.from_items(friend_list)"
]
},
{
"cell_type": "code",
"execution_count": 208,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 nate 30 teacher"
]
},
"execution_count": 208,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"you can create csv file using below command,"
]
},
{
"cell_type": "code",
"execution_count": 209,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv('friend_list_from_df.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"below is one example of dataframe **doesn't** have header"
]
},
{
"cell_type": "code",
"execution_count": 210,
"metadata": {},
"outputs": [],
"source": [
"friend_list = [ ['John', 20, 'student'],['Jenny', 30, 'developer'],['Nate', 30, 'teacher'] ]\n",
"df = pd.DataFrame.from_records(friend_list)"
]
},
{
"cell_type": "code",
"execution_count": 211,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 211,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"you can write csv file using below command,"
]
},
{
"cell_type": "code",
"execution_count": 212,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv('friend_list_from_df.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"you also can write txt file using same command"
]
},
{
"cell_type": "code",
"execution_count": 213,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv('friend_list_from_df.txt')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"by default, header and index are True like below, even if you don't mention it in the command"
]
},
{
"cell_type": "code",
"execution_count": 214,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv('friend_list_from_df.csv', header = True, index = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**header = False** means you don't want to create column names. no 0,1,2 at column name \n",
"**index = False** means you don't want to create row names. no 0,1,2 at row name"
]
},
{
"cell_type": "code",
"execution_count": 215,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv('friend_list_from_df.csv', header = False, index = False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"you can specify add column names by giving **header** with list"
]
},
{
"cell_type": "code",
"execution_count": 216,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv('friend_list_from_df.csv', header = ['name', 'age', 'job'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"below is dataframe has **None** value"
]
},
{
"cell_type": "code",
"execution_count": 217,
"metadata": {},
"outputs": [],
"source": [
"friend_list = [ \n",
" ['name',['John', None, 'nate']],\n",
" ['age',[20,None,30]],\n",
" ['job',['student', 'developer', 'teacher']] \n",
" ]\n",
"df = pd.DataFrame.from_items(friend_list)"
]
},
{
"cell_type": "code",
"execution_count": 218,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20.0</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>nate</td>\n",
" <td>30.0</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20.0 student\n",
"1 None NaN developer\n",
"2 nate 30.0 teacher"
]
},
"execution_count": 218,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 219,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv('friend_list_from_df.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**na_rep** writes missing values (**None** / NaN) to the file as the given string"
]
},
{
"cell_type": "code",
"execution_count": 220,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv('friend_list_from_df.csv', na_rep = '-')"
]
},
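{
"cell_type": "markdown",
"metadata": {},
"source": [
"a minimal sketch to check the **to_csv** options above without opening the file: when no path is given, to_csv returns the CSV text as a string. The dataframe `df_demo` and the name `csv_text` are illustrative assumptions, not part of the original data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# illustrative dataframe with missing values (an assumption, not the original file)\n",
"df_demo = pd.DataFrame({'name': ['John', None, 'Nate'], 'age': [20, None, 30]})\n",
"\n",
"# with no path, to_csv returns the CSV text instead of writing a file\n",
"csv_text = df_demo.to_csv(index=False, na_rep='-')\n",
"print(csv_text)"
]
},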
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Select Row"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## by index"
]
},
{
"cell_type": "code",
"execution_count": 221,
"metadata": {},
"outputs": [],
"source": [
"friend_list = [ \n",
" ['name',['John', 'Jenny', 'Nate']],\n",
" ['age',[20,30,30]],\n",
" ['job',['student', 'developer', 'teacher']] \n",
" ]\n",
"# DataFrame.from_items was removed in pandas 1.0; build the frame from a dict instead\n",
"df = pd.DataFrame(dict(friend_list))"
]
},
{
"cell_type": "code",
"execution_count": 222,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 222,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"select the rows at positions 1 and 2 (the end of the slice, 3, is excluded)"
]
},
{
"cell_type": "code",
"execution_count": 223,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 223,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[1:3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"select the rows with index labels 0 and 2"
]
},
{
"cell_type": "code",
"execution_count": 224,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"2 Nate 30 teacher"
]
},
"execution_count": 224,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[[0,2]]"
]
},
{
"cell_type": "code",
"execution_count": 225,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 225,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
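{
"cell_type": "markdown",
"metadata": {},
"source": [
"with the default 0,1,2 index, labels and positions coincide, so **loc** and **iloc** look the same. The sketch below uses an illustrative dataframe `df_demo` with index labels 10, 20, 30 (an assumption made only to show the difference): loc selects by label, iloc selects by position"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# illustrative dataframe whose index labels differ from the row positions\n",
"df_demo = pd.DataFrame({'name': ['John', 'Jenny', 'Nate'], 'age': [20, 30, 30]},\n",
"                       index=[10, 20, 30])\n",
"\n",
"by_label = df_demo.loc[[10, 30]]     # loc: rows whose index labels are 10 and 30\n",
"by_position = df_demo.iloc[[0, 2]]   # iloc: first and third row by position\n",
"sliced = df_demo.iloc[1:3]           # positional slice, end position excluded\n",
"by_label"
]
},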
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## by column condition"
]
},
{
"cell_type": "code",
"execution_count": 226,
"metadata": {},
"outputs": [],
"source": [
"df_filtered = df[df.age > 25]"
]
},
{
"cell_type": "code",
"execution_count": 227,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 227,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_filtered"
]
},
{
"cell_type": "code",
"execution_count": 228,
"metadata": {},
"outputs": [],
"source": [
"df_filtered = df.query('age>25')"
]
},
{
"cell_type": "code",
"execution_count": 229,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 229,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_filtered"
]
},
{
"cell_type": "code",
"execution_count": 230,
"metadata": {},
"outputs": [],
"source": [
"df_filtered = df[(df.age >25) & (df.name == 'Nate')]"
]
},
{
"cell_type": "code",
"execution_count": 231,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"2 Nate 30 teacher"
]
},
"execution_count": 231,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_filtered"
]
},
{
"cell_type": "code",
"execution_count": 232,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 232,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
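{
"cell_type": "markdown",
"metadata": {},
"source": [
"conditions can also be combined with **|** (or) and negated with **~** (not); each condition needs its own parentheses. A small sketch on an illustrative copy of the friend list (`df_demo` and the result names are assumptions):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# illustrative copy of the friend list used above\n",
"df_demo = pd.DataFrame({'name': ['John', 'Jenny', 'Nate'],\n",
"                        'age': [20, 30, 30],\n",
"                        'job': ['student', 'developer', 'teacher']})\n",
"\n",
"# | keeps rows where at least one condition holds\n",
"young_or_teacher = df_demo[(df_demo.age < 25) | (df_demo.job == 'teacher')]\n",
"# ~ inverts a condition\n",
"not_developer = df_demo[~(df_demo.job == 'developer')]\n",
"# the same 'and' filter as above, written as a query string\n",
"nate = df_demo.query('age > 25 and name == \"Nate\"')\n",
"young_or_teacher"
]
},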
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Filter Column"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## by index"
]
},
{
"cell_type": "code",
"execution_count": 233,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 233,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"friend_list = [ ['John', 20, 'student'],['Jenny', 30, 'developer'],['Nate', 30, 'teacher'] ]\n",
"df = pd.DataFrame.from_records(friend_list)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"select all rows, from column 0 to column 1"
]
},
{
"cell_type": "code",
"execution_count": 234,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1\n",
"0 John 20\n",
"1 Jenny 30\n",
"2 Nate 30"
]
},
"execution_count": 234,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[:, 0:2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"select all rows, column 0 and column 2"
]
},
{
"cell_type": "code",
"execution_count": 235,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 2\n",
"0 John student\n",
"1 Jenny developer\n",
"2 Nate teacher"
]
},
"execution_count": 235,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[:,[0,2]]"
]
},
{
"cell_type": "code",
"execution_count": 236,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 236,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## by column name"
]
},
{
"cell_type": "code",
"execution_count": 237,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Julia</td>\n",
" <td>40</td>\n",
" <td>dentist</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>45</td>\n",
" <td>manager</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Chris</td>\n",
" <td>25</td>\n",
" <td>intern</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher\n",
"3 Julia 40 dentist\n",
"4 Brian 45 manager\n",
"5 Chris 25 intern"
]
},
"execution_count": 237,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# for data without a header row, pass header=None and supply the column names with names\n",
"df = pd.read_csv('data/friend_list_no_head.csv', header = None, names=['name', 'age', 'job'])\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 238,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Julia</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Chris</td>\n",
" <td>25</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age\n",
"0 John 20\n",
"1 Jenny 30\n",
"2 Nate 30\n",
"3 Julia 40\n",
"4 Brian 45\n",
"5 Chris 25"
]
},
"execution_count": 238,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_filtered = df[['name', 'age']]\n",
"df_filtered"
]
},
{
"cell_type": "code",
"execution_count": 239,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>40</td>\n",
" <td>dentist</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>45</td>\n",
" <td>manager</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>25</td>\n",
" <td>intern</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job\n",
"0 20 student\n",
"1 30 developer\n",
"2 30 teacher\n",
"3 40 dentist\n",
"4 45 manager\n",
"5 25 intern"
]
},
"execution_count": 239,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.filter(items=['age', 'job'])"
]
},
{
"cell_type": "code",
"execution_count": 240,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Julia</td>\n",
" <td>40</td>\n",
" <td>dentist</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>45</td>\n",
" <td>manager</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Chris</td>\n",
" <td>25</td>\n",
" <td>intern</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 John 20 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher\n",
"3 Julia 40 dentist\n",
"4 Brian 45 manager\n",
"5 Chris 25 intern"
]
},
"execution_count": 240,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 241,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Julia</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Chris</td>\n",
" <td>25</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age\n",
"0 John 20\n",
"1 Jenny 30\n",
"2 Nate 30\n",
"3 Julia 40\n",
"4 Brian 45\n",
"5 Chris 25"
]
},
"execution_count": 241,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# select columns containing 'a'\n",
"df.filter(like='a',axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 242,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>teacher</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>dentist</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>manager</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>intern</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" job\n",
"0 student\n",
"1 developer\n",
"2 teacher\n",
"3 dentist\n",
"4 manager\n",
"5 intern"
]
},
"execution_count": 242,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# select columns whose names match a regex (here: names ending in 'b')\n",
"df.filter(regex='b$',axis=1)"
]
},
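{
"cell_type": "markdown",
"metadata": {},
"source": [
"a short sketch of how **filter** matches column names, assuming a one-row illustrative dataframe `df_demo`: **like** keeps names containing the substring, while **regex** keeps names where the pattern is found anywhere in the name, so anchors such as ^ and $ are needed to match the start or the end"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# illustrative dataframe with the same column names as the friend list\n",
"df_demo = pd.DataFrame({'name': ['John'], 'age': [20], 'job': ['student']})\n",
"\n",
"contains_a = df_demo.filter(like='a', axis=1)        # name, age\n",
"starts_with_j = df_demo.filter(regex='^j', axis=1)   # job\n",
"ends_with_e = df_demo.filter(regex='e$', axis=1)     # name, age\n",
"list(starts_with_j.columns)"
]
},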
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Drop rows"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## by row name (index name)"
]
},
{
"cell_type": "code",
"execution_count": 243,
"metadata": {},
"outputs": [],
"source": [
"friend_dict_list = [{'age': 20, 'job': 'student'},\n",
" {'age': 30, 'job': 'developer'},\n",
" {'age': 30, 'job': 'teacher'}]\n",
"df = pd.DataFrame(friend_dict_list, index = ['John', 'Jenny', 'Nate'])"
]
},
{
"cell_type": "code",
"execution_count": 244,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>John</th>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Jenny</th>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Nate</th>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job\n",
"John 20 student\n",
"Jenny 30 developer\n",
"Nate 30 teacher"
]
},
"execution_count": 244,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### drop row\n",
"drop returns a new dataframe without the given rows; the original dataframe still contains them"
]
},
{
"cell_type": "code",
"execution_count": 245,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Jenny</th>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job\n",
"Jenny 30 developer"
]
},
"execution_count": 245,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.drop(['John', 'Nate'])"
]
},
{
"cell_type": "code",
"execution_count": 246,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>John</th>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Jenny</th>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Nate</th>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job\n",
"John 20 student\n",
"Jenny 30 developer\n",
"Nate 30 teacher"
]
},
"execution_count": 246,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"to keep the result, assign it back to the dataframe as below"
]
},
{
"cell_type": "code",
"execution_count": 247,
"metadata": {},
"outputs": [],
"source": [
"df = df.drop(['John', 'Nate'])"
]
},
{
"cell_type": "code",
"execution_count": 248,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Jenny</th>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job\n",
"Jenny 30 developer"
]
},
"execution_count": 248,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### drop row in place\n",
"with **inplace = True**, the rows are removed from the dataframe itself and drop returns None"
]
},
{
"cell_type": "code",
"execution_count": 249,
"metadata": {},
"outputs": [],
"source": [
"friend_dict_list = [{'age': 20, 'job': 'student'},\n",
" {'age': 30, 'job': 'developer'},\n",
" {'age': 30, 'job': 'teacher'}]\n",
"df = pd.DataFrame(friend_dict_list, index = ['John', 'Jenny', 'Nate'])"
]
},
{
"cell_type": "code",
"execution_count": 250,
"metadata": {},
"outputs": [],
"source": [
"df.drop(['John', 'Nate'], inplace = True)"
]
},
{
"cell_type": "code",
"execution_count": 251,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Jenny</th>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job\n",
"Jenny 30 developer"
]
},
"execution_count": 251,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
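{
"cell_type": "markdown",
"metadata": {},
"source": [
"the two behaviours side by side, as a sketch on an illustrative dataframe (`df_demo` and the other names are assumptions): a plain drop leaves the original untouched, while inplace = True changes it and returns None"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# illustrative dataframe indexed by name\n",
"df_demo = pd.DataFrame({'age': [20, 30, 30]}, index=['John', 'Jenny', 'Nate'])\n",
"\n",
"result = df_demo.drop(['John'])    # new dataframe; df_demo is unchanged\n",
"rows_before = len(df_demo)         # still 3 rows\n",
"\n",
"returned = df_demo.drop(['Nate'], inplace=True)   # df_demo itself loses the row\n",
"print(rows_before, len(df_demo), returned)"
]
},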
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## by row id (index number)"
]
},
{
"cell_type": "code",
"execution_count": 252,
"metadata": {},
"outputs": [],
"source": [
"friend_dict_list = [{'name': 'Jone', 'age': 20, 'job': 'student'},\n",
" {'name': 'Jenny', 'age': 30, 'job': 'developer'},\n",
" {'name': 'Nate', 'age': 30, 'job': 'teacher'}]\n",
"df = pd.DataFrame(friend_dict_list)"
]
},
{
"cell_type": "code",
"execution_count": 253,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" <td>Jone</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" <td>Jenny</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" <td>Nate</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job name\n",
"0 20 student Jone\n",
"1 30 developer Jenny\n",
"2 30 teacher Nate"
]
},
"execution_count": 253,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"you can drop rows by position by passing the matching index labels, here df.index[[0,2]]"
]
},
{
"cell_type": "code",
"execution_count": 254,
"metadata": {},
"outputs": [],
"source": [
"df = df.drop(df.index[[0,2]])"
]
},
{
"cell_type": "code",
"execution_count": 255,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" <td>Jenny</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job name\n",
"1 30 developer Jenny"
]
},
"execution_count": 255,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## By Column value"
]
},
{
"cell_type": "code",
"execution_count": 256,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" <td>Jone</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" <td>Jenny</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" <td>Nate</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job name\n",
"0 20 student Jone\n",
"1 30 developer Jenny\n",
"2 30 teacher Nate"
]
},
"execution_count": 256,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"friend_dict_list = [{'name': 'Jone', 'age': 20, 'job': 'student'},\n",
"                    {'name': 'Jenny', 'age': 30, 'job': 'developer'},\n",
"                    {'name': 'Nate', 'age': 30, 'job': 'teacher'}]\n",
"df = pd.DataFrame(friend_dict_list)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 257,
"metadata": {},
"outputs": [],
"source": [
"df = df[df.age != 30]"
]
},
{
"cell_type": "code",
"execution_count": 258,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" <td>Jone</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job name\n",
"0 20 student Jone"
]
},
"execution_count": 258,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Drop column"
]
},
{
"cell_type": "code",
"execution_count": 259,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" <td>Jone</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" <td>Jenny</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" <td>Nate</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job name\n",
"0 20 student Jone\n",
"1 30 developer Jenny\n",
"2 30 teacher Nate"
]
},
"execution_count": 259,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"friend_dict_list = [{'name': 'Jone', 'age': 20, 'job': 'student'},\n",
"                    {'name': 'Jenny', 'age': 30, 'job': 'developer'},\n",
"                    {'name': 'Nate', 'age': 30, 'job': 'teacher'}]\n",
"df = pd.DataFrame(friend_dict_list)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 260,
"metadata": {},
"outputs": [],
"source": [
"df = df.drop('age', axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 261,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>job</th>\n",
" <th>name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>student</td>\n",
" <td>Jone</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>developer</td>\n",
" <td>Jenny</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>teacher</td>\n",
" <td>Nate</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" job name\n",
"0 student Jone\n",
"1 developer Jenny\n",
"2 teacher Nate"
]
},
"execution_count": 261,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Add Column / Update Column"
]
},
{
"cell_type": "code",
"execution_count": 262,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Jone</td>\n",
" <td>15</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 Jone 15 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 262,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"friend_dict_list = [{'name': 'Jone', 'age': 15, 'job': 'student'},\n",
"                    {'name': 'Jenny', 'age': 30, 'job': 'developer'},\n",
"                    {'name': 'Nate', 'age': 30, 'job': 'teacher'}]\n",
"df = pd.DataFrame(friend_dict_list, columns=['name', 'age', 'job'])\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add New Column with default value"
]
},
{
"cell_type": "code",
"execution_count": 263,
"metadata": {},
"outputs": [],
"source": [
"df['salary'] = 0"
]
},
{
"cell_type": "code",
"execution_count": 264,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>salary</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Jone</td>\n",
" <td>15</td>\n",
" <td>student</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job salary\n",
"0 Jone 15 student 0\n",
"1 Jenny 30 developer 0\n",
"2 Nate 30 teacher 0"
]
},
"execution_count": 264,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add New Column derived from existing value"
]
},
{
"cell_type": "code",
"execution_count": 265,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Jone</td>\n",
" <td>15</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job\n",
"0 Jone 15 student\n",
"1 Jenny 30 developer\n",
"2 Nate 30 teacher"
]
},
"execution_count": 265,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"friend_dict_list = [{'name': 'Jone', 'age': 15, 'job': 'student'},\n",
"                    {'name': 'Jenny', 'age': 30, 'job': 'developer'},\n",
"                    {'name': 'Nate', 'age': 30, 'job': 'teacher'}]\n",
"df = pd.DataFrame(friend_dict_list, columns=['name', 'age', 'job'])\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## One-liner: add a column based on a true/false condition"
]
},
{
"cell_type": "code",
"execution_count": 266,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"df['salary'] = np.where(df['job'] != 'student', 'yes', 'no')"
]
},
{
"cell_type": "code",
"execution_count": 267,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>salary</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Jone</td>\n",
" <td>15</td>\n",
" <td>student</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" <td>yes</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" <td>yes</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name age job salary\n",
"0 Jone 15 student no\n",
"1 Jenny 30 developer yes\n",
"2 Nate 30 teacher yes"
]
},
"execution_count": 267,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 268,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>midterm</th>\n",
" <th>final</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>95</td>\n",
" <td>85</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>85</td>\n",
" <td>80</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>10</td>\n",
" <td>30</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name midterm final\n",
"0 John 95 85\n",
"1 Jenny 85 80\n",
"2 Nate 10 30"
]
},
"execution_count": 268,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"friend_dict_list = [{'name': 'John', 'midterm': 95, 'final': 85},\n",
"                    {'name': 'Jenny', 'midterm': 85, 'final': 80},\n",
"                    {'name': 'Nate', 'midterm': 10, 'final': 30}]\n",
"df = pd.DataFrame(friend_dict_list, columns=['name', 'midterm', 'final'])\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Column derived by adding two existing columns"
]
},
{
"cell_type": "code",
"execution_count": 269,
"metadata": {},
"outputs": [],
"source": [
"df['total'] = df['midterm'] + df['final']"
]
},
{
"cell_type": "code",
"execution_count": 270,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>midterm</th>\n",
" <th>final</th>\n",
" <th>total</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>95</td>\n",
" <td>85</td>\n",
" <td>180</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>85</td>\n",
" <td>80</td>\n",
" <td>165</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>10</td>\n",
" <td>30</td>\n",
" <td>40</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name midterm final total\n",
"0 John 95 85 180\n",
"1 Jenny 85 80 165\n",
"2 Nate 10 30 40"
]
},
"execution_count": 270,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Column derived from an existing column"
]
},
{
"cell_type": "code",
"execution_count": 271,
"metadata": {},
"outputs": [],
"source": [
"df['average'] = df['total'] / 2"
]
},
{
"cell_type": "code",
"execution_count": 272,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>midterm</th>\n",
" <th>final</th>\n",
" <th>total</th>\n",
" <th>average</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>95</td>\n",
" <td>85</td>\n",
" <td>180</td>\n",
" <td>90.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>85</td>\n",
" <td>80</td>\n",
" <td>165</td>\n",
" <td>82.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>10</td>\n",
" <td>30</td>\n",
" <td>40</td>\n",
" <td>20.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name midterm final total average\n",
"0 John 95 85 180 90.0\n",
"1 Jenny 85 80 165 82.5\n",
"2 Nate 10 30 40 20.0"
]
},
"execution_count": 272,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Column from a multi-way condition"
]
},
{
"cell_type": "code",
"execution_count": 273,
"metadata": {},
"outputs": [],
"source": [
"grades = []\n",
"\n",
"# Map each average score to a letter grade.\n",
"for average in df['average']:\n",
"    if average >= 90:\n",
"        grades.append('A')\n",
"    elif average >= 80:\n",
"        grades.append('B')\n",
"    elif average >= 70:\n",
"        grades.append('C')\n",
"    else:\n",
"        grades.append('F')\n",
"\n",
"df['grade'] = grades"
]
},
{
"cell_type": "code",
"execution_count": 274,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>midterm</th>\n",
" <th>final</th>\n",
" <th>total</th>\n",
" <th>average</th>\n",
" <th>grade</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>95</td>\n",
" <td>85</td>\n",
" <td>180</td>\n",
" <td>90.0</td>\n",
" <td>A</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>85</td>\n",
" <td>80</td>\n",
" <td>165</td>\n",
" <td>82.5</td>\n",
" <td>B</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>10</td>\n",
" <td>30</td>\n",
" <td>40</td>\n",
" <td>20.0</td>\n",
" <td>F</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name midterm final total average grade\n",
"0 John 95 85 180 90.0 A\n",
"1 Jenny 85 80 165 82.5 B\n",
"2 Nate 10 30 40 20.0 F"
]
},
"execution_count": 274,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to use the apply function\n",
"`apply` helps you write transformations concisely.\n",
"Called on a column, it runs your function once for each value in that column and returns the results as a new Series."
]
},
{
"cell_type": "code",
"execution_count": 275,
"metadata": {},
"outputs": [],
"source": [
"def pass_or_fail(grade):\n",
"    # Print each value to show that apply calls the function once per element.\n",
"    print(grade)\n",
"    if grade != \"F\":\n",
"        return 'Pass'\n",
"    else:\n",
"        return 'Fail'"
]
},
{
"cell_type": "code",
"execution_count": 276,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"A\n",
"B\n",
"F\n"
]
}
],
"source": [
"df.grade = df.grade.apply(pass_or_fail)"
]
},
{
"cell_type": "code",
"execution_count": 277,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>midterm</th>\n",
" <th>final</th>\n",
" <th>total</th>\n",
" <th>average</th>\n",
" <th>grade</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>95</td>\n",
" <td>85</td>\n",
" <td>180</td>\n",
" <td>90.0</td>\n",
" <td>Pass</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>85</td>\n",
" <td>80</td>\n",
" <td>165</td>\n",
" <td>82.5</td>\n",
" <td>Pass</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>10</td>\n",
" <td>30</td>\n",
" <td>40</td>\n",
" <td>20.0</td>\n",
" <td>Fail</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name midterm final total average grade\n",
"0 John 95 85 180 90.0 Pass\n",
"1 Jenny 85 80 165 82.5 Pass\n",
"2 Nate 10 30 40 20.0 Fail"
]
},
"execution_count": 277,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extracting information with apply"
]
},
{
"cell_type": "code",
"execution_count": 278,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>yyyy-mm-dd</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2000-06-27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2002-09-24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2005-12-20</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" yyyy-mm-dd\n",
"0 2000-06-27\n",
"1 2002-09-24\n",
"2 2005-12-20"
]
},
"execution_count": 278,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"date_list = [{'yyyy-mm-dd': '2000-06-27'},\n",
"             {'yyyy-mm-dd': '2002-09-24'},\n",
"             {'yyyy-mm-dd': '2005-12-20'}]\n",
"df = pd.DataFrame(date_list, columns=['yyyy-mm-dd'])\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 279,
"metadata": {},
"outputs": [],
"source": [
"def extract_year(date_str):\n",
"    # 'yyyy-mm-dd' -> 'yyyy' (the result is still a string)\n",
"    return date_str.split('-')[0]"
]
},
{
"cell_type": "code",
"execution_count": 280,
"metadata": {},
"outputs": [],
"source": [
"df['year'] = df['yyyy-mm-dd'].apply(extract_year)"
]
},
{
"cell_type": "code",
"execution_count": 281,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>yyyy-mm-dd</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2000-06-27</td>\n",
" <td>2000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2002-09-24</td>\n",
" <td>2002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2005-12-20</td>\n",
" <td>2005</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" yyyy-mm-dd year\n",
"0 2000-06-27 2000\n",
"1 2002-09-24 2002\n",
"2 2005-12-20 2005"
]
},
"execution_count": 281,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Passing a keyword argument to apply\n",
"You can also pass extra keyword arguments through `apply`; they are forwarded to your function on every call."
]
},
{
"cell_type": "code",
"execution_count": 282,
"metadata": {},
"outputs": [],
"source": [
"def get_age(year, current_year):\n",
"    # year arrives as a string such as '2000', so convert it before subtracting.\n",
"    return current_year - int(year)"
]
},
{
"cell_type": "code",
"execution_count": 283,
"metadata": {},
"outputs": [],
"source": [
"df['age'] = df['year'].apply(get_age, current_year=2018)"
]
},
{
"cell_type": "code",
"execution_count": 284,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>yyyy-mm-dd</th>\n",
" <th>year</th>\n",
" <th>age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2000-06-27</td>\n",
" <td>2000</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2002-09-24</td>\n",
" <td>2002</td>\n",
" <td>16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2005-12-20</td>\n",
" <td>2005</td>\n",
" <td>13</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" yyyy-mm-dd year age\n",
"0 2000-06-27 2000 18\n",
"1 2002-09-24 2002 16\n",
"2 2005-12-20 2005 13"
]
},
"execution_count": 284,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Passing multiple keyword arguments to apply\n",
"You can forward several keyword arguments to your function in the same way."
]
},
{
"cell_type": "code",
"execution_count": 285,
"metadata": {},
"outputs": [],
"source": [
"def get_introduce(age, prefix, suffix):\n",
"    return prefix + str(age) + suffix"
]
},
{
"cell_type": "code",
"execution_count": 286,
"metadata": {},
"outputs": [],
"source": [
"df['introduce'] = df['age'].apply(get_introduce, prefix=\"I am \", suffix=\" years old\")"
]
},
{
"cell_type": "code",
"execution_count": 287,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>yyyy-mm-dd</th>\n",
" <th>year</th>\n",
" <th>age</th>\n",
" <th>introduce</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2000-06-27</td>\n",
" <td>2000</td>\n",
" <td>18</td>\n",
" <td>I am 18 years old</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2002-09-24</td>\n",
" <td>2002</td>\n",
" <td>16</td>\n",
" <td>I am 16 years old</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2005-12-20</td>\n",
" <td>2005</td>\n",
" <td>13</td>\n",
" <td>I am 13 years old</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" yyyy-mm-dd year age introduce\n",
"0 2000-06-27 2000 18 I am 18 years old\n",
"1 2002-09-24 2002 16 I am 16 years old\n",
"2 2005-12-20 2005 13 I am 13 years old"
]
},
"execution_count": 287,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Passing multiple columns to apply\n",
"Call `apply` on the DataFrame with `axis=1` to pass each row to your function, which can then read any of that row's columns."
]
},
{
"cell_type": "code",
"execution_count": 288,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>yyyy-mm-dd</th>\n",
" <th>year</th>\n",
" <th>age</th>\n",
" <th>introduce</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2000-06-27</td>\n",
" <td>2000</td>\n",
" <td>18</td>\n",
" <td>I was born in 2000 my age is 18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2002-09-24</td>\n",
" <td>2002</td>\n",
" <td>16</td>\n",
" <td>I was born in 2002 my age is 16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2005-12-20</td>\n",
" <td>2005</td>\n",
" <td>13</td>\n",
" <td>I was born in 2005 my age is 13</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" yyyy-mm-dd year age introduce\n",
"0 2000-06-27 2000 18 I was born in 2000 my age is 18\n",
"1 2002-09-24 2002 16 I was born in 2002 my age is 16\n",
"2 2005-12-20 2005 13 I was born in 2005 my age is 13"
]
},
"execution_count": 288,
"metadata": {},
"output_type": "execute_result"
}
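],
"source": [
"def get_introduce2(row):\n",
"    # With axis=1, row is a Series holding one row's values, indexed by column name.\n",
"    return \"I was born in \" + str(row.year) + \" my age is \" + str(row.age)\n",
"\n",
"df['introduce'] = df.apply(get_introduce2, axis=1)\n",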
"\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to use the map function\n",
"If you pass a function, `map` behaves like `apply` on a column: the function is called on each value."
]
},
{
"cell_type": "code",
"execution_count": 289,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>yyyy-mm-dd</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2000-06-27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2002-09-24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2005-12-20</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" yyyy-mm-dd\n",
"0 2000-06-27\n",
"1 2002-09-24\n",
"2 2005-12-20"
]
},
"execution_count": 289,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"date_list = [{'yyyy-mm-dd': '2000-06-27'},\n",
"             {'yyyy-mm-dd': '2002-09-24'},\n",
"             {'yyyy-mm-dd': '2005-12-20'}]\n",
"df = pd.DataFrame(date_list, columns=['yyyy-mm-dd'])\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 290,
"metadata": {},
"outputs": [],
"source": [
"def extract_year(date_str):\n",
"    # 'yyyy-mm-dd' -> 'yyyy' (the result is still a string)\n",
"    return date_str.split('-')[0]"
]
},
{
"cell_type": "code",
"execution_count": 291,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>yyyy-mm-dd</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2000-06-27</td>\n",
" <td>2000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2002-09-24</td>\n",
" <td>2002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2005-12-20</td>\n",
" <td>2005</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" yyyy-mm-dd year\n",
"0 2000-06-27 2000\n",
"1 2002-09-24 2002\n",
"2 2005-12-20 2005"
]
},
"execution_count": 291,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['year'] = df['yyyy-mm-dd'].map(extract_year)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you pass a dictionary, each value in the column is replaced by the dictionary entry for that value,\n",
"that is, `new_value = mapping[old_value]`.\n",
"Values with no matching key become `NaN`."
]
},
{
"cell_type": "code",
"execution_count": 292,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>20</td>\n",
" <td>student</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>30</td>\n",
" <td>developer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>30</td>\n",
" <td>teacher</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job\n",
"0 20 student\n",
"1 30 developer\n",
"2 30 teacher"
]
},
"execution_count": 292,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"job_list = [{'age': 20, 'job': 'student'},\n",
"            {'age': 30, 'job': 'developer'},\n",
"            {'age': 30, 'job': 'teacher'}]\n",
"df = pd.DataFrame(job_list)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 293,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>20</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>30</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>30</td>\n",
" <td>3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job\n",
"0 20 1\n",
"1 30 2\n",
"2 30 3"
]
},
"execution_count": 293,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.job = df.job.map({\"student\":1,\"developer\":2,\"teacher\":3})\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Applymap\n",
"update all elements in the dataframe at once"
]
},
{
"cell_type": "code",
"execution_count": 294,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>x</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.5</td>\n",
" <td>-5.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-5.2</td>\n",
" <td>5.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>-1.6</td>\n",
" <td>-4.5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" x y\n",
"0 5.5 -5.6\n",
"1 -5.2 5.5\n",
"2 -1.6 -4.5"
]
},
"execution_count": 294,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x_y = [{'x': 5.5, 'y': -5.6},\n",
" {'x': -5.2, 'y': 5.5},\n",
" {'x': -1.6, 'y': -4.5}]\n",
"df = pd.DataFrame(x_y)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 295,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>x</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>6.0</td>\n",
" <td>-6.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-5.0</td>\n",
" <td>6.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>-2.0</td>\n",
" <td>-4.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" x y\n",
"0 6.0 -6.0\n",
"1 -5.0 6.0\n",
"2 -2.0 -4.0"
]
},
"execution_count": 295,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = df.applymap(np.around)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Add Row"
]
},
{
"cell_type": "code",
"execution_count": 296,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>midterm</th>\n",
" <th>final</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>95</td>\n",
" <td>85</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>85</td>\n",
" <td>80</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>10</td>\n",
" <td>30</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name midterm final\n",
"0 John 95 85\n",
"1 Jenny 85 80\n",
"2 Nate 10 30"
]
},
"execution_count": 296,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"friend_dict_list = [{'name': 'John', 'midterm': 95, 'final': 85},\n",
" {'name': 'Jenny', 'midterm': 85, 'final': 80},\n",
" {'name': 'Nate', 'midterm': 10, 'final': 30}]\n",
"df = pd.DataFrame(friend_dict_list, columns = ['name', 'midterm', 'final'])\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 297,
"metadata": {},
"outputs": [],
"source": [
"df2 = pd.DataFrame([['Ben', 50,50]], columns = ['name', 'midterm', 'final'])"
]
},
{
"cell_type": "code",
"execution_count": 298,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>midterm</th>\n",
" <th>final</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Ben</td>\n",
" <td>50</td>\n",
" <td>50</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name midterm final\n",
"0 Ben 50 50"
]
},
"execution_count": 298,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2.head()"
]
},
{
"cell_type": "code",
"execution_count": 299,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>midterm</th>\n",
" <th>final</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>95</td>\n",
" <td>85</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jenny</td>\n",
" <td>85</td>\n",
" <td>80</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nate</td>\n",
" <td>10</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Ben</td>\n",
" <td>50</td>\n",
" <td>50</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name midterm final\n",
"0 John 95 85\n",
"1 Jenny 85 80\n",
"2 Nate 10 30\n",
"3 Ben 50 50"
]
},
"execution_count": 299,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.append(df2, ignore_index=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Group by\n",
"group by command helps to get more information from given data"
]
},
{
"cell_type": "code",
"execution_count": 300,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>major</th>\n",
" <th>sex</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Nate</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Abraham</td>\n",
" <td>Physics</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Brian</td>\n",
" <td>Psychology</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Janny</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Yuna</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Jeniffer</td>\n",
" <td>Computer Science</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Edward</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Zara</td>\n",
" <td>Psychology</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Wendy</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Sera</td>\n",
" <td>Psychology</td>\n",
" <td>female</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name major sex\n",
"0 John Computer Science male\n",
"1 Nate Computer Science male\n",
"2 Abraham Physics male\n",
"3 Brian Psychology male\n",
"4 Janny Economics female\n",
"5 Yuna Economics female\n",
"6 Jeniffer Computer Science female\n",
"7 Edward Computer Science male\n",
"8 Zara Psychology female\n",
"9 Wendy Economics female\n",
"10 Sera Psychology female"
]
},
"execution_count": 300,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"student_list = [{'name': 'John', 'major': \"Computer Science\", 'sex': \"male\"},\n",
" {'name': 'Nate', 'major': \"Computer Science\", 'sex': \"male\"},\n",
" {'name': 'Abraham', 'major': \"Physics\", 'sex': \"male\"},\n",
" {'name': 'Brian', 'major': \"Psychology\", 'sex': \"male\"},\n",
" {'name': 'Janny', 'major': \"Economics\", 'sex': \"female\"},\n",
" {'name': 'Yuna', 'major': \"Economics\", 'sex': \"female\"},\n",
" {'name': 'Jeniffer', 'major': \"Computer Science\", 'sex': \"female\"},\n",
" {'name': 'Edward', 'major': \"Computer Science\", 'sex': \"male\"},\n",
" {'name': 'Zara', 'major': \"Psychology\", 'sex': \"female\"},\n",
" {'name': 'Wendy', 'major': \"Economics\", 'sex': \"female\"},\n",
" {'name': 'Sera', 'major': \"Psychology\", 'sex': \"female\"}\n",
" ]\n",
"df = pd.DataFrame(student_list, columns = ['name', 'major', 'sex'])\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 301,
"metadata": {},
"outputs": [],
"source": [
"groupby_major = df.groupby('major')"
]
},
{
"cell_type": "code",
"execution_count": 302,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'Computer Science': Int64Index([0, 1, 6, 7], dtype='int64'),\n",
" 'Economics': Int64Index([4, 5, 9], dtype='int64'),\n",
" 'Physics': Int64Index([2], dtype='int64'),\n",
" 'Psychology': Int64Index([3, 8, 10], dtype='int64')}"
]
},
"execution_count": 302,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"groupby_major.groups"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"here we can see, computer science has mostly man, while economic has mostly woman students"
]
},
{
"cell_type": "code",
"execution_count": 303,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Computer Science: 4\n",
" name major sex\n",
"0 John Computer Science male\n",
"1 Nate Computer Science male\n",
"6 Jeniffer Computer Science female\n",
"7 Edward Computer Science male\n",
"\n",
"Economics: 3\n",
" name major sex\n",
"4 Janny Economics female\n",
"5 Yuna Economics female\n",
"9 Wendy Economics female\n",
"\n",
"Physics: 1\n",
" name major sex\n",
"2 Abraham Physics male\n",
"\n",
"Psychology: 3\n",
" name major sex\n",
"3 Brian Psychology male\n",
"8 Zara Psychology female\n",
"10 Sera Psychology female\n",
"\n"
]
}
],
"source": [
"for name, group in groupby_major:\n",
" print(name + \": \" + str(len(group)))\n",
" print(group)\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### group object to dataframe"
]
},
{
"cell_type": "code",
"execution_count": 304,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>major</th>\n",
" <th>count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Computer Science</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Economics</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Physics</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Psychology</td>\n",
" <td>3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" major count\n",
"0 Computer Science 4\n",
"1 Economics 3\n",
"2 Physics 1\n",
"3 Psychology 3"
]
},
"execution_count": 304,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_major_cnt = pd.DataFrame({'count' : groupby_major.size()}).reset_index()\n",
"df_major_cnt"
]
},
{
"cell_type": "code",
"execution_count": 305,
"metadata": {},
"outputs": [],
"source": [
"groupby_sex = df.groupby('sex')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"here we can see, this school has balanced woman and man ratio"
]
},
{
"cell_type": "code",
"execution_count": 306,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"female: 6\n",
" name major sex\n",
"4 Janny Economics female\n",
"5 Yuna Economics female\n",
"6 Jeniffer Computer Science female\n",
"8 Zara Psychology female\n",
"9 Wendy Economics female\n",
"10 Sera Psychology female\n",
"\n",
"male: 5\n",
" name major sex\n",
"0 John Computer Science male\n",
"1 Nate Computer Science male\n",
"2 Abraham Physics male\n",
"3 Brian Psychology male\n",
"7 Edward Computer Science male\n",
"\n"
]
}
],
"source": [
"for name, group in groupby_sex:\n",
" print(name + \": \" + str(len(group)))\n",
" print(group)\n",
" print()"
]
},
{
"cell_type": "code",
"execution_count": 307,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sex</th>\n",
" <th>count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>female</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>male</td>\n",
" <td>5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sex count\n",
"0 female 6\n",
"1 male 5"
]
},
"execution_count": 307,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_sex_cnt = pd.DataFrame({'count' : groupby_sex.size()}).reset_index()\n",
"df_sex_cnt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Drop Duplicate\n",
"sometimes you need to drop duplicate rows and here is elegant way to to it"
]
},
{
"cell_type": "code",
"execution_count": 308,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>major</th>\n",
" <th>sex</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Nate</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Abraham</td>\n",
" <td>Physics</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Brian</td>\n",
" <td>Psychology</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Janny</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Yuna</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Jeniffer</td>\n",
" <td>Computer Science</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Edward</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Zara</td>\n",
" <td>Psychology</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Wendy</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Sera</td>\n",
" <td>Psychology</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>John</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name major sex\n",
"0 John Computer Science male\n",
"1 Nate Computer Science male\n",
"2 Abraham Physics male\n",
"3 Brian Psychology male\n",
"4 Janny Economics female\n",
"5 Yuna Economics female\n",
"6 Jeniffer Computer Science female\n",
"7 Edward Computer Science male\n",
"8 Zara Psychology female\n",
"9 Wendy Economics female\n",
"10 Sera Psychology female\n",
"11 John Computer Science male"
]
},
"execution_count": 308,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"student_list = [{'name': 'John', 'major': \"Computer Science\", 'sex': \"male\"},\n",
" {'name': 'Nate', 'major': \"Computer Science\", 'sex': \"male\"},\n",
" {'name': 'Abraham', 'major': \"Physics\", 'sex': \"male\"},\n",
" {'name': 'Brian', 'major': \"Psychology\", 'sex': \"male\"},\n",
" {'name': 'Janny', 'major': \"Economics\", 'sex': \"female\"},\n",
" {'name': 'Yuna', 'major': \"Economics\", 'sex': \"female\"},\n",
" {'name': 'Jeniffer', 'major': \"Computer Science\", 'sex': \"female\"},\n",
" {'name': 'Edward', 'major': \"Computer Science\", 'sex': \"male\"},\n",
" {'name': 'Zara', 'major': \"Psychology\", 'sex': \"female\"},\n",
" {'name': 'Wendy', 'major': \"Economics\", 'sex': \"female\"},\n",
" {'name': 'Sera', 'major': \"Psychology\", 'sex': \"female\"},\n",
" {'name': 'John', 'major': \"Computer Science\", 'sex': \"male\"},\n",
" ]\n",
"df = pd.DataFrame(student_list, columns = ['name', 'major', 'sex'])\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## check if there is duplicated row"
]
},
{
"cell_type": "code",
"execution_count": 309,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
"5 False\n",
"6 False\n",
"7 False\n",
"8 False\n",
"9 False\n",
"10 False\n",
"11 True\n",
"dtype: bool"
]
},
"execution_count": 309,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.duplicated()"
]
},
{
"cell_type": "code",
"execution_count": 310,
"metadata": {},
"outputs": [],
"source": [
"df = df.drop_duplicates()"
]
},
{
"cell_type": "code",
"execution_count": 311,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>major</th>\n",
" <th>sex</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Nate</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Abraham</td>\n",
" <td>Physics</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Brian</td>\n",
" <td>Psychology</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Janny</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Yuna</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Jeniffer</td>\n",
" <td>Computer Science</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Edward</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Zara</td>\n",
" <td>Psychology</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Wendy</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Sera</td>\n",
" <td>Psychology</td>\n",
" <td>female</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name major sex\n",
"0 John Computer Science male\n",
"1 Nate Computer Science male\n",
"2 Abraham Physics male\n",
"3 Brian Psychology male\n",
"4 Janny Economics female\n",
"5 Yuna Economics female\n",
"6 Jeniffer Computer Science female\n",
"7 Edward Computer Science male\n",
"8 Zara Psychology female\n",
"9 Wendy Economics female\n",
"10 Sera Psychology female"
]
},
"execution_count": 311,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 312,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>major</th>\n",
" <th>sex</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Nate</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Abraham</td>\n",
" <td>Physics</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Brian</td>\n",
" <td>Psychology</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Janny</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Yuna</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Jeniffer</td>\n",
" <td>Computer Science</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Edward</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Zara</td>\n",
" <td>Psychology</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Wendy</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Nate</td>\n",
" <td>None</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>John</td>\n",
" <td>Computer Science</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name major sex\n",
"0 John Computer Science male\n",
"1 Nate Computer Science male\n",
"2 Abraham Physics male\n",
"3 Brian Psychology male\n",
"4 Janny Economics female\n",
"5 Yuna Economics female\n",
"6 Jeniffer Computer Science female\n",
"7 Edward Computer Science male\n",
"8 Zara Psychology female\n",
"9 Wendy Economics female\n",
"10 Nate None male\n",
"11 John Computer Science None"
]
},
"execution_count": 312,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"student_list = [{'name': 'John', 'major': \"Computer Science\", 'sex': \"male\"},\n",
" {'name': 'Nate', 'major': \"Computer Science\", 'sex': \"male\"},\n",
" {'name': 'Abraham', 'major': \"Physics\", 'sex': \"male\"},\n",
" {'name': 'Brian', 'major': \"Psychology\", 'sex': \"male\"},\n",
" {'name': 'Janny', 'major': \"Economics\", 'sex': \"female\"},\n",
" {'name': 'Yuna', 'major': \"Economics\", 'sex': \"female\"},\n",
" {'name': 'Jeniffer', 'major': \"Computer Science\", 'sex': \"female\"},\n",
" {'name': 'Edward', 'major': \"Computer Science\", 'sex': \"male\"},\n",
" {'name': 'Zara', 'major': \"Psychology\", 'sex': \"female\"},\n",
" {'name': 'Wendy', 'major': \"Economics\", 'sex': \"female\"},\n",
" {'name': 'Nate', 'major': None, 'sex': \"male\"},\n",
" {'name': 'John', 'major': \"Computer Science\", 'sex': None},\n",
" ]\n",
"df = pd.DataFrame(student_list, columns = ['name', 'major', 'sex'])\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 313,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
"5 False\n",
"6 False\n",
"7 False\n",
"8 False\n",
"9 False\n",
"10 True\n",
"11 True\n",
"dtype: bool"
]
},
"execution_count": 313,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.duplicated(['name'])"
]
},
{
"cell_type": "code",
"execution_count": 314,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>major</th>\n",
" <th>sex</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Abraham</td>\n",
" <td>Physics</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Brian</td>\n",
" <td>Psychology</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Janny</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Yuna</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Jeniffer</td>\n",
" <td>Computer Science</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Edward</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Zara</td>\n",
" <td>Psychology</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Wendy</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Nate</td>\n",
" <td>None</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>John</td>\n",
" <td>Computer Science</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name major sex\n",
"2 Abraham Physics male\n",
"3 Brian Psychology male\n",
"4 Janny Economics female\n",
"5 Yuna Economics female\n",
"6 Jeniffer Computer Science female\n",
"7 Edward Computer Science male\n",
"8 Zara Psychology female\n",
"9 Wendy Economics female\n",
"10 Nate None male\n",
"11 John Computer Science None"
]
},
"execution_count": 314,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.drop_duplicates(['name'], keep='last')"
]
},
{
"cell_type": "code",
"execution_count": 315,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>major</th>\n",
" <th>sex</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Nate</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Abraham</td>\n",
" <td>Physics</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Brian</td>\n",
" <td>Psychology</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Janny</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Yuna</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Jeniffer</td>\n",
" <td>Computer Science</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Edward</td>\n",
" <td>Computer Science</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Zara</td>\n",
" <td>Psychology</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Wendy</td>\n",
" <td>Economics</td>\n",
" <td>female</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Nate</td>\n",
" <td>None</td>\n",
" <td>male</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>John</td>\n",
" <td>Computer Science</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name major sex\n",
"0 John Computer Science male\n",
"1 Nate Computer Science male\n",
"2 Abraham Physics male\n",
"3 Brian Psychology male\n",
"4 Janny Economics female\n",
"5 Yuna Economics female\n",
"6 Jeniffer Computer Science female\n",
"7 Edward Computer Science male\n",
"8 Zara Psychology female\n",
"9 Wendy Economics female\n",
"10 Nate None male\n",
"11 John Computer Science None"
]
},
"execution_count": 315,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# how to manage None value?"
]
},
{
"cell_type": "code",
"execution_count": 316,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>job</th>\n",
" <th>age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>teacher</td>\n",
" <td>40.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Nate</td>\n",
" <td>teacher</td>\n",
" <td>35.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Yuna</td>\n",
" <td>teacher</td>\n",
" <td>37.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Abraham</td>\n",
" <td>student</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>student</td>\n",
" <td>12.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Janny</td>\n",
" <td>student</td>\n",
" <td>11.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Nate</td>\n",
" <td>teacher</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>John</td>\n",
" <td>student</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name job age\n",
"0 John teacher 40.0\n",
"1 Nate teacher 35.0\n",
"2 Yuna teacher 37.0\n",
"3 Abraham student 10.0\n",
"4 Brian student 12.0\n",
"5 Janny student 11.0\n",
"6 Nate teacher NaN\n",
"7 John student NaN"
]
},
"execution_count": 316,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"school_id_list = [{'name': 'John', 'job': \"teacher\", 'age': 40},\n",
" {'name': 'Nate', 'job': \"teacher\", 'age': 35},\n",
" {'name': 'Yuna', 'job': \"teacher\", 'age': 37},\n",
" {'name': 'Abraham', 'job': \"student\", 'age': 10},\n",
" {'name': 'Brian', 'job': \"student\", 'age': 12},\n",
" {'name': 'Janny', 'job': \"student\", 'age': 11},\n",
" {'name': 'Nate', 'job': \"teacher\", 'age': None},\n",
" {'name': 'John', 'job': \"student\", 'age': None}\n",
" ]\n",
"df = pd.DataFrame(school_id_list, columns = ['name', 'job', 'age'])\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to check for Null or NaN values"
]
},
{
"cell_type": "code",
"execution_count": 317,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 8 entries, 0 to 7\n",
"Data columns (total 3 columns):\n",
"name 8 non-null object\n",
"job 8 non-null object\n",
"age 6 non-null float64\n",
"dtypes: float64(1), object(2)\n",
"memory usage: 272.0+ bytes\n"
]
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "code",
"execution_count": 318,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>job</th>\n",
" <th>age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name job age\n",
"0 False False False\n",
"1 False False False\n",
"2 False False False\n",
"3 False False False\n",
"4 False False False\n",
"5 False False False\n",
"6 False False True\n",
"7 False False True"
]
},
"execution_count": 318,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.isna()"
]
},
{
"cell_type": "code",
"execution_count": 319,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>job</th>\n",
" <th>age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name job age\n",
"0 False False False\n",
"1 False False False\n",
"2 False False False\n",
"3 False False False\n",
"4 False False False\n",
"5 False False False\n",
"6 False False True\n",
"7 False False True"
]
},
"execution_count": 319,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.isnull()"
]
},
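{
"cell_type": "markdown",
"metadata": {},
"source": [
"`df.isna()` and `df.isnull()` above print identical masks; pandas documents `isnull` as an alias of `isna`. The following is a minimal sketch, not part of the original notebook, of reducing that mask to per-column counts. The frame `sample` and the variable names are assumptions made only for this illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical sample with the same columns as the notebook's df.\n",
"sample = pd.DataFrame({\n",
"    'name': ['John', 'Nate', 'Yuna'],\n",
"    'job': ['teacher', 'teacher', 'student'],\n",
"    'age': [40, None, None],\n",
"})\n",
"\n",
"# Number of missing values in each column.\n",
"missing_per_column = sample.isna().sum()\n",
"\n",
"# True if at least one cell in the whole frame is missing.\n",
"has_missing = sample.isna().any().any()\n",
"\n",
"# isna() and isnull() produce the same boolean mask.\n",
"same_mask = sample.isna().equals(sample.isnull())\n",
"```\n",
"\n",
"Summing the mask is usually easier to read than the full True/False table once a frame has more than a few rows."
]
},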
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to fill Null or NaN values"
]
},
{
"cell_type": "code",
"execution_count": 320,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>job</th>\n",
" <th>age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>teacher</td>\n",
" <td>40.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Nate</td>\n",
" <td>teacher</td>\n",
" <td>35.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Yuna</td>\n",
" <td>teacher</td>\n",
" <td>37.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Abraham</td>\n",
" <td>student</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>student</td>\n",
" <td>12.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Janny</td>\n",
" <td>student</td>\n",
" <td>11.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Nate</td>\n",
" <td>teacher</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>John</td>\n",
" <td>student</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name job age\n",
"0 John teacher 40.0\n",
"1 Nate teacher 35.0\n",
"2 Yuna teacher 37.0\n",
"3 Abraham student 10.0\n",
"4 Brian student 12.0\n",
"5 Janny student 11.0\n",
"6 Nate teacher 0.0\n",
"7 John student 0.0"
]
},
"execution_count": 320,
"metadata": {},
"output_type": "execute_result"
}
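],
"source": [
"# tmp = df binds a second name to the same DataFrame (no copy is made),\n",
"# so filling tmp[\"age\"] also replaces the NaN values in df; use df.copy() to keep df unchanged\n",
"tmp = df\n",
"tmp[\"age\"] = tmp[\"age\"].fillna(0)\n",
"tmp"
]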
},
{
"cell_type": "code",
"execution_count": 321,
"metadata": {},
"outputs": [],
"source": [
"# fill missing age with the median age of each job group (teacher, student)\n",
"# note: df has no NaN left at this point because the previous cell filled them with 0 through tmp\n",
"df[\"age\"] = df[\"age\"].fillna(df.groupby(\"job\")[\"age\"].transform(\"median\"))"
]
},
{
"cell_type": "code",
"execution_count": 322,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>job</th>\n",
" <th>age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>John</td>\n",
" <td>teacher</td>\n",
" <td>40.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Nate</td>\n",
" <td>teacher</td>\n",
" <td>35.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Yuna</td>\n",
" <td>teacher</td>\n",
" <td>37.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Abraham</td>\n",
" <td>student</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brian</td>\n",
" <td>student</td>\n",
" <td>12.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Janny</td>\n",
" <td>student</td>\n",
" <td>11.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Nate</td>\n",
" <td>teacher</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>John</td>\n",
" <td>student</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name job age\n",
"0 John teacher 40.0\n",
"1 Nate teacher 35.0\n",
"2 Yuna teacher 37.0\n",
"3 Abraham student 10.0\n",
"4 Brian student 12.0\n",
"5 Janny student 11.0\n",
"6 Nate teacher 0.0\n",
"7 John student 0.0"
]
},
"execution_count": 322,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
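{
"cell_type": "markdown",
"metadata": {},
"source": [
"The table above still shows 0.0 for the two rows that had no age, because `tmp = df` made `tmp` and `df` the same object and the zeros were written before the group median was applied. The following is a minimal sketch, not part of the original notebook, of keeping the two fills independent. The frame `sample` and the names `zero_filled`, `group_median` and `median_filled` are assumptions made only for this illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical sample mirroring the notebook's data; two ages are missing.\n",
"sample = pd.DataFrame({\n",
"    'name': ['John', 'Nate', 'Yuna', 'Abraham', 'Brian', 'Janny', 'Nate', 'John'],\n",
"    'job': ['teacher', 'teacher', 'teacher', 'student',\n",
"            'student', 'student', 'teacher', 'student'],\n",
"    'age': [40, 35, 37, 10, 12, 11, None, None],\n",
"})\n",
"\n",
"# Work on a copy so the zero-filled version does not overwrite sample.\n",
"zero_filled = sample.copy()\n",
"zero_filled['age'] = zero_filled['age'].fillna(0)\n",
"\n",
"# Median age per job, aligned row by row with sample (NaN is skipped).\n",
"group_median = sample.groupby('job')['age'].transform('median')\n",
"\n",
"# Fill each missing age with the median of that row's job group.\n",
"median_filled = sample.assign(age=sample['age'].fillna(group_median))\n",
"```\n",
"\n",
"Because `sample` is never modified, the zero fill and the median fill can be compared side by side."
]
},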
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unique"
]
},
{
"cell_type": "code",
"execution_count": 323,
"metadata": {},
"outputs": [],
"source": [
"job_list = [{'name': 'John', 'job': \