Repository: milaan9/DataScience_Interview_Questions
Branch: main
Commit: b515c84b6b42
Files: 12
Total size: 204.3 KB
Directory structure:
gitextract_4xh2ybvq/
├── 01_120_Python_Basics_Interview_Questions.ipynb
├── 02_Predictive_Modeling.ipynb
├── 03_Programming.ipynb
├── 04_Probability.ipynb
├── 05_Statistical_Inference.ipynb
├── 06_Data_Analysis.ipynb
├── 07_Product_Metrics.ipynb
├── 08_Communication.ipynb
├── 09_Coding.ipynb
├── 10_Linkedin_Skill_Assessment_Python.ipynb
├── LICENSE
└── README.md
================================================
FILE CONTENTS
================================================
================================================
FILE: 01_120_Python_Basics_Interview_Questions.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<small><small><i>\n",
"All the IPython Notebooks in **Data Science Interview Questions** lecture series by **[Dr. Milaan Parmar](https://www.linkedin.com/in/milaanparmar/)** are available @ **[GitHub](https://github.com/milaan9/DataScience_Interview_Questions)**\n",
"</i></small></small>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Python Basics ➞ <span class='label label-default'>120 Questions</span>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. What is Python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is designed to be highly readable. It frequently uses English keywords whereas other languages use punctuation, and it has fewer syntactical constructions than other languages."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Name some of the features of Python.\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"Following are some of the salient features of python −\n",
"\n",
"* It supports functional and structured programming methods as well as OOP.\n",
"\n",
"* It can be used as a scripting language or can be compiled to byte-code for building large applications.\n",
"\n",
"* It provides very high-level dynamic data types and supports dynamic type checking.\n",
"\n",
"* It supports automatic garbage collection.\n",
"\n",
"* It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. What is the purpose of PYTHONPATH environment variable?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- PYTHONPATH - It has a role similar to PATH. This variable tells the Python interpreter where to locate the module files imported into a program. It should include the Python source library directory and the directories containing Python source code. PYTHONPATH is sometimes preset by the Python installer."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. What is the purpose of PYTHONSTARTUP environment variable?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- PYTHONSTARTUP - It contains the path of an initialization file containing Python source code. It is executed every time you start the interpreter. It is named as .pythonrc.py in Unix and it contains commands that load utilities or modify PYTHONPATH."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. What is the purpose of PYTHONCASEOK environment variable?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- PYTHONCASEOK − It is used in Windows to instruct Python to find the first case-insensitive match in an import statement. Set this variable to any value to activate it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. What is the purpose of PYTHONHOME environment variable?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- PYTHONHOME − It sets an alternative location for the standard Python libraries, which changes the default module search path. It is usually embedded in the PYTHONSTARTUP or PYTHONPATH directories to make switching module libraries easy."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7. Is python a case sensitive language?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Yes! Python is a case sensitive programming language."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 8. What are the supported data types in Python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Python has five standard data types:\n",
" 1. Numbers\n",
" 2. String\n",
" 3. List\n",
" 4. Tuple\n",
" 5. Dictionary"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 9. What is the output of print `str` if `str = 'Hello World!'`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print complete string. \n",
"- Output would be `Hello World!`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10. What is the output of print `str[0]` if `str = 'Hello World!'`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print first character of the string. Output would be H."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 11. What is the output of print `str[2:5]` if `str = 'Hello World!'`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print characters starting from 3rd to 5th. \n",
"- Output would be `llo`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 12. What is the output of print `str[2:]` if `str = 'Hello World!'`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print characters starting from 3rd character. \n",
"- Output would be `llo World!`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 13. What is the output of print `str * 2` if `str = 'Hello World!'`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print string two times. \n",
"- Output would be `Hello World!Hello World!`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 14. What is the output of print `str + \"TEST\"` if `str = 'Hello World!'`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print concatenated string. \n",
"- Output would be `Hello World!TEST`"
]
},
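{
"cell_type": "markdown",
"metadata": {},
"source": [
"The string operations from questions 9 to 14 can be tried together in one short sketch. It assumes Python 3 (`print` as a function) and uses the illustrative name `text` rather than `str`, so the built-in `str` type is not shadowed.\n",
"\n",
"```python\n",
"text = 'Hello World!'\n",
"\n",
"print(text)           # Hello World!\n",
"print(text[0])        # H\n",
"print(text[2:5])      # llo\n",
"print(text[2:])       # llo World!\n",
"print(text * 2)       # Hello World!Hello World!\n",
"print(text + 'TEST')  # Hello World!TEST\n",
"```"
]
},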
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 15. What is the output of print `list` if `list = [ 'abcd', 786 , 2.23, 'john', 70.2 ]`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print the complete list. \n",
"- Output would be `['abcd', 786, 2.23, 'john', 70.2]`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 16. What is the output of print `list[0]` if `list = [ 'abcd', 786 , 2.23, 'john', 70.2 ]`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print first element of the list. \n",
"- Output would be `abcd`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 17. What is the output of print `list[1:3]` if `list = [ 'abcd', 786 , 2.23, 'john', 70.2 ]`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print elements starting from 2nd till 3rd. \n",
"- Output would be `[786, 2.23]`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 18. What is the output of print `list[2:]` if `list = [ 'abcd', 786 , 2.23, 'john', 70.2 ]`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print elements starting from the 3rd element. \n",
"- Output would be `[2.23, 'john', 70.2]`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 19. What is the output of print `tinylist * 2` if `tinylist = [123, 'john']`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print list two times. \n",
"- Output would be `[123, 'john', 123, 'john']`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 20. What is the output of print `list1 + list2`, if `list1 = [ 'abcd', 786 , 2.23, 'john', 70.2 ]` and `list2 = [123, 'john']`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print concatenated lists. \n",
"- Output would be `['abcd', 786, 2.23, 'john', 70.2, 123, 'john']`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 21. What are tuples in Python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- A tuple is another sequence data type that is similar to the list. \n",
"- A tuple consists of a number of values separated by commas. \n",
"- Unlike lists, however, tuples are enclosed within parentheses."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 22. What is the difference between tuples and lists in Python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- The main differences between lists and tuples are: \n",
" - Lists are enclosed in brackets `[ ]` and their elements and size can be changed, while tuples are enclosed in parentheses `( )` and cannot be updated. \n",
" - Tuples can be thought of as read-only lists."
]
},
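{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the difference, assuming Python 3. The names `items` and `fixed` are only illustrative: the list accepts in-place changes, while the same assignment on a tuple raises `TypeError`.\n",
"\n",
"```python\n",
"items = ['abcd', 786, 2.23]   # list: mutable\n",
"fixed = ('abcd', 786, 2.23)   # tuple: immutable\n",
"\n",
"items[0] = 'changed'          # elements can be replaced\n",
"items.append('john')          # and the size can change\n",
"\n",
"try:\n",
"    fixed[0] = 'changed'      # tuples do not support item assignment\n",
"except TypeError as exc:\n",
"    print('tuple update failed:', exc)\n",
"\n",
"print(items)                  # ['changed', 786, 2.23, 'john']\n",
"```"
]
},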
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 23. What is the output of print `tuple` if `tuple = ( 'abcd', 786 , 2.23, 'john', 70.2 )`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print the complete tuple. \n",
"- Output would be `('abcd', 786, 2.23, 'john', 70.2)`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 24. What is the output of print `tuple[0]` if `tuple = ( 'abcd', 786 , 2.23, 'john', 70.2 )`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print first element of the tuple. \n",
"- Output would be `abcd`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 25. What is the output of print `tuple[1:3]` if `tuple = ( 'abcd', 786 , 2.23, 'john', 70.2 )`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print elements starting from 2nd till 3rd. \n",
"- Output would be `(786, 2.23)`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 26. What is the output of print `tuple[2:]` if `tuple = ( 'abcd', 786 , 2.23, 'john', 70.2 )`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print elements starting from the 3rd element. \n",
"- Output would be `(2.23, 'john', 70.2)`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 27. What is the output of print `tinytuple * 2` if `tinytuple = (123, 'john')`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print tuple two times. \n",
"- Output would be `(123, 'john', 123, 'john')`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 28. What is the output of print `tuple + tinytuple` if `tuple = ( 'abcd', 786, 2.23, 'john', 70.2 )` and `tinytuple = (123, 'john')`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- It will print the concatenated tuples. \n",
"- Output would be `('abcd', 786, 2.23, 'john', 70.2, 123, 'john')`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 29. What are Python's dictionaries?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Python's dictionaries are a kind of hash table. \n",
"- They work like associative arrays or hashes found in Perl and consist of key-value pairs. \n",
"- A dictionary key can be any hashable Python object, but is usually a number or a string. \n",
"- Values, on the other hand, can be any arbitrary Python object."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 30. How will you create a dictionary in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Dictionaries are enclosed by curly braces `{ }` and values can be assigned and accessed using square brackets `[]`.\n",
"\n",
"```python\n",
"dict = {}\n",
"dict['one'] = \"This is one\"\n",
"dict[2] = \"This is two\"\n",
"tinydict = {'name': 'john','code':6734, 'dept': 'sales'}\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 31. How will you get all the keys from the dictionary?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Using the `dictionary.keys()` method, we can get all the keys from the dictionary object.\n",
"\n",
"```python\n",
"print(dict.keys())  # Prints all the keys\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 32. How will you get all the values from the dictionary?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Using the `dictionary.values()` method, we can get all the values from the dictionary object.\n",
"\n",
"```python\n",
"print(dict.values())  # Prints all the values\n",
"```"
]
},
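{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dictionary answers above can be exercised with the `tinydict` example from question 30. This sketch assumes Python 3.7 or later, where `keys()` and `values()` return view objects and dictionaries keep insertion order.\n",
"\n",
"```python\n",
"tinydict = {'name': 'john', 'code': 6734, 'dept': 'sales'}\n",
"\n",
"# Views are wrapped in list() to get plain lists.\n",
"print(list(tinydict.keys()))    # ['name', 'code', 'dept']\n",
"print(list(tinydict.values()))  # ['john', 6734, 'sales']\n",
"\n",
"# items() yields (key, value) pairs, which is handy in loops.\n",
"for key, value in tinydict.items():\n",
"    print(key, '->', value)\n",
"```"
]
},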
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 33. How will you convert a string to an int in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `int(x [,base])` - Converts `x` to an integer. `base` specifies the base if `x` is a string."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 34. How will you convert a string to a long in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `long(x [,base])` - Converts `x` to a long integer in Python 2. `base` specifies the base if `x` is a string. Python 3 has no separate `long` type; `int(x [,base])` already has arbitrary precision."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 35. How will you convert a string to a float in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `float(x)` − Converts `x` to a floating-point number."
]
},
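{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short sketch of the numeric conversions from questions 33 to 35, assuming Python 3 (so `int` covers what `long` did in Python 2). The input strings are made up for illustration.\n",
"\n",
"```python\n",
"print(int('42'))        # 42\n",
"print(int('ff', 16))    # 255, base 16\n",
"print(int('1010', 2))   # 10, base 2\n",
"print(int('  7  '))     # 7, surrounding whitespace is ignored\n",
"print(float('3.14'))    # 3.14\n",
"\n",
"try:\n",
"    int('4.2')          # not a valid integer literal\n",
"except ValueError as exc:\n",
"    print('conversion failed:', exc)\n",
"```"
]
},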
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 36. How will you convert an object to a string in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `str(x)` − Converts object `x` to a string representation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 37. How will you convert an object to an expression string in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `repr(x)` − Converts object `x` to an expression string."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 38. How will you convert a String to an object in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `eval(str)` − Evaluates a string and returns an object."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 39. How will you convert a string to a tuple in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `tuple(s)` − Converts `s` to a tuple."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 40. How will you convert a string to a list in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `list(s)` − Converts `s` to a list."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 41. How will you convert a string to a set in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `set(s)` − Converts `s` to a set."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 42. How will you create a dictionary using tuples in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `dict(d)` − Creates a dictionary. `d` must be a sequence of (key,value) tuples."
]
},
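{
"cell_type": "markdown",
"metadata": {},
"source": [
"The container conversions from questions 39 to 42 can be sketched as follows, assuming Python 3. The sample string and the `pairs` list are illustrative only.\n",
"\n",
"```python\n",
"s = 'hello'\n",
"\n",
"print(tuple(s))   # ('h', 'e', 'l', 'l', 'o')\n",
"print(list(s))    # ['h', 'e', 'l', 'l', 'o']\n",
"\n",
"# A set keeps one copy of each character and has no fixed order.\n",
"print(set(s) == {'h', 'e', 'l', 'o'})  # True\n",
"\n",
"# dict() accepts a sequence of (key, value) tuples.\n",
"pairs = [('name', 'john'), ('code', 6734)]\n",
"print(dict(pairs))  # {'name': 'john', 'code': 6734}\n",
"```"
]
},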
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 43. How will you convert a string to a frozen set in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `frozenset(s)` − Converts `s` to a frozen set."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 44. How will you convert an integer to a character in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `chr(x)` − Converts an integer to a character."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 45. How will you convert an integer to a Unicode character in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `unichr(x)` − Converts an integer to a Unicode character in Python 2. In Python 3, `unichr()` was removed and `chr(x)` handles all Unicode code points."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 46. How will you convert a single character to its integer value in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `ord(x)` − Converts a single character to its integer value."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 47. How will you convert an integer to hexadecimal string in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `hex(x)` − Converts an integer to a hexadecimal string."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 48. How will you convert an integer to octal string in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `oct(x)` − Converts an integer to an octal string."
]
},
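{
"cell_type": "markdown",
"metadata": {},
"source": [
"The character and base conversions from questions 44 to 48 can be checked with this sketch, assuming Python 3. Note that the octal prefix printed by `oct()` is `0o` in Python 3.\n",
"\n",
"```python\n",
"print(chr(65))              # A\n",
"print(ord('A'))             # 65\n",
"print(ord(chr(0x20AC)))     # 8364, chr() accepts any Unicode code point\n",
"print(hex(255))             # 0xff\n",
"print(oct(8))               # 0o10\n",
"```"
]
},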
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 49. What is the purpose of `**` operator?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `**` Exponent − Performs exponential (power) calculation on operands. \n",
"- `a**b` = 10 to the power 20 if `a = 10` and `b = 20`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 50. What is the purpose of `//` operator?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `//` Floor Division − The division of operands where the result is the quotient rounded down to the nearest whole number (toward negative infinity), so `7 // 2` is `3` and `-7 // 2` is `-4`."
]
},
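{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sketch of `**` and `//` side by side, assuming Python 3, where `/` is true division. The operands are arbitrary examples.\n",
"\n",
"```python\n",
"print(2 ** 10)     # 1024\n",
"print(7 // 2)      # 3\n",
"print(-7 // 2)     # -4, the result is rounded toward negative infinity\n",
"print(7.0 // 2)    # 3.0, a float operand gives a float result\n",
"print(7 / 2)       # 3.5, true division for comparison\n",
"```"
]
},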
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 51. What is the purpose of `is` operator?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `is` − Evaluates to `True` if the variables on either side of the operator point to the same object and `False` otherwise. `x is y` evaluates to `True` if `id(x)` equals `id(y)`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 52. What is the purpose of `not in` operator?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `not in` − Evaluates to `True` if it does not find a value in the specified sequence and `False` otherwise. `x not in y` evaluates to `True` if `x` is not a member of sequence `y`."
]
},
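{
"cell_type": "markdown",
"metadata": {},
"source": [
"Identity (`is`) and membership (`not in`) are easy to confuse with equality, so here is a small sketch, assuming Python 3. The names `a`, `b` and `c` are illustrative.\n",
"\n",
"```python\n",
"a = [1, 2, 3]\n",
"b = a            # second name for the same object\n",
"c = [1, 2, 3]    # equal contents, but a different object\n",
"\n",
"print(a is b)            # True\n",
"print(a is c)            # False\n",
"print(a == c)            # True\n",
"print(id(a) == id(b))    # True\n",
"\n",
"print(4 not in a)        # True\n",
"print(2 not in a)        # False\n",
"```"
]
},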
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 53. What is the purpose of the `break` statement in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `break` statement − Terminates the loop statement and transfers execution to the statement immediately following the loop."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 54. What is the purpose of the `continue` statement in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `continue` statement − Causes the loop to skip the remainder of its body and immediately retest its condition prior to reiterating."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 55. What is the purpose of the `pass` statement in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `pass` statement − The `pass` statement in Python is used when a statement is required syntactically but you do not want any command or code to execute."
]
},
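{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three statements from questions 53 to 55 can be seen together in one loop. This is a minimal sketch, assuming Python 3; `found` and `not_ready_yet` are hypothetical names.\n",
"\n",
"```python\n",
"found = []\n",
"for n in range(10):\n",
"    if n % 2 == 0:\n",
"        continue      # skip even numbers and move to the next iteration\n",
"    if n > 7:\n",
"        break         # leave the loop entirely\n",
"    found.append(n)\n",
"\n",
"print(found)          # [1, 3, 5, 7]\n",
"\n",
"\n",
"def not_ready_yet():\n",
"    pass              # placeholder body, does nothing\n",
"```"
]
},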
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 56. How can you pick a random item from a list or tuple?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `choice(seq)` − Returns a random item from a list, tuple, or string."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 57. How can you pick a random item from a range?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `randrange ([start,] stop [,step])` − returns a randomly selected element from range(start, stop, step)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 58. How can you get a random number in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `random()` − returns a random float `r`, such that 0 is less than or equal to `r` and `r` is less than 1."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 59. How will you set the starting value in generating random numbers?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `seed([x])` − Sets the integer starting value used in generating random numbers. Call this function before calling any other random module function. Returns `None`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 60. How will you randomize the items of a list in place?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `shuffle(lst)` − Randomizes the items of a list in place. Returns `None`."
]
},
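{
"cell_type": "markdown",
"metadata": {},
"source": [
"The functions in questions 56 to 60 all live in the standard `random` module. The sketch below assumes Python 3; the seed value and the `colors` list are arbitrary, and no outputs are shown because they depend on the generator.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"random.seed(42)                       # fixed seed, so reruns repeat\n",
"\n",
"colors = ['red', 'green', 'blue']\n",
"print(random.choice(colors))          # one item from the list\n",
"print(random.randrange(0, 100, 5))    # a multiple of 5 in [0, 100)\n",
"print(random.random())                # float r with 0 <= r < 1\n",
"\n",
"random.shuffle(colors)                # reorders in place, returns None\n",
"print(colors)\n",
"```"
]
},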
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 61. How will you capitalize the first letter of a string?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `capitalize()` − Capitalizes first letter of string."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 62. How will you check in a string that all characters are alphanumeric?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `isalnum()` − Returns `True` if string has at least 1 character and all characters are alphanumeric and `False` otherwise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 63. How will you check in a string that all characters are digits?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `isdigit()` − Returns `True` if string contains only digits and `False` otherwise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 64. How will you check in a string that all characters are in lowercase?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `islower()` − Returns `True` if string has at least 1 cased character and all cased characters are in lowercase and `False` otherwise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 65. How will you check in a string that all characters are numerics?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `isnumeric()` − Returns `True` if a unicode string contains only numeric characters and `False` otherwise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 66. How will you check in a string that all characters are whitespaces?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `isspace()` − Returns `True` if string contains only whitespace characters and `False` otherwise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 67. How will you check in a string that it is properly titlecased?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `istitle()` − Returns `True` if string is properly \"titlecased\" and `False` otherwise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 68. How will you check in a string that all characters are in uppercase?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `isupper()` − Returns `True` if string has at least one cased character and all cased characters are in uppercase and `False` otherwise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 69. How will you merge elements in a sequence?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `sep.join(seq)` − Merges (concatenates) the strings in sequence `seq` into a single string, with the string `sep` as the separator. Every element of `seq` must already be a string."
]
},
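{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short sketch of `join`, assuming Python 3. It is called on the separator string, and non-string elements have to be converted first; `words` and `numbers` are illustrative.\n",
"\n",
"```python\n",
"words = ['data', 'science']\n",
"print('-'.join(words))                      # data-science\n",
"\n",
"numbers = [1, 2, 3]\n",
"print(', '.join(str(n) for n in numbers))   # 1, 2, 3\n",
"```"
]
},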
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 70. How will you get the length of the string?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `len(string)` − Returns the length of the string."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 71. How will you get a space-padded string with the original string left-justified to a total of width columns?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `ljust(width[, fillchar])` − Returns a space-padded string with the original string left-justified to a total of width columns."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 72. How will you convert a string to all lowercase?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `lower()` − Converts all uppercase letters in string to lowercase."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 73. How will you remove all leading whitespace in string?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `lstrip()` − Removes all leading whitespace in string."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 74. How will you get the max alphabetical character from the string?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `max(str)` − Returns the `max` alphabetical character from the string `str`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 75. How will you get the min alphabetical character from the string?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `min(str)` − Returns the `min` alphabetical character from the string `str`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 76. How will you replace all occurrences of an old substring in a string with a new string?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `replace(old, new [, max])` − Replaces all occurrences of old in string with new or at most max occurrences if `max` given."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 77. How will you remove all leading and trailing whitespace in string?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `strip([chars])` − Performs both `lstrip()` and `rstrip()` on string."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 78. How will you change case for all letters in string?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `swapcase()` − Inverts case for all letters in string."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 79. How will you get titlecased version of string?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `title()` − Returns \"titlecased\" version of string, that is, all words begin with uppercase and the rest are lowercase."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 80. How will you convert a string to all uppercase?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `upper()` − Converts all lowercase letters in string to uppercase."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 81. How will you check in a string that all characters are decimal?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `isdecimal()` − Returns `True` if a unicode string contains only decimal characters and `False` otherwise."
]
},
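{
"cell_type": "markdown",
"metadata": {},
"source": [
"Several of the string methods from questions 61 to 81 are shown below in one sketch, assuming Python 3. The sample strings are made up for illustration.\n",
"\n",
"```python\n",
"s = '  hello World  '\n",
"\n",
"print(s.strip())                      # hello World\n",
"print(s.strip().title())              # Hello World\n",
"print(s.strip().swapcase())           # HELLO wORLD\n",
"print('a-b-c'.replace('-', '+', 1))   # a+b-c, at most one replacement\n",
"print('abc123'.isalnum())             # True\n",
"print('123'.isdigit())                # True\n",
"print('hi'.ljust(5, '.'))             # hi...\n",
"```"
]
},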
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 82. What is the difference between the `del` statement and the `remove()` method of a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- To remove a list element, use the `del` statement if you know the index or slice of the element(s) you are deleting (`del lst[i]`), or the `remove()` method if you only know the value. `remove(x)` deletes the first element equal to `x` and raises `ValueError` if there is none."
]
},
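{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference can be sketched as follows, assuming Python 3. The list `nums` is illustrative.\n",
"\n",
"```python\n",
"nums = [10, 20, 30, 20]\n",
"\n",
"del nums[0]          # delete by position\n",
"print(nums)          # [20, 30, 20]\n",
"\n",
"nums.remove(20)      # delete by value, first match only\n",
"print(nums)          # [30, 20]\n",
"\n",
"try:\n",
"    nums.remove(99)  # value is not present\n",
"except ValueError:\n",
"    print('99 is not in the list')\n",
"```"
]
},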
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 83. What is the output of `len([1, 2, 3])`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `3`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 84. What is the output of `[1, 2, 3] + [4, 5, 6]`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `[1, 2, 3, 4, 5, 6]`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 85. What is the output of `['Hi!'] * 4`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `['Hi!', 'Hi!', 'Hi!', 'Hi!']`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 86. What is the output of `3 in [1, 2, 3]`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `True`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 87. What is the output of `for x in [1, 2, 3]: print(x)`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"```python\n",
"1\n",
"2\n",
"3\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 88. What is the output of `L[2]` if `L = [1,2,3]`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `3`, Offsets start at zero."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 89. What is the output of `L[-2]` if `L = [1,2,3]`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `2`, Negative: count from the right."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 90. What is the output of `L[1:]` if `L = [1,2,3]`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `[2, 3]`, Slicing fetches sections."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 91. How will you compare two lists?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `cmp(list1, list2)` − Compares elements of both lists. This built-in exists only in Python 2; in Python 3, compare lists directly with operators such as `==` and `<`."
]
},
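{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since `cmp()` is not available in Python 3, list comparison is usually written with operators. The sketch below assumes Python 3; the helper named `cmp` is a hand-written stand-in for the old built-in, not part of the standard library.\n",
"\n",
"```python\n",
"list1 = [1, 2, 3]\n",
"list2 = [1, 2, 4]\n",
"\n",
"print(list1 == list2)   # False\n",
"print(list1 < list2)    # True, lists compare element by element\n",
"\n",
"\n",
"def cmp(a, b):\n",
"    # Python 2 style result: -1, 0 or 1\n",
"    return (a > b) - (a < b)\n",
"\n",
"\n",
"print(cmp(list1, list2))   # -1\n",
"```"
]
},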
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 92. How will you get the length of a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `len(list)` − Gives the total length of the list."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 93. How will you get the max valued item of a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `max(list)` − Returns item from the list with max value."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 94. How will you get the min valued item of a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `min(list)` − Returns item from the list with min value."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 95. How will you get the index of an object in a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `list.index(obj)` − Returns the lowest index in list at which `obj` appears."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 96. How will you insert an object at given index in a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `list.insert(index, obj)` − Inserts object `obj` into list at offset index."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 97. How will you remove last object from a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `list.pop([index])` − Removes and returns the object at `index`, or the last object if `index` is omitted."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 98. How will you remove an object from a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `list.remove(obj)` − Removes object `obj` from list."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 99. How will you reverse a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `list.reverse()` − Reverses objects of list in place."
]
},
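{
"cell_type": "markdown",
"metadata": {},
"source": [
"The list methods from questions 95 to 99 can be followed step by step in this sketch, assuming Python 3. The list `items` is illustrative.\n",
"\n",
"```python\n",
"items = ['a', 'b', 'c', 'b']\n",
"\n",
"print(items.index('b'))   # 1, the lowest matching index\n",
"\n",
"items.insert(1, 'x')\n",
"print(items)              # ['a', 'x', 'b', 'c', 'b']\n",
"\n",
"print(items.pop())        # b, the last item\n",
"print(items.pop(0))       # a, the item at index 0\n",
"\n",
"items.reverse()           # in place, returns None\n",
"print(items)              # ['c', 'b', 'x']\n",
"```"
]
},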
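{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 100. How will you sort a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `list.sort([func])` − Sorts objects of list in place, using compare `func` if given (Python 2). In Python 3 the signature is `list.sort(key=None, reverse=False)`, where `key` is a one-argument function that returns the sort key."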
]
},
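{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sorting sketch, assuming Python 3 and its `key` and `reverse` keyword arguments. The list `names` is illustrative.\n",
"\n",
"```python\n",
"names = ['john', 'Alice', 'bob']\n",
"\n",
"names.sort()                             # uppercase sorts before lowercase\n",
"print(names)                             # ['Alice', 'bob', 'john']\n",
"\n",
"names.sort(key=str.lower, reverse=True)  # case-insensitive, descending\n",
"print(names)                             # ['john', 'bob', 'Alice']\n",
"\n",
"print(sorted([3, 1, 2]))                 # [1, 2, 3], returns a new list\n",
"```"
]
},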
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 101. What is lambda function in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `lambda` is a keyword in Python which creates an anonymous function. A lambda body is a single expression: it cannot contain a block of statements or a `return` statement, and the value of the expression is returned implicitly."
]
},
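{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration of the answer above, the sketch below defines one lambda and passes another as a sort key. The names `square`, `words` and `by_length` are made up for this example.\n",
"\n",
"```python\n",
"# Equivalent to: def square(x): return x * x\n",
"square = lambda x: x * x\n",
"\n",
"words = ['pear', 'fig', 'banana']\n",
"# The lambda is used once as a sort key, so it never needs a name.\n",
"by_length = sorted(words, key=lambda w: len(w))\n",
"\n",
"print(square(4))    # 16\n",
"print(by_length)    # ['fig', 'pear', 'banana']\n",
"```"
]
},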
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 102. What do we call a function which is an incomplete version of a function?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `Stub`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 103. When a function is called, the system stores its parameters and local variables in an area of memory. What is this memory known as?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `Stack`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 104. A canvas can have a foreground color? (Yes/No)\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `Yes`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 105. Is Python platform independent?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Not entirely. Python code is largely portable, but there are some modules and functions in Python that can only run on certain platforms."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 106. Do you think Python has a compiler?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Yes. Python has a compiler that works automatically: the source code is compiled to bytecode before it is run, so we don’t notice the compilation step."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 107. What are the applications of Python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"1. Django (web framework for Python).\n",
"\n",
"2. Microframeworks such as Flask and Bottle.\n",
"\n",
"3. Plone and Django CMS for advanced content management."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 108. What is the basic difference between Python ver 2 and Python ver 3?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- The table below summarizes the main differences between Python version 2 and Python version 3.\n",
"\n",
"| S.No | Section | Python Version 2 | Python Version 3 |\n",
"|:-------|:---------------| :------ |:--------|\n",
"| 1. | Print Function | `print` is a statement and can be used without parentheses. | `print` is a function and needs parentheses; calling it without them raises a `SyntaxError`. |\n",
"| 2. | Unicode | `str` holds ASCII byte strings and `unicode` is a separate type; there is no distinct byte type. | `str` is Unicode text, and there are two byte classes − `bytes` and `bytearray`. |\n",
"| 3. | Exceptions | Python 2 accepts both the old and the new notation for raising exceptions. | Python 3 raises a `SyntaxError` when the exception argument is not enclosed in parentheses. |\n",
"| 4. | Comparing Unorderable | It does not raise any error. | It raises a `TypeError` if we try to compare unorderable types. |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 109. Which programming Language is an implementation of Python programming language designed to run on Java Platform?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `Jython`. (Jython is successor of Jpython.)"
]
},
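{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 110. Is there any double data type in Python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `No`. There is no separate `double` type; the built-in `float` is already a double-precision number in CPython."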
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 111. Are strings in Python immutable? (Yes/No)\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `Yes`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 112. Is `True = False` possible in Python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `No`. In Python 3, `True` and `False` are keywords, so assigning to them raises a `SyntaxError`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 113. Which Python module provides the methods related to the operating system?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `os`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 114. When does a new block begin in python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- A block begins after a line that ends with a colon, when the following lines are indented further than that line. Four spaces per level is the usual convention."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 115. Write a function in python which detects whether the given two strings are anagrams or not.\n",
"\n",
"<span class='label label-default'>Solution</span>"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2021-09-22T07:53:06.172548Z",
"start_time": "2021-09-22T07:53:06.155950Z"
}
},
"outputs": [],
"source": [
"def check(a, b):\n",
"    # Strings of different lengths can never be anagrams.\n",
"    if len(a) != len(b):\n",
"        return False\n",
"    # Anagrams contain the same characters, so their sorted forms are equal.\n",
"    return sorted(a) == sorted(b)"
]
},
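{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative to sorting, shown here only as a sketch, is to compare character counts with `collections.Counter`. The function name `check_with_counter` is made up for this example and is not part of the solution above.\n",
"\n",
"```python\n",
"from collections import Counter\n",
"\n",
"def check_with_counter(a, b):\n",
"    # Two strings are anagrams when every character occurs equally often in both.\n",
"    return Counter(a) == Counter(b)\n",
"\n",
"print(check_with_counter('listen', 'silent'))   # True\n",
"print(check_with_counter('apple', 'pale'))      # False\n",
"```\n",
"\n",
"Counting takes time linear in the string length, whereas sorting takes `O(n log n)`."
]
},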
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 116. Name the python Library used for Machine learning.\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Scikit-learn is a Python library used for machine learning."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 117. What does the `pass` statement do?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `pass` indicates that nothing is to be done, i.e. it is a no-operation statement, useful as a placeholder where a statement is syntactically required."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 118. Name the tools which python uses to find bugs (if any).\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- `Pylint` and `pychecker`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 119. Write a function to give the sum of all the numbers in list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"Sample list − (100, 200, 300, 400, 0, 500)\n",
"\n",
"Expected output − 1500"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2021-09-22T07:53:06.492373Z",
"start_time": "2021-09-22T07:53:06.180366Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sum of the numbers: 1500\n"
]
}
],
"source": [
"# Program for the sum of all the numbers in a list\n",
"\n",
"def sum_numbers(numbers):\n",
"    # Named sum_numbers so that the built-in sum() is not shadowed.\n",
"    total = 0\n",
"    for num in numbers:\n",
"        total += num\n",
"    print(\"Sum of the numbers:\", total)\n",
"\n",
"sum_numbers((100, 200, 300, 400, 0, 500))\n",
"\n",
"# We define a function sum_numbers with numbers as its parameter.\n",
"# In the for loop we accumulate the sum of all the values in total."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 120. Write a program in Python to reverse a string without using a built-in reverse function.\n",
"\n",
"<span class='label label-default'>Solution</span>\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2021-09-22T07:53:06.631533Z",
"start_time": "2021-09-22T07:53:06.498233Z"
},
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The length of string is: 6\n",
"point1\n"
]
}
],
"source": [
"# Reverse a string without using a built-in reverse function\n",
"\n",
"def string_reverse(string):\n",
"    i = len(string) - 1\n",
"    print(\"The length of string is:\", len(string))\n",
"    s_new = ''\n",
"    while i >= 0:\n",
"        s_new = s_new + string[i]\n",
"        i = i - 1\n",
"    return s_new\n",
"\n",
"print(string_reverse(\"1tniop\"))\n",
"\n",
"# First we declare a variable to store the reversed string.\n",
"# Then, using a while loop and string indexing (the start index is the string length minus 1),\n",
"# we append the characters from last to first. The loop runs while the index is at least zero.\n",
"# The index is decremented by 1 on each pass. Once it drops below zero, the reversed string is complete."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 121. Write a program to test whether the number is in the defined range or not?\n",
"\n",
"<span class='label label-default'>Solution</span>"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2021-09-22T07:53:06.735537Z",
"start_time": "2021-09-22T07:53:06.643256Z"
},
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"99 is in range\n"
]
}
],
"source": [
"# Program is −\n",
"\n",
"def test_range(num):\n",
"    # range(0, 101) covers the integers 0 through 100 inclusive.\n",
"    if num in range(0, 101):\n",
"        print(\"%s is in range\" % num)\n",
"    else:\n",
"        print(\"%s is not in range\" % num)\n",
"\n",
"test_range(99)\n",
"\n",
"# To test whether a number lies in a particular range, we use the in operator inside an if/else statement."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 122. Write a program to calculate number of upper case letters and number of lower case letters?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"Test on String: 'The quick Brown Fox'"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2021-09-22T07:53:06.848818Z",
"start_time": "2021-09-22T07:53:06.745304Z"
},
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"String in testing is: The quick Brown Fox\n",
"Number of Lower Case characters in String: 13\n",
"Number of Upper Case characters in String: 3\n"
]
}
],
"source": [
"# Program is −\n",
"\n",
"def string_test(s):\n",
"    d = {\"UPPER_CASE\": 0, \"LOWER_CASE\": 0}\n",
"    for c in s:\n",
"        if c.isupper():\n",
"            d[\"UPPER_CASE\"] += 1\n",
"        elif c.islower():\n",
"            d[\"LOWER_CASE\"] += 1\n",
"        # Spaces, digits and punctuation are counted in neither bucket.\n",
"    print(\"String in testing is:\", s)\n",
"    print(\"Number of Lower Case characters in String:\", d[\"LOWER_CASE\"])\n",
"    print(\"Number of Upper Case characters in String:\", d[\"UPPER_CASE\"])\n",
"\n",
"string_test('The quick Brown Fox')\n",
"\n",
"# We make use of the methods .isupper() and .islower(). We initialise the counts for lower and upper case to zero.\n",
"# Using an if/elif chain we count the total number of lower and upper case characters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}
================================================
FILE: 02_Predictive_Modeling.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<small><small><i>\n",
"All the IPython Notebooks in **Data Science Interview Questions** lecture series by **[Dr. Milaan Parmar](https://www.linkedin.com/in/milaanparmar/)** are available @ **[GitHub](https://github.com/milaan9/DataScience_Interview_Questions)**\n",
"</i></small></small>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Predictive Modeling ➞ <span class='label label-default'>19 Questions</span>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. (Given a Dataset) Analyze this dataset and give me a model that can predict this response variable."
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2021-09-21T13:31:28.708336Z",
"start_time": "2021-09-21T13:31:28.699521Z"
}
},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Problem Determination ➞ Data Cleaning ➞ Feature Engineering ➞ Modeling\n",
"\n",
"- Benchmark Models\n",
"    - Linear Regression (Ridge or Lasso) for regression\n",
"    - Logistic Regression for classification\n",
"\n",
"- Advanced Models\n",
"    - Random Forest, Boosting Trees, and so on\n",
"    - Scikit-Learn, XGBoost, LightGBM, CatBoost\n",
"\n",
"- Determine if the problem is classification or regression.\n",
"\n",
"- Plot and visualize the data.\n",
"\n",
"- Start by fitting a simple model (multivariate regression, logistic regression), do some feature engineering accordingly, and then try some complicated models. Always split the dataset into train, validation, test dataset and use cross validation to check their performance.\n",
"\n",
"- Favor simple models that run quickly and you can easily explain.\n",
"\n",
"- Mention cross validation as a means to evaluate the model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. What could be some issues if the distribution of the test data is significantly different than the distribution of the training data?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- A model that has high training accuracy might have low test accuracy. Without further knowledge, it is hard to know which dataset represents the population, so the generalizability of the algorithm is hard to measure. This can be mitigated by repeatedly splitting the data into train and test sets (as in cross validation).\n",
"- A change in the data distribution is called dataset shift. If the train and test data have different distributions, the classifier would likely overfit to the train data.\n",
"- This issue can be overcome by using a more general learning method.\n",
"- This can occur when:\n",
"    - $P(y|x)$ are the same but $P(x)$ are different. (covariate shift)\n",
"    - $P(y|x)$ are different. (concept shift)\n",
"- The causes can be:\n",
"    - Training samples are obtained in a biased way. (sample selection bias)\n",
"    - Train is different from test because of temporal or spatial changes. (non-stationary environments)\n",
"- Solution to covariate shift:\n",
"    - importance-weighted cross validation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. What are some ways I can make my model more robust to outliers?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- We can use regularization such as L1 or L2 to reduce variance (at the cost of increased bias).\n",
"- Changes to the algorithm:\n",
"    - Use tree-based methods instead of regression methods, as they are more resistant to outliers. For statistical tests, use nonparametric tests instead of parametric ones.\n",
"    - Use robust error metrics such as MAE or Huber loss instead of MSE.\n",
"- Changes to the data:\n",
"    - Winsorize the data\n",
"    - Transform the data (e.g. log)\n",
"    - Remove outliers only if you’re certain they’re anomalies and not worth predicting"
]
},
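{
"cell_type": "markdown",
"metadata": {},
"source": [
"Winsorizing, mentioned above, clips extreme values to chosen percentiles instead of removing them. The sketch below is one simple way to do it in plain Python; the function name `winsorize`, the percentile rule (nearest lower order statistic) and the `delays` data are assumptions made for illustration.\n",
"\n",
"```python\n",
"def winsorize(values, lower_pct=0.05, upper_pct=0.95):\n",
"    # Clip every value to the range spanned by the chosen lower and upper order statistics.\n",
"    ordered = sorted(values)\n",
"    n = len(ordered)\n",
"    lo = ordered[int(lower_pct * (n - 1))]\n",
"    hi = ordered[int(upper_pct * (n - 1))]\n",
"    return [min(max(v, lo), hi) for v in values]\n",
"\n",
"delays = [5, 7, 8, 10, 12, 9, 6, 11, 720]   # minutes; 720 is a 12-hour outlier\n",
"clipped = winsorize(delays, lower_pct=0.0, upper_pct=0.9)\n",
"print(clipped)   # the 720 is pulled down to 12, everything else is unchanged\n",
"```"
]
},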
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. What are some differences you would expect in a model that minimizes squared error, versus a model that minimizes absolute error? In which cases would each error metric be appropriate?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- MSE penalizes large errors more heavily, so it is more sensitive to outliers. MAE is more robust in that sense, but the model is harder to fit because the absolute error is not differentiable at zero. So when there is less variability in the model and the model is computationally easy to fit, we should use MAE, and if that’s not the case, we should use MSE.\n",
"- MSE: the gradient is easy to compute. MAE: fitting typically needs linear programming or subgradient methods.\n",
"- MAE is more robust to outliers. If the consequences of large errors are great, use MSE.\n",
"- MSE corresponds to maximizing the likelihood of Gaussian random variables."
]
},
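{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference in robustness can be seen with the simplest possible model, a single constant prediction: the constant that minimizes squared error is the mean, while the constant that minimizes absolute error is the median. The sketch below checks this on made-up data with one outlier; the helper names `mse` and `mae` are chosen for this example.\n",
"\n",
"```python\n",
"from statistics import mean, median\n",
"\n",
"def mse(values, c):\n",
"    # Mean squared error of predicting the constant c for every value.\n",
"    return sum((v - c) ** 2 for v in values) / len(values)\n",
"\n",
"def mae(values, c):\n",
"    # Mean absolute error of predicting the constant c for every value.\n",
"    return sum(abs(v - c) for v in values) / len(values)\n",
"\n",
"data = [1, 2, 3, 4, 100]             # 100 is an outlier\n",
"m, med = mean(data), median(data)    # 22 and 3\n",
"\n",
"print(mse(data, m) < mse(data, med))   # True: the mean wins under squared error\n",
"print(mae(data, med) < mae(data, m))   # True: the median wins under absolute error\n",
"```\n",
"\n",
"The outlier drags the squared-error fit from 3 up to 22, while the absolute-error fit stays at 3."
]
},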
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. What error metric would you use to evaluate how good a binary classifier is? What if the classes are imbalanced? What if there are more than 2 groups?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Accuracy: proportion of instances you predict correctly.\n",
"    - Pros: intuitive, easy to explain\n",
"    - Cons: works poorly when the class labels are imbalanced and the signal from the data is weak\n",
"- ROC curve and AUC: plot the false positive rate (FPR) on the x axis and the true positive rate (TPR) on the y axis for different thresholds. Given a random positive instance and a random negative instance, the AUC is the probability that the classifier ranks the positive one higher.\n",
"    - Pros: works well when testing the ability to distinguish the two classes\n",
"    - Cons: predictions can’t be interpreted as probabilities (because AUC is determined by rankings), so it can’t explain the uncertainty of the model, and it doesn't extend directly to the multi-class case\n",
"- Log loss / deviance / cross entropy:\n",
"    - Pros: error metric based on probabilities\n",
"    - Cons: very sensitive to confident false positives and false negatives\n",
"- When there are more than 2 groups, we can set up k binary classifications and add up their log losses. Some metrics, such as AUC, are only applicable in the binary case."
]
},
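{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ranking interpretation of AUC given above can be computed directly by comparing every positive with every negative. This is a sketch for small inputs only (it is quadratic in the number of instances); the function name `auc_by_pairs` and the toy labels and scores are made up for illustration.\n",
"\n",
"```python\n",
"def auc_by_pairs(labels, scores):\n",
"    # Fraction of (positive, negative) pairs in which the positive gets the higher score.\n",
"    # Ties count as half a win.\n",
"    pos = [s for y, s in zip(labels, scores) if y == 1]\n",
"    neg = [s for y, s in zip(labels, scores) if y == 0]\n",
"    wins = 0.0\n",
"    for p in pos:\n",
"        for q in neg:\n",
"            if p > q:\n",
"                wins += 1.0\n",
"            elif p == q:\n",
"                wins += 0.5\n",
"    return wins / (len(pos) * len(neg))\n",
"\n",
"labels = [1, 0, 1, 0, 1]\n",
"scores = [0.9, 0.4, 0.35, 0.2, 0.8]\n",
"auc = auc_by_pairs(labels, scores)\n",
"print(auc)   # 5 of the 6 positive/negative pairs are ranked correctly\n",
"```\n",
"\n",
"Only the order of the scores matters here, which is why AUC says nothing about how well calibrated the predicted probabilities are."
]
},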
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. What are various ways to predict a binary response variable? Can you compare two of them and tell me when one would be more appropriate? What’s the difference between these? (SVM, Logistic Regression, Naive Bayes, Decision Tree, etc.)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Things to look at: N, P, linear separability, feature independence, tendency to overfit, speed, performance, memory usage, and so on.\n",
"- Logistic Regression\n",
"    - features roughly linear, problem roughly linearly separable\n",
"    - robust to noise; use L1 or L2 regularization for model selection and to avoid overfitting\n",
"    - the outputs come as probabilities\n",
"    - efficient, and the computation can be distributed\n",
"    - can be used as a baseline for other algorithms\n",
"    - (-) can hardly handle categorical features\n",
"- SVM\n",
"    - with a nonlinear kernel, can deal with problems that are not linearly separable\n",
"    - (-) slow to train; not really efficient for most industry-scale applications\n",
"- Naive Bayes\n",
"    - computationally efficient when P is large by alleviating the curse of dimensionality\n",
"    - works surprisingly well in some cases even if the independence condition doesn’t hold\n",
"    - with word frequencies as features, the independence assumption can be seen as reasonable, so the algorithm can be used in text categorization\n",
"    - (-) assumes every feature is conditionally independent of the others given the class\n",
"- Tree Ensembles\n",
"    - good for large N and large P, can deal with categorical features very well\n",
"    - nonparametric, so less need to worry about outliers\n",
"    - GBTs work better but the parameters are harder to tune\n",
"    - RF works out of the box, but usually performs worse than GBT\n",
"- Deep Learning\n",
"    - works well for some classification tasks (e.g. image)\n",
"    - used to squeeze something out of the problem"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7. What is regularization and where might it be helpful? What is an example of using regularization in a model?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Regularization is useful for reducing variance in the model, meaning avoiding overfitting.\n",
"- For example, we can use L1 regularization in Lasso regression to penalize large coefficients and automatically select features, or we can also use L2 regularization for Ridge regression to penalize the feature coefficients."
]
},
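{
"cell_type": "markdown",
"metadata": {},
"source": [
"The shrinking effect of L2 regularization can be seen in closed form for a one-feature model without an intercept: minimizing `sum((y - w*x)**2) + lam * w**2` gives `w = sum(x*y) / (sum(x*x) + lam)`. The sketch below evaluates that formula; the function name `ridge_slope` and the toy data are assumptions made for illustration.\n",
"\n",
"```python\n",
"def ridge_slope(xs, ys, lam):\n",
"    # Closed-form minimizer of sum((y - w*x)**2) + lam * w**2.\n",
"    sxy = sum(x * y for x, y in zip(xs, ys))\n",
"    sxx = sum(x * x for x in xs)\n",
"    return sxy / (sxx + lam)\n",
"\n",
"xs = [1.0, 2.0, 3.0]\n",
"ys = [2.0, 4.0, 6.0]   # exactly y = 2x\n",
"\n",
"for lam in (0.0, 1.0, 10.0):\n",
"    # The slope is 2.0 without a penalty and shrinks toward zero as lam grows.\n",
"    print(lam, ridge_slope(xs, ys, lam))\n",
"```"
]
},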
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 8. Why might it be preferable to include fewer predictors over many?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- When we add irrelevant features, the model's tendency to overfit increases because those features introduce more noise. When two variables are correlated, they might be harder to interpret, e.g. in the case of regression.\n",
"- Curse of dimensionality.\n",
"- Adding random noise makes the model more complicated without making it more useful.\n",
"- Computational cost."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 9. Given training data on tweets and their retweets, how would you predict the number of retweets of a given tweet after 7 days after only observing 2 days worth of data?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Build a time series model on the training data with a seven-day cycle, then apply it to new data for which only 2 days are available.\n",
"- Build a regression function to estimate the number of retweets as a function of time t.\n",
"- To determine whether one regression function can be built, see if there are clusters in terms of the trends in the number of retweets.\n",
"- If one function is not enough, we have to add features to the regression function.\n",
"- Features + number of retweets on the first and the second day ➞ predict the seventh day.\n",
"- https://en.wikipedia.org/wiki/Dynamic_time_warping"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10. How could you collect and analyze data to use social media to predict the weather?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- We can collect social media data using the Twitter, Facebook, and Instagram APIs.\n",
"- Then, for Twitter for example, we can construct features from each tweet, e.g. the tweeted date, the number of favorites and retweets, and of course features created from the tweeted content itself.\n",
"- Then use a multivariate time series model to predict the weather."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 11. How would you construct a feed to show relevant content for a site that involves user interactions with items?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- We can do so by building a recommendation engine.\n",
"- The easiest approach is to show content that is popular with other users, which is still a valid strategy if, for example, the content is news articles.\n",
"- To be more accurate, we can build a content-based or collaborative filtering system. If there’s enough user usage data, we can try collaborative filtering and recommend content that similar users have consumed. If there isn’t, we can recommend similar items based on vectorization of the items (content-based filtering)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 12. How would you design the people you may know feature on LinkedIn or Facebook?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Find strongly related but not yet connected people in a weighted connection graph\n",
"    - Define similarity as how strongly two people are connected\n",
"    - Given a certain feature, we can calculate the similarity based on\n",
"        - friend connections (neighbors)\n",
"        - check-ins (people being at the same location all the time)\n",
"        - same college, workplace\n",
"    - Test the performance of the algorithm on graphs with randomly dropped edges\n",
"- Ref. News Feed Optimization\n",
"    - Affinity score: how close the content creator and the user are\n",
"    - Weight: weight for the edge type (comment, like, tag, etc.), with emphasis on features the company wants to promote\n",
"    - Time decay: the older the content, the less important it is"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 13. How would you predict who someone may want to send a Snapchat or Gmail to?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- For each candidate recipient, assign a score for how likely the user is to send them a message.\n",
"- The rest is feature engineering:\n",
"    - number of past emails, how many responses, the last time they exchanged an email, whether the last email ends with a question mark, features about the other users, etc.\n",
"- A simple baseline: the people to whom the user sent the most emails in the past, weighted by time decay."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 14. How would you suggest to a franchise where to open a new store?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- build a master dataset with local demographic information available for each location.\n",
"    - local income levels, proximity to traffic, weather, population density, proximity to other businesses\n",
"    - a reference dataset on local, regional, and national macroeconomic conditions (e.g. unemployment, inflation, prime interest rate, etc.)\n",
"    - any data on the local franchise owner-operators, to the degree the manager\n",
"- identify a set of KPIs acceptable to the management that had requested the analysis concerning the most desirable factors surrounding a franchise\n",
"    - quarterly operating profit, ROI, EVA, pay-down rate, etc.\n",
"- run econometric models to understand the relative significance of each variable\n",
"- run machine learning algorithms to predict the performance of each location candidate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 15. In a search engine, given partial data on what the user has typed, how would you predict the user’s eventual search query?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Based on how often words followed a given sequence of words in past queries, we can estimate conditional probabilities for the next sequence of words (n-gram). The sequences with the highest conditional probabilities can be shown as top candidates.\n",
"- To further improve this algorithm,\n",
"    - we can put more weight on past sequences that showed up more recently and near your location to account for trends\n",
"    - show your recent searches given partial data\n",
"- Personalize and localize the search\n",
"    - Use the user's historical search data\n",
"    - Use the historical data from the local region"
]
},
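{
"cell_type": "markdown",
"metadata": {},
"source": [
"The n-gram idea above can be sketched with bigrams: count which word follows which in past queries and suggest the most frequent followers of the last typed word. The function names `build_bigram_model` and `suggest_next` and the list of past queries are made up for illustration; a real system would use longer contexts, recency weights and far more data.\n",
"\n",
"```python\n",
"from collections import Counter, defaultdict\n",
"\n",
"def build_bigram_model(queries):\n",
"    # Count how often each word follows the previous word in past queries.\n",
"    following = defaultdict(Counter)\n",
"    for query in queries:\n",
"        words = query.lower().split()\n",
"        for prev, nxt in zip(words, words[1:]):\n",
"            following[prev][nxt] += 1\n",
"    return following\n",
"\n",
"def suggest_next(model, partial, k=2):\n",
"    # Rank candidate next words by how often they followed the last typed word.\n",
"    words = partial.lower().split()\n",
"    if not words:\n",
"        return []\n",
"    return [w for w, _ in model[words[-1]].most_common(k)]\n",
"\n",
"past = ['weather in paris', 'weather in london', 'weather in paris today', 'hotels in paris']\n",
"model = build_bigram_model(past)\n",
"print(suggest_next(model, 'weather in'))   # ['paris', 'london']\n",
"```\n",
"\n",
"Dividing each count by the total for the previous word would turn the counts into the conditional probabilities mentioned above; the ranking stays the same."
]
},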
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 16. Given a database of all previous alumni donations to your university, how would you predict which recent alumni are most likely to donate?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Based on frequency and amount of donations, graduation year, major, etc, construct a supervised regression (or binary classification) algorithm."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 17. You’re Uber and you want to design a heatmap to recommend to drivers where to wait for a passenger. How would you approach this?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Based on the past pickup locations of passengers around the same time of the day and day of the week (month, year), construct the heatmap.\n",
"- Based on the number of past pickups\n",
"    - account for periodicity (seasonal, monthly, weekly, daily, hourly)\n",
"    - special events (concerts, festivals, etc.) from tweets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 18. How would you build a model to predict a March Madness bracket?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Build one feature vector each for team A and team B. Take the difference of the two vectors and use it as the input to a model that predicts the probability that team A wins. Train the model on past tournament data and make a prediction for the new tournament by running the trained model for each round of the tournament.\n",
"- Some extensions:\n",
"    - Experiment with different ways of consolidating the 2 team vectors into one (e.g. concatenating, averaging, etc.)\n",
"    - Consider using an RNN-type model that looks at time series data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 19. You want to run a regression to predict the probability of a flight delay, but there are flights with delays of up to 12 hours that are really messing up your model. How can you address this?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- This is equivalent to making the model more robust to outliers.\n",
"- See **Question 3**."
]
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
================================================
FILE: 03_Programming.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<small><small><i>\n",
"All the IPython Notebooks in **Data Science Interview Questions** lecture series by **[Dr. Milaan Parmar](https://www.linkedin.com/in/milaanparmar/)** are available @ **[GitHub](https://github.com/milaan9/DataScience_Interview_Questions)**\n",
"</i></small></small>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Programming ➞ <span class='label label-default'>14 Questions</span>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Write a function to calculate all possible assignment vectors of `2n` users, where `n` users are assigned to group 0 (control), and `n` users are assigned to group 1 (treatment)."
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2021-09-21T13:31:28.708336Z",
"start_time": "2021-09-21T13:31:28.699521Z"
}
},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Recursive programming (solution in the code; the assignment vectors for `2n` users are `n_choose_k(2 * n, n)`)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2021-09-21T16:10:40.786976Z",
"start_time": "2021-09-21T16:10:40.771303Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]]\n"
]
}
],
"source": [
"def n_choose_k(n, k):\n",
"    \"\"\" function to choose k from n \"\"\"\n",
"    # Base case: exactly one 1, placed at each of the n positions in turn.\n",
"    if k == 1:\n",
"        ans = []\n",
"        for i in range(n):\n",
"            tmp = [0] * n\n",
"            tmp[i] = 1\n",
"            ans.append(tmp)\n",
"        return ans\n",
"\n",
"    # Base case: every position must be 1.\n",
"    if k == n:\n",
"        return [[1] * n]\n",
"\n",
"    ans = []\n",
"    space = n - k + 1\n",
"    # Put the first 1 at position i, then choose the remaining k - 1 from the positions after it.\n",
"    for i in range(space):\n",
"        assignment = [0] * (i + 1)\n",
"        assignment[i] = 1\n",
"        for c in n_choose_k(n - i - 1, k - 1):\n",
"            ans.append(assignment + c)\n",
"    return ans\n",
"\n",
"# test: choose 2 from 4\n",
"print(n_choose_k(4, 2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Given a list of tweets, determine the top 10 most used hashtags."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Store all the hashtags in a dictionary and use priority queue to solve the top-k problem\n",
" - An extension will be top-k problem using Hadoop/MapReduce"
]
},
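{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dictionary-plus-priority-queue idea above can be sketched as follows. This is a minimal sketch under stated assumptions rather than a definitive implementation: the name `top_hashtags`, the regular expression used to recognise a hashtag, and the case-insensitive counting are illustrative choices that do not come from the question.\n",
"\n",
"```python\n",
"import heapq\n",
"import re\n",
"from collections import Counter\n",
"\n",
"# Assumption: a hashtag is '#' followed by letters, digits or underscores.\n",
"HASHTAG_RE = re.compile(r'#\\w+')\n",
"\n",
"def top_hashtags(tweets, k=10):\n",
"    '''Return the k most frequent hashtags as (hashtag, count) pairs.'''\n",
"    counts = Counter()\n",
"    for tweet in tweets:\n",
"        # Count hashtags case-insensitively (an illustrative choice).\n",
"        counts.update(tag.lower() for tag in HASHTAG_RE.findall(tweet))\n",
"    # nlargest keeps at most k items in its heap: O(m log k) for m distinct tags.\n",
"    return heapq.nlargest(k, counts.items(), key=lambda item: item[1])\n",
"\n",
"tweets = ['Loving #python and #Data', 'More #python tips', '#data #python']\n",
"print(top_hashtags(tweets, k=2))\n",
"```\n",
"\n",
"`heapq.nlargest` avoids sorting every distinct hashtag, which matters when there are far more distinct hashtags than `k`."
]
},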
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Program an algorithm to find the best approximate solution to the knapsack problem in a given time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - **[https://en.wikipedia.org/wiki/Knapsack_problem](https://en.wikipedia.org/wiki/Knapsack_problem)**\n",
" - Greedy solution (add the best v/w as much as possible and move on to the next)\n",
" - Dynamic programming"
]
},
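{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the 0/1 variant of the problem with positive integer weights, the two ideas above might look like the sketch below. The names `knapsack_greedy` and `knapsack_dp` and the `(value, weight)` tuple layout are hypothetical. The greedy pass is only a heuristic and can miss the optimum: on the sample items it returns 160, while the exact dynamic program returns 220.\n",
"\n",
"```python\n",
"def knapsack_greedy(items, capacity):\n",
"    '''Greedy heuristic: take items in decreasing value/weight order while they fit.'''\n",
"    total_value, remaining = 0, capacity\n",
"    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):\n",
"        if weight <= remaining:\n",
"            total_value += value\n",
"            remaining -= weight\n",
"    return total_value\n",
"\n",
"def knapsack_dp(items, capacity):\n",
"    '''Exact 0/1 knapsack for integer weights in O(len(items) * capacity) time.'''\n",
"    best = [0] * (capacity + 1)  # best[c] = max value with total weight <= c\n",
"    for value, weight in items:\n",
"        # Iterate capacities downwards so each item is used at most once.\n",
"        for c in range(capacity, weight - 1, -1):\n",
"            best[c] = max(best[c], best[c - weight] + value)\n",
"    return best[capacity]\n",
"\n",
"items = [(60, 10), (100, 20), (120, 30)]  # (value, weight) pairs, made up for the demo\n",
"print(knapsack_greedy(items, 50), knapsack_dp(items, 50))\n",
"```\n",
"\n",
"Under a time limit, one reasonable design is to compute the greedy answer first and keep it as a fallback if the exact search does not finish."
]
},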
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Program an algorithm to find the best approximate solution to the traveling salesman problem in a given time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - **[https://en.wikipedia.org/wiki/Travelling_salesman_problem](https://en.wikipedia.org/wiki/Travelling_salesman_problem)**\n",
" - Greedy\n",
" - Dynamic programming"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. You have a stream of data coming in of size n, but you don’t know what n is ahead of time. Write an algorithm that will take a random sample of `k` elements. Can you write one that takes `O(k)` space?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - **[Reservoir sampling](https://en.wikipedia.org/wiki/Reservoir_sampling)**"
]
},
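{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the basic reservoir sampling scheme, assuming the stream is any Python iterable. The function name `reservoir_sample` and the `rng` parameter are hypothetical. Only the `k`-element reservoir is kept in memory, which gives the `O(k)` space asked for.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"def reservoir_sample(stream, k, rng=random):\n",
"    '''Return k items chosen uniformly at random from an iterable of unknown length.'''\n",
"    reservoir = []\n",
"    for i, item in enumerate(stream):\n",
"        if i < k:\n",
"            reservoir.append(item)   # fill the reservoir with the first k items\n",
"        else:\n",
"            j = rng.randint(0, i)    # uniform integer in [0, i], both ends included\n",
"            if j < k:\n",
"                reservoir[j] = item  # the new item is kept with probability k / (i + 1)\n",
"    return reservoir\n",
"\n",
"print(reservoir_sample(iter(range(1000)), 5))\n",
"```\n",
"\n",
"If the stream holds fewer than `k` items, the sketch simply returns all of them."
]
},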
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. Write an algorithm that can calculate the square root of a number."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Binary search or Newton's method"
]
},
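{
"cell_type": "markdown",
"metadata": {},
"source": [
"Newton's method for this problem can be sketched as below. The name `newton_sqrt`, the starting guess and the relative tolerance `tol` are assumptions made for illustration.\n",
"\n",
"```python\n",
"def newton_sqrt(x, tol=1e-12):\n",
"    '''Approximate the square root of a non-negative number with Newton's method.'''\n",
"    if x < 0:\n",
"        raise ValueError('x must be non-negative')\n",
"    if x == 0:\n",
"        return 0.0\n",
"    guess = max(x, 1.0)  # a start at or above the root, so the iterates decrease\n",
"    while True:\n",
"        new_guess = 0.5 * (guess + x / guess)  # Newton step for f(g) = g*g - x\n",
"        if abs(new_guess - guess) <= tol * new_guess:\n",
"            return new_guess\n",
"        guess = new_guess\n",
"\n",
"print(newton_sqrt(2.0))\n",
"```\n",
"\n",
"Binary search over an interval that contains the root works too; Newton's method usually needs far fewer iterations."
]
},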
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7. Given a list of numbers, can you return the outliers?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Sort then select the highest and the lowest 2.5%\n",
" - Visualization can helps a lot"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 8. When can parallelism make your algorithms run faster? When could it make your algorithms run slower?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Ask someone for more details.\n",
" - compute in parallel when communication cost < computation cost\n",
" - ensemble trees\n",
" - minibatch\n",
" - cross validation\n",
" - forward propagation\n",
" - minibatch\n",
" - not suitable for online learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 9. What are the different types of joins? What are the differences between them?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Inner Join, Left Join, Right Join, Outer Join, Self Join"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10. Why might a join on a subquery be slow? How might you speed it up?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Change the subquery to a join.\n",
" - **[Stack Overflow Answers](https://stackoverflow.com/questions/31724903/why-might-a-join-on-a-subquery-be-slow-what-could-be-done-to-make-it-faster-s)**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 11. Describe the difference between primary keys and foreign keys in a SQL database."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Primary keys are columns whose value combinations must be unique in a specific table so that each row can be referenced uniquely.\n",
" - Foreign keys are columns that references columns (often primary keys) in other tables."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 12. Given a **COURSES** table with columns **course_id** and **course_name**, a **FACULTY** table with columns **faculty_id** and **faculty_name**, and a **COURSE_FACULTY** table with columns **faculty_id** and **course_id**, how would you return a list of faculty who teach a course given the name of a course?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"```SQL\n",
"SELECT f.faculty_name\n",
"FROM COURSES c\n",
"JOIN COURSE_FACULTY cf\n",
" ON c.course_id = cf.course_id\n",
"JOIN FACULTY\n",
" ON f.faculty_id = cf.faculty_id\n",
"WHERE c.course_name = xxx;\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 13. Given a **IMPRESSIONS** table with **ad_id**, **click** (an indicator that the ad was clicked), and **date**, write a SQL query that will tell me the click-through-rate of each ad by month."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"```SQL\n",
"SELECT ad_id, MONTH(date), AVG(click)\n",
"FROM IMPRESSIONS\n",
"GROUP BY ad_id, MONTH(date);\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 14. Write a query that returns the name of each department and a count of the number of employees in each: \n",
"\n",
"- **EMPLOYEES** containing: **Emp_ID** (Primary key) and **Emp_Name** \n",
"- **EMPLOYEE_DEPT** containing: **Emp_ID** (Foreign key) and **Dept_ID** (Foreign key) \n",
"- **DEPTS** containing: **Dept_ID** (Primary key) and **Dept_Name**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"```SQL\n",
"SELECT d.Dept_Name, COUNT(*)\n",
"FROM DEPTS d\n",
"LEFT JOIN EMPLOYEE_DEPT ed\n",
" ON d.Dept_ID = ed.Dept_ID\n",
"GROUP BY d.Dept_Name;\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
================================================
FILE: 04_Probability.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<small><small><i>\n",
"All the IPython Notebooks in **Data Science Interview Questions** lecture series by **[Dr. Milaan Parmar](https://www.linkedin.com/in/milaanparmar/)** are available @ **[GitHub](https://github.com/milaan9/DataScience_Interview_Questions)**\n",
"</i></small></small>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Probability ➞ <span class='label label-default'>20 Questions</span>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Bobo the amoeba has a 25%, 25%, and 50% chance of producing 0, 1, or 2 o spring, respectively. Each of Bobo’s descendants also have the same probabilities. What is the probability that Bobo’s lineage dies out?"
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2021-09-21T13:31:28.708336Z",
"start_time": "2021-09-21T13:31:28.699521Z"
}
},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - $p=1/4+1/4*p+1/2*p^2$ \n",
" - $p=1/2$"
]
},
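{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a numerical check, the equation above can be iterated from $p = 0$; for a branching process this fixed-point iteration converges to the smallest non-negative root, which is the extinction probability. The helper name `extinction_probability` and the iteration count are assumptions for this sketch.\n",
"\n",
"```python\n",
"def extinction_probability(offspring_probs, iterations=200):\n",
"    '''Iterate p <- G(p) from p = 0, where G is the offspring generating function.'''\n",
"    p = 0.0\n",
"    for _ in range(iterations):\n",
"        # offspring_probs[k] is the probability of producing k offspring\n",
"        p = sum(prob * p ** k for k, prob in enumerate(offspring_probs))\n",
"    return p\n",
"\n",
"print(extinction_probability([0.25, 0.25, 0.5]))\n",
"```"
]
},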
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. In any 15-minute interval, there is a 20% probability that you will see at least one shooting star. What is the probability that you see at least one shooting star in the period of an hour?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - $1-(0.8)^4 = 0.5904$\n",
" - Or, we can use Poisson processes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. How can you generate a random number between 1 - 7 with only a die?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - **[Quora Answer](https://www.quora.com/How-can-you-generate-a-random-number-between-1-7-with-only-a-die-1)**"
]
},
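{
"cell_type": "markdown",
"metadata": {},
"source": [
"One common approach is rejection sampling, sketched below; it is not necessarily the method given in the linked answer. Two rolls give 36 equally likely outcomes, 35 of them are mapped evenly onto 1 to 7, and the remaining one triggers a re-roll. The names `roll_die` and `random_1_to_7` are hypothetical.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"def roll_die(rng=random):\n",
"    '''Simulate one roll of a fair six-sided die.'''\n",
"    return rng.randint(1, 6)\n",
"\n",
"def random_1_to_7(roll=roll_die):\n",
"    '''Return a uniform integer in 1..7 using only die rolls.'''\n",
"    while True:\n",
"        # Two rolls give 36 equally likely outcomes, numbered 0..35.\n",
"        outcome = 6 * (roll() - 1) + (roll() - 1)\n",
"        if outcome < 35:            # keep 35 outcomes: 5 per result\n",
"            return outcome % 7 + 1  # outcome 35 is rejected and we roll again\n",
"\n",
"print([random_1_to_7() for _ in range(10)])\n",
"```\n",
"\n",
"Each attempt succeeds with probability 35/36, so the expected number of rolls stays close to two."
]
},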
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. How can you get a fair coin toss if someone hands you a coin that is weighted to come up heads more often than tails?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Flip twice:\n",
" - HT ➞ H\n",
" - TH ➞ T\n",
" - If HH or TT, repeat."
]
},
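{
"cell_type": "markdown",
"metadata": {},
"source": [
"The flip-twice procedure above can be simulated as follows. The bias `p_heads=0.7` is an arbitrary illustrative value, and `biased_flip` and `fair_flip` are hypothetical names. The trick works because HT and TH both occur with probability $p(1-p)$, whatever $p$ is.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"def biased_flip(p_heads=0.7, rng=random):\n",
"    '''Simulate a biased coin; p_heads = 0.7 is an arbitrary illustrative value.'''\n",
"    return 'H' if rng.random() < p_heads else 'T'\n",
"\n",
"def fair_flip(flip=biased_flip):\n",
"    '''Return H or T with probability 1/2 each, using only a biased coin.'''\n",
"    while True:\n",
"        first, second = flip(), flip()\n",
"        if first != second:\n",
"            return first  # HT -> H, TH -> T\n",
"        # HH or TT: discard the pair and flip again\n",
"\n",
"flips = [fair_flip() for _ in range(10000)]\n",
"print(flips.count('H') / len(flips))\n",
"```"
]
},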
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. You have an 50-50 mixture of two normal distributions with the same standard deviation. How far apart do the means need to be in order for this distribution to be bimodal?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - more than two standard deviations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. Given draws from a normal distribution with known parameters, how can you simulate draws from a uniform distribution?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Plug in the value to the CDF of the same random variable"
]
},
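{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short sketch of this idea (the probability integral transform), using only the standard library. The function names and the example parameters `mu=3.0`, `sigma=2.0` are assumptions for illustration.\n",
"\n",
"```python\n",
"import math\n",
"import random\n",
"\n",
"def normal_cdf(x, mu=0.0, sigma=1.0):\n",
"    '''CDF of a normal distribution with mean mu and standard deviation sigma.'''\n",
"    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))\n",
"\n",
"def uniform_from_normal(mu=0.0, sigma=1.0, rng=random):\n",
"    '''Draw X from N(mu, sigma^2) and return F(X), which is uniform on (0, 1).'''\n",
"    return normal_cdf(rng.gauss(mu, sigma), mu, sigma)\n",
"\n",
"draws = [uniform_from_normal(mu=3.0, sigma=2.0) for _ in range(10000)]\n",
"print(sum(draws) / len(draws))  # sample mean of the transformed draws\n",
"```"
]
},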
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7. A certain couple tells you that they have two children, at least one of which is a girl. What is the probability that they have two girls?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - gg, gb, bg ➞ 1/3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 8. You have a group of couples that decide to have children until they have their first girl, after which they stop having children. What is the expected gender ratio of the children that are born? What is the expected number of children each couple will have?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Geometric distribution with $p = 0.5$\n",
" - gender ratio is $1:1$. Expected number of children is 2.\n",
" - let X be the number of children until getting a female (happens with prob 1/2). this follows a geometric distribution with probability 1/2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 9. How many ways can you split 12 people into 3 teams of 4?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - the outcome follows a multinomial distribution with $n=12$ and $k=3$. but the classes are indistinguishable\n",
" - $(12, 8) * (8, 4) * (4, 4) / (3, 3)$\n",
" - $12! / (4!)^3 / 3!$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10. Your hash function assigns each object to a number between 1:10, each with equal probability. With 10 objects, what is the probability of a hash collision? What is the expected number of hash collisions? What is the expected number of hashes that are unused."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - the probability of a hash collision ➞ $1-(10!/10^{10})$\n",
" - the expected number of hash collisions ➞ $10(1 - (1-1/10)^{10})$\n",
" - **[Quora Reference](https://www.quora.com/Your-hash-function-assigns-each-object-to-a-number-between-1-10-each-with-equal-probability-With-10-objects-what-is-the-probability-of-a-hash-collision-What-is-the-expected-number-of-hash-collisions-What-is-the-expected-number-of-hashes-that-are-unused)**\n",
" - the expected number of hashes that are unused ➞ $10*(9/10)^{10}$"
]
},
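{
"cell_type": "markdown",
"metadata": {},
"source": [
"The closed-form expressions above can be evaluated with a few lines of Python. This is a sketch under the question's assumption that hashes are independent and uniform; the name `hash_stats` is hypothetical, and `math.perm` requires Python 3.8 or later.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def hash_stats(n_objects=10, n_buckets=10):\n",
"    '''Return (P(at least one collision), expected unused hashes, expected used hashes).'''\n",
"    # P(no collision) = n_buckets! / ((n_buckets - n_objects)! * n_buckets ** n_objects)\n",
"    p_no_collision = math.perm(n_buckets, n_objects) / n_buckets ** n_objects\n",
"    # A given hash stays unused when all n_objects miss it.\n",
"    expected_unused = n_buckets * (1 - 1 / n_buckets) ** n_objects\n",
"    return 1 - p_no_collision, expected_unused, n_buckets - expected_unused\n",
"\n",
"print(hash_stats())\n",
"```"
]
},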
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 11. You call 2 UberX’s and 3 Lyfts. If the time that each takes to reach you is IID, what is the probability that all the Lyfts arrive first? What is the probability that all the UberX’s arrive first?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Lyfts arrive first ➞ $2! * 3! / 5!$\n",
" - Ubers arrive first ➞ same"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 12. I write a program should print out all the numbers from 1 to 300, but prints out Fizz instead if the number is divisible by 3, Buzz instead if the number is divisible by 5, and FizzBuzz if the number is divisible by 3 and 5. What is the total number of numbers that is either Fizzed, Buzzed, or FizzBuzzed?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - $100+60-20=140$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 13. On a dating site, users can select 5 out of 24 adjectives to describe themselves. A match is declared between two users if they match on at least 4 adjectives. If Alice and Bob randomly pick adjectives, what is the probability that they form a match?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - $= 24C5*(1+5(24-5))/24C5*24C5$ \n",
" - $= 4/1771$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 14. A lazy high school senior types up application and envelopes to `n` different colleges, but puts the applications randomly into the envelopes. What is the expected number of applications that went to the right college?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 15. Let’s say you have a very tall father. On average, what would you expect the height of his son to be? Taller, equal, or shorter? What if you had a very short father?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Shorter. Regression to the mean"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 16. What’s the expected number of coin flips until you get two heads in a row? What’s the expected number of coin flips until you get two tails in a row?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - $x = 0.25 * 2 + 0.25 * (x + 2) + 0.5 * (x + 1)$ \n",
" - $x = 6$\n",
" - **[Quora Reference](https://www.quora.com/What-is-the-expected-number-of-coin-flips-until-you-get-two-heads-in-a-row)**"
]
},
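{
"cell_type": "markdown",
"metadata": {},
"source": [
"The recurrence above can be checked with a quick Monte Carlo simulation. This is only a sanity check, assuming a fair coin; the function name `flips_until_two_heads` and the number of trials are illustrative.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"def flips_until_two_heads(rng=random):\n",
"    '''Count fair-coin flips until two heads appear in a row.'''\n",
"    flips, streak = 0, 0\n",
"    while streak < 2:\n",
"        flips += 1\n",
"        # A head extends the current streak; a tail resets it to zero.\n",
"        streak = streak + 1 if rng.random() < 0.5 else 0\n",
"    return flips\n",
"\n",
"trials = 50000\n",
"print(sum(flips_until_two_heads() for _ in range(trials)) / trials)\n",
"```\n",
"\n",
"The printed average should land close to the value obtained from the recurrence."
]
},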
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 17. Let’s say we play a game where I keep flipping a coin until I get heads. If the first time I get heads is on the nth coin, then I pay you `2n-1` dollars. How much would you pay me to play this game?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - less than $3\n",
" - **[Quora reference](https://www.quora.com/I-will-flip-a-coin-until-I-get-my-first-heads-I-will-then-pay-you-2-n-1-where-n-is-the-total-number-of-coins-I-flipped-How-much-would-you-pay-me-to-play-this-game-You-can-only-play-once)**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 18. You have two coins, one of which is fair and comes up heads with a probability 1/2, and the other which is biased and comes up heads with probability 3/4. You randomly pick coin and flip it twice, and get heads both times. What is the probability that you picked the fair coin?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - 4/13\n",
" - Bayesian method"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 19. You have a 0.1% chance of picking up a coin with both heads, and a 99.9% chance that you pick up a fair coin. You flip your coin and it comes up heads 10 times. What’s the chance that you picked up the fair coin, given the information that you observed?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Bayesian method"
]
},
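{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Bayesian calculation for this question and the previous one can be done exactly with `fractions.Fraction`. The helper `posterior_fair` and its parameter names are hypothetical; it assumes only two candidate coins, one of which is fair.\n",
"\n",
"```python\n",
"from fractions import Fraction\n",
"\n",
"def posterior_fair(prior_fair, heads_prob_other, n_heads):\n",
"    '''P(fair coin | n_heads heads in a row) via Bayes' rule.'''\n",
"    prior_fair = Fraction(prior_fair)\n",
"    like_fair = Fraction(1, 2) ** n_heads                # P(data | fair coin)\n",
"    like_other = Fraction(heads_prob_other) ** n_heads  # P(data | other coin)\n",
"    numerator = prior_fair * like_fair\n",
"    return numerator / (numerator + (1 - prior_fair) * like_other)\n",
"\n",
"# Question 18: fair coin vs. a 3/4-biased coin, equal priors, two heads.\n",
"print(posterior_fair(Fraction(1, 2), Fraction(3, 4), 2))\n",
"# Question 19: 99.9% fair coin vs. 0.1% two-headed coin, ten heads.\n",
"print(float(posterior_fair(Fraction(999, 1000), 1, 10)))\n",
"```"
]
},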
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 20. What is a P-Value ?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- https://en.wikipedia.org/wiki/P-value"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
================================================
FILE: 05_Statistical_Inference.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<small><small><i>\n",
"All the IPython Notebooks in **Data Science Interview Questions** lecture series by **[Dr. Milaan Parmar](https://www.linkedin.com/in/milaanparmar/)** are available @ **[GitHub](https://github.com/milaan9/DataScience_Interview_Questions)**\n",
"</i></small></small>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Statistical Inference ➞ <span class='label label-default'>15 Questions</span>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. In an A/B test, how can you check if assignment to the various buckets was truly random?"
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2021-09-21T13:31:28.708336Z",
"start_time": "2021-09-21T13:31:28.699521Z"
}
},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Plot the distributions of multiple features for both A and B and make sure that they have the same shape. More rigorously, we can conduct a permutation test to see if the distributions are the same.\n",
" - MANOVA to compare different means"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. What might be the benefits of running an A/A test, where you have two buckets who are exposed to the exact same product?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Verify the sampling algorithm is random."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. What would be the hazards of letting users sneak a peek at the other bucket in an A/B test?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - The user might not act the same suppose had they not seen the other bucket. You are essentially adding additional variables of whether the user peeked the other bucket, which are not random across groups."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. What would be some issues if blogs decide to cover one of your experimental groups?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Same as the previous question. The above problem can happen in larger scale."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. How would you conduct an A/B test on an opt-in feature? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Ask someone for more details."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. How would you run an A/B test for many variants, say 20 or more?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - one control, 20 treatment, if the sample size for each group is big enough.\n",
" - Ways to attempt to correct for this include changing your confidence level (e.g. Bonferroni Correction) or doing family-wide tests before you dive in to the individual metrics (e.g. Fisher's Protected LSD)."
]
},
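{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustration of the Bonferroni correction mentioned above: with `m` comparisons, each one is tested at level `alpha / m`. The function name `bonferroni_reject` and the p-values below are made up for the example.\n",
"\n",
"```python\n",
"def bonferroni_reject(p_values, alpha=0.05):\n",
"    '''Return, for each p-value, whether it is significant after Bonferroni correction.'''\n",
"    threshold = alpha / len(p_values)  # each test is run at level alpha / m\n",
"    return [p <= threshold for p in p_values]\n",
"\n",
"# Hypothetical p-values for 20 treatment-vs-control comparisons.\n",
"p_values = [0.001, 0.04, 0.03] + [0.5] * 17\n",
"print(bonferroni_reject(p_values))\n",
"```\n",
"\n",
"The correction is conservative, so with 20 variants it trades statistical power for control of the family-wise error rate."
]
},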
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7. How would you run an A/B test if the observations are extremely right-skewed?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - lower the variability by modifying the KPI\n",
" - cap values\n",
" - percentile metrics\n",
" - log transform\n",
" - <https://www.quora.com/How-would-you-run-an-A-B-test-if-the-observations-are-extremely-right-skewed>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 8. I have two different experiments that both change the sign-up button to my website. I want to test them at the same time. What kinds of things should I keep in mind?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - exclusive ➞ ok"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 9. What is a p-value? What is the difference between type-1 and type-2 error?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - **[en.wikipedia.org/wiki/P-value](https://en.wikipedia.org/wiki/P-value)**\n",
" - type-1 error: rejecting Ho when Ho is true\n",
" - type-2 error: not rejecting Ho when Ha is true"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10. You are AirBnB and you want to test the hypothesis that a greater number of photographs increases the chances that a buyer selects the listing. How would you test this hypothesis?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - For randomly selected listings with more than 1 pictures, hide 1 random picture for group A, and show all for group B. Compare the booking rate for the two groups.\n",
" - Ask someone for more details."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 11. How would you design an experiment to determine the impact of latency on user engagement?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - The best way I know to quantify the impact of performance is to isolate just that factor using a slowdown experiment, i.e., add a delay in an A/B test."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 12. 12. What is maximum likelihood estimation? Could there be any case where it doesn’t exist?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - A method for parameter optimization (fitting a model). We choose parameters so as to maximize the likelihood function (how likely the outcome would happen given the current data and our model).\n",
" - maximum likelihood estimation (MLE) is a method of **[estimating](https://en.wikipedia.org/wiki/Estimator \"Estimator\")** the **[parameters](https://en.wikipedia.org/wiki/Statistical_parameter \"Statistical parameter\")** of a **[statistical model](https://en.wikipedia.org/wiki/Statistical_model \"Statistical model\")** given observations, by finding the parameter values that maximize the **[likelihood](https://en.wikipedia.org/wiki/Likelihood \"Likelihood\")** of making the observations given the parameters. MLE can be seen as a special case of the **[maximum a posteriori estimation](https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation \"Maximum a posteriori estimation\")** (MAP) that assumes a **[uniform](https://en.wikipedia.org/wiki/Uniform_distribution_\\(continuous\\) \"Uniform distribution \\(continuous\\)\")** **[prior distribution](https://en.wikipedia.org/wiki/Prior_probability \"Prior probability\")** of the parameters, or as a variant of the MAP that ignores the prior and which therefore is **[unregularized](https://en.wikipedia.org/wiki/Regularization_\\(mathematics\\) \"Regularization \\(mathematics\\)\")**.\n",
" - for gaussian mixtures, non parametric models, it doesn’t exist"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 13. What’s the difference between a MAP, MOM, MLE estimator? In which cases would you want to use each?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - MAP estimates the posterior distribution given the prior distribution and data which maximizes the likelihood function. MLE is a special case of MAP where the prior is uninformative uniform distribution.\n",
" - MOM sets moment values and solves for the parameters. MOM is not used much anymore because maximum likelihood estimators have higher probability of being close to the quantities to be estimated and are more often unbiased."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 14. What is a confidence interval and how do you interpret it?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - For example, 95% confidence interval is an interval that when constructed for a set of samples each sampled in the same way, the constructed intervals include the true mean 95% of the time.\n",
" - if confidence intervals are constructed using a given confidence level in an infinite number of independent experiments, the proportion of those intervals that contain the true value of the parameter will match the confidence level."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 15. What is unbiasedness as a property of an estimator? Is this always a desirable property when performing inference? What about in data analysis or predictive modeling?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Unbiasedness means that the expectation of the estimator is equal to the population value we are estimating. This is desirable in inference because the goal is to explain the dataset as accurately as possible. However, this is not always desirable for data analysis or predictive modeling as there is the bias variance tradeoff. We sometimes want to prioritize the generalizability and avoid overfitting by reducing variance and thus increasing bias."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
================================================
FILE: 06_Data_Analysis.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<small><small><i>\n",
"All the IPython Notebooks in **Data Science Interview Questions** lecture series by **[Dr. Milaan Parmar](https://www.linkedin.com/in/milaanparmar/)** are available @ **[GitHub](https://github.com/milaan9/DataScience_Interview_Questions)**\n",
"</i></small></small>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Analysis ➞ <span class='label label-default'>27 Questions</span>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. (Given a Dataset) Analyze this dataset and tell me what you can learn from it."
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2021-09-21T13:31:28.708336Z",
"start_time": "2021-09-21T13:31:28.699521Z"
}
},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Typical data cleaning and visualization."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. What is `R2`? What are some other metrics that could be better than `R2` and why?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- goodness of fit measure. variance explained by the regression / total variance.\n",
" \n",
" - the more predictors you add, the higher $R^2$ becomes.\n",
" - hence use adjusted $R^2$ which adjusts for the degrees of freedom. \n",
" - or train error metrics."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. What is the curse of dimensionality?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- High dimensionality makes clustering hard, because having lots of dimensions means that everything is **\"far away\"** from everything else.\n",
"    - For example, to cover a fixed fraction of the volume of the data, we need to capture an increasingly wide range of each variable as the number of variables grows.\n",
"    - Almost all samples lie close to the edge of the sample. This is bad news because prediction is much more difficult near the edges of the training sample.\n",
"    - The sampling density decreases exponentially as the number of dimensions $p$ increases, so the data become much sparser unless the sample size grows dramatically.\n",
"    - Dimensionality reduction (e.g., PCA) can help."
]
},
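{
"cell_type": "markdown",
"metadata": {},
"source": [
"The volume argument above can be made concrete with a small calculation. Assuming the data are spread uniformly over a unit hypercube in $p$ dimensions, a sub-cube that captures a fraction $r$ of the volume needs side length $r^{1/p}$. The helper name `edge_length` is hypothetical.\n",
"\n",
"```python\n",
"def edge_length(fraction, p):\n",
"    # Side length of a sub-cube holding `fraction` of a unit hypercube's volume.\n",
"    return fraction ** (1.0 / p)\n",
"\n",
"\n",
"# How much of each variable's range is needed to capture 10% of the data?\n",
"for p in (1, 2, 10, 100):\n",
"    print(p, round(edge_length(0.10, p), 3))\n",
"```\n",
"\n",
"Under this uniform assumption, capturing 10% of the data in 10 dimensions already requires roughly 80% of the range of every variable, so a neighborhood of that size is no longer local."
]
},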
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Is more data always better?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- **Statistically**\n",
"    - It depends on the quality of your data. For example, if your data is biased, just getting more data won’t help.\n",
"    - It depends on your model. If your model suffers from high bias, getting more data won’t improve your test results beyond a point. You’d need to add more features, etc.\n",
"\n",
"- **Practically**\n",
"    - More data usually benefits the model.\n",
"    - There is also a tradeoff between having more data and the additional storage, computational power, and memory it requires. Hence, always think about the cost of having more data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. What are advantages of plotting your data before performing analysis?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Data sets have errors. You won't find them all, but you might find some: that 212-year-old man, that 9-foot-tall woman.\n",
"- Variables can have skewness, outliers, etc. Then the arithmetic mean might not be a useful summary, and neither is the standard deviation.\n",
"- Variables can be multimodal! If a variable is multimodal, then anything based on its mean or median is going to be suspect."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. How can you make sure that you don’t analyze something that ends up meaningless?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Proper exploratory data analysis.\n",
"    - In every data analysis task, there's the exploratory phase, where you're just graphing things, testing things on small subsets of the data, summarizing simple statistics, and getting rough ideas of what hypotheses you might want to pursue further.\n",
"    - Then there's the exploitation phase, where you look deeply into a set of hypotheses.\n",
"    - The exploratory phase will generate lots of possible hypotheses, and the exploitation phase will let you really understand a few of them. Balance the two and you'll avoid wasting time on many, although not all, of the things that end up meaningless."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7. What is the role of trial and error in data analysis? What is the role of making a hypothesis before diving in?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Data analysis is a repeated cycle of setting up a new hypothesis and trying to refute the corresponding null hypothesis.\n",
"- The scientific method is eminently inductive: we elaborate a hypothesis, test it, and either refute it or not. As a result, we come up with new hypotheses, which are in turn tested, and so on. This is an iterative process, as science always is."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 8. How can you determine which features are the most important in your model?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- In linear regression, look at the p-values of the coefficients.\n",
"- Run the features through a Gradient Boosting Machine or Random Forest to generate plots of relative importance and information gain for each feature in the ensemble.\n",
"- Look at the order in which variables are added in forward variable selection."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 9. How do you deal with some of your predictors being missing?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Remove rows with missing values. This works well if\n",
"    - the values are missing at random (see [Vinay Prabhu's answer](https://www.quora.com/How-can-I-deal-with-missing-values-in-a-predictive-model/answer/Vinay-Prabhu-7) for more details on this), and\n",
"    - you don't lose too much of the dataset after doing so.\n",
"- Build another predictive model to predict the missing values.\n",
"    - This could be a whole project in itself, so simple techniques are usually used here.\n",
"- Use a model that can incorporate missing data.\n",
"    - For example, some tree-based methods (including some random forest implementations) can handle missing values directly."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10. You have several variables that are positively correlated with your response, and you think combining all of the variables could give you a good prediction of your response. However, you see that in the multiple linear regression, one of the weights on the predictors is negative. What could be the issue?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - The likely issue is multicollinearity: a situation in which two or more explanatory variables in a [multiple regression](https://en.wikipedia.org/wiki/Multiple_regression \"Multiple regression\") model are highly linearly related. The individual coefficients then become unstable and can even take the opposite sign from the marginal correlation.\n",
" - One option is to leave the model as is, despite multicollinearity. The presence of multicollinearity doesn't affect the efficiency of extrapolating the fitted model to new data, provided that the predictor variables follow the same pattern of multicollinearity in the new data as in the data on which the regression model is based.\n",
" - Another option is principal component regression."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 11. Let’s say you’re given an unfeasible amount of predictors in a predictive modeling task. What are some ways to make the prediction more feasible?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - PCA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 12. Now you have a feasible amount of predictors, but you’re fairly sure that you don’t need all of them. How would you perform feature selection on the dataset?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Regularized regression: lasso and elastic net can shrink coefficients exactly to zero and thereby select features; ridge only shrinks coefficients and does not remove features by itself.\n",
" - Univariate feature selection, where a statistical test is applied to each feature individually. You retain only the best features according to the test scores.\n",
" - Recursive feature elimination:\n",
"     - First, train a model with all the features and evaluate its performance on held-out data.\n",
"     - Then drop, say, the 10% weakest features (e.g., the features with the smallest absolute coefficients in a linear model) and retrain on the remaining features.\n",
"     - Iterate until you observe a sharp drop in the predictive accuracy of the model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 13. Your linear regression didn’t run and communicates that there are an infinite number of best estimates for the regression coefficients. What could be wrong?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - There are more predictors than observations ($p > n$), so the least-squares problem is underdetermined.\n",
" - Some of the explanatory variables are perfectly correlated (positively or negatively), so the coefficients are not unique."
]
},
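{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why perfectly correlated predictors leave the coefficients non-unique: least squares has a unique solution only when $X^T X$ is invertible, and $X^T X$ is singular when one column is a multiple of another. A minimal sketch for two predictors without an intercept (the helper name `gram_determinant` and the toy columns are assumptions for illustration):\n",
"\n",
"```python\n",
"def gram_determinant(x1, x2):\n",
"    # Determinant of X^T X for the two-column design matrix X = [x1, x2].\n",
"    a = sum(v * v for v in x1)\n",
"    b = sum(u * v for u, v in zip(x1, x2))\n",
"    d = sum(v * v for v in x2)\n",
"    return a * d - b * b\n",
"\n",
"\n",
"x1 = [1.0, 2.0, 3.0, 4.0]\n",
"\n",
"# x2 is exactly 2 * x1, so the columns are perfectly correlated.\n",
"print(gram_determinant(x1, [2 * v for v in x1]))\n",
"\n",
"# A column that is not a multiple of x1 gives a non-zero determinant.\n",
"print(gram_determinant(x1, [1.0, 0.0, 2.0, 1.0]))\n",
"```\n",
"\n",
"A zero determinant means the normal equations have infinitely many solutions, which is what the software is reporting."
]
},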
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 14. You run your regression on different subsets of your data, and find that in each subset, the beta value for a certain variable varies wildly. What could be the issue here?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - The dataset might be heterogeneous. In that case, it is recommended to cluster the dataset into subsets wisely and fit a separate model to each subset, or to use non-parametric models (e.g., trees), which can deal with heterogeneity quite nicely."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 15. What is the main idea behind ensemble learning? If I had many different models that predicted the same response variable, what might I want to do to incorporate all of the models? Would you expect this to perform better than an individual model or worse?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - The assumption is that a group of weak learners can be combined to form a strong learner.\n",
" - Hence, the combined model is expected to perform better than an individual model.\n",
" - Assumptions:\n",
"     - average out biases\n",
"     - reduce variance\n",
" - Bagging works because some underlying learning algorithms are unstable: slightly different inputs lead to very different outputs. If you can take advantage of this instability by running multiple instances, it can be shown that the reduced instability leads to lower error. If you want to understand why, the original bagging paper ([http://www.springerlink.com/](http://www.springerlink.com/content/l4780124w2874025/)) has a section called \"why bagging works\".\n",
" - Boosting works because of the focus on better defining the \"decision edge\". By re-weighting examples near the margin (the positive and negative examples), you get a reduced error (see http://citeseerx.ist.psu.edu/vie...).\n",
" - To incorporate all of the models, use the outputs of your models as inputs to a meta-model.\n",
"\n",
"**For example:** if you're doing binary classification, you can use all the probability outputs of your individual models as inputs to a final logistic regression (or any model, really) that can combine the probability estimates.\n",
"\n",
"One very important point is to make sure that the outputs of your models are out-of-sample predictions. This means that the predicted value for any row in your data frame should NOT depend on the actual value for that row."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 16. Given that you have wi-fi data in your office, how would you determine which rooms and areas are underutilized and over-utilized?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - If wi-fi usage is consistently higher in one room than in the others, that room is a candidate for being over-utilized.\n",
" - Account for room capacity and normalize the usage by it, so that rooms are compared by usage relative to capacity."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 17. How could you use GPS data from a car to determine the quality of a driver?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Speed\n",
" - Driving paths"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 18. Given accelerometer, altitude, and fuel usage data from a car, how would you determine the optimum acceleration pattern to drive over hills?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Historical data?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 19. Given position data of NBA players in a season’s games, how would you evaluate a basketball player’s defensive ability?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Evaluate his positions in the court."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 20. How would you quantify the influence of a Twitter user?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - like page rank with each user corresponding to the webpages and linking to the page equivalent to following."
]
},
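{
"cell_type": "markdown",
"metadata": {},
"source": [
"The PageRank analogy above can be sketched with a few lines of power iteration. This is a minimal sketch, not Twitter's actual method: the function name `pagerank`, the damping factor of 0.85, the iteration count, and the toy follower graph are all assumptions for illustration. A follow from `u` to `v` is treated as a link from `u` to `v`.\n",
"\n",
"```python\n",
"def pagerank(follows, damping=0.85, iterations=50):\n",
"    # follows[u] is the list of users that u follows (u 'links to' them).\n",
"    users = sorted(set(follows) | {v for vs in follows.values() for v in vs})\n",
"    n = len(users)\n",
"    rank = {u: 1.0 / n for u in users}\n",
"    for _ in range(iterations):\n",
"        new_rank = {u: (1.0 - damping) / n for u in users}\n",
"        for u in users:\n",
"            targets = follows.get(u, [])\n",
"            if targets:\n",
"                # Split u's rank evenly among the users that u follows.\n",
"                share = damping * rank[u] / len(targets)\n",
"                for v in targets:\n",
"                    new_rank[v] += share\n",
"            else:\n",
"                # A user who follows nobody spreads rank evenly over everyone.\n",
"                for v in users:\n",
"                    new_rank[v] += damping * rank[u] / n\n",
"        rank = new_rank\n",
"    return rank\n",
"\n",
"\n",
"# Hypothetical follower graph.\n",
"follows = {'ann': ['cat'], 'bob': ['cat'], 'cat': ['ann'], 'dan': ['cat', 'ann']}\n",
"scores = pagerank(follows)\n",
"print(max(scores, key=scores.get))\n",
"```\n",
"\n",
"Unlike a raw follower count, this score gives more weight to a follow from an influential user than to a follow from an account that nobody follows."
]
},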
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 21. Given location data of golf balls in games, how would you construct a model that can advise golfers where to aim?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - winning probability for different positions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 22. You have 100 mathletes and 100 math problems. Each mathlete gets to choose 10 problems to solve. Given data on who got what problem correct, how would you rank the problems in terms of difficulty?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - One way you could do this is by storing a \"skill level\" for each user and a \"difficulty level\" for each problem. We assume that the probability that a user solves a problem depends only on the skill of the user and the difficulty of the problem. Then we maximize the likelihood of the data to find the hidden skill and difficulty levels.\n",
" - The Rasch model for dichotomous data takes the form:\n",
"\n",
"$$ \\Pr\\{X_{ni}=1\\} = \\frac{\\exp(\\beta_{n}-\\delta_{i})}{1+\\exp(\\beta_{n}-\\delta_{i})} $$\n",
"\n",
"where $\\beta_{n}$ is the ability of person $n$ and $\\delta_{i}$ is the difficulty of item $i$."
]
},
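{
"cell_type": "markdown",
"metadata": {},
"source": [
"The maximum-likelihood idea above can be sketched with plain gradient ascent on the Rasch log-likelihood. This is a minimal sketch under stated assumptions: the function name `fit_rasch`, the learning rate, the epoch count, and the small L2 penalty (added so that estimates stay finite when a problem is solved by everyone, or by no one, who attempts it) are illustrative choices, and the attempt data are made up.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def fit_rasch(attempts, n_people, n_items, lr=0.1, epochs=500, l2=0.01):\n",
"    # attempts: list of (person, item, correct) with correct in {0, 1}.\n",
"    ability = [0.0] * n_people\n",
"    difficulty = [0.0] * n_items\n",
"    for _ in range(epochs):\n",
"        # Gradient of the L2-penalized log-likelihood.\n",
"        grad_a = [-l2 * a for a in ability]\n",
"        grad_d = [-l2 * d for d in difficulty]\n",
"        for n, i, x in attempts:\n",
"            p = 1.0 / (1.0 + math.exp(-(ability[n] - difficulty[i])))\n",
"            grad_a[n] += x - p\n",
"            grad_d[i] -= x - p\n",
"        ability = [a + lr * g for a, g in zip(ability, grad_a)]\n",
"        difficulty = [d + lr * g for d, g in zip(difficulty, grad_d)]\n",
"    return ability, difficulty\n",
"\n",
"\n",
"# Hypothetical data: only the problems each person chose to attempt appear.\n",
"attempts = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0),\n",
"            (2, 0, 1), (2, 1, 0), (2, 2, 0), (0, 2, 0)]\n",
"ability, difficulty = fit_rasch(attempts, n_people=3, n_items=3)\n",
"\n",
"# Rank problems from hardest to easiest by estimated difficulty.\n",
"print(sorted(range(3), key=lambda i: difficulty[i], reverse=True))\n",
"```\n",
"\n",
"Because each mathlete picks their own 10 problems, a raw solve rate confounds problem difficulty with who chose to attempt it; fitting ability and difficulty jointly is one way to separate the two."
]
},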
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 23. You have 5000 people that rank 10 sushis in terms of saltiness. How would you aggregate this data to estimate the true saltiness rank in each sushi?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Some people would take the mean rank of each sushi. If I wanted something simple, I would use the median, since ranks are (strictly speaking) ordinal and not interval, so adding them is a bit risky (but people do it all the time and you probably won't be far wrong)."
]
},
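{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the median-rank aggregation described above, using only the standard library. The function name `aggregate_ranks`, the data layout (one dict per rater), and the three-item toy data are assumptions for illustration.\n",
"\n",
"```python\n",
"from statistics import median\n",
"\n",
"\n",
"def aggregate_ranks(rankings):\n",
"    # rankings: list of dicts mapping item -> rank from one rater (1 = least salty).\n",
"    items = list(rankings[0])\n",
"    median_rank = {item: median(r[item] for r in rankings) for item in items}\n",
"    # Order items by their median rank across all raters.\n",
"    return sorted(items, key=lambda item: median_rank[item])\n",
"\n",
"\n",
"# Hypothetical rankings from three raters.\n",
"rankings = [\n",
"    {'tuna': 1, 'eel': 2, 'roe': 3},\n",
"    {'tuna': 1, 'eel': 3, 'roe': 2},\n",
"    {'tuna': 2, 'eel': 1, 'roe': 3},\n",
"]\n",
"print(aggregate_ranks(rankings))\n",
"```\n",
"\n",
"With many raters, ties in the median become common, so a tie-breaking rule (for example, the mean rank) would be needed in practice."
]
},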
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 24. Given data on congressional bills and which congressional representatives co-sponsored the bills, how would you determine which other representatives are most similar to yours in voting behavior? How would you evaluate who is the most liberal? Most republican? Most bipartisan?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - Use a collaborative-filtering approach: take your representative's record, calculate its similarity to every other representative's record, and select the most similar representatives.\n",
" - For the liberal and Republican sides, find the mean vector of each group and find the representative closest to that center point."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 25. How would you come up with an algorithm to detect plagiarism in online content?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - reduce the text to a more compact form (e.g. fingerprinting, bag of words) then compare those with other texts by calculating the similarity."
]
},
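{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to sketch the fingerprint-and-compare idea above is word shingling with Jaccard similarity. This is a swapped-in concrete technique chosen for illustration; the function names `shingles` and `jaccard`, the shingle size `k=3`, and the example sentences are assumptions.\n",
"\n",
"```python\n",
"def shingles(text, k=3):\n",
"    # Set of k-word shingles (overlapping word windows) from lower-cased text.\n",
"    words = text.lower().split()\n",
"    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}\n",
"\n",
"\n",
"def jaccard(a, b):\n",
"    # Jaccard similarity: size of the intersection over size of the union.\n",
"    if not a and not b:\n",
"        return 0.0\n",
"    return len(a & b) / len(a | b)\n",
"\n",
"\n",
"original = 'the quick brown fox jumps over the lazy dog'\n",
"suspect = 'a quick brown fox jumps over a sleepy dog'\n",
"print(round(jaccard(shingles(original), shingles(suspect)), 3))\n",
"```\n",
"\n",
"Documents whose similarity exceeds a chosen threshold would be flagged for manual review; the threshold itself has to be tuned on known examples."
]
},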
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 26. You have data on all purchases of customers at a grocery store. Describe to me how you would program an algorithm that would cluster the customers into groups. How would you determine the appropriate number of clusters to include?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - K-means\n",
" - choose a small value of k that still has a low SSE (elbow method)\n",
" - [Elbow method](https://bl.ocks.org/rpgove/0060ff3b656618e9136b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 27. Let’s say you’re building the recommended music engine at Spotify to recommend people music based on past listening history. How would you approach this problem?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" - content-based filtering\n",
" - collaborative filtering"
]
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
================================================
FILE: 07_Product_Metrics.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<small><small><i>\n",
"All the IPython Notebooks in **Data Science Interview Questions** lecture series by **[Dr. Milaan Parmar](https://www.linkedin.com/in/milaanparmar/)** are available @ **[GitHub](https://github.com/milaan9/DataScience_Interview_Questions)**\n",
"</i></small></small>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Product Metrics ➞ <span class='label label-default'>15 Questions</span>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. What would be good metrics of success for an advertising-driven consumer product? (Buzzfeed, YouTube, Google Search, etc.) A service-driven consumer product? (Uber, Flickr, Venmo, etc.)"
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2021-09-21T13:31:28.708336Z",
"start_time": "2021-09-21T13:31:28.699521Z"
}
},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * advertising-driven: Page-views and daily actives, CTR, CPC (cost per click)\n",
" * click-ads \n",
" * display-ads \n",
" * service-driven: number of purchases, conversion rate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. What would be good metrics of success for a productivity tool? (Evernote, Asana, Google Docs, etc.) A MOOC? (edX, Coursera, Udacity, etc.)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * Productivity tool: same as premium subscriptions\n",
" * MOOC: same as premium subscriptions, completion rate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. What would be good metrics of success for an e-commerce product? (Etsy, Groupon, Birchbox, etc.) A subscription product? (Netflix, Birchbox, Hulu, etc.) Premium subscriptions? (OKCupid, LinkedIn, Spotify, etc.)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * e-commerce: number of purchases; conversion rate; hourly, daily, weekly, monthly, quarterly, and annual sales; cost of goods sold; inventory levels; site traffic; unique versus returning visitors; customer service phone call count; average resolution time\n",
" * subscription:\n",
"     * churn, cost of customer acquisition (CoCA), average revenue per user (ARPU), monthly recurring revenue (MRR), lifetime value (LTV)\n",
" * premium subscriptions:\n",
"     * subscription rate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. What would be good metrics of success for a consumer product that relies heavily on engagement and interaction? (Snapchat, Pinterest, Facebook, etc.) A messaging product? (GroupMe, Hangouts, Snapchat, etc.)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * engagement and interaction: active-user (AU) ratios, email summary by type, push notification summary by type, resurrection ratio\n",
" * messaging product:\n",
"     * daily and monthly active users"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. What would be good metrics of success for a product that offered in-app purchases? (Zynga, Angry Birds, other gaming apps)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * Average Revenue Per Paid User\n",
" * Average Revenue Per User"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. A certain metric is violating your expectations by going down or up more than you expect. How would you try to identify the cause of the change?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * Break the KPI down into its components and find where the change is.\n",
" * Then break that component down further by channel, user cluster, etc., and relate the change to any campaigns or changes in user behavior in that segment."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7. Growth for total number of tweets sent has been slow this month. What data would you look at to determine the cause of the problem?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * Historical data, especially for the same month in previous years\n",
" * External data, such as economic data, political data, and data about competitors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 8. You’re a restaurant and are approached by Groupon to run a deal. What data would you ask from them in order to determine whether or not to do the deal?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * For similar restaurants (they should define similarity): the average increase in revenue per coupon and the average increase in customers per coupon"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 9. You are tasked with improving the efficiency of a subway system. Where would you start?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * define efficiency"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10. Say you are working on Facebook News Feed. What would be some metrics that you think are important? How would you make the news each person gets more relevant?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * Metrics: rate for each action, duration users stay, CTR for sponsored feed posts\n",
" * Relevance (ref. News Feed Optimization):\n",
"     * Affinity score: how close the content creator and the user are\n",
"     * Weight: weight for the edge type (comment, like, tag, etc.), with emphasis on features the company wants to promote\n",
"     * Time decay: the older the content, the less important it is"
]
},
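{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three factors above (affinity, edge weight, time decay) can be combined into a single relevance score. The sketch below multiplies them per interaction and sums over interactions; the exponential half-life decay, the function name `story_score`, and all numbers are assumptions for illustration, not Facebook's actual formula.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def story_score(edges, half_life_hours=24.0):\n",
"    # edges: list of (affinity, weight, age_hours), one per interaction on a story.\n",
"    decay_rate = math.log(2) / half_life_hours\n",
"    # Each interaction contributes affinity * weight, discounted by its age.\n",
"    return sum(a * w * math.exp(-decay_rate * age) for a, w, age in edges)\n",
"\n",
"\n",
"fresh = [(0.8, 2.0, 0.0)]   # a comment made just now by a close friend\n",
"stale = [(0.8, 2.0, 24.0)]  # the same interaction, one half-life old\n",
"print(story_score(fresh), story_score(stale))\n",
"```\n",
"\n",
"Stories would then be ordered by this score, so an older story needs stronger affinity or heavier edge types to outrank a fresh one."
]
},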
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 11. How would you measure the impact that sponsored stories on Facebook News Feed have on user engagement? How would you determine the optimum balance between sponsored stories and organic content on a user’s News Feed?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * A/B test different ratios of sponsored to organic content and compare user engagement across the variants."
]
},
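{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to compare two variants of such an A/B test is a two-proportion z-test. The sketch below assumes engagement is recorded as a yes/no outcome per user (for example, whether the user interacted with the feed at least once); the function name `two_proportion_z` and the counts are made up for illustration.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def two_proportion_z(successes_a, n_a, successes_b, n_b):\n",
"    # z statistic and two-sided p-value for H0: both variants share one rate.\n",
"    p_a, p_b = successes_a / n_a, successes_b / n_b\n",
"    pooled = (successes_a + successes_b) / (n_a + n_b)\n",
"    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))\n",
"    z = (p_b - p_a) / se\n",
"    # Two-sided tail probability of the standard normal distribution.\n",
"    p_value = math.erfc(abs(z) / math.sqrt(2))\n",
"    return z, p_value\n",
"\n",
"\n",
"# Hypothetical counts: engaged users out of 10,000 in each variant.\n",
"z, p = two_proportion_z(480, 10000, 540, 10000)\n",
"print(round(z, 2), round(p, 3))\n",
"```\n",
"\n",
"Repeating the comparison across several ratios gives an engagement curve, and the ratio can then be chosen by trading engagement against ad revenue."
]
},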
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 12. You are on the data science team at Uber and you are asked to start thinking about surge pricing. What would be the objectives of such a product and how would you start looking into this?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * there is a gradual step-function type scaling mechanism until that imbalance of requests-to-drivers is alleviated and then vice versa as too many drivers come online enticed by the surge pricing structure. \n",
" * I would bet the algorithm is custom tailored and calibrated to each location as price elasticities almost certainly vary across different cities depending on a huge multitude of variables: income, distance/sprawl, traffic patterns, car ownership, etc. With the massive troves of user data that Uber probably has collected, they most likely have tweaked the algorithms for each city to adjust for these varying sensitivities to surge pricing. Throw in some machine learning and incredibly rich data and you've got yourself an incredible, constantly-evolving algorithm. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 13. Say that you are Netflix. How would you determine what original series you should invest in and create?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * Netflix uses data to estimate the potential market size for an original series before giving it the go-ahead."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 14. What kind of services would find churn (metric that tracks how many customers leave the service) helpful? How would you calculate churn?"
]
},
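{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * Subscription-based services.\n",
" * A common definition: churn rate = customers lost during a period / customers at the start of that period."
]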
},
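{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of a churn calculation, assuming the common definition of churn rate as the number of customers lost during a period divided by the number of customers at the start of that period. The function name `churn_rate` and the figures are hypothetical.\n",
"\n",
"```python\n",
"def churn_rate(customers_at_start, customers_lost):\n",
"    # Fraction of the starting customer base that left during the period.\n",
"    if customers_at_start <= 0:\n",
"        raise ValueError('customers_at_start must be positive')\n",
"    return customers_lost / customers_at_start\n",
"\n",
"\n",
"# Hypothetical month: 2,000 subscribers at the start, 90 cancellations.\n",
"print(churn_rate(2000, 90))\n",
"```\n",
"\n",
"Other definitions exist (for example, dividing by the average number of customers over the period), so the definition should be fixed before comparing churn across teams or products."
]
},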
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 15. Let’s say that you’re scheduling content for a content provider on television. How would you determine the best times to schedule content?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
" * Based on similar product and the corresponding broadcast popularity"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
================================================
FILE: 08_Communication.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<small><small><i>\n",
"All the IPython Notebooks in **Data Science Interview Questions** lecture series by **[Dr. Milaan Parmar](https://www.linkedin.com/in/milaanparmar/)** are available @ **[GitHub](https://github.com/milaan9/DataScience_Interview_Questions)**\n",
"</i></small></small>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Communication ➞ <span class='label label-default'>5 Questions</span>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Explain to me a technical concept related to the role that you’re interviewing for."
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2021-09-21T13:31:28.708336Z",
"start_time": "2021-09-21T13:31:28.699521Z"
}
},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- AB test, PCA, data science, machine learning, neural networks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Introduce me to something you’re passionate about."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Data science"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. How would you explain an A/B test to an engineer with no statistics background? A linear regression?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- A/B testing, or more broadly, multivariate testing, is the testing of different elements of a user's experience to determine which variation helps the business achieve its goal more effectively (e.g., increasing conversions). The element being tested can be copy on a web site, button colors, different user interfaces, different email subject lines, calls to action, offers, etc."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. How would you explain a confidence interval to an engineer with no statistics background? What does 95% confidence mean?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [link](https://www.quora.com/What-is-a-confidence-interval-in-laymans-terms)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. How would you explain to a group of senior executives why data is important?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span class='label label-default'>Solution</span>\n",
"\n",
"- Examples"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
================================================
FILE: 09_Coding.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<small><small><i>\n",
"All the IPython Notebooks in **Data Science Interview Questions** lecture series by **[Dr. Milaan Parmar](https://www.linkedin.com/in/milaanparmar/)** are available @ **[GitHub](https://github.com/milaan9/DataScience_Interview_Questions)**\n",
"</i></small></small>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Write a function to calculate all possible assignment vectors of `2n` users, where `n` users are assigned to group 0 (control), and `n` users are assigned to group 1 (treatment).\n",
"\n",
"<span class='label label-default'>Solution</span>"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]]\n"
]
}
],
"source": [
"def n_choose_k(n, k):\n",
"    \"\"\"Return every 0/1 vector of length n that contains exactly k ones.\"\"\"\n",
"    # Base case: a single 1 can sit in any of the n positions.\n",
"    if k == 1:\n",
"        ans = []\n",
"        for i in range(n):\n",
"            tmp = [0] * n\n",
"            tmp[i] = 1\n",
"            ans.append(tmp)\n",
"        return ans\n",
"\n",
"    # Base case: every position must be 1.\n",
"    if k == n:\n",
"        return [[1] * n]\n",
"\n",
"    ans = []\n",
"    # The first 1 can sit at index 0 through n - k.\n",
"    space = n - k + 1\n",
"    for i in range(space):\n",
"        assignment = [0] * (i + 1)\n",
"        assignment[i] = 1\n",
"        # Place the remaining k - 1 ones in the n - i - 1 later positions.\n",
"        for c in n_choose_k(n - i - 1, k - 1):\n",
"            ans.append(assignment + c)\n",
"    return ans\n",
"\n",
"# test: choose 2 from 4 (2n = 4 users, n = 2 in treatment)\n",
"print(n_choose_k(4, 2))"
]
},
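{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a cross-check, the same vectors can be generated with the standard library. This is a sketch rather than the notebook's original solution: `assignment_vectors` is a hypothetical helper that uses `itertools.combinations` to pick which `n` of the `2n` positions receive treatment.\n",
"\n",
"```python\n",
"from itertools import combinations\n",
"\n",
"def assignment_vectors(n):\n",
"    \"\"\"Hypothetical helper: all 0/1 vectors of length 2n with exactly n ones.\"\"\"\n",
"    vectors = []\n",
"    # Each combination lists the indices assigned to group 1 (treatment).\n",
"    for treated in combinations(range(2 * n), n):\n",
"        vec = [0] * (2 * n)\n",
"        for i in treated:\n",
"            vec[i] = 1\n",
"        vectors.append(vec)\n",
"    return vectors\n",
"\n",
"print(assignment_vectors(2))\n",
"# [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]]\n",
"```\n",
"\n",
"The number of vectors grows as the binomial coefficient $\\binom{2n}{n}$, so enumerating them is only practical for small `n`."
]
},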
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
================================================
FILE: 10_Linkedin_Skill_Assessment_Python.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<small><small><i>\n",
"All the IPython Notebooks in **Data Science Interview Questions** lecture series by **[Dr. Milaan Parmar](https://www.linkedin.com/in/milaanparmar/)** are available @ **[GitHub](https://github.com/milaan9/DataScience_Interview_Questions)**\n",
"</i></small></small>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# LinkedIn Skill Assessment\n",
"\n",
"## Python ➞ <span class='label label-default'>80 Questions</span>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q1. What is an abstract class?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] An abstract class is the name for any class from which you can instantiate an object.\n",
"- [ ] Abstract classes must be redefined any time an object is instantiated from them.\n",
"- [ ] Abstract classes must inherit from concrete classes.\n",
"- [x] An abstract class exists only so that other \"concrete\" classes can inherit from the abstract class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q2. What happens when you use the built-in function `any()` on a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] The **`any()`** function will randomly return any item from the list.\n",
"- [x] The **`any()`** function returns **`True`** if any item in the list evaluates to **`True`**. Otherwise, it returns **`False`**.\n",
"- [ ] The **`any()`** function takes as arguments the list to check inside, and the item to check for. If **\"any\"** of the items in the list match the item to check for, the function returns **`True`**.\n",
"- [ ] The **`any()`** function returns a Boolean value that answers the question **\"Are there any items in this list?\"**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q3. What data structure does a binary tree degenerate to if it isn't balanced properly?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] linked list\n",
"- [ ] queue\n",
"- [ ] set\n",
"- [ ] OrderedDict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q4. What statement about static methods is true?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] Static methods are called static because they always return **`None`**.\n",
"- [ ] Static methods can be bound to either a class or an instance of a class.\n",
"- [x] Static methods serve mostly as utility methods or helper methods, since they can't access or modify a class's state.\n",
"- [ ] Static methods can access and modify the state of a class or an instance of a class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q5. What are attributes?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] Attributes are a long-form version of an **`if/else`** statement, used when testing for equality between objects.\n",
"- [x] Attributes are a way to hold data or describe a state for a class or an instance of a class.\n",
"- [ ] Attributes are strings that describe characteristics of a class.\n",
"- [ ] Function arguments are called **attributes** in the context of class methods and instance methods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q6. What is the term to describe this code?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"`count, fruit, price = (2, 'apple', 3.5)`\n",
"\n",
"- [ ] **`tuple assignment`**\n",
"- [x] **`tuple unpacking`**\n",
"- [ ] **`tuple matching`**\n",
"- [ ] **`tuple duplication`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q7. What built-in list method would you use to remove items from a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] **`\".delete()\" method`**\n",
"- [ ] **`pop(my_list)`**\n",
"- [ ] **`del(my_list)`**\n",
"- [x] **`\".pop()\" method`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q8. What is one of the most common uses of Python's sys library?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] to capture command-line arguments given at a file's runtime\n",
"- [ ] to connect various systems, such as connecting a web front end, an API service, a database, and a mobile app\n",
"- [ ] to take a snapshot of all the packages and libraries in your virtual environment\n",
"- [ ] to scan the health of your Python ecosystem while inside a virtual environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q9. What is the runtime of accessing a value in a dictionary by using its key?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] $O(n)$, also called linear time.\n",
"- [ ] $O(\\log n)$, also called logarithmic time.\n",
"- [ ] $O(n^2)$, also called quadratic time.\n",
"- [x] $O(1)$, also called constant time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q10. What is the correct syntax for defining a class called Game?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] **`class Game: pass`**\n",
"- [ ] **`def Game(): pass`**\n",
"- [ ] **`def Game: pass`**\n",
"- [ ] **`class Game(): pass`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q11. What is the correct way to write a doctest?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] A \n",
"\n",
"```python\n",
"def sum(a, b):\n",
"    \"\"\"\n",
"    sum(4, 3)\n",
"    7\n",
"    sum(-4, 5)\n",
"    1\n",
"    \"\"\"\n",
"    return a + b\n",
"```\n",
"\n",
"- [x] B \n",
"\n",
"```python\n",
"def sum(a, b):\n",
"    \"\"\"\n",
"    >>> sum(4, 3)\n",
"    7\n",
"    >>> sum(-4, 5)\n",
"    1\n",
"    \"\"\"\n",
"    return a + b\n",
"```\n",
"\n",
"- [ ] C \n",
"\n",
"```python\n",
"def sum(a, b):\n",
"    \"\"\"\n",
"    # >>> sum(4, 3)\n",
"    # 7\n",
"    # >>> sum(-4, 5)\n",
"    # 1\n",
"    \"\"\"\n",
"    return a + b\n",
"```\n",
"\n",
"- [ ] D \n",
"\n",
"```python\n",
"def sum(a, b):\n",
"    ###\n",
"    >>> sum(4, 3)\n",
"    7\n",
"    >>> sum(-4, 5)\n",
"    1\n",
"    ###\n",
"    return a + b\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q12. What built-in Python data type is commonly used to represent a stack?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] **`set`**\n",
"- [x] **`list`**\n",
"- [ ] **`None`**. You can only build a stack from scratch.\n",
"- [ ] **`dictionary`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q13. What would this expression return?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"```python\n",
"college_years = ['Freshman', 'Sophomore', 'Junior', 'Senior']\n",
"return list(enumerate(college_years, 2019))\n",
"```\n",
"\n",
"- [ ] **`[('Freshman', 2019), ('Sophomore', 2020), ('Junior', 2021), ('Senior', 2022)]`**\n",
"- [ ] **`[(2019, 2020, 2021, 2022), ('Freshman', 'Sophomore', 'Junior', 'Senior')]`**\n",
"- [ ] **`[('Freshman', 'Sophomore', 'Junior', 'Senior'), (2019, 2020, 2021, 2022)]`**\n",
"- [x] **`[(2019, 'Freshman'), (2020, 'Sophomore'), (2021, 'Junior'), (2022, 'Senior')]`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q14. How does `defaultdict` work?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] **`defaultdict`** will automatically create a dictionary for you that has keys which are the integers 0-10.\n",
"- [ ] **`defaultdict`** forces a dictionary to only accept keys that are of the types specified when you created the **`defaultdict`** (such as string or integers).\n",
"- [x] If you try to access a key in a dictionary that doesn't exist, **`defaultdict`** will create a new key for you instead of throwing a **`KeyError`**.\n",
"- [ ] **`defaultdict`** stores a copy of a dictionary in memory that you can default to if the original gets unintentionally modified."
]
},
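{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the behavior described in the marked answer to Q14. The word list is made up for illustration; `int` is used as the default factory, so a missing key starts at `0`.\n",
"\n",
"```python\n",
"from collections import defaultdict\n",
"\n",
"# int() returns 0, so a key that is missing is created with the value 0.\n",
"counts = defaultdict(int)\n",
"for word in ['a', 'b', 'a']:\n",
"    counts[word] += 1  # no KeyError on first access\n",
"\n",
"print(dict(counts))  # {'a': 2, 'b': 1}\n",
"```\n",
"\n",
"With a plain `dict`, the first `counts[word] += 1` would raise a `KeyError`."
]
},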
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q15. What is the correct syntax for defining a class called `Game`, if it inherits from a parent class called `LogicGame`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] **`class Game.LogicGame(): pass`**\n",
"- [ ] **`def Game(LogicGame): pass`**\n",
"- [x] **`class Game(LogicGame): pass`**\n",
"- [ ] **`def Game.LogicGame(): pass`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q16. What is the purpose of the `self` keyword when defining or calling instance methods?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] **`self`** means that no other arguments are required to be passed into the method.\n",
"- [ ] There is no real purpose for the **`self`** method; it's just historic computer science jargon that Python keeps to stay consistent with other programming languages.\n",
"- [x] **`self`** refers to the instance whose method was called.\n",
"- [ ] **`self`** refers to the class that was inherited from to create the object using **`self`**."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q17. Which of these is NOT a characteristic of namedtuples?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] You can assign a name to each of the **`namedtuple`** members and refer to them that way, similarly to how you would access keys in a **`dictionary`**.\n",
"- [ ] Each member of a namedtuple object can be indexed to directly, just like in a regular **`tuple`**.\n",
"- [ ] **`namedtuples`** are just as memory efficient as regular **`tuples`**.\n",
"- [x] No import is needed to use **`namedtuples`** because they are available in the standard library."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q18. What is an instance method?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] Instance methods can modify the state of an instance or the state of its parent class.\n",
"- [ ] Instance methods hold data related to the instance.\n",
"- [ ] An instance method is any class method that doesn't take any arguments.\n",
"- [ ] An instance method is a regular function that belongs to a class, but it must return **`None`.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q19. Which choice is the most syntactically correct example of the conditional branching?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] ```python\n",
"num_people = 5\n",
"if num_people > 10:\n",
" print(\"There is a lot of people in the pool.\")\n",
"elif num_people > 4:\n",
" print(\"There are some people in the pool.\")\n",
"elif num_people > 0:\n",
" print(\"There are a few people in the pool.\")\n",
"else:\n",
" print(\"There is no one in the pool.\")\n",
"```\n",
"\n",
"- [ ] ```python\n",
"num_people = 5\n",
"if num_people > 10:\n",
" print(\"There is a lot of people in the pool.\")\n",
"if num_people > 4:\n",
" print(\"There are some people in the pool.\")\n",
"if num_people > 0:\n",
" print(\"There are a few people in the pool.\")\n",
"else:\n",
" print(\"There is no one in the pool.\")\n",
" ```\n",
"\n",
"- [x] ```python\n",
"num_people = 5\n",
"if num_people > 10:\n",
" print(\"There is a lot of people in the pool.\")\n",
"elif num_people > 4:\n",
" print(\"There are some people in the pool.\")\n",
"elif num_people > 0:\n",
" print(\"There are a few people in the pool.\")\n",
"else:\n",
" print(\"There is no one in the pool.\")\n",
"```\n",
"\n",
"- [ ] ```python\n",
"if num_people > 10;\n",
" print(\"There is a lot of people in the pool.\")\n",
"if num_people > 4:\n",
" print(\"There are some people in the pool.\")\n",
"if num_people > 0:\n",
" print(\"There are a few people in the pool.\")\n",
"else:\n",
" print(\"There is no one in the pool.\")\n",
" ```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q20. Which statement does NOT describe the object-oriented programming concept of encapsulation?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] It protects the data from outside interference.\n",
"- [x] A parent class is encapsulated and no data from the parent class passes on to the child class.\n",
"- [ ] It keeps data and the methods that can manipulate that data in one place.\n",
"- [ ] It only allows the data to be changed by methods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q21. What is the purpose of an if/else statement?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] An if/else statement tells the computer which chunk of code to run if the instructions you coded are incorrect\n",
"- [ ] An if/else statement runs one chunk of code if all the imports were successful, and another chunk of code if the imports were not successful\n",
"- [x] An if/else statement executes one chunk of code if a condition is true, but a different chunk of code if the condition is false\n",
"- [ ] An if/else statement tells the computer which chunk of code to run if there is enough memory to handle it, and which chunk of code to run if there is not enough memory to handle it"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q22. What built-in Python data type is commonly used to represent a queue?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] **`dictionary`**\n",
"- [ ] **`set`**\n",
"- [ ] **`None`**. You can only build a queue from scratch.\n",
"- [x] **`list`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q23. What is the correct syntax for instantiating a new object of the type Game?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] **`my_game = class.Game()`**\n",
"- [ ] **`my_game = class(Game)`**\n",
"- [x] **`my_game = Game()`**\n",
"- [ ] **`my_game = Game.create()`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q24. What does the built-in `map()` function do?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] It creates a path from multiple values in an iterable to a single value.\n",
"- [x] It applies a function to each item in an iterable and returns the value of that function.\n",
"- [ ] It converts a complex value type into simpler value types.\n",
"- [ ] It creates a mapping between two different elements of different iterables."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q25. If you don't explicitly return a value from a function, what happens?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] The function will return a RuntimeError if you don't return a value.\n",
"- [x] If the return keyword is absent, the function will return **`None`**.\n",
"- [ ] If the return keyword is absent, the function will return **`True`**.\n",
"- [ ] The function will enter an infinite loop because it won't know when to stop executing its code."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q26. What is the purpose of the `pass` statement in Python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] It is used to skip the **`yield`** statement of a generator and return a value of None.\n",
"- [x] It is a null operation used mainly as a placeholder in functions, classes, etc.\n",
"- [ ] It is used to pass control from one statement block to another.\n",
"- [ ] It is used to skip the rest of a **`while`** or **`for`** loop and return to the start of the loop."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q27. What is the term used to describe items that may be passed into a function?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] arguments\n",
"- [ ] paradigms\n",
"- [ ] attributes\n",
"- [ ] decorators"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q28. Which collection type is used to associate values with unique keys?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] **`slot`**\n",
"- [x] **`dictionary`**\n",
"- [ ] **`queue`**\n",
"- [ ] **`sorted list`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q29. When does a for loop stop iterating?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] when it encounters an infinite loop\n",
"- [ ] when it encounters an if/else statement that contains a break keyword\n",
"- [x] when it has assessed each item in the iterable it is working on or a break keyword is encountered\n",
"- [ ] when the runtime for the loop exceeds $O(n^2)$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q30. Assuming the node is in a singly linked list, what is the runtime complexity of searching for a specific node within a singly linked list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] The runtime is $O(n)$ because in the worst case, the node you are searching for is the last node, and every node in the linked list must be visited.\n",
"- [ ] The runtime is $O(nk)$, with n representing the number of nodes and k representing the amount of time it takes to access each node in memory.\n",
"- [ ] The runtime cannot be determined unless you know how many nodes are in the singly linked list.\n",
"- [ ] The runtime is $O(1)$ because you can index directly to a node in a singly linked list."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q31. Given the following three lists, how would you create a new list that matches the desired output printed below?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"```python\n",
"fruits = ['Apples', 'Oranges', 'Bananas']\n",
"quantities = [5, 3, 4]\n",
"prices = [1.50, 2.25, 0.89]\n",
"\n",
"#Desired output\n",
"[('Apples', 5, 1.50),\n",
"('Oranges', 3, 2.25),\n",
"('Bananas', 4, 0.89)]\n",
"```\n",
"\n",
"- [ ] ```python\n",
"output = []\n",
"fruit_tuple_0 = (first[0], quantities[0], price[0])\n",
"output.append(fruit_tuple)\n",
"fruit_tuple_1 = (first[1], quantities[1], price[1])\n",
"output.append(fruit_tuple)\n",
"fruit_tuple_2 = (first[2], quantities[2], price[2])\n",
"output.append(fruit_tuple)\n",
"return output\n",
" ```\n",
"\n",
"- [x] ```python\n",
"i = 0\n",
"output = []\n",
"for fruit in fruits:\n",
"    temp_qty = quantities[i]\n",
"    temp_price = prices[i]\n",
"    output.append((fruit, temp_qty, temp_price))\n",
"    i += 1\n",
"return output\n",
" ```\n",
"\n",
"- [ ] ```python\n",
"groceries = zip(fruits, quantities, prices)\n",
"return groceries\n",
"\n",
" [\n",
" ('Apples', 5, 1.50),\n",
" ('Oranges', 3, 2.25),\n",
" ('Bananas', 4, 0.89)\n",
" ]\n",
" ```\n",
"\n",
"- [ ] ```python\n",
"i = 0\n",
"output = []\n",
"for fruit in fruits:\n",
" for qty in quantities:\n",
" for price in prices:\n",
" output.append((fruit, qty, price))\n",
" i += 1\n",
"return output\n",
" ```"
]
},
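{
"cell_type": "markdown",
"metadata": {},
"source": [
"In Python 3, `zip()` returns a lazy iterator rather than a list, which is presumably why the `zip` option in Q31 is not marked as correct. As a sketch, wrapping the call in `list()` produces the desired output directly:\n",
"\n",
"```python\n",
"fruits = ['Apples', 'Oranges', 'Bananas']\n",
"quantities = [5, 3, 4]\n",
"prices = [1.50, 2.25, 0.89]\n",
"\n",
"# zip() yields one tuple per position; list() materializes them.\n",
"groceries = list(zip(fruits, quantities, prices))\n",
"print(groceries)\n",
"# [('Apples', 5, 1.5), ('Oranges', 3, 2.25), ('Bananas', 4, 0.89)]\n",
"```\n",
"\n",
"This avoids the manual index counter used in the marked answer."
]
},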
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q32. What happens when you use the built-in function `all()` on a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] The **`all()`** function returns a Boolean value that answers the question \"Are all the items in this list the same?\"\n",
"- [ ] The **`all()`** function returns True if all the items in the list can be converted to strings. Otherwise, it returns False.\n",
"- [ ] The **`all()`** function will return all the values in the list.\n",
"- [x] The **`all()`** function returns True if all items in the list evaluate to True. Otherwise, it returns False."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q33. What is the correct syntax for calling an instance method on a class named Game?\n",
"\n",
"**(Answer format may vary. Game and roll (or dice_roll) should each be called with no parameters.)**\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] ```python\n",
">>> dice = Game()\n",
">>> dice.roll()\n",
"```\n",
"\n",
"- [ ] ```python\n",
">>> dice = Game(self)\n",
">>> dice.roll(self)\n",
"```\n",
"\n",
"- [ ] ```python\n",
">>> dice = Game()\n",
">>> dice.roll(self)\n",
"```\n",
"\n",
"- [ ] ```python\n",
">>> dice = Game(self)\n",
">>> dice.roll()\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q34. What is the algorithmic paradigm of quick sort?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] backtracking\n",
"- [ ] dynamic programming\n",
"- [ ] decrease and conquer\n",
"- [x] divide and conquer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q35. What is runtime complexity of the list's built-in `append()` method?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] $O(1)$, also called constant time\n",
"- [ ] $O(\\log n)$, also called logarithmic time\n",
"- [ ] $O(n^2)$, also called quadratic time\n",
"- [ ] $O(n)$, also called linear time"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q36. What is the key difference between a set and a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] A set is an ordered collection of unique items. A list is an unordered collection of non-unique items.\n",
"- [ ] Elements can be retrieved from a list but they cannot be retrieved from a set.\n",
"- [ ] A set is an ordered collection of non-unique items. A list is an unordered collection of unique items.\n",
"- [x] A set is an unordered collection of unique items. A list is an ordered collection of non-unique items."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q37. What is the definition of abstraction as applied to object-oriented Python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] Abstraction means that a different style of code can be used, since many details are already known to the program behind the scenes.\n",
"- [x] Abstraction means the implementation is hidden from the user, and only the relevant data or information is shown.\n",
"- [ ] Abstraction means that the data and the functionality of a class are combined into one entity.\n",
"- [ ] Abstraction means that a class can inherit from more than one parent class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q38. What does this function print?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"```python\n",
"def print_alpha_nums(abc_list, num_list):\n",
"    for char in abc_list:\n",
"        for num in num_list:\n",
"            print(char, num)\n",
"    return\n",
"\n",
"print_alpha_nums(['a', 'b', 'c'], [1, 2, 3])\n",
"```\n",
"\n",
"- [x] ```python\n",
"a 1\n",
"a 2\n",
"a 3\n",
"b 1\n",
"b 2\n",
"b 3\n",
"c 1\n",
"c 2\n",
"c 3\n",
"```\n",
"\n",
"- [ ] ```python\n",
"['a', 'b', 'c'], [1, 2, 3]\n",
"```\n",
"\n",
"- [ ] ```python\n",
"aaa\n",
"bbb\n",
"ccc\n",
"111\n",
"222\n",
"333\n",
"```\n",
"\n",
"- [ ] ```python\n",
"a 1 2 3\n",
"b 1 2 3\n",
"c 1 2 3\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q39. What is the correct syntax for calling an instance method on a class named `Game`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] ```python\n",
"my_game = Game()\n",
"my_game.roll_dice()\n",
"```\n",
"\n",
"- [ ] ```python\n",
"my_game = Game()\n",
"self.my_game.roll_dice()\n",
"```\n",
"\n",
"- [ ] ```python\n",
"my_game = Game(self)\n",
"self.my_game.roll_dice()\n",
"```\n",
"\n",
"- [ ] ```python\n",
"my_game = Game(self)\n",
"my_game.roll_dice(self)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q40. What is the correct representation of a doctest for a function in Python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] ```python\n",
"def sum(a, b):\n",
"    # a = 1\n",
"    # b = 2\n",
"    # sum(a, b) = 3\n",
"\n",
"    return a + b\n",
"```\n",
"\n",
"- [ ] ```python\n",
"def sum(a, b):\n",
"    \"\"\"\n",
"    a = 1\n",
"    b = 2\n",
"    sum(a, b) = 3\n",
"    \"\"\"\n",
"\n",
"    return a + b\n",
"```\n",
"\n",
"- [x] ```python\n",
"def sum(a, b):\n",
"    \"\"\"\n",
"    >>> a = 1\n",
"    >>> b = 2\n",
"    >>> sum(a, b)\n",
"    3\n",
"    \"\"\"\n",
"\n",
"    return a + b\n",
"```\n",
"\n",
"- [ ] ```python\n",
"def sum(a, b):\n",
"    '''\n",
"    a = 1\n",
"    b = 2\n",
"    sum(a, b) = 3\n",
"    '''\n",
"    return a + b\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q41. Suppose a Game class inherits from two parent classes: BoardGame and LogicGame. Which statement is true about the methods of an object instantiated from the Game class?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] When instantiating an object, the object doesn't inherit any of the parent class's methods.\n",
"- [ ] When instantiating an object, the object will inherit the methods of whichever parent class has more methods.\n",
"- [ ] When instantiating an object, the programmer must specify which parent class to inherit methods from.\n",
"- [x] An instance of the Game class will inherit whatever methods the BoardGame and LogicGame classes have."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q42. What does calling namedtuple on a collection type return?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] a generic object class with iterable parameter fields\n",
"- [ ] a generic object class with non-iterable named fields\n",
"- [ ] a tuple subclass with non-iterable parameter fields\n",
"- [x] a tuple subclass with iterable named fields"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q43. What symbol(s) do you use to assess equality between two elements?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] **`&&`**\n",
"- [ ] **`=`**\n",
"- [x] **`==`**\n",
"- [ ] **`||`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q44. Review the code below. What is the correct syntax for changing the price to 1.5?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"```python\n",
" fruit_info = {\n",
" 'fruit': 'apple',\n",
" 'count': 2,\n",
" 'price': 3.5\n",
" }\n",
"```\n",
"\n",
"- [x] **`fruit_info ['price'] = 1.5`**\n",
"- [ ] **`my_list [3.5] = 1.5`**\n",
"- [ ] **`1.5 = fruit_info ['price]`**\n",
"- [ ] **`my_list['price'] == 1.5`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q45. What value would be returned by this check for equality?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"**`5 != 6`**\n",
"\n",
"- [ ] **`yes`**\n",
"- [ ] **`False`**\n",
"- [x] **`True`**\n",
"- [ ] **`None`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q46. What does a class's `__init__()` method do?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] The **`__init__`** method makes classes aware of each other if more than one class is defined in a single code file.\n",
"- [ ] The **`__init__`** method is included to preserve backwards compatibility from Python 3 to Python 2, but no longer needs to be used in Python 3.\n",
"- [x] The **`__init__`** method is a constructor method that is called automatically whenever a new object is created from a class. It sets the initial state of a new object.\n",
"- [ ] The **`__init__`** method initializes any imports you may have included at the top of your file."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q47. What is meant by the phrase \"space complexity\"?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] **`How many microprocessors it would take to run your code in less than one second`**\n",
"- [ ] **`How many lines of code are in your code file`**\n",
"- [x] **`The amount of space taken up in memory as a function of the input size`**\n",
"- [ ] **`How many copies of the code file could fit in 1 GB of memory`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q48. What is the correct syntax for creating a variable that is bound to a dictionary?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] **`fruit_info = {'fruit': 'apple', 'count': 2, 'price': 3.5}`**\n",
"- [ ] **`fruit_info =('fruit': 'apple', 'count': 2,'price': 3.5 ).dict()`**\n",
"- [ ] **`fruit_info = ['fruit': 'apple', 'count': 2,'price': 3.5 ].dict()`**\n",
"- [ ] **`fruit_info = to_dict('fruit': 'apple', 'count': 2, 'price': 3.5)`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q49. What is the proper way to write a list comprehension that represents all the keys in this dictionary?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"**`fruits = {'Apples': 5, 'Oranges': 3, 'Bananas': 4}`**\n",
"\n",
"- [ ] **`fruit_names = [x in fruits.keys() for x]`**\n",
"- [ ] **`fruit_names = for x in fruits.keys() *`**\n",
"- [x] **`fruit_names = [x for x in fruits.keys()]`**\n",
"- [ ] **`fruit_names = x for x in fruits.keys()`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q50. What is the algorithmic paradigm of quick sort?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] **`backtracking`**\n",
"- [x] **`divide and conquer`**\n",
"- [ ] **`dynamic programming`**\n",
"- [ ] **`decrease and conquer`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q51. What is the purpose of the `self` keyword when defining or calling methods on an instance of an object?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] **`self`** refers to the class that was inherited from to create the object using **`self`**.\n",
"- [ ] There is no real purpose for the **`self`** method. It's just legacy computer science jargon that Python keeps to stay consistent with other programming languages.\n",
"- [ ] **`self`** means that no other arguments are required to be passed into the method.\n",
"- [x] **`self`** refers to the instance whose method was called."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q52. What statement about a class methods is true?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] A class method is a regular function that belongs to a class, but it must return None.\n",
"- [x] A class method can modify the state of the class, but they can't directly modify the state of an instance that inherits from that class.\n",
"- [ ] A class method is similar to a regular function, but a class method doesn't take any arguments.\n",
"- [ ] A class method hold all of the data for a particular class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q53. What does it mean for a function to have linear runtime?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] You did not use very many advanced computer programming concepts in your code.\n",
"- [ ] The difficulty level your code is written at is not that high.\n",
"- [ ] It will take your program less than half a second to run.\n",
"- [x] The amount of time it takes the function to complete grows linearly as the input size increases."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q54. What is the proper way to define a function?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] **`def getMaxNum(list_of_nums): # body of function goes here`**\n",
"- [ ] **`func get_max_num(list_of_nums): # body of function goes here`**\n",
"- [ ] **`func getMaxNum(list_of_nums): # body of function goes here`**\n",
"- [x] **`def get_max_num(list_of_nums): # body of function goes here`**\n",
" \n",
" \n",
"**[Explanation](https://www.python.org/dev/peps/pep-0008/)**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q55. According to the PEP 8 coding style guidelines, how should constant values be named in Python?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] in camel case without using underscores to separate words -- e.g. **`maxValue = 255`**\n",
"- [ ] in lowercase with underscores to separate words -- e.g. **`max_value = 255`**\n",
"- [x] in all caps with underscores separating words -- e.g. **`MAX_VALUE = 255`**\n",
"- [ ] in mixed case without using underscores to separate words -- e.g. **`MaxValue = 255`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q56. Describe the functionality of a deque.\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] A deque adds items to one side and remove items from the other side.\n",
"- [ ] A deque adds items to either or both sides, but only removes items from the top.\n",
"- [x] A deque adds items at either or both ends, and remove items at either or both ends.\n",
"- [ ] A deque adds items only to the top, but remove from either or both sides."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q57. What is the correct syntax for creating a variable that is bound to a set?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] **`myset = {0, 'apple', 3.5}`**\n",
"- [ ] **`myset = to_set(0, 'apple', 3.5)`**\n",
"- [ ] **`myset = (0, 'apple', 3.5).to_set()`**\n",
"- [ ] **`myset = (0, 'apple', 3.5).set()`**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q58. What is the correct syntax for defining an `__init__()` method that takes no parameters?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] ```python\n",
"class __init__(self):\n",
" pass\n",
"```\n",
"\n",
"- [ ] ```python\n",
"def __init__():\n",
" pass\n",
"```\n",
"\n",
"- [ ] ```python\n",
"class __init__():\n",
" pass\n",
"```\n",
"\n",
"- [x] ```python\n",
"def __init__(self):\n",
" pass\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q59. Which statement about the class methods is true?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] A class method holds all of the data for a particular class.\n",
"- [x] A class method can modify the state of the class, but it cannot directly modify the state of an instance that inherits from that class.\n",
"- [ ] A class method is a regular function that belongs to a class, but it must return `None`\n",
"- [ ] A class method is similar to a regular function, but a class method does not take any arguments."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q60. Which of the following is TRUE About how numeric data would be organised in a binary Search tree?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] For any given Node in a binary Search Tree, the child node to the left is less than the value of the given node and the child node to its right is greater than the given node. (Not Sure)\n",
"- [ ] Binary Search Tree cannot be used to organize and search through numeric data, given the complication that arise with very deep trees.\n",
"- [ ] The top node of the binary search tree would be an arbitrary number. All the nodes to the left of the top node need to be less than the top node's number, but they don't need to ordered in any particular way.\n",
"- [ ] The smallest numeric value would go in the top most node. The next highest number would go in its left child node, the the next highest number after that would go in its right child node. This pattern would continue until all numeric values were in their own node."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q61. Why would you use a decorator?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] A decorator is similar to a class and should be used if you are doing functional programming instead of object oriented programming.\n",
"- [ ] A decorator is a visual indicator to someone reading your code that a portion of your code is critical and should not be changed.\n",
"- [x] You use the decorator to alter the functionality of a function without the without having to modify the functions code.\n",
"- [ ] An import statement is preceded by a decorator, python knows to import the most recent version of whatever package or library is being imported."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q62. When would you use a for loop ?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] Only in some situations, as loops are used ony for certain type of programming.\n",
"- [x] When you need to check every element in an iterable of known length.\n",
"- [ ] When you want to minimize the use of strings in your code.\n",
"- [ ] When you want to run code in one file for a function in another file."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q63. What is the most self-descriptive way to define a function that calculates sales tax on a purchase?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] ```python\n",
"def tax(my_float):\n",
" '''Calculates the sales tax of a purchase. Takes in a float representing the subtotal as an argument and returns a float representing the sales tax.'''\n",
" pass\n",
"```\n",
"\n",
"- [ ] ```python\n",
"def tx(amt):\n",
" '''Gets the tax on an amount.'''\n",
"```\n",
"\n",
"- [ ] ```python\n",
"def sales_tax(amount):\n",
" '''Calculates the sales tax of a purchase. Takes in a float representing the subtotal as an argument and returns a float representing the sales tax.'''\n",
"```\n",
"\n",
"- [x] ```python\n",
"def calculate_sales_tax(subtotal):\n",
" pass\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q64. What would happen if you did not alter the state of the element that an algorithm is operating on recursively?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] You do not have to alter the state of the element the algorithm is recursing on.\n",
"- [ ] You would eventually get a KeyError when the recursive portion of the code ran out of items to recurse on.\n",
"- [x] You would get a RuntimeError: maximum recursion depth exceeded.\n",
"- [ ] The function using recursion would return None."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q65. What is the runtime complexity of searching for an item in a binary search tree?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] The runtime for searching in a binary search tree is O(1) because each node acts as a key, similar to a dictionary.\n",
"- [ ] The runtime for searching in a binary search tree is O(n!) because every node must be compared to every other node.\n",
"- [x] The runtime for searching in a binary search tree is generally O(h), where h is the height of the tree.\n",
"- [ ] The runtime for searching in a binary search tree is O(n) because every node in the tree must be visited."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q66. Why would you use `mixin`?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] You use a **`mixin`** to force a function to accept an argument at runtime even if the argument wasn't included in the function's definition.\n",
"- [ ] You use a **`mixin`** to allow a decorator to accept keyword arguments.\n",
"- [ ] You use a **`mixin`** to make sure that a class's attributes and methods don't interfere with global variables and functions.\n",
"- [x] If you have many classes that all need to have the same functionality, you'd use a **`mixin`** to define that functionality."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q67. What is the runtime complexity of adding an item to a stack and removing an item from a stack?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] Add items to a stack in O(1) time and remove items from a stack on O(n) time.\n",
"- [x] Add items to a stack in O(1) time and remove items from a stack in O(1) time.\n",
"- [ ] Add items to a stack in O(n) time and remove items from a stack on O(1) time.\n",
"- [ ] Add items to a stack in O(n) time and remove items from a stack on O(n) time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q68. What does calling namedtuple on a collection type return?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] a tuple subclass with iterable named fields\n",
"- [ ] a generic object class with non-iterable named fields\n",
"- [ ] a generic object class with iterable parameter fields\n",
"- [ ] a tuple subclass with non-iterable parameter fields"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q69. Which statement accurately describes how items are added to and remnoved from a stack?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] a stacks adds items to one side and removes items from the other side.\n",
"- [x] a stacks adds items to the top and removes items from the top.\n",
"- [ ] a stacks adds items to the top and removes items from anywhere in the stack.\n",
"- [ ] a stacks adds items to either end and removes items from either end."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q70. What is a base case in a recursive function?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] A base case is the condition that allows the algorithm to stop recursing. It is usually a problem that is small enough to solve directly.\n",
"- [ ] The base case is summary of the overall problem that needs to be solved.\n",
"- [ ] The base case is passed in as an argument to a function whose body makes use of recursion.\n",
"- [ ] The base case is similar to a base class, in that it can be inherited by another object."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q71. Why is it considered good practice to open a file from within a Python script by using the `with` keyword?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] The **`with`** keyword lets you choose which application to open the file in.\n",
"- [ ] The **`with`** keyword acts like a **`for`** loop, and lets you access each line in the file one by one.\n",
"- [ ] There is no benefit to using the **`with`** keyword for opening a file in Python.\n",
"- [x] When you open a file using the **`with`** keyword in Python, Python will make sure the file gets closed, even if an exception or error is thrown."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q72. Why would you use a virtual environment?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [x] Virtual environments create a \"bubble\" around your project so that any libraries or packages you install within it don't affect your entire machine.\n",
"- [ ] Teams with remote employees use virtual environments so they can share code, do code reviews, and collaorate remotely.\n",
"- [ ] Virtual environments were common in Python 2 because they augmented missing features in the language. Virtual environments are not necessary in Python 3 due to advancements in the language.\n",
"- [ ] Virtual environments are tied to your GitHub or Bitbucket account, allowing you to access any of your repos virtually from any machine."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q73. What is the correct way to run all the doctests in a given file from the command line?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] python3 -m doctest <filename>\n",
"- [x] python3 <filename>\n",
"- [ ] python3 <filename> rundoctests\n",
"- [ ] python3 doctest"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q74. What is a lambda function ?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] any function that makes use of scientific or mathematical constants, often represented by Greek letters in academic writing\n",
"- [ ] a function that get executed when decorators are used\n",
"- [ ] any function whose definition is contained within five lines of code or fewer\n",
"- [x] a small, anonymous function that can take any number of arguments but has only expression to evaluate\n",
"\n",
"[Reference](https://www.guru99.com/python-lambda-function.html)\n",
"\n",
"**Explanation:**\n",
"\n",
"The lambda notation is basically an anonymous function that can take any number of arguments with only single expression (i.e, cannot be overloaded). It has been introducted in other programming languages, such as C++ and Java. The lambda notation allows programmers to \"bypass\" function declaration."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q75. What is the primary difference between lists and tuples?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] You can access a specifc element in a list by indexing to its position, but you cannot access a specific element in a tuple unless you iterate through the tuple\n",
"- [x] Lists are mutable, meaning you can change the data that is inside them at any time. Tuples are immutable, meaning you cannot change the data that is inside them once you have created the tuple.\n",
"- [ ] Lists are immutable, meaning you cannot change the data that is inside them once you have created the list. Tuples are mutable, meaning you can change the data that is inside them at any time.\n",
"- [ ] Lists can hold several data types inside them at once, but tuples can only hold the same data type if multiple elements are present."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q76. Which statement about static method is true?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] Static methods can be bound to either a class or an instance of a class.\n",
"- [ ] Static methods can access and modify the state of a class or an instance of a class.\n",
"- [x] Static methods serve mostly as utility or helper methods, since they cannot access or modify a class's state.\n",
"- [ ] Static methods are called static because they always return None."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q77. What does a generator return?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] None\n",
"- [ ] An iterable object\n",
"- [ ] A linked list data structure from a non-empty list\n",
"- [ ] All the keys of the given dictionary"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q78. What is the difference between class attributes and instance attributes?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] Instance attributes can be changed, but class attributes cannot be changed\n",
"- [ ] Class attributes are shared by all instances of the class. Instance attributes may be unique to just that instance\n",
"- [ ] There is no difference between class attributes and instance attributes\n",
"- [ ] Class attributes belong just to the class, not to instance of that class. Instance attributes are shared among all instances of a class"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q79. What is the correct syntax of creating an instance method?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] ```python\n",
"def get_next_card():\n",
" # method body goes here\n",
"```\n",
"\n",
"- [x] ```python\n",
"def get_next_card(self):\n",
" # method body goes here\n",
"```\n",
"\n",
"- [ ] ```python\n",
"def self.get_next_card():\n",
" # method body goes here\n",
"```\n",
"\n",
"- [ ] ```python\n",
"def self.get_next_card(self):\n",
" # method body goes here\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q80. What is a key difference between a set and a list?\n",
"\n",
"<span class='label label-default'>Solution</span>\n",
"\n",
"- [ ] A set is an ordered collection of non-unique items. A list is an unordered collection of unique items.\n",
"- [ ] A set is an ordered collection of unique items. A list is an unordered collection of non-unique items.\n",
"- [ ] Elements can be retrieved from a list but they cannot be retrieved from a set.\n",
"- [x] A set is an unordered collection of unique items. A list is an ordered collection of non-unique items."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {