Repository: absognety/Interview-Process-Coding-Questions Branch: master Commit: da7b50a1b9b8 Files: 156 Total size: 243.8 KB Directory structure: gitextract_je5aerut/ ├── .gitignore ├── 6sense/ │ ├── README.md │ └── findWordInGrid.py ├── Accenture/ │ └── README.md ├── Accolite/ │ ├── binaryArraySort.py │ ├── merge2SortedLinkedLists.py │ └── rotateLinkedListByKelements.py ├── Adobe/ │ ├── QuickSort.py │ ├── WinnerOfElection.py │ ├── checkIfStringsAreRotations.py │ ├── countOfInversionsArray.py │ └── parenthesisChecker.py ├── Amazon/ │ ├── DataEngineering/ │ │ └── README.md │ ├── Delete_Node_Without_HeadPointer.py │ ├── EquilibriumPointFirstOccurWhere.py │ ├── FindzeroSumSubArrays.py │ ├── ImplementSTRSTR.py │ ├── JosephusProblem.py │ ├── MaximumOfMinimumsOfAllWindowSizes.py │ ├── Possible_Words_Phone_Digits.py │ ├── countOfInversionsArray.py │ ├── getAggregationsByParsing.py │ ├── keypadTyping.py │ ├── merge2SortedLinkedLists.py │ ├── missingSmallestPositiveNumber.py │ ├── moveAllZerosToEndOfArray.py │ ├── parenthesisChecker.py │ ├── relativeSorting.py │ └── rotateLinkedListByKelements.py ├── AmericanExpress/ │ └── README.md ├── Athena-Health/ │ └── README.md ├── Betterworks/ │ ├── README.md │ └── print.py ├── Bloomberg/ │ └── moveAllZerosToEndOfArray.py ├── BrightMoney/ │ ├── Data_Analyst_Test.docx │ ├── Knapsack.py │ ├── LongestBalancedSubstring.py │ ├── README.md │ ├── countDistinctValidPANNumbers.py │ └── printSpirally.py ├── Busigence/ │ ├── README.md │ └── Vikas_Chitturi.ipynb ├── Byndr/ │ └── README.md ├── CloudCover/ │ └── README.md ├── DATFreightAnalytics/ │ └── README.md ├── Facebook/ │ └── ImplementSTRSTR.py ├── FactSet/ │ └── convertArrayToWave.py ├── Flipkart/ │ ├── addTwoNumbers_LinkedListRep.py │ ├── countOfInversionsArray.py │ └── parenthesisChecker.py ├── Fractal_Analytics/ │ ├── Arrays.md │ ├── Comparator.md │ ├── README.md │ ├── countChampNumbers.py │ ├── countOfAnagrams.py │ ├── numberOfGroups.py │ ├── sorting_words.md │ └── substrings.md ├── 
Fre8wise/ │ ├── README.md │ └── manipulate_string.py ├── Goldman-Sachs/ │ ├── convertArrayToWave.py │ ├── numberOfSquares_in_NbyN_CheesBoard.py │ ├── printNumbersContain123.py │ └── repeatingCharacter_LeftmostOccurrence.py ├── Google/ │ ├── First_Recurring_Character_In_String.py │ ├── allocateMinimumPages.py │ ├── checkPairsWithGivenSum.py │ └── maxIndexDiffOfArray.py ├── Grofers/ │ └── QuickSort.py ├── Guardant-Health/ │ ├── README.md │ ├── fallen_leaves.py │ └── maintainMinimumStartingNumber.py ├── HighRadius-Technologies/ │ └── README.md ├── Hike/ │ └── QuickSort.py ├── IQLECT/ │ ├── LargestPrimeFromSubsetSum.py │ └── README.md ├── InMobi/ │ └── README.md ├── Infrrd/ │ ├── problem01/ │ │ └── maximumRowsWithAll1s.py │ └── problem02/ │ └── countMe.py ├── Instabase/ │ ├── Add2BinaryStrings.py │ ├── README.md │ └── checkPairsWithGivenSum.py ├── Intuit/ │ ├── BuySellStock.py │ └── binaryArraySort.py ├── Kritikal-Solutions/ │ └── Delete_Without_Head_Pointer.py ├── LeadSquared/ │ ├── README.md │ ├── UniqueWaysToClimbStaircase.py │ ├── maximumSumLikeTimeCoefficients.py │ └── totalDistanceByStreetLights.py ├── MAQ_Software/ │ └── Closet0s1s2s.py ├── MakeMyTrip/ │ ├── rotateLinkedListByKelements.py │ └── sortedLinkedList012s.py ├── Mastek/ │ ├── README.md │ └── computedepth.sql ├── Microsoft/ │ ├── countOfInversionsArray.py │ ├── merge2SortedLinkedLists.py │ └── relativeSorting.py ├── Morgan-Stanley/ │ └── addTwoNumbers_LinkedListRep.py ├── Myntra/ │ ├── countOfInversionsArray.py │ └── removeDuplicatesSortedLinkedList.py ├── Nagarro/ │ └── isAnagram.py ├── Nielsen/ │ └── README.md ├── OYO_Rooms/ │ ├── FindzeroSumSubArrays.py │ ├── parenthesisChecker.py │ └── removeDuplicatesSortedLinkedList.py ├── Oracle/ │ └── merge2SortedLinkedLists.py ├── Paytm/ │ ├── Convert_Infix_To_Postfix.py │ ├── binaryArraySort.py │ ├── frequencyLimitedRangeArrayElements.py │ └── subarrayWithZeroSum.py ├── Qalcomm/ │ ├── ImplementSTRSTR.py │ └── addTwoNumbers_LinkedListRep.py ├── Quikr/ │ └── 
BuySellStock.py ├── README.md ├── RelianceJIO/ │ ├── CamelCaseToSnakeCase.py │ ├── README.md │ └── getSmallestNumber.py ├── Salesforce/ │ └── BuySellStock.py ├── Samsung/ │ ├── merge2SortedLinkedLists.py │ ├── missingSmallestPositiveNumber.py │ └── moveAllZerosToEndOfArray.py ├── SkyPointCloud/ │ ├── README.md │ ├── binarySearch.py │ └── modifyString.py ├── Snapdeal/ │ └── addTwoNumbers_LinkedListRep.py ├── Swiggy/ │ └── README.md ├── SymphonyAI/ │ ├── README.md │ └── maximumHeightOfMudSegment.py ├── Tavant-Technologies/ │ └── README.md ├── Thomson-Reuters/ │ └── README.md ├── Thoughtworks/ │ ├── README.md │ └── ScheduleInterviews.py ├── Twilio/ │ ├── DiskSpaceAnalysis.py │ └── README.md ├── Uber/ │ ├── README.md │ └── getNewArray.py ├── Ushur/ │ ├── README.md │ └── findPairs.py ├── VMWare/ │ ├── Convert_Infix_To_Postfix.py │ ├── maxIndexDiffOfArray.py │ └── mergeKSortedLinkedLists.py ├── Vimeo/ │ └── README.md ├── Visa/ │ ├── addNode_DoublyLinkedList.py │ ├── populateList.py │ └── removeDuplicatesSortedLinkedList.py ├── Walmart/ │ ├── GroupAnagrams.py │ ├── minimumCoins.py │ └── minimumCoinsRecursive.py ├── Yahoo/ │ └── ThreeWayPartition.py ├── ZSAssociates/ │ ├── CrossSequence.py │ ├── OutputOfProgram.py │ └── OutputOfProgram2.py ├── Zoho/ │ ├── merge2SortedLinkedLists.py │ └── rearrangeArrayAlternately.py ├── Zycus/ │ └── README.md └── cimpress/ └── README.md ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] *$py.class # C extensions *.so # Distribution / packaging .Python build/ develop-eggs/ dist/ downloads/ eggs/ .eggs/ lib/ lib64/ parts/ sdist/ var/ wheels/ *.egg-info/ .installed.cfg *.egg MANIFEST # PyInstaller # Usually these files are written by a python script from a template # before PyInstaller builds the exe, so as 
to inject date/other infos into it. *.manifest *.spec # Installer logs pip-log.txt pip-delete-this-directory.txt # Unit test / coverage reports htmlcov/ .tox/ .coverage .coverage.* .cache nosetests.xml coverage.xml *.cover .hypothesis/ .pytest_cache/ # Translations *.mo *.pot # Django stuff: *.log local_settings.py db.sqlite3 # Flask stuff: instance/ .webassets-cache # Scrapy stuff: .scrapy # Sphinx documentation docs/_build/ # PyBuilder target/ # Jupyter Notebook .ipynb_checkpoints # pyenv .python-version # celery beat schedule file celerybeat-schedule # SageMath parsed files *.sage.py # Environments .env .venv env/ venv/ ENV/ env.bak/ venv.bak/ # Spyder project settings .spyderproject .spyproject # Rope project settings .ropeproject # mkdocs documentation /site # mypy .mypy_cache/ ================================================ FILE: 6sense/README.md ================================================ # Hackerrank Test (Personal Experience): ================================================ FILE: 6sense/findWordInGrid.py ================================================ """ Refer below link for the question: https://www.geeksforgeeks.org/search-a-word-in-a-2d-grid-of-characters/ Solution: Algorithm for this is already present, but expected solution should have lowest time complexity (the algorithm should only traverse once unlike the solution above which traverses whole matrix for each word given) Enhancement: if the word is not present in the grid, print word along with row and col as (-1,-1) like word,-1,-1 Hint: Use DP approach Solution that I have gone with is below: """ # Enter your code here. Read input from STDIN. 
Print output to STDOUT import sys class word_seek: def __init__(self): self.R = None self.C = None self.dir = [[-1, 0], [1, 0], [1, 1], [1, -1], [-1, -1], [-1, 1], [0, 1], [0, -1]] def search_grid(self, grid, row, col, word): if grid[row][col] != word[0]: return False for x, y in self.dir: rd, cd = row + x, col + y flag = True for k in range(1, len(word)): if (0 <= rd 1: tracker = sorted(tracker,key=lambda x: x[1],reverse=True) print (tracker[0][0],tracker[0][1],tracker[0][2]) return else: assert (tracker[0][1] == -1) and (tracker[0][2] == -1),"failed" print (tracker[0][0],tracker[0][1],tracker[0][2]) return if __name__ == '__main__': grid = sys.stdin.readlines() grid = [i.strip() for i in grid] ind = grid.index('') #grid = grid[:(ind)] #finders = grid[(ind+1):] #print (grid) #print (finders) wordSeek = word_seek() for w in grid[(ind+1):]: wordSeek.patternSearch(grid[:(ind)],w) ================================================ FILE: Accenture/README.md ================================================ # Interview Process - Data Engineer - Round 1 (Skills Interview): + Can a python class have 2 __init__ methods? + Questions on class methods (staticmethods and classmethods). + Difference between method and function? + what are the ways in spark when writing data to storage to reduce or increase number of part files? + Difference between repartition and coalesce in Spark. + What is shuffle in spark? + can data be stored without indexes? + Object versioning in AWS S3? + Access contol policies in AWS S3 when creating a bucket? + Indexing in mongodb? + Given a small table with field names, write a sql query or spark code to get the required aggregation? refer Window functionality and ranking. + Differences between client mode and cluster mode (Spark execution) (https://stackoverflow.com/questions/37027732/apache-spark-differences-between-client-and-cluster-deploy-modes). + Machine learning - classification model - questions on model selection, model evaluation? 
+ deploy a flask app in azure - what is the approach? + Questions on azure fundamentals? + what are the configurations for spark app when spark-submit is done? ================================================ FILE: Accolite/binaryArraySort.py ================================================ { #Initial Template for Python 3 import math def main(): T=int(input()) while(T>0): N=int(input()) A=list(map(int,input().split())) binSort(A,N) for i in A: print(i,end=" ") print() T-=1 if __name__ == "__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ##Complete this function def binSort(arr, n): #Your code here ''' No need to print the array ''' c0 = arr.count(0) c1 = arr.count(1) for i in range(c0): arr[i] = 0 for i in range(c0,c0+c1): arr[i] = 1 ================================================ FILE: Accolite/merge2SortedLinkedLists.py ================================================ { #Initial Template for Python 3 # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None # Linked List Class class LinkedList: def __init__(self): self.head = None # creates a new node with given value and appends it at the end of the linked list def append(self, new_value): new_node = Node(new_value) if self.head is None: self.head = new_node return curr_node = self.head while curr_node.next is not None: curr_node = curr_node.next curr_node.next = new_node # prints the elements of linked list starting with head def printList(self): if self.head is None: print(' ') return curr_node = self.head while curr_node: print(curr_node.data,end=" ") curr_node=curr_node.next print(' ') if __name__ == '__main__': t=int(input()) for cases in range(t): n,m = map(int, input().strip().split()) a = LinkedList() # create a new linked list 'a'. b = LinkedList() # create a new linked list 'b'. 
nodes_a = list(map(int, input().strip().split())) nodes_b = list(map(int, input().strip().split())) for x in nodes_a: a.append(x) for x in nodes_b: b.append(x) a.head = merge(a.head,b.head) a.printList() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Function to merge two sorted lists in one using constant space. Function Arguments: head_a and head_b (head reference of both the sorted lists) Return Type: head of the obtained list after merger. { # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None } Contributed By: Nagendra Jha ''' def merge(head_a,head_b): #code here global a elements = [] curr_node = head_a while curr_node != None: elements.append(curr_node.data) curr_node = curr_node.next curr_node = head_b while curr_node != None: elements.append(curr_node.data) curr_node = curr_node.next elements = sorted(elements) a = LinkedList() for i in elements: a.append(i) return a.head ================================================ FILE: Accolite/rotateLinkedListByKelements.py ================================================ { class Node: def __init__(self, data): self.data = data self.next = None class LinkedList: def __init__(self): self.head = None def push(self, new_data): new_node = Node(new_data) new_node.next = self.head self.head = new_node def printList(self): temp = self.head while(temp): print(temp.data, end=" ") # arr.append(str(temp.data)) temp = temp.next print("") if __name__ == '__main__': start = LinkedList() t = int(input()) while(t > 0): llist = LinkedList() n = int(input()) values = list(map(int, input().strip().split())) for i in reversed(values): llist.push(i) k = int(input()) new_head = rotateList(llist.head, k) llist.head = new_head llist.printList() t -= 1 # Contributed by: Harshit Sidhwa } ''' This is a function problem.You only need to complete the function given below ''' # Your task is to complete 
this function ''' class Node: def __init__(self, data): self.data = data self.next = None ''' # This function should rotate list counter- # clockwise by k and return new head (if changed) def rotateList(head, k): # code here global llist llist = LinkedList() C = 0 curr_node = head part1 = [] part2 = [] while curr_node != None: if C <= k-1: part1.append(curr_node.data) elif C > k-1: part2.append(curr_node.data) C += 1 curr_node = curr_node.next total = reversed(part2 + part1) for i in total: llist.push(i) return llist.head ================================================ FILE: Adobe/QuickSort.py ================================================ { #Initial Template for Python 3 if __name__ == "__main__": t=int(input()) for i in range(t): n=int(input()) arr=list(map(int,input().split())) quickSort(arr,0,n-1) for i in range(n): print(arr[i],end=" ") print() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 def quickSort(arr,low,high): if low < high: # pi is partitioning index, arr[p] is now # at right place pi = partition(arr,low,high) # Separately sort elements before # partition and after partition quickSort(arr, low, pi-1) quickSort(arr, pi+1, high) def partition(arr,low,high): #add code here tmp = 0 pivot = arr[high] i = low - 1 for j in range(low,high): if arr[j] <= pivot: i += 1 tmp = arr[i] arr[i] = arr[j] arr[j] = tmp tmp = arr[i+1] arr[i+1] = arr[high] arr[high] = tmp return i+1 ================================================ FILE: Adobe/WinnerOfElection.py ================================================ { #Initial Template for Python 3 def main(): T=int(input()) while(T>0): n=int(input()) arr=input().strip().split() winner(arr,n) print() T-=1 if __name__=="__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 #Complete this function def winner(arr,n): #Your code here import collections votes = 
collections.Counter(arr)
    maxVotes = max(votes.values())
    if len(set(votes.values())) != len(votes.values()):
        maxVoteNames = []
        for k,v in votes.items():
            if v == maxVotes:
                maxVoteNames.append(k)
        print (min(maxVoteNames),maxVotes,end="")
    else:
        for k,v in votes.items():
            if v == maxVotes:
                print (k,maxVotes,end="")
    return

================================================
FILE: Adobe/checkIfStringsAreRotations.py
================================================
{
#Initial Template for Python 3
import atexit
import io
import sys

_INPUT_LINES = sys.stdin.read().splitlines()
input = iter(_INPUT_LINES).__next__
_OUTPUT_BUFFER = io.StringIO()
sys.stdout = _OUTPUT_BUFFER

@atexit.register
def write():
    sys.__stdout__.write(_OUTPUT_BUFFER.getvalue())

if __name__=='__main__':
    t = int(input())
    for i in range(t):
        s1=str(input())
        s2=str(input())
        if(areRotations(s1,s2)):
            print(1)
        else:
            print(0)
}
''' This is a function problem. You only need to complete the function given below '''
#User function Template for python3
'''
Your task is to check if the given two strings are rotations of each other.
Function Arguments: s1 and s2 (given strings)
Return Type: boolean
'''
def areRotations(s1,s2):
    #code here
    # s2 is a rotation of s1 exactly when both strings have the same length
    # and s2 occurs as a substring of s1 concatenated with itself.
    return len(s1) == len(s2) and s2 in s1+s1

================================================
FILE: Adobe/countOfInversionsArray.py
================================================
# GeeksForGeeks Code - Copied#
{
#Initial Template for Python 3
import atexit
import io
import sys

_INPUT_LINES = sys.stdin.read().splitlines()
input = iter(_INPUT_LINES).__next__
_OUTPUT_BUFFER = io.StringIO()
sys.stdout = _OUTPUT_BUFFER

@atexit.register
def write():
    sys.__stdout__.write(_OUTPUT_BUFFER.getvalue())

if __name__=='__main__':
    t = int(input())
    for tt in range(t):
        n = int(input())
        a = list(map(int, input().strip().split()))
        print(Inversion_Count(a,n))
}
''' This is a function problem. You only need to complete the function given below '''
#User function Template for python3
'''
Your task is to return total number of inversions present in the array.
Function Arguments: array a and size n
Return Type: Integer
'''
def Inversion_Count(arr,n):
    # Use the parameter arr rather than the driver's global list.
    if arr == sorted(arr):
        return 0
    temp_arr = [0]*n
    return mergesort(arr,temp_arr,0,n-1)

def mergesort(arr,temp_arr,left,right):
    inv_count = 0
    if left < right:
        mid = (left + right)//2
        inv_count = mergesort(arr,temp_arr,left,mid)
        inv_count += mergesort(arr,temp_arr,mid+1,right)
        inv_count += merge(arr,temp_arr,left,mid,right)
    return inv_count

def merge(arr,temp_arr,left, mid, right):
    # Merge the temp arrays back into arr[l..r]
    i = left     # Initial index of first subarray
    j = mid+1    # Initial index of second subarray
    k = left     # Initial index of merged subarray
    invcount = 0
    while i <= mid and j <= right:
        if arr[i] <= arr[j]:
            temp_arr[k] = arr[i]
            i += 1
        else:
            temp_arr[k] = arr[j]
            invcount += (mid - i + 1)
            j += 1
        k += 1
    # Copy the remaining elements of L[], if there
    # are any
    while i <= mid:
        temp_arr[k] = arr[i]
        i += 1
        k += 1
    # Copy the remaining elements of R[], if there
    # are any
    while j <= right:
        temp_arr[k] = arr[j]
        j += 1
        k += 1
    for lr in range(left, right + 1):
        arr[lr] = temp_arr[lr]
    return invcount

================================================
FILE: Adobe/parenthesisChecker.py
================================================
{
#Initial Template for Python 3
import atexit
import io
import sys

#Contributed by : Nagendra Jha

_INPUT_LINES = sys.stdin.read().splitlines()
input = iter(_INPUT_LINES).__next__
_OUTPUT_BUFFER = io.StringIO()
sys.stdout = _OUTPUT_BUFFER

@atexit.register
def write():
    sys.__stdout__.write(_OUTPUT_BUFFER.getvalue())

if __name__ == '__main__':
    test_cases = int(input())
    for cases in range(test_cases) :
        #n = int(input())
        #n,k = map(int,input().strip().split())
        #a = list(map(int,input().strip().split()))
        s = str(input())
        if ispar(s):
            print("balanced")
        else:
            print("not balanced")
}
''' This is a function problem. You only need to complete the function given below '''
#User function Template for python3
'''
Function Arguments :
    @param : s (given string containing
parenthesis)
    @return : boolean True or False
'''
def isMatchingPair(c1,c2):
    if (c1=='(') & (c2==')'):
        return True
    elif (c1=='{') & (c2=='}'):
        return True
    elif (c1=='[') & (c2==']'):
        return True
    else:
        return False

def ispar(s):
    # code here
    import queue
    stack = queue.LifoQueue()
    for i in range(len(s)):
        if ((s[i] == '{') | (s[i] == '[') | (s[i] == '(')):
            stack.put(s[i])
        if ((s[i] == '}') | (s[i] == ']') | (s[i] == ')')):
            if stack.empty():
                return False
            elif not isMatchingPair(stack.get(),s[i]):
                return False
    if stack.empty():
        return True
    else:
        return False

================================================
FILE: Amazon/DataEngineering/README.md
================================================
# Data Engineering Assessment (As of Oct 2024):

+ Assessment redirected to HackerRank from Amazon Jobs profile configuration.
+ Assessment with a maximum duration of 120 mins.
+ Multiple Choice Questions on very basic SQL (Joins, MySQL syntax, Error recognition etc.). Around 20 MCQs.
+ 3 SQL coding questions, Easy, Medium and Hard [Simple Joins to writing subqueries to ranking + CTEs + Computing Aggregate metrics + subqueries]

================================================
FILE: Amazon/Delete_Node_Without_HeadPointer.py
================================================
class Node(object):
    def __init__(self,dataVal,nextNode=None):
        self.data = dataVal
        self.next = nextNode
    def getData(self):
        return (self.data)
    def setData(self,val):
        self.data = val
    def getNextNode(self):
        return (self.next)
    def setNextNode(self,val):
        self.next = val

class LinkedList(object):
    def __init__(self,head=None):
        self.head = head
        self.size = 0
    def getSize(self):
        return (self.size)
    def addNode(self,data):
        newNode = Node(data)
        newNode.setNextNode(self.head)
        self.head = newNode
        self.size += 1
        return True
    def printNode(self):
        curr = self.head
        while curr:
            print (curr.data)
            curr = curr.getNextNode()
    def deleteNode(self,value):
        prev = None
        curr = self.head
        while curr:
            if curr.data == value:
                if prev:
prev.setNextNode(curr.getNextNode()) else: self.head = curr.getNextNode() self.size -= 1 return True prev = curr curr = curr.getNextNode() return False myList = LinkedList() print (myList.getSize()) print ("______*Inserting*_______") myList.addNode(5) myList.addNode(10) myList.addNode(15) myList.addNode(20) myList.addNode(25) print ("printing") myList.printNode() print ("Deleting") myList.deleteNode(10) myList.deleteNode(20) myList.deleteNode(5) myList.addNode(90) myList.addNode(2000) print (myList.getSize()) myList.printNode() ================================================ FILE: Amazon/EquilibriumPointFirstOccurWhere.py ================================================ { # Initial Template for Python 3 import math def main(): T = int(input()) while(T > 0): N = int(input()) A = [int(x) for x in input().strip().split()] print(equilibriumPoint(A, N)) T -= 1 if __name__ == "__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' # User function Template for python3 # Complete this function def equilibriumPoint(A, N): # Your code here if len(A)==1: return 1 else: eqlist = [] for i in range(1,N-1): if sum(A[:i]) == sum(A[i+1:]): eqlist.append(i+1) if sum(A[1:]) == 0: eqlist.insert(0,1) if sum(A[:N-1]) == 0: eqlist.append(N) if eqlist: return min(eqlist) else: return -1 ================================================ FILE: Amazon/FindzeroSumSubArrays.py ================================================ def subArrayExists(arr,n): ##Your code here hashMap = {} out = [] sum1 = 0 for i in range(n): sum1 += arr[i] if sum1==0: out.append((0,i)) al = [] if sum1 in hashMap: al = hashMap.get(sum1) for it in range(len(al)): out.append((al[it]+1,i)) al.append(i) hashMap[sum1] = al return len(out) if __name__ == '__main__': t = int(input()) for tcase in range(t): n = int(input()) arr = list(map(int,input().strip().split())) print (subArrayExists(arr,n)) ================================================ FILE: Amazon/ImplementSTRSTR.py 
================================================
{
#Contributed by : Nagendra Jha
import atexit
import io
import sys

_INPUT_LINES = sys.stdin.read().splitlines()
input = iter(_INPUT_LINES).__next__
_OUTPUT_BUFFER = io.StringIO()
sys.stdout = _OUTPUT_BUFFER

@atexit.register
def write():
    sys.__stdout__.write(_OUTPUT_BUFFER.getvalue())

if __name__ == '__main__':
    t=int(input())
    for cases in range(t):
        s,p =input().strip().split()
        print(strstr(s,p))
}
''' This is a function problem. You only need to complete the function given below '''
'''
Your task is to return the index of the pattern present in the given string.
Function Arguments: s (given text), p(given pattern)
Return Type: Integer.
'''
def strstr(s,p):
    #code here
    # str.find performs a literal substring search and returns -1 when p is absent.
    # re.search would interpret characters such as '.' or '*' in p as regex syntax.
    return s.find(p)

================================================
FILE: Amazon/JosephusProblem.py
================================================
{
import math
#Position this line where user code will be pasted.
def main():
    T=int(input())
    while(T>0):
        nk=[int(x) for x in input().strip().split()]
        n=nk[0]
        k=nk[1]
        print(josephus(n,k))
        T-=1
if __name__=="__main__":
    main()
}
''' This is a function problem. You only need to complete the function given below '''
#Complete this function
def josephus(n,k):
    #Your code here
    if n == 1:
        return n
    return ((josephus(n-1,k)+k-1)%n + 1)

================================================
FILE: Amazon/MaximumOfMinimumsOfAllWindowSizes.py
================================================
#User function Template for python3
"""
Given an integer array A[] of size N. The task is to find the maximum of the minimum of every window size in the array.
Note: Window size varies from 1 to n.

Input:
The first line contains an integer T denoting the total number of test cases. In each test case, the first line contains an integer N denoting the size of array. The second line contains N space-separated integers A1, A2, ..., AN denoting the elements of the array.
Output:
On a separate line for each test case, print the array of N numbers holding the result for each window size 1, 2, ..., N respectively.

User Task:
The task is to complete the function printMaxOfMin() which finds the maximum of minimum of every window size.

Constraints:
1 <= T <= 50
1 <= N <= 10^5
1 <= A[i] <= 10^6

Example:
Input:
2
7
10 20 30 50 10 70 30
3
10 20 30
Output:
70 30 20 10 10 10 10
30 20 10

Explanation:
Testcase 1: First element in output indicates maximum of minimums of all windows of size 1. Minimums of windows of size 1 are {10}, {20}, {30}, {50}, {10}, {70} and {30}. Maximum of these minimums is 70.
Second element in output indicates maximum of minimums of all windows of size 2. Minimums of windows of size 2 are {10}, {20}, {30}, {10}, {10}, and {30}. Maximum of these minimums is 30.
Third element in output indicates maximum of minimums of all windows of size 3. Minimums of windows of size 3 are {10}, {20}, {10}, {10} and {10}. Maximum of these minimums is 20.
Similarly other elements of output are computed.
Testcase 2: First element in output indicates maximum of minimums of all windows of size 1. Minimums of windows of size 1 are {10}, {20}, {30}. Maximum of these minimums is 30, and similarly other outputs can be computed.
"""
'''
Function Arguments :
    @param : a(given array), n (size of array)
    @return : None, print the required Maxofmin array.
''' def printMaxOfMin(a,n): # code here s = [] left = [0 for i in range(n+1)] right = [0 for i in range(n+1)] left[:n] = [-1]*n right[:n] = [n]*n for i in range(0,n): while (len(s) != 0 and a[s[-1]] >= a[i]): s.pop() if (len(s) != 0): left[i] = s[-1] s.append(i) while (len(s) != 0): s.pop() for i in range(n-1,-1,-1): while (len(s) != 0 and a[s[-1]] >= a[i]): s.pop() if (len(s) != 0): right[i] = s[-1] s.append(i) ans = [0]*(n+1) for i in range(0,n): length = right[i] - left[i] - 1 ans[length] = max(ans[length],a[i]) for i in range(n-1,0,-1): ans[i] = max(ans[i],ans[i+1]) for i in range(1,n+1): print (ans[i],end=" ") print ("") #{ # Driver Code Starts #Initial Template for Python 3 import atexit import io import sys #Contributed by : Nagendra Jha _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) if __name__ == '__main__': test_cases = int(input()) for cases in range(test_cases) : n = int(input()) a = list(map(int,input().strip().split())) printMaxOfMin(a,n) # } Driver Code Ends ================================================ FILE: Amazon/Possible_Words_Phone_Digits.py ================================================ { #Initial Template for Python 3 import math def main(): T=int(input()) while(T>0): N=int(input()) a=[int(x) for x in input().strip().split()] possibleWords(a,N) print() T-=1 if __name__=="__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ##Complete this function def possibleWords(a,N): ##Your code here import itertools ph = {'abc':2,'def':3,'ghi':4,'jkl':5,'mno':6,'pqrs':7,'tuv':8,'wxyz':9} my_sts = [] for x in a: for k,v in ph.items(): if v == x: my_sts.append(k) if len(my_sts)>1: res = list(itertools.product(*my_sts)) res = [''.join(u) for u in res] print (' '.join(i for i in res)) else: print (' 
'.join(list(my_sts[0])))

================================================
FILE: Amazon/countOfInversionsArray.py
================================================
# GeeksForGeeks Code - Copied#
{
#Initial Template for Python 3
import atexit
import io
import sys

_INPUT_LINES = sys.stdin.read().splitlines()
input = iter(_INPUT_LINES).__next__
_OUTPUT_BUFFER = io.StringIO()
sys.stdout = _OUTPUT_BUFFER

@atexit.register
def write():
    sys.__stdout__.write(_OUTPUT_BUFFER.getvalue())

if __name__=='__main__':
    t = int(input())
    for tt in range(t):
        n = int(input())
        a = list(map(int, input().strip().split()))
        print(Inversion_Count(a,n))
}
''' This is a function problem. You only need to complete the function given below '''
#User function Template for python3
'''
Your task is to return total number of inversions present in the array.
Function Arguments: array a and size n
Return Type: Integer
'''
def Inversion_Count(arr,n):
    # Use the parameter arr rather than the driver's global list.
    if arr == sorted(arr):
        return 0
    temp_arr = [0]*n
    return mergesort(arr,temp_arr,0,n-1)

def mergesort(arr,temp_arr,left,right):
    inv_count = 0
    if left < right:
        mid = (left + right)//2
        inv_count = mergesort(arr,temp_arr,left,mid)
        inv_count += mergesort(arr,temp_arr,mid+1,right)
        inv_count += merge(arr,temp_arr,left,mid,right)
    return inv_count

def merge(arr,temp_arr,left, mid, right):
    # Merge the temp arrays back into arr[l..r]
    i = left     # Initial index of first subarray
    j = mid+1    # Initial index of second subarray
    k = left     # Initial index of merged subarray
    invcount = 0
    while i <= mid and j <= right:
        if arr[i] <= arr[j]:
            temp_arr[k] = arr[i]
            i += 1
        else:
            temp_arr[k] = arr[j]
            invcount += (mid - i + 1)
            j += 1
        k += 1
    # Copy the remaining elements of L[], if there
    # are any
    while i <= mid:
        temp_arr[k] = arr[i]
        i += 1
        k += 1
    # Copy the remaining elements of R[], if there
    # are any
    while j <= right:
        temp_arr[k] = arr[j]
        j += 1
        k += 1
    for lr in range(left, right + 1):
        arr[lr] = temp_arr[lr]
    return invcount

================================================
FILE:
Amazon/getAggregationsByParsing.py ================================================ #!/usr/bin/env python3.9 # Asked in Thrasio Interview Process # Input: sales_data = [{'channel':'Amazon', 'id':'AMZ456', 'sales':10, 'returns':0}, {'channel':'Amazon', 'id':'AMZ123', 'sales':5, 'returns':2}, {'channel':'Shopify', 'id':'1234', 'sales':15, 'returns':0}, {'channel':'Target', 'id':'TGT456', 'sales':23, 'returns':5} ] channel_thrasio_mapping = { 'AMAZON':{'AMZ123':'THRASIO-987', 'AMZ456':'THRASIO-456'}, 'SHOPIFY':{'1234':'THRASIO-987', '5678':'THRASIO-321'} } #Expected Output: # [ # {'id':'THRASIO-987', 'net_sales':18, 'returns':2}, # {'id':'THRASIO-456', 'net_sales':10, 'returns':0}, # ] result = [] for k1,v1 in channel_thrasio_mapping.items(): for k2,v2 in v1.items(): temp_dict = {} temp_dict['id'] = v2 all_sales = [] all_returns = [] for s in sales_data: if s['id'] == k2: all_sales.append(s['sales']) all_returns.append(s['returns']) net_sales = sum(all_sales)-sum(all_returns) returns = sum(all_returns) temp_dict['net_sales'] = net_sales temp_dict['returns'] = returns result.append(temp_dict) #print (result) unique_thrasio_ids = set([t['id'] for t in result]) #print (unique_thrasio_ids) output = [] for utd in unique_thrasio_ids: final_result = {} final_result['id'] = utd final_result['net_sales'] = 0 final_result['returns'] = 0 for res in result: if res['id'] == utd: final_result['net_sales'] += res['net_sales'] final_result['returns'] += res['returns'] output.append(final_result) print (output) ================================================ FILE: Amazon/keypadTyping.py ================================================ def keypadTyping(s): for c in s: print (mydict[c],end="") if __name__ == '__main__': t = int(input()) mydict = {'a':2,'b':2,'c':2, 'd':3,'e':3,'f':3, 'g':4,'h':4,'i':4, 'j':5,'k':5,'l':5, 'm':6,'n':6,'o':6, 'p':7,'q':7,'r':7,'s':7, 't':8,'u':8,'v':8, 'w':9,'x':9,'y':9,'z':9} for tcase in range(t): s = input() keypadTyping(s) print () 
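# A possible variant of the keypad lookup above, shown only as a sketch.
# It derives the letter-to-digit table from the digit groups instead of listing
# every letter by hand. The names KEY_GROUPS, build_keypad_map and keypad_digits
# are hypothetical, and lower-casing the input and skipping characters that are
# not on the keypad are assumptions, not behaviour of the original keypadTyping.

```python
# Hypothetical names throughout; this is an illustrative sketch, not the original solution.
KEY_GROUPS = {2: "abc", 3: "def", 4: "ghi", 5: "jkl",
              6: "mno", 7: "pqrs", 8: "tuv", 9: "wxyz"}


def build_keypad_map(groups=KEY_GROUPS):
    """Invert the digit -> letters table into a letter -> digit table."""
    return {ch: digit for digit, letters in groups.items() for ch in letters}


def keypad_digits(s, mapping=None):
    """Return the digit string typed for s.

    Assumption: input is lower-cased first and characters without a key
    (spaces, punctuation, digits) are skipped instead of raising KeyError.
    """
    mapping = mapping or build_keypad_map()
    return "".join(str(mapping[c]) for c in s.lower() if c in mapping)


if __name__ == "__main__":
    print(keypad_digits("geeks"))  # prints 43357
```

# Returning the string instead of printing inside the function keeps the
# lookup easy to test separately from the stdin driver.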

================================================
FILE: Amazon/merge2SortedLinkedLists.py
================================================
{
#Initial Template for Python 3

# Node Class
class Node:
    def __init__(self, data):   # data -> value stored in node
        self.data = data
        self.next = None

# Linked List Class
class LinkedList:
    def __init__(self):
        self.head = None

    # creates a new node with given value and appends it at the end of the linked list
    def append(self, new_value):
        new_node = Node(new_value)
        if self.head is None:
            self.head = new_node
            return
        curr_node = self.head
        while curr_node.next is not None:
            curr_node = curr_node.next
        curr_node.next = new_node

    # prints the elements of linked list starting with head
    def printList(self):
        if self.head is None:
            print(' ')
            return
        curr_node = self.head
        while curr_node:
            print(curr_node.data,end=" ")
            curr_node=curr_node.next
        print(' ')

if __name__ == '__main__':
    t=int(input())
    for cases in range(t):
        n,m = map(int, input().strip().split())
        a = LinkedList() # create a new linked list 'a'.
        b = LinkedList() # create a new linked list 'b'.
        nodes_a = list(map(int, input().strip().split()))
        nodes_b = list(map(int, input().strip().split()))
        for x in nodes_a:
            a.append(x)
        for x in nodes_b:
            b.append(x)
        a.head = merge(a.head,b.head)
        a.printList()
}

''' This is a function problem.You only need to complete the function given below '''
#User function Template for python3

'''
    Function to merge two sorted lists in one
    using constant space.

    Function Arguments: head_a and head_b (head reference of both the sorted lists)
    Return Type: head of the obtained list after merger.
    {
        # Node Class
        class Node:
            def __init__(self, data):   # data -> value stored in node
                self.data = data
                self.next = None
    }
    Contributed By: Nagendra Jha
'''
def merge(head_a,head_b):
    #code here
    # Splice the existing nodes together in place (constant extra space, as the
    # problem statement asks) instead of copying the values out and sorting them.
    dummy = Node(0)   # placeholder node in front of the merged list
    tail = dummy
    while head_a is not None and head_b is not None:
        if head_a.data <= head_b.data:
            tail.next = head_a
            head_a = head_a.next
        else:
            tail.next = head_b
            head_b = head_b.next
        tail = tail.next
    # Attach whichever list still has nodes left
    tail.next = head_a if head_a is not None else head_b
    return dummy.next

================================================
FILE: Amazon/missingSmallestPositiveNumber.py
================================================
{
#Initial Template for Python 3

import math

def main():
    T=int(input())
    while(T>0):
        n=int(input())
        arr=[int(x) for x in input().strip().split()]
        print(missingNumber(arr,n))
        T-=1

if __name__ == "__main__":
    main()
}

''' This is a function problem.You only need to complete the function given below '''
#User function Template for python3

##Complete this function
def missingNumber(arr,n):
    #Your code here
    # Smallest positive integer that does not occur in arr.
    # With n elements the answer always lies in 1..n+1, so the loop
    # below runs at most n+1 times.
    seen = set(x for x in arr if x > 0)
    candidate = 1
    while candidate in seen:
        candidate += 1
    return candidate

================================================
FILE: Amazon/moveAllZerosToEndOfArray.py
================================================
def moveZerosToEnd(arr):
    arr_0 = [x for x in arr if x!=0]
    zeros = [0] * (len(arr)-len(arr_0))
    ans = arr_0 + zeros
    return " ".join(str(u) for u in ans)

if __name__ == '__main__':
    T = int(input())
    for t in range(T):
        N = int(input())
        arr = list(map(int,input().strip().split()))
        print (moveZerosToEnd(arr))

================================================
FILE: Amazon/parenthesisChecker.py
================================================
{
#Initial Template for Python 3

import atexit
import io
import sys

#Contributed by : Nagendra Jha

_INPUT_LINES = sys.stdin.read().splitlines()
input = iter(_INPUT_LINES).__next__
_OUTPUT_BUFFER = io.StringIO()
sys.stdout = _OUTPUT_BUFFER

@atexit.register
def write():
    sys.__stdout__.write(_OUTPUT_BUFFER.getvalue())

if __name__ == '__main__':
    test_cases = int(input())
    for cases in range(test_cases) :
        #n = int(input())
        #n,k = map(int,input().strip().split())
        #a = list(map(int,input().strip().split()))
        s = str(input())
        if ispar(s):
            print("balanced")
        else:
            print("not balanced")
}

''' This is a function problem.You only need to complete the function given below '''
#User function Template for python3

'''
    Function Arguments :
        @param  : s (given string containing parenthesis)
        @return : boolean True or False
'''
def isMatchingPair(c1,c2):
    # True when c1 is the opening bracket that pairs with the closing bracket c2
    return (c1,c2) in (('(',')'),('{','}'),('[',']'))

def ispar(s):
    # code here
    stack = []   # a plain list is enough as a stack here
    for ch in s:
        if ch in '{[(':
            stack.append(ch)
        elif ch in '}])':
            # a closing bracket needs a matching opening bracket on top of the stack
            if not stack or not isMatchingPair(stack.pop(),ch):
                return False
    # balanced only if no opening bracket is left unmatched
    return not stack

================================================
FILE: Amazon/relativeSorting.py
================================================
def relativeSorting(A1,A2):
    # elements of A1 that do not occur in A2
    extra = set(A1).difference(set(A2))
    out = []
    # first the elements of A1 in the order given by A2
    for i in A2:
        s = [i] * A1.count(i)
        out.extend(s)
    # then the remaining elements in sorted order
    extra_out = []
    for j in extra:
        u = [j] * A1.count(j)
        extra_out.extend(u)
    out = out + sorted(extra_out)
    return " ".join(str(i) for i in out)

if __name__ == '__main__':
    t = int(input())
    for tcase in range(t):
        N,M = list(map(int,input().strip().split()))
        A1 = list(map(int,input().strip().split()))
        A2 = list(map(int,input().strip().split()))
        print (relativeSorting(A1,A2))

================================================
FILE: Amazon/rotateLinkedListByKelements.py
================================================
{
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def push(self, new_data):
        new_node = Node(new_data)
        new_node.next = self.head
        self.head = new_node

    def printList(self):
        temp = self.head
        while(temp):
            print(temp.data, end=" ")
            # arr.append(str(temp.data))
            temp = temp.next
        print("")

if __name__ == '__main__':
    start = LinkedList()
    t = int(input())
    while(t > 0):
        llist = LinkedList()
        n = int(input())
        values = list(map(int, input().strip().split()))
        for i in reversed(values):
            llist.push(i)
        k = int(input())
        new_head = rotateList(llist.head, k)
        llist.head = new_head
        llist.printList()
        t -= 1
# Contributed by: Harshit Sidhwa
}

''' This is a function problem.You only need to complete the function given below '''
# Your task is to complete this function

'''
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None
'''

# This function should rotate list counter-
# clockwise by k and return new head (if changed)
def rotateList(head, k):
    # code here
    # Relink the existing nodes instead of rebuilding the driver's global list
    if head is None or k <= 0:
        return head
    # Walk to the k-th node; it becomes the new tail
    kth = head
    count = 1
    while count < k and kth.next is not None:
        kth = kth.next
        count += 1
    if kth.next is None:
        # k is at least the length of the list: nothing to rotate
        return head
    new_head = kth.next
    kth.next = None
    # Attach the original front part after the old tail
    tail = new_head
    while tail.next is not None:
        tail = tail.next
    tail.next = head
    return new_head

================================================
FILE: AmericanExpress/README.md
================================================
# Interview Process for Engineer - II (2021) (Personal):

### First Round - Coding Round:
Online Coding Round with time management.

### Second Round - Design:
+ Design an application that serves machine learning model results in real time (design a feedback ingestion system and notifications pusher).
    - Brush up on system design concepts, NoSQL databases, caching, load balancer frameworks, distributed computing fundamentals, APIs, and software design patterns.

### Third Round - Technical:
+ Implement the KNN (K-Nearest Neighbors) algorithm at scale.
    - Start from scratch.
    - Try to think of a solution that doesn't rely on existing cluster computing frameworks like Spark, Dask etc.
    - Performance and efficient design matter!
    - A very good question; it depends entirely on knowledge of distributed and parallel processing paradigms and operating systems.

================================================
FILE: Athena-Health/README.md
================================================
# Interview Process for SMTS (Senior Member of Technical Staff) - 2021:

### First Round - Coding (Data Structures and Algorithms):
+ Given an array of 0s, 1s and 2s, sort the array without calling a sort routine (in Python, avoid using built-in functions for any operations).
+ Given an unsorted array, find the median of the array (do not use any libraries or functions; raw code has to be written).

### Second Round - Design:
+ In-depth questions on BSTs and graphs.
+ Real-time application use case: updating a news feed/Facebook feed/Twitter feed based on the ranking of an incoming data stream of posts, comments and reactions, so that posts are sorted from highest rank to lowest. Do this in real time; design an efficient feed management system.

### Third Round - Managerial:
+ Questions on your technical experience and projects you have done.
+ Challenges in the projects, how scaling issues were resolved, teamwork, milestones achieved etc.
+ Questions on the deployment process followed and tools used for deploying the codebase: DevOps architecture, CI/CD, Docker, Kubernetes, service mesh, containerization etc.

### Fourth Round - HR round:
+ Questions are similar to questions from any other HR round.
+ What are you looking for? What are your career aspirations?
+ What are your interests and why are you interested in athenahealth?
+ Any other doubts and clarifications on the role, planning connects with the hiring manager etc.

================================================
FILE: Betterworks/README.md
================================================
# Hiring process for Betterworks Software Engineer Role (Personal Experience):

## First round:
+ Discussion with the Director of Engineering/Lead Engineer on your work experience, the job description, your expertise in the skills required for the role, and the projects you have worked on and are working on at the moment: critical challenges faced, how you overcame them etc.

## Second round:
+ Technical screening on basic coding questions.
+ Basic questions on pandas data munging steps; write a simple algorithm (given pairs of strings and numbers, e.g. one 1, two 2, three 3, four 4, five 5, print a sequence like (n,1),(w,2),(r,3),(ou,4),(iv,5)).
+ Sorting techniques, Python built-in data structures and questions on the same.
+ Questions on Python decorators.

## Third round:
+ Hands-on assignment where you have to build a working prototype of an API (the assignment will be shared as a PDF).

## Fourth round:
+ Questions on pandas data analysis techniques: groupby, aggregations, distributions, visualizing bar charts and boxplots.
+ SQL: primary keys, foreign keys, composite keys, indexes, star and snowflake schemas.
+ Write an algorithm that does the following: given an array of integers and an element x, find all the elements of the array that are closest to x (the numbers having the minimum difference with x). Expect questions on time complexity, scope for optimization and efficiency.
+ Questions on GitHub and the development and deployment process you follow in your regular projects at work with GitHub; when you would use git squash, git merge, git rebase etc.
+ Questions on REST APIs, design techniques etc.
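
The fourth-round question about the elements closest to x can be sketched as below. This is a minimal illustration and not the answer given in the interview; the function name `closest_elements` is hypothetical. It makes two linear passes: one to find the smallest absolute difference from x, and one to collect every element that attains it.

```python
def closest_elements(arr, x):
    """Return every element of arr whose absolute difference from x is minimal."""
    if not arr:
        return []
    # First pass: the smallest distance between x and any element
    best = min(abs(v - x) for v in arr)
    # Second pass: keep all elements at that distance, in their original order
    return [v for v in arr if abs(v - x) == best]


if __name__ == '__main__':
    print(closest_elements([4, 9, 1, 7, 5], 6))  # [7, 5]
```

This runs in O(n) time with no extra space beyond the result. If the array were already sorted, a binary search around x (for example with the standard `bisect` module) would be a possible optimization to discuss.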

================================================
FILE: Betterworks/print.py
================================================
"""
Given a dictionary of elements:
mydict = {'one':1, 'two':2, 'three':3, 'four':4, 'five':5}

print the sequence of elements in the following fashion:
{'n':1,'w':2,'r':3,'ou':4,'iv':5}
"""

mydict = {'one':1, 'two':2, 'three':3, 'four':4, 'five':5}

def doop(mydict):
    if len(mydict) == 0:
        return "no data"
    res = {}
    for k,v in mydict.items():
        leng = len(k)
        if leng%2 != 0:
            res.update({k[(leng-1)//2]:v})
        else:
            key = ''.join([k[(leng-1)//2],k[(leng+1)//2]])
            res.update({key:v})
    return res

print (doop(mydict))

================================================
FILE: Bloomberg/moveAllZerosToEndOfArray.py
================================================
def moveZerosToEnd(arr):
    arr_0 = [x for x in arr if x!=0]
    zeros = [0] * (len(arr)-len(arr_0))
    ans = arr_0 + zeros
    return " ".join(str(u) for u in ans)

if __name__ == '__main__':
    T = int(input())
    for t in range(T):
        N = int(input())
        arr = list(map(int,input().strip().split()))
        print (moveZerosToEnd(arr))

================================================
FILE: BrightMoney/Knapsack.py
================================================
"""
0-1 Knapsack problem
"""

================================================
FILE: BrightMoney/LongestBalancedSubstring.py
================================================
"""
Longest balanced substring given a string containing only parenthesis
(open and closed curly braces)
Refer https://www.geeksforgeeks.org/length-of-the-longest-valid-substring/

along with the length, also print the start and end indices of this
longest substring

Refer below custom addition to the existing code
"""

def findMaxLen(string):
    n = len(string)

    # Create a stack and push -1 as initial index to it.
    stk = []
    stk.append(-1)

    # Initialize result
    result = 0
    tracker = []

    # Traverse all characters of given string
    for i in range(n):

        # If opening bracket, push index of it
        if string[i] == '{':
            stk.append(i)

        else:   # If closing bracket, i.e., str[i] = '}'

            # Pop the previous opening bracket's index
            stk.pop()

            # Check if this length formed with base of
            # current valid substring is more than max
            # so far
            if len(stk) != 0:
                if result < i - stk[len(stk)-1]:
                    result = i - stk[len(stk)-1]
                    tracker.append((result,i))

            # If stack is empty. push current index as
            # base for next valid substring (if any)
            else:
                stk.append(i)

    return result,tracker

#string = "{{{}"
#string = "}{}{}}"
string = "{}{{}}}}}"
result,tracker = findMaxLen(string)

# end_index stays None when the string has no balanced substring at all
end_index = None
for t in tracker:
    if t[0] == result:
        end_index = t[1]

# length of longest substring (balanced)
print (result)

# print (start and end locations of that substring, 1-indexed)
if end_index is not None:
    print (" ".join([str(end_index - result + 2),str(end_index + 1)]))

================================================
FILE: BrightMoney/README.md
================================================
# Coding questions on DoSelect Platform (Personal Experience) (Data Science roles - ML Engineering) (4 Coding questions are added here)
+ This process is different from the Data Analyst process.

# Interview Process for Data Analyst position:

## First round - DoSelect Assessment for Data Analyst positions/Word Document on email:
+ BrightMoney uses different kinds of assessments for Data Analyst positions:
    - It can send you a Word document (refer to the Word document above) with questions asking you to write SQL queries for different scenarios and to replicate the same logic using pandas in Python (use sqlite).
    - It can also send you a DoSelect link with two questions: one to be solved using pandas for simple data manipulation, and one where you have to write a SQL query (the tables are given along with the relationships between them, like an ER diagram with primary and foreign keys indicated).

## Second round - Business Case study from the founder of Brightmoney.co:
+ Given a case study around credit cards and checking accounts with financial lingo, build your hypothesis and present a solution. Why do you think your approach is accurate compared to other possible approaches? What are the problems with your approach? What challenges can the data you are using pose? etc. (Deleted the case study ppt to avoid DMCA notices.)

================================================
FILE: BrightMoney/countDistinctValidPANNumbers.py
================================================
"""
Given a paragraph of P words, find all valid PAN card numbers in that paragraph
"""

================================================
FILE: BrightMoney/printSpirally.py
================================================
"""
Given an integer n, generate a square matrix from entries
from 1 to n**2 in spiral pattern
"""

def generateMatrix(n):
    if n <= 0:
        return []
    result = [[None for i in range(n)] for j in range(n)]
    xBeg,xEnd = 0,n-1
    yBeg,yEnd = 0,n-1
    current = 1
    while (True):
        # top row, left to right
        for i in range(yBeg,yEnd+1):
            result[xBeg][i] = current
            current += 1
        xBeg += 1
        if (xBeg > xEnd):
            break
        # right column, top to bottom
        for i in range(xBeg,xEnd+1):
            result[i][yEnd] = current
            current += 1
        yEnd -= 1
        if (yEnd < yBeg):
            break
        # bottom row, right to left
        for i in range(yEnd,yBeg-1,-1):
            result[xEnd][i] = current
            current += 1
        xEnd -= 1
        if (xEnd < xBeg):
            break
        # left column, bottom to top
        for i in range(xEnd,xBeg-1,-1):
            result[i][yBeg] = current
            current += 1
        yBeg += 1
        if (yBeg > yEnd):
            break
    return result

rows = generateMatrix(9)
for row in rows:
    print (" ".join(str(i) for i in row))

================================================
FILE: Busigence/README.md
================================================
# Interview Process - Personal (Data Engineer - Python/Pyspark/Scala-spark) - 2021:

### First round:
+ Take Home assignment on Spark coding.
+ The solutions added above are correct (Got selected to next round) 👍 🙂.

================================================
FILE: Busigence/Vikas_Chitturi.ipynb
================================================
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "DATA ENGINEER - PYTHON PYSPARK" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This test consists of fifteen problems. You are required to write your code in cell below each problem and output the result in cell next to it " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Total Time Allowed: 3 hours" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Style1: 10 marks each shall be awarded for correct solution coded in object oriented paradigm in Python3
\n", "Style2: 15 marks each shall be awarded for correct solution coded in functional programming paradigm (lambda, map, reduce, filter etc) in Python3
\n", "Style3: 20 marks each shall be awarded for correct solution coded in functional programming paradigm (dataframes, map, reduce, filter etc) in PySpark3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rename and Save the notebook with your FirstName_LastName (eg. Sahil_Gupta.ipynb)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "#Vikas Veerabhadra Chitturi\n", "#XXXXXXXXXXXXXXXX\n", "#+91-XXXXXXXXXX" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "********************************************* Test starts here **************************************************" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "INSTRUCTIONS:\n", " 1. You are required to download and import five CSV files, one json file and one xml file\n", " 2. You would need to understand business involved behind CRM database tables. This is important\n", " 3. Code must be in Python3/PySpark3\n", " 4. Either your code should output something or leave the comment \"#solution code here\" as it is. We shall use 'Run All' in notebook and it shouldn't result in an error\n", " 5. Test the entire notebook before uploading to Google Form provided\n", " 6. You can use any Python3 library (two are imported already) or PySpark3 library. There is no restriction\n", " 7. Output fieldname to be displayed are marked as single quotes '' in problem statement. You should use same field alias names wherever required\n", " 8. Notation for dataframe and/or array must be local to a problem's solution. Eg. 
Dataframe \"test\" for problem 8 should be df_prb8_test" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from pyspark.sql import SparkSession\n", "from pyspark.sql.functions import *\n", "from pyspark.sql.types import *\n", "from collections import defaultdict\n", "import json" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "#Please make sure this notebook is run on spark installed cluster/environment - Data processing is done in pyspark\n", "spark = SparkSession.builder\\\n", " .master(\"local[4]\")\\\n", " .appName(\"Assignment\")\\\n", " .getOrCreate()\n", "print (spark)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sales Teams...\n", "[Row(sales_agent='Donn Cantrell', manager='Rocco Neubert', regional_office='Central'), Row(sales_agent='James Ascencio', manager='Summer Sewald', regional_office='West'), Row(sales_agent='Vicki Laflamme', manager='Celia Rouche', regional_office='West'), Row(sales_agent='Niesha Huffines', manager='Melvin Marxen', regional_office='East'), Row(sales_agent='Kami Bicknell', manager='Summer Sewald', regional_office='West')]\n", "Accounts....\n", "[Row(account='Sunnamplex', revenue='4592.96', employees='13938.0'), Row(account='Silis', revenue='5339.57', employees='18053.0'), Row(account='Groovestreet', revenue='2728.86', employees='6486.0'), Row(account='Donware', revenue='2009.52', employees='3409.0'), Row(account='Wonka Industries', revenue='4962.27', employees='4687.0')]\n", "Clicks....\n", "[Row(created_on='2016-11-14', source='Referral', industry='IT'), Row(created_on='2016-11-14', source='Social', industry='IT'), Row(created_on='2016-11-14', source='Paid', industry='SaaS'), Row(created_on='2016-11-14', source='Direct', industry='SaaS'), 
Row(created_on='2016-11-15', source='Organic', industry='Health Care')]\n", "Products....\n", "[Row(product='GTXAdvanced', sales_price='649.0'), Row(product='GTXBasic', sales_price='641.0'), Row(product='MGRPFU', sales_price='3959.0'), Row(product='MGRPFS', sales_price='64.0'), Row(product='GTXPlusBasic', sales_price='1279.0')]\n", "Sales Pipeline....\n", "[Row(account='Sunnamplex', opportunity_id='67HY0MW7', sales_agent='Donn Cantrell', deal_stage='Won', product='GTXBasic', close_date='2017-05-06', close_value='500.0', created_on='2017-04-24'), Row(account=None, opportunity_id='MA82HVCI', sales_agent='James Ascencio', deal_stage='In_Progress', product='GTXPro', close_date=None, close_value=None, created_on='2017-06-15'), Row(account=None, opportunity_id='BRL1KVVH', sales_agent='Vicki Laflamme', deal_stage='Lost', product='GTXBasic', close_date='2017-08-03', close_value='0.0', created_on='2017-05-19'), Row(account='Silis', opportunity_id='R22O68FF', sales_agent='Niesha Huffines', deal_stage='Won', product='GTXBasic', close_date='2017-06-27', close_value='524.0', created_on='2017-03-21'), Row(account='Silis', opportunity_id='J78AK31N', sales_agent='Kami Bicknell', deal_stage='Won', product='MGRPFU', close_date='2017-08-04', close_value='4794.0', created_on='2017-05-15')]\n" ] } ], "source": [ "#import CSVs here\n", "df_sales_teams = spark.read.format('csv')\\\n", " .option(\"header\",\"true\")\\\n", " .load(\"data/sales_teams.csv\")\n", "df_accounts = spark.read.format('csv')\\\n", " .option(\"header\",\"true\")\\\n", " .load(\"data/accounts.csv\")\n", "df_clicks = spark.read.format('csv')\\\n", " .option(\"header\",\"true\")\\\n", " .load(\"data/clicks.csv\")\n", "df_products = spark.read.format('csv')\\\n", " .option(\"header\",\"true\")\\\n", " .load(\"data/products.csv\")\n", "df_sales_pipeline = spark.read.format('csv')\\\n", " .option(\"header\",\"true\")\\\n", " .load(\"data/sales_pipeline.csv\")\n", "\n", "print (\"Sales Teams...\")\n", "print 
(df_sales_teams.take(5))\n", "\n", "print (\"Accounts....\")\n", "print (df_accounts.take(5))\n", "\n", "print (\"Clicks....\")\n", "print (df_clicks.take(5))\n", "\n", "print (\"Products....\")\n", "print (df_products.take(5))\n", "\n", "print (\"Sales Pipeline....\")\n", "print (df_sales_pipeline.take(5))\n", "\n", "#import JSONs here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Refer & Use five CSVs to answer problem 1-10 below" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 1: Display 'Manager' and 'Grand Total Sales', for sales done by the sales agents reporting these managers" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+-------------+-------------+-----------+-----------+\n", "| sales_agent| manager| deal_stage|close_value|\n", "+-------------+-------------+-----------+-----------+\n", "|Donn Cantrell|Rocco Neubert| Won| 444.0|\n", "|Donn Cantrell|Rocco Neubert|In_Progress| null|\n", "|Donn Cantrell|Rocco Neubert| Lost| 0.0|\n", "|Donn Cantrell|Rocco Neubert| Won| 7695.0|\n", "|Donn Cantrell|Rocco Neubert| Won| 5531.0|\n", "|Donn Cantrell|Rocco Neubert|In_Progress| null|\n", "|Donn Cantrell|Rocco Neubert| Won| 7381.0|\n", "|Donn Cantrell|Rocco Neubert| Won| 565.0|\n", "|Donn Cantrell|Rocco Neubert|In_Progress| null|\n", "|Donn Cantrell|Rocco Neubert| Won| 7905.0|\n", "+-------------+-------------+-----------+-----------+\n", "only showing top 10 rows\n", "\n", "14277\n", "+----------------+-----------------+\n", "| manager|grand_total_sales|\n", "+----------------+-----------------+\n", "| Celia Rouche| 2518466.0|\n", "| Rocco Neubert| 3346813.0|\n", "| Melvin Marxen| 4265901.0|\n", "| Summer Sewald| 2915362.0|\n", "| Cara Losch| 1861751.0|\n", "|Dustin Brinkmann| 3028635.0|\n", "+----------------+-----------------+\n", "\n" ] } ], "source": [ "#solution code here\n", "#joining sales_teams and sales_pipeline tables\n", 
"df_st_sp_prob1 = df_sales_teams.select(\"sales_agent\",\"manager\").alias(\"a\")\\\n", " .join(df_sales_pipeline.select(\"sales_agent\",\"deal_stage\",\"close_value\").alias(\"b\"),\n", " on=col(\"a.sales_agent\")==col(\"b.sales_agent\"),how='left')\\\n", " .select([col(\"a.sales_agent\"),col(\"a.manager\"),col(\"b.deal_stage\"),col(\"b.close_value\")])\n", "df_st_sp_prob1.show(10) \n", "print (df_st_sp_prob1.count())\n", "#putting a filter on deal_stage=='Won' because it is considered as successful sale completed and close_value is the\n", "#final value for which product is sold after negotiation\n", "df_st_sp_prob1 = df_st_sp_prob1.filter(col(\"deal_stage\")==\"Won\")\n", "df_st_sp_prob1 = df_st_sp_prob1.withColumn(\"close_value\",df_st_sp_prob1[\"close_value\"].cast(DoubleType()))\n", "result_prob1 = df_st_sp_prob1.groupBy(\"manager\").agg(sum(\"close_value\").alias(\"grand_total_sales\"))\n", "result_prob1.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 2: Display 'Sales Agents' and 'Sales' for those sales where product sold at profit" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------+-----------------+----------+------------+-----------+------------+\n", "|opportunity_id| sales_agent|deal_stage| product|final_price|actual_price|\n", "+--------------+-----------------+----------+------------+-----------+------------+\n", "| 67HY0MW7| Donn Cantrell| Won| GTXBasic| 500.0| 641.0|\n", "| R22O68FF| Niesha Huffines| Won| GTXBasic| 524.0| 641.0|\n", "| J78AK31N| Kami Bicknell| Won| MGRPFU| 4794.0| 3959.0|\n", "| 8I9PRPGN|Versie Hillebrand| Won| MGRPFS| 67.0| 64.0|\n", "| 4VHUTHOJ| Kami Bicknell| Won|GTXPlusBasic| 1480.0| 1279.0|\n", "| TMJ0OJ0B| Kary Hendrixson| Won| GTXBasic| 635.0| 641.0|\n", "| MD4PBMNN| Anna Snelling| Won| MGRPFU| 3842.0| 3959.0|\n", "| 1XPVT5AY| Kary Hendrixson| Won| GTXPlusPro| 5055.0| 6395.0|\n", "| IH2QISS9| Kary 
Hendrixson| Won| GTXPro| 4889.0| 5624.0|\n", "| 7JJ73XCX| James Ascencio| Won|GTXPlusBasic| 1226.0| 1279.0|\n", "| T8QRTV6F| James Ascencio| Won| GTXPlusPro| 9150.0| 6395.0|\n", "| C7NFUAR6|Marty Freudenburg| Won|GTXPlusBasic| 1380.0| 1279.0|\n", "| NXRZZBVS| Lajuana Vencill| Won| MGRPFU| 3620.0| 3959.0|\n", "| H7ZQUWDJ| Reed Clapper| Won| GTXBasic| 660.0| 641.0|\n", "| YV47EGIC| Anna Snelling| Won| MGRPFS| 44.0| 64.0|\n", "| SC4LUMPZ| Elease Gluck| Won| MGRPFS| 64.0| 64.0|\n", "| ZES3NR0F|Versie Hillebrand| Won| MGRPFS| 54.0| 64.0|\n", "| KJ1JOOQ0| Niesha Huffines| Won| MGRPFS| 61.0| 64.0|\n", "| ZHV68QKO| Reed Clapper| Won| GTXPro| 4551.0| 5624.0|\n", "| 4RE1ST7V| Kami Bicknell| Won| GTXPlusPro| 5213.0| 6395.0|\n", "+--------------+-----------------+----------+------------+-----------+------------+\n", "only showing top 20 rows\n", "\n", "6438\n", "+--------------+------------------+------------+-----------+------------+\n", "|opportunity_id| sales_agent| product|final_price|actual_price|\n", "+--------------+------------------+------------+-----------+------------+\n", "| J78AK31N| Kami Bicknell| MGRPFU| 4794.0| 3959.0|\n", "| 8I9PRPGN| Versie Hillebrand| MGRPFS| 67.0| 64.0|\n", "| 4VHUTHOJ| Kami Bicknell|GTXPlusBasic| 1480.0| 1279.0|\n", "| T8QRTV6F| James Ascencio| GTXPlusPro| 9150.0| 6395.0|\n", "| C7NFUAR6| Marty Freudenburg|GTXPlusBasic| 1380.0| 1279.0|\n", "| H7ZQUWDJ| Reed Clapper| GTXBasic| 660.0| 641.0|\n", "| 8S4H8GNZ| Reed Clapper| MGRPFU| 4750.0| 3959.0|\n", "| P007M3B8| Marty Freudenburg| MGRPFS| 75.0| 64.0|\n", "| KI49INQW| Gladys Colclough|GTXPlusBasic| 1524.0| 1279.0|\n", "| 2NAKBIH8| Reed Clapper| GTXPro| 7007.0| 5624.0|\n", "| TGYPRG6Y| Wilburn Farren|GTXPlusBasic| 1566.0| 1279.0|\n", "| H8ONBK8M| Kary Hendrixson|GTXPlusBasic| 1631.0| 1279.0|\n", "| WB5W4F6P| Reed Clapper| GTXPro| 6894.0| 5624.0|\n", "| VB2E4FRU| Elease Gluck| MGRPFU| 4712.0| 3959.0|\n", "| VUNCUB75|Jonathan Berthelot|GTXPlusBasic| 1307.0| 1279.0|\n", "| 9UAMZCLW| Elease Gluck| 
MGRPFU| 4389.0| 3959.0|\n", "| FH7HBET2| Moses Frase|GTXPlusBasic| 1509.0| 1279.0|\n", "| WHRDPR4H| Darcel Schlecht| GTXBasic| 875.0| 641.0|\n", "| 9L5HPLM6| Niesha Huffines| GTXBasic| 688.0| 641.0|\n", "| AZF4JUJH| Kami Bicknell| MGRPFS| 66.0| 64.0|\n", "+--------------+------------------+------------+-----------+------------+\n", "only showing top 20 rows\n", "\n", "3386\n" ] } ], "source": [ "#solution code here\n", "#joining sales_pipeline and products tables to get the mapping of actual and final prices\n", "df_sp_prod_prob2 = df_sales_pipeline.filter(col(\"deal_stage\")==\"Won\")\\\n", " .select(\"opportunity_id\",\"sales_agent\",\"deal_stage\",\"product\",\"close_value\").alias(\"a\")\\\n", " .join(df_products.select(\"product\",\"sales_price\").alias(\"b\"),\n", " on=col(\"a.product\")==col(\"b.product\"),how='left')\\\n", " .select([col(\"a.opportunity_id\"),col(\"a.sales_agent\"),col(\"a.deal_stage\"),col(\"a.product\"),\n", " col(\"a.close_value\").alias(\"final_price\"),\n", " col(\"b.sales_price\").alias(\"actual_price\")])\n", "df_sp_prod_prob2.show(20)\n", "print (df_sp_prod_prob2.count())\n", "\n", "#profit means final price (close_value) is greater than actual price (sales_price)\n", "result_prob2 = df_sp_prod_prob2.where(col(\"final_price\") > col(\"actual_price\"))\n", "result_prob2 = result_prob2.select(\"opportunity_id\",\"sales_agent\",\"product\",\"final_price\",\"actual_price\")\n", "result_prob2.show()\n", "print (result_prob2.count())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 3: Display the 'Opportunity ID' and 'Days Taken to Close', for opportunities those got closed within a month" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------+----------------+\n", "|opportunity_id|DaysTakenToClose|\n", "+--------------+----------------+\n", "| 67HY0MW7| 12|\n", "| R22O68FF| 98|\n", "| J78AK31N| 81|\n", "| 8I9PRPGN| 41|\n", 
"| 4VHUTHOJ| 1|\n", "| TMJ0OJ0B| 10|\n", "| MD4PBMNN| 95|\n", "| 1XPVT5AY| 57|\n", "| IH2QISS9| 94|\n", "| 7JJ73XCX| 61|\n", "| T8QRTV6F| 10|\n", "| C7NFUAR6| 65|\n", "| NXRZZBVS| 52|\n", "| H7ZQUWDJ| 9|\n", "| YV47EGIC| 65|\n", "| SC4LUMPZ| 53|\n", "| ZES3NR0F| 89|\n", "| KJ1JOOQ0| 13|\n", "| ZHV68QKO| 89|\n", "| 4RE1ST7V| 22|\n", "| 8S4H8GNZ| 65|\n", "| P007M3B8| 109|\n", "| KI49INQW| 50|\n", "| 2NAKBIH8| 87|\n", "| QRYFOK47| 36|\n", "| 3X1QOEBM| 68|\n", "| TGYPRG6Y| 92|\n", "| H8ONBK8M| 56|\n", "| H0NRZ2VX| 22|\n", "| NC7SHGMD| 71|\n", "+--------------+----------------+\n", "only showing top 30 rows\n", "\n", "6438\n", "+--------------+----------------+\n", "|opportunity_id|DaysTakenToClose|\n", "+--------------+----------------+\n", "| 67HY0MW7| 12|\n", "| 4VHUTHOJ| 1|\n", "| TMJ0OJ0B| 10|\n", "| T8QRTV6F| 10|\n", "| H7ZQUWDJ| 9|\n", "| KJ1JOOQ0| 13|\n", "| 4RE1ST7V| 22|\n", "| H0NRZ2VX| 22|\n", "| VB2E4FRU| 8|\n", "| FLXHSKT4| 13|\n", "| WHRDPR4H| 12|\n", "| IF0LPAQA| 15|\n", "| Z1L5OUDD| 12|\n", "| NLKOGB9I| 18|\n", "| MTEPBRDZ| 6|\n", "| RVTAL02P| 11|\n", "| 4E73L1M3| 28|\n", "| NASE5KTW| 12|\n", "| LXTS18HY| 6|\n", "| WA0B8VK9| 18|\n", "| SPR1ZGYU| 30|\n", "| L3BMFOAZ| 9|\n", "| 2CFS9GLC| 10|\n", "| K1DFBO5B| 13|\n", "| TRN9B8S9| 1|\n", "| GCNZ7C5H| 13|\n", "| ZWSHCZO3| 18|\n", "| JEFB24SP| 18|\n", "| 9UXVO2DF| 7|\n", "| WCP1FZLS| 10|\n", "+--------------+----------------+\n", "only showing top 30 rows\n", "\n", "2196\n" ] } ], "source": [ "#solution code here\n", "#Filter on deal_stage and parsing the date column to date type and using datediff\n", "result_prob3 = df_sales_pipeline.filter(col(\"deal_stage\")==\"Won\")\\\n", " .withColumn(\"DaysTakenToClose\", \n", " datediff(to_date(\"close_date\",\"yyyy-MM-dd\"),\n", " to_date(\"created_on\",\"yyyy-MM-dd\"))).select(\"opportunity_id\",\"DaysTakenToClose\")\n", "result_prob3.show(30)\n", "print (result_prob3.count())\n", "\n", "#hardcoding the range of month as 30 for now\n", "result_prob3 = 
result_prob3.filter(col(\"DaysTakenToClose\") <= 30)\n", "result_prob3.show(30)\n", "print (result_prob3.count())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 4: Display product(s) got maximum leads (by count) generated from paid source" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "#solution code here\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 5: Display 'Sales Agent' and 'Opportunity Count', for those sales agents who lost atleast two opportunties" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+------------------+----------------------+\n", "| sales_agent|numOfLostOpportunities|\n", "+------------------+----------------------+\n", "| Darcel Schlecht| 337|\n", "| Kami Bicknell| 134|\n", "| Vicki Laflamme| 162|\n", "| Elease Gluck| 62|\n", "|Jonathan Berthelot| 185|\n", "| Daniell Hammack| 80|\n", "| Anna Snelling| 140|\n", "| Cassey Cress| 137|\n", "| Garret Kinder| 63|\n", "| Markita Hansen| 115|\n", "| Reed Clapper| 87|\n", "|Rosie Papadopoulos| 56|\n", "| Maureen Marcano| 119|\n", "| Violet Mclelland| 111|\n", "| Gladys Colclough| 149|\n", "| Boris Faz| 63|\n", "| Wilburn Farren| 44|\n", "| Versie Hillebrand| 118|\n", "| Marty Freudenburg| 120|\n", "| Cecily Lampkin| 86|\n", "+------------------+----------------------+\n", "only showing top 20 rows\n", "\n", "30\n", "+------------------+----------------------+\n", "| sales_agent|numOfLostOpportunities|\n", "+------------------+----------------------+\n", "| Darcel Schlecht| 337|\n", "| Kami Bicknell| 134|\n", "| Vicki Laflamme| 162|\n", "| Elease Gluck| 62|\n", "|Jonathan Berthelot| 185|\n", "| Daniell Hammack| 80|\n", "| Anna Snelling| 140|\n", "| Cassey Cress| 137|\n", "| Garret Kinder| 63|\n", "| Markita Hansen| 115|\n", "| Reed Clapper| 87|\n", "|Rosie Papadopoulos| 56|\n", "| Maureen Marcano| 119|\n", "| Violet 
Mclelland| 111|\n", "| Gladys Colclough| 149|\n", "| Boris Faz| 63|\n", "| Wilburn Farren| 44|\n", "| Versie Hillebrand| 118|\n", "| Marty Freudenburg| 120|\n", "| Cecily Lampkin| 86|\n", "+------------------+----------------------+\n", "only showing top 20 rows\n", "\n", "30\n", "validation test: True\n" ] } ], "source": [ "#solution code here\n", "result_prob5 = df_sales_pipeline.filter(col(\"deal_stage\")==\"Lost\")\\\n", " .select(\"sales_agent\",\"opportunity_id\")\\\n", " .groupBy(\"sales_agent\").agg(count(\"opportunity_id\").alias(\"numOfLostOpportunities\"))\n", "result_prob5.show(20)\n", "print (result_prob5.count())\n", "\n", "result_prob5 = result_prob5.filter(col(\"numOfLostOpportunities\") >= 2)\n", "\n", "result_prob5.show(20)\n", "print (result_prob5.count())\n", "\n", "#validation test\n", "print (\"validation test: \", result_prob5.agg(sum(\"numOfLostOpportunities\")).collect()[0][0] == df_sales_pipeline.filter(col(\"deal_stage\")==\"Lost\").count())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 6: Display in ascending order of revenue, 'Account' and 'Revenue' for telecom accounts " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------------+-------+\n", "| account|revenue|\n", "+--------------------+-------+\n", "| Stanredtax| 14.79|\n", "| Fasehatice| 19.2|\n", "| Kan-code| 22.63|\n", "| Treequote| 73.1|\n", "| Konmatfix| 82.96|\n", "|Olivia Pope & Ass...| 97.94|\n", "| Donquadtech| 110.88|\n", "| Warephase| 130.62|\n", "| Soylent Corp| 136.89|\n", "| Iselectrics| 138.63|\n", "| Yearin| 144.68|\n", "| Ganjaflex| 161.8|\n", "| Sterling Cooper| 204.47|\n", "| Rangreen| 211.12|\n", "| Xx-zobam| 221.86|\n", "| Hatfan| 223.54|\n", "| Betatech| 239.22|\n", "| Duff Beer| 244.32|\n", "| Good Burger| 247.91|\n", "| Blackzim| 256.32|\n", "| Nam-zim| 369.58|\n", "| Krusty Krab| 372.21|\n", "| Hooli| 374.84|\n", "| Finhigh| 398.18|\n", 
"| Hottechi| 431.1|\n", "| Bubba Gump| 459.79|\n", "| Green-Plus| 562.28|\n", "| Sonron| 588.46|\n", "| Codehow| 638.03|\n", "| Lexiqvolax| 652.53|\n", "+--------------------+-------+\n", "only showing top 30 rows\n", "\n", "97\n" ] } ], "source": [ "#solution code here\n", "df_accounts = df_accounts.withColumn(\"revenue\",df_accounts[\"revenue\"].cast(DoubleType()))\n", "result_prob6 = df_accounts.orderBy(\"revenue\",ascending=True).select(\"account\",\"revenue\")\n", "result_prob6.show(30)\n", "print (result_prob6.count())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 7: Display by revenue generated, bottom five 'Industries' and 'Revenue'" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+-----------------+-------+\n", "| account|revenue|\n", "+-----------------+-------+\n", "| Dalttechnology| 6085.6|\n", "| Silis|5339.57|\n", "| Newex|5093.78|\n", "| Wonka Industries|4962.27|\n", "| Faxquote|4939.54|\n", "| Sunnamplex|4592.96|\n", "| Isdom|4514.68|\n", "| Golddex|4340.32|\n", "| Conecom|4242.85|\n", "| Stark Industries|4221.65|\n", "| Scotfind|3911.27|\n", "| Ron-tech|3805.02|\n", "| Gogozoom| 3577.1|\n", "| Bioholding| 3321.9|\n", "| Zumgoity| 3264.4|\n", "|Wayne Enterprises|3193.45|\n", "| Dontechi|2990.17|\n", "| Gekko & Co|2934.79|\n", "| Zathunicon|2913.82|\n", "| Labdrill|2913.26|\n", "+-----------------+-------+\n", "only showing top 20 rows\n", "\n", "97\n", "+----------+-------+\n", "| account|revenue|\n", "+----------+-------+\n", "|Stanredtax| 14.79|\n", "|Fasehatice| 19.2|\n", "| Kan-code| 22.63|\n", "| Treequote| 73.1|\n", "| Konmatfix| 82.96|\n", "+----------+-------+\n", "only showing top 5 rows\n", "\n" ] } ], "source": [ "#solution code here\n", "result_prob7 = df_accounts.orderBy(\"revenue\",ascending=False).select(\"account\",\"revenue\")\n", "result_prob7.show(20)\n", "print (result_prob7.count())\n", "#generating id incrementally 
for accounts sorted by revenue in decreasing order\n", "result_prob7 = result_prob7.withColumn(\"index\", monotonically_increasing_id())\n", "#sorting by index in descending order so that we get accounts with least revenue (bottom 5)\n", "result_prob7.orderBy(desc(\"index\")).drop(\"index\").show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 8: Display 'Month of Year' vs 'Sales', for GTXBasic. NOTE: \"Month of Year\" means month year (eg. Jan) and \"Month\" means month (eg. Jan 2020, Jan 2021 etc)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------+-----------+--------+----------+----------+-------------+\n", "|opportunity_id| deal_stage| product|created_on|close_date|Month Of year|\n", "+--------------+-----------+--------+----------+----------+-------------+\n", "| 67HY0MW7| Won|GTXBasic|2017-04-24|2017-05-06| Apr 2017|\n", "| BRL1KVVH| Lost|GTXBasic|2017-05-19|2017-08-03| May 2017|\n", "| R22O68FF| Won|GTXBasic|2017-03-21|2017-06-27| Mar 2017|\n", "| TMJ0OJ0B| Won|GTXBasic|2017-06-30|2017-07-10| Jun 2017|\n", "| B22V5Z3B| Lost|GTXBasic|2017-07-25|2017-08-07| Jul 2017|\n", "| H7ZQUWDJ| Won|GTXBasic|2017-12-21|2017-12-30| Dec 2017|\n", "| HC77MZU1| Lost|GTXBasic|2017-07-25|2017-09-07| Jul 2017|\n", "| 27QUC5M7|In_Progress|GTXBasic|2017-11-01| null| Nov 2017|\n", "| N3DXP5OP|In_Progress|GTXBasic|2017-02-05| null| Feb 2017|\n", "| ZLLF7UHU| Lost|GTXBasic|2017-09-20|2017-12-08| Sep 2017|\n", "| QRYFOK47| Won|GTXBasic|2017-03-27|2017-05-02| Mar 2017|\n", "| QCA1IIKF|In_Progress|GTXBasic|2016-11-25| null| Nov 2016|\n", "| P0I0DTBJ| Lost|GTXBasic|2017-02-15|2017-04-04| Feb 2017|\n", "| 5S5DO3QC| Lost|GTXBasic|2017-01-08|2017-04-03| Jan 2017|\n", "| 8QSPOR0V| Lost|GTXBasic|2017-08-09|2017-09-12| Aug 2017|\n", "| WHRDPR4H| Won|GTXBasic|2017-10-19|2017-10-31| Oct 2017|\n", "| V34SNG4P|In_Progress|GTXBasic|2017-10-04| null| Oct 2017|\n", "| 
DNTUK4O1|In_Progress|GTXBasic|2017-05-21| null| May 2017|\n", "| 77VRR78O| Lost|GTXBasic|2017-06-21|2017-09-06| Jun 2017|\n", "| E9JHZR7J| Won|GTXBasic|2017-10-27|2017-12-16| Oct 2017|\n", "| 9L5HPLM6| Won|GTXBasic|2017-06-22|2017-10-02| Jun 2017|\n", "| OL2RFQCM| Won|GTXBasic|2017-06-20|2017-08-16| Jun 2017|\n", "| 377G0K33| Lost|GTXBasic|2017-10-23|2017-10-25| Oct 2017|\n", "| Y131Y9KM|In_Progress|GTXBasic|2017-03-10| null| Mar 2017|\n", "| NEJZ68R1| Lost|GTXBasic|2017-07-02|2017-07-07| Jul 2017|\n", "| FOSIPD0U|In_Progress|GTXBasic|2017-09-17| null| Sep 2017|\n", "| JYYU4CDI|In_Progress|GTXBasic|2017-06-25| null| Jun 2017|\n", "| 6RNOFZRB|In_Progress|GTXBasic|2017-10-21| null| Oct 2017|\n", "| NFNYO1WJ| Won|GTXBasic|2017-08-07|2017-09-11| Aug 2017|\n", "| BQ91O7T3| Won|GTXBasic|2017-05-13|2017-07-03| May 2017|\n", "+--------------+-----------+--------+----------+----------+-------------+\n", "only showing top 30 rows\n", "\n", "3062\n" ] } ], "source": [ "#solution code here\n", "#formatting created_on (sales initiation date)\n", "result_prob8 = df_sales_pipeline.filter(col(\"product\")=='GTXBasic')\\\n", " .select(\"opportunity_id\",\"deal_stage\",\"product\",\"created_on\",\"close_date\")\\\n", " .withColumn(\"Month Of year\",date_format(\"created_on\",\"LLL y\"))\n", "result_prob8.show(30)\n", "print (result_prob8.count())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 9: Which sales agent(s) never lost a deal. 
Display as a dictionary {sales agent:sales}" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dict_keys(['Donn Cantrell', 'James Ascencio', 'Niesha Huffines', 'Kami Bicknell', 'Versie Hillebrand', 'Kary Hendrixson', 'Anna Snelling', 'Vicki Laflamme', 'Darcel Schlecht', 'Marty Freudenburg', 'Lajuana Vencill', 'Reed Clapper', 'Elease Gluck', 'Rosie Papadopoulos', 'Jonathan Berthelot', 'Zane Levy', 'Rosalina Dieter', 'Cecily Lampkin', 'Gladys Colclough', 'Daniell Hammack', 'Moses Frase', 'Wilburn Farren', 'Violet Mclelland', 'Markita Hansen', 'Corliss Cosme', 'Cassey Cress', 'Boris Faz', 'Maureen Marcano', 'Hayden Neloms', 'Garret Kinder'])\n", "30\n" ] } ], "source": [ "#solution code here\n", "df_sales_notlost = df_sales_pipeline.filter(col(\"deal_stage\") != \"Lost\").select(\"opportunity_id\",\"sales_agent\",\"close_value\")\n", "result_prob9 = defaultdict(list)\n", "for row in df_sales_notlost.toJSON().collect():\n", " row_json = json.loads(row)\n", " if \"close_value\" in row_json:\n", " result_prob9[row_json['sales_agent']].append({\"opportunity_id\":row_json['opportunity_id'],\n", " \"close_value\":row_json[\"close_value\"]})\n", " else:\n", " result_prob9[row_json['sales_agent']].append({\"opportunity_id\":row_json['opportunity_id'],\n", " \"close_value\":None})\n", "print (result_prob9.keys())\n", "print (len(result_prob9.keys()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 10: Display 'Sales Agents', 'Product', and 'Sales', for those sales agents who closed more than one deal on same day" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------+-----------------+------------+----------------+\n", "|opportunity_id| sales_agent| product|DaysTakenToClose|\n", "+--------------+-----------------+------------+----------------+\n", "| 67HY0MW7| Donn Cantrell| 
GTXBasic| 12|\n", "| R22O68FF| Niesha Huffines| GTXBasic| 98|\n", "| J78AK31N| Kami Bicknell| MGRPFU| 81|\n", "| 8I9PRPGN|Versie Hillebrand| MGRPFS| 41|\n", "| 4VHUTHOJ| Kami Bicknell|GTXPlusBasic| 1|\n", "| TMJ0OJ0B| Kary Hendrixson| GTXBasic| 10|\n", "| MD4PBMNN| Anna Snelling| MGRPFU| 95|\n", "| 1XPVT5AY| Kary Hendrixson| GTXPlusPro| 57|\n", "| IH2QISS9| Kary Hendrixson| GTXPro| 94|\n", "| 7JJ73XCX| James Ascencio|GTXPlusBasic| 61|\n", "| T8QRTV6F| James Ascencio| GTXPlusPro| 10|\n", "| C7NFUAR6|Marty Freudenburg|GTXPlusBasic| 65|\n", "| NXRZZBVS| Lajuana Vencill| MGRPFU| 52|\n", "| H7ZQUWDJ| Reed Clapper| GTXBasic| 9|\n", "| YV47EGIC| Anna Snelling| MGRPFS| 65|\n", "| SC4LUMPZ| Elease Gluck| MGRPFS| 53|\n", "| ZES3NR0F|Versie Hillebrand| MGRPFS| 89|\n", "| KJ1JOOQ0| Niesha Huffines| MGRPFS| 13|\n", "| ZHV68QKO| Reed Clapper| GTXPro| 89|\n", "| 4RE1ST7V| Kami Bicknell| GTXPlusPro| 22|\n", "+--------------+-----------------+------------+----------------+\n", "only showing top 20 rows\n", "\n", "6438\n", "+------------------+------------+-----------------------+\n", "| sales_agent| product|completed_opportunities|\n", "+------------------+------------+-----------------------+\n", "| Markita Hansen|GTXPlusBasic| 2|\n", "| Darcel Schlecht| GTXPro| 2|\n", "| Marty Freudenburg| GTXBasic| 2|\n", "| Kami Bicknell|GTXPlusBasic| 2|\n", "| Gladys Colclough|GTXPlusBasic| 2|\n", "| Darcel Schlecht| MGRPFS| 2|\n", "| Darcel Schlecht|GTXPlusBasic| 2|\n", "|Jonathan Berthelot| GTXBasic| 6|\n", "| Moses Frase| GTXBasic| 2|\n", "| Kami Bicknell| GTXBasic| 2|\n", "| Cassey Cress| GTXBasic| 2|\n", "|Rosie Papadopoulos| GTXBasic| 2|\n", "| Anna Snelling| GTXPlusPro| 3|\n", "| Kary Hendrixson| GTXBasic| 2|\n", "+------------------+------------+-----------------------+\n", "\n", "14\n" ] } ], "source": [ "#solution code here\n", "result_prob10 = df_sales_pipeline.filter(col(\"deal_stage\")==\"Won\")\\\n", " .withColumn(\"DaysTakenToClose\", \n", " 
datediff(to_date(\"close_date\",\"yyyy-MM-dd\"),\n", " to_date(\"created_on\",\"yyyy-MM-dd\")))\\\n", " .select(\"opportunity_id\",\"sales_agent\",\"product\",\"DaysTakenToClose\")\n", "\n", "result_prob10.show(20)\n", "print (result_prob10.count())\n", "result_prob10 = result_prob10.filter(col(\"DaysTakenToClose\") == 1)\\\n", " .select(\"sales_agent\",\"product\",\"opportunity_id\")\\\n", " .groupBy(\"sales_agent\",\"product\").agg(count(\"opportunity_id\").alias(\"completed_opportunities\"))\\\n", " .filter(col(\"completed_opportunities\") > 1)\n", "\n", "result_prob10.show(20)\n", "print (result_prob10.count())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Refer & Use Orchestra.json to answer problem 11-13 below" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 11: Display the instrument played by Lehmann Caroline" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------------+--------------------+--------------------+---------+-------+--------------------+\n", "| concerts| id| orchestra|programID| season| works|\n", "+--------------------+--------------------+--------------------+---------+-------+--------------------+\n", "|[{1842-12-07T05:0...|00646b9f-fec7-4ff...|New York Philharm...| 3853|1842-43|[{52446*, Beethov...|\n", "|[{1843-02-18T05:0...|1118e84e-eb59-46c...|New York Philharm...| 5178|1842-43|[{52437*, Beethov...|\n", "|[{1843-04-07T05:0...|08536612-27c3-437...|Musicians from th...| 10785|1842-43|[{52364*1, Beetho...|\n", "|[{1843-04-22T05:0...|81a3b8de-1737-4c9...|New York Philharm...| 5887|1842-43|[{52434*, Beethov...|\n", "|[{1843-11-18T05:0...|09581bb7-8855-496...|New York Philharm...| 305|1843-44|[{52453*, Beethov...|\n", "|[{1844-01-13T05:0...|0848266c-8eee-48a...|New York Philharm...| 3368|1843-44|[{51668*, Mozart,...|\n", "|[{1844-03-16T05:0...|8025e763-9c12-415...|New York Philharm...| 4226|1843-44|[{3707*, 
Spohr, ...|\n", "|[{1844-05-18T05:0...|19d44866-ee7b-48d...|New York Philharm...| 5087|1843-44|[{52446*, Beethov...|\n", "|[{1844-11-16T05:0...|4f13ab38-9eb4-414...|New York Philharm...| 6310|1844-45|[{52456*, Beethov...|\n", "|[{1845-01-11T05:0...|7725e2f7-7f77-41d...|New York Philharm...| 1979|1844-45|[{51727*, Haydn, ...|\n", "|[{1845-03-01T05:0...|f1de3488-6831-408...|New York Philharm...| 2821|1844-45|[{52437*, Beethov...|\n", "|[{1845-04-19T05:0...|cd444e18-1fcd-4d6...|New York Philharm...| 3259|1844-45|[{52453*, Beethov...|\n", "|[{1845-11-22T05:0...|51ee4702-8f57-4fe...|New York Philharm...| 4919|1845-46|[{52575*, Mendels...|\n", "|[{1846-01-17T05:0...|1b162182-4605-4cb...|New York Philharm...| 562|1845-46|[{52446*, Beethov...|\n", "|[{1846-03-07T05:0...|66cc492e-e6eb-42e...|New York Philharm...| 1408|1845-46|[{3826*, Kalliwod...|\n", "|[{1846-04-25T05:0...|08205151-c17b-4de...|New York Philharm...| 1851|1845-46|[{51664*, Mozart,...|\n", "|[{1846-05-20T05:0...|079d4d73-e2e7-4c8...|New York Philharm...| 2321|1845-46|[{6709*16, Weber,...|\n", "|[{1846-11-21T05:0...|f679abf1-6beb-408...|New York Philharm...| 3540|1846-47|[{3864*, Spohr, ...|\n", "|[{1847-01-09T05:0...|111c22e5-0527-4e1...|New York Philharm...| 6573|1846-47|[{51658*, Mozart,...|\n", "|[{1847-03-06T05:0...|07ffca52-b177-43c...|New York Philharm...| 47|1846-47|[{52453*, Beethov...|\n", "+--------------------+--------------------+--------------------+---------+-------+--------------------+\n", "only showing top 20 rows\n", "\n", "1033\n" ] } ], "source": [ "#solution code here\n", "#Edited the orchestra.json file to retrieve the proper JSON structure to accommodate Array of JSON objects\n", "df_orchestra = spark.read.format(\"json\").option(\"multiline\",\"true\").load(\"data/Orchestra.json\")\n", "df_orchestra.show(20)\n", "print (df_orchestra.count())" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('concerts',\n", " 'array>'),\n", " 
('id', 'string'),\n", " ('orchestra', 'string'),\n", " ('programID', 'string'),\n", " ('season', 'string'),\n", " ('works',\n", " 'array>,workTitle:string>>')]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_orchestra.dtypes" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "def getRequiredInstrument(ws,soloist_name):\n", " exists = None\n", " for work in ws:\n", " if len(work['soloists']) > 0:\n", " for soloist_element in work['soloists']:\n", " if soloist_element[\"soloistName\"] == soloist_name:\n", " exists = 1\n", " break\n", " if exists == 1:\n", " break\n", " else:\n", " continue\n", " if exists:\n", " return soloist_element[\"soloistInstrument\"]\n", " else:\n", " return \"\"\n", "\n", "instrumentUDF = udf(lambda colname,sname: getRequiredInstrument(colname,sname))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------------+--------------------+--------------------+---------+-----------------+\n", "| id| orchestra| works|programID|return_instrument|\n", "+--------------------+--------------------+--------------------+---------+-----------------+\n", "|21b49b4a-1805-41b...|New York Philharm...|[{52437*, Beethov...| 5792| Soprano|\n", "|89c1db0d-8c41-476...|New York Philharm...|[{52577*, Mendels...| 2149| Soprano|\n", "|4e81347f-898d-48b...|New York Philharm...|[{52374*1, Cherub...| 6613| Soprano|\n", "+--------------------+--------------------+--------------------+---------+-----------------+\n", "\n" ] } ], "source": [ "test = df_orchestra.withColumn(\"return_instrument\",instrumentUDF(\"works\",lit(\"Lehmann, Caroline\")))\n", "test.select(\"id\",\"orchestra\",\"works\",\"programID\",\"return_instrument\").filter(col(\"return_instrument\")!=\"\").show(40)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 12: Display all vocalists" ] }, { 
"cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Row(composerName=['Beethoven, Ludwig van', 'Weber, Carl Maria Von', 'Hummel, Johann', 'Pacini, Giovanni', 'Romberg, Bernhard', 'Onslow, George', 'Onslow, George', 'Rossini, Gioachino', 'Thalberg, Sigismond', 'Mozart, Wolfgang Amadeus', 'Herz, Henri', 'Lindpaintner, Peter Von']),\n", " Row(composerName=['Beethoven, Ludwig van', 'Vieuxtemps, Henri', 'Mendelssohn, Felix', 'Donizetti, Gaetano', 'Weber, Carl Maria Von']),\n", " Row(composerName=['Beethoven, Ludwig van', None, 'Loder, George, Jr.', 'Mendelssohn, Felix', 'Lindpaintner, Peter Von', 'Beriot, Charles-Auguste de', 'Reissiger, Karl Gottlieb']),\n", " Row(composerName=['Schumann, Robert', 'Haydn, Franz Joseph', 'Ernst, Heinrich Wilhelm', None, 'Rietz, Julius', 'Weber, Carl Maria Von', 'Mollenhauer, Friedrich', None, 'Beethoven, Ludwig van']),\n", " Row(composerName=['Strauss, Richard', 'Beethoven, Ludwig van', None, 'Volkmann, Friedrich Robert', 'Schumann, Robert']),\n", " Row(composerName=['Wagner, Richard', 'Rubinstein, Anton', 'Joachim, Joseph', None, 'Beethoven, Ludwig van']),\n", " Row(composerName=['Wagner, Richard', 'Wagner, Richard', 'Schubert, Franz', None, 'Tchaikovsky, Pyotr Ilyich']),\n", " Row(composerName=['Wagner, Richard', 'Godard, Benjamin', 'Svendsen, Johan', 'Beethoven, Ludwig van', 'Beethoven, Ludwig van', 'Delibes, Léo', 'Delibes, Léo', None, 'Beethoven, Ludwig van', 'Wagner, Richard', 'Arditi, Luigi', 'Schenck, Elliott', 'Schenck, Elliott', 'Ziehrer, Carl Michael']),\n", " Row(composerName=['Bulow, Hans Von', 'Brahms, Johannes', 'Verdi, Giuseppe', 'Verdi, Giuseppe', 'Liszt, Franz', 'Pfeffer, Walter', None, 'Beethoven, Ludwig van']),\n", " Row(composerName=['Schumann, Robert', 'Saint-Saens [Saint-Saëns], Camille', None, 'Wagner, Richard', 'Wagner, Richard', 'Wagner, Richard', 'Wagner, Richard', 'Beethoven, Ludwig van'])]" ] }, "execution_count": 19, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "#solution code here\n", "df_orchestra.select(\"works.composerName\").distinct().take(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 13: Display orchestra played under program id 2561" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+---------+--------------------+\n", "|programID| orchestra|\n", "+---------+--------------------+\n", "| 2561|New York Philharm...|\n", "+---------+--------------------+\n", "\n" ] } ], "source": [ "#solution code here\n", "df_orchestra.select(\"programID\",\"orchestra\").filter(col(\"programID\")==\"2561\").show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Refer & Use Orchestra.xml to answer problem 14-15 below" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 14: Display locations used for event at time 8:15 PM" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "#solution code here\n", "def get_locations(concerts,time):\n", " locations = []\n", " if len(concerts) > 0:\n", " for concert in concerts:\n", " if concert['Time'] == '8:15PM':\n", " locations.append(concert['Location'])\n", " if len(locations) > 0:\n", " return locations\n", " else:\n", " return \"\"\n", "\n", "locationsUDF = udf(lambda colname,sname: get_locations(colname,sname))" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------------+--------------------+--------------------+---------+---------------+\n", "| id| orchestra| concerts|programID|return_location|\n", "+--------------------+--------------------+--------------------+---------+---------------+\n", "|1124c816-f97c-4cf...| New York Symphony|[{1888-12-28T05:0...| 11610|[Manhattan, NY]|\n", "|7e1d590a-9750-4e0...| New York Symphony|[{1889-12-13T05:0...| 7921|[Manhattan, NY]|\n", 
"|ebe38ed5-af8a-42c...| New York Symphony|[{1890-01-31T05:0...| 8072|[Manhattan, NY]|\n", "|fa51a57c-f27c-430...| New York Symphony|[{1891-11-17T05:0...| 14304|[Manhattan, NY]|\n", "|20df0162-d9ad-407...| New York Symphony|[{1891-11-19T05:0...| 14305|[Manhattan, NY]|\n", "|c5f00d41-3ae7-499...| New York Symphony|[{1892-01-15T05:0...| 8087|[Manhattan, NY]|\n", "|de6b31a7-d117-47e...| New York Symphony|[{1892-02-05T05:0...| 8088|[Manhattan, NY]|\n", "|637c6a11-0e03-48c...| New York Symphony|[{1892-03-04T05:0...| 8089|[Manhattan, NY]|\n", "|d617e80c-a31a-415...| New York Symphony|[{1892-04-01T05:0...| 8090|[Manhattan, NY]|\n", "|d37f0df9-9e87-477...|Members of NY Phi...|[{1892-10-21T05:0...| 9742|[Manhattan, NY]|\n", "|58aa553d-b941-4b9...| New York Symphony|[{1892-11-11T05:0...| 8091|[Manhattan, NY]|\n", "|55d5fc26-abb2-409...| New York Symphony|[{1892-12-02T05:0...| 8092|[Manhattan, NY]|\n", "|384caf29-6ebc-40b...|New York Philharm...|[{1892-12-16T05:0...| 6073|[Manhattan, NY]|\n", "|2678ff7b-ecfa-4c7...| New York Symphony|[{1893-01-06T05:0...| 8093|[Manhattan, NY]|\n", "|43599c50-2a7d-40d...|New York Philharm...|[{1893-01-13T05:0...| 2391|[Manhattan, NY]|\n", "|e39ec35e-ab4e-41f...| New York Symphony|[{1893-02-03T05:0...| 8104|[Manhattan, NY]|\n", "|9786f71d-9e46-405...|New York Philharm...|[{1893-02-10T05:0...| 6730|[Manhattan, NY]|\n", "|61de5748-841a-428...|New York Philharm...|[{1893-03-03T05:0...| 3710|[Manhattan, NY]|\n", "|a9501082-5d8e-449...| New York Symphony|[{1893-03-10T05:0...| 8105|[Manhattan, NY]|\n", "|66cd0517-a1d9-4b1...|New York Philharm...|[{1893-03-24T05:0...| 3714|[Manhattan, NY]|\n", "|a8f7f5ce-cb62-4db...| New York Symphony|[{1893-11-10T05:0...| 8107|[Manhattan, NY]|\n", "|0c851376-a2b2-4a7...|New York Philharm...|[{1893-11-17T05:0...| 2589|[Manhattan, NY]|\n", "|e02e6ffd-4071-451...| New York Symphony|[{1893-11-19T05:0...| 9741|[Manhattan, NY]|\n", "|f61c75a0-7355-4a2...| New York Symphony|[{1893-12-08T05:0...| 8108|[Manhattan, NY]|\n", 
"|489b901d-bd34-40c...|New York Philharm...|[{1893-12-15T05:0...| 6924|[Manhattan, NY]|\n", "|324d6827-eabb-41e...| New York Symphony|[{1894-01-05T05:0...| 8109|[Manhattan, NY]|\n", "|50fcb7b0-f8e7-407...|New York Philharm...|[{1894-01-12T05:0...| 3280|[Manhattan, NY]|\n", "|f439468c-ee5b-43a...|New York Philharm...|[{1894-02-09T05:0...| 252|[Manhattan, NY]|\n", "|d16b069c-3509-454...|New York Philharm...|[{1894-03-09T05:0...| 4619|[Manhattan, NY]|\n", "|3bbe1795-301b-4a3...| New York Symphony|[{1894-03-16T05:0...| 8112|[Manhattan, NY]|\n", "|c5ccb40c-2863-4c2...|New York Philharm...|[{1894-04-06T05:0...| 1598|[Manhattan, NY]|\n", "|f5c265a2-e66d-4a0...| New York Symphony|[{1894-11-09T05:0...| 8113|[Manhattan, NY]|\n", "|4dba2644-4271-43e...|New York Philharm...|[{1894-11-16T05:0...| 3497|[Manhattan, NY]|\n", "|3e41efc1-0f56-494...| New York Symphony|[{1894-12-07T05:0...| 8114|[Manhattan, NY]|\n", "|6ad4c55c-ec61-41a...|New York Philharm...|[{1894-12-14T05:0...| 435|[Manhattan, NY]|\n", "|eb3ed63a-6b27-449...| New York Symphony|[{1894-12-29T05:0...| 11624|[Manhattan, NY]|\n", "|6587d4bd-e471-4c3...| New York Symphony|[{1895-01-04T05:0...| 8115|[Manhattan, NY]|\n", "|2fadbd45-5e46-482...| New York Symphony|[{1895-01-06T05:0...| 10576|[Manhattan, NY]|\n", "|eb47e326-1219-466...|New York Philharm...|[{1895-01-11T05:0...| 4191|[Manhattan, NY]|\n", "|a51c295b-e56d-4a3...| New York Symphony|[{1895-02-01T05:0...| 8118|[Manhattan, NY]|\n", "+--------------------+--------------------+--------------------+---------+---------------+\n", "only showing top 40 rows\n", "\n" ] } ], "source": [ "test = df_orchestra.withColumn(\"return_location\",locationsUDF(\"concerts\",lit(\"8:15PM\")))\n", "test.select(\"id\",\"orchestra\",\"concerts\",\"programID\",\"return_location\").filter(col(\"return_location\")!=\"\").show(40)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------------+\n", 
"| return_location|\n", "+--------------------+\n", "| [Springfield, MA]|\n", "| [Newark, NJ]|\n", "| [Manhattan, NY]|\n", "| [Providence, RI]|\n", "| [Indianapolis, IN]|\n", "| [New Haven, CT]|\n", "| [Brooklyn, NY]|\n", "| [Philadelphia, PA]|\n", "| [Hartford, CT]|\n", "|[Manhattan, NY, B...|\n", "|[Manhattan, NY, M...|\n", "+--------------------+\n", "\n" ] } ], "source": [ "test.filter(col(\"return_location\")!=\"\").select(\"return_location\").distinct().show(40)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PROBLEM 15: Display total number of programs" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1033" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#solution code here\n", "df_orchestra.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "********************************************* Test ends here **************************************************" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: Byndr/README.md ================================================ # Interview Experience for Data Engineer (Big Data) - 2021: ### First Round: + What is Hive? + What is Hive metastore? + Hive external table vs managed table. + evalute Programming skills - one question on that. + mappers and reducers in Spark execution flow. + what are `groupByKey` and `reduceByKey` operations in Spark. + Write spark skeleton code for given scenario - reading a CSV and do some transformations on it - Code doesn't have to be exact but steps matter. 
+ Given a requirement (find the second-largest value of a column for every value of another column, i.e. window functions in SQL), what are the steps to achieve it using only Spark SQL, without the DataFrame/RDD APIs?
+ Questions on SQL window functions in Spark (a SQL query is preferred over the DataFrame API).
+ Different storage formats in the big data space, and questions on them (`parquet`, `CSV`, `JSON`, `avro`, `delta`, etc.).
+ Questions on the Hadoop ecosystem, if any.

================================================
FILE: CloudCover/README.md
================================================
# Personal Experience for Data Engineer - 2021:

## First round:
Live programming round: you are given 1.5 hours to solve a problem statement live in a web-based IDE.

## Second round:
Technical discussion on the projects done, plus some interview questions on Spark, Hive, and Hadoop:
- Window functions in Spark: what are they used for?
- UDFs in Spark, and how their performance differs from that of built-in Spark functions.
- Difference between an external table and a managed table in Hive.
- Experience moving large-scale data between different sinks (cloud migrations, database migrations).
- Experience writing complex SQL queries.
- Behavioral questions.
- Your current project and the specific responsibilities of your role at your company.
- Some other questions on big data, which I don't remember right now 😁

================================================
FILE: DATFreightAnalytics/README.md
================================================
## Interview Process for Machine Learning Engineer (Personal Experience)

1. Resume Screening.
2. Technical Assessment (Take Home Assignment).
   - Questions are mostly around MLOps concepts, Machine Learning Engineering, and designing systems.
3. Technical Round as a follow-up, with questions.
4. Hiring Manager Round.
5. Leadership Interview Round.
6. Final: HR Interview covering cultural and behavioral aspects.
================================================
FILE: Facebook/ImplementSTRSTR.py
================================================
{
#Contributed by : Nagendra Jha

import atexit
import io
import sys

_INPUT_LINES = sys.stdin.read().splitlines()
input = iter(_INPUT_LINES).__next__
_OUTPUT_BUFFER = io.StringIO()
sys.stdout = _OUTPUT_BUFFER

@atexit.register
def write():
    sys.__stdout__.write(_OUTPUT_BUFFER.getvalue())

if __name__ == '__main__':
    t=int(input())
    for cases in range(t):
        s,p =input().strip().split()
        print(strstr(s,p))
}
''' This is a function problem. You only need to complete the function given below '''

'''
    Your task is to return the index of the pattern
    present in the given string.

    Function Arguments: s (given text), p(given pattern)
    Return Type: Integer.
'''
def strstr(s,p):
    #code here
    # str.find treats p as a literal substring and returns -1 when it is absent;
    # a regex search would misread characters such as '.' or '*' in the pattern.
    return s.find(p)

================================================
FILE: FactSet/convertArrayToWave.py
================================================
{
#Initial Template for Python 3
import math

def main():
    T=int(input())
    while(T>0):
        N=int(input())
        A=[int(x) for x in input().split()]
        convertToWave(A,N)
        for i in A:
            print(i,end=" ")
        print()
        T-=1

if __name__=="__main__":
    main()
}
''' This is a function problem. You only need to complete the function given below '''
#User function Template for python3

#Complete this function
def convertToWave(A,N):
    #Your code here
    # Rearrange in place so that A[0] >= A[1] <= A[2] >= A[3] ...
    for i in range(N-1):
        if i%2 == 0:
            # an even index must hold the larger element of the pair
            if A[i] < A[i+1]:
                A[i], A[i+1] = A[i+1], A[i]
        else:
            # an odd index must hold the smaller element of the pair
            if A[i] > A[i+1]:
                A[i], A[i+1] = A[i+1], A[i]

================================================
FILE: Flipkart/addTwoNumbers_LinkedListRep.py
================================================
{
#Initial Template for Python 3
#Contributed by : Nagendra Jha

import atexit
import io
import sys

_INPUT_LINES = sys.stdin.read().splitlines()
input = iter(_INPUT_LINES).__next__
_OUTPUT_BUFFER = io.StringIO()
sys.stdout = _OUTPUT_BUFFER

@atexit.register
def write():
    sys.__stdout__.write(_OUTPUT_BUFFER.getvalue())

# Node Class
class Node:
    def __init__(self, data):   # data -> value stored in node
        self.data = data
        self.next = None

# Linked List Class
class LinkedList:
    def __init__(self):
        self.head = None

    # creates a new node with given value and appends it at the end of the linked list
    def append(self, new_value):
        new_node = Node(new_value)
        if self.head is None:
            self.head = new_node
            return
        curr_node = self.head
        while curr_node.next is not None:
            curr_node = curr_node.next
        curr_node.next = new_node

# prints the elements of linked list starting with head
def printList(head):
    if head is None:
        print(' ')
        return
    curr_node = head
    while curr_node:
        print(curr_node.data,end=" ")
        curr_node=curr_node.next
    print(' ')

if __name__ == '__main__':
    t=int(input())
    for cases in range(t):
        n_a = int(input())
        a = LinkedList() # create a new linked list 'a'.
        nodes_a = list(map(int, input().strip().split()))
        nodes_a = nodes_a[::-1]  # reverse the input array
        for x in nodes_a:
            a.append(x)  # add to the end of the list

        n_b =int(input())
        b = LinkedList() # create a new linked list 'b'.
        nodes_b = list(map(int, input().strip().split()))
        nodes_b = nodes_b[::-1]  # reverse the input array
        for x in nodes_b:
            b.append(x)  # add to the end of the list

        result_head = addBoth(a.head,b.head)
        printList(result_head)
}
''' This is a function problem. You only need to complete the function given below '''
#User function Template for python3

'''
    Function to add two numbers represented
    in the form of the linked list.

    Function Arguments: head_a and head_b (heads of both the linked lists)
    Return Type: head of the resultant linked list.

    __>IMP : numbers are represented in reverse in the linked list.
    Ex:
        145 is represented as 5->4->1.

    resultant head is expected in the same format.

# Node Class
class Node:
    def __init__(self, data):   # data -> value stored in node
        self.data = data
        self.next = None
'''

def addBoth(head_a,head_b):
    #code here
    result = LinkedList()

    # read the digits of the first number (stored least-significant first)
    num1 = ""
    curr_node = head_a
    while curr_node is not None:
        num1 += str(curr_node.data)
        curr_node = curr_node.next
    num1 = num1[::-1]

    # read the digits of the second number the same way
    num2 = ""
    curr_node = head_b
    while curr_node is not None:
        num2 += str(curr_node.data)
        curr_node = curr_node.next
    num2 = num2[::-1]

    # add as integers, then store the digits of the sum in reverse order
    num = int(num1) + int(num2)
    num = str(num)[::-1]
    for i in num:
        result.append(int(i))  # keep node data as int, like the input lists
    return result.head

================================================
FILE: Flipkart/countOfInversionsArray.py
================================================
# GeeksForGeeks Code - Copied#
{
#Initial Template for Python 3
import atexit
import io
import sys

_INPUT_LINES = sys.stdin.read().splitlines()
input = iter(_INPUT_LINES).__next__
_OUTPUT_BUFFER = io.StringIO()
sys.stdout = _OUTPUT_BUFFER

@atexit.register
def write():
    sys.__stdout__.write(_OUTPUT_BUFFER.getvalue())

if __name__=='__main__':
    t = int(input())
    for tt in range(t):
        n = int(input())
        a = list(map(int, input().strip().split()))
        print(Inversion_Count(a,n))
}
''' This is a function problem. You only need to complete the function given below '''
#User function Template for python3

'''
    Your task is to return total number of inversions present in the array.
Function Arguments: array a and size n
Return Type: Integer
'''
def Inversion_Count(arr,n):
    # An already sorted array has no inversions
    if arr == sorted(arr):
        return 0
    temp_arr = [0]*n
    return mergesort(arr,temp_arr,0,n-1)

def mergesort(arr,temp_arr,left,right):
    # Inversions = inversions in left half + right half + across the halves
    inv_count = 0
    if left < right:
        mid = (left + right)//2
        inv_count = mergesort(arr,temp_arr,left,mid)
        inv_count += mergesort(arr,temp_arr,mid+1,right)
        inv_count += merge(arr,temp_arr,left,mid,right)
    return inv_count

def merge(arr,temp_arr,left, mid, right):
    # Merge the sorted halves arr[left..mid] and arr[mid+1..right]
    i = left     # Initial index of first subarray
    j = mid+1    # Initial index of second subarray
    k = left     # Initial index of merged subarray
    invcount = 0
    while i <= mid and j <= right:
        if arr[i] <= arr[j]:
            temp_arr[k] = arr[i]
            i += 1
        else:
            # arr[j] is smaller than every remaining element of the left half
            temp_arr[k] = arr[j]
            invcount += (mid - i + 1)
            j += 1
        k += 1
    # Copy the remaining elements of the left half, if there are any
    while i <= mid:
        temp_arr[k] = arr[i]
        i += 1
        k += 1
    # Copy the remaining elements of the right half, if there are any
    while j <= right:
        temp_arr[k] = arr[j]
        j += 1
        k += 1
    # Copy the merged range back into arr
    for lr in range(left, right + 1):
        arr[lr] = temp_arr[lr]
    return invcount

================================================
FILE: Flipkart/parenthesisChecker.py
================================================
{
#Initial Template for Python 3
import atexit
import io
import sys
#Contributed by : Nagendra Jha
_INPUT_LINES = sys.stdin.read().splitlines()
input = iter(_INPUT_LINES).__next__
_OUTPUT_BUFFER = io.StringIO()
sys.stdout = _OUTPUT_BUFFER

@atexit.register
def write():
    sys.__stdout__.write(_OUTPUT_BUFFER.getvalue())

if __name__ == '__main__':
    test_cases = int(input())
    for cases in range(test_cases) :
        #n = int(input())
        #n,k = map(int,input().strip().split())
        #a = list(map(int,input().strip().split()))
        s = str(input())
        if ispar(s):
            print("balanced")
        else:
            print("not balanced")
}
''' This is a function problem.You only need to complete the function given below '''
#User function Template for python3
'''
Function Arguments :
@param : s (given string containing
parenthesis)
@return : boolean True or False
'''
def isMatchingPair(c1,c2):
    # True only when c1 opens the bracket that c2 closes
    if c1 == '(' and c2 == ')':
        return True
    elif c1 == '{' and c2 == '}':
        return True
    elif c1 == '[' and c2 == ']':
        return True
    else:
        return False

def ispar(s):
    # code here
    import queue
    stack = queue.LifoQueue()
    for i in range(len(s)):
        # Push every opening bracket
        if s[i] == '{' or s[i] == '[' or s[i] == '(':
            stack.put(s[i])
        # A closing bracket must match the most recent opening bracket
        if s[i] == '}' or s[i] == ']' or s[i] == ')':
            if stack.empty():
                return False
            elif not isMatchingPair(stack.get(),s[i]):
                return False
    # Balanced only if no opening bracket is left unmatched
    if stack.empty():
        return True
    else:
        return False

================================================
FILE: Fractal_Analytics/Arrays.md
================================================
Given an array A of size N as an input, return the following:
+ for every element of the array, find the sum of all integers from 0 up to that element (inclusive):

Examples:
```
A = [3,5,6,9,10], N = 5
for 3, sum of elements up to 3 is 0 + 1 + 2 + 3 = 6.
for 5, sum of elements up to 5 is 0 + 1 + 2 + 3 + 4 + 5 = 15.
for 6, sum of elements up to 6 is 0 + 1 + 2 + 3 + 4 + 5 + 6 = 21.
for 9, sum of elements up to 9 is 0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 = 45.
for 10, sum of elements up to 10 is 0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55.
```

================================================
FILE: Fractal_Analytics/Comparator.md
================================================
1. Write a comparator class with overloaded methods that perform the operations below
   - a. `compare(int a,int b)` that takes two integers `a` and `b` and returns `True` if `a == b` else `False`.
   - b. `compare(string a,string b)` that takes two strings `a` and `b` as parameters and returns `True` if `a == b` else `False`.
   - c. `compare(int a[],int b[])` that takes two 1-Dimensional Arrays `a` and `b` as parameters and checks the following conditions
     - the length of `a` is equal to the length of `b`.
     - for every index `i` of `a`, `a[i] == b[i]`.

     It returns `True` only if both conditions hold, otherwise it returns `False`.
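
Python has no signature-based method overloading, so the three `compare` variants above have to be emulated. The following is a minimal sketch, assuming Python 3.8+ and using `functools.singledispatchmethod`, which picks an implementation from the type of the first argument only. The class name `Comparator` and the choice of `list` for the 1-dimensional arrays are assumptions made for illustration, not part of the original question.

```python
from functools import singledispatchmethod


class Comparator:
    """Hypothetical class; dispatch is on the type of the first argument `a`."""

    @singledispatchmethod
    def compare(self, a, b):
        # Fallback for types the question does not mention
        raise TypeError(f"unsupported type: {type(a).__name__}")

    @compare.register(int)
    def _(self, a, b):
        # Case a: two integers
        return a == b

    @compare.register(str)
    def _(self, a, b):
        # Case b: two strings
        return a == b

    @compare.register(list)
    def _(self, a, b):
        # Case c: same length and equal at every index
        if len(a) != len(b):
            return False
        return all(x == y for x, y in zip(a, b))


if __name__ == '__main__':
    c = Comparator()
    print(c.compare(3, 3))
    print(c.compare("abc", "abd"))
    print(c.compare([1, 2, 3], [1, 2, 3]))
```

Because only the first argument drives the dispatch, a call such as `compare(1, "1")` falls into the integer branch and simply returns `False`. A plain `isinstance` chain would work equally well if that is easier to explain in an interview.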
================================================ FILE: Fractal_Analytics/README.md ================================================ # Interview Process for Analytics Consultant - 2019: ### First round: + Hackerrank Test (Python + SQL) or (R + SQL) test (4 python/R + 4 sql questions)(python questions are normal coding questions). ### Second round: + Questions on projects from CV/Machine learning algorithms for time series,classification usecases and case study(How would you approach the problem of maximizing a variable from model results)? # Interview Process for Analytics Consultant - 2020: ### First round: + DoSelect Test (Python + SQL) or (R + SQL) test (4 python/R + 4 sql questions) (Python questions are not coding questions but questions on pandas and numpy coding). # Interview Process for Senior Big Data Engineer - 2021: ### First round: + DoSelect Test (Coding Test). ### Second round: + Technical questions on Hive, Spark (Accumulator and Broadcast variables usecases), real world examples, and testing coding skills - Data structures and algorithms (Time complexity and Space complexity analysis). + Focus on Spark application performance tuning techniques. + In-depth SQL questions - solve using either SQL query or dataframes/datasets API in spark. ### Third round: + Interview with the Director/VP on background, questions to check if you are the right fit for the role, etc. ### Fourth round: + HR round - process improvements, challenges faced in the past etc. 
================================================ FILE: Fractal_Analytics/countChampNumbers.py ================================================ """ find the count of champ numbers in given range [X,Y] (inclusive) Champ numbers are numbers which have all the digits as unique and no digit should be greater than 5 """ def countChampNumbers(X,Y): count = 0 for i in range(X,Y+1): if len(str(i)) == 1 and i <= 5: print (i) count += 1 else: if len(set(str(i))) == len(str(i)): if not (('6' in str(i)) or ('7' in str(i)) or ('8' in str(i)) or ('9' in str(i))): print(i) count += 1 return count if __name__ == '__main__': T = int(input()) for tc in range(T): X,Y = list(map(int, input().strip().split())) print (countChampNumbers(X,Y)) ================================================ FILE: Fractal_Analytics/countOfAnagrams.py ================================================ """ Given a text and a word, find the count of occurrences of anagrams of word in given text """ def countAnagrams(text,word): count=0 b = text a = word for i in range(len(b)-len(a)+1): if(sorted(a)==sorted(b[i:i+len(a)])): count=count+1 return count if __name__ == '__main__': text = str(input()) word = str(input()) print (countAnagrams(text, word)) ================================================ FILE: Fractal_Analytics/numberOfGroups.py ================================================ """ A pole of the magnet is represented as "01" or "10" 1 = North pole and 0 = South pole we know the like poles of magnet repel each other but unlike poles attract. Suppose a magnet has pole "10" and adjacent magnet also has "10" they attract if a magnet "01" is located adjacent to magnet "10" they repel. 
Given a list of poles of different magnets, find the number of the separate groups that magnets form into example: number of magnets = 3 01 01 10 number of groups = 2 because first 2 magnets attract each other so will be in the same group where as the third one repels second one so will be in different group """ def numGroups(poles): groups = [poles[0]] for i in range(1,len(poles)): if poles[i][0] == groups[-1][0]: groups.append(poles[i]) else: groups.append(' ') groups.append(poles[i]) concat_groups = ''.join(groups) print (concat_groups) return len(concat_groups.split(' ')) if __name__ == '__main__': n = int(input()) poles = [] for tc in range(n): pole = input() poles.append(pole) print (numGroups(poles)) ================================================ FILE: Fractal_Analytics/sorting_words.md ================================================ Given a sentence, sort the words of a sentence according to the their individual lengths. If one or more words have same length then sort the words by first character of the word. ``` Example: Input: The quick brown fox jumps over the lazy dog Output: dog fox the The lazy over brown jumps quick ``` + This problem can be extended to sorting the words by second letter in case of collision with first letter and continue till the end of word and stop when collision is resolved. ================================================ FILE: Fractal_Analytics/substrings.md ================================================ 2. Get the number of distinct substrings possible for a given string, you can use the below operations: - a. Remove 0 or more characters from left side of the string. - b. Remove 0 or more characters from right side of string. - c. Remove 0 or more characters from both left and right side of string. 
Examples:
```
Input  : str = "ababa"
Output : 10
The total number of distinct substrings is 10, namely
"", "a", "b", "ab", "ba", "aba", "bab", "abab", "baba" and "ababa"
```
```
Input  : abcd
Output : abcd abc ab a bcd bc b cd c d
All elements are distinct
```
```
Input  : aaa
Output : aaa aa a aa a a
Not all elements are distinct
```

================================================
FILE: Fre8wise/README.md
================================================
# Process for Software engineer role:

## First round:
1. The first round is with the founder of the company.
   - This is an exploratory call about the company's profile, your background, your projects, why you are looking for a change, and CTC details.
   - A demo of what the company is doing is also included in the Zoom call.

## Second round:
1. Technical interview with one of the technical leads.
2. Questions from the CV, work experience, challenges in projects you have undertaken.
3. Design a queueing system that serves multiple agents looking for different datasets being produced by different producers/systems.
4. Memory management in data engineering pipelines.
5. Questions on lambda architecture.
6. Questions on the nitty-gritty details of how you executed projects in production.
7. Focus on system design and understanding.
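
For the queueing-system question in the second round, one possible starting point for the discussion is a broker that keeps one FIFO queue per dataset, so producers and agents only need to agree on a dataset name. The sketch below is an in-memory illustration only; the names `DatasetBroker`, `publish` and `consume` are hypothetical, and a real design would also have to cover persistence, acknowledgements and backpressure.

```python
from collections import defaultdict, deque
from threading import Lock


class DatasetBroker:
    """Hypothetical in-memory broker: one FIFO queue per dataset name."""

    def __init__(self):
        self._queues = defaultdict(deque)
        self._lock = Lock()  # guards the queues when producers/agents are threads

    def publish(self, dataset, record):
        # Called by a producer: append the record to that dataset's queue
        with self._lock:
            self._queues[dataset].append(record)

    def consume(self, dataset):
        # Called by an agent: oldest pending record, or None if nothing is queued
        with self._lock:
            pending = self._queues.get(dataset)
            if not pending:
                return None
            return pending.popleft()


if __name__ == '__main__':
    broker = DatasetBroker()
    broker.publish("orders", {"id": 1})
    broker.publish("shipments", {"id": 7})
    print(broker.consume("orders"))
    print(broker.consume("orders"))
```

Keying the queues by dataset keeps agents that want different datasets from blocking each other, which is usually the first point an interviewer probes.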
================================================ FILE: Fre8wise/manipulate_string.py ================================================ """ Solve the following algorithm problem: Given a string, develop a solution that removes only 1's in the beginning Examples: 1198fgh098nm -> 98fgh098nm a1198fgh098nm -> a1198fgh098nm a1198fgh098nm11 -> a1198fgh098nm11 """ s = "1198fgh098nm" def changeString(s): try: while (s.index("1") == 0): slist = list(s) temp = slist.pop(0) s = "".join(slist) except: pass return s if __name__ == '__main__': s = input() print (changeString(s)) ================================================ FILE: Goldman-Sachs/convertArrayToWave.py ================================================ { #Initial Template for Python 3 import math def main(): T=int(input()) while(T>0): N=int(input()) A=[int(x) for x in input().split()] convertToWave(A,N) for i in A: print(i,end=" ") print() T-=1 if __name__=="__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 #Complete this function def convertToWave(A,N): #Your code here temp=0 for i in range(N-1): if i%2 == 0: if A[i] < A[i+1]: temp=A[i] A[i]=A[i+1] A[i+1]=temp else: continue else: if A[i] > A[i+1]: temp=A[i] A[i]=A[i+1] A[i+1]=temp else: continue ================================================ FILE: Goldman-Sachs/numberOfSquares_in_NbyN_CheesBoard.py ================================================ def num_of_squares(n): if n == 1: return (1) else: nSquares = 0 for I in range(1,n+1): nSquares = nSquares + (I*I) return (nSquares) if __name__ == '__main__': T = int(input()) for tcase in range(T): n = int(input()) print (num_of_squares(n)) ================================================ FILE: Goldman-Sachs/printNumbersContain123.py ================================================ def contains123(arr): c = {'1','2','3'} tracker = [] arr = sorted(arr) for i in arr: a = set(str(i)) if a.issubset(c): print (i,end=" ") 
tracker.append(1) else: tracker.append(0) if len(set(tracker))==1 and 0 in tracker: print (-1,end="") if __name__ == '__main__': t = int(input()) for tcase in range(t): n = int(input()) arr = list(map(int,input().strip().split())) contains123(arr) print ('\n',end="") ================================================ FILE: Goldman-Sachs/repeatingCharacter_LeftmostOccurrence.py ================================================ { #Initial Template for Python 3 import atexit import io import sys _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) if __name__=='__main__': t = int(input()) for i in range(t): s=str(input()) index=repeatingCharacter(s) if(index==-1): print(-1) else: print(s[index]) } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Your task is to return the lefmost index of the repeating character whose first appereance is left most or return -1 if all characters are distinct. 
Function Arguments: s (given string) Return Type: integer ''' def repeatingCharacter(s): #code here import collections freqs = collections.Counter(s) if len(set(freqs.values())) == 1 and 1 in set(freqs.values()): return -1 else: inds = [] for k,v in freqs.items(): if v > 1: inds.append(s.index(k)) return min(inds) ================================================ FILE: Google/First_Recurring_Character_In_String.py ================================================ from collections import Counter input_string = input() def firstRecurringChar(input_string): recurring_chars= [] count_dict = Counter(input_string) for x, y in count_dict.items(): if y > 1: recurring_chars.append(x) if recurring_chars: return recurring_chars[0] else: return (-1) if __name__ == '__main__': print (firstRecurringChar(input_string)) ================================================ FILE: Google/allocateMinimumPages.py ================================================ import math def isValid(arr,n,k,mi): std = 1 curr = 0 for i in range(n): if curr + arr[i] > mi: curr = arr[i] std += 1 if std > k: return False else: curr += arr[i] return True def allocMinPages(arr,n,k): if k > n: return -1 s,totalpage = 0,0 for i in range(n): totalpage += arr[i] s = max(s,arr[i]) e = totalpage finalAns = s while s <= e: mid = math.floor((s+e)/2) if isValid(arr,n,k,mid): finalAns = mid e = mid - 1 else: s = mid + 1 return finalAns if __name__ == '__main__': T = int(input()) for tcase in range(T): N = int(input()) arr = list(map(int,input().strip().split())) M = int(input()) print (allocMinPages(arr,N,M)) ================================================ FILE: Google/checkPairsWithGivenSum.py ================================================ """ This problem was recently asked by Google. Given a list of numbers and a number k, return whether any two numbers from the list add up to k. For example, given [10, 15, 3, 7] and k of 17, return true since 10 + 7 is 17. 
""" def isPairWithGivenSum(arr,n,x): left,right = 0,n-1 arr = sorted(arr) while left < right: if ((arr[left] + arr[right]) < x): left += 1 elif (arr[left] + arr[right] == x): return True elif (arr[left] + arr[right] > x): right -= 1 return False if __name__ == '__main__': T = int(input()) for tcs in range(T): arr = list(map(int,input().strip().split())) n = len(arr) x = int(input()) print (isPairWithGivenSum(arr,n,x)) ================================================ FILE: Google/maxIndexDiffOfArray.py ================================================ { #Initial Template for Python 3 import math def main(): T=int(input()) while(T>0): n=int(input()) arr=[int(x) for x in input().strip().split()] print(maxIndexDiff(arr,n)) T-=1 if __name__ == "__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 #Complete this function def maxIndexDiff(arr, n): ##Your code here maxxDiff = 0 for i in range(n): for j in range(i+1,n): if arr[i]<=arr[j]: if maxxDiff < j - i: maxxDiff = j - i return maxxDiff ================================================ FILE: Grofers/QuickSort.py ================================================ { #Initial Template for Python 3 if __name__ == "__main__": t=int(input()) for i in range(t): n=int(input()) arr=list(map(int,input().split())) quickSort(arr,0,n-1) for i in range(n): print(arr[i],end=" ") print() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 def quickSort(arr,low,high): if low < high: # pi is partitioning index, arr[p] is now # at right place pi = partition(arr,low,high) # Separately sort elements before # partition and after partition quickSort(arr, low, pi-1) quickSort(arr, pi+1, high) def partition(arr,low,high): #add code here tmp = 0 pivot = arr[high] i = low - 1 for j in range(low,high): if arr[j] <= pivot: i += 1 tmp = arr[i] arr[i] = arr[j] arr[j] = tmp tmp = arr[i+1] 
arr[i+1] = arr[high] arr[high] = tmp return i+1 ================================================ FILE: Guardant-Health/README.md ================================================ # Hackerrank Test (Personal Experience) ================================================ FILE: Guardant-Health/fallen_leaves.py ================================================ """ Coding Test - Hackerrank Given an array that has number of leaves on N trees (size of array = N)(arr[i] represents number of leaves for ith tree), percentage, array of days (can be in any order), starting and ending arrays Scenario: every day from each tree, given percentage of leaves will be fallen. Task: There are q queries,each query has day[q].starting[q] and ending[q] find out for each query how many leaves are fallen in total from all the trees with given start and end indices. Example: arr = [10,20,30,20,10] percentage = 30 days = [1,1,2] starting = [2,1,1] ending = [4,3,4] after first day number of fallen leaves in the given range of starting[0] and ending[0] is 6 + 9 + 6 = 21 (required answer) Remaining leaves after first day = [7,14,21,14,7] In this way answer all the queries """ ================================================ FILE: Guardant-Health/maintainMinimumStartingNumber.py ================================================ """ Coding Test - Hackerrank what is the minimum starting number to maintain such that the running sum will always be atleast 1 Example: arr = [3,-6,5,-2,1] 4 is the mininum number we need to start with Let's check: 4 + 3 = 7 7 + (-6) = 1 1 + 5 = 6 6 + (-2) = 4 4 + 1 = 5 so it's 4. 
arr = [-4,3,-2,1]
5 is the minimum number we need to start with
Let's check:
5 + (-4) = 1
1 + 3 = 4
4 + (-2) = 2
2 + 1 = 3
so it's 5
"""
def minStartNumber(arr):
    # After each element, the start value must be at least 1 - (prefix sum so far);
    # the answer is the largest of these lower bounds.
    bounds = []
    total = 1
    for el in arr:
        total += (-1 * el)
        bounds.append(total)
    return max(bounds)

if __name__ == '__main__':
    arr1 = [3,-6,5,-2,1]
    print (minStartNumber(arr1))
    arr2 = [-4,3,-2,1]
    print (minStartNumber(arr2))

================================================
FILE: HighRadius-Technologies/README.md
================================================
## Interview questions (Personal Experience):

#### 1. A confusion matrix for a classification algorithm is shown; what are the precision and recall values from the matrix?
+ **What is precision**: `Number of True Positives / (Total number of True Positives and False Positives)`
+ **What is Recall** : `Number of True Positives / (Total number of True Positives and False Negatives)`

#### 2. How do you deal with overfitting in your machine learning model?
+ Regularization - (LASSO or Ridge Regression) (L1 and L2 regularization).
+ K-Fold cross validation with variable K.
+ Resampling of Train and Test splits of a dataset, sometimes involving an out-of-time validation dataset.
+ Dimensionality Reduction in case of many features in a dataset.
+ Ensemble Learning.

#### 3. How do you explain p-value to a layman? - Hypothesis testing.
+ The p-value is the metric by which we decide which variables are statistically significant.
+ It's a measure of how extreme an observed value is under the assumed null hypothesis: the smaller it is, the more extreme the observation. We can define the p-value as the smallest significance level at which the null hypothesis would be rejected.
+ As the p-value gets smaller, we start wondering whether the null hypothesis really is true, and whether we should change our minds (and reject the null hypothesis).

#### 4. How do you deal with concurrent predictions from decision Trees in ensemble algorithms like random forest?
+ Do some google research on this, there are multiple techniques to deal with this. ================================================ FILE: Hike/QuickSort.py ================================================ { #Initial Template for Python 3 if __name__ == "__main__": t=int(input()) for i in range(t): n=int(input()) arr=list(map(int,input().split())) quickSort(arr,0,n-1) for i in range(n): print(arr[i],end=" ") print() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 def quickSort(arr,low,high): if low < high: # pi is partitioning index, arr[p] is now # at right place pi = partition(arr,low,high) # Separately sort elements before # partition and after partition quickSort(arr, low, pi-1) quickSort(arr, pi+1, high) def partition(arr,low,high): #add code here tmp = 0 pivot = arr[high] i = low - 1 for j in range(low,high): if arr[j] <= pivot: i += 1 tmp = arr[i] arr[i] = arr[j] arr[j] = tmp tmp = arr[i+1] arr[i+1] = arr[high] arr[high] = tmp return i+1 ================================================ FILE: IQLECT/LargestPrimeFromSubsetSum.py ================================================ # Q)Find largest possible prime number that can be generated by adding all the elements of any subset of the given array. # Return -1 if it is impposible. 
# Implement the method largestPrime() below
# ------------------------------
# Constraints:
# Length of array < 16
# Each element in array is zero or a positive integer < 10000
# ------------------------------
# Example
# Input:
# [3, 3, 2, 2]
# Output:
# 7
# Input:
# [0, 2, 4, 1, 2, 4, 4]
# Output:
# 17
# Input:
# [6]
# Output:
# -1

# Implement this method to return the expected output
import itertools

def isPrimeNumber(num):
    # Trial division by every d with d*d <= num; 0 and 1 are not prime
    if num < 2:
        return False
    d = 2
    while d * d <= num:
        if num % d == 0:
            return False
        d += 1
    return True

def largestPrime(a):
    allPrimes = []
    # Enumerate every non-empty subset (fewer than 2**16 for the given constraints)
    for size in range(1,len(a)+1):
        for subset in itertools.combinations(a,size):
            req_num = sum(subset)
            if isPrimeNumber(req_num):
                allPrimes.append(req_num)
    if allPrimes:
        return max(allPrimes)
    else:
        return -1

# No need to change below this
# No need to change below this
# No need to change below this
# No need to change below this
# No need to change below this
#################################################################################################################
#################################################################################################################
#################################################################################################################
#################################################################################################################
#################################################################################################################
#################################################################################################################
#################################################################################################################
#################################################################################################################
################################################################################################################# # No need to change below this import traceback def testLargestPrime(): data = [ [3, 3, 2, 2], [0, 2, 4, 1, 2, 4, 4], [6], [87, 47, 33], [772, 312, 421, 706, 258, 583], [854, 964, 59, 747, 753, 511, 40, 340, 703], [2, 9], [7, 6, 5, 2, 2, 5, 0, 0], [36, 81, 37, 89, 36, 77, 89, 33, 36, 45], [0,3,49] ] output = [7, 17, -1, 167, 1697, 4931, 11, 23, 523, 3] accepted = 0 for i in range(len(data)): try: print ("Test case: " + str(i)) print ("Input:") print (str(data[i])) print ("Expected output:") print (output[i]) print ("Actual Output:") ans = largestPrime(data[i]) print (ans) if(ans == output[i]): accepted +=1 except: print(traceback.format_exc()) print ("\n\n") print ("Verdict:") if(accepted == len(output)): print ("------All test passed-------") else: print ("-----Test Failed: " + str(accepted) + " tests passed out of " + str(len(output)) + "-----") testLargestPrime() ================================================ FILE: IQLECT/README.md ================================================ # Asked to solve these questions in-person (In office) (No platform) ================================================ FILE: InMobi/README.md ================================================ # Interview Process for Big Data Engineer 2020 (Personal): 1. First round of interview with onshore engineering manager from U.S.A. 2. Questions about your work experience on distributed systems, spark and data engineering pipelines. 3. what is it that you do as a part of your daily routine at work? 4. Tell me your understanding about hadoop? 5. What kind of challenges did you face in your projects? and How did you resolve them? 6. How do you handle memory leaks/failure of jobs due to memory errors? 7. Explain your understanding of hive internal table vs external table? 8. How did you optimize complex queries in hive? - partitions, buckets etc 9. 
How do you handle spark applications that are taking too long (certain task is leading to memory leaks)? ================================================ FILE: Infrrd/problem01/maximumRowsWithAll1s.py ================================================ import collections def maximumRows(binary_matrix,K): N = len(binary_matrix) M = len(binary_matrix[0]) numOf1s = [collections.Counter(x)[1] for x in binary_matrix] #Number of flips <= K maximum1s = max(numOf1s) #numOfRows_Maximum1s = collections.Counter(numOf1s)[maximum1s] total = [] for i in range(len(binary_matrix)): xyPairs1 = [] for j in range(M): if binary_matrix[i][j] == 1: xyPairs1.append(j) total.append(xyPairs1) maximum1sRows = [k for k,v in enumerate(numOf1s) if v == maximum1s] #maxmimize first Row finalCountList = [] for e in maximum1sRows: row = set(total[e]) UniversalSet = set(range(M)) needInc = UniversalSet-row checkList = sorted(list(set(range(N)) - {e})) a = checkList[0] b = checkList[len(checkList)-1] countList = [] for u in total[a:b+1]: count = 0 if needInc & set(u): count += len(needInc & set(u)) countList.append(count) finalCountList.append(countList) return (len(finalCountList)//K) if __name__ == '__main__': N,M,K = list(map(int,input().strip().split())) binary_matrix = [] for I in range(N): row = list(map(int,input().strip().split())) binary_matrix.append(row) print (maximumRows(binary_matrix,K)) ================================================ FILE: Infrrd/problem02/countMe.py ================================================ def countMe(arrA,arrL,arrR,arrX): resultCount = [] for b in range(len(arrX)): targetX = arrX[b] p = arrL[b] q = arrR[b] resultCount.append(countOfNums(arrA[p-1:q],targetX)) return (" ".join(str(i) for i in resultCount)) def countOfNums(array,target): C = 0 for x in array: if target%x == 0: C += 1 return (C) if __name__ == '__main__': sizeOfA = int(input()) arrA = list(map(int,input().strip().split())) numQ = int(input()) arrL = list(map(int,input().strip().split())) arrR = 
list(map(int,input().strip().split())) arrX = list(map(int,input().strip().split())) print (countMe(arrA,arrL,arrR,arrX)) ================================================ FILE: Instabase/Add2BinaryStrings.py ================================================ """ Add two binary strings and get the resultant Hint: Try it without converting individual entities into numerical equivalents Solution: Refer: https://www.geeksforgeeks.org/program-to-add-two-binary-strings/ """ ================================================ FILE: Instabase/README.md ================================================ # Coding questions on Enthire (Personal Experience) ================================================ FILE: Instabase/checkPairsWithGivenSum.py ================================================ """ Daily Coding Problem #1 This problem was recently asked by Google. Given a list of numbers and a number k, return whether any two numbers from the list add up to k. For example, given [10, 15, 3, 7] and k of 17, return true since 10 + 7 is 17. 
""" def isPairWithGivenSum(arr,n,x): left,right = 0,n-1 arr = sorted(arr) while left < right: if ((arr[left] + arr[right]) < x): left += 1 elif (arr[left] + arr[right] == x): return True elif (arr[left] + arr[right] > x): right -= 1 return False if __name__ == '__main__': T = int(input()) for tcs in range(T): arr = list(map(int,input().strip().split())) n = len(arr) x = int(input()) print (isPairWithGivenSum(arr,n,x)) ================================================ FILE: Intuit/BuySellStock.py ================================================ { #Initial Template for Python 3 import math def main(): T=int(input()) while(T>0): n=int(input()) arr=[int(x) for x in input().strip().split()] stockBuySell(arr,n) print() T-=1 if __name__ == "__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 #Complete this function def stockBuySell(A,n): if A == sorted(A): print ("(" + str(0) + " " + str(n-1) + ")",end=" ") elif A == sorted(A,reverse=True): print ("No Profit",end = " ") else: local_min = [] local_max = [] for i,v in enumerate(A): if i == 0: if v < A[i+1]: local_min.append(i) elif i == n-1: if v > A[i-1]: local_max.append(i) else: if A[i-1] <= v and v >= A[i+1]: local_max.append(i) if A[i-1] >= v and v <= A[i+1]: local_min.append(i) if len(local_max) == len(local_min): for i in range(len(local_max)): x = " ".join([str(local_min[i]),str(local_max[i])]) print ("("+x+")",end=" ") else: x = " ".join([str(max(local_min)),str(max(local_max))]) print ("("+x+")",end = " ") ================================================ FILE: Intuit/binaryArraySort.py ================================================ { #Initial Template for Python 3 import math def main(): T=int(input()) while(T>0): N=int(input()) A=list(map(int,input().split())) binSort(A,N) for i in A: print(i,end=" ") print() T-=1 if __name__ == "__main__": main() } ''' This is a function problem.You only need to complete the function given 
below ''' #User function Template for python3 ##Complete this function def binSort(arr, n): #Your code here ''' No need to print the array ''' c0 = arr.count(0) c1 = arr.count(1) for i in range(c0): arr[i] = 0 for i in range(c0,c0+c1): arr[i] = 1 ================================================ FILE: Kritikal-Solutions/Delete_Without_Head_Pointer.py ================================================ { #Initial Template for Python 3 #Contributed by : Nagendra Jha import atexit import io import sys _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None # Linked List Class class LinkedList: def __init__(self): self.head = None # creates a new node with given value and appends it at the end of the linked list def append(self, new_value): new_node = Node(new_value) if self.head is None: self.head = new_node return curr_node = self.head while curr_node.next is not None: curr_node = curr_node.next curr_node.next = new_node def getNode(self,value): # return node with given value, if not present return None curr_node=self.head while(curr_node.next and curr_node.data != value): curr_node=curr_node.next if(curr_node.data==value): return curr_node else: return None # prints the elements of linked list starting with head def printList(self): if self.head is None: print(' ') return curr_node = self.head while curr_node: print(curr_node.data,end=" ") curr_node=curr_node.next print(' ') if __name__ == '__main__': t=int(input()) for cases in range(t): n = int(input()) a = LinkedList() # create a new linked list 'a'. 
        nodes = list(map(int, input().strip().split()))
        for x in nodes:
            a.append(x)
        del_elem = int(input())
        to_delete=a.getNode(del_elem)
        deleteNode(to_delete)
        a.printList()
}

''' This is a function problem.You only need to complete the function given below '''
#User function Template for python3

'''
    Your task is to delete the given node from
    the linked list, without using head pointer.

    Function Arguments: node (given node to be deleted)
    Return Type: None, just delete the given node from the linked list.

    {
        # Node Class
        class Node:
            def __init__(self, data):   # data -> value stored in node
                self.data = data
                self.next = None
    }
    Contributed By: Nagendra Jha
'''
def deleteNode(curr_node):
    # The head pointer is not available, so the previous node cannot be reached.
    # Instead, copy the next node's data into this node and unlink the next node.
    # This cannot remove the tail node; the usual statement of this problem
    # assumes the node to delete is never the tail.
    if curr_node is None or curr_node.next is None:
        return
    curr_node.data = curr_node.next.data
    curr_node.next = curr_node.next.next

================================================
FILE: LeadSquared/README.md
================================================
# Coding Questions on the Mettl platform (Personal Experience)

================================================
FILE: LeadSquared/UniqueWaysToClimbStaircase.py
================================================
"""
This question was asked in an interview.
There are n stairs, and a person standing at the bottom wants to reach the top.
The person can climb either 1 stair or 2 stairs at a time.
Count the number of ways the person can reach the top.
Example:
    number of stairs = 5
    steps allowed = 1 or 2
"""
def countWays(n,m):
    """
    Solution to above problem:
    Pros: It considers all possible step sizes from 1 to the given m.
    Cons: No control by external user (if user wants to give only certain
    step values, say in the form of an array).
""" res = [None] * (n+1) temp = 0 res[0] = 1 for i in range(1,n+1): s = i - m - 1 e = i - 1 if s >= 0: temp -= res[s] temp += res[e] res[i] = temp print (res) return res[n] if __name__ == '__main__': n,m = 5,2 print (countWays(n,m)) n,m = 5,3 print (countWays(5,3)) def countWays1(n,args): """ Solution that mitigates the cons of above solution Give allowed step values in the form of an array (here it is args) """ m = len(args) count = [0 for i in range(n + 1)] # base case count[0] = 1 # Count ways for all values up # to 'N' and store the result for i in range(1, n + 1): for j in range(m): # if i >= arr[j] then # accumulate count for value 'i' as # ways to form value 'i-arr[j]' if (i >= args[j]): count[i] += count[i - args[j]] # required number of ways return count[n] if __name__ == '__main__': print (countWays1(5,[2,3])) print (countWays1(5,[1,3,4])) print (countWays1(4,[1,3,4])) print (countWays1(10,[1,3,4])) print (countWays1(6,[3,2])) print (countWays1(5,[1,2,3])) print (countWays1(4,[1])) print (countWays1(4,[2])) print (countWays1(6,[3])) ================================================ FILE: LeadSquared/maximumSumLikeTimeCoefficients.py ================================================ """ Coding Test - Round 1 Given an array of N integers (negative integers allowed) which say likings of N foods by a customer Time taken to prepare a food arr[i] is i like-time co-efficient of food is defined as i*arr[i]. Sum of like-time co-efficients can be obtained by doing the add up of i*arr[i] Find the maximum sum of like-time co-efficients that can be achieved from a given array. Hint: Consider all combinations of likings from a given array. 
Here is the idea:
"""
import itertools

def maxSumLikeTimeCoeff(arr,N):
    all_sums = []
    # Try every non-empty selection of dishes, kept in their original order
    for r in range(1,N+1):
        combinations = set(itertools.combinations(arr,r))
        for x in combinations:
            sums = 0
            for i in range(len(x)):
                sums += (i+1) * x[i]
            all_sums.append(sums)
    return max(all_sums)

if __name__ == '__main__':
    N = int(input())
    arr = list(map(int,input().strip().split()))
    print (maxSumLikeTimeCoeff(arr,N))

================================================
FILE: LeadSquared/totalDistanceByStreetLights.py
================================================
"""
Coding Test - Round 1
Given N street lights, and an array of tuples which signify the start
and end distances of street covered by that street light.
Find the total distance covered by all the street lights

Ex:1
N = 1
arr = [(5,10)]
Number of street lights = 1 and distance covered by the street light is 5m to 10m
which is 10-5 = 5m

Ex:2
N = 2
arr = [(5,10),(8,12)]
Number of street lights = 2
street light 0 = 10 - 5 = 5
street light 1 = 12 - 8 = 4
common region = 10 - 8 = 2
total distance = (5 + 4 - 2) = 7
"""
def total_distance(intervals):
    if not intervals:
        return 0
    # Sort by start, then sweep left to right, counting only the part of each
    # interval that extends past the furthest point covered so far. This handles
    # partially overlapping and fully nested intervals alike.
    # Each tuple is assumed to be (start, end) with start <= end.
    intervals = sorted(intervals,key=lambda x: x[0])
    result = 0
    covered_until = intervals[0][0]
    for start,end in intervals:
        if end > covered_until:
            result += end - max(start,covered_until)
            covered_until = end
    return result

if __name__ == '__main__':
    arr1 = [(5,10)]
    print (total_distance(arr1))
    arr2 = [(5,10),(8,12)]
    print (total_distance(arr2))
    #subset testcase
    arr3 = [(2,9),(3,6)]
    print (total_distance(arr3))

================================================
FILE: MAQ_Software/Closet0s1s2s.py
================================================
{
#Initial Template for Python 3
import atexit
import io
import sys

_INPUT_LINES = sys.stdin.read().splitlines()
input = iter(_INPUT_LINES).__next__
_OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) if __name__=='__main__': t = int(input()) for i in range(t): n=int(input()) a=list(map(int,input().strip().split())) segragate012(a,n) print(*a) } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Your task to is sort the array a of 0s,1s and 2s of size n. You dont need to return anything.''' def segragate012(a,n): #code here c0 = a.count(0) c1 = a.count(1) c2 = a.count(2) for i in range(c0): a[i] = 0 for i in range(c0,c0+c1): a[i] = 1 for i in range(c0+c1,c0+c1+c2): a[i] = 2 #print (" ".join(str(i) for i in a),end="") ================================================ FILE: MakeMyTrip/rotateLinkedListByKelements.py ================================================ { class Node: def __init__(self, data): self.data = data self.next = None class LinkedList: def __init__(self): self.head = None def push(self, new_data): new_node = Node(new_data) new_node.next = self.head self.head = new_node def printList(self): temp = self.head while(temp): print(temp.data, end=" ") # arr.append(str(temp.data)) temp = temp.next print("") if __name__ == '__main__': start = LinkedList() t = int(input()) while(t > 0): llist = LinkedList() n = int(input()) values = list(map(int, input().strip().split())) for i in reversed(values): llist.push(i) k = int(input()) new_head = rotateList(llist.head, k) llist.head = new_head llist.printList() t -= 1 # Contributed by: Harshit Sidhwa } ''' This is a function problem.You only need to complete the function given below ''' # Your task is to complete this function ''' class Node: def __init__(self, data): self.data = data self.next = None ''' # This function should rotate list counter- # clockwise by k and return new head (if changed) def rotateList(head, k): # code here global llist llist = LinkedList() C = 0 curr_node = head part1 = [] part2 
= [] while curr_node != None: if C <= k-1: part1.append(curr_node.data) elif C > k-1: part2.append(curr_node.data) C += 1 curr_node = curr_node.next total = reversed(part2 + part1) for i in total: llist.push(i) return llist.head ================================================ FILE: MakeMyTrip/sortedLinkedList012s.py ================================================ { #Initial Template for Python 3 #Contributed by : Nagendra Jha import atexit import io import sys _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None # Linked List Class class LinkedList: def __init__(self): self.head = None # creates a new node with given value and appends it at the end of the linked list def append(self, new_value): new_node = Node(new_value) if self.head is None: self.head = new_node return curr_node = self.head while curr_node.next is not None: curr_node = curr_node.next curr_node.next = new_node # prints the elements of linked list starting with head def printList(head): if head is None: print(' ') return curr_node = head while curr_node: print(curr_node.data,end=" ") curr_node=curr_node.next print(' ') if __name__ == '__main__': t=int(input()) for cases in range(t): n = int(input()) a = LinkedList() # create a new linked list 'a'. nodes_a = list(map(int, input().strip().split())) for x in nodes_a: a.append(x) # add to the end of the list printList(segregate(a.head)) } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Your task is to segregate the list of 0s,1s and 2s. Function Arguments: head of the original list. Return Type: head of the new list formed. 
    {
        # Node Class
        class Node:
            def __init__(self, data):   # data -> value stored in node
                self.data = data
                self.next = None
    }
    Contributed By: Nagendra Jha
'''
def segregate(head):
    #code here
    global a
    a = LinkedList()
    curr_node = head
    mylist = []
    # Collect the values, sort them, and rebuild the list
    while curr_node != None:
        mylist.append(curr_node.data)
        curr_node = curr_node.next
    mylist = sorted(mylist)
    for i in mylist:
        a.append(i)
    return a.head

================================================
FILE: Mastek/README.md
================================================
Consider the below employee table where ManagerID is mapped to each EmployeeID:

Input:

| EmployeeID | EmployeeName | ManagerID |
|------------|--------------|-----------|
| 400        | A1           | null      |
| 401        | A2           | 400       |
| 500        | A3           | 401       |
| 501        | A4           | 401       |
| 502        | A5           | 501       |

Write the SQL query that identifies the level of each employee in the hierarchy, as shown in the below output:

| EmployeeID | EmployeeName | ManagerID | Level |
|------------|--------------|-----------|-------|
| 400        | A1           | null      | 0     |
| 401        | A2           | 400       | 1     |
| 500        | A3           | 401       | 2     |
| 501        | A4           | 401       | 2     |
| 502        | A5           | 501       | 3     |

### Solution:

        400
         |
        401          (depth from 400 = 1)
       /   \
     500   501       (depth from 400 = 2)
             \
             502     (depth from 400 = 3)

    400 -> 401
    401 -> 500, 501
    501 -> 502

**What we want to do here is use a recursive CTE to traverse the tree in SQL and compute the depth at every level.**

================================================
FILE: Mastek/computedepth.sql
================================================
-- Oracle solution
WITH RecursiveCTE (lvl, ManagerID, EmployeeID) AS (
    -- Anchor row: the root employee has no manager and sits at level 0,
    -- matching the expected output in the README
    SELECT 0 AS lvl, ManagerID, EmployeeID
    FROM employees
    WHERE ManagerID IS NULL
    UNION ALL
    -- Recursive step: each direct report is one level below its manager
    SELECT lvl + 1, employees.ManagerID, employees.EmployeeID
    FROM employees
    JOIN RecursiveCTE ON employees.ManagerID = RecursiveCTE.EmployeeID
)
SELECT * FROM RecursiveCTE;

================================================
FILE: Microsoft/countOfInversionsArray.py
================================================
# GeeksForGeeks Code - Copied#
{
#Initial Template for Python 3
import atexit
import io
import sys

_INPUT_LINES = sys.stdin.read().splitlines()
input = iter(_INPUT_LINES).__next__
_OUTPUT_BUFFER = io.StringIO()
sys.stdout = _OUTPUT_BUFFER

@atexit.register
def write():
    sys.__stdout__.write(_OUTPUT_BUFFER.getvalue())

if __name__=='__main__':
    t = int(input())
    for tt in range(t):
        n = int(input())
        a = list(map(int, input().strip().split()))
        print(Inversion_Count(a,n))
}

''' This is a function problem.You only need to complete the function given below '''
#User function Template for python3

'''
    Your task is to return total number of inversions present in the array.
    Function Arguments: array a and size n
    Return Type: Integer
'''
def Inversion_Count(arr,n):
    # Check the parameter itself rather than the driver's global `a`
    if arr == sorted(arr):
        return 0
    temp_arr = [0]*n
    return mergesort(arr,temp_arr,0,n-1)

def mergesort(arr,temp_arr,left,right):
    inv_count = 0
    if left < right:
        mid = (left + right)//2
        inv_count = mergesort(arr,temp_arr,left,mid)
        inv_count += mergesort(arr,temp_arr,mid+1,right)
        inv_count += merge(arr,temp_arr,left,mid,right)
    return inv_count

def merge(arr,temp_arr,left, mid, right):
    # Merge arr[left..mid] and arr[mid+1..right] through temp_arr
    i = left     # Initial index of first subarray
    j = mid+1    # Initial index of second subarray
    k = left     # Initial index of merged subarray
    invcount = 0
    while i <= mid and j <= right:
        if arr[i] <= arr[j]:
            temp_arr[k] = arr[i]
            i += 1
        else:
            # arr[j] is smaller than every remaining element of the left half
            temp_arr[k] = arr[j]
            invcount += (mid - i + 1)
            j += 1
        k += 1
    # Copy the remaining elements of the left half, if there are any
    while i <= mid:
        temp_arr[k] = arr[i]
        i += 1
        k += 1
    # Copy the remaining elements of the right half, if there are any
    while j <= right:
        temp_arr[k] = arr[j]
        j += 1
        k += 1
    for lr in range(left, right + 1):
        arr[lr] = temp_arr[lr]
    return invcount

================================================
FILE: Microsoft/merge2SortedLinkedLists.py
================================================
{
#Initial Template for Python 3
# Node
Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None # Linked List Class class LinkedList: def __init__(self): self.head = None # creates a new node with given value and appends it at the end of the linked list def append(self, new_value): new_node = Node(new_value) if self.head is None: self.head = new_node return curr_node = self.head while curr_node.next is not None: curr_node = curr_node.next curr_node.next = new_node # prints the elements of linked list starting with head def printList(self): if self.head is None: print(' ') return curr_node = self.head while curr_node: print(curr_node.data,end=" ") curr_node=curr_node.next print(' ') if __name__ == '__main__': t=int(input()) for cases in range(t): n,m = map(int, input().strip().split()) a = LinkedList() # create a new linked list 'a'. b = LinkedList() # create a new linked list 'b'. nodes_a = list(map(int, input().strip().split())) nodes_b = list(map(int, input().strip().split())) for x in nodes_a: a.append(x) for x in nodes_b: b.append(x) a.head = merge(a.head,b.head) a.printList() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Function to merge two sorted lists in one using constant space. Function Arguments: head_a and head_b (head reference of both the sorted lists) Return Type: head of the obtained list after merger. 
{ # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None } Contributed By: Nagendra Jha ''' def merge(head_a,head_b): #code here global a elements = [] curr_node = head_a while curr_node != None: elements.append(curr_node.data) curr_node = curr_node.next curr_node = head_b while curr_node != None: elements.append(curr_node.data) curr_node = curr_node.next elements = sorted(elements) a = LinkedList() for i in elements: a.append(i) return a.head ================================================ FILE: Microsoft/relativeSorting.py ================================================ def relativeSorting(A1,A2): common_elements = set(A1).intersection(set(A2)) extra = set(A1).difference(set(A2)) out = [] for i in A2: s = [i] * A1.count(i) out.extend(s) extra_out = [] for j in extra: u = [j] * A1.count(j) extra_out.extend(u) out = out + sorted(extra_out) return " ".join(str(i) for i in out) if __name__ == '__main__': t = int(input()) for tcase in range(t): N,M = list(map(int,input().strip().split())) A1 = list(map(int,input().strip().split())) A2 = list(map(int,input().strip().split())) print (relativeSorting(A1,A2)) ================================================ FILE: Morgan-Stanley/addTwoNumbers_LinkedListRep.py ================================================ { #Initial Template for Python 3 #Contributed by : Nagendra Jha import atexit import io import sys _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None # Linked List Class class LinkedList: def __init__(self): self.head = None # creates a new node with given value and appends it at the end of the linked list def append(self, new_value): new_node = Node(new_value) if self.head is 
None: self.head = new_node return curr_node = self.head while curr_node.next is not None: curr_node = curr_node.next curr_node.next = new_node # prints the elements of linked list starting with head def printList(head): if head is None: print(' ') return curr_node = head while curr_node: print(curr_node.data,end=" ") curr_node=curr_node.next print(' ') if __name__ == '__main__': t=int(input()) for cases in range(t): n_a = int(input()) a = LinkedList() # create a new linked list 'a'. nodes_a = list(map(int, input().strip().split())) nodes_a = nodes_a[::-1] # reverse the input array for x in nodes_a: a.append(x) # add to the end of the list n_b =int(input()) b = LinkedList() # create a new linked list 'b'. nodes_b = list(map(int, input().strip().split())) nodes_b = nodes_b[::-1] # reverse the input array for x in nodes_b: b.append(x) # add to the end of the list result_head = addBoth(a.head,b.head) printList(result_head) } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Function to add two numbers represented in the form of the linked list. Function Arguments: head_a and head_b (heads of both the linked lists) Return Type: head of the resultant linked list. __>IMP : numbers are represented in reverse in the linked list. Ex: 145 is represented as 5->4->1. resultant head is expected in the same format. 
# Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None ''' def addBoth(head_a,head_b): #code here result = LinkedList() num1 = "" curr_node = head_a while curr_node != None: num1 += str(curr_node.data) curr_node = curr_node.next num1 = num1[::-1] num2 = "" curr_node = head_b while curr_node != None: num2 += str(curr_node.data) curr_node = curr_node.next num2 = num2[::-1] num = int(num1) + int(num2) num = str(num)[::-1] for i in num: result.append(i) return result.head ================================================ FILE: Myntra/countOfInversionsArray.py ================================================ # GeeksForGeeks Code - Copied# { #Initial Template for Python 3 import atexit import io import sys _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) if __name__=='__main__': t = int(input()) for tt in range(t): n = int(input()) a = list(map(int, input().strip().split())) print(Inversion_Count(a,n)) } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Your task is to return total number of inversions present in the array. 
    Function Arguments: array a and size n
    Return Type: Integer
'''
def Inversion_Count(arr,n):
    # Check the parameter itself rather than the driver's global `a`
    if arr == sorted(arr):
        return 0
    temp_arr = [0]*n
    return mergesort(arr,temp_arr,0,n-1)

def mergesort(arr,temp_arr,left,right):
    inv_count = 0
    if left < right:
        mid = (left + right)//2
        inv_count = mergesort(arr,temp_arr,left,mid)
        inv_count += mergesort(arr,temp_arr,mid+1,right)
        inv_count += merge(arr,temp_arr,left,mid,right)
    return inv_count

def merge(arr,temp_arr,left, mid, right):
    # Merge arr[left..mid] and arr[mid+1..right] through temp_arr
    i = left     # Initial index of first subarray
    j = mid+1    # Initial index of second subarray
    k = left     # Initial index of merged subarray
    invcount = 0
    while i <= mid and j <= right:
        if arr[i] <= arr[j]:
            temp_arr[k] = arr[i]
            i += 1
        else:
            # arr[j] is smaller than every remaining element of the left half
            temp_arr[k] = arr[j]
            invcount += (mid - i + 1)
            j += 1
        k += 1
    # Copy the remaining elements of the left half, if there are any
    while i <= mid:
        temp_arr[k] = arr[i]
        i += 1
        k += 1
    # Copy the remaining elements of the right half, if there are any
    while j <= right:
        temp_arr[k] = arr[j]
        j += 1
        k += 1
    for lr in range(left, right + 1):
        arr[lr] = temp_arr[lr]
    return invcount

================================================
FILE: Myntra/removeDuplicatesSortedLinkedList.py
================================================
{
#Initial Template for Python 3
#Contributed by : Nagendra Jha

import atexit
import io
import sys

_INPUT_LINES = sys.stdin.read().splitlines()
input = iter(_INPUT_LINES).__next__
_OUTPUT_BUFFER = io.StringIO()
sys.stdout = _OUTPUT_BUFFER

@atexit.register
def write():
    sys.__stdout__.write(_OUTPUT_BUFFER.getvalue())

# Node Class
class Node:
    def __init__(self, data):   # data -> value stored in node
        self.data = data
        self.next = None

# Linked List Class
class LinkedList:
    def __init__(self):
        self.head = None

    # creates a new node with given value and appends it at the end of the linked list
    def append(self, new_value):
        new_node = Node(new_value)
        if self.head is None:
            self.head = new_node
            return
        curr_node = self.head
        while curr_node.next is not None:
curr_node = curr_node.next curr_node.next = new_node # prints the elements of linked list starting with head def printList(self): if self.head is None: print(' ') return curr_node = self.head while curr_node: print(curr_node.data,end=" ") curr_node=curr_node.next print(' ') if __name__ == '__main__': t=int(input()) for cases in range(t): n = int(input()) a = LinkedList() # create a new linked list 'a'. nodes = list(map(int, input().strip().split())) for x in nodes: a.append(x) removeDuplicates(a.head) a.printList() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Your task is to remove duplicates from given sorted linked list. Function Arguments: head (head of the given linked list) Return Type: none, just remove the duplicates from the list. { # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None } ''' def removeDuplicates(head): #code here global a curr_node = head duplicateTracker = [] while curr_node != None: if curr_node.data not in duplicateTracker: duplicateTracker.append(curr_node.data) curr_node = curr_node.next a = LinkedList() for x in duplicateTracker: a.append(x) ================================================ FILE: Nagarro/isAnagram.py ================================================ { #Initial Template for Python 3 import atexit import io import sys _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) if __name__=='__main__': t = int(input()) for i in range(t): a,b=map(str,input().strip().split()) if(isAnagram(a,b)): print("YES") else: print("NO") } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 '''Your task is to check given two strings are anagrams or not. 
    a,b: given strings
    Return True or False accordingly.
    -> You don't need to print anything. Return type of function is boolean.
'''
def isAnagram(a,b):
    #code here
    import collections
    # Two strings are anagrams exactly when every character occurs the same
    # number of times in both. Comparing only the character sets and the sorted
    # counts would wrongly accept pairs such as "aab" and "abb".
    return collections.Counter(a) == collections.Counter(b)

================================================
FILE: Nielsen/README.md
================================================
## Interview process for Senior SDE roles (Personal Experience)

1. BarRaiser round.
2. Technical interview rounds, if the BarRaiser round is passed.
    - Technical deep dive on projects listed in the CV
    - Questions on scalability of the architectures, pipelines, AWS cost saving opportunities etc.
    - Coding questions on arrays related to combinatorial optimization concepts.

================================================
FILE: OYO_Rooms/FindzeroSumSubArrays.py
================================================
def subArrayExists(arr,n):
    ##Your code here
    hashMap = {}    # prefix sum -> list of indices where that sum was seen
    out = []        # (start, end) index pairs of zero-sum subarrays
    sum1 = 0
    for i in range(n):
        sum1 += arr[i]
        if sum1==0:
            out.append((0,i))
        al = []
        if sum1 in hashMap:
            # Same prefix sum seen before: everything in between sums to zero
            al = hashMap.get(sum1)
            for it in range(len(al)):
                out.append((al[it]+1,i))
        al.append(i)
        hashMap[sum1] = al
    return len(out)

if __name__ == '__main__':
    t = int(input())
    for tcase in range(t):
        n = int(input())
        arr = list(map(int,input().strip().split()))
        print (subArrayExists(arr,n))

================================================
FILE: OYO_Rooms/parenthesisChecker.py
================================================
{
#Initial Template for Python 3
import atexit
import io
import sys

#Contributed by : Nagendra Jha

_INPUT_LINES = sys.stdin.read().splitlines()
input = iter(_INPUT_LINES).__next__
_OUTPUT_BUFFER = io.StringIO()
sys.stdout = _OUTPUT_BUFFER

@atexit.register
def write():
    sys.__stdout__.write(_OUTPUT_BUFFER.getvalue())

if __name__ == '__main__':
    test_cases = int(input())
    for cases in range(test_cases) :
        #n = int(input())
        #n,k =
map(int,imput().strip().split()) #a = list(map(int,input().strip().split())) s = str(input()) if ispar(s): print("balanced") else: print("not balanced") } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Function Arguments : @param : s (given string containing parenthesis) @return : boolean True or False ''' def isMatchingPair(c1,c2): if (c1=='(') & (c2==')'): return True elif (c1=='{') & (c2=='}'): return True elif (c1=='[') & (c2==']'): return True else: return False def ispar(s): # code here import queue stack = queue.LifoQueue() for i in range(len(s)): if ((s[i] == '{') | (s[i] == '[') | (s[i] == '(')): stack.put(s[i]) if ((s[i] == '}') | (s[i] == ']') | (s[i] == ')')): if stack.empty(): return False elif not isMatchingPair(stack.get(),s[i]): return False if stack.empty(): return True else: return False ================================================ FILE: OYO_Rooms/removeDuplicatesSortedLinkedList.py ================================================ { #Initial Template for Python 3 #Contributed by : Nagendra Jha import atexit import io import sys _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None # Linked List Class class LinkedList: def __init__(self): self.head = None # creates a new node with given value and appends it at the end of the linked list def append(self, new_value): new_node = Node(new_value) if self.head is None: self.head = new_node return curr_node = self.head while curr_node.next is not None: curr_node = curr_node.next curr_node.next = new_node # prints the elements of linked list starting with head def printList(self): if self.head is None: print(' ') return curr_node = self.head while 
curr_node: print(curr_node.data,end=" ") curr_node=curr_node.next print(' ') if __name__ == '__main__': t=int(input()) for cases in range(t): n = int(input()) a = LinkedList() # create a new linked list 'a'. nodes = list(map(int, input().strip().split())) for x in nodes: a.append(x) removeDuplicates(a.head) a.printList() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Your task is to remove duplicates from given sorted linked list. Function Arguments: head (head of the given linked list) Return Type: none, just remove the duplicates from the list. { # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None } ''' def removeDuplicates(head): #code here global a curr_node = head duplicateTracker = [] while curr_node != None: if curr_node.data not in duplicateTracker: duplicateTracker.append(curr_node.data) curr_node = curr_node.next a = LinkedList() for x in duplicateTracker: a.append(x) ================================================ FILE: Oracle/merge2SortedLinkedLists.py ================================================ { #Initial Template for Python 3 # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None # Linked List Class class LinkedList: def __init__(self): self.head = None # creates a new node with given value and appends it at the end of the linked list def append(self, new_value): new_node = Node(new_value) if self.head is None: self.head = new_node return curr_node = self.head while curr_node.next is not None: curr_node = curr_node.next curr_node.next = new_node # prints the elements of linked list starting with head def printList(self): if self.head is None: print(' ') return curr_node = self.head while curr_node: print(curr_node.data,end=" ") curr_node=curr_node.next print(' ') if __name__ == '__main__': t=int(input()) for cases in range(t): n,m = map(int, 
input().strip().split()) a = LinkedList() # create a new linked list 'a'. b = LinkedList() # create a new linked list 'b'. nodes_a = list(map(int, input().strip().split())) nodes_b = list(map(int, input().strip().split())) for x in nodes_a: a.append(x) for x in nodes_b: b.append(x) a.head = merge(a.head,b.head) a.printList() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Function to merge two sorted lists in one using constant space. Function Arguments: head_a and head_b (head reference of both the sorted lists) Return Type: head of the obtained list after merger. { # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None } Contributed By: Nagendra Jha ''' def merge(head_a,head_b): #code here global a elements = [] curr_node = head_a while curr_node != None: elements.append(curr_node.data) curr_node = curr_node.next curr_node = head_b while curr_node != None: elements.append(curr_node.data) curr_node = curr_node.next elements = sorted(elements) a = LinkedList() for i in elements: a.append(i) return a.head ================================================ FILE: Paytm/Convert_Infix_To_Postfix.py ================================================ { #Initial Template for Python 3 import atexit import io import sys # This code is contributed by Nikhil Kumar Singh(nickzuck_007) _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) if __name__ == '__main__': test_cases = int(input()) for cases in range(test_cases) : exp = str(input()) print(InfixtoPostfix(exp)) } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Function Arguments : @param : exp (given infix expression) @return : string ''' def 
InfixtoPostfix(exp): #code here import string stack = [] prec = {'^':4,'*':3,'/':3,'+':2,'-':2,'(':1} postfixexp = [] tokens = list(exp) for token in tokens: if ((token in string.ascii_lowercase) | (token in "0123456789") | (token in string.ascii_uppercase)): postfixexp.append(token) elif token == '(': stack.append(token) elif token == ')': if len(stack)!=0: topToken = stack.pop() while topToken != '(': postfixexp.append(topToken) topToken = stack.pop() else: if len(stack) != 0: while (len(stack)!=0) and (prec[stack[-1]] >= prec[token]): postfixexp.append(stack.pop()) stack.append(token) while len(stack)!=0: postfixexp.append(stack.pop()) return "".join(postfixexp) ================================================ FILE: Paytm/binaryArraySort.py ================================================ { #Initial Template for Python 3 import math def main(): T=int(input()) while(T>0): N=int(input()) A=list(map(int,input().split())) binSort(A,N) for i in A: print(i,end=" ") print() T-=1 if __name__ == "__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ##Complete this function def binSort(arr, n): #Your code here ''' No need to print the array ''' c0 = arr.count(0) c1 = arr.count(1) for i in range(c0): arr[i] = 0 for i in range(c0,c0+c1): arr[i] = 1 ================================================ FILE: Paytm/frequencyLimitedRangeArrayElements.py ================================================ { #Initial Template for Python 3 import math def main(): T=int(input()) while(T>0): N=int(input()) A=[int(x) for x in input().strip().split()] printfrequency(A,N) print() T-=1 if __name__=="__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 #Complete this function def printfrequency(A,N): #Your Code here import collections freq_dict = collections.Counter(A) for i in range(1,N+1): print (freq_dict[i],end=" ") 
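The counting approach used in `printfrequency` above can also be sketched as a pure function that returns the counts instead of printing them, which makes it easier to check. The name `frequencies` is hypothetical and not part of the original template; it is only a minimal sketch, assuming every value of interest lies in the range 1..n.

```python
from collections import Counter


def frequencies(values, n):
    """Return how often each of 1..n occurs in values (hypothetical helper)."""
    counts = Counter(values)  # a Counter yields 0 for keys it has never seen
    return [counts[i] for i in range(1, n + 1)]


print(frequencies([2, 3, 2, 3, 5], 5))  # [0, 2, 2, 0, 1]
```

Returning a list keeps the logic separate from the output format expected by the judge.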
================================================ FILE: Paytm/subarrayWithZeroSum.py ================================================ { #Initial Template for Python 3 def main(): T=int(input()) while(T>0): n=int(input()) arr=[int(x) for x in input().strip().split()] if(subArrayExists(arr,n)): print("Yes") else: print("No") T-=1 if __name__=="__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 #Complete this function def allSubArrays(L,L2=None): if L2==None: L2 = L[:-1] if L==[]: if L2==[]: return [] return allSubArrays(L2,L2[:-1]) return [L]+allSubArrays(L[1:],L2) def subArrayExists(arr,n): ##Your code here #Return true or false if 0 in arr: return True else: allsubarrays = allSubArrays(arr) for s in allsubarrays: if sum(s) == 0: return True return False ================================================ FILE: Qalcomm/ImplementSTRSTR.py ================================================ { #Contributed by : Nagendra Jha import atexit import io import sys _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) if __name__ == '__main__': t=int(input()) for cases in range(t): s,p =input().strip().split() print(strstr(s,p)) } ''' This is a function problem.You only need to complete the function given below ''' ''' Your task is to return the index of the pattern present in the given string. Function Arguments: s (given text), p(given pattern) Return Type: Integer. 
''' def strstr(s,p): #code here import re loc = re.search(p,s) if loc is not None: return (loc.start()) else: return -1 ================================================ FILE: Qalcomm/addTwoNumbers_LinkedListRep.py ================================================ { #Initial Template for Python 3 #Contributed by : Nagendra Jha import atexit import io import sys _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None # Linked List Class class LinkedList: def __init__(self): self.head = None # creates a new node with given value and appends it at the end of the linked list def append(self, new_value): new_node = Node(new_value) if self.head is None: self.head = new_node return curr_node = self.head while curr_node.next is not None: curr_node = curr_node.next curr_node.next = new_node # prints the elements of linked list starting with head def printList(head): if head is None: print(' ') return curr_node = head while curr_node: print(curr_node.data,end=" ") curr_node=curr_node.next print(' ') if __name__ == '__main__': t=int(input()) for cases in range(t): n_a = int(input()) a = LinkedList() # create a new linked list 'a'. nodes_a = list(map(int, input().strip().split())) nodes_a = nodes_a[::-1] # reverse the input array for x in nodes_a: a.append(x) # add to the end of the list n_b =int(input()) b = LinkedList() # create a new linked list 'b'. 
nodes_b = list(map(int, input().strip().split())) nodes_b = nodes_b[::-1] # reverse the input array for x in nodes_b: b.append(x) # add to the end of the list result_head = addBoth(a.head,b.head) printList(result_head) } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Function to add two numbers represented in the form of the linked list. Function Arguments: head_a and head_b (heads of both the linked lists) Return Type: head of the resultant linked list. __>IMP : numbers are represented in reverse in the linked list. Ex: 145 is represented as 5->4->1. resultant head is expected in the same format. # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None ''' def addBoth(head_a,head_b): #code here result = LinkedList() num1 = "" curr_node = head_a while curr_node != None: num1 += str(curr_node.data) curr_node = curr_node.next num1 = num1[::-1] num2 = "" curr_node = head_b while curr_node != None: num2 += str(curr_node.data) curr_node = curr_node.next num2 = num2[::-1] num = int(num1) + int(num2) num = str(num)[::-1] for i in num: result.append(i) return result.head ================================================ FILE: Quikr/BuySellStock.py ================================================ { #Initial Template for Python 3 import math def main(): T=int(input()) while(T>0): n=int(input()) arr=[int(x) for x in input().strip().split()] stockBuySell(arr,n) print() T-=1 if __name__ == "__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 #Complete this function def stockBuySell(A,n): if A == sorted(A): print ("(" + str(0) + " " + str(n-1) + ")",end=" ") elif A == sorted(A,reverse=True): print ("No Profit",end = " ") else: local_min = [] local_max = [] for i,v in enumerate(A): if i == 0: if v < A[i+1]: local_min.append(i) elif i == n-1: if v > A[i-1]: 
local_max.append(i) else: if A[i-1] <= v and v >= A[i+1]: local_max.append(i) if A[i-1] >= v and v <= A[i+1]: local_min.append(i) if len(local_max) == len(local_min): for i in range(len(local_max)): x = " ".join([str(local_min[i]),str(local_max[i])]) print ("("+x+")",end=" ") else: x = " ".join([str(max(local_min)),str(max(local_max))]) print ("("+x+")",end = " ") ================================================ FILE: README.md ================================================ # Programming questions/Case studies/Theoretical questions/Design questions etc. from Interviews and Coding Tests: + Interview coding questions from several companies collected in one repository. + Coding questions asked in private hiring processes. + Statistics and data science questions for data scientist interviews. + Along with coding questions, the interview procedures I experienced are also included here. + **Please note that some of these questions are from my own experiences with various companies (check the README for that information). Please create an issue if you have any questions regarding any company and its questions.** + **I update this repo with additions as I go through hiring processes (you can think of this as a kind of blog), so even if I didn't make it to the end of a process, you can expect the questions up to the point I reached. My intention is to help the community of software developers.** + **This repository and [Coding Questions](https://github.com/absognety/Competitive-Coding-Platforms) may have a few questions in common.** + Check out my effort in solving questions from Daily Coding Problem (daily coding questions sent to your email) [here](https://github.com/absognety/Competitive-Coding-Platforms/tree/master/DailyCodingProblem). Feel free to contribute to this repo and make this one of the best open source resources for interview preparation!
# Disclaimer: This repository might document the interview processes of the latest up-and-coming companies, which could lead to interview privacy concerns - I trust the user's discretion on this. Please note that the interview questions and processes are bound to change at any point in time - but if this repository helped you, please give it a star. ================================================ FILE: RelianceJIO/CamelCaseToSnakeCase.py ================================================ #!/usr/bin/python3.8 """ Write a function that converts a camel case string into snake case: Example: HackerEarth => hacker_earth OddOrEven => odd_or_even macOS => mac_o_s primeCheck => prime_check Explanation: A camel case string is a string with uppercase and lowercase characters mixed, as illustrated above. Snake case strings are lowercase, with an underscore placed before each character that was uppercase in the middle of the string. """ import re def convert_string(s:str) -> str: store_indices = [] slist = list(s) for ind,val in enumerate(s): if val.isupper() and ind!=0: store_indices.append(ind) c = len(store_indices) l = 0 while (l <= c-1): slist.insert(store_indices[l]+l,'_') l += 1 return ''.join(slist).lower() if __name__ == '__main__': for tcase in range(T:=int(input())): s = input() print (convert_string(s)) ================================================ FILE: RelianceJIO/README.md ================================================ # First round: ## HackerEarth Test with some data science questions (Objective) along with 2 coding questions ================================================ FILE: RelianceJIO/getSmallestNumber.py ================================================ #!/usr/bin/python3.8 """ P(N) gives all the numbers that divide the given number N. Given a number X, find the smallest number Y such that Y > X and P(Y) != P(X). """ import math def div_count(n): # sieve method for # prime calculation hh = [1] * (n + 1); p = 2 while((p * p) < n): if (hh[p] == 1): for i in range((p * 2), n, p): hh[i] = 0 p +=
1 # Traversing through # all prime numbers total = 1 for p in range(2, n + 1): if (hh[p] == 1): # calculate number of divisor # with formula total div = # (p1+1) * (p2+1) *.....* (pn+1) # where n = (a1^p1)*(a2^p2).... # *(an^pn) ai being prime divisor # for n and pi are their respective # power in factorization count = 0 if (n % p == 0): while (n % p == 0): n = int(n / p) count += 1 total *= (count + 1) return total def smallestNumber(n,ndn): MAX_INT = math.inf p = n + 1 while (p < MAX_INT): if div_count(p) != ndn: break p += 1 return p if __name__ == '__main__': for tcase in range(T:=int(input())): n = int(input()) print (smallestNumber(n, ndn=div_count(n))) ================================================ FILE: Salesforce/BuySellStock.py ================================================ { #Initial Template for Python 3 import math def main(): T=int(input()) while(T>0): n=int(input()) arr=[int(x) for x in input().strip().split()] stockBuySell(arr,n) print() T-=1 if __name__ == "__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 #Complete this function def stockBuySell(A,n): if A == sorted(A): print ("(" + str(0) + " " + str(n-1) + ")",end=" ") elif A == sorted(A,reverse=True): print ("No Profit",end = " ") else: local_min = [] local_max = [] for i,v in enumerate(A): if i == 0: if v < A[i+1]: local_min.append(i) elif i == n-1: if v > A[i-1]: local_max.append(i) else: if A[i-1] <= v and v >= A[i+1]: local_max.append(i) if A[i-1] >= v and v <= A[i+1]: local_min.append(i) if len(local_max) == len(local_min): for i in range(len(local_max)): x = " ".join([str(local_min[i]),str(local_max[i])]) print ("("+x+")",end=" ") else: x = " ".join([str(max(local_min)),str(max(local_max))]) print ("("+x+")",end = " ") ================================================ FILE: Samsung/merge2SortedLinkedLists.py ================================================ { #Initial Template for Python 3 # Node 
Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None # Linked List Class class LinkedList: def __init__(self): self.head = None # creates a new node with given value and appends it at the end of the linked list def append(self, new_value): new_node = Node(new_value) if self.head is None: self.head = new_node return curr_node = self.head while curr_node.next is not None: curr_node = curr_node.next curr_node.next = new_node # prints the elements of linked list starting with head def printList(self): if self.head is None: print(' ') return curr_node = self.head while curr_node: print(curr_node.data,end=" ") curr_node=curr_node.next print(' ') if __name__ == '__main__': t=int(input()) for cases in range(t): n,m = map(int, input().strip().split()) a = LinkedList() # create a new linked list 'a'. b = LinkedList() # create a new linked list 'b'. nodes_a = list(map(int, input().strip().split())) nodes_b = list(map(int, input().strip().split())) for x in nodes_a: a.append(x) for x in nodes_b: b.append(x) a.head = merge(a.head,b.head) a.printList() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Function to merge two sorted lists in one using constant space. Function Arguments: head_a and head_b (head reference of both the sorted lists) Return Type: head of the obtained list after merger. 
{ # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None } Contributed By: Nagendra Jha ''' def merge(head_a,head_b): #code here global a elements = [] curr_node = head_a while curr_node != None: elements.append(curr_node.data) curr_node = curr_node.next curr_node = head_b while curr_node != None: elements.append(curr_node.data) curr_node = curr_node.next elements = sorted(elements) a = LinkedList() for i in elements: a.append(i) return a.head ================================================ FILE: Samsung/missingSmallestPositiveNumber.py ================================================ { #Initial Template for Python 3 import math def main(): T=int(input()) while(T>0): n=int(input()) arr=[int(x) for x in input().strip().split()] print(missingNumber(arr,n)) T-=1 if __name__ == "__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ##Complete this function def missingNumber(arr,n): #Your code here present = set(x for x in arr if x > 0) candidate = 1 while candidate in present: candidate += 1 return candidate ================================================ FILE: Samsung/moveAllZerosToEndOfArray.py ================================================ def moveZerosToEnd(arr): arr_0 = [x for x in arr if x!=0] zeros = [0] * (len(arr)-len(arr_0)) ans = arr_0 + zeros return " ".join(str(u) for u in ans) if __name__ == '__main__': T = int(input()) for t in range(T): N = int(input()) arr = list(map(int,input().strip().split())) print (moveZerosToEnd(arr)) ================================================ FILE: SkyPointCloud/README.md ================================================ # Interview process for skypoint cloud (Data engineer) (Personal Experience): 1.
**Hackerearth Test**: objective questions on hadoop, spark, python and airflow basics and one coding question added above. 2. **First round of Interview**: - Questions: - what is the difference between mapreduce and spark? - What is the difference between Spark `coalesce` and `repartition`? - How do you handle out of memory error in spark application execution? - How does pyspark work? - Can you explain how you solved the programming question you have solved in the test? (refer [here](https://github.com/absognety/Interview-Process-Coding-Questions/blob/master/SkyPointCloud/modifyString.py)) 3. **Second round of Interview**: - Questions: - what are the challenges you encountered in your projects and how did you solve them? - **Case Study**: - Let's say there are 3 data sources(RDBMS), First Data source contains (firstname,lastname,email and gender), second one contains (firstname,lastname,order_history) and third contains (email,profile_id). How can you construct a 360 degrees view of a customer that contains all information from 3 data sources containing (firstname,lastname,email,gender,order_history and profile_id) but without traditional way of joining the columns (looking for more of ML based solution)(Hint: the columns may contain special characters, spaces, underscores etc) - What are list comprehensions and dictionary comprehensions in python. - Tell me one approach on finding all prime numbers in the given range? - Can you code up a binary search algorithm in array that is sorted in descending order? (refer [here](https://github.com/absognety/Interview-Process-Coding-Questions/blob/master/SkyPointCloud/binarySearch.py)) - How do you define/create a class in python? 4. **Final round of interview with Founder/CEO**. ================================================ FILE: SkyPointCloud/binarySearch.py ================================================ """ Can you write the binary search algorithm for reverse sorted array? 
""" def binarySearch(arr,x): arr = sorted(arr,reverse=True) l = 0 r = len(arr) - 1 while l <= r: mid = l + (r - l) // 2; # Check if x is present at mid if arr[mid] == x: return mid elif arr[mid] > x: l = mid + 1 else: r = mid - 1 # If we reach here, then the element # was not present return "not present" if __name__ == '__main__': mylist = [1,3,5,7,9,8,5,88,79] n = len(mylist) print (binarySearch(mylist,5)) print (binarySearch(mylist,88)) print (binarySearch(mylist,79)) print (binarySearch(mylist,20)) print (binarySearch(mylist,1)) ================================================ FILE: SkyPointCloud/modifyString.py ================================================ """ Given a string, sort/modify the characters of string according to below rules: 1. characters having prime ascii values should come before characters having composite ascii values. 2. if two characters have same prime ascii value, then the character with less value should come first. 3. if two characters have same composite ascii value, then the character with greater value should come first """ def isPrime(n) : # Corner cases if (n <= 1) : return False if (n <= 3) : return True # This is checked so that we can skip # middle five numbers in below loop if (n % 2 == 0 or n % 3 == 0) : return False i = 5 while(i * i <= n) : if (n % i == 0 or n % (i + 2) == 0) : return False i = i + 6 return True def sort_string(s,n): prime_chars = [] composite_chars = [] for c in s: if isPrime(ord(c)): prime_chars.append(c) else: composite_chars.append(c) prime_dict = [(c,ord(c)) for c in prime_chars] composite_dict = [(d,ord(d)) for d in composite_chars] prime_dict = sorted(prime_dict,key=lambda x: x[1]) composite_dict = sorted(composite_dict,key=lambda y:y[1],reverse=True) result = [k1 for k1,v1 in prime_dict] + [k2 for k2,v2 in composite_dict] return ''.join(result) if __name__ == '__main__': n = int(input()) s = input() print (sort_string(s,n)) ================================================ FILE: 
Snapdeal/addTwoNumbers_LinkedListRep.py ================================================ { #Initial Template for Python 3 #Contributed by : Nagendra Jha import atexit import io import sys _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None # Linked List Class class LinkedList: def __init__(self): self.head = None # creates a new node with given value and appends it at the end of the linked list def append(self, new_value): new_node = Node(new_value) if self.head is None: self.head = new_node return curr_node = self.head while curr_node.next is not None: curr_node = curr_node.next curr_node.next = new_node # prints the elements of linked list starting with head def printList(head): if head is None: print(' ') return curr_node = head while curr_node: print(curr_node.data,end=" ") curr_node=curr_node.next print(' ') if __name__ == '__main__': t=int(input()) for cases in range(t): n_a = int(input()) a = LinkedList() # create a new linked list 'a'. nodes_a = list(map(int, input().strip().split())) nodes_a = nodes_a[::-1] # reverse the input array for x in nodes_a: a.append(x) # add to the end of the list n_b =int(input()) b = LinkedList() # create a new linked list 'b'. nodes_b = list(map(int, input().strip().split())) nodes_b = nodes_b[::-1] # reverse the input array for x in nodes_b: b.append(x) # add to the end of the list result_head = addBoth(a.head,b.head) printList(result_head) } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Function to add two numbers represented in the form of the linked list. 
Function Arguments: head_a and head_b (heads of both the linked lists) Return Type: head of the resultant linked list. __>IMP : numbers are represented in reverse in the linked list. Ex: 145 is represented as 5->4->1. resultant head is expected in the same format. # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None ''' def addBoth(head_a,head_b): #code here result = LinkedList() num1 = "" curr_node = head_a while curr_node != None: num1 += str(curr_node.data) curr_node = curr_node.next num1 = num1[::-1] num2 = "" curr_node = head_b while curr_node != None: num2 += str(curr_node.data) curr_node = curr_node.next num2 = num2[::-1] num = int(num1) + int(num2) num = str(num)[::-1] for i in num: result.append(i) return result.head ================================================ FILE: Swiggy/README.md ================================================ # Personal Experience - Software Development Engineer - II (Data and ML Platform): ## First Round (Computer Science Fundamentals and DSA): - Let's assume there are N number of lines in 2D coordinate system and their start and end points - `(X1s,X1e),(X2s,X2e),......(XNs,XNe) where s = start and e = end`, we need to draw a line perpendicular to X-axis (Parallel to Y-axis) such that the line intersects maximum number of given lines and also find the points of intersection if possible (If multiple intersection points possible that maximizes number of intersections - return any of them). - Find the second largest element in an array? - When does the shuffling happen in spark? - what's your inference on the scenario where the application is running and memory consumption is too high and CPU utilization is constant? - what are `coalesce` and `repartition` in Spark? - How does the parallelism happen in Python? Methods to deal with and questions on relevant libraries? ## Second Round (DSA): - Code up a linkedlist with all necessary operations from scratch! 
- Questions on the developed linkedlist code, its execution, and modifications to the code for some follow-up questions. - Find the middle element of the linkedlist. - Detect a cycle in the linkedlist if present. - Doubly linkedlists and cyclic linkedlists - Questions only (No need to implement). - Questions on Internals of Spark and performance related questions. ## LLD (Low Level Design Round): - Given a scenario of ATC (Air Traffic Controller) and flights, Questions on interactions between flight pilot and ATC if ATC needs to be replaced with an automated system (Answer expected in terms of API lingo) - payload, messages exchange, queues, parameters - etc. - Design a low level ATC system that manages the flight traffic, departures and arrivals within a 50 km radius. ================================================ FILE: SymphonyAI/README.md ================================================ Hackerrank Test (Personal Experience) (2020) =========================================== ______________________________________________________________________________________________ # Interview Experience (NLP Systems Engineer) - 2021: ### First Round: + Simple programming and statistics based questions using numpy,scipy etc. + Questions on pyspark vs scala-spark + What kind of Data warehouse architecture are you using in your company? Questions on that as well. ### Second Round: + Random forest vs XGBoost + Questions on decision tree internals and how it works end to end. + Model evaluation and selection techniques and questions on that involving confusion matrix etc. + Questions on K-Fold cross validation and other model validation strategies. + Model overfitting and Underfitting - How do you know? + Questions on NLP project that you did. + Programming question based on sorting (sorting strings in lexicographical order) without using any libraries or built-in functions. + `inplace` and `axis` arguments in pandas apis.
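
The sorting question from the second round can be approached with a hand-written insertion sort. The sketch below is not the solution given in the interview; it assumes that "no built-in functions" rules out `sorted` and `list.sort` but still allows Python's `>` operator on strings, which compares them lexicographically by code point. The name `sort_words` is hypothetical.

```python
def sort_words(words):
    """Insertion sort on a copy of words, without sorted() or list.sort()."""
    result = list(words)  # copy so the caller's list is left untouched
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger strings one slot to the right to make room for current.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


print(sort_words(["pear", "apple", "fig", "apricot"]))
# ['apple', 'apricot', 'fig', 'pear']
```

Insertion sort is quadratic in the worst case; if the interviewer asks for better asymptotics, a merge sort built on the same comparison would be the natural next step.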
================================================ FILE: SymphonyAI/maximumHeightOfMudSegment.py ================================================ """ Problem Statement A child likes to build mud walls by placing mud between sticks positioned on a number line. The gap between sticks will be referred to as a cell, and each cell will contain one segment of wall. The height of mud in a segment cannot exceed 1 unit above an adjacent stick or mud segment. Given the placement of a number of sticks and their heights, determine the maximum height segment of mud that can be built. If no mud segment can be built, return 0. For example, there are n = 4 sticks at stickPositions = [1, 2, 4, 7] with stickHeights = [4, 5, 7, 11]. There is no space between the first two sticks, so there is no cell for mud. Between positions 2 and 4, there is one cell. Heights of the surrounding sticks are 5 and 7, so the maximum height of mud is 5 + 1 = 6. Between positions 4 and 7 there are two cells. The heights of surrounding sticks are 7 and 11. The maximum height mud segment next to the stick of height 7 is 8. The maximum height mud next to a mud segment of height 8 and a stick of height 11 is 9. Mud segment heights are 6, 8 and 9, and the maximum height is 9. In the table below, digits are in the columns of sticks and M is in the mud segments. 7 7 M7 MM7 4MM7 M4MM7 2M4MM7 12M4MM7 12M4MM7 12M4MM7 12M4MM7 Function Description Complete the function maxHeight in the editor below. The function must return an integer, the maximum height mud segment that can be built.
maxHeight has the following parameter(s): stickPositions[stickPositions[0],…stickPositions[n-1]]: an array of integers stickHeights[stickHeights[0],…stickHeights[n-1]]: an array of integers Constraints 1 ≤ n ≤ 10^5 1 ≤ stickPositions[i] ≤ 10^9 (where 0 ≤ i < n) 1 ≤ stickHeights[i] ≤ 10^9 (where 0 ≤ i < n) Sample Input For Custom Testing 3 1 3 7 3 4 3 3 Sample Output 5 Explanation M 1M MMM 1M3MMM7 1M3MMM7 1M3MMM7 Here stickPositions = [1, 3, 7] and stickHeights = [4, 3, 3]. There can be a segment of height 4 at position 2 supported by sticks of heights 4 and 3. Between positions 3 and 7, there can be a segment of height 4 at positions 4 and 6. Between them, a segment can be built of height 5 at position 5. """ def maxHeight(wallPositions,wallHeights): n = len(wallPositions) maxim = 0 for i in range(n-1): if wallPositions[i] < wallPositions[i+1]-1: heightDiff = abs(wallHeights[i+1]-wallHeights[i]) gapLen = wallPositions[i+1]-wallPositions[i]-1 localMax = 0 if gapLen > heightDiff: low = max(wallHeights[i+1],wallHeights[i]) + 1 remainingGap = gapLen - heightDiff - 1 localMax = low + remainingGap//2 else: localMax = min(wallHeights[i+1],wallHeights[i]) + gapLen maxim = max(maxim,localMax) return maxim if __name__ == '__main__': wallPositions = [1,2,4,7] wallHeights = [4,6,8,11] print (maxHeight(wallPositions=wallPositions,wallHeights=wallHeights)) wallPositions = [1,3,7] wallHeights = [4,3,3] print (maxHeight(wallPositions,wallHeights)) ================================================ FILE: Tavant-Technologies/README.md ================================================ ## Interview with Senior Technical Architect (Glider discussion): 1. One Simple Coding question: - Given a list of candidates (a list of names, where a name can repeat), the number of times a candidate's name appears is that candidate's vote count. Find the candidate with the maximum number of votes; if there is a tie, print the candidate's name that is lexicographically smaller. 2.
Variety of questions from big data projects from the CV involving workflow orchestration, Hive, Hadoop 3. Given a scenario, your knowledge of designing systems is tested aggressively, from solutioning approach to implementation. 4. Difference between external and managed tables in Hive 5. Questions on CI/CD 6. How would you perform testing (need specifics here!) 7. How would you design a data ingestion framework that consumes datasets from different data sources (assume each data dump job from each source is scheduled at a different time)? Design a system that consumes the dumps from the different sources with as little latency as possible and makes them available to different applications. ================================================ FILE: Thomson-Reuters/README.md ================================================ # Interview Process - Personal (Machine Learning Engineer) - 2021: ### First Round - Hackerrank test: + Given a string, coding question on text preprocessing (cleaning text into corpus) - removing extra characters, removing stopwords, punctuations, text normalization techniques. + Given a string, coding question on string tokenization, text cleaning techniques (identify numbers, removing words having numbers, identify words with a pattern etc.). + Multiple choice questions on Machine learning - Model overfitting and underfitting, data leakage, supervised vs unsupervised learning, loss functions, cost minimizations, gradient descent, deep learning etc. + Multiple choice questions on AWS - AWS Lambda, IAM, CloudFormation, SageMaker, billing, S3 etc. (Refer documentation of AWS). ================================================ FILE: Thoughtworks/README.md ================================================ Source of Problem Statement for [ScheduleInterviews.py](ScheduleInterviews.py) - [link](https://github.com/YogeshSharma0201/ThoughtWorks-pair-coding-round) The original solution was delivered in C++, but I have developed my own solution in Python.
FYI - This solution is not a re-implementation or direct translation of code present in [link](https://github.com/YogeshSharma0201/ThoughtWorks-pair-coding-round). ================================================ FILE: Thoughtworks/ScheduleInterviews.py ================================================ from typing import List,Tuple import datetime def divideWorkingHours(start_time:datetime.datetime, end_time:datetime.datetime, interval:datetime.timedelta) -> List[Tuple]: periods = [] period_start = start_time while period_start < end_time: period_end = min(period_start + interval, end_time) if (period_end - period_start == interval): periods.append((period_start, period_end)) period_start = period_end return periods def removeOverlaps(slots:List[Tuple],break_hour:Tuple) -> List[Tuple]: refined_slots = [] for slot in slots: latest_start = max(slot[0], break_hour[0]) earliest_end = min(slot[1], break_hour[1]) delta = (earliest_end - latest_start).days if (delta < 0) or (earliest_end == latest_start): refined_slots.append(slot) return refined_slots def scheduleInterviews(attendees:dict, interviewers:dict, rooms:dict, slots:List[Tuple]) -> List[Tuple]: available_attendees = attendees.get("entity").copy() available_interviewers = interviewers.get("entity").copy() available_rooms = rooms.get("entity").copy() available_slots = slots[::-1].copy() attendees_done_with_interview = set() interviews_scheduled = list() while len(available_slots) > 0: while ((len(available_interviewers) > 0) & (len(available_rooms) > 0) & (len(available_attendees) > 0)): attendee = available_attendees[-1] interviewer = available_interviewers[-1] room = available_rooms[-1] slot = available_slots[-1] interviews_scheduled.append((slot,attendee,interviewer,room)) attendees_done_with_interview.add(attendee) print (interviews_scheduled) available_attendees.pop() available_rooms.pop() available_interviewers.pop() available_slots.pop() available_interviewers = interviewers.get("entity").copy() 
available_rooms = rooms.get("entity").copy() available_attendees = list(set(attendees.get("entity").copy()) - attendees_done_with_interview) return interviews_scheduled if __name__ == '__main__': #Attendees n_attendees = int(input()) attendees = list(map(int, input().strip().split(","))) assert len(attendees) == n_attendees, "length of attendees list does not match with declared total" #Interviewers n_interviewers = int(input()) interviewers = list(map(str, input().strip().split(","))) assert len(interviewers) == n_interviewers, "length of interviewers list does not match with declared total" #Rooms n_rooms = int(input()) rooms = list(map(str, input().strip().split(","))) assert len(rooms) == n_rooms, "length of rooms list does not match with declared total" attendees_dict = {"entity":attendees,"count":n_attendees} interviewers_dict = {"entity":interviewers,"count":n_interviewers} rooms_dict = {"entity":rooms,"count":n_rooms} #Give all Inputs required year = 2024 month = 1 day = 22 break_hour_start = 14 break_hour_end = 15 work_hours_start = 9 work_hours_end = 18 slot_duration_in_min = 120 break_hour = (datetime.datetime(year,month,day,break_hour_start,0,0), datetime.datetime(year,month,day,break_hour_end,0,0)) start_time = datetime.datetime(year,month,day,work_hours_start,0,0) end_time = datetime.datetime(year,month,day,work_hours_end,0,0) interval = datetime.timedelta(minutes=slot_duration_in_min) #Run the Algorithm slots = divideWorkingHours(start_time,end_time,interval) refined_slots = removeOverlaps(slots=slots,break_hour=break_hour) result = scheduleInterviews(attendees=attendees_dict, interviewers=interviewers_dict, rooms=rooms_dict, slots=refined_slots) print (result) ================================================ FILE: Twilio/DiskSpaceAnalysis.py ================================================ # Number of Computers: n = 4 # Space = [8,2,4,6] # x = 2, The segment length # The free disk space of computers in each of these segments is [8,2],[2,4] and [4,6] 
# The minimum of these three segments are 2, 2 and 4 - [2,2,4] # Maximum of these is 4 from typing import List import math def segment(x:int, space:List[int]) -> int: m,M = x-1,x segments_generated = [space[i*(M-m):i*(M-m)+M] for i in range(math.ceil((len(space)-m)/(M-m)))] return max([min(segment) for segment in segments_generated]) if __name__ == '__main__': num_computers = 7 space = [8,2,4,6,10,15,18] segment_length = 3 print (segment(x = segment_length, space=space)) ================================================ FILE: Twilio/README.md ================================================ ### Hackerrank Test - 2020 1. Find the area of largest square that can be found with all 1's present in a binary matrix of 1's and 0's? 2. Given a sentence, write a word break algorithm that breaks down sentence into number of lines (conditions apply). ### Question gleaned from online: [DiskSpaceAnalysis.py](https://github.com/absognety/Interview-Process-Coding-Questions/blob/master/Twilio/DiskSpaceAnalysis.py) ================================================ FILE: Uber/README.md ================================================ ## Questions on SQL: 1. Let's say there is a table called Completed_Trips. The fields present in the table are `rider_id`,`trip_id` and `trip_date`.Assume there are `X` riders in week 1 and Y riders in immediate following week 2. Retention is defined as ratio of common riders between `X` and `Y` who took atleast 1 trip and number of riders in week 1 who took atleast 1 trip i.e `X` **Output:** Plot the retentions for last n weeks -> n is an input here. **Solution:** ``` #n = number of weeks n = int(input()) #n = 4 or 5 or ..... i = 0 j = 1 #Assume df is the parent data frame which is completed_trips retentions = [] #we can use datetime module to compare dates. 
while (i <= n-1 and j <= n): df1 = df[(df['trip_date'] < curr_date - pd.Timedelta(days=7*i)) & (df['trip_date'] > curr_date - pd.Timedelta(days=7*j))] df2 = df[(df['trip_date'] < curr_date - pd.Timedelta(days=7*(i+1))) & (df['trip_date'] > curr_date - pd.Timedelta(days=7*(j+1)))] df1 = df1[df1['trip_id'].notnull()] df2 = df2[df2['trip_id'].notnull()] count_df1 = set(df1['rider_id'].values) #set gives distinct values count_df2 = set(df2['rider_id'].values) numerator = len(count_df1 & count_df2) #gives the common riders among all riders denominator = len(count_df2) retentions.append(numerator/denominator) i += 1 j += 1 #If n = 4 #0<=3 and 1<=4 #1<=3 and 2<=4 #2<=3 and 3<=4 #3<=3 and 4<=4 ----------> total 4 times the loop will repeat so we get 4 retentions. weeks_var = ["week" + "-" + str(i) for i in range(1,n+1)] final_data = pd.DataFrame({"Week":weeks_var,"Retention":retentions}, columns = ['Week','Retention']) #final_data is ready - we can do line charts, trend charts etc in it. ``` ## Logical Question (Managerial): 2. What are the 3 most important metrics you would use to evaluate people who report to you, and why? + Correctness of Solution Delivered + Quality of Solution Delivered + Adaptability of Solutions Delivered. You can form your own answers here. ## Process Flow Question (Architecture): 3. What are the steps involved in transferring a file of any format over FTP and placing it in a Hive location? + Step-1: Check the location and access + Step-2: Check the format of the file, since a plain-text Hive table expects CSV/delimited data. + Step-3: If it is not CSV or delimited, try to convert it into a structured data set. + Step-4: Transfer the file over SFTP from Python (pysftp) or with scp from a Linux shell. + Step-5: Once the file is in the respective folder, create a Hive table with the location inserted in the table definition, using a CSV serializer.
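
The retention calculation from question 1 can also be sketched without pandas. This is only an illustrative version under stated assumptions: the function name `weekly_retention`, the tuple layout `(rider_id, trip_id, trip_date)` and the half-open week window `(start, end]` are choices made here, not part of the original question.

```python
from datetime import date, timedelta


def weekly_retention(trips, curr_date, n):
    """Return n retention values, most recent week first.

    trips is an iterable of (rider_id, trip_id, trip_date) tuples
    (an assumed layout mirroring the Completed_Trips columns).
    Week k (k = 0 is the most recent) covers the half-open window
    (curr_date - 7*(k+1) days, curr_date - 7*k days].
    """
    def riders_in_week(k):
        end = curr_date - timedelta(days=7 * k)
        start = curr_date - timedelta(days=7 * (k + 1))
        # A set keeps each rider once, however many trips they took.
        return {rider for rider, trip_id, day in trips
                if trip_id is not None and start < day <= end}

    retentions = []
    for k in range(n):
        current = riders_in_week(k)
        previous = riders_in_week(k + 1)
        if previous:
            # Share of the previous week's riders who rode again this week.
            retentions.append(len(current & previous) / len(previous))
        else:
            # No riders in the previous week: retention is undefined.
            retentions.append(None)
    return retentions


sample_trips = [
    (1, "t1", date(2024, 1, 20)),
    (2, "t2", date(2024, 1, 19)),
    (1, "t3", date(2024, 1, 12)),
    (3, "t4", date(2024, 1, 11)),
    (3, "t5", date(2024, 1, 3)),
]
print(weekly_retention(sample_trips, date(2024, 1, 21), 2))  # prints [0.5, 1.0]
```

Returning `None` for a week with no earlier riders avoids a division by zero; whether such weeks should be skipped or plotted as gaps is left open here.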
================================================ FILE: Uber/getNewArray.py ================================================ """ Given an array of integers, return a new array such that each element at index i of the new array is the product of all the numbers in the original array except the one at i. For example, if our input was [1, 2, 3, 4, 5], the expected output would be [120, 60, 40, 30, 24]. If our input was [3, 2, 1], the expected output would be [2, 3, 6]. """ def productOfArray(arr): prod = 1 for a in arr: prod = prod * a return prod def newArray(arr): n = len(arr) new_arr = [None]*n for i in range(n): item = arr.pop(i) x = productOfArray(arr) new_arr[i] = x arr.insert(i,item) return new_arr if __name__ == '__main__': T = int(input()) for tcs in range(T): arr = list(map(int,input().strip().split())) print (newArray(arr)) ================================================ FILE: Ushur/README.md ================================================ # Personal Experience: ## First round - Hackerearth Test 2021 (Data Analytics Engineer): + Fairly simple coding question (added above) and more multiple choice questions on Machine learning (More than one option, single option etc). 
================================================ FILE: Ushur/findPairs.py ================================================ """ Given a list of numbers and each number represents a different color of a sock, Find the number of pairs that can be formed from given list of socks """ import collections def find_pairs(n,socks): num_pairs = 0 freqs = collections.Counter(socks) for s,c in freqs.items(): if c >= 2: num_pairs += (c//2) return num_pairs if __name__ == '__main__': N = 9 socks = [1,2,2,1,3,4,5,2,2] print (find_pairs(N, socks)) N = 10 socks = [1,1,2,1,3,4,5,2,2,0] print (find_pairs(N, socks)) ================================================ FILE: VMWare/Convert_Infix_To_Postfix.py ================================================ { #Initial Template for Python 3 import atexit import io import sys # This code is contributed by Nikhil Kumar Singh(nickzuck_007) _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) if __name__ == '__main__': test_cases = int(input()) for cases in range(test_cases) : exp = str(input()) print(InfixtoPostfix(exp)) } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Function Arguments : @param : exp (given infix expression) @return : string ''' def InfixtoPostfix(exp): #code here import string stack = [] prec = {'^':4,'*':3,'/':3,'+':2,'-':2,'(':1} postfixexp = [] tokens = list(exp) for token in tokens: if ((token in string.ascii_lowercase) | (token in "0123456789") | (token in string.ascii_uppercase)): postfixexp.append(token) elif token == '(': stack.append(token) elif token == ')': if len(stack)!=0: topToken = stack.pop() while topToken != '(': postfixexp.append(topToken) topToken = stack.pop() else: if len(stack) != 0: while (len(stack)!=0) and (prec[stack[-1]] >= prec[token]): 
postfixexp.append(stack.pop()) stack.append(token) while len(stack)!=0: postfixexp.append(stack.pop()) return "".join(postfixexp) ================================================ FILE: VMWare/maxIndexDiffOfArray.py ================================================ { #Initial Template for Python 3 import math def main(): T=int(input()) while(T>0): n=int(input()) arr=[int(x) for x in input().strip().split()] print(maxIndexDiff(arr,n)) T-=1 if __name__ == "__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 #Complete this function def maxIndexDiff(arr, n): ##Your code here maxxDiff = 0 for i in range(n): for j in range(i+1,n): if arr[i]<=arr[j]: if maxxDiff < j - i: maxxDiff = j - i return maxxDiff ================================================ FILE: VMWare/mergeKSortedLinkedLists.py ================================================ { #Initial Template for Python 3 #Contributed by : Nagendra Jha import atexit import io import sys _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None # Linked List Class class LinkedList: def __init__(self): self.head = None # creates a new node with given value and appends it at the end of the linked list def append(self, new_value): new_node = Node(new_value) if self.head is None: self.head = new_node return curr_node = self.head while curr_node.next is not None: curr_node = curr_node.next curr_node.next = new_node # prints the elements of linked list starting with head def printList(head): if head is None: print(' ') return curr_node = head while curr_node: print(curr_node.data,end=" ") curr_node=curr_node.next print(' ') if __name__ == '__main__': t=int(input()) for 
cases in range(t): n = int(input()) a= [] for i in range(n): a.append(LinkedList()) list_info = list(map(int,input().strip().split())) curr_ind = 0 curr_list_ind = 0 while curr_ind < len(list_info): nodes= list_info[curr_ind] curr_ind+=1 for i in range(nodes): a[curr_list_ind].append(list_info[curr_ind]) curr_ind += 1 curr_list_ind += 1 heads = [] for i in range(n): heads.append(a[i].head) printList(merge(heads,n)) } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Your task is to merge the given k sorted linked lists into one list and return the head of the new formed linked list. Function Arguments: array "heads" (containing heads of linked lists), n size of array a. Return Type: head node.; { # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None } ''' def merge(heads,n): #code here newll = LinkedList() new_list = [] for h in heads: curr_node = h while curr_node != None: new_list.append(curr_node.data) curr_node = curr_node.next new_list = sorted(new_list) for ele in new_list: newll.append(ele) return newll.head ================================================ FILE: Vimeo/README.md ================================================ # Senior Data Engineer: Interview Process For Vimeo - 2024: ## First Round (Format): + Coding Test with Interviewers present to observe, on Qualified Platform (https://www.qualified.io/). + SQL section (5 Questions) and Python section (5 Questions), 25 minutes for each section. There is a hard stop when 25 mins is over for each section. + Python programming section covers questions from Arrays, Linkedlists and Trees and Tree Traversals (Medium Level). + SQL questions involve Advanced SELECT and JOINS. Questions are subject to change with each test, but the format is likely to remain same, but who knows? 
:blush: :grin: ================================================ FILE: Visa/addNode_DoublyLinkedList.py ================================================ { class Node: def __init__(self, data): self.data = data self.next = None self.prev = None class DoublyLinkedList: def __init__(self): self.head = None def append(self, new_data): new_node = Node(new_data) new_node.next = None if self.head is None: new_node.prev = None self.head = new_node return last = self.head while(last.next is not None): last = last.next last.next = new_node new_node.prev = last return def printList(self, node): while(node.next is not None): node = node.next while node.prev is not None: node = node.prev while(node is not None): print(node.data, end=" ") node = node.next print() if __name__=='__main__': t = int(input()) for i in range(t): n = int(input()) arr = map(int, input().strip().split()) llist = DoublyLinkedList() for e in arr: llist.append(e) pos,data = map(int, input().strip().split()) addNode(llist.head, pos, data) llist.printList(llist.head) # Contributed by: Harshit Sidhwa } ''' This is a function problem.You only need to complete the function given below ''' # Your task is to complete this function # function should add a new node after the pth position # function shouldn't print or return any data ''' class Node: def __init__(self, data): self.data = data self.next = None self.prev = None ''' def addNode(head, p, data): # Code here temp = Node(data) if head == None: head = temp return curr_node = head C = 0 while C <= p: if C == p: temp.next = curr_node.next curr_node.next = temp if temp.next is not None: temp.next.prev = temp temp.prev = curr_node curr_node = curr_node.next C += 1 return ================================================ FILE: Visa/populateList.py ================================================ """ Populate the list according to following fashion Given a list of elements [a0, a1, a2, ......, an-1] of length n Return a list of length n such that it has the following 
traversal Input: [a0, a1, a2, ......, an-1] Output: [a0, an-1, a1, an-2, a2, an-3,........] """ def solution(numbers): size_of_arr = len(numbers) result = list() for ind in range(size_of_arr): if len(result) == size_of_arr: break result.append(numbers[ind]) if (ind != size_of_arr - ind - 1): result.append(numbers[size_of_arr - ind - 1]) return result if __name__ == '__main__': input1 = [2, 5, -10, -4, 0, 8] print (solution(input1)) input2 = [1, 20, 3, 8, 5] print (solution(input2)) input3 = [10, -1, 0, 8, 9, -1, 5, -5, 6, 9, 20, 10] print (solution(input3)) ================================================ FILE: Visa/removeDuplicatesSortedLinkedList.py ================================================ { #Initial Template for Python 3 #Contributed by : Nagendra Jha import atexit import io import sys _INPUT_LINES = sys.stdin.read().splitlines() input = iter(_INPUT_LINES).__next__ _OUTPUT_BUFFER = io.StringIO() sys.stdout = _OUTPUT_BUFFER @atexit.register def write(): sys.__stdout__.write(_OUTPUT_BUFFER.getvalue()) # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None # Linked List Class class LinkedList: def __init__(self): self.head = None # creates a new node with given value and appends it at the end of the linked list def append(self, new_value): new_node = Node(new_value) if self.head is None: self.head = new_node return curr_node = self.head while curr_node.next is not None: curr_node = curr_node.next curr_node.next = new_node # prints the elements of linked list starting with head def printList(self): if self.head is None: print(' ') return curr_node = self.head while curr_node: print(curr_node.data,end=" ") curr_node=curr_node.next print(' ') if __name__ == '__main__': t=int(input()) for cases in range(t): n = int(input()) a = LinkedList() # create a new linked list 'a'. 
nodes = list(map(int, input().strip().split())) for x in nodes: a.append(x) removeDuplicates(a.head) a.printList() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Your task is to remove duplicates from given sorted linked list. Function Arguments: head (head of the given linked list) Return Type: none, just remove the duplicates from the list. { # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None } ''' def removeDuplicates(head): #code here global a curr_node = head duplicateTracker = [] while curr_node != None: if curr_node.data not in duplicateTracker: duplicateTracker.append(curr_node.data) curr_node = curr_node.next a = LinkedList() for x in duplicateTracker: a.append(x) ================================================ FILE: Walmart/GroupAnagrams.py ================================================ ## Data Engineering Interview at Walmart # Find the words from the given list and group them with the words which are # formed by different arrangements of the same letters(anagram) # Input = ['mat','tam','cute','beat','eatb','teab','ateb'] # Output = [[“mat”, “tam”],[“cute”],[“beat”,”eatb”,”teab”,”ateb”]] # Input = ['lump', 'eat', 'me', 'tea', 'em', 'plum'] # Without using Advanced Data structures and libraries from typing import List,Tuple from collections import defaultdict def group_anagrams(words:List[str]) -> List[Tuple[str]]: groups = [] for i in range(len(words)): collect_groups=[] for j in range(i+1,len(words)): if sorted(words[i]) == sorted(words[j]): if words[i] != words[j]: collect_groups.append(words[i]) collect_groups.append(words[j]) else: continue if len(collect_groups) > 0: groups.append(tuple(set(collect_groups))) else: groups.append((words[i],)) result = [] for i in range(len(groups)): checker = [] for j in range(len(groups)): if i != j: if set(groups[i]).issubset(set(groups[j])): checker.append(1) else: 
checker.append(0) else: continue if (len(set(checker)) == 1) and (0 in checker): result.append(groups[i]) return result def group_anagrams_1(words:List[str]) -> List[Tuple[str]]: temp = defaultdict(list) for ele in words: temp[str(sorted(ele))].append(ele) print (temp) return list(temp.values()) if __name__ == '__main__': # words = list(input().strip().split(",")) # print (group_anagrams(words=words)) print (group_anagrams_1(words=['lump', 'eat', 'me', 'tea', 'em', 'plum'])) print (group_anagrams(words=['mat','tam','cute','beat','eatb','teab','ateb'])) print (group_anagrams_1(words=['mat','tam','cute','beat','eatb','teab','ateb'])) ================================================ FILE: Walmart/minimumCoins.py ================================================ # Given a value V, if we want to make a change for V cents, and we have a finite supply of # each of C = { C1, C2, .., Cm} valued coins, what is the minimum number of coins to make the change? # If it’s not possible to make a change, print -1. 
# Examples: # finite supply is just coin values provided in the list, cannot use more than those # Input: coins[] = {25, 10, 5}, V = 30 # Output: Minimum 2 coins required We can use one coin of 25 cents and one of 5 cents # Input: coins[] = {9, 6, 5, 1}, V = 11 # Output: Minimum 2 coins required We can use one coin of 6 cents and 1 coin of 5 cents from typing import List import itertools def minCoins(coins:List[int],value:int) -> int: if value == 0: return 0 maxlength = len(coins) for l in range(1,maxlength+1,1): combinations = list(itertools.combinations(coins,l)) for c in combinations: if sum(c)==value: return f"Minimum {l} coins required, we can use coins of {(','.join([str(i) for i in c]))} cents" else: continue return -1 if __name__ == '__main__': coins = list(map(int,input().strip().split(' '))) value = int(input()) print (minCoins(coins,value)) ================================================ FILE: Walmart/minimumCoinsRecursive.py ================================================ # Given a value V, if we want to make a change for V cents, and we have an infinite supply of # each of C = { C1, C2, .., Cm} valued coins, what is the minimum number of coins to make the change? # If it’s not possible to make a change, print -1.
# Examples: # Input: coins[] = {25, 10, 5}, V = 30 # Output: Minimum 2 coins required We can use one coin of 25 cents and one of 5 cents # Input: coins[] = {9, 6, 5, 1}, V = 11 # Output: Minimum 2 coins required We can use one coin of 6 cents and 1 coin of 5 cents from typing import List import sys def minimumCoinsRecursive(coins:List[int],value:int) -> int: if value == 0: return 0 coins_l = len(coins) res = sys.maxsize for i in range(coins_l): if coins[i] <= value: sub_res = minimumCoinsRecursive(coins,value-coins[i]) if ((sub_res != sys.maxsize) & (sub_res + 1 < res)): res = sub_res + 1 return res if __name__ == '__main__': coins = list(map(int,input().strip().split(' '))) print (coins) value = int(input()) print (value) print (minimumCoinsRecursive(coins,value)) ================================================ FILE: Yahoo/ThreeWayPartition.py ================================================ { # Driver Program from collections import Counter if __name__=='__main__': t = int(input()) for i in range(t): n = int(input()) arr = list(map(int, input().strip().split())) brr = Counter(arr) a,b = list(map(int, input().strip().split())) res = threeWayPartition(arr, n, a, b) k1 = k2 = k3 = 0 for e in arr: if e > a: k3+=1 elif e<=a and e>=b: k2+=1 elif e=b: m2+=1 for e in range(k1+k2, k1+k2+k3): if res[e]>=a: m3+=1 flag = False if k1==m1 and k2==m2 and k3==m3: flag = True for e in range(len(res)): brr[res[e]]-=1 for e in range(len(res)): if brr[res[e]]!=0: flag = False if flag: print(1) else: print(0) # Contributed by: Harshit Sidhwa } ''' This is a function problem.You only need to complete the function given below ''' # Your task is to complete this function # function should a list containing the required order of the elements def threeWayPartition(arr, n, a, b): # Code here lessa = [] atob = [] greatb = [] for i in range(n): if arr[i] < a: lessa.append(arr[i]) elif arr[i] >= a and arr[i] <= b: atob.append(arr[i]) elif arr[i] > b: greatb.append(arr[i]) farr = lessa + atob 
+ greatb return farr ================================================ FILE: ZSAssociates/CrossSequence.py ================================================ """ Given an array A, Integer K Find the Kth Smallest element of array which is generated from calculating the absolute differences of the cartesian product of the array elements Example: A = [4,2,1] K = 5 N = size(A) = 3 S = [|4-4|,|4-2|,|4-1|,|2-4|,|2-2|,|2-1|,|1-4|,|1-2|,|1-1|] S = [0,2,3,2,0,1,3,1,0] S = [0,0,0,1,1,2,2,3,3] (sorted) 5th Smallest element (assuming 1-indexing) is 1 """ import itertools def solve(N,A,K): cartesian_product = itertools.product(A,A) abs_diffs = [abs(i-j) for i,j in cartesian_product] return sorted(abs_diffs)[K-1] if __name__ == '__main__': A = [4,2,1] N = len(A) K = 5 print (solve(N,A,K)) ================================================ FILE: ZSAssociates/OutputOfProgram.py ================================================ import pandas as pd from pyspark import SparkContext from pyspark.sql.functions import col,struct,pandas_udf,PandasUDFType from pyspark.sql import SQLContext sc = SparkContext("local","test") x = pd.Series([1,2,3]) pdf = pd.DataFrame([1,2,3],columns = ['x']) df = SQLContext(sc).createDataFrame(pdf) @pandas_udf("long",PandasUDFType.SCALAR_ITER) def plus_one(batch_iter): for x in batch_iter: yield x + 1 @pandas_udf("long",PandasUDFType.SCALAR_ITER) def multiply_two_cols(batch_iter): for a,b in batch_iter: yield a * b df.select(multiply_two_cols(col("x"),col("x"))).show() ================================================ FILE: ZSAssociates/OutputOfProgram2.py ================================================ def MainProgram(iterable, x): sample = tuple(iterable) n = len(sample) if not n and x: return indices = [1] * x yield tuple(sample[i] for i in indices) while True: for i in reversed(range(x)): if indices[i] != n-1: break else: return indices[i:] = [indices[i] + 1] * (x-i) yield tuple(sample[i] for i in indices) a = MainProgram('PYTHON', 3) print (next(a)) def
MainProg(f): m = {} def InnerProg(num): if num not in m: m[num] = f(num) return m[num] return InnerProg @MainProg def Call(num): if num == 0: return 1 else: return num**2*Call(num-1) print (Call(3)) ================================================ FILE: Zoho/merge2SortedLinkedLists.py ================================================ { #Initial Template for Python 3 # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None # Linked List Class class LinkedList: def __init__(self): self.head = None # creates a new node with given value and appends it at the end of the linked list def append(self, new_value): new_node = Node(new_value) if self.head is None: self.head = new_node return curr_node = self.head while curr_node.next is not None: curr_node = curr_node.next curr_node.next = new_node # prints the elements of linked list starting with head def printList(self): if self.head is None: print(' ') return curr_node = self.head while curr_node: print(curr_node.data,end=" ") curr_node=curr_node.next print(' ') if __name__ == '__main__': t=int(input()) for cases in range(t): n,m = map(int, input().strip().split()) a = LinkedList() # create a new linked list 'a'. b = LinkedList() # create a new linked list 'b'. nodes_a = list(map(int, input().strip().split())) nodes_b = list(map(int, input().strip().split())) for x in nodes_a: a.append(x) for x in nodes_b: b.append(x) a.head = merge(a.head,b.head) a.printList() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ''' Function to merge two sorted lists in one using constant space. Function Arguments: head_a and head_b (head reference of both the sorted lists) Return Type: head of the obtained list after merger. 
{ # Node Class class Node: def __init__(self, data): # data -> value stored in node self.data = data self.next = None } Contributed By: Nagendra Jha ''' def merge(head_a,head_b): #code here global a elements = [] curr_node = head_a while curr_node != None: elements.append(curr_node.data) curr_node = curr_node.next curr_node = head_b while curr_node != None: elements.append(curr_node.data) curr_node = curr_node.next elements = sorted(elements) a = LinkedList() for i in elements: a.append(i) return a.head ================================================ FILE: Zoho/rearrangeArrayAlternately.py ================================================ { #Initial Template for Python 3 import math def main(): T=int(input()) while(T>0): n=int(input()) arr=[int(x) for x in input().strip().split()] arre = rearrange(arr,n) for i in arre: print(i,end=" ") print() T-=1 if __name__ == "__main__": main() } ''' This is a function problem.You only need to complete the function given below ''' #User function Template for python3 ##Complete this function def rearrange(arr, n): ##Your code here if n%2 == 0: for i in range(n//2): arr.append(arr[n-1-i]) arr.append(arr[i]) else: for i in range(n//2): arr.append(arr[n-1-i]) arr.append(arr[i]) arr.append(arr[math.floor(n/2)]) arr[:] = arr[n:] return arr ================================================ FILE: Zycus/README.md ================================================ # Personal Experience (Interview Process for AI/Machine Learning Engineer - 2020): ### First screening round: + Interview with the Director of AI teams: Case study: Dataset is as below: | itemId | item_name | item_description | manufacturer_name | supplier_name | target_variable(home/workplace) | |--------|-----------|-----------------------|-------------------|---------------|---------------------------------| | 1000 | monitor | ram: 32gb,2.8 GHZ,HDD | HP | senty pvt ltd | workplace | | 1001 | | | | | | | 1003 | | | | | | Given only sample size of 10 of above schema (with target 
variable assigned), training dataset (without target_variable assigned) size is 100, how would you avoid the effort of manually tagging the target variable on the training data? #### Follow up questions: + How would you select the variables? + How would you use the open-text column `item_description`? The round lasted just 10 minutes, and I didn't get selected. #### My solution: + k-NN/decision tree algorithm for dependent variable tagging (even association rule mining algorithms such as Apriori and Eclat). + Initial selection of variables is based on domain knowledge, as the number of variables is very small here. + Preprocess the text column `item_description`: possibly summarize the full text, extract keywords using bi-grams and collocations, and use those keywords as tags in further processing. ## Use https://www.tablesgenerator.com/markdown_tables# for quick table generation in markdown. It's elegant and simple. ================================================ FILE: cimpress/README.md ================================================ Interview Process for Data Engineer - 2021 (Personal Experience): ================================================================= ### 1. First Round - Introductory Round: + Introduction to yourself, about the company. + Project experience, tech stacks you are comfortable with, basic programming questions. + Relevance of your background to the role. + Details of the interview process and any other questions you or the interviewer may have. ### 2. Online Coding Test on Codility: + 1 question on SQL and 1 question on programming. + Programming question - Given a string composed only of the letters a and b, find the number of ways to split it into 3 substrings such that all 3 substrings have the same number of 'a' characters. + SQL question: Please find the question below - ![image](https://user-images.githubusercontent.com/25507554/120599363-0a640480-c465-11eb-8b40-bb91a3c8bc3c.png) ### 3.
Coding Round (Programming test - Data transformations using Apache Spark Live): + Given 2 data questions - Transform data using Apache Spark transformations live, execute them and present the results - an environment for executing Spark programs is provided. ### 4. Design Round: + A document is provided which describes the problem statement, the bottlenecks in the existing architecture and the requirement - Design a scalable Machine learning system that automates the e2e ML workflows and helps analysts and stakeholders consume its outputs seamlessly - The architectural choices and technology stack you pick need to be explained clearly, along with the reasons behind each choice - explaining alternative approaches as well is a plus. ### 5. Problem Solving Round: + This round is not a technical interview but tests your problem-solving skills - A real-world situation is given along with some helpers; come up with the steps to mitigate the problem and the measures to take to solve it. ### 6. Awesomeness interview: + Discussions and questions about working with a team and the challenges faced - questions on how you handled them in the past. ### 7. Behavioral Interview: + Working under a stressful situation. + How is a conflict with a co-worker resolved? + How do you keep yourself updated and enthusiastic about the latest tech trends? and other questions.
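
For the Codility programming question in round 2 (splitting a string of a's and b's into 3 substrings with equal numbers of 'a'), one possible approach is a counting argument over the positions of the a's. The sketch below is not the solution submitted in the interview; the function name `count_equal_a_splits` is hypothetical, and it assumes all three substrings must be non-empty.

```python
def count_equal_a_splits(s):
    """Count the ways to cut s into 3 non-empty parts with equal numbers of 'a'."""
    n = len(s)
    positions = [i for i, ch in enumerate(s) if ch == "a"]
    total = len(positions)

    # The a's cannot be shared equally unless their count is divisible by 3.
    if total % 3 != 0:
        return 0

    # No 'a' at all: any two distinct cut points among the n-1 gaps work.
    if total == 0:
        return (n - 1) * (n - 2) // 2 if n >= 3 else 0

    k = total // 3
    # The first cut may fall anywhere between the k-th and (k+1)-th 'a'.
    first_gap = positions[k] - positions[k - 1]
    # The second cut may fall anywhere between the 2k-th and (2k+1)-th 'a'.
    second_gap = positions[2 * k] - positions[2 * k - 1]
    return first_gap * second_gap


print(count_equal_a_splits("babaa"))  # prints 2: "ba|ba|a" and "bab|a|a"
```

This runs in linear time because it only scans the string once; a brute-force check over every pair of cut points would be quadratic.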