[
  {
    "path": ".github/workflows/copy-to-documenation-branch.yml",
    "content": "name: Copy from master to documentation branch\n\n# Controls when the action will run.\non:\n  # Triggers the workflow on push request events but only for the master branch\n  push:\n    branches: [master]\n\n  # Allows you to run this workflow manually from the Actions tab\n  workflow_dispatch:\n\njobs:\n  copy-images:\n    runs-on: ubuntu-latest\n    steps:\n      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it\n      - uses: actions/checkout@v2\n\n      - name: Copy Images\n        uses: andstor/copycat-action@v3\n        with:\n          personal_token: ${{ secrets.ACTION_TOKEN  }}\n          src_branch: master\n          src_path: /images/.\n          dst_owner: andkret\n          dst_repo_name: Cookbook\n          dst_path: /static/images/\n          dst_branch: documentation\n          clean: true\n          commit_message: \"Images copied from master to documentation branch!\"\n\n  copy-sections:\n    runs-on: ubuntu-latest\n\n    steps:\n      - uses: actions/checkout@v2\n\n      - name: Copy Markdowns\n        uses: andstor/copycat-action@v3\n        with:\n          personal_token: ${{ secrets.ACTION_TOKEN  }}\n          src_branch: master\n          src_path: /sections/.\n          dst_owner: andkret\n          dst_repo_name: Cookbook\n          dst_path: /docs/\n          dst_branch: documentation\n          clean: true\n          commit_message: \"Sections copied from master to documentation branch!\"\n    \n  # copy-readme:\n  #   runs-on: ubuntu-latest\n\n  #   steps:\n  #     - uses: actions/checkout@v2\n  #     - name: Copy Markdowns\n  #       uses: andstor/copycat-action@v3\n  #       with:\n  #         personal_token: ${{ secrets.ACTION_TOKEN  }}\n  #         src_branch: master\n  #         src_path: README.md\n  #         dst_owner: andkret\n  #         dst_repo_name: Cookbook\n  #         dst_path: /docs/00-TableOfContents.md\n  #         dst_branch: documentation\n  #         clean: false\n  #  
       commit_message: \"Readme copied from master to documentation branch!\"\n# copy-readme:\n#   runs-on: ubuntu-latest\n#   steps:\n#     - uses: actions/checkout@v2\n#     - name: Copy Markdowns\n#       uses: andstor/copycat-action@v3\n#       with:\n#         personal_token: ${{ secrets.PERSONAL_TOKEN  }}\n#         src_branch: master\n#         src_path: /README.md\n#         dst_owner: andkret\n#         dst_repo_name: Cookbook\n#         dst_path: /docs/\n#         dst_branch: documentation\n#         commit_message: \"README.md copied from master to documentation branch!\"\n"
  },
  {
    "path": ".github/workflows/linkchecker.yml",
    "content": "#on:\n#  schedule:\n#    - cron: '0 9 * * 1'\n#  workflow_dispatch:\n\n#jobs:\n#  linkChecker:\n#    runs-on: ubuntu-latest\n#    steps:\n#      - name: update setuptools\n#        run: |\n#          python3 -m pip install --upgrade pip setuptools wheel\n#      - uses: actions/checkout@v2\n#      - name: Link Checker\n#        uses: lycheeverse/lychee-action@master\n#        with:\n#          args: --verbose --no-progress --accept 200,204,206,406,429,999 --include-mail ./sections/*.md\n#      - name: Create Issue From File\n#        uses: peter-evans/create-issue-from-file@v5\n#        with:\n#          title: Link Checker Report\n#          content-filepath: ./lychee/out.md\n#          labels: report, automated issue\n"
  },
  {
    "path": ".gitignore",
    "content": "# Ignore build artefacts \n*.aux\n*.log\n*.lof\n*.lot\n*.toc\n*.out\n*.synctex.gz\n\nnode_modules/*"
  },
  {
    "path": "Code Examples/#102 Spark Week Day 3.txt",
    "content": "//Read in the textfile\nval input = sc.textFile(\"/notebook/Movies.txt\")\n\ncase class MovieLine(Line: String)\n\nval movieline = input.map(line => MovieLine(line))\n\nmovieline.toDF().registerTempTable(\"MovieLine\")\n\n\n// Lets map the date and the genre\ncase class DateAndGenre(myDate: String, Genre: String)\n\nval dateandgenre = input.map(line => line.split(\";\")).map(s => DateAndGenre( s(0),s(3) ))\n\ndateandgenre.toDF().registerTempTable(\"DateAndGenre\")\n\n// count how many movies per year\ncase class MovieDate(Line: String, myCount: Int)\n\nval countdate = input.map(line => line.split(\";\")).map(s => (s(0),1))\ncountdate.toDF().registerTempTable(\"countdate\")\n\nval reduceddate = countdate.reduceByKey((a,b) => a + b).map(s => MovieDate(s._1,s._2))\n\nreduceddate.toDF().registerTempTable(\"MovieDate\")\n\n//flatten every word into a new line in the RDD\nval flatmappedinput = input.flatMap(line => line.split(\";\") )\nflatmappedinput.toDF().registerTempTable(\"flatinput\")\n\n// read input directly to dataframe\nval inputasdf = spark.read.format(\"csv\").option(\"header\", \"true\").option(\"delimiter\", \";\").load(\"/notebook/Movies.txt\")\ninputasdf.registerTempTable(\"inputdf\")\n\n/* //Use this to store the dataframe as parquet on the local drive\nval reduceddf = reduceddate.toDF()\nreduceddf.write.parquet(\"/notebook/movie.parquet\")\n*/\n\n//read the parquetfile\nval parquetFileDF = spark.read.parquet(\"/notebook/movie.parquet\")\nparquetFileDF.registerTempTable(\"ParquetRead\")\n\n\n//SparkSQL Queries:\n\n//Visualize the raw RDD\n%sql select * from MovieLine\n\n//Visualize the map reduced RDD with count of movies per year\n%sql select Line, myCount from MovieDate order by myCount desc\n\n//Visualize the maped RDD and count the nr. of movies per year in SparkSQL\n%sql select myDate, count(myDate) as counted from DateAndGenre group by myDate order by counted desc\n\n%sql select * from flatinput\n\n%sql select * from ParquetRead\n"
  },
  {
    "path": "Code Examples/GenAI-RAG/conversations.json",
    "content": "[\n  {\n    \"conversation_id\": 456,\n    \"customer_name\": \"Alice Brown\",\n    \"agent_name\": \"Emily Johnson\",\n    \"policy_number\": \"ABC5678\",\n    \"conversation\": \"Customer: Hi, my name is Alice Brown. Date of Birth is September 20th, 1980, Address is 456 Oak St, Springfield, IL 62701, and my Policy Number is XYZ9876543.\\nAgent: Good afternoon, Alice. How may I assist you today?\\nCustomer: Hello, Emily. I have a question regarding my coverage.\\nCustomer: My kitchen caught fire, and I'm concerned about the damages.\\nAgent: I'm sorry to hear that, Alice. Let me review your policy for fire damage coverage.\\nAgent: It appears that fire damage is covered under your policy. We'll assist you with the claim process.\\nCustomer: Thank you, Emily. I appreciate your help during this stressful time.\\nAgent: You're welcome, Alice. We're here to support you. Please don't hesitate to reach out if you need further assistance.\\nCustomer: I'll keep that in mind. Have a great day!\\nAgent: You too, Alice. Take care.\",\n    \"summary\": \"A customer inquires about policy coverage after a kitchen fire, expressing concern, and the agent confirms coverage and offers assistance, providing support and reassurance throughout the conversation.\"\n  },\n  {\n    \"conversation_id\": 789,\n    \"customer_name\": \"David Johnson\",\n    \"agent_name\": \"Sarah Wilson\",\n    \"policy_number\": \"LMN9012\",\n    \"conversation\": \"Customer: Good morning, I'm David Johnson. My Date of Birth is May 5th, 1975, Address is 789 Maple Ave, Seattle, WA 98101, and my Policy Number is PQR3456789.\\nAgent: Good morning, David. How can I assist you today?\\nCustomer: Hi, Sarah. I'm concerned about my home insurance coverage.\\nCustomer: A pipe burst in my basement, and there's significant water damage.\\nAgent: I'm sorry to hear that, David. 
Let me check your policy for coverage related to water damage.\\nAgent: It seems that water damage from burst pipes is covered under your policy.\\nCustomer: That's a relief. I'll need to file a claim as soon as possible.\\nAgent: We'll assist you with the claim process, David. Is there anything else I can help you with?\\nCustomer: No, that's all for now. Thank you for your assistance, Sarah.\\nAgent: You're welcome, David. Please feel free to reach out if you have any further questions or concerns.\\nCustomer: I will. Have a great day!\\nAgent: You too, David. Take care.\",\n    \"summary\": \"A customer expresses concern about home insurance coverage due to water damage from a burst pipe, and the agent confirms coverage, offering assistance with the claim process, resulting in relief and gratitude expressed by the customer.\"\n  },\n  {\n    \"conversation_id\": 101,\n    \"customer_name\": \"Emily Green\",\n    \"agent_name\": \"Jack Smith\",\n    \"policy_number\": \"DEF4567\",\n    \"conversation\": \"Customer: Hi there, I'm Emily Green. My Date of Birth is April 10th, 1988, Address is 101 Pine St, Boston, MA 02101, and my Policy Number is DEF4567.\\nAgent: Hello, Emily. How can I assist you today?\\nCustomer: Hi, Jack. I have a question about my policy.\\nCustomer: A window in my living room shattered during a storm. Is this covered?\\nAgent: Let me check your policy for coverage related to storm damage.\\nAgent: Unfortunately, damage to windows from storms is not covered under your policy.\\nCustomer: Oh, that's disappointing. Is there any way to add coverage for this?\\nAgent: Yes, we offer endorsements for specific perils like storm damage to windows. I can provide you with more information on that.\\nCustomer: Please do. I want to ensure I'm protected in case this happens again.\\nAgent: I'll send you an email with details on our endorsement options. Feel free to reach out if you have any further questions.\\nCustomer: Thank you, Jack. 
I appreciate your help.\\nAgent: You're welcome, Emily. Have a great day!\",\n    \"summary\": \"A customer inquires about coverage for a shattered window after a storm, but it's not covered under the policy. The agent suggests adding endorsements for specific perils like storm damage to windows, providing further information and assistance, resulting in the customer's appreciation.\"\n  },\n  {\n    \"conversation_id\": 102,\n    \"customer_name\": \"Michael White\",\n    \"agent_name\": \"Sarah Johnson\",\n    \"policy_number\": \"GHI7890\",\n    \"conversation\": \"Customer: Good afternoon, I'm Michael White. My Date of Birth is February 25th, 1970, Address is 202 Elm St, Chicago, IL 60601, and my Policy Number is GHI7890.\\nAgent: Good afternoon, Michael. How may I assist you today?\\nCustomer: Hi, Sarah. I have a question about my policy coverage.\\nCustomer: My roof has started leaking after heavy rainfall. Will my insurance cover repairs?\\nAgent: Let me review your policy for coverage related to roof leaks.\\nAgent: Roof leaks due to rain are typically covered under your policy.\\nCustomer: That's a relief. I'll need to schedule repairs as soon as possible.\\nAgent: We'll assist you with the claim process, Michael. Is there anything else I can help you with?\\nCustomer: No, that's all for now. Thank you for your assistance, Sarah.\\nAgent: You're welcome, Michael. Please feel free to reach out if you have any further questions or concerns.\\nCustomer: I will. Have a great day!\\nAgent: You too, Michael. Take care.\",\n    \"summary\": \"A customer seeks clarification on policy coverage for a leaking roof after heavy rainfall, and the agent confirms that such damages are typically covered under the policy. 
The agent offers assistance with the claim process, resulting in the customer expressing relief and gratitude.\"\n  },\n  {\n    \"conversation_id\": 103,\n    \"customer_name\": \"Sophia Jones\",\n    \"agent_name\": \"Emily Wilson\",\n    \"policy_number\": \"JKL0123\",\n    \"conversation\": \"Customer: Hi, I'm Sophia Jones. My Date of Birth is November 15th, 1985, Address is 303 Cedar St, Miami, FL 33101, and my Policy Number is JKL0123.\\nAgent: Hello, Sophia. How may I assist you today?\\nCustomer: Hello, Emily. I have a question about my policy.\\nCustomer: There's been a break-in at my home, and some valuable items are missing. Are they covered?\\nAgent: Let me check your policy for coverage related to theft.\\nAgent: Yes, theft of personal belongings is covered under your policy.\\nCustomer: That's a relief. I'll need to file a claim for the stolen items.\\nAgent: We'll assist you with the claim process, Sophia. Is there anything else I can help you with?\\nCustomer: No, that's all for now. Thank you for your assistance, Emily.\\nAgent: You're welcome, Sophia. Please feel free to reach out if you have any further questions or concerns.\\nCustomer: I will. Have a great day!\\nAgent: You too, Sophia. Take care.\",\n    \"summary\": \"A customer inquires about coverage for stolen items after a break-in at home, and the agent confirms that theft of personal belongings is covered under the policy. The agent offers assistance with the claim process, resulting in the customer expressing relief and gratitude.\"\n  },\n  {\n    \"conversation_id\": 104,\n    \"customer_name\": \"Ethan Wilson\",\n    \"agent_name\": \"Jack Brown\",\n    \"policy_number\": \"MNO3456\",\n    \"conversation\": \"Customer: Hello, I'm Ethan Wilson. My Date of Birth is July 5th, 1995, Address is 404 Oak St, Los Angeles, CA 90001, and my Policy Number is MNO3456.\\nAgent: Good morning, Ethan. How may I assist you today?\\nCustomer: Hi, Jack. 
I have a question regarding my policy.\\nCustomer: My garage door was damaged in a storm. Is this covered?\\nAgent: Let me review your policy for coverage related to storm damage.\\nAgent: Yes, damage to the garage door from storms is covered under your policy.\\nCustomer: That's a relief. I'll need to schedule repairs as soon as possible.\\nAgent: We'll assist you with the claim process, Ethan. Is there anything else I can help you with?\\nCustomer: No, that's all for now. Thank you for your assistance, Jack.\\nAgent: You're welcome, Ethan. Please feel free to reach out if you have any further questions or concerns.\\nCustomer: I will. Have a great day!\\nAgent: You too, Ethan. Take care.\",\n    \"summary\": \"A customer inquires about coverage for a damaged garage door after a storm, and the agent confirms that such damages are covered under the policy. The agent offers assistance with the claim process, resulting in the customer expressing relief and gratitude.\"\n  },\n  {\n    \"conversation_id\": 105,\n    \"customer_name\": \"Olivia Taylor\",\n    \"agent_name\": \"Sarah Smith\",\n    \"policy_number\": \"PQR7890\",\n    \"conversation\": \"Customer: Hi there, I'm Olivia Taylor. My Date of Birth is December 30th, 1990, Address is 505 Pine St, San Francisco, CA 94101, and my Policy Number is PQR7890.\\nAgent: Good afternoon, Olivia. How may I assist you today?\\nCustomer: Hi, Sarah. I have a question regarding my policy.\\nCustomer: A tree in my backyard has fallen and damaged my fence. Will my insurance cover repairs?\\nAgent: Let me check your policy for coverage related to fallen trees.\\nAgent: Yes, damage to the fence from fallen trees is covered under your policy.\\nCustomer: That's a relief. I'll need to schedule repairs as soon as possible.\\nAgent: We'll assist you with the claim process, Olivia. Is there anything else I can help you with?\\nCustomer: No, that's all for now. Thank you for your assistance, Sarah.\\nAgent: You're welcome, Olivia. 
Please feel free to reach out if you have any further questions or concerns.\\nCustomer: I will. Have a great day!\\nAgent: You too, Olivia. Take care.\",\n    \"summary\": \"A customer inquires about coverage for a damaged fence due to a fallen tree, and the agent confirms that such damages are covered under the policy. The agent offers assistance with the claim process, resulting in the customer expressing relief and gratitude.\"\n  },\n  {\n    \"conversation_id\": 106,\n    \"customer_name\": \"William Anderson\",\n    \"agent_name\": \"Jack Johnson\",\n    \"policy_number\": \"STU2345\",\n    \"conversation\": \"Customer: Hello, I'm William Anderson. My Date of Birth is August 20th, 1980, Address is 606 Elm St, Dallas, TX 75201, and my Policy Number is STU2345.\\nAgent: Good morning, William. How may I assist you today?\\nCustomer: Hi, Jack. I have a question about my policy.\\nCustomer: My basement flooded during heavy rainfall. Is water damage covered?\\nAgent: Let me review your policy for coverage related to water damage.\\nAgent: Yes, water damage from flooding is covered under your policy.\\nCustomer: That's a relief. I'll need to schedule repairs as soon as possible.\\nAgent: We'll assist you with the claim process, William. Is there anything else I can help you with?\\nCustomer: No, that's all for now. Thank you for your assistance, Jack.\\nAgent: You're welcome, William. Please feel free to reach out if you have any further questions or concerns.\\nCustomer: I will. Have a great day!\\nAgent: You too, William. Take care.\",\n    \"summary\": \"A customer inquires about coverage for water damage after a basement flooding, and the agent confirms that such damages are covered under the policy. 
The agent offers assistance with the claim process, resulting in the customer expressing relief and gratitude.\"\n  },\n  {\n    \"conversation_id\": 123,\n    \"customer_name\": \"Alice Smith\",\n    \"agent_name\": \"Emily Johnson\",\n    \"policy_number\": \"ABC5678\",\n    \"conversation\": \"Customer: Hi, my name is Alice Smith, Date of Birth is Feb 15th 1985, Address is 123 Main St, Anytown, NY 12345, and my Policy Number is XYZ9876.\\nAgent: Hello, Alice. How can I assist you today?\\nCustomer: I have a question about my home insurance coverage.\\nCustomer: I noticed some water damage in my basement, and I'm not sure if it's covered.\\nAgent: I'm sorry to hear about the damage. Let me review your policy to see what's covered.\\nAgent: Based on your policy, water damage from burst pipes is covered, but it depends on the cause of the damage.\\nCustomer: What if it's from heavy rainfall or flooding?\\nAgent: Unfortunately, damage from flooding is typically not covered under standard home insurance policies.\\nCustomer: That's disappointing. Is there anything I can do to get coverage for flooding?\\nAgent: You may want to consider purchasing a separate flood insurance policy to ensure you're protected.\\nCustomer: I see. Thank you for your help.\\nAgent: You're welcome, Alice. If you have any further questions, feel free to ask.\",\n    \"summary\": \"A customer inquires about home insurance coverage for water damage in the basement, and the agent confirms that damage from burst pipes is covered but explains that flooding is typically not covered under standard policies. 
The agent advises the customer to consider purchasing a separate flood insurance policy for protection, resulting in the customer expressing gratitude for the assistance provided.\"\n  },\n  {\n    \"conversation_id\": 124,\n    \"customer_name\": \"Michael Johnson\",\n    \"agent_name\": \"Sarah Brown\",\n    \"policy_number\": \"DEF1234\",\n    \"conversation\": \"Customer: Hi there, my name is Michael Johnson, Date of Birth is May 10th 1978, Address is 456 Oak St, Smalltown, CA 98765, and my Policy Number is QRS5678.\\nAgent: Good afternoon, Michael. How can I help you today?\\nCustomer: I'm having an issue with my home insurance policy.\\nCustomer: There's been some damage to my roof due to a recent storm, and I'm not sure if it's covered.\\nAgent: I'm sorry to hear about the damage. Let me check your policy to provide you with accurate information.\\nAgent: According to your policy, damage caused by storms, including wind and hail damage to your roof, should be covered.\\nCustomer: That's a relief to hear. What do I need to do next?\\nAgent: You'll need to file a claim with your insurance company and provide documentation of the damage, such as photos or repair estimates.\\nCustomer: Okay, I'll get started on that right away.\\nAgent: If you need any assistance with the claims process, feel free to reach out to us for help.\\nCustomer: Thank you for your assistance.\\nAgent: You're welcome, Michael. Have a great day!\",\n    \"summary\": \"A customer reports damage to their roof caused by a recent storm and seeks clarification on coverage under their home insurance policy. 
The agent confirms that such damage is typically covered, advises the customer to file a claim with the insurance company, and offers assistance with the claims process, resulting in the customer expressing gratitude for the assistance provided.\"\n  },\n  {\n    \"conversation_id\": 125,\n    \"customer_name\": \"Jennifer Brown\",\n    \"agent_name\": \"David Wilson\",\n    \"policy_number\": \"GHI7890\",\n    \"conversation\": \"Customer: Hello, I'm Jennifer Brown, born on March 20th, 1980, residing at 789 Elm St, Suburbia, TX 54321, and my Policy Number is LMN9012.\\nAgent: Good morning, Jennifer. How can I assist you today?\\nCustomer: Hi, I have a question about my home insurance coverage.\\nCustomer: A pipe burst in my kitchen, and there's water damage everywhere.\\nAgent: I'm sorry to hear about the incident. Let me check your policy to see what's covered.\\nAgent: Based on your policy, sudden and accidental water damage, including burst pipes, should be covered.\\nCustomer: That's a relief. What should I do next?\\nAgent: You'll need to file a claim with your insurance company and provide documentation of the damage.\\nCustomer: Okay, I'll do that right away. Thank you for your help.\\nAgent: You're welcome, Jennifer. If you have any further questions, feel free to reach out.\",\n    \"summary\": \"A customer reports water damage in the kitchen due to a burst pipe and seeks clarification on coverage under their home insurance policy. 
The agent confirms that sudden and accidental water damage, including burst pipes, should be covered, advises the customer to file a claim with the insurance company, and offers further assistance, resulting in the customer expressing gratitude for the help provided.\"\n  },\n  {\n    \"conversation_id\": 126,\n    \"customer_name\": \"Robert Johnson\",\n    \"agent_name\": \"Michelle Adams\",\n    \"policy_number\": \"PQR3456\",\n    \"conversation\": \"Customer: Hi, my name is Robert Johnson, DOB is July 5th, 1976, and I live at 456 Maple Ave, Cityville, OH 67890. My Policy Number is STU2345.\\nAgent: Hello, Robert. How can I assist you today?\\nCustomer: I have a concern about my home insurance policy.\\nCustomer: My neighbor's tree fell on my fence during the storm, causing damage.\\nAgent: I'm sorry to hear about the damage. Let me review your policy to see if it's covered.\\nAgent: Unfortunately, damage caused by your neighbor's tree falling on your fence may not be covered under your policy.\\nCustomer: That's disappointing. Is there anything I can do to get coverage?\\nAgent: You may want to speak with your neighbor about their homeowner's insurance policy, as their coverage may apply to this situation.\\nCustomer: I'll do that. Thank you for your assistance.\\nAgent: You're welcome, Robert. If you have any further questions, don't hesitate to ask.\",\n    \"summary\": \"A customer expresses concern about damage to their fence caused by a neighbor's tree falling during a storm and seeks clarification on coverage under their home insurance policy. 
The agent advises that such damage may not be covered under the customer's policy and suggests contacting the neighbor's homeowner's insurance for potential coverage, resulting in the customer expressing gratitude for the assistance provided.\"\n  },\n  {\n    \"conversation_id\": 127,\n    \"customer_name\": \"Emily Davis\",\n    \"agent_name\": \"Daniel Miller\",\n    \"policy_number\": \"UVW4567\",\n    \"conversation\": \"Customer: Hi, I'm Emily Davis, born on September 12th, 1982, residing at 789 Pine St, Hilltown, FL 45678. My Policy Number is XYZ7890.\\nAgent: Good afternoon, Emily. How can I assist you today?\\nCustomer: Hello, I need to make a change to my home insurance policy.\\nCustomer: I recently renovated my kitchen, and I need to update the coverage to reflect the changes.\\nAgent: I can assist you with that. Let me update your policy with the new information.\\nAgent: Your policy has been updated to reflect the renovation. Is there anything else I can help you with?\\nCustomer: That's all for now. Thank you for your help.\\nAgent: You're welcome, Emily. If you have any further questions or need assistance in the future, feel free to reach out.\",\n    \"summary\": \"A customer requests a change to their home insurance policy to reflect recent renovations to their kitchen. The agent assists with updating the policy accordingly, and the customer expresses gratitude for the help provided.\"\n  },\n  {\n    \"conversation_id\": 128,\n    \"customer_name\": \"Jessica Wilson\",\n    \"agent_name\": \"Ryan Thompson\",\n    \"policy_number\": \"WXY6789\",\n    \"conversation\": \"Customer: Hello, I'm Jessica Wilson, DOB is April 30th, 1974, and I live at 234 Oak St, Suburbia, CA 98765. My Policy Number is ABC1234.\\nAgent: Good morning, Jessica. 
How can I assist you today?\\nCustomer: Hi, I need to add an additional coverage to my home insurance policy.\\nCustomer: I recently purchased some expensive jewelry, and I want to make sure it's covered in case of theft or loss.\\nAgent: I can help you with that. Let me add a rider to your policy to cover the additional jewelry.\\nAgent: Your policy has been updated to include coverage for your jewelry. Is there anything else I can assist you with?\\nCustomer: That's all for now. Thank you for your help.\\nAgent: You're welcome, Jessica. If you have any further questions or need assistance in the future, feel free to reach out.\",\n    \"summary\": \"A customer requests to add additional coverage to their home insurance policy for recently purchased expensive jewelry to ensure protection against theft or loss. The agent assists by adding a rider to the policy for the additional coverage, and the customer expresses gratitude for the help provided.\"\n  },\n  {\n    \"conversation_id\": 129,\n    \"customer_name\": \"Andrew Brown\",\n    \"agent_name\": \"Sophia Martinez\",\n    \"policy_number\": \"JKL2345\",\n    \"conversation\": \"Customer: Hi there, I'm Andrew Brown, born on November 25th, 1986, residing at 345 Cedar St, Smalltown, TX 67890. My Policy Number is DEF5678.\\nAgent: Good afternoon, Andrew. How can I assist you today?\\nCustomer: Hello, I need to update my contact information on my home insurance policy.\\nCustomer: I recently moved, and I need to provide my new address and phone number.\\nAgent: I can assist you with that. Let me update your contact information in our system.\\nAgent: Your contact information has been updated. Is there anything else I can help you with?\\nCustomer: That's all for now. Thank you for your help.\\nAgent: You're welcome, Andrew. 
If you have any further questions or need assistance in the future, feel free to reach out.\",\n    \"summary\": \"A customer requests to update their contact information on their home insurance policy due to a recent move. The agent assists by updating the customer's address and phone number in the system, and the customer expresses gratitude for the help provided.\"\n  },\n  {\n    \"conversation_id\": 130,\n    \"customer_name\": \"Michelle Evans\",\n    \"agent_name\": \"Jacob Clark\",\n    \"policy_number\": \"MNO7890\",\n    \"conversation\": \"Customer: Hi, I'm Michelle Evans, DOB is June 15th, 1979, and I live at 567 Elm St, Cityville, NY 23456. My Policy Number is PQR9012.\\nAgent: Good morning, Michelle. How can I assist you today?\\nCustomer: Hello, I need to cancel my home insurance policy.\\nCustomer: I'm selling my house, so I no longer need coverage.\\nAgent: I can assist you with that. Let me process the cancellation for you.\\nAgent: Your home insurance policy has been cancelled, effective immediately. Is there anything else I can help you with?\\nCustomer: That's all, thank you for your help.\\nAgent: You're welcome, Michelle. If you have any further questions or need assistance in the future, feel free to reach out.\",\n    \"summary\": \"A customer requests to cancel their home insurance policy as they are selling their house and no longer require coverage. The agent assists by processing the cancellation, and the customer expresses gratitude for the help provided.\"\n  },\n  {\n    \"conversation_id\": 131,\n    \"customer_name\": \"David Garcia\",\n    \"agent_name\": \"Emma Moore\",\n    \"policy_number\": \"RST9012\",\n    \"conversation\": \"Customer: Hi, I'm David Garcia, born on August 8th, 1988, residing at 789 Maple St, Suburbia, CA 34567. My Policy Number is UVW1234.\\nAgent: Good morning, David. 
How can I assist you today?\\nCustomer: Hello, I need to inquire about adding a home office coverage to my policy.\\nCustomer: I recently started working from home and have valuable equipment that I want to protect.\\nAgent: I understand. Let me check your policy to see what options are available.\\nAgent: It appears that we offer a home business coverage option that may suit your needs.\\nCustomer: That sounds perfect. Please add it to my policy.\\nAgent: Your policy has been updated to include home business coverage. Is there anything else I can help you with?\\nCustomer: That's all for now. Thank you for your assistance.\\nAgent: You're welcome, David. If you have any further questions or need assistance in the future, feel free to reach out.\",\n    \"summary\": \"A customer requests to add home office coverage to their policy as they recently started working from home and want to protect valuable equipment. The agent confirms the availability of a home business coverage option and assists by adding it to the policy, resulting in the customer expressing gratitude for the help provided.\"\n  },\n  {\n    \"conversation_id\": 132,\n    \"customer_name\": \"Sarah Hernandez\",\n    \"agent_name\": \"John Lee\",\n    \"policy_number\": \"LMN3456\",\n    \"conversation\": \"Customer: Hi there, I'm Sarah Hernandez, born on January 12th, 1983, residing at 123 Cedar St, Hilltown, TX 12345. My Policy Number is GHI6789.\\nAgent: Good afternoon, Sarah. 
How can I assist you today?\\nCustomer: Hello, I recently got a pet dog and wanted to know if it affects my home insurance policy.\\nCustomer: I heard that some breeds are considered high-risk and may affect coverage.\\nAgent: Let me check your policy and see how pets are addressed.\\nAgent: According to your policy, owning a dog may affect your liability coverage.\\nCustomer: What do I need to do to ensure my coverage remains intact?\\nAgent: You may need to disclose the breed and any history of aggression to your insurance company.\\nCustomer: I'll do that. Thank you for your help.\\nAgent: You're welcome, Sarah. If you have any further questions or need assistance in the future, feel free to reach out.\",\n    \"summary\": \"A customer inquires about the impact of getting a pet dog on their home insurance policy, concerned about potential breed-related issues. The agent checks the policy and explains that owning a dog may affect liability coverage, advising the customer to disclose breed information and any history of aggression to the insurance company to ensure coverage remains intact, resulting in the customer expressing gratitude for the assistance provided.\"\n  },\n  {\n    \"conversation_id\": 133,\n    \"customer_name\": \"Christopher Martinez\",\n    \"agent_name\": \"Olivia Taylor\",\n    \"policy_number\": \"OPQ4567\",\n    \"conversation\": \"Customer: Hi, I'm Christopher Martinez, DOB is April 5th, 1980, and I live at 456 Walnut St, Smalltown, NY 89012. My Policy Number is JKL7890.\\nAgent: Good morning, Christopher. How can I assist you today?\\nCustomer: Hello, I need to renew my home insurance policy.\\nCustomer: My policy is expiring soon, and I want to ensure continuous coverage.\\nAgent: Let me check your policy renewal options and provide you with the necessary information.\\nAgent: Your policy renewal options have been reviewed, and I can assist you with the renewal process.\\nCustomer: That's great. 
Please proceed with the renewal.\\nAgent: Your policy has been successfully renewed. Is there anything else I can help you with?\\nCustomer: That's all for now. Thank you for your assistance.\\nAgent: You're welcome, Christopher. If you have any further questions or need assistance in the future, feel free to reach out.\",\n    \"summary\": \"A customer requests to renew their home insurance policy as it is expiring soon, seeking continuous coverage. The agent reviews renewal options, assists with the renewal process, and confirms successful renewal, resulting in the customer expressing gratitude for the assistance provided.\"\n  },\n  {\n    \"conversation_id\": 134,\n    \"customer_name\": \"Amy Thompson\",\n    \"agent_name\": \"William Davis\",\n    \"policy_number\": \"CDE7890\",\n    \"conversation\": \"Customer: Hi, I'm Amy Thompson, born on October 18th, 1984, residing at 789 Birch St, Suburbia, CA 23456. My Policy Number is EFG1234.\\nAgent: Good afternoon, Amy. How can I assist you today?\\nCustomer: Hello, I need to report a claim for damage to my home.\\nCustomer: There was a fire in my kitchen, and there's significant damage.\\nAgent: I'm sorry to hear about the fire. Let me assist you with filing a claim.\\nAgent: Your claim has been initiated, and an adjuster will contact you shortly for further assistance.\\nCustomer: Thank you for your help.\\nAgent: You're welcome, Amy. If you have any further questions or need assistance in the future, feel free to reach out.\",\n    \"summary\": \"A customer reports a claim for damage to their home due to a fire in the kitchen, seeking assistance with the claims process. 
The agent initiates the claim and assures the customer that an adjuster will contact them shortly for further assistance, resulting in the customer expressing gratitude for the help provided.\"\n  },\n  {\n    \"conversation_id\": 135,\n    \"customer_name\": \"Linda Wilson\",\n    \"agent_name\": \"Michael Brown\",\n    \"policy_number\": \"FGH9012\",\n    \"conversation\": \"Customer: Hi, I'm Linda Wilson, born on June 25th, 1975, residing at 234 Pine St, Cityville, TX 56789. My Policy Number is IJK2345.\\nAgent: Good morning, Linda. How can I assist you today?\\nCustomer: Hello, I'm extremely disappointed with the service I've received from your company.\\nCustomer: I filed a claim for water damage a month ago, and I still haven't received any updates.\\nAgent: I apologize for the delay in processing your claim, Linda. Let me investigate the status for you.\\nAgent: It appears that there was an oversight in processing your claim. I will expedite the review process and provide you with an update shortly.\\nCustomer: This is unacceptable. I expect better service from my insurance provider.\\nAgent: I completely understand your frustration, Linda. Rest assured, I will do everything in my power to resolve this matter promptly.\\nCustomer: I hope so. I've been a loyal customer for years, and this experience has been disappointing.\\nAgent: I sincerely apologize for the inconvenience, Linda. I'll keep you updated on the progress of your claim.\\nCustomer: Thank you.\",\n    \"summary\": \"A customer expresses extreme disappointment with the service received from the company, citing a delay in processing a claim for water damage filed a month ago. 
The agent acknowledges the oversight, apologizes for the inconvenience, and assures the customer of expedited review and updates on the claim's progress, with the customer expressing hope for a resolution and gratitude for the attention to the matter.\"\n  },\n  {\n    \"conversation_id\": 136,\n    \"customer_name\": \"Brian Adams\",\n    \"agent_name\": \"Jessica Miller\",\n    \"policy_number\": \"KLM3456\",\n    \"conversation\": \"Customer: Hi, I'm Brian Adams, DOB is December 10th, 1982, and I live at 345 Oak St, Hilltown, CA 78901. My Policy Number is NOP4567.\\nAgent: Good afternoon, Brian. How can I assist you today?\\nCustomer: Hello, I'm beyond frustrated with your company's billing practices.\\nCustomer: I received a notice stating that my premium has increased significantly without any explanation.\\nAgent: I apologize for the inconvenience, Brian. Let me review your policy to understand the reason for the increase.\\nAgent: It appears that there was an error in the calculation of your premium. I will escalate this issue to our billing department and ensure it's rectified immediately.\\nCustomer: This is unacceptable. I expect transparency and fairness from my insurance provider.\\nAgent: I completely understand your frustration, Brian. Rest assured, I will personally oversee the resolution of this matter and keep you updated on the progress.\\nCustomer: I appreciate your assistance, but this shouldn't have happened in the first place.\\nAgent: I apologize once again, Brian. I'll ensure that corrective measures are put in place to prevent similar issues in the future.\\nCustomer: I hope so.\",\n    \"summary\": \"A customer expresses frustration with the company's billing practices, citing a significant increase in premiums without explanation. 
The agent apologizes for the inconvenience, acknowledges the error in premium calculation, and assures the customer of immediate escalation and resolution, with the customer emphasizing the expectation of transparency and fairness from their insurance provider and the agent expressing commitment to preventive measures to avoid similar issues in the future.\"\n  },\n  {\n    \"conversation_id\": 137,\n    \"customer_name\": \"Karen Garcia\",\n    \"agent_name\": \"Richard Martinez\",\n    \"policy_number\": \"QRS5678\",\n    \"conversation\": \"Customer: Hi, I'm Karen Garcia, born on September 5th, 1979, residing at 456 Cedar St, Smalltown, NY 34567. My Policy Number is TUV6789.\\nAgent: Good morning, Karen. How can I assist you today?\\nCustomer: Hello, I'm extremely dissatisfied with your company's claims handling process.\\nCustomer: I filed a claim for roof damage three weeks ago, and there's been no progress or communication since then.\\nAgent: I apologize for the lack of updates, Karen. Let me investigate the status of your claim and provide you with an update.\\nAgent: It appears that there was a delay in processing your claim due to a backlog. I will expedite the review process and ensure you receive a timely resolution.\\nCustomer: This is unacceptable. I've been left in the dark for too long, and it's causing me a lot of stress.\\nAgent: I understand your frustration, Karen. Rest assured, I will personally oversee the handling of your claim and keep you informed every step of the way.\\nCustomer: I expect better from my insurance provider. This level of service is unacceptable.\\nAgent: I apologize for the inconvenience, Karen. I'll do everything in my power to address your concerns and ensure a satisfactory outcome.\\nCustomer: I hope so.\",\n    \"summary\": \"A customer expresses extreme dissatisfaction with the company's claims handling process, citing a lack of progress and communication regarding a filed claim for roof damage. 
The agent apologizes for the inconvenience, acknowledges the delay due to a backlog, and assures the customer of expedited review and personal oversight to ensure timely resolution, with the customer emphasizing the expectation of better service and the agent expressing commitment to addressing concerns and achieving a satisfactory outcome.\"\n  },\n  {\n    \"conversation_id\": 138,\n    \"customer_name\": \"Jason Miller\",\n    \"agent_name\": \"Michelle Harris\",\n    \"policy_number\": \"VWX7890\",\n    \"conversation\": \"Customer: Hi, I'm Jason Miller, DOB is November 15th, 1983, and I live at 567 Elm St, Suburbia, CA 45678. My Policy Number is YZA8901.\\nAgent: Good afternoon, Jason. How can I assist you today?\\nCustomer: Hello, I'm furious with your company's lack of responsiveness.\\nCustomer: I've been trying to contact your claims department for days, but I keep getting transferred and put on hold.\\nAgent: I apologize for the inconvenience, Jason. Let me escalate your issue to a supervisor for immediate assistance.\\nAgent: A supervisor will contact you shortly to address your concerns and ensure a prompt resolution.\\nCustomer: This is unacceptable. I expect better customer service from my insurance provider.\\nAgent: I completely understand your frustration, Jason. Rest assured, we will do everything in our power to rectify the situation and regain your trust.\\nCustomer: I hope so. This experience has been extremely frustrating and disappointing.\\nAgent: I sincerely apologize for the inconvenience, Jason. We value your feedback, and we're committed to improving our service standards.\\nCustomer: I appreciate that.\",\n    \"summary\": \"A customer expresses fury over the company's lack of responsiveness, stating difficulties in contacting the claims department despite attempts over several days. 
The agent apologizes, escalates the issue to a supervisor for immediate assistance, and assures the customer of efforts to rectify the situation and regain trust, with the customer emphasizing the expectation of better customer service and the agent expressing commitment to improvement and appreciation for the feedback.\"\n  },\n  {\n    \"conversation_id\": 139,\n    \"customer_name\": \"Rachel Clark\",\n    \"agent_name\": \"Daniel Wilson\",\n    \"policy_number\": \"BCD1234\",\n    \"conversation\": \"Customer: Hi, I'm Rachel Clark, born on February 20th, 1981, residing at 678 Walnut St, Cityville, TX 89012. My Policy Number is EFG2345.\\nAgent: Good morning, Rachel. How can I assist you today?\\nCustomer: Hello, I'm extremely disappointed with your company's claims denial decision.\\nCustomer: I filed a claim for water damage, and it was denied without any explanation.\\nAgent: I apologize for the frustration, Rachel. Let me review the details of your claim and the reason for the denial.\\nAgent: It appears that the damage was deemed to be the result of gradual wear and tear, which is not covered under your policy.\\nCustomer: This is unacceptable. I've been paying premiums for years, expecting coverage when I need it most.\\nAgent: I understand your frustration, Rachel. I'll escalate your concerns to our claims department for further review.\\nCustomer: I expect a thorough review of my claim and a fair decision. This denial has caused me a lot of stress.\\nAgent: I'll ensure that your claim is reevaluated promptly, Rachel. I apologize for any inconvenience this has caused.\\nCustomer: Thank you.\",\n    \"summary\": \"A customer expresses extreme disappointment with the company's claims denial decision regarding water damage, citing lack of explanation. The agent apologizes, reviews the claim details, and explains that the denial was due to damage deemed gradual wear and tear, not covered under the policy. 
The customer emphasizes the expectation of coverage after years of premium payments, and the agent escalates the concerns for further review, promising a thorough reevaluation and apologizing for any inconvenience caused.\"\n  },\n  {\n    \"conversation_id\": 140,\n    \"customer_name\": \"Emily Rodriguez\",\n    \"agent_name\": \"David Garcia\",\n    \"policy_number\": \"LMN5678\",\n    \"conversation\": \"Customer: Hi, I'm Emily Rodriguez, born on April 8th, 1986, residing at 789 Birch St, Hilltown, CA 23456. My Policy Number is OPQ6789.\\nAgent: Good morning, Emily. How can I assist you today?\\nCustomer: Hello, I'm extremely frustrated with your company's decision to deny my claim.\\nCustomer: I filed a claim for damage caused by a fallen tree, and it was denied without any explanation.\\nAgent: I understand your frustration, Emily. Let me review the details of your claim and provide you with an explanation.\\nAgent: It appears that the damage was deemed to be the result of an excluded peril, which is not covered under your policy.\\nCustomer: This is unacceptable. I've been paying premiums for years, expecting coverage when I need it most.\\nAgent: I apologize for the inconvenience, Emily. I'll escalate your concerns to our claims department for further review.\\nCustomer: I expect a thorough review of my claim and a fair decision. This denial has caused me a lot of stress.\\nAgent: I'll ensure that your claim is reevaluated promptly, Emily. I apologize for any inconvenience this has caused.\\nCustomer: Thank you.\",\n    \"summary\": \"Customer expresses frustration with claim denial for tree damage, demands explanation. Agent apologizes, cites damage as excluded peril, promises review. 
Customer stresses expectation of coverage, agent escalates concerns for thorough reevaluation, apologizes for inconvenience.\"\n  },\n  {\n    \"conversation_id\": 141,\n    \"customer_name\": \"Matthew Lopez\",\n    \"agent_name\": \"Emma Wilson\",\n    \"policy_number\": \"RST7890\",\n    \"conversation\": \"Customer: Hi, I'm Matthew Lopez, DOB is October 12th, 1984, and I live at 456 Cedar St, Smalltown, NY 34567. My Policy Number is TUV8901.\\nAgent: Good afternoon, Matthew. How can I assist you today?\\nCustomer: Hello, I'm extremely disappointed with your company's decision to deny my claim.\\nCustomer: I filed a claim for water damage, and it was denied without any explanation.\\nAgent: I understand your frustration, Matthew. Let me review the details of your claim and provide you with an explanation.\\nAgent: It appears that the damage was deemed to be the result of a maintenance issue, which is not covered under your policy.\\nCustomer: This is unacceptable. I've been paying premiums for years, expecting coverage when I need it most.\\nAgent: I apologize for the inconvenience, Matthew. I'll escalate your concerns to our claims department for further review.\\nCustomer: I expect a thorough review of my claim and a fair decision. This denial has caused me a lot of stress.\\nAgent: I'll ensure that your claim is reevaluated promptly, Matthew. I apologize for any inconvenience this has caused.\\nCustomer: Thank you.\",\n    \"summary\": \"Customer expresses disappointment with claim denial for water damage, demands explanation. Agent apologizes, cites damage as maintenance issue, promises review. 
Customer stresses expectation of coverage, agent escalates concerns for thorough reevaluation, apologizes for inconvenience.\"\n  },\n  {\n    \"conversation_id\": 142,\n    \"customer_name\": \"Amanda Thompson\",\n    \"agent_name\": \"Michael Johnson\",\n    \"policy_number\": \"UVW9012\",\n    \"conversation\": \"Customer: Hi, I'm Amanda Thompson, born on March 15th, 1983, residing at 567 Oak St, Suburbia, CA 67890. My Policy Number is XYZ0123.\\nAgent: Good morning, Amanda. How can I assist you today?\\nCustomer: Hello, I'm extremely frustrated with your company's decision to deny my claim.\\nCustomer: I filed a claim for theft of personal belongings, and it was denied without any explanation.\\nAgent: I understand your frustration, Amanda. Let me review the details of your claim and provide you with an explanation.\\nAgent: It appears that the theft was deemed to be the result of negligence, which is not covered under your policy.\\nCustomer: This is unacceptable. I've been paying premiums for years, expecting coverage when I need it most.\\nAgent: I apologize for the inconvenience, Amanda. I'll escalate your concerns to our claims department for further review.\\nCustomer: I expect a thorough review of my claim and a fair decision. This denial has caused me a lot of stress.\\nAgent: I'll ensure that your claim is reevaluated promptly, Amanda. I apologize for any inconvenience this has caused.\\nCustomer: Thank you.\",\n    \"summary\": \"Customer frustrated by claim denial for theft of personal belongings, seeks explanation. Agent apologizes, attributes theft to negligence, promises review. 
Customer emphasizes expectation of coverage, agent escalates concerns for thorough reevaluation, apologizes for inconvenience.\"\n  },\n  {\n    \"conversation_id\": 143,\n    \"customer_name\": \"Jennifer Lee\",\n    \"agent_name\": \"Olivia Brown\",\n    \"policy_number\": \"CDE2345\",\n    \"conversation\": \"Customer: Hi, I'm Jennifer Lee, born on August 20th, 1980, residing at 678 Pine St, Cityville, TX 45678. My Policy Number is EFG3456.\\nAgent: Good afternoon, Jennifer. How can I assist you today?\\nCustomer: Hello, I'm extremely disappointed with your company's decision to deny my claim.\\nCustomer: I filed a claim for fire damage, and it was denied without any explanation.\\nAgent: I understand your frustration, Jennifer. Let me review the details of your claim and provide you with an explanation.\\nAgent: It appears that the fire was deemed to be the result of arson, which is not covered under your policy.\\nCustomer: This is unacceptable. I've been paying premiums for years, expecting coverage when I need it most.\\nAgent: I apologize for the inconvenience, Jennifer. I'll escalate your concerns to our claims department for further review.\\nCustomer: I expect a thorough review of my claim and a fair decision. This denial has caused me a lot of stress.\\nAgent: I'll ensure that your claim is reevaluated promptly, Jennifer. I apologize for any inconvenience this has caused.\\nCustomer: Thank you.\",\n    \"summary\": \"Customer frustrated by claim denial for fire damage, seeks explanation. Agent attributes fire to arson, not covered under policy. Customer emphasizes expectation of coverage, agent escalates concerns for thorough reevaluation, apologizes for inconvenience.\"\n  },\n  {\n    \"conversation_id\": 140,\n    \"customer_name\": \"Emily White\",\n    \"agent_name\": \"Andrew Thompson\",\n    \"policy_number\": \"EFG2345\",\n    \"conversation\": \"Customer: Hi, I'm Emily White, born on July 10th, 1980, residing at 789 Pine St, Hilltown, CA 56789. 
My Policy Number is HIJ3456.\\nAgent: Good morning, Emily. How can I assist you today?\\nCustomer: Hello, I'm extremely disappointed with your company's decision to deny my claim.\\nCustomer: I filed a claim for water damage, but it was denied due to 'lack of timely notification.'\\nAgent: I apologize for the inconvenience, Emily. Let me review the details of your claim denial.\\nAgent: It appears that the damage occurred several weeks ago, and our policy requires claims to be reported within 72 hours.\\nCustomer: This is ridiculous. I wasn't aware of the damage until recently, and I promptly filed the claim.\\nAgent: I understand your frustration, Emily. I'll escalate your concerns to our claims department for further review.\\nCustomer: I expect a fair evaluation of my claim. This denial has caused me a lot of stress and financial burden.\\nAgent: I'll ensure that your claim is reevaluated promptly, Emily. I apologize for any inconvenience this has caused.\\nCustomer: Thank you.\",\n    \"summary\": \"Customer Emily White disappointed by claim denial for water damage due to 'lack of timely notification'. Agent attributes denial to damage reported beyond policy's 72-hour limit. Customer expresses frustration and financial burden. Agent apologizes and promises prompt reevaluation of the claim.\"\n  },\n  {\n    \"conversation_id\": 141,\n    \"customer_name\": \"James Rodriguez\",\n    \"agent_name\": \"Sophia Martinez\",\n    \"policy_number\": \"KLM4567\",\n    \"conversation\": \"Customer: Hi, I'm James Rodriguez, DOB is March 15th, 1977, and I live at 456 Cedar St, Smalltown, TX 67890. My Policy Number is NOP5678.\\nAgent: Good afternoon, James. How can I assist you today?\\nCustomer: Hello, I'm extremely frustrated with your company's decision to deny my claim.\\nCustomer: I filed a claim for hail damage to my roof, but it was denied due to 'pre-existing damage.'\\nAgent: I apologize for the inconvenience, James. 
Let me review the details of your claim denial.\\nAgent: It appears that there was evidence of prior damage to your roof, which was not covered under your policy.\\nCustomer: This is outrageous. I had no knowledge of any pre-existing damage, and I've been paying premiums for years.\\nAgent: I understand your frustration, James. I'll escalate your concerns to our claims department for further review.\\nCustomer: I expect a thorough investigation of my claim and a fair decision. This denial has caused me significant financial hardship.\\nAgent: I'll ensure that your claim is reevaluated promptly, James. I apologize for any inconvenience this has caused.\\nCustomer: Thank you.\",\n    \"summary\": \"Customer James Rodriguez frustrated by claim denial for hail damage due to 'pre-existing damage'. Agent attributes denial to evidence of prior damage not covered by policy. Customer expresses outrage and financial hardship. Agent promises prompt reevaluation of the claim.\"\n  },\n  {\n    \"conversation_id\": 141,\n    \"customer_name\": \"James Rodriguez\",\n    \"agent_name\": \"Sophia Martinez\",\n    \"policy_number\": \"KLM4567\",\n    \"conversation\": \"Customer: Hi, I'm James Rodriguez, DOB is March 15th, 1977, and I live at 456 Cedar St, Smalltown, TX 67890. My Policy Number is NOP5678.\\nAgent: Good afternoon, James. How can I assist you today?\\nCustomer: Hello, I'm extremely frustrated with your company's decision to deny my claim.\\nCustomer: I filed a claim for hail damage to my roof, but it was denied due to 'pre-existing damage.'\\nAgent: I apologize for the inconvenience, James. Let me review the details of your claim denial.\\nAgent: It appears that there was evidence of prior damage to your roof, which was not covered under your policy.\\nCustomer: This is outrageous. I had no knowledge of any pre-existing damage, and I've been paying premiums for years.\\nAgent: I understand your frustration, James. 
I'll escalate your concerns to our claims department for further review.\\nCustomer: I expect a thorough investigation of my claim and a fair decision. This denial has caused me significant financial hardship.\\nAgent: I'll ensure that your claim is reevaluated promptly, James. I apologize for any inconvenience this has caused.\\nCustomer: Thank you.\",\n    \"summary\": \"Customer disputes claim denial for hail damage, citing lack of awareness of pre-existing damage. Agent apologizes, attributing denial to evidence of prior damage not covered by the policy. Customer insists on thorough review and fair decision. Agent promises escalation for reevaluation.\"\n  },\n  {\n    \"conversation_id\": 142,\n    \"customer_name\": \"Melissa Thompson\",\n    \"agent_name\": \"David Wilson\",\n    \"policy_number\": \"PQR5678\",\n    \"conversation\": \"Customer: Hi, I'm Melissa Thompson, born on December 5th, 1979, residing at 678 Elm St, Suburbia, NY 90123. My Policy Number is STU6789.\\nAgent: Good morning, Melissa. How can I assist you today?\\nCustomer: Hello, I'm extremely disappointed with your company's decision to deny my claim.\\nCustomer: I filed a claim for fire damage to my garage, but it was denied due to 'policy exclusions.'\\nAgent: I apologize for the inconvenience, Melissa. Let me review the details of your claim denial.\\nAgent: It appears that damage caused by arson is specifically excluded from coverage under your policy.\\nCustomer: This is infuriating. The fire was accidental, and I had nothing to do with it.\\nAgent: I understand your frustration, Melissa. I'll escalate your concerns to our claims department for further review.\\nCustomer: I expect a fair evaluation of my claim. This denial has caused me a lot of stress and financial hardship.\\nAgent: I'll ensure that your claim is reevaluated promptly, Melissa. 
I apologize for any inconvenience this has caused.\\nCustomer: Thank you.\",\n    \"summary\": \"Customer disputes claim denial for fire damage, claiming it was accidental. Agent apologizes and explains policy exclusion for damage caused by arson. Customer insists on fair evaluation and expresses stress and financial hardship. Agent promises prompt reevaluation of the claim.\"\n  },\n  {\n    \"conversation_id\": 143,\n    \"customer_name\": \"Steven Lee\",\n    \"agent_name\": \"Emma Moore\",\n    \"policy_number\": \"UVW6789\",\n    \"conversation\": \"Customer: Hi, I'm Steven Lee, DOB is August 20th, 1985, and I live at 789 Oak St, Cityville, CA 23456. My Policy Number is XYZ7890.\\nAgent: Good afternoon, Steven. How can I assist you today?\\nCustomer: Hello, I'm extremely frustrated with your company's decision to deny my claim.\\nCustomer: I filed a claim for theft of personal belongings, but it was denied due to 'lack of evidence.'\\nAgent: I apologize for the inconvenience, Steven. Let me review the details of your claim denial.\\nAgent: It appears that there was insufficient evidence to support the claim of theft.\\nCustomer: This is unacceptable. My belongings were stolen, and I provided all the necessary documentation.\\nAgent: I understand your frustration, Steven. I'll escalate your concerns to our claims department for further review.\\nCustomer: I expect a thorough investigation of my claim and a fair decision. This denial has caused me significant financial loss.\\nAgent: I'll ensure that your claim is reevaluated promptly, Steven. I apologize for any inconvenience this has caused.\\nCustomer: Thank you.\",\n    \"summary\": \"Customer disputes claim denial for theft of personal belongings due to lack of evidence. Agent apologizes and explains insufficient evidence for the claim. Customer insists on fair investigation and expresses financial loss. 
Agent promises prompt reevaluation of the claim.\"\n  },\n  {\n    \"conversation_id\": 144,\n    \"customer_name\": \"Nicole Brown\",\n    \"agent_name\": \"John Davis\",\n    \"policy_number\": \"LMN6789\",\n    \"conversation\": \"Customer: Hi, I'm Nicole Brown, born on May 30th, 1983, residing at 123 Maple St, Suburbia, TX 45678. My Policy Number is ABC2345.\\nAgent: Good morning, Nicole. How can I assist you today?\\nCustomer: Hello, I'm extremely disappointed with your company's decision to deny my claim.\\nCustomer: I filed a claim for storm damage to my fence, but it was denied due to 'acts of nature exclusion.'\\nAgent: I apologize for the inconvenience, Nicole. Let me review the details of your claim denial.\\nAgent: It appears that damage caused by storms, including wind and hail, is specifically excluded from coverage under your policy.\\nCustomer: This is frustrating. I thought I was protected against such events.\\nAgent: I understand your frustration, Nicole. I'll escalate your concerns to our claims department for further review.\\nCustomer: I expect a fair evaluation of my claim. This denial has caused me a lot of stress and financial burden.\\nAgent: I'll ensure that your claim is reevaluated promptly, Nicole. I apologize for any inconvenience this has caused.\\nCustomer: Thank you.\",\n    \"summary\": \"Customer's claim for storm damage to her fence is denied due to \\\"acts of nature exclusion.\\\" Agent apologizes and explains the policy's exclusion. Customer expresses frustration and financial burden. Agent promises a prompt reevaluation of the claim.\"\n  }\n]\n"
  },
  {
    "path": "Code Examples/GenAI-RAG/cvpipeline.py",
    "content": "# 2024-11-25\n# Andreas Kretz\n# This code currently doesn't work because the preparation of the text for ElasticSearch doesn't work\n# Try to fix this and write the data\n\nimport json, os  # Importing JSON for handling JSON data and os for interacting with the operating system\nimport fitz  # PyMuPDF\nfrom llama_index.core import Document, Settings  # Importing Document class and Settings for managing LlamaIndex\nfrom llama_index.core.node_parser import SentenceSplitter  # Importing SentenceSplitter to split text into smaller chunks\nfrom llama_index.core.ingestion import IngestionPipeline  # Importing IngestionPipeline for managing data ingestion\nfrom llama_index.embeddings.ollama import OllamaEmbedding  # Importing OllamaEmbedding for generating text embeddings\nfrom llama_index.vector_stores.elasticsearch import ElasticsearchStore  # Importing ElasticsearchStore for vector storage\nfrom dotenv import load_dotenv  # Importing load_dotenv to load environment variables from a .env file\nfrom llama_index.core import VectorStoreIndex, QueryBundle, Response, Settings\nfrom llama_index.embeddings.ollama import OllamaEmbedding\nfrom llama_index.llms.ollama import Ollama\nfrom index_raw import es_vector_store\nfrom ollama import chat\nfrom ollama import ChatResponse\n\n# extract text form the pdf with PyMuPDF\ndef extract_text_from_pdf(path):\n    doc = fitz.open(path)\n    text = \"\"\n\n    for page_num in range(len(doc)):\n        page = doc.load_page(page_num)\n        page_text = page.get_text()\n        text += page_text\n    print(text)\n    \n    return text\n\n# feed the pdf into mistral and get a JSON back\n# this fails currently because I cannot get a good answer from mistral. the problem is with escaping \\n and '. \n\ndef prepare_text_to_json(text_to_summarize):\n    instruction_template = \"Here's a text. Encapsulate it into a json as a string and don't turn it into json attributes. Keep it flat. 
The attribute where the text should go into is called text. Create another attribute of the json called name and put the name of the person there:\"\n\n    # format='json' asks Ollama to constrain the reply to valid JSON, which should avoid the escaping problem\n    response: ChatResponse = chat(model='mistral', format='json', messages=[\n        {\n            'role': 'user',\n            'content': instruction_template + text_to_summarize,\n        },\n    ])\n\n    print(\".....Prepared this json.....\\n\")\n    print(response['message']['content'])\n\n    # Parse the reply into a dict; json.loads raises json.JSONDecodeError if the reply is still not valid JSON\n    return json.loads(response['message']['content'])\n\n\n# Define an Elasticsearch vector store with configuration for local Elasticsearch\nes_vector_store = ElasticsearchStore(\n    index_name=\"student_cvs\",  # Name of the Elasticsearch index\n    vector_field='conversation_vector',  # Field to store the vector representation of the text\n    text_field='conversation',  # Field to store the original text\n    es_url=\"http://localhost:9200\"  # URL of the local Elasticsearch instance\n)\n\nlocal_llm = Ollama(model=\"mistral\")\n\ndef main():\n    ollama_embedding = OllamaEmbedding(\"mistral\")   # Initialize the embedding model for generating embeddings using the \"mistral\" model\n\n    # Set up an ingestion pipeline with transformations and the Elasticsearch vector store\n    pipeline = IngestionPipeline(\n        transformations=[\n            SentenceSplitter(chunk_size=350, chunk_overlap=50), # Split text into chunks of up to 350 tokens with a 50-token overlap\n            ollama_embedding, # Use the embedding model to generate embeddings for the chunks\n        ],\n        vector_store=es_vector_store  # Use the configured Elasticsearch vector store\n    )\n\n    extracted = extract_text_from_pdf('Liam_McGivney_CV.pdf')   # Extract the text from the CV\n    prepped_json = prepare_text_to_json(extracted)      # Prepare the JSON (a dict with 'text' and 'name' keys)\n\n    # Wrap the single Document in a list, because pipeline.run expects a list of documents\n    documents = [Document(text=prepped_json['text'], metadata={\"name\": prepped_json['name']})]\n\n    pipeline.run(documents=documents)   # Run the pipeline to process documents and store embeddings in Elasticsearch\n    print(\".....Done running pipeline.....\\n\")  # Print a completion message\n\n# Entry point of the script\nif __name__ == \"__main__\":\n    main()  # Call the main function\n"
  },
  {
    "path": "Code Examples/GenAI-RAG/docker-compose.yml",
    "content": "services:\n\n  # Elasticsearch Docker Images: https://www.docker.elastic.co/\n  elasticsearch:\n    image: docker.elastic.co/elasticsearch/elasticsearch:8.16.0\n    container_name: elasticsearch\n    environment:\n      - xpack.security.enabled=false\n      - discovery.type=single-node\n    ulimits:\n      memlock:\n        soft: -1\n        hard: -1\n      nofile:\n        soft: 65536\n        hard: 65536\n    cap_add:\n      - IPC_LOCK\n    volumes:\n      - elasticsearch-data17:/usr/share/elasticsearch/data\n    ports:\n      - 9200:9200\n      - 9300:9300\n\n  kibana:\n    container_name: kibana\n    image: docker.elastic.co/kibana/kibana:8.16.0\n    environment:\n      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200\n    ports:\n      - 5601:5601\n    depends_on:\n      - elasticsearch\n\nvolumes:\n  elasticsearch-data17:\n    driver: local"
  },
  {
    "path": "Code Examples/GenAI-RAG/index.py",
    "content": "import json, os  # Importing JSON for handling JSON data and os for interacting with the operating system\nfrom llama_index.core import Document, Settings  # Importing Document class and Settings for managing LlamaIndex\nfrom llama_index.core.node_parser import SentenceSplitter  # Importing SentenceSplitter to split text into smaller chunks\nfrom llama_index.core.ingestion import IngestionPipeline  # Importing IngestionPipeline for managing data ingestion\nfrom llama_index.embeddings.ollama import OllamaEmbedding  # Importing OllamaEmbedding for generating text embeddings\nfrom llama_index.vector_stores.elasticsearch import ElasticsearchStore  # Importing ElasticsearchStore for vector storage\nfrom dotenv import load_dotenv  # Importing load_dotenv to load environment variables from a .env file\n\ndef get_documents_from_file(file):\n    \"\"\"Reads a JSON file and returns a list of Document objects\"\"\"\n\n    # Open the JSON file in read-text mode\n    with open(file=file, mode='rt') as f:\n        conversations_dict = json.loads(f.read()) # Load the file contents into a Python dictionary\n      \n    # Create a list of Document objects using the 'conversation' field as text \n    # and 'conversation_id' field as metadata\n    documents = [Document(text=item['conversation'],\n                          metadata={\"conversation_id\": item['conversation_id']})\n                 for item in conversations_dict]\n    \n    return documents # Return the list of Document objects\n\n\n\n# Define an Elasticsearch vector store with configuration for local Elasticsearch\nes_vector_store = ElasticsearchStore(\n    index_name=\"calls\",  # Name of the Elasticsearch index\n    vector_field='conversation_vector',  # Field to store the vector representation of the text\n    text_field='conversation',  # Field to store the original text\n    es_url=\"http://localhost:9200\"  # URL of the local Elasticsearch instance\n)\n\n# Uncomment this if using Elastic Cloud and 
ensure ELASTIC_CLOUD_ID and ELASTIC_API_KEY are set in the .env file\n\n# Load the .env file contents into environment variables\n# This is used to access sensitive information like API keys or credentials\n# load_dotenv('.env')\n\n# es_vector_store = ElasticsearchStore(\n#     index_name=\"calls\",  # Name of the Elasticsearch index\n#     vector_field='conversation_vector',  # Field for vector embeddings\n#     text_field='conversation',  # Field for storing original text\n#     es_cloud_id=os.getenv(\"ELASTIC_CLOUD_ID\"),  # Cloud ID from the .env file\n#     es_api_key=os.getenv(\"ELASTIC_API_KEY\")  # API key from the .env file\n# )\n\ndef main():\n    ollama_embedding = OllamaEmbedding(\"mistral\")   # Initialize the embedding model for generating embeddings using the \"mistral\" model\n\n    # Set up an ingestion pipeline with transformations and the Elasticsearch vector store\n    pipeline = IngestionPipeline(   \n        transformations=[\n            \n            SentenceSplitter(chunk_size=350, chunk_overlap=50), # Split text into chunks of up to 350 tokens with 50 tokens of overlap\n            ollama_embedding, # Use the embedding model to generate embeddings for the chunks\n        ],\n        vector_store=es_vector_store  # Use the configured Elasticsearch vector store\n    )\n\n    documents = get_documents_from_file(file=\"conversations.json\")  # Load data from a JSON file and convert it to a list of Document objects\n\n    pipeline.run(documents=documents)   # Run the pipeline to process documents and store embeddings in Elasticsearch\n    print(\".....Done running pipeline.....\\n\")  # Print a completion message\n\n# Entry point of the script\nif __name__ == \"__main__\":\n    main()  # Call the main function\n"
  },
  {
    "path": "Code Examples/GenAI-RAG/query.py",
    "content": "# query.py\nfrom llama_index.core import VectorStoreIndex, QueryBundle, Response, Settings\nfrom llama_index.embeddings.ollama import OllamaEmbedding\nfrom llama_index.llms.ollama import Ollama\nfrom index_raw import es_vector_store\n\n# Local LLM to send user query to\nlocal_llm = Ollama(model=\"mistral\") # Initialize a local language model (LLM) using the \"mistral\" model from Ollama\nSettings.embed_model = OllamaEmbedding(\"mistral\")    # Use the \"mistral\" model to generate embeddings for queries\n\nindex = VectorStoreIndex.from_vector_store(es_vector_store) # Create a VectorStoreIndex from the existing Elasticsearch vector store\nquery_engine = index.as_query_engine(local_llm, similarity_top_k=10)    # Create a query engine from the index using the local LLM and set top-k similarity results to 10\n\n# Define the query string for the question you want to ask the system.\n# You'll see that it has some problems understanding the context,\n# especially how to find the policy number from the person's name.\n\n#query=\"Give me summary of water related issues\"\n#query=\"What policy number does emily green, born April 10th, 1988 have?\"\n#query=\"Who has the policy number DEF4567\"\n#query=\"What information about the person do you need to determine the policy number?\"\nquery=\"What policy number does emily green, living in 101 Pine St, Boston, MA 02101 have?\"\n\n# Create a QueryBundle object, which packages the query and its embedding\n# The embedding is generated using the configured embedding model in Settings\nbundle = QueryBundle(query, embedding=Settings.embed_model.get_query_embedding(query))\n\n# Use the query engine to execute the query bundle against the vector store\n# and retrieve the most relevant results\nresult = query_engine.query(bundle)\n\n# Print the results of the query to the console\nprint(result)\n"
  },
  {
    "path": "Code Examples/Movies.txt",
    "content": "Year;Length;Title;Subject;Actor;Actress;Director;Popularity;Awards;*Image\nINT;INT;STRING;CAT;CAT;CAT;CAT;INT;BOOL;STRING\n1990;111;Tie Me Up! Tie Me Down!;Comedy;Banderas, Antonio;Abril, Victoria;Almodóvar, Pedro;68;No;NicholasCage.png\n1991;113;High Heels;Comedy;Bosé, Miguel;Abril, Victoria;Almodóvar, Pedro;68;No;NicholasCage.png\n1983;104;Dead Zone, The;Horror;Walken, Christopher;Adams, Brooke;Cronenberg, David;79;No;NicholasCage.png\n1979;122;Cuba;Action;Connery, Sean;Adams, Brooke;Lester, Richard;6;No;seanConnery.png\n1978;94;Days of Heaven;Drama;Gere, Richard;Adams, Brooke;Malick, Terrence;14;No;NicholasCage.png\n1983;140;Octopussy;Action;Moore, Roger;Adams, Maud;Glen, John;68;No;NicholasCage.png\n1984;101;Target Eagle;Action;Connors, Chuck;Adams, Maud;Loma, José Antonio de la;14;No;NicholasCage.png\n1989;99;American Angels: Baptism of Blood, The;Drama;Bergen, Robert D.;Adams, Trudy;Sebastian, Beverly;28;No;NicholasCage.png\n1985;104;Subway;Drama;Lambert, Christopher;Adjani, Isabelle;Besson, Luc;6;No;NicholasCage.png\n1990;149;Camille Claudel;Drama;Depardieu, Gérard;Adjani, Isabelle;Nuytten, Bruno;32;No;NicholasCage.png\n1982;188;Fanny and Alexander;Drama;Ahlstedt, Börje;Adolphson, Kristina;Bergman, Ingmar;81;Yes;Bergman.png\n1982;117;Tragedy of a Ridiculous Man;Drama;Tognazzi, Ugo;Aimee, Anouk;Bertolucci, Bernardo;17;No;NicholasCage.png\n1966;103;A Man & a Woman;Drama;Trintignant, Jean-Louis;Aimee, Anouk;Lelouch, Claude;46;Yes;NicholasCage.png\n1986;112;A Man & a Woman: Twenty Years Later;Drama;Trintignant, Jean-Louis;Aimee, Anouk;Lelouch, Claude;49;No;NicholasCage.png\n1966;103;Un Hombre y una Mujer;Drama;Trintignant, Jean-Louis;Aimee, Anouk;Lelouch, Claude;6;Yes;NicholasCage.png\n1985;112;Official Story, The;Drama;Alterio, Hector;Aleandro, Norma;Puenzo, Luiz;39;Yes;NicholasCage.png\n1976;150;Lindbergh Kidnapping Case, The;Drama;Hopkins, Anthony;Alexander, Denise;Kulik, Buzz;51;No;AnthonyHopkins.png\n1929;84;Blackmail;Mystery;Longden, 
John;Algood, Sara;Hitchcock, Alfred;2;No;alfredHitchcock.png\n1963;109;Donovan's Reef;Comedy;Wayne, John;Allen, Elizabeth;Ford, John;62;No;johnWayne.png\n1988;110;Tucker: The Man & His Dream;Drama;Bridges, Jeff;Allen, Joan;Coppola, Francis Ford;68;No;NicholasCage.png\n1988;101;Scrooged;Comedy;Murray, Bill;Allen, Karen;Donner, Richard;15;No;NicholasCage.png\n1981;116;Raiders of the Lost Ark;Action;Ford, Harrison;Allen, Karen;Spielberg, Steven;8;No;NicholasCage.png\n1987;101;Running Man, The;Science Fiction;Schwarzenegger, Arnold;Alonso, Maria Conchita;Glaser, Paul Michael;31;No;NicholasCage.png\n1991;105;Predator 2;Action;Glover, Danny;Alonso, Maria Conchita;Hopkins, Stephen;79;No;NicholasCage.png\n1988;127;Colors;Drama;Penn, Sean;Alonso, Maria Conchita;Hopper, Dennis;23;No;NicholasCage.png\n1990;97;Zandalee;Drama;Cage, Nicolas;Anderson, Erika;Pillsbury, Sam;80;No;NicholasCage.png\n1988;108;Miles from Home;Drama;Anderson, Kevin;Anderson, Jo;Sinise, Gary;53;No;NicholasCage.png\n1980;;Happy Birthday to Me;Horror;Ford, Glenn;Anderson, Melissa Sue;Thompson, J. 
Lee;88;No;glennFord.png\n1989;88;Final Notice;Mystery;Gerard, Gil;Anderson, Melody;Stern, Steven Hilliard;88;No;NicholasCage.png\n1979;110;Quintet;Drama;Newman, Paul;Andersson, Bibi;Altman, Robert;19;No;paulNewman.png\n1960;90;Devil's Eye, The;Drama;Kulle, Jarl;Andersson, Bibi;Bergman, Ingmar;20;No;Bergman.png\n1957;91;Wild Strawberries;Drama;Sjöström, Victor;Andersson, Bibi;Bergman, Ingmar;42;Yes;Bergman.png\n1956;96;Seventh Seal, The;Drama;Sydow, Max von;Andersson, Bibi;Bergman, Ingmar;62;No;Bergman.png\n1992;90;Germicide;Drama;Taylor, Rod;Andersson, Bibi;;36;No;NicholasCage.png\n1955;86;Dreams;Drama;Björnstrand, Gunnar;Andersson, Harriet;Bergman, Ingmar;14;No;Bergman.png\n1955;95;Naked Night, The;Drama;Björnstrand, Gunnar;Andersson, Harriet;Bergman, Ingmar;38;No;Bergman.png\n1962;91;Through a Glass Darkly;Drama;Björnstrand, Gunnar;Andersson, Harriet;Bergman, Ingmar;64;Yes;Bergman.png\n1972;91;Cries & Whispers;Drama;Josephson, Erland;Andersson, Harriet;Bergman, Ingmar;18;Yes;Bergman.png\n1958;104;Barbarian & the Geisha, The;Action;Wayne, John;Ando, Eiko;Huston, John;52;No;johnWayne.png\n1967;130;Casino Royale;Comedy;Niven, David;Andress, Ursula;Hughes, Ken;11;No;NicholasCage.png\n1962;;Dr. 
No;Action;Connery, Sean;Andress, Ursula;Young, Terence;7;No;seanConnery.png\n1954;103;Elephant Walk;Drama;Finch, Peter;Andrews, Dana;;11;No;NicholasCage.png\n1979;121;Ten;Comedy;Moore, Dudley;Andrews, Julie;Edwards, Blake;60;No;NicholasCage.png\n1983;118;Man Who Loved Women, The;Comedy;Reynolds, Burt;Andrews, Julie;Edwards, Blake;67;No;NicholasCage.png\n1966;190;Hawaii;Drama;Sydow, Max von;Andrews, Julie;Hill, George Roy;8;No;NicholasCage.png\n1966;125;Torn Curtain;Mystery;Newman, Paul;Andrews, Julie;Hitchcock, Alfred;35;No;paulNewman.png\n1986;107;Duet for One;Drama;Bates, Alan;Andrews, Julie;Konchalovsky, Andrei;82;No;NicholasCage.png\n1965;172;Sound of Music, The;Music;Plummer, Christopher;Andrews, Julie;Wise, Robert;59;Yes;NicholasCage.png\n1985;55;Gonzo Presents Muppet Weird Stuff;Comedy;Cleese, John;Andrews, Julie;;88;No;NicholasCage.png\n1984;140;Tartuffe;Comedy;Depardieu, Gérard;Annen, Paule;Depardieu, Gérard;67;No;NicholasCage.png\n1988;104;A New Life;Comedy;Alda, Alan;Ann-Margret;Alda, Alan;53;No;NicholasCage.png\n1978;106;Magic;Mystery;Hopkins, Anthony;Ann-Margret;Attenborough, Richard;85;No;AnthonyHopkins.png\n1992;286;Tommy;Music;Daltry, Roger;Ann-Margret;Russell, Ken;5;No;NicholasCage.png\n1978;108;Big Fix, The;Mystery;Dreyfuss, Richard;Anspach, Susan;Kagan, Jeremy Paul;19;No;NicholasCage.png\n1992;95;Alan & Naomi;Drama;Haas, Lukas;Aquino, Vanessa;Vanwagenen, Sterling;3;No;NicholasCage.png\n1987;120;Fatal Attraction;Mystery;Douglas, Michael;Archer, Anne;Lyne, Adrian;61;No;NicholasCage.png\n1992;117;Patriot Games;Action;Ford, Harrison;Archer, Anne;Noyce, Phillip;28;No;NicholasCage.png\n1981;106;Woman Next Door, The;Drama;Depardieu, Gérard;Ardant, Fanny;Truffaut, François;82;No;NicholasCage.png\n1992;97;Hunting;Mystery;Savage, John;Armstrong, Kerry;Howson, Frank;68;No;NicholasCage.png\n1991;115;Bataan;War;Taylor, Robert;Arnaz, Desi;;68;No;NicholasCage.png\n1924;110;Siegfried, The Nibelungenlied;Drama;Richter, Paul;Arnold, Gertrud;Lang, 
Fritz;79;No;NicholasCage.png\n1991;90;Henry, Portrait of a Serial Killer;Horror;Rooker, Michael;Arnold, Tracy;;69;No;NicholasCage.png\n1988;118;Big Blue, The;Drama;Barr, Jean-Marc;Arquette, Rosanna;Besson, Luc;7;No;NicholasCage.png\n1991;115;Flight of the Intruder;Drama;Glover, Danny;Arquette, Rosanna;Milius, John;51;No;NicholasCage.png\n1986;108;Nobody's Fool;Comedy;Roberts, Eric;Arquette, Rosanna;Purcell, Evelyn;52;No;NicholasCage.png\n1985;97;After Hours;Comedy;Dunne, Griffin;Arquette, Rosanna;Scorsese, Martin;81;No;NicholasCage.png\n1985;104;Desperately Seeking Susan;Comedy;Quinn, Aidan;Arquette, Rosanna;Seidelman, Susan;41;No;NicholasCage.png\n1971;102;A New Leaf;Comedy;Matthau, Walter;Arrick, Rose;May, Elaine;83;No;NicholasCage.png\n1959;91;Killers of Kilimanjaro;Action;Taylor, Robert;Aslan, Gregoire;Thorpe, Richard;11;No;NicholasCage.png\n1926;126;Don Juan;Action;Barrymore, John;Astor, Mary;Crosland, Alan;55;No;NicholasCage.png\n1987;102;Babette's Feast;Drama;LaFont, Jean-Philippe;Audran, Stéphane;Axel, Gabriel;79;Yes;NicholasCage.png\n1989;118;Vincent, Francois, Paul & the Others;Drama;Montand, Yves;Audran, Stéphane;;20;No;NicholasCage.png\n1988;141;Thunderball;Action;Connery, Sean;Auger, Claudine;Young, Terrence;8;No;seanConnery.png\n1926;66;Lodger (Story of the London Fog);Mystery;Chesney, Arthur;Ault, Marie;Hitchcock, Alfred;76;No;alfredHitchcock.png\n1988;103;Appointment with Death;Mystery;Ustinov, Peter;Bacall, Lauren;Donaggio, Michael Winner;75;No;NicholasCage.png\n1974;128;Murder on the Orient Express;Mystery;Balsam, Martin;Bacall, Lauren;Lumet, Sidney;8;Yes;NicholasCage.png\n1955;115;Blood Alley;War;Wayne, John;Bacall, Lauren;Wellman, William;15;No;johnWayne.png\n1977;136;Spy Who Loved Me, The;Action;Moore, Roger;Bach, Barbara;Gilbert, Lewis;27;No;NicholasCage.png\n1988;100;Storm;Action;Palfy, David;Bahtia, Stacy Christensen;Winning, David;61;No;NicholasCage.png\n1991;89;Bloodbath;Horror;Hopper, Dennis;Baker, 
Carroll;;37;No;NicholasCage.png\n1989;103;Miami Cops;Action;Roundtree, Richard;Baker, Dawn;Bradley, Al;40;No;NicholasCage.png\n1996;96;Island of Dr. Moreau, The;Horror;Thewlis, David;Balk, Fairuza;Frankenheimer, John;39;No;NicholasCage.png\n1992;100;Eighty-Four Charing Cross Road;Drama;Hopkins, Anthony;Bancroft, Anne;Jones, David;9;No;AnthonyHopkins.png\n1980;124;Elephant Man, The;Drama;Hopkins, Anthony;Bancroft, Anne;Lynch, David;3;Yes;AnthonyHopkins.png\n1988;90;Dr Alien;Science Fiction;Jacoby, Billy;Barash, Olivia;DeCoteau, David;70;No;NicholasCage.png\n1982;120;Creepshow;Horror;Holbrook, Hal;Barbeau, Adrienne;Romero, George A.;70;No;NicholasCage.png\n1987;100;Sammy & Rosie Get Laid;Drama;Din, Ayub Khan;Barber, Frances;Frears, Stephen;6;No;NicholasCage.png\n1971;101;Goalie's Anxiety at the Penalty Kick, The;Drama;Brauss, Arthur;Bardischewski, Maria;Wenders, Wim;62;No;NicholasCage.png\n1957;99;Mademoiselle Striptease;Comedy;Gelin, Daniel;Bardot, Brigitte;Allegret, Marc;25;No;brigitteBardot.png\n1969;86;Women, The;Drama;Ronet, Maurice;Bardot, Brigitte;Aurel, Jean;66;No;brigitteBardot.png\n1958;77;That Naughty Girl;Comedy;Bretonniere, Jean;Bardot, Brigitte;Boisrond, Michel;37;No;brigitteBardot.png\n1959;90;Voulez-Vous Danser Avec Moi?;Comedy;Vidal, Henri;Bardot, Brigitte;Boisrond, Michel;16;No;brigitteBardot.png\n1967;100;A Coeur Joie, (Head Over Heels);Action;Terzieff, Laurent;Bardot, Brigitte;Bourguignon, Serge;54;No;brigitteBardot.png\n1968;113;Shalako;Westerns;Connery, Sean;Bardot, Brigitte;Dmytryk, Edward;0;No;brigitteBardot.png\n1964;102;Contempt;Drama;Palance, Jack;Bardot, Brigitte;Godard, Jean-Luc;81;No;brigitteBardot.png\n1965;100;Dear Brigitte;Comedy;Mumy, Billy;Bardot, Brigitte;Koster, Henry;71;No;brigitteBardot.png\n1962;134;A Very Private Affair;Drama;Mastroianni, Marcello;Bardot, Brigitte;Malle, Louis;30;No;brigitteBardot.png\n1964;99;Ravishing Idiot, The;Comedy;Perkins, Anthony;Bardot, Brigitte;Molinaro, 
Edouard;34;No;brigitteBardot.png\n1958;90;Bride Is Much Too Beautiful, The;Comedy;Jourdan, Louis;Bardot, Brigitte;Surin, Fred;70;No;brigitteBardot.png\n1955;90;Doctor at Sea;Comedy;Bogarde, Dirk;Bardot, Brigitte;Thomas, Ralph;83;No;brigitteBardot.png\n1962;100;Le Repos Du Guerrier, (Warrior's Rest);War;Hossein, Robert;Bardot, Brigitte;Vadim, Roger;8;No;brigitteBardot.png\n1957;90;And God Created Woman;Drama;Jurgens, Curt;Bardot, Brigitte;Vadim, Roger;29;No;brigitteBardot.png\n1973;87;Ms. Don Juan;Drama;Ronet, Maurice;Bardot, Brigitte;Vadim, Roger;39;No;brigitteBardot.png\n1987;97;Siesta;Drama;Byrne, Gabriel;Barkin, Ellen;Lambert, Mary;48;No;NicholasCage.png\n1932;92;Rich & Strange;Drama;Kendall, Henry;Barry, Joan;Hitchcock, Alfred;57;No;alfredHitchcock.png\n1987;104;Lionheart;Action;Stoltz, Eric;Barrymore, Deborah;Schaffner, Franklin J.;9;No;NicholasCage.png\n1982;115;E. T. The Extra-Terrestrial;Science Fiction;Wallace, Dee;Barrymore, Drew;Spielberg, Steven;8;Yes;NicholasCage.png\n1992;101;Cool World;Drama;Byrne, Gabriel;Basinger, Kim;Bakshi, Ralph;44;No;NicholasCage.png\n1988;83;Nadine;Comedy;Bridges, Jeff;Basinger, Kim;Benton, Robert;47;No;NicholasCage.png\n1989;126;Batman;Action;Nicholson, Jack;Basinger, Kim;Burton, Tim;14;No;JackNicholson.png\n1987;95;Blind Date;Comedy;Willis, Bruce;Basinger, Kim;Edwards, Blake;7;No;NicholasCage.png\n1982;101;Mother Lode;Action;Heston, Charlton;Basinger, Kim;Heston, Charlton;40;No;NicholasCage.png\n1992;125;Final Analysis;Drama;Gere, Richard;Basinger, Kim;Joanou, Phil;50;No;NicholasCage.png\n1983;134;Never Say Never Again;Action;Connery, Sean;Basinger, Kim;Kershner, Irvin;8;No;seanConnery.png\n1986;117;Nine & a Half Weeks;Drama;Rourke, Mickey;Basinger, Kim;Lyne, Adrian;7;No;NicholasCage.png\n1989;;Killjoy;Mystery;Culp, Robert;Basinger, Kim;Moxey, John Llewellyn;71;No;NicholasCage.png\n1986;108;No Mercy;Drama;Gere, Richard;Basinger, Kim;Pearce, Richard;11;No;NicholasCage.png\n1991;116;Marrying Man, The;Comedy;Baldwin, 
Alec;Basinger, Kim;Rees, Jerry;84;No;NicholasCage.png\n1990;123;Misery;Horror;Caan, James;Bates, Kathy;Reiner, Rob;48;Yes;NicholasCage.png\n1946;93;Crisis;Drama;Andersson, Wiktor;Baude, Anna-Lisa;Bergman, Ingmar;66;No;Bergman.png\n1984;95;Samson & Delilah;Drama;Hamilton, Antony;Bauer, Belinda;Philips, Lee;36;No;NicholasCage.png\n1990;101;Act of Piracy;Mystery;Busey, Gary;Bauer, Belinda;;74;No;NicholasCage.png\n1988;96;Split Decisions;Drama;Hackman, Gene;Beals, Jennifer;Drury, David;52;No;NicholasCage.png\n1989;103;Vampire's Kiss;Comedy;Cage, Nicolas;Beals, Jennifer;;49;No;NicholasCage.png\n1988;96;Nightmare at Noon;Action;Hauser, Wings;Beck, Kimberly;Mastorakis, Nico;0;No;NicholasCage.png\n1990;127;Presumed Innocent;Mystery;Ford, Harrison;Bedelia, Bonnie;Pakula, Alan J.;69;No;NicholasCage.png\n1942;123;Reap the Wild Wind;Drama;Wayne, John;Beecher, Janet;DeMille, Cecil B.;59;No;johnWayne.png\n1972;100;Pocket Money;Comedy;Newman, Paul;Belford, Christine;Rosenberg, Stuart;55;No;paulNewman.png\n1977;102;Mary White;Drama;Flanders, Ed;Beller, Kathleen;Taylor, Jud;2;No;NicholasCage.png\n1982;;Catch a Rising Star, Tenth Anniversary;Comedy;Belzer, Richard;Benatar, Pat;;18;No;NicholasCage.png\n1990;105;Guilty by Suspicion;Drama;De Niro, Robert;Bening, Annette;Winkler, Irwin;88;No;NicholasCage.png\n1948;99;Secret Beyond the Door;Mystery;Redgrave, Michael;Bennett, Joan;Lang, Fritz;31;No;NicholasCage.png\n1945;103;Scarlet Street;Drama;Robinson, Edward G.;Bennett, Joan;Lang, Fritz;80;No;NicholasCage.png\n1988;76;Daffy Duck's Quackbusters;Action;Blanc, Mel;Bennett, Julie;Ford, Greg;68;No;NicholasCage.png\n1985;55;Rowlf's Rhapsodies with the Muppets;Comedy;Burns, George;Berenson, Marisa;;79;No;NicholasCage.png\n1982;188;Gandhi;Drama;Kingsley, Ben;Bergen, Candice;Attenborough, Richard;7;Yes;NicholasCage.png\n1975;120;Wind & the Lion, The;Action;Connery, Sean;Bergen, Candice;Milius, John;2;No;seanConnery.png\n1971;96;Carnal Knowledge;Drama;Nicholson, Jack;Bergen, Candice;Nichols, 
Mike;10;No;JackNicholson.png\n1970;126;Getting Straight;Comedy;Gould, Elliott;Bergen, Candice;Rush, Richard;83;No;NicholasCage.png\n1972;90;Scarlet Letter, The;Drama;Albaicín, Rafael;Berger, Senta;Wenders, Wim;55;No;NicholasCage.png\n1935;75;Count of Old Town, The;Comedy;Adolphson, Edvin;Bergman, Ingrid;Adolphson, Edvin;72;No;ingridBergman.png\n1978;97;Autumn Sonata;Drama;Björk, Halvar;Bergman, Ingrid;Bergman, Ingmar;49;Yes;ingridBergman.png\n1944;114;Gaslight;Drama;Boyer, Charles;Bergman, Ingrid;Cukor, George;25;Yes;ingridBergman.png\n1958;100;Indiscreet;Drama;Grant, Cary;Bergman, Ingrid;Donen, Stanley;1;No;ingridBergman.png\n1941;75;Walpurgis Night;Drama;Sjöström, Victor;Bergman, Ingrid;Edgren, Gustaf;32;No;ingridBergman.png\n1948;100;Joan of Arc;Drama;Ferrer, Jose;Bergman, Ingrid;Fleming, Victor;7;No;ingridBergman.png\n1982;195;A Woman Called Golda;Drama;Beatty, Ned;Bergman, Ingrid;Gibson, Alan;15;Yes;ingridBergman.png\n1969;98;A Walk in the Spring Rain;Drama;Quinn, Anthony;Bergman, Ingrid;Green, Guy;2;No;ingridBergman.png\n1949;117;Under Capricorn;Drama;Cotten, Joseph;Bergman, Ingrid;Hitchcock, Alfred;74;No;ingridBergman.png\n1946;101;Notorious;Mystery;Grant, Cary;Bergman, Ingrid;Hitchcock, Alfred;42;No;ingridBergman.png\n1940;90;June Night;Drama;Widgren, Olof;Bergman, Ingrid;Lindberg, Per;14;No;ingridBergman.png\n1961;120;Goodbye Again;Drama;Perkins, Anthony;Bergman, Ingrid;Litvak, Anatole;6;No;ingridBergman.png\n1956;106;Anastasia;Drama;Tamiroff, Akim;Bergman, Ingrid;Litvak, Anatole;24;Yes;ingridBergman.png\n1945;126;Bells of St. 
Mary's, The;Drama;Crosby, Bing;Bergman, Ingrid;McCarey, Leo;31;No;ingridBergman.png\n1937;91;Intermezzo;Drama;Ekman, Gösta;Bergman, Ingrid;Molander, Gustaf;32;No;ingridBergman.png\n1938;104;A Woman's Face;Drama;Svennberg, Tore;Bergman, Ingrid;Molander, Gustaf;49;No;ingridBergman.png\n1935;90;Swedenhielms;Drama;Westergren, Håkan;Bergman, Ingrid;Molander, Gustaf;88;No;ingridBergman.png\n1939;87;Only One Night;Drama;Adolphson, Edvin;Bergman, Ingrid;Molander, Gustav;26;No;ingridBergman.png\n1938;74;Dollar;Drama;Rydeberg, Georg;Bergman, Ingrid;Molander, Gustav;19;No;ingridBergman.png\n1956;98;Elena & Her Men;Drama;Ferrer, Mel;Bergman, Ingrid;Renoir, Jean;33;No;ingridBergman.png\n1952;110;Europa Fifty-One;Drama;Knox, Alexander;Bergman, Ingrid;Rossellini, Roberto;34;No;ingridBergman.png\n1953;83;Voyage in Italy;Drama;Sanders, George;Bergman, Ingrid;Rossellini, Roberto;57;No;ingridBergman.png\n1954;81;Fear;Drama;Wieman, Mathias;Bergman, Ingrid;Rossellini, Roberto;69;No;ingridBergman.png\n1950;107;Stromboli;Drama;Vitale, Mario;Bergman, Ingrid;Rossellini, Roberto;69;No;ingridBergman.png\n1969;103;Cactus Flower;Comedy;Matthau, Walter;Bergman, Ingrid;Saks, Gene;67;Yes;ingridBergman.png\n1989;105;Hideaways;Comedy;Conover, Bruce;Bergman, Ingrid;;16;No;ingridBergman.png\n1990;90;Twenty Four Hours in a Woman's Life;Drama;Torn, Rip;Bergman, Ingrid;;16;No;ingridBergman.png\n1987;91;Programmed to Kill;Action;Ginty, Robert;Bergman, Sandahl;Holzman, Allan;71;No;NicholasCage.png\n1982;128;Conan the Barbarian;Action;Schwarzenegger, Arnold;Bergman, Sandahl;Milius, John;45;No;NicholasCage.png\n1991;91;Raw Nerve;Mystery;Ford, Glenn;Bergman, Sandahl;Prior, David A.;88;No;glennFord.png\n1970;94;Think Dirty;Comedy;Feldman, Marty;Berman, Shelley;Clark, Jim;31;No;NicholasCage.png\n1982;108;King of Comedy;Drama;De Niro, Robert;Bernhard, Sandra;Scorsese, Martin;84;No;NicholasCage.png\n1983;60;Best of the Big Laff Off, The;Comedy;Murphy, Eddie;Bernhard, 
Sandra;;20;No;NicholasCage.png\n1984;158;Amadeus;Drama;Abraham, F. Murray;Berridge, Elizabeth;Forman, Milos;6;Yes;NicholasCage.png\n1973;101;White Lightning;Action;Reynolds, Burt;Billingsley, Jennifer;Sargent, Joseph;54;No;NicholasCage.png\n1988;172;Unbearable Lightness of Being, The;Drama;Day-Lewis, Daniel;Binoche, Juliette;Kaufman, Philip;5;Yes;NicholasCage.png\n1972;124;Life & Times of Judge Roy Bean, The;Western;Newman, Paul;Bisset, Jacqueline;Huston, John;65;No;paulNewman.png\n1970;137;Airport;Drama;Lancaster, Burt;Bisset, Jacqueline;Seaton, George;0;Yes;burtLancaster.png\n1973;116;Day for Night;Drama;Aumont, Jean-Pierre;Bisset, Jacqueline;Truffaut, François;10;Yes;NicholasCage.png\n1952;107;Secrets of Women;Comedy;Malmsten, Birger;Björk, Anita;Bergman, Ingmar;66;No;Bergman.png\n1976;116;Burnt Offerings;Horror;Reed, Oliver;Black, Karen;Curtis, Dan;35;No;NicholasCage.png\n1969;94;Easy Rider;Drama;Fonda, Peter;Black, Karen;Hopper, Dennis;36;No;NicholasCage.png\n1991;98;Five Easy Pieces;Drama;Nicholson, Jack;Black, Karen;Rafelson, Bob;2;No;JackNicholson.png\n1974;144;Day of the Locust, The;Drama;Sutherland, Donald;Black, Karen;Schlesinger, John;81;No;NicholasCage.png\n1964;112;Goldfinger;Action;Connery, Sean;Blackman, Honor;Hamilton, Guy;77;No;seanConnery.png\n1977;117;Exorcist II, The Heretic;Horror;Burton, Richard;Blair, Linda;Boorman, John;29;No;NicholasCage.png\n1953;61;White Lightning;;Clements, Stanley;Blondell, Gloria;Bernds, Edward;;No;NicholasCage.png\n1942;88;Lady for a Night;Drama;Wayne, John;Blondell, Joan;Leigh, Jason;12;No;johnWayne.png\n1968;103;Charly;Drama;Robertson, Cliff;Bloom, Claire;Nelson, Ralph;38;Yes;NicholasCage.png\n1973;105;High Plains Drifter;Western;Eastwood, Clint;Bloom, Verna;Eastwood, Clint;57;No;clintEastwood.png\n1982;123;Honkytonk Man;Drama;Eastwood, Clint;Bloom, Verna;Eastwood, Clint;69;No;clintEastwood.png\n1990;102;Nightbreed;Horror;Cronenberg, David;Bobby, Anne;Barker, Clive;72;No;NicholasCage.png\n1987;98;Under the Sun of 
Satan;Drama;Depardieu, Gérard;Bonnaire, Sandrine;Pialat, Maurice;45;No;NicholasCage.png\n1985;105;Vagabond;Drama;Meril, Macha;Bonnaire, Sandrine;Varda, Agnes;49;No;NicholasCage.png\n1993;60;Bill Cosby, Live at Harrah's;Comedy;Cosby, Bill;Boosler, Elayne;;13;No;NicholasCage.png\n1974;89;Monty Python & the Holy Grail;Comedy;Chapman, Graham;Booth, Connie;Gilliam, Terry;83;No;NicholasCage.png\n1993;65;John Cleese on How to Irritate People;Comedy;Cleese, John;Booth, Connie;;62;No;NicholasCage.png\n1958;101;Matchmaker, The;Comedy;Perkins, Anthony;Booth, Shirley;Anthony, Joseph;67;No;NicholasCage.png\n1981;129;For Your Eyes Only;Action;Moore, Roger;Bouquet, Carole;Glen, John;86;No;NicholasCage.png\n1928;139;Wings;War;Rogers, Buddy;Bow, Clara;Wellman, William;44;Yes;NicholasCage.png\n1992;106;Medicine Man;Action;Connery, Sean;Bracco, Lorraine;McTiernan, John;6;No;seanConnery.png\n1989;;Good Fellas;Drama;De Niro, Robert;Bracco, Lorraine;Scorsese, Martin;15;No;NicholasCage.png\n1985;119;Kiss of the Spider Woman;Drama;Hurt, William;Braga, Sonia;Babenco, Hector;10;Yes;NicholasCage.png\n1990;121;Rookie, THe;Action;Eastwood, Clint;Braga, Sonia;Eastwood, Clint;48;No;clintEastwood.png\n1973;129;Sting, The;Drama;Newman, Paul;Brennan, Eileen;Hill, George Roy;83;Yes;paulNewman.png\n1958;96;Torpedo Run;War;Ford, Glenn;Brewster, Diane;Pevney, Joseph;50;No;glennFord.png\n1986;101;Instant Justice;Drama;Paré, Michael;Bridges, Lynda;Rumar, Craig;45;No;NicholasCage.png\n1990;135;Cyrano de Bergerac;Drama;Depardieu, Gérard;Brochet, Anne;Rappeneau, Jean-Paul;76;No;NicholasCage.png\n1948;110;Border Street;Drama;Fijewski, Tadeusz;Broniewska, Maria;Ford, Aleksander;73;No;NicholasCage.png\n1987;91;Firehouse;Comedy;Hopkins, Barrett;Brown, Violet;Ingvordsen, J. 
Christian;66;No;NicholasCage.png\n1965;123;Morituri;Drama;Brando, Marlon;Brynner, Yul;Wicki, Bernhard;9;No;brando.png\n1980;104;From the Life of the Marionettes;Drama;Atzorn, Robert;Buchegger, Christine;Bergman, Ingmar;58;No;Bergman.png\n1988;120;Frantic;Mystery;Ford, Harrison;Buckley, Betty;Polanski, Roman;17;No;NicholasCage.png\n1978;114;Coma;Science Fiction;Douglas, Michael;Bujold, Geneviève;Crichton, Michael;64;No;NicholasCage.png\n1988;117;Dead Ringers;Drama;Irons, Jeremy;Bujold, Geneviève;Cronenberg, David;29;No;NicholasCage.png\n1988;90;Golden Ninja Invasion;Action;West, Leonard;Burd, Stephanie;Lambert, Bruce;13;No;NicholasCage.png\n1973;122;Exorcist, The;Horror;Sydow, Max von;Burstyn, Ellen;Friedkin, William;28;Yes;NicholasCage.png\n1975;112;Alice Doesn't Live Here Anymore;Comedy;Kristofferson, Kris;Burstyn, Ellen;;82;Yes;NicholasCage.png\n1982;94;Eyes of the Amaryllis, The;Drama;Bolt, Jonathan;Byrne, Martha;King Keller, Frederick;70;No;NicholasCage.png\n1952;109;What Price Glory?;War;Cagney, James;Calvet, Corinne;Ford, John;4;No;johnFord.png\n1954;40;Inauguration of the Pleasure Dome;Short;De Brier, Sampson;Cameron, Marjorie;Anger, Kenneth;62;No;NicholasCage.png\n1989;114;School Daze;Comedy;Fishburne, Larry;Campbell, Tisha;Lee, Spike;18;No;NicholasCage.png\n1990;102;End of Innocence, The;Drama;Heard, John;Cannon, Dyan;Cannon, Dyan;6;No;NicholasCage.png\n1971;98;Anderson Tapes, The;Mystery;Connery, Sean;Cannon, Dyan;Lumet, Sidney;1;No;seanConnery.png\n1983;50;Father Murphy, A Horse from Heaven;Comedy;Olsen, Merlin;Cannon, Katharine;Claxton, William F.;28;No;NicholasCage.png\n1989;80;Skull;Drama;Bideman, Robert;Capone, Nadia;Bergman, Robert;19;No;NicholasCage.png\n1987;91;Quick & The Dead, The;Western;Elliott, Sam;Capshaw, Kate;Day, Robert;40;No;NicholasCage.png\n1984;94;Best Defense;Comedy;Moore, Dudley;Capshaw, Kate;Huyck, Willard;75;No;NicholasCage.png\n1984;99;Dreamscape;Science Fiction;Quaid, Dennis;Capshaw, Kate;Ruben, 
Joseph;63;No;NicholasCage.png\n1989;125;Black Rain;Action;Douglas, Michael;Capshaw, Kate;Scott, Ridley;73;No;NicholasCage.png\n1963;138;8 1/2;Drama;Mastroianni, Marcello;Cardinale, Claudia;Fellini, Federico;80;Yes;NicholasCage.png\n1935;64;One Frightened Night;Horror;Ford, Wallace;Carlisle, Mary;Cabanne, Christy;33;No;NicholasCage.png\n1988;103;Year My Voice Broke, The;Drama;Taylor, Noah;Carmen, Loene;Duigan, John;71;No;NicholasCage.png\n1966;175;Is Paris Burning?;War;Belmondo, Jean-Paul;Caron, Leslie;Clément, René;63;No;NicholasCage.png\n1974;313;QB VII;Drama;Hopkins, Anthony;Caron, Leslie;Gries, Tom;28;Yes;AnthonyHopkins.png\n1977;104;Island of Dr. Moreau, The;Horror;Lancaster, Burt;Carrera, Barbara;Taylor, Don;54;No;burtLancaster.png\n1983;104;Beyond the Limit;Drama;Caine, Michael;Carrillo, Elpidia;Mackenzie, John;51;No;NicholasCage.png\n1936;84;Secret Agent;Mystery;Lorre, Peter;Carroll, Madeleine;Hitchcock, Alfred;50;No;alfredHitchcock.png\n1986;71;Paramount Comedy Theater: Well-Developed;Comedy;Mahler, Bruce;Carter, Judy;;40;No;NicholasCage.png\n1972;71;Big Bust Out, The;Action;Kendall, Tony;Carter, Karen;Theumer, Ernst R. 
von;50;No;NicholasCage.png\n1987;119;Fourth Protocol, The;Mystery;Caine, Michael;Cassidy, Joanna;Mackenzie, John;14;No;NicholasCage.png\n1990;107;Gremlins 2: The New Batch;Comedy;Galligan, Zach;Cates, Phoebe;Dante, Joe;61;No;NicholasCage.png\n1982;92;Fast Times at Ridgemont High;Comedy;Penn, Sean;Cates, Phoebe;Heckerling, Amy;65;No;NicholasCage.png\n1987;;Mannequin;Comedy;McCarthy, Andrew;Cattrall, Kim;Gottlieb, Michael;23;No;NicholasCage.png\n1977;91;Rabid;Horror;Moore, Frank;Chambers, Marilyn;Cronenberg, David;34;No;NicholasCage.png\n1990;;Party, The;Comedy;Sellers, Peter;Champion, Marge;Edwards, Blake;32;No;NicholasCage.png\n1989;90;Vampire Raiders, Ninja Queen;Action;Peterson, Chris;Chan, Agnes;Lambert, Bruce;15;No;NicholasCage.png\n1970;26;Bloopers from Star Trek;Comedy;Lawford, Peter;Channing, Carol;;22;No;NicholasCage.png\n1943;99;Destroyer;Action;Robinson, Edward G.;Chapman, Marguerite;Seiter, William A.;87;No;NicholasCage.png\n1992;99;Party Girl;Comedy;Taylor, Robert;Charisse, Cyd;Ray, Nicholas;85;No;NicholasCage.png\n1989;113;Twin Peaks;Mystery;MacLachlan, Kyle;Chen, Joan;Lynch, David;86;No;kyle.png\n1987;103;Moonstruck;Comedy;Cage, Nicholas;Cher;Jewison, Norman;6;Yes;NicholasCage.png\n1987;119;Witches of Eastwick, The;Comedy;Nicholson, Jack;Cher;Miller, George;8;No;NicholasCage.png\n1979;128;Moonraker;Action;Moore, Roger;Chiles, Lois;Gilbert, Lewis;32;No;NicholasCage.png\n1984;106;Beat Street;Drama;Davis, Guy;Chong, Rae Dawn;Lathan, Stan;72;No;NicholasCage.png\n1986;88;Running Out of Luck;Comedy;Jagger, Mick;Chong, Rae Dawn;;16;No;NicholasCage.png\n1989;90;Never on Tuesday;Drama;Lauer, Andrew;Christian, Claudia;Rifkin, Adam;77;No;NicholasCage.png\n1975;109;Shampoo;Comedy;Beatty, Warren;Christie, Julie;Ashby, Hal;69;Yes;NicholasCage.png\n1985;111;Power;Drama;Hackman, Gene;Christie, Julie;Lumet, Sidney;43;No;NicholasCage.png\n1965;122;Darling;Drama;Harvey, Laurence;Christie, Julie;Schlesinger, John;44;Yes;NicholasCage.png\n1963;120;Ugly American, 
The;Drama;Brando, Marlon;Church, Sandra;Englund, George;63;No;brando.png\n1931;68;Ambassador Bill;Comedy;Rogers, Will;Churchill, Marguerite;Taylor, Sam;66;No;NicholasCage.png\n1931;110;Big Trail, The;Western;Wayne, John;Churchill, Marguerite;Walsh, Raoul;22;No;johnWayne.png\n1967;111;Hombre;Western;Newman, Paul;Cilento, Diane;Ritt, Martin;50;No;paulNewman.png\n1968;103;Coogan's Bluff;Action;Eastwood, Clint;Clark, Susan;Siegel, Don;57;No;clintEastwood.png\n1989;91;Penn & Teller Get Killed;Comedy;Penn, Jillette;Clarke, Caitlin;Penn, Arthur;12;No;NicholasCage.png\n1987;118;Shy People;Drama;Philbin, John;Clayburgh, Jill;Konchalovsky, Andrei;7;No;NicholasCage.png\n1980;91;It's My Turn;Comedy;Douglas, Michael;Clayburgh, Jill;Weill, Claudia;0;No;NicholasCage.png\n1988;119;Dangerous Liaisons;Drama;Malkovich, John;Close, Glenn;Frears, Stephen;77;No;MichellePfeiffer.png\n1990;111;Reversal of Fortune;Drama;Irons, Jeremy;Close, Glenn;Schroeder, Barbet;73;Yes;NicholasCage.png\n1991;119;Meeting Venus;Comedy;Arestrup, Niels;Close, Glenn;Szabó, István;74;No;NicholasCage.png\n1946;105;Tomorrow Is Forever;Drama;Welles, Orson;Colbert, Claudette;;65;No;NicholasCage.png\n1987;101;Like Father Like Son;Comedy;Cameron, Kirk;Colin, Margaret;Daniel, Rod;20;No;NicholasCage.png\n1948;81;Rope;Drama;Stewart, James;Collier, Constance;Hitchcock, Alfred;39;No;alfredHitchcock.png\n1962;91;Road to Hong Kong;Comedy;Hope, Bob;Collins, Joan;Panama, Norman;37;No;NicholasCage.png\n1989;108;Shirley Valentine;Comedy;Conti, Tom;Collins, Pauline;Gilbert, Lewis;51;No;NicholasCage.png\n1992;135;City of Joy;Drama;Swayze, Patrick;Collins, Pauline;Joffe, Roland;87;No;NicholasCage.png\n1966;99;Appaloosa, The;Western;Brando, Marlon;Comer, Anjanette;Furie, Sidney J.;15;No;brando.png\n1986;88;Seven Minutes in Heaven;Comedy;Thames, Byron;Connelly, Jennifer;Feferman, Linda;49;No;NicholasCage.png\n1991;96;Hearts of Darkness, A Filmmaker's Apocalypse;Drama;Bottoms, Sam;Coppola, Eleanor;Bahr, 
Fax;72;No;NicholasCage.png\n1961;66;Tonight for Sure;Comedy;Lee, Karla;Cornell, Laura;Coppola, Francis Ford;4;No;NicholasCage.png\n1990;110;White Hunter, Black Heart;Adventure;Eastwood, Clint;Cornwell, Charlotte;Eastwood, Clint;66;No;clintEastwood.png\n1962;110;Sundays & Cybele;Drama;Kruger, Hardy;Courcel, Nicole;Bourguignon, Serge;11;Yes;NicholasCage.png\n1989;90;Puppet Master;Science Fiction;LeMat, Paul;Crampton, Barbara;Schmoeller, David;20;No;NicholasCage.png\n1991;95;Night Gallery;Horror;McDowall, Roddy;Crawford, Joan;Spielberg, Steven;31;No;NicholasCage.png\n1989;103;Pet Sematary;Horror;Gwynne, Fred;Crosby, Denise;Lambert, Mary;27;No;NicholasCage.png\n1992;60;America's Music, Gospel;Music;Phipps, Wentley;Crouch, Sandra;Walton, Kip;13;No;NicholasCage.png\n1977;123;Slap Shot;Comedy;Newman, Paul;Crouse, Lindsay;Hill, George Roy;82;No;paulNewman.png\n1987;109;O. C. & Stiggs;Comedy;Jenkins, Daniel H.;Curtin, Jane;Altman, Robert;3;No;NicholasCage.png\n1988;108;A Fish Called Wanda;Comedy;Cleese, John;Curtis, Jamie Lee;Crichton, Charles;7;Yes;NicholasCage.png\n1954;96;A Lesson in Love;Comedy;Björnstrand, Gunnar;Dahlbeck, Eva;Bergman, Ingmar;48;No;Bergman.png\n1957;82;Brink of Life;Drama;Josephson, Erland;Dahlbeck, Eva;Bergman, Ingmar;57;No;Bergman.png\n1986;120;Betty Blue;Drama;Anglade, Jean-Hughes;Dalle, Béatrice;Beineix, Jean-Jacques;71;No;NicholasCage.png\n1979;122;Hair;Music;Savage, John;D'Angelo, Beverly;Forman, Milos;67;No;NicholasCage.png\n1989;97;National Lampoon's Christmas Vacation;Comedy;Chase, Chevy;D'Angelo, Beverly;S, Jeremiah;81;No;NicholasCage.png\n1974;124;Dersu Uzala, (The Hunter);Adventure;Solomin, Yuri;Danilchenko, Svetlana;Kurosawa, Akira;81;Yes;NicholasCage.png\n1990;106;Alice;Comedy;Baldwin, Alec;Danner, Blythe;Allen, Woody;22;No;woody.png\n1980;90;Fifth Floor, The;Mystery;Hopkins, Bo;D'Arbanville, Patti;Avedis, Howard Hikmet;74;No;NicholasCage.png\n1990;94;Snow Kill;Drama;Knox, Terence;D'Arbanville, Patti;Wright, Thomas 
J.;35;No;NicholasCage.png\n1971;74;People, The;Drama;Shatner, William;Darby, Kim;Coppola, Francis Ford;36;No;NicholasCage.png\n1969;128;True Grit;Western;Wayne, John;Darby, Kim;Hathaway, Henry;77;Yes;johnWayne.png\n1942;18;Battle of Midway, The;War;Crisp, Donald;Darwell, Jane;Ford, John;75;No;johnFord.png\n1948;103;Three Godfathers;Western;Wayne, John;Darwell, Jane;Ford, John;72;No;johnWayne.png\n1965;133;Hush, Hush, Sweet Charlotte;Mystery;Cotten, Joseph;Davis, Bette;Aldrich, Robert;68;No;NicholasCage.png\n1946;110;A Stolen Life;Drama;Ford, Glenn;Davis, Bette;Bernhardt, Curtis;20;No;glennFord.png\n1939;96;Old Maid, The;Drama;Brent, George;Davis, Bette;Goulding, Edmund;18;No;NicholasCage.png\n1950;138;All about Eve;Drama;Sanders, George;Davis, Bette;Mankiewicz, Joseph L.;23;Yes;NicholasCage.png\n1986;96;Fly, The;Horror;Goldblum, Jeff;Davis, Geena;Cronenberg, David;33;No;NicholasCage.png\n1990;89;Quick Change;Comedy;Murray, Bill;Davis, Geena;Franklin, Howard ;24;No;NicholasCage.png\n1988;93;Lair of the White Worm, The;Horror;Grant, Hugh;Davis, Sammi;Russell, Ken;16;No;NicholasCage.png\n1989;104;Rainbow, The;Drama;Hemmings, David;Davis, Sammi;Russell, Ken;53;No;NicholasCage.png\n1956;120;Man Who Knew Too Much, The;Mystery;Stewart, James;Day, Doris;Hitchcock, Alfred;15;No;alfredHitchcock.png\n1992;90;Beauty & the Beast;Science Fiction;Marais, Jean;Day, Josette;Cocteau, Jean;14;No;NicholasCage.png\n1940;120;Foreign Correspondent;Mystery;McCrea, Joel;Day, Laraine;Hitchcock, Alfred;61;No;alfredHitchcock.png\n1949;115;Heiress, The;Drama;Richardson, Ralph;De Havilland, Olivia;Wyler, William;81;Yes;NicholasCage.png\n1986;120;Boy Who Could Fly, The;Drama;Underwood, Jay;Deakins, Lucy;Castle, Nick;25;No;NicholasCage.png\n1975;89;Terrorists, The;Action;Connery, Sean;Dean, Isabel;Wrede, Caspar;4;No;seanConnery.png\n1942;85;Wheel of Fortune;Drama;Wayne, John;Dee, Frances;Auer, John H.;36;No;johnWayne.png\n1989;120;Do the Right Thing;Drama;Aiello, Danny;Dee, Ruby;Lee, 
Spike;5;No;NicholasCage.png\n1990;93;Court-Martial of Jackie Robinson, The;Drama;Braugher, Andre;Dee, Ruby;Peerce, Larry;33;No;NicholasCage.png\n1967;90;Elvira Madigan;Drama;Berggren, Thommy;Degermark, Pia;Widerberg, Bo;28;No;NicholasCage.png\n1992;86;Hurricane Smith;Action;Weathers, Carl;Delaney, Cassandra;Budds, Colin;16;No;NicholasCage.png\n1987;86;Fair Game;Action;Ford, Peter;Delaney, Cassandra;;24;No;NicholasCage.png\n1989;95;Rape of the Sabines, The;Action;Moore, Roger;Demongeot, Mylene;;83;No;NicholasCage.png\n1983;99;Risky Business;Comedy;Cruise, Tom;DeMornay, Rebecca;Brickman, Paul;28;No;NicholasCage.png\n1980;103;I Love All of You (Je Vous Aime);Drama;Depardieu, Gérard;Deneuve, Catherine;Berri, Claude;40;No;NicholasCage.png\n1986;108;Love Songs;Drama;Lambert, Christopher;Deneuve, Catherine;Chouraqui, Elie;15;No;NicholasCage.png\n1983;114;Le Choix des Armes;Mystery;Montand, Yves;Deneuve, Catherine;Comeau, Alain;15;No;NicholasCage.png\n1981;135;Choice of Arms;Action;Montand, Yves;Deneuve, Catherine;Corneau, Alan;87;No;NicholasCage.png\n1977;107;March or Die;War;Hackman, Gene;Deneuve, Catherine;Richards, Dick;59;No;NicholasCage.png\n1980;135;Last Metro, The;Drama;Depardieu, Gérard;Deneuve, Catherine;Truffaut, François;66;No;NicholasCage.png\n1986;120;Jean de Florette;Drama;Montand, Yves;Depardieu, Elizabeth;Berri, Claude;87;Yes;NicholasCage.png\n1989;127;Fat Man & Little Boy;Drama;Newman, Paul;Dern, Laura;Joffe, Roland;86;No;paulNewman.png\n1990;125;Wild at Heart;Drama;Cage, Nicolas;Dern, Laura;Lynch, David;6;No;NicholasCage.png\n1989;113;Family Business;Action;Connery, Sean;DeSoto, Rosana;Lumet, Sidney;5;No;seanConnery.png\n1988;103;Stand & Deliver;Drama;Olmos, Edward James;DeSoto, Rosana;Menendez, Ramon;19;No;NicholasCage.png\n1981;94;Looker;Science Fiction;Finney, Albert;Dey, Susan;Crichton, Michael;62;No;NicholasCage.png\n1989;89;Fire & Rain;Action;Haid, Charles;Dickinson, Angie;Jameson, Jerry;10;No;NicholasCage.png\n1990;56;Best of Candid Camera, 
The;Comedy;Allen, Woody;Dickinson, Angie;;12;No;woody.png\n1940;83;Seven Sinners;Drama;Wayne, John;Dietrich, Marlene;Garnett, Tay;24;No;johnWayne.png\n1961;190;Judgment at Nuremberg;Drama;Tracy, Spencer;Dietrich, Marlene;Kramer, Stanley;39;Yes;spencerTracy.png\n1989;60;Minsky's Follies;Comedy;Taylor, Rip;Diller, Phyllis;;12;No;NicholasCage.png\n1990;97;Novice, The;Comedy;Sharif, Omar;Dombasle, Arielle;;72;No;NicholasCage.png\n1987;130;Wings of Desire;Drama;Ganz, Bruno;Dommartin, Solveig;Wenders, Wim;71;No;NicholasCage.png\n1991;158;Until the End of the World;Drama;Hurt, William;Dommartin, Solveig;Wenders, Wim;57;No;NicholasCage.png\n1987;118;Castaway;Drama;Reed, Oliver;Donohoe, Amanda;Roeg, Nicolas;41;No;NicholasCage.png\n1993;30;Alfred Hitchcock Presents, Sorcerer's Apprentice;Mystery;Hitchcock, Alfred;Dors, Diana;;60;No;NicholasCage.png\n1991;99;Delicatessen;Comedy;Benezech, Pascal;Dougnac, Marie-Laure;Caro, Marc;78;No;NicholasCage.png\n1979;110;Great Train Robbery, The;Mystery;Connery, Sean;Down, Lesley-Anne;Crichton, Michael;7;No;seanConnery.png\n1991;110;Hanover Street;Drama;Ford, Harrison;Down, Lesley-Anne;Hyams, Peter;81;No;NicholasCage.png\n1991;102;Hunchback;Drama;Hopkins, Anthony;Down, Lesley-Anne;Tuchner, Michael;33;No;AnthonyHopkins.png\n1946;97;My Darling Clementine;Western;Fonda, Henry;Downs, Cathy;Ford, John;12;No;johnFord.png\n1950;86;Wagon Master;Western;Johnson, Ben;Dru, Joanne;Ford, John;30;No;johnFord.png\n1949;93;She Wore a Yellow Ribbon;Western;Wayne, John;Dru, Joanne;Ford, John;84;No;johnWayne.png\n1985;90;Fantasy Man;Comedy;Hopkins, Harold;Drynan, Jeanie;Meagher, John;82;No;NicholasCage.png\n1986;87;Monster in the Closet;Comedy;Grant, Donald;DuBarry, Denise;Dahlin, Bob;39;No;NicholasCage.png\n1992;85;Double Edge;Drama;Eban, Abba;Dunaway, Faye;Kollek, Amos;69;No;clintEastwood.png\n1976;116;Network;Comedy;Finch, Peter;Dunaway, Faye;Lumet, Sidney;48;Yes;NicholasCage.png\n1974;131;Chinatown;Drama;Nicholson, Jack;Dunaway, Faye;Polanski, 
Roman;55;Yes;JackNicholson.png\n1975;117;Three Days of the Condor;Drama;Redford, Robert;Dunaway, Faye;Pollack, Sydney;87;No;NicholasCage.png\n1977;134;Voyage of the Damned;Drama;Sydow, Max von;Dunaway, Faye;Rosenberg, Stuart;34;No;NicholasCage.png\n1987;97;Barfly;Drama;Rourke, Mickey;Dunaway, Faye;Schroeder, Barbet;23;No;NicholasCage.png\n1990;104;Wait Until Spring, Bandini;Drama;Mantegna, Joe;Dunaway, Faye;;20;No;NicholasCage.png\n1947;118;Life with Father;Comedy;Powell, William;Dunne, Irene;Curtiz, Michael;10;No;NicholasCage.png\n1943;;A Guy Named Joe;Drama;Tracy, Spencer;Dunne, Irene;Fleming, Victor;42;No;spencerTracy.png\n1974;117;Stavisky;Drama;Belmondo, Jean-Paul;Duperey, Anny;Resnais, Alain;1;No;NicholasCage.png\n1981;117;Time Bandits;Comedy;Cleese, John;Duvall, Shelley;Gilliam, Terry;5;No;NicholasCage.png\n1980;144;Shining, The;Horror;Nicholson, Jack;Duvall, Shelley;Kubrick, Stanley;32;No;JackNicholson.png\n1945;91;Flame of Barbary Coast;Western;Wayne, John;Dvorak, Ann;Kane, Joseph;54;No;johnWayne.png\n1993;92;Naked Truth, The;Comedy;Sellers, Peter;Eaton, Shirley;;34;No;NicholasCage.png\n1979;92;Brood, The;Horror;Reed, Oliver;Eggar, Samantha;Cronenberg, David;51;No;NicholasCage.png\n1970;123;Molly Maguires, The;Action;Connery, Sean;Eggar, Samantha;Ritt, Martin;3;No;seanConnery.png\n1984;105;Beverly Hills Cop;Comedy;Murphy, Eddie;Eilbacher, Lisa;Brest, Martin;41;No;NicholasCage.png\n1991;86;Blind Man's Bluff;Mystery;Urich, Robert;Eilbacher, Lisa;Quinn, James;64;No;NicholasCage.png\n1961;140;La Dolce Vita;Drama;Mastroianni, Marcello;Ekberg, Anita;Fellini, Federico;20;No;NicholasCage.png\n1966;103;After the Fox;Comedy;Sellers, Peter;Ekland, Britt;De Sica, Vittorio;60;No;NicholasCage.png\n1974;127;Man with the Golden Gun, The;Action;Moore, Roger;Ekland, Britt;Hamilton, Guy;41;No;NicholasCage.png\n1985;96;Marbella;Action;Taylor, Rod;Ekland, Britt;Hermoso, Miguel;45;No;NicholasCage.png\n1967;103;Bobo, The;Comedy;Sellers, Peter;Ekland, Britt;Parrish, 
Robert;80;No;NicholasCage.png\n1993;53;Big Bands, The;Music;Beneke, Tex;Elgart, Les;;48;No;NicholasCage.png\n1992;97;Killer Image.;Mystery;Ironside, Michael;Errickson, Krista;Winning, David;8;No;NicholasCage.png\n1987;94;Kandyland;Drama;Laulette, Charles;Evenson, Kim;Schnitzer, Robert Allen;41;No;NicholasCage.png\n1987;94;Campus Man;Drama;Dye, John;Fairchild, Morgan;Casden, Ron;38;No;NicholasCage.png\n1956;101;Jubal;Drama;Ford, Glenn;Farr, Felicia;Daves, Delmer;32;No;glennFord.png\n1985;84;Purple Rose of Cairo, The;Comedy;Aiello, Danny;Farrow, Mia;Allen, Woody;20;Yes;woody.png\n1984;85;Broadway Danny Rose;Comedy;Allen, Woody;Farrow, Mia;Allen, Woody;14;No;woody.png\n1992;108;Husbands & Wives;Comedy;Allen, Woody;Farrow, Mia;Allen, Woody;80;No;woody.png\n1986;103;Hannah & Her Sisters;Comedy;Caine, Michael;Farrow, Mia;Allen, Woody;8;Yes;woody.png\n1979;115;Hurricane;Action;Robards, Jason;Farrow, Mia;Troell, Jan;8;No;NicholasCage.png\n1986;95;Between Two Women;Drama;Nouri, Michael;Fawcett, Farrah;Avnet, John;52;No;NicholasCage.png\n1981;96;Cannonball Run, The;Comedy;Reynolds, Burt;Fawcett, Farrah;Needham, Hal;80;No;NicholasCage.png\n1936;70;Doughnuts & Society;Comedy;Nugent, Eddie;Fazenda, Louise;Collins, Lewis D.;28;No;NicholasCage.png\n1978;450;Holocaust;Drama;Bottoms, Joseph;Feldshuh, Tovah;Chomsky, Marvin J.;1;No;NicholasCage.png\n1990;103;Meridian;Science Fiction;Jamieson, Malcolm;Fenn, Sherilyn;Band, Charles;47;No;NicholasCage.png\n1992;90;Diary of a Hitman;Drama;Whitaker, Forest;Fenn, Sherilyn;London, Roy;67;No;NicholasCage.png\n1988;95;Gor;Action;Reed, Oliver;Ferratti, Rebecca;Kiersch, Fritz;2;No;NicholasCage.png\n1987;95;Surrender;Comedy;Caine, Michael;Field, Sally;Belson, Jerry;84;No;NicholasCage.png\n1984;112;Places in the Heart;Drama;Harris, Ed;Field, Sally;Benton, Robert;83;Yes;NicholasCage.png\n1991;106;Not Without My Daughter;Drama;Molina, Alfred;Field, Sally;Gilbert, Brian;55;No;NicholasCage.png\n1977;113;Heroes;Drama;Winkler, Henry;Field, Sally;Kagan, 
Jeremy Paul;17;No;NicholasCage.png\n1981;116;Absence of Malice;Drama;Newman, Paul;Field, Sally;Pollack, Sydney;76;No;paulNewman.png\n1979;110;Norma Rae;Drama;Bridges, Beau;Field, Sally;Ritt, Martin;64;Yes;NicholasCage.png\n1989;118;Steel Magnolias;Drama;Skerritt, Tom;Field, Sally;Ross, Herbert;66;No;NicholasCage.png\n1989;101;Burbs, The;Comedy;Hanks, Tom;Fisher, Carrie;Dante, Joe;42;No;NicholasCage.png\n1980;124;Empire Strikes Back, The;Science Fiction;Hamill, Mark;Fisher, Carrie;Kershner, Irvin;33;No;NicholasCage.png\n1977;121;Star Wars;Science Fiction;Hamill, Mark;Fisher, Carrie;Lucas, George;44;No;NicholasCage.png\n1983;132;Return of the Jedi;Science Fiction;Hamill, Mark;Fisher, Carrie;Marquand, Richard;4;No;NicholasCage.png\n1991;104;Hear My Song;Drama;Dunbar, Adrian;Fitzgerald, Tara;Chelsom, Peter;72;No;NicholasCage.png\n1956;99;Slightly Scarlet;Action;Payne, John;Fleming, Rhonda;Dwan, Allan;52;No;NicholasCage.png\n1957;120;Gunfight at the OK Corral;Western;Lancaster, Burt;Fleming, Rhonda;Sturges, John;84;No;burtLancaster.png\n1931;;Range Feud, The;Western;Wayne, John;Fleming, Susan;Lederman, Ross;51;No;johnWayne.png\n1990;89;Bloodsucking Pharaohs in Pittsburgh;Comedy;Dengel, Jake;Fletcher, Suzanne;Smithey, Alan;79;No;NicholasCage.png\n1972;129;Roma;Drama;Gonzales, Peter;Florence, Fiona;Fellini, Federico;75;No;NicholasCage.png\n1979;122;China Syndrome, The;Drama;Douglas, Michael;Fonda, Jane;Bridges, James;43;No;NicholasCage.png\n1986;100;Morning After, The;Mystery;Bridges, Jeff;Fonda, Jane;Lumet, Sidney;6;No;NicholasCage.png\n1971;114;Klute;Drama;Sutherland, Donald;Fonda, Jane;Pakula, Alan J.;15;Yes;NicholasCage.png\n1979;113;Electric Horseman, The;Comedy;Redford, Robert;Fonda, Jane;Pollack, Sydney;34;No;NicholasCage.png\n1965;97;Cat Ballou;Comedy;Marvin, Lee;Fonda, Jane;Silverstein, Elliot;62;Yes;NicholasCage.png\n1991;;Coming Home;Drama;Voight, Jon;Fonda, Jane;;1;Yes;NicholasCage.png\n1940;130;Rebecca;Drama;Olivier, Laurence;Fontaine, Joan;Hitchcock, 
Alfred;78;Yes;alfredHitchcock.png\n1944;96;Jane Eyre;Drama;Welles, Orson;Fontaine, Joan;Stevenson, Robert;44;No;NicholasCage.png\n1973;87;Stacey!;Action;Randall, Anne;Ford, Anitra;Sidaris, Andy;31;No;NicholasCage.png\n1992;85;Naked Obsession;Mystery;Katt, William;Ford, Maria;Golden, Dan;26;No;NicholasCage.png\n1989;83;Stripped to Kill II, Live Girls;Mystery;Lottimer, Ed;Ford, Maria;Ruben, Katt Shea;80;No;NicholasCage.png\n1990;94;Rain Killer, The;Mystery;Sharkey, Ray;Ford, Maria;Stein, Ken;10;No;NicholasCage.png\n1983;95;Valley Girl;Comedy;Cage, Nicolas;Foreman, Deborah;Coolidge, Martha;30;No;NicholasCage.png\n1991;118;Silence of the Lambs, The;Mystery;Hopkins, Anthony;Foster, Jodie;Demme, Jonathan;8;Yes;AnthonyHopkins.png\n1988;98;Stealing Home;Drama;Harmon, Mark;Foster, Jodie;Kampmann, Steven ;76;No;NicholasCage.png\n1972;92;Napoleon & Samantha;Comedy;Douglas, Michael;Foster, Jodie;McEveety, Bernard;33;No;NicholasCage.png\n1988;;Five Corners;Drama;Robbins, Tim;Foster, Jodie;;88;No;NicholasCage.png\n1955;;Blackboard Jungle, The;Drama;Ford, Glenn;Francis, Anne;Brooks, Richard;66;No;glennFord.png\n1989;103;My Left Foot;Drama;Day-Lewis, Daniel;Fricker, Brenda;Sheridan, Jim;32;Yes;NicholasCage.png\n1987;92;Back to the Beach;Comedy;Avalon, Frankie;Funicello, Annette;Hobbs, Lyndall;45;No;NicholasCage.png\n1934;85;Painted Veil, The;Drama;Marshall, Herbert;Garbo, Greta;Boleslawski, Richard;57;No;gretaGarbo.png\n1931;74;Inspiration;Drama;Apfel, Oscar;Garbo, Greta;Brown, Clarence;66;No;gretaGarbo.png\n1930;92;Anna Christie;Drama;Bickford, Charles;Garbo, Greta;Brown, Clarence;0;No;gretaGarbo.png\n1926;109;Flesh & the Devil, The;Drama;Gilbert, John;Garbo, Greta;Brown, Clarence;72;No;gretaGarbo.png\n1928;90;Woman of Affairs;Drama;Gilbert, John;Garbo, Greta;Brown, Clarence;83;No;gretaGarbo.png\n1935;96;Anna Karenina;Drama;March, Fredric;Garbo, Greta;Brown, Clarence;35;Yes;gretaGarbo.png\n1936;110;Camille;Drama;Taylor, Robert;Garbo, Greta;Cukor, 
George;74;No;gretaGarbo.png\n1931;91;Mata Hari;Drama;Novarro, Ramon;Garbo, Greta;Fitzmaurice, George;67;No;gretaGarbo.png\n1929;100;Wild Orchids;Drama;Stone, Lewis;Garbo, Greta;Franklin, Sidney;70;No;gretaGarbo.png\n1932;112;Grand Hotel;Drama;Barrymore, John;Garbo, Greta;Goulding, Edmund;81;Yes;gretaGarbo.png\n1931;84;Susan Lennox, Her Fall & Rise;Drama;Hale, Alan;Garbo, Greta;Leonard, Robert Z.;64;No;gretaGarbo.png\n1939;108;Ninotchka;Comedy;Douglas, Melvyn;Garbo, Greta;Lubitsch, Ernst;40;No;gretaGarbo.png\n1933;97;Queen Christina;Drama;Gilbert, John;Garbo, Greta;Mamoulian, Rouben;82;No;gretaGarbo.png\n1928;96;Mysterious Lady, The;Drama;Nagel, Conrad;Garbo, Greta;Niblo, Fred;72;No;gretaGarbo.png\n1925;125;Joyless Street;Drama;Stuart, Henry;Garbo, Greta;Pabst, Georg Wilhelm;73;No;gretaGarbo.png\n1929;74;Single Standard, The;Drama;Asther, Nils;Garbo, Greta;Robertson, John S.;73;No;gretaGarbo.png\n1932;71;As You Desire Me;Drama;Douglas, Melvyn;Garbo, Greta;;85;No;gretaGarbo.png\n1930;76;Romance;Drama;Stone, Lewis;Garbo, Greta;;62;No;gretaGarbo.png\n1962;105;A Child Is Waiting;Drama;Lancaster, Burt;Garland, Judy;Cassavetes, John;60;No;burtLancaster.png\n1982;116;Tootsie;Comedy;Hoffman, Dustin;Garr, Teri;Pollack, Sydney;8;Yes;NicholasCage.png\n1989;86;Let It Ride;Comedy;Dreyfuss, Richard;Garr, Teri;Pytka, Joe;88;No;NicholasCage.png\n1953;120;Julius Caesar;Drama;Brando, Marlon;Garson, Greer;Mankiewicz, Joseph L.;50;No;brando.png\n1979;120;Nineteen Forty-One;Comedy;Belushi, John;Gary, Lorraine;Spielberg, Steven;24;No;NicholasCage.png\n1975;124;Jaws;Action;Scheider, Roy;Gary, Lorraine;Spielberg, Steven;6;No;NicholasCage.png\n1987;93;Hot Pursuit;Drama;Cusack, John;Gazelle, Wendy;Lisberger, Steven;44;No;NicholasCage.png\n1989;120;Triumph of the Spirit;Drama;Dafoe, Willem;Gazelle, Wendy;Young, Robert M.;49;No;NicholasCage.png\n1975;111;Brannigan;Drama;Wayne, John;Geeson, Judy;Hickox, Douglas;64;No;johnWayne.png\n1979;89;Buffet Froid;Comedy;Depardieu, Gérard;Gence, 
Denise;Blier, Bertrand;75;No;NicholasCage.png\n1986;122;Salvador;Drama;Woods, James;Gibb, Cynthia;Stone, Oliver;77;No;NicholasCage.png\n1959;102;Horse Soldiers, The;Western;Wayne, John;Gibson, Althea;Ford, John;76;No;johnWayne.png\n1954;108;Long John Silver;Action;Newton, Robert;Gilchrist, Connie;Haskin, Byron;56;No;NicholasCage.png\n1961;134;Hustler, The;Drama;Newman, Paul;Gleason, Jackie;Rossen, Robert;43;Yes;paulNewman.png\n1983;109;Star Chamber, The;Drama;Douglas, Michael;Gless, Sharon;Hyam, Peter;3;No;NicholasCage.png\n1988;100;Clara's Heart;Drama;Ontkean, Michael;Goldberg, Whoopi;Mulligan, Robert;60;No;NicholasCage.png\n1987;102;Burglar;Comedy;Goldthwait, Bob;Goldberg, Whoopi;Wilson, Hugh;44;No;NicholasCage.png\n1986;120;Comic Relief;Comedy;Crystal, Billy;Goldberg, Whoopi;;69;No;NicholasCage.png\n1978;117;Bloodbrothers;Drama;Sorvino, Paul;Goldoni, Lelia;Mulligan, Robert;11;No;NicholasCage.png\n1988;134;Rain Man;Drama;Hoffman, Dustin;Golino, Valeria;Levinson, Barry;8;Yes;NicholasCage.png\n1966;95;Masculine Feminine;Drama;Leaud, Jean-Pierre;Goya, Chantal;Godard, Jean-Luc;20;No;NicholasCage.png\n1964;51;Outer Limits, The;Science Fiction;Perrin, Vic;Grahame, Gloria;Stanley, Paul;27;No;NicholasCage.png\n1988;;Mama's Dirty Girls;Horror;Currie, Sondra;Grahame, Gloria;;62;No;NicholasCage.png\n1979;180;Last Ride of the Dalton Gang, The;Western;Palance, Jack;Greenbush, Lindsay;Curtis, Dan;62;No;NicholasCage.png\n1991;;Why Me?;Comedy;Lambert, Christopher;Greist, Kim;;74;No;NicholasCage.png\n1932;66;Number Seventeen;Crime;Lion, Leon M.;Grey, Anne;Hitchcock, Alfred;66;No;alfredHitchcock.png\n1986;120;Manhunter;Drama;Petersen, William L.;Griest, Kim;Mann, Michael;19;No;NicholasCage.png\n1990;126;Bonfire of the Vanities, The;Drama;Hanks, Tom;Griffith, Melanie;De Palma, Brian;82;No;NicholasCage.png\n1988;115;Working Girl;Comedy;Ford, Harrison;Griffith, Melanie;Nichols, Mike;25;No;NicholasCage.png\n1992;133;Shining Through;Mystery;Douglas, Michael;Griffith, Melanie;Seltzer, 
David;11;No;NicholasCage.png\n1991;76;Slumber Party Massacre III;Horror;Christian, Keely;Grye, Brittain;;40;No;NicholasCage.png\n1988;99;Tokyo Pop;Comedy;Tadokoro, Yutaka;Hamilton, Carrie;Kuzui, Fran Rubel;2;No;NicholasCage.png\n1991;136;Terminator 2;Action;Schwarzenegger, Arnold;Hamilton, Linda;Cameron, James;8;No;T2.png\n1984;108;Terminator, The;Action;Schwarzenegger, Arnold;Hamilton, Linda;Cameron, James;17;No;T2.png\n1986;105;King Kong Lives!;Action;Kerwin, Brian;Hamilton, Linda;Guillermin, John;20;No;NicholasCage.png\n1969;125;Those Daring Young Men in Their Jaunty;Comedy;Curtis, Tony;Hampshire, Susan;;59;No;NicholasCage.png\n1991;186;At Play in the Fields of the Lord;Drama;Berenger, Tom;Hannah, Daryl;Babenco, Hector;81;No;NicholasCage.png\n1990;;Crazy People;Comedy;Moore, Dudley;Hannah, Daryl;Bill, Tony;61;No;NicholasCage.png\n1992;99;Memoirs of an Invisible Man;Comedy;Chase, Chevy;Hannah, Daryl;Carpenter, John;58;No;NicholasCage.png\n1985;100;Clan of the Cave Bear, The;Drama;Remar, James;Hannah, Daryl;Chapman, Michael;73;No;NicholasCage.png\n1983;82;Final Terror, The;Horror;Zmed, Adrian;Hannah, Daryl;Davis, Andrew;24;No;NicholasCage.png\n1984;93;Reckless;Drama;Quinn, Aidan;Hannah, Daryl;Foley, James;14;No;NicholasCage.png\n1989;;High Spirits;Comedy;O'Toole, Peter;Hannah, Daryl;Jordan, Neil;53;No;NicholasCage.png\n1987;107;Roxanne;Comedy;Martin, Steve;Hannah, Daryl;Schepisi, Fred;66;No;NicholasCage.png\n1982;117;Blade Runner;Action;Ford, Harrison;Hannah, Daryl;Scott, Ridley;1;No;NicholasCage.png\n1987;126;Wall Street;Drama;Douglas, Michael;Hannah, Daryl;Stone, Oliver;6;Yes;NicholasCage.png\n1992;111;Pope of Greenwich Village;Drama;Rourke, Mickey;Hannah, Daryl;;58;No;NicholasCage.png\n1989;89;After School;Drama;Bottoms, Sam;Hannah, Page;;59;No;NicholasCage.png\n1938;298;Flaming Frontiers;Western;Brown, Johnny Mack;Hansen, Eleanor;Taylor, Ray;82;No;NicholasCage.png\n1936;89;Libeled Lady;Comedy;Powell, William;Harlow, Jean;Conway, 
Jack;86;No;NicholasCage.png\n1976;99;Inserts;Drama;Dreyfuss, Richard;Harper, Jessica;Byrum, John;85;No;NicholasCage.png\n1988;88;Blue Iguana, The;Drama;McDermott, Dylan;Harper, Jessica;Lafia, John;65;No;NicholasCage.png\n1983;93;Tender Mercies;Drama;Duvall, Robert;Harper, Tess;Beresford, Bruce;61;Yes;NicholasCage.png\n1987;96;Nights in White Satin;Drama;Gilman, Kenneth;Harris, Priscilla;Barnard, Michael;5;No;NicholasCage.png\n1989;87;Videodrome;Horror;Woods, James;Harry, Deborah;Cronenberg, David;36;No;NicholasCage.png\n1991;96;Intimate Stranger;Mystery;Russo, James;Harry, Deborah;Holzman, Allan;23;No;NicholasCage.png\n1986;110;Highlander;Science Fiction;Lambert, Christopher;Hart, Roxanne;Mulcahy, Russell;8;No;NicholasCage.png\n1987;93;Bodycount;Action;White, Bernie;Hassett, Marilyn;;51;No;NicholasCage.png\n1989;104;Tango & Cash;Action;Stallone, Sylvester;Hatcher, Teri;Konchalovsky, Andrei;9;No;NicholasCage.png\n1970;94;There's a Girl in My Soup;Comedy;Sellers, Peter;Hawn, Goldie;Boulting, Roy;41;No;NicholasCage.png\n1984;100;Swing Shift;Drama;Russell, Kurt;Hawn, Goldie;Demme, Jonathan;81;No;NicholasCage.png\n1978;112;Foul Play;Comedy;Chase, Chevy;Hawn, Goldie;Higgins, Colin;46;No;NicholasCage.png\n1982;109;Best Friends;Comedy;Reynolds, Burt;Hawn, Goldie;Jewison, Norman;74;No;NicholasCage.png\n1972;109;Butterflies Are Free;Drama;Albert, Edward;Hawn, Goldie;Katselas, Milton;82;Yes;NicholasCage.png\n1987;112;Overboard;Comedy;Russell, Kurt;Hawn, Goldie;Marshall, Garry;6;No;NicholasCage.png\n1974;103;Girl from Petrovka, The;Drama;Holbrook, Hal;Hawn, Goldie;Miller, Robert Ellis;23;No;NicholasCage.png\n1992;102;Housesitter;Comedy;Martin, Steve;Hawn, Goldie;Oz, Frank;14;No;NicholasCage.png\n1986;106;Wildcats;Comedy;Keach, James;Hawn, Goldie;Ritchie, Michael;22;No;NicholasCage.png\n1984;100;Protocol;Comedy;Sarandon, Chris;Hawn, Goldie;Ross, Herbert;53;No;NicholasCage.png\n1980;102;Seems Like Old Times;Comedy;Chase, Chevy;Hawn, Goldie;Sandrich, 
Jay;49;No;NicholasCage.png\n1974;109;Sugarland Express, The;Drama;Johnson, Ben;Hawn, Goldie;Spielberg, Steven;28;No;NicholasCage.png\n1980;110;Private Benjamin;Comedy;Assante, Armand;Hawn, Goldie;Zieff, Howard;61;No;NicholasCage.png\n1991;115;Deceived;Mystery;Heard, John;Hawn, Goldie;;55;No;NicholasCage.png\n1931;95;Arrowsmith;Drama;Colman, Ronald;Hayes, Helen;Ford, John;84;No;johnFord.png\n1972;78;Say Goodbye Maggie Cole;Drama;McGavin, Darren;Hayward, Susan;Taylor, Jud;84;No;NicholasCage.png\n1964;132;Circus World;Drama;Wayne, John;Hayworth, Rita;Hathaway, Henry;29;No;johnWayne.png\n1952;98;Affair in Trinidad;Drama;Ford, Glenn;Hayworth, Rita;Sherman, Vincent;49;No;glennFord.png\n1948;87;Lady from Shanghai;Mystery;Welles, Orson;Hayworth, Rita;Welles, Orson;16;No;NicholasCage.png\n1940;81;Lady in Question;Drama;Aherne, Brian;Hayworth, Rita;Vidor, Charles;57;No;NicholasCage.png\n1946;110;Gilda;Drama;Ford, Glenn;Hayworth, Rita;Vidor, Charles;57;No;glennFord.png\n1948;98;Loves of Carmen, The;Drama;Ford, Glenn;Hayworth, Rita;Vidor, Charles;48;No;glennFord.png\n1990;105;Dick Tracy;Comedy;Beatty, Warren;Headley, Glenne;Beatty, Warren;84;No;NicholasCage.png\n1964;130;Marnie;Drama;Connery, Sean;Hedren, Tippi;Hitchcock, Alfred;2;No;seanConnery.png\n1987;85;Hot Child in the City;Mystery;Prysirr, Geof;Hendrix, Leah Ayres;Florea, John;0;No;NicholasCage.png\n1984;90;Johnny Dangerously;Comedy;Piscopo, Joe;Henner, Marilu;Heckerling, Amy;3;No;NicholasCage.png\n1985;95;Stark;Mystery;Surovy, Nicolas;Henner, Marilu;Holcomb, Rod;27;No;NicholasCage.png\n1949;84;Three Strange Loves;Drama;Malmsten, Birger;Henning, Eva;Bergman, Ingmar;87;No;Bergman.png\n1964;170;My Fair Lady;Music;Harrison, Rex;Hepburn, Audrey;Cukor, George;10;Yes;NicholasCage.png\n1960;123;Unforgiven, The;Drama;Lancaster, Burt;Hepburn, Audrey;Huston, John;32;No;burtLancaster.png\n1976;106;Robin & Marian;Action;Connery, Sean;Hepburn, Audrey;Lester, Richard;6;No;seanConnery.png\n1961;109;Children's Hour, The;Drama;Garner, 
James;Hepburn, Audrey;Wyler, William;60;No;NicholasCage.png\n1956;121;Rainmaker, The;Drama;Lancaster, Burt;Hepburn, Katharine;Anthony, Joseph;21;No;katharineHepburn.png\n1952;95;Pat & Mike;Comedy;Tracy, Spencer;Hepburn, Katharine;Cukor, George;48;No;spencerTracy.png\n1968;134;Lion in Winter, The;Drama;O'Toole, Peter;Hepburn, Katharine;Harvey, Anthony;78;Yes;katharineHepburn.png\n1991;132;Sea of Grass, The;Western;Tracy, Spencer;Hepburn, Katharine;Kazan, Elia;75;No;spencerTracy.png\n1967;108;Guess Who's Coming to Dinner;Drama;Tracy, Spencer;Hepburn, Katharine;Kramer, Stanley;50;Yes;spencerTracy.png\n1957;153;Desk Set;Comedy;Tracy, Spencer;Hepburn, Katharine;Lang, Walter;51;No;spencerTracy.png\n1975;107;Rooster Cogburn;Western;Wayne, John;Hepburn, Katharine;Miller, Stuart;76;No;johnWayne.png\n1981;109;On Golden Pond;Drama;Fonda, Henry;Hepburn, Katharine;Rydell, Mark;23;Yes;katharineHepburn.png\n1991;101;Adam's Rib;Comedy;Tracy, Spencer;Hepburn, Katharine;;62;No;spencerTracy.png\n1991;116;Boom Town;Drama;Tracy, Spencer;Hepburn, Katharine;;73;No;katharineHepburn.png\n1991;145;Dragon Seed;Drama;Tracy, Spencer;Hepburn, Katharine;;34;No;katharineHepburn.png\n1991;115;Little Women;Drama;Tracy, Spencer;Hepburn, Katharine;;22;No;katharineHepburn.png\n1991;113;Philadelphia Story, The;Comedy;Tracy, Spencer;Hepburn, Katharine;;25;No;katharineHepburn.png\n1991;112;Without Love;Comedy;Tracy, Spencer;Hepburn, Katharine;;66;No;katharineHepburn.png\n1991;113;Woman of the Year;Comedy;Tracy, Spencer;Hepburn, Katharine;;12;No;spencerTracy.png\n1992;95;Juice;Drama;Shakur, Tupac;Herron, Cindy;Dickerson, Ernest R.;31;No;NicholasCage.png\n1986;114;Hoosiers;Drama;Hackman, Gene;Hershey, Barbara;Anspaugh, David;2;No;NicholasCage.png\n1987;112;Tin Men;Comedy;Dreyfuss, Richard;Hershey, Barbara;Levinson, Barry;50;No;NicholasCage.png\n1988;163;Last Temptation of Christ, The;Drama;Dafoe, Willem;Hershey, Barbara;Scorsese, Martin;32;No;NicholasCage.png\n1991;99;Paris Trout;Drama;Hopper, 
Dennis;Hershey, Barbara;;53;No;NicholasCage.png\n1988;87;Souvenir;Drama;Plummer, Christopher;Hicks, Catherine;Reeve, Geoffrey;42;No;NicholasCage.png\n1966;120;A Man for All Seasons;Drama;Shaw, Robert;Hiller, Wendy;Zinnemann, Fred;20;Yes;NicholasCage.png\n1986;90;Knights & Emeralds;Drama;Leadbitter, Bill;Hills, Beverly;Emes, Ian;;No;NicholasCage.png\n1989;83;Masque of the Red Death;Horror;MacNee, Patrick;Hoak, Clare;Brand, Larry;9;No;NicholasCage.png\n1943;265;Adventures of Smilin' Jack, The;Mystery;Brown, Tom;Hobart, Rose;Taylor, Ray;77;No;NicholasCage.png\n1992;88;Adventures in Dinosaur City;Action;Katz, Omri;Hoffman, Shawn;Thompson, Brett;19;No;NicholasCage.png\n1987;95;Allnighter, The;Comedy;Terlesky, John;Hoffs, Susanna;Hoffs, Tamar Simon;71;No;NicholasCage.png\n1980;99;Caddyshack;Comedy;Chase, Chevy;Holcomb, Sarah;Ramis, Harold;70;No;NicholasCage.png\n1973;102;Tom Sawyer;Music;Whitaker, Johnny;Holm, Celeste;Taylor, Don;11;No;NicholasCage.png\n1987;94;Rita, Sue & Bob Too;Comedy;Finneran, Siohban;Holmes, Michelle;Clarke, Alan;5;No;NicholasCage.png\n1947;56;Hawk of Powder River;Western;Dean, Eddie;Holt, Jennifer;Taylor, Ray;61;No;NicholasCage.png\n1928;148;Tempest;Drama;Barrymore, John;Horn, Camilla;Taylor, Sam;33;No;NicholasCage.png\n1986;90;Running Mates;Drama;Webb, Greg;Howard, Barbara;Neff, Thomas L.;63;No;NicholasCage.png\n1987;105;Prettykill;Drama;Birney, David;Hubley, Season;Kaczender, George;71;No;NicholasCage.png\n1934;80;Judge Priest;Drama;Rogers, Will;Hudson, Rochelle;Ford, John;9;No;johnFord.png\n1950;104;Harvey;Comedy;Stewart, James;Hull, Josephine;Koster, Henry;42;No;NicholasCage.png\n1991;89;If Looks Could Kill;Action;Grieco, Richard;Hunt, Linda;Wilmington, Michael;10;No;NicholasCage.png\n1987;94;Raising Arizona;Comedy;Cage, Nicolas;Hunter, Holly;Coen, Joel;23;No;NicholasCage.png\n1989;114;Once Around;Comedy;Dreyfuss, Richard;Hunter, Holly;Hallström, Lasse;68;No;NicholasCage.png\n1980;110;Loulou;Drama;Depardieu, Gérard;Huppert, Isabelle;Pialat, 
Maurice;65;No;NicholasCage.png\n1982;136;World According to Garp, The;Drama;Williams, Robin;Hurt, Mary Beth;Hill, George Roy;59;No;NicholasCage.png\n1980;106;Virus;Science Fiction;Kennedy, George;Hussey, Olivia;Fukasaku, Kinji;62;No;NicholasCage.png\n1940;127;Northwest Passage;Action;Tracy, Spencer;Hussey, Ruth;Vidor, King;51;No;spencerTracy.png\n1987;112;Gardens of Stone;Drama;Caan, James;Huston, Anjelica;Coppola, Francis Ford;27;No;NicholasCage.png\n1989;121;Enemies, a Love Story;Drama;Silver, Ron;Huston, Anjelica;Mazursky, Paul;5;No;NicholasCage.png\n1992;102;Addams Family, The;Comedy;Julia, Raul;Huston, Anjelica;Sonnenfeld, B.;8;No;NicholasCage.png\n1932;65;Freaks;Horror;Ford, Wallace;Hyams, Leila;Browning, Tod;61;No;NicholasCage.png\n1991;108;Necessary Roughness;Comedy;Bakula, Scott;Ireland, Kathy;Dragoti, Stan;60;No;NicholasCage.png\n1990;93;A Show of Force;Drama;Garcia, Andy;Irving, Amy;Barreto, Bruno;1;No;NicholasCage.png\n1980;129;Competition, The;Drama;Dreyfuss, Richard;Irving, Amy;Oliansky, Joel;45;No;NicholasCage.png\n1988;97;Crossing Delancey;Comedy;Riegert, Peter;Irving, Amy;Silver, Joan Micklin;6;No;NicholasCage.png\n1982;120;State of Things, The;Drama;Kime, Jeffrey;Isabelle Weingarten.;Wenders, Wim;73;No;NicholasCage.png\n1987;89;Business As Usual;Comedy;Thaw, John;Jackson, Glenda;Barrett, Lezli-An;17;No;NicholasCage.png\n1973;103;A Touch of Class;Comedy;Segal, George;Jackson, Glenda;Frank, Melvin;79;Yes;NicholasCage.png\n1970;129;Women in Love.;Drama;Bates, Alan;Jackson, Glenda;Russell, Ken;18;No;NicholasCage.png\n1988;89;Salome's Last Dance;Comedy;Johns, Stratford;Jackson, Glenda;Russell, Ken;76;No;NicholasCage.png\n1986;100;Casino;Mystery;Connors, Mike;Jackson, Sherry;Chaffey, Don;5;No;NicholasCage.png\n1955;108;Smiles of a Summer Night;Comedy;Björnstrand, Gunnar;Jacobsson, Ulla;Bergman, Ingmar;58;No;Bergman.png\n1989;90;New Year's Day;Comedy;Jaglom, Henry;Jakobsen, Maggie;Jaglom, Henry;88;No;NicholasCage.png\n1981;132;Mephisto;Drama;Brandauer, 
Klaus Maria;Janda, Krystyna;Szabó, István;80;Yes;NicholasCage.png\n1927;60;Easy Virtue;Mystery;Dyall, Franklin;Jeans, Isabel;Hitchcock, Alfred;45;No;alfredHitchcock.png\n1937;59;Swing It Sailor!;Comedy;Ford, Wallace;Jewell, Isabel;;6;No;NicholasCage.png\n1991;83;Strictly Business;Comedy;Davidson, Tommy;Johnson, Anne-Marie;Hooks, Kevin;3;No;NicholasCage.png\n1983;90;Blame It on Rio;Comedy;Caine, Michael;Johnson, Michelle;Donen, Stanley;10;No;NicholasCage.png\n1987;86;Straight to Hell;Action;Hopper, Dennis;Jones, Grace;Cox, Alex;47;No;NicholasCage.png\n1990;131;A View to a Kill;Action;Moore, Roger;Jones, Grace;;44;No;NicholasCage.png\n1986;100;American Anthem;Drama;Gaylord, Mitch;Jones, Janet;Magnoli, Albert;74;No;NicholasCage.png\n1963;99;Bedtime Story;Comedy;Brando, Marlon;Jones, Shirley;Levy, Ralph;7;No;brando.png\n1991;117;Courtship of Eddie's Father, The;Comedy;Howard, Ron;Jones, Shirley;;43;No;NicholasCage.png\n1988;102;Night Train to Katmandu, THe;Action;Roberts, Pernell;Jovovich, Milla;Wiemer, Robert;43;No;NicholasCage.png\n1948;100;Port of Call;Drama;Eklund, Bengt;Jönsson, Nine-Christine;Bergman, Ingmar;29;No;Bergman.png\n1973;103;Paper Moon;Comedy;O'Neal, Ryan;Kahn, Madeline;Bogdanovich, Peter;3;Yes;NicholasCage.png\n1983;97;Yellowbeard;Comedy;Chapman, Graham;Kahn, Madeline;Damski, Mel;34;No;NicholasCage.png\n1975;91;Adventures of Sherlock Holmes' Smarter;Comedy;Wilder, Gene;Kahn, Madeline;Wilder, Gene;42;No;NicholasCage.png\n1990;108;Flashback;Comedy;Hopper, Dennis;Kane, Carol;Amurri, Franco;19;No;NicholasCage.png\n1977;89;World's Greatest Lover, The;Comedy;Wilder, Gene;Kane, Carol;Wilder, Gene;42;No;NicholasCage.png\n1955;67;Killer's Kiss;Mystery;Silvera, Frank;Kane, Irene;Kubrick, Stanley;66;No;NicholasCage.png\n1988;103;Deceivers, The;Action;Brosnan, Pierce;Kapoor, Shashi;Meyer, Nicholas;14;No;NicholasCage.png\n1983;97;Breathless;Action;Gere, Richard;Kaprisky, Valerie;McBride, Jim;51;No;NicholasCage.png\n1989;145;Born on the Fourth of July;Drama;Cruise, 
Tom;Kava, Caroline;Stone, Oliver;8;Yes;NicholasCage.png\n1991;120;Awakenings;Drama;De Niro, Robert;Kavner, Julie;Marshall, Penny;8;No;NicholasCage.png\n1977;94;Annie Hall;Comedy;Allen, Woody;Keaton, Diane;Allen, Woody;68;Yes;woody.png\n1979;96;Manhattan;Comedy;Allen, Woody;Keaton, Diane;Allen, Woody;82;Yes;woody.png\n1981;195;Reds;Drama;Beatty, Warren;Keaton, Diane;Beatty, Warren;76;Yes;NicholasCage.png\n1986;105;Crimes of the Heart;Comedy;Shepard, Sam;Keaton, Diane;Beresford, Bruce;84;No;NicholasCage.png\n1977;136;Looking for Mr. Goodbar;Drama;Atherton, William;Keaton, Diane;Brooks, Richard;54;No;NicholasCage.png\n1972;175;Godfather, The;Drama;Brando, Marlon;Keaton, Diane;Coppola, Francis Ford;8;Yes;brando.png\n1974;201;Godfather, Pt 2., The;Drama;Pacino, Al;Keaton, Diane;Coppola, Francis Ford;8;Yes;NicholasCage.png\n1976;109;I Will, I Will...For Now;Comedy;Gould, Elliott;Keaton, Diane;Panama, Norman;6;No;NicholasCage.png\n1972;86;Play It Again, Sam;Comedy;Allen, Woody;Keaton, Diane;Ross, Herbert;81;No;woody.png\n1975;82;Love & Death;Comedy;Allen, Woody;Keaton, Diane;;84;No;woody.png\n1973;88;Sleeper;Comedy;Allen, Woody;Keaton, Diane;;59;No;woody.png\n1970;130;Fellini Satyricon;Drama;Potter, Martin;Keller, Hiram;Fellini, Federico;88;No;NicholasCage.png\n1980;117;Formula, The;Mystery;Scott, George C.;Keller, Marthe;Avildsen, John G.;82;No;NicholasCage.png\n1977;143;Black Sunday;Drama;Shaw, Robert;Keller, Marthe;Frankenheimer, John;76;No;NicholasCage.png\n1977;124;Bobby Deerfield;Drama;Pacino, Al;Keller, Marthe;Pollack, Sydney;36;No;NicholasCage.png\n1972;98;Last of the Red Hot Lovers;Comedy;Arkin, Alan;Kellerman, Sally;Saks, Gene;40;No;NicholasCage.png\n1953;116;Mogambo;Action;Gable, Clark;Kelly, Grace;Ford, John;71;No;johnFord.png\n1955;103;To Catch a Thief;Mystery;Grant, Cary;Kelly, Grace;Hitchcock, Alfred;69;No;alfredHitchcock.png\n1954;113;Rear Window;Mystery;Stewart, James;Kelly, Grace;Hitchcock, Alfred;25;No;alfredHitchcock.png\n1945;69;Woman Who Came 
Back;Drama;Kruger, Otto;Kelly, Nancy;Colmes, Walter;26;No;NicholasCage.png\n1939;101;Stanley & Livingstone;Action;Tracy, Spencer;Kelly, Nancy;King, Henry;11;No;spencerTracy.png\n1956;129;Bad Seed, The;Horror;Jones, Henry;Kelly, Nancy;LeRoy, Mervyn;69;No;NicholasCage.png\n1989;113;Lethal Weapon 2;Action;Gibson, Mel;Kensit, Patsy;Donner, Richard;69;No;NicholasCage.png\n1992;79;Blame It on the Bellboy;Comedy;Moore, Dudley;Kensit, Patsy;Herman, Mark;69;No;NicholasCage.png\n1927;62;Drop Kick, The;Drama;Barthelmess, Richard;Kent, Barbara;Webb, Millard;;No;NicholasCage.png\n1978;145;Superman, The Movie;Action;Brando, Marlon;Kidder, Margot;Donner, Richard;87;No;brando.png\n1987;90;Superman IV: The Quest for Peace;Action;Reeve, Christopher;Kidder, Margot;Furie, Sidney J.;77;No;NicholasCage.png\n1970;90;Quackser Fortune Has a Cousin in the Bronx;Comedy;Wilder, Gene;Kidder, Margot;Waris, Hussein;49;No;NicholasCage.png\n1989;96;Dead Calm;Mystery;Neill, Sam;Kidman, Nicole;Noyce, Phillip;1;No;NicholasCage.png\n1990;107;Days of Thunder;Action;Cruise, Tom;Kidman, Nicole;Scott, Tony;3;No;NicholasCage.png\n1987;101;My Life As a Dog;Comedy;Glanzelius, Anton;Kinnaman, Melinda;Hallström, Lasse;21;No;NicholasCage.png\n1983;;Moon in the Gutter, The;Action;Depardieu, Gérard;Kinski, Nastassia;Beineix, Jean-Jacques;29;No;NicholasCage.png\n1984;150;Paris, Texas;Drama;Stanton, Harry Dean;Kinski, Nastassia;Wenders, Wim;27;No;NicholasCage.png\n1984;96;Unfaithfully Yours;Comedy;Moore, Dudley;Kinski, Nastassia;Zieff, Howard;73;No;NicholasCage.png\n1987;95;Bullseye!;Comedy;Caine, Michael;Kirkland, Sally;Winner, Michael;8;No;NicholasCage.png\n1989;104;Erik the Viking;Action;Robbins, Tim;Kitt, Eartha;Jones, Terry;25;No;NicholasCage.png\n1987;90;Dragonard;Drama;Reed, Oliver;Kitt, Eartha;Kikoine, Gerard;71;No;NicholasCage.png\n1986;90;Hard Choices;Drama;McCleery, Gary;Klenck, Margaret;King, Rick;41;No;NicholasCage.png\n1969;102;Rain People, The;Drama;Caan, James;Knight, Shirley;Coppola, Francis 
Ford;78;No;NicholasCage.png\n1984;106;A Year of the Quiet Sun;Drama;Wilson, Scott;Komorowska, Maja;Zanussi, Krzystoff;78;No;NicholasCage.png\n1935;54;Desert Trail, The;Western;Wayne, John;Kornman, Mary;Collins, Lewis D.;50;No;johnWayne.png\n1990;98;Almost an Angel;Comedy;Hogan, Paul;Kozlowski, Linda;Cornell, John;14;No;NicholasCage.png\n1986;98;Crocodile Dundee;Comedy;Hogan, Paul;Kozlowski, Linda;Faiman, Peter;66;No;NicholasCage.png\n1977;127;American Friend, The;Mystery;Hopper, Dennis;Kreuzer, Lisa;Wenders, Wim;35;No;NicholasCage.png\n1989;119;See You in the Morning;Drama;Bridges, Jeff;Krige, Alice;Pakula, Alan J.;53;No;NicholasCage.png\n1987;88;Arrogant, The;Drama;Graham, Gary;Kristel, Sylvia;Blot, Philippe;62;No;NicholasCage.png\n1989;86;Dracula's Widow;Horror;Sommer, Josef;Kristel, Sylvia;Coppola, Christopher;55;No;NicholasCage.png\n1987;90;Ninja Masters of Death;Action;Peterson, Chris;Kruize, Kelly;Lambert, Bruce;15;No;NicholasCage.png\n1990;110;Mystery Train;Comedy;Nagase, Masatoshi;Kudoh, Youki;Jarmusch, Jim;23;No;NicholasCage.png\n1978;114;Go Tell the Spartans;War;Lancaster, Burt;Kumagai, Denice;Post, Ted;67;No;burtLancaster.png\n1986;89;True Stories;Comedy;Goodman, John;Kurtz, Swoosie;Byrne, David;79;No;NicholasCage.png\n1953;94;Ugetsu Monogatari;Drama;Mori, Masayuki;Kyô, Machiki;Mizoguchi, Kenji;82;No;NicholasCage.png\n1969;80;Rebel Rousers;Action;Nicholson, Jack;Ladd, Diane;Cohen, Martin B.;44;No;JackNicholson.png\n1988;98;Plain Clothes;Comedy;Howard, Arliss;Ladd, Diane;Coolidge, Martha;4;No;NicholasCage.png\n1981;119;Whose Life Is It, Anyway?;Drama;Dreyfuss, Richard;Lahti, Christine;Badham, John;62;No;NicholasCage.png\n1988;116;Running on Empty;Drama;Hirsch, Judd;Lahti, Christine;Lumet, Sidney;2;No;NicholasCage.png\n1990;101;Funny about Love;Comedy;Wilder, Gene;Lahti, Christine;Nimoy, Leonard;60;No;NicholasCage.png\n1985;118;A Chorus Line, The Movie;Music;Douglas, Michael;Landers, Audrey;Attenborough, Richard;71;No;NicholasCage.png\n1986;84;Stewardess 
School;Comedy;Most, Donald;Landers, Judy;Blancato, Ken;28;No;NicholasCage.png\n1987;109;Big Town, The;Drama;Dillon, Matt;Lane, Diane;Bolt, Ben;11;No;NicholasCage.png\n1983;94;Rumble Fish;Drama;Dillon, Matt;Lane, Diane;Coppola, Francis Ford;4;No;NicholasCage.png\n1983;91;Outsiders, The;Drama;Howell, C. Thomas;Lane, Diane;Coppola, Francis Ford;56;No;NicholasCage.png\n1990;94;Priceless Beauty;Science Fiction;Lambert, Christopher;Lane, Diane;Finch, Charles;7;No;NicholasCage.png\n1989;93;Streets of Fire;Action;Paré, Michael;Lane, Diane;Hill, Walter;65;No;NicholasCage.png\n1990;115;Men Don't Leave;Drama;Howard, Arliss;Lange, Jessica;Brickman, Paul;66;No;NicholasCage.png\n1988;127;Everybody's All American;Romance;Quaid, Dennis;Lange, Jessica;Hackford, Taylor;62;No;NicholasCage.png\n1992;128;Cape Fear;Mystery;De Niro, Robert;Lange, Jessica;Scorsese, Martin;7;No;NicholasCage.png\n1992;121;Postman Always Rings Twice, The;Mystery;Nicholson, Jack;Lange, Jessica;;24;No;NicholasCage.png\n1949;58;Crashing Thru;Western;Wilson, Whip;Larson, Christine;Taylor, Ray;19;No;NicholasCage.png\n1978;109;Get Out Your Handkerchiefs;Comedy;Depardieu, Gérard;Laure, Carole;Blier, Bertrand;78;Yes;NicholasCage.png\n1971;137;Boy Friend, THe;Music;Gable, Christopher;Lawson, Twiggy;Russell, Ken;8;No;NicholasCage.png\n1990;100;Hard To Kill;Action;Seagal, Steven;LeBrock, Kelly;Malmuth, Bruce;49;No;NicholasCage.png\n1960;109;Psycho;Horror;Perkins, Anthony;Leigh, Janet;Hitchcock, Alfred;56;No;alfredHitchcock.png\n1957;112;Jet Pilot;Action;Wayne, John;Leigh, Janet;Sternberg, Josef von;43;No;johnWayne.png\n1987;95;Under Cover;Mystery;Neidorf, David;Leigh, Jennifer Jason;Stockwell, John;36;No;NicholasCage.png\n1951;122;A Streetcar Named Desire;Drama;Brando, Marlon;Leigh, Vivien;Kazan, Elia;75;Yes;brando.png\n1986;93;Golden Child, The;Comedy;Murphy, Eddie;Lewis, Charlotte;Ritchie, Michael;86;No;NicholasCage.png\n1971;84;Statue, The;Drama;Niven, David;Lisi, Virna;Amateau, 
Rod;80;No;NicholasCage.png\n1985;128;Christopher Columbus;Drama;Byrne, Gabriel;Lisi, Virna;Lattuada, Alberto;69;No;NicholasCage.png\n1989;116;In Country;Drama;Willis, Bruce;Lloyd, Emily;Jewison, Norman;76;No;NicholasCage.png\n1978;132;Wild Geese, The;Action;Burton, Richard;Lloyd, Rosalind;McLaglen, Andrew V.;21;No;NicholasCage.png\n1974;90;Second Coming of Suzanne., The;Drama;Dreyfuss, Richard;Locke, Sondra;Barry, Michael;21;No;NicholasCage.png\n1980;116;Bronco Billy;Westerns;Eastwood, Clint;Locke, Sondra;Eastwood, Clint;57;No;clintEastwood.png\n1977;109;Gauntlet, The;Action;Eastwood, Clint;Locke, Sondra;Eastwood, Clint;18;No;clintEastwood.png\n1986;105;Ratboy;Drama;Townsend, Robert;Locke, Sondra;Locke, Sondra;1;No;NicholasCage.png\n1938;96;Lady Vanishes;Mystery;Redgrave, Michael;Lockwood, Margaret;Hitchcock, Alfred;27;No;alfredHitchcock.png\n1987;95;Kitchen Toto, THe;Drama;Peck, Bob;Logan, Phyllis;Hook, Harry;41;No;NicholasCage.png\n1959;88;Carlton-Browne of the F.O.;Comedy;Terry-Thomas;Lohr, Marie;Boulting, Roy;63;No;NicholasCage.png\n1929;68;Racketeer;Drama;Armstrong, Robert;Lombard, Carole;Higgin, Howard;2;No;NicholasCage.png\n1941;95;Mr. & Mrs. 
Smith;Comedy;Montgomery, Robert;Lombard, Carole;Hitchcock, Alfred;3;No;alfredHitchcock.png\n1986;132;Alrededor de Medianoche;Drama;Francois Cluzet;Lonette McKee;Rayfield, David;47;No;NicholasCage.png\n1982;101;Losin' It;Comedy;Cruise, Tom;Long, Shelley;Hanson, Curtis;4;No;NicholasCage.png\n1987;114;Into the Homeland;Action;Boothe, Powers;Longstreth, Emily;Glatter, Lesli Linka;34;No;NicholasCage.png\n1991;60;Boxing Babes;Action;Nichol, Robin;Lords, Traci;Dell, Stewart;9;No;NicholasCage.png\n1991;94;Shock 'em Dead;Horror;Donahue, Troy;Lords, Traci;Freed, Mark;31;No;NicholasCage.png\n1960;101;Heller in Pink Tights;Drama;Quinn, Anthony;Loren, Sophia;Cukor, George;52;No;sophiaLoren.png\n1961;100;Two Women;Drama;Belmondo, Jean-Paul;Loren, Sophia;De Sica, Vittorio;83;Yes;sophiaLoren.png\n1954;107;Gold of Naples, The;Drama;De Sica, Vittorio;Loren, Sophia;De Sica, Vittorio;40;No;sophiaLoren.png\n1963;118;Yesterday, Today & Tomorrow;Comedy;Mastroianni, Marcello;Loren, Sophia;De Sica, Vittorio;73;Yes;sophiaLoren.png\n1957;109;Legend of the Lost;Action;Wayne, John;Loren, Sophia;Hathaway, Henry;84;No;sophiaLoren.png\n1978;111;Brass Target;Action;Cassavetes, John;Loren, Sophia;Hough, John;53;No;sophiaLoren.png\n1964;188;Fall of the Roman Empire, The;Drama;Boyd, Stphen;Loren, Sophia;Mann, Anthony;62;No;sophiaLoren.png\n1961;172;El Cid;Drama;Heston, Charlton;Loren, Sophia;Mann, Anthony;10;No;sophiaLoren.png\n1958;114;Desire under the Elms;Drama;Perkins, Anthony;Loren, Sophia;Mann, Delbert;13;No;sophiaLoren.png\n1953;92;Two Nights with Cleo;Drama;Sordi, Alberto;Loren, Sophia;Mattoli, Mario;54;No;sophiaLoren.png\n1959;;Black Orchid, The;Drama;Quinn, Anthony;Loren, Sophia;Ritt, Martin;54;No;sophiaLoren.png\n1977;91;Angela;Drama;Railsback, Steve;Loren, Sophia;Sagal, Boris;80;No;sophiaLoren.png\n1977;105;A Special Day;Drama;Mastroianni, Marcello;Loren, Sophia;Scola, Ettore;80;Yes;sophiaLoren.png\n1979;112;Blood Feud;Action;Mastroianni, Marcello;Loren, Sophia;Wertmuller, 
Lina;52;No;sophiaLoren.png\n1991;145;Sophia Loren, Her Own Story;Drama;Gavin, John;Loren, Sophia;;49;No;sophiaLoren.png\n1990;;Running Away;Drama;Loggia, Robert;Loren, Sophia;;2;No;sophiaLoren.png\n1991;130;Man of La Mancha;Music;O'Toole, Peter;Loren, Sophia;;55;No;sophiaLoren.png\n1992;116;Operation Crossbow;Action;Peppard, George;Loren, Sophia;;1;No;sophiaLoren.png\n1986;141;Courage;Drama;Williams, Billy Dee;Loren, Sophia;;56;No;sophiaLoren.png\n1986;94;RAD;Action;Allen, Bill;Loughlin, Lori;Needham, Hal;75;No;NicholasCage.png\n1992;98;Secret Admirer;Comedy;Howell, C. Thomas;Loughlin, Lori;;55;No;NicholasCage.png\n1979;85;Cocaine Cowboys;Action;Palance, Jack;Love, Suzanna;Lommel, Ulli;17;No;NicholasCage.png\n1991;118;Test Pilot;Drama;Gable, Clark;Loy, Myrna;;13;No;NicholasCage.png\n1943;64;Ape Man, The;Horror;Ford, Wallace;Lugosi, Bela;Beaudine, William;83;No;NicholasCage.png\n1986;125;Mission, The;Drama;De Niro, Robert;Lunghi, Cherie;Joffe, Roland;20;No;NicholasCage.png\n1991;102;Curly Sue;Comedy;Belushi, Jim;Lynch, Kelly;Hughes, John;2;No;NicholasCage.png\n1962;150;Lolita;Drama;Mason, James;Lyon, Sue;Kubrick, Stanley;80;No;NicholasCage.png\n1989;101;Sex, Lies, and Videotape;Drama;Spader, James;MacDowell, Andie;Soderbergh, Steven;70;Yes;NicholasCage.png\n1990;107;Green Card;Comedy;Depardieu, Gérard;MacDowell, Andie;Weir, Peter;25;No;NicholasCage.png\n1988;95;Gator Bait II;Action;Muzzcat, Paul;MacKenzie, Jan;Sebastian, Beverly;73;No;NicholasCage.png\n1979;129;Being There;Comedy;Sellers, Peter;MacLaine, Shirley;Ashby, Hal;31;Yes;NicholasCage.png\n1983;132;Terms of Endearment;Drama;Nicholson, Jack;MacLaine, Shirley;Brooks, James L.;32;Yes;JackNicholson.png\n1967;99;Woman Times Seven;Comedy;Sellers, Peter;MacLaine, Shirley;De Sica, Vittorio;36;No;NicholasCage.png\n1968;;Bliss of Mrs. 
Blossom, The;Comedy;Booth, James;MacLaine, Shirley;McGrath, Joseph;86;No;NicholasCage.png\n1990;101;Postcards from the Edge;Comedy;Quaid, Dennis;MacLaine, Shirley;Nichols, Mike;63;No;NicholasCage.png\n1970;105;Two Mules for Sister Sara;Western;Eastwood, Clint;MacLaine, Shirley;Siegel, Don;36;No;clintEastwood.png\n1992;84;Dragonfight;Drama;Z'Dar, Robert;MacLaren, Fawna;;71;No;NicholasCage.png\n1939;85;Back Door to Heaven;Drama;Ford, Wallace;MacMahon, Aline;Howard, William K.;83;No;NicholasCage.png\n1988;100;Ciao Italia, Madonna Live from Italy;Music;;Madonna;De Winter, Harry;74;No;NicholasCage.png\n1991;118;Madonna, Truth or Dare;Music;;Madonna;Keshishian, Alek;54;No;NicholasCage.png\n1992;60;A Certain Sacrifice;Music;Pattnosh, Jeremy;Madonna;Lewicki, Steven Jon;24;No;NicholasCage.png\n1991;40;National Enquirer, The Untold Story;Music;White, Vanna;Madonna;;65;No;NicholasCage.png\n1990;60;Immaculate Collection, The;Music;;Madonna;;32;No;NicholasCage.png\n1987;50;Madonna Live, The Virgin Tour;Music;;Madonna;;75;No;NicholasCage.png\n1990;5;Madonna, Justify My Love;Music;;Madonna;;77;No;NicholasCage.png\n1991;16;Madonna, Like a Virgin;Music;;Madonna;;63;No;NicholasCage.png\n1988;83;Hot to Trot;Comedy;Goldthwait, Bob;Madsen, Virginia;Dinner, Michael;78;No;NicholasCage.png\n1986;103;Fire with Fire;Drama;Sheffer, Craig;Madsen, Virginia;Gibbins, Duncan;9;No;NicholasCage.png\n1990;120;Hot Spot;Drama;Johnson, Don;Madsen, Virginia;Hopper, Dennis;70;No;NicholasCage.png\n1974;124;Amarcord;Drama;Noel, Magali;Maggio, Pupella;Fellini, Federico;50;Yes;NicholasCage.png\n1988;85;Casablanca Express;Action;Connery, Jason;Maneri, Luisa;Martino, Sergio;33;No;NicholasCage.png\n1980;94;Out of the Blue;Drama;Hopper, Dennis;Manz, Linda;Hopper, Dennis;4;No;NicholasCage.png\n1949;110;Sands of Iwo Jima;War;Wayne, John;Mara, Adele;Dwan, Allan;72;No;johnWayne.png\n1981;104;Hand, The;Horror;Caine, Michael;Marcovicci, Andrea;Stone, Oliver;44;No;NicholasCage.png\n1989;81;Deep Cover;Mystery;Conti, 
Tom;Markham, Kika;Loncraine, Richard;15;No;NicholasCage.png\n1955;92;Il Bidone;Drama;Crawford, Broderick;Masina, Guilietta;Fellini, Federico;70;No;NicholasCage.png\n1986;130;El Guerrero Solitario;Drama;Eastwood, Clint;Mason, Marsha;Eastwood, Clint;77;No;clintEastwood.png\n1986;130;Heartbreak Ridge;War;Eastwood, Clint;Mason, Marsha;Eastwood, Clint;61;No;clintEastwood.png\n1977;110;Goodbye Girl, The;Comedy;Dreyfuss, Richard;Mason, Marsha;Ross, Herbert;6;Yes;NicholasCage.png\n1991;113;Audrey Rose;Drama;Hopkins, Anthony;Mason, Marsha;;62;No;AnthonyHopkins.png\n1981;86;Polyester;Comedy;Divine;Massey, Edith;;68;No;NicholasCage.png\n1991;144;Robin Hood: Prince of Thieves;Action;Costner, Kevin;Mastrantonio, Mary Elizabeth;Costner, Kevin;8;No;NicholasCage.png\n1992;101;White Sands;Drama;Dafoe, Willem;Mastrantonio, Mary Elizabeth;Donaldson, Roger;38;No;NicholasCage.png\n1986;119;Color of Money, The;Drama;Newman, Paul;Mastrantonio, Mary Elizabeth;Scorsese, Martin;6;Yes;paulNewman.png\n1986;119;Children of a Lesser God;Drama;Hurt, William;Matlin, Marlee;Haines, Randa;20;Yes;NicholasCage.png\n1986;;Matador;Comedy;Banderas, Antonio;Maura, Carmen;Almodóvar, Pedro;34;No;NicholasCage.png\n1989;88;Women on the Verge of a Nervous Breakdown;Comedy;Banderas, Antonio;Maura, Carmen;Almodóvar, Pedro;65;No;NicholasCage.png\n1980;86;Pepi Luci Bom;Comedy;Rotaeta, Félix;Maura, Carmen;Almodóvar, Pedro;66;No;NicholasCage.png\n1989;100;Forgotten, The;Mystery;Carradine, Keith;Maynard, Mimi;Keach, James;69;No;NicholasCage.png\n1992;89;Flame & the Arrow, The;Action;Lancaster, Burt;Mayo, Virginia;;0;No;burtLancaster.png\n1990;92;After the Shock;Drama;Kotto, Yaphet;McClanahan, Rue;Sherman, Gary;28;No;NicholasCage.png\n1990;110;Modern Love;Comedy;Benson, Robby;McClanahan, Rue;;18;No;NicholasCage.png\n1992;95;Riff Raff;Comedy;Carlyle, Robert;McCourt, Emer;Loach, Ken;71;No;NicholasCage.png\n1967;81;Glory Stompers, The;Action;Hopper, Dennis;McCrea, Jody;Lanza, Anthony 
M.;27;No;NicholasCage.png\n1990;181;Dances with Wolves;Western;Costner, Kevin;McDonnell, Mary;Costner, Kevin;8;Yes;NicholasCage.png\n1987;130;Matewan;Drama;Jones, James Earl;McDonnell, Mary;Sayles, John;81;No;NicholasCage.png\n1988;120;Mississippi Burning;Drama;Hackman, Gene;McDormand, Frances;Parker, Alan;41;Yes;NicholasCage.png\n1975;130;Eiger Sanction, The;Action;Eastwood, Clint;McGee, Vonetta;Eastwood, Clint;69;No;clintEastwood.png\n1988;109;Unsettled Land;Drama;Shea, John;McGillis, Kelly;Barbash, Uri;75;No;NicholasCage.png\n1991;98;Cat Chaser;Drama;Weller, Peter;McGillis, Kelly;Ferrera, Abel;6;No;NicholasCage.png\n1988;110;Accused, The;Drama;Coulson, Bernie;McGillis, Kelly;Kaplan, Jonathan;71;Yes;NicholasCage.png\n1989;109;Winter People;Drama;Russell, Kurt;McGillis, Kelly;Kotcheff, Ted;30;No;NicholasCage.png\n1983;101;Reuben, Reuben;Comedy;Conti, Tom;McGillis, Kelly;Miller, Robert Ellis;2;No;NicholasCage.png\n1987;102;Made in Heaven;Fantasy;Hutton, Timothy;McGillis, Kelly;Rudolph, Alan;57;No;NicholasCage.png\n1986;109;Top Gun;Action;Cruise, Tom;McGillis, Kelly;Scott, Tony;8;No;NicholasCage.png\n1985;112;Witness;Drama;Ford, Harrison;McGillis, Kelly;Weir, Peter;59;No;NicholasCage.png\n1988;111;House on Carroll Street, The;Mystery;Daniels, Jeff;McGillis, Kelly;;6;No;NicholasCage.png\n1984;109;Racing with the Moon;Drama;Penn, Sean;McGovern, Elizabeth;Benjamin, Richard;50;No;NicholasCage.png\n1983;98;Lovesick;Comedy;Moore, Dudley;McGovern, Elizabeth;Brickman, Marshall;51;No;NicholasCage.png\n1988;106;She's Having a Baby;Comedy;Hughes, Kevin Bacon;McGovern, Elizabeth;;18;No;NicholasCage.png\n1965;199;Greatest Story Ever Told, The;Drama;Sydow, Max von;McGuire, Dorothy;Stevens, George;26;No;NicholasCage.png\n1989;105;Hawks;Drama;Dalton, Timothy;McTeer, Janet;Miller, Robert Ellis;11;No;NicholasCage.png\n1981;91;So Fine;Comedy;O'Neal, Ryan;Melato, Mariangela;Bergman, Andrew;17;No;NicholasCage.png\n1957;89;Paths of Glory;Drama;Douglas, Kirk;Menjou, Adolphe;Kubrick, 
Stanley;47;No;NicholasCage.png\n1964;120;Tom Jones;Drama;Ustinov, Peter;Mercouri, Melina;Dassin, Jules;39;Yes;NicholasCage.png\n1975;103;Sunshine Boys, The;Comedy;Burns, George;Meredith, Lee;Ross, Herbert;35;Yes;NicholasCage.png\n1988;98;Caddyshack 2;Comedy;Mason, Jackie;Merrill, Dina;Arkush, Allan;34;No;NicholasCage.png\n1990;117;Internal Affairs;Drama;Gere, Richard;Metcalf, Laurie;Figgis, Mike;3;No;NicholasCage.png\n1991;206;JFK;Drama;Costner, Kevin;Metcalf, Laurie;Stone, Oliver;78;No;NicholasCage.png\n1991;97;New Jack City;Action;Snipes, Wesley;Michael Michele;Van Peebles, Mario;80;No;NicholasCage.png\n1991;87;Scenes from a Mall;Comedy;Allen, Woody;Midler, Bette;;8;No;woody.png\n1987;118;Hope & Glory;War;Hayman, David;Miles, Sarah;Boorman, John;3;No;NicholasCage.png\n1970;194;Ryan's Daughter;Drama;Mitchum, Robert;Miles, Sarah;Lean, David;81;Yes;NicholasCage.png\n1973;127;Man Who Loved Cat Dancing, The;Western;Reynolds, Burt;Miles, Sarah;Sarafian, Richard C.;40;No;NicholasCage.png\n1962;123;Man Who Shot Liberty Valance, The;Western;Stewart, James;Miles, Vera;Ford, John;85;No;johnFord.png\n1989;102;Dead-Bang;Action;Johnson, Don;Miller, Penelope Ann;Frankenheimer, John;9;No;NicholasCage.png\n1988;90;Big Top Pee-wee;Comedy;Reubens, Paul;Miller, Penelope Ann;Kleiser, Randal;17;No;NicholasCage.png\n1960;103;Time Machine, The;Science Fiction;Taylor, Rod;Mimieux, Yvette;Pal, George;88;No;NicholasCage.png\n1972;128;Cabaret;Drama;Grey, Joel;Minnelli, Liza;Fosse, Bob;59;Yes;NicholasCage.png\n1981;97;Arthur;Comedy;Moore, Dudley;Minnelli, Liza;Gordon, Steve;79;Yes;NicholasCage.png\n1976;97;A Matter of Time;Drama;Boyer, Charles;Minnelli, Liza;Minnelli, Vincente;70;No;NicholasCage.png\n1977;137;New York, New York;Drama;De Niro, Robert;Minnelli, Liza;Scorsese, Martin;8;No;NicholasCage.png\n1989;89;Nightmare on Elm Street, Pt. 5, The Dream Child;Horror;Englund, Robert;Minter, Kelly Jo;Hopkins, Stephen;41;No;NicholasCage.png\n1980;100;Fiendish Plot of Dr. 
Fu Manchu, The;Comedy;Sellers, Peter;Mirren, Helen;Haggard, Piers;29;No;NicholasCage.png\n1991;240;Four American Composers;Music;Cage, John;Monk, Meredith;Greenaway, Peter;3;No;NicholasCage.png\n1950;112;Asphalt Jungle, The;Action;Hayden, Sterling;Monroe, Marilyn;Huston, John;77;No;NicholasCage.png\n1992;61;Ladies of the Chorus;Music;Garr, Eddie;Monroe, Marilyn;Karlson, Phil;60;No;NicholasCage.png\n1953;95;How to Marry a Millionaire;Comedy;Powell, William;Monroe, Marilyn;Negulesco, Jean;65;No;NicholasCage.png\n1983;;Hollywood Out-Takes & Rare Footage;Comedy;Bogart, Humphrey;Monroe, Marilyn;;27;No;NicholasCage.png\n1991;94;Nothing But Trouble;Comedy;Candy, John;Moore, Demi;Aykroyd, Dan;25;No;NicholasCage.png\n1987;109;Wisdom;Action;Estevez, Emilio;Moore, Demi;Estevez, Emilio;25;No;NicholasCage.png\n1986;94;One Crazy Summer;Comedy;Cusack, John;Moore, Demi;Holland, Savage Steve;61;No;NicholasCage.png\n1989;110;We're No Angels;Comedy;De Niro, Robert;Moore, Demi;Jordan, Neil;51;No;NicholasCage.png\n1984;102;No Small Affair;Comedy;Cryer, Jon;Moore, Demi;Schatzberg, Jerry;10;No;NicholasCage.png\n1990;127;Ghost;Science Fiction;Swayze, Patrick;Moore, Demi;Zucker, Jerry;6;Yes;NicholasCage.png\n1986;113;About Last Night;Drama;Lowe, Rob;Moore, Demi;Zwick, Edward;66;No;NicholasCage.png\n1982;107;Six Weeks;Drama;Moore, Dudley;Moore, Mary Tyler;Bill, Tony;73;No;NicholasCage.png\n1948;89;Return of October;Comedy;Ford, Glenn;Moore, Terry;Lewis, Joseph H.;35;No;glennFord.png\n1952;99;Come Back, Little Sheba;Drama;Lancaster, Burt;Moore, Terry;Mann, Daniel;50;Yes;burtLancaster.png\n1974;117;Going Places;Drama;Depardieu, Gérard;Moreau, Jeanne;Blier, Bertrand;66;No;NicholasCage.png\n1970;99;Monte Walsh;Western;Marvin, Lee;Moreau, Jeanne;Fraker, William A.;29;No;NicholasCage.png\n1955;100;Mr. 
Arkadin;Drama;Welles, Orson;Mori, Paola;Welles, Orson;80;No;NicholasCage.png\n1988;;White of the Eye;Mystery;Keith, David;Moriarty, Cathy;Cammell, Donald;48;No;NicholasCage.png\n1968;90;Producers, The;Comedy;Wilder, Gene;Mostel, Zero;Brooks, Mel;33;No;NicholasCage.png\n1976;94;Front, The;Drama;Allen, Woody;Mostel, Zero;Ritt, Martin;70;No;woody.png\n1987;86;House of the Rising Sun;Drama;Annese, Frank;Moyer, Tawny;Gold, Greg;45;No;NicholasCage.png\n1988;91;In a Shallow Grave;Drama;Biehn, Michael;Mueller, Maureen;Bowser, Kenneth;72;No;NicholasCage.png\n1974;111;Mc Q;Action;Wayne, John;Muldaur, Diana;Sturges, John;73;No;johnWayne.png\n1941;85;Lady from Louisiana;Drama;Wayne, John;Munson, Ona;Vorhaus, Bernard;38;No;johnWayne.png\n1990;102;Wait Until Spring Bandini;Drama;Mantegna, Joe;Muti, Ornella;Deruddere, Dominique;29;No;NicholasCage.png\n1940;105;Long Voyage Home, The;Drama;Wayne, John;Natwick, Mildred;Ford, John;88;No;johnWayne.png\n1955;100;Trouble with Harry, The;Mystery;Forsythe, John;Natwick, Mildred;Hitchcock, Alfred;28;No;alfredHitchcock.png\n1987;60;Encounters;Drama;Von Bergan, Raven;Navarro, Monica;Marder, Bruce;44;No;NicholasCage.png\n1963;112;Hud;Drama;Newman, Paul;Neal, Patricia;Ritt, Martin;2;Yes;paulNewman.png\n1951;111;Operation Pacific;War;Wayne, John;Neal, Patricia;;5;No;johnWayne.png\n1987;83;Surf Nazis Must Die;Horror;Brenner, Barry;Neely, Gail;George, Peter;50;No;NicholasCage.png\n1956;124;Teahouse of the August Moon;Drama;Brando, Marlon;Negami, Jun;Mann, Daniel;11;No;brando.png\n1992;88;Back in the U.S.S.R.;Action;Whaley, Frank;Negoda, Natalya;;61;No;NicholasCage.png\n1970;91;Man Who Haunted Himself, The;Drama;Moore, Roger;Neil, Hildegard;Dearden, Basil;75;No;NicholasCage.png\n1991;108;Prisoner of Honor.;Drama;Dreyfuss, Richard;Neilson, Catherine;Russell, Ken;58;No;NicholasCage.png\n1988;83;Control;Drama;Lancaster, Burt;Nelligan, Kate;;27;No;burtLancaster.png\n1923;57;Desert Rider;Western;Hoxie, Jack;Nelson, Evelyn;Bradbury, Robert 
N.;;No;NicholasCage.png\n1980;109;Wholly Moses!;Comedy;Moore, Dudley;Newman, Laraine;Weis, Gary;25;No;NicholasCage.png\n1991;110;Star Trek VI: The Undiscovered Country;Science Fiction;Shatner, William;Nichols, Nichelle;Meyer, Nicholas;11;No;NicholasCage.png\n1989;107;Star Trek V: The Final Frontier;Action;Shatner, William;Nichols, Nichelle;Shatner, William ;87;No;NicholasCage.png\n1991;85;Circuitry Man;Action;Metzler, Jim;Nicholson, Dana W.;Lovy, Steven;78;No;NicholasCage.png\n1986;87;Cobra;Action;Stallone, Sylvester;Nielsen, Brigitte;Cosmatos, George P.;57;No;NicholasCage.png\n1987;103;Beverly Hills Cop II;Comedy;Murphy, Eddie;Nielsen, Brigitte;Scott, Tony;37;No;NicholasCage.png\n1990;90;Red Sonja;Action;Schwarzenegger, Arnold;Nielsen, Brigitte;;40;No;NicholasCage.png\n1950;93;To Joy;Drama;Olin, Stig;Nilsson, Maj-Britt;Bergman, Ingmar;65;No;Bergman.png\n1992;112;Macbeth;Drama;Welles, Orson;Nolan, Jeanette;;45;No;NicholasCage.png\n1958;128;Vertigo;Drama;Stewart, James;Novak, Kim;Hitchcock, Alfred;10;No;alfredHitchcock.png\n1987;91;Young Love: Lemon Popsicle Seven;Comedy;Katzur, Yftach;Noy, Zachi;Bennett, Walter;47;No;NicholasCage.png\n1946;93;Crack-Up;Mystery;Marshall, Herbert;O'Brien, Pat;Reis, Irving;25;No;NicholasCage.png\n1941;57;Bury Me Not on the Lone Prairie;Western;Brown, Johnny Mack;O'Day, Nell;Taylor, Ray;85;No;NicholasCage.png\n1940;57;Law & Order;Western;Brown, Johnny Mack;O'Day, Nell;Taylor, Ray;87;No;NicholasCage.png\n1941;56;Man from Montana;Western;Brown, Johnny Mack;O'Day, Nell;Taylor, Ray;85;No;NicholasCage.png\n1992;137;Long Gray Line, The;Drama;Power, Tyrone;O'Hara, Maureen;Ford, John;26;No;johnFord.png\n1950;105;Rio Grande;Western;Wayne, John;O'Hara, Maureen;Ford, John;64;No;johnWayne.png\n1957;107;Wings of Eagles, The;Drama;Wayne, John;O'Hara, Maureen;Ford, John;29;No;johnWayne.png\n1939;94;Jamaica Inn;Drama;Laughton, Charles;O'Hara, Maureen;Hitchcock, Alfred;75;No;alfredHitchcock.png\n1971;110;Big Jake;Action;Wayne, John;O'Hara, 
Maureen;Sherman, George;68;No;johnWayne.png\n1992;153;Quiet Man, The;Drama;Wayne, John;O'Hara, Maureen;;74;No;johnWayne.png\n1983;72;After the Rehearsal;Drama;Josephson, Erland;Olin, Lena;Bergman, Ingmar;0;No;Bergman.png\n1952;90;Big Jim McLain;Western;Wayne, John;Olson, Nancy;Ludwig, Edward;14;No;johnWayne.png\n1969;101;Smith!;Western;Ford, Glenn;Olson, Nancy;O'Herlihy, Michael;62;No;glennFord.png\n1953;79;Wild One, The;Drama;Brando, Marlon;O'Malley, Pat;Benedek, Laslo;26;No;brando.png\n1929;129;Manxman, The;Drama;Brisson, Carl;Ondra, Anny;Hitchcock, Alfred;65;No;alfredHitchcock.png\n1978;126;International Velvet;Drama;Hopkins, Anthony;O'Neal, Tatum;Forbes, Bryan;40;No;AnthonyHopkins.png\n1981;104;Scanners;Horror;Lack, Stephen;O'Neill, Jennifer;Cronenberg, David;32;No;NicholasCage.png\n1986;98;Trick or Treat;Horror;Price, Marc;Orgolini, Lisa;Smith, Charles Martin;47;No;NicholasCage.png\n1982;92;48 Hrs.;Action;Nolte, Nick;O'Toole, Annette;Hill, Walter;67;No;NicholasCage.png\n1985;108;Trip to Bountiful, The;Drama;Heard, John;Page, Geraldine;Masterson, Peter;62;Yes;NicholasCage.png\n1955;116;Mister Roberts;Comedy;Fonda, Henry;Palmer, Betsy;Ford, John;8;Yes;johnFord.png\n1969;127;Z;Drama;Montand, Yves;Papas, Irene;Costa-Gavras;72;Yes;NicholasCage.png\n1987;139;Maurice;Drama;Wilby, James;Parfitt, Judy;Ivory, James;45;No;NicholasCage.png\n1969;114;Hamlet;Drama;Williamson, Nicol;Parfitt, Judy;Richardson, Tony;39;No;NicholasCage.png\n1991;117;La Femme Nikita;Drama;Karyo, Tcheky;Parillaud, Anne;Besson, Luc;6;No;NicholasCage.png\n1993;95;Honeymoon in Vegas;Comedy;Caan, James;Parker, Sarah Jessica;Bergman, Andrew;53;No;NicholasCage.png\n1988;90;Going for the Gold;Action;Edwards, Anthony;Parker, Sarah Jessica;Taylor, Dan;10;No;clintEastwood.png\n1976;128;Shout at the Devil;Action;Marvin, Lee;Parkins, Barbara;Hunt, Peter R.;0;No;NicholasCage.png\n1986;94;A Smoky Mountain Christmas;Music;Majors, Lee;Parton, Dolly;Winkler, Henry;23;No;NicholasCage.png\n1984;95;Getting 
Physical;Drama;Naughton, David;Paul, Alexandra;Stern, Steven Hilliard;75;No;NicholasCage.png\n1990;95;Torn Apart;Drama;Pasdar, Adrian;Peck, Cecilia;Fisher, Jack;8;No;NicholasCage.png\n1986;112;From the Hip;Comedy;Nelson, Judd;Perkins, Elizabeth;Clark, Bob;36;No;NicholasCage.png\n1984;102;Ratings Game, The;Comedy;DeVito, Danny;Perlman, Rhea;DeVito, Danny;21;No;NicholasCage.png\n1992;100;Class Act;Drama;Reid, Christopher;Perlman, Rhea;;88;No;NicholasCage.png\n1986;89;Water;Comedy;Caine, Michael;Perrine, Valerie;Clement, Dick;47;No;NicholasCage.png\n1978;88;Silent Movie;Comedy;Brooks, Mel;Peters, Bernadette;Brooks, Mel;27;No;NicholasCage.png\n1989;122;Pink Cadillac;Comedy;Eastwood, Clint;Peters, Bernadette;Eastwood, Clint;12;No;clintEastwood.png\n1979;94;Jerk, The;Comedy;Martin, Steve;Peters, Bernadette;Reiner, Carl;22;No;NicholasCage.png\n1980;180;Wild Times;;Elliott, Sam;Peyser, Penny;Compton, Richard;75;No;NicholasCage.png\n1986;107;Sweet Liberty;Comedy;Alda, Alan;Pfeiffer, Michelle;Alda, Alan;12;No;MichellePfeiffer.png\n1982;115;Grease II;Music;Caulfield, Maxwell;Pfeiffer, Michelle;Birch, Patricia;64;No;MichellePfeiffer.png\n1989;104;Married to the Mob;Comedy;Modine, Matthew;Pfeiffer, Michelle;Demme, Jonathan;8;No;MichellePfeiffer.png\n1985;121;Ladyhawke;Adventure;Broderick, Matthew;Pfeiffer, Michelle;Donner, Richard;68;No;MichellePfeiffer.png\n1989;114;Fabulous Baker Boys, The;Drama;Bridges, Jeff;Pfeiffer, Michelle;Kloves, Steve;66;No;MichellePfeiffer.png\n1985;115;Into the Night;Comedy;Goldblum, Jeff;Pfeiffer, Michelle;Landis, John;62;No;MichellePfeiffer.png\n1991;124;Russia House, The;Drama;Connery, Sean;Pfeiffer, Michelle;Schepisi, Fred;3;No;MichellePfeiffer.png\n1988;116;Tequila Sunrise;Mystery;Gibson, Mel;Pfeiffer, Michelle;Towne, Robert;50;No;MichellePfeiffer.png\n1989;74;B. A. D. 
Cats;Action;Morrow, Vic;Pfeiffer, Michelle;;87;No;MichellePfeiffer.png\n1971;108;Last Movie, The;Drama;Hopper, Dennis;Phillips, Michelle;Hopper, Dennis;22;No;NicholasCage.png\n1973;106;Dillinger;Drama;Oates, Warren;Phillips, Michelle;Milius, John;83;No;NicholasCage.png\n1988;360;Little Dorrit;Drama;Jacobi, Derek;Pickering, Sarah;Edzard, Christine;12;No;NicholasCage.png\n1927;78;My Best Girl;Drama;Rogers, Charles;Pickford, Mary;Taylor, Sam;31;No;NicholasCage.png\n1989;93;Seizure;Horror;Frid, Jonathan;Pickles, Christina;Stone, Oliver;59;No;NicholasCage.png\n1990;89;A Chorus of Disapproval;Comedy;Irons, Jeremy;Pigg, Alexandra;Winner, Michael;0;No;NicholasCage.png\n1962;119;Rome Adventure;Drama;Donahue, Tony;Pleshette, Suzanne;Daves, Delmer;39;No;NicholasCage.png\n1992;121;Drowning by Numbers;Mystery;Hill, Bernard;Plowright, Joan;Greenaway, Peter;28;No;NicholasCage.png\n1991;88;Born to Ride;Action;Stamos, John;Polo, Teri;Baker, Graham;59;No;NicholasCage.png\n1988;94;Her Alibi;Comedy;Selleck, Tom;Porizkova, Paulina;Beresford, Bruce;80;No;NicholasCage.png\n1988;96;Glitz;Mystery;Smits, Jimmy;Post, Markie;;9;No;NicholasCage.png\n1990;95;Dangerous Pursuit;Mystery;Harrison, Gregory;Powers, Alexandra;Stern, Sandor;88;No;NicholasCage.png\n1962;123;Experiment in Terror;Mystery;Ford, Glenn;Powers, Stefanie;Edwards, Blake;77;No;glennFord.png\n1972;105;Hideaways, The;Comedy;Doran, Johnny;Prager, Sally;Cook, Fielder;42;No;NicholasCage.png\n1965;108;What's New Pussycat;Comedy;O'Toole, Peter;Prentiss, Paula;Donner, Clive;83;No;NicholasCage.png\n1965;108;What's New Pussycat?;Comedy;Sellers, Peter;Prentiss, Paula;Donner, Clive;46;No;NicholasCage.png\n1983;98;Packin' It In;Comedy;Benjamin, Richard;Prentiss, Paula;Taylor, Jud;8;No;NicholasCage.png\n1988;90;Naked Gun: From the Files of Police Squad!, THe;Comedy;Nielsen, Leslie;Presley, Priscilla;Zucker, David;9;No;NicholasCage.png\n1990;106;In Too Deep;Drama;Race, Hugo;Press, Santha;Tatoulis, Colin South, 
John;50;No;NicholasCage.png\n1988;107;Twins;Comedy;Schwarzenegger, Arnold;Preston, Kelly;Reitman, Ivan;23;No;NicholasCage.png\n1988;94;Experts, The;Comedy;Travolta, John;Preston, Kelly;Thomas, Dave;67;No;NicholasCage.png\n1989;94;Naked Lie;Drama;Lucking, William;Principal, Victoria;Colla, Richard A.;7;No;NicholasCage.png\n1987;87;Mistress;Drama;Rachins, Allan;Principal, Victoria;Tuchner, Michael;36;No;NicholasCage.png\n1992;;Pleasure Palace;Action;Sharif, Omar;Principal, Victoria;;45;No;NicholasCage.png\n1970;100;Adam at 6 A.M.;Drama;Douglas, Michael;Purcell, Lee;Scheerer, Robert;3;No;NicholasCage.png\n1990;93;Web of Deceit;Drama;Read, James;Purl, Linda;Stern, Sandor;6;No;NicholasCage.png\n1991;119;New York Stories;Comedy;Allen, Woody;Questel, Mae;Coppola, Francis Ford;6;No;NicholasCage.png\n1987;97;Dreams Lost, Dreams Found;Drama;Robb, David;Quinlan, Kathleen;Patterson, Willi;66;No;NicholasCage.png\n1987;103;Au Revoir les Enfants;Drama;Manesse, Gaspard;Racette, Francine;Malle, Louis;35;No;NicholasCage.png\n1989;122;Quo Vadis;Drama;Brandauer, Klaus Maria;Raines, Cristina;Rossi, Franco;6;No;NicholasCage.png\n1949;100;Fighting Kentuckian, The;Action;Wayne, John;Ralston, Vera;Waggner, George;74;No;johnWayne.png\n1974;104;Zardoz;Science Fiction;Connery, Sean;Rampling, Charlotte;Boorman, John;6;No;seanConnery.png\n1989;84;Police Academy 6: City under Siege;Comedy;Smith, Bubba;Ramsey, Marion;Bonerz, Peter;29;No;NicholasCage.png\n1988;90;Police Academy 5: Assignment Miami Beach;Comedy;Gaynes, George;Ramsey, Marion;Myerson, Alan;59;No;NicholasCage.png\n1986;84;Police Academy 3: Back in Training;Comedy;Guttenberg, Steve;Ramsey, Marion;Paris, Jerry;6;No;NicholasCage.png\n1991;60;America's Music, Blues;Music;Hopkins, Linda;Redd, Vi;Walton, Kip;54;No;NicholasCage.png\n1977;100;Julia;Drama;Fonda, Vanessa;Redgrave, Jane;Zinnemann, Fred;75;Yes;NicholasCage.png\n1971;111;Devils, The;Drama;Reed, Oliver;Redgrave, Vanessa;Russell, 
Ken;69;No;NicholasCage.png\n1984;90;Ransom;Drama;Ford, Glenn;Reed, Donna;Segal, Alex;73;No;glennFord.png\n1990;97;Cadillac Man;Comedy;Williams, Robin;Reed, Pamela;Donaldson, Roger;28;No;NicholasCage.png\n1986;104;Best of Times, The;Comedy;Williams, Robin;Reed, Pamela;Spottiswoode, Roger;88;No;NicholasCage.png\n1985;135;Death of a Salesman;Drama;Hoffman, Dustin;Reid, Kate;Schlöndorff, Volker;13;No;NicholasCage.png\n1993;104;It Started with a Kiss;Drama;Ford, Glenn;Reynolds, Debbie;;80;No;glennFord.png\n1989;88;Money, The;Drama;Luckinbill, Laurence;Richards, Elizabeth;;29;No;NicholasCage.png\n1987;153;Empire of the Sun;Drama;Malkovich, John;Richardson, Miranda;Spielberg, Steven;6;No;NicholasCage.png\n1991;102;Comfort of Strangers, The;Mystery;Walken, Christopher;Richardson, Natasha;Schrader, Paul;5;No;NicholasCage.png\n1969;135;On Her Majesty's Secret Service;Action;Lazenby, George;Rigg, Diana;Hunt, Peter R.;66;No;NicholasCage.png\n1986;96;Pretty in Pink;Drama;Stanton, Harry Dean;Ringwald, Molly;Deutch, Howard;75;No;NicholasCage.png\n1987;90;PK. 
& the Kid.;Drama;LeMat, Paul;Ringwald, Molly;;49;No;NicholasCage.png\n1943;60;Lone Star Trail, The;Western;Brown, Johnny Mack;Ritter, Tex;Taylor, Ray;27;No;NicholasCage.png\n1986;98;Summer;Comedy;Gauthier, Vincent;Riviere, Marie;Rohmer, Eric;11;No;NicholasCage.png\n1987;93;Planes, Trains & Automobiles;Comedy;Martin, Steve;Robbins, Laila;Hughes, John;73;No;NicholasCage.png\n1990;119;Pretty Woman;Comedy;Gere, Richard;Roberts, Julia;Marshall, Garry;43;No;NicholasCage.png\n1991;111;Flatliners;Drama;Sutherland, Kiefer;Roberts, Julia;Schumacher, Joel;19;No;NicholasCage.png\n1991;142;Hook;Action;Williams, Robin;Roberts, Julia;Spielberg, Steven;4;No;NicholasCage.png\n1940;56;Riders of Pasco Basin;Western;Brown, Johnny Mack;Robinson, Frances;Taylor, Ray;17;No;NicholasCage.png\n1992;53;Gotta Dance, Gotta Sing;Music;Astaire, Fred;Rogers, Ginger;;20;No;NicholasCage.png\n1990;106;Desperate Hours;Mystery;Rourke, Mickey;Rogers, Mimi;Cimino, Michael;58;No;NicholasCage.png\n1986;111;Gung Ho;Comedy;Keaton, Michael;Rogers, Mimi;Howard, Ron;59;No;NicholasCage.png\n1992;96;Shooting Elizabeth;Mystery;Goldblum, Jeff;Rogers, Mimi;Taylor, Baz;5;No;NicholasCage.png\n1951;101;Strangers on a Train;Mystery;Granger, Farley;Roman, Ruth;Hitchcock, Alfred;17;No;alfredHitchcock.png\n1979;198;Sacketts, The;Western;Elliott, Sam;Roman, Ruth;Totten, Robert;86;No;NicholasCage.png\n1991;87;To Die Standing;Action;De Young, Cliff;Rose, Jamie;Morneau, Louis;53;No;NicholasCage.png\n1980;92;Rodeo Girl;Drama;Hopkins, Bo;Ross, Katharine;Cooper, Jackie;80;No;NicholasCage.png\n1969;110;Butch Cassidy & the Sundance Kid;Western;Newman, Paul;Ross, Katharine;Hill, George Roy;29;Yes;paulNewman.png\n1968;121;Hellfighters;Action;Wayne, John;Ross, Katharine;McLaglen, Andrew V.;22;No;johnWayne.png\n1980;92;Final Countdown, The;Action;Douglas, Kirk;Ross, Katharine;Taylor, Don;35;No;NicholasCage.png\n1986;120;Blue Velvet;Mystery;MacLachlan, Kyle;Rossellini, Isabella;Lynch, 
David;6;No;lynch.png\n1989;110;Cousins;Comedy;Danson, Ted;Rossellini, Isabella;;28;No;NicholasCage.png\n1976;90;Black & White in Color;Comedy;Carmet, Jean;Rouvel, Catherine;Annaud, Jean-Jacques;24;Yes;NicholasCage.png\n1988;81;Another Woman;Drama;Hackman, Gene;Rowlands, Gena;Allen, Woody;7;No;woody.png\n1992;128;Night on Earth;Drama;Benigni, Roberto;Rowlands, Gena;Jarmusch, Jim;24;No;NicholasCage.png\n1988;92;Permanent Record;Drama;Boyce, Alan;Rubin, Jennifer;Silver, Marisa;42;No;NicholasCage.png\n1992;138;Fisher King, The;Drama;Williams, Robin;Ruehl, Mercedes;Gilliam, Terry;8;Yes;NicholasCage.png\n1991;98;Another You;Comedy;Pryor, Richard;Ruehl, Mercedes;Phillips, Maurice;75;No;NicholasCage.png\n1958;167;Young Lions, The;Drama;Brando, Marlon;Rush, Barbara;Dmytryk, Edward;10;No;NicholasCage.png\n1988;89;Cheerleader Camp;Horror;Garrett, Leif;Russell, Betsy;Quinn, John;79;No;NicholasCage.png\n1990;98;Trapper County War;Action;Hudson, Ernie;Russell, Betsy;;5;No;NicholasCage.png\n1947;96;Angel & the Badman;Western;Wayne, John;Russell, Gail;Grant, James Edward;84;No;johnWayne.png\n1990;109;Impulse;Mystery;Fahey, Jeff;Russell, Theresa;Locke, Sondra;23;No;NicholasCage.png\n1988;91;Track Twenty-Nine;Drama;Oldman, Gary;Russell, Theresa;Roeg, Nicolas;48;No;NicholasCage.png\n1991;110;Freejack;Action;Estevez, Emilio;Russo, Rene;Richardson, Tony;26;No;NicholasCage.png\n1939;109;John Wayne Matinee Double Feature, No. 
1;Western;Wayne, John;Rutherford, Ann;;30;No;johnWayne.png\n1988;81;Smallest Show on Earth, The;Comedy;Sellers, Peter;Rutherford, Margaret;Dearden, Basil;24;No;NicholasCage.png\n1987;120;Innerspace;Science Fiction;Quaid, Dennis;Ryan, Meg;Dante, Joe;41;No;NicholasCage.png\n1988;97;Presidio, The;Action;Connery, Sean;Ryan, Meg;Hyams, Peter;4;No;seanConnery.png\n1990;102;Joe Versus the Volcano;Comedy;Hanks, Tom;Ryan, Meg;Patrick, John;17;No;NicholasCage.png\n1991;135;Doors, The;Drama;Kilmer, Val;Ryan, Meg;Stone, Oliver;60;No;NicholasCage.png\n1990;98;Welcome Home, Roxy Carmichael;Comedy;Daniels, Jeff;Ryder, Winona;Abrahams, Jim;41;No;NicholasCage.png\n1972;99;Cancel My Reservation;Comedy;Hope, Bob;Saint, Eva Marie;Bogart, Paul;60;No;NicholasCage.png\n1991;135;North by Northwest;Mystery;Grant, Cary;Saint, Eva Marie;Hitchcock, Alfred;20;No;alfredHitchcock.png\n1966;127;Russians Are Coming, the Russians Are, The;Comedy;Reiner, Carl;Saint, Eva Marie;Jewison, Norman;79;Yes;NicholasCage.png\n1992;213;Exodus;Drama;Newman, Paul;Saint, Eva Marie;Preminger, Otto;13;No;paulNewman.png\n1982;128;Ballad of Narayama, The;Drama;Ogata, Ken;Sakamoto, Sumiko;Imamura, Shohei;88;No;NicholasCage.png\n1985;96;Out of the Darkness;Mystery;Sheen, Martin;Salt, Jennifer;Taylor, Jud;86;No;NicholasCage.png\n1971;90;Garden of the Finzi-Continis, The;Drama;Capolicchio, Lino;Sanda, Dominique;De Sica, Vittorio;42;Yes;NicholasCage.png\n1974;105;Steppenwolf;Drama;Sydow, Max von;Sanda, Dominique;Haines, Fred;20;No;NicholasCage.png\n1973;100;Mackintosh Man, The;Action;Newman, Paul;Sanda, Dominique;Huston, John;65;No;paulNewman.png\n1968;105;Partner;Drama;Clementi, Pierre;Sandrelli, Stefania;Bertolucci, Bernardo;26;No;NicholasCage.png\n1970;107;Conformist, The;Drama;Trintignant, Jean-Louis;Sandrelli, Stefania;Bertolucci, Bernardo;72;No;NicholasCage.png\n1971;102;Dirty Harry;Drama;Eastwood, Clint;Santoni, Reni;Siegel, Don;72;No;clintEastwood.png\n1986;103;Ferris Bueller's Day Off;Comedy;Broderick, 
Matthew;Sara, Mia;Hughes, John;12;No;NicholasCage.png\n1986;89;Legend;Science Fiction;Cruise, Tom;Sara, Mia;Scott, Ridley;42;No;NicholasCage.png\n1984;110;Buddy System, The;Drama;Dreyfuss, Richard;Sarandon, Susan;Jordan, Glenn;48;No;NicholasCage.png\n1989;97;A Dry White Season;Drama;Sutherland, Donald;Sarandon, Susan;Palcy, Euzhan;71;No;NicholasCage.png\n1975;105;Rocky Horror Picture Show, The;Music;Gray, Charles;Sarandon, Susan;Sharman, Jim;59;No;NicholasCage.png\n1968;360;War & Peace;Drama;Tikhonov, Vyacheslav;Savelyeva, Lyudmila;Bondarchuk, Sergei;80;Yes;NicholasCage.png\n1992;96;Defense of the Realm;Drama;Elliott, Denholm;Scacchi, Greta;;79;No;NicholasCage.png\n1991;90;Basil The Rat;Comedy;Cleese, John;Scales, Prunella;;9;No;NicholasCage.png\n1979;90;Fawlty Towers, Gourmet Night, Waldorf Salad & The Kipper & the Corpse;Comedy;Cleese, John;Scales, Prunella;;46;No;NicholasCage.png\n1991;80;Going Under;Comedy;Pullman, Bill;Schaal, Wendy;Travis, Mark W.;30;No;NicholasCage.png\n1990;83;U S. 
Sub Standard.;Comedy;Pullman, Bill;Schaal, Wendy;;27;No;NicholasCage.png\n1990;;Hells Angels on Wheels;Action;Nicholson, Jack;Scharf, Sabrina;;1;No;NicholasCage.png\n1975;118;Passenger, The;Drama;Nicholson, Jack;Schneider, Maria;Antonioni, Michelangelo;32;No;JackNicholson.png\n1973;127;Last Tango in Paris;Drama;Brando, Marlon;Schneider, Maria;Bertolucci, Bernardo;28;No;brando.png\n1987;155;Indigo Autumn & Lilac Dream;Drama;Singer, Marc;Schrage, Lisa;Gillard, Stuart;72;No;NicholasCage.png\n1924;95;Kriemhild's Revenge, The Nibelungenlied;Drama;Loos, Theodor;Schön, Margarete;Lang, Fritz;74;No;NicholasCage.png\n1966;102;Johnny Tiger;Drama;Taylor, Robert;Scott, Brenda;Wendkos, Paul;69;No;NicholasCage.png\n1986;90;Head Office;Comedy;Reinhold, Judge;Seymour, Jane;Finkleman, Ken;88;No;NicholasCage.png\n1990;;Live & Let Die;Action;Moore, Roger;Seymour, Jane;;62;No;NicholasCage.png\n1972;100;Le Charme Discret de la Bourgeoisie;Comedy;Rey, Fernando;Seyrig, Delphine;Bunuel, Luis;4;Yes;NicholasCage.png\n1986;83;Blue City;Action;Nelson, Judd;Sheedy, Ally;Manning, Michelle;38;No;NicholasCage.png\n1983;123;Bad Boys;Drama;Penn, Sean;Sheedy, Ally;Rosenthal, Rick;7;No;NicholasCage.png\n1986;82;Whoopee Boys, The;Comedy;O'Keefe, Michael;Shelley, Carole;Byrum, John;54;No;NicholasCage.png\n1971;118;Last Picture Show, The;Drama;Bottoms, Timothy;Shepherd, Cybill;Bogdanovich, Peter;62;Yes;NicholasCage.png\n1988;93;Diamond Trap, The;Drama;Hessman, Howard;Shields, Brooke;Taylor, Don;58;No;NicholasCage.png\n1981;115;Endless Love;Drama;Hewitt, Martin;Shields, Brooke;Zeffirelli, Franco;20;No;NicholasCage.png\n1976;90;Rocky;Drama;Stallone, Sylvester;Shire, Talia;Avildsen, John G.;78;Yes;NicholasCage.png\n1988;103;Cocktail;Drama;Cruise, Tom;Shue, Elisabeth;Donaldson, Roger;13;No;NicholasCage.png\n1936;77;Sabotage;Mystery;Homolka, Oskar;Sidney, Sylvia;Hitchcock, Alfred;74;No;alfredHitchcock.png\n1977;105;Madame Rosa;Drama;Youb, Samy Ben;Signoret, Simone;Mizrahi, 
Moshe;11;Yes;NicholasCage.png\n1985;56;Fozzie's Muppet Scrapbook;Comedy;Berle, Milton;Sills, Beverly;;86;No;NicholasCage.png\n1954;110;Desiree;Drama;Brando, Marlon;Simmons, Jean;Koster, Henry;22;No;brando.png\n1960;185;Spartacus;Drama;Douglas, Kirk;Simmons, Jean;Kubrick, Stanley;67;Yes;NicholasCage.png\n1955;150;Guys & Dolls;Comedy;Brando, Marlon;Simmons, Jean;Mankiewicz, Joseph L.;70;Yes;brando.png\n1992;95;Until They Sail;Drama;Newman, Paul;Simmons, Jean;Wise, Robert;77;No;paulNewman.png\n1988;116;Coming to America;Comedy;Murphy, Eddie;Sinclair, Madge;Landis, John;11;No;NicholasCage.png\n1963;93;Lilies of the Field;Drama;Poitier, Sidney;Skala, Lilia;Poitier, Sidney;36;Yes;NicholasCage.png\n1987;99;River's Edge;Drama;Glover, Crispin;Skye, Ione;Hunter, Tim;3;No;NicholasCage.png\n1986;93;Ruthless People;Comedy;DeVito, Danny;Slater, Helen;Abrahams, Jim;84;No;NicholasCage.png\n1987;110;Secret of My Success, The;Comedy;Fox, Michael J.;Slater, Helen;Ross, Herbert;5;No;NicholasCage.png\n1965;128;Shop on Main Street, The;Drama;Kroner, Josef;Slivoka, Hana;Kadar, Jan;37;Yes;NicholasCage.png\n1988;101;Funny Farm;Comedy;Chase, Chevy;Smith, Madolyn;Hill, George Roy;30;No;NicholasCage.png\n1988;120;Lonely Passion of Judith Hearne, The;Drama;Hoskins, Bob;Smith, Maggie;Clayton, Jack;24;No;NicholasCage.png\n1978;103;California Suite;Comedy;Caine, Michael;Smith, Maggie;Ross, Herbert;11;Yes;NicholasCage.png\n1986;97;Maximum Overdrive;Horror;Estevez, Emilio;Smith, Yeardley;King, Stephen;40;No;NicholasCage.png\n1985;116;Pale Rider;Western;Eastwood, Clint;Snodgress, Carrie;Eastwood, Clint;45;No;clintEastwood.png\n1990;88;Kissing Place, The;Drama;Birney, Meredith Baxter;Snow, Victoria;Wharmby, Tony;41;No;NicholasCage.png\n1986;90;French Lesson;Comedy;Sterling, Alexandre;Snowden, Jane;Gilbert, Brian;29;No;NicholasCage.png\n1985;88;Roller Blade;Action;Hutchinson, Jeff;Solari, Suzanne;Jackson, Donald G;31;No;NicholasCage.png\n1964;101;A Shot in the Dark;Comedy;Sellers, Peter;Sommer, 
Elke;Edwards, Blake;51;No;NicholasCage.png\n1979;88;Treasure Seekers, The;Action;Whitman, Stuart;Sommer, Elke;;2;No;NicholasCage.png\n1982;122;Missing;Drama;Lemmon, Jack;Spacek, Sissy;Costa-Gavras;30;No;NicholasCage.png\n1989;99;Picasso Trigger;Action;Bond, Steve;Speir, Dona;Sidaris, Andy;20;No;NicholasCage.png\n1987;97;Hard Ticket to Hawaii;Action;Moss, Ronn;Speir, Dona;Sidaris, Andy;36;No;NicholasCage.png\n1990;;Diamonds are Forever;Action;Connery, Sean;St. John, Jill;Hamilton, Guy;8;No;seanConnery.png\n1933;72;Baby Face;Drama;Brent, George;Stanwyck, Barbara;Green, Alfred E.;66;No;NicholasCage.png\n1992;95;Violent Men, The;Action;Ford, Glenn;Stanwyck, Barbara;Mate, Rudolph;25;No;glennFord.png\n1985;117;Cocoon;Science Fiction;Ameche, Don;Stapleton, Maureen;Howard, Ron;45;Yes;NicholasCage.png\n1986;96;Clockwise;Comedy;Cleese, John;Steadman, Alison;Morahan, Christopher;10;No;NicholasCage.png\n1993;103;Romantic Comedy;Comedy;Moore, Dudley;Steenburgen, Mary;;8;No;NicholasCage.png\n1981;111;Outland;Science Fiction;Connery, Sean;Sternhagen, Frances;Hyams, Peter;7;No;seanConnery.png\n1967;114;Hang 'em High;Western;Eastwood, Clint;Stevens, Inger;Post, Ted;67;No;clintEastwood.png\n1992;123;Basic Instinct;Mystery;Douglas, Michael;Stone, Sharon;Verhoeven, Paul;41;No;NicholasCage.png\n1990;113;Total Recall;Action;Schwarzenegger, Arnold;Stone, Sharon;Verhoeven, Paul;8;No;NicholasCage.png\n1987;115;Stakeout;Comedy;Dreyfuss, Richard;Stowe, Madeleine;Badham, John;13;No;NicholasCage.png\n1992;104;Unnamable II, The Statement of Randolph Carter, The;Drama;Rhys-Davies, John;Strain, Julie;Ouellette, Jean-Paul;36;No;NicholasCage.png\n1967;85;Trip, The;Drama;Fonda, Peter;Strasberg, Susan;Corman, Roger;64;No;NicholasCage.png\n1987;135;Ironweed;Drama;Nicholson, Jack;Streep, Meryl;Babenco, Hector;32;No;merylStreep.png\n1979;;Kramer vs. 
Kramer;Drama;Hoffman, Dustin;Streep, Meryl;Benton, Robert;8;Yes;merylStreep.png\n1988;;Still of the Night;Mystery;Scheider, Roy;Streep, Meryl;Benton, Robert;42;No;merylStreep.png\n1991;112;Defending Your Life;Comedy;Brooks, Albert;Streep, Meryl;Brooks, Albert;75;No;merylStreep.png\n1978;183;Deer Hunter, The;Drama;De Niro, Robert;Streep, Meryl;Cimino, Michael;82;Yes;merylStreep.png\n1984;106;Falling in Love;Drama;De Niro, Robert;Streep, Meryl;Grosbard, Ulu;31;No;merylStreep.png\n1986;108;Heartburn;Comedy;Nicholson, Jack;Streep, Meryl;Nichols, Mike;57;No;JackNicholson.png\n1983;131;Silkwood;Drama;Russell, Kurt;Streep, Meryl;Nichols, Mike;52;No;merylStreep.png\n1982;151;Sophie's Choice;Drama;Kline, Kevin;Streep, Meryl;Pakula, Alan J.;64;Yes;merylStreep.png\n1985;161;Out of Africa;Drama;Redford, Robert;Streep, Meryl;Pollack, Sydney;88;Yes;merylStreep.png\n1981;127;French Lieutenant's Woman, The;Drama;Irons, Jeremy;Streep, Meryl;Reisz, Karel;37;No;merylStreep.png\n1985;124;Plenty;Drama;Dance, Charles;Streep, Meryl;Schepisi, Fred;9;No;merylStreep.png\n1988;122;A Cry in the Dark;Drama;Neill, Sam;Streep, Meryl;Schepisi, Fred;67;No;merylStreep.png\n1989;99;She-Devil;Comedy;Begley, Ed, Jr.;Streep, Meryl;Seidelman, Susan;43;No;merylStreep.png\n1992;103;Death Becomes Her;Drama;Willis, Bruce;Streep, Meryl;Zemeckis, Robert;61;No;merylStreep.png\n1991;28;Kids & Pesticides;Drama;Whyatt, Robin;Streep, Meryl;;36;No;merylStreep.png\n1970;129;On a Clear Day You Can See Forever;Music;Montand, Yves;Streisand, Barbra;Minnelli, Vincente;67;No;NicholasCage.png\n1987;100;Nuts;Drama;Dreyfuss, Richard;Streisand, Barbra;Ritt, Martin;52;No;NicholasCage.png\n1983;134;Yentl;Music;Patinkin, Mandy;Streisand, Barbra;Streisand, Barbra;46;No;NicholasCage.png\n1968;151;Funny Girl;Music;Sharif, Omar;Streisand, Barbra;Wyler, William;30;Yes;NicholasCage.png\n1990;97;Fellow Traveller;Drama;Travanti, Daniel J.;Stubbs, Imogen;Towns, Philip Saville;39;No;NicholasCage.png\n1970;140;Dodesukaden;Drama;Zushi, 
Yoshitaka;Sugai, Kin;Kurosawa, Akira;75;No;NicholasCage.png\n1987;;Sicilian, The;Drama;Lambert, Christopher;Sukowa, Barbara;Cimino, Michael;41;No;NicholasCage.png\n1941;117;So Ends Our Night;Drama;March, Fredric;Sullavan, Margaret;Cromwell, John;2;No;NicholasCage.png\n1984;102;Sword of the Valiant;Action;O'Keeffe, Miles;Sutton, Emma;Weeks, Stephen;5;No;NicholasCage.png\n1949;78;Devil's Wanton, The;Drama;Malmsten, Birger;Svedlund, Doris;Bergman, Ingmar;66;No;Bergman.png\n1989;99;Driving Miss Daisy;Drama;Freeman, Morgan;Tandy, Jessica;Beresford, Bruce;6;Yes;NicholasCage.png\n1991;111;Seventh Cross, The;Drama;Tracy, Spencer;Tandy, Jessica;;35;No;spencerTracy.png\n1983;105;Between Friends;Drama;Ramer, Henry;Taylor, Elizabeth;Antonio, Lou;54;No;elizabethTaylor.png\n1957;173;Raintree County;Drama;Clift, Montgomery;Taylor, Elizabeth;Dmytryk, Edward;74;No;elizabethTaylor.png\n1975;101;Driver's Seat, The;Drama;Bannen, Ian;Taylor, Elizabeth;Griffi, Giuseppe Patroni;72;No;elizabethTaylor.png\n1967;109;Reflections in a Golden Eye;Drama;Brando, Marlon;Taylor, Elizabeth;Huston, John;81;No;elizabethTaylor.png\n1972;110;X, Y & Zee;Drama;Caine, Michael;Taylor, Elizabeth;Hutton, Brian G.;87;No;elizabethTaylor.png\n1968;109;Secret Ceremony;Drama;Mitchum, Robert;Taylor, Elizabeth;Losey, Joseph;60;No;elizabethTaylor.png\n1963;243;Cleopatra;Drama;Burton, Richard;Taylor, Elizabeth;Mankiewicz, Joseph L.;80;No;elizabethTaylor.png\n1950;;Father of the Bride;Comedy;Taylor, Rod;Taylor, Elizabeth;Minnelli, Vincente;54;No;elizabethTaylor.png\n1992;130;Who's Afraid of Virginia Woolf?;Drama;Burton, Richard;Taylor, Elizabeth;Nichols, Mike;82;Yes;elizabethTaylor.png\n1977;110;A Little Night Music;Music;Cariou, Len;Taylor, Elizabeth;Prince, Harold;61;No;elizabethTaylor.png\n1956;201;Giant;Drama;Hudson, Rock;Taylor, Elizabeth;Stevens, George;61;Yes;elizabethTaylor.png\n1985;94;Rumor Mill, The;Drama;Dysart, Richard A.;Taylor, Elizabeth;Trikonis, Gus;62;No;elizabethTaylor.png\n1943;90;Lassie Come 
Home;Drama;McDowall, Roddy;Taylor, Elizabeth;Wilcox, Fred M;79;No;elizabethTaylor.png\n1993;76;Return Engagement;Drama;Bottoms, Joseph;Taylor, Elizabeth;;26;No;elizabethTaylor.png\n1972;108;Hammersmith Is Out;Drama;Burton, Richard;Taylor, Elizabeth;;80;No;elizabethTaylor.png\n1991;60;Super Duper Bloopers;Comedy;Cooper, Gary;Taylor, Elizabeth;;21;No;elizabethTaylor.png\n1991;;Elizabeth Taylor Collection, The;Drama;Fisher, Eddie;Taylor, Elizabeth;;21;No;elizabethTaylor.png\n1973;99;Ash Wednesday;Drama;Fonda, Henry;Taylor, Elizabeth;;54;No;elizabethTaylor.png\n1991;117;Last Time I Saw Paris, The;Drama;Johnson, Van;Taylor, Elizabeth;;13;No;elizabethTaylor.png\n1931;125;Cimarron;Western;Dix, Richard;Taylor, Estelle;Ruggles, Wesley;44;Yes;NicholasCage.png\n1992;83;Apache Woman;Western;Bridges, Lloyd;Taylor, Joan;Corman, Roger;32;No;NicholasCage.png\n1984;;Gary Numan - Berzerker;Music;Webb, John;Taylor, Karen;;60;No;NicholasCage.png\n1988;101;Mystic Pizza;Comedy;Moses, William;Taylor, Lili;Petrie, Donald;74;No;NicholasCage.png\n1991;95;Dogfight;Action;Phoenix, River;Taylor, Lili;Savoca, Nancy;66;No;NicholasCage.png\n1935;234;Adventures of Rex & Rinty, The;Western;Rex the Wonder Horse;Taylor, Norma;Beebe, Ford;87;No;NicholasCage.png\n1988;60;Daphnis & Chloe;Music;Morrow, Carl;Taylor, Victoria;Wimhurst, Jolyon;85;No;NicholasCage.png\n1980;97;Marathon;Comedy;Newhart, Bob;Taylor-Young, Leigh;Cooper, Jackie;76;No;NicholasCage.png\n1948;127;Fort Apache;Western;Fonda, Henry;Temple, Shirley;Ford, John;4;No;johnFord.png\n1937;100;Wee Willie Winkie;Drama;Romero, Cesar;Temple, Shirley;Ford, John;78;No;johnFord.png\n1987;91;Big Shots;Action;Busker, Ricky;Thayer, Brynn;Mandel, Robert;5;No;NicholasCage.png\n1988;85;Doin' Time on Planet Earth;Comedy;Strouse, Nocholas;Thompson, Andrea;Matthau, Charles;44;Yes;NicholasCage.png\n1983;91;All the Right Moves;Drama;Cruise, Tom;Thompson, Lea;Chapman, Michael;65;No;NicholasCage.png\n1987;93;Some Kind of Wonderful;Drama;Stoltz, Eric;Thompson, 
Lea;Deutch, Howard;16;No;NicholasCage.png\n1990;87;All New Tales from the Crypt, A Trilogy;Horror;Walsh, M. Emmet;Thompson, Lea;Deutch, Howard;33;No;NicholasCage.png\n1985;116;Back to the Future;Comedy;Fox, Michael J.;Thompson, Lea;Zemeckis, Robert;9;No;NicholasCage.png\n1963;80;Winter Light;Drama;Björnstrand, Gunnar;Thulin, Ingrid;Bergman, Ingmar;2;No;Bergman.png\n1963;95;Silence, The;Drama;Malmsten, Birger;Thulin, Ingrid;Bergman, Ingmar;79;No;Bergman.png\n1959;100;Magician, The;Drama;Sydow, Max von;Thulin, Ingrid;Bergman, Ingmar;3;No;Bergman.png\n1961;154;Four Horsemen of the Apocalypse, The;Drama;Ford, Glenn;Thulin, Ingrid;Minnelli, Vincente;71;No;glennFord.png\n1986;99;Critical Condition;Comedy;Pryor, Richard;Ticotin, Rachel;Apted, Michael;41;No;NicholasCage.png\n1989;88;Center of the Web;Mystery;Curtis, Tony;Tilton, Charlene;;42;No;NicholasCage.png\n1990;110;Border Shootout;Action;Ford, Glenn;Tilton, Charlene;;7;No;glennFord.png\n1989;109;Lean on Me;Drama;Freeman, Morgan;Todd, Beverly;Avildsen, John G.;51;No;NicholasCage.png\n1986;221;On Wings of Eagles;Drama;Lancaster, Burt;Towers, Constance;McLaglen, Andrew V.;53;No;burtLancaster.png\n1941;94;Texas;Western;Holden, William;Trevor, Claire;Marshall, George;79;No;NicholasCage.png\n1939;80;Allegheny Uprising;Drama;Wayne, John;Trevor, Claire;Seiter, William A.;53;No;johnWayne.png\n1940;95;Dark Command;Western;Wayne, John;Trevor, Claire;Walsh, Raoul;52;No;johnWayne.png\n1986;103;Peggy Sue Got Married;Drama;Cage, Nicolas;Turner, Kathleen;Coppola, Francis Ford;62;No;NicholasCage.png\n1989;84;Dear America, Letters Home from Vietnam;War;De Niro, Robert;Turner, Kathleen;Couturie, Bill;57;No;NicholasCage.png\n1985;130;Prizzi's Honor;Comedy;Nicholson, Jack;Turner, Kathleen;Huston, John;25;Yes;JackNicholson.png\n1983;90;Man with Two Brains, The;Comedy;Martin, Steve;Turner, Kathleen;Reiner, Carl;68;No;NicholasCage.png\n1984;101;Crimes of Passion;Drama;Perkins, Anthony;Turner, Kathleen;Russell, 
Ken;4;No;NicholasCage.png\n1985;106;Jewel of the Nile, The;Action;Douglas, Michael;Turner, Kathleen;Teague, Lewis;68;No;NicholasCage.png\n1984;106;Romancing the Stone;Action;Douglas, Michael;Turner, Kathleen;Zemeckis, Robert ;83;No;NicholasCage.png\n1988;121;Accidental Tourist, The;Comedy;Hurt, William;Turner, Kathleen;;56;Yes;NicholasCage.png\n1955;117;Sea Chase, The;War;Wayne, John;Turner, Lana;Farrow, John;4;No;johnWayne.png\n1958;98;Another Time, Another Place;Drama;Connery, Sean;Turner, Lana;;4;No;seanConnery.png\n1988;90;Cannibal Women in the Avocado Jungle of Death;Comedy;Primus, Barry;Tweed, Shannon;Lawton, J.F.;56;No;NicholasCage.png\n1986;91;Mr Love.;Comedy;Jackson, Barry;Tyzack, Margaret;Battersby, Roy;10;No;NicholasCage.png\n1968;139;2001: A Space Odyssey;Science Fiction;Dullea, Keir;Tyzack, Margaret;Kubrick, Stanley;83;No;NicholasCage.png\n1966;81;Persona;Drama;Björnstrand, Gunnar;Ullman, Liv;Bergman, Ingmar;81;Yes;Bergman.png\n1973;;Scenes from a Marriage;Drama;Josephson, Erland;Ullman, Liv;Bergman, Ingmar;3;Yes;Bergman.png\n1968;88;Hour of the Wolf;Drama;Sydow, Max von;Ullman, Liv;Bergman, Ingmar;37;No;Bergman.png\n1969;101;Passion of Anna, The;Drama;Sydow, Max von;Ullman, Liv;Bergman, Ingmar;6;No;Bergman.png\n1984;96;Dangerous Moves;Drama;Caron, Leslie;Ullman, Liv;Dembo, Richard;7;Yes;NicholasCage.png\n1957;147;Sayonara;Drama;Brando, Marlon;Umeki, Miyoshi;Logan, Joshua;19;Yes;brando.png\n1968;158;Where Eagles Dare;War;Burton, Richard;Ure, Mary;Hulton, Brian G.;57;No;NicholasCage.png\n1985;95;Teen Wolf;Drama;Fox, Michael J.;Ursitti, Susan;Daniel, Rod;58;No;NicholasCage.png\n1990;88;Amazon;Action;Davi, Robert;Vaananen, Kari;Kaurismäki, Mika;30;No;NicholasCage.png\n1973;;Paper Chase, The;Drama;Bottoms, Timothy;Wagner, Lindsay;Bridges, James;7;Yes;NicholasCage.png\n1959;88;Virgin Spring, The;Drama;Sydow, Max von;Valberg, Brigitta;Bergman, Ingmar;8;Yes;Bergman.png\n1970;97;Spider's Stratagem;Drama;Brogi, Giulio;Valli, Alida;Bertolucci, 
Bernardo;45;No;NicholasCage.png\n1971;102;Play Misty for Me;Mystery;Eastwood, Clint;Walter, Jessica;Eastwood, Clint;47;No;clintEastwood.png\n1981;88;Going Ape;Comedy;Danza, Tony;Walter, Jessica;Kronsberg, Jeremy Joe;65;No;NicholasCage.png\n1967;127;Cool Hand Luke;Drama;Newman, Paul;Van Fleet, Jo;Rosenberg, Stuart;49;Yes;paulNewman.png\n1988;89;Phantom of the Ritz;Horror;Bergman, Peter;Van Valkenburgh, Deborah;Plone, Allen;85;No;NicholasCage.png\n1990;85;Crash & Burn;Science Fiction;Ganus, Paul;Ward, Megan;Band, Charles;75;No;NicholasCage.png\n1991;114;After Dark My Sweet;Mystery;Patric, Jason;Ward, Rachel;Foley, James;33;No;NicholasCage.png\n1992;121;Christopher Columbus: The Discovery;Adventure;Brando, Marlon;Ward, Rachel;Glen, John;39;No;NicholasCage.png\n1986;109;Young Sherlock Holmes;Mystery;Rowe, Nicholas;Ward, Sophie;Levinson, Barry;16;No;NicholasCage.png\n1991;104;Doc Hollywood;Comedy;Fox, Michael J.;Warner, Julie;Caton-Jones, Michael;64;No;NicholasCage.png\n1988;96;Baja Oklahoma;Comedy;Coyote, Peter;Warren, Lesley Ann;Roth, Bobby;71;No;NicholasCage.png\n1986;137;Aliens;Science Fiction;Biehn, Michael;Weaver, Sigourney;Cameron, James;82;No;weaver.png\n1992;115;Alien Three;Science Fiction;Dutton, Charles;Weaver, Sigourney;Fincher, David;59;No;weaver.png\n1997;109;Alien: resurrection;Science Fiction;Perlman, Ron;Weaver, Sigourney;Jeunet, Jean-Pierre;60;No;weaver.png\n1979;117;Alien;Science Fiction;Skerritt, Tom;Weaver, Sigourney;Scott, Ridley;83;No;weaver.png\n1985;97;One Woman or Two;Comedy;Depardieu, Gérard;Weaver, Sigourney;Vigne, Daniel;64;No;weaver.png\n1984;96;Soggy Bottom U. S. 
A.;Comedy;Johnson, Ben;Wedgeworth, Ann;Flicker, Theodore J.;50;No;NicholasCage.png\n1973;96;Bang the Drum Slowly;Drama;Moriarty, Michael;Wedgeworth, Ann;Hancock, John D.;73;No;NicholasCage.png\n1974;82;Catamount Killing, The;Action;Buchholz, Horst;Wedgeworth, Ann;Zanussi, Krzystoff;84;No;NicholasCage.png\n1972;92;Fuzz;Action;Reynolds, Burt;Welch, Raquel;Colla, Richard A.;37;No;NicholasCage.png\n1966;101;Shoot Loud, Louder, I Don't Understand!;Mystery;Mastroianni, Marcello;Welch, Raquel;De Filippo, Eduardo;70;No;NicholasCage.png\n1967;107;Bedazzled;Comedy;Cook, Peter;Welch, Raquel;Donen, Stanley;67;No;NicholasCage.png\n1977;120;Prince & the Pauper, The;Action;Reed, Oliver;Welch, Raquel;Fleischer, Richard;86;No;NicholasCage.png\n1969;110;One Hundred Rifles;Western;Reynolds, Burt;Welch, Raquel;Gries, Tom;48;No;NicholasCage.png\n1975;90;Wild Party, The;Drama;Dukes, David;Welch, Raquel;Ivory, James;75;No;NicholasCage.png\n1968;106;Bandolero!;Western;Stewart, James;Welch, Raquel;McLaglen, Andrew V.;9;No;NicholasCage.png\n1973;119;Last of Sheila, The;Mystery;Coburn, James;Welch, Raquel;Ross, Herbert;39;No;NicholasCage.png\n1972;87;Hannie Caulder;Drama;Borgnine, Ernest;Welch, Raquel;;9;No;NicholasCage.png\n1990;;Sounds of the Seventies...& the Beat Goes;Music;Jones, Tom;Welch, Raquel;;13;No;NicholasCage.png\n1988;161;Bird;Drama;Whitaker, Forest;Venora, Diane;Eastwood, Clint;24;No;NicholasCage.png\n1955;60;Meet Millie;Drama;Halop, Florence;Verdugo, Elena;;82;No;NicholasCage.png\n1987;88;Hell Comes to Frogtown;Science Fiction;LeFlore, Julius;Verrell, Cec;Jackson, Donald G;74;No;NicholasCage.png\n1966;126;Fortune Cookie, The;Comedy;Lemmon, Jack;West, Judi;Wilder, Billy;3;Yes;NicholasCage.png\n1990;92;Sun Shines Bright, The;Action;Winninger, Charles;Whelan, Arleen;Ford, John;46;No;johnFord.png\n1987;106;Squeeze, The;Action;Keach, Stacy;White, Carol;Apted, Michael;23;No;NicholasCage.png\n1970;91;Start the Revolution Without Me;Comedy;Wilder, Gene;Whitelaw, Billie;Yorkin, 
Bud;62;No;NicholasCage.png\n1989;107;Major League;Comedy;Sheen, Charlie;Whitton, Margaret;Ward, David S.;64;No;NicholasCage.png\n1990;108;Bright Lights, Big City;Drama;Fox, Michael J.;Wiest, Dianne;Bridges, James;30;No;NicholasCage.png\n1987;97;Lost Boys, The;Horror;Patric, Jason;Wiest, Dianne;Schumacher, Joel;67;No;NicholasCage.png\n1989;93;Cookie;Comedy;Falk, Peter;Wiest, Dianne;Seidelman, Susan;43;No;NicholasCage.png\n1974;114;Conversation, The;Drama;Hackman, Gene;Williams, Cindy;Coppola, Francis Ford;59;Yes;NicholasCage.png\n1973;112;American Graffiti;Comedy;Dreyfuss, Richard;Williams, Cindy;Lucas, George;39;Yes;NicholasCage.png\n1953;96;Dangerous When Wet;Music;Lamas, Fernando;Williams, Esther;Walters, Charles;67;No;NicholasCage.png\n1980;111;Stir Crazy;Comedy;Pryor, Richard;Williams, JoBeth;Poitier, Sidney;40;No;NicholasCage.png\n1989;91;Young Einstein;Comedy;Serious, Yahoo;Wilson, Pee-Wee;Serious, Yahoo;47;No;NicholasCage.png\n1956;83;Killing, The;Drama;Hayden, Sterling;Windsor, Marie;Kubrick, Stanley;51;No;NicholasCage.png\n1973;102;Cahill, United States Marshal;Western;Wayne, John;Windsor, Marie;McLaglen, Andrew V.;12;No;johnWayne.png\n1989;90;Savage Intruder, The;Horror;Garfield, John David;Wing, Virginia;Wolfe, Donald;24;No;NicholasCage.png\n1992;139;Sheltering Sky, The;Drama;Malkovich, John;Winger, Debra;Bertolucci, Bernardo;64;No;NicholasCage.png\n1982;125;An Officer & a Gentleman;Drama;Gere, Richard;Winger, Debra;Hackford, Taylor;1;Yes;NicholasCage.png\n1987;101;Black Widow;Mystery;Hopper, Dennis;Winger, Debra;Rafelson, Bob;54;No;NicholasCage.png\n1986;116;Legal Eagles;Comedy;Redford, Robert;Winger, Debra;Reitman, Ivan;39;No;NicholasCage.png\n1970;90;Bloody Mama;Action;Stroud, Don;Winters, Shelley;Corman, Roger;17;No;NicholasCage.png\n1965;106;A Patch of Blue;Drama;Poitier, Sidney;Winters, Shelley;Green, Guy;51;No;NicholasCage.png\n1955;109;I Died a Thousand Times;Drama;Palance, Jack;Winters, Shelley;Heisler, 
Stuart;23;No;NicholasCage.png\n1977;90;Tentacles;Horror;Huston, John;Winters, Shelley;Hellman, Oliver;62;No;NicholasCage.png\n1968;100;Scalphunters, The;Western;Lancaster, Burt;Winters, Shelley;Pollack, Sydney;33;No;burtLancaster.png\n1992;96;A Day in October;Drama;Sweeney, D. B.;Wolf, Kelly;Madsen, Kenneth;76;No;NicholasCage.png\n1964;102;A Fistful of Dollars;Westerns;Eastwood, Clint;Volonte, Gian Maria;Leone, Sergio;61;No;clintEastwood.png\n1985;94;My Science Project;Comedy;Stockwell, John;Von Zerneck, Danielle;Betnel, Jonathan;84;No;NicholasCage.png\n1991;160;Great Race, The;Comedy;Moore, Dudley;Wood, Natalie;Edwards, Blake;88;No;NicholasCage.png\n1956;119;Searchers, The;Western;Wayne, John;Wood, Natalie;Ford, John;9;No;johnWayne.png\n1979;105;Meteor;Action;Connery, Sean;Wood, Natalie;Neame, Ronald;5;No;seanConnery.png\n1955;111;Rebel Without a Cause;Drama;Dean, James;Wood, Natalie;Ray, Nicholas;82;No;NicholasCage.png\n1961;153;West Side Story;Music;Beymer, Richard;Wood, Natalie;Wise, Robert;38;Yes;NicholasCage.png\n1970;110;Trash;Comedy;Dallesandro, Joe;Woodlawn, Holly;Morrissey, Paul;68;No;NicholasCage.png\n1966;95;A Big Hand for the Little Lady;Comedy;Fonda, Henry;Woodward, Joanne;Cook, Fielder;12;No;NicholasCage.png\n1966;104;A Fine Madness;Comedy;Connery, Sean;Woodward, Joanne;Kershner, Irvin;6;No;seanConnery.png\n1987;134;Glass Menagerie, The;Drama;Malkovich, John;Woodward, Joanne;Newman, Paul;68;No;NicholasCage.png\n1989;117;Harry & Son;Drama;Newman, Paul;Woodward, Joanne;Newman, Paul;57;No;paulNewman.png\n1968;102;Rachel, Rachel;Drama;Olson, James;Woodward, Joanne;Newman, Paul;32;No;NicholasCage.png\n1961;98;Paris Blues;Drama;Newman, Paul;Woodward, Joanne;Ritt, Martin;54;No;paulNewman.png\n1960;135;Fugitive Kind, The;Drama;Brando, Marlon;Woodward, Joanne;;3;No;brando.png\n1993;;Mr. & Mrs. 
Bridge;Drama;Newman, Paul;Woodward, Joanne;;29;No;paulNewman.png\n1991;144;State of Grace;Drama;Penn, Sean;Wright, Robin;Joanou, Phil;49;No;NicholasCage.png\n1943;108;Shadow of a Doubt;Drama;Cotten, Joseph;Wright, Teresa;Hitchcock, Alfred;32;No;alfredHitchcock.png\n1950;85;Men, The;Drama;Brando, Marlon;Wright, Teresa;Zinnemann, Fred;27;No;brando.png\n1950;110;Stage Fright;Mystery;Wilding, Michael;Wyman, Jane;Hitchcock, Alfred;72;No;alfredHitchcock.png\n1947;103;Magic Town;Drama;Stewart, James;Wyman, Jane;Wellman, William;4;No;NicholasCage.png\n1975;93;That Lucky Touch;Action;Moore, Roger;York, Susannah;Miles, Christopher;85;No;NicholasCage.png\n1949;90;Lust for Gold;Drama;Ford, Glenn;Young, Gig;Simon, S. Sylvan;57;No;glennFord.png\n1987;103;Heat;Mystery;Reynolds, Burt;Young, Karen;Jameson, Jerry;69;No;NicholasCage.png\n1993;75;Employee's Entrance;Drama;William, Warren;Young, Loretta;;0;No;NicholasCage.png\n1947;87;Night Is My Future;Drama;Malmsten, Birger;Zetterling, Mai;Bergman, Ingmar;17;No;Bergman.png\n1990;92;Witches, The;Science Fiction;Fisher, Jasen;Zetterling, Mai;Roeg, Nicolas;18;No;NicholasCage.png\n1953;94;Vera Cruz;Action;Cooper, Gary;;Aldrich, Robert;71;No;NicholasCage.png\n1954;91;Apache;Western;Lancaster, Burt;;Aldrich, Robert;78;No;burtLancaster.png\n1977;146;Twilight's Last Gleaming;Drama;Lancaster, Burt;;Aldrich, Robert;84;No;burtLancaster.png\n1979;119;Frisco Kid, The;Comedy;Wilder, Gene;;Aldrich, Robert;10;No;NicholasCage.png\n1954;30;Bank on the Stars;Drama;Paar, Jack;;Allen, Craig;;No;NicholasCage.png\n1987;100;Law of Desire;Drama;Maura, Carmen;;Almodóvar, Pedro;73;No;NicholasCage.png\n1966;103;Quiller Memorandum, The;Mystery;Segal, George;;Anderson, Michael;34;No;NicholasCage.png\n1962;183;Longest Day, The;War;Wayne, John;;Annakin, Ken;7;No;johnWayne.png\n1986;128;Name of the Rose, The;Drama;Connery, Sean;;Annaud, Jean-Jacques;8;No;seanConnery.png\n1988;92;Bloodsport;Action;Van Damme, Jean-Claude;;Arnold, 
Newt;78;No;NicholasCage.png\n1986;85;Torment;Horror;Gilbert, Taylor;;Aslanian, Samson;8;No;NicholasCage.png\n1988;138;Pelle the Conqueror;Drama;Sydow, Max von;;August, Bille;14;Yes;NicholasCage.png\n1981;118;Taps;Drama;Hutton, Timothy;;Becker, Harold;84;No;NicholasCage.png\n1991;102;Freshman, The;Comedy;Brando, Marlon;;Bergman, Andrew;32;No;brando.png\n1987;164;Last Emperor, The;Drama;Lone, John;;Bertolucci, Bernardo;1;Yes;NicholasCage.png\n1962;100;Grim Reaper, The;Drama;Rulu, Francesco;;Bertolucci, Bernardo;35;No;NicholasCage.png\n1983;90;Le Dernier Combat;Drama;Jolivet, Pierre;;Besson, Luc;72;No;NicholasCage.png\n1989;91;Too Beautiful for You;Drama;Depardieu, Gérard;;Blier, Bertrand;35;No;NicholasCage.png\n1991;105;Fire, Ice & Dynamite;Action;Moore, Roger;;Bogner, Willy;72;No;NicholasCage.png\n1963;113;Heavens Above;Comedy;Sellers, Peter;;Boulting, John;38;No;NicholasCage.png\n1961;141;One Eyed Jacks;Western;Malden, Karl;;Brando, Marlon;26;No;brando.png\n1937;61;Swing It, Sailor!;Comedy;Ford, Wallace;;Cannon, Raymond;83;No;NicholasCage.png\n1987;94;Wolf at the Door, The;Drama;Sutherland, Donald;;Carlsen, Henning;68;No;NicholasCage.png\n1936;87;Modern Times;Comedy;Chaplin, Charles;;Chaplin, Charles;4;No;NicholasCage.png\n1991;114;Thunderbolt & Lightfoot;Action;Eastwood, Clint;;Cimino, Michael;16;No;clintEastwood.png\n1931;87;A Nous la Liberte;Drama;Marchand, Henri;;Clair, Rene;60;No;NicholasCage.png\n1979;95;Scum;Action;Winstone, Ray;;Clarke, Alan;68;No;NicholasCage.png\n1984;90;Inside Man, The;Action;Hopper, Dennis;;Clegg, Tom;45;No;NicholasCage.png\n1979;153;Apocalypse Now;Drama;Brando, Marlon;;Coppola, Francis Ford;8;No;brando.png\n1990;94;Bellboy & the Playgirls, The;Drama;Wilkinson, June;;Coppola, Francis Ford;7;No;NicholasCage.png\n1963;81;Terror, The;Horror;Karloff, Boris;;Corman, Roger;88;No;NicholasCage.png\n1963;86;Raven, The;Horror;Price, Vincent;;Corman, Roger;85;No;NicholasCage.png\n1975;87;They Came from Within;Horror;Hampton, Paul;;Cronenberg, 
David;21;No;NicholasCage.png\n1986;97;Boy in Blue, The;Drama;Cage, Nicolas;;Dale, Cynthia;63;No;NicholasCage.png\n1991;87;Killer Tomatoes Strike Back;Comedy;Astin, John;;De Bello, John;24;No;NicholasCage.png\n1979;87;Attack of the Killer Tomatoes;Comedy;Wilson, George;;De Bello, John;47;No;NicholasCage.png\n1987;119;Untouchables, The;Drama;Connery, Sean;;De Palma, Brian;7;Yes;seanConnery.png\n1986;91;Wise Guys;Comedy;Piscopo, Joe;;De Palma, Brian;16;No;NicholasCage.png\n1989;90;American Autobahn;Drama;Jalenak, Jan;;Degas, Andre;75;No;NicholasCage.png\n1990;94;Final Alliance, The;Action;Hasselhoff, David;;Di Leo, Mario;10;No;NicholasCage.png\n1984;130;Bounty, The;Drama;Gibson, Mel;;Donaldson, Roger;25;No;NicholasCage.png\n1974;89;Little Prince, The;Music;Kiley, Richard;;Donen, Stanley;31;No;NicholasCage.png\n1975;94;Posse;Western;Douglas, Kirk;;Douglas, Kirk;76;No;NicholasCage.png\n1982;136;Firefox;Action;Eastwood, Clint;;Eastwood, Clint;64;No;clintEastwood.png\n1987;91;Penitentiary III;Action;Kennedy, Leon Isaac;;Fanaka, Jamaa;82;No;NicholasCage.png\n1993;;Ginger & Fred;Comedy;Mastroianni, Marcello;;Fellini, Federico;29;No;NicholasCage.png\n1966;107;Wrong Box, The;Comedy;Mills, John;;Forbes, Bryan;40;No;NicholasCage.png\n1990;86;Wagonmaster;Western;Johnson, Ben;;Ford, John;1;No;johnFord.png\n1945;135;They Were Expendable;War;Montgomery, Robert;;Ford, John;88;No;johnFord.png\n1991;125;Last Hurrah, The;Drama;Tracy, Spencer;;Ford, John;46;No;spencerTracy.png\n1949;59;Law of the Golden West;Western;Hale, Monte;;Ford, Philip;1;No;NicholasCage.png\n1949;60;Pioneer Marshal;Western;Hale, Monte;;Ford, Philip;8;No;NicholasCage.png\n1949;60;Ranger of the Cherokee Strip;Western;Hale, Monte;;Ford, Philip;31;No;NicholasCage.png\n1950;60;Vanishing Westerner;Western;Hale, Monte;;Ford, Philip;6;No;NicholasCage.png\n1948;59;Bandits of Dark Canyon;Western;Lane, Allan;;Ford, Philip;72;No;NicholasCage.png\n1948;60;Bold Frontiersman, The;Western;Lane, Allan;;Ford, 
Philip;18;No;NicholasCage.png\n1948;59;Wild Frontier, The;Western;Lane, Allan;;Ford, Philip;61;No;NicholasCage.png\n1968;73;Firemen's Ball, The;Comedy;Vostrcil, Jan;;Forman, Milos;8;No;NicholasCage.png\n1983;112;Local Hero;Comedy;Riegert, Peter;;Forsyth, Bill;54;No;NicholasCage.png\n1971;104;French Connection, The;Drama;Hackman, Gene;;Friedkin, William;88;Yes;NicholasCage.png\n1985;114;To Live & Die in L. A.;Action;Stockwell, Dean;;Friedkin, William;70;No;NicholasCage.png\n1961;113;Ferry to Hong Kong;Drama;Welles, Orson;;Gilbert, Lewis;77;No;NicholasCage.png\n1983;69;Eddie Murphy, Delirious;Comedy;Murphy, Eddie;;Gower, Bruce;6;No;NicholasCage.png\n1984;77;Secret Policeman's Private Parts, The;Comedy;Cleese, John;;Graef, Roger;36;No;NicholasCage.png\n1958;83;Up the Creek;Comedy;Sellers, Peter;;Guest, Val;54;No;NicholasCage.png\n1982;111;Yol;Drama;Akan, Tarik;;Guney, Yilmaz;53;No;NicholasCage.png\n1989;150;Sara Dane;Drama;Hopkins, Harold;;Hardy, Rod;75;No;NicholasCage.png\n1988;84;Night Tide;Drama;Muir, Gavin;;Harrington, Curtis;50;No;NicholasCage.png\n1953;92;His Majesty O'Keefe;Action;Lancaster, Burt;;Haskin, Byron;3;No;burtLancaster.png\n1960;122;North to Alaska;Western;Wayne, John;;Hathaway, Henry;31;No;johnWayne.png\n1966;76;Flight to Fury;Action;Nicholson, Jack;;Hellman, Monte;70;No;NicholasCage.png\n1966;82;Ride in the Whirlwind;Western;Nicholson, Jack;;Hellman, Monte;26;No;NicholasCage.png\n1970;93;Powderkeg;Western;Taylor, Rod;;Heyes, Douglas;26;No;NicholasCage.png\n1953;95;I Confess;Drama;Clift, Montgomery;;Hitchcock, Alfred;63;No;alfredHitchcock.png\n1935;88;Thirty-Nine Steps, The;Science Fiction;Donat, Robert;;Hitchcock, Alfred;8;No;alfredHitchcock.png\n1969;126;Topaz;Mystery;Forsythe, John;;Hitchcock, Alfred;12;No;alfredHitchcock.png\n1930;95;Murder;Mystery;Marshall, Herbert;;Hitchcock, Alfred;50;No;alfredHitchcock.png\n1954;123;Dial M for Murder;Mystery;Milland, Ray;;Hitchcock, Alfred;52;No;alfredHitchcock.png\n1937;80;Young & Innocent;Mystery;Pilbeam, 
Nova;;Hitchcock, Alfred;43;No;alfredHitchcock.png\n1976;95;Creature from Black Lake;Horror;Elam, Jack;;Houck, Joy, Jr.;88;No;NicholasCage.png\n1981;124;Chariots of Fire;Drama;Cross, Ben;;Hudson, Hugh;6;Yes;NicholasCage.png\n1982;81;Monty Python Live at the Hollywood Bowl;Comedy;Chapman, Graham;;Hughes, Terry;81;No;NicholasCage.png\n1975;129;Man Who Would Be King, The;Drama;Connery, Sean;;Huston, John;6;No;seanConnery.png\n1981;117;Victory;Drama;Stallone, Sylvester;;Huston, John;39;No;NicholasCage.png\n1970;146;Kelly's Heroes;War;Eastwood, Clint;;Hutton, Brian G.;84;No;clintEastwood.png\n1989;109;Next of Kin;Mystery;Swayze, Patrick;;Irvin, John;63;No;NicholasCage.png\n1990;96;Chattahoochee;Drama;Oldman, Gary;;Jackson, Mick;30;No;NicholasCage.png\n1985;82;Angelic Conversation, The;Comedy;Reynolds, Paul;;Jarman, Derek;41;No;NicholasCage.png\n1986;107;Down by Law;Comedy;Waits, Tom;;Jarmusch, Jim;49;No;NicholasCage.png\n1984;141;Killing Fields, The;Drama;Waterston, Sam;;Joffe, Roland;6;Yes;NicholasCage.png\n1992;85;Survival Zone;Action;Ford, Terence;;Jones, Chris;25;No;NicholasCage.png\n1979;94;Monty Python's Life of Brian;Comedy;Chapman, Graham;;Jones, Terry;11;No;NicholasCage.png\n1983;107;Monty Python's the Meaning of Life;Comedy;Cleese, John;;Jones, Terry;33;No;NicholasCage.png\n1971;121;Red Tent, The;Action;Finch, Peter;;Kalatozov, Mikhail;7;No;NicholasCage.png\n1945;82;Dakota;Western;Wayne, John;;Kane, Joseph;27;No;johnWayne.png\n1952;112;Viva Zapata!;Drama;Brando, Marlon;;Kazan, Elia;86;Yes;brando.png\n1968;133;Green Berets, The;War;Wayne, John;;Kellogg, Ray;36;No;johnWayne.png\n1990;90;Big Bad John;Action;English, Doug;;Kennedy, Burt;84;No;NicholasCage.png\n1937;71;Ticket of Leave Man, The;Mystery;Slaughter, Tod;;King, George;45;No;NicholasCage.png\n1956;106;D-Day, The Sixth of June;War;Taylor, Robert;;Koster, Henry;84;No;NicholasCage.png\n1974;121;Apprenticeship of Duddy Kravitz, The;Drama;Dreyfuss, Richard;;Kotcheff, Ted;64;Yes;NicholasCage.png\n1971;138;A 
Clockwork Orange;Science Fiction;McDowell, Malcolm;;Kubrick, Stanley;83;Yes;NicholasCage.png\n1991;117;Full Metal Jacket;War;Modine, Matthew;;Kubrick, Stanley;45;No;NicholasCage.png\n1943;82;Sanshiro Sugata;Drama;Fujita, Susumu;;Kurosawa, Akira;85;No;NicholasCage.png\n1991;97;Rhapsody in August;Drama;Gere, Richard;;Kurosawa, Akira;50;No;NicholasCage.png\n1946;110;No Regrets for Our Youth;Drama;Hara, Setsuko;;Kurosawa, Akira;31;No;NicholasCage.png\n1960;152;Bad Sleep Well, The;Drama;Mifune, Toshiro;;Kurosawa, Akira;65;No;NicholasCage.png\n1951;166;Idiot, The;Drama;Mifune, Toshiro;;Kurosawa, Akira;40;No;NicholasCage.png\n1951;83;Rashomon;Drama;Mifune, Toshiro;;Kurosawa, Akira;59;Yes;NicholasCage.png\n1962;96;Sanjuro;Mystery;Mifune, Toshiro;;Kurosawa, Akira;6;No;NicholasCage.png\n1955;200;Seven Samurai;Drama;Mifune, Toshiro;;Kurosawa, Akira;9;No;NicholasCage.png\n1957;110;Throne of Blood;Drama;Mifune, Toshiro;;Kurosawa, Akira;60;No;NicholasCage.png\n1961;110;Yojimbo;Action;Mifune, Toshiro;;Kurosawa, Akira;60;No;NicholasCage.png\n1980;161;Kagemusha;Drama;Nakadai, Tatsuya;;Kurosawa, Akira;74;Yes;NicholasCage.png\n1952;134;Ikiru;Drama;Shimura, Takashi;;Kurosawa, Akira;36;No;NicholasCage.png\n1987;90;Empire of Spiritual Ninja;Action;Berlin, Tom;;Lambert, Bruce;26;No;NicholasCage.png\n1986;90;Ninja, the Violent Sorcerer;Action;;;Lambert, Bruce;;No;NicholasCage.png\n1926;139;Metropolis;Science Fiction;Abel, Alfred;;Lang, Fritz;49;No;NicholasCage.png\n1946;106;Cloak & Dagger;Mystery;Cooper, Gary;;Lang, Fritz;55;No;NicholasCage.png\n1920;137;Spiders;Drama;De Vogy, Carl;;Lang, Fritz;29;No;NicholasCage.png\n1954;90;Human Desire;Drama;Ford, Glenn;;Lang, Fritz;27;No;glennFord.png\n1928;130;Spies;Drama;Klein-Rogge, Rudolf;;Lang, Fritz;49;No;NicholasCage.png\n1933;120;Testament of Dr. 
Mabuse, The;Drama;Klein-Rogge, Rudolf;;Lang, Fritz;4;No;NicholasCage.png\n1991;95;Fury;Drama;Tracy, Spencer;;Lang, Fritz;48;No;spencerTracy.png\n1990;129;Mo' Better Blues;Drama;Washington, Denzel;;Lee, Spike;78;No;NicholasCage.png\n1989;30;Matt Talbot;Drama;Ford, Seamus;;Lennon, Biddy W.;35;No;NicholasCage.png\n1989;55;Will Rogers, Look Back in Laughter;Comedy;Williams, Robin;;Leo, Malcolm;6;No;NicholasCage.png\n1991;130;For a Few Dollars More;Westerns;Eastwood, Clint;;Leone, Sergio;34;No;clintEastwood.png\n1944;139;Thirty Seconds over Tokyo;War;Tracy, Spencer;;LeRoy, Mervyn;45;No;spencerTracy.png\n1982;93;Class of 1984;Drama;King, Perry;;Lester, Mark L.;23;No;NicholasCage.png\n1974;109;Juggernaut;Action;Harris, Richard;;Lester, Richard;63;No;NicholasCage.png\n1987;120;Good Morning, Vietnam;Comedy;Williams, Robin;;Levinson, Barry;37;No;NicholasCage.png\n1945;94;Blood on the Sun;Drama;Cagney, James;;Lloyd, Frank;76;No;NicholasCage.png\n1969;161;Paint Your Wagon;Music;Marvin, Lee;;Logan, Joshua;46;No;NicholasCage.png\n1964;105;Ensign Pulver;Comedy;Walker, Robert, Jr.;;Logan, Joshua;16;No;NicholasCage.png\n1976;92;Street People;Action;Moore, Roger;;Lucidi, Maurizio;25;No;NicholasCage.png\n1984;83;Manhunt, The;Action;Borgnine, Ernest;;Ludman, Larry;34;No;NicholasCage.png\n1987;85;Operation Nam;War;Wayne, John Ethan;;Ludman, Larry;37;No;NicholasCage.png\n1944;100;Fighting Seabees, The;War;Wayne, John;;Ludwig, Edward;35;No;johnWayne.png\n1988;75;Let It Rock;Drama;Hopper, Dennis;;Lynch, David;32;No;lynch.png\n1978;90;Eraserhead;Horror;Nance, John;;Lynch, David;2;No;lynch.png\n1955;87;Ladykillers, The;Comedy;Guinness, Alec;;Mackendrick, Alexander;28;No;NicholasCage.png\n1957;97;Sweet Smell of Success;Drama;Lancaster, Burt;;Mackendrick, Alexander;12;No;burtLancaster.png\n1971;88;And Now for Something Completely Different;Comedy;Cleese, John;;MacNaughton, Ian;44;No;NicholasCage.png\n1984;92;Crackers;Action;Sutherland, Donald;;Malle, 
Louis;17;No;NicholasCage.png\n1991;89;Green Glove;Drama;Ford, Glenn;;Mate, Rudolph;54;No;glennFord.png\n1970;89;Menace on the Mountain;Action;Crowley, Pat;;McEveety, Vincent;69;No;NicholasCage.png\n1940;90;In Old California;Western;Wayne, John;;McGann, William;27;No;johnWayne.png\n1967;85;Thirty Is a Dangerous Age, Cynthia;Comedy;Moore, Dudley;;McGrath, Joseph;28;No;NicholasCage.png\n1980;99;Ffolkes;Action;Moore, Roger;;McLaglen, Andrew V.;62;No;NicholasCage.png\n1970;111;Chisum;Western;Wayne, John;;McLaglen, Andrew V.;72;No;johnWayne.png\n1990;135;Hunt for Red October, The;Drama;Connery, Sean;;McTiernan, John;8;No;seanConnery.png\n1966;123;Closely Watched Trains;Drama;Neckar, Vaclav;;Menzel, Jiri;75;Yes;NicholasCage.png\n1973;91;Executive Action;Drama;Lancaster, Burt;;Miller, David;6;No;burtLancaster.png\n1942;101;Flying Tigers;Action;Wayne, John;;Miller, David;61;No;johnWayne.png\n1991;87;Father's Little Dividend;Comedy;Tracy, Spencer;;Minnelli, Vincente;52;No;spencerTracy.png\n1982;92;An Evening with Robin Williams;Comedy;Williams, Robin;;Mischer, Don;68;No;NicholasCage.png\n1987;90;Eddie Murphy Raw;Comedy;Murphy, Eddie;;Murphy, Eddie;51;No;NicholasCage.png\n1989;118;Harlem Nights;Comedy;Murphy, Eddie;;Murphy, Eddie;11;No;NicholasCage.png\n1973;93;Santee;Western;Ford, Glenn;;Nelson, Gary;47;No;glennFord.png\n1987;90;Good Father, The;Drama;Hopkins, Anthony;;Newell, Mike;42;No;AnthonyHopkins.png\n1971;115;Sometimes a Great Notion;Drama;Newman, Paul;;Newman, Paul;7;No;paulNewman.png\n1970;117;Catch Twenty-Two;Comedy;Arkin, Alan;;Nichols, Mike;50;No;NicholasCage.png\n1988;90;Dark Age;Action;Jarratt, John;;Nicholson, Arch;3;No;NicholasCage.png\n1981;94;Deadline;Mystery;Newman, Barry;;Nicholson, Arch;9;No;paulNewman.png\n1935;60;Mysterious Mr. 
Wong;Mystery;Lugosi, Bela;;Nigh, William;71;No;NicholasCage.png\n1988;92;A Month in the Country;Drama;Firth, Colin;;O'Connor, Pat;57;No;NicholasCage.png\n1990;97;Prom Night III, The Last Kiss;Horror;Conlon, Tim;;Oliver, Ron;29;No;NicholasCage.png\n1990;;Blood in, Blood Out;Drama;Penn, Sean;;Olmos, Edward James;88;No;NicholasCage.png\n1989;94;Wrong Arm of the Law, The;Comedy;Sellers, Peter;;Owen, Cliff;25;No;NicholasCage.png\n1987;116;Orphans;Drama;Finney, Albert;;Pakula, Alan J.;21;No;NicholasCage.png\n1976;139;All the President's Men;Drama;Redford, Robert;;Pakula, Alan J.;45;Yes;NicholasCage.png\n1987;73;J-Men Forever;Action;Bergman, Peter;;Patterson, Richard;59;No;NicholasCage.png\n1969;144;Wild Bunch, The;Western;Holden, William;;Peckinpah, Sam;50;No;NicholasCage.png\n1988;92;Judgement in Berlin;Drama;Sheen, Martin;;Penn, Leo;13;No;NicholasCage.png\n1993;;Hot Line, The;Comedy;Boyer, Charles;;Perier, Etienne;70;No;NicholasCage.png\n1988;100;Rocket Gibraltar;Drama;Lancaster, Burt;;Petrie, Daniel;26;No;burtLancaster.png\n1975;112;Yakuza, The;Action;Mitchum, Robert;;Pollack, Sydney;16;No;NicholasCage.png\n1972;116;Jeremiah Johnson;Drama;Redford, Robert;;Pollack, Sydney;88;No;NicholasCage.png\n1970;112;Burn!;Drama;Brando, Marlon;;Pontecorvo, Gillo;75;No;brando.png\n1973;122;Magnum Force;Action;Eastwood, Clint;;Post, Ted;28;No;clintEastwood.png\n1989;86;Cyborg;Action;Van Damme, Jean-Claude;;Pyun, Albert;31;No;NicholasCage.png\n1979;108;Prisoner of Zenda, The;Comedy;Sellers, Peter;;Quine, Richard;12;No;NicholasCage.png\n1983;86;Scream;Horror;Martin, Pepper;;Quisenberry, Byron;24;No;NicholasCage.png\n1986;140;Assault, The;Drama;Lint, Derek De;;Rademakers, Fons;71;Yes;NicholasCage.png\n1951;102;Flying Leathernecks;Action;Wayne, John;;Ray, Nicholas;23;No;johnWayne.png\n1985;92;What Comes Around;Drama;Reed, Jerry;;Reed, Jerry;49;No;NicholasCage.png\n1980;123;Mon Oncle D'Amerique;Comedy;Roger-Pierre;;Resnais, Alain;71;No;NicholasCage.png\n1972;92;Culpepper Cattle Company, 
The;Western;Grimes, Gary;;Richards, Dick;29;No;NicholasCage.png\n1983;102;Survivors, The;Comedy;Matthau, Walter;;Ritchie, Michael;52;No;NicholasCage.png\n1984;96;Roadhouse Sixty-Six;Action;Dafoe, Willem;;Robinson, John Mark;20;No;NicholasCage.png\n1991;60;Burning Poles, Cecil Taylor in Performance;Music;Taylor, Cecil;;Rochlin, Sheldon;82;No;NicholasCage.png\n1987;98;Russkies;Action;Hubley, Whip;;Rosenthal, Rick;87;No;NicholasCage.png\n1990;96;My Blue Heaven;Comedy;Martin, Steve;;Ross, Herbert;63;No;NicholasCage.png\n1990;103;Altered States;Science Fiction;Hurt, William;;Russell, Ken;22;No;NicholasCage.png\n1972;128;Cowboys, The;Western;Wayne, John;;Rydell, Mark;58;No;johnWayne.png\n1985;95;Code Name, Emerald;Drama;Harris, Ed;;Sanger, Jonathan;22;No;NicholasCage.png\n1970;170;Patton;War;Scott, George C.;;Schaffner, Franklin J.;8;Yes;NicholasCage.png\n1969;123;Midnight Cowboy;Drama;Hoffman, Dustin;;Schlesinger, John;33;Yes;NicholasCage.png\n1985;131;Falcon & the Snowman, The;Drama;Hutton, Timothy;;Schlesinger, John;61;No;NicholasCage.png\n1976;112;Maitresse;Drama;Ogier, Bulle;;Schroeder, Barbet;39;No;NicholasCage.png\n1987;86;Disorderlies;Comedy;Boys, The Fat;;Schultz, Michael;69;No;NicholasCage.png\n1991;;Raging Bull;Drama;De Niro, Robert;;Scorsese, Martin;25;No;NicholasCage.png\n1991;60;Garrison Keillor's Home;Comedy;Keillor, Garrison;;Sevush, Herb;6;No;NicholasCage.png\n1938;55;Overland Stage Raiders;Western;Wayne, John;;Sherman, George;83;No;johnWayne.png\n1938;55;Pals of the Saddle;Western;Wayne, John;;Sherman, George;33;No;johnWayne.png\n1982;92;Alone in the Dark;Horror;Schultz, Dwight;;Sholder, Jack;75;No;NicholasCage.png\n1971;109;Beguiled, The;Drama;Eastwood, Clint;;Siegel, Don;60;No;clintEastwood.png\n1979;112;Escape from Alcatraz;Drama;Eastwood, Clint;;Siegel, Don;22;No;clintEastwood.png\n1948;88;Criss Cross;Drama;Lancaster, Burt;;Siodmak, Robert;77;No;burtLancaster.png\n1976;132;Midway;War;Heston, Charlton;;Smight, 
Jack;36;No;NicholasCage.png\n1990;126;Indiana Jones & the Last Crusade;Action;Ford, Harrison;;Spielberg, Steven;8;No;NicholasCage.png\n1993;90;Duel;Mystery;Weaver, Dennis;;Spielberg, Steven;48;No;NicholasCage.png\n1991;193;Separate but Equal;Drama;Poitier, Sidney;;Stevens, George, Jr.;56;No;NicholasCage.png\n1924;123;Gosta Berling's Saga;Drama;Hanson, Lars;;Stiller, Mauritz;63;No;NicholasCage.png\n1986;120;Platoon;Drama;Sheen, Charlie;;Stone, Oliver;8;Yes;NicholasCage.png\n1963;89;Crawling Hand, The;Science Fiction;Breck, Peter;;Strock, Herbert L.;79;No;NicholasCage.png\n1971;100;Willy Wonka & the Chocolate Factory;Music;Wilder, Gene;;Stuart, Mel;65;No;NicholasCage.png\n1971;88;Joe Kidd;Western;Eastwood, Clint;;Sturges, John;79;No;clintEastwood.png\n1985;104;Santa Claus, The Movie;Comedy;Moore, Dudley;;Szwarc, Jeannot;19;No;NicholasCage.png\n1938;96;Boys Town;Drama;Tracy, Spencer;;Taurog, Norman;21;Yes;spencerTracy.png\n1990;59;Erasure, Live Wild!;Music;;;Taylor, Gavin;48;No;NicholasCage.png\n1982;150;A Question of Honor;Drama;Gazzara, Ben;;Taylor, Jud;80;No;NicholasCage.png\n1947;61;Check Your Guns;Western;Dean, Eddie;;Taylor, Ray;80;No;NicholasCage.png\n1947;56;West to Glory;Western;Dean, Eddie;;Taylor, Ray;43;No;NicholasCage.png\n1937;60;Throwback, The;Western;Jones, Buck;;Taylor, Ray;53;No;NicholasCage.png\n1992;54;Border Feud;Action;LaRue, Lash;;Taylor, Ray;43;No;NicholasCage.png\n1947;58;Fighting Vigilantes, The;Western;LaRue, Lash;;Taylor, Ray;21;No;NicholasCage.png\n1947;53;Law of the Lash;Western;LaRue, Lash;;Taylor, Ray;66;No;NicholasCage.png\n1949;66;Outlaw Country;Western;LaRue, Lash;;Taylor, Ray;62;No;NicholasCage.png\n1992;53;Return of the Lash;Action;LaRue, Lash;;Taylor, Ray;78;No;NicholasCage.png\n1937;60;Mystery of the Hooded Horsemen;Western;Ritter, Tex;;Taylor, Ray;52;No;NicholasCage.png\n1937;60;Tex Rides with the Boy Scouts;Western;Ritter, Tex;;Taylor, Ray;17;No;NicholasCage.png\n1949;59;Shadows of the West;Western;Wilson, Whip;;Taylor, 
Ray;40;No;NicholasCage.png\n1991;102;Instant Karma;Comedy;Cassidy, David;;Taylor, Roderick;47;No;NicholasCage.png\n1957;73;Time Lock;Drama;Connery, Sean;;Thomas, Gerald;5;No;seanConnery.png\n1953;79;Appointment in Honduras;Drama;Ford, Glenn;;Tourneur, Jacques;7;No;glennFord.png\n1982;136;Danton;Drama;Depardieu, Gérard;;Wajda, Andrzej;5;No;NicholasCage.png\n1960;164;Alamo, The;Action;Wayne, John;;Wayne, John;29;No;johnWayne.png\n1986;91;La Chevre, (The Goat);Drama;Depardieu, Gérard;;Veber, Francis;24;No;NicholasCage.png\n1985;109;Les Comperes;Comedy;Richard, Pierre;;Veber, Francis;54;No;NicholasCage.png\n1990;128;Dead Poets Society;Drama;Williams, Robin;;Weir, Peter;8;Yes;NicholasCage.png\n1952;93;Othello, The Lost Masterpiece;Drama;Welles, Orson;;Welles, Orson;23;No;NicholasCage.png\n1949;119;Battleground, The;War;Johnson, Van;;Wellman, William;7;No;NicholasCage.png\n1976;176;Kings of the Road (In the Course of Time);Drama;Vogler, Rudiger;;Wenders, Wim;41;No;NicholasCage.png\n1990;98;Hiroshima;Drama;Nelson, Judd;;Werner, Peter;17;No;NicholasCage.png\n1982;111;Return of Martin Guerre, The;Drama;Depardieu, Gérard;;Vigne, Daniel;51;No;NicholasCage.png\n1956;97;Somebody up There Likes Me;Drama;Newman, Paul;;Wise, Robert;56;No;paulNewman.png\n1955;57;Jack Benny Show;Comedy;Benny, Jack;;;51;No;NicholasCage.png\n1962;182;Mutiny on the Bounty;Action;Brando, Marlon;;;35;No;brando.png\n1989;;Death Valley Days, Deadly Decision;Western;Caan, James;;;9;No;NicholasCage.png\n1986;60;Monty Python's Flying Circus;Comedy;Chapman, Graham;;;4;No;NicholasCage.png\n1986;60;Monty Python's Flying Circus, Vol 1.;Comedy;Chapman, Graham;;;24;No;NicholasCage.png\n1986;59;Monty Python's Flying Circus, Vol 2.;Comedy;Chapman, Graham;;;79;No;NicholasCage.png\n1986;58;Monty Python's Flying Circus, Vol 3.;Comedy;Chapman, Graham;;;63;No;NicholasCage.png\n1990;;Valkenvania;Comedy;Chase, Chevy;;;82;No;NicholasCage.png\n1982;101;Secret Policeman's Other Ball, The;Comedy;Cleese, 
John;;;86;No;NicholasCage.png\n1981;127;Taming of the Shrew, The;Drama;Cleese, John;;;2;No;NicholasCage.png\n1964;;From Russia with Love;Action;Connery, Sean;;;6;No;seanConnery.png\n1993;108;Offence, The;Mystery;Connery, Sean;;;6;No;seanConnery.png\n1992;60;Hollywood Mavericks;Comedy;Coppola, Francis Ford;;;22;No;NicholasCage.png\n1990;60;Live at Harrah's;Comedy;Cosby, Bill;;;6;No;NicholasCage.png\n1992;52;Persuaders, The Overture, The;Mystery;Curtis, Tony;;;40;No;NicholasCage.png\n1977;255;Nineteen Hundred;Drama;De Niro, Robert;;;82;No;NicholasCage.png\n1989;90;Van, The;Comedy;DeVito, Danny;;;5;No;NicholasCage.png\n1972;15;My Country Right or Wrong;War;Douglas, Michael;;;21;No;NicholasCage.png\n1991;;Clint Eastwood Collection, The;Westerns;Eastwood, Clint;;;11;No;clintEastwood.png\n1991;;Complete Dirty Harry, Magnum Force, The;Action;Eastwood, Clint;;;53;No;clintEastwood.png\n1992;92;Dead Pool, The;Action;Eastwood, Clint;;;26;No;clintEastwood.png\n1992;163;Good, the Bad & the Ugly, The;Westerns;Eastwood, Clint;;;68;No;clintEastwood.png\n1959;60;Rawhide, Premiere Episode;Western;Eastwood, Clint;;;54;No;clintEastwood.png\n1992;118;Tightrope;Mystery;Eastwood, Clint;;;55;No;clintEastwood.png\n1987;95;Hearts of Fire;Drama;Everett, Rupert;;;25;No;NicholasCage.png\n1992;165;How the West Was Won;Western;Fonda, Henry;;;45;No;NicholasCage.png\n1992;;Mummy's Hand, The;Mystery;Foran, Dick;;;54;No;NicholasCage.png\n1993;88;Great White Death;Action;Ford, Glenn;;;26;No;glennFord.png\n1986;119;Mosquito Coast, The;Drama;Ford, Harrison;;;54;No;NicholasCage.png\n1993;102;Today We Kill....Tomorrow We Die;Western;Ford, Montgomery;;;25;No;NicholasCage.png\n1991;;Tormenta Sobre Arizona;Drama;Ford, Wallace;;;81;No;NicholasCage.png\n1989;116;Back to the Future II;Comedy;Fox, Michael J.;;;65;No;NicholasCage.png\n1959;60;Maverick, Duel at Sundown;Western;Garner, James;;;26;No;NicholasCage.png\n1983;;Shakespeare Series;Drama;Gielgud, John;;;23;No;NicholasCage.png\n1973;105;Deadly 
Trackers;Western;Harris, Richard;;;54;No;NicholasCage.png\n1992;72;American Film Institute, Alfred Hitchcock;Mystery;Hitchcock, Alfred;;;70;No;NicholasCage.png\n1990;;A Married Man;Drama;Hopkins, Anthony;;;79;No;AnthonyHopkins.png\n1982;208;Othello;Drama;Hopkins, Anthony;;;84;No;AnthonyHopkins.png\n1975;85;Only Way Home, The;Drama;Hopkins, Bo;;;60;No;NicholasCage.png\n1953;120;Tales of Tomorrow;Horror;Karloff, Boris;;;0;No;NicholasCage.png\n1991;128;Inherit the Wind;Drama;Kelly, Gene;;;18;No;NicholasCage.png\n1990;45;This Is Horror;Horror;King, Stephen;;;3;No;NicholasCage.png\n1992;112;Conversation Piece;Drama;Lancaster, Burt;;;1;No;burtLancaster.png\n1992;105;Crimson Pirate, The;Action;Lancaster, Burt;;;60;No;burtLancaster.png\n1992;83;Devil's Disciple, The;Mystery;Lancaster, Burt;;;65;No;burtLancaster.png\n1992;166;Hallelujah Trail, The;Drama;Lancaster, Burt;;;6;No;burtLancaster.png\n1992;133;Train, The;Action;Lancaster, Burt;;;68;No;burtLancaster.png\n1986;49;Jay Leno: The American Dream;Comedy;Leno, Jay;;;67;No;NicholasCage.png\n1990;92;Primal Rage;Mystery;Lowe, Patrick;;;3;No;NicholasCage.png\n1990;50;Industrial Symphony, The Dream of the Broken-Hearted;Music;Lynch, David;;;49;No;lynch.png\n1986;52;Howie Mandel's North American Watusi Tour;Comedy;Mandel, Howie;;;65;No;NicholasCage.png\n1989;90;Branford Marsalis, Steep;Music;Marsalis, Branford;;;52;No;NicholasCage.png\n1991;98;L. A. 
Story;Comedy;Martin, Steve;;;81;No;NicholasCage.png\n1986;60;Steve Martin Live!;Comedy;Martin, Steve;;;3;No;NicholasCage.png\n1974;60;Steve Martin, The Funnier Side of Eastern Canada;Comedy;Martin, Steve;;;34;No;NicholasCage.png\n1993;;Runaway Barge, The;Action;Matheson, Tim;;;38;No;NicholasCage.png\n1992;101;Romulus & the Sabines;Action;Moore, Roger;;;76;No;NicholasCage.png\n1989;;Saint, The;Mystery;Moore, Roger;;;29;No;NicholasCage.png\n1983;91;Strange Brew;Comedy;Moranis, Rick;;;24;No;NicholasCage.png\n1990;98;Another Forty-Eight Hours;Action;Murphy, Eddie;;;54;No;NicholasCage.png\n1989;;Best of Eddie Murphy, Saturday Night Live, The;Comedy;Murphy, Eddie;;;56;No;NicholasCage.png\n1991;99;What about Bob?;Comedy;Murray, Bill;;;6;No;NicholasCage.png\n1953;91;Mummy's Revenge, The;Horror;Naschy, Paul;;;56;No;NicholasCage.png\n1992;121;Harper;Mystery;Newman, Paul;;;86;No;paulNewman.png\n1992;102;Left Handed Gun, The;Western;Newman, Paul;;;26;No;paulNewman.png\n1989;;Once upon a Wheel;Action;Newman, Paul;;;40;No;paulNewman.png\n1992;136;Prize, The;Drama;Newman, Paul;;;66;No;paulNewman.png\n1968;;Secret War of Harry Frigg, The;Comedy;Newman, Paul;;;28;No;paulNewman.png\n1990;;Two Jakes, The;Mystery;Nicholson, Jack;;;3;No;NicholasCage.png\n1989;61;Exile in Concert;Music;Pennington, J. 
P.;;;12;No;NicholasCage.png\n1987;60;Joe Piscopo New Jersey Special;Comedy;Piscopo, Joe;;;14;No;NicholasCage.png\n1991;60;Joe Piscopo Video, The;Comedy;Piscopo, Joe;;;44;No;NicholasCage.png\n1989;;Death Valley Days, No Gun Behind His Badge;Western;Reagan, Ronald;;;1;No;NicholasCage.png\n1988;96;Salsa: The Motion Picture;Drama;Rosa, Robby;;;26;No;NicholasCage.png\n1991;80;Hollywood's Greatest War Movies;War;Scott, George C.;;;41;No;NicholasCage.png\n1991;91;Out for Justice;Action;Seagal, Steven;;;2;No;NicholasCage.png\n1956;27;Case of the Mukkinese Battle Horn, The;Comedy;Sellers, Peter;;;45;No;NicholasCage.png\n1953;75;Goon Show Movie, The;Comedy;Sellers, Peter;;;80;No;NicholasCage.png\n1975;95;Great McGonagall, The;Comedy;Sellers, Peter;;;72;No;NicholasCage.png\n1991;101;I'm All Right Jack;Comedy;Sellers, Peter;;;23;No;NicholasCage.png\n1991;101;Magic Christian, The;Comedy;Sellers, Peter;;;75;No;NicholasCage.png\n1960;91;Never Let Go;Action;Sellers, Peter;;;5;No;NicholasCage.png\n1991;121;Pink Panther, The;Comedy;Sellers, Peter;;;77;No;NicholasCage.png\n1991;84;Two-Way Stretch;Comedy;Sellers, Peter;;;7;No;NicholasCage.png\n1988;65;Face at the Window, The;Horror;Slaughter, Tod;;;79;No;NicholasCage.png\n1958;92;Tom Thumb;Science Fiction;Tamblyn, Russ;;;30;No;NicholasCage.png\n1989;90;Beartooth;Action;Taylor, Dub;;;70;No;NicholasCage.png\n1979;90;James Taylor in Concert;Music;Taylor, James;;;38;No;NicholasCage.png\n1942;253;Gangbusters;Drama;Taylor, Kent;;;31;No;NicholasCage.png\n1992;;El Rublo de las Dos Caras;Action;Taylor, Robert;;;83;No;NicholasCage.png\n1992;87;Law & Jake Wade, The;Drama;Taylor, Robert;;;68;No;NicholasCage.png\n1967;105;Chuka;Western;Taylor, Rod;;;47;No;NicholasCage.png\n1980;93;Cry of the Innocent;Drama;Taylor, Rod;;;13;No;NicholasCage.png\n1991;108;Edison the Man;Drama;Tracy, Spencer;;;19;No;spencerTracy.png\n1991;101;Keeper of the Flame;Drama;Tracy, Spencer;;;76;No;spencerTracy.png\n1991;92;Spencer Tracy Legacy, The;Comedy;Tracy, 
Spencer;;;44;No;spencerTracy.png\n1957;60;Cheyenne, The Iron Trail;Western;Walker, Clint;;;1;No;NicholasCage.png\n1992;56;Dawn Rider, The;Western;Wayne, John;;;44;No;johnWayne.png\n1993;;Duke, The Films of John Wayne;Western;Wayne, John;;;70;No;johnWayne.png\n1939;55;Frontier Horizon;Western;Wayne, John;;;73;No;johnWayne.png\n1934;54;Hell Town;Western;Wayne, John;;;23;No;johnWayne.png\n1932;;Hurricane Express;Western;Wayne, John;;;7;No;johnWayne.png\n1932;210;Hurricane Express, The;Action;Wayne, John;;;68;No;johnWayne.png\n1965;165;In Harm's Way;War;Wayne, John;;;66;No;johnWayne.png\n1991;;John Wayne Collection, Red River, The;War;Wayne, John;;;49;No;johnWayne.png\n1992;;John Wayne Collector's Limited Edition;War;Wayne, John;;;3;No;johnWayne.png\n1991;;John Wayne Four Pack;Western;Wayne, John;;;58;No;johnWayne.png\n1939;112;John Wayne Matinee Double Feature, No. 2;Western;Wayne, John;;;3;No;johnWayne.png\n1939;110;John Wayne Matinee Double Feature, No. 3;Western;Wayne, John;;;24;No;johnWayne.png\n1938;110;John Wayne Matinee Double Feature, No. 
4;Western;Wayne, John;;;28;No;johnWayne.png\n1990;;John Wayne Six Pack;Western;Wayne, John;;;87;No;johnWayne.png\n1991;;John Wayne Western Greats, Rio Bravo;Western;Wayne, John;;;22;No;johnWayne.png\n1991;56;King of the Pecos;Western;Wayne, John;;;78;No;johnWayne.png\n1992;59;Lawless Frontier;Western;Wayne, John;;;8;No;johnWayne.png\n1991;52;Lawless Frontier, The;Western;Wayne, John;;;35;No;johnWayne.png\n1991;56;Lawless Nineties, The;Western;Wayne, John;;;3;No;johnWayne.png\n1934;54;Lucky Texan;Western;Wayne, John;;;48;No;johnWayne.png\n1992;112;McQ;Action;Wayne, John;;;5;No;johnWayne.png\n1993;;Neath Arizona Skies;Western;Wayne, John;;;73;No;johnWayne.png\n1991;54;Neath the Arizona Skies;Western;Wayne, John;;;28;No;johnWayne.png\n1991;53;Randy Rides Alone;Western;Wayne, John;;;75;No;johnWayne.png\n1993;58;Range Feud;Western;Wayne, John;;;77;No;johnWayne.png\n1992;134;Red River;Western;Wayne, John;;;16;No;johnWayne.png\n1991;52;Riders of Destiny;Western;Wayne, John;;;30;No;johnWayne.png\n1990;;Sagebrush Trail;Western;Wayne, John;;;23;No;johnWayne.png\n1932;226;Shadow of the Eagle, The;Action;Wayne, John;;;19;No;johnWayne.png\n1989;103;Blood & Guns;Action;Welles, Orson;;;43;No;NicholasCage.png\n1988;78;Hot Money;Drama;Welles, Orson;;;19;No;NicholasCage.png\n1977;75;Comedy Tonight;Comedy;Williams, Robin;;;18;No;NicholasCage.png\n1991;65;Robin Williams;Comedy;Williams, Robin;;;4;No;NicholasCage.png         "
  },
  {
    "path": "FUNDING.yml",
    "content": "custom: https://learndataengineering.com/p/academy\n"
  },
  {
    "path": "LICENSE",
    "content": "                                 Apache License\n                           Version 2.0, January 2004\n                        http://www.apache.org/licenses/\n\n   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\n\n   1. Definitions.\n\n      \"License\" shall mean the terms and conditions for use, reproduction,\n      and distribution as defined by Sections 1 through 9 of this document.\n\n      \"Licensor\" shall mean the copyright owner or entity authorized by\n      the copyright owner that is granting the License.\n\n      \"Legal Entity\" shall mean the union of the acting entity and all\n      other entities that control, are controlled by, or are under common\n      control with that entity. For the purposes of this definition,\n      \"control\" means (i) the power, direct or indirect, to cause the\n      direction or management of such entity, whether by contract or\n      otherwise, or (ii) ownership of fifty percent (50%) or more of the\n      outstanding shares, or (iii) beneficial ownership of such entity.\n\n      \"You\" (or \"Your\") shall mean an individual or Legal Entity\n      exercising permissions granted by this License.\n\n      \"Source\" form shall mean the preferred form for making modifications,\n      including but not limited to software source code, documentation\n      source, and configuration files.\n\n      \"Object\" form shall mean any form resulting from mechanical\n      transformation or translation of a Source form, including but\n      not limited to compiled object code, generated documentation,\n      and conversions to other media types.\n\n      \"Work\" shall mean the work of authorship, whether in Source or\n      Object form, made available under the License, as indicated by a\n      copyright notice that is included in or attached to the work\n      (an example is provided in the Appendix below).\n\n      \"Derivative Works\" shall mean any work, whether in Source or Object\n      
form, that is based on (or derived from) the Work and for which the\n      editorial revisions, annotations, elaborations, or other modifications\n      represent, as a whole, an original work of authorship. For the purposes\n      of this License, Derivative Works shall not include works that remain\n      separable from, or merely link (or bind by name) to the interfaces of,\n      the Work and Derivative Works thereof.\n\n      \"Contribution\" shall mean any work of authorship, including\n      the original version of the Work and any modifications or additions\n      to that Work or Derivative Works thereof, that is intentionally\n      submitted to Licensor for inclusion in the Work by the copyright owner\n      or by an individual or Legal Entity authorized to submit on behalf of\n      the copyright owner. For the purposes of this definition, \"submitted\"\n      means any form of electronic, verbal, or written communication sent\n      to the Licensor or its representatives, including but not limited to\n      communication on electronic mailing lists, source code control systems,\n      and issue tracking systems that are managed by, or on behalf of, the\n      Licensor for the purpose of discussing and improving the Work, but\n      excluding communication that is conspicuously marked or otherwise\n      designated in writing by the copyright owner as \"Not a Contribution.\"\n\n      \"Contributor\" shall mean Licensor and any individual or Legal Entity\n      on behalf of whom a Contribution has been received by Licensor and\n      subsequently incorporated within the Work.\n\n   2. Grant of Copyright License. 
Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      copyright license to reproduce, prepare Derivative Works of,\n      publicly display, publicly perform, sublicense, and distribute the\n      Work and such Derivative Works in Source or Object form.\n\n   3. Grant of Patent License. Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      (except as stated in this section) patent license to make, have made,\n      use, offer to sell, sell, import, and otherwise transfer the Work,\n      where such license applies only to those patent claims licensable\n      by such Contributor that are necessarily infringed by their\n      Contribution(s) alone or by combination of their Contribution(s)\n      with the Work to which such Contribution(s) was submitted. If You\n      institute patent litigation against any entity (including a\n      cross-claim or counterclaim in a lawsuit) alleging that the Work\n      or a Contribution incorporated within the Work constitutes direct\n      or contributory patent infringement, then any patent licenses\n      granted to You under this License for that Work shall terminate\n      as of the date such litigation is filed.\n\n   4. Redistribution. 
You may reproduce and distribute copies of the\n      Work or Derivative Works thereof in any medium, with or without\n      modifications, and in Source or Object form, provided that You\n      meet the following conditions:\n\n      (a) You must give any other recipients of the Work or\n          Derivative Works a copy of this License; and\n\n      (b) You must cause any modified files to carry prominent notices\n          stating that You changed the files; and\n\n      (c) You must retain, in the Source form of any Derivative Works\n          that You distribute, all copyright, patent, trademark, and\n          attribution notices from the Source form of the Work,\n          excluding those notices that do not pertain to any part of\n          the Derivative Works; and\n\n      (d) If the Work includes a \"NOTICE\" text file as part of its\n          distribution, then any Derivative Works that You distribute must\n          include a readable copy of the attribution notices contained\n          within such NOTICE file, excluding those notices that do not\n          pertain to any part of the Derivative Works, in at least one\n          of the following places: within a NOTICE text file distributed\n          as part of the Derivative Works; within the Source form or\n          documentation, if provided along with the Derivative Works; or,\n          within a display generated by the Derivative Works, if and\n          wherever such third-party notices normally appear. The contents\n          of the NOTICE file are for informational purposes only and\n          do not modify the License. 
You may add Your own attribution\n          notices within Derivative Works that You distribute, alongside\n          or as an addendum to the NOTICE text from the Work, provided\n          that such additional attribution notices cannot be construed\n          as modifying the License.\n\n      You may add Your own copyright statement to Your modifications and\n      may provide additional or different license terms and conditions\n      for use, reproduction, or distribution of Your modifications, or\n      for any such Derivative Works as a whole, provided Your use,\n      reproduction, and distribution of the Work otherwise complies with\n      the conditions stated in this License.\n\n   5. Submission of Contributions. Unless You explicitly state otherwise,\n      any Contribution intentionally submitted for inclusion in the Work\n      by You to the Licensor shall be under the terms and conditions of\n      this License, without any additional terms or conditions.\n      Notwithstanding the above, nothing herein shall supersede or modify\n      the terms of any separate license agreement you may have executed\n      with Licensor regarding such Contributions.\n\n   6. Trademarks. This License does not grant permission to use the trade\n      names, trademarks, service marks, or product names of the Licensor,\n      except as required for reasonable and customary use in describing the\n      origin of the Work and reproducing the content of the NOTICE file.\n\n   7. Disclaimer of Warranty. Unless required by applicable law or\n      agreed to in writing, Licensor provides the Work (and each\n      Contributor provides its Contributions) on an \"AS IS\" BASIS,\n      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or\n      implied, including, without limitation, any warranties or conditions\n      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A\n      PARTICULAR PURPOSE. 
You are solely responsible for determining the\n      appropriateness of using or redistributing the Work and assume any\n      risks associated with Your exercise of permissions under this License.\n\n   8. Limitation of Liability. In no event and under no legal theory,\n      whether in tort (including negligence), contract, or otherwise,\n      unless required by applicable law (such as deliberate and grossly\n      negligent acts) or agreed to in writing, shall any Contributor be\n      liable to You for damages, including any direct, indirect, special,\n      incidental, or consequential damages of any character arising as a\n      result of this License or out of the use or inability to use the\n      Work (including but not limited to damages for loss of goodwill,\n      work stoppage, computer failure or malfunction, or any and all\n      other commercial damages or losses), even if such Contributor\n      has been advised of the possibility of such damages.\n\n   9. Accepting Warranty or Additional Liability. While redistributing\n      the Work or Derivative Works thereof, You may choose to offer,\n      and charge a fee for, acceptance of support, warranty, indemnity,\n      or other liability obligations and/or rights consistent with this\n      License. However, in accepting such obligations, You may act only\n      on Your own behalf and on Your sole responsibility, not on behalf\n      of any other Contributor, and only if You agree to indemnify,\n      defend, and hold each Contributor harmless for any liability\n      incurred by, or claims asserted against, such Contributor by reason\n      of your accepting any such warranty or additional liability.\n\n   END OF TERMS AND CONDITIONS\n\n   APPENDIX: How to apply the Apache License to your work.\n\n      To apply the Apache License to your work, attach the following\n      boilerplate notice, with the fields enclosed by brackets \"[]\"\n      replaced with your own identifying information. 
(Don't include\n      the brackets!)  The text should be enclosed in the appropriate\n      comment syntax for the file format. We also recommend that a\n      file or class name and description of purpose be included on the\n      same \"printed page\" as the copyright notice for easier\n      identification within third-party archives.\n\n   Copyright [yyyy] [name of copyright owner]\n\n   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License.\n"
  },
  {
    "path": "README.md",
    "content": "<!--- # The Data Engineering Cookbook -->\n\n<div align=\"center\">\n\t<img width=\"341\" height=\"426\" src=\"images/CookbookCover.jpg\" alt=\"Data Engineering Cookbook\">\n\t<br>\n\t<br>\n\t<br>\n</div>\n\n<p align=\"center\">\n\t<a href=\"sections/01-Introduction.md\">What is this Book?</a>&nbsp;&nbsp;&nbsp;\n  <a href=\"#how-to-contribute\">How to Contribute</a>&nbsp;&nbsp;&nbsp;\n  <a href=\"https://www.youtube.com/channel/UCY8mzqqGwl5_bTpBY9qLMAA\">YouTube</a>&nbsp;&nbsp;&nbsp;\n\t<a\n  <a href=\"https://twitter.com/andreaskayy\">Twitter</a>&nbsp;&nbsp;&nbsp;\n  <a href=\"https://www.amazon.com/shop/plumbersofdatascience\">Amazon Shop</a>\n</p>\n\n<br>\n\n## If You Like This Book & Need More Help\nCheck out my Data Engineering Academy at LearnDataEngineering.com trusted by almost 2,000 students!\n\n**Visit learndataengineering.com:** [Click Here](https://learndataengineering.com)\n\n- Learn Data Engineering with our online Academy\n- Perfect for becoming a Data Engineer or add Data Engineering to your skillset\n- Proven process based on years of experience and hundreds of hours of personal coaching\n- Over 30 prepared courses on the most important techniques, fundamental tools and platforms plus our\n- Associate Data Engineer Certification\n- Academy Discord server with over 1,000 members\n\n\n\n## Support This Book For Free!\n- **Amazon:** [Click Here](https://www.amazon.com/shop/plumbersofdatascience) buy whatever you like from Amazon using this link* (Also check out my complete podcast gear and books)\n\n<!---\nI get asked super often how to become a Data Engineer.\nThat's why I decided to start this cookbook with all the topics you need to look into.\n\nIt's not only useful for beginners, professionals will definitely like the case study section.\n\nIf you look for the old PDF version it's [here](https://github.com/andkret/Cookbook/raw/LaTex-Version-Deprecated/Data%20Engineering%20Cookbook.pdf)\n\n-->\n\n## Here's what's new:\nFind the 
change log with all recent updates here: [SEE UPDATES](sections/10-Updates.md)\n\n# Contents:\n- [Introduction](sections/01-Introduction.md)\n- [Basic Engineering Skills](sections/02-BasicSkills.md)\n- [Advanced Engineering Skills](sections/03-AdvancedSkills.md)\n- [Free Hands On Courses / Tutorials](sections/04-HandsOnCourse.md)\n- [Case Studies](sections/05-CaseStudies.md)\n- [Best Practices Cloud Platforms](sections/06-BestPracticesCloud.md)\n- [130+ Data Sources Data Science](sections/07-DataSources.md)\n- [1001 Interview Questions](sections/08-InterviewQuestions.md)\n- [Recommended Books, Courses, and Podcasts](sections/09-BooksAndCourses.md)\n- [Updates](sections/10-Updates.md)\n<!--  test -->\n\n- [How To Contribute](#how-to-contribute)\n- [Support What You Like](#support)\n- [Important Links](#important-links)\n\n# Full Table Of Contents:\n## Introduction\n- [What is this Cookbook](sections/01-Introduction.md#what-is-this-cookbook)\n- [Data Engineers](sections/01-Introduction.md#data-engineers)\n- [My Data Science Platform Blueprint](sections/01-Introduction.md#my-data-science-platform-blueprint)\n  - [Connect](sections/01-Introduction.md#connect)\n  - [Buffer](sections/01-Introduction.md#buffer)\n  - [Processing Framework](sections/01-Introduction.md#processing-framework)\n  - [Store](sections/01-Introduction.md#store)\n  - [Visualize](sections/01-Introduction.md#visualize)\n- [Who Companies Need](sections/01-Introduction.md#who-companies-need)\n- [How to Learn Data Engineering](sections/01-Introduction.md#how-to-learn-data-engineering)\n\t- [Andreas on the Super Data Science Podcast](sections/01-Introduction.md#Interview-with-Andreas-on-the-Super-Data-Science-Podcast)\n\t- [Building Blocks to Learn Data Engineering](sections/01-Introduction.md#building-blocks-to-learn-data-engineering)\n  - [Roadmap for Beginners](sections/01-Introduction.md#roadmap-for-beginners)\n\t- [Roadmap for Data  
Analysts](sections/01-Introduction.md#roadmap-for-data-analysts)\n\t- [Roadmap for Data Scientists](sections/01-Introduction.md#roadmap-for-data-scientists)\n\t- [Roadmap for Software Engineers](sections/01-Introduction.md#roadmap-for-software-engineers)\n- [Data Engineers Skills Matrix](sections/01-Introduction.md#data-engineers-skills-matrix)\n- [How to Become a Senior Data Engineer](sections/01-Introduction.md#how-to-become-a-senior-data-engineer)\n\n## Basic Engineering Skills\n- [Learn To Code](sections/02-BasicSkills.md#learn-to-code)\n- [Get Familiar With Git](sections/02-BasicSkills.md#get-familiar-with-git)\n- [Agile Development](sections/02-BasicSkills.md#agile-development)\n  - [Why is agile so important?](sections/02-BasicSkills.md#Why-is-agile-so-important)\n  - [Agile rules I learned over the years](sections/02-BasicSkills.md#agile-rules-i-learned-over-the-years)\n  - [Agile Frameworks](sections/02-BasicSkills.md#agile-frameworks)\n    - [Scrum](sections/02-BasicSkills.md#scrum)\n    - [OKR](sections/02-BasicSkills.md#okr)\n- [Software Engineering Culture](sections/02-BasicSkills.md#software-engineering-culture)\n- [Learn how a Computer Works](sections/02-BasicSkills.md#learn-how-a-computer-works)\n- [Data Network Transmission](sections/02-BasicSkills.md#data-network-transmission)\n- [Security and Privacy](sections/02-BasicSkills.md#security-and-privacy)\n  - [SSL Public and Private Key Certificates](sections/02-BasicSkills.md#ssl-public-and-private-key-Certificates)\n  - [JSON Web Tokens](sections/02-BasicSkills.md#json-web-tokens)\n  - [GDPR regulations](sections/02-BasicSkills.md#gdpr-regulations)\n- [Linux](sections/02-BasicSkills.md#linux)\n  - [OS Basics](sections/02-BasicSkills.md#os-basics)\n  - [Shell scripting](sections/02-BasicSkills.md#shell-scripting)\n  - [Cron Jobs](sections/02-BasicSkills.md#cron-jobs)\n  - [Packet Management](sections/02-BasicSkills.md#packet-management)\n- [Docker](sections/02-BasicSkills.md#docker)\n  - [What is 
Docker and How it Works](sections/02-BasicSkills.md#what-is-docker-and-what-do-you-use-it-for)\n    - [Don't Mess Up Your System](sections/02-BasicSkills.md#dont-mess-up-your-system)\n    - [Preconfigured Images](sections/02-BasicSkills.md#preconfigured-images)\n    - [Take it With You](sections/02-BasicSkills.md#take-it-with-you)\n    - [Kubernetes Container Deployment](sections/02-BasicSkills.md#kubernetes-container-deployment)\n    - [How to Create Start and Stop a Container](sections/02-BasicSkills.md#how-to-create-start-stop-a-container)\n    - [Docker Micro Services](sections/02-BasicSkills.md#docker-micro-services)\n    - [Kubernetes](sections/02-BasicSkills.md#kubernetes)\n    - [Why and How To Do Docker Container Orchestration](sections/02-BasicSkills.md#why-and-how-to-do-docker-container-orchestration)\n    - [Useful Docker Commands](sections/02-BasicSkills.md#useful-docker-commands)\n- [The Cloud](sections/02-BasicSkills.md#the-cloud)\n  - [IaaS vs PaaS vs SaaS](sections/02-BasicSkills.md#iaas-vs-paas-vs-saas)\n  - [AWS Azure IBM Google](sections/02-BasicSkills.md#aws-azure-ibm-google)\n  - [Cloud vs On-Premises](sections/02-BasicSkills.md#cloud-vs-on-premises)\n  - [Security](sections/02-BasicSkills.md#security)\n  - [Hybrid Clouds](sections/02-BasicSkills.md#hybrid-clouds)\n- [Security Zone Design](sections/02-BasicSkills.md#security-zone-design)\n  - [How to secure a multi layered application](sections/02-BasicSkills.md#how-to-secure-a-multi-layered-application)\n  - [Cluster security with Kerberos](sections/02-BasicSkills.md#cluster-security-with-kerberos)\n\n## Advanced Engineering Skills\n- [Data Science Platform](sections/03-AdvancedSkills.md#data-science-platform)\n  - [Why a Good Data Platform Is Important](sections/03-AdvancedSkills.md#why-a-good-data-platform-is-important)\n  - [Big Data vs Data Science and Analytics](sections/03-AdvancedSkills.md#Big-Data-vs-Data-Science-and-Analytics)\n  - [The 4 Vs of Big 
Data](sections/03-AdvancedSkills.md#the-4-vs-of-big-data)\n  - [Why Big Data](sections/03-AdvancedSkills.md#why-big-data)\n    - [Planning is Everything](sections/03-AdvancedSkills.md#planning-is-everything)\n    - [The Problem with ETL](sections/03-AdvancedSkills.md#the-problem-with-etl)\n    - [Scaling Up](sections/03-AdvancedSkills.md#scaling-up)\n    - [Scaling Out](sections/03-AdvancedSkills.md#scaling-out)\n    - [When not to Do Big Data](sections/03-AdvancedSkills.md#please-dont-go-big-data)\n- [81 Platform & Pipeline Design Questions](sections/03-AdvancedSkills.md#81-platform-and-pipeline-design-questions)\n  - [Data Source Questions](sections/03-AdvancedSkills.md#data-source-questions)\n  - [Goals and Destination Questions](sections/03-AdvancedSkills.md#goals-and-destination-questions)\n- [Connect](sections/03-AdvancedSkills.md#connect)\n  - [REST APIs](sections/03-AdvancedSkills.md#rest-apis)\n    - [API Design](sections/03-AdvancedSkills.md#api-design)\n    - [Implementation Frameworks](sections/03-AdvancedSkills.md#implementation-frameworks)\n    - [Security](sections/03-AdvancedSkills.md#security)\n  - [Apache Nifi](sections/03-AdvancedSkills.md#apache-nifi)\n  - [Logstash](sections/03-AdvancedSkills.md#logstash)\n- [Buffer](sections/03-AdvancedSkills.md#buffer)\n  - [Apache Kafka](sections/03-AdvancedSkills.md#apache-kafka)\n    - [Why a Message Queue Tool?](sections/03-AdvancedSkills.md#why-a-message-queue-tool)\n    - [Kafka Architecture](sections/03-AdvancedSkills.md#kafka-architecture)\n    - [Kafka Topics](sections/03-AdvancedSkills.md#what-are-topics)\n    - [Kafka and Zookeeper](sections/03-AdvancedSkills.md#what-does-zookeeper-have-to-do-with-kafka)\n    - [How to Produce and Consume Messages](sections/03-AdvancedSkills.md#how-to-produce-and-consume-messages)\n    - [Kafka Commands](sections/03-AdvancedSkills.md#kafka-commands)\n  - [Redis Pub-Sub](sections/03-AdvancedSkills.md#redis-pub-sub)\n  - [AWS 
Kinesis](sections/03-AdvancedSkills.md#apache-kafka)\n  - [Google Cloud PubSub](sections/03-AdvancedSkills.md#google-cloud-pubsub)\n- [Processing Frameworks](sections/03-AdvancedSkills.md#processing-frameworks)\n\t- [Lambda and Kappa Architecture](sections/03-AdvancedSkills.md#lambda-and-kappa-architecture)\n\t- [Batch Processing](sections/03-AdvancedSkills.md#batch-processing)\n\t- [Stream Processing](sections/03-AdvancedSkills.md#stream-processing)\n\t\t- [Three Methods of Streaming](sections/03-AdvancedSkills.md#three-methods-of-streaming)\n\t\t- [At Least Once](sections/03-AdvancedSkills.md#at-least-once)\n\t\t- [At Most Once](sections/03-AdvancedSkills.md#at-most-once)\n\t\t- [Exactly Once](sections/03-AdvancedSkills.md#exactly-once)\n\t\t- [Check The Tools](sections/03-AdvancedSkills.md#check-the-tools)\n\t- [Should You do Stream or Batch Processing](sections/03-AdvancedSkills.md#should-you-do-stream-or-batch-processing)\n\t- [Is ETL still relevant for Analytics?](sections/03-AdvancedSkills.md#is-etl-still-relevant-for-analytics)\n  - [MapReduce](sections/03-AdvancedSkills.md#mapreduce)\n    - [How Does MapReduce Work](sections/03-AdvancedSkills.md#How-does-mapreduce-work)\n    - [MapReduce](sections/03-AdvancedSkills.md#mapreduce)\n    - [MapReduce Example](sections/03-AdvancedSkills.md#example)\n    - [MapReduce Limitations](sections/03-AdvancedSkills.md#What-is-the-limitation-of-mapreduce)\n  - [Apache Spark](sections/03-AdvancedSkills.md#apache-spark)\n    - [What is the Difference to MapReduce?](sections/03-AdvancedSkills.md#what-is-the-difference-to-MapReduce)\n    - [How Spark Fits to Hadoop](sections/03-AdvancedSkills.md#how-does-spark-fit-to-hadoop)\n    - [Spark vs Hadoop](sections/03-AdvancedSkills.md#wheres-the-difference)\n    - [Spark and Hadoop a Perfect Fit](sections/03-AdvancedSkills.md#spark-and-hadoop-is-a-perfect-fit)\n    - [Spark on YARN](sections/03-AdvancedSkills.md#spark-on-yarn)\n    - [My Simple Rule of 
Thumb](sections/03-AdvancedSkills.md#my-simple-rule-of-thumb)\n    - [Available Languages](sections/03-AdvancedSkills.md#available-languages)\n    - [Spark Driver Executor and SparkContext](sections/03-AdvancedSkills.md#how-spark-works-driver-executor-sparkcontext)\n    - [Spark Batch vs Stream processing](sections/03-AdvancedSkills.md#spark-batch-vs-stream-processing)\n    - [How Spark uses Data From Hadoop](sections/03-AdvancedSkills.md#How-does-spark-use-data-from-hadoop)\n    - [What are RDDs and How to Use Them](sections/03-AdvancedSkills.md#what-are-rdds-and-how-to-use-them)\n    - [SparkSQL How and Why to Use It](sections/03-AdvancedSkills.md#available-languages)\n    - [What are Dataframes and How to Use Them](sections/03-AdvancedSkills.md#what-are-dataframes-how-to-use-them)\n    - [Machine Learning on Spark (TensorFlow)](sections/03-AdvancedSkills.md#machine-learning-on-spark-tensor-flow)\n    - [MLlib](sections/03-AdvancedSkills.md#mllib)\n    - [Spark Setup](sections/03-AdvancedSkills.md#spark-setup)\n    - [Spark Resource Management](sections/03-AdvancedSkills.md#spark-resource-management)\n  - [AWS Lambda](sections/03-AdvancedSkills.md#apache-flink)\n  - [Apache Flink](sections/03-AdvancedSkills.md#apache-flink)\n  - [Elasticsearch](sections/03-AdvancedSkills.md#elasticsearch)\n  - [Apache Drill](sections/03-AdvancedSkills.md#apache-drill)\n  - [StreamSets](sections/03-AdvancedSkills.md#streamsets)\n- [Store](sections/03-AdvancedSkills.md#store)\n\t- [Analytical Data Stores](sections/03-AdvancedSkills.md#analytical-data-stores)\n\t\t- [Data Warehouse vs Data Lake](sections/03-AdvancedSkills.md#data-warehouse-vs-data-lake)\n\t\t- [Snowflake and dbt](sections/03-AdvancedSkills.md#snowflake-and-dbt)\n\t- [Transactional Data Stores](sections/03-AdvancedSkills.md#transactional-data-stores)\n\t\t- [SQL Databases](sections/03-AdvancedSkills.md#sql-databases)\n\t    - [PostgreSQL DB](sections/03-AdvancedSkills.md#postgresql-db)\n\t    - [Database 
Design](sections/03-AdvancedSkills.md#database-design)\n\t    - [SQL Queries](sections/03-AdvancedSkills.md#sql-queries)\n\t    - [Stored Procedures](sections/03-AdvancedSkills.md#stored-procedures)\n\t    - [ODBC/JDBC Server Connections](sections/03-AdvancedSkills.md#odbc-jdbc-server-connections)\n\t  - [NoSQL Stores](sections/03-AdvancedSkills.md#nosql-stores)\n\t    - [HBase KeyValue Store](sections/03-AdvancedSkills.md#keyvalue-stores-hbase)\n\t    - [HDFS Document Store](sections/03-AdvancedSkills.md#document-stores-hdfs)\n\t    - [MongoDB Document Store](sections/03-AdvancedSkills.md#document-stores-mongodb)\n\t    - [Elasticsearch Document Store](sections/03-AdvancedSkills.md#Elasticsearch-search-engine-and-document-store)\n\t    - [Hive Warehouse](sections/03-AdvancedSkills.md#hive-warehouse)\n\t    - [Impala](sections/03-AdvancedSkills.md#impala)\n\t    - [Kudu](sections/03-AdvancedSkills.md#kudu)\n\t    - [Apache Druid](sections/03-AdvancedSkills.md#apache-druid)\n\t    - [InfluxDB Time Series Database](sections/03-AdvancedSkills.md#influxdb-time-series-database)\n\t    - [Greenplum MPP Database](sections/03-AdvancedSkills.md#mpp-databases-greenplum)\n- [Visualize](sections/03-AdvancedSkills.md#visualize)\n  - [Android and IOS](sections/03-AdvancedSkills.md#android-and-ios)\n  - [API Design for Mobile Apps](sections/03-AdvancedSkills.md#how-to-design-apis-for-mobile-apps)\n  - [Dashboards](sections/03-AdvancedSkills.md#dashboards)\n    - [Grafana](sections/03-AdvancedSkills.md#grafana)\n    - [Kibana](sections/03-AdvancedSkills.md#kibana)\n  - [Webservers](sections/03-AdvancedSkills.md#how-to-use-webservers-to-display-content)\n    - [Tomcat](sections/03-AdvancedSkills.md#tomcat)\n    - [Jetty](sections/03-AdvancedSkills.md#jetty)\n    - [NodeRED](sections/03-AdvancedSkills.md#nodered)\n    - [React](sections/03-AdvancedSkills.md#react)\n  - [Business Intelligence Tools](sections/03-AdvancedSkills.md#business-intelligence-tools)\n    - 
[Tableau](sections/03-AdvancedSkills.md#tableau)\n    - [Power BI](sections/03-AdvancedSkills.md#power-bi)\n    - [Qlik Sense](sections/03-AdvancedSkills.md#quliksense)\n  - [Identity & Device Management](sections/03-AdvancedSkills.md#Identity-and-device-management)\n    - [What Is A Digital Twin](sections/03-AdvancedSkills.md#what-is-a-digital-twin)\n    - [Active Directory](sections/03-AdvancedSkills.md#active-directory)\n- [Machine Learning](sections/03-AdvancedSkills.md#machine-learning)\n  - [How to do Machine Learning in production](sections/03-AdvancedSkills.md#how-to-domachine-learning-in-production)\n  - [Why machine learning in production is harder than you think](sections/03-AdvancedSkills.md#why-machine-learning-in-production-is-harder-then-you-think)\n  - [Models Do Not Work Forever](sections/03-AdvancedSkills.md#models-do-not-work-forever)\n  - [Where are The Platforms That Support Machine Learning](sections/03-AdvancedSkills.md#where-are-the-platforms-that-support-this)\n  - [Training Parameter Management](sections/03-AdvancedSkills.md#training-parameter-management)\n  - [How to Convince People That Machine Learning Works](sections/03-AdvancedSkills.md#how-to-convince-people-machine-learning-works)\n  - [No Rules No Physical Models](sections/03-AdvancedSkills.md#no-rules-no-physical-models)\n  - [You Have The Data. 
Use It!](sections/03-AdvancedSkills.md#you-have-the-data-use-it)\n  - [Data is Stronger Than Opinions](sections/03-AdvancedSkills.md#data-is-stronger-than-opinions)\n  - [AWS Sagemaker](sections/03-AdvancedSkills.md#aws-sagemaker)\n\n\n## Hands On Course\n\n- [Free Data Engineering Course with AWS, TDengine, Docker and Grafana](sections/04-HandsOnCourse.md#free-data-engineering-course-with-aws-tdengine-docker-and-grafana)\n- [Monitor your data in dbt & detect quality issues with Elementary](sections/04-HandsOnCourse.md#monitor-your-data-in-dbt-and-detect-quality-issues-with-elementary)\n- [Solving Engineers 4 Biggest Airflow Problems](sections/04-HandsOnCourse.md#solving-engineers-4-biggest-airflow-problems)\n- [The best alternative to Airflow? Mage.ai](sections/04-HandsOnCourse.md#the-best-alternative-to-airlfow?-mage.ai)\n\n## Case Studies\n\n- [Data Science @Airbnb](sections/05-CaseStudies.md#data-science-at-Airbnb)\n- [Data Science @Amazon](sections/05-CaseStudies.md#data-science-at-Amazon)\n- [Data Science @Baidu](sections/05-CaseStudies.md#data-science-at-Baidu)\n- [Data Science @Blackrock](sections/05-CaseStudies.md#data-science-at-Blackrock)\n- [Data Science @BMW](sections/05-CaseStudies.md#data-science-at-BMW)\n- [Data Science @Booking.com](sections/05-CaseStudies.md#data-science-at-Booking.com)\n- [Data Science @CERN](sections/05-CaseStudies.md#data-science-at-CERN)\n- [Data Science @Disney](sections/05-CaseStudies.md#data-science-at-Disney)\n- [Data Science @DLR](sections/05-CaseStudies.md#data-science-at-DLR)\n- [Data Science @Drivetribe](sections/05-CaseStudies.md#data-science-at-Drivetribe)\n- [Data Science @Dropbox](sections/05-CaseStudies.md#data-science-at-Dropbox)\n- [Data Science @Ebay](sections/05-CaseStudies.md#data-science-at-Ebay)\n- [Data Science @Expedia](sections/05-CaseStudies.md#data-science-at-Expedia)\n- [Data Science @Facebook](sections/05-CaseStudies.md#data-science-at-Facebook)\n- [Data Science 
@Google](sections/05-CaseStudies.md#data-science-at-Google)\n- [Data Science @Grammarly](sections/05-CaseStudies.md#data-science-at-Grammarly)\n- [Data Science @ING Fraud](sections/05-CaseStudies.md#data-science-at-ING-Fraud)\n- [Data Science @Instagram](sections/05-CaseStudies.md#data-science-at-Instagram)\n- [Data Science @LinkedIn](sections/05-CaseStudies.md#data-science-at-LinkedIn)\n- [Data Science @Lyft](sections/05-CaseStudies.md#data-science-at-Lyft)\n- [Data Science @NASA](sections/05-CaseStudies.md#data-science-at-NASA)\n- [Data Science @Netflix](sections/05-CaseStudies.md#data-science-at-Netflix)\n- [Data Science @OLX](sections/05-CaseStudies.md#data-science-at-OLX)\n- [Data Science @OTTO](sections/05-CaseStudies.md#data-science-at-OTTO)\n- [Data Science @Paypal](sections/05-CaseStudies.md#data-science-at-Paypal)\n- [Data Science @Pinterest](sections/05-CaseStudies.md#data-science-at-Pinterest)\n- [Data Science @Salesforce](sections/05-CaseStudies.md#data-science-at-Salesforce)\n- [Data Science @Siemens Mindsphere](sections/05-CaseStudies.md#data-science-at-Siemens-Mindsphere)\n- [Data Science @Slack](sections/05-CaseStudies.md#data-science-at-Slack)\n- [Data Science @Spotify](sections/05-CaseStudies.md#data-science-at-Spotify)\n- [Data Science @Symantec](sections/05-CaseStudies.md#data-science-at-Symantec)\n- [Data Science @Tinder](sections/05-CaseStudies.md#data-science-at-Tinder)\n- [Data Science @Twitter](sections/05-CaseStudies.md#data-science-at-Twitter)\n- [Data Science @Uber](sections/05-CaseStudies.md#data-science-at-Uber)\n- [Data Science @Upwork](sections/05-CaseStudies.md#data-science-at-Upwork)\n- [Data Science @Woot](sections/05-CaseStudies.md#data-science-at-Woot)\n- [Data Science @Zalando](sections/05-CaseStudies.md#data-science-at-Zalando)\n\n## Best Practices Cloud Platforms\n\n- [Amazon Web Services (AWS)](sections/06-BestPracticesCloud.md#aws)\n  - [Connect](sections/06-BestPracticesCloud.md#Connect)\n  - 
[Buffer](sections/06-BestPracticesCloud.md#Buffer)\n  - [Processing](sections/06-BestPracticesCloud.md#Processing)\n  - [Store](sections/06-BestPracticesCloud.md#Store)\n  - [Visualize](sections/06-BestPracticesCloud.md#Visualize)\n  - [Containerization](sections/06-BestPracticesCloud.md#Containerization)\n  - [Best Practices](sections/06-BestPracticesCloud.md#Best-Practices)\n  - [More Details](sections/06-BestPracticesCloud.md#More-Details)\n- [Microsoft Azure](sections/06-BestPracticesCloud.md#azure)\n  - [Connect](sections/06-BestPracticesCloud.md#Connect-1)\n  - [Buffer](sections/06-BestPracticesCloud.md#Buffer-1)\n  - [Processing](sections/06-BestPracticesCloud.md#Processing-1)\n  - [Store](sections/06-BestPracticesCloud.md#Store-1)\n  - [Visualize](sections/06-BestPracticesCloud.md#Visualize-1)\n  - [Containerization](sections/06-BestPracticesCloud.md#Containerization-1)\n  - [Best Practices](sections/06-BestPracticesCloud.md#Best-Practices-1)\n- [Google Cloud Platform (GCP)](sections/06-BestPracticesCloud.md#gcp)\n  - [Connect](sections/06-BestPracticesCloud.md#Connect-2)\n  - [Buffer](sections/06-BestPracticesCloud.md#Buffer-2)\n  - [Processing](sections/06-BestPracticesCloud.md#Processing-2)\n  - [Store](sections/06-BestPracticesCloud.md#Store-2)\n  - [Visualize](sections/06-BestPracticesCloud.md#Visualize-2)\n  - [Containerization](sections/06-BestPracticesCloud.md#Containerization-2)\n  - [Best Practices](sections/06-BestPracticesCloud.md#Best-Practices-2)\n\n## 130+ Free Data Sources For Data Science\n\n- [Student Favorites](sections/07-DataSources.md#Student-Favorites)\n- [General And Academic](sections/07-DataSources.md#General-And-Academic)\n- [Content Marketing](sections/07-DataSources.md#Content-Marketing)\n- [Crime](sections/07-DataSources.md#Crime)\n- [Drugs](sections/07-DataSources.md#Drugs)\n- [Education](sections/07-DataSources.md#Education)\n- [Entertainment](sections/07-DataSources.md#Entertainment)\n- [Environmental And Weather 
Data](sections/07-DataSources.md#Environmental-And-Weather-Data)\n- [Financial And Economic Data](sections/07-DataSources.md#Financial-And-Economic-Data)\n- [Government And World](sections/07-DataSources.md#Government-And-World)\n- [Health](sections/07-DataSources.md#Health)\n- [Human Rights](sections/07-DataSources.md#Human-Rights)\n- [Labor And Employment Data](sections/07-DataSources.md#Labor-And-Employment-Data)\n- [Politics](sections/07-DataSources.md#Politics)\n- [Retail](sections/07-DataSources.md#Retail)\n- [Social](sections/07-DataSources.md#Social)\n- [Travel And Transportation](sections/07-DataSources.md#Travel-And-Transportation)\n- [Various Portals](sections/07-DataSources.md#Various-Portals)\n- [Source Articles and Blog Posts](sections/07-DataSources.md#Source-Articles-and-Blog-Posts)\n- [Free Data Sources Data Science](sections/07-DataSources.md)\n\n## 1001 Interview Questions\n\n- [Interview Questions](sections/08-InterviewQuestions.md)\n\n## Recommended Books, Courses, and Podcasts\n\n- [About Books and Courses](sections/09-BooksAndCourses.md#about-books-and-courses)\n- [Books](sections/09-BooksAndCourses.md#books)\n  - [Languages](sections/09-BooksAndCourses.md#books-languages)\n  - [Data Tools & Platforms](sections/09-BooksAndCourses.md#books-data-science-tools)\n  - [Business](sections/09-BooksAndCourses.md#Books-Business)\n  - [Community Recommendations](sections/09-BooksAndCourses.md#Community-Recommendations)\n- [Online Courses](sections/09-BooksAndCourses.md#online-courses)\n  - [Preparation courses](sections/09-BooksAndCourses.md#Preparation-courses)\n  - [Data engineering courses](sections/09-BooksAndCourses.md#Data-engineering-courses)\n- [Certifications](sections/09-BooksAndCourses.md#Certifications)\n- [Podcasts](sections/09-BooksAndCourses.md#Podcasts)\n  - [Super Data Science](sections/09-BooksAndCourses.md#Super-Data-Science)\n  - [Data Skeptic](sections/09-BooksAndCourses.md#Data-Skeptic)\n  - [Data Engineering 
Podcast](sections/09-BooksAndCourses.md#Data-Engineering-Podcast)\n  - [Roaring Elephant BiteSized Big Tech](sections/09-BooksAndCourses.md#Roaring-Elephant-BiteSized-Big-Tech)\n  - [SQL Data Partners Podcast](sections/09-BooksAndCourses.md#SQL-Data-Partners-Podcast)\n\n\n## How To Contribute\nIf you have some cool links or topics for the cookbook, please become a contributor.\n\nSimply pull the repo, add your ideas and create a pull request.\nYou can also open an issue and put your thoughts there.\n\nPlease use the \"Issues\" function for comments.\n\n\n## Important Links\n\nSubscribe to my YouTube channel for regular updates:\n[Link to YouTube](https://www.youtube.com/channel/UCY8mzqqGwl5_bTpBY9qLMAA)\n\nI have a Medium publication where you can publish your data engineer articles to reach more people:\n[Medium publication](https://link.medium.com/9oi1VDrhPW)\n\n<br>\n*(As an Amazon Associate I earn from qualifying purchases from Amazon\nThis is free of charge for you, but super helpful for supporting this channel)\n"
  },
  {
    "path": "sections/01-Introduction.md",
    "content": "\nIntroduction\n============\n\n## Contents\n\n- [What is this Cookbook](01-Introduction.md#what-is-this-cookbook)\n- [Data Engineers](01-Introduction.md#data-engineers)\n- [My Data Science Platform Blueprint](01-Introduction.md#my-data-science-platform-blueprint)\n  - [Connect](01-Introduction.md#connect)\n  - [Buffer](01-Introduction.md#buffer)\n  - [Processing Framework](01-Introduction.md#processing-framework)\n  - [Store](01-Introduction.md#store)\n  - [Visualize](01-Introduction.md#visualize)\n- [Who Companies Need](01-Introduction.md#who-companies-need)\n- [How to Learn Data Engineering](01-Introduction.md#how-to-learn-data-engineering)\n  - [Andreas interview on the Super Data Science Podcast](01-Introduction.md#Interview-with-Andreas-on-the-Super-Data-Science-Podcast)\n  - [Building Blocks to Learn Data Engineering](01-Introduction.md#building-blocks-to-learn-data-engineering)\n  - [Roadmap for Beginners](01-Introduction.md#roadmap-for-data-analysts)\n  - [Roadmap for Data Analysts](01-Introduction.md#roadmap-for-data-analysts)\n  - [Roadmap for Data Scientists](01-Introduction.md#roadmap-for-data-scientists)\n  - [Roadmap for Software Engineers](01-Introduction.md#roadmap-for-software-engineers)\n- [Data Engineers Skills Matrix](01-Introduction.md#data-engineers-skills-matrix)\n- [How to Become a Senior Data Engineer](01-Introduction.md#how-to-become-a-senior-data-engineer)\n\n\n\n## What is this Cookbook\n\nI get asked a lot:\n\"What do you actually need to learn to become an awesome data engineer?\"\n\nWell, look no further. You'll find it here!\n\nIf you are looking for AI algorithms and such data scientist things,\nthis book is not for you.\n\n**How to use this Cookbook:**\nThis book is intended to be a starting point for you. It is not a training! I want to help you to identify the topics to look into to become an awesome data engineer in the process.\n\nIt hinges on my Data Science Platform Blueprint. Check it out below. 
Once you understand it, you can find tools in the book that fit into each key area of a Data Science platform (Connect, Buffer, Processing Framework, Store, Visualize).\n\nSelect a few tools you are interested in, then research and work with them.\n\nDon't learn everything in this book! Focus.\n\n**What types of content are in this book?**\nYou are going to find five types of content in this book: articles\nI wrote, links to my podcast episodes (video & audio), more than 200\nlinks to helpful websites I like, data engineering interview questions\nand case studies.\n\n**This book is a work in progress!**\nAs you can see, this book is not finished. I'm constantly adding new\nstuff and doing videos for the topics. But, obviously, because I do this\nas a hobby, my time is limited. You can help make this book even\nbetter.\n\n**Help make this book awesome!**\nIf you have some cool links or topics for the cookbook, please become a\ncontributor on GitHub: <https://github.com/andkret/Cookbook>. Fork the\nrepo, add them, and create a pull request. Or join the discussion by\nopening Issues. Tell me your thoughts, what you value,\nwhat you think should be included, or correct me where I am wrong.\nYou can also write me an email any time at\nplumbersofdatascience\\@gmail.com.\n\n**This Cookbook is and will always be free!**\n\n\n## If You Like This Book & Need More Help:\nCheck out my Data Engineering Academy at LearnDataEngineering.com\n\n**Visit learndataengineering.com:** [Click Here](https://learndataengineering.com)\n\n- Huge step-by-step Data Engineering Academy with over 30 courses\n- Unlimited access incl. 
future courses during subscription\n- Access to all courses and example projects in the Academy\n- Associate Data Engineer Certification\n- Data Engineering on AWS E-Commerce example project\n- Microsoft Azure example project\n- Document Streaming example project with Docker, FastAPI, Apache Kafka, Apache Spark, MongoDB and Streamlit\n- Time Series example project with InfluxDB and Grafana\n- Lifetime access to the private Discord Workspace\n- Course certificates\n- Currently over 54 hours of videos\n\n\n## Support This Book For Free!\n- **Amazon:** [Click Here](https://www.amazon.com/shop/plumbersofdatascience) buy whatever you like from Amazon using this link* (Also check out my complete podcast gear and books)\n\n\n## How To Contribute\nIf you have some cool links or topics for the cookbook, please become a contributor.\n\nSimply pull the repo, add your ideas and create a pull request.\nYou can also open an issue and put your thoughts there.\n\nPlease use the \"Issues\" function for comments.\n\n\n\nData Engineers\n-------------------------------\n\n\nData Engineers are the link between the management's data strategy\nand the data scientists or analysts that need to work with data.\n\nWhat they do is build the platforms that enable data scientists to do\ntheir magic.\n\nThese platforms are usually used in four different ways:\n\n-   Data ingestion and storage of large amounts of data.\n\n-   Algorithm creation by data scientists.\n\n-   Automation of the data scientist's machine learning models and\n    algorithms for production use.\n\n-   Data visualization for employees and customers.\n\nMost of the time, data engineers start as traditional solution architects\nfor systems that involve SQL databases, web servers, SAP\ninstallations and other \"standard\" systems.\n\nBut, to create big data platforms, the engineer needs to be an expert in\nspecifying, setting up, and maintaining big data technologies like:\nHadoop, Spark, HBase, Cassandra, MongoDB, 
Kafka, Redis, and more.\n\nWhat they also need is experience in deploying systems on cloud\ninfrastructure, like at Amazon or Google, or on on-premise hardware.\n\n\n| Podcast Episode: #048 From Wannabe Data Scientist To Engineer My Journey\n|------------------|\n|In this episode, Kate Strachnyi interviews me for her Humans of Data Science podcast. We talk about how I found out that I am more into the engineering part of data science.  \n| [Watch on YouTube](https://youtu.be/pIZkTuN5AMM) / [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/048-From-Wannabe-Data-Scientist-To-Engineer-My-Journey-e45i2o)|\n\n\n## My Data Science Platform Blueprint\n\nI have created a simple and modular big data platform\nblueprint. It is based on what I have seen in the field and\nread in tech blogs all over the internet.\n\nWhy do I believe it will be super useful to you? Because, unlike other blueprints, it is not focused on technology.\n\nFollowing my blueprint will allow you to create the big data platform\nthat fits your needs exactly. Building the perfect platform will allow\ndata scientists to discover new insights. It will enable you to perfectly handle big data and allow you to make\ndata-driven decisions.\n\nThe blueprint is focused on five key areas: Connect, Buffer, Processing Frameworks, Store, and Visualize.\n\n![Data Science Platform Blueprint](/images/Data-Science-Blueprint-New.jpg)\n\nHaving the platform split like this turns it into a modular platform with\nloosely coupled interfaces.\n\nWhy is it so important to have a modular platform?\n\nIf you have a platform that is not modular, you end up with something\nthat is fixed or hard to modify. This means you cannot adjust the\nplatform to the changing requirements of the company.\n\nBecause of modularity, it is possible to specifically select tools for your use case. 
It also allows you to replace every component, if you need to.\n\nNow, let's talk more about each key area.\n\n### Connect\n\nIngestion is all about getting the data in from the source and making it\navailable to later stages. Sources can be everything from tweets to server\nlogs, to IoT sensor data (e.g. from cars).\n\nSources send data to your API Services. The API is going to push the\ndata into temporary storage.\n\nThe temporary storage allows other stages simple and fast access to\nincoming data.\n\nA great solution is to use messaging queue systems like Apache Kafka,\nRabbitMQ or AWS Kinesis. Sometimes people also use caches like Redis for\nspecialised applications.\n\nA good practice is that the temporary storage follows the\npublish-subscribe pattern. This way APIs can publish messages and\nAnalytics can quickly consume them.\n\n### Buffer\n\nIn the buffer phase you have pub/sub systems like Apache Kafka, Redis, or other Cloud tools like Google Pub/Sub or AWS Kinesis.\n\nThese systems are more or less message queues.\nYou put something in on one side and take it out on the other.\n\nThe idea behind buffers is to have an intermediate system for the incoming data.\n\nHow this works is, for instance, you're getting data in from an API.\nThe API is publishing into the message queue. Data is buffered there until it is picked up by the processing.\n\nIf you don't have a buffer, you can run into problems when writing directly into a store or processing the data directly. You can always have peaks of incoming data that stall the systems.\n\nLike, it's lunch break and people are working with your app way more than usual.\nThere's more data coming in very fast, faster than the analytics or the storage can handle.\n\nIn this case, you would run into problems, because the whole system would stall. It would therefore take a long time to process the data, and your customers would be annoyed.\n\nWith a buffer, you buffer the incoming data. 
Processes for storage and analytics can take out only as much data as they can process. You are no longer in danger of overwhelming the systems.\n\nBuffers are also really good for building pipelines.\n\nYou take data out of Kafka, pre-process it, and put it back into Kafka.\nThen, with another analytics process, you take the processed data back out and put it into a store.\n\nTa-da! A pipeline.\n\n### Processing Framework\n\nThe analyse stage is where the actual analytics is done in\nthe form of stream and batch processing.\n\nStreaming data is taken from ingest and fed into analytics. Streaming\nanalyses the \"live\" data, thus generating fast results.\n\nAs the central and most important stage, analytics also has access to\nthe big data storage. Because of that connection, analytics can take a\nbig chunk of data and analyse it.\n\nThis type of analysis is called batch processing. It will deliver\nanswers to the big questions.\n\nFor a short video about batch and stream processing and their use cases, click on the link below:\n\n[Adding Batch to a Streaming Pipeline](https://www.youtube.com/watch?v=o-aGi3FmdfU)\n\nThe analytics process, batch or streaming, is not a one-way process.\nAnalytics can also write data back to the big data storage.\n\nOftentimes, writing data back to the storage makes sense. It allows you\nto combine previous analytics outputs with the raw data.\n\nAnalytics outputs give insights when you combine them with the\nraw data. This combination will often allow you to create even more\nuseful insights.\n\nA wide variety of analytics tools is available, ranging from MapReduce\nor AWS Elastic MapReduce to Apache Spark and AWS Lambda.\n\n### Store\n\nThis is the typical big-data storage where you just store everything. It\nenables you to analyse the big picture.\n\nMost of the data might seem useless for now, but it is of utmost\nimportance to keep it. 
Throwing data away is a big no-no.\n\nWhy not throw something away when it is useless?\n\nAlthough it seems useless for now, data scientists can work with the\ndata. They might find new ways to analyse the data and generate valuable\ninsights from it.\n\nWhat kind of systems can be used to store big data?\n\nSystems like Hadoop HDFS, HBase, Amazon S3 or DynamoDB are a perfect fit\nto store big data.\n\nCheck out my podcast on how to decide between SQL and NoSQL:\n<https://anchor.fm/andreaskayy/embed/episodes/NoSQL-Vs-SQL-How-To-Choose-e12f1o>\n\n### Visualize\n\nDisplaying data is as important as ingesting, storing, and analysing it.\nVisualizations enable business users to make data-driven decisions.\n\nThis is why it is important to have a good visual presentation of the\ndata. Sometimes you have a lot of different use cases or projects using\nthe platform.\n\nIt might not be possible to build the perfect UI that fits\neveryone's needs. What you should do in this case is enable others to build the\nperfect UI themselves.\n\nHow to do that? By creating APIs to access the data and making them\navailable to developers.\n\nEither way, UI or API, the trick is to give the display stage direct\naccess to the data in the big-data cluster. This kind of access will\nallow the developers to use analytics results as well as raw data to\nbuild the perfect application.\n\n\n## Who Companies Need\n\nFor a company, it is important to have well-trained data engineers.\n\nThat's why companies are looking for people with experience with tools in every part of the above platform blueprint. One common theme I see is cloud platform experience on AWS, Azure or GCP.\n\n## How to Learn Data Engineering\n\n### Interview with Andreas on the Super Data Science Podcast\n\n#### Summary\n\nThis interview with Andreas on Jon Krohn's Super Data Science podcast delves into the intricacies of data engineering, highlighting its critical role in the broader data science ecosystem. 
Andreas, calling from Northern Bavaria, Germany, shares his journey from data analyst to renowned data engineering educator through his Learn Data Engineering Academy. The conversation touches upon the foundational importance of data engineering in ensuring data quality, scalability, and accessibility for data scientists and analysts.\n\nAndreas emphasizes that the best data engineers often have a background in the company's domain/niche, which equips them with a deep understanding of the end user's needs. The discussion also explores the essential tools and skills required in the field, such as relational databases, APIs, ETL tools, data streaming with Kafka, and the significance of learning platforms like AWS, Azure, and GCP. Andreas highlights the evolving landscape of data engineering, with a nod towards the emergence of roles like analytics engineers and the increasing importance of automation and advanced data processing tools like Snowflake, Databricks, and dbt.\n\nThe interview is not just a technical deep dive but also a personal journey of discovery and passion for data engineering, underscoring the perpetual learning and adaptation required in the fast-evolving field of data science.\n\n| Watch or listen to this interview -> 657: How to Learn Data Engineering — with Andreas Kretz\n|------------------|\n| Was super fun talking with Jon about Data Engineering on the podcast. Think this will be very helpful for you :)\n| [Watch on YouTube](https://youtu.be/sbDFADS-zo8) / [Listen to the Podcast](https://www.superdatascience.com/podcast/how-to-learn-data-engineering)|\n\n#### Q&A Highlights\n\n**Q: What is data engineering, and why is it important?**\nA: Data engineering is the foundation of the data science process, focusing on collecting, cleaning, and managing data to make it accessible and usable for data scientists and analysts. 
It's crucial for automating data processes, ensuring data quality, and enabling scalable data analysis and machine learning models.\n\n**Q: How does one transition from data analysis to data engineering?**\nA: The transition involves gaining a deep understanding of data pipelines, learning to work with various data processing and management tools, and developing skills in programming languages and technologies relevant to data engineering, such as SQL, Python, and cloud platforms like AWS or Azure.\n\n**Q: What are the key skills and tools for a data engineer?**\nA: Essential skills include proficiency in SQL, experience with ETL tools, knowledge of programming languages like Python, and familiarity with cloud services and data processing frameworks like Apache Spark. Tools like Kafka for data streaming and platforms like Snowflake and Databricks are also becoming increasingly important.\n\n**Q: Can you elaborate on the emerging role of analytics engineers?**\nA: Analytics engineers focus on bridging the gap between raw data management and data analysis, working closely with data warehouses and using tools like dbt to prepare and model data for easy analysis. This role is pivotal in making data more accessible and actionable for decision-making processes.\n\n**Q: What advice would you give to someone aspiring to become a data engineer?**\nA: Start by mastering the basics of SQL and Python, then explore and gain experience with various data engineering tools and technologies. It's also important to understand the data science lifecycle and how data engineering fits within it. Continuous learning and staying updated with industry trends are key to success in this field.\n\n**Q: How does a data engineer's role evolve with experience?**\nA: A data engineer's journey typically starts with focusing on specific tasks or segments of data pipelines, using a limited set of tools. 
As they gain experience, they broaden their skill set, manage entire data pipelines, and take on more complex projects. Senior data engineers often lead teams, design data architectures, and collaborate closely with data scientists and business stakeholders to drive data-driven decisions.\n\n**Q: What distinguishes data engineering from machine learning engineering?**\nA: While both fields overlap, especially in the use of data, data engineering focuses on the infrastructure and processes for handling data, ensuring its quality and accessibility. Machine learning engineering, on the other hand, centers on deploying and maintaining machine learning models in production environments. A strong data engineering foundation is essential for effective machine learning engineering.\n\n**Q: Why might a data analyst transition to data engineering?**\nA: Data analysts may transition to data engineering to work on more technical aspects of data handling, such as building and maintaining data pipelines, automating data processes, and ensuring data scalability. This transition allows them to have a more significant impact on the data lifecycle and contribute to more strategic data initiatives within an organization.\n\n**Q: Can you share a challenging project you worked on as a data engineer?**\nA: One challenging project involved creating a scalable data pipeline for real-time processing of machine-generated data. The complexity lay in handling vast volumes of data, ensuring its quality, and integrating various data sources while maintaining high performance. 
This project highlighted the importance of selecting the right tools and technologies, such as Kafka for data streaming and Apache Spark for data processing, to meet the project's demands.\n\n**Q: How does the cloud influence data engineering?**\nA: Cloud platforms like AWS, Azure, and GCP have transformed data engineering by providing scalable, flexible, and cost-effective solutions for data storage, processing, and analysis. They offer a wide range of services and tools that data engineers can leverage to build robust data pipelines and infrastructure, facilitating easier access to advanced data processing capabilities and enabling more innovative data solutions.\n\n**Q: What future trends do you see in data engineering?**\nA: Future trends in data engineering include the increasing adoption of cloud-native services, the rise of real-time data processing and analytics, greater emphasis on data governance and security, and the continued growth of machine learning and AI-driven data processes. Additionally, tools and platforms that simplify data engineering tasks and enable more accessible data integration and analysis will become more prevalent, democratizing data across organizations.\n\n**Q: How does the background of a data analyst contribute to their success as a data engineer?**\nA: Data analysts have a unique advantage when transitioning to data engineering due to their understanding of data's end-use. Their experience in analyzing data gives them insights into what makes data valuable and usable, enabling them to design more effective and user-centric data pipelines and storage solutions.\n\n**Q: What role does automation play in data engineering?**\nA: Automation is crucial in data engineering for scaling data processes, reducing manual errors, and ensuring consistency in data handling. 
Automated data pipelines allow for real-time data processing and integration, making data more readily available for analysis and decision-making.\n\n**Q: Can you discuss the significance of cloud platforms in data engineering?**\nA: Cloud platforms like AWS, Azure, and GCP offer scalable, flexible, and cost-effective solutions for data storage, processing, and analysis. They provide data engineers with a suite of tools and services to build robust data pipelines, implement machine learning models, and manage large volumes of data efficiently.\n\n**Q: How does data engineering support data science and machine learning projects?**\nA: Data engineering lays the groundwork for data science and machine learning by preparing and managing the data infrastructure. It ensures that high-quality, relevant data is available for model training and analysis, thereby enabling more accurate predictions and insights.\n\n**Q: What emerging technologies or trends should data engineers be aware of?**\nA: Data engineers should keep an eye on the rise of machine learning operations (MLOps) for integrating machine learning models into production, the growing importance of real-time data processing and analytics, and the adoption of serverless computing for more efficient resource management. Additionally, technologies like containerization (e.g., Docker) and orchestration (e.g., Kubernetes) are becoming critical for deploying and managing scalable data applications.\n\n**Q: What challenges do data engineers face, and how can they be addressed?**\nA: Data engineers often grapple with data quality issues, integrating disparate data sources, and scaling data infrastructure to meet growing data volumes. 
Addressing these challenges requires a solid understanding of data architecture principles, continuous monitoring and testing of data pipelines, and adopting best practices for data governance and management.\n\n**Q: How important is collaboration between data engineers and other data professionals?**\nA: Collaboration is key in the data ecosystem. Data engineers need to work closely with data scientists, analysts, and business stakeholders to ensure that data pipelines are aligned with business needs and analytical goals. Effective communication and a shared understanding of data objectives are vital for the success of data-driven projects.\n\n\n### Building Blocks to Learn Data Engineering\n\nThe following roadmaps all hinge on the courses in my Data Engineering Academy. They are designed to help students who come from many different professions and enable them to build a customized curriculum.\n\nHere are all the courses available as of February 2024:\n\n**Colors:** Blue (The Basics), Green (Platform & Pipeline Fundamentals), Orange (Fundamental Tools), Red (Example Projects)\n\n![Building blocks of your curriculum](/images/All-Courses-at-Learn-Data-Engineering.jpg)\n\n\n### Roadmap for Beginners\n\nStart this roadmap at my Academy: [Start Today](https://learndataengineering.com/p/data-engineering-for-beginners)\n\n#### 11-Week Data Engineering Roadmap for Beginners & Graduates\n\n#### Master the Fundamentals and Build Your First Data Pipelines\n\n#### Starting in Data Engineering\n\nStarting in data engineering can feel overwhelming, especially if you’re coming from a non-technical background or have only limited experience with coding and databases.\n\nThis 11-week roadmap, with a time commitment of 5–10 hours per week, is designed to help you build strong foundations in data engineering, step by step, before moving into cloud platforms and more advanced pipelines. 
You’ll learn essential concepts, hands-on coding, data modeling, and cloud ETL development—everything you need to kickstart your career as a data engineer.\n\n---\n\n#### Why This Roadmap is for You\n\n- You’re just starting in data engineering and need a clear learning path\n- You want to build a strong foundation in data platforms, SQL, and Python\n- You need hands-on experience with data modeling, cloud ETL, and automation\n- You want to work on real-world projects that prepare you for a data engineering job\n\nBy the end of this roadmap, you’ll have the skills, tools, and project experience to confidently apply for entry-level data engineering roles and start your career in the field.\n\n![Building blocks of your curriculum](/images/Roadmap-For-Beginners.jpg)\n\n---\n\n#### What You’ll Achieve in This Roadmap\n\nThis roadmap is structured to help you understand the full data engineering workflow: from learning the fundamentals of data platforms and modeling to working with Python, SQL, and cloud-based ETL pipelines.\n\n#### Learning Goals\n\n| Goal        | Description                                         |\n| ----------- | --------------------------------------------------- |\n| **Goal #1** | Gain Experience in Data Platforms & Pipeline Design |\n| **Goal #2** | Work with Data Like a Data Engineer Using Python & SQL |\n| **Goal #3** | Learn Dimensional Data Modeling & Data Warehousing with Snowflake |\n| **Goal #4** | Gain Experience with ELT Using dbt & Orchestration with Airflow |\n| **Goal #5** | Build Your First ETL Pipeline on a Cloud Platform |\n\n---\n\n#### 11-Week Learning Roadmap\n\n| Week            | Topic                                     | Key Learning Outcomes                                                           |\n| --------------- | ----------------------------------------- | ------------------------------------------------------------------------------- |\n| **Week 1**      | Introduction & Platform & Pipeline Design | Understand 
data platforms, data pipelines, and the tools used in data engineering  |\n| **Week 2**      | Relational Data Modeling                  | Develop skills in creating relational data models for structured data           |\n| **Week 3 & 4**  | Python for Data Engineers                 | Learn Python for data processing, data manipulation, and pipeline development    |\n| **Week 5**      | Advanced SQL                              | Gain expertise in querying, storing, and manipulating data in relational databases |\n| **Week 6**      | Dimensional Data Modeling                 | Master the techniques of dimensional modeling for analytics and reporting       |\n| **Week 7**      | Snowflake Data Warehousing                | Learn how to use Snowflake as a cloud data warehouse                           |\n| **Week 8**      | Data Transformation with dbt              | Transform and model data efficiently using dbt                                 |\n| **Week 9**      | Data Pipeline Orchestration with Airflow  | Automate and manage data workflows using Apache Airflow                        |\n| **Week 10 & 11**| End-to-End Project on AWS, Azure, or GCP  | Complete an end-to-end project on a cloud platform of your choice              |\n\n---\n\n\n#### Week 1: Introduction & Platform & Pipeline Design\n\n##### 1. Learn the Basics of Platform & Pipeline Design\n\n##### Data Platform and Pipeline Design\n\n**Learn how to build data pipelines with templates and examples for Azure, GCP, and Hadoop**\n\n##### Description\n\nData pipelines are the backbone of any Data Science platform. They are essential for data ingestion, processing, and machine learning workflows. 
This training will help you understand how to create stream and batch processing pipelines as well as machine learning pipelines by going through the most essential basics, complemented by templates and examples for useful cloud computing platforms.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/data-pipeline-design)\n\n##### Detailed Course Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| **Platform & Pipeline Basics**  | The Platform Blueprint | 10:11 |\n| | Data Engineering Tools Guide | 2:44 |\n| | End-to-End Pipeline Example | 6:18 |\n| **Ingestion Pipelines** | Push Ingestion Pipelines | 3:42 |\n| | Pull Ingestion Pipelines | 3:34 |\n| **Pipeline Types** | Batch Pipelines | 3:07 |\n| | Streaming Pipelines | 3:34 |\n| **Visualization** | Stream Analytics | 2:26 |\n| | Visualization Pipelines | 3:47 |\n| | Visualization with Hive & Spark on Hadoop | 6:21 |\n| | Visualization Data via Spark Thrift Server | 3:27 |\n| **Platform Examples** | AWS, Azure, GCP (Currently Slides Only) | START |\n\n---\n\n##### 2. Get to Know the Different Data Stores\n\n##### Choosing Data Stores\n\n**Learn the different types of data stores and when to use each**\n\n##### Description\n\nChoosing data stores is one part of creating a data platform and its pipelines, and it is the focus of this training. You will learn about relational databases, NoSQL databases, data warehouses, and data lakes. The goal is to help you understand when to use each type of data store and how to incorporate it into your pipeline.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/choosing-data-stores)\n\n\n##### Detailed Course Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| | What are Data Stores? 
| 2:09 |\n| **Data Stores Basics** | OLTP vs OLAP | 7:34 |\n| | ETL vs ELT | 5:45 |\n| | Data Stores Ranking | 4:05 |\n| **Relational Databases** | How to Choose Data Stores | 8:11 |\n| | Relational Databases Concepts | 6:34 |\n| **NoSQL Databases** | NoSQL Basics | 10:39 |\n| | Document Stores | 5:56 |\n| | Time Series Databases | 5:00 |\n| | Search Engines | 4:18 |\n| | Wide Column Stores | 4:22 |\n| | Key Value Stores | 4:59 |\n| | Graph Databases | 1:05 |\n| **Data Warehouses & Data Lakes** | Data Warehouses | 5:32 |\n| | Data Lakes | 7:10 |\n\n---\n\n##### 3. See Data Modeling Examples for the Learned Data Stores\n\n##### Data Modeling 1\n\n**Learn how to design schemas for SQL, NoSQL, and Data Warehouses**\n\n##### Description\n\nSchema design is a critical skill for data engineers. This training covers schema design for different data stores using an e-commerce dataset. You will see examples of how the same dataset is modeled for relational databases, NoSQL stores, wide column stores, document stores, key-value stores, and data warehouses. This will help you understand how to create maintainable models and avoid data swamps.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/data-modeling)\n\n##### Detailed Course Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| | Why Data Modeling Is Important | 5:44 |\n| | A Good Dataset | 1:28 |\n| **Relational Databases** | Schema Design | 9:27 |\n| **Wide Column Stores** | Schema Design | 7:35 |\n| **Document Stores** | Schema Design | 7:28 |\n| **Key Value Stores** | Schema Design | 4:49 |\n| **Data Warehouses** | Schema Design | 4:44 |\n| **Data Modeling Workshop** | November 2024 | 101:49 |\n\n---\n\n\n#### Week 2: Relational Data Modeling\n\n##### Start with Relational Data Modeling\n\n**Relational data modeling** is an essential skill, as even in modern \"big data\" environments, relational databases are often used for managing and serving metadata. 
This week focuses on building a strong foundation in relational data modeling, which is crucial for structuring data effectively and optimizing query performance.\n\n##### Relational Data Modeling\n\n**Learn the most important basics to create a data model for OLTP data stores**\n\n###### Description\n\nThis course covers everything you need to know about relational data modeling—from understanding entities, attributes, and relationships to normalizing data models up to the third normal form (3NF). You will learn how to design conceptual, logical, and physical data models, implement primary and foreign keys, and ensure data quality through constraints and validations. Practical exercises include setting up a MySQL server with Docker and creating ER diagrams using MySQL Workbench.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/relational-data-modeling)\n\n##### Detailed Course Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| **Basics and Prepare the Environment** | Relational Data Models History | 3:16 |\n| | Installing MySQL Server and MySQL Workbench | 8:04 |\n| | MySQL Workbench Introduction | 4:36 |\n| **Create the Conceptual Data Model** | The Design Process Explained | 4:14 |\n| | Discover the Entities | 10:24 |\n| | Discover the Attributes | 13:09 |\n| | Define Entity Relationships and Normalize the Data | 11:19 |\n| **Defining and Resolving Relationships** | Identifying vs Non-Identifying Relationships | 2:01 |\n| | How to Resolve Many-to-Many Relationships | 4:00 |\n| | How to Resolve One-to-Many Relationships | 2:34 |\n| | How to Resolve One-to-One Relationships | 1:45 |\n| **Hands-On Workbench - Creating the Database** | Create Your ER Diagram Using Workbench | 19:46 |\n| | Create a Physical Data Model | 4:13 |\n| | Populate the MySQL DB with Data from .xls File | 15:13 |\n\n---\n\n\n#### Week 3 & 4: Python for Data Engineers\n\n##### Description\n\nThis course offers a comprehensive 
guide to using Python for data engineering tasks. You’ll learn advanced Python features, including data processing with Pandas, working with APIs, interacting with PostgreSQL databases, and handling data types like JSON. The course also covers important programming concepts like exception handling, modules, unit testing, and object-oriented programming—all within the context of data engineering.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/python-for-data-engineers)\n\n##### Detailed Course Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| **Advanced Python** | Classes | 4:37 |\n| | Modules | 3:06 |\n| | Exception Handling | 8:55 |\n| | Logging | 5:12 |\n| **Data Engineering** | Datetime | 8:04 |\n| | JSON | 9:54 |\n| | JSON Validation | 15:10 |\n| | UnitTesting | 16:44 |\n| | Pandas: Intro & Data Types | 8:43 |\n| | Pandas: Appending & Merging DataFrames | 7:49 |\n| | Pandas: Normalizing & Lambdas | 4:12 |\n| | Pandas: Pivot & Parquet Write, Read | 6:17 |\n| | Pandas: Melting & JSON Normalization | 8:15 |\n| | Numpy | 4:47 |\n| **Working with Data Sources/Sinks** | Requests (Working with APIs) | 11:15 |\n| | Working with Databases: Setup | 4:06 |\n| | Working with Databases: Tables, Bulk Load, Queries | 8:12 |\n\n---\n\n#### Week 5: SQL for Data Engineers\n\n##### Description\n\nSQL is the backbone of working with relational databases, and if you’re getting into Data Engineering, mastering SQL is a must. This course provides the essential SQL skills needed to work with databases effectively. 
You'll learn how to manage data, build efficient queries, and perform advanced operations to handle real-world data challenges.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/sql-for-data-engineers)\n\n##### Detailed Course Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| **Basics** | Database Management Systems & SQL | 3:49 |\n| | The Chinook Database | 3:03 |\n| | SQLite Installation | 7:02 |\n| | DBeaver Installation | 4:08 |\n| | Data Types in SQLite | 6:15 |\n| **Basic SQL** | DML & DDL | 15:06 |\n| | Select Statements | 6:03 |\n| | Grouping & Aggregation | 10:12 |\n| | Joins | 10:05 |\n| **Advanced SQL** | TCL (Transaction Control Language) | 6:42 |\n| | Common Table Expressions & Subqueries | 10:26 |\n| | Window Functions 1: Concept & Syntax | 5:00 |\n| | Window Functions 2: Aggregate Functions | 7:24 |\n| | Window Functions 3: Ranking Functions | 6:05 |\n| | Window Functions 4: Analytical Functions | 7:20 |\n| **Optimization** | Query Optimization | START |\n| | Indexing Best Practices | START |\n\n\n---\n\n#### Week 6: Dimensional Data Modeling\n\n##### Description\n\nDimensional data modeling is a crucial skill for data engineers working with analytics use cases where data needs to be structured efficiently for reporting and business insights. 
This course covers the basics of dimensional modeling, the medallion architecture, and how to create data models for OLAP data stores.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/data-modeling-3-dimensional-data-modeling)\n\n##### Detailed Course Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| | Data Warehousing Basics | 6:42 |\n| **Dimensional Modeling Basics** | Approaches to building a data warehouse | 5:20 |\n| | Dimension tables explained | 5:34 |\n| | Fact tables explained | 6:34 |\n| | Identifying dimensions | 3:16 |\n| **Data Warehouse Setup** | What is DuckDB | 5:58 |\n| | First DuckDB hands-on | 2:20 |\n| | Creating tables in DuckDB | 2:40 |\n| | Installing DBeaver | 6:49 |\n| **Working With The Data Warehouse** | Exploring SCD0 and SCD1 | 19:57 |\n| | Exploring SCD2 | 13:52 |\n| | Exploring transaction fact table | 6:28 |\n| | Exploring accumulating fact table | 7:17 |\n\n---\n\n#### Week 7: Snowflake for Data Engineers\n\n##### Description\n\nSnowflake is a highly popular cloud-based data warehouse that is ideal for beginners due to its simplicity and powerful features. In this course, you will learn how to set up Snowflake, load and process data, and create visualizations. 
The course covers both SQL and Python methods for managing data within Snowflake, and provides hands-on experience with connecting Snowflake to other tools such as PowerBI.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/snowflake-for-data-engineers)\n\n##### Detailed Course Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| **Introduction** | Snowflake basics | 4:16 |\n| | Data Warehousing basics | 4:13 |\n| | How Snowflake fits into data platforms | 3:14 |\n| **Setup** | Snowflake Account setup | 4:24 |\n| | Creating your warehouse & UI overview | 4:15 |\n| **Loading CSVs from your PC** | Our dataset & goals | 3:01 |\n| | Setup Snowflake database | 10:29 |\n| | Preparing the upload file | 8:31 |\n| | Using internal stages with SnowSQL | 12:37 |\n| | Splitting a data table into two tables | 6:38 |\n| **Visualizing Data** | Creating a visualization worksheet | 7:08 |\n| | Creating a dashboard | 5:23 |\n| | Connect PowerBI to Snowflake | 6:03 |\n| | Query data with Python | 7:35 |\n| **Automation** | Create import task | 9:18 |\n| | Create table refresh task | 3:40 |\n| | Test our pipeline | 3:14 |\n| **AWS S3 Integration** | Working with external stages for AWS S3 | 10:20 |\n| | Implementing snowpipe with S3 | 6:19 |\n\n---\n\n#### Week 8: dbt for Data Engineers\n\n##### Description\n\nThis course introduces dbt (Data Build Tool), a SQL-first transformation workflow that allows you to transform, test, and document data directly within your data warehouse. You will learn how to set up dbt, connect it with Snowflake, create data pipelines, and implement advanced features like CI/CD and documentation generation. 
This training is ideal for data engineers looking to build trusted datasets for reporting, machine learning, and operational workflows.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/dbt-for-data-engineers)\n\n##### Detailed Course Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| **dbt Introduction & Setup** | Modern data experience | 5:42 |\n| | Introduction to dbt | 4:38 |\n| | Goals of this course | 4:50 |\n| | Snowflake preparation | 7:29 |\n| | Loading data into Snowflake | 4:48 |\n| | Setup dbt Core | 9:35 |\n| | Preparing the GitHub repository | 3:32 |\n| **Working with dbt-Core** | dbt models & materialization explained | 6:16 |\n| | Creating your first SQL model | 5:48 |\n| | Working with custom schemas | 5:28 |\n| | Creating your first Python model | 4:35 |\n| | dbt sources | 1:55 |\n| | Configuring sources | 4:03 |\n| | Working with seed files | 4:20 |\n| **Tests in dbt** | Generic tests | 3:19 |\n| | Tests with Great Expectations | 3:25 |\n| | Writing custom generic tests | 2:49 |\n| **Working with dbt-Cloud** | dbt cloud setup | 7:25 |\n| | Creating dbt jobs | 5:14 |\n| | CI/CD automation with dbt cloud and GitHub | 10:52 |\n| | Documentation in dbt | 7:38 |\n\n---\n\n#### Week 9: Apache Airflow Workflow Orchestration\n\n##### Description\n\nAirflow is a platform-independent workflow orchestration tool that offers many possibilities to create and monitor stream and batch pipeline processes. It supports complex, multi-stage processes across major platforms and tools in the data engineering world, such as AWS or Google Cloud. 
Airflow is not only great for planning and organizing your processes but also provides robust monitoring capabilities, allowing you to keep track of data workflows and troubleshoot effectively.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/learn-apache-airflow)\n\n##### Detailed Course Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| **Airflow Workflow Orchestration** | Airflow Usage | 3:19 |\n| **Airflow Fundamental Concepts** | Fundamental Concepts | 2:47 |\n| | Airflow Architecture | 3:09 |\n| | Example Pipelines | 4:49 |\n| | Spotlight 3rd Party Operators | 2:17 |\n| | Airflow XComs | 4:32 |\n| **Hands-On Setup** | Project Setup | 1:43 |\n| | Docker Setup Explained | 2:06 |\n| | Docker Compose & Starting Containers | 4:23 |\n| | Checking Services | 1:48 |\n| | Setup WeatherAPI | 1:33 |\n| | Setup Postgres DB | 1:58 |\n| **Learn Creating DAGs** | Airflow Webinterface | 4:37 |\n| | Creating DAG With Airflow 2.0 | 9:46 |\n| | Running our DAG | 4:15 |\n| | Creating DAG With TaskflowAPI | 6:59 |\n| | Getting Data From the API With SimpleHTTPOperator | 3:38 |\n| | Writing into Postgres | 4:12 |\n| | Parallel Processing | 4:15 |\n\n---\n\n\n#### Week 10 & 11: End-to-End Project on AWS, Azure, or GCP\n\n##### Important: Choose One Project\nParticipants need to select **one** of the following cloud platforms to complete their end-to-end data engineering project. It is not necessary to complete all three projects.\n\n##### AWS Project Introduction\n\nThe AWS project is designed for those who want to get started with cloud platforms, particularly with Amazon Web Services, the leading platform in data processing. This project will guide you through setting up an end-to-end data engineering pipeline using AWS tools like Lambda, API Gateway, Glue, Redshift, Kinesis, and DynamoDB. 
You will work with an e-commerce dataset, learn data modeling, and implement both stream and batch processing pipelines.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/data-engineering-on-aws)\n\n##### Detailed AWS Project Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| | Data Engineering | 4:15 |\n| | Data Science Platform | 5:20 |\n| **The Dataset** | Data Types You Encounter | 3:03 |\n| | What Is A Good Dataset | 2:54 |\n| | The Dataset We Use | 3:16 |\n| | Defining The Purpose | 6:27 |\n| | Relational Storage Possibilities | 3:46 |\n| | NoSQL Storage Possibilities | 6:28 |\n| **Platform Design** | Selecting The Tools | 3:49 |\n| | Client | 3:05 |\n| | Connect | 1:18 |\n| | Buffer | 1:28 |\n| | Process | 2:42 |\n| | Store | 3:41 |\n| | Visualize | 3:00 |\n| **Data Pipelines** | Data Ingestion Pipeline | 3:00 |\n| | Stream To Raw Storage Pipeline | 2:19 |\n| | Stream To DynamoDB Pipeline | 3:09 |\n| | Visualization API Pipeline | 2:56 |\n| | Visualization Redshift Data Warehouse Pipeline | 5:29 |\n| | Batch Processing Pipeline | 3:19 |\n| **AWS Basics** | Create An AWS Account | 1:58 |\n| | Things To Keep In Mind | 2:45 |\n| | IAM Identity & Access Management | 4:06 |\n| | Logging | 2:22 |\n| | AWS Python API Boto3 | 2:57 |\n| **Data Ingestion Pipeline** | Development Environment | 4:02 |\n| | Create Lambda for API | 2:33 |\n| | Create API Gateway | 8:30 |\n| | Setup Kinesis | 1:38 |\n| | Setup IAM for API | 5:00 |\n| | Create Ingestion Pipeline (Code) | 6:09 |\n| | Create Script to Send Data | 5:46 |\n| | Test The Pipeline | 4:53 |\n| **Stream To Raw S3 Storage Pipeline** | Setup S3 Bucket | 3:42 |\n| | Configure IAM For S3 | 3:21 |\n| | Create Lambda For S3 Insert | 7:16 |\n| | Test The Pipeline | 4:01 |\n| **Stream To DynamoDB Pipeline** | Setup DynamoDB | 9:00 |\n| | Setup IAM For DynamoDB Stream | 3:36 |\n| | Create DynamoDB Lambda | 9:20 |\n| **Visualization API** | Create API & Lambda 
For Access | 6:10 |\n| | Test The API | 4:47 |\n| **Visualization Pipeline Redshift Data Warehouse** | Setup Redshift Data Warehouse | 8:08 |\n| | Security Group For Firehose | 3:12 |\n| | Create Redshift Tables | 5:51 |\n| | S3 Bucket & jsonpaths.json | 3:02 |\n| | Configure Firehose | 7:58 |\n| | Debug Redshift Streaming | 7:43 |\n| | Bug-fixing | 5:58 |\n| | Power BI | 12:16 |\n| **Batch Processing Pipeline** | AWS Glue Basics | 5:14 |\n| | Glue Crawlers | 13:09 |\n| | Glue Jobs | 13:43 |\n| | Redshift Insert & Debugging | 7:16 |\n\n---\n\n\n##### Azure Project Introduction\n\nThe Azure project is designed for those who want to build a streaming data pipeline using Microsoft Azure's robust cloud platform. This project introduces you to Azure services such as APIM, Blob Storage, Azure Functions, Cosmos DB, and Power BI. You will gain practical experience by building a pipeline that ingests, processes, stores, and visualizes data, using Python and Visual Studio Code.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/build-streaming-data-pipelines-in-azure)\n\n##### Detailed Azure Project Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| **Project Introduction** | Data Engineering in Azure - Streaming Data Pipelines | 2:43 |\n| **Datasets and Local Preprocessing** | Introduction to Datasets and Local Preprocessing | 7:06 |\n| | Deploying Code on Visual Studio to Docker Containers | 5:27 |\n| **Azure Functions and Blob Storage** | Develop Azure Functions via Python and VS Code | 5:52 |\n| | Deploy Azure Function to Azure Function App and Test It | 6:26 |\n| | Integrate Azure Function with Blob Storage via Bindings | 4:58 |\n| **Add Azure Function to Azure API Management (APIM)** | Expose Azure Function as a Backend | 7:05 |\n| | Securely Store Secrets in Azure Key Vault | 4:41 |\n| | Add Basic Authentication in API Management | 4:35 |\n| | Test APIM and Imported Azure Function via Local Python 
Program | 2:34 |\n| **Create and Combine Event Hubs, Azure Function, and Cosmos DB** | Create Event Hubs and Test Capture Events Feature | 6:59 |\n| | Modify Existing Azure Function to Include Event Hubs Binding | 6:42 |\n| **Write Tweets to Cosmos DB (Core SQL) from Event Hub** | Create a Cosmos DB (Core SQL) | 9:03 |\n| | Create a New Azure Function that Writes Messages to Cosmos DB | 9:03 |\n| **Connect Power BI Desktop to Your Cosmos DB** | Connect Power BI Desktop via Connector and Create a Dashboard | 6:32 |\n\n---\n\n##### GCP Project Introduction\n\nThe GCP project is designed for those who want to learn how to build, manage, and optimize data pipelines on Google Cloud Platform. This project focuses on building an end-to-end pipeline that extracts data from an external weather API, processes it through GCP's data tools, and visualizes the results using Looker Studio. This project offers practical, hands-on experience with tools like Cloud SQL, Compute Engine, Cloud Functions, Pub/Sub, and Looker Studio.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/data-engineering-on-gcp)\n\n##### Detailed GCP Project Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| **Introduction** | Introduction | 1:13 |\n| | GitHub & the Team | 1:30 |\n| **Data & Goals** | Architecture of the Project | 3:19 |\n| | Introduction to Weather API | 2:18 |\n| | Setup Google Cloud Account | 2:12 |\n| **Project Setup** | Creating the Project | 2:35 |\n| | Enabling the Required APIs | 1:34 |\n| | Configure Scheduling | 2:20 |\n| **Pipeline Creation - Extract from API** | Setup VM for Database Interaction | 2:53 |\n| | Setup MySQL Database | 2:16 |\n| | Setup VM Client and Create Database | 2:46 |\n| | Creating Pub/Sub Message Queue | 1:41 |\n| | Create Cloud Function to Pull Data from API | 4:17 |\n| | Explanation of Code to Pull from API | 4:20 |\n| **Pipeline Creation - Write to Database** | Create Function to Write to 
Database | 7:47 |\n| | Explanation of Code to Write Data to Database | 5:56 |\n| | Testing the Function | 5:51 |\n| | Create Function Write Data to DB - Pull | 3:53 |\n| | Explanation Code Write Data to DB - Pull | 4:33 |\n| **Visualization** | Setup Looker Studio and Create Bubble Chart | 2:20 |\n| | Setup Looker Studio and Create Time Series Chart | 1:57 |\n| | Pipeline Monitoring | 6:20 |\n\n---\n\n\n##### What’s Next?\n\nAfter completing this roadmap, you’ll have the confidence and skills to not just analyze data but to engineer and optimize it like a pro! Explore advanced topics, start contributing to projects, and showcase your new skills to potential employers.\n\n\n\n### Roadmap for Data Analysts\n\nStart this roadmap at my Academy: [Start Today](https://learndataengineering.com/p/data-engineering-for-data-analysts)\n\n#### Go Beyond SQL and Learn How to Build, Automate, and Optimize Data Pipelines Like an Engineer\n\n#### Who Is This 10-Week Roadmap For?\n\n- Data Analysts who want to understand the full data lifecycle\n- Those looking to move beyond SQL and build real data pipelines\n- Professionals seeking hands-on, practical experience to boost their resumes\n- Anyone wanting to stay competitive in the job market\n\n#### What You’ll Achieve\n\nThis roadmap provides a step-by-step approach to mastering data engineering skills. 
You'll start with Python and data modeling, move on to building pipelines, work with cloud platforms, and finally automate workflows using industry-standard tools.\n\n\n![Building blocks of your curriculum](/images/Roadmap-From-Data-Analyst-to-Engineer.jpg)\n\n---\n\n#### Learning Goals\n\n| Goal        | Description                                         |\n| ----------- | --------------------------------------------------- |\n| **Goal #1** | Master Python & Relational Data Modeling            |\n| **Goal #2** | Build Your First ETL Pipeline on AWS (or Azure/GCP) |\n| **Goal #3** | Gain Hands-On Experience with Snowflake & dbt       |\n| **Goal #4** | Connect AWS and Snowflake                           |\n| **Goal #5** | Automate Your Data Pipeline with Airflow            |\n\n---\n\n#### 10-Week Learning Roadmap\n\n| Week            | Topic                                     | Key Learning Outcomes                                                           |\n| --------------- | ----------------------------------------- | ------------------------------------------------------------------------------- |\n| **Week 1**      | Introduction to Data Engineering & Python | Understand core concepts of data engineering and Python programming basics      |\n| **Week 2**      | Platform & Pipeline Design                | Learn how to design effective data platforms and pipelines                      |\n| **Week 3**      | Relational Data Modeling                  | Develop skills in creating relational data models for structured data           |\n| **Week 4**      | Dimensional Data Modeling                 | Master the techniques of dimensional modeling for analytics and reporting       |\n| **Week 5**      | Docker Fundamentals & APIs                | Get hands-on with containerization using Docker and working with APIs           |\n| **Week 6 & 7**  | End-to-End Project on AWS, Azure, or GCP  | Complete an end-to-end project on a cloud platform of your choice               |\n| **Week 8**      | Working with Snowflake                    | Gain practical experience using Snowflake as a data warehouse                
   |\n| **Week 9**      | Transforming Data With dbt                | Learn to transform and model data efficiently using dbt                         |\n| **Week 10**     | Pipeline Orchestration with Airflow       | Automate and manage data workflows using Apache Airflow                         |\n\n---\n\n#### Detailed Weekly Content\n\n#### Week 1: Introduction to Data Engineering & Python\n\nIf you want to take your data engineering skills to the next level, you are in the right place. Python has become the go-to language for data analysis and machine learning, and with our training, you will learn how to successfully use Python to build robust data pipelines and manipulate data efficiently.\n\nThis comprehensive training program is designed for data engineers of all levels. Whether you are just starting out in data engineering or you are an experienced engineer looking to expand your skill set, our Python for Data Engineers training will give you the tools you need to excel in your field.\n\nAt the end of the training, you will have a strong foundation in Python and data engineering and be ready to tackle complex data engineering projects with ease.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/python-for-data-engineers)\n\n##### Course Curriculum\n\n| Lesson | Duration |\n|--------|----------|\n| Classes | 4:37 |\n| Modules | 3:06 |\n| Exception Handling | 8:55 |\n| Logging | 5:12 |\n| Datetime | 8:04 |\n| JSON | 9:54 |\n| JSON Validation | 15:10 |\n| UnitTesting | 16:44 |\n| Pandas: Intro & data types | 8:43 |\n| Pandas: Appending & Merging DataFrames | 7:49 |\n| Pandas: Normalizing & Lambdas | 4:12 |\n| Pandas: Pivot & Parquet write, read | 6:17 |\n| Pandas: Melting & JSON normalization | 8:15 |\n| Numpy | 4:47 |\n| Requests (Working with APIs) | 11:15 |\n| Working with Databases: Setup | 4:06 |\n| Working with Databases: Tables, bulk load, queries | 8:12 |\n\n---\n\n#### Week 2: Platform & Pipeline Design\n\n##### 
Description\nData pipelines are the most important part of a Data Science platform. Without them, data ingestion or machine learning processing, for example, would not be possible.\n\nThis 110-minute training will help you understand how to create stream and batch processing pipelines as well as machine learning pipelines by going through some of the most essential basics, complemented by templates and examples for useful cloud computing platforms.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/data-pipeline-design)\n\n##### Course Curriculum\n\n| Lesson | Duration |\n|--------|----------|\n| Platform Blueprint & End to End Pipeline Example | 10:11 |\n| Data Engineering Tools Guide | 2:44 |\n| End to End Pipeline Example | 6:18 |\n| Push Ingestion Pipelines | 3:42 |\n| Pull Ingestion Pipelines | 3:34 |\n| Batch Pipelines | 3:07 |\n| Streaming Pipelines | 3:34 |\n| Stream Analytics | 2:26 |\n| Lambda Architecture | 4:02 |\n| Visualization Pipelines | 3:47 |\n| Visualization with Hive & Spark on Hadoop | 6:21 |\n| Visualization Data via Spark Thrift Server | 3:27 |\n\n---\n\n\n#### Week 3: Relational Data Modeling\n\n##### Description\nRelational modeling is often used for building transactional databases. You might say, 'But I'm not planning to become a back-end engineer'. 
However, apart from knowing how to move data, you should also know how to store it effectively, which involves designing a scalable data model optimized for fast query responses and efficient data retrieval.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/relational-data-modeling)\n\n##### Course Curriculum\n\n| Lesson | Duration |\n|--------|----------|\n| Relational Data Models History | 3:16 |\n| Installing MySQL Server and MySQL Workbench | 8:04 |\n| MySQL Workbench Introduction | 4:36 |\n| The Design Process Explained | 4:14 |\n| Discover the Entities | 10:24 |\n| Discover the Attributes | 13:09 |\n| Define Entity Relationships and Normalize the Data | 11:19 |\n| Identifying vs Non-identifying Relationships | 2:01 |\n| Resolve Many-to-Many Relationships | 4:00 |\n| Resolve One-to-Many Relationships | 2:34 |\n| Resolve One-to-One Relationships | 1:45 |\n| Create ER Diagram Using Workbench | 19:46 |\n| Create a Physical Data Model | 4:13 |\n| Populate MySQL DB with Data from .xls File | 15:13 |\n| Course Conclusion | 1:28 |\n\n---\n\n#### Week 4: Dimensional Data Modeling\n\n##### Description\nIn today’s data-driven world, efficient data organization is key to enabling insightful analysis and reporting. 
Dimensional data modeling is a crucial technique that helps structure your data for faster querying and better decision-making.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/data-modeling-3-dimensional-data-modeling)\n\n##### Course Curriculum\n\n| Lesson | Duration |\n|--------|----------|\n| Intro to Data Warehousing | 6:42 |\n| Approaches to Building a Data Warehouse | 5:20 |\n| Dimension Tables Explained | 5:34 |\n| Fact Tables Explained | 6:34 |\n| Identifying Dimensions | 3:16 |\n| What is DuckDB | 5:58 |\n| First DuckDB Hands-on | 2:20 |\n| Creating Tables in DuckDB | 2:40 |\n| Installing DBeaver | 6:49 |\n| Exploring SCD0 and SCD1 | 19:57 |\n| Exploring SCD2 | 13:52 |\n| Exploring Transaction Fact Table | 6:28 |\n| Exploring Accumulating Fact Table | 7:17 |\n| Course Conclusion | 0:52 |\n\n---\n\n#### Week 5: Docker Fundamentals & APIs\n\n##### Description\nWeek 5 covers two crucial topics: containerization using Docker and building APIs with FastAPI. Docker is essential for creating lightweight, self-sustained containers, while APIs are the backbone of data platforms.\n\nCheck out Docker Fundamentals in my Academy: [Learn More](https://learndataengineering.com/p/docker-fundamentals)\n\nCheck out Building APIs with FastAPI in my Academy: [Learn More](https://learndataengineering.com/p/apis-with-fastapi-course)\n\n##### Course Curriculum\n\n##### Docker Fundamentals\n\n| Lesson | Duration |\n|--------|----------|\n| Docker vs Virtual Machines | 6:23 |\n| Docker Terminology | 5:56 |\n| Installing Docker Desktop | 4:09 |\n| Pulling Images & Running Containers | 6:34 |\n| Docker Compose | 6:34 |\n| Build & Run Simple Image | 6:28 |\n| Build Image with Dependencies | 5:05 |\n| Using DockerHub Image Registry | 4:24 |\n| Image Layers & Security Best Practices | 7:55 |\n| Managing Docker with Portainer | 4:04 |\n\n##### Building APIs with FastAPI\n\n| Lesson | Duration |\n|--------|----------|\n| What are APIs? 
| 8:29 |\n| Hosting vs Using APIs | 4:08 |\n| HTTP Methods & Media Types | 6:56 |\n| API Parameters & Response Codes | 9:40 |\n| Setting up FastAPI | 4:55 |\n| Creating APIs: POST, GET, PUT | 16:18 |\n| Testing APIs with Postman | 4:22 |\n| Deploying FastAPI with Docker | 6:01 |\n| API Security Best Practices | 3:48 |\n\n---\n\n\n#### Week 6 & 7: End-to-End Project on AWS, Azure, or GCP\n\n##### Important: Choose One Project\nParticipants need to select **one** of the following cloud platforms to complete their end-to-end data engineering project. It is not necessary to complete all three projects.\n\n##### AWS Project Introduction\n\nThe AWS project is designed for those who want to get started with cloud platforms, particularly with Amazon Web Services, the leading platform in data processing. This project will guide you through setting up an end-to-end data engineering pipeline using AWS tools like Lambda, API Gateway, Glue, Redshift, Kinesis, and DynamoDB. You will work with an e-commerce dataset, learn data modeling, and implement both stream and batch processing pipelines.\n\nCheck out this project in my Academy: [Learn More](https://learndataengineering.com/p/data-engineering-on-aws)\n\n##### Detailed AWS Project Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| | Data Engineering | 4:15 |\n| | Data Science Platform | 5:20 |\n| **The Dataset** | Data Types You Encounter | 3:03 |\n| | What Is A Good Dataset | 2:54 |\n| | The Dataset We Use | 3:16 |\n| | Defining The Purpose | 6:27 |\n| | Relational Storage Possibilities | 3:46 |\n| | NoSQL Storage Possibilities | 6:28 |\n| **Platform Design** | Selecting The Tools | 3:49 |\n| | Client | 3:05 |\n| | Connect | 1:18 |\n| | Buffer | 1:28 |\n| | Process | 2:42 |\n| | Store | 3:41 |\n| | Visualize | 3:00 |\n| **Data Pipelines** | Data Ingestion Pipeline | 3:00 |\n| | Stream To Raw Storage Pipeline | 2:19 |\n| | Stream To DynamoDB Pipeline | 3:09 |\n| | Visualization API Pipeline | 2:56 
|\n| | Visualization Redshift Data Warehouse Pipeline | 5:29 |\n| | Batch Processing Pipeline | 3:19 |\n| **AWS Basics** | Create An AWS Account | 1:58 |\n| | Things To Keep In Mind | 2:45 |\n| | IAM Identity & Access Management | 4:06 |\n| | Logging | 2:22 |\n| | AWS Python API Boto3 | 2:57 |\n| **Data Ingestion Pipeline** | Development Environment | 4:02 |\n| | Create Lambda for API | 2:33 |\n| | Create API Gateway | 8:30 |\n| | Setup Kinesis | 1:38 |\n| | Setup IAM for API | 5:00 |\n| | Create Ingestion Pipeline (Code) | 6:09 |\n| | Create Script to Send Data | 5:46 |\n| | Test The Pipeline | 4:53 |\n| **Stream To Raw S3 Storage Pipeline** | Setup S3 Bucket | 3:42 |\n| | Configure IAM For S3 | 3:21 |\n| | Create Lambda For S3 Insert | 7:16 |\n| | Test The Pipeline | 4:01 |\n| **Stream To DynamoDB Pipeline** | Setup DynamoDB | 9:00 |\n| | Setup IAM For DynamoDB Stream | 3:36 |\n| | Create DynamoDB Lambda | 9:20 |\n| **Visualization API** | Create API & Lambda For Access | 6:10 |\n| | Test The API | 4:47 |\n| **Visualization Pipeline Redshift Data Warehouse** | Setup Redshift Data Warehouse | 8:08 |\n| | Security Group For Firehose | 3:12 |\n| | Create Redshift Tables | 5:51 |\n| | S3 Bucket & jsonpaths.json | 3:02 |\n| | Configure Firehose | 7:58 |\n| | Debug Redshift Streaming | 7:43 |\n| | Bug-fixing | 5:58 |\n| | Power BI | 12:16 |\n| **Batch Processing Pipeline** | AWS Glue Basics | 5:14 |\n| | Glue Crawlers | 13:09 |\n| | Glue Jobs | 13:43 |\n| | Redshift Insert & Debugging | 7:16 |\n\n---\n\n\n##### Azure Project Introduction\n\nThe Azure project is designed for those who want to build a streaming data pipeline using Microsoft Azure's robust cloud platform. This project introduces you to Azure services such as APIM, Blob Storage, Azure Functions, Cosmos DB, and Power BI. 
You will gain practical experience by building a pipeline that ingests, processes, stores, and visualizes data, using Python and Visual Studio Code.\n\nCheck out this project in my Academy: [Learn More](https://learndataengineering.com/p/build-streaming-data-pipelines-in-azure)\n\n##### Detailed Azure Project Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| **Project Introduction** | Data Engineering in Azure - Streaming Data Pipelines | 2:43 |\n| **Datasets and Local Preprocessing** | Introduction to Datasets and Local Preprocessing | 7:06 |\n| | Deploying Code on Visual Studio to Docker Containers | 5:27 |\n| **Azure Functions and Blob Storage** | Develop Azure Functions via Python and VS Code | 5:52 |\n| | Deploy Azure Function to Azure Function App and Test It | 6:26 |\n| | Integrate Azure Function with Blob Storage via Bindings | 4:58 |\n| **Add Azure Function to Azure API Management (APIM)** | Expose Azure Function as a Backend | 7:05 |\n| | Securely Store Secrets in Azure Key Vault | 4:41 |\n| | Add Basic Authentication in API Management | 4:35 |\n| | Test APIM and Imported Azure Function via Local Python Program | 2:34 |\n| **Create and Combine Event Hubs, Azure Function, and Cosmos DB** | Create Event Hubs and Test Capture Events Feature | 6:59 |\n| | Modify Existing Azure Function to Include Event Hubs Binding | 6:42 |\n| **Write Tweets to Cosmos DB (Core SQL) from Event Hub** | Create a Cosmos DB (Core SQL) | 9:03 |\n| | Create a New Azure Function that Writes Messages to Cosmos DB | 9:03 |\n| **Connect Power BI Desktop to Your Cosmos DB** | Connect Power BI Desktop via Connector and Create a Dashboard | 6:32 |\n\n---\n\n##### GCP Project Introduction\n\nThe GCP project is designed for those who want to learn how to build, manage, and optimize data pipelines on Google Cloud Platform. 
This project focuses on building an end-to-end pipeline that extracts data from an external weather API, processes it through GCP's data tools, and visualizes the results using Looker Studio. This project offers practical, hands-on experience with tools like Cloud SQL, Compute Engine, Cloud Functions, Pub/Sub, and Looker Studio.\n\nCheck out this project in my Academy: [Learn More](https://learndataengineering.com/p/data-engineering-on-gcp)\n\n##### Detailed GCP Project Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| **Introduction** | Introduction | 1:13 |\n| | GitHub & the Team | 1:30 |\n| **Data & Goals** | Architecture of the Project | 3:19 |\n| | Introduction to Weather API | 2:18 |\n| | Setup Google Cloud Account | 2:12 |\n| **Project Setup** | Creating the Project | 2:35 |\n| | Enabling the Required APIs | 1:34 |\n| | Configure Scheduling | 2:20 |\n| **Pipeline Creation - Extract from API** | Setup VM for Database Interaction | 2:53 |\n| | Setup MySQL Database | 2:16 |\n| | Setup VM Client and Create Database | 2:46 |\n| | Creating Pub/Sub Message Queue | 1:41 |\n| | Create Cloud Function to Pull Data from API | 4:17 |\n| | Explanation of Code to Pull from API | 4:20 |\n| **Pipeline Creation - Write to Database** | Create Function to Write to Database | 7:47 |\n| | Explanation of Code to Write Data to Database | 5:56 |\n| | Testing the Function | 5:51 |\n| | Create Function Write Data to DB - Pull | 3:53 |\n| | Explanation Code Write Data to DB - Pull | 4:33 |\n| **Visualization** | Setup Looker Studio and Create Bubble Chart | 2:20 |\n| | Setup Looker Studio and Create Time Series Chart | 1:57 |\n| | Pipeline Monitoring | 6:20 |\n\n---\n\n\n#### Week 8: Working with Snowflake\n\n##### Description\n\nCurrently, Snowflake is the analytics store/data warehouse everybody is talking about. It is a 100% cloud-based platform that offers many advantages, including flexible data access and the ability to scale services as needed. 
Snowflake is widely used in the industry, and learning it will enhance your data engineering skill set.\n\nThis training covers everything from the basics of Snowflake and data warehousing to advanced integration and automation techniques. By the end, you will know how to prepare, integrate, and manage data on Snowflake and how to connect other systems to the platform.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/snowflake-for-data-engineers)\n\n##### Course Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| | Snowflake Basics | 4:16 |\n| | Data Warehousing Basics | 4:13 |\n| | How Snowflake Fits into Data Platforms | 3:14 |\n| **Setup** | Snowflake Account Setup | 4:24 |\n| | Creating Your Warehouse & UI Overview | 4:15 |\n| **Loading CSVs from Your PC** | Our Dataset & Goals | 3:01 |\n| | Setup Snowflake Database | 10:29 |\n| | Preparing the Upload File | 8:31 |\n| | Using Internal Stages with SnowSQL | 12:37 |\n| | Splitting a Data Table into Two Tables | 6:38 |\n| **Visualizing Data** | Creating a Visualization Worksheet | 7:08 |\n| | Creating a Dashboard | 5:23 |\n| | Connect PowerBI to Snowflake | 6:03 |\n| | Query Data with Python | 7:35 |\n| **Automation** | Create Import Task | 9:18 |\n| | Create Table Refresh Task | 3:40 |\n| | Test Our Pipeline | 3:14 |\n| **AWS S3 Integration** | Working with External Stages for AWS S3 | 10:20 |\n| | Implementing Snowpipe with S3 | 6:19 |\n\n---\n\n#### Week 9: Transforming Data With dbt\n\n##### Description\n\ndbt is a SQL-first transformation workflow that simplifies the process of transforming, testing, and documenting data. It allows teams to work directly within the data warehouse, creating trusted datasets for reporting, machine learning, and operational workflows. 
This training is the perfect starting point to get hands-on experience with dbt Core, dbt Cloud, and Snowflake.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/dbt-for-data-engineers)\n\n##### Course Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| **dbt Introduction & Setup** | Modern Data Experience | 5:42 |\n| | Introduction to dbt | 4:38 |\n| | Goals of this Course | 4:50 |\n| | Snowflake Preparation | 7:29 |\n| | Loading Data into Snowflake | 4:48 |\n| | Setup dbt Core | 9:35 |\n| | Preparing the GitHub Repository | 3:32 |\n| **Working with dbt-Core** | dbt Models & Materialization Explained | 6:16 |\n| | Creating Your First SQL Model | 5:48 |\n| | Working with Custom Schemas | 5:28 |\n| | Creating Your First Python Model | 4:35 |\n| | dbt Sources | 1:55 |\n| | Configuring Sources | 4:03 |\n| | Working with Seed Files | 4:20 |\n| **Tests in dbt** | Generic Tests | 3:19 |\n| | Tests with Great Expectations | 3:25 |\n| | Writing Custom Generic Tests | 2:49 |\n| **Working with dbt-Cloud** | dbt Cloud Setup | 7:25 |\n| | Creating dbt Jobs | 5:14 |\n| | CI/CD Automation with dbt Cloud and GitHub | 10:52 |\n| | Documentation in dbt | 7:38 |\n\n---\n\n#### Week 10: Pipeline Orchestration with Airflow\n\n##### Description\n\nApache Airflow is a powerful, platform-independent workflow orchestration tool widely used in the data engineering world. It allows you to create and monitor both stream and batch pipeline processes with ease. Airflow supports integration with major platforms and tools such as AWS, Google Cloud, and many more.\n\nAirflow not only helps in planning and organizing workflows but also offers robust monitoring features, allowing you to troubleshoot and maintain complex ETL pipelines effectively. 
As one of the most popular tools for workflow orchestration, mastering Airflow is highly valuable for data engineers.\n\nCheck out this course in my Academy: [Learn More](https://learndataengineering.com/p/learn-apache-airflow)\n\n##### Course Curriculum\n\n| Module | Lesson | Duration |\n|--------|--------|----------|\n| **Airflow Workflow Orchestration** | Airflow Usage | 3:19 |\n| **Airflow Fundamental Concepts** | Fundamental Concepts | 2:47 |\n| | Airflow Architecture | 3:09 |\n| | Example Pipelines | 4:49 |\n| | Spotlight 3rd Party Operators | 2:17 |\n| | Airflow XComs | 4:32 |\n| **Hands-On Setup** | Project Setup | 1:43 |\n| | Docker Setup Explained | 2:06 |\n| | Docker Compose & Starting Containers | 4:23 |\n| | Checking Services | 1:48 |\n| | Setup WeatherAPI | 1:33 |\n| | Setup Postgres DB | 1:58 |\n| **Learn Creating DAGs** | Airflow Webinterface | 4:37 |\n| | Creating DAG With Airflow 2.0 | 9:46 |\n| | Running our DAG | 4:15 |\n| | Creating DAG With TaskflowAPI | 6:59 |\n| | Getting Data From the API With SimpleHTTPOperator | 3:38 |\n| | Writing into Postgres | 4:12 |\n| | Parallel Processing | 4:15 |\n| **Recap** | Recap & Outlook | 4:38 |\n\n---\n\n#### What’s Next?\n\nAfter completing this roadmap, you’ll have the confidence and skills to not just analyze data but to engineer and optimize it like a pro! 
Explore advanced topics, start contributing to projects, and showcase your new skills to potential employers.\n\n\n\n### Roadmap for Data Scientists\n\n#### 14-Week Data Engineering Roadmap for Data Scientists\n\n#### From Notebooks to Production: Build, Deploy, and Scale Your ML Workflows\n\n#### Start this roadmap at my Academy: [Start Today](https://learndataengineering.com/p/data-engineering-for-data-scientists)\n\n---\n\n#### Who Is This Roadmap For?\n\n- Data Scientists who want to deploy and maintain ML models in production\n- ML practitioners struggling with real-time data, CI/CD, and orchestration\n- Data professionals looking to expand their engineering toolkit\n- Anyone ready to go beyond notebooks and automate their ML workflows\n\n---\n\n#### What You’ll Achieve\n\nThis roadmap provides a step-by-step approach to gaining production-grade data engineering skills. You'll start with pipelines and containerization, move on to deployment and orchestration, and finish with big data and monitoring.\n\n![Building blocks of your curriculum](/images/Roadmap-Data-Engineering-For-Data-Scientists.jpg)\n\n#### Learning Goals\n\n| Goal #  | Description                                        |\n| ------- | -------------------------------------------------- |\n| Goal #1 | Build an End-to-End ML Pipeline on AWS             |\n| Goal #2 | Add CI/CD & Containerization to Your Platform      |\n| Goal #3 | Implement the Lakehouse Architecture in AWS or GCP |\n| Goal #4 | Orchestrate Your Pipelines with Airflow            |\n| Goal #5 | Process Big Data with Apache Spark & Streaming     |\n| Goal #6 | Analyze Your ML Training Logs with Elasticsearch   |\n\n---\n\n#### 14-Week Learning Roadmap\n\n| Week       | Topic                                        |\n| ---------- | -------------------------------------------- |\n| Week 1     | Platform & Pipeline Design                   |\n| Week 2     | Docker Fundamentals                          |\n| Week 3     | Relational Data 
Modeling                     |\n| Week 4     | Working & Designing APIs                     |\n| Week 5 & 6 | ML & Containerization on AWS                 |\n| Week 7     | ETL & CI/CD on AWS                           |\n| Week 8     | Building a Lakehouse on AWS or GCP           |\n| Week 9     | Orchestrate with Airflow                     |\n| Week 10    | Pre-Process Data with Apache Spark           |\n| Week 11-13 | Build a Streaming Pipeline (AWS, Azure, GCP) |\n| Week 14    | Analyze Training Logs with Elasticsearch     |\n\n---\n\n#### Week 1: Platform & Pipeline Design\n\n##### Description\nData pipelines are the foundation of any data platform. In this 110-minute training, you'll learn about stream, batch, and ML pipelines. You'll also explore platform blueprints, architecture components, and Lambda architecture.\n\n**Check out this course in my Academy: [Learn More](https://learndataengineering.com/p/data-pipeline-design)**\n\n##### Course Curriculum\n\n| Lesson                                           | Duration    |\n| ------------------------------------------------ | ----------- |\n| Platform Blueprint & End to End Pipeline Example | 10:11       |\n| Data Engineering Tools Guide                     | 2:44        |\n| End to End Pipeline Example                      | 6:18        |\n| Push Ingestion Pipelines                         | 3:42        |\n| Pull Ingestion Pipelines                         | 3:34        |\n| Batch Pipelines                                  | 3:07        |\n| Streaming Pipelines                              | 3:34        |\n| Stream Analytics                                 | 2:26        |\n| Lambda Architecture                              | 4:02        |\n| Visualization Pipelines                          | 3:47        |\n| Visualization with Hive & Spark on Hadoop        | 6:21        |\n| Visualization Data via Spark Thrift Server       | 3:27        |\n| Platform Examples (AWS, Azure, GCP, Hadoop)      | Slides Only 
|\n\n---\n\n#### Week 2: Docker Fundamentals\n\n##### Description\nDocker is the go-to container platform for engineers. This training covers key concepts, hands-on Docker usage, building and running containers, and how Docker fits into production workflows.\n\n**Check out this course in my Academy: [Learn More](https://learndataengineering.com/p/docker-fundamentals)**\n\n##### Course Curriculum\n\n| Lesson                              | Duration |\n| ----------------------------------- | -------- |\n| Docker vs Virtual Machines          | 6:23     |\n| Docker Terminology                  | 5:56     |\n| Installing Docker Desktop           | 4:09     |\n| Pulling Images & Running Containers | 6:34     |\n| CLI Cheat Sheet                     | 3:38     |\n| Docker Compose Explained            | 6:34     |\n| Build & Run Hello World Image       | 6:28     |\n| Build Image with Dependencies       | 5:05     |\n| Using DockerHub                     | 4:24     |\n| Image Layers                        | 7:55     |\n| Deployment in Production            | 5:47     |\n| Security Best Practices             | 4:09     |\n| Managing Docker with Portainer      | 4:04     |\n\n---\n\n#### Week 3: Relational Data Modeling\n\n##### Description\nLearn how to design efficient and scalable relational models. You'll go through conceptual to physical modeling and normalize your schema. 
You'll use MySQL and MySQL Workbench for hands-on practice.\n\n**Check out this course in my Academy: [Learn More](https://learndataengineering.com/p/relational-data-modeling)**\n\n##### Course Curriculum\n\n| Lesson                           | Duration |\n| -------------------------------- | -------- |\n| History of Relational Models     | 3:16     |\n| Installing MySQL & Workbench     | 8:04     |\n| Workbench Introduction           | 4:36     |\n| The Design Process Explained     | 4:14     |\n| Discover Entities                | 10:24    |\n| Discover Attributes              | 13:09    |\n| Normalize & Define Relationships | 11:19    |\n| Identifying vs Non-identifying   | 2:01     |\n| Resolve Many-to-Many             | 4:00     |\n| Resolve One-to-Many              | 2:34     |\n| Resolve One-to-One               | 1:45     |\n| Create ER Diagram                | 19:46    |\n| Create Physical Data Model       | 4:13     |\n| Populate from XLS                | 15:13    |\n| Course Conclusion                | 1:28     |\n\n---\n\n#### Week 4: Working & Designing APIs\n\n##### Description\nAPIs are the backbone of modern data platforms. You'll learn how to build and test APIs using FastAPI, design schemas, and deploy them in Docker. Postman and Docker are used for testing and deployment.\n\n**Check out this course in my Academy: [Learn More](https://learndataengineering.com/p/apis-with-fastapi-course)**\n\n##### Course Curriculum\n\n| Lesson                        | Duration |\n| ----------------------------- | -------- |\n| What are APIs?                
| 8:29     |\n| Hosting vs Using APIs         | 4:08     |\n| HTTP Methods & Media Types    | 6:56     |\n| Response Codes & Parameters   | 9:40     |\n| FastAPI Setup                 | 4:55     |\n| POST, GET, PUT API Methods    | 16:18    |\n| Testing with Postman          | 4:22     |\n| Deploying FastAPI with Docker | 6:01     |\n| API Security Best Practices   | 3:48     |\n\n---\n\n#### Week 5 & 6: ML & Containerization on AWS\n\n##### Description\nThis hands-on project teaches you how to build a real-time ML pipeline on AWS. You'll pull data from the Twitter API (or The Guardian API), apply sentiment analysis with NLTK in a Lambda function, store results in a Postgres database via RDS, and build a Streamlit dashboard. Finally, you’ll containerize and deploy the dashboard using AWS ECS and ECR.\n\n**Check out this project in my Academy: [Learn More](https://learndataengineering.com/p/ml-on-aws)**\n\n##### Course Curriculum\n\n| Lesson                                             | Duration |\n| -------------------------------------------------- | -------- |\n| Introduction                                       | 2:38     |\n| Project Architecture Explained                     | 2:06     |\n| RDS Setup                                          | 2:37     |\n| VPC Inbound Rules                                  | 2:12     |\n| PG Admin Installation & S3 Config                  | 4:05     |\n| Lambda Intro & IAM Setup                           | 3:11     |\n| Create Lambda Function                             | 1:24     |\n| Lambda Code Explained                              | 8:22     |\n| Insert Code Into Lambda                            | 0:56     |\n| Add Layers from Klayers                            | 5:32     |\n| Create Custom Layers                               | 4:40     |\n| Test Lambda & Set Env Variables                    | 4:53     |\n| Schedule Lambda with EventBridge                   | 3:15     |\n| Setup Virtual Conda Environment                
    | 4:07     |\n| Install Dependencies with Poetry                   | 5:57     |\n| Streamlit App Code Walkthrough                     | 7:52     |\n| Setup ECR Container Registry                       | 1:52     |\n| AWS CLI Install & Login                            | 5:19     |\n| Dockerfile Build & Push                            | 2:52     |\n| Create ECS Fargate Cluster                         | 1:34     |\n| ECS Task Configuration & Deployment                | 4:59     |\n| Fixing ECS Task                                    | 5:14     |\n| Stop ECS Task                                      | 0:59     |\n| Project Conclusion                                 | 5:06     |\n\n---\n\n#### Week 7: ETL & CI/CD on AWS\n\n##### Description\nIn this project, you'll build a lightweight ETL job that pulls data from a public weather API and writes it into a time series database. You’ll dockerize the job, schedule it using AWS Lambda and EventBridge, and visualize the data using Grafana.\n\n**Check out this project in my Academy: [Learn More](https://learndataengineering.com/p/timeseries-etl-with-aws-tdengine-grafana)**\n\n##### Course Curriculum\n\n| Lesson                                       | Duration |\n| -------------------------------------------- | -------- |\n| Quick Note from Andreas                      | 0:43     |\n| Project Introduction                         | 1:26     |\n| Setup of the Project                         | 2:52     |\n| Time Series Data Basics                      | 2:20     |\n| Big Pros of Time Series Databases            | 2:06     |\n| About TDengine                               | 1:22     |\n| Setup Weather API                            | 1:04     |\n| Code Query API                               | 2:41     |\n| TDengine Setup                               | 3:04     |\n| Connect Python to TDengine                   | 1:50     |\n| Lambda Docker Container & Push to ECR        | 1:55     |\n| AWS Setup                                   
 | 1:36     |\n| Create Lambda Function Using Docker Image    | 1:04     |\n| Schedule Function with EventBridge           | 1:25     |\n| CloudWatch Lambda Events                     | 0:27     |\n| Grafana Setup                                | 3:01     |\n\n---\n\n#### Week 8: Building a Lakehouse on AWS or GCP\n\n##### Description\nThis week, you’ll learn how to combine data lakes and warehouses into a Lakehouse architecture. You’ll implement a full data analytics stack using tools like S3, Athena, BigQuery, Glue, Quicksight, and Data Studio.\n\n**Check out this course in my Academy: [Learn More](https://learndataengineering.com/p/modern-data-warehouses)**\n\n##### Course Curriculum\n\n| Lesson                                                  | Duration |\n| -------------------------------------------------------- | -------- |\n| Introduction                                             | 2:13     |\n| Data Science Platform Overview                           | 4:10     |\n| ETL & ELT in Warehouses                                  | 6:22     |\n| Data Lake & Warehouse Integration                        | 3:29     |\n| GCP Pipelines Overview                                   | 3:13     |\n| Cloud Storage & BigQuery Hands-on                       | 8:35     |\n| Create Dashboard in Data Studio                          | 7:33     |\n| GCP Recap & AWS Goals                                    | 2:12     |\n| Upload Data to S3                                        | 2:12     |\n| Athena Manual Table Configuration                        | 3:48     |\n| Create Dashboard in Quicksight                           | 5:05     |\n| Athena via Glue Catalog                                  | 3:29     |\n| Course Recap                                             | 2:36     |\n| BONUS: Redshift Spectrum with S3                         | 2:57     |\n\n---\n\n#### Week 9: Orchestrate with Airflow\n\n##### Description\nThis training will guide you through installing and running 
Apache Airflow in Docker, creating DAGs, using the Taskflow API, and monitoring workflow execution.\n\n**Check out this course in my Academy: [Learn More](https://learndataengineering.com/p/learn-apache-airflow)**\n\n##### Course Curriculum\n\n| Lesson                                        | Duration |\n| --------------------------------------------- | -------- |\n| Introduction                                  | 1:36     |\n| Airflow Usage                                 | 3:19     |\n| Fundamental Concepts                          | 2:47     |\n| Airflow Architecture                          | 3:09     |\n| Example Pipelines                             | 4:49     |\n| Spotlight on 3rd Party Operators              | 2:17     |\n| Airflow XComs                                 | 4:32     |\n| Project Setup                                 | 1:43     |\n| Docker Setup Explained                        | 2:06     |\n| Docker Compose & Starting Containers          | 4:23     |\n| Checking Services                             | 1:48     |\n| Weather API Setup                             | 1:33     |\n| Postgres DB Setup                             | 1:58     |\n| Airflow Web Interface                         | 4:37     |\n| Create DAG with Airflow 2.0                   | 9:46     |\n| Run Your DAG                                  | 4:15     |\n| Create DAG with Taskflow API                  | 6:59     |\n| Get Data via SimpleHTTP Operator              | 3:38     |\n| Write to Postgres                             | 4:12     |\n| Parallel Processing                           | 4:15     |\n| Recap & Outlook                               | 4:38     |\n\n---\n\n#### Week 10: Pre-Process Data with Apache Spark\n\n##### Description\nThis training introduces Apache Spark fundamentals, showing you how to process large datasets using Spark DataFrames, RDDs, and SparkSQL inside Docker and Jupyter Notebooks.\n\n**Check out this course in my Academy: [Learn 
More](https://learndataengineering.com/p/learning-apache-spark-fundamentals)**\n\n##### Course Curriculum\n\n| Lesson                                | Duration |\n| ------------------------------------- | -------- |\n| Introduction & Contents               | 3:30     |\n| Vertical vs Horizontal Scaling        | 3:55     |\n| What Spark Is Good For                | 4:45     |\n| Driver, Context & Executors           | 4:11     |\n| Cluster Types                         | 1:59     |\n| Client vs Cluster Deployment          | 6:11     |\n| Where to Run Spark                    | 3:38     |\n| Tools in Spark Course                 | 2:35     |\n| Dataset Overview                      | 4:11     |\n| Docker Setup                          | 2:52     |\n| Jupyter Notebook Setup & Run         | 5:31     |\n| RDDs                                  | 3:57     |\n| DataFrames                            | 1:40     |\n| Transformations & Actions Overview   | 2:59     |\n| Transformations                       | 2:22     |\n| Actions                               | 3:06     |\n| JSON Transformations                  | 9:52     |\n| Working with Schemas                  | 8:23     |\n| Working with DataFrames               | 10:09    |\n| SparkSQL                              | 5:04     |\n| Working with RDDs                     | 12:52    |\n\n---\n\n#### Week 11–13: Build a Streaming Pipeline on AWS, Azure, or GCP\n\n##### Description\nIn this 3-week section, you'll complete an end-to-end streaming data project on the cloud platform of your choice: AWS, Azure, or GCP. Each project teaches you how to ingest real-time data, process it, store it, and create visualizations.\n\nYou only need to complete one of the following three options:\n\n---\n\n##### Option 1: Streaming Pipeline on AWS\n\n##### Description\nYou'll use AWS services like API Gateway, Kinesis, DynamoDB, Redshift, Lambda, Glue, and Power BI to create a complete streaming solution. 
You'll work with e-commerce data and build multiple ingestion and batch pipelines.\n\n**Check out this project in my Academy: [Learn More](https://learndataengineering.com/p/data-engineering-on-aws)**\n\n##### Course Curriculum\n\n| Lesson                                       | Duration |\n| -------------------------------------------- | -------- |\n| Data Engineering                             | 4:15     |\n| Data Science Platform                        | 5:20     |\n| Dataset Introduction                         | 3:16     |\n| Relational Storage Possibilities             | 3:46     |\n| NoSQL Storage Possibilities                  | 6:28     |\n| Platform Design & Pipeline Planning          | 3:49     |\n| Client to Visualization Design               | 3:00     |\n| Data Ingestion to Kinesis                    | 3:00     |\n| Stream to S3 and DynamoDB                    | 5:28     |\n| Visualization API & Redshift                 | 5:29     |\n| AWS Setup & IAM                              | 4:06     |\n| Create Lambda Functions                      | 2:33     |\n| Configure Firehose & Debugging               | 7:43     |\n| Power BI Setup                               | 12:16    |\n| Glue Crawlers and Jobs                       | 26:52    |\n\n---\n\n##### Option 2: Streaming Pipeline on Azure\n\n##### Description\nYou’ll build a Twitter-like JSON stream pipeline using Azure Functions, Event Hub, Cosmos DB, and Power BI. 
You’ll learn how to set up API management, key vaults, and authentication.\n\n**Check out this project in my Academy: [Learn More](https://learndataengineering.com/p/build-streaming-data-pipelines-in-azure)**\n\n##### Course Curriculum\n\n| Lesson                                               | Duration |\n| ---------------------------------------------------- | -------- |\n| Project Introduction                                 | 2:43     |\n| Local Preprocessing & Docker Setup                   | 7:06     |\n| Develop & Deploy Azure Functions                     | 5:52     |\n| Test Functions & Integrate with Blob Storage         | 6:26     |\n| Add Functions to Azure API Management (APIM)         | 7:05     |\n| Key Vault & Authentication                           | 4:41     |\n| Create Event Hubs and Bindings                       | 6:59     |\n| Write to Cosmos DB                                   | 9:03     |\n| Power BI Connection and Dashboard Creation           | 6:32     |\n\n---\n\n##### Option 3: Streaming Pipeline on GCP\n\n##### Description\nThis project shows how to extract weather data via API, stream it with Pub/Sub, write it into Cloud SQL, and visualize it with Looker Studio. 
You'll also learn function deployment and VM/database setup.\n\n**Check out this project in my Academy: [Learn More](https://learndataengineering.com/p/data-engineering-on-gcp)**\n\n##### Course Curriculum\n\n| Lesson                                              | Duration |\n| --------------------------------------------------- | -------- |\n| Introduction & Setup                               | 2:43     |\n| Architecture & Weather API                          | 5:31     |\n| Enable APIs & Configure Scheduling                  | 4:00     |\n| Setup MySQL Database & Compute Engine               | 4:40     |\n| Create Cloud Functions for Data Ingestion           | 8:37     |\n| Use Pub/Sub for Messaging                           | 1:41     |\n| Write Data to Cloud SQL                             | 13:43    |\n| Test and Monitor Data Flow                          | 5:51     |\n| Setup Looker Studio & Build Dashboards              | 4:17     |\n| Monitor Pipelines                                   | 6:20     |\n\n---\n\n##### Week 14: Analyze Training Logs with Elasticsearch\n\n##### Description\nWrap up your roadmap by learning how to monitor pipelines using Elasticsearch. 
You’ll deploy Elasticsearch with Docker, send logs from your training pipelines, and visualize them in Kibana dashboards.\n\n**Check out this course in my Academy: [Learn More](https://learndataengineering.com/p/log-analysis-with-elasticsearch)**\n\n##### Course Curriculum\n\n| Lesson                                           | Duration |\n| ------------------------------------------------ | -------- |\n| Course Introduction                              | 2:07     |\n| Elasticsearch vs Relational Databases            | 5:43     |\n| ETL Log Analysis & Debugging                     | 3:54     |\n| Streaming Log Analysis & Debugging               | 2:48     |\n| Solving Problems with Elasticsearch              | 4:37     |\n| ELK Stack Overview                               | 2:03     |\n| Setup Limiting RAM & Environment Config          | 4:26     |\n| Running Elasticsearch                            | 4:07     |\n| Elasticsearch APIs & Python Index Creation       | 7:31     |\n| Write Logs (JSON) to Elasticsearch               | 4:46     |\n| Create Kibana Visualizations & Dashboards        | 9:27     |\n| Search Logs in Elasticsearch                     | 4:57     |\n| Course Recap                                     | —        |\n\n---\n\n#### What’s Next?\n\nAfter 14 weeks, you’ll have built scalable, production-ready data pipelines and ML workflows. You can now explore more advanced projects, optimize performance, and contribute to production systems with confidence. Need help showcasing your skills or getting hired? Reach out to my coaching program!\n\n\n### Roadmap for Software Engineers\n\n![Building blocks of your curriculum](/images/Data-Engineering-Roadmap-for-Software-Engineers.jpg)\n\nIf you're transitioning from a background in computer science or software engineering into data engineering, you're already equipped with a solid foundation. 
Your existing knowledge in coding, familiarity with SQL databases, understanding of computer networking, and experience with operating systems like Linux, provide you with a considerable advantage. These skills form the cornerstone of data engineering and can significantly streamline your learning curve as you embark on this new journey.\n\nHere's a refined roadmap, incorporating your prior expertise, to help you excel in data engineering:\n\n- **Deepen Your Python Skills:** Python is crucial in data engineering for processing and handling various data formats, such as APIs, CSV, and JSON. Given your coding background, focusing on Python for data engineering will enhance your ability to manipulate and process data effectively.\n- **Master Docker:** Docker is essential for deploying code and managing containers, streamlining the software distribution process. Your understanding of operating systems and networking will make mastering Docker more intuitive, as you'll appreciate the importance of containerization in today's development and deployment workflows.\n- **Platform and Pipeline Design:** Leverage your knowledge of computer networking and operating systems to grasp the architecture of data platforms. Understanding how to design data pipelines, including considerations for stream and batch processing, and emphasizing security, will be key. Your background will provide a solid foundation for understanding how different components integrate within a data platform.\n- **Choosing the Right Data Stores:** Dive into the specifics of data stores, understanding the nuances between transactional and analytical databases, and when to use relational vs. NoSQL vs. document stores vs. time-series databases. Your experience with SQL databases will serve as a valuable baseline for exploring these various data storage options.\n- **Explore Cloud Platforms:** Get hands-on with cloud services such as AWS, GCP, and Azure. 
Projects or courses that offer practical experience with these platforms will be invaluable. Your tasks might include building pipelines to process data from APIs, using message queues, or delving into data warehousing and lakes, capitalizing on your foundational skills.\n- **Optional Deep Dives:** For those interested in advanced data processing, exploring technologies like Spark or Kafka for stream processing can be enriching. Additionally, learning how to build APIs and work with MongoDB for document storage can open new avenues, especially through practical projects.\n- **Log Analysis and Data Observability:** Familiarize yourself with tools like Elasticsearch, Grafana, and InfluxDB to monitor and analyze your data pipelines effectively. This area leverages your comprehensive understanding of how systems communicate and operate, enhancing your ability to maintain and optimize data flows.\n\nAs you embark on this path, remember that your journey is unique. Your existing knowledge not only serves as a strong foundation but also as a catalyst for accelerating your growth in the realm of data engineering. Keep leveraging your strengths, explore areas of interest deeply, and continually adapt to the evolving landscape of data technology.\n\n| Live Stream -> Data Engineering Roadmap for Computer Scientists / Developers\n|------------------|\n|In this live stream you'll find even more details on how to read this roadmap for computer scientists and developers, why I chose these tools, and why I think this is the right way to do it.\n| [Watch on YouTube](https://youtube.com/live/0e4WfIUixRw)|\n\n\n## Data Engineers Skills Matrix\n\n![Data Engineer Skills Matrix](/images/Data-Engineer-Skills-Matrix.jpg)\n\nIf you're diving into the world of data engineering or looking to climb the ladder within this field, you're in for a treat with this enlightening YouTube video. Andreas kicks things off by introducing us to a very handy tool they've developed: the Data Engineering Skills Matrix. 
This isn't just any chart; it's a roadmap designed to navigate the complex landscape of data engineering roles, ranging from a Junior Data Engineer to the lofty heights of a Data Architect and Machine Learning Engineer.\n\n| Live Stream -> Data Engineering Skills Matrix\n|------------------|\n|In this live stream you'll find even more details how to read this skills matrix for Data Engineers.  \n| [Watch on YouTube](https://youtube.com/live/5E0UiBy0Kwo)|\n\nAndreas takes us through the intricacies of this matrix, layer by layer. Starting with the basics, they discuss the minimum experience needed for each role. It's an eye-opener, especially when you see how experience requirements evolve from a beginner to senior levels. But it's not just about how many years you've spent in the field; it's about the skills you've honed during that time.\n\n### Challenges & Responsibilities\n\nAs the conversation progresses, Andreas delves into the core responsibilities and main tasks associated with each role. You'll learn what sets a Junior Data Engineer apart from a Senior Data Engineer, the unique challenges a Data Architect faces, and the critical skills a Machine Learning Engineer must possess. This part of the video is golden for anyone trying to understand where they fit in the data engineering ecosystem or plotting their next career move.\n\n### SQL & Soft Skills\n\nThen there's the talk on SQL knowledge and its relevance across different roles. This segment sheds light on how foundational SQL is, irrespective of your position. But it's not just about technical skills; the video also emphasizes soft skills, like leadership and collaboration, painting a holistic picture of what it takes to succeed in data engineering.\n\nFor those who love getting into the weeds, Andreas doesn't disappoint. They discuss software development skills, debugging, and even dive into how data engineers work with SQL and databases. 
This part is particularly insightful for understanding the technical depth required at various stages of your career.\n\n### Q&A\n\nAnd here's the cherry on top: Andreas encourages interaction, inviting viewers to share their experiences and questions. This makes the video not just a one-way learning experience but a dynamic conversation that enriches everyone involved.\n\n### Summary\n\nBy the end of this video, you'll walk away with a clear understanding of the data engineering field's diverse roles. You'll know the skills needed to excel in each role and have a roadmap for your career progression. Whether you're a recent graduate looking to break into data engineering or a seasoned professional aiming for a senior position, Andreas's video is a must-watch. It's not just a lecture; it's a guide to navigating the exciting world of data engineering, tailored by someone who's taken the time to lay out the journey for you.\n\n\n\n## How to Become a Senior Data Engineer\n\nBecoming a senior data engineer is a goal many in the tech industry aspire to. It's a role that demands a deep understanding of data architecture, advanced programming skills, and the ability to lead and communicate effectively within an organization. In this live stream series, I dove into what it takes to climb the ladder to a senior data engineering position. Here are the key takeaways. You can find the links to the videos and the shown images below.\n\n### Understanding the Role\nThe journey to becoming a senior data engineer starts with a clear understanding of what the role entails. Senior data engineers are responsible for designing, implementing, and maintaining an organization's data architecture. 
They ensure data accuracy, accessibility, and security, often taking the lead on complex projects that require advanced technical skills and strategic thinking.\n\n### Key Skills and Knowledge Areas\nBased on insights from the live stream and consultations with industry experts, including GPT-3, here are the critical areas where aspiring senior data engineers should focus their development:\n\n- **Advanced Data Modeling and Architecture:** Mastery of data modeling techniques and architecture best practices is crucial. This includes an understanding of dimensional and Data Vault modeling, as well as expertise in SQL and NoSQL databases.\n- **Big Data Technologies:** Familiarity with distributed computing frameworks (like Apache Spark), streaming technologies (such as Apache Kafka), and cloud-based big data technologies is essential.\n- **Advanced ETL Techniques:** Skills in incremental loading, data merging, and transformation are vital for efficiently processing large datasets.\n- **Data Warehousing and Data Lake Implementation:** Building and maintaining scalable and performant data warehouses and lakes are fundamental responsibilities.\n- **Cloud Computing:** Proficiency in cloud services from AWS, Azure, or GCP, along with platforms like Snowflake and Databricks, is increasingly important.\n- **Programming and Scripting:** Advanced coding skills in languages relevant to data engineering, such as Python, Scala, or Java, are non-negotiable.\n- **Data Governance and Compliance:** Understanding data governance frameworks and compliance requirements is critical, especially in highly regulated industries.\n- **Leadership and Communication:** Beyond technical skills, the ability to lead projects, communicate effectively with both technical and non-technical team members, and mentor junior engineers is what differentiates a senior engineer.\n\n### Learning Pathways\nBecoming a senior data engineer requires continuous learning and real-world experience. 
Here are a few steps to guide your journey:\n\n- **Educational Foundation:** Start with a strong foundation in computer science or a related field. This can be through formal education or self-study courses.\n- **Gain Practical Experience:** Apply your skills in real-world projects. This could be in a professional setting, contributions to open-source projects, or personal projects.\n- **Specialize and Certify:** Consider specializing in areas particularly relevant to your interests or industry needs. Obtaining certifications in specific technologies or platforms can also bolster your credentials.\n- **Develop Soft Skills:** Work on your communication, project management, and leadership skills. These are as critical as your technical abilities.\n- **Seek Feedback and Mentorship:** Learn from the experiences of others. Seek out mentors who can provide guidance and feedback on your progress.\n\n### Video 1\n\n| Live Stream -> How to become a Senior Data Engineer - Part 1\n|------------------|\n| In this part one I talked about Data Modeling, Big Data, ETL, Data Warehousing & Data Lakes as well as Cloud computing\n| [Watch on YouTube](https://youtube.com/live/M-6xkTCKQQc)|\n\n![Watch on YouTube](/images/Becoming-a-Senior-Data-Engineer-Video-1.jpg)\n\n### Video 2\n\n| Live Stream -> How to become a Senior Data Engineer - Part 2\n|------------------|\n| In part two I talked about real time data processing, programming & scripting, data governance, compliance and data security\n| [Watch on YouTube](https://youtube.com/live/po96pzpjxvA)|\n\n![Watch on YouTube](/images/Becoming-a-Senior-Data-Engineer-Video-2.jpg)\n\n### Video 3\n\n| Live Stream -> How to become a Senior Data Engineer - Part 3\n|------------------|\n| In part 3 I focused on everything regarding Leadership and Communication: team management, project management, collaboration, problem solving, strategic thinking, communication and leadership\n| [Watch on 
YouTube](https://youtube.com/live/DMumpzSyRjI)|\n\n![Watch on YouTube](/images/Becoming-a-Senior-Data-Engineer-Video-3.jpg)\n\n### Final Thoughts\nThe path to becoming a senior data engineer is both challenging and rewarding. It requires a blend of technical prowess, continuous learning, and the development of soft skills that enable you to lead and innovate. Whether you're just starting out or looking to advance your career, focusing on the key areas outlined above will set you on the right path.\n"
  },
  {
    "path": "sections/02-BasicSkills.md",
    "content": "\nBasic Computer Science Skills\n=============================\n\n## Contents\n\n- [Learn to Code](02-BasicSkills.md#learn-to-code)\n- [Get Familiar with Git](02-BasicSkills.md#get-familiar-with-git)\n- [Agile Development](02-BasicSkills.md#agile-development)\n  - [Why Is Agile So Important?](02-BasicSkills.md#Why-is-agile-so-important)\n  - [Agile Rules I Learned Over the Years](02-BasicSkills.md#agile-rules-i-learned-over-the-years)\n  - [Agile Frameworks](02-BasicSkills.md#agile-frameworks)\n    - [Scrum](02-BasicSkills.md#scrum)\n    - [OKR](02-BasicSkills.md#okr)\n- [Software Engineering Culture](02-BasicSkills.md#software-engineering-culture)\n- [Learn How a Computer Works](02-BasicSkills.md#learn-how-a-computer-works)\n- [Data Network Transmission](02-BasicSkills.md#data-network-transmission)\n- [Security and Privacy](02-BasicSkills.md#security-and-privacy)\n  - [SSL Public and Private Key Certificates](02-BasicSkills.md#ssl-public-and-private-key-Certificates)\n  - [JSON Web Tokens](02-BasicSkills.md#json-web-tokens)\n  - [GDPR Regulations](02-BasicSkills.md#gdpr-regulations)\n- [Linux](02-BasicSkills.md#linux)\n  - [OS Basics](02-BasicSkills.md#os-basics)\n  - [Shell Scripting](02-BasicSkills.md#shell-scripting)\n  - [Cron Jobs](02-BasicSkills.md#cron-jobs)\n  - [Packet Management](02-BasicSkills.md#packet-management)\n- [Docker](02-BasicSkills.md#docker)\n  - [What is Docker and How it Works](02-BasicSkills.md#what-is-docker-and-what-do-you-use-it-for)\n  - [Kubernetes Container Deployment](02-BasicSkills.md#kubernetes-container-deployment)\n  - [Why and How To Do Docker Container Orchestration](02-BasicSkills.md#why-and-how-to-do-docker-container-orchestration)\n  - [Useful Docker Commands](02-BasicSkills.md#useful-docker-commands)\n- [The Cloud](02-BasicSkills.md#the-cloud)\n  - [IaaS vs. PaaS vs. SaaS](02-BasicSkills.md#iaas-vs-paas-vs-saas)\n  - [AWS Azure IBM Google](02-BasicSkills.md#aws-azure-ibm-google)\n  - [Cloud vs. 
On-Premises](02-BasicSkills.md#cloud-vs-on-premises)\n  - [Security](02-BasicSkills.md#security)\n  - [Hybrid Clouds](02-BasicSkills.md#hybrid-clouds)\n- [Data Scientists and Machine Learning](02-BasicSkills.md#Data-Scientists-and-Machine-Learning)\n  - [Machine Learning Workflow](02-BasicSkills.md#machine-learning-workflow)\n  - [Machine Learning Model and Data](02-BasicSkills.md#machine-learning-model-and-data)\n\n\n\nLearn to Code\n-------------\n\nWhy this is important: Without coding, you cannot do much in data\nengineering. I cannot count the number of times I needed a quick hack to solve a problem.\n\nThe possibilities are endless:\n\n-   Writing or quickly getting some data out of a SQL DB.\n\n-   Producing test messages to a Kafka topic.\n\n-   Understanding the source code of a web service.\n\n-   Reading counter statistics out of an HBase key-value store.\n\nSo, which language do I recommend then?\n\n\nIf you had asked me a few years ago, I would have said Java, 100%. Nowadays, though, the community has moved heavily to Python. I highly recommend starting with it.\n\nWhen you are getting into data processing with Spark, you can use\nScala, which is a JVM language, but Python is also very good here.\n\nPython is a great choice. It is super versatile.\n\n\nWhere to Learn Python? 
There are free Python courses all over the internet.\n- I have a beginner one in my Data Engineering academy: [Introduction to Python course](https://learndataengineering.com/p/introduction-to-python)\n- I also have a Python for Data Engineers course in my Data Engineering academy: [Python for Data Engineers course](https://learndataengineering.com/p/python-for-data-engineers)\n\nAlways keep it practical: Learning by doing!\n\nI talked about the importance of learning by doing in this podcast:\n<https://anchor.fm/andreaskayy/episodes/Learning-By-Doing-Is-The-Best-Thing-Ever---PoDS-035-e25g44>\n\nGet Familiar with Git\n---------------------\n\nWhy this is important: One of the major problems with coding is keeping\ntrack of changes. It is also almost impossible to maintain a program you\nhave multiple versions of.\n\nAnother problem is the topic of collaboration and documentation, which\nis super important.\n\nLet's say you work on a Spark application and your colleagues need to\nmake changes while you are on holiday. Without some code management, they\nare in huge trouble:\n\nWhere is the code? What did you change last? Where is the\ndocumentation? 
How do we mark what we have changed?\n\nBut, if you put your code on GitHub, your colleagues can find your code.\nThey can understand it through your documentation (please also have\nin-line comments).\n\nDevelopers can pull your code, make a new branch, and make the changes.\nAfter your holiday, you can inspect what they have done and merge it with\nyour original code, and you end up having only one application.\n\nWhere to learn: Check out the GitHub Guides page where you can learn all\nthe basics: <https://guides.github.com/introduction/flow/>\n\nThis great Git commands cheat sheet saved my butt multiple times:\n<https://www.atlassian.com/git/tutorials/atlassian-git-cheatsheet>\n\nAlso look into:\n\n-   Pull\n\n-   Push\n\n-   Branching\n\n-   Forking\n\nGitHub uses Markdown to write pages, a super simple language that is actually a lot of fun to write. Here's a Markdown cheat sheet:\n<https://www.markdownguide.org/cheat-sheet/>\n\nPandoc is a great tool to convert many document formats to and from Markdown:\n<https://pandoc.org>\n\n\nAgile Development\n-----------------\n\nAgility is the ability to adapt quickly to changing circumstances.\n\nThese days, everyone wants to be agile. Big and small companies are\nlooking for the \"startup mentality.\"\n\nMany think it's the corporate culture. Others think it's the process of how\nwe create things that matters.\n\nIn this article, I am going to talk about agility and self-reliance,\nabout how you can incorporate agility in your professional career.\n\n### Why Is Agile So Important?\n\nHistorically, development has been practiced as an explicitly defined process. You\nthink of something, specify it, have it developed, and then build it in mass\nproduction.\n\nIt's a bit of an arrogant process. 
You assume that you already know\nexactly what a customer wants, or how a product has to look and how\neverything works out.\n\nThe problem is that the world does not work this way!\n\nOftentimes the circumstances change because of internal factors.\n\nSometimes things just do not work out as planned or stuff is harder than\nyou think.\n\nYou need to adapt.\n\nOther times you find out that you built something customers do not like\nand needs to be changed.\n\nYou need to adapt.\n\nThat's why people jump on the Scrum train -- because Scrum is the\ndefinition of agile development, right?\n\n### Agile Rules I Learned Over the Years\n\n#### Is the Method Making a Difference?\n\nYes, Scrum or Google's OKR can help to be more agile. The secret to\nbeing agile, however, is not only how you create.\n\nWhat makes me cringe is people trying to tell you that being agile\nstarts in your head. So, the problem is you?\n\nNo!\n\nThe biggest lesson I have learned over the past years is this: Agility\ngoes down the drain when you outsource work.\n\n#### The Problem with Outsourcing\n\nI know on paper outsourcing seems like a no-brainer: development costs\nagainst the fixed costs.\n\nIt is expensive to bind existing resources on a task. It is even more\nexpensive if you need to hire new employees.\n\nThe problem with outsourcing is that you pay someone to build stuff for\nyou.\n\nIt does not matter who you pay to do something for you. He needs to make\nmoney.\n\nHis agenda will be to spend as little time as possible on your work. That\nis why outsourcing requires contracts, detailed specifications,\ntimetables, and delivery dates.\n\nHe doesn't want to spend additional time on a project, only because you\nwant changes in the middle. Every unplanned change costs him time and\ntherefore money.\n\nIf so, you need to make another detailed specification and a contract\nchange.\n\nHe is not going to put his mind into improving the product while\ndeveloping. 
Firstly, because he does not have the big picture. Secondly,\nbecause he does not want to.\n\nHe is doing as he is told.\n\nWho can blame him? If I were the subcontractor, I would do exactly the\nsame!\n\nDoes this sound agile to you?\n\n#### Knowledge Is King: A lesson from Elon Musk\n\nDoing everything in house -- that's why startups are so productive. No\ntime is wasted on waiting for someone else.\n\nIf something does not work or needs to be changed, there is someone on\nthe team who can do it right away.\n\nOne very prominent example who follows this strategy is Elon Musk.\n\nTesla's Gigafactories are designed to get raw materials in on one side\nand spit out cars on the other. Why do you think Tesla is building\nGigafactories that cost a lot of money?\n\nWhy is SpaceX building its own space engines? Clearly, there are other,\nolder companies who could do that for them.\n\nWhy is Elon building tunnel boring machines at his new boring company?\n\nAt first glance, this makes no sense!\n\n#### How You Really Can Be Agile\n\nIf you look closer, it all comes down to control and knowledge. You, your\nteam, your company, needs to do as much as possible on your own.\nSelf-reliance is king.\n\nBuild up your knowledge and therefore the team's knowledge. When you have\nthe ability to do everything yourself, you are in full control.\n\nYou can build electric cars, build rocket engines, or bore tunnels.\n\nDon't largely rely on others, and be confident to just do stuff on your\nown.\n\nDream big, and JUST DO IT!\n\nPS. Don't get me wrong. You can still outsource work. Just do it in a\nsmart way by outsourcing small independent parts.\n\n### Agile Frameworks\n\n#### Scrum\n\nThere's an interesting Medium article with a lot of details\nabout Scrum: <https://medium.com/serious-scrum>\n\nAlso, this Scrum guide webpage has good info:\n<https://www.scrumguides.org/scrum-guide.html>\n\n#### OKR\n\nI personally love OKR and have been using it for years. 
Especially for smaller\nteams, OKR is great. You don't have a lot of overhead and get work done.\nIt helps you stay focused and look at the bigger picture.\n\nI recommend doing a sync meeting every Monday. There you talk about what\nhappened last week and what you are going to work on this week.\n\nI talked about this in this podcast:\n<https://anchor.fm/andreaskayy/embed/episodes/Agile-Development-Is-Important-But-Please-Dont-Do-Scrum--PoDS-041-e2e2j4>\n\nThere is also this awesome 1.5-hour startup guide from Google:\n<https://youtu.be/mJB83EZtAjc> I really love this video; I have rewatched it\nmultiple times.\n\n### Software Engineering Culture\n\nThe software engineering and development culture is super important. How\ndoes a company handle product development with hundreds of developers?\nCheck out this podcast:\n\n| Podcast episode: #070 Engineering Culture At Spotify\n|------------------\n|In this podcast, we look at the engineering culture at Spotify, my favorite music streaming service. The process behind the development of Spotify is really awesome.\n  |[Watch on YouTube](https://youtu.be/1asVrsUDbp0) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/070-The-Engineering-Culture-At-Spotify-e45ipa)|\n\n\n**Some interesting slides:**\n\n<https://labs.spotify.com/2014/03/27/spotify-engineering-culture-part-1/>\n\n<https://labs.spotify.com/2014/09/20/spotify-engineering-culture-part-2/>\n\nLearn How a Computer Works\n--------------------------\n\n### CPU, RAM, GPU, HDD\n\n### Differences Between PCs and Servers\n\nI talked about computer hardware and GPU processing in this podcast:\n<https://anchor.fm/andreaskayy/embed/episodes/Why-the-hardware-and-the-GPU-is-super-important--PoDS-030-e23rig>\n\nData Network Transmission\n---------------------------------------\n\n### OSI Model\n\nThe OSI Model describes how data flows through the network. 
It\nconsists of layers, starting with the physical layer, which covers how the data\nis transmitted over the wire or optical fiber.\n\nCheck out this article for a deeper understanding of the layers and processes outlined in the OSI model:\n<https://www.studytonight.com/computer-networks/complete-osi-model>\n\nThe Wikipedia page is also very good:\n<https://en.wikipedia.org/wiki/OSI_model>\n\n###### Which Protocol Lives on Which Layer?\n\nCheck out this network protocol map. Unfortunately, it is really hard to\nfind these days:\n<https://www.blackmagicboxes.com/wp-content/uploads/2016/12/Network-Protocols-Map-Poster.jpg>\n\n### IP Subnetting\n\nCheck out this IP address and subnet guide from Cisco:\n<https://www.cisco.com/c/en/us/support/docs/ip/routing-information-protocol-rip/13788-3.html>\n\nA calculator for subnets:\n<https://www.calculator.net/ip-subnet-calculator.html>\n\n### Switch, Layer-3 Switch\n\nFor an introduction to how Ethernet went from broadcasts, to bridges, to\nEthernet MAC switching, to Ethernet & IP (layer 3) switching, to\nsoftware-defined networking, and to programmable data planes that can\nswitch on any packet field and perform complex packet processing, see\nthis video: <https://youtu.be/E0zt_ZdnTcM?t=144>\n\n### Router\n\n### Firewalls\n\nI talked about network infrastructure and techniques in this podcast:\n<https://anchor.fm/andreaskayy/embed/episodes/IT-Networking-Infrastructure-and-Linux-031-PoDS-e242bh>\n\nSecurity and Privacy\n--------------------\n\n### SSL Public and Private Key Certificates\n\n\n<https://www.cloudflare.com/learning/ssl/how-does-ssl-work/>\n\n<https://www.kaspersky.com/resource-center/definitions/what-is-a-ssl-certificate>\n\n<https://www.ssl.com/faqs/what-is-a-certificate-authority/>\n\n\n### JSON Web Tokens\n\nLink to the Wiki page: <https://en.wikipedia.org/wiki/JSON_Web_Token>\n\n### GDPR Regulations\n\nThe EU created the GDPR \\\"General Data Protection Regulation\\\" to\nprotect your personal data like: name, 
age, address, and so\non.\n\nIt's huge and quite complicated. If you want to do online business in\nthe EU, you need to apply these rules. The GDPR has been applicable since May\n25th, 2018. So, if you haven't looked into it, now is the time.\n\nThe penalties can be crazy high if you make mistakes here.\n\nCheck out the full GDPR text here: <https://gdpr-info.eu>\n\nBy the way, if you do profiling or analyse big data in general, look\ninto it. There are some important regulations, unfortunately.\n\nI spent months on GDPR compliance. Super fun. Not! Hahaha\n\n### Privacy by Design\n\nWhen should you look into privacy regulations and solutions?\n\nCreating the product or service first and then bolting on privacy afterwards is\na bad choice. The best way is to start implementing privacy right away\nin the engineering phase.\n\nThis is called privacy by design. Privacy is an integral part of your\nbusiness, not just something optional.\n\nCheck out the Wikipedia page to get a feeling for the important\nprinciples: <https://en.wikipedia.org/wiki/Privacy_by_design>\n\nLinux\n-----\n\nLinux is very important to learn, at least the basics. Most big-data\ntools or NoSQL databases run on Linux.\n\nFrom time to time, you need to modify stuff through the operating system,\nespecially if you run an infrastructure as a service solution like\nCloudera CDH, Hortonworks, or a MapR Hadoop distribution.\n\n### OS Basics\n\nShow all previously run commands that contain `docker`:\n\n    history | grep docker\n\n### Shell Scripting\n\nAh, creating shell scripts in 2019? Believe it or not, scripting on the\ncommand line is still important.\n\nStart a process, automatically rename, move or do a quick compaction of\nlog files. 
It still makes a lot of sense.\n\nCheck out this cheat sheet to get started with scripting in Linux:\n<https://devhints.io/bash>\n\nThere's also this Medium article with a super-simple example for\nbeginners:\n<https://medium.com/@saswat.sipun/shell-scripting-cheat-sheet-c0ecfb80391>\n\n### Cron Jobs\n\nCron jobs are super important to automate simple processes or jobs in\nLinux. You need this here and there, I promise. Check out these three\nguides:\n\n<https://linuxconfig.org/linux-crontab-reference-guide>\n\n<https://www.ostechnix.com/a-beginners-guide-to-cron-jobs/>\n\nAnd, of course, Wikipedia, which is surprisingly good:\n<https://en.wikipedia.org/wiki/Cron>\n\nPro tip: Don't forget to end your cron files with an empty line or a\ncomment, otherwise it will not work.\n\n### Packet Management\n\nLinux tips are the second part of this podcast:\n<https://anchor.fm/andreaskayy/embed/episodes/IT-Networking-Infrastructure-and-Linux-031-PoDS-e242bh>\n\n\nDocker\n------\n\n### What is Docker, and What Do You Use It for?\n\nHave you played around with Docker yet? If you're a data science learner\nor a data scientist, you need to check it out!\n\nIt's awesome because it simplifies the way you can set up development\nenvironments for data science. If you want to set up a dev environment,\nyou usually have to install a lot of packages and tools.\n\n#### Don't Mess Up Your System\n\nWhat this does is basically mess up your operating system. If you're\njust starting out, you don't know which packages you need to install. You don't\nknow which tools you need to install.\n\nIf you want to, for instance, start with Jupyter Notebooks, you need to\ninstall that on your PC somehow. Or, you need to start installing tools\nlike PyCharm or Anaconda.\n\nAll that gets added to your system, and so you mess up your system more\nand more and more. 
What Docker brings you, especially if you're on a Mac\nor a Linux system, is simplicity.\n\n#### Preconfigured Images\n\nDocker is easy to install on those systems. Another cool thing\nabout Docker images is that you can just search for them in the Docker store,\ndownload them, and run them on your system.\n\nBecause they run in a completely pre-configured environment, you don't need\nto think about setup. You go to the Docker library, and you search for Deep\nLearning, GPU and Python.\n\nYou get a list of images you can download. You download one, start it\nup, go to the browser and hit up the URL, and just start coding.\n\nStart doing the work. The only other thing you need to do is bind some\ndrives to that instance so you can exchange files. And then, that's it!\n\nThere is no way that you can crash or mess up your system. It's all\nencapsulated into Docker. This works because Docker has native\naccess to your hardware.\n\n#### Take It With You\n\nIt's not a completely virtualized environment like VirtualBox. An\nimage has the upside that you can take it wherever you want. So, if\nyou're on your PC at home, use it there.\n\nMake a quick build, take the image, and go somewhere else. Install the\nimage, which is usually quite fast, and just use it like you're at home.\n\nIt's that awesome!\n\n### Kubernetes Container Deployment\n\nI am getting into Docker a lot more myself, for a few different reasons.\n\nWhat I'm looking for is using Docker with Kubernetes. With Kubernetes,\nyou can automate the whole container deployment process.\n\nThe idea is that you have a cluster of machines. Let's say you have\na 10-server cluster and you run Kubernetes on it.\n\nKubernetes lets you spin up Docker containers on demand to execute\ntasks. 
You can set how many resources, like CPU, RAM, and network, your\nDocker container can use.\n\nYou can basically spin up containers on the cluster, on demand, whenever\nyou need to do an analytics task.\n\nThat's perfect for data science.\n\n\n### How to Create, Start, Stop a Container\n\n### Docker Micro-Services?\n\n### Kubernetes\n\n### Why and How to Do Docker Container Orchestration\n\nPodcast about how data science learners use Docker (for data\nscientists):\n<https://anchor.fm/andreaskayy/embed/episodes/Learn-Data-Science-Go-Docker-e10n7u>\n\n### Useful Docker Commands\n\nCreate and start a container from an image (options go before the image name):\n\n    docker run --network NETWORK IMAGE\n\nStart a stopped container:\n\n    docker start CONTAINER\n\nStop a running container:\n\n    docker stop CONTAINER\n\nList all running containers:\n\n    docker ps\n\nList all containers including stopped ones:\n\n    docker ps -a\n\nInspect the container configuration (e.g. network settings):\n\n    docker inspect CONTAINER\n\nList all available virtual networks:\n\n    docker network ls\n\nCreate a new network:\n\n    docker network create --driver bridge NETWORK\n\nConnect a running container to a network:\n\n    docker network connect NETWORK CONTAINER\n\nDisconnect a running container from a network:\n\n    docker network disconnect NETWORK CONTAINER\n\nRemove a network:\n\n    docker network rm NETWORK\n\n\nThe Cloud\n---------\n\n### IaaS vs. PaaS vs. SaaS\n\nCheck out this podcast. It will help you understand the\ndifference and how to decide what to use.\n\n| Podcast episode: #082 Reading Tweets With Apache Nifi & IaaS vs PaaS vs SaaS\n|------------------|\n|In this episode, we talk about the differences between infrastructure as a service, platform as a service, and software as a service. 
Then, we install the Nifi Docker container and look into how we can extract the Twitter data.\n| [Watch on YouTube](https://youtu.be/pWuT4UAocUY) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/082-Reading-Tweets-With-Apache-Nifi--IaaS-vs-PaaS-vs-SaaS-e45j50)|\n\n\n### AWS, Azure, IBM, Google\n\nEach of these has its own answer to IaaS, PaaS, and SaaS. Pricing and\npricing models vary greatly between providers. Likewise, each\nprovider's service may have limitations and strengths.\n\n#### AWS\n\nHere is the [full list of AWS services](https://www.amazonaws.cn/en/products/). Studying for the [AWS Certified Cloud Practitioner](https://aws.amazon.com/certification/certified-cloud-practitioner/?ch=cta&cta=header&p=2) and/or [AWS Certified Solutions Architect](https://aws.amazon.com/certification/certified-solutions-architect-associate/?ch=sec&sec=rmg&d=1) exams can be helpful to quickly gain an understanding of all these services.\nHere are links for free digital training for the [AWS Certified Cloud Practitioner](https://explore.skillbuilder.aws/learn/public/learning_plan/view/82/cloud-foundations-learning-plan) and [AWS Certified Solutions Architect Associate](https://explore.skillbuilder.aws/learn/public/learning_plan/view/78/architect-learning-plan).\n\nHere is a free 17-hour [Data Analytics Learning plan](https://explore.skillbuilder.aws/learn/public/learning_plan/view/97/data-analytics-learning-plan) for AWS's [Analytics](https://aws.amazon.com/big-data/datalakes-and-analytics/?nc2=h_ql_prod_an)/Data Engineering services.\n\n#### Azure\n[Full list of Azure services](https://azure.microsoft.com/en-us/services/).\n[Get started with mini courses](https://docs.microsoft.com/en-us/learn/browse/).\n\n#### IBM\n\n#### Google\n\nGoogle Cloud Platform offers a wide, ever-evolving variety of services.\n[List of GCP services with brief descriptions](https://github.com/gregsramblings/google-cloud-4-words). 
In\nrecent years, documentation and tutorials have come a long way to help you\n[get started with\nGCP](https://cloud.google.com/gcp/getting-started/). You can start with\na free account, but to use many of the services, you will need to turn on\nbilling. Once you do enable billing, always remember to turn off services\nthat you have spun up for learning purposes. It is also a good idea to\nturn on billing limits and alerts.\n\n### Cloud vs. On-Premises\n\n| Podcast episode: #076 Cloud vs. On-Premise\n|------------------|\n|How to choose between cloud and on-premises, pros and cons and what you have to think about. There are good reasons to not go cloud. Also, thoughts on how to choose between the cloud providers by just comparing instance prices. Otherwise, the comparison will drive you insane. My suggestion: Basically use them as IaaS and something like Cloudera as PaaS. Then build your solution on top of that.  \n| [Watch on YouTube](https://youtu.be/BAzj0yGcrnE) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/076-Cloud-vs-On-Premise-How-To-Decide-e45ivk)|\n\n\n### Security\n\nListen to a few thoughts about the cloud in this podcast:\n<https://anchor.fm/andreaskayy/embed/episodes/Dont-Be-Arrogant-The-Cloud-is-Safer-Then-Your-On-Premise-e16k9s>\n\n### Hybrid Clouds\n\nHybrid clouds are a mixture of on-premises and cloud deployment. A very\ninteresting example of this is Google Anthos:\n\n<https://cloud.google.com/anthos/>\n\n\n# Data Scientists and Machine Learning\n\nData scientists aren't like every other scientist.\n\nData scientists do not wear white coats or work in high-tech labs full\nof science fiction movie equipment. They work in offices just like you\nand me.\n\nWhat sets them apart from most of us is that they are math experts. 
They\nuse linear algebra and multivariable calculus to create new insight from\nexisting data.\n\nWhat exactly does this insight look like?\n\nHere's an example:\n\nAn industrial company produces a lot of products that need to be tested\nbefore shipping.\n\nUsually such tests take a lot of time because there are hundreds of\nthings to be tested. All to make sure that your product is not broken.\n\nWouldn't it be great to know early if a test will fail ten steps down the\nline? If you knew that, you could skip the other tests and just trash the\nproduct or repair it.\n\nThat's exactly where a data scientist can help you, big-time. This field\nis called predictive analytics and the technique of choice is machine\nlearning.\n\nMachine what? Learning?\n\nYes, machine learning. It works like this:\n\nYou feed an algorithm with measurement data. It generates a model and\noptimises it based on the data you fed it with. That model basically\nrepresents a pattern of what your data looks like. You show that model\nnew data and the model will tell you if the data still resembles the\ndata you have trained it with. This technique can also be used to\npredict machine failure in advance. Of course\nthe whole process is not that simple.\n\nThe actual process of training and applying a model is not that hard. A\nlot of work for the data scientist is to figure out how to pre-process\nthe data that gets fed to the algorithms.\n\nIn order to train an algorithm you need useful data. If you use just any data\nfor the training, the resulting model will be very unreliable.\n\nAn unreliable model for predicting machine failure would tell you that\nyour machine is damaged even if it is not. Or even worse: It would tell\nyou the machine is ok even when there is a malfunction.\n\nModel outputs are very abstract. 
You also need to post-process the model\noutputs to receive the outputs you desire.\n\n![The Machine Learning Pipeline](/images/Machine-Learning-Pipeline.jpg)\n\n\n## Machine Learning Workflow\n\n![The Machine Learning Workflow](/images/Machine-Learning-Workflow.jpg)\n\nData Scientists and Data Engineers. How does that all fit together?\n\nYou have to look at the data science process. How stuff is created and how data\nscience is done. How machine learning is\ndone.\n\nThe machine learning process shows that you start with a training phase. A phase where you are basically training the algorithms to create the right output.\n\nIn the learning phase you have the input parameters, basically the configuration of the model, and you have the input data.\n\nWhat you're doing is training the algorithm. During training, the algorithm modifies the training\nparameters. It also modifies the data it uses, and then you get an output.\n\nOnce you get an output, you evaluate it. Is that output okay, or is that output not the desired output?\n\nIf the output is not what you were looking for, you continue with the training phase.\n\nYou retrain the model hundreds, thousands, hundreds of thousands of times. Of course all this is being done automatically.\n\nOnce you are satisfied with the output, you put the model into production. In production it is no longer fed with training\ndata. It's fed with the live data.\n\nIt's evaluating the input data live and putting out live results.\n\nSo, you went from training to production and then what?\n\nWhat you do is monitor the output. If the output keeps making sense, all good!\n\nIf the output of the model changes and it's no longer what you expected, it means the model doesn't work anymore.\n\nYou need to trigger a retraining of the model. It basically gets trained again.\n\nOnce you are again satisfied with the output, you put it into production again. 
It replaces the one in production.\n\nThis is the overall process of how machine learning works. It's how the learning part of data science works.\n\n\n## Machine Learning Model and Data\n\n![The Machine Learning Model](/images/Machine-Learning-Model.jpg)\n\nNow that's all very nice.\n\nWhen you look at it, you have two very important places where you have data.\n\nIn the training phase you have two types of data:\nData that you use for the training. Data that basically configures the model, the hyperparameter configuration.\n\nOnce you're in production you have the live data that is streaming in. Data that is coming in from an app, from\nan IoT device, logs, or whatever.\n\nA data catalog is also important. It explains which features are available and how different data sets are labeled.\n\nAll different types of data. Now, here comes the engineering part.\n\nThe data engineer's part is making this data available. Available to the data scientist and the machine learning process.\n\nSo when you look at the model, on the left side you have your hyperparameter configuration. You need to store and manage these configurations somehow.\n\nThen you have the actual training data.\n\nThere's a lot going on with the training data:\n\nWhere does it come from? Who owns it? Which is basically data governance.\n\nWhat's the lineage? Have you modified this data? What did you do, what was the basis, the raw data?\n\nYou need to access all this data somehow. In training and in production.\n\nIn production you need to have access to the live data.\n\nAll this is the data engineer's job. Making the data available.\n\nFirst an architect needs to build the platform. This can also be a good data engineer.\n\nThen the data engineer needs to build the pipelines. How is the data coming in, and how does the platform\nconnect to other systems?\n\nHow is that data then put into storage? Is pre-processing necessary for the algorithms? 
The data engineer will do it.\n\nOnce the data and the systems are available, it's time for the machine learning part.\n\nIt is ready for processing. Basically ready for the data scientist.\n\nOnce the analytics is done, the data engineer needs to build pipelines to make the results accessible again, for instance for other analytics processes, for APIs, for front ends and so on.\n\nAll in all, the data engineer's part is a computer science part.\n\nThat's why I love it so much :)\n"
  },
  {
    "path": "sections/03-AdvancedSkills.md",
    "content": "\nAdvanced Data Engineering Skills\n================================\n\n## Contents\n\n- [Data Science Platform](03-AdvancedSkills.md#data-science-platform)\n  - [Why a Good Data Platform Is Important](03-AdvancedSkills.md#why-a-good-data-platform-is-important)\n  - [Big Data vs Data Science and Analytics](03-AdvancedSkills.md#Big-Data-vs-Data-Science-and-Analytics)\n  - [The 4 Vs of Big Data](03-AdvancedSkills.md#the-4-vs-of-big-data)\n  - [Why Big Data](03-AdvancedSkills.md#why-big-data)\n    - [Planning is Everything](03-AdvancedSkills.md#planning-is-everything)\n    - [The Problem with ETL](03-AdvancedSkills.md#the-problem-with-etl)\n    - [Scaling Up](03-AdvancedSkills.md#scaling-up)\n    - [Scaling Out](03-AdvancedSkills.md#scaling-out)\n    - [When not to Do Big Data](03-AdvancedSkills.md#please-dont-go-big-data)\n- [81 Platform & Pipeline Design Questions](03-AdvancedSkills.md#81-platform-and-pipeline-design-questions)\n  - [Data Source Questions](03-AdvancedSkills.md#data-source-questions)\n  - [Goals and Destination Questions](03-AdvancedSkills.md#goals-and-destination-questions)\n- [Connect](03-AdvancedSkills.md#connect)\n  - [REST APIs](03-AdvancedSkills.md#rest-apis)\n    - [API Design](03-AdvancedSkills.md#api-design)\n    - [Implementation Frameworks](03-AdvancedSkills.md#implementation-frameworks)\n    - [Security](03-AdvancedSkills.md#security)\n  - [Apache Nifi](03-AdvancedSkills.md#apache-nifi)\n  - [Logstash](03-AdvancedSkills.md#logstash)\n- [Buffer](03-AdvancedSkills.md#buffer)\n  - [Apache Kafka](03-AdvancedSkills.md#apache-kafka)\n    - [Why a Message Queue Tool?](03-AdvancedSkills.md#why-a-message-queue-tool)\n    - [Kafka Architecture](03-AdvancedSkills.md#kafka-architecture)\n    - [Kafka Topics](03-AdvancedSkills.md#what-are-topics)\n    - [Kafka and Zookeeper](03-AdvancedSkills.md#what-does-zookeeper-have-to-do-with-kafka)\n    - [How to Produce and Consume 
Messages](03-AdvancedSkills.md#how-to-produce-and-consume-messages)\n    - [Kafka Commands](03-AdvancedSkills.md#kafka-commands)\n  - [Apache Redis Pub-Sub](03-AdvancedSkills.md#redis-pub-sub)\n  - [AWS Kinesis](03-AdvancedSkills.md#apache-kafka)\n  - [Google Cloud PubSub](03-AdvancedSkills.md#google-cloud-pubsub)\n- [Processing Frameworks](03-AdvancedSkills.md#processing-frameworks)\n  - [Lambda and Kappa Architecture](03-AdvancedSkills.md#lambda-and-kappa-architecture)\n  - [Batch Processing](03-AdvancedSkills.md#batch-processing)\n  - [Stream Processing](03-AdvancedSkills.md#stream-processing)\n    - [Three Methods of Streaming](03-AdvancedSkills.md#three-methods-of-streaming)\n    - [At Least Once](03-AdvancedSkills.md#at-least-once)\n    - [At Most Once](03-AdvancedSkills.md#at-most-once)\n    - [Exactly Once](03-AdvancedSkills.md#exactly-once)\n    - [Check The Tools](03-AdvancedSkills.md#check-the-tools)\n  - [Should You do Stream or Batch Processing](03-AdvancedSkills.md#should-you-do-stream-or-batch-processing)\n  - [Is ETL still relevant for Analytics?](03-AdvancedSkills.md#is-etl-still-relevant-for-analytics)\n  - [MapReduce](03-AdvancedSkills.md#mapreduce)\n    - [How Does MapReduce Work](03-AdvancedSkills.md#How-does-mapreduce-work)\n    - [MapReduce](03-AdvancedSkills.md#mapreduce)\n    - [MapReduce Example](03-AdvancedSkills.md#example)\n    - [MapReduce Limitations](03-AdvancedSkills.md#What-is-the-limitation-of-mapreduce)\n  - [Apache Spark](03-AdvancedSkills.md#apache-spark)\n    - [What is the Difference to MapReduce?](03-AdvancedSkills.md#what-is-the-difference-to-MapReduce)\n    - [How Spark Fits to Hadoop](03-AdvancedSkills.md#how-does-spark-fit-to-hadoop)\n    - [Spark vs Hadoop](03-AdvancedSkills.md#wheres-the-difference)\n    - [Spark and Hadoop a Perfect Fit](03-AdvancedSkills.md#spark-and-hadoop-is-a-perfect-fit)\n    - [Spark on YARn](03-AdvancedSkills.md#spark-on-yarn)\n    - [My Simple Rule of 
Thumb](03-AdvancedSkills.md#my-simple-rule-of-thumb)\n    - [Available Languages](03-AdvancedSkills.md#available-languages)\n    - [Spark Driver Executor and SparkContext](03-AdvancedSkills.md#how-spark-works-driver-executor-sparkcontext)\n    - [Spark Batch vs Stream processing](03-AdvancedSkills.md#spark-batch-vs-stream-processing)\n    - [How Spark uses Data From Hadoop](03-AdvancedSkills.md#How-does-spark-use-data-from-hadoop)\n    - [What are RDDs and How to Use Them](03-AdvancedSkills.md#what-are-rdds-and-how-to-use-them)\n    - [SparkSQL How and Why to Use It](03-AdvancedSkills.md#available-languages)\n    - [What are Dataframes and How to Use Them](03-AdvancedSkills.md#what-are-dataframes-how-to-use-them)\n    - [Machine Learning on Spark (TensorFlow)](03-AdvancedSkills.md#machine-learning-on-spark-tensor-flow)\n    - [MLlib](03-AdvancedSkills.md#mllib)\n    - [Spark Setup](03-AdvancedSkills.md#spark-setup)\n    - [Spark Resource Management](03-AdvancedSkills.md#spark-resource-management)\n  - [AWS Lambda](03-AdvancedSkills.md#apache-flink)  \n  - [Apache Flink](03-AdvancedSkills.md#apache-flink)\n  - [Elasticsearch](03-AdvancedSkills.md#elasticsearch)\n  - [Apache Drill](03-AdvancedSkills.md#apache-drill)\n  - [StreamSets](03-AdvancedSkills.md#streamsets)\n- [Store](03-AdvancedSkills.md#store)\n  - [Analytical Data Stores](03-AdvancedSkills.md#analytical-data-stores)\n    - [Data Warehouse vs Data Lake](03-AdvancedSkills.md#data-warehouse-vs-data-lake)\n    - [Snowflake and dbt](03-AdvancedSkills.md#snowflake-and-dbt)\n  - [Transactional Data Stores](03-AdvancedSkills.md#transactional-data-stores)\n    - [SQL Databases](03-AdvancedSkills.md#sql-databases)\n      - [PostgreSQL DB](03-AdvancedSkills.md#postgresql-db)\n      - [Database Design](03-AdvancedSkills.md#database-design)\n      - [SQL Queries](03-AdvancedSkills.md#sql-queries)\n      - [Stored Procedures](03-AdvancedSkills.md#stored-procedures)\n      - [ODBC/JDBC Server 
Connections](03-AdvancedSkills.md#odbc-jdbc-server-connections)\n    - [NoSQL Stores](03-AdvancedSkills.md#nosql-stores)\n      - [HBase KeyValue Store](03-AdvancedSkills.md#keyvalue-stores-hbase)\n      - [HDFS Document Store](03-AdvancedSkills.md#document-stores-hdfs)\n      - [MongoDB Document Store](03-AdvancedSkills.md#document-stores-mongodb)\n      - [Elasticsearch Document Store](03-AdvancedSkills.md#Elasticsearch-search-engine-and-document-store)\n      - [Graph Databases (Neo4j)](03-AdvancedSkills.md#graph-db-neo4j)\n      - [Impala](03-AdvancedSkills.md#impala)\n      - [Kudu](03-AdvancedSkills.md#kudu)\n      - [Apache Druid](03-AdvancedSkills.md#apache-druid)\n      - [InfluxDB Time Series Database](03-AdvancedSkills.md#influxdb-time-series-database)\n      - [Greenplum MPP Database](03-AdvancedSkills.md#mpp-databases-greenplum)\n    - [NoSQL Data Warehouses](03-AdvancedSkills.md#nosql-data-warehouses)\n      - [Hive Warehouse](03-AdvancedSkills.md#hive-warehouse)\n      - [Impala](03-AdvancedSkills.md#impala)\n- [Visualize](03-AdvancedSkills.md#visualize)\n  - [Android and IOS](03-AdvancedSkills.md#android-and-ios)\n  - [API Design for Mobile Apps](03-AdvancedSkills.md#how-to-design-apis-for-mobile-apps)\n  - [Dashboards](03-AdvancedSkills.md#dashboards)\n    - [Grafana](03-AdvancedSkills.md#grafana)\n    - [Kibana](03-AdvancedSkills.md#kibana)\n  - [Webservers](03-AdvancedSkills.md#how-to-use-webservers-to-display-content)\n    - [Tomcat](03-AdvancedSkills.md#tomcat)\n    - [Jetty](03-AdvancedSkills.md#jetty)\n    - [NodeRED](03-AdvancedSkills.md#nodered)\n    - [React](03-AdvancedSkills.md#react)\n  - [Business Intelligence Tools](03-AdvancedSkills.md#business-intelligence-tools)\n    - [Tableau](03-AdvancedSkills.md#tableau)\n    - [Power BI](03-AdvancedSkills.md#power-bi)\n    - [Quliksense](03-AdvancedSkills.md#quliksense)\n  - [Identity & Device Management](03-AdvancedSkills.md#Identity-and-device-management)\n    - [What Is A Digital 
Twin](03-AdvancedSkills.md#what-is-a-digital-twin)\n    - [Active Directory](03-AdvancedSkills.md#active-directory)\n- [Machine Learning](03-AdvancedSkills.md#machine-learning)\n  - [How to do Machine Learning in production](03-AdvancedSkills.md#how-to-domachine-learning-in-production)\n  - [Why machine learning in production is harder than you think](03-AdvancedSkills.md#why-machine-learning-in-production-is-harder-then-you-think)\n  - [Models Do Not Work Forever](03-AdvancedSkills.md#models-do-not-work-forever)\n  - [Where are The Platforms That Support Machine Learning](03-AdvancedSkills.md#where-are-the-platforms-that-support-this)\n  - [Training Parameter Management](03-AdvancedSkills.md#training-parameter-management)\n  - [How to Convince People That Machine Learning Works](03-AdvancedSkills.md#how-to-convince-people-machine-learning-works)\n  - [No Rules No Physical Models](03-AdvancedSkills.md#no-rules-no-physical-models)\n  - [You Have The Data. Use It!](03-AdvancedSkills.md#you-have-the-data-use-it)\n  - [Data is Stronger Than Opinions](03-AdvancedSkills.md#data-is-stronger-than-opinions)\n  - [AWS Sagemaker](03-AdvancedSkills.md#aws-sagemaker)\n\n\n\n## Data Science Platform\n\n### Why a Good Data Platform Is Important\n\n| Podcast Episode: #066 How To Do Data Science From A Data Engineers Perspective  \n|------------------|\n|A simple introduction to how to do data science in the context of the internet of things.\n| [Watch on YouTube](https://youtu.be/yp_cc4R0mGQ) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/066-How-To-Do-Data-Science-From-A-Data-Engineers-Perspective-e45imt)|\n\n### Big Data vs Data Science and Analytics\n\nI talked about the difference in this podcast:\n<https://anchor.fm/andreaskayy/embed/episodes/BI-vs-Data-Science-vs-Big-Data-e199hq>\n\n### The 4 Vs of Big Data\n\nIt is a complete misconception that big data is only about volume. 
Volume is only one part of what are often\ncalled the four V's of big data: Volume, velocity, variety and veracity.\n\n**Volume** is about the size - How much data you have\n\n**Velocity** is about the speed - How fast data is getting to you\n\nHow much data in a specific time needs to get processed or is coming\ninto the system. This is where the whole concept of streaming data and\nreal-time processing comes into play.\n\n**Variety** is about the variety - How different your data is\n\nThink of CSV files, PDFs, and stuff in XML. You may also have\nJSON log files, or data in some kind of key-value store.\n\nIt's about the variety of data types from different sources that you\nbasically want to join together. All to make an analysis based on that\ndata.\n\n**Veracity** is about the credibility - How reliable your data is\n\nThe issue with big data is that it is very unreliable.\n\nYou cannot really trust the data. Especially when you're coming from the\nInternet of Things (IoT) side. Devices use sensors for measurement of\ntemperature, pressure, acceleration and so on.\n\nYou cannot always be a hundred percent sure that the actual measurement is\nright.\n\nWhen you have data from, for instance, SAP that\nwas created by hand, you also have problems. As you know, we humans\nare bad at inputting stuff.\n\nEverybody articulates differently. We make mistakes, down to the spelling,\nand that can be a very difficult issue for analytics.\n\nI talked about the 4Vs in this podcast:\n<https://anchor.fm/andreaskayy/embed/episodes/4-Vs-Of-Big-Data-Are-Enough-e1h2ra>\n\n### Why Big Data?\n\nWhat I always emphasize is that the four V's are quite nice. They give you a\ngeneral direction.\n\nThere is a much more important issue: Catastrophic Success.\n\nWhat I mean by catastrophic success is that your project, your startup\nor your platform has more growth than you anticipated. 
Exponential\ngrowth is what everybody is looking for.\n\nBecause exponential growth is where the money is. It starts small and\ngets very big very fast. The classic hockey stick curve:\n\n1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384,\n.... BOOM!\n\nThink about it. It starts small and quite slow, but gets very big very\nfast.\n\nYou get a lot of users or customers who are paying money to use your\nservice, the platform or whatever. If you have a system that is not\nequipped to scale and process the data, the whole system breaks down.\n\nThat's catastrophic success. You are so successful and grow so fast that\nyou cannot fulfill the demand anymore. And so you fail and it's all\nover.\n\nIt's not like you can just make that up as you go, or that you can\nforesee months or weeks ahead that the current system won't work\nanymore.\n\n### Planning is Everything\n\nIt all happens very, very fast and you cannot react anymore. Some\nplanning and analysis of the potential of your business\ncase is necessary.\n\nThen you need to decide if you actually have big data or not.\n\nYou need to decide if you use big data tools. This means when you\nconceptualize the whole infrastructure it might look ridiculous to\nactually focus on big data tools.\n\nBut in the long run it will help you a lot. Good planning will get a lot\nof problems out of the way, especially if you think about streaming data\nand real-time analytics.\n\n### The Problem with ETL\n\nA typical old-school platform deployment would look like the picture\nbelow. Devices use a data API to upload data that gets stored in a SQL\ndatabase. An external analytics tool queries data and uploads the\nresults back to the SQL DB. 
Users then use the user interface to display\ndata stored in the database.\n\n![Common SQL Platform Architecture](/images/Common-SQL-Architecture.jpg)\n\nNow, when the front end queries data from the SQL database the following\nthree steps happen:\n\n- The database extracts all the needed rows from the storage. (E)\n- The extracted data gets transformed, for instance sorted by timestamp or\n  something a lot more complex. (T)\n- The transformed data is loaded to the destination (the user interface)\n  for chart creation. (L)\n\nWith exploding amounts of stored data the ETL process starts being a\nreal problem.\n\nAnalytics is working with large data sets, for instance whole days,\nweeks, months or more. Data sets are very big, like 100 GB or terabytes.\nThat can mean billions or trillions of rows.\n\nAs a result, the ETL process for large data sets takes\nlonger and longer. Very quickly the ETL performance gets so bad it won't\ndeliver results to analytics anymore.\n\nA traditional solution to overcome these performance issues is trying to\nincrease the performance of the database server. That's what's called\nscaling up.\n\n### Scaling Up\n\nTo scale up the system and therefore increase ETL speeds administrators\nresort to more powerful hardware by:\n\n- Speeding up the extract performance by adding faster disks to physically\n  read the data faster.\n- Increasing RAM for row caching. What is already in memory does not have\n  to be read from slow disk drives.\n- Using more powerful CPUs for better transform performance (more RAM helps\n  here as well).\n- Increasing or optimising networking performance for faster data delivery\n  to the front end and analytics.\n\nIn summary: Scaling up the system is fairly easy.\n\n![Scaling up a SQL Database](/images/SQL-Scaling-UP.jpg)\n\nBut with exponential growth it is obvious that sooner or later (sooner\nrather than later) you will run into the same problems again. 
At some\npoint you simply cannot scale up anymore because you already have a\nmonster system, or you cannot afford to buy more expensive hardware.\n\nThe next step you could take would be scaling out.\n\n### Scaling Out\n\nScaling out is the opposite of scaling up. Instead of building bigger\nsystems the goal is to distribute the load between many smaller systems.\n\nThe easiest way of scaling out an SQL database is using a storage area\nnetwork (SAN) to store the data. You can then use up to eight SQL\nservers (explain), attach them to the SAN and let them handle queries.\nThis way load gets distributed between those eight servers.\n\n![Scaling out a SQL Database](/images/SQL-Scaling-Out.jpg)\n\nOne major downside of this setup is that, because the storage is shared\nbetween the SQL servers, it can only be used as a read-only database.\nUpdates have to be done periodically, for instance once a day. To do\nupdates all SQL servers have to detach from the database. Then one server\nattaches the DB in read-write mode and refreshes the data. This\nprocedure can take a while if a lot of data needs to be uploaded.\n\nThis Link (missing) to a Microsoft MSDN page has more options of scaling\nout an SQL database for you.\n\nI deliberately don't want to get into details about possible scaling out\nsolutions. The point I am trying to make is that while it is possible to\nscale out SQL databases, it is very complicated.\n\nThere is no perfect solution. Every option has its upsides and downsides.\nOne common major issue is the administrative effort that you need to\ninvest to implement and maintain a scaled-out solution.\n\n### Please don't go Big Data\n\nIf you don't run into scaling issues, please do not use big data tools!\n\nBig data is an expensive thing. A Hadoop cluster for instance needs at\nleast five servers to work properly. 
More is better.\n\nBelieve me, this stuff costs a lot of money.\n\nEspecially when you take maintenance and development on top\nof big data tools into account.\n\nIf you don't need it, it makes absolutely no sense at all!\n\nOn the other hand: If you really need big data tools they will save your\nass :)\n\n## 81 Platform and Pipeline Design Questions\nMany people ask: \"How do you select the platform and tools, and design the pipelines?\"\nThe options seem infinite. Technology, however, should never dictate the decisions.\n\nHere are 81 questions that you should answer when starting a project.\n\n\n### Data Source Questions\n\n#### Data Origin and Structure\n- **What is the source?** Understand the \"device.\"\n- **What is the format of the incoming data?** (e.g., JSON, CSV, Avro, Parquet)\n- **What’s the schema?**\n- **Is the data structured, semi-structured, or unstructured?**\n- **What is the data type?** Understand the content of the data.\n- **Is the schema well-defined, or is it dynamic?**\n- **How are changes in the data structure from the source (schema evolution) handled?**\n\n#### Data Volume & Velocity\n- **How much data is transmitted per transmission?**\n- **How fast is the data coming in?** (e.g., messages per minute)\n- **What is the maximum data volume expected per source per day?**\n- **What scaling of sources/data is expected?**\n- **Are there peaks for incoming data?**\n- **How much data is posted per day across all sources?**\n- **How does the data volume fluctuate?** (e.g., seasonal peaks, hourly/daily variations)\n- **How will the system handle bursts of data?** (e.g., throttling or buffering)\n\n#### Source Reliability & Redundancy\n- **Is there data arriving late?**\n- **Is there a risk of duplicate data from the source?** How will we handle de-duplication?\n- **How reliable are the sources?** What’s the expected failure rate?\n- **How do we handle data corruption or loss during transmission?**\n- **What happens if a source goes 
offline?** Is there a fallback or failover source?\n- **Do we need to retry failed transmissions or have fault-tolerance mechanisms in place?**\n\n#### Data Extraction & New Sources\n- **Do we need to extract the data from the sources?**\n- **How many sources are there?**\n- **Will new sources be implemented?**\n\n#### Data Source Connectivity & Authentication\n- **How is the data arriving?** (API, bucket, etc.)\n- **How is the authentication done?**\n- **What kind of connection is required for the data source?** (e.g., streaming, batch, API)\n- **What protocols are used for data ingestion?** (e.g., REST, WebSocket, FTP)\n- **Are there any rate limits or quotas imposed by the data source?**\n- **How do we handle credentials?** Is there an API?\n- **What is the retry strategy if data fails to be processed or transmitted?**\n\n#### Data Security & Compliance\n- **Does the data need to be encrypted at the source before being transmitted?**\n- **Are there any compliance frameworks (e.g., GDPR, HIPAA) that the source data must adhere to?**\n- **Is there a requirement for data masking or obfuscation at the source?**\n\n#### Metadata & Audit\n- **Is there metadata for the client transmission stored somewhere?**\n- **What metadata should be captured for each transmission?** (e.g., record counts, latency)\n- **How do we track and log data ingestion events for audit purposes?**\n- **Is there a need for tracking data lineage?** (i.e., source origin and changes over time)\n\n---\n\n### Goals and Destination Questions\n\n#### Use Case & Data Consumption\n- **What kind of use case is this?** (Analytics, BI, ML, Transactional processing, Visualization, User Interfaces, APIs)\n- **What are the typical use cases that require this data?** (e.g., predictive analytics, operational dashboards)\n- **What are the downstream systems or platforms that will consume this data?**\n- **How critical is real-time data versus historical data in this use case?**\n\n#### Data Query & Delivery\n- 
**How is the data visualized?** (raw data, aggregated data)\n- **How much raw data is processed at once?**\n- **How much data is cold data, and how often is cold data queried?**\n- **How fast do the results need to appear?**\n- **How much data is going to be queried at once?**\n- **How fresh does the data need to be?**\n- **How often is the data queried?** (frequency)\n- **What are the SLAs for delivering data to downstream systems or applications?**\n\n#### Aggregation & Modeling\n- **How is the data aggregated?** (by devices, topic, time)\n- **When does the aggregation happen?** (on query, on schedule, while streaming)\n- **What kind of data models are needed for this use case?** (e.g., star schema, snowflake schema)\n- **Is there a need for pre-aggregations to speed up queries?**\n- **Should partitioning or indexing strategies be implemented to optimize query performance?**\n\n#### Performance & Availability\n- **What is the processing time requirement?**\n- **What is the availability of analytics output?** (input vs output delay)\n- **How fresh does the data need to be?**\n- **What are the performance expectations for query speed?**\n- **What is the acceptable query response time for end-users?**\n- **How will the system handle an increase in concurrent queries from multiple users?**\n- **What is the expected lag between data ingestion and availability for querying?**\n- **Do we need horizontal scaling for query engines or databases?**\n\n#### Data Lifecycle & Retention\n- **What’s the data retention time?**\n- **How often is data archived or moved to lower-cost storage?**\n- **Will old data need to be transformed or reprocessed for new use cases?**\n- **What are the data retention policies?** (e.g., hot vs cold storage)\n- **How will the use case evolve as the data grows?** Will this affect how data is consumed or visualized?\n\n#### Monitoring & Debugging\n- **How will data delivery to the destination be monitored?** (e.g., time-to-load, query failures)\n- 
**How will we monitor data pipeline health at the destination?** (e.g., throughput, latency)\n- **What tools or methods will be used for debugging data delivery failures or performance bottlenecks?**\n- **What metrics should be tracked to ensure data pipeline health?** (e.g., latency, throughput)\n- **How do we handle issues such as data corruption or incomplete data at the destination?**\n\n#### Data Access & Permissions\n- **Who is working with the platform, and who has access to query or visualize the data?**\n- **Which tools are used to query the data?**\n- **What kind of data export capabilities are required?** (e.g., CSV, API, direct database access)\n- **Is role-based access control (RBAC) needed to segment data views for different users?**\n- **How will access to sensitive data be managed?** (e.g., row-level security, encryption)\n\n#### Scaling & Future Requirements\n- **What are the scalability requirements for the data platform as data volume grows?**\n- **How will future business goals or scalability needs affect the design of data aggregation and retention strategies?**\n- **How will the system handle an increasing load as more users query data or as data volume grows?**\n\n\n## Connect\n\n### REST APIs\n\nAPIs or Application Programming Interfaces are the cornerstones of any\ngreat data platform.\n\n| Podcast Episode: #033 How APIs Rule The World\n|------------------|\n|Strong APIs make a good platform. In this episode I talk about why you need APIs and why Twitter is a great example. Especially JSON APIs are my personal favorite. Because JSON is also important in the Big Data world, for instance in log analytics. How? Check out this episode!  \n| [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/How-APIs-Rule-The-World--PoDS-033-e24ttq)|\n\n#### API Design\n\nIn this podcast episode we look into the Twitter API. 
It's a great\nexample of how to build an API.\n\n| Podcast Episode: #081 Twitter API Research Data Engineering Course Part 5\n|------------------|\n|In this episode we look into the Twitter API documentation, which I love by the way. How can we get old tweets for certain hashtags and how can we get current live tweets for these hashtags?\n| [Watch on YouTube](https://youtu.be/UnAXKxeIlyg) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/081-How-to-get-tweets-from-the-Twitter-API-e45j32)|\n\n\n#### Payload compression attacks\n\nHow to defend your server with zip bombs:\nhttps://www.sitepoint.com/how-to-defend-your-website-with-zip-bombs/\n\n#### Implementation Frameworks\n\nJersey:\n\n<https://eclipse-ee4j.github.io/jersey.github.io/documentation/latest/getting-started.html>\n\nTutorial – REST API design and implementation in Java with Jersey and Spring:\nhttps://www.codepedia.org/ama/tutorial-rest-api-design-and-implementation-in-java-with-jersey-and-spring/\n\nSwagger:\n\n<https://github.com/swagger-api/swagger-core/wiki/Swagger-2.X---Getting-started>\n\nJersey vs Swagger:\n\n<https://stackoverflow.com/questions/36997865/what-is-the-difference-between-swagger-api-and-jax-rs>\n\nSpring Framework:\n\n<https://spring.io/>\n\nWhen to use Spring or Jersey:\n\n<https://stackoverflow.com/questions/26824423/what-is-the-difference-among-spring-rest-service-and-jersey-rest-service-and-spr>\n\n#### OAuth security\n\n### Apache Nifi\n\nNifi is one of those tools that I consider high-potential. It\nallows you to create a data pipeline very easily.\n\nRead data from a REST API and post it to Kafka? No problem. Read data from\nKafka and put it into a database? No problem.\n\nIt's super versatile and you can do everything on the UI.\n\nI use it in Part 3 of this Document. Check it out.\n\nCheck out the Apache Nifi FAQ website. 
Also look into the documentation\nto find all possible data sources and sinks of Nifi:\n\n<https://nifi.apache.org/faq.html>\n\nHere's a great blog about Nifi:\n\n<https://www.datainmotion.dev>\n\n### Logstash\n\n<https://www.elastic.co/products/logstash>\n\n### FluentD\n\nData Collector\n\nhttps://www.fluentd.org/\n\n### Apache Flume\n\nhttps://flume.apache.org/\n\n### Sqoop\n\nhttps://sqoop.apache.org/\n\n### Azure IoTHub\n\nhttps://azure.microsoft.com/en-us/services/iot-hub/\n\n\n\n## Buffer\n\n### Apache Kafka\n\n#### Why a message queue tool?\n\n#### Kafka architecture\n\n#### What are topics\n\n#### What does Zookeeper have to do with Kafka\n\n#### How to produce and consume messages\n\nMy YouTube video how to set up Kafka at home:\n<https://youtu.be/7F9tBwTUSeY>\n\nMy YouTube video how to write to Kafka: <https://youtu.be/RboQBZvZCh0>\n\n#### KAFKA Commands\n\nStart Zookeeper container for Kafka:\n\n    docker run -d --name zookeeper-server   \\\n        --network app-tier   \\\n        -e ALLOW_ANONYMOUS_LOGIN=yes    \\\n        bitnami/zookeeper:latest\n\nStart Kafka container:\n\n    docker run -d --name kafka-server  \\\n        --network app-tier  \\\n        -e KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper-server:2181  \\\n        -e ALLOW_PLAINTEXT_LISTENER=yes  \\\n        bitnami/kafka:latest\n\n### Redis Pub-Sub\n\n### AWS Kinesis\n\n### Google Cloud PubSub\n\n## Processing Frameworks\n\n### Lambda and Kappa Architecture\n\n| Podcast Episode: #077 Lambda Architecture and Kappa Architecture\n|------------------|\n|In this stream we talk about the lambda architecture with stream and batch processing as well as a alternative the Kappa Architecture that consists only of streaming. Also Data engineer vs data scientist and we discuss Andrew Ng’s AI Transformation Playbook.  
\n| [Watch on YouTube](https://youtu.be/iUOQPyHN9-0) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/077-Lambda--Kappa-Architecture-e45j0r)|\n\n\n### Batch Processing\n\nAsk the big questions. Remember your last yearly tax statement?\n\nYou break out the folders. You run around the house searching for the\nreceipts.\n\nAll that fun stuff.\n\nWhen you finally found everything you fill out the form and send it on\nits way.\n\nDoing the tax statement is a prime example of a batch process.\n\nData comes in and gets stored, analytics loads the data from storage and\ncreates an output (insight):\n\n![Batch Processing Pipeline](/images/Simple-Batch-Processing-Workflow.jpg)\n\nBatch processing is something you do either without a schedule or on a\nschedule (tax statement). It is used to ask the big questions and gain\nthe insights by looking at the big picture.\n\nTo do so, batch processing jobs use large amounts of data. This data is\nprovided by storage systems like Hadoop HDFS.\n\nThey can store lots of data (petabytes) without a problem.\n\nResults from batch jobs are very useful, but the execution time is high.\nBecause the amount of used data is high.\n\nIt can take minutes or sometimes hours until you get your results.\n\n### Stream Processing\n\nGain instant insight into your data.\n\nStreaming allows users to make quick decisions and take actions based on\n\"real-time\" insight. Contrary to batch processing, streaming processes\ndata on the fly, as it comes in.\n\nWith streaming you don't have to wait minutes or hours to get results.\nYou gain instant insight into your data.\n\nIn the batch processing pipeline, the analytics was after the data\nstorage. It had access to all the available data.\n\nStream processing creates insight before the data storage. It has only\naccess to fragments of data as it comes in.\n\nAs a result the scope of the produced insight is also limited. 
Because\nthe big picture is missing.\n\n![Stream Processing Pipeline](/images/Simple-Stream-Processing-Workflow.jpg)\n\nOnly with streaming analytics are you able to create advanced services\nfor the customer. Netflix for instance incorporated stream processing\ninto Chukwa V2.0 and the new Keystone pipeline.\n\nOne example of advanced services through stream processing is the\nNetflix \"Trending Now\" feature. Check out the Netflix case study.\n\n#### Three methods of streaming\n\nIn stream processing sometimes it is ok to drop messages, other times it\nis not. Sometimes it is fine to process a message multiple times, other\ntimes that needs to be avoided like hell.\n\nToday's topic is the different methods of streaming: at most once, at\nleast once and exactly once.\n\nWhat they mean and why it is so important to keep them in mind when\ncreating a solution is what you will find out in this article.\n\n#### At Least Once\n\nAt least once means a message gets processed in the system once or\nmultiple times. So with at least once it's not possible that a message\ngets into the system and does not get processed.\n\nIt does not get dropped or lost somewhere in the system.\n\nOne example where at least once processing can be used is fleet\nmanagement for cars. You get GPS data from cars and that\ndata is transmitted with a timestamp and the GPS coordinates.\n\nIt's important that you get the GPS data at least once, so you know\nwhere the car is. If you're processing this data multiple times, it\nalways has the timestamp with it.\n\nBecause of the timestamp it does not matter that it gets processed\nmultiple times, or that it would be stored multiple times, because it\nwould just overwrite the existing record.\n\n#### At Most Once\n\nThe second streaming method is at most once. 
At most once means that\nit's okay to drop some information, to drop some messages.\n\nBut it's important that a message is processed at most\nonce.\n\nAn example of this is event processing. Some event happens and that\nevent is not important enough, so it can be dropped. It doesn't have any\nconsequences when it gets dropped.\n\nBut when that event happens it's important that it does not get\nprocessed multiple times. Then it would look as if the event happened\nfive or six times instead of only once.\n\nThink about engine misfires. If it happens once, no big deal. But if the\nsystem tells you it happens a lot you will think you have a problem with\nyour engine.\n\n#### Exactly Once\n\nThe third method is exactly once. This means it's not okay to drop data,\nit's not okay to lose data and it's also not okay to process data\nmultiple times.\n\nAn example of this is banking. When you think about credit card\ntransactions it's not okay to drop a transaction.\n\nWhen dropped, your payment does not go through. It's also not okay to\nhave a transaction processed multiple times, because then you are paying\nmultiple times.\n\n#### Check The Tools!\n\nAll of this sounds very simple and logical. What kind of processing is\ndone has to be a requirement for your use case.\n\nIt needs to be thought about in the design process, because not every\ntool supports all three methods. Very often you need to code your\napplication very differently based on the streaming method.\n\nExactly once in particular is very hard to do.\n\nSo, the data processing tool needs to be chosen based on whether you need\nexactly once, at least once or at most once.\n\n\n### Should you do stream or batch processing?\n\nIt is a good idea to start with batch processing. Batch processing is\nthe foundation of every good big data platform.\n\nA batch processing architecture is simple, and therefore quick to set\nup. 
Platform simplicity means it will also be relatively cheap to run.\n\nA batch processing platform will enable you to quickly ask the big\nquestions. It will give you invaluable insight into your data and\ncustomers.\n\nWhen the time comes and you also need to do analytics on the fly, then\nadd a streaming pipeline to your batch processing big data platform.\n\n### Is ETL still relevant for Analytics?\n\n| Podcast Episode: #039 Is ETL Dead For Data Science & Big Data?\n|------------------|\n|Is ETL dead in Data Science and Big Data? In today’s podcast I share with you my views on your questions regarding ETL (extract, transform, load). Is ETL still practiced or did pre-processing & cleansing replace it. What would replace ETL in Data Engineering.\n| [Watch on YouTube](https://youtu.be/leSOWPaNkl4) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/Is-ETL-Dead-For-Data-Science--Big-Data---PoDS-039-e2b604)|\n\n### MapReduce\n\nSince the early days of the Hadoop ecosystem, the MapReduce framework\nhas been one of the main components of Hadoop alongside HDFS.\n\nGoogle for instance used MapReduce to analyse stored HTML content of\nwebsites through counting all the HTML tags and all the words and\ncombinations of them (for instance headlines). The output was then used\nto create the page ranking for Google Search.\n\nThat was when everybody started to optimise their websites for Google\nSearch. Serious search engine optimisation was born. That was the year\n2004.\n\nMapReduce processes data in two phases: the\nmap phase and the reduce phase.\n\nIn the map phase, the framework reads data from HDFS. Each dataset\nis called an input record.\n\nThen there is the reduce phase. In the reduce phase, the actual\ncomputation is done and the results are stored. 
The storage target can\neither be a database or back to HDFS or something else.\n\nAfter all it's Java -- so you can implement what you like.\n\nThe magic of MapReduce is how the map and reduce phases are implemented\nand how both phases work together.\n\nThe map and reduce phases are parallelised. What that means is that you\nhave multiple map phases (mappers) and reduce phases (reducers) that can\nrun in parallel on your cluster machines.\n\nHere's an example of how such a map and reduce process works with data:\n\n![Mapping of input files and reducing of mapped records](/images/MapReduce-Process-Detailed.jpg)\n\n#### How does MapReduce work\n\nFirst of all, the whole map and reduce process relies heavily on using\nkey-value pairs. That's what the mappers are for.\n\nIn the map phase input data, for instance a file, gets loaded and\ntransformed into key-value pairs.\n\nWhen each map phase is done it sends the created key-value pairs to the\nreducers where they get sorted by key. This means that an input\nrecord for the reduce phase is a list of values from the mappers that\nall have the same key.\n\nThen the reduce phase does the computation on that key and its\nvalues and outputs the results.\n\nHow many mappers and reducers can you use in parallel? The number of\nparallel map and reduce processes depends on how many CPU cores you have\nin your cluster. Every mapper and every reducer uses one core.\n\nThis means that the more CPU cores you actually have, the more mappers\nyou can use and the faster the extraction process can be done. The more\nreducers you are using the faster the actual computation is being done.\n\nTo make this clearer, I have prepared an example:\n\n#### Example\n\nAs I said before, MapReduce works in two stages, map and reduce. Often\nthese stages are explained with a word count task.\n\nPersonally, I hate this example because counting stuff is too trivial and\ndoes not really show you what you can do with MapReduce. 
Therefore, we\nare going to use a more real world use-case from the IoT world.\n\nIoT applications create an enormous amount of data that has to be\nprocessed. This data is generated by physical sensors who take\nmeasurements, like room temperature at 8 o'clock.\n\nEvery measurement consists of a key (the timestamp when the measurement\nhas been taken) and a value (the actual value measured by the sensor).\n\nBecause you usually have more than one sensor on your machine, or\nconnected to your system, the key has to be a compound key. Compound\nkeys contain in addition to the measurement time information about the\nsource of the signal.\n\nBut, let's forget about compound keys for now. Today we have only one\nsensor. Each measurement outputs key-value pairs like: Timestamp-Value.\n\nThe goal of this exercise is to create average daily values of that\nsensor's data.\n\nThe image below shows how the map and reduce process works.\n\nFirst, the map stage loads unsorted data (input records) from the source\n(e.g. HDFS) by key and value (key:2016-05-01 01:02:03, value:1).\n\nThen, because the goal is to get daily averages, the hour:minute:second\ninformation is cut from the timestamp.\n\nThat is all that happens in the map phase, nothing more.\n\nAfter all parallel map phases are done, each key-value pair gets sent to\nthe one reducer who is handling all the values for this particular key.\n\nEvery reducer input record then has a list of values and you can\ncalculate (1+5+9)/3, (2+6+7)/3 and (3+4+8)/3. That's all.\n\n![MapReduce Example of Time Series Data](/images/MapReduce-Time-Series-example.jpg)\n\nWhat do you think you need to do to generate minute averages?\n\nYes, you need to cut the key differently. You then would need to cut it\nlike this: \"2016-05-01 01:02\", keeping the hour and minute information\nin the key.\n\nWhat you can also see is, why map reduce is so great for doing parallel\nwork. 
In this case, the map stage could be done by nine mappers in\nparallel because each map is independent from all the others.\n\nThe reduce stage could still be done by three tasks in parallel. One for\norange, one for blue and one for green.\n\nThat means, if your dataset were 10 times as big and you had 10\ntimes the machines, the time to do the calculation would be the same.\n\n#### What is the limitation of MapReduce?\n\nMapReduce is awesome for simpler analytics tasks, like counting stuff.\nIt just has one flaw: It has only two stages: map and reduce.\n\n![The Map Reduce Process](/images/MapReduce-Process.jpg)\n\nFirst MapReduce loads the data from HDFS into the mapping function.\nThere you prepare the input data for the processing in the reducer.\nAfter the reduce is finished the results get written to the data store.\n\nThe problem with MapReduce is that there is no simple way to chain\nmultiple map and reduce processes together. At the end of each reduce\nprocess the data must be stored somewhere.\n\nThis fact makes it very hard to do complicated analytics processes. You\nwould need to chain MapReduce jobs together.\n\nChaining jobs with storing and loading intermediate results just makes\nno sense.\n\nAnother issue with MapReduce is that it is not capable of streaming\nanalytics. Jobs take some time to spin up, do the analytics and shut\ndown. Wait times of minutes are totally normal.\n\nThis is a big negative point in a more and more real time data\nprocessing world.\n\n### Apache Spark\n\nI talked about the three methods of data streaming in this podcast:\n<https://anchor.fm/andreaskayy/embed/episodes/Three-Methods-of-Streaming-Data-e15r6o>\n\n#### What is the difference to MapReduce?\n\nSpark is a complete in-memory framework. Data gets loaded from, for\ninstance, HDFS into the memory of workers.\n\nThere is no longer a fixed map and reduce stage. 
Your code can be as\ncomplex as you want.\n\nOnce in memory, the input data and the intermediate results stay in\nmemory (until the job finishes). They do not get written to a drive like\nwith MapReduce.\n\nThis makes Spark the optimal choice for doing complex analytics. It\nallows you for instance to do iterative processes. Modifying a dataset\nmultiple times in order to create an output is totally easy.\n\nStreaming analytics capability is also what makes Spark so great. Spark\nhas natively the option to schedule a job to run every X seconds or X\nmilliseconds.\n\nAs a result, Spark can deliver you results from streaming data in \"real\ntime\".\n\n#### How does Spark fit to Hadoop?\n\nThere are some very misleading articles out there titled \\\"Spark or\nHadoop\\\", \\\"Spark is better than Hadoop\\\" or even \\\"Spark is replacing\nHadoop\\\".\n\nSo, it's time to show you the differences between Spark and Hadoop.\nAfter this you will know when and for what you should use Spark and\nHadoop.\n\nYou'll also understand why \\\"Hadoop or Spark\\\" is the totally wrong\nquestion.\n\n#### Where's the difference?\n\nTo make it clear how Hadoop differs from Spark I created this simple\nfeature table:\n\n![Hadoop vs Spark capabilities](/images/Table-Hadoop-and-Spark.jpg)\n\nHadoop is used to store data in the Hadoop Distributed File System\n(HDFS). It can analyse the stored data with MapReduce and manage\nresources with YARN.\n\nHowever, Hadoop is more than just storage, analytics and resource\nmanagement. There's a whole eco system of tools around the Hadoop core.\nI've written about its eco system in this article: [missing](missing)\nWhat is Hadoop and why is it so freakishly popular. You should check it\nout as well.\n\nCompared to Hadoop, Spark is \"just\" an analytics framework. It has no\nstorage capability. 
Although it has a standalone resource manager,\nyou usually don't use that feature.\n\n#### Spark and Hadoop is a perfect fit\n\nSo, if Hadoop and Spark are not the same thing, can they work together?\n\nAbsolutely! Here's how the first picture will look if you combine Hadoop\nwith Spark:\n\nmissing\n\nFor storage you use HDFS. Analytics is done with Apache Spark and YARN\ntakes care of the resource management.\n\nWhy do they work so well together?\n\nFrom a platform architecture perspective, Hadoop and Spark are usually\nmanaged on the same cluster. This means on each server where an HDFS data\nnode is running, a Spark worker runs as well.\n\nIn distributed processing, network transfer between machines is a large\nbottleneck. Transferring data within a machine reduces this traffic\nsignificantly.\n\nSpark is able to determine on which data node the needed data is stored.\nThis allows a direct load of the data from the local storage into the\nmemory of the machine.\n\nThis reduces network traffic a lot.\n\n#### Spark on YARN:\n\nYou need to make sure that your physical resources are distributed\nperfectly between the services. This is especially the case when you run\nSpark workers with other Hadoop services on the same machine.\n\nIt just would not make sense to have two resource managers managing the\nsame server's resources. Sooner or later they will get in each other's\nway.\n\nThat's why the Spark standalone resource manager is seldom used.\n\nSo, the question is not Spark or Hadoop. The question has to be: Should\nyou use Spark or MapReduce alongside Hadoop's HDFS and YARN?\n\n#### My simple rule of thumb:\n\nIf you are doing simple batch jobs like counting values or\ncalculating averages: Go with MapReduce.\n\nIf you need more complex analytics like machine learning or fast stream\nprocessing: Go with Apache Spark.\n\n#### Available Languages\n\nSpark jobs can be programmed in a variety of languages. 
That makes\ncreating analytic processes very user-friendly for data scientists.\n\nSpark supports Python, Scala and Java. With the help of SparkR you can\neven connect your R program to a Spark cluster.\n\nIf you are a data scientist who is very familiar with Python just use\nPython, it's great. If you know how to code Java I suggest you start\nusing Scala.\n\nSpark jobs are easier to code in Scala than in Java. In Scala you can\nuse anonymous functions to do processing.\n\nThis results in less overhead and much cleaner, simpler code.\n\nWith Java 8 simplified function calls were introduced with lambda\nexpressions. Still, a lot of people, including me, prefer Scala over\nJava.\n\n#### How Spark works: Driver, Executor, Sparkcontext\n\n| Podcast Episode: #100 Apache Spark Week Day 1\n|------------------|\n|Is ETL dead in Data Science and Big Data? In today’s podcast I share with you my views on your questions regarding ETL (extract, transform, load). Is ETL still practiced or did pre-processing & cleansing replace it. What would replace ETL in Data Engineering.\n| [Watch on YouTube](https://youtu.be/qD6Wi2pfCx0)\n\n\n#### Spark batch vs stream processing\n\n#### How does Spark use data from Hadoop\n\nAnother thing is data locality. I always make the point that processing\ndata locally where it is stored is the most efficient thing to do.\n\nThat's exactly what Spark is doing. You can and should run Spark workers\ndirectly on the data nodes of your Hadoop cluster.\n\nSpark can then natively identify on what data node the needed data is\nstored. 
This enables Spark to use the worker running on the machine\nwhere the data is stored to load the data into the memory.\n\n![Spark Using Hadoop Data Locality](/images/Spark-Data-Locality.jpg)\n\nThe downside of this setup is that you need more expensive servers,\nbecause Spark processing needs stronger servers with more RAM and CPUs\nthan a \"pure\" Hadoop setup.\n\n#### What are RDDs and how to use them\n\nRDDs are the core part of Spark. I learned and used RDDs first. It felt\nfamiliar coming from MapReduce. Nowadays you use Dataframes or Datasets.\n\nI still find it valuable to learn how RDDs and therefore Spark work at\na lower level.\n\n| Podcast Episode: #101 Apache Spark Week Day 2\n|------------------|\n|On day two of the Apache Spark week we take a look at major Apache Spark concepts: RDDs, transformations and actions, caching and broadcast variables.\n| [Watch on YouTube](https://youtu.be/9I6mA2W6_HU)\n\n\n#### How and why to use SparkSQL?\n\nWhen you use Apache Zeppelin notebooks to learn Spark you automatically\ncome across SparkSQL. SparkSQL allows you to access Dataframes with\nSQL-like queries.\n\nEspecially when you work with notebooks it is very handy to create\ncharts from your data. You can learn from mistakes more easily than by just\ndeploying Spark applications.\n\n| Podcast Episode: #102 Apache Spark Week Day 3\n|------------------|\n| We continue the Spark week, hands on. We do a full example from reading a csv, doing maps and flatmaps, to writing to disk. We also use SparkSQL to visualize the data.\n| [Watch on YouTube](https://youtu.be/Fk-s8eKD4ZI)\n\n#### What are DataFrames how to use them\n\nAs I said before, Dataframes are the successors to RDDs. They are the new\nSpark API.\n\nDataframes are basically like tables in a SQL database or like an Excel\nsheet. 
This makes them very simple to use and manipulate with SparkSQL.\nI highly recommend going this route.\n\nProcessing with Dataframes is even faster than with RDDs, because Spark\nuses optimization algorithms for the data processing.\n\n| Podcast Episode: #103 Apache Spark Week Day 4\n|------------------|\n|We look into Dataframes, Dataframes and Dataframes.\n| [Watch on YouTube](https://youtu.be/9I6mA2W6_HU)\n\n#### Machine Learning on Spark? (Tensor Flow)\n\nWouldn't it be great to use your deep learning TensorFlow applications\non Spark? Yes, it is already possible. Check out these links:\n\nWhy do people integrate Spark with TensorFlow even if there is a\ndistributed TensorFlow framework?\n<https://www.quora.com/Why-do-people-integrate-Spark-with-TensorFlow-even-if-there-is-a-distributed-TensorFlow-framework>\n\nTensorFlow On Spark: Scalable TensorFlow Learning on Spark Clusters:\n<https://databricks.com/session/tensorflow-on-spark-scalable-tensorflow-learning-on-spark-clusters>\n\nDeep Learning with Apache Spark and TensorFlow:\n<https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html>\n\n#### MLlib:\n\nThe machine learning library MLlib is included in Spark so there is\noften no need to import another library.\n\nI have to admit that, because I am not a data scientist, I am not an expert in\nmachine learning.\n\nFrom what I have seen and read though the machine learning framework\nMLlib is a nice treat for data scientists wanting to train and apply\nmodels with Spark.\n\n#### Spark Setup\n\nFrom a solution architect's point of view Spark is a perfect fit for\nHadoop big data platforms. This has a lot to do with cluster deployment\nand management.\n\nCompanies like Cloudera, MapR or Hortonworks include Spark in their\nHadoop distributions. 
Because of that, Spark can be deployed and managed\nwith the cluster's Hadoop management web frontend.\n\nThis makes the process for deploying and configuring a Spark cluster\nvery quick and admin friendly.\n\n#### Spark Resource Management\n\nWhen running a computing framework you need resources to do computation:\nCPU time, RAM, I/O and so on. Out of the box Spark can manage resources\nwith its stand-alone resource manager.\n\nIf Spark is running in a Hadoop environment you don't have to use\nSpark's own stand-alone resource manager. You can configure Spark to use\nHadoop's YARN resource management.\n\nWhy would you do that? It allows YARN to efficiently allocate resources\nto your Hadoop and Spark processes.\n\nHaving a single resource manager instead of two independent ones makes\nit a lot easier to configure the resource management.\n\n![Spark Resource Management With YARN](/images/Spark-Yarn.jpg)\n\n### Samza\n\n[Link to Apache Samza Homepage](http://samza.apache.org/)\n\n### AWS Lambda\n\n[Link to AWS Lambda Homepage](https://aws.amazon.com/lambda/)\n\n\n### Apache Flink\n\n[Link to Apache Flink Homepage](https://flink.apache.org/)\n\n\n### Elasticsearch\n\n[Link to Elasticsearch Homepage](https://www.elastic.co/products/elastic-stack)\n\n### Graph DB\n\nGraph databases store data in terms of nodes and relationships.\nEach node represents an entity (people, movies, things and other\ndata points) and a relationship represents how the nodes are related.\nThey are designed to store and treat the relationships with the same\nimportance as the data (or nodes in this case). 
This\nrelationship-first approach makes a lot of difference as the relationship\nbetween data need not be inferred anymore with foreign and primary keys.\n\nGraph databases are especially useful when applications require\nnavigating through multiple and multi-level relationships between\nvarious data points.\n\n#### Neo4j\n\nNeo4j is currently the most popular graph database management system.\nIt is ACID compliant and provides its own implementation of a graph database.\nIn addition to nodes and relationships, neo4j has the following components\nto enrich the data model with information.\n\n• Labels. These are used to group nodes, and each node can be assigned\nmultiple labels. Labels are indexed to speed up finding nodes in a graph.\n• Properties. These are attributes of both nodes and relationships.\nNeo4j allows for storing data as key-value pairs, which means properties\ncan have any value (string, number, or boolean).\n\n##### Advantages\n\n• Neo4j is schema-free\n• Highly available and provides transactional guarantees\n• Cypher is a declarative query language which makes it very easy to navigate the graph\n• Neo4j is fast and easily traversible because the data is connected and is very easy to query, retrieve and navigate the graph\n• For the same reason as above, there are no joins in Neo4j\n\n##### Disadvantages\n\n• Neo4j is not the best for any kind of aggregations or sorting, in comparison with a relational database\n• While doable, they are not the best to handle transactional data like accounting\n• Sharding is currently not supported\n\n##### Use Cases\n\nhttps://neo4j.com/use-cases/\n\n### Apache Solr\n\n[Link to Solr Homepage](https://solr.apache.org)\n\n\n### Apache Drill\n\n[Link to Apache Drill Homepage](https://drill.apache.org)\n\n\n### Apache Storm\n\nhttps://storm.apache.org/\n\n### 
StreamSets\n\n<https://youtu.be/djt8532UWow>\n\n<https://www.youtube.com/watch?v=Qm5e574WoCU&t=2s>\n\n\n<https://streamsets.com/blog/streaming-data-twitter-analysis-spark/>\n\n## Store\n\n### Analytical Data Stores\n\n#### Data Warehouse vs Data Lake\n\n| Podcast Episode: #055 Data Warehouse vs Data Lake\n|------------------|\n|On this podcast we are going to talk about data warehouses and data lakes? When do people use which? What are the pros and cons of both? Architecture examples for both Does it make sense to completely move to a data lake?\n| [Watch on YouTube](https://youtu.be/8gNQTrUUwMk) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/055-Data-Warehouse-vs-Data-Lake-e45iem)|\n\n#### Snowflake and dbt\n\n![Snowlfake thumb](/images/03/Snowflake-dbt-thumbnail.jpeg)\n\nIn the rapidly evolving landscape of data engineering, staying ahead means continuously expanding your skill set with the latest tools and technologies. Among the myriad of options available, dbt (data build tool) and Snowflake have emerged as indispensable for modern data engineering workflows. Understanding and leveraging these tools can significantly enhance your ability to manage and transform data, making you a more effective and valuable data engineer. Let's dive into why dbt and Snowflake should be at the top of your learning list and explore how the \"dbt for Data Engineers\" and \"Snowflake for Data Engineers\" courses from the Learn Data Engineering Academy can help you achieve mastery in these tools.\n\n##### The Power of Snowflake in Data Engineering\n\nSnowflake has revolutionized the data warehousing space with its cloud-native architecture. It offers a scalable, flexible, and highly performant platform that simplifies data management and analytics. Here’s why Snowflake is a critical skill for data engineers:\n\n1. 
**Cloud-Native Flexibility:** Snowflake’s architecture allows you to scale resources up or down based on your needs, ensuring optimal performance and cost-efficiency.\n2. **Unified Data Platform:** It unifies data silos, enabling seamless data sharing and collaboration across the organization.\n3. **Integration Capabilities:** Snowflake integrates with various data tools and platforms, enhancing its versatility in different data workflows.\n4. **Advanced Analytics:** With its robust support for data querying, transformation, and integration, Snowflake is ideal for complex analytical workloads.\n\nThe \"Snowflake for Data Engineers\" course in my Learn Data Engineering Academy provides comprehensive training on Snowflake. From the basics of setting up your Snowflake environment to advanced data automation with Snowpipes, the course equips you with practical skills to leverage Snowflake effectively in your data projects.\n\nLearn more about the course [here](https://learndataengineering.com/p/snowflake-for-data-engineers).\n\n![Snowlfake thumb](/images/03/Snowflake-ui.jpeg)\n\n\n##### Why dbt is a Game-Changer for Data Engineers\n\ndbt is a powerful transformation tool that allows data engineers to transform, test, and document data directly within their data warehouse using simple SQL. Unlike traditional ETL tools, dbt operates on the principle of ELT (Extract, Load, Transform), which aligns perfectly with modern cloud data warehousing paradigms. Here are a few reasons why dbt is a must-have skill for data engineers:\n\n1. **SQL-First Approach:** dbt allows you to write transformations in SQL, the lingua franca of data manipulation, making it accessible to a broad range of data professionals.\n2. **Collaboration:** Teams can collaborate seamlessly, creating trusted datasets for reporting, machine learning, and operational workflows.\n3. **Ease of Use:** With dbt, you can transform, test, and document your data with ease, streamlining the data pipeline process.\n4. 
**Integration:** dbt integrates effortlessly with your existing data warehouse, such as Snowflake, making it a versatile addition to your toolkit.\n\nIn my Learn Data Engineering Academy you find the perfect starting point for mastering dbt with the course \"dbt for Data Engineers\". The course covers everything from the basics of ELT processes to advanced features like continuous integration and deployment (CI/CD) pipelines. With hands-on training, you'll learn to create data pipelines, configure dbt materializations, test dbt models, and much more.\n\nLearn more about the course [here](https://learndataengineering.com/p/dbt-for-data-engineers).\n\n![Snowlfake thumb](/images/03/dbt-ui.jpeg)\n\n##### dbt and Snowflake: A Winning Combination\n\nWhen used together, dbt and Snowflake offer a powerful combination for data engineering. Here’s why:\n\n1. **Seamless Integration:** dbt’s SQL-first transformation capabilities integrate perfectly with Snowflake’s scalable data warehousing, creating a streamlined ELT workflow.\n2. **Efficiency:** Together, they enhance the efficiency of data transformation and analytics, reducing the time and effort required to prepare data for analysis.\n3. **Scalability:** The combined power of dbt’s model management and Snowflake’s dynamic scaling ensures that your data pipelines can handle large and complex datasets with ease.\n4. **Collaboration and Documentation:** dbt’s ability to document and test data transformations directly within Snowflake ensures that data workflows are transparent, reliable, and collaborative.\nGet right into it with our Academy!\n\nBy integrating Snowflake and dbt into your skill set, you position yourself at the forefront of data engineering innovation. 
These tools not only simplify and enhance your data workflows but also open up new possibilities for data transformation and analysis.\n\n### Transactional Data Stores\n#### SQL Databases\n\n##### PostgreSQL DB\n\nHomepage:\n\n<https://www.postgresql.org/>\n\nPostgreSQL vs MongoDB:\n\n<https://blog.panoply.io/postgresql-vs-mongodb>\n\n##### Database Design\n\n##### SQL Queries\n\n##### Stored Procedures\n\n##### ODBC/JDBC Server Connections\n\n#### NoSQL Stores\n\n##### KeyValue Stores (HBase)\n\n\n  | Podcast Episode: #056 NoSQL Key Value Stores Explained with HBase\n  |------------------|\n  |What is the difference between SQL and NoSQL? In this episode I show you, using HBase as an example, how a key/value store works.\n  | [Watch on YouTube](https://youtu.be/67hIkbpzFc8) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/056-NoSQL-Key-Value-Stores-Explained-With-HBase-e45ifb)|\n\n\n##### Document Store HDFS\n\nThe Hadoop distributed file system, or HDFS, allows you to store files\nin Hadoop. The difference between HDFS and other file systems like NTFS\nor EXT is that it is a distributed one.\n\nWhat does that mean exactly?\n\nA typical file system stores your data on the actual hard drive. It is\nhardware dependent.\n\nIf you have two disks then you need to format every disk with its own\nfile system. They are completely separate.\n\nYou then decide on which disk you physically store your data.\n\nHDFS works differently from a typical file system. HDFS is hardware\nindependent.\n\nNot only does it span many disks in a server, it also spans\nmany servers.\n\nHDFS will automatically place your files somewhere in the Hadoop\ncluster.\n\nHadoop will not only store your file, it will also replicate it two or\nthree times (you can define that). Replication means replicas of the\nfile will be distributed to different servers.\n\n![HDFS Master and Data Nodes](/images/HDFS-Master-DataNodes.jpg)\n\nThis gives you superior fault tolerance. 
If one server goes down, then\nyour data stays available on a different server.\n\nAnother great thing about HDFS is that there is no practical limit on how big the\nfiles can be. You can have server log files that are terabytes big.\n\nHow can files get so big? HDFS allows you to append data to files.\nTherefore, you can continuously dump data into a single file without\nworries.\n\nHDFS physically stores files differently than a normal file system. It\nsplits the file into blocks.\n\nThese blocks are then distributed and replicated on the Hadoop cluster.\nThe splitting happens automatically.\n\n![Distribution of Blocks for a 512MB File](/images/HDFS-Distributed-FileSystem.jpg)\n\nIn the configuration you can define how big the blocks should be. 128\nmegabytes or 1 gigabyte?\n\nNo problem at all.\n\nThis mechanism of splitting a large file into blocks and distributing them\nover the servers is great for processing. See the MapReduce section for\nan example.\n\n##### Document Store MongoDB\n\n\n  | Podcast Episode: #093 What is MongoDB\n  |------------------|\n  |What is the difference between SQL and NoSQL? 
In this episode I show you on the example of HBase how a key/value store works.\n  | [Watch on YouTube](https://youtu.be/U05knQN29FA)\n\n\n**Links:**\n\nWhat is MongoDB:\n\n<https://www.guru99.com/what-is-mongodb.html#4>\n\nOr directly from MongoDB.com:\n\n<https://www.mongodb.com/what-is-mongodb>\n\nStorage in BSON files:\n\n<https://en.wikipedia.org/wiki/BSON>\n\nHello World in MongoDB:\n\n<https://www.mkyong.com/mongodb/mongodb-hello-world-example>\n\nReal-Time Analytics on MongoDB Data in Power BI:\n\n<https://dzone.com/articles/real-time-analytics-on-mongodb-data-in-power-bi>\n\nSpark and MongoDB:\n\n<https://www.mongodb.com/scale/when-to-use-apache-spark-with-mongodb>\n\nMongoDB vs Time Series Database:\n\n<https://blog.timescale.com/how-to-store-time-series-data-mongodb-vs-timescaledb-postgresql-a73939734016/>\n\nFun article titled why you should never use mongodb:\n\n<http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/>\n\nMongoDB vs Cassandra:\n\n<https://blog.panoply.io/cassandra-vs-mongodb>\n\n##### Elasticsearch Search Engine and Document Store\n\nElasticsearch is not a DB but firstly a search engine that indexes JSON\ndocuments.\n\n| Podcast Episode: #095 What is Elasticsearch & Why is It So Popular?\n|------------------|\n|Elasticsearch is a super popular tool for indexing and searching data. On this stream we check out how it works, architectures and what to use it for. There must be a reason why it is so popular.  
\n| [Watch on YouTube](https://youtu.be/hNb5zB4OPXM)\n\n\nLinks:\n\nGreat example for architecture with Elasticsearch, Logstash and Kibana:\\\n<https://www.elastic.co/pdf/architecture-best-practices.pdf>\n\nIntroduction to Elasticsearch in the documentation:\\\n<https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html>\n\nWorking with JSON documents:\\\n<https://www.slideshare.net/openthinklabs/03-elasticsearch-data-in-data-out>\n\nJSONs need to be flattened heres how to work with nested objects in the\nJSON:\\\n<https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html>\n\nIndexing basics:\\\n<https://www.slideshare.net/knoldus/deep-dive-into-elasticsearch>\n\nHow to do searches with search API:\\\n<https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html>\n\nGeneral recommendations when working with Elasticsearch:\\\n<https://www.elastic.co/guide/en/elasticsearch/reference/current/general-recommendations.html>\n\nJSON document example and intro to Kibana:\\\n<https://www.slideshare.net/objectrocket/an-intro-to-elasticsearch-and-kibana>\n\nHow to connect Tableau to Elasticsearch:\\\n<https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-client-apps-tableau.html>\n\nBenchmarks how fast Elasticsearch is:\\\n<https://medium.appbase.io/benchmarking-elasticsearch-1-million-writes-per-sec-bf37e7ca8a4c>\n\nElasticsearch vs MongoDB quick overview:\\\n<https://db-engines.com/en/system/Elasticsearch%3BMongoDB>\n\nLogstash overview (preprocesses data before insert into Elasticsearch)\n<https://www.elastic.co/products/logstash>\n\nX-Pack Security for Elasticsearch:\\\n<https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api.html>\n\nGoogle Trends Grafana vs Kibana:\\\n<https://trends.google.com/trends/explore?geo=US&q=%2Fg%2F11fy132gmf,%2Fg%2F11cknd0blr>\n\n\n##### Apache Impala\n\n[Apache Impala Homepage](https://impala.apache.org/)\n\n##### Kudu\n\n##### Apache Druid\n\n| 
Podcast Episode: Druid NoSQL DB and Analytics DB Introduction\n|------------------|\n|In this video I explain what Druid is and how it works. We look into the architecture of a Druid cluster and check out how clients access the data.\n|[Watch on YouTube](https://youtu.be/EiEIeBXSWjM)\n\n\n##### InfluxDB Time Series Database\n\nWhat is time-series data?\n\n<https://questdb.io/blog/what-is-time-series-data/>\n\nKey concepts:\n\n<https://docs.influxdata.com/influxdb/v1.7/concepts/key_concepts/>\n\nInfluxDB and Spark Streaming:\n\n<https://towardsdatascience.com/processing-time-series-data-in-real-time-with-influxdb-and-structured-streaming-d1864154cf8b>\n\nBuilding a streaming application with Spark, Grafana, Chronograf and\nInfluxDB:\n\n<https://medium.com/@xaviergeerinck/building-a-real-time-streaming-dashboard-with-spark-grafana-chronograf-and-influxdb-e262b68087de>\n\nPerformance Dashboard Spark and InfluxDB:\n\n<https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark>\n\nOther alternatives for time series databases are: DalmatinerDB,\nQuestDB, Prometheus, Riak TS, OpenTSDB, KairosDB\n\n##### MPP Databases (Greenplum)\n\n##### Azure Cosmos DB\n\nhttps://azure.microsoft.com/en-us/services/cosmos-db/\n\n##### Azure Table-Storage\n\nhttps://azure.microsoft.com/en-us/services/storage/tables/\n\n#### NoSQL Data warehouse\n\n##### Hive Warehouse\n\n##### Impala\n\n## Visualize\n\n### Android & IOS\n\n### How to design APIs for mobile apps\n\n### How to use Webservers to display content\n\n### Dashboards\n\n#### Grafana\n\n#### Kibana\n\n#### Tomcat\n\n#### Jetty\n\n#### NodeRED\n\n#### React\n\n### Business Intelligence Tools\n\n#### Tableau\n\n#### PowerBI\n\n#### Quliksense\n\n### Identity & Device Management\n\n#### What is a digital twin?\n\n#### Active Directory\n\n\nMachine Learning\n----------------\n\n| Podcast Episode: Machine Learning In Production\n|------------------|\n|Doing machine learning in production is very different than 
for proofs of concept or in education. One of the hardest parts is keeping models updated.  \n| [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/Machine-Learning-In-Production-e11bbk)\n\n### How to do Machine Learning in production\n\nMachine learning in production uses both stream and batch processing. In\nthe batch processing layer you create the models, because you have\nall the data available for training.\n\nIn the stream processing layer you use the created models: you\napply them to new data.\n\nThe idea that you need to incorporate is that it is a constant cycle.\nTraining, applying, re-training, pushing into production and applying.\n\nWhat you don't want to do is do this manually. You need to figure out\na process for automatic retraining and automatic pushing of models into\nproduction.\n\nIn the retraining phase the system automatically evaluates the training.\nIf the model no longer fits, it keeps training for as long as it needs to create a\ngood model.\n\nAfter the evaluation of the model is complete and it's good, the model\ngets pushed into production, that is, into the stream processing layer.\n\n### Why machine learning in production is harder then you think\n\nHow to automate machine learning is something that drives me day in and\nday out.\n\nWhat you do in development or education is that you create a model and\nfit it to the data. Then that model is basically done forever.\n\nWhere I'm coming from, the IoT world, the problem is that machines are\nvery different. They behave very differently and experience wear.\n\n### Models Do Not Work Forever\n\nMachines have certain processes that decrease the actual health of the\nmachine. Machine wear is a huge issue. Models that are built on top\nof a good machine don't work forever.\n\nWhen the machine wears out, the models need to be adjusted. 
They need to\nbe maintained, retrained.\n\n### Where The Platforms That Support This?\n\nAutomatic re-training and re-deploying is a very big issue, a very big\nproblem for a lot of companies, because most existing platforms don't\nhave this capability (I actually haven't seen one so far).\n\nLook at AWS machine learning for instance. The process is: build, train,\ntune, deploy. Where's the loop of retraining?\n\nYou can create models and then use them in production. But this loop is\nalmost nowhere to be seen.\n\nIt is a very big issue that needs to be solved. If you want to do\nmachine learning in production you can start with manual interaction in\nthe training, but at some point you need to automate everything.\n\n### Training Parameter Management\n\nTo train a model you are manipulating input parameters of the models.\n\nTake deep learning for instance.\n\nTo train you are manipulating, for instance:\n\n- How many layers you use.\n- The size of the layers, which means how many neurons you have in a layer.\n- What activation function you use, how long you train, and so on.\n\nYou also need to keep track of what data you used to train which model.\n\nAll those parameters need to be manipulated automatically, and the models\ntrained and tested.\n\nTo do all that, you basically need a database that keeps track of those\nvariables.\n\nHow to automate this, for me, is like the big secret. I am still working\non figuring it out.\n\n### What's Your Solution?\n\nHave you already run into the problem of automatic re-training and deploying\nof models as well?\n\nWere you able to use a cloud platform like Google, AWS or Azure?\n\nIt would be really awesome if you shared your experience :)\n\n### How to convince people machine learning works\n\nMany people still are not convinced that machine learning works\nreliably. 
But they want analytics insight and most of the time machine\nlearning is the way to go.\n\nThis means that when you are working with customers you need to do a lot of\nconvincing, especially if they are not into machine learning themselves.\n\nBut it's actually quite easy.\n\n### No Rules, No Physical Models\n\nMany people are still under the impression that analytics only works\nwhen it's based on physics, when there are strict mathematical rules to\na problem.\n\nEspecially in engineering-heavy countries like Germany this is the norm:\n\n\"Sere has to be a Rule for Everysing!\" (imagine a German accent). When\nyou're engineering you are calculating stuff based on physics and not\nbased on data. If you are constructing an airplane wing, you better make\nsure to use calculations so it doesn't fall off.\n\nAnd that's totally fine.\n\nKeep doing that!\n\nMachine learning has been around for decades. It didn't quite work as\nwell as people hoped. We have to admit that. But there is this\npreconception that it still doesn't work.\n\nWhich is not true: Machine learning works.\n\nSomehow you need to convince people that it is a viable approach, that\nlearning from data to make predictions works.\n\n### You Have The Data. USE IT!\n\nAs a data scientist you have one ace up your sleeve, it's the obvious\none:\n\nIt's the data and it's statistics.\n\nYou can use that data and those statistics to counter people's\npreconceptions. It's very powerful when someone says: \"This doesn't work.\"\n\nYou bring the data. You show the statistics and you show that it works\nreliably.\n\nA lot of discussions end there.\n\nData doesn't lie. You can't fight data. The data is always right.\n\n### Data is Stronger Than Opinions\n\nThis is also why I believe that autonomous driving will come quicker\nthan many of us think. 
A lot of people say autonomous cars are not safe,\nthat you cannot rely on those cars.\n\nThe thing is: When you have the data you can do the statistics.\n\nYou can show people that autonomous driving really works reliably. You\nwill see, the question of \\\"Is this allowed or is this not allowed?\\\"\nwill be gone quicker than you think.\n\nThat is because government agencies can start testing the algorithms based on\npredefined scenarios. They can run benchmarks and score the cars'\nperformance.\n\nAll those opinions about whether it works or doesn't work will be\ngone.\n\nThe motor agency has the statistics. The stats show people how well the cars\nwork.\n\nCompanies like Tesla have it very easy, because the data is\nalready there.\n\n**They just need to show us that the algorithms work. The end.**\n\n### AWS Sagemaker\n\nTrain and apply models online with SageMaker.\n\nLink to the OLX Slideshare with pros, cons and how to use SageMaker:\n<https://www.slideshare.net/mobile/AlexeyGrigorev/image-models-infrastructure-at-olx>\n"
  },
  {
    "path": "sections/04-HandsOnCourse.md",
    "content": "Data Engineering Course: Building A Data Platform\n=================================================\n\n## Contents\n\n- [GenAI Retrieval Augmented Generation with Ollama and Elasticsearch](04-HandsOnCourse.md#genai-retrieval-augmented-generation-with-ollama-and-elasticsearch)\n- [Free Data Engineering Course with AWS, TDengine, Docker and Grafana](04-HandsOnCourse.md#free-data-engineering-course-with-aws-tdengine-docker-and-grafana)\n- [Monitor your data in dbt & detect quality issues with Elementary](04-HandsOnCourse.md#monitor-your-data-in-dbt-and-detect-quality-issues-with-elementary)\n- [Solving Engineers 4 Biggest Airflow Problems](04-HandsOnCourse.md#solving-engineers-4-biggest-airflow-problems)\n- [The best alternative to Airlfow? Mage.ai](04-HandsOnCourse.md#the-best-alternative-to-airlfow?-mage.ai)\n\n## GenAI Retrieval Augmented Generation with Ollama and Elasticsearch\n\n- This how-to is based on this one from Elasticsearch: https://www.elastic.co/search-labs/blog/rag-with-llamaIndex-and-elasticsearch\n- Instead of Elasticsearch cloud we're going to run everything locally\n- The simplest way to get this done is to just clone this GitHub Repo for the code and docker setup\n- I've tried this on a M1 Mac. Changes for Windows with WSL will come later.\n- The biggest problems that I had were actually installing the dependencies rather than the code itself.\n\n### Install Ollama\n1. Download Ollama from here https://ollama.com/download/mac\n2. Unzip, drag into applications and install\n3. do `ollama run mistral` (It's going to download the Mistral 7b model, 4.1GB size)\n4. Create a new folder in Documents \"Elasticsearch-RAG\"\n5. Open that folder in VSCode\n\n### Install Elasticsearch & Kibana (Docker)\n1. Use the docker-compose file from this repo: https://github.com/andkret/Cookbook/blob/master/Code%20Examples/GenAI-RAG/docker-compose.yml\n2. Download Docker Desktop from here: https://www.docker.com/products/docker-desktop/\n3. 
Install docker desktop and sign in in the app/create a user -> sends you to the browser\n\n**For Windows Users**\nConfigure WSL2 to use max only 4GB of ram:\n```\nwsl --shutdown\nnotepad \"$env:USERPROFILE/.wslconfig\"\n```\n.wslconfig file:\n```\n[wsl2]\nmemory=4GB   # Limits VM memory in WSL 2 up to 4GB\n```\n**Modify the Linux kernel map count in WSL**\nDo this before the start because Elasticsearch requires a higher value to work\n`sudo sysctl -w vm.max_map_count=262144`\n\n4. go to the Elasticsearch-RAG folder and do `docker compose up`\n5. make sure you have Elasticsearch 8.11 or later (we use 8.16 here in this project) if you want to use your own Elasticsearch image\n6. if you get this error on a mac then just open the console in the docker app: *error getting credentials - err: exec: docker-credential-desktop: executable file not found in $PATH, out:*\n7. Install xcode command line tools: `xcode-select --install`\n8. make sure you're at python 3.8.1 or larger -> installed 3.13.0 from https://www.python.org/downloads/\n\n### Setup the virtual Python environment\n\n#### preparation on a Mac\n##### install brew\nwhich brew\n/bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"\nexport PATH=\"/opt/homebrew/bin:$PATH\"\nbrew --version\nbrew install pyenv\nbrew install pyenv-virtualenv\n\n##### install pyenv\n```\nbrew install pyenv\nbrew install pyenv-virtualenv\n```\n\nModify the path so that pyenv is in the path variable\n`nano ~/.zshrc`\n\n```\nexport PYENV_ROOT=\"$HOME/.pyenv\"\nexport PATH=\"$PYENV_ROOT/bin:$PATH\"\neval \"$(pyenv init --path)\"\neval \"$(pyenv init -)\"\neval \"$(pyenv virtualenv-init -)\"\n```\n\ninstall dependencies for building python versions\n`brew install openssl readline sqlite3 xz zlib`\n\nReload to apply changes\n`source ~/.zshrc`\n\ninstall python\n```\npyenv install 3.11.6\npyenv version\n```\n\nSet Python version system wide\n`pyenv global 3.11.6`\n\n```\npyenv virtualenv 
<python-version> <new-virtualenv-name>\npyenv activate <your-virtualenv-name>\npyenv virtualenv-delete <your-virtualenv-name>\n```\n\n#### Windows without pyenv\nSet up the virtual Python environment: go to the Elasticsearch-RAG folder and do\n`python3 -m venv .elkrag`\nEnable the environment\n`source .elkrag/bin/activate`\n\n\n### Install required libraries (do one at a time so you see errors):\n```\n# alternatively: python3 -m pip install <package-name>\npip install llama-index\npip install llama-index-embeddings-ollama\npip install llama-index-llms-ollama\npip install llama-index-vector-stores-elasticsearch\npip install python-dotenv\n```\n\n### Write the data to Elasticsearch\n1. create / copy in the index.py file\n2. download the conversations.json file from the folder code examples/GenAI-RAG\n3. if you get an error with the execution then check if the pydantic version is <2.0 with `pip show pydantic`; if not, do this: `pip install \"pydantic<2.0\"`\n4. run the program index.py: https://github.com/andkret/Cookbook/blob/master/Code%20Examples/GenAI-RAG/index.py\n\n### Check the data in Elasticsearch\n1. go to Kibana http://localhost:5601/app/management/data/index_management/indices and see the new index called calls\n2. go to dev tools (http://localhost:5601/app/dev_tools#/console/shell) and try out this query: `GET calls/_search?size=1`\n\n### Query data from elasticsearch and create an output with Mistral\n1. if everything is good then run the query.py file https://github.com/andkret/Cookbook/blob/master/Code%20Examples/GenAI-RAG/query.py\n2. try a few queries :)\n\n### Install libraries to extract text from pdfs\n\n\n### Extract data from CV and put it into Elasticsearch\nI created a CV with ChatGPT https://github.com/andkret/Cookbook/blob/master/Code%20Examples/GenAI-RAG/Liam_McGivney_CV.pdf\n\nInstall the library to extract text from the pdf\n`pip install PyMuPDF`\nI had to Shift+Command+p then python clear workspace cache and reload window. 
Then it saw it :/\n\nThe file cvpipeline.py has the python code for the indexing. It's not working right now though!\nhttps://github.com/andkret/Cookbook/blob/master/Code%20Examples/GenAI-RAG/cvpipeline.py\n\n\nI'll keep developing this and update it once it's working.\n\n\n## Free Data Engineering Course with AWS TDengine Docker and Grafana\n\n**Free hands-on course:** [Watch on YouTube](https://youtu.be/eoj-CnrR9jA)\n\nIn this detailed tutorial video, Andreas guides viewers through creating an end-to-end data pipeline using time series data. The project focuses on fetching weather data from a Weather API, processing it on AWS, storing it in TDengine (a time series database), and visualizing the data with Grafana. Here's a concise summary of what the video covers:\n\n1. **Introduction and Setup:**\n  - The project is introduced along with a GitHub repository containing all necessary resources and a step-by-step guide.\n  - The pipeline architecture includes an IoT weather station, a Weather API, AWS for processing, TDengine for data storage, and Grafana for visualization.\n2. **Project Components:**\n  - **Weather API:** Utilizes weatherapi.com to fetch weather data.\n  - **AWS Lambda:** Processes the data fetched from the Weather API.\n  - **TDengine:** Serves as the time series database to store processed data. It's highlighted for its performance and simplicity, especially for time series data.\n  - **Grafana:** Used for creating dashboards to visualize the time series data.\n3. **Development and Deployment:**\n  - The local development environment setup includes Python, Docker, and VS Code.\n  - The tutorial covers the creation of a Docker image for the project and deploying it to AWS Elastic Container Registry (ECR).\n  - AWS Lambda is then configured to use the Docker image from ECR.\n  - AWS EventBridge is used to schedule the Lambda function to run at specified intervals.\n4. 
**Time Series Data:**\n  - The importance of time series data and the benefits of using a time series database like TDengine over traditional relational databases are discussed.\n  - TDengine's features such as speed, scaling, data retention, and built-in functions for time series data are highlighted.\n5. **Building the Pipeline:**\n  - Detailed instructions are provided for setting up each component of the pipeline:\n    - Fetching weather data from the Weather API.\n    - Processing and sending the data to TDengine using an AWS Lambda function.\n    - Visualizing the data with Grafana.\n  - Each step includes code snippets and configurations needed to implement the pipeline.\n6. **Conclusion:**\n  - The video concludes with a demonstration of the completed pipeline, showing weather data visualizations in Grafana.\n  - Viewers are encouraged to replicate the project using the resources provided in the GitHub repository linked in the video description.\n\nThis video provides a comprehensive guide to building a data pipeline with a focus on time series data, demonstrating the integration of various technologies and platforms to achieve an end-to-end solution.\n\n## Monitor your data in dbt and detect quality issues with Elementary\n\n**Free hands-on tutorial:** [Watch on YouTube](https://youtu.be/6fnU91Q2gq0)\n\nIn this comprehensive tutorial, Andreas delves into the integration of dbt (data build tool) with Elementary to enhance data monitoring and quality detection within Snowflake databases. The tutorial is structured to guide viewers through a hands-on experience, starting with an introduction to a sample project setup and the common challenges faced in monitoring dbt jobs. It then transitions into how Elementary can be utilized to address these challenges effectively.\n\nKey learning points and tutorial structure include:\n\n1. 
**Introduction to the Sample Project:** Andreas showcases a project setup involving Snowflake as the data warehouse, dbt for data modeling and testing, and a visualization tool for data analysis. This setup serves as the basis for the tutorial.\n2. **Challenges in Monitoring dbt Jobs:** Common issues in monitoring dbt jobs are discussed, highlighting the limitations of the dbt interface in providing comprehensive monitoring capabilities.\n3. **Introduction to Elementary:** Elementary is introduced as a dbt-native data observability tool designed to enhance the monitoring and analysis of dbt jobs. It offers both open-source and cloud versions, with the tutorial focusing on the cloud version.\n4. **Setup Requirements:** The tutorial covers the necessary setup on both the Snowflake and dbt sides, including schema creation, user and role configuration in Snowflake, and modifications to the dbt project for integrating with Elementary.\n5. **Elementary's User Interface and Features:** A thorough walkthrough of Elementary's interface is provided, showcasing its dashboard, test results, model runs, data catalog, and data lineage features. The tool's ability to automatically run additional tests, like anomaly detection and schema change detection, is also highlighted.\n6. **Advantages of Using Elementary:** The presenter outlines several benefits of using Elementary, such as easy implementation, native test integration, clean and straightforward UI, and enhanced privacy due to data being stored within the user's data warehouse.\n7. **Potential Drawbacks:** Some potential drawbacks are discussed, including the additional load on dbt job execution due to more models being run and limitations in dashboard customization.\n8. 
**Summary and Verdict:** The tutorial concludes with a summary of the key features and benefits of using Elementary with dbt, emphasizing its value in improving data quality monitoring and detection.\n\nOverall, viewers are guided through setting up and utilizing Elementary for dbt data monitoring, gaining insights into its capabilities, setup process, and the practical benefits it offers for data quality assurance.\n\n\n## Solving Engineers 4 Biggest Airflow Problems\n\n**Free hands-on tutorial:** [Watch on YouTube](https://youtu.be/b9bMNEh8bes)\n\nIn this informative video, Andreas discusses the four major challenges engineers face when working with Apache Airflow and introduces Astronomer, a managed Airflow service that addresses these issues effectively. Astronomer is highlighted as a solution that simplifies Airflow deployment and management, making it easier for engineers to develop, deploy, and monitor their data pipelines. Here's a summary of the key points discussed for each challenge and how Astronomer provides solutions:\n\n1. Managing Airflow Deployments:\n  - **Challenge:** Setting up and maintaining Airflow deployments is complex and time-consuming, involving configuring cloud instances, managing resources, scaling, and updating the Airflow system.\n  - **Solution with Astronomer:** Offers a straightforward deployment process where users can easily configure their deployments, choose cloud providers (GCP, AWS, Azure), and set up scaling with just a few clicks. Astronomer handles the complexity, making it easier to manage production and quality environments.\n2. Development Environment and Deployment:\n  - **Challenge:** Local installation of Airflow is complicated due to its dependency on multiple Docker containers and the need for extensive configuration.\n  - **Solution with Astronomer:** Provides a CLI tool for setting up a local development environment with a single command, simplifying the process of developing, testing, and deploying pipelines. 
The Astronomer CLI also helps in initializing project templates and deploying DAGs to the cloud effortlessly.\n3. Source Code Management and CI/CD Pipelines:\n  - **Challenge:** Collaborative development and continuous integration/deployment (CI/CD) are essential but challenging to implement effectively with Airflow alone.\n  - **Solution with Astronomer:** Facilitates easy integration with GitHub for source code management and GitHub Actions for CI/CD. This allows automatic testing and deployment of pipeline code, ensuring a smooth workflow for teams working on pipeline development.\n4. Observing Pipelines and Alarms:\n  - **Challenge:** Monitoring data pipelines and getting timely alerts when issues occur is crucial but often difficult to achieve.\n  - **Solution with Astronomer:** The Astronomer platform provides a user-friendly interface for monitoring pipeline status and performance. It also offers customizable alerts for failures or prolonged task durations, with notifications via email, PagerDuty, or Slack, ensuring immediate awareness and response to issues.\n\nOverall, the video shows Astronomer as a powerful and user-friendly platform that addresses the common challenges of using Airflow, from deployment and development to collaboration, CI/CD, and monitoring. It suggests that Astronomer can significantly improve the experience of engineers working with Airflow, making it easier to manage, develop, and monitor data pipelines.\n\n\n## The best alternative to Airflow? Mage.ai\n\n**Free hands-on tutorial:** [Watch on YouTube](https://youtu.be/3gXsFEC3aYA)\n\nIn this insightful video, Andreas introduces Mage, a promising alternative to Apache Airflow, focusing on its simplicity, user-friendliness, and scalability. The video provides a comprehensive walkthrough of Mage, highlighting its key features and advantages over Airflow. Here's a breakdown of what viewers can learn and expect from the video:\n\n1. 
**Deployment Ease:** Mage offers a stark contrast to Airflow's complex setup process. It simplifies deployment to a single Docker image, making it straightforward to install and start on any machine, whether it's local or cloud-based on AWS, GCP, or Azure. This simplicity extends to scaling, which Mage handles horizontally, particularly beneficial in Kubernetes environments where performance scales with the number of pipelines.\n2. **User Interface (UI):** Mage shines with its UI, presenting a dark mode interface that's not only visually appealing but also simplifies navigation and pipeline management. The UI facilitates easy access to pipelines, scheduling, and monitoring of pipeline runs, offering a more intuitive experience compared to Airflow.\n3. **Pipeline Creation and Modification:** Mage streamlines the creation of ETL pipelines, allowing users to easily add data loaders, transformers, and exporters through its UI. It supports direct interaction with APIs for data loading and provides a visual representation of the data flow, enhancing the overall pipeline design experience.\n4. **Data Visualization and Exploration:** Beyond simple pipeline creation, Mage enables in-depth data exploration within the UI. Users can generate various charts, such as histograms and bar charts, to analyze the data directly, a feature that greatly enhances the tool's utility.\n5. **Testing and Scheduling:** Testing pipelines in Mage is straightforward, allowing for quick integration of tests to ensure data quality and pipeline reliability. Scheduling is also versatile, supporting standard time-based triggers, event-based triggers for real-time data ingestion, and API calls for on-demand pipeline execution.\n6. **Support for Streaming and ELT Processes:** Mage is not limited to ETL workflows but also supports streaming and ELT processes. 
It integrates seamlessly with dbt models for in-warehouse transformations and Spark for big data processing, showcasing its versatility and scalability.\n7. **Conclusion and Call to Action:** Andreas concludes by praising the direction in which the industry is moving, with tools like Mage simplifying data engineering processes. He encourages viewers to try Mage and engage with the content by liking, subscribing, and commenting on their current tools and the potential impact of Mage.\n\nOverall, the video shows Mage as a highly user-friendly, scalable, and versatile tool for data pipeline creation and management, offering a compelling alternative to traditional tools like Airflow.\n"
  },
  {
    "path": "sections/05-CaseStudies.md",
    "content": "Case Studies\n============\n\n## Contents\n\n- [Data Science @Airbnb](05-CaseStudies.md#data-science-at-Airbnb)\n- [Data Science @Amazon](05-CaseStudies.md#data-science-at-Amazon)\n- [Data Science @Baidu](05-CaseStudies.md#data-science-at-Baidu)\n- [Data Science @Blackrock](05-CaseStudies.md#data-science-at-Blackrock)\n- [Data Science @BMW](05-CaseStudies.md#data-science-at-BMW)\n- [Data Science @Booking.com](05-CaseStudies.md#data-science-at-Booking.com)\n- [Data Science @CERN](05-CaseStudies.md#data-science-at-CERN)\n- [Data Science @Disney](05-CaseStudies.md#data-science-at-Disney)\n- [Data Science @DLR](05-CaseStudies.md#data-science-at-DLR)\n- [Data Science @Drivetribe](05-CaseStudies.md#data-science-at-Drivetribe)\n- [Data Science @Dropbox](05-CaseStudies.md#data-science-at-Dropbox)\n- [Data Science @Ebay](05-CaseStudies.md#data-science-at-Ebay)\n- [Data Science @Expedia](05-CaseStudies.md#data-science-at-Expedia)\n- [Data Science @Facebook](05-CaseStudies.md#data-science-at-Facebook)\n- [Data Science @Google](05-CaseStudies.md#data-science-at-Google)\n- [Data Science @Grammarly](05-CaseStudies.md#data-science-at-Grammarly)\n- [Data Science @ING Fraud](05-CaseStudies.md#data-science-at-ING-Fraud)\n- [Data Science @Instagram](05-CaseStudies.md#data-science-at-Instagram)\n- [Data Science @LinkedIn](05-CaseStudies.md#data-science-at-LinkedIn)\n- [Data Science @Lyft](05-CaseStudies.md#data-science-at-Lyft)\n- [Data Science @NASA](05-CaseStudies.md#data-science-at-NASA)\n- [Data Science @Netflix](05-CaseStudies.md#data-science-at-Netflix)\n- [Data Science @OLX](05-CaseStudies.md#data-science-at-OLX)\n- [Data Science @OTTO](05-CaseStudies.md#data-science-at-OTTO)\n- [Data Science @Paypal](05-CaseStudies.md#data-science-at-Paypal)\n- [Data Science @Pinterest](05-CaseStudies.md#data-science-at-Pinterest)\n- [Data Science @Salesforce](05-CaseStudies.md#data-science-at-Salesforce)\n- [Data Science @Siemens 
Mindsphere](05-CaseStudies.md#data-science-at-Siemens-Mindsphere)\n- [Data Science @Slack](05-CaseStudies.md#data-science-at-Slack)\n- [Data Science @Spotify](05-CaseStudies.md#data-science-at-Spotify)\n- [Data Science @Symantec](05-CaseStudies.md#data-science-at-Symantec)\n- [Data Science @Tinder](05-CaseStudies.md#data-science-at-Tinder)\n- [Data Science @Twitter](05-CaseStudies.md#data-science-at-Twitter)\n- [Data Science @Uber](05-CaseStudies.md#data-science-at-Uber)\n- [Data Science @Upwork](05-CaseStudies.md#data-science-at-Upwork)\n- [Data Science @Woot](05-CaseStudies.md#data-science-at-Woot)\n- [Data Science @Zalando](05-CaseStudies.md#data-science-at-Zalando)\n\n\n\nHow I do Case Studies\n---------------------\n\n### Data Science at Airbnb\n\n| Podcast Episode: #063 Data Engineering At Airbnb Case Study\n|------------------|\n|How Airbnb is doing data engineering? Let’s check it out.\n| [Watch on YouTube](https://youtu.be/iokqkMfyIfo) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/063-Data-Engineering-At-Airbnb-Case-Study-e45il2)|\n\n\n**Slides:**\n\n<https://medium.com/airbnb-engineering/airbnb-engineering-infrastructure/home>\n\nAirbnb Engineering Blog: <https://medium.com/airbnb-engineering>\n\nData Infrastructure:\n<https://medium.com/airbnb-engineering/data-infrastructure-at-airbnb-8adfb34f169c>\n\nScaling the serving tier:\n<https://medium.com/airbnb-engineering/unlocking-horizontal-scalability-in-our-web-serving-tier-d907449cdbcf>\n\nDruid Analytics:\n<https://medium.com/airbnb-engineering/druid-airbnb-data-platform-601c312f2a4c>\n\nSpark Streaming for logging events:\n<https://medium.com/airbnb-engineering/scaling-spark-streaming-for-logging-event-ingestion-4a03141d135d>\n\n-Druid Wiki: <https://en.wikipedia.org/wiki/Apache_Druid>\n\n### Data Science at Amazon\n\n<https://aws.amazon.com/solutions/case-studies/amazon-migration-analytics/>\n\n### Data Science at 
Baidu\n\n<https://www.slideshare.net/databricks/spark-sql-adaptive-execution-unleashes-the-power-of-cluster-in-large-scale-with-chenzhao-guo-and-carson-wang>\n\n### Data Science at Blackrock\n\n<https://www.slideshare.net/DataStax/maintaining-consistency-across-data-centers-randy-fradin-blackrock-cassandra-summit-2016>\n\n### Data Science at BMW\n\n<https://www.unibw.de/code/events-u/jt-2018-workshops/ws3_bigdata_vortrag_widmann.pdf>\n\n### Data Science at Booking.com\n\n| Podcast Episode: #064 Data Engineering at Booking.com Case Study\n|------------------|\n|How Booking.com is doing data engineering? Let’s check it out.\n| [Watch on YouTube](https://youtu.be/9GE3yiVo1FM) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/064-Data-Engineering-At-Booking-com-Case-Study-e45ilg)|\n\n**Slides:**\n\n<https://www.slideshare.net/ConfluentInc/data-streaming-ecosystem-management-at-bookingcom?ref=https://www.confluent.io/kafka-summit-sf18/data-streaming-ecosystem-management>\n\n<https://www.slideshare.net/SparkSummit/productionizing-behavioural-features-for-machine-learning-with-apache-spark-streaming-with-ben-teeuwen-and-roman-studenikin>\n\n<https://www.slideshare.net/ConfluentInc/data-streaming-ecosystem-management-at-bookingcom?ref=https://www.confluent.io/kafka-summit-sf18/data-streaming-ecosystem-management>\n\nDruid:\n<https://towardsdatascience.com/introduction-to-druid-4bf285b92b5a>\n\nKafka Architecture:\n<https://data-flair.training/blogs/kafka-architecture/>\n\nConfluent Platform:\n<https://www.confluent.io/product/confluent-platform/>\n\n### Data Science at CERN\n\n| Podcast Episode: #065 Data Engineering At CERN Case Study\n|------------------|\n|How is CERN doing Data Engineering? They must get huge amounts of data from the Large Hadron Collider. 
Let’s check it out.\n| [Watch on YouTube](https://youtu.be/LrhfzPsKaDE) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/065-Data-Engineering-At-CERN-Case-Study-e45ime)|\n\n\n**Slides:**\n\n<https://en.wikipedia.org/wiki/Large_Hadron_Collider>\n\n<http://www.lhc-facts.ch/index.php?page=datenverarbeitung>\n\n\n<https://www.slideshare.net/SparkSummit/next-cern-accelerator-logging-service-with-jakub-wozniak>\n\n<https://databricks.com/session/the-architecture-of-the-next-cern-accelerator-logging-service>\n\n<http://opendata.cern.ch>\n\n<https://gobblin.apache.org>\n\n<https://www.slideshare.net/databricks/cerns-next-generation-data-analysis-platform-with-apache-spark-with-enric-tejedor>\n\n<https://www.slideshare.net/SparkSummit/realtime-detection-of-anomalies-in-the-database-infrastructure-using-apache-spark-with-daniel-lanza-and-prasanth-kothuri>\n\n### Data Science at Disney\n\n<https://medium.com/disney-streaming/delivering-data-in-real-time-via-auto-scaling-kinesis-streams-72a0236b2cd9>\n\n### Data Science at DLR\n\n<https://www.unibw.de/code/events-u/jt-2018-workshops/ws3_bigdata_vortrag_bamler.pdf>\n\n### Data Science at Drivetribe\n\n<https://berlin-2017.flink-forward.org/kb_sessions/drivetribes-kappa-architecture-with-apache-flink/>\n\n<https://www.slideshare.net/FlinkForward/flink-forward-berlin-2017-aris-kyriakos-koliopoulos-drivetribes-kappa-architecture-with-apache-flink>\n\n### Data Science at Dropbox\n\n<https://blogs.dropbox.com/tech/2019/01/finding-kafkas-throughput-limit-in-dropbox-infrastructure/>\n\n### Data Science at Ebay\n\n<https://www.slideshare.net/databricks/moving-ebays-data-warehouse-over-to-apache-spark-spark-as-core-etl-platform-at-ebay-with-kim-curtis-and-brian-knauss>\n<https://www.slideshare.net/databricks/analytical-dbms-to-apache-spark-auto-migration-framework-with-edward-zhang-and-lipeng-zhu>\n\n### Data Science at 
Expedia\n\n<https://www.slideshare.net/BrandonOBrien/spark-streaming-kafka-best-practices-w-brandon-obrien>\n<https://www.slideshare.net/Naveen1914/brandon-obrien-streamingdata>\n\n### Data Science at Facebook\n\n<https://code.fb.com/core-data/apache-spark-scale-a-60-tb-production-use-case/>\n\n### Data Science at Google\n\n<http://www.unofficialgoogledatascience.com/>\\\n<https://ai.google/research/teams/ai-fundamentals-applications/>\\\n<https://cloud.google.com/solutions/big-data/>\\\n<https://datafloq.com/read/google-applies-big-data-infographic/385>\n\n### Data Science at Grammarly\n\n<https://www.slideshare.net/databricks/building-a-versatile-analytics-pipeline-on-top-of-apache-spark-with-mikhail-chernetsov>\n\n### Data Science at ING Fraud\n\n<https://sf-2017.flink-forward.org/kb_sessions/streaming-models-how-ing-adds-models-at-runtime-to-catch-fraudsters/>\n\n### Data Science at Instagram\n\n<https://www.slideshare.net/SparkSummit/lessons-learned-developing-and-managing-massive-300tb-apache-spark-pipelines-in-production-with-brandon-carl>\n\n### Data Science at LinkedIn\n\n| Podcast Episode: #073 Data Engineering At LinkedIn Case Study\n|------------------|\n|Let’s check out how LinkedIn is processing data :)\n| [Watch on YouTube](https://youtu.be/wgfoE8Jbr_Q) \\ [Listen on 
Anchor](https://anchor.fm/andreaskayy/episodes/073-Data-Engineering-At-LinkedIn-Case-Study-e45is6)|\n\n\n**Slides:**\n\n<https://engineering.linkedin.com/teams/data#0>\n\n<https://www.slideshare.net/yaelgarten/building-a-healthy-data-ecosystem-around-kafka-and-hadoop-lessons-learned-at-linkedin>\n\n<https://thirdeye.readthedocs.io/en/latest/about.html>\n\n<http://samza.apache.org>\n\n<https://www.slideshare.net/ConfluentInc/more-data-more-problems-scaling-kafkamirroring-pipelines-at-linkedin?ref=https://www.confluent.io/kafka-summit-sf18/more_data_more_problems>\n\n<https://www.slideshare.net/KhaiTran17/conquering-the-lambda-architecture-in-linkedin-metrics-platform-with-apache-calcite-and-apache-samza>\n\n<https://www.slideshare.net/Hadoop_Summit/unified-batch-stream-processing-with-apache-samza>\n\n<http://druid.io/docs/latest/design/index.html>\n\n### Data Science at Lyft\n\n<https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff>\n\n### Data Science at NASA\n\n| Podcast Episode: #067 Data Engineering At NASA Case Study\n|------------------|\n|A look into how NASA is doing data engineering.\n| [Watch on YouTube](https://youtu.be/Pctn_1UoNjA) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/067-Data-Engineering-At-NASA-Case-Study-e45ina)|\n\n\n**Slides:**\n\n<https://esip.figshare.com/articles/Apache_Science_Data_Analytics_Platform/5786421>\n\n<http://www.socallinuxexpo.org/sites/default/files/presentations/OnSightCloudArchitecture-scale14x.pdf>\n\n<https://www.slideshare.net/SparkSummit/spark-at-nasajplchris-mattmann?qid=90968554-288e-454a-b63a-21a45cfc897d&v=&b=&from_search=4>\n\n<https://en.m.wikipedia.org/wiki/Hierarchical_Data_Format>\n\n### Data Science at Netflix\n\n| Podcast Episode: #062 Data Engineering At Netflix Case Study\n|------------------|\n|How Netflix is doing Data Engineering using their Keystone platform.\n| [Watch on YouTube](https://youtu.be/YWPsYpjNKeM) \\ [Listen on 
Anchor](https://anchor.fm/andreaskayy/episodes/062-Data-Engineering-At-Netflix-Case-Study-e45ikp)|\n\n\nNetflix revolutionized how we watch movies and TV. Currently over 75\nmillion users watch 125 million hours of Netflix content every day!\n\nNetflix's revenue comes from a monthly subscription service. So, the\ngoal for Netflix is to keep you subscribed and to get new subscribers.\n\nTo achieve this, Netflix is licensing movies from studios as well as\ncreating its own original movies and TV series.\n\nBut offering new content is not everything. What is also very important\nis to keep you watching content that already exists.\n\nTo be able to recommend content to you, Netflix is collecting data from\nusers. And it is collecting a lot.\n\nCurrently, Netflix analyses about 500 billion user events per day. That\nresults in a stunning 1.3 petabytes every day.\n\nAll this data allows Netflix to build recommender systems for you. The\nrecommenders show you content that you might like, based on your\nviewing habits or on what is currently trending.\n\n###### The Netflix batch processing pipeline\n\nWhen Netflix started out, they had a very simple batch processing system\narchitecture.\n\nThe key components were Chukwa, a scalable data collection system,\nAmazon S3 and Elastic MapReduce.\n\n![Old Netflix Batch Processing Pipeline[]{label=\"fig:Bild1\"}](/images/Netflix-Chuckwa-Pipeline.jpg){#fig:Bild1\nwidth=\"90%\"}\n\nChukwa wrote incoming messages into Hadoop sequence files, stored in\nAmazon S3. These files could then be analysed by Elastic MapReduce jobs.\n\nThe batch processing jobs were executed regularly on a\ndaily and hourly basis. As a result, Netflix could learn how people used\nthe services every hour or once a day.\n\n###### Know what customers want:\n\nBecause you are looking at the big picture you can create new products.\nNetflix uses insight from big data to create new TV shows and movies.\n\nThey created House of Cards based on data. 
There is a very interesting\nTED talk about this you should watch:\n\n[How to use data to make a hit TV show \\| Sebastian\nWernicke](https://www.youtube.com/watch?v=vQILP19qABk)\n\nBatch processing also helps Netflix to know the exact episode of a TV\nshow that gets you hooked. Not only globally but for every country where\nNetflix is available.\n\nCheck out the article from The Verge\n\nThey know exactly what show works in what country and what show does\nnot.\n\nIt helps them create shows that work everywhere or select the shows\nto license in different countries. Germany for instance does not have\nthe full library that Americans have :(\n\nWe have to put up with only a small portion of TV shows and movies. If\nyou have to select, why not select those that work best?\n\n###### Batch processing is not enough\n\nAs a data platform for generating insight the Chukwa pipeline was a good\nstart. It is very important to be able to create hourly and daily\naggregated views of user behavior.\n\nTo this day Netflix is still doing a lot of batch processing jobs.\n\nThe only problem is: With batch processing you are basically looking\ninto the past.\n\nFor Netflix, and data-driven companies in general, looking into the past\nis not enough. They want a live view of what is happening.\n\n###### The trending now feature\n\nOne of the newer Netflix features is \"Trending now\". To the average user\nit looks as if \"Trending Now\" means currently most watched.\n\nThis is what I get displayed as trending while I am writing this on a\nSaturday morning at 8:00 in Germany. 
But it is so much more.\n\nWhat is currently being watched is only a part of the data that is used\nto generate \"Trending Now\".\n\n![Netflix Trending Now Feature[]{label=\"fig:Bild1\"}](/images/Netflix-Trending-Now-Screenshot.jpg){#fig:Bild1\nwidth=\"90%\"}\n\n\"Trending now\" is created based on two types of data sources: Play\nevents and Impression events.\n\nWhat messages those two types actually include is not really\ncommunicated by Netflix. I did some research on the Netflix Techblog and\nthis is what I found out:\n\nPlay events include what title you have watched last, where you stopped\nwatching, where you used the 30s rewind and others. Impression events\nare collected as you browse the Netflix library: scrolling up and down,\nscrolling left or right, clicking on a movie and so on.\n\nBasically, play events log what you do while you are watching.\nImpression events capture what you do on Netflix while you are\nnot watching something.\n\n###### Netflix real-time streaming architecture\n\nNetflix uses three internet-facing services to exchange data with the\nclient's browser or mobile app. These services are simple Apache\nTomcat-based web services.\n\nThe service for receiving play events is called \"Viewing History\".\nImpression events are collected with the \"Beacon\" service.\n\nThe \"Recommender Service\" makes recommendations based on trend data\navailable to clients.\n\nMessages from the Beacon and Viewing History services are put into\nApache Kafka. It acts as a buffer between the data services and the\nanalytics.\n\nBeacon and Viewing History publish messages to Kafka topics. The\nanalytics subscribes to the topics and gets the messages automatically\ndelivered in a first-in, first-out fashion.\n\nAfter the analytics step the workflow is straightforward. The trending data\nis stored in a Cassandra Key-Value store. 
The recommender service has\naccess to Cassandra and makes the data available to the Netflix\nclient.\n\n![Netflix Streaming Pipeline[]{label=\"fig:Bild1\"}](/images/Netflix-Streaming-Pipeline.jpg){#fig:Bild1\nwidth=\"90%\"}\n\nThe algorithms the analytics system uses to process all this data are\nnot known to the public. They are a trade secret of Netflix.\n\nWhat is known is the analytics tool they use. Back in February 2015 they\nwrote on the tech blog that they use a custom-made tool.\n\nThey also stated that Netflix is going to replace the custom-made\nanalytics tool with Apache Spark Streaming in the future. My guess is\nthat they made the switch to Spark some time ago, because their post is\nmore than a year old.\n\n### Data Science at OLX\n\n| Podcast Episode: #083 Data Engineering at OLX Case Study\n|------------------|\n|This podcast is a case study about OLX with Senior Data Scientist Alexey Grigorev as guest. It was super fun.\n| [Watch on YouTube](https://youtu.be/H_uFNoCvykM) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/083-Data-Engineering-at-OLX-Case-Study-e45j5n)|\n\n\n**Slides:**\n\n<https://www.slideshare.net/mobile/AlexeyGrigorev/image-models-infrastructure-at-olx>\n\n### Data Science at OTTO\n\n<https://www.slideshare.net/SparkSummit/spark-summit-eu-talk-by-sebastian-schroeder-and-ralf-sigmund>\n\n### Data Science at Paypal\n\n<https://www.paypal-engineering.com/tag/data/>\n\n### Data Science at Pinterest\n\n| Podcast Episode: #069 Engineering Culture At Pinterest\n|------------------|\n|In this podcast we look into the data platform and data processing at Pinterest.\n| [Watch on YouTube](https://youtu.be/cqWXGVoDX8Q) \\ [Listen on 
Anchor](https://anchor.fm/andreaskayy/episodes/069-Data-Engineering-At-Pinterest-Case-Study-e45ioh)|\n\n**Slides:**\n\n<https://www.slideshare.net/ConfluentInc/pinterests-story-of-streaming-hundreds-of-terabytes-of-pins-from-mysql-to-s3hadoop-continuously?ref=https://www.confluent.io/kafka-summit-sf18/pinterests-story-of-streaming-hundreds-of-terabytes>\n\n<https://www.slideshare.net/ConfluentInc/building-pinterest-realtime-ads-platform-using-kafka-streams?ref=https://www.confluent.io/kafka-summit-sf18/building-pinterest-real-time-ads-platform-using-kafka-streams>\n\n<https://medium.com/@Pinterest_Engineering/building-a-real-time-user-action-counting-system-for-ads-88a60d9c9a>\n\n<https://medium.com/pinterest-engineering/goku-building-a-scalable-and-high-performant-time-series-database-system-a8ff5758a181>\n\n<https://medium.com/pinterest-engineering/building-a-dynamic-and-responsive-pinterest-7d410e99f0a9>\n\n<https://medium.com/@Pinterest_Engineering/building-pin-stats-25ec8460e924>\n\n<https://medium.com/@Pinterest_Engineering/improving-hbase-backup-efficiency-at-pinterest-86159da4b954>\n\n<https://medium.com/@Pinterest_Engineering/pinterest-joins-the-cloud-native-computing-foundation-e3b3e66cb4f>\n\n<https://medium.com/@Pinterest_Engineering/using-kafka-streams-api-for-predictive-budgeting-9f58d206c996>\n\n<https://medium.com/@Pinterest_Engineering/auto-scaling-pinterest-df1d2beb4d64>\n\n### Data Science at Salesforce\n\n<https://engineering.salesforce.com/building-a-scalable-event-pipeline-with-heroku-and-salesforce-2549cb20ce06>\n\n### Data Science at Siemens Mindsphere\n\n| Podcast Episode: #059 What Is The Siemens Mindsphere IoT Platform?\n|------------------|\n|The Internet of things is a huge deal. There are many platforms available. But, which one is actually good? Join me on a 50 minute dive into the Siemens Mindsphere online documentation. I have to say I was super unimpressed by what I found. 
Many limitations, unclear architecture and no pricing available? Not good!\n| [Watch on YouTube](https://youtu.be/HEd5Tsuy5HE) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/059-A-Look-Into-The-Siemens-Mindsphere-IoT-Platform---059-e45ihn)|\n\n### Data Science at Slack\n\n<https://speakerdeck.com/vananth22/streaming-data-pipelines-at-slack>\n\n### Data Science at Spotify\n\n| Podcast Episode: #071 Data Engineering At Spotify Case Study\n|------------------|\n|In this episode we are looking at data engineering at Spotify, my favorite music streaming service. How do they process all that data?\n| [Watch on YouTube](https://youtu.be/0WJZ5wtQRWI) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/071-Data-Engineering-At-Spotify-Case-Study-e45iq1)|\n\n\n**Slides:**\n\n<https://labs.spotify.com/2016/02/25/spotifys-event-delivery-the-road-to-the-cloud-part-i/>\n\n<https://labs.spotify.com/2016/03/03/spotifys-event-delivery-the-road-to-the-cloud-part-ii/>\n\n<https://labs.spotify.com/2016/03/10/spotifys-event-delivery-the-road-to-the-cloud-part-iii/>\n\n<https://www.slideshare.net/InfoQ/scaling-the-data-infrastructure-spotify>\n\n<https://www.datanami.com/2018/05/16/big-data-file-formats-demystified/>\n\n<https://labs.spotify.com/2017/04/26/reliable-export-of-cloud-pubsub-streams-to-cloud-storage/>\n\n<https://labs.spotify.com/2017/11/20/autoscaling-pub-sub-consumers/>\n\n### Data Science at Symantec\n\n<https://www.slideshare.net/planetcassandra/symantec-cassandra-data-modelling-techniques-in-action>\n\n### Data Science at Tinder\n\n<https://www.slideshare.net/databricks/scalable-monitoring-using-apache-spark-and-friends-with-utkarsh-bhatnagar>\n\n### Data Science at Twitter\n\n| Podcast Episode: #072 Data Engineering At Twitter Case Study\n|------------------|\n|How is Twitter doing data engineering? 
Oh man, they have a lot of cool things to share these tweets.\n| [Watch on YouTube](https://youtu.be/UkqSR3IeLZ8) \\ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/072-Data-Engineering-At-Twitter-Case-Study-e45iqq)|\n\n\n**Slides:**\n\n<https://www.slideshare.net/sawjd/real-time-processing-using-twitter-heron-by-karthik-ramasamy>\n\n<https://www.slideshare.net/sawjd/big-data-day-la-2016-big-data-track-twitter-heron-scale-karthik-ramasamy-engineering-manager-twitter>\n\n<https://techjury.net/stats-about/twitter/>\n\n<https://developer.twitter.com/en/docs/tweets/post-and-engage/overview>\n\n<https://www.slideshare.net/prasadwagle/extracting-insights-from-data-at-twitter>\n\n<https://blog.twitter.com/engineering/en_us/topics/insights/2018/twitters-kafka-adoption-story.html>\n\n<https://blog.twitter.com/engineering/en_us/topics/infrastructure/2017/the-infrastructure-behind-twitter-scale.html>\n\n<https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/the-start-of-a-journey-into-the-cloud.html>\n\n<https://www.slideshare.net/billonahill/twitter-heron-in-practice>\n\n<https://medium.com/@kramasamy/introduction-to-apache-heron-c64f8c7c0956>\n\n<https://www.youtube.com/watch?v=3QHGhnHx5HQ>\n\n<https://hbase.apache.org>\n\n<https://db-engines.com/en/system/Amazon+DynamoDB%3BCassandra%3BGoogle+Cloud+Bigtable%3BHBase>\n\n### Data Science at Uber\n\n<https://eng.uber.com/uber-big-data-platform/>\n\n<https://eng.uber.com/aresdb/>\n\n<https://www.uber.com/us/en/uberai/>\n\n### Data Science at Upwork\n\n<https://www.slideshare.net/databricks/how-to-rebuild-an-endtoend-ml-pipeline-with-databricks-and-upwork-with-thanh-tran>\n\n### Data Science at Woot\n\n<https://aws.amazon.com/de/blogs/big-data/our-data-lake-story-how-woot-com-built-a-serverless-data-lake-on-aws/>\n\n### Data Science at Zalando\n\n| Podcast Episode: #087 Data Engineering At Zalando Case Study Talk\n|------------------|\n|I had a great conversation about data engineering for online 
retailing with Michal Gancarski and Max Schultze. They showed Zalando’s data platform and how they build data pipelines. Super interesting, especially for AWS users.\n| [Watch on YouTube](https://youtu.be/IXOLsNA6Hm0)|\n\nDo me a favor and give these guys a follow on LinkedIn:\n\nLinkedIn of Michal: <https://www.linkedin.com/in/michalgancarski/>\n\nLinkedIn of Max: <https://www.linkedin.com/in/max-schultze-b11996110/>\n\nZalando has a tech blog with more info, and there is also a meetup in\nBerlin:\n\nZalando Blog: <https://jobs.zalando.com/tech/blog/>\n\nNext Zalando Data Engineering Meetup:\n<https://www.meetup.com/Zalando-Tech-Events-Berlin/events/262032282/>\n\nInteresting tools:\n\nAWS CDK: <https://docs.aws.amazon.com/cdk/latest/guide/what-is.html>\n\nDelta Lake: <https://delta.io/>\n\nAWS Step Functions: <https://aws.amazon.com/step-functions/>\n\nAWS State Language: <https://states-language.net/spec.html>\n\nYouTube channel of the meetup:\n<https://www.youtube.com/channel/UCxwul7aBm2LybbpKGbCOYNA/playlists>\n\nTalk at Spark+AI Summit about Zalando's Processing Platform:\n<https://databricks.com/session/continuous-applications-at-scale-of-100-teams-with-databricks-delta-and-structured-streaming>\n\nTalk at Strata London slides:\n<https://databricks.com/session/continuous-applications-at-scale-of-100-teams-with-databricks-delta-and-structured-streaming>\n\n<https://jobs.zalando.com/tech/blog/what-is-hardcore-data-science--in-practice/?gh_src=4n3gxh1>\n\n<https://jobs.zalando.com/tech/blog/complex-event-generation-for-business-process-monitoring-using-apache-flink/>\n"
  },
  {
    "path": "sections/06-BestPracticesCloud.md",
    "content": "Best Practices Cloud Platforms\n=============================\n\nThis section is a collection of best practices on how you can arrange the tools together to a platform.  \nIt's here especially to help you start your own project in the cloud on AWS, Azure and GCP.\n\nLike the advanced skills section this section also follows my [My Data Science Platform Blueprint](sections/01-Introduction.md#my-big-data-platform-blueprint).\nIn the blueprint I divided the platform into sections: Connect, Buffer, Processing, Store and Visualize.\n\nThis order will help you learn how to connect the right tools together.\nTake your time and research the tools and learn how they work.\n\nRight now the Azure section has a lot of links to platform examples.\nThey are also useful for AWS and GCP, just try to change out the tools.\n\nAs always, I am going to add more stuff to this over time.\n\nHave fun!\n\n## Contents\n\n- [Amazon Web Services (AWS)](06-BestPracticesCloud.md#aws)\n  - [Connect](06-BestPracticesCloud.md#Connect)\n  - [Buffer](06-BestPracticesCloud.md#Buffer)\n  - [Processing](06-BestPracticesCloud.md#Processing)\n  - [Store](06-BestPracticesCloud.md#Store)\n  - [Visualize](06-BestPracticesCloud.md#Visualize)\n  - [Containerization](06-BestPracticesCloud.md#Containerization)\n  - [Best Practices](06-BestPracticesCloud.md#Best-Practices)\n  - [More Details](06-BestPracticesCloud.md#More-Details)\n- [Microsoft Azure](06-BestPracticesCloud.md#azure)\n  - [Connect](06-BestPracticesCloud.md#Connect-1)\n  - [Buffer](06-BestPracticesCloud.md#Buffer-1)\n  - [Processing](06-BestPracticesCloud.md#Processing-1)\n  - [Store](06-BestPracticesCloud.md#Store-1)\n  - [Visualize](06-BestPracticesCloud.md#Visualize-1)\n  - [Containerization](06-BestPracticesCloud.md#Containerization-1)\n  - [Best Practices](06-BestPracticesCloud.md#Best-Practices-1)\n- [Google Cloud Platform (GCP)](06-BestPracticesCloud.md#gcp)\n  - [Connect](06-BestPracticesCloud.md#Connect-2)\n  - 
[Buffer](06-BestPracticesCloud.md#Buffer-2)\n  - [Processing](06-BestPracticesCloud.md#Processing-2)\n  - [Store](06-BestPracticesCloud.md#Store-2)\n  - [Visualize](06-BestPracticesCloud.md#Visualize-2)\n  - [Containerization](06-BestPracticesCloud.md#Containerization-2)\n  - [Best Practices](06-BestPracticesCloud.md#Best-Practices-2)\n\n# AWS\n## Connect\n- Elastic Beanstalk (very old)\n- SES Simple Email Service\n- API Gateway\n## Buffer\n- Kinesis\n- Kinesis Data Firehose\n- Managed Streaming for Kafka (MSK)\n- MQ\n- Simple Queue Service (SQS)\n- Simple Notification Service (SNS)\n## Processing\n- EC2\n- Athena\n- EMR\n- Elasticsearch\n- Kinesis Data Analytics\n- Glue\n- Step Functions\n- Fargate\n- Lambda\n- SageMaker\n## Store\n- Simple Storage Service (S3)\n- Redshift\n- Aurora\n- RDS\n- DynamoDB\n- ElastiCache\n- Neptune Graph DB\n- Timestream\n- DocumentDB (MongoDB compatible)\n## Visualize\n- Quicksight\n\n## Containerization\n- Elastic Container Service (ECS)\n- Elastic Container Registry (ECR)\n- Elastic Kubernetes Service (EKS)\n\n## Best Practices\nDeploying a Spring Boot Application on AWS Using AWS Elastic Beanstalk:\n\n[https://aws.amazon.com/de/blogs/devops/deploying-a-spring-boot-application-on-aws-using-aws-elastic-beanstalk/](https://aws.amazon.com/de/blogs/devops/deploying-a-spring-boot-application-on-aws-using-aws-elastic-beanstalk/)\n\nHow to deploy a Docker Container on AWS:\n\n[https://aws.amazon.com/getting-started/hands-on/deploy-docker-containers/](https://aws.amazon.com/getting-started/hands-on/deploy-docker-containers/)\n\n\n#### AWS platform architecture for GenAI\n\n![Imagetitle](/images/06/genai-enterprise.png)\n▶ [Click here to watch](https://youtu.be/2yX6G4ZURbc)\n\nI recorded a reaction video to an AWS platform architecture for GenAI called Tailwinds. 
Presented by John from Innovative Solutions and Josh from AWS, it has two main flows: indexing and consumer.\n\nData enters through S3 buckets or an API gateway, processed by AWS Lambda or Glue, and stored in a vector or graph database, then indexed in OpenSearch. Applications like chatbots use an API gateway to trigger Lambda functions for data retrieval and processing. This flexible serverless setup supports various data formats and uses tools like SAM and Terraform.\n\nAmazon Bedrock helps customers choose and evaluate models. The architecture is flexible but requires effort to create the necessary Lambda functions. Check out the video and share your thoughts!\n\n▶ [Click here to watch](https://youtu.be/2yX6G4ZURbc)\n\n#### Generative AI enabled job search engine\n\n![Imagetitle](/images/06/job-search.png)\n\n▶ [Click here to watch](https://youtu.be/dOWqasmqfHQ)\n\nHey everyone, I recorded a reaction video to an AWS platform architecture for a Gen AI job search engine. Presented by Andrea from AWS and Bill from Healthy Careers, this setup uses generative AI to enhance job searches for healthcare professionals.\n\nThe architecture uses Elastic Container Service (ECS) to handle user queries, processed by Claude II for prompt checks and geolocation. Cleaned prompts are vectorized using Amazon's Titan model, with user search history fetched from an SQL database. Search results are stored in Elasticsearch, updating every six hours. Finally, Claude II generates a response from the search results and sends it back to the user.\n\nI found the use of Claude II for prompt sanitization and geolocation, and the integration of multiple AI models through AWS Bedrock, particularly interesting. 
This setup keeps data private and provides a flexible, efficient job search experience.\n\nCheck out the video and share your thoughts!\n\n\n#### Voice transcription and analysis on AWS\n\n![Imagetitle](/images/06/voice-transcription.png)\n\n▶ [Click here to watch](https://youtu.be/RGXRjOTQuBM)\n\nHey everyone, I recorded a reaction video to an AWS architecture for voice transcription and analysis. Presented by Nuan from AWS and Ben from Assembly AI, this system is designed to handle large-scale audio data processing.\n\nUsers upload audio data via an API to an ECS container. The data is then managed by an orchestrator that decides which models to use and in what order. The orchestrator sends tasks to SQS, which triggers various ML models running on ECS. These models handle tasks like speech-to-text conversion, sentiment analysis, and speaker labeling. Results are stored in S3 and users are notified via SNS and a Lambda function when processing is complete.\n\nI found the use of ECS for containerized applications and the flexibility of swapping models through ECR particularly interesting. This architecture ensures scalability and efficiency, making it ideal for handling millions of requests per day.\n\nCheck out the video and share your thoughts!\n\n\n#### GeoSpatial Data Analysis\n\n![Imagetitle](/images/06/geo-spacial.png)\n\n▶ [Click here to watch](https://youtu.be/MxVJAvFSTXg)\n\nHey everyone, I recorded a reaction video to an AWS architecture for geospatial data analysis by TCS. Presented by David John and Suryakant from TCS, this platform is used in next-gen agriculture for tasks like crop health, yield, and soil moisture analysis.\n\nThe platform uses data from satellites, AWS open data, and field agents, processing it with Lambda, Sagemaker, and PostgreSQL. 
Data is stored and analyzed in S3 buckets and PostgreSQL, with results made accessible via EKS-deployed UIs on EC2 instances, cached through CloudFront for efficiency.\n\nKey aspects include:\n\n- Lambda functions triggering SageMaker jobs for machine learning.\n- SageMaker handling extensive processing tasks.\n- PostgreSQL and S3 for storing processed data.\n- CloudFront caching data to enhance user experience.\n\nI found the use of parallel SageMaker jobs for scalability and the integration of open data for cost efficiency particularly interesting. This setup effectively meets the agricultural sector's data analysis needs.\n\nCheck out the video and share your thoughts!\n\n\n#### Building a Self-Service Enterprise Data Engineering Platform\n\n![Imagetitle](/images/06/enterprise-solution.png)\n\n▶ [Click here to watch](https://youtu.be/E9JFCl7bk88)\n\nHey everyone, I recorded a reaction video to an AWS architecture for a self-service enterprise data engineering platform by ZS Associates. Presented by David John and Laken from ZS Associates, this platform is designed to streamline data integration, infrastructure provisioning, and data access for life sciences companies.\n\nKey components:\n- **Users and Interaction**: Data engineers and analysts interact through a self-service web portal, selecting infrastructure types and providing project details. This portal makes REST requests to EKS, which creates records in PostgreSQL and triggers infrastructure provisioning via SQS.\n- **Infrastructure Provisioning**: EKS processes SQS messages to provision infrastructure such as EMR clusters, databases in Glue Catalog, S3 buckets, and EC2 instances with containerized services like Airflow or NiFi. IAM roles are configured for access control.\n- **Data Governance and Security**: All data sets are accessed through the Glue Catalog, with governance workflows requiring approval from data owners via SES notifications. 
EKS updates IAM roles and Ranger policies for fine-grained access control.\n- **Scalability and Efficiency**: EKS hosts 100+ microservices supporting workflows and UI portals. The platform handles millions of API requests and hundreds of data access requests monthly, with auto-scaling capabilities to manage costs.\n\nThis architecture effectively reduces time to market, enhances security at scale, and optimizes costs by automating data access and infrastructure provisioning. It also ensures data governance and security through controlled access and approval processes.\n\nCheck out the video and share your thoughts!\n\n\n#### Customer Support Platform\n\n![Imagetitle](/images/06/customer-support.png)\n\n▶ [Click here to watch](https://youtu.be/sCIFpOuryFU)\n\nHey everyone, I recorded a reaction video to an AWS architecture for a personalized customer support platform by Traeger. Presented by David John and Lizzy from Traeger, this system enhances customer support by leveraging data from Shopify, EventBridge, Kinesis Data Firehose, S3, Lambda, DynamoDB, and Amazon Connect.\n\nKey components:\n- **Order Processing**: Customer order data from Shopify flows into EventBridge, then to Kinesis Data Firehose, which writes it to S3. An event trigger in S3 invokes a Lambda function that stores specific order metadata in DynamoDB.\n- **Personalized Customer Support**: When a customer calls, Amazon Connect uses Pinpoint to determine the call's origin, personalizing the language options. Connect triggers a Lambda function to query DynamoDB for customer metadata based on the phone number. 
This data is used to inform the customer support agent.\n- **Reason for Contact**: Amazon Lex bot asks the customer the reason for their call, and this information, along with customer metadata, routes the call to a specialized support queue.\n\nI found the use of DynamoDB for storing customer metadata and the integration with Amazon Connect and Lex for personalized support particularly interesting. The architecture is scalable and ensures a personalized experience for customers.\n\nCheck out the video and share your thoughts!\n\n\n#### League of Legends Data Platform on AWS\n\n![Imagetitle](/images/06/league.jpg)\n\n▶ [Click here to watch](https://youtu.be/FX_ZUJk_WoE)\n\nHey everyone, I recorded a reaction video to an AWS architecture for the data platform that powers League of Legends by Riot Games. Presented by David John and the team at Riot Games, this system handles massive amounts of data generated by millions of players worldwide.\n\nKey components:\n- **Player Interaction**: Players connect to game servers globally. The game client communicates with an API running in EKS. This setup ensures low latency and optimal performance.\n- **Data Ingestion**: The game client and server send data about player interactions to EKS, which flows into MSK (Managed Streaming for Kafka). Local Kafka clusters buffer the data before it’s replicated to regional MSK clusters using MirrorMaker.\n- **Data Processing**: Spark Streaming jobs process the data from MSK and store it in Delta Lake on S3. This setup ensures efficient data handling and reduces latency in data availability.\n- **Data Storage and Access**: Glue serves as the data catalog, managing metadata and permissions. Data consumers, including analysts, designers, engineers, and executives, access this data through Databricks, leveraging Glue for structured queries.\n\nI found the use of MSK and Spark for scalable data ingestion and processing particularly interesting. 
This architecture supports real-time analytics, allowing Riot Games to quickly assess the impact of new patches and gameplay changes.\n\nCheck out the video and share your thoughts!\n\n\n\n#### Platform Connecting 70 Million Cars\n\n![Imagetitle](/images/06/70m-cars.png)\n\n▶ [Click here to watch](https://youtu.be/1nifzmvOGHs)\n\nHey everyone, I recorded a reaction video to an AWS architecture for a connected car platform by Mobileye. Presented by David John and the team at Mobileye, this system connects 70 million cars, collecting and processing data to offer digital services and fleet analysis.\n\nKey components:\n- **Data Collection**: Cars collect anonymized data using sensors and visual inspections, sending it to a REST API and storing it in S3.\n- **Data Processing**: The data is pulled from S3 into SQS and processed by EKS workers, which scale according to the queue size. Processed data is stored back in S3 and further analyzed using step functions and Lambda for tasks like extracting construction zones and clustering observations.\n- **Data Storage**: Processed data is stored in S3, Elasticsearch, and CockroachDB. Elasticsearch handles document-based data with self-indexing, while CockroachDB supports frequent updates.\n- **Data Consumption**: EKS hosts a secured REST API and web application, allowing customers like city planners to access insights on pedestrian and bicycle traffic.\n\nFuture plans include enabling cloud image processing on EKS with GPU instances and focusing on cost reduction as data flow increases.\n\nI found the use of EKS for scalable data processing and the combination of Elasticsearch and CockroachDB for different data needs particularly interesting. 
This architecture efficiently handles large-scale data from millions of connected cars.\n\nCheck out the video and share your thoughts!\n\n\n#### 55TB A Day: Nielsen AWS Data Architecture\n\n![Imagetitle](/images/06/55-tb.png)\n\n▶ [Click here to watch](https://youtu.be/WCQe1VP_q5A)\n\nHey everyone, I recorded a reaction video to an AWS architecture for Nielsen Marketing Cloud, which processes 55TB of data daily. Presented by David John, this system handles marketing segmentation data for campaigns.\n\nKey components:\n- **Data Ingestion**: Marketing data comes in files, written to S3. Spark on EMR processes and transforms the data, writing the output to another S3 bucket.\n- **Data Processing**: Lambda functions handle the final formatting and upload the data to over 100 ad networks. Metadata about file processing is managed in a PostgreSQL RDS database.\n- **Metadata Management**: A work manager Lambda reads metadata from RDS, triggers processing jobs in EMR, and updates the metadata post-processing.\n- **Scaling and Rate Limiting**: The serverless architecture allows automatic scaling. However, rate limiting is implemented to prevent overloading ad networks, ensuring they handle data bursts smoothly.\n\nChallenges and Solutions:\n- **Scale**: The system handles 250 billion events per day, scaling up and down automatically to manage peak loads.\n- **Rate Limiting**: To avoid overwhelming ad networks, a rate-limiting mechanism was introduced, managing data flow based on network capacity.\n- **Back Pressure Management**: SQS is used to buffer Lambda responses, preventing direct overload on the PostgreSQL database.\n\nI found the use of SQS for metadata management and the serverless architecture for handling massive data loads particularly interesting. 
This setup ensures efficient data processing and smooth delivery to ad networks.\n\nCheck out the video and share your thoughts!\n\n\n#### Orange Theory Fitness\n\n![Image](/images/06/fitness-1.jpeg)\n\n▶ [Click here to watch](https://youtu.be/ssaXRo5s1r4)\n\nHey, everybody! Today, I'm reacting to the AWS data infrastructure at Orange Theory Fitness, where they collect data from wristbands and training machines. Let's dive in and see how they manage it all.\n\n##### Key Components\n\n1. **Local Server**: Aggregates data from in-studio equipment and mobile apps, ensuring resiliency if the cloud connection is lost.\n2. **API Gateway and Cognito**: Handle authentication and route data to the cloud.\n3. **Lambda Functions**: Process data.\n4. **Aurora RDS (MySQL)**: Stores structured data like member profiles, class bookings, and studio information.\n5. **DynamoDB**: Stores performance metrics and workout statistics for quick access.\n6. **S3**: Serves as a data lake, storing telemetry data.\n7. **Kinesis Firehose**: Streams telemetry data to S3.\n\n##### Challenges & Solutions\n\n1. **Resiliency**\n   - **Challenge**: Ensure operations continue if the cloud connection is lost.\n   - **Solution**: Local server aggregates data and syncs with the cloud once the connection is restored.\n\n2. **Data Integration**\n   - **Challenge**: Integrate data from various sources.\n   - **Solution**: Use API Gateway and Cognito for unified authentication and data routing.\n\n3. 
**Data Processing**\n   - **Challenge**: Efficiently process and store different types of data.\n   - **Solution**: Use Lambda for processing, Aurora RDS for structured data, DynamoDB for quick access to performance metrics, and Kinesis Firehose with S3 for streaming and storing large volumes of telemetry data.\n\nThis architecture leverages AWS tools for scalability, flexibility, and resilience, making it an excellent example of a well-thought-out data infrastructure for a fitness application.\n\nLet me know your thoughts in the comments. What do you think of this architecture? Would you have done anything differently? If you have any questions, feel free to ask. And if you're interested in learning more about data engineering, check out my academy at learndataengineering.com. See you in the next video!\n\n\n## More Details\nAWS Whitepapers:\n\n[https://d1.awsstatic.com/whitepapers/aws-overview.pdf](https://d1.awsstatic.com/whitepapers/aws-overview.pdf)\n\n\n# Azure\n## Connect\n- Event Hub\n- IoT Hub\n## Buffer\n- Data Factory\n- Event Hub\n- RedisCache (also Store)\n## Processing\n- Stream Analytics Service\n- Azure Databricks\n- Machine Learning\n- Azure Functions\n- Azure HDInsight (Hadoop PaaS)\n## Store\n- Blob\n- CosmosDB\n- MariaDB\n- MySQL\n- PostgreSQL\n- SQL\n- Azure Data lake\n- Azure Storage (SQL Table?)\n- Azure Synapse Analytics\n## Visualize\n- PowerBI\n## Containerization\n- Virtual Machines\n- Virtual Machine Scale Sets\n- Azure Container Service (AKS)\n- Container Instances\n- Azure Kubernetes Service\n## Best Practices\n\nAdvanced Analytics Architecture:\n\n[https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/advanced-analytics-on-big-data](https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/advanced-analytics-on-big-data)\n\nAnomaly Detection in Real-time Data 
Streams:\n\n[https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/anomaly-detection-in-real-time-data-streams](https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/anomaly-detection-in-real-time-data-streams)\n\nModern Data Warehouse Architecture:\n\n[https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/modern-data-warehouse](https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/modern-data-warehouse)\n\nCI/CD for Containers:\n\n[https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/cicd-for-containers](https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/cicd-for-containers)\n\nReal Time Analytics on Big Data Architecture:\n\n[https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/real-time-analytics](https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/real-time-analytics)\n\nIoT Architecture – Azure IoT Subsystems:\n\n[https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/azure-iot-subsystems](https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/azure-iot-subsystems)\n\nTier Applications & Data for Analytics:\n\n[https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/tiered-data-for-analytics](https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/tiered-data-for-analytics)\n\nExtract, transform, and load (ETL) using 
HDInsight:\n\n[https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/extract-transform-and-load-using-hdinsight](https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/extract-transform-and-load-using-hdinsight)\n\nIoT using Cosmos DB:\n\n[https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/iot-using-cosmos-db](https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/iot-using-cosmos-db)\n\nStreaming using HDInsight:\n\n[https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/streaming-using-hdinsight](https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/streaming-using-hdinsight)\n\n# GCP\n## Connect\n- Cloud IoT Core\n- App Engine\n- Cloud Dataflow\n## Buffer\n- Pub/Sub\n## Processing\n- Compute Engine\n- Cloud Functions\n- Specialized tools:\n  - Cloud Dataflow\n  - Cloud Dataproc\n  - Cloud Datalab\n  - Cloud Dataprep\n  - Cloud Composer\n- App Engine\n## Store\n- Cloud Storage\n- Cloud SQL\n- Cloud Spanner\n- Cloud Datastore\n- Cloud Bigtable\n- Cloud Memorystore\n- BigQuery\n## Visualize\n\n## Containerization\n- Kubernetes Engine\n- Container Security\n## Best Practices\n\nThanks to Ismail Holoubi for the following GCP links:\n\nBest practices for migrating virtual machines to Compute Engine:\n\nhttps://cloud.google.com/solutions/best-practices-migrating-vm-to-compute-engine\n\nBest practices for Cloud Storage:\n\nhttps://cloud.google.com/storage/docs/best-practices\n\nMoving a publishing workflow to BigQuery for new data insights:\n\nhttps://cloud.google.com/blog/products/data-analytics/moving-a-publishing-workflow-to-bigquery-for-new-data-insights\n\nArchitecture: Optimizing large-scale ingestion of analytics events and logs:\n\nhttps://cloud.google.com/solutions/architecture/optimized-large-scale-analytics-ingestion\n\nChoosing the right architecture for global data 
distribution:\n\nhttps://cloud.google.com/solutions/architecture/global-data-distribution\n\nBest Practices for Operating Containers:\n\nhttps://cloud.google.com/solutions/best-practices-for-operating-containers\n\n\nAutomating IoT Machine Learning: Bridging Cloud and Device Benefits with AI Platform:\n\nhttps://cloud.google.com/solutions/automating-iot-machine-learning\n"
  },
  {
    "path": "sections/07-DataSources.md",
    "content": "\n100 Plus Data Sources Data Science\n===================================\n\nThis is a section with links to data sources. During my data engineer coaching we need to find good data sets to work with.\nSo, I started this section to make it easier to find good sources.\n\nI've taken these links from articles and blog posts. Why not only link the articles?\nYou know, these posts can go away at any time. I want to keep the links to the platforms either way.\n\nI haven't had the chance to check each link myself. Please let me know if something isn't right.\n\nYou can find the articles on the bottom of this section to read more. They include even more data sources I haven't had time to add to this list.\n\n\n\n## Contents\n\n- [Student Favorites](07-DataSources.md#Student-Favorites)\n- [Content Marketing](07-DataSources.md#Content-Marketing)\n- [Crime](07-DataSources.md#Crime)\n- [Drugs](07-DataSources.md#Drugs)\n- [Education](07-DataSources.md#Education)\n- [Entertainment](07-DataSources.md#Entertainment)\n- [Environmental And Weather Data](07-DataSources.md#Environmental-And-Weather-Data)\n- [Financial And Economic Data](07-DataSources.md#Financial-And-Economic-Data])\n- [General And Academic](07-DataSources.md#General-And-Academic)\n- [Government And World](07-DataSources.md#Government-And-World)\n- [Health](07-DataSources.md#Health)\n- [Human Rights](07-DataSources.md#Human-Rights)\n- [Labor And Employment Data](07-DataSources.md#Labor-And-Employment-Data)\n- [Politics](07-DataSources.md#Politics)\n- [Retail](07-DataSources.md#Retail)\n- [Social](07-DataSources.md#Social)\n- [Source Articles and Blog Posts](07-DataSources.md#Source-Articles-and-Blog-Posts)\n- [Travel And Transportation](07-DataSources.md#Travel-And-Transportation)\n- [Various Portals](07-DataSources.md#Various-Portals)\n\n## Student Favorites\n\nIn my Coaching program my students learn by doing a project. 
And the foundation of every project is selecting a dataset.\nThat can be an API or a file source, depending a lot on the student's goals and interests.\n\nFrom there, students work out goals for the dataset, figure out the data modeling, create the architecture and build it.\nIt's a fun way to learn and get better at Data Engineering.\n\nHere's a list of my students' favorite datasets and APIs.\n\nLearn more about the Coaching program: [click here](https://learndataengineering.com/p/data-engineering-coaching)\n\n### Datasets\n\n- [Fraud detection](https://www.kaggle.com/datasets/kartik2112/fraud-detection)\n- [Industrial equipment monitoring](https://www.kaggle.com/datasets/dnkumars/industrial-equipment-monitoring-dataset)\n- [Energy demand & generation](https://www.kaggle.com/datasets/nicholasjhana/energy-consumption-generation-prices-and-weather?select=weather_features.csv)\n- [Online Retail](https://www.kaggle.com/datasets/tunguz/online-retail)\n- [Brazilian E-commerce](https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce)\n- [Beijing Air Quality](https://www.kaggle.com/datasets/sid321axn/beijing-multisite-airquality-data-set)\n- [NYC Taxi](https://www.kaggle.com/datasets/diishasiing/revenue-for-cab-drivers)\n\n### APIs\n\n- [Bike sharing Bluebikes](https://bluebikes.com/system-data)\n- [Bike sharing Divvy Bikes](https://divvybikes.com/system-data)\n- [Weather API](https://www.weatherapi.com/docs/)\n- [Bluesky API](https://docs.bsky.app/docs/advanced-guides/api-directory)\n- [Guardian news API](https://open-platform.theguardian.com/)\n- [Football API](https://www.api-football.com/)\n\n\n## General And Academic\n\n- [Amazon Public Data Sets](https://registry.opendata.aws/)\n- [Datasets Subreddit](https://www.reddit.com/r/datasets)\n- [Enigma Public](https://public.enigma.com/)\n- [FiveThirtyEight](http://fivethirtyeight.com/)\n- [Google Scholar](http://scholar.google.com/)\n- [Pew Research](http://www.pewresearch.org/)\n- [The Upshot by New York 
Times](http://www.nytimes.com/section/upshot)\n- [UNData](http://data.un.org/)\n\n## Content Marketing\n\n- [Buffer](https://blog.bufferapp.com/)\n- [Content Marketing Institute](http://contentmarketinginstitute.com/about/)\n- [HubSpot](http://www.hubspot.com/marketing-statistics)\n- [Moz](https://moz.com/blog)\n\n## Crime\n\n- [Bureau of Justice Statistics](http://www.bjs.gov/index.cfm?ty=dca)\n- [FBI Crime Statistics](https://www.fbi.gov/stats-services/crimestats)\n- [National Archive of Criminal Justice Data](https://www.icpsr.umich.edu/icpsrweb/NACJD/)\n- [Uniform Crime Reporting Statistics](https://crime-data-explorer.fr.cloud.gov/)\n\n## Drugs\n\n- [Drug Data and Database by First Databank](http://www.fdbhealth.com/)\n- [Drug War Facts](http://www.drugwarfacts.org/)\n- [National Institute on Drug Abuse](https://www.drugabuse.gov/related-topics/trends-statistics)\n- [U.S. Food and Drug Administration](http://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm)\n- [United Nations Office on Drugs and Crime](https://www.unodc.org/unodc/en/data-and-analysis/)\n\n## Education\n\n- [Education Data by the World Bank](http://data.worldbank.org/topic/education)\n- [Education Data by Unicef](http://data.unicef.org/education/overview.html)\n- [National Center for Education Statistics](https://nces.ed.gov/)\n\n## Entertainment\n\n- [Academic Rights Press](http://www.academicrightspress.com/entertainment/music)\n- [BFI Film Forever](http://www.bfi.org.uk/education-research/film-industry-statistics-research)\n- [BLS: Arts, Entertainment, and Recreation](http://www.bls.gov/iag/tgs/iag71.htm)\n- [IFPI](http://www.ifpi.org/global-statistics.php)\n- [Million Song Dataset](https://aws.amazon.com/datasets/million-song-dataset/)\n- [Statista: Film Industry](http://www.statista.com/topics/964/film/)\n- [Statista: Music Industry](http://www.statista.com/topics/1639/music/)\n- [Statista: Video Game Industry](http://www.statista.com/topics/868/video-games/)\n- [The 
Numbers](http://www.the-numbers.com/)\n\n## Environmental And Weather Data\n\n- [Environmental Protection Agency](https://www.epa.gov/data)\n- [International Energy Agency Atlas](https://www.iea.org/data-and-statistics?country=WORLD&fuel=Energy%20supply&indicator=TPESbySource)\n- [National Center for Environmental Health](http://www.cdc.gov/nceh/data.htm)\n- [National Climatic Data Center](http://www.ncdc.noaa.gov/data-access/quick-links#loc-clim)\n- [National Weather Service](http://www.weather.gov/help-past-weather)\n- [Weather Underground](https://www.wunderground.com/)\n- [WeatherBase](http://www.weatherbase.com/)\n\n## Financial And Economic Data\n\n- [Federal Reserve Economic Database](https://fred.stlouisfed.org/)\n- [Financial Data Finder at OSU](./) - Missing link.\n- [Global Financial Data](https://www.globalfinancialdata.com/index.html)\n- [Google Finance](https://www.google.com/finance)\n- [Google Public Data Explorer](http://www.google.com/publicdata/directory)\n- [IMF Economic Data](https://data.imf.org/?sk=388dfa60-1d26-4ade-b505-a05a558d9a42)\n- [National Bureau of Economic Research](http://www.nber.org/data/)\n- [OpenCorporates](https://opencorporates.com/)\n- [The Atlas of Economic Complexity](http://atlas.cid.harvard.edu/)\n- [U.S. Bureau of Economic Analysis](http://www.bea.gov/)\n- [U.S. 
Securities and Exchange Commission](https://www.sec.gov/dera/data/financial-statement-data-sets.html)\n- [UN Comtrade Database](https://comtrade.un.org/labs/)\n- [Visualizing Economics](http://visualizingeconomics.com/)\n- [World Bank Doing Business Database](http://www.doingbusiness.org/rankings)\n- [World Bank Open Data](http://data.worldbank.org/)\n\n## Government And World\n\n- [Data.gov](http://www.data.gov/)\n- [European Union Open Data Portal](http://data.europa.eu/euodp/en/data/)\n- [Gapminder](https://www.gapminder.org/data/)\n- [Land Matrix (Transnational Land Database)](http://landmatrix.org/en/)\n- [OECD Aid Database](http://www.oecd.org/dac/financing-sustainable-development/development-finance-data/)\n- [Open Data Network](http://www.opendatanetwork.com/)\n- [The CIA World Factbook](https://www.cia.gov/the-world-factbook/)\n- [The World Bank’s World Development Indicators](http://data.worldbank.org/data-catalog/world-development-indicators)\n- [U.S. Census Bureau](http://www.census.gov/)\n- [UNDP’s Human Development Index](http://hdr.undp.org/en/data)\n\n## Health\n\n- [America’s Health Rankings](http://www.americashealthrankings.org/)\n- [Centers for Disease Control and Prevention](http://www.cdc.gov/datastatistics/)\n- [Health & Social Care Information Centre](http://www.hscic.gov.uk/home)\n- [Health Services Research Information Central](https://www.nlm.nih.gov/hsrinfo/datasites.html)\n- [HealthData.gov](https://www.healthdata.gov/)\n- [Medicare Hospital Quality](https://data.medicare.gov/data/hospital-compare#)\n- [MedlinePlus](https://www.nlm.nih.gov/medlineplus/healthstatistics.html)\n- [National Center for Health Statistics](http://www.cdc.gov/nchs/)\n- [SEER Cancer Incidence](http://seer.cancer.gov/faststats/selections.php?series=cancer)\n- [World Health Organization](http://www.who.int/en/)\n\n## Human Rights\n\n- [Amnesty International](https://www.amnesty.org/en/search/?q=&documentType=Annual+Report)\n- [Human Rights Data Analysis 
Group](https://hrdag.org/)\n- [The Armed Conflict Database by Uppsala University](http://www.pcr.uu.se/research/UCDP/)\n\n## Labor And Employment Data\n\n- [Bureau of Labor Statistics](http://www.bls.gov/)\n- [Department of Labor](https://www.dol.gov/general/topic/statistics/employment)\n- [Employment by U.S. Census](http://www.census.gov/topics/employment.html)\n- [U.S. Small Business Administration](https://www.sba.gov/starting-business/how-start-business/business-data-statistics/employment-statistics)\n\n## Politics\n\n- [California Field Poll](http://dlab.berkeley.edu/data-resources/california-polls)\n- [Crowdpac](https://www.crowdpac.com/)\n- [Gallup](http://www.gallup.com/home.aspx)\n- [Open Secrets](https://www.opensecrets.org/)\n- [Rand State Statistics](http://www.randstatestats.org/us/)\n- [Real Clear Politics](http://guides.lib.berkeley.edu/Intro-to-Political-Science-Research/Stats)\n- [Roper Center for Public Opinion Research](https://ropercenter.cornell.edu/)\n- [US Voter Files](http://voterlist.electproject.org/) Note only some states are free, and most do not allow voter files to be used for commercial purposes - this map allows you to see the rules/cost for each state.\n\n## Retail\n\n- [Love the Sales](https://www.lovethesales.com/press/data-request)\n\n## Social\n\n- [Facebook Graph API](https://developers.facebook.com/docs/graph-api)\n- [Google Trends](http://www.google.com/trends/explore)\n- [SocialMention](./) - Missing link.\n\n## Travel And Transportation\n\n- [Bureau of Transportation Statistics](https://www.bts.gov/browse-statistical-products-and-data)\n- [Monthly Tourism Statistics – U.S. Travelers Overseas](http://travel.trade.gov/research/monthly/departures/)\n- [Search the World](http://www.geoba.se/)\n- [SkiftStats](https://skift.com/skiftx/skiftstats/)\n- [U.S. 
Travel Association](https://www.ustravel.org/research)\n\n## Various Portals\n\n- [CKAN](https://ckan.org/)\n- [Dataverse](https://dataverse.org/)\n- [DBpedia](https://wiki.dbpedia.org/)\n- [freeCodeCamp Open Data](https://github.com/freeCodeCamp/open-data)\n- [Kaggle](https://www.kaggle.com/datasets)\n- [LODUM](https://lodum.de/)\n- [Open Data Impact Map](http://opendataimpactmap.org/)\n- [Open Data Kit](https://opendatakit.org/)\n- [Open Data Monitor](https://opendatamonitor.eu/frontend/web/index.php?r=dashboard%2Findex)\n- [Plenar.io](http://plenar.io/)\n- [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php)\n- [Yelp Open Datasets](https://www.yelp.com/dataset)\n\n\n## Source Articles and Blog Posts\n\n\n- [100+ of the Best Free Data Sources For Your Next Project](https://www.columnfivemedia.com/100-best-free-data-sources-infographic)\n- [15 Great Free Data Sources for 2016](https://medium.com/@Infogram/15-great-free-data-sources-for-2016-25cb455db257)\n- [20 Awesome Sources of Free Data](https://www.searchenginejournal.com/free-data-sources/302601/#close)\n- [30+ Free Data Sources Every Company Should Be Aware Of](https://www.bernardmarr.com/default.asp?contentID=960)\n- [50 Amazing Free Data Sources You Should Know](https://infogram.com/blog/free-data-sources/)\n- [50 Best Open Data Sources Ready to be Used Right Now](https://learn.g2.com/open-data-sources)\n- [70 Amazing Free Data Sources You Should Know](https://www.kdnuggets.com/2017/12/big-data-free-sources.html)\n- [Big Data: 33 Brilliant And Free Data Sources Anyone Can Use](https://www.forbes.com/sites/bernardmarr/2016/02/12/big-data-35-brilliant-and-free-data-sources-for-2016/#527557ffb54d)\n- [These Are The Best Free Open Data Sources Anyone Can Use](https://www.freecodecamp.org/news/https-medium-freecodecamp-org-best-free-open-data-sources-anyone-can-use-a65b514b0f2d/)\n"
  },
  {
    "path": "sections/08-InterviewQuestions.md",
    "content": "1001 Data Engineering Interview Questions\n=========================================\n\nHey everyone, this collection of questions and answers is a work in progress.\nI'm going to keep adding Q&As, but you are invited to collaborate through [GitHub](https://github.com/andkret/Cookbook):\n- Eiter clone this repo, make your changes and create a pull request\n- or raise an issue on GitHub with your questions and answers and we'll add them\n\nAndreas\n\n## Contents:\n\n- [Python](10-InterviewQuestions.md#python)\n- [SQL](10-InterviewQuestions.md#sql)\n- [Integrate](10-InterviewQuestions.md#integrate)\n    - [APIs](10-InterviewQuestions.md#apis)\n- [Message Queues](10-InterviewQuestions.md#message-queues)\n    - [Distributed Message Queues](10-InterviewQuestions.md#distributed-message-queues)\n    - [Message Queues (Fifo)](10-InterviewQuestions.md#integrate)\n    - [Caches](10-InterviewQuestions.md#caches)\n- [Data Processing](10-InterviewQuestions.md#data-processing)\n    - [ETL](10-InterviewQuestions.md#etl)\n    - [Stream Processing](10-InterviewQuestions.md#stream-processing)\n    - [Batch Processing](10-InterviewQuestions.md#batch-processing)\n    - [Processing Frameworks](10-InterviewQuestions.md#processing-frameworks)\n        - [Serverless](10-InterviewQuestions.md#serverless)\n        - [Distributed Processing Frameworks](10-InterviewQuestions.md#distributed-processing-frameworks)\n    - [Scheduling](10-InterviewQuestions.md#scheduling)\n        - [Airflow](10-InterviewQuestions.md#airflow)\n    - [CI-CD](10-InterviewQuestions.md#ci-cd)\n    - [Docker](10-InterviewQuestions.md#docker)\n    - [Kubernetes](10-InterviewQuestions.md#kubernetes)\n- [Data Storage](10-InterviewQuestions.md#data-storage)\n    - [Relational Databases](10-InterviewQuestions.md#relational-databases)\n    - [NoSQL](10-InterviewQuestions.md#nosql)\n    - [Analytical Stores](10-InterviewQuestions.md#analytical-stores)\n    - [Relational 
Modeling](10-InterviewQuestions.md#relational-modeling)\n    - [Dimensional Data Modeling](10-InterviewQuestions.md#dimensional-modeling)\n    - [Data Lakes](10-InterviewQuestions.md#data-lakes)\n- [Data Platforms](10-InterviewQuestions.md#data-platforms)\n    - [AWS](10-InterviewQuestions.md#aws)\n    - [Azure](10-InterviewQuestions.md#azure)\n    - [GCP](10-InterviewQuestions.md#gcp)\n    - [Snowflake](10-InterviewQuestions.md#snowflake)\n\n\n### Python\n\n1. **What is Apache Spark, and how can you use it with Python?**\n   - **Answer**: Apache Spark is a distributed data processing framework that allows for big data processing with in-memory computing capabilities. You can use it with Python through PySpark, which provides a Python API for Spark. PySpark enables data engineers to write Spark applications in Python.\n\n2. **How do you perform data cleaning in Python?**\n   - **Answer**: Data cleaning in Python can be performed using the `pandas` library. Common tasks include handling missing values (`dropna`, `fillna`), removing duplicates (`drop_duplicates`), converting data types, normalizing data, and handling outliers. Example:\n     ```python\n     import pandas as pd\n     df = pd.read_csv('data.csv')\n     df.dropna(inplace=True)  # Remove rows with missing values\n     df['column'] = df['column'].astype(int)  # Convert column to integer type\n     ```\n\n3. 
**Explain how you would optimize a slow-running SQL query within a Python ETL pipeline.**\n   - **Answer**: To optimize a slow-running SQL query, you can:\n     - Analyze the query execution plan.\n     - Add appropriate indexes.\n     - Optimize the query by reducing complexity, such as using JOINs efficiently and avoiding unnecessary subqueries.\n     - Partition large tables if applicable.\n     - Use caching and materialized views for frequently accessed data.\n     - Ensure that statistics are up to date.\n     Example with SQLAlchemy:\n     ```python\n     from sqlalchemy import create_engine, text\n     engine = create_engine('postgresql://user:password@localhost/dbname')\n     with engine.connect() as connection:\n         # Raw SQL must be wrapped in text() in SQLAlchemy 2.x; the query itself is a placeholder\n         result = connection.execute(text('SELECT * FROM table WHERE condition'))\n         data = result.fetchall()\n     ```\n\n4. **What is the role of a workflow scheduler in data engineering, and can you name some common ones?**\n   - **Answer**: A workflow scheduler automates and manages the execution of ETL jobs and data pipelines. It ensures tasks are executed in the correct order and handles retries, dependencies, and monitoring. Common workflow schedulers include Apache Airflow, Luigi, Prefect, and Apache NiFi.\n\n5. **How do you handle schema changes in a data pipeline?**\n   - **Answer**: Handling schema changes in a data pipeline involves:\n     - Implementing schema evolution techniques.\n     - Using tools like Apache Avro, which supports schema evolution.\n     - Versioning schemas and ensuring backward compatibility.\n     - Monitoring and validating incoming data against the schema.\n     - Applying transformations to adapt to new schemas.\n     Example with Avro:\n     ```python\n     from avro.datafile import DataFileReader\n     from avro.io import DatumReader\n\n     reader = DataFileReader(open(\"data.avro\", \"rb\"), DatumReader())\n     for record in reader:\n         print(record)\n     reader.close()\n     ```\n\n6. 
**What is data partitioning, and why is it important in data engineering?**\n   - **Answer**: Data partitioning is the process of dividing a large dataset into smaller, more manageable pieces, often based on a key such as date, user ID, or geographic location. Partitioning improves query performance by reducing the amount of data scanned and allows for parallel processing. It also helps in managing large datasets and reducing I/O costs.\n\n7. **How do you ensure data quality in your pipelines?**\n   - **Answer**: Ensuring data quality involves:\n     - Implementing data validation checks (e.g., constraints, data type checks).\n     - Monitoring for data anomalies and inconsistencies.\n     - Using data profiling tools to understand the data.\n     - Creating unit tests for data processing logic.\n     - Automating data quality checks and alerting mechanisms.\n     Example with `pandas` for data validation:\n     ```python\n     import pandas as pd\n\n     df = pd.read_csv('data.csv')\n     assert df['column'].notnull().all(), \"Missing values found in column\"\n     assert (df['age'] >= 0).all(), \"Negative ages found\"\n     ```\n\n8. **What is the difference between batch processing and stream processing?**\n   - **Answer**: Batch processing involves processing large volumes of data at once, usually at scheduled intervals. It is suitable for tasks that are not time-sensitive. Stream processing, on the other hand, involves processing data in real-time as it arrives, which is suitable for time-sensitive applications such as real-time analytics, monitoring, and alerts.\n\n9. 
**How do you implement logging and monitoring in your data pipelines?**\n   - **Answer**: Logging and monitoring can be implemented using:\n     - Logging libraries like Python's `logging` module to capture and store logs.\n     - Monitoring tools like Prometheus, Grafana, or ELK Stack (Elasticsearch, Logstash, Kibana) to visualize and monitor logs.\n     - Setting up alerts for failures or anomalies.\n     Example with Python's `logging` module:\n     ```python\n     import logging\n\n     logging.basicConfig(filename='pipeline.log', level=logging.INFO)\n     logging.info('This is an informational message')\n     logging.error('This is an error message')\n     ```\n\n10. **What are some common challenges you face with distributed data processing, and how do you address them?**\n    - **Answer**: Common challenges with distributed data processing include data consistency, fault tolerance, data shuffling, and latency. To address these:\n      - Use distributed processing frameworks like Apache Spark, which handle many of these issues internally.\n      - Implement robust error handling and retries.\n      - Optimize data shuffling by partitioning data effectively.\n      - Use caching mechanisms to reduce latency.\n      - Ensure proper resource allocation and scaling to handle large data volumes.\n\n## SQL\n\n## Integrate\n### APIs\n\nThese questions cover a range of topics related to APIs, including their concepts, security, best practices, and specific implementation details.\n\n1. **What is an API and how does it work?**\n   - **Answer**: An API (Application Programming Interface) is a set of rules and protocols for building and interacting with software applications. It allows different software systems to communicate with each other. APIs define the methods and data formats that applications can use to request and exchange data.\n\n2. 
**What are the different types of APIs?**\n   - **Answer**: The main types of APIs include:\n     - **Open APIs (Public APIs)**: Available to developers and other users with minimal restrictions.\n     - **Internal APIs (Private APIs)**: Used within an organization to connect systems and data internally.\n     - **Partner APIs**: Shared with specific business partners and offer more control over how data is exposed.\n     - **Composite APIs**: Combine multiple API requests into a single call, allowing multiple data or service requests in one API call.\n\n3. **What is REST and how does it differ from SOAP?**\n   - **Answer**: REST (Representational State Transfer) and SOAP (Simple Object Access Protocol) are two different approaches to building APIs. REST uses standard HTTP methods (GET, POST, PUT, DELETE) and is stateless, meaning each request from a client to a server must contain all the information needed to understand and process the request. SOAP, on the other hand, is a protocol that relies on XML-based messaging and includes built-in rules for security and transactions.\n\n4. **Explain the concept of RESTful services.**\n   - **Answer**: RESTful services are web services that follow the principles of REST. These principles include:\n     - **Statelessness**: Each request from a client must contain all the information needed by the server to process the request.\n     - **Client-Server Architecture**: The client and server are separate entities, and they communicate over a network via standard HTTP.\n     - **Cacheability**: Responses from the server can be cached by the client or intermediate proxies to improve performance.\n     - **Uniform Interface**: Resources are identified in the request (usually via URIs), and actions are performed using standard HTTP methods.\n\n5. **What is an API gateway and why is it used?**\n   - **Answer**: An API gateway is a server that acts as an intermediary for requests from clients seeking resources from backend services. 
It provides various functions such as request routing, composition, protocol translation, and handling of cross-cutting concerns like authentication, authorization, logging, monitoring, and rate limiting. It simplifies the client interface and improves security, scalability, and manageability of API services.\n\n6. **How do you ensure the security of an API?**\n   - **Answer**: Ensuring API security involves several practices, including:\n     - **Authentication**: Verify the identity of the user or system making the request (e.g., using OAuth, JWT).\n     - **Authorization**: Ensure the authenticated user or system has permission to perform the requested action.\n     - **Encryption**: Use HTTPS to encrypt data in transit between the client and server.\n     - **Rate Limiting**: Prevent abuse by limiting the number of requests a client can make in a given time period.\n     - **Input Validation**: Validate and sanitize all inputs to prevent injection attacks.\n     - **Logging and Monitoring**: Track API usage and monitor for unusual or suspicious activity.\n\n7. **What is versioning in APIs and how is it typically managed?**\n   - **Answer**: API versioning is the practice of managing changes to an API without disrupting existing clients. It can be managed in several ways, including:\n     - **URI Versioning**: Including the version number in the URI path (e.g., `/v1/resource`).\n     - **Query Parameter Versioning**: Including the version number as a query parameter (e.g., `/resource?version=1`).\n     - **Header Versioning**: Including the version number in the HTTP headers (e.g., `Accept: application/vnd.example.v1+json`).\n\n8. **What are HTTP status codes and why are they important in API responses?**\n   - **Answer**: HTTP status codes are standardized codes returned by a server to indicate the result of a client's request. They are important because they provide meaningful feedback to the client about what happened with their request. 
Common status codes include:\n     - **200 OK**: The request was successful.\n     - **201 Created**: A resource was successfully created.\n     - **400 Bad Request**: The request was invalid or cannot be processed.\n     - **401 Unauthorized**: Authentication is required and has failed or has not yet been provided.\n     - **404 Not Found**: The requested resource could not be found.\n     - **500 Internal Server Error**: An error occurred on the server.\n\n9. **Explain the concept of idempotency in RESTful APIs.**\n   - **Answer**: Idempotency refers to the property of certain operations whereby performing the same operation multiple times results in the same outcome. In RESTful APIs, methods like GET, PUT, and DELETE are idempotent because making the same request multiple times has the same effect as making it once. POST is not idempotent because multiple requests could create multiple resources.\n\n10. **How do you handle pagination in APIs?**\n    - **Answer**: Pagination is used to split large sets of data into manageable chunks. Common methods for handling pagination include:\n      - **Offset and Limit**: Using query parameters to specify the starting point and number of records to return (e.g., `?offset=0&limit=10`).\n      - **Page Number and Size**: Using query parameters to specify the page number and the number of records per page (e.g., `?page=1&size=10`).\n      - **Cursor-Based Pagination**: Using a cursor (a pointer to a specific record) to fetch the next set of results (e.g., `?cursor=abc123`).\n\n\nThese additional questions cover more advanced topics related to APIs, including security, design principles, best practices, and tooling.\n11. **What is the difference between synchronous and asynchronous API calls?**\n    - **Answer**: Synchronous API calls wait for the response before continuing, blocking the execution of code until the operation completes. 
Asynchronous API calls, on the other hand, do not block the execution; they allow the code to continue running and handle the response once it arrives, typically through callbacks, promises, or async/await patterns.\n\n12. **What is a webhook, and how does it differ from an API endpoint?**\n    - **Answer**: A webhook is a way for an application to provide other applications with real-time information. A webhook is a \"callback\" that allows the sending application to push data to the receiving application when an event occurs. Unlike traditional API endpoints, which require the client to periodically check for data (polling), webhooks enable the server to push data to the client when an event occurs.\n\n13. **What is CORS, and why is it important in the context of APIs?**\n    - **Answer**: CORS (Cross-Origin Resource Sharing) is a security feature implemented in web browsers that restricts web pages from making requests to a different domain than the one that served the web page. It is important in APIs to control how resources on a server are accessed by external domains. Proper CORS configuration ensures that only authorized domains can access API resources.\n\n14. **What is the purpose of API documentation, and what should it include?**\n    - **Answer**: API documentation provides developers with the information they need to use and integrate with an API effectively. It should include:\n      - An overview of the API and its purpose.\n      - Authentication and authorization methods.\n      - Endpoint definitions and available methods (GET, POST, PUT, DELETE).\n      - Request and response formats (including headers, query parameters, and body data).\n      - Error codes and their meanings.\n      - Examples of requests and responses.\n      - Rate limits and usage policies.\n\n15. **What are API gateways, and what role do they play in API management?**\n    - **Answer**: API gateways act as intermediaries between clients and backend services. 
They provide various functions such as request routing, load balancing, security (authentication and authorization), rate limiting, logging, monitoring, and transforming requests and responses. API gateways simplify client interactions with microservices and help manage and secure APIs.\n\n16. **How do you handle authentication and authorization in APIs?**\n    - **Answer**: Authentication verifies the identity of a user or application, while authorization determines what resources and operations they have access to. Common methods for handling authentication and authorization in APIs include:\n      - API keys: Simple tokens provided to access the API.\n      - OAuth: An open standard for token-based authentication and authorization.\n      - JWT (JSON Web Tokens): A compact, URL-safe means of representing claims to be transferred between two parties.\n      - Basic Auth: A simple method using a username and password encoded in base64.\n\n17. **What is the concept of rate limiting in APIs, and why is it important?**\n    - **Answer**: Rate limiting controls the number of requests a client can make to an API within a specified time period. It is important for:\n      - Preventing abuse and overuse of API resources.\n      - Ensuring fair usage among clients.\n      - Protecting the backend services from being overwhelmed.\n      - Managing and maintaining service quality and performance.\n\n18. **Explain the concept of API throttling.**\n    - **Answer**: API throttling is the process of controlling the usage rate of an API by limiting the number of requests a client can make within a certain timeframe. Throttling helps prevent abuse, protects resources, and ensures that the service remains available and responsive to all users. It can be implemented using techniques such as rate limits, quotas, and burst control.\n\n19. 
**What is HATEOAS and how does it relate to RESTful APIs?**\n    - **Answer**: HATEOAS (Hypermedia As The Engine Of Application State) is a constraint of RESTful APIs where hypermedia links are included in the responses to guide clients through the API. It allows clients to dynamically discover available actions and navigate the API without hardcoding the structure. For example, a response to a GET request for a user resource might include links to update or delete the user.\n\n20. **What are some common tools and platforms for testing and documenting APIs?**\n    - **Answer**: Common tools and platforms for testing and documenting APIs include:\n      - **Postman**: A popular tool for developing, testing, and documenting APIs.\n      - **Swagger/OpenAPI**: A framework for designing, building, and documenting RESTful APIs, often used with tools like Swagger UI and Swagger Editor.\n      - **Insomnia**: An API client for testing RESTful and GraphQL APIs.\n      - **Apigee**: An API management platform providing tools for API design, security, analytics, and monitoring.\n      - **Paw**: A macOS-based API client for testing and documenting APIs.\n      - **RAML (RESTful API Modeling Language)**: A language for designing and documenting APIs.\n\n\n## Message queues\n### Distributed Message Queues\n### Message Queues (Fifo)\n### Caches\n\n## Data Processing\n### ETL\n### Stream processing\n### Batch processing\n### Processing Frameworks\n#### Serverless\n#### Distributed Processing frameworks\n### Scheduling\n#### Airflow\n### Docker and Kubernetes\n### CI-CD\n\n## Data Storage\n### Relational Databases\n### NoSQL\n### Analytical Stores\n### Relational Modeling\n### Dimensional Data Modeling\n### Data Lakes\n\n## Data Platforms\n### AWS\n### GCP\n### Azure\n### Snowflake\n\n\n\n\n\nLooking for a job or just want to know what people find important? 
In\nthis chapter you can find a lot of interview questions we collect on the\nstream.\n\nUltimately this should reach at least one thousand and one questions.\n\n**But Andreas, where are the answers??** Answers are for losers. I have\nbeen thinking a lot about this and the best way for you to prepare and\nlearn is to look into these questions yourself.\n\nThis cookbook or Google will get you a long way. Some questions we\ndiscuss directly on the live stream.\n\nLive Streams\n------------\n\nFirst live stream where we started to collect these questions.\n\n| Podcast Episode: #096 1001 Data Engineering Interview Questions\n|------------------|\n|First live stream where we collect and try to answer as many interview questions as possible. If this helps people and is fun, we will do this regularly until we reach one thousand and one.\n| [Watch on YouTube](https://youtu.be/WbqRH2r3N40)\n\nAll Interview Questions\n-----------------------\n\nThe interview questions are roughly structured like the sections in the\n\\\"Basic data engineering skills\\\" part. This makes it easier to navigate\nthis document. I still need to sort them accordingly.\n\n### SQL DBs\n\n-   What are windowing functions?\n\n-   What is a stored procedure?\n\n-   Why would you use them?\n\n-   What are atomic attributes?\n\n-   Explain the ACID properties of a database\n\n-   How to optimize queries?\n\n-   What are the different types of JOIN (CROSS, INNER, OUTER)?\n\n-   What is the difference between Clustered Index and Non-Clustered\n    Index - with examples?\n\n### The Cloud\n\n-   What is serverless?\n\n-   What is the difference between IaaS, PaaS and SaaS?\n\n-   How do you move from the ingest layer to the Consumption layer? 
(In\n    Serverless)\n\n-   What is edge computing?\n\n-   What is the difference between cloud, edge, and on-premise?\n\n### Linux\n\n-   What is crontab?\n\n### Big Data\n\n-   What are the 4 V's?\n\n-   Which one is most important?\n\n### Kafka\n\n-   What is a topic?\n\n-   How to ensure FIFO?\n\n-   How do you know if all messages in a topic have been fully consumed?\n\n-   What are brokers?\n\n-   What are consumer groups?\n\n-   What is a producer?\n\n### Coding\n\n-   What is the difference between an object and a class?\n\n-   Explain immutability\n\n-   What are AWS Lambda functions and why would you use them?\n\n-   Difference between library, framework and package\n\n-   How to reverse a linked list\n\n-   Difference between args and kwargs\n\n-   Difference between OOP and functional programming\n\n### NoSQL DBs\n\n-   What is a key-value (rowstore) store?\n\n-   What is a columnstore?\n\n-   Difference between a row store and a column store\n\n-   What is a document store?\n\n-   Difference between Redshift and Snowflake\n\n### Hadoop\n\n-   What file formats can you use in Hadoop?\n\n-   What is the difference between a namenode and a datanode?\n\n-   What is HDFS?\n\n-   What is the purpose of YARN?\n\n### Lambda Architecture\n\n-   What is streaming and batching?\n\n-   What is the upside of streaming vs batching?\n\n-   What is the difference between lambda and kappa architecture?\n\n-   Can you sync the batch and streaming layers, and if yes, how?\n\n\n### Data Warehouse & Data Lake\n\n-   What is a data lake?\n\n-   What is a data warehouse?\n\n-   Are there data lake warehouses?\n\n-   Two data lakes within a single warehouse?\n\n-   What is a data mart?\n\n-   What is a slowly changing dimension (types)?\n\n-   What is a surrogate key and why use them?\n\n### APIs (REST)\n\n-   What does REST mean?\n\n-   What is idempotency?\n\n-   What are common REST API frameworks (Jersey and Spring)?\n\n### Apache Spark\n\n-   What is an RDD?\n\n-   What is a dataframe?\n\n-   
What is a dataset?\n\n-   How is a dataset typesafe?\n\n-   What is Parquet?\n\n-   What is Avro?\n\n-   Difference between Parquet and Avro\n\n-   Tumbling Windows vs. Sliding Windows\n\n-   Difference between batch and stream processing\n\n-   What are microbatches?\n\n### MapReduce\n\n-   What is a use case of MapReduce?\n\n-   Write pseudocode for word count\n\n-   What is a combiner?\n\n### Docker & Kubernetes\n\n-   What is a container?\n\n-   Difference between a Docker container and a virtual machine\n\n-   What is the easiest way to learn Kubernetes fast?\n\n### Data Pipelines\n\n-   What is an example of a serverless pipeline?\n\n-   What is the difference between at-most-once, at-least-once, and\n    exactly-once delivery?\n\n-   What systems provide transactions?\n\n-   What is an ETL pipeline?\n\n### Airflow\n\n-   What is a DAG (in the context of Airflow/Luigi)?\n\n-   What are hooks/is a hook?\n\n-   What are operators?\n\n-   How to branch?\n\n### Data Visualization\n\n-   What is a BI tool?\n\n### Security/Privacy\n\n-   What is Kerberos?\n\n-   What is a firewall?\n\n-   What is GDPR?\n\n-   What is anonymization?\n\n### Distributed Systems\n\n-   How do clusters reach consensus (the answer was using consensus\n    protocols like Paxos or Raft)? Good thing I didn't have to explain Paxos\n\n-   What is the CAP theorem / explain it (What factors should be\n    considered when choosing a DB?)\n\n-   How to choose the right storage for different data consumers? It's\n    always a tricky question\n\n### Apache Flink\n\n-   What is Flink used for?\n\n-   Flink vs Spark?\n\n### GitHub\n\n-   What are branches?\n\n-   What are commits?\n\n-   What's a pull request?\n\n### Dev/Ops\n\n-   What is continuous integration?\n\n-   What is continuous deployment?\n\n-   Difference between CI and CD\n\n### Development / Agile\n\n-   What is Scrum?\n\n-   What is OKR?\n\n-   What is Jira and what is it used for?\n\n\n\n"
  },
  {
    "path": "sections/09-BooksAndCourses.md",
    "content": "Recommended Books, Courses, and Podcasts\n=============================\n\n## Contents\n- [About Books and Courses](09-BooksAndCourses.md#about-books-and-courses)\n- [Books](09-BooksAndCourses.md#books)\n  - [Languages](09-BooksAndCourses.md#books-languages)\n  - [Data Science Tools](09-BooksAndCourses.md#books-data-science-tools)\n  - [Business](09-BooksAndCourses.md#Books-Business)\n  - [Community Recommendations](09-BooksAndCourses.md#Community-Recommendations)\n- [Online Courses](09-BooksAndCourses.md#Online-Courses)\n    - [Preparation courses](09-BooksAndCourses.md#Preparation-courses)\n    - [Data engineering courses](09-BooksAndCourses.md#Data-engineering-courses)\n- [Certifications](09-BooksAndCourses.md#Certifications)\n- [Podcasts](09-BooksAndCourses.md#Podcasts)\n\n\n## About Books, Courses, and Podcasts\n\nThis is a collection of books and courses I can recommend personally.\nThey are great for every data engineering learner.\n\nI either have used or own these books during my professional work.\n\nI also looked into every online course personally.\n\nIf you want to buy a book or course and support my work, please use one of my links below. 
They are all affiliate marketing links that help me fund this passion.\n\nOf course, all this comes at no additional expense to you, but it helps me a lot.\n\nYou can find even more interesting books and my whole podcast equipment on my Amazon store:\n\n[Go to the Amazon store](https://www.amazon.com/shop/plumbersofdatascience)\n\n\n\nPS: Don't just get a book and expect to learn everything\n  - Course certificates alone won't help you\n  - Have a purpose in mind, like a small project\n  - Great for use at work\n\n## Books\n\n### Languages\n\n#### Java\n\n[Learning Java: A Bestselling Hands-On Java Tutorial](https://amzn.to/2MgYp8h)\n\n#### Python\n\n[Learning Python, 5th Edition](https://amzn.to/2MdpM34)\n\n\n#### Scala\n\n[Programming Scala: Scalability = Functional Programming + Objects](https://amzn.to/2VIpww5)\n\n\n#### Swift\n\n[Learning Swift: Building Apps for macOS, iOS, and Beyond](https://amzn.to/31hDN4e)\n\n\n### Data Science Tools\n\n#### Apache Spark\n\n[Learning Spark: Lightning-Fast Big Data Analysis](https://amzn.to/31mtAUg)\n\n\n#### Apache Kafka\n\n[Kafka Streams in Action: Real-time apps and microservices with the Kafka Streams API](https://amzn.to/35uiSOJ)\n\n\n#### Apache Hadoop\n\n[Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale](https://amzn.to/2VNzf4n)\n\n\n#### Apache HBase\n\n[HBase: The Definitive Guide: Random Access to Your Planet-Size Data](https://amzn.to/2BbiyGz)\n\n\n### Business\n\n#### The Lean Startup\n\n[The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses](https://amzn.to/2Meyv5e)\n\n#### Zero to One\n\n[Zero to One: Notes on Startups, or How to Build the Future](https://amzn.to/2BbBwgr)\n\n\n#### The Innovator's Dilemma\n\n[The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail (Management of Innovation and Change)](https://amzn.to/31eGZ0k)\n\n\n#### Crossing the Chasm\n\n[Crossing the Chasm, 3rd Edition (Collins Business 
Essentials)](https://amzn.to/2IU7QZs)\n\n\n#### Crush It!\n\n[Crush It!: Why Now Is The Time To Cash In On Your Passion](https://amzn.to/33xe7Su)\n\n### Community Recommendations\n\n#### Designing Data-Intensive Applications\n\n\"In my opinion, the knowledge contained in this book differentiates a data engineer from a software engineer or a developer. The book strikes a good balance between breadth and depth of discussion on data engineering topics, as well as the tradeoffs we must make due to working with massive amounts of data.\" -- David Lee on LinkedIn\n\n[Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems](https://amzn.to/2MIqTqJ)\n\n\n## Online Courses\n\n### Preparation courses\n\n| Course name | Course description | Course URL |\n|---|---|---|\n| The Bits and Bytes of Computer Networking | This course is designed to provide a full overview of computer networking. We’ll cover everything from the fundamentals of modern networking technologies and protocols to an overview of the cloud to practical applications and network troubleshooting. | https://www.coursera.org/learn/computer-networking |\n| Learn SQL \\| Codecademy | In this SQL course, you'll learn how to manage large datasets and analyze real data using the standard data management language. | https://www.codecademy.com/learn/learn-sql |\n| Learn Python 3 \\| Codecademy | Learn the basics of Python 3, one of the most powerful, versatile, and in-demand programming languages today. | https://www.codecademy.com/learn/learn-python-3 |\n\n### Data engineering courses\n\n| Course name | Course description | Course URL |\n|---|---|---|\n| **1. Data Engineering Basics** |  |  |\n| Introduction to Data Engineering | Introduction to Data Engineering with over 1 hour of videos including my journey here. 
| https://learndataengineering.com/p/introduction-to-data-engineering |\n| Computer Science Fundamentals | A complete guide to the topics and resources you should know as a Data Engineer. | https://learndataengineering.com/p/data-engineering-fundamentals |\n| Introduction to Python | Learn all the fundamentals of Python to start coding quickly | https://learndataengineering.com/p/introduction-to-python |\n| Python for Data Engineers | Learn all the Python topics a Data Engineer needs even if you don't have a coding background | https://learndataengineering.com/p/python-for-data-engineers |\n| Docker Fundamentals | Learn all the fundamental Docker concepts with hands-on examples | https://learndataengineering.com/p/docker-fundamentals |\n| Successful Job Application | Everything you need to get your dream job in Data Engineering. | https://learndataengineering.com/p/successful-job-application |\n| Data Preparation & Cleaning for ML | All you need for preparing data to enable Machine Learning. | https://learndataengineering.com/p/data-preparation-and-cleaning-for-ml |\n| **2. Platform & Pipeline Design Fundamentals** |  |  |\n| Data Platform And Pipeline Design | Learn how to build data pipelines with templates and examples for Azure, GCP and Hadoop. | https://learndataengineering.com/p/data-pipeline-design |\n| Platform & Pipelines Security | Learn the important security fundamentals for Data Engineering | https://learndataengineering.com/p/platform-pipeline-security |\n| Choosing Data Stores | Learn the different types of data stores and when to use which. | https://learndataengineering.com/p/choosing-data-stores |\n| Schema Design Data Stores | Learn how to design schemas for SQL, NoSQL and Data Warehouses. | https://learndataengineering.com/p/data-modeling |\n| **3. 
Fundamental Tools** |  |  |\n| Building APIs with FastAPI | Learn the fundamentals of designing, creating and deploying APIs with FastAPI and Docker | https://learndataengineering.com/p/apis-with-fastapi-course |\n| Apache Kafka Fundamentals | Learn the fundamentals of Apache Kafka | https://learndataengineering.com/p/apache-kafka-fundamentals |\n| Apache Spark Fundamentals | Apache Spark quick start course in Python with Jupyter notebooks, DataFrames, SparkSQL and RDDs. | https://learndataengineering.com/p/learning-apache-spark-fundamentals |\n| Data Engineering on Databricks | Everything you need to get started with Databricks. From setup to building ETL pipelines & warehousing. | https://learndataengineering.com/p/data-engineering-on-databricks |\n| MongoDB Fundamentals | Learn how to use MongoDB | https://learndataengineering.com/p/mongodb-fundamentals-course |\n| Log Analysis with Elasticsearch | Learn how to monitor and debug your data pipelines | https://learndataengineering.com/p/log-analysis-with-elasticsearch |\n| Airflow Workflow Orchestration | Learn how to orchestrate your data pipelines with Apache Airflow | https://learndataengineering.com/p/learn-apache-airflow |\n| Snowflake for Data Engineers | Everything you need to get started with Snowflake | https://learndataengineering.com/p/snowflake-for-data-engineers |\n| dbt for Data Engineers | Everything you need to work with dbt and Snowflake | https://learndataengineering.com/p/dbt-for-data-engineers |\n| **4. Full Hands-On Example Projects** |  |  |\n| Data Engineering on AWS | Full 5-hour course with a complete example project. Building stream and batch processing pipelines on AWS. | https://learndataengineering.com/p/data-engineering-on-aws |\n| Data Engineering on Azure | Ingest, Store, Process, Serve and Visualize Streams of Data by Building Streaming Data Pipelines in Azure. 
| https://learndataengineering.com/p/build-streaming-data-pipelines-in-azure |\n| Data Engineering on GCP | Everything you need to start with Google Cloud. | https://learndataengineering.com/p/data-engineering-on-gcp |\n| Modern Data Warehouses & Data Lakes | How to integrate a Data Lake with a Data Warehouse and query data directly from files | https://learndataengineering.com/p/modern-data-warehouses |\n| Machine Learning & Containerization On AWS | Build an app that analyzes the sentiment of tweets and visualizes them on a user interface hosted as a container | https://learndataengineering.com/p/ml-on-aws |\n| Contact Tracing with Elasticsearch | Track 100,000 users in San Francisco using Elasticsearch and an interactive Streamlit user interface | https://learndataengineering.com/p/contact-tracing-with-elasticsearch |\n| Document Streaming Project | Document Streaming with FastAPI, Kafka, Spark Streaming, MongoDB and Streamlit | https://learndataengineering.com/p/document-streaming |\n| Storing & Visualizing Time Series Data with InfluxDB and Grafana | Learn how to use InfluxDB to store time series data and visualize interactive dashboards with Grafana | https://learndataengineering.com/p/time-series-influxdb-grafana |\n| Data Engineering with Hadoop | Hadoop Project with HDFS, YARN, MapReduce, Hive and Sqoop! | https://learndataengineering.com/p/data-engineering-with-hadoop |\n| Dockerized ETL | Learn how to quickly set up a simple ETL script with AWS TDengine & Grafana | https://learndataengineering.com/p/timeseries-etl-with-aws-tdengine-grafana |\n\n## Certifications\n\nHere's a list of great certifications that you can earn on AWS and Azure. We left out GCP because AWS and Azure have much higher adoption, which is why I recommend starting with one of them. The listed prices usually cover the certification exam itself. 
We also added the level and prerequisites to make it easier for you to decide which one fits you.\n\n| Platform | Certification Name                                      | Price | Level       | Prerequisite Experience                                                                  | URL                                                                                                          |\n|----------|---------------------------------------------------------|-------|-------------|------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------|\n| AWS      | AWS Certified Cloud Practitioner (maybe)               | 100   | Beginner    | Familiarity with the AWS platform is recommended but not required.                       | [Link](https://aws.amazon.com/certification/certified-cloud-practitioner/)                                   |\n| AWS      | AWS Certified Solutions Architect - Professional       | 300   | Expert      | AWS Certified Solutions Architect - Professional is intended for individuals with two or more years of hands-on experience designing and deploying cloud architecture on AWS. | [Link](https://aws.amazon.com/certification/certified-solutions-architect-professional/?ch=sec&sec=rmg&d=1) |\n| AWS      | AWS Certified Solutions Architect - Associate          | 150   | Intermediate| This is an ideal starting point for candidates with AWS Cloud or strong on-premises IT experience. This exam does not require deep hands-on coding experience, although familiarity with basic programming concepts would be an advantage. 
| [Link](https://aws.amazon.com/certification/certified-solutions-architect-associate/)                        |\n| AWS      | AWS Certified Data Engineer                            | 150   | Intermediate| The ideal candidate for this exam has the equivalent of 2-3 years of experience in data engineering or data architecture and a minimum of 1-2 years of hands-on experience with AWS services. | [Link](https://aws.amazon.com/certification/certified-data-engineer-associate/)                              |\n| Azure    | Microsoft Certified: Azure Cosmos DB Developer Specialty| 165   | Intermediate|                                                                                          | [Link](https://learn.microsoft.com/en-us/credentials/certifications/azure-cosmos-db-developer-specialty/?practice-assessment-type=certification) |\n| Azure    | Microsoft Certified: Azure Data Engineer Associate - DP 203| 165   | Intermediate|                                                                                          | [Link](https://learn.microsoft.com/en-us/credentials/certifications/azure-data-engineer/?practice-assessment-type=certification) |\n| Azure    | Microsoft Certified: Azure Data Fundamentals           | 99    | Beginner    |                                                                                          | [Link](https://learn.microsoft.com/en-us/credentials/certifications/azure-data-fundamentals/?practice-assessment-type=certification) |\n| Azure    | Microsoft Certified: Azure Database Administrator Associate| 165   | Intermediate|                                                                                          | [Link](https://learn.microsoft.com/en-us/credentials/certifications/azure-database-administrator-associate/?practice-assessment-type=certification) |\n| Azure    | Microsoft Certified: Azure Developer Associate         | 165   | Intermediate|                                                                                          
| [Link](https://learn.microsoft.com/en-us/credentials/certifications/azure-developer/?practice-assessment-type=certification) |\n| Azure    | Microsoft Certified: Azure Fundamentals                | 99    | Beginner    |                                                                                          | [Link](https://learn.microsoft.com/en-us/credentials/certifications/azure-fundamentals/?practice-assessment-type=certification) |\n| Azure    | Microsoft Certified: Azure Solutions Architect Expert  | 165   | Expert      | Microsoft Certified: Azure Administrator Associate certification                        | [Link](https://learn.microsoft.com/en-us/credentials/certifications/azure-solutions-architect/)             |\n| Azure    | Microsoft Certified: Fabric Analytics Engineer Associate| 165   | Intermediate|                                                                                          | [Link](https://learn.microsoft.com/en-us/credentials/certifications/fabric-analytics-engineer-associate/?practice-assessment-type=certification) |\n| Azure    | Microsoft Certified: Fabric Data Engineer Associate    | 165   | Intermediate|                                                                                          | [Link](https://learn.microsoft.com/en-us/credentials/certifications/fabric-data-engineer-associate/)        |\n| Azure    | Microsoft Certified: Power BI Data Analyst Associate   | 165   | Intermediate|                                                                                          | [Link](https://learn.microsoft.com/en-us/credentials/certifications/data-analyst-associate/?practice-assessment-type=certification) |\n\n\n## Podcasts\nTop five podcasts by the number of episodes created.\n\n### Super Data Science\n\n[The latest machine learning, A.I., and data career topics from across both academia and industry are brought to you by host Dr. 
Jon Krohn on the Super Data Science Podcast.](https://podcasts.apple.com/us/podcast/super-data-science/id1163599059)\n\n### Data Skeptic\n\n[The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.](https://podcasts.apple.com/us/podcast/data-skeptic/id890348705)\n\n### Data Engineering Podcast\n\n[This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.](https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557?mt=2)\n\n### Roaring Elephant BiteSized Big Tech\n\n[A weekly community podcast about Big Technology with a focus on Open Source, Advanced Analytics and other modern magic.](https://roaringelephant.org/)\n\n### SQL Data Partners Podcast\n\n[Hosted by Carlos L Chacon, the SQL Data Partners Podcast focuses on Microsoft data platform related topics mixed with a sprinkling of professional development. 
Carlos and guests discuss new and familiar features and ideas and how you might apply them in your environments.](https://podcasts.apple.com/us/podcast/sql-data-partners-podcast/id1027394388)\n\n### Complete list\n| Host name               | Podcast name                                                                     | Access podcast                                                                                                                                                 |\n|-------------------------|----------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| Jon Krohn               | Super Data Science                                                               | https://www.superdatascience.com/podcast                                                                                                                       |\n| Kyle Polich             | Data Skeptic                                                                     | https://dataskeptic.com/                                                                                                                                       |\n| Tobias Macey            | Data Engineering Podcast                                                         | https://www.dataengineeringpodcast.com/                                                                                                                        |\n| Dave Russell            | Roaring Elephant - Bite-Sized Big Tech                                           | https://roaringelephant.org/                                                                                                                                   |\n| Carlos L Chacon         | SQL Data Partners Podcast                                                        | https://sqldatapartners.com/podcast/ 
                                                                                                                          |\n| Jason Himmelstein       | BIFocal - Clarifying Business Intelligence                                       | https://bifocal.show/                                                                                                                                          |\n| Scott Hirleman          | Data Mesh Radio                                                                  | https://daappod.com/data-mesh-radio/                                                                                                                           |\n| Jonathan Schwabish      | PolicyViz                                                                        | https://policyviz.com/podcast/                                                                                                                                 |\n| Al Martin               | Making Data Simple                                                               | https://www.ibm.com/blogs/journey-to-ai/2021/02/making-data-simple-this-week-we-continue-our-discussion-on-data-framework-and-what-is-meant-by-data-framework/ |\n| John David Ariansen     | How to Get an Analytics Job                                                      | https://www.silvertoneanalytics.com/how-to-get-an-analytics-job/                                                                                               |\n| Moritz Stefaner         | Data Stories                                                                     | https://datastori.es/                                                                                                                                          |\n| Hilary Parker           | Not So Standard Deviations                                                       | https://nssdeviations.com/                                                                                                    
                                 |\n| Ben Lorica              | The Data Exchange with Ben Lorica                                                | https://thedataexchange.media/author/bglorica/                                                                                                                 |\n| Juan Sequeda            | Catalog & Cocktails                                                              | https://data.world/resources/podcasts/                                                                                                                         |\n| Wayne Eckerson          | Secrets of Data Analytics Leaders                                                | https://www.eckerson.com/podcasts/secrets-of-data-analytics-leaders                                                                                            |\n| Guy Glantser            | SQL Server Radio                                                                 | https://www.sqlserverradio.com/                                                                                                                                |\n| Eitan Blumin            | SQL Server Radio                                                                 | https://www.sqlserverradio.com/                                                                                                                                |\n| Jason Tan               | The Analytics Show                                                               | https://ddalabs.ai/the-analytics-show/                                                                                                                         |\n| Hugo Bowne-Anderson     | DataFramed                                                                       | https://www.datacamp.com/podcast                                                                                                                               |\n| Kostas Pardalis         | The Data Stack Show      
                                                        | https://datastackshow.com/                                                                                                                                     |\n| Eric Dodds              | The Data Stack Show                                                              | https://datastackshow.com/                                                                                                                                     |\n| Catherine King          | The Business of Data Podcast                                                     | https://podcasts.apple.com/gb/podcast/the-business-of-data-podcast/id1528796448                                                                                |\n|                         | The Business of Data                                                             | https://business-of-data.com/podcasts/                                                                                                                         |\n| James Le                | Datacast                                                                         | https://datacast.simplecast.com/                                                                                                                               |\n| Mike Delgado            | DataTalk                                                                         | https://podcasts.apple.com/us/podcast/datatalk/id1398548129                                                                                                    |\n| Matt Housley            | Monday Morning Data Chat                                                         | https://podcasts.apple.com/us/podcast/monday-morning-data-chat/id1565154727                                                                                    |\n| Francesco Gadaleta      | Data Science at Home                                                             | https://datascienceathome.com/ 
                                                                                                                                |\n| Alli Torban             | Data Viz Today                                                                   | https://dataviztoday.com/                                                                                                                                      |\n| Steve Jones             | Voice of the DBA                                                                 | https://voiceofthedba.com/                                                                                                                                     |\n| Lea Pica                | The Present Beyond Measure Show: Data Storytelling, Presentation & Visualization | https://leapica.com/podcast/                                                                                                                                   |\n| Samir Sharma            | The Data Strategy Show                                                           | https://podcasts.apple.com/us/podcast/the-data-strategy-show/id1515194422                                                                                      |\n| Cindi Howson            | The Data Chief                                                                   | https://www.thoughtspot.com/data-chief/podcast                                                                                                                 |\n| Cole Nussbaumer Knaflic | storytelling with data podcast                                                   | https://storytellingwithdata.libsyn.com/                                                                                                                       |\n| Margot Gerritsen        | Women in Data Science                                                            | https://www.widsconference.org/podcast.html                                                                             
                                       |\n| Jonas Christensen       | Leaders of Analytics                                                             | https://www.leadersofanalytics.com/episode/the-future-of-analytics-leadership-with-john-thompson                                                               |\n| Matt Brady              | ZUMA: Data For Good                                                              | https://www.youtube.com/@zuma-dataforgood                                                                                                                      |\n| Julia Schottenstein     | The Analytics Engineering Podcast                                                | https://roundup.getdbt.com/s/the-analytics-engineering-podcast                                                                                                 |\n|                         | Data Unlocked                                                                    | https://dataunlocked.buzzsprout.com/                                                                                                                           |\n| Boris Jabes             | The Sequel Show                                                                  | https://www.thesequelshow.com/                                                                                                                                 |\n|                         | Data Radicals                                                                    | https://www.alation.com/podcast/                                                                                                                               |\n| Nicola Askham           | The Data Governance                                                              | https://www.nicolaaskham.com/podcast                                                                                                                           |\n| Boaz Farkash            | The Data 
Engineering Show                                                        | https://www.dataengineeringshow.com/                                                                                                                           |\n| Bob Haffner             | The Engineering Side of Data                                                     | https://podcasts.apple.com/us/podcast/the-engineering-side-of-data/id1566999533                                                                                |\n| Dan Linstedt            | Data Vault Alliance                                                              | https://datavaultalliance.com/category/news/podcasts/                                                                                                          |\n| Dustin Schimek          | Data Ideas                                                                       | https://podcasts.apple.com/us/podcast/data-ideas/id1650322207                                                                                                  |\n| Alex Merced             | The datanation                                                                   | https://podcasts.apple.com/be/podcast/the-datanation-podcast-podcast-for-data-engineers/id1608638822                                                           |\n| Thomas Bustos           | Let's Talk AI                                                                    | https://www.youtube.com/@lets-talk-ai                                                                                                                          |\n| Jahanvee Narang         | Decoding Data Analytics                                                          | https://www.youtube.com/@decodingdataanalytics/videos                                                                                                          |\n"
  },
  {
    "path": "sections/10-Updates.md",
    "content": "Updates\n============\n\nWhat's new? Here you can find a list of all the updates with links to the sections\n\n- **2025-07-21**\n  - Added a list of my students' favorite datasets and APIs [click here](07-DataSources.md#Student-Favorites)\n\n\n- **2025-06-11**\n  - Released the first playable demo of the Spark Optimization Playground [click here](https://bit.ly/play-spark-optimization)\n\n\n- **2025-03-25**\n  - Added a detailed 14-week roadmap to Data Engineering for Data Scientists [click here](01-Introduction.md#roadmap-for-data-scientists)\n\n\n- **2025-03-05**\n  - Added a detailed 11-week roadmap to Data Engineering for Beginners [click here](01-Introduction.md#roadmap-for-beginners)\n\n\n- **2025-03-04**\n  - Added a detailed 10-week roadmap to Data Engineering for Data Analysts [click here](01-Introduction.md#roadmap-for-data-analysts)\n\n\n- **2024-12-11**\n  - Prepared the 81 most important questions for platform & pipeline design. Specifically looking at the data source and the goals [click here](03-AdvancedSkills.md#81-platform-and-pipeline-design-questions)\n\n\n- **2024-11-28**\n  - Prepared a GenAI RAG example project that you can run on your own computer without internet. It uses Ollama with the Mistral model and Elasticsearch. Working on a way of creating embeddings from PDF files and inserting them into Elasticsearch for queries [click here](04-HandsOnCourse.md#genai-retrieval-augmented-generation-with-ollama-and-elasticsearch)\n\n\n- **2024-11-23**\n  - Added an overview of AWS and Azure cloud certifications for Data Engineers. From beginners to experts [click here](09-BooksAndCourses.md#Certifications)\n\n\n- **2024-07-31**\n  - Added 10 platform architecture react videos I did to the \"Best Practices\" section. 
This way you get a better feeling of what companies are doing and which tools they use [click here](06-BestPracticesCloud.md#best-practices)\n\n\n- **2024-07-17**\n  - Added 20 API interview questions and their answers [click here](08-InterviewQuestions.md#apis)\n  - Added 10 Python interview questions and their answers [click here](03-AdvancedSkills.md#python)\n\n\n- **2024-07-08**\n  - Added a large article about Snowflake and dbt for Data Engineers [click here](03-AdvancedSkills.md#analytical-data-stores)\n  - Added a new section \"Analytical Data Stores\" to Advanced skills with the Snowflake & dbt info.\n  - Put SQL and NoSQL datastores into a new section \"Transactional Data Stores\"\n\n\n- **2024-03-20**\n  - Added roadmap for Software Engineers / Computer Scientists [click here](01-Introduction.md#roadmap-for-software-engineers)\n  - Added many questions and answers from my interview on the Super Data Science Podcast (plus links to YouTube and the Podcast) [click here](01-Introduction.md#Interview-with-Andreas-on-the-Super-Data-Science-Podcast)\n\n\n- **2024-03-13**\n  - Added \"How to become a Senior Data Engineer\" live stream series as a blog post with images shown in the live streams and the links to the videos. [click here](01-Introduction.md#how-to-become-a-senior-data-engineer)\n\n\n- **2024-03-08**\n  - Included Data Engineering skills matrix into the introduction with link to the live stream. 
[click here](01-Introduction.md#data-engineers-skills-matrix)\n\n\n- **2024-03-01**\n  - Added updates section\n  - Reworked the Hands-on courses section with 5 free courses / tutorials from Andreas on YouTube [click here](04-HandsOnCourse.md)\n\n\n- **2024-02-28**\n  - Added Data Engineering Roadmap for Data Scientists: [click here](01-Introduction.md#roadmap-for-data-scientists)\n\n\n- **2024-02-25**\n  - Data Engineering Roadmap for Software Engineers: [click here](01-Introduction.md#roadmap-for-software-engineers)\n\n\n- **2024-02-20**\n  - Data Engineering Roadmap for Data Analysts: [click here](01-Introduction.md#roadmap-for-data-analysts)\n\n\n\n<!---\n| Date | Topic | Link\n|------------------|\n| 2024-02-28 | Added updates section - sdfs | [click here](sections/01-Introduction.md#roadmap-for-data-scientists)\n| 2024-02-28 | Data Engineering Roadmap for Data Scientists | [click here](sections/01-Introduction.md#roadmap-for-data-scientists)\n| 2024-02-25 | Data Engineering Roadmap for Software Engineers | [click here](sections/01-Introduction.md#roadmap-for-software-engineers)\n| 2024-02-20 | Data Engineering Roadmap for Data Analysts | [click here](sections/01-Introduction.md#roadmap-for-data-analysts)\n--->\n"
  }
]