[
  {
    "path": "README.md",
    "content": "# Resources for Analytics Engineers\nThis repository is a curation of good blog posts and books for Analytics Engineers. It can also be very useful for Data Analysts and Data Scientists. \n\n## Contribute\nI really appreciate any contribution. Just make sure to describe the theme and why you found the resource useful. \n\n# Table of Contents\n- [SQL](#sql)\n- [Python](#python)\n- [Infrastructure](#infrastructure)\n- [Analytics Skills](#analytics-skills)\n- [Data Warehousing](#data-warehousing)\n- [Data Pipelines](#data-pipelines)\n- [Starting analytics in a company](#starting-analytics-in-a-company)\n- [Testing data](#testing-data)\n- [Success Stories](#success-stories)\n- [Organisation](#organisation)\n- [Data Visualisation](#data-visualisation)\n- [Marketing and data](#marketing-and-data)\n- [Thinking with data](#thinking-with-data)\n- [Github-Gitlab repo to learn from](#github-gitlab-repo-to-learn-from)\n- [Against ELT](#against-elt)\n- [Other readings lists](#other-readings-lists)\n- [Top bloggers/blog](#top-bloggersblog)\n\n# Readings\n\nDefinition of the Analytics Engineer: [The Analytics Engineer](https://www.locallyoptimistic.com/post/analytics-engineer/). \n\n\n### SQL\nSQL has a lot of tips and tricks that take times to know. \n  * [Mode Analytics SQL Guide](https://mode.com/sql-tutorial/introduction-to-sql/). Very complete, even intermediate users can learn from this series of tutorials.\n  * [Learning SQL 201: Optimizing Queries, Regardless of Platform](https://towardsdatascience.com/learning-sql-201-optimizing-queries-regardless-of-platform-918a3af9c8b1) By Randy Au. I finally found a complete post on advanced SQL.\n\n### Python \nPython is a very broad subject. Maybe you can follow this list for more [Python focused readings](https://github.com/charlax/python-education).\n  * [Python for Data Analysis](https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1491957662). 
:book: Very comprehensive book about using Python for data work. \n  * [Pandas Cheatsheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf) I use it every day!\n  * [Modern pandas](https://tomaugspurger.github.io/modern-1-intro.html). A series of blog posts on intermediate/advanced pandas written by one of the maintainers. \n\n### Infrastructure\n\n  * [The Startup Founder's Guide to Analytics](https://thinkgrowth.org/the-startup-founders-guide-to-analytics-1d2176f20ac1). An excellent introduction to the stack necessary for analytics and its evolution following the growth of the start-up.  \n  * [The missing layers of the Analytics Stack](https://blog.getdbt.com/the-missing-layers-of-the-analytics-stack). \n  * [Choosing a Data Warehouse](https://discourse.getdbt.com/t/choosing-a-data-warehouse/62/4). A lot of excellent answers on what to choose for your data warehouse. \n  * [Data science for start-ups](https://bgweber.github.io/intro.html). You can find some useful information in this free book.\n  * [Designing Data-Intensive Applications](https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321) :book: Fascinating read to learn more about databases, protocols, etc.\n  * [The Modern Data Stack: Past, Present, and Future](http://blog.getdbt.com/future-of-the-modern-data-stack/) A must-read on the latest innovations in the data stack.\n\n  **Comparison of tools by Stephen Levin**\n  * [Looker vs Tableau vs Mode. Data Visualisation tools compared](https://www.stephenlevin.co/advanced-analytics-part-3-data-visualization/)
\n  * [Segment vs Fivetran vs Stitch: Which Data Ingest Should You Use?](https://www.stephenlevin.co/segment-vs-fivetran-vs-stitch-which-data-ingest-should-you-use/)\n\n### Analytics Skills\n  * [One analyst's guide for going from good to great](https://blog.getdbt.com/one-analysts-guide-for-going-from-good-to-great/)\n  * [Succeeding as the first data person in a small company/startup](https://towardsdatascience.com/succeeding-as-a-data-scientist-in-small-companies-startups-92f59e22bd8c). A must-read for anyone working in data, even in a big company. \n  * [Prioritizing data science work](https://towardsdatascience.com/prioritizing-data-science-work-936b3765fd45). Too many engineers like building ivory towers. Make sure you don't fall into that trap.\n\n### Data Warehousing\n\n  * [The beginner's guide to data engineering series](https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-i-4227c5c457d7). Start here if you are not yet familiar with star schemas, Airflow, and basic practices for writing data pipelines.    \n  * [Best practices for data modeling](https://www.stitchdata.com/blog/best-practices-for-data-modeling/). A lot of practical tips on naming, grain, permissions and materialization. \n  * [The Data Warehouse Toolkit](https://www.amazon.com/Data-Warehouse-Toolkit-Definitive-Dimensional/dp/1118530802/ref=sr_1_1?crid=FV5A2S72XIZO&keywords=data+warehouse+toolkit&qid=1566644628&s=gateway&sprefix=data+ware%2Caps%2C213&sr=8-1) by Ralph Kimball. :book: A classic in Business Intelligence. Some chapters are gold for modeling your data warehouse.   \n  * [Functional Data Engineering — a modern paradigm for batch data processing](https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a). You will learn the spirit behind good data pipelines and a well-designed data warehouse.  
\n  * [The rise of the Data Engineer](https://medium.com/free-code-camp/the-rise-of-the-data-engineer-91be18f1e603). Explains how the job and data practices have evolved recently.   \n  * [Five principles that will keep your data warehouse organized](https://blog.getdbt.com/five-principles-that-will-keep-your-data-warehouse-organized/)\n  * [Using Postgres as a data warehouse](https://www.narrator.ai/blog/using-postgresql-as-a-data-warehouse/) I wish I had read this post earlier. So much wisdom on Postgres.\n  * [For Data Warehouse Performance, One Big Table or Star Schema?](https://fivetran.com/blog/obt-star-schema). A discussion of an alternative to the star schema. \n\n### Data Pipelines\n\n  * [Functional Data Engineering — a modern paradigm for batch data processing](https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a). You will learn the spirit behind good data pipelines and a well-designed data warehouse.\n  * [Maintainable ETLs: Tips for Making Your Pipelines Easier to Support and Extend](https://multithreaded.stitchfix.com/blog/2019/05/21/maintainable-etls/). Best practices for writing good ETL. \n  * [The Data Warehouse ETL Toolkit](https://www.amazon.com/gp/product/0764567578?ie=UTF8&tag=decworks-20&lin%20kCode=xm2&camp=1789&creativeASIN=0764567578) :book: Once again, a very dense book, but you can find good ideas in it. \n\n### Starting analytics in a company\n  * [Building a data practice from scratch](https://www.locallyoptimistic.com/post/building-a-data-practice/). Very useful for your first weeks as a data person. \n  * [The Startup Founder's Guide to Analytics](https://thinkgrowth.org/the-startup-founders-guide-to-analytics-1d2176f20ac1). An excellent introduction to the stack necessary for analytics and its evolution following the growth of the start-up. 
\n\n\n### Testing data\n  * [Automated Testing In The Modern Data Warehouse](https://medium.com/@josh.temple/automated-testing-in-the-modern-data-warehouse-d5a251a866af). Practical advice on testing data. Useful for everyone building data pipelines. It is rare to find a post that deals with the unglamorous side of data. \n\n\n### Success Stories\n  * [Scaling analytics at Wish](https://medium.com/wish-engineering/scaling-analytics-at-wish-619eacb97d16)\n  * [Building Analytics at 500px](https://medium.com/@samson_hu/building-analytics-at-500px-92e9a7005c83)\n\n### Organisation\n  * [Engineers shouldn't write ETL](https://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/). It's more data-science-focused, but it's a classic.\n  * [Does my startup data team need a data engineer?](https://blog.getdbt.com/does-my-startup-data-team-need-a-data-engineer-/)\n\n### Marketing and data\n  * [Data Driven Marketing](https://www.amazon.com/Data-Driven-Marketing-Metrics-Everyone-Should/dp/0470504544/ref=sr_1_1?crid=38ZUOKHZZEY6D&keywords=data+driven+marketing&qid=1566644698&s=gateway&sprefix=data+driven%2Caps%2C209&sr=8-1). :book: Reading some chapters can help you think like a marketer with a data-driven approach. It's a gem. I haven't found this kind of insight elsewhere.\n\n### Thinking with data\nThese books/articles helped me to think better when analysing data. \n\n  * [Common Data Mistakes to Avoid](https://www.geckoboard.com/learn/data-literacy/statistical-fallacies/). Excellent summary of the most common fallacies when analyzing data. Very clear and well-explained. \n  * [Thinking, Fast and Slow](https://www.amazon.com/dp/0374533555/ref=cm_sw_em_r_mt_dp_U_wOryDb6WC3CVE). Learning about bias can be super useful. For instance, I did not have the reflex to think of the base rate whenever I saw a figure. 
\n  * [Fooled by Randomness](https://www.amazon.com/Fooled-Randomness-Hidden-Markets-Incerto/dp/0812975219/ref=sr_1_1?crid=2QEXPWM35W0BR&keywords=fooled+by+randomness&qid=1566644880&s=books&sprefix=foole%2Cstripbooks-intl-ship%2C207&sr=1-1). :book: Nassim Taleb taught me so much, both professionally and personally. In Fooled by Randomness, you will learn about major pitfalls when dealing with data in **real life**. \n  * [Why you should care about the Nate Silver vs. Nassim Taleb Twitter war](https://towardsdatascience.com/why-you-should-care-about-the-nate-silver-vs-nassim-taleb-twitter-war-a581dce1f5fc). Great chess players learn from high-Elo games. Great data people learn from debates between data experts. \n  * [Five books every data scientist should read that are not about data science](https://towardsdatascience.com/five-books-every-data-scientist-should-read-that-are-not-about-data-science-f7335fb1f84f). I have not read them all yet, but the suggestions seem judicious. \n\n\n### Data Visualisation \n   * [Fundamentals of Data Visualisation](https://serialmentor.com/dataviz/). Complete guide to visualisation. Free version online.\n\n### Github-Gitlab repo to learn from\nI have found that reading code is a good way to learn best practices, whether in Python or SQL.\n\nIn Python, reading some taps from [Singer](https://github.com/singer-io) can teach you a lot. \n\nFor dbt/SQL, I like to browse [a repo open-sourced by GitLab](https://gitlab.com/gitlab-data/analytics/-/tree/master/transform/snowflake-dbt).\n\n\n### Against ELT\nThe concept of analytics engineering is tightly coupled with the ELT view of data warehousing. It is interesting to learn from the people who prefer ETL. 
\n[Reddit comments on Snowflake's super-expensive cost](https://www.reddit.com/r/dataengineering/comments/is39id/snowflake_cost_analysis/)\n\n\n### Other readings lists\n\nThe GitLab data team also made an [excellent list](https://about.gitlab.com/handbook/business-ops/data-team/#data-learning-and-resources), which is close to mine.\n\n[Analytics Dispatch](https://mode.com/analytics-dispatch) by Mode Analytics. Very comprehensive.\n\nI really love [Reading in Applied Data Science](https://github.com/hadley/stats337#readings) for a more data-science-focused view.  \n\nKnowing more about programming is a huge asset. For instance, the [Professional Programming list](https://github.com/charlax/professional-programming) is quite complete.\n\n\n# Top bloggers/blog\n  * [Randy Au](https://towardsdatascience.com/@Randy_Au). You can read almost all of his posts; they are all very relevant for analytics engineers.\n  * [Locally Optimistic](https://www.locallyoptimistic.com/). A blog dedicated to data in organizations. \n  * [Tristan Handy](https://medium.com/@jthandy). I also love his newsletter: [Data Science Roundup](http://roundup.fishtownanalytics.com/).\n  * [dbt blog](https://blog.getdbt.com/). 90% of the articles are close to must-reads.\n  * [Ken Farmer](https://www.reddit.com/user/kenfar/?sort=top&t=year) It is healthy to read from those who still prefer the ETL stack.\n  * [Holistics.io](https://www.holistics.io/blog/) About the contemporary practice of business intelligence.\n\n# Where is the community?\n  * Twitter\n  * [Locally Optimistic](https://www.locallyoptimistic.com/)\n  * [Reddit data engineering](https://www.reddit.com/r/dataengineering/). The ETL, Business Intelligence, and Data Science channels are also good.\n\n"
  }
]