[
  {
    "path": ".gitignore",
    "content": ".DS_Store\n.AppleDouble\n.LSOverride\n.idea\n.ipynb_checkpoints\n*/.pytest_cache/\ngit-user.sh\n/excluded_resources/*\n"
  },
  {
    "path": "README.md",
    "content": "# 👉👉👉 Visit [musicat.fm](https://musicat.fm) 😻\n\nYou can connect Spotify and Apple Music to it to discover many cool statistics about your taste!\n\n(I'm the author 🤩)\n\n---\n\n## Notes\n\n### Books\n\n👀 In progress:\n\n- [System design interview](books/system-design-interview.md)\n\n#### ✅ Finished:\n\n- Code:\n    - [Clean Code: A Handbook of Agile Software Craftsmanship](books/clean-code.md)\n    - [Learning Go: An Idiomatic Approach to Real-World Go Programming](books/go/notes.md)\n    - [Python Testing with Pytest](books/pytest/notes.md)\n    - [Refactoring: Improving the Design of Existing Code](books/refactoring.md)\n    - [Tidy first?](books/tidy-first.md)\n\n- Architecture:\n    - [Architecture Patterns with Python](books/python-architecture-patterns/notes.md)\n    - [Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems](books/ddia.md)\n    - [Head First Design Patterns: Building Extensible and Maintainable Object-Oriented Software](books/head-first-design-patterns/notes.md)\n    - [Release It! Design and Deploy Production-Ready Software](books/release-it.md)\n    - [Fundamentals of Software Architecture](books/fundamentals-of-architecture.md)\n\n- Process:\n    - [Clean Agile: Back to Basics](books/clean-agile.md)\n    - [Domain-Driven Design: Tackling Complexity in the Heart of Software](books/ddd.md)\n    - [Peopleware: Productive Projects and Teams](books/peopleware.md)\n    - [The Pragmatic Programmer](books/pragmatic-programmer.md)\n    - [Comic Agilé](books/comic-agile.md)\n\n- DevOps:\n    - [The Kubernetes Book](books/kubernetes-book.md)\n\n- Product:\n    - :eyes:\n\n- ML:\n    - [Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition](books/nlp-book.md)\n\n#### ☑️ Finished partially:\n\n- [Code Complete: A Practical Handbook of Software Construction](books/code-complete.md)\n- [Cracking the Coding Interview](books/cracking-coding-interview/notes.md)\n- [Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems](books/hands-on-ml.md)\n- [Build](books/build.md)\n- [Coaching Agile Teams](books/coaching-agile-teams.md)\n\n#### ⏳ Queue:\n\n- [Docker Deep Dive](books/docker-deep-dive.md)\n- [Software Architecture: The Hard Parts](books/architecture-hard-parts.md)\n- [Understanding Distributed Systems](books/understanding-distributed-systems.md)\n- [Kubernetes in Action](books/kubernetes-in-action.md)\n- [Elixir in Action](books/elixir.md)\n\n### Case Studies\n\n- [Reddit](case-studies/reddit.md)\n\n### Conferences\n\n- [PyCon 2022](conferences/pycon-2022.md)\n- [AWS Innovate: AI/ML Edition 2021](conferences/aws-innovate-ai-ml-21.md)\n- [Brown Bags](conferences/brown-bags.md)\n\n### Patterns\n\n- [Abbreviations](patterns/abbreviations.md)\n- [Architecture](patterns/architecture.md)\n\n### Teaching\n\n- [Introduction to Programming: Python for beginners](teaching/python-intro)\n- [Python Intermediate](teaching/python-intermediate)\n\n### Courses\n\n- [Course @ FastAI](courses/fast-ai.md)\n"
  },
  {
    "path": "books/architecture-hard-parts.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Software Architecture: The Hard Parts: Modern Tradeoff Analysis for Distributed Architectures\n\nBook by Pramod Sadalage, Neal Ford, Mark Richards, Zhamak Dehghani\n"
  },
  {
    "path": "books/build.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Build\n\nBook by Tony Fadell\n\n- [1.1 Adulthood](#11-adulthood)\n- [1.2 Get a job](#12-get-a-job)\n- [1.3 Heroes](#13-heroes)\n- [1.4 Don't (only) look down](#14-dont-only-look-down)\n- [2.1 Just managing](#21-just-managing)\n- [2.2 Data versus opinion](#22-data-versus-opinion)\n- [2.3 Assholes](#23-assholes)\n- [2.4 I quit](#24-i-quit)\n- [3.1 Make the intangible tangible](#31-make-the-intangible-tangible)\n- [3.2 Why storytelling](#32-why-storytelling)\n- [3.3 Evolution versus disruption versus execution](#33-evolution-versus-disruption-versus-execution)\n- [3.4 Your first adventure - and your second](#34-your-first-adventure---and-your-second)\n\n## 1.1 Adulthood\n\nWhen you are looking at the array of potential careers before you, the correct place to start is \"What do I want to\nlearn?\"\n\n- NOT: How much money do I want to make?\n- NOT: What title do I want to have?\n- NOT: What company has enough name recognition?\n\nEarly adulthood is about watching your dreams go up in flames and learning as much as you can from the ashes.\n\nGo where you can grow - people, mission, the opportunity are all that matters.\n\n> The only failure is your twenties is inaction. The rest is trial and error.\n\nHumans learn through productive struggle, by trying it themselves and screwing up and doing it differently next time.\nYou have to push yourself to the mountain, even if it means you might fall of a cliff.\n\n## 1.2 Get a job\n\nIf you are going to throw your time, energy, and youth at a company, try to join one that's not just making a better\nmousetrap. Find a business that's starting a revolution:\n\n- it's creating a product that's wholly new or combines existing technology in a novel way that the competition can't\n  make or even understand\n- this product solves a problem - a real pain point - that a lot of customers experience daily\n- the novel technology can deliver on the company vision\n- leadership is not dogmatic about what the solution looks like and is willing to adapt to their customers' needs\n- it's thinking about a problem or a customer need in a way you've never heard before, but makes a perfect sense once\n  you hear it\n\nCool technology isn't enough, a great team isn't enough, plenty of funding isn't enough. You have to time you product\nright. The world has to be ready to want it. If you're not solving a real problem, you can;t start a revolution.\n\nSeemingly impossible problems that a decade ago would have cost billions to solve, requiring massive investments from\ngian firms, can now be figured out with a smartphone app, a small sensor, and the internet.\n\nIf you are passionate about something - something that could be solving a huge problem one day - then stick with it.\nBecause one day, if you are truly solving a real issue, when the world is ready to want it, you will already be there.\n\nYou don't have to an executive right away, you don't have to get a job at the most amazing, world-changing company out\nof college, but you should have a goal.\n\n## 1.3 Heroes\n\nThe only thing that can make a job truly amazing or complete waste of time is the people.\n\nYou always have something to offer if you are curious and engaged. You can always trade and barter good ideas; you can\nalways be kind and find a way to help.\n\nTry to get into a small company, the sweet spot is a business of 30-100 people building something worth building. 
You\ncould go to Google, Apple, Facebook, or some other giant company, but it will be hard to maneuver yourself to work\nclosely with the rock stars.\n\nSmaller companies still have specialization, but usually without silos. And they have a different energy. The whole\ncompany will be focused on working together to make one precious idea become reality. Anything unnecessary is shunned -\nred tape and politics are typically nonexistent.\n\nBeing in that lifeboat with people you deeply respect is a joy. It is the best time you can have at work. It might be\nthe best time you can have.\n\n## 1.4 Don't (only) look down\n\nIC - individual contributor - a person who doesn't manage others. As an IC, you need to occasionally do 2 things:\n\n- look up - look beyond the next deadline or project, bne sure the mission still makes sense to you and that the path to\n  reach it seems achievable\n- look around - get out of your comfort zone and away from the immediate team you are on,talk to the other functions in\n  your company to understand their perspectives, needs, and concerns\n\nDon't think doing the work just means locking yourself in a room - a huge part of it is walking with your team. The work\nis reaching your destination together. Or finding a new destination and bringing your team with you.\n\n## 2.1 Just managing\n\n6 things you should know before becoming a manager:\n\n- You don't have to be a manager to be successful - many people wrongly assume that the only path to more money and\n  stature is managing a team. There are alternatives that will enable you to get a similar paycheck.\n- Remember that once you become a manager, you will stop doing the thing that made you successful in the first place -\n  your job will be communication, communication, communication, recruiting, hiring, firing, setting budgets, reviews,\n  one-to-one meetings, setting goals, keeping people on track, resolving conflicts, mentoring, ...\n- Becoming a manager is a discipline - management is a learned skill, not a talent.\n- Being exacting and expecting great work is not micromanagement - your job is to make sure the team produces\n  high-quality work, it only turns into micromanagement when you dictate the step-by-step process.\n- Honesty is more important than style - you can be successful with any style as long as you never shy away from\n  respectfully telling people the uncomfortable, hard truth needs to be said.\n- Don't worry that your team will outshine you - in fact, it's your goal, you should always be training someone on your\n  team to do your job, the better they are, the easier it is for you to move up and even start managing managers\n\nWhen you are a manager, you are no longer just responsible for the work. You are responsible for human beings.\n\nA star individual contributor is incredibly valuable. Valuable enough that many companies will pay them just as much as\nthey'd pay a manager. A truly great IC will be a leader in their chosen function and also become an informal cultural\nleader, someone who people across the company will seek out for advice and mentorship.\n\nExamining the product in detail and caring deeply about the quality of what your team is producing is not\nmicromanagement. That's exactly what you should be doing. 
Steve Jobs was bringing out a jeweler's loupe and looking at\nindividual pixels on a screen to make sure the user interface graphics were properly drawn.\n\nAs a manager, you should be focused on making sure the team is producing the best possible product.\n\nIt is very easy to turn 1:1s into a friendly chats that go nowhere, so clear meeting agenda can be beneficial.\n\nIf you are a manager - congrats, you're now a parent. Not because you should treat your employees like children, but\nbecause it's now your responsibility to help them work through failure and find success.\n\n## 2.2 Data versus opinion\n\nData driven decisions - you can acquire, study, and debate facts - relatively easy to make.\n\nOpinion-driven - follow your gut and your vision - always hard and always questioned.\n\nMake decisions, not everyone has to agree - it happens when one person has to make the final call. This isn't a\ndemocracy, nor dictatorship - you can't give orders without explaining yourself.\n\nStorytelling is how you get people to take a leap of faith to do something new. Creating a believable narrative that\neveryone can latch on to is critical to moving forward and making hard choices. It's all that marketing comes down to.\n\nYou are selling - vision, guy, opinion.\n\n> It's not data or intuition, it's data and intuition.\n\n## 2.3 Assholes\n\nUp to 12 percent of corporate senior leadership exhibit psychopathic traits. There are different assholes:\n\n- Political assholes - people who master the art of corporate politics, but then do nothing but take credit for everyone\n  else's work. These assholes usually build a coalition of budding assholes around them\n- Controlling assholes - micromanagers who systematically strangle the creativity and juy out of their team. They never\n  give people credit for their work, never praise it, and often steal it.\n- Asshole assholes - they suck at work and everything else, mean jealous, insecure jerks. They cannot deliver, are\n  deeply unproductive, so they do everything possible to deflect attention away from themselves. They are generally out\n  of door pretty quickly.\n- Mission-driven \"assholes\" - crazy passionate - they are neither easygoing nor easy to work with. Unlike true assholes,\n  they care.\n\nPushing for greatness doesn't make you an asshole. Not tolerating mediocrity doesn't make you na asshole. You need to\nunderstand their motivations.\n\nControlling assholes won't listen. They will never admit they screwed up.\n\nThings you can do when faced with a controlling asshole:\n\n- kill'em with kindness\n- ignore them\n- try to get around them\n- quit\n\nMost people aren't assholes. And even if they are, they are also human. So don't walk into a job trying to get anyone\nfired. Start with kindness. Try to make peace. Assume the best.\n\n## 2.4 I quit\n\nSometimes you need to quit. Here is how you know:\n\n- You are no longer passionate about the mission - every hour at your desk feels like an eternity\n- You have tried everything - the company is letting you down\n\nOnce you do decide to quit, make sure you leave in the right way - try to finish as much as possible, find natural\nbreakpoint in your project.\n\nHating your job is never worth whatever raise, title, or perks they throw at you to stay.\n\nThe threat of leaving may be enough to push your company to get serious and make whatever change you are asking for. But\nit might not. 
Quitting should not be a negotiating tactic - it should be the very last card you play.\n\nGood things take time, big times take longer. If you flit from project to project, company to company, you will never\nhave the vital experience of starting and finishing something meaningful.\n\n## 3.1 Make the intangible tangible\n\nDon't just make a prototype of your product and think you're done. Prototype as much of the full customer experience as\npossible.\n\nYour product isn't only your product. It's the whole user experience. The customer journey and touchpoints:\n\n- awareness (PR, search, social media, ads)\n- education (website, email, blog, trial/demo)\n- acquisition (partners, payment model)\n- product (design, UX, performance)\n- onboarding (quick guide, account creation, tips, how-to videos)\n- usage (reliability, usability, updates, lifespan)\n- support (troubleshooting, knowledge base, call center, community)\n- loyalty (new product, newsletter, promotions, ratings/reviews)\n\n## 3.2 Why storytelling\n\nEvery product should have a story, a narrative that explains why it needs to exist and how will it solve your customer's\nproblems. A good product story:\n\n- it appeals to people's rational and emotional sides\n- it takes complicated concepts and makes them simple\n- it reminds people of the problem that's being solved - it focuses on the why\n\nThe story of your product, your company, and your vision should drive everything you do.\n\nVirus of a doubt: \"it is a way to get into people's heads, remind them about a daily frustration, get them annoyed about\nit all over again. You get them angry about how it works now so they can get excited abut a new way of doing things.\"\n\nProduct's story is its design, features, images, videos, quotes from customers, tips from reviewers. The sum of what\npeople see and feel about this thing that you have created.\n\nWhy does this thing need to exist? Why does it matter? Why will people need it? Why will they love it? The longer you\nwork on something, the more the \"what\" takes over the \"why\". When you get wrapped in the \"what\", you get ahead of\npeople. You think everyone can see what you see. But they don't.\n\nEarn their trust by showing that you really know your stuff or understand their needs. Of offer them something useful,\nconnect with them in a new way, so they feel assured that they're making the right choice with your company.\n\nAppeal to their emotions, connect with something they care about. Their worries, their fears. Every person is different,\nand everyone will read your story differently.\n\nAnalogies can be a useful tool in storytelling. They create a shorthand for complicated concepts.\n\n## 3.3 Evolution versus disruption versus execution\n\nEvolution - a small, incremental step to make something better\n\nDisruption - a fork on the evolutionary tree - something fundamentally new that changes the status quo, usually by\ntaking a novel or revolutionary approach to an old problem\n\nExecution - doing what you have promised to do and doing it well\n\nYour version one product should be disruptive, not evolutionary. But disruption alone will not guarantee success.\nContinue to evolve, but always seek out new ways to disrupt yourself.\n\nDisruption should be important for you personally. If you've truly made something disruptive, your competition probably\nwon't be able to replicate it quickly.\n\nJust don't overshoot. 
Don't try to disrupt everything at once.\n\nAs your disruptive product, process, or business model begins to gain steam with customers, your competitors will start\nto get worried. They'll start paying attention, they will get pissed. When companies get angry they undercut your\npricing, try to embarrass you with marketing, use negative press, put in new agreements with sales to lock you out of\nthe business.\n\nAnd they might sue you. If they can't innovate, they litigate. The good news is that a lawsuit means you've officially\narrived (you are a real threat, and they know it).\n\nDisruptions - extremely delicate balancing act:\n\n- you focus on making one amazing thing but forget that it has to be part of a single, fluid experience\n- beautiful execution on everything else but the one thing that would have differentiated your product withers away\n- you change too many things too fast and regular people can't recognize or understand what you have made, you can't\n  push people too far outside their mental model, not at first\n\nChallenge yourself, over-deliver, create excellent solutions.\n\nIf you do it right, one disruption will fuel the next. One revolution will domino another.\n\n## 3.4 Your first adventure - and your second\n\nWhen releasing V1 you have the following tools to make decisions: Vision, Customer insights, Vision. Once you start\niterating on an existing product, you will have experience and data, so you can use your existing tools but in different\norder: Data, Customer insights, Vision.\n\nLocking yourself alone in a room to create a manifesto of your single, luminous vision looks and feels indistinguishable\nfrom completely loosing your mind. Get at least one person - but preferably a small group - to bounce ideas off of.\nSketch your ideas together, then fulfill it together.\n\n"
  },
  {
    "path": "books/clean-agile.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Clean Agile: Back to Basics\n\nBook by Robert Cecil Martin\n\n- [Chapter 1: Introduction to Agile](#chapter-1-introduction-to-agile)\n- [Chapter 2: The Reasons For Agile](#chapter-2-the-reasons-for-agile)\n- [Chapter 3: Business Practices](#chapter-3-business-practices)\n- [Chapter 4: Team Practices](#chapter-4-team-practices)\n- [Chapter 5: Technical Practices](#chapter-5-technical-practices)\n- [Chapter 6: Becoming Agile](#chapter-6-becoming-agile)\n- [Chapter 7: Craftsmanship](#chapter-7-craftsmanship)\n- [Chapter 8: Conclusion](#chapter-8-conclusion)\n- [Afterword](#afterword)\n\n## Chapter 1: Introduction to Agile\n\nThe Agile Manifesto was written in February 2001 in Utah by 17 software experts. Once a movement become popular, the\nname of that movement got blurred through misunderstanding and usurpation.\n\nWhen did Agile begin? More than 50 000 years ago when humans first decided to collaborate on a common goal. The idea of\nchoosing small intermediate goals and measuring the progress after each is too intuitive, and too human, to be\nconsidered any kind of revolution.\n\nAgile was not the only game in town:\n\n- Scientific Management - top-down, command-and-control approach. Big up-front planning followed by careful detailed\n  implementation. Worked best for projects that suffered a high cost of change and solved very well-defined problems\n  with extremely specific goals.\n- Waterfall - logical descendant of Scientific Management. Even though it was not what the author was recommending, it\n  was the concept people took away from his paper. And it dominated the next 3 decades. It dominated but it didn't work.\n\nHow could thoroughly analyzing the problem, carefully designing a solution, and then implementing that design fail so\nspectacularly over and over again.\n\nThe beginnings of the Agile reformation began in the late 1980s. In 1995 a famous paper on Scrum was written.\n\nThe Preamble of the Agile Manifesto:\n\n> We are uncovering better ways of developing software by doing it and helping others do it.\n\nThe Agile Manifesto:\n\n> **Individuals and interactions** over processes and tools.\n\n> **Working software** over comprehensive documentation.\n\n> **Customer collaboration** over contract negotiation.\n\n> **Responding to change** over following a plan.\n\nThe Iron Cross of project management: good, fast, cheap, done - pick any three you like, you will not have the fourth.\n\nA good manager drives a project to be good enough, fast enough, cheap enough and done as much as necessary. This is kind\nof management that agile strives to enable.\n\nAgile is a _framework_ that helps developers and managers execute this kind of pragmatic project management. However,\nsuch management is not done automatic. It is entirely possible to work within Agile framework and still completely\nmismanage the project and drive it to failure.\n\nAgile provides data. An Agile development team produces just the kinds of data that managers need in order to make good\ndecisions:\n\n- Velocity - how much the development team has gotten done every week.\n- Burn-down chart - shows how many points remain until the next major milestone. Has a slope that predicts when the\n  milestone will probably be reached.\n\nThis data managers need to decide how to set the coefficients on the Iron Cross and drive the project to the best\npossible outcome.\n\nAgile development is first and foremost a feedback-driven approach. 
Agile development is first and foremost a feedback-driven approach. Each week, each day, each hour, and even each minute is driven by looking at the results of the previous week, day, hour and minute, and then making the appropriate adjustments.\n\nThe Date (deadline) is usually fixed and is not going to change because some developers think they may not be able to make it. At the same time, the requirements are wildly in flux and can never be frozen. This is because the customers don't really know what they want. So the requirements are constantly being re-evaluated and re-thought.\n\nThe Waterfall model promised to give us a way to get our arms around this problem:\n\n- The Analysis Phase - no real consensus on just what analysis is, the best definition: \"it is what analysts do\".\n- The Design Phase - where you split the project up into modules and design interfaces between those modules.\n- The Implementation Phase - there is no way to successfully pretend it is done, meanwhile, the requirements are still coming.\n- The Death March Phase - customers are angry, stakeholders are angry, the pressure mounts, people quit. Hell.\n\nIt can be called Runaway Process Inflation - we are going to do the thing that did not work, and do a lot more of it.\n\nOf course Waterfall was not an absolute disaster. It did not crush every software project into rubble. But it was, and remains, a disastrous way to run a software project.\n\nThe Waterfall model just makes so much sense. First, we analyze the problem, then we design the solution, and then we implement the design. Simple. Direct. Obvious. And wrong.\n\nAn Agile project begins with analysis, but it is an analysis that never ends. Time before the deadline is divided into regular increments called _iterations_ or _sprints_. The size of an iteration is usually one or two weeks.\n\nThe first iteration (Iteration Zero) is used to generate a short list of features (stories). Iteration Zero is used to set up the development environment, estimate the stories, and lay out the initial plan. This process of writing stories, estimating them, planning them and designing never stops. Every iteration will have some analysis and design and implementation in it.\n\nIn an Agile project, we are always analyzing and estimating.\n\nSoftware is not a reliably estimable process. We programmers simply do not know how long things will take. There is no way to know how complicated a task is going to be until that task is engaged and finished.\n\nAfter a couple of iterations we gain insight into how much time will be needed, based on past iterations. This number averages out to a relatively stable velocity. After four or five iterations, we will have a much better idea when this project will be done.\n\nWe practice Agile in order to destroy hope before that hope can kill the project. Hope is the project killer. Hope is what makes a software team mislead managers about their true progress. Hope is a very bad way to manage a software project. And Agile is a way to provide an early and continuous dose of cold, hard reality as a replacement for hope.\n\nSome folks think that Agile is about going fast. It is not. Agile is about knowing, as early as possible, just how screwed we are. The reason we want to know this as early as possible is so that we can manage the situation. Managers manage software projects by gathering data and then making the best decisions they can based on that data.\n\nManagers do this by making changes to the scope, the schedule, the staff, and the quality:\n\n- Changing the Schedule - ask stakeholders if we can delay the project. Do this as early as possible.\n- Adding Staff - in general, business is simply not willing to change the schedule. When new staff is added, productivity plummets for a few weeks as the new people suck the life out of the old people. Then, hopefully, the new people start to get smart enough to actually contribute. Of course, you need enough time, and enough improvement, to make up for the initial loss.\n- Decrease Quality - everyone knows that you can go much faster by producing crap. WRONG. There is no such thing as quick and dirty. Anything dirty is slow. **The only way to go fast is to go well**. If we want to shorten our schedule, the only option is to _increase_ quality.\n- Changing Scope - if the organization is rational, then the stakeholders eventually bow their heads in acceptance and begin to scrutinize the plan.\n\nInevitably the stakeholders will find a feature that we have already implemented and then say \"It is a real shame you did that one, we sure do not need it\". At the beginning of each iteration, ask the stakeholders which features to implement first.\n\n20 000 foot view of Agile:\n\n> Agile is a process wherein a project is subdivided into iterations. The output of each iteration is measured and used to continuously evaluate the schedule. Features are implemented in the order of business value so that the most valuable things are implemented first. Quality is kept as high as possible. The schedule is primarily managed by manipulating scope.\n\n## Chapter 2: The Reasons For Agile\n\nAgile is important because of professionalism and the reasonable expectations of our customers.\n\n- Professionalism - nowadays the cost of software failure is high, therefore we need to increase our professionalism. We are surrounded by computers, and they all need to be programmed - they all need software. Nowadays, virtually nothing of significance can be done without interacting with a software system. Now our actions are putting lives and fortunes at stake.\n- Reasonable Expectations - meeting expectations is one of the primary goals of Agile development.\n    - we will not ship sh*t - Agile's emphasis on Testing, Refactoring, Simple Design and customer feedback is the obvious remedy for shipping bad code.\n    - continuous technical readiness - the system should be technically deployable (solid enough to be deployed) at the end of every iteration.\n    - stable productivity - big redesigns are horrifically expensive and are seldom deployed. Instead, developers should continuously keep the architecture, design and code as clean as possible; this allows them to keep their productivity high and prevents the otherwise inevitable spiral into low productivity and redesign.\n    - inexpensive adaptability - software - soft (easy to change), ware (product). Software was invented because we wanted a way to quickly and easily change the behavior of our machines. Developers should celebrate change because that is why we are here. Changing requirements is the name of the whole game. Our jobs depend on our ability to accept and engineer changing requirements and to make those changes relatively inexpensive. If a change to the requirements breaks your architecture, then your architecture sucks.\n    - continuous improvement - the older a software system is, the better it should be. Unfortunately, it seldom happens. We make things worse with time. The Agile practices of Pairing, TDD, Refactoring, and Simple Design strongly support this expectation.\n    - fearless competence - people are afraid of changing bad code: you can break it, and if it breaks, it becomes yours. This fear forces you to behave incompetently. Customers, users, and managers expect _fearless competence_. They expect that if you see something wrong or dirty, you will fix it and clean it. They don't expect you to allow problems to fester and grow - they expect you to stay on top of the code, keeping it as clean and clear as possible. How to eliminate that fear? Use TDD.\n    - QA should find nothing - the Agile practices support this expectation.\n    - test automation - manual tests are always eventually lost. Manual tests are expensive and so are always a target for reduction. Besides, asking humans to do what machines can do is expensive, inefficient, and immoral. Every test that can be feasibly automated must be automated. Manual testing should be limited to those things that cannot be automatically validated and to the creative discipline of Exploratory Testing.\n    - we cover for each other - each individual member of a software team makes sure that there is someone who can cover for him if he goes down. It is your responsibility to make sure that one or more of your teammates can cover for you.\n    - honest estimates - you should provide estimates based on what you do and do not know. You can estimate in relative terms (task B should take half of the time spent on task A), you can also estimate using ranges.\n    - you need to say \"no\" - when the answer to something is \"no\", then the answer really is \"no\" - for example, when no solution to a problem can be found.\n    - continuous aggressive learning - our industry changes quickly. We must be able to change with it. So learn, learn, learn! Learn with or without your company's help.\n    - mentoring - the best way to learn is to teach. So when new people join the team, teach them. To learn is to teach others.\n\nCustomer Bill of Rights:\n\n- You have the right to an overall plan and to know what can be accomplished when and at what cost.\n    - We cannot agree to deliver fixed scopes on hard dates. Either the scopes or the dates must be soft.\n- You have the right to get the most possible value out of every iteration.\n    - The business has the right to expect that developers will work on the most important things at any given time, and that each iteration will provide them the maximum possible usable business value.\n- You have the right to see progress in a running system, proven to work by passing repeatable tests that you specify.\n- You have the right to change your mind, to substitute functionality, and to change priorities without paying exorbitant costs.\n- You have the right to be informed of schedule and estimate changes, in time to choose how to reduce the scope to meet a required date. You can cancel at any time and be left with a useful working system reflecting investment to date.\n\nDeveloper Bill of Rights:\n\n- You have the right to know what is needed with clear declarations of priority.\n    - Developers are entitled to precision in requirements and in the importance of those requirements. This right applies within the context of an iteration. Outside an iteration, requirements and priorities will shift and change.\n- You have the right to produce high-quality work at all times.\n    - The business has no right to tell developers to cut corners or do low-quality work. Or, to say this differently, the business has no right to force developers to ruin their professional reputations or violate their professional ethics.\n- You have the right to ask for and receive help from peers, managers, and customers.\n    - This statement gives programmers the right to communicate.\n- You have the right to make and update your estimates.\n    - You can change your estimate when new factors come to light. Estimates are guesses that get better with time. Estimates are never commitments.\n- You have the right to accept your responsibilities instead of having them assigned to you.\n    - Professionals accept work, they are not assigned work. A professional developer has every right to say \"no\" to a particular job or task. It may be that the developer does not feel confident in their ability to complete the task, or it may be that the developer believes the task better suited for someone else. Or, it may be that the developer rejects the task for personal or moral reasons. Acceptance implies responsibility.\n\n> Agile is a set of rights, expectations, and disciplines of the kind that form the basis of an ethical profession.\n\n## Chapter 3: Business Practices\n\nIf you would like an accurate and precise estimate of a project, then break it down into individual lines of code. The time it takes you to do this will give you a very accurate and precise measure of how long it took you to build the project.\n\nTrivariate Analysis - such estimates are composed of three numbers: best-case, nominal-case, and worst-case. These numbers are confidence numbers. The worst-case number is the amount of time within which you feel 95% confident that the task will be completed. The nominal-case has only 50% confidence, and the best case only 5%.\n\n
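A small sketch of collapsing such a trivariate estimate into a single number - assuming the classic PERT formulas (expected = (best + 4 * nominal + worst) / 6, sigma = (worst - best) / 6), which this style of estimate is usually paired with; the day values are invented:\n\n```python\n# Trivariate (PERT-style) estimate - formulas assumed, values invented.\nbest, nominal, worst = 2, 5, 12  # days at 5%, 50%, and 95% confidence\n\nexpected = (best + 4 * nominal + worst) / 6  # weighted toward the nominal case\nsigma = (worst - best) / 6                   # rough spread of the estimate\n\nprint(f'expected ~ {expected:.1f} days, +/- {sigma:.1f} days')\n```\n\n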
Stories and Points - a user story is an abbreviated description of a feature of the system, told from the point of view of a user. We want to delay the specification of those details as long as possible, right up to the point where the story is developed.\n\nStory points are a unit of estimated effort, not real time. They are not even estimated time - they are estimated effort. Velocity is not a commitment. The team is not making a promise to get 30 points done during the iteration. They aren't even making the promise to try to get 30 points done. This is nothing more than their best guess as to how many points will be complete by the end of the iteration.\n\nThe Four-Quadrant Game (the highest return on investment) - the stories that are valuable but cheap will be done right away. Those that are valuable but expensive will be done later. Those that are neither valuable nor expensive might get done one day. Those that are not valuable but are expensive will never be done.\n\nYesterday's weather - the best predictor of today's weather is yesterday's weather. The best predictor of the progress of an iteration is the previous iteration.\n\nThe project is over when there are no more stories in the deck worth implementing.\n\nUser stories are simple statements that we use as reminders of features. We try not to record too much detail when we write the story because we know that those details will likely change. Stories follow a simple set of guidelines that we remember with the acronym INVEST:\n\n- I - Independent - they do not need to be implemented in any particular order. This is a soft requirement because there may be stories that depend on other stories. Still, we try to separate the stories so that there is little dependence.\n- N - Negotiable - we want details to be negotiable between the developers and the business.\n- V - Valuable - the story must have clear and quantifiable value to the business. Refactoring/Architecture/Code cleanup is never a story. A story is always something that the business values.\n- E - Estimable - must be concrete enough to allow the developers to estimate it.\n- S - Small - a user story should be no larger than what one or two developers can implement in a single iteration.\n- T - Testable - the business should be able to articulate tests that will prove that the story has been completed.\n\nThere are a number of schemes for estimating stories:\n\n- Flying Fingers\n- Planning Poker\n\nA spike is a meta-story, or a story for estimating a story. It is a spike because it often requires us to develop a long but very thin slice through all the layers of the system. For example, there is a story you cannot estimate: Print PDF - you have never used the PDF library. So you write a new story called Estimate Print PDF - now you estimate that story, which is easier to estimate.\n\nThe goal of each iteration is to produce data by getting stories done. The team should focus on stories rather than tasks within stories. It is far better to get 80% of the stories done than it is to get each story 80% done. Focus on driving the stories to completion.\n\nA story cannot be completed without the acceptance tests. If QA continues to miss the midpoint deadline, one iteration after another, then the ratio of QA engineers to developers is likely wrong. After the midpoint, if all the acceptance tests are done, QA should be working on the tests for the next iteration.\n\nThe definition of done is this: acceptance tests pass.\n\nIf we see a positive slope in velocity, it likely does not mean that the team is actually going faster. Rather, it probably means that the project manager is putting pressure on the team to go faster. As that pressure builds, the team will unconsciously shift the value of their estimates to make it appear that they are going faster. This is simple inflation. The points are a currency, and the team is devaluing them under external pressure. The lesson is that velocity is a measurement, not an objective. Don't put pressure on the thing you are measuring.\n\nAn estimate is not a promise, and the team has not failed if the actual velocity is lower.\n\nThe practice of Small Releases suggests that a development team should release their software as often as possible. The new goal is Continuous Delivery - the practice of releasing the code to production after every change.\n\nAcceptance Tests - requirements should be specified by the business.\n\nBDD - Behavior-Driven Development - the goal is to remove the techie jargon from the tests and make the tests appear more like specifications that businesspeople would appreciate. At first, this was just another attempt at formalizing the language of testing, in this case using 3 special adverbs: Given, When, and Then.\n\n
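A sketch of the Given/When/Then shape expressed in plain pytest (hypothetical names; dedicated BDD tools express the same structure in prose):\n\n```python\ndef test_withdrawal_reduces_balance():\n    # Given an account with a balance of 100\n    account = {'balance': 100}\n    # When the user withdraws 40\n    account['balance'] -= 40\n    # Then the remaining balance is 60\n    assert account['balance'] == 60\n```\n\n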
## Chapter 4: Team Practices\n\nA metaphor can provide a vocabulary that allows the team to communicate effectively. On the other hand, some metaphors are silly to the point of being offensive to the customer.\n\nDDD solved the metaphor problem. Eric Evans coined the term _Ubiquitous Language_. What the team needs is a model of the problem domain, which is described by a vocabulary that everyone (the programmers, QA, managers, customers, users) agrees on.\n\nThe Ubiquitous Language is used in all parts of the project. It is a thread of consistency that interconnects the entire project during every phase of its lifecycle.\n\nA software project is not a marathon, not a sprint, nor a sequence of sprints. In order to win, you must pace yourself. If you leap out of the blocks and run full speed, you will run out of energy long before you cross the finish line.\n\nYou must run at a Sustainable Pace. If you try to run faster than the pace you can sustain, you will have to slow down and rest before you reach the finish line. Managers may ask you to run faster than you should. You must not comply. It is your job to husband your resources to ensure that you endure to the end.\n\n> Working overtime is not a way to show your dedication to your employer. What it shows is that you are a bad planner, that you agree to deadlines to which you shouldn't agree, that you make promises you shouldn't make, that you are a manipulable laborer and not a professional. This is not to say that all overtime is bad, nor that you should never work overtime. There are extenuating circumstances for which the only option is to work overtime. But they should be extremely rare. And you must be aware that the cost of that overtime will likely be greater than the time you save on the schedule.\n\nThe most precious ingredient in the life of a programmer is sufficient sleep. Make sure you know how many hours of sleep your body needs, and then prioritize those hours. Those hours will more than pay themselves back.\n\nNo one owns the code in an Agile project. The code is owned by the team as a whole. Any member of the team can check out and improve any module in the project at any time. The team owns the code collectively. Collective Ownership does not mean that you cannot specialize. However, even as you specialize, you must also generalize. Divide your work between your specialty and other areas of the code. Maintain your ability to work outside your specialty.\n\nThe continuous build should never break.\n\nStandup Meeting:\n\n- This meeting is optional. Many teams get by just fine without one.\n- It can be held less often than daily. Pick the schedule that makes sense to you.\n- It should take ~10 minutes, even for large teams.\n- This meeting follows a simple formula.\n\nThe basic idea is that the team members stand in a circle and answer 3 questions:\n\n1. What did I do since the last meeting?\n2. What will I do until the next meeting?\n3. What is in my way?\n4. [Optional] Whom do you want to thank?\n\nNo discussion. No posturing. No deep explanations. No complaints. Everybody gets 30 seconds to answer those 3 questions.\n\n## Chapter 5: Technical Practices\n\nWithout TDD, Refactoring, Simple Design and Pair Programming, Agile becomes an ineffective, flaccid shell of what it was intended to be.\n\nTEST-DRIVEN DEVELOPMENT. Every required behavior should be entered twice: once as a test, and then again as production code that makes the test pass.\n\nThe 3 rules of TDD:\n\n1. Do not write any production code until you have first written a test that fails due to the lack of that code.\n2. Do not write more of a test than is sufficient to fail - and failing to compile counts as a failure.\n3. Do not write more production code than is sufficient to pass the currently failing test.\n\n
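A minimal red/green illustration of the cycle these rules produce, using pytest and a hypothetical `add` function (not an example from the book):\n\n```python\n# Rule 1 (red): the test is written first and fails - 'add' does not exist yet.\ndef test_add_sums_two_numbers():\n    assert add(2, 3) == 5\n\n# Rules 2 and 3 (green): write only enough production code to pass the test,\n# then refactor with the test as a safety net, and repeat.\ndef add(a, b):\n    return a + b\n```\n\n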
The tests are a form of documentation that describes the system being tested. This documentation is written in a language that the programmers know fluently. It is utterly unambiguous, it is so formal it executes, and it cannot get out of sync with the application code. The tests are the perfect kind of documentation for programmers: code.\n\nRemember that function that is hard to test after the fact? The function is hard to test because you did not design it to be easy to test. You wrote the code first, and you are now writing the tests as an afterthought. By writing the tests first, you will decouple the system in ways that you had never thought about before. The whole system will be testable, therefore, the whole system will be decoupled.\n\nREFACTORING. Refactoring is the practice of improving the structure of the code without altering the behavior, as defined by tests. In other words, we make changes to the names, the classes, the functions and the expressions without breaking any of the tests.\n\nRed/Green/Refactor:\n\n1. We create a test that fails.\n2. Then we make the test pass.\n3. Then we clean up the code.\n4. Return to step 1.\n\nThe word Refactoring should never appear on a schedule. Refactoring is not the kind of activity that appears on a plan. We do not reserve time for refactoring. Refactoring is simply part of our minute-by-minute, hour-by-hour approach to writing software.\n\nSometimes the requirements change in such a way that you realize the current design and architecture of the system is suboptimal, and you need to make a significant change to the structure of the system. Such changes are made within the Red/Green/Refactor cycle. We do not create a project specifically to change the design. We do not reserve time in the schedule for such large refactorings. Instead, we migrate the code one small step at a time, while continuing to add new features during the normal Agile cycle.\n\nSIMPLE DESIGN. The practice of Simple Design is one of the goals of Refactoring. Simple Design is the practice of writing only the code that is required, with a structure that keeps it simplest, smallest, and most expressive.\n\nRules of Simple Design:\n\n1. Pass all the tests.\n2. Reveal the intent - it should be easy to read and self-descriptive. This is where we apply many of the simpler and more cosmetic refactorings. We also split large functions into smaller, better-named functions.\n3. Remove duplication.\n4. Decrease elements - once we have removed all the duplication, we should strive to decrease the number of structural elements, such as classes, functions, and variables.\n\nThe more complex the design, the greater the cognitive load placed on the programmers. That cognitive load is Design Weight. The greater the weight of that design, the more time and effort are required for the programmers to understand and manipulate the system.\n\nPAIR PROGRAMMING. Pairing is the act of two people working together on a single programming problem. Any configuration is fine (the same workspace, sharing the screen, keyboard, ping-pong, ...). We pair so that we behave like a team. When a member of a team goes down, the other team members cover the hole left by that member and keep making progress towards the goal. **Pairing is the best way, by far, to share knowledge between team members and prevent knowledge silos from forming. It is the best way to make sure that nobody on the team is indispensable.**\n\nThe word \"pair\" implies that there are just 2 programmers involved in a pairing session. While this is typically true, it is not a rule.\n\nGenerally, managers are pleased to see programmers collaborating and working together. It creates the impression that work is being done.\n\n**Never, ever, ever, ask for permission to pair. Or test. Or refactor. Or... You are the expert. You decide.**\n\n## Chapter 6: Becoming Agile\n\nAgile Values:\n\n1. Courage - it is reckless to conform to a schedule by sacrificing quality. The belief that quality and discipline increase speed is a courageous belief, because it will constantly be challenged by powerful but naive folks who are in a hurry.\n2. Communication - a team that sits together and communicates frequently can work miracles. We value direct and frequent communication that crosses channels. Face-to-face, informal, interpersonal conversations.\n3. Feedback - maximize the frequency and quantity of feedback. Feedback loops allow us to determine when things are going wrong early enough to correct them. They provide massive education about the consequences of earlier decisions.\n4. Simplicity - the number of problems should be reduced to a minimum, and indirection kept to a minimum. Solutions can be simple. This applies to the software, but it also applies to the team. Passive aggression is indirection. Keep the code simple. Keep the team simpler.\n\nThese values are diametrically opposed to the values of large organisations that have invested heavily in middle-management structures that value safety, consistency, command-and-control, and plan execution.\n\nIt is not really possible to transform such an organisation to Agile.\n\nAgile coaches are members of the team whose role is to defend the process within the team. In the heat of development, developers may be tempted to go off process. Perhaps they inadvertently stop pairing, stop refactoring, or ignore failures in the continuous build. The coach acts as the team's conscience, always reminding the team of the promises they made to themselves and the values they agreed to hold. This role typically rotates from one team member to the next on an informal schedule and based on need. A mature team working steadily along does not require a coach. On the other hand, a team under some kind of stress (schedule, business or interpersonal) may decide to ask someone to fill the role temporarily.\n\nEvery member of an Agile team needs to understand the values and techniques of Agile. Therefore, if one member of the team is trained, all members of the team should be trained.\n\nAgile is for small- to medium-sized teams. Period. It works well for such teams. Agile was never intended for large teams. The problem of large teams is a problem of societies and civilizations. And large teams are a solved problem.\n\nAgile was invented because we did not know how to organize a relatively small group of programmers to be effective. Software development needed its own process because software is really like nothing else.\n\nThe answer to the question of Agile in the large is simply to organize your developers into small Agile teams, then use standard management and operations research techniques to manage those teams.\n\nGreat tools do the following:\n\n- Help people accomplish their objectives\n- Can be learned \"well enough\" quickly\n- Become transparent to users\n- Allow adaptation and exaptation\n- Are affordable\n\nGit is an example of a great tool.\n\nYour team should establish the pattern of work compatible with their specific context first, and then consider using tools that support their workflow. Workers use and control tools, tools don't control and use people. You don't want to get locked into other people's process flows.\n\nALM - Agile Lifecycle Management systems - despite being feature-rich and commercially successful, ALM tools utterly fail at being great:\n\n- ALMs tend to be complicated, usually demanding up-front training.\n- These tools often require constant attention.\n- ALM tools aren't always easily adapted.\n- ALM tools can be expensive.\n- ALM tools rarely work the way your team does, and often their default mode is at odds with Agile methods. For example, many ALM tools assume that team members have individual work assignments, which makes them nearly unusable for teams who work together in a cross-functional way.\n\nYou can try different forms of Agile practices and check which one is the most relevant to your team's needs:\n\n- Kanban - making the work visible, limiting work in progress and pulling work through the system.\n- Scrum and XP - short daily meetings, a product owner, a process facilitator (Scrum Master), retrospectives, a cross-functional team, user stories, small releases, refactoring, writing tests first, and pair programming.\n- Align team events - when the team events across multiple teams (standups, retrospectives) are aligned in time, it is possible to then roll up daily and systematic impediments via an escalation tree.\n- Escalation trees - if it makes sense to always work on items that produce the highest value, then it makes sense to escalate impediments immediately via a well-defined escalation path.\n- Regular interteam interaction - regular interaction between the Scrum Masters, Product Owners and team members who are working together toward a common deliverable.\n- Portfolio Kanban - sets work-in-progress limits at the initiative level in order to ensure that the organization is focused on the highest-value work at all times.\n- Minimum Viable Increments - what is the shortest path to producing the highest value in the shortest time. A growing number of organizations are taking this to the extreme by implementing Continuous Delivery - releasing small updates on a frequent basis, sometimes as frequently as multiple times per day.\n\nEnablers of multiteam coordination:\n\n- SOLID - especially useful for simplifying multiteam coordination by dramatically reducing dependencies.\n- Small, valuable user stories - limit the scope of dependencies, which simplifies multiteam coordination.\n- Small, frequent releases - whether these releases are delivered to the customer or not, the practice of having a releasable product across all the teams involved helps to surface coordination and architectural issues so that the root cause can be found and addressed.\n- Continuous Integration - calling for integration across the entire product after every check-in.\n- Simple Design - one of the hardest practices to learn and apply because it is one of the most counter-intuitive practices. When coordinating the work of many teams, monolithic, centralized, preplanned architectures create massive dependencies between teams that tend to force them to work in lockstep, thus defeating much of the promise of Agile. Simple Design, especially when used with practices such as a microservices architecture, enables Agility in the large.\n\n## Chapter 7: Craftsmanship\n\nMany companies misunderstood Agile. Managers are willing to push developers to work faster and are using the full transparency of the process to micromanage them. Developers are pushed hard to fit their estimates into the imposed milestones. Failing to deliver all story points in a sprint means the developers must work harder in the next sprint to make up the delay. If the product owner thinks developers are spending too much time on things like automated tests, refactoring, or pairing, they simply tell them to stop doing it.\n\nStrategic technical work has no place in _their_ Agile process. There is no need for architecture or design. The order is to simply focus on the highest-priority item in the backlog and get it done as fast as possible. This approach results in a long sequence of iterative tactical work and accumulation of technical debt. Bugs are accumulating, delivery time goes up, people start to blame one another.\n\n> Companies are still not mature enough to understand that technical problems are in fact business problems.\n\nA group of developers met in November 2008 in Chicago to create a new movement: Software Craftsmanship.\n\nManifesto:\n\nAs aspiring Software Craftsmen, we are raising the bar of professional software development by practicing it and helping others learn the craft. Through this work we have come to value:\n\n- Not only working software, but also well-crafted software.\n- Not only responding to change, but also steadily adding value.\n- Not only individuals and interactions, but also a community of professionals.\n- Not only customer collaboration, but also productive partnership.\n\nThe Software Craftsmanship manifesto describes an ideology, a mindset. It promotes professionalism through different perspectives.\n\n**Well-crafted software** - code that is well-designed and well-tested. It is code that we are not scared to change and code that enables the business to react fast. It is code that is both flexible and robust.\n\n**Steadily adding value** - no matter what we do, we should always be committed to continuously providing increasing value to our clients and customers.\n\n**A community of professionals** - we are expected to share and learn with each other, raising the bar of our industry. We are responsible for preparing the next generation of developers.\n\n**Productive partnership** - we will have a professional relationship with our clients and employers. We will always behave ethically and respectfully, advising and working with our clients and employers in the best way possible. We will expect a relationship of mutual respect and professionalism.\n\nWe will look at our work not as something we need to do as part of a job but as a professional service we provide. We will take ownership of our own careers, investing our own time and money to get better at what we do. Craftspeople strive to do the best job they can, not because someone is paying, but based on a desire to do things well.\n\nDevelopers should not ask for authorization to write tests. They should not have separate tasks for unit testing or refactoring. These technical activities should be factored into the development of any feature. They are not optional. Managers and developers should only discuss what is going to be delivered and when, not how. Every time developers volunteer details of how they work, they are inviting managers to micromanage them. Developers should be able to clearly describe how they work and the advantages of working that way to whomever is interested. What developers should not do is let other people decide how they work.\n\nConversations between developers and business should be about why, what and when - not how.\n\nCraftsmanship promotes software development as a profession. A profession is part of who we are. A job is a thing that we do but is not part of who we are. A profession is something we invest in. It is something we want to get better at. We want to gain more skills and have a long-lasting and fulfilling career.\n\nCombining Agile and Craftsmanship is the perfect way to achieve business agility.\n\n## Chapter 8: Conclusion\n\nThis book covered the basics of Agile.\n\n## Afterword\n\nAsk the developers in an \"Agile organization\" what Agile is, and you will likely get a very different answer than if you ask anyone beyond the level of a software development manager.\n\nDevelopers understand Agile to be a methodology for streamlining the development process and for making software development more predictable, more practicable, and more manageable.\n\nMany developers are blissfully unaware of management's use of the metrics provided by the implementation of Agile practices and the data it produces.\n"
  },
  {
    "path": "books/clean-code.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Clean Code: A Handbook of Agile Software Craftsmanship\n\nBook by Robert Cecil Martin\n\n- [Chapter 1: Clean Code](#chapter-1-clean-code)\n- [Chapter 2: Meaningful names](#chapter-2-meaningful-names)\n- [Chapter 3: Functions](#chapter-3-functions)\n- [Chapter 4: Comments](#chapter-4-comments)\n- [Chapter 5: Formatting](#chapter-5-formatting)\n- [Chapter 6: Objects and Data Structures](#chapter-6-objects-and-data-structures)\n- [Chapter 7: Error Handling](#chapter-7-error-handling)\n- [Chapter 8: Boundaries](#chapter-8-boundaries)\n- [Chapter 9: Unit Tests](#chapter-9-unit-tests)\n- [Chapter 10: Classes](#chapter-10-classes)\n- [Chapter 11: Systems](#chapter-11-systems)\n- [Chapter 12: Emergence](#chapter-12-emergence)\n- [Chapter 13: Concurrency](#chapter-13-concurrency)\n- [Chapter 17: Smells and Heuristics](#chapter-17-smells-and-heuristics)\n\n## Chapter 1: Clean Code\n\n- ugly code is expensive\n- take your time to write a good code\n- bad code programmer's fault, not PO's, manager's or anyone's else\n- bad code is like a building with broken windows - people see ugly building and stop caring\n- code like a prose, code should look like you care\n- make the language look like it was made of the problem\n- code rot quickly\n\n## Chapter 2: Meaningful names\n\nVariable name should answer all the questions. It should tell why it exists. If a name requires a comment it does not\nreveal its content. Names should be pronounceable. One letter variables are hard to `grep` in the code - should be ONLY\nas local variables inside short methods. The length of a name should correspond to the size of its scope. Avoid\nencodings.\n\n> Difference between a smart programmer and a professional programmer is that professional programmer understands that\n> **clarity is a king**.\n\nDon't be funny 😔 People tend to forget jokes, so people will forget true meaning of a variable. Choose clarity over\nentertainment. Do not use slang or culture-dependant names.\n\nPick one word per concept, e.g. `get` instead of `fetch`, `retrieve`, ...\n\n## Chapter 3: Functions\n\nFunctions are the first line of organisation in any program. Functions should be small. No more than 2-3 indents.\n\n> Functions should do one thing. They should do it well. They should do it only.\n\nThe reason we write functions is to decompose a larger concept. A function should not mix the levels of abstractions.\n\n> You know you are working on clean code when each routine turns out to be pretty much what you expected.\n\nDon't be afraid to make a name long. The more function arguments the worse - difficulties with testing.\n\nPassing a boolean flag to a function is extremely ugly. Grouping arguments into objects seems like cheating, but it is\nnot.\n\nFunctions should have no side effects\n\n*Command Query Separation* - functions should either do something or answer something, but not both.\n\nExceptions are preferred than error codes. Suggestion to extract exception handling to separate function.\n\n*Don't repeat yourself* - duplication may be the root of all evil in software. Database norms formed to eliminate\nduplication in data, OOP concentrates the code, etc.\n\n> Writing software is like any other kind of writing. 
\n\nExceptions are preferred over error codes. Suggestion: extract exception handling to a separate function.\n\n*Don't repeat yourself* - duplication may be the root of all evil in software. Database normal forms were formed to\neliminate duplication in data, OOP concentrates the code, etc.\n\n> Writing software is like any other kind of writing. When you write a paper or article, you get your thoughts down\n> first, then you massage it until it **reads well**.\n\n> The art of programming is, and always has been, the art of language design.\n\n## Chapter 4: Comments\n\nComments are usually bad - they mean you failed to express yourself in code. IMO: the best comments are the ones that\nexplain why things were done in a particular way.\n\nDon't put historical discussions or irrelevant details into the comments.\n\n## Chapter 5: Formatting\n\nCode formatting is important. Visual design of the code is important. Variables should be declared in places well known\nto everybody. Functions should show natural flow -> top-down.\n\nAnother matter is alignment, e.g. of test cases in parametrised tests. However, aligning variable declarations is\noverkill.\n\nHowever, a team should agree upon a single formatting style.\n\n## Chapter 6: Objects and Data Structures\n\nHiding implementation is about abstractions.\n\nThe Law of Demeter - a module should not know about the innards of the objects it manipulates. Class *C* has a method *\nf*, method *f* should call the methods of: *C*, object created by *f*, object passed as an argument to *f* or object\nheld in an instance variable of *C*.\n\nTrain wreck: `ctxt.getOptions().getScratchDir().getAbsolutePath()` - a bunch of coupled train cars. Does it violate The\nLaw of Demeter? `ctxt` contains options, which contain a scratch directory, which has an absolute path - a lot of\nknowledge. However, in this case the law does not apply because these are data structures with no behaviour. It would\nbe good to hide the structure of `ctxt`, e.g.: `ctxt.getScratchDirectoryOption().getAbsolutePath()`.\n\nData Transfer Objects - a class with public variables and no functions, e.g. for communicating with the database.\n\nObjects - expose behaviour and hide data, data structures - expose data and have no significant behaviour.\n\n## Chapter 7: Error Handling\n\nError handling is important, but if it obscures logic, it is wrong. Exceptions are preferred over return codes - return\ncodes can clutter the caller with unnecessary code.\n\n`try` blocks are like transactions, `catch` has to leave the program in a consistent state.\n\nError messages need to be informative - mention the operation that failed and the type of failure.\n\nIt might be a good idea to wrap a library's errors with your own exceptions - this makes the library easily replaceable.\n\n## Chapter 8: Boundaries\n\nHow to keep boundaries of our system clean - e.g. when using external libraries:\n\n- when working with collections, wrap them with an object and provide only the required functionality.\n- write learning tests - write tests to explore and understand an API\n- our code shouldn't know too many details about a 3rd-party library\n- use an ADAPTER interface - converting from our perfect interface to the provided interface\n\n## Chapter 9: Unit Tests\n\nThe Three Laws of TDD:\n\n- You may not write production code until you have written a failing unit test\n- You may not write more of a unit test than is sufficient to fail, and not compiling is failing\n- You may not write more production code than is sufficient to pass the currently failing test\n\nTest code is just as important as production code. It is not a second-class citizen. It must be kept as clean as\nproduction code.\n\nThe Build-Operate-Check pattern - each test is split into three parts:\n\n1. build up the test data\n2. operate on the test data\n3. check that the operation yielded the expected results
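\n\nA minimal pytest sketch of the pattern (the function under test is hypothetical):\n\n```python\ndef test_word_count():\n    # 1. Build up the test data\n    text = \"one two three\"\n\n    # 2. Operate on the test data\n    word_count = len(text.split())\n\n    # 3. Check that the operation yielded the expected results\n    assert word_count == 3\n```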
\n\nTest code must be simple, succinct and expressive; however, it doesn't need to be as efficient as production code.\n\nOne test should test a single concept.\n\nClean tests follow 5 rules - FIRST:\n\n- F - Fast - tests should be fast, they should run quickly, if they don't you won't want to run them frequently.\n- I - Independent - Tests should not depend on each other, one test should not set up conditions for the next test\n- R - Repeatable - Tests should be repeatable in any environment (office, home, train without network), if they are not\n  you will have an excuse for why they fail\n- S - Self-Validating - Tests should have a boolean output, they should either pass or fail\n- T - Timely - Tests need to be written in a timely fashion, should be written just before the production code\n\n## Chapter 10: Classes\n\nClasses should be small. The second rule is that they should be smaller than that. Naming is most probably the best way\nof determining class size. If we cannot derive a concise name for a class, then it is likely too large.\n\nThe Single Responsibility Principle - a class or module should have one, and only one, reason to change.\n\nCohesion - classes should have a small number of instance variables. Each of a class's methods should manipulate one or\nmore of those variables.\n\nOpen-Closed Principle - a class should be open for extension but closed for modification.\n\nDependency Inversion Principle - our classes should depend upon abstractions, not on concrete details.\n\n## Chapter 11: Systems\n\nIt is a myth that we can get systems \"right the first time\". Instead, we should implement only today's stories, then\nrefactor and expand the system to implement new stories tomorrow. This is the essence of iterative and incremental\nagility.\n\nUse the simplest thing that can possibly work.\n\n## Chapter 12: Emergence\n\nAccording to Kent Beck, a design is simple if it follows these rules:\n\n- runs all tests - the system needs to be testable - if this can not be achieved, the system should not be released, all\n  tests need to pass\n- contains no duplication\n- expresses the intent of the programmer - the clearer the code, the less time others will have to spend understanding\n  it (small functions and classes, good names)\n- minimises the number of classes and methods - the least important rule, the above rules are more important, however\n  the overall goal should be to keep the system small\n\nCan a set of practices replace experience? No. On the other hand, practices are a crystallised form of the many decades\nof experience of many authors.\n\n## Chapter 13: Concurrency\n\nConcurrency is a decoupling strategy. It helps to decouple what gets done from when it gets done. In single-threaded\napps, what and when are strongly coupled.\n\nConcurrency Defence Principles:\n\n- Single Responsibility Principle - concurrency-related code should be kept separate from other code\n- limit the access to any data that may be shared (see the sketch after this list)\n- a good way of avoiding trouble with shared data is to avoid sharing data in the first place\n- use copies of data, collect results from multiple threads and merge the results\n- threads should be as independent as possible
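\n\nA minimal Python sketch of limiting access to shared data (the book's examples are in Java; the names here are\nhypothetical):\n\n```python\nimport threading\n\ncounter = 0\ncounter_lock = threading.Lock()\n\n\ndef increment_many(times: int) -> None:\n    global counter\n    for _ in range(times):\n        with counter_lock:  # only one thread mutates the shared counter at a time\n            counter += 1\n\n\nthreads = [threading.Thread(target=increment_many, args=(10_000,)) for _ in range(4)]\nfor thread in threads:\n    thread.start()\nfor thread in threads:\n    thread.join()\n\nassert counter == 40_000  # without the lock, concurrent `counter += 1` updates could be lost\n```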
\n\nJava supports thread-safe collections, e.g. ConcurrentHashMap; there are other classes to support advanced concurrency:\nReentrantLock - a lock that can be acquired and released, Semaphore - a classic lock with a count, CountDownLatch - a\nlock that waits for a number of events before releasing all threads waiting on it.\n\nA couple of behaviours:\n\n- Bound Resources - resources of a fixed size or number used in a concurrent environment, e.g. database connections\n- Mutual Exclusion - only one thread can access shared data or a shared resource at a time\n- Starvation - thread(s) prohibited from proceeding for an excessively long time or forever\n- Deadlock - two or more threads waiting for each other to finish\n- Livelock - threads in lockstep, each trying to do work but finding another \"in the way\", threads continue trying to\n  make progress but are unable\n\nExecution models:\n\n- producer - consumer - one or more threads create some work and place it in a queue, one or more consumer threads\n  acquire that work from the queue and complete it\n- readers - writers - writers wait until there are no readers before performing an update, if there are continuous\n  readers, writers will starve\n- dining philosophers - a hungry philosopher needs 2 forks before accessing the food, after consumption he releases the\n  forks and waits until he is hungry again. There are a number of solutions to this problem.\n\nThe `synchronized` keyword introduces a lock in Java. Locks are expensive, so use them carefully; such sections should\nalso be small.\n\nGraceful shutdown is hard to get correct. Think about it early and get it working early.\n\nGeneral tips:\n\n- get your non-threaded code working first\n- make thread-based code pluggable (one thread, n threads, ...)\n- run with more threads than processors
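\n\nA minimal Python sketch of the producer-consumer model using a thread-safe queue (a hypothetical example, not from the\nbook):\n\n```python\nimport queue\nimport threading\n\ntasks: queue.Queue = queue.Queue()\n\n\ndef producer() -> None:\n    for item in range(5):\n        tasks.put(item)  # one or more threads create work and place it in the queue\n    tasks.put(None)  # sentinel: signals there is no more work\n\n\ndef consumer() -> None:\n    while (item := tasks.get()) is not None:  # consumer threads acquire work from the queue\n        print(f\"processed {item}\")\n\n\nthreading.Thread(target=producer).start()\nworker = threading.Thread(target=consumer)\nworker.start()\nworker.join()\n```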
\n\n## Chapter 17: Smells and Heuristics\n\nComments:\n\n- Metadata should not appear in the comment (author, modification date). Comments should be reserved for technical notes\n  only.\n- Do not write comments that will become obsolete.\n- Do not paraphrase code.\n- Be brief and correct.\n- Instead of commenting-out code - delete it.\n\nEnvironment:\n\n- You should be able to check out the system with one simple command.\n- You should be able to run all unit tests with just one command.\n\nFunctions:\n\n- Functions should have a small number of arguments, no argument is best. More than 3 arguments is very questionable and\n  should be avoided.\n- Output arguments are counterintuitive - readers expect arguments to be inputs, not outputs. If a function must change\n  the state of something, have it change the state of the object it is called on.\n- Flag arguments should be avoided (boolean flags) - they loudly declare that the function is doing multiple things.\n- Methods that are never called should be removed. Dead code is wasteful.\n\nGeneral:\n\n- An ideal source file should contain one, and only one, language (as opposed to, for example, Java + JavaScript\n  snippets + English comments).\n- A function / class should implement the behaviours that another programmer could reasonably expect.\n- Check every boundary condition.\n- No duplication - perhaps the most important rule. Duplicated code means a missed opportunity for abstraction. Codd's\n  normal forms are a strategy for eliminating duplication.\n- It is important to create abstractions that separate higher level general concepts from lower level detailed concepts.\n- High level concepts should be independent of low level derivatives.\n- A well-defined interface does not offer very many functions to depend upon, so coupling is low. Good software\n  engineers learn to limit what they expose at the interfaces of their classes and modules.\n- Get rid of dead code - code that is never executed.\n- Variables and functions should be defined close to where they are used.\n- Use consistent naming.\n- Keep source code organised and free of clutter.\n- Things that don't depend upon each other should not be artificially coupled.\n- Feature envy - the methods of a class should be interested in the variables and functions of the class they belong to,\n  and not the variables and functions of other classes.\n- Code should be as expressive as possible.\n- Code should be placed where a reader would naturally expect it to be (the principle of least surprise).\n- Think about whether a function should be static or not.\n- Variables should have meaningful names, also use intermediate variables when performing difficult calculations.\n- Function names should say what they do, if you can't understand what a function does by reading the call - change the\n  name.\n- Polymorphism is preferred over if / else or switch / case statements.\n- Follow code standards.\n- Replace magic numbers with named constants.\n- Be precise, use appropriate data structures.\n- Encapsulate conditions - boolean logic is hard to understand without seeing it in context, extract functions that\n  explain the intent of the conditional.\n- Avoid negative conditions - they are harder to understand.\n- Functions should do one thing.\n- Encapsulate boundary conditions.\n- The statements within a function should all be written at the same level of abstraction.\n- Keep configurable data at high levels.\n- Law of Demeter - we don't want a single module to know much about its collaborators.\n\nNames:\n\n- Choose descriptive names. Names in software are 90% of what makes software readable.\n- Choose names at the appropriate level of abstraction. Don't pick names that communicate implementation details.\n- Use standard nomenclature where possible.\n- Use unambiguous names.\n- Names should describe side effects.\n\nTests:\n\n- Use a coverage tool.\n- Don't skip trivial tests.\n- Test boundary conditions.\n- Tests should be fast.\n"
  },
  {
    "path": "books/coaching-agile-teams.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Coaching Agile Teams\n\nBook by Lyssa Adkins\n\n- [1. Will I be a Good Coach?](#1-will-i-be-a-good-coach)\n\n## 1. Will I be a Good Coach?\n\nIf teams are to have kinds of stellar experiences, leverage agile to teh full competitive advantage it was meant to\nprovide.\n\nAgile coaching matters because it helps both, producing products that matter in the real, complex and uncertain world,\nand adding meaning to people's work lives.\n\nAgile is easy to get going yet hard to do well.\n\nImagine a team that admits mistakes, reinforces their shared values, forgives one another, and moves on. Do you think\nsuch a team would come up with astonishing ideas?\n\nAn agile (or Scrum) coach is:\n\n- someone who appreciates teh depths of agile practices and principles and can help teams appreciate them too\n- someone who has faces big dragons, organizational impediments, and has become a coach to managers and other outsiders\n  in the course of addressing them\n- someone who can help management at all levels of the organization the benefits of working agile\n- someone who has brought ideas from professional facilitation, coaching, conflict management, mediation, theater and\n  more to help the team become a high-performance team\n\nNative wiring for coaching:\n\n- ability to \"read a room\", ability to read emotion in the air and know whether all is good\n- care about people more than products\n- cultivate curiosity\n- believe that people are basically good\n- they know that plans fall apart, so they act in the moment with the team\n- any group of people can do good things\n- it drives them crazy when someone says \"yeah, I know, it's a waste of time, but that's how we do it here\"\n- chaos and destruction are simply building blocks for something better\n- they risk being wrong\n"
  },
  {
    "path": "books/code-complete.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Code Complete: A Practical Handbook of Software Construction\n\nBook by Steve McConnell\n\n- [Chapter 1: Software Construction](#chapter-1-software-construction)\n- [Chapter 2: Metaphors for a Richer Understanding of Software Development](#chapter-2-metaphors-for-a-richer-understanding-of-software-development)\n- [Chapter 8: Defensive Programming](#chapter-8-defensive-programming)\n- [Chapter 20: The Software-Quality Landscape](#chapter-20-the-software-quality-landscape)\n- [Chapter 21: Collaborative Construction](#chapter-21-collaborative-construction)\n- [Chapter 22: Developer Testing](#chapter-22-developer-testing)\n- [Chapter 24: Refactoring](#chapter-24-refactoring)\n- [Chapter 25: Code-Tuning Strategies](#chapter-25-code-tuning-strategies)\n- [Chapter 32: Self-Documenting Code](#chapter-32-self-documenting-code)\n- [Chapter 33: Personal Character](#chapter-33-personal-character)\n- [Chapter 34: Themes in Software Craftsmanship](#chapter-34-themes-in-software-craftsmanship)\n\n## Chapter 1: Software Construction\n\nConstruction - process of building (planning, designing, checking the work). Construction is mostly coding and debugging\nbut also involves designing, planning, unit testing, ... Centre of the software development process. The only activity\nthat is guaranteed to be done (planning might be imperfect, etc.).\n\n## Chapter 2: Metaphors for a Richer Understanding of Software Development\n\nMetaphors contribute to a greater understanding of software-development issues - paper writing metaphor, farming\nmetaphor, etc...\n\n## Chapter 8: Defensive Programming\n\nProtecting yourself from \"cruel world of incorrect data\". Use assertions to document assumptions made in the code.\n\nGuidelines:\n\n- use assertions for conditions that should never occur, this is not error checking code. On error program should take\n  corrective actions, on assertion fail source code should be updated.\n- no executable code in asserts:\n    - bad: `assert foo(), ...`\n    - good: `result = foo(); assert result, ...`\n- use asserts to document and verify preconditions (before executing the routine) and post conditions (after executing\n  the routine)\n- for high robustness: failed assertions should be handled anyway\n\nError handling:\n\n- return neutral value - 0, empty string, ...\n- substitute the next piece of valid data - for example when processing stream of data from the sensor (e.g.\n  temperature)\n  , you may want to skip the missing value and wait for another\n- return the same answer as the previous time - some data might not change in time dramatically, so it is okay to return\n  the last correct value\n- substitute the closest legal value - for example reversing car does not show negative speed value but instead shows\n  0 (the closest legal value)\n- log a warning message on incorrect data\n- return error code - report error has been encountered and trust some other routine higher up will handle the error\n- call centralised error-processing routine, disadvantage is that entire program coupled with the mechanism\n- display error message to the user, warning: don't share too much with the user, attacker may use this information\n- shut down - useful in safety-critical applications\n\nWhile handling errors you need to choose between robustness (do something to keep the software alive) and correctness (\nensuring the data is always correct). 
\n\nExceptions:\n\n- they eliminate the possibility of an error going unnoticed\n- throw only for truly exceptional situations - for situations that can not be addressed\n- if an exception can be handled locally - handle it locally\n- avoid exceptions in constructors, because if an exception happens there, the destructor might not be called - a\n  resource leak!\n- include all the information that led to the exception\n- avoid empty catch blocks\n- standardise the project's use of exceptions\n\nBarricades:\n\n- similar to having isolated compartments in the hull of a ship, damaged parts are isolated\n- use validation classes that are responsible for cleaning the data\n- assume data is unsafe and you need to sanitise it\n\n*Offensive programming* - exceptional cases should be handled in a way that makes them obvious during development and\nrecoverable when production code is running. During development, you want errors to be as visible as possible, but in\nproduction they should not be observable.
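\n\nA minimal Python sketch of a barricade - a validation class that sanitises incoming data so that the code behind the\nbarricade can trust it (hypothetical names):\n\n```python\nfrom typing import List\n\n\nclass ReadingsBarricade:\n    \"\"\"Cleans raw sensor input before the rest of the system touches it.\"\"\"\n\n    MIN_VALID, MAX_VALID = -90.0, 60.0\n\n    def sanitise(self, raw_readings: List[str]) -> List[float]:\n        readings = []\n        for raw in raw_readings:\n            try:\n                value = float(raw)\n            except ValueError:\n                continue  # drop garbage instead of letting it past the barricade\n            if self.MIN_VALID <= value <= self.MAX_VALID:\n                readings.append(value)\n        return readings\n\n\nassert ReadingsBarricade().sanitise([\"21.5\", \"oops\", \"900\", \"-3\"]) == [21.5, -3.0]\n```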
\n\n## Chapter 20: The Software-Quality Landscape\n\nThere are many quality metrics: correctness, usability, efficiency, reliability, integrity, adaptivity, accuracy,\nrobustness - these are metrics important to the user; for a programmer more important metrics are: maintainability,\nflexibility, portability, reusability, readability, testability, understandability.\n\n*Techniques for Improving Software Quality*: set up software quality objectives, perform quality assurance activities,\nprototyping.\n\nDefect-detection techniques: design reviews, code reviews, prototyping, unit tests, integration tests, regression tests,\n... even all of them combined will not detect all the issues.\n\n> Most studies have found that inspections are cheaper than testing. A study at the Software Engineering Laboratory\n> found that code reading detected about 80% more faults per hour than testing.\n\nCost of detection is only one part. There is also the cost of fixing the issues. The longer a defect remains in the\nsystem, the more expensive it becomes to remove.\n\nRecommended combination: Formal inspections of all requirements, architecture, design -> Modeling / prototyping -> Code\nreading -> Testing.\n\nRemember: Improving quality reduces development cost.\n\n## Chapter 21: Collaborative Construction\n\n> IBM found that each hour of inspection prevented about 100 hours of related work (testing and defect correction)\n\n> Reviews cut the errors by over 80%\n\n> Reviews create a venue for more experienced and less experienced programmers to communicate about technical issues.\n\nCollective ownership - code is owned by the group rather than by the individuals and can be accessed and modified by\nvarious members.\n\nGuide on pair programming:\n\n- it will not be effective if you argue on styling conventions\n- don't let it turn into watching - the person without the keyboard should be an active participant\n- sometimes it is better to discuss something on the whiteboard and then go programming solo\n- rotate pairs\n- match the other's pace, the fast learner needs to slow down\n- don't force people who don't like each other to pair\n- no pairing between newbies\n\nNice idea: for discussing the design, everyone should come with a prepared list of potential issues. It is good to\nassign perspectives - maintainer, coder, user, designer. The author should play a minor role in such a discussion and\nshould only present the overview. A reviewer can be anyone other than the author - a tester, a developer. Management\nshould not be present at the meeting, however they should be briefed on the results after the discussion. A design\nreview can not be used for performance appraisals. The group should be focused on identifying defects. The goal of this\nmeeting is not to explore alternatives or debate who is right and who is wrong.\n\n> NASA's Software Engineering Laboratory found that code reading detected about 3.3 defects per hour of effort. Testing\n> detected 1.8 errors per hour.\n\n## Chapter 22: Developer Testing\n\n> You must hope to find errors in your code. Such hope might seem like an unnatural act, but you should hope that it's\n> you who finds the errors and not someone else.\n\nWhy TDD:\n\n- same effort to write test cases before and after\n- you detect defects earlier, and you can correct them more easily\n- forces you to think a little about the requirements and design before writing code\n- exposes requirements problems sooner\n\nDevelopers tend to write *clean tests* rather than test for all the ways code breaks. Developer's testing isn't\nsufficient to provide adequate quality assurance.\n\nGeneral Principle of Software Quality: improving quality improves the development schedule and reduces development cost.\n\n## Chapter 24: Refactoring\n\nThe Cardinal Rule of Software Evolution: Evolution should improve the internal quality of the program.\n\nSigns / smells that indicate refactoring is needed:\n\n- code duplication - you need to do parallel changes\n- too long routine\n- too long loop or too deeply nested\n- poor class cohesion - if a class takes ownership of many unrelated responsibilities\n- too many parameters\n- changes require parallel modifications to multiple classes\n- related data not organised into classes\n- overloaded primitive data type\n- class doesn't do much - sometimes the result of refactoring is that an old class doesn't have much to do\n- tramp data - one routine just passes data to another\n- one class knows too much about the other\n- poor names\n- public data members - in general a bad idea\n- subclass uses only a small percentage of its parent routines\n- comments should not be used to explain bad code - \"don't comment bad code, rewrite it\"\n- usage of setup code before a routine call\n- code that \"seems like it might be needed one day\" - programmers are rather bad at guessing what functionality might be\n  needed someday, *design ahead* introduces unnecessary complexity\n\nData-Level Refactoring:\n\n- replace magic number with a named constant\n- give a variable an informative name\n- inline expressions\n- replace an expression with a routine\n- convert a data primitive to a class\n- encapsulate a returned collection\n\nStatement-Level Refactoring:\n\n- decompose boolean expression - use variables that help document the meaning of the expression (see the sketch after\n  this list)\n- move boolean expression into a well-named function\n- return as soon as you know the return value
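\n\nA minimal Python sketch of decomposing a boolean expression and moving it into a well-named function (hypothetical\nnames):\n\n```python\nfrom dataclasses import dataclass\n\nSUPPORTED_COUNTRIES = {\"PL\", \"DE\", \"FR\"}  # hypothetical configuration\n\n\n@dataclass\nclass User:\n    age: int\n    country: str\n    banned: bool\n\n\n# Before: if user.age >= 18 and user.country in SUPPORTED_COUNTRIES and not user.banned: ...\ndef is_eligible(user: User) -> bool:\n    is_adult = user.age >= 18\n    lives_in_supported_country = user.country in SUPPORTED_COUNTRIES\n    return is_adult and lives_in_supported_country and not user.banned\n\n\nassert is_eligible(User(age=30, country=\"PL\", banned=False))\nassert not is_eligible(User(age=15, country=\"PL\", banned=False))\n```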
\n\nRoutine-Level Refactoring:\n\n- Inline simple routines\n- Convert a long routine into a class\n- Separate query operations from modification operations\n- Combine similar routines by parametrizing them\n- If a routine depends on the parameter passed in - consider splitting the routine\n- Pass a whole object instead of specific fields, however if you are creating an object just to pass it to a routine,\n  consider changing the routine to take only specific fields\n- A routine should return the most specific object (mostly applicable to iterators, collections, ...)\n\nClass Implementation Refactoring:\n\n- Extract specialised code into a subclass - if a class has code that is used by only a subset of its instances\n- Combine similar code into a superclass - if at least 2 classes have similar code\n\nClass Interface Refactoring:\n\n- Eliminate classes that are not doing much\n- Hide a delegate - A calling B, A calling C, when really class A should call B and class B should call class C\n- Or remove the middleman - remove B and make A call C directly\n- Hide routines that are not intended to be used outside the class\n- Encapsulate unused routines - if you use only a small portion of a class's interface\n\nRefactoring might cause a lot of harm if misused:\n\n- refactorings should be small\n- one refactoring at a time\n- make a list of needed steps\n- make a parking lot - in the middle of a refactoring you might think about another refactoring, and another, and so on;\n  for changes that aren't required immediately, save a list of TODO changes\n- check IDE / compiler / other tools' errors\n- refactored code should be retested, and the programmer should also add more test cases\n- be careful about small refactorings, because they tend to introduce more bugs than big refactorings\n- adjust your approach based on the risk of the refactoring - some changes are more dangerous than others\n\nRefactoring refers to making changes in working code that do not affect the program's behaviour. Programmers who are\ntweaking broken code aren't refactoring - they are hacking.\n\nThere are many strategies on where refactoring should be started. For example, whenever you are adding a routine you\nshould refactor its neighbours, or when you are adding a class, or you should refactor error-prone modules, the most\ncomplex modules, etc.\n\n## Chapter 25: Code-Tuning Strategies\n\nCode tuning is one way of improving a program's performance. You can find other ways to improve performance - faster and\nwithout harm to the code.\n\n> More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other\n> single reason - including blind stupidity ~ Wulf\n\nEfficiency can be seen from many viewpoints:\n\n- requirements\n\nTRW required sub-second response time - this led to a highly complex design that cost ~100M $. Analysis determined that\nusers would be satisfied with 4-second responses 90% of the time; modifying the response-time requirement reduced the\ncost by ~70M $.\n\nBefore you invest time solving a performance problem, make sure you are solving a problem that needs to be solved.\n\n- design\n\nSome program designs make it difficult to write a high-performance system; others make it hard not to.\n\n- class and routine design\n\nAt this level, algorithms and data structures matter.\n\n- OS interactions\n\nYou might not be aware that the compiler generated code using heavy OS calls.\n\n- code compilation\n\nGood compilers turn good high-level language code into optimised machine code.\n\n- hardware\n\nSometimes the cheapest and the best way to improve a program's performance is to buy new hardware.\n\n- code tuning\n\nSmall-scale changes that affect a single class, a routine or just a few lines of code, and that make it run more\nefficiently.\n\nSome sources say you can multiply improvements on each of the six levels, achieving a performance improvement of a\nmillion-fold.\n\nCode tuning is not the most effective way to improve performance! Writing micro-efficient code does not prove you are\ncool. 
Efficient code isn't necessarily better.\n\nThe Pareto Principle: also known as the 80/20 rule - you can get 80% of the result with 20% of the effort.\n\nWorking toward perfection might prevent completion. Complete it first, and then perfect it. The part that needs to be\nperfect is usually small.\n\nFalse statement: reducing the lines of code in a high-level language improves the speed or size of the resulting machine\ncode:\n\n```\n# This is slower:\nfor i in range(1, 11):\n    a[i] = i\n\n# This is faster:\na[1] = 1\na[2] = 2\n...\na[10] = 10\n```\n\nIt is also impossible to identify performance bottlenecks before the program is completely working, hence \"You should\noptimise as you go\" is false. Also, premature optimisation is the root of all evil, because you are missing perspective.\n\nCompilers are really powerful, however they are better at optimising straightforward code than they are at optimising\ntricky code. So, design the application properly, write clear code, and the compiler will do the rest :)\n\nSources of inefficiency:\n\n- I/O operations - if possible: store data in memory\n- paging - an operation that causes the OS to swap pages of memory is much slower than an operation that works on only\n  one page of memory\n- system calls - calls to system routines are expensive (context switch, saving app state, recovering kernel state);\n  avoid using system calls, write your own routines using a small part of the functionality offered by a system routine,\n  or work with the system vendor to improve performance\n- interpreted languages - :(\n- errors - errors in code can be another source of performance problems\n\nExperience doesn't help with optimisation. A person's experience might have come from an old machine, language or\ncompiler. You can never be sure about the effect of an optimisation until you measure the effect.
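\n\nA minimal sketch of measuring instead of guessing, using Python's built-in `timeit` (the two candidate snippets are\nhypothetical):\n\n```python\nimport timeit\n\n# Measure both candidate implementations before deciding which one is \"faster\".\ncomprehension = timeit.timeit(\"[i * i for i in range(1000)]\", number=1_000)\nmapped = timeit.timeit(\"list(map(lambda i: i * i, range(1000)))\", number=1_000)\n\nprint(f\"list comprehension: {comprehension:.3f}s, map + lambda: {mapped:.3f}s\")\n```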
\n\n## Chapter 32: Self-Documenting Code\n\nUnit development folder - an informal document that contains notes used by a developer during construction - its main\npurpose is to provide a trail of design decisions that aren't documented elsewhere.\n\nDetailed-design document - a low-level design document, describes the class-level or routine-level design decisions.\n\nInternal documentation (within the program) is the most detailed kind of documentation. The main contributor to\ncode-level documentation isn't comments, but good programming style, good variable names, clear layout and minimisation\nof control-flow and data-structure complexity.\n\n> **Good comments don't repeat the code or explain it. They clarify its intent. Comments should explain, at a higher\n> level of abstraction than the code, what you are trying to do.**\n\nKinds of comments:\n\n- repeat of the code - the comment gives no additional information\n- explanation of the code - the code is so complicated it needs to be explained, make the code better instead of adding\n  comments\n- **summary of the code** - very useful when someone other than the code's original author tries to modify the code\n- **description of the code's intent** - an IBM study says \"understanding the programmer's intent is the most difficult\n  problem\"\n- **information that cannot be expressed by the code itself** - for example a copyright notice, notes about the design,\n  references to requirements\n\n3 types of acceptable comments were highlighted above.\n\nEffective commenting shouldn't be time-consuming. Guidelines for effective commenting:\n\n- if the commenting style is too fancy, it very likely becomes annoying to maintain\n- write pseudocode in comments\n- performance is not a good reason for avoiding commenting (in some languages commenting slows down execution /\n  compilation) - the usual solution is to pass the code through a tool that strips comments before release\n\nEnd-line comments pose several problems and should be avoided - it is hard to write a meaningful comment in one line,\nand there is not much space on the right side of the screen.\n\nThe code itself is always the first documentation you should check. If the code is not good enough, look for comments.\n\nComments should avoid abbreviations. Comments should justify violations of good programming style. Don't comment tricky\ncode, rewrite it. If something is tricky for you, for others it might be incomprehensible.\n\n> Make your code so good that you don't need comments, and then comment it to make it even better.\n\nCommenting data declarations:\n\n- comment the units\n- comment the range of allowable numeric values\n- use enumerated types to express coded meanings\n- comment limitations of input data, use assertions\n- if a variable is used as a bit field, explain every bit\n- if you have comments that refer to a specific variable, make sure the comment stays updated after a variable name\n  change\n\nKeep comments close to the code they describe. Describe the design approaches, limitations, usage assumptions and so on.\nDo not document implementation details in the interface.\n\n## Chapter 33: Personal Character\n\nThe best programmers are the people who realise how small their brains are. The purpose of many good programming\npractices is to reduce the load on your grey cells:\n\n- decomposing - makes a system simpler to understand\n- reviews, inspections and tests - our intellectual capacity is limited, so we augment it with someone else's\n- short routines reduce the load on our brains\n- writing programs in terms of the problem domain rather than in terms of low-level implementation details reduces\n  mental workload\n- conventions free the brain from the relatively mundane aspects of programming\n\nHow to exercise curiosity and make learning a priority?\n\n- If your workload consists entirely of short-term assignments that don't develop your skills, be dissatisfied. Half of\n  what you need to know will be outdated in three years. You are not learning, you are turning into a dinosaur. If you\n  can't learn at your job, find a new one.\n- Experiment if you don't understand something. Learn to make mistakes and learn from each of them. Making a mistake is\n  no sin. Failing to learn from a mistake is.\n- Read about problem-solving, don't reinvent the wheel.\n- Study the work of the great programmers; it is not about reading 500 lines of source code but, for example, about\n  studying high-level design.\n- Read books; one book is more than most programmers read each year.\n- Affiliate with other professionals.\n- Set up a professional development plan.\n\nMature programmers are honest, which means: you refuse to pretend you are an expert when you are not, you admit your\nmistakes, you provide realistic estimates, you understand your program.\n\nWriting readable code is part of being a team player. As a readability guideline, keep the person who has to modify your\ncode in mind. Programming is communicating with another programmer first and communicating with the computer second.\n\nTo stay valuable, you have to stay current. For young hungry programmers, this is an advantage. 
Older programmers\nsometimes feel they have already earned their stripes and resent having to improve themselves year after year.\n\nGood habits matter because most of what you do as a programmer you do without consciously thinking about it.\n\n## Chapter 34: Themes in Software Craftsmanship\n\nThere are many intellectual tools for handling computer science complexity:\n\n- dividing a system into subsystems at the architecture level so that the brain can focus on a smaller part of the\n  system at one time\n- careful interface definition\n- preserving the abstraction represented by the interface so that the brain doesn't have to remember arbitrary details\n- avoid global data\n- avoid deep inheritance hierarchies\n- carefully define the error handling strategy\n- prevent the creation of monster classes\n- keep functions short\n- use self-explanatory names\n- minimise the number of parameters passed to a routine\n- use conventions\n\nThe points above decrease the amount of mental resources you need in order to understand the code.\n\nAbstraction is a particularly powerful tool for managing complexity. Fred Brooks said that the biggest single gain ever\nmade in computer science was in the jump from machine language to higher-level languages. It freed programmers from\nworrying about the detailed quirks of individual pieces of hardware and allowed them to focus on programming.\n\nReducing complexity is arguably the most important key to being an effective programmer.\n\nCollective ability isn't simply the sum of the team members' individual skills. The way people work together determines\nwhether abilities sum up or subtract from each other.\n\nIn the real world, requirements are never stable - in order to build software more flexibly, use an incremental\napproach and plan to develop the program in several iterations.\n\nWrite readable code because it helps other people to read the code. The computer doesn't care whether code is readable.\nA professional programmer writes readable code. Even if you think you are the only one who will read your code, in\nreality, chances are good that someone else will need to modify your code. One study found that 10 generations of\nmaintenance programmers work on an average program before it gets rewritten.\n\nIf your language doesn't support some mechanism (e.g. a missing `assert`), do not hesitate to implement it on your own.\n\nAt the highest level, you shouldn't have any idea how the data is stored. Suggested levels of abstraction:\n\n4. high-level problem domain terms\n\n3. low-level problem domain terms\n\n2. low-level implementation structures\n\n1. programming language structures and tools\n\n0. operating system operations and machine instructions\n"
  },
  {
    "path": "books/comic-agile.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Comic Agilé\n\nBook by Luxshan Ratnaravi, Mikkel Noe-Nygaard\n\n- [1: Transformation](#1-transformation)\n- [4: Team](#4-team)\n- [6: Miscellaneous](#6-miscellaneous)\n\n## 1: Transformation\n\nInstead of taking a waterfall approach to your agile transformation, take an iterative one and grow the scope\norganically. Focus on changing the organizational culture to align with an agile one.\n\nProduct Owners don't dictate anything just because they are accountable for maximizing the value through effective\nproduct backlog. The entire Scrum Team collaborates on creating a plan for the next Sprint.\n\nAsses the psychological safety in your organization. If it is too low, seek to make working agreements where blameless\npost-mortems are part of them, so you can create a culture of promoting healthy conflicts and celebration of mistakes (\nand learning from them). Help your managers in demanding more psychological safety from their superiors, as that is a\nprerequisite for the managers creating it for you.\n\nIf you only partly adopted the agile way of working, the scope and time might be fixed, so the only parameter that the\nteams can rally vary is how much technical debt to create.\n\n## 4: Team\n\nTeam Velocity - the velocity is only for the team. If Management doesn't get that, educate them on the purpose and\nnature of velocity.\n\nTechnical Debt - If you PO doesn't get the importance of reducing Technical Debt, you need to educate them - spending\nsome time now on reducing the technical debt will most likely decrease time-to-market of new features.\n\nAvoid external participants in the team's retrospective (lack of trust to the externals).\n\nUse a simple tools for building agile culture, by taking a just-enough approach to your tooling, you free ip energy to\nfocus on the needed behavioral changes.\n\nDevOps is not just about tools, testing and CI/CD pipelines - it is more about culture, breaking down silos and\naligning cross-functional teams tp the paths of value delivery.\n\nWIP limit should create a pull system in the team's flow. This should then bring a conversation about collaboration and\nthe knowledge sharing needed to ensure that the entire team can actually swarm around each PBI.\n\nMob Programming - is about working collaboratively in groups of +3 to deliver high quality software and/or share\nknowledge between the developers in the mob. The Driver - controls the keyboard, the Navigators are thinking,\ndiscussing, reviewing and reflecting. The roles are interchanged.\n\nStability is the foundation for building the trust needed to become high-preforming teams. If team keeps changing, they\nwill have difficulties moving up Tucksman's phases - forming, storming, norming, performing.\n\n## 6: Miscellaneous\n\nIn the spirit of openness, you don't have to wait for the Retrospective to bring up potential improvements to your ways\nof working.\n\nCompanies with diverse leadership are 45% more likely to grow their market share and 70% more likely to capture new\nmarkets compared to companies with \"non-diverse\" leadership. Behavioral diversity is the other half of the equation,\nwhich includes:\n\n- ensuring everyone is heard\n- making it safe to propose novel ideas\n- giving team members decision-making authority\n- sharing credit for success\n- giving actionable feedback\n- implementing feedback from the team\n"
  },
  {
    "path": "books/cracking-coding-interview/Dockerfile",
    "content": "FROM python:3.10.4\n\nWORKDIR /src\n\nENV PYTHONPATH \"${PYTHONPATH}:/src\"\n\nCOPY requirements.txt .\nRUN pip install -r requirements.txt\n\nCOPY src/ src/\n"
  },
  {
    "path": "books/cracking-coding-interview/docker-compose.yml",
    "content": "version: \"3.9\"\nservices:\n  interview:\n    build:\n      context: .\n      dockerfile: Dockerfile\n    volumes:\n      - ./:/src\n"
  },
  {
    "path": "books/cracking-coding-interview/notes.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Cracking the Coding Interview: 189 Programming Questions and Solutions\n\nBook by Gayle Laakmann McDowell\n\nCode here: [click](.)\n\n- [Chapter 1: The Interview Process](#chapter-1-the-interview-process)\n- [Chapter 2: Behind the Scenes](#chapter-2-behind-the-scenes)\n- [Chapter 3: Special situations](#chapter-3-special-situations)\n- [Chapter 4: Before the Interview](#chapter-4-before-the-interview)\n- [Chapter 5: Behavioral Questions](#chapter-5-behavioral-questions)\n\n## Chapter 1: The Interview Process\n\nAssessment of a candidate performance:\n\n- Analytical skills: Did you need much help to solve the problem? How optimal was your solution? How long did it take\n  you to arrive at a solution?\n- Coding skills: Were you able to successfully translate your algorithm to reasonable code? Was it clean and\n  well-organized? Did you think of potential errors? Did you use good style?\n- Technical knowledge: Do you have a strong foundation in computer science and the relevant technologies?\n- Experience: Have you made good technical decisions in the past? Have you built interesting, challenging projects? Have\n  you shown drive, initiative, and other important factors?\n- Culture fit: Do your personality and values fit with the company and team? Did you communicate well with your\n  interviewer?\n\nFalse negatives are acceptable. Some good candidates are rejected. The company is out to build a great set of employees.\nThey can accept that they miss out on some good people. Company is far more concerned with false positives: people who\ndo well in an interview but are not in fact very good.\n\nBasic data structure and algorithm knowledge is useful. It is a good proxy. These skills are not hard to learn, but are\nwell-correlated with being a good developer. Also, it is hard to ask problem-solving questions that don't involve\nalgorithms and data structures.\n\nYour interviewer develops a feel for your performance by comparing you to other people. Getting a hard question isn't a\nbad thing. When it is harder for you, it is harder for everyone.\n\nIf you haven't heard back from a company within 3-5 business days after interview, check in with your recruiter.\n\nYou can almost always re-apply to a company after getting rejected. Typically, you have to wait between 6-12 months.\n\n## Chapter 2: Behind the Scenes\n\n\"Bar raiser\" interviewer is charged with keeping the interview bar high. This person has significant experience with\ninterviews and veto power in the hiring decision.\n\n## Chapter 3: Special situations\n\n**Experienced candidates.** More experienced engineers might see slightly less focus on algorithm questions. Some\ninterviewers might hold experienced candidates to a somewhat lower standard. After all, it has been years since these\ncandidates took an algorithms class. Others though hold experienced candidates to a higher standard. On average, it\nbalances out.\n\nThe exception to this rule is system design and architecture questions. Performance in such interview questions would be\nevaluated with respect to your experience level.\n\nPersonality fit: Typically assessed by how you interact with your interviewer. Establishing a friendly, engaging\nconversation with your interviewers is your ticket to many job offers.\n\n**For interviewers.**\n\n- Don't actually ask the exact questions in here (this book). You can ask similar questions to these. Some candidates\n  are reading this book. 
Your goal is to test their problem-solving skills, not their memorization skills.\n- Ask Medium and Hard problems. When you ask questions that are too easy, performance gets clustered together.\n- Use hard questions, not hard knowledge. If your question expects obscure knowledge, ask yourself: is this truly an\n  important skill? Most won't remember Dijkstra's algorithm or the specifics of how AVL trees work.\n- Avoid \"scary\" questions. Some questions intimidate candidates, because it seems like they involve some specialized\n  knowledge, even if they really don't - math or probability, low-level knowledge, system design or scalability,\n  proprietary systems (e.g. Google Maps). If you are going to ask a question that sounds \"scary\", make sure you really\n  reassure candidates that it doesn't require the knowledge that they think it does.\n- Offer positive reinforcement. You want candidates to feel comfortable. A candidate who is nervous will perform poorly,\n  and it doesn't mean that they aren't good. Moreover, a good candidate who has a negative reaction to you or to the\n  company is less likely to accept an offer - and they may dissuade their friends from interviewing/accepting as well.\n  No matter how poorly a candidate is doing, there is always something they got right. Find a way to infuse some\n  positivity into the interview.\n- Coach your candidates.\n    - Many candidates don't use an example to solve a question. Guide them.\n    - Some candidates take a long time to find the bug because they use an enormous example. They didn't realize it\n      would be more efficient to analyze their code conceptually first, or that a small example would work nearly as\n      well. Guide them.\n    - If they dive into code before they have an optimal solution, pull them back and focus them on the algorithm.\n    - If they get nervous and stuck and aren't sure where to go, suggest to them that they walk through the brute-force\n      solution and look for areas to optimize.\n    - Remind them that they can start off with a brute-force solution. Their first solution doesn't have to be perfect.\n- If they want silence, give them silence. If your candidate needs this, give your candidate time to think.\n- Know your mode: sanity check, quality, specialist, and proxy.\n    - Sanity Check - Easy problem-solving or design questions. They assess a minimum degree of competence. You can use\n      them early in the process.\n    - Quality Check - More challenging questions. Designed to be more rigorous and make a candidate think.\n    - Specialist Questions - Test knowledge on specific topics, e.g. Java or machine learning.\n    - Proxy Knowledge - This is knowledge that is not quite at the specialist level, but that you would expect a\n      candidate at their level to know.\n\n## Chapter 4: Before the Interview\n\nIf you are smart, you can code, and you can prove that, you can land your interview.\n\nResume screeners want to know that you are smart and that you can code. You should prepare your resume to highlight\nthese 2 things. Think twice before cutting more technical lines in order to allow space for your non-technical hobbies.\n\nKeep your resume short, max. 1.5-2 pages. Long resumes are not a reflection of having tons of experience; they are a\nreflection of not understanding how to prioritize content. A resume should not include a full history of every role you\nhave ever had. 
Include only the relevant positions - the ones that make you a more impressive candidate.\n\nFor each role, try to discuss your accomplishments with the following approach: \"_Accomplished X by implementing Y,\nwhich led to Z_\". Not everything will fit into this approach, but the principle is the same: what you did, how you did\nit, and what the results were.\n\n## Chapter 5: Behavioral Questions\n\nEnsure that you have one to three projects that you can talk about in detail. You should be able to discuss the\ntechnical components in depth. These should be projects where you played a central role.\n\nWhat are your weaknesses? A good answer conveys a real, legitimate weakness but emphasises how you work to overcome it.\n\nWhat questions should you ask the interviewer?\n\n- Genuine Questions: these are the questions you actually want to know the answer to.\n- Insightful Questions: these questions demonstrate your knowledge or understanding of technology. These questions will\n  typically require advance research about the company.\n- Passion Questions: these questions are designed to demonstrate your passion for technology. They show that you are\n  interested in learning and will be a strong contributor to the company. E.g.: I am very interested in scalability, and\n  I would love to learn more about it. What opportunities are there at this company to learn about this?\n\nBe specific, not arrogant. How do you make yourself sound good without being arrogant? Be specific. Specificity means\ngiving just the facts and letting the interviewer derive an interpretation.\n\nStay light on details and just state the key points. Your interviewer can ask for more details.\n\nFocus on yourself, not your team. More \"I\", less \"we\".\n\nGive structured answers.\n\n1. Nugget first - means starting your response with a \"nugget\" that succinctly describes what your response will be\n   about.\n2. S.A.R. (Situation, Action, Result) - you start off outlining the situation, then explaining the actions you took, and\n   lastly, describing the result.\n\nTell me about yourself, suggested structure:\n\n1. Current role (headline only)\n2. College\n3. Post college & onwards (job, technologies)\n4. Current role (more details)\n5. Outside of work (hobbies)\n6. Wrap up (what are you looking for)\n"
  },
  {
    "path": "books/cracking-coding-interview/requirements.txt",
    "content": "pytest==7.1.2\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch01_arrays_and_strings/check_permutation.py",
    "content": "import pytest\n\n\ndef check_permutation_sets(string: str, potential_permutation_string: str) -> bool:\n    return len(string) == len(potential_permutation_string) and set(string) == set(potential_permutation_string)\n\n\ndef check_permutation_sort(string: str, potential_permutation_string: str) -> bool:\n    return sorted(string) == sorted(potential_permutation_string)\n\n\ndef check_permutation_array(string: str, potential_permutation_string: str) -> bool:\n    if len(string) != len(potential_permutation_string):\n        return False\n\n    url_array = [0] * 128\n\n    for ch in string:\n        url_array[ord(ch)] += 1\n\n    for ch in potential_permutation_string:\n        url_array[ord(ch)] -= 1\n\n        if url_array[ord(ch)] < 0:\n            return False\n\n    return True\n\n\n@pytest.mark.parametrize(\"string, potential_permutation_string, is_permutation\", [\n    # @formatter:off\n    (\"god\",                 \"dog\",                 True),\n    (\"god\",                 \"dod\",                 False),\n    (\"god\",                 \"dogg\",                False),\n    (\"cat belongs to ala\",  \"ala belongs to cat\",  True),\n    (\"interview questions\", \"interviews question\", True),\n    (\"interview questions\", \"interview question\",  False),\n    # @formatter:on\n])\n@pytest.mark.parametrize(\"function\", [\n    check_permutation_sets,\n    check_permutation_sort,\n    check_permutation_array,\n])\ndef test_algorithm(function, string, potential_permutation_string, is_permutation):\n    assert function(string, potential_permutation_string) == is_permutation\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch01_arrays_and_strings/is_unique.py",
    "content": "import pytest\n\n\ndef check_if_has_unique_characters_pythonic(string: str) -> bool:\n    return len(set(string)) == len(string)\n\n\ndef check_if_has_unique_characters_ascii(string: str) -> bool:\n    boolean_array = [False] * 128\n    for ch in string:\n        int_ch = ord(ch)\n        if boolean_array[int_ch]:\n            return False\n        boolean_array[int_ch] = True\n    return True\n\n\ndef check_if_has_unique_characters_no_structures(string: str) -> bool:\n    for i, ch_0 in enumerate(string):\n        for ch_1 in string[i + 1:]:\n            if ch_0 == ch_1:\n                return False\n    return True\n\n\ndef check_if_has_unique_characters_no_structures_sort(string: str) -> bool:\n    sorted_string = sorted(string)\n\n    for i in range(len(sorted_string) - 1):\n        if sorted_string[i] == sorted_string[i + 1]:\n            return False\n\n    return True\n\n\n@pytest.mark.parametrize(\"string, has_all_unique_chars\", [\n    # @formatter:off\n    (\"qwerty\", True),\n    (\"\",       True),\n    (\"qqwert\", False),\n    (\"qwertt\", False),\n    # @formatter:on\n])\n@pytest.mark.parametrize(\"function\", [\n    check_if_has_unique_characters_pythonic,\n    check_if_has_unique_characters_ascii,\n    check_if_has_unique_characters_no_structures,\n    check_if_has_unique_characters_no_structures_sort,\n])\ndef test_algorithm(function, string, has_all_unique_chars):\n    assert function(string) == has_all_unique_chars\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch01_arrays_and_strings/one_away.py",
    "content": "import pytest\n\n\ndef is_one_edit_away_pythonic(string: str, edit: str) -> bool:\n    if abs(len(string) - len(edit)) > 1:\n        return False\n\n    if string in edit or edit in string:\n        return True\n\n    return len(set(string) - set(edit)) <= 1\n\n\ndef is_one_edit_away_loop(string: str, edit: str) -> bool:\n    if abs(len(string) - len(edit)) > 1:\n        return False\n\n    shorter_text, longer_text = string if len(string) < len(edit) else edit, string if len(string) >= len(edit) else edit\n    shorter_i, longer_i = 0, -1\n    edit_found = False\n\n    while shorter_i < len(shorter_text) and longer_i < len(longer_text):\n        longer_i += 1\n\n        if shorter_text[shorter_i] == longer_text[longer_i]:\n            shorter_i += 1\n            continue\n\n        if edit_found:\n            return False\n\n        if len(string) == len(edit):\n            shorter_i += 1\n\n        edit_found = True\n\n    return True\n\n\n@pytest.mark.parametrize(\"string, edit, expected_result\", [\n    # @formatter:off\n    (\"pale\",  \"ple\",  True),\n    (\"pale\",  \"ale\",  True),\n    (\"ale\",   \"pale\", True),\n    (\"pales\", \"pale\", True),\n    (\"pale\",  \"bale\", True),\n    (\"pale\",  \"bake\", False),\n    (\"pale\",  \"ba\",   False),\n    # @formatter:on\n])\n@pytest.mark.parametrize(\"function\", [\n    is_one_edit_away_pythonic,\n    is_one_edit_away_loop\n])\ndef test_algorithm(function, string, edit, expected_result):\n    assert function(string, edit) == expected_result\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch01_arrays_and_strings/palindrome_permutation.py",
    "content": "from collections import Counter\n\nimport pytest\n\n\ndef is_palindrome_permutation_pythonic(string: str) -> bool:\n    raw_string = string.replace(' ', '')\n    letter_frequency = Counter(raw_string)\n\n    if len(raw_string) % 2 == 0:\n        return all(frequency % 2 == 0 for frequency in letter_frequency.values())\n    else:\n        return sum(1 for frequency in letter_frequency.values() if frequency == 1) <= 1\n\n\ndef is_palindrome_permutation_counter(string: str) -> bool:\n    raw_string = string.replace(' ', '')\n    letter_frequency = Counter()\n    num_of_odd = 0\n\n    for ch in raw_string:\n        letter_frequency[ch] += 1\n\n        if letter_frequency[ch] % 2 == 1:\n            num_of_odd += 1\n        else:\n            num_of_odd -= 1\n\n    return num_of_odd <= 1\n\n\n@pytest.mark.parametrize(\"string, expected_result\", [\n    # @formatter:off\n    (\"tact coa\",     True),\n    (\"kamil slimak\", True),\n    (\"slimakkamil \", True),\n    (\"aaaaaab\",      True),\n    (\"aaa\",          True),\n    (\"aaaaacb\",      False),\n    (\"abc\",          False),\n    (\"slimakoamil \", False),\n    # @formatter:on\n])\n@pytest.mark.parametrize(\"function\", [\n    is_palindrome_permutation_pythonic,\n    is_palindrome_permutation_counter\n])\ndef test_algorithm(function, string, expected_result):\n    assert function(string) == expected_result\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch01_arrays_and_strings/rotate_matrix.py",
    "content": "from typing import List\n\nimport pytest\n\n\ndef rotate_matrix_list_comprehension(matrix: List[List[int]]) -> List[List[int]]:\n    size = len(matrix)\n    return [\n        [matrix[col][row] for col in reversed(range(size))]\n        for row in range(size)\n    ]\n\n\ndef rotate_matrix_zip(matrix: List[List[int]]) -> List[List[int]]:\n    return [list(reversed(row)) for row in zip(*matrix)]\n\n\n@pytest.mark.parametrize(\"matrix, rotated_matrix\", [\n    ([[1, 2],\n      [3, 4]],\n     [[3, 1],\n      [4, 2]]),\n    ([[1, 2, 3],\n      [4, 5, 6],\n      [7, 8, 9]],\n     [[7, 4, 1],\n      [8, 5, 2],\n      [9, 6, 3]]),\n    ([[1, 2, 3, 8],\n      [4, 5, 6, 8],\n      [7, 8, 9, 8],\n      [8, 8, 8, 8]],\n     [[8, 7, 4, 1],\n      [8, 8, 5, 2],\n      [8, 9, 6, 3],\n      [8, 8, 8, 8]])\n])\n@pytest.mark.parametrize(\"function\", [\n    rotate_matrix_zip,\n    rotate_matrix_list_comprehension\n])\ndef test_algorithm(function, matrix, rotated_matrix):\n    assert function(matrix) == rotated_matrix\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch01_arrays_and_strings/string_compression.py",
    "content": "from dataclasses import dataclass\n\nimport pytest\n\n\ndef compress_string(text: str) -> str:\n    @dataclass\n    class Compressed:\n        char: str\n        freq: int\n\n    compressed = []\n\n    for ch in text:\n        if compressed and ch == compressed[-1].char:\n            compressed[-1].freq += 1\n        else:\n            compressed.append(Compressed(char=ch, freq=1))\n\n    return ''.join(f\"{c.char}{c.freq}\" for c in compressed) if len(compressed) * 2 < len(text) else text\n\n\n@pytest.mark.parametrize(\"text, expected_result\", [\n    # @formatter:off\n    (\"a\",       \"a\"),\n    (\"aabb\",    \"aabb\"),\n    (\"aaaa\",    \"a4\"),\n    (\"aabbb\",   \"a2b3\"),\n    (\"aabbbaa\", \"a2b3a2\"),\n    # @formatter:on\n])\ndef test_algorithm(text, expected_result):\n    assert compress_string(text) == expected_result\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch01_arrays_and_strings/string_rotation.py",
    "content": "import pytest\n\n\ndef is_rotated(string: str, rotated_string: str) -> bool:\n    return len(string) == len(rotated_string) and rotated_string in string * 2\n\n\n@pytest.mark.parametrize(\"string, rotated_string, expected_result\", [\n    # @formatter:off\n    (\"\",            \"\",            True),\n    (\"waterbottle\", \"erbottlewat\", True),\n    (\"dog\",         \"gdo\",         True),\n    (\"dog\",         \"dogdo\",       False),\n    (\"dog\",         \"godd\",        False),\n    (\"dog\",         \"go\",          False),\n    # @formatter:on\n])\ndef test_algorithm(string, rotated_string, expected_result):\n    assert is_rotated(string, rotated_string) == expected_result\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch01_arrays_and_strings/urlify.py",
    "content": "import pytest\n\n\ndef urlify_pythonic(url: str) -> str:\n    return ' '.join(url.split()).replace(' ', \"%20\")\n\n\ndef urlify_array(url: str) -> str:\n    result_url = \"\"\n    last_appended_character = None\n\n    for ch in url:\n        if ch == ' ' and last_appended_character is None:\n            # Do not duplicate '%20' in the URL\n            continue\n        elif ch == ' ' and last_appended_character:\n            last_appended_character = None\n            result_url += \"%20\"\n        else:\n            last_appended_character = ch\n            result_url += ch\n\n    if last_appended_character is None:\n        return result_url[:-3]\n\n    return result_url\n\n\n@pytest.mark.parametrize(\"url, expected_url\", [\n    # @formatter:off\n    (\"Mr John Smith\",     \"Mr%20John%20Smith\"),\n    (\"Mr John  Smith\",    \"Mr%20John%20Smith\"),\n    (\"    Mr John Smith\", \"Mr%20John%20Smith\"),\n    (\"Mr John Smith    \", \"Mr%20John%20Smith\"),\n    (\"Mr \",               \"Mr\"),\n    (\"M \",                \"M\"),\n    (\"  \",                \"\"),\n    (\"\",                  \"\"),\n    # @formatter:on\n])\n@pytest.mark.parametrize(\"function\", [\n    urlify_pythonic,\n    urlify_array,\n])\ndef test_algorithm(function, url, expected_url):\n    assert function(url) == expected_url\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch01_arrays_and_strings/zero_matrix.py",
    "content": "from typing import List\n\nimport pytest\n\n\ndef nullify_loop(matrix: List[List[int]]) -> List[List[int]]:\n    height, width = len(matrix), len(matrix[0])\n    columns, rows = set(), set()\n\n    for row in range(height):\n        for col in range(width):\n            if matrix[row][col] == 0:\n                columns.add(col)\n                rows.add(row)\n\n    return [\n        [\n            0 if row in rows or col in columns else matrix[row][col]\n            for col in range(width)\n        ]\n        for row in range(height)\n    ]\n\n\ndef nullify_in_place(matrix: List[List[int]]) -> List[List[int]]:\n    height, width = len(matrix), len(matrix[0])\n\n    def nullify_column(pos: int) -> None:\n        for i in range(height):\n            matrix[i][pos] = 0\n\n    def nullify_row(pos: int) -> None:\n        matrix[pos] = [0] * width\n\n    col_start = 0\n\n    for row in range(height):\n        for col in range(col_start, width):\n            if matrix[row][col] == 0:\n                nullify_row(row)\n                nullify_column(col)\n\n                col_start = col + 1\n                break\n\n    return matrix\n\n\n@pytest.mark.parametrize(\"matrix, rotated_matrix\", [\n    ([[0, 2],\n      [3, 4]],\n     [[0, 0],\n      [0, 4]]),\n    ([[1, 2, 3, 4],\n      [1, 0, 3, 4],\n      [1, 2, 3, 0]],\n     [[1, 0, 3, 0],\n      [0, 0, 0, 0],\n      [0, 0, 0, 0]])\n])\n@pytest.mark.parametrize(\"function\", [\n    nullify_loop,\n    nullify_in_place,\n])\ndef test_algorithm(function, matrix, rotated_matrix):\n    assert function(matrix) == rotated_matrix\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch02_linked_lists/delete_middle_node.py",
    "content": "import pytest\n\nfrom linked_list import (\n    LinkedList,\n    Node,\n)\n\n\ndef delete_middle_node(node: Node) -> None:\n    assert node.next, \"node is not the last node in the linked list\"\n\n    node.data = node.next.data\n    node.next = node.next.next\n\n\n@pytest.mark.parametrize(\"values, node, expected_result\", [\n    # @formatter:off\n    ([1, 2, 3, 4], 2, [1, 3, 4]),\n    ([1, 2, 3, 4], 3, [1, 2, 4]),\n    # @formatter:on\n])\ndef test_algorithm(values, node, expected_result):\n    linked_list = LinkedList(values)\n    delete_middle_node(linked_list.node_for_value(node))\n    assert linked_list.values == expected_result\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch02_linked_lists/intersection.py",
    "content": "from typing import Optional\n\nimport pytest\n\nfrom linked_list import (\n    LinkedList,\n    Node,\n)\n\n\ndef intersection(list_0: LinkedList, list_1: LinkedList) -> Optional[Node]:\n    if list_0.tail != list_1.tail:\n        return None\n\n    l0_node, l1_node = list_0.head, list_1.head\n    l0_len, l1_len = list_0.length, list_1.length\n\n    # Advance pointers when lists have different size:\n    if l0_len > l1_len:\n        for i in range(l0_len - l1_len):\n            l0_node = l0_node.next\n\n    if l0_len < l1_len:\n        for i in range(l1_len - l0_len):\n            l1_len = l1_len.next\n\n    while l0_node and l1_node:\n        if l0_node == l1_node:\n            return l0_node\n        l0_node = l0_node.next\n        l1_node = l1_node.next\n\n    assert False, \"Loop above must finish the program\"\n\n\nl0 = LinkedList([3, 1, 5, 9])\nl1 = LinkedList([4, 6])\ntail = LinkedList([7, 2, 1]).head\n\nl4 = LinkedList([3, 1, 5, 9, 7, 2, 1])\nl5 = LinkedList([4, 6, 7, 2, 1])\n\n\n@pytest.mark.parametrize(\"list_0, list_0_tail, list_1, list_1_tail, expected_result\", [\n    # @formatter:off\n    (l0, tail, l1, tail, tail),\n    (l4, None, l5, None, None)\n    # @formatter:on\n])\ndef test_algorithm(list_0, list_0_tail, list_1, list_1_tail, expected_result):\n    list_0.tail.next = list_0_tail\n    list_1.tail.next = list_1_tail\n\n    assert intersection(list_0, list_1) == expected_result\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch02_linked_lists/linked_list.py",
    "content": "from typing import (\n    List,\n    Optional,\n)\n\nimport pytest\n\n\nclass Node:\n    def __init__(self, data: int) -> None:\n        self.next = None\n        self.data = data\n\n\nclass LinkedList:\n    def __init__(self, data: List[int]) -> None:\n        self.head = None\n        for val in data:\n            self.append(val)\n\n    @property\n    def values(self) -> List[int]:\n        result, current = [], self.head\n        while current:\n            result.append(current.data)\n            current = current.next\n        return result\n\n    @property\n    def tail(self) -> Optional[Node]:\n        node = self.head\n        while node and node.next:\n            node = node.next\n        return node\n\n    @property\n    def length(self) -> int:\n        return len(self.values)\n\n    def node_for_value(self, val: int) -> Optional[Node]:\n        node = self.head\n        while node:\n            if node.data == val:\n                return node\n            node = node.next\n        return None\n\n    def append(self, data: int) -> None:\n        self.head = append(self.head, data)\n\n    def delete(self, data: int) -> None:\n        self.head = delete(self.head, data)\n\n\ndef delete(head: Optional[Node], data: int) -> Optional[Node]:\n    node = head\n\n    if not node:\n        return None\n\n    if head.data == data:\n        return head.next\n\n    while node.next:\n        if node.next.data == data:\n            node.next = node.next.next\n            break\n        node = node.next\n\n    return head\n\n\ndef append(head: Optional[Node], data: int) -> Optional[Node]:\n    if not head:\n        return Node(data)\n\n    current, end = head, Node(data)\n    while current.next:\n        current = current.next\n    current.next = end\n\n    return head\n\n\n@pytest.mark.parametrize(\"values\", [\n    [],\n    [1],\n    [1, 2],\n    [1, 2, 3],\n])\ndef test_append(values):\n    assert LinkedList(values).values == values\n\n\n@pytest.mark.parametrize(\"values, to_delete, expected_result\", [\n    # @formatter:off\n    ([],        0, []),\n    ([1],       0, [1]),\n    ([1],       1, []),\n    ([1, 2],    1, [2]),\n    ([1, 2],    2, [1]),\n    ([1, 2, 3], 2, [1, 3]),\n    # @formatter:on\n])\ndef test_delete(values, to_delete, expected_result):\n    linked_list = LinkedList(values)\n    linked_list.delete(to_delete)\n    assert linked_list.values == expected_result\n\n\n@pytest.mark.parametrize(\"values, value, expected_node_val\", [\n    # @formatter:off\n    ([1, 2, 3, 4], 2, 2),\n    ([1, 2, 3, 4], 5, None)\n    # @formatter:on\n])\ndef test_node_for_value(values, value, expected_node_val):\n    node = LinkedList(values).node_for_value(value)\n    assert node.data if node else node == expected_node_val\n\n\n@pytest.mark.parametrize(\"values, expected_tail\", [\n    # @formatter:off\n    ([],     None),\n    ([1],    1),\n    ([1, 2], 2),\n    # @formatter:on\n])\ndef test_tail(values, expected_tail):\n    tail = LinkedList(values).tail\n    assert tail.data == expected_tail if expected_tail else tail is None\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch02_linked_lists/loop_detection.py",
    "content": "from typing import Optional\n\nimport pytest\n\nfrom linked_list import (\n    LinkedList,\n    Node,\n)\n\n\ndef get_loop(linked_list: LinkedList) -> Optional[Node]:\n    slow, fast = linked_list.head, linked_list.head\n\n    def get_loop_head():\n        nonlocal slow, fast\n        slow = linked_list.head\n\n        while slow != fast:\n            slow = slow.next\n            fast = fast.next\n\n        return fast\n\n    while fast and fast.next:\n        slow = slow.next\n        fast = fast.next.next\n\n        if slow == fast:\n            return get_loop_head()\n\n    return None\n\n\nl0 = LinkedList([1, 2, 3, 4, 5])\nl0.node_for_value(5).next = l0.node_for_value(3)\n\nl1 = LinkedList([1, 2, 3, 4, 5])\n\n\n@pytest.mark.parametrize(\"linked_list, expected_result\", [\n    # @formatter:off\n    (l0, l0.node_for_value(3)),\n    (l1, None),\n    # @formatter:on\n])\ndef test_algorithm(linked_list, expected_result):\n    assert get_loop(linked_list) == expected_result\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch02_linked_lists/palindrome.py",
    "content": "import pytest\n\nfrom linked_list import (\n    LinkedList,\n    Node,\n)\n\n\ndef is_palindrome_simple(linked_list: LinkedList) -> bool:\n    values = linked_list.values\n    return values == values[::-1]\n\n\ndef is_palindrome_reverse(linked_list: LinkedList) -> bool:\n    def reverse_list() -> Node:\n        head, node = None, linked_list.head\n\n        while node:\n            new_node = Node(data=node.data)\n            new_node.next = head\n            head = new_node\n\n            node = node.next\n\n        return head\n\n    normal_node = linked_list.head\n    reversed_node = reverse_list()\n\n    while normal_node and reversed_node:\n        if normal_node.data != reversed_node.data:\n            return False\n        normal_node = normal_node.next\n        reversed_node = reversed_node.next\n\n    return not normal_node and not reversed_node\n\n\ndef is_palindrome_slow_fast_runner(linked_list: LinkedList) -> bool:\n    slow, fast = linked_list.head, linked_list.head\n    stack = []\n\n    while fast and fast.next:\n        stack.append(slow.data)\n        slow = slow.next\n        fast = fast.next.next\n\n    if fast:\n        slow = slow.next\n\n    while slow:\n        if stack and stack.pop() != slow.data:\n            return False\n        slow = slow.next\n\n    return True\n\n\n@pytest.mark.parametrize(\"values, expected_result\", [\n    # @formatter:off\n    ([1, 2, 3, 4], False),\n    ([1, 2, 2, 2], False),\n    ([1, 2, 2, 1], True),\n    ([1, 2, 1],    True),\n    ([1],          True),\n    ([],           True)\n    # @formatter:on\n])\n@pytest.mark.parametrize(\"function\", [\n    is_palindrome_simple,\n    is_palindrome_reverse,\n    is_palindrome_slow_fast_runner,\n])\ndef test_algorithm(function, values, expected_result):\n    linked_list = LinkedList(values)\n    assert function(linked_list) == expected_result\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch02_linked_lists/partition.py",
    "content": "from typing import Tuple\n\nimport pytest\nfrom linked_list import LinkedList\n\n\ndef partition(linked_list: LinkedList, partition_val: int) -> Tuple[LinkedList, LinkedList]:\n    l1, l2 = LinkedList(data=[]), LinkedList(data=[])\n    node = linked_list.head\n\n    while node:\n        if node.data < partition_val:\n            l1.append(node.data)\n        else:\n            l2.append(node.data)\n        node = node.next\n\n    return l1, l2\n\n\n@pytest.mark.parametrize(\"values, partition_val, expected_values\", [\n    # @formatter:off\n    ([1, 2, 3, 4, 5], 3, ([1, 2],          [3, 4, 5])),\n    ([1, 2, 3, 4, 5], 0, ([],              [1, 2, 3, 4, 5])),\n    ([1, 2, 3, 4, 5], 6, ([1, 2, 3, 4, 5], [])),\n    # @formatter:on\n])\ndef test_algorithm(values, partition_val, expected_values):\n    linked_list = LinkedList(values)\n    l1, l2 = partition(linked_list, partition_val)\n    assert (l1.values, l2.values) == expected_values\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch02_linked_lists/remove_dups.py",
    "content": "import pytest\n\nfrom linked_list import LinkedList\n\n\ndef remove_duplicates_buffer(linked_list: LinkedList) -> LinkedList:\n    unique_data = set()\n    prev, current = None, linked_list.head\n\n    while current:\n        if current.data in unique_data:\n            prev.next = current.next\n        else:\n            unique_data.add(current.data)\n            prev = current\n        current = current.next\n\n    return linked_list\n\n\ndef remove_duplicates_no_buffer(linked_list: LinkedList) -> LinkedList:\n    current = linked_list.head\n\n    while current:\n        runner = current\n\n        while runner.next:\n            if current.data == runner.next.data:\n                runner.next = runner.next.next\n            else:\n                runner = runner.next\n\n        current = current.next\n\n    return linked_list\n\n\n@pytest.mark.parametrize(\"values, expected_result\", [\n    # @formatter:off\n    ([],           []),\n    ([1, 1],       [1]),\n    ([1, 1, 0],    [1, 0]),\n    ([1, 1, 1, 1], [1]),\n    ([0, 1, 0, 1], [0, 1]),\n    ([1, 2, 3, 4], [1, 2, 3, 4]),\n    # @formatter:on\n])\n@pytest.mark.parametrize(\"function\", [\n    remove_duplicates_buffer,\n    remove_duplicates_no_buffer\n])\ndef test_algorithm(function, values, expected_result):\n    linked_list = LinkedList(values)\n    assert function(linked_list).values == expected_result\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch02_linked_lists/return_kth_to_last.py",
    "content": "from typing import Optional\n\nimport pytest\n\nfrom linked_list import (\n    LinkedList,\n    Node,\n)\n\n\ndef return_kth_to_last_simple(linked_list: LinkedList, k: int) -> int:\n    node = linked_list.head\n    position, i = len(linked_list.values) - k, 0\n\n    if position < 0:\n        return -1\n\n    while node and i < position:\n        node = node.next\n        i += 1\n\n    return node.data\n\n\ndef return_kth_to_last_simplest(linked_list: LinkedList, k: int) -> int:\n    values = linked_list.values\n    size = len(values)\n\n    return values[size - k] if size - k >= 0 else -1\n\n\ndef return_kth_to_last_recursive(linked_list: LinkedList, k: int) -> int:\n    found_value = None\n\n    def _return_kth_to_last(node: Optional[Node]) -> int:\n        if not node:\n            return 0\n\n        index = _return_kth_to_last(node.next) + 1\n\n        if index == k:\n            nonlocal found_value\n            found_value = node.data\n\n        return index\n\n    _return_kth_to_last(linked_list.head)\n\n    return found_value if found_value else -1\n\n\ndef return_kth_to_last_iterative(linked_list: LinkedList, k: int) -> int:\n    p1, p2 = linked_list.head, linked_list.head\n\n    for _ in range(k):\n        if not p1:\n            return -1\n        p1 = p1.next\n\n    while p1:\n        p1 = p1.next\n        p2 = p2.next\n\n    return p2.data\n\n\n@pytest.mark.parametrize(\"values, k, expected_result\", [\n    # @formatter:off\n    ([1, 2, 3], 1, 3),\n    ([1, 2, 3], 2, 2),\n    ([1, 2, 3], 3, 1),\n    ([1, 2, 3], 4, -1),\n    # @formatter:on\n])\n@pytest.mark.parametrize(\"function\", [\n    return_kth_to_last_simple,\n    return_kth_to_last_simplest,\n    return_kth_to_last_recursive,\n    return_kth_to_last_iterative,\n])\ndef test_algorithm(function, values, k, expected_result):\n    linked_list = LinkedList(values)\n    assert function(linked_list, k) == expected_result\n"
  },
  {
    "path": "books/cracking-coding-interview/src/ch02_linked_lists/sum_lists.py",
    "content": "import pytest\n\nfrom linked_list import (\n    LinkedList,\n    Node,\n)\n\n\ndef sum_lists(list_0: LinkedList, list_1: LinkedList) -> LinkedList:\n    result, remainder = [], 0\n    node_0, node_1 = list_0.head, list_1.head\n\n    def add_aligned_lists() -> None:\n        nonlocal node_0, node_1, result, remainder\n        while node_0 and node_1:\n            result.append((node_0.data + node_1.data + remainder) % 10)\n            remainder = 1 if (node_0.data + node_1.data + remainder) >= 10 else 0\n            node_0, node_1 = node_0.next, node_1.next\n\n    def align_remaining_list(node: Node) -> None:\n        nonlocal result, remainder\n        while node:\n            result.append((node.data + remainder) % 10)\n            remainder = 1 if (node.data + remainder) >= 10 else 0\n            node = node.next\n\n    add_aligned_lists()\n    align_remaining_list(node_0)\n    align_remaining_list(node_1)\n\n    if remainder:\n        result.append(remainder)\n\n    return LinkedList(result)\n\n\n@pytest.mark.parametrize(\"list_0, list_1, expected_result\", [\n    # @formatter:off\n    ([7, 1, 6], [5, 9, 2], [2, 1, 9]),\n    ([1, 7, 1], [3],       [4, 7, 1]),\n    ([9, 9, 9], [1],       [0, 0, 0, 1]),\n    ([7, 1],    [3, 1],    [0, 3]),\n    ([7, 1],    [3],       [0, 2]),\n    # @formatter:on\n])\ndef test_algorithm(list_0, list_1, expected_result):\n    list_0, list_1 = LinkedList(list_0), LinkedList(list_1)\n    assert sum_lists(list_0, list_1).values == expected_result\n"
  },
  {
    "path": "books/ddd.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Domain-Driven Design: Tackling Complexity in the Heart of Software\n\nBook by Eric Evans\n\n- [Chapter 1: Crunching Knowledge](#chapter-1-crunching-knowledge)\n- [Chapter 2: Communication and the Use of Language](#chapter-2-communication-and-the-use-of-language)\n- [Chapter 3: Binding Model and Implementation](#chapter-3-binding-model-and-implementation)\n- [Chapter 4: Isolating the Domain](#chapter-4-isolating-the-domain)\n- [Chapter 5: A Model Expressed in Software](#chapter-5-a-model-expressed-in-software)\n- [Chapter 6: The Life Cycle of a Domain Object](#chapter-6-the-life-cycle-of-a-domain-object)\n- [Chapter 7: Using the Language: An Extended Example](#chapter-7-using-the-language-an-extended-example)\n- [Chapter 8: Breakthrough](#chapter-8-breakthrough)\n- [Chapter 9: Making Implicit Concepts Explicit](#chapter-9-making-implicit-concepts-explicit)\n- [Chapter 10: Supple Design](#chapter-10-supple-design)\n- [Chapter 11: Applying Analysis Patterns](#chapter-11-applying-analysis-patterns)\n- [Chapter 12: Relating Design Patterns to the Model](#chapter-12-relating-design-patterns-to-the-model)\n- [Chapter 13: Refactoring Toward Deeper Insight](#chapter-13-refactoring-toward-deeper-insight)\n- [Chapter 14: Managing Model Integrity](#chapter-14-managing-model-integrity)\n- [Chapter 15: Distillation](#chapter-15-distillation)\n- [Chapter 16: Large-Scale Structure](#chapter-16-large-scale-structure)\n\n## Chapter 1: Crunching Knowledge\n\nEffective modeling:\n\n- Binding model and the implementation\n- Cultivating a language based on the model\n- Developing a knowledge-rich model\n- Distilling the model - drop unneeded concepts\n- Brainstorming and experimenting\n\nEffective domain modellers are knowledge crunchers (take a torrent of information and prove it for relevant trickle).\nKnowledge crunching is a collaborative work, typically led by developers in cooperation with domain experts. Early\nversions or prototypes feed experience back into the team and change interpretations.\n\nAll projects lack knowledge - people leave, team reorganisations happen - in general, knowledge is lost. Highly\nproductive teams grow their knowledge continuously - improve technical knowledge along with general domain-modelling\nskills, but also seriously learn about specific domain they are working on. The accumulated knowledge makes them\neffective knowledge crunchers.\n\nSoftware is unable to fill in gaps with common sense - that is why knowledge crunching is important.\n\nExample with overbooking strategy: overbooking check should be extracted from the booking functionality to be more\nexplicit and visible. This is example of domain modeling and securing and sharing knowledge.\n\n## Chapter 2: Communication and the Use of Language\n\nThe domain experts and developers use different language. Experts vaguely describe what they want, developers vaguely\nunderstand. Cost of translation, plus the risk of misunderstanding is too high. A project needs a common language.\n\nUbiquitous language includes: names of classes and prominent operations, terms to discuss. Model based language should\nbe used to describe artefacts, tasks and functionalities.\n\nLanguage may change to fit the discussion better. These changes will lead to refactoring of the code. 
## Chapter 2: Communication and the Use of Language\n\nThe domain experts and developers use different languages. Experts vaguely describe what they want, developers vaguely\nunderstand. The cost of translation, plus the risk of misunderstanding, is too high. A project needs a common language.\n\nThe ubiquitous language includes: names of classes and prominent operations, terms to discuss. Model-based language\nshould be used to describe artefacts, tasks and functionalities.\n\nLanguage may change to fit the discussion better. These changes will lead to refactoring of the code. A change in the\nlanguage is a change to the model.\n\nThe domain-model-based terminology makes conversations more concise: you avoid talking about low-level implementation\ndetails, instead you use high-level concepts (like in the example: Itinerary, Routing Service, Route Specification\ninstead of cargo id, origin and destination, ...).\n\nPlay with the model as you talk about the system, find easier ways to say what you need to say, and take those new ideas\nback down to the diagrams and code.\n\nThe team should use ONE and only ONE language. Almost every conversation is an opportunity for the developers and domain\nexperts to play with the model, deepen understanding and fine-tune it.\n\nThe domain model is something between the business terms developers don't understand and the technical aspects of the\ndesign.\n\nThe vital detail about the design is captured in the code. A well-written implementation should be transparent and\nreveal the model underlying it. The model is not the diagram; diagrams help to communicate and explain the model.\n\nExtreme Programming advocates using no extra design documents at all (usually because they fall out of sync) - the code\nshould speak for itself. This motivates developers to keep code clean and transparent.\n\nHowever, if a document exists, it should not try to do what the code already does well - a document should illuminate\nmeaning, give insight into large-scale structures, clarify design intent, and complement the code and the talking.\n\n## Chapter 3: Binding Model and Implementation\n\nTightly relating the code to an underlying model gives the code meaning and makes the model relevant. The design must\nmap to the domain model; if not, the correctness of the software is suspect.\n\nModel-Driven Design - discards the dichotomy of analysis model and design to search out a single model that serves both\npurposes (ubiquitous language). Each object in the design plays a conceptual role described in the model. The design\nneeds to reflect the model in a very literal way, so the mapping is obvious. The code becomes an expression of the\nmodel.\n\nModel-Driven Design is hard to accomplish in procedural languages like C or Fortran. This approach is reserved for\nobject-oriented programming languages.\n\nThe implementation model should not be exposed to the user.\n\nPeople responsible for the implementation should participate in modeling. Strict separation of responsibilities is\nharmful. Modeling and implementation are coupled in model-driven design. Any technical person contributing to the model\nmust spend some time touching the code. Every developer must be involved in some level of discussion about the model.\n\n## Chapter 4: Isolating the Domain\n\nLayered Architecture - the essential principle is that any element of a layer depends only on other elements in the same\nlayer or on elements of the layers beneath it. Each layer specialises in a particular aspect of a computer program. The\nmost commonly used layers:\n\n- UI (Presentation) Layer - showing information to the user and interpreting the user's commands.\n- Application Layer - this layer does not contain business logic, but only coordinates tasks and delegates work to\n  collaborations of domain objects in the next layer down.\n- Domain (Model) Layer - responsible for representing concepts of the business, information about the business situation\n  and business rules. This layer is the heart of business software.\n
- Infrastructure Layer - generic technical capabilities that support the higher layers (message sending, drawing widgets\n  on the UI, ...), may also support the pattern of interactions between the 4 layers through an architectural framework.\n\nPartition a complex program into layers, develop a design within each layer that is cohesive and that depends only on\nthe layers below. Concentrate all the code related to the domain model in one layer and isolate it from the rest of the\nuser interface, application and infrastructure code.\n\nThe domain objects, free of the responsibility of displaying themselves, storing themselves, managing application tasks\nand so forth, can be focused on expressing the domain model. This allows the model to evolve and become rich enough and\nclear enough to capture essential business knowledge and put it to work.\n\nSuch separation allows a much cleaner design for each layer, especially because they tend to evolve at different paces.\n\nUpper layers can use or manipulate elements of lower ones straightforwardly by calling their public interfaces.\n\nDomain-driven design requires only one particular layer to exist.\n\n## Chapter 5: A Model Expressed in Software\n\nASSOCIATIONS. For every traversable association in the model, there is a mechanism in the software with the same\nproperties. Constraints on associations should be included in the model and the implementation (e.g. president of ...\nfor a period of time); they make the model more precise and the implementation easier to maintain.\n\nENTITIES. Object modeling tends to lead us to focus on the attributes of an object, but the fundamental concept of an\nentity is an abstract continuity threading through a life cycle and even passing through multiple forms. Sometimes such\nan object must be matched with another object even though their attributes differ.\n\nTransactions in a banking application: two deposits of the same amount to the same account on the same day are still\ndistinct transactions. They have identity and are entities.\n\n> When an object is distinguished by its identity, rather than its attributes, make this primary to its definition in\n> the model. Keep the class definition simple and focused on life cycle continuity and identity. Define a means of\n> distinguishing each object regardless of its form or history.\n\nIdentity - this may simply mean a unique identifier.\n\nEach entity must have an operational way of establishing its identity with another object - distinguishable even from\nanother object with the same descriptive attributes.\n\nDefining identity demands understanding of the domain.\n\nVALUE OBJECTS. An object that represents a descriptive aspect of the domain with no conceptual identity. These are\nobjects that describe things. When you care only about the attributes of an element of the model, classify it as a value\nobject.
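\n\nA minimal sketch of the entity / value object distinction (my own illustration; the class names are made up):\n\n```python\nfrom dataclasses import dataclass\n\n\n@dataclass(frozen=True)\nclass Money:\n    # Value object: immutable, equality is based purely on attributes\n    amount: int\n    currency: str\n\n\nclass Transaction:\n    # Entity: equality is based on identity - two deposits with equal attributes stay distinct\n    def __init__(self, transaction_id: str, amount: Money) -> None:\n        self.transaction_id = transaction_id\n        self.amount = amount\n\n    def __eq__(self, other: object) -> bool:\n        return isinstance(other, Transaction) and self.transaction_id == other.transaction_id\n\n    def __hash__(self) -> int:\n        return hash(self.transaction_id)\n```\n\nTwo `Money(10, \"USD\")` values are interchangeable, while two transactions carrying that same amount are not.\n\n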
SERVICES. Some concepts from the domain aren't natural to model as objects. Forcing the required domain functionality to\nbe the responsibility of an entity or value either distorts the definition of a model-based object or adds meaningless\nartificial objects. A service is an operation offered as an interface that stands alone in the model, without\nencapsulating state. The name *service* emphasises the relationship with other objects. Services have to be stateless.\n\nMODULES. Many don't consider modules as part of the model. Yet it isn't just code being divided into modules, but\nconcepts. Low coupling between modules minimises the cost of understanding their place in the design. It is possible to\nanalyse the contents of one module with a minimum of reference to others that interact.\n\nChoose modules that tell the story of the system and contain a cohesive set of concepts. Give the modules names that\nbecome part of the ubiquitous language. Modules and their names should reflect insight into the domain.\n\nModules need to co-evolve with the rest of the model. This means refactoring modules right along with the model and\ncode. But this refactoring often doesn't happen.\n\nUse packaging to separate the domain layer from other code. Otherwise, leave as much freedom as possible to the domain\ndevelopers to package the domain objects in ways that support their model and design choices.\n\n## Chapter 6: The Life Cycle of a Domain Object\n\nThe challenges:\n\n- Maintaining object integrity throughout the life cycle\n- Preventing the model from getting swamped by the complexity of managing the life cycle\n\nThese issues can be addressed using 3 patterns.\n\nAGGREGATES. It is difficult to guarantee the consistency of changes to objects in a model with complex associations.\nInvariants need to be maintained that apply to closely related groups of objects, not just discrete objects. Yet\ncautious locking schemes cause multiple users to interfere pointlessly with each other and make a system unusable. An\naggregate is a cluster of associated objects that we treat as a unit for the purpose of data changes. Each aggregate has\na root and a boundary. Choose one entity to be the root of each aggregate, and control all access to the objects inside\nthe boundary through the root. Allow external objects to hold references to the root only.\n\nFACTORIES. When creation of an object, or an entire aggregate, becomes complicated or reveals too much of the internal\nstructure, factories provide encapsulation (assembly of a car: cars are never assembled and driven at the same time,\nthere is no value in combining both of these functions into the same mechanism). Creation of an object can be a major\noperation by itself, but complex assembly operations do not fit the responsibility of the created objects. Combining\nsuch responsibilities can produce ungainly designs that are hard to understand. Making the client direct construction\nmuddies the design of the client, breaches encapsulation of the assembled object or aggregate, and overly couples the\nclient to the implementation of the created object.\n\nTwo basic requirements for any good factory:\n\n1. Each creation method is atomic and enforces all invariants of the created object or aggregate.\n2. The factory should be abstracted to the type desired, rather than the concrete class created.\n\nREPOSITORIES. Associations allow us to find an object based on its relationship to another. But we must have a starting\npoint for a traversal to an entity or value in the middle of its life cycle. For each type of object that needs global\naccess, create an object that can provide the illusion of an in-memory collection of all objects of that type. Set up\naccess through a well-known global interface. Provide methods to add and remove objects, which will encapsulate the\nactual insertion or removal of data in the data store. Provide methods that select objects based on some criteria and\nreturn objects. Provide repositories only for aggregate roots that actually need direct access. Keep the client focused\non the model, delegating all object storage and access to the repositories.\n\nA repository provides methods that allow a client to request objects matching some criteria.
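\n\nA minimal sketch of the pattern (my own illustration, reusing the hypothetical `Transaction` entity from the sketch\nabove):\n\n```python\nfrom abc import ABC, abstractmethod\nfrom typing import Optional\n\n\nclass TransactionRepository(ABC):\n    # The model-facing interface - clients never see the underlying data store\n    @abstractmethod\n    def add(self, transaction: Transaction) -> None:\n        ...\n\n    @abstractmethod\n    def with_id(self, transaction_id: str) -> Optional[Transaction]:\n        ...\n\n\nclass InMemoryTransactionRepository(TransactionRepository):\n    # Provides the illusion of an in-memory collection quite literally; a real implementation would hide SQL instead\n    def __init__(self) -> None:\n        self._transactions = {}\n\n    def add(self, transaction: Transaction) -> None:\n        self._transactions[transaction.transaction_id] = transaction\n\n    def with_id(self, transaction_id: str) -> Optional[Transaction]:\n        return self._transactions.get(transaction_id)\n```\n\n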
## Chapter 7: Using the Language: An Extended Example\n\nThe model organises domain knowledge and provides a language for the team. Each object in the model has a clear meaning.\n\nTo prevent domain responsibilities from being mixed with those of other parts of the system, apply a layered\narchitecture.\n\nModeling and design are not a constant forward process. They will grind to a halt unless there is frequent refactoring\nto take advantage of new insights to improve the model and the design.\n\nThe real challenge is to actually find an incisive model, one that captures subtle concerns of the domain experts and\ncan drive a practical design. Ultimately, we hope to develop a model that captures a deep understanding of the domain.\n\nRefactoring is the redesign of software in ways that do not change its functionality. Rather than making elaborate\nup-front design decisions, developers take code through a continuous series of small, discrete design changes, each\nleaving existing functionality unchanged while making the design more flexible or easier to understand.\n\nInitial models usually are naive and superficial, based on shallow knowledge. Versatility, simplicity and explanatory\npower come from a model that is truly in tune with the domain.\n\nYou will usually depend on creativity and trial and error to find good ways to model the concepts you discover.\n\n## Chapter 8: Breakthrough\n\nThe returns from refactoring are not linear. Usually there is a marginal return for a small effort, and the small\nimprovements add up.\n\nSlowly but surely, the team assimilates knowledge and crunches it into a model. Each refinement of code and model gives\ndevelopers a clearer view. This clarity creates the potential for a breakthrough.\n\nDon't become paralysed trying to bring about a breakthrough. The possibility usually comes after many modest\nrefactorings. Most of the time is spent making piecemeal improvements, with model insights emerging gradually during\neach successive refinement.\n\nDon't hold back from modest improvements, which gradually deepen the model, even if confined within the same general\nconceptual framework.\n\n## Chapter 9: Making Implicit Concepts Explicit\n\nA deep model has power because it contains the central concepts and abstractions that can succinctly and flexibly\nexpress essential knowledge of the user's activities, their problems and their solutions.\n\nThe first step is to somehow represent the essential concepts of the domain in the model. Refinement comes later, after\nsuccessive iterations of knowledge crunching and refactoring. But this process really gets into gear when an important\nconcept is recognised and made explicit in the model and design.\n\nThe transformation of a formerly implicit concept into an explicit one is a breakthrough that leads to a deep model.\nMore often, though, the breakthrough comes later, after a number of important concepts are explicit in the model.\n\nListen to the language the domain experts use. Are there terms that succinctly state something complicated? Are they\ncorrecting your word choice? Do the puzzled looks on their faces go away when you use a particular phrase? These are\nhints of a concept that might benefit the model.\n\nConstraints make up a particularly important category of model concepts.
They often emerge implicitly, and expressing\nthem explicitly can greatly improve a design. Sometimes constraints find a natural home in an object or a separate\nmethod.\n\nSpecification - a predicate that determines whether an object satisfies some criteria.
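\n\nA tiny sketch of a specification (my own illustration; the invoice example is made up):\n\n```python\nfrom dataclasses import dataclass\nfrom datetime import date\n\n\n@dataclass\nclass Invoice:\n    due_date: date\n\n\nclass OverdueSpecification:\n    # The predicate gets a name from the ubiquitous language and can be tested and combined on its own\n    def __init__(self, current_date: date) -> None:\n        self.current_date = current_date\n\n    def is_satisfied_by(self, invoice: Invoice) -> bool:\n        return invoice.due_date < self.current_date\n```\n\n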
## Chapter 10: Supple Design\n\nThe ultimate purpose of software is to serve users. But first, that same software has to serve developers. This is\nespecially true in a process that emphasises refactoring.\n\nWhen software with complex behaviour lacks a good design, it becomes hard to refactor or combine elements. Duplication\nstarts to appear as soon as a developer isn't confident of predicting the full implications of a computation.\nDuplication is forced when design elements are monolithic, so that the parts cannot be recombined.\n\nSupple design is the complement to deep modelling. Once you have dug out implicit concepts and made them explicit, you\nhave the raw material. Through the iterative cycle, you hammer that material into a useful shape.\n\nIf a developer must consider the implementation of a component in order to use it, the value of encapsulation is lost.\nNames should conform to the ubiquitous language so that team members can quickly infer their meaning. Write a test of a\nbehaviour before creating it, to force your thinking into client developer mode.\n\nPlace as much of the logic of the program as possible into functions: operations that return results with no observable\nside effects.\n\nDecompose design elements (operations, interfaces, classes and aggregates) into cohesive units, taking into\nconsideration your intuition of the important divisions in the domain. Align the model with the consistent aspects of\nthe domain that make it a viable area of knowledge in the first place.\n\nLow coupling is fundamental to object design. When you can, go all the way: eliminate all other concepts from the\npicture. Then the class will be completely self-contained and can be studied and understood alone. Every such\nself-contained class significantly eases the burden of understanding a module.\n\nWhere it fits, define an operation whose return type is the same as the type of its arguments.\n\n## Chapter 11: Applying Analysis Patterns\n\n> Analysis patterns are groups of concepts that represent a common construction in business modelling. It may be\n> relevant to only one domain, or it may span many domains.\n\nAn analysis pattern is a template for solving an organizational, social or economic problem in a professional domain.\n\n## Chapter 12: Relating Design Patterns to the Model\n\nNot all design patterns can be used as domain patterns.\n\nSTRATEGY - Domain models contain processes that are not technically motivated but actually meaningful in the problem\ndomain. When alternative processes must be provided, the complexity of choosing the appropriate process combines with\nthe complexity of the multiple processes themselves, and things get out of hand. Factor the varying parts of a process\ninto a separate \"strategy\" object in the model. Factor apart a rule and the behaviour it governs. Implement the rule or\nsubstitutable process following the strategy design pattern. Multiple versions of the strategy object represent\ndifferent ways the process can be done.\n\nCOMPOSITE - When the relatedness of nested containers is not reflected in the model, common behaviour has to be\nduplicated at each level of the hierarchy, and nesting is rigid. Clients must deal with different levels of the\nhierarchy through different interfaces, even though there may be no conceptual difference they care about. Recursion\nthrough the hierarchy to produce aggregated information is very complicated. Define an abstract type that encompasses\nall members of the composite. Methods that return information are implemented on containers to return aggregated\ninformation about their contents. Leaf nodes implement those methods based on their own values. Clients deal with the\nabstract type and have no need to distinguish leaves from containers.\n\n## Chapter 13: Refactoring Toward Deeper Insight\n\nA multifaceted process. There are 3 things you have to focus on:\n\n1. Live in the domain\n2. Keep looking at things in a different way\n3. Maintain an unbroken dialog with domain experts\n\nSeeking insight into the domain creates a broader context for the process of refactoring.\n\nRefactoring toward deeper insight is a continuing process. Implicit concepts are recognised and made explicit.\nDevelopment suddenly comes to the brink of a breakthrough and plunges through to a deep model.\n\n## Chapter 14: Managing Model Integrity\n\nTotal unification of the domain model for a large system will not be feasible or cost-effective.\n\nBOUNDED CONTEXT. Multiple models are in play on any large project. Yet when code based on distinct models is combined,\nsoftware becomes buggy, unreliable and difficult to understand. Communication among team members becomes confused. It is\noften unclear in what context a model should not be applied. Therefore, explicitly define the context within which a\nmodel applies. Explicitly set boundaries in terms of team organisation, usage within specific parts of the application,\nand physical manifestations such as code bases and database schemas. Keep the model strictly consistent within these\nbounds, but don't be distracted or confused by issues outside.\n\nCONTINUOUS INTEGRATION. When a number of people are working in the same bounded context, there is a strong tendency for\nthe model to fragment. The bigger the team, the bigger the problem, but as few as three or four people can encounter\nserious problems. Yet breaking down the system into ever-smaller contexts eventually loses a valuable level of\nintegration and coherency. Therefore, institute a process of merging all code and other implementation artefacts\nfrequently, with automated tests to flag fragmentation quickly. Relentlessly exercise the ubiquitous language to hammer\nout a shared view of the model as the concepts evolve in different people's heads.\n\nCONTEXT MAP. People on other teams will not be very aware of the context bounds and will unknowingly make changes that\nblur the edges or complicate the interconnections. When connections must be made between different contexts, they tend\nto bleed into each other. Therefore, identify each model in play on the project and define its bounded context. This\nincludes the implicit models of non-object-oriented subsystems. Name each bounded context, and make the names part of\nthe ubiquitous language. Describe the points of contact between the models, outlining explicit translation for any\ncommunication and highlighting any sharing.\n\nSHARED KERNEL. Uncoordinated teams working on closely related applications can go racing forward for a while, but what\nthey produce may not fit together.
They can end up spending more on translation layers and retrofitting than they would\nhave on continuous integration in the first place, meanwhile duplicating effort and losing the benefits of a common\nubiquitous language. Therefore, designate some subset of the domain that the two teams agree to share. Of course, this\nincludes, along with this subset of the model, the subset of code or of the database design associated with that part of\nthe model. This explicitly shared stuff has special status, and shouldn't be changed without consultation with the other\nteam. Integrate a functional system frequently, but somewhat less often than the pace of continuous integration within\nthe teams. At these integrations, run the tests of both teams.\n\nCUSTOMER / SUPPLIER DEVELOPMENT TEAMS. The freewheeling development of the upstream team can be cramped if the\ndownstream team has no veto power over changes, or if procedures for requesting changes are too cumbersome. The upstream\nteam may even be inhibited, worried about breaking the downstream system. Meanwhile, the downstream team can be\nhelpless, at the mercy of upstream priorities. Therefore, establish a clear customer / supplier relationship between the\ntwo teams. In planning sessions, make the downstream team play the customer role to the upstream team. Negotiate the\nbudget and tasks for downstream requirements so that everyone understands the commitment and schedule.\n\nCONFORMIST. When two development teams have an upstream / downstream relationship in which the upstream has no\nmotivation to provide for the downstream team's needs, the downstream team is helpless. Therefore, eliminate the\ncomplexity of translation between bounded contexts by slavishly adhering to the model of the upstream team.\n\nANTI-CORRUPTION LAYER. When a new system is being built that must have a large interface with another, the difficulty of\nrelating the two models can eventually overwhelm the intent of the new model altogether, causing it to be modified to\nresemble the other system's model, in an ad hoc fashion. Therefore, create an isolating layer to provide clients with\nfunctionality in terms of their own domain model. The layer talks to the other system through its existing interface,\nrequiring little or no modification to the other system.
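\n\nA minimal sketch of such a layer (my own illustration; the legacy record fields are made up):\n\n```python\nfrom dataclasses import dataclass\n\n\n@dataclass(frozen=True)\nclass Customer:\n    # A concept expressed in our own domain model\n    customer_id: str\n    full_name: str\n\n\nclass LegacyCrmTranslator:\n    # The anti-corruption layer: it talks to the other system's existing interface\n    # and hands clients objects expressed in our model, keeping the legacy quirks out\n    def __init__(self, legacy_crm) -> None:\n        self.legacy_crm = legacy_crm\n\n    def customer_with_id(self, customer_id: str) -> Customer:\n        record = self.legacy_crm.fetch(customer_id)  # hypothetical legacy call returning a raw dict\n        return Customer(customer_id=record[\"CUST_NO\"], full_name=record[\"FNAME\"] + \" \" + record[\"LNAME\"])\n```\n\n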
SEPARATE WAYS. Integration is always expensive. Sometimes the benefit is small. Therefore, declare a bounded context to\nhave no connection to the others at all, allowing developers to find simple, specialised solutions within this small\nscope.\n\nOPEN HOST SERVICE. When a subsystem has to be integrated with many others, there is more and more to maintain and more\nand more to worry about when changes are made. Therefore, define a protocol that gives access to your subsystem as a set\nof services.\n\nPUBLISHED LANGUAGE. Direct translation to and from the existing domain models may not be a good solution. Those models\nmay be overly complex or poorly factored. Therefore, use a well-documented shared language that can express the\nnecessary domain information as a common medium of communication, translating as necessary into and out of that\nlanguage.\n\n## Chapter 15: Distillation\n\nCORE DOMAIN. In designing a large system, there are so many contributing components, all complicated and all absolutely\nnecessary to success, that the essence of the domain model can be obscured and neglected. Therefore, boil the model\ndown. Make the core small.\n\nGENERIC SUBDOMAINS. Anything extraneous makes the core domain harder to discern and understand. Therefore, identify\ncohesive subdomains that are not the motivation for your project. Factor out generic models of these subdomains and\nplace them in separate modules.\n\nDOMAIN VISION STATEMENT. In later stages of development, there is a need to explain the value of the system in a way\nthat does not require an in-depth study of the model. Therefore, write a short description of the core domain. Keep it\nnarrow. Write this statement early and revise it as you gain new insight.\n\nHIGHLIGHTED CORE. The mental labor of constantly filtering the model to identify key parts absorbs concentration better\nspent on design thinking, and it requires comprehensive knowledge of the model. Therefore, write a brief document that\ndescribes the core domain and the primary interactions among core elements.\n\nCOHESIVE MECHANISMS. Computations sometimes reach a level of complexity that begins to bloat the design. The\nconceptual *what* is swamped by the mechanistic *how*. Therefore, partition a conceptually cohesive mechanism into a\nseparate lightweight framework.\n\nSEGREGATED CORE. Elements in the model may partially serve the core domain and partially play a supporting role. Core\nelements may be tightly coupled to generic ones. Therefore, refactor the model to separate the core concepts from\nsupporting players and strengthen the cohesion of the core while reducing its coupling to other code.\n\nABSTRACT CORE. When there is a lot of interaction between subdomains in separate modules, either many references will\nhave to be created between modules, which defeats much of the value of the partitioning, or the interaction will have to\nbe made indirect, which makes the model obscure. Therefore, identify the most fundamental concepts in the model and\nfactor them into distinct classes, abstract classes or interfaces.\n\n## Chapter 16: Large-Scale Structure\n\nEVOLVING ORDER. Design free-for-alls produce systems no one can make sense of as a whole. Therefore, let this conceptual\nlarge-scale structure evolve with the application, possibly changing to a completely different type of structure along\nthe way. Don't over-constrain the detailed design and model decisions that must be made with detailed knowledge.\n\nSYSTEM METAPHOR. Software designs tend to be very abstract and hard to grasp. Developers and users alike need tangible\nways to understand the system and share a view of the system as a whole. Therefore, organise the design around a\nmetaphor and absorb it into the ubiquitous language.\n\nRESPONSIBILITY LAYERS. When each individual object has handcrafted responsibilities, there are no guidelines, no\nuniformity and no ability to handle large swaths of the domain together. Therefore, look at the conceptual dependencies\nin your model and the varying rates and sources of change of different parts of your domain. Refactor the model so that\nthe responsibilities of each domain object fit neatly within the responsibility of one layer.\n\nKNOWLEDGE LEVEL. In applications in which the roles and relationships between entities vary in different situations,\ncomplexity can explode. Objects end up with references to other types to cover a variety of cases, or with attributes\nthat are used in different ways in different situations. Therefore, create a distinct set of objects that can be used to\ndescribe and constrain the structure and behaviour of the basic model.\n\nPLUGGABLE COMPONENT BEHAVIOUR.
When a variety of applications have to interoperate, all based on the same abstractions\nbut designed independently, translations between multiple bounded contexts limit integration. Duplication and\nfragmentation raise the costs of development and installation. Therefore, distill an abstract core of interfaces and\ninteractions and create a framework that allows diverse implementations of those interfaces to be freely substituted.\n"
  },
  {
    "path": "books/ddia.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems\n\nBook by Martin Kleppmann\n\n- [Chapter 1: Reliable, Scalable and Maintainable Applications](#chapter-1-reliable-scalable-and-maintainable-applications)\n- [Chapter 2: Data Models and Query Languages](#chapter-2-data-models-and-query-languages)\n- [Chapter 3: Storage and Retrieval](#chapter-3-storage-and-retrieval)\n- [Chapter 4: Encoding and Evolution](#chapter-4-encoding-and-evolution)\n- [Chapter 5: Replication](#chapter-5-replication)\n- [Chapter 6: Partitioning](#chapter-6-partitioning)\n- [Chapter 7: Transactions](#chapter-7-transactions)\n- [Chapter 8: The Trouble with Distributed Systems](#chapter-8-the-trouble-with-distributed-systems)\n- [Chapter 9: Consistency and Consensus](#chapter-9-consistency-and-consensus)\n- [Chapter 10: Batch Processing](#chapter-10-batch-processing)\n- [Chapter 11: Stream Processing](#chapter-11-stream-processing)\n- [Chapter 12: The Future of Data Systems](#chapter-12-the-future-of-data-systems)\n\n## Chapter 1: Reliable, Scalable and Maintainable Applications\n\nMay applications today are data-intensive, CPU is not a problem but amount of data, its complexity and speed of change.\nThey are built from standard building blocks: database, cache, search index, stream processing, batch processing. These\nbuilding blocks have many variants.\n\n*Reliability* - performs as expected, tolerates user's mistakes, good performance, continues to work even if things go\nwrong.\n\nHardware faults - on a cluster with 10 000 disks, you can expect, on average, one disk to die per day. Nowadays,\nmulti-machine redundancy is no longer required - only in few use cases.\n\nSoftware errors - e.g. many applications hang simultaneously on 30.06.2012 because of bug in Linux kernel. This kind of\nbugs lie dormant for a long time until they are triggered by an unusual set of circumstances.\n\nHuman errors - humans are responsible for the majority of errors. There are measures that can be taken in order to\nprevent the errors:\n\n- well-defined abstractions, easy to use tools, interfaces that discourage doing the wrong things\n- provide fully functional non-production sandbox environment where people can explore and experiment with real data\n- test thoroughly at all levels (unit tests, integration, ...)\n- provide tools that can recompute the data in case of errors in the past\n- set up detailed monitoring\n\n*Scalability* - system's ability to cope with increased load. Load can be described with a few numbers (load parameters)\n, e.g. requests per second, read/write ratio, number of simultaneous connections, hit rate on cache or something else.\n\n*Describing performance*\n\nResponse times (client waiting time) vary, always look at averages or medians (p50). In order to know how bad you\noutliers are you need to look at 95th, 99th and 99.9th percentiles. High percentiles (tail latencies) are important\nbecause they directly affect users' experience. Anyhow, optimising 99.99th percentile might be really expensive.\n\nSLO (service level objectives) and SLA (service level agreements) - contracts that define the expected performance and\navailability of a service. Example SLA: service up and median response time < 200 ms, 99th percentile < 1s. 
High\npercentiles are extremely important in backend services that are called multiple times as part of serving a single\nend-user request.\n\n*Approaches for coping with load*\n\nArchitecture might need to be reworked on every order-of-magnitude load increase. Just because an application can handle\n2x the load, it doesn't mean it will handle 10x that load.\n\nScaling up / vertical scaling - moving to a more powerful machine. Scaling out / horizontal scaling - distributing the\nload across multiple machines.\n\nDistributing stateless services across multiple machines is easy; taking stateful data systems from a single node to a\ndistributed setup can introduce a lot of additional complexity. There is no single, universal approach for all\napplications, design is very often highly specific.\n\n*Maintainability*\n\nDesign and build systems that will minimise pain during maintenance. Make it easy to understand for new engineers. Allow\nfor easy changes, adapting for unanticipated use cases as requirements change.\n\n*Simplicity*\n\nA project's complexity grows with time, and this slows everyone down. Symptoms of complexity:\n\n- explosion of state space\n- tight coupling of modules\n- tangled dependencies\n- inconsistent naming and terminology\n- special-casing to work around issues\n\nComplex software makes it easy to introduce bugs and makes it harder to spot hidden assumptions, unintended\nconsequences and many more. **Simplicity should be a key goal for the systems we build**. One of the best tools for\nremoving complexity is *abstraction*. A great abstraction can hide implementation details behind a clean interface.\n\n*Evolvability*\n\nRequirements change, you learn new facts, new use cases emerge, priorities change, etc. Agile provides a framework for\nadapting to change. Modify the system and adapt it to changing requirements - pay attention to simplicity and\nabstractions.\n\n## Chapter 2: Data Models and Query Languages\n\n*Relational Model vs Document Model*. Relational databases turned out to generalise very well. NoSQL (*Not Only SQL*) is\nthe latest attempt to overthrow the relational model's dominance.\n\nDriving forces behind NoSQL:\n\n- a need for greater scalability - very large datasets / very high write throughput\n- many open source projects\n- specialised query operations\n- frustration with the restrictiveness of the relational model\n\nA rule of thumb: if you are duplicating values that could be stored in just one place, the schema is not normalised.\n\nMany-to-many relationships are widely used in relational databases, NoSQL reopened the debate on how best to represent\nsuch relationships.\n\nIf your data has a document-like structure, then it's probably a good idea to use a document model. The relational\ndatabase and its shredding (splitting a document-like structure into multiple tables) can lead to unnecessarily\ncomplicated application code.\n\nProblems with the document model: you cannot access a nested object directly - you need to use an access path - and it\ndoes not perform well with many-to-many relationships.\n\nDatabase schemas can be compared to languages: relational - compiled language with static typing, document - dynamic\n(runtime) type checking - schema on read.\n\nData locality - because document databases store a document as one continuous string - JSON, XML, ... - often access\nwill be faster because of locality; if data is split across multiple tables -> multiple disks -> more disk seeks -> more\ntime required. 
However, the document database will need to load the entire document even if you need only a small portion of it.\n\n*Query Languages for Data*\n\nSQL is declarative - you define what you want, and it is up to the computer to determine how to get this data. Most\nprogramming languages are imperative - you define how to process the data.\n\n*MapReduce Querying*\n\nMapReduce - programming model for processing large amounts of data in bulk across many machines. A limited form of\nMapReduce is supported by some NoSQL datastores. Something between declarative and imperative programming.\n\n*Graph-Like Data Models*\n\nA very good approach for data with many-to-many relationships. Each vertex has: ID, set of incoming edges, set of\noutgoing edges, a collection of properties (key-value pairs). Each edge has: ID, the tail vertex, the head vertex, a\nlabel describing the type of relationship, a collection of properties (key-value pairs).\n\nGraphs give great flexibility in modeling relationships, e.g. France has departments and regions, whereas the US has\ncounties and states.\n\nCypher is a declarative query language for property graphs, created for Neo4j DB, e.g.: find the names of all people who\nemigrated from the US to Europe:\n\n```cypher\nMATCH\n  (person) -[:BORN_IN]->  () -[:WITHIN*0..]-> (us:Location {name: \"United States\"}),\n  (person) -[:LIVES_IN]-> () -[:WITHIN*0..]-> (eu:Location {name: \"Europe\"})\nRETURN person.name\n```\n\nThis can be expressed in SQL (recursive common table expressions), but with one difficulty: `LIVES_IN` might point to\nany location (region, country, state, continent), and here we are interested only in the US and Europe. 4 lines in\nCypher vs 29 lines in SQL.\n\n*Triple-Stores*\n\nVery similar to the graph model, all information is stored in the form of very simple three-part\nstatements: `(subject, predicate, object)`, e.g. `(Jim, likes, bananas)`.\n\n## Chapter 3: Storage and Retrieval\n\nIn order to tune a storage engine to perform well on your kind of workload, you need to have a rough idea of what the\nstorage engine is doing under the hood.\n\n*Data Structures That Power Your Database*\n\nHash indexes - for example, key and offset pairs. SSTable - Sorted String Table.\n\nB-Trees - the most widely used indexing structure, the standard index implementation for almost all relational databases\nand for many non-relational databases. B-trees keep key-value pairs sorted by key, which allows efficient key-value\nlookups. The number of references to child pages in one page of the B-tree is called the branching factor. A B-tree with\n*n* keys always has a depth of *O(log n)*. Most databases can fit into a B-tree that is 3-4 levels deep. A 4-level tree\nof 4KB pages with a branching factor of 500 can store up to 256TB of data.\n\nIn order to make the db resilient to crashes, it is common for B-tree implementations to include an additional data\nstructure on disk - WAL - write-ahead log - an append-only file; every B-tree modification must be written to it before\nit can be applied on the pages of the tree. 
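\nA minimal sketch of the WAL idea (a toy, not how any real engine encodes pages): append and fsync the change before\ntouching the pages, so a crash can be recovered by replaying the log:\n\n```python\nimport json, os\n\nclass TinyWAL:\n    def __init__(self, path):\n        self.log = open(path, 'a+')\n\n    def set(self, pages, key, value):\n        # 1. append the change to the log and force it to disk...\n        print(json.dumps({'key': key, 'value': value}), file=self.log, flush=True)\n        os.fsync(self.log.fileno())\n        # 2. ...only then apply it to the pages (here just a dict)\n        pages[key] = value\n\n    def recover(self):\n        # after a crash: replay the log to rebuild a consistent state\n        self.log.seek(0)\n        return {rec['key']: rec['value'] for rec in map(json.loads, self.log)}\n```\n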
When the db crashes, this log is used to restore the B-tree to a consistent state.\n\nLSM-tree:\n\n- faster for writes\n- can be compressed better, thus often produces smaller files on disk\n- lower storage overheads\n- the compaction process can sometimes interfere with the performance of ongoing reads and writes\n- if throughput is high and compaction is not configured carefully, compaction might not keep up with the rate of\n  incoming writes\n\nB-trees are old and so well optimised that they can deliver good, consistent performance for many workloads.\n\nKey-value indexes are like the primary key index in the relational model. It is also common to have secondary indexes.\nThey don't have to be unique - this can be solved, for example, by appending the row ID.\n\nClustered index - storing all row data within the index.\n\nConcatenated index - multi-column index, combines several fields into one key by appending one column to another.\n\nWhat if you search for misspelled or similar data? Lucene is able to search text for words within a certain edit\ndistance.\n\nThe data structures discussed so far are specific to disks. However, RAM is becoming cheaper and many datasets are\nfeasible to keep in memory. This led to the development of in-memory databases. Some in-memory key-value stores\n(Memcached) are intended for caching, data can be lost on machine restart. Other in-memory databases aim for durability,\nwhich can be achieved with special battery-powered RAM, by writing a log of changes to disk or by replicating memory\nstate to other machines. When restarted it needs to load the data from disk or from a replica. Even though it is an\nin-memory database, a disk is still used. Other in-memory databases with a relational model: VoltDB, MemSQL, Oracle\nTimesTen. RAMCloud is a key-value store; Redis and Couchbase provide weak durability by writing to disk asynchronously.\n\nIn-memory databases achieve better performance.\n\nOLTP - Online Transaction Processing - interactive applications - look up a small number of records, insert or update\nrecords based on the user's activity.\n\nOLAP - Online Analytic Processing - the second access pattern - analytic queries.\n\nIn the 90s companies stopped using OLTP systems for analytics purposes and shifted to running analytics on a\nseparate database. This separate database is called a data warehouse.\n\nData warehouse - a separate database that analysts can query without affecting OLTP operations. A read-only copy of the\ndata. Data is extracted from OLTP databases and transformed into an analysis-friendly schema. The process of getting data\ninto the warehouse is known as Extract-Transform-Load. The biggest advantage of OLAP for analysis is that this database\ncan be optimised for large queries.\n\nMany data warehouses use a star schema (dimensional modeling). A variation of this schema is called the snowflake\nschema. Snowflakes are more normalised than stars.\n\nFact tables are often 100 columns wide, however `SELECT * ` queries are rarely used. In most OLTP databases, storage is\nlaid out in a row-oriented fashion. How to execute queries more efficiently? The idea behind column-oriented storage is\nsimple: don't store all the values from one row together, but store all values from each column together. e.g. one file\n= one column - much faster than parsing each row.\n\nColumns can be compressed using, for example, bitmap encoding - unique values encoded using bits. Efficient in situations\nwhere there are only a few unique values and millions of records. 
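\nA toy illustration of bitmap encoding for a column (a sketch, not a real columnar format): one bitmap per distinct\nvalue, one bit per row:\n\n```python\n# a low-cardinality column, as it would sit in a column file\ncolumn = ['red', 'blue', 'red', 'red', 'green', 'blue']\n\n# one bitmap per distinct value: bit i is 1 if row i holds that value\nbitmaps = {value: [int(v == value) for v in column] for value in set(column)}\nprint(bitmaps['red'])  # [1, 0, 1, 1, 0, 0]\n\n# a filter like WHERE color = 'red' becomes a cheap bitwise scan,\n# and long runs of 0s/1s compress well (run-length encoding)\nred_rows = [i for i, bit in enumerate(bitmaps['red']) if bit]\n```\n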
Column compression allows more rows from a column to fit in the L1\ncache.\n\n## Chapter 4: Encoding and Evolution\n\nRolling update / staged rollout - deploying the new version to a few nodes at a time, checking whether the new version\nis running smoothly. With client-side applications you are at the mercy of the user, who may not install the update for\nsome time. This means that old and new versions of the code might co-exist for some time.\n\nBackward compatibility - newer code can read data that was written by older code (normally not hard).\n\nForward compatibility - older code can read data that was written by newer code (this is trickier).\n\nPrograms usually work with data in 2 representations:\n\n- in memory - objects, lists, arrays, trees - data structures optimised for efficient access and manipulation by the CPU\n- byte sequence structures - for example JSON\n\nThe translation from the in-memory representation to a byte sequence is called encoding. The reverse is called decoding\n(also: parsing, deserialization, unmarshalling).\n\nMany programming languages have built-in support for encoding in-memory data structures. Python has pickle, Java has\nSerializable, Ruby has Marshal, however:\n\n- the encoding is tied to the programming language\n- potential source of security issues\n- Java's built-in serialisation is said to have bad performance\n\nIn general, it is a bad idea to use the language's built-in encoding for anything other than very transient purposes.\n\nJSON:\n\n- built-in support in browsers\n- distinguishes strings and numbers\n- good support for unicode, no support for binary strings\n\nXML:\n\n- too verbose\n- no distinction between numbers and strings\n- good support for unicode, no support for binary strings\n\nCSV:\n\n- less powerful than XML and JSON\n- no distinction between numbers and strings\n- no data schema\n\nDespite the flaws of JSON, XML and CSV they are good enough for many purposes, and they will remain popular.\n\nJSON is less verbose than XML, but still uses a lot of space - this might be an issue when you are dealing with\nterabytes of data. This led to the development of binary encodings for JSON - BSON, BJSON, UBJSON, BISON. XML also has\nits binary encodings - WBXML and Fast Infoset. However, none of them are widely adopted.\n\nApache Thrift (Facebook) and Protocol Buffers (Google) are binary encoding libraries that are based on the same\nprinciple: a schema defined in an interface definition language, from which code for encoding and decoding data can be\ngenerated.\n\nField numbers in Apache Thrift are used for more compact encoding (no need to pass field names over the wire -\nCompactProtocol). Required / optional makes no difference to the encoding, this is used at runtime.\n\nEvery field you add after the initial deployment of the schema must be optional or have a default value. Removing is\nlike adding, you can remove only optional fields. Also, with ProtoBuf / Thrift you can never use the same tag number\nagain.\n\nAvro is another binary encoding format, it has optional code generation for dynamically typed programming languages.\n\nData can flow through:\n\n- database\n- services - REST and RPC, services are similar to databases, they allow clients to submit and query data. 
A key design\n  goal of a service-oriented / microservices architecture is to make the application easier to change and maintain by\n  making services independently deployable and evolvable.\n\nREST is not a protocol, but rather a design philosophy that builds upon the principles of HTTP.\n\nSOAP - XML-based protocol for making network API requests.\n\nRPC - Remote Procedure Call - seems convenient at first, but the approach is fundamentally flawed, because a network\nrequest is very different from a local function call:\n\n- a local function is predictable - it can either succeed or fail depending on the input, a network request is\n  unpredictable - the connection might be lost\n- a local function call either returns a result or throws an exception, a network request may return without a result -\n  timeout\n- a retry mechanism might cause duplication (the first request went through), unless you build a deduplication mechanism\n- the duration of a remote call depends on the network congestion\n- when you call a local function you can effectively pass references\n- if client and server use different languages, data translation might end up ugly\n\nDespite these problems RPC is not going away, modern frameworks are more explicit about the fact that a remote call is\ndifferent from a local function invocation.\n\n- message passing - something between databases and services. Similar to RPC because the client's request is\n  delivered with low latency, similar to databases because the message is not sent via a direct network connection but\n  goes through a message broker.\n\nMessage brokers have a couple of advantages compared to RPC:\n\n- can act as a buffer when the recipient is unavailable\n- can automatically redeliver messages\n- the sender does not need to know the recipient's IP address and port\n- one message can be sent to multiple recipients\n- logical decoupling between sender and receiver\n\n## Chapter 5: Replication\n\nShared-Nothing Architecture - each machine or virtual machine running the database is called a node. Each node uses its\nown CPU, RAM and disks independently. Any coordination between nodes is done at the software level using the network.\n\nReplication - means keeping a copy of the same data on multiple machines that are connected via a network. Why?\n\n- to reduce latency - copy close to the users\n- to allow the system to continue working\n- to scale out\n\nIf data is not changing, replication is easy; for dealing with replicated changes, the following algorithms can be used:\nsingle-leader, multi-leader and leaderless replication.\n\nLeaders and Followers - each node (replica) stores a copy of the database. One of the replicas is designated to be the\nleader (master); when clients want to write to the database, they must send their requests to the leader. The other\nreplicas - followers (slaves) - take the log from the leader and update their local copy of the data, applying all\nwrites in the same order as they were processed by the leader. Writes are accepted only by the leader, reads can be\nperformed using any follower.\n\nOn follower failure, if the connection between leader and follower is temporarily interrupted, the follower can recover\neasily, because it knows the last processed transaction from the log. It can request the missing data from the last\nsuccessful transaction onwards. Leader failure is trickier. 
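\nA toy sketch of the leader/follower mechanics described above (made-up structure, no real database): the leader numbers\nwrites in a log, and a follower applies them in order, so after a disconnect it simply asks for everything after the\nlast sequence number it processed:\n\n```python\nclass Leader:\n    def __init__(self):\n        self.log, self.data = [], {}\n\n    def write(self, key, value):\n        self.log.append((len(self.log), key, value))  # (seq, key, value)\n        self.data[key] = value\n\nclass Follower:\n    def __init__(self):\n        self.applied, self.data = 0, {}\n\n    def sync(self, leader):\n        # request everything after the last transaction we processed\n        for seq, key, value in leader.log[self.applied:]:\n            self.data[key] = value\n            self.applied = seq + 1\n```\n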
One of the followers can be promoted to be the new leader, for\nexample the replica with the most recent data (election) - data loss minimisation.\n\nImplementation of Replication Logs:\n\n- Statement-based replication - the leader logs every request (statement) - for a relational database this means that\n  every INSERT / UPDATE / DELETE statement is forwarded to followers, each follower executes the received SQL statement\n  (as if it had been received from a client)\n    - Problems - what about NOW(), RAND() - nondeterministic, what about auto-incrementing fields, what about triggers.\n      There are some workarounds, like sending the request and its result or requiring deterministic transactions.\n- Write-ahead log (WAL) shipping - the log is an append-only sequence of bytes containing all writes, this log can be\n  used to build a replica. This method is used in PostgreSQL and Oracle. The disadvantage of this approach is that the\n  log contains low-level information - like which disk blocks were changed, so replication is closely coupled to the\n  storage engine (or even the storage engine version!).\n- Logical log replication - an alternative approach that uses a different log format for replication - decoupling.\n  Usually a sequence of records describing writes to database tables at the granularity of a row. Easier backward\n  compatibility - leaders and followers can run different engine versions\n- Trigger-based replication - triggers have the ability to log changes to a separate table, from which an external\n  process can read. This allows for replicating, for example, a subset of the data.\n\nProblems with replication lag:\n\n- leader-based replication is cool when we need to scale reads, not necessarily writes - common on the web\n- synchronous replication - a single node failure can make the entire system unavailable\n- asynchronous replication - a follower might fall behind -> inconsistent data (this is a temporary situation, if you\n  stop writing for a while, the followers will catch up and become consistent with the leader - eventual consistency)\n\nReplica lag - anomalies:\n\n- if a user writes and then views, the new data might not yet have reached the replica. Read-your-writes consistency -\n  the needed guarantee that if the user reloads the page, they will always see updates they submitted themselves.\n    - Solution: the owner of the profile views data from the leader, other users view from a replica. There are\n      modifications, for example: if the last update is older than 1m -> view from a replica.\n- when reading from asynchronous followers, a user can see things moving back in time - happens when the user makes\n  several reads from different replicas\n    - Solution: monotonic reads - a guarantee stronger than eventual consistency, if a user makes several reads in\n      sequence, they will not see time go backward (never older data after newer data)\n- consistent prefix reads - if a sequence of writes happens in a certain order, then anyone reading those writes will\n  see them appear in the same order\n    - Solution: make sure that any writes that are causally related to each other are written to the same partition OR\n      use an algorithm that keeps track of causal dependencies\n\nWhen working with an eventually consistent system, it is worth thinking about how the application behaves if the\nreplication lag increases to several minutes or hours.\n\nMulti-Leader Replication - more than one node accepting writes, each leader simultaneously acts as a follower to the\nother leaders. 
It rarely makes sense to use a multi-leader setup within a single datacenter, because the benefits rarely\noutweigh the added complexity, however there are some situations in which this configuration makes sense:\n\n- multi-datacenter operation - one leader in each datacenter; a multi-leader setup requires a conflict resolution\n  mechanism, which can be problematic. Multi-leader replication is often considered dangerous territory that should be\n  avoided if possible.\n- clients with offline operation - for example a calendar app has to work even if it is disconnected from the internet;\n  if you make changes while you are offline, they need to be synced with a server and all other devices. This\n  basically means every device has a local database that acts as a leader. For example CouchDB is designed for this mode\n  of operation.\n\n- collaborative editing - multiple people editing the same document - e.g. Google Docs, a very similar case to the\n  previous one. If you want to guarantee that there will be no editing conflicts, the application must obtain a lock on\n  the document before a user can edit - this collaboration model is equivalent to single-leader replication with\n  transactions on the leader.\n\nHandling Write Conflicts:\n\n- make the conflict detection synchronous - wait for the write to be replicated to all replicas before telling the user\n  that the write was successful\n- avoid conflicts - all writes can go through the same leader, requests from a particular user are always routed to the\n  same datacenter and use the leader in that datacenter for writing and reading.\n- each replica should converge toward a consistent state\n- custom conflict resolution - this might depend on the application, code might be executed on write or on read\n\nAutomatic Conflict Resolution - Amazon is a frequently cited example of surprising effects of a conflict resolution\nhandler - customers were seeing items removed from the cart. Some ideas for automatic conflict resolution:\n\n- conflict-free replicated datatypes - a family of data structures that can be concurrently edited by multiple users\n- merge-able persistent data structures - similar to Git\n- operational transformation - the algorithm behind Google Docs - designed specifically for concurrent edits of an\n  ordered list of items - e.g. a string.\n\nReplication topology describes the communication paths along which writes are propagated from one node to another\n(circular, star, all-to-all).\n\nLeaderless replication - the client sends its writes directly to several replicas, or a coordinator node does this on\nbehalf of the client. When one node is down, it misses some writes. For this reason when a client reads from the\ndatabase, it sends its requests to multiple replicas and uses the value with the highest version number. Eventually all\nthe data will be copied to every replica. 2 approaches for dealing with inconsistent data: repair whenever a client\nnotices an inconsistency, or a background process looking for differences in the data.\n\nFor example in DynamoDB it is possible to set the minimum number of replicas that must have saved the data to mark a\nwrite as valid.\n\nIt is important to monitor replication lag, even if your application can tolerate stale reads.\n\nDynamo-style databases allow several clients to concurrently write to the same key - this means potential conflicts!\nEvents may arrive in a different order at different nodes, due to network delays and partial failures (replicas might\nstore different values). 
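\nThe 'minimum number of replicas' idea above is usually formalised as a quorum: with n replicas, require w\nacknowledgements per write and read from r replicas; if w + r > n, every read overlaps at least one up-to-date replica.\nA toy check (version numbers as described above, values made up):\n\n```python\nn, w, r = 3, 2, 2\nassert w + r > n  # read set and write set must overlap\n\n# each replica stores (version, value); the latest write reached only w replicas\nreplicas = [(2, 'new'), (2, 'new'), (1, 'old')]\n\n# read from any r replicas and keep the value with the highest version number\nread_from = replicas[1:]         # e.g. replicas 1 and 2\nversion, value = max(read_from)  # -> (2, 'new')\n```\n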
In order to become eventually consistent, the replicas should converge toward the same value.\nIt is up to the developer to resolve the conflict:\n\n- last write wins - discard older values\n- detecting happens-before operations (btw. two operations are considered concurrent when neither knows about the other\n  - they don't have to happen at literally the same time)\n- merge concurrently written values\n- use version vectors - a version number per replica and per key, each replica increments its own version number\n\n## Chapter 6: Partitioning\n\nPartitioning - breaking up the data into partitions (each piece of data belongs to exactly one partition). The main\nreason for partitioning is scalability - different partitions can be placed on different nodes.\n\nPartitioning is usually combined with replication. Copies of each partition are stored on multiple nodes. The goal with\npartitioning is to spread the data and the query load evenly across nodes. If every node takes a fair share, 10 nodes\nshould be able to handle 10x as much data. If partitioning is unfair it is called skewed. Skew makes partitioning less\neffective. In order to reduce skew, data needs to be distributed evenly.\n\nOne way is to assign a continuous range of keys to each partition (PARTITION 1: A-B, PARTITION 2: C-D, ...). The ranges\nof keys are not necessarily evenly spaced, for example the majority of entries might start with the letter A. Partition\nboundaries need to be carefully selected by an application developer with domain knowledge. Partitioning by date is\nproblematic too - all writes go to a single partition, whereas the remaining partitions are idle. For example, you could\nsolve this issue by partitioning first by name (for example sensor name) and then by the time, this will balance the\nload.\n\nSkew can be reduced by using a hash function that evenly distributes data across partitions. The partition boundaries\ncan be evenly spaced or chosen pseudorandomly (consistent hashing).\n\nConsistent Hashing - a way of evenly distributing load across an internet-wide system of caches such as a CDN. Uses\nrandomly chosen partition boundaries to avoid the need for central control or distributed consensus.\n\nUsing a hash of the key loses the ability to do efficient range queries (sort order lost).\n\nHashing a key can reduce hot spots, but cannot remove them entirely. For example a celebrity on social media can cause a\nstorm of activity - this may lead to many writes to the same key. It is up to the application developer to handle hot\nspots, e.g. by adding a random prefix to the key.\n\nSecondary indexes are slightly more problematic because they don't identify a record uniquely. There are 2 main\napproaches to partitioning a database with secondary indexes:\n\n- document-based (local index) - each partition has its own (local) partitioned secondary indexes, which means reading\n  requires extra care. I am looking for a red car - the query needs to be scattered to multiple partitions - quite\n  expensive. However, widely used.\n\n- term-based (global index) - instead of each partition having its own secondary index, we can construct a global index.\n  A global index also needs to be partitioned - for example for the secondary key `color:red`, cars with names a-d go to\n  partition 0, the rest to partition 1. Reads are more efficient, writes are slower.\n\nData changes in the database - throughput increases, the dataset grows, a machine can fail. Rebalancing - the process of\nmoving data from one node to another. 
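\nA quick illustration of the rebalancing pitfall warned about below - assigning keys by hash mod N moves almost every key\nwhen N changes (zlib.crc32 stands in for the partitioning hash):\n\n```python\nfrom zlib import crc32\n\nkeys = [f'user:{i}'.encode() for i in range(10_000)]\n\nold = {k: crc32(k) % 4 for k in keys}  # 4 nodes\nnew = {k: crc32(k) % 5 for k in keys}  # one node added\n\nmoved = sum(old[k] != new[k] for k in keys)\nprint(f'{moved / len(keys):.0%} of keys moved')  # ~80%, instead of the ideal ~20%\n```\n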
After rebalancing, data should be shared fairly between nodes; while rebalancing, the\ndatabase should remain available for writes and reads, and only a minimal amount of data should be moved between nodes.\n\nDO NOT USE hash mod N when rebalancing between partitions. The problem with this approach is that the number of nodes\nchanges, which requires moving more data than necessary when a new node is added.\n\nA better solution is to create a fixed number of partitions (more partitions than nodes, e.g. 10 nodes - 1000\npartitions). If a new node is added to the cluster, it can steal a few partitions from the others. The only thing that\nchanges is the partition assignment. This is followed, for example, by ElasticSearch (number of partitions set up at the\nbeginning). Choosing the right number of partitions is difficult.\n\nDynamic partitioning is suitable for key range partitioning.\n\nAutomatic rebalancing can be unpredictable, because it is an expensive operation - rerouting requests and moving a large\namount of data, this can overload the network. For this reason it is a good approach to have a human administrator\nperforming rebalancing.\n\nHow to route a request to a particular partition? How can the system know where the data is? This problem is known as\nservice discovery. The system can keep track of the data in a separate registry. Another possibility is that the client\nconnects to any node; if the node cannot serve the request, the client is forwarded to another node.\n\n## Chapter 7: Transactions\n\nOverhead of transactions < pain of the lack of transactions and coding around that lack.\n\nA transaction is a way for an application to group several reads and writes together into a logical unit. Either the\nentire transaction succeeds (commit) or it fails (abort, rollback). If a transaction fails, the application can retry.\nWith this, error handling is much simpler.\n\nHowever sometimes it might be beneficial to weaken transactions or abandon them entirely (for higher availability).\n\nThe safety guarantees provided by transactions are often described by the ACID acronym. Implementations of ACID might\nvary between DBMSs.\n\n- Atomicity - (atomic refers to something that cannot be broken into smaller parts), if an error happens in the middle\n  of a transaction, it has to be reverted. If a transaction was aborted, the application can be sure that it didn't\n  change anything, so it can be safely retried. Perhaps \"abortability\" would have been a better term than atomicity.\n- Consistency - (a terribly overloaded term) in ACID - certain statements about the data must always be true (for\n  example a correct account balance in a banking system). If a transaction starts with a database that is valid, any\n  writes during the transaction preserve the validity.\n- Isolation - most databases are accessed by several clients at the same time, if they are accessing the same database\n  records, you can run into concurrency problems. Isolation means that concurrently executing transactions are isolated\n  from each other, they cannot step on each other's toes. The classic database textbooks define isolation as\n  serialisability (however this is rarely used because it has a performance penalty).\n- Durability - the promise that once a transaction has committed successfully, any data it has written will not be\n  forgotten, even if there is a hardware fault or the database crashes. 
Anyhow, perfect durability does not exist (for\n  example all backups destroyed at the same time).\n\nACID databases are based on this philosophy: \"if the database is in danger of violating its guarantee of atomicity,\nisolation or durability, it would rather abandon the transaction entirely than allow it to remain half-finished\".\n\nIsolation makes life easier by hiding concurrency issues. In reality serialisability is not that simple, it has a\nperformance cost, therefore it is common for systems to use weaker levels of isolation, which protect against some\nconcurrency issues. Common wisdom: \"Use ACID databases if you are handling financial data\", however many popular\nrelational databases use weak isolation even though they are considered ACID.\n\nRead committed - the most basic level of transaction isolation, makes 2 guarantees:\n\n- no dirty reads - you will only see data that has been committed\n- no dirty writes - you will only overwrite data that has been committed\n\nSnapshot isolation - read committed does not solve all the issues (for example non-repeatable reads - when the data you\nselect changes in the middle of a transaction). Data being stale for a few seconds is not a problem, more problematic\nare long-lasting data inconsistencies. Snapshot isolation is a boon for long-running, read-only queries such as backups\nand analytics. A transaction should see a consistent snapshot of the database, frozen at a particular point in time (so\ndata is not changing while it is being processed). The key principle of snapshot isolation is: readers never block\nwriters and writers never block readers.\n\n`FOR UPDATE` tells the database to lock all rows returned by the query.\n\nSerialisable isolation is usually regarded as the strongest isolation level. It guarantees that even though transactions\nmay execute in parallel, the end result is the same as if they had executed one at a time, serially, without any\nconcurrency. The database prevents all possible race conditions.\n\nThe simplest way of avoiding concurrency problems is to remove the concurrency entirely - one transaction at a time, in\nserial order, on a single thread.\n\nStored procedures gained a bad reputation for various reasons: each db vendor has its own language for stored\nprocedures, code running in a database is difficult to manage (hard to debug, awkward to version and deploy, trickier to\ntest), and a badly written procedure may harm overall DB performance. Modern implementations of stored procedures have\nabandoned PL/SQL and use existing general-purpose programming languages instead.\n\nSerial execution of transactions makes concurrency control much simpler, but limits the transaction throughput of the\ndatabase to the speed of a single CPU core on a single machine. A simple solution is to partition the database, so each\nCPU core has its own partition. However, if a transaction needs to access multiple partitions, the database must\ncoordinate across all the partitions that it touches.\n\nSerial execution is a viable way of achieving serialisable isolation within certain constraints:\n\n- every transaction must be small and fast\n- write throughput must be low enough to be handled on a single CPU core\n- cross-partition transactions are possible, but there is a hard limit to the extent to which they can be used\n\n2PL - Two-Phase Locking - a widely used algorithm for serialisability in databases. 
Similar to \"no dirty writes\" - if two\ntransactions concurrently try to write the same object, the lock ensures that the second writer must wait until the\nfirst one has finished its transaction before it may continue. More specifically:\n\n- If transaction A reads and B wants to write - B must wait until A commits or aborts\n- If A writes and B wants to read, B must wait until A commits or aborts\n\n2PL - writers don't just block other writers, they also block readers and vice-versa. The big downside of 2PL is\nperformance - worse throughput and response times compared to weak isolation (because of the overhead of acquiring and\nreleasing locks). Also called a \"pessimistic concurrency control mechanism\" - better to wait until the situation is safe\nbefore doing anything.\n\nSSI - Serialisable Snapshot Isolation - full serialisability with a small performance penalty compared to snapshot\nisolation. A very young technology - 2008. Called an \"optimistic concurrency control technique\". Instead of blocking\npotentially dangerous transactions, it allows them to proceed, hoping everything will turn out all right. When a\ntransaction wants to commit, the database checks if everything is fine. It performs badly under high contention (many\ntransactions accessing the same object) - many of the transactions need to be aborted.\n\nReads from the database are made from a consistent snapshot, as in snapshot isolation.\n\n## Chapter 8: The Trouble with Distributed Systems\n\nAnything that can go wrong, will go wrong. Working with distributed systems - writing software that runs on several\ncomputers, connected by a network - is fundamentally different from writing software for a single computer.\n\nPartial failure - some parts of the system are broken in an unpredictable way. Partial failures are nondeterministic.\nNondeterminism and partial failures are what makes distributed systems hard to work with.\n\nHigh-performance computing - supercomputers with thousands of CPUs are used for computationally intensive scientific\ncomputing tasks, such as weather forecasting or molecular dynamics.\n\nCloud computing - often associated with multi-tenant data centres, commodity computers connected with an IP network,\non-demand resource allocation and metered billing.\n\nTraditional enterprise data centres are somewhere between these two extremes.\n\nIf we want to make distributed systems work, we must accept the possibility of partial failure and build fault-tolerance\nmechanisms into the software. We need to build a reliable system from unreliable mechanisms (like communication over the\ninternet - the network may fail, bits might be lost, yet it somehow works; engineers managed to build something reliable\non unreliable foundations).\n\nWhat can go wrong when sending a request:\n\n- the request may be lost\n- the request might be waiting in a queue and will be delivered later\n- the remote node may have failed or temporarily stopped responding\n- the request might have been processed but the response was lost or delayed on the way back\n\nNetwork problems can be surprisingly common, even in controlled environments like a datacenter operated by one company\n(about 12 network faults per month in a medium-sized datacenter, half of them disconnecting a single machine and half an\nentire rack). EC2 is notorious for having frequent transient network glitches.\n\nMany systems need to automatically detect faulty nodes, for example: a load balancer needs to stop sending requests to a\nnode that is dead. 
Unfortunately it is hard to tell whether a node is working or not.\n\nA timeout is the only sure way of detecting a fault. The appropriate duration of a timeout is difficult to estimate. The\ntelephone network uses circuits - a fixed, guaranteed amount of bandwidth between 2 callers. On the other hand TCP\ndynamically adapts the rate of data transfer to the available network capacity. TCP is optimised for busy networks; a\ncircuit would not work for the internet's use case.\n\nClocks and time are important; in distributed systems we never know the delay between a message being sent and received.\n\nTime-of-day clocks - return the current date and time according to some calendar. Usually synchronised with NTP (Network\nTime Protocol). Time-of-day clocks are unsuitable for measuring elapsed time (the clock might be reset during a\nmeasurement, because it was desynchronised).\n\nMonotonic clocks - suitable for measuring elapsed time, they are guaranteed to always move forward (a time-of-day clock\nmay jump back in time). NTP may adjust the monotonic clock's frequency if it discovers it is too slow or too fast.\n\nSoftware must be designed on the assumption that the network will occasionally be faulty, and the software must handle\nsuch faults gracefully.\n\n> Distributed systems are different from programs running on a single computer - there is no shared memory, only message\n> passing through an unreliable network with variable delays, and the systems may suffer from partial failures,\n> unreliable clocks and processing pauses.\n\nAlgorithms for distributed systems are designed against a system model that formalises the expected faults:\n\n- synchronous model - assumes bounded network delay, process pauses and clock error, this means you know the delay, and\n  it will not exceed some fixed value. This model is not realistic\n- partially synchronous model - the system behaves like a synchronous one most of the time, but sometimes exceeds the\n  bounds for network delay, process pauses and clock drift\n- asynchronous model - the algorithm may not make any timing assumptions\n- crash-stop faults - the algorithm may assume that a node can fail in only one way - by crashing; once crashed it never\n  comes back\n- crash-recovery faults - a node can fail at any moment, but has some nonvolatile disk storage that is preserved across\n  crashes\n- byzantine faults - nodes may do absolutely anything, including trying to trick and deceive other nodes\n\nPartially synchronous and crash-recovery faults are the most common models.\n\nSafety of a system - nothing bad happens; liveness of a system - something good eventually happens. These two are often\nused for reasoning about the correctness of a distributed algorithm.\n\n## Chapter 9: Consistency and Consensus\n\nTolerating faults - keeping the service functioning correctly, even if some internal component is faulty. The best way\nof building fault-tolerant systems is to find some general-purpose abstractions with useful guarantees (e.g.\ntransactions).\n\nWhen working with a database that provides only weak guarantees (e.g. eventual consistency), you need to be constantly\naware of its limitations (e.g. 
when you write and immediately read there is no guarantee that you will see the value you\njust wrote).\n\nLINEARIZABILITY (atomic consistency, strong consistency, immediate consistency) - the idea is to make a system appear as\nif there were only one copy of the data and all operations on it are atomic.\n\nEasily confused with serialisability (both mean something like \"can be arranged in a sequential order\"):\n\n- Serialisability - an isolation property of transactions, it guarantees that transactions behave the same as if they\n  had executed in some serial order.\n- Linearisability - a recency guarantee on reads and writes of a register, it doesn't group operations together into\n  transactions, so it does not prevent problems like write skew.\n\nUse cases for linearisability:\n\n- locking and leader election - a single-leader system needs to ensure there is indeed only one leader, not several (\n  split brain) - it must be linearisable, all nodes must agree which node owns the lock.\n- constraints and uniqueness guarantees - for example unique usernames - you need linearisability. A hard uniqueness\n  constraint requires linearisability.\n- cross-channel timing dependencies - multiple components in a system can communicate, which opens a possibility for\n  race conditions\n\nThe CAP theorem has been historically influential but nowadays has little practical value for designing systems. A\nbetter way of paraphrasing CAP would be \"either consistent or available when partitioned\".\n\nORDERING GUARANTEES.\n\nCausality imposes an ordering on the events (what happened before what) - a question comes before the answer, a message\nis sent before it is received, ... These chains of causally dependent operations define the causal order in the system.\nIf a system obeys the ordering imposed by causality, we say it is causally consistent.\n\nLinearisability ensures causality. However, it is not the only way of preserving causality - a system can be causally\nconsistent without incurring the performance penalty (it is the strongest possible consistency model that does not slow\ndown due to network delays).\n\nSequence Number Ordering - sequence numbers or timestamps (not really a time-of-day clock, but some logical clock) used\nto order events. If there is not a single leader it is less clear how to generate sequence numbers for operations:\n\n- each node can generate its own independent sequence number + node ID\n- attach a timestamp to each operation\n- preallocate blocks of sequence numbers (1-1000 for node A, 1001-2000 for node B, ...)\n\nThe methods above allow generating unique sequence numbers efficiently, but they do not correctly capture the ordering\nof operations across different nodes.\n\nLamport timestamp - a method for generating sequence numbers that is consistent with causality. Every node keeps track\nof the maximum counter value it has seen so far, and includes that maximum on every request. 
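\nA minimal Lamport clock sketch (plain Python, made-up API): each timestamp is a (counter, node_id) pair, compared\nlexicographically, which also gives the node-ID tie-break described next:\n\n```python\nclass LamportClock:\n    def __init__(self, node_id):\n        self.node_id, self.counter = node_id, 0\n\n    def tick(self):\n        # local event or sending a message\n        self.counter += 1\n        return (self.counter, self.node_id)\n\n    def receive(self, counter):\n        # adopt the maximum counter value seen so far, then advance\n        self.counter = max(self.counter, counter) + 1\n        return (self.counter, self.node_id)\n\na, b = LamportClock(1), LamportClock(2)\nt1 = a.tick()          # (1, 1)\nt2 = b.receive(t1[0])  # (2, 2) - causally after t1\nassert t1 < t2         # tuples compare lexicographically\n```\n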
Each node appends its node ID\nto the final counter; if 2 counter values are the same, the higher node ID wins.\n\nGetting several nodes to agree on something is not easy, examples:\n\n- leader election - lack of communication may lead to split brain (multiple nodes believe themselves to be the leader)\n- atomic commit - in a system that supports transactions spanning several nodes, a transaction may fail on some nodes\n  but succeed on others (all nodes have to agree on the outcome - abort or commit)\n\nThe Impossibility of Consensus - there is no algorithm that is always able to reach consensus if there is a risk that a\nnode may crash; in a distributed system we must assume that a node may crash, so reliable consensus is impossible (in\nthe asynchronous model).\n\nTwo-phase commit (2PC) is an algorithm for achieving atomic transaction commit across multiple nodes (all commit or all\nabort). 2 phases:\n\n- the coordinator begins phase 1 - it sends prepare to each of the nodes, asking whether they are able to commit\n- the coordinator tracks the responses from the participants, if all say yes - the coordinator sends out a commit\n  request, if any of the participants says no - the coordinator sends an abort request to all nodes\n\nThis is very similar to a wedding ceremony in Western cultures. If the decision was to commit there is no going back, no\nmatter how many retries it takes. The protocol has 2 crucial points of no return. If the coordinator dies, the nodes\nshould communicate and come to some agreement. 2PC has a bad reputation because of operational problems, low performance\nand promising more than it can deliver.\n\n## Chapter 10: Batch Processing\n\n> A system cannot be successful if it is too strongly influenced by a single person. Once the initial design is complete\n> and fairly robust, the real test begins as people with many viewpoints undertake their own experiments.\n\n3 types of systems:\n\n- services (online systems) - a service waits for a request or instruction from a client to arrive, when received, the\n  service tries to serve it as quickly as possible.\n- batch processing (offline systems) - the system takes a large amount of input data, runs a job to process it and\n  produces some output data. Batch jobs are often scheduled to run periodically. The primary performance measure is\n  throughput.\n- stream processing systems (near-real-time systems) - something between online and offline systems. A stream processor\n  consumes inputs and produces outputs (rather than responding to requests).\n\nSimple batch processing can be performed in UNIX via awk, grep and other command line tools (using a chain of commands).\n\nThe Unix Philosophy - the idea of connecting programs with pipes. This is possible because of the common interface\n(programs operating on file descriptors) between programs, which are small and do one thing.\n\nThe biggest limitation of UNIX tools is that they run only on a single machine and that is where tools like Hadoop come\nin.\n\nMapReduce is a bit like Unix tools, but distributed across potentially thousands of machines. MapReduce jobs read and\nwrite files on a distributed filesystem, in Hadoop's implementation of MapReduce the filesystem is called HDFS (Hadoop\nDistributed File System - a reimplementation of the Google File System).\n\nHDFS is based on the shared-nothing principle. HDFS consists of a daemon process running on each machine, exposing a\nnetwork service that allows other nodes to access files stored on that machine. 
In order to tolerate machine and disk\nfailures, file blocks are replicated on multiple machines.\n\nTo create a MapReduce job, you need to implement 2 callback functions:\n\n- mapper - called once for every input record, its job is to extract the key and value from the input record.\n- reducer - the framework takes the key-value pairs produced by the mapper, collects all the values belonging to the\n  same key and calls the reducer with an iterator over that collection of values.\n\nPrinciple:\n\n> Put the computation near the data\n\nIt saves copying the input file over the network, reducing network load and increasing locality.\n\nIn order to achieve good throughput in batch processing, the computation must be as local to one machine as possible.\n\nHDFS is somewhat like a distributed version of UNIX, where HDFS is the filesystem and MapReduce is a quirky\nimplementation of a UNIX process. When MapReduce was published it was not all new. Some concepts were already known -\ne.g. massively parallel processing (MPP) databases. Hadoop vs Distributed Databases:\n\n- databases require you to structure data according to a particular model, whereas files in a distributed filesystem are\n  just byte sequences. Hadoop opened up the possibility of indiscriminately dumping data into HDFS and later figuring\n  out how to process it further. MPP databases require careful, up-front modeling of the data. Hadoop has often been\n  used for implementing ETL processes, MapReduce jobs are written to clean up the data, transform it into a relational\n  form and import it into an MPP data warehouse for analytic purposes.\n- MPP databases are great because they take care of storage, query planning and execution, moreover they use SQL - a\n  powerful query language. On the other hand not all kinds of processing can be sensibly expressed as SQL queries (\n  recommendation systems, full-text search or image analysis). MapReduce gave engineers the ability to easily run\n  their own code over large datasets.\n- MPP databases and MapReduce took different approaches to handling faults and the use of memory and disk. Batch\n  processes are less sensitive to faults than online systems, because they do not immediately affect users if they fail,\n  and they can always be run again. If a node fails, most MPP databases abort the entire query, MapReduce can tolerate\n  the failure of a map or reduce task. MapReduce dumps partial results to disk, so they can be restored after a failure.\n  MPP databases are more willing to store data in memory for faster access. MapReduce is designed to tolerate\n  frequent unexpected task termination, not because hardware is unreliable - it is because the freedom to arbitrarily\n  terminate processes enables better resource utilisation in a computing cluster (Google came up with this idea; the\n  design was driven by their resource usage).\n\nMapReduce is just one of many possible programming models for distributed systems. MapReduce has problems with\n*materialisation* of the data - the process of writing out intermediate state files. Several new execution engines for\ndistributed batch processing were developed in order to fix this problem with MapReduce (data-flow engines) - Spark,\nTez, Flink. 
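\nGoing back to the mapper/reducer contract above, a toy single-process word count in plain Python (just the shape of the\ntwo callbacks, none of the distribution):\n\n```python\nfrom itertools import groupby\nfrom operator import itemgetter\n\ndef mapper(record):\n    # called once per input record; emits (key, value) pairs\n    for word in record.split():\n        yield (word, 1)\n\ndef reducer(key, values):\n    # called once per key with an iterator over all its values\n    return (key, sum(values))\n\nrecords = ['the cat', 'the dog', 'the cat sat']\npairs = sorted(kv for r in records for kv in mapper(r))  # the 'shuffle': sort by key\ncounts = [reducer(k, (v for _, v in group))\n          for k, group in groupby(pairs, key=itemgetter(0))]\nprint(counts)  # [('cat', 2), ('dog', 1), ('sat', 1), ('the', 3)]\n```\n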
Dataflow engines provide several options for connecting one operator's output to another's input: repartition and sort\nby key (as in MapReduce's shuffle), take several inputs and partition them in the same way but skip the sorting, or,\nfor broadcast hash joins, send the same output from one operator to all partitions of the join operator.\n\nSystems like Dryad and Nephele offer several advantages compared to the MapReduce model:\n\n- expensive work (e.g. sorting) only performed in places where it is actually required\n- no unnecessary map tasks\n- intermediate state between operators kept in memory or written to local disk\n- operators can start executing as soon as their input is ready\n- existing JVMs can be reused to run new operators\n\nFully materialising intermediate state to a distributed filesystem makes fault tolerance fairly easy in MapReduce.\nSpark, Flink and Tez avoid writing intermediate state to HDFS.\n\nMapReduce - is like writing the output of each command to a temporary file.\n\nDataflow engines look much more like UNIX pipes (the final result still might be saved to HDFS).\n\nHigh-level APIs like Hive, Pig, Cascading and Crunch became popular because programming MapReduce jobs is quite\nlaborious.\n\n## Chapter 11: Stream Processing\n\n> Complex systems always evolve from a simple system that works. A complex system designed from scratch never works and\n> cannot be made to work.\n\nBatch processing must artificially divide data into chunks of fixed duration (for example: processing a day's worth of\ndata at the end of every day). The problem with daily batch processes is that changes in the input are only reflected in\nthe output a day later, which is too slow for many impatient users. The delay can be reduced by running the processing\nmore frequently.\n\nStream processing - processing every event as it happens. \"Stream\" refers to data that is incrementally made available\nover time.\n\nEvent - a small, self-contained, immutable object containing the details of something that happened at some point in\ntime. An event usually contains a timestamp indicating when it happened (according to a time-of-day clock). Related\nevents are usually grouped together into a topic or stream.\n\nPolling the datastore to check for events that have appeared since the consumer last ran becomes expensive if the\ndatastore is not designed for this kind of usage. It is better for consumers to be notified when new events appear.\n\nA common approach for notifying consumers about new events is to use a messaging system - the producer sends a message\ncontaining the event, which is then pushed to consumers.\n\nDirect messaging - direct communication between producers and consumers without going via intermediary nodes. Brokerless\nlibraries: ZeroMQ, nanomsg - pub-sub messaging over TCP or IP multicast. StatsD and Brubeck use unreliable UDP messaging\nfor collecting metrics from all machines on the network and monitoring them. Webhooks - a pattern in which a callback\nURL of one service is registered with another service, and it makes a request to that URL whenever an event occurs.\n\nMessage brokers - a kind of database that is optimised for handling message streams. It runs as a server, with producers\nand consumers connecting to it as clients. Producers write messages, consumers receive them by reading them from the\nbroker. By centralising the data in the broker, these systems can more easily tolerate clients that come and go. 
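\nA toy sketch of the broker idea above (in-memory, made-up API, one outstanding message per consumer): producers append,\nconsumers must ack before the broker forgets a message, which is what enables redelivery:\n\n```python\nfrom collections import deque\n\nclass TinyBroker:\n    def __init__(self):\n        self.queue, self.unacked = deque(), {}\n\n    def publish(self, message):\n        self.queue.append(message)\n\n    def consume(self, consumer_id):\n        message = self.queue.popleft()\n        self.unacked[consumer_id] = message  # kept until acknowledged\n        return message\n\n    def ack(self, consumer_id):\n        self.unacked.pop(consumer_id)\n\n    def redeliver(self, consumer_id):\n        # consumer crashed without acking: put the message back\n        self.queue.appendleft(self.unacked.pop(consumer_id))\n```\n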
A\nconsequence of queueing is also that consumers are generally asynchronous: when a producer sends a message it normally\nonly waits for the broker to confirm that it has buffered the message, and it does not wait for the message to be\nconsumed.\n\nMultiple consumers - when multiple consumers read messages in the same topic, two main patterns of messaging are used:\n\n- load balancing - each message is delivered to one of the consumers, so the consumers can share the work of processing\n  the messages in the topic. This pattern is useful when the messages are expensive to process, and you want to be able\n  to add consumers to parallelize the processing.\n- fan-out - each message is delivered to all the consumers, the equivalent of having several batch jobs that read the\n  same input file.\n\nMessage brokers use acknowledgements: a client must explicitly tell the broker when it has finished processing a message\nso that the broker can remove it from the queue.\n\nMessages can go out of order because of, for example, a network problem and a lost acknowledgement.\n\nLog-based message brokers - the durable storage approach of databases combined with the low-latency notification\nfacilities of messaging. A log is simply an append-only sequence of records on disk. A producer can send a message by\nappending it to the end of the log and a consumer can receive messages by reading the log sequentially.\n\nIn order to scale to higher throughput than a single disk can offer, the log can be partitioned. Different partitions\ncan be hosted on different machines. A topic can then be defined as a group of partitions that carry messages of the\nsame type.\n\nApache Kafka, Amazon Kinesis Streams and Twitter's DistributedLog are log-based message brokers. Google Pub/Sub is\narchitecturally similar but exposes a JMS-style API rather than a log abstraction.\n\nEven though these message brokers write all messages to disk, they are able to achieve a throughput of millions of\nmessages per second by partitioning across multiple machines.\n\nThe log-based approach trivially supports fan-out messaging.\n\nChange Data Capture - the process of observing all data changes written to a database and extracting them in a form in\nwhich they can be replicated to other systems. You can capture the changes in a database and continually apply the same\nchanges to a search index.\n\nEvent Sourcing - involves storing all changes to the application state as a log of change events. Events are designed to\nreflect things that happened at the application level, rather than low-level state changes. A powerful technique for\ndata modeling: from an application point of view it is more meaningful to record the user's actions as immutable events,\nrather than recording the effect of those actions on a mutable database: \"student cancelled their course enrolment\" vs \"\none entry was deleted from the enrolments table\". Event Store is a specialised database to support applications using\nevent sourcing.\n\nApplications that use event sourcing typically have some mechanism for storing snapshots of the current state that is\nderived from the log of events, so they don't need to repeatedly reprocess the full log.\n\nCQRS - Command Query Responsibility Segregation - separating the form in which data is written from the form it is read,\nby allowing several read views.\n\nStreams can be used to produce other, derived streams. 
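\nA minimal event-sourcing sketch tying the last few ideas together (event names made up): state is never mutated\ndirectly, it is derived by folding over the immutable event log - and a snapshot is just a cached fold:\n\n```python\nevents = [\n    {'type': 'enrolled', 'student': 'ada', 'course': 'dbs'},\n    {'type': 'enrolled', 'student': 'bob', 'course': 'dbs'},\n    {'type': 'cancelled', 'student': 'bob', 'course': 'dbs'},  # not a DELETE\n]\n\ndef apply(state, event):\n    course = state.setdefault(event['course'], set())\n    if event['type'] == 'enrolled':\n        course.add(event['student'])\n    elif event['type'] == 'cancelled':\n        course.discard(event['student'])\n    return state\n\n# the current state (a read view) is derived from the log of events\nstate = {}\nfor event in events:\n    state = apply(state, event)\nprint(state)  # {'dbs': {'ada'}}\n```\n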
Stream processing has long been used for monitoring purposes:\nfraud detection, trading systems examining price changes, machine monitoring, monitoring in the military.\n\nComplex Event Processing (CEP) - an approach developed in the 1990s for analysing event streams, especially geared\ntoward the kind of application that requires searching for certain event patterns. CEP allows you to specify rules to\nsearch for certain patterns of events in a stream. CEP systems use a high-level declarative query language like SQL, or\na GUI.\n\nStream processing is also used for analytics on streams; the boundary between CEP and stream analytics is blurry.\nFrameworks: Apache Storm, Spark Streaming, Flink, Concord, Samza, Kafka Streams, Google Cloud Dataflow, Azure Stream\nAnalytics.\n\nTypes of time windows:\n\n- tumbling window - has a fixed length, and every event belongs to exactly one window. For example, with a 1-minute\n  tumbling window, events with a timestamp between 10:03:00 and 10:03:59 are grouped into one window.\n- hopping window - has a fixed length, but allows windows to overlap in order to provide some smoothing.\n- sliding window - contains all the events that occur within some interval of each other. For example, a 5-minute\n  sliding window would cover events at 10:03:39 and 10:08:12 because they are less than 5 minutes apart.\n- session window - has no fixed duration, instead it is defined by grouping together all events for the same user that\n  occur closely together in time, and the window ends when the user has been inactive for some time.\n\nTypes of stream joins:\n\n- stream-stream join (window join) - you need to choose a suitable window for the join (seconds, days, weeks between\n  events), and also be careful about the ordering of received events.\n- stream-table join (stream enrichment) - to perform this join, the stream processor needs to look at one activity event\n  at a time and look up something in the database (local or remote)\n- table-table join (materialised view maintenance) - twitter example: when a user wants to see their feed, it is too\n  expensive to load the most recent tweets of all followed profiles, instead we want a timeline cache, so reading is a\n  simple lookup. To implement cache maintenance (append new tweets to the cache, remove deleted ones, ...) you need\n  streams of events for tweets.\n\nIf events on different streams happen around a similar time, in which order are they processed? If the ordering of\nevents across streams is undetermined, the join becomes nondeterministic, which means you cannot rerun the same job on\nthe same input and get the same result. In data warehouses, this issue is known as a slowly changing dimension (SCD). It\nis often addressed by using a unique identifier for a particular version of the joined record.\n\nBatch processing frameworks can tolerate faults fairly easily. In stream processing, fault tolerance is less\nstraightforward to handle. Possible approaches:\n\n- microbatching and checkpointing - break the stream into small blocks, and treat each block like a miniature batch\n  process (used in Spark Streaming, batches approx. 1 second long). Apache Flink periodically generates rolling\n  checkpoints of state and writes them to durable storage.\n- atomic commit revisited - in order to give the appearance of exactly-once processing in the presence of faults, we\n  need to ensure that all outputs and side effects of processing take effect if and only if the processing is\n  successful. 
## Chapter 12: The Future of Data Systems\n\nThe lambda architecture - incoming data should be recorded by appending immutable events to an always-growing dataset,\nsimilarly to event sourcing. From these events, read-optimised views are derived. The lambda architecture proposes\nrunning two different systems in parallel: the stream processor consumes the events and quickly produces an approximate\nupdate to the view, while the batch processor later consumes the same set of events and produces a corrected version of\nthe derived view.\n\nFederated databases - unifying reads - it is possible to provide a unified query interface to a wide variety of\nunderlying storage engines and processing methods - an approach known as a federated database or polystore.\n\nUnbundled databases - unifying writes - making it easier to reliably plug together storage systems is like unbundling a\ndatabase's index-maintenance features in a way that can synchronise writes across disparate technologies.\n\nHardware is not quite the perfect abstraction that it may seem. Random bit-flips are very rare on modern hardware but\ncan happen. Even software like MySQL or PostgreSQL can have bugs.\n\nLarge scale storage systems like HDFS or Amazon S3 do not fully trust disks: they run background processes that\ncontinually read back files, compare them to other replicas and move files from one disk to another, in order to\nmitigate the risk of silent corruption.\n\nACID databases have led us toward developing applications on the basis of blindly trusting technology. Since the\ntechnology we trusted worked well enough most of the time, auditing mechanisms were not deemed worth the investment.\n\nHaving continuous end-to-end integrity checks gives you increased confidence about the correctness of your systems,\nwhich in turn allows you to move faster (much like automated testing does for software).\n\nIt is not sufficient for software engineers to focus exclusively on the technology and ignore its ethical consequences.\nUsers are humans and human dignity is paramount.\n\nAlgorithmic prison - systematically being excluded from jobs, air travel, insurance coverage, property rentals,\nfinancial services, ... because an algorithm said NO. In countries that respect human rights, the criminal system\npresumes innocence until proven guilty; on the other hand, automated systems can systematically exclude a person from\nparticipating in society without any proof of guilt and with little chance of appeal.\n\nDecisions made by an algorithm are not necessarily any better or worse than those made by a human. Every person is\nlikely to have biases.
In many countries, anti-discrimination laws prohibit treating people differently depending on\nprotected traits (ethnicity, age, gender, sexuality, disability, beliefs).\n\nAutomated decision-making opens the question of responsibility and accountability. Who is responsible if a self-driving\ncar causes an accident?\n\nBesides the problems of predictive analytics, there are ethical problems with data collection itself. Thought\nexperiment: whenever you see \"data\" (e.g. data-driven company), replace it with the word surveillance (e.g.\nsurveillance-driven company). Even the most totalitarian and oppressive regimes could only dream of putting a microphone\nin every room and forcing every person to constantly carry a device capable of tracking their location and movements.\n\nDeclining to use a service due to its tracking of users is only an option for the small number of people who are\nprivileged enough to have the time and knowledge to understand its privacy policy, and who can afford to potentially\nmiss out on social participation opportunities.\n\nWhen collecting data, we need to consider not just today's political environments, but all possible future governments.  \n"
  },
  {
    "path": "books/docker-deep-dive.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Docker Deep Dive\n\nBook by Nigel Poulton"
  },
  {
    "path": "books/elixir.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Elixir in Action\n\nBook by Saša Jurić\n"
  },
  {
    "path": "books/fundamentals-of-architecture.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Fundamentals of Software Architecture: An Engineering Approach.\n\nBook by Mark Richards and Neal Ford\n\n- [Preface: Invalidating Axioms](#preface-invalidating-axioms)\n- [Chapter 1: Introduction](#chapter-1-introduction)\n- [Chapter 2: Architectural thinking](#chapter-2-architectural-thinking)\n- [Chapter 3: Modularity](#chapter-3-modularity)\n- [Chapter 4: Architecture Characteristics Defined](#chapter-4-architecture-characteristics-defined)\n- [Chapter 5: Identifying Architectural Characteristics](#chapter-5-identifying-architectural-characteristics)\n- [Chapter 6: Measuring and Governing Architecture Characteristics](#chapter-6-measuring-and-governing-architecture-characteristics)\n- [Chapter 7: Scope of Architecture Characteristics](#chapter-7-scope-of-architecture-characteristics)\n- [Chapter 8: Component-Based Thinking](#chapter-8-component-based-thinking)\n- [Chapter 9: Foundations](#chapter-9-foundations)\n- [Chapter 10: Layered Architecture Style](#chapter-10-layered-architecture-style)\n- [Chapter 11: Pipeline Architecture Style](#chapter-11-pipeline-architecture-style)\n- [Chapter 12: Microkernel Architecture Style](#chapter-12-microkernel-architecture-style)\n- [Chapter 13: Service-Based Architecture Style](#chapter-13-service-based-architecture-style)\n- [Chapter 14: Event-Driven Architecture Style](#chapter-14-event-driven-architecture-style)\n- [Chapter 15: Space-Driven Architecture Style](#chapter-15-space-driven-architecture-style)\n- [Chapter 16: Orchestration-Driven Service-Oriented Architecture](#chapter-16-orchestration-driven-service-oriented-architecture)\n- [Chapter 17: Microservices Architecture](#chapter-17-microservices-architecture)\n- [Chapter 18: Choosing the Appropriate Architecture Style](#chapter-18-choosing-the-appropriate-architecture-style)\n- [Chapter 19: Architecture Decisions](#chapter-19-architecture-decisions)\n- [Chapter 20: Analyzing Architecture Risk](#chapter-20-analyzing-architecture-risk)\n- [Chapter 21: Diagramming and Presenting Architecture](#chapter-21-diagramming-and-presenting-architecture)\n- [Chapter 22: Making Teams Effective](#chapter-22-making-teams-effective)\n- [Chapter 23: Negotiation and Leadership Skills](#chapter-23-negotiation-and-leadership-skills)\n- [Chapter 24: Developing a Career Path](#chapter-24-developing-a-career-path)\n- [Self-Assessment Questions](#self-assessment-questions)\n\n## Preface: Invalidating Axioms\n\n> Axiom - A statement or proposition which is regarded as being established, accepted, or self-evidently true.\n\nSoftware architects (like mathematicians) also build theories atop axioms (but the software world is _softer_ than\nmathematics).\n\nArchitects have an important responsibility to question assumptions and axioms left over from previous eras. Each new\nera requires new practices, tools, measurements, patterns, and a host of other changes.\n\n## Chapter 1: Introduction\n\nThe industry does not have a good definition of software architecture.\n\n> Architecture is about the important stuff... 
whatever that is ~ Ralph Johnson\n\nThe responsibilities of a software architect encompass technical abilities, soft skills, operational awareness, and a\nhost of others.\n\nWhen studying architecture, keep in mind that everything can be understood in context - why certain decisions were made\nwas based on the realities of the environment (for example, building a microservice architecture in 2002 would have been\ninconceivably expensive).\n\nKnowledge of the architecture structure, architecture characteristics, architecture decisions, and design principles is\nneeded to fully understand the architecture of the system.\n\n- structure/style: microservices, layered, microkernel, ...\n- characteristics: availability, reliability, scalability, fault tolerance, security, ...\n- decisions: what is and what is not allowed, rules for how a system should be constructed\n- design principles: guidelines for constructing systems -- leverage async messaging between services to increase\n  performance\n\nExpectations of an architect:\n\n- make architecture decisions\n    - instead of making technical decisions (use React.js), instruct development teams (use a reactive-based framework)\n- continually analyze the architecture\n    - validate decisions made years ago in order to prevent structural decay\n- keep current with the latest trends\n    - the decisions an architect makes tend to be long-lasting and difficult to change, so understanding and following\n      key trends helps the architect prepare for the future\n- ensure compliance with decisions\n    - continually verify that development teams are following the architecture decisions and design principles defined\n- diverse exposure and experience\n    - an architect should be at least familiar with a variety of technologies; an effective architect should be\n      aggressive in seeking out opportunities to gain experience in multiple languages, platforms and technologies\n- have business domain knowledge\n    - without business knowledge, an architect cannot communicate with stakeholders and business users and will quickly\n      lose credibility\n- possess interpersonal skills\n    - interpersonal skills, including teamwork, facilitation, and leadership\n    - engineers love to solve technical problems, however G. Weinberg said: \"no matter what they tell you, it is always\n      a people problem\"\n    - many architects are excellent technologists, but are ineffective architects because of poor communication skills\n- understand and navigate politics\n    - have negotiation skills, almost every decision an architect makes will be challenged\n\n> All architectures become iterative because of _unknown unknowns_. Agile just recognizes this and does it sooner.\n\nAn iterative process fits the nature of software architecture. Trying to build a modern system such as microservices\nusing Waterfall will create a great deal of friction.\n\nNothing remains static. What we need is _evolutionary architecture_ - mutate the solution, evolve new solutions\niteratively. Adopting Agile engineering practices (continuous integration, automated machine provisioning, ...) makes\nbuilding resilient architectures easier.\n
Agile methodologies support changes better than planning-heavy processes because of the tight feedback loop.\n\nLaws of Software Architecture:\n\n- Everything in software architecture is a trade-off\n- If an architect thinks they have discovered something that isn't a trade-off, more likely they just haven't identified\n  the trade-off yet\n- Why is more important than how\n\n## Chapter 2: Architectural thinking\n\n4 main aspects of thinking like an architect:\n\n1. understanding the difference between architecture and design\n    - architecture vs design\n        - architecture: defining architecture characteristics, selecting architecture patterns, creating components\n        - design: class diagrams, user interface, code testing and development\n    - architects and development teams have to form a strong bidirectional relationship, be on the same virtual team\n    - where does architecture end and design begin? nowhere\n    - architecture and design must be synchronized by tight collaboration\n2. wide breadth of technical knowledge\n    - developer - significant amount of technical depth\n        - specialised in languages, frameworks and tools\n    - architect - significant amount of technical breadth\n        - broad understanding of technology and how to use it to solve particular problems\n3. understanding, analyzing, and reconciling trade-offs between various solutions and technologies\n    - thinking like an architect is all about seeing trade-offs in every solution\n    - the ultimate answer for architectural questions: _it depends on ..._ (budget, business env, company culture, ...)\n    - look at the benefits of a given solution, but also analyze the negatives\n    - analyze trade-offs and then ask what is more important; this decision always depends on the environment\n4. understanding the importance of business drivers\n    - business drivers are required for the success of the system\n    - understanding the domain and the ability to translate business requirements into architecture characteristics\n\n_Frozen Caveman Anti-Pattern_: describes an architect who always reverts to their pet irrational concern for every\narchitecture. This anti-pattern manifests in architects who have been burned in the past by a poor decision/unexpected\noccurrence, making them particularly cautious in the future.\n\nHow can an architect maintain hands-on coding skills?\n\n- do frequent proof-of-concepts\n- whenever possible, write the best production-quality code you can (even when doing POCs) -- POC code often remains in\n  the repository and becomes the reference or guiding example\n- tackle technical debt stories or architecture stories, freeing the development team up to work on the critical\n  functional user stories\n- work on bug fixes\n- create simple command-line tools and analyzers to help the development team with their day-to-day tasks\n- do code reviews frequently\n\n## Chapter 3: Modularity\n\nModularity is an organizing principle. If an architect designs a system without paying attention to how the pieces wire\ntogether, they end up creating a system that presents myriad difficulties.\n\nDevelopers typically use modules as a way to group related code together. For discussions about architecture, we use\nmodularity as a general term to denote a related grouping of code: classes, functions, or any other grouping.\n\n_Cohesion_ - refers to the extent to which the parts of a module should be contained within the same module. It is a\nmeasure of how related the parts are to one another.\n
_Abstractness_ is the ratio of abstract artifacts to concrete artifacts. It represents a measure of abstractness versus\nimplementation (think of a code base with no abstractions vs a code base with too many abstractions).\n\n## Chapter 4: Architecture Characteristics Defined\n\nArchitects may collaborate on defining the domain or business requirements, but one key responsibility entails defining,\ndiscovering, and analyzing all the things the software must do that aren't directly related to the domain functionality\n-- architecture characteristics.\n\nOperational Architecture Characteristics:\n\n- Availability - how long the system will need to be available\n- Continuity - disaster recovery capability\n- Performance - stress testing, peak analysis\n- Recoverability - how quickly is the system required to be on-line again?\n- Reliability - if it fails, will it cost the company large sums of money?\n- Robustness - ability to handle error and boundary conditions while running\n- Scalability - ability for the system to perform and operate as the number of users/requests increases\n\nStructural Architecture Characteristics:\n\n- Configurability - ability for the end users to easily change aspects of the software's configuration\n- Extensibility - how important it is to plug new pieces of functionality in\n- Installability - ease of system installation on all necessary platforms\n- Localization - support for multiple languages, currencies, measures\n- Maintainability - how easy is it to apply changes and enhance the system?\n- Portability - does the system need to run on more than one platform?\n- Supportability - what level of technical support is needed by the application?\n- Upgradeability - ability to quickly upgrade from a previous version\n\nCross-cutting Architecture Characteristics:\n\n- Accessibility - access for all users, including those with disabilities\n- Archivability - will the data need to be deleted/archived?\n- Authentication - security requirements to ensure users are who they say they are\n- Authorization - security requirements to ensure users can access only certain functions within the application\n- Legal - what legislative constraints is the system operating in?\n- Privacy - ability to hide transactions from internal company employees\n- Security - does the data need to be encrypted in the database, what type of authentication is needed...?\n- Supportability - what level of technical support is needed by the application?\n- Usability - level of training required for users to achieve their goals with the app\n\nAny list of architecture characteristics will be incomplete. Any project may need to invent important architecture\ncharacteristics based on its unique factors. Many of the terms are imprecise and ambiguous. No complete list of\nstandards exists.\n\nApplications can support only a few of the architecture characteristics we have listed. Firstly, each of the supported\ncharacteristics requires design effort. Secondly, each architecture characteristic often has an impact on others.\nArchitects rarely encounter the situation where they are able to design a system and maximize every single architecture\ncharacteristic.\n\n> Never shoot for the best architecture, but rather _the least worst_ architecture.\n\nToo many architecture characteristics lead to generic solutions that are trying to solve every business problem, and\nthose architectures rarely work because the design becomes unwieldy. Architecture design should be as iterative as\npossible.\n
## Chapter 5: Identifying Architectural Characteristics\n\nIdentifying the correct architectural characteristics for a given problem requires an architect to not only understand\nthe domain problem, but also collaborate with the problem domain stakeholders to determine what is truly important from\na domain perspective.\n\n_Extracting architecture characteristics from domain concerns_: translate domain concerns to identify the right\narchitectural characteristics. Do not design a generic architecture, focus on a short list of characteristics. Too many\ncharacteristics lead to greater and greater complexity. Keep the design simple. Instead of prioritizing characteristics,\nhave the domain stakeholders select the top 3 most important characteristics from the final list.\n\nTranslation of domain concerns to architecture characteristics:\n\n- Mergers and acquisitions -> Interoperability, scalability, adaptability, extensibility\n- Time to market -> Agility, testability, deployability\n- User satisfaction -> Performance, availability, fault tolerance, testability, deployability, agility, security\n- Competitive advantage -> Agility, testability, deployability, scalability, availability, fault tolerance\n- Time and budget -> Simplicity, feasibility\n\n_Extracting architecture characteristics from requirements_: some characteristics come from explicit statements in\nrequirements.\n\nArchitecture Katas - in order to become a great architect you need practice. The Kata exercise provides architects\nwith a problem stated in domain terms (description, users, requirements) and additional context. Small teams work 45\nminutes on a design, then show results to the other groups, who vote on who came up with the best architecture. Team\nmembers ideally get feedback from an experienced architect about missed trade-offs and alternative designs.\n\nExplicit characteristics - appear in a requirements specification, e.g. support for a particular number of users.\n\nImplicit characteristics - characteristics that aren't specified in requirements documents, yet they make up an\nimportant aspect of the design, e.g. availability - making sure users can access the website, security - no one wants to\ncreate insecure software, ...\n\nArchitects must remember: there is no best design in architecture, only a least worst collection of trade-offs.\n\n## Chapter 6: Measuring and Governing Architecture Characteristics\n\n- They aren't physics - many characteristics have vague meanings, the industry has wildly differing perspectives\n- Wildly varying definitions - different people may disagree on the definition, without agreeing on a common definition,\n  a proper conversation is difficult\n- Too composite - many characteristics are composites of many others at a smaller scale\n\nOperational measures: obvious direct measurements, like performance -- measure response time. High-level teams don't\njust establish hard performance numbers, they base their definitions on statistical analysis.\n\nStructural measures: addressing critical aspects of code structure, like cyclomatic complexity - the measurement of\ncode complexity, computed by applying graph theory to code (see the example below).\n\n> Overly complex code represents a code smell. It hurts almost every one of the desirable characteristics of code bases\n> (modularity, testability, deployability, ...). Yet if teams don't keep an eye on gradually growing complexity, that\n> complexity will dominate the code base.\n
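A worked illustration (the function below is made up): cyclomatic complexity models the code as a graph, CC = E - N + 2,\nwhich for a single function reduces to the number of decision points plus one:\n\n```python\ndef categorize(score: int) -> str:\n    # 2 decision points (if, elif) -> cyclomatic complexity = 3\n    if score >= 90:\n        return 'excellent'\n    elif score >= 50:\n        return 'ok'\n    else:\n        return 'poor'\n\n\n# One independent path through the code per outcome - CC = 3 paths to test:\nassert categorize(95) == 'excellent'\nassert categorize(60) == 'ok'\nassert categorize(10) == 'poor'\n```\n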
Process measures: some characteristics intersect with software development processes. For example, agility can relate to\nthe software development process; ease of deployment and testability require some emphasis on good modularity and\nisolation at the architecture level.\n\nGoverning architecture characteristics - for example, ensuring software quality within an organization falls under the\nheading of architectural governance, because it falls within the scope of architecture, and negligence can lead to\ndisastrous quality problems.\n\n_Architecture fitness function_ - **any mechanism** that provides an objective integrity assessment of some architecture\ncharacteristic or combination of architecture characteristics. Many tools may be used to implement fitness functions:\nmetrics, monitors, unit tests, chaos engineering, ...\n\nRather than a heavyweight governance mechanism, fitness functions provide a mechanism for architects to express\nimportant architectural principles and automatically verify them. Developers know that they shouldn't release insecure\ncode, but that priority competes with dozens or hundreds of other priorities for busy developers. Tools like Security\nMonkey, and fitness functions generally, allow architects to codify important governance checks into the substrate of\nthe architecture.\n
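A minimal fitness-function sketch written as a plain unit test; the layer names (`domain`, `infrastructure`) and the\n`src/` layout are assumptions for illustration, not something the book prescribes:\n\n```python\nimport ast\nimport pathlib\n\n\ndef test_domain_does_not_import_infrastructure() -> None:\n    # Governance check: the domain layer must stay free of infrastructure imports.\n    for path in pathlib.Path('src/domain').rglob('*.py'):\n        tree = ast.parse(path.read_text())\n        for node in ast.walk(tree):\n            if isinstance(node, ast.Import):\n                names = [alias.name for alias in node.names]\n            elif isinstance(node, ast.ImportFrom):\n                names = [node.module or '']\n            else:\n                continue\n            assert not any(name.startswith('infrastructure') for name in names), (\n                f'{path} depends on the infrastructure layer'\n            )\n```\n\nRun under pytest in CI, a check like this turns an architectural principle into something continually verified instead\nof occasionally reviewed.\n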
## Chapter 7: Scope of Architecture Characteristics\n\nWhen evaluating many operational architecture characteristics, an architect must consider dependent components outside\nthe code base that will impact those characteristics.\n\n_Connascence_ - two components are connascent if a change in one would require the other to be modified in order to\nmaintain the overall correctness of the system.\n\nIf two services in a microservices architecture share the same class definition of some class, they are statically\nconnascent. Dynamic connascence: synchronous - the caller needs to wait for the response from the callee; asynchronous\ncalls allow fire-and-forget semantics in event-driven architecture.\n\nComponent level coupling isn't the only thing that binds software together. Many business concepts semantically bind\nparts of the system together, creating functional cohesion.\n\n_Architecture quantum_ - an independently deployable artifact with high functional cohesion and synchronous connascence.\n\n- independently deployable - all necessary components to function independently from other parts of the architecture (\n  e.g. a database - the system will not function without it)\n- high functional cohesion - how well the contained code is unified in purpose, meaning - an architecture quantum needs\n  to do something purposeful\n- synchronous connascence - synchronous calls within an application context or between distributed services that form\n  this architecture quantum\n\n## Chapter 8: Component-Based Thinking\n\nArchitects typically think in terms of components, the physical manifestation of a module. Typically, the architect\ndefines, refines, manages, and governs components within an architecture.\n\nArchitecture Partitioning - several styles exist, with different sets of trade-offs (layered architecture, modular\nmonolith).\n\n> Conway's Law: Organizations which design systems ... are constrained to produce designs which are copies of\n> the communication structures of these organizations.\n\nThis law suggests that when a group of people designs some technical artifact, the communication structures between the\npeople end up replicated in the design.\n\nTechnical partitioning - organizing components by technical capabilities (presentation, business rules, persistence).\n\nDomain partitioning - modeling by identifying domains/workflows that are independent and decoupled from one another.\nMicroservices are based on this philosophy.\n\nDevelopers should never take components designed by architects as the last word. All software design benefits from\niteration. The initial design should be viewed as a first draft.\n\nComponent identification flow:\n\n- identify initial components\n- assign requirements to components\n- analyze roles and responsibilities\n- analyze architecture characteristics\n- restructure components\n\nFinding the proper granularity for components is one of the most difficult tasks. A too fine-grained design leads to too\nmuch communication between components, a too coarse-grained one encourages high internal coupling.\n\nDiscovering components:\n\n- entity trap - an anti-pattern where an architect incorrectly identifies the database relationships, this anti-pattern\n  indicates lack of thought about the actual workflows of the application.\n- actor-actions approach - a popular way to map requirements to components, identify actors who perform activities with\n  the application and the actions those actors may perform.\n- event storming - the architect assumes the project will use messages and/or events to communicate between components,\n  the team tries to determine which events occur in the system based on requirements and identified roles, and builds\n  components around those event and message handlers.\n- workflow approach - identifies the key roles, the kinds of workflows, and builds components around the identified\n  activities\n\nMonolithic vs Distributed Architecture:\n\n- monolithic: a single deployable unit, all functionality of the system runs in the process, typically connected to\n  a single database\n- distributed: multiple services running in their own ecosystem, communicating via the network, each service may have\n  its own release cadence and engineering practices\n\n## Chapter 9: Foundations\n\nArchitecture styles (a.k.a. architecture patterns) - describe a named relationship of components covering a variety of\narchitecture characteristics. A style name, similar to design patterns, creates a single name that acts as shorthand\nbetween experienced architects.\n\nBig Ball of Mud - the absence of any discernible architecture structure. The lack of structure makes change increasingly\ndifficult. Problematic testing, deployment, scalability, performance, ... The mess arises because of a lack of\ngovernance around code quality and structure.\n\nClient/Server - separation of responsibilities - backend-frontend/two-tier/client-server.\n\nArchitecture styles can be classified into 2 main types:\n\n- monolithic - a single deployment unit of code\n    - layered, pipeline, microkernel\n- distributed - multiple deployment units connected through a network\n    - service-based, event-driven, space-based, service-oriented, microservices\n    - much more powerful in terms of performance, scalability, and availability, but there are trade-offs\n
_The Fallacies of Distributed Computing:_\n\n1. The Network is Reliable - fact: networks still remain generally unreliable, this is why things like timeouts and\n   circuit breakers exist between services (see the sketch after this list). The more a system relies on the network,\n   the potentially less reliable it becomes.\n2. Latency is Zero - a local call is measured in nanoseconds/microseconds, the same call made through a remote access\n   protocol is measured in milliseconds. Do you know what the average round-trip latency is for a RESTful call in your\n   prod env?\n3. Bandwidth is Infinite - communication between remote services significantly utilizes bandwidth, causing networks to\n   slow down. Imagine 2,000 req/s at 500 Kb each = 1 Gb of bandwidth per second! Ensuring that a minimal amount of data\n   is passed between services in a distributed architecture is the best way to address this fallacy.\n4. The Network is Secure - the surface area for threats and attacks increases by magnitudes when moving from a\n   monolithic to a distributed architecture, despite measures like VPNs, trusted networks and firewalls.\n5. The Topology Never Changes - network topology (routers, hubs, switches, firewalls, networks, appliances) CAN change,\n   architects must be in constant communication with operations and network administrators to know what is changing and\n   when, so they can make adjustments.\n6. There is Only One Administrator - this fallacy points to the complexity of distributed architecture and the amount of\n   coordination that must happen to get everything working correctly. Monoliths do not require this level of\n   communication and collaboration due to the single deployment unit characteristics.\n7. Transport Cost is Zero - transport cost does not refer to latency, but rather to the actual cost in terms of money\n   associated with making a simple RESTful call. Distributed architectures cost significantly more than monolithic\n   architectures, primarily due to increased needs for additional hardware, servers, gateways, firewalls, subnets,\n   proxies, ...\n8. The Network is Homogeneous - the network is not made up of hardware from a single vendor, and not all of the\n   heterogeneous hardware vendors play well together.\n
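A minimal sketch of the timeout-plus-circuit-breaker idea behind fallacy #1; the thresholds and the `call` wrapper are\nillustrative - real implementations (e.g. resilience libraries) add half-open probing, sliding windows, and more:\n\n```python\nimport time\nfrom typing import Callable\n\n\nclass CircuitBreaker:\n    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0) -> None:\n        self.failure_threshold = failure_threshold\n        self.reset_after_s = reset_after_s\n        self.failures = 0\n        self.opened_at = 0.0\n\n    def call(self, remote_operation: Callable[[], str]) -> str:\n        if self.failures >= self.failure_threshold:\n            if time.monotonic() - self.opened_at < self.reset_after_s:\n                raise RuntimeError('circuit open - failing fast instead of touching the network')\n            self.failures = 0  # cooled down - let one request probe the service again\n        try:\n            result = remote_operation()  # the real call would also carry its own timeout\n            self.failures = 0\n            return result\n        except Exception:\n            self.failures += 1\n            self.opened_at = time.monotonic()\n            raise\n```\n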
Other distributed considerations:\n\n- distributed logging - debugging in a distributed architecture is very difficult and time-consuming, logging\n  consolidation tools may help.\n- distributed transactions - in a monolith it is super easy to perform `commit`/`rollback`, it is much more difficult\n  to do the same in a distributed system. Distributed systems rely on eventual consistency - this is one of the\n  trade-offs. Transactional SAGAs are one way to manage distributed transactions.\n- contract maintenance and versioning - a contract is behaviour and data that is agreed upon by both the client and the\n  service, maintenance is hard due to decoupled services owned by different teams and departments.\n\n## Chapter 10: Layered Architecture Style\n\nThe Layered Architecture (n-tiered) - the standard for most applications, because of its simplicity, familiarity, and\nlow cost. The style also falls into several architectural anti-patterns (architecture by implication, accidental\narchitecture).\n\nMost layered architectures consist of 4 standard layers: presentation, business, persistence, and database.\n\nThe layered architecture is a technically partitioned architecture (as opposed to a domain-partitioned architecture).\nGroups of components, rather than being grouped by domain, are grouped by their technical role in the architecture. As a\nresult, any particular business domain is spread throughout all of the layers of the architecture. A domain-driven\ndesign does not work well with the layered architecture style.\n\nEach layer can be either closed or open:\n\n- closed - a request moves top-down from layer to layer, the request cannot skip any layers\n- open - the request can bypass layers (fast-lane reader pattern)\n\nThe layers of isolation - changes made in one layer of the architecture generally don't impact/affect components in\nother layers. Each layer is independent of the other layers, thereby having little or no knowledge of the inner workings\nof other layers in the architecture. Violation of this concept produces a very tightly coupled application with layer\ninterdependencies between components. This type of architecture becomes very brittle, difficult and expensive to change.\n\nThis architecture makes for a good starting point for most applications when it is not known yet exactly which\narchitecture will ultimately be used. Be sure to keep reuse at a minimum and keep object hierarchies shallow. A good\nlevel of modularity will help facilitate the move to another architecture style later on.\n\nWatch out for the architecture sinkhole anti-pattern - this anti-pattern occurs when requests move from one layer to\nanother as simple pass-through processing with no business logic performed within each layer. For example, the\npresentation layer responds to a simple request from the user to retrieve basic customer data.\n\n## Chapter 11: Pipeline Architecture Style\n\nPipeline (a.k.a. pipes and filters) architecture: _Filter -(Pipe)-> Filter -(Pipe)-> Filter -(Pipe)-> Filter_\n\n- pipes - form the communication channel between filters, each pipe is usually unidirectional and point-to-point.\n- filters - self-contained, independent from other filters, stateless, should perform one task only. 4 types of filters\n  exist within this architecture style:\n    - producer - the starting point of a process, sometimes called the source\n    - transformer - accepts input, optionally performs a transformation on the data, then forwards it to the outbound\n      pipe, also known as \"map\"\n    - tester - accepts input, tests criteria, then optionally produces output, also known as \"reduce\"\n    - consumer - the termination point for the pipeline flow, persists or displays the final result\n\nETL tools leverage the pipeline architecture for the flow and modification of data from one database to another.\n
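A tiny pipes-and-filters sketch using Python generators as the pipes; the four filters mirror the types listed above,\nand the data is made up:\n\n```python\nfrom typing import Iterable, Iterator\n\n\ndef producer() -> Iterator[str]:  # source\n    yield from ['10', '-3', '42', '7']\n\n\ndef transformer(pipe: Iterable[str]) -> Iterator[int]:  # 'map'\n    for item in pipe:\n        yield int(item)\n\n\ndef tester(pipe: Iterable[int]) -> Iterator[int]:  # 'reduce' - tests criteria\n    for value in pipe:\n        if value > 0:\n            yield value\n\n\ndef consumer(pipe: Iterable[int]) -> None:  # termination point\n    for value in pipe:\n        print(value)\n\n\n# Each filter is stateless and only knows its unidirectional inbound pipe:\nconsumer(tester(transformer(producer())))  # prints 10, 42, 7\n```\n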
## Chapter 12: Microkernel Architecture Style\n\nThe microkernel architecture style (a.k.a. plug-in) - a relatively simple monolithic architecture consisting of two\ncomponents: a core system and plug-in components.\n\nCore system - the minimal functionality required to run the system. Depending on the size and complexity, the core\nsystem can be implemented as a layered architecture or a modular monolith.\n\nPlug-in components - standalone, independent components that contain specialized processing, additional features, and\ncustom code meant to enhance or extend the core system. Additionally, they can be used to isolate highly volatile code,\ncreating better maintainability and testability within the application. Plug-in components should have no dependencies\nbetween them.\n\nPlug-in components do not always have to use point-to-point communication with the core system (REST or messaging can be\nused instead). Each plug-in can be a standalone service (or even a microservice) - this topology is still only a single\narchitecture quantum due to the monolithic core system.\n\nPlug-in Registry - the core system needs to know which plug-in modules are available and how to get them. The registry\ncontains information about each plug-in (name, data contract, remote access protocol). The registry can be as simple as\nan internal map structure owned by the core system, or as complex as a registry and discovery tool (like ZooKeeper or\nConsul).\n
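The \"internal map structure\" variant of the registry is small enough to sketch; the plug-in name and the payment\nexample are made up:\n\n```python\nfrom typing import Callable\n\n# Core system's registry: plug-in name -> entry point (a plain in-process map).\nplugin_registry: dict[str, Callable[[float], str]] = {}\n\n\ndef register(name: str, entry_point: Callable[[float], str]) -> None:\n    plugin_registry[name] = entry_point\n\n\ndef pay_by_card(amount: float) -> str:  # a plug-in: standalone, no dependencies on other plug-ins\n    return f'charged {amount} to card'\n\n\nregister('card-payment', pay_by_card)\n\n# The core system looks plug-ins up instead of hardcoding them:\nprint(plugin_registry['card-payment'](9.99))\n```\n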
Examples of usage: Eclipse IDE, JIRA, Jenkins, Internet web browsers, ...\n\nProblems that require different configurations for each location or client match extremely well with this architecture\nstyle. Another example is a product that places a strong emphasis on user customization and feature extensibility.\n\n## Chapter 13: Service-Based Architecture Style\n\nA hybrid of the microservices style and one of the most pragmatic architecture styles (flexible, simpler and cheaper\nthan microservices/event-driven services).\n\nTopology: a distributed macro layered structure consisting of a separately deployed user interface, separately deployed\ncoarse-grained services (domain services) and a monolithic database. Because the services typically share a single\nmonolithic database, the number of services within an application context ranges between 4 and 12.\n\nBased on scalability, fault tolerance, and throughput needs, multiple instances of a domain service can exist. Multiple\ninstances require some load-balancing.\n\nMany variants exist within the service-based architecture:\n\n- single monolithic user interface\n- domain-based user interface\n- service-based user interface\n\nSimilarly, you can break apart a single monolithic database, going as far as domain-scoped databases.\n\nService-based architecture uses a centrally shared database. Because of the small number of services, database\nconnections are not usually an issue. Database changes, however, can be an issue. If not done properly, a table schema\nchange can impact every service, making database changes a very costly task in terms of effort and coordination.\n\nOne way to mitigate the impact and risk of database changes is to logically partition the database and manifest the\nlogical partitioning through federated shared libraries. Changes to a table within a particular logical domain impact\nonly those services using that shared library.\n\nWhen making changes to shared tables, lock the common entity objects and restrict change access to only the database\nteam. This helps control change and emphasizes the significance of changes to the common tables used by all services.\n\nService-based architecture - one of the most pragmatic architecture styles, a natural fit when doing DDD, preserves ACID\nbetter than any other distributed architecture, good level of architectural modularity.\n\n## Chapter 14: Event-Driven Architecture Style\n\nA popular distributed asynchronous architecture style used to produce highly scalable and high-performance apps. It can\nbe used for small applications as well as large, complex ones. Made up of decoupled event processing components that\nasynchronously receive and process events. It can be used as a standalone style or embedded within another architecture\nstyle (e.g. event-driven microservices architecture).\n\n2 primary topologies:\n\n- the mediator topology - used when you require control over the workflow of an event process\n    - an event mediator - manages and controls the workflow for initiating events that require the coordination of\n      multiple event processors, usually there are multiple mediators (associated with a particular domain)\n    - if an error occurs (no acknowledgement from one of the event processors), the mediator can take corrective action\n      to fix the problem\n    - the mediator controls the workflow, it can maintain the event state and manage errors\n    - operates on commands (send-email, fulfill-order), rather than on events (email-sent, order-fulfilled)\n    - cons: not as highly decoupled as the broker topology, lower scalability, hard to model complex workflows\n- the broker topology - used when you require a high level of responsiveness\n    - no central event mediator\n    - message flow is distributed across the event processor components in a chain-like broadcasting fashion\n    - a good practice: each event processor advertises what it did to the rest of the system, regardless of whether\n      any other event processor cares about that action\n    - operates on events (email-sent, order-fulfilled), rather than on commands (send-email, fulfill-order)\n    - cons: challenging error handling - no central monitoring/controlling, not possible to restart a business\n      transaction (because actions are taken asynchronously)\n\nERROR HANDLING: the workflow event pattern - leverages delegation, containment, and repair through the use of a workflow\ndelegate. On error, the event consumer immediately delegates the error to the workflow processor and moves on. The\nworkflow processor tries to figure out what is wrong with the message (rules, machine learning, ...), once the message\nis repaired it can be sent back to the event processor. In case of a very problematic error, a human agent can determine\nwhat is wrong with the message and then re-submit.\n\nData loss (lost messages) - a primary concern when dealing with asynchronous communication. Typical data-loss scenarios:\n\n- the message never makes it to the queue, or the broker goes down before the event processor can retrieve the message\n    - solution: leverage persistent message queues (guaranteed delivery), the message is persisted in the broker's\n      database (not only in memory)\n- the event processor de-queues the message and crashes before it can process the message\n    - solution: _client acknowledge mode_ - the message is not deleted from the broker immediately, but waits for an\n      acknowledgement\n- the event processor is unable to persist the message in the database\n    - solution: leverage ACID transactions\n\nBroadcast - the capability to broadcast events without knowledge of who is receiving the message and what they do with\nit. Broadcasting is perhaps the highest level of decoupling between event processors.\n\nIn event-driven architecture, synchronous communication is accomplished through **request-reply** messaging. Each event\nchannel within request-reply messaging has 2 queues (a request queue and a reply queue). 2 primary techniques for\nimplementing request-reply messaging:\n\n1. [PREFERRED] Correlation ID - a field in the reply message, usually set to the request message ID (see the sketch\n   below).\n2. Temporary queue - dedicated to the specific request, created when the request is made, and deleted when the request\n   ends. Does not require a Correlation ID. Large message volumes can significantly slow down the message broker and\n   impact performance and responsiveness.\n
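A minimal correlation-ID sketch using in-process queues to stand in for the request and reply queues; a real broker\nwould run the service concurrently and carry the IDs in message properties, but the matching logic is the same:\n\n```python\nimport queue\nimport uuid\n\nrequest_queue: queue.Queue = queue.Queue()\nreply_queue: queue.Queue = queue.Queue()\n\n\ndef service() -> None:\n    # The reply carries the request's message ID as its correlation ID.\n    message = request_queue.get()\n    reply_queue.put({'correlation_id': message['message_id'], 'body': 'pong'})\n\n\ndef client() -> str:\n    message_id = str(uuid.uuid4())\n    request_queue.put({'message_id': message_id, 'body': 'ping'})\n    service()  # stand-in for the service picking the request up concurrently\n    while True:\n        reply = reply_queue.get()\n        if reply['correlation_id'] == message_id:\n            return reply['body']  # this reply answers our request\n\n\nprint(client())  # pong\n```\n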
When to choose which interaction model:\n\n- Request-Based - for well-structured, data-driven requests (e.g. retrieving customer profile data).\n- Event-Based - for flexible, action-based events that require a high level of responsiveness and scale, with complex\n  and dynamic processing.\n\n## Chapter 15: Space-Based Architecture Style\n\nIn any high-volume application with a large concurrent load, the database will become a bottleneck, regardless of the\ncaching technologies used.\n\nThe space-based architecture style is specifically designed to address problems involving high scalability, elasticity,\nand high concurrency.\n\n_Tuple space_ - the technique of using multiple parallel processors communicating through shared memory.\n\nHigh scalability, elasticity, and performance are achieved by removing the central database and leveraging replicated\nin-memory data grids. Application data is kept in memory and replicated among all active processing units.\n\nSeveral architecture components make up a space-based architecture:\n\n- processing unit: contains the application code\n    - single or multiple processing units\n    - contains an in-memory data grid and replication engine, usually implemented using Hazelcast, Apache Ignite or\n      Oracle Coherence\n- virtualized middleware: used to manage and coordinate the processing units\n    - handles the infrastructure concerns (data sync, request handling)\n    - made of:\n        - messaging grid - manages input requests and session state, determines which active processing components are\n          available to receive the request and forwards it to one of those processing units (usually implemented using\n          HAProxy and Nginx)\n        - data grid - implemented within the processing units as a replicated cache\n        - processing grid - (optional component) manages orchestrated request processing when there are multiple\n          processing units involved in a single business request\n        - deployment manager - monitors response times and user loads, starts up new processing units when load\n          increases, and shuts them down when the load decreases\n- data pumps: used to asynchronously send updated data to the database\n    - a way of sending data to another processor which then updates the data in a database\n    - always asynchronous, providing eventual consistency\n    - when a processing unit receives a request and updates its cache, that processing unit becomes the owner of the\n      update and is responsible for sending that update through the data pump so that the database can be updated\n      eventually\n    - implemented using messaging; messages usually contain the new data values (diff)\n- data writers: used to perform the updates from the data pumps\n    - accept messages from a data pump and update the database with the information contained in the message\n- data readers: used to read database data and deliver it to processing units upon startup\n    - responsible for reading data from the database and sending it to the processing units via a reverse data pump\n    - invoked in one of 3 situations:\n        - a crash of all processing unit instances of the same named cache\n        - a redeployment of all processing units within the same named cache\n        - retrieving archive data not contained in the replicated cache\n\nData collision - occurs when data is updated in one cache instance A, and during replication to another cache instance\nB, the same data is updated in cache B. The local update in B is then overridden by the old data replicated from cache\nA, and the update in cache A is overridden by the one replicated from cache B. Data collision rate factors: latency,\nnumber of instances, cache size.\n
_Distributed cache_ - better data consistency. _Replicated cache_ - better performance and fault tolerance.\n\nExample usages of space-based architecture: well suited for applications that experience high spikes in user or request\nvolume and apps that have throughput in excess of 10,000 concurrent users - online concert ticketing systems, online\nauction systems.\n\n## Chapter 16: Orchestration-Driven Service-Oriented Architecture\n\nThis type appeared in the late 1990s when companies were becoming enterprises and architects were forced to reuse as\nmuch as possible because of expensive software licenses (no open source alternatives).\n\nReuse - the dominant philosophy in this architecture.\n\n- Business Services - sit at the top of this architecture and provide the entry point. No code, just input, output and\n  schema information.\n- Enterprise Services - fine-grained, shared implementations - atomic behaviors around a particular business domain -\n  CreateCustomer, CalculateQuote, ... - a collection of reusable assets - unfortunately, the dynamic nature of reality\n  defies these attempts.\n- Application Services - not all services in the architecture require the same level of granularity, these are one-off,\n  single-implementation services, for example for an application where the company doesn't want to take the time to\n  build a reusable service.\n- Infrastructure Services - supply the operational concerns - monitoring, logging, auth.\n- Orchestration Engine - the heart of this architecture, defines the relationship between the business and enterprise\n  services, how they map together, and where transaction boundaries lie. It also acts as an integration hub, allowing\n  architects to integrate custom code with packaged and legacy software systems.\n\nThis architecture in practice was mostly a disaster.\n\nWhen a team builds a system primarily around reuse, they also incur a huge amount of coupling between components. Each\nchange has a potentially huge ripple effect. That in turn led to the need for coordinated deployments, holistic testing\nand other drags on engineering efficiency.\n\nThis architecture manages to find the disadvantages of both monolithic and distributed architectures!\n\n## Chapter 17: Microservices Architecture\n\nThere is no secret group of architects who decide what the next big movement will be. Rather, it turns out that many\narchitects end up making common decisions.\n\nMicroservices differ in this regard - the style was popularized by a famous blog entry by Martin Fowler and James Lewis.\n\nMicroservices Architecture is heavily inspired by the ideas in DDD - the bounded context decidedly inspired\nmicroservices. Within a bounded context, the internal parts (code, data schemas) are coupled together to produce work,\nbut they are never coupled to anything outside the bounded context.\n\nEach service is expected to include all necessary parts to operate independently.\n\nPerformance is often the negative side effect of the distributed nature of microservices. Network calls take much longer\nthan method calls. It is advised to avoid transactions across service boundaries, making determining the granularity the\nkey to success in this architecture.\n
It is hard to define the right granularity for services in microservices. If there are too many services, a lot of\ncommunication will be required to perform work. The purpose of service boundaries is to capture a domain or workflow.\n\nGuidelines to find the appropriate boundaries:\n\n- purpose - a domain, one significant behaviour on behalf of the overall application\n- transactions - often the entities that need to cooperate in a transaction show a good service boundary\n- choreography - if excellent domain isolation requires extensive communication, you may consider merging services back\n  into a larger service to avoid the communication overhead\n\nMicroservices Architecture tries to avoid all kinds of coupling - including shared schemas and databases used as\nintegration points.\n\nOnce a team has built several microservices, they realize that each has common elements (e.g. operational concerns such\nas logging and monitoring) that benefit from similarity; these can be moved into a sidecar component. The shared sidecar\ncan be owned either by individual teams or by a shared infrastructure team. Once teams know that each service includes a\ncommon sidecar, they can build a _service mesh_ - allowing unified control across infrastructure concerns like logging\nand monitoring.\n\n2 styles of user interfaces:\n\n- monolithic user interface - a single UI that calls through the API layer to satisfy user requests\n- micro-frontends - each service emits the UI for that service, which the frontend coordinates with the other emitted UI\n  components\n\nMicroservices architectures typically utilize _protocol-aware heterogeneous interoperability_:\n\n- protocol-aware - each service should know how to call other services\n- heterogeneous - each service may be written in a different technology stack, heterogeneous means that microservices\n  fully support polyglot environments\n- interoperability - describes services calling one another, while architects in microservices try to discourage\n  transactional calls, services commonly call other services via the network to collaborate\n\nFor asynchronous communication, architects often use events and messages (internally utilizing an event-driven\narchitecture).\n\nThe broker and mediator patterns manifest as choreography and orchestration:\n\n- choreography - no central coordinator exists in this architecture\n- orchestration - coordinating calls across several services\n\nBuilding transactions across service boundaries violates the core decoupling principle of the microservices\narchitecture. DON'T.\n\n> Don't do transactions in microservices - fix granularity instead.\n\nExceptions always exist (e.g. 2 different services need vastly different architecture characteristics -> different\nboundaries), in such situations - patterns exist to handle transaction orchestration (with serious trade-offs).\n\nSAGA - the mediator calls each part of the transaction, records success/failure, and coordinates results. In case of an\nerror, the mediator must ensure that no part of the transaction succeeds if one part fails (e.g. by sending a request to\nundo - usually very complex). Typically implemented by having each request in a `pending` state (see the sketch below).\n\n> A few transactions across services is sometimes necessary; if it is the dominant feature of the architecture, mistakes\n> were made!\n
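A minimal saga-mediator sketch: every step exposes an action and a compensating undo, and the mediator unwinds the\ncompleted steps when a later one fails; the step names are illustrative:\n\n```python\nfrom typing import Callable\n\nStep = tuple[str, Callable[[], None], Callable[[], None]]  # (name, action, undo)\n\n\ndef run_saga(steps: list[Step]) -> bool:\n    completed: list[Step] = []\n    for name, action, undo in steps:\n        try:\n            action()  # e.g. a call that leaves the remote service in a 'pending' state\n            completed.append((name, action, undo))\n        except Exception:\n            # One part failed - compensate every part that already succeeded.\n            for _, _, done_undo in reversed(completed):\n                done_undo()\n            return False\n    return True  # all pending states can now be confirmed\n\n\ndef fail() -> None:\n    raise RuntimeError('payment declined')\n\n\nok = run_saga([\n    ('reserve-stock', lambda: print('stock reserved'), lambda: print('stock released')),\n    ('charge-card', fail, lambda: print('charge refunded')),\n])\nprint(ok)  # False - and 'stock released' was printed on the way back\n```\n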
Performance is often an issue in microservices - many network calls, which have a high performance overhead. Many\npatterns exist to increase performance (data caching and replication).\n\nHowever, some of the most scalable systems yet built have utilized this style to great success, thanks to its\nscalability, elasticity and evolvability.\n\nAdditional references on microservices:\n\n- Building Microservices\n- Microservices vs. Service-Oriented Architecture\n- Microservices AntiPatterns\n\n## Chapter 18: Choosing the Appropriate Architecture Style\n\nChoosing an architecture style represents the culmination of analysis and thought about trade-offs for architecture\ncharacteristics, domain considerations, strategic goals, and a host of other things.\n\nPreferred architecture styles shift over time, driven by:\n\n- observations from the past - rely on experience and observations\n- changes in the ecosystem - constant change is a reliable feature of software development\n- new capabilities - architects must keep an eye open to not only new tools but new paradigms\n- acceleration - new tools create new engineering practices, which lead to new designs and capabilities\n- domain changes - the business continues to evolve\n- technology changes - as technology evolves, organizations have to keep up with at least some of these changes\n- external factors - external factors may force a migration to another option\n\nWhen choosing an architectural style, an architect must take into account all the various factors that contribute to the\nstructure for the domain design. Architects should go into the decision comfortable with the following things:\n\n- the domain - a good general understanding of the major aspects of the domain\n- architecture characteristics that impact structure - the architect must discover and elucidate the architecture\n  characteristics\n- data architecture - architects and DBAs must collaborate on database, schema and other DB-related concerns\n- organizational factors - external factors may influence the design - cost, the company's plans, ...\n- knowledge of process, teams, and operational concerns - the software development process, interaction with operations\n  and the QA process influence a design\n- domain/architecture isomorphism - some problem domains match the topology of the architecture\n\nSeveral determinations:\n\n- monolith vs distributed\n- where should data live\n- synchronous or asynchronous communication between services\n\nGeneral tip:\n\n> Use synchronous by default, asynchronous when necessary\n\n## Chapter 19: Architecture Decisions\n\nMaking architecture decisions involves gathering enough relevant information, justifying the decision, documenting the\ndecision, and effectively communicating the decision to the right stakeholders.\n\nDecision anti-patterns:\n\n- covering your assets - occurs when an architect avoids/defers making architecture decisions out of fear of making the\n  wrong choice, 2 ways to overcome:\n    - wait until you have enough information to justify and validate your decision, but waiting too long holds up\n      development teams\n    - continually collaborate with development teams to ensure that the decision can be implemented as expected, quickly\n      respond to change\n- groundhog day - when people don't know why a decision was made, so it keeps getting discussed over and over, because\n  the architect failed to provide a justification for the decision (technical and business justifications)\n- email-driven architecture - where people lose, forget, or don't even know an architecture decision has been made and\n  therefore cannot implement that decision, notify impacted people directly in order to avoid this anti-pattern\n
Architecturally significant decisions are those decisions that affect [OR]:\n\n- the structure - impacts the patterns/styles of architecture being used\n- nonfunctional characteristics - architecture characteristics (performance, scalability, ...)\n- dependencies - coupling points between components/services within the system\n- interfaces - how services and components are accessed and orchestrated\n- construction techniques - platforms, frameworks, tools, processes\n\nArchitecture Decision Records - ADRs - a short text file describing a specific architecture decision. 5 main sections:\n\n- title - numbered sequentially, contains a short phrase describing the architecture decision\n- status - one of: proposed (must be approved by a higher-level decision maker), accepted (approved & ready for\n  implementation), superseded (the decision was changed and superseded by another ADR)\n- context - what situation forces me to make this decision, this section also provides a way to document the\n  architecture (clear & concise)\n- decision - the architecture decision, along with a full justification for the decision, it is advised to use the\n  following voice: we will do, we will use, ... -- this section allows an architect to place more emphasis on _why_\n  rather than _how_. Understanding why a decision was made is far more important than understanding how something works.\n- consequences - the overall impact of an architecture decision, this section forces the architect to think about\n  whether those impacts outweigh the benefits of the decision. Another good use is to document the trade-off analysis.\n- [additional] compliance - how the architecture decision will be measured and governed from a compliance perspective\n- [additional] notes - various metadata -- author, approval date, approved by, superseded date, last modified date, ...\n\nAuthors' recommendation -- store ADRs in a wiki, rather than in Git.\n\nADRs can be used as an effective means to document a software architecture.\n\n## Chapter 20: Analyzing Architecture Risk\n\nEvery architecture has risk associated with it -- risk involving availability, scalability, data integrity, ... -- by\nidentifying risks, the architect can address deficiencies and take corrective actions.\n\nThe Architecture Risk Matrix - a 2D array -- the overall impact and the likelihood, each dimension has 3 ratings (low,\nmedium, high). When leveraging the risk matrix to qualify the risk, consider the impact dimension first and the\nlikelihood dimension second.\n\nRisk Assessment - a summarized report of the overall risk of an architecture (the risk matrix can be used to build it).\n\nRisk Storming - a collaborative exercise used to determine architectural risk within a specific dimension (area of risk)\n-- unproven technology, performance, scalability, availability, data loss, single points of failure, security. Risk\nstorming is broken down into 3 primary activities:\n\n1. Identification - each participant individually identifies areas of risk within the architecture, should involve\n   analyzing only one particular dimension\n2. Consensus - a highly collaborative activity with the goal of gaining consensus among all participants\n3. Mitigation - involves changes or enhancements to certain areas of the architecture\n
## Chapter 20: Analyzing Architecture Risk\n\nEvery architecture has risk associated with it -- risk involving availability, scalability, data integrity, ... -- by\nidentifying risks, the architect can address deficiencies and take corrective actions.\n\nThe Architecture Risk Matrix - a 2D matrix of the overall impact and the likelihood of occurrence, each dimension has 3\nratings (low, medium, high). When leveraging the risk matrix to qualify the risk, consider the impact dimension first\nand the likelihood dimension second.\n\nRisk Assessment - a summarized report of the overall risk of an architecture (the risk matrix can be used to build it).\n\nRisk Storming - a collaborative exercise used to determine architectural risk within a specific dimension (area of risk)\n-- unproven technology, performance, scalability, availability, data loss, single points of failure, security. Risk\nstorming is broken down into 3 primary activities:\n\n1. Identification - each participant individually identifies areas of risk within the architecture, should involve\n   analyzing only one particular dimension\n2. Consensus - a highly collaborative activity with the goal of gaining consensus among all participants\n3. Mitigation - involves changes or enhancements to certain areas of the architecture\n\nRisk storming can be used for other aspects of software development -- for example story grooming -- story risk, the\nlikelihood that the story will not be completed.\n\n## Chapter 21: Diagramming and Presenting Architecture\n\nEffective communication becomes critical to an architect's success. No matter how brilliant an architect's ideas are,\nthey won't matter if the architect can't convince managers to fund them and developers to build them.\n\nDiagramming and presenting are 2 critical soft skills for architects.\n\nIrrational Artifact Attachment - the proportional relationship between a person's attachment to some artifact and how\nlong it took to produce. If you spend a lot of time on something, you may have an irrational attachment to that\nartifact (proportional to the time invested). Use an Agile approach in order to avoid this anti-pattern - create\njust-in-time artifacts, use simple tools to create diagrams.\n\nBaseline features of a diagram tool:\n\n- layers - used to link a group of items together logically to enable hiding/showing individual layers. An architect can\n  build a diagram where they can hide overwhelming details or incrementally build pictures for presentations\n- stencils/templates - allow architects to build up a library of common visual components (basic shapes with a special\n  meaning, e.g. a microservice stencil)\n- magnets - assistance in drawing lines\n\nDiagram Guidelines:\n\n- Titles - all elements of the diagram should have titles unless they are well known to the audience\n- Lines - should be thick enough to be seen, if lines indicate information flow use arrows\n    - solid lines = synchronous communication\n    - dotted lines = asynchronous communication\n- Shapes - each architect tends to make their own standard set of shapes, hint: use 3D boxes to indicate deployable\n  artifacts and rectangles to indicate containership\n- Labels - label each item in a diagram, especially if there is a chance of ambiguity for the readers\n- Color - use colors when it helps to distinguish one artifact from the other\n- Keys - if shapes are ambiguous, include a key on the diagram clearly indicating what each shape represents\n\nBook recommendation: Presentation Patterns\n\nWhen preparing a presentation - use a different type of transition when changing topics, and use the same transition\nwithin a topic.\n\nWhen presenting, the presenter has 2 presentation channels: verbal and visual. By placing too much text on the slides\nand then saying the same words, the presenter is overloading one information channel and starving the other.\n\nUsing animations and transitions in conjunction with incremental builds (reveal information gradually) allows the\npresenter to make more compelling, entertaining presentations.\n\nInfo-decks - slide decks that are not meant to be projected but rather summarize information graphically, essentially\nusing a presentation tool as a desktop publishing machine. They contain all the information, are meant to be standalone,\nand need no presenter.\n\nInvisibility - a pattern where the presenter inserts a blank slide within a presentation to refocus attention solely on\nthe speaker (turning off the visual channel).\n\n## Chapter 22: Making Teams Effective\n\nA software architect is also responsible for guiding the development team through the implementation of the\narchitecture.\n\nA software architect should create and communicate constraints, or the box, in which developers can implement the\narchitecture. Tight boundaries = frustration, loose boundaries = confusion, appropriate boundaries = effective teams.\n\n
3 basic types of architect personalities:\n\n- a control freak:\n    - controls every detailed aspect of the software development process\n    - makes too fine-grained and too low-level decisions\n    - may restrict the development team to use a specific technology, library, naming convention, class design\n    - steals the art of programming away from the developers\n- an armchair architect:\n    - hasn't coded in a very long time and does not take the implementation details into account\n    - creates loose boundaries, in this scenario, development teams end up taking the role of architect, doing the work\n      an architect is supposed to be doing\n    - in order to avoid such behaviour, an architect should be involved in the technology being used on the project\n- an effective architect:\n    - produces the appropriate constraints and boundaries, ensures that the team members are working well together and\n      have the right level of guidance on the team\n    - requires working closely and collaborating with the team, and gaining the respect of the team as well\n\nElastic Leadership - https://www.elasticleadership.com -- knowing how much control to exert on a given development team,\nfactors to determine how many teams a software architect can manage at once:\n\n- team familiarity - the better team members know each other, the less control is needed because team members start to\n  become self-organizing, the newer the team members, the more control is needed to help facilitate collaboration among\n  team members and reduce cliques within the team\n- team size - the larger the team, the more control is needed, the smaller the team, the less control is needed\n- overall experience - teams with more junior developers require more control and mentoring whereas teams with more\n  senior developers require less control\n- project complexity - highly complex projects require the architect to be more available to the team and to assist with\n  issues that arise, hence more control is needed on the team\n- project duration - the shorter the duration, the less control is needed, the longer the project, the more control is\n  needed\n\n3 factors when considering the most effective team size:\n\n- process loss - (Brooks's law) the more people you add to a project, the more time the project will take, example:\n  inability to parallelize work, merge conflicts\n- pluralistic ignorance - when everyone agrees to a norm because they think they are missing something obvious, rather\n  than speaking up, a person chooses to follow the group (similar to \"The Emperor's New Clothes\" -- the king is naked),\n  an architect should observe the body language of all team members and ask each person what they think about the\n  proposed solution\n- diffusion of responsibility - as team size increases, it has a negative impact on communication\n\nAn effective architect not only helps guide the development team through the implementation of the architecture, but\nalso ensures that the team is healthy, happy, and working together to achieve a common goal.\n\nChecklists work and provide an excellent vehicle for making sure everything is covered and addressed. The key to making\nteams effective is knowing when to leverage checklists and when not to.\n\n
The most effective checklists:\n\n- code completion checklist - if everything in the checklist is completed, then the developer can claim they are\n  actually done with the code\n- unit and functional testing checklist - contains some of the more unusual and edge-case tests that software developers\n  tend to forget to test\n- software release checklist - releasing software is perhaps one of the most error-prone aspects of the software\n  development life cycle, it helps avoid failed builds and deployments, and it significantly reduces the amount of risk\n  associated with releasing software\n\nMany items from the checklists can be automated.\n\n> Don't worry about stating the obvious in a checklist. It's the obvious stuff that's usually skipped or missed.\n\n## Chapter 23: Negotiation and Leadership Skills\n\nNegotiation is one of the most important skills a software architect can have. Effective software architects understand\nthe politics of the organization, have strong negotiation and facilitation skills, and can overcome disagreements when\nthey occur to create solutions that all stakeholders agree on.\n\n\"We must have zero downtime\", \"I need these features yesterday\", ...:\n\n> Leverage the use of grammar and buzzwords to better understand the situation\n\nEnter the negotiation with as many arguments as possible:\n\n> Gather as much information as possible _before_ entering into a negotiation\n\nSave this negotiation tactic for last:\n\n> When all else fails, state things in terms of cost and time\n\nDoes the entire system require 99.999% availability or just some parts?:\n\n> Leverage the \"divide and conquer\" rule to qualify demands or requirements\n\nDemonstrate your point with a real-life example:\n\n> Always remember that demonstration defeats discussion\n\n> Avoid being too argumentative or letting things get too personal in a negotiation -- calm leadership combined with\n> clear and concise reasoning will always win a negotiation\n\nIvory Tower architecture anti-pattern - ivory tower architects are ones who simply dictate from on high, telling\ndevelopment teams what to do without regard to their opinion or concerns. This usually leads to a loss of respect for\nthe architect and an eventual breakdown of the team dynamics.\n\n> When convincing developers to adopt an architecture decision or to do a specific task, provide a justification rather\n> than \"dictating from on high\"\n\nBy providing a reason why something needs to be done, developers will more likely agree with the request. Most of the\ntime, once a person hears something they disagree with, they stop listening. By stating the reason first, the architect\nmakes sure that the justification will be heard.\n\n> If a developer disagrees with a decision, have them arrive at the solution on their own\n\nIt is a win-win situation: the developer either fails trying, and the architect automatically gets buy-in for the\narchitect's decision, or the developer finds a better way to address the concerns.\n\nAccidental complexity - we have made a problem hard, architects sometimes do this to prove their worth when things seem\ntoo simple or to guarantee that they are always kept in the loop on discussions and decisions. Introducing accidental\ncomplexity into something that is not complex is one of the best ways to become an ineffective leader as an architect.\nAn effective way of avoiding accidental complexity is what we call the 4 C's of architecture:\n\n- communication\n- collaboration\n- clarity\n- conciseness\n\nBe pragmatic, yet visionary. Visionary - thinking about or planning the future with imagination or wisdom. Pragmatic -\ndealing with things sensibly and realistically in a way that is based on practical rather than theoretical\nconsiderations.\n\n
Bad software architects leverage their title to get people to do what they want them to do. Effective software\narchitects get people to do things not by leveraging their title as architect, but rather by leading through example.\nLead by example, not by title.\n\nTo lead a team and become an effective leader, a software architect should try to become the go-to person on the team -\nthe person developers go to with their questions and problems. Another technique to start gaining respect as a leader\nand become the go-to person on the team is to host periodic brown-bag lunches to talk about a specific technique or\ntechnology.\n\nToo many meetings? Ask for the meeting agenda ahead of time to help quantify if you are really needed at the meeting or\nnot.\n\nMeetings should be either first thing in the morning, right after lunch, or toward the end of the day, but not during\nthe day when most developers experience flow state.\n\n> The most important single ingredient in the formula of success is knowing how to get along with people ~ Theodore\n> Roosevelt\n\n## Chapter 24: Developing a Career Path\n\nAn architect must continue to learn throughout their career. Technology breadth is more important to architects than\ndepth.\n\nThe 20-Minute Rule - devote at least 20 minutes a day to your career as an architect by learning something new or diving\ndeeper into a specific topic. Spend a minimum of 20 minutes Googling some unfamiliar buzzwords.\n\nTechnology Radar: https://www.thoughtworks.com/radar\n\nYou can create your own personal technology radar. It helps to formalize thinking about technology and balance opposing\ndecision criteria.\n\nArchitects should choose some technologies and/or skills that are widely in demand and track that demand. But they might\nalso want to try some technology gambits, like open source or mobile development.\n\nArchitects can utilize social media to enhance their technical breadth. Using media like Twitter professionally,\narchitects should find technologists whose advice they respect. This allows them to discover new, interesting\ntechnologies to assess and to keep up with the rapid changes in the technology world.\n\n## Self-Assessment Questions\n\n[Chapter 1: Introduction](#chapter-1-introduction)\n\n1. What are the 4 dimensions that define software architecture?\n\nKnowledge of the architecture structure, architecture characteristics, architecture decisions, and design principles.\n\n2. What is the difference between an architecture decision and a design principle?\n\nDecisions: what is and what is not allowed, rules for how a system should be constructed. Design principles:\nguidelines for constructing systems.\n\n3. List the eight core expectations of a software architect.\n\nMake architecture decisions. Continually analyze the architecture. Keep current with the latest trends. Ensure\ncompliance with decisions. Diverse exposure and experience. Have business domain knowledge. Possess interpersonal\nskills. Understand and navigate politics.\n\n4. What is the First Law of Software Architecture?\n\nEverything in software architecture is a trade-off.\n\n[Chapter 2: Architectural thinking](#chapter-2-architectural-thinking)\n\n
1. Describe the traditional approach of architecture versus development and explain why that approach no longer works.\n\nIn a traditional model the architect is disconnected from the development teams, and as such the architecture rarely\nprovides what it was originally set out to do. The architect defines architecture characteristics, selects architecture\npatterns and styles, and then these artifacts are handed off to the development teams.\n\nBoundaries between architects and developers must be broken down. Unlike the old-school waterfall approaches to static\nand rigid software architecture, the architecture of today's systems changes and evolves every iteration. A tight\ncollaboration is essential for success.\n\n2. List the three levels of knowledge in the knowledge triangle and provide an example of each.\n\nStuff you know: Python\n\nStuff you know you don't know: Deep Learning\n\nStuff you don't know you don't know: 🤷‍\n\n3. Why is it more important for an architect to focus on technical breadth rather than technical depth?\n\nArchitects must make decisions that match capabilities to technical constraints, so a broad understanding of a wide\nvariety of solutions is valuable.\n\n4. What are some of the ways of maintaining your technical depth and remaining hands-on as an architect?\n\n- do frequent proof-of-concepts\n- whenever possible, write production-quality code (even when doing POCs) -- POC code often remains in the\n  repository and becomes the reference or guiding example\n- tackle technical debt stories or architecture stories, freeing the development team up to work on the critical\n  function user stories\n- work on bug fixes\n- create simple command-line tools and analyzers to help the development team with their day-to-day tasks\n- do code reviews frequently\n\n[Chapter 3: Modularity](#chapter-3-modularity)\n\n1. What is meant by the term _connascence_?\n\nTwo components are connascent if a change in one would require the other to be modified in order to maintain the\noverall correctness of the system.\n\nConnascence allows us to go beyond the binary of \"coupled\" and \"not coupled\", serving as a tool to measure coupling and\ndescribe how bad it is under different levels and kinds.\n\n2. What is the difference between static and dynamic connascence?\n\nStatic connascence refers to source-code-level coupling - name (multiple entities must agree on the name), type\n(multiple entities must agree on the type), meaning (multiple entities must agree on the meaning of particular values),\nposition (multiple entities must agree on the order of the values), algorithm (multiple entities must agree on a\nparticular algorithm).\n\nDynamic connascence analyzes calls at runtime - execution (order of execution), timing (timing of the execution of\nmultiple components), values (several values relate to one another and must change together), identity (multiple\ncomponents must reference the same entity).\n\n3. What does the connascence of type mean? Is it static or dynamic connascence?\n\n[STATIC] Multiple components must agree on the type of entity.\n\n4. What is the strongest form of connascence?\n\nIdentity. Multiple components must reference the same entity. For example, when 2 independent components must share and\nupdate a common data source.\n\n5. What is the weakest form of connascence?\n\nName. Multiple components must agree on the name.\n\n6. Which is preferred within a code base -- static or dynamic connascence?\n\nStatic. Architects have a harder time determining dynamic connascence because we lack tools to analyze runtime calls as\neffectively as we can analyze the call graph.\n\n
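A small Go sketch (hypothetical names) of trading a stronger form for a weaker one: positional arguments couple every\ncaller to the parameter order (connascence of position), while an options struct only requires agreement on field names\n(connascence of name):\n\n```go\npackage main\n\nimport \"fmt\"\n\n// Connascence of position: every caller must remember that the first\n// argument is the host and the second is the user.\nfunc connectPositional(host string, user string) string {\n    return fmt.Sprintf(\"%s@%s\", user, host)\n}\n\n// Weakened to connascence of name: callers agree only on field names,\n// and the order no longer matters.\ntype ConnOpts struct {\n    Host string\n    User string\n}\n\nfunc connect(opts ConnOpts) string {\n    return fmt.Sprintf(\"%s@%s\", opts.User, opts.Host)\n}\n\nfunc main() {\n    fmt.Println(connectPositional(\"db.local\", \"admin\"))\n    fmt.Println(connect(ConnOpts{User: \"admin\", Host: \"db.local\"}))\n}\n```\n\n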
[Chapter 4: Architecture Characteristics Defined](#chapter-4-architecture-characteristics-defined)\n\n1. What three criteria must an attribute meet to be considered an architecture characteristic?\n\n- specifies a non-domain design consideration\n- influences some structural aspect of the domain\n- is critical or important to application success\n\n2. What is the difference between an implicit characteristic and an explicit one? Provide an example of each.\n\nImplicit - rarely appears in requirements, yet is necessary for project success. Domain knowledge is required to uncover\nsuch characteristics.\n\nExplicit - a characteristic listed in the requirements.\n\n3. Provide an example of an operational characteristic.\n\nAvailability, Continuity, Performance, Reliability, Recoverability, Scalability, ...\n\n4. Provide an example of a structural characteristic.\n\nConfigurability, Extensibility, Maintainability, ...\n\n5. Provide an example of a cross-cutting characteristic.\n\nAccessibility, Authentication, Authorization, Legal, Security, Privacy, ...\n\n6. Which architecture characteristic is more important to strive for -- availability or performance?\n\nThe ultimate answer for architectural questions: _it depends on ..._\n\n[Chapter 5: Identifying Architectural Characteristics](#chapter-5-identifying-architectural-characteristics)\n\n1. Give a reason why it is a good practice to limit the number of characteristics an architecture should support.\n\nOver-specifying architecture characteristics may kill the project. Example: the Vasa - a Swedish warship, it was\nsupposed to be magnificent, but it turned out to be too heavy and too complicated.\n\nKeep the design simple.\n\n2. True or false: most architecture characteristics come from business requirements and user stories\n\nTrue.\n\n3. If a business stakeholder states that time-to-market is the most important business concern, which architecture\n   characteristic would the architecture need to support?\n\nAgility, testability, deployability\n\n4. What is the difference between scalability and elasticity?\n\nScalability - the ability to handle a large number of concurrent users without serious performance degradation.\n\nElasticity - the ability to handle bursts of requests.\n\n5. You find out that your company is about to undergo several major acquisitions to significantly increase its customer\n   base. Which architectural characteristics should you be worried about?\n\nInteroperability, scalability, adaptability, extensibility.\n\n[Chapter 6: Measuring and Governing Architecture Characteristics](#chapter-6-measuring-and-governing-architecture-characteristics)\n\n1. Why is cyclomatic complexity such an important metric to analyze for architecture?\n\nOverly complex code represents a code smell - it harms virtually every one of the desirable characteristics.\n\n2. What is an architecture fitness function? How can they be used to analyze an architecture?\n\nAny mechanism that provides an objective integrity assessment of some architecture characteristic or combination of\narchitecture characteristics. Many tools may be used to implement fitness functions: metrics, monitors, unit tests,\nchaos engineering, ...\n\n3. Provide an example of an architecture fitness function to measure the scalability of an architecture.\n\nWrite automated scalability tests and compare the results (a sketch follows below).\n\n4. What is the most important criterion for an architecture characteristic to allow architects and developers to create\n   fitness functions?\n\nArchitects must ensure that developers understand the purpose of the fitness function before imposing it on them.\n\n
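A minimal sketch of such a fitness function as a Go test. The latency budget is arbitrary and `handleRequest` is a\nhypothetical stand-in for the system under test:\n\n```go\npackage fitness\n\nimport (\n    \"sync\"\n    \"testing\"\n    \"time\"\n)\n\n// handleRequest is a hypothetical stand-in for the system under test.\nfunc handleRequest() { time.Sleep(2 * time.Millisecond) }\n\n// TestScalabilityFitness fails the build when total latency degrades as the\n// number of concurrent users grows.\nfunc TestScalabilityFitness(t *testing.T) {\n    for _, users := range []int{10, 100, 1000} {\n        start := time.Now()\n        var wg sync.WaitGroup\n        for i := 0; i < users; i++ {\n            wg.Add(1)\n            go func() {\n                defer wg.Done()\n                handleRequest()\n            }()\n        }\n        wg.Wait()\n        if elapsed := time.Since(start); elapsed > 500*time.Millisecond {\n            t.Fatalf(\"%d concurrent users took %v, over budget\", users, elapsed)\n        }\n    }\n}\n```\n\n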
[Chapter 7: Scope of Architecture Characteristics](#chapter-7-scope-of-architecture-characteristics)\n\n1. What is an architectural quantum, and why is it important to architecture?\n\nThe architectural quantum is the smallest possible item that needs to be deployed in order to run an application.\n\n2. Assume a system consisting of a single user interface with four independently deployed services, each containing its\n   own separate database. Would this system have a single quantum or four quanta? Why?\n\n4, because each service can be deployed separately.\n\n3. Assume a system with an administration portion managing static reference data (such as the product catalog, and\n   warehouse information) and a customer-facing portion managing the placement of orders. How many quanta should this\n   system be and why? If you envision multiple quanta, could the admin quantum and customer-facing quantum share a\n   database? If so, in which quantum would the database need to reside?\n\n2 quanta - ordering and warehouse management, with separate databases.\n\n[Chapter 8: Component-Based Thinking](#chapter-8-component-based-thinking)\n\n1. We define the term component as a building block of an application - something the application does. A component\n   usually consists of a group of classes or source files. How are components typically manifested within an application\n   or service?\n\nComponents - the physical manifestation of a module. Components offer a language-specific mechanism to group artifacts\ntogether, often nesting them to create stratification. Components also appear as subsystems or layers in architecture,\nor as the deployable unit of work for many event processors.\n\n2. What is the difference between technical partitioning and domain partitioning? Provide an example of each.\n\nTechnical partitioning - organizing architecture based on technical capabilities (presentation, business, service,\npersistence).\n\nDomain partitioning - a modeling technique for decomposing complex systems. In DDD the architect identifies domains that\nare independent and decoupled from each other. The microservices architecture is based on this philosophy.\n\n3. What is the advantage of domain partitioning?\n\nIt better reflects the kinds of changes that most often occur on projects.\n\n4. Under what circumstances would technical partitioning be a better choice over domain partitioning?\n\nSeparation based on technical partitioning enables developers to find certain categories of the code base quickly, as it\nis organized by capabilities.\n\n5. What is the entity trap? Why is it not a good approach for component identification?\n\nIt arises when the architect incorrectly identifies the database relationships as workflows in the application, a\ncorrespondence that rarely manifests in the real world. This anti-pattern indicates a lack of thought about the actual\nworkflows of the application. Components created with the entity trap tend to be too coarse-grained.\n\n[Chapter 9: Foundations](#chapter-9-foundations)\n\n1. List the eight fallacies of distributed computing.\n\nLatency is Zero, Bandwidth is Infinite, The Network is Reliable, The Network is Secure, The Topology Never Changes,\nThere is Only One Administrator, Transport Cost is Zero, The Network is Homogenous\n\n
2. Name three challenges that distributed architectures have that monolithic architectures don't.\n\nDebugging a distributed architecture, distributed transactions, contract maintenance and versioning.\n\n3. What is stamp coupling?\n\nRequesting/receiving too much data when only a small subset of it is needed -- e.g. 2000 req x 10 kB vs 2000 req x\n100 kB.\n\n4. What are some ways of addressing stamp coupling?\n\n- create private RESTful API endpoints\n- use field selectors in the contract (sketched below)\n- use GraphQL\n- use internal messaging endpoints\n
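\nA hedged Go sketch of the field-selector option above (the type and field names are invented for illustration): the\nprovider returns only the fields the caller asks for, so consumers are not stamp-coupled to the full payload.\n\n```go\npackage main\n\nimport (\n    \"encoding/json\"\n    \"fmt\"\n    \"strings\"\n)\n\n// Profile is a hypothetical payload; Bio stands in for the large fields most\n// callers don't need.\ntype Profile struct {\n    Name    string\n    Address string\n    Bio     string\n}\n\n// selectFields returns only the requested fields.\nfunc selectFields(p Profile, fields string) map[string]interface{} {\n    full := map[string]interface{}{\"name\": p.Name, \"address\": p.Address, \"bio\": p.Bio}\n    out := map[string]interface{}{}\n    for _, f := range strings.Split(fields, \",\") {\n        if v, ok := full[f]; ok {\n            out[f] = v\n        }\n    }\n    return out\n}\n\nfunc main() {\n    p := Profile{Name: \"Ada\", Address: \"Main St 1\", Bio: \"(large)\"}\n    b, _ := json.Marshal(selectFields(p, \"name\")) // the caller asked only for name\n    fmt.Println(string(b))                        // {\"name\":\"Ada\"}\n}\n```\n"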
  },
  {
    "path": "books/go/ch01/Makefile",
    "content": "# Set default target, when 'make' executed, runs 'build' by default:\n.DEFAULT_GOAL := build\n\nfmt:\n\tgo fmt ./...\n# Keep 'make' from getting confused with directories, in this case with directory 'fmt' (if it is ever created):\n.PHONY: fmt\n\n# Before running 'lint', run 'fmt'\nlint: fmt\n\tgolint ./...\n.PHONY: lint\n\nvet: fmt\n\tgo vet ./...\n.PHONY: vet\n\nbuild: vet\n\tgo build hello.go\n.PHONY: build\n"
  },
  {
    "path": "books/go/ch01/hello.go",
    "content": "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"Hello, world!\")\n}\n"
  },
  {
    "path": "books/go/ch02/const.go",
    "content": "package main\n\nimport \"fmt\"\n\nconst x int64 = 10\n\nconst (\n\tidKey   = \"id\"\n\tnameKey = \"name\"\n)\n\nconst z = 20 * 20\n\nfunc main() {\n\tconst y = \"hello\"\n\n\tfmt.Println(x)\n\tfmt.Println(y)\n\n\t//x = x + 1 // Error\n\t//y = \"bye\" // Error\n\n\tfmt.Println(x)\n\tfmt.Println(y)\n}\n"
  },
  {
    "path": "books/go/ch02/unicode.go",
    "content": "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tęąćśż := \"hello\"\n\tfmt.Println(ęąćśż)\n}\n"
  },
  {
    "path": "books/go/ch03/types.go",
    "content": "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tvar x [3]int\n\tfmt.Println(x)\n\n\tvar y = [12]int{1, 5: 4}\n\tfmt.Println(y)\n\n\tvar z = [...]int{12, 20, 30}\n\tfmt.Println(z)\n\n\tvar p = []int{12, 20, 30}\n\tfmt.Println(p)\n\n\tvar v []int\n\tfmt.Println(v == nil)\n\tfmt.Println(len(v))\n\tv = append(v, 10, 20)\n\tfmt.Println(v)\n\tv = append(v, p...)\n\tfmt.Println(v)\n\tfmt.Println(cap(v))\n\n\tr := make([]int, 5)\n\tfmt.Println(r)\n\tr = make([]int, 0, 20)\n\tr = append(r, 10, 20)\n\tfmt.Println(r)\n\n\ts := \"Hello 😇\"\n\tfmt.Println(s[6:7])\n\tfmt.Println(s[6:10]) // 4 bytes for emoji\n\n\tteams := map[string][]string{\n\t\t\"Orcas\": {\"Fred\", \"Ralph\"},\n\t\t\"Lions\": {\"Sarah\", \"Peter\"},\n\t}\n\tfmt.Println(teams)\n\tteam, ok := teams[\"Kittens\"]\n\tfmt.Println(team, ok)\n\n\tset := map[int]bool{}\n\tvals := []int{1, 2, 3, 4, 5, 6, 7, 4, 3, 2, 3, 4, 3}\n\tfor _, v := range vals {\n\t\tset[v] = true\n\t}\n\tfmt.Println(len(set), len(vals))\n\tif set[1] {\n\t\tfmt.Println(\"1 is in the set\")\n\t}\n\n\ttype person struct {\n\t\tname string\n\t\tage  int\n\t\tpet  string\n\t}\n\tjulia := person{\n\t\t\"Julia\",\n\t\t30,\n\t\t\"cat\",\n\t}\n\tbeth := person{\n\t\tname: \"Beth\",\n\t}\n\tfmt.Println(julia, beth)\n\n\tvar bob struct {\n\t\tname string\n\t\tage  int\n\t\tpet  string\n\t}\n\tbob.name = \"Bob\"\n\tfmt.Println(bob)\n}\n"
  },
  {
    "path": "books/go/ch04/case.go",
    "content": "package main\n\nimport \"fmt\"\n\nfunc main() {\n\twords := []string{\"a\", \"cow\", \"smile\", \"gopher\"}\n\n\tfor _, word := range words {\n\t\tswitch size := len(word); size {\n\t\tcase 1, 2, 3, 4:\n\t\t\tfmt.Println(word, \"is a short word!\")\n\t\tcase 5:\n\t\t\twordLen := len(word)\n\t\t\tfmt.Println(word, \"is the exactly the right length:\", wordLen)\n\t\tcase 6, 7, 8, 9:\n\t\tdefault:\n\t\t\tfmt.Println(word, \"is a long word!\")\n\t\t}\n\t}\n}\n"
  },
  {
    "path": "books/go/ch04/for.go",
    "content": "package main\n\nimport (\n\t\"fmt\"\n)\n\nfunc main() {\n\tcompleteFor()\n\tconditionOnlyFor()\n\tinfiniteFor()\n\tforRange()\n\tlabelingStatements()\n}\n\nfunc completeFor() {\n\tfor i := 0; i < 10; i++ {\n\t\tfmt.Println(i)\n\t}\n}\n\nfunc conditionOnlyFor() {\n\ti := 1\n\tfor i < 100 {\n\t\tfmt.Println(i)\n\t\ti = i * 2\n\t}\n}\n\nfunc infiniteFor() {\n\tfor {\n\t\tfmt.Println(\"Hello\")\n\t\tbreak\n\t}\n}\n\nfunc forRange() {\n\tevenVals := []int{2, 4, 6, 8, 10, 12}\n\tfor i, v := range evenVals {\n\t\tfmt.Println(i, v)\n\t}\n\n\tfor _, v := range evenVals {\n\t\tfmt.Println(v)\n\t}\n\n\tfor _, v := range evenVals {\n\t\tfmt.Println(v)\n\t}\n\n\tuniqueNames := map[string]bool{\"Fred\": true, \"Paul\": true, \"Wilma\": true}\n\tfor k := range uniqueNames {\n\t\tfmt.Println(k)\n\t}\n}\n\nfunc labelingStatements() {\n\tsamples := []string{\"hello\", \"apple_π!\"}\nouter:\n\tfor _, sample := range samples {\n\t\tfor i, r := range sample {\n\t\t\tfmt.Println(i, r, string(r))\n\t\t\tif r == 'l' {\n\t\t\t\tcontinue outer\n\t\t\t}\n\t\t}\n\t\tfmt.Println()\n\t}\n}\n"
  },
  {
    "path": "books/go/ch04/if.go",
    "content": "package main\n\nimport (\n\t\"fmt\"\n\t\"math/rand\"\n)\n\nfunc main() {\n\tif n := rand.Intn(10); n == 10 {\n\t\tfmt.Println(\"That's too low\")\n\t} else if n > 5 {\n\t\tfmt.Println(\"That's too big:\", n)\n\t} else {\n\t\tfmt.Println(\"That's a good number:\", n)\n\t}\n}\n"
  },
  {
    "path": "books/go/ch05/anonymous.go",
    "content": "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfor i := 0; i < 5; i++ {\n\t\tfunc(j int) {\n\t\t\tfmt.Println(\"printing\", j, \"from inside of an anonymous function\")\n\t\t}(i)\n\t}\n}\n"
  },
  {
    "path": "books/go/ch05/deferExample.go",
    "content": "package main\n\nimport (\n\t\"io\"\n\t\"log\"\n\t\"os\"\n)\n\nfunc getFile(name string) (*os.File, func(), error) {\n\tf, err := os.Open(name)\n\tif err != nil {\n\t\treturn nil, nil, err\n\t}\n\treturn f, func() {\n\t\tf.Close()\n\t}, nil\n}\n\nfunc main() {\n\tif len(os.Args) < 2 {\n\t\tlog.Fatal(\"no file specified\")\n\t}\n\tf, closer, err := getFile(os.Args[1])\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tdefer closer()\n\tdata := make([]byte, 2048)\n\tfor {\n\t\tcount, err := f.Read(data)\n\t\tos.Stdout.Write(data[:count])\n\t\tif err != nil {\n\t\t\tif err != io.EOF {\n\t\t\t\tlog.Fatal(err)\n\t\t\t}\n\t\t\tbreak\n\t\t}\n\t}\n}\n"
  },
  {
    "path": "books/go/ch05/functionAsParam.go",
    "content": "package main\n\nimport (\n\t\"fmt\"\n\t\"sort\"\n)\n\ntype Person struct {\n\tFirstName string\n\tLastName  string\n\tAge       int\n}\n\nfunc main() {\n\tpeople := []Person{\n\t\t{\"Pat\", \"Patterson\", 34},\n\t\t{\"Tracy\", \"Bobbert\", 23},\n\t\t{\"Fred\", \"Fredson\", 18},\n\t}\n\tsort.Slice(people, func(i int, j int) bool {\n\t\treturn people[i].LastName < people[j].LastName\n\t})\n\tfmt.Println(people)\n}\n"
  },
  {
    "path": "books/go/ch05/functions.go",
    "content": "package main\n\nimport (\n\t\"errors\"\n\t\"fmt\"\n)\n\nfunc main() {\n\tresult := Div(5, 2)\n\tfmt.Println(result)\n\n\tMyFunc(MyFuncOpts{\n\t\tLastName: \"Smith\",\n\t\tAge:      10,\n\t})\n\n\tfmt.Println(addTo(10, 1, 2, 3, 4, 5))\n\tfmt.Println(addTo(10, []int{1, 2, 3, 4, 5}...))\n\n\tresult, remainder, err := divAndRemainder(5, 2)\n\tif err != nil {\n\t\tfmt.Println(err)\n\t}\n\tfmt.Println(result, remainder)\n}\n\nfunc Div(numerator int, denominator int) int {\n\tif denominator == 0 {\n\t\treturn 0\n\t}\n\treturn numerator / denominator\n}\n\ntype MyFuncOpts struct {\n\tFirstName string\n\tLastName  string\n\tAge       int\n}\n\nfunc MyFunc(opts MyFuncOpts) int {\n\treturn opts.Age\n}\n\nfunc addTo(base int, vals ...int) []int {\n\tout := make([]int, 0, len(vals))\n\tfor _, v := range vals {\n\t\tout = append(out, base+v)\n\t}\n\treturn out\n}\n\nfunc divAndRemainder(numerator int, denominator int) (result int, remainder int, err error) {\n\tif denominator == 0 {\n\t\terr = errors.New(\"cannot divide by zero\")\n\t\treturn result, remainder, err\n\t}\n\tresult, remainder, err = numerator/denominator, numerator%denominator, nil\n\treturn result, remainder, err\n}\n"
  },
  {
    "path": "books/go/ch05/functionsAreValues.go",
    "content": "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tvar opMap = map[string]func(int, int) int{\n\t\t\"+\": add,\n\t\t\"-\": sub,\n\t\t\"*\": mul,\n\t\t\"/\": div,\n\t}\n\n\tfmt.Println(opMap[\"+\"](10, 20))\n}\n\nfunc add(i int, j int) int { return i + j }\nfunc sub(i int, j int) int { return i - j }\nfunc mul(i int, j int) int { return i * j }\nfunc div(i int, j int) int { return i / j }\n"
  },
  {
    "path": "books/go/ch05/returnFunction.go",
    "content": "package main\n\nimport \"fmt\"\n\nfunc makeMult(base int) func(int) int {\n\treturn func(factor int) int {\n\t\treturn base * factor\n\t}\n}\n\nfunc main() {\n\ttwoBase := makeMult(2)\n\tthreeBase := makeMult(3)\n\n\tfor i := 0; i < 3; i++ {\n\t\tfmt.Println(twoBase(i), threeBase(i))\n\t}\n}\n"
  },
  {
    "path": "books/go/ch06/pointers.go",
    "content": "package main\n\nimport \"fmt\"\n\nfunc failedUpdate(px *int) {\n\tx2 := 20\n\tpx = &x2\n}\n\nfunc update(px *int) {\n\t*px = 20\n}\n\nfunc main() {\n\ty := \"hello\"\n\tfmt.Println(y, &y, *&y)\n\n\tx := 10\n\tfailedUpdate(&x)\n\tfmt.Println(x)\n\tupdate(&x)\n\tfmt.Println(x)\n}\n"
  },
  {
    "path": "books/go/ch07/counter.go",
    "content": "package main\n\nimport (\n\t\"fmt\"\n\t\"time\"\n)\n\ntype Counter struct {\n\ttotal       int\n\tlastUpdated time.Time\n}\n\nfunc (c *Counter) Increment() {\n\tc.total++\n\tc.lastUpdated = time.Now()\n}\n\nfunc (c Counter) String() string {\n\treturn fmt.Sprintf(\"total: %d, last updated %v\", c.total, c.lastUpdated)\n}\n\nfunc updateWrong(c Counter) {\n\tc.Increment()\n\tfmt.Println(\"in updateWrong:\", c.String())\n}\n\nfunc updateRight(c *Counter) {\n\tc.Increment()\n\tfmt.Println(\"in updateRight:\", c.String())\n}\n\nfunc main() {\n\tvar c Counter\n\tfmt.Println(c.String())\n\tc.Increment()\n\tfmt.Println(c.String())\n\n\tupdateWrong(c)\n\tfmt.Println(\"in main:\", c.String())\n\tupdateRight(&c)\n\tfmt.Println(\"in main:\", c.String())\n}\n"
  },
  {
    "path": "books/go/ch07/dependencyInjection.go",
    "content": "package main\n\nimport (\n\t\"errors\"\n\t\"fmt\"\n\t\"net/http\"\n)\n\nfunc LogOutput(message string) {\n\tfmt.Println(message)\n}\n\ntype SimpleDataStore struct {\n\tuserData map[string]string\n}\n\nfunc (sds SimpleDataStore) UserNameForId(userID string) (string, bool) {\n\tname, ok := sds.userData[userID]\n\treturn name, ok\n}\n\nfunc NewSimpleDataStore() SimpleDataStore {\n\treturn SimpleDataStore{\n\t\tuserData: map[string]string{\n\t\t\t\"1\": \"Fred\",\n\t\t\t\"2\": \"Mary\",\n\t\t\t\"3\": \"Pat\",\n\t\t},\n\t}\n}\n\ntype DataStore interface {\n\tUserNameForId(userID string) (string, bool)\n}\n\ntype Logger interface {\n\tLog(message string)\n}\n\ntype LoggerAdapter func(message string)\n\nfunc (lg LoggerAdapter) Log(message string) {\n\tlg(message)\n}\n\ntype SimpleLogic struct {\n\tl  Logger\n\tds DataStore\n}\n\nfunc (sl SimpleLogic) SayHello(userID string) (string, error) {\n\tsl.l.Log(\"in SayHello for \" + userID)\n\tname, ok := sl.ds.UserNameForId(userID)\n\tif !ok {\n\t\treturn \"\", errors.New(\"unknown user\")\n\t}\n\treturn \"Hello, \" + name, nil\n}\n\nfunc (sl SimpleLogic) SayGoodbye(userID string) (string, error) {\n\tsl.l.Log(\"in SayGoodbye for \" + userID)\n\tname, ok := sl.ds.UserNameForId(userID)\n\tif !ok {\n\t\treturn \"\", errors.New(\"unknown user\")\n\t}\n\treturn \"Goodbye, \" + name, nil\n}\n\nfunc NewSimpleLogic(l Logger, ds DataStore) SimpleLogic {\n\treturn SimpleLogic{\n\t\tl:  l,\n\t\tds: ds,\n\t}\n}\n\ntype MyLogic interface {\n\tSayHello(userID string) (string, error)\n}\n\ntype Controller struct {\n\tl     Logger\n\tlogic MyLogic\n}\n\nfunc (c Controller) SayHello(w http.ResponseWriter, r *http.Request) {\n\tc.l.Log(\"In SayHello\")\n\tuserID := r.URL.Query().Get(\"user_id\")\n\tmessage, err := c.logic.SayHello(userID)\n\tif err != nil {\n\t\tw.WriteHeader(http.StatusBadRequest)\n\t\tw.Write([]byte(err.Error()))\n\t\treturn\n\t}\n\tw.Write([]byte(message))\n}\n\nfunc NewController(l Logger, logic MyLogic) Controller {\n\treturn Controller{\n\t\tl:     l,\n\t\tlogic: logic,\n\t}\n}\n\nfunc main() {\n\tl := LoggerAdapter(LogOutput)\n\tds := NewSimpleDataStore()\n\tlogic := NewSimpleLogic(l, ds)\n\tc := NewController(l, logic)\n\thttp.HandleFunc(\"/hello\", c.SayHello)\n\thttp.ListenAndServe(\":8080\", nil)\n}\n"
  },
  {
    "path": "books/go/ch07/embedding.go",
    "content": "package main\n\nimport \"fmt\"\n\ntype Employee struct {\n\tName string\n\tID   string\n}\n\nfunc (e Employee) Description() string {\n\treturn fmt.Sprintf(\"%s (%s)\", e.Name, e.ID)\n}\n\ntype Manager struct {\n\tEmployee\n\tReports []Employee\n}\n\nfunc main() {\n\tm := Manager{\n\t\tEmployee: Employee{\n\t\t\tName: \"Bob Bobson\",\n\t\t\tID:   \"12345\",\n\t\t},\n\t\tReports: []Employee{},\n\t}\n\n\tfmt.Println(m.ID)\n\tfmt.Println(m.Description())\n}\n"
  },
  {
    "path": "books/go/ch07/intTree.go",
    "content": "package main\n\nimport \"log\"\n\ntype IntTree struct {\n\tval         int\n\tleft, right *IntTree\n}\n\nfunc (it *IntTree) Insert(val int) *IntTree {\n\tif it == nil {\n\t\treturn &IntTree{val: val}\n\t}\n\n\tif val < it.val {\n\t\tit.left = it.left.Insert(val)\n\t} else if val > it.val {\n\t\tit.right = it.right.Insert(val)\n\t}\n\n\treturn it\n}\n\nfunc (it *IntTree) Contains(val int) bool {\n\tswitch {\n\tcase it == nil:\n\t\treturn false\n\tcase val < it.val:\n\t\treturn it.left.Contains(val)\n\tcase val > it.val:\n\t\treturn it.right.Contains(val)\n\tdefault:\n\t\treturn true\n\t}\n}\n\nfunc main() {\n\tvar it *IntTree\n\tit = it.Insert(5) // calling methods on a nil receiver\n\tit = it.Insert(3)\n\tit = it.Insert(10)\n\tit = it.Insert(2)\n\n\tlog.Println(it.Contains(2))\n\tlog.Println(it.Contains(12))\n}\n"
  },
  {
    "path": "books/go/ch07/interfaces.go",
    "content": "package main\n\nimport \"fmt\"\n\ntype LogicProvider struct{}\n\nfunc (lp LogicProvider) Process(data string) string {\n\treturn data\n}\n\ntype Logic interface {\n\tProcess(data string) string\n}\n\ntype Client struct {\n\tL Logic\n}\n\nfunc (c Client) Program() {\n\tdata := \"whatever\"\n\tc.L.Process(data)\n}\n\nfunc main() {\n\tc := Client{L: LogicProvider{}}\n\tc.Program()\n\n\tvar i interface{}\n\ti = 1\n\ti = \"a\"\n\tfmt.Println(i)\n}\n"
  },
  {
    "path": "books/go/ch07/iota.go",
    "content": "package main\n\ntype MailCategory int\n\nconst (\n\tUncategorized MailCategory = iota\n\tPersonal\n\tSpam\n\tSocial\n\tAds\n)\n"
  },
  {
    "path": "books/go/ch07/types.go",
    "content": "package main\n\nimport \"fmt\"\n\ntype Person struct {\n\tFirstName string\n\tLastName  string\n\tAge       int\n}\n\ntype King Person // this is not an inheritance\n\nfunc (p Person) String() string {\n\treturn fmt.Sprintf(\"%s %s, age %d\", p.FirstName, p.LastName, p.Age)\n}\n\ntype Score int\ntype Converter func(string) Score\ntype TeamScore map[string]Score\n\nfunc main() {\n\tp := Person{\n\t\tFirstName: \"Fred\",\n\t\tLastName:  \"Fredson\",\n\t\tAge:       52,\n\t}\n\tfmt.Println(p.String())\n}\n"
  },
  {
    "path": "books/go/ch08/customErrors.go",
    "content": "package main\n\ntype Status int\n\nconst (\n\tInvalidLogin Status = iota + 1\n\tNotFound\n)\n\ntype StatusErr struct {\n\tStatus  Status\n\tMessage string\n\terr     error\n}\n\nfunc (se StatusErr) Error() string {\n\treturn se.Message\n}\n\nfunc (se StatusErr) Unwrap() error {\n\treturn se.err\n}\n"
  },
  {
    "path": "books/go/ch08/errors.go",
    "content": "package main\n\nimport (\n\t\"errors\"\n\t\"fmt\"\n\t\"os\"\n)\n\nfunc calcRemainderAndMod(numerator, denominator int) (int, int, error) {\n\tif denominator == 0 {\n\t\treturn 0, 0, errors.New(\"denominator is 0\")\n\t}\n\treturn numerator / denominator, numerator % denominator, nil\n}\n\nfunc main() {\n\tnumerator := 20\n\tdenominator := 3\n\tremainder, mod, err := calcRemainderAndMod(numerator, denominator)\n\tif err != nil {\n\t\tfmt.Println(err)\n\t\tos.Exit(1)\n\t}\n\tfmt.Println(remainder, mod)\n}\n"
  },
  {
    "path": "books/go/ch08/panic.go",
    "content": "package main\n\nfunc doPanic(msg string) {\n\tpanic(msg)\n}\n\nfunc main() {\n\tdoPanic(\"ERR\")\n}\n"
  },
  {
    "path": "books/go/ch08/recover.go",
    "content": "package main\n\nimport \"fmt\"\n\nfunc div60(i int) {\n\tdefer func() {\n\t\tif v := recover(); v != nil {\n\t\t\tfmt.Println(v)\n\t\t}\n\t}()\n\tfmt.Println(60 / i)\n}\n\nfunc main() {\n\tfor _, val := range []int{1, 2, 0, 6} {\n\t\tdiv60(val)\n\t}\n}\n"
  },
  {
    "path": "books/go/ch08/sentinel.go",
    "content": "package main\n\nimport (\n\t\"archive/zip\"\n\t\"bytes\"\n\t\"fmt\"\n)\n\ntype Sentinel string\n\nfunc (s Sentinel) Error() string {\n\treturn string(s)\n}\n\nconst (\n\tErrFoo = Sentinel(\"foo err\")\n\tErrBar = Sentinel(\"bar err\")\n)\n\nfunc main() {\n\tdata := []byte(\"This is not a zip file\")\n\tnotZipFile := bytes.NewReader(data)\n\t_, err := zip.NewReader(notZipFile, int64(len(data)))\n\tif err == zip.ErrFormat {\n\t\tfmt.Println(\"Told you so\")\n\t}\n}\n"
  },
  {
    "path": "books/go/ch08/wrappingErrors.go",
    "content": "package main\n\nimport (\n\t\"errors\"\n\t\"fmt\"\n\t\"os\"\n)\n\nfunc fileChecker(name string) error {\n\tf, err := os.Open(name)\n\tif err != nil {\n\t\treturn fmt.Errorf(\"in fileChecker: %w\", err) // %w wraps the error\n\t\t//return fmt.Errorf(\"in fileChecker: %v\", err) // %v does not wrap the error\n\t}\n\tf.Close()\n\treturn nil\n}\n\nfunc main() {\n\terr := fileChecker(\"not_here.txt\")\n\tif err != nil {\n\t\tfmt.Println(err)\n\t\tif wrappedErr := errors.Unwrap(err); wrappedErr != nil {\n\t\t\tfmt.Println(wrappedErr)\n\t\t}\n\t}\n}\n"
  },
  {
    "path": "books/go/ch09/formatter/formatter.go",
    "content": "package print\n\nimport \"fmt\"\n\nfunc Format(num int) string {\n\treturn fmt.Sprintf(\"The number is %d\", num)\n}\n"
  },
  {
    "path": "books/go/ch09/main.go",
    "content": "package main\n\nimport (\n\t\"./formatter\"\n\t\"./math\"\n\t\"fmt\"\n)\n\nfunc main() {\n\tnum := math.Double(2)\n\toutput := print.Format(num)\n\tfmt.Println(output)\n}\n"
  },
  {
    "path": "books/go/ch09/math/math.go",
    "content": "package math\n\nfunc Double(a int) int {\n\treturn a * 2\n}\n"
  },
  {
    "path": "books/go/ch10/deadlock.go",
    "content": "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tch1 := make(chan int)\n\tch2 := make(chan int)\n\n\tgo func() {\n\t\tv := 1\n\t\tch1 <- v\n\t\tv2 := <-ch2\n\t\tfmt.Println(v2)\n\t}()\n\n\tv := 2\n\tch2 <- v\n\tv2 := <-ch1\n\tfmt.Println(v, v2)\n}\n"
  },
  {
    "path": "books/go/ch10/deadlockSolution.go",
    "content": "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tch1 := make(chan int)\n\tch2 := make(chan int)\n\n\tgo func() {\n\t\tv := 1\n\t\tch1 <- v\n\t\tv2 := <-ch2\n\t\tfmt.Println(v2)\n\t}()\n\n\tv := 2\n\tvar v2 int\n\n\tselect {\n\tcase ch2 <- v:\n\tcase v2 = <-ch1:\n\t}\n\n\tfmt.Println(v, v2)\n}\n"
  },
  {
    "path": "books/go/ch10/goroutinesExample.go",
    "content": "package main\n\nfunc process(val int) int {\n\treturn val * 2\n}\n\nfunc runThingConcurrently(in <-chan int, out chan<- int) {\n\tgo func() {\n\t\tfor val := range in {\n\t\t\tresult := process(val)\n\t\t\tout <- result\n\t\t}\n\t}()\n}\n"
  },
  {
    "path": "books/go/notes.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Learning Go: An Idiomatic Approach to Real-World Go Programming\n\nBook by Jon Bodner\n\nCode here: [click](.)\n\n- [Chapter 1: Setting Up Your Go Environment](#chapter-1-setting-up-your-go-environment)\n- [Chapter 2: Primitive Types and Declarations](#chapter-2-primitive-types-and-declarations)\n- [Chapter 3: Composite Types](#chapter-3-composite-types)\n- [Chapter 4: Blocks, Shadows, and Control Structures](#chapter-4-blocks-shadows-and-control-structures)\n- [Chapter 5: Functions](#chapter-5-functions)\n- [Chapter 6: Pointers](#chapter-6-pointers)\n- [Chapter 7: Types, Methods, and Interfaces](#chapter-7-types-methods-and-interfaces)\n- [Chapter 8: Errors](#chapter-8-errors)\n- [Chapter 9: Modules, Packages, and Imports](#chapter-9-modules-packages-and-imports)\n- [Chapter 10: Concurrency in Go](#chapter-10-concurrency-in-go)\n- [Chapter 11: The Standard Library](#chapter-11-the-standard-library)\n- [Chapter 12: The Context](#chapter-12-the-context)\n- [Chapter 13: Writing Tests](#chapter-13-writing-tests)\n- [Chapter 14: Here There Be Dragons: Reflect, Unsafe, and Cgo](#chapter-14-here-there-be-dragons-reflect-unsafe-and-cgo)\n- [Chapter 15: A Look at the Future: Generics in Go](#chapter-15-a-look-at-the-future-generics-in-go)\n\n## Chapter 1: Setting Up Your Go Environment\n\nGo is intended for building programs that last, programs that are modified by dozens of developers over dozens of years.\nUsing Go correctly requires an understanding of how its features are intended to fit together. You can write code that\nlooks like Java or Python, but you are going to be unhappy with the result.\n\n> $ brew install go\n\nValidate that your env is set up correctly: `go version`\n\nThere have been several changes in how Go developers organize their code and their dependencies. For modern Go\ndevelopment, the rules is simple: **you are free to organize your projects as you see fit**. However, Go still expects\nthere to be a single workspace (default `$HOME/go`) for third-party Go tools installed via `go install`. You can use\nthis default or set `$GOPATH` env variable.\n\nAdd following lines to `.zshrc`:\n\n```\nexport GOPATH=$HOME/go\nexport PATH=$PATH:$GOPATH/bin\n```\n\nUse `go run` when you want to treat a Go program like a script and run the source code immediately. `go run` builds the\nbinary in a temporary directory, and the deletes the binary after your program finishes. Useful for testing out small\nprograms or using Go like a scripting language.\n\nUse `go build` to create a binary that is distributed for other people to use. Most of the time, this is what you want\nto do. Use the `-o` flag to give the binary a different name or location.\n\nGo programs can be also built from source and installed into your Go work-space via `go install link@version`. Go\ndevelopers don't rely on a centrally hosted service (Maven, PyPI, NPM, ...). Instead they share projects via their\nsource code repositories. If you already installed a tool and want to update it to a newer version, rerun `go install`\nwith the newer version specified after `@`.\n\nDevelopers have historically wasted extraordinary amounts of time on format wars. Go defines a standard way of\nformatting code, Go developers avoid arguments over code styling. 
Go developers expect code to look a certain way and follow certain rules, and if your code does not, it sticks out.\n\n`go fmt` automatically reformats code to match the standard format.\n\nGo requires a semicolon at the end of every statement. However, Go developers never put the semicolons in themselves;\nthe Go compiler does it for them.\n\n`go vet` detects things like: passing the wrong number of parameters to formatting methods or assigning values to\nvariables that are never used.\n\nMake `golint` and `go vet` part of your development process to avoid common bugs and non-idiomatic code.\n\nAn IDE is nice to use, but it is hard to automate. Modern software development relies on repeatable, automatable builds\nthat can be run by anyone, anywhere, at any time. Go developers have adopted `make` as their solution.\n\nYou can use different Go versions:\n\n```\ngo get golang.org/dl/go1.15.6\ngo1.15.6 download\ngo1.15.6 build\n```\n\nIn order to update the Go version globally on your computer use regular `brew` commands.\n\n## Chapter 2: Primitive Types and Declarations\n\nWhen trying to figure out what \"best\" means, there is one overriding principle: write your programs in a way that makes\nyour intention clear.\n\nLITERAL - in Go refers to writing out a number, character, or string.\n\n- integer literals\n    - sequences of numbers, normally base 10, but different prefixes are used to indicate other bases (`0b` binary, `0o`\n      octal, `0x` hexadecimal).\n    - put underscores in the middle of your literal, use them to improve readability, e.g. `120_000_000`\n- floating point literals - they can also have an exponent specified with the letter `e` and a positive or negative\n  number, e.g. `6.03e23`\n- rune literals - characters surrounded by single quotes, in Go `\"` and `'` are _not_ interchangeable.\n- string literals - two different ways to create:\n    - interpreted string literal (\") zero or more rune literals\n    - raw string literal (`) can contain any literal character except a backquote\n    - strings in Go are immutable\n\nLiterals in Go are untyped - they can interact with any variable that is compatible with the literal.\n\nBOOLEAN - `true` or `false`, variable definition defaults to `false`. Go doesn't allow truthiness - e.g. a positive\ninteger cannot be treated as `true`.\n\nINTEGER TYPES - 12 different types, more than other languages. 3 rules to follow:\n\n1. If you are working with a binary format or network protocol that has an integer of a specific size or sign, use the\n   corresponding integer type.\n2. If you are writing a library function that should work with any integer type, write a pair of functions, one for\n   `int64`, and the other for `uint64`. You can see this pattern in the std library (ParseInt/ParseUint, ...)\n3. In all other cases, just use `int`.\n\nFLOATING POINT - `float64` is the default type, the simplest option is to use this type. Don't worry about memory usage,\nunless you have used the profiler to determine it is a significant source of problems.\n\nA floating point number cannot represent a decimal value exactly. Do not use floats to represent money or any other\nvalue that must have an exact decimal representation.\n\nGo stores floats using the IEEE 754 standard: one bit for the sign, 11 bits for the exponent, 52 bits to represent the\nmantissa.\n\nGo doesn't allow automatic type promotion, as a language that values clarity of intent and readability. It turns out\nthat the rules to properly convert one type to another can get complicated and produce unexpected results. You must use\ntype conversion.\n\n
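A quick sketch of what explicit conversion looks like (mirroring the idea above; the variable names are arbitrary):\n\n```go\npackage main\n\nimport \"fmt\"\n\nfunc main() {\n    var x int = 10\n    var y float64 = 30.2\n\n    // sum := x + y // does not compile: mismatched types int and float64\n\n    sum := float64(x) + y // convert explicitly instead\n    fmt.Println(sum)\n\n    var b byte = 100\n    fmt.Println(x + int(b)) // the byte must be converted to int before adding\n}\n```\n\n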
Variable declaration. Go has multiple ways of declaring a variable, because each declaration style communicates\nsomething about how the variable is used.\n\n- `var x int = 10`\n- `var x = 10`\n- `var x int` - will default to 0\n- `var z, y int = 10, 20`\n- `var x, y = 10, \"hello\"`\n- `var(...)` - declaration list\n- `x := 10`\n\nThe most common declaration style within functions is `:=`. Outside a function, use declaration lists. Sometimes you\nneed to avoid `:=`:\n\n1. When initializing a variable to its zero value, use `var x int`. This makes it clear that the zero is intended.\n2. Because `:=` allows assigning to both new and existing variables, it can be unclear which variables are new and which\n   are existing. Declare all new variables with `var`, and then use the assignment operator (`=`) for both new and\n   existing variables.\n3. When you need to convert a type during assignment, use `var x byte = 20`, not `x := byte(20)`.\n\nGo allows Unicode characters and letters in the variable name. However, don't use this feature.\n\nNaming:\n\n- use `camelCase`, even for constant vars\n- use single letters for e.g. loops: `k`, `v` are common names for `key`, `value`; `i` for `integer`, ...\n- do not put the type in the variable name\n- use short names, they remove repetitive typing and force you to write smaller blocks of code (if you need a complete\n  name to keep track of it, it is likely that your block of code does too much)\n\n## Chapter 3: Composite Types\n\nARRAYS - rarely used in Go. All the elements in the array must be of the type that is specified.\n\n```go\nvar x [3]int\nvar x = [3]int{10, 20, 30}\nvar x = [12]int{1, 5: 4}  // Sparse array (most elements are set to the zero value)\nvar x = [...]int{12, 20, 30}\n```\n\nArrays are rarely used in Go because they come with unusual limitations:\n\n- the _size_ of the array is part of the _type_, `[3]int` has a different type than `[4]int`, and you can't use a\n  variable to specify the size of an array\n- you can't use a type conversion to convert arrays of different sizes to identical types\n\nDon't use arrays unless you know the exact length you need ahead of time. Arrays in Go exist to provide backing stores\nfor SLICES.\n\nSLICES - slices remove the limitations of arrays. We can write a single function that processes slices of any size. We\ncan also grow slices as needed.\n\nSlice definition:\n\n```go\nvar x = []int{12, 20, 30}\n```\n\nUsing `[...]` makes an array. Using `[]` makes a slice.\n\n`nil` in Go has no type, it can be assigned or compared against values of different types.\n\nBuilt-in functions:\n\n- `len` - `len(x)`, works on a `nil` slice (returns 0)\n- `append` - `x = append(x, 10, 20, 30)`, `x = append(x, y...)` (`...` used to expand the source slice)\n- `cap` - `cap(v)` - returns the current capacity of a slice\n- `make` - `x := make([]int, 5)` - it allows us to specify the type, length, and optionally, the capacity\n- `copy` - `numberOfElementsCopied := copy(destination, source)` - if you need to create a copy that is independent of\n  the original\n\nGo is _Call by value_ - every time you pass a parameter to a function, Go makes a copy of the value that is passed in.\n\nWhen a slice grows via `append`, Go increases the slice by more than one when it runs out of capacity: it doubles the\nsize when the capacity is less than 1024 and then grows by at least 25% afterward.\n\nUsing `make` and `append` is the preferred way of declaring slices.\n\nSlicing: `[startingOffset:endingOffset]`. In Go when you take a slice from a slice, you are not making a copy of the\ndata; instead, you have two variables sharing the same underlying memory. Avoid modifying slices after they have been\nsliced or if they were produced by slicing. Use the full slice expression to prevent `append` from sharing capacity\nbetween slices (`x[:2:2]`, `x[2:4:4]`). The last position indicates the last position in the parent slice's capacity\nthat is available for the subslice. Subtract the starting offset from this number to get the subslice's capacity.\n\n
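A small demo of the sharing problem and the full slice expression fix:\n\n```go\npackage main\n\nimport \"fmt\"\n\nfunc main() {\n    x := []int{1, 2, 3, 4}\n    y := x[:2]\n    y = append(y, 30) // y shares x's backing array, so this overwrites x[2]\n    fmt.Println(x)    // [1 2 30 4]\n\n    x = []int{1, 2, 3, 4}\n    z := x[:2:2]      // full slice expression: z's capacity is capped at 2\n    z = append(z, 30) // append must copy to a new backing array\n    fmt.Println(x)    // [1 2 3 4]\n    fmt.Println(z)    // [1 2 30]\n}\n```\n\n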
An array can be converted to a slice by using a slicing expression.\n\nGo allows us to use slicing notation to make substrings. Be very careful when doing so. Strings are immutable, so they\ndon't have the modification problem, BUT a string is composed of _bytes_, and a code point in UTF-8 can be anywhere from\none to four bytes long. When dealing with languages other than English or with emojis, you run into code points that are\nmultiple bytes long.\n\nUTF-8 is very clever, in the worst case it uses 4 bytes, in the best case only one. The only downside is that you cannot\nrandomly access a string encoded with UTF-8.\n\nMAPS - dictionary/hash map. Declaration: `map[keyType]valueType`.\n\n- maps automatically grow as you add key-value pairs\n- if you know how many key-value pairs you plan to insert into a map, you can use `make` to create a map with a specific\n  initial size\n- passing a _map_ to the _len_ function tells you the number of key-value pairs in a _map_\n- the zero value for a map is nil\n- maps are not comparable\n\nGo doesn't allow you to define your own hash algorithm.\n\nComma ok idiom - `v, ok := m[key]` - if `ok` is true, the key is present; if `ok` is false, the key is not present.\n\n- `delete` - `delete(m, key)` (removes a key-value pair from the map)\n\nGo does not include sets, but you can use a map to simulate some of its features. Set simulation:\n\n```go\nset := map[int]bool{}\n```\n\nIf you need sets that provide operations like union, intersection, and subtraction - write one yourself or use a\n3rd-party library.\n\nSTRUCT - when you have related data that you want to group together.\n\n```\ntype person struct {\n    name string\n    age  int\n    pet  string\n}\njulia := person{\n    \"Julia\",\n    30,\n    \"cat\",\n}\nbeth := person{\n    name: \"Beth\",\n}\n```\n\nAnonymous struct - without giving it a name first:\n\n```\nvar person struct {\n    name string\n    age  int\n    pet  string\n}\nperson.name = \"Bob\"\n```\n\nWhether a struct is comparable depends on its fields. Structs that are entirely composed of comparable types are\ncomparable, those with slice or map fields are not. Unlike Python, there are no methods that can be overridden to\nredefine equality.\n\nGo allows you to perform a type conversion from one struct to another _if the fields of both structs have the same\nnames, order, and types_.\n\n## Chapter 4: Blocks, Shadows, and Control Structures\n\nBLOCKS - Go lets you declare variables in lots of places. You can declare them outside of functions, as the parameters\nto functions, and as local variables within functions.\n\nEach place where a declaration occurs is called a _block_. Variables, constants, types and functions declared outside\nany functions are placed in the package block.\n\n`:=` reuses variables that are declared in the current block. When using `:=` make sure that you don't have any\nvariables from an outer scope on the left-hand side, unless you intend to shadow them.\n\nSometimes it is better to avoid `:=` because it may make it unclear what variables are being used.\n\nThere is a `shadow` linter - a tool to detect shadowing.\n\n
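A short example of how `:=` shadows an outer variable:\n\n```go\npackage main\n\nimport \"fmt\"\n\nfunc main() {\n    x := 10\n    if x > 5 {\n        x := 5         // := declares a new x that shadows the outer one\n        fmt.Println(x) // 5\n    }\n    fmt.Println(x) // 10 - the outer x was never touched\n}\n```\n\n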
## Chapter 4: Blocks, Shadows, and Control Structures\n\nBLOCKS - Go lets you declare variables in lots of places. You can declare them outside of functions, as the parameters to functions, and as local variables within functions.\n\nEach place where a declaration occurs is called a _block_. Variables, constants, types, and functions declared outside of any function are placed in the package block.\n\n`:=` reuses variables that are declared in the current block. When using `:=`, make sure that you don't have any variables from an outer scope on the left-hand side, unless you intend to shadow them.\n\nSometimes avoid using `:=`, because it may make it unclear what variables are being used.\n\nThere is a `shadow` linter - a tool to detect shadowing.\n\nThe Universe Block - the block that contains all other blocks. Never redefine any of the identifiers in the universe block (`true`, `false`, `string`, `int`, ...). If you accidentally do so, you will get some very strange behavior.\n\nIF - Go doesn't require you to put parentheses around the condition. You can declare variables that are scoped to the condition and to both the `if` and `else` blocks.\n\n```go\nif n := rand.Intn(10); n == 5 {\n    fmt.Println(\"hit\")\n}\n```\n\nHaving this special scope is very handy; it lets you create variables that are available only where they are needed. Once the series of `if/else` statements ends, `n` is undefined.\n\nFOR - Go has 4 formats of `for`.\n\n- C-style `for`\n- condition-only `for`\n- infinite `for`\n- `for-range`\n\nWhen iterating over a `map`, the key order varies between runs (although some runs may be identical). This is a security feature. In older Go versions, the iteration order was usually the same. People used to write code that assumed the order was fixed, and this would break at weird times. Randomized reads prevent a _Hash DoS_ attack.\n\nWhen iterating over a string with a `for-range` loop, it iterates over the runes, not the bytes. Whenever a `for-range` loop encounters a multibyte rune in a string, it converts the UTF-8 representation into a single 32-bit number (a rune) and assigns it to the value.\n\nEvery time the `for-range` loop iterates over your compound type, it copies the value from the compound type to the value variable.\n\nSWITCH - like an `if` statement, you can declare a variable that is scoped to all the branches of the switch statement.\n\nIf you have a `switch` statement inside a `for` loop, and you want to break out of the `for` loop, put a label on the `for` statement, and then do `break label` (see the sketch at the end of this chapter). If you don't use a label, Go assumes that you want to break out of the case.\n\nYou can create a \"blank switch\" - this allows you to use any boolean comparison for each case. There isn't a lot of difference between a series of `if/else` statements and a blank `switch`. Favor blank `switch` statements over `if/else` chains when you have multiple related cases. Using a `switch` makes the comparisons more visible and reinforces that they are a related set of concerns.\n\nGOTO - Traditionally `goto` was dangerous because it could jump to nearly anywhere in a program (jump into/out of a loop, skip variable definitions, or into the middle of a set of statements in an `if`). This made it difficult to understand what a goto-using program did.\n\nGo has a `goto` statement (most modern languages don't). You should still do what you can to avoid using it. Go forbids jumps that skip over variable declarations and jumps that go into an inner or parallel block.\n\n
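A minimal sketch of the labeled `break` (my own example):\n\n```go\npackage main\n\nimport \"fmt\"\n\nfunc main() {\nloop:\n    for i := 0; i < 10; i++ {\n        switch {\n        case i == 3:\n            break loop // without the label, this would only break out of the switch case\n        default:\n            fmt.Println(i)\n        }\n    }\n}\n```\n\n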
## Chapter 5: Functions\n\n`main` - the starting point for every Go program.\n\nGo is a typed language, so you must specify the types of parameters. If a function returns a value, you must supply a return.\n\nGo doesn't have named and optional input parameters. If you want to emulate named and optional parameters, define a struct that has fields that match the desired parameters, and pass the struct to your function.\n\nNot having named and optional parameters isn't a limitation. A function shouldn't have more than a few parameters, and named and optional parameters are mostly useful when a function has many inputs. If you find yourself in that situation, your function is quite possibly too complicated.\n\nVariadic input - `func addTo(base int, vals ...int)` - must be the last parameter in the input parameter list.\n\nGo allows for multiple return values - `func divAndRemainder(numerator int, denominator int) (int, int, error)`. You can pre-declare variables that you use within the function to hold the return values: `func divAndRemainder(numerator int, denominator int) (result int, remainder int, err error)`. The name used for a named return value is local to the function - it doesn't enforce any name outside the function.\n\nIf you use named return values, you can use an empty/blank/naked return - never use it. This returns the last values assigned to the named return values. It can be really confusing to figure out what value is actually returned.\n\nUse `_` whenever you don't need to read a value that is returned by a function.\n\nJust like in many other languages, functions in Go are values. Any function that has the exact same number and types of parameters and return values meets the type signature.\n\nAnonymous functions - they don't have a name. You don't have to assign them to a variable. You can write them inline and call them immediately.\n\nFunctions declared inside functions are called _closures_. This is a computer science word that means that functions declared inside of functions are able to access and modify variables declared in the outer function.\n\nNot only can you use a closure to pass some function state to another function, you can also return a closure from a function.\n\n`defer` - used to release resources. Programs often create temporary resources, like files or network connections, that need to be cleaned up. You can `defer` multiple closures in a Go function. They run in last-in-first-out order - the last defer registered runs first (see the sketch at the end of this chapter).\n\nIn Go, _defer_ statements delay the execution of a function, method, or anonymous function until the surrounding function returns. In other words, the arguments of a deferred call are evaluated immediately, but the call itself doesn't execute until the surrounding function returns.\n\nA common pattern in Go is for a function that allocates a resource to also return a closure that cleans up the resource.\n\nEmpirical Software Engineering:\n> Of... eleven proposed characteristics, only two markedly influence complexity growth: the nesting depth and the lack\n> of structure.\n\nGo is _Call By Value_ - it means that when you supply a variable for a parameter to a function, Go always makes a copy of the value of the variable. Every type in Go is a value type. It is just that sometimes the value is a pointer (map, slice).\n\n
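A small sketch of `defer` for cleanup and its LIFO ordering (the file name is illustrative):\n\n```go\npackage main\n\nimport (\n    \"fmt\"\n    \"io\"\n    \"os\"\n)\n\nfunc printFile(name string) error {\n    f, err := os.Open(name)\n    if err != nil {\n        return err\n    }\n    defer fmt.Println(\"logged second\") // deferred first, runs last\n    defer f.Close()                    // deferred last, runs first\n    _, err = io.Copy(os.Stdout, f)\n    return err\n}\n\nfunc main() {\n    if err := printFile(\"example.txt\"); err != nil {\n        fmt.Println(err)\n    }\n}\n```\n\n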
## Chapter 6: Pointers\n\nA pointer - a variable that holds the location in memory where a value is stored. Every variable is stored in one or more contiguous memory locations - _addresses_.\n\n- `&` - the _address_ operator, returns the address of the memory location where the value is stored.\n- `*` - the _indirection_ operator, returns the pointed-to value.\n\nExample pointer **type**: `*int`\n\nBefore de-referencing a pointer, you must make sure that the pointer is non-nil. Your program will panic if you attempt to de-reference a _nil_ pointer.\n\nJava, Python, JavaScript, and Ruby are pass-by-value (values passed to functions are copies) - just like Go. Every instance of a class in these languages is implemented as a pointer. When a class instance is passed to a function or method, the value being copied is the pointer to the instance.\n\n> Immutable types are safer from bugs, easier to understand, and more ready for change. Mutability makes it harder to\n> understand what your program is doing, and much harder to enforce contracts.\n\nThe lack of immutable declarations in Go might seem problematic, but the ability to choose between value and pointer parameter types addresses the issue.\n\nBe careful when using pointers in Go. They make it hard to understand data flow and can create extra work for the garbage collector. Rather than populating a struct by passing a pointer to it into a function, have the function instantiate the struct. The only time you should use pointer parameters to modify a variable is when the function expects an interface. You see this pattern when working with JSON.\n\nThe time to pass a pointer into a function is ~1 ns. Passing a value into a function takes longer as the data gets larger, around 1 ms for ~10 MB of data. So if the data is large enough, there are performance benefits from using a pointer. On the other hand, it does not pay off to use a pointer for small data (< 1 MB), e.g. for 100 bytes of data: 30 ns (pointer) vs 10 ns (copied value).\n\nPointers indicate mutability - be careful when using this pattern.\n\nAvoid using maps for input or return values (a map is implemented as a pointer to a struct). Rather than passing a map around, use a struct. Passing a slice to a function has even more complicated behavior: any modification to the contents is reflected, but use of _append_ is not reflected. As the only linear data structure, slices are often passed around in Go programs - by default you should assume that a slice is not modified by a function.\n\nGarbage - data that has no more pointers pointing to it. Once there are no more pointers pointing to some data, the memory can be reused. If the memory isn't recovered, the program's memory usage would continue to grow until the computer runs out of RAM. The job of a garbage collector is to automatically detect unused memory and recover it.\n\nThe Stack - a consecutive block of memory; allocation is fast and simple; local variables, along with parameters passed into a function, are stored on the stack. You have to know exactly how big the data is at compile time. When the compiler determines that the data can't be stored on the stack, the data the pointer points to _escapes_ the stack and the compiler stores the data on the heap.\n\nThe Heap - memory managed by the garbage collector. Go's garbage collector favours lower latency (< 500ms, finish as quickly as possible) over throughput (find the most garbage possible in a single scan). If your program creates a lot of garbage, the garbage collector will not find all the garbage during a cycle, slowing down the collector and increasing memory usage.\n\nGo encourages you to use pointers sparingly. We reduce the workload of the garbage collector by making sure that as much as possible is stored on the stack.\n\n
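A tiny sketch of value vs pointer parameters (function names are my own):\n\n```go\npackage main\n\nimport \"fmt\"\n\nfunc failToUpdate(x int) {\n    x = 10 // modifies the copy only\n}\n\nfunc update(x *int) {\n    *x = 10 // modifies the pointed-to value\n}\n\nfunc main() {\n    v := 1\n    failToUpdate(v)\n    fmt.Println(v) // 1\n    update(&v)\n    fmt.Println(v) // 10\n}\n```\n\n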
## Chapter 7: Types, Methods, and Interfaces\n\nGo is designed to encourage the best practices that are advocated by software engineers, avoiding inheritance while encouraging composition.\n\nMethods: `func (p Person) String() string`, `(p Person)` is like `self` or `this`, however it is non-idiomatic to use `self` or `this`. This is called a _receiver_ and usually should have a short name. Methods cannot be overloaded. You can't add methods to types you don't control.\n\n- If a method modifies the receiver, you _must_ use a pointer receiver\n- If a method needs to handle _nil_ instances, you _must_ use a pointer receiver\n- If a method doesn't modify the receiver, you _can_ use a value receiver\n\nWhen a type has any pointer receiver methods, a common practice is to be consistent and use pointer receivers for all methods, even the ones that don't modify the receiver.\n\nDo not write getters/setters. Go encourages you to directly access a field. Reserve methods for business logic.\n\nDefining a user-defined type based on another type makes code clearer by providing a name for a concept and describing the kind of data that is expected (e.g. type `Percentage` vs `int`).\n\nGo doesn't have enumerations; instead it has `iota` - which allows you to assign an increasing value to a set of constants. `iota` makes sense when you care about being able to differentiate between a set of values, and don't particularly care what the value is behind the scenes. If the actual value matters, specify it explicitly.\n\nEmbedding - promotes methods on the embedded type to the containing struct. Embedding support is rare in programming languages. Do not confuse embedding with inheritance; they are not the same. If the containing struct has fields/methods with the same name, you need to use the embedded field's type to refer to the obscured fields/methods.\n\nThe real star of Go's design - implicit interfaces. An `interface` literal lists all the methods that must be implemented by a concrete type to meet the interface. Interfaces are usually named with `er` endings (`io.Reader`, `io.Closer`, `json.Marshaler`, `http.Handler`).\n\nGo blends duck typing and Java's interfaces. Implicit interfaces give the flexibility of changing the implementation and make it easier to understand what the code is doing.\n\n> Interfaces specify what callers need. The client code defines the interface to specify what functionality it requires.\n\n
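A minimal sketch of an implicit interface (`Speaker`/`Dog` are my own illustration, not from the book):\n\n```go\npackage main\n\nimport \"fmt\"\n\ntype Speaker interface {\n    Speak() string\n}\n\ntype Dog struct{}\n\n// Dog satisfies Speaker implicitly - no \"implements\" declaration is needed.\nfunc (d Dog) Speak() string {\n    return \"woof\"\n}\n\nfunc greet(s Speaker) { // accept an interface...\n    fmt.Println(s.Speak())\n}\n\nfunc main() {\n    greet(Dog{}) // ...pass a concrete type\n}\n```\n\n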
**Accept interfaces, return structs.** The business logic invoked by your functions should be invoked via interfaces, but the output of your functions should be a concrete type. Go encourages small interfaces.\n\nSometimes you need to say that a variable could store any value; Go uses `interface{}` to represent this. It matches every type in Go. However, avoid this. Go was designed as a strongly typed language, and attempts to work around this are unidiomatic.\n\nDependency injection - code should explicitly specify the functionality it needs to perform its task. Implicit interfaces make dependency injection an excellent way to decouple your code.\n\n> \"Dependency Injection\" is a 25-dollar term for a 5-cent concept. [...] Dependency injection means giving an object its\n> instance variables. [...]\n\n> Dependency injection is basically providing the objects that an object needs (its dependencies) instead of having it\n> construct them itself. It's a very useful technique for testing, since it allows dependencies to be mocked or stubbed\n> out.\n\nUse `Wire` if you think writing dependency injection code by hand is too much work.\n\nGo is not object-oriented, nor functional, nor procedural. It is practical. It borrows concepts from many places with the overriding goal of creating a language that is simple, readable, and maintainable by large teams for many years.\n\n## Chapter 8: Errors\n\nGo handles errors by returning a value of type `error` as the last return value of a function (a convention). The Go compiler requires that all variables must be read. Making errors return values forces developers to either check and handle error conditions or make it explicit that they are ignoring errors by using an underscore (`_`) for the returned error value.\n\n`errors.New(\"denominator is 0\")` - error messages should not be capitalized, nor should they end with punctuation or a newline. A second option is to create an error using `fmt.Errorf(\"denominator is 0\")`.\n\n_sentinel errors_ - a pattern: errors meant to signal that processing cannot continue due to a problem with the current state. By convention, their names start with `Err`. Be sure you need a sentinel error before you define one. It is part of your public API, and you have committed to it being available in all future backward-compatible releases.\n\n`error` is an interface; you can define your own errors that include additional information for logging or error handling. Even when you define your own custom error types, always use `error` as the return type for the error result. Be sure you don't return an uninitialized instance (`var genErr StatusErr`); instead, explicitly return `nil`.\n\n_Wrapping the error_ - when you preserve an error while adding additional information. When you have a series of wrapped errors, it is called an _error chain_. You don't usually call `errors.Unwrap` directly. Instead, you use `errors.Is` and `errors.As` to find a specific wrapped error. If you want to wrap an error with your custom error type, your error type needs to implement the `Unwrap` method.\n\n- `errors.Is` - to check if the returned error or any error that it wraps matches a specific sentinel error instance\n- `errors.As` - allows you to check if a returned error (or any error it wraps) matches a specific type\n\nIf there are situations in your programs that are unrecoverable, you can create your own panics. Go provides a way to capture a panic to provide a more graceful shutdown or to prevent a shutdown at all. Reserve `panic` for fatal situations; use `recover` as a way to gracefully handle these situations. If a program panics, be careful about trying to continue executing after the panic.\n\n
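A short sketch of wrapping with `%w` and checking with `errors.Is` (the `ErrNotFound` sentinel is my own example):\n\n```go\npackage main\n\nimport (\n    \"errors\"\n    \"fmt\"\n)\n\nvar ErrNotFound = errors.New(\"not found\")\n\nfunc lookup(key string) error {\n    return fmt.Errorf(\"lookup %s: %w\", key, ErrNotFound) // %w wraps the sentinel into an error chain\n}\n\nfunc main() {\n    err := lookup(\"user-42\")\n    fmt.Println(errors.Is(err, ErrNotFound)) // true: errors.Is walks the error chain\n}\n```\n\n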
## Chapter 9: Modules, Packages, and Imports\n\nA module is the root of a Go library or application, stored in a repository. Modules consist of one or more packages, which give the module organization and structure.\n\nA collection of Go source code becomes a module when there is a valid `go.mod` file in its root directory\n-- `go mod init MODULE_PATH`. `MODULE_PATH` - a globally unique name that identifies your module (e.g. a GitHub link).\n\nGo uses capitalization to determine if a package-level identifier is visible outside the package where it is declared. Anything you export is part of your package's API. Be sure you want to expose certain things to clients. Document all exported identifiers and keep them backward-compatible.\n\nAs a general rule, make the name of the package match the name of the directory that contains the package. Package names should be descriptive. Don't repeat the name in a function and package (`extract.Names` > `extract.ExtractNames`).\n\nIf your code is small -- keep it in a single package. Introduce packages as the codebase grows.\n\nIn case of conflicting names, you can alias an import (`import crand \"crypto/rand\"`). Usage of `.` (imports all identifiers into the current package's namespace) is discouraged -- like usage of `*` in Python.\n\nGo has its own format for writing comments that are automatically converted into documentation -- the `godoc` format. Place the documentation directly above the item being documented. Start the comment with the name of the item. Use a blank comment line to break the comment into multiple paragraphs. Use indenting.\n\n`go doc PACKAGE_NAME.IDENTIFIER_NAME` - views the `godoc` documentation.\n\nWhen you create a package called `internal`, the exported identifiers are only accessible to the direct parent of `internal` and the sibling packages of `internal`.\n\nYou might want to rename or move some identifiers -- to avoid a backward-breaking change, don't remove the original identifiers; provide an alternate name instead (`type Bar = Foo`).\n\nSemVer - semantic versioning: _major_._minor_._patch_:\n\n- `patch` - incremented when fixing a bug\n- `minor` - incremented when a new, backward-compatible feature is added\n- `major` - incremented when making a change that breaks backward compatibility\n\nThe import compatibility rule says that all minor and patch versions of a module must be backward-compatible. If they aren't, it is a bug.\n\n`pkg.go.dev` - a single service that gathers together documentation of Go modules.\n\n## Chapter 10: Concurrency in Go\n\nConcurrency - the CS term for breaking up a single process into independent components and specifying how these components safely share data. Most languages provide concurrency via a library that uses OS-level threads that share data by attempting to acquire locks. Go is different, and is based on Communicating Sequential Processes.\n\n_Concurrency is not parallelism._ Concurrency is a tool to better structure the problem you are solving - whether concurrent code runs in parallel depends on the hardware and on whether the algorithm allows it.\n\nWhether you should use concurrency depends on how data flows through the steps in your program. Concurrency isn't free; it may come with a huge overhead. That's why concurrent code is used for I/O -- there is a lot of waiting, and we can do different things in the meantime.\n\n`goroutine` - the core concept in Go's concurrency model. Goroutines are lightweight processes, managed by the Go runtime. They are faster to create than threads (no system-level resources), and their initial stack size is small -- smaller than a thread stack -- and grows as needed. Switching between _goroutines_ is faster because it happens within the process.\n\n- process - an instance of a program that is being run\n- threads - a process is composed of one or more threads; a thread is a unit of execution that is given some time to run by the OS; threads within a process share resources\n\nGo is able to spawn even tens of thousands of simultaneous _goroutines_. Any function can be launched as a _goroutine_.\n\nGoroutines communicate using _channels_ (`ch := make(chan int)`) - channels are reference types. Use `<-` to interact with a channel (read `<-chan`, write `chan<-`). Each value written to a channel can be read once. If multiple goroutines are reading from the same channel, a value will be read by only one of them.\n\n
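A minimal sketch of a goroutine communicating over an unbuffered channel (my own example):\n\n```go\npackage main\n\nimport \"fmt\"\n\nfunc main() {\n    ch := make(chan int)\n    go func() {\n        ch <- 42 // blocks until another goroutine reads from ch\n    }()\n    fmt.Println(<-ch) // 42\n}\n```\n\n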
By default, channels are unbuffered - every write to an open, unbuffered channel causes the writing goroutine to pause until another goroutine reads from the same channel. Buffered channels (`ch := make(chan int, 10)`) - these channels buffer a limited number of writes without blocking. Most of the time, use unbuffered channels.\n\nAny time you are reading from a channel that might be closed, use the comma ok idiom to ensure that the channel is still open.\n\n`select` - the control structure for concurrency in Go; it solves the _starvation_ problem. It checks if any of its cases can be processed, so deadlock is avoided. A `select` is often embedded within a for-loop.\n\nConcurrency practices and patterns:\n\n1. Keep your APIs concurrency-free - never export channels or mutexes in your API.\n2. Goroutines, for loops, and varying variables - any time a goroutine uses a variable whose value might change, pass the current value of the variable into the goroutine.\n3. Always clean up your goroutines - make sure that each one will eventually exit. If a goroutine doesn't exit, the scheduler will periodically give it time to do nothing.\n4. The done channel pattern - provides a way to signal a goroutine that it's time to stop processing. It uses a channel to signal that it is time to exit (see the sketch after this list).\n5. Using a cancel function to terminate a goroutine - return a cancellation function alongside the channel.\n6. When to use buffered and unbuffered channels - buffered channels are useful when you know how many goroutines you have launched, want to limit the number of goroutines you will launch, or want to limit the amount of work that is queued up.\n7. Backpressure - systems perform better when their components limit the amount of work they are willing to perform. We can use a buffered channel and a select statement to limit the number of simultaneous requests in a system.\n8. Turning off a case in a select - if one of the cases in a _select_ is reading a closed channel, it will always be successful. Use a `nil` channel to disable a case: set the channel's variable to `nil` and then `continue`.\n9. How to time out code - use `case <-time.After(2 * time.Second):`.\n10. Using WaitGroups - sometimes a goroutine needs to wait for multiple goroutines to complete their work. If you are waiting for a single goroutine, you can use the done channel pattern that we saw earlier. But if you are waiting on several goroutines, you need to use a `WaitGroup`.\n11. Running code exactly once - `sync.Once` - a handy type that enables this functionality.\n12. Putting our concurrent tools together - by structuring our code with goroutines, channels, and select statements, we separate the individual parts to run and complete in any order and cleanly exchange data between the dependent parts.\n\n
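A minimal sketch of the done channel pattern (names are my own; a real program would usually also wait for the worker to finish):\n\n```go\npackage main\n\nimport \"fmt\"\n\nfunc worker(done <-chan struct{}, jobs <-chan int) {\n    for {\n        select {\n        case <-done:\n            return // signal received: stop processing\n        case j := <-jobs:\n            fmt.Println(\"processing\", j)\n        }\n    }\n}\n\nfunc main() {\n    done := make(chan struct{})\n    jobs := make(chan int)\n    go worker(done, jobs)\n    jobs <- 1\n    jobs <- 2\n    close(done) // closing the channel signals the worker to exit\n}\n```\n\n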
`mutex` - mutual exclusion; the job of a mutex is to limit the concurrent execution of some code or access to a shared piece of data. This protected part is called the _critical section_.\n\n> Share memory by communicating, do not communicate by sharing memory.\n\nDecision tree - use channels or mutexes:\n\n- If you are coordinating goroutines or tracking a value as it is transformed by a series of goroutines, use channels\n- If you are sharing access to a field in a struct, use mutexes\n- If you discover a critical performance issue when using channels, and you cannot find any other way to fix the issue, modify your code to use a mutex\n\n## Chapter 11: The Standard Library\n\nLike Python, Go has a \"batteries included\" philosophy - it provides many of the tools that you need to build an application.\n\n`io` - contains two of the most useful interfaces - `io.Writer` and `io.Reader`.\n\n`time` - two main types used to represent time - `time.Duration` (used to represent a period of time, e.g.: `2 * time.Hour`) and `time.Time` (used to represent a moment in time). It is possible to extract the month, day, year, ... from `Time`.\n\nMost OSes keep track of two different sorts of time:\n\n- the wall clock - the current time\n- the monotonic clock - counts up from the time the computer was booted\n\n`encoding/json` - Go includes support for converting Go data types to and from JSON.\n\n- marshalling - Go data type -> encoding\n- unmarshalling - encoding -> Go data type\n\nWe specify the rules for processing our JSON with _struct tags_, strings that are written after the fields in a struct (`tagName:\"tagValue\"`, e.g.: `json:\"id\"`).\n\n`net/http` - a production-quality HTTP/2 client and server.\n\n- Client - make HTTP requests and receive HTTP responses\n- Server - responsible for listening for HTTP requests\n\nEven though Go provides the server, use idiomatic third-party modules to enhance it.\n\n## Chapter 12: The Context\n\nServers need a way to handle metadata on individual requests. Go uses a construct called the context.\n\nContext - an instance that meets the Context interface. An empty context is a starting point; each time you add metadata to the context, you do so by wrapping the existing context using one of the factory functions in the context package.\n\nCancellation - consider a request that spawns several goroutines, each one calling a different HTTP service. If one service returns an error that prevents you from returning a valid response, there is no point in continuing to process the other goroutines. In Go this is called _cancellation_.\n\nThere are 4 things a server can do to manage its load:\n\n- Limit simultaneous requests\n- Limit how many requests are queued waiting to run\n- Limit how long a request can run\n- Limit the resources a request can use\n\nGo provides tools to handle the first three: the first two are handled by limiting the number of goroutines, and the context provides a way to control how long a request runs.\n\nThe context provides a way to pass per-request metadata through your program.\n\n
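A brief sketch of a context with a timeout on an outgoing request (the URL is illustrative):\n\n```go\npackage main\n\nimport (\n    \"context\"\n    \"fmt\"\n    \"net/http\"\n    \"time\"\n)\n\nfunc main() {\n    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)\n    defer cancel() // always release the context's resources\n\n    req, err := http.NewRequestWithContext(ctx, http.MethodGet, \"https://example.com\", nil)\n    if err != nil {\n        fmt.Println(err)\n        return\n    }\n    resp, err := http.DefaultClient.Do(req) // fails with a deadline error if it takes longer than 2s\n    if err != nil {\n        fmt.Println(err)\n        return\n    }\n    defer resp.Body.Close()\n    fmt.Println(resp.Status)\n}\n```\n\n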
## Chapter 13: Writing Tests\n\nGo includes testing support as part of its standard library. The `testing` package provides the types and functions to write tests, while the `go test` tool runs your tests and generates reports.\n\nGo tests are placed in the same directory and the same package as the production code. Tests are able to access and test unexported functions and variables. If you want to test just the public API, Go has a special convention for this: use `packagename_test` for the package name.\n\nEvery test lives in a file whose name ends with `_test.go`. Test functions start with the word `Test` and take in a single parameter of type `*testing.T` (see the sketch at the end of this chapter).\n\nIt is possible to write set-up and tear-down code.\n\nUse `go-cmp` (a third-party module) in order to compare two instances of a compound type.\n\nAdding the `-cover` flag to the `go test` command calculates coverage information and includes a summary in the test output. `-coverprofile=c.out` saves the coverage info to a file. `-html=c.out` generates an HTML representation of your source code coverage.\n\n> Code coverage is necessary, but it is not sufficient. You can have 100% code coverage and still have bugs in your\n> code.\n\n> When your code depends on abstractions, it is easier to write unit tests.\n\nA stub returns a canned value for a given input, whereas a mock validates that a set of calls happens in the expected order with the expected inputs.\n\nThe `httptest` package makes it easier to stub HTTP services. Even though `httptest` provides a way to avoid testing external services, you should still write _integration_ tests - automated tests that connect to other services. These validate that your understanding of the service's API is correct. The challenge is figuring out how to group your automated tests - you want to run integration tests when the supporting environment is present. Also, integration tests tend to be slower than unit tests, so they usually run less frequently.\n\nGo includes a _race checker_ - it helps to find accidental references to a variable from two different goroutines without acquiring a lock. It is not guaranteed to find every single data race in your code, but if it finds one, you should put proper locks around what it finds. Do not solve race conditions by inserting \"sleeps\" into the code.\n\n
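A minimal test sketch (the `addNumbers` function and package name are my own example):\n\n```go\n// adder_test.go\npackage adder\n\nimport \"testing\"\n\nfunc addNumbers(a, b int) int { return a + b }\n\nfunc TestAddNumbers(t *testing.T) {\n    got := addNumbers(2, 3)\n    if got != 5 {\n        t.Errorf(\"addNumbers(2, 3) = %d, want 5\", got)\n    }\n}\n```\n\n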
## Chapter 14: Here There Be Dragons: Reflect, Unsafe, and Cgo\n\nGo is a safe language, but sometimes your Go programs need to venture out into less defined areas.\n\nReflection allows us to examine types at runtime. It also provides the ability to examine, modify, and create variables, functions, and structs at runtime.\n\n- `database/sql` - uses reflection to send requests to databases and read data back\n- `text/template` and `html/template` - use reflection to process the values that are passed to the templates\n- `fmt` - uses reflection to detect the type of the provided parameters\n- `errors` - uses reflection to implement `errors.Is` and `errors.As`\n- `sort` - uses reflection to implement functions that sort and evaluate slices of any type\n\nMost of these examples have one thing in common - they involve accessing and formatting data that is being imported into or exported out of a Go program.\n\nThe `reflect` package is built around 3 core concepts:\n\n- `types` - `reflect.TypeOf` returns a value of type `reflect.Type`, which represents the type of the variable passed into the function\n- `kinds` - the `Kind` method on `reflect.Type` returns a value of type `reflect.Kind`, which is a constant that says what the type is made of - a slice, a map, a pointer, a struct, an interface, an array, a function, an int, ...\n- `values` - we can use `reflect.ValueOf` to create a `reflect.Value` instance that represents the value of a variable\n\nOther use cases:\n\n- use reflection to check if an interface's value is nil\n- use reflection to write a data marshaller\n- use reflection to automate repetitive tasks, e.g. create a new function without writing repetitive code\n\nWhile reflection is essential when converting data at the boundaries of Go, be careful using it in other situations.\n\n`unsafe` - allows you to manipulate memory. Very small and very odd. There are 2 common patterns in `unsafe` code:\n\n- conversion between 2 types of variables that are normally not convertible\n- reading/modifying the bytes in a variable\n\nThe majority of _unsafe_ usages are motivated by integration with operating systems and C code. Developers also frequently use _unsafe_ to write more efficient Go code.\n\nThe _unsafe_ package is powerful and low-level! Avoid using it unless you know what you are doing, and you need the performance improvements that it provides.\n\nNearly every programming language provides a way to integrate with C libraries. Go calls its FFI (Foreign Function Interface) to C `cgo`. `cgo` is for integration, not performance. `cgo` isn't fast, and it is not easy to use for nontrivial programs; the only reason to use `cgo` is if there is a C library that you must use and there is no suitable Go replacement.\n\n## Chapter 15: A Look at the Future: Generics in Go\n\nGenerics reduce repetitive code and increase type safety. Generics capture the idea that it is sometimes useful to write functions where the specific type of a parameter or field is specified when it is used.\n\nMany common algorithms, such as map, reduce, and filter, had to be reimplemented for different types.\n\n> Properly written, Go is boring... well-written Go programs tend to be straightforward and sometimes a bit repetitive.\n\n
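A minimal sketch of a generic function using type parameters (Go 1.18+; `Min` and its union constraint are my own example, not from the book):\n\n```go\npackage main\n\nimport \"fmt\"\n\n// Min is written once; the concrete type for T is determined at each call site.\nfunc Min[T int | float64](a, b T) T {\n    if a < b {\n        return a\n    }\n    return b\n}\n\nfunc main() {\n    fmt.Println(Min(3, 5), Min(2.5, 1.5)) // 3 1.5\n}\n```\n"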
  },
  {
    "path": "books/hands-on-ml.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems\n\nBook by Aurelien Geron\n\n[TOC]\n\nTODO: *Re-read Part I.*\n\n## Chapter 10: Introduction to Artificial Neural Networks with Keras\n\nANNs - Artificial Neural Networks - inspired by the networks of biological neurons, have gradually become quite\ndifferent from their biological cousins.\n\nANNs introduced in 1943 by McCulloch and Pitts - a simplified computational model of how biological neurons might work\ntogether in animal brains to perform complex computation using propositional logic.\n\nMcCulloch and Pitts proposed an artificial neuron that has one or more binary inputs (on/off) and one binary output. The\nartificial neuron activates its output when more than a certain number of its inputs are active. They showed that even\nsuch simplified model is possible capable of performing various logical computations.\n\nThe Perceptron - is one of the simplest ANN architectures, invented in 1957. It is based on slightly different\nartificial neuron - threshold logic unit or linear threshold unit. The inputs and outputs are numbers (instead of binary\non/off values) and each input connection is associated with a weight. The TLU computes a weighted sum of its inputs,\nthen applies a step function to that sum and outputs the result. Most commonly used step function is the Heaviside step\nfunction.\n\nA single TLU can be used for simple linear binary classification.\n\nA perceptron is composed of a single layer of TLUs, each TLU connected to all inputs (when all the neurons are connected\nto every neuron in the previous layer, the layer is called a fully connected layer). The inputs of the Perceptron are\nfed to special passthrough neurons from the input layer. Extra bias feature is generally added (neuron that always\noutput 1). A Perceptron with 2 inputs and three outputs can classify instances simultaneously into three different\nbinary classes - multi-output classifier.\n\nHow perceptron is trained? \"Cells that fire together, wire together\" - the connection between weight between 2 neurons\ntend to increase when they fire simultaneously (Hebb's rule). The perceptron is fed one example at a time, when it\noutputs wrong answer, it reinforces the connection weights from the inputs that would have contributed to the correct\nanswer.\n\nIn fact single Perceptron is similar to SDGClassifier.\n\nBack-propagation training algorithm - it is Gradient Descent using an efficient technique for computing the gradients\nautomatically in just 2 passes through network - one forward, one backward. It can find out how each connection weight\nand each bias term should be tweaked in order to reduce the error.\n\nIn other words: for each training instance, the back propagation algorithm first makes a prediction (forward pass) and\nmeasures the error, then goes through each layer in reverse to measure the error contribution from each connection (\nreverse pass) and finally tweaks the connection weights to reduce error (Gradient Descent step).\n\nWhen building MLP for regression you don't want to use any activation function for the output neurons, so they are free\nto output any range of values. 
The loss function to use during training is typically the mean squared error, but if there are many outliers in the training set, the mean absolute error might be a better choice.\n\nAn MLP can also be used for classification.\n\nTensorFlow 2 adopted Keras' high-level API and introduced some additional functionalities.\n\n**Sequential API** - the simplest kind of Keras model, for neural networks that are composed of a single stack of layers connected sequentially. Flatten - a preprocessing layer whose role is to convert each input into a 1D array. Once the model is defined, it needs to be compiled - you need to specify the loss function and the optimiser to use; optionally, a list of metrics can be passed. Then the model can be trained (see the sketch at the end of these notes).\n\nIf the training set is very skewed, with some classes being overrepresented and others underrepresented, it is useful to set the class_weight argument when calling the fit method.\n\nIf you are not satisfied with the model's performance, adjust the hyperparameters if longer training is not bringing any additional benefits.\n\nThe model estimates probabilities per class.\n\nWhen layers are created, they are called like functions -> `keras.layers.Dense(30)(prev_layer)` - this is why it is called the **Functional API**; this is the way of telling Keras how to join the layers.\n\nA model can have multiple inputs and multiple outputs, depending on the task.\n\nThe Sequential and Functional APIs are declarative; for a more imperative programming style there is the **Subclassing API**. Simply subclass the `Model` class, create the layers in the constructor, and use them to perform computations in the `call` method. The Subclassing API has drawbacks: it does not allow viewing the model's summary, and Keras cannot inspect the model ahead of time. So the Sequential and Functional APIs are preferred.\n\nIt is possible to save and load a Keras model to/from disk. Keras will use the HDF5 format to save the model's architecture and the values of all the model parameters for every layer (weights and biases). When training an enormous model, it is a good idea to save checkpoints at regular intervals during training to avoid losing everything if the computer crashes. In order to make checkpoints you have to use callbacks.\n\n
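A minimal Sequential API sketch, assuming TensorFlow 2 is installed (the layer sizes follow a Fashion-MNIST-style 28x28 input and are illustrative):\n\n```python\nfrom tensorflow import keras\n\n# Build: a single stack of layers, applied sequentially.\nmodel = keras.models.Sequential([\n    keras.layers.Flatten(input_shape=[28, 28]),    # preprocessing: 2D image -> 1D array\n    keras.layers.Dense(300, activation=\"relu\"),\n    keras.layers.Dense(100, activation=\"relu\"),\n    keras.layers.Dense(10, activation=\"softmax\"),  # one probability per class\n])\n\n# Compile: specify the loss, the optimizer, and (optionally) metrics.\nmodel.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"sgd\", metrics=[\"accuracy\"])\n\n# Train (X_train, y_train, X_valid, y_valid are assumed to be prepared beforehand):\n# history = model.fit(X_train, y_train, epochs=30, validation_data=(X_valid, y_valid))\n```\n"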
  },
  {
    "path": "books/head-first-design-patterns/ch_01_strategy.py",
    "content": "class FlyBehavior:\n    def fly(self) -> None:\n        raise NotImplementedError\n\n\nclass QuackBehavior:\n    def quack(self) -> None:\n        raise NotImplementedError\n\n\nclass Duck:\n    def __init__(self, fly_behavior: FlyBehavior, quack_behavior: QuackBehavior) -> None:\n        self.fly_behavior = fly_behavior\n        self.quack_behavior = quack_behavior\n\n    def perform_fly(self) -> None:\n        self.fly_behavior.fly()\n\n    def perform_quack(self) -> None:\n        self.quack_behavior.quack()\n\n    def display(self) -> None:\n        raise NotImplementedError\n\n\nclass FlyWithWings(FlyBehavior):\n    def fly(self) -> None:\n        print(\"I am using wings!\")\n\n\nclass FlyNoWay(FlyBehavior):\n    def fly(self) -> None:\n        print(\"I am not flying.\")\n\n\nclass Quack(QuackBehavior):\n    def quack(self) -> None:\n        print(\"QUACK\")\n\n\nclass Squeak(QuackBehavior):\n    def quack(self) -> None:\n        print(\"SQUEAK\")\n\n\nclass MuteQuack(QuackBehavior):\n    def quack(self) -> None:\n        print(\"<SILENCE>\")\n\n\nclass MallardDuck(Duck):\n    def __init__(self) -> None:\n        super().__init__(FlyWithWings(), Quack())\n\n    def display(self) -> None:\n        print(\"Looks like a mallard.\")\n\n\nduck = MallardDuck()\nduck.display()\nduck.perform_fly()\nduck.perform_quack()\n"
  },
  {
    "path": "books/head-first-design-patterns/ch_02_observer.py",
    "content": "class Observer:\n    def update(self) -> None:\n        raise NotImplementedError\n\n\nclass Subject:\n    def register_observer(self, observer: Observer) -> None:\n        raise NotImplementedError\n\n    def remove_observer(self, observer: Observer) -> None:\n        raise NotImplementedError\n\n    def notify_observers(self) -> None:\n        raise NotImplementedError\n\n\nclass DisplayElement:\n    def display(self) -> None:\n        raise NotImplementedError\n\n\nclass WeatherData(Subject):\n    def __init__(self):\n        self._observers = []\n        self.temperature = 0.0\n        self.humidity = 0.0\n        self.pressure = 0.0\n\n    def register_observer(self, observer: Observer) -> None:\n        self._observers.append(observer)\n\n    def remove_observer(self, observer: Observer) -> None:\n        self._observers.remove(observer)\n\n    def notify_observers(self) -> None:\n        for observer in self._observers:\n            observer.update()\n\n    def set_measurements(self, temperature: float, humidity: float, pressure: float) -> None:\n        self.temperature = temperature\n        self.humidity = humidity\n        self.pressure = pressure\n        self.notify_observers()\n\n\nclass CurrentConditionsDisplay(Observer, DisplayElement):\n    def __init__(self, weather_data: WeatherData):\n        self._temperature = 0.0\n        self._humidity = 0.0\n        self._weather_data = weather_data\n        self._weather_data.register_observer(self)\n\n    def display(self) -> None:\n        print(f\"Current conditions: {self._temperature}°C, {self._humidity}%\")\n\n    def update(self) -> None:\n        self._temperature = self._weather_data.temperature\n        self._humidity = self._weather_data.humidity\n        self.display()\n\n\nclass AvgTempDisplay(Observer, DisplayElement):\n    def __init__(self, weather_data: WeatherData):\n        self._temperature = []\n        self._weather_data = weather_data\n        self._weather_data.register_observer(self)\n\n    def display(self) -> None:\n        print(f\"Average temperature: {sum(self._temperature) / len(self._temperature)}°C\")\n\n    def update(self) -> None:\n        self._temperature.append(self._weather_data.temperature)\n        self.display()\n\n\nweather_data = WeatherData()\ncurrent_display = CurrentConditionsDisplay(weather_data)\nforecast_display = AvgTempDisplay(weather_data)\n\nweather_data.set_measurements(23.0, 68.1, 1018.0)\nweather_data.set_measurements(24.2, 70.4, 1019.2)\nweather_data.set_measurements(25.8, 71.2, 1018.4)\n"
  },
  {
    "path": "books/head-first-design-patterns/ch_03_decorator.py",
    "content": "class Beverage:\n    @property\n    def description(self) -> str:\n        return self.__class__.__name__\n\n    @property\n    def cost(self) -> float:\n        raise NotImplementedError\n\n\nclass CondimentDecorator(Beverage):\n    def __init__(self, beverage: Beverage):\n        self._beverage = beverage\n\n    @property\n    def description(self) -> str:\n        return f\"{self._beverage.description}, {super(CondimentDecorator, self).description}\"\n\n    @property\n    def cost(self) -> float:\n        raise NotImplementedError\n\n\nclass Espresso(Beverage):\n    @property\n    def cost(self) -> float:\n        return 1.99\n\n\nclass HouseBlend(Beverage):\n    @property\n    def cost(self) -> float:\n        return 0.89\n\n\nclass Mocha(CondimentDecorator):\n    @property\n    def cost(self) -> float:\n        return self._beverage.cost + 0.20\n\n\nclass Soy(CondimentDecorator):\n    @property\n    def cost(self) -> float:\n        return self._beverage.cost + 0.15\n\n\nbeverage = Espresso()\nbeverage = Mocha(beverage)\nbeverage = Mocha(beverage)\nbeverage = Soy(beverage)\nprint(f\"${beverage.cost} for '{beverage.description}'\")\n"
  },
  {
    "path": "books/head-first-design-patterns/ch_04_factory.py",
    "content": "class Ingredient:\n    def __init__(self):\n        print(self.__class__.__name__)\n\n\nclass ThinCrustDough(Ingredient):\n    pass\n\n\nclass ThickCrustDough(Ingredient):\n    pass\n\n\nclass MarinaraSauce(Ingredient):\n    pass\n\n\nclass PlumTomatoSauce(Ingredient):\n    pass\n\n\nclass MozzarellaCheese(Ingredient):\n    pass\n\n\nclass ReggianoCheese(Ingredient):\n    pass\n\n\nclass Garlic(Ingredient):\n    pass\n\n\nclass Onion(Ingredient):\n    pass\n\n\nclass Mushroom(Ingredient):\n    pass\n\n\nclass SlicedPepperoni(Ingredient):\n    pass\n\n\nclass FreshClams(Ingredient):\n    pass\n\n\nclass FrozenClams(Ingredient):\n    pass\n\n\nclass PizzaIngredientFactory:\n    def create_dough(self):\n        raise NotImplementedError\n\n    def create_sauce(self):\n        raise NotImplementedError\n\n    def create_cheese(self):\n        raise NotImplementedError\n\n    def create_veggies(self):\n        raise NotImplementedError\n\n    def create_pepperoni(self):\n        raise NotImplementedError\n\n    def create_clam(self):\n        raise NotImplementedError\n\n\nclass NYPizzaIngredientFactory(PizzaIngredientFactory):\n    def create_dough(self):\n        return ThinCrustDough()\n\n    def create_sauce(self):\n        return MarinaraSauce()\n\n    def create_cheese(self):\n        return ReggianoCheese()\n\n    def create_veggies(self):\n        return [Garlic(), Onion()]\n\n    def create_pepperoni(self):\n        return SlicedPepperoni()\n\n    def create_clam(self):\n        return FreshClams()\n\n\nclass ChicagoPizzaIngredientFactory(PizzaIngredientFactory):\n    def create_dough(self):\n        return ThickCrustDough()\n\n    def create_sauce(self):\n        return PlumTomatoSauce()\n\n    def create_cheese(self):\n        return MozzarellaCheese()\n\n    def create_veggies(self):\n        return [Garlic(), Mushroom()]\n\n    def create_pepperoni(self):\n        return SlicedPepperoni()\n\n    def create_clam(self):\n        return FrozenClams()\n\n\nclass Pizza:\n    name = ...\n\n    def __init__(self, ingredient_factory: PizzaIngredientFactory):\n        self._ingredient_factory = ingredient_factory\n\n    def prepare(self) -> None:\n        raise NotImplementedError\n\n    def bake(self) -> None:\n        print(\"Bake for 25 minutes at 350\")\n\n    def cut(self) -> None:\n        print(\"Cutting the pizza into diagonal slices\")\n\n    def box(self) -> None:\n        print(\"Place the pizza in official PizzaStore box\")\n\n\nclass CheesePizza(Pizza):\n    def prepare(self) -> None:\n        print(f\"Preparing {self.name}\")\n        self._ingredient_factory.create_dough()\n        self._ingredient_factory.create_sauce()\n        self._ingredient_factory.create_cheese()\n\n\nclass ClamPizza(Pizza):\n    def prepare(self) -> None:\n        print(f\"Preparing {self.name}\")\n        self._ingredient_factory.create_dough()\n        self._ingredient_factory.create_sauce()\n        self._ingredient_factory.create_cheese()\n        self._ingredient_factory.create_clam()\n\n\nclass PizzaStore:\n    def order_pizza(self, pizza_type: str) -> Pizza:\n        pizza = self.create_pizza(pizza_type)\n\n        pizza.prepare()\n        pizza.bake()\n        pizza.cut()\n        pizza.box()\n\n        return pizza\n\n    # Factory Method:\n    def create_pizza(self, pizza_type: str) -> Pizza:\n        raise NotImplementedError\n\n\nclass NYPizzaStore(PizzaStore):\n    def create_pizza(self, pizza_type: str) -> Pizza:\n        ingredient_factory = 
NYPizzaIngredientFactory()\n\n        match pizza_type:\n            case \"cheese\":\n                pizza = CheesePizza(ingredient_factory)\n                pizza.name = \"NY Style Sauce and Cheese Pizza\"\n            case \"clam\":\n                pizza = ClamPizza(ingredient_factory)\n                pizza.name = \"NY Style Sauce and Clam Pizza\"\n            case _:\n                raise RuntimeError(\"Unknown pizza type\")\n\n        return pizza\n\n\nclass ChicagoPizzaStore(PizzaStore):\n    def create_pizza(self, pizza_type: str) -> Pizza:\n        ingredient_factory = ChicagoPizzaIngredientFactory()\n\n        match pizza_type:\n            case \"cheese\":\n                pizza = CheesePizza(ingredient_factory)\n                pizza.name = \"Chicago Style Deep Dish Cheese Pizza\"\n            case \"clam\":\n                pizza = ClamPizza(ingredient_factory)\n                pizza.name = \"Chicago Style Deep Dish Clam Pizza\"\n            case _:\n                raise RuntimeError(\"Unknown pizza type\")\n\n        return pizza\n\n\nny_store = NYPizzaStore()\nny_store.order_pizza(\"cheese\")\n\nchicago_store = ChicagoPizzaStore()\nchicago_store.order_pizza(\"cheese\")\n"
  },
  {
    "path": "books/head-first-design-patterns/ch_05_singleton.py",
    "content": "class ChocolateBoiler:\n    _instance = None\n\n    def __new__(cls):\n        if not cls._instance:\n            cls._instance = super(ChocolateBoiler, cls).__new__(cls)\n        return cls._instance\n\n\nboiler_0 = ChocolateBoiler()\nboiler_1 = ChocolateBoiler()\n\nprint(f\"#0: {boiler_0}\")\nprint(f\"#1: {boiler_1}\")\nprint(f\"Are they the same object? {boiler_0 is boiler_1}\")\n\n\n# Implementation using variable - instantiated on module import:\nclass ChocolateBoiler:\n    pass\n\n\nchocolate_boiler = ChocolateBoiler()\nprint(f\"Are they the same object? {chocolate_boiler is chocolate_boiler}\")\n\n\n# Implementation using function - using 'attr':\ndef get_chocolate_boiler() -> ChocolateBoiler:\n    if not hasattr(get_chocolate_boiler, \"instance\"):\n        setattr(get_chocolate_boiler, \"instance\", ChocolateBoiler())\n    return getattr(get_chocolate_boiler, \"instance\")\n\n\nprint(f\"Are they the same object? {get_chocolate_boiler() is get_chocolate_boiler()}\")\n\n# Implementation using function - using variable:\n_chocolate_boiler = None\n\n\ndef get_chocolate_boiler() -> ChocolateBoiler:\n    global _chocolate_boiler\n\n    if not _chocolate_boiler:\n        _chocolate_boiler = ChocolateBoiler()\n\n    return _chocolate_boiler\n\n\nprint(f\"Are they the same object? {get_chocolate_boiler() is get_chocolate_boiler()}\")\n"
  },
  {
    "path": "books/head-first-design-patterns/ch_06_command.py",
    "content": "from typing import List\n\n\nclass Device:\n    @property\n    def name(self) -> str:\n        return self.__class__.__name__\n\n    def on(self) -> None:\n        print(f\"{self.name} was turned on\")\n\n    def off(self) -> None:\n        print(f\"{self.name} was turned off\")\n\n\nclass Light(Device):\n    pass\n\n\nclass Tv(Device):\n    pass\n\n\nclass Stereo(Device):\n    def __init__(self) -> None:\n        self.volume = 0\n\n    def set_cd(self) -> None:\n        print(f\"{self.name} CD set\")\n\n    def set_volume(self, volume: int) -> None:\n        print(f\"{self.name} Volume set to {volume}\")\n        self.volume = volume\n\n\nclass Command:\n    def execute(self) -> None:\n        raise NotImplementedError\n\n    def undo(self) -> None:\n        raise NotImplementedError\n\n\nclass NoCommand(Command):\n    def execute(self) -> None:\n        pass\n\n    def undo(self) -> None:\n        pass\n\n\nclass MarcoCommand(Command):\n    def __init__(self, commands: List[Command]):\n        self._commands = commands\n\n    def execute(self) -> None:\n        for command in self._commands:\n            command.execute()\n\n    def undo(self) -> None:\n        for command in self._commands[::-1]:\n            command.undo()\n\n\nclass DeviceOnCommand(Command):\n    def __init__(self, device: Device) -> None:\n        self._device = device\n\n    def execute(self) -> None:\n        self._device.on()\n\n    def undo(self) -> None:\n        self._device.off()\n\n\nclass DeviceOffCommand(Command):\n    def __init__(self, device: Device) -> None:\n        self._device = device\n\n    def execute(self) -> None:\n        self._device.off()\n\n    def undo(self) -> None:\n        self._device.on()\n\n\nclass StereoVolumeUpCommand(Command):\n    def __init__(self, stereo: Stereo) -> None:\n        self._stereo = stereo\n\n    def execute(self) -> None:\n        self._stereo.set_volume(stereo.volume + 1)\n\n    def undo(self) -> None:\n        self._stereo.set_volume(stereo.volume - 1)\n\n\nclass RemoteControl:\n    def __init__(self):\n        self._on_commands = [NoCommand()] * 7\n        self._off_commands = [NoCommand()] * 7\n        self._undo_commands = []\n\n    def set_command(self, slot: int, on_command: Command, off_command: Command) -> None:\n        self._on_commands[slot] = on_command\n        self._off_commands[slot] = off_command\n\n    def on_button_pushed(self, slot: int) -> None:\n        self._on_commands[slot].execute()\n        self._undo_commands.append(self._on_commands[slot])\n\n    def off_button_pushed(self, slot: int) -> None:\n        self._off_commands[slot].execute()\n        self._undo_commands.append(self._off_commands[slot])\n\n    def undo_button_pushed(self) -> None:\n        if not self._undo_commands:\n            return\n        self._undo_commands.pop().undo()\n\n\nlight = Light()\ntv = Tv()\nstereo = Stereo()\n\nlight_on_command, light_off_command = DeviceOnCommand(light), DeviceOffCommand(light)\ntv_on_command, tv_off_command = DeviceOnCommand(tv), DeviceOffCommand(tv)\nstereo_on_command, stereo_off_command = DeviceOnCommand(stereo), DeviceOffCommand(stereo)\n\nvolume_up_command = StereoVolumeUpCommand(stereo)\n\nparty_on_command = MarcoCommand([light_on_command, tv_on_command, stereo_on_command, volume_up_command])\nparty_off_command = MarcoCommand([light_on_command, tv_on_command, stereo_off_command])\n\nremote = RemoteControl()\nremote.set_command(0, light_on_command, light_off_command)\nremote.set_command(1, tv_on_command, 
tv_off_command)\nremote.set_command(2, stereo_on_command, stereo_off_command)\nremote.set_command(3, party_on_command, party_off_command)\n\nremote.on_button_pushed(1)\nremote.on_button_pushed(3)\nremote.undo_button_pushed()\n"
  },
  {
    "path": "books/head-first-design-patterns/ch_07_adapter.py",
    "content": "class Duck:\n    def quack(self) -> None:\n        raise NotImplementedError\n\n    def fly(self) -> None:\n        raise NotImplementedError\n\n\nclass Turkey:\n    def gobble(self) -> None:\n        raise NotImplementedError\n\n    def fly(self) -> None:\n        raise NotImplementedError\n\n\nclass WildTurkey(Turkey):\n    def gobble(self) -> None:\n        print(\"Gobble Gobble\")\n\n    def fly(self) -> None:\n        print(\"I am flying a short distance\")\n\n\nclass TurkeyAdapter(Duck):\n    def __init__(self, turkey: Turkey):\n        self._turkey = turkey\n\n    def quack(self) -> None:\n        self._turkey.gobble()\n\n    def fly(self) -> None:\n        self._turkey.fly()\n\n\n# We ran out of ducks, so we use turkeys:\nturkey = WildTurkey()\nturkey_adapter = TurkeyAdapter(turkey)\n\nturkey_adapter.quack()\n"
  },
  {
    "path": "books/head-first-design-patterns/ch_07_facade.py",
    "content": "from unittest.mock import Mock\n\n\nclass HomeTheaterFacade:\n    def __init__(self, amplifier, tuner, projector, lights, screen, player, popper):\n        self._amplifier = amplifier\n        self._tuner = tuner\n        self._projector = projector\n        self._lights = lights\n        self._screen = screen\n        self._player = player\n        self._popper = popper\n\n    # Wrap complex behavior into single method:\n    def watch_movie(self, movie):\n        self._popper.on()\n        self._popper.pop()\n\n        self._lights.dim(10)\n\n        self._screen.down()\n\n        self._projector.on()\n\n        self._amplifier.on()\n        self._amplifier.set_volume(20)\n\n        self._player.on()\n        self._player.play(movie)\n\n\nhome_theater = HomeTheaterFacade(*([Mock()] * 7))\nhome_theater.watch_movie(\"Joker\")\n"
  },
  {
    "path": "books/head-first-design-patterns/ch_08_template_method.py",
    "content": "class CaffeineBeverage:\n    def prepare_recipe(self) -> None:\n        self._boil_water()\n        self._brew()\n        self._pour_in_cup()\n        self._add_condiments()\n\n    def _boil_water(self) -> None:\n        print(\"Boiling water\")\n\n    def _pour_in_cup(self) -> None:\n        print(\"Pouring in a cup\")\n\n    def _brew(self) -> None:\n        raise NotImplementedError\n\n    def _add_condiments(self) -> None:\n        raise NotImplementedError\n\n\nclass Tea(CaffeineBeverage):\n    def _brew(self) -> None:\n        print(\"Steeping the tea\")\n\n    def _add_condiments(self) -> None:\n        print(\"Adding Lemon\")\n\n\nclass Coffee(CaffeineBeverage):\n    def _brew(self) -> None:\n        print(\"Dripping Coffee through filter\")\n\n    def _add_condiments(self) -> None:\n        print(\"Adding Sugar and Milk\")\n\n\nCoffee().prepare_recipe()\nTea().prepare_recipe()\n"
  },
  {
    "path": "books/head-first-design-patterns/ch_09_composite.py",
    "content": "from __future__ import annotations\n\nfrom abc import ABC\nfrom dataclasses import dataclass\n\n\nclass MenuComponent:\n    def add(self, menu_component: MenuComponent):\n        raise NotImplementedError\n\n    def remove(self, menu_component: MenuComponent):\n        raise NotImplementedError\n\n    def get_child(self, i: int):\n        raise NotImplementedError\n\n    def print(self):\n        raise NotImplementedError\n\n\n@dataclass\nclass MenuItem(MenuComponent, ABC):\n    name: str\n    description: str\n    vegetarian: bool\n    price: float\n\n    def print(self):\n        print(f\"{self.name}, {self.price}, {self.description}\")\n\n\nclass Menu(MenuComponent):\n    def __init__(self, name: str):\n        self._name = name\n        self._menu_components = []\n\n    def add(self, menu_component: MenuComponent):\n        self._menu_components.append(menu_component)\n\n    def remove(self, menu_component: MenuComponent):\n        self._menu_components.remove(menu_component)\n\n    def get_child(self, i: int):\n        return self._menu_components[i]\n\n    def print(self):\n        print(self._name)\n        for menu_component in self._menu_components:\n            menu_component.print()\n\n\nclass Waitress:\n    def __init__(self, menu_component: MenuComponent):\n        self._menu_component = menu_component\n\n    def print_menu(self):\n        self._menu_component.print()\n\n\nbreakfast_menu = Menu(\"BREAKFAST\")\ndinner_menu = Menu(\"DINNER\")\ndessert_menu = Menu(\"DESSERT\")\n\nall_menus = Menu(\"ALL MENUS\")\nall_menus.add(breakfast_menu)\nall_menus.add(dinner_menu)\n\ndinner_menu.add(MenuItem(\"Pasta\", \"Pasta with marinara Sauce\", True, 3.89))\ndinner_menu.add(dessert_menu)\n\ndessert_menu.add(MenuItem(\"Apple Pie\", \"Apple pie with a flaky crust, topped with vanilla ice cream\", True, 1.59))\n\nWaitress(all_menus).print_menu()\n"
  },
  {
    "path": "books/head-first-design-patterns/ch_09_iterator.py",
    "content": "from collections.abc import Iterator\nfrom dataclasses import dataclass\nfrom typing import (\n    Dict,\n    List,\n    Union,\n)\n\n\n@dataclass\nclass MenuItem:\n    name: str\n    description: str\n    vegetarian: bool\n    price: float\n\n\nclass DinnerMenuIterator(Iterator):\n    # Just for demonstration purposes!\n    def __init__(self, collection: List[MenuItem]):\n        self._collection = collection\n        self._position = 0\n\n    def __next__(self) -> MenuItem:\n        try:\n            value = self._collection[self._position]\n            self._position += 1\n        except IndexError:\n            raise StopIteration()\n\n        return value\n\n\nclass DinnerMenu:\n    # Just for demonstration purposes!\n    menu = [\n        MenuItem(\"Vegetarian BLT\", \"Fake Bacon with lettuce on whole wheat\", True, 2.99),\n        MenuItem(\"BLT\", \"Bacon with lettuce on whole wheat\", False, 2.99),\n        MenuItem(\"Soup of the day\", \"Soup of the day, with a side of potato salad\", False, 3.99),\n        MenuItem(\"HotDog\", \"A Hot Dog with sauerkraut, relish, onions, topped with cheese\", False, 3.05),\n    ]\n\n    def __iter__(self) -> DinnerMenuIterator:\n        # Factory Method\n        return DinnerMenuIterator(self.menu)\n\n\nclass BreakfastMenuIterator(Iterator):\n    # Just for demonstration purposes!\n    def __init__(self, collection: Dict[str, MenuItem]):\n        self._collection = collection\n        self._position = 0\n\n    def __next__(self) -> MenuItem:\n        try:\n            value = list(self._collection.values())[self._position]\n            self._position += 1\n        except IndexError:\n            raise StopIteration()\n\n        return value\n\n\nclass BreakfastMenu:\n    # Just for demonstration purposes!\n    menu = {\n        \"K&B's Pancake Breakfast\": MenuItem(\"K&B's Pancake Breakfast\", \"Pancakes with scrambled eggs and toast\", True, 2.99),\n        \"Regular Pancake Breakfast\": MenuItem(\"Regular Pancake Breakfast\", \"Pancakes with fried eggs, sausage\", False, 2.99),\n        \"Blueberry Pancakes\": MenuItem(\"Blueberry Pancakes\", \"Pancakes made with fresh blueberries\", True, 3.49),\n    }\n\n    def __iter__(self) -> BreakfastMenuIterator:\n        # Factory Method\n        return BreakfastMenuIterator(self.menu)\n\n\nclass Waitress:\n    def __init__(self, pancake_menu: BreakfastMenu, dinner_menu: DinnerMenu):\n        self._pancake_menu = pancake_menu\n        self._dinner_menu = dinner_menu\n\n    def print_menu(self):\n        print(\"BREAKFAST\")\n        self._print_menu(self._pancake_menu)\n        print(\"DINNER\")\n        self._print_menu(self._dinner_menu)\n\n    @staticmethod\n    def _print_menu(menu: Union[BreakfastMenu, DinnerMenu]):\n        for menu_item in menu:\n            print(f\"{menu_item.name}, {menu_item.price}, {menu_item.description}\")\n\n\nWaitress(BreakfastMenu(), DinnerMenu()).print_menu()\n"
  },
  {
    "path": "books/head-first-design-patterns/ch_10_state.py",
    "content": "from __future__ import annotations\n\nfrom random import random\n\n\nclass State:\n    def __init__(self, gumball_machine: GumballMachine):\n        self._gumball_machine = gumball_machine\n\n    def insert_quarter(self) -> None:\n        pass\n\n    def eject_quarter(self) -> None:\n        pass\n\n    def turn_crank(self) -> None:\n        pass\n\n    def dispense(self) -> None:\n        pass\n\n\nclass NoQuarterState(State):\n    def insert_quarter(self) -> None:\n        print(\"You inserted a quarter\")\n        self._gumball_machine.state = self._gumball_machine.has_quarter_state\n\n\nclass HasQuarterState(State):\n    def eject_quarter(self) -> None:\n        print(\"Quarter returned\")\n        self._gumball_machine.state = self._gumball_machine.no_quarter_state\n\n    def turn_crank(self) -> None:\n        print(\"You turned...\")\n\n        if random() < 0.1 and self._gumball_machine.count > 1:\n            self._gumball_machine.state = self._gumball_machine.winner_state\n        else:\n            self._gumball_machine.state = self._gumball_machine.sold_state\n\n\nclass SoldState(State):\n    def dispense(self) -> None:\n        self._gumball_machine.release_ball()\n\n        if self._gumball_machine.count > 0:\n            self._gumball_machine.state = self._gumball_machine.no_quarter_state\n        else:\n            print(\"Out of gumballs!\")\n            self._gumball_machine.state = self._gumball_machine.sold_out_state\n\n\nclass SoldOutState(State):\n    pass\n\n\nclass WinnerState(State):\n    def dispense(self) -> None:\n        self._gumball_machine.release_ball()\n\n        if self._gumball_machine.count == 0:\n            self._gumball_machine.state = self._gumball_machine.sold_out_state\n        else:\n            self._gumball_machine.release_ball()\n            print(\"You are a WINNER!\")\n\n            if self._gumball_machine.count > 0:\n                self._gumball_machine.state = self._gumball_machine.no_quarter_state\n            else:\n                print(\"Out of gumballs!\")\n                self._gumball_machine.state = self._gumball_machine.sold_out_state\n\n\nclass GumballMachine:\n    def __init__(self, count: int):\n        self.count = count\n\n        self.no_quarter_state = NoQuarterState(self)\n        self.has_quarter_state = HasQuarterState(self)\n        self.sold_state = SoldState(self)\n        self.sold_out_state = SoldOutState(self)\n        self.winner_state = WinnerState(self)\n\n        self.state = self.no_quarter_state if count > 0 else self.sold_out_state\n\n    def insert_quarter(self) -> None:\n        self.state.insert_quarter()\n\n    def eject_quarter(self) -> None:\n        self.state.eject_quarter()\n\n    def turn_crank(self) -> None:\n        self.state.turn_crank()\n        self.state.dispense()\n\n    def release_ball(self) -> None:\n        print(\"A ball rolling out the slot...\")\n        if self.count > 0:\n            self.count = self.count - 1\n\n\nmachine = GumballMachine(5)\n\nmachine.insert_quarter()\nmachine.turn_crank()\n\nmachine.insert_quarter()\nmachine.turn_crank()\n\nmachine.insert_quarter()\nmachine.turn_crank()\n"
  },
  {
    "path": "books/head-first-design-patterns/ch_11_virtual_proxy.py",
    "content": "class Icon:\n    @property\n    def width(self) -> int:\n        raise NotImplementedError\n\n    @property\n    def height(self) -> int:\n        raise NotImplementedError\n\n    def paint_icon(self) -> None:\n        raise NotImplementedError\n\n\nclass ImageIcon(Icon):\n    @property\n    def width(self) -> int:\n        return 1280\n\n    @property\n    def height(self) -> int:\n        return 720\n\n    def paint_icon(self) -> None:\n        print(\":)\")\n\n\nclass ImageProxy(Icon):\n    def __init__(self, url: str):\n        self._image_icon = None\n        self._url = url\n\n    # Following 'if' statements can be reworked to use The State Pattern: ImageNotLoaded and ImageLoaded\n    @property\n    def width(self) -> int:\n        return self._image_icon.width if self._image_icon else 600\n\n    @property\n    def height(self) -> int:\n        return self._image_icon.height if self._image_icon else 800\n\n    def paint_icon(self) -> None:\n        if not self._image_icon:\n            # Download image from the internet\n            print(f\"Downloading the image from '{self._url}'\")\n            self._image_icon = ImageIcon()\n        self._image_icon.paint_icon()\n\n\nimage = ImageProxy(\"whatever://image\")\nimage.paint_icon()\n"
  },
  {
    "path": "books/head-first-design-patterns/notes.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Head First Design Patterns: Building Extensible and Maintainable Object-Oriented Software\n\nBook by Eric Freeman and Elisabeth Robson\n\nCode here: [click](.)\n\n- [Chapter 1: The Strategy Pattern - Welcome to Design Patterns](#chapter-1-welcome-to-design-patterns)\n- [Chapter 2: The Observer Pattern - Keeping your Objects in the Know](#chapter-2-keeping-your-objects-in-the-know)\n- [Chapter 3: The Decorator Pattern - Decorating Objects](#chapter-3-decorating-objects)\n- [Chapter 4: The Factory Pattern - Baking with OO Goodness](#chapter-4-baking-with-oo-goodness)\n- [Chapter 5: The Singleton Pattern - One-of-a-kind Objects](#chapter-5-one-of-a-kind-objects)\n- [Chapter 6: The Command Pattern - Encapsulating Invocation](#chapter-6-encapsulating-invocation)\n- [Chapter 7: The Adapter and Facade Patterns - Being Adaptive](#chapter-7-being-adaptive)\n- [Chapter 8: The Template Method Pattern - Encapsulating Algorithms](#chapter-8-encapsulating-algorithms)\n- [Chapter 9: The Iterator and Composite Patterns - Well-Managed Collections](#chapter-9-well-managed-collections)\n- [Chapter 10: The State Pattern - The State of Things](#chapter-10-the-state-of-things)\n- [Chapter 11: The Proxy Pattern - Controlling Object Access](#chapter-11-controlling-object-access)\n- [Chapter 12: Compound patterns - Patterns of Patterns](#chapter-12-patterns-of-patterns)\n- [Chapter 13: Patterns in the Real World](#chapter-13-patterns-in-the-real-world)\n- [Chapter 14: Appendix - Leftover Patterns](#chapter-14-leftover-patterns)\n\n## Chapter 1: Welcome to Design Patterns\n\n[The Strategy Pattern - Pattern implementation in Python](https://github.com/pkardas/learning/blob/master/books/head-first-design-patterns/ch_01_strategy.py)\n\nSomeone has already solved your problems. You can exploit the wisdom and lessons learned by other developers who have\nbeen down the same design problems road and survived the trip. Instead of code reuse, with patterns you get experience\nreuse.\n\nExample with ducks, adding `fly` method to the `Duck` superclass turned out to introduce a bug to the `RubberDuck`\nsubclass. A localised update to the code caused a non-local side effect (flying rubber-duck).\n\n*Which of the following are disadvantages of using inheritance to provide Duck behaviour?*\n\n- My answer: [D] It is hard to gain knowledge of all duck behaviours. [F] Changes can unintentionally affect other\n  ducks.\n\n*What do YOU think about the design? What would you do if you were Joe?*\n\n- My answer: New features would require adding many interfaces, for example: interface for migrating birds. Maybe\n  instead, it would be better to have 2 types of ducks: Living and non-living and instead of introducing a single class\n  per duck, reuse classes and make them parametrised with a name.\n\nThere is one constant in software development. What is the one thing you can always count on in software development.\n**CHANGE**. No matter how well you design an application, over time an application must grow and change, or it will die.\n\n*List some reasons you have had to change code in your application*:\n\n- New definition of the operations process.\n- Better understanding of the domain.\n- Requirement to use worker instead of lambda.\n- New library for the JSON serialisation.\n\nWe know using inheritance hasn't worked out very well. 
We know using inheritance hasn't worked out very well. The `Flyable` and `Quackable` interfaces sounded good at first.\nThere is a design principle:\n\n> Identify the aspects of your application that vary and separate them from what stays the same.\n\nAnother way to think about this principle: *take the parts that vary and encapsulate them, so that later you can alter\nor extend the parts that vary without affecting those that don't*.\n\nWe know that `fly` and `quack` are the parts of the Duck class that vary across ducks. We pull these methods out of the\nDuck class and create a new set of classes to represent each behaviour (FlyBehaviour, QuackBehaviour, ...). That way,\nthe Duck classes won't need to know any of the implementation details for their own behaviours.\n\n> Program to an interface, not an implementation. == Program to a supertype.\n\nProgramming to an implementation:\n\n```java\nDog d = new Dog();  // a concrete implementation of Animal\nd.bark();\n```\n\nProgramming to an interface/supertype:\n\n```java\nAnimal animal = new Dog();  // we know it is a Dog, but we can now use the animal reference polymorphically\nanimal.makeSound();\n```\n\n*Using our new design, what would you do if you needed to add rocket-powered flying to the SimUDuck app?*\n\n- My answer: Add a new implementation of the `FlyBehaviour`\n\n*Can you think of a class that might want to use the Quack behaviour that isn't a duck?*\n\n- My answer: Russian quacking machine\n\nA Duck will now delegate its flying and quacking behaviours, instead of using quacking and flying methods defined in the\nDuck class. To change a duck's behaviour at runtime, just call the duck's setter method for that behaviour.\n\nDesign principle:\n\n> Favour composition over inheritance.\n\nCreating systems using composition gives you a lot more flexibility. Not only does it let you encapsulate a family of\nalgorithms into their own set of classes, but it also lets you change behaviour at runtime. Composition is used in many\ndesign patterns, and you will see a lot more about its advantages and disadvantages throughout the book.\n\n*A duck call is a device that hunters use to mimic the calls (quacks) of ducks. How would you implement your own duck\ncall that does not inherit from the Duck class?*\n\n- My answer: Compose a duck call of `QuackBehaviour`.\n\nI have just applied the **STRATEGY** pattern. **The Strategy Pattern** - defines a family of algorithms, encapsulates\neach one, and makes them interchangeable. Strategy lets the algorithm vary independently of clients that use it.\n\n
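A minimal Python sketch of the idea (the behaviour class names here are illustrative, not necessarily those used in the repo's `ch_01_strategy.py`):\n\n```python\nclass FlyBehaviour:\n    def fly(self) -> None:\n        raise NotImplementedError\n\n\nclass FlyWithWings(FlyBehaviour):\n    def fly(self) -> None:\n        print(\"I am flying!\")\n\n\nclass FlyNoWay(FlyBehaviour):\n    def fly(self) -> None:\n        print(\"I cannot fly\")\n\n\nclass Duck:\n    def __init__(self, fly_behaviour: FlyBehaviour):\n        self.fly_behaviour = fly_behaviour  # composition: the algorithm is a swappable object\n\n    def perform_fly(self) -> None:\n        self.fly_behaviour.fly()  # delegate to the encapsulated behaviour\n\n\nmallard = Duck(FlyWithWings())\nmallard.perform_fly()\nmallard.fly_behaviour = FlyNoWay()  # change the behaviour at runtime\nmallard.perform_fly()\n```\n\n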
Design puzzle:\n\n- *KnifeBehaviour, BowAndArrowBehaviour, AxeBehaviour, SwordBehaviour* IMPLEMENT *WeaponBehaviour*\n- *Troll, Queen, King, Knight* EXTENDS *Character*\n- *Character* HAS-A *WeaponBehaviour*\n- `setWeapon` should be in *Character* class\n\nDesign Patterns give you a shared vocabulary with other developers. Once you have got the vocabulary, you can more\neasily communicate with other developers and inspire those who don't know patterns to start learning them. It also\nelevates your thinking about architectures by letting you think at the pattern level, not the nitty-gritty object level.\n\nThe power of a shared pattern vocabulary:\n\n- Shared pattern vocabularies are POWERFUL. When you communicate with another developer using patterns, you are\n  communicating not just a pattern name but a whole set of qualities, characteristics and constraints that the pattern\n  represents.\n- Patterns allow you to say more with less. Other developers can quickly know precisely the design you have in mind.\n- Talking at the pattern level allows you to stay *in the design* longer, without having to dive deep down to the\n  nitty-gritty details of implementing objects and classes.\n- Shared vocabularies can turbo-charge your team. A team well versed in design patterns can move quickly with less room\n  for misunderstanding.\n- Shared vocabularies encourage more junior developers to get up to speed.\n\nDesign patterns don't go directly into your code, they first go into your **brain**. Once you have loaded your brain\nwith a good working knowledge of patterns, you can then start to apply them to new designs, and rework your old code\nwhen you find it is degrading into an inflexible mess.\n\nOO Basics: Abstraction, Encapsulation, Polymorphism, Inheritance\n\nOO Principles: Encapsulate what varies. Favour composition over inheritance. Program to interfaces, not implementations.\n\nBullet points:\n\n- Knowing the OO basics does not make you a good OO designer.\n- Good OO designs are reusable, extensible and maintainable.\n- Patterns show you how to build systems with good OO design qualities.\n- Patterns are proven OO experience.\n- Patterns don't give you code, they give you general solutions to design problems. You apply them to your specific\n  application.\n- Patterns aren't invented, they are discovered.\n- Most patterns and principles address issues of change in software.\n- Most patterns allow some part of a system to vary independently of all other parts.\n- We often try to take what varies in a system and encapsulate it.\n- Patterns provide language that can maximise the value of your communication with other developers.\n\n## Chapter 2: Keeping your Objects in the Know\n\n[The Observer Pattern - Pattern implementation in Python](https://github.com/pkardas/learning/blob/master/books/head-first-design-patterns/ch_02_observer.py)\n\nObserver Pattern: Pattern that keeps your objects in the know when something they care about happens.\n\nWeather-O-Rama: our task is to create an app that uses the WeatherData object to update 3 displays for current\nconditions, weather stats and a forecast.\n\n*Based on our first implementation, which of the following apply?*\n\n- My answers: [A] We are coding to concrete implementations, not interfaces. [B] For every new display we will need to\n  alter this code. [C] We have no way to add or remove display elements at runtime. [E] We haven't encapsulated the part\n  that changes.\n\nYou know how newspaper or magazine subscriptions work:\n\n1. A newspaper publisher goes into business and begins publishing newspapers.\n2. You subscribe to a particular publisher, and every time there is a new edition it gets delivered to you. As long as\n   you remain a subscriber, you get new newspapers.\n3. You unsubscribe when you don't want papers anymore, and they stop being delivered.\n
4. While the publisher remains in business, people, hotels, airlines and other businesses constantly subscribe and\n   unsubscribe to the newspaper.\n\n> Publishers + Subscribers = Observer Pattern\n\nThe Observer Pattern:\n\n> Defines a one-to-many dependency between objects so that when one object changes state, all of its dependents are\n> notified and updated automatically.\n\nThere are a few different ways to implement the Observer Pattern, but most revolve around a class design that includes\nSubject and Observer interfaces.\n\nBecause the subject is the sole owner of the data, the observers are dependent on the subject to update them when the\ndata changes. This leads to a cleaner OO design than allowing many objects to control the same data.\n\n**We say an object is tightly coupled to another object when it is too dependent on that object.** A loosely coupled\nobject doesn't know or care too much about the details of another object. By not knowing too much about other objects,\nwe can create designs that can handle change better. The Observer Pattern is a great example of loose coupling.\n\nThe ways the pattern achieves loose coupling:\n\n1. The only thing the subject knows about an observer is that it implements a certain interface.\n2. We can add new observers at any time.\n3. We never need to modify the subject to add new types of observers.\n4. We can reuse subjects or observers independently of each other.\n5. Changes to either the subject or an observer will not affect the other.\n\nDesign principle:\n\n> Strive for loosely coupled designs between objects that interact.\n\nLoosely coupled designs allow us to build flexible systems that can handle change because they minimise the\ninterdependency between objects.\n\n
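A minimal Python sketch of that loose coupling (`WeatherData` is the chapter's subject; the display class is illustrative):\n\n```python\nclass Observer:\n    def update(self, temperature: float) -> None:\n        raise NotImplementedError\n\n\nclass WeatherData:  # the Subject\n    def __init__(self):\n        self._observers: list[Observer] = []  # all it knows: they implement Observer\n\n    def register_observer(self, observer: Observer) -> None:\n        self._observers.append(observer)\n\n    def set_temperature(self, temperature: float) -> None:\n        for observer in self._observers:  # push the new state to every observer\n            observer.update(temperature)\n\n\nclass CurrentConditionsDisplay(Observer):\n    def update(self, temperature: float) -> None:\n        print(f\"Current temperature: {temperature}\")\n\n\nweather_data = WeatherData()\nweather_data.register_observer(CurrentConditionsDisplay())  # new observers at any time\nweather_data.set_temperature(26.6)\n```\n\n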
The Observer Pattern is one of the most common patterns in use, and you will find plenty of examples of the pattern\nbeing used in many libraries and frameworks (Swing, JavaBeans, Cocoa, ...). Listener == Observer Pattern.\n\nThe Observer Pattern can be used for sending \"notifications\" so that observers can pull the data on their own.\n\nBullet points:\n\n- The Observer Pattern defines a one-to-many relationship between objects.\n- Subjects update Observers using a common interface.\n- Observers of any concrete type can participate in the pattern as long as they implement the Observer interface.\n- Observers are loosely coupled in that the Subject knows nothing about them, other than that they implement the\n  Observer interface.\n- You can push or pull data from the Subject when using the pattern (pull is considered more correct).\n- Swing makes heavy use of the Observer Pattern, as do many GUI frameworks.\n- You will also find the pattern in many other places including RxJava, JavaBeans and RMI, as well as in other language\n  frameworks, like Cocoa, Swift and JavaScript events.\n- The Observer Pattern is related to the Publish / Subscribe Pattern, which is for more complex situations with multiple\n  Subjects and/or multiple message types.\n- The Observer Pattern is a commonly used pattern, and we will see it again when we learn about Model-View-Controller.\n\n*For each design principle, describe how the Observer Pattern makes use of the principle:*\n\n- Identify the aspects of your application that vary and separate them from what stays the same: Observers and data\n  vary.\n- Program to an interface, not an implementation: Subject and Observers are loosely coupled because what they know about\n  each other are the interfaces they implement.\n- Favour composition over inheritance: Subject holds a list of observers, observers hold a reference to the subject.\n\n## Chapter 3: Decorating Objects\n\n[The Decorator Pattern - Pattern implementation in Python](https://github.com/pkardas/learning/blob/master/books/head-first-design-patterns/ch_03_decorator.py)\n\nWe will re-examine the typical overuse of inheritance, and we will learn how to decorate classes at runtime using a form\nof object composition.\n\nStarbuzz has created a maintenance nightmare for themselves. They are violating \"*Identify the aspects of your\napplication that vary and separate them from what stays the same*\" and \"*Favour composition over inheritance*\".\n\nProblems with the suggested design:\n\n- My answer: What if a customer has a promo coupon, e.g. -20%? What if a condiment is not available?\n\nIf I can extend an object's behaviour through composition, then I can do this dynamically at runtime. When I inherit by\nsubclassing, that behaviour is set statically at compile time. By dynamically composing objects, I can add new\nfunctionality by writing new code, rather than altering existing code. Because I am not changing existing code, the\nchances of introducing bugs or causing unintended side effects in pre-existing code are much reduced. Code should be\nclosed to change, yet open to extension.\n\nDesign principle - one of the most important design principles:\n\n> Classes should be open for extension, but closed for modification.\n\nOPEN - if needs or requirements change, just go and make your own extensions. CLOSED - we spent a lot of time getting\nthis code correct and bug free, so we can't let you alter the existing code. It must remain closed to modification.\n\nOur goal is to allow classes to be easily extended to incorporate new behaviour without modifying existing code. Designs\nthat are resilient to change and flexible enough to take on new functionality to meet changing requirements. E.g. the\nObserver Pattern - we can add new Observers and extend the Subject at any time.\n\n
Many of the patterns give us time-tested designs that protect your code from being modified by supplying a means of\nextension.\n\nHow can I make every part of my design follow the Open-Closed Principle? Usually you can't. Making OO design flexible\nand open to extension without modifying existing code takes time and effort.\n\nApplying the Open-Closed principle EVERYWHERE is wasteful and unnecessary, and can lead to complex, hard-to-understand\ncode.\n\nThe Decorator Pattern:\n\n> Attaches additional responsibilities to an object dynamically. Decorators provide a flexible alternative to\n> subclassing for extending functionality.\n\nThe decorator adds its own behaviour before and / or after delegating to the object it decorates to do the rest of the\njob.\n\nJust because we are subclassing, it doesn't mean we use inheritance. Sometimes we are subclassing in order to have the\ncorrect type, not to inherit the behaviour. We can acquire new behaviour not by inheriting it from a superclass, but by\ncomposing objects together.\n\nDecorators are typically created using other patterns like Factory and Builder.\n\n`java.io` is largely based on Decorator. Java I/O also points out one of the downsides of the Decorator Pattern: designs\nusing this pattern often result in a large number of small classes that can be overwhelming to the developer trying to\nuse the Decorator-based API.\n\nBullet points:\n\n- Inheritance is one form of extension, but not necessarily the best way to achieve flexibility in our designs.\n- In our designs we should allow behaviour to be extended without the need to modify existing code.\n- Composition and delegation are often used to add new behaviours at runtime.\n- The Decorator Pattern is an alternative to subclassing for extending behaviour.\n- The Decorator Pattern involves a set of decorator classes that are used to wrap concrete components.\n- Decorator classes mirror the type of the components they decorate (in fact they are the same type as the components\n  they decorate, either through inheritance or interface implementation).\n- Decorators change the behaviour of their components by adding new functionality before and / or after method calls to\n  the component.\n- You can wrap a component with any number of decorators.\n- Decorators are typically transparent to the client of the component - that is, unless the client is relying on the\n  component's concrete type.\n- Decorators can result in many small objects in our design, and overuse can be complex.\n\n
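A minimal Python sketch of stacking decorators, in the spirit of the Starbuzz example (class names are illustrative, not necessarily those of `ch_03_decorator.py`):\n\n```python\nclass Beverage:\n    def cost(self) -> float:\n        raise NotImplementedError\n\n\nclass Espresso(Beverage):\n    def cost(self) -> float:\n        return 1.99\n\n\nclass CondimentDecorator(Beverage):\n    def __init__(self, beverage: Beverage):\n        self._beverage = beverage  # the wrapped component\n\n\nclass Mocha(CondimentDecorator):\n    def cost(self) -> float:\n        return 0.20 + self._beverage.cost()  # add own behaviour, then delegate\n\n\ndrink = Mocha(Mocha(Espresso()))  # a component wrapped in any number of decorators\nprint(round(drink.cost(), 2))  # 2.39\n```\n\n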
## Chapter 4: Baking with OO Goodness\n\n[The Factory Pattern - Pattern implementation in Python](https://github.com/pkardas/learning/blob/master/books/head-first-design-patterns/ch_04_factory.py)\n\nThere is more to making objects than just using the *new* operator. We will learn that instantiation is an activity that\nshouldn't always be done in public and can often lead to coupling problems. The Factory Pattern can save us from\nembarrassing dependencies.\n\nWe should not program to an implementation, but every time we use *new* that is exactly what we do. The *new* operator\ninstantiates a concrete class, so that is definitely an implementation, not an interface.\n\nCHANGE impacts our use of *new*. Code will have to be changed as new concrete classes are added.\n\n*How might you take all the parts of your application that instantiate concrete classes and separate or encapsulate them\nfrom the rest of your application?*\n\n- My answer: I would add a function returning instantiated classes.\n\nIndeed, we can encapsulate object creation, we can take the creation code and move it into another object that is only\ngoing to be concerned with creating pizzas. Anytime it needs a pizza, it asks the pizza factory to make one. By\nencapsulating object creation in one class, we have only one place to make modifications when the implementation\nchanges. A simple object factory can be a static function; however, it has the disadvantage that we cannot subclass and\nchange the behaviour of the create method.\n\nThe Simple Factory isn't actually a Design Pattern, it is more of a programming idiom. Some developers do mistake this\nidiom for the Factory Pattern.\n\nA *factory method* handles object creation and encapsulates it in a subclass. This decouples the client code (\ne.g. `orderPizza`) in the superclass from the object creation code in the subclass.\n\n```java\npublic abstract class PizzaStore {\n  public Pizza orderPizza(String type) {\n    Pizza pizza;\n\n    pizza = createPizza(type);\n\n    pizza.prepare();\n    pizza.bake();\n    pizza.cut();\n    pizza.box();\n\n    return pizza;\n  }\n\n  protected abstract Pizza createPizza(String type);\n}\n```\n\nAll factory patterns encapsulate object creation. The Factory Method Pattern encapsulates object creation by letting\nsubclasses decide what objects to create. For every concrete Creator, there is typically a whole set of products that it\ncreates. Chicago pizza creators create different types of Chicago-style pizza, New York pizza creators create different\ntypes of New York-style pizza, and so on.\n\nThe Factory Method Pattern:\n\n> Defines an interface for creating an object, but lets subclasses decide which class to instantiate. Factory Method\n> lets a class defer instantiation to subclasses.\n\nThe Creator is written to operate on products produced by the factory method. The Creator class is written without\nknowledge of the actual products that will be created. Only subclasses actually implement the factory method and create\nproducts.\n\nWhen you directly instantiate an object, you are depending on its concrete class. Reducing dependencies to concrete\nclasses in our code is a \"good thing\". General Principle - Dependency Inversion Principle:\n\n> Depend upon abstractions. Do not depend upon concrete classes.\n\nIt suggests that our high-level components should not depend on our low-level components; rather, they should both\ndepend on abstractions.\n\nThe \"inversion\" in the name Dependency Inversion Principle is there because it inverts the way you typically might\nthink about your OO design. Low-level components now depend on a higher-level abstraction.\n\n
Guidelines that can help to avoid OO designs that violate the Dependency Inversion Principle:\n\n- No variable should hold a reference to a concrete class (if you use new, you will be holding a reference, use a factory\n  instead)\n- No class should derive from a concrete class (if you derive, you depend, derive from an abstraction)\n- No method should override an implemented method of its base classes (if you override an implemented method, your base\n  wasn't really an abstraction to start with)\n\nThis is a guideline you should strive for, rather than a rule you should follow all the time. Clearly, every single Java\nprogram ever written violates these guidelines. But if you internalise these guidelines and have them in the back of\nyour mind when you design, you will know when you are violating the principle, and you will have a good reason for doing\nso.\n\nAn Abstract Factory gives us an interface for creating a family of products. By writing code that uses this interface,\nwe decouple our code from the actual factory that creates the products. That allows us to implement a variety of\nfactories that produce products meant for different contexts - such as different regions, operating systems or different\nlook and feels. Because the code is decoupled from the actual products, we can substitute different factories to get\ndifferent behaviours.\n\nThe Abstract Factory Pattern:\n\n> Provides an interface for creating families of related or dependent objects without specifying their concrete classes.\n\nOften the methods of an Abstract Factory are implemented as factory methods.\n\nThe Factory Method and The Abstract Factory are both good at decoupling applications from specific implementations.\n\n- Use Abstract Factory whenever you have families of products you need to create, and you need to make sure your clients\n  create products that belong together. Abstract Factory creates objects through object composition.\n- Use Factory Methods to decouple client code from the concrete classes you need to instantiate, or if you don't know\n  ahead of time all the concrete classes you are going to need. Factory Method creates objects through inheritance.\n\nBullet points:\n\n- All factories encapsulate object creation.\n- Simple Factory, while not a bona fide design pattern, is a simple way to decouple your clients from concrete classes.\n- Factory Method relies on inheritance: object creation is delegated to subclasses, which implement the factory method\n  to create objects.\n- Abstract Factory relies on object composition: object creation is implemented in methods exposed in the factory\n  interface.\n- All factory patterns promote loose coupling by reducing the dependency of your application on concrete classes.\n- The intent of Factory Method is to allow a class to defer instantiation to its subclasses.\n- The intent of Abstract Factory is to create families of related objects without having to depend on their concrete\n  classes.\n- The Dependency Inversion Principle guides us to avoid dependencies on concrete types and to strive for abstractions.\n- Factories are a powerful technique for coding to abstractions, not concrete classes.\n\n
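A minimal Python rendering of the factory method shown in the Java snippet above (the concrete store and pizza classes are illustrative):\n\n```python\nclass Pizza:\n    def prepare(self) -> None:\n        print(f\"Preparing {type(self).__name__}\")\n\n\nclass CheesePizza(Pizza):\n    pass\n\n\nclass VeggiePizza(Pizza):\n    pass\n\n\nclass PizzaStore:\n    def order_pizza(self, kind: str) -> Pizza:\n        pizza = self._create_pizza(kind)  # instantiation is deferred to the subclass\n        pizza.prepare()\n        return pizza\n\n    def _create_pizza(self, kind: str) -> Pizza:  # the factory method\n        raise NotImplementedError\n\n\nclass NYPizzaStore(PizzaStore):\n    def _create_pizza(self, kind: str) -> Pizza:\n        return {\"cheese\": CheesePizza, \"veggie\": VeggiePizza}[kind]()\n\n\nNYPizzaStore().order_pizza(\"cheese\")\n```\n\n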
## Chapter 5: One-of-a-kind Objects\n\n[The Singleton Pattern - Pattern implementation in Python](https://github.com/pkardas/learning/blob/master/books/head-first-design-patterns/ch_05_singleton.py)\n\nThe ticket to creating one-of-a-kind objects for which there is only one instance, ever. By using a singleton you can\nensure that every object in your application is making use of the same global resource. Often used to manage pools of\nresources, like connection or thread pools.\n\n_How might things go wrong if more than one instance of ChocolateBoiler is created in an application?_\n\n- My answer: Incorrect state management, because of multiple instances.\n\nThe Singleton Pattern:\n\n> Ensures a class has only one instance, and provides a global point of access to it.\n\nDespite using the Singleton Pattern, a multithreaded application can still cause problems - multiple objects can be\ninstantiated. In Java, the solution for this is to use the `synchronized` keyword.\n\n```java\npublic static synchronized Singleton getInstance() {\n  ...\n}\n```\n\n`synchronized` - forces every thread to wait for its turn before it can enter the method. That is, no 2 threads may\nenter the method at the same time. Synchronization may be expensive, but here it only matters once, for the\n`uniqueInstance` initialization. After the first time, synchronization is totally unneeded overhead. There are\nJava-specific solutions to this overhead (e.g. double-checked locking).\n\nThe Singleton Pattern violates \"_the loose coupling principle_\": if you make a change to the Singleton, you will likely\nhave to make a change to every object connected to it.\n\nA global variable can provide global access, but not ensure only one instance. Global variables also tend to encourage\ndevelopers to pollute the namespace with lots of global references to small objects. Singletons don't encourage this in\nthe same way, but can be abused nonetheless.\n\nIt is possible to implement Singleton as an enum.\n\nBullet points:\n\n- The Singleton Pattern ensures you have at most one instance of a class in your application.\n- The Singleton Pattern also provides a global access point to that instance.\n- Java's implementation of the Singleton Pattern makes use of a private constructor, a static method combined with a\n  static variable.\n- Examine your performance and resource constraints and carefully choose an appropriate Singleton for multithreaded\n  applications (we should consider all applications multithreaded).\n\n
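A minimal, lazily initialised Python sketch (not thread-safe, like the naive Java version; the chapter's `ChocolateBoiler` name is reused for illustration):\n\n```python\nclass ChocolateBoiler:\n    _instance = None  # the single instance, created lazily\n\n    @classmethod\n    def get_instance(cls) -> \"ChocolateBoiler\":\n        if cls._instance is None:  # only the first call creates the instance\n            cls._instance = cls()\n        return cls._instance\n\n\nassert ChocolateBoiler.get_instance() is ChocolateBoiler.get_instance()\n```\n\nIn Python, a module-level object is often the more idiomatic way to get the same one-instance guarantee.\n\n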
## Chapter 6: Encapsulating Invocation\n\n[The Command Pattern - Pattern implementation in Python](https://github.com/pkardas/learning/blob/master/books/head-first-design-patterns/ch_06_command.py)\n\nIn this chapter we are going to encapsulate method invocation. By encapsulating method invocation, we can crystallize\npieces of computation so that the object invoking the computation doesn't need to worry about how to do things, it just\nuses our crystallized method to get it done.\n\nThe Command Pattern allows you to decouple the requester of an action from the object that actually performs the action.\nThis can be achieved by introducing command objects into the design. A command object encapsulates a request to do\nsomething on a specific object.\n\nExample with a waitress taking orders and passing them to a cook - separation of an object making a request from the\nobject that receives and executes requests.\n\n- Customer - Client\n- Order - Command\n- Waitress - Invoker\n- Short-Order Cook - Receiver\n- takeOrder - setCommand - sets what is supposed to be executed\n- orderUp - execute\n\nThe Command Pattern:\n\n> Encapsulates a request as an object, thereby letting you parametrize other objects with different requests,\n> queue or log requests, and support undoable operations.\n\nA null object is useful when you don't have a meaningful object to return, and yet you want to remove the responsibility\nof handling null from the client, e.g. `NoCommand` - a surrogate that does nothing when its execute method is called.\n\nThe Command Pattern can be taken to the next level by using e.g. Java's lambda expressions. Instead of instantiating the\nconcrete command objects, you can use function objects in their place. This can be done if the Command interface has one\nabstract method.\n\nIn order to support undoable Commands, the `Command` interface has to be extended with an `undo` method.\n\n`MacroCommand` can be used to execute multiple commands:\n\n```java\nMacroCommand partyOnMacro = new MacroCommand(new Command[]{lightOn, stereoOn, tvOn, hottubOn});\n```\n\nMore uses of the Command Pattern:\n\n- queueing requests - objects implementing the command interface are added to the queue, threads remove commands from\n  the queue one by one and call their `execute` method. Once complete, they go back for a new command object. This gives\n  us an effective way to limit computation to a fixed number of threads.\n- logging requests - semantics of some applications require that we log all actions and be able to recover after a crash\n  by re-invoking those actions. The Command Pattern can support these semantics with the addition of two\n  methods: `store` and `load`.\n\nBullet points:\n\n- The Command Pattern decouples an object making a request from the one that knows how to perform it.\n- A Command object is at the center of this decoupling and encapsulates a receiver with an action (or set of actions).\n- An invoker makes a request of a Command object by calling its execute method, which invokes these actions on the\n  receiver.\n- Invokers can be parametrized with Commands, even dynamically at runtime.\n- Commands may support undo by implementing an undo method that restores the object to its previous state before the\n  execute method was last called.\n- MacroCommands are a simple extension of the Command Pattern that allow multiple commands to be invoked. Likewise,\n  MacroCommands can easily support undo.\n- In practice, it is not uncommon for \"smart\" Command objects to implement the request themselves rather than delegating\n  to a receiver.\n- Commands may also be used to implement logging and transactional systems.\n\n
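A minimal Python sketch - in Python a bound method already works as a one-method command object, which mirrors the lambda remark above (names are illustrative):\n\n```python\nfrom typing import Callable\n\n\nclass Light:  # the Receiver\n    def on(self) -> None:\n        print(\"Light on\")\n\n\nclass RemoteControl:  # the Invoker\n    def __init__(self):\n        self._command: Callable[[], None] = lambda: None  # a 'NoCommand' null object\n\n    def set_command(self, command: Callable[[], None]) -> None:\n        self._command = command\n\n    def press_button(self) -> None:\n        self._command()  # the invoker knows nothing about the receiver\n\n\nremote = RemoteControl()\nremote.set_command(Light().on)  # a bound method as the command\nremote.press_button()\n```\n\n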
## Chapter 7: Being Adaptive\n\n[The Adapter Pattern - Pattern implementation in Python](https://github.com/pkardas/learning/blob/master/books/head-first-design-patterns/ch_07_adapter.py)\n\n[The Facade Pattern - Pattern implementation in Python](https://github.com/pkardas/learning/blob/master/books/head-first-design-patterns/ch_07_facade.py)\n\nWe are going to wrap some objects with a different purpose: to make their interfaces look like something they are not. So\nwe can adapt a design expecting one interface to a class that implements a different interface. Also, we are going to\nlook at another pattern that wraps objects to simplify their interface.\n\nYou will have no trouble understanding what an OO adapter is because the real world is full of them (e.g. a power adapter:\nthe British wall outlet exposes one interface for getting power, the adapter converts one interface into another, the US\nlaptop expects another interface).\n\nOO adapters play the same role as their real-world counterparts: they take an interface and adapt it to one that a\nclient is expecting. For example: you are going to use a new library, but the new vendor designed their interfaces\ndifferently than the last vendor.\n\nThe adapter acts as the middleman by receiving requests from the client and converting them into requests that make\nsense on the vendor classes.\n\n_If it walks like a duck and quacks like a duck, then it ~~must~~ might be a ~~duck~~ turkey wrapped with a duck\nadapter..._\n\n```java\npublic class TurkeyAdapter implements Duck {\n  // take Turkey in the constructor, implement Duck's methods by invoking Turkey's methods.\n}\n```\n\nHow the Client uses the Adapter:\n\n1. The client makes a request to the adapter by calling a method on it using the target interface.\n2. The adapter translates the request into one or more calls on the adaptee using the adaptee interface.\n3. The client receives the results of the call and never knows there is an adapter doing the translation.\n\nIt is possible to create a Two Way Adapter, just implement both interfaces involved, so the adapter can act as an old\ninterface or a new interface.\n\nThe Adapter Pattern:\n\n> Converts the interface of a class into another interface the client expects. Adapter lets classes work together that\n> couldn't otherwise because of incompatible interfaces.\n\nThe Adapter is used to decouple the client from the implemented interface, and if we expect the interface to change over\ntime, the adapter encapsulates that change so that the client doesn't have to be modified each time it needs to operate\nagainst a different interface.\n\nThe Adapter Pattern is full of good OO design principles: it uses object composition + binds the client to an interface,\nnot an implementation.\n\nThere is a second type of adapter - the class adapter, this one uses multiple inheritance (Target and Adaptee).\n\nReal-world adapters:\n\n- [Java] Enumerators - The Enumerator interface allows you to step through the elements of a collection without knowing\n  the specifics of how they are managed in the collection.\n- [Java] Iterators - The more recent Collection classes use an Iterator interface, allows you to iterate through a set\n  of items in a collection, and adds the ability to remove items.\n\nWhen a method in an adapter cannot be supported you can throw e.g. `UnsupportedOperationException`.\n\n_Some AC adapters do more than just change the interface - they add other features like surge protection, indicator\nlights, and other bells and whistles. If you were going to implement these kinds of features, what pattern would you\nuse?_\n\n- My answer: The Decorator Pattern\n\nDecorator vs Adapter:\n\n- Decorators allow new behavior to be added to classes without altering existing code.\n- Adapters always convert the interface of what they wrap.\n\nDecorators and Adapters seem to look somewhat similar on paper, but clearly are miles apart.\n\n
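The turkey-in-a-duck-adapter joke as a minimal Python sketch (illustrative, not necessarily the repo's `ch_07_adapter.py`):\n\n```python\nclass Duck:  # the Target interface the client expects\n    def quack(self) -> None:\n        raise NotImplementedError\n\n\nclass Turkey:  # the Adaptee with an incompatible interface\n    def gobble(self) -> None:\n        print(\"Gobble gobble\")\n\n\nclass TurkeyAdapter(Duck):\n    def __init__(self, turkey: Turkey):\n        self._turkey = turkey\n\n    def quack(self) -> None:\n        self._turkey.gobble()  # translate the Duck call into Turkey calls\n\n\nduck: Duck = TurkeyAdapter(Turkey())\nduck.quack()  # the client never knows an adapter is doing the translation\n```\n\n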
The Facade Pattern alters an interface, but in order to simplify the interface - it hides all the complexity of one or\nmore classes behind a clean well-lit facade. The Facade Pattern can take a complex subsystem and make it easier to use.\n\nExample home cinema system: instead of turning on the popcorn machine, screen and audio system - all you need to do is\ncall `watchMovie`.\n\nFacades don't encapsulate the subsystem classes, they merely provide a simplified interface to their functionality. The\nsubsystem classes still remain available. It provides a simplified interface while still exposing the full functionality\nof the system to those who may need it.\n\nA facade not only simplifies an interface, it decouples a client from a subsystem of components.\n\nFacades and adapters may wrap multiple classes, but a facade's intent is to simplify, while an adapter's is to convert\nthe interface to something different.\n\nThe Facade Pattern:\n\n> Provides a unified interface to a set of interfaces in a subsystem. Facade defines a higher-level interface that makes\n> the subsystem easier to use.\n\nDesign principle - Principle of Least Knowledge (The Law of Demeter):\n\n> Talk only to your immediate friends.\n\nThis principle guides us to reduce the interactions between objects to just a few close \"friends\". It means when you are\ndesigning a system, for any object, be careful of the number of classes it interacts with and also how it comes to\ninteract with those classes.\n\nThis principle prevents us from creating designs that have a large number of classes coupled together so that changes in\none part of the system cascade to other parts.\n\nThis means, invoke only methods that belong to:\n\n- the object itself\n- objects passed in as a parameter to the method\n- any object the method creates or instantiates\n- any components of the object\n\n_Side note: Principle of Least Knowledge is a better name than The Law of Demeter, because no principle is a law, and\nthey don't always have to be applied._\n\nThe Facade Pattern and the Principle of Least Knowledge - we try to keep subsystems adhering to the Principle of Least\nKnowledge as well. If this gets too complex and too many friends are intermingling, we can introduce additional facades\nto form layers of subsystems.\n\nBullet points:\n\n- When you need to use an existing class and its interface is not the one you need, use an adapter.\n- When you need to simplify and unify a large interface or complex set of interfaces, use a facade.\n- An adapter changes an interface into one a client expects.\n- A facade decouples a client from a complex subsystem.\n- Implementing an adapter may require little work or a great deal of work depending on the size and complexity of the\n  target interface.\n- Implementing a facade requires that we compose the facade with its subsystem and use delegation to perform the work of\n  the facade.\n
- There are two forms of the Adapter pattern: object and class adapters. Class adapters require multiple inheritance.\n- You can implement more than one facade for a subsystem.\n- An adapter wraps an object to add new behaviours and responsibilities, and a facade \"wraps\" a set of objects to\n  simplify.\n\n## Chapter 8: Encapsulating Algorithms\n\n[The Template Method Pattern - Pattern implementation in Python](https://github.com/pkardas/learning/blob/master/books/head-first-design-patterns/ch_08_template_method.py)\n\nWe are going to get down to encapsulating pieces of algorithms so that subclasses can hook themselves right into a\ncomputation any time they want.\n\nWe can generalize the recipe and place it in a base class.\n\n```java\npublic abstract class CaffeineBeverage {\n  final void prepareRecipe() {\n    // Our template method - it serves as a template for an algorithm.\n    boilWater();\n    brew();\n    pourInCup();\n    addCondiments();\n  }\n\n  abstract void brew();\n  abstract void addCondiments();\n\n  void boilWater() {}\n  void pourInCup() {}\n}\n```\n\nThe Template Method defines the steps of an algorithm and allows subclasses to provide the implementation for one or\nmore steps.\n\nThe Template Method Pattern:\n\n> Defines the skeleton of an algorithm in a method, deferring some steps to subclasses. Template Method lets subclasses\n> redefine certain steps of an algorithm without changing the algorithm's structure.\n\nThis pattern is all about creating a template for an algorithm. A template is just a method that defines an algorithm\nas a set of steps. One or more of these steps is defined to be abstract and implemented by a subclass. This\nensures the algorithm's structure stays unchanged.\n\nWe can also have concrete methods that do nothing by default - we call them `hooks`. Subclasses are free to override\nthese but don't have to.\n\nUse abstract methods when the subclass MUST provide an implementation of the method. Use hooks when that part of the\nalgorithm is optional.\n\n
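A small Python sketch of a hook (illustrative; it trims the repo's `ch_08_template_method.py` and adds a hypothetical `_customer_wants_condiments` hook):\n\n```python\nclass CaffeineBeverage:\n    def prepare_recipe(self) -> None:  # the template method\n        self._brew()\n        if self._customer_wants_condiments():  # the hook guards an optional step\n            self._add_condiments()\n\n    def _brew(self) -> None:\n        raise NotImplementedError  # abstract: subclasses MUST implement\n\n    def _add_condiments(self) -> None:\n        raise NotImplementedError\n\n    def _customer_wants_condiments(self) -> bool:\n        return True  # hook: default behaviour, subclasses MAY override\n\n\nclass Coffee(CaffeineBeverage):\n    def _brew(self) -> None:\n        print(\"Dripping coffee through filter\")\n\n    def _add_condiments(self) -> None:\n        print(\"Adding sugar and milk\")\n\n    def _customer_wants_condiments(self) -> bool:\n        return False  # this subclass opts out via the hook\n\n\nCoffee().prepare_recipe()\n```\n\n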
The Hollywood Principle:\n\n> Don't call us, we'll call you.\n\nThe Hollywood Principle gives us a way to prevent _dependency rot_. We allow low-level components to hook themselves\ninto a system, but the high-level components determine when they are needed, and how. In other words, the high-level\ncomponents give the low-level components the \"don't call us, we'll call you\" treatment.\n\nPatterns using The Hollywood Principle:\n\n- The Template Method Pattern\n- The Observer Pattern\n- The Strategy Pattern\n- The Factory Pattern\n\nThe Dependency Inversion Principle teaches us to avoid the use of concrete classes and instead work as much as possible\nwith abstractions. The Hollywood Principle is a technique for building frameworks or components so that lower-level\ncomponents can be hooked into the computation, but without creating dependencies between lower and higher level\ncomponents.\n\nThis pattern is a great design tool for creating frameworks, where the framework controls how something gets done, but\nleaves you to specify your own details about what is actually happening at each step of the framework's algorithm.\n\n`sort` methods are in the spirit of The Template Method Pattern - the developer has to define the `compare` method.\n\nTemplate Method vs Strategy:\n\n- Strategy defines a family of algorithms and makes them interchangeable.\n- Template Method defines the outline of an algorithm, and lets subclasses do some of the work.\n- Strategy uses object composition.\n- Template Method uses inheritance.\n\nBullet points:\n\n- A template method defines the steps of an algorithm, deferring to subclasses for the implementation of those steps.\n- The Template Method Pattern gives us an important technique for code reuse.\n- The template method's abstract class may define concrete methods, abstract methods and hooks.\n- Abstract methods are implemented by subclasses.\n- Hooks are methods that do nothing or provide default behavior in the abstract class, but may be overridden in the\n  subclass.\n- To prevent subclasses from changing the algorithm in the template method, declare the template method as final.\n- The Hollywood Principle guides us to put decision making in high-level modules that can decide how and when to call\n  low-level modules.\n- You will see lots of uses of the Template Method Pattern in real-world code, but (as with any pattern) don't expect it\n  all to be designed \"by the book\".\n- The Strategy and Template Method Patterns both encapsulate algorithms, the first by composition and the other by\n  inheritance.\n- Factory Method is a specialisation of Template Method.\n\n## Chapter 9: Well-Managed Collections\n\n[The Iterator Pattern - Pattern implementation in Python](https://github.com/pkardas/learning/blob/master/books/head-first-design-patterns/ch_09_iterator.py)\n\n[The Composite Pattern - Pattern implementation in Python](https://github.com/pkardas/learning/blob/master/books/head-first-design-patterns/ch_09_composite.py)\n\nIn this chapter we are going to see how we can allow our clients to iterate through objects without ever getting a peek\nat how we store the objects.\n\nIterator - encapsulates the way we iterate through a collection of objects. The Iterator Pattern relies on an interface\ncalled Iterator.\n\nHowever, in Java the following interface does not have to be defined, because Java has a built-in Iterator interface.\n\n```java\npublic interface Iterator {\n  boolean hasNext();\n  MenuItem next();\n}\n```\n\nOnce we have this interface, we can implement Iterators for any kind of collection of objects: arrays, lists, hash\nmaps...\n\nThe Iterator Pattern:\n\n> Provides a way to access the elements of an aggregate object sequentially without exposing its underlying\n> representation.\n\nThe effect of using iterators in the design: once you have a uniform way of accessing the elements of all your aggregate\nobjects, you can write polymorphic code that works with any of these aggregates.\n\n
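In Python this uniformity comes almost for free once `__iter__` is implemented; a minimal sketch (illustrative, much simpler than the repo's `ch_09_iterator.py`):\n\n```python\nfrom typing import Iterable\n\n\nclass DinnerMenu:\n    _items = [\"BLT\", \"Soup of the day\"]\n\n    def __iter__(self):\n        yield from self._items  # a generator serves as the iterator object\n\n\nclass BreakfastMenu:\n    _items = {\"Pancakes\": 2.99, \"Waffles\": 3.49}\n\n    def __iter__(self):\n        yield from self._items  # iterates over the dict's keys\n\n\ndef print_menu(menu: Iterable[str]) -> None:\n    for item in menu:  # polymorphic: any aggregate that supports iteration\n        print(item)\n\n\nprint_menu(DinnerMenu())\nprint_menu(BreakfastMenu())\n```\n\n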
The other important impact on the design is that the Iterator Pattern takes the responsibility of traversing elements\nand gives that responsibility to the iterator object, not the aggregate object. This not only keeps\nthe aggregate interface and implementation simpler, it removes the responsibility for iteration from the aggregate and\nkeeps the aggregate focused on the things it should be focused on (managing a collection of objects), not on iteration.\n\nThe Single Responsibility Principle:\n\n> A class should have only one reason to change.\n\nWe want to avoid change in our classes because modifying code provides all sorts of opportunities for problems to creep\nin. Having two ways to change increases the probability the class will change in the future, and when it does, it's\ngoing to affect two aspects of your design.\n\n**Cohesion** - is a measure of how closely a class or module supports a single purpose or responsibility.\n\n- High cohesion - designed around a set of related functions (easy to maintain, single responsibility)\n- Low cohesion - designed around a set of unrelated functions (difficult to maintain, multiple responsibilities)\n\nThere comes a time when we must refactor our code in order for it to grow. To not do so would leave us with rigid,\ninflexible code that has no hope of ever sprouting new life.\n\nThe Composite Pattern:\n\n> Allows you to compose objects into tree structures to represent part-whole hierarchies. Composite lets clients treat\n> individual objects and compositions of objects uniformly.\n\nPart-whole hierarchy - a tree of objects that is made of parts (e.g. menus and menu items).\n\nUsing a composite structure, we can apply the same operations over both composites and individual objects. In other\nwords, in most cases we can ignore differences between compositions of objects and individual objects.\n\nA composite contains components. Components come in two flavors: composites and leaf elements. A composite holds a set\nof children: those children may be other composites or leaf elements.\n\nThe Composite Pattern takes the Single Responsibility Principle and trades it for transparency - by allowing the\nComponent interface to contain the child management operations and the leaf operations, a client can treat both\ncomposites and leaves uniformly.\n\nWe are guided by design principles, but we always need to observe the effect they have on our designs.\n\nBullet points:\n\n- An Iterator allows access to an aggregate's elements without exposing its internal structure.\n- An Iterator takes the job of iterating over an aggregate and encapsulates it in another object.\n- When using an Iterator, we relieve the aggregate of the responsibility of supporting operations for traversing its\n  data.\n- An Iterator provides a common interface for traversing the items of an aggregate, allowing you to use polymorphism\n  when writing code that makes use of the items of the aggregate.\n- The Iterable interface provides a means of getting an iterator and enables Java's enhanced for loop (for-each).\n- We should strive to assign only one responsibility to each class.\n- The Composite Pattern allows clients to treat composites and individual objects uniformly.\n- A Component is any object in a Composite structure. Components may be other composites or leaves.\n- There are many design tradeoffs in implementing Composite. You need to balance transparency and safety with your\n  needs.\n\n## Chapter 10: The State of Things\n\n[The State Pattern - Pattern implementation in Python](https://github.com/pkardas/learning/blob/master/books/head-first-design-patterns/ch_10_state.py)\n\n
The Strategy and State Patterns are twins separated at birth. The Strategy Pattern went on to create a wildly successful\nbusiness around interchangeable algorithms, while State took the perhaps more noble path by helping objects to control\ntheir behavior by changing their internal state. As different as their paths became, however, underneath you will find\nalmost precisely the same design.\n\nThe State Pattern:\n\n> Allows an object to alter its behavior when its internal state changes. The object will appear to change its class.\n\nThe pattern encapsulates state into separate classes and delegates to the object representing the current state. What\ndoes it mean for an object to \"appear to change its class\"? If an object you are using can completely change its\nbehavior, then it appears to you that the object is actually instantiated from another class. In reality, however, you\nknow that we are using composition to give the appearance of a class change by simply referencing different state\nobjects.\n\nThink of the Strategy Pattern as a flexible alternative to subclassing - if you use inheritance to define the behavior\nof a class, then you are stuck with that behavior even if you need to change it. With Strategy, you can change the\nbehavior by composing with a different object.\n\nThink of the State Pattern as an alternative to putting lots of conditionals in your context - by encapsulating the\nbehaviors within state objects, you can simply change the state object in context to change its behavior.\n\nBullet points:\n\n- The State Pattern allows an object to have many behaviors that are based on its internal state.\n- Unlike a procedural state machine, the State Pattern represents each state as a full-blown class.\n- The Context gets its behavior by delegating to the current state object it is composed with.\n- By encapsulating each state into a class, we localize any changes that will need to be made.\n- The State and Strategy Patterns have the same class diagram, but they differ in intent.\n- The Strategy Pattern typically configures Context classes with a behavior or algorithm.\n- The State Pattern allows a Context to change its behavior as the state of the Context changes.\n- State transitions can be controlled by the State classes or by the Context classes.\n- Using the State Pattern will typically result in a greater number of classes in your design.\n- State classes may be shared among Context instances.\n\n## Chapter 11: Controlling Object Access\n\n[The Virtual Proxy Pattern - Pattern implementation in Python](https://github.com/pkardas/learning/blob/master/books/head-first-design-patterns/ch_11_virtual_proxy.py)\n\nProxies control and manage access. Proxies have been known to haul entire method calls over the internet for their\nproxied objects - they have also been known to patiently stand in for some pretty lazy objects.\n\nA proxy pretends it is the real object, but it is really communicating over the net to the real object. A remote proxy\nacts as a local representative to a remote object. A remote object is an object that lives in the heap of a different JVM.\nA local representative is an object that you call local methods on and have them forwarded on to the remote object.\n\nRMI builds the client and the service helper objects. The nice thing about RMI is that you don't have to write any of\nthe networking or I/O code yourself. Networking and I/O methods are risky and can fail. The client does have to\nacknowledge the risk.\n\n
\n## Chapter 11: Controlling Object Access\n\n[The Virtual Proxy Pattern - Pattern implementation in Python](https://github.com/pkardas/learning/blob/master/books/head-first-design-patterns/ch_11_virtual_proxy.py)\n\nProxies control and manage access. Proxies have been known to haul entire method calls over the internet for their\nproxied objects - they have also been known to patiently stand in for some pretty lazy objects.\n\nProxy pretends it is the real object, but it is really communicating over the net to the real object. A remote proxy\nacts as a local representative to a remote object. Remote object is an object that lives in the heap of a different JVM.\nLocal representative - it is an object that you call local methods on and have them forwarded on to the remote object.\n\nRMI builds the client and the service helper objects. The nice thing about RMI is that you don't have to write any of\nthe networking or I/O code yourself. Networking and I/O methods are risky and can fail. The client does have to\nacknowledge the risk.\n\nRMI nomenclature: client helper is a \"stub\" and the service helper is a \"skeleton\".\n\nThe Proxy Pattern:\n\n> Provides a surrogate or placeholder for another object to control access to it.\n\nUse the Proxy Pattern to create a representative object that controls access to another object, which may be remote,\nexpensive to create, or in need of securing.\n\nThe Proxy Pattern can manifest itself in many forms, e.g. the Virtual Proxy.\n\nVirtual Proxy - acts as a representative for an object that may be expensive to create. The Virtual Proxy often defers\nthe creation of the object until it is needed. The Virtual Proxy also acts as a surrogate for the object before and\nwhile it is being created. After that, the proxy delegates requests to the RealSubject.\n\nImageProxy for application displaying images:\n\n1. ImageProxy first creates an ImageIcon and starts loading it from a network URL.\n2. While the bytes of the image are being retrieved, ImageProxy displays \"Loading album cover, please wait...\"\n3. When the image is fully loaded, ImageProxy delegates all method calls to the image icon\n4. If the user requests a new image, we will create a new proxy and start the process over.\n\nThere are a lot of variants of the Proxy Pattern in the real world. What they all have in common is that they intercept\na method invocation that the client is making to the subject. This level of indirection allows us to do many things,\nincluding dispatching requests to a remote subject, providing a representative for an expensive object as it is created\nor providing some level of protection that can determine which clients should be calling which methods.\n\nProtection Proxy - a proxy that controls access to an object based on access rights. For example: `Employee` object - a\nProtection Proxy might allow the employee to call certain methods on the object, a manager to call additional methods (\nlike `setSalary`), and an HR employee to call any method on the object.\n\nAdditional proxies:\n\n- Firewall Proxy - controls access to a set of network resources, protecting the subject from \"bad\" clients.\n- Smart Reference Proxy - provides additional actions whenever a subject is referenced, such as counting the number of\n  references to an object.\n- Caching Proxy - provides temporary storage for results of operations that are expensive. It can also allow multiple\n  clients to share the results to reduce computation or network latency.\n- Synchronization Proxy - provides safe access to a subject from multiple threads.\n- Complexity Hiding Proxy - hides the complexity of and controls access to a complex set of classes. This is sometimes\n  called the Facade Proxy for obvious reasons. The Complexity Hiding Proxy differs from the Facade Pattern in that the\n  proxy controls access, while the Facade Pattern just provides an alternative interface.\n- Copy-On-Write Proxy - controls the copying of an object by deferring the copying of an object until it is required by\n  a client. This is a variant of the Virtual Proxy.\n\nBullet points:\n\n- The Proxy Pattern provides a representative for another object in order to control the client's access to it.\n- A Remote Proxy manages interaction between a client and a remote object.\n- A Virtual Proxy controls access to an object that is expensive to instantiate.\n- A Protection Proxy controls access to the methods of an object based on the caller.\n- Many other variants of the Proxy Pattern exist, including caching proxies, firewall proxies, copy-on-write proxies,\n  and so on.\n- Proxy is structurally similar to Decorator, but the two patterns differ in their purpose.\n- The Decorator Pattern adds behavior to an object, while Proxy controls access.\n- Java's built-in support for Proxy can build a dynamic proxy class on demand and dispatch all calls on it to a handler\n  of your choosing.\n- Like any wrapper, proxies will increase the number of classes and objects in your designs.\n
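\nA minimal Virtual Proxy sketch in Python following the ImageProxy steps above (the `RealImage`/`ImageProxy` names are illustrative, not the linked implementation):\n\n```python\nclass RealImage:\n    def __init__(self, url):\n        print(f\"Fetching {url}...\")  # the expensive part: network retrieval\n        self.url = url\n\n    def display(self):\n        print(f\"Displaying {self.url}\")\n\n\nclass ImageProxy:  # same interface as RealImage, defers the expensive creation\n    def __init__(self, url):\n        self.url = url\n        self._image = None\n\n    def display(self):\n        if self._image is None:  # surrogate behaviour until the image exists\n            print(\"Loading album cover, please wait...\")\n            self._image = RealImage(self.url)\n        self._image.display()  # delegate to the RealSubject\n\n\ncover = ImageProxy(\"https://example.com/cover.jpg\")\ncover.display()  # first call triggers the load\ncover.display()  # subsequent calls delegate directly\n```\n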
\n## Chapter 12: Patterns of Patterns\n\nSome of the most powerful OO designs use several patterns together. Compound patterns - a set of patterns that work\ntogether in a design that can be applied over many problems.\n\nPatterns are often used together and combined within the same design solution. A compound pattern combines two or more\npatterns into a solution that solves a recurring or general problem.\n\nIt is possible to rework Duck Simulator from the first chapter using 6 patterns. In fact, you never actually want to\napproach a design like this. You only want to apply patterns when and where they make sense. **You never want to start\nout with the intention of using patterns just for the sake of it**.\n\nMVC - it is just a few patterns put together. Music players underneath use MVC.\n\nView - gives you a presentation of the model. The view usually gets the state and data it needs to display directly from\nthe model.\n\nController - takes user input and figures out what it means to the model.\n\nModel - the model holds all the data, state and application logic. The model is oblivious to the view and controller,\nalthough it provides an interface to manipulate and retrieve its state, and it can send notifications of state changes\nto observers.\n\nYou are the user - you interact with the view. When you do something to the view, then the view tells the controller\nwhat you did. It is the controller's job to handle that. The controller asks the model to change its state. If you click a\nbutton it is the controller's job to figure out what that means and how the model should be manipulated based on that\naction. The controller may also ask the view to change. The model notifies the view when its state has changed. The view\nasks the model for state.\n\nMVC is made of:\n\n- Strategy - The view and controller implement the classic Strategy Pattern - the view is configured with a strategy,\n  the controller provides the strategy.\n- Composite - the display consists of a nested set of windows, panels, buttons, text labels and so on. Each display is a\n  composite (like a window) or a leaf (like a button). When the controller tells the view to update, it only has to tell\n  the top view component, and Composite takes care of the rest.\n- Observer - The model implements the Observer Pattern to keep interested objects updated when state changes occur.\n
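\nA toy Python sketch of how the patterns compose in MVC (all names are illustrative): the model is observable (Observer), and the view is configured with a pluggable controller (Strategy):\n\n```python\nclass Model:  # Observer: notifies registered observers on state changes\n    def __init__(self):\n        self.observers = []\n        self.value = 0\n\n    def set_value(self, value):\n        self.value = value\n        for observer in self.observers:\n            observer.update(self)\n\n\nclass Controller:  # Strategy: translates user actions into model operations\n    def __init__(self, model):\n        self.model = model\n\n    def button_clicked(self):\n        self.model.set_value(self.model.value + 1)\n\n\nclass View:\n    def __init__(self, controller, model):\n        self.controller = controller  # the view is configured with a strategy\n        model.observers.append(self)  # and observes the model\n\n    def update(self, model):  # called when the model's state changes\n        print(f\"View shows: {model.value}\")\n\n    def click(self):  # user input is handed to the controller\n        self.controller.button_clicked()\n\n\nmodel = Model()\nview = View(Controller(model), model)\nview.click()  # -> View shows: 1\n```\n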
\nTypically, you need one controller per view at runtime; however, the same controller class can easily manage many views.\n\nMVC has been adapted to many web frameworks:\n\n- thin client - the model and most of the view and the controller all reside in the server, with the browser providing a\n  way to display the view, and to get input from the browser to the controller.\n- single page application - almost all the model, view and controller reside on the client side.\n\nMVC frameworks: Django, AngularJS, EmberJS, ...\n\nBullet points:\n\n- The Model View Controller Pattern is a compound pattern consisting of the Observer, Strategy and Composite Patterns.\n- The model makes use of the Observer Pattern so that it can keep observers updated yet stay decoupled from them.\n- The controller is the Strategy for the view. The view can use different implementations of the controller to get\n  different behavior.\n- The view uses the Composite Pattern to implement the user interface, which usually consists of nested components like\n  panels, frames and buttons.\n- These patterns work together to decouple the three players in the MVC model, which keeps designs clear and flexible.\n- The Adapter Pattern can be used to adapt a new model to an existing view and controller.\n- MVC has been adapted to the web.\n- There are many web MVC frameworks with various adaptations of the MVC pattern to fit the client/server application\n  structure.\n\n## Chapter 13: Patterns in the Real World\n\nA Pattern:\n\n> is a solution to a problem in a context.\n\n- The context is the situation in which the pattern applies. This should be a recurring situation.\n- The problem refers to the goal you are trying to achieve in this context, but it also refers to any constraints that\n  occur in the context.\n- The solution is what you are after: a general design that anyone can apply that resolves the goal and set of\n  constraints.\n\nLike design principles, patterns are not meant to be laws or rules - they are guidelines that you can alter to fit your\nneeds. A lot of real-world examples don't fit the classic pattern designs. When you adapt patterns, it never hurts to\ndocument how your pattern differs from the classic design - that way other developers can quickly recognize the patterns\nyou are using.\n\nThe Design Pattern definition tells us that the problem consists of a goal and a set of constraints. Only when the solution\nbalances both sides of the _force_ (goal - constraints) do we have a useful pattern.\n\nA design pattern should have: a name, a template, an intent, motivation, applicability, a code example, use cases, how the\npattern relates to other patterns, consequences.\n\nDesign patterns are discovered, not created. Anyone can discover a Design Pattern, however it is not easy and doesn't\nhappen quickly. You don't have a pattern until others have used it and found it to work. 
You don't have a pattern until\nit passes the Rule of Three - a pattern can be called a pattern only if it has been applied in a real-world solution at\nleast 3 times.\n\nCreational Patterns - involve object instantiation and all provide a way to decouple a client from the objects it needs\nto instantiate: Singleton, Abstract Factory, Factory Method.\n\nBehavioral Patterns - concerned with how classes and objects interact and distribute responsibility: Template Method,\nIterator, Command, State, Observer, Strategy.\n\nStructural Patterns - let you compose classes or objects into larger structures: Proxy, Facade, Composite, Adapter,\nDecorator.\n\nPatterns are often classified by a second attribute - whether the pattern deals with classes or objects: Class\nPatterns (Template Method, Factory Method, Adapter) and Object Patterns (Composite, Decorator, State, Singleton, ...).\n\nCategorisation is confusing because many patterns fit into more than one category. Categories give us a way to think\nabout the way groups of patterns relate and how patterns within a group relate to one another. They also give us a way to\nextrapolate to new patterns.\n\nKeep it simple - KISS - your goal should be simplicity, not \"how can I apply a pattern to this problem\". Don't feel like\nyou aren't a sophisticated developer if you don't use a pattern to solve a problem.\n\nPatterns aren't a magic bullet. You can't plug one in, compile and then take an early lunch. To use patterns, you need\nto think through the consequences for the rest of your design.\n\nRefactoring is a great time to reexamine your design to see if it may be better structured with patterns.\n\nDon't be afraid to remove a Design Pattern from your design. Remove it when a simpler solution without the pattern would\nbe better.\n\n_YAGNI_: Resist the temptation of creating architectures that are ready to take on change from any direction. If the\nreason for adding a pattern is only hypothetical, don't add the pattern: it will only add complexity to your system, and\nyou might never need it. Overuse of design patterns can lead to code that is downright overengineered. Always go with\nthe simplest solution that does the job and introduce patterns where the need emerges.\n\nThe Beginner uses patterns everywhere. The Intermediate starts to see where patterns are needed and where they aren't.\nThe Zen mind is able to see patterns where they fit naturally.\n\nAnti-pattern:\n\n> Tells you how to go from a problem to a BAD solution.\n\nAn anti-pattern tells you why a bad solution is attractive, why that solution is bad in the long term, and suggests\nother applicable patterns that may provide good solutions.\n\nAn anti-pattern always looks like a good solution, but then turns out to be a bad solution when it is applied. By\ndocumenting anti-patterns we help others to recognize bad solutions before they implement them. Like many patterns,\nthere are many types of anti-patterns including development, OO, organizational, and domain specific anti-patterns.\n\nBullet points:\n\n- Let Design Patterns emerge in your designs, don't force them in just for the sake of using a pattern.\n- Design Patterns aren't set in stone - adapt and tweak them to meet your needs.\n- Always use the simplest solution that meets your needs, even if it doesn't include a pattern.\n- Study Design Patterns catalogs to familiarize yourself with patterns and the relationships among them.\n- Pattern classifications provide groupings for patterns. When they help, use them.\n- You need to be committed to be a patterns writer - it takes time and patience, and you have to be willing to do lots\n  of refinement.\n- Remember, most patterns you encounter will be adaptations of existing patterns, not new patterns.\n- Build your team's shared vocabulary. This is one of the most powerful benefits of using patterns.\n- Like any community, the patterns community has its own lingo. Don't let that hold you back. Having read this book, you\n  know most of it.\n\n## Chapter 14: Leftover Patterns\n\n**Bridge**\n\n> Use the Bridge Pattern to vary not only your implementations, but also your abstractions.\n\nBenefits:\n\n+ Decouples an implementation so that it is not bound permanently to an interface.\n+ Abstraction and implementation can be extended independently.\n+ Changes to the concrete abstraction classes don't affect the client.\n\nBridge Uses and Drawbacks:\n\n- Useful in graphics and windowing systems that need to run over multiple platforms.\n- Useful any time you need to vary an interface and an implementation in different ways.\n- Increases complexity.\n\n**Builder**\n\n> Use the Builder Pattern to encapsulate the construction of a product and allow it to be constructed in steps.\n\nBenefits:\n\n+ Encapsulates the way a complex object is constructed.\n+ Allows objects to be constructed in a multistep and varying process (as opposed to one-step factories).\n+ Hides the internal representation of the product from the client.\n+ Product implementations can be swapped in and out because the client only sees an abstract interface.\n\nBuilder Uses and Drawbacks:\n\n- Often used for building composite structures.\n- Constructing objects requires more domain knowledge of the client than when using a Factory.\n
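\nA minimal Builder sketch in Python, assuming a hypothetical vacation planner in the spirit of the pattern's classic example (names are illustrative, not from the book's listings):\n\n```python\nclass Vacation:  # the complex product under construction\n    def __init__(self):\n        self.hotel = None\n        self.days = []\n\n\nclass VacationBuilder:  # encapsulates the multistep construction process\n    def __init__(self):\n        self._vacation = Vacation()\n\n    def set_hotel(self, name):\n        self._vacation.hotel = name\n        return self  # returning self lets clients chain and vary the steps\n\n    def add_day(self, activity):\n        self._vacation.days.append(activity)\n        return self\n\n    def build(self):\n        return self._vacation\n\n\ntrip = VacationBuilder().set_hotel(\"Grand Hotel\").add_day(\"Hiking\").build()\n```\n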
\n**Chain of Responsibility**\n\n> Use the Chain of Responsibility Pattern when you want to give more than one object a chance to handle a request.\n\nBenefits:\n\n+ Decouples the sender of the request and its receivers.\n+ Simplifies your object because it doesn't have to know the chain's structure and keep direct references to its\n  members.\n+ Allows you to add or remove responsibilities dynamically by changing the members or order of the chain.\n\nChain of Responsibility Uses and Drawbacks:\n\n- Commonly used in Windows systems to handle events like mouse clicks and keyboard events.\n- Execution of the request isn't guaranteed - it may fall to the end of the chain if no object handles it.\n- Can be hard to observe and debug at runtime.\n
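\nA minimal Chain of Responsibility sketch in Python (a hypothetical chain of message handlers, not from the book's listings):\n\n```python\nclass Handler:\n    def __init__(self, successor=None):\n        self.successor = successor  # the next member of the chain\n\n    def handle(self, request):\n        if self.successor:  # pass the request along the chain\n            self.successor.handle(request)\n\n\nclass SpamHandler(Handler):\n    def handle(self, request):\n        if \"spam\" in request:\n            print(\"Spam: discarded\")\n        else:\n            super().handle(request)\n\n\nclass ComplaintHandler(Handler):\n    def handle(self, request):\n        if \"complaint\" in request:\n            print(\"Complaint: forwarded to legal\")\n        else:\n            super().handle(request)  # may fall off the end unhandled\n\n\nchain = SpamHandler(ComplaintHandler())\nchain.handle(\"a complaint about ducks\")  # -> Complaint: forwarded to legal\n```\n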
\n**Flyweight**\n\n> Use the Flyweight Pattern when one instance of a class can be used to provide many virtual instances.\n\nBenefits:\n\n+ Reduces the number of object instances at runtime, saving memory.\n+ Centralizes state for many \"virtual\" objects into a single location.\n\nFlyweight Uses and Drawbacks:\n\n- The Flyweight is used when a class has many instances, and they all can be controlled identically.\n- A drawback of the Flyweight Pattern is that once you have implemented it, single, logical instances of the class will\n  not be able to behave independently from the other instances.\n\n**Interpreter**\n\n> Use the Interpreter Pattern to build an interpreter for a language.\n\nWhen you need to implement a simple language, the Interpreter Pattern defines a class-based representation for its\ngrammar along with an interpreter to interpret its sentences. To represent the language, you use a class to represent\neach rule in the language.\n\nBenefits:\n\n+ Representing each grammar rule in a class makes the language easy to implement.\n+ Because the grammar is represented by classes, you can easily change or extend the language.\n+ By adding methods to the class structure, you can add new behaviors beyond interpretation, like pretty printing and\n  more sophisticated program validation.\n\nInterpreter Uses and Drawbacks:\n\n- Use Interpreter when you need to implement a simple language.\n- Appropriate when you have a simple grammar and simplicity is more important than efficiency.\n- Used for scripting and programming languages.\n- This pattern can become cumbersome when the number of grammar rules is large. In these cases a parser/compiler\n  generator may be more appropriate.\n\n**Mediator**\n\n> Use the Mediator Pattern to centralize complex communications and control between related objects.\n\nBenefits:\n\n+ Increases the reusability of the objects supported by the Mediator by decoupling them from the system.\n+ Simplifies maintenance of the system by centralizing control logic.\n+ Simplifies and reduces the variety of messages sent between objects in the system.\n\nMediator Uses and Drawbacks:\n\n- The Mediator is commonly used to coordinate related GUI components.\n- A drawback of the Mediator Pattern is that without proper design, the Mediator object itself can become overly\n  complex.\n\n**Memento**\n\n> Use the Memento Pattern when you need to be able to return an object to one of its previous states: for instance, if\n> your user requests an \"undo\"\n\nThe Memento has 2 goals: Saving the important state of a system's key object. Maintaining the key object's\nencapsulation.\n\nKeeping the Single Responsibility Principle in mind, it is also a good idea to keep the state that you are saving\nseparate from the key object. This separate object that holds the state is known as the Memento object.\n\nBenefits:\n\n+ Keeping the saved state external from the key object helps to maintain cohesion.\n+ Keeps the key object's data encapsulated.\n+ Provides easy-to-implement recovery capability.\n\nMemento Uses and Drawbacks:\n\n- The Memento is used to save state.\n- A drawback to using Memento is that saving and restoring state can be time-consuming.\n- In Java systems, consider using Serialization to save a system's state.\n
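\nA minimal Memento sketch in Python - the saved state lives in a separate object so the key object stays encapsulated (the `Editor` example is illustrative):\n\n```python\nclass Memento:  # holds a snapshot of the key object's state\n    def __init__(self, state):\n        self.state = state\n\n\nclass Editor:  # the key object\n    def __init__(self):\n        self.text = \"\"\n\n    def save(self):\n        return Memento(self.text)\n\n    def restore(self, memento):\n        self.text = memento.state\n\n\neditor = Editor()\neditor.text = \"draft 1\"\ncheckpoint = editor.save()\neditor.text = \"draft 2\"\neditor.restore(checkpoint)  # \"undo\" back to \"draft 1\"\n```\n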
\n**Prototype**\n\n> Use the Prototype Pattern when creating an instance of a given class is either expensive or complicated.\n\nThe Prototype Pattern allows you to make new instances by copying existing instances.\n\nBenefits:\n\n+ Hides the complexities of making new instances from the client.\n+ Provides the option for the client to generate objects whose type is not known.\n+ In some circumstances, copying an object can be more efficient than creating a new object.\n\nPrototype Uses and Drawbacks:\n\n- Prototype should be considered when a system must create new objects of many types in a complex class hierarchy.\n- A drawback to using Prototype is that making a copy of an object can sometimes be complicated.\n\n**Visitor**\n\n> Use the Visitor Pattern when you want to add capabilities to a composite of objects and encapsulation is not\n> important.\n\nThe Visitor works hand in hand with a Traverser. The Traverser knows how to navigate to all the objects in a Composite.\nThe Traverser guides the Visitor through the Composite so that the Visitor can collect state as it goes. Once state has\nbeen gathered, the Client can have the Visitor perform various operations on the state.\n\nBenefits:\n\n+ Allows you to add operations to a Composite.\n+ Adding new operations is relatively easy.\n+ The code for operations performed by the Visitor is centralized.\n\nVisitor Drawbacks:\n\n- The Composite classes' encapsulation is broken when the Visitor is used.\n- Because the traversal function is involved, changes to the Composite structure are more difficult.\n"
  },
  {
    "path": "books/kubernetes-book.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# The Kubernetes Book\n\nBook by Nigel Poulton, https://github.com/nigelpoulton/TheK8sBook\n\n- [1: Kubernetes primer](#1-kubernetes-primer)\n- [2: Kubernetes principles of operation](#2-kubernetes-principles-of-operation)\n- [3: Getting Kubernetes](#3-getting-kubernetes)\n- [4: Working with Pods](#4-working-with-pods)\n- [5: Virtual clusters with Namespaces](#5-virtual-clusters-with-namespaces)\n- [6: Kubernetes Deployments](#6-kubernetes-deployments)\n- [7: Kubernetes Services](#7-kubernetes-services)\n- [8: Ingress](#8-ingress)\n- [9: Service discovery deep dive](#9-service-discovery-deep-dive)\n- [10: Kubernetes storage](#10-kubernetes-storage)\n- [11: ConfigMaps and Secrets](#11-configmaps-and-secrets)\n- [12: StatefulSets](#12-statefulsets)\n- [13: API security and RBAC](#13-api-security-and-rbac)\n- [14: The Kubernetes API](#14-the-kubernetes-api)\n- [15: Threat modeling Kubernetes](#15-threat-modeling-kubernetes)\n\n## 1: Kubernetes primer\n\nKubernetes - an application orchestrator, it orchestrates containerized cloud-native microservices apps.\n\n- orchestrator - a system that deploys and manages applications (dynamically respond to changes - scale up/down,\n  self-heal, perform zero-downtime rolling updates)\n- containerized app - app that runs in a container - 1980-1990 physical servers era, 2000-2010 virtual machines and\n  virtualization era, now cloud-native era\n- cloud-native app - designed to meet cloud-like demands of auto-scaling, self-healing, rolling updates, rollbacks and\n  more, cloud-native is about the way applications behave and react to events\n- microservices app - built from lots of small, specialised, independent parts that work together to form a meaningful\n  application\n\nKubernetes enables 2 things Google and the rest of the industry needs:\n\n1. It abstracts underlying infrastructure such as AWS\n2. It makes it easy to move applications on and off clouds\n\nKubernetes vs Docker Swarm - long story short, Kubernetes won. Docker Swarm is still under active development and is\npopular with small companies that need simple alternative to Kubernetes.\n\nKubernetes as the operating system of the cloud:\n\n- you install a traditional OS on a server, and it abstracts server resources and schedules application processes\n- you install Kubernetes on a cloud, and it abstracts cloud resources and schedules application microservices\n\nAt a high level, a cloud/datacenter is a pool of compute, network and storage resources. Kubernetes abstracts them.\nServers are no longer pets, they are cattle.\n\nKubernetes is like a courier service - you package the app as a container, give it a Kubernetes manifest, and let\nKubernetes take care of deploying it and keeping it running.\n\n## 2: Kubernetes principles of operation\n\nKubernetes is 2 things:\n\n- a cluster to run applications on\n    - like any cluster - bunch od machines to host apps\n    - these machines are called \"nodes\" (physical servers, VMs, cloud instances, Raspberry PIs, ...)\n    - cluster is made of:\n        - control plane (the brains) - exposes the API, has a scheduler for assigning work, records the state of the\n          cluster and apps\n        - worker nodes (the muscle) - where user apps run\n- an orchestrator of cloud-native microservices apps\n    - a system that takes care of deploying and managing apps\n\nSimple process to run apps on a Kubernetes cluster:\n\n1. 
Design and write the application as small independent microservices\n2. Package each microservice as its own container\n3. Wrap each container in a Kubernetes Pod\n4. Deploy Pods to the cluster via higher-level controllers such as Deployments, DaemonSets, StatefulSets, CronJobs, ...\n\nThe Control Plane - runs a collection of system services that make up the control plane of the cluster (Master, Heads,\nHead nodes). Production envs should have multiple control plane nodes - 3 or 5 recommended, and should be spread across\navailability zones. Different services making up the control plane:\n\n- The API server - the Grand Central station of Kubernetes, all communication, between all components, must go through\n  the API server. All roads lead to the API Server.\n- The Cluster Store - the only stateful part of the Control Plane, stores the configuration and the state. Based\n  on `etcd` (a popular distributed database).\n- The Controller Manager and Controllers - all the background controllers that monitor cluster components and respond to\n  events.\n- The Scheduler - watches the API server for new work tasks and assigns them to appropriate healthy worker nodes. Only\n  responsible for picking the nodes to run tasks, it isn't responsible for running them.\n- The Cloud Controller Manager - its job is to facilitate integrations with cloud services, such as instances,\n  load-balancers, and storage.\n\nWorker nodes - are where user applications run. At a high-level they do 3 things:\n\n1. Watch the API server for new work assignments\n2. Execute work assignments\n3. Report back to the control plane (via the API server)\n\n3 major components:\n\n1. Kubelet - the main Kubernetes agent, runs on every worker node. Watches the API server for new work tasks. Executes\n   the task and maintains a reporting channel back to the control plane.\n2. Container runtime - kubelet needs it to perform container-related tasks - things like pulling images and starting and\n   stopping containers.\n3. Kube-proxy - runs on every node and is responsible for local cluster networking.\n\nIn order to run on a Kubernetes cluster an application needs to:\n\n1. Be packaged as a container\n2. Be wrapped in a Pod\n3. Be deployed via a declarative manifest file\n\nThe declarative model:\n\n- declare the desired state of an application microservice in a manifest file\n    - desired state - image, how many replicas, which network ports, how to perform updates\n- post it to the API server\n    - using the `kubectl` CLI (it uses an HTTP request)\n- Kubernetes stores it in the cluster store as the application's desired state\n- Kubernetes implements the desired state on the cluster\n- A controller makes sure the observed state of the application doesn't vary from the desired state\n    - background reconciliation loops that constantly monitor the state of the cluster, if desired state != observed\n      state - Kubernetes performs the necessary tasks\n\nKubernetes Pod - a wrapper that allows a container to run on a Kubernetes cluster. Atomic unit of scheduling. VMware has\nvirtual machines, Docker has containers, Kubernetes has Pods. In Kubernetes, every container must run inside a Pod. \"\nPod\" comes from \"a pod of whales\" (a group of whales is called a \"pod\"). 
\"Pod\" and \"container\" are often used\ninterchangeably, however it is possible (in some advanced use-cases) to run multiple containers in a single Pod.\n\nPods don't run applications - applications always run in containers, the Pod is just a sandbox to run one or more\ncontainers. Pods are also the minimum unit of scheduling in Kubernetes. If you need to scale an app, you add or remove\nPods. You do not scale by adding more containers to existing Pods.\n\nA pod is only ready for service when all its containers are up and running. A single Pod can only be scheduled to a\nsingle node.\n\nPods are immutable. Whenever we talk about updating Pods, we mean - delete and replace it with a new one. Pods are\nunreliable.\n\nExample controller: Deployments - a high-level Kubernetes object that wraps around a Pod and adds features such as\nself-healing, scaling, zero-downtime rollouts, and versioned rollbacks.\n\nServices - provide reliable networking for a set of Pods. Services have a stable DNS name, IP address and name, they\nload-balance traffic across a dynamic set of Pods. As Pods come and go, the Service observes this, automatically updates\nitself, and continues to provide that stable networking endpoint.\n\nService - a stable network abstraction that provides TCP/UPD load-balancing across a dynamic set of Pods.\n\n## 3: Getting Kubernetes\n\nHosted Kubernetes: AWS Elastic Kubernetes Service, Google Kubernetes Engine, Azure Kubernetes Service. Managing your own\nKubernetes cluster isn't a good use of time and other resources. However, it is easy to rack up large bills if you\nforget to turn off infrastructure when not in use.\n\nThe hardest way to get a Kubernetes cluster is to build it yourself.\n\nPlay with Kubernetes - quick and simple way to get your hands on a development Kubernetes cluster. However, it is time\nlimited and sometimes suffers from capacity and performance issues. Link: https://labs.play-with-k8s.com\n\nDocker Desktop - offers a single-node Kubernetes cluster that you can develop and test with.\n\n`kubectl` is the main Kubernetes command-line tool. At a high-level, `kubectl` converts user-friendly commands into HTTP\nREST requests with JSON content required by the Kubernetes API server.\n\n```shell\nkubectl get nodes\n```\n\n```shell\nkubectl config current-context\n```\n\n```shell\nkubectl config use-context docker-desktop\n```\n\n## 4: Working with Pods\n\nControllers - infuse Pods with super-powers such as self-healing, scaling, rollouts and rollbacks. Every Controller bas\na PodTemplate defining the Pods it deploys and manages. You rarely interact with Pods directly.\n\nPod - the atomic unit of scheduling in Kubernetes. Apps deployed to Kubernetes always run inside Pods. If you deploy an\napp, you deploy it in a Pod. If you terminate an app, you terminate its Pod. If you scale your app up/down, you\nadd/remove Pods.\n\nKubernetes doesn't allow containers to run directly on a cluster, they always have to be wrapped in a Pod.\n\n1. Pods augment containers\n\n- labels - group Pods and associate them with others\n- annotations - add experimental features and integrations with 3rd-party tools\n- probes - test the health and status of Pods and the apps they run, this enables advanced scheduling, updates, and\n  more.\n- affinity and anti-affinity rules - control over where in the cluster Pods are allowed to run\n- termination controls - gracefully terminate Pods and the apps they run\n- security policies - enforce security features\n- resource requests and limits - min. 
\nDespite bringing so many features, Pods are super-lightweight and add very little overhead.\n\n```shell\nkubectl explain pods --recursive\n```\n\n```shell\nkubectl explain pod.spec.restartPolicy\n```\n\n2. Pods assist in scheduling\n\nEvery container in a Pod is guaranteed to be scheduled to the same worker node.\n\n3. Pods enable resource sharing\n\nPods provide a shared execution environment for one or more containers (filesystem, network stack, memory, volumes). So\nif a Pod has 2 containers, both containers share the Pod's IP address and can access any of the Pod's volumes to share\ndata.\n\nThere are 2 ways to deploy a Pod:\n\n- directly via a Pod manifest\n    - called \"Static Pods\", no super-powers like self-healing, scaling, or rolling updates\n- indirectly via a controller\n    - have all the benefits of being monitored by a highly-available controller running on the control-plane\n\nPets vs Cattle paradigm - Pods are cattle, when they die, they get replaced by another. The old one is gone, and a shiny\nnew one (with the same config, but a different IP and UID) magically appears and takes its place.\n\nThis is why applications should always store state and data outside the Pod. It is also why you should not rely on\nindividual Pods - they are ephemeral, here today, gone tomorrow.\n\nDeploying Pods:\n\n1. Define it in a YAML manifest file\n2. Post it to the API server\n3. The API server authenticates and authorizes the request\n4. The configuration (YAML) is validated\n5. The scheduler deploys the Pod to a healthy worker node with enough available resources\n\nIf you are using Docker or containerd as your container runtime, a Pod is actually a special type of container - a pause\ncontainer. This means containers running inside of Pods are really containers running inside containers.\n\nThe Pod Network is flat, meaning every Pod can talk directly to every other Pod without the need for complex routing and\nport mappings. You should use Kubernetes Network Policies.\n\nPod deployment is an atomic operation - all-or-nothing - deployment either succeeds or fails. You will never have a\nscenario where a partially deployed Pod is servicing requests.\n\nPod lifecycle: pending -> running (long-lived Pod) | succeeded (short-lived Pod)\n\n- short-lived - batch jobs, designed to only run until a task completes\n- long-lived - web-servers, remain in the running phase indefinitely, if containers fail, the controller may attempt to\n  restart them\n\nPods are immutable objects. You can't modify them after they are deployed. You always replace a Pod with a new one (in\ncase of a failure or update).\n\nIf you need to scale an app, you add or remove Pods (horizontal scaling). You never scale an app by adding more of the\nsame containers to a Pod. Multi-container Pods are only for co-scheduling and co-locating containers that need tight\ncoupling.\n\nCo-locating multiple containers in the same Pod allows containers to be designed with a single responsibility but\nco-operate closely with others.\n\nKubernetes multi-container Pod patterns:\n\n- Sidecar pattern - (most popular) the job of a sidecar is to augment or perform a secondary task for the main\n  application container\n- Adapter pattern - variation of the sidecar pattern where the helper container takes non-standardized output from the\n  main container and rejigs it into a format required by an external system\n- Ambassador pattern - variation of the sidecar pattern where the helper container brokers connectivity to an external\n  system, ambassador containers interface with external systems on behalf of the main app container\n- Init pattern - runs a special init container that is guaranteed to start and complete before your main app container,\n  it is also guaranteed to run only once\n
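\nA sketch of the init pattern as a manifest - the init container is guaranteed to run to completion, once, before the main app container starts (image and command are illustrative):\n\n```yaml\napiVersion: v1\nkind: Pod\nmetadata:\n  name: init-pod\nspec:\n  initContainers:       # guaranteed to run and complete first, exactly once\n    - name: init-sync\n      image: busybox\n      command: [\"sh\", \"-c\", \"echo preparing... && sleep 5\"]\n  containers:           # the main app container starts after init completes\n    - name: web-ctr\n      image: nginx\n```\n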
\n```shell\nkubectl get pods\n```\n\nGet Pods with additional info:\n\n```shell\nkubectl get pods -o wide\n```\n\nGet Pod info, a full copy of the Pod from the cluster:\n\n```shell\nkubectl get pods -o yaml\n```\n\nGet even more info; spec - desired state, status - observed state:\n\n```shell\nkubectl get pods hello-pod -o yaml\n```\n\nPod manifest files:\n\n- kind - tells Kubernetes the type of object being defined\n- apiVersion - defines the schema version to use when creating the object\n- metadata - names, labels, annotations, and a Namespace\n- spec - defines the containers the Pod will run\n\n```shell\nkubectl apply -f pod.yml\n```\n\n`kubectl describe` - a nicely formatted multi-line overview of an object. You can add the `--watch` flag to the command\nto monitor it and see when the status changes to _Running_.\n\n```shell\nkubectl describe pods hello-pod\n```\n\nYou can see ordering and names of containers using this command.\n\n`kubectl logs` - like other Pod related commands, if you don't specify `--container`, it executes against the first\ncontainer in the pod:\n\n```shell\nkubectl logs hello-pod\n```\n\n```shell\nkubectl logs hello-pod --container hello-ctr\n```\n\n`kubectl exec` - execute commands inside a running Pod\n\n```shell\nkubectl exec hello-pod -- pwd\n```\n\nGet shell access:\n\n```shell\nkubectl exec -it hello-pod -- sh\n```\n\n`-it` flag makes the session interactive and connects STDIN and STDOUT on your terminal to STDIN and STDOUT inside the\nfirst container in the Pod.\n\nPod hostname - every container in a Pod inherits its hostname from the name of the Pod (`metadata.name`). With this in\nmind, you should always set Pod names as valid DNS names (a-z, 0-9, -, .).\n\n`spec.initContainers` block defines one or more containers that Kubernetes guarantees will run and complete before the main\napp container starts.\n\n```shell\nkubectl delete pod git-sync\n```\n\n## 5: Virtual clusters with Namespaces\n\nNamespaces are a native way to divide a single Kubernetes cluster into multiple virtual clusters.\n\nNamespaces partition a Kubernetes cluster and are designed as an easy way to apply quotas and policies to groups of\nobjects.\n\nSee all Kubernetes API resources supported in your cluster:\n\n```shell\nkubectl api-resources\n```\n\nNamespaces are a good way of sharing a single cluster among different departments and environments. For example, a\nsingle cluster might have the following namespaces: dev, test, qa. 
Each one can have its own set of users and\npermissions, as well as unique resource quotas.\n\nNamespaces are not good for isolating hostile workloads. A compromised container or Pod in one Namespace can wreak havoc\nin other Namespaces. For example, you shouldn't place competitors such as Pepsi and Coke in separate Namespaces on the\nsame shared cluster.\n\nIf you need strong workload isolation, the current method is to use multiple clusters. There are some attempts to do\nsomething different, but the safest and most common way of isolating workloads is putting them on their own clusters.\n\nEvery Kubernetes cluster has a set of pre-created Namespaces (virtual clusters):\n\n```shell\nkubectl get namespaces\n```\n\n- `default` is where newly created objects go if you don't specify a Namespace\n- `kube-system` is where DNS, the metrics server, and other control plane components run\n- `kube-public` is for objects that need to be readable by anyone\n- `kube-node-lease` is used for node heartbeat and managing node leases\n\n```shell\nkubectl describe namespaces default\n```\n\nList service objects in a selected namespace:\n\n```shell\nkubectl get svc --namespace kube-system\n```\n\n```shell\nkubectl get svc --all-namespaces\n```\n\nCreate a new Namespace, Pods don't create a Namespace automatically, a Namespace must be created first:\n\n```shell\nkubectl create ns hydra\n```\n\nSwitch between Namespaces:\n\n```shell\nkubens shield\n```\n\nThere are 2 ways to deploy objects to a specific Namespace:\n\n- imperatively - requires you to add the `-n` or `--namespace` flag to commands\n- declaratively - requires you to specify the Namespace in the YAML\n\nDelete Pods:\n\n```shell\nkubectl delete -f shield.app.yml\n```\n\nDelete Namespace:\n\n```shell\nkubectl delete ns shield\n```\n\n## 6: Kubernetes Deployments\n\nUse Deployments to bring cloud-native features such as self-healing, scaling, rolling updates, and versioned rollbacks\nto stateless apps on Kubernetes.\n\nKubernetes offers several controllers that augment Pods with important capabilities. The Deployment controller is\ndesigned for stateless apps.\n\nThe Deployment spec is a declarative YAML object where you describe the desired state of a stateless app. The controller\nelement operates as a background loop on the control plane, reconciling observed state with desired state.\n\nYou start with a stateless application, package it as a container, then define it in a Pod template. At this point you\nhave a static Pod - it does not self-heal or autoscale, and it isn't easy to update. That is why you almost always wrap Pods in a\nDeployment object.\n\nA Deployment object only manages a single Pod template.\n\nDeployments rely heavily on ReplicaSets. ReplicaSets manage Pods and bring self-healing and scaling. Deployments\nmanage ReplicaSets and add rollouts and rollbacks. It is not recommended to manage ReplicaSets directly. 
Think of\nDeployments as managing ReplicaSets, and ReplicaSets as managing Pods.\n\nDeployments:\n\n- if Pods managed by a Deployment fail, they will be replaced (self-healing)\n- if Pods managed by a Deployment see increased or decreased load, they can be scaled\n\n3 concepts fundamental to everything about Kubernetes:\n\n- desired state (what you want)\n- observed state (what you have)\n- reconciliation (if desired state != observed state, a process of reconciliation attempts to bring observed state into\n  sync with desired state)\n\nDeclarative model is a method of telling Kubernetes your desired state, while avoiding the detail of how to implement\nit. You leave the _how_ up to Kubernetes.\n\nZero-downtime rolling-updates of stateless apps are what Deployments are about. They require a couple of things from\nyour microservice applications in order to work properly:\n\n- loose coupling via APIs\n- backwards and forwards compatibility\n\nEach Deployment describes all the following:\n\n- how many Pod replicas\n- what images to use for the Pod's containers\n- what network ports to expose\n- details about how to perform rolling updates\n\nDeploying a new version: update the same Deployment YAML file with the new image version and re-post it to the API\nserver.\n\nRollback: you wind one of the old ReplicaSets up while you wind the current one down.\n\nKubernetes gives you fine-grained control over how rollouts and rollbacks proceed - insert delays, control the pace and\ncadence of releases, you can probe the health and status of updated replicas.\n\nYAML components:\n\n- `apiVersion: apps/v1` - Deployments available in the apps/v1 subgroup\n- `kind: Deployment` - Deployment object\n- `metadata.name: hello-deploy` - a valid DNS name\n- `spec` - anything nested below `spec` relates to the Deployment\n- `spec.template` - the Pod template the Deployment uses to stamp out Pod replicas\n- `spec.replicas` - how many Pod replicas the Deployment should create and manage\n- `spec.selector` - a list of labels that Pods must have in order for Deployments to manage them. 
This tells Kubernetes\n  which Pods to terminate and replace when performing the rollout.\n- `spec.revisionHistoryLimit` - how many older versions/ReplicaSets to keep\n- `spec.progressDeadlineSeconds` - tells Kubernetes how long to wait during a rollout for each new replica to come\n  online\n- `spec.strategy` - tells the Deployment controller how to upgrade the Pods when a rollout occurs\n    - update using the Rolling Update strategy\n    - never have more than one Pod below desired state (`maxUnavailable: 1`) - you will never have less than 9 replicas\n      during the update process\n    - never have more than one Pod above desired state (`maxSurge: 1`) - never have more than 11 replicas during the\n      update process\n    - net result - update two Pods at a time, the delta between 9 and 11 is 2\n\n```yaml\nspec:\n  replicas: 10\n  selector:\n    matchLabels:\n      app: hello-world\n  revisionHistoryLimit: 5\n  progressDeadlineSeconds: 300\n  minReadySeconds: 10\n  strategy:\n    type: RollingUpdate\n    rollingUpdate:\n      maxUnavailable: 1\n      maxSurge: 1\n  template:\n    metadata:\n      labels:\n        app: hello-world\n    spec:\n      containers:\n        - name: hello-pod\n          image: nigelpoulton/k8sbook:2.0\n          ports:\n            - containerPort: 8080\n```\n\nDeploy to the cluster:\n\n```shell\nkubectl apply -f deploy.yml\n```\n\n```shell\nkubectl get deploy hello-deploy\n```\n\n```shell\nkubectl describe deploy hello-deploy\n```\n\n```shell\nkubectl get replicaset\n```\n\n```shell\nkubectl describe replicaset hello-deploy-5cd5dcf7d7\n```\n\nIn order to access a web app from a stable name or IP address, or even from outside the cluster, you need a Kubernetes\nService object. A Service provides reliable networking for a set of Pods.\n\nScaling the number of replicas manually - edit the YAML and set a different number of replicas or use the command:\n\n```shell\nkubectl scale deploy hello-deploy --replicas 5\n```\n\nPerforming a rolling update (by replacement because Pods are immutable):\n\n```shell\nkubectl apply -f deploy.yml\n```\n\n```shell\nkubectl rollout status deployment hello-deploy\n```\n\nPausing & resuming deployment:\n\n```shell\nkubectl rollout pause deploy hello-deploy\n```\n\n```shell\nkubectl rollout resume deploy hello-deploy\n```\n\nDetailed deployment info:\n\n```shell\nkubectl describe deploy hello-deploy\n```\n\nKubernetes maintains a documented revision history of rollouts:\n\n```shell\nkubectl rollout history deployment hello-deploy\n```\n\nRolling Updates create new ReplicaSets, old ReplicaSets aren't deleted. The fact the old ones still exist makes them\nideal for executing rollbacks:\n\n```shell\nkubectl rollout undo deployment hello-deploy --to-revision=1\n```\n\nModern versions of Kubernetes use the system generated pod-template-hash label so only Pods that were originally created\nby the Deployment/ReplicaSet will be managed:\n\n```shell\nkubectl get pods --show-labels\n```\n\n## 7: Kubernetes Services\n\nControllers add self-healing, scaling and rollouts. Despite all of this, Pods are still unreliable, and you should never\nconnect directly to them.\n\nServices provide stable and reliable networking for a set of unreliable Pods. Every Service gets its own stable IP\naddress, its own DNS name, and its own stable port. The Service fronts the Pods with a stable IP, DNS name, and port. 
It also\nload-balances traffic to Pods with the right labels.\n\nWith a Service in place, the Pods can scale up/down, they can fail, and they can be updated and rolled back. Despite all\nof this, clients will continue to access them without interruption. The Service is observing the changes and updating\nits lists of healthy Pods it sends traffic to.\n\nThink of Services as having a static front-end and a dynamic back-end.\n\nServices are loosely coupled with Pods via labels and selectors. This is the same technology that loosely couples\nDeployments to Pods.\n\nEvery time you create a Service, Kubernetes automatically creates an associated Endpoints object. The Endpoints object\nis used to store a dynamic list of healthy Pods matching the Service's label selector. Any new Pods that match the\nselector get added to the Endpoints object.\n\nTypes of Services:\n\n- accessible from inside the cluster\n    - ClusterIP - default type, a stable virtual IP, every Service you create gets a ClusterIP\n- accessible from outside the cluster\n    - NodePort - built on top of ClusterIP and allows external clients to hit a dedicated port on every cluster node and\n      reach the Service\n    - LoadBalancer - makes external access even easier by integrating with an internet-facing load-balancer on your\n      underlying cloud platform\n\nExample Service object:\n\n```yml\nspec:\n  type: NodePort\n  ports:\n    - port: 8080       # listen internally on port 8080\n      nodePort: 30001  # listen externally on 30001\n      targetPort: 8080 # forward traffic to the application Pods on port 8080\n      protocol: TCP    # use TCP (default)\n  selector:            # send traffic to all healthy Pods on the cluster with the following metadata.labels\n    chapter: services\n```\n\nGet the EndpointSlice objects:\n\n```shell\nkubectl get endpointslices\n```\n\nGet details of each healthy Pod:\n\n```shell\nkubectl describe endpointslice svc-test-xgnsv\n```\n\nIf your cluster is on a cloud platform, deploying a Service with `type=LoadBalancer` will provision one of your cloud's\ninternet-facing load-balancers and configure it to send traffic to your Service.\n\n```shell\nkubectl get svc --watch\n```\n\nAfter ~2 minutes the value in the EXTERNAL-IP column will appear.\n\nDelete multiple resources:\n\n```shell\nkubectl delete -f deploy.yml -f lb.yml -f svc.yml\n```\n\n## 8: Ingress\n\nIngress is all about accessing multiple web applications through a single LoadBalancer Service.\n\n- `Load Balancer` refers to a Kubernetes Service object of `type=LoadBalancer`\n- `load-balancer` refers to the internet-facing load-balancer on the underlying cloud\n\nIngress exposes multiple Services through a single cloud load-balancer. Cloud load-balancers are expensive.\n\n```shell\nkubectl get ing\n```\n\nIngress classes allow you to run multiple Ingress controllers on a single cluster:\n\n- assign each Ingress controller to an Ingress class\n- when you create Ingress objects, you assign them to an Ingress class\n\n```shell\nkubectl get ingressclass\n```\n\nIngress is a way to expose multiple applications and Kubernetes Services via a single cloud load-balancer. They are\nstable objects in the API but have feature overlap with a lot of service meshes - if you are running a service mesh you\nmay not need Ingress.\n\n## 9: Service discovery deep dive\n\nFinding stuff on a crazy-busy platform like Kubernetes is hard. Service discovery makes it simple. 
Apps need a way to\nfind the other apps they work with.\n\n2 components to service discovery:\n\n- registration - is the process of an application listing its connection details in a service registry so other apps can\n  find it and consume it. Kubernetes uses its internal DNS as a service registry. All Kubernetes Services are\n  automatically registered with DNS.\n- discovery - for service discovery to work, apps need to know the name of the Service fronting the apps they want to\n  connect to (the rest is taken care of by Kubernetes)\n\nGet Pods running the cluster DNS:\n\n```shell\nkubectl get pods -n kube-system -l k8s-app=kube-dns\n```\n\nService discovery works like typical routing - check your own table, if not found pass it to the next one.\n\nDomain name format: _object-name_._namespace_.svc.cluster.local, object name has to be unique within a Namespace, but\nnot across Namespaces.\n\n## 10: Kubernetes storage\n\nKubernetes supports lots of types of storage from lots of different places. No matter what type of storage, or where it\ncomes from, when it is exposed on Kubernetes it is called a volume. All that's required is a plugin allowing their\nstorage resources to be surfaced as volumes in Kubernetes.\n\nContainer Storage Interface - an open standard aimed at providing a clean storage interface for container orchestrators\nsuch as Kubernetes.\n\nCore storage-related API objects:\n\n- Persistent Volumes - are how external storage assets are represented in Kubernetes\n- Persistent Volume Claims - like tickets that grant access to a PV\n- Storage Classes - make it all dynamic\n\nStorage Providers - AWS Elastic Block Store, Azure File, NFS volumes, ...\n\nThe CSI is a vital piece of Kubernetes storage, however, unless you are a developer writing storage plugins, you\nare unlikely to interact with it very often.\n\nWorking with Storage Classes:\n\n- Create one or more StorageClasses on Kubernetes\n- Deploy Pods with PVCs that reference those Storage Classes\n\nOther settings:\n\n- Access mode:\n    - ReadWriteOnce - a PV that can only be bound as R/W by a single PVC\n    - ReadWriteMany - a PV that can be bound as R/W by multiple PVCs\n    - ReadOnlyMany - a PV that can be bound as R/O by multiple PVCs\n- Reclaim policy - how to deal with a PV when its PVC is released:\n    - Delete - it deletes the PV and associated storage resource on the external storage system\n    - Retain - keep the associated PV object on the cluster as well as any data stored on the associated external asset\n\n```shell\nkubectl get sc\n```\n\n```shell\nkubectl get pv\n```\n\n```shell\nkubectl get pvc\n```\n
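\nA sketch of how the pieces fit together - a StorageClass and a PVC that references it (the provisioner and names are illustrative and depend on your cloud):\n\n```yaml\nkind: StorageClass\napiVersion: storage.k8s.io/v1\nmetadata:\n  name: fast\nprovisioner: ebs.csi.aws.com   # illustrative CSI provisioner (AWS EBS)\nreclaimPolicy: Delete          # delete the PV and external asset on release\n---\napiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n  name: data-pvc\nspec:\n  accessModes:\n    - ReadWriteOnce            # bound R/W by a single PVC\n  storageClassName: fast       # the ticket references the StorageClass above\n  resources:\n    requests:\n      storage: 10Gi\n```\n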
\n## 11: ConfigMaps and Secrets\n\nMost apps comprise two main parts: the app & the configuration. Coupling the application and the configuration into a\nsingle easy-to-deploy unit is an anti-pattern. De-coupling the application and the configuration has the following\nbenefits:\n\n- re-usable application images (you can use the same image on dev, staging, prod)\n- simpler development and testing (easier to spot a mistake when the app and the config are decoupled, e.g. app crash\n  after config change)\n- simpler and fewer disruptive changes\n\nKubernetes provides an object called a ConfigMap that lets you store configuration data outside a Pod. It also makes\nit easy to inject config into Pods at run-time.\n\nYou should not use ConfigMaps to store sensitive data such as certificates and passwords. Kubernetes provides a\ndifferent object, called a Secret, for storing sensitive data.\n\nBehind the scenes, ConfigMaps are a map of key-value pairs, and we call each pair an entry:\n\n- Keys - an arbitrary name that can be created from alphanumerics, dashes, dots, and underscores\n- Values - anything, including multiple lines with carriage returns\n- Keys and Values are separated by a colon -- `key:value`\n\nData in a ConfigMap can be injected into containers at run-time via any of the following methods:\n\n- environment variables (static variables, updates made to the map don't get reflected in running containers, a major\n  reason not to use environment variables)\n- arguments to the container's startup command (the most limited method, shares environment variables' limitations)\n- files in a volume (the most flexible method)\n\nConfigMap objects don't have the concept of state (desired/actual) - this is why they have a `data` block instead\nof `spec` and `status` blocks.\n\nCreating a ConfigMap declaratively:\n\n```yaml\nkind: ConfigMap\napiVersion: v1\nmetadata:\n  name: multimap\ndata:\n  given: Nigel\n  family: Poulton\n```\n\n```shell\nkubectl apply -f multimap.yml\n```\n\nConfigMaps are extremely flexible and can be used to insert complex configurations, including JSON files and even\nscripts, into containers at run-time.\n\nView logs from a container in a Pod:\n\n```shell\nkubectl logs startup-pod -c args1\n```\n\nConfigMaps with volumes is the most flexible option. You can reference entire configuration files, as well as make\nupdates to the ConfigMap that will be reflected in running containers.\n\n1. Create the ConfigMap\n2. Create a ConfigMap volume in the Pod template\n3. Mount the ConfigMap volume into the container\n4. Entries in the ConfigMap will appear in the container as individual files\n
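\nSteps 2-4 sketched as a manifest, consistent with the `multimap` ConfigMap above and the `cmvol` Pod referenced below (the volume name and image are assumptions):\n\n```yaml\napiVersion: v1\nkind: Pod\nmetadata:\n  name: cmvol\nspec:\n  volumes:\n    - name: volmap\n      configMap:             # ConfigMap volume in the Pod template\n        name: multimap\n  containers:\n    - name: ctr\n      image: nginx\n      volumeMounts:\n        - name: volmap       # mount the ConfigMap volume into the container\n          mountPath: /etc/name   # entries appear as files: given, family\n```\n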
\nUpdate a ConfigMap by re-applying its YAML.\n\nCheck the value of an entry mounted via the ConfigMap volume:\n\n```shell\nkubectl exec cmvol -- cat /etc/name/given\n```\n\nSecrets are almost identical to ConfigMaps - they hold application configuration data that is injected into containers\nat run-time. Secrets are designed for sensitive data such as passwords, certificates, and OAuth tokens.\n\nDespite being designed for sensitive data, Kubernetes does not encrypt Secrets in the cluster store. Fortunately, it is\npossible to configure encryption-at-rest with EncryptionConfiguration objects. Despite this, many people opt to use\nexternal 3rd-party tools, such as HashiCorp Vault.\n\nA typical workflow for a Secret is as follows:\n\n1. The Secret is created and persisted to the cluster store as an un-encrypted object\n2. A Pod that uses it gets scheduled to a cluster node\n3. The Secret is transferred over the network, un-encrypted, to the node\n4. The kubelet on the node starts the Pod and its containers\n5. The Secret is mounted into the container via an in-memory tmpfs filesystem and decoded from base64 to plain text\n6. The application consumes it\n7. When the Pod is deleted, the Secret is deleted from the node\n\n```shell\nkubectl get secrets\n```\n\nCreate a Secret manually:\n\n```shell\nkubectl create secret generic creds --from-literal user=piotr --from-literal pwd=qwerty\n```\n\nDecode base-64:\n\n```shell\necho cGlvdHI= | base64 -d\n```\n\n```yaml\napiVersion: v1\nkind: Secret\nmetadata:\n  name: tkb-secret\n  labels:\n    chapter: configmaps\ntype: Opaque\ndata:  # use the stringData block instead when providing plaintext values\n  username: bmlnZWxwb3VsdG9u\n  password: UGFzc3dvcmQxMjM=\n```\n\nThe most flexible way to inject a Secret into a Pod is via a special type of volume called a Secret volume. Secret volumes\nare automatically mounted as read-only to prevent containers and applications accidentally mutating them.\n\n## 12: StatefulSets\n\nStateful application - application that creates and saves valuable data, for example an app that saves data about client\nsessions and uses it for future sessions, or a database.\n\nStatefulSets guarantee:\n\n- predictable and persistent Pod names\n    - name format: `StatefulSetName-Integer`\n- predictable and persistent DNS hostnames\n- predictable and persistent volume bindings\n\nFailed Pods managed by a StatefulSet will be replaced by new Pods with the exact same Pod name, the exact same DNS\nhostname, and the exact same volumes. This is true even if the replacement is started on a different cluster node. The\nsame is not true of Pods managed by a Deployment.\n\nStatefulSets create one Pod at a time, and always wait for previous Pods to be running and ready before creating the\nnext.\n\nKnowing the order in which Pods will be scaled down, as well as knowing that Pods will not be terminated in parallel, is\na game-changer for many stateful apps.\n\nNote: deleting a StatefulSet object does not terminate Pods in order, with this in mind, you may want to scale down a\nStatefulSet to 0 replicas before deleting it.\n\nHeadless Service is a regular Kubernetes Service object without an IP address. It becomes a StatefulSet's Governing\nService when you list it in the StatefulSet config under `spec.serviceName`.\n\nStatefulSets are only a framework. Applications need to be written in ways to take advantage of the way StatefulSets\nbehave.\n
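\nA sketch of a StatefulSet with its Governing (headless) Service (names and image are illustrative):\n\n```yaml\napiVersion: v1\nkind: Service\nmetadata:\n  name: db-headless\nspec:\n  clusterIP: None            # headless: a Service without an IP address\n  selector:\n    app: db\n---\napiVersion: apps/v1\nkind: StatefulSet\nmetadata:\n  name: db\nspec:\n  serviceName: db-headless   # the Governing Service\n  replicas: 3                # Pods created one at a time: db-0, db-1, db-2\n  selector:\n    matchLabels:\n      app: db\n  template:\n    metadata:\n      labels:\n        app: db\n    spec:\n      containers:\n        - name: db-ctr\n          image: mongo       # illustrative image\n```\n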
\n\n## 14: The Kubernetes API\n\nKubernetes is API-centric. This means everything in Kubernetes is about the API, and everything goes through the API\nand the API server. For the most part you will use `kubectl` to send requests, however you can also craft them in code.\n\n```shell\nkubectl proxy --port 9000 &\n```\n\n```shell\ncurl http://localhost:9000/api/v1/pods\n```\n\nThe Kubernetes API is divided into 2 groups:\n\n- the core group - mature objects that were created in the early days of Kubernetes before the API was divided into\n  groups, located in `api/v1`\n- the named groups - the future of the API, all new resources get added to named groups\n\n```shell\nkubectl api-resources\n```\n\nKubernetes has a strict process for adding new resources to the API. They come in as _alpha_ (experimental, can be\nbuggy), progress through _beta_ (pre-release), and eventually reach _stable_.\n\nIt is possible to write your own custom controllers and resources.\n\n## 15: Threat modeling Kubernetes\n\nThreat modeling is the process of identifying vulnerabilities. The STRIDE model:\n\n- Spoofing\n    - pretending to be somebody else with the aim of gaining extra privileges on a system\n- Tampering\n    - the act of changing something in a malicious way, so you can cause one of the following:\n        - denial of service - tampering with a resource to make it unusable\n        - elevation of privilege - tampering with a resource to gain additional privileges\n- Repudiation\n    - creating doubt about something; non-repudiation is proving certain actions were carried out by certain individuals\n- Information disclosure\n    - when sensitive data is leaked\n- Denial of service\n    - making something unavailable; there are many types of DoS attacks, but a well-known variation is overloading a\n      system to the point it can no longer service requests\n- Elevation of privilege\n    - gaining higher access than what is granted, usually in order to cause damage or gain unauthorized access\n"
  },
  {
    "path": "books/kubernetes-in-action.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Kubernetes in Action, Second Edition \n\nBook by Marko Lukša\n"
  },
  {
    "path": "books/nlp-book.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition\n\nBook by Daniel Jurafsky and James H. Martin (December 2020 draft)\n\n- [Chapter 2: Regular Expressions, Text Normalization, Edit Distance](#chapter-2-regular-expressions-text-normalization-edit-distance)\n- [Chapter 3: N-gram Language Models](#chapter-3-n-gram-language-models)\n- [Chapter 4: Naive Bayes and Sentiment Classification](#chapter-4-naive-bayes-and-sentiment-classification)\n- [Chapter 5: Logistic Regression](#chapter-5-logistic-regression)\n- [Chapter 6: Vector Semantics and Embeddings](#chapter-6-vector-semantics-and-embeddings)\n- [Chapter 7: Neural Networks and Neural Language Models](#chapter-7-neural-networks-and-neural-language-models)\n- [Chapter 8: Sequence Labeling for Parts of Speech and Named Entities](#chapter-8-sequence-labeling-for-parts-of-speech-and-named-entities)\n- [Chapter 9: Deep Learning Architectures for Sequence Processing](#chapter-9-deep-learning-architectures-for-sequence-processing)\n- [Chapter 10](#chapter-10)\n- [Chapter 11: Machine Translation and Encode-Decoder Models](#chapter-11-machine-translation-and-encode-decoder-models)\n- [Chapter 12: Constituency Grammars](#chapter-12-constituency-grammars)\n- [Chapter 13-16](#chapter-13-16)\n- [Chapter 17: Information Extraction](#chapter-17-information-extraction)\n- [Chapter 18: Word Senses and WordNet](#chapter-18-word-senses-and-wordnet)\n- [Chapter 19](#chapter-19)\n- [Chapter 20: Lexicons for Sentiment, Affect and Connotation](#chapter-20-lexicons-for-sentiment-affect-and-connotation)\n- [Chapter 21-22](#chapter-21-22)\n- [Chapter 23: Question Answering](#chapter-23-question-answering)\n- [Chapter 24: Chatbots & Dialogue Systems](#chapter-24-chatbots--dialogue-systems)\n- [Chapter 25: Phonetics](#chapter-25-phonetics)\n- [Chapter 26: Automatic Speech Recognition and Text-to-speech](#chapter-26-automatic-speech-recognition-and-text-to-speech)\n\n## Chapter 2: Regular Expressions, Text Normalization, Edit Distance\n\n*Regular Expressions*\n\nRegular expression is an algebraic notation for characterising a set of strings.\n\nKleene * (cleany star) - zero or more occurrences.\n\nKleene + - at least one\n\nAnchors - special characters that *anchor* regular expressions to particular places in a string (`^` - start, `$` - end\nof a string).\n\n`^` has multiple meanings:\n\n1. match start of the line\n2. negation inside square brackets `[^Ss]` - neither `S` nor `s`\n\nPipe symbol `|` also known as \"disjunction\". Logical OR. `cat|dog` match either `cat` or `dog`.\n\nOperator precedence hierarchy:\n\n1. Parenthesis: `()`\n2. Counters: `* + ? {}`, `{}` - explicit counter\n3. Sequences and anchors: `sequence ^the end$`\n4. Disjunction: `|`\n\nRegular expressions are greedy, however there is a way to enforce non-greedy behaviour -> `*?` - Kleene star that\nmatches as little text as possible, `+?` - Kleene plus that matches as little text as possible.\n\nFixing RE errors might require following efforts:\n\n- Increasing precision (minimising false positives - incorrectly matched)\n\n- Increasing recall (minimising false negatives - incorrectly missed)\n\n*Substitution* - easiest to explain with an example:\n\n```\nthe (.*)er they were, the \\1er they will be\n--- will match ---\nthe bigger they were, the bigger they will be\n```\n\nNumber operator, e.g.: `\\1` allows repeating matched group. 
It is possible to disable the register and use a non-capturing group,\ne.g.: `(?:some|a few) (people|cats) like some \1` - here `\1` refers to the first capturing group, `(people|cats)`. The\nfamous chatbot ELIZA used a series of regular expression substitutions.\n\n```\nI'M (depressed|sad) -> I AM SORRY TO HEAR YOU ARE \1\n```\n\nLookahead - look ahead in the text to see if some pattern matches BUT without advancing the match cursor.\n\nNegative lookahead - used for ruling out special cases, e.g. rule out strings starting with the word\nVolcano: `^(?!Volcano)[A-Za-z]+`\n\n*Words*\n\nFragment - a broken-off word, \"I do main- mainly\", \"main-\" is a fragment here\n\nFiller - \"um, uh\" - used in spoken language; problem: should it be treated as a word? Fragments and fillers are 2 kinds\nof disfluencies.\n\nType - the number of distinct words in a corpus. When we speak about the number of words in a language, we are generally\nreferring to word types.\n\nHerdan's Law / Heaps' Law - the relationship between the number of types (`|V|`) and the number of tokens (`N`) in a\ncorpus:\n$$ |V| = kN^{\\beta} $$\n(where k and β are constants, with 0 < β < 1)\n\n*Corpora*\n\nWriters and speakers have specific styles of communicating and use specific dialects; text can vary by time, place,\nfunction, race, gender, age, socioeconomic class.\n\nCode switching - a common practice for speakers and writers to use multiple languages in a single communicative act\n\nWhen preparing a computational model for language processing it is useful to prepare a datasheet - a document answering\nquestions like: Who produced the text? In what context? For what purpose? In what language? What were the race,\ngender, ... of the authors? How was the data annotated?\n\n*Text Normalisation*\n\n1. Tokenisation (segmentation)\n2. Normalising word formats\n3. Segmenting sentences\n\n*Tokenisation*\n\nUNIX's `tr` command can be used for quick tokenisation of English texts.\n\nProblem: Keep specific words together: `2020/02/02`, `km/h`, `$65`, `www.github.com`, `100 000`, `I'm`, `New York`\n\nA tokeniser can be used to expand clitic contractions: `we're -> we are`. Tokenisation is tied up with Named Entity\nRecognition. Tokenisation needs to be fast, hence it often uses deterministic algorithms based on regular expressions\ncompiled into efficient finite state automata. Tokenisation is more complex for languages like Chinese or Japanese,\nwhich do not use spaces to separate words - for these, word segmentation algorithms work better. Also, it is possible to\nuse neural networks for the task of tokenisation.\n\nPenn Treebank Tokeniser\n\n- separates clitics (`doesn't -> does n't`)\n- keeps hyphenated words together (`close-up`, `Bielsko-Biała`)\n- separates out all punctuation\n\nByte-Pair Encoding (a sketch of the merge loop follows the list)\n\n- begins with a vocabulary that is the set of all individual characters\n- examines the training corpus, chooses the two symbols that are most frequently adjacent (A, B -> AB)\n- continues to count and merge, creating longer and longer character strings, until k merges (a parameter of the\n  algorithm)\n- most words will be represented as full symbols, a few rare ones will have to be represented by their parts
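\n\nA simplified sketch of that merge loop (my own implementation, not the book's):\n\n```python\nfrom collections import Counter\n\ndef byte_pair_encoding(corpus, k):\n    # Each word is a tuple of symbols, with an end-of-word marker.\n    vocab = Counter(tuple(word) + ('_',) for word in corpus)\n    merges = []\n    for _ in range(k):\n        # Count all adjacent symbol pairs across the corpus.\n        pairs = Counter()\n        for word, freq in vocab.items():\n            for a, b in zip(word, word[1:]):\n                pairs[(a, b)] += freq\n        if not pairs:\n            break\n        best = max(pairs, key=pairs.get)  # most frequent adjacent pair\n        merges.append(best)\n        # Merge the chosen pair everywhere it occurs.\n        new_vocab = Counter()\n        for word, freq in vocab.items():\n            merged, i = [], 0\n            while i < len(word):\n                if i < len(word) - 1 and (word[i], word[i + 1]) == best:\n                    merged.append(word[i] + word[i + 1])\n                    i += 2\n                else:\n                    merged.append(word[i])\n                    i += 1\n            new_vocab[tuple(merged)] += freq\n        vocab = new_vocab\n    return merges\n\nprint(byte_pair_encoding(['low', 'low', 'lower', 'newest', 'newest'], 3))\n```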
\n\n*Normalisation*\n\nThe task of putting words in a standard format, choosing a single normal form for words with multiple forms like the\nUSA/US. A valuable process even though spelling information is lost.\n\nCase folding - mapping everything to lower/upper case. However, it might give wrong results, e.g.: US (country) -> us\n(people, we)\n\n*Lemmatisation*\n\nThe task of determining that two words have the same root (am, are, is -> be). Useful for web search - usually we want\nall forms to be found. Requires morphological parsing of the word. Morphology is the study of the way words are built up\nfrom smaller meaning-bearing units called morphemes. Morphemes fall into 2 classes: stems (the central part of the word)\nand affixes (prefixes and suffixes).\n\nThe Porter Stemmer\n\nLemmatisation is hard, that's why sometimes we use stemming.\n\n```\nThis -> Thi, was -> wa, Bone's -> Bone s, ...\n```\n\nStemming is based on a series of rules, e.g.: `ATIONAL -> ATE` (relational -> relate). Stemmers do make errors, but are\nfast and deterministic.\n\nSentence Segmentation\n\n`?`, `!` are unambiguous; `.` is unfortunately ambiguous - it doesn't necessarily mark the end of a sentence. Handled\nwith a rule-based approach or machine learning.\n\n*Minimum Edit Distance* - the minimum number of editing operations (insertion, deletion, substitution) needed to\ntransform one string into another.\n\nHow to find the minimum edit distance? This can be thought of as a shortest path problem: the shortest sequence of edits\nfrom one string to another. It can be solved using dynamic programming (a table-driven method for solving problems by\ncombining solutions to sub-problems).\n\n## Chapter 3: N-gram Language Models\n\nAssigning probabilities to upcoming words in a sentence is a very important task in speech recognition, spelling\ncorrection, machine translation and AAC systems. Systems that assign probabilities to sequences of words are called\n**language models**. The simplest model is the n-gram.\n\n*P(w|h)* - the probability of a word *w* given some history *h*. $$ P(the|its\\ water\\ is\\ so\\ transparent\\ that) =\n\\dfrac{count(its\\ water\\ is\\ so\\ transparent\\ that\\ the)}{count(its\\ water\\ is\\ so\\ transparent\\ that)} $$ You can\ncompute these probabilities for a large corpus, e.g. Wikipedia. This method works fine in many cases, but it turns out\neven the web cannot give us good estimates in most cases - language is dynamic, and you are not able to count ALL the\npossible sentences. Hence, there is a need for a more clever way of estimating the probability *P(w|h)*.\n\nInstead of computing the probability of a word given its entire history, we can approximate the history by just the last\nfew words. The bigram model approximates the probability by taking the last word, so for the example we had earlier (in\ngeneral: an n-gram takes *n - 1* words into the past; trigrams are most commonly used, 4/5-grams are used when there is\nsufficient training data):\n$$ P(the|its\\ water\\ is\\ so\\ transparent\\ that) \\approx P(the|that)\n$$ This assumption, that the next word depends only on the previous one, is called a **Markov** assumption.\n\nThe probability of a sentence can be calculated using the chain rule of probability together with the bigram\nassumption:\n$$ P(<s>\\ i\\ want\\ english\\ food\\ </s>) = P(i|<s>)P(want|i)P(english|want)P(food|english)P(</s>|food) =\\ ... $$ Such a\ntechnique is able to capture e.g. cultural things - people more often look for Chinese food than English. Language\nmodels are always computed in log format - log probabilities. Why? Probabilities always fall between 0 and 1, and when\nmultiplying many small floats you end up with numerical underflow; using logarithms you get numbers that are not as\nsmall.
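\n\nA quick illustration of the underflow problem (my own example):\n\n```python\nimport math\n\nprobs = [1e-4] * 100  # one hundred word probabilities of 1/10000\n\nproduct = 1.0\nfor p in probs:\n    product *= p\nprint(product)  # 0.0 - the true value 1e-400 underflows a float64\n\nlog_prob = sum(math.log(p) for p in probs)\nprint(log_prob)  # about -921.03, perfectly representable\n```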
\n\n*Evaluating Language Models*\n\nThe best way to evaluate the performance of a language model is to embed it in an application and measure how much the\napplication improves - **extrinsic evaluation**. However, this technique requires running multiple models in order to\nmeasure the improvement. A better approach is **intrinsic evaluation** - the standard approach from ML: a training set\nand a validation (unseen) set. The better the predictions on the test set, the better the model. Samples from the test\nset cannot appear in the training set - that would introduce bias: probabilities get too high (unreliable), causing huge\ninaccuracies in perplexity, a probability-based metric. If a particular test set is used too often, we implicitly tune\nto its characteristics.\n\n*Perplexity*\n\nPP for short, a metric used for evaluating language models. The perplexity of a test set is the inverse probability of\nthe test set, normalised by the number of words. Minimising perplexity is equivalent to maximising the test set\nprobability according to the language model.\n\nAnother way of thinking about perplexity: the weighted average branching factor (branching factor - the number of\npossible next words that can follow any word).\n\nThe more information the n-gram gives us about the word sequence, the lower the perplexity (unigram: 962, bigram: 170,\ntrigram: 109).\n\nAn intrinsic improvement in perplexity does not guarantee an extrinsic improvement in performance. In other words: just\nbecause some metric shows your model is great, it does not mean it will do great in real life. Perplexity should be\nconfirmed by an end-to-end evaluation on a real task.\n\n*Generalisation and zeros*\n\nAn n-gram model is highly dependent on its training corpus, and it does a better job as we increase *n*. You need to use\nsimilar genres for training - Shakespearean English is far different from the WSJ's English. To build a model for\ntranslating legal documents you need to train it on legal documents; to build a question answering system, you need to\nuse questions for training. It is important to use appropriate dialects and varieties (African American Language,\nNigerian English, ...).\n\nZeros: Imagine you trained a model on a corpus with \"denied the: allegations, speculation, rumours, report\", but in the\ntest you check phrases like \"denied the: offer, loan\" - the model would estimate the probability as 0:\n$$ P(offer|denied\\ the) = 0 $$ This is bad... if you want to calculate perplexity, you would need to divide by zero.\nWhich is kinda problematic.\n\nSo what about words we haven't seen before (open vocabulary -> out-of-vocabulary words / unknown words)? Add a pseudo\nword `<UNK>`. You can use this tag to replace all the words that occur fewer than some small number *n* times.\n\n*Smoothing* (discounting) - the process of shaving off a bit of probability mass from some more frequent events and\ngiving it to events we have never seen. There are a variety of ways to do smoothing:\n\n- Laplace Smoothing (add-one smoothing) - adds 1 to all bigram counts before we normalise them into probabilities. So\n  all the counts that used to be 0 become 1, 1 becomes 2, ... This method is not used in state-of-the-art solutions.\n  Can be treated as a baseline.\n- Add-k smoothing - instead of adding 1, we add a fractional count, e.g. 0.5, 0.05, 0.01, ... Useful for some\n  applications but still does not perform perfectly.
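\n\nFor reference, add-one smoothing for bigrams (V - vocabulary size):\n$$ P_{Laplace}(w_n|w_{n-1}) = \\dfrac{C(w_{n-1}w_n) + 1}{C(w_{n-1}) + V} $$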
\n\nBackoff - we can use available knowledge: if you need to compute a trigram, maybe a bigram can help you with that, or\neven a unigram. Sometimes this might be sufficient.\n\nInterpolation - mix the probability estimates from all the n-gram estimators\n\n*Kneser-Ney Smoothing* - the most commonly used method. It uses the following observation: \"words that have appeared in\nmore contexts in the past are more likely to appear in some new context as well\". The best performing method is a\nmodified Kneser-Ney Smoothing.\n\n*Huge Language Models and Stupid Backoff*\n\nGoogle open-sourced their Web 1 Trillion 5-gram corpus, and also released Google Books Ngrams. There is also COCA.\n\nStupid backoff - an algorithm for a language model that gives up the idea of trying to make the model a true probability\ndistribution - no discounting. If a higher-order n-gram has a zero count, we simply back off to a lower-order n-gram.\nThis algorithm does not produce a probability distribution.\n\n## Chapter 4: Naive Bayes and Sentiment Classification\n\nMany problems can be viewed as classification problems: text categorisation, sentiment analysis, language\nidentification, authorship attribution, period disambiguation, tokenisation, and many more. The goal is to take a\nsample, extract features and classify the observation.\n\n*Naive Bayes Classifiers*\n\nClassifiers that make a simplified (naive) assumption about how the features interact.\n\nBinary Multinomial Naive Bayes (binary NB) - used for sentiment analysis; clip the word counts in each document at 1\n(extract the unique words from the document and count each occurrence once).\n\nHow to deal with negations? I really like this movie (positive), I don't like this movie (negative). A very simple,\ncommonly used baseline: during text normalisation prepend the prefix *NOT_* to every word after a token of logical\nnegation.\n\n````\ni didn't like this movie , but ... -> i didn't NOT_like NOT_this NOT_movie , but ...\n````\n\nChapter 16 will tell more about parsing and the relationships between negations.\n\nSentiment lexicons - lists of words that are pre-annotated with positive or negative sentiment. Popular lexicons:\nGeneral Inquirer or LIWC. For Naive Bayes you can add a feature \"this word occurs in the positive lexicon\" instead of\ncounting each word separately. Chapter 20 will tell how lexicons can be learned automatically, and other use cases\nbesides sentiment analysis will be shown.\n\nSpam detection - Naive Bayes + regex + HTML scan\n\nLanguage identification - Naive Bayes, but not on words! Use character n-grams.\n\nNaive Bayes can be viewed as a language model.
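\n\nA toy binary-NB sentiment classifier in scikit-learn (my own toy data; `binary=True` clips the counts at 1 as described\nabove):\n\n```python\nfrom sklearn.feature_extraction.text import CountVectorizer\nfrom sklearn.naive_bayes import MultinomialNB\n\ndocs = ['I really like this movie', 'great fun , great plot',\n        \"I don't like this movie\", 'boring and predictable']\nlabels = ['pos', 'pos', 'neg', 'neg']\n\nvectorizer = CountVectorizer(binary=True)  # clip word counts at 1\nX = vectorizer.fit_transform(docs)\n\nmodel = MultinomialNB()\nmodel.fit(X, labels)\nprint(model.predict(vectorizer.transform(['really great movie'])))\n```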
\n\n*Evaluation*\n\nConfusion matrix - a table for visualising how an algorithm performs with respect to the human *gold labels*\n(human-labelled data). It has 2 dimensions - system output and gold labels.\n\nAccuracy - what percentage of all observations our system labelled correctly; doesn't work well for unbalanced classes -\ne.g. with 80 *negative* examples and 20 *positive*, learn to always answer *negative* and you have 80% *accuracy*.\n\nPrecision - the percentage of the items that the system detected that are in fact positive.\n\nRecall - the percentage of the items actually present in the input that were correctly identified by the system.\n\nF-measure - combines both metrics - the weighted harmonic mean of precision and recall - a conservative metric, closer\nto the minimum of the two values (compared to the arithmetic mean).\n\n*Evaluating with more than two classes*\n\nMacro-averaging - compute the performance of each class and then average over classes. Better reflects the statistics of\nthe smaller classes, so it is more appropriate when performance on all the classes is equally important.\n\nMicro-averaging - collect decisions for all classes into a single confusion matrix and then compute precision and recall\nfrom that table. Can be dominated by the more frequent class.\n\n*Test sets and Cross-validation*\n\nCross-validation - when your dataset is not large enough, you can use all of it for training and validating by using\ncross-validation: the process of selecting random training and validation sets, training the classifier, computing the\nerror, and then repeating it again - usually 10 times.\n\n*Statistical Significance Testing*\n\nWe often need to compare the performance of two systems. How can we know one system is better than another?\n\n*Effect size* - the difference between F1-scores.\n\n*Null hypothesis* - we suppose *delta <= 0* (the new system is not better); we would like to know if we can confidently\nrule out this hypothesis. In order to do this, create a random variable *X* ranging over all test sets and ask: how\nlikely is it, if the null hypothesis is correct, that among these test sets we would encounter the value of *delta* that\nwe found. This likelihood is called the *p-value*. We select a threshold - usually small; if we can reject the *null\nhypothesis*, we can say A is better than B - the difference is *statistically significant*.\n\n*Avoiding harms in classification*\n\nRepresentational harms - a system perpetuating negative stereotypes about social groups.\n\nToxicity detection - hate speech, abuse, harassment detection. These systems can cause harm themselves, for example by\nflagging sentences that merely mention minorities.\n\nSystems based on stereotypes can lead to censorship. Also, human-labelled data can be biased.\n\nIt is important to include a *model card* when releasing a system. A model card includes: training algorithms and\nparameters, data sources, intended users and use, model performance across different groups.\n\n## Chapter 5: Logistic Regression\n\nLogistic regression - one of the most important analytic tools in the social and natural sciences. The baseline\nsupervised machine learning algorithm for classification. A neural network can be seen as a series of logistic\nregression classifiers stacked on top of each other. This is a discriminative classifier (unlike Naive Bayes - a\ngenerative classifier - which you can literally ask how, for example, a dog or a cat looks; a discriminative model\nlearns only how to distinguish the classes, e.g. given a training set with collar-wearing dogs and cats, when you ask\nthe model what it knows about cats it would respond: a cat doesn't wear a collar).\n\nClassification: *The Sigmoid*\n\nThe sigmoid function takes a real value (even x -> infinity) and maps it to the range [0, 1]. It is nearly linear\nnear 0. This is extremely useful for calculating e.g. *P(y=1|x)* - the probability of belonging to the class.
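\n\nFor reference, the sigmoid itself:\n$$ \\sigma(z) = \\dfrac{1}{1 + e^{-z}} $$\n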
$$ z = w \\cdot x + b $$\n\n$$ P(y=1) = \\sigma(z)\n$$\n\n*z* - the weighted sum of the feature vector *x* plus a bias; it ranges from *-inf* to *+inf*.\n\nLogistic regression can be used for all sorts of NLP tasks, e.g. period disambiguation (deciding if a period is the end\nof a sentence or part of a word).\n\n*Designing features* - features are generally designed by examining the training set with an eye to linguistic\nintuitions.\n\n*Representation learning* - ways to learn features automatically in an unsupervised way from the input.\n\n*Choosing a classifier* - Logistic Regression is great at finding correlations.\n\n*Loss / cost function* - the distance between the system output and the gold output. Gradient descent - an optimisation\nalgorithm for updating the weights. It is a method that finds a minimum of a function by figuring out in which direction\nthe function's slope is rising most steeply and moving in the opposite direction. *θ* - the parameters to be learned; in\nthe case of logistic regression, *θ* = (weights, bias).\n\n*Convex function* - a function with one minimum. No local minima to get stuck in. Local minima are a problem in training\nneural networks - non-convex functions.\n\n*Learning rate* - the magnitude of the amount to move in gradient descent (a hyper-parameter).\n\n*Hyper-parameters* - special parameters chosen by the algorithm designer that affect how the algorithm works.\n\n*Batch training* - we compute the gradient over the entire dataset, which is quite expensive. An alternative is\n*mini-batch* training: we train on a group of *m* examples (512 or 1024).\n\n*Regularisation* - a good model should generalise well; there is a problem of overfitting when the model fits the\ntraining data too perfectly. It is possible to add regularisation - L1 (lasso regression) and L2 (ridge regression).\n\n*Multinomial logistic regression* (*softmax* regression) - for classification problems with more than 2 classes. The\nmultinomial logistic classifier uses a generalisation of the sigmoid function called the softmax function.\n\n*Model interpretation* - often we want to know more than just the result of the classification; we want to know why the\nclassifier made a certain decision. Logistic regression is interpretable.\n\n## Chapter 6: Vector Semantics and Embeddings\n\n*Distributional hypothesis* - the link between similarity in how words are distributed and similarity in what they\nmean.\n\n*Lemma / citation form* - the basic form of a word. *Wordform* - an inflected lemma. A lemma can have multiple meanings,\ne.g. mouse might refer to a rodent or to a pointer; each of these is called a word sense. Lemmas can be polysemous (have\nmultiple senses), and this makes interpretation difficult. Word sense disambiguation - the task of determining which\nsense of a word is being used in a particular context.\n\n*Synonyms* - two words are synonymous if they are substitutable - they have the same propositional meaning.\n\n*Principle of contrast* - a difference in linguistic form is always associated with some difference in meaning, e.g.:\nwater / H2O - H2O is rather used in a scientific context.\n\n*Word similarity* - *cat* is not a synonym of *dog*, but these are 2 similar words. There are many human-labelled\ndatasets for this.\n\n*Word relatedness* - (or association) e.g.: *coffee* is not similar to *cup* - they share few features - but they are\nvery related (associated), they co-occur. A very common kind of relatedness is the semantic field, e.g.: *surgeon,\nscalpel, nurse, hospital*.
Semantic fields are related to topic models like LDA - Latent Dirichlet Allocation -\nunsupervised learning on large sets of texts to induce sets of associated words from text. There are more relations\nbetween words: hypernymy, antonymy or meronymy.\n\n*Semantic Frames and Roles* - a set of words that denote perspectives or participants in a particular type of event,\ne.g.: *Ling sold the book to Sam* - a seller / buyer relation. An important problem in question answering.\n\n*Connotation* - affective meaning - emotions, sentiment, opinions or evaluations.\n\n*Sentiment* - valence - the pleasantness of the stimulus; arousal - the intensity of emotion provoked by the stimulus;\ndominance - the degree of control exerted by the stimulus. In 1957 Osgood used these 3 values to represent a word - a\nrevolutionary idea! A word embedded in 3D space.\n\n*Vector semantics*. A word's meaning can be defined by its distribution in language - use the neighbouring words. The\nidea of vector semantics is to represent a word as a point in a multidimensional semantic space (a word embedding) that\nis derived from the distributions of the word's neighbours.\n\n*Information retrieval* - the task of finding the document *d* from the *D* documents in some collection that best\nmatches a query *q*.\n\n*Cosine* - a similarity metric between 2 words (the angle between 2 vectors)\n\n*TF-IDF* - raw frequencies are not the best way to measure the association between words (a lot of noise from words like\n*the, it, they, ...*). Term Frequency - the frequency of a word *t* in document *d*. The second factor (inverse document\nfrequency) gives higher weights to words that occur only in a few documents.\n\n*PMI* - Pointwise Mutual Information - measures how often 2 events occur together, compared with what we would expect if\nthey were independent. A useful tool whenever we need to find words that are strongly associated. It is more common to\nuse PPMI. Very rare words tend to have very high PMI.\n\n*Word2vec* - dense word embeddings; the intuition of word2vec is that instead of counting how often each word *w* occurs\nnear word *u*, we train a classifier on a binary classification task: \"Is word *w* likely to show up near word *u*?\". We\ncan use running text as training data - this is called self-supervised training.\n\nVisualising embeddings - visualise the meaning of a word embedded in space by listing the most similar words, by\nclustering algorithms, and by the most important method - dimensionality projection, e.g. t-SNE.\n\n*First-order co-occurrence / Syntagmatic association* - if words are near each other, e.g. *wrote* and *book*.\n\n*Second-order co-occurrence / Paradigmatic association* - if words have similar neighbours, e.g. *wrote*, *said*\n\n*Representational harm*. Embeddings are capable of capturing bias and stereotypes. Moreover, they are capable of\namplifying bias.\n\n## Chapter 7: Neural Networks and Neural Language Models\n\nNeural networks share much of the same mathematics as logistic regression, but NNs are more powerful classifiers than\nlogistic regression. Neural networks can automatically learn useful representations of the input.\n\n*Unit* - takes a set of real-valued numbers as input, performs some computation on them and produces an output: a\nweighted sum of the inputs plus a bias. The output of this function is called an activation. $$ y = a = f(z) = f(w \\cdot x +\nb)\n$$\n*f* - e.g. sigmoid, tanh, ReLU. The sigmoid is most commonly used for teaching. Tanh is almost always better than the\nsigmoid. ReLU (rectified linear unit) - the most commonly used and the simplest.
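\n\nA toy sketch of a single unit with these three activations (my own numbers):\n\n```python\nimport numpy as np\n\ndef unit(x, w, b, f):\n    # weighted sum of the inputs plus a bias, passed through activation f\n    return f(np.dot(w, x) + b)\n\ndef sigmoid(z):\n    return 1 / (1 + np.exp(-z))\n\nx = np.array([0.5, 0.6, 0.1])\nw = np.array([0.2, 0.3, 0.9])\nb = 0.5\n\nfor f in (sigmoid, np.tanh, lambda z: np.maximum(0.0, z)):  # sigmoid, tanh, ReLU\n    print(unit(x, w, b, f))\n```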
\n\n*The (famous) XOR problem* - Minsky proved it is not possible to build a perceptron (a very simple neural unit that has\na binary output and does not have a non-linear activation function) to compute logical XOR. However, it can be computed\nusing a layered neural network.\n\n*Feed-Forward Neural Network*. A multi-layer network whose units are connected without cycles. Sometimes called\nmulti-layer perceptrons for historical reasons, though modern networks aren't perceptrons (they aren't linear). A simple\nFFNN has 3 types of nodes: input units, hidden units and output units. The core of the neural network is the hidden\nlayer, formed of hidden units. The standard architecture is that each layer is fully connected - each unit in each layer\ntakes all the outputs from the previous layer.\n\nThe purpose of learning is to learn the weights and bias of each layer. *Loss function* - the distance between the\nsystem output and the gold output, e.g. cross-entropy loss. To find the parameters that minimise this loss function, we\nuse for example *gradient descent*. Gradient descent requires knowing the gradient of the loss function with respect to\neach of the parameters. The solution for computing this gradient is error back-propagation.\n\nLanguage modeling - predicting upcoming words from prior word context - neural networks are perfect for this task. Much\nbetter than *n-gram* models - better generalisation, higher accuracy; on the other hand - much slower to train.\n\n## Chapter 8: Sequence Labeling for Parts of Speech and Named Entities\n\n*Named entity* - e.g. Marie Curie, New York City, Stanford University, ... important for many natural language\nunderstanding tasks (e.g. sentiment towards a specific product, question answering). Generally speaking, anything that\ncan be referred to with a proper name (person, location, organisation). Possible output tags: PER (person), LOC\n(location), ORG (organisation) and GPE (geopolitical entity).\n\n*POS / Part of Speech* - knowing if a word is a noun or a verb tells us about likely neighbouring words. They fall into\n2 categories: closed class and open class. POS-tagging is the process of assigning a part-of-speech to each word in a\ntext. Tagging is a disambiguation task. Words are ambiguous - one can have more than one POS, e.g. book a flight, hand\nme that book, ... The goal is to resolve these ambiguities. The accuracy of POS-tagging algorithms is very high: +97%.\nMost Frequent Class Baseline - an effective baseline method: assign each token to the class that occurs most often in\nthe training set.\n\nMarkov chain - a model that tells us about the probabilities of sequences of random variables. A Markov chain makes a\nvery strong assumption - if you want to predict the future, all that matters is the current state. Formally, a Markov\nchain is specified by: a set of *N* states, a transition probability matrix and an initial probability distribution.\n\nThe Hidden Markov Model - allows talking about both observed events (words seen in the input) and hidden events\n(part-of-speech tags). Formally, an HMM is specified by: a set of *N* states, transition probabilities, observations,\nobservation likelihoods / emission probabilities (the probability of an observation being generated from a state *q*)\nand an initial probability distribution.\n\nAn HMM is a useful and powerful model, but needs a number of augmentations to achieve high accuracy.
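\n\nA toy Markov chain sketch (my own numbers, in the spirit of the book's weather example):\n\n```python\nimport numpy as np\n\nstates = ['HOT', 'COLD', 'WARM']\nP = np.array([[0.6, 0.1, 0.3],   # transition probabilities from HOT\n              [0.1, 0.8, 0.1],   # from COLD\n              [0.3, 0.1, 0.6]])  # from WARM\npi = np.array([0.5, 0.2, 0.3])   # initial probability distribution\n\nrng = np.random.default_rng(0)\nstate = rng.choice(3, p=pi)\nsequence = [states[state]]\nfor _ in range(5):\n    # the next state depends only on the current one\n    state = rng.choice(3, p=P[state])\n    sequence.append(states[state])\nprint(sequence)\n```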
A CRF (Conditional Random Field) is a log-linear\nmodel that assigns a probability to an entire output sequence. We can think of a CRF as a giant version of what\nmultinomial logistic regression does for a single token.\n\nGazetteer - a list of place names; millions of entries for locations with detailed geographical and political\ninformation, e.g. https://www.geonames.org/\n\nPOS tags are evaluated by accuracy. NER is evaluated using recall, precision and F1.\n\nNamed Entity Recognition is often based on rule-based approaches.\n\n## Chapter 9: Deep Learning Architectures for Sequence Processing\n\nLanguage is an inherently temporal phenomenon. This is hard to capture using standard machine learning models.\n\n*Perplexity* - a measure of model quality; the perplexity of a model with respect to an unseen test set is the inverse\nof the probability the model assigns to it, normalised by its length.\n\n*RNN - Recurrent Neural Network* - any network that contains a cycle within its network connections; any network where\nthe value of a unit is directly or indirectly dependent on its own earlier outputs as an input. Within RNNs there are\nconstrained architectures that have proven to be extremely effective.\n\n*Elman Networks / Simple Recurrent Networks* - a very useful architecture which also serves as the basis for more\ncomplex approaches like the LSTM (Long Short-Term Memory). An RNN can be illustrated as a feedforward network. A new set\nof weights connecting the hidden layer from the previous time step to the current hidden layer determines how the\nnetwork makes use of past context in calculating the output for the current input.\n\nRNN-based language models process sequences a word at a time, attempting to predict the next word in a sequence by using\nthe current word and the previous hidden state as inputs.\n\nRNNs can be used for many other tasks:\n\n- sequence labeling - the task is to assign a label chosen from a small fixed set of labels to each element of a\n  sequence (e.g. POS tagging or named entity recognition). The inputs to the RNN are word embeddings and the outputs\n  are tag probabilities generated by a softmax layer.\n- sequence classification - e.g. sentiment analysis, spam detection, message routing for customer support applications.\n\nStacked RNN - multiple networks where the output of one layer serves as the input to a subsequent layer. They very often\noutperform single-layer networks, mainly because stacked layers are able to capture different levels of abstraction\nacross layers. The optimal number of layers is application-dependent.\n\nBidirectional RNN = forward and backward networks combined. These are 2 independent networks in which the input is\nprocessed from the start to the end and from the end to the start. Also very effective for sequence classification.\n\nIt is difficult to train RNNs for tasks that require the network to make use of information distant from the current\npoint of processing. RNNs cannot carry critical information forward because the information encoded in the hidden\nlayers tends to be fairly local.\n\nLSTM - Long Short-Term Memory - divides the context management problem into two sub-problems:\n\n- removing information no longer needed from the context\n- adding information likely to be needed for later decision-making\n\nThe LSTM is capable of mitigating the loss of distant information. However, LSTMs are still RNNs, so relevant\ninformation can be lost.\n\nTransformers - an approach to sequence processing that eliminates recurrent connections and returns to architectures\nreminiscent of fully connected networks. Transformers are made up of stacks of network layers consisting of simple\nlinear layers, feedforward networks and custom connections.\n\nTransformers use *self-attention layers* - they allow the network to directly extract and use information from\narbitrarily large contexts without the need to pass it through intermediate recurrent connections as in RNNs.\n\nAt the core of an attention-based approach is the ability to compare an item of interest to a collection of other items\nin a way that reveals their relevance in the current context.
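\n\nA minimal, simplified self-attention sketch (single head, no learned projections - my own simplification):\n\n```python\nimport numpy as np\n\ndef self_attention(X):\n    # compare every position to every other position\n    scores = X @ X.T / np.sqrt(X.shape[1])\n    weights = np.exp(scores)\n    weights /= weights.sum(axis=1, keepdims=True)  # softmax over positions\n    return weights @ X  # each output is a weighted sum over the whole context\n\nX = np.random.default_rng(0).normal(size=(4, 8))  # 4 tokens, dimension 8\nprint(self_attention(X).shape)  # (4, 8)\n```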
\n\nIt turns out language models can generate toxic language. Many models are trained on data from Reddit (a majority of\nyoung males - not representative). A language model can also leak information about its training data - meaning it can\nbe attacked.\n\n## Chapter 10\n\nMissing chapter.\n\n## Chapter 11: Machine Translation and Encoder-Decoder Models\n\nMachine translation - the use of computers to translate from one language to another. The most common use of machine\ntranslation is information access - when you want to, for example, translate some instructions on the web. It is also\noften used in CAT - Computer-Aided Translation, where the computer produces a draft translation and then a human fixes\nit in post-editing. Last but not least, it is useful for human communication needs.\n\nThe standard algorithm for MT is the encoder-decoder network (can be implemented with RNNs or with Transformers). These\nnetworks are extremely successful at catching small differences between languages.\n\nSome aspects of human language seem to be universal - true for every or almost every language; for example, every\nlanguage has words for referring to people, eating or drinking. However, languages also differ in many ways, which\ncauses translation divergences.\n\nGerman, French, English and Mandarin are all SVO (Subject-Verb-Object) languages. Hindi and Japanese are SOV languages.\nIrish and Arabic are VSO languages. VO languages generally have prepositions, OV languages generally have\npostpositions.\n\nThe Machine Translation and Word Sense Disambiguation problems are closely linked.\n\nEncoder-decoder (sequence-to-sequence) networks are models capable of generating contextually appropriate,\narbitrary-length output sequences. An encoder (LSTM, GRU, convolutional networks, Transformers) takes an input sequence\nand creates a contextualised representation of it; this representation is then passed to a decoder (any kind of\nsequence architecture) which generates a task-specific output sequence.\n\nMachine translation raises many of the same ethical issues that we have discussed previously. MT systems often assign\ngender according to cultural stereotypes. Some research found that MT systems perform worse when they are asked to\ntranslate sentences that describe people with non-stereotypical gender roles.\n\n## Chapter 12: Constituency Grammars\n\nSyntactic constituency is the idea that groups of words can behave as single units.\n\nThe most widely used formal system for modeling constituent structure in English is the Context-Free Grammar, also\ncalled Phrase-Structure Grammar; the formalism is equivalent to Backus-Naur Form (BNF). A context-free grammar consists\nof a set of rules or productions, each of which expresses the ways that symbols of the language can be grouped and\nordered together.\n\nTreebank - a corpus in which every sentence is annotated with a parse tree.\n\n## Chapter 13-16\n\nSkipped for now.\n\n## Chapter 17: Information Extraction\n\nInformation extraction - turns the unstructured information embedded in texts into structured data, e.g.
a relational database, to enable further processing.\n\nRelation extraction - finding and classifying semantic relations among the text entities. These are often binary\nrelations - child-of, employment, part-whole. The task of NER is extremely useful here. Wikipedia also offers a large\nsupply of relations.\n\nRDF - Resource Description Framework - a tuple of entity-relation-entity. DBpedia was derived from Wikipedia and\ncontains over 2 billion RDF triples. Freebase - part of Wikidata, has relations between people and their nationality or\nlocations.\n\nThere are 5 main classes of algorithms for relation extraction:\n\n- handwritten patterns - high-precision and can be tailored to specific domains, however low recall and a lot of work\n- supervised machine learning - for all entity pairs, determine if they are in a relation\n- semi-supervised machine learning (via bootstrapping and via distant supervision) - bootstrapping proceeds by taking\n  the entities in the seed pair and then finding sentences that contain both entities\n- unsupervised\n\nFor unsupervised and semi-supervised approaches it is possible to calculate estimated metrics (like estimated\nprecision).\n\nKnowledge graphs - datasets of structured relational knowledge.\n\nEvent extraction - the task of identifying mentions of events in texts. In English most events correspond to verbs and\nmost verbs introduce events (United Airlines SAID, prices INCREASED, ...). Some noun phrases can also denote events\n(the increase, the move, ...).\n\nWith extracted events and extracted temporal expressions, events from a text can be put on a timeline. Determining the\nordering can be viewed as a binary relation detection and classification task.\n\nEvent coreference - needed to figure out which event mentions in a text refer to the same event.\n\nExtracting time - temporal expressions are used to determine when the events in a text happened. Dates in text need to\nbe normalised.\n\n- relative: yesterday, next semester\n- absolute: a date\n- durations\n\nThe temporal expression task consists of finding the start and the end of all the text spans that correspond to such\ntemporal expressions. Such a task can use a rule-based approach.\n\nTemporal Normalisation - the process of mapping a temporal expression to either a specific point in time or to a\nduration.\n\nTemplate filling - the task of describing stereotypical or recurring events.\n\n## Chapter 18: Word Senses and WordNet\n\nAmbiguity - the same word can be used to mean different things. Words can be polysemous - have many meanings.\n\nA word sense is a discrete representation of one aspect of the meaning of a word. Meaning can be expressed as an\nembedding, for example an embedding that represents the meaning of a word in its textual context. An alternative to\nembeddings are glosses - written for people; a gloss is just a sentence, and a sentence can be embedded. Another way of\ndefining a sense is through relationships (\"right\" is the opposite of \"left\").
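\n\nWordNet (described just below) ships with NLTK, which makes it easy to look at senses and their glosses (assumes the\n`wordnet` corpus has been downloaded):\n\n```python\nfrom nltk.corpus import wordnet as wn\n\nfor synset in wn.synsets('mouse')[:3]:\n    print(synset.name(), '-', synset.definition())  # the gloss\n    print('  synonyms:', [lemma.name() for lemma in synset.lemmas()])\n```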
\n\nRelations between senses:\n\n- synonymy - when two senses of two different words are (almost) identical - couch / sofa, vomit / throw up\n- antonymy - when two words have opposite meanings - long / short, fast / slow\n- hyponymy / subordination - when one word is more specific than the other - car (hyponym) -> vehicle\n- hypernymy / superordination - when one word is more general than the other - vehicle (hypernym) -> car\n- meronymy - when one word describes a part of the other - wheel (meronym) -> car\n- holonymy - the opposite of meronymy - car (holonym) -> wheel\n- metonymy - the use of one aspect of a concept to refer to other aspects of the entity - Jane Austen wrote Emma (the\n  author) <-> I really love Jane Austen (the works of the author)\n\nWordNet - a large online thesaurus, a database that represents word senses. WordNet also represents relations between\nsenses (is-a, part-whole). The relation between two senses is important in language understanding, for example\nantonymy - words with opposite meanings.\n\nThe English WordNet has 3 separate databases (one for nouns, one for verbs, one for adjectives and adverbs).\n\nSynset (Synonym Set) - the set of near-synonyms for a WordNet sense. Glosses are properties of a synset.\n\nWord Sense Disambiguation - the task of determining which sense of a word is being used in a particular context. WSD\nalgorithms take as input some word and its context and output the correct word sense.\n\nLexical sample tasks - a small pre-selected set of target words and an inventory of senses. The all-words task (a harder\nproblem) - the system is given an entire text and a lexicon with an inventory of senses for each entry, and we have to\ndisambiguate every word in the text.\n\nThe best WSD algorithm is a simple 1-nearest-neighbour algorithm using contextual word embeddings.\n\nThere are also feature-based algorithms for WSD - POS tags, n-grams (3-grams most commonly used), weighted averages of\nembeddings - passed to an SVM classifier\n\nThe Lesk algorithm - the oldest and most powerful knowledge-based WSD method and a useful baseline. Lesk is a family of\nalgorithms that choose the sense whose dictionary gloss or definition shares the most words with the target word's\nneighbourhood.\n\nBERT - uses contextual embeddings.\n\nWord Sense Induction - an unsupervised approach: we don't use human-defined word senses; instead, the set of senses of\neach word is created automatically from the instances of each word in the training set.\n\n## Chapter 19\n\nSkipped for now.\n\n## Chapter 20: Lexicons for Sentiment, Affect and Connotation\n\nConnotation - the aspects of a word's meaning that are related to a writer's or reader's emotions, sentiment, opinions\nor evaluations.\n\nEmotion - (per Scherer) a relatively brief episode of response to the evaluation of an external or internal event as\nbeing of major significance.
Detecting emotions has the potential to improve a number of language processing tasks - detecting emotions in reviews,\nimproving conversation systems, depression detection.\n\nBasic emotions proposed by Ekman - surprise, happiness, anger, fear, disgust, sadness\n\nBasic emotion pairs proposed by Plutchik - joy-sadness, anger-fear, trust-disgust, anticipation-surprise.\n\nMost models include 2-3 dimensions:\n\n- valence - the pleasantness of the stimulus\n- arousal - the intensity of emotion provoked by the stimulus\n- dominance - the degree of control exerted by the stimulus\n\nThe General Inquirer - the oldest lexicon, with 1915 positive words and 2291 negative words. The NRC Valence, Arousal\nand Dominance lexicon scores 20 000 words (this model assigns valence, arousal and dominance). The NRC Word-Emotion\nAssociation Lexicon uses Plutchik's basic emotions to describe 14 000 words. There are many more lexicons.\n\nBest-worst scaling - a method used in crowdsourcing: annotators are given N items and are asked which item is the best\nand which is the worst.\n\nDetecting a person's personality from their language can be useful for dialog systems. Many theories of human\npersonality are based around a small number of dimensions:\n\n- extroversion vs introversion - sociable, assertive vs aloof, reserved, shy\n- emotional stability vs neuroticism - calm, unemotional vs insecure, anxious\n- agreeableness vs disagreeableness - friendly, cooperative vs antagonistic, fault-finding\n- conscientiousness vs unconscientiousness - self-disciplined, organised vs inefficient, careless\n- openness to experience - intellectual, insightful vs shallow, unimaginative\n\nConnotation frames - express richer relations of affective meaning that a predicate encodes about its arguments -\nCountry A violated the sovereignty of Country B.\n\n## Chapter 21-22\n\nSkipped for now.\n\n## Chapter 23: Question Answering\n\nTwo major paradigms of question answering:\n\n- information retrieval-based\n- knowledge-based\n\nFactoid questions - questions that can be answered with simple facts expressed in short texts, like: Where is the Louvre\nMuseum located?\n\nInformation retrieval. The resulting IR system is often called a search engine. Ad hoc retrieval: a user poses a query\nto a retrieval system, which then returns an ordered set of documents from some collection.\n\nThe basic IR system architecture uses the vector space model: queries and documents are mapped to vectors, then cosine\nsimilarity is used to rank the candidate documents answering the query. This is an example of the bag-of-words model.\nHowever, we don't use raw word counts in IR; instead we use TF-IDF.\n\nTF-IDF - the term frequency tells us how frequent the word is; words that occur more often are likely to be informative\nabout the document's content. However, terms that occur across all documents aren't useful. In such cases the inverse\ndocument frequency comes in handy.\n\nDocument scoring - we score document *d* by the cosine of its vector *d* with the query vector *q*: $$ score(q, d) =\ncos(q, d) = \\frac{q \\cdot d}{|q| \\cdot |d|} $$ A more commonly used version of the score (because queries are usually\nshort):\n$$ score(q, d) = \\sum_{t \\in q} \\frac{tf-idf(t,d)}{|d|} $$ A slightly more complex version of TF-IDF is called BM25.
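\n\nA toy version of this scoring pipeline (my own data, using scikit-learn's TF-IDF vectors and cosine similarity):\n\n```python\nfrom sklearn.feature_extraction.text import TfidfVectorizer\nfrom sklearn.metrics.pairwise import cosine_similarity\n\ndocs = ['the Louvre Museum is located in Paris',\n        'the Prado Museum is located in Madrid',\n        'n-gram language models assign probabilities']\nquery = ['where is the Louvre Museum located']\n\nvectorizer = TfidfVectorizer()\nD = vectorizer.fit_transform(docs)  # document vectors\nq = vectorizer.transform(query)     # query vector\nprint(cosine_similarity(q, D))      # the Louvre document ranks highest\n```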
\n\nIn the past it was common to remove high-frequency words from the query and the document. The list of such\nhigh-frequency words to be removed is called a stop list (the, a, to, ...). Worth knowing, however this is not commonly\nused nowadays because much better mechanisms exist.\n\nInverted index - given a query term, gives a list of documents that contain the term.\n\nTF-IDF / BM25 have a conceptual flaw - they work only if there is an exact overlap of words between the query and the\ndocument - the vocabulary mismatch problem. The solution is to use synonymy: instead of using word counts, use\nembeddings. Modern methods use encoders like BERT.\n\nThe goal of IR-based QA (open-domain QA) is to answer a user's question by finding short text segments from the web or\nsome document collection.\n\nDatasets:\n\n- SQuAD - Stanford Question Answering Dataset - contains passages from Wikipedia and associated questions\n- HotpotQA - a dataset created by showing crowd workers multiple context documents and asking them to come up with\n  questions that require reasoning about all the documents\n- TriviaQA - questions written by trivia enthusiasts, as question-answer-evidence triples\n- The Natural Questions - real anonymised queries to the Google search engine; annotators were presented a query along\n  with the Wikipedia page from the top 5 results\n- TyDi QA - questions from typologically diverse languages\n\nEntity linking - the task of associating a mention in text with the representation of some real-world entity in an\nontology (e.g. Wikipedia).\n\nKnowledge-based question answering - the idea of answering a question by mapping it to a query over a structured\ndatabase.\n\nRDF triples - tuples of 3 elements: subject, predicate and object, e.g. (Ada Lovelace, birth-year, 1815). This can be\nused to perform queries: \"When was Ada Lovelace born?\" - birth-year(Ada Lovelace, ?).\n\nA second kind uses a semantic parser to map the question to a structured program to produce an answer.\n\nAnother alternative is to query a pretrained model, forcing the model to answer a question solely from information\nstored in its parameters.\n\nT5 is an encoder-decoder architecture; in pretraining it learns to fill in masked spans of text by generating the\nmissing spans in the decoder.
It is then fine-tuned on QA datasets, given the question without any additional context or\npassages.\n\nWatson DeepQA - the system from IBM that won Jeopardy! Its main stages: Question Processing, Candidate Answer\nGeneration, Candidate Answer Scoring, and Answer Merging and Confidence Scoring.\n\nMRR - mean reciprocal rank - a common evaluation metric for factoid question answering\n\n## Chapter 24: Chatbots & Dialogue Systems\n\nProperties of Human Conversation:\n\n- turns - a dialogue is a sequence of turns; turn structure has important implications for spoken dialogue - a system\n  needs to know when to stop talking and also needs to know when the user is done speaking\n- speech acts:\n    - constatives - committing the speaker to something's being the case (answering, claiming, denying, confirming,\n      disagreeing)\n    - directives - attempts by the speaker to get the addressee to do something (advising, asking, forbidding, inviting,\n      ordering, requesting)\n    - commissives - committing the speaker to some future course of action (promising, planning, vowing, betting,\n      opposing)\n    - acknowledgments - express the speaker's attitude regarding the hearer with respect to some social action\n      (apologising, greeting, thanking, accepting)\n- grounding - acknowledging that the hearer has understood the speaker (like ACK in TCP); humans do this all the time,\n  for example by saying OK\n- sub-dialogues and dialogue structure:\n    - questions set up an expectation for an answer, proposals are followed by acceptance / rejection, ...\n    - the first part of an adjacency pair isn't always followed immediately by its second part; they can be separated\n      by a side sequence (or sub-dialogue) - a correction sub-dialogue, a clarification question or a presequence (Can\n      you make train reservations? Yes I can. Please, do ...)\n- initiative - sometimes a conversation is completely controlled by one participant; for humans it is more natural that\n  initiative shifts from one person to another\n- inference - the speaker provides some information, and other information needs to be derived from it (When in May do\n  you want to travel? I have a meeting from the 12th to the 15th.)\n\nBecause of these characteristics of human conversation, it is difficult to build dialogue systems that can carry on\nnatural conversations.\n\nChatbots - the simplest form of dialogue systems. Chatbots fall into 3 categories:\n\n- rule-based chatbots - for example ELIZA, based on psychological research, created in 1966 - the most important\n  chatbot. A few years later PARRY was created - this chatbot had a model of its own mental state (fear, anger, ...) -\n  the first known system to pass the Turing test (1972) - psychiatrists couldn't distinguish text transcripts of\n  interviews with PARRY from transcripts of interviews with real paranoids (!!!)\n- corpus-based chatbots - instead of using hand-built rules, mine corpora of human-human conversations. Requires\n  enormous amounts of data for training. Most methods use retrieval (grab a response from some document) or generation\n  (a language model or encoder-decoder generates the response given the dialogue context)\n- a hybrid of the 2 above\n\nTask-based dialogue - a dialogue system has the goal of helping a user solve some task like making an airplane\nreservation or buying a product.
GUS - an influential architecture from 1977\nfor travel planning.\n\nThe control architecture for frame-based dialogue systems (a frame is a kind of knowledge structure representing the\nkinds of intentions the system can extract from user sentences) is used in various modern systems like Siri, Google\nAssistant or Alexa. The system's goal is to fill the slots in the frame with the fillers the user intends, and then\nperform the relevant action for the user. To do this, the system asks questions associated with the frame. This is a\nheavily rule-based approach.\n\nSlot filling - closely tied to the tasks of domain and intent classification.\n\nIf a dialogue system misrecognizes or misunderstands an utterance, the user will generally correct the error by\nrepeating or reformulating the utterance.\n\nModern systems often ask the user to confirm or reject whether the input data is correct. Explicit confirmation\neliminates the risk of mistakes, but is awkward and increases the length of the conversation.\n\nThe system might also ask clarification questions.\n\nDialogue systems might be evaluated using different metrics, e.g. engagingness, avoiding repetition, making sense. A\ncommonly used high-level metric is called acute-eval - an annotator looks at two conversations and chooses the one in\nwhich the dialogue system participant performed better. Automatic metrics are generally not used for chatbots. However,\nthere are some attempts to train a Turing-like evaluator classifier to distinguish human-generated responses from\nmachine-generated responses.\n\nThe study of dialogue systems is closely linked with the field of Human-Computer Interaction. Ethical issues also need\nto be taken into consideration when designing such systems - a famous example is Microsoft's Tay chatbot (adversarial\nattacks). ML models amplify stereotypes and also raise privacy concerns.\n\n## Chapter 25: Phonetics\n\n## Chapter 26: Automatic Speech Recognition and Text-to-speech\n"
  },
  {
    "path": "books/peopleware.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Peopleware: Productive Projects and Teams\n\nBook by Tom DeMarco and Tim Lister\n\n- [Chapter 1: Somewhere today, a project is failing](#chapter-1-somewhere-today-a-project-is-failing)\n- [Chapter 2: Make a cheeseburger, sell a cheeseburger](#chapter-2-make-a-cheeseburger-sell-a-cheeseburger)\n- [Chapter 3: Vienna waits for you](#chapter-3-vienna-waits-for-you)\n- [Chapter 4: Quality - if time permits](#chapter-4-quality---if-time-permits)\n- [Chapter 5: Parkinson's Law revisited](#chapter-5-parkinsons-law-revisited)\n- [Chapter 6: Laetrile](#chapter-6-laetrile)\n- [Chapter 7: The Furniture Police](#chapter-7-the-furniture-police)\n- [Chapter 8: You never get anything done around here between 9 and 5](#chapter-8-you-never-get-anything-done-around-here-between-9-and-5)\n- [Chapter 9: Saving money on space](#chapter-9-saving-money-on-space)\n- [Chapter 10: Brain Time versus Body Time](#chapter-10-brain-time-versus-body-time)\n- [Chapter 11: The Telephone](#chapter-11-the-telephone)\n- [Chapter 12: Bring Back the Door](#chapter-12-bring-back-the-door)\n- [Chapter 13: Taking Umbrella Steps](#chapter-13-taking-umbrella-steps)\n- [Chapter 14: The Hornblower Factor](#chapter-14-the-hornblower-factor)\n- [Chapter 15: Let's talk about Leadership](#chapter-15-lets-talk-about-leadership)\n- [Chapter 16: Hiring a Juggler](#chapter-16-hiring-a-juggler)\n- [Chapter 17: Playing well with others](#chapter-17-playing-well-with-others)\n- [Chapter 18: Childhood's end](#chapter-18-childhoods-end)\n- [Chapter 19: Happy to be here](#chapter-19-happy-to-be-here)\n- [Chapter 20: Human Capital](#chapter-20-human-capital)\n- [Chapter 21: The Whole is greater than the sum of the Parts](#chapter-21-the-whole-is-greater-than-the-sum-of-the-parts)\n- [Chapter 22: The Black Team](#chapter-22-the-black-team)\n- [Chapter 23: Teamicide](#chapter-23-teamicide)\n- [Chapter 24: Teamicide Revisited](#chapter-24-teamicide-revisited)\n- [Chapter 25: Competition](#chapter-25-competition)\n- [Chapter 26: A spaghetti dinner](#chapter-26-a-spaghetti-dinner)\n- [Chapter 27: Open Kimono](#chapter-27-open-kimono)\n- [Chapter 28: Chemistry for Team Formation](#chapter-28-chemistry-for-team-formation)\n- [Chapter 29: The self-healing system](#chapter-29-the-self-healing-system)\n- [Chapter 30: Dancing with Risk](#chapter-30-dancing-with-risk)\n- [Chapter 31: Meetings, Monologues, and Conversations](#chapter-31-meetings-monologues-and-conversations)\n- [Chapter 32: The ultimate management sin is ...](#chapter-32-the-ultimate-management-sin-is-)\n- [Chapter 33: E(vil) Mail](#chapter-33-evil-mail)\n- [Chapter 34: Making change possible](#chapter-34-making-change-possible)\n- [Chapter 35: Organizational learning](#chapter-35-organizational-learning)\n- [Chapter 36: The making of community](#chapter-36-the-making-of-community)\n- [Chapter 37: Chaos and Order](#chapter-37-chaos-and-order)\n- [Chapter 38: Free Electrons](#chapter-38-free-electrons)\n- [Chapter 39: Holgar Dansk](#chapter-39)\n\n## Chapter 1: Somewhere today, a project is failing\n\n\"Politics\" is the most frequently cited cause of failure. \"Politics\" for people mean: communication problems, staffing\nproblems, lack of motivation, and high turnover. 
The English language provides a much more precise term - sociology.\n\n> The major problems of our work are not so much technological as sociological in nature.\n\nWe tend to focus on the technical rather than the human side of the work, because it is easier to do: new hard drive\ninstallation vs figuring out why somebody is dissatisfied with the company.\n\nA manager should concentrate on sociology, not on technology. Human interactions are complicated and never very crisp\nand clean in their effects, but they matter more than any other aspect of work.\n\n## Chapter 2: Make a cheeseburger, sell a cheeseburger\n\nThe \"make a cheeseburger, sell a cheeseburger\" mentality can be fatal in your development area:\n\n- Make the machine (the human machine) run as smoothly as possible\n- Take a hard line about people goofing off on the job\n- Treat workers as interchangeable pieces of the machine\n- Optimize the steady state\n- Standardize procedure, do everything by the book\n- Eliminate experimentation - that's what the folks at the headquarters are paid for\n\nTo manage thinking workers effectively, managers should take measures nearly opposite to those listed above:\n\n- encourage people to make some errors - ask people what dead-end roads they have been down, and make sure they\n  understand that \"none\" is not the best answer\n- you may be able to kick the people to make them active, but not to make them creative, inventive and thoughtful; there\n  is nothing more discouraging to any worker than the sense that his own motivation is inadequate and has to be\n  \"supplemented\" by the boss\n- every worker is unique, and uniqueness is what makes project chemistry vital and effective\n- the catalyst is important because the project is always in a state of flux; someone who can help a project to jell is\n  worth two people who just do work - managers pay too little attention to how well each team member fits into the\n  effort as a whole\n- workers need time for brainstorming, investigating new methods, figuring out how to avoid doing some subtasks,\n  reading, training, and just goofing off\n\n## Chapter 3: Vienna waits for you\n\nYour people are aware of the one short life that each person is allotted. There has to be something more important than\nthe silly job they are working on.\n\nOvertime - for every hour of overtime, there will be more or less an hour of undertime. The trade-off might work to\nmanagement's advantage in the short term, but in the long term it will cancel out.\n\nOvertime is like sprinting: it makes some sense for the last 100m of the marathon for those with any energy left, but if\nyou start sprinting in the first kilometer, you are wasting time.\n\nWorkaholism is an illness, but a common-cold-like one - everyone has a bout of it now and then. If a manager tries to\nexploit workaholics, they will eventually lose them.\n\nThe realization that one has sacrificed a more important value (family, love, home, youth) for a less important value\n(work) is devastating.\n\nTypical things companies do to improve productivity actually make the work less enjoyable and less interesting:\n\n- pressure people to put in more hours\n- mechanize the process of product development\n- compromise the quality of the product\n- standardize procedures\n\nNext time you hear someone talking about productivity, listen carefully to hear if the speaker uses the term \"employee\nturnover\". 
Chances are low.\n\n> People under time pressure don't work better - they just work faster.\n\nIn order to work faster, they may have to sacrifice the quality of the product and of their work experience.\n\n## Chapter 4: Quality - if time permits\n\nMan's character is dominated by a small number of basic instincts: survival, self-esteem, reproduction, territory, ...\nEven the slightest challenge to one of these built-in values can be upsetting.\n\nWe all tend to tie our self-esteem to the quality of the product we produce. Any step you take that may jeopardize the\nquality of the product is likely to set the emotions of your staff directly against you.\n\nWorkers kept under extreme time pressure will begin to sacrifice quality. They will hate what they are doing, but what\nother choice do they have?\n\n> Some of my folks would tinker forever on a task, all in the name of _QUALITY_. But the market doesn't give a damn\n> about that much quality - it is screaming for the product to be delivered yesterday and will accept it even in a\n> quick-and-dirty state\n\n**WRONG.** The builders' view of quality is very different - their self-esteem is strongly tied to the quality of the\nproduct; they tend to impose quality standards of their own.\n\n_Quality, far beyond that required by the end user, is a means to higher productivity._\n\nIn some Japanese companies, the project team has an effective power of veto over delivery of what they believe to be a\nnot-yet-ready product. No matter that the client would be willing to accept even a substandard product, the team can\ninsist that delivery wait until its own standards are achieved.\n\n## Chapter 5: Parkinson's Law revisited\n\nParkinson's Law:\n\n> Work expands to fill the time allocated for it\n\nParkinson's Law gives managers the strongest possible conviction that the only way to get work done at all is to set an\nimpossibly optimistic delivery date. Parkinson's Law almost certainly doesn't apply to your people. Treating people as\nParkinsonian workers doesn't work - it can only demean and demotivate them.\n\nParkinson's Law didn't catch on because it was so true, it caught on because it was funny.\n\nProgrammers seem to be a bit more productive after they have done the estimate themselves, compared to cases in which\nthe manager did it without even consulting them.\n\nAccording to the 1985 Jeffery-Lawrence study, projects on which the boss applied no schedule pressure whatsoever (\"Just\nwake me up when you are done\") had the highest productivity of all.\n\n## Chapter 6: Laetrile\n\nLaetrile - a colorless liquid pressed from apricot pits. It can be used in baking like any extract; in Mexico you can\nbuy it for $50 to \"cure\" fatal cancer.\n\nSimilarly, lots of managers fall into the trap of technical laetrile that purports to improve productivity.\n\nThe 7 false hopes of software management:\n\n1. There is some new trick you have missed that could send productivity soaring\n2. Other managers are getting gains of 100-200% or more\n3. Technology is moving so swiftly that you are being passed by\n4. Changing languages will give you huge gains\n5. Because of the backlog, you need to double productivity immediately\n6. You automate everything else: isn't it about time you automated away your software development staff?\n7. 
Your people will work better if you put them under a lot of pressure\n\nWhat management is:\n\n> The manager's function is not to make people work, but to make it possible for people to work.\n\n## Chapter 7: The Furniture Police\n\nThe work space given to intellectual workers is usually noisy, interruptive, un-private and sterile. Some are prettier\nthan others, but not much more functional.\n\nPolice-mentality planners design workspaces the way they would design a prison - optimized for containment at minimal\ncost.\n\nAs long as workers are crowded into noisy, sterile, disruptive space, it is not worth improving anything but the\nworkspace.\n\n## Chapter 8: You never get anything done around here between 9 and 5\n\nTo be productive, people may come in early or stay late or even try to escape entirely, by staying home for a day to\nget a critical piece of work done. Staying late or arriving early or staying home to work in peace is a damning\nindictment of the office environment.\n\nTwo people from the same organization tend to perform alike. The best performers cluster in some organizations while\nthe worst performers cluster in others.\n\nMany companies provide developers with a workplace so crowded, noisy, and interruptive as to fill their days with\nfrustration. That alone could explain reduced efficiency as well as the tendency for good people to migrate elsewhere.\n\nIf you participate in or manage a team of people who need to use their brains during the workday, then the environment\nis your business.\n\n## Chapter 9: Saving money on space\n\nIt is surprising how little the potential savings are compared to the potential risk. The entire cost of work space for\na developer is a small percentage of the salary paid to the developer - a 20:1 ratio.\n\nPeople need the space and quiet in order to perform optimally. Noise is directly proportional to density, so halving the\nallotment of space per person can be expected to double the noise.\n\nSaving money on space may be costing you a fortune.\n\n## Chapter 10: Brain Time versus Body Time\n\nIn the office: 30% of the time, people are noise sensitive, and the rest of the time, they are noise generators.\n\nEach time you are interrupted, you require an additional immersion period to get back into flow. During this immersion,\nyou are not really doing work.\n\nPeople have to be reassured that it is not their fault if they can only manage one or two uninterrupted hours a week -\nrather it is the organization's fault for not providing a flow-conducive environment. None of this data can go to the\nPayroll Department.\n\nThe collection of uninterrupted-hour data can give you some meaningful metric evidence of just how good or bad your\nenvironment is.\n\n```\nE-Factor = Uninterrupted Hours / Body-Present Hours\n```\n\n
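A minimal sketch (not from the book, the names are illustrative) of computing the metric above:\n\n```python\ndef e_factor(uninterrupted_hours: float, body_present_hours: float) -> float:\n    \"\"\"Environmental factor: the share of office time actually spent in flow.\"\"\"\n    return uninterrupted_hours / body_present_hours\n\n# 2 quiet hours in a 40-hour week is a damning E-Factor:\nprint(e_factor(2, 40))   # 0.05\nprint(e_factor(16, 40))  # 0.4 - closer to a flow-conducive environment\n```\n\n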
## Chapter 11: The Telephone\n\nWhen you are doing think-intensive work like design, interruptions are productivity killers. When you are doing sales\nand marketing support, you have to take every single call that comes in. Mixing flow and highly interruptive activities\nis a recipe for nothing but frustration. A \"Leave me alone, I am working\" ethic can emerge. People must learn that it is\nokay sometimes not to answer the phone, and their managers need to understand that as well.\n\nThat is the character of knowledge workers' work: The quality of their time is important, not just its quantity.\n\n## Chapter 12: Bring Back the Door\n\nThere are some prevalent symbols of success and failure in creating a sensible workplace. The most obvious symbol of\nsuccess is the door. When there are sufficient doors, workers can control noise and interruptibility to suit their\nchanging needs.\n\nDon't expect the Establishment to roll over and play dead just because you begin to complain. There are at least 3\ncounterarguments that surface almost immediately:\n\n- People don't care about glitzy office space. They are too intelligent for that. And the ones who do care are just\n  playing status games.\n    - Appearance is stressed far too much in workplace design. What is more relevant is whether the workplace lets you\n      work or inhibits you.\n- Maybe noise is a problem, but there are cheaper ways to deal with it than mucking around with the physical layout. We\n  could just pipe in white noise or Muzak and cover up the disturbance.\n    - You can either treat the symptom or treat the cause. Treating the cause means choosing isolation in the form of\n      noise barriers - walls and doors - and these cost money. Treating the symptom is much cheaper: when you install\n      Muzak or some other form of pink noise, you can save even more money by ignoring the problem.\n- Enclosed offices don't make for a vital environment. We want people to interact productively, and that is what they\n  want, too. So walls and doors would be a step in the wrong direction.\n    - Enclosed offices don't have to be one-person offices. 2-, 3-, or 4-person offices make a lot more sense.\n\nManagement, at its best, should make sure there is enough space, enough quiet, and enough ways to ensure privacy so that\npeople can create their own sensible work space.\n\n## Chapter 13: Taking Umbrella Steps\n\n> People cannot work effectively if their workspace is too enclosed or too exposed. A good workspace strikes the\n> balance. You feel more comfortable in a workspace if there is a wall behind you. There should be no blank wall closer\n> than 2.5m in front of you (eye relief). You should not be able to hear noises very different from the kind you make,\n> from your workspace. Workspaces should allow you to face in different directions.\n\n> Rooms without a view are like prisons for the people who have to stay in them\n\n~ Christopher Alexander, _A Pattern Language_\n\n## Chapter 14: The Hornblower Factor\n\nHornblower is the ultimate manager - his career advanced from midshipman to admiral through the same blend of\ncleverness, daring, political maneuvering and good luck.\n\nManagers are supposed to use their leadership skills to bring out untapped qualities in each subordinate - this is not\nrealistic. The manager doesn't have enough leverage to make a difference in a person's nature. So the people who work\nfor you through whatever period will be more or less the same at the end as they were at the beginning. If they are not\nright for the job from the start, they will never be. Getting the right people in the first place is all-important.\n\nMost hiring mistakes result from too much attention to appearance. Evolution has planted in each of us a certain\nuneasiness toward people who differ by very much from the norm. The need for uniformity is a sign of insecurity on the\npart of management. Strong managers don't care when team members cut their hair or whether they wear ties. Their pride\nis tied only to their staff's accomplishments.\n\nWhen companies impose standards of dress, they remove considerable discretion from the individual. The effect is\ndevastating - people can talk and think of nothing else, all useful work stops dead. 
The most valuable people begin to\nrealize that they aren't appreciated for their contributions, but for haircuts and neckties.\n\nThe term _unprofessional_ is often used to characterize surprising and threatening behaviour. Anything that upsets the\nweak manager is almost by definition unprofessional - long hair on a male head, comfortable shoes, dancing around the\ndesk, laughing, ...\n\nSecond thermodynamic law of management: _Entropy is always increasing in the org._ - That's why most elderly\ninstitutions are tighter and a lot less fun than sprightly young companies.\n\n## Chapter 15: Let's talk about Leadership\n\nOne of the most dreadful \"motivational\" posters says: \"The speed of the leader sets the rate of the pack\" == a\nwork-extraction mechanism, whose purpose is to increase quantity, not quality - work harder, stay longer, stop goofing\noff.\n\nLeadership is not about extracting anything from somebody - it is about service: while leaders sometimes set explicit\ndirections, their main role is that of a catalyst, not a director.\n\nRebellious leadership is important in order to innovate - leaders should supply time to innovate (take a person away\nfrom doing billable work). Nobody knows enough to give permission to the key innovators to do what needs to be done.\nThat's why leadership as a service almost always operates without official permission.\n\n> If companies were more inclined to let leadership arise naturally, they wouldn't need to produce so much hot\n> air talking about it.\n\n## Chapter 16: Hiring a Juggler\n\nIf you are hiring a person to produce, you need to examine a sample of those products to see the quality of work the\ncandidate does. Otherwise, the interview is just a talk.\n\nCandidates can show off their portfolio as part of each interview.\n\nAptitude tests are almost always oriented toward the tasks the person will perform immediately after being hired. They\ntest whether a person is likely to perform well in the short term. Aptitude tests are left-brain oriented. An aptitude\ntest may give you people who perform better in the short term, but who are less likely to succeed later on. Use them,\nbut not for hiring.\n\nThe hiring process needs to focus on at least some sociological and human communication traits. Ask a candidate to\nprepare a 10-15 minute presentation on some aspect of past work (technology, management, project) - you will be able to\nsee the candidate's communication skills.\n\n## Chapter 17: Playing well with others\n\nThe capacity of a team to absorb newness has its limits. Team jell takes time, and, during much of that time, the\ncomposition of the team can't be changing. If you need to use a reactive strategy of labor, your team will probably\nnever jell. In fact, the workforce you manage almost certainly will not be a team at all.\n\n## Chapter 18: Childhood's end\n\nFor the youngest employees, computers, smartphones, the Web, programming, hacking, social networking, and blogging are\nenvironment, not technology.\n\nYoung people divide their attention while their older colleagues tend to focus on one or possibly two tasks at once.\nContinuous partial attention is the opposite of flow. 
There is a difference between spending 2% of time on Facebook in a\nsingle block of time vs spending 2% of attention all day on Facebook.\n\nArticulating requirements to young workers is going to be essential to give them a chance to fit in.\n\n## Chapter 19: Happy to be here\n\nTypical turnover figures range from 80% down to 33% per year, implying that average employee longevity is between 15 and\n36 months. The average person leaves after a little more than two years. It costs 1.5-2 months' salary to hire a new\nemployee (agency or in-house HR).\n\nA new employee is quite useless on Day Zero (or even less than useless); after a few months the new person is doing some\nuseful work. Within 5 months, he/she is at full working capacity.\n\nThe total cost of replacing each person is the equivalent of 4.5-5 months of employee cost, or about 20% of the cost of\nkeeping that employee for the full 2 years on the job. And that is only the visible cost of turnover.\n\nIn companies with high turnover, people tend toward a destructively short-term viewpoint, because they know they just\naren't going to be there very long. In an organization with high turnover, nobody is willing to take the long view.\n\nIf people only stick around for a year or two, the only way to conserve the best people is to promote them quickly. From\nthe corporate perspective, late promotion is a sign of health.\n\nThe reasons that account for most departures:\n\n- A just-passing-through mentality - no feelings of long-term involvement in the job\n- A feeling of disposability - workers as interchangeable parts (since turnover is really high, nobody is indispensable)\n- A sense that loyalty would be ludicrous - who would be loyal to an org that views its people as parts?\n\nPeople leave quickly -> no spending money on training -> no investment in the individual -> individual thinks of moving\non.\n\nThe best companies are consciously striving to be best. People tend to stay at such companies because there is a\nwidespread sense that you are expected to stay. A common feature of companies with the lowest turnover is widespread\nretraining (you are forever bumping into managers and officers who started out as secretaries, payroll clerks, or in the\nmailroom).\n\n## Chapter 20: Human Capital\n\nCompanies that manage their investment sensibly will prosper in the long run. Companies of knowledge workers have to\nrealize that it is their investment in human capital that matters most. The good ones already do.\n\n## Chapter 21: The Whole is greater than the sum of the Parts\n\nJelled Team - a group of people so strongly knit that the whole is greater than the sum of the parts. The production of\nsuch a team is greater than that of the same people working in an unjelled team. Once a team begins to jell, the\nprobability of success goes up dramatically. They don't need to be managed in the traditional sense, and they certainly\ndon't need to be motivated. They have got momentum.\n\nBelieving that workers will automatically accept organizational goals is the sign of naive managerial optimism.\n\n> The purpose of a team is not goal attainment but goal alignment\n\nSigns of a jelled team:\n\n- low turnover\n- strong sense of identity (colourful name)\n- sense of eliteness (part of something unique; this attitude might be annoying to people outside the group)\n- joint ownership\n\n## Chapter 22: The Black Team\n\nThe story about the legendary, jelled team - The Black Team.\n\n## Chapter 23: Teamicide\n\nYou can't control jelling - the process is too fragile to be controlled. 
Exact steps are hard to describe; the opposite\nis easier. Teamicide techniques:\n\n- Defensive management\n    - let your people make mistakes, do not send a message that making errors is forbidden\n    - \"My people are too dumb to build systems without me\"\n    - people who feel untrusted have little inclination to bond together into a cooperative team\n- Bureaucracy\n    - mindless paper pushing hurts team formation\n- Physical separation\n    - group members may grow stronger bonds to non-group neighbours, just because they see more of them\n    - putting people together gives them opportunity for the casual interaction that is so necessary for team formation\n- Fragmentation of people's time\n    - bad for team formation and efficiency\n    - no one can be part of multiple jelled teams\n- Quality reduction of the product\n    - typical scenario: deliver a product in less time = lower quality\n    - self-esteem and enjoyment are undermined by the necessity of building a product of clearly lower quality than what\n      they are capable of\n- Phony deadlines\n    - the date mentioned is impossible to meet, and everyone knows it\n    - a team will not jell in such an environment\n- Clique control\n    - there are no jelled teams at the managerial level\n    - as you go higher and higher in the organization chart, the concept of jelled teams reduces further into oblivion\n\n## Chapter 24: Teamicide Revisited\n\n2 additional kinds of teamicide:\n\n- motivational posters\n    - are phony enough to make most people's skin crawl\n- overtime\n    - error, burnout, accelerated turnover, and compensatory undertime\n    - disrupts the team\n\n## Chapter 25: Competition\n\nCoaching is an important factor in successful team interaction. It provides coordination and personal growth, and it\nfeels good. We feel a huge debt to those who have coached us in the past. The act of coaching cannot take place if\npeople don't feel safe. In a competitive atmosphere, you would be crazy to let anyone see you sitting down to be\ncoached. You would be similarly crazy to coach someone else, as that person may eventually use your assistance to pass\nyou by.\n\nAnything the manager does to increase the competition within a team has to be viewed as teamicidal.\n\n## Chapter 26: A spaghetti dinner\n\nGood managers provide frequent easy opportunities for the team to succeed together. The opportunities may be tiny pilot\nsubprojects, or demonstrations, or simulations - anything that gets the team quickly into the habit of succeeding\ntogether.\n\n## Chapter 27: Open Kimono\n\nThe Open Kimono attitude is the opposite of defensive management. You take no steps to defend yourself from the people\nyou have put into positions of trust. A person you can't trust with any autonomy is of no use to you.\n\nIf you have got decent people under you, there is probably nothing you can do to improve their chances of success more\ndramatically than to get yourself out of their hair occasionally. Visual supervision is for prisoners.\n\n## Chapter 28: Chemistry for Team Formation\n\nSome organizations are famous for their consistent good luck in getting well-knit teams to happen. It isn't luck - it's\nchemistry. 
These organizations are just plain healthy.\n\nSigns of a healthy organization:\n\n- people at ease\n- people having a good time\n- people enjoying interactions with their peers\n- no defensiveness\n- the work is a joint product\n- everybody is proud of the quality\n- managers devote their energy to building and maintaining healthy chemistry\n\nChemistry-building strategy:\n\n- Make a cult of quality - the cult of quality is the strongest catalyst for team formation\n- Provide lots of satisfying closure - people need reassurance from time to time that they are headed in the right\n  direction\n- Build a sense of eliteness - people require a sense of uniqueness to be at peace with themselves, and they need to be\n  at peace with themselves to let the jelling process begin\n- Allow and encourage heterogeneity - diverse teams are more fun to work in\n- Preserve and protect successful teams\n- Provide strategic but not tactical direction\n\nManagers are usually not part of the teams that they manage. On the best teams, different individuals provide occasional\nleadership, taking charge in areas where they have particular strengths.\n\n## Chapter 29: The self-healing system\n\nA Methodology - a general theory of how a whole class of thought-intensive work ought to be conducted. _The people who\nwrite the Methodology are smart. The people who carry it out can be dumb._\n\nThere is a big difference between Methodology and methodology - methodology is a basic approach one takes to get a job\ndone. It doesn't reside in a fat book, but rather inside the heads of the people carrying out the work. Big-M\nMethodology is an attempt to centralize thinking. All meaningful decisions are made by the Methodology builders, not by\nthe staff assigned to do the work.\n\n> Voluminous documentation is part of the problem, not part of the solution. People should focus on getting things done,\n> instead of building documents.\n\nPeople might actually do exactly what the Methodology says, and the work would grind nearly to a halt.\n\n## Chapter 30: Dancing with Risk\n\nOur main problems are more likely to be sociological than technological in nature.\n\nProjects that have real value but little or no risk were all done ages ago. The ones that matter today are laden with\nrisk.\n\nRisk management: the point is not to make the risk go away, but to enable sensible mitigation - planned and provisioned\nwell ahead of time.\n\n## Chapter 31: Meetings, Monologues, and Conversations\n\nSome orgs are addicted to meetings; at the other extreme, some orgs refuse to use the \"M\" word at all.\n\nAs orgs age, meeting time increases until there is time for nothing else. Even short stand-ups can be a drag on an\norganization's effectiveness if they lack purpose and focus.\n\nIn order to cure a meeting-addicted org, start small and eliminate most ceremonial meetings in your area, spend time in\none-on-one conversations, and limit attendance at working meetings. 
Encourage Open-Space networking to give people the\nchance to have unstructured interaction.\n\n## Chapter 32: The ultimate management sin is ...\n\nwasting people's time.\n\nWhen participants of a meeting take turns interacting with one key figure, the expected rationale for assembling the\nwhole group is missing - the boss might as well have interacted separately with each of the subordinates.\n\nFragmented time is almost certainly teamicidal, but it is also guaranteed to waste the individual's time.\n\nThe human capital invested in your workforce also represents a ton of money.\n\n## Chapter 33: E(vil) Mail\n\nWhen you over-coordinate the people who work for you, they are too likely to under-coordinate their own efforts. But\nself-coordination and mutual coordination among peers is the hallmark of graceful teamwork.\n\nImagine how it would work if every pass could only happen if and when the coach gave the signal from the sideline. A\ndecent coach understands that his/her job is to help people learn to self-coordinate.\n\n> Life is short. If you need to know everything in order to do anything, you are not going to get much done.\n\n## Chapter 34: Making change possible\n\n> People hate change, and that is because people hate change. People really hate change, they really, really do.\n\nWhen we start out to change, it is never certain that we will succeed. The uncertainty is more compelling than the\npossible gain.\n\n> The fundamental response to change is not logical, but emotional\n\n**You can never improve if you can't change at all.**\n\nChange involves at least 4 stages: Old Status Quo -> Chaos -> Practice and Integration -> New Status Quo. Change\nhappens upon the introduction of a foreign element: a catalyst for change. Without a catalyst, there is no recognition\nof the desirability of change.\n\nChange won't even get started unless people feel safe - people feel safe when they know they will not be demeaned for\nproposing a change.\n\nChange has a chance of succeeding only if failure is also okay.\n\n## Chapter 35: Organizational learning\n\nLearning is a critical improvement mechanism - organizations that do not learn cannot expect to prosper for very long.\n\nExperience gets turned into learning when an organization alters itself to take account of what experience has shown.\n\n> Learning is limited by an organization's ability to keep its people\n\nWhen turnover is high, learning is unlikely to stick or can't take place at all. In such an organization, attempts to\nchange skills or to introduce redesigned procedures are an exercise in futility.\n\n## Chapter 36: The making of community\n\nWhat do great managers do best? The making of community. A need for community is something that is built right into the\nhuman firmware.\n\nCommunity doesn't just happen on the job. It has to be made. The people who make it are the unsung heroes of our work\nexperience.\n\nAn org that succeeds in building a satisfying community tends to keep its people. No one wants to leave. The investment\nmade in human capital is retained, and upper management is willing to invest more. When the company invests more in its\npeople, the people perform better and feel better about themselves and about their company.\n\nThere is no formula to build community in the workplace. Some experimenting is needed.\n\n## Chapter 37: Chaos and Order\n\nThere is something about human nature that makes us the implacable enemies of chaos. 
People who were attracted by the\nearly lack of order feel a nostalgic fondness for the days when everything wasn't so awfully mechanical.\n\nSome of the lost disorder can be reintroduced to breathe some energy into the work - a policy of constructive\nreintroduction of small amounts of disorder:\n\n1. Pilot projects\n    - set the fat book of standards aside and try some new unproved technique\n    - people get a boost in energy when they are doing something new and different\n2. War games\n    - war games help you evaluate your relative strengths and weaknesses and help the organization to observe its global\n      strengths and weaknesses\n    - a big fuss should be made over any and all accomplishments\n3. Brainstorming\n    - interactive session, targeted on creative insight\n    - focus on quantity of ideas, not quality; keep proceedings loose, even silly; discourage negative comments\n4. Provocative training experiences\n5. Training, trips, conferences, celebrations, and retreats\n    - everybody relishes a chance to get out of the office\n    - when a team is forming, it makes good business sense to fight for travel money to get team members out of the\n      office together\n    - adventure adds small amounts of constructive disorder\n\n## Chapter 38: Free Electrons\n\nFree electrons - workers having a strong role in choosing their own orbits: positions with loosely stated\nresponsibilities so that the individual has a strong say in defining the work. Companies profit from such people.\n\nSome individuals need to be left alone to work matters out, or at least left free to seek guidance if and when and from\nwhomever they choose. The mark of the best manager is an ability to single out the few key spirits who have the proper\nmix of perspective and maturity and then turn them loose.\n\n## Chapter 39: Holgar Dansk\n\nA single person acting alone is not likely to effect any meaningful change. But there is no need to act alone. When\nsomething is terribly out of kilter, it takes very little to raise people's consciousness of it. Then it is no longer\nyou. It is everyone.\n\nIt may be a small voice saying: \"This is unacceptable\" -- people know it is true. Once it has been said out loud, they\ncan't ignore it any longer.\n\nSociology matters more than technology or even money. It is supposed to be productive, satisfying fun to work. If it\nisn't, there is nothing else worth concentrating on. Choose your terrain carefully, assemble your facts, and speak up.\nYou can make a difference.\n"
  },
  {
    "path": "books/pragmatic-programmer.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# The Pragmatic Programmer: journey to mastery, 20th Anniversary Edition\n\nBook by David Thomas and Andrew Hunt\n\n- [Chapter 1: A Pragmatic Philosophy](#chapter-1-a-pragmatic-philosophy)\n- [Chapter 2: A Pragmatic Approach](#chapter-2-a-pragmatic-approach)\n- [Chapter 3: The Basic Tools](#chapter-3-the-basic-tools)\n- [Chapter 4: Pragmatic Paranoia](#chapter-4-pragmatic-paranoia)\n- [Chapter 5: Bend, or Break](#chapter-5-bend-or-break)\n- [Chapter 6: Concurrency](#chapter-6-concurrency)\n- [Chapter 7: While you are coding](#chapter-7-while-you-are-coding)\n- [Chapter 8: Before the Project](#chapter-8-before-the-project)\n- [Chapter 9: Pragmatic Projects](#chapter-9-pragmatic-projects)\n- [Postface](#postface)\n\n## Chapter 1: A Pragmatic Philosophy\n\n**You Have Agency.** It is your life. You own it. You run it. You create it. This industry gives you a remarkable set of\nopportunities. Be proactive, and take them.\n\nThe team needs to be able to trust you and rely on you, and you need to be comfortable relying on each of them as well.\nIn a healthy environment based in trust, you can safely speak your mind, present your ideas, and rely on your team\nmembers who can in turn rely on you.\n\n**Provide options, don't make lame excuses.** Instead of excuses provide options. Don't say it can't be done: explain\nwhat can be done to salvage the solution. When you find yourself saying \"_I don't know_\" be sure to follow it up with\n\"_--but I'll find out_\". It is a great way to admit what you don't know, but then take responsibility like a pro.\n\n_Entropy_ - a term from physics that refers to the amount of \"disorder\" in a system. The entropy in the universe tends\ntoward a maximum. When disorder increases in software, we call it \"software rot\". Some folks might call it by the more\noptimistic term \"_technical debt_\" (with the implied notion that they will pay it back someday, they probably will not).\n\n**Don't live with broken windows.** Bad designs, wrong decisions, or poor code. Fix each one as soon as it is\ndiscovered. If there is no sufficient time to fix it properly, board it up. Take some action to prevent further damage\nand to show that you are on top of the situation. Don't let entropy win. If you find yourself working on a project with\nquite a few broken windows, it is all to easy to slip into the mindset of \"_All the rest of this code is crap, I will\njust follow suit._\". By the same token, if you find yourself on a project where the code is beautiful, well-designed,\nand elegant - you will likely take extra special care not to mess it up.\n\nIdea: Help strengthen your team by surveying your project neighbourhood. Choose two or three broken windows and discuss\nwith your colleagues what the problems are and what could be done to fix them.\n\n**Be a catalyst for change.** You may be in a situation where you know exactly what needs doing and how to do it. People\nwill form committees, budgets will need approval, and things will get complicated. Work out what can you reasonably ask\nfor. Develop it well. Once you have got it, show people, and let them marvel. Sit back and wait for them to start asking\nyou to add the functionality you originally wanted. Show them a glimpse of the future, and you will get them to rally\naround.\n\n**Remember the Big Picture.** Constantly review what is happening around you, not just what you personally are doing.\nProjects slowly and inexorably get totally out of hand. 
Most software disasters start out too small to notice, and most\nproject overruns happen a day at a time. It is often the accumulation of small things that breaks morale and teams.\n\nSituational awareness (is there anything out of context, anything that looks like it doesn't belong?) is a technique\npracticed by folks ranging from Boy and Girl Scouts to Navy SEALs. Get in the habit of really looking and noticing your\nsurroundings.\n\n**Make quality a requirements issue.** Involve your users in determining the project's real quality requirements.\n\n> An investment in knowledge always pays the best interest ~ Benjamin Franklin\n\n**Invest regularly in your knowledge portfolio.** Your knowledge and experience are your most important day-to-day\nprofessional assets. Knowledge may become out of date; as the value of your knowledge declines, so does your value to\nyour company or client.\n\n1. Invest regularly - invest in knowledge regularly, even small amounts.\n2. Diversify - the more different things you know, the more valuable you are.\n3. Manage risk - don't put all your technical eggs in one basket.\n4. Buy low, sell high - learning an emerging technology before it becomes popular can be just as hard as finding an\n   undervalued stock, but the payoff can be just as rewarding.\n5. Review and rebalance - that hot technology you started investing in last month might be stone-cold by now.\n\nGoals:\n\n- learn at least one programming language per year - by learning several approaches, you can broaden your thinking\n- read a technical book each month\n- read nontechnical books too - don't forget the human side of the equation, as that requires an entirely different\n  skill set\n- take classes - look for interesting courses at a local or online college\n- participate in local user groups and meetups - isolation can be deadly to your career, find out what people are\n  working on outside of your company\n- experiment with different environments - try Linux, Windows, Mac, a new IDE, ...\n- stay current - read news and posts online on technology different from that of your current project\n\n**Critically analyze what you read and hear.** You need to ensure that the knowledge in your portfolio is accurate and\nunswayed by either vendor or media hype.\n\n_Critical Thinking Tutorial:_\n\n1. Ask the \"Five Whys\" - ask why at least 5 times. Ask a question and get an answer. Dig deeper by asking \"why\".\n2. Who does this benefit? - \"follow the money\" can be a very helpful path to analyze. The benefits to someone else or\n   another organization may be aligned with your own, or not.\n3. What is the context? - everything occurs in its own context. Just because something is good for someone doesn't mean\n   it is good for you.\n4. Why is this a problem? - is there an underlying model? How does the underlying model work?\n\n**English is just another programming language.** Having the best ideas, the finest code, or the most pragmatic thinking\nis ultimately sterile unless you can communicate with other people.\n\n**It is both what you say and the way you say it.** There is no point in having great ideas if you don't communicate\nthem effectively. The more effective the communication, the more influential you become.\n\n**Build documentation in, don't bolt it on.** It is easy to produce good-looking documentation from the comments in\nsource code, and we recommend adding comments to modules and exported functions to give other developers a leg up when\nthey come to use it. 
Restrict your non-API commenting to discussing why something is done, its purpose and its goal. The\ncode already shows how it is done, so commenting on this is redundant - and is a violation of the DRY principle.\n\n## Chapter 2: A Pragmatic Approach\n\n**Good design is easier to change than bad design.** A thing is well-designed if it adapts to the people who use it.\nCode should be Easy To Change. That's why SRP, decoupling, naming, ... are important, because of ETC.\n\n**DRY - Don't Repeat Yourself.** Every piece of knowledge must have a single, unambiguous, authoritative representation\nwithin a system.\n\nMost people think that maintenance begins when an application is released, and that maintenance means fixing bugs and\nenhancing features. This is wrong. Programmers are constantly in maintenance mode. Maintenance is not a discrete\nactivity, but a routine part of the entire development process. When we perform maintenance, we have to find and change\nthe representation of things. It is easy to duplicate knowledge in the specifications, processes, and programs we\ndevelop, and when we do so, we invite a maintenance nightmare.\n\nDRY is about the duplication of knowledge, of intent. It is about expressing the same thing in two different places,\npossibly in two totally different ways.\n\nTwo pieces of code may be the same, but the knowledge they represent may be different - that is not duplication, it is a\ncoincidence.\n\n> All services offered by a module should be visible through a uniform notation, which does not betray whether they are\n> implemented through storage or through computation.\n\n**Make it easy to reuse.** You should foster an environment where it is easier to find and reuse existing stuff than to\nwrite it yourself. If it isn't easy, people will not do it. And if you fail to reuse, you risk duplicating knowledge.\n\nTwo or more things are orthogonal if changes in one do not affect any of the others. In a well-designed system, the\ndatabase code will be orthogonal to the user interface - you can change the interface without affecting the database,\nand swap databases without changing the interface. Non-orthogonal systems are more complex to change and control.\n\n**Eliminate effects between unrelated things.** We want to design components that are self-contained - independent and\nwith a single, well-defined purpose.\n\nWhen components are well isolated from one another, you know that you can change one without having to worry about the\nrest. As long as you don't change that component's external interfaces, you can be confident that you will not cause\nproblems that ripple through the entire system.\n\nModular, component-based, layered systems -> these are orthogonal systems.\n\n- Keep your code decoupled - write shy modules, modules that don't reveal anything unnecessary to other modules and that\n  don't rely on other modules' implementations. If you need to change an object's state, get the other object to do it\n  for you.\n- Avoid global data - in general, your code is easier to understand and maintain if you explicitly pass any required\n  context into your modules.\n- Avoid similar functions - duplicate code is a symptom of structural problems.\n\n
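A minimal sketch, not from the book, of the \"shy module\" and \"avoid global data\" bullets above (all names are\nillustrative):\n\n```python\n# Coupled style: callers reach into the object's internals and depend on a\n# global - every change to either one ripples outward.\nTAX_RATE = 0.23  # global data\n\nclass OrderData:\n    def __init__(self, items):\n        self.items = items  # exposed internals\n\ndef total_coupled(order: OrderData) -> float:\n    return sum(price for _, price in order.items) * (1 + TAX_RATE)\n\n# Shy style: the object changes its own state, and the required context\n# (the tax rate) is passed in explicitly instead of read from a global.\nclass Order:\n    def __init__(self, items):\n        self._items = items\n\n    def total(self, tax_rate: float) -> float:\n        return sum(price for _, price in self._items) * (1 + tax_rate)\n\nprint(total_coupled(OrderData([(\"book\", 100.0)])))    # ~123.0\nprint(Order([(\"book\", 100.0)]).total(tax_rate=0.23))  # ~123.0, same total\n```\n\n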
**There are no final decisions.** The mistake lies in assuming that any decision is cast in stone - and not in preparing\nfor the contingencies that might arise. Think of decisions as being written in the sand at the beach. A big wave can\ncome along and wipe them out at any time.\n\n**Forgo following fads.** Choose architecture based on fundamentals, not fashion. No one knows what the future may hold.\n\n**Use tracer bullets to find the target.** Look for important requirements, the ones that define the system. Look for\nareas where you have doubts, and where you see the biggest risks. Then prioritize your development so that these are the\nfirst areas you code. Benefits of the tracer code:\n\n- Users get to see something working early.\n- Developers build a structure to work in.\n- You have an integration platform.\n- You have something to demonstrate.\n- You have a better feel for progress.\n\nPrototyping generates disposable code. Tracer code is lean but complete, and forms part of the skeleton of the final\nsystem. Think of prototyping as the reconnaissance and intelligence gathering that takes place before a single tracer\nbullet is fired.\n\nPrototypes are designed to answer just a few questions, so they are much cheaper and faster to develop than applications\nthat go into production. You can prototype: architecture, new functionality in an existing system, structure or contents\nof external data, third-party tools or components, performance issues, user interface design.\n\n**Prototype to learn.** Prototyping is a learning experience. Its value lies not in the code produced, but in the\nlessons learned. That's really the point of prototyping. It is easy to become misled by the apparent completeness of a\ndemonstrated prototype, and project sponsors or management may insist on deploying the prototype. Remind them that you\ncan build a great prototype of a new car out of balsa wood and duct tape, but you wouldn't try to drive it in rush-hour\ntraffic.\n\nIf you feel there is a strong possibility in your environment or culture that the purpose of prototype code may be\nmisinterpreted, you may be better off with the tracer bullet approach.\n\n**Program close to the problem domain.** Try to write code using the vocabulary of the application domain.\n\n**Estimate to avoid surprises.** Estimate before you start. You will spot potential problems up front.\n\nBasic estimating trick: ask someone who's already done it. Before you get too committed to model building, cast around\nfor someone who has been in a similar situation in the past. See how their problems got resolved.\n\nModel building can be both creative and useful in the long term. Often, the process of building the model leads to\ndiscoveries of underlying patterns and processes that weren't apparent on the surface. Building the model also\nintroduces inaccuracies into the estimating process.\n\n_PERT - Program Evaluation Review Technique_ - an estimating methodology, in which every PERT task has an optimistic, a\nmost likely, and a pessimistic estimate. Using a range of values like this is a great way to avoid one of the most\ncommon causes of estimation error - padding a number because you are unsure.\n\n
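A quick sketch of a PERT-style three-point estimate. The weighted average and spread below are the standard PERT\nformulas, not something spelled out in these notes, so treat them as an assumption:\n\n```python\ndef pert_estimate(optimistic: float, likely: float, pessimistic: float):\n    \"\"\"Classic PERT weighted average and spread for a single task.\"\"\"\n    expected = (optimistic + 4 * likely + pessimistic) / 6\n    spread = (pessimistic - optimistic) / 6\n    return expected, spread\n\n# A task estimated as 2 days best case, 4 days likely, 12 days worst case:\nexpected, spread = pert_estimate(2, 4, 12)\nprint(f\"{expected:.1f} +/- {spread:.1f} days\")  # 5.0 +/- 1.7 days\n```\n\n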
**Iterate the schedule with the code.** Make the management understand that the team, their productivity, and the\nenvironment will determine the schedule. By formalizing this, and refining the schedule as part of each iteration, you\nwill be giving them the most accurate scheduling estimates you can.\n\n## Chapter 3: The Basic Tools\n\nTools amplify your talent. The better your tools, and the better you know how to use them, the more productive you can\nbe.\n\n**Keep knowledge in plain text.** Text will not become obsolete. Make plain text understandable to humans.\n\n**Always use version control.** Make sure that everything is under version control: documentation, phone number lists,\nmemos to vendors, makefiles, build and release procedures - everything.\n\n**Fix the problem, not the blame.** It doesn't really matter whether the bug is your fault or someone else's.\n\n**Don't panic.** The first rule of debugging. Don't waste a single neuron on the train of thought that begins \"but that\ncan't happen\" because clearly it can, and has.\n\n**Failing test before fixing code.** We want a bug that can be reproduced with a single command. It is a lot harder to\nfix a bug if you have to go through 15 steps to get to the point where the bug shows up.\n\n**Read the damn error message.** Most exceptions tell both what failed and where it failed.\n\nBinary search can be used for finding the release that introduced the error, or for determining the minimal subset of\nvalues that causes the program to fail.\n\n**Select isn't broken.** It is possible that a bug exists in the OS, the compiler, or a third-party product - but this\nshould not be your first thought. It is much more likely that the bug exists in the application code under development.\n\n**Don't assume it - prove it.** Don't gloss over a routine or piece of code involved in the bug because you \"know\" it\nworks. Prove it. Prove it in this context, with this data, with these boundary conditions.\n\n## Chapter 4: Pragmatic Paranoia\n\n**You can't write perfect software.** Perfect software doesn't exist. Pragmatic Programmers don't trust themselves.\nKnowing that no one writes perfect code, including themselves, Pragmatic Programmers build in defenses against their own\nmistakes.\n\n**Design with contracts.** Be strict in what you will accept before you begin, and promise as little as possible in\nreturn. Remember, if your contract indicates that you will accept anything and promise the world in return, you have got\na lot of code to write.\n\n**Crash early.** Don't catch or rescue all exceptions, re-raising them after writing some kind of message. Do not\neclipse your code with error handling. Without pervasive exception handling, code is less coupled. Crashing early is\noften the best thing you can do. The Erlang and Elixir languages embrace this philosophy.\n\nWhen your code discovers that something that was supposed to be impossible just happened, your program is no longer\nviable. Anything it does from this point forward becomes suspect, so terminate it as soon as possible.\n\n**Use assertions to prevent the impossible.** Whenever you find yourself thinking \"but of course that could never\nhappen\", add code to check it. Assertions are also useful checks on an algorithm's operation. Assertions check for\nthings that should never happen. LEAVE ASSERTIONS TURNED ON.\n\n
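A small sketch of the assertions tip (the function and its names are illustrative, not from the book):\n\n```python\ndef apply_discount(price: float, discount: float) -> float:\n    # \"But of course the discount could never exceed 100%\" - check it anyway.\n    assert 0.0 <= discount <= 1.0, f\"impossible discount: {discount}\"\n    result = price * (1 - discount)\n    assert result <= price, \"discounted price exceeds the original\"\n    return result\n```\n\nNote that Python strips `assert` statements when run with `-O`, so keeping assertions turned on means not using that\nflag in production.\n\n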
**Finish what you start.** It simply means that the function or object that allocates a resource should be responsible\nfor deallocating it.\n\n**Take small steps - always.** Always take small, deliberate steps, checking for feedback and adjusting before\nproceeding. Consider that the rate of feedback is your speed limit. You never take on a step or a task that is \"too\nbig\". The more you have to predict what the future will look like, the more risk you incur that you will be wrong.\nInstead of wasting effort designing for an uncertain future, you can always fall back on designing your code to be\nreplaceable.\n\nMaking code replaceable will also help with cohesion, coupling, and DRY, leading to a better design overall.\n\n## Chapter 5: Bend, or Break\n\nDecoupling shows how to keep separate concepts separate, decreasing coupling. Coupling is the enemy of change, because\nit links together things that must change in parallel.\n\nWhen you are designing bridges, you want them to hold their shape - you need them to be rigid. But when you are\ndesigning software that you will want to change, you want exactly the opposite - you want it to be flexible.\n\n**Decoupled code is easier to change.**\n\n**Tell, don't ask.** (The Law of Demeter) You shouldn't make decisions based on the internal state of an object and then\nupdate the object. Doing so totally destroys the benefits of encapsulation, and, in doing so, spreads the knowledge of\nthe implementation throughout the code.\n\nA method defined in a class C should only call:\n\n- Other instance methods\n- Its parameters\n- Methods in objects it creates\n- Global variables\n\n**Don't chain method calls.** (Something simpler than the Law of Demeter.) Try not to have more than one \".\" when you\naccess something. The rule doesn't apply if the things you are chaining are really unlikely to change (e.g. libraries\nthat come with the language).\n\n**Avoid global data.** It is like adding an extra parameter to every method.\n\n**If it is important enough to be global, wrap it in an API.** Any mutable external resource is global data (database,\nfile system, service API, ...). Always wrap these resources behind code that you control.\n\nKeeping your code shy - having it deal with things it directly knows about - will help keep your applications decoupled,\nand that will make them more amenable to change.\n\nPublish/Subscribe generalizes the observer pattern, at the same time solving the problems of coupling and performance.\n\nStreams let us treat events as if they were a collection of data. It's as if we had a list of events, which gets longer\nwhen new events arrive. We can treat streams like any other collection (manipulate, filter, combine).\n\nBaseline for reactive event handling: reactivex.io\n\n**Programming is about code, but programs are about data.** Start designing using transformations (unix-like pipelines).\nUsing pipelines means that you are automatically thinking in terms of transforming data.\n\n**Don't hoard state, pass it around.** Functions greatly reduce coupling. A function can be used (and reused) anywhere\nits parameters match the output of some other function. There is still a degree of coupling, but it is more manageable\nthan the OO-style of command and control.\n\nThinking of code as a series of nested transformations can be a liberating approach to programming. It takes a while to\nget used to, but once you have developed the habit you will find your code becomes cleaner, your functions shorter, and\nyour designs flatter.\n\n
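A minimal pipeline sketch in the spirit of the transformation style above (the `pipeline` helper is an illustrative\nname, not an API from the book):\n\n```python\nfrom functools import reduce\n\ndef pipeline(value, *steps):\n    \"\"\"Feed a value through a series of transformations, unix-pipe style.\"\"\"\n    return reduce(lambda acc, step: step(acc), steps, value)\n\n# \"Find the three longest words\" - each step transforms data, hoards no state.\nresult = pipeline(\n    \"programming is about code but programs are about data\",\n    str.split,\n    lambda words: sorted(words, key=len, reverse=True),\n    lambda words: words[:3],\n)\nprint(result)  # ['programming', 'programs', 'about']\n```\n\n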
**Don't pay inheritance tax.** Inheritance is coupling. Not only is the child class coupled to the parent, the parent's\nparent, and so on, but the code that uses the child is also coupled to all the ancestors.\n\nAlternatives to inheritance:\n\n- interfaces and protocols - these declarations create no code. We can use them to create types, and any class that\n  implements the appropriate interface will be compatible with that type.\n- delegation - has-a is better than is-a. If a parent has 20 methods, and the subclass wants to make use of just 2 of\n  them, its objects will still have the other 18 just lying around and callable.\n- mixins and traits - use them to share functionality. The basic idea is simple: we want to be able to extend classes\n  and objects with new functionality without using inheritance. So we create a set of these functions, give that set a\n  name, and then somehow extend a class with them.\n\n**Prefer interfaces to express polymorphism.** Interfaces and protocols give us polymorphism without inheritance.\n\n**Parametrize your app using external configuration.** When code relies on values that may change after the application\nhas gone live, keep those values external to the app. Keep the environment and customer-specific values outside the\napp (credentials, logging levels, IP addresses, validation parameters, external rates - e.g. tax rates, formatting\ndetails, license keys).\n\nWhile static configuration is common, we currently favor a different approach. We still want configuration data kept\nexternal to the application, but rather than in a flat file or database, we would like to see it stored behind a service\nAPI.\n\n## Chapter 6: Concurrency\n\nConcurrency - when the execution of two or more pieces of code act as if they run at the same time (context switching).\nParallelism is when they do run at the same time (multiple cores).\n\nTemporal coupling - coupling in time. Temporal coupling happens when your code imposes a sequence on things that is not\nrequired to solve the problem.\n\n**Analyze workflow to improve concurrency.** Find out what can happen at the same time, and what must happen in a strict\norder. One way to do this is to capture the workflow using a notation such as the activity diagram.\n\n**Shared state is incorrect state.** A semaphore is a thing that only one person can own at a time. You can create a\nsemaphore and then use it to control some other resource.\n\n**Random failures are often concurrency issues.** Whenever two or more instances of your code can access some resource\nat the same time, you are looking at a potential problem.\n\n**Use actors for concurrency without shared state.** Actors execute concurrently, asynchronously and share nothing. An\nactor is an independent virtual processor with its own local state. Each actor has a mailbox. When a message appears in\nthe mailbox and the actor is idle, it kicks into life and processes the message. When it finishes processing, it\nprocesses another message in the mailbox, or goes back to sleep.\n\n**Use blackboards to coordinate workflow.** Order of data arrival is irrelevant - when a fact is posted it can trigger\nthe appropriate rules. The output of any rules can post to the blackboard and cause the triggering of yet more\napplicable rules.\n\n
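A toy actor in the spirit of the description above, using only the standard library (the class and the message names\nare illustrative):\n\n```python\nimport queue\nimport threading\n\nclass CounterActor:\n    \"\"\"A tiny actor: local state plus a mailbox, nothing shared.\"\"\"\n\n    def __init__(self):\n        self._mailbox = queue.Queue()\n        self._count = 0  # local state, touched only by the actor's own thread\n        self._thread = threading.Thread(target=self._run)\n        self._thread.start()\n\n    def send(self, message):\n        self._mailbox.put(message)\n\n    def join(self):\n        self._thread.join()\n\n    def _run(self):\n        while True:\n            message = self._mailbox.get()  # idle until a message appears\n            if message == \"incr\":\n                self._count += 1\n            elif message == \"show\":\n                print(\"count =\", self._count)\n            elif message == \"stop\":\n                return\n\nactor = CounterActor()\nfor _ in range(3):\n    actor.send(\"incr\")\nactor.send(\"show\")  # count = 3\nactor.send(\"stop\")\nactor.join()\n```\n\n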
## Chapter 7: While You Are Coding\n\n**Listen to your inner lizard.** When it feels like your code is pushing back, it is really your subconscious trying to\ntell you something is wrong.\n\nLearning to listen to your gut feeling when coding is an important skill to foster. But it applies to the bigger picture\nas well. Sometimes a design just feels wrong, or some requirement makes you feel uneasy. Stop and analyze these\nfeelings. If you are in a supportive environment, express them out loud. Explore them.\n\n**Don't program by coincidence.** Don't rely on luck and accidental success.\n\n- Always be aware of what you are doing.\n- Can you explain the code, in detail, to a more junior programmer? If not, perhaps you are relying on coincidences.\n- Don't code in the dark. If you are not sure why it works, you will not know why it fails.\n- Proceed from a plan.\n- Don't depend on assumptions. If you can't tell something is reliable, assume the worst.\n- Document your assumptions.\n- Don't just test your code, but test your assumptions as well. Don't guess, try it. Write an assertion to test your\n  assumptions. If your assertion is right, you have improved the documentation in your code. If you discover your\n  assumption is wrong, then count yourself lucky.\n- _Don't be a slave to history. Don't let existing code dictate future code. All code can be replaced if it is no longer\n  appropriate._\n\n**Estimate the order of your algorithms.** Estimate the resources that algorithms use - time, processor, memory, and so\non. When you write anything containing loops or recursive calls, check the runtime and memory requirements. When a more\ndetailed analysis is needed - use Big-O notation.\n\nThink of the _O_ as meaning _on the order of_. Big-O is never going to give you actual numbers for time or memory or\nwhatever - it simply tells you how these values will change as the input changes.\n\nCommon sense estimation:\n\n- simple loops - _O(n)_\n- nested loops - _O(n^2)_\n- binary chop - _O(log n)_\n- divide and conquer - _O(n log n)_\n- combinatoric algorithms - running time may explode, _O(n!)_\n\n**Test your estimates.** The fastest one is not always the best for the job. Given a small input set, a straightforward\ninsertion sort will perform just as well as a quicksort, and will take less time to write and debug.\n\nBe wary of _premature optimisation_. It is always a good idea to make sure an algorithm really is a bottleneck before\ninvesting your precious time trying to improve it.\n\n
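A quick way to test an estimate is to time the code on doubling input sizes and watch how the runtime grows; a rough sketch using only the standard library:\n\n```python\nimport random\nimport timeit\n\nfor n in (1_000, 2_000, 4_000):\n    data = random.sample(range(n), n)\n    t = timeit.timeit(lambda: sorted(data), number=100)\n    # sorting is O(n log n): doubling n should slightly more than double t\n    print(n, t)\n```\n\n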
Refactoring: As a program evolves, it will become necessary to rethink earlier decisions and rework portions of code.\nThis process is perfectly natural. Code needs to evolve - it is not a static thing.\n\nThe most common metaphor for software development is building construction. Rather than construction, software is more\nlike gardening - it is more organic than concrete.\n\nRefactoring is not intended to be a special, high-ceremony, once-in-a-while activity. Refactoring is a day-to-day\nactivity, taking small, low-risk steps. It is a targeted, precise approach to help keep the code easy to change. You\nneed good, automated unit testing that validates the behavior of the code.\n\nAny number of things may cause code to qualify for refactoring:\n\n- duplication\n- non-orthogonal design - change to one thing affects the other\n- outdated knowledge\n- usage - some features may be more important than originally thought\n- performance\n- the test pass - when you have added a small amount of code, and that extra test passes, you have a great opportunity\n  to dive in and tidy up what you just wrote.\n\n**Refactor early, refactor often.** Time pressure is often used as an excuse for not refactoring. Fail to refactor now,\nand there will be a far greater time investment to fix the problem down the road.\n\n**Explain this principle to others by using a medical analogy: think of the code that needs refactoring as \"a growth\".\nRemoving it requires invasive surgery. You can go in now, and take it out while it is still small. Or, you could wait\nwhile it grows and spreads - but removing it then will be both more expensive and more dangerous. Wait even longer, and\nyou may lose the patient entirely.**\n\nHow to refactor without doing more harm than good:\n\n1. Don't try to refactor and add functionality at the same time.\n2. Make sure you have good tests before you begin refactoring. Run the tests as often as possible.\n3. Take short, deliberate steps. Refactoring often involves making many localized changes that result in a larger-scale\n   change.\n\nDon't live with broken windows.\n\n**Testing is not about finding bugs.** Major benefits of testing happen when you think about and write the tests, not\nwhen you run them.\n\n**A test is the first user of your code.** Testing is vital feedback that guides your coding. _A function or method that\nis tightly coupled to other code is hard to test, because you have to set up all that environment._ Making your stuff\ntestable also reduces its coupling.\n\n**Build end-to-end, not top-down or bottom-up.** Build small pieces of end-to-end functionality, learning about the\nproblem as you go.\n\nLike our hardware colleagues, we need to build testability into the software from the very beginning, and test each\npiece thoroughly before trying to wire them together. Chip-level testing for hardware is roughly equivalent to unit\ntesting in software. Write test cases that ensure a given unit honors its contract. We want to test that the module\ndelivers the functionality it promises.\n\n**Design to test.** Start thinking about testing before you write a line of code.\n\nApproaches:\n\n- Test first - TDD - probably the best choice in most circumstances.\n- Test during - a good fallback when TDD is not useful or convenient.\n- Test never - the worst choice.\n\n**Test your software, or your users will.** Make no mistake, testing is part of programming. It is not something left to\nother departments or staff. Testing, design, coding - it is all programming.\n\n**Use property-based tests to validate your assumptions.** Property-based tests will try things you never thought to\ntry, and exercise your code in ways it wasn't meant to be used. For Python, use the _Hypothesis_ framework. Hypothesis\ngives you a minilanguage for describing the data it should generate.\n\n
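A minimal property-based test with Hypothesis - the round-trip property here is just an illustrative example:\n\n```python\nfrom hypothesis import given, strategies as st\n\n# Property: encoding and then decoding any text returns the original value.\n@given(st.text())\ndef test_utf8_roundtrip(s):\n    assert s.encode(\"utf-8\").decode(\"utf-8\") == s\n```\n\n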
**Keep it simple and minimize attack surfaces.** Bear in mind these security principles:\n\n1. Minimize Attack Surface Area\n    1. Code complexity makes the attack surface larger, with more opportunities for unanticipated side effects. Think of\n       complex code as making the surface area more porous and open to infection. Simple, smaller code is better.\n    2. Never trust data from an external entity, always sanitize it before passing it on to a database, view rendering,\n       or other processing.\n    3. Unauthenticated services are an attack vector. Any user anywhere in the world can call unauthenticated services.\n    4. Keep the number of authenticated users at an absolute minimum. Cull unused, old, or outdated users and services.\n       If an account with development services is compromised, your entire product is compromised.\n    5. Don't give too much information about an error in the response.\n2. Principle of Least Privilege - Every program and every privileged user of the system should operate using the least\n   amount of privilege necessary to complete the job.\n3. Don't leave personally identifiable information, financial data, passwords, or other credentials in plain text. Don't\n   check in secrets, API keys, SSH keys, encryption passwords or other credentials alongside your code in version\n   control.\n4. Apply security patches quickly. The largest data breaches in history were caused by systems that were behind on their\n   updates.\n\nYou don't want to do encryption yourself. Even the tiniest error can compromise everything. Rely on reliable things.\nTake the more pragmatic approach, let someone else worry about it, and use a third-party authentication provider.\n\n**Name well, rename when needed.** Things should be named according to the role they play in your code. Honor the local\nculture (snake_case vs CamelCase vs ...). Every project has its own vocabulary - jargon words that have a special\nmeaning to the team. It is important everyone on the team knows what these words mean. One way is to encourage a lot of\ncommunication, another way is to have a project glossary.\n\nWhen you see a name that no longer expresses the intent, or is misleading or confusing, fix it.\n\n## Chapter 8: Before the Project\n\n**No one knows exactly what they want.** Requirements rarely lie on the surface. Normally, they are buried deep beneath\nlayers of assumptions, misconceptions, and politics.\n\n**Programmers help people understand what they want.** Our job is to help people understand what they want.\n\n**Requirements are learned in a feedback loop.** Your role is to interpret what the client says and to feed back to them\nthe implications. This is both an intellectual process and a creative one. Your job is to help the client understand the\nconsequences of their stated requirements.\n\n**Work with the user to think like a user.** There is a simple technique for getting inside your clients' heads: become\na client.\n\n**Policy is metadata.** Don't hardcode policy into a system, instead express it as metadata used by the system.\n\n**Use a project glossary.** Create and maintain a project glossary - one place that defines all the specific terms and\nvocabulary used in a project. It is hard to succeed on a project if users and developers call the same thing by\ndifferent names.\n\n**Don't think outside the box - find the box.** When faced with an impossible problem, identify the real constraints.\nAsk yourself: Does it have to be done this way? Does it have to be done at all?\n\nSometimes you find yourself working on a problem that seems much harder than you thought it should be. You may think\nthis particular problem is \"impossible\". This is an ideal time to do something else for a while. Sleep on it, go walk\nthe dog. People who were distracted did better on a complex problem-solving task than people who put in conscious\neffort. If you are not willing to drop the problem for a while, the next best thing is probably finding someone to\nexplain it to (rubber duck).\n\nConway's Law: \"_Organizations which design systems are constrained to produce designs which are copies of the\ncommunication structures of these organizations_\".\n\n**Don't go into code alone.**\n\nPair programming - the inherent peer pressure of a second person helps against moments of weakness and bad habits such\nas naming variables foo. You are less inclined to take a potentially embarrassing shortcut when someone is actively\nwatching, which also results in higher-quality code.\n\nMob programming - it is an extension of pair programming that involves more than just two developers. You can think of\nmob programming as tight collaboration with live coding.\n\n**Agile is not a noun, agile is how you do things.** Agile is an adjective. Remember the values from the manifesto:\n\n
1. Individuals and interactions over processes and tools\n2. Working software over comprehensive documentation\n3. Customer collaboration over contract negotiation\n4. Responding to change over following a plan\n\nAgility is all about responding to change, responding to the unknowns you encounter after you set out.\n\nRecipe for working in an agile way:\n\n1. Work out where you are.\n2. Make the smallest meaningful step towards where you want to be.\n3. Evaluate where you end up, and fix anything you broke (this requires a good design, because it is easier to fix good\n   design).\n\n## Chapter 9: Pragmatic Projects\n\n**Maintain small, stable teams.** A pragmatic team is small, under 10-12 or so members. Members come and go rarely.\nEveryone knows everyone well, trusts each other, and depends on each other.\n\nQuality is a team issue. The most diligent developer placed on a team that just doesn't care will find it difficult to\nmaintain the enthusiasm needed to fix niggling problems. Teams as a whole should not tolerate broken windows - those\nsmall imperfections that no one fixes.\n\n**Schedule to make it happen.** If your team is serious about improvement and innovation, you need to schedule it.\nTrying to get things done \"whenever there is a free moment\" means they will never happen. Whatever sort of backlog or\ntask list or flow you are working with, don't reserve it for only feature development. The team works on more than just\nnew features:\n\n- old systems maintenance\n- process reflection and refinement - continuous improvement can only happen when you take the time to look around\n- new tech experiments - try new stuff and analyze results\n- learning and skill improvements - brown bags, training sessions\n\n**Organize fully functional teams.**\n\nThere is a simple marketing trick that helps teams communicate as one - generate a brand. When you start a project, come\nup with a name for it, ideally off-the-wall. Spend 30 minutes coming up with a zany logo and use it - it may seem silly,\nbut it gives your team an identity to build on, and the world something memorable to associate with your work.\n\nGood communication is key to avoiding problems. You should be able to ask a question of team members and get a\nmore-or-less instant reply. If you have to wait for a week for the team meeting to ask your question or share your\nstatus, that is an awful lot of friction.\n\n**Do what works, not what is fashionable.** Ask yourself, why are you even using that particular development\nmethod/framework/whatever? Does it work well for you? Or was it adopted just because it was being used by the latest\ninternet-fueled success story?\n\nYou want to take the best pieces from any particular methodology and adapt them for your use. No single method fits all,\nand current methods are far from complete, so you will need to look at more than just one popular method. That is a very\ndifferent mindset from \"but Scrum/Lean/Kanban/XP/agile does it this way...\".\n\nThe goal isn't to do Scrum/do agile/do Lean or what-have-you. The goal is to be in a position to deliver working\nsoftware that gives the users some new capability at a moment's notice. Not weeks, months, or years from now. If you are\ndelivering in years, then shorten the cycle to months. From months, cut it down to weeks. From a four-week sprint, try\ntwo. From a two-week sprint, try one. Then daily. Then, finally, on demand. Note that being able to deliver on demand\ndoes not mean you are forced to deliver every minute of every day. You deliver when the users need it, when it makes\nbusiness sense to do so.\n\n
**Deliver when users need it.** In order to move to this style of continuous development, you need a rock-solid\ninfrastructure.\n\nOnce your infrastructure is in order, you need to decide how to organize the work. Beginners might want to start with\nScrum for project management. More disciplined and experienced teams might look to Kanban and Lean techniques. But\ninvestigate it first. Try these approaches for yourself.\n\n**Use version control to drive builds, tests and releases.** Build, test, and deployment are triggered via commits or\npushes to version control, and built in a container in the cloud. Release to staging or production is specified by using\na tag in your version control system.\n\n**Test early, test often, test automatically.** A good project may well have more test code than production code. The\ntime it takes to produce this test code is worth the effort. It ends up being much cheaper in the long run, and you\nactually stand a chance of producing a product with close to zero defects.\n\n**Coding ain't done till all the tests run.** The automatic build runs all available tests. It is important to aim to\n\"test for real\" - the test environment should match the production environment closely. The build may cover several\nmajor types of software testing: unit testing, integration testing, validation and verification, and performance\ntesting.\n\n**Use Saboteurs to test your testing.** Because we can't write perfect software, we can't write perfect tests. We need\nto test the tests. After you have written a test to detect a bug, cause the bug deliberately and make sure the test\ncomplains. If you are really serious about testing, take a separate branch, introduce bugs on purpose and verify that\nthe tests will catch them. At a higher level, you can use something like Netflix's Chaos Monkey.\n\n
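A tiny saboteur sketch - deliberately break the code under test (here, a made-up boundary check) and make sure the test suite actually complains:\n\n```python\ndef is_adult(age):\n    return age >= 18  # saboteur: change `>=` to `>` - test_boundary must fail\n\ndef test_boundary():\n    assert is_adult(18)\n```\n\n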
**Test state coverage, not code coverage.** Even if you happen to hit every line of code, that is not the whole picture.\nWhat is important is the number of states that your program may have. States are not equivalent to lines of code. A\ngreat way to explore how your code handles unexpected states is to have a computer generate those states (property-based\ntesting).\n\n**Find bugs once.** Once a human tester finds a bug, it should be the last time a human tester finds that bug. If a bug\nslips through the net of existing tests, you need to add a new test to trap it next time.\n\n**Don't use manual procedures.** Tracking down differences of any one component usually reveals a surprise. People\naren't as repeatable as computers are. Nor should we expect them to be. Everything should depend on automation. Project\nbuild, deployment, ... Once you introduce manual steps, you have broken a very large window.\n\n**Delight users, don't just deliver code.** If you want to delight your client, forge a relationship with them where you\ncan actively help solve their problems. Be a _Problem Solver_ (not Software Engineer/Developer). That is the essence of\na Pragmatic Programmer.\n\n**Sign your work.** If we are responsible for a design, or a piece of code, we do a job we can be proud of. Artisans of\nan earlier age were proud to sign their work. You should be, too.\n\nHowever, you shouldn't jealously defend your code against interlopers; by the same token, you should treat other\npeople's code with respect. Mutual respect among the developers is critical to make this tip work.\n\nWe want to see pride in ownership: \"_I wrote this, and I stand behind my work_\". Your signature should come to be\nrecognized as an indicator of quality. People should see your name on a piece of code and expect it to be solid, well\nwritten, tested and documented.\n\nA really professional job. Written by a professional. A Pragmatic Programmer.\n\n## Postface\n\nWe have a duty to ask ourselves two questions about every piece of code we deliver:\n\n1. Have I protected the user?\n2. Would I use this myself?\n\n**First, do no harm.** Would I be happy to be a user of this software? Do I want my details shared? Do I want my\nmovements to be given to retail outlets? Would I be happy to be driven by this autonomous vehicle? Am I comfortable\ndoing this? If you are involved in the project, you are just as responsible as the sponsors.\n\n**Don't enable scumbags.**\n\n**It is your life. Share it. Celebrate it. Build it. AND HAVE FUN.** You are building the future. Your duty is to make a\nfuture that we would all want to inhabit. Recognize when you are doing something against this ideal, and have the\ncourage to say no.\n"
  },
  {
    "path": "books/pytest/.coveragerc",
    "content": "[paths]\nsource =\n    src/"
  },
  {
    "path": "books/pytest/Dockerfile",
    "content": "FROM python:3.10.2\n\nWORKDIR /src\n\nENV PYTHONPATH \"${PYTHONPATH}:/src\"\n\nCOPY requirements.txt .\nCOPY setup.cfg .\n\nRUN pip install -r requirements.txt\n\nCOPY src/ src/\nCOPY tests/ tests/\n"
  },
  {
    "path": "books/pytest/docker-compose.yml",
    "content": "version: \"3.9\"\nservices:\n  book:\n    build:\n      context: .\n      dockerfile: Dockerfile\n    volumes:\n      - ./:/src\n"
  },
  {
    "path": "books/pytest/notes.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Python Testing with Pytest: Simple, Rapid, Effective, and Scalable\n\nBook by Brian Okken\n\nCode here: [click](.)\n\n- [Chapter 1: Getting Started with pytest](#chapter-1-getting-started-with-pytest)\n- [Chapter 2: Writing Test Functions](#chapter-2-writing-test-functions)\n- [Chapter 3: pytest Fixtures](#chapter-3-pytest-fixtures)\n- [Chapter 4: Built-in fixtures](#chapter-4-built-in-fixtures)\n- [Chapter 5: Parametrization](#chapter-5-parametrization)\n- [Chapter 6: Markers](#chapter-6-markers)\n- [Chapter 7: Strategy](#chapter-7-strategy)\n- [Chapter 8: Configuration Files](#chapter-8-configuration-files)\n- [Chapter 9: Coverage](#chapter-9-coverage)\n- [Chapter 10: Mocking](#chapter-10-mocking)\n- [Chapter 11: tox and Continuous Integration](#chapter-11-tox-and-continuous-integration)\n- [Chapter 12: Testing Scripts and Applications](#chapter-12-testing-scripts-and-applications)\n- [Chapter 13: Debugging Test Failures](#chapter-13-debugging-test-failures)\n- [Chapter 14: Third-Party Plugins](#chapter-14-third-party-plugins)\n- [Chapter 15: Building Plugins](#chapter-15-building-plugins)\n- [Chapter 16: Advanced Parametrization](#chapter-16-advanced-parametrization)\n\n## Chapter 1: Getting Started with pytest\n\nPart of pytest execution is test discovery, where pytest looks for `.py` files starting with `test_` or ending\nwith `_test`. Test methods and functions must start with `test_`, test classes should start with `Test`.\n\nFlag `--tb=no` turns off tracebacks.\n\nTest outcomes:\n\n- PASSED (.)\n- FAILED (F)\n- SKIPPED (S) - you can tell pytest to skip a test by using `@pytest.mark.skip` or `@pytest.mark.skipif`\n- XFAIL (x) - the test was not supposed to pass (`@pytest.mark.xfail`)\n- XPASS (X) - the teas was marked with xfail, but it ran and passed\n- ERROR (E) - an exception happened during the execution\n\n## Chapter 2: Writing Test Functions\n\nWriting knowledge-building tests - when faced a new data structure, it is often helpful to write some quick tests so\nthat you can understand how the data structure works. The point of these tests is to check my understanding of how the\nstructure works, and possibly to document that knowledge for someone else or even for a future me.\n\n`pytest` includes a feature called \"_assert rewriting_\", that intercepts _assert_ calls and replaces them with something\nthat can tell you more about why your assertions failed.\n\n`pytest.fail()` underneath raises an exception. When calling this function or raising an exception directly, we don't\nget the wonderful \"assert rewriting\" provided by the `pytest`.\n\nAssertion helper function - used to wrap up a complicated assertion check. `__tracebackhide__ = True` the effect will be\nthat failing tests will not include this function in the traceback.\n\nFlag `--tb=short` - shorted traceback format.\n\nUse `pytest.raises` to test expected exceptions. You can check error details by using `match`, `match` accepts regular\nexpressions and matches it with the exception message. You can also use `as exc_info` (or any other variable name) to\ninterrogate extra parameters.\n\nArrange-Act-Assert or Given-When-Then patterns are about separating test into stages. A common anti-pattern is to have\nmore \"Arrange-Assert-Act-Assert-Act-Assert-...\". Test should focus on testing one behavior.\n\n`pytest` allows to group tests with classes. You can utilize class hierarchies for inherited methods. 
Arrange-Act-Assert or Given-When-Then patterns are about separating a test into stages. A common anti-pattern is\n\"Arrange-Assert-Act-Assert-Act-Assert-...\". A test should focus on testing one behavior.\n\n`pytest` allows you to group tests with classes. You can utilize class hierarchies for inherited methods. However, the\nbook's author doesn't recommend test inheritance because it easily confuses readers. Use classes only for grouping.\n\n`pytest` allows you to run a subset of tests, examples:\n\n- `pytest ch2/test_classes.py::TestEquality::test_equality`\n- `pytest ch2/test_classes.py::TestEquality`\n- `pytest ch2/test_classes.py`\n- `pytest ch2/test_card.py::test_defaults`\n- `pytest ch2/test_card.py`\n\nThe `-k` argument takes an expression, and tells pytest to run tests that contain a substring that matches the\nexpression, examples:\n\n- `pytest -v -k TestEquality`\n- `pytest -v -k TestEq`\n- `pytest -v -k equality`\n- `pytest -v -k \"equality and not equality_fail\"` (_and, or, parenthesis, not_ are allowed to create complex\n  expressions)\n\n## Chapter 3: pytest Fixtures\n\nFixtures are helper functions, run by pytest before (and sometimes after) the actual test functions. Code in the fixture\ncan do whatever you want it to do. \"Fixture\" can also be used to refer to the resource that is being set up by the\nfixture functions.\n\n`pytest` treats exceptions differently during fixtures compared to during a test function.\n\n- FAIL - the failure is somewhere in the test function\n- ERROR - the failure is somewhere in the fixture\n\nFixtures help a lot when dealing with databases.\n\nFixture functions run before the tests that use them. If there is a `yield` in the function, it stops there, passes\ncontrol to the tests, and picks up on the next line after the tests are done. The code above `yield` is \"setup\" and the\ncode after `yield` is \"teardown\". The code after `yield` is guaranteed to run regardless of what happens during the\ntests.\n\nFlag `--setup-show` shows us the order of operations of tests and fixtures, including the setup and teardown phases of\nthe fixtures.\n\nThe scope dictates how often the setup and teardown get run when the fixture is used by multiple test functions:\n\n- _function_ - (default scope) run once per test function. The setup is run before each test using the fixture. The\n  teardown is run after each test using the fixture.\n- _class_ - run once per test class, regardless of how many test methods are in the class.\n- _module_ - run once per module, regardless of how many test functions/methods or other fixtures in the module use it.\n- _package_ - run once per package, regardless of how many test functions/methods or other fixtures in the package use\n  it.\n- _session_ - run once per session, all test methods/functions using a fixture of session scope share one setup and\n  teardown call.\n\nThe scope is set at the definition of a fixture, and not at the place where it is used: `@pytest.fixture(scope=...)`.\n\nFixtures can only depend on other fixtures of their same scope or wider.\n\n`conftest.py` is considered by `pytest` as a \"local plugin\". Gets read by pytest automatically. Use `conftest.py` to\nshare fixtures among multiple test files. We can have `conftest.py` files at every level of our test directory. Tests\ncan use any fixture that is in the same test module as a test function, or in a `conftest.py` file in the same directory\n(or in the parent directory).\n\nUse `--fixtures` to show a list of all available fixtures our test can use.\n\nUse `--fixtures-per-test` to see what fixtures are used by each test and where the functions are defined.\n\nUsing multiple stage fixtures can provide some incredible speed benefits and maintain test order independence.\n\n
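Putting the above together, a minimal yield fixture with an explicit scope - setup above the `yield`, teardown below it:\n\n```python\nimport pytest\n\n@pytest.fixture(scope=\"function\")\ndef items():\n    data = []     # setup: runs before each test using the fixture\n    yield data    # the test receives `data` here\n    data.clear()  # teardown: guaranteed to run, even if the test fails\n\ndef test_append(items):\n    items.append(\"a\")\n    assert items == [\"a\"]\n```\n\n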
It is possible to set fixture scope dynamically, e.g. by passing a new flag as an argument.\n\nUse `autouse=True` to run a fixture all the time. The `autouse` feature is good to have around. But it is more of an\nexception than a rule. Opt for named fixtures unless you have a really great reason not to.\n\n`pytest` allows you to rename fixtures with a `name` parameter to `@pytest.fixture`.\n\n## Chapter 4: Built-in fixtures\n\n`tmp_path` and `tmp_path_factory` - used to create temporary directories.\n\n- `tmp_path`\n    - function scope\n- `tmp_path_factory`\n    - session scope\n    - you have to call `mktemp` to get a directory\n- `tmpdir_factory`\n    - similar to `tmp_path_factory`, but instead of `Path`, returns `py.path.local`\n\n`capsys` - enables the capturing of writes to `stdout` and `stderr`.\n\n- `capfd` - like `capsys`, but captures file descriptors 1 and 2 (stdout and stderr)\n- `capsysbinary` - `capsys` captures text, `capsysbinary` captures binary\n- `caplog` - captures output written with the logging package\n\nA \"monkey patch\" is a dynamic modification of a class or module during runtime. \"Monkey patching\" is a convenient way to\ntake over part of the runtime environment of the application code and replace it with entities that are more convenient\nfor testing.\n\n`monkeypatch` - used to modify objects, dictionaries, env variables. When the test ends, the original unpatched code is\nrestored. It has the following functions:\n\n- `setattr` - sets an attribute\n- `delattr` - deletes an attribute\n- `setitem` - sets a dictionary entry\n- `delitem` - deletes a dictionary entry\n- `setenv` - sets an env variable\n- `delenv` - deletes an env variable\n- `syspath_prepend` - prepends `path` to `sys.path`, which is Python's list of import locations\n- `chdir` - changes the current working directory\n\nIf you start using monkey-patching:\n\n- you will start to understand this\n- you will start to avoid mocking and monkey-patching whenever possible\n\nDESIGN FOR TESTABILITY - a concept borrowed from hardware designers: adding functionality to software to make it easier\nto test.\n\nMore fixtures: https://docs.pytest.org/en/6.2.x/fixture.html or run `pytest --fixtures`.\n\n## Chapter 5: Parametrization\n\nParametrized tests refer to adding parameters to our test functions and passing in multiple sets of arguments to the\ntest to create new test cases.\n\nWith fixture parametrization, we shift parameters to a fixture; `pytest` will then call the fixture once each for every\nset of values we provide.\n\nFixture parametrization has the benefit of having a fixture run for each set of arguments. This is useful if you have\nsetup or teardown code that needs to run for each test case - e.g. different database connection, different file\ncontent, ...\n\n`pytest_generate_tests` - hook function. Allows you to modify the parametrization list at test collection time in\ninteresting ways.\n\n
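A short sketch of both styles - the parameter values are made up; with the fixture version, every test using `connection` runs once per parameter, including the fixture's setup and teardown:\n\n```python\nimport pytest\n\n# Function parametrization: one test function, three test cases.\n@pytest.mark.parametrize(\"state\", [\"todo\", \"in prog\", \"done\"])\ndef test_state_is_known(state):\n    assert state in (\"todo\", \"in prog\", \"done\")\n\n# Fixture parametrization: the fixture itself runs once per parameter.\n@pytest.fixture(params=[\"sqlite\", \"postgres\"])\ndef connection(request):\n    return request.param\n\ndef test_connection(connection):\n    assert connection in (\"sqlite\", \"postgres\")\n```\n\n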
## Chapter 6: Markers\n\nMarkers are a way to tell pytest there is something special about a particular test. You can think of them like tags or\nlabels. If some tests are slow, you can mark them with `@pytest.mark.slow` and have pytest skip those tests when you are\nin a hurry. You can pick a handful of tests out of a test suite and mark them with `@pytest.mark.smoke`.\n\nBuilt-in markers:\n\n- `@pytest.mark.filterwarnings(warning)` - adds a warning filter to the given test\n- `@pytest.mark.skip(reason=None)` - skip the test with an optional reason\n- `@pytest.mark.skipif(condition, ..., *, reason)` - skip the test if any of the conditions are true\n- `@pytest.mark.xfail(condition, ..., *, reason, run=True, raises=None, strict=xfail_strict)` - we can expect the test to\n  fail. If we want to run all tests, even those that we know will fail, we can use this marker.\n- `@pytest.mark.parametrize(argnames, argvalues, indirect, ids, scope)` - call a test function multiple times\n- `@pytest.mark.usefixtures(fixturename1, fixturename2, ...)` - marks tests as needing all the specified fixtures\n\nCustom markers - you need to add `pytest.ini` with marker definition, some ideas for markers:\n\n- `@pytest.mark.smoke` - run `pytest -v -m smoke` to run smoke tests only\n- `@pytest.mark.exception` - run `pytest -v -m exception` to run exception-related tests only\n\nCustom markers shine when we have more files involved. We can also add markers to entire files or classes. We can even\nput multiple markers on a single test.\n\nFile-level marker:\n\n```python\npytestmark = [pytest.mark.marker_one, pytest.mark.marker_two]\n```\n\nWhen filtering tests using markers, it is possible to combine markers and use a bit of logic, just like we did with\nthe `-k` keyword, e.g. `pytest -v -m \"custom and exception\"`, `pytest -v -m \"finish and not smoke\"`.\n\n`--strict-markers` - raises an error when an unregistered marker is used (by default only a warning is raised). The\nerror is raised at collection time, not at run time, so it is reported earlier.\n\nMarkers can be used in conjunction with fixtures.\n\nUse `--markers` to list all available markers.\n\n## Chapter 7: Strategy\n\n_Testing enough to sleep at night_: The idea of testing enough so that you can sleep at night may have come from\nsoftware systems where developers have to be on call to fix software if it stops working in the middle of the night. It\nhas been extended to include sleeping soundly, knowing that your software is well tested.\n\nTesting through the API tests most of the system and logic.\n\nBefore you create the test cases you want to test, evaluate what features to test. When you have a lot of functionality\nand features to test, you have to prioritize the order of developing tests. At least a rough idea of order helps.\nPrioritize using the following factors:\n\n1. Recent - new features, new areas of code, recently modified, refactored.\n2. Core - your product's unique selling propositions. The essential functions that must continue to work in order for\n   the product to be useful.\n3. Risk - areas of the application that pose more risk, such as areas important to customers but not used regularly by\n   the development team or parts that use third-party code you don't trust.\n4. Problematic - functionality that frequently breaks or even gets defect reports against it.\n5. Expertise - features or algorithms understood by a limited subset of people\n\n
Creating test cases:\n\n- start with a non-trivial, \"happy path\" test case\n- then look at test cases that represent\n    - interesting set of inputs\n    - interesting starting states\n    - interesting end states\n    - all possible error states\n\n## Chapter 8: Configuration Files\n\nNon-test files that affect how _pytest_ runs.\n\n- `pytest.ini` - primary pytest configuration file that allows you to change pytest's default behavior. Its location\n  also defines the pytest root directory.\n- `conftest.py` - this file contains fixtures and hook functions. It can exist at the root directory or in any\n  subdirectory. It is a good idea to stick to only one `conftest.py` file, so you can find fixture definitions easily.\n- `__init__.py` - when put into test subdirectories, this file allows you to have identical test file names in multiple\n  test directories. This means you can have `api/test_add.py` and `cli/test_add.py` but only if you have `__init__.py`\n  in both directories.\n- `tox.ini`, `pyproject.toml`, `setup.cfg` - these files can take the place of `pytest.ini`\n\nExample `pytest.ini`:\n\n```\n[pytest]              -- including `[pytest]` in `pytest.ini` allows the pytest ini parsing to treat `pytest.ini` and `tox.ini` identically\naddopts =             -- enables us to list the pytest flags we always want to run in this project\n    --strict-markers  -- raise an error for any unregistered marker\n    --strict-config   -- raise an error for any difficulty in parsing config files\n    -ra               -- display extra text summary at the end of a test run\n\ntestpaths = tests     -- tells pytest where to look for tests\n\nmarkers =             -- declare markers\n    smoke: subset of tests\n    exception: check for expected exceptions\n```\n\nExample `tox.ini`:\n\n```\n[tox]\n; tox specific settings\n\n[pytest]\naddopts =\n    --strict-markers\n    --strict-config\n    -ra\n...\n```\n\nExample `pyproject.toml`:\n\n```\n[tool.pytest.ini_options]\naddopts = [\n  \"--strict-markers\",\n  \"--strict-config\",\n  \"-ra\"\n]\n\ntestpaths = tests\n\nmarkers = [\n  \"smoke: subset of tests\",\n  \"exception: check for expected exceptions\"\n]\n```\n\nExample `setup.cfg`:\n\n```\n[tool:pytest]\naddopts =\n    --strict-markers\n    --strict-config\n    -ra\n...\n```\n\nEven if you don't need any configuration settings, it is still a great idea to place an empty `pytest.ini` at the top of\nyour project - pytest stops searching when it finds one, and its location defines the pytest root directory.\n\n## Chapter 9: Coverage\n\nTools that measure code coverage watch your code while a test suite is being run and keep track of which lines are hit\nand which are not. 
That measurement is called \"line coverage\" - the number of lines run during testing divided by the total number of\nlines of code.\n\nCode coverage tools can also tell you if all paths are taken in control statements - \"branch coverage\".\n\nCode coverage cannot tell you if your test suite is good - it can only tell you how much of the application code is\ngetting hit by your test suite.\n\n`coverage.py` - preferred Python coverage tool, `pytest-cov` - popular pytest plugin (depends on `coverage.py`, so it\nwill be installed as well).\n\nTo run tests with `coverage.py`, you need to add the `--cov` flag.\n\nTo add missing lines to the terminal report, add the `--cov-report=term-missing` flag.\n\n`coverage.py` is able to generate HTML reports: `docker-compose run --rm book pytest --cov=src --cov-report=html`, to\nhelp view coverage data in more detail.\n\n`# pragma: no cover` - tells `coverage` to exclude either a single line or a block of code.\n\n**Beware of Coverage-Driven Development!** The problem with adding tests just to hit 100% is that doing so will mask the\nfact that these lines aren't being used and therefore are not needed by the application. It also adds test code and\ncoding time that is not necessary.\n\n## Chapter 10: Mocking\n\nThe `mock` package is used to swap out pieces of the system to isolate bits of our application code from the rest of the\nsystem. Mock objects are sometimes called _test doubles_, _spies_, _fakes_ or _stubs_.\n\nTyper provides a testing interface. With it, we don't have to use `subprocess.run`, which is good, because we can't mock\nstuff running in a separate process.\n\nMocks by default accept any access. If the real object allows `.start(index)`, we want our mock objects to\nallow `start(index)` as well. Mock objects are too flexible by default - they will also accept `star()` - misspelled\nmethods, additional parameters, really anything.\n\n_Mock drift_ - occurs when the interface you are mocking changes, and your mock in your test code doesn't.\n\nUse `autospec=True` - without it, mock will allow you to call any function, with any parameters, even if it doesn't make\nsense for the real thing being mocked. Always use _autospec_ when you can.\n\n
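A small sketch of `autospec` with `unittest.mock` - patching `json.dumps` purely as a stand-in target:\n\n```python\nimport json\nfrom unittest import mock\n\ndef test_autospec():\n    with mock.patch(\"json.dumps\", autospec=True) as fake_dumps:\n        fake_dumps.return_value = \"{}\"\n        assert json.dumps({\"a\": 1}) == \"{}\"\n        fake_dumps.assert_called_once_with({\"a\": 1})\n        # Without autospec, json.dumps() with no arguments would be accepted;\n        # with autospec, it raises a TypeError like the real function.\n```\n\n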
**Mocking tests implementation, not behavior.** When we are using mocks in a test, we are no longer testing behavior,\nbut testing implementation. Focusing tests on testing implementation is dangerous and time-consuming.\n\n_Change detector tests_ - tests that break during valid refactoring. When tests fail whenever the code changes, they are\nchange detector tests, and are usually more trouble than they are worth.\n\nMocking is useful when you need to generate an exception or make sure your code calls a particular API method when it is\nsupposed to, with the correct parameters.\n\nThere are several special-purpose mocking libraries:\n\n- mocking database: `pytest-postgresql`, `pytest-mongo`, `pytest-mysql`, `pytest-dynamodb`\n- mocking HTTP servers: `pytest-httpserver`\n- mocking requests: `responses`, `betamax`\n- other: `pytest-rabbitmq`, `pytest-solr`, `pytest-elasticsearch`, `pytest-redis`\n\nAdding functionality that makes testing easier is part of \"design for testability\" and can be used to allow testing at\nmultiple levels or testing at a higher level.\n\n## Chapter 11: tox and Continuous Integration\n\nCI refers to the practice of merging all developers' code changes into a shared repository on a regular basis - often\nseveral times a day.\n\nBefore the implementation of CI, teams used version control to keep track of code updates, and different developers\nwould add a feature/fix on separate branches. Then code was merged, built, and tested. The frequency of merges varied\nfrom \"when your code is ready, merge it\" to regularly scheduled merges (weekly, monthly). The merge was called\n_integration_ because the code is being integrated together.\n\nWith this sort of version control, code conflicts happened often. Some merge errors were not found until very late.\n\nCI tools build and run tests all on their own, usually triggered by a merge request. Because the build and test stages\nare automated, developers can integrate more frequently, even several times a day.\n\nCI tools automate the process of build and test.\n\n`tox` - command-line tool that allows you to run a complete suite of tests in multiple environments. Great starting\npoint when learning about CI. `tox`:\n\n1. creates a virtual env in a .tox directory\n2. pip installs some dependencies\n3. builds your package\n4. pip installs your package\n5. runs your tests\n\n`tox` can automate the testing process locally, and it also helps with cloud-based CI. You can integrate tox with GitHub\nActions.\n\n## Chapter 12: Testing Scripts and Applications\n\nDefinitions:\n\n- script - a single file containing Python code that is intended to be run directly from Python\n- importable script - a script in which no code is executed when it is imported. Code is executed only when it is run\n  directly\n- application - package or script that has external dependencies\n\nTesting a small script with `subprocess.run` works okay, but it does have drawbacks:\n\n- we may want to test sections of larger scripts separately\n- we may want to separate test code and scripts into different directories\n\nThe solution is to make the script importable. Add `if __name__ == \"__main__\"` - this code is executed only when we\ncall the script with `python script.py`.\n\n
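A minimal importable script - `hello.py` is a made-up name; tests can `from hello import main` without triggering the print:\n\n```python\n# hello.py\ndef main():\n    print(\"Hello\")\n\nif __name__ == \"__main__\":  # runs only via `python hello.py`, not on import\n    main()\n```\n\n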
## Chapter 13: Debugging Test Failures\n\npytest includes a few command-line flags that are useful for debugging:\n\n- `-lf` / `--last-failed` - runs just the tests that failed last\n- `-ff` / `--failed-first` - runs all the tests, starting from the last failed\n- `-x` / `--exitfirst` - stops the test session after the first failure\n- `--maxfail=num` - stops the tests after `num` failures\n- `-nf` / `--new-first` - runs all the tests, ordered by the modification time\n- `--sw` / `--stepwise` - stops the tests at the first failure, starts the test at the last failure next time\n- `--sw-skip` / `--stepwise-skip` - same as `--sw`, but skips the first failure\n\nFlags to control pytest output:\n\n- `-v` / `--verbose` - all the test names, passing or failing\n- `--tb=[auto/long/short/line/native/no]` - controls the traceback style\n- `-l` / `--showlocals` - displays local variables alongside the stacktrace\n\nFlags to start a command-line debugger:\n\n- `--pdb` - starts an interactive debugging session at the point of failure\n- `--trace` - starts the pdb source-code debugger immediately when running each test\n- `--pdbcls` - uses alternatives to pdb\n\n`pdb` - Python Debugger - part of the Python standard library. Add a `breakpoint()` call; when pytest hits this function\ncall, it will stop there and launch `pdb`. There are common commands recognized by `pdb` - full list in the\ndocumentation (or use PyCharm's debugger instead if you can).\n\n## Chapter 14: Third-Party Plugins\n\nThe pytest code is designed to allow customisation and extensions, and there are hooks available to allow modifications\nand improvements through plugins.\n\nEvery time you put fixtures and/or hook functions into a project's `conftest.py` file, you create a local plugin. 
Only\nsome extra work is needed to turn these files into installable plugins.\n\n`pytest` plugins are installed with `pip`.\n\nPlugins that change the normal test run flow:\n\n- `pytest-order` - specify the order using markers\n- `pytest-randomly` - randomize order, first by file, then by a class, then by test\n- `pytest-repeat` - makes it easy to repeat a single test, or multiple tests, a specific number of times\n- `pytest-rerunfailures` - rerun failed tests (helpful for flaky tests)\n- `pytest-xdist` - runs tests in parallel, either using multiple CPUs or multiple remote machines\n\nPlugins that alter or enhance output:\n\n- `pytest-instafail` - reports tracebacks and output from failed tests right after the failure\n- `pytest-sugar` - shows green checkmarks instead of dots and has a nice progress bar\n- `pytest-html` - allows for HTML report generation\n\nPlugins for web development:\n\n- `pytest-selenium` - additional fixtures to allow easy configuration of browser-based tests\n- `pytest-splinter` - built on top of Selenium, allows Splinter to be used more easily from pytest\n- `pytest-django`, `pytest-flask` - make testing Django/Flask apps easier\n\nPlugins for fake data:\n\n- `Faker` - generates fake data, provides `faker` fixture\n- `model-bakery` - generates Django models with fake data\n- `pytest-factoryboy` - includes fixtures for Factory Boy\n- `pytest-mimesis` - generates fake data similarly to Faker, but Mimesis is quite a bit faster\n\nPlugins that extend pytest functionality:\n\n- `pytest-cov` - runs coverage while testing\n- `pytest-benchmark` - runs benchmark timing on code within tests\n- `pytest-timeout` - doesn't let tests run too long\n- `pytest-asyncio` - test async functions\n- `pytest-bdd` - BDD-style tests with pytest\n- `pytest-freezegun` - freezes time so that any code that reads the time will get the same value during a test, you can\n  also set a particular date or time\n- `pytest-mock` - thin wrapper around `unittest.mock`\n\nFull list of plugins: https://docs.pytest.org/en/latest/reference/plugin_list.html\n\n## Chapter 15: Building Plugins\n\nHook functions - function entry points that pytest provides to allow plugin developers to intercept pytest behaviour at\ncertain points and make changes. There are multiple hook functions, for example:\n\n- `pytest_configure()` - perform initial config. We can use this function to, for example, pre-declare the `slow` marker.\n- `pytest_addoption()` - register options and settings, e.g. a new flag: _--slow_\n- `pytest_collection_modifyitems()` - called after test collection, can be used to filter or re-order the test items,\n  e.g. to find _slow_ tests\n\n
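A minimal sketch of the last hook in a `conftest.py` - it assumes a `slow` marker is registered and reorders collected tests so that slow ones run last:\n\n```python\n# conftest.py\ndef pytest_collection_modifyitems(items):\n    # False sorts before True, so tests marked `slow` end up at the back.\n    items.sort(key=lambda item: item.get_closest_marker(\"slow\") is not None)\n```\n\n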
The Node Interface: https://docs.pytest.org/en/latest/reference/reference.html#node\n\nYou can transform a local `conftest.py` into an installable plugin. You can use `Flit` to get help with the\n`pyproject.toml` and `LICENSE`.\n\nPlugins are code that needs to be tested just like any other code. `pytester` is a plugin shipped with `pytest`.\n`pytester` creates a temporary directory for each test that uses the `pytester` fixture; there are a bunch of\nfunctions to help populate this directory - https://docs.pytest.org/en/latest/reference/reference.html#pytester\n\n## Chapter 16: Advanced Parametrization\n\nWhen using complex parametrization values, `pytest` numbers test cases like: `starting_card0, starting_card1, ...`. It\nis possible to generate custom identifiers:\n\n```py\ncard_list = [\n    Card(\"foo\", \"todo\"),\n    Card(\"foo\", \"in prog\"),\n    Card(\"foo\", \"done\"),\n]\n\n\n@pytest.mark.parametrize(\"starting_card\", card_list, ids=str)\ndef test_str_ids(starting_card):\n    ...\n```\n\nYou can write a custom ID function:\n\n```py\ndef cards_state(card):\n    return card.state\n\n\n@pytest.mark.parametrize(\"starting_card\", card_list, ids=cards_state)\ndef test_custom_ids(starting_card):\n    ...\n```\n\nA lambda function works as well:\n\n```py\n@pytest.mark.parametrize(\"starting_card\", card_list, ids=lambda c: c.state)\ndef test_lambda_ids(starting_card):\n    ...\n```\n\nIf you have one or two parameters requiring special treatment, use `pytest.param` to override the ID:\n\n```py\ncard_list = [\n    Card(\"foo\", \"todo\"),\n    pytest.param(Card(\"foo\", \"in prog\"), id=\"special\"),\n    Card(\"foo\", \"done\"),\n]\n\n\n@pytest.mark.parametrize(\"starting_card\", card_list, ids=cards_state)\ndef test_param_ids(starting_card):\n    ...\n```\n\nYou can supply a list to `ids`, instead of a function:\n\n```py\nid_list = [\"todo\", \"in prog\", \"done\"]\n\n\n@pytest.mark.parametrize(\"starting_card\", card_list, ids=id_list)\ndef test_list_ids(starting_card):\n    ...\n```\n\nbut you have to be extra careful to keep the lists synchronized. Otherwise, the IDs are wrong.\n\nIt is possible to write our own function to generate parameter values:\n\n```py\ndef text_variants():\n    # This function can read data from a file/API/database/... as well.\n    variants = {...: ...}\n\n    for key, value in variants.items():\n        yield pytest.param(value, id=key)\n\n\n@pytest.mark.parametrize(\"variant\", text_variants())\ndef test_variants(variant):\n    ...\n```\n\nIf you want to test all combinations, stacking parameters is the way to go:\n\n```py\n@pytest.mark.parametrize(\"state\", states)\n@pytest.mark.parametrize(\"owner\", owners)\n@pytest.mark.parametrize(\"summary\", summaries)\ndef test_stacking(summary, owner, state):\n    ...\n```\n\nthis will act rather like cascading for loops, looping on the parameters from the bottom decorator to the top.\n\nAn _indirect parameter_ is one that gets passed to a fixture before it gets sent to the test function. Indirect\nparameters essentially let us parameterize a fixture, while keeping the parameter values with the test function. This\nallows different tests to use the same fixture with different parameter values.\n\n```py\n@pytest.fixture()\ndef user(request):\n    role = request.param\n    print(f\"Logging in as {role}\")\n    yield role\n    print(f\"Logging out {role}\")\n\n\n@pytest.mark.parametrize(\"user\", [\"admin\", \"team_member\", \"visitor\"], indirect=[\"user\"])\ndef test_access_rights(user):\n    ...\n```\n"
  },
  {
    "path": "books/pytest/requirements.txt",
    "content": "tinydb\npytest\nfaker\ntox\ncoverage\npytest-cov\ntinydb\ntyper\nrich\n"
  },
  {
    "path": "books/pytest/setup.cfg",
    "content": "[tool:pytest]\npython_paths = .\ntestpaths = tests\n"
  },
  {
    "path": "books/pytest/src/__init__.py",
    "content": "\"\"\"Top-level package for cards.\"\"\"\n\n__version__ = \"1.0.0\"\n\nfrom .api import *  # noqa\nfrom .cli import app  # noqa\n"
  },
  {
    "path": "books/pytest/src/api.py",
    "content": "\"\"\"\nAPI for the cards project\n\"\"\"\nfrom dataclasses import asdict\nfrom dataclasses import dataclass\nfrom dataclasses import field\n\nfrom src.db import DB\n\n__all__ = [\n    \"Card\",\n    \"CardsDB\",\n    \"CardsException\",\n    \"MissingSummary\",\n    \"InvalidCardId\",\n]\n\n__version__ = \"1.0.0\"\n\n\n@dataclass\nclass Card:\n    summary: str = None\n    owner: str = None\n    state: str = \"todo\"\n    id: int = field(default=None, compare=False)\n\n    @classmethod\n    def from_dict(cls, d):\n        return Card(**d)\n\n    def to_dict(self):\n        return asdict(self)\n\n\nclass CardsException(Exception):\n    pass\n\n\nclass MissingSummary(CardsException):\n    pass\n\n\nclass InvalidCardId(CardsException):\n    pass\n\n\nclass CardsDB:\n    def __init__(self, db_path):\n        self._db_path = db_path\n        self._db = DB(db_path, \".cards_db\")\n\n    def add_card(self, card: Card) -> int:\n        \"\"\"Add a card, return the id of card.\"\"\"\n        if not card.summary:\n            raise MissingSummary\n        if card.owner is None:\n            card.owner = \"\"\n        id = self._db.create(card.to_dict())\n        self._db.update(id, {\"id\": id})\n        return id\n\n    def get_card(self, card_id: int) -> Card:\n        \"\"\"Return a card with a matching id.\"\"\"\n        db_item = self._db.read(card_id)\n        if db_item is not None:\n            return Card.from_dict(db_item)\n        else:\n            raise InvalidCardId(card_id)\n\n    def list_cards(self, owner=None, state=None):\n        \"\"\"Return a list of cards.\"\"\"\n        all = self._db.read_all()\n        if (owner is not None) and (state is not None):\n            return [\n                Card.from_dict(t)\n                for t in all\n                if (t[\"owner\"] == owner and t[\"state\"] == state)\n            ]\n        elif owner is not None:\n            return [\n                Card.from_dict(t) for t in all if t[\"owner\"] == owner\n            ]\n        elif state is not None:\n            return [\n                Card.from_dict(t) for t in all if t[\"state\"] == state\n            ]\n        else:\n            return [Card.from_dict(t) for t in all]\nz\n    def count(self) -> int:\n        \"\"\"Return the number of cards in db.\"\"\"\n        return self._db.count()\n\n    def update_card(self, card_id: int, card_mods: Card) -> None:\n        \"\"\"Update a card with modifications.\"\"\"\n        try:\n            self._db.update(card_id, card_mods.to_dict())\n        except KeyError as exc:\n            raise InvalidCardId(card_id) from exc\n\n    def start(self, card_id: int):\n        \"\"\"Set a card state to 'in prog'.\"\"\"\n        self.update_card(card_id, Card(state=\"in prog\"))\n\n    def finish(self, card_id: int):\n        \"\"\"Set a card state to 'done'.\"\"\"\n        self.update_card(card_id, Card(state=\"done\"))\n\n    def delete_card(self, card_id: int) -> None:\n        \"\"\"Remove a card from db with given card_id.\"\"\"\n        try:\n            self._db.delete(card_id)\n        except KeyError as exc:\n            raise InvalidCardId(card_id) from exc\n\n    def delete_all(self) -> None:\n        \"\"\"Remove all cards from db.\"\"\"\n        self._db.delete_all()\n\n    def close(self):\n        self._db.close()\n\n    def path(self):\n        return self._db_path\n"
  },
  {
    "path": "books/pytest/src/cli.py",
    "content": "\"\"\"Command Line Interface (CLI) for cards project.\"\"\"\nimport os\nimport pathlib\nfrom contextlib import contextmanager\nfrom io import StringIO\nfrom typing import List\n\nimport rich\nimport typer\nfrom rich.table import Table\n\nimport src.api as cards\n\napp = typer.Typer(name=\"cards\", add_completion=False)\n\n\n@app.command()\ndef version():\n    \"\"\"Return version of cards application\"\"\"\n    print(cards.__version__)\n\n\n@app.command()\ndef add(\n        summary: List[str], owner: str = typer.Option(None, \"-o\", \"--owner\")\n):\n    \"\"\"Add a card to db.\"\"\"\n    summary = \" \".join(summary) if summary else None\n    with cards_db() as db:\n        db.add_card(cards.Card(summary, owner, state=\"todo\"))\n\n\n@app.command()\ndef delete(card_id: int):\n    \"\"\"Remove card in db with given id.\"\"\"\n    with cards_db() as db:\n        try:\n            db.delete_card(card_id)\n        except cards.InvalidCardId:\n            print(f\"Error: Invalid card id {card_id}\")\n\n\n@app.command(\"list\")\ndef list_cards(\n        owner: str = typer.Option(None, \"-o\", \"--owner\"),\n        state: str = typer.Option(None, \"-s\", \"--state\"),\n):\n    \"\"\"\n    List cards in db.\n    \"\"\"\n    with cards_db() as db:\n        the_cards = db.list_cards(owner=owner, state=state)\n        table = Table(box=rich.box.SIMPLE)\n        table.add_column(\"ID\")\n        table.add_column(\"state\")\n        table.add_column(\"owner\")\n        table.add_column(\"summary\")\n        for t in the_cards:\n            owner = \"\" if t.owner is None else t.owner\n            table.add_row(str(t.id), t.state, owner, t.summary)\n        out = StringIO()\n        rich.print(table, file=out)\n        print(out.getvalue())\n\n\n@app.command()\ndef update(\n        card_id: int,\n        owner: str = typer.Option(None, \"-o\", \"--owner\"),\n        summary: List[str] = typer.Option(None, \"-s\", \"--summary\"),\n):\n    \"\"\"Modify a card in db with given id with new info.\"\"\"\n    summary = \" \".join(summary) if summary else None\n    with cards_db() as db:\n        try:\n            db.update_card(\n                card_id, cards.Card(summary, owner, state=None)\n            )\n        except cards.InvalidCardId:\n            print(f\"Error: Invalid card id {card_id}\")\n\n\n@app.command()\ndef start(card_id: int):\n    \"\"\"Set a card state to 'in prog'.\"\"\"\n    with cards_db() as db:\n        try:\n            db.start(card_id)\n        except cards.InvalidCardId:\n            print(f\"Error: Invalid card id {card_id}\")\n\n\n@app.command()\ndef finish(card_id: int):\n    \"\"\"Set a card state to 'done'.\"\"\"\n    with cards_db() as db:\n        try:\n            db.finish(card_id)\n        except cards.InvalidCardId:\n            print(f\"Error: Invalid card id {card_id}\")\n\n\n@app.command()\ndef config():\n    \"\"\"List the path to the Cards db.\"\"\"\n    with cards_dbz() as db:\n        print(db.path())\n\n\n@app.command()\ndef count():\n    \"\"\"Return number of cards in db.\"\"\"\n    with cards_db() as db:\n        print(db.count())\n\n\n@app.callback(invoke_without_command=True)\ndef main(ctx: typer.Context):\n    \"\"\"\n    Cards is a small command line task tracking application.\n    \"\"\"\n    if ctx.invoked_subcommand is None:\n        list_cards(owner=None, state=None)\n\n\ndef get_path():\n    db_path_env = os.getenv(\"CARDS_DB_DIR\", \"\")\n    if db_path_env:\n        db_path = pathlib.Path(db_path_env)\n    else:\n        db_path = 
pathlib.Path.home() / \"cards_db\"\n    return db_path\n\n\n@contextmanager\ndef cards_db():\n    db_path = get_path()\n    db = cards.CardsDB(db_path)\n    yield db\n    db.close()\n"
  },
  {
    "path": "books/pytest/src/db.py",
    "content": "\"\"\"\nDB for the cards project\n\"\"\"\nimport tinydb\n\n\nclass DB:\n    def __init__(self, db_path, db_file_prefix):\n        self._db = tinydb.TinyDB(\n            db_path / f\"{db_file_prefix}.json\", create_dirs=True\n        )\n\n    def create(self, item: dict) -> int:\n        id = self._db.insert(item)\n        return id\n\n    def read(self, id: int):\n        item = self._db.get(doc_id=id)\n        return item\n\n    def read_all(self):\n        return self._db\n\n    def update(self, id: int, mods) -> None:\n        changes = {k: v for k, v in mods.items() if v is not None}\n        self._db.update(changes, doc_ids=[id])\n\n    def delete(self, id: int) -> None:\n        self._db.remove(doc_ids=[id])\n\n    def delete_all(self) -> None:\n        self._db.truncate()\n\n    def count(self) -> int:\n        return len(self._db)\n\n    def close(self):\n        self._db.close()\n"
  },
  {
    "path": "books/pytest/tests/ch_02/test_card.py",
    "content": "import pytest\n\nfrom src import Card\n\n\ndef test_field_access():\n    c = Card(\"something\", \"brian\", \"todo\", 123)\n    assert (c.summary, c.owner, c.state, c.id) == (\"something\", \"brian\", \"todo\", 123)\n\n\ndef test_defaults():\n    c = Card()\n    assert (c.summary, c.owner, c.state, c.id) == (None, None, \"todo\", None)\n\n\ndef test_equality():\n    assert Card(\"something\", \"brian\", \"todo\", 123) == Card(\"something\", \"brian\", \"todo\", 123)\n\n\ndef test_equality_with_different_ids():\n    assert Card(\"something\", \"brian\", \"todo\", 123) == Card(\"something\", \"brian\", \"todo\", 321)\n\n\ndef test_inequality():\n    assert Card(\"something\", \"brian\", \"todo\", 123) != Card(\"completely different\", \"okken\", \"todo\", 123)\n\n\ndef test_to_dict():\n    assert Card.from_dict({\n        \"summary\": \"something\",\n        \"owner\": \"brian\",\n        \"state\": \"todo\",\n        \"id\": 123\n    }) == Card(\"something\", \"brian\", \"todo\", 123)\n\n\ndef test_from_dict():\n    assert Card(\"something\", \"brian\", \"todo\", 123).to_dict() == {\n        \"summary\": \"something\",\n        \"owner\": \"brian\",\n        \"state\": \"todo\",\n        \"id\": 123\n    }\n"
  },
  {
    "path": "books/pytest/tests/ch_02/test_classes.py",
    "content": "from src import Card\n\n\nclass TestEquality:\n    def test_equality(self):\n        assert Card(\"something\", \"brian\", \"todo\", 123) == Card(\"something\", \"brian\", \"todo\", 123)\n\n    def test_equality_with_different_ids(self):\n        assert Card(\"something\", \"brian\", \"todo\", 123) == Card(\"something\", \"brian\", \"todo\", 321)\n\n    def test_inequality(self):\n        assert Card(\"something\", \"brian\", \"todo\", 123) != Card(\"completely different\", \"okken\", \"todo\", 123)\n"
  },
  {
    "path": "books/pytest/tests/ch_02/test_exceptions.py",
    "content": "import pytest\nfrom src import CardsDB\n\n\ndef test_no_path_raises():\n    with pytest.raises(TypeError):\n        CardsDB()\n\n\ndef test_raises_with_info():\n    with pytest.raises(TypeError, match=\"missing 1 .* positional argument\"):\n        CardsDB()\n"
  },
  {
    "path": "books/pytest/tests/ch_02/test_helper.py",
    "content": "import pytest\n\nfrom src import Card\n\n\ndef assert_identical(c1: Card, c2: Card):\n    # Do not include 'assert_identical' in traceback:\n    __tracebackhide__ = True\n\n    assert c1 == c2\n    if c1.id != c2.id:\n        pytest.fail(f\"id's don't match. {c1.id} != {c2.id}\")\n\n\ndef test_identical():\n    assert_identical(Card(\"foo\", id=123), Card(\"foo\", id=123))\n\n\n@pytest.mark.skip()\ndef test_identical_fail():\n    assert_identical(Card(\"foo\", id=123), Card(\"foo\", id=321))\n"
  },
  {
    "path": "books/pytest/tests/ch_03/conftest.py",
    "content": "from pathlib import Path\nfrom tempfile import TemporaryDirectory\n\nimport pytest\n\nfrom src import (\n    Card,\n    CardsDB,\n)\n\n\n@pytest.fixture(scope=\"session\")\ndef db():\n    with TemporaryDirectory() as db_dir:\n        db_path = Path(db_dir)\n        _db = CardsDB(db_path)\n        yield _db\n        _db.close()\n\n\n@pytest.fixture(scope=\"function\")\ndef cards_db(db):\n    db.delete_all()\n    return db\n\n\n@pytest.fixture(scope=\"session\")\ndef some_cards():\n    return [\n        Card(\"write book\", \"brian\", \"done\"),\n        Card(\"edit book\", \"katie\", \"done\"),\n        Card(\"write 2nd edition\", \"brian\", \"todo\"),\n        Card(\"edit 2nd edition\", \"katie\", \"todo\"),\n    ]\n\n\n"
  },
  {
    "path": "books/pytest/tests/ch_03/test_autouse.py",
    "content": "from time import (\n    localtime,\n    sleep,\n    strftime,\n    time,\n)\n\nimport pytest\n\n\n@pytest.fixture(scope=\"function\")\ndef non_empty_db(cards_db, some_cards):\n    for c in some_cards:\n        cards_db.add_card(c)\n    return cards_db\n\n\n@pytest.fixture(autouse=True, scope=\"session\")\ndef footer_session_scope():\n    yield\n    now = time()\n    print(\"---\")\n    print(f\"finished : {strftime('%d %b %X', localtime(now))}\")\n    print(\"--------\")\n\n\n@pytest.fixture(autouse=True)\ndef footer_function_scope():\n    start = time()\n    yield\n    stop = time()\n    print(f\"test duration: {stop - start:0.3}\")\n\n\ndef test_1():\n    sleep(1)\n\n\ndef test_2():\n    sleep(1.23)\n"
  },
  {
    "path": "books/pytest/tests/ch_03/test_count.py",
    "content": "from src import Card\n\n\ndef test_empty(cards_db):\n    assert cards_db.count() == 0\n\n\ndef test_two(cards_db):\n    cards_db.add_card(Card(\"first\"))\n    cards_db.add_card(Card(\"second\"))\n    assert cards_db.count() == 2\n\n\ndef test_three(cards_db):\n    cards_db.add_card(Card(\"first\"))\n    cards_db.add_card(Card(\"second\"))\n    cards_db.add_card(Card(\"three\"))\n    assert cards_db.count() == 3\n"
  },
  {
    "path": "books/pytest/tests/ch_03/test_count_initial.py",
    "content": "from pathlib import Path\nfrom tempfile import TemporaryDirectory\n\nfrom src import CardsDB\n\n\ndef test_empty():\n    with TemporaryDirectory() as db_dir:\n        db_path = Path(db_dir)\n        db = CardsDB(db_path)\n\n        count = db.count()\n        db.close()\n\n        assert count == 0\n"
  },
  {
    "path": "books/pytest/tests/ch_03/test_fixtures.py",
    "content": "import pytest\n\n\n@pytest.fixture()\ndef some_data():\n    return 42\n\n\ndef test_some_data(some_data):\n    assert some_data == 42\n"
  },
  {
    "path": "books/pytest/tests/ch_03/test_rename_fixture.py",
    "content": "import pytest\n\n\n@pytest.fixture(name=\"ultimate_answer\")\ndef ultimate_answer_fixture():\n    return 42\n\n\ndef test_everything(ultimate_answer):\n    assert ultimate_answer == 42\n"
  },
  {
    "path": "books/pytest/tests/ch_03/test_some.py",
    "content": "import pytest\n\n\n@pytest.fixture(scope=\"function\")\ndef non_empty_db(cards_db, some_cards):\n    for c in some_cards:\n        cards_db.add_card(c)\n    return cards_db\n\n\ndef test_add_some(cards_db, some_cards):\n    expected_count = len(some_cards)\n    for c in some_cards:\n        cards_db.add_card(c)\n    assert cards_db.count() == expected_count\n\n\ndef test_non_empty(non_empty_db):\n    assert non_empty_db.count() > 0\n"
  },
  {
    "path": "books/pytest/tests/ch_04/conftest.py",
    "content": "import pytest\nfrom src import CardsDB\n\n\n@pytest.fixture(scope=\"session\")\ndef db(tmp_path_factory):\n    db_path = tmp_path_factory.mktemp(\"cards_db\")\n    _db = CardsDB(db_path)\n    yield _db\n    _db.close()\n"
  },
  {
    "path": "books/pytest/tests/ch_04/test_config.py",
    "content": "import src as cards\nfrom typer.testing import CliRunner\n\n\ndef run_cards(*params):\n    runner = CliRunner()\n    result = runner.invoke(cards.app, params)\n    return result.output.rstrip()\n\n\ndef test_run_cards():\n    assert run_cards(\"version\") == cards.__version__\n\n\ndef test_patch_get_path(monkeypatch, tmp_path):\n    def fake_get_path():\n        return tmp_path\n\n    monkeypatch.setattr(cards.cli, \"get_path\", fake_get_path)\n    assert run_cards(\"config\") == str(tmp_path)\n\n\ndef test_patch_home(monkeypatch, tmp_path):\n    full_cards_dir = tmp_path / \"cards_db\"\n\n    def fake_home():\n        return tmp_path\n\n    monkeypatch.setattr(cards.cli.pathlib.Path, \"home\", fake_home)\n    assert run_cards(\"config\") == str(full_cards_dir)\n\n\ndef test_patch_env_var(monkeypatch, tmp_path):\n    monkeypatch.setenv(\"CARDS_DB_DIR\", str(tmp_path))\n    assert run_cards(\"config\") == str(tmp_path)\n"
  },
  {
    "path": "books/pytest/tests/ch_04/test_tmp.py",
    "content": "def test_tmp_path(tmp_path):\n    file = tmp_path / \"file.txt\"\n    file.write_text(\"Hello\")\n    assert file.read_text() == \"Hello\"\n\n\ndef test_tmp_path_factory(tmp_path_factory):\n    path = tmp_path_factory.mktemp(\"sub\")\n    file = path / \"file.txt\"\n    file.write_text(\"Hello\")\n    assert file.read_text() == \"Hello\"\n"
  },
  {
    "path": "books/pytest/tests/ch_04/test_version.py",
    "content": "from typer.testing import CliRunner\n\nimport src as cards\n\n\ndef test_version(capsys):\n    cards.cli.version()\n    output = capsys.readouterr().out.rstrip()\n    assert output == cards.__version__\n\n\ndef test_version_v2():\n    runner = CliRunner()\n    result = runner.invoke(cards.app, [\"version\"])\n    output = result.output.rstrip()\n    assert output == cards.__version__\n"
  },
  {
    "path": "books/pytest/tests/ch_05/test_parametrize.py",
    "content": "import pytest\n\nfrom src import (\n    Card,\n    CardsDB,\n)\n\n\n@pytest.fixture(scope=\"session\")\ndef db(tmp_path_factory):\n    db_path = tmp_path_factory.mktemp(\"cards_db\")\n    _db = CardsDB(db_path)\n    yield _db\n    _db.close()\n\n\n@pytest.fixture(scope=\"function\")\ndef cards_db(db):\n    db.delete_all()\n    return db\n\n\n@pytest.mark.parametrize(\"initial_state\", [\"done\", \"in prog\", \"todo\"])\ndef test_finish(cards_db, initial_state):\n    c = Card(\"write a book\", state=initial_state)\n    index = cards_db.add_card(c)\n    cards_db.finish(index)\n\n    c = cards_db.get_card(index)\n\n    assert c.state == \"done\"\n\n\n@pytest.fixture(params=[\"done\", \"in prog\", \"todo\"])\ndef start_state(request):\n    return request.param\n\n\ndef test_finish_v2(cards_db, start_state):\n    c = Card(\"write a book\", state=start_state)\n    index = cards_db.add_card(c)\n    cards_db.finish(index)\n\n    c = cards_db.get_card(index)\n\n    assert c.state == \"done\"\n\n\ndef pytest_generate_tests(metafunc):\n    if \"start_state_2\" in metafunc.fixturenames:\n        metafunc.parametrize(\"start_state_2\", [\"done\", \"in prog\", \"todo\"])\n\n\ndef test_finish_v3(cards_db, start_state_2):\n    c = Card(\"write a book\", state=start_state_2)\n    index = cards_db.add_card(c)\n    cards_db.finish(index)\n\n    c = cards_db.get_card(index)\n\n    assert c.state == \"done\""
  },
  {
    "path": "books/pytest/tests/ch_06/pytest.ini",
    "content": "[pytest]\nmarkers =\n    smoke: subset of tests\n    exception: check for expected exceptions\n    custom: run only ch_06/custom\n    num_cards: number of cards to prefill for cards_db fixture\nadopts =\n    --stric-markers\n"
  },
  {
    "path": "books/pytest/tests/ch_06/test_builtin.py",
    "content": "from pathlib import Path\nfrom tempfile import TemporaryDirectory\n\nimport pytest\nfrom packaging.version import parse\n\nfrom src import (\n    Card,\n    CardsDB,\n    api,\n)\n\n\n@pytest.mark.skip(reason=\"card doesn't support comparison yet\")\ndef test_less_than_skip():\n    assert Card(\"a task\") < Card(\"b task\")\n\n\n@pytest.mark.skipif(\n    parse(api.__version__).major < 2,\n    reason=\"Card comparison not supported in 1.x\"\n)\ndef test_less_than_skipif():\n    assert Card(\"a task\") < Card(\"b task\")\n\n\n@pytest.mark.xfail(\n    parse(api.__version__).major < 2,\n    reason=\"Card comparison not supported in 1.x\"\n)\ndef test_less_than_xfail():\n    assert Card(\"a task\") < Card(\"b task\")\n\n\n@pytest.mark.xfail(reason=\"XPASS demo\")\ndef test_xpass():\n    assert Card(\"a task\") == Card(\"a task\")\n\n\n@pytest.mark.xfail(reason=\"strict demo\", strict=True)\n@pytest.mark.skip\ndef test_xpass_strict():\n    assert Card(\"a task\") == Card(\"a task\")\n\n"
  },
  {
    "path": "books/pytest/tests/ch_06/test_custom.py",
    "content": "import pytest\n\nfrom src import (\n    Card,\n    CardsDB,\n    InvalidCardId,\n)\n\npytestmark = [pytest.mark.custom]\n\n@pytest.fixture(scope=\"session\")\ndef db(tmp_path_factory):\n    db_path = tmp_path_factory.mktemp(\"cards_db\")\n    _db = CardsDB(db_path)\n    yield _db\n    _db.close()\n\n\n@pytest.fixture(scope=\"function\")\ndef cards_db(db):\n    db.delete_all()\n    return db\n\n\n@pytest.mark.smoke\ndef test_start(cards_db):\n    i = cards_db.add_card(Card(\"foo\", state=\"todo\"))\n    cards_db.start(i)\n    c = cards_db.get_card(i)\n    assert c.state == \"in prog\"\n\n\n@pytest.mark.exception\ndef test_start_non_existent(cards_db):\n    with pytest.raises(InvalidCardId):\n        cards_db.start(123)\n"
  },
  {
    "path": "books/pytest/tests/ch_06/text_combination.py",
    "content": "import pytest\nfrom src import (\n    Card,\n    CardsDB,\n)\n\n\n@pytest.fixture(scope=\"session\")\ndef db(tmp_path_factory):\n    db_path = tmp_path_factory.mktemp(\"cards_db\")\n    _db = CardsDB(db_path)\n    yield _db\n    _db.close()\n\n\n@pytest.fixture(scope=\"function\")\ndef cards_db(db, request, faker):\n    db.delete_all()\n\n    faker.seed_instance(101)\n    m = request.node.get_closest_marker(\"num_cards\")\n    if m and len(m.args) > 0:\n        num_cards = m.args[0]\n        for _ in range(num_cards):\n            db.add_card(Card(summary=faker.sentence(), owner=faker.first_name()))\n    return db\n\n\n@pytest.mark.num_cards\ndef test_zero(cards_db):\n    assert cards_db.count() == 0\n\n\n@pytest.mark.num_cards(3)\ndef test_three(cards_db):\n    assert cards_db.count() == 3\n"
  },
  {
    "path": "books/pytest/tests/ch_12/hello.py",
    "content": "def main():\n    print(\"Hello world\")\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "books/pytest/tests/ch_12/test_hello.py",
    "content": "from tests.ch_12 import hello\n\n\ndef test_hello(capsys):\n    hello.main()\n    output = capsys.readouterr().out\n    assert output == \"Hello world\\n\"\n"
  },
  {
    "path": "books/pytest/tests/ch_15/conftest.py",
    "content": "import pytest\n\n\ndef pytest_configure(config):\n    config.addinivalue_line(\"markers\", \"slow: mark test as slow to run\")\n\n\ndef pytest_addoption(parser):\n    parser.addoption(\"--slow\", action=\"store_true\", help=\"include tests marked slow\")\n\n\ndef pytest_collection_modifyitems(config, items):\n    if not config.getoption(\"--slow\"):\n        skip_slow = pytest.mark.skip(reason=\"need --slow option to run\")\n        for item in items:\n            if item.get_closest_marker(\"slow\"):\n                item.add_marker(skip_slow)\n"
  },
  {
    "path": "books/pytest/tests/ch_15/pytest.ini",
    "content": "[pytest]\nmarkers = slow: mark test as slow to run\n"
  },
  {
    "path": "books/pytest/tests/ch_15/test_slow.py",
    "content": "import pytest\n\n\ndef test_normal():\n    pass\n\n@pytest.mark.slow\ndef test_slow():\n    pass"
  },
  {
    "path": "books/python-architecture-patterns/Dockerfile",
    "content": "FROM python:3.10.2\n\nWORKDIR /src\n\nENV PYTHONPATH \"${PYTHONPATH}:/src\"\n\nCOPY requirements.txt .\nCOPY setup.cfg .\n\nRUN pip install -r requirements.txt\n\nCOPY src/ src/\nCOPY tests/ tests/\n"
  },
  {
    "path": "books/python-architecture-patterns/Makefile",
    "content": "test-flake8:\n\tdocker-compose run --rm api flake8 .\n\ntest-mypy:\n\tdocker-compose run --rm api mypy .\n\ntest-pytest:\n\tdocker-compose run --rm api pytest .\n"
  },
  {
    "path": "books/python-architecture-patterns/docker-compose.yml",
    "content": "version: \"3.9\"\nservices:\n\n  redis_pubsub:\n    build:\n      context: .\n      dockerfile: Dockerfile\n    image: allocation-image\n    depends_on:\n      - postgres\n      - redis\n      - mailhog\n    environment:\n      - DB_HOST=postgres\n      - DB_PASSWORD=abc123\n      - REDIS_HOST=redis\n      - EMAIL_HOST=mailhog\n      - PYTHONDONTWRITEBYTECODE=1\n    volumes:\n      - ./:/src\n    entrypoint:\n      - python\n      - src/redis_consumer.py\n\n  api:\n    image: allocation-image\n    build:\n      context: .\n      dockerfile: Dockerfile\n    depends_on:\n      - redis_pubsub\n    volumes:\n      - ./:/src\n    environment:\n      - DB_HOST=postgres\n      - DB_PASSWORD=abc123\n      - API_HOST=api\n      - REDIS_HOST=redis\n      - EMAIL_HOST=mailhog\n      - PYTHONUNBUFFERED=1\n      - PYTHONDONTWRITEBYTECODE=1\n    command: uvicorn src.app:api --host 0.0.0.0 --port 80 --reload\n    ports:\n      - \"5005:80\"\n\n  postgres:\n    image: postgres:14.2\n    environment:\n      - POSTGRES_USER=allocation\n      - POSTGRES_PASSWORD=abc123\n    ports:\n      - \"54321:5432\"\n\n  redis:\n    image: redis:alpine\n    ports:\n      - \"63791:6379\"\n\n  mailhog:\n    image: mailhog/mailhog\n    ports:\n      - \"11025:1025\"\n      - \"18025:8025\"\n"
  },
  {
    "path": "books/python-architecture-patterns/notes.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Architecture Patterns with Python: Enabling Test-Driven Development, Domain-Driven Design, and Event-Driven Microservices\n\nBook by Harry Percival and Bob Gregory\n\nCode here: [click](.)\n\n- [Introduction](#introduction)\n- [Chapter 1: Domain Modeling](#chapter-1-domain-modeling)\n- [Chapter 2: Repository Pattern](#chapter-2-repository-pattern)\n- [Chapter 3: On Coupling and Abstractions](#chapter-3-on-coupling-and-abstractions)\n- [Chapter 4: FlaskAPI and Service Layer](#chapter-4-flaskapi-and-service-layer)\n- [Chapter 5: TDD in High Gear and Low Gear](#chapter-5-tdd-in-high-gear-and-low-gear)\n- [Chapter 6: Unit of Work Pattern](#chapter-6-unit-of-work-pattern)\n- [Chapter 7: Aggregates and Consistency Boundaries](#chapter-7-aggregates-and-consistency-boundaries)\n- [Chapter 8: Events and the Message Bus](#chapter-8-events-and-the-message-bus)\n- [Chapter 9: Going to Town the Message Bus](#chapter-9-going-to-town-the-message-bus)\n- [Chapter 10: Commands and Command Handler](#chapter-10-commands-and-command-handler)\n- [Chapter 11: Event-Driven Architecture: Using Events to Integrate Microservices](#chapter-11-event-driven-architecture-using-events-to-integrate-microservices)\n- [Chapter 12: Command-Query Responsibility Segregation (CQRS)](#chapter-12-command-query-responsibility-segregation-cqrs)\n- [Chapter 13: Dependency Injection (and Bootstrapping)](#chapter-13-dependency-injection-and-bootstrapping)\n- [Epilogue](#epilogue)\n- [Appendix](#appendix)\n\n## Introduction\n\nSoftware systems tend toward chaos. When we first start building a new system, we have grand ideas that our code will be\nclean and well-ordered, but iver time we find that it gathers cruft and edge cases and ends up a confusing morass of\nmanager classes and util modules.\n\nFortunately, the techniques to avoid creating a big ball of mud aren't complex.\n\nEncapsulation covers two closely related ideas: simplifying behavior and hiding data. We encapsulate behavior by\nidentifying a task that needs to be done in our code and giving that task a well-defined object or function. We call\nthat object ro function an abstraction.\n\nEncapsulating behavior by using abstractions is a powerful tool for making code more expressive, more testable, and\neasier to maintain.\n\nEncapsulation and abstraction help us by hiding details and protecting the consistency of our data, but wee= need to pay\nattention to the interactions between our objects and functions. When one function, module or object uses another, we\nsay that the one depends on the other. Those dependencies form a kind of network or graph. For example: Presentation\nLayer -> Business Logic -> Database Layer.\n\nLayered architecture is the most common pattern for building business software.\n\nThe Dependency Inversion Principle:\n\n1. High-level modules should not depend on low-level modules. Both should depend on abstractions.\n2. Abstractions should not depend on details. Instead, details should depend on abstractions.\n\nHigh-level modules are the code that your organization really cares about. The high-level modules of a software system\nare the functions, classes, and packages that deal with our real-world concepts. By contract, low-level modules are the\ncode that your organization doesn't care about. 
\n## Chapter 1: Domain Modeling\n\nThe _domain_ is a fancy word for saying _the problem you are trying to solve_. A _model_ is a map of a process or\nphenomenon that captures a useful property.\n\nIn a nutshell, DDD says that the most important thing about software is that it provides a useful model of a problem. If\nwe get that model right, our software delivers value and makes new things possible.\n\nWhen we hear our business stakeholders using unfamiliar words, or using terms in a specific way, we should listen to\nunderstand the deeper meaning and encode their hard-won experience into our software.\n\nChoose memorable identifiers for our objects so that the examples are easier to talk about.\n\nWhenever we have a business concept that has data but has no identity, we often choose to represent it using the Value\nObject pattern. A value object is any domain object that is uniquely identified by the data it holds; we usually make\nthem immutable. Named tuples and frozen data classes are a great tool for this.\n\nEntities, unlike values, have identity equality. We can change their values, and they are still recognizably the same\nthing. Batches, in our example, are entities. We can allocate lines to a batch, or change the date that we expect it to\narrive, and it will still be the same entity.\n\nWe usually make this explicit in code by implementing equality operators on entities.\n\nFor value objects, the hash should be based on all attributes, and we should ensure that the objects are immutable. For\nentities, the simplest option is to say that the hash is None, meaning that the object is not hashable and cannot, for\nexample, be used in a set. If for some reason you decide to use set or dict operations with entities, the hash should be\nbased on the attributes that define the entity's unique identity over time.\n\nExceptions can express domain concepts too.\n
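\nCondensed from the book's example domain - `OrderLine` as a value object, `Batch` as an entity identified by its reference:\n\n```python\nfrom dataclasses import dataclass\nfrom datetime import date\nfrom typing import Optional\n\n\n@dataclass(frozen=True)  # value object: immutable, defined entirely by its data\nclass OrderLine:\n    order_id: str\n    sku: str\n    qty: int\n\n\nclass Batch:  # entity: attributes may change, identity (reference) does not\n    def __init__(self, reference: str, sku: str, qty: int, eta: Optional[date] = None):\n        self.reference = reference\n        self.sku = sku\n        self.available_quantity = qty\n        self.eta = eta\n\n    def __eq__(self, other):\n        if not isinstance(other, Batch):\n            return False\n        return other.reference == self.reference\n\n    def __hash__(self):\n        return hash(self.reference)\n```\n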
\n## Chapter 2: Repository Pattern\n\nRepository Pattern - a simplifying abstraction over data storage, allowing us to decouple our model layer from the data\nlayer. This simplifying abstraction makes our system more testable by hiding the complexities of the database. It hides\nthe boring details of data access by pretending that all of our data is in memory. This pattern is very common in DDD.\n\nLayered architecture is a common approach to structuring a system that has a UI, some logic, and a database.\n\nOnion architecture - model being inside, and dependencies flowing inward to it.\n\nAn ORM gives us persistence ignorance - the fancy model doesn't need to know anything about how data is loaded or persisted.\nUsing an ORM is already an example of the DIP. Instead of depending on hardcoded SQL, we depend on an abstraction - the\nORM.\n\nThe simplest repository has just two methods:\n\n- add - to put a new item in the repository\n- get - to return a previously added item.\n\nOne of the biggest benefits of the Repository pattern is the possibility to build a fake repository.\n\n> Building fakes for your abstractions is an excellent way to get design feedback: if it's hard to fake, the abstraction\n> is probably too complicated.\n\nA simple CRUD wrapper around a database doesn't need a domain model or a repository.\n\nRepository Pattern Recap:\n\n- Apply dependency inversion to your ORM - Domain model should be free of infrastructure concerns, so your ORM should\n  import your model, and not the other way around.\n- The Repository pattern is a simple abstraction around permanent storage - The repository gives you the illusion of a\n  collection of in-memory objects. It makes it easy to create a FakeRepository for testing and to swap fundamental\n  details of your infrastructure without disrupting your core application.\n
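\nA fake repository for tests can be little more than a wrapper around a dict - a sketch matching the `add`/`get` interface above (the real implementation lives in `src/adapters/repository.py`):\n\n```python\nclass FakeRepository:\n    \"\"\"In-memory stand-in for the real repository, for use in tests only.\"\"\"\n\n    def __init__(self, products=()):\n        self._products = {p.sku: p for p in products}\n\n    def add(self, product):\n        self._products[product.sku] = product\n\n    def get(self, sku):\n        return self._products.get(sku)\n```\n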
\n## Chapter 3: On Coupling and Abstractions\n\nWhen we are unable to change component A for fear of breaking component B, we say that the components have become\ncoupled. Globally, coupling increases the risk and the cost of changing our code, sometimes to the point where we feel\nunable to make any changes at all.\n\nWe can reduce the degree of coupling within a system by abstracting away the details.\n\nAccording to the authors it is better to use fake resources instead of mocks:\n\n- Mocks are used to verify how something gets used; they have methods like `assert_called_once_with`. They are\n  associated with London-school TDD.\n- Fakes are working implementations of the thing they are replacing, but they are designed for use only in tests. They\n  wouldn't work in real life. You can use them to make assertions about the end state of a system rather than the\n  behaviours along the way, so they are associated with classic-style TDD.\n\nTDD is a design practice first and a testing practice second. The tests act as a record of our design choices and serve\nto explain the system to us when we return to the code after a long absence.\n\nTests that use too many mocks get overwhelmed with setup code that hides the story we care about.\n\nLinks:\n\n- [YOW! Conference 2017 - Steve Freeman - Test Driven Development: That’s Not What We Meant](https://www.youtube.com/watch?v=B48Exq57Zg8)\n- [Edwin Jung - Mocking and Patching Pitfalls](https://www.youtube.com/watch?v=Ldlz4V-UCFw)\n\n## Chapter 4: Flask API and Service Layer\n\nService Layer - extract logic from the endpoint, because it might be doing too much - validating input, handling errors,\ncommitting.\n\nOur high-level module, the service layer, depends on the repository abstraction. And the details of the implementation\nfor our specific choice of persistent storage also depend on the same abstraction.\n\nThe responsibilities of the ~~Flask~~ FastAPI app are just standard web stuff - per-request session management, parsing\ninformation out of POST parameters, response status codes and JSON. All the orchestration logic is in the use\ncase/service layer, and the domain logic stays in the domain.\n\nApplication service - its job is to handle requests from the outside world and to orchestrate an operation. It drives the\napplication by following a bunch of simple steps:\n\n- Get some data from the database\n- Update the domain model\n- Persist any changes\n\nThis is the kind of boring work that has to happen for every operation in your system, and keeping it separate from\nbusiness logic helps to keep things tidy.\n\nDomain service - this is the name for a piece of logic that belongs in the domain model but doesn't sit naturally inside\na stateful entity or value object.\n
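\nThe shape of such a service-layer function, sketched with names from the book's allocation example (`uow` is the Unit of Work from Chapter 6, `OrderLine` the value object from Chapter 1):\n\n```python\nclass InvalidSku(Exception):\n    pass\n\n\ndef allocate(order_id: str, sku: str, qty: int, uow) -> str:\n    \"\"\"Orchestration only: fetch, delegate to the domain model, persist.\"\"\"\n    line = OrderLine(order_id, sku, qty)\n    with uow:\n        product = uow.products.get(sku=line.sku)\n        if product is None:\n            raise InvalidSku(f\"Invalid sku {line.sku}\")\n        batch_ref = product.allocate(line)\n        uow.commit()\n    return batch_ref\n```\n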
\n## Chapter 5: TDD in High Gear and Low Gear\n\nOnce you implement domain modeling and the service layer, you really can get to a stage where unit tests\noutnumber integration and end-to-end tests by an order of magnitude.\n\nTests are supposed to help us change our system fearlessly, but often we see teams writing too many tests against their\ndomain model. This causes problems when they come to change their codebase and find that they need to update tens or\neven hundreds of unit tests.\n\nThe service layer forms an API for our system that we can drive in multiple ways. Testing against this API reduces the\namount of code that we need to change when we refactor our domain model. If we restrict ourselves to testing only against\nthe service layer, we will not have any tests that directly interact with \"private\" methods or attributes on our model\nobjects, which leaves us freer to refactor them.\n\nMost of the time, when we are adding a new feature or fixing a bug, we don't need to make extensive changes to the\ndomain model. In these cases, we prefer to write tests against the service layer because of the lower coupling and higher coverage.\n\nWhen starting a new project or when hitting a particularly gnarly problem, we will drop back down to writing tests\nagainst the domain model, so we get better feedback and executable documentation of our intent.\n\nMetaphor of shifting gears - when starting a journey, the bicycle needs to be in a low gear, so it can overcome inertia.\nOnce we are off and running, we can go faster and more efficiently by changing into a high gear. But if we suddenly\nencounter a steep hill or are forced to slow down by a hazard, we again drop to a low gear until we can pick up speed\nagain.\n\nRules of Thumb for Different Types of Test:\n\n1. Aim for one end-to-end test per feature - the objective is to demonstrate that the feature works, and that all the\n   moving parts are glued together.\n2. Write the bulk of your tests against the service layer - these tests offer a good trade-off between\n   coverage, runtime, and efficiency.\n3. Maintain a small core of tests written against your domain model - these tests have highly focused coverage and are\n   more brittle, but they have the highest feedback. Don't be afraid to delete these tests if the functionality is later\n   covered by tests at the service layer.\n4. Error handling counts as a feature - ideally, your application will be structured such that all errors bubble up to\n   your entrypoints and are handled in the same way. This means you need to test only the happy path for each feature, and to\n   reserve one end-to-end test for all unhappy paths.\n\nExpress your service layer in terms of primitives rather than domain objects.\n\n## Chapter 6: Unit of Work Pattern\n\nIf the Repository pattern is our abstraction over persistent storage, the Unit of Work pattern is our abstraction over\nthe idea of atomic operations. It will allow us to decouple our service layer from the data layer.\n\nUnit of Work acts as a single entrypoint to our persistent storage, and it keeps track of what objects were loaded and\nof the latest state.\n\nUnit of Work and Repository classes are collaborators.\n\n> Don't mock what you don't own\n\nA rule of thumb that forces us to build these simple abstractions over messy subsystems. This encourages us to think\ncarefully about our designs.\n\nIt is better to require an explicit commit, so we can choose when to flush state. The default behaviour is to not change\nanything; this makes software safe by default. There is one code path that leads to changes in the system: total success\nand an explicit commit. Any other code path, any exception, any early exit from the UoW's scope leads to a safe state.\n\nYou should always feel free to throw away tests if you think they are not going to add value longer term.\n\nSQLAlchemy already uses a Unit of Work in the shape of the Session object (it tracks changes to the entities, and when the session\nis flushed, all your changes are persisted together). Then, why bother? The Session API is very rich; a Unit of Work can\nsimplify the session to its essential core: start, commit or throw away. Besides, our Unit of Work can access the Repository\nobject.\n\nUnit of Work Pattern Recap:\n\n- _The Unit of Work Pattern is an abstraction around data integrity_ - It helps to enforce the consistency of our domain\n  model, and improves performance, by letting us perform a single flush operation at the end of an operation.\n- _It works closely with the Repository and Service Layer patterns_ - The Unit of Work pattern completes our\n  abstractions over data access by representing atomic updates. Each of our service-layer use cases runs in a single\n  unit of work that succeeds or fails as a block.\n- _This is a lovely case for a context manager_ - Context managers are an idiomatic way of defining scope in Python. We\n  can use a context manager to automatically roll back our work at the end of a request, which means the system is safe\n  by default.\n- _SQLAlchemy already implements this pattern_ - We introduce an even simpler abstraction over the SQLAlchemy Session\n  object in order to \"narrow\" the interface between the ORM and our code. This helps keep us loosely coupled.\n
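\nA sketch of the context-manager Unit of Work the recap describes, close to the book's `SqlAlchemyUnitOfWork` (`Repository` is the Chapter 2 repository; the rollback on exit is a no-op if `commit()` already ran):\n\n```python\nclass UnitOfWork:\n    def __init__(self, session_factory):\n        self.session_factory = session_factory\n\n    def __enter__(self):\n        self.session = self.session_factory()\n        self.products = Repository(self.session)  # repositories are collaborators\n        return self\n\n    def __exit__(self, exc_type, exc_value, traceback):\n        self.rollback()  # safe by default: anything uncommitted is thrown away\n        self.session.close()\n\n    def commit(self):\n        self.session.commit()\n\n    def rollback(self):\n        self.session.rollback()\n```\n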
\n## Chapter 7: Aggregates and Consistency Boundaries\n\nA constraint is a rule that restricts the possible states our model can get into, while an invariant is defined a\nlittle more precisely as a condition that is always true.\n\nThe Aggregate pattern - a design pattern from the DDD community that helps us to solve concurrency issues. An aggregate\nis just a domain object that contains other domain objects and lets us treat the whole collection as a single unit.\n\n> An aggregate is a cluster of associated objects that we treat as a unit for the purpose of data changes.\n\nWe have to choose the right granularity for our aggregate. Candidates: Shipment, Cart, Stock, Product.\n\nBounded contexts were invented as a reaction against attempts to capture entire businesses into a single model.\nAttributes needed in one context are irrelevant in another. Concepts with the same name can have entirely different\nmeanings in different contexts. It is better to have several models, draw boundaries around each context, and handle the\ntranslation between different contexts explicitly.\n\nThis concept translates very well to the world of microservices, where each microservice is free to have its own concept\nof \"customer\" and its own rules for translating that to and from other microservices it integrates with.\n\nAggregates should be the only way to get into our model.\n\nThe Aggregate pattern is designed to help manage some technical constraints around consistency and performance.\n\nVersion numbers are just one way to implement optimistic locking. Optimistic - our default assumption is that\neverything will be fine when two users want to make changes to the database. We think it will be unlikely that they will\nconflict with each other. We let them go ahead and just make sure we have a way to notice if there is a problem.\n\nPessimistic - works under the assumption that two users are going to cause conflicts, and we want to prevent conflicts in all\ncases, so we lock everything just to be safe. In our example, that would mean locking the whole `batches` table or using\n`SELECT FOR UPDATE`. With pessimistic locking, you don't need to think about handling failures because the database will\nprevent them.\n\nThe usual way to handle a failure is to retry the operation from the beginning.\n\nAggregates and Consistency Boundaries Recap:\n\n- _Aggregates are your entrypoints into the domain model_ - By restricting the number of ways that things can be\n  changed, we make the system easier to reason about.\n- _Aggregates are in charge of consistency boundaries_ - An aggregate's job is to be able to manage our business rules\n  about invariants as they apply to a group of related objects. It is the aggregate's job to check that the objects\n  within its remit are consistent with each other and with our rules, and to reject changes that would break the rules.\n- _Aggregates and concurrency issues go together_ - When thinking about implementing these consistency checks, we end up\n  thinking about transactions and locks. Choosing the right aggregate is about performance as well as conceptual\n  organization of your domain.\n
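\nA sketch of optimistic locking with a version number, following the book's `Product` aggregate: every successful change bumps the version, so a concurrent writer's `UPDATE ... WHERE version_number = :old` matches zero rows and the conflict surfaces at commit time:\n\n```python\nclass Product:  # aggregate root and consistency boundary for one sku\n    def __init__(self, sku, batches, version_number=0):\n        self.sku = sku\n        self.batches = batches\n        self.version_number = version_number\n\n    def allocate(self, line):\n        batch = next(b for b in sorted(self.batches) if b.can_allocate(line))\n        batch.allocate(line)\n        self.version_number += 1  # concurrent transactions now conflict here\n        return batch.reference\n```\n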
It doesn't \"know\" anything bout the meaning of events; it is just a piece of dumb infrastructure for\n  getting messages around the system.\n- _Option 1: Service layer raises events and passes them to message bus_ - The simplest way to start using events is\n  your system is to raise them from handlers by calling `bus.handle(event)` after you commit your unit of work.\n- _Option 2: Domain model raises events, service layer passes them to message bus_ - The logic about when to raise an\n  event really should live with the model, so we can improve our system's design and testability by raising events from\n  the domain model. It is easy for our handlers to collect events off the model objects after commit and pass them to\n  the bus.\n- _Option 3: UoW collects events from aggregates and passes the to message bus_ - Adding `bus.handle(aggregate.events)`\n  to every handler is annoying, so we can tidy up by making our unit of work responsible for raising events that were\n  raised by loaded objects. This is the most complex design and might rely on ORM magic, but it is clean and easy to use\n  once set up.\n\n## Chapter 9: Going to Town the Message Bus\n\nIf we rethink our API calls as capturing events, the service-layer functions can be event handlers too, and we no longer\nneed to make a distinction between internal and external event handlers.\n\nMultiple database transactions can cause integrity issues. Something could happen that means the first transaction\ncompletes but the second one does not.\n\nEvents are simple dataclasses that define the data structures for inputs and internal messages within our system. This\nis quite powerful from a DDD standpoint, since events often translate very well into business language.\n\nHandlers are the way we react to the events. They can call down to our model or call out to external services. We can\ndefine multiple handlers for a single event if we want to. Handlers can also raise other events. This allows us to be\nvery granular about what a handler does and really stick to the SRP.\n\n## Chapter 10: Commands and Command Handler\n\nCommands are a type of message - instructions sent by one part of a system to another. We usually represent commands\nwith dumb data structures and can handle them in much the same way as events. Commands are sent by one actor to another\nspecific actor with the expectation that a particular thing will happen as a result. When we post a form to an API\nhandler, we are sending a command. We name commands with imperative mood verb phrases like \"allocate stock\" or \"delay\nshipment\".\n\nEvents are broadcast by an actor to all interested listeners. We often use events to spread the knowledge about\nsuccessful commands. We name events with past-tense verb phrases like \"order allocated to stock\" or \"shipment delayed\".\n\nHow to mitigate problems caused by the lost messages? The system might be left in an inconsistent state. In our\nallocation service we have already taken steps to prevent that from happening. We have carefully identified aggregates\nthat act as consistency boundaries, and whe have introduced a UoW that manages the atomic success or failure of an\nupdate to an aggregate.\n\nWhen a user wants to make the system do something, we represent their request as a command. That command should modify a\nsingle aggregate and either succeed or fail in totality. Any other bookkeeping, cleanup and notification we need to do\ncan happen via an event. 
\n## Chapter 9: Going to Town on the Message Bus\n\nIf we rethink our API calls as capturing events, the service-layer functions can be event handlers too, and we no longer\nneed to make a distinction between internal and external event handlers.\n\nMultiple database transactions can cause integrity issues. Something could happen that means the first transaction\ncompletes but the second one does not.\n\nEvents are simple dataclasses that define the data structures for inputs and internal messages within our system. This\nis quite powerful from a DDD standpoint, since events often translate very well into business language.\n\nHandlers are the way we react to events. They can call down to our model or call out to external services. We can\ndefine multiple handlers for a single event if we want to. Handlers can also raise other events. This allows us to be\nvery granular about what a handler does and really stick to the SRP.\n\n## Chapter 10: Commands and Command Handler\n\nCommands are a type of message - instructions sent by one part of a system to another. We usually represent commands\nwith dumb data structures and can handle them in much the same way as events. Commands are sent by one actor to another\nspecific actor with the expectation that a particular thing will happen as a result. When we post a form to an API\nhandler, we are sending a command. We name commands with imperative mood verb phrases like \"allocate stock\" or \"delay\nshipment\".\n\nEvents are broadcast by an actor to all interested listeners. We often use events to spread the knowledge about\nsuccessful commands. We name events with past-tense verb phrases like \"order allocated to stock\" or \"shipment delayed\".\n\nHow do we mitigate problems caused by lost messages? The system might be left in an inconsistent state. In our\nallocation service we have already taken steps to prevent that from happening. We have carefully identified aggregates\nthat act as consistency boundaries, and we have introduced a UoW that manages the atomic success or failure of an\nupdate to an aggregate.\n\nWhen a user wants to make the system do something, we represent their request as a command. That command should modify a\nsingle aggregate and either succeed or fail in totality. Any other bookkeeping, cleanup and notification we need to do\ncan happen via an event. We don't require the event handlers to succeed in order for the command to be successful.\n\nWe raise events about an aggregate after we persist our state to the database. It is OK for events to fail independently\nfrom the commands that raised them.\n\nTenacity is a Python library that implements common patterns for retrying.\n
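\nFor example, an event handler can be wrapped in a bounded retry with exponential backoff (a sketch of tenacity usage, not code from the book):\n\n```python\nfrom tenacity import retry, stop_after_attempt, wait_exponential\n\n\n@retry(stop=stop_after_attempt(3), wait=wait_exponential())\ndef send_notification(event):\n    # Handlers may fail independently of the command that raised the event,\n    # so a bounded retry is often all the error handling they need.\n    ...\n```\n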
\n## Chapter 11: Event-Driven Architecture: Using Events to Integrate Microservices\n\nOften, the first instinct when migrating an existing application to microservices is to split the system into _nouns_.\n\nThe style of architecture where we create a microservice per database table and treat our HTTP APIs as CRUD interfaces to\nanemic models is the most common initial way for people to approach service-oriented design. This works fine for\nsystems that are very simple, but it can quickly degrade into a distributed ball of mud.\n\nWhen two things have to be changed together, we say that they are coupled. We can never completely avoid coupling,\nexcept by having our software not talk to any other software. What we want is to avoid inappropriate coupling.\n\nHow do we get appropriate coupling? We should think in terms of verbs, not nouns. Our domain model is about modeling a\nbusiness process. It is not static data about a thing; it is a model of a verb.\n\nInstead of thinking about a system for orders and a system for batches, we think about a system for allocating and\nordering.\n\nMicroservices should be consistency boundaries. That means we don't need to rely on synchronous calls. Each service\naccepts commands from the outside world and raises events to record the result. Other services can listen to those\nevents to trigger the next steps in the workflow.\n\nThings can fail independently, so it is easier to handle degraded behavior - we can still take orders if the allocation\nservice is having a bad day. Secondly, we are reducing the strength of coupling between our systems. If we need to\nchange the order of operations or to introduce new steps in the process, we can do that locally.\n\nEvents can come from the outside, but they can also be published externally.\n\n> Event notification is nice because it implies a low level of coupling, and is pretty simple to set up. It can become\n> problematic, however, if there really is a logical flow that runs over various event notifications. It can be hard to\n> see such a flow as it is not explicit in any program text. This can make it hard to debug and modify.\n\n~ Martin Fowler.\n\n## Chapter 12: Command-Query Responsibility Segregation (CQRS)\n\nReads (queries) and writes (commands) are different, so they should be treated differently.\n\nMost users are not going to buy your product; they are just viewers. We can make reads eventually consistent in order to\nmake them perform better.\n\nAll distributed systems are inconsistent. As soon as you have a web server and two customers, you have the potential for\nstale data. No matter what we do, we are always going to find that our software systems are inconsistent with reality,\nand so we will always need business processes to cope with these edge cases. It is OK to trade consistency for performance\non the read side, because stale data is essentially unavoidable.\n\nREADS: Simple read, highly cacheable, can be stale.\n\nWRITES: Complex business logic, uncacheable, must be transactionally consistent.\n\nPost/Redirect/Get Pattern - In this technique, a web endpoint accepts an HTTP POST and responds with a redirect to see\nthe result. For example, we might accept a POST to /batches to create a new batch and redirect the user to /batches/123 to\nsee their newly created batch. This approach fixes the problems that arise when users refresh the results page in their\nbrowser: refreshing can lead to our users double-submitting data and thus buying two sofas when they needed only one. This\ntechnique is a simple example of CQS. In CQS we follow one simple rule - functions should either modify state or answer\nquestions. We can apply the same design by returning 201 Created or 202 Accepted, with a Location header containing the\nURI of our new resources.\n\nAn ORM can expose us to performance problems. The SELECT N+1 problem is a common performance problem with ORMs - when\nretrieving a list of objects, your ORM will often perform an initial query to get all IDs of the objects it needs, and\nthen issue individual queries for each object to retrieve their attributes. This is especially likely if there are any\nforeign-key relationships on your objects.\n\nEven with well-tuned indexes, a relational database uses a lot of CPU to perform joins. The fastest queries will always\nbe `SELECT * FROM table WHERE condition`. More than raw speed, this approach buys us scale. Read-only stores can be\nhorizontally scaled out.\n\nThe read model can be implemented using Redis.\n\nAs the domain model becomes richer and more complex, a simplified read model becomes compelling.\n
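\nOn the read side this can be as blunt as raw SQL against a denormalized view table (a sketch in the spirit of the book's `views.py`; `allocations_view` is a hypothetical table name):\n\n```python\nfrom sqlalchemy import text\n\n\ndef allocations(order_id, uow):\n    with uow:\n        rows = uow.session.execute(\n            text(\"SELECT sku, batch_ref FROM allocations_view WHERE order_id = :order_id\"),\n            dict(order_id=order_id),\n        )\n        return [{\"sku\": sku, \"batch_ref\": batch_ref} for sku, batch_ref in rows]\n```\n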
\n## Chapter 13: Dependency Injection (and Bootstrapping)\n\nMocks tightly couple us to the implementation. By choosing to monkeypatch `email.send_mail`, we are tied to\ndoing `import email`, and if we ever want to do `from email import send_mail`, we will have to change all our mocks.\n\nDeclaring explicit dependencies is, strictly speaking, unnecessary, and using them makes our application code marginally more complex.\nBut in return we get tests that are easier to write and manage.\n\n> Explicit is better than implicit.\n\nPutting all the responsibility for passing dependencies to the right handler onto the message bus feels like a violation\nof the SRP. Instead, we will reach for a pattern called Composition Root (a bootstrap script), and we will do a bit of\n\"manual DI\" (dependency injection without a framework).\n\nSetting up dependency injection is just one of many typical setup activities that you need to do when starting your app.\nPutting this all together into a bootstrap script is often a good idea.\n\nThe bootstrap script is also a good place to provide sensible default configuration for your adapters, and a single place\nto override those adapters with fakes for your tests.\n\n## Epilogue\n\nMaking complex changes to a system is often an easier sell if you link it to feature work. Perhaps you are launching a\nnew product or opening your service to new markets? This is the right time to spend engineering resources on fixing the\nfoundations. With a six-month project to deliver, it is easier to make the argument for three weeks of cleanup work.\n\nThe Strangler Fig pattern involves creating a new system around the edges of an old system, while keeping it running.\nBits of old functionality are gradually intercepted and replaced, until the old system is left doing nothing at all and\ncan be switched off.\n\nFocus on a specific problem and ask yourself how you can put the relevant ideas to use, perhaps in an initially limited\nand imperfect fashion.\n\nReliable messaging is hard: Redis pub/sub is not reliable and should not be used as a general-purpose messaging tool.\n\nWe explicitly choose small, focused transactions that can fail independently.\n\n## Appendix\n\n- Entity - A domain object whose attributes may change but that has a recognizable identity over time.\n- Value object - An immutable domain object whose attributes entirely define it. It is fungible with other identical\n  objects.\n- Aggregate - Cluster of associated objects that we treat as a unit for the purpose of data changes. Defines and\n  enforces a consistency boundary.\n- Event - Represents something that happened.\n- Command - Represents a job the system should perform.\n- Unit of work - Abstraction around data integrity. Each unit of work represents an atomic update. Makes repositories\n  available. Tracks new events on retrieved aggregates.\n- Repository - Abstraction around persistent storage. Each aggregate has its own repository.\n\nDocker: Mounting our source and test code as `volumes` means we don't need to rebuild our containers every time we make\na code change.\n\nPostel's Law (robustness principle):\n\n> Be liberal in what you accept, and conservative in what you emit\n\nTolerant Reader Pattern: Validate as little as possible. Read only the fields you need, and don't overspecify their\ncontents. This will help your system stay robust when other systems change over time. Resist the temptation to share\nmessage definitions between systems: instead make it easy to define the data you depend on.\n\nIf you are in charge of an API that is open to the public on the big bad internet, there might be good reasons to be\nmore conservative about what inputs you allow.\n\nIf validation is needed, do it at the edge of the system in order to avoid polluting the domain model. Bear in mind that\ninvalid data wandering through your system is a time bomb; the deeper it gets, the more damage it can do.\n"
  },
  {
    "path": "books/python-architecture-patterns/requirements.txt",
    "content": "pytest==6.2.5\nmypy==0.931\nflake8==4.0.1\nSQLAlchemy==1.4.31\nfastapi==0.73.0\nsqlmodel==0.0.6\nrequests==2.27.1\npsycopg2==2.9.3\nuvicorn==0.17.4\nredis==4.1.4\ntypes-redis==4.1.17\ntenacity==8.0.1\n"
  },
  {
    "path": "books/python-architecture-patterns/setup.cfg",
    "content": "[tool:pytest]\npython_paths = .\ntestpaths = tests\nnorecursedirs = .*\naddopts = -sl\nfilterwarnings =\n    ignore::DeprecationWarning\n    ignore::PendingDeprecationWarning\n\n[mypy]\npython_version = 3.10\nignore_missing_imports = True\nstrict_optional = False\n\n[mypy-app.cache]\nignore_errors = True\n\n[flake8]\nmax-line-length = 180\nmax-complexity = 10\nformat = pylint\nshow-source = True\nstatistics = True\n"
  },
  {
    "path": "books/python-architecture-patterns/src/__init__.py",
    "content": ""
  },
  {
    "path": "books/python-architecture-patterns/src/adapters/__init__.py",
    "content": ""
  },
  {
    "path": "books/python-architecture-patterns/src/adapters/notifications.py",
    "content": "from abc import (\n    ABC,\n    abstractmethod,\n)\nimport smtplib\n\nfrom src import config\n\nDEFAULT_HOST = config.get_email_host_and_port()[\"host\"]\nDEFAULT_PORT = config.get_email_host_and_port()[\"port\"]\n\n\nclass AbstractNotifications(ABC):\n    @abstractmethod\n    def send(self, destination, message):\n        raise NotImplementedError\n\n\nclass EmailNotifications(AbstractNotifications):\n    def __init__(self, smtp_host=DEFAULT_HOST, port=DEFAULT_PORT):\n        self.server = smtplib.SMTP(smtp_host, port=port)\n        self.server.noop()\n\n    def send(self, destination, message):\n        self.server.sendmail(\n            from_addr=\"allocations@example.com\",\n            to_addrs=[destination],\n            msg=f\"Subject: allocation service notification\\n{message}\",\n        )\n"
  },
  {
    "path": "books/python-architecture-patterns/src/adapters/orm.py",
    "content": "from sqlmodel import (\n    Field,\n    SQLModel,\n)\n\n\nclass AllocationsView(SQLModel, table=True):\n    id: int = Field(primary_key=True)\n    order_id: str\n    sku: str\n    batch_ref: str\n\n\ndef create_db_and_tables(engine):\n    SQLModel.metadata.create_all(engine)\n\n\ndef clean_db_and_tables(engine):\n    SQLModel.metadata.drop_all(engine)\n"
  },
  {
    "path": "books/python-architecture-patterns/src/adapters/redis_publisher.py",
    "content": "from redis.client import Redis\n\nfrom src import config\nfrom src.domain.events import Event\n\nr = Redis(**config.get_redis_host_and_port())\n\n\ndef publish(channel: str, event: Event):\n    r.publish(channel, event.json())\n"
  },
  {
    "path": "books/python-architecture-patterns/src/adapters/repository.py",
    "content": "from typing import (\n    Optional,\n    Protocol,\n    Set,\n)\n\nfrom sqlmodel import (\n    Session,\n    select,\n)\n\nfrom src.domain.model import (\n    Batch,\n    Product,\n)\n\n\nclass AbstractRepository(Protocol):\n    def add(self, product: Product):\n        ...\n\n    def get(self, sku: str) -> Optional[Product]:\n        ...\n\n    def get_by_batch_ref(self, ref: str) -> Optional[Product]:\n        ...\n\n\nclass Repository(AbstractRepository):\n    def __init__(self, session: Session):\n        self.session = session\n\n    def add(self, product: Product):\n        self.session.add(product)\n        self.session.commit()\n\n    def get(self, sku: str) -> Optional[Product]:\n        return self.session.exec(select(Product).where(Product.sku == sku)).first()\n\n    def get_by_batch_ref(self, ref: str) -> Optional[Product]:\n        return self.session.exec(select(Product).join(Batch).where(Batch.reference == ref)).first()\n\n\nclass TrackingRepository(AbstractRepository):\n    seen: Set[Product]\n\n    def __init__(self, repo: AbstractRepository):\n        super().__init__()\n        self.seen = set()\n        self._repo = repo\n\n    def add(self, product: Product):\n        self._repo.add(product)\n        self.seen.add(product)\n\n    def get(self, sku: str) -> Optional[Product]:\n        product = self._repo.get(sku)\n        if product:\n            self.seen.add(product)\n        return product\n\n    def get_by_batch_ref(self, ref: str) -> Optional[Product]:\n        if product := self._repo.get_by_batch_ref(ref):\n            self.seen.add(product)\n        return product\n"
  },
  {
    "path": "books/python-architecture-patterns/src/app.py",
    "content": "from fastapi import (\n    FastAPI,\n    Response,\n    status,\n)\n\nfrom src import views\nfrom src.bootstrap import bootstrap\nfrom src.domain import commands\nfrom src.domain.model import (\n    Batch,\n    OrderLine,\n    OutOfStock,\n)\nfrom src.service_layer.handlers import InvalidSku\n\nbus = bootstrap()\napi = FastAPI()\n\n\n@api.post(\"/allocate\")\nasync def allocate_endpoint(order_line: OrderLine, response: Response):\n    try:\n        bus.handle(commands.Allocate(order_id=order_line.order_id, sku=order_line.sku, qty=order_line.qty))\n    except (OutOfStock, InvalidSku) as e:\n        response.status_code = status.HTTP_400_BAD_REQUEST\n        return {\"message\": str(e)}\n\n    return {\"message\": \"ok\"}\n\n\n@api.post(\"/add_batch\")\nasync def add_batch_endpoint(batch: Batch):\n    bus.handle(commands.CreateBatch(ref=batch.reference, sku=batch.sku, qty=batch.purchased_quantity, eta=batch.eta))\n    return {\"message\": \"ok\"}\n\n\n@api.post(\"/allocate/{order_id}\")\nasync def allocate_view_endpoint(order_id: str, response: Response):\n    if result := views.allocations(order_id, bus.uow):\n        return result\n    response.status_code = status.HTTP_400_BAD_REQUEST\n    return response\n"
  },
  {
    "path": "books/python-architecture-patterns/src/bootstrap.py",
    "content": "import inspect\nfrom typing import Callable\n\nfrom sqlalchemy.engine import Engine\nfrom sqlmodel import create_engine\n\nfrom src import config\nfrom src.adapters import redis_publisher\nfrom src.adapters.notifications import (\n    AbstractNotifications,\n    EmailNotifications,\n)\nfrom src.adapters.orm import create_db_and_tables\nfrom src.service_layer.message_bus import (\n    COMMAND_HANDLERS,\n    EVENT_HANDLERS,\n    MessageBus,\n)\nfrom src.service_layer.unit_of_work import (\n    AbstractUnitOfWork,\n    UnitOfWork,\n)\n\n\ndef bootstrap(start_orm: bool = True, engine: Engine = create_engine(config.get_postgres_uri()), uow: AbstractUnitOfWork = UnitOfWork(),\n              notifications: AbstractNotifications = EmailNotifications(), publish: Callable = redis_publisher.publish):\n    if start_orm:\n        create_db_and_tables(engine)\n\n    dependencies = {\"uow\": uow, \"notifications\": notifications, \"publish\": publish}\n    injected_event_handlers = {\n        event_type: [\n            inject_dependencies(handler, dependencies)\n            for handler in event_handlers\n        ]\n        for event_type, event_handlers in EVENT_HANDLERS.items()\n    }\n    injected_command_handlers = {\n        command_type: inject_dependencies(handler, dependencies)\n        for command_type, handler in COMMAND_HANDLERS.items()\n    }\n\n    return MessageBus(uow=uow, event_handlers=injected_event_handlers, command_handlers=injected_command_handlers)\n\n\ndef inject_dependencies(handler, dependencies):\n    params = inspect.signature(handler).parameters\n    deps = {\n        name: dependency\n        for name, dependency in dependencies.items()\n        if name in params\n    }\n    return lambda message: handler(message, **deps)\n"
  },
  {
    "path": "books/python-architecture-patterns/src/config.py",
    "content": "import os\n\n\ndef get_postgres_uri():\n    host = os.environ.get(\"DB_HOST\", \"localhost\")\n    port = 54321 if host == \"localhost\" else 5432\n    password = os.environ.get(\"DB_PASSWORD\", \"abc123\")\n    user, db_name = \"allocation\", \"allocation\"\n    return f\"postgresql://{user}:{password}@{host}:{port}/{db_name}\"\n\n\ndef get_api_url():\n    host = os.environ.get(\"API_HOST\", \"localhost\")\n    port = 80\n    return f\"http://{host}:{port}\"\n\n\ndef get_redis_host_and_port():\n    host = os.environ.get(\"REDIS_HOST\", \"localhost\")\n    port = 63791 if host == \"localhost\" else 6379\n    return dict(host=host, port=port)\n\n\ndef get_email_host_and_port():\n    host = os.environ.get(\"EMAIL_HOST\", \"localhost\")\n    port = 11025 if host == \"localhost\" else 1025\n    http_port = 18025 if host == \"localhost\" else 8025\n    return dict(host=host, port=port, http_port=http_port)\n"
  },
  {
    "path": "books/python-architecture-patterns/src/domain/__init__.py",
    "content": ""
  },
  {
    "path": "books/python-architecture-patterns/src/domain/commands.py",
    "content": "from dataclasses import dataclass\nfrom datetime import date\nfrom typing import Optional\n\n\nclass Command:\n    pass\n\n\n@dataclass\nclass Allocate(Command):\n    order_id: str\n    sku: str\n    qty: int\n\n\n@dataclass\nclass CreateBatch(Command):\n    ref: str\n    sku: str\n    qty: int\n    eta: Optional[date] = None\n\n\n@dataclass\nclass ChangeBatchQuantity(Command):\n    ref: str\n    qty: int\n"
  },
  {
    "path": "books/python-architecture-patterns/src/domain/events.py",
    "content": "from pydantic import BaseModel\n\n\nclass Event(BaseModel):\n    pass\n\n\nclass OutOfStock(Event):\n    sku: str\n\n\nclass Allocated(Event):\n    order_id: str\n    sku: str\n    qty: int\n    batch_ref: str\n\n\nclass Deallocated(Event):\n    order_id: str\n    sku: str\n    qty: int\n\n\nclass BatchQuantityChanged(Event):\n    batch_ref: str\n    qty: int\n"
  },
  {
    "path": "books/python-architecture-patterns/src/domain/model.py",
    "content": "from datetime import date\nfrom typing import (\n    Iterable,\n    List,\n    Optional,\n    Union,\n    cast,\n)\n\nfrom pydantic import PrivateAttr\nfrom pydantic.fields import ModelPrivateAttr\nfrom sqlmodel import (\n    Field,\n    Relationship,\n    SQLModel,\n)\n\nfrom src.domain import (\n    commands,\n    events,\n)\n\nMessage = Union[commands.Command, events.Event]\n\n\nclass OutOfStock(Exception):\n    pass\n\n\nclass OrderLine(SQLModel, table=True):\n    order_id: str\n    sku: str\n    qty: int\n    # DB-specific fields:\n    id: Optional[int] = Field(default=None, primary_key=True)\n    batch_id: Optional[int] = Field(default=None, foreign_key=\"batch.id\")\n    batch: Optional[\"Batch\"] = Relationship(back_populates=\"allocations\")\n\n\nclass Batch(SQLModel, table=True):\n    reference: str\n    sku: str\n    purchased_quantity: int\n    eta: Optional[date]\n    allocations: List[\"OrderLine\"] = Relationship(back_populates=\"batch\")\n    # DB-specific fields:\n    id: Optional[int] = Field(default=None, primary_key=True)\n    product_id: Optional[int] = Field(default=None, foreign_key=\"product.id\")\n    product: Optional[\"Product\"] = Relationship(back_populates=\"batches\")\n\n    def __eq__(self, other):\n        if not isinstance(other, Batch):\n            return False\n        return other.reference == self.reference\n\n    def __hash__(self):\n        return hash(self.reference)\n\n    def __gt__(self, other):\n        if self.eta is None:\n            return False\n        if other.eta is None:\n            return True\n        return self.eta > other.eta\n\n    def allocate(self, order_line: OrderLine) -> None:\n        if not self.can_allocate(order_line):\n            return\n        if order_line in self.allocations:\n            return\n        self.allocations.append(order_line)\n\n    def deallocate(self, order_line: OrderLine) -> None:\n        if order_line not in self.allocations:\n            return\n        self.allocations.remove(order_line)\n\n    def deallocate_one(self):\n        return self.allocations.pop()\n\n    @property\n    def allocated_quantity(self) -> int:\n        return sum(line.qty for line in self.allocations)\n\n    @property\n    def available_quantity(self) -> int:\n        return self.purchased_quantity - self.allocated_quantity\n\n    def can_allocate(self, order_line: OrderLine) -> bool:\n        return self.sku == order_line.sku and self.available_quantity >= order_line.qty\n\n\nclass Product(SQLModel, table=True):\n    sku: str\n    batches: List[\"Batch\"] = Relationship(back_populates=\"product\")\n    # DB-specific fields:\n    id: Optional[int] = Field(default=None, primary_key=True)\n    version_number: int = 0\n    # DB excluded fields:\n    _messages: ModelPrivateAttr = PrivateAttr(default=[])\n\n    def __hash__(self):\n        return hash(self.sku)\n\n    @property\n    def messages(self) -> List[Message]:\n        return self._messages.default\n\n    def allocate(self, order_line: OrderLine) -> Optional[str]:\n        try:\n            batch = next(b for b in sorted(cast(Iterable, self.batches)) if b.can_allocate(order_line))\n        except StopIteration:\n            self.messages.append(events.OutOfStock(sku=order_line.sku))\n            return None\n        batch.allocate(order_line)\n        self.version_number += 1\n        self.messages.append(events.Allocated(\n            order_id=order_line.order_id,\n            sku=order_line.sku,\n            qty=order_line.qty,\n            
batch_ref=batch.reference\n        ))\n        return batch.reference\n\n    def change_batch_quantity(self, ref: str, qty: int):\n        batch = next(b for b in self.batches if b.reference == ref)\n        batch.purchased_quantity = qty\n        while batch.available_quantity < 0:\n            line = batch.deallocate_one()\n            self.messages.append(commands.Allocate(order_id=line.order_id, sku=line.sku, qty=line.qty))\n"
  },
  {
    "path": "books/python-architecture-patterns/src/redis_consumer.py",
    "content": "import json\nfrom typing import Dict\n\nfrom redis.client import Redis\n\nfrom src import config\nfrom src.bootstrap import bootstrap\nfrom src.domain import (\n    commands,\n    events,\n)\nfrom src.service_layer.message_bus import MessageBus\n\nr = Redis(**config.get_redis_host_and_port())\n\n\ndef main():\n    bus = bootstrap()\n    pubsub = r.pubsub(ignore_subscribe_messages=True)\n    pubsub.subscribe(\"change_batch_quantity\")\n\n    for m in pubsub.listen():\n        _handle_change_batch_quantity(m, bus)\n\n\ndef _handle_change_batch_quantity(message: Dict, bus: MessageBus):\n    event = events.BatchQuantityChanged(**json.loads(message[\"data\"]))\n    cmd = commands.ChangeBatchQuantity(ref=event.batch_ref, qty=event.qty)\n\n    bus.handle(message=cmd)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "books/python-architecture-patterns/src/service_layer/__init__.py",
    "content": ""
  },
  {
    "path": "books/python-architecture-patterns/src/service_layer/handlers.py",
    "content": "from src.adapters import redis_publisher\nfrom src.adapters.notifications import AbstractNotifications\nfrom src.domain import (\n    commands,\n    events,\n)\n\nfrom src.domain.model import (\n    Batch,\n    OrderLine,\n    Product,\n)\nfrom src.service_layer.unit_of_work import (\n    AbstractUnitOfWork,\n    UnitOfWork,\n)\n\n\nclass InvalidSku(Exception):\n    pass\n\n\ndef allocate(command: commands.Allocate, uow: AbstractUnitOfWork) -> str:\n    order_line = OrderLine(order_id=command.order_id, sku=command.sku, qty=command.qty)\n    with uow:\n        product = uow.products.get(sku=command.sku)\n\n        if not product:\n            raise InvalidSku(f\"Invalid SKU: {command.sku}\")\n\n        batch_ref = product.allocate(order_line)\n        uow.commit()\n\n    return batch_ref\n\n\ndef add_batch(command: commands.CreateBatch, uow: AbstractUnitOfWork):\n    with uow:\n        product = uow.products.get(command.sku)\n        if not product:\n            product = Product(sku=command.sku, batches=[])\n            uow.products.add(product)\n        product.batches.append(Batch(reference=command.ref, sku=command.sku, purchased_quantity=command.qty, eta=command.eta))\n        uow.commit()\n\n\ndef change_batch_quantity(command: commands.ChangeBatchQuantity, uow: AbstractUnitOfWork):\n    with uow:\n        product = uow.products.get_by_batch_ref(command.ref)\n        product.change_batch_quantity(ref=command.ref, qty=command.qty)\n        uow.commit()\n\n\ndef send_out_of_stock_notification(event: events.OutOfStock, notifications: AbstractNotifications):\n    notifications.send(\"stock@made.com\", f\"Out of stock for {event.sku}\")\n\n\ndef publish_allocated_event(event: events.Allocated, uow: AbstractUnitOfWork):\n    redis_publisher.publish(\"line_allocated\", event)\n\n\ndef add_allocation_to_read_model(event: events.Allocated, uow: UnitOfWork):\n    with uow:\n        uow.session.execute(\n            \"\"\"\n            INSERT INTO allocationsview (order_id, sku, batch_ref)\n            VALUES (:order_id, :sku, :batch_ref)\n            \"\"\",\n            dict(order_id=event.order_id, sku=event.sku, batch_ref=event.batch_ref),\n        )\n        uow.commit()\n\n\ndef remove_allocation_from_read_model(event: events.Deallocated, uow: UnitOfWork):\n    with uow:\n        uow.session.execute(\n            \"\"\"\n            DELETE FROM allocationsview\n            WHERE order_id = :order_id AND sku = :sku\n            \"\"\",\n            dict(order_id=event.order_id, sku=event.sku),\n        )\n        uow.commit()\n\n\ndef reallocate(event: events.Deallocated, uow: AbstractUnitOfWork, ):\n    with uow:\n        product = uow.products.get(sku=event.sku)\n        product.messages.append(commands.Allocate(**event.dict()))\n        uow.commit()\n"
  },
  {
    "path": "books/python-architecture-patterns/src/service_layer/message_bus.py",
    "content": "import logging\nfrom typing import (\n    Callable,\n    Dict,\n    List,\n    Type,\n    Union,\n)\n\nfrom src.domain import (\n    commands,\n    events,\n)\nfrom src.service_layer import handlers\nfrom src.service_layer.unit_of_work import AbstractUnitOfWork\n\nlogger = logging.getLogger(__name__)\n\nMessage = Union[commands.Command, events.Event]\n\nEVENT_HANDLERS: Dict[Type[events.Event], List[Callable]] = {\n    events.OutOfStock: [handlers.send_out_of_stock_notification],\n    events.Allocated: [handlers.publish_allocated_event, handlers.add_allocation_to_read_model],\n    events.Deallocated: [handlers.remove_allocation_from_read_model, handlers.reallocate]\n}\n\nCOMMAND_HANDLERS: Dict[Type[commands.Command], Callable] = {\n    commands.CreateBatch: handlers.add_batch,\n    commands.ChangeBatchQuantity: handlers.change_batch_quantity,\n    commands.Allocate: handlers.allocate,\n}\n\n\nclass MessageBus:\n    def __init__(self, uow: AbstractUnitOfWork, event_handlers: Dict[Type[events.Event], List[Callable]], command_handlers: Dict[Type[commands.Command], Callable]):\n        self.uow = uow\n        self.event_handlers = event_handlers\n        self.command_handlers = command_handlers\n\n        self.queue: List[Message] = []\n\n    def handle(self, message: Message):\n        self.queue = [message]\n        while self.queue:\n            message = self.queue.pop(0)\n            if isinstance(message, events.Event):\n                self._handle_event(message)\n            elif isinstance(message, commands.Command):\n                self._handle_command(message)\n            else:\n                raise Exception(f\"{message} was not an Event or Command\")\n\n    def _handle_event(self, event: events.Event):\n        for handler in self.event_handlers[type(event)]:\n            try:\n                logger.debug(f\"Handling event {event} with handler {handler}\")\n                handler(event)\n                self.queue.extend(self.uow.collect_new_messages())\n            except Exception as e:\n                logger.exception(f\"Exception handling event {event}: {e}\")\n                continue\n\n    def _handle_command(self, command: commands.Command):\n        try:\n            handler = self.command_handlers[type(command)]\n            handler(command)\n            self.queue.extend(self.uow.collect_new_messages())\n        except Exception:\n            logger.exception(\"Exception handling command %s\", command)\n            raise\n"
  },
  {
    "path": "books/python-architecture-patterns/src/service_layer/unit_of_work.py",
    "content": "from __future__ import annotations\n\nfrom abc import (\n    ABC,\n    abstractmethod,\n)\nfrom typing import Optional\n\nfrom sqlmodel import (\n    Session,\n    create_engine,\n)\n\nfrom src.adapters.repository import (\n    Repository,\n    TrackingRepository,\n)\nfrom src.config import get_postgres_uri\n\n\nclass AbstractUnitOfWork(ABC):\n    products: TrackingRepository\n\n    def __enter__(self) -> AbstractUnitOfWork:\n        return self\n\n    def __exit__(self, *args):\n        self.rollback()\n\n    def commit(self):\n        self._commit()\n\n    def collect_new_messages(self):\n        for product in self.products.seen:\n            while product.messages:\n                yield product.messages.pop(0)\n\n    @abstractmethod\n    def rollback(self):\n        raise NotImplementedError\n\n    @abstractmethod\n    def _commit(self):\n        raise NotImplementedError\n\n\ndef default_session():\n    return Session(create_engine(get_postgres_uri(), isolation_level=\"REPEATABLE READ\"))\n\n\nclass UnitOfWork(AbstractUnitOfWork):\n    def __init__(self, session: Optional[Session] = None):\n        # 'default_session()' can not be in the '__init__' because it would be evaluated only once:\n        self.session = session if session else default_session()\n\n    def __enter__(self):\n        self.products = TrackingRepository(repo=Repository(self.session))\n        return super().__enter__()\n\n    def __exit__(self, *args):\n        super().__exit__(*args)\n        self.session.close()\n\n    def rollback(self):\n        self.session.rollback()\n\n    def _commit(self):\n        self.session.commit()\n"
  },
  {
    "path": "books/python-architecture-patterns/src/views.py",
    "content": "from typing import (\n    Dict,\n    List,\n)\n\nfrom src.service_layer.unit_of_work import UnitOfWork\n\n\ndef allocations(order_id: str, uow: UnitOfWork) -> List[Dict]:\n    with uow:\n        results = uow.session.execute(\n            \"SELECT sku, batch_ref FROM allocationsview WHERE order_id = :order_id\",\n            dict(order_id=order_id),\n        )\n    return [dict(r) for r in results]\n"
  },
  {
    "path": "books/python-architecture-patterns/tests/__init__.py",
    "content": ""
  },
  {
    "path": "books/python-architecture-patterns/tests/conftest.py",
    "content": "import pytest\nimport redis\nfrom sqlmodel import (\n    Session,\n    create_engine,\n)\nfrom starlette.testclient import TestClient\nfrom tenacity import (\n    retry,\n    stop_after_delay,\n)\n\nfrom src import config\nfrom src.adapters.orm import (\n    clean_db_and_tables,\n    create_db_and_tables,\n)\nfrom src.app import api\n\n\n@pytest.fixture\ndef in_memory_db():\n    engine = create_engine(\"sqlite:///:memory:\")\n    clean_db_and_tables(engine)\n    create_db_and_tables(engine)\n    return engine\n\n\n@pytest.fixture\ndef session(in_memory_db):\n    create_db_and_tables(in_memory_db)\n    yield Session(in_memory_db)\n    clean_db_and_tables(in_memory_db)\n\n\n@retry(stop=stop_after_delay(10))\ndef wait_for_postgres_to_come_up(engine):\n    engine.connect()\n\n\n@retry(stop=stop_after_delay(10))\ndef wait_for_redis_to_come_up():\n    r = redis.Redis(**config.get_redis_host_and_port())\n    return r.ping()\n\n\n@pytest.fixture(scope=\"session\")\ndef postgres_db():\n    engine = create_engine(config.get_postgres_uri())\n    wait_for_postgres_to_come_up(engine)\n    clean_db_and_tables(engine)\n    create_db_and_tables(engine)\n    return engine\n\n\n@pytest.fixture\ndef postgres_session(postgres_db):\n    create_db_and_tables(postgres_db)\n    yield Session(postgres_db)\n    clean_db_and_tables(postgres_db)\n\n\n@pytest.fixture\ndef client():\n    return TestClient(api)\n"
  },
  {
    "path": "books/python-architecture-patterns/tests/e2e/__init__.py",
    "content": ""
  },
  {
    "path": "books/python-architecture-patterns/tests/e2e/api_client.py",
    "content": "import json\n\nfrom src.domain.model import (\n    Batch,\n    OrderLine,\n)\n\n\ndef post_to_allocate(client, order_id, sku, qty):\n    return client.post(\"/allocate\", json=json.loads(OrderLine(order_id=order_id, sku=sku, qty=qty).json()))\n\n\ndef get_allocation(client, order_id):\n    return client.post(f\"/allocate/{order_id}\")\n\n\ndef post_to_add_batch(client, ref, sku, qty, eta):\n    return client.post(\"/add_batch\", json=json.loads(Batch(reference=ref, sku=sku, purchased_quantity=qty, eta=eta).json()))\n"
  },
  {
    "path": "books/python-architecture-patterns/tests/e2e/redis_client.py",
    "content": "import json\nimport redis\n\nfrom src import config\n\nr = redis.Redis(**config.get_redis_host_and_port())\n\n\ndef subscribe_to(channel):\n    pubsub = r.pubsub()\n    pubsub.subscribe(channel)\n    confirmation = pubsub.get_message(timeout=3)\n    assert confirmation[\"type\"] == \"subscribe\"\n    return pubsub\n\n\ndef publish_message(channel, message):\n    r.publish(channel, json.dumps(message))\n"
  },
  {
    "path": "books/python-architecture-patterns/tests/e2e/test_app.py",
    "content": "from datetime import date\nfrom uuid import uuid4\n\nfrom tests.e2e.api_client import (\n    get_allocation,\n    post_to_add_batch,\n    post_to_allocate,\n)\n\n\ndef random_suffix():\n    return uuid4().hex[:6]\n\n\ndef random_sku(name=''):\n    return f\"sku-{name}-{random_suffix()}\"\n\n\ndef random_batch_ref(name=''):\n    return f\"batch-{name}-{random_suffix()}\"\n\n\ndef random_order_id(name=''):\n    return f\"order-{name}-{random_suffix()}\"\n\n\ndef test_happy_path_returns_200_and_allocated_batch(client):\n    sku, other_sku = random_sku(), random_sku(\"other\")\n    order_id = random_order_id()\n    early_batch, later_batch, other_batch = random_batch_ref('1'), random_batch_ref('2'), random_batch_ref('3')\n    post_to_add_batch(client, later_batch, sku, 100, date(2011, 1, 2))\n    post_to_add_batch(client, early_batch, sku, 100, date(2011, 1, 1))\n    post_to_add_batch(client, other_batch, other_sku, 100, None)\n\n    response = post_to_allocate(client=client, order_id=order_id, sku=sku, qty=3)\n    assert response.status_code == 200, response.status_code\n\n    response = get_allocation(client=client, order_id=order_id)\n    assert response.status_code == 200\n    assert response.json() == [{\"sku\": sku, \"batch_ref\": early_batch}]\n\n\ndef test_unhappy_path_returns_400_and_error_message(client):\n    unknown_order_id, unknown_sku = random_order_id(), random_sku()\n    response = post_to_allocate(client=client, order_id=random_order_id(), sku=unknown_sku, qty=20)\n\n    assert response.status_code == 400\n    assert response.json()[\"message\"] == f\"Invalid SKU: {unknown_sku}\"\n\n    response = get_allocation(client=client, order_id=unknown_order_id)\n    assert response.status_code == 400\n"
  },
  {
    "path": "books/python-architecture-patterns/tests/e2e/test_external_events.py",
    "content": "import json\nfrom datetime import date\n\nimport pytest\nfrom tenacity import (\n    Retrying,\n    stop_after_delay,\n)\n\nfrom tests.e2e import redis_client\nfrom tests.e2e.api_client import (\n    post_to_add_batch,\n    post_to_allocate,\n)\nfrom tests.e2e.test_app import (\n    random_batch_ref,\n    random_order_id,\n    random_sku,\n)\n\n\ndef test_change_batch_quantity_leading_to_allocation(client):\n    order_id, sku = random_order_id(), random_sku()\n    earlier_batch, later_batch = random_batch_ref(\"old\"), random_batch_ref(\"new\")\n    post_to_add_batch(client=client, ref=earlier_batch, sku=sku, qty=10, eta=date(2021, 1, 1))\n    post_to_add_batch(client=client, ref=later_batch, sku=sku, qty=10, eta=date(2021, 1, 2))\n\n    response = post_to_allocate(client=client, order_id=order_id, sku=sku, qty=10)\n    assert response.status_code == 200\n\n    subscription = redis_client.subscribe_to(\"line_allocated\")\n\n    redis_client.publish_message(\"change_batch_quantity\", {\"batch_ref\": earlier_batch, \"qty\": 5})\n\n    # it may take some for message to arrive:\n    for attempt in Retrying(stop=stop_after_delay(3), reraise=True):\n        with attempt:\n            message = subscription.get_message(timeout=1)\n            if not message:\n                continue\n            data = json.loads(message[\"data\"])\n            assert data[\"order_id\"] == order_id\n            assert data[\"batch_ref\"] == later_batch\n    if not message:\n        pytest.fail(\"Message not fetched\")\n"
  },
  {
    "path": "books/python-architecture-patterns/tests/integration/__init__.py",
    "content": ""
  },
  {
    "path": "books/python-architecture-patterns/tests/integration/test_uow.py",
    "content": "from threading import Thread\nfrom time import sleep\nfrom typing import List\n\nimport pytest\nfrom sqlalchemy.orm import selectinload\nfrom sqlmodel import (\n    Session,\n    select,\n)\n\nfrom src.domain.model import (\n    Batch,\n    OrderLine,\n    Product,\n)\nfrom src.service_layer.unit_of_work import UnitOfWork\nfrom tests.e2e.test_app import random_batch_ref\n\nsku = \"GENERIC-SOFA\"\n\n\ndef insert_batch(session, batch_id):\n    session.add(Product(sku=sku, batches=[Batch(reference=batch_id, sku=sku, purchased_quantity=100, eta=None)]))\n\n\ndef get_allocated_batch_ref(session, order_id, sku):\n    batches = session.exec(select(Batch).where(Batch.sku == sku).options(selectinload(Batch.allocations))).all()\n    batch = next(batch for batch in batches for allocation in batch.allocations if allocation.order_id == order_id)\n    return batch.reference\n\n\ndef test_uow_retrieve_batch_and_allocate_to_it(session):\n    insert_batch(session, \"batch1\")\n    session.commit()\n\n    with UnitOfWork(session) as uow:\n        product = uow.products.get(sku=sku)\n        line = OrderLine(order_id=\"o1\", sku=sku, qty=10)\n        product.allocate(order_line=line)\n        uow.commit()\n\n    assert get_allocated_batch_ref(session, \"o1\", \"GENERIC-SOFA\") == \"batch1\"\n\n\ndef test_rolls_back_uncommitted_work_by_default(in_memory_db):\n    old_session, new_session = Session(in_memory_db), Session(in_memory_db)\n    with UnitOfWork():\n        insert_batch(old_session, \"batch1\")\n    assert list(new_session.exec(select(Batch)).all()) == []\n\n\ndef test_rolls_back_on_error(in_memory_db):\n    old_session, new_session = Session(in_memory_db), Session(in_memory_db)\n\n    class MyException(Exception):\n        pass\n\n    with pytest.raises(MyException):\n        with UnitOfWork(old_session):\n            insert_batch(old_session, \"batch1\")\n            raise MyException()\n\n    assert list(new_session.exec(select(Batch)).all()) == []\n\n\ndef try_to_allocate(order_id: str, exceptions: List[Exception]):\n    line = OrderLine(order_id=order_id, sku=sku, qty=10)\n    try:\n        with UnitOfWork() as uow:\n            product = uow.products.get(sku)\n            product.allocate(line)\n            sleep(0.2)\n            uow.commit()\n    except Exception as e:\n        exceptions.append(e)\n\n\ndef test_concurrent_updates_to_version_number_are_not_allowed(postgres_db):\n    session = Session(postgres_db)\n    insert_batch(session, random_batch_ref())\n    session.commit()\n    exceptions = []\n\n    t1, t2 = Thread(target=try_to_allocate, args=(\"order_id_1\", exceptions)), Thread(target=try_to_allocate, args=(\"order_id_2\", exceptions))\n    t1.start(), t2.start(), t1.join(), t2.join()\n\n    product = session.exec(select(Product).where(Product.sku == sku)).one()\n    assert product.version_number == 1\n    assert \"could not serialize access due to concurrent update\" in str(exceptions[0])\n"
  },
  {
    "path": "books/python-architecture-patterns/tests/integration/test_views.py",
    "content": "from datetime import date\nfrom unittest.mock import Mock\n\nimport pytest\nfrom sqlmodel import Session\n\nfrom src import views\nfrom src.adapters.orm import clean_db_and_tables\nfrom src.bootstrap import bootstrap\nfrom src.domain import commands\nfrom src.service_layer.unit_of_work import UnitOfWork\n\ntoday = date.today()\n\n\n@pytest.fixture\ndef sqlite_bus(in_memory_db):\n    bus = bootstrap(\n        start_orm=True,\n        uow=UnitOfWork(Session(in_memory_db)),\n        notifications=Mock(),\n        publish=lambda *args: None,\n    )\n    yield bus\n    clean_db_and_tables(in_memory_db)\n\n\ndef test_allocations_view(sqlite_bus):\n    sqlite_bus.handle(commands.CreateBatch(\"sku1batch\", \"sku1\", 50, None))\n    sqlite_bus.handle(commands.CreateBatch(\"sku2batch\", \"sku2\", 50, today))\n    sqlite_bus.handle(commands.Allocate(\"order1\", \"sku1\", 20))\n    sqlite_bus.handle(commands.Allocate(\"order1\", \"sku2\", 20))\n\n    sqlite_bus.handle(commands.CreateBatch(\"sku1batch-later\", \"sku1\", 50, today))\n    sqlite_bus.handle(commands.Allocate(\"other_order\", \"sku1\", 30))\n    sqlite_bus.handle(commands.Allocate(\"other_order\", \"sku2\", 10))\n\n    assert views.allocations(\"order1\", sqlite_bus.uow) == [\n        {\"sku\": \"sku1\", \"batch_ref\": \"sku1batch\"},\n        {\"sku\": \"sku2\", \"batch_ref\": \"sku2batch\"},\n    ]\n\n\ndef test_deallocation(sqlite_bus):\n    sqlite_bus.handle(commands.CreateBatch(\"b1\", \"sku1\", 50, None))\n    sqlite_bus.handle(commands.CreateBatch(\"b2\", \"sku1\", 50, today))\n    sqlite_bus.handle(commands.Allocate(\"o1\", \"sku1\", 40))\n    sqlite_bus.handle(commands.ChangeBatchQuantity(\"b1\", 10))\n\n    assert views.allocations(\"o1\", sqlite_bus.uow) == [\n        {\"batch_ref\": \"b1\", \"sku\": \"sku1\"},\n        {\"batch_ref\": \"b2\", \"sku\": \"sku1\"}\n    ]\n"
  },
  {
    "path": "books/python-architecture-patterns/tests/unit/__init__.py",
    "content": ""
  },
  {
    "path": "books/python-architecture-patterns/tests/unit/test_batches.py",
    "content": "from datetime import date\n\nfrom src.domain.model import (\n    Batch,\n    OrderLine,\n)\n\n\ndef batch_and_line(sku, batch_quantity, line_quantity):\n    return Batch(reference=\"batch-001\", sku=sku, purchased_quantity=batch_quantity, eta=date.today()), OrderLine(order_id=\"order-123\", sku=sku, qty=line_quantity)\n\n\ndef test_allocating_to_batch_reduces_available_quantity():\n    batch, line = batch_and_line(\"SMALL-TABLE\", 20, 2)\n    batch.allocate(line)\n    assert batch.available_quantity == 18\n\n\ndef test_can_allocate_if_available_greater_than_required():\n    large_batch, small_line = batch_and_line(\"ELEGANT-LAMP\", 20, 2)\n    assert large_batch.can_allocate(small_line)\n\n\ndef test_cannot_allocate_if_available_smaller_than_required():\n    small_batch, large_line = batch_and_line(\"ELEGANT-LAMP\", 2, 20)\n    assert not small_batch.can_allocate(large_line)\n\n\ndef test_not_allocate_if_available_equal_to_required():\n    small_batch, large_line = batch_and_line(\"ELEGANT-LAMP\", 2, 2)\n    assert small_batch.can_allocate(large_line)\n\n\ndef test_cannot_allocate_if_skus_dont_match():\n    batch = Batch(reference=\"batch-001\", sku=\"UNCOMFORTABLE-CHAIN\", purchased_quantity=100, eta=None)\n    different_sku_line = OrderLine(order_id=\"order-123\", sku=\"EXPENSIVE-TOASTER\", qty=10)\n    assert not batch.can_allocate(different_sku_line)\n\n\ndef test_can_only_deallocate_allocated_lines():\n    batch, unallocated_line = batch_and_line(\"DECORATIVE-TRINKET\", 20, 2)\n    batch.deallocate(unallocated_line)\n    assert batch.available_quantity == 20\n\n\ndef test_allocation_is_idempotent():\n    batch, line = batch_and_line(\"ANGULAR-DESK\", 20, 2)\n    batch.allocate(line)\n    batch.allocate(line)\n    assert batch.available_quantity == 18\n"
  },
  {
    "path": "books/python-architecture-patterns/tests/unit/test_handlers.py",
    "content": "from __future__ import annotations\n\nfrom collections import defaultdict\nfrom datetime import date\nfrom typing import (\n    Dict,\n    List,\n    Optional,\n)\n\nimport pytest\n\nfrom src.adapters.notifications import AbstractNotifications\nfrom src.adapters.repository import (\n    AbstractRepository,\n    TrackingRepository,\n)\nfrom src.bootstrap import bootstrap\nfrom src.domain import commands\nfrom src.domain.model import Product\nfrom src.service_layer.handlers import InvalidSku\nfrom src.service_layer.unit_of_work import AbstractUnitOfWork\n\n\nclass FakeRepository(AbstractRepository):\n    def __init__(self, products):\n        super().__init__()\n        self._products = set(products)\n\n    def add(self, product: Product):\n        self._products.add(product)\n\n    def get(self, sku: str) -> Optional[Product]:\n        return next((product for product in self._products if product.sku == sku), None)\n\n    def get_by_batch_ref(self, ref: str) -> Optional[Product]:\n        return next((product for product in self._products for batch in product.batches if batch.reference == ref), None)\n\n\nclass FakeUnitOfWork(AbstractUnitOfWork):\n    def __init__(self):\n        self.products = TrackingRepository(repo=FakeRepository([]))\n        self.committed = False\n\n    def rollback(self):\n        pass\n\n    def _commit(self):\n        self.committed = True\n\n\nclass FakeNotifications(AbstractNotifications):\n    def __init__(self):\n        self.sent: Dict[str, List[str]] = defaultdict(list)\n\n    def send(self, destination, message):\n        self.sent[destination].append(message)\n\n\ndef bootstrap_test_app():\n    return bootstrap(\n        start_orm=False,\n        uow=FakeUnitOfWork(),\n        notifications=FakeNotifications(),\n        publish=lambda *args: None,\n    )\n\n\nclass TestAddBatch:\n    def test_for_new_product(self):\n        bus = bootstrap_test_app()\n        bus.handle(commands.CreateBatch(ref=\"b1\", sku=\"CRUNCHY-ARMCHAIN\", qty=100))\n        assert bus.uow.products.get(\"CRUNCHY-ARMCHAIN\") is not None\n        assert bus.uow.committed\n\n    def test_for_existing_product(self):\n        bus = bootstrap_test_app()\n        bus.handle(commands.CreateBatch(ref=\"b1\", sku=\"GARISH-RUG\", qty=100))\n        bus.handle(commands.CreateBatch(ref=\"b2\", sku=\"GARISH-RUG\", qty=99))\n        assert \"b2\" in [b.reference for b in bus.uow.products.get(\"GARISH-RUG\").batches]\n\n\nclass TestAllocate:\n    def test_errors_for_invalid_sku(self):\n        bus = bootstrap_test_app()\n        bus.handle(commands.CreateBatch(ref=\"b1\", sku=\"AREALSKU\", qty=100))\n        with pytest.raises(InvalidSku, match=\"Invalid SKU: NONEXISTENTSKU\"):\n            bus.handle(commands.Allocate(order_id=\"o1\", sku=\"NONEXISTENTSKU\", qty=10))\n\n    def test_commits(self):\n        bus = bootstrap_test_app()\n        bus.handle(commands.CreateBatch(ref=\"b1\", sku=\"OMINOUS-MIRROR\", qty=100))\n        bus.handle(commands.Allocate(order_id=\"o1\", sku=\"OMINOUS-MIRROR\", qty=10))\n        assert bus.uow.committed\n\n    def test_sends_email_on_out_of_stock_error(self):\n        fake_notifications = FakeNotifications()\n        bus = bootstrap(\n            start_orm=False,\n            uow=FakeUnitOfWork(),\n            notifications=fake_notifications,\n            publish=lambda *args: None,\n        )\n        bus.handle(commands.CreateBatch(ref=\"b1\", sku=\"POPULAR-CURTAINS\", qty=9))\n        bus.handle(commands.Allocate(order_id=\"o1\", 
sku=\"POPULAR-CURTAINS\", qty=10))\n        assert fake_notifications.sent[\"stock@made.com\"] == [\"Out of stock for POPULAR-CURTAINS\"]\n\n\nclass TestChangeBatchQuantity:\n    def test_changes_available_quantity(self):\n        bus = bootstrap_test_app()\n        bus.handle(commands.CreateBatch(ref=\"batch1\", sku=\"ADORABLE-SETTEE\", qty=100))\n\n        [batch] = bus.uow.products.get(\"ADORABLE-SETTEE\").batches\n        assert batch.available_quantity == 100\n\n        bus.handle(commands.ChangeBatchQuantity(ref=\"batch1\", qty=50))\n        assert batch.available_quantity == 50\n\n    def test_reallocates_if_necessary(self):\n        bus = bootstrap_test_app()\n        event_history = [\n            commands.CreateBatch(ref=\"batch1\", sku=\"INDIFFERENT-TABLE\", qty=50),\n            commands.CreateBatch(ref=\"batch2\", sku=\"INDIFFERENT-TABLE\", qty=50, eta=date.today()),\n            commands.Allocate(order_id=\"order1\", sku=\"INDIFFERENT-TABLE\", qty=20),\n            commands.Allocate(order_id=\"order2\", sku=\"INDIFFERENT-TABLE\", qty=20),\n        ]\n        for e in event_history:\n            bus.handle(e)\n\n        [batch_1, batch_2] = bus.uow.products.get(\"INDIFFERENT-TABLE\").batches\n        assert batch_1.available_quantity == 10\n        assert batch_2.available_quantity == 50\n\n        bus.handle(commands.ChangeBatchQuantity(ref=\"batch1\", qty=25))\n        assert batch_1.available_quantity == 5\n        assert batch_2.available_quantity == 30\n"
  },
  {
    "path": "books/python-architecture-patterns/tests/unit/test_product.py",
    "content": "from datetime import date\n\nfrom src.domain import events\nfrom src.domain.model import (\n    Batch,\n    OrderLine,\n    Product,\n)\n\n\ndef test_prefers_current_stock_batches_to_shipments():\n    in_stock_batch = Batch(reference=\"in-stock-batch\", sku=\"RETRO-CLOCK\", purchased_quantity=100, eta=None)\n    shipment_batch = Batch(reference=\"shipment-batch\", sku=\"RETRO-CLOCK\", purchased_quantity=100, eta=None)\n    line = OrderLine(order_id=\"oref\", sku=\"RETRO-CLOCK\", qty=10)\n    product = Product(sku=\"RETRO-CLOCK\", batches=[in_stock_batch, shipment_batch])\n\n    product.allocate(line)\n\n    assert in_stock_batch.available_quantity == 90\n    assert shipment_batch.available_quantity == 100\n\n\ndef test_prefers_earlier_batches():\n    earliest = Batch(reference=\"speedy-batch\", sku=\"MINIMALIST-SPOON\", purchased_quantity=100, eta=date(2022, 1, 7))\n    medium = Batch(reference=\"normal-batch\", sku=\"MINIMALIST-SPOON\", purchased_quantity=100, eta=date(2022, 1, 8))\n    latest = Batch(reference=\"slow-batch\", sku=\"MINIMALIST-SPOON\", purchased_quantity=100, eta=date(2022, 1, 9))\n    line = OrderLine(order_id=\"oref\", sku=\"MINIMALIST-SPOON\", qty=10)\n    product = Product(sku=\"MINIMALIST-SPOON\", batches=[medium, earliest, latest])\n\n    product.allocate(line)\n\n    assert earliest.available_quantity == 90\n    assert medium.available_quantity == 100\n    assert latest.available_quantity == 100\n\n\ndef test_returns_allocated_batch_ref():\n    in_stock_batch = Batch(reference=\"in-stock-batch-ref\", sku=\"HIGHBROW-POSTER\", purchased_quantity=100, eta=None)\n    shipment_batch = Batch(reference=\"shipment-batch-ref\", sku=\"HIGHBROW-POSTER\", purchased_quantity=100, eta=date(2022, 1, 7))\n    line = OrderLine(order_id=\"oref\", sku=\"HIGHBROW-POSTER\", qty=10)\n    product = Product(sku=\"HIGHBROW-POSTER\", batches=[in_stock_batch, shipment_batch])\n\n    allocation = product.allocate(line)\n\n    assert allocation == in_stock_batch.reference\n\n\ndef test_records_out_of_stock_event_if_cannot_allocate():\n    batch = Batch(reference=\"batch\", sku=\"SMALL-FORM\", purchased_quantity=10, eta=date(2022, 1, 7))\n    product = Product(sku=\"SMALL-FORK\", batches=[batch])\n    product.allocate(OrderLine(order_id=\"oref\", sku=\"SMALL-FORM\", qty=10))\n\n    allocation = product.allocate(OrderLine(order_id=\"oref\", sku=\"SMALL-FORM\", qty=1))\n\n    assert product.messages[-1] == events.OutOfStock(sku=\"SMALL-FORM\")\n    assert allocation is None\n"
  },
  {
    "path": "books/refactoring.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Refactoring: Improving the Design of Existing Code\n\nBook by Martin Fowler (Second Edition)\n\n- [Chapter 1: Refactoring: A First Example](#chapter-1-refactoring-a-first-example)\n- [Chapter 2: Principles in Refactoring](#chapter-2-principles-in-refactoring)\n- [Chapter 3: Bad Smells in Code](#chapter-3-bad-smells-in-code)\n- [Chapter 4: Building Tests](#chapter-4-building-tests)\n- [Chapter 5: Introducing the Catalog](#chapter-5-introducing-the-catalog)\n- [Chapter 6: A First Set of Refactorings](#chapter-6-a-first-set-of-refactorings)\n- [Chapter 7: Encapsulation](#chapter-7-encapsulation)\n- [Chapter 8: Moving Features](#chapter-8-moving-features)\n- [Chapter 9: Organising Data](#chapter-9-organising-data)\n- [Chapter 10: Simplifying Conditional Logic](#chapter-10-simplifying-conditional-logic)\n- [Chapter 11: Refactoring APIs](#chapter-11-refactoring-apis)\n- [Chapter 12: Dealing with Inheritance](#chapter-12-dealing-with-inheritance)\n\n## Chapter 1: Refactoring: A First Example\n\nA poorly designed system is hard to change - because it is hard to figure out what to change and hoe these changes will\ninteract with existing code.\n\n> When you have to add a feature to a program but the code is not structured in a convenient way, first refactor the\n> program to make it easy to add the feature, then add the feature.\n\nBefore making any changes, start with self-checking tests (assertions checked by testing framework). Tests can be\nconsidered as bug detectors, they should catch any change that introduces bugs.\n\nRefactoring changes the programs in small steps, so if you make a mistake, it is easy to find where the bug is. Author\nsuggests committing after each successful refactoring, so it is easier get back to a working state, then he squashes\nchanges into more significant commits before pushing changes to the remote repository.\n\nWhen refactoring a long functions, mentally try to identify points that separate different parts of the overall\nbehaviour (decomposition). Extracting a function is a common refactoring technique.\n\n> Any fool can write code that a computer can understand. Good programmers write code that humans can understand.\n\nOther techniques discussed also later: Replace Temp with Query, Inline Variable, Change Function Declaration, Split\nLoop, Slide Statements.\n\nThink of the best name at the moment and rename it later. Breaking large functions into smaller, only adds value if the\nnames are good.\n\n> Programmers are poor judges of how code actually performs. Many of our intuitions are broken by clever compilers,\n> modern caching techniques, .... The performance of software usually depends on just a few parts of the code, and\n> changes anywhere else don't make an appreciable difference.\n\nANYHOW, if refactoring introduces performance slow-downs, finish refactoring first and then do performance tuning.\n\nMutable data quickly becomes something rotten.\n\n> Always leave the code base healthier than when you found it. It will never be perfect, but it should be better.\n\n> A true test of good code is how easy it is to change it. 
> Any fool can write code that a computer can understand. Good programmers write code that humans can understand.\n\nOther techniques discussed also later: Replace Temp with Query, Inline Variable, Change Function Declaration, Split\nLoop, Slide Statements.\n\nThink of the best name at the moment and rename it later. Breaking large functions into smaller ones only adds value if\nthe names are good.\n\n> Programmers are poor judges of how code actually performs. Many of our intuitions are broken by clever compilers,\n> modern caching techniques, .... The performance of software usually depends on just a few parts of the code, and\n> changes anywhere else don't make an appreciable difference.\n\nANYHOW, if refactoring introduces performance slow-downs, finish refactoring first and then do performance tuning.\n\nMutable data quickly becomes something rotten.\n\n> Always leave the code base healthier than when you found it. It will never be perfect, but it should be better.\n\n> A true test of good code is how easy it is to change it. Code should be obvious.\n\nWhen doing refactoring, take small steps; each step should leave the code in a working state that compiles and passes its\ntests.\n\n## Chapter 2: Principles in Refactoring\n\nRefactoring (noun) - a change made to the internal structure of software to make it easier to understand and cheaper to\nmodify without changing its observable behaviour.\n\nRefactoring (verb) - to restructure software by applying a series of refactorings without changing its observable\nbehaviour.\n\nWhen doing refactoring, the code should not spend much time in a broken state, meaning you can stop at any moment even\nif you haven't finished. If someone says their code was broken for a couple of days while they are refactoring, you can\nbe pretty sure they were not refactoring.\n\nTwo Hats - when developing new functionalities - do not change existing code; when refactoring - do not add new\nfunctionalities. Swap hats: refactor, add functionality, refactor, ...\n\nWhy should we refactor?\n\n- software design improvement - changes are made to achieve short-term goals, and because of that, code loses its\n  structure; regular refactoring helps keep the code in shape. An important aspect of refactoring is eliminating\n  duplicated code.\n- makes software easier to understand - think about future developers and decrease the time needed to make a change. You\n  don't have to remember every aspect of the code; make it easy to understand and decrease the load on your brain.\n- helps in finding bugs - clarifies the structure and certain assumptions.\n- helps programming faster - adding new features might be difficult in a system full of patches and patches for patches;\n  a clear structure allows adding new capabilities faster. Good design allows you to quickly find the place where a\n  change needs to be made. Also, if code is clear, it is less likely to introduce a bug. The code base should be a\n  platform for building new features for its domain.\n\n> The Rule of Three - The first time you do something, you just do it. The second time you do something similar, you\n> wince at the duplication, but you do the duplicate anyway. The third time you do something similar, you refactor.\n\nWhen should we refactor?\n\n- preparatory refactoring - building a foundation for a new feature.\n\n    - > It is like you want to go 100 km east but instead of traipsing through the woods, you drive 20 km north to the\n      > highway, and then you are going 3x the speed you could have if you just went straight there.\n\n- comprehension refactoring - making code easier to understand. Move the understanding of a subject from your head to\n  the code itself.\n\n- litter-pickup refactoring - make small changes around the place you are currently viewing - the Boy Scout Rule.\n\n- planned and opportunistic refactoring - refactoring should happen while doing other things; planned refactorings are\n  usually required in teams that neglected refactoring.\n\n- long-term refactoring - refactoring may take weeks, e.g. replacing a library or pulling some section of code out into a\n  component that can be shared between teams - even in such cases refactoring should be performed in small steps.\n\n- refactoring in a code review - code reviews help spread knowledge through a development team. Code may look clear to\n  me but not to my team. Code reviews give the opportunity for more people to suggest useful ideas.\n\n
Sometimes it is easier to rewrite than refactor. The decision to refactor or rewrite requires good judgement and\nexperience.\n\nHowever, there are a couple of problems associated with refactoring:\n\n- some people see refactoring as something that is slowing down development (which is not really true), this should be\n  explained - the economic benefits of refactoring should always be the driving factor, we refactor because it makes us\n  faster to add features and fix bugs.\n- merge conflicts may be painful, especially in a team of multiple full-time developers, the suggested approach is to use\n  CI - Continuous Integration - each team member integrates with mainline at least once per day.\n- to perform refactoring correctly you need to have good tests, the code needs to be self-testing, without self-testing\n  code refactoring carries a high risk of introducing bugs\n- refactoring legacy code is hard, but it is a fantastic tool to help understand a legacy system. Legacy code is often\n  missing tests, and adding tests for legacy code is difficult because it wasn't designed with testing in mind.\n- some time ago database refactoring was considered a problem area, currently we have migrations which make\n  database refactoring possible\n\nRefactoring changed how people think about architecture (previously: completed before any development, now: changed\niteratively). YAGNI does not mean you need to neglect all architectural thinking.\n\nIn order to be fully agile, the team has to consist of capable and enthusiastic refactorers. The first foundation for\nrefactoring is self-testing code, the second is CI.\n\nGood programmers know that they rarely write clean code the first time around.\n\nIDEs use the syntax tree to analyse and refactor code (e.g. changing a variable name happens on the syntax tree level,\nnot on the text level), this makes IDEs more powerful than text editors.\n\n## Chapter 3: Bad Smells in Code\n\nWhen should you start refactoring? It is a matter of intuition. However, there are some indicators.\n\nMYSTERIOUS NAME - code needs to be mundane and clear, a good name can save hours of puzzled incomprehension in the\nfuture.\n\nDUPLICATED CODE - if you see the same code structure in more than one place, your program will be better if you find a\nway to unify them. Duplication means every time you read these copies you need to read them carefully and look for\ndifferences.\n\nLONG FUNCTION - the programs that live best and longest are those with short functions. Whenever you feel you need to\ncomment something - decompose. Even a single line is worth extracting if it needs an explanation. Conditionals and loops\nare also signs for extraction.\n\nLONG PARAMETER LIST - long lists of parameters are confusing - pass an object, use a query on an existing object, or\ncombine the function with the object.\n\nGLOBAL DATA - the problem with global data is that it can be modified from any place in the code base, and this leads to\nbugs. Global data: global variables, class variables, singletons. Global data is especially nasty when it is mutable, as\nin the sketch below.\n\n
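A minimal sketch of the problem (my own illustration, not from the book):\n\n```\n# module-level mutable global - any caller can change it for everyone:\nDISCOUNTS = {\"regular\": 0.1}\n\ndef final_price(price, kind):\n    return price * (1 - DISCOUNTS[kind])\n\ndef run_promotion():\n    DISCOUNTS[\"regular\"] = 0.5  # silently affects every other caller\n```\n\n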
MUTABLE DATA - (from functional programming) data should never change; updating a data structure should return a new copy\nof the structure, leaving the old data pristine.\n\nDIVERGENT CHANGE - making changes should be easy, if you need to, for example, edit 4 functions every time you add a new\nfinancial instrument, something is off.\n\nSHOTGUN SURGERY - every time you make a change, you have to make a lot of little edits to a lot of different classes,\nwhen changes are all over the place, they are hard to find, and it is easy to miss an important change. In such a case,\neverything that changes together should be put in a single module.\n\nFEATURE ENVY - for example: a function in one module spends more time communicating with functions or data inside\nanother module than it does within its own module - the function clearly wants to be with the data, so move the function\nto get it there. Put things together that change together.\n\nDATA CLUMPS - some items enjoy hanging around together, the same three or four data items appear together in lots of\nplaces - you can group them together.\n\nPRIMITIVE OBSESSION - many programmers are reluctant to create their own fundamental types which are useful for their\ndomain.\n\nREPEATED SWITCHES - basically the same problem as in DUPLICATED CODE.\n\nLOOPS - loops are less relevant in programming today because of the presence of map and filter mechanisms.\n\nLAZY ELEMENT - sometimes you may want to replace a function with inline code or collapse an object hierarchy.\n\nSPECULATIVE GENERALITY - all the special cases to handle situations that are not going to happen soon (YAGNI).\n\nTEMPORARY FIELD - a class with a field which is set only in certain circumstances - difficult to understand.\n\nMESSAGE CHAINS - a client asks an object for another object, which the client then asks for yet another object - this\nmight cause a train wreck, navigating such code is difficult.\n\nMIDDLE MAN - internal details of the object should be hidden from the rest of the world.\n\nINSIDER TRADING - modules should be separated to keep whispering between them to a minimum, if 2 modules have common\ninterests, create a third module for this communication.\n\nLARGE CLASS - when a class has too many fields it is a sign that it is doing too much, this means duplicated code, chaos\nand death.\n\nALTERNATIVE CLASSES WITH DIFFERENT INTERFACES - if you are allowing substitution, classes have to have the same\ninterface.\n\nDATA CLASS - classes with fields, setters and getters - nothing else. Such classes are often being manipulated in far\ntoo much detail by other classes. You can try to move that behaviour into the data class.\n\nREFUSED BEQUEST - wrong hierarchy, subclasses don't want or need what they are given.\n\nCOMMENTS - when you feel the need to write a comment, first try to refactor the code so that any comment becomes\nsuperfluous.\n\n
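A minimal sketch of refactoring a comment away (my own illustration, reusing the `Batch` / `OrderLine` vocabulary from the code in this repo):\n\n```\n# before - a comment explains what the condition checks:\n# the batch has the right SKU and enough stock for the line\nif line.sku == batch.sku and batch.available_quantity >= line.qty:\n    ...\n\n# after - a well-named method makes the comment superfluous:\nif batch.can_allocate(line):\n    ...\n```\n\n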
## Chapter 4: Building Tests\n\nProper refactoring cannot be done without proper tests. A suite of tests is a powerful bug detector that decapitates\nthe time it takes to find bugs.\n\nTDD allows concentrating on the interface rather than the implementation, which is a good thing.\n\nAlways make sure a test will fail when it should (try to break your code, to see if the test fails as well).\n\nTesting should be risk-driven, you don't need to test every getter.\n\nWhen you get a bug report, start by writing a unit test that exposes the bug.\n\nThe best measure for a good enough test suite is subjective: How confident are you that if someone introduces a defect\ninto your code, some test will fail?\n\n## Chapter 5: Introducing the Catalog\n\nThe rest of the book is a catalog of refactorings. Each *Refactoring* has: name, sketch, motivation, mechanics and\nexamples.\n\n## Chapter 6: A First Set of Refactorings\n\nEXTRACT FUNCTION - write small functions.\n\nINLINE FUNCTIONS - inverse of *extract function*, sometimes the function body is as clear as the name. Helpful when you\nneed to group functions - first you join them and then extract functions.\n\nEXTRACT VARIABLE - inverse of *inline variable*, expressions can become very complex and hard to read; in such\nsituations, local variables may help break the expression down into something more manageable.\n\nINLINE VARIABLE - inverse of *extract variable*, sometimes a name doesn't communicate more than the expression itself.\n\nCHANGE FUNCTION DECLARATION - if you see a function with the wrong name, change it as soon as you understand what a\nbetter name would be, so next time you are looking at the code you don't have to figure out what is going on. Often a\ngood way of improving a name is to write a comment to describe the function's purpose - then turn that comment into a\nname (this applies to other names as well). Adding / removing parameters can be done by introducing an intermediate\nwrapping function.\n\nENCAPSULATE VARIABLE - encapsulate access to the variable using functions, instead of accessing data directly, do this\nthrough a single access point - a function. Keeping data encapsulated is less important for immutable data.\n\nRENAME VARIABLE - variables can do a lot to explain what a programmer is up to (if they are named well).\n\nINTRODUCE PARAMETER OBJECT - often a group of data items travels together, appearing in function after function. Such a\ngroup is a data clump - it can easily be replaced with a data structure.\n\nExample:\n\n```\ndef amountInvoiced(start: date, end: date)\ndef amountInvoiced(date_range: Range)\n```\n\nGrouping data into a structure is valuable because it makes explicit the relationship between the data items and reduces\nthe size of parameter lists. Grouping helps to identify new structures.\n\nCOMBINE FUNCTIONS INTO CLASS - when a group of functions operates closely together on a common body of data, there is an\nopportunity to form a class.\n\n> Uniform access principle - All services offered by a module should be available through a uniform notation, which does\n> not betray whether they are implemented through storage or through computation. With this, the client of the class\n> can't tell whether the *value* is a field or derived value.\n\nCOMBINE FUNCTIONS INTO TRANSFORM - instead of aggregating functions into classes you can build functions that enrich\nexisting objects. Transformation is about producing essentially the same thing with some additional\ninformation.\n\nSPLIT PHASE - whenever you encounter code that does two things, look for a way to split it into separate modules. If\nsome processing has 2 stages, make the difference explicit by turning them into 2 separate modules.\n\n
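A minimal sketch of *Split Phase* (my own illustration, not from the book):\n\n```\n# before: parsing and calculation are tangled in a single function\ndef price(order_string, catalog):\n    sku, qty = order_string.split()\n    return catalog[sku] * int(qty)\n\n# after: an explicit parsing phase produces an intermediate structure\ndef parse_order(order_string):\n    sku, qty = order_string.split()\n    return {\"sku\": sku, \"qty\": int(qty)}\n\ndef price(order, catalog):\n    return catalog[order[\"sku\"]] * order[\"qty\"]\n```\n\n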
## Chapter 7: Encapsulation\n\nENCAPSULATE RECORD - instead of using plain dictionaries, encapsulate them into an object. With an object, you can hide\nwhat is stored and provide methods for all the values. The user does not have to care which value is calculated and which\nis stored. **Dictionaries are useful** in many programming situations **but they are not explicit about their fields**.\nRefactor implicit structures into explicit ones.\n\nENCAPSULATE COLLECTION - it is a good idea to ensure that the getter for a collection cannot be used to accidentally\nchange it. One way to prevent modification of a collection is to use some form of read-only proxy to the collection. Such\na proxy can allow all reads but block any write to the collection. The most popular approach is to provide a getter\nmethod for the collection, but make it return a copy of the underlying collection (see the sketch at the end of this\nchapter).\n\nReplacing `customer.orders.size` with `customer.num_of_orders` is not recommended, because it adds a lot of extra code\nand cripples the easy composability of collection operations.\n\nIf the team has the habit of not modifying collections outside the original module, it might be enough.\n\nIt is worth being moderately paranoid about collections - rather copy them unnecessarily than debug errors due to\nunexpected modifications. For example, instead of sorting in place, return a new sorted copy.\n\nREPLACE PRIMITIVE WITH OBJECT - simple facts can be represented by simple data items such as numbers or strings; as\ndevelopment proceeds, those simple items aren't so simple anymore. This is one of the most important refactorings.\nStarting with simply wrapping the value in an object, you can extend the class with additional behaviours.\n\nREPLACE TEMP WITH QUERY - using temporary variables allows referring to the value while explaining its meaning and\navoiding repeating the code that calculates it. But while using a variable is handy, it can often be worthwhile to go a\nstep further and use a function instead, mostly when the variable needs to be calculated multiple times across the\nclass.\n\nEXTRACT CLASS - split classes containing too much logic into separate classes. Good signs for doing so:\n\n- a subset of the data and a subset of methods seem to go together\n- data that usually change together or are particularly dependent on each other\n\nA useful test: ask what would happen if you removed a piece of data or a method - what other fields and methods would\nbecome nonsense?\n\nINLINE CLASS - inverse of *Extract Class*. Generally useful as an intermediate step when performing refactoring, e.g. you\nput all attributes in one class, just to split them later.\n\nHIDE DELEGATE - Example: `person.department.manager` should be replaced with `person.manager` (an additional getter\nhiding the delegate). Why? If the delegate changes its interface, the change has to be propagated across all parts of the\nsystem.\n\nREMOVE MIDDLE MAN - inverse of *Hide Delegate*. Sometimes the forwarding introduced by Hide Delegate becomes irritating.\nSometimes it is easier to call the delegate directly (a violation of the Law of Demeter, but the author suggests a better\nname: the Occasionally Useful Suggestion of Demeter).\n\nSUBSTITUTE ALGORITHM - There are usually several ways to do the same thing, and the same is true of algorithms. When you\nlearn more about the problem, you can realise there is an easier way to do it.\n\n
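The *Encapsulate Collection* sketch promised above (my own illustration, not from the book):\n\n```\nclass Customer:\n    def __init__(self):\n        self._orders = []\n\n    def add_order(self, order):\n        self._orders.append(order)\n\n    @property\n    def orders(self):\n        return list(self._orders)  # return a copy - callers cannot mutate the original\n```\n\n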
## Chapter 8: Moving Features\n\nAnother important part of refactoring is moving elements between contexts.\n\nMOVE FUNCTION - one of the most straightforward reasons to move a function is when it references elements in other\ncontexts more than the one it currently resides in. Deciding to move a function is rarely easy. Examine the\ncurrent and candidate contexts for that function.\n\nMOVE FIELD - programming involves writing a lot of code that implements behaviour - but the strength of a program is\nreally founded on its data structures. If I have a good set of data structures that match the problem, then my behaviour\ncode is simple and straightforward. Moving fields usually happens in the context of a broader set of changes.\n\nMOVE STATEMENTS INTO FUNCTION - removing duplication is one of the best rules of thumb for healthy code. Look to combine\nrepeating code into a function. That way any future modifications to the repeating code can be done in one place and\nused by all the callers.\n\nMOVE STATEMENTS TO CALLERS - this is the inverse of *Move Statements into Function*. The motivation for this refactoring\nis that we rarely get the boundaries right. Sometimes common behaviour used in several places needs to vary in some of\nits calls, so you can move the varying behaviour out of the function to its callers.\n\nREPLACE INLINE CODE WITH FUNCTION CALL - functions allow packaging bits of behaviour, this is useful for understanding -\na named function can explain the purpose of the code rather than its mechanics. Also useful for deduplication.\n\nSLIDE STATEMENTS - code is easier to understand when things that are related to each other appear together. If several\nlines of code access the same data structure, it is best for them to be together rather than intermingled with code\naccessing other data structures. You can also declare the variable just before you first use it.\n\nSPLIT LOOP - you often see loops that are doing two different things at once just because they can do that with\none pass through a loop. But if you are doing two different things in the same loop, then whenever you need to modify\nthe loop you have to understand both things. By splitting the loop, you ensure you only need to understand the behaviour\nyou need to modify. Many programmers are uncomfortable with this refactoring as it forces you to execute the loop twice.\nREMINDER: Once you have your code clear, you can optimise it, and if the loop traversal is a bottleneck, it is easy to\nslam the loops back together. But the actual iteration through even a large list is rarely a bottleneck, and splitting\nthe loops often enables other, more powerful optimisations.\n\nREPLACE LOOP WITH PIPELINE - language environments provide better constructs than loops - the collection\npipeline (`input.filter(...).map(...)`). Logic is much easier to follow if it is expressed as a pipeline. It can be read\nfrom top to bottom to see how objects flow through the pipeline.\n\n
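A minimal sketch of *Split Loop* followed by *Replace Loop with Pipeline* (my own illustration, assuming a hypothetical iterable of order lines):\n\n```\n# before: one loop doing two different things\ntotal_qty, skus = 0, []\nfor line in order_lines:\n    total_qty += line.qty\n    skus.append(line.sku)\n\n# after: split into two loops, each of which collapses into a pipeline\ntotal_qty = sum(line.qty for line in order_lines)\nskus = [line.sku for line in order_lines]\n```\n\n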
\n\nREMOVE DEAD CODE - decent compilers will remove unused code, but unused code is still a significant burden when trying\nto understand how the software works. Once code is no longer used, it should be deleted. If you need it sometime in the\nfuture - you have a version control system, so you can always dig it out again. Commenting out dead code was once a\ncommon habit - it was useful before version control systems were widely used, or when they were inconvenient.\n\n
## Chapter 9: Organising Data\n\nData structures play an important role in our programs, so it is no surprise there is a clutch of refactorings that\nfocus on them.\n\n
SPLIT VARIABLE - Using a variable for two different things is very confusing for the reader. Any variable with more\nthan one responsibility should be replaced with multiple variables, one for each responsibility.\n\n
Exception: Collecting variables (e.g. `i = i + 1`) - often used for calculating sums, string concatenation, writing to\na stream or adding to a collection - don't split them.\n\n
RENAME FIELD - Data structures are the key to understanding what is going on inside the system. It is essential to\nkeep them clear. Rename fields in classes / records, so they are easy to understand.\n\n
REPLACE DERIVED VARIABLE WITH QUERY - One of the biggest sources of problems in software is mutable data. Data changes\ncan often couple together parts of code in awkward ways, with changes in one part leading to knock-on effects that are\nhard to spot. Remove variables that can be easily calculated. A calculation often makes the meaning of the data\nclearer, and it protects you from corrupting the data when you fail to update the variable as the source data changes.\n\n
CHANGE REFERENCE TO VALUE - Instead of updating values of the nested objects, create a new object with updated params.\nValue objects are generally easier to reason about, particularly because they are immutable. Immutable data structures\nare easier to work with.\n\n
CHANGE VALUE TO REFERENCE - (inverse of *Change Reference to Value*). A data structure may have several records linked\nto the same logical data structure. The biggest difficulty in having physical copies of the same logical data occurs\nwhen you need to update the shared data. Then you have to find all the copies and update them all. If you miss one, you\nwill get a troubling inconsistency in the data. In this case, it is often worthwhile to change the copied data into a\nsingle reference.\n\n
## Chapter 10: Simplifying Conditional Logic\n\nMuch of the power of programs comes from their ability to implement conditional logic - but, sadly, much of the\ncomplexity of programs lies in these conditionals.\n\n
DECOMPOSE CONDITIONAL - The length of a function is in itself a factor that makes it hard to read, but conditionals\nincrease the difficulty. As with any large block of code, you can make your intention clearer by decomposing it and\nreplacing each chunk of code with a function call named after the intention of that chunk.\n\n
CONSOLIDATE CONDITIONAL EXPRESSION - Sometimes you run into a series of conditional checks where each check is\ndifferent yet the resulting action is the same. When you see this, you can use `and` and `or` operators to consolidate\nthem into a single conditional check with a single result.\n\n
Consolidating is important because it makes the code clearer by showing that you are making a single check that\ncombines other checks, and because it often sets you up for *Extract Function*. Extracting a condition is one of the\nmost useful things you can do to clarify code.\n\n
REPLACE NESTED CONDITIONAL WITH GUARD CLAUSES - A guard clause says: \"This isn't the core of this function, and if it\nhappens, do something and get out\". In other words, if you know the result, return it immediately instead of assigning\nit to a `result` variable just to have one single return statement at the end of the function.\n\n
*// A guard clause is simply a check that immediately exits the function, either with a return statement or an\nexception.*
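\n\nA quick Python sketch, loosely based on the book's payroll example (`Employee` and `compute_normal_pay` are stubs for\nillustration):\n\n
```python\nfrom dataclasses import dataclass\n\n\n@dataclass\nclass Employee:\n    is_separated: bool = False\n    is_retired: bool = False\n\n\ndef compute_normal_pay(employee):\n    return 100  # stub\n\n\ndef pay_amount(employee):\n    # Nested version: the result is threaded through a single variable.\n    if employee.is_separated:\n        result = 0\n    else:\n        if employee.is_retired:\n            result = 0\n        else:\n            result = compute_normal_pay(employee)\n    return result\n\n\ndef pay_amount(employee):  # refactored version, shadows the one above\n    # Guard clauses: if you know the answer, get out immediately.\n    if employee.is_separated:\n        return 0\n    if employee.is_retired:\n        return 0\n    return compute_normal_pay(employee)\n\n\nassert pay_amount(Employee(is_retired=True)) == 0\nassert pay_amount(Employee()) == 100\n```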
\n\nREPLACE CONDITIONAL WITH POLYMORPHISM - It is possible to put logic in superclasses, which allows reasoning about it\nwithout having to worry about the variants. Each variant case can be put in a subclass. Complex conditional logic can\nbe improved using polymorphism. This feature can be overused - basic conditional logic should use basic conditional\nstatements.\n\n
INTRODUCE SPECIAL CASE - also known as: *Introduce Null Object*. If many parts of the system have the same reaction to\na particular value, you may want to bring that reaction into a single place. The Special Case pattern is a mechanism\nthat captures all the common behaviour; it allows replacing most of the special-case checks with simple calls. A common\nvalue that needs special-case processing is null, which is why this pattern is often called the Null Object pattern.\n\n
INTRODUCE ASSERTION - Often, sections of code work only if certain conditions are true. Such assumptions are often not\nstated explicitly and can only be deduced by looking through the algorithm. Sometimes, these assumptions are stated\nwith a comment. A better technique is to make the assumption explicit by writing an assertion. Failure of an assertion\nindicates a programmer error. Other parts of the system should never depend on assertions being checked. Assertions\nshould be written so that the program functions equally correctly if they are all removed. Use assertions to check\nthings that need to be true, and use them when you think they should never fail.\n\n
## Chapter 11: Refactoring APIs\n\nModules and functions are the building blocks of our software. APIs are the joints that we use to plug them together.\nMaking APIs easy to understand and use is difficult.\n\n
SEPARATE QUERY FROM MODIFIER - It is a good idea to clearly signal the difference between functions with side effects\nand those without. A good rule to follow is that any function that returns a value should not have *observable* side\neffects (command-query separation); e.g. updating a cache does not count as observable. A function that returns a value\nwithout observable side effects is very valuable, because you can call it as often as you like.\n\n
PARAMETRISE FUNCTION - If you see two functions that carry out very similar logic with different literal values, you\ncan remove duplication by using a single function with parameters for the different values.\n\n
REMOVE FLAG ARGUMENT - A flag argument is a function argument that the caller uses to indicate which logic the called\nfunction should execute (via a boolean value, enum or string). Flags complicate the process of understanding what\nfunction calls are available and how to call them. Boolean values are the worst, since they don't convey their meaning\nto the reader - what does `true` mean? Remove flag arguments. There is only one case for flag arguments - when there is\nmore than one flag argument, making a specialised function for every combination of values would greatly increase the\ncomplexity. But on the other hand, this is a signal of a function doing too much.
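\n\n*Remove Flag Argument* in Python, with hypothetical booking functions (the bodies are elided):\n\n
```python\ndef book(customer, is_premium):\n    # Flag argument: the caller has to know what True means here.\n    if is_premium:\n        ...  # premium booking logic\n    else:\n        ...  # regular booking logic\n\n\n# After: two explicit entry points, no boolean to decode at the call site.\ndef regular_book(customer):\n    ...  # regular booking logic\n\n\ndef premium_book(customer):\n    ...  # premium booking logic\n```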
\n\nPRESERVE WHOLE OBJECT - If you see code that derives a couple of values from a record and then passes these values into\na function, replace those values with the whole record itself, letting the function body derive the values it needs.\nThis change reduces the number of parameters and handles future changes better. Pulling several values from an object\nto do some logic on them alone is a smell - *Feature Envy* - and usually a signal that this logic should be moved into\nthe object itself. If several bits of code only use the same subset of an object's features, that may indicate a good\nopportunity for *Extract Class*.\n\n
REPLACE PARAMETER WITH QUERY - (inverse of *Replace Query with Parameter*). The parameter list of a function should\nsummarise the points of variability of that function, indicating the primary ways in which it may behave differently.\nIf a call passes in a value that the function can easily determine for itself, that is a form of duplication. When the\nparameter is present, determining its value is the caller's responsibility - otherwise, that responsibility shifts to\nthe function body. The usual habit should be to simplify life for callers, which implies moving responsibility to the\nfunction body.\n\n
REPLACE QUERY WITH PARAMETER - (inverse of *Replace Parameter with Query*). By moving a query to a parameter, you force\nthe caller to figure out how to provide this value. This complicates life for callers of the function (and preferably\nyou make life easier for callers).\n\n
REMOVE SETTING METHOD - Providing a setting method indicates that a field may be changed. If you don't want that field\nto change once the object is created, do not provide a setting method (and make the field immutable). Remove the setter\nto make it clear that updates make no sense after construction.\n\n
REPLACE CONSTRUCTOR WITH FACTORY FUNCTION - Constructors often come with awkward limitations that aren't there for\nregular functions. A constructor's name is fixed, and it often requires a special operator (`new`). A factory function\nsuffers from no such limitations.\n\n
REPLACE FUNCTION WITH COMMAND - There are times when it is useful to encapsulate a function into its own object (a\ncommand object / command). Such an object is mostly built around a single method, whose request and execution is the\npurpose of the object. A command offers greater flexibility for the control and expression of a function than the plain\nfunction mechanism. Commands can have operations such as `undo`. There are good reasons to use commands, but do not\nforget that this flexibility comes at a price paid in complexity.\n\n
REPLACE COMMAND WITH FUNCTION - (inverse of *Replace Function with Command*) - Command objects provide a powerful\nmechanism for handling complex computations. But most of the time, you just want to invoke a function and have it do\nits thing. If the function isn't too complex, then a command object is more trouble than it's worth and should be\nturned into a regular function.
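\n\nA minimal command-object sketch in Python (a hypothetical shopping-cart command; the `undo` operation is the point):\n\n
```python\nclass AddItemCommand:\n    # The whole object exists to execute - and possibly undo - one action.\n    def __init__(self, cart, item):\n        self.cart = cart\n        self.item = item\n\n    def execute(self):\n        self.cart.append(self.item)\n\n    def undo(self):\n        self.cart.remove(self.item)\n\n\ncart = []\ncommand = AddItemCommand(cart, 'book')\ncommand.execute()\nassert cart == ['book']\ncommand.undo()\nassert cart == []\n```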
\n\n## Chapter 12: Dealing with Inheritance\n\nInheritance is a very useful mechanism - and an easy one to misuse.\n\n
PULL UP METHOD - a form of removing duplication (duplication is bad because there is a risk that an alteration to one\ncopy will not be made to the other). Pulling a method up means putting the method in the parent class.\n\n
PULL UP CONSTRUCTOR BODY - Common constructor behaviour should reside in the superclass.\n\n
PUSH DOWN METHOD - (inverse of *Pull Up Method*). If a method is only relevant to one subclass (or a small proportion\nof subclasses), removing it from the superclass and putting it only on the subclass makes that clearer. You can only do\nthis refactoring if the caller knows it is working with a particular subclass - otherwise, use *Replace Conditional\nwith Polymorphism* with some placebo behaviour on the superclass.\n\n
PUSH DOWN FIELD - If a field is only used by one subclass (or a small proportion of subclasses), move it to those\nsubclasses.\n\n
REPLACE TYPE CODE WITH SUBCLASSES - Instead of using a *flag* in the object indicating the type (\ne.g. `Employee(engineer)`), create specialised subclasses.\n\n
REMOVE SUBCLASS - (inverse of *Replace Type Code with Subclasses*). Subclasses are useful, but as a software system\nevolves, subclasses can lose their value. A subclass that does too little incurs a cost in understanding that is no\nlonger worthwhile. When that time comes, it is best to remove the subclass, replacing it with a field on its\nsuperclass.\n\n
EXTRACT SUPERCLASS - If you see 2 classes doing similar things, you can take advantage of the basic mechanism of\ninheritance to pull their similarities together into a superclass.\n\n
COLLAPSE HIERARCHY - When refactoring a class hierarchy, you often pull and push features around. As the hierarchy\nevolves, you can find that a class and its parent are no longer different enough to be worth keeping separate. At that\npoint you can merge them together.\n\n
REPLACE SUBCLASS WITH DELEGATE - Instead of subclassing objects you can create a separate, independent entity. There is\na popular principle: \"*Favour object composition over class inheritance*\", however it doesn't mean \"*inheritance is\nconsidered harmful*\". Inheritance is a valuable mechanism that does the job most of the time without problems. So reach\nfor inheritance first, and move to delegation when it starts to rub badly.\n\n
REPLACE SUPERCLASS WITH DELEGATE - Subclassing can be done in a way that leads to confusion and complication. A classic\nexample of mis-inheritance from the early days of objects was making a stack a subclass of a list. The idea was to\nreuse the list's data storage and operations, however many additional, inapplicable methods became available on the\nstack. A better approach is to make the list a field of the stack and delegate the necessary operations to it.
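\n\nThe stack example sketched in Python - the stack holds a list as a field instead of inheriting from it, so only the\noperations that make sense for a stack are exposed:\n\n
```python\nclass Stack:\n    # Composition: the list stays hidden, no inherited insert/index/etc. leak out.\n    def __init__(self):\n        self._items = []\n\n    def push(self, item):\n        self._items.append(item)\n\n    def pop(self):\n        return self._items.pop()\n\n    def is_empty(self):\n        return not self._items\n\n\nstack = Stack()\nstack.push(1)\nassert stack.pop() == 1\nassert stack.is_empty()\n```\n"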
  },
  {
    "path": "books/release-it.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Release It! Design and Deploy Production-Ready Software\n\nBook by Michael T. Nygard (Second Edition)\n\n- [Chapter 1: Living in Production](#chapter-1-living-in-production)\n- [Chapter 2: Case Study: The Exception That Grounded an Airline](#chapter-2-case-study-the-exception-that-grounded-an-airline)\n- [Chapter 3: Stabilise Your System](#chapter-3-stabilise-your-system)\n- [Chapter 4: Stability Anti-patterns](#chapter-4-stability-anti-patterns)\n- [Chapter 5: Stability Patterns](#chapter-5-stability-patterns)\n- [Chapter 6: Case Study: Phenomenal Cosmic Powers, Itty-Bitty Living Space](#chapter-6-case-study-phenomenal-cosmic-powers-itty-bitty-living-space)\n- [Chapter 7: Foundations](#chapter-7-foundations)\n- [Chapter 8: Processes on Machines](#chapter-8-processes-on-machines)\n- [Chapter 9: Interconnect](#chapter-9-interconnect)\n- [Chapter 10: Control Plane](#chapter-10-control-plane)\n- [Chapter 11: Security](#chapter-11-security)\n- [Chapter 12: Case Study: Waiting for Godot](#chapter-12-case-study-waiting-for-godot)\n- [Chapter 13: Design for Deployment](#chapter-13-design-for-deployment)\n- [Chapter 14: Handling Versions](#chapter-14-handling-versions)\n- [Chapter 15: Case Study: Trampled by Your Own Customers](#chapter-15-case-study-trampled-by-your-own-customers)\n- [Chapter 16: Adaptation](#chapter-16-adaptation)\n- [Chapter 17: Chaos Engineering](#chapter-17-chaos-engineering)\n\n## Chapter 1: Living in Production\n\n\"Feature complete\" doesn't mean it is \"production ready\". A lot of bad things can happen on production (crazy users,\nviruses, high traffic, ...). Production is the only place to learn how the software will respond to real-world stimuli,\nhence software should be delivered to production quickly and gradually.\n\nMost software architecture and design happens in clean and distant from production environments.\n\nDesign and architecture decisions are also financial decisions (downtime, resource usage, ...). It is important to\nconsider availability, capacity and flexibility when designing software. Pragmatic architect should consider dynamic of\nchange.\n\n## Chapter 2: Case Study: The Exception That Grounded an Airline\n\nA tiny programming error starts the snowball rolling downhill.\n\nIn any incident, author's priority is always to restore service. Restoring service takes precedence over investigation.\nIf it is possible to gather some data for postmortem analysis, that's great - unless it makes the outage longer. The\ntrick to restoring the service is figuring out what to target. You can always \"reboot the world\" by restarting every\nsingle server, layer by layer but that's not effective. Instead, be a doctor diagnosing a disease, look at the symptoms\nand figure what disease to treat.\n\nA postmortem is like a murder mystery, there are set of clues - some are reliable like logs, some are unreliable like\ncomments from people, there is no corpse - the servers are up and running, the state that caused the error no longer\nexists.\n\nLog analysis helped to identify the root cause.\n\nBugs are inevitable, how to prevent bugs in one system from affecting everything else? We are going to look at design\npatterns that can prevent this type of problem from spreading.\n\n## Chapter 3: Stabilise Your System\n\nEnterprise software must be cynical - expects bad things to happen and is never surprised when they do. 
\n\nPoor stability carries real costs - millions lost, for example, through lost transactions in a trading system, plus\nreputation loss. On the other hand, good stability does not necessarily cost a lot. A highly stable design usually\ncosts the same to implement as an unstable one.\n\n
Transaction - an abstract unit of work processed by the system.\n\n
Impulse - a rapid shock to the system. For example, a rumour about a new console causes an impulse on the\nmanufacturer's website, and so does a celebrity tweet. Impulses are things that can fracture (break) the system in the\nblink of an eye.\n\n
Stress - a force applied to the system over an extended period.\n\n
The major dangers to a system's longevity are memory leaks and data growth, both difficult to catch during tests.\nApplications never run long enough in the development environment to reveal longevity bugs.\n\n
Failures will happen. You have the ability to prepare the system for specific failures, the way car engineers design\ncrumple zones - areas designated to protect passengers by failing first. It is possible to create failure modes that\nprotect the rest of the system.\n\n
Less-coupled architectures act as shock absorbers, diminishing the effect of an error instead of amplifying it.\n\n
Terminology:\n\n- Fault - a condition that creates an incorrect internal state in the software.\n- Error - visibly incorrect behaviour, e.g. a trading system buying 10M Pokemon futures\n- Failure - an unresponsive system\n\n
Chain of failure: Triggering a fault opens the crack, faults become errors, and errors provoke failures. At each step,\nthe crack may accelerate. Tight coupling accelerates cracks.\n\n
One way to prepare for every possible failure is to look at every external call, every I/O, every use of resources, and\nask WHAT IF IT: can't make the connection, takes 10 minutes to make the connection, makes the connection and then\ndisconnects, takes 10 minutes to respond to my query, 10k requests arrive at the same time, ...?\n\n
The IT community is divided into 2 camps:\n\n1. Make the system fault-tolerant: catch exceptions, check error codes, keep faults from becoming errors\n2. \"Let it crash\", so you can restart from a known good state\n\n
## Chapter 4: Stability Anti-patterns\n\nAntipatterns can wreck the system - they create, accelerate or multiply cracks in it. These bad behaviours should be\navoided.\n\n
You have to set the socket timeout if you want to break out of a blocking call; for example, a request may be stuck in\nthe listening queue for minutes or forever. Network failure can hit you in 2 ways: fast (an immediate exception, e.g.\nconnection refused) or slow (a dropped ACK). A blocked thread can't process other transactions, so overall capacity is\nreduced. If all threads are blocked, then from a practical point of view the server is down.\n\n
Not every problem can be solved at the level of abstraction where it manifests. Sometimes the causes reverberate up and\ndown the layers. You need to know how to drill through at least two layers of abstraction to find the reality at that\nlevel in order to understand problems.\n\n
REST with JSON over HTTP is the lingua franca for services today. HTTP-based protocols have their own issues:\n\n- the TCP connection can be accepted, but the provider may never respond to the HTTP request\n- the provider may accept the connection but not read the request\n- the provider may send back a response the caller doesn't know how to handle\n- the provider may send back a response with a content type the caller doesn't expect or know how to handle\n- the provider may claim to be sending JSON but actually send plain text\n\n
Treat the response as data until you have confirmed it meets your expectations.
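\n\nA sketch of that advice in Python with the `requests` library (the URL, field name and limits are hypothetical):\n\n
```python\nimport requests\n\n\ndef fetch_price(url):\n    # Always bound the wait: separate connect and read timeouts, in seconds.\n    resp = requests.get(url, timeout=(3.05, 10))\n    # The provider may return an error page, or claim JSON and send plain text.\n    if resp.status_code != 200:\n        raise RuntimeError(f'unexpected status: {resp.status_code}')\n    if 'application/json' not in resp.headers.get('Content-Type', ''):\n        raise RuntimeError('expected JSON, got something else')\n    payload = resp.json()  # raises an exception on malformed JSON\n    # Treat the body as untrusted data until it matches expectations.\n    if 'price' not in payload:\n        raise RuntimeError('response missing required field')\n    return payload['price']\n```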
\n\nLibraries can have bugs too; they have all the variability in quality, style and safety that you see in any other\nrandom sampling of code.\n\n
The most effective stability patterns to combat integration point failures are *Circuit Breaker* and *Decoupling\nMiddleware*.\n\n
BEWARE NECESSARY EVIL - every integration point will fail in some way; you need to be prepared.\n\n
PREPARE FOR MANY FORMS OF FAILURE - failure may take several forms: network errors, semantic errors, slow\nresponses, ...\n\n
KNOW WHEN TO OPEN UP ABSTRACTIONS - debugging integration point failures usually requires peeling back a layer of\nabstraction\n\n
FAILURES PROPAGATE QUICKLY - a failure in a remote system quickly becomes your problem when your code isn't defensive\nenough\n\n
APPLY PATTERNS TO AVERT INTEGRATION POINT PROBLEMS - use patterns like Circuit Breaker, Timeouts, Decoupling Middleware\nand Handshaking - discussed later\n\n
Horizontal scaling - adding capacity by adding more servers; fault tolerance through redundancy. Vertical scaling -\nscaling by building bigger and bigger servers (more cores, memory and storage).\n\n
RECOGNISE THAT ONE SERVER DOWN JEOPARDISES THE REST - a chain reaction can happen because the death of one server makes\nthe others pick up the slack\n\n
HUNT FOR RESOURCE LEAKS - most of the time, chain reactions happen when the application has a memory leak\n\n
HUNT FOR OBSCURE TIMING BUGS - race conditions can be triggered by traffic; if one server dies because of a deadlock,\nthe increased load on the others makes them more likely to hit the deadlock too\n\n
USE AUTOSCALING - create health checks for every autoscaling group, so the scaler can shut down instances that fail\ntheir health checks and start new ones\n\n
DEFEND WITH BULKHEADS - partition servers with Bulkheads - more details later.\n\n
Cascading failures occur when a crack in one layer triggers a crack in a calling layer. If the caller handles errors\nbadly, it will start to fail, resulting in a cascading failure (for example, a database failure is going to impact any\nsystem that calls the database). Every dependency is a chance for a failure to cascade.\n\n
- a cascading failure often results from a resource pool (e.g. a connection pool) that gets exhausted; safe resource\n  pools always limit the time a thread can wait to check out a resource\n- defend with timeouts and circuit breakers\n\n
Capacity is the maximum throughput your system can sustain under a given workload while maintaining acceptable\nperformance. Breaking limits creates cracks in the system. Limits:\n\n
- heap memory - for example with memory-based sessions, memory can get short - many things can go wrong: out-of-memory\n  exceptions, broken logging. It is possible to use weak references - the garbage collector may reclaim the weakly held\n  payloads when memory runs low (before an out-of-memory error occurs). Callers have to behave nicely when the payload\n  is gone. Weak references are useful, but they do add complexity (see the sketch after this list).\n
- off-heap memory, off-host memory - for example Redis; but this is slower than local memory, and there is a problem\n  with replication\n
- the number of sockets on the server is limited; every request corresponds to an open socket, and the OS assigns\n  inbound connections to an ephemeral port that represents the receiving side of the connection. Because of the TCP\n  packet format, one server can have up to 64 511 connections open. How can we serve millions of concurrent\n  connections? Virtual IP addresses.\n
- closed sockets can be problematic too - before a socket can be reused, it goes through a couple of states, for\n  example to defend against bogons. A bogon is a wandering packet that got routed inefficiently and arrives late (out\n  of sequence); if the socket were reused too quickly, a late packet could trigger a response.
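\n\nA weak-reference cache sketch using Python's standard library; entries can disappear once nothing else holds the\npayload (the names are illustrative):\n\n
```python\nimport weakref\n\n\nclass Payload:\n    # Plain classes are weak-referenceable (built-in dicts and lists are not).\n    def __init__(self, data):\n        self.data = data\n\n\ncache = weakref.WeakValueDictionary()\n\npayload = Payload('expensive result')\ncache['report-42'] = payload\nassert cache.get('report-42') is payload\n\ndel payload  # once the last strong reference is gone, the GC may evict the entry\n# cache.get('report-42') can now return None - callers must behave nicely.\n```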
\n\nCookies are a clever way to pass state back and forth from client to server and vice versa. They allow all kinds of new\napplications, such as personalised portals and shopping sites. Cookies should carry only a small amount of data,\nbecause they need to be encrypted, and that is a CPU-heavy task.\n\n
A session is an abstraction that makes building applications easier. All the user really sends is a series of HTTP\nrequests; the server receives them, computes, and returns responses. Sessions are about caching data in memory.\n\n
Truly dangerous users are the ones that target your website. Once you are targeted, you will almost certainly be\nbreached.\n\n
Adding complexity to solve one problem creates the risk of entirely new failure modes, e.g. multithreading enables\nscalability but also introduces concurrency errors.\n\n
Caching can be a powerful response to a performance problem, however caching can cause trouble - it can eat away at the\nmemory available for the system, and when that happens the garbage collector will spend more and more time attempting\nto recover enough memory to process requests. You need to monitor hit rates for the cached items to see whether most\nitems are actually served from the cache. **Caches should be built using weak references to hold the cached items.** It\nwill help the GC reclaim the memory.\n\n
Libraries are notorious sources of blocking threads.\n\n
Self-Denial Attack - any situation in which the system conspires against itself. For example, a coupon code sent to 10k\nusers, to be used on a certain date, is going to attract millions of users (like an XBOX preorder). Self-denial can be\navoided by building a shared-nothing architecture (no shared databases or other resources) - ideal horizontal scaling.\nTalk to the marketing department about when they are going to send out mass emails - you will be able to pre-scale\n(prepare additional instances for the increased load). Be careful with open links to resources, and watch out for\nFight Club bugs - increased front-end load causes exponentially increasing backend processing.\n\n
With point-to-point connections, each instance has to talk directly to every other instance - this means O(n^2)\nscaling - be careful. Point-to-point communication can be replaced by: UDP broadcasts, TCP/UDP multicast, pub/sub\nmessaging, message queues.\n\n
XP principle: Do the simplest thing that will work.\n\n
Watch out for shared resources - they can be a bottleneck. Stress-test them heavily, and be sure clients will keep\nworking despite a malfunctioning resource.\n\n
The frontend always has the ability to overwhelm the backend, because their capacities are not balanced. However, you\ncannot build every service to be large enough to serve an enormous load from the frontend - instead you must build\nservices to be resilient in the face of a tsunami of requests (e.g. Circuit Breaker, Handshaking, Back-pressure,\nBulkheads).\n\n
Dog-pile - when a bunch of servers impose transient load all at once (a term from American football). It can occur when\nbooting all servers at once, on a cron job, or when config management pushes out a change. Use random clock slew to\ndiffuse the demand from cron jobs (every instance does its work at a different time). Use a backoff algorithm, so every\nclient retries at a different time.
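\n\nA retry-with-backoff sketch in Python (the exception type, delays and attempt count are arbitrary choices):\n\n
```python\nimport random\nimport time\n\n\ndef call_with_backoff(operation, max_attempts=5, base_delay=0.5):\n    for attempt in range(max_attempts):\n        try:\n            return operation()\n        except ConnectionError:\n            if attempt == max_attempts - 1:\n                raise\n            # Exponential backoff plus random jitter, so clients that failed\n            # together do not all retry in lockstep and dog-pile the server.\n            delay = base_delay * (2 ** attempt)\n            time.sleep(random.uniform(0, delay))\n```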
\n\nInfrastructure management tools can cause a lot of trouble (e.g. the Reddit outage) - build limiters and safeguards\ninto them, so they won't destroy the entire system at once.\n\n
A slow response is worse than refusing a connection or returning an error, because it ties up resources in both the\ncalling system and the called system. Slow responses usually result from excessive demand. A system should have the\nability to monitor its own performance, so it can tell when it isn't meeting its SLAs (service-level agreements).\n\n
Why slow responses are dangerous: they trigger cascading failures, and users hitting the *reload* button cause even\nmore traffic to an already overloaded system. If a system tracks its own responsiveness, it can tell when it is getting\nslow. In such a situation the developer should consider sending an immediate error response.\n\n
> Design with scepticism, and you will achieve resilience. Ask \"What can system X do to hurt me\" and then design a way\n> to dodge whatever wrench your supposed ally throws.\n\n
Use realistic data volumes - typical development and test data sets are too small to exhibit problems; you need\nproduction-size data to see what happens when your query returns a million rows that you turn into objects. Calls\nshould be paginated. Do not rely on data providers - one day they will go *berserk* and fill up a table for no reason.\n\n
## Chapter 5: Stability Patterns\n\nHealthy patterns to reduce, eliminate or mitigate the effects of cracks in the system. Apply these patterns wisely to\nreduce the damage done by an individual failure.\n\n
TIMEOUTS - Today every application is a distributed system, and every system must grapple with the fundamental nature\nof networks - they are fallible. When any element breaks, code can't wait forever for a response that may never come -\nsooner or later, it needs to stop waiting. *Hope is not a design method*.\n\n
A timeout is a simple mechanism allowing you to stop waiting for an answer once you think it will not come. Well-placed\ntimeouts provide fault isolation - **a problem in some other service does not have to become your problem**.\n\n
Timeouts can also be relevant within a single service. Any resource pool can be exhausted. Any resource that blocks\nthreads must have a timeout to ensure that calling threads eventually unblock.\n\n
Timeouts are often found in the company of retries; fast retries are very likely to fail again (wait between retries).
\n\nCIRCUIT BREAKER - in the past, houses caught fire because of overheated wires when too many appliances were connected\nto the power source. The energy industry came up with a device that fails first in order to prevent fire.\n\n
The circuit breaker exists to fail without breaking the entire system; furthermore, once the danger has passed, the\ncircuit breaker can be reset to restore full function to the system.\n\n
The same technique can be applied to software: dangerous operations can be wrapped with a component that can circumvent\nthe call when the system is not healthy.\n\n
In the closed state, the circuit breaker executes operations as usual (calls to another system, or internal operations\nthat are subject to timeout or other failure); if a call fails, the circuit breaker makes a note of the failure. Once\nthe number of failures exceeds a threshold, the circuit breaker opens the circuit. When the circuit is open, calls are\nsuspended - they fail immediately. After some time, the circuit breaker decides the operation has a chance of\nsucceeding, so it goes to the half-open state: if the next call succeeds, the breaker closes again; if not, it returns\nto the open state.\n\n
The circuit breaker can have different thresholds for different types of failures. Involve stakeholders to decide how\nthe system should behave when the circuit is open.\n\n
How to measure the number of failures? An interesting idea is the Leaky Bucket - a separate thread counts failures and\nperiodically removes them. If the bucket fills up faster than it drains, the system is flooded with errors.\n\n
It should be possible to open/close the circuit automatically.\n\n
Circuit Breaker - don't do it if it hurts. Use it together with timeouts. Ensure proper reporting of an opened circuit.
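\n\nA toy circuit breaker capturing the closed / open / half-open cycle described above (the thresholds and timing are\narbitrary, and a production version would also need thread safety and per-failure-type thresholds):\n\n
```python\nimport time\n\n\nclass CircuitBreaker:\n    def __init__(self, operation, threshold=5, reset_timeout=30.0):\n        self.operation = operation\n        self.threshold = threshold\n        self.reset_timeout = reset_timeout\n        self.failures = 0\n        self.opened_at = None  # None means the circuit is closed\n\n    def call(self, *args):\n        if self.opened_at is not None:\n            if time.monotonic() - self.opened_at < self.reset_timeout:\n                raise RuntimeError('circuit open - failing fast')\n            # Otherwise half-open: let one trial call through.\n        try:\n            result = self.operation(*args)\n        except Exception:\n            self.failures += 1\n            if self.failures >= self.threshold or self.opened_at is not None:\n                self.opened_at = time.monotonic()  # (re)open the circuit\n            raise\n        self.failures = 0\n        self.opened_at = None  # a success closes the circuit again\n        return result\n```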
\n\nBULKHEADS - in a ship, bulkheads prevent water from moving from one compartment to another. You can apply the same\ntechnique in software: by partitioning the system, you can keep a failure in one part from destroying everything. This\ncan be achieved, for example, by running the application on multiple servers - if one fails, we still have redundancy\n(e.g. instances across zones and regions in AWS).\n\n
A bulkhead partitions capacity to preserve partial functionality when bad things happen. The granularity should be\npicked carefully - thread pools in the application, CPUs, servers in a cluster. Bulkheads are especially useful in\nservice-oriented or microservice architectures, to prevent chain reactions from taking the entire company down.\n\n
STEADY STATE - every time a human touches a server, it is an opportunity for unforced errors. It is best to keep people\noff production systems to the greatest extent possible. People should treat servers as \"cattle\", not \"pets\"; they\nshould not be logged in to a server all the time watching whether everything is fine.\n\n
The Steady State pattern says that for every mechanism that accumulates a resource (log files, rows in the database,\ncaches in memory), some other mechanism must recycle that resource. Several types of sludge can accumulate; how to\navoid the need for fiddling:\n\n
- data purging - easy to do, but it can be nasty: especially in relational databases there is a risk of leaving\n  orphaned rows, and you need to make sure the application will still work when the data is gone.\n
- log files - logs are a valuable source of information, but left unchecked they are a risk. When logs fill up the\n  filesystem, they jeopardise stability. Configure log file retention based on size. Probably the best you can do is to\n  store logs on a centralised server (especially if you are required to store logs for years because of a compliance\n  regime). Logstash - a centralised server for logs, where they can be indexed, searched and monitored.\n
- in-memory caching - improper use of caching is a major cause of memory leaks, which in turn lead to horrors like\n  daily server restarts. Limit the amount of memory a cache can consume.\n\n
Steady State encourages better operational discipline by limiting the need for system administrators to log on to\nproduction servers.\n\n
FAIL FAST - if the system can determine in advance that it will fail at an operation, it is always better to fail\nfast - the caller does not have to waste its capacity waiting. No, you don't need a Deep Learning team to tell whether\nit will fail. Example: if a call requires a database connection, the application can quickly check whether the database\nis available. Another approach is to configure the load balancer appropriately (no servers - reject the request). Use\nrequest validation to know whether the data is correct.\n\n
The Fail Fast pattern improves overall system stability by avoiding slow responses.\n\n
LET IT CRASH - there is no way to test everything or predict all the ways a system can break. We must assume that\nerrors will happen.\n\n
There must be a boundary for the crashiness. We want to crash a component in isolation; the rest of the system must\nprotect itself from a cascading failure. In a microservice architecture, a whole instance of the service might be the\nright granularity.\n\n
We must be able to get back to a clean state and resume normal operation as quickly as possible - otherwise we will see\nperformance degradation.\n\n
Supervisors need to keep close track of how often they restart child processes. It might be necessary to restart the\nsupervisor itself. A high number of restarts can indicate that either the state is not sufficiently cleaned up, or the\nsystem is in jeopardy and the supervisor is just masking the underlying problem.\n\n
The final element of \"let it crash\" is reintegration - the instance must somehow be able to rejoin the pool and accept\nwork. This can be done through health checks at the instance level.\n\n
HANDSHAKING - can be most valuable when unbalanced capacities are leading to slow responses. If the server can detect\nthat it will not be able to meet its SLAs, it should have some means to ask the caller to back off. It is an effective\nway to stop cracks from jumping layers, as in the case of a cascading failure.\n\n
The application can notify the load balancer through a health check that it is not able to take more requests (503\nService Unavailable); the load balancer then knows not to send any additional work to that particular server.\n\n
TEST HARNESSES - you can create test harnesses to emulate the remote system on the other end of each integration\npoint. A good test harness should be as nasty and vicious as real-world systems will be.\n\n
A test harness runs as a separate server, so it is not obliged to conform to the defined interface. It can provoke\nnetwork errors, protocol errors or application-level errors.\n\n
Consider building a test harness that substitutes for the remote end of every web service call.\n\n
Integration testing environments are good at examining failures only in the seventh layer of the OSI model (the\napplication layer) - and not even all of those.\n\n
The test harness can be designed like an application server - it can have pluggable behaviour for the tests that are\nrelated to the real application. Broadly speaking, a test harness leads toward \"chaos engineering\".\n\n
The Test Harness pattern augments other testing methods. It does not replace unit tests, acceptance tests, penetration\ntests and so on.
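\n\nA tiny harness in that nasty spirit - a Python socket server that accepts connections and then stays silent forever,\nsimulating a hung integration point (the port is arbitrary):\n\n
```python\nimport socket\n\n\ndef run_hung_server(port=10200):\n    # Accepts TCP connections but never reads or writes, so clients\n    # without timeouts will block forever on their first call.\n    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n    listener.bind(('127.0.0.1', port))\n    listener.listen(5)\n    held = []\n    while True:\n        conn, _ = listener.accept()\n        held.append(conn)  # keep the socket open and say nothing\n```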
\n\nDECOUPLING MIDDLEWARE - middleware is a graceless name for tools that inhabit a singularly messy space - integrating\nsystems that were never meant to work together. It is the connective tissue that bridges gaps between different islands\nof automation.\n\n
Middleware integrates systems by passing data and events back and forth between them, and decouples them by letting\nthe participating systems remove specific knowledge of - and calls to - the other systems.\n\n
Tightly coupled middleware amplifies shocks to the system; synchronous calls are particularly vicious amplifiers that\nfacilitate cascading failures (this includes JSON over HTTP).\n\n
Message-oriented middleware decouples the endpoints in both space and time, because the requesting system doesn't just\nsit around and wait for a reply. This form of middleware cannot produce a cascading failure.\n\n
SHED LOAD - applications have zero control over their demand; at any moment, more than a billion devices could make a\nrequest.\n\n
Services should model TCP's approach: when load gets too high, start to refuse new requests for work. This is related\nto Fail Fast.\n\n
The ideal way to define \"load is too high\" is for a service to monitor its own performance relative to its SLA. When\nrequests take longer than the SLA allows, it is time to shed some load.\n\n
CREATE BACK PRESSURE - every performance problem starts with a queue backing up somewhere. If a queue is unbounded, it\ncan consume all available memory. As the queue's length heads toward infinity, response time also heads toward\ninfinity.\n\n
Blocking the producer is a kind of flow control. It allows the queue to apply \"back pressure\" upstream. Back pressure\npropagates all the way to the ultimate client, who will be throttled down in speed until the queue releases.\n\n
TCP uses back pressure - once the window is full, senders are not allowed to send anything until released.
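\n\nBack pressure with Python's standard library: a bounded queue whose `put` blocks the producer when the consumer falls\nbehind (the sizes and work items are arbitrary):\n\n
```python\nimport queue\nimport threading\n\nwork = queue.Queue(maxsize=10)  # bounded: the queue can never grow toward infinity\n\n\ndef producer():\n    for item in range(100):\n        work.put(item)  # blocks here when the queue is full - back pressure\n\n\ndef consumer():\n    while True:\n        item = work.get()\n        # ... process the item ...\n        work.task_done()\n\n\nthreading.Thread(target=consumer, daemon=True).start()\nproducer()\nwork.join()  # wait until every queued item has been processed\n```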
\n\nGOVERNOR - machines are great at performing repetitive tasks; humans are great at perceiving the high-level situation.\n\n
In the 18th century, steam engineers discovered it is possible to run machines so fast that the metal breaks. The\nsolution was the governor - a device that limits the speed of an engine.\n\n
We can create governors to slow down the rate of actions. A governor is stateful and time-aware; it knows what actions\nhave been taken over a period of time. (Reddit uses a governor to slow down the autoscaler, by adding logic that says\nit can only shut down a certain percentage of instances at a time.)\n\n
The whole point of a governor is to slow things down enough for humans to get involved.\n\n
## Chapter 6: Case Study: Phenomenal Cosmic Powers, Itty-Bitty Living Space\n\n
Launching a new site is like having a baby. You must expect certain things, such as being awakened in the middle of the\nnight. Monitoring technology provides a great safety net, pinpointing problems when they occur, but nothing beats the\npattern-matching power of the human brain.\n\n
Response time is always a lagging indicator. You can only measure the response time of requests that are done. So\nwhatever your worst response time may be, you can't measure it until the slowest request finishes. Requests that never\ncomplete never get averaged in.\n\n
Recovery-Oriented Computing - principles:\n\n- Failures are inevitable, in both hardware and software.\n- Modeling and analysis can never be sufficiently complete. A priori prediction of all failure modes is not possible.\n- Human action is a major source of system failures.\n\n
Investigations aim to improve survivability in the face of failures. The ability to restart single components, instead\nof entire servers, is a key concept of recovery-oriented computing.\n\n
## Chapter 7: Foundations\n\nDesigning for production means thinking about production issues as first-class concerns (network, logging, monitoring,\nruntime control, security, and the people who do operations). There are several layers of concern:\n\n
1. Operations - security, availability, capacity, status, communication\n2. Control Plane - system monitoring, deployment, anomaly detection, features\n3. Interconnect - routing, load balancing, failover, traffic management\n4. Instances - services, processes, components, instance monitoring\n5. Foundation - hardware, VMs, IPs\n\n
Virtualization promised developers a common hardware appearance across the bewildering array of physical\nconfigurations in the data centre. On the downside, performance is much less predictable. Many virtual machines can\nreside on the same physical host. It is rare to move VMs from one host to another.\n\n
When designing applications to run in virtual machines, you must make sure that they are not sensitive to the loss or\nslowdown of the host.\n\n
A clock on a VM is not monotonic and sequential, because the VM can be suspended for an indefinite span of real time.\nThe bottom line is: don't trust the OS clock. If external time is important, use an external source like a local NTP\nserver.\n\n
Containers have a short-lived identity. As a result, a container should not be configured on a per-instance basis. A\ncontainer won't have much, if any, local storage, so the application must rely on external storage for files, data,\nand maybe even cache.\n\n
When you design an application for containers, keep a few things in mind: the whole container image moves from\nenvironment to environment, so the image can't hold things like production database credentials. Containers should not\ncontain hostnames or port numbers, because those settings need to change dynamically while the container image stays\nthe same. Containerised applications need to send their telemetry out to a data collector.\n\n
The 12-Factor App [12factor.net] - created by engineers at Heroku, is a succinct description of a cloud-native,\nscalable, deployable application:\n\n
1. Codebase - track one codebase in revision control. Deploy the same build to every environment.\n2. Dependencies - explicitly declare and isolate dependencies.\n3. Config - store config in the environment (sketch after this list).\n4. Backing services - treat backing services as attached resources.\n5. Build, release, run - strictly separate build and run stages.\n6. Processes - execute the app as one or more stateless processes.\n7. Port binding - export services via port binding.\n8. Concurrency - scale out via the process model.\n9. Disposability - maximise robustness with fast startup and graceful shutdown.\n10. Dev-prod parity - keep development, staging and production as similar as possible.\n11. Logs - treat logs as event streams.\n12. Admin processes - run admin / management tasks as one-off processes.
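\n\nFactor 3 in Python - the build stays identical across environments, only the environment changes (the variable names\nare hypothetical):\n\n
```python\nimport os\n\n# Required setting: fail fast at startup if it is absent (raises KeyError).\nDATABASE_URL = os.environ['DATABASE_URL']\n\n# Optional setting with a default.\nCACHE_TTL = int(os.environ.get('CACHE_TTL', '60'))\n```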
\n\n## Chapter 8: Processes on Machines\n\n
Service - a collection of processes across machines that work together to deliver a unit of functionality.\n\n
Instance - an installation on a single machine out of a load-balanced array of the same executable.\n\n
Executable - an artefact that a machine can launch as a process, created by the build process.\n\n
Process - an operating system process running on a machine.\n\n
Installation - the executable and any attendant directories, configuration files and other resources.\n\n
Deployment - the act of creating an installation on a machine.\n\n
Developers should not do production builds from their own machines. Developer boxes are hopelessly polluted - we\ninstall all kinds of junk on these systems, play games and visit sketchy websites. Only make production builds on a CI\nserver, and have it put the binary into a safe repository that nobody else can write to.\n\n
Configuration management tools like Chef, Puppet and Ansible are all about applying changes to running machines. They\nuse scripts, playbooks or recipes to transition the machine from one state to a new state.\n\n
We don't want our instance binaries to change per environment, but we do want their properties to change. That means\nthe code should look outside the deployment directory to find per-environment configuration.\n\n
ZooKeeper and etcd are popular choices for a configuration service - but any outage to these systems can cause a lot\nof trouble.\n\n
Shipboard engineers can tell when something is about to go wrong by the sound of the giant diesel engines. We must\nfacilitate that awareness by building transparency into our systems. Transparency refers to the qualities that allow\noperators, developers and business sponsors to gain understanding of the system's historical trends, present\nconditions, instantaneous state and future projections. Debugging a transparent system is vastly easier, so\ntransparent systems will mature faster than opaque ones. A system without transparency cannot survive long in\nproduction.\n\n
Transparency arises from deliberate design and architecture. Instances should log their health and events to a plain\nold text file. Any log-scraper can collect these without disturbing the server process. Logging is certainly a\nwhite-box technology; it must be integrated pervasively into the source code.\n\n
Not every exception needs to be logged as an error. Just because a user entered a bad card number and the validation\ncomponent threw an exception doesn't mean anything has to be done about it. Log errors in business logic or user input\nas WARNINGs. Reserve ERROR for serious system problems.\n\n
Logs have to present clear, accurate and actionable information to the humans who read them.\n\n
Messages should include an identifier that can be used to trace the steps of a transaction.\n\n
Health checks should be more than just \"yup, it is running\" - they should report at least: IP, interpreter version,\napplication version, whether the instance is accepting work, and the status of connection pools, caches and circuit\nbreakers. The load balancer can use the health check for the \"go live\" transition too: when the health check on a new\ninstance goes from failing to passing, it means the app is done with its startup.
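\n\nA health check in that spirit, sketched with Flask (the fields and values are hypothetical - report whatever your\noperators actually need):\n\n
```python\nfrom flask import Flask, jsonify\n\napp = Flask(__name__)\n\n\n@app.route('/health')\ndef health():\n    # More than 'yup, it is running': version, readiness and pool status.\n    return jsonify({\n        'application_version': '1.4.2',\n        'accepting_work': True,\n        'connection_pool': {'in_use': 3, 'max': 20},\n        'circuit_breakers': {'payment-gateway': 'closed'},\n    })\n```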
\n\n## Chapter 9: Interconnect\n\nThe interconnect layer covers all the mechanisms that knit a bunch of instances together into a cohesive system: that\nincludes traffic management, load balancing and discovery. This is the layer where we can really create high\navailability.\n\n
Consul - a dynamic discovery service, suited to large teams with hundreds of small services. On the other hand, a\nsmall business with just a few developers would probably stick with direct DNS entries.\n\n
DNS might be the best choice for small teams, particularly in a slowly changing infrastructure. When using DNS, it is\nimportant to have a logical service name to call, rather than a physical hostname. Even if that logical name is just an\nalias to the underlying host, it is still preferable. DNS round-robin is an easy approach to load balancing but suffers\nfrom putting too much control in the client's hands. A DNS outage can be serious, so DNS should not be hosted on the\nsame infrastructure as the production system. There should be more than one DNS provider, with servers in different\nlocations.\n\n
Almost everything we build today uses horizontally scalable farms of instances that implement request/reply semantics.\nHorizontal scaling helps with overall capacity and resilience, but it introduces the need for load balancing. Load\nbalancing is all about distributing requests across a pool of instances to serve all requests correctly in the\nshortest feasible time.\n\n
Software load balancing - a low-cost approach; it uses an application to listen for requests and dole them out across\nthe pool of instances. This is basically a reverse proxy (a proxy multiplexes many outgoing calls into a single source\nIP address; a reverse proxy demultiplexes calls coming into a single IP address and fans them out to multiple\naddresses). Examples: squid, HAProxy, Apache httpd, nginx.\n\n
Hardware load balancing - specialised network devices that serve a similar role to the reverse proxy server. They\nprovide better capacity and throughput because they operate closer to the network.\n\n
One of the most important services a load balancer can provide is service health checks. The load balancer will not\nsend traffic to an instance that fails a certain number of health checks.\n\n
Load balancers can also attempt to direct repeated requests to the same instance. This helps when you have stateful\nservices, like user session state in an application server. Directing the same user's requests to the same instance\nwill provide better response times, because the necessary resources will already be in that instance's memory. A\ndownside of such sticky sessions is that they can prevent load from being distributed evenly.\n\n
Another useful way to employ a load balancer is \"content-based routing\". For example, search requests may go to one\nset of instances, while user-signup requests go somewhere else.\n\n
Demand control - when, where and how to refuse work under heavy demand.\n\n
> Every failing system starts with a queue backing up somewhere.\n\n
Going nonlinear - a service slowing down under heavy load; fewer and fewer sockets are available to receive requests\nexactly when the most requests are coming in.\n\n
Load shedding - under high load, turning away work the system can't complete in time; this is the most important way\nto control incoming demand. We want to shed load as early as possible, so we can avoid tying up resources at several\ntiers before rejecting the request. A service should measure its response time and present it in the health check.
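\n\nA load-shedding sketch in Python: bound the number of in-flight requests and fail fast with a 503 once the bound is\nhit (the limit and the request framing are hypothetical):\n\n
```python\nimport threading\n\nMAX_IN_FLIGHT = 50\nslots = threading.BoundedSemaphore(MAX_IN_FLIGHT)\n\n\ndef handle(request, do_work):\n    # Refuse work immediately instead of queueing it and going nonlinear.\n    if not slots.acquire(blocking=False):\n        return 503, 'shedding load - try again later'\n    try:\n        return 200, do_work(request)\n    finally:\n        slots.release()\n```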
\n\nService discovery - services can announce themselves to begin receiving load. A caller needs to know at least one IP\naddress to contact a particular service. Service discovery is itself another service: it can fail or get overloaded.\nService discovery can be built on top of a distributed data store such as ZooKeeper or etcd.\n\n
In CAP terms, ZooKeeper is a CP system - when there is a network partition, some nodes will not answer queries or\naccept writes. HashiCorp's Consul resembles ZooKeeper, however Consul's architecture places it in the AP area - it\nprefers to remain available and risk stale information when a partition occurs.\n\n
## Chapter 10: Control Plane\n\nThe control plane encompasses all the software and services that run in the background to make production load\nsuccessful. One way to think about it is this: if production user data passes through it, it is production software; if\nits main job is to manage other software, it is control plane.\n\n
Every part of the control plane is optional if you are willing to make trade-offs - for example, logging and monitoring\nhelp with postmortem analysis; without them, all of that will take longer or simply not be done.\n\n
Mechanical advantage is the multiplier on human effort that simple machines provide. With mechanical advantage, a\nperson can move something much heavier than themselves. It works for good or for ill. High leverage allows a person to\nmake large changes with little effort.\n\n
Every postmortem review has 3 important jobs to do: explain what happened, apologise, and commit to improvement.\n\n
Automation has no judgement. When it goes wrong, it tends to do so really, really quickly. By the time a human\nperceives the problem, it is a question of recovery rather than intervention. We should use automation for the things\nhumans are bad at: repetitive tasks and fast response. We should use humans for the things automation is bad at:\nperceiving the whole situation at a higher level.\n\n
The monitoring team should be responsible for providing monitoring tools - offering monitoring as a service to internal\ncustomers.\n\n
Log collectors can work in push mode (the instance pushes logs over the network; helpful with containers) or pull mode\n(the collector runs on a central machine and reaches out to all known hosts to remote-copy the logs). Getting all the\nlogs onto one host is a minor achievement; the real beauty comes from indexing the logs - then you can search for\npatterns, make trend-line graphs and raise alerts when bad things happen. This can be done using Elasticsearch,\nLogstash and Kibana.\n\n
Categories of metrics that can be useful:\n\n
- Traffic indicators - page requests, transaction count\n- Business transactions, for each type - number processed, number aborted, conversion rate\n- Users - demographics, number of users, usage patterns, errors encountered\n- Resource pool health - enabled state, total resources, number of resources created, number of blocked threads\n- Database connection health - number of SQLExceptions thrown, number of queries, average response time\n- Data consumption - number of rows present, footprint in memory and on disk\n- Integration point health - state of circuit breaker, number of timeouts, number of requests, average response time,\n  number of good responses, number of network and protocol errors, actual IP address\n- Cache health - items in cache, memory used by cache, cache hit rate, items flushed by garbage collector\n\n
Canary deployment - a small set of instances that get the new build first. For a period of time, the instances running\nthe new build coexist with instances running the old build. The purpose of the canary deployment is to reject a bad\nbuild before it reaches the users.
\n\nThe net result is that GUIs make terrible administrative interfaces for long-term production operation. The best\ninterface for long-term operation is the command line. Given a command line, operators can easily build a scaffolding\nof scripts, logging and automated actions to keep your software happy.\n\n
## Chapter 11: Security\n\nSecurity must be baked in; it is not a seasoning to sprinkle onto your system at the end. You are responsible for\nprotecting your consumers and your company.\n\n
OWASP Top 10 - catalogued application security incidents and vulnerabilities. The Top 10 list represents a consensus\nabout the most critical web application security flaws:\n\n
1. Injection - an attack on a parser or interpreter that relies on user-supplied input. The classic example is SQL\n   injection; it happens when code bashes strings together to make queries, even though every SQL library allows the\n   use of placeholders in query strings (see the sketch after this list). Keep in mind that \"*comes from a user*\"\n   doesn't only mean the input arrived just now in an HTTP request - data from a database may have originated from a\n   user as well. XML parsers are vulnerable too (XXE injection).\n\n
2. Broken Authentication and Session Management - at one time, it was common to use query parameters on URLs and\n   hyperlinks to carry session IDs; not only are those IDs visible to every switch, router and proxy server, they are\n   also visible to humans. Anyone who copies and pastes a link from their browser shares their session. Session\n   hijacking is especially dangerous when a session is stolen from an administrator. OWASP suggests the following\n   guidelines for handling session IDs:\n\n
    1. Use long session IDs with lots of entropy\n    2. Generate session IDs using a pseudorandom number generator with good cryptographic properties (`rand` is not a\n       good choice)\n    3. Protect against XSS to avoid script execution that would reveal the session ID\n    4. When a user authenticates, generate a fresh session ID\n    5. Keep up to date with security patches and versions; too many systems run outdated versions with known\n       vulnerabilities\n    6. Use cookies to exchange session IDs, and do not accept session IDs via other mechanisms\n\n
   *Authentication* means we verify the identity of the caller. Is the caller who he or she claims to be? Some dos and\n   don'ts:\n\n
    1. Don't keep passwords in your database\n    2. Never email a password to a user as part of a \"*forgotten password*\" process\n    3. Do apply a strong hash algorithm to passwords. Use a \"*salt*\", which is some random data added to the password\n       to make dictionary attacks harder\n    4. Do allow users to enter overly long passwords\n    5. Do allow users to paste passwords into GUIs\n    6. Do plan on rehashing passwords at some point in the future. We have to keep increasing the strength of our hash\n       algorithms. Make sure you can change the salt too\n    7. Don't allow attackers to make unlimited authentication attempts\n\n
3. Cross-Site Scripting - happens when a service renders a user's input directly into HTML without applying input\n   escaping; it is related to injection attacks. The bottom line: never trust input - scrub it on the way in and\n   escape it on the way out. Don't build structured data by smashing strings together.
\n\n4. Broken Access Control - refers to application problems that allow attackers to access data they shouldn't. One of\n   the common forms of broken access control is \"*direct object access*\"; this happens when a URL contains something\n   like a database ID as a query parameter. The solution is to reduce the value of URL probing and to check\n   authorisation to objects in the first place. Generate unique but non-sequential identifiers or use a generic URL\n   that is session-sensitive (`/users/123` -> `/users/me`). Rule of thumb: *If a caller is not authorised to see the\n   contents of a resource, it should be as if the resource doesn't even exist* (`404` instead of `403`). When a request\n   involves a file upload, the caller can overwrite any file the service is allowed to modify. The only safe way to\n   handle file uploads is to treat the client's filename as an arbitrary string to store in a database field. Don't\n   build a path from the filename in the request.\n\n5. Security Misconfiguration - default passwords are a serious problem. Security misconfiguration usually takes the form\n   of omission. Servers enable unneeded features by default. Admin consoles are a common source of problems. Another\n   common security misconfiguration relates to servers listening too broadly. You can improve information security right\n   away by splitting internal traffic onto its own NIC separate from public-facing traffic. Make sure every\n   administrator uses a personal account, not a group account. Go ahead and add some logging to those administrative and\n   internal calls.\n\n6. Sensitive Data Exposure - credit cards, medical records, insurance files, purchasing data, emails - all these are\n   valuable things people can steal from you or use against you. Hackers don't attack your strong points, they look for\n   cracks in your shell. It can be as simple as an employee's stolen laptop with a database extract in a spreadsheet.\n   Some guidelines:\n\n    1. Don't store sensitive information that you don't need\n    2. Use HTTP Strict Transport Security - it prevents clients from negotiating their way to insecure protocols\n    3. Stop using SHA-1\n    4. Never store passwords in plain text\n    5. Make sure sensitive data is encrypted in the database\n    6. Decrypt data based on the user's authorisation, not the server's\n\n   Consider using AWS Key Management Service. Applications can request data encryption keys, which they use to encrypt\n   or decrypt data. HashiCorp Vault - alternative to AWS KMS.\n\n7. Insufficient Attack Protection - always assume that attackers have unlimited access to other machines behind the\n   firewall. Services do not typically track illegitimate requests by their origin. They do not block callers that issue\n   too many bad requests. That allows an attacking program to keep making calls. API Gateways are a useful defence here.\n   An API Gateway can block callers by their API key. It can also throttle their request rate. Normally this helps\n   preserve capacity. In the case of an attack, it slows the rate of data compromise, thereby limiting the damage.\n\n8. Cross-Site Request Forgery - used to be a bigger issue than it is now. A CSRF attack starts on another website: an\n   attacker uses a web page with JS, CSS or HTML that includes a link to your system. When the hapless user's browser\n   accesses your system, your system thinks it is a valid request from that user. Make sure that requests with side\n   effects (password change, mailing address update, purchases) use anti-CSRF tokens. These are extra fields containing\n   random data that your system emits when rendering a form. Most frameworks today do this for you. You can also tighten\n   up your cookie policy with the \"*SameSite*\" property. The SameSite attribute causes the browser to send the cookie\n   only if the document's origin is the same as the target's origin. SameSite cookies may require a change in your\n   session management approach.
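\n\n   A hand-rolled sketch of what those frameworks do for you, using only the standard library (the session handling is\n   simplified for illustration):\n\n   ```python\n   import hmac\n   import secrets\n\n   def issue_csrf_token(session: dict) -> str:\n       # Random token stored in the session and embedded as a hidden form field.\n       token = secrets.token_urlsafe(32)\n       session[\"csrf_token\"] = token\n       return token\n\n   def verify_csrf_token(session: dict, submitted: str) -> bool:\n       # On submit, the form token must match the session token.\n       expected = session.get(\"csrf_token\", \"\")\n       return hmac.compare_digest(expected, submitted)\n   ```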
\n\n9. Using Components with Known Vulnerabilities - most successful attacks are not the exciting \"*zero day, rush to patch\n   before they get it*\" kind. Most attacks are mundane. It is important to keep applications up-to-date.\n\n10. Underprotected APIs - it is essential to make sure that APIs are not misused. APIs must ensure that malicious\n    requests cannot access data the original user would not be able to see. APIs should use the most secure means\n    available to communicate. Make sure the parser is hardened against malicious input. Fuzz-testing APIs is especially\n    important.\n\nThe principle of Least Privilege - a process should have the lowest level of privilege needed to accomplish the task.\nAnything application services need to do, they should do as nonadministrative users. Containers provide a nice degree of\nisolation from each other. Instead of creating multiple application-specific users on the host operating system, you can\npackage each application into its own container.\n\nConfigured Passwords - at the absolute minimum, passwords to production databases should be kept separate from any other\nconfiguration files. Password vaulting keeps passwords in encrypted files, which reduces the security problem. AWS Key\nManagement Service is useful here. With KMS, applications use API calls to acquire decryption keys. That way the\nencrypted data doesn't sit in the same storage as the decryption keys.\n\nFrameworks can't protect you from the Top 10, and neither can a one-time review by your company's application security\nteam. Security is an ongoing activity. It must be part of the system's architecture. You must have a process to discover\nattacks.\n\n## Chapter 12: Case Study: Waiting for Godot\n\n## Chapter 13: Design for Deployment\n\nHow to design applications for easy rollout - packaging, integration point versioning and database schema.\n\nOnce upon a time, we wrote our software, zipped it up and threw it over the wall to operations, so they could deploy\nit. Operations would schedule some *planned* downtime to execute the release. HOWEVER, users should not care about\ndowntime; the application should be updated without them knowing about the release.\n\nMost of the time, we design for the state of the system after a release. It assumes the whole system can be changed in\nsome instantaneous quantum jump. We have to treat deployment as a feature. Three key concerns: automation, orchestration\nand zero-downtime deployment.\n\nAUTOMATED DEPLOYMENTS. The build pipeline is the first tool of interest. It picks up after someone commits a change to\nVCS. Build pipelines are often implemented with CI servers. A CI server would stop after publishing a test report and an\narchive; a build pipeline goes beyond that - it runs a series of steps that culminate in a production deployment (deploy\ncode to a trial environment, run migrations, perform integration tests). Each stage of the build pipeline is looking for\nreasons to reject the build - failed tests, lint complaints, integration failures.
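\n\nThat reject-early behaviour is just a chain of checks. A sketch - the stages and their bodies are placeholders, not any\nparticular CI product's API:\n\n```python\nfrom typing import Callable\n\n# Each stage returns True to pass the build along, False to reject it.\nStage = Callable[[str], bool]\n\ndef unit_tests(build_id: str) -> bool:\n    return True  # placeholder: run the test suite\n\ndef lint(build_id: str) -> bool:\n    return True  # placeholder: run the linters\n\ndef deploy_to_trial_env(build_id: str) -> bool:\n    return True  # placeholder: deploy, run migrations\n\ndef integration_tests(build_id: str) -> bool:\n    return True  # placeholder: exercise the trial environment\n\ndef run_pipeline(build_id: str, stages: list[Stage]) -> bool:\n    for stage in stages:\n        if not stage(build_id):\n            print(f\"build {build_id} rejected by {stage.__name__}\")\n            return False\n    print(f\"build {build_id} promoted towards production\")\n    return True\n\nrun_pipeline(\"1234\", [unit_tests, lint, deploy_to_trial_env, integration_tests])\n```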
\n\nTools: Jenkins, GoCD, Netflix Spinnaker, AWS Code Pipeline. Do not look for the best tools, pick one that suffices and\nget good with it. Avoid the analysis trap.\n\nAt the end of the build pipeline, the build server interacts with one of the configuration management tools.\n\nBetween the time a developer commits code to the repository and the time it runs in production, code is a pure\nliability. It may have unknown bugs, may break scaling or cause production downtime. It might be a great implementation\nof a feature nobody wants. The idea of continuous deployment is to reduce that delay as much as possible to reduce the\nliability of undeployed code.\n\nA bigger deployment with more change is definitely riskier. \"*If it hurts, do it more often*\" - do everything\ncontinuously; for the build pipeline it means - run the full build on every commit.\n\nShim - a thin piece of wood that fills a gap where two structures meet. In deployments, a shim is a bit of code that\nhelps join old and new versions of the application. For example, when migrating a database, old instances will read from\nthe old table, and new instances will read from the new table. Shims can be achieved using SQL triggers - an insert to\none table is propagated to the other.\n\n[MUTABLE INFRASTRUCTURE] We typically update machines in batches. Divide your machines into\nequal-sized groups. Suppose we have five groups: Alpha, Bravo, Charlie, Delta, Foxtrot. Rollout would go like this\n(sketched in code at the end of this chapter):\n\n1. Instruct Alpha to stop accepting new requests\n2. Wait for load to drain from Alpha\n3. Run the configuration management tool to update code and config\n4. Wait for green health checks on all machines in Alpha\n5. Instruct Alpha to start accepting requests\n6. Repeat the process for Bravo, Charlie, Delta, Foxtrot\n\nThe first group should be the canary group. Pause there to evaluate the build before moving on to the next group. Use\ntraffic shaping at your load balancer to gradually ramp up the traffic to the canary group while watching your\nmonitoring and metrics for anomalies.\n\nEvery application should include an end-to-end health check.\n\n[IMMUTABLE INFRASTRUCTURE] To roll code out here, we don't change the old machines. Instead, we spin up new machines on\nthe new version of the code. Machines can be started in the existing cluster or in a new cluster. With frequent\ndeployments, you are better off starting new machines in the existing cluster; that avoids interrupting open connections\nwhen switching between clusters. Be careful about caches and sessions.\n\nRemember about the post-rollout cleanup - drop old tables, views, columns, aliases, ...\n\nDEPLOY LIKE THE PROS - Currently, deployments are frequent and should be seamless. The boundary between operations and\ndevelopment has become fractal. Designing for deployment gives the ability to make large changes in small steps. This\nall rests on a foundation of automated action and quality checking. The build pipeline should be able to apply all the\naccumulated wisdom of your architects, developers, designers, testers and DBAs.\n\nSoftware should be designed to be deployed easily. Zero downtime is the objective. Smaller, easier deployments mean you\ncan make big changes over a series of small steps. That reduces disruption to your users.
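\n\nThe batched rollout above, as a sketch; `drain`, `update`, `healthy` and `enable` stand in for load-balancer and\nconfiguration-management calls that would be real in your environment:\n\n```python\nimport time\n\nGROUPS = [\"Alpha\", \"Bravo\", \"Charlie\", \"Delta\", \"Foxtrot\"]\n\ndef drain(group: str) -> None: ...    # tell the LB to stop sending new requests\ndef update(group: str) -> None: ...   # run the configuration management tool\ndef enable(group: str) -> None: ...   # tell the LB to send traffic again\n\ndef healthy(group: str) -> bool:\n    return True  # placeholder: poll health checks on every machine in the group\n\ndef rollout(version: str) -> None:\n    for i, group in enumerate(GROUPS):\n        drain(group)\n        update(group)\n        while not healthy(group):\n            time.sleep(5)  # wait for green health checks\n        enable(group)\n        if i == 0:\n            # The first group is the canary: evaluate before moving on.\n            input(f\"{group} is the canary for {version} - press Enter to continue\")\n```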
\n\n## Chapter 14: Handling Versions\n\nIt is better for everyone if we do some extra work on our end to maintain compatibility rather than pushing migration\ncosts out onto other teams. How can your software be a good citizen?\n\nEach consuming application has its own development team that operates on its own schedule. If you want others to respect\nyour autonomy, then you must respect theirs. That means you can't force consumers to match your release schedule. Trying\nto coordinate consumer and provider deployments doesn't scale.\n\nTCP specification (Postel's Robustness Principle):\n\n> Be conservative in what you do, be liberal in what you accept from others.\n\nConsumer and provider must share a number of agreements in order to communicate: connection handshaking and duration,\nrequest framing, content encoding, message syntax, message semantics, authorisation and authentication.\n\nPostel's Robustness Principle can be seen as a form of the Liskov Substitution Principle: we can always accept more than\nwe accepted before, but we cannot accept less or require more. We can return more than we returned before, but we cannot\nreturn less.\n\nHandling breaking changes - the best approach is to add a version discriminator to the URL. This is the most common\napproach. You have to support both the old and the new versions for some period of time. Both versions should operate\nside by side. This allows consumers to upgrade as they are able. Internally you want to avoid duplication. Handle this\nin the controller: methods that handle the new API go directly to the most current version of the business logic,\nmethods that handle the old API get updated, so they convert old objects to the current ones on requests and convert new\nobjects to old ones on responses.\n\nWhen receiving requests or messages, your application has no control over the format. The same goes for calling out to\nother services. The other endpoint can start rejecting your requests at any time. After all, they may not observe the\nsame safety rules we just described. Always be defensive.\n\n## Chapter 15: Case Study: Trampled by Your Own Customers\n\nConway's Law:\n\n> If you have four teams working on a compiler, you will get a four-pass compiler.\n\nConway argues that two people must - in some fashion - communicate about the specification for an interface. If the\ncommunication does not occur, the interface cannot be built.\n\nSometimes when you ask questions but don't get answers, it means nobody knows the answers. At other times, it means\nnobody wants to be seen answering the questions.\n\nLoad testing is about: defining a test plan, creating some scripts, configuring the load generators and test\ndispatchers.\n\nTests are often prepared wrongly; the real world is crude and rude - there are scrapers not respecting your cookie\npolicy, search engines indexing your website, users doing weird stuff.
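\n\nThe scripts themselves can be tiny. A minimal sketch with the `locust` load-testing library - the endpoints and weights\nare made up for illustration:\n\n```python\nfrom locust import HttpUser, between, task\n\nclass Visitor(HttpUser):\n    # Simulated users pause 1-5 seconds between actions.\n    wait_time = between(1, 5)\n\n    @task(3)  # browsing is three times more common than searching\n    def browse(self) -> None:\n        self.client.get(\"/products\")\n\n    @task(1)\n    def search(self) -> None:\n        self.client.get(\"/search\", params={\"q\": \"shoes\"})\n```\n\nRun with something like `locust -f locustfile.py --host https://staging.example.com` and ramp the user count up while\nwatching your metrics.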
\n\nMost websites have terms and conditions stating \"*By viewing this page you agree to ...*\"; with this you can sue or at\nleast block sources of bots hitting your website millions of times.\n\n## Chapter 16: Adaptation\n\nTo make a change, your company has to go through a decision cycle - plan -> do -> check -> act. In small companies this\ncommunication may involve just one or two people, in larger companies an entire committee. Getting around the cycle\nfaster makes you more competitive. This drives the \"*fail fast*\" motto for startups.\n\nAgile and lean development methods helped remove delay from \"act\"; DevOps helps remove even more from \"act\" and offers\ntons of new tools to help with \"observe\".\n\nThrashing - happens when an organisation changes direction without taking the time to receive, process and incorporate\nfeedback. You may recognise it as constantly shifting development priorities or an unending series of crises. It creates\nteam confusion, unfinished work and lost productivity. To avoid thrashing, try to create a steady cadence of delivery\nand feedback.\n\nThe platform team should not implement all your specific monitoring rules; instead, this team should provide an API that\nlets you install your monitoring rules into the monitoring service provided by the platform.\n\n> If your developers only use the platform because it is mandatory, then the platform is not good enough\n\nThe Fallacy of the DevOps Team - in larger companies, it is common to find a group called the DevOps team. This team\nsits between development and operations with the goal of moving faster and automating releases into production. *This is\nan antipattern*. DevOps should soften the interface between different teams. DevOps goes deeper than deployment\nautomation. It is a shift from ticket- and blame-driven operations with throw-it-over-the-wall releases TO one based on\nopen sharing of information and skills, data-driven decision-making about architecture and design, production\navailability and responsiveness. Isolating these ideas to a single team undermines the whole point.\n\nFrequent releases with incremental functionality allow your company to outpace its competitors.\n\nBlue/green deployment - machines are divided into pools. One pool is active in production. The other pool gets the new\ndeployment. That leaves time to test it before exposing it to customers. Once the new pool looks good, you shift\nproduction traffic over to it.\n\nMore code means it is harder to change. Large codebases are more likely to become overgeneralised. A shared database\nmeans every change has a higher potential to disrupt. The big service will accumulate complexity faster than the sum of\ntwo smaller services. It is easier to maintain and prune a bonsai juniper than a hundred-foot oak.\n\nThe key to making evolutionary architecture work is failure. You have to try different approaches to similar problems\nand kill the ones that are less successful.\n\nJeff Bezos said that every team should be sized no bigger than you can feed with 2 large pizzas. Important but\nmisleading. It is not just about having fewer people on a team. A self-sufficient two-pizza team also means each team\nmember has to cover more than one discipline. You can't have a two-pizza team if you need a dedicated DBA, a frontend\ndeveloper, an infra guru, a backend developer, an ML expert, a product manager, a GUI designer, and so on. The two-pizza\nteam is about reducing external dependencies. A thousand dependencies will keep you from breaking free. It is really\nabout having a small group that can be self-sufficient and push things all the way through to production.\n\nNo coordinated deployments - if you ever find that you need to update both the provider and the caller of a service\ninterface at the same time, it is a warning sign that those services are strongly coupled.\n\nEvolutionary architecture is one that supports incremental, guided change as a first principle across multiple\ndimensions.
\n\nArchitecture styles:\n\n- Microservices - very small, disposable units of code. Emphasise scalability and team-scale autonomy. Vulnerable to\n  coupling with the platform for monitoring, tracing and continuous delivery\n- Microkernel and plugins - in-process, in-memory message passing core with formal interfaces to extensions. Good for\n  incremental change in requirements, combining work from different teams. Vulnerable to language and runtime\n  environment.\n- Event-based - prefers asynchronous messages for communication, avoiding direct calls. Good for temporal decoupling;\n  allows new subscribers without changes to publishers. Allows logic changes and reconstruction from history. Vulnerable\n  to semantic change in message formats over time.\n\nMicroservice size: ideally it should be no bigger than what fits in one developer's head.\n\nDon't pursue microservices just because the Silicon Valley unicorns are doing it. Make sure they address a real problem\nyou are likely to suffer. Otherwise, the operational overhead and debugging difficulty of microservices will outweigh\nyour benefits.\n\nSystems should exhibit loose clustering. In a loose cluster, the loss of an individual instance is no more significant\nthan the fall of a single tree in a forest. The members of a cluster should not be configured to know the identities of\nother members of the cluster.\n\nModular systems inherently have more options than monolithic ones. 5 modular operators - borrowed from hardware:\n\n1. Splitting - breaking things into modules, or a module into submodules. The key with splitting is that the interface\n   to the original module is unchanged. Before splitting, it handles the whole thing itself. Afterward, it delegates\n   work to the new modules but supports the same interface.\n2. Substituting - is just replacing one module with another (like swapping an Nvidia card for an AMD one). The original\n   module and the substitute need to share a common interface.\n3. Augmenting and Excluding - augmenting is adding a module to a system. Excluding is removing one. If you design your\n   parent system to make augmenting and excluding into first-class citizens, then you will reach a different design.\n4. Inversion - works by taking functionality that is distributed in several modules and raising it up higher in the\n   system.\n5. Porting - is about repurposing a module from a different system. Any time we use a service created by a different\n   project or system, we are porting that service to our system. Porting risks adding coupling.\n\nInformation architecture is how we structure data. It is the data and the metadata we use to describe the things that\nmatter to our systems. It is a set of related models that capture some facets of reality. Your job in building systems\nis to decide what facets of reality matter to your system, how you are going to represent them and how that\nrepresentation can survive over time.\n\nEvents can be used for:\n\n- Notifications - fire and forget, one-way announcement, no response is expected\n- Event-carried state transfer - an event that replicates entities or parts of entities so other systems can do their\n  work\n- Event sourcing - when all changes are recorded as events that describe the change\n- Command-query responsibility segregation - reading and writing with different structures. Not the same as events, but\n  events are often found on the \"command\" side.
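\n\nA toy illustration of event sourcing, with events kept as plain data (dicts rather than serialised objects, in the\nspirit of the advice below); the event shapes are made up:\n\n```python\n# The event log is the source of truth; state is derived by replaying it.\nevents = [\n    {\"type\": \"AccountOpened\", \"owner\": \"Alice\"},\n    {\"type\": \"MoneyDeposited\", \"amount\": 100},\n    {\"type\": \"MoneyWithdrawn\", \"amount\": 30},\n]\n\ndef apply(state: dict, event: dict) -> dict:\n    if event[\"type\"] == \"AccountOpened\":\n        return {\"owner\": event[\"owner\"], \"balance\": 0}\n    if event[\"type\"] == \"MoneyDeposited\":\n        return {**state, \"balance\": state[\"balance\"] + event[\"amount\"]}\n    if event[\"type\"] == \"MoneyWithdrawn\":\n        return {**state, \"balance\": state[\"balance\"] - event[\"amount\"]}\n    return state  # unknown event types are ignored, which helps with old formats\n\nstate: dict = {}\nfor event in events:\n    state = apply(state, event)\n\nprint(state)  # {'owner': 'Alice', 'balance': 70}\n```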
\n\nVersioning can be a real challenge with events, especially once you have years' worth of them. Stay away from closed\nformats like serialised objects. Look toward open formats like JSON or self-describing messages. Avoid frameworks that\nrequire code generation based on schema. Treat messages like data instead of objects, and you are going to have a better\ntime supporting very old formats.\n\nExtract a \"*policy proxy*\" - questions of ownership and access control can be factored out of the service itself into a\nmore centrally controlled location.\n\nUse URL dualism to support many databases by using URLs as both the item identifier and a resolvable resource. Be\ncareful: you should be able to verify that whatever you receive back is something you generated.\n\nOne of the basic enterprise architecture patterns is the \"Single System of Record\". The idea is that any particular\nconcept should originate in exactly one system, and that system will be the enterprise-wide authority on entities within\nthat concept.\n\nWe need to be careful about exposing internal concepts to other systems. It creates semantic and operational coupling\nthat hinders future change.\n\n## Chapter 17: Chaos Engineering\n\nChaos engineering - the discipline of experimenting on a distributed system in order to build confidence in the system's\ncapability to withstand turbulent conditions in production. Staging or QA environments aren't much of a guide to the\nlarge-scale behaviour of systems in production.\n\nCongested networks behave in a qualitatively different way than uncongested ones. Systems that work in a low-latency,\nlow-loss network may break badly in a congested network. Related paradox - *Volkswagen microbus* - you learn how to fix\nthe things that often break. You don't learn how to fix the things that rarely break. But that means when they do break,\nthe situation is likely to be more dire. We want a continuous low level of breakage to make sure our system can handle\nthe big things.\n\nWe use chaos engineering the way a weightlifter uses iron: to create tolerable levels of stress and breakage to increase\nthe strength of the system over time.\n\nAt Netflix, chaos is an opt-out process. That means every service in production will be subject to Chaos Monkey. Other\ncompanies adopting chaos engineering have chosen an opt-in approach. When you are adding chaos engineering to an\norganisation, consider starting with opt-in.\n\nYou must be able to break the system without breaking the bank. If that is not the case, chaos engineering is not for\nyou.\n\n> If you have a wall full of green dashboards, that means your monitoring tools aren't good enough. There is always\n> something weird going on.\n\nMake sure you have a recovery plan. The system may not automatically return to a healthy state when you turn off the\nchaos. You need to know what to restart, disconnect or clean up.\n\nChaos Monkey does one kind of injection - it kills instances (randomly). There are different types of monkeys: Latency\nMonkey, Janitor Monkey, Chaos Kong, ...\n\nKilling instances is the most basic and crude kind of injection. It will absolutely find weaknesses in your system.\n\nNetflix uses failure injection testing (FIT). FIT can tag a request at the inbound edge with a cookie that says, e.g.\n\"*Down the line, this request is going to fail when service G calls service H*\". Netflix uses a common framework for all\nits outbound service calls, so it has a way to propagate this cookie and treat it uniformly.
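\n\nA sketch of that idea (not Netflix's actual FIT implementation): a wrapper around outbound calls inspects a propagated\nheader and fails the designated hop on purpose:\n\n```python\nimport random\n\nclass InjectedFailure(Exception):\n    pass\n\ndef call_service(me: str, target: str, headers: dict) -> str:\n    # A FIT-style header names the hop that should fail, e.g. \"G->H\".\n    if headers.get(\"x-fail-on\") == f\"{me}->{target}\":\n        raise InjectedFailure(f\"injected failure on {me}->{target}\")\n    return f\"response from {target}\"  # placeholder for the real outbound call\n\n# Tag a small fraction of requests at the inbound edge.\nheaders = {\"x-fail-on\": \"G->H\"} if random.random() < 0.01 else {}\ntry:\n    call_service(\"G\", \"H\", headers)\nexcept InjectedFailure as exc:\n    print(f\"chaos experiment surfaced: {exc}\")\n```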
\n\nHigh-reliability organisations use drills and simulations to find the same kind of systematic weaknesses on their human\nside as on the software side. You can make this more fun by calling it a \"*zombie apocalypse simulation*\". Randomly\nselect 50% of your people and tell them they are zombies for the rest of the day.\n\nAfter the simulation, review the issues.\n"
  },
  {
    "path": "books/system-design-interview.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# System Design Interview\n\nBook by Alex Xu & Sahn Lam\n\n- [1. Proximity Service](#1-proximity-service)\n\n## 1. Proximity Service\n"
  },
  {
    "path": "books/tidy-first.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Tidy First?\n\nBook by Kent Beck\n\n- [1. Guard Classes](#1-guard-classes)\n- [2. Dead code](#2-dead-code)\n- [3. Normalize symmetries](#3-normalize-symmetries)\n- [4. New Interface, Old implementation](#4-new-interface-old-implementation)\n- [5. Reading Order](#5-reading-order)\n- [6. Cohesion Order](#6-cohesion-order)\n- [7. Move Declaration and Initialization Together](#7-move-declaration-and-initialization-together)\n- [8. Explaining variables](#8-explaining-variables)\n- [9. Explaining constants](#9-explaining-constants)\n- [10. Explicit parameters](#10-explicit-parameters)\n- [11. Chunk statements](#11-chunk-statements)\n- [12. Extract helper](#12-extract-helper)\n- [13. One pile](#13-one-pile)\n- [14. Explaining comments](#14-explaining-comments)\n- [15. Delete redundant comments](#15-delete-redundant-comments)\n- [16. Separate Tidying](#16-separate-tidying)\n- [17. Chaining](#17-chaining)\n- [17. Chaining](#18-batch-sizes)\n- [18. Batch Sizes](#18-batch-sizes)\n- [19. Rhythm](#19-rhythm)\n- [20. Getting Untangled](#20-getting-untangled)\n- [21. First, After, Later, Never](#21-first-after-later-never)\n- [22. Beneficially Relating Elements](#22-beneficially-relating-elements)\n- [23. Structure and behavior](#23-structure-and-behavior)\n\n## 1. Guard Classes\n\nIf you see code like:\n\n```\nif condition: ...\n```\n\nor\n\n```\nif condition:\n    if another condition: ...\n```\n\ntidy the above to:\n\n```\nif not condition: return\nif not another condition: return\n...\n```\n\nExit immediately, it is easier to read -- before we get into the details, there are some preconditions we need to bear\nin mind.\n\nhttps://github.com/Bogdanp/dramatiq/pull/470\n\n## 2. Dead code\n\nDelete it. If you need it later, use version control. Delete only a little code in each tidying diff. Just in case, if\nit turns out that you were wrong, it will be easy to rever the change.\n\n## 3. Normalize symmetries\n\nTidy forms of unnecessary variations. Use common style for your functions. Things get confusing when two or more\npatterns are used interchangeably.\n\n## 4. New Interface, Old implementation\n\nIf some interface you need to use is very difficult to use, implement the interface you wish you could call and call it.\nImplement the interface by simply calling the old one.\n\n## 5. Reading Order\n\nReorder the code in the file in the order in which a reader would prefer to encounter it.\n\n## 6. Cohesion Order\n\nIf 2 functions are coupled, put them next to each other, if 2 files are coupled, put them in the same directory, ...\n\nIf you know how to eliminate coupling, go for it.\n\n## 7. Move Declaration and Initialization Together\n\nIt is easier to understand the code if each of the variables is declared and initialized just before it's used. It is\nhard to read when declaration is separated from initialization.\n\n## 8. Explaining variables\n\nWhen you understand a part of a big, hairy expression, extract the subexpression into a variable named after the\nintention of the expression.\n\nAlways separate the tidying commit from the behaviour change commit.\n\n## 9. Explaining constants\n\nCreate a symbolic constant. Replace uses of the literal constant with the symbol.\n\n## 10. Explicit parameters\n\nIt's common to see blocks of parameters passed in a map. This makes it hard to read and understand what data it\nrequired. Make the parameters explicit:\n\n```\nfoo(params) -> foo(a, b)\n```\n\n## 11. 
\n\n## 11. Chunk statements\n\nThe simplest tidying. Put a blank line between 2 parts doing different things. After you've chunked statements, you have\nmany paths forward: Explaining Variables, Extract Helper or Explaining Comments.\n\n## 12. Extract helper\n\nA block of code that has an obvious purpose and limited interaction with the rest of the code can be extracted into a\nhelper function. Using the helper can be taken care of in another tidying.\n\n## 13. One pile\n\nSometimes you read code that has been split into many tiny pieces, which makes it hard to understand. The biggest\ncost of code is the cost of reading it, not the cost of writing it.\n\nSometimes in order to regain clarity, the code must be merged together, so new, easier-to-understand parts can be\nextracted.\n\n## 14. Explaining comments\n\nWrite down only what wasn't obvious from the code. Put yourself in the place of the future reader, or yourself 15\nminutes ago.\n\nImmediately upon finding a defect is a good time to comment. It is much better to add the comment that points out the\nissue, rather than leaving it buried in the sand.\n\n## 15. Delete redundant comments\n\nWhen you see a comment that says exactly what the code says, remove it.\n\n## 16. Separate Tidying\n\nTidyings should go into their own separate PRs, with as few tidyings per PR as possible. Behavior and structure changes\nshould be in separate PRs.\n\n## 17. Chaining\n\nOne tidying can set up another. You will begin to flow tidyings together to achieve larger changes to the\nstructure of your code. Be wary of changing too much, too fast. A failed tidying is expensive relative to the cost of a\nseries of successful tidyings.\n\n## 18. Batch Sizes\n\nThe more tidyings per batch, the longer the delay before integrating, and the greater the chance that a tidying collides\nwith what someone else is doing.\n\nThe chance of a batch accidentally changing behavior rises with the number of tidyings in the batch.\n\nThe more tidyings per batch, the more we are prone to tidying just because, with all the additional costs that creates.\n\nIn many orgs, the fixed cost of getting a single change through review and deployment is substantial. Programmers feel\nthis cost, so they move right in the trade-off space (despite collisions, interactions, ...).\n\n## 19. Rhythm\n\nMore than an hour of tidying at a time before making a behavioral change likely means you've lost track of the minimum\nset of structure changes needed to enable your desired behavior change.\n\nTidying is a minutes-to-an-hour kind of activity. Sometimes it may take longer, but not for long.\n\n## 20. Getting Untangled\n\nTidying leads to more and more tidying. What to do? 3 options:\n\n1. Ship as it is [very impolite, prone to errors, but quick]\n2. Untangle the tidyings into separate PRs [more polite, but may require a lot of work]\n3. Start over, tidying first [more work, but leaves a coherent chain of commits]\n\nRe-implementation raises the possibility that you will see something new as you re-implement, letting you squeeze more\nvalue out of the same set of behavioral changes.
\n\n## 21. First, After, Later, Never\n\n**Never**\n\n- you are never changing this code again\n- there is nothing to learn by improving the design\n\n**Later**\n\n- you have a big batch of tidying to do without immediate payoff\n- there is eventual payoff for completing the tidying\n- you can tidy in little batches\n\n**After**\n\n- waiting until next time to tidy first will be more expensive\n- you won't feel a sense of completion if you don't tidy after\n\n**First**\n\n- it will pay off immediately, either in improved comprehension or in cheaper behavior changes\n- you know what to tidy and how\n\n## 22. Beneficially Relating Elements\n\nSoftware design is beneficially relating elements.\n\nElements: Tokens -> Expressions -> Statements -> Functions -> Objects/modules -> Systems. Elements have boundaries.\n\nRelating: In software design we have a handful of relations like:\n\n- invokes\n- publishes\n- listens\n- refers\n\nBeneficially relating elements. Software designers can only:\n\n- Create and delete elements\n- Create and delete relationships\n- Increase the benefit of a relationship\n\n```\ncaller()\n    return box.width() + box.height()\n```\n\nThis function has 2 relationships with the box. These relationships can be adjusted: we can have `box.area()`.\n\n```\ncaller()\n    return box.area()\n```\n\nThe benefit is that it is simpler; the cost is that `box` has an additional method.\n\n## 23. Structure and behavior\n\nSoftware creates value in two ways:\n\n- what it does today\n- the possibility of new things we can make it do tomorrow\n\nBehavior creates value. Rather than having to calculate a bunch of numbers by hand, the computer can calculate millions\nof them every second. If running the software costs $1 and you can charge folks $10 to run it on their behalf, then you\nhave a business.\n\nThe structure creates options. The structure could make it easy to add new features to our system, or it could make it\nhard.\n\n## 24. Economics: Time Value and Optionality\n\n- A dollar today is worth more than a dollar tomorrow, so earn sooner and spend later\n    - you can't spend it yet\n    - you can't invest it\n    - there's some chance that you won't get the dollar\n    - in the scope of this book: the time value of money encourages tidy after over tidy first\n- In a chaotic situation, options are better than things, so create options in the face of uncertainty\n\nSoftware design has to reconcile the imperatives of \"earn sooner/spend later\" and \"create options, not things\".\n"
  },
  {
    "path": "books/understanding-distributed-systems.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Understanding Distributed Systems: What every developer should know about large distributed applications \n\nBook by Roberto Vitillo\n"
  },
  {
    "path": "case-studies/reddit.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# How Reddit mastered managing growth\n\n*Presentation by Greg Taylor*\n\n330M monthly active users. 8th most popular website in the World. 12M posts per month. 2B votes per month.\n\nReddit in 2016 - small engineering team with a monolith application. The Infrastructure team was responsible for\nprovisioning and configuring all infrastructure, operating most of the systems and handling non-trivial debugging.\nStatic infrastructure. This approach worked for more than a decade.\n\nIn 2016 team started rapidly growing. But monolith application was so fragile, every deploy was an adventure - blocker\nfor the organisation.\n\nHow to make everyone's life easier? How to onboard new employees?\n\nReddit decided to pursue with SOA - Service-Oriented-Architecture. This gave better separation of concerns between\nteams. However, if you have a monolith, and it works well for you: \"go home, give it a hug, tell it you love it, warts\nand all\".\n\nGrowing pains: Automated tests - they started using CI, master branch always had to be green.\n\nGrowing pains: Something to build on - instead of copying and pasting services out from another they needed to have a\nservice framework to base off of. Services are configured in the same way, they expose similar set of ports, they have\nthe same async event loop, they fetch secrets the same way, ... - baseplate.readthedocs.io\n\nGrowing pains: Artisanal infrastructure - they had hand-crated infrastructure, switched to Terraform (infrastructure as\ncode) - reusable modules - really valuable. Pulling existing infrastructure to Terraform was painful.\n\nGrowing pains: Staging/integration woes - their approach for staging was inappropriate for SOA, so they started using\nKubernetes.\n\nGrowing pains: Infra team as a bottleneck - everything was depending on the infrastructure team, so they gave developers\nmore freedom to modify Terraform. Not all teams want to operate the full stack for their service.\n\nService ownership, service owner is empowered to:\n\n- Dev and test their service in a prod-like env\n- Do most of the work to get to production\n- Own the health of their service\n- Diagnose issues\n\nService ownership comes with some challenges: you need to train developers and still there are mistakes going to happen.\nMistakes are learning opportunities.\n\nHow to build infrastructure as product? Service owners - learn some Kubernetes basics, deploy and operate their own\nservices. Reddit Infrastructure - Keep the Kubernetes cluster running, provision AWS resources, support and advise\nService owners.\n\nEngineers instead of learning entire stack, had to learn only one technology - Kubernetes. If developer needs e.g. S3 -\ninfra engineer is responsible for providing this.\n\nBatteries included - engineers do not have to worry about logging, secrets, security, ... - everything is out of the\nbox.\n\nExtensive documentation and training for developers. Without it, you don't have a product, you have a pile of\ntechnology.\n\n> An engineer should not require deep infra experience in order to be productive.\n\nPreventing damage: resource limits, throttling, network policy, access controls, scanning for common mistakes, docker\nimage policies\n"
  },
  {
    "path": "conferences/aws-innovate-ai-ml-21.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# AWS Innovate: AI/ML Edition 2021\n\n- [Move and scale your ML experiments in the cloud](#move-and-scale-your-ml-experiments-in-the-cloud)\n- [Detect potential bias in your datasets and explain how your models predict](#detect-potential-bias-in-your-datasets-and-explain-how-your-models-predict)\n- [Deploy state-of-the-art ML models and solutions in a single click](#deploy-state-of-the-art-ml-models-and-solutions-in-a-single-click)\n\nOnline conference took part on 24.02.2021, I participated in a couple of talks.\n\n## Move and scale your ML experiments in the cloud\n\nMachine learning experiments (labeling the data, storage, sharing, saving, tuning parameters) can be done in Amazon\nSageMaker IDE - secure, scalable, compliant solution - DevOps ready solution.\n\n**How to start?** We usually start with local notebooks, which are not powerful enough. You could move your Jupiter\nNotebook to the cloud (doing it on your own - a lot of maintenance), we can do better.\n\nDEMO:\n\n1. Just go to the SageMaker page on AWS\n2. Open SageMaker Studio (limitation: one instance per region)\n3. We are going through Standard setup:\n    1. Authentication method selection (SSO or IAM)\n    2. Permissions: which resources it can access - e.g. storage, by default SageMaker has access to any bucket with \"\n       sagemaker\" in the name\n    3. You can make your notebook shareable\n    4. Network and storage definitions - VPC or Public Internet, security groups, encryption\n    5. You can add your tags to identify resources\n4. Setup will take a few minutes\n\nYou can open the application. This is literally JupyterLab. You can copy for example GitHub repo there and run the\nnotebooks (it has git integration, so switching between branches is easy). You can easily switch machines, largest:\n488GB of RAM!\n\n![aws-innovate-ai-ml-21-1](../_images/aws-innovate-ai-ml-21-1.png)\n\nExample training:\n\n![aws-innovate-ai-ml-21-2](../_images/aws-innovate-ai-ml-21-2.png)\n\nSageMaker is not just a notebook - it allows for data preparation, building models, training, tuning and deployment.\n\n## Detect potential bias in your datasets and explain how your models predict\n\nBias - unfair representation of reality, as we use datasets, there is a risk, that data we use does not represent\nreality.\n\nExplainability - complex models, hard to understand why model came up with a prediction (e.g. deep learning). We need to\nknow why model came up with certain decision, e.g. medicine, legal obligations.\n\n**How to solve these issues?**\n\nWe used some dataset that have the following columns: age, sex, skin colour, ... Zooming in on sex: 1/3 female, 2/3\nmales - imbalanced. Zoom even more, 1:7 for sex earnings with >50k USD. Model can be biased towards overrepresented\ngroup.\n\nSo the first approach is to visualise the data to detect bias. But AWS has something better.\n\n**Analysis using Amazon SageMaker Clarify**\n\nBias analysis: pre-training analysis and post-training analysis. 
\n\n**Analysis using Amazon SageMaker Clarify**\n\nBias analysis: pre-training analysis and post-training analysis. We define the \"potentially\" biased group:\n`facet_name=\"Sex\"`. Results are displayed in nice charts (many awesome metrics):\n\n![aws-innovate-ai-ml-21-3](../_images/aws-innovate-ai-ml-21-3.png)\n\nIt also outputs a report in HTML and as a Jupyter Notebook.\n\n**Explainability** - it uses SHAP 🎉 https://github.com/slundberg/shap\n\nFor explainability AWS outputs a similar report:\n\n![aws-innovate-ai-ml-21-4](../_images/aws-innovate-ai-ml-21-4.png)\n\n## Deploy state-of-the-art ML models and solutions in a single click\n\nSageMaker Studio. Problem: text analysis - there are 60 models prepared for text analysis. We can select one, e.g.\ntrained on Wikipedia. Then we can deploy the model; we can fine-tune the model - we need to provide the dataset in a\nspecial format. The model has an endpoint, which can be tested in a Jupyter Notebook.\n\n![aws-innovate-ai-ml-21-5](../_images/aws-innovate-ai-ml-21-5.png)\n\nWe have a notebook, but we cannot give it to the Product Managers; that is why we can integrate it with, for example, a\nUI. There are libraries for the integration with JavaScript. Example: banana slicer review from Amazon:\n\n![aws-innovate-ai-ml-21-6](../_images/aws-innovate-ai-ml-21-6.png)\n\nNew data flow - a tool for preparing new data. Then you can pass the data to the model to train.\n\n**Remember to shut down the endpoint because you pay for it $$$.**\n"
  },
  {
    "path": "conferences/brown-bags.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n- [NLP - State of the Art](#nlp---state-of-the-art)\n- [Kanban Training](#kanban-training)\n\n## NLP - State of the Art\n\n*By Michał Jakóbczyk*\n\nTuring Test - are you able to distinguish if you are talking to a computer or a person? It determined the direction of\ndevelopment of NLP.\n\n> The Man Who Mistook His Wife For a Hat - Olivier Sacks - book recommendation.\n\nAnalyse sentence:\n\n```python\nfrom spacy import displacy\n\ndisplacy.render(nlp(\"Some sentence\"))\n```\n\n\"They ate the pizza with anchovies\" - context matters (with fishes or using fishes?).\n\n\"They ate the pizza with hands\"\n\n\"I shot an elephant in my pyjamas\" - model will refer pyjama to the elephant.\n\n\"I shot an elephant, in my pyjamas\" - model will refer pyjama to the person.\n\nWe know about these differences! Models have difficulties.\n\n40-50 years ago, NLP was mostly about POS tags analysis, recently is more about machine learning.\n\nPython code -> Assembler <- Machine learning model. In the end everything is Assembly.\n\n*playground.tensorflow.org* - 1 square = 1 neuron that is basically checking one if / one line.\n\nText to number:\n\n- document vectorisation - if document contains word - 1, 0 otherwise\n- one-hot encoding - you can use it for encoding word position (2D matrix) - a lot of memory\n- word embeddings - place word in a multidimensional space\n    - adding vectors - drawing a multidimensional sphere containing multiple words\n    - *projector.tensorflow.org*\n\nWe can compare sentences using embeddings.\n\n```python\nnlp(\"Gave a research talk in Boston\").similarity(nlp(\"Had a science lecture in Seattle\"))\n```\n\nTraining is done using input text, then every word is removed (word by word) and machine is supposed to guess missing\nword.\n\nGPT-3 - the biggest transformer, almost 5M$ spent on training this model\n\n## Kanban Training\n\n*By Marcin Lelek*\n\nhttps://tools.kaiten.io/featureban\n\nKANBAN - card + signal, name of the board, method for implementing improvements requested by client. Created by Toyota.\n\n3 rules:\n\n- stat with what you do now\n- gain agreement to evolutionary change (don't make changes against people, agree on change)\n- encourage acts of leadership at all levels (independent teams)\n\nGeneral practices:\n\n- you need to have a board to visualise progress\n- number of items in Work In Progress is limited\n- manage flow - work flow management, not people optimisation\n- make policies explicit - define policy how to treat a card in a column, e.g. when card moves from one column to\n  another\n- implement feedback loops\n- improve collaboratively\n- evolve experimentally\n\nDifferent levels of Kanban boards - e.g. 1 WIP per person.\n"
  },
  {
    "path": "conferences/pycon-2022.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n- [[EN] Don’t use a lot where a little will do. A story of programming tricks you wish you invented](#en-dont-use-a-lot-where-a-little-will-do-a-story-of-programming-tricks-you-wish-you-invented)\n- [[EN] Effective data science teams with databooks](#en-effective-data-science-teams-with-databooks)\n- [[PL] Poetry - poezja pythonowych pakietów](#pl-poetry---poezja-pythonowych-pakietw)\n- [[EN] Interfaces in Python. The benefits and harms](#en-interfaces-in-python-the-benefits-and-harms)\n- [[EN] Observability in backends with Python and OpenTelemetry](#en-observability-in-backends-with-python-and-opentelemetry)\n- [[EN] Hitchhiker's guide to typing](#en-hitchhikers-guide-to-typing)\n- [[EN] Lightning talks](#en-lightning-talks)\n- [[PL] Dzielenie monolitu w praktyce](#pl-dzielenie-monolitu-w-praktyce)\n- [[EN] pytest on steroids](#en-pytest-on-steroids)\n- [[EN] Music information retrieval with Python](#en-music-information-retrieval-with-python)\n\n## [EN] Don’t use a lot where a little will do. A story of programming tricks you wish you invented\n\nRegex has a debug mode - `re.DEBUG`\n\nPython has a built-in HTTP server capable of serving static files from the current directory.\n\n## [EN] Effective data science teams with databooks\n\n`databooks` - a tool for dealing with notebooks (automatic conflict resolution, metadata stripping, pre-commit hooks,\nprinting a notebook in a terminal, pretty printing git diff)\n\nArchitecture as Code (AaC) with Python or way to become your own boss\n\nPrototyping and visualization of system architecture using code.\n\n`diagrams` - a library for creating diagrams from Python\n\n## [PL] Poetry - poezja pythonowych pakietów\n\nNarzędzie do zarządzania zależnościami, a także do tworzenia pakietów oraz ich publikacji.\n\nMoże zastąpić `pip` czy `virtualenva`.\n\nWersjonowanie semantyczne: `major.minor.patch`\n\n## [EN] Interfaces in Python. The benefits and harms\n\nAbstract classes in Python - ABC\n\nSequence - any collection implementing 2 methods (length and getter)\n\nDependency Injection - passing parameters directly to for example init method.\n\n## [EN] Observability in backends with Python and OpenTelemetry\n\nTrace - a JSON object, can travel between services. 
\n\nDistributed tracing with queues - the context of the trace is going to be part of the message that you enqueue.\n\nJaeger - one of the tools compatible with OpenTelemetry.\n\nunicorn has a separate thread for OpenTelemetry data.\n\n## [EN] Hitchhiker's guide to typing\n\nurllib3 case study: https://sethmlarson.dev/blog/tests-arent-enough-case-study-after-adding-types-to-urllib3\n\n## [EN] Lightning talks\n\nGitHub Actions are capable of running cron jobs.\n\nIdea: when learning a new language, rewrite an existing command line tool in the selected language.\n\n## [PL] Dzielenie monolitu w praktyce\n\nSuccess criteria for extracting a microservice: I want to see results quickly, be able to back out at any moment, and\ntest the system with production traffic, BUT I don't want to break production; I want to be able to return to the old\nsolution and change as little as possible in the monolith.\n\nTransforming a monolith into microservices: extract an interface in the monolith, create a microservice with an\nidentical interface, add a new implementation in the monolith that uses the new service. When a request comes in, we can\nsend it to both places; the final response should come from the old system. After a testing period, we switch to the new\nsolution.\n\n![pycon-2022-monolith](../_images/pycon-2022-monolith.jpeg)\n\nStrangler Pattern - the name comes from a plant that parasitises a tree, using it to grow upward and then killing it.\n\nOperating in Shadow Mode - extract the microservice, collect requests and results.\n\n## [EN] pytest on steroids\n\nEverything in pytest is a plugin. When you create a fixture, you create a local plugin.\n\n## [EN] Music information retrieval with Python\n\n`pedalboard` by Spotify - a Python library for audio effects\n\n`Pyo` - an audio synthesis engine, effects control, implementing loopers, used in live music\n\n![pycon-2022-apis](../_images/pycon-2022-apis.jpeg)\n\n`ISMIR dataset` - various datasets with music, lyrics, ...\n\n`mirdata` - a Python wrapper for ISMIR datasets\n\n`Librosa` - a library for music analysis\n\nIn general, there are plenty of tools for music analysis, which can then be used to train ML models.\n\n![pycon-2022-music-tagging](../_images/pycon-2022-music-tagging.jpeg)\n\n![pycon-2022-source-separation](../_images/pycon-2022-source-separation.jpeg)\n\n![pycon-2022-source-separation-1](../_images/pycon-2022-source-separation-1.jpeg)\n\n![pycon-2022-transcription](../_images/pycon-2022-transcription.jpeg)\n\nMusic recommendations: very complex, massive business and cultural impact:\n\n![pycon-2022-music-recommendations](../_images/pycon-2022-music-recommendations.jpeg)\n\nGenerating music - neural audio synthesis or symbolic composition (which then needs to be played by a human).\n\nLinks:\n- https://openai.com/blog/jukebox/\n- https://youtu.be/bXBliLjImio\n- https://youtu.be/MwtVkPKx3RA\n- https://youtu.be/tgq1YTQ2c0s\n- https://magenta.tensorflow.org\n"
  },
  {
    "path": "courses/fast-ai.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Practical Deep Learning for Coders\n\nCourse -> https://course.fast.ai/\n\n[TOC]\n\n## Lesson 1\n\nTruth, to start with Deep Learning:\n\n- high school math is sufficient\n- there is no need for enormous amounts of data\n- no need for expensive hardware for basic usage\n\n1961 first machine built on top of mathematical model from 1943. Heavily criticised by Minsky - example that artificial\nneural network could not learn simple XOR. Global academic gave up on neural networks.\n\n1986 MIT released a paper defining requirements for building and using neural networks. Later researchers proved, that\nadding additional layers of neural networks is enough to approximate any mathematical model. But in fact these models\nwere too slow and too big to be useful.\n\n**What is ML?** Like regular programming, a way to get computers to complete a specific task. Instead of telling the\ncomputer the exact steps to solve a problem, show it examples of the problem to solve and let it figure out how to solve\nit itself.\n\n*Neural network* - parametrised function that can solve any problem to any level of accuracy (in theory - *universal\napproximation theorem)*.\n\nWhat does it mean to train neural network? It means finding good weights. This is called **SDG**. SDG - Stochastic\nGradient Descent.\n\nNeural Networks work using patterns, need labeled data and create PREDICTIONS not recommended actions.\n\nYou need to be super careful what is the input data (initial bias, stereotypic data) will produce biased results. E.g.\nmarihuana consumption is equal amon whites and blacks, but black people are mor often arrested for marijuana possession.\nGiven biased input data will produce biased predictions, e.g. send more police officers to black neighbourhoods.\n\nSegmentation - marking areas on images (trees, cars, ...)\n\n## Lesson 2\n\nWhen you want to predict a category you are facing a classification problem. Whenever you want to predict a number you\nare dealing with regression problem.\n\n```python\nlearn = cnn_learner(data, architecture, metric)\n```\n\nArchitecture - e.g. *resnet32, resnet64* - name of the architecture (64 layers) - function that we are optimising.\n\nEpoch - e.g. looking at every image in the training set = 1 epoch, 1 loop\n\nMetric - function measuring quality of the model's predictions (*error_rate, accuracy*), we care about it.\n\nLoss != Metric, loss - computer uses this to update parameters, computer cares about it. For example tweaking parameters\njust a little might not change accuracy or error rate.\n\nModel might cheat - \"I have seen this image, this is a cat\", we don't want model to memorise images. That is why we need\nsplitting into training and validation. For validating time-series, you should not removed e.g. 20% of the data, instead,\ndrop off the end and let the model predict e.g. next 2 weeks.\n\n*Transfer learning* - using a pretrained model for a task different to what it was originally trained for. Take\npretrained (initial weights), add more epochs on your specific dataset and you will end up with way more better model.\n\n*Fine tuning* - transfer learning technique where the weights of pretrained model are updated by training for additional\nepochs using different task to that used for pretraining.\n\nYou can take advantage of pretrained feature - e.g. dog faces, patterns, etc.\n\nComputer Vision can be used for variety of problems, e.g. 
\n\nComputer Vision can be used for a variety of problems, e.g. sound or virus analysis (data transformed into images).\n\n![fast-ai-1](../_images/fast-ai-1.png)\n\nSet of pretrained models: https://modelzoo.co/\n\n*How to decide if there is a relationship?*\n\n*Null hypothesis* - e.g. \"no relationship between X and Y\" -> gather data -> how often do we see a relationship?\n\n*P-Value* - the probability of an observed result assuming that the null hypothesis is true.\n\n## Lesson 3\n\nSquare images are easier to process - you need to remember the length of only one dimension. `Squishing` is the most\nefficient method for resizing, because cropping removes information and adding black bars wastes computation. Another\ncommon method is `Random Resize Crop` - over a few batches, different parts of the image are taken.\n\nImageClassifierCleaner - a utility tool (GUI) for finding the examples the classifier is least confident about. You can\nmanually improve the labelling.\n\n`VBox` - you can group multiple widgets together and create a prototype application in a notebook.\n\n`voila` - a plugin for hiding cells with code, so only inputs and outputs are visible. Add `voila` to the URL, and it\nwill display an application-like website in the browser. Great for prototyping.\n\nmybinder.org - you can turn a notebook from GitHub into a publicly available web application.\n\n*Healthy skin* example - bing returns images of a young white woman - bias!\n\nBook recommendation: *Building Machine Learning Powered Applications*\n\nFeedback loop - e.g. predictive policing - a system that sends police - feedback loops can result in negative\nimplications of that bias getting worse and worse - e.g. you send police to the same place over and over.\n\nFastPages - dump a notebook into a page.\n\nRecognising handwritten digits (MNIST) was considered a challenging problem ~20 years ago. Baseline idea: compare a\nmodel / ideal number with the input - for MNIST, calculate the average of the training set; on the validation set,\ncalculate the distance (~95% accuracy). A baseline should be something simple to implement - then you build something on\ntop of it.\n\nBroadcasting - if the shapes of 2 elements don't match, e.g. A (1010, 28, 28) and B (28, 28), B will be subtracted from\nevery one of the 1010 items of A.\n\nPyTorch has an engine for calculating derivatives (autograd). In PyTorch, `_` at the end of a method name means the\nmethod operates in place.
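\n\nA quick PyTorch illustration of all three points (shapes as in the example above):\n\n```python\nimport torch\n\n# Broadcasting: B (28, 28) is subtracted from each of the 1010 items of A.\na = torch.randn(1010, 28, 28)\nb = torch.randn(28, 28)\nprint((a - b).shape)  # torch.Size([1010, 28, 28])\n\n# Autograd: PyTorch records operations and computes derivatives.\nx = torch.tensor(3.0, requires_grad=True)\ny = x ** 2\ny.backward()\nprint(x.grad)  # tensor(6.) - the derivative of x**2 at x = 3\n\n# In-place variants end with \"_\", e.g. add_:\nt = torch.ones(2)\nt.add_(1)  # modifies t in place\n```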
\n\nLearning rate - the size of a step in gradient descent.\n\n## Plant Pathology\n\nhttps://www.kaggle.com/c/plant-pathology-2021-fgvc8/overview\n\n```python\nimport csv\n\nfrom fastai.vision.all import *\nfrom fastai.metrics import error_rate, accuracy\n\npath = Path(\"/kaggle/input/plant-pathology-2021-fgvc8\")\n\n# Prepare data, labels are stored separately:\nwith open(path / \"train.csv\", mode='r') as csv_file:\n    csv_reader = csv.DictReader(csv_file)\n\n    train_labels = {\n        row[\"image\"]: row[\"labels\"]\n        for row in csv_reader\n    }\n\n\n# Function used for labeling images:\ndef label_func(file_path: Path) -> str:\n    return train_labels[str(file_path).split('/')[-1]]\n\n\n# Read data:\ndata_block = DataBlock(\n    blocks=(ImageBlock, CategoryBlock),\n    get_items=get_image_files,\n    get_y=label_func,\n    item_tfms=Resize(224)\n)\n\n# DataBlock to DataLoader:\ndata_loaders = data_block.dataloaders(path / \"train_images\")\n\n# Available classes:\ndata_loaders.vocab\n\n# Few example images:\ndata_loaders.show_batch()\n\n# ResNet34 architecture for image classification:\nlearner = cnn_learner(data_loaders, models.resnet34, metrics=error_rate)\n\n# 4 epochs, unfortunately one epoch takes ~1h, most probably because of incorrect use of 'item_tfms' in DataBlock, which disables GPU usage:\nlearner.fine_tune(4)\n\n# Model validation, this model achieved 0.62 error_rate.\ninterpretation = ClassificationInterpretation.from_learner(learner)\ninterpretation.plot_confusion_matrix()\ninterpretation.plot_top_losses(5, nrows=1, figsize=(25, 5))\n\n# Saving model:\nlearner.export()\n```\n\n"
  },
  {
    "path": "patterns/abbreviations.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Abbreviations\n\n- [SOLID](#solid)\n- [DRY - Don't Repeat Yourself](#dry---dont-repeat-yourself)\n- [KISS - Keep It Simple, Stupid](#kiss---keep-it-simple-stupid)\n- [ACID](#acid)\n- [BASE](#base)\n- [CAP](#cap)\n- [NF](#nf)\n\n## SOLID\n\n### SRP - Single Responsibility Principle\n\nA class should have only one reason to change, so in order to reduce reasons for modifications - one class should have\none responsibility. It is a bad practise to create classes doing everything.\n\nWhy is it so important that class has only one reason to change? If class have more than one responsibility they become\ncoupled and this might lead to surprising consequences like one change breaks another functionality.\n\nYou can avoid these problems by asking a simple question before you make any changes: What is the responsibility of your\nclass / component / micro-service? If your answer includes the word “and”, you’re most likely breaking the single\nresponsibility principle.\n\n### OCP - Open-Closed Principle\n\nClasses, modules, functions, etc. should be open to extension but closed to modification.\n\nCode should be extensible and adaptable to new requirements. In other words, we should be able to add new system\nfunctionality without having to modify the existing code. We should add functionality only by writing new code.\n\nIf we want to add a new thing to the application and we have to modify the \"old\", existing code to achieve this, it is\nquite likely that it was not written in the best way. Ideally, new behaviors are simply added.\n\n### LSP - Liskov Substitution Principle\n\nThis rule deals with the correct use of inheritance and states that wherever we pass an object of a base class, we\nshould be able to pass an object of a class inheriting from that class.\n\nExample of violation:\n\n```python\nclass A:\n    def foo() -> str:\n        return \"foo\"\n\n\nclass B(A):\n    def foo(bar: str) -> str:\n        return f\"foo {bar}\"\n```\n\nB is not taking the same arguments, meaning A and B are not compatible. A can not be used instead of B, and B can not be\nused instead of A.\n\n### ISP - Interface Segregation Principle\n\nClients should not be forced to depend upon interfaces that they do not use. ISP splits interfaces that are very large\ninto smaller and more specific ones so that clients will only have to know about the methods that are of interest to\nthem.\n\nExample of violation:\n\n```python\nclass Shape:\n    def area() -> float:\n        raise NotImplementedError\n\n    def volume() -> float():\n        raise NotImplementedError\n```\n\n2D triangle does not have volume, hence it would need to implement interface that is not needed. In order to solve this,\nthere should be multiple interfaces: Shape and 3DShape.\n\n### DIP - Dependency Inversion Principle\n\nHigh-level modules, which provide complex logic, should be easily reusable and unaffected by changes in low-level\nmodules, which provide utility features. To achieve that, you need to introduce an abstraction that decouples the\nhigh-level and low-level modules from each other.\n\n> Entities must depend on abstractions, not on concretions. 
## DRY - Don't Repeat Yourself\n\n\"Every piece of knowledge must have a single, unambiguous, authoritative representation within a system\". When the DRY\nprinciple is applied successfully, a modification of any single element of a system does not require a change in other\nlogically unrelated elements.\n\n## KISS - Keep It Simple, Stupid\n\nThe KISS principle states that most systems work best if they are kept simple rather than made complicated; therefore,\nsimplicity should be a key goal in design, and unnecessary complexity should be avoided.\n\n## ACID\n\n### Atomicity\n\nEach transaction is either carried out in full, or the process halts and the database reverts to the state before the\ntransaction started. This ensures that all data in the database is valid.\n\n### Consistency\n\nA processed transaction will never endanger the structural integrity of the database. The database is always in a\nconsistent state.\n\n### Isolation\n\nTransactions cannot compromise the integrity of other transactions by interacting with them while they are still in\nprogress.\n\n### Durability\n\nThe data related to a completed transaction will persist even in the case of network or power outages. If a\ntransaction fails, it will not impact the manipulated data.\n\n
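A minimal sketch of atomicity using Python's built-in sqlite3 - the table and the simulated crash are illustrative:\n\n```python\nimport sqlite3\n\nconn = sqlite3.connect(\":memory:\")\nconn.execute(\"CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)\")\nconn.executemany(\"INSERT INTO accounts VALUES (?, ?)\", [(\"alice\", 100), (\"bob\", 0)])\nconn.commit()\n\ntry:\n    with conn:  # transaction: commit on success, rollback on any exception\n        conn.execute(\"UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'\")\n        raise RuntimeError(\"simulated crash mid-transfer\")\nexcept RuntimeError:\n    pass\n\n# The partial update was rolled back - both balances are unchanged:\nprint(conn.execute(\"SELECT name, balance FROM accounts ORDER BY name\").fetchall())\n# [('alice', 100), ('bob', 0)]\n```\n\n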
## BASE\n\n### Basically Available\n\nEnsure availability of data by spreading and replicating it across the nodes of the database cluster - this is not done\nimmediately.\n\n### Soft State\n\nDue to the lack of immediate consistency, data values may change over time. The state of the system could change over\ntime, so even during times without input there may be changes going on due to 'eventual consistency', thus the state of\nthe system is always 'soft'.\n\n### Eventually Consistent\n\nThe system will *eventually* become consistent once it stops receiving input. The data will propagate to everywhere it\nshould sooner or later, but the system will continue to receive input and is not checking the consistency of every\ntransaction before it moves onto the next one.\n\n## CAP\n\nIn theoretical computer science, the CAP theorem states that it is impossible for a distributed data store to\nsimultaneously provide more than two out of the following three guarantees:\n\n### Consistency\n\nEvery read receives the most recent write or an error. Do all nodes within a cluster see all the data they are supposed\nto? Note: despite the shared name, this is a different idea than consistency in ACID, which is about integrity\nconstraints.\n\n### Availability\n\nEvery request receives a (non-error) response, without the guarantee that it contains the most recent write. Is the\ngiven service or system available when requested? Does each request get a response, whether it reports success or\nfailure?\n\n### Partition Tolerance\n\nThe system continues to operate even when the network between nodes is partitioned and messages are lost or delayed. A\nsingle node failure should not cause the entire system to collapse.\n\n## NF\n\nDatabase normalisation is the process of structuring a database, usually a relational database, in accordance with a\nseries of so-called normal forms in order to reduce data redundancy and improve data integrity.\n\n### 1NF\n\nTo satisfy 1NF, the values in each column of a table must be atomic.\n\n### 2NF\n\nMust be in 1NF + every non-key column must depend on the whole primary key, not just part of a composite key (a\nsingle-column primary key satisfies this automatically).\n\n### 3NF\n\nMust be in 2NF + no transitive functional dependencies.\n\nTransitive functional dependency - a non-key column depends on another non-key column, so changing one may force the\nother to change. For example:\n\n![3nf-violation](../_images/3nf-violation.png)\n"
  },
  {
    "path": "patterns/architecture.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Architecture Patterns\n\n- [Command and Query Responsibility Segregation (CQRS)](#command-and-query-responsibility-segregation-cqrs)\n- [Reporting Database](#reporting-database)\n- [Event Sourcing](#event-sourcing)\n- [Saga](#saga)\n\n## Command and Query Responsibility Segregation (CQRS)\n\nBased on: https://docs.microsoft.com/en-us/azure/architecture/patterns/cqrs, https://martinfowler.com/bliki/CQRS.html\n, https://bulldogjob.pl/articles/122-cqrs-i-event-sourcing-czyli-latwa-droga-do-skalowalnosci-naszych-systemow_\n\nThis pattern separates read and update operations for a data store. Traditionally the same data model is used to query\nand update a database. This might work well but for simple CRUD applications. For more complex applications, where there\nare more advanced operations on read and write sides CQRS might be a better idea.\n\nCommands update data, queries read data. Commands should be *task based*, rather than *data centric* (book hotel room\ninstead of set `reservation_status` to `reserved`). Queries *never* modify the database.\n\nUsually whenever command updates data it is also publishing an event and this needs to be done within a single\ntransaction.\n\n![patterns-architecture-cqrs-martin-fowler](../_images/patterns-architecture-cqrs-martin-fowler.png)\n\nCQRS:\n\n- you are able to scale Command and Query independently\n- separate models for updating and querying might lead to eventual consistency\n- suited for complex domains\n\n## Reporting Database\n\nBased on: https://martinfowler.com/bliki/ReportingDatabase.html\n\nSet up second database for reporting purposes, this database is completely different from the operational (application)\ndatabase.\n\nReporting Database:\n\n- designed specifically for reports\n\n- can be denormalized, usually read-only - redundant information might speed up queries\n- queries on the database don't add to the load on the operational database\n- additional data might be derived from the operational database\n- needs to be synced somehow with the main database (eg. sync data overnight or sync using events)\n\n## Event Sourcing\n\nBased on: https://docs.microsoft.com/en-us/azure/architecture/patterns/event-sourcing\n, https://microservices.io/patterns/data/event-sourcing.html\n\n> How to reliably/atomically update the database and publish messages/events?\n\nInstead of maintaining current state, application can have a log of state changes. Whenever the state of a business\nentity changes, a new event is appended to the list of events. Since saving an event is a single operation, it is\ninherently atomic. The application reconstructs an entity’s current state by replaying the events.\n\nThe event log also behaves like message broker. When a service saves an event in the event store, it is delivered to all\ninterested subscribers.\n\n> Event sourcing is commonly combined with the CQRS pattern by performing the data management tasks in response to the\n> events, and by materialising views from the stored events.\n\nIn order to maintain consistency in multi-threaded applications, adding a timestamp to every event might help in\nresolving issues, but not in all cases. Better approach is to label each event with an incremental identifier. 
## Saga\n\nBased on: https://microservices.io/patterns/data/saga.html\n\nIn a design where each service has its own database, sometimes transactions have to span multiple services, hence a\nlocal ACID transaction is not an option.\n\nA solution to this problem is the *Saga* - a sequence of local transactions. Each local transaction updates the database\nand publishes a message or event to trigger the next local transaction in the saga. If a local transaction fails because\nit violates a business rule, the saga executes a series of compensating transactions that undo the changes that were\nmade by the preceding local transactions.\n\nFor example: Service A creates a new Order with PENDING state and publishes an event that is consumed by another\nservice, B. Service B responds with an event to service A, and service A accepts or rejects the new Order.\n\nDON'T: Based on `Chapter 17: Microservices Architecture` @ `Fundamentals of Software Architecture`:\n\n> Don't do transactions in microservices - fix granularity instead.\n"
  },
  {
    "path": "teaching/python-intermediate/README.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Python Intermediate\n\nRepository with the code and tasks: https://github.com/pkardas/shapes\n\n"
  },
  {
    "path": "teaching/python-intro/README.md",
    "content": "[go back](https://github.com/pkardas/learning)\n\n# Introduction to Programming: Python for beginners\n\nThis folder contains the presentation and the notebook used during \"Introduction to Programming: Python for beginners\" classes. Training was intended for people with no prior programming skills. Each training was scheduled for 2 hours.\n\n`presentation` - meeting agenda, topics, theory, examples\n\n`notebook` - Jupyter Notebook with assignments, audience was supposed to fill in the gaps using provided theory and examples.\n\n"
  },
  {
    "path": "teaching/python-intro/notebook.ipynb",
    "content": "{\n  \"nbformat\": 4,\n  \"nbformat_minor\": 0,\n  \"metadata\": {\n    \"colab\": {\n      \"name\": \"Introduction to programming: Python for beginners.ipynb\",\n      \"provenance\": [],\n      \"collapsed_sections\": [\n        \"rDyFlkw1DnX_\",\n        \"lLgIUzF_PwR7\",\n        \"jxT57KJoPSs3\",\n        \"2eRI479WVjka\",\n        \"4i61CcItIwUv\",\n        \"jvxBKZp8nRZP\",\n        \"6wz3rtllMw6k\",\n        \"JOGgKUXDx356\"\n      ]\n    },\n    \"kernelspec\": {\n      \"name\": \"python3\",\n      \"display_name\": \"Python 3\"\n    }\n  },\n  \"cells\": [\n    {\n      \"cell_type\": \"markdown\",\n      \"metadata\": {\n        \"id\": \"rDyFlkw1DnX_\"\n      },\n      \"source\": [\n        \"# Task 1 \\n\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"6oNEjs4RbFo8\"\n      },\n      \"source\": [\n        \"def airhelp() -> str:\\n\",\n        \"  return \\\"AirHelp\\\"\\n\",\n        \"\\n\",\n        \"airhelp()\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"-2d4bfTpaaUB\"\n      },\n      \"source\": [\n        \"from datetime import date, timedelta\\n\",\n        \"\\n\",\n        \"def yesterday() -> date:\\n\",\n        \"  return date.today() - timedelta(days=1)\\n\",\n        \"\\n\",\n        \"yesterday()\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"glbGi1MgD0RH\"\n      },\n      \"source\": [\n        \"# Write a function with a name `hello_world` that **returns**: \\\"Hello world!\\\". Fill the gaps with Python code. \\n\",\n        \"\\n\",\n        \"def AAA() -> str:\\n\",\n        \"  return BBB\\n\",\n        \"\\n\",\n        \"hello_world()\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"markdown\",\n      \"metadata\": {\n        \"id\": \"lLgIUzF_PwR7\"\n      },\n      \"source\": [\n        \"# Task 2: \\n\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"mCBTZQCZbaMD\"\n      },\n      \"source\": [\n        \"def y(x: int) -> int:\\n\",\n        \"  return 2 * x\\n\",\n        \"\\n\",\n        \"y(10)\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"e1GeGmhqbjIw\"\n      },\n      \"source\": [\n        \"from typing import List\\n\",\n        \"\\n\",\n        \"def odds(numbers: List[int]) -> List[int]:\\n\",\n        \"  return [number for number in numbers if number % 2 != 0]\\n\",\n        \"\\n\",\n        \"odds([1, 2, 3, 4, 5, 6, 7, 8, 9])\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"g8oV4dh_PwR8\"\n      },\n      \"source\": [\n        \"# Write a function that greets the user, user's name is provided via a parameter. Return string with injected user name.\\n\",\n        \"\\n\",\n        \"def CCC(DDD: str) -> str:\\n\",\n        \"  return f\\\"Hello, {DDD} 👋\\\"\\n\",\n        \"\\n\",\n        \"print(hello(\\\"Kamil\\\")) # Hello, Kamil! 👋\\n\",\n        \"print(hello(\\\"Piotr\\\")) # Hello, Piotr! 👋\\n\",\n        \"print(hello(\\\"Marta\\\")) # Hello, Marta! 
👋\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"markdown\",\n      \"metadata\": {\n        \"id\": \"jxT57KJoPSs3\"\n      },\n      \"source\": [\n        \"# Task 3: \\n\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"lLU8BZZSdRZa\"\n      },\n      \"source\": [\n        \"class Person:\\n\",\n        \"  def __init__(self, name: str, surname: str, age: int) -> None:\\n\",\n        \"    self.name = name\\n\",\n        \"    self.surname = surname\\n\",\n        \"    self.age = age\\n\",\n        \"\\n\",\n        \"p0 = Person(\\\"Anja\\\", \\\"Rubik\\\", 37)\\n\",\n        \"p1 = Person(\\\"Elon\\\", \\\"Musk\\\", 49)\\n\",\n        \"\\n\",\n        \"def introduce_person(person: Person) -> str:\\n\",\n        \"  return f\\\"{person.name} is {person.age} years old.\\\"\\n\",\n        \"\\n\",\n        \"print(introduce_person(p0))\\n\",\n        \"print(introduce_person(p1))\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"9kZa8oAbPSs4\"\n      },\n      \"source\": [\n        \"# Build a new data type - Message. Message should have 3 attributes: `content`, `sender_email` and `received_at`. Fill the gaps.\\n\",\n        \"\\n\",\n        \"from datetime import datetime\\n\",\n        \"\\n\",\n        \"class Message:\\n\",\n        \"  def EEE(self, FFF: str, GGG: str, HHH: datetime) -> None:\\n\",\n        \"    self.FFF = FFF\\n\",\n        \"    self.GGG = GGG\\n\",\n        \"    self.HHH = HHH\\n\",\n        \"\\n\",\n        \"m0 = Message(\\\"Hello! How are you?\\\", \\\"adam@gmail.com\\\", datetime(2021, 4, 21, 12, 0, 0))\\n\",\n        \"m1 = Message(\\\"I am fine!\\\",          \\\"dan@gmail.com\\\",  datetime.utcnow())\\n\",\n        \"\\n\",\n        \"print(m0.content, m0.sender_email, m0.received_at)\\n\",\n        \"print(m1.content, m1.sender_email, m1.received_at)\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"markdown\",\n      \"metadata\": {\n        \"id\": \"2eRI479WVjka\"\n      },\n      \"source\": [\n        \"# Task 4: \\n\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"1u5ekl3OfHoJ\"\n      },\n      \"source\": [\n        \"class Person:\\n\",\n        \"  def __init__(self, name: str, surname: str, age: int) -> None:\\n\",\n        \"    self.name = name\\n\",\n        \"    self.surname = surname\\n\",\n        \"    self.age = age\\n\",\n        \"\\n\",\n        \"  def introduce(self) -> str:\\n\",\n        \"    return f\\\"{self.name} is {self.age} years old.\\\"\\n\",\n        \"\\n\",\n        \"p0 = Person(\\\"Anja\\\", \\\"Rubik\\\", 37)\\n\",\n        \"p1 = Person(\\\"Elon\\\", \\\"Musk\\\", 49)\\n\",\n        \"\\n\",\n        \"print(p0.introduce())\\n\",\n        \"print(p1.introduce())\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"8RzdyrbNe2yx\"\n      },\n      \"source\": [\n        \"# Extend `Message`. Add a method that will return message language.\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"gzrXvaK_WmxA\"\n      },\n      \"source\": [\n        \"! 
pip install langdetect\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"DpnviQcAVjkc\"\n      },\n      \"source\": [\n        \"from langdetect import detect\\n\",\n        \"\\n\",\n        \"class Message:\\n\",\n        \"  def __init__(self, content: str, sender_email: str, received_at: datetime) -> None:\\n\",\n        \"    self.content = content\\n\",\n        \"    self.sender_email = sender_email\\n\",\n        \"    self.received_at = received_at\\n\",\n        \"    \\n\",\n        \"  @property\\n\",\n        \"  def language(self) -> str:\\n\",\n        \"    return detect(self.JJJ).upper()\\n\",\n        \"\\n\",\n        \"m0 = Message(\\\"Hi Johny.\\\",            \\\"adam@gmail.com\\\", datetime(2021, 4, 21, 12, 0, 0))\\n\",\n        \"m1 = Message(\\\"こんにちは、Akikoさん。\\\", \\\"dan@gmail.com\\\",  datetime(2021, 4, 21, 13, 0, 0))\\n\",\n        \"\\n\",\n        \"print(f\\\"'{m0.content}' is in {m0.language}\\\")  # This should print: \\\"'Hi Johny.' is in EN\\\"\\n\",\n        \"print(f\\\"'{m1.content}' is in {m1.language}\\\")  # This should print: \\\"'こんにちは、Akikoさん。' is in JA\\\"\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"markdown\",\n      \"metadata\": {\n        \"id\": \"4i61CcItIwUv\"\n      },\n      \"source\": [\n        \"# Task 5: \"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"UmHhUjSafdSV\"\n      },\n      \"source\": [\n        \"def print_people(people: List[Person]) -> None:\\n\",\n        \"  for i, person in enumerate(people):\\n\",\n        \"    print(i, person.name, person.surname)\\n\",\n        \"\\n\",\n        \"p0 = Person(\\\"Anja\\\", \\\"Rubik\\\", 37)\\n\",\n        \"p1 = Person(\\\"Elon\\\", \\\"Musk\\\", 49)\\n\",\n        \"p2 = Person(\\\"Abel\\\", \\\"Tesfaye\\\", 31)\\n\",\n        \"p3 = Person(\\\"Guido\\\", \\\"van Rossum\\\", 65)\\n\",\n        \"\\n\",\n        \"\\n\",\n        \"people = [p0, p1, p2, p3]\\n\",\n        \"\\n\",\n        \"print_people(people)\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"AW4cx5OugafV\"\n      },\n      \"source\": [\n        \"people[2].surname\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"JYGq-YtQIwUx\"\n      },\n      \"source\": [\n        \"from typing import List\\n\",\n        \"\\n\",\n        \"m0 = Message(\\\"Today is a beautiful day\\\",          \\\"tom@gmail.com\\\",  datetime(2020, 1,  1))\\n\",\n        \"m1 = Message(\\\"Today is rather average day\\\",       \\\"adam@gmail.com\\\", datetime(2005, 12, 5))\\n\",\n        \"m2 = Message(\\\"Dziś jest piękny dzień\\\",            \\\"ewa@gmail.com\\\",  datetime(2021, 4,  21))\\n\",\n        \"m3 = Message(\\\"Aujourd'hui est une belle journée\\\", \\\"tina@gmail.com\\\", datetime(2020, 12, 5))\\n\",\n        \"\\n\",\n        \"def print_messages(messages: List[Message]) -> None:\\n\",\n        \"  for i, message in enumerate(messages):\\n\",\n        \"    print(i, message.content)\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"zfSSyrxNJ5eB\"\n      },\n      \"source\": [\n 
       \"# Group messages `m0, m1, m2, m3` together\\n\",\n        \"messages = KKK\\n\",\n        \"\\n\",\n        \"print_messages(messages)\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"k2zBcriFlaXi\"\n      },\n      \"source\": [\n        \"# Access first message from the list\\n\",\n        \"messages[LLL].content\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"DfB4WNlAlr0O\"\n      },\n      \"source\": [\n        \"# Access the last message from the list\\n\",\n        \"messages[MMM].content\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"k2X690Xql1B3\"\n      },\n      \"source\": [\n        \"# Assign the last message to the variable and display message language\\n\",\n        \"last_message = messages[NNN]\\n\",\n        \"last_message.language\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"D5K5KejanLa6\"\n      },\n      \"source\": [\n        \"# Display the language of the last message without assigning to the variable\\n\",\n        \"messages[NNN].language\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"Id6lwNVtmHYT\"\n      },\n      \"source\": [\n        \"# Append message m4 to the existing list of the messages\\n\",\n        \"m4 = Message(\\\"Can you append me to the list, please?\\\", \\\"karen@gmail.com\\\", datetime(2021, 1, 5))\\n\",\n        \"messages.OOO(m4)\\n\",\n        \"\\n\",\n        \"print_messages(messages)\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"Ebap0NT4ngOt\"\n      },\n      \"source\": [\n        \"# ITERATE over the list of messages and print: message content, sender and message language.\\n\",\n        \"for PPP in QQQ:\\n\",\n        \"  print(PPP.content, PPP.sender_email, PPP.language)\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"markdown\",\n      \"metadata\": {\n        \"id\": \"jvxBKZp8nRZP\"\n      },\n      \"source\": [\n        \"# Task 6: \\n\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"Xedg0en_aPbd\"\n      },\n      \"source\": [\n        \"people_over_40 = [person for person in people if person.age > 40]\\n\",\n        \"\\n\",\n        \"print_people(people_over_40)\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"cl7oWpTqrtbA\"\n      },\n      \"source\": [\n        \"# Write a function returning filtered messages. 
Filter by message language.\\n\",\n        \"\\n\",\n        \"def messages_in_language(messages: List[Message], country_code: str) -> List[Message]:\\n\",\n        \"  return [RRR for RRR in SSS if RRR.language == country_code]\\n\",\n        \"\\n\",\n        \"messages = [\\n\",\n        \"  Message(\\\"This message is in English\\\",          \\\"xyz@gmail.com\\\", datetime.now()),\\n\",\n        \"  Message(\\\"This message is also in English\\\",     \\\"xyz@gmail.com\\\", datetime.now()),\\n\",\n        \"  Message(\\\"Ta wiadomość jest po polsku\\\",         \\\"xyz@gmail.com\\\", datetime.now()),\\n\",\n        \"  Message(\\\"Ta wiadomość również jest po polsku\\\", \\\"xyz@gmail.com\\\", datetime.now()),\\n\",\n        \"  Message(\\\"このメッセージは日本語で書かれています。\\\",   \\\"xyz@gmail.com\\\", datetime.now()),\\n\",\n        \"  Message(\\\"このメッセージは日本語でも書かれています\\\",   \\\"xyz@gmail.com\\\", datetime.now()),\\n\",\n        \"]\\n\",\n        \"\\n\",\n        \"print(\\\"-- PL --\\\")\\n\",\n        \"print_messages(messages_in_language(messages, \\\"PL\\\"))\\n\",\n        \"print(\\\"-- EN --\\\")\\n\",\n        \"print_messages(messages_in_language(messages, \\\"EN\\\"))\\n\",\n        \"print(\\\"-- JA --\\\")\\n\",\n        \"print_messages(messages_in_language(messages, \\\"JA\\\"))\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"markdown\",\n      \"metadata\": {\n        \"id\": \"6wz3rtllMw6k\"\n      },\n      \"source\": [\n        \"# Task 7: \\n\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"gxNBaSLpvFVY\"\n      },\n      \"source\": [\n        \"[1, 1, 1, 1, 1, 2]\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"Oii5vHEFvLPi\"\n      },\n      \"source\": [\n        \"(1, 1, 1, 1, 1, 2)\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"vXB9LKlmvQAE\"\n      },\n      \"source\": [\n        \"{1, 1, 1, 1, 1, 2}\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"H4Gss1dPMw6m\"\n      },\n      \"source\": [\n        \"# Write a function returning unique e-mails from the provided list of messages.\\n\",\n        \"\\n\",\n        \"def unique_emails(messages: List[Message]) -> List[str]:\\n\",\n        \"  return list({message.TTT for message in messages})\\n\",\n        \"\\n\",\n        \"messages = [\\n\",\n        \"  Message(\\\"Lorem ipsum\\\", \\\"anna@gmail.com\\\", datetime.now()),\\n\",\n        \"  Message(\\\"Lorem ipsum\\\", \\\"dan@gmail.com\\\",  datetime.now()),\\n\",\n        \"  Message(\\\"Lorem ipsum\\\", \\\"tom@gmail.com\\\",  datetime.now()),\\n\",\n        \"  Message(\\\"Lorem ipsum\\\", \\\"kate@gmail.com\\\", datetime.now()),\\n\",\n        \"  Message(\\\"Lorem ipsum\\\", \\\"tom@gmail.com\\\",  datetime.now()),\\n\",\n        \"  Message(\\\"Lorem ipsum\\\", \\\"kate@gmail.com\\\", datetime.now()),\\n\",\n        \"  Message(\\\"Lorem ipsum\\\", \\\"anna@gmail.com\\\", datetime.now()),\\n\",\n        \"  Message(\\\"Lorem ipsum\\\", \\\"kate@gmail.com\\\", datetime.now()),\\n\",\n        \"]\\n\",\n        \"\\n\",\n        \"# This should print something like:\\n\",\n        \"# ['tom@gmail.com', 'anna@gmail.com', 
'dan@gmail.com', 'kate@gmail.com']\\n\",\n        \"# (order might be different)\\n\",\n        \"unique_emails(messages)\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"markdown\",\n      \"metadata\": {\n        \"id\": \"JOGgKUXDx356\"\n      },\n      \"source\": [\n        \"# Task 8: \\n\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"RbDPROw9avi9\"\n      },\n      \"source\": [\n        \"sorted_people = sorted(people, key=lambda person: person.age)\\n\",\n        \"\\n\",\n        \"print_people(sorted_people)\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"metadata\": {\n        \"id\": \"NO-du1hNx35-\"\n      },\n      \"source\": [\n        \"# Write a function returning messages sorted by date.\\n\",\n        \"\\n\",\n        \"def messages_sorted_by_date(messages: List[Message]) -> List[Message]:\\n\",\n        \"  return sorted(VVV, key=lambda message: message.UUU)\\n\",\n        \"\\n\",\n        \"messages = [\\n\",\n        \"  Message(\\\"1\\\", \\\"example@gmail.com\\\", datetime(2005, 1, 1)),\\n\",\n        \"  Message(\\\"3\\\", \\\"example@gmail.com\\\", datetime(2006, 6, 2)),\\n\",\n        \"  Message(\\\"6\\\", \\\"example@gmail.com\\\", datetime(2020, 6, 6)),\\n\",\n        \"  Message(\\\"4\\\", \\\"example@gmail.com\\\", datetime(2007, 4, 1)),\\n\",\n        \"  Message(\\\"8\\\", \\\"example@gmail.com\\\", datetime(2021, 5, 5)),\\n\",\n        \"  Message(\\\"2\\\", \\\"example@gmail.com\\\", datetime(2005, 2, 6)),\\n\",\n        \"  Message(\\\"7\\\", \\\"example@gmail.com\\\", datetime(2020, 9, 9)),\\n\",\n        \"  Message(\\\"5\\\", \\\"example@gmail.com\\\", datetime(2010, 9, 1)),\\n\",\n        \"]\\n\",\n        \"\\n\",\n        \"# This should print something like: \\\"1, 2, 3, 4, 5, 6, 7, 8\\\"\\n\",\n        \"print_messages(messages_sorted_by_date(messages))\"\n      ],\n      \"execution_count\": null,\n      \"outputs\": []\n    }\n  ]\n}"
  }
]