Repository: Arp-G/async-elixir
Branch: master
Commit: 0c70f00fb595
Files: 30
Total size: 244.9 KB
Directory structure:
gitextract_b7ju2jm8/
├── .github/
│ └── FUNDING.yml
├── LICENSE
├── README.md
└── chapters/
├── ch_0.0_start.livemd
├── ch_1.1_concurrency_in_elixir.livemd
├── ch_1.2_immutability_and_memory_management.livemd
├── ch_2.1_process_internals.livemd
├── ch_2.2_process_basics.livemd
├── ch_2.3_process_linking.livemd
├── ch_2.4_process_monitoring_and_hibernation.livemd
├── ch_2.5_group_leaders_and_process_naming.livemd
├── ch_3.1_genserver_introduction.livemd
├── ch_3.2_building_a_genserver.livemd
├── ch_3.3_genserver_examples.livemd
├── ch_3.4_other_genserver_functions.livemd
├── ch_4.0_the_registry_module.livemd
├── ch_5.1_supervisors_introduction.livemd
├── ch_5.2_supervision_strategies.livemd
├── ch_5.3_restart_strategies.livemd
├── ch_5.4_introduction_to_dynamic_supervisor.livemd
├── ch_5.5_partition_supervisor.ex.livemd
├── ch_5.6_scaling_dynamic_supervisor.livemd
├── ch_6.0_project_building_a_download_manager.livemd
├── ch_7.1_intro_to_tasks.livemd
├── ch_7.2_awaiting_tasks.livemd
├── ch_7.3_task_async_stream.livemd
├── ch_7.4_supervised_tasks.livemd
├── ch_8.0_agents.livemd
├── ch_9.0_gotchas.livemd
└── sample_data/
└── top_websites.csv
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/FUNDING.yml
================================================
# These are supported funding model platforms
github: Arp-G
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2023 Async Elixir
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# Async Elixir 🔮
Welcome to the **Async Elixir** book repository!
The **Async Elixir** book is a deep dive into Elixir's concurrency features. If you're already comfortable with Elixir basics and eager to explore concurrent programming and process management, you're in the right place.
[Run in Livebook](https://livebook.dev/run?url=https%3A%2F%2Fgithub.com%2FArp-G%2Fasync-elixir%2Fblob%2Fmaster%2Fchapters%2Fch_0.0_start.livemd)
## Getting Started
* Clone this repository
* Ensure you have [Livebook](https://livebook.dev/) installed
* Open Livebook, then open the [Course Overview](chapters/ch_0.0_start.livemd) file.
## Contributions
If you encounter any issues, find typos, or have valuable suggestions to improve the course content, don't hesitate to create an issue in this repository. Your contributions are highly appreciated!
================================================
FILE: chapters/ch_0.0_start.livemd
================================================
# Async Elixir
## Overview
#### Welcome to Async Elixir
Welcome to the **Async Elixir** course, a comprehensive exploration of Elixir's advanced concurrency features. This course is tailored for individuals with a foundational understanding of Elixir programming and a desire to elevate their expertise to the next level.
#### Course Overview
This course is your gateway to becoming proficient in Elixir's asynchronous capabilities. Whether you're an experienced developer expanding your skill set or a curious learner fascinated by concurrent programming, you'll gain a deep understanding of Elixir's powerful asynchronous features.
By the end of this course, you will have a robust understanding of processes, OTP patterns, and how to use the Elixir standard library for effective process management. You'll be well-equipped to apply these concepts confidently in real-world situations.
#### Key Topics Covered
* **In-Depth Process and Concurrency:** Explore the intricate workings of Elixir processes and the art of achieving concurrency.
* **Essential Process Management Concepts:** Learn vital concepts such as process linking, monitoring and more.
* **Abstractions for Resilience and Control:** Navigate essential abstractions like GenServers, Supervisors, and core Elixir modules such as Task, Registry, and Agents.
* **Applied Learning with a Project:** Put your knowledge into practice through a hands-on project, solidifying your grasp of asynchronous programming and process management.
#### Course Focus
* **Prerequisite Elixir Knowledge:** This course assumes a foundational familiarity with Elixir's syntax and concepts. It DOES NOT teach Elixir basics but instead focuses on the asynchronous capabilities of Elixir.
* **Core Concepts, Not Libraries:** Our emphasis remains on core concepts. The course doesn't cover libraries like Phoenix or Phoenix LiveView. Instead, it equips you with a strong foundation to comprehend such libraries more effectively.
### Prerequisites
Before diving into the course, make sure you have the following:
* Installed [Elixir](https://elixir-lang.org/install.html)
* Installed [Livebook](https://livebook.dev/)
* A basic understanding of Elixir syntax and the ability to write code in Elixir. If you need a refresher, you can review Elixir basics [here](https://elixir-lang.org/getting-started/introduction.html).
[Livebook](https://livebook.dev/) is an excellent tool for learning Elixir and experimenting with its concepts. Learn more about Livebook [here](https://github.com/livebook-dev/livebook).
#### About me
I am a full-stack software engineer with four years of focused experience in Elixir programming. Over this period, I have actively engaged in various Elixir projects, extensively delved into Elixir literature, including books and blogs, and honed my skills in this powerful language. My fascination with Elixir's process-oriented architecture has inspired me to condense my insights into this course.
## Table of Contents
#### Part 1: Concurrency and Processes
* 🌀 [Chapter 1.1: Concurrency in Elixir](ch_1.1_concurrency_in_elixir.livemd)
* 🧠 [Chapter 1.2: Immutability and Memory Management](ch_1.2_immutability_and_memory_management.livemd)
#### Part 2: Processes
* 🔄 [Chapter 2.1: Process Internals](ch_2.1_process_internals.livemd)
* 🚀 [Chapter 2.2: Process Basics](ch_2.2_process_basics.livemd)
* 🔗 [Chapter 2.3: Process Linking](ch_2.3_process_linking.livemd)
* 💡 [Chapter 2.4: Process Monitoring and Hibernation](ch_2.4_process_monitoring_and_hibernation.livemd)
* 🛌 [Chapter 2.5: Group Leaders and Process Naming](ch_2.5_group_leaders_and_process_naming.livemd)
#### Part 3: GenServer
* 🧪 [Chapter 3.1: GenServer Introduction](ch_3.1_genserver_introduction.livemd)
* 🏗️ [Chapter 3.2: Building a GenServer](ch_3.2_building_a_genserver.livemd)
* 🌐 [Chapter 3.3: GenServer Examples](ch_3.3_genserver_examples.livemd)
* 🔧 [Chapter 3.4: Other GenServer Functions](ch_3.4_other_genserver_functions.livemd)
#### Part 4: Registry Module
* 📚 [Chapter 4: The Registry Module](ch_4.0_the_registry_module.livemd)
#### Part 5: Supervision
* 👥 [Chapter 5.1: Supervisors Introduction](ch_5.1_supervisors_introduction.livemd)
* 🔄 [Chapter 5.2: Supervision Strategies](ch_5.2_supervision_strategies.livemd)
* 🔄 [Chapter 5.3: Restart Strategies](ch_5.3_restart_strategies.livemd)
* 🚀 [Chapter 5.4: Introduction to Dynamic Supervisor](ch_5.4_introduction_to_dynamic_supervisor.livemd)
* 🗄️ [Chapter 5.5: Partition Supervisor Example](ch_5.5_partition_supervisor.ex.livemd)
* ⚖️ [Chapter 5.6: Scaling with Dynamic Supervisor](ch_5.6_scaling_dynamic_supervisor.livemd)
#### Part 6: Project: Building a Download Manager
* 🛠️ [Chapter 6: Building a Download Manager](ch_6.0_project_building_a_download_manager.livemd)
#### Part 7: Tasks
* ⚙️ [Chapter 7.1: Introduction to Tasks](ch_7.1_intro_to_tasks.livemd)
* 🔄 [Chapter 7.2: Awaiting Tasks](ch_7.2_awaiting_tasks.livemd)
* 🔀 [Chapter 7.3: Task Async Stream](ch_7.3_task_async_stream.livemd)
* 🛡️ [Chapter 7.4: Supervised Tasks](ch_7.4_supervised_tasks.livemd)
#### Part 8: Agents
* 🤖 [Chapter 8: Agents](ch_8.0_agents.livemd)
#### Part 9: Misc
* 💡 [Chapter 9: Gotchas](ch_9.0_gotchas.livemd)
---
So without further ado, let's [get started](ch_1.1_concurrency_in_elixir.livemd)... 🚀
================================================
FILE: chapters/ch_1.1_concurrency_in_elixir.livemd
================================================
# Concurrency in Elixir
## Concurrency vs Parallelism
Concurrency and parallelism are often used interchangeably but are two distinct concepts. Concurrency refers to the execution of multiple tasks that overlap in time, with each task being interrupted and resumed intermittently by the CPU (context switching). This can create an illusion of tasks running simultaneously, but in reality, they are taking turns executing on a single time-sliced CPU.
Parallelism, on the other hand, involves the simultaneous execution of multiple tasks on a hardware system that has multiple computing resources, such as a multi-core CPU. Parallelism allows tasks to run literally at the same time, without having to share CPU time.
In essence, concurrency deals with handling multiple tasks at once, while parallelism deals with actually performing multiple tasks at the same time. While a system can exhibit both concurrency and parallelism, it is possible to have a concurrent system that is not parallel.

In today's world, with machines having powerful multi-core CPUs, writing code that can run concurrently and in parallel can lead to huge performance benefits. We should strive to leverage the capabilities of modern hardware by writing code that can fully utilize them.
Furthermore, as we develop software, we often encounter problems that require background tasks and can benefit greatly from the use of parallel programming. Examples of such tasks include image processing, video transcoding, and making third-party API calls, to name a few.
## Concurrency and parallelism in Elixir
Due to the functional and immutable nature of Elixir, writing parallel and concurrent code becomes much simpler. Unlike many other languages that require locks and mutexes to handle issues related to shared state in parallel programming, Elixir's design mitigates these problems. As a result, parallel and concurrent code is a first-class citizen in Elixir, requiring less effort and complexity to implement effectively.
In Elixir, **the Erlang Virtual Machine (BEAM)** serves as the backbone for managing concurrency and parallelism. Let's take a closer look at how it works under the hood to provide us with these superpowers.
Elixir leverages lightweight processes that are expertly managed by the VM. These processes are not true OS processes, making them highly lightweight and allowing for thousands or even [millions](https://phoenixframework.org/blog/the-road-to-2-million-websocket-connections) of them to run concurrently without impacting performance.
This lightweight process model has given rise to several powerful applications, including the [Cowboy web server](https://github.com/ninenines/cowboy), which creates a process for every incoming web request to keep heavy work or errors within a single request from affecting others. Other examples include [Phoenix Channels](https://hexdocs.pm/phoenix/channels.html) and [Phoenix LiveView](https://hexdocs.pm/phoenix_live_view/Phoenix.LiveView.html), which employ an Erlang process per WebSocket connection.
## Processes in Elixir
In Elixir and Erlang, the term "processes" does not refer to operating system processes or threads. Instead, they are akin to [green threads](https://en.wikipedia.org/wiki/Green_thread#:~:text=In%20computer%20programming%2C%20a%20green,underlying%20operating%20system%20(OS).) or actors. These processes run concurrently on a single-core CPU and in parallel on multiple cores, managed and scheduled by the Erlang Virtual Machine.
Surprisingly, each process in Elixir and Erlang requires only around 300 words of memory and takes microseconds to start, making them incredibly lightweight. In fact, within the Erlang Virtual Machine, every entity executing code operates within a process.
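To get a feel for how cheap processes are, the following minimal sketch spawns a batch of idle processes and times how long that takes. The count of 10,000 is an arbitrary choice for illustration, and the measured time will vary from machine to machine.

```elixir
# Number of processes to start; an arbitrary value picked for this sketch.
count = 10_000

# :timer.tc/1 returns {elapsed_microseconds, result_of_the_function}.
{spawn_time_us, pids} =
  :timer.tc(fn ->
    for _ <- 1..count do
      # Each process simply blocks until it is told to stop.
      spawn(fn ->
        receive do
          :stop -> :ok
        end
      end)
    end
  end)

alive = Enum.count(pids, &Process.alive?/1)
IO.puts("Spawned #{alive} processes in #{spawn_time_us} microseconds")

# Tell every process to finish so they do not linger.
Enum.each(pids, &send(&1, :stop))
```

All of these processes sit idle in `receive` until they get the `:stop` message, so they cost memory but no CPU time while they wait.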
For instance, in [Phoenix](https://www.phoenixframework.org/), when making a regular HTTP request using Phoenix controllers, the corresponding connection is allocated its own process. This process is swiftly terminated once the response is sent and the connection is closed. In [LiveView](https://github.com/phoenixframework/phoenix_live_view), that process is kept alive because the connection is a long-lived WebSocket.
Each process is capable of executing code and possesses a **first-in-first-out** mailbox to which other processes can send messages. Likewise, it can send messages to other processes. Processes in Elixir and Erlang are inherently **sequential**, meaning they handle one message at a time.
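The first-in-first-out behaviour of the mailbox can be seen with a small sketch in which the current process sends itself three messages and then reads them back. The `{:job, n}` message shape is made up for this example.

```elixir
# Send three messages to the current process; they queue up in its mailbox.
for n <- 1..3, do: send(self(), {:job, n})

# Each receive takes the oldest matching message, so arrival order is preserved.
received =
  for _ <- 1..3 do
    receive do
      {:job, n} -> n
    after
      # Give up after one second instead of blocking forever.
      1_000 -> :timeout
    end
  end

# Prints: Order received: [1, 2, 3]
IO.inspect(received, label: "Order received")
```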
Similar to an operating system scheduler, the Erlang VM has the ability to start, pause, or preempt work as needed (In computing, preemption is the act of temporarily interrupting an executing task, with the intention of resuming it at a later time).
While waiting for a message, a process is completely ignored by the scheduler. As a result, **idle processes do not consume any CPU time**; they only occupy the memory already allocated to them.
#### Reductions
Erlang uses "reductions" as work units to decide when a process might be paused. A reduction in Erlang is a unit of work done by BEAM, including tasks like function application, arithmetic operations, and message passing. The scheduler monitors reductions for each process, pausing a process once it reaches a set reduction count, which lets another process take its turn to run. This ensures fairness in scheduling by preventing processes from hogging the CPU.
Additionally, reductions are applied flexibly based on the operation type. For instance, I/O operations consume reductions differently, allowing the scheduler to handle various operations effectively. Unlike traditional blocking I/O, Erlang's non-blocking model lets other processes keep running while one process waits on I/O, improving overall system performance.
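The reduction counter of a process can be read with `Process.info/2`. The sketch below reads the counter of the current process before and after some arbitrary work; the exact numbers depend on the runtime and will differ between runs.

```elixir
{:reductions, before_work} = Process.info(self(), :reductions)

# Some arbitrary work: the function calls and additions below all cost reductions.
Enum.reduce(1..100_000, 0, fn n, acc -> acc + n end)

{:reductions, after_work} = Process.info(self(), :reductions)

IO.puts("Reductions used by the work: #{after_work - before_work}")
```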
#### Scheduling in BEAM
BEAM, the underlying virtual machine, employs a single OS thread per core, and each thread runs its own scheduler. Every scheduler is responsible for pulling processes from its own run queue, with the BEAM being responsible for populating these queues with Erlang processes for execution.
(Note: To utilize more than one core, the Erlang Runtime System Application (ERTS) has to be built in SMP mode. SMP stands for Symmetric MultiProcessing, that is, the ability to execute a process on any one of multiple CPUs.)
The scheduler manages two queues: a ready queue containing processes that are prepared to run and a waiting queue containing processes that are waiting to receive a message.
When a process is selected from the ready queue, it is handed over to BEAM for the execution of one CPU time slice. BEAM interrupts the running process and places it at the end of the ready queue when the time slice expires. However, if the process is blocked in a receive operation before the time slice runs out, it is added to the waiting queue.
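Whether a process is currently running or waiting can be observed through the `:status` item of `Process.info/2`. The following sketch starts a process that blocks in `receive` and polls until it is reported as `:waiting`. The `poll_status` helper is written only for this example and is not a library function.

```elixir
# A process that does nothing but wait for a message.
waiter =
  spawn(fn ->
    receive do
      :wake -> :ok
    end
  end)

# Helper for this sketch: poll until the process reports the wanted status,
# giving up after a fixed number of retries.
poll_status = fn poll, pid, wanted, retries ->
  case Process.info(pid, :status) do
    {:status, ^wanted} ->
      wanted

    other when retries <= 0 ->
      other

    _ ->
      Process.sleep(10)
      poll.(poll, pid, wanted, retries - 1)
  end
end

waiter_status = poll_status.(poll_status, waiter, :waiting, 100)
{:status, own_status} = Process.info(self(), :status)

IO.inspect(waiter_status, label: "Process blocked in receive")
IO.inspect(own_status, label: "Current process")

# Wake the waiting process so it can finish.
send(waiter, :wake)
```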
###### Load balancer
A load balancer is also in place, responsible for executing migration logic to allocate processes across the run queues on separate cores. This logic assists in maintaining load balance by taking jobs away from overloaded queues (known as [task stealing](https://blog.stenmans.org/theBeamBook/#_task_stealing)) and assigning them to empty or underloaded queues (known as ["task migration"](https://blog.stenmans.org/theBeamBook/#_migration)).
In simpler terms, if one scheduler's queue becomes crowded because its processes take a long time, other schedulers step in to distribute the workload more evenly. Separately, if a process accumulates a high number of function calls (reductions) without completing, the scheduler preempts it: the process is frozen mid-run and sent back to the end of the run queue. This **preemptive multitasking** approach ensures that no single task can monopolize the system for an extended period, ensuring consistently **low latency**, a key feature of the BEAM.
The load balancer strives to maintain an equal maximum number of runnable processes across schedulers.
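Preemptive multitasking can be illustrated with a rough sketch: start more CPU-bound processes than there are schedulers, then check that another process still gets a turn. The `spin` function and the factor of two are assumptions made for this example, and the five second timeout is only a safety net.

```elixir
# A CPU-bound loop that never waits for a message or yields voluntarily.
spin = fn spin -> spin.(spin) end

# Start twice as many busy processes as there are online schedulers.
hogs =
  for _ <- 1..(System.schedulers_online() * 2) do
    spawn(fn -> spin.(spin) end)
  end

parent = self()

# This process has to compete with the busy processes for CPU time.
responder =
  spawn(fn ->
    receive do
      :ping -> send(parent, :pong)
    end
  end)

send(responder, :ping)

reply =
  receive do
    :pong -> :pong
  after
    5_000 -> :timeout
  end

# Stop the busy processes again.
Enum.each(hogs, &Process.exit(&1, :kill))

IO.inspect(reply, label: "Reply while CPU hogs were running")
```

Because the scheduler pauses each busy process once it has used up its reductions, the responder and the calling process still get scheduled and the reply arrives.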
Looking beyond Erlang's internal run queues, the operating system also manages the scheduling of threads onto CPU cores at an OS level. This means that processes can not only swap within Erlang's run queue but also undergo complete context switches or be relocated to different cores by the OS.

You can find the number of schedulers in your IEx session using the `System.schedulers/0` function.
```elixir
# Returns the number of schedulers in the VM.
System.schedulers()
# Returns the number of schedulers online in the VM.
# Here online means total number of schedulers which are active and actually being used.
System.schedulers_online()
```
#### Process priority
Erlang's priority system has four levels: low, normal, high, and max (reserved for Erlang's internal use). Each level has its own run queue and follows a round-robin scheduling method, except for max.
Processes in max or high priority queues are executed exclusively, blocking lower-priority processes until they're done. This design emphasizes efficiency for critical tasks but can cause bottlenecks if high-priority processes are overused, impacting overall application responsiveness.
Low and normal queues are more flexible, allowing interleaved execution without blocking each other. However, using high priority sparingly is crucial to avoid performance issues.
Additionally, Erlang permits communication across different priority levels, although a high-priority process waiting for a message from a lower-priority one will effectively lower its own priority.
Process priority can be changed in Elixir using `Process.flag(:priority, :high)`.
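As a small sketch of changing priority, the code below raises the priority inside a freshly spawned process and reports the value before and after back to the caller. The `{:priorities, previous, current}` message shape is invented for this example.

```elixir
parent = self()

spawn(fn ->
  # Process.flag/2 returns the previous value of the flag.
  previous = Process.flag(:priority, :high)
  {:priority, current} = Process.info(self(), :priority)
  send(parent, {:priorities, previous, current})
end)

{previous, current} =
  receive do
    {:priorities, previous, current} -> {previous, current}
  after
    1_000 -> {:timeout, :timeout}
  end

IO.inspect({previous, current}, label: "Priority before and after")
```

Changing the flag inside a short-lived helper process keeps the experiment away from the priority of the calling process.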
## Resources
* https://medium.com/flatiron-labs/elixir-and-the-beam-how-concurrency-really-works-3cc151cddd61
* https://blog.stenmans.org/theBeamBook/#CH-Scheduling
* https://blog.appsignal.com/2024/04/23/deep-diving-into-the-erlang-scheduler.html
* https://fly.io/phoenix-files/a-liveview-is-a-process/
* https://underjord.io/unpacking-elixir-concurrency.html
================================================
FILE: chapters/ch_1.2_immutability_and_memory_management.livemd
================================================
# Immutability and memory management
## Immutability in Elixir
In Elixir, variables function as **labels** that refer to specific values. These values are immutable, meaning they cannot be changed. However, the label or variable can be reassigned to a different value. This provides the flexibility to bind the same value to multiple labels or variables.
In Erlang, it is not possible to reassign or rebind a variable. Attempting to do so will result in an error, as demonstrated in the following code:
```erlang
X = 5,
X = X * 10. % throws an exception error: no match of right hand side value 50
```
The error in the Erlang code occurs because the variable X is initially bound to the value 5, and an attempt is then made to rebind it to a new value (X * 10). Erlang variables are single-assignment, so this is not allowed and the expression fails at runtime with a `badmatch` error.
On the other hand, Elixir allows variables to be rebound, which makes the equivalent code valid in Elixir. In both languages, **the = operator functions as a match operator rather than an assignment operator**. The difference is that Erlang never rebinds a variable that is already bound, while Elixir rebinds variables on the left-hand side unless they are pinned.
Therefore, in the Erlang code `X = X * 10`, the left-hand side of the match (X) is already bound to the value of 5, and trying to match it with the right-hand side (X * 10) which evaluates to 50, results in a mismatch.
In Elixir, the ^ (pin) operator forces a match against a variable's current value instead of rebinding it, while a plain = rebinds the variable. For example:
```
iex(1)> x = 5
5
iex(2)> x = x * 10 # Assignment
50
iex(3)> ^x = x * 10 # Matching
** (MatchError) no match of right hand side value: 500
(stdlib 4.0.1) erl_eval.erl:496: :erl_eval.expr/6
iex:3: (file)
```
In Elixir, any input passed into a function to be transformed creates a new value without modifying the original value. This allows for safe concurrent access to the same data by multiple processes. Since there is **no shared memory** that is getting mutated by multiple processes, concurrency is easier to manage. Any transformation on the original data will result in new data being created. Processes do not share state, they can only communicate asynchronously through message passing. This ensures it is safe to run them at the same time.
## Example of immutability & closures in Elixir
```elixir
list = [1, 2, 3, 4]
# Returns a new list
Enum.filter(list, fn num -> num > 2 end)
# [1, 2, 3, 4] Original list remains unchanged
list
```
```elixir
x = 1
# An anonymous function
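anon = fn ->
  # The closure captures the value x had when the function was created
  IO.puts(x)
  # This rebinding is local to the function body and is not visible outside
  x = 0
end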
# Outputs 1
anon.()
# Outputs 1
IO.puts(x)
x = 5
# Outputs 1
anon.()
```
## Persistent Data Structures
At this point you might be wondering whether performing a full copy of the entire data whenever something changes would be an expensive and slow operation that leads to a high performance overhead.
To solve this problem there is a class of data structures known as **persistent data structures**.
Persistent data structures are data structures that allow for the efficient storage and retrieval of data even after multiple modifications or updates have been made. These data structures preserve the previous versions of data and allow for efficient access to those versions.
A persistent data structure **maintains the previous versions of data** by using a technique known as structural sharing. Structural sharing allows the data structure to **share the unchanged parts of its structure** across multiple versions of the data, rather than copying the entire structure each time a modification is made. This sharing of unchanged structure makes persistent data structures efficient in both time and space.
Persistent data structures are widely used in functional programming languages, where immutability is a core concept.
Under the hood, the BEAM leverages persistent data structures in order to provide immutability as a first-class citizen while not having to copy the entire data structure any time something changes (except when data is passed between processes or when data is extracted from native data stores like ETS).
For example, in Elixir lists are actually linked lists.
A linked list is a Tree with one branch.
```
Elixir: list = [1, 2, 3, 4]
Tree: 1 -> 2 -> 3 -> 4
Every time you prepend an element to a list, it will share its tail:
Elixir: [0 | list]
Tree: 0 -> (1 -> 2 -> 3 -> 4)
```
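Structural sharing can be made visible with a short sketch. It relies on `:erts_debug.size/1` and `:erts_debug.flat_size/1`, which belong to an Erlang debugging module intended for inspection rather than production use; the first counts shared parts once, the second counts the term as if every part were copied.

```elixir
list = Enum.to_list(1..1_000)

# Prepending creates one new cell whose tail is the existing list.
longer = [0 | list]
IO.inspect(tl(longer) == list, label: "Tail equals the original list")

# A tuple that refers to the same list twice.
pair = {list, list}

# Size in words with sharing taken into account.
shared_words = :erts_debug.size(pair)
# Size in words if the term were fully copied, for example when sent to another process.
copied_words = :erts_debug.flat_size(pair)

IO.inspect({shared_words, copied_words}, label: "Words with sharing vs. fully copied")
```

The shared size is noticeably smaller because the tuple only points at the one list twice instead of holding two copies.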
## High level overview of garbage collection in Elixir
Erlang employs a generational copying garbage collection system where each process has its own private heap. This heap is divided into two segments - the young and old generations. Newly allocated data resides in the young generation, while data that has survived multiple garbage collection cycles is stored in the old generation.
The young generation undergoes more frequent garbage collection, whereas the old generation is only garbage collected during a full sweep, which occurs after a certain number of generational GC cycles. It can also be collected if not enough memory is reclaimed or if manually invoked. This process is referred to as **soft real-time** garbage collection because it halts only the process undergoing GC without affecting other processes. This characteristic is well-suited for soft real-time systems, as the entire runtime doesn't need to pause.
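The garbage collection settings and counters of a single process can be inspected with `Process.info/2`. This is a minimal sketch; the exact keys in the returned list are an implementation detail of the runtime and may change between OTP releases.

```elixir
{:garbage_collection, gc_info} = Process.info(self(), :garbage_collection)

# Typical entries include :min_heap_size, :fullsweep_after and :minor_gcs.
IO.inspect(gc_info, label: "GC info for the current process")
IO.inspect(gc_info[:fullsweep_after], label: "fullsweep_after")

# Manually request a collection of the current process only; other processes keep running.
true = :erlang.garbage_collect()
```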
In addition to this, Erlang implements [reference counting garbage collection](https://en.wikipedia.org/wiki/Garbage_collection_\(computer_science\)#Reference_counting) for the shared heap, which holds data such as large binaries that several processes can reference. Here, objects in the shared heap are assigned reference counters that keep track of the number of references held to them. When an object's reference count drops to zero, it's considered inaccessible and gets destroyed.
For a deeper dive into garbage collection, you can explore this informative [video](https://www.youtube.com/watch?v=OSdaXNQ0xhQ) and this insightful [article](https://hamidreza-s.github.io/erlang%20garbage%20collection%20memory%20layout%20soft%20realtime/2015/08/24/erlang-garbage-collection-details-and-why-it-matters.html).
## Resources
* https://stackoverflow.com/questions/30203227/does-elixir-have-persistent-data-structures-similar-to-clojure
* https://elixirpatterns.dev/
* https://gist.github.com/josevalim/ce2f5871a96b6cbcf2c1
* https://elixirforum.com/t/how-would-you-explain-elixir-immutability/47323/11
* [Video on garbage collection](https://www.youtube.com/watch?v=OSdaXNQ0xhQ)
* [Erlang garbage collection details and why it matters](https://hamidreza-s.github.io/erlang%20garbage%20collection%20memory%20layout%20soft%20realtime/2015/08/24/erlang-garbage-collection-details-and-why-it-matters.html)
================================================
FILE: chapters/ch_2.1_process_internals.livemd
================================================
# Process Internals
## What are processes?
A process is a **self-contained** entity where code is executed. It safeguards the system from errors in our code by restricting the effects of the error to the process that is executing the faulty code. Processes have their own address space and can communicate with other processes via signals and messages, and their execution is managed by a preemptive scheduler.
It is important to note that Elixir processes are not the same as operating system processes. Elixir processes are remarkably **lightweight** in terms of memory and CPU usage, even when compared to threads in other programming languages. Therefore, it is not uncommon to run tens or even [hundreds of thousands](https://phoenixframework.org/blog/the-road-to-2-million-websocket-connections) of processes simultaneously.
## Internals of a process
Let's explore the structure of an Elixir process at a high level.
An Elixir process consists of four primary memory blocks: the **stack**, the **heap**, the message area (also known as the **mailbox**), and the **Process Control Block** (PCB). The stack is responsible for tracking program execution by storing return addresses, passing function arguments, and keeping local variables. The heap, on the other hand, stores larger structures such as lists and tuples.
The message area or mailbox is used to hold messages sent from other processes to the target process. The PCB maintains the state of the process, while the stack, heap, and mailbox are dynamically allocated and can grow or shrink based on usage. Conversely, the PCB is statically allocated and contains several fields that control the process.
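The sizes of these memory areas can be observed from the outside with `Process.info/2`. In the sketch below, a process waits for `:stop` while two other messages, chosen arbitrarily, stay unread in its mailbox. Heap and stack sizes are reported in words.

```elixir
pid =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

# These messages do not match the receive clause, so they stay in the mailbox.
send(pid, :hello)
send(pid, :world)

info = Process.info(pid, [:stack_size, :heap_size, :total_heap_size, :message_queue_len])
IO.inspect(info, label: "Memory areas (in words) and mailbox length")

# Let the process finish.
send(pid, :stop)
```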
**Message passing** is the primary means of communication between Elixir processes. When one process sends a message to another, the message is copied from the sender's heap to the recipient's mailbox. In certain circumstances, such as when the receiving process is suspended and no other processes are attempting to send it messages, the message may be copied directly to the recipient's heap. In other cases, the message is stored in an m-buf and moved to the heap after a garbage collection. M-bufs are variable-length heap fragments, and a process may have several m-bufs.
(It is worth noting that in the early versions of Erlang, parallelism was not available, so only one process could execute at any given time. In such versions, the sending process could write directly to the heap of the receiving process. However, with the rise of multicore systems, message copying across process heaps is managed using locks and queues. To learn more about this topic, please see this [article](https://blog.stenmans.org/theBeamBook/#_the_process_of_sending_a_message_to_a_process).)
## Resources
* https://elixir-lang.org/getting-started/processes.html
* https://hexdocs.pm/elixir/1.12/Process.html
* https://www.erlang-solutions.com/blog/understanding-processes-for-elixir-developers/
================================================
FILE: chapters/ch_2.2_process_basics.livemd
================================================
# Process Basics
## Process introspection
To list the processes in a running system, call `:shell_default.i()`.
You might notice that many processes have a heap size of 233 words; that is the default initial heap size of a process.
A large heap size means the process uses a lot of memory, and a large reduction count means the process has executed a lot of code.
You can get a lot more information about a process using `Process.info/1`.
```elixir
Process.whereis(:code_server)
pid = Process.whereis(:code_server)
Process.info(pid)
```
`Process.info/2` can be used to view additional information such as the backtrace: `Process.info(pid, :backtrace)`.
The [observer](https://elixir-lang.org/getting-started/debugging.html#observer) is also a great tool to observe processes.
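The same information can also be gathered programmatically. As a rough sketch, the snippet below lists the five processes that have executed the most reductions so far; the choice of five and of the reported fields is arbitrary.

```elixir
top_five =
  Process.list()
  |> Enum.map(fn pid ->
    {pid, Process.info(pid, [:registered_name, :reductions, :heap_size])}
  end)
  # A process may exit between Process.list/0 and Process.info/2, which then returns nil.
  |> Enum.reject(fn {_pid, info} -> is_nil(info) end)
  |> Enum.sort_by(fn {_pid, info} -> info[:reductions] end, :desc)
  |> Enum.take(5)

Enum.each(top_five, fn {pid, info} ->
  IO.inspect({pid, info[:registered_name], info[:reductions], info[:heap_size]})
end)
```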
## Process Dictionary
There is actually one more memory area in a process where Erlang terms can be stored, the *Process Dictionary*.
The *Process Dictionary* (PD) is a process local key-value store. One advantage with this is that all keys and values are stored on the heap and there is no copying as with `send/2` or an ETS table.
([ETS](https://elixirschool.com/en/lessons/storage/ets#overview-0), or Erlang Term Storage, is an in-memory store for Elixir and Erlang terms that ships with the runtime. ETS is capable of storing large amounts of data and offers constant-time data access. Tables in ETS are created and owned by individual processes. When an owner process terminates, its tables are destroyed.)
```elixir
# Stores the given key-value pair in the process dictionary.
Process.put(:count, 1)
Process.put(:locale, "en")
```
```elixir
# Returns the value for the given key in the process dictionary
Process.get(:count)
```
```elixir
# Returns all keys in the process dictionary
Process.get_keys()
```
```elixir
# Deletes the given key from the process dictionary
Process.delete(:count)
```
```elixir
# Returns all key-value pairs in the process dictionary.
Process.get()
```
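Because the process dictionary is local to each process, a value stored in one process is not visible in another. The following sketch demonstrates this with a spawned process; the `{:child_sees, value}` message shape is made up for this example.

```elixir
Process.put(:locale, "en")
parent = self()

spawn(fn ->
  # The new process starts with its own dictionary, which has no :locale key.
  send(parent, {:child_sees, Process.get(:locale)})
end)

child_value =
  receive do
    {:child_sees, value} -> value
  after
    1_000 -> :timeout
  end

# Prints: Parent: "en"
IO.inspect(Process.get(:locale), label: "Parent")
# Prints: Child: nil
IO.inspect(child_value, label: "Child")
```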
## Spawning processes
The most fundamental way to create processes and communicate between them in Elixir is by using [spawn/1](https://hexdocs.pm/elixir/1.12/Kernel.html#spawn/1), [receive/1](https://hexdocs.pm/elixir/1.12/Kernel.SpecialForms.html#receive/1), and [send/2](https://hexdocs.pm/elixir/1.12/Kernel.html#send/2). They enable us to spawn a process, wait for messages, and send messages to a process, respectively.
Many higher-level abstractions, such as Task, GenServer, and Agent, are built on top of these primitives.
`spawn/1` and `send/2` are part of the [Kernel](https://hexdocs.pm/elixir/1.12/Kernel.html) module, and `receive/1` is a special form from `Kernel.SpecialForms`. All of them are automatically available, so we can call them directly without a `Kernel.` prefix.
Let's take a look at some examples of their usage...
```elixir
# Spawn a process, by passing it a function to execute.
# spawn/1 returns the pid (process identifier) of the spawned process
pid = spawn(fn -> IO.puts("Hello world") end)
# Once the process has finished executing it will exit
Process.alive?(pid) |> IO.inspect()
# Sleep for 100ms to wait for process to exit
:timer.sleep(100)
Process.alive?(pid)
```
When spawning a process it goes through a lifecycle like so...
```mermaid
flowchart LR
spawn --> NewProcess --> ExecuteCallbackFunction --> Dead
```
## Exchanging messages between processes
To exchange messages between processes in Elixir, we can use the `send/2` and `receive/1` functions.
When a process uses `send/2` to send a message, it **doesn't block** - instead, the message is placed in the recipient's mailbox, and the sending process continues.
On the other hand, when a process uses `receive/1`, it blocks until a matching message is found in its mailbox. The call to `receive/1` searches the mailbox for a message that matches any of the given patterns.
`receive/1` supports guards and multiple clauses, just like `case/2`.
Let's look at an example of a process sending a message to itself.
```elixir
# Get the pid of the current process
self_pid = self()
# Send a message to the current process
send(self_pid, :ping)
# Check messages in mailbox without consuming them
Process.info(self_pid, :messages) |> IO.inspect(label: "Messages in mailbox")
# Receive the message waiting in mailbox (consumes the message in the mailbox)
receive do
:ping -> IO.puts(:pong)
end
# Check messages in mailbox again
Process.info(self_pid, :messages) |> IO.inspect(label: "Messages in mailbox")
```
An optional `after` clause can be given to handle the case where no matching message is received within the given timeout period, specified in milliseconds.
(If a timeout of `0` is given, only messages that are already present in the mailbox are considered.)
```elixir
receive do
{:message, message} when message in [:start, :stop] -> IO.puts(message)
_ -> IO.puts(:stderr, "Unexpected message received")
after
1000 -> IO.puts(:stderr, "Timeout, no message in 1 second")
end
```
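A timeout of `0` is handy for checking the mailbox without blocking. As a rough sketch, the hypothetical `Mailbox.drain/1` helper below (not part of Elixir) collects every `{:demo, message}` tuple that is already waiting and returns as soon as none are left.

```elixir
defmodule Mailbox do
  # Hypothetical helper: collects all {:demo, message} tuples currently in the
  # mailbox and returns them in arrival order without ever blocking.
  def drain(acc \\ []) do
    receive do
      {:demo, message} -> drain([message | acc])
    after
      # No matching message is waiting, so stop immediately
      0 -> Enum.reverse(acc)
    end
  end
end

send(self(), {:demo, :first})
send(self(), {:demo, :second})

drained = Mailbox.drain()
IO.inspect(drained, label: "Drained messages")

# The mailbox holds no more matching messages, so this returns right away
drained_again = Mailbox.drain()
IO.inspect(drained_again, label: "Drained again")
```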
In the IEx shell, the `flush/0` helper consumes and prints all the messages in the mailbox of the shell process.
```elixir
send(self(), :hello)
Process.info(self_pid, :messages) |> IO.inspect(label: "Messages in mailbox before flush")
# In the iex shell we won't have to use the `IEx.Helpers.` prefix since these helper functions are imported automatically
IEx.Helpers.flush()
Process.info(self_pid, :messages) |> IO.inspect(label: "Messages in mailbox after flush")
```
================================================
FILE: chapters/ch_2.3_process_linking.livemd
================================================
# Process Linking & Trapping Exits
## Process Linking
In Elixir, when we create a process, we have the option to link it to its parent process. This means that if the child process encounters an error and fails, the parent process will be notified.
When we use the `spawn/1` function to create a process, it will not be linked to its parent process. As a result, if the child process encounters an error and fails, the parent process will not be notified.
To ensure that the parent process is notified of any errors in the child process, we can use the `spawn_link/1` function instead. This function creates a linked process, so if the child process crashes, the parent process will receive an EXIT signal.
To illustrate this, let's consider an example...
```elixir
unlinked_child_process = spawn(fn -> raise("BOOM! Unlinked process crashed!") end) |> IO.inspect()
IO.inspect(Process.info(self(), :links))
:timer.sleep(100)
IO.puts("Parent process still alive!")
```
In the above example we can see that the parent process is still alive after the spawned process crashes. Let's see what happens when the processes are linked.
(Uncomment the code below and run it. After running it, comment it out again.
Since the code below crashes the Livebook evaluation process, it needs to stay commented out in order to run the rest of the code in this chapter.)
```elixir
# linked_child_process = spawn_link(fn ->
# :timer.sleep(100)
# raise("BOOM! Linked process crashed!")
# end)
# |> IO.inspect(label: "Linked process PID")
# IO.inspect Process.info(self(), :links)
# :timer.sleep(200)
# IO.puts "Parent process still alive!"
# Child process <-> parent process <-> Livebook evaluation process
```
This time "Parent process still alive!" is never printed, because when the linked process crashes it brings down the parent process with it.
In our case this also brings down the Livebook process that the parent is linked to.
When a linked process exits gracefully with the reason `:normal`, the parent process does not crash. Any reason other than `:normal` is considered an abnormal termination and causes the linked processes to exit as well.
When a process reaches the end of its function, it exits with the reason `:normal` by default.
```elixir
linked_process =
spawn_link(fn ->
exit(:normal)
Process.sleep(60000)
end)
:timer.sleep(100)
IO.inspect(Process.alive?(linked_process), label: "Linked process alive?")
```
Linking can also be done manually by calling `Process.link/1`. Let's see a bigger example...
```elixir
defmodule LinkingProcess do
def call do
child_process = spawn(&recursive_link_inspectior/0)
IO.inspect(self(), label: "Parent process PID")
IO.inspect(child_process, label: "Child process PID")
IO.inspect(Process.info(self(), :links), label: "Parent process links")
send(child_process, :inspect_links)
# Wait for the child process to print its links
:timer.sleep(100)
# Link the two processes
Process.link(child_process)
:timer.sleep(100)
IO.inspect(Process.info(self(), :links), label: "Parent process links")
send(child_process, :inspect_links)
end
defp recursive_link_inspectior do
receive do
:inspect_links ->
links = Process.info(self(), :links)
IO.inspect(links, label: "Child process links")
end
recursive_link_inspectior()
end
end
LinkingProcess.call()
```
When a process is linked to others, a crash in that process can trigger a cascade effect, potentially causing multiple other linked processes to crash as well. For instance, imagine a scenario where five processes (P1 to P5) are linked as follows:
`P1 <-> P2 <-> P3 <-> P4 <-> P5`
If any of these processes crashes, all five will fail because of how they are linked. For example, if P4 crashes, it will cause P3 and P5 to crash as well. This, in turn, will lead to the failure of P2, which will ultimately cause P1 to fail as well.
It's important to remember that **process links are bidirectional**, which means that if one process fails, it will affect the other processes as well.
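The cascade can be observed with a shorter chain. The sketch below builds `P1 <-> P2 <-> P3`, with P1 deliberately spawned without a link so that the crash cannot reach the calling process; the message tags `:p2` and `:p3` are just illustrative.

```elixir
parent = self()

# P1 is spawned WITHOUT a link, so the chain cannot take down the caller
p1 =
  spawn(fn ->
    p2 =
      spawn_link(fn ->
        p3 = spawn_link(fn -> Process.sleep(:infinity) end)
        send(parent, {:p3, p3})
        Process.sleep(:infinity)
      end)

    send(parent, {:p2, p2})
    Process.sleep(:infinity)
  end)

# Collect the pids of the two nested processes
p2 =
  receive do
    {:p2, pid} -> pid
  after
    1_000 -> raise "did not receive the P2 pid"
  end

p3 =
  receive do
    {:p3, pid} -> pid
  after
    1_000 -> raise "did not receive the P3 pid"
  end

# Crash the process at the far end of the chain
Process.exit(p3, :boom)
Process.sleep(100)

alive = Enum.map([p1, p2, p3], &Process.alive?/1)
IO.inspect(alive, label: "P1, P2, P3 alive?")
```

The exit signal travels from P3 to P2 and then to P1, so none of the three processes survives.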
### Importance of process linking
Processes and links play an important role when building fault-tolerant systems. Elixir processes are isolated and don’t share anything by default. Therefore, a failure in a process will never crash or corrupt the state of another process. Links, however, allow processes to establish a relationship in case of failure. We often link our processes to supervisors which will detect when a process dies and start a new process in its place.
While other languages would require us to catch/handle exceptions, in Elixir we are actually fine with letting processes fail because we expect supervisors to properly restart our systems. “Failing fast” (sometimes referred to as “let it crash”) is a common philosophy when writing Elixir software!
### Trapping EXITS
If we want to prevent a process from crashing when a linked process exits, we can do so by trapping exits.
Normally, when a process finishes its work it implicitly exits with the reason `:normal`, which tells its linked processes that its job has been done. Any other argument to `exit/1` is treated as an abnormal termination.
Setting the `:trap_exit` flag to `true` means that **exit signals received by a process are converted into messages** of the form `{:EXIT, from, reason}`. These messages can then be received like any other message in the process's mailbox. On the other hand, if `:trap_exit` is `false` (the default), the process will exit if it receives an exit signal with a reason other than `:normal`, and the signal will be passed on to any processes that are linked to it.
By using `trap_exit` and linking processes, we can prevent the failure of one process from causing the failure of another. This allows the linked process to handle the termination of the other process gracefully, rather than being abruptly terminated itself.
As always, let's look at an example to understand this better...
```elixir
# Start trapping exit for the current process
Process.flag(:trap_exit, true)
# A linked process that will exit abnormally with a reason :boom
p = spawn_link(fn -> exit(:boom) end)
:timer.sleep(100)
# Is the child process alive?
IO.inspect(Process.alive?(p), label: "Child process alive?")
# Check how the parent process that is trapping exit received an EXIT message
Process.info(self(), :messages) |> IO.inspect(label: "Messages in parent process mailbox")
```
As the output shows, the crash of the linked process does not crash the parent process because the parent is trapping exits. Instead, the parent receives a message like `{:EXIT, linked_process_pid, :boom}` in its mailbox.
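In practice the `:EXIT` message is usually consumed with `receive` rather than inspected through `Process.info/2`. A minimal sketch, which also restores the previous value of the flag afterwards (variable names are illustrative):

```elixir
# Process.flag/2 returns the previous value, which lets us restore it later
previous = Process.flag(:trap_exit, true)

pid = spawn_link(fn -> exit(:boom) end)

result =
  receive do
    # Pin the pid so we only match the exit of the process we just linked
    {:EXIT, ^pid, reason} -> {:exited, reason}
  after
    1_000 -> :timeout
  end

IO.inspect(result, label: "Result")

# Go back to the previous trapping behaviour
Process.flag(:trap_exit, previous)
```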
It is generally recommended to avoid trapping exits as it can modify the normal behavior of processes. Instead, it is recommended to utilize monitors and supervisors to handle failures.
When a process traps exits, exit signals no longer terminate it unless the exit reason is `:kill`. Let's look at an example...
```elixir
# Un-killable exit trapper process
p =
spawn(fn ->
Process.flag(:trap_exit, true)
:timer.sleep(:infinity)
end)
IO.inspect(Process.alive?(p), label: "Process alive initially")
Process.exit(p, :normal)
:timer.sleep(100)
IO.inspect(Process.alive?(p), label: "After :normal exit signal")
Process.exit(p, :boom)
:timer.sleep(100)
IO.inspect(Process.alive?(p), label: "After :boom exit signal")
# Only a :kill exit signal can kill a process that's trapping exits.
Process.exit(p, :kill)
:timer.sleep(100)
IO.inspect(Process.alive?(p), label: "After :kill exit signal")
```
In Elixir, the `:normal` and `:kill` are special exit reasons. `:normal` signifies a successful and expected process termination. On the other hand, `:kill` is non-trappable, causing immediate process termination. Any other termination reasons are informational and can be trapped if necessary.
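Note that `Process.exit(pid, :normal)` does not terminate `pid` when it is a process other than the caller (`self()`). If `pid` is not trapping exits the signal is silently ignored, and if it is trapping exits (as in the example above) the signal arrives as an `{:EXIT, from, :normal}` message. This is an edge case.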
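For a process that traps exits, a `:normal` exit signal sent with `Process.exit/2` shows up as an ordinary message instead of terminating it. The sketch below forwards that message back to the caller so it can be inspected; the `:ready` and `:trapped` tags are illustrative.

```elixir
parent = self()

pid =
  spawn(fn ->
    Process.flag(:trap_exit, true)
    # Tell the parent that exits are now being trapped
    send(parent, :ready)

    receive do
      {:EXIT, from, reason} -> send(parent, {:trapped, from, reason})
    end
  end)

receive do
  :ready -> :ok
after
  1_000 -> raise "process did not start"
end

# Sent from another process, so this does not terminate pid
Process.exit(pid, :normal)

result =
  receive do
    {:trapped, from, reason} -> {:trapped, from, reason}
  after
    1_000 -> :timeout
  end

IO.inspect(result, label: "Message seen by the trapping process")
```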
## Resources
* https://eddwardo.github.io/posts/links-in-elixir/
================================================
FILE: chapters/ch_2.4_process_monitoring_and_hibernation.livemd
================================================
# Process Monitoring and Hibernation
## Process Monitors
Process **links are bidirectional**, which means that if a linked process exits, it will also bring down the current process. However, if we only want the current process to be notified when a process has exited, instead of linking, we can use monitors.
Unlike linking, **monitoring is unidirectional**. If there is an error in a monitored process, it does not bring down the current process. Instead, the current process is notified via a `{:DOWN, ref, :process, pid, reason}` message, where `ref` is the monitor reference returned by `Process.monitor/1`.
It's worth noting that even when the monitored process exits normally, we still receive a message. In the case of process linking, a process is only notified if the linked process exits abnormally (i.e., with a reason other than `:normal`).
Let's look at an example of process monitoring...
```elixir
pid = spawn(fn -> :timer.sleep(10000) end)
Process.monitor(pid)
Process.exit(pid, :boom)
:timer.sleep(100)
IO.inspect(Process.info(self(), :messages))
```
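To see that a normal exit is reported too, here is a small sketch using `spawn_monitor/1`, which spawns and monitors in one step so the process cannot finish before the monitor is in place.

```elixir
# spawn_monitor/1 returns both the pid and the monitor reference
{pid, ref} = spawn_monitor(fn -> :ok end)

exit_reason =
  receive do
    # Pin both values so only the message for this monitor matches
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect(exit_reason, label: "Monitored process exited with reason")
```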
## Process hibernation
We can call `Process.hibernate/3` to hibernate a process.
From the official [documentation](https://erlang.org/doc/man/erlang.html#hibernate-3)
> Puts the calling process into a **wait state** where its **memory allocation has been reduced as much as possible**. This is useful if the process does not expect to receive any messages soon. The process is awaken when a message is sent to it, and control resumes in Module:Function with the arguments specified by Args with the call stack emptied.
> In more technical terms, `erlang:hibernate/3` discards the call stack for the process, and then **garbage collects** the process. After this, all live data is in one continuous heap. The heap is then shrunken to the exact same size as the live data that it holds.
### When is process hibernation useful?
Hibernation of a process can be beneficial in situations where the process should not be terminated but is not expected to receive any messages anytime soon. Hibernating the process garbage collects it and shrinks its heap, which frees up memory that was allocated to the process and prevents unnecessary resource usage.
Some practical examples where hibernation can be useful include occasionally used processes that should not be dropped, as doing so may be interpreted as a network disconnection by the client. Additionally, any process that is expensive to reinitialize may also be a good candidate for hibernation.
Let's see an example...
```elixir
p1 =
spawn(fn ->
_big_binary = :crypto.strong_rand_bytes(1000)
:timer.sleep(:infinity)
end)
p2 =
spawn(fn ->
_big_binary = :crypto.strong_rand_bytes(1000)
Process.hibernate(IO, :puts, ["P2 woken from hibernation"])
# This never executes as execution resumes at the function passed to Process.hibernate/3
IO.puts("Kabooom!")
end)
Process.info(p1, :total_heap_size) |> IO.inspect(label: "Heap size of P1")
Process.info(p2, :total_heap_size) |> IO.inspect(label: "Heap size of P2")
# Wake p2 from hibernation by sending it a message
send(p2, :msg)
:timer.sleep(100)
# Here the process is no longer alive since after executing the IO.puts/1
# call it has no other work and exits normally.
Process.alive?(p2) |> IO.inspect(label: "P2 alive")
```
## Resources
* https://elixirforum.com/t/when-is-hibernation-of-processes-useful/23181/5
* https://hexdocs.pm/elixir/1.12.3/Process.html
================================================
FILE: chapters/ch_2.5_group_leaders_and_process_naming.livemd
================================================
# Group leaders and naming processes
## Group Leader
In Erlang, every process belongs to a process group, and each group has a group leader. The group leader is **responsible for handling I/O** for the processes in its group. When a process is spawned, it inherits the same group leader as its parent process.
At system start-up, the init process (the first process, which coordinates the start-up of the system) is both its own group leader and the group leader of all processes.
The Erlang VM **models I/O devices as processes**, which enables different nodes in the same network to exchange file processes and read/write files between nodes. The group leader can be configured per process and is used in different situations. For example, when executing code in a remote terminal, it ensures that messages from a remote node are redirected and printed in the terminal that triggered the request.
The **main responsibility of the group leader is to collect I/O output from all processes in its group and pass it to or from the underlying system**. It essentially owns the standard input, standard output, and standard error channels on behalf of the group.
When a file is opened using `File.open/2`, it returns a tuple like `{:ok, io_device}`, where `io_device` is the PID of the process that handles the file. This process monitors the process that opened the file (the owner process), and if the owner process terminates, the file is closed, and the process itself terminates too.
```elixir
{:ok, io_device_pid} = File.open("test.csv", [:write])
IO.write(io_device_pid, "a binary")
```
When you call `IO.write(pid, binary)`, the IO module sends a message to the process identified by `pid` with the desired operation, such as `:put_chars`.
The message has the following structure: `{:io_request, from, reply_as, {:put_chars, :unicode, "hello"}}`, where `from` is the pid of the caller and `reply_as` is a term that the device includes in its reply.
When you write to `:stdio`, you are actually sending a message to the group leader, which writes to the standard-output file descriptor.
Therefore, these three code snippets are equivalent:
```elixir
IO.puts "hello"
IO.puts :stdio, "hello"
IO.puts Process.group_leader, "hello"
```
To understand this better, let's look at some examples.
Suppose we have two Erlang nodes named "node1" and "node2".
You can start an `iex` shell for each of them like this:
```elixir
iex --sname node1@localhost
iex --sname node2@localhost
```
(Note: To send messages between nodes running on different machines, the named nodes must be started with a shared cookie.)
If we execute the following code in the iex shell of node1:
```elixir
Node.spawn_link(:node2@localhost, fn ->
IO.puts("I will be executed on node2 but printed on node1 since the group leader is node1")
end)
```
The output of the IO.puts operation will be sent to the group leader, which in this case is node1.
Therefore, the output will be printed on node1's standard output stream, even though the process that performed the operation is running on node2.
On the other hand, if we specify the device PID as the `:init` process on node2, the output will be seen on node2's standard output stream:
```elixir
Node.spawn_link(:node2@localhost, fn ->
init_process_pid = Process.whereis(:init)
IO.puts(
init_process_pid,
"I will be executed on node2 and printed on node2 since the device ID passed was node2's init process"
)
end)
```
Finally, we can also set the group leader of a process explicitly by calling `Process.group_leader/2`.
In the following example, we set the group leader of the process running on node2 to node2's `:init` process:
```elixir
Node.spawn_link(:node2@localhost, fn ->
init_process_pid = Process.whereis(:init)
Process.group_leader(self(), init_process_pid)
IO.puts(
"I will be executed on node2 and printed on node2 since the group leader is set to node2's init process"
)
end)
```
In this case, the output of the `IO.puts` operation will be sent to node2's `:init` process, which is the new group leader of the process.
Therefore, the output will be printed on node2's standard output stream.
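The same mechanism can be tried on a single node. As a minimal sketch, the snippet below points the group leader of a spawned process at a `StringIO` device (an in-memory IO process from Elixir's standard library), so whatever that process prints is captured instead of reaching the terminal. The variable names and the `:done` message are illustrative.

```elixir
# An in-memory IO device that will act as the group leader
{:ok, device} = StringIO.open("")
parent = self()

spawn(fn ->
  # From now on, IO calls without an explicit device go to the StringIO process
  Process.group_leader(self(), device)
  IO.puts("captured")
  send(parent, :done)
end)

receive do
  :done -> :ok
after
  1_000 -> raise "spawned process did not finish"
end

# StringIO.contents/1 returns {remaining_input, written_output}
{_input, output} = StringIO.contents(device)
IO.inspect(output, label: "Captured output")
```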
## Process naming
We can name processes and then refer to them via their registered name.
```elixir
Process.register(self(), :my_process)
Process.registered()
|> Enum.any?(&(&1 == :my_process))
|> IO.inspect(label: ":my_process registered?")
```
```elixir
send(:my_process, "Hello")
Process.info(self(), :messages)
```
```elixir
Process.unregister(:my_process)
Process.registered()
|> Enum.any?(&(&1 == :my_process))
|> IO.inspect(label: ":my_process registered?")
```
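Any process can be registered, not only the current one. The sketch below registers a spawned process under the illustrative name `:ponger_example`, looks it up with `Process.whereis/1`, and messages it by name.

```elixir
pid =
  spawn(fn ->
    receive do
      {:ping, from} -> send(from, :pong)
    end
  end)

# Register the spawned process under an illustrative name
Process.register(pid, :ponger_example)

# whereis/1 resolves a registered name back to its pid
found = Process.whereis(:ponger_example)

# Messages can be sent to the name instead of the pid
send(:ponger_example, {:ping, self()})

reply =
  receive do
    :pong -> :pong
  after
    1_000 -> :timeout
  end

IO.inspect(found == pid, label: "Name resolves to the spawned pid?")
IO.inspect(reply, label: "Reply")
```

A registered name is released automatically once the process exits, so there is no need to call `Process.unregister/1` for a process that has terminated.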
## Resources
* https://www.erlang.org/doc/man/erlang.html#group_leader-0
* https://stackoverflow.com/questions/36318766/what-is-a-group-leader
* https://rokobasilisk.gitbooks.io/elixir-getting-started/content/io_and_the_file_system/processes_and_group_leaders.html
* https://elixirschool.com/en/lessons/advanced/otp_distribution#a-note-on-io-and-nodes-2
================================================
FILE: chapters/ch_3.1_genserver_introduction.livemd
================================================
# GenServer Introduction
## What is a GenServer?
In OTP (Open Telecom Platform), we have several behaviors that formalize common patterns in programming. Behaviors can be thought of as design patterns for processes. Over time, programmers have identified common patterns of using processes in OTP and designed standardized interfaces to cater to such use cases.
One such behavior is the GenServer (Generic Server), which comes bundled with OTP. Other examples of behaviors include Supervisors and Applications.
At its most basic level, **a GenServer is a single process that runs a loop and handles one message per iteration, passing along an updated state**. By using the GenServer behavior and implementing the necessary callbacks, we can easily implement a client-server relation.
A GenServer process starts by initializing its state and then enters a waiting state, anticipating incoming messages. Upon receiving a message, the process handles it, updates its state, and returns to the waiting state (genserver loop).
A GenServer process only does work when it receives a message. After initialization it simply waits for messages, and an **idle process doesn't consume any CPU time**.
## GenServer callbacks
To create a GenServer we must first adopt the GenServer behaviour by adding the line `use GenServer` to our module.
After this we can implement the GenServer callbacks. A GenServer has the following callbacks:
* `init/1`
* `handle_continue/2`
* `handle_call/3`
* `handle_cast/2`
* `handle_info/2`
* `terminate/2`
* `format_status/2`
* `code_change/3`
These callbacks are called at various points in the lifecycle of a GenServer. Let's build a simple `Counter` to go through these callbacks one by one...
```elixir
defmodule Counter do
use GenServer
@impl true
def init(state) do
IO.inspect("init called, initial counter state: #{state}")
{:ok, state}
end
@impl true
def handle_cast({:inc, value}, state) do
{:noreply, state + value}
end
@impl true
def handle_cast({:dec, value}, state) do
{:noreply, state - value}
end
@impl true
def handle_call(:get_count, _from, state) do
{:reply, "The count is #{state}", state}
end
@impl true
def handle_info(message, state) do
IO.inspect("Handle info called with message #{inspect(message)}")
{:noreply, state}
end
@impl true
def terminate(reason, _state) do
IO.inspect("Genserver Terminating with reason #{reason}...")
end
end
```
With the above counter genserver code let us try to understand the different callbacks.
We will enable tracing genserver messages via the [:sys.trace/2](https://www.erlang.org/doc/man/sys.html#trace-2) function from the erlang `sys` module.
```elixir
IO.inspect("Starting Genserver")
{:ok, pid} = GenServer.start(Counter, 0)
IO.inspect("Genserver Started")
# Start tracing the genserver processes
:sys.trace(pid, true)
# Increment counter by 10
:ok = GenServer.cast(pid, {:inc, 10})
# Decrement counter by 5
:ok = GenServer.cast(pid, {:dec, 5})
# Increment counter by 2
:ok = GenServer.cast(pid, {:inc, 2})
current_count = GenServer.call(pid, :get_count)
IO.puts("Current count = #{current_count}")
# Send a message to the genserver process
send(pid, "Hi genserver!")
# Stop the genserver
GenServer.stop(pid, :boom)
```
Let's analyze the lifecycle of the GenServer by examining the output of the above code. Firstly notice that all of the functions are marked with `@impl true` to signify that they are implementing the GenServer behavior.
Each GenServer callback receives the current state of the process and has the opportunity to update it. The callbacks can also return various values like `:noreply`, `:reply`, `:continue`, `:stop`, `:hibernate`, etc. These values govern the GenServer's lifecycle.
#### Starting the GenServer - init/1
To start the GenServer, we call `GenServer.start(Counter, 0)`, which starts the GenServer process as an **unlinked process**. We can use `GenServer.start_link/3` to start it as a linked process instead. We pass it the GenServer module name and the initial state of our Counter GenServer process. The output indicates that the `GenServer.start/2` call is **synchronous** and waits until the `init/1` GenServer callback returns. Once started, the GenServer process pid is returned.
#### handle_cast/2
We then send different cast messages like :inc and :dec to the GenServer to modify the process state, which, in our case, increments or decrements the counter. The `handle_cast/2` GenServer callback handles these cast calls. It's important to remember that cast messages are **asynchronous** and the `GenServer.cast/2` call does not wait for the cast message to be processed. Also, using cast, the GenServer **cannot send a reply** back to the caller process, so we only receive a `:ok` as the return value when calling `GenServer.cast/2`.
#### handle_call/3
We then use `GenServer.call/3` to fetch the current count, which is handled by the `handle_call/3` GenServer callback. Unlike `GenServer.cast/2`, this is a **synchronous operation**, meaning the `GenServer.call/3` function call must wait until the GenServer finishes processing the message. It also **allows the GenServer to return a reply to the caller**. In our case, the Counter GenServer returns the current count as a string like "The count is #{state}". It's worth noting that `handle_call/3` receives a `from` parameter, a `{pid, tag}` tuple that identifies the caller process.
#### handle_info/2
Next, we send a message to the GenServer process using the `send/2` function. It's important to remember that a GenServer can also receive messages like any other elixir process. The `handle_info/2` GenServer callback handles such messages that are not calls or casts. In our case, we simply log the message "Hi genserver!".
#### terminate/2
Finally, we stop the GenServer process by calling `GenServer.stop/2`, which invokes the `terminate/2` GenServer callback, and the GenServer process is stopped.
You might be wondering when the other GenServer callbacks are invoked. Let's go through them one by one...
#### handle_continue/2
Most GenServer callbacks have the option to return a value containing a continue instruction like `{:continue, continue_arg}`. When such a value is returned, the `handle_continue/2` callback is invoked to handle the continue instruction. This is useful for splitting the work in a callback into multiple steps and updating the process state along the way, or for performing work after initialization.
For example, to initialize a GenServer, we may need to perform a time-consuming task within the `init/1` callback, which would block the caller until the task finishes. To avoid this, we can return a value like `{:ok, state, {:continue, continue_arg}}`, which allows the GenServer to start and unblocks the caller. The `handle_continue/2` callback is then immediately invoked, where we can set the GenServer state.
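As a minimal sketch of this pattern, the hypothetical `SlowStart` module below returns from its init callback right away and defers a simulated slow load (a short `Process.sleep/1`) to `handle_continue/2`. The module name, the `:loading` placeholder state, and the `{:load, arg}` continue argument are all illustrative.

```elixir
defmodule SlowStart do
  use GenServer

  @impl true
  def init(arg) do
    # Return immediately so the caller is unblocked; the expensive work
    # is deferred to handle_continue/2
    {:ok, :loading, {:continue, {:load, arg}}}
  end

  @impl true
  def handle_continue({:load, arg}, _state) do
    # Simulate a time-consuming initialization step
    Process.sleep(100)
    {:noreply, {:loaded, arg}}
  end

  @impl true
  def handle_call(:state, _from, state), do: {:reply, state, state}
end

{:ok, pid} = GenServer.start(SlowStart, 42)

# handle_continue/2 runs before any other message is handled,
# so this call never observes the :loading placeholder
state = GenServer.call(pid, :state)
IO.inspect(state, label: "State after start")

GenServer.stop(pid)
```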
#### format_status/2
This callback is infrequently used, but it can be helpful when inspecting a GenServer with functions like `:sys.get_status/1`. It defines a formatted version of the status.
#### code_change/3
This callback is also rarely used. It handles changes to the GenServer's state when a new version of a module is loaded ([hot code swapping](https://medium.com/blackode/how-to-perform-hot-code-swapping-in-elixir-afc824860012)) and the term structure of the state needs to be updated.
## The terminate callback
The `terminate/2` callback is triggered when a GenServer is about to exit, allowing for any necessary cleanup operations. However, it is important to note that `terminate/2` is not always guaranteed to be called.
`terminate/2` is only called when the GenServer is trapping exits using `Process.flag(:trap_exit, true)`, or when a callback returns a `:stop` tuple or raises an exception. We will later study process supervisors, which can stop a GenServer using a `:brutal_kill` strategy that also does not result in a call to `terminate/2`.
Therefore it is *not guaranteed* that `terminate/2` is called when a GenServer exits, so we should not rely on it or place critical logic in this callback.
When using `GenServer.stop/2` the terminate/2 callback will be invoked before exiting even if the GenServer process is not trapping exits.
For further information, see the discussion [here](https://stackoverflow.com/a/39775617).
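The difference can be observed with a small experiment. In this sketch the hypothetical `Cleanup` module reports back to the caller from `terminate/2`; it is stopped once with `GenServer.stop/1` and once with an exit signal while it is not trapping exits.

```elixir
defmodule Cleanup do
  use GenServer

  @impl true
  def init(notify_pid), do: {:ok, notify_pid}

  @impl true
  def terminate(reason, notify_pid) do
    # Report back so the caller can tell whether terminate/2 ran
    send(notify_pid, {:terminated, reason})
  end
end

wait_for_terminate = fn ->
  receive do
    {:terminated, reason} -> {:called, reason}
  after
    200 -> :not_called
  end
end

# Case 1: GenServer.stop/1 invokes terminate/2 before the process exits
{:ok, a} = GenServer.start(Cleanup, self())
GenServer.stop(a)
stopped = wait_for_terminate.()

# Case 2: an exit signal sent to a process that is NOT trapping exits
# terminates it immediately, without running terminate/2
{:ok, b} = GenServer.start(Cleanup, self())
Process.exit(b, :shutdown)
signalled = wait_for_terminate.()

IO.inspect(stopped, label: "After GenServer.stop/1")
IO.inspect(signalled, label: "After Process.exit/2")
```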
## Lifecycle of a GenServer
A simplified overview of the lifecycle of a GenServer is given below

Now that we have an overview of the workings of a GenServer, let's look at some gotchas and key points related to GenServers...
## GenServer Key Points to Remember
* A GenServer is a **single elixir process** that operates in a loop, processing messages from its mailbox in the **order** they are received.
* If a message takes a long time to process, calling synchronous functions such as `GenServer.call/2` may result in timeouts. You can specify a longer timeout (the default is 5 seconds) or use multiple GenServers to avoid overloading a single process.
* GenServer functions fall into two categories: synchronous functions, like `GenServer.call/3`, which wait for a response, and asynchronous functions, like `GenServer.cast/2`, which do not wait for a reply.
* Prefer using `GenServer.call/2` instead of `GenServer.cast/2` to apply backpressure and avoid overwhelming the `GenServer` process. `GenServer.call/2` blocks the caller process until a reply is received, ensuring controlled interactions and preventing message overload.
* Implementing GenServer callbacks is optional, as Elixir provides default implementations. For example, if you don't define `handle_cast/2`, Elixir will use [a default implementation](https://github.com/elixir-lang/elixir/blob/a64d42f5d3cb6c32752af9d3312897e8cd5bb7ec/lib/elixir/lib/gen_server.ex#L809) that raises an error when the GenServer receives a cast message.
GenServer callbacks can return different values to control the process's lifecycle. For instance:
* Including `{:continue, term()}` in the returned tuple triggers the `handle_continue/2` callback right after the current callback returns.
* Returning `{:stop, reason, new_state}` terminates the GenServer process.
* Including `:hibernate` in the returned tuple hibernates the GenServer process until the next message arrives, freeing up memory.
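A short sketch of two of these return values in use, assuming a hypothetical `Lifecycle` module: a cast that hibernates the process, and a call that replies and then stops it.

```elixir
defmodule Lifecycle do
  use GenServer

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast(:nap, state) do
    # Hibernate until the next message arrives
    {:noreply, state, :hibernate}
  end

  @impl true
  def handle_call(:shutdown, _from, state) do
    # Reply to the caller first, then stop with reason :normal
    {:stop, :normal, :bye, state}
  end
end

{:ok, pid} = GenServer.start(Lifecycle, nil)
ref = Process.monitor(pid)

GenServer.cast(pid, :nap)

# The incoming call wakes the hibernating process
reply = GenServer.call(pid, :shutdown)

down_reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect(reply, label: "Reply")
IO.inspect(down_reason, label: "Exit reason")
```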
## References
* [HexDocs: GenServer](https://hexdocs.pm/elixir/GenServer.html#call/3)
* [ElixirLang: GenServer](https://elixir-lang.org/getting-started/mix-otp/genserver.html)
* [Exercism: GenServer](https://exercism.org/tracks/elixir/concepts/genserver)
* https://github.com/DockYard-Academy/curriculum/blob/main/reading/genservers.livemd
================================================
FILE: chapters/ch_3.2_building_a_genserver.livemd
================================================
# Building a GenServer
## Building a GenServer from scratch
Let's delve into crafting a simplified GenServer-like implementation using Elixir's fundamental primitives such as `spawn_link` and `send`. This exercise will give us a clearer insight into the inner workings of GenServers.
For the sake of simplicity, we will focus on implementing only the commonly used callbacks: `init/1`, `handle_call/3`, `handle_cast/2`, and `handle_info/2`.
```elixir
defmodule MyGenServer do
# Callbacks to implement
@callback init(term()) :: {:ok, term()}
@callback handle_call(term(), pid(), term()) :: {:reply, term(), term()}
@callback handle_cast(term(), term()) :: {:noreply, term()}
@callback handle_info(term(), term()) :: {:noreply, term()}
# == Public API ==
def start_link(module, args) do
{:ok, spawn_link(__MODULE__, :server_init, [module, args])}
end
def call(server_pid, args) do
send(server_pid, {:call, self(), args})
receive do
{:response, response} -> response
end
end
def cast(server_pid, args) do
send(server_pid, {:cast, args})
end
def stop(server_pid, reason \\ :normal) do
send(server_pid, {:stop, reason})
end
# == Internal implementation ==
def server_init(module, args) do
{:ok, state} = module.init(args)
genserver_loop(module, state)
end
# Recursively loop and wait for messages
def genserver_loop(module, state) do
receive do
{:call, parent_pid, args} ->
{:reply, response, new_state} = module.handle_call(args, parent_pid, state)
send(parent_pid, {:response, response})
genserver_loop(module, new_state)
{:cast, args} ->
{:noreply, new_state} = module.handle_cast(args, state)
genserver_loop(module, new_state)
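{:stop, reason} ->
# terminate/2 is not a required callback here, so only call it when the module defines it
if function_exported?(module, :terminate, 2), do: module.terminate(reason, state)
exit(reason)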
request ->
{:noreply, new_state} = module.handle_info(request, state)
genserver_loop(module, new_state)
end
end
end
```
## Using our GenServer
```elixir
defmodule Stack do
@behaviour MyGenServer
@impl true
def init(args) do
{:ok, args}
end
@impl true
def handle_call(:get_stack, _from, state) do
{:reply, state, state}
end
@impl true
def handle_call(:pop, _from, [num | state]) do
{:reply, num, state}
end
@impl true
def handle_cast({:push, num}, state) do
IO.inspect(num, label: "PUSH")
{:noreply, [num | state]}
end
@impl true
def handle_info(:stats, state) do
IO.inspect("Stack length: #{length(state)}")
{:noreply, state}
end
end
```
```elixir
{:ok, stack_server_pid} = MyGenServer.start_link(Stack, [])
MyGenServer.cast(stack_server_pid, {:push, 1})
MyGenServer.cast(stack_server_pid, {:push, 2})
MyGenServer.cast(stack_server_pid, {:push, 3})
MyGenServer.call(stack_server_pid, :get_stack) |> IO.inspect(label: "STACK")
MyGenServer.call(stack_server_pid, :pop) |> IO.inspect(label: "POP")
MyGenServer.call(stack_server_pid, :get_stack) |> IO.inspect(label: "STACK")
send(stack_server_pid, :stats)
```
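One gap in the simplified `call/2` above is that its `receive` has no `after` clause, so the caller waits forever if the server never replies. The sketch below shows one way a timeout could be added. `TimedCall` and the two stand-in server processes are hypothetical names used only for illustration; they reuse the `{:call, pid, args}` and `{:response, response}` message shapes from `MyGenServer`.

```elixir
defmodule TimedCall do
  # Hypothetical variant of MyGenServer.call/2 that gives up after `timeout` milliseconds
  def call(server_pid, args, timeout \\ 5_000) do
    send(server_pid, {:call, self(), args})

    receive do
      {:response, response} -> {:ok, response}
    after
      timeout -> {:error, :timeout}
    end
  end
end

# A stand-in server that answers exactly one call
responsive =
  spawn(fn ->
    receive do
      {:call, caller, args} -> send(caller, {:response, {:echo, args}})
    end
  end)

# A stand-in server that never replies
silent = spawn(fn -> Process.sleep(:infinity) end)

TimedCall.call(responsive, :ping) |> IO.inspect(label: "Responsive server")
TimedCall.call(silent, :ping, 100) |> IO.inspect(label: "Silent server")
```

Note that a reply arriving after the timeout would still land in the caller's mailbox. The real `GenServer.call/3` avoids mixing up replies by tagging each request with a unique reference, and it also monitors the server so that a crash is detected without waiting for the timeout.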
================================================
FILE: chapters/ch_3.3_genserver_examples.livemd
================================================
# GenServer examples
## Password Manager
Our objective is to create a straightforward password manager GenServer that can save a user's passwords in its state.
Notice how we have developed an API with functions like `save_password/3`, `get_password/1`, and `delete_password/1`. This API facilitates easy communication with the GenServer without needing to directly call GenServer functions like `GenServer.call/3` or `GenServer.cast/2`.
Also notice how we have kept the GenServer code to a minimum and placed our password validation logic in a separate module. Separating the logic into a purely functional module makes testing easier, since the business logic can be tested in isolation without dealing with the GenServer.
```elixir
defmodule Password do
defstruct url: nil, username: nil, password: nil, inserted_at: nil
@doc """
Check if a password entry is valid
"""
def validate_entry(%Password{url: url, username: username, password: password}) do
with {:ok, _url} <- URI.new(url),
{:ok, _username} <- validate_username(username),
{:ok, _password} <- validate_password(password) do
{:ok,
%Password{
url: url,
username: username,
password: password,
inserted_at: DateTime.utc_now()
}}
end
end
# Helper functions
defp validate_username(username) do
cond do
not is_binary(username) -> {:error, "Invalid username"}
String.length(username) == 0 -> {:error, "Username is empty"}
true -> {:ok, username}
end
end
defp validate_password(password) do
cond do
not is_binary(password) -> {:error, "Invalid password"}
String.length(password) < 3 -> {:error, "Password must be atleast 3 character long"}
true -> {:ok, password}
end
end
end
defmodule PasswordManager do
use GenServer
# Public APIs
def start_link(_opts) do
GenServer.start_link(
__MODULE__,
%{},
# Use the module name as the name of the GenServer Process
name: __MODULE__
)
end
def save_password(url, username, password) do
entry = %Password{
url: url,
username: username,
password: password
}
GenServer.call(__MODULE__, {:save_password, entry})
end
def get_password(url) do
GenServer.call(__MODULE__, {:get_password, url})
end
def delete_password(url) do
GenServer.cast(__MODULE__, {:delete_password, url})
end
def stop(), do: GenServer.stop(__MODULE__)
# Callbacks
@impl true
def init(state) do
{:ok, state}
end
@impl true
def handle_call({:save_password, new_password}, _from, state) do
case Password.validate_entry(new_password) do
{:ok, entry} -> {:reply, :saved, Map.put(state, entry.url, entry)}
{:error, reason} -> {:reply, {:error, reason}, state}
end
end
@impl true
def handle_call({:get_password, url}, _from, state) do
case Map.get(state, url) do
nil -> {:reply, :not_found, state}
entry -> {:reply, entry, state}
end
end
@impl true
def handle_cast({:delete_password, url}, state) do
state = Map.delete(state, url)
{:noreply, state}
end
end
```
```elixir
# Start the Password Manager GenServer
{:ok, _pid} = PasswordManager.start_link(nil)
PasswordManager.save_password("gmail.com", "john_doe@gmail.com", "12345")
|> IO.inspect(label: "Saving Gmail creds")
PasswordManager.save_password("spotify.com", "music4life", "ab")
|> IO.inspect(label: "Saving Spotify creds")
PasswordManager.save_password("apple.com", "iuser", "ilife")
|> IO.inspect(label: "Saving Apple creds")
PasswordManager.get_password("gmail.com") |> IO.inspect(label: "Gmail creds")
PasswordManager.get_password("spotify.com") |> IO.inspect(label: "Spotify creds")
PasswordManager.delete_password("gmail.com") |> IO.inspect(label: "Deleting Gmail")
PasswordManager.get_password("gmail.com") |> IO.inspect(label: "Gmail creds")
PasswordManager.stop()
```
### Testing GenServer
Now let's try to write some tests for the above GenServer.
```elixir
ExUnit.start()
defmodule PasswordManagerTest do
use ExUnit.Case
describe "save_password/3" do
test "saves password if password entry is valid" do
# start_supervised!/1 makes sure the named server is stopped before the next test starts
start_supervised!(PasswordManager)
assert :saved == PasswordManager.save_password("gmail.com", "john_doe@gmail.com", "12345")
assert %Password{
url: "gmail.com",
username: "john_doe@gmail.com",
password: "12345",
inserted_at: _
} = PasswordManager.get_password("gmail.com")
end
test "does not save password if password entry is invalid" do
start_supervised!(PasswordManager)
assert {:error, "Password must be atleast 3 character long"} ==
PasswordManager.save_password("gmail.com", "john_doe@gmail.com", "12")
assert :not_found == PasswordManager.get_password("gmail.com")
end
end
describe "delete_password/1" do
test "deletes password if password found" do
start_supervised!(PasswordManager)
assert :saved == PasswordManager.save_password("gmail.com", "john_doe@gmail.com", "12345")
assert :ok == PasswordManager.delete_password("gmail.com")
assert :not_found = PasswordManager.get_password("gmail.com")
end
end
end
ExUnit.run()
```
Here the password validation logic can be tested independently, without having to start the GenServer in the tests. This method of testing is preferable since testing pure functions is generally much easier than testing async GenServer code.
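As a rough illustration of testing pure logic without any process, the sketch below exercises a small validation function directly. To keep the cell self-contained it uses a hypothetical `PasswordPolicy` module that repeats the length rule from the private `validate_password/1` helper above; in the real notebook you would call `Password.validate_entry/1` in the same way.

```elixir
ExUnit.start(autorun: false)

defmodule PasswordPolicy do
  # Hypothetical stand-in that repeats the length rule used by the Password module
  def validate(password) when is_binary(password) do
    if String.length(password) < 3, do: {:error, :too_short}, else: {:ok, password}
  end

  def validate(_password), do: {:error, :invalid}
end

defmodule PasswordPolicyTest do
  # No GenServer is started, so these tests can safely run concurrently
  use ExUnit.Case, async: true

  test "accepts a password with at least 3 characters" do
    assert {:ok, "abc"} == PasswordPolicy.validate("abc")
  end

  test "rejects short or non-string passwords" do
    assert {:error, :too_short} == PasswordPolicy.validate("ab")
    assert {:error, :invalid} == PasswordPolicy.validate(12345)
  end
end

ExUnit.run()
```

Because nothing here depends on a running or named process, there is no setup, no teardown, and no shared state between tests.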
## Cron Job
Let's see another example of building a GenServer. In Elixir, you can easily create a basic CRON job using a GenServer to execute a task periodically.
```elixir
defmodule CronJob do
use GenServer
# Every 10 seconds
@interval :timer.seconds(10)
def start_link(_opts) do
GenServer.start_link(__MODULE__, %{})
end
def init(state) do
schedule_work()
{:ok, state}
end
def handle_info(:work, state) do
work()
schedule_work()
{:noreply, state}
end
defp schedule_work() do
Process.send_after(self(), :work, @interval)
end
defp work() do
IO.inspect("Working...")
end
end
CronJob.start_link(nil)
```
This works fine for simple use cases. However, if you require more advanced functionality, consider using a library such as [Quantum](https://github.com/quantum-elixir/quantum-core).
## Resources
* https://hexdocs.pm/elixir/1.14.4/GenServer.html#reply/2
* https://medium.com/blackode/2-unique-use-cases-of-genserver-reply-deep-insights-elixir-expert-31e7abbd42d1
================================================
FILE: chapters/ch_3.4_other_genserver_functions.livemd
================================================
# Other GenServer functions
## GenServer.reply/2
In the previous chapters, we learned that a GenServer is a single process that processes messages from its mailbox one at a time. When using `GenServer.call/3`, the calling process waits until the GenServer sends a reply.
However, in some cases, a GenServer may receive a message that requires a time-consuming task, which can block the GenServer and prevent it from processing new messages while it handles the lengthy task.
To avoid this issue, we can delegate the time-consuming task to a separate process, which allows the GenServer to continue handling new messages without being blocked. Once the time-consuming task is completed, the GenServer can reply back to the caller using `GenServer.reply/2`.
**To put it simply, `GenServer.reply/2` can be used to send a reply back to a client that has called `GenServer.call/3`. This is especially useful when the reply cannot be specified in the return value of `handle_call/3`.**
---
To illustrate this concept, let's walk through an example.
We will build a *FoodOrderingServer* which allows users to order a food item or list their past orders. Let's suppose that listing past orders is fast, while placing an order is slow.
Ideally, we don't want to block other calls to the GenServer while it's busy placing an order.
```elixir
defmodule FoodOrderingServer do
use GenServer
# Public APIs
def start_link(_opts) do
GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
end
def place_order(user, item) do
# We specify a timeout of 10 seconds to avoid timeout errors,
# since placing an order takes a lot of time
GenServer.call(__MODULE__, {:place_order, user, item}, 10000)
end
def list_orders(user) do
GenServer.call(__MODULE__, {:list_orders, user})
end
# Callbacks
@impl true
def init(_args) do
{:ok, %{}}
end
@impl true
def handle_call({:place_order, user, item}, from, state) do
IO.puts("Received new order request from #{inspect(from)}")
spawn(fn ->
# Simulate placing order which takes 6 seconds
:timer.sleep(6000)
send(__MODULE__, {:order_placed, user, item, from})
end)
# Notice how we return :noreply here
# (the caller process will be blocked and waiting since we did not reply)
{:noreply, state}
end
@impl true
def handle_call({:list_orders, user}, _from, state) do
{:reply, Map.get(state, user, []), state}
end
@impl true
def handle_info({:order_placed, user, item, from}, state) do
state =
Map.update(
state,
user,
[item],
fn existing_orders -> [item | existing_orders] end
)
IO.puts("Order #{item} ready for #{user}")
# Send reply to the caller who is waiting for the order
GenServer.reply(from, {:ok, :order_placed})
{:noreply, state}
end
end
```
```elixir
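# Stop the server if it's already running
# (Process.whereis/1 returns nil when it isn't, and GenServer.stop(nil) would crash)
if pid = Process.whereis(FoodOrderingServer), do: GenServer.stop(pid)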
{:ok, _pid} = FoodOrderingServer.start_link(nil)
# These two lines of code will execute one by one, synchronously, since the current process
# waits until the order is placed and the GenServer replies back
FoodOrderingServer.place_order("Jhon", "sandwich")
FoodOrderingServer.place_order("Tom", "pizza")
```
```elixir
# Ordering simultaneously from different processes
spawn(fn -> FoodOrderingServer.place_order("Jhon", "burger") end)
spawn(fn ->
FoodOrderingServer.place_order("Tom", "ice cream")
FoodOrderingServer.list_orders("Tom")
|> IO.inspect(label: "Toms orders")
end)
spawn(fn ->
FoodOrderingServer.list_orders("Tom")
|> IO.inspect(label: "Toms orders")
end)
```
#### Code Breakdown
In the given code, we initially place orders one by one and observe that each call to `FoodOrderingServer.place_order/2` waits for the GenServer's reply before proceeding.
To simulate multiple users placing and listing their orders simultaneously, we spawn three processes that call the GenServer concurrently. Since the GenServer delegates the order-placing task to another process, it is not blocked: it can accept both orders immediately and can answer the `list_orders/1` call while those orders are still being placed. Once an order is processed, the spawned process sends a message back to the GenServer via `send(__MODULE__, {:order_placed, user, item, from})`, which is handled by the `handle_info/2` callback. The GenServer then replies to the caller that was awaiting the reply using `GenServer.reply(from, {:ok, :order_placed})`.
This design unblocks the GenServer, allowing it to always respond to messages promptly.
It's important to note that `GenServer.reply/2` can be invoked from any process, not just from within the GenServer process. In our example, we could have called `GenServer.reply(from, {:ok, :order_placed})` from the spawned processes.
This is possible because the `from` parameter holds the PID of the caller along with a `reference` that enables the caller to recognize that the message came as a reply for the `GenServer.call/2` that it was waiting for.
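As a minimal sketch of that last point, the hypothetical `SlowEcho` server below (not part of the food ordering example) hands `from` to a spawned process, which then replies to the caller directly. The 100 ms sleep merely simulates slow work.

```elixir
defmodule SlowEcho do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:echo, term}, from, state) do
    # Hand the caller's `from` to a helper process and return immediately,
    # leaving the GenServer free to handle the next message
    spawn(fn ->
      Process.sleep(100)
      GenServer.reply(from, {:echoed, term})
    end)

    {:noreply, state}
  end
end

{:ok, pid} = SlowEcho.start_link(nil)
GenServer.call(pid, {:echo, :hi}) |> IO.inspect(label: "Reply from helper process")
```

The trade-off compared to the `handle_info/2` approach above is that the helper cannot update the GenServer's state. Replying directly suits read-only work, while routing the result back through the GenServer suits work that must be recorded in its state.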
---
(Note: There are two other functions, `GenServer.abcast/3` and `GenServer.multi_call/4`, which allow us to cast to and call multiple GenServers at a time and can be useful in a distributed environment.)
================================================
FILE: chapters/ch_4.0_the_registry_module.livemd
================================================
# The Registry module
## What is Registry?
From the official documentation
> A local, decentralized and scalable key-value process storage. It allows developers to lookup one or more processes with a given key.
Let's go through some important points about registries...
* The Registry in Elixir is a **process store** that stores **key-value** pairs, allowing us to register a process under a specific name.
* There are two types of Registries: **unique and duplicate**. A unique Registry only permits one process to be registered under a given name, while a duplicate Registry permits multiple processes to be registered under the same name.
* Each entry in the Registry is associated with the process that registered it. If the process crashes, the Registry automatically removes the keys associated with that process.
* The Registry **compares keys using the match operation (===/2)**.
* **Partitioning** the Registry is possible, allowing for more scalable behavior in highly concurrent environments with thousands or millions of entries.
* The Registry **uses ETS tables** to store data under the hood.
* Registries can **only be run locally** and do not support distributed access.
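Two of the points above, unique keys and strict key comparison, can be sketched in a few lines. The registry name `Registry.UniqueDemo` and the keys are made up for this illustration.

```elixir
{:ok, _} = Registry.start_link(keys: :unique, name: Registry.UniqueDemo)

# The first registration under a key succeeds
{:ok, _} = Registry.register(Registry.UniqueDemo, "worker", :first)

# A second registration under the same key is rejected in a unique Registry
Registry.register(Registry.UniqueDemo, "worker", :second)
|> IO.inspect(label: "Second registration")

# Keys are compared with ===, so the integer 1 and the float 1.0 are different keys
{:ok, _} = Registry.register(Registry.UniqueDemo, 1, :integer_key)
{:ok, _} = Registry.register(Registry.UniqueDemo, 1.0, :float_key)

Registry.keys(Registry.UniqueDemo, self())
|> IO.inspect(label: "Keys owned by this process")
```

The rejected registration returns an `{:error, {:already_registered, pid}}` tuple carrying the PID that already owns the key, which here is the current process.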
## Where to use Registry?
The most common use of Registry is to name processes. The `:via` option is frequently used to specify the process name when using the Registry.
In addition to process naming, the Registry offers other useful features such as a dispatch mechanism that enables developers to implement custom logic for request initiation. With this dispatching mechanism, developers can build scalable and highly efficient systems, such as a local PubSub, by utilizing the `dispatch/3` function.
## Naming processes using Registry
The most common use of Registry is in naming processes.
First, we start the Registry process:
```elixir
# We start a Registry process and name it "Registry.ProcessStore"
# Notice we use `keys: :unique` option which means every key in the Registry
# will point to a single process
{:ok, _} = Registry.start_link(keys: :unique, name: Registry.ProcessStore)
```
Now let's use this registry to name a GenServer:
```elixir
defmodule Stack do
use GenServer
# Callbacks
@impl true
def init(stack) do
{:ok, stack}
end
@impl true
def handle_cast({:push, e}, stack) do
{:noreply, [e | stack]}
end
# Other Callbacks ....
end
```
Now that we have a simple GenServer, let's try to start 2 instances of this GenServer and name each of them using the Registry.
```elixir
# Start the Stack GenServer and register it under the "Registry.ProcessStore"
# with a key named "stack_server_1"
GenServer.start_link(
Stack,
[],
name: {:via, Registry, {Registry.ProcessStore, "stack_server_1"}}
)
# Start another instance of the Stack server with the name "stack_server_2"
# Notice how we also store an optional value associated with this process `:second_stack`
GenServer.start_link(
Stack,
[],
name: {:via, Registry, {Registry.ProcessStore, "stack_server_2", :second_stack}}
)
```
When we register a process under a Registry, we have the option to store associated metadata with that entry. In the second example above, we not only registered an instance of our Stack GenServer process under the registry but also stored the value `:second_stack` along with its entry.
Now let's call our Stack GenServer using its registered name. We can use the `lookup/2` function, which returns a list like `[{pid(), value()}]`. For Registries that allow duplicate entries, a lookup can return multiple entries in this list.
```elixir
# Since we use a unique Registry, we are guaranteed to get at most
# one process under the name "stack_server_1"
[{stack_server_one_pid, nil}] = Registry.lookup(Registry.ProcessStore, "stack_server_1")
GenServer.cast(stack_server_one_pid, {:push, "stack1"})
[{stack_server_two_pid, value}] = Registry.lookup(Registry.ProcessStore, "stack_server_2")
IO.inspect(value, label: "Stack server 2 value")
GenServer.cast(stack_server_two_pid, {:push, "stack2"})
IO.inspect(:sys.get_state(stack_server_one_pid), label: "Stack Server1 state")
IO.inspect(:sys.get_state(stack_server_two_pid), label: "Stack Server2 state")
```
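Looking up the PID first is not strictly necessary: the same `{:via, Registry, {registry, key}}` tuple that names a process at startup can also be passed wherever a GenServer expects a server reference. The sketch below assumes a hypothetical `ViaStack` module and `Registry.ViaDemo` registry so that it can run on its own.

```elixir
defmodule ViaStack do
  use GenServer

  @impl true
  def init(stack), do: {:ok, stack}

  @impl true
  def handle_cast({:push, e}, stack), do: {:noreply, [e | stack]}

  @impl true
  def handle_call(:peek, _from, stack), do: {:reply, List.first(stack), stack}
end

{:ok, _} = Registry.start_link(keys: :unique, name: Registry.ViaDemo)

# The same via tuple is used to start the process and to address it later
name = {:via, Registry, {Registry.ViaDemo, "stack"}}
{:ok, _pid} = GenServer.start_link(ViaStack, [], name: name)

GenServer.cast(name, {:push, "hello"})
GenServer.call(name, :peek) |> IO.inspect(label: "Top of stack")
```

Addressing by via tuple means the caller never holds on to a PID, so it keeps working if the process is restarted and re-registered under the same key.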
Let us now explore how a Registry operates when we permit the storage of duplicate entries.
When utilizing duplicate registries, it is not possible to use the `:via` option. To illustrate how duplicate registries function, let us attempt to register the current process twice using the `register/3` function.
```elixir
{:ok, _} = Registry.start_link(keys: :duplicate, name: Registry.DupProcessStore)
{:ok, _} = Registry.register(Registry.DupProcessStore, "async_city", :hello)
{:ok, _} = Registry.register(Registry.DupProcessStore, "async_city", :world)
Registry.lookup(Registry.DupProcessStore, "async_city")
```
Observe how the invocation of `Registry.lookup/2` returned a list containing two tuples, each holding a PID along with its associated metadata. Both entries were registered under the same key, "async_city", and here both point to the current process, since it registered itself twice.
## Dispatching using Registry
Dispatching allows us to fetch all entries for all processes registered under a given key. We pass a callback function which would receive the list of `{pid, value}` for every entry registered under the given key.
It is worth noting that dispatching takes place in the process that initiates the `dispatch/3` call, either serially or concurrently in the case of multiple partitions.
To better understand the concept of dispatching, let us take a look at an example.
```elixir
# Start a Registry which allows duplicates
{:ok, _} = Registry.start_link(keys: :duplicate, name: Registry.Numbers)
# Register the current process 3 times under the same key "odd"
# Save a value along with registration that is 1, "3" and fn -> 5
{:ok, _} = Registry.register(Registry.Numbers, "odd", 1)
{:ok, _} = Registry.register(Registry.Numbers, "odd", "3")
{:ok, _} = Registry.register(Registry.Numbers, "odd", fn -> 5 end)
# Register the current process 3 times under another key "even"
{:ok, _} = Registry.register(Registry.Numbers, "even", 2)
{:ok, _} = Registry.register(Registry.Numbers, "even", "4")
{:ok, _} = Registry.register(Registry.Numbers, "even", fn -> 6 end)
# Dispatching on processes registered under the key "odd"
Registry.dispatch(Registry.Numbers, "odd", fn entries ->
for {_pid, num} <- entries do
cond do
is_number(num) -> num
is_binary(num) -> String.to_integer(num)
is_function(num) -> num.()
end
|> IO.inspect(label: "ODD")
end
end)
# Dispatching on processes registered under the key "even"
Registry.dispatch(Registry.Numbers, "even", fn entries ->
for {_pid, num} <- entries do
cond do
is_number(num) -> num
is_binary(num) -> String.to_integer(num)
is_function(num) -> num.()
end
|> IO.inspect(label: "EVEN")
end
end)
```
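The earlier remark that the callback runs in the process calling `dispatch/3` can be checked directly. This is a small sketch using a made-up `Registry.DispatchDemo` registry with the default single partition.

```elixir
{:ok, _} = Registry.start_link(keys: :duplicate, name: Registry.DispatchDemo)
{:ok, _} = Registry.register(Registry.DispatchDemo, "topic", :value)

caller = self()

# With a single partition the callback is invoked once, in the calling process
Registry.dispatch(Registry.DispatchDemo, "topic", fn _entries ->
  IO.inspect(self() == caller, label: "Callback ran in the calling process")
end)
```

One practical consequence is that slow work inside the callback blocks the caller, not the registered processes.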
### Building a pubsub system with Registry
We can also use this `dispatch/3` function to implement a local, non-distributed PubSub.
This works by registering multiple processes under a given key which acts like a pubsub topic.
We can then send a message to all processes registered under a key to emulate a pubsub broadcast. Here we also set the number of partitions to the number of schedulers online, which will make the registry more performant in highly concurrent environments.
Let's see this in action.
```elixir
{:ok, _} =
Registry.start_link(
keys: :duplicate,
name: Registry.ChatPubSub,
# The number of schedulers available in the VM
partitions: System.schedulers_online()
)
# Register the current process under the "Registry.ChatPubSub" registry with a key "chat_room:1"
{:ok, _} = Registry.register(Registry.ChatPubSub, "chat_room:1", [])
# Dispatching by looking up all processes registered with the key "chat_room:1" in the
# "Registry.ChatPubSub" registry and then sending them a message.
Registry.dispatch(Registry.ChatPubSub, "chat_room:1", fn entries ->
for {pid, _} <- entries, do: send(pid, {:broadcast, "hello world"})
end)
# Receive any broadcasted messages
receive do
{:broadcast, message} -> IO.inspect(message, label: "Received broadcast")
end
```
By using this approach, we can register multiple processes under a single key within a Registry and subsequently dispatch messages to all the processes associated with that key.
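To see a broadcast reach more than one subscriber, here is a small sketch with two spawned subscriber processes. The registry name `Registry.RoomDemo`, the subscriber names, and the `{:subscribed, ...}` and `{:got, ...}` messages are all invented for this example.

```elixir
{:ok, _} = Registry.start_link(keys: :duplicate, name: Registry.RoomDemo)
parent = self()

# Each subscriber registers itself under the topic, then waits for one broadcast
for name <- ["alice", "bob"] do
  spawn_link(fn ->
    {:ok, _} = Registry.register(Registry.RoomDemo, "chat_room:1", name)
    send(parent, {:subscribed, name})

    receive do
      {:broadcast, message} -> send(parent, {:got, name, message})
    end
  end)
end

# Wait until both subscribers have registered before broadcasting
for _ <- 1..2 do
  receive do
    {:subscribed, _name} -> :ok
  end
end

Registry.dispatch(Registry.RoomDemo, "chat_room:1", fn entries ->
  for {pid, _name} <- entries, do: send(pid, {:broadcast, "hello world"})
end)

# Collect what each subscriber reported back
received =
  for _ <- 1..2 do
    receive do
      {:got, name, message} -> {name, message}
    after
      1_000 -> :timeout
    end
  end

received |> Enum.sort() |> IO.inspect(label: "Deliveries")
```

A process must register itself, which is why each subscriber calls `Registry.register/3` from inside its own function instead of the parent registering on its behalf.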
## Other registry functions and match specs
Apart from the `register/3` and `lookup/2` functions, the Registry module has several other useful functions which allows us to find and manipulate data inside the Registry. Most of these functions are straightforward to understand.
However, it's worth noting that some functions use match specs to find matching entries in the Registry. Let us look at some examples to understand how match specs work.
From the official documentation
> A match spec is a pattern that must be an atom or a tuple that will match the structure of the value stored in the registry. The atom `:_` can be used to ignore a given value or tuple element, while the atom `:"$1"` can be used to temporarily assign part of pattern to a variable for a subsequent comparison.
> Optionally, it is possible to pass a list of guard conditions for more precise matching. Each guard is a tuple, which describes checks that should be passed by assigned part of pattern. For example the `$1 > 1` guard condition would be expressed as the `{:>, :"$1", 1}` tuple. Please note that guard conditions will work only for assigned variables like :"$1", :"$2", and so forth.
Let's consider the `match/4` function in the Registry module, which returns the entries in the Registry that match the given match spec.
```elixir
Registry.start_link(keys: :duplicate, name: Registry.MatchSpec)
# Register the current process multiple times with different values under the key "my_key"
{:ok, _} = Registry.register(Registry.MatchSpec, "my_key", 1)
{:ok, _} = Registry.register(Registry.MatchSpec, "my_key", "one")
{:ok, _} = Registry.register(Registry.MatchSpec, "my_key", {1, 2})
{:ok, _} = Registry.register(Registry.MatchSpec, "my_key", {2, 1})
{:ok, _} = Registry.register(Registry.MatchSpec, "my_key", {2, 2})
# Use different match specs to find matching entries from the Registry under the key "my_key"
Registry.match(Registry.MatchSpec, "my_key", 1)
|> IO.inspect(label: "* match spec: 1 returned")
Registry.match(Registry.MatchSpec, "my_key", :_)
|> IO.inspect(label: "* match spec: :_ returned")
Registry.match(Registry.MatchSpec, "my_key", {2, :_})
|> IO.inspect(label: "* match spec: {2, :_} returned")
Registry.match(Registry.MatchSpec, "my_key", {:"$1", :"$1"})
|> IO.inspect(label: ~s(* match spec: {:"$1", :"$1"} returned))
# Also using guards along with match specs
Registry.match(Registry.MatchSpec, "my_key", {:"$1", :"$2"}, [{:>, :"$1", :"$2"}])
|> IO.inspect(label: ~s(* match spec: {:"$1", :"$2"} with guard [{:>, :"$1", :"$2"}] returned))
Registry.match(Registry.MatchSpec, "my_key", :"$1", [{:is_binary, :"$1"}])
|> IO.inspect(label: ~s(* match spec: :"$1" with guard [{:is_binary, :"$1"}] returned))
```
Other functions like `count_match/4`, `select/2`, etc in the Registry module also use match specs for filtering entries in the Registry.
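As a brief sketch of those two functions, the example below uses a made-up `Registry.SelectDemo` registry. `count_match/4` takes the same pattern as `match/4`, while `select/2` is not tied to one key and matches each entry as a `{key, pid, value}` tuple.

```elixir
{:ok, _} = Registry.start_link(keys: :duplicate, name: Registry.SelectDemo)

{:ok, _} = Registry.register(Registry.SelectDemo, "my_key", {1, 2})
{:ok, _} = Registry.register(Registry.SelectDemo, "my_key", {2, 1})
{:ok, _} = Registry.register(Registry.SelectDemo, "other_key", {2, 2})

# count_match/4 only returns how many entries under the key match the pattern
Registry.count_match(Registry.SelectDemo, "my_key", {2, :_})
|> IO.inspect(label: "Entries under my_key matching {2, :_}")

# select/2 looks at every entry; the spec is {match_head, guards, result_body}
# and here the result body returns just the key bound to :"$1"
Registry.select(Registry.SelectDemo, [{{:"$1", :_, :_}, [], [:"$1"]}])
|> Enum.uniq()
|> Enum.sort()
|> IO.inspect(label: "All registered keys")
```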
## Resources
* The official Registry documentation: https://hexdocs.pm/elixir/1.14.4/Registry.html#content
* Guards in Elixir: https://hexdocs.pm/elixir/1.14/patterns-and-guards.html#guards
================================================
FILE: chapters/ch_5.1_supervisors_introduction.livemd
================================================
# Introduction to Supervisors
## What is a Supervisor?
In our previous lesson on OTP, we explored the GenServer behavior. However, there's another critical behavior in OTP that deserves our attention: the **supervisor**.
Supervisors fulfill the role of overseeing other processes, often referred to as child processes, and contribute to the creation of a hierarchical process structure called a **supervision tree**. This tree not only ensures fault-tolerance but also governs the application's startup and shutdown processes.
Supervisors are the driving force behind the Elixir developer's inclination towards embracing the "let it crash" or "fail fast" philosophy. This approach allows supervisors to automatically restart crashed processes, facilitating a more robust system.
To better understand how supervisors work, let's examine a simple example. We'll create a Stack GenServer module with a bug that causes it to crash when attempting to pop an element from an empty stack. Since our GenServer is supervised, we can observe how the supervisor automatically restarts the failed GenServer process when it crashes.
```elixir
defmodule Stack do
use GenServer
def start_link(%{initial_value: value, name: name}) do
GenServer.start_link(__MODULE__, value, name: name)
end
## Callbacks
@impl true
def init(arg) do
IO.puts("Stack GenServer starting up!")
{:ok, [arg]}
end
@impl true
def handle_call({:push, element}, _from, stack) do
IO.puts("Pushed #{inspect(element)}")
{:reply, :pushed, [element | stack]}
end
@impl true
def handle_cast(:pop, [popped | stack]) do
IO.puts("Popped #{inspect(popped)}")
{:noreply, stack}
end
end
```
```elixir
children = [
%{
id: :stack_1,
# The Stack is a child process started via Stack.start_link/1
start: {Stack, :start_link, [%{initial_value: 0, name: :stack_1}]}
}
]
# Now we start the supervisor process and pass it the list of child specs (child processes to supervise)
# On starting the supervisor, it automatically starts all the child processes and supervises them
{:ok, supervisor_pid} = Supervisor.start_link(children, strategy: :one_for_one)
# After the supervisor starts, we can query the supervisor for information regarding all child processes supervised under it
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisor's children")
```
Now let's see what happens if our Stack GenServer process crashes:
```elixir
GenServer.whereis(:stack_1) |> IO.inspect(label: "Stack Genserver Process pid")
:sys.get_state(GenServer.whereis(:stack_1))
|> IO.inspect(label: "Initial GenServer state")
GenServer.call(:stack_1, {:push, 10})
GenServer.call(:stack_1, {:push, 20})
GenServer.cast(:stack_1, :pop)
GenServer.cast(:stack_1, :pop)
GenServer.cast(:stack_1, :pop)
:sys.get_state(GenServer.whereis(:stack_1))
|> IO.inspect(label: "Genserver state just before crash")
# Boom! Stack genserver crashes..
GenServer.cast(:stack_1, :pop)
# wait for the supervisor to restart the Stack Server process
Process.sleep(200)
GenServer.whereis(:stack_1) |> IO.inspect(label: "Restarted stack Genserver Process pid")
:sys.get_state(GenServer.whereis(:stack_1)) |> IO.inspect(label: "Genserver state after crash")
```
## Child Specs
When starting a supervisor, we have the option to provide a list of child specifications that dictate how the supervisor should handle starting, stopping, and restarting each child process.
A supervisor can supervise two types of processes: workers and other supervisor processes. The former is commonly known as a `worker`, while the latter is referred to as a `supervisor`, typically forming a supervision tree.
A child specification is represented as a map with up to six elements. The first two elements are mandatory, while the remaining ones are optional.
Let's go through the different options that we can specify in the supervisor child spec:
* `:id` - This key is **required** and serves as an internal identifier used by the supervisor to identify the child specification. It should be unique among the workers within the same supervisor.
* `:start` - This key is **required** and contains a tuple specifying the module, function, and arguments used to start the child process.
* `:restart` - This optional key, defaulted to `:permanent`, is an atom that determines when a terminated child process should be restarted.
* `:shutdown` - This optional key, defaulted to 5_000 (5 seconds) for workers and `:infinity` for supervisors, specifies how a child process should be terminated: an integer gives a timeout in milliseconds, `:infinity` waits indefinitely, and `:brutal_kill` terminates the child immediately.
* `:type` - This optional key, defaulted to `:worker`, specifies whether the child process is a `:worker` or a `:supervisor`.
* `:modules` - This optional key contains a list of modules used by hot code upgrade mechanisms to identify processes using specific modules.
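For reference, a child spec with all six keys written out might look like the sketch below. It uses Elixir's built-in `Agent` purely as a convenient ready-made process, and the `:counter` id is made up; the optional keys are set to the defaults described above for a worker.

```elixir
child_spec = %{
  # Required: unique identifier within this supervisor
  id: :counter,
  # Required: {module, function, args} used to start the child
  start: {Agent, :start_link, [fn -> 0 end]},
  # The remaining keys are optional and shown here with their defaults for a worker
  restart: :permanent,
  shutdown: 5_000,
  type: :worker,
  modules: [Agent]
}

{:ok, sup} = Supervisor.start_link([child_spec], strategy: :one_for_one)
Supervisor.count_children(sup) |> IO.inspect(label: "Children")
```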
A child specification can be defined in one of three ways:
1. As a map representing the child specification itself.
```elixir
children = [
%{
id: :stack_1,
start: {Stack, :start_link, [%{initial_value: 0, name: :stack_1}]}
}
]
```
The above example defines a child with `:id` of `:stack_1`, which is started by invoking `Stack.start_link(%{initial_value: 0, name: :stack_1})`.
2. As a tuple with the module name as the first element and the start argument as the second.
```elixir
children = [
{Stack, %{initial_value: 0, name: :stack_1}}
]
```
When using this shorthand notation, the supervisor calls `Stack.child_spec(%{initial_value: 0, name: :stack_1})` to retrieve the child specification. The `Stack` module is responsible for defining its own `child_spec/1` function.
The `Stack` module can define its child specification as follows:
```elixir
def child_spec(arg) do
%{
id: Stack,
start: {Stack, :start_link, [arg]}
}
end
```
In this case, since `use GenServer` already defines `Stack.child_spec/1`, we can leverage the automatically generated `child_spec/1` function and customize it by passing options directly to `use GenServer`. We will see examples of this in later chapters.
3. Alternatively, a child specification can be specified by providing only the module name.
```elixir
children = [Stack]
```
This is equivalent to `{Stack, []}`. However, in our case, it would be invalid since `Stack.start_link/1` requires an initial value, and passing an empty list wouldn't work.
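The customization mentioned under the second form, passing options to `use GenServer`, can be sketched as follows. `TransientWorker` is a hypothetical module used only to show the generated child spec.

```elixir
defmodule TransientWorker do
  # Options given to `use GenServer` are merged into the generated child_spec/1
  use GenServer, restart: :transient, shutdown: 10_000

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

TransientWorker.child_spec(:some_arg) |> IO.inspect(label: "Generated child spec")
```

The printed map contains the default `:id` and `:start` entries plus the two overridden keys, so a supervisor given `{TransientWorker, :some_arg}` picks up those settings without any extra configuration.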
### The `Supervisor.child_spec/2` function
When using the shorthand notations mentioned above, such as the `{module, arg}` tuple or a module name only as a child specification, we can modify the generated child specifications using the `Supervisor.child_spec/2` function.
* When a two-element tuple of the form `{module, arg}` is provided, the child specification is retrieved by calling `module.child_spec(arg)`.
* When only a module is given, the child specification is retrieved by calling `module.child_spec([])`.
After retrieving the child specification, any overrides specified in the function argument are applied directly to the child spec.
For example, we can use the shorthand notation `{Stack, %{initial_value: 0, name: :stack_1}}`, but this would set `id: Stack` as the child's identifier since it is the default behavior of `module.child_spec(arg)`. However, we can override this behavior as shown below:
```elixir
children = [
Supervisor.child_spec({Stack, %{initial_value: 0, name: :stack_1}}, id: :special_stack)
]
```
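Overriding the `:id` matters most when the same module appears more than once under one supervisor, since every child needs a distinct id. The following sketch starts two children from the built-in `Agent` module, chosen only because it ships with a `child_spec/1`; the ids `:agent_one` and `:agent_two` are made up.

```elixir
children = [
  # Both specs would otherwise share the default id `Agent`
  Supervisor.child_spec({Agent, fn -> :first end}, id: :agent_one),
  Supervisor.child_spec({Agent, fn -> :second end}, id: :agent_two)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

Supervisor.which_children(sup)
|> Enum.map(fn {id, _pid, _type, _modules} -> id end)
|> Enum.sort()
|> IO.inspect(label: "Child ids")
```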
## Resources
* https://hexdocs.pm/elixir/1.14.4/Supervisor.html
* https://elixir-lang.org/getting-started/mix-otp/supervisor-and-application.html
================================================
FILE: chapters/ch_5.2_supervision_strategies.livemd
================================================
# Supervision strategies
```elixir
Mix.install([
{:kino, "~> 0.9.0"}
])
```
## Supervision strategies
When starting a supervisor, we have the ability to specify a supervision strategy. This strategy determines the actions taken by the supervisor when one of its child processes crashes.
In the previous chapter, we started the supervisor for our Stack GenServer process using the `Supervisor.start_link(children, strategy: :one_for_one)` function call.
Here, the `:strategy` option passed to the supervisor refers to the supervision strategy being used.
Now, let's explore each of the supervision strategies in detail.
To illustrate the different strategies, we'll consider a simple GenServer that crashes if we send it a `:boom` message. This GenServer stores a random positive integer in its state.
```elixir
defmodule CrashDummyServer do
  use GenServer

  def start_link(name) do
    random_state = System.unique_integer([:positive])
    GenServer.start_link(__MODULE__, {random_state, name}, name: name)
  end

  ## Callbacks

  @impl true
  def init({random_value, name}) do
    IO.inspect("#{name} starting up!")
    {:ok, random_value}
  end

  @impl true
  def handle_cast(:boom, state) do
    process_pid = self() |> inspect()
    raise "BOOM! CrashDummyServer process: #{process_pid} crashed!"
    {:noreply, state}
  end
end
```
In the examples so far, we started a supervisor by directly calling the `Supervisor.start_link/2` function with the required options. However, we can also define the supervisor as a module.
To do so, we have to use the `Supervisor` OTP behaviour in our module.
```elixir
defmodule CrashDummySupervisor do
  # Using this behaviour we will automatically define a child_spec/1 function
  use Supervisor

  def start_link(strategy) do
    Supervisor.start_link(__MODULE__, strategy, name: __MODULE__)
  end

  # We have to implement this `init/1` callback when using the "Supervisor" behaviour
  @impl true
  def init(strategy) do
    # Supervision tree
    children = [
      dummy_child_spec(:dummy1),
      dummy_child_spec(:dummy2),
      dummy_child_spec(:dummy3)
    ]

    # Notice the supervision strategy
    Supervisor.init(children, strategy: strategy)
  end

  # Named `dummy_child_spec/1` so that it does not shadow the `child_spec/1`
  # function generated by `use Supervisor`
  defp dummy_child_spec(name) do
    Supervisor.child_spec({CrashDummyServer, name}, id: name)
  end
end
```
In the above code snippet, we define multiple instances of our "CrashDummyServer" GenServer within the supervision tree. When the supervisor is started, it automatically starts three instances (processes) of the CrashDummyServer with the names `:dummy1`, `:dummy2`, and `:dummy3`.
Since we want to start three processes of the same GenServer, we cannot use the `{CrashDummyServer, name}` child specification because it would assign the module name as the `:id`, resulting in the same `:id` being given to all three processes. To avoid this, we use the `Supervisor.child_spec/2` function and explicitly pass a separate `:id` to each process.
The supervision strategy is passed as an argument to the `start_link/1` function and `init/1` callback so that we can restart the same supervisor with a different supervision strategy.
## :one_for_one
With the "one_for_one" supervision strategy, if a child process terminates, only that specific process is restarted. In other words, if there are multiple child processes supervised by our supervisor and one of them crashes, only the crashed process is restarted while the other supervised processes continue running unaffected.

To observe the behavior of this strategy, we can start the supervisor and then intentionally crash one of the supervised processes to see the restart in action.
We will use [Kino](https://hexdocs.pm/kino/) to draw the supervision tree before and after the crash.
```elixir
{:ok, supervisor_pid} = CrashDummySupervisor.start_link(:one_for_one)
Process.info(supervisor_pid, :links) |> IO.inspect(label: "Supervisors links")
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")
Kino.Process.render_sup_tree(supervisor_pid)
```
```elixir
# Makes the dummy2 child crash
GenServer.cast(:dummy2, :boom)
# Wait for the process to crash and be restarted
Process.sleep(200)
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")
Kino.Process.render_sup_tree(supervisor_pid)
```
Based on the example, we can confirm that when using the `:one_for_one` supervision strategy, only the `:dummy2` GenServer process crashed and was subsequently restarted. As a result, the restarted process obtained a new process ID and its state was reset. On the other hand, the `:dummy1` and `:dummy3` processes continued to run without any interruption, maintaining their respective process IDs and states unchanged.
## :one_for_all
With the `:one_for_all` supervision strategy, if any child process terminates, all other child processes are terminated as well. After that, all child processes, including the one that terminated, are restarted.

Let's proceed with restarting the `CrashDummySupervisor` using the `:one_for_all` strategy.
```elixir
# Stop the existing supervisor process
# We used the module name as the Supervisor process name so we can use the module name to stop
# the supervisor process.
# This will also terminate the supervision tree and all processes running under our supervisor
Supervisor.stop(CrashDummySupervisor)
{:ok, supervisor_pid} = CrashDummySupervisor.start_link(:one_for_all)
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")
Kino.Process.render_sup_tree(supervisor_pid)
```
```elixir
# Makes the dummy2 child crash
GenServer.cast(:dummy2, :boom)
# Wait for the process to crash and be restarted
Process.sleep(200)
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")
Kino.Process.render_sup_tree(supervisor_pid)
```
This time we can see that when the `:dummy2` process crashed, the supervisor restarted all the child processes, so all of them now have a different pid and a new state.
## :rest_for_one
With the `:rest_for_one` strategy, if a child process terminates, not only the terminated child process but also the **subsequent child processes** that were started after it will be terminated and restarted.
This strategy is useful when child processes depend on the ones started before them. When a process crashes, only that process and the processes started after it (which are assumed to depend on it) are restarted, so only a portion of the supervision tree is affected.
#### Note:
The order in which child processes are specified in a supervision tree is crucial. A supervisor will attempt to start the child processes in the exact order specified in the supervisor child specification. Similarly, when a process crashes, the supervisor will restart the child processes in the same order.
When a supervisor shuts down, it terminates all children in the reverse order in which they are listed.
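The ordering rules in this note can be observed with a small sketch. `OrderedWorker` is a hypothetical GenServer (not part of this chapter) that reports to an observer process when it starts and when it stops. It traps exits so that its `terminate/2` callback runs when the supervisor shuts it down.

```elixir
# Hypothetical worker that reports its start and stop to an observer process.
defmodule OrderedWorker do
  use GenServer

  def start_link({name, observer}) do
    GenServer.start_link(__MODULE__, {name, observer})
  end

  @impl true
  def init({name, observer}) do
    # Trap exits so terminate/2 is invoked when the supervisor shuts us down
    Process.flag(:trap_exit, true)
    send(observer, {:started, name})
    {:ok, {name, observer}}
  end

  @impl true
  def terminate(_reason, {name, observer}) do
    send(observer, {:stopped, name})
  end
end

children =
  for name <- [:first, :second, :third] do
    Supervisor.child_spec({OrderedWorker, {name, self()}}, id: name)
  end

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
:ok = Supervisor.stop(sup)

# Collect the three start events and the three stop events in arrival order
events =
  for _ <- 1..6 do
    receive do
      {tag, _name} = event when tag in [:started, :stopped] -> event
    after
      1_000 -> :timeout
    end
  end

IO.inspect(events, label: "events")
```

The children report starting as `:first`, `:second`, `:third` and stopping as `:third`, `:second`, `:first`.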

Let's see this strategy in action with our example. When the `:dummy2` process crashes, only the `:dummy2` and `:dummy3` processes will be restarted, while the `:dummy1` process will continue running.
```elixir
Supervisor.stop(CrashDummySupervisor)
{:ok, supervisor_pid} = CrashDummySupervisor.start_link(:rest_for_one)
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")
Kino.Process.render_sup_tree(supervisor_pid)
```
```elixir
# Makes the dummy2 child crash
GenServer.cast(:dummy2, :boom)
# Wait for the process to crash and be restarted
Process.sleep(200)
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
:sys.get_state(GenServer.whereis(:dummy1)) |> IO.inspect(label: "Dummy 1 state")
:sys.get_state(GenServer.whereis(:dummy2)) |> IO.inspect(label: "Dummy 2 state")
:sys.get_state(GenServer.whereis(:dummy3)) |> IO.inspect(label: "Dummy 3 state")
Kino.Process.render_sup_tree(supervisor_pid)
```
### Resources
* The images for the different restart strategies are taken from the [erlang documentation](https://www.erlang.org/doc/design_principles/sup_princ.html#restart-strategy)
================================================
FILE: chapters/ch_5.3_restart_strategies.livemd
================================================
# Supervisor restart strategies
## Restart Strategies
In the previous chapter, we learned about supervision strategies, which determine which child processes a supervisor restarts when one of them crashes. However, we also need to decide when a terminated process should be treated as "crashed." Processes can terminate gracefully, in which case we might not want to restart them, or they can crash due to errors.
To address this, restart strategies come into play. Unlike supervision strategies that apply to the entire supervision tree, restart strategies can be defined for each individual child process, allowing for more fine-grained control over their behavior.
Restart values can be specified either in the child spec or when creating a GenServer.
In the child spec, the restart value can be set as follows:
```elixir
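children = [
  %{
    id: :stack_1,
    # Stack.start_link/1 expects an argument, so it is passed in the argument list
    start: {Stack, :start_link, [%{initial_value: 0, name: :stack_1}]},
    restart: :temporary # <================= Here
  }
]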
```
#### Modifying default child spec
Alternatively, when creating a GenServer, the restart value can be specified using the `use GenServer` macro:
```elixir
use GenServer, restart: :transient
```
As we learned earlier, GenServers provide a default `child_spec/1` function that automatically generates the child specification. By passing options directly to `use GenServer`, we can customize the `child_spec/1` function.
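As a quick check, we can inspect the child specification that such a module generates. `TransientDemo` is a hypothetical module used only for this illustration.

```elixir
# Hypothetical module, used only to inspect the generated child spec.
defmodule TransientDemo do
  use GenServer, restart: :transient

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

# The option given to `use GenServer` ends up in the default child spec
spec = TransientDemo.child_spec(:some_arg)
IO.inspect(spec, label: "child spec")
```

The resulting map contains `restart: :transient` alongside the usual `:id` and `:start` keys, so a supervisor that receives `{TransientDemo, :some_arg}` picks up the restart value without any extra configuration.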
To understand the behavior of different restart options, let's create a simple GenServer and add it to the supervision tree with various restart options.
```elixir
defmodule CrashDummyServer do
  use GenServer

  def start_link(name) do
    random_state = System.unique_integer([:positive])
    GenServer.start_link(__MODULE__, {random_state, name}, name: name)
  end

  ## Callbacks

  @impl true
  def init({random_value, name}) do
    IO.inspect("#{name} starting up!")
    {:ok, random_value}
  end

  @impl true
  def handle_cast(:stop_gracefully, state) do
    # Returning this value makes the GenServer stop gracefully with :normal reason
    # If reason is neither :normal, :shutdown, nor {:shutdown, term} an error is logged.
    {:stop, :normal, state}
  end

  @impl true
  def handle_cast(:crash, state) do
    process_pid = self() |> inspect()
    raise "BOOM! CrashDummyServer process: #{process_pid} crashed!"
    {:noreply, state}
  end
end
```
```elixir
defmodule CrashDummySupervisor do
  use Supervisor

  def start_link() do
    Supervisor.start_link(__MODULE__, :noop, name: __MODULE__)
  end

  @impl true
  def init(_) do
    # Supervision tree, start multiple instances of our genserver with different restart options
    children = [
      child_spec(:permanent_dummy, :permanent),
      child_spec(:temporary_dummy, :temporary),
      child_spec(:transient_dummy, :transient)
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  defp child_spec(name, restart_strategy) do
    Supervisor.child_spec(
      {CrashDummyServer, name},
      id: name,
      # Specifying the restart strategy
      restart: restart_strategy
    )
  end
end
```
In the code snippet above, we have created a simple GenServer that crashes when receiving the `:crash` message and gracefully stops when receiving the `:stop_gracefully` message.
Within the Supervisor, we start three instances of this GenServer with three different restart strategies:
* `:permanent`: This is the default restart strategy, where the child process is always restarted regardless of whether it crashes or is gracefully shut down.
* `:temporary`: With this restart strategy, the child process is never restarted, even in the case of abnormal termination such as a crash. Any termination, even if it is abnormal, is considered successful.
* `:transient`: The child process is restarted only if it terminates abnormally, meaning it exits with an exit reason other than `:normal`, `:shutdown`, or `{:shutdown, term}`.
Now, let's test these restart strategies in action.
---
### `:permanent` restart strategy
```elixir
{:ok, supervisor_pid} = CrashDummySupervisor.start_link()
Supervisor.which_children(supervisor_pid)
```
```elixir
# Test graceful termination of child with `:permanent` restart strategy
# Notice how the GenServer is restarted
GenServer.cast(:permanent_dummy, :stop_gracefully)
```
```elixir
# Test abnormal termination of child with `:permanent` restart strategy
# Notice how the GenServer is restarted
GenServer.cast(:permanent_dummy, :crash)
```
---
### `:temporary` restart strategy
```elixir
# Test graceful termination of child with `:temporary` restart strategy
# Notice how the GenServer is NOT restarted
GenServer.cast(:temporary_dummy, :stop_gracefully)
```
```elixir
# Notice how temporary_dummy is no longer present in the list of children
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
```
```elixir
# Restart the Supervisor so that all child processes are started again
Supervisor.stop(supervisor_pid)
{:ok, supervisor_pid} = CrashDummySupervisor.start_link()
```
```elixir
# Test abnormal termination of child with `:temporary` restart strategy
# Notice how the GenServer is NOT restarted
GenServer.cast(:temporary_dummy, :crash)
```
```elixir
# Notice how temporary_dummy is no longer present in the list of children
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
```
---
### `:transient` restart strategy
```elixir
# Restart the Supervisor so that all child processes are started again
Supervisor.stop(supervisor_pid)
{:ok, supervisor_pid} = CrashDummySupervisor.start_link()
```
```elixir
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
# Test graceful termination of child with `:transient` restart strategy
# Notice how the GenServer is NOT restarted since it was stopped gracefully
GenServer.cast(:transient_dummy, :stop_gracefully)
```
```elixir
# Notice how the transient child has a pid of `:undefined` since it's no longer running
Supervisor.which_children(supervisor_pid)
```
```elixir
# Restart the Supervisor so that all child processes are started again
Supervisor.stop(supervisor_pid)
{:ok, supervisor_pid} = CrashDummySupervisor.start_link()
```
```elixir
# Test abnormal termination of child with `:transient` restart strategy
# Notice how the GenServer is restarted since it was stopped abnormally
GenServer.cast(:transient_dummy, :crash)
```
```elixir
# Notice how transient_dummy was restarted and all children are running
Supervisor.which_children(supervisor_pid) |> IO.inspect(label: "Supervisors Children")
```
## The Max Restarts option
So far, we have covered child specification keys such as `:id`, `:start`, and `:restart`, as well as supervisor options such as `:strategy` and `:name`. Now, let's explore two more options that are set on the supervisor itself rather than in a child specification.
These options are:
* `:max_restarts`: This option sets the maximum number of restarts allowed within a specified time frame. By default, it is set to 3.
* `:max_seconds`: This option defines the time frame in which the `:max_restarts` limit applies. The default value is 5 seconds.
These options determine the maximum **restart intensity** of a supervisor, controlling the number of restarts that can occur within a given time interval. It is part of the supervisor's built-in mechanism to manage restarts effectively.
**If the number of restarts exceeds the `:max_restarts` limit within the last `:max_seconds` seconds, the supervisor terminates all its child processes and itself**. In this case, the termination reason for the supervisor is `:shutdown`.
When a supervisor terminates, the next higher-level supervisor takes action. It either restarts the terminated supervisor or terminates itself.
The restart mechanism is designed to prevent a scenario where a process repeatedly crashes for the same reason, only to be restarted again and again.
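The restart intensity can be observed with a small sketch. `FlakyWorker` and `RestartIntensityDemo` are hypothetical modules written only for this illustration. The supervisor allows a single restart within 5 seconds, so the second crash makes it give up. The calling process traps exits because it is linked to the supervisor and would otherwise be taken down with it.

```elixir
# Hypothetical worker registered under its module name.
defmodule FlakyWorker do
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, :ok}
end

defmodule RestartIntensityDemo do
  # Kills the process registered under `name` and waits until it is gone.
  def kill(name) do
    pid = wait_for(name)
    ref = Process.monitor(pid)
    Process.exit(pid, :kill)

    receive do
      {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
    end
  end

  # Polls until a live process is registered under `name` (e.g. after a restart).
  defp wait_for(name) do
    pid = Process.whereis(name)

    if pid && Process.alive?(pid) do
      pid
    else
      Process.sleep(10)
      wait_for(name)
    end
  end
end

# Trap exits so the linked supervisor's shutdown does not take this process down
Process.flag(:trap_exit, true)

{:ok, sup} =
  Supervisor.start_link([FlakyWorker],
    strategy: :one_for_one,
    max_restarts: 1,
    max_seconds: 5
  )

sup_ref = Process.monitor(sup)

# First crash: one restart within 5 seconds is still allowed
RestartIntensityDemo.kill(FlakyWorker)
# Second crash: two restarts within 5 seconds exceed max_restarts: 1
RestartIntensityDemo.kill(FlakyWorker)

exit_reason =
  receive do
    {:DOWN, ^sup_ref, :process, ^sup, reason} -> reason
  after
    2_000 -> :still_running
  end

IO.inspect(exit_reason, label: "supervisor exit reason")
```

The supervisor exits with reason `:shutdown` after the second crash. In a real application, its parent supervisor would then decide whether to restart it.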
## Shutdown strategy
When defining child specifications for a supervisor, we have the option to include the `:shutdown` option, which determines how the supervisor shuts down its child processes.
The `:shutdown` option has three possible values:
* *An integer greater than or equal to 0*: This specifies the amount of time in milliseconds that the supervisor will wait for its children to terminate after sending a `Process.exit(child, :shutdown)` signal. If the child process does not trap exits, it will be terminated immediately upon receiving the `:shutdown` signal. If the child process traps exits, it has the specified amount of time to terminate. If it fails to terminate within the specified time, the supervisor will forcefully terminate the child process using `Process.exit(child, :kill)`.
* `:brutal_kill`: This option causes the child process to be unconditionally and immediately terminated using `Process.exit(child, :kill)`. That is the supervisor will not wait for the child process to terminate gracefully but will immediately kill the process.
* `:infinity`: With this option, the supervisor will wait indefinitely for the child process to terminate.
By default, the `:shutdown` option is set to `5_000` (5 seconds) for worker processes, which means the supervisor will wait for a maximum of 5 seconds for the child process to shut down gracefully. If the child process does not terminate within this time frame, the supervisor will forcefully terminate it using `Process.exit(child, :kill)`. For children that are themselves supervisors, the default is `:infinity`.
These options provide flexibility in managing the shutdown behavior of child processes in a supervisor.
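As a minimal sketch of a graceful shutdown, the hypothetical `GracefulWorker` below traps exits and reports the reason passed to its `terminate/2` callback. The `shutdown: 2_000` value is an arbitrary choice for the example. It gives the worker up to two seconds to clean up before the supervisor would kill it.

```elixir
# Hypothetical worker that performs "cleanup" by notifying another process.
defmodule GracefulWorker do
  # The :shutdown value ends up in the generated child spec
  use GenServer, shutdown: 2_000

  def start_link(notify_pid), do: GenServer.start_link(__MODULE__, notify_pid)

  @impl true
  def init(notify_pid) do
    # Without trapping exits, the :shutdown signal would stop the process
    # immediately and terminate/2 would not be called
    Process.flag(:trap_exit, true)
    {:ok, notify_pid}
  end

  @impl true
  def terminate(reason, notify_pid) do
    send(notify_pid, {:terminated, reason})
    :ok
  end
end

{:ok, sup} = Supervisor.start_link([{GracefulWorker, self()}], strategy: :one_for_one)

# Stopping the supervisor sends a :shutdown exit signal to the child
:ok = Supervisor.stop(sup)

cleanup_reason =
  receive do
    {:terminated, reason} -> reason
  after
    1_000 -> :no_cleanup
  end

IO.inspect(cleanup_reason, label: "terminate/2 called with")
```

With `:brutal_kill` instead of a timeout, the worker would be killed unconditionally and `terminate/2` would not get a chance to run.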
## Key points
* During startup, a supervisor processes all child specifications and starts each child in **the order they are defined**. This is achieved by invoking the function specified under the `:start` key in the child specification, typically `start_link/1`.
* When a supervisor initiates shutdown, it terminates its children in the **reverse order of their listing**. This termination process involves sending a shutdown exit signal, `Process.exit(child_pid, :shutdown)`, to each child process and waiting for a specified time interval for them to terminate. The default interval is 5000 milliseconds.
* If a child process is not trapping exits, it will immediately shut down upon receiving the first exit signal. On the other hand, if a child process is trapping exits, it will invoke the terminate callback and must terminate within a reasonable time before the supervisor forcefully terminates it.
* When an Elixir application exits, the termination propagates down the supervision tree. *Supervisors always trap exits* for various reasons, so they attempt to stop all their children upon receiving an exit signal. This is achieved by sending an exit signal to each child individually, allowing a timeout period before resorting to a brutal termination with `:kill`. The duration of this timeout is determined by the `shutdown` option specified in the child specification.
* Once all children are stopped, the supervisor itself stops as well, resulting in the orderly shutdown of the supervision tree.
These points highlight the startup and shutdown behavior of supervisors, the termination process for child processes, and the flow of exit signals within the supervision tree.
================================================
FILE: chapters/ch_5.4_introduction_to_dynamic_supervisor.livemd
================================================
# Introduction to Dynamic Supervisor
```elixir
Mix.install([
  {:kino, "~> 0.9.0"}
])
```
## The Dynamic Supervisor
In the previous chapter, we learned about the Supervisor behavior, which enables us to supervise processes and restart them in case of failures, ensuring fault tolerance. However, the Supervisor behavior requires us to specify all the child processes it will supervise in advance as child specifications. In other words, **the Supervisor module was primarily designed to handle static children.**
When the supervisor starts, it creates and starts the child processes in the specified order, and when the supervisor is stopped, it terminates the processes in the reverse order.
On the other hand, a DynamicSupervisor **starts with no children** initially. Instead, children are started **on demand** using the `start_child/2` function, and there is **no specific ordering** between the children. This provides a lot of flexibility, as we can dynamically add and remove child processes to be supervised. The DynamicSupervisor can efficiently handle a large number of children by utilizing optimized data structures and by performing certain operations, such as shutting down, concurrently.
### Key points
* DynamicSupervisor is a specialized type of Supervisor designed to handle dynamic children. Note that Erlang/OTP provides a single `supervisor` behaviour. Elixir's DynamicSupervisor and PartitionSupervisor are Elixir-level abstractions that address common use cases more conveniently.
* Dynamic supervisors start without any children initially, and there is no predefined ordering between the children. Children can be added to the supervisor dynamically as needed, without any specific sequence or arrangement.
* The only available supervision strategy for DynamicSupervisor is `:one_for_one`.
* The `id` of a child in a DynamicSupervisor is always `:undefined`. This is because dynamically supervised children are created from the same child specification, and assigning a specific id to each child would result in conflicts.
### Supervisor.start_child/2 vs DynamicSupervisor.start_child/2
It may appear confusing that both the Supervisor and DynamicSupervisor modules provide a function called `start_child/2` to dynamically start supervised child processes. This raises the question of what distinguishes the two and why we have a dedicated DynamicSupervisor for dynamic child management.
While it is possible to dynamically start and stop children from a standard Supervisor, the DynamicSupervisor is specifically designed to excel in this use case. There are differences in how a DynamicSupervisor handles its children compared to a regular supervisor. For instance, a DynamicSupervisor does not impose an inherent ordering among its children.
On restart, a DynamicSupervisor starts empty, while a regular Supervisor starts along with all the child processes defined in its child specifications. A DynamicSupervisor can also shut down all its children concurrently, unlike a standard supervisor, which stops its children sequentially in a specific order.
Furthermore, the DynamicSupervisor module provides additional options, such as `:max_children`, which allows setting a limit on the maximum number of dynamically supervised children.
Therefore, it is more idiomatic and efficient to use a DynamicSupervisor instead of the regular Supervisor module when dynamically starting and stopping supervised processes.
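One practical difference is how child ids are handled. The sketch below starts the same child specification twice under each kind of supervisor. An `Agent` is used as the child purely for brevity, and the variable names are illustrative.

```elixir
{:ok, static_sup} = Supervisor.start_link([], strategy: :one_for_one)
{:ok, dynamic_sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# Agent.child_spec/1 uses `Agent` as the child id
spec = {Agent, fn -> 0 end}

# A regular Supervisor identifies children by id, so the second call is rejected
{:ok, first} = Supervisor.start_child(static_sup, spec)
duplicate = Supervisor.start_child(static_sup, spec)
IO.inspect(duplicate, label: "Supervisor, same id again")

# A DynamicSupervisor ignores the id and starts a new process each time
{:ok, a} = DynamicSupervisor.start_child(dynamic_sup, spec)
{:ok, b} = DynamicSupervisor.start_child(dynamic_sup, spec)
IO.inspect(DynamicSupervisor.which_children(dynamic_sup), label: "DynamicSupervisor children")
```

With a regular Supervisor we would have to generate a unique `:id` for every child ourselves, as we did with `Supervisor.child_spec/2` in the earlier chapters.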
## Usage
Just like the regular Supervisor module, the DynamicSupervisor can either be started directly or defined as a module.
Let's look at some examples...
```elixir
children = [{DynamicSupervisor, name: MyTestDynamicSupervisor}]
# The only possible strategy for a DynamicSupervisor is :one_for_one (its default).
# The :strategy option below belongs to the parent Supervisor that starts it.
{:ok, supervisor_pid} = Supervisor.start_link(children, strategy: :one_for_one)
```
We will now create a simple GenServer that we can start under this supervisor
```elixir
defmodule TestServer do
  use GenServer

  def start_link(name) do
    GenServer.start_link(__MODULE__, :noop, name: name)
  end

  ## Callbacks

  @impl true
  def init(_arg) do
    {:ok, :noop}
  end

  @impl true
  def handle_call({:echo, arg}, _from, state) do
    {:reply, arg, state}
  end
end
```
Now we can use the `DynamicSupervisor.start_child(supervisor, child_spec)` function to dynamically start a child process under the supervisor. Notice how we need to pass a child spec to the function.
```elixir
{:ok, echo_1} = DynamicSupervisor.start_child(MyTestDynamicSupervisor, {TestServer, :echo1})
{:ok, echo_2} = DynamicSupervisor.start_child(MyTestDynamicSupervisor, {TestServer, :echo2})
```
```elixir
# Let's visualize the supervision tree
Kino.Process.render_sup_tree(supervisor_pid)
```
Notice how we dynamically started 2 instances of our `TestServer` GenServer process under the `MyTestDynamicSupervisor` DynamicSupervisor.
```elixir
DynamicSupervisor.count_children(MyTestDynamicSupervisor)
```
```elixir
# Notice how the id is undefined for a DynamicSupervisor
DynamicSupervisor.which_children(MyTestDynamicSupervisor)
```
We can easily terminate a dynamically started child
```elixir
DynamicSupervisor.terminate_child(MyTestDynamicSupervisor, echo_2)
```
```elixir
DynamicSupervisor.count_children(MyTestDynamicSupervisor)
```
```elixir
DynamicSupervisor.which_children(MyTestDynamicSupervisor)
```
---
### Module based DynamicSupervisor
Now let's use a module-based DynamicSupervisor. Just like the regular Supervisor behaviour, the DynamicSupervisor behaviour has only one callback that we must implement: the `init/1` callback.
Also, similar to the regular Supervisor module, when starting a DynamicSupervisor we can pass options like `:name`, `:strategy`, `:max_restarts` and `:max_seconds`.
Two new options that are available with DynamicSupervisors are:
* `:max_children` - the maximum number of children that can run under this supervisor at the same time. When `:max_children` is exceeded, `start_child/2` returns `{:error, :max_children}`. Defaults to `:infinity`.
* `:extra_arguments` - arguments that are prepended to the arguments specified in the child spec given to `start_child/2`. Defaults to an empty list.
To understand this better, let's look at an example:
```elixir
# A simple GenServer module which we would start under our supervisor
defmodule TestServerV2 do
  use GenServer

  def start_link(extra_arg, name, arg) do
    GenServer.start_link(__MODULE__, [extra_arg, arg], name: name)
  end

  ## Callbacks

  @impl true
  def init([extra_arg, arg]) do
    IO.inspect(
      "New TestServerV2 started with extra_arg = #{inspect(extra_arg)} and arg = #{inspect(arg)}"
    )

    {:ok, :noop}
  end

  @impl true
  def handle_call({:echo, arg}, _from, state) do
    {:reply, arg, state}
  end
end
```
```elixir
defmodule MyTestDynamicSupervisorV2 do
  # The DynamicSupervisor behaviour that defines a default child_spec/1
  use DynamicSupervisor

  def start_link(init_arg) do
    DynamicSupervisor.start_link(__MODULE__, init_arg, name: __MODULE__)
  end

  # A public API to easily start child processes under this supervisor
  def start_child(name, arg) do
    child_spec = %{id: TestServerV2, start: {TestServerV2, :start_link, [name, arg]}}
    # This will start a child process of TestServerV2 by calling
    # TestServerV2.start_link(init_arg, name, arg)
    DynamicSupervisor.start_child(__MODULE__, child_spec)
  end

  @impl true
  def init(init_arg) do
    # Returns a tuple containing the supervisor initialization options.
    DynamicSupervisor.init(
      strategy: :one_for_one,
      max_children: 2,
      extra_arguments: [init_arg]
    )
    |> IO.inspect(label: "DynamicSupervisor initialized with")
  end
end
```
A few things to note in the above code snippets:
* We are using the `DynamicSupervisor.init/1` helper function to generate a tuple that initializes the dynamic supervisor with proper options in its `init/1` callback.
* We have added a helper function `MyTestDynamicSupervisorV2.start_child/2` to dynamically start supervised child processes under our dynamic supervisor.
* We have passed additional options like the `max_children` to limit the number of children the dynamic supervisor can start.
* The `extra_arguments: [init_arg]` option will automatically prepend the `init_arg` argument to every child process started under this supervisor. This is especially useful if we want to always send a specific argument to every child process that is started under this supervisor.
[Note: Just like `Supervisor.start_link/2`, `DynamicSupervisor.start_child/2` accepts shorthand child specs such as `{module, arg}` or a bare module name and expands them by calling `module.child_spec/1`. Modules that `use DynamicSupervisor` also get a default `child_spec/1`, so they can themselves be listed under another supervisor.]
```elixir
{:ok, supervisor_pid} = MyTestDynamicSupervisorV2.start_link("Elixir is ❤")
```
Now let us dynamically start and stop child processes under our supervisor.
```elixir
{:ok, echo_1} = MyTestDynamicSupervisorV2.start_child(:echov2_1, :yolo)
{:ok, echo_2} = MyTestDynamicSupervisorV2.start_child(:echov2_2, :awesome_elixir)
```
Notice how the child processes that were started have received the "Elixir is ❤" specified as `:extra_arguments` along with the arguments that were passed.
```elixir
DynamicSupervisor.count_children(MyTestDynamicSupervisorV2) |> IO.inspect()
DynamicSupervisor.which_children(MyTestDynamicSupervisorV2)
```
```elixir
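# Two children are already running, so with `max_children: 2` this call is rejected
# and returns {:error, :max_children}
DynamicSupervisor.start_child(MyTestDynamicSupervisorV2, {TestServerV2, :echo3})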
```
```elixir
# Let's visualize the supervision tree
Kino.Process.render_sup_tree(supervisor_pid)
```
```elixir
DynamicSupervisor.terminate_child(MyTestDynamicSupervisorV2, echo_1)
DynamicSupervisor.count_children(MyTestDynamicSupervisorV2) |> IO.inspect()
DynamicSupervisor.which_children(MyTestDynamicSupervisorV2)
```
In future chapters, we will delve into the topic of scaling a DynamicSupervisor by utilizing a PartitionSupervisor. We will also go through more examples of how to use dynamic supervisors in real use cases.
## Resources
* https://hexdocs.pm/elixir/DynamicSupervisor.html
* https://elixirforum.com/t/different-between-supervisor-start-child-and-dynamicsupervisor-start-child/14585/3
================================================
FILE: chapters/ch_5.5_partition_supervisor.ex.livemd
================================================
# The Partition Supervisor
```elixir
Mix.install([
  {:kino, "~> 0.9.0"}
])
```
## Introduction
A PartitionSupervisor functions similarly to a regular supervisor, but with the added capability of creating partitions.
When a PartitionSupervisor is started, it *will create multiple partitions and will start a process under each of the partitions*.
This feature becomes particularly valuable when certain processes within a system have the potential to become bottlenecks. If these processes can easily partition their state without any interdependencies, the PartitionSupervisor can be used.
By starting multiple instances of such processes across different partitions, the workload can be distributed and potential bottlenecks can be avoided.
## Usage
Once a PartitionSupervisor is started, we can dispatch messages to its children using a via tuple of the form `{:via, PartitionSupervisor, {name, key}}`. Here, `name` refers to the name of the PartitionSupervisor, and `key` is used for routing the message.
The PartitionSupervisor uses a routing strategy to determine the appropriate partition to which a message should be dispatched. When sending a message to a child process under a PartitionSupervisor, we provide a `key`. Depending on the routing strategy in place, the PartitionSupervisor will utilize this key to select the specific partition to which the message should be sent.
Let's explore an example to gain a better understanding of this concept.
Let's create a simple GenServer that we can start under the partition supervisor.
```elixir
defmodule EchoServer do
  use GenServer

  def start_link(args) do
    GenServer.start_link(__MODULE__, args)
  end

  @impl true
  def init(args) do
    IO.inspect("EchoServer #{inspect(self())} started with args: #{inspect(args)}")
    {:ok, :noop}
  end

  @impl true
  def handle_call({:echo, msg}, _from, state) do
    IO.inspect("EchoServer(#{inspect(self())}) echoing: #{inspect(msg)}")
    {:reply, msg, state}
  end
end
```
Now let's start a partition supervisor.
```elixir
{:ok, supervisor_pid} =
PartitionSupervisor.start_link(
name: EchoServerPartitionSupervisor,
# Use the default child_spec/1 function of the GenServer
child_spec: EchoServer.child_spec(:test_arg)
)
```
Now let's visualize the supervision tree.
```elixir
Kino.Process.render_sup_tree(supervisor_pid)
```
From the above output we can now see that multiple processes of the `EchoServer` GenServer were started by the Partition Supervisor. A separate instance of the `EchoServer` was started for each partition that was created.
By default, the number of partitions a PartitionSupervisor creates is equal to `System.schedulers_online()` (typically the number of CPU cores).
```elixir
System.schedulers_online()
```
The number of processes (partitions) we see in the supervision tree should match the value returned by `System.schedulers_online()` above.
The PartitionSupervisor provides additional options that can be passed during its initialization:
* `:partitions` - This option accepts a positive integer value that represents the number of partitions to create. By default, it is set to `System.schedulers_online()`, which corresponds to the number of online schedulers in the system.
* `:with_arguments` - A two-argument anonymous function that allows the partition to be given to the child starting function.
In addition to these specific options, other common options such as `:name`, `:child_spec`, `:max_restarts`, and `:max_seconds` can be used with the PartitionSupervisor, and they function as they do in regular supervisors.
Now let's restart our PartitionSupervisor with some of these options to customize its behaviour...
```elixir
# Defining a new echo server GenServer with a start_link/2 function
# to also receive the partition number as an argument.
defmodule EchoServerV2 do
use GenServer
def start_link(args, partition_number) do
GenServer.start_link(__MODULE__, [args, partition_number])
end
@impl true
def init([args, partition_number]) do
IO.inspect(
"EchoServer #{inspect(self())} started on partition #{partition_number} with args: #{inspect(args)}"
)
# We save the partition number in the GenServer state
{:ok, partition_number}
end
@impl true
def handle_call({:echo, msg}, _from, partition_number) do
IO.inspect(
"EchoServer(#{inspect(self())})(partition=#{partition_number}) echoing: #{inspect(msg)}"
)
{:reply, msg, partition_number}
end
end
```
```elixir
# Stop the existing supervisor
:ok = PartitionSupervisor.stop(EchoServerPartitionSupervisor)
# Start the EchoServerPartitionSupervisor again with added options
{:ok, supervisor_pid} =
PartitionSupervisor.start_link(
name: EchoServerPartitionSupervisor,
child_spec: EchoServerV2.child_spec(:test_arg),
# We explicitly specify the number of partitions to create
partitions: 3,
with_arguments: fn [existing_args], partition ->
# Inject the partition number into the args given to the child process
# This will be passed to the child process when it is started via the
# `start_link(args, partition_number)` function.
[existing_args, partition]
end
)
```
```elixir
Kino.Process.render_sup_tree(supervisor_pid)
```
Notice that this time only 3 partitions were created and 3 child processes were started.
Also notice how the partition number was passed as an argument to every child process. This is due to the use of the `with_arguments` option.
The `with_arguments` option allows us to customize the arguments passed to child processes in a partitioned supervision setup. By providing a two-argument anonymous function, we can include the partition number in the arguments used to start each child process. This **allows each process to have knowledge of the partition_number on which it is running**.
### Sending messages
To send a message to a child process under a PartitionSupervisor, we can use the `{:via, PartitionSupervisor, {name, key}}` tuple. Here, `key` is used for routing the message to the appropriate partition.
By using this message dispatching method, we can effectively send messages to specific child processes running under the PartitionSupervisor based on the key that we pass.
```elixir
# Send a message to the EchoServer running on partition 0
:hi =
GenServer.call(
{:via, PartitionSupervisor, {EchoServerPartitionSupervisor, 0}},
{:echo, :hi}
)
# Send a message to the EchoServer running on partition 1
:ola =
GenServer.call(
{:via, PartitionSupervisor, {EchoServerPartitionSupervisor, 1}},
{:echo, :ola}
)
# Send a message to the EchoServer running on partition 2
:adios =
GenServer.call(
{:via, PartitionSupervisor, {EchoServerPartitionSupervisor, 2}},
{:echo, :adios}
)
# Send a message to the EchoServer running on partition 1
# (the routing key 1000 results in partition 1 being selected)
GenServer.call(
{:via, PartitionSupervisor, {EchoServerPartitionSupervisor, 1000}},
{:echo, :boom}
)
```
When using integer keys with the PartitionSupervisor, the routing strategy is determined by the formula `rem(abs(key), partitions)`. In the example we provided, the message with the key `1000` was sent to partition 1 because `rem(abs(1000), 3) = rem(1000, 3) = 1`.
However, if the routing key is not an integer, the `:erlang.phash2(key, partitions)` hash function is used as the routing strategy. This function calculates a hash value based on the key and the number of partitions, resulting in the selection of the appropriate partition to which the message should be dispatched.
```elixir
:erlang.phash2("1000", 3) |> IO.inspect(label: "Partition")
GenServer.call(
{:via, PartitionSupervisor, {EchoServerPartitionSupervisor, "1000"}},
{:echo, :hello_world}
)
```
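The two routing rules described above can be mirrored in a few lines. This is only a sketch: `RoutingSketch` and `partition_for/2` are hypothetical names used for illustration and are not part of the `PartitionSupervisor` API.

```elixir
defmodule RoutingSketch do
  # Hypothetical helper that mirrors the routing rules described in the text.

  # Integer keys: rem(abs(key), partitions)
  def partition_for(key, partitions) when is_integer(key) do
    rem(abs(key), partitions)
  end

  # Any other term: :erlang.phash2(key, partitions)
  def partition_for(key, partitions) do
    :erlang.phash2(key, partitions)
  end
end

IO.inspect(RoutingSketch.partition_for(1000, 3), label: "integer key 1000")
IO.inspect(RoutingSketch.partition_for(-4, 3), label: "integer key -4")
IO.inspect(RoutingSketch.partition_for("1000", 3), label: "string key")
```

Because both rules are deterministic, the same key always selects the same partition for a given partition count.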
If we want to retrieve the PID of the process running on the partition selected for a certain key, we can use `GenServer.whereis({:via, PartitionSupervisor, {name, key}})`.
```elixir
# Get the PID of the process running in the partition that would be
# selected when using "1000" as the key
GenServer.whereis({:via, PartitionSupervisor, {EchoServerPartitionSupervisor, "1000"}})
```
### Implementation detail
The PartitionSupervisor uses either an ETS table or a `Registry` to manage all of the partitions. Under the hood, the PartitionSupervisor generates a child spec for each partition and then acts as a regular supervisor. The ID of each child spec is the partition number.
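To see the partition numbers used as child IDs, we can inspect a small partition supervisor. The sketch below assumes a hypothetical supervisor name, `SketchPartitionSupervisor`, and uses a plain `Agent` as the child so that the cell is self-contained.

```elixir
# SketchPartitionSupervisor is a hypothetical name used only for this sketch
{:ok, _pid} =
  PartitionSupervisor.start_link(
    name: SketchPartitionSupervisor,
    partitions: 3,
    child_spec: {Agent, fn -> :idle end}
  )

# Number of partitions managed by the supervisor
partition_count = PartitionSupervisor.partitions(SketchPartitionSupervisor)

# Each child is reported as {id, pid, type, modules}; the id is the partition number
child_ids =
  SketchPartitionSupervisor
  |> PartitionSupervisor.which_children()
  |> Enum.map(fn {id, _pid, _type, _modules} -> id end)
  |> Enum.sort()

IO.inspect(partition_count, label: "partitions")
IO.inspect(child_ids, label: "child ids")
```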
================================================
FILE: chapters/ch_5.6_scaling_dynamic_supervisor.livemd
================================================
# Scaling Dynamic Supervisors
```elixir
Mix.install([
{:kino, "~> 0.9.0"}
])
```
## The scalability problem with DynamicSupervisors
In previous chapters, we learned about the DynamicSupervisor, which is effective for dynamically spawning and supervising child processes. However, in certain scenarios, the DynamicSupervisor can become a bottleneck.
The DynamicSupervisor operates as a single process responsible for starting other processes. In high-demand situations where there are numerous requests to start new child processes, the DynamicSupervisor may struggle to keep up. Additionally, if a child process experiences delays during initialization (e.g., being stuck in the `init/1` callback), it can block the DynamicSupervisor and prevent it from starting new child processes.
Let's simulate such a situation with an example...
```elixir
# A minimal GenServer which takes 1 second to initialize
defmodule SlowGenServer do
use GenServer
def start_link(args) do
GenServer.start_link(__MODULE__, args)
end
@impl true
def init(_args) do
# Simulate slow start of a GenServer by sleeping for 1 second
:timer.sleep(1000)
IO.inspect("Started new SlowGenServer #{inspect(self())}")
{:ok, :noop}
end
end
```
Note: In real scenarios, it's important to avoid performing time-consuming tasks in the `init/1` callback of a GenServer. Instead, we should leverage the `handle_continue/2` callback to handle long-running tasks and prevent them from blocking the GenServer startup process. However, for the purpose of this example, let's proceed with trying it out.
```elixir
# Start a DynamicSupervisor named "MySlowDynamicSupervisor"
{:ok, supervisor_pid} =
DynamicSupervisor.start_link(
name: MySlowDynamicSupervisor,
# A DynamicSupervisor takes no :child_spec option; children are given to start_child/2
strategy: :one_for_one
)
```
Now let's try to simultaneously add 5 child processes under our DynamicSupervisor.
```elixir
# Let's start 5 instances of the SlowGenServer process under the DynamicSupervisor
for _i <- 1..5 do
# Start a new process that in turn starts a new child process under the dynamic supervisor
spawn(fn -> DynamicSupervisor.start_child(MySlowDynamicSupervisor, SlowGenServer) end)
end
```
Notice how the DynamicSupervisor starts each child process one by one and is blocked until the previous child process is started. In real-world scenarios, this can cause significant delays in starting child processes under a single DynamicSupervisor, resulting in potential bottlenecks and performance issues.
Let's visualize the resulting supervision tree.
```elixir
Kino.Process.render_sup_tree(supervisor_pid)
```
Notice how all the 5 instances of the `SlowGenServer` process are spawned under the same `MySlowDynamicSupervisor` instance.
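The note earlier in this section recommends moving slow initialization into `handle_continue/2`. A minimal sketch of that pattern follows; `DeferredInitServer`, the `:slow_setup` tag and the 500 ms sleep are assumptions made for illustration only.

```elixir
defmodule DeferredInitServer do
  use GenServer

  def start_link(args), do: GenServer.start_link(__MODULE__, args)

  @impl true
  def init(_args) do
    # Return immediately so the caller (e.g. a supervisor) is not blocked,
    # and schedule the slow part to run right after init/1
    {:ok, :initializing, {:continue, :slow_setup}}
  end

  @impl true
  def handle_continue(:slow_setup, _state) do
    # Simulated slow work; it now runs inside the already started process
    Process.sleep(500)
    {:noreply, :ready}
  end

  @impl true
  def handle_call(:state, _from, state), do: {:reply, state, state}
end

# start_link/1 returns without waiting for the slow setup
{start_micros, {:ok, pid}} = :timer.tc(fn -> DeferredInitServer.start_link([]) end)

# handle_continue/2 runs before any queued message, so this call sees :ready
state = GenServer.call(pid, :state)

IO.puts("start_link took #{div(start_micros, 1000)} ms, state is #{inspect(state)}")
```

With this approach a supervisor is blocked only for the duration of `init/1`, although the process still cannot serve requests until the continue step has finished.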
## Using a PartitionSupervisor to scale DynamicSupervisors
To address the aforementioned problem, we can use a PartitionSupervisor to start multiple instances of the DynamicSupervisor. The **PartitionSupervisor acts as a supervisor for multiple DynamicSupervisor processes, each running in a separate partition**.
When a new child process needs to be started, the PartitionSupervisor selects one of the DynamicSupervisor processes to handle the request. This distribution of child process creation across multiple DynamicSupervisors helps distribute the workload and prevents bottlenecks that can occur when relying on a single DynamicSupervisor.
Let's see this in action...
```elixir
# Stop the existing dynamic supervisor
Supervisor.stop(MySlowDynamicSupervisor)
# Start a partition supervisor with a dynamic supervisor as the child process for each partition
{:ok, supervisor_pid} =
PartitionSupervisor.start_link(
name: MySlowPartitionSupervisor,
# Create 6 partitions
partitions: 6,
# Use the default child_spec/1 function of DynamicSupervisor
child_spec: DynamicSupervisor.child_spec([])
)
```
In the code above, we start a partition supervisor that creates six partitions and starts a dynamic supervisor for each partition.
```elixir
Kino.Process.render_sup_tree(supervisor_pid)
```
Now, instead of directly calling the DynamicSupervisor by its name, we access it through the PartitionSupervisor using the `{:via, PartitionSupervisor, {partition_supervisor_name, key}}` format.
Now let's try again to start 5 child processes under the supervisor.
```elixir
for i <- 1..5 do
# Start a new process that in turn starts a new child process under one of the
# dynamic supervisors via the partition supervisor
spawn(fn ->
DynamicSupervisor.start_child(
{:via, PartitionSupervisor, {MySlowPartitionSupervisor, i}},
SlowGenServer
)
end)
end
```
In the provided code, we spawn five new processes, and each process starts a new child process under one of the dynamic supervisors via the partition supervisor.
We use the numbers 1 to 5 as the routing keys for each child process. With six partitions available, each child process will be started under a separate dynamic supervisor.
Let's visualize the resulting supervision tree.
```elixir
Kino.Process.render_sup_tree(supervisor_pid)
```
In the resulting supervision tree, we can observe that six instances of the DynamicSupervisor were started under the `MySlowPartitionSupervisor` PartitionSupervisor, one for each partition.
Furthermore, a separate instance of the `SlowGenServer` process was started under each of the five dynamic supervisors selected by the routing keys 1 to 5. The dynamic supervisor on partition 0 has no children, because none of these keys maps to it.
---
By leveraging the PartitionSupervisor as the entry point, we can abstract away the details of the individual dynamic supervisors and rely on the routing strategy to handle the selection of the appropriate dynamic supervisor for starting the child processes. This approach allows for efficient distribution of child processes across multiple dynamic supervisors, reducing the load on any single supervisor and avoiding potential bottlenecks.
As a result, the child processes are started much faster compared to the previous example, where we relied on a single dynamic supervisor.
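The difference can be measured with a rough timing sketch. Everything under the `TimingSketch` namespace is hypothetical, and the 200 ms init delay and 4 partitions are arbitrary choices; actual timings depend on the machine.

```elixir
defmodule TimingSketch.SlowWorker do
  use GenServer

  def start_link(args), do: GenServer.start_link(__MODULE__, args)

  @impl true
  def init(_args) do
    # Simulated slow initialization
    Process.sleep(200)
    {:ok, :noop}
  end
end

defmodule TimingSketch do
  # Starts `count` workers concurrently and returns the elapsed milliseconds.
  # `supervisor_for` maps a worker index to the supervisor that should start it.
  def time_starts(count, supervisor_for) do
    {micros, _results} =
      :timer.tc(fn ->
        1..count
        |> Enum.map(fn i ->
          Task.async(fn ->
            DynamicSupervisor.start_child(supervisor_for.(i), TimingSketch.SlowWorker)
          end)
        end)
        |> Task.await_many(10_000)
      end)

    div(micros, 1000)
  end
end

{:ok, _} = DynamicSupervisor.start_link(name: TimingSketch.Single, strategy: :one_for_one)

{:ok, _} =
  PartitionSupervisor.start_link(
    name: TimingSketch.Partitioned,
    partitions: 4,
    child_spec: DynamicSupervisor
  )

# All four requests queue up behind one supervisor process
single_ms = TimingSketch.time_starts(4, fn _i -> TimingSketch.Single end)

# Keys 1..4 map to four different partitions, so the starts run in parallel
partitioned_ms =
  TimingSketch.time_starts(4, fn i ->
    {:via, PartitionSupervisor, {TimingSketch.Partitioned, i}}
  end)

IO.puts("single: #{single_ms} ms, partitioned: #{partitioned_ms} ms")
```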
Note: In most real-world scenarios, the supervisor and partition supervisor are typically started as part of the application's supervision tree. Instead of manually calling `start_link/1`, we can define the supervisors and their child specifications in the application module.
Here's an example of how we can start the partition supervisor under a supervision tree:
```elixir
defmodule MyApp.Application do
use Application
def start(_type, _args) do
children = [
{PartitionSupervisor,
child_spec: DynamicSupervisor,
name: MySlowPartitionSupervisor}
]
opts = [strategy: :one_for_one]
Supervisor.start_link(children, opts)
end
end
```
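Once the partition supervisor is part of a supervision tree, children can be started through it from anywhere in the application. The sketch below uses `self()` as the routing key, which spreads requests across partitions according to the calling process. The name `SketchApp.DynamicSupervisors` and the `Agent` child are assumptions for illustration.

```elixir
children = [
  # SketchApp.DynamicSupervisors is a hypothetical name used only in this sketch
  {PartitionSupervisor, child_spec: DynamicSupervisor, name: SketchApp.DynamicSupervisors}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Route by the caller's PID: a non-integer key, so it is hashed to pick a partition
{:ok, agent} =
  DynamicSupervisor.start_child(
    {:via, PartitionSupervisor, {SketchApp.DynamicSupervisors, self()}},
    {Agent, fn -> %{} end}
  )

IO.inspect(Agent.get(agent, fn state -> state end), label: "agent state")
```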
### Resources:
* https://hexdocs.pm/elixir/1.15.0-rc.0/PartitionSupervisor.html#content
* https://blog.appsignal.com/2022/09/20/fix-process-bottlenecks-with-elixir-1-14s-partition-supervisor.html
================================================
FILE: chapters/ch_6.0_project_building_a_download_manager.livemd
================================================
# Project - Building a Download Manager 🚀
```elixir
Mix.install([
{:kino, "~> 0.9.0"},
{:elixir_uuid, "~> 1.2"},
{:httpoison, "~> 2.1"},
{:sizeable, "~> 1.0"}
])
```
## Introduction
In this chapter, we will put into practice the concepts we have learned in the previous chapters by building a project. Our project will be a simple download manager that has the capability to download multiple files simultaneously and provide status updates for each download. A key requirement is that the failure of one download should not impact the progress or completion of other ongoing downloads.
By building this download manager, we will explore the use of supervisors and dynamic supervisors to handle the concurrent downloading of files, ensuring fault tolerance and isolation between download processes. We will also dive into message passing and state management to track the progress and status of each download.
## The Download struct
We represent each download using a struct that contains the following fields:
* `id`: A unique identifier for the download.
* `name`: The name of the download.
* `src`: The source URL from where the download is initiated.
* `dest`: The destination file path where the downloaded file will be saved.
* `from`: The process ID (PID) of the requester process who initiated the download.
* `pid`: The process ID (PID) of the worker process that's downloading the file.
* `status`: The current status of the download.
* `size`: The size of the download in bytes.
* `bytes_downloaded`: The number of bytes downloaded for the download.
* `exit_status`: The exit status of the downloading process, if applicable.
* `error_reason`: The reason for a failed download, if applicable.
* `start_time`: The timestamp when the download started.
* `end_time`: The timestamp when the download finished.
* `resp`: The response received from the source URL when downloading the file.
* `fd`: The file descriptor of the downloaded file on disk.
These fields provide essential information to track and manage the progress, status, and details of each download within our download manager. By utilizing this struct, we can effectively handle multiple concurrent downloads, monitor their progress, and report their status accurately.
```elixir
defmodule Download do
@enforce_keys [:id, :src, :dest, :from]
defstruct [
:id,
:name,
:src,
:dest,
:from,
:pid,
:size,
:status,
:bytes_downloaded,
:exit_status,
:error_reason,
:start_time,
:end_time,
:resp,
:fd
]
end
```
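To illustrate how `@enforce_keys` and the struct update syntax behave, here is a small sketch. It uses a hypothetical, trimmed-down `SketchDownload` struct rather than the real `Download` struct, and builds it with `struct!/2` so that the whole cell can be evaluated in one go.

```elixir
defmodule SketchDownload do
  # A reduced stand-in for the Download struct, for illustration only
  @enforce_keys [:id, :src]
  defstruct [:id, :src, :status, bytes_downloaded: 0]
end

# All enforced keys are given, the remaining fields fall back to their defaults
download = struct!(SketchDownload, id: "abc", src: "https://example.com/file.bin")

# The update syntax returns a new struct; the original is left untouched
active = %{download | status: :active, bytes_downloaded: 1024}

# Leaving out an enforced key raises an ArgumentError
missing_key_result =
  try do
    struct!(SketchDownload, src: "https://example.com/file.bin")
  rescue
    ArgumentError -> :missing_enforced_key
  end

IO.inspect(active, label: "updated")
IO.inspect(missing_key_result, label: "without :id")
```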
## Architecture
Let's look at the high-level architecture of our application.

Our application will consist of 3 important modules.
The high-level tasks of each module are as follows:
* **DownloadManager**: The DownloadManager module is a GenServer that stores and tracks the state of all download worker processes. It provides APIs to add, remove, and retrieve download information. It also periodically updates the status of downloads and handles termination messages from download workers.
* **DownloadsSupervisor**: The DownloadsSupervisor module is a dynamic supervisor that allows the dynamic spawning of DownloadWorker processes. It starts and supervises individual DownloadWorker processes to handle each download. It provides functions to start and terminate child processes.
* **DownloadWorker**: The DownloadWorker module is a GenServer that is responsible for performing the actual download task. It runs as an individual process and handles downloading chunks of data from a given source URL. It communicates with the DownloadManager module to update the download status and handle completion or failure of the download.
Overall, the DownloadManager orchestrates the download process, the DownloadsSupervisor manages the lifecycle of DownloadWorker processes, and the DownloadWorker handles the actual downloading task.
Now let's go through the implementation of each of these modules in detail.
## The Download Worker
The `DownloadWorker` module is a GenServer responsible for handling individual file downloads. Each download is performed by its own process, represented by an instance of the `DownloadWorker` GenServer.
For downloading the file we utilize the [httpoison](https://github.com/edgurgel/httpoison) library and its async download feature, which allows us to stream chunks of the downloaded file.
This happens using the `HTTPoison.get(src, %{}, stream_to: self(), async: :once)` function call. The `stream_to: self()` option enables the `DownloadWorker` process to receive each downloaded chunk via the `handle_info/2` callback.
The `async: :once` option ensures that only one chunk at a time is sent to our `DownloadWorker` process. Once we have processed a chunk, we can request the next chunk by calling `HTTPoison.stream_next/1`.
The lifecycle of the `DownloadWorker` process is as follows:
* `init/1`: In the `init` callback, we prepare the download struct, initialize the process, and kick off the download by returning `{:ok, new_download, {:continue, :kickoff}}`. This ensures that the `handle_continue/2` callback will be called immediately after the `init/1` callback.
* `handle_continue/2`: After the initialization, in the `handle_continue` callback, we open the file where the download will be saved and initiate the download using `HTTPoison.get/3`. We save the file descriptor in the download struct as the GenServer state. If any errors occur during the download, we update the download status accordingly and return `{:stop, reason, failed_download}` to stop the GenServer with the appropriate failure reason.
* `handle_info/2`: This callback is responsible for handling various events and actions during the download process. Here are the different scenarios:
  * `handle_info(%HTTPoison.AsyncHeaders{headers: headers}, download)`: Invoked when the download begins, providing headers containing metadata about the download, such as the file size. We save this information in the download struct as the GenServer state and request the first chunk using `HTTPoison.stream_next/1`.
  * `handle_info(%HTTPoison.AsyncStatus{}, download)`: Requests the next chunk of the download by calling `HTTPoison.stream_next/1`.
  * `handle_info(%HTTPoison.AsyncChunk{chunk: chunk}, download)`: Saves a received chunk by appending it to the file on disk. We then request the next chunk using `HTTPoison.stream_next/1`.
  * `handle_info(%HTTPoison.AsyncEnd{}, download)`: Triggered when the download is completed. We close the file descriptor, update the download status, and return `{:stop, :finish, finished_download}` to gracefully terminate the download worker process.
  * The remaining `handle_info/2` callbacks handle errors that occur during the download process, such as status code errors or HTTPoison errors. In such cases, we update the download status accordingly and terminate the download worker process by returning `{:stop, message, failed_download}`.
* `handle_call(:status, _from, download)`: This callback handles the `:status` call and simply returns the download struct, which represents the current state of the download. It can be used to inform the caller about the download status.
* `terminate/2`: This callback is invoked when the download worker process is about to exit. By enabling the `:trap_exit` flag in the `init/1` callback, we can trap exits and perform cleanup operations if the download process stops either due to failures or when a download finishes successfully.
In this callback we inform the caller process (identified by the `:from` property in the download struct) by sending a `{:terminating, id, download}` message. We then close the file descriptor and gracefully shut down the process.
Lastly, note that the `DownloadWorker` GenServer has the `restart: :temporary` option set, which means that any failed process won't be automatically restarted by the supervisor. In our case this is the expected behaviour since we already handle failed downloads and don't want to retry them.
```elixir
defmodule DownloadWorker do
@moduledoc """
Worker GenServer for downloading a file
"""
use GenServer, restart: :temporary
alias Download
require Logger
def start_link(args, opts \\ []), do: GenServer.start_link(__MODULE__, args, opts)
# Callbacks
@impl GenServer
def init(%Download{} = download) do
Logger.info("Start new download worker: #{download.id}")
# Makes your process call terminate/2 upon exit.
Process.flag(:trap_exit, true)
# Prepare the new download struct
new_download = %Download{
id: download.id,
src: download.src,
dest: download.dest,
from: download.from,
size: 0,
status: :initiate,
bytes_downloaded: 0,
exit_status: nil,
error_reason: nil,
start_time: nil,
end_time: nil,
resp: nil,
fd: nil
}
# Return the download struct as the GenServer state
# Also return {:continue, :kickoff} to immediately execute the handle_continue/2
# callback to kickoff the download
{:ok, new_download, {:continue, :kickoff}}
end
@impl GenServer
def handle_continue(:kickoff, %Download{src: src, dest: dest} = download) do
# Open up a file to save the download, this returns a file descriptor
{:ok, fd} = File.open(dest, [:write, :binary])
# Kick off the download
case HTTPoison.get(src, %{}, stream_to: self(), async: :once) do
{:ok, resp} ->
download = %Download{download | resp: resp, fd: fd}
{:noreply, download}
# In case of errors update the download struct and return
# {:stop, reason, state} to stop the GenServer process
{:error, %HTTPoison.Error{reason: reason}} ->
failed_download = %Download{
download
| status: :error,
error_reason: reason,
exit_status: :error
}
{:stop, reason, failed_download}
end
end
@impl GenServer
def terminate(reason, %Download{id: id, from: from} = download) do
# Inform parent process about download finish or failure.
# The parent process pid is available in the `:from` property of the download struct
Process.send(from, {:terminating, id, download}, [])
Logger.info(
"Terminate download-worker #{id}: reason=#{inspect(reason)} download=#{inspect(download)}"
)
# Close the file descriptor, if one was opened
if download.fd, do: File.close(download.fd)
# The return value of terminate/2 is ignored; the process exits with `reason`
:ok
end
@impl GenServer
def handle_call(:status, _from, download), do: {:reply, download, download}
@impl GenServer
def handle_info(%HTTPoison.AsyncStatus{code: code}, download) when code >= 400 do
message = "Failed with code: #{code}"
failed_download = %Download{
download
| status: :error,
error_reason: message,
exit_status: :error
}
{:stop, message, failed_download}
end
@impl GenServer
def handle_info(%HTTPoison.AsyncStatus{}, download) do
HTTPoison.stream_next(download.resp)
{:noreply, download}
end
@impl GenServer
def handle_info(%HTTPoison.Error{reason: reason}, download) do
message = inspect(reason)
failed_download = %Download{
download
| status: :error,
error_reason: message,
exit_status: :error
}
{:stop, message, failed_download}
end
@impl GenServer
def handle_info(%HTTPoison.AsyncHeaders{headers: headers}, download) do
# Get the "Content-Length" header from the list of headers returned in the response
content_length_header =
Enum.find(headers, fn
{"Content-Length", _length} -> true
_ -> false
end)
# Get the download file size from the "Content-Length" header
size =
case content_length_header do
{"Content-Length", length} -> length || 0
nil -> 0
end
# Ask for the next chunk of download
HTTPoison.stream_next(download.resp)
# Save the download size and other meta data in the download struct
download = %Download{download | size: size, status: :active, start_time: DateTime.utc_now()}
{:noreply, download}
end
@impl GenServer
def handle_info(%HTTPoison.AsyncChunk{chunk: chunk}, download) do
# Append the new chunk of data in the file using the file descriptor
IO.binwrite(download.fd, chunk)
HTTPoison.stream_next(download.resp)
# Update the bytes downloaded information in the download struct
bytes_downloaded = download.bytes_downloaded + byte_size(chunk)
download = %Download{download | bytes_downloaded: bytes_downloaded}
{:noreply, download}
end
@impl GenServer
def handle_info(%HTTPoison.AsyncEnd{}, download) do
File.close(download.fd)
finished_download = %Download{
download
| status: :finish,
exit_status: :normal,
end_time: DateTime.utc_now()
}
# Since the download is finished we are returning {:stop, .., ..}
# which will invoke the `terminate/2` callback
{:stop, :finish, finished_download}
end
end
```
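The `restart: :temporary` option passed to `use GenServer` ends up in the child spec that the module generates. A small sketch makes this visible; the `RestartSketch` modules are hypothetical and exist only for this comparison.

```elixir
defmodule RestartSketch.TemporaryWorker do
  use GenServer, restart: :temporary

  def start_link(args), do: GenServer.start_link(__MODULE__, args)

  @impl true
  def init(args), do: {:ok, args}
end

defmodule RestartSketch.DefaultWorker do
  use GenServer

  def start_link(args), do: GenServer.start_link(__MODULE__, args)

  @impl true
  def init(args), do: {:ok, args}
end

temporary_spec = RestartSketch.TemporaryWorker.child_spec([])
default_spec = RestartSketch.DefaultWorker.child_spec([])

# The option given to `use GenServer` is carried into the generated child spec
temporary_restart = temporary_spec.restart

# Without the option, supervisors fall back to :permanent
default_restart = Map.get(default_spec, :restart, :permanent)

IO.inspect(temporary_restart, label: "temporary worker")
IO.inspect(default_restart, label: "default worker")
```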
## The DownloadsSupervisor
The `DownloadsSupervisor` module serves as a dynamic supervisor, enabling us to dynamically spawn and manage `DownloadWorker` processes. Additionally, it provides the ability to stop any active download process as needed.
By utilizing a dynamic supervisor, we gain visibility into the active downloads by examining the child processes maintained under this supervisor. This allows us to easily track and manage the ongoing download operations.
```elixir
defmodule DownloadsSupervisor do
use DynamicSupervisor
require Logger
# Public functions that interact with the supervisor
@doc """
Start the supervisor process
"""
def start_link(_) do
DynamicSupervisor.start_link(__MODULE__, :no_args, name: __MODULE__)
end
@doc """
Start a new download under the supervisor
"""
def add(args) do
{:ok, pid} = DynamicSupervisor.start_child(__MODULE__, {DownloadWorker, args})
pid
end
@doc """
Stop an existing download process under the supervisor
"""
def remove(child_pid) do
DynamicSupervisor.terminate_child(__MODULE__, child_pid)
end
# Callbacks
@impl true
def init(:no_args) do
DynamicSupervisor.init(strategy: :one_for_one)
end
end
```
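When `remove/1` calls `DynamicSupervisor.terminate_child/2`, the supervisor sends the worker a `:shutdown` exit signal. Because the worker traps exits, its `terminate/2` callback still runs and can notify the requester. The sketch below shows this interaction in isolation; `TerminateSketch.Worker` and its message format are simplified assumptions, not the real `DownloadWorker`.

```elixir
defmodule TerminateSketch.Worker do
  use GenServer, restart: :temporary

  def start_link(notify_pid), do: GenServer.start_link(__MODULE__, notify_pid)

  @impl true
  def init(notify_pid) do
    # Trap exits so that terminate/2 runs when the supervisor shuts us down
    Process.flag(:trap_exit, true)
    {:ok, notify_pid}
  end

  @impl true
  def terminate(reason, notify_pid) do
    # Tell the requester why the worker is going away
    send(notify_pid, {:terminating, reason})
    :ok
  end
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, worker} = DynamicSupervisor.start_child(sup, {TerminateSketch.Worker, self()})

# Equivalent to what a remove/1 style function does with the worker PID
:ok = DynamicSupervisor.terminate_child(sup, worker)

reason =
  receive do
    {:terminating, reason} -> reason
  after
    1_000 -> :no_message
  end

IO.inspect(reason, label: "terminate/2 reason")
```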
## The DownloadManager
The `DownloadManager` module serves as a GenServer module that aggregates and manages the state of all active, completed and failed downloads. It offers a convenient API to interact with the underlying `DownloadsSupervisor` and `DownloadWorker` processes.
The `DownloadManager` module has public functions to perform common tasks. By exposing these functions, we can interact with the `DownloadManager` without directly making calls to the GenServer module.
The public API functions provide a clean and concise interface for managing downloads, while the internal logic handles the complexities of interacting with the supervisor and worker processes.
To provide more context and important details:
* The `DownloadManager` maintains a map of all downloads in its state, and keeps this map updated with the latest download statuses by querying the download workers periodically.
* In the `init/1` callback we schedule periodic updates to refresh the state of active downloads by sending a `:fetch_all` message to itself.
* The `DownloadManager` allows adding a new download, removing an existing download, retrieving the status of a specific download, listing all downloads, and clearing all downloaded files.
* When a download is added, the `DownloadManager` creates a new `Download` struct, starts a `DownloadWorker` process via the `DownloadsSupervisor`, and stores the download information in its state.
* The `DownloadManager` can receive a `{:terminating, id, last_child_state}` message from the `DownloadWorker` processes. This message serves as a notification to the download manager about finished downloads or download failures, enabling it to update the state accordingly. By handling this message, the DownloadManager can track the final state of a download.
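The periodic refresh described in the list above relies on a process sending itself a delayed message and rescheduling it each time the message is handled. A minimal sketch of that loop in isolation follows; `PollingSketch`, the `:refresh` message and the 50 ms interval are assumptions for illustration.

```elixir
defmodule PollingSketch do
  use GenServer

  # Short interval so the effect is visible quickly
  @interval 50

  def start_link(_opts), do: GenServer.start_link(__MODULE__, 0)
  def ticks(pid), do: GenServer.call(pid, :ticks)

  @impl true
  def init(count) do
    # Schedule the first refresh
    Process.send_after(self(), :refresh, @interval)
    {:ok, count}
  end

  @impl true
  def handle_info(:refresh, count) do
    # A real manager would query its workers here; we only count the refreshes.
    # Reschedule the next refresh to keep the loop going.
    Process.send_after(self(), :refresh, @interval)
    {:noreply, count + 1}
  end

  @impl true
  def handle_call(:ticks, _from, count), do: {:reply, count, count}
end

{:ok, poller} = PollingSketch.start_link([])
Process.sleep(300)
tick_count = PollingSketch.ticks(poller)
IO.inspect(tick_count, label: "refreshes so far")
```

Since the next message is only scheduled after the current one has been handled, refreshes never pile up if one of them takes longer than the interval.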
```elixir
defmodule DownloadManager do
@moduledoc """
GenServer which aggregates and stores state for all download worker processes
Exposes APIs to add, delete, list downloads.
Examples:
{:ok, id} = DownloadManager.add("https://file-examples-com.github.io/uploads/2017/04/file_example_MP4_1920_18MG.mp4")
DownloadManager.list()
DownloadManager.get(id)
DownloadManager.remove(id)
"""
use GenServer
require Logger
@update_interval 1000
@base_download_path "/tmp/async_elixir_temp_downloads"
# Public API
@doc "Start the download manager process"
def start_link(_opts), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
@doc "Add a new download"
def add(src), do: GenServer.call(__MODULE__, {:add, src})
@doc "Remove an existing download"
def remove(id), do: GenServer.call(__MODULE__, {:remove, id})
@doc "Get the latest information about a download"
def get(id), do: GenServer.call(__MODULE__, {:get, id})
@doc "Get the most updated list of downloads"
def list(), do: GenServer.call(__MODULE__, :list)
@doc "Clear all downloaded data"
def clear_all_downloads(), do: File.rm_rf!(@base_download_path)
# Callbacks
@impl GenServer
def init(_args) do
# Send a message to itself to begin aggregating the latest download statuses
Process.send_after(self(), :fetch_all, @update_interval)
{:ok, %{}}
end
# Callback to get the status of a given download using its download id
@impl GenServer
def handle_call({:get, id}, _from, state) do
case Map.get(state, id) do
nil ->
{
:reply,
{:error, :not_found},
state
}
download ->
{
:reply,
{:ok, download},
state
}
end
end
# Callback to add a new download
@impl GenServer
def handle_call({:add, src}, _from, state) do
id = UUID.uuid1()
File.mkdir_p!(@base_download_path)
download_destination = "#{@base_download_path}/#{id}"
download = %Download{
id: id,
src: src,
dest: download_destination,
# This `from` parameter allows the download worker to send a message back to the
# Download manager when the download finishes or in case of failures
from: self(),
name: guess_filename(src)
}
# Call the DownloadsSupervisor to start a new download worker process and initiate the download
pid = DownloadsSupervisor.add(download)
download = %Download{download | pid: pid}
# Save the download in a map that is the GenServer state
{:reply, {:ok, id}, Map.put(state, id, download)}
end
@impl GenServer
def handle_call({:remove, id}, _from, state) do
case Map.get(state, id) do
%Download{pid: pid, dest: dest} ->
# Send a message to the Download supervisor to terminate the Download Worker Process
# that is downloading the file
DownloadsSupervisor.remove(pid)
# Delete the file from disk
res = File.rm(dest)
Logger.info("Remove left over file: #{inspect(res)}")
# Remove the download from the downloads map that is the GenServer state
{:reply, {:ok, id}, Map.delete(state, id)}
_ ->
{:reply, {:error, :not_found}, state}
end
end
# Get the list of all downloads
@impl GenServer
def handle_call(:list, _from, state) do
{:reply, Map.values(state), state}
end
# Used by a Download Worker to inform the Download manager when the download ends
@impl GenServer
def handle_info({:terminating, id, last_child_state}, state) do
{_old_value, state} =
Map.get_and_update(state, id, fn
current_value when is_nil(current_value) -> :pop
current_value -> {current_value, merge_with_old_state(current_value, last_child_state)}
end)
Logger.info("Received last state from child: #{id}, new_state: #{inspect(state)}")
{:noreply, state}
end
# Every 1 second (1000ms) we will query each of the active download worker
# process to refresh the state of the running downloads
@impl GenServer
def handle_info(:fetch_all, state) do
new_state =
state
|> Enum.map(fn
{id, %Download{status: status} = download} when status in [:finish, :error, :cancel] ->
{id, download}
{id, %Download{pid: pid} = download} ->
{id, fetch_status(pid, download)}
end)
|> Enum.into(%{})
Process.send_after(self(), :fetch_all, @update_interval)
{:noreply, new_state}
end
# Private helpers
# Call the download worker process to fetch the latest status of the Download
defp fetch_status(pid, download) do
if Process.alive?(pid) do
new_download_state = GenServer.call(pid, :status)
merge_with_old_state(download, new_download_state)
else
%Download{download | status: :error, error_reason: "Killed"}
end
end
# Helper function to merge an old download struct with a new download struct
defp merge_with_old_state(old_download, new_download) do
%Download{
old_download
| size: new_download.size,
status: new_download.status,
bytes_downloaded: new_download.bytes_downloaded,
start_time: new_download.start_time,
end_time: new_download.end_time,
error_reason: new_download.error_reason
}
end
# Helper function that attempts to guess the name of the download from the download URL path
# In case of failures it assigns a UUID as the download name
defp guess_filename(url) do
path =
url
|> URI.parse()
|> Map.fetch!(:path)
if(is_nil(path), do: UUID.uuid1(), else: path |> Path.basename() |> String.trim())
end
end
```
At this point our download manager is ready.
Let's test it out! 🚀🚀🚀
## The Runner module
The `Runner` module allows us to test the functionality of our download manager and display the status of the downloads in a markdown table format. It uses the Kino library to render and update the table periodically.
The `render_downloads_list/0` function continuously updates the downloads table using the [Kino.animate/2](https://hexdocs.pm/kino/Kino.html#animate/2) function. It retrieves the latest list of downloads from the `DownloadManager` and constructs the table data by formatting the relevant fields. The markdown table is rendered using [Kino.Markdown.new/1](https://hexdocs.pm/kino/Kino.Markdown.html#new/1).
```elixir
defmodule Runner do
# 1 second (1000ms)
@refresh_rate 1000
def render_downloads_list do
# Every 1 second we refresh the downloads list
Kino.animate(@refresh_rate, fn _ ->
# Get the latest downloads list from the DownloadManager
downloads = DownloadManager.list()
unless downloads == [] do
data =
downloads
|> Enum.map(fn download ->
data =
[
download.id,
download.name,
download.status,
percentage_progresss(download),
progress(download),
get_speed(download),
download.src,
download.dest,
download.start_time,
download.end_time || "NA",
download.error_reason || "NA"
]
|> Enum.join("|")
"|" <> data <> "|"
end)
|> Enum.join("\n")
# The headers for the downloads table
headers = """
|ID|Name|Status|Percentage Completed|Progress|Speed|Source URL|Destination|Started At|Ended At|Error Reason|
|--|----|------|--------------------|--------|-----|----------|-----------|----------|--------|------------|
"""
# Render the downloads table in markdown format
Kino.Markdown.new("#{headers}#{data}")
end
end)
end
# Private helper functions
# Calculate the progress percentage of a download from the download size
defp percentage_progresss(download) do
if download.status != :initiate && to_int(download.size) != 0 do
percentage = download.bytes_downloaded / to_int(download.size) * 100
"#{trunc(percentage)}%"
else
"NA"
end
end
# Use the Sizeable library to show the size of the download in a human readable way
defp progress(download) do
if download.status != :initiate && to_int(download.size) != 0 do
"#{Sizeable.filesize(download.bytes_downloaded)} / #{Sizeable.filesize(download.size)}"
else
"NA"
end
end
# Calculate the download speed using the download start time and data downloaded
defp get_speed(download) when download.status == :active,
do: "#{get_speed_in_bytes(download) |> Sizeable.filesize()}/sec"
defp get_speed(_download), do: "NA"
defp get_speed_in_bytes(%Download{bytes_downloaded: bytes_downloaded, start_time: start_time})
when is_nil(start_time) or bytes_downloaded == 0,
do: 0
defp get_speed_in_bytes(download) do
elapsed_time = DateTime.diff(DateTime.utc_now(), download.start_time)
if elapsed_time == 0, do: 0, else: download.bytes_downloaded / elapsed_time
end
defp to_int(num) when is_binary(num) do
case Integer.parse(num) do
{num, _} -> num
:error -> 0
end
end
defp to_int(num), do: num
end
```
Now let's start our Download manager and Download supervisor.
```elixir
# Stop any existing Download manager or Download supervisor processes
if Process.whereis(DownloadsSupervisor), do: DynamicSupervisor.stop(DownloadsSupervisor)
if Process.whereis(DownloadManager), do: GenServer.stop(DownloadManager)
# Clear any previously downloaded data
DownloadManager.clear_all_downloads() |> IO.inspect()
# Start the Download manager and Download supervisor processes
{:ok, download_sup_pid} = DownloadsSupervisor.start_link(:noop)
{:ok, download_manager_pid} = DownloadManager.start_link(:noop)
```
```elixir
# Call the runner module to render the downloads list as a markdown table
Runner.render_downloads_list()
```
```elixir
# Start 4 downloads of different sizes
{:ok, first_id} = DownloadManager.add("https://speed.hetzner.de/100MB.bin")
{:ok, second_id} = DownloadManager.add("https://speed.hetzner.de/1GB.bin")
{:ok, third_id} = DownloadManager.add("https://speed.hetzner.de/10GB.bin")
# This download will fail since the download URL does not exist
{:ok, fourth_id} = DownloadManager.add("https://speed.hetzner.de/bad_file.bin")
```
Notice how the downloads in progress are being updated in real-time. It is worth noting that while one of the downloads encountered a failure, the remaining downloads continued unaffected.
This showcases the concept of process isolation, where failures in one process do not impact others. Furthermore, we receive informative notifications about the download failure, including the reason for the failure. This works because we trap exits in the `DownloadWorker` processes to update the final state of a download.
Now let's visualize the supervision tree of the `DownloadsSupervisor` dynamic supervisor. If we refresh the supervision tree as the downloads finish, we can see the worker processes being stopped and removed.
```elixir
Kino.Process.render_sup_tree(download_sup_pid)
```
Remove the fourth download, which failed. Notice how the download entry disappears from the downloads table after this code is executed.
```elixir
DownloadManager.remove(fourth_id)
```
Finally, we can trace the message flow between different processes when starting a new download. To achieve this, we use the `Kino.Process.render_seq_trace/2` function. In this case, we pass the PID of the download manager process to the function, so that only the messages sent to and from the download manager are traced.
```elixir
pids_to_trace = [download_manager_pid]
# Trace and inspect messages being sent in between the processes
Kino.Process.render_seq_trace(pids_to_trace, fn ->
{:ok, _first_id} = DownloadManager.add("https://speed.hetzner.de/100MB.bin")
# Sleep to enable catching all messages between the processes
:timer.sleep(1000)
end)
```
Congratulations on successfully building a download manager from scratch! 🎉🎉🎉
Throughout this process, we applied various concepts that we have learned in previous chapters, reinforcing our understanding of Elixir's key features.
In summary, we utilized **GenServer** to manage the state of downloads, **DynamicSupervisor** to dynamically spawn and terminate download worker processes, **trapping exits** to handle failures gracefully, **message passing** between processes to communicate and coordinate activities, and finally, we learned how to **write and organize code that utilizes multiple processes** in Elixir.
Well done! 🥳
================================================
FILE: chapters/ch_7.1_intro_to_tasks.livemd
================================================
# Introduction to Tasks
## Introduction
In the previous chapters, we explored various methods of starting processes, including `spawn/1` and `spawn_link/1`. Now, let's dive into the Task module, which provides a more convenient approach to spawning processes for performing tasks.
The Task module offers a wide range of convenience functions to effectively manage launched tasks. Unlike plain processes started with `spawn/1`, tasks provide additional capabilities such as monitoring metadata and error logging.
With the Task module, we gain access to many functions tailored to common use cases. We can easily await the completion of spawned tasks, launch supervised tasks, and execute multiple tasks concurrently. The abstraction and convenience functions provided by the Task module make working with processes a breeze, eliminating the need to delve into low-level details.
## Basic usage
Let's explore some basic examples of launching tasks:
```elixir
Task.start(fn -> IO.puts("Hello from the first task!") end)
# Same as Task.start/1, but accepts a module, function, and arguments instead.
Task.start(IO, :puts, ["Hello from the second task!"])
```
As you can see, launching a task is very similar to spawning a process using `spawn/1`. In this case, the process spawned by `Task.start/1` is not linked to the caller process. It is primarily used for performing side effects where we don't need to wait for the result or handle failures.
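To make "not linked to the caller" concrete, here is a minimal sketch that starts a task and then inspects the caller's links. The sleep duration and the variable names are only illustrative.

```elixir
# Task.start/1 returns {:ok, pid} rather than a Task struct
{:ok, pid} = Task.start(fn -> Process.sleep(200) end)

# Process.info/2 returns {:links, list_of_linked_pids_and_ports}
{:links, links} = Process.info(self(), :links)

# The task is still running, but it is not among the caller's links
task_alive? = Process.alive?(pid)
linked_to_caller? = pid in links

IO.inspect({task_alive?, linked_to_caller?}, label: "alive?, linked?")
```

Because there is no link, a crash inside this task would be logged but would not take the caller down.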
## The Task struct
Under the hood, a task in Elixir is essentially a regular Elixir process. When we spawn a task using one of the functions provided by the Task module, we receive a `Task` struct in return. This struct contains additional information about the task, and it can be utilized with various functions from both the Task and Task.Supervisor modules (which we will explore in greater detail in the upcoming chapters).
Now, let's take a closer look at the information encapsulated within the Task struct. To do this, we will start a task using the `Task.async/1` function and analyze the resulting struct.
```elixir
Task.async(fn -> :empty_task end)
```
We get back a struct like this:
```elixir
%Task{
mfa: {module, function, arity},
owner: owner_pid,
pid: task_process_pid,
ref: #Reference
}
```
* `:mfa` - a three-element tuple containing the module, function name, and arity invoked to start the task in async/1 and async/3
* `:owner` - the PID of the process that started the task
* `:pid` - the PID of the task process; nil if there is no process specifically assigned for the task
* `:ref` - an opaque term used as the task monitor reference
The `ref` field in the Task struct is a monitor reference. When a task is spawned, the caller process monitors the task process using this reference. This monitoring enables the caller process to receive a `{:DOWN, ref, :process, pid, reason}` message when the task exits. The monitor reference is particularly useful when awaiting tasks to [receive exit messages in case of crashes](https://github.com/elixir-lang/elixir/blob/7f7a8bca99fa306a41a985df0018ba642e577d4d/lib/elixir/lib/task.ex#L841).
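The role of the reference can be sketched by reading the task's messages by hand. This is for illustration only and deliberately bypasses the await functions that real code would normally use; the timeouts and variable names are assumptions of this sketch.

```elixir
task = Task.async(fn -> :done end)
ref = task.ref

# The reply arrives as a message tagged with the task's reference
reply =
  receive do
    {^ref, value} -> value
  after
    1_000 -> :no_reply
  end

# Because the caller monitors the task, a :DOWN message carrying the same
# reference follows once the task process exits
down_reason =
  receive do
    {:DOWN, ^ref, :process, _pid, reason} -> reason
  after
    1_000 -> :no_down_message
  end

IO.inspect({reply, down_reason}, label: "reply and exit reason")
```

Matching on the pinned `^ref` is what lets the caller tell this task's messages apart from anything else sitting in its mailbox.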
In the upcoming chapters, we will dive deeper into the capabilities of tasks. We will explore how to await task completion, supervise tasks, and uncover many more exciting features. Stay tuned!
================================================
FILE: chapters/ch_7.2_awaiting_tasks.livemd
================================================
# Awaiting Tasks
```elixir
Mix.install([])
```
## Introduction
So far we have seen examples where we spawn a process to do something concurrently. However, sometimes we need the value returned by that process. In such situations we have to wait for the process to complete and then fetch the result.
The Task module provides the `async` and `await` functions to handle this common use case. With `Task.async/1`, a new process is created, **linked, and monitored** by the caller. Once the task action finishes, a message containing the result is sent to the caller. The `Task.await/2` function is then used to read this message and obtain the result.
Here are some key points to note about `async` and `await`:
* Async tasks establish a link between the caller and the spawned process. If either the caller or the task crashes, the other process will crash as well. This intentional linkage ensures that the computation is not carried out if the process meant to receive the result no longer exists.
* When using async tasks, it is important to await a reply as they are always sent. If you don't expect a reply but still want to launch a linked process, consider using `Task.start_link/1` instead.
Let's look at some code to understand this better...
```elixir
my_task =
Task.async(fn ->
# Sleep for 2 seconds
:timer.sleep(2000)
IO.puts("Done sleeping")
DateTime.utc_now()
end)
|> IO.inspect()
# Notice how the above task process is linked to the caller process
Process.info(self(), :links)
|> IO.inspect(label: "Parent process #{inspect(self())} links")
Task.await(my_task) |> IO.inspect(label: "Task returned")
```
Let's do this again, but check the process mailbox to see the message returned by the spawned task process.
```elixir
my_task = Task.async(fn -> "return value" end)
IO.inspect(my_task)
# Wait for process to complete
:timer.sleep(100)
:erlang.process_info(self(), :messages) |> IO.inspect(label: "Messages in mailbox")
Task.await(my_task)
```
Notice how the spawned task process sends a message back to the caller in the format `{ref, reply}`. The `Task.await/1` call basically [awaits this message](https://github.com/elixir-lang/elixir/blob/9fd85b06dcb74217108cd0bdf4164b6cd7f9e667/lib/elixir/lib/task.ex#L827) in a receive block like so...
```elixir
receive do
# The reply message from the task
{^ref, reply} ->
# Stop monitoring the task: since the task has sent a reply it must have completed successfully, so we no longer need to monitor it for crashes
demonitor(ref)
reply
# This is the message received from the task monitor, if this happens it means we received the :DOWN message without getting the reply message first, which means the task crashed
{:DOWN, ^ref, _, proc, reason} ->
# Exit the linked caller process that is awaiting since the task process crashed
exit({reason(reason, proc), {__MODULE__, :await, [task, timeout]}})
# more code
end
```
The other message returned is `{:DOWN, ref, :process, pid, reason}` - since all tasks are also monitored, you will also receive the `:DOWN` message delivered by `Process.monitor/1`. If you receive the :DOWN message without getting the reply message, it means the task crashed.
At any point we can ignore a linked task by calling `Task.ignore/1`. The task will continue running, but it will be unlinked and we can no longer yield, await or shut it down. This also means that if the task fails, the owner process will be unaffected. Let's look at an example...
```elixir
time_bomb_task =
Task.async(fn ->
:timer.sleep(2000)
raise "BOOOOM!"
end)
|> IO.inspect()
IO.inspect(Process.info(self(), :links), label: "Parent process #{inspect(self())} links")
# Unlink the spawned task
Task.ignore(time_bomb_task)
IO.inspect(Process.info(self(), :links), label: "Parent process #{inspect(self())} links")
:timer.sleep(2100)
IO.puts("Parent process survived!")
```
Let's see another example where we launch 3 tasks using the `Task.async/3` function, which takes an MFA (module, function, arguments). Each task generates a random number.
We then await their results and return the sum of the random numbers.
```elixir
my_task1 = Task.async(Enum, :random, [0..10])
my_task2 = Task.async(Enum, :random, [10..20])
my_task3 = Task.async(Enum, :random, [20..30])
Task.await(my_task1) + Task.await(my_task2) + Task.await(my_task3)
```
In cases where we need to await multiple tasks, the Task module provides a better approach: `Task.await_many/1`, which awaits replies from multiple tasks and returns them as a list.
For example, we could rewrite the above example like so:
```elixir
my_task1 = Task.async(Enum, :random, [0..10])
my_task2 = Task.async(Enum, :random, [10..20])
my_task3 = Task.async(Enum, :random, [20..30])
results = Task.await_many([my_task1, my_task2, my_task3])
IO.inspect(results, label: "Results from await_many")
Enum.sum(results)
```
Some important points to note about `Task.await_many/1` are...
* If any of the task processes dies, the caller process will exit with the same reason as that task.
* It returns a list of the results, in the **same order** as the tasks supplied in the tasks input argument.
* A timeout, in milliseconds or :infinity, can be given with a default value of 5000. If the timeout is exceeded, then the caller process will exit. Any task processes that are linked to the caller process (which is the case when a task is started with async) will also exit. Any task processes that are trapping exits or not linked to the caller process will continue to run.
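As a small sketch of the ordering and timeout points above, the tasks below finish at different times, yet the results come back in the order the tasks were passed in. The 2 second timeout and the squaring function are arbitrary choices for this example.

```elixir
tasks =
  for n <- 1..3 do
    Task.async(fn ->
      # Each task sleeps a little longer than the previous one
      Process.sleep(n * 100)
      n * n
    end)
  end

# Wait at most 2 seconds for all of the tasks together
squares = Task.await_many(tasks, 2_000)
IO.inspect(squares, label: "Squares, in task order")
```

If any task had needed longer than the timeout, the caller would have exited instead of getting a partial list.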
## Task await timeouts
When calling `Task.await/1`, the default timeout is 5 seconds, after which the caller process will exit. If the task process is linked to the caller process, which is the case when a task is started with `async`, then the task process will also exit. If the task process is trapping exits or is not linked to the caller process, then it will continue to run.
Let's look at an example...
```elixir
dont_await_me = Task.async(fn -> :timer.sleep(:infinity) end)
Task.await(dont_await_me)
```
The `Task.await/1` function can only be called once for any given task.
If we want to check if a task has completed or not and not risk the caller process exiting we must use `Task.yield/2`.
## Yielding tasks
Sometimes we only wish to check whether a task has completed within a given timeout and, if not, let the caller process continue. Unlike `Task.await/1`, where the caller process exits on timeout, with `Task.yield/2` the caller process continues to run if the task has not completed within the timeout. Therefore `Task.yield/2` can be called multiple times on the same task.
Just like await the yield function will also block the caller process until the task completes or the timeout is reached.
These are the different scenarios when calling `Task.yield/1`
* When the task process finishes within the yield timeout - Returns `{:ok, result}` where `result` is the value returned by the task.
* When the task process does not reply within the yield timeout - Returns `nil`. This can happen if the timeout expires OR if the message from the task has already been consumed by the caller.
* When the task process has already exited - Returns `{:exit, reason}`. Since a crashing linked task normally takes the caller down with it, this is typically only observed if the task is not linked to the calling process or the caller is trapping exits.
Now let's look at some code...
```elixir
heavy_task =
Task.async(fn ->
:timer.sleep(5000)
:finished_heavy_task
end)
Task.yield(heavy_task, 1000) |> IO.inspect(label: "after 1 second")
Task.yield(heavy_task, 1000) |> IO.inspect(label: "after 2 seconds")
Task.yield(heavy_task, 1000) |> IO.inspect(label: "after 3 seconds")
Task.yield(heavy_task, 1000) |> IO.inspect(label: "after 4 seconds")
:timer.sleep(1500)
:erlang.process_info(self(), :messages) |> IO.inspect(label: "Messages in mailbox")
Task.yield(heavy_task, 1000) |> IO.inspect(label: "After task finished")
:erlang.process_info(self(), :messages) |> IO.inspect(label: "Messages in mailbox")
Task.yield(heavy_task, 1000) |> IO.inspect(label: "After message from task was consumed")
```
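A common way to combine yielding with cleanup is to shut the task down when it has not replied in time. The following is a minimal sketch of that pattern; the 200ms budget and the `{:error, ...}` tuples are choices made for this example, not something the Task module prescribes.

```elixir
slow_task =
  Task.async(fn ->
    Process.sleep(5_000)
    :too_slow
  end)

# Task.yield/2 returns nil on timeout, so `||` falls through to Task.shutdown/1,
# which stops the task and returns nil unless a reply arrived in the meantime
result =
  case Task.yield(slow_task, 200) || Task.shutdown(slow_task) do
    {:ok, value} -> {:ok, value}
    {:exit, reason} -> {:error, reason}
    nil -> {:error, :timeout}
  end

IO.inspect(result, label: "Result")
IO.inspect(Process.alive?(slow_task.pid), label: "Task still alive?")
```

Unlike `Task.await/2`, the caller survives the timeout here and decides for itself what a missing reply means.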
Similar to `Task.await_many/2` we also have `Task.yield_many/2`
This function receives a list of tasks and waits for their replies in the given time interval. It returns a list of two-element tuples, with the task as the first element and the yielded result as the second. The tasks in the returned list will be in the same order as the tasks supplied in the tasks input argument.
Similarly to `yield/2`, each task's result will be:

* `{:ok, term}` if the task has successfully reported its result back in the given time interval
* `{:exit, reason}` if the task has died
* `nil` if the task keeps running past the timeout
```elixir
tasks =
for i <- 1..10 do
Task.async(fn ->
Process.sleep(i * 1000)
i
end)
end
tasks_with_results = Task.yield_many(tasks)
results =
Enum.map(tasks_with_results, fn {task, res} ->
# Shut down the tasks that did not reply or exit
res || Task.shutdown(task, :brutal_kill)
end)
# Here we are matching only on {:ok, value} and
# ignoring {:exit, _} (crashed tasks) and `nil` (no replies)
for {:ok, value} <- results do
IO.inspect(value)
end
```
In the example above, we create tasks that sleep from 1 up to 10 seconds and return the number of seconds they slept for. If you execute the code all at once, you should see 1 up to 4 printed, as those were the tasks that have replied in the default timeout (5 seconds) of `Task.yield_many/1`. All other tasks will have been shut down using the `Task.shutdown/2` call.
As a convenience, you can achieve a similar behaviour to above by specifying the `:on_timeout` option to be `:kill_task` (or `:ignore`).
For example to kill all tasks which do not yield within 7 seconds we can write
`Task.yield_many(tasks_list, timeout: 7000, on_timeout: :kill_task)` (this option is available from elixir 1.15.0+)
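Here is a shortened sketch of that option in use. As noted above, it assumes Elixir 1.15.0 or later; the sleep durations and the 750ms timeout are arbitrary values chosen so that only the first two tasks finish in time.

```elixir
tasks =
  for i <- 1..4 do
    Task.async(fn ->
      Process.sleep(i * 300)
      i
    end)
  end

# Tasks that have not replied within 750ms are killed for us
results = Task.yield_many(tasks, timeout: 750, on_timeout: :kill_task)

# Keep only the values of the tasks that replied in time
finished = for {_task, {:ok, value}} <- results, do: value
IO.inspect(finished, label: "Finished in time")

any_alive? = Enum.any?(tasks, fn task -> Process.alive?(task.pid) end)
IO.inspect(any_alive?, label: "Any task still alive?")
```

This removes the need for the manual `Task.shutdown/2` pass over the results.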
## References
* https://hexdocs.pm/elixir/1.12/Task.html#content
================================================
FILE: chapters/ch_7.3_task_async_stream.livemd
================================================
# Task.async_stream/3
```elixir
Mix.install([
{:httpoison, "~> 2.1"},
{:jason, "1.4.0"},
{:nimble_csv, "~> 1.2"}
])
```
## Introduction
[Streams](https://hexdocs.pm/elixir/1.12/Stream.html) are a valuable feature in Elixir that allow for lazy emission of elements. Any [enumerable](https://hexdocs.pm/elixir/1.12/Enumerable.html) that generates elements one by one during enumeration is considered a stream. Streams are particularly useful when dealing with large datasets that could consume excessive memory if loaded all at once. With streams, we can manage data lazily, processing elements as needed. To learn more about Stream module check out the documentation [here](https://hexdocs.pm/elixir/1.15.0/Stream.html).
Now that we have a basic understanding of streams, let's explore the exciting `Task.async_stream/3` function. This function enables **concurrent processing of each element in an enumerable**, unlocking significant potential for parallel execution.
Since `Task.async_stream/3` works on enumerables, it accepts both eager collections (such as lists and ranges) and lazy streams.
Let's look at an example...
```elixir
# A function to get a Chuck Norris joke by calling an API
get_chuknorris_joke = fn ->
HTTPoison.get!("https://api.chucknorris.io/jokes/random")
|> Map.get(:body)
|> Jason.decode!()
|> Map.get("value")
end
```
Let's see how much time it takes to make 10 API calls one by one
```elixir
Enum.map(1..10, fn _ -> get_chuknorris_joke.() end)
```
Now let's try the same using `Task.async_stream/3`
```elixir
1..10
|> Task.async_stream(fn _ -> get_chuknorris_joke.() end)
|> Enum.to_list()
```
Observe the significant improvement in the function's execution speed this time. This is because `Task.async_stream/3` launched a separate process to handle each item in the enumerable. In our case, it spawned a separate process for each API call.
It's important to note that `Task.async_stream/3` returns a stream, which is lazy and won't execute until we consume it. A common way to consume a stream is by using one of the `Enum` functions, such as `Enum.to_list/1` in this case, or by invoking `Stream.run/1`.
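The laziness can be observed with a small sketch that needs no network access. Each task reports back to the caller when it runs, so we can check that nothing has run before the stream is consumed. The `{:ran, n}` message and the 100ms wait are assumptions made for this illustration.

```elixir
parent = self()

# Building the stream does not start any task yet
stream =
  Task.async_stream(1..3, fn n ->
    send(parent, {:ran, n})
    n * 10
  end)

# No {:ran, _} message shows up while the stream is unconsumed
ran_before? =
  receive do
    {:ran, _} -> true
  after
    100 -> false
  end

# Consuming the stream starts the tasks and collects their results
results = Enum.to_list(stream)
IO.inspect({ran_before?, results}, label: "ran before consumption?, results")
```

Note that each element comes back wrapped as `{:ok, value}`.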
`Task.async_stream/3` also provides various options to customize its behavior. One such option is `:max_concurrency`, which allows us to control the number of tasks running simultaneously. By default, it is set to `System.schedulers_online/0`.
Another consideration is the ordering of results from `Task.async_stream/3`. By default, Elixir buffers the results to emit them in the original order, as the spawned processes may finish in random order. However, setting the `:ordered` option to `false` removes the need for buffering at the expense of removing ordering.
For a complete list of options, refer to the [documentation](https://hexdocs.pm/elixir/1.12/Task.html#async_stream/3).
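The two options discussed above can be tried out without making any HTTP calls. In this sketch the doubling function and the 100ms sleep stand in for real work, and the concurrency limit of 3 is an arbitrary choice.

```elixir
results =
  1..6
  |> Task.async_stream(
    fn n ->
      Process.sleep(100)
      n * 2
    end,
    # At most 3 tasks run at the same time
    max_concurrency: 3,
    # Emit results as they finish instead of buffering to preserve order
    ordered: false
  )
  |> Enum.map(fn {:ok, value} -> value end)

# With ordered: false the emission order is not guaranteed, so sort before comparing
IO.inspect(Enum.sort(results), label: "Sorted results")
```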
## A practical example
Now, let's explore another practical example of using `Task.async_stream/3`. In this example, we'll read a CSV file containing the top 100 websites.
The CSV file has data in the following format...
```csv
1,"fonts.googleapis.com",10
2,"facebook.com",10
3,"twitter.com",10
4,"google.com",10
5,"youtube.com",10
...
```
Our goal is to check the reachability of each website by sending an HTTP request to it.
```elixir
"#{Path.absname(__DIR__)}/sample_data/top_websites.csv"
|> File.stream!()
|> NimbleCSV.RFC4180.parse_stream()
# Map out the website information from every row in the csv file
|> Stream.map(fn [_, website, _] -> website end)
|> Task.async_stream(&HTTPoison.get/1, timeout: :infinity, ordered: false, max_concurrency: 4)
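# Keep only the reachable websites. Each element is `{:ok, task_result}`, where
# `task_result` is the `{:ok, response}` or `{:error, reason}` returned by `HTTPoison.get/1`
|> Stream.filter(fn
{:ok, {:ok, _response}} -> true
_ -> false
end)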
|> Enum.count()
|> IO.inspect(label: "Reachable website count")
```
Here's a breakdown of the code:
First, we use `File.stream!/1` to read the CSV file. This function provides a stream that allows us to access the file lazily, avoiding the need to load the entire file into memory.
Next, we parse the file using the [parse_stream/1](https://hexdocs.pm/nimble_csv/NimbleCSV.html#c:parse_stream/2) function from the [nimble_csv](https://github.com/dashbitco/nimble_csv) library. This gives us a parsed stream of the CSV data.
We then leverage `Task.async_stream/3` to make a GET request to each website concurrently. Since the order of the responses doesn't matter, we specify `ordered: false`. Additionally, we limit the concurrency to 4 requests at a time using the `max_concurrency: 4` option.
Finally, we keep only the reachable websites and count them by consuming the stream with the `Enum.count/1` function.
By using `Task.async_stream/3` Elixir enables us to perform concurrent data processing with just a few lines of code. This simplicity and power of concurrent programming in Elixir is truly amazing 🚀
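The example above uses `timeout: :infinity`, so a single unresponsive website could stall the pipeline. One possible variation, sketched here with sleeps standing in for HTTP requests, is to set a finite `:timeout` together with the `:on_timeout` option of `Task.async_stream/3`. The durations are arbitrary values for this illustration.

```elixir
results =
  [50, 2_000, 50]
  |> Task.async_stream(
    fn ms ->
      # Stand-in for a request that takes `ms` milliseconds
      Process.sleep(ms)
      ms
    end,
    timeout: 500,
    # Kill a task that exceeds the timeout instead of exiting the caller
    on_timeout: :kill_task
  )
  |> Enum.to_list()

IO.inspect(results, label: "Results")
```

The slow element shows up as `{:exit, :timeout}` while the others still report `{:ok, value}`, so a filter on `{:ok, _}` would skip it.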
## References
* https://hexdocs.pm/elixir/1.12/Task.html#async_stream/3
================================================
FILE: chapters/ch_7.4_supervised_tasks.livemd
================================================
# Supervised Tasks
```elixir
Mix.install([
{:httpoison, "~> 2.1"},
{:kino, "~> 0.9.0"}
])
```
## Supervising tasks
In the previous chapters, we have learned how to start tasks and await their completion. These tasks are typically linked to the caller process and are not supervised.
Supervisors, as we have previously discovered, offer valuable control and visibility over processes.
By supervising our tasks, we can enhance our visibility and control over them. We can unlink tasks from the caller process to prevent cascading failures, and still have the ability to await the task's completion, among other benefits.
In Elixir, we are provided with the [Task.Supervisor](https://hexdocs.pm/elixir/1.15.2/Task.Supervisor.html#content) module, which allows us to **dynamically supervise tasks** easily and oversee the tasks we spawn.
Let's look at some code to understand how this works...
We first start a `Task.Supervisor` as a part of our supervision tree, here we name it `AwesomeTaskSupervisor`. We can pass other options in the child specifications like `:max_restarts`, `:max_seconds`, etc. All the options are documented [here](https://hexdocs.pm/elixir/1.15.2/Task.Supervisor.html#start_link/1) in the `Task.Supervisor.start_link/1` docs.
```elixir
children = [{Task.Supervisor, name: AwesomeTaskSupervisor}]
Supervisor.start_link(children, strategy: :one_for_one)
```
We can now start a supervised Task under our supervisor like so...
```elixir
task =
Task.Supervisor.async(
AwesomeTaskSupervisor,
fn ->
:timer.sleep(200)
IO.puts("Hello from a supervised task!")
:returned_from_task
end
)
|> IO.inspect(label: "TASK")
Process.info(self(), :links)
|> IO.inspect(label: "Parent process #{inspect(self())} links")
# Wait for the spawned task to finish
:timer.sleep(300)
# Notice the reply message and the "DOWN" message from the completed task
# The "DOWN" message is because of the monitor of the task
IO.inspect(Process.info(self(), :messages))
```
Note that in the above example the spawned task was **still linked to the caller process** and was monitored by the caller process. This means that if the task process crashed it would bring down the caller process as well.
In the above example we use a single task supervisor process to launch multiple tasks. This can quickly become a bottleneck in scenarios where lots of tasks are being spawned and the supervisor process is not able to keep up.
In such cases we can use a partition supervisor, which by default starts one instance of the given child (here a `Task.Supervisor`) per scheduler, which usually means one per core of our machine. Each task is then routed to one of these instances based on a routing key.
For this we need to define the child spec of the supervisor like so:
```elixir
# In the `child_spec` option we pass the name of the supervisor that we want to partition
children = [{PartitionSupervisor, child_spec: Task.Supervisor, name: ScalableTaskSupervisor}]
{:ok, supervisor_pid} = Supervisor.start_link(children, strategy: :one_for_one)
```
We can then start tasks via the partitioned task supervisor using the `{:via, PartitionSupervisor, {name, key}}` format, where `name` is the name of the partition supervisor and `key` is the routing key.
```elixir
Task.Supervisor.async(
{:via, PartitionSupervisor, {ScalableTaskSupervisor, :my_routing_key}},
fn ->
:timer.sleep(200)
IO.puts("Hello from a supervised task!")
end
)
Kino.Process.render_sup_tree(supervisor_pid)
```
You'll observe the initiation of multiple task supervisors, each associated with a specific partition. These supervisors operate under our "ScalableTaskSupervisor" partition supervisor, and our task was launched within one of these partitions.
Let's now explore the behavior when a supervised task encounters an unexpected crash.
```elixir
# Now let's spawn a supervised task process that will crash
Task.Supervisor.start_child(
AwesomeTaskSupervisor,
fn ->
IO.puts("A task was started!")
raise "boom"
end,
# restart process if the exit is abnormal, by default it is :temporary
restart: :transient
)
|> IO.inspect(label: "Returned from call to start_child")
```
However, it's important to note that when using the `Task.Supervisor.start_child/2` function, we **do not receive a task struct** that can be directly awaited or used with other functions in the `Task` module. Instead, we receive the PID of the task process.
When using `Task.Supervisor.start_child/2`, the default `:restart` option is `:temporary`, so if a task process crashes it will not be automatically restarted by the supervisor. In the example above we set it to `:transient` so that the supervisor restarts the crashed task.
It's worth mentioning that when using other functions in the `Task.Supervisor` module, such as `async/3`, `async_nolink/3`, etc, the spawned task processes also have a `:temporary` restart strategy, which **cannot be changed**. This means that **if a task process crashes, it will not be automatically restarted by the supervisor**.
## Unlinked supervised tasks
To prevent linking a task with the caller process, we can utilize functions like `async_nolink/3`, `async_stream_nolink/4`, and others provided by the `Task.Supervisor` module.
By using these functions, the spawned tasks are not linked to the caller process. However, they are still monitored by the caller process. In the event of a task crash, the caller process remains unaffected and can still be informed about the crashed task through monitoring. Additionally, **the unlinked caller process retains the ability to await the completion of the task**.
In other words, when the caller process should survive an abnormal exit of a spawned task while still being able to await it, use functions such as `async_nolink/3` from the `Task.Supervisor` module.
To gain a better understanding of this concept, let's explore some examples using our `AwesomeTaskSupervisor` that we previously started.
```elixir
unlinked_task =
Task.Supervisor.async_nolink(AwesomeTaskSupervisor, fn ->
:timer.sleep(300)
1 + 2
end)
|> IO.inspect(label: "Unlinked Task")
supervisor_pid = Process.whereis(AwesomeTaskSupervisor)
# Spawned task process is not linked to the caller
Process.info(self(), :links)
|> IO.inspect(label: "Parent process #{inspect(self())} links")
# Spawned task process is linked to the supervisor
Process.info(supervisor_pid, :links)
|> IO.inspect(label: "AwesomeTaskSupervisor process #{inspect(supervisor_pid)} links")
# Spawned task process appears as a child under the supervisor
Task.Supervisor.children(AwesomeTaskSupervisor)
|> IO.inspect(label: "AwesomeTaskSupervisor Children")
# We can still await the unlinked task
Task.await(unlinked_task)
|> IO.inspect(label: "Result from task")
```
Note this function requires the task supervisor to have `:temporary` as the `:restart` option (the default that cannot be changed), as `async_nolink/3` keeps a direct reference to the task which is lost if the task is restarted.
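To see what "the caller remains unaffected" looks like when the task actually crashes, here is a small sketch. It starts its own unnamed supervisor so it can run on its own, and it uses `Task.yield/2` rather than `Task.await/2`, since `await` exits the caller when the task has died.

```elixir
# Throwaway supervisor for this sketch (the chapter itself uses AwesomeTaskSupervisor)
{:ok, sup} = Task.Supervisor.start_link()

crashing_task =
  Task.Supervisor.async_nolink(sup, fn ->
    raise "boom"
  end)

# Task.yield/2 returns {:exit, reason} for a crashed task
# instead of taking the caller down with it
outcome = Task.yield(crashing_task, 1_000)

case outcome do
  {:exit, {%RuntimeError{message: message}, _stacktrace}} ->
    IO.puts("Task crashed with: #{message}")

  other ->
    IO.inspect(other, label: "Unexpected outcome")
end

# The caller is unaffected by the crash
IO.inspect(Process.alive?(self()), label: "Caller still alive")
```

The crash is still logged by the task process, so expect an error report in the output alongside the lines printed above.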
### Task.Supervisor.async_nolink/3 vs Task.Supervisor.start_child/3
* Use `async_nolink/3` when you need to await or yield the task's result. The spawned task process is not linked to the caller process, but it is monitored. This allows the caller process to receive a message when the task exits and still await its completion.
* Use `start_child/3` when you want a fire-and-forget approach. The spawned task process is not linked to the caller process, and there is no monitoring or message communication. This is suitable for scenarios where you don't need the task's result or if it performs side-effects (like I/O) without the need for result handling.
The choice between `start_child` and `async_nolink` conveys the semantic intention. `start_child` indicates that you don't care about the result, while `async_nolink` implies that you may have an interest in the task's life and result, as the monitor will provide information about its termination.
## Usage with OTP behaviours
When using `async_nolink` to create a task within an OTP behavior like `GenServer`, it's important to match on the message received from the task in your `GenServer.handle_info/2` callback.
The reply sent by the task will be in the format `{ref, result}`, where `ref` is the monitor reference held by the task struct and `result` is the return value of the task function.
Regardless of how the task created with `async_nolink` terminates, the caller's process will always receive a `:DOWN` message with the same `ref` value held by the task struct. If the task terminates normally, the reason in the `:DOWN` message will be `:normal`.
Typically, `async_nolink/3` is used when there is a possibility of the task failing, and you want to prevent it from causing the caller process to crash.
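The two messages described above can be observed directly with a plain `receive`, outside of any GenServer. This is only a sketch for inspecting the message shapes. In real code you would normally rely on `Task.await/2`, `Task.yield/2`, or `handle_info/2` instead of receiving by hand.

```elixir
# Throwaway supervisor so the sketch runs on its own
{:ok, sup} = Task.Supervisor.start_link()

task = Task.Supervisor.async_nolink(sup, fn -> :done end)
ref = task.ref

# 1. The reply message, tagged with the reference held by the task struct
reply =
  receive do
    {^ref, result} -> result
  after
    1_000 -> :no_reply
  end

# 2. The :DOWN message carrying the same reference
down_reason =
  receive do
    {:DOWN, ^ref, :process, _pid, reason} -> reason
  after
    1_000 -> :no_down
  end

IO.inspect({reply, down_reason}, label: "Reply and :DOWN reason")
```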
Let's consider an example where we create a GenServer called `MyDownloader` that allows us to spawn tasks for downloading files. The GenServer process doesn't get blocked during the download; instead, it delegates the downloading task to a separate task process. The GenServer saves the spawned task reference in a `MapSet` in its state and logs when a downloading task succeeds or crashes.
```elixir
defmodule MyDownloader do
use GenServer
# == Public API ==
def start_link() do
GenServer.start_link(__MODULE__, :noop, name: __MODULE__)
end
def start_download(url) do
GenServer.call(__MODULE__, {:start_download, url})
:ok
end
def download(url) do
# We are using get! so that in case of failures the task will crash
HTTPoison.get!(url)
end
# == GenServer callbacks ==
@impl true
def init(_) do
{:ok, MapSet.new()}
end
@impl true
def handle_call({:start_download, url}, _from, state) do
task = Task.Supervisor.async_nolink(AwesomeTaskSupervisor, __MODULE__, :download, [url])
# Save the spawned task reference in a mapset in the GenServer state
state = MapSet.put(state, task.ref)
{:reply, :ok, state}
end
# The task completed successfully
@impl true
def handle_info({ref, response}, state) do
# We don't care about the DOWN message now, so let's demonitor and flush it
# Flushing will remove the {_, MonitorRef, _, _, _} message, if there is one,
# from the caller message queue after monitoring has been stopped.
Process.demonitor(ref, [:flush])
IO.inspect(response.body, label: "Task #{inspect(ref)} completed with result")
# Remove finished tasks from the mapset
state = MapSet.delete(state, ref)
{:noreply, state}
end
# The task failed, that is we received a :DOWN message before the task reply message
@impl true
def handle_info({:DOWN, ref, :process, _pid, reason}, state) do
IO.inspect(elem(reason, 0), label: "Task #{inspect(ref)} failed with reason")
# Remove failed tasks from the mapset
state = MapSet.delete(state, ref)
{:noreply, state}
end
end
```
Now let's take it for a spin...
```elixir
MyDownloader.start_link()
```
```elixir
MyDownloader.start_download("https://api.chucknorris.io/jokes/random")
MyDownloader.start_download(
"https://file-examples.com/storage/fede3f30f864a1f979d2bf0/2017/10/file_example_JPG_100kB.jpg"
)
# Invalid download url, the spawned task to request this url will crash
MyDownloader.start_download("https://invalid-file-download.invalid")
```
In this example, all three download tasks run concurrently. Two of the task processes succeed and send a reply message to the GenServer process, which is handled by the `handle_info({ref, response}, state)` callback. The third task process crashes, but the GenServer keeps running because the tasks are not linked to it.
Thanks to the task monitor, the GenServer still receives the `:DOWN` message in the `handle_info({:DOWN, ref, :process, _pid, reason}, state)` callback and learns about the crashed task.
That concludes our exploration of tasks in Elixir. By now, you should have gained a solid understanding of the power and convenience that tasks offer. The `Task` module is widely used in practice and provides a straightforward and efficient way to manage concurrent work in Elixir. It is yet another tool in the Elixir ecosystem that empowers us to embrace concurrent and parallel programming with ease, unlocking the full potential of the language.
Happy tasking! ✨🚀
## References
* https://hexdocs.pm/elixir/1.15.2/Task.Supervisor.html
* https://groups.google.com/g/elixir-lang-talk/c/gSK36qc7EpE?pli=1
================================================
FILE: chapters/ch_8.0_agents.livemd
================================================
# Agents
## Introduction
Elixir offers several ways to store state: the process dictionary for process-local access, GenServers, [ETS tables](https://elixirschool.com/en/lessons/storage/ets), and more. One of the simplest is the Agent. Agents provide a simple abstraction for keeping state in a process, with a small API for reading and updating it. Internally, Agents are implemented as GenServer processes.
## Usage
Using an agent is incredibly straightforward. Let's take a look at a simple example:
```elixir
{:ok, my_counter_pid} = Agent.start_link(fn -> 0 end)
# Retrieve the stored state
Agent.get(my_counter_pid, fn state -> state end)
|> IO.inspect(label: "GET")
# Update the state asynchronously, implemented as a GenServer.cast/2
# The caller process will not wait for the operation to complete
Agent.cast(my_counter_pid, fn state -> state + 1 end)
|> IO.inspect(label: "CAST")
# Update the Agent state
# Implemented as a GenServer.call/2, so the caller process waits until the agent process replies
Agent.update(my_counter_pid, fn state -> state + 1 end)
|> IO.inspect(label: "UPDATE")
# Get and update the state in a single call
# Implemented as a GenServer.call/2
Agent.get_and_update(my_counter_pid, fn state -> {state, state + 1} end)
|> IO.inspect(label: "get_and_update")
# Retrieve the stored state
Agent.get(my_counter_pid, fn state -> state end)
|> IO.inspect(label: "GET")
# Stop the agent process
Agent.stop(my_counter_pid)
```
An agent can also be implemented as a module.
```elixir
defmodule FibonacciStore do
use Agent
@doc "Starts the FibonacciStore agent"
def start_link(_) do
Agent.start_link(fn -> [0, 1] end, name: __MODULE__)
end
@doc "Gets the next number in the Fibonacci series"
def next_fibonacci() do
Agent.get_and_update(__MODULE__, fn [num1, num2] ->
next_fibonacci = num1 + num2
{next_fibonacci, [num2, next_fibonacci]}
end)
end
@doc "Gets the last generated number in the Fibonacci series"
def current_fibonacci() do
Agent.get(__MODULE__, fn [_num1, num2] -> num2 end)
end
end
```
Now let's take our agent server for a spin.
```elixir
# Stop the agent process if already running
if Process.whereis(FibonacciStore), do: Agent.stop(FibonacciStore)
# Start the agent process
FibonacciStore.start_link(:noop)
# Print the current Fibonacci number in the Agent state
FibonacciStore.current_fibonacci() |> IO.inspect(label: "Current Fibonacci")
# Print the next 10 Fibonacci numbers
Enum.each(1..10, fn _ -> IO.puts(FibonacciStore.next_fibonacci()) end)
# Print the current Fibonacci number in the Agent state
FibonacciStore.current_fibonacci() |> IO.inspect(label: "Current Fibonacci")
# Stop the agent process
FibonacciStore
|> Process.whereis()
|> Agent.stop()
```
## Supervising Agents
Typically, agents are included in a supervision tree, much like GenServers. When we use `use Agent` in our module, it automatically creates a `child_spec/1` function, which enables us to start the agent directly under a supervisor.
The process of adding an agent to a supervision tree closely resembles that of a GenServer. To illustrate, let's consider starting our FibonacciStore agent under a supervisor:
```elixir
# Same as {FibonacciStore, []}
children = [FibonacciStore]
Supervisor.start_link(children, strategy: :one_for_all)
# Generate 5 numbers in the Fibonacci series
Enum.each(1..5, fn _ -> FibonacciStore.next_fibonacci() end)
FibonacciStore.current_fibonacci() |> IO.inspect(label: "State before restart")
# Simulate termination of the agent server to check if the supervisor restarts it
FibonacciStore
|> Process.whereis()
|> Process.exit(:boom)
# Wait for the supervisor to restart the agent process
:timer.sleep(200)
# Notice that the agent process was restarted and its state is now set to its initial state
FibonacciStore.current_fibonacci() |> IO.inspect(label: "State after restart")
```
In addition, similar to GenServers, the `use Agent` macro also accepts a list of options to customize the child specification and determine its behavior under a supervisor. By providing these options, we can tailor how the agent operates within the supervision tree. The following options can be passed:
* `:id` - Specifies the identifier for the child specification. By default, it is set to the current module's name.
* `:restart` - Determines the restart strategy for the child. The default value is `:permanent`, which restarts the child process regardless of whether it crashes or is gracefully terminated.
Here's an example of using the `use Agent` macro with customized options:
```elixir
use Agent, restart: :transient, shutdown: 10_000
```
In the above code, the agent child specification is configured to have a restart strategy of `:transient`, meaning that the child process will only be restarted if it terminates abnormally. Additionally, the shutdown strategy is set to allow a grace period of 10,000 milliseconds for the child to shut down before forcefully terminating it.
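To check how these options end up in the child specification, we can call the generated `child_spec/1` directly. The module name `TransientCounter` below is hypothetical and exists only for this sketch.

```elixir
defmodule TransientCounter do
  # Options passed to `use Agent` are merged into the generated child spec
  use Agent, restart: :transient, shutdown: 10_000

  def start_link(initial) do
    Agent.start_link(fn -> initial end, name: __MODULE__)
  end
end

# child_spec/1 is generated by `use Agent`; its argument is forwarded to start_link/1
spec = TransientCounter.child_spec(0)
IO.inspect(spec, label: "Generated child spec")
```

The resulting map carries the `:restart` and `:shutdown` values alongside the default `:id` and `:start` entries, which is what a supervisor reads when it starts the agent.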
================================================
FILE: chapters/ch_9.0_gotchas.livemd
================================================
# Gotchas
## Loss of sharing
[Ref](https://medium.com/@johnjocoo/debugging-memory-issues-in-elixir-601c8a0a607d#2e85)
In Elixir, each process manages its own heap, so data is generally not shared between processes. Data sent between processes is fully copied (large binaries, which are reference-counted and stored off-heap, are the main exception), and the same applies to data written to or read from an ETS table. During this copying, data is flattened, losing any internal sharing of terms.
Within a single process, however, [data can be shared](ch_1.2_immutability_and_memory_management.livemd#persistent-datastructures). For instance, if you have a list assigned to a variable, then prepend an element and assign it to another variable, the tail of the original list is shared between both variables, maintaining the same memory allocation.
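Sharing within a process can be made visible by comparing two size measurements from `:erts_debug`, an internal debugging module of the Erlang runtime whose exact numbers may differ between OTP releases. This is a rough sketch: `:erts_debug.size/1` counts shared sub-terms once, while `:erts_debug.flat_size/1` counts them as if every reference were a separate copy.

```elixir
# Built at runtime so the lists live on the process heap
original = Enum.to_list(1..1_000)

# Prepending allocates one new list cell; the tail is the original list
extended = [0 | original]

pair = {original, extended}

# Size in machine words with sharing taken into account
shared_words = :erts_debug.size(pair)

# Size in machine words as if the term were fully flattened
flat_words = :erts_debug.flat_size(pair)

IO.inspect(shared_words, label: "Words with sharing")
IO.inspect(flat_words, label: "Words when flattened")
```

The flattened size comes out at roughly twice the shared size, because the thousand-element tail is counted once per variable instead of once overall.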
Consider preloading associations in Ecto, like Posts and Comments, where a post has many comments. If you fetch 1000 comments and preload their 100 associated posts, Ecto shares these posts among the comments. However, when full copying occurs, each post is duplicated for each comment, resulting in 1000 separate post entries. This process, known as flattening or "loss of sharing," leads to significant memory duplication.
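The effect can be reproduced without Ecto. In the sketch below, plain maps stand in for the post and comment structs (an assumption made for illustration), and `:erts_debug.size/1`, an internal debugging function of the Erlang runtime, measures the term before and after it crosses a process boundary.

```elixir
# Stand-in for a preloaded post: a runtime-built map with some payload
post = %{id: 1, body: Enum.to_list(1..100)}

# 1000 "comments" that all point at the same post term
comments = for id <- 1..1_000, do: %{id: id, post: post}

# Measured in the caller, where the post is stored once and shared
size_in_caller = :erts_debug.size(comments)

# The closure captures `comments`, so the list is copied into the task process
size_in_task =
  Task.async(fn -> :erts_debug.size(comments) end)
  |> Task.await()

IO.inspect(size_in_caller, label: "Words in caller (shared)")
IO.inspect(size_in_task, label: "Words in task (flattened copy)")
```

On a default build of the runtime, the copy inside the task is much larger, since every comment now carries its own copy of the post.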
================================================
FILE: chapters/sample_data/top_websites.csv
================================================
1,"fonts.googleapis.com",10
2,"facebook.com",10
3,"twitter.com",10
4,"google.com",10
5,"youtube.com",10
6,"s.w.org",10
7,"instagram.com",10
8,"googletagmanager.com",10
9,"linkedin.com",10
10,"ajax.googleapis.com",10
11,"plus.google.com",10
12,"gmpg.org",10
13,"pinterest.com",9.63
14,"fonts.gstatic.com",9.6
15,"wordpress.org",9.54
16,"en.wikipedia.org",9.54
17,"youtu.be",9.47
18,"maps.google.com",9.3
19,"itunes.apple.com",9.21
20,"github.com",9.18
21,"bit.ly",9.11
22,"play.google.com",9.07
23,"goo.gl",9.03
24,"docs.google.com",9.02
25,"cdnjs.cloudflare.com",8.99
26,"vimeo.com",8.98
27,"support.google.com",8.87
28,"google-analytics.com",8.8
29,"maps.googleapis.com",8.79
30,"flickr.com",8.76
31,"vk.com",8.74
32,"t.co",8.72
33,"reddit.com",8.69
34,"amazon.com",8.66
35,"medium.com",8.64
36,"sites.google.com",8.57
37,"drive.google.com",8.51
38,"creativecommons.org",8.47
39,"microsoft.com",8.47
40,"developers.google.com",8.46
41,"adobe.com",8.44
42,"soundcloud.com",8.41
43,"theguardian.com",8.38
44,"apis.google.com",8.35
45,"ec.europa.eu",8.33
46,"lh3.googleusercontent.com",8.3
47,"chrome.google.com",8.28
48,"cloudflare.com",8.27
49,"nytimes.com",8.26
50,"maxcdn.bootstrapcdn.com",8.25
51,"support.microsoft.com",8.25
52,"blogger.com",8.25
53,"forbes.com",8.24
54,"s3.amazonaws.com",8.23
55,"code.jquery.com",8.23
56,"dropbox.com",8.19
57,"translate.google.com",8.15
58,"paypal.com",8.14
59,"apps.apple.com",8.14
60,"tinyurl.com",8.12
61,"etsy.com",8.1
62,"theatlantic.com",8.09
63,"m.facebook.com",8.08
64,"archive.org",8.05
65,"amzn.to",8.04
66,"cnn.com",8.04
67,"policies.google.com",8.02
68,"commons.wikimedia.org",8.02
69,"issuu.com",8.01
70,"i.imgur.com",8
71,"wordpress.com",8
72,"wp.me",7.99
73,"businessinsider.com",7.98
74,"yelp.com",7.98
75,"mail.google.com",7.98
76,"support.apple.com",7.97
77,"t.me",7.94
78,"apple.com",7.92
79,"washingtonpost.com",7.92
80,"bbc.com",7.92
81,"gstatic.com",7.92
82,"imgur.com",7.91
83,"amazon.de",7.91
84,"bbc.co.uk",7.9
85,"googleads.g.doubleclick.net",7.9
86,"mozilla.org",7.89
87,"eventbrite.com",7.89
88,"slideshare.net",7.88
89,"w3.org",7.87
90,"forms.gle",7.86
91,"platform.twitter.com",7.85
92,"accounts.google.com",7.84
93,"telegraph.co.uk",7.82
94,"messenger.com",7.82
95,"web.archive.org",7.81
96,"secure.gravatar.com",7.81
97,"usatoday.com",7.79
98,"huffingtonpost.com",7.78
99,"stackoverflow.com",7.78
100,"fb.com",7.78