Repository: isdanni/mit6.824
Branch: master
Commit: dfb4871d31b6
Files: 94
Total size: 4.2 MB
Directory structure:
gitextract_tcaroy90/
├── Makefile
├── README.md
├── lab/
│ ├── lab1 MapReduce.md
│ ├── lab2 Raft.md
│ ├── lab3 Paxos-based KV Service.md
│ └── lab4 shared key value service.md
├── lecture/
│ ├── l01 mapreduce/
│ │ └── l01.txt
│ ├── l02 PRC_threads_crawler_kv/
│ │ ├── PRC_Threads.md
│ │ ├── crawler.go
│ │ └── kv.go
│ ├── l03 GFS/
│ │ └── GFS.md
│ ├── l04 more_primary_backup/
│ │ └── FDS.md
│ ├── l06 fault tolerance raft/
│ │ └── raft.md
│ ├── l07 fault tolerance raft2/
│ │ └── raft2.md
│ └── l08 zookeeper/
│ └── zookeeper.md
└── src/
├── diskv/
│ ├── client.go
│ ├── common.go
│ ├── dist_test.go
│ ├── server.go
│ └── test.go
├── kvpaxos/
│ ├── client.go
│ ├── common.go
│ ├── server.go
│ └── test.go
├── kvraft/
│ ├── client.go
│ ├── common.go
│ ├── config.go
│ ├── server.go
│ └── test.go
├── labgob/
│ ├── labgob.go
│ └── test_test.go
├── labrpc/
│ ├── labrpc.go
│ └── test_test.go
├── linearizability/
│ ├── bitset.go
│ ├── linearizability.go
│ ├── model.go
│ └── models.go
├── main/
│ ├── diskvd.go
│ ├── ii.go
│ ├── lockc.go
│ ├── lockd.go
│ ├── mr-challenge.txt
│ ├── mr-testout.txt
│ ├── pbc.go
│ ├── pbd.go
│ ├── pg-being_ernest.txt
│ ├── pg-dorian_gray.txt
│ ├── pg-frankenstein.txt
│ ├── pg-grimm.txt
│ ├── pg-huckleberry_finn.txt
│ ├── pg-metamorphosis.txt
│ ├── pg-sherlock_holmes.txt
│ ├── pg-tom_sawyer.txt
│ ├── test-ii.sh
│ ├── test-mr.sh
│ ├── test-wc.sh
│ ├── viewd.go
│ └── wc.go
├── mapreduce/
│ ├── 824-mrinput-0.txt
│ ├── common.go
│ ├── common_map.go
│ ├── common_reduce.go
│ ├── common_rpc.go
│ ├── master.go
│ ├── master_rpc.go
│ ├── master_splitmerge.go
│ ├── schedule.go
│ ├── test_test.go
│ └── worker.go
├── paxos/
│ ├── paxos.go
│ └── test_test.go
├── pbservice/
│ ├── client.go
│ ├── common.go
│ ├── server.go
│ └── test.go
├── raft/
│ ├── config.go
│ ├── persister.go
│ ├── raft.go
│ ├── test_test.go
│ └── util.go
├── shardkv/
│ ├── client.go
│ ├── common.go
│ ├── config.go
│ ├── server.go
│ └── test_test.go
├── shardmaster/
│ ├── client.go
│ ├── common.go
│ ├── config.go
│ ├── server.go
│ └── test_test.go
└── viewservice/
├── client.go
├── common.go
├── server.go
└── test.go
================================================
FILE CONTENTS
================================================
================================================
FILE: Makefile
================================================
# This is the Makefile helping you submit the labs.
# Just create 6.824/api.key with your API key in it,
# and submit your lab with the following command:
# $ make [lab1|lab2a|lab2b|lab2c|lab3a|lab3b|lab4a|lab4b]
LABS=" lab1 lab2a lab2b lab2c lab3a lab3b lab4a lab4b "
%:
@echo "Preparing $@-handin.tar.gz"
@echo "Checking for committed temporary files..."
@if git ls-files | grep -E 'mrtmp|mrinput' > /dev/null; then \
echo "" ; \
echo "OBS! You have committed some large temporary files:" ; \
echo "" ; \
git ls-files | grep -E 'mrtmp|mrinput' | sed 's/^/\t/' ; \
echo "" ; \
echo "Follow the instructions at http://stackoverflow.com/a/308684/472927" ; \
echo "to remove them, and then run make again." ; \
echo "" ; \
exit 1 ; \
fi
@if echo $(LABS) | grep -q " $@ " ; then \
echo "Tarring up your submission..." ; \
tar cvzf $@-handin.tar.gz \
"--exclude=src/main/pg-*.txt" \
"--exclude=src/main/diskvd" \
"--exclude=src/mapreduce/824-mrinput-*.txt" \
"--exclude=mrtmp.*" \
"--exclude=src/main/diff.out" \
Makefile src; \
if ! test -e api.key ; then \
echo "Missing $(PWD)/api.key. Please create the file with your key in it or submit the $@-handin.tar.gz via the web interface."; \
else \
echo "Are you sure you want to submit $@? Enter 'yes' to continue:"; \
read line; \
if test "$$line" != "yes" ; then echo "Giving up submission"; exit; fi; \
if test `stat -c "%s" "$@-handin.tar.gz" 2>/dev/null || stat -f "%z" "$@-handin.tar.gz"` -ge 20971520 ; then echo "File exceeds 20MB."; exit; fi; \
mv api.key api.key.fix ; \
cat api.key.fix | tr -d '\n' > api.key ; \
rm api.key.fix ; \
curl -F file=@$@-handin.tar.gz -F "key=<api.key" \
https://6824.scripts.mit.edu/2018/handin.py/upload > /dev/null || { \
echo ; \
echo "Submit seems to have failed."; \
echo "Please upload the tarball manually on the submission website."; } \
fi; \
else \
echo "Bad target $@. Usage: make [$(LABS)]"; \
fi
================================================
FILE: README.md
================================================
# mit6.824 Distributed Systems
Spring 2020. Implemented with Go 1.10.
[https://pdos.csail.mit.edu/6.824/schedule.html](https://pdos.csail.mit.edu/6.824/schedule.html)
### What is 6.824 about?
6.824 is a core 12-unit graduate subject with lectures, readings, programming labs, an optional project, a mid-term exam, and a final exam. It will present abstractions and implementation techniques for engineering distributed systems. Major topics include fault tolerance, replication, and consistency. Much of the class consists of studying and discussing case studies of distributed systems.
### Lab
- Lab 1: MapReduce
- Lab 2: replication for fault-tolerance using Raft
- Lab 3: fault-tolerant key/value store
- Lab 4: sharded key/value store
### Set up
1. Install Go, and set up the Go environment variables and directories. Click [here](https://github.com/golang/go/wiki/SettingGOPATH) to learn how.
2. Set up the labs.
```shell
cd $GOPATH
git clone https://github.com/isdanni/mit6.824.git
cd mit6.824
export GOPATH=$GOPATH:$(pwd)
```
### Notes
The source code also contains `/kvpaxos`, an implementation of the [Paxos](https://en.wikipedia.org/wiki/Paxos_(computer_science)) consensus algorithm.
================================================
FILE: lab/lab1 MapReduce.md
================================================
# 6.824 Lab 1: MapReduce
In this lab you'll build a **MapReduce library** as an introduction to programming in Go and to building fault tolerant distributed systems. In the first part you will write a simple MapReduce program. In the second part you will write a Master that hands out tasks to MapReduce workers, and handles failures of workers. The interface to the library and the approach to fault tolerance is similar to the one described in the original [MapReduce paper](http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf).
### Software
You'll implement this lab (and all the labs) in Go. The Go web site contains lots of tutorial information which you may want to look at. We will grade your labs using Go version 1.9; you should use 1.9 too, though we don't know of any problems with other versions.
The labs are designed to run on **Athena Linux machines** with x86 or x86_64 architecture; `uname -a` should mention `i386 GNU/Linux` or `i686 GNU/Linux` or `x86_64 GNU/Linux`. You can log into a public Athena host with `ssh athena.dialup.mit.edu`. You may get lucky and find that the labs work in other environments, for example on some laptop Linux or OSX installations.
We supply you with parts of a MapReduce implementation that supports both **distributed** and **non-distributed** operation (just the boring bits). You'll fetch the initial lab software with git (a version control system). To learn more about git, look at the Pro Git book or the git user's manual, or, if you are already familiar with other version control systems, you may find this CS-oriented overview of git useful.
These Athena commands will give you access to git and Go:
```shell
athena$ add git
athena$ setup ggo_v1.9
```
The URL for the course git repository is [git://g.csail.mit.edu/6.824-golabs-2018](git://g.csail.mit.edu/6.824-golabs-2018). To install the files in your directory, you need to clone the course repository, by running the commands below.
```shell
$ git clone git://g.csail.mit.edu/6.824-golabs-2018 6.824
$ cd 6.824
$ ls
Makefile src
```
Git allows you to keep track of the changes you make to the code. For example, if you want to checkpoint your progress, you can commit your changes by running:
`$ git commit -am 'partial solution to lab 1'`
The Map/Reduce implementation we give you has support for **two modes of operation**, **sequential** and **distributed**. In the former, the map and reduce tasks are executed one at a time: first, the first map task is executed to completion, then the second, then the third, etc. When all the map tasks have finished, the first reduce task is run, then the second, etc. This mode, while not very fast, is useful for debugging. The distributed mode runs many worker threads that first **execute map tasks in parallel, and then reduce tasks**. This is much faster, but also harder to implement and debug.
### Preamble: Getting familiar with the source
The mapreduce package provides a simple Map/Reduce library (in the mapreduce directory). Applications should normally call `Distributed()` [located in `master.go`] to start a job, but may instead call `Sequential()` [also in master.go] to get a sequential execution for debugging.
The code executes a job as follows:
1. The application provides a number of input files, a map function, a reduce function, and the number of reduce tasks (`nReduce`).
2. A master is created with this knowledge. It starts an RPC server (see `master_rpc.go`), and waits for workers to register (using the RPC call `Register()` [defined in master.go]). As tasks become available (in steps 4 and 5), `schedule()` [schedule.go] decides how to assign those tasks to workers, and how to handle worker failures.
3. The master considers each input file to be one map task, and calls `doMap()` [common_map.go] **at least once for each map task**. It does so either directly (when using `Sequential()`) or by issuing the `DoTask` RPC to a worker [worker.go]. Each call to doMap() reads the appropriate file, calls the map function on that file's contents, and writes the resulting key/value pairs to `nReduce` intermediate files. `doMap()` hashes each key to pick the intermediate file and thus the reduce task that will process the key. There will be `nMap` x `nReduce` files after all map tasks are done. Each file name contains **a prefix**, the **map task number**, and the **reduce task number**. If there are two map tasks and three reduce tasks, the map tasks will create these six intermediate files:
```
mrtmp.xxx-0-0
mrtmp.xxx-0-1
mrtmp.xxx-0-2
mrtmp.xxx-1-0
mrtmp.xxx-1-1
mrtmp.xxx-1-2
```
**Each worker must be able to read files written by any other worker, as well as the input files**. Real deployments use distributed storage systems such as `GFS` to allow this access even though workers run on different machines. In this lab you'll run all the workers on the same machine, and use the local file system.
4. The master next calls `doReduce()` [common_reduce.go] at least once for each reduce task. As with doMap(), it does so either directly or through a worker. The doReduce() for reduce task r collects the r'th intermediate file from each map task, and calls the reduce function for each key that appears in those files. The reduce tasks produce nReduce result files.
5. The master calls `mr.merge()` [master_splitmerge.go], which merges all the nReduce files produced by the previous step into a single output.
6. The master sends a Shutdown RPC to each of its workers, and then shuts down its own RPC server.
> **Note**: Over the course of the following exercises, you will have to write/modify `doMap`, `doReduce`, and `schedule` yourself. These are located in `common_map.go`, `common_reduce.go`, and `schedule.go` respectively. You will also have to write the map and reduce functions in `../main/wc.go`.
You should not need to modify any other files, but reading them might be useful in order to understand how the other methods fit into the overall architecture of the system.
### Part I: Map/Reduce input and output
The Map/Reduce implementation you are given is missing some pieces. Before you can write your first Map/Reduce function pair, you will need to fix the sequential implementation. In particular, the code we give you is missing two crucial pieces: the function that **divides up the output of a map task**, and the function that **gathers all the inputs for a reduce task**. These tasks are carried out by the `doMap()` function in `common_map.go`, and the `doReduce()` function in `common_reduce.go` respectively. The comments in those files should point you in the right direction.
To help you determine if you have correctly implemented doMap() and doReduce(), we have provided you with a Go test suite that checks the correctness of your implementation. These tests are implemented in the file `test_test.go`. To run the tests for the sequential implementation that you have now fixed, run:
```shell
$ cd 6.824
$ export "GOPATH=$PWD" # go needs $GOPATH to be set to the project's working directory
$ cd "$GOPATH/src/mapreduce"
$ go test -run Sequential
ok mapreduce 2.694s
```
> You receive full credit for this part if your software passes the Sequential tests (as run by the command above) when we run your software on our machines.
If the output did not show ok next to the tests, your implementation has a bug in it. To give more verbose output, set `debugEnabled = true` in `common.go`, and add `-v` to the test command above. You will get much more output along the lines of:
```shell
$ env "GOPATH=$PWD/../../" go test -v -run Sequential
=== RUN TestSequentialSingle
master: Starting Map/Reduce task test
Merge: read mrtmp.test-res-0
master: Map/Reduce task completed
--- PASS: TestSequentialSingle (1.34s)
=== RUN TestSequentialMany
master: Starting Map/Reduce task test
Merge: read mrtmp.test-res-0
Merge: read mrtmp.test-res-1
Merge: read mrtmp.test-res-2
master: Map/Reduce task completed
--- PASS: TestSequentialMany (1.33s)
PASS
ok mapreduce 2.672s
```
### Part II: Single-worker word count
Now you will implement word count — a simple Map/Reduce example. Look in main/wc.go; you'll find empty mapF() and reduceF() functions. Your job is to insert code so that wc.go reports the number of occurrences of each word in its input. A word is any contiguous sequence of letters, as determined by unicode.IsLetter.
There are some input files with pathnames of the form `pg-*.txt` in `~/6.824/src/main`, downloaded from Project Gutenberg. Here's how to run wc with the input files:
```shell
$ cd 6.824
$ export "GOPATH=$PWD"
$ cd "$GOPATH/src/main"
$ go run wc.go master sequential pg-*.txt
# command-line-arguments
./wc.go:14: missing return at end of function
./wc.go:21: missing return at end of function
```
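The errors above come from the empty function bodies you are meant to fill in. A minimal sketch of what `mapF()` and `reduceF()` might look like for word count follows; it assumes the mapreduce package's `KeyValue` pair type (two string fields), reproduced here so the example is self-contained. The real functions go in `main/wc.go`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue mirrors the mapreduce package's pair type (assumption:
// two string fields, Key and Value).
type KeyValue struct {
	Key   string
	Value string
}

// mapF splits the input into words (maximal runs of letters, per
// unicode.IsLetter) and emits one ("word", "1") pair per occurrence.
func mapF(document string, value string) []KeyValue {
	words := strings.FieldsFunc(value, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	kvs := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kvs = append(kvs, KeyValue{Key: w, Value: "1"})
	}
	return kvs
}

// reduceF sums the counts for one word; since mapF emits "1" per
// occurrence, the sum equals the number of occurrences.
func reduceF(key string, values []string) string {
	total := 0
	for _, v := range values {
		n, _ := strconv.Atoi(v)
		total += n
	}
	return strconv.Itoa(total)
}

func main() {
	kvs := mapF("doc", "the quick brown fox, the lazy dog")
	fmt.Println(len(kvs)) // 7 words; punctuation is not a letter
	fmt.Println(reduceF("the", []string{"1", "1"}))
}
```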
================================================
FILE: lab/lab2 Raft.md
================================================
# 6.824 Lab 2: Raft
> 6.824 - Spring 2018
### Introduction
This is the first in a series of labs in which you'll build a **fault-tolerant key/value storage system**. In this lab you'll implement Raft, a replicated state machine protocol. In the next lab you'll build a key/value service on top of Raft. Then you will “shard” your service over multiple replicated state machines for higher performance.
A replicated service achieves fault tolerance by storing complete copies of its state (i.e., data) on multiple replica servers. Replication allows the service to continue operating even if some of its servers experience failures (crashes or a broken or flaky network). The challenge is that failures may cause the replicas to hold differing copies of the data.
Raft manages a service's state replicas, and in particular it helps the service sort out what the correct state is after failures. Raft implements a replicated state machine. It organizes client requests into a sequence, called the log, and ensures that all the replicas agree on the contents of the log. Each replica executes the client requests in the log in the order they appear in the log, applying those requests to the replica's local copy of the service's state. Since all the live replicas see the same log contents, they all execute the same requests in the same order, and thus continue to have identical service state. If a server fails but later recovers, Raft takes care of bringing its log up to date. Raft will continue to operate as long as at least a majority of the servers are alive and can talk to each other. If there is no such majority, Raft will make no progress, but will pick up where it left off as soon as a majority can communicate again.
In this lab you'll implement Raft as a Go object type with associated methods, meant to be used as a module in a larger service. A set of Raft instances talk to each other with RPC to maintain replicated logs. Your Raft interface will support an indefinite sequence of numbered commands, also called log entries. The entries are numbered with index numbers. The log entry with a given index will eventually be committed. At that point, your Raft should send the log entry to the larger service for it to execute.
Your Raft instances are only allowed to interact using RPC. For example, different Raft instances are not allowed to share Go variables. Your code should not use files at all.
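The interface sketched above, where committed entries flow from Raft to the service, might look like the following. The type and field names here are illustrative assumptions; the lab skeleton's `raft.go` defines its own `ApplyMsg`, which is what the tests expect.

```go
package main

import "fmt"

// LogEntry is an illustrative sketch of one numbered command in the
// replicated log.
type LogEntry struct {
	Term    int         // term in which a leader received the entry
	Command interface{} // opaque command supplied by the service
}

// ApplyMsg is how a Raft instance hands a committed entry to the
// service built on top of it (field names are assumptions).
type ApplyMsg struct {
	CommandValid bool
	Command      interface{}
	CommandIndex int
}

func main() {
	applyCh := make(chan ApplyMsg, 10)

	// Once entries are committed, Raft delivers them on applyCh in
	// index order; the service executes them in that order.
	log := []LogEntry{{Term: 1, Command: "x=1"}, {Term: 1, Command: "x=2"}}
	for i, e := range log {
		applyCh <- ApplyMsg{CommandValid: true, Command: e.Command, CommandIndex: i + 1}
	}
	close(applyCh)
	for m := range applyCh {
		fmt.Printf("apply index=%d cmd=%v\n", m.CommandIndex, m.Command)
	}
}
```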
You should consult the extended Raft paper and the Raft lecture notes. You may find it useful to look at this illustration of the Raft protocol, a guide to Raft implementation written for 6.824 students in 2016, and advice about locking and structure for concurrency. For a wider perspective, have a look at Paxos, Chubby, Paxos Made Live, Spanner, Zookeeper, Harp, Viewstamped Replication, and Bolosky et al.
In this lab you'll implement most of the Raft design described in the extended paper, including saving persistent state and reading it after a node fails and then restarts. You will not implement cluster membership changes (Section 6) or log compaction / snapshotting (Section 7).
Start early. Although the amount of code isn't large, getting it to work correctly will be challenging.
Read and understand the extended Raft paper and the Raft lecture notes before you start. Your implementation should follow the paper's description closely, particularly Figure 2, since that's what the tests expect.
This lab is due in three parts. You must submit each part on the corresponding due date. This lab does not involve a lot of code, but concurrency makes it challenging to debug; start each part early.
```shell
$ cd ~/6.824
$ git pull
...
$ cd src/raft
$ GOPATH=~/6.824
$ export GOPATH
$ go test
Test (2A): initial election ...
--- FAIL: TestInitialElection2A (5.04s)
config.go:305: expected one leader, got none
Test (2A): election after network failure ...
--- FAIL: TestReElection2A (5.03s)
config.go:305: expected one leader, got none
...
$
```
================================================
FILE: lab/lab3 Paxos-based KV Service.md
================================================
# 6.824 Lab 3: Paxos-based Key/Value Service
>Part A Due: Fri Feb 27 11:59pm
> Part B Due: Fri Mar 13 11:59pm
### Introduction
Your Lab 2 depends on a single master view server to pick the primary. If the view server is not available (crashes or has network problems), then your key/value service won't work, even if both primary and backup are available. It also has the less critical defect that it copes with a server (primary or backup) that's briefly unavailable (e.g. due to a lost packet) by either blocking or declaring it dead; the latter is very expensive because it requires a complete key/value database transfer.
In this lab you'll fix the above problems by using Paxos to manage the replication of a key/value store. You won't have anything corresponding to a master view server. Instead, a set of replicas will process all client requests in the same order, using Paxos to agree on the order. Paxos will get the agreement right even if some of the replicas are unavailable, or have unreliable network connections, or even if subsets of the replicas are isolated in their own network partitions. As long as Paxos can assemble a majority of replicas, it can process client operations. Replicas that were not in the majority can catch up later by asking Paxos for operations that they missed.
Your system will consist of the following players: clients, kvpaxos servers, and Paxos peers. Clients send Put(), Append(), and Get() RPCs to key/value servers (called kvpaxos servers). A client can send an RPC to any of the kvpaxos servers, and should retry by sending to a different server if there's a failure. Each kvpaxos server contains a replica of the key/value database; handlers for client Get() and Put()/Append() RPCs; and a Paxos peer. Paxos takes the form of a library that is included in each kvpaxos server. A kvpaxos server talks to its local Paxos peer (via method calls). The different Paxos peers talk to each other via RPC to achieve agreement on each operation.
Your Paxos library's interface supports an indefinite sequence of agreement "instances". The instances are numbered with sequence numbers. Each instance is either "decided" or not yet decided. A decided instance has a value. If an instance is decided, then all the Paxos peers that are aware that it is decided will agree on the same value for that instance. The Paxos library interface allows kvpaxos to suggest a value for an instance, and to find out whether an instance has been decided and (if so) what that instance's value is.
Your kvpaxos servers will use Paxos to agree on the order in which client Put()s, Append()s, and Get()s execute. Each time a kvpaxos server receives a Put()/Append()/Get() RPC, it will use Paxos to cause some Paxos instance's value to be a description of that operation. That instance's sequence number determines when the operation executes relative to other operations. In order to find the value to be returned by a Get(), kvpaxos should first apply all Put()s and Append()s that are ordered before the Get() to its key/value database.
You should think of kvpaxos as using Paxos to implement a "log" of Put/Append/Get operations. That is, each Paxos instance is a log element, and the order of operations in the log is the order in which all kvpaxos servers will apply the operations to their key/value databases. Paxos will ensure that the kvpaxos servers agree on this order.
Only RPC may be used for interaction between clients and servers, between different servers, and between different clients. For example, different instances of your server are not allowed to share Go variables or files.
Your Paxos-based key/value storage system will have some limitations that would need to be fixed in order for it to be a serious system. It won't cope with crashes, since it stores neither the key/value database nor the Paxos state on disk. It requires the set of servers to be fixed, so one cannot replace old servers. Finally, it is slow: many Paxos messages are exchanged for each client operation. All of these problems can be fixed.
You should consult the Paxos lecture notes and the Paxos assigned reading. For a wider perspective, have a look at Chubby, Paxos Made Live, Spanner, Zookeeper, Harp, Viewstamped Replication, and Bolosky et al.
### Collaboration Policy
You must write all the code you hand in for 6.824, except for code that we give you as part of the assignment. You are not allowed to look at anyone else's solution, and you are not allowed to look at code from previous years. You may discuss the assignments with other students, but you may not look at or copy each others' code. Please do not publish your code or make it available to future 6.824 students -- for example, please do not make your code visible on github.
### Software
Do a git pull to get the latest lab software. We supply you with new skeleton code and new tests in src/paxos and src/kvpaxos.
```shell
$ add 6.824
$ cd ~/6.824
$ git pull
...
$ cd src/paxos
$ go test
Single proposer: --- FAIL: TestBasic (5.02 seconds)
test_test.go:48: too few decided; seq=0 ndecided=0 wanted=3
Forgetting: --- FAIL: TestForget (5.03 seconds)
test_test.go:48: too few decided; seq=0 ndecided=0 wanted=6
...
$
```
Ignore the huge number of "has wrong number of ins" and "type Paxos has no exported methods" errors.
### Part A: Paxos
First you'll implement a Paxos library. `paxos.go` contains descriptions of the methods you must implement. When you're done, you should pass all the tests in the paxos directory (after ignoring Go's many complaints):
```shell
$ cd ~/6.824/src/paxos
$ go test
Test: Single proposer ...
... Passed
Test: Many proposers, same value ...
... Passed
Test: Many proposers, different values ...
... Passed
Test: Out-of-order instances ...
... Passed
Test: Deaf proposer ...
... Passed
Test: Forgetting ...
... Passed
Test: Lots of forgetting ...
... Passed
Test: Paxos frees forgotten instance memory ...
... Passed
Test: Many instances ...
... Passed
Test: Minority proposal ignored ...
... Passed
Test: Many instances, unreliable RPC ...
... Passed
Test: No decision if partitioned ...
... Passed
Test: Decision in majority partition ...
... Passed
Test: All agree after full heal ...
... Passed
Test: One peer switches partitions ...
... Passed
Test: One peer switches partitions, unreliable ...
... Passed
Test: Many requests, changing partitions ...
... Passed
PASS
ok paxos 59.523s
$
```
Your implementation must support this interface:
```
px = paxos.Make(peers []string, me int)
px.Start(seq int, v interface{}) // start agreement on new instance
px.Status(seq int) (fate Fate, v interface{}) // get info about an instance
px.Done(seq int) // ok to forget all instances <= seq
px.Max() int // highest instance seq known, or -1
px.Min() int // instances before this have been forgotten
```
An application calls Make(peers,me) to create a Paxos peer. The peers argument contains the ports of all the peers (including this one), and the me argument is the index of this peer in the peers array. Start(seq,v) asks Paxos to start agreement on instance seq, with proposed value v; Start() should return immediately, without waiting for agreement to complete. The application calls Status(seq) to find out whether the Paxos peer thinks the instance has reached agreement, and if so what the agreed value is. Status() should consult the local Paxos peer's state and return immediately; it should not communicate with other peers. The application may call Status() for old instances (but see the discussion of Done() below).
Your implementation should be able to make progress on agreement for multiple instances at the same time. That is, if application peers call Start() with different sequence numbers at about the same time, your implementation should run the Paxos protocol concurrently for all of them. You should not wait for agreement to complete for instance i before starting the protocol for instance i+1. Each instance should have its own separate execution of the Paxos protocol.
A long-running Paxos-based server must forget about instances that are no longer needed, and free the memory storing information about those instances. An instance is needed if the application still wants to be able to call Status() for that instance, or if another Paxos peer may not yet have reached agreement on that instance. Your Paxos should implement freeing of instances in the following way. When a particular peer application will no longer need to call Status() for any instance <= x, it should call Done(x). That Paxos peer can't yet discard the instances, since some other Paxos peer might not yet have agreed to the instance. So each Paxos peer should tell each other peer the highest Done argument supplied by its local application. Each Paxos peer will then have a Done value from each other peer. It should find the minimum, and discard all instances with sequence numbers <= that minimum. The Min() method returns this minimum sequence number plus one.
It's OK for your Paxos to piggyback the Done value in the agreement protocol packets; that is, it's OK for peer P1 to only learn P2's latest Done value the next time that P2 sends an agreement message to P1. If Start() is called with a sequence number less than Min(), the Start() call should be ignored. If Status() is called with a sequence number less than Min(), Status() should return Forgotten.
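The Done()/Min() bookkeeping above boils down to a minimum over the Done values collected from all peers. A sketch, assuming a peer that has never called Done() is represented by -1:

```go
package main

import "fmt"

// computeMin returns what Min() should report: one past the smallest
// Done value any peer has announced. Every instance with a sequence
// number below this can safely be discarded by every peer.
func computeMin(done []int) int {
	min := done[0]
	for _, d := range done[1:] {
		if d < min {
			min = d
		}
	}
	return min + 1
}

func main() {
	// Peers 0..2 have announced Done(4), Done(7), Done(2): everyone
	// may forget instances <= 2, so Min() is 3.
	fmt.Println(computeMin([]int{4, 7, 2}))
	// No peer has called Done() yet: Min() is 0, nothing forgotten.
	fmt.Println(computeMin([]int{-1, -1, -1}))
}
```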
Here is the Paxos pseudo-code (for a single instance) from the lecture:
```
proposer(v):
  while not decided:
    choose n, unique and higher than any n seen so far
    send prepare(n) to all servers including self
    if prepare_ok(n, n_a, v_a) from majority:
      v' = v_a with highest n_a; choose own v otherwise
      send accept(n, v') to all
      if accept_ok(n) from majority:
        send decided(v') to all

acceptor's state:
  n_p (highest prepare seen)
  n_a, v_a (highest accept seen)

acceptor's prepare(n) handler:
  if n > n_p
    n_p = n
    reply prepare_ok(n, n_a, v_a)
  else
    reply prepare_reject

acceptor's accept(n, v) handler:
  if n >= n_p
    n_p = n
    n_a = n
    v_a = v
    reply accept_ok(n)
  else
    reply accept_reject
```
Here's a reasonable plan of attack:
1. Add elements to the Paxos struct in paxos.go to hold the state you'll need, according to the lecture pseudo-code. You'll need to define a struct to hold information about each agreement instance.
2. Define RPC argument/reply type(s) for Paxos protocol messages, based on the lecture pseudo-code. The RPCs must include the sequence number for the agreement instance to which they refer. Remember the field names in the RPC structures must start with capital letters.
3. Write a proposer function that drives the Paxos protocol for an instance, and RPC handlers that implement acceptors. Start a proposer function in its own thread for each instance, as needed (e.g. in Start()). At this point you should be able to pass the first few tests.
4. Now implement forgetting.
- Hint: more than one Paxos instance may be executing at a given time, and they may be Start()ed and/or decided out of order (e.g. seq 10 may be decided before seq 5).
- Hint: in order to pass tests assuming unreliable network, your paxos should call the local acceptor through a function call rather than RPC.
- Hint: remember that multiple application peers may call Start() on the same instance, perhaps with different proposed values. An application may even call Start() for an instance that has already been decided.
- Hint: think about how your paxos will forget (discard) information about old instances before you start writing code. Each Paxos peer will need to store instance information in some data structure that allows individual instance records to be deleted (so that the Go garbage collector can free / re-use the memory).
- Hint: you do not need to write code to handle the situation where a Paxos peer needs to re-start after a crash. If one of your Paxos peers crashes, it will never be re-started.
- Hint: have each Paxos peer start a thread per un-decided instance whose job is to eventually drive the instance to agreement, by acting as a proposer.
- Hint: a single Paxos peer may be acting simultaneously as acceptor and proposer for the same instance. Keep these two activities as separate as possible.
- Hint: a proposer needs a way to choose a higher proposal number than any seen so far. This is a reasonable exception to the rule that proposer and acceptor should be separate. It may also be useful for the propose RPC handler to return the highest known proposal number if it rejects an RPC, to help the caller pick a higher one next time. The px.me value will be different in each Paxos peer, so you can use px.me to help ensure that proposal numbers are unique.
- Hint: figure out the minimum number of messages Paxos should use when reaching agreement in non-failure cases and make your implementation use that minimum.
- Hint: the tester calls Kill() when it wants your Paxos to shut down; Kill() sets px.dead. You should call px.isdead() in any loops you have that might run for a while, and break out of the loop if px.isdead() is true. It's particularly important to do this in any long-running threads you create.
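One common way to realize the hint about unique proposal numbers is to derive them from px.me, so that each peer's numbers occupy a distinct residue class. This is a sketch under the assumption of a fixed, known peer count; the function name is illustrative, not part of the skeleton:

```go
package main

import "fmt"

// nextProposal returns a proposal number that is unique to this peer
// (its remainder mod npeers is always me) and strictly higher than
// maxSeen, the highest proposal number observed so far.
func nextProposal(maxSeen, me, npeers int) int {
	round := maxSeen/npeers + 1
	return round*npeers + me
}

func main() {
	// Peer 1 of 3 has seen proposal 7 from some other peer; its next
	// proposal is higher than 7, and its remainder mod 3 is 1, so no
	// other peer can ever pick the same number.
	fmt.Println(nextProposal(7, 1, 3))
}
```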
Part B: Paxos-based Key/Value Server
Now you'll build kvpaxos, a fault-tolerant key/value storage system. You'll modify kvpaxos/client.go, kvpaxos/common.go, and kvpaxos/server.go.
Your kvpaxos replicas should stay identical; the only exception is that some replicas may lag others if they are not reachable. If a replica isn't reachable for a while, but then starts being reachable, it should eventually catch up (learn about operations that it missed).
Your kvpaxos client code should try different replicas it knows about until one responds. A kvpaxos replica that is part of a majority of replicas that can all reach each other should be able to serve client requests.
Your storage system must provide sequential consistency to applications that use its client interface. That is, completed application calls to the Clerk.Get(), Clerk.Put(), and Clerk.Append() methods in kvpaxos/client.go must appear to have affected all replicas in the same order and have at-most-once semantics. A Clerk.Get() should see the value written by the most recent Clerk.Put() or Clerk.Append() (in that order) to the same key. One consequence of this is that you must ensure that each application call to Clerk.Put() or Clerk.Append() must appear in that order just once (i.e., write the key/value database just once), even though internally your client.go may have to send RPCs multiple times until it finds a kvpaxos server replica that replies.
Here's a reasonable plan:
Fill in the Op struct in server.go with the "value" information that kvpaxos will use Paxos to agree on, for each client request. Op field names must start with capital letters. You should use Op structs as the agreed-on values -- for example, you should pass Op structs to Paxos Start(). Go's RPC can marshall/unmarshall Op structs; the call to gob.Register() in StartServer() teaches it how.
Implement the PutAppend() handler in server.go. It should enter a Put or Append Op in the Paxos log (i.e., use Paxos to allocate a Paxos instance, whose value includes the key and value (so that other kvpaxoses know about the Put() or Append())). An Append Paxos log entry should contain the Append's arguments, but not the resulting value, since the result might be large.
Implement a Get() handler. It should enter a Get Op in the Paxos log, and then "interpret" the log before that point to make sure its key/value database reflects all recent Put()s.
Add code to cope with duplicate client requests, including situations where the client sends a request to one kvpaxos replica, times out waiting for a reply, and re-sends the request to a different replica. The client request should execute just once. Please make sure that your scheme for duplicate detection frees server memory quickly, for example by having the client tell the servers which RPCs it has heard a reply for. It's OK to piggyback this information on the next client request.
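The duplicate-detection bookkeeping in the plan above can be sketched roughly as follows (field names and the per-client sequence scheme are illustrative assumptions, not the required design): each Clerk tags requests with a client ID and a sequence number, and the server remembers the highest sequence number it has applied per client.

```go
package main

import "fmt"

// Op is a sketch of the value kvpaxos could agree on; field names are
// illustrative, and must start with capital letters for gob/RPC.
type Op struct {
	Kind     string // "Put", "Append", or "Get"
	Key      string
	Value    string
	ClientID int64
	Seq      int // per-client sequence number
}

// dedup remembers, per client, the highest sequence number applied.
type dedup struct {
	lastSeq map[int64]int
}

// shouldApply reports whether op is new (not a duplicate) and records it.
func (d *dedup) shouldApply(op Op) bool {
	if last, ok := d.lastSeq[op.ClientID]; ok && op.Seq <= last {
		return false // already applied; a re-sent duplicate RPC
	}
	d.lastSeq[op.ClientID] = op.Seq
	return true
}

func main() {
	d := &dedup{lastSeq: make(map[int64]int)}
	op := Op{Kind: "Put", Key: "k", Value: "v", ClientID: 7, Seq: 1}
	fmt.Println(d.shouldApply(op)) // true: first time, apply it
	fmt.Println(d.shouldApply(op)) // false: re-send, skip it
}
```

Because the map keeps only one integer per client, old entries are discarded as soon as a client's next request arrives, which addresses the memory-freeing requirement.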
Hint: your server should try to assign the next available Paxos instance (sequence number) to each incoming client RPC. However, some other kvpaxos replica may also be trying to use that instance for a different client's operation. So the kvpaxos server has to be prepared to try different instances.
Hint: your kvpaxos servers should not directly communicate; they should only interact with each other through the Paxos log.
Hint: as in Lab 2, you will need to uniquely identify client operations to ensure that they execute just once. Also as in Lab 2, you can assume that each clerk has only one outstanding Put, Get, or Append.
Hint: a kvpaxos server should not complete a Get() RPC if it is not part of a majority (so that it does not serve stale data). This means that each Get() (as well as each Put() and Append()) must involve Paxos agreement.
Hint: don't forget to call the Paxos Done() method when a kvpaxos has processed an instance and will no longer need it or any previous instance.
Hint: your code will need to wait for Paxos instances to complete agreement. The only way to do this is to periodically call Status(), sleeping between calls. How long to sleep? A good plan is to check quickly at first, and then more slowly:
```go
to := 10 * time.Millisecond
for {
    status, _ := kv.px.Status(seq)
    if status == paxos.Decided {
        ...
        return
    }
    time.Sleep(to)
    if to < 10*time.Second {
        to *= 2
    }
}
```
Hint: if one of your kvpaxos servers falls behind (i.e. did not participate in the agreement for some instance), it will later need to find out what (if anything) was agreed to. A reasonable way to do this is to call Start(), which will either discover the previously agreed-to value, or cause agreement to happen. Think about what value would be reasonable to pass to Start() in this situation.
Hint: when a test fails, check the log for gob errors (e.g. "rpc: writing response: gob: type not registered for interface ..."), because Go doesn't consider the error fatal, although it is fatal for the lab.
Handin procedure
Submit your code via the class's submission website, located here:
https://6824.scripts.mit.edu:444/submit/handin.py/
You may use your MIT Certificate or request an API key via email to log in for the first time. Your API key (XXX) is displayed once you have logged in, and can be used to upload lab3 from the console as follows. For part A:
$ cd ~/6.824
$ echo XXX > api.key
$ make lab3a
And for part B:
$ cd ~/6.824
$ echo XXX > api.key
$ make lab3b
You can check the submission website to confirm that your submission was successful.
You will receive full credit if your software passes the test_test.go tests when we run your software on our machines. We will use the timestamp of your last submission for the purpose of calculating late days.
================================================
FILE: lab/lab4 shared key value service.md
================================================
# 6.824 Lab 4: Sharded Key/Value Service
### Introduction
In this lab you'll build a **key/value storage system** that "shards," or partitions, the keys over a set of replica groups. A shard is a subset of the key/value pairs; for example, all the keys starting with "a" might be one shard, all the keys starting with "b" another, etc. The reason for sharding is performance. Each replica group handles puts and gets for just a few of the shards, and the groups operate in parallel; thus total system throughput (puts and gets per unit time) increases in proportion to the number of groups.
Your sharded key/value store will have two main components. First, a set of replica groups. Each replica group is responsible for a subset of the shards. A replica group consists of a handful of servers that use Paxos to replicate the group's shards. The second component is the "shard master". The shard master decides which replica group should serve each shard; this information is called the configuration. The configuration changes over time. Clients consult the shard master in order to find the replica group for a key, and replica groups consult the master in order to find out what shards to serve. There is a single shard master for the whole system, implemented as a fault-tolerant service using Paxos.
A sharded storage system must be able to shift shards among replica groups. One reason is that some groups may become more loaded than others, so that shards need to be moved to balance the load. Another reason is that replica groups may join and leave the system: new replica groups may be added to increase capacity, or existing replica groups may be taken offline for repair or retirement.
The main challenge in this lab will be handling reconfiguration in the replica groups. Within a single replica group, all group members must agree on when a reconfiguration occurs relative to client Put/Append/Get requests. For example, a Put may arrive at about the same time as a reconfiguration that causes the replica group to stop being responsible for the shard holding the Put's key. All replicas in the group must agree on whether the Put occurred before or after the reconfiguration. If before, the Put should take effect and the new owner of the shard will see its effect; if after, the Put won't take effect and the client must re-try at the new owner. The recommended approach is to have each replica group use Paxos to log not just the sequence of Puts, Appends, and Gets but also the sequence of reconfigurations.
Reconfiguration also requires interaction among the replica groups. For example, in configuration 10 group G1 may be responsible for shard S1. In configuration 11, group G2 may be responsible for shard S1. During the reconfiguration from 10 to 11, G1 must send the contents of shard S1 (the key/value pairs) to G2.
You will need to ensure that at most one replica group is serving requests for each shard. Luckily it is reasonable to assume that each replica group is always available, because each group uses Paxos for replication and thus can tolerate some network and server failures. As a result, your design can rely on one group to actively hand off responsibility to another group during reconfiguration. This is simpler than the situation in primary/backup replication (Lab 2), where the old primary is often not reachable and may still think it is primary.
Only RPC may be used for interaction between clients and servers, between different servers, and between different clients. For example, different instances of your server are not allowed to share Go variables or files.
This lab's general architecture (a configuration service and a set of replica groups) is patterned at a high level on a number of systems: Flat Datacenter Storage, BigTable, Spanner, FAWN, Apache HBase, Rosebud, and many others. These systems differ in many details from this lab, though, and are also typically more sophisticated and capable. For example, your lab lacks persistent storage for key/value pairs and for the Paxos log; it sends more messages than required per Paxos agreement; it cannot evolve the sets of peers in each Paxos group; its data and query models are very simple; and handoff of shards is slow and doesn't allow concurrent client access.
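The partitioning rule itself can be very simple; a minimal sketch (the NShards constant and first-byte rule are assumptions for illustration, in the style of the lab's skeleton code) hashes each key's first byte to a shard number:

```go
package main

import "fmt"

const NShards = 10 // assumed shard count for illustration

// key2shard maps a key to a shard by its first byte, one common
// simple partitioning rule; keys with the same first byte land
// in the same shard.
func key2shard(key string) int {
	shard := 0
	if len(key) > 0 {
		shard = int(key[0])
	}
	return shard % NShards
}

func main() {
	fmt.Println(key2shard("apple"), key2shard("banana")) // 7 8
}
```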
### Part A: The Shard Master
```shell
$ cd ~/6.824/src/shardmaster
$ go test
Test: Basic leave/join ...
... Passed
Test: Historical queries ...
... Passed
Test: Move ...
... Passed
Test: Concurrent leave/join ...
... Passed
Test: Minimal transfers after joins ...
... Passed
Test: Minimal transfers after leaves ...
... Passed
Test: Concurrent leave/join, failure ...
... Passed
PASS
ok shardmaster 11.200s
$
```
================================================
FILE: lecture/l01 mapreduce/l01.txt
================================================
6.824 2018 Lecture 1: Introduction
6.824: Distributed Systems Engineering
What is a distributed system?
multiple cooperating computers
storage for big web sites, MapReduce, peer-to-peer sharing, &c
lots of critical infrastructure is distributed
Why distributed?
to organize physically separate entities
to achieve security via isolation
to tolerate faults via replication
to scale up throughput via parallel CPUs/mem/disk/net
But:
complex: many concurrent parts
must cope with partial failure
tricky to realize performance potential
Why take this course?
interesting -- hard problems, powerful solutions
used by real systems -- driven by the rise of big Web sites
active research area -- lots of progress + big unsolved problems
hands-on -- you'll build serious systems in the labs
COURSE STRUCTURE
http://pdos.csail.mit.edu/6.824
Course staff:
Malte Schwarzkopf, lecturer
Robert Morris, lecturer
Deepti Raghavan, TA
Edward Park, TA
Erik Nguyen, TA
Anish Athalye, TA
Course components:
lectures
readings
two exams
labs
final project (optional)
TA office hours
piazza for announcements and lab help
Lectures:
big ideas, paper discussion, and labs
Readings:
research papers, some classic, some new
the papers illustrate key ideas and important details
many lectures focus on the papers
please read papers before class!
each paper has a short question for you to answer
and you must send us a question you have about the paper
submit question&answer by midnight the night before
Exams:
Mid-term exam in class
Final exam during finals week
Lab goals:
deeper understanding of some important techniques
experience with distributed programming
first lab is due a week from Friday
one per week after that for a while
Lab 1: MapReduce
Lab 2: replication for fault-tolerance using Raft
Lab 3: fault-tolerant key/value store
Lab 4: sharded key/value store
Optional final project at the end, in groups of 2 or 3.
The final project substitutes for Lab 4.
You think of a project and clear it with us.
Code, short write-up, short demo on last day.
Lab grades depend on how many test cases you pass
we give you the tests, so you know whether you'll do well
careful: if it often passes, but sometimes fails,
chances are it will fail when we run it
Debugging the labs can be time-consuming
start early
come to TA office hours
ask questions on Piazza
MAIN TOPICS
This is a course about infrastructure, to be used by applications.
About abstractions that hide distribution from applications.
Three big kinds of abstraction:
Storage.
Communication.
Computation.
[diagram: users, application servers, storage servers]
A couple of topics come up repeatedly.
Topic: implementation
RPC, threads, concurrency control.
Topic: performance
The dream: scalable throughput.
Nx servers -> Nx total throughput via parallel CPU, disk, net.
So handling more load only requires buying more computers.
Scaling gets harder as N grows:
Load imbalance, stragglers.
Non-parallelizable code: initialization, interaction.
Bottlenecks from shared resources, e.g. network.
Note that some performance problems aren't easily attacked by scaling
e.g. decreasing response time for a single user request
might require programmer effort rather than just more computers
Topic: fault tolerance
1000s of servers, complex net -> always something broken
We'd like to hide these failures from the application.
We often want:
Availability -- app can make progress despite failures
Durability -- app will come back to life when failures are repaired
Big idea: replicated servers.
If one server crashes, client can proceed using the other(s).
Topic: consistency
General-purpose infrastructure needs well-defined behavior.
E.g. "Get(k) yields the value from the most recent Put(k,v)."
Achieving good behavior is hard!
"Replica" servers are hard to keep identical.
Clients may crash midway through multi-step update.
Servers crash at awkward moments, e.g. after executing but before replying.
Network may make live servers look dead; risk of "split brain".
Consistency and performance are enemies.
Consistency requires communication, e.g. to get latest Put().
"Strong consistency" often leads to slow systems.
High performance often imposes "weak consistency" on applications.
People have pursued many design points in this spectrum.
CASE STUDY: MapReduce
Let's talk about MapReduce (MR) as a case study
MR is a good illustration of 6.824's main topics
and is the focus of Lab 1
MapReduce overview
context: multi-hour computations on multi-terabyte data-sets
e.g. analysis of graph structure of crawled web pages
only practical with 1000s of computers
often not developed by distributed systems experts
distribution can be very painful, e.g. coping with failure
overall goal: non-specialist programmers can easily split
data processing over many servers with reasonable efficiency.
programmer defines Map and Reduce functions
sequential code; often fairly simple
MR runs the functions on 1000s of machines with huge inputs
and hides details of distribution
Abstract view of MapReduce
input is divided into M files
[diagram: maps generate rows of K-V pairs, reduces consume columns]
Input1 -> Map -> a,1 b,1 c,1
Input2 -> Map -> b,1
Input3 -> Map -> a,1 c,1
| | |
| | -> Reduce -> c,2
| -----> Reduce -> b,2
---------> Reduce -> a,2
MR calls Map() for each input file, produces set of k2,v2
"intermediate" data
each Map() call is a "task"
MR gathers all intermediate v2's for a given k2,
and passes them to a Reduce call
final output is set of <k2,v3> pairs from Reduce()
stored in R output files
[diagram: MapReduce API --
map(k1, v1) -> list(k2, v2)
reduce(k2, list(v2)) -> list(k2, v3)]
Example: word count
input is thousands of text files
Map(k, v)
split v into words
for each word w
emit(w, "1")
Reduce(k, v)
emit(len(v))
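The word-count pseudocode above can be sketched in Go roughly as follows (the KeyValue type and function names are assumptions in the style of the lab code, not the exact lab API):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// KeyValue mirrors the intermediate k2,v2 pairs MapReduce shuffles.
type KeyValue struct {
	Key   string
	Value string
}

// mapF emits (word, "1") for each word in the input value.
func mapF(value string) []KeyValue {
	var kvs []KeyValue
	for _, w := range strings.Fields(value) {
		kvs = append(kvs, KeyValue{Key: w, Value: "1"})
	}
	return kvs
}

// reduceF counts the occurrences gathered for one key.
func reduceF(key string, values []string) string {
	return strconv.Itoa(len(values))
}

func main() {
	kvs := mapF("a b a c a")
	// shuffle: gather all intermediate values for each key
	groups := make(map[string][]string)
	for _, kv := range kvs {
		groups[kv.Key] = append(groups[kv.Key], kv.Value)
	}
	fmt.Println(reduceF("a", groups["a"])) // "a" appears 3 times
}
```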
MapReduce hides many painful details:
starting s/w on servers
tracking which tasks are done
data movement
recovering from failures
MapReduce scales well:
N computers gets you Nx throughput.
Assuming M and R are >= N (i.e., lots of input files and map output keys).
Map()s can run in parallel, since they don't interact.
Same for Reduce()s.
The only interaction is via the "shuffle" in between maps and reduces.
So you can get more throughput by buying more computers.
Rather than special-purpose efficient parallelizations of each application.
Computers are cheaper than programmers!
What will likely limit the performance?
We care since that's the thing to optimize.
CPU? memory? disk? network?
In 2004 authors were limited by "network cross-section bandwidth".
[diagram: servers, tree of network switches]
Note all data goes over network, during Map->Reduce shuffle.
Paper's root switch: 100 to 200 gigabits/second
1800 machines, so 55 megabits/second/machine.
Small, e.g. much less than disk (~50-100 MB/s at the time) or RAM speed.
So they cared about minimizing movement of data over the network.
(Datacenter networks are much faster today.)
More details (paper's Figure 1):
master: gives tasks to workers; remembers where intermediate output is
M Map tasks, R Reduce tasks
input stored in GFS, 3 copies of each Map input file
all computers run both GFS and MR workers
many more input tasks than workers
master gives a Map task to each worker
hands out new tasks as old ones finish
Map worker hashes intermediate keys into R partitions, on local disk
Q: What's a good data structure for implementing this?
no Reduce calls until all Maps are finished
master tells Reducers to fetch intermediate data partitions from Map workers
Reduce workers write final output to GFS (one file per Reduce task)
How does detailed design reduce effect of slow network?
Map input is read from GFS replica on local disk, not over network.
Intermediate data goes over network just once.
Map worker writes to local disk, not GFS.
Intermediate data partitioned into files holding many keys.
Q: Why not stream the records to the reducer (via TCP) as they are being
produced by the mappers?
How do they get good load balance?
Critical to scaling -- bad for N-1 servers to wait for 1 to finish.
But some tasks likely take longer than others.
[diagram: packing variable-length tasks into workers]
Solution: many more tasks than workers.
Master hands out new tasks to workers who finish previous tasks.
So no task is so big it dominates completion time (hopefully).
So faster servers do more work than slower ones, finish at about the same time.
What about fault tolerance?
I.e. what if a server crashes during a MR job?
Hiding failures is a huge part of ease of programming!
Q: Why not re-start the whole job from the beginning?
MR re-runs just the failed Map()s and Reduce()s.
MR requires them to be pure functions:
they don't keep state across calls,
they don't read or write files other than expected MR inputs/outputs,
there's no hidden communication among tasks.
So re-execution yields the same output.
The requirement for pure functions is a major limitation of
MR compared to other parallel programming schemes.
But it's critical to MR's simplicity.
Details of worker crash recovery:
* Map worker crashes:
master sees worker no longer responds to pings
crashed worker's intermediate Map output is lost
but is likely needed by every Reduce task!
master re-runs, spreads tasks over other GFS replicas of input.
some Reduce workers may already have read failed worker's intermediate data.
here we depend on functional and deterministic Map()!
master need not re-run Map if Reduces have fetched all intermediate data
though a Reduce crash would then force re-execution of the failed Map
* Reduce worker crashes.
finished tasks are OK -- stored in GFS, with replicas.
master re-starts worker's unfinished tasks on other workers.
* Reduce worker crashes in the middle of writing its output.
GFS has atomic rename that prevents output from being visible until complete.
so it's safe for the master to re-run the Reduce tasks somewhere else.
Other failures/problems:
* What if the master gives two workers the same Map() task?
perhaps the master incorrectly thinks one worker died.
it will tell Reduce workers about only one of them.
* What if the master gives two workers the same Reduce() task?
they will both try to write the same output file on GFS!
atomic GFS rename prevents mixing; one complete file will be visible.
* What if a single worker is very slow -- a "straggler"?
perhaps due to flaky hardware.
master starts a second copy of last few tasks.
* What if a worker computes incorrect output, due to broken h/w or s/w?
too bad! MR assumes "fail-stop" CPUs and software.
* What if the master crashes?
recover from check-point, or give up on job
For what applications *doesn't* MapReduce work well?
Not everything fits the map/shuffle/reduce pattern.
Small data, since overheads are high. E.g. not web site back-end.
Small updates to big data, e.g. add a few documents to a big index
Unpredictable reads (neither Map nor Reduce can choose input)
Multiple shuffles, e.g. page-rank (can use multiple MR but not very efficient)
More flexible systems allow these, but more complex model.
How might a real-world web company use MapReduce?
"CatBook", a new company running a social network for cats; needs to:
1) build a search index, so people can find other peoples' cats
2) analyze popularity of different cats, to decide advertising value
3) detect dogs and remove their profiles
Can use MapReduce for all these purposes!
- run large batch jobs over all profiles every night
1) build inverted index: map(profile text) -> (word, cat_id)
reduce(word, list(cat_id)) -> list(word, list(cat_id))
2) count profile visits: map(web logs) -> (cat_id, "1")
reduce(cat_id, list("1")) -> list(cat_id, count)
3) filter profiles: map(profile image) -> img analysis -> (cat_id, "dog!")
reduce(cat_id, list("dog!")) -> list(cat_id)
Conclusion
MapReduce single-handedly made big cluster computation popular.
- Not the most efficient or flexible.
+ Scales well.
+ Easy to program -- failures and data movement are hidden.
These were good trade-offs in practice.
We'll see some more advanced successors later in the course.
Have fun with the lab!
================================================
FILE: lecture/l02 PRC_threads_crawler_kv/PRC_Threads.md
================================================
# 6.824 2018 Lecture 2: Infrastructure: RPC and threads
Most commonly-asked question:
### Why Go?
6.824 used C++ for many years. C++ worked out well, but students **spent time tracking down pointer** and **alloc/free bugs**, and there's **no very satisfactory C++ RPC package**.
Go is a bit better than C++ for us:
- good support for concurrency (goroutines, channels, &c)
- good support for RPC
- garbage-collected (no use-after-free problems)
- type safe
- threads + GC is particularly attractive!
We like programming in Go: relatively simple and traditional
After the tutorial, use https://golang.org/doc/effective_go.html
Russ Cox will give a guest lecture March 8th.
### Why threads?
Threads are a **useful structuring tool**; Go calls them **goroutines**; everyone else calls them threads; they can be tricky
- They express concurrency, which shows up naturally in distributed systems I/O concurrency:
- While waiting for a response from another server, process next request
- Multicore: Threads run in parallel on several cores
Thread = "thread of execution"
- threads allow one program to (logically) execute many things at once
- the threads share memory
- each thread includes some per-thread state:
- program counter, registers, stack
How many threads in a program?
- Sometimes driven by **structure**
- e.g. one thread per client, one for background tasks
- Sometimes driven by desire for **multi-core parallelism**
- so one active thread per core
- the Go runtime automatically schedules runnable goroutines on available cores
- Sometimes driven by desire for **I/O concurrency**
the number is determined by latency and capacity
keep increasing until throughput stops growing
Go threads are pretty cheap
100s or 1000s are fine, but maybe not millions
Creating a thread is more expensive than a method call
Threading challenges:
sharing data
one thread reads data that another thread is changing?
e.g. two threads do count = count + 1
this is a "race" -- and is usually a bug
-> use Mutexes (or other synchronization)
-> or avoid sharing
coordination between threads
how to wait for all Map threads to finish?
-> use Go channels or WaitGroup
granularity of concurrency
coarse-grained -> simple, but little concurrency/parallelism
fine-grained -> more concurrency, more races and deadlocks
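The count = count + 1 race and the coordination problem above can both be shown in a short sketch: a Mutex makes the increment atomic, and a WaitGroup waits for all the threads (the function name is illustrative).

```go
package main

import (
	"fmt"
	"sync"
)

// parallelCount has n goroutines increment a shared counter.
// The Mutex makes each count = count + 1 atomic; the WaitGroup
// provides the coordination -- waiting for all goroutines to finish.
func parallelCount(n int) int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	count := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock() // without the lock, this read-modify-write is a race
			count = count + 1
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

func main() {
	fmt.Println(parallelCount(100)) // reliably 100 with the lock
}
```

Deleting the Lock()/Unlock() pair and running with `go run -race` demonstrates the race the notes describe.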
What is a crawler?
goal is to fetch all web pages, e.g. to feed to an indexer
web pages form a graph
multiple links to each page
graph has cycles
Crawler challenges
Arrange for I/O concurrency
Fetch many URLs at the same time
To increase URLs fetched per second
Since network latency is much more of a limit than network capacity
Fetch each URL only *once*
avoid wasting network bandwidth
be nice to remote servers
=> Need to remember which URLs visited
Know when finished
Crawler solutions [crawler.go link on schedule page]
Serial crawler:
the "fetched" map avoids repeats, breaks cycles
it's a single map, passed by reference to recursive calls
but: fetches only one page at a time
ConcurrentMutex crawler:
Creates a thread for each page fetch
Many concurrent fetches, higher fetch rate
The threads share the fetched map
Why the Mutex (== lock)?
Without the lock:
Two web pages contain links to the same URL
Two threads simultaneously fetch those two pages
T1 checks fetched[url], T2 checks fetched[url]
Both see that url hasn't been fetched
Both fetch, which is wrong
Simultaneous read and write (or write+write) is a "race"
And often indicates a bug
The bug may show up only for unlucky thread interleavings
What will happen if I comment out the Lock()/Unlock() calls?
go run crawler.go
go run -race crawler.go
The lock causes the check and update to be atomic
How does it decide it is done?
sync.WaitGroup
implicitly waits for children to finish recursive fetches
ConcurrentChannel crawler
a Go channel:
a channel is an object; there can be many of them
ch := make(chan int)
a channel lets one thread send an object to another thread
ch <- x
the sender waits until some goroutine receives
y := <- ch
for y := range ch
a receiver waits until some goroutine sends
so you can use a channel to both communicate and synchronize
several threads can send and receive on a channel
remember: sender blocks until the receiver receives!
may be dangerous to hold a lock while sending...
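A minimal sketch of the channel operations listed above, using one channel to both communicate and synchronize (the function names are illustrative):

```go
package main

import "fmt"

// sendRecv: the sender blocks on ch <- until the receiver runs <-ch,
// so the channel both delivers the value and synchronizes the threads.
func sendRecv() int {
	ch := make(chan int)
	go func() {
		ch <- 42 // blocks until the receive below happens
	}()
	return <-ch
}

// rangeSum shows the `for v := range ch` form: the loop keeps
// receiving until the sender closes the channel.
func rangeSum(n int) int {
	ch := make(chan int)
	go func() {
		for i := 0; i < n; i++ {
			ch <- i
		}
		close(ch) // without close, range would block forever
	}()
	sum := 0
	for v := range ch {
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(sendRecv())  // 42
	fmt.Println(rangeSum(3)) // 0+1+2 = 3
}
```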
ConcurrentChannel master()
master() creates a worker goroutine to fetch each page
worker() sends URLs on a channel
multiple workers send on the single channel
master() reads URLs from the channel
[diagram: master, channel, workers]
No need to lock the fetched map, because it isn't shared!
Is there any shared data?
The channel
The slices and strings sent on the channel
The arguments master() passes to worker()
When to use sharing and locks, versus channels?
Most problems can be solved in either style
What makes the most sense depends on how the programmer thinks
state -- sharing and locks
communication -- channels
waiting for events -- channels
Use Go's race detector:
https://golang.org/doc/articles/race_detector.html
go test -race
Remote Procedure Call (RPC)
a key piece of distributed system machinery; all the labs use RPC
goal: easy-to-program client/server communication
RPC message diagram:
Client Server
request--->
<---response
RPC tries to mimic local fn call:
Client:
z = fn(x, y)
Server:
fn(x, y) {
compute
return z
}
Rarely this simple in practice...
Software structure
client app handlers
stubs dispatcher
RPC lib RPC lib
net ------------ net
Go example: kv.go link on schedule page
A toy key/value storage server -- Put(key,value), Get(key)->value
Uses Go's RPC library
Common:
You have to declare Args and Reply struct for each RPC type
Client:
connect()'s Dial() creates a TCP connection to the server
Call() asks the RPC library to perform the call
you specify server function name, arguments, place to put reply
library marshalls args, sends request, waits, unmarshalls reply
return value from Call() indicates whether it got a reply
usually you'll also have a reply.Err indicating service-level failure
Server:
Go requires you to declare an object with methods as RPC handlers
You then register that object with the RPC library
You accept TCP connections, give them to RPC library
The RPC library
reads each request
creates a new goroutine for this request
unmarshalls request
calls the named method (dispatch)
marshalls reply
writes reply on TCP connection
The server's Get() and Put() handlers
Must lock, since RPC library creates per-request goroutines
read args; modify reply
A few details:
Binding: how does client know who to talk to?
For Go's RPC, server name/port is an argument to Dial
Big systems have some kind of name or configuration server
Marshalling: format data into packets
Go's RPC library can pass strings, arrays, objects, maps, &c
Go passes pointers by copying (server can't directly use client pointer)
Cannot pass channels or functions
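The client/server structure described above can be sketched with Go's standard net/rpc library (the KV type and its contents are illustrative, not the kv.go code from the schedule page):

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// Args and Reply structs are declared for each RPC type.
type Args struct{ Key string }
type Reply struct{ Value string }

// KV is the server object; its exported methods with the
// (args, *reply) error signature become RPC handlers.
type KV struct{ data map[string]string }

func (kv *KV) Get(args *Args, reply *Reply) error {
	reply.Value = kv.data[args.Key]
	return nil
}

// startServer registers the handler object with the RPC library,
// accepts TCP connections, and returns the address to Dial.
func startServer() string {
	srv := rpc.NewServer()
	srv.Register(&KV{data: map[string]string{"x": "10"}})
	lis, err := net.Listen("tcp", "127.0.0.1:0") // any free port
	if err != nil {
		panic(err)
	}
	go srv.Accept(lis)
	return lis.Addr().String()
}

// clientGet Dials the server and performs one Call; the library
// marshalls the args, waits, and unmarshalls the reply.
func clientGet(addr, key string) string {
	client, err := rpc.Dial("tcp", addr)
	if err != nil {
		panic(err)
	}
	defer client.Close()
	var reply Reply
	if err := client.Call("KV.Get", &Args{Key: key}, &reply); err != nil {
		panic(err)
	}
	return reply.Value
}

func main() {
	addr := startServer()
	fmt.Println(clientGet(addr, "x"))
}
```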
RPC problem: what to do about failures?
e.g. lost packet, broken network, slow server, crashed server
What does a failure look like to the client RPC library?
Client never sees a response from the server
Client does *not* know if the server saw the request!
Maybe server never saw the request
Maybe server executed, crashed just before sending reply
Maybe server executed, but network died just before delivering reply
[diagram of lost reply]
Simplest failure-handling scheme: "best effort"
Call() waits for response for a while
If none arrives, re-send the request
Do this a few times
Then give up and return an error
Q: is "best effort" easy for applications to cope with?
A particularly bad situation:
client executes
Put("k", 10);
Put("k", 20);
both succeed
what will Get("k") yield?
[diagram, timeout, re-send, original arrives late]
Q: is best effort ever OK?
read-only operations
operations that do nothing if repeated
e.g. DB checks if record has already been inserted
Better RPC behavior: "at most once"
idea: server RPC code detects duplicate requests
returns previous reply instead of re-running handler
Q: how to detect a duplicate request?
client includes unique ID (XID) with each request
uses same XID for re-send
server:
    if seen[xid]:
        r = old[xid]
    else:
        r = handler()
        old[xid] = r
        seen[xid] = true
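That server-side pseudocode, sketched in Go (the types and handler are illustrative; a real server would also discard old entries as discussed below):

```go
package main

import (
	"fmt"
	"sync"
)

// server keeps at-most-once state: the reply for each XID it has seen.
type server struct {
	mu  sync.Mutex
	old map[int64]string // xid -> previous reply
}

// handle runs handler() at most once per xid; a duplicate request
// gets the previously computed reply instead of re-running the handler.
func (s *server) handle(xid int64, handler func() string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if r, ok := s.old[xid]; ok {
		return r // duplicate: return old reply, don't re-execute
	}
	r := handler()
	s.old[xid] = r
	return r
}

func main() {
	s := &server{old: make(map[int64]string)}
	calls := 0
	h := func() string { calls++; return "done" }
	s.handle(99, h)
	s.handle(99, h) // client re-sends with the same XID
	fmt.Println(calls) // handler ran only once
}
```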
some at-most-once complexities
this will come up in lab 3
how to ensure XID is unique?
big random number?
combine unique client ID (ip address?) with sequence #?
server must eventually discard info about old RPCs
when is discard safe?
idea:
each client has a unique ID (perhaps a big random number)
per-client RPC sequence numbers
client includes "seen all replies <= X" with every RPC
much like TCP sequence #s and acks
or only allow client one outstanding RPC at a time
arrival of seq+1 allows server to discard all <= seq
how to handle dup req while original is still executing?
server doesn't know reply yet
idea: "pending" flag per executing RPC; wait or ignore
What if an at-most-once server crashes and re-starts?
if at-most-once duplicate info in memory, server will forget
and accept duplicate requests after re-start
maybe it should write the duplicate info to disk
maybe replica server should also replicate duplicate info
Go RPC is a simple form of "at-most-once"
open TCP connection
write request to TCP connection
Go RPC never re-sends a request
So server won't see duplicate requests
Go RPC code returns an error if it doesn't get a reply
perhaps after a timeout (from TCP)
perhaps server didn't see request
perhaps server processed request but server/net failed before reply came back
What about "exactly once"?
unbounded retries plus duplicate detection plus fault-tolerant service
Lab 3
================================================
FILE: lecture/l02 PRC_threads_crawler_kv/crawler.go
================================================
package main
import (
"fmt"
"sync"
)
//
// Several solutions to the crawler exercise from the Go tutorial
// https://tour.golang.org/concurrency/10
//
//
// Serial crawler
//
func Serial(url string, fetcher Fetcher, fetched map[string]bool) {
if fetched[url] {
return
}
fetched[url] = true
urls, err := fetcher.Fetch(url)
if err != nil {
return
}
for _, u := range urls {
Serial(u, fetcher, fetched)
}
return
}
//
// Concurrent crawler with shared state and Mutex
//
type fetchState struct {
mu sync.Mutex
fetched map[string]bool
}
func ConcurrentMutex(url string, fetcher Fetcher, f *fetchState) {
f.mu.Lock()
if f.fetched[url] {
f.mu.Unlock()
return
}
f.fetched[url] = true
f.mu.Unlock()
urls, err := fetcher.Fetch(url)
if err != nil {
return
}
var done sync.WaitGroup
for _, u := range urls {
done.Add(1)
go func(u string) {
defer done.Done()
ConcurrentMutex(u, fetcher, f)
}(u)
}
done.Wait()
return
}
func makeState() *fetchState {
f := &fetchState{}
f.fetched = make(map[string]bool)
return f
}
//
// Concurrent crawler with channels
//
func worker(url string, ch chan []string, fetcher Fetcher) {
urls, err := fetcher.Fetch(url)
if err != nil {
ch <- []string{}
} else {
ch <- urls
}
}
func master(ch chan []string, fetcher Fetcher) {
n := 1
fetched := make(map[string]bool)
for urls := range ch {
for _, u := range urls {
if !fetched[u] {
fetched[u] = true
n += 1
go worker(u, ch, fetcher)
}
}
n -= 1
if n == 0 {
break
}
}
}
func ConcurrentChannel(url string, fetcher Fetcher) {
ch := make(chan []string)
go func() {
ch <- []string{url}
}()
master(ch, fetcher)
}
//
// main
//
func main() {
fmt.Printf("=== Serial ===\n")
Serial("http://golang.org/", fetcher, make(map[string]bool))
fmt.Printf("=== ConcurrentMutex ===\n")
ConcurrentMutex("http://golang.org/", fetcher, makeState())
fmt.Printf("=== ConcurrentChannel ===\n")
ConcurrentChannel("http://golang.org/", fetcher)
}
//
// Fetcher
//
type Fetcher interface {
// Fetch returns a slice of URLs found on the page.
Fetch(url string) (urls []string, err error)
}
// fakeFetcher is a Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult
type fakeResult struct {
body string
urls []string
}
func (f fakeFetcher) Fetch(url string) ([]string, error) {
if res, ok := f[url]; ok {
fmt.Printf("found: %s\n", url)
return res.urls, nil
}
fmt.Printf("missing: %s\n", url)
return nil, fmt.Errorf("not found: %s", url)
}
// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
"http://golang.org/": &fakeResult{
"The Go Programming Language",
[]string{
"http://golang.org/pkg/",
"http://golang.org/cmd/",
},
},
"http://golang.org/pkg/": &fakeResult{
"Packages",
[]string{
"http://golang.org/",
"http://golang.org/cmd/",
"http://golang.org/pkg/fmt/",
"http://golang.org/pkg/os/",
},
},
"http://golang.org/pkg/fmt/": &fakeResult{
"Package fmt",
[]string{
"http://golang.org/",
"http://golang.org/pkg/",
},
},
"http://golang.org/pkg/os/": &fakeResult{
"Package os",
[]string{
"http://golang.org/",
"http://golang.org/pkg/",
},
},
}
================================================
FILE: lecture/l02 PRC_threads_crawler_kv/kv.go
================================================
package main
import (
"fmt"
"log"
"net"
"net/rpc"
"sync"
)
//
// RPC request/reply definitions
//
const (
OK = "OK"
ErrNoKey = "ErrNoKey"
)
type Err string
type PutArgs struct {
Key string
Value string
}
type PutReply struct {
Err Err
}
type GetArgs struct {
Key string
}
type GetReply struct {
Err Err
Value string
}
//
// Client
//
func connect() *rpc.Client {
client, err := rpc.Dial("tcp", ":1234")
if err != nil {
log.Fatal("dialing:", err)
}
return client
}
func get(key string) string {
client := connect()
args := GetArgs{key}
reply := GetReply{}
err := client.Call("KV.Get", &args, &reply)
if err != nil {
log.Fatal("error:", err)
}
client.Close()
return reply.Value
}
func put(key string, val string) {
client := connect()
args := PutArgs{key, val}
reply := PutReply{}
err := client.Call("KV.Put", &args, &reply)
if err != nil {
log.Fatal("error:", err)
}
client.Close()
}
//
// Server
//
type KV struct {
mu sync.Mutex
data map[string]string
}
func server() {
kv := new(KV)
kv.data = map[string]string{}
rpcs := rpc.NewServer()
rpcs.Register(kv)
l, e := net.Listen("tcp", ":1234")
if e != nil {
log.Fatal("listen error:", e)
}
go func() {
for {
conn, err := l.Accept()
if err == nil {
go rpcs.ServeConn(conn)
} else {
break
}
}
l.Close()
}()
}
func (kv *KV) Get(args *GetArgs, reply *GetReply) error {
kv.mu.Lock()
defer kv.mu.Unlock()
val, ok := kv.data[args.Key]
if ok {
reply.Err = OK
reply.Value = val
} else {
reply.Err = ErrNoKey
reply.Value = ""
}
return nil
}
func (kv *KV) Put(args *PutArgs, reply *PutReply) error {
kv.mu.Lock()
defer kv.mu.Unlock()
kv.data[args.Key] = args.Value
reply.Err = OK
return nil
}
//
// main
//
func main() {
server()
put("subject", "6.824")
fmt.Printf("Put(subject, 6.824) done\n")
fmt.Printf("get(subject) -> %s\n", get("subject"))
}
================================================
FILE: lecture/l03 GFS/GFS.md
================================================
# 6.824 2018 Lecture 3: GFS
[The Google File System - Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/035fc972c796d33122033a0614bc94cff1527999.pdf)
### Why are we reading this paper?
- the file system used for map/reduce
- main themes of 6.824 show up in this paper:
- trading consistency for simplicity and performance
- motivation for subsequent designs
- good systems paper -- details from apps all the way to network
- performance, fault-tolerance, consistency
- influential
- many other systems use GFS (e.g., Bigtable, Spanner @ Google)
- HDFS (Hadoop Distributed File System) based on GFS
### What is consistency?
- A correctness condition
- Important but difficult to achieve when data is replicated
- especially when application access it concurrently
- [diagram: simple example, single machine]
- if an application writes, what will a later read observe?
- what if the read is from a different application?
- but with replication, each write must also happen on other machines
- [diagram: two more machines, reads and writes go across]
- Clearly we have a problem here.
##### Weak consistency
read() may return stale data --- not the result of the most recent write
##### Strong consistency
read() always returns the data from the most recent write()
##### General tension between these:
- strong consistency is easy for application writers
- strong consistency is bad for performance
- weak consistency has good performance and is easy to scale to many servers
- weak consistency is complex to reason about
Many trade-offs give rise to different correctness conditions
These are called "consistency models"
First peek today; will show up in almost every paper we read this term
##### "Ideal" consistency model
Let's go back to the single-machine case; Would be nice if a replicated FS behaved like a non-replicated file system
[diagram: many clients on the same machine accessing files on single disk]
If one application writes, later reads will observe that write
What if two applications concurrently write to the same file?
Q: what happens on a single machine?
In file systems often undefined --- file may have some mixed content
What if two applications concurrently write to the same directory?
Q: what happens on a single machine?
One goes first, the other goes second (use locking)
##### Challenges to achieving ideal consistency
- Concurrency -- as we just saw; plus there are many disks in reality
- Machine failures -- any operation can fail to complete
- Network partitions -- may not be able to reach every machine/disk
- Why are these challenges difficult to overcome?
- Requires communication between clients and servers
- May cost performance
- Protocols can become complex --- see next week
- Difficult to implement system correctly
- Many systems in 6.824 don't provide ideal
- GFS is one example
##### GFS goals:
With so many machines, failures are common
must tolerate
assume a machine fails once per year
w/ 1000 machines, ~3 will fail per day.
High-performance: many concurrent readers and writers
Map/Reduce jobs read and store final result in GFS
Note: *not* the temporary, intermediate files
Use network efficiently: save bandwidth
These challenges are difficult to reconcile with "ideal" consistency
##### High-level design / Reads
[Figure 1 diagram, master + chunkservers]
Master stores directories, files, names, open/read/write
But not POSIX
100s of Linux chunk servers with disks
store 64MB chunks (an ordinary Linux file for each chunk)
each chunk replicated on three servers
Q: Besides availability of data, what does 3x replication give us?
load balancing for reads to hot files
affinity
Q: why not just store one copy of each file on a RAID'd disk?
RAID isn't commodity
Want fault-tolerance for whole machine; not just storage device
Q: why are the chunks so big?
amortizes overheads, reduces state size in the master
GFS master server knows directory hierarchy
for each directory, what files are in it
for file, knows chunk servers for each 64 MB
master keeps state in memory
64 bytes of metadata per each chunk
master has private recoverable database for metadata
operation log flushed to disk
occasional asynchronous compression of the log into a checkpoint
N.B.: != the application checkpointing in §2.7.2
master can recover quickly from power failure
shadow masters that lag a little behind master
can be promoted to master
Client read:
send file name and chunk index to master
master replies with set of servers that have that chunk
response includes version # of chunk
clients cache that information
ask nearest chunk server
checks version #
if version # is wrong, re-contact master
##### Writes
[Figure 2-style diagram with file offset sequence]
Random client write to existing file
client asks master for chunk locations + primary
master responds with chunk servers, version #, and who is primary
primary has (or gets) 60s lease
client computes chain of replicas based on network topology
client sends data to first replica, which forwards to others
pipelines network use, distributes load
replicas ack data receipt
client tells primary to write
primary assigns a sequence number and writes
then tells other replicas to write
once all done, ack to client
what if there's another concurrent client writing to the same place?
client 2 gets sequenced after client 1, overwrites data
now client 2 writes again, this time gets sequenced first (C1 may be slow)
writes, but then client 1 comes and overwrites
=> all replicas have same data (= consistent), but mix parts from C1/C2
(= NOT defined)
Client append (not record append)
same deal, but may put parts from C1 and C2 in any order
consistent, but not defined
or, if just one client writes, no problem -- both consistent and defined
##### Record append
Client record append
client asks master for chunk locations
client pushes data to replicas, but specifies no offset
client contacts primary when data is on all chunk servers
primary assigns sequence number
primary checks if append fits into chunk
if not, pad until chunk boundary
primary picks offset for append
primary applies change locally
primary forwards request to replicas
let's say R3 fails mid-way through applying the write
primary detects error, tells client to try again
client retries after contacting master
master has perhaps brought up R4 in the meantime (or R3 came back)
one replica now has a gap in the byte sequence, so can't just append
pad to next available offset across all replicas
primary and secondaries apply writes
primary responds to client after receiving acks from all replicas
##### Housekeeping
Master can appoint new primary if primary doesn't renew its lease
Master re-replicates chunks if the number of replicas drops below some threshold
Master rebalances replicas
##### Failures
Chunk servers are easy to replace
failure may cause some clients to retry (& duplicate records)
Master: down -> GFS is unavailable
shadow master can serve read-only operations, which may return stale data
Q: Why not write operations?
split-brain syndrome (see next lecture)
##### Does GFS achieve "ideal" consistency?
Two cases: directories and files
Directories: yes, but...
Yes: strong consistency (only one copy)
But: master not always available & scalability limit
Files: not always
Mutations with atomic appends
record can be duplicated at two offsets
while other replicas may have a hole at one offset
Mutations without atomic append
data of several clients may be intermingled
if you care, use atomic append or a temporary file and atomically rename
An "unlucky" client can read stale data for a short period of time
A failed mutation leaves chunks inconsistent
The primary chunk server updated chunk
But then failed and the replicas are out of date
A client may read a not-up-to-date chunk
When client refreshes lease it will learn about new version #
The authors claim weak consistency is not a big problem for apps
Most file updates are append-only updates
Application can use UID in append records to detect duplicates
Application may just read less data (but not stale data)
Application can use temporary files and atomic rename
##### Performance (Figure 3)
huge aggregate throughput for read (3 copies, striping)
125 MB/sec in aggregate
Close to saturating network
writes to different files lower than possible maximum
authors blame their network stack
it causes delays in propagating chunks from one replica to next
concurrent appends to single file
limited by the server that stores last chunk
numbers and specifics have changed a lot in 15 years!
### Summary
case study of performance, fault-tolerance, consistency
specialized for MapReduce applications
what works well in GFS?
huge sequential reads and writes
appends
huge throughput (3 copies, striping)
fault tolerance of data (3 copies)
what works less well in GFS?
fault-tolerance of master
small files (master a bottleneck)
clients may see stale data
appends may be duplicated
### References
http://queue.acm.org/detail.cfm?id=1594206 (discussion of gfs evolution)
http://highscalability.com/blog/2010/9/11/googles-colossus-makes-search-real-time-by-dumping-mapreduce.html
================================================
FILE: lecture/l04 more_primary_backup/FDS.md
================================================
# 6.824 2014 Lecture 4: FDS Case Study
[Flat Datacenter Storage
Nightingale, Elson, Fan, Hofmann, Howell, Suzue
OSDI 2012](https://www.usenix.org/system/files/conference/osdi12/osdi12-final-75.pdf)
### why are we looking at this paper?
Lab 2 wants to be like this when it grows up, though the details are all different
- fantastic performance -- world record cluster sort
- good systems paper -- details from apps all the way to network
### what is FDS?
a **cluster storage system**
- stores giant blobs -- 128-bit ID, multi-megabyte content
- clients and servers connected by network with high bisection bandwidth for big-data processing (like MapReduce)
- cluster of 1000s of computers processing data in parallel
### high-level design -- a common pattern
- lots of clients
- lots of storage servers ("tractservers")
- partition the data
- master ("metadata server") controls partitioning
- replica groups for reliability
### why is this high-level design useful?
- 1000s of disks of space -> store giant blobs, or many big blobs
- 1000s of servers/disks/arms of parallel throughput
- can expand over time -- reconfiguration
- large pool of storage servers for instant replacement after failure
### motivating app: MapReduce-style sort
- a mapper reads its split 1/Mth of the input file (e.g., a tract)
map emits a <key, record> for each record in split
map partitions keys among R intermediate files (M*R intermediate files in total)
a reducer reads 1 of R intermediate files produced by each mapper
reads M intermediate files (of 1/R size)
sorts its input
produces 1/Rth of the final sorted output file (R blobs)
FDS sort
FDS sort does not store the intermediate files in FDS
a client is both a mapper and reducer
FDS sort is not locality-aware
in mapreduce, master schedules workers on machines that are close to the data
e.g., in same cluster
later versions of FDS sort use more fine-grained work assignment
e.g., mapper doesn't get 1/N of the input file but something smaller
deals better with stragglers
### The Abstract's main claims are about performance.
They set the world-record for disk-to-disk sorting in 2012 for MinuteSort
1,033 disks and 256 computers (136 tract servers, 120 clients)
1,401 Gbyte in 59.4s
### Q: does the abstract's 2 GByte/sec per client seem impressive?
- how fast can you read a file from Athena AFS? (abt 10 MB/sec)
- how fast can you read a typical hard drive?
- how fast can typical networks move data?
### Q: abstract claims recovery of a lost disk (92 GB) in 6.2 seconds
- that's 15 GByte / sec
- impressive?
- how is that even possible? that's 30x the speed of a disk!
- who might care about this metric?
### what should we want to know from the paper?
- API?
- layout?
- finding data?
- add a server?
- replication?
- failure handling?
- failure model?
- consistent reads/writes? (i.e. does a read see latest write?)
- config mgr failure handling?
- good performance?
- useful for apps?
* API
Figure 1
128-bit blob IDs
blobs have a length
only whole-tract read and write -- 8 MB
Q: why are 128-bit blob IDs a nice interface?
why not file names?
Q: why do 8 MB tracts make sense?
(Figure 3...)
Q: what kinds of client applications is the API aimed at?
and not aimed at?
* Layout: how do they spread data over the servers?
Section 2.2
break each blob into 8 MB tracts
TLT maintained by metadata server
has n entries
for blob b and tract t, i = (hash(b) + t) mod n
TLT[i] contains list of tractservers w/ copy of the tract
clients and servers all have copies of the latest TLT table
Example four-entry TLT with no replication:
0: S1
1: S2
2: S3
3: S4
suppose hash(27) = 2
then the tracts of blob 27 are laid out:
S1: 2 6
S2: 3 7
S3: 0 4 8
S4: 1 5 ...
FDS is "striping" blobs over servers at tract granularity
Q: why have tracts at all? why not store each blob on just one server?
what kinds of apps will benefit from striping?
what kinds of apps won't?
Q: how fast will a client be able to read a single tract?
Q: where does the abstract's single-client 2 GB number come from?
Q: why not the UNIX i-node approach?
store an array per blob, indexed by tract #, yielding tractserver
so you could make per-tract placement decisions
e.g. write new tract to most lightly loaded server
Q: why not hash(b + t)?
Q: how many TLT entries should there be?
how about n = number of tractservers?
why do they claim this works badly? Section 2.2
The system needs to choose server pairs (or triplets &c) to put in TLT entries
For replication
Section 3.3
Q: how about
0: S1 S2
1: S2 S1
2: S3 S4
3: S4 S3
...
Why is this a bad idea?
How long will repair take?
What are the risks if two servers fail?
Q: why is the paper's n^2 scheme better?
TLT with n^2 entries, with every server pair occurring once
0: S1 S2
1: S1 S3
2: S1 S4
3: S2 S1
4: S2 S3
5: S2 S4
...
How long will repair take?
What are the risks if two servers fail?
Q: why do they actually use a minimum replication level of 3?
same n^2 table as before, third server is randomly chosen
What effect on repair time?
What effect on two servers failing?
What if three disks fail?
* Adding a tractserver
To increase the amount of disk space / parallel throughput
Metadata server picks some random TLT entries
Substitutes new server for an existing server in those TLT entries
* How do they maintain n^2 plus one arrangement as servers leave/join?
Unclear.
Q: how long will adding a tractserver take?
Q: what about client writes while tracts are being transferred?
receiving tractserver may have copies from client(s) and from old srvr
how does it know which is newest?
Q: what if a client reads/writes but has an old tract table?
* Replication
A writing client sends a copy to each tractserver in the TLT.
A reading client asks one tractserver.
Q: why don't they send writes through a primary?
Q: what problems are they likely to have because of lack of primary?
why weren't these problems show-stoppers?
* What happens after a tractserver fails?
Metadata server stops getting heartbeat RPCs
Picks random replacement for each TLT entry failed server was in
New TLT gets a new version number
Replacement servers fetch copies
Example of the tracts each server holds:
S1: 0 4 8 ...
S2: 0 1 ...
S3: 4 3 ...
S4: 8 2 ...
Q: why not just pick one replacement server?
Q: how long will it take to copy all the tracts?
Q: if a tractserver's net breaks and is then repaired, might srvr serve old data?
Q: if a server crashes and reboots with disk intact, can contents be used?
e.g. if it only missed a few writes?
3.2.1's "partial failure recovery"
but won't it have already been replaced?
how to know what writes it missed?
Q: when is it better to use 3.2.1's partial failure recovery?
* What happens when the metadata server crashes?
Q: while metadata server is down, can the system proceed?
Q: is there a backup metadata server?
Q: how does rebooted metadata server get a copy of the TLT?
Q: does their scheme seem correct?
how does the metadata server know it has heard from all tractservers?
how does it know all tractservers were up to date?
* Random issues
Q: is the metadata server likely to be a bottleneck?
Q: why do they need the scrubber application mentioned in 2.3?
why don't they delete the tracts when the blob is deleted?
can a blob be written after it is deleted?
* Performance
Q: how do we know we're seeing "good" performance?
what's the best you can expect?
Q: limiting resource for 2 GB / second single-client?
Q: Figure 4a: why starts low? why goes up? why levels off?
why does it level off at that particular performance?
Q: Figure 4b shows random r/w as fast as sequential (Figure 4a).
is this what you'd expect?
Q: why are writes slower than reads with replication in Figure 4c?
Q: where does the 92 GB in 6.2 seconds come from?
Table 1, 4th column
that's 15 GB / second, both read and written
1000 disks, triple replicated, 128 servers?
what's the limiting resource? disk? cpu? net?
##### How big is each sort bucket?
i.e. is the sort of each bucket in-memory?
1400 GB total
128 compute servers
between 12 and 96 GB of RAM each
hmm, say 50 on average, so total RAM may be 6400 GB
thus the sort of each bucket is in memory; no intermediate passes are written to FDS
thus total time is just four transfers of 1400 GB
client limit: 128 * 2 GB/s = 256 GB / sec
disk limit: 1000 * 50 MB/s = 50 GB / sec
thus bottleneck is likely to be disk throughput
================================================
FILE: lecture/l06 fault tolerance raft/raft.md
================================================
# 6.824 2020 Lecture 6: Raft (1)
> this lecture
> today: Raft elections and log handling(Lab 2A, 2B)
> next: Raft persistence, client
> behavior, snapshots (Lab 2C, Lab 3)
a pattern in the fault-tolerant systems we've seen
* MR replicates computation but relies on a single master to organize
* GFS replicates data but relies on the master to pick primaries
* VMware FT replicates service but relies on test-and-set to pick primary
all rely on a single entity to make critical decisions
nice: decisions by a single entity avoid split brain
### how could split brain arise, and why is it damaging?
suppose we're replicating a test-and-set service
the client request sets the state to 1, server replies w/ previous state
only one client should get a reply with "0" !!!
it's a lock, only one requester should get it
[C1, C2, S1, S2]
suppose client C1 can contact replica S1, but not replica S2
should C1 proceed with just replica S1?
if S2 has really crashed, C1 *must* proceed without S2,
otherwise the service doesn't tolerate faults!
if S2 is up but network prevents C1 from contacting S2,
C1 should *not* proceed without S2,
since S2 might be alive and serving client C2
with this setup, we're faced with a nasty choice:
either no ability to tolerate faults, despite replication, or
the possibility of incorrect operation due to split brain
the problem: computers cannot distinguish "server crashed" vs "network broken"
the symptom is the same: no response to a query over the network
the bad situation is often called "network partition":
C1 can talk to S1, C2 can talk to S2,
but C1+S1 see no responses from C2+S2
this difficulty seemed insurmountable for a long time
seemed to require outside agent (a human) to decide when to cut over
or a single perfectly reliable server (FT's test-and-set server)
or a perfectly reliable network (so "no response" == "crashed")
BUT these are all single points of failure -- not desirable
can one do better?
The big insight for coping w/ partition: majority vote
require an odd number of servers, e.g. 3
agreement from a majority is required to do anything -- 2 out of 3
why does majority help avoid split brain?
at most one partition can have a majority
breaks the symmetry we saw with just two servers
note: majority is out of all servers, not just out of live ones
more generally 2f+1 can tolerate f failed servers
since the remaining f+1 is a majority of 2f+1
if more than f fail (or can't be contacted), no progress
often called "quorum" systems
a key property of majorities is that any two must intersect
e.g. successive majorities for Raft leader election must overlap
and the intersection can convey information about previous decisions
Two partition-tolerant replication schemes were invented around 1990,
Paxos and View-Stamped Replication
in the last 15 years this technology has seen a lot of real-world use
the Raft paper is a good introduction to modern techniques
### topic: Raft overview
state machine replication with Raft -- Lab 3 as example:
[diagram: clients, 3 replicas, k/v layer + state, raft layer + logs]
Raft is a library included in each replica
time diagram of one client command
[C, L, F1, F2]
client sends Put/Get "command" to k/v layer in leader
leader adds command to log
leader sends AppendEntries RPCs to followers
followers add command to log
leader waits for replies from a bare majority (including itself)
entry is "committed" if a majority put it in their logs
committed means won't be forgotten even if failures
majority -> will be seen by the next leader's vote requests
leader executes command, replies to client
leader "piggybacks" commit info in next AppendEntries
followers execute entry once leader says it's committed
### why the logs?
the service keeps the state machine state, e.g. key/value DB
why isn't that enough?
the log orders the commands
to help replicas agree on a single execution order
to help the leader ensure followers have identical logs
the log stores tentative commands until committed
the log stores commands in case leader must re-send to followers
the log stores commands persistently for replay after reboot
### are the servers' logs exact replicas of each other?
no: some replicas may lag
no: we'll see that they can temporarily have different entries
the good news:
they'll eventually converge to be identical
the commit mechanism ensures servers only execute stable entries
### lab 2 Raft interface
rf.Start(command) (index, term, isleader)
Lab 3 k/v server's Put()/Get() RPC handlers call Start()
Start() only makes sense on the leader
starts Raft agreement on a new log entry
add to leader's log
leader sends out AppendEntries RPCs
Start() returns w/o waiting for RPC replies
k/v layer's Put()/Get() must wait for commit, on applyCh
agreement might fail if server loses leadership before committing
then the command is likely lost, client must re-send
isleader: false if this server isn't the leader, client should try another
term: currentTerm, to help caller detect if leader is later demoted
index: log entry to watch to see if the command was committed
ApplyMsg, with Index and Command
each peer sends an ApplyMsg on applyCh for each committed entry
each peer's local service code executes, updates local replica state
leader sends reply to waiting client RPC
### there are two main parts to Raft's design:
electing a new leader
ensuring identical logs despite failures
### topic: leader election (Lab 2A)
### why a leader?
ensures all replicas execute the same commands, in the same order
(some designs, e.g. Paxos, don't have a leader)
### Raft numbers the sequence of leaders
new leader -> new term
a term has at most one leader; might have no leader
the numbering helps servers follow latest leader, not superseded leader
### when does a Raft peer start a leader election?
when it doesn't hear from current leader for an "election timeout"
increments local currentTerm, tries to collect votes
note: this can lead to un-needed elections; that's slow but safe
note: old leader may still be alive and think it is the leader
### how to ensure at most one leader in a term?
(Figure 2 RequestVote RPC and Rules for Servers)
leader must get "yes" votes from a majority of servers
each server can cast only one vote per term
if candidate, votes for itself
if not a candidate, votes for first that asks (within Figure 2 rules)
at most one server can get majority of votes for a given term
-> at most one leader even if network partition
-> election can succeed even if some servers have failed
### how does a server learn about newly elected leader?
new leader sees yes votes from majority
others see AppendEntries heart-beats with a higher term number
i.e. from the new leader
the heart-beats suppress any new election
### an election may not succeed for two reasons:
* less than a majority of servers are reachable
* simultaneous candidates split the vote, none gets majority
### what happens if an election doesn't succeed?
**another timeout** (no heartbeat), a new election (and new term)
higher term takes precedence, candidates for older terms quit
### how does Raft avoid split votes?
each server picks a random election timeout
[diagram of times at which servers' timeouts expire]
randomness breaks symmetry among the servers
one will choose lowest random delay
hopefully enough time to elect before next timeout expires
others will see new leader's AppendEntries heartbeats and
not become candidates
randomized delays are a common pattern in network protocols
### how to choose the election timeout?
* at least a few heartbeat intervals (in case network drops a heartbeat)
to avoid needless elections, which waste time
* random part long enough to let one candidate succeed before next starts
* short enough to react quickly to failure, avoid long pauses
* short enough to allow a few re-tries before tester gets upset
tester requires election to complete in 5 seconds or less
### what if old leader isn't aware a new leader is elected?
perhaps old leader didn't see election messages
perhaps old leader is in a minority network partition
new leader means a majority of servers have incremented currentTerm
so old leader (w/ old term) can't get majority for AppendEntries
so old leader won't commit or execute any new log entries
thus no split brain
but a minority may accept old server's AppendEntries
so logs may diverge at end of old term
================================================
FILE: lecture/l07 fault tolerance raft2/raft2.md
================================================
#### 6.824 2020 Lecture 7: Raft (2)
### topic: the Raft log (Lab 2B)
as long as the leader stays up:
clients only interact with the leader
clients can't see follower states or logs
things get interesting when changing leaders
e.g. after the old leader fails
how to change leaders without anomalies?
diverging replicas, missing operations, repeated operations, &c
### what do we want to ensure?
if any server executes a given command in a log entry,
then no server executes something else for that log entry
(Figure 3's State Machine Safety)
why? if the servers disagree on the operations, then a
change of leader might change the client-visible state,
which violates our goal of mimicking a single server.
example:
S1: put(k1,v1) | put(k1,v2)
S2: put(k1,v1) | put(k2,x)
can't allow both to execute their 2nd log entries!
### how can logs disagree after a crash?
a leader crashes before sending last AppendEntries to all
S1: 3
S2: 3 3
S3: 3 3
worse: logs might have different commands in same entry!
after a series of leader crashes, e.g.
     10 11 12 13  <- log entry #
S1:   3
S2:   3  3  4
S3:   3  3  5
### Raft forces agreement by having followers adopt new leader's log
example:
S3 is chosen as new leader for term 6
S3 sends an AppendEntries with entry 13
prevLogIndex=12
prevLogTerm=5
S2 replies false (AppendEntries step 2)
S3 decrements nextIndex[S2] to 12
S3 sends AppendEntries w/ entries 12+13, prevLogIndex=11, prevLogTerm=3
S2 deletes its entry 12 (AppendEntries step 3)
similar story for S1, but S3 has to back up one farther
### the result of roll-back:
each live follower deletes tail of log that differs from leader
then each live follower accepts leader's entries after that point
now followers' logs are identical to leader's log
### Q: why was it OK to forget about S2's index=12 term=4 entry?
could new leader roll back *committed* entries from end of previous term?
i.e. could a committed entry be missing from the new leader's log?
this would be a disaster -- old leader might have already said "yes" to a client
so: Raft needs to ensure elected leader has all committed log entries
### why not elect the server with the longest log as leader?
example:
S1: 5 6 7
S2: 5 8
S3: 5 8
first, could this scenario happen? how?
S1 leader in term 6; crash+reboot; leader in term 7; crash and stay down
both times it crashed after only appending to its own log
Q: after S1 crashes in term 7, why won't S2/S3 choose 6 as next term?
next term will be 8, since at least one of S2/S3 learned of 7 while voting
S2 leader in term 8, only S2+S3 alive, then crash
all peers reboot
who should be next leader?
S1 has longest log, but entry 8 could have committed !!!
so new leader can only be one of S2 or S3
i.e. the rule cannot be simply "longest log"
end of 5.4.1 explains the "election restriction"
RequestVote handler only votes for candidate who is "at least as up to date":
candidate has higher term in last log entry, or
candidate has same last term and same length or longer log
so:
S2 and S3 won't vote for S1
S2 and S3 will vote for each other
so only S2 or S3 can be leader, will force S1 to discard 6,7
ok since 6,7 not on majority -> not committed -> reply never sent to clients
-> clients will resend the discarded commands
the point:
"at least as up to date" rule ensures new leader's log contains
all potentially committed entries
so new leader won't roll back any committed operation
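The "at least as up to date" rule can be sketched as a small comparison function (a Go sketch; the function name and signature are my own, not from the paper):

```go
package main

import "fmt"

// upToDate reports whether a candidate's log is "at least as up to
// date" as the voter's (Section 5.4.1): a higher term in the last
// entry wins; with equal last terms, the longer (or equal) log wins.
func upToDate(candLastTerm, candLastIndex, myLastTerm, myLastIndex int) bool {
	if candLastTerm != myLastTerm {
		return candLastTerm > myLastTerm
	}
	return candLastIndex >= myLastIndex
}

func main() {
	// the example above: S1 ends with term 7, S2/S3 end with term 8
	fmt.Println(upToDate(7, 3, 8, 2)) // S2 voting on S1: false
	fmt.Println(upToDate(8, 2, 8, 2)) // S2 voting on S3: true
}
```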
The Question (from last lecture)
figure 7, top server is dead; which can be elected?
depending on who is elected leader in Figure 7, different entries
will end up committed or discarded
some will always remain committed: 111445566
they *could* have been committed + executed + replied to
some will certainly be discarded: f's 2 and 3; e's last 4,4
c's 6,6 and d's 7,7 may be discarded OR committed
### how to roll back quickly
the Figure 2 design backs up one entry per RPC -- slow!
lab tester may require faster roll-back
paper outlines a scheme towards end of Section 5.3
no details; here's my guess; better schemes are possible
```
    Case 1        Case 2        Case 3

S1: 4 5 5         4 4 4         4
S2: 4 6 6 6   or  4 6 6 6   or  4 6 6 6
```
S2 is leader for term 6; S1 comes back to life; S2 sends an AppendEntries for the last entry of term 6, with prevLogTerm=6
rejection from S1 includes:
- XTerm: term in the conflicting entry (if any)
- XIndex: index of first entry with that term (if any)
- XLen: log length
```
Case 1 (leader doesn't have XTerm):
nextIndex = XIndex
Case 2 (leader has XTerm):
nextIndex = leader's last entry for XTerm
Case 3 (follower's log is too short):
nextIndex = XLen
```
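One possible leader-side use of these fields (a guess consistent with the three cases above; the type and function names are invented for illustration):

```go
package main

import "fmt"

// Reject carries the follower's hints from a failed AppendEntries:
// XTerm is the term of the conflicting entry (-1 if the follower's
// log is too short), XIndex the first index with that term, XLen the
// follower's log length. Field names follow the guess in the notes.
type Reject struct {
	XTerm, XIndex, XLen int
}

// nextIndexAfterReject picks the leader's new nextIndex for the
// follower. lastIndexForTerm returns the index of the leader's last
// entry with the given term, or 0 if the leader has no such entry.
func nextIndexAfterReject(r Reject, lastIndexForTerm func(int) int) int {
	if r.XTerm == -1 {
		return r.XLen // Case 3: follower's log is too short
	}
	if last := lastIndexForTerm(r.XTerm); last > 0 {
		return last // Case 2: leader has XTerm
	}
	return r.XIndex // Case 1: leader doesn't have XTerm
}

func main() {
	// leader's (S2's) log terms are 4 6 6 6 at indices 1..4
	leader := func(term int) int {
		switch term {
		case 4:
			return 1
		case 6:
			return 4
		}
		return 0
	}
	// Case 1: follower is 4 5 5 -> XTerm=5, XIndex=2, so nextIndex=2
	fmt.Println(nextIndexAfterReject(Reject{XTerm: 5, XIndex: 2, XLen: 3}, leader))
	// Case 3: follower is just 4 -> XTerm=-1, XLen=1, so nextIndex=1
	fmt.Println(nextIndexAfterReject(Reject{XTerm: -1, XLen: 1}, leader))
}
```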
# topic: persistence (Lab 2C)
what would we like to happen after a server crashes?
Raft can continue with one missing server
but failed server must be repaired soon to avoid dipping below a majority
two strategies:
* replace with a fresh (empty) server
requires transfer of entire log (or snapshot) to new server (slow)
we *must* support this, in case failure is permanent
* or reboot crashed server, re-join with state intact, catch up
requires state that persists across crashes
we *must* support this, for simultaneous power failure
let's talk about the second strategy -- persistence
### if a server crashes and restarts, what must Raft remember?
Figure 2 lists "persistent state":
log[], currentTerm, votedFor
a Raft server can only re-join after restart if these are intact
thus it must save them to non-volatile storage
non-volatile = disk, SSD, battery-backed RAM, &c
save after each change -- many points in code
or before sending any RPC or RPC reply
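A minimal sketch of saving and restoring this state (the labs use labgob, whose interface matches the standard encoding/gob used here; commands are simplified to strings, and errors are ignored for brevity):

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// LogEntry's Command is a string here for simplicity; the labs use
// interface{} with labgob, which shares encoding/gob's interface.
type LogEntry struct {
	Term    int
	Command string
}

// persistState encodes Figure 2's persistent state into one byte
// slice to be written to non-volatile storage.
func persistState(currentTerm, votedFor int, log []LogEntry) []byte {
	w := new(bytes.Buffer)
	e := gob.NewEncoder(w)
	e.Encode(currentTerm)
	e.Encode(votedFor)
	e.Encode(log)
	return w.Bytes()
}

// readState decodes state saved by persistState, for use at reboot.
func readState(data []byte) (currentTerm, votedFor int, log []LogEntry) {
	d := gob.NewDecoder(bytes.NewBuffer(data))
	d.Decode(&currentTerm)
	d.Decode(&votedFor)
	d.Decode(&log)
	return
}

func main() {
	data := persistState(7, 2, []LogEntry{{Term: 3, Command: "put x 1"}})
	ct, vf, lg := readState(data)
	fmt.Println(ct, vf, lg) // state survives the round trip
}
```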
##### why log[]?
if a server was in leader's majority for committing an entry,
must remember entry despite reboot, so any future leader is
guaranteed to see the committed log entry
##### why votedFor?
to prevent a server from voting for one candidate, then rebooting
and voting for a different candidate in the same (or older!) term
could lead to two leaders for the same term
##### why currentTerm?
to ensure terms only increase, so each term has at most one leader
to detect RPCs from stale leaders and candidates
some Raft state is volatile
commitIndex, lastApplied, next/matchIndex[]
why is it OK not to save these?
persistence is often the bottleneck for performance
a hard disk write takes 10 ms, SSD write takes 0.1 ms
so persistence limits us to 100 to 10,000 ops/second
(the other potential bottleneck is RPC, which takes << 1 ms on a LAN)
lots of tricks to cope with slowness of persistence:
batch many new log entries per disk write
persist to battery-backed RAM, not disk
##### how does the service (e.g. k/v server) recover its state after a crash+reboot?
easy approach: start with empty state, re-play Raft's entire persisted log
lastApplied is volatile and starts at zero, so you may need no extra code!
this is what Figure 2 does
but re-play will be too slow for a long-lived system
faster: use Raft snapshot and replay just the tail of the log
# topic: log compaction and Snapshots (Lab 3B)
problem:
log will get to be huge -- much larger than state-machine state!
will take a long time to re-play on reboot or send to a new server
luckily:
a server doesn't need *both* the complete log *and* the service state
the executed part of the log is captured in the state
clients only see the state, not the log
service state usually much smaller, so let's keep just that
what entries *can't* a server discard?
un-executed entries -- not yet reflected in the state
un-committed entries -- might be part of leader's majority
solution: service periodically creates persistent "snapshot"
[diagram: service state, snapshot on disk, raft log (same in mem and disk)]
copy of service state as of execution of a specific log entry
e.g. k/v table
service writes snapshot to persistent storage (disk)
snapshot includes index of last included log entry
service tells Raft it is snapshotted through some log index
Raft discards log before that index
a server can create a snapshot and discard prefix of log at any time
e.g. when log grows too long
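Discarding the log prefix might look like this (a Go sketch; the offset-based representation is one common choice, not the only one, and entries are reduced to bare terms):

```go
package main

import "fmt"

// Log stores only the suffix of the Raft log that snapshots haven't
// covered; offset is the absolute index of entries[0].
type Log struct {
	offset  int
	entries []int // just terms, enough to illustrate the trimming
}

// discardBefore drops all entries up to and including
// lastIncludedIndex, the last entry captured by the snapshot.
func (l *Log) discardBefore(lastIncludedIndex int) {
	n := lastIncludedIndex + 1 - l.offset
	if n <= 0 {
		return // snapshot covers nothing we still hold
	}
	// copy into a fresh slice so the old array can be garbage collected
	rest := make([]int, len(l.entries)-n)
	copy(rest, l.entries[n:])
	l.entries = rest
	l.offset = lastIncludedIndex + 1
}

func main() {
	l := &Log{offset: 10, entries: []int{3, 3, 4, 5, 5}} // indices 10..14
	l.discardBefore(12)              // snapshot through index 12
	fmt.Println(l.offset, l.entries) // 13 [5 5]
}
```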
### what happens on crash+restart?
service reads snapshot from disk
Raft reads persisted log from disk
service tells Raft to set lastApplied to last included index
to avoid re-applying already-applied log entries
problem: what if follower's log ends before leader's log starts?
because follower was offline and leader discarded early part of log
nextIndex[i] will back up to start of leader's log
so leader can't repair that follower with AppendEntries RPCs
thus the InstallSnapshot RPC
philosophical note:
state is often equivalent to operation history
you can often choose which one to store or communicate
we'll see examples of this duality later in the course
practical notes:
Raft's snapshot scheme is reasonable if the state is small
for a big DB, e.g. if replicating gigabytes of data, not so good
slow to create and write to disk
perhaps service data should live on disk in a B-Tree
no need to explicitly snapshot, since on disk already
dealing with lagging replicas is hard, though
leader should save the log for a while
or remember which parts of state have been updated
### linearizability
we need a definition of "correct" for Lab 3 &c
how should clients expect Put and Get to behave?
often called a consistency contract
helps us reason about how to handle complex situations correctly
e.g. concurrency, replicas, failures, RPC retransmission,
leader changes, optimizations
we'll see many consistency definitions in 6.824
"linearizability" is the most common and intuitive definition
formalizes behavior expected of a single server ("strong" consistency)
linearizability definition:
an execution history is linearizable if
one can find a total order of all operations,
that matches real-time (for non-overlapping ops), and
in which each read sees the value from the
write preceding it in the order.
a history is a record of client operations, each with
arguments, return value, time of start, time completed
example history 1:
|-Wx1-| |-Wx2-|
  |---Rx2---|
      |-Rx1-|
"Wx1" means "write value 1 to record x"
"Rx1" means "a read of record x yielded value 1"
draw the constraint arrows:
the order obeys value constraints (W -> R)
the order obeys real-time constraints (Wx1 -> Wx2)
this order satisfies the constraints:
Wx1 Rx1 Wx2 Rx2
so the history is linearizable
note: the definition is based on external behavior
so we can apply it without having to know how service works
note: histories explicitly incorporate concurrency in the form of
overlapping operations (ops don't occur at a single point in time),
and are thus a good match for how distributed systems operate.
example history 2:
|-Wx1-| |-Wx2-|
  |--Rx2--|
            |-Rx1-|
draw the constraint arrows:
Wx1 before Wx2 (time)
Wx2 before Rx2 (value)
Rx2 before Rx1 (time)
Rx1 before Wx2 (value)
there's a cycle -- so it cannot be turned into a linear order. so this
history is not linearizable. (it would be linearizable w/o Rx2, even
though Rx1 overlaps with Wx2.)
example history 3:
|--Wx0--|     |--Wx1--|
          |--Wx2--|
            |-Rx2-| |-Rx1-|
order: Wx0 Wx2 Rx2 Wx1 Rx1
so the history is linearizable.
so:
the service can pick either order for concurrent writes.
e.g. Raft placing concurrent ops in the log.
example history 4:
    |--Wx0--|     |--Wx1--|
              |--Wx2--|
C1:                         |-Rx2-| |-Rx1-|
C2:                         |-Rx1-| |-Rx2-|
what are the constraints?
Wx2 then C1:Rx2 (value)
C1:Rx2 then Wx1 (value)
Wx1 then C2:Rx1 (value)
C2:Rx1 then Wx2 (value)
a cycle! so not linearizable.
so:
service can choose either order for concurrent writes
but all clients must see the writes in the same order
this is important when we have replicas or caches
they have to all agree on the order in which operations occur
example history 5:
|-Wx1-|
        |-Wx2-|
                |-Rx1-|
constraints:
Wx2 before Rx1 (time)
Rx1 before Wx2 (value)
(or: time constraints mean only possible order is Wx1 Wx2 Rx1)
there's a cycle; not linearizable
so:
reads must return fresh data: stale values aren't linearizable
even if the reader doesn't know about the write
the time rule requires reads to yield the latest data
linearizability forbids many situations:
split brain (two active leaders)
forgetting committed writes after a reboot
reading from lagging replicas
example history 6:
suppose clients re-send requests if they don't get a reply
in case it was the response that was lost:
leader remembers client requests it has already seen
if sees duplicate, replies with saved response from first execution
but this may yield a saved value from long ago -- a stale value!
what does linearizability say?
C1: |-Wx3-|   |-Wx4-|
C2:     |-Rx3-------------|
order: Wx3 Rx3 Wx4
so: returning the old saved value 3 is correct
You may find this page useful:
https://www.anishathalye.com/2017/06/04/testing-distributed-systems-for-linearizability/
*** duplicate RPC detection (Lab 3)
What should a client do if a Put or Get RPC times out?
i.e. Call() returns false
if server is dead, or request dropped: re-send
if server executed, but request lost: re-send is dangerous
problem:
these two cases look the same to the client (no reply)
if already executed, client still needs the result
idea: duplicate RPC detection
let's have the k/v service detect duplicate client requests
client picks an ID for each request, sends in RPC
same ID in re-sends of same RPC
k/v service maintains table indexed by ID
makes an entry for each RPC
record value after executing
if 2nd RPC arrives with the same ID, it's a duplicate
generate reply from the value in the table
design puzzles:
when (if ever) can we delete table entries?
if new leader takes over, how does it get the duplicate table?
if server crashes, how does it restore its table?
idea to keep the duplicate table small
one table entry per client, rather than one per RPC
each client has only one RPC outstanding at a time
each client numbers RPCs sequentially
when server receives client RPC #10,
it can forget about client's lower entries
since this means client won't ever re-send older RPCs
some details:
each client needs a unique client ID -- perhaps a 64-bit random number
client sends client ID and seq # in every RPC
repeats seq # if it re-sends
duplicate table in k/v service indexed by client ID
contains just seq #, and value if already executed
RPC handler first checks table, only Start()s if seq # > table entry
each log entry must include client ID, seq #
when operation appears on applyCh
update the seq # and value in the client's table entry
wake up the waiting RPC handler (if any)
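The table described above can be sketched as follows (a Go sketch; the struct names and the Get/Put encoding are illustrative, and the apply step would really be driven by applyCh):

```go
package main

import "fmt"

type dupEntry struct {
	Seq   int
	Value string
}

// KV sketches duplicate RPC detection: one table entry per client,
// holding the highest seq # executed and its reply value.
type KV struct {
	data map[string]string
	dup  map[int64]dupEntry // client ID -> last executed op
}

// apply executes a command unless the table says the client's seq #
// was already executed; either way it returns the reply value.
func (kv *KV) apply(client int64, seq int, op, key, val string) string {
	if e, ok := kv.dup[client]; ok && seq <= e.Seq {
		return e.Value // duplicate: reply from the table, don't re-execute
	}
	var out string
	switch op {
	case "Put":
		kv.data[key] = val
	case "Get":
		out = kv.data[key]
	}
	kv.dup[client] = dupEntry{Seq: seq, Value: out}
	return out
}

func main() {
	kv := &KV{data: map[string]string{}, dup: map[int64]dupEntry{}}
	kv.apply(1, 1, "Put", "x", "10")
	fmt.Println(kv.apply(2, 1, "Get", "x", "")) // 10
	kv.apply(1, 2, "Put", "x", "20")
	// client 2 re-sends its get: the saved (older) reply is returned
	fmt.Println(kv.apply(2, 1, "Get", "x", "")) // still 10
}
```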
what if a duplicate request arrives before the original executes?
could just call Start() (again)
it will probably appear twice in the log (same client ID, same seq #)
when cmd appears on applyCh, don't execute if table says already seen
how does a new leader get the duplicate table?
all replicas should update their duplicate tables as they execute
so the information is already there if they become leader
if server crashes how does it restore its table?
if no snapshots, replay of log will populate the table
if snapshots, snapshot must contain a copy of the table
but wait!
the k/v server is now returning old values from the duplicate table
what if the reply value in the table is stale?
is that OK?
example:
C1 C2
-- --
put(x,10)
first send of get(x), 10 reply dropped
put(x,20)
re-sends get(x), gets 10 from table, not 20
what does linearizability say?
C1: |-Wx10-|   |-Wx20-|
C2:     |-Rx10-------------|
order: Wx10 Rx10 Wx20
so: returning the remembered value 10 is correct
*** read-only operations (end of Section 8)
Q: does the Raft leader have to commit read-only operations in
the log before replying? e.g. Get(key)?
that is, could the leader respond immediately to a Get() using
the current content of its key/value table?
A: no, not with the scheme in Figure 2 or in the labs.
suppose S1 thinks it is the leader, and receives a Get(k).
it might have recently lost an election, but not realize,
due to lost network packets.
the new leader, say S2, might have processed Put()s for the key,
so that the value in S1's key/value table is stale.
serving stale data is not linearizable; it's split-brain.
so: Figure 2 requires Get()s to be committed into the log.
if the leader is able to commit a Get(), then (at that point
in the log) it is still the leader. in the case of S1
above, which unknowingly lost leadership, it won't be
able to get the majority of positive AppendEntries replies
required to commit the Get(), so it won't reply to the client.
but: many applications are read-heavy. committing Get()s
takes time. is there any way to avoid commit
for read-only operations? this is a huge consideration in
practical systems.
idea: leases
modify the Raft protocol as follows
define a lease period, e.g. 5 seconds
after each time the leader gets an AppendEntries majority,
it is entitled to respond to read-only requests for
a lease period without committing read-only requests
to the log, i.e. without sending AppendEntries.
a new leader cannot execute Put()s until previous lease period
has expired
so followers keep track of the last time they responded
to an AppendEntries, and tell the new leader (in the
RequestVote reply).
result: faster read-only operations, still linearizable.
note: for the Labs, you should commit Get()s into the log;
don't implement leases.
in practice, people are often (but not always) willing to live with stale
data in return for higher performance
================================================
FILE: lecture/l08 zookeeper/zookeeper.md
================================================
# 6.824 2020 Lecture 8: Zookeeper Case Study
Reading: "ZooKeeper: wait-free coordination for internet-scale systems", Patrick
Hunt, Mahadev Konar, Flavio P. Junqueira, Benjamin Reed. Proceedings of the 2010
USENIX Annual Technical Conference.
### What questions does this paper shed light on?
* Can we have coordination as a stand-alone general-purpose service?
What should the API look like?
How can other distributed applications use it?
* We paid lots of money for Nx replica servers.
Can we get Nx performance from them?
First, performance.
For now, view ZooKeeper as some service replicated with a Raft-like scheme.
Much like Lab 3.
[clients, leader/state/log, followers/state/log]
Does this replication arrangement get faster as we add more servers?
Assume a busy system, lots of active clients.
Writes probably get slower with more replicas!
Since leader must send each write to growing # of followers.
What about reads?
### Q: Can replicas serve read-only client requests from their local state?
Without involving the leader or other replicas?
Then total read capacity would be O(# servers), not O(1)!
Q: Would reads from followers be linearizable?
Would reads always yield fresh data?
No:
Replica may not be in majority, so may not have seen a completed write.
Replica may not yet have seen a commit for a completed write.
Replica may be entirely cut off from the leader (same as above).
Linearizability forbids stale reads!
### Q: What if a client reads from an up-to-date replica, then a lagging replica?
It may see data values go *backwards* in time! Also forbidden.
##### Raft and Lab 3 avoid these problems.
Clients have to send reads to the leader.
So Lab 3 reads are linearizable.
But no opportunity to divide the read load over the followers.
##### How does ZooKeeper skin this cat?
By changing the definition of correctness!
It allows reads to yield stale data.
But otherwise preserves order.
##### Ordering guarantees (Section 2.3)
* Linearizable writes
clients send writes to the leader
the leader chooses an order, numbered by "zxid"
sends to replicas, which all execute in zxid order
this is just like the labs
* FIFO client order
each client specifies an order for its operations (reads AND writes)
writes:
a client's writes appear in the overall write order in the order the client issued them
this is the business about the "ready" file in 2.3
reads:
each read executes at a particular point in the write order
a client's successive reads execute at non-decreasing points in the order
a client's read executes after all previous writes by that client
a server may block a client's read to wait for previous write, or sync()
##### Why does this make sense?
I.e. why OK for reads to return stale data?
why OK for client 1 to see new data, then client 2 sees older data?
At a high level:
not as painful for programmers as it may seem
very helpful for read performance!
Why is ZooKeeper useful despite loose consistency?
sync() causes subsequent client reads to see preceding writes.
useful when a read must see latest data
Writes are well-behaved, e.g. exclusive test-and-set operations
writes really do execute in order, on latest data.
Read order rules ensure "read your own writes".
Read order rules help reasoning.
e.g. if read sees "ready" file, subsequent reads see previous writes.
(Section 2.3)
Write order:          Read order:
delete("ready")
write f1
write f2
create("ready")
                      exists("ready")
                      read f1
                      read f2
even if client switches servers!
e.g. watch triggered by a write delivered before reads from subsequent writes.
Write order:          Read order:
                      exists("ready", watch=true)
                      read f1
delete("ready")
write f1
write f2
                      read f2
A few consequences:
Leader must preserve client write order across leader failure.
Replicas must enforce "a client's reads never go backwards in zxid order"
despite replica failure.
Client must track highest zxid it has read
to help ensure next read doesn't go backwards
even if sent to a different replica
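The client-side zxid rule can be sketched like this (a Go sketch; ZooKeeper's real client library tracks this internally, so the names here are invented):

```go
package main

import "fmt"

// Client sketches the rule "a client's reads never go backwards in
// zxid order": it remembers the highest zxid it has seen, and refuses
// a reply from a replica that is behind that point.
type Client struct {
	lastZxid int64
}

// okToUse reports whether a reply from a replica whose state is at
// replicaZxid may be returned; if not, the client should wait for the
// replica to catch up or try another one.
func (c *Client) okToUse(replicaZxid int64) bool {
	if replicaZxid < c.lastZxid {
		return false // lagging replica: values would go backwards in time
	}
	c.lastZxid = replicaZxid
	return true
}

func main() {
	c := &Client{}
	fmt.Println(c.okToUse(5)) // true: remembers zxid 5
	fmt.Println(c.okToUse(3)) // false: this replica lags behind zxid 5
	fmt.Println(c.okToUse(7)) // true: a caught-up replica
}
```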
Other performance tricks in ZooKeeper:
Clients can send async writes to leader (async = don't have to wait).
Leader batches up many requests to reduce net and disk-write overhead.
Assumes lots of active clients.
Fuzzy snapshots (and idempotent updates) so snapshot doesn't stop writes.
Is the resulting performance good?
Table 1
High read throughput -- and goes up with number of servers!
Lower write throughput -- and goes down with number of servers!
21,000 writes/second is pretty good!
Maybe limited by time to persist log to hard drives.
But still MUCH higher than 10 milliseconds per disk write -- batching.
The other big ZooKeeper topic: a general-purpose coordination service.
This is about the API and how it can help distributed s/w coordinate.
It is not clear what such an API should look like!
### What do we mean by coordination as a service?
Example: VMware-FT's test-and-set server
If one replica can't talk to the other, grabs t-a-s lock, becomes sole server
Must be exclusive to avoid two primaries (e.g. if network partition)
Must be fault-tolerant
Example: GFS (more speculative)
Perhaps agreement on which meta-data replica should be master
Perhaps recording list of chunk servers, which chunks, who is primary
Other examples: MapReduce, YMB, Crawler, etc.
Who is the master; lists of workers; division of labor; status of tasks
A general-purpose service would save much effort!
### Could we use a Lab 3 key/value store as a generic coordination service?
For example, to choose new GFS master if multiple replicas want to take over?
perhaps
Put("master", my IP address)
if Get("master") == my IP address:
act as master
problem: a racing Put() may execute after the Get()
2nd Put() overwrites first, so two masters, oops
Put() and Get() are not a good API for mutual exclusion!
problem: what to do if master fails?
perhaps master repeatedly Put()s a fresh timestamp?
lots of polling...
problem: clients need to know when master changes
periodic Get()s?
lots of polling...
### Zookeeper API overview (Figure 1)
the state: a file-system-like tree of znodes
file names, file content, directories, path names
typical use: configuration info in znodes
set of machines that participate in the application
which machine is the primary
each znode has a version number
types of znodes:
regular
ephemeral
sequential: name + seqno
### Operations on znodes (Section 2.2)
create(path, data, flags)
exclusive -- only first create indicates success
delete(path, version)
if znode.version = version, then delete
exists(path, watch)
watch=true means also send notification if path is later created/deleted
getData(path, watch)
setData(path, data, version)
if znode.version = version, then update
getChildren(path, watch)
sync()
sync then read ensures writes before sync are visible to same client's read
client could instead submit a write
ZooKeeper API well tuned to synchronization:
+ exclusive file creation; exactly one concurrent create returns success
+ getData()/setData(x, version) supports mini-transactions
+ sessions automate actions when clients fail (e.g. release lock on failure)
+ sequential files create order among multiple clients
+ watches -- avoid polling
Example: add one to a number stored in a ZooKeeper znode
what if the read returns stale data?
write will write the wrong value!
what if another client concurrently updates?
will one of the increments be lost?
while true:
x, v := getData("f")
if setData(x + 1, version=v):
break
this is a "mini-transaction"
effect is atomic read-modify-write
lots of variants, e.g. test-and-set for VMware-FT
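The same retry loop in Go, against a mock in-process znode (the mock stands in for ZooKeeper's getData/setData; in the real system the client issues RPCs and the server performs the version check, which is what makes the compare-and-swap atomic):

```go
package main

import "fmt"

// znode mocks a versioned ZooKeeper node, just enough for the loop.
type znode struct {
	data    int
	version int
}

func (z *znode) getData() (int, int) { return z.data, z.version }

// setData succeeds only if version matches the current version, i.e.
// no other client wrote between our read and our write.
func (z *znode) setData(v, version int) bool {
	if version != z.version {
		return false
	}
	z.data, z.version = v, z.version+1
	return true
}

// increment is the mini-transaction above: keep retrying the
// read-modify-write until no concurrent writer intervened.
func increment(z *znode) {
	for {
		x, ver := z.getData()
		if z.setData(x+1, ver) {
			return
		}
	}
}

func main() {
	z := &znode{}
	for i := 0; i < 3; i++ {
		increment(z)
	}
	fmt.Println(z.data, z.version) // 3 3
}
```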
Example: Simple Locks (Section 2.4)
acquire():
while true:
if create("lf", ephemeral=true), success
if exists("lf", watch=true)
wait for notification
release(): (voluntarily or session timeout)
delete("lf")
Q: what if lock released just as loser calls exists()?
### Example: Locks without Herd Effect
(look at pseudo-code in paper, Section 2.4, page 6)
1. create a "sequential" file
2. list files
3. if no lower-numbered, lock is acquired!
4. if exists(next-lower-numbered, watch=true)
5. wait for event...
6. goto 2
Q: could a lower-numbered file be created between steps 2 and 3?
Q: can watch fire before it is the client's turn?
A: yes
lock-10 <- current lock holder
lock-11 <- next one
lock-12 <- my request
if client that created lock-11 dies before it gets the lock, the
watch will fire but it isn't my turn yet.
### Using these locks
- Different from single-machine thread locks!
If lock holder fails, system automatically releases locks.
So locks are not really enforcing atomicity of other activities.
To make writes atomic, use "ready" trick or mini-transactions.
- Useful for master/leader election.
New leader must inspect state and clean up.
- Or soft locks, for performance but not correctness
e.g. only one worker does each Map or Reduce task (but OK if done twice)
e.g. a URL crawled by only one worker (but OK if done twice)
### ZooKeeper is a successful design.
see ZooKeeper's Wikipedia page for a list of projects that use it
Rarely eliminates all the complexity from distribution.
e.g. GFS master still needs to replicate file meta-data.
e.g. GFS primary has its own plan for replicating chunks.
But does bite off a bunch of common cases:
Master election.
Persistent master state (if state is small).
Who is the current master? (name service).
Worker registration.
Work queues.
### Topics not covered:
- persistence
- details of batching and pipelining for performance
- fuzzy snapshots
- idempotent operations
- duplicate client request detection
### References:
https://zookeeper.apache.org/doc/r3.4.8/api/org/apache/zookeeper/ZooKeeper.html
ZAB: http://dl.acm.org/citation.cfm?id=2056409
https://zookeeper.apache.org/
https://cs.brown.edu/~mph/Herlihy91/p124-herlihy.pdf (wait free, universal
objects, etc.)
================================================
FILE: src/diskv/client.go
================================================
package diskv
import "shardmaster"
import "net/rpc"
import "time"
import "sync"
import "fmt"
import "crypto/rand"
import "math/big"
type Clerk struct {
mu sync.Mutex // one RPC at a time
sm *shardmaster.Clerk
config shardmaster.Config
// You'll have to modify Clerk.
}
func nrand() int64 {
max := big.NewInt(int64(1) << 62)
bigx, _ := rand.Int(rand.Reader, max)
x := bigx.Int64()
return x
}
func MakeClerk(shardmasters []string) *Clerk {
ck := new(Clerk)
ck.sm = shardmaster.MakeClerk(shardmasters)
// You'll have to modify MakeClerk.
return ck
}
//
// call() sends an RPC to the rpcname handler on server srv
// with arguments args, waits for the reply, and leaves the
// reply in reply. the reply argument should be a pointer
// to a reply structure.
//
// the return value is true if the server responded, and false
// if call() was not able to contact the server. in particular,
// the reply's contents are only valid if call() returned true.
//
// you should assume that call() will return an
// error after a while if the server is dead.
// don't provide your own time-out mechanism.
//
// please use call() to send all RPCs, in client.go and server.go.
// please don't change this function.
//
func call(srv string, rpcname string,
args interface{}, reply interface{}) bool {
c, errx := rpc.Dial("unix", srv)
if errx != nil {
return false
}
defer c.Close()
err := c.Call(rpcname, args, reply)
if err == nil {
return true
}
fmt.Println(err)
return false
}
//
// which shard is a key in?
// please use this function,
// and please do not change it.
//
func key2shard(key string) int {
shard := 0
if len(key) > 0 {
shard = int(key[0])
}
shard %= shardmaster.NShards
return shard
}
//
// fetch the current value for a key.
// returns "" if the key does not exist.
// keeps trying forever in the face of all other errors.
//
func (ck *Clerk) Get(key string) string {
ck.mu.Lock()
defer ck.mu.Unlock()
// You'll have to modify Get().
for {
shard := key2shard(key)
gid := ck.config.Shards[shard]
servers, ok := ck.config.Groups[gid]
if ok {
// try each server in the shard's replication group.
for _, srv := range servers {
args := &GetArgs{}
args.Key = key
var reply GetReply
ok := call(srv, "DisKV.Get", args, &reply)
if ok && (reply.Err == OK || reply.Err == ErrNoKey) {
return reply.Value
}
if ok && (reply.Err == ErrWrongGroup) {
break
}
}
}
time.Sleep(100 * time.Millisecond)
// ask master for a new configuration.
ck.config = ck.sm.Query(-1)
}
}
// send a Put or Append request.
func (ck *Clerk) PutAppend(key string, value string, op string) {
ck.mu.Lock()
defer ck.mu.Unlock()
// You'll have to modify PutAppend().
for {
shard := key2shard(key)
gid := ck.config.Shards[shard]
servers, ok := ck.config.Groups[gid]
if ok {
// try each server in the shard's replication group.
for _, srv := range servers {
args := &PutAppendArgs{}
args.Key = key
args.Value = value
args.Op = op
var reply PutAppendReply
ok := call(srv, "DisKV.PutAppend", args, &reply)
if ok && reply.Err == OK {
return
}
if ok && (reply.Err == ErrWrongGroup) {
break
}
}
}
time.Sleep(100 * time.Millisecond)
// ask master for a new configuration.
ck.config = ck.sm.Query(-1)
}
}
func (ck *Clerk) Put(key string, value string) {
ck.PutAppend(key, value, "Put")
}
func (ck *Clerk) Append(key string, value string) {
ck.PutAppend(key, value, "Append")
}
================================================
FILE: src/diskv/common.go
================================================
package diskv
//
// Sharded key/value server.
// Lots of replica groups, each running op-at-a-time paxos.
// Shardmaster decides which group serves each shard.
// Shardmaster may change shard assignment from time to time.
//
// You will have to modify these definitions.
//
const (
OK = "OK"
ErrNoKey = "ErrNoKey"
ErrWrongGroup = "ErrWrongGroup"
)
type Err string
type PutAppendArgs struct {
Key string
Value string
Op string // "Put" or "Append"
// You'll have to add definitions here.
// Field names must start with capital letters,
// otherwise RPC will break.
}
type PutAppendReply struct {
Err Err
}
type GetArgs struct {
Key string
// You'll have to add definitions here.
}
type GetReply struct {
Err Err
Value string
}
================================================
FILE: src/diskv/dist_test.go
================================================
package shardkv
import (
"fmt"
"os"
"shardmaster"
"strconv"
"testing"
)
func port(tag string, host int) string {
s := "/var/tmp/824-"
s += strconv.Itoa(os.Getuid()) + "/"
os.Mkdir(s, 0777)
s += "skv-"
s += strconv.Itoa(os.Getpid()) + "-"
s += tag + "-"
s += strconv.Itoa(host)
return s
}
// predict value that would result from an Append
func NextValue(prev string, val string) string {
return prev + val
}
func mcleanup(sma []*shardmaster.ShardMaster) {
for i := 0; i < len(sma); i++ {
if sma[i] != nil {
sma[i].Kill()
}
}
}
func TestConcurrentUnreliable(t *testing.T) {
fmt.Print("Test: Concurrent Put/Get/Move (unreliable) ...\n")
doConcurrent(t, true)
fmt.Println("Concurrent Feature completed!")
}
================================================
FILE: src/diskv/server.go
================================================
package diskv
import "net"
import "fmt"
import "net/rpc"
import "log"
import "time"
import "paxos"
import "sync"
import "sync/atomic"
import "os"
import "syscall"
import "encoding/gob"
import "encoding/base32"
import "math/rand"
import "shardmaster"
import "io/ioutil"
import "strconv"
const Debug = 0
func DPrintf(format string, a ...interface{}) (n int, err error) {
if Debug > 0 {
log.Printf(format, a...)
}
return
}
type Op struct {
// Your definitions here.
}
type DisKV struct {
mu sync.Mutex
l net.Listener
me int
dead int32 // for testing
unreliable int32 // for testing
sm *shardmaster.Clerk
px *paxos.Paxos
dir string // each replica has its own data directory
gid int64 // my replica group ID
// Your definitions here.
}
//
// these are handy functions that might be useful
// for reading and writing key/value files, and
// for reading and writing entire shards.
// puts the key files for each shard in a separate
// directory.
//
func (kv *DisKV) shardDir(shard int) string {
d := kv.dir + "/shard-" + strconv.Itoa(shard) + "/"
// create directory if needed.
_, err := os.Stat(d)
if err != nil {
if err := os.Mkdir(d, 0777); err != nil {
log.Fatalf("Mkdir(%v): %v", d, err)
}
}
return d
}
// cannot use keys in file names directly, since
// they might contain troublesome characters like /.
// base32-encode the key to get a file name.
// base32 rather than base64 b/c Mac has case-insensitive
// file names.
func (kv *DisKV) encodeKey(key string) string {
return base32.StdEncoding.EncodeToString([]byte(key))
}
func (kv *DisKV) decodeKey(filename string) (string, error) {
key, err := base32.StdEncoding.DecodeString(filename)
return string(key), err
}
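The encodeKey/decodeKey pair above guarantees keys survive the round trip through file names. A standalone sketch of that round trip (illustrative only: these are local copies for demonstration, not the lab's methods on DisKV):

```go
package main

import (
	"encoding/base32"
	"fmt"
)

// Base32-encode a key so characters like '/' (illegal in file names)
// cannot appear, and so two keys differing only in case still map to
// distinct names on case-insensitive file systems such as macOS.
func encodeKey(key string) string {
	return base32.StdEncoding.EncodeToString([]byte(key))
}

func decodeKey(filename string) (string, error) {
	key, err := base32.StdEncoding.DecodeString(filename)
	return string(key), err
}

func main() {
	enc := encodeKey("a/b") // "a/b" could not be used as a file name directly
	dec, err := decodeKey(enc)
	if err != nil || dec != "a/b" {
		panic("round-trip failed")
	}
	fmt.Println(enc) // only A-Z, 2-7, and '=' padding appear in the output
}
```

Base64 would be shorter, but its alphabet is case-sensitive, which is exactly what the comment above warns about on Mac file systems.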
// read the content of a key's file.
func (kv *DisKV) fileGet(shard int, key string) (string, error) {
fullname := kv.shardDir(shard) + "/key-" + kv.encodeKey(key)
content, err := ioutil.ReadFile(fullname)
return string(content), err
}
// replace the content of a key's file.
// uses rename() to make the replacement atomic with
// respect to crashes.
func (kv *DisKV) filePut(shard int, key string, content string) error {
fullname := kv.shardDir(shard) + "/key-" + kv.encodeKey(key)
tempname := kv.shardDir(shard) + "/temp-" + kv.encodeKey(key)
if err := ioutil.WriteFile(tempname, []byte(content), 0666); err != nil {
return err
}
if err := os.Rename(tempname, fullname); err != nil {
return err
}
return nil
}
// return content of every key file in a given shard.
func (kv *DisKV) fileReadShard(shard int) map[string]string {
m := map[string]string{}
d := kv.shardDir(shard)
files, err := ioutil.ReadDir(d)
if err != nil {
log.Fatalf("fileReadShard could not read %v: %v", d, err)
}
for _, fi := range files {
n1 := fi.Name()
if n1[0:4] == "key-" {
key, err := kv.decodeKey(n1[4:])
if err != nil {
log.Fatalf("fileReadShard bad file name %v: %v", n1, err)
}
content, err := kv.fileGet(shard, key)
if err != nil {
log.Fatalf("fileReadShard fileGet failed for %v: %v", key, err)
}
m[key] = content
}
}
return m
}
// replace an entire shard directory.
func (kv *DisKV) fileReplaceShard(shard int, m map[string]string) {
d := kv.shardDir(shard)
os.RemoveAll(d) // remove all existing files from shard.
for k, v := range m {
if err := kv.filePut(shard, k, v); err != nil {
log.Fatalf("fileReplaceShard filePut(%v): %v", k, err)
}
}
}
func (kv *DisKV) Get(args *GetArgs, reply *GetReply) error {
// Your code here.
return nil
}
// RPC handler for client Put and Append requests
func (kv *DisKV) PutAppend(args *PutAppendArgs, reply *PutAppendReply) error {
// Your code here.
return nil
}
//
// Ask the shardmaster if there's a new configuration;
// if so, re-configure.
//
func (kv *DisKV) tick() {
// Your code here.
}
// tell the server to shut itself down.
// please don't change these two functions.
func (kv *DisKV) kill() {
atomic.StoreInt32(&kv.dead, 1)
kv.l.Close()
kv.px.Kill()
}
// call this to find out if the server is dead.
func (kv *DisKV) isdead() bool {
return atomic.LoadInt32(&kv.dead) != 0
}
// please do not change these two functions.
func (kv *DisKV) Setunreliable(what bool) {
if what {
atomic.StoreInt32(&kv.unreliable, 1)
} else {
atomic.StoreInt32(&kv.unreliable, 0)
}
}
func (kv *DisKV) isunreliable() bool {
return atomic.LoadInt32(&kv.unreliable) != 0
}
//
// Start a shardkv server.
// gid is the ID of the server's replica group.
// shardmasters[] contains the ports of the
// servers that implement the shardmaster.
// servers[] contains the ports of the servers
// in this replica group.
// me is the index of this server in servers[].
// dir is the directory name under which this
// replica should store all its files.
// each replica is passed a different directory.
// restart is false the very first time this server
// is started, and true to indicate a re-start
// after a crash or after a crash with disk loss.
//
func StartServer(gid int64, shardmasters []string,
servers []string, me int, dir string, restart bool) *DisKV {
kv := new(DisKV)
kv.me = me
kv.gid = gid
kv.sm = shardmaster.MakeClerk(shardmasters)
kv.dir = dir
// Your initialization code here.
// Don't call Join().
// log.SetOutput(ioutil.Discard)
gob.Register(Op{})
rpcs := rpc.NewServer()
rpcs.Register(kv)
kv.px = paxos.Make(servers, me, rpcs)
// log.SetOutput(os.Stdout)
os.Remove(servers[me])
l, e := net.Listen("unix", servers[me])
if e != nil {
log.Fatal("listen error: ", e)
}
kv.l = l
// please do not change any of the following code,
// or do anything to subvert it.
go func() {
for kv.isdead() == false {
conn, err := kv.l.Accept()
if err == nil && kv.isdead() == false {
if kv.isunreliable() && (rand.Int63()%1000) < 100 {
// discard the request.
conn.Close()
} else if kv.isunreliable() && (rand.Int63()%1000) < 200 {
// process the request but force discard of reply.
c1 := conn.(*net.UnixConn)
f, _ := c1.File()
err := syscall.Shutdown(int(f.Fd()), syscall.SHUT_WR)
if err != nil {
fmt.Printf("shutdown: %v\n", err)
}
go rpcs.ServeConn(conn)
} else {
go rpcs.ServeConn(conn)
}
} else if err == nil {
conn.Close()
}
if err != nil && kv.isdead() == false {
fmt.Printf("DisKV(%v) accept: %v\n", me, err.Error())
kv.kill()
}
}
}()
go func() {
for kv.isdead() == false {
kv.tick()
time.Sleep(250 * time.Millisecond)
}
}()
return kv
}
================================================
FILE: src/diskv/test.go
================================================
package diskv
import "testing"
import "shardmaster"
import "runtime"
import "strconv"
import "strings"
import "os"
import "os/exec"
import "time"
import "fmt"
import "sync"
import "io/ioutil"
import "log"
import "math/rand"
import crand "crypto/rand"
import "encoding/base64"
import "path/filepath"
import "sync/atomic"
type tServer struct {
p *os.Process
port string // this replica's port name
dir string // directory for persistent data
started bool // has started at least once already
}
// information about the servers of one replica group.
type tGroup struct {
gid int64
servers []*tServer
}
// information about all the servers of a k/v cluster.
type tCluster struct {
t *testing.T
dir string
unreliable bool
masters []*shardmaster.ShardMaster
mck *shardmaster.Clerk
masterports []string
groups []*tGroup
}
func randstring(n int) string {
b := make([]byte, 2*n)
crand.Read(b)
s := base64.URLEncoding.EncodeToString(b)
return s[0:n]
}
func (tc *tCluster) newport() string {
return tc.dir + randstring(12)
}
//
// start a k/v replica server process.
// use separate process, rather than thread, so we
// can kill a replica unexpectedly.
// ../main/diskvd
//
func (tc *tCluster) start1(gi int, si int) {
args := []string{"../main/diskvd"}
attr := &os.ProcAttr{}
in, err := os.Open("/dev/null")
if err != nil {
tc.t.Fatalf("open /dev/null: %v", err)
}
attr.Files = make([]*os.File, 3)
attr.Files[0] = in
attr.Files[1] = os.Stdout
attr.Files[2] = os.Stderr
g := tc.groups[gi]
s := g.servers[si]
args = append(args, "-g")
args = append(args, strconv.FormatInt(g.gid, 10))
for _, m := range tc.masterports {
args = append(args, "-m")
args = append(args, m)
}
for _, sx := range g.servers {
args = append(args, "-s")
args = append(args, sx.port)
}
args = append(args, "-i")
args = append(args, strconv.Itoa(si))
args = append(args, "-u")
args = append(args, strconv.FormatBool(tc.unreliable))
args = append(args, "-d")
args = append(args, s.dir)
args = append(args, "-r")
args = append(args, strconv.FormatBool(s.started)) // re-start?
p, err := os.StartProcess(args[0], args, attr)
if err != nil {
tc.t.Fatalf("StartProcess(%v): %v\n", args[0], err)
}
s.p = p
s.started = true
}
func (tc *tCluster) kill1(gi int, si int, deletefiles bool) {
g := tc.groups[gi]
s := g.servers[si]
if s.p != nil {
s.p.Kill()
s.p.Wait()
s.p = nil
}
if deletefiles {
if err := os.RemoveAll(s.dir); err != nil {
tc.t.Fatalf("RemoveAll(%v): %v", s.dir, err)
}
os.Mkdir(s.dir, 0777)
}
}
func (tc *tCluster) cleanup() {
for gi := 0; gi < len(tc.groups); gi++ {
g := tc.groups[gi]
for si := 0; si < len(g.servers); si++ {
tc.kill1(gi, si, false)
}
}
for i := 0; i < len(tc.masters); i++ {
if tc.masters[i] != nil {
tc.masters[i].Kill()
}
}
// this RemoveAll, along with the directory naming
// policy in setup(), means that you can't run
// concurrent tests. the reason is to avoid accumulating
// lots of stuff in /var/tmp on Athena.
os.RemoveAll(tc.dir)
}
func (tc *tCluster) shardclerk() *shardmaster.Clerk {
return shardmaster.MakeClerk(tc.masterports)
}
func (tc *tCluster) clerk() *Clerk {
return MakeClerk(tc.masterports)
}
func (tc *tCluster) join(gi int) {
ports := []string{}
for _, s := range tc.groups[gi].servers {
ports = append(ports, s.port)
}
tc.mck.Join(tc.groups[gi].gid, ports)
}
func (tc *tCluster) leave(gi int) {
tc.mck.Leave(tc.groups[gi].gid)
}
// how many total bytes of file space in use?
func (tc *tCluster) space() int64 {
var bytes int64 = 0
ff := func(_ string, info os.FileInfo, err error) error {
if err == nil && info.Mode().IsDir() == false {
bytes += info.Size()
}
return nil
}
filepath.Walk(tc.dir, ff)
return bytes
}
func setup(t *testing.T, tag string, ngroups int, nreplicas int, unreliable bool) *tCluster {
runtime.GOMAXPROCS(4)
// compile ../main/diskvd.go
// cmd := exec.Command("go", "build", "-race", "diskvd.go")
cmd := exec.Command("go", "build", "diskvd.go")
cmd.Dir = "../main"
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
if err := cmd.Run(); err != nil {
t.Fatalf("could not compile ../main/diskvd.go: %v", err)
}
const nmasters = 3
tc := &tCluster{}
tc.t = t
tc.unreliable = unreliable
tc.dir = "/var/tmp/824-"
tc.dir += strconv.Itoa(os.Getuid()) + "/"
os.Mkdir(tc.dir, 0777)
tc.dir += "lab5-" + tag + "/"
os.RemoveAll(tc.dir)
os.Mkdir(tc.dir, 0777)
tc.masters = make([]*shardmaster.ShardMaster, nmasters)
tc.masterports = make([]string, nmasters)
for i := 0; i < nmasters; i++ {
tc.masterports[i] = tc.newport()
}
log.SetOutput(ioutil.Discard) // suppress method errors &c
for i := 0; i < nmasters; i++ {
tc.masters[i] = shardmaster.StartServer(tc.masterports, i)
}
log.SetOutput(os.Stdout) // re-enable error output.
tc.mck = tc.shardclerk()
tc.groups = make([]*tGroup, ngroups)
for i := 0; i < ngroups; i++ {
g := &tGroup{}
tc.groups[i] = g
g.gid = int64(i + 100)
g.servers = make([]*tServer, nreplicas)
for j := 0; j < nreplicas; j++ {
g.servers[j] = &tServer{}
g.servers[j].port = tc.newport()
g.servers[j].dir = tc.dir + randstring(12)
if err := os.Mkdir(g.servers[j].dir, 0777); err != nil {
t.Fatalf("Mkdir(%v): %v", g.servers[j].dir, err)
}
}
for j := 0; j < nreplicas; j++ {
tc.start1(i, j)
}
}
return tc
}
//
// these tests are the same as in Lab 4.
//
func Test4Basic(t *testing.T) {
tc := setup(t, "basic", 3, 3, false)
defer tc.cleanup()
fmt.Printf("Test: Basic Join/Leave (lab4) ...\n")
tc.join(0)
ck := tc.clerk()
ck.Put("a", "x")
ck.Append("a", "b")
if ck.Get("a") != "xb" {
t.Fatalf("wrong value")
}
keys := make([]string, 10)
vals := make([]string, len(keys))
for i := 0; i < len(keys); i++ {
keys[i] = strconv.Itoa(rand.Int())
vals[i] = strconv.Itoa(rand.Int())
ck.Put(keys[i], vals[i])
}
// are keys still there after joins?
for gi := 1; gi < len(tc.groups); gi++ {
tc.join(gi)
time.Sleep(1 * time.Second)
for i := 0; i < len(keys); i++ {
v := ck.Get(keys[i])
if v != vals[i] {
t.Fatalf("joining; wrong value; g=%v k=%v wanted=%v got=%v",
gi, keys[i], vals[i], v)
}
vals[i] = strconv.Itoa(rand.Int())
ck.Put(keys[i], vals[i])
}
}
// are keys still there after leaves?
for gi := 0; gi < len(tc.groups)-1; gi++ {
tc.leave(gi)
time.Sleep(1 * time.Second)
for i := 0; i < len(keys); i++ {
v := ck.Get(keys[i])
if v != vals[i] {
t.Fatalf("leaving; wrong value; g=%v k=%v wanted=%v got=%v",
gi, keys[i], vals[i], v)
}
vals[i] = strconv.Itoa(rand.Int())
ck.Put(keys[i], vals[i])
}
}
fmt.Printf(" ... Passed\n")
}
func Test4Move(t *testing.T) {
tc := setup(t, "move", 3, 3, false)
defer tc.cleanup()
fmt.Printf("Test: Shards really move (lab4) ...\n")
tc.join(0)
ck := tc.clerk()
// insert one key per shard
for i := 0; i < shardmaster.NShards; i++ {
ck.Put(string('0'+i), string('0'+i))
}
// add group 1.
tc.join(1)
time.Sleep(5 * time.Second)
// check that keys are still there.
for i := 0; i < shardmaster.NShards; i++ {
if ck.Get(string('0'+i)) != string('0'+i) {
t.Fatalf("missing key/value")
}
}
// remove sockets from group 0.
for _, s := range tc.groups[0].servers {
os.Remove(s.port)
}
count := int32(0)
var mu sync.Mutex
for i := 0; i < shardmaster.NShards; i++ {
go func(me int) {
myck := tc.clerk()
v := myck.Get(string('0' + me))
if v == string('0'+me) {
mu.Lock()
atomic.AddInt32(&count, 1)
mu.Unlock()
} else {
t.Fatalf("Get(%v) yielded %v\n", me, v)
}
}(i)
}
time.Sleep(10 * time.Second)
ccc := atomic.LoadInt32(&count)
if ccc > shardmaster.NShards/3 && ccc < 2*(shardmaster.NShards/3) {
fmt.Printf(" ... Passed\n")
} else {
t.Fatalf("%v keys worked after killing 1/2 of groups; wanted about %v",
ccc, shardmaster.NShards/2)
}
}
func Test4Limp(t *testing.T) {
tc := setup(t, "limp", 3, 3, false)
defer tc.cleanup()
fmt.Printf("Test: Reconfiguration with some dead replicas (lab4) ...\n")
tc.join(0)
ck := tc.clerk()
ck.Put("a", "b")
if ck.Get("a") != "b" {
t.Fatalf("got wrong value")
}
// kill one server from each replica group.
for gi := 0; gi < len(tc.groups); gi++ {
sa := tc.groups[gi].servers
tc.kill1(gi, rand.Int()%len(sa), false)
}
keys := make([]string, 10)
vals := make([]string, len(keys))
for i := 0; i < len(keys); i++ {
keys[i] = strconv.Itoa(rand.Int())
vals[i] = strconv.Itoa(rand.Int())
ck.Put(keys[i], vals[i])
}
// are keys still there after joins?
for gi := 1; gi < len(tc.groups); gi++ {
tc.join(gi)
time.Sleep(1 * time.Second)
for i := 0; i < len(keys); i++ {
v := ck.Get(keys[i])
if v != vals[i] {
t.Fatalf("joining; wrong value; g=%v k=%v wanted=%v got=%v",
gi, keys[i], vals[i], v)
}
vals[i] = strconv.Itoa(rand.Int())
ck.Put(keys[i], vals[i])
}
}
// are keys still there after leaves?
for gi := 0; gi < len(tc.groups)-1; gi++ {
tc.leave(gi)
time.Sleep(2 * time.Second)
g := tc.groups[gi]
for i := 0; i < len(g.servers); i++ {
tc.kill1(gi, i, false)
}
for i := 0; i < len(keys); i++ {
v := ck.Get(keys[i])
if v != vals[i] {
t.Fatalf("leaving; wrong value; g=%v k=%v wanted=%v got=%v",
gi, keys[i], vals[i], v)
}
vals[i] = strconv.Itoa(rand.Int())
ck.Put(keys[i], vals[i])
}
}
fmt.Printf(" ... Passed\n")
}
func doConcurrent(t *testing.T, unreliable bool) {
tc := setup(t, "concurrent-"+strconv.FormatBool(unreliable), 3, 3, unreliable)
defer tc.cleanup()
for i := 0; i < len(tc.groups); i++ {
tc.join(i)
}
const npara = 11
var ca [npara]chan bool
for i := 0; i < npara; i++ {
ca[i] = make(chan bool)
go func(me int) {
ok := true
defer func() { ca[me] <- ok }()
ck := tc.clerk()
mymck := tc.shardclerk()
key := strconv.Itoa(me)
last := ""
for iters := 0; iters < 3; iters++ {
nv := strconv.Itoa(rand.Int())
ck.Append(key, nv)
last = last + nv
v := ck.Get(key)
if v != last {
ok = false
t.Fatalf("Get(%v) expected %v got %v\n", key, last, v)
}
gi := rand.Int() % len(tc.groups)
gid := tc.groups[gi].gid
mymck.Move(rand.Int()%shardmaster.NShards, gid)
time.Sleep(time.Duration(rand.Int()%30) * time.Millisecond)
}
}(i)
}
for i := 0; i < npara; i++ {
x := <-ca[i]
if x == false {
t.Fatalf("something is wrong")
}
}
}
func Test4Concurrent(t *testing.T) {
fmt.Printf("Test: Concurrent Put/Get/Move (lab4) ...\n")
doConcurrent(t, false)
fmt.Printf(" ... Passed\n")
}
func Test4ConcurrentUnreliable(t *testing.T) {
fmt.Printf("Test: Concurrent Put/Get/Move (unreliable) (lab4) ...\n")
doConcurrent(t, true)
fmt.Printf(" ... Passed\n")
}
//
// the rest of the tests are lab5-specific.
//
//
// do the servers write k/v pairs to disk, so that they
// are still available after kill+restart?
//
func Test5BasicPersistence(t *testing.T) {
tc := setup(t, "basicpersistence", 1, 3, false)
defer tc.cleanup()
fmt.Printf("Test: Basic Persistence ...\n")
tc.join(0)
ck := tc.clerk()
ck.Append("a", "x")
ck.Append("a", "y")
if ck.Get("a") != "xy" {
t.Fatalf("wrong value")
}
// kill all servers in all groups.
for gi, g := range tc.groups {
for si := range g.servers {
tc.kill1(gi, si, false)
}
}
// check that requests are not executed.
ch := make(chan string)
go func() {
ck1 := tc.clerk()
v := ck1.Get("a")
ch <- v
}()
select {
case <-ch:
t.Fatalf("Get should not have succeeded after killing all servers.")
case <-time.After(3 * time.Second):
// this is what we hope for.
}
// restart all servers, check that they recover the data.
for gi, g := range tc.groups {
for si := range g.servers {
tc.start1(gi, si)
}
}
time.Sleep(2 * time.Second)
ck.Append("a", "z")
v := ck.Get("a")
if v != "xyz" {
t.Fatalf("wrong value %v after restart", v)
}
fmt.Printf(" ... Passed\n")
}
//
// if server S1 is dead for a bit, and others accept operations,
// do they bring S1 up to date correctly after it restarts?
//
func Test5OneRestart(t *testing.T) {
tc := setup(t, "onerestart", 1, 3, false)
defer tc.cleanup()
fmt.Printf("Test: One server restarts ...\n")
tc.join(0)
ck := tc.clerk()
g0 := tc.groups[0]
k1 := randstring(10)
k1v := randstring(10)
ck.Append(k1, k1v)
k2 := randstring(10)
k2v := randstring(10)
ck.Put(k2, k2v)
for i := 0; i < len(g0.servers); i++ {
k1x := ck.Get(k1)
if k1x != k1v {
t.Fatalf("wrong value for k1, i=%v, wanted=%v, got=%v", i, k1v, k1x)
}
k2x := ck.Get(k2)
if k2x != k2v {
t.Fatalf("wrong value for k2")
}
tc.kill1(0, i, false)
time.Sleep(1 * time.Second)
z := randstring(10)
k1v += z
ck.Append(k1, z)
k2v = randstring(10)
ck.Put(k2, k2v)
tc.start1(0, i)
time.Sleep(2 * time.Second)
}
if ck.Get(k1) != k1v {
t.Fatalf("wrong value for k1")
}
if ck.Get(k2) != k2v {
t.Fatalf("wrong value for k2")
}
fmt.Printf(" ... Passed\n")
}
//
// check that the persistent state isn't too big.
//
func Test5DiskUse(t *testing.T) {
tc := setup(t, "diskuse", 1, 3, false)
defer tc.cleanup()
fmt.Printf("Test: Servers don't use too much disk space ...\n")
tc.join(0)
ck := tc.clerk()
g0 := tc.groups[0]
k1 := randstring(10)
k1v := randstring(10)
ck.Append(k1, k1v)
k2 := randstring(10)
k2v := randstring(10)
ck.Put(k2, k2v)
k3 := randstring(10)
k3v := randstring(10)
ck.Put(k3, k3v)
k4 := randstring(10)
k4v := randstring(10)
ck.Append(k4, k4v)
n := 100 + (rand.Int() % 20)
for i := 0; i < n; i++ {
k2v = randstring(1000)
ck.Put(k2, k2v)
x := randstring(1)
ck.Append(k3, x)
k3v += x
ck.Get(k4)
}
time.Sleep(100 * time.Millisecond)
k2v = randstring(1000)
ck.Put(k2, k2v)
time.Sleep(100 * time.Millisecond)
x := randstring(1)
ck.Append(k3, x)
k3v += x
time.Sleep(100 * time.Millisecond)
ck.Get(k4)
// let all the replicas tick().
time.Sleep(2100 * time.Millisecond)
max := int64(20 * 1000)
{
nb := tc.space()
if nb > max {
t.Fatalf("using too many bytes on disk (%v)", nb)
}
}
for i := 0; i < len(g0.servers); i++ {
tc.kill1(0, i, false)
}
{
nb := tc.space()
if nb > max {
t.Fatalf("using too many bytes on disk (%v > %v)", nb, max)
}
}
for i := 0; i < len(g0.servers); i++ {
tc.start1(0, i)
}
time.Sleep(time.Second)
if ck.Get(k1) != k1v {
t.Fatalf("wrong value for k1")
}
if ck.Get(k2) != k2v {
t.Fatalf("wrong value for k2")
}
if ck.Get(k3) != k3v {
t.Fatalf("wrong value for k3")
}
{
nb := tc.space()
if nb > max {
t.Fatalf("using too many bytes on disk (%v > %v)", nb, max)
}
}
fmt.Printf(" ... Passed\n")
}
//
// check that the persistent state isn't too big for Appends.
//
func Test5AppendUse(t *testing.T) {
tc := setup(t, "appenduse", 1, 3, false)
defer tc.cleanup()
fmt.Printf("Test: Servers don't use too much disk space for Appends ...\n")
tc.join(0)
ck := tc.clerk()
g0 := tc.groups[0]
k1 := randstring(10)
k1v := randstring(10)
ck.Append(k1, k1v)
k2 := randstring(10)
k2v := randstring(10)
ck.Put(k2, k2v)
k3 := randstring(10)
k3v := randstring(10)
ck.Put(k3, k3v)
k4 := randstring(10)
k4v := randstring(10)
ck.Append(k4, k4v)
n := 100 + (rand.Int() % 20)
for i := 0; i < n; i++ {
k2v = randstring(1000)
ck.Put(k2, k2v)
x := randstring(1000)
ck.Append(k3, x)
k3v += x
ck.Get(k4)
}
time.Sleep(100 * time.Millisecond)
k2v = randstring(1000)
ck.Put(k2, k2v)
time.Sleep(100 * time.Millisecond)
x := randstring(1)
ck.Append(k3, x)
k3v += x
time.Sleep(100 * time.Millisecond)
ck.Get(k4)
// let all the replicas tick().
time.Sleep(2100 * time.Millisecond)
max := int64(3*n*1000) + 20000
{
nb := tc.space()
if nb > max {
t.Fatalf("using too many bytes on disk (%v > %v)", nb, max)
}
}
for i := 0; i < len(g0.servers); i++ {
tc.kill1(0, i, false)
}
{
nb := tc.space()
if nb > max {
t.Fatalf("using too many bytes on disk (%v > %v)", nb, max)
}
}
for i := 0; i < len(g0.servers); i++ {
tc.start1(0, i)
}
time.Sleep(time.Second)
if ck.Get(k3) != k3v {
t.Fatalf("wrong value for k3")
}
time.Sleep(100 * time.Millisecond)
if ck.Get(k2) != k2v {
t.Fatalf("wrong value for k2")
}
time.Sleep(1100 * time.Millisecond)
if ck.Get(k1) != k1v {
t.Fatalf("wrong value for k1")
}
{
nb := tc.space()
if nb > max {
t.Fatalf("using too many bytes on disk (%v > %v)", nb, max)
}
}
fmt.Printf(" ... Passed\n")
}
//
// recovery if a single replica loses disk content.
//
func Test5OneLostDisk(t *testing.T) {
tc := setup(t, "onelostdisk", 1, 3, false)
defer tc.cleanup()
fmt.Printf("Test: One server loses disk and restarts ...\n")
tc.join(0)
ck := tc.clerk()
g0 := tc.groups[0]
k1 := randstring(10)
k1v := ""
k2 := randstring(10)
k2v := ""
for i := 0; i < 7+(rand.Int()%7); i++ {
x := randstring(10)
ck.Append(k1, x)
k1v += x
k2v = randstring(10)
ck.Put(k2, k2v)
}
time.Sleep(300 * time.Millisecond)
ck.Get(k1)
time.Sleep(300 * time.Millisecond)
ck.Get(k2)
for i := 0; i < len(g0.servers); i++ {
k1x := ck.Get(k1)
if k1x != k1v {
t.Fatalf("wrong value for k1, i=%v, wanted=%v, got=%v", i, k1v, k1x)
}
k2x := ck.Get(k2)
if k2x != k2v {
t.Fatalf("wrong value for k2")
}
tc.kill1(0, i, true)
time.Sleep(1 * time.Second)
{
z := randstring(10)
k1v += z
ck.Append(k1, z)
k2v = randstring(10)
ck.Put(k2, k2v)
}
tc.start1(0, i)
{
z := randstring(10)
k1v += z
ck.Append(k1, z)
time.Sleep(10 * time.Millisecond)
z = randstring(10)
k1v += z
ck.Append(k1, z)
}
time.Sleep(2 * time.Second)
}
if ck.Get(k1) != k1v {
t.Fatalf("wrong value for k1")
}
if ck.Get(k2) != k2v {
t.Fatalf("wrong value for k2")
}
fmt.Printf(" ... Passed\n")
}
//
// one disk lost while another replica is merely down.
//
func Test5OneLostOneDown(t *testing.T) {
tc := setup(t, "onelostonedown", 1, 5, false)
defer tc.cleanup()
fmt.Printf("Test: One server down, another loses disk ...\n")
tc.join(0)
ck := tc.clerk()
g0 := tc.groups[0]
k1 := randstring(10)
k1v := ""
k2 := randstring(10)
k2v := ""
for i := 0; i < 7+(rand.Int()%7); i++ {
x := randstring(10)
ck.Append(k1, x)
k1v += x
k2v = randstring(10)
ck.Put(k2, k2v)
}
time.Sleep(300 * time.Millisecond)
ck.Get(k1)
time.Sleep(300 * time.Millisecond)
ck.Get(k2)
tc.kill1(0, 0, false)
for i := 1; i < len(g0.servers); i++ {
k1x := ck.Get(k1)
if k1x != k1v {
t.Fatalf("wrong value for k1, i=%v, wanted=%v, got=%v", i, k1v, k1x)
}
k2x := ck.Get(k2)
if k2x != k2v {
t.Fatalf("wrong value for k2")
}
tc.kill1(0, i, true)
time.Sleep(1 * time.Second)
{
z := randstring(10)
k1v += z
ck.Append(k1, z)
k2v = randstring(10)
ck.Put(k2, k2v)
}
tc.start1(0, i)
{
z := randstring(10)
k1v += z
ck.Append(k1, z)
time.Sleep(10 * time.Millisecond)
z = randstring(10)
k1v += z
ck.Append(k1, z)
}
time.Sleep(2 * time.Second)
}
if ck.Get(k1) != k1v {
t.Fatalf("wrong value for k1")
}
if ck.Get(k2) != k2v {
t.Fatalf("wrong value for k2")
}
tc.start1(0, 0)
ck.Put("a", "b")
time.Sleep(1 * time.Second)
ck.Put("a", "c")
if ck.Get(k1) != k1v {
t.Fatalf("wrong value for k1")
}
if ck.Get(k2) != k2v {
t.Fatalf("wrong value for k2")
}
fmt.Printf(" ... Passed\n")
}
// check that all known appends are present in a value,
// and are in order for each concurrent client.
func checkAppends(t *testing.T, v string, counts []int) {
nclients := len(counts)
for i := 0; i < nclients; i++ {
lastoff := -1
for j := 0; j < counts[i]; j++ {
wanted := "x " + strconv.Itoa(i) + " " + strconv.Itoa(j) + " y"
off := strings.Index(v, wanted)
if off < 0 {
t.Fatalf("missing element %v %v in Append result", i, j)
}
off1 := strings.LastIndex(v, wanted)
if off1 != off {
t.Fatalf("duplicate element %v %v in Append result", i, j)
}
if off <= lastoff {
t.Fatalf("wrong order for element in Append result")
}
lastoff = off
}
}
}
func doConcurrentCrash(t *testing.T, unreliable bool) {
tc := setup(t, "concurrentcrash", 1, 3, unreliable)
defer tc.cleanup()
tc.join(0)
ck := tc.clerk()
k1 := randstring(10)
ck.Put(k1, "")
stop := int32(0)
ff := func(me int, ch chan int) {
ret := -1
defer func() { ch <- ret }()
myck := tc.clerk()
n := 0
for atomic.LoadInt32(&stop) == 0 || n < 5 {
myck.Append(k1, "x "+strconv.Itoa(me)+" "+strconv.Itoa(n)+" y")
n++
time.Sleep(200 * time.Millisecond)
}
ret = n
}
ncli := 5
cha := []chan int{}
for i := 0; i < ncli; i++ {
cha = append(cha, make(chan int))
go ff(i, cha[i])
}
for i := 0; i < 3; i++ {
tc.kill1(0, i%3, false)
time.Sleep(1000 * time.Millisecond)
ck.Get(k1)
tc.start1(0, i%3)
time.Sleep(3000 * time.Millisecond)
if unreliable {
time.Sleep(5000 * time.Millisecond)
}
ck.Get(k1)
}
for i := 0; i < 3; i++ {
tc.kill1(0, i%3, true)
time.Sleep(1000 * time.Millisecond)
ck.Get(k1)
tc.start1(0, i%3)
time.Sleep(3000 * time.Millisecond)
if unreliable {
time.Sleep(5000 * time.Millisecond)
}
ck.Get(k1)
}
time.Sleep(2 * time.Second)
atomic.StoreInt32(&stop, 1)
counts := []int{}
for i := 0; i < ncli; i++ {
n := <-cha[i]
if n < 0 {
t.Fatal("client failed")
}
counts = append(counts, n)
}
vx := ck.Get(k1)
checkAppends(t, vx, counts)
for i := 0; i < 3; i++ {
tc.kill1(0, i, false)
if ck.Get(k1) != vx {
t.Fatalf("mismatch")
}
tc.start1(0, i)
if ck.Get(k1) != vx {
t.Fatalf("mismatch")
}
time.Sleep(3000 * time.Millisecond)
if unreliable {
time.Sleep(5000 * time.Millisecond)
}
if ck.Get(k1) != vx {
t.Fatalf("mismatch")
}
}
}
func Test5ConcurrentCrashReliable(t *testing.T) {
fmt.Printf("Test: Concurrent Append and Crash ...\n")
doConcurrentCrash(t, false)
fmt.Printf(" ... Passed\n")
}
//
// Append() at same time as crash.
//
func Test5Simultaneous(t *testing.T) {
tc := setup(t, "simultaneous", 1, 3, true)
defer tc.cleanup()
fmt.Printf("Test: Simultaneous Append and Crash ...\n")
tc.join(0)
ck := tc.clerk()
k1 := randstring(10)
ck.Put(k1, "")
ch := make(chan int)
ff := func(x int) {
ret := -1
defer func() { ch <- ret }()
myck := tc.clerk()
myck.Append(k1, "x "+strconv.Itoa(0)+" "+strconv.Itoa(x)+" y")
ret = 1
}
counts := []int{0}
for i := 0; i < 50; i++ {
go ff(i)
time.Sleep(time.Duration(rand.Int()%200) * time.Millisecond)
if (rand.Int() % 1000) < 500 {
tc.kill1(0, i%3, false)
} else {
tc.kill1(0, i%3, true)
}
time.Sleep(1000 * time.Millisecond)
vx := ck.Get(k1)
checkAppends(t, vx, counts)
tc.start1(0, i%3)
time.Sleep(2200 * time.Millisecond)
z := <-ch
if z != 1 {
t.Fatalf("Append thread failed")
}
counts[0] += z
}
fmt.Printf(" ... Passed\n")
}
//
// recovery with mixture of lost disks and simple reboot.
// does a replica that loses its disk wait for majority?
//
func Test5RejoinMix1(t *testing.T) {
tc := setup(t, "rejoinmix1", 1, 5, false)
defer tc.cleanup()
fmt.Printf("Test: replica waits correctly after disk loss ...\n")
tc.join(0)
ck := tc.clerk()
k1 := randstring(10)
k1v := ""
for i := 0; i < 7+(rand.Int()%7); i++ {
x := randstring(10)
ck.Append(k1, x)
k1v += x
}
time.Sleep(300 * time.Millisecond)
ck.Get(k1)
tc.kill1(0, 0, false)
for i := 0; i < 2; i++ {
x := randstring(10)
ck.Append(k1, x)
k1v += x
}
time.Sleep(300 * time.Millisecond)
ck.Get(k1)
time.Sleep(300 * time.Millisecond)
tc.kill1(0, 1, true)
tc.kill1(0, 2, true)
tc.kill1(0, 3, false)
tc.kill1(0, 4, false)
tc.start1(0, 0)
tc.start1(0, 1)
tc.start1(0, 2)
time.Sleep(300 * time.Millisecond)
// check that requests are not executed.
ch := make(chan string)
go func() {
ck1 := tc.clerk()
v := ck1.Get(k1)
ch <- v
}()
select {
case <-ch:
t.Fatalf("Get should not have succeeded.")
case <-time.After(3 * time.Second):
// this is what we hope for.
}
tc.start1(0, 3)
tc.start1(0, 4)
{
x := randstring(10)
ck.Append(k1, x)
k1v += x
}
v := ck.Get(k1)
if v != k1v {
t.Fatalf("Get returned wrong value")
}
fmt.Printf(" ... Passed\n")
}
//
// does a replica that loses its state avoid
// changing its mind about Paxos agreements?
//
func Test5RejoinMix3(t *testing.T) {
tc := setup(t, "rejoinmix3", 1, 5, false)
defer tc.cleanup()
fmt.Printf("Test: replica Paxos resumes correctly after disk loss ...\n")
tc.join(0)
ck := tc.clerk()
k1 := randstring(10)
k1v := ""
for i := 0; i < 7+(rand.Int()%7); i++ {
x := randstring(10)
ck.Append(k1, x)
k1v += x
}
time.Sleep(300 * time.Millisecond)
ck.Get(k1)
// kill R1, R2.
tc.kill1(0, 1, false)
tc.kill1(0, 2, false)
// R0, R3, and R4 are up.
for i := 0; i < 100+(rand.Int()%7); i++ {
x := randstring(10)
ck.Append(k1, x)
k1v += x
}
// kill R0, lose disk.
tc.kill1(0, 0, true)
time.Sleep(50 * time.Millisecond)
// restart R1, R2, R0.
tc.start1(0, 1)
tc.start1(0, 2)
time.Sleep(1 * time.Millisecond)
tc.start1(0, 0)
chx := make(chan bool)
x1 := randstring(10)
x2 := randstring(10)
go func() { ck.Append(k1, x1); chx <- true }()
time.Sleep(10 * time.Millisecond)
go func() { ck.Append(k1, x2); chx <- true }()
<-chx
<-chx
xv := ck.Get(k1)
if xv == k1v+x1+x2 || xv == k1v+x2+x1 {
// ok
} else {
t.Fatalf("wrong value")
}
fmt.Printf(" ... Passed\n")
}
================================================
FILE: src/kvpaxos/client.go
================================================
package kvpaxos
import "net/rpc"
import "crypto/rand"
import "math/big"
import "fmt"
type Clerk struct {
servers []string
}
func nrand() int64 {
max := big.NewInt(int64(1) << 62)
bigx, _ := rand.Int(rand.Reader, max)
x := bigx.Int64()
return x
}
func MakeClerk(servers []string) *Clerk {
ck := new(Clerk)
ck.servers = servers
// You'll have to add code here.
return ck
}
//
// call() sends an RPC to the rpcname handler on server srv
// with arguments args, waits for the reply, and leaves the
// reply in reply. the reply argument should be a pointer
// to a reply structure.
//
// the return value is true if the server responded, and false
// if call() was not able to contact the server. in particular,
// the reply's contents are only valid if call() returned true.
//
// you should assume that call() will return an
// error after a while if the server is dead.
// don't provide your own time-out mechanism.
//
// please use call() to send all RPCs, in client.go and server.go.
// please don't change this function.
//
func call(srv string, rpcname string,
args interface{}, reply interface{}) bool {
c, errx := rpc.Dial("unix", srv)
if errx != nil {
return false
}
defer c.Close()
err := c.Call(rpcname, args, reply)
if err == nil {
return true
}
fmt.Println(err)
return false
}
//
// fetch the current value for a key.
// returns "" if the key does not exist.
// keeps trying forever in the face of all other errors.
//
func (ck *Clerk) Get(key string) string {
// You will have to modify this function.
args := &GetArgs{Key: key, Id: nrand()}
var reply GetReply
for i := 0; ; i = (i + 1) % len(ck.servers) {
if ok := call(ck.servers[i], "KVPaxos.Get", args, &reply); ok && (reply.Err == OK || reply.Err == ErrNoKey) {
return reply.Value
}
}
}
//
// shared by Put and Append.
//
func (ck *Clerk) PutAppend(key string, value string, op string) {
// You will have to modify this function.
args := &PutAppendArgs{Key: key, Value: value, Op: op, Id: nrand()}
for i := 0; ; i = (i + 1) % len(ck.servers) {
// fresh reply per attempt, for the same gob reason as in Get().
var reply PutAppendReply
if ok := call(ck.servers[i], "KVPaxos.PutAppend", args, &reply); ok && reply.Err == OK {
return
}
}
}
func (ck *Clerk) Put(key string, value string) {
ck.PutAppend(key, value, "Put")
}
func (ck *Clerk) Append(key string, value string) {
ck.PutAppend(key, value, "Append")
}
================================================
FILE: src/kvpaxos/common.go
================================================
package kvpaxos
const (
OK = "OK"
ErrNoKey = "ErrNoKey"
ErrPending = "ErrPending"
ErrForgotten = "ErrForgotten"
)
const (
Get = "Get"
Put = "Put"
Append = "Append"
)
type Err string
// Put or Append
type PutAppendArgs struct {
// You'll have to add definitions here.
Key string
Value string
Op string // "Put" or "Append"
Id int64
// You'll have to add definitions here.
// Field names must start with capital letters,
// otherwise RPC will break.
}
type PutAppendReply struct {
Err Err
}
type GetArgs struct {
Key string
Id int64
}
type GetReply struct {
Err Err
Value string
}
================================================
FILE: src/kvpaxos/server.go
================================================
package kvpaxos
import "net"
import "fmt"
import "net/rpc"
import "log"
import "paxos"
import "sync"
import "sync/atomic"
import "os"
import "syscall"
import "encoding/gob"
import "math/rand"
import "time"
const Debug = 0
func DPrintf(format string, a ...interface{}) (n int, err error) {
if Debug > 0 {
log.Printf(format, a...)
}
return
}
type Op struct {
// Your definitions here.
// Field names must start with capital letters,
// otherwise RPC will break.
OpName string
Key string
Value string
Id int64
}
type KVPaxos struct {
mu sync.Mutex
l net.Listener
me int
dead int32 // for testing
unreliable int32 // for testing
px *paxos.Paxos
// Your definitions here.
content map[string]string
seq int // seq for next req
history map[int64]bool
}
func (kv *KVPaxos) apply(op *Op) {
switch op.OpName {
case Put:
kv.content[op.Key] = op.Value
case Append:
kv.content[op.Key] += op.Value
default:
// Get mutates nothing.
}
// Record the op id for duplicate detection; a Get reply is
// served from kv.content, so no value needs to be stored here.
kv.history[op.Id] = true
}
// Try to get op decided by some Paxos instance: keep increasing
// seq until an instance decides the op we proposed, applying each
// chosen value to the kv store along the way.
func (kv *KVPaxos) TryDecide(op Op) (Err, string) {
// TODO concurrency optimization
kv.mu.Lock()
defer kv.mu.Unlock()
if _, ok := kv.history[op.Id]; ok {
if op.OpName == Get {
return OK, kv.content[op.Key]
} else {
return OK, ""
}
}
chosen := false
for !chosen {
timeout := 0 * time.Millisecond
sleep_interval := 10 * time.Millisecond
kv.px.Start(kv.seq, op)
INNER:
for {
fate, v := kv.px.Status(kv.seq)
switch fate {
case paxos.Decided:
{
_op := v.(Op)
kv.px.Done(kv.seq)
kv.apply(&_op)
kv.seq++
if _op.Id == op.Id {
if _op.OpName == Get {
if v, ok := kv.content[op.Key]; ok {
return OK, v
} else {
return ErrNoKey, ""
}
}
// for put/append operation
chosen = true
}
break INNER
}
case paxos.Pending:
{
if timeout > 10*time.Second {
return ErrPending, ""
} else {
time.Sleep(sleep_interval)
timeout += sleep_interval
sleep_interval *= 2
}
}
default:
// Forgotten: should not happen, since kv.seq never falls
// below our own Done point.
return ErrForgotten, ""
}
}
}
return OK, ""
}
func (kv *KVPaxos) Get(args *GetArgs, reply *GetReply) error {
// Your code here.
op := Op{OpName: Get, Key: args.Key, Value: "", Id: args.Id}
reply.Err, reply.Value = kv.TryDecide(op)
return nil
}
func (kv *KVPaxos) PutAppend(args *PutAppendArgs, reply *PutAppendReply) error {
// Your code here.
op := Op{OpName: args.Op, Key: args.Key, Value: args.Value, Id: args.Id}
reply.Err, _ = kv.TryDecide(op)
return nil
}
// tell the server to shut itself down.
// please do not change these two functions.
func (kv *KVPaxos) kill() {
DPrintf("Kill(%d): die\n", kv.me)
atomic.StoreInt32(&kv.dead, 1)
kv.l.Close()
kv.px.Kill()
}
// call this to find out if the server is dead.
func (kv *KVPaxos) isdead() bool {
return atomic.LoadInt32(&kv.dead) != 0
}
// please do not change these two functions.
func (kv *KVPaxos) setunreliable(what bool) {
if what {
atomic.StoreInt32(&kv.unreliable, 1)
} else {
atomic.StoreInt32(&kv.unreliable, 0)
}
}
func (kv *KVPaxos) isunreliable() bool {
return atomic.LoadInt32(&kv.unreliable) != 0
}
//
// servers[] contains the ports of the set of
// servers that will cooperate via Paxos to
// form the fault-tolerant key/value service.
// me is the index of the current server in servers[].
//
func StartServer(servers []string, me int) *KVPaxos {
// call gob.Register on structures you want
// Go's RPC library to marshall/unmarshall.
gob.Register(Op{})
kv := new(KVPaxos)
kv.me = me
// Your initialization code here.
kv.content = make(map[string]string)
kv.history = make(map[int64]bool)
kv.seq = 0
rpcs := rpc.NewServer()
rpcs.Register(kv)
kv.px = paxos.Make(servers, me, rpcs)
os.Remove(servers[me])
l, e := net.Listen("unix", servers[me])
if e != nil {
log.Fatal("listen error: ", e)
}
kv.l = l
// please do not change any of the following code,
// or do anything to subvert it.
go func() {
for kv.isdead() == false {
conn, err := kv.l.Accept()
if err == nil && kv.isdead() == false {
if kv.isunreliable() && (rand.Int63()%1000) < 100 {
// discard the request.
conn.Close()
} else if kv.isunreliable() && (rand.Int63()%1000) < 200 {
// process the request but force discard of reply.
c1 := conn.(*net.UnixConn)
f, _ := c1.File()
err := syscall.Shutdown(int(f.Fd()), syscall.SHUT_WR)
if err != nil {
fmt.Printf("shutdown: %v\n", err)
}
go rpcs.ServeConn(conn)
} else {
go rpcs.ServeConn(conn)
}
} else if err == nil {
conn.Close()
}
if err != nil && kv.isdead() == false {
fmt.Printf("KVPaxos(%v) accept: %v\n", me, err.Error())
kv.kill()
}
}
}()
return kv
}
================================================
FILE: src/kvpaxos/test.go
================================================
package kvpaxos
import "testing"
import "runtime"
import "strconv"
import "os"
import "time"
import "fmt"
import "math/rand"
import "strings"
import "sync/atomic"
func check(t *testing.T, ck *Clerk, key string, value string) {
v := ck.Get(key)
if v != value {
t.Fatalf("Get(%v) -> %v, expected %v", key, v, value)
}
}
func port(tag string, host int) string {
s := "/var/tmp/824-"
s += strconv.Itoa(os.Getuid()) + "/"
os.Mkdir(s, 0777)
s += "kv-"
s += strconv.Itoa(os.Getpid()) + "-"
s += tag + "-"
s += strconv.Itoa(host)
return s
}
func cleanup(kva []*KVPaxos) {
for i := 0; i < len(kva); i++ {
if kva[i] != nil {
kva[i].kill()
}
}
}
// predict effect of Append(k, val)
// if old value is prev.
func NextValue(prev string, val string) string {
return prev + val
}
func TestBasic(t *testing.T) {
runtime.GOMAXPROCS(4)
const nservers = 3
var kva []*KVPaxos = make([]*KVPaxos, nservers)
var kvh []string = make([]string, nservers)
defer cleanup(kva)
for i := 0; i < nservers; i++ {
kvh[i] = port("basic", i)
}
for i := 0; i < nservers; i++ {
kva[i] = StartServer(kvh, i)
}
ck := MakeClerk(kvh)
var cka [nservers]*Clerk
for i := 0; i < nservers; i++ {
cka[i] = MakeClerk([]string{kvh[i]})
}
fmt.Printf("Test: Basic put/append/get ...\n")
ck.Append("app", "x")
ck.Append("app", "y")
check(t, ck, "app", "xy")
ck.Put("a", "aa")
check(t, ck, "a", "aa")
cka[1].Put("a", "aaa")
check(t, cka[2], "a", "aaa")
check(t, cka[1], "a", "aaa")
check(t, ck, "a", "aaa")
fmt.Printf(" ... Passed\n")
fmt.Printf("Test: Concurrent clients ...\n")
for iters := 0; iters < 20; iters++ {
const npara = 15
var ca [npara]chan bool
for nth := 0; nth < npara; nth++ {
ca[nth] = make(chan bool)
go func(me int) {
defer func() { ca[me] <- true }()
ci := (rand.Int() % nservers)
myck := MakeClerk([]string{kvh[ci]})
if (rand.Int() % 1000) < 500 {
myck.Put("b", strconv.Itoa(rand.Int()))
} else {
myck.Get("b")
}
}(nth)
}
for nth := 0; nth < npara; nth++ {
<-ca[nth]
}
var va [nservers]string
for i := 0; i < nservers; i++ {
va[i] = cka[i].Get("b")
if va[i] != va[0] {
t.Fatalf("mismatch")
}
}
}
fmt.Printf(" ... Passed\n")
time.Sleep(1 * time.Second)
}
func TestDone(t *testing.T) {
runtime.GOMAXPROCS(4)
const nservers = 3
var kva []*KVPaxos = make([]*KVPaxos, nservers)
var kvh []string = make([]string, nservers)
defer cleanup(kva)
for i := 0; i < nservers; i++ {
kvh[i] = port("done", i)
}
for i := 0; i < nservers; i++ {
kva[i] = StartServer(kvh, i)
}
ck := MakeClerk(kvh)
var cka [nservers]*Clerk
for pi := 0; pi < nservers; pi++ {
cka[pi] = MakeClerk([]string{kvh[pi]})
}
fmt.Printf("Test: server frees Paxos log memory...\n")
ck.Put("a", "aa")
check(t, ck, "a", "aa")
runtime.GC()
var m0 runtime.MemStats
runtime.ReadMemStats(&m0)
// rtm's m0.Alloc is 2 MB
sz := 1000000
items := 10
for iters := 0; iters < 2; iters++ {
for i := 0; i < items; i++ {
key := strconv.Itoa(i)
value := make([]byte, sz)
for j := 0; j < len(value); j++ {
value[j] = byte((rand.Int() % 100) + 1)
}
ck.Put(key, string(value))
check(t, cka[i%nservers], key, string(value))
}
}
// Put and Get to each of the replicas, in case
// the Done information is piggybacked on
// the Paxos proposer messages.
for iters := 0; iters < 2; iters++ {
for pi := 0; pi < nservers; pi++ {
cka[pi].Put("a", "aa")
check(t, cka[pi], "a", "aa")
}
}
time.Sleep(1 * time.Second)
runtime.GC()
var m1 runtime.MemStats
runtime.ReadMemStats(&m1)
// rtm's m1.Alloc is 45 MB
// fmt.Printf(" Memory: before %v, after %v\n", m0.Alloc, m1.Alloc)
allowed := m0.Alloc + uint64(nservers*items*sz*2)
if m1.Alloc > allowed {
t.Fatalf("Memory use did not shrink enough (Used: %v, allowed: %v).\n", m1.Alloc, allowed)
}
fmt.Printf(" ... Passed\n")
}
func pp(tag string, src int, dst int) string {
s := "/var/tmp/824-"
s += strconv.Itoa(os.Getuid()) + "/"
s += "kv-" + tag + "-"
s += strconv.Itoa(os.Getpid()) + "-"
s += strconv.Itoa(src) + "-"
s += strconv.Itoa(dst)
return s
}
func cleanpp(tag string, n int) {
for i := 0; i < n; i++ {
for j := 0; j < n; j++ {
ij := pp(tag, i, j)
os.Remove(ij)
}
}
}
func part(t *testing.T, tag string, npaxos int, p1 []int, p2 []int, p3 []int) {
cleanpp(tag, npaxos)
pa := [][]int{p1, p2, p3}
for pi := 0; pi < len(pa); pi++ {
p := pa[pi]
for i := 0; i < len(p); i++ {
for j := 0; j < len(p); j++ {
ij := pp(tag, p[i], p[j])
pj := port(tag, p[j])
err := os.Link(pj, ij)
if err != nil {
t.Fatalf("os.Link(%v, %v): %v\n", pj, ij, err)
}
}
}
}
}
func TestPartition(t *testing.T) {
runtime.GOMAXPROCS(4)
tag := "partition"
const nservers = 5
var kva []*KVPaxos = make([]*KVPaxos, nservers)
defer cleanup(kva)
defer cleanpp(tag, nservers)
for i := 0; i < nservers; i++ {
var kvh []string = make([]string, nservers)
for j := 0; j < nservers; j++ {
if j == i {
kvh[j] = port(tag, i)
} else {
kvh[j] = pp(tag, i, j)
}
}
kva[i] = StartServer(kvh, i)
}
defer part(t, tag, nservers, []int{}, []int{}, []int{})
var cka [nservers]*Clerk
for i := 0; i < nservers; i++ {
cka[i] = MakeClerk([]string{port(tag, i)})
}
fmt.Printf("Test: No partition ...\n")
part(t, tag, nservers, []int{0, 1, 2, 3, 4}, []int{}, []int{})
cka[0].Put("1", "12")
cka[2].Put("1", "13")
check(t, cka[3], "1", "13")
fmt.Printf(" ... Passed\n")
fmt.Printf("Test: Progress in majority ...\n")
part(t, tag, nservers, []int{2, 3, 4}, []int{0, 1}, []int{})
cka[2].Put("1", "14")
check(t, cka[4], "1", "14")
fmt.Printf(" ... Passed\n")
fmt.Printf("Test: No progress in minority ...\n")
done0 := make(chan bool)
done1 := make(chan bool)
go func() {
cka[0].Put("1", "15")
done0 <- true
}()
go func() {
cka[1].Get("1")
done1 <- true
}()
select {
case <-done0:
t.Fatalf("Put in minority completed")
case <-done1:
t.Fatalf("Get in minority completed")
case <-time.After(time.Second):
}
check(t, cka[4], "1", "14")
cka[3].Put("1", "16")
check(t, cka[4], "1", "16")
fmt.Printf(" ... Passed\n")
fmt.Printf("Test: Completion after heal ...\n")
part(t, tag, nservers, []int{0, 2, 3, 4}, []int{1}, []int{})
select {
case <-done0:
case <-time.After(30 * 100 * time.Millisecond):
t.Fatalf("Put did not complete")
}
select {
case <-done1:
t.Fatalf("Get in minority completed")
default:
}
check(t, cka[4], "1", "15")
check(t, cka[0], "1", "15")
part(t, tag, nservers, []int{0, 1, 2}, []int{3, 4}, []int{})
select {
case <-done1:
case <-time.After(100 * 100 * time.Millisecond):
t.Fatalf("Get did not complete")
}
check(t, cka[1], "1", "15")
fmt.Printf(" ... Passed\n")
}
func randclerk(kvh []string) *Clerk {
sa := make([]string, len(kvh))
copy(sa, kvh)
for i := range sa {
j := rand.Intn(i + 1)
sa[i], sa[j] = sa[j], sa[i]
}
return MakeClerk(sa)
}
// check that all known appends are present in a value,
// and are in order for each concurrent client.
func checkAppends(t *testing.T, v string, counts []int) {
nclients := len(counts)
for i := 0; i < nclients; i++ {
lastoff := -1
for j := 0; j < counts[i]; j++ {
wanted := "x " + strconv.Itoa(i) + " " + strconv.Itoa(j) + " y"
off := strings.Index(v, wanted)
if off < 0 {
t.Fatalf("missing element in Append result")
}
off1 := strings.LastIndex(v, wanted)
if off1 != off {
t.Fatalf("duplicate element in Append result")
}
if off <= lastoff {
t.Fatalf("wrong order for element in Append result")
}
lastoff = off
}
}
}
func TestUnreliable(t *testing.T) {
runtime.GOMAXPROCS(4)
const nservers = 3
var kva []*KVPaxos = make([]*KVPaxos, nservers)
var kvh []string = make([]string, nservers)
defer cleanup(kva)
for i := 0; i < nservers; i++ {
kvh[i] = port("un", i)
}
for i := 0; i < nservers; i++ {
kva[i] = StartServer(kvh, i)
kva[i].setunreliable(true)
}
ck := MakeClerk(kvh)
var cka [nservers]*Clerk
for i := 0; i < nservers; i++ {
cka[i] = MakeClerk([]string{kvh[i]})
}
fmt.Printf("Test: Basic put/get, unreliable ...\n")
ck.Put("a", "aa")
check(t, ck, "a", "aa")
cka[1].Put("a", "aaa")
check(t, cka[2], "a", "aaa")
check(t, cka[1], "a", "aaa")
check(t, ck, "a", "aaa")
fmt.Printf(" ... Passed\n")
fmt.Printf("Test: Sequence of puts, unreliable ...\n")
for iters := 0; iters < 6; iters++ {
const ncli = 5
var ca [ncli]chan bool
for cli := 0; cli < ncli; cli++ {
ca[cli] = make(chan bool)
go func(me int) {
ok := false
defer func() { ca[me] <- ok }()
myck := randclerk(kvh)
key := strconv.Itoa(me)
vv := myck.Get(key)
myck.Append(key, "0")
vv = NextValue(vv, "0")
myck.Append(key, "1")
vv = NextValue(vv, "1")
myck.Append(key, "2")
vv = NextValue(vv, "2")
time.Sleep(100 * time.Millisecond)
if myck.Get(key) != vv {
t.Fatalf("wrong value")
}
if myck.Get(key) != vv {
t.Fatalf("wrong value")
}
ok = true
}(cli)
}
for cli := 0; cli < ncli; cli++ {
x := <-ca[cli]
if x == false {
t.Fatalf("failure")
}
}
}
fmt.Printf(" ... Passed\n")
fmt.Printf("Test: Concurrent clients, unreliable ...\n")
for iters := 0; iters < 20; iters++ {
const ncli = 15
var ca [ncli]chan bool
for cli := 0; cli < ncli; cli++ {
ca[cli] = make(chan bool)
go func(me int) {
defer func() { ca[me] <- true }()
myck := randclerk(kvh)
if (rand.Int() % 1000) < 500 {
myck.Put("b", strconv.Itoa(rand.Int()))
} else {
myck.Get("b")
}
}(cli)
}
for cli := 0; cli < ncli; cli++ {
<-ca[cli]
}
var va [nservers]string
for i := 0; i < nservers; i++ {
va[i] = cka[i].Get("b")
if va[i] != va[0] {
t.Fatalf("mismatch; 0 got %v, %v got %v", va[0], i, va[i])
}
}
}
fmt.Printf(" ... Passed\n")
fmt.Printf("Test: Concurrent Append to same key, unreliable ...\n")
ck.Put("k", "")
ff := func(me int, ch chan int) {
ret := -1
defer func() { ch <- ret }()
myck := randclerk(kvh)
n := 0
for n < 5 {
myck.Append("k", "x "+strconv.Itoa(me)+" "+strconv.Itoa(n)+" y")
n++
}
ret = n
}
ncli := 5
cha := []chan int{}
for i := 0; i < ncli; i++ {
cha = append(cha, make(chan int))
go ff(i, cha[i])
}
counts := []int{}
for i := 0; i < ncli; i++ {
n := <-cha[i]
if n < 0 {
t.Fatal("client failed")
}
counts = append(counts, n)
}
vx := ck.Get("k")
checkAppends(t, vx, counts)
{
for i := 0; i < nservers; i++ {
vi := cka[i].Get("k")
if vi != vx {
t.Fatalf("mismatch; 0 got %v, %v got %v", vx, i, vi)
}
}
}
fmt.Printf(" ... Passed\n")
time.Sleep(1 * time.Second)
}
func TestHole(t *testing.T) {
runtime.GOMAXPROCS(4)
fmt.Printf("Test: Tolerates holes in paxos sequence ...\n")
tag := "hole"
const nservers = 5
var kva []*KVPaxos = make([]*KVPaxos, nservers)
defer cleanup(kva)
defer cleanpp(tag, nservers)
for i := 0; i < nservers; i++ {
var kvh []string = make([]string, nservers)
for j := 0; j < nservers; j++ {
if j == i {
kvh[j] = port(tag, i)
} else {
kvh[j] = pp(tag, i, j)
}
}
kva[i] = StartServer(kvh, i)
}
defer part(t, tag, nservers, []int{}, []int{}, []int{})
for iters := 0; iters < 5; iters++ {
part(t, tag, nservers, []int{0, 1, 2, 3, 4}, []int{}, []int{})
ck2 := MakeClerk([]string{port(tag, 2)})
ck2.Put("q", "q")
done := int32(0)
const nclients = 10
var ca [nclients]chan bool
for xcli := 0; xcli < nclients; xcli++ {
ca[xcli] = make(chan bool)
go func(cli int) {
ok := false
defer func() { ca[cli] <- ok }()
var cka [nservers]*Clerk
for i := 0; i < nservers; i++ {
cka[i] = MakeClerk([]string{port(tag, i)})
}
key := strconv.Itoa(cli)
last := ""
cka[0].Put(key, last)
for atomic.LoadInt32(&done) == 0 {
ci := (rand.Int() % 2)
if (rand.Int() % 1000) < 500 {
nv := strconv.Itoa(rand.Int())
cka[ci].Put(key, nv)
last = nv
} else {
v := cka[ci].Get(key)
if v != last {
t.Fatalf("%v: wrong value, key %v, wanted %v, got %v",
cli, key, last, v)
}
}
}
ok = true
}(xcli)
}
time.Sleep(3 * time.Second)
part(t, tag, nservers, []int{2, 3, 4}, []int{0, 1}, []int{})
// can majority partition make progress even though
// minority servers were interrupted in the middle of
// paxos agreements?
check(t, ck2, "q", "q")
ck2.Put("q", "qq")
check(t, ck2, "q", "qq")
// restore network, wait for all threads to exit.
part(t, tag, nservers, []int{0, 1, 2, 3, 4}, []int{}, []int{})
atomic.StoreInt32(&done, 1)
ok := true
for i := 0; i < nclients; i++ {
z := <-ca[i]
ok = ok && z
}
if ok == false {
t.Fatal("something is wrong")
}
check(t, ck2, "q", "qq")
}
fmt.Printf(" ... Passed\n")
}
func TestManyPartition(t *testing.T) {
runtime.GOMAXPROCS(4)
fmt.Printf("Test: Many clients, changing partitions ...\n")
tag := "many"
const nservers = 5
var kva []*KVPaxos = make([]*KVPaxos, nservers)
defer cleanup(kva)
defer cleanpp(tag, nservers)
for i := 0; i < nservers; i++ {
var kvh []string = make([]string, nservers)
for j := 0; j < nservers; j++ {
if j == i {
kvh[j] = port(tag, i)
} else {
kvh[j] = pp(tag, i, j)
}
}
kva[i] = StartServer(kvh, i)
kva[i].setunreliable(true)
}
defer part(t, tag, nservers, []int{}, []int{}, []int{})
part(t, tag, nservers, []int{0, 1, 2, 3, 4}, []int{}, []int{})
done := int32(0)
// re-partition periodically
ch1 := make(chan bool)
go func() {
defer func() { ch1 <- true }()
for atomic.LoadInt32(&done) == 0 {
var a [nservers]int
for i := 0; i < nservers; i++ {
a[i] = (rand.Int() % 3)
}
pa := make([][]int, 3)
for i := 0; i < 3; i++ {
pa[i] = make([]int, 0)
for j := 0; j < nservers; j++ {
if a[j] == i {
pa[i] = append(pa[i], j)
}
}
}
part(t, tag, nservers, pa[0], pa[1], pa[2])
time.Sleep(time.Duration(rand.Int63()%200) * time.Millisecond)
}
}()
const nclients = 10
var ca [nclients]chan bool
for xcli := 0; xcli < nclients; xcli++ {
ca[xcli] = make(chan bool)
go func(cli int) {
ok := false
defer func() { ca[cli] <- ok }()
sa := make([]string, nservers)
for i := 0; i < nservers; i++ {
sa[i] = port(tag, i)
}
for i := range sa {
j := rand.Intn(i + 1)
sa[i], sa[j] = sa[j], sa[i]
}
myck := MakeClerk(sa)
key := strconv.Itoa(cli)
last := ""
myck.Put(key, last)
for atomic.LoadInt32(&done) == 0 {
if (rand.Int() % 1000) < 500 {
nv := strconv.Itoa(rand.Int())
myck.Append(key, nv)
last = NextValue(last, nv)
} else {
v := myck.Get(key)
if v != last {
t.Fatalf("%v: get wrong value, key %v, wanted %v, got %v",
cli, key, last, v)
}
}
}
ok = true
}(xcli)
}
time.Sleep(20 * time.Second)
atomic.StoreInt32(&done, 1)
<-ch1
part(t, tag, nservers, []int{0, 1, 2, 3, 4}, []int{}, []int{})
ok := true
for i := 0; i < nclients; i++ {
z := <-ca[i]
ok = ok && z
}
if ok {
fmt.Printf(" ... Passed\n")
}
}
================================================
FILE: src/kvraft/client.go
================================================
package raftkv
import "labrpc"
import "crypto/rand"
import "math/big"
type Clerk struct {
servers []*labrpc.ClientEnd
// You will have to modify this struct.
}
func nrand() int64 {
max := big.NewInt(int64(1) << 62)
bigx, _ := rand.Int(rand.Reader, max)
x := bigx.Int64()
return x
}
func MakeClerk(servers []*labrpc.ClientEnd) *Clerk {
ck := new(Clerk)
ck.servers = servers
// You'll have to add code here.
return ck
}
//
// fetch the current value for a key.
// returns "" if the key does not exist.
// keeps trying forever in the face of all other errors.
//
// you can send an RPC with code like this:
// ok := ck.servers[i].Call("KVServer.Get", &args, &reply)
//
// the types of args and reply (including whether they are pointers)
// must match the declared types of the RPC handler function's
// arguments. and reply must be passed as a pointer.
//
func (ck *Clerk) Get(key string) string {
// You will have to modify this function.
return ""
}
//
// shared by Put and Append.
//
// you can send an RPC with code like this:
// ok := ck.servers[i].Call("KVServer.PutAppend", &args, &reply)
//
// the types of args and reply (including whether they are pointers)
// must match the declared types of the RPC handler function's
// arguments. and reply must be passed as a pointer.
//
func (ck *Clerk) PutAppend(key string, value string, op string) {
// You will have to modify this function.
}
func (ck *Clerk) Put(key string, value string) {
ck.PutAppend(key, value, "Put")
}
func (ck *Clerk) Append(key string, value string) {
ck.PutAppend(key, value, "Append")
}
================================================
FILE: src/kvraft/common.go
================================================
package raftkv
const (
OK = "OK"
ErrNoKey = "ErrNoKey"
)
type Err string
// Put or Append
type PutAppendArgs struct {
Key string
Value string
Op string // "Put" or "Append"
// You'll have to add definitions here.
// Field names must start with capital letters,
// otherwise RPC will break.
}
type PutAppendReply struct {
WrongLeader bool
Err Err
}
type GetArgs struct {
Key string
// You'll have to add definitions here.
}
type GetReply struct {
WrongLeader bool
Err Err
Value string
}
================================================
FILE: src/kvraft/config.go
================================================
package raftkv
import "labrpc"
import "testing"
import "os"
// import "log"
import crand "crypto/rand"
import "math/big"
import "math/rand"
import "encoding/base64"
import "sync"
import "runtime"
import "raft"
import "fmt"
import "time"
import "sync/atomic"
func randstring(n int) string {
b := make([]byte, 2*n)
crand.Read(b)
s := base64.URLEncoding.EncodeToString(b)
return s[0:n]
}
func makeSeed() int64 {
max := big.NewInt(int64(1) << 62)
bigx, _ := crand.Int(crand.Reader, max)
x := bigx.Int64()
return x
}
// Randomize server handles
func random_handles(kvh []*labrpc.ClientEnd) []*labrpc.ClientEnd {
sa := make([]*labrpc.ClientEnd, len(kvh))
copy(sa, kvh)
for i := range sa {
j := rand.Intn(i + 1)
sa[i], sa[j] = sa[j], sa[i]
}
return sa
}
type config struct {
mu sync.Mutex
t *testing.T
net *labrpc.Network
n int
kvservers []*KVServer
saved []*raft.Persister
endnames [][]string // names of each server's sending ClientEnds
clerks map[*Clerk][]string
nextClientId int
maxraftstate int
start time.Time // time at which make_config() was called
// begin()/end() statistics
t0 time.Time // time at which test_test.go called cfg.begin()
rpcs0 int // rpcTotal() at start of test
ops int32 // number of clerk get/put/append method calls
}
func (cfg *config) checkTimeout() {
// enforce a two minute real-time limit on each test
if !cfg.t.Failed() && time.Since(cfg.start) > 120*time.Second {
cfg.t.Fatal("test took longer than 120 seconds")
}
}
func (cfg *config) cleanup() {
cfg.mu.Lock()
defer cfg.mu.Unlock()
for i := 0; i < len(cfg.kvservers); i++ {
if cfg.kvservers[i] != nil {
cfg.kvservers[i].Kill()
}
}
cfg.net.Cleanup()
cfg.checkTimeout()
}
// Maximum log size across all servers
func (cfg *config) LogSize() int {
logsize := 0
for i := 0; i < cfg.n; i++ {
n := cfg.saved[i].RaftStateSize()
if n > logsize {
logsize = n
}
}
return logsize
}
// Maximum snapshot size across all servers
func (cfg *config) SnapshotSize() int {
snapshotsize := 0
for i := 0; i < cfg.n; i++ {
n := cfg.saved[i].SnapshotSize()
if n > snapshotsize {
snapshotsize = n
}
}
return snapshotsize
}
// attach server i to servers listed in to
// caller must hold cfg.mu
func (cfg *config) connectUnlocked(i int, to []int) {
// log.Printf("connect peer %d to %v\n", i, to)
// outgoing socket files
for j := 0; j < len(to); j++ {
endname := cfg.endnames[i][to[j]]
cfg.net.Enable(endname, true)
}
// incoming socket files
for j := 0; j < len(to); j++ {
endname := cfg.endnames[to[j]][i]
cfg.net.Enable(endname, true)
}
}
func (cfg *config) connect(i int, to []int) {
cfg.mu.Lock()
defer cfg.mu.Unlock()
cfg.connectUnlocked(i, to)
}
// detach server i from the servers listed in from
// caller must hold cfg.mu
func (cfg *config) disconnectUnlocked(i int, from []int) {
// log.Printf("disconnect peer %d from %v\n", i, from)
// outgoing socket files
for j := 0; j < len(from); j++ {
if cfg.endnames[i] != nil {
endname := cfg.endnames[i][from[j]]
cfg.net.Enable(endname, false)
}
}
// incoming socket files
for j := 0; j < len(from); j++ {
if cfg.endnames[j] != nil {
endname := cfg.endnames[from[j]][i]
cfg.net.Enable(endname, false)
}
}
}
func (cfg *config) disconnect(i int, from []int) {
cfg.mu.Lock()
defer cfg.mu.Unlock()
cfg.disconnectUnlocked(i, from)
}
func (cfg *config) All() []int {
all := make([]int, cfg.n)
for i := 0; i < cfg.n; i++ {
all[i] = i
}
return all
}
func (cfg *config) ConnectAll() {
cfg.mu.Lock()
defer cfg.mu.Unlock()
for i := 0; i < cfg.n; i++ {
cfg.connectUnlocked(i, cfg.All())
}
}
// Sets up 2 partitions with connectivity between servers in each partition.
func (cfg *config) partition(p1 []int, p2 []int) {
cfg.mu.Lock()
defer cfg.mu.Unlock()
// log.Printf("partition servers into: %v %v\n", p1, p2)
for i := 0; i < len(p1); i++ {
cfg.disconnectUnlocked(p1[i], p2)
cfg.connectUnlocked(p1[i], p1)
}
for i := 0; i < len(p2); i++ {
cfg.disconnectUnlocked(p2[i], p1)
cfg.connectUnlocked(p2[i], p2)
}
}
// Create a clerk with clerk specific server names.
// Give it connections to all of the servers, but for
// now enable only connections to servers in to[].
func (cfg *config) makeClient(to []int) *Clerk {
cfg.mu.Lock()
defer cfg.mu.Unlock()
// a fresh set of ClientEnds.
ends := make([]*labrpc.ClientEnd, cfg.n)
endnames := make([]string, cfg.n)
for j := 0; j < cfg.n; j++ {
endnames[j] = randstring(20)
ends[j] = cfg.net.MakeEnd(endnames[j])
cfg.net.Connect(endnames[j], j)
}
ck := MakeClerk(random_handles(ends))
cfg.clerks[ck] = endnames
cfg.nextClientId++
cfg.ConnectClientUnlocked(ck, to)
return ck
}
func (cfg *config) deleteClient(ck *Clerk) {
cfg.mu.Lock()
defer cfg.mu.Unlock()
v := cfg.clerks[ck]
for i := 0; i < len(v); i++ {
os.Remove(v[i])
}
delete(cfg.clerks, ck)
}
// caller should hold cfg.mu
func (cfg *config) ConnectClientUnlocked(ck *Clerk, to []int) {
// log.Printf("ConnectClient %v to %v\n", ck, to)
endnames := cfg.clerks[ck]
for j := 0; j < len(to); j++ {
s := endnames[to[j]]
cfg.net.Enable(s, true)
}
}
func (cfg *config) ConnectClient(ck *Clerk, to []int) {
cfg.mu.Lock()
defer cfg.mu.Unlock()
cfg.ConnectClientUnlocked(ck, to)
}
// caller should hold cfg.mu
func (cfg *config) DisconnectClientUnlocked(ck *Clerk, from []int) {
// log.Printf("DisconnectClient %v from %v\n", ck, from)
endnames := cfg.clerks[ck]
for j := 0; j < len(from); j++ {
s := endnames[from[j]]
cfg.net.Enable(s, false)
}
}
func (cfg *config) DisconnectClient(ck *Clerk, from []int) {
cfg.mu.Lock()
defer cfg.mu.Unlock()
cfg.DisconnectClientUnlocked(ck, from)
}
// Shutdown a server by isolating it
func (cfg *config) ShutdownServer(i int) {
cfg.mu.Lock()
defer cfg.mu.Unlock()
cfg.disconnectUnlocked(i, cfg.All())
// disable client connections to the server.
// it's important to do this before creating
// the new Persister in saved[i], to avoid
// the possibility of the server returning a
// positive reply to an Append but persisting
// the result in the superseded Persister.
cfg.net.DeleteServer(i)
// a fresh persister, in case old instance
// continues to update the Persister.
// but copy old persister's content so that we always
// pass Make() the last persisted state.
if cfg.saved[i] != nil {
cfg.saved[i] = cfg.saved[i].Copy()
}
kv := cfg.kvservers[i]
if kv != nil {
cfg.mu.Unlock()
kv.Kill()
cfg.mu.Lock()
cfg.kvservers[i] = nil
}
}
// To restart a server, call ShutdownServer(i) first, then StartServer(i).
func (cfg *config) StartServer(i int) {
cfg.mu.Lock()
// a fresh set of outgoing ClientEnd names.
cfg.endnames[i] = make([]string, cfg.n)
for j := 0; j < cfg.n; j++ {
cfg.endnames[i][j] = randstring(20)
}
// a fresh set of ClientEnds.
ends := make([]*labrpc.ClientEnd, cfg.n)
for j := 0; j < cfg.n; j++ {
ends[j] = cfg.net.MakeEnd(cfg.endnames[i][j])
cfg.net.Connect(cfg.endnames[i][j], j)
}
// a fresh persister, so old instance doesn't overwrite
// new instance's persisted state.
// give the fresh persister a copy of the old persister's
// state, so that the spec is that we pass StartKVServer()
// the last persisted state.
if cfg.saved[i] != nil {
cfg.saved[i] = cfg.saved[i].Copy()
} else {
cfg.saved[i] = raft.MakePersister()
}
cfg.mu.Unlock()
cfg.kvservers[i] = StartKVServer(ends, i, cfg.saved[i], cfg.maxraftstate)
kvsvc := labrpc.MakeService(cfg.kvservers[i])
rfsvc := labrpc.MakeService(cfg.kvservers[i].rf)
srv := labrpc.MakeServer()
srv.AddService(kvsvc)
srv.AddService(rfsvc)
cfg.net.AddServer(i, srv)
}
func (cfg *config) Leader() (bool, int) {
cfg.mu.Lock()
defer cfg.mu.Unlock()
for i := 0; i < cfg.n; i++ {
_, is_leader := cfg.kvservers[i].rf.GetState()
if is_leader {
return true, i
}
}
return false, 0
}
// Partition servers into 2 groups and put current leader in minority
func (cfg *config) make_partition() ([]int, []int) {
_, l := cfg.Leader()
p1 := make([]int, cfg.n/2+1)
p2 := make([]int, cfg.n/2)
j := 0
for i := 0; i < cfg.n; i++ {
if i != l {
if j < len(p1) {
p1[j] = i
} else {
p2[j-len(p1)] = i
}
j++
}
}
p2[len(p2)-1] = l
return p1, p2
}
var ncpu_once sync.Once
func make_config(t *testing.T, n int, unreliable bool, maxraftstate int) *config {
ncpu_once.Do(func() {
if runtime.NumCPU() < 2 {
fmt.Printf("warning: only one CPU, which may conceal locking bugs\n")
}
rand.Seed(makeSeed())
})
runtime.GOMAXPROCS(4)
cfg := &config{}
cfg.t = t
cfg.net = labrpc.MakeNetwork()
cfg.n = n
cfg.kvservers = make([]*KVServer, cfg.n)
cfg.saved = make([]*raft.Persister, cfg.n)
cfg.endnames = make([][]string, cfg.n)
cfg.clerks = make(map[*Clerk][]string)
cfg.nextClientId = cfg.n + 1000 // client ids start 1000 above the highest serverid
cfg.maxraftstate = maxraftstate
cfg.start = time.Now()
// create a full set of KV servers.
for i := 0; i < cfg.n; i++ {
cfg.StartServer(i)
}
cfg.ConnectAll()
cfg.net.Reliable(!unreliable)
return cfg
}
func (cfg *config) rpcTotal() int {
return cfg.net.GetTotalCount()
}
SYMBOL INDEX (752 symbols across 67 files)
FILE: lecture/l02 PRC_threads_crawler_kv/crawler.go
function Serial (line 17) | func Serial(url string, fetcher Fetcher, fetched map[string]bool) {
type fetchState (line 36) | type fetchState struct
function ConcurrentMutex (line 41) | func ConcurrentMutex(url string, fetcher Fetcher, f *fetchState) {
function makeState (line 66) | func makeState() *fetchState {
function worker (line 76) | func worker(url string, ch chan []string, fetcher Fetcher) {
function master (line 85) | func master(ch chan []string, fetcher Fetcher) {
function ConcurrentChannel (line 103) | func ConcurrentChannel(url string, fetcher Fetcher) {
function main (line 115) | func main() {
type Fetcher (line 130) | type Fetcher interface
type fakeFetcher (line 136) | type fakeFetcher
method Fetch (line 143) | func (f fakeFetcher) Fetch(url string) ([]string, error) {
type fakeResult (line 138) | type fakeResult struct
FILE: lecture/l02 PRC_threads_crawler_kv/kv.go
constant OK (line 16) | OK = "OK"
constant ErrNoKey (line 17) | ErrNoKey = "ErrNoKey"
type Err (line 20) | type Err
type PutArgs (line 22) | type PutArgs struct
type PutReply (line 27) | type PutReply struct
type GetArgs (line 31) | type GetArgs struct
type GetReply (line 35) | type GetReply struct
function connect (line 44) | func connect() *rpc.Client {
function get (line 52) | func get(key string) string {
function put (line 64) | func put(key string, val string) {
type KV (line 79) | type KV struct
method Get (line 106) | func (kv *KV) Get(args *GetArgs, reply *GetReply) error {
method Put (line 121) | func (kv *KV) Put(args *PutArgs, reply *PutReply) error {
function server (line 84) | func server() {
function main (line 134) | func main() {
FILE: src/diskv/client.go
type Clerk (line 11) | type Clerk struct
method Get (line 85) | func (ck *Clerk) Get(key string) string {
method PutAppend (line 122) | func (ck *Clerk) PutAppend(key string, value string, op string) {
method Put (line 160) | func (ck *Clerk) Put(key string, value string) {
method Append (line 163) | func (ck *Clerk) Append(key string, value string) {
function nrand (line 18) | func nrand() int64 {
function MakeClerk (line 25) | func MakeClerk(shardmasters []string) *Clerk {
function call (line 49) | func call(srv string, rpcname string,
function key2shard (line 71) | func key2shard(key string) int {
FILE: src/diskv/common.go
constant OK (line 13) | OK = "OK"
constant ErrNoKey (line 14) | ErrNoKey = "ErrNoKey"
constant ErrWrongGroup (line 15) | ErrWrongGroup = "ErrWrongGroup"
type Err (line 18) | type Err
type PutAppendArgs (line 20) | type PutAppendArgs struct
type PutAppendReply (line 30) | type PutAppendReply struct
type GetArgs (line 34) | type GetArgs struct
type GetReply (line 39) | type GetReply struct
FILE: src/diskv/dist_test.go
function port (line 10) | func port(tag string, host int) string {
function NextValue (line 22) | func NextValue(prev string, val string) string {
function mcleanup (line 26) | func mcleanup(sma []*shardmaster.ShardMaster) {
function TestConcurretnUnreliable (line 34) | func TestConcurretnUnreliable(t *tetsing.T) {
FILE: src/diskv/server.go
constant Debug (line 20) | Debug = 0
function DPrintf (line 22) | func DPrintf(format string, a ...interface{}) (n int, err error) {
type Op (line 29) | type Op struct
type DisKV (line 33) | type DisKV struct
method shardDir (line 56) | func (kv *DisKV) shardDir(shard int) string {
method encodeKey (line 73) | func (kv *DisKV) encodeKey(key string) string {
method decodeKey (line 77) | func (kv *DisKV) decodeKey(filename string) (string, error) {
method fileGet (line 83) | func (kv *DisKV) fileGet(shard int, key string) (string, error) {
method filePut (line 92) | func (kv *DisKV) filePut(shard int, key string, content string) error {
method fileReadShard (line 105) | func (kv *DisKV) fileReadShard(shard int) map[string]string {
method fileReplaceShard (line 130) | func (kv *DisKV) fileReplaceShard(shard int, m map[string]string) {
method Get (line 138) | func (kv *DisKV) Get(args *GetArgs, reply *GetReply) error {
method PutAppend (line 144) | func (kv *DisKV) PutAppend(args *PutAppendArgs, reply *PutAppendReply)...
method tick (line 153) | func (kv *DisKV) tick() {
method kill (line 159) | func (kv *DisKV) kill() {
method isdead (line 166) | func (kv *DisKV) isdead() bool {
method Setunreliable (line 171) | func (kv *DisKV) Setunreliable(what bool) {
method isunreliable (line 179) | func (kv *DisKV) isunreliable() bool {
function StartServer (line 198) | func StartServer(gid int64, shardmasters []string,
FILE: src/diskv/test.go
type tServer (line 21) | type tServer struct
type tGroup (line 29) | type tGroup struct
type tCluster (line 35) | type tCluster struct
method newport (line 52) | func (tc *tCluster) newport() string {
method start1 (line 62) | func (tc *tCluster) start1(gi int, si int) {
method kill1 (line 103) | func (tc *tCluster) kill1(gi int, si int, deletefiles bool) {
method cleanup (line 119) | func (tc *tCluster) cleanup() {
method shardclerk (line 140) | func (tc *tCluster) shardclerk() *shardmaster.Clerk {
method clerk (line 144) | func (tc *tCluster) clerk() *Clerk {
method join (line 148) | func (tc *tCluster) join(gi int) {
method leave (line 156) | func (tc *tCluster) leave(gi int) {
method space (line 161) | func (tc *tCluster) space() int64 {
function randstring (line 45) | func randstring(n int) string {
function setup (line 173) | func setup(t *testing.T, tag string, ngroups int, nreplicas int, unrelia...
function Test4Basic (line 239) | func Test4Basic(t *testing.T) {
function Test4Move (line 296) | func Test4Move(t *testing.T) {
function Test4Limp (line 354) | func Test4Limp(t *testing.T) {
function doConcurrent (line 420) | func doConcurrent(t *testing.T, unreliable bool) {
function Test4Concurrent (line 466) | func Test4Concurrent(t *testing.T) {
function Test4ConcurrentUnreliable (line 472) | func Test4ConcurrentUnreliable(t *testing.T) {
function Test5BasicPersistence (line 486) | func Test5BasicPersistence(t *testing.T) {
function Test5OneRestart (line 544) | func Test5OneRestart(t *testing.T) {
function Test5DiskUse (line 599) | func Test5DiskUse(t *testing.T) {
function Test5AppendUse (line 696) | func Test5AppendUse(t *testing.T) {
function Test5OneLostDisk (line 795) | func Test5OneLostDisk(t *testing.T) {
function Test5OneLostOneDown (line 874) | func Test5OneLostOneDown(t *testing.T) {
function checkAppends (line 965) | func checkAppends(t *testing.T, v string, counts []int) {
function doConcurrentCrash (line 987) | func doConcurrentCrash(t *testing.T, unreliable bool) {
function Test5ConcurrentCrashReliable (line 1077) | func Test5ConcurrentCrashReliable(t *testing.T) {
function Test5Simultaneous (line 1086) | func Test5Simultaneous(t *testing.T) {
function Test5RejoinMix1 (line 1139) | func Test5RejoinMix1(t *testing.T) {
function Test5RejoinMix3 (line 1219) | func Test5RejoinMix3(t *testing.T) {
FILE: src/kvpaxos/client.go
type Clerk (line 9) | type Clerk struct
method Get (line 66) | func (ck *Clerk) Get(key string) string {
method PutAppend (line 83) | func (ck *Clerk) PutAppend(key string, value string, op string) {
method Put (line 96) | func (ck *Clerk) Put(key string, value string) {
method Append (line 99) | func (ck *Clerk) Append(key string, value string) {
function nrand (line 13) | func nrand() int64 {
function MakeClerk (line 20) | func MakeClerk(servers []string) *Clerk {
function call (line 44) | func call(srv string, rpcname string,
FILE: src/kvpaxos/common.go
constant OK (line 4) | OK = "OK"
constant ErrNoKey (line 5) | ErrNoKey = "ErrNoKey"
constant ErrPending (line 6) | ErrPending = "ErrPending"
constant ErrForgotten (line 7) | ErrForgotten = "ErrForgotten"
constant Get (line 11) | Get = "Get"
constant Put (line 12) | Put = "Put"
constant Append (line 13) | Append = "Append"
type Err (line 16) | type Err
type PutAppendArgs (line 19) | type PutAppendArgs struct
type PutAppendReply (line 30) | type PutAppendReply struct
type GetArgs (line 34) | type GetArgs struct
type GetReply (line 39) | type GetReply struct
FILE: src/kvpaxos/server.go
constant Debug (line 17) | Debug = 0
function DPrintf (line 19) | func DPrintf(format string, a ...interface{}) (n int, err error) {
type Op (line 26) | type Op struct
type KVPaxos (line 36) | type KVPaxos struct
method apply (line 50) | func (kv *KVPaxos) apply(op *Op) {
method TryDecide (line 68) | func (kv *KVPaxos) TryDecide(op Op) (Err, string) {
method Get (line 126) | func (kv *KVPaxos) Get(args *GetArgs, reply *GetReply) error {
method PutAppend (line 133) | func (kv *KVPaxos) PutAppend(args *PutAppendArgs, reply *PutAppendRepl...
method kill (line 142) | func (kv *KVPaxos) kill() {
method isdead (line 150) | func (kv *KVPaxos) isdead() bool {
method setunreliable (line 155) | func (kv *KVPaxos) setunreliable(what bool) {
method isunreliable (line 163) | func (kv *KVPaxos) isunreliable() bool {
function StartServer (line 173) | func StartServer(servers []string, me int) *KVPaxos {
FILE: src/kvpaxos/test.go
function check (line 14) | func check(t *testing.T, ck *Clerk, key string, value string) {
function port (line 21) | func port(tag string, host int) string {
function cleanup (line 32) | func cleanup(kva []*KVPaxos) {
function NextValue (line 42) | func NextValue(prev string, val string) string {
function TestBasic (line 46) | func TestBasic(t *testing.T) {
function TestDone (line 119) | func TestDone(t *testing.T) {
function pp (line 191) | func pp(tag string, src int, dst int) string {
function cleanpp (line 201) | func cleanpp(tag string, n int) {
function part (line 210) | func part(t *testing.T, tag string, npaxos int, p1 []int, p2 []int, p3 [...
function TestPartition (line 229) | func TestPartition(t *testing.T) {
function randclerk (line 332) | func randclerk(kvh []string) *Clerk {
function checkAppends (line 344) | func checkAppends(t *testing.T, v string, counts []int) {
function TestUnreliable (line 366) | func TestUnreliable(t *testing.T) {
function TestHole (line 521) | func TestHole(t *testing.T) {
function TestManyPartition (line 612) | func TestManyPartition(t *testing.T) {
FILE: src/kvraft/client.go
type Clerk (line 8) | type Clerk struct
method Get (line 39) | func (ck *Clerk) Get(key string) string {
method PutAppend (line 55) | func (ck *Clerk) PutAppend(key string, value string, op string) {
method Put (line 59) | func (ck *Clerk) Put(key string, value string) {
method Append (line 62) | func (ck *Clerk) Append(key string, value string) {
function nrand (line 13) | func nrand() int64 {
function MakeClerk (line 20) | func MakeClerk(servers []*labrpc.ClientEnd) *Clerk {
FILE: src/kvraft/common.go
constant OK (line 4) | OK = "OK"
constant ErrNoKey (line 5) | ErrNoKey = "ErrNoKey"
type Err (line 8) | type Err
type PutAppendArgs (line 11) | type PutAppendArgs struct
type PutAppendReply (line 20) | type PutAppendReply struct
type GetArgs (line 25) | type GetArgs struct
type GetReply (line 30) | type GetReply struct
FILE: src/kvraft/config.go
function randstring (line 19) | func randstring(n int) string {
function makeSeed (line 26) | func makeSeed() int64 {
function random_handles (line 34) | func random_handles(kvh []*labrpc.ClientEnd) []*labrpc.ClientEnd {
type config (line 44) | type config struct
method checkTimeout (line 62) | func (cfg *config) checkTimeout() {
method cleanup (line 69) | func (cfg *config) cleanup() {
method LogSize (line 82) | func (cfg *config) LogSize() int {
method SnapshotSize (line 94) | func (cfg *config) SnapshotSize() int {
method connectUnlocked (line 107) | func (cfg *config) connectUnlocked(i int, to []int) {
method connect (line 123) | func (cfg *config) connect(i int, to []int) {
method disconnectUnlocked (line 131) | func (cfg *config) disconnectUnlocked(i int, from []int) {
method disconnect (line 151) | func (cfg *config) disconnect(i int, from []int) {
method All (line 157) | func (cfg *config) All() []int {
method ConnectAll (line 165) | func (cfg *config) ConnectAll() {
method partition (line 174) | func (cfg *config) partition(p1 []int, p2 []int) {
method makeClient (line 191) | func (cfg *config) makeClient(to []int) *Clerk {
method deleteClient (line 211) | func (cfg *config) deleteClient(ck *Clerk) {
method ConnectClientUnlocked (line 223) | func (cfg *config) ConnectClientUnlocked(ck *Clerk, to []int) {
method ConnectClient (line 232) | func (cfg *config) ConnectClient(ck *Clerk, to []int) {
method DisconnectClientUnlocked (line 239) | func (cfg *config) DisconnectClientUnlocked(ck *Clerk, from []int) {
method DisconnectClient (line 248) | func (cfg *config) DisconnectClient(ck *Clerk, from []int) {
method ShutdownServer (line 255) | func (cfg *config) ShutdownServer(i int) {
method StartServer (line 287) | func (cfg *config) StartServer(i int) {
method Leader (line 325) | func (cfg *config) Leader() (bool, int) {
method make_partition (line 339) | func (cfg *config) make_partition() ([]int, []int) {
method rpcTotal (line 392) | func (cfg *config) rpcTotal() int {
method begin (line 399) | func (cfg *config) begin(description string) {
method op (line 406) | func (cfg *config) op() {
method end (line 414) | func (cfg *config) end() {
function make_config (line 360) | func make_config(t *testing.T, n int, unreliable bool, maxraftstate int)...
FILE: src/kvraft/server.go
constant Debug (line 11) | Debug = 0
function DPrintf (line 13) | func DPrintf(format string, a ...interface{}) (n int, err error) {
type Op (line 21) | type Op struct
type KVServer (line 27) | type KVServer struct
method Get (line 39) | func (kv *KVServer) Get(args *GetArgs, reply *GetReply) {
method PutAppend (line 43) | func (kv *KVServer) PutAppend(args *PutAppendArgs, reply *PutAppendRep...
method Kill (line 53) | func (kv *KVServer) Kill() {
function StartKVServer (line 72) | func StartKVServer(servers []*labrpc.ClientEnd, me int, persister *raft....
FILE: src/kvraft/test.go
constant electionTimeout (line 16) | electionTimeout = 1 * time.Second
constant linearizabilityCheckTimeout (line 18) | linearizabilityCheckTimeout = 1 * time.Second
function Get (line 21) | func Get(cfg *config, ck *Clerk, key string) string {
function Put (line 27) | func Put(cfg *config, ck *Clerk, key string, value string) {
function Append (line 32) | func Append(cfg *config, ck *Clerk, key string, value string) {
function check (line 37) | func check(cfg *config, t *testing.T, ck *Clerk, key string, value strin...
function run_client (line 45) | func run_client(t *testing.T, cfg *config, me int, ca chan bool, fn func...
function spawn_clients_and_wait (line 55) | func spawn_clients_and_wait(t *testing.T, cfg *config, ncli int, fn func...
function NextValue (line 72) | func NextValue(prev string, val string) string {
function checkClntAppends (line 78) | func checkClntAppends(t *testing.T, clnt int, v string, count int) {
function checkConcurrentAppends (line 99) | func checkConcurrentAppends(t *testing.T, v string, counts []int) {
function partitioner (line 122) | func partitioner(t *testing.T, cfg *config, ch chan bool, done *int32) {
function GenericTest (line 151) | func GenericTest(t *testing.T, part string, nclients int, unreliable boo...
function GenericTestLinearizability (line 285) | func GenericTestLinearizability(t *testing.T, part string, nclients int,...
function TestBasic3A (line 426) | func TestBasic3A(t *testing.T) {
function TestConcurrent3A (line 431) | func TestConcurrent3A(t *testing.T) {
function TestUnreliable3A (line 436) | func TestUnreliable3A(t *testing.T) {
function TestUnreliableOneKey3A (line 441) | func TestUnreliableOneKey3A(t *testing.T) {
function TestOnePartition3A (line 476) | func TestOnePartition3A(t *testing.T) {
function TestManyPartitionsOneClient3A (line 551) | func TestManyPartitionsOneClient3A(t *testing.T) {
function TestManyPartitionsManyClients3A (line 556) | func TestManyPartitionsManyClients3A(t *testing.T) {
function TestPersistOneClient3A (line 561) | func TestPersistOneClient3A(t *testing.T) {
function TestPersistConcurrent3A (line 566) | func TestPersistConcurrent3A(t *testing.T) {
function TestPersistConcurrentUnreliable3A (line 571) | func TestPersistConcurrentUnreliable3A(t *testing.T) {
function TestPersistPartition3A (line 576) | func TestPersistPartition3A(t *testing.T) {
function TestPersistPartitionUnreliable3A (line 581) | func TestPersistPartitionUnreliable3A(t *testing.T) {
function TestPersistPartitionUnreliableLinearizable3A (line 586) | func TestPersistPartitionUnreliableLinearizable3A(t *testing.T) {
function TestSnapshotRPC3B (line 597) | func TestSnapshotRPC3B(t *testing.T) {
function TestSnapshotSize3B (line 653) | func TestSnapshotSize3B(t *testing.T) {
function TestSnapshotRecover3B (line 684) | func TestSnapshotRecover3B(t *testing.T) {
function TestSnapshotRecoverManyClients3B (line 689) | func TestSnapshotRecoverManyClients3B(t *testing.T) {
function TestSnapshotUnreliable3B (line 694) | func TestSnapshotUnreliable3B(t *testing.T) {
function TestSnapshotUnreliableRecover3B (line 699) | func TestSnapshotUnreliableRecover3B(t *testing.T) {
function TestSnapshotUnreliableRecoverConcurrentPartition3B (line 704) | func TestSnapshotUnreliableRecoverConcurrentPartition3B(t *testing.T) {
function TestSnapshotUnreliableRecoverConcurrentPartitionLinearizable3B (line 709) | func TestSnapshotUnreliableRecoverConcurrentPartitionLinearizable3B(t *t...
FILE: src/labgob/labgob.go
type LabEncoder (line 22) | type LabEncoder struct
method Encode (line 32) | func (enc *LabEncoder) Encode(e interface{}) error {
method EncodeValue (line 37) | func (enc *LabEncoder) EncodeValue(value reflect.Value) error {
function NewEncoder (line 26) | func NewEncoder(w io.Writer) *LabEncoder {
type LabDecoder (line 42) | type LabDecoder struct
method Decode (line 52) | func (dec *LabDecoder) Decode(e interface{}) error {
function NewDecoder (line 46) | func NewDecoder(r io.Reader) *LabDecoder {
function Register (line 58) | func Register(value interface{}) {
function RegisterName (line 63) | func RegisterName(name string, value interface{}) {
function checkValue (line 68) | func checkValue(value interface{}) {
function checkType (line 72) | func checkType(t reflect.Type) {
function checkDefault (line 122) | func checkDefault(value interface{}) {
function checkDefault1 (line 129) | func checkDefault1(value reflect.Value, depth int, name string) {
FILE: src/labgob/test_test.go
type T1 (line 7) | type T1 struct
type T2 (line 14) | type T2 struct
type T3 (line 20) | type T3 struct
function TestGOB (line 27) | func TestGOB(t *testing.T) {
type T4 (line 110) | type T4 struct
function TestCapital (line 119) | func TestCapital(t *testing.T) {
function TestDefault (line 146) | func TestDefault(t *testing.T) {
FILE: src/labrpc/labrpc.go
type reqMsg (line 62) | type reqMsg struct
type replyMsg (line 70) | type replyMsg struct
type ClientEnd (line 75) | type ClientEnd struct
method Call (line 84) | func (e *ClientEnd) Call(svcMeth string, args interface{}, reply inter...
type Network (line 116) | type Network struct
method Cleanup (line 156) | func (rn *Network) Cleanup() {
method Reliable (line 160) | func (rn *Network) Reliable(yes bool) {
method LongReordering (line 167) | func (rn *Network) LongReordering(yes bool) {
method LongDelays (line 174) | func (rn *Network) LongDelays(yes bool) {
method ReadEndnameInfo (line 181) | func (rn *Network) ReadEndnameInfo(endname interface{}) (enabled bool,
method IsServerDead (line 197) | func (rn *Network) IsServerDead(endname interface{}, servername interf...
method ProcessReq (line 207) | func (rn *Network) ProcessReq(req reqMsg) {
method MakeEnd (line 299) | func (rn *Network) MakeEnd(endname interface{}) *ClientEnd {
method AddServer (line 318) | func (rn *Network) AddServer(servername interface{}, rs *Server) {
method DeleteServer (line 325) | func (rn *Network) DeleteServer(servername interface{}) {
method Connect (line 334) | func (rn *Network) Connect(endname interface{}, servername interface{}) {
method Enable (line 342) | func (rn *Network) Enable(endname interface{}, enabled bool) {
method GetCount (line 350) | func (rn *Network) GetCount(servername interface{}) int {
method GetTotalCount (line 358) | func (rn *Network) GetTotalCount() int {
function MakeNetwork (line 130) | func MakeNetwork() *Network {
type Server (line 368) | type Server struct
method AddService (line 380) | func (rs *Server) AddService(svc *Service) {
method dispatch (line 386) | func (rs *Server) dispatch(req reqMsg) replyMsg {
method GetCount (line 413) | func (rs *Server) GetCount() int {
function MakeServer (line 374) | func MakeServer() *Server {
type Service (line 421) | type Service struct
method dispatch (line 459) | func (svc *Service) dispatch(methname string, req reqMsg) replyMsg {
function MakeService (line 428) | func MakeService(rcvr interface{}) *Service {
FILE: src/labrpc/test_test.go
type JunkArgs (line 10) | type JunkArgs struct
type JunkReply (line 13) | type JunkReply struct
type JunkServer (line 17) | type JunkServer struct
method Handler1 (line 23) | func (js *JunkServer) Handler1(args string, reply *int) {
method Handler2 (line 30) | func (js *JunkServer) Handler2(args int, reply *string) {
method Handler3 (line 37) | func (js *JunkServer) Handler3(args int, reply *int) {
method Handler4 (line 45) | func (js *JunkServer) Handler4(args *JunkArgs, reply *JunkReply) {
method Handler5 (line 50) | func (js *JunkServer) Handler5(args JunkArgs, reply *JunkReply) {
function TestBasic (line 54) | func TestBasic(t *testing.T) {
function TestTypes (line 89) | func TestTypes(t *testing.T) {
function TestDisconnect (line 131) | func TestDisconnect(t *testing.T) {
function TestCounts (line 170) | func TestCounts(t *testing.T) {
function TestConcurrentMany (line 206) | func TestConcurrentMany(t *testing.T) {
function TestUnreliable (line 264) | func TestUnreliable(t *testing.T) {
function TestConcurrentOne (line 317) | func TestConcurrentOne(t *testing.T) {
function TestRegression1 (line 379) | func TestRegression1(t *testing.T) {
function TestKilled (line 454) | func TestKilled(t *testing.T) {
function TestBenchmark (line 499) | func TestBenchmark(t *testing.T) {
FILE: src/linearizability/bitset.go
type bitset (line 3) | type bitset
method clone (line 17) | func (b bitset) clone() bitset {
method set (line 27) | func (b bitset) set(pos uint) bitset {
method clear (line 33) | func (b bitset) clear(pos uint) bitset {
method get (line 39) | func (b bitset) get(pos uint) bool {
method popcnt (line 44) | func (b bitset) popcnt() uint {
method hash (line 56) | func (b bitset) hash() uint64 {
method equals (line 64) | func (b bitset) equals(b2 bitset) bool {
function newBitset (line 8) | func newBitset(bits uint) bitset {
function bitsetIndex (line 23) | func bitsetIndex(pos uint) (uint, uint) {
FILE: src/linearizability/linearizability.go
type entryKind (line 9) | type entryKind
constant callEntry (line 12) | callEntry entryKind = false
constant returnEntry (line 13) | returnEntry = true
type entry (line 16) | type entry struct
type byTime (line 23) | type byTime
method Len (line 25) | func (a byTime) Len() int {
method Swap (line 29) | func (a byTime) Swap(i, j int) {
method Less (line 33) | func (a byTime) Less(i, j int) bool {
function makeEntries (line 37) | func makeEntries(history []Operation) []entry {
type node (line 51) | type node struct
function insertBefore (line 59) | func insertBefore(n *node, mark *node) *node {
function length (line 72) | func length(n *node) uint {
function renumber (line 81) | func renumber(events []Event) []Event {
function convertEntries (line 97) | func convertEntries(events []Event) []entry {
function makeLinkedEntries (line 109) | func makeLinkedEntries(entries []entry) *node {
type cacheEntry (line 128) | type cacheEntry struct
function cacheContains (line 133) | func cacheContains(model Model, cache map[uint64][]cacheEntry, entry cac...
type callsEntry (line 142) | type callsEntry struct
function lift (line 147) | func lift(entry *node) {
function unlift (line 157) | func unlift(entry *node) {
function checkSingle (line 167) | func checkSingle(model Model, subhistory *node, kill *int32) bool {
function fillDefault (line 216) | func fillDefault(model Model) Model {
function CheckOperations (line 229) | func CheckOperations(model Model, history []Operation) bool {
function CheckOperationsTimeout (line 235) | func CheckOperationsTimeout(model Model, history []Operation, timeout ti...
function CheckEvents (line 272) | func CheckEvents(model Model, history []Event) bool {
function CheckEventsTimeout (line 278) | func CheckEventsTimeout(model Model, history []Event, timeout time.Durat...
FILE: src/linearizability/model.go
type Operation (line 3) | type Operation struct
type EventKind (line 10) | type EventKind
constant CallEvent (line 13) | CallEvent EventKind = false
constant ReturnEvent (line 14) | ReturnEvent EventKind = true
type Event (line 17) | type Event struct
type Model (line 23) | type Model struct
function NoPartition (line 41) | func NoPartition(history []Operation) [][]Operation {
function NoPartitionEvent (line 45) | func NoPartitionEvent(history []Event) [][]Event {
function ShallowEqual (line 49) | func ShallowEqual(state1, state2 interface{}) bool {
FILE: src/linearizability/models.go
type KvInput (line 5) | type KvInput struct
type KvOutput (line 11) | type KvOutput struct
function KvModel (line 15) | func KvModel() Model {
FILE: src/main/diskvd.go
function usage (line 25) | func usage() {
function main (line 30) | func main() {
FILE: src/main/ii.go
function mapF (line 11) | func mapF(document string, value string) (res []mapreduce.KeyValue) {
function reduceF (line 18) | func reduceF(key string, values []string) string {
function main (line 26) | func main() {
FILE: src/main/lockc.go
function usage (line 11) | func usage() {
function main (line 16) | func main() {
FILE: src/main/lockd.go
function main (line 19) | func main() {
FILE: src/main/pbc.go
function usage (line 25) | func usage() {
function main (line 31) | func main() {
FILE: src/main/pbd.go
function main (line 12) | func main() {
FILE: src/main/viewd.go
function main (line 12) | func main() {
FILE: src/main/wc.go
function mapF (line 16) | func mapF(filename string, contents string) []mapreduce.KeyValue {
function reduceF (line 25) | func reduceF(key string, values []string) string {
function main (line 33) | func main() {
FILE: src/mapreduce/common.go
constant debugEnabled (line 9) | debugEnabled = false
function debug (line 12) | func debug(format string, a ...interface{}) (n int, err error) {
type jobPhase (line 20) | type jobPhase
constant mapPhase (line 23) | mapPhase jobPhase = "mapPhase"
constant reducePhase (line 24) | reducePhase = "reducePhase"
type KeyValue (line 29) | type KeyValue struct
function reduceName (line 36) | func reduceName(jobName string, mapTask int, reduceTask int) string {
function mergeName (line 41) | func mergeName(jobName string, reduceTask int) string {
FILE: src/mapreduce/common_map.go
function doMap (line 7) | func doMap(
function ihash (line 58) | func ihash(s string) int {
FILE: src/mapreduce/common_reduce.go
function doReduce (line 3) | func doReduce(
FILE: src/mapreduce/common_rpc.go
type DoTaskArgs (line 13) | type DoTaskArgs struct
type ShutdownReply (line 27) | type ShutdownReply struct
type RegisterArgs (line 32) | type RegisterArgs struct
function call (line 51) | func call(srv string, rpcname string,
FILE: src/mapreduce/master.go
type Master (line 14) | type Master struct
method Register (line 36) | func (mr *Master) Register(args *RegisterArgs, _ *struct{}) error {
method forwardRegistrations (line 85) | func (mr *Master) forwardRegistrations(ch chan string) {
method run (line 132) | func (mr *Master) run(jobName string, files []string, nreduce int,
method Wait (line 155) | func (mr *Master) Wait() {
method killWorkers (line 161) | func (mr *Master) killWorkers() []int {
function newMaster (line 49) | func newMaster(master string) (mr *Master) {
function Sequential (line 60) | func Sequential(jobName string, files []string, nreduce int,
function Distributed (line 105) | func Distributed(jobName string, files []string, nreduce int, master str...
FILE: src/mapreduce/master_rpc.go
method Shutdown (line 12) | func (mr *Master) Shutdown(_, _ *struct{}) error {
method startRPCServer (line 21) | func (mr *Master) startRPCServer() {
method stopRPCServer (line 59) | func (mr *Master) stopRPCServer() {
FILE: src/mapreduce/master_splitmerge.go
method merge (line 14) | func (mr *Master) merge() {
function removeFile (line 54) | func removeFile(n string) {
method CleanupFiles (line 62) | func (mr *Master) CleanupFiles() {
FILE: src/mapreduce/schedule.go
function schedule (line 14) | func schedule(jobName string, mapFiles []string, nReduce int, phase jobP...
FILE: src/mapreduce/test_test.go
constant nNumber (line 17) | nNumber = 100000
constant nMap (line 18) | nMap = 20
constant nReduce (line 19) | nReduce = 10
function MapFunc (line 26) | func MapFunc(file string, value string) (res []KeyValue) {
function ReduceFunc (line 37) | func ReduceFunc(key string, values []string) string {
function check (line 46) | func check(t *testing.T, files []string) {
function checkWorker (line 90) | func checkWorker(t *testing.T, l []int) {
function makeInputs (line 99) | func makeInputs(num int) []string {
function port (line 122) | func port(suffix string) string {
function setup (line 132) | func setup() *Master {
function cleanup (line 139) | func cleanup(mr *Master) {
function TestSequentialSingle (line 146) | func TestSequentialSingle(t *testing.T) {
function TestSequentialMany (line 154) | func TestSequentialMany(t *testing.T) {
function TestParallelBasic (line 162) | func TestParallelBasic(t *testing.T) {
function TestParallelCheck (line 174) | func TestParallelCheck(t *testing.T) {
function TestOneFailure (line 194) | func TestOneFailure(t *testing.T) {
function TestManyFailures (line 207) | func TestManyFailures(t *testing.T) {
FILE: src/mapreduce/worker.go
type Parallelism (line 18) | type Parallelism struct
type Worker (line 25) | type Worker struct
method DoTask (line 40) | func (wk *Worker) DoTask(arg *DoTaskArgs, _ *struct{}) error {
method Shutdown (line 98) | func (wk *Worker) Shutdown(_ *struct{}, res *ShutdownReply) error {
method register (line 108) | func (wk *Worker) register(master string) {
function RunWorker (line 119) | func RunWorker(MasterAddress string, me string,
FILE: src/paxos/paxos.go
type Fate (line 40) | type Fate
constant Decided (line 43) | Decided Fate = iota + 1
constant Pending (line 44) | Pending
constant Forgotten (line 45) | Forgotten
type Paxos (line 48) | type Paxos struct
method Prepare (line 201) | func (px *Paxos) Prepare(args *PrepareArgs, reply *PrepareReply) error {
method Accept (line 218) | func (px *Paxos) Accept(args *AcceptArgs, reply *AcceptReply) error {
method Decide (line 235) | func (px *Paxos) Decide(args *DecideArgs, reply *DecideReply) error {
method Start (line 389) | func (px *Paxos) Start(seq int, v interface{}) {
method Done (line 408) | func (px *Paxos) Done(seq int) {
method UpdateDoneSeqs (line 424) | func (px *Paxos) UpdateDoneSeqs(args *SeqArgs, reply *SeqReply) error {
method Max (line 474) | func (px *Paxos) Max() int {
method Min (line 512) | func (px *Paxos) Min() int {
method Status (line 526) | func (px *Paxos) Status(seq int) (Fate, interface{}) {
method Kill (line 544) | func (px *Paxos) Kill() {
method isdead (line 554) | func (px *Paxos) isdead() bool {
method setunreliable (line 559) | func (px *Paxos) setunreliable(what bool) {
method isunreliable (line 567) | func (px *Paxos) isunreliable() bool {
type ProposerManager (line 68) | type ProposerManager struct
method RunProposer (line 172) | func (proposerMgr *ProposerManager) RunProposer(seq int, v interface{}) {
type AcceptorManager (line 78) | type AcceptorManager struct
method GetInstance (line 187) | func (acceptorMgr *AcceptorManager) GetInstance(seq int) *Acceptor {
type Acceptor (line 84) | type Acceptor struct
type Proposer (line 92) | type Proposer struct
method Propose (line 245) | func (proposer *Proposer) Propose() {
type PrepareArgs (line 99) | type PrepareArgs struct
type PrepareReply (line 104) | type PrepareReply struct
type AcceptArgs (line 111) | type AcceptArgs struct
type AcceptReply (line 117) | type AcceptReply struct
type DecideArgs (line 122) | type DecideArgs struct
type DecideReply (line 127) | type DecideReply
type SeqArgs (line 129) | type SeqArgs struct
type SeqReply (line 134) | type SeqReply
function call (line 152) | func call(srv string, name string, args interface{}, reply interface{}) ...
function Make (line 576) | func Make(peers []string, me int, rpcs *rpc.Server) *Paxos {
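The `Fate` constants indexed above (paxos.go, around line 40) start at `iota + 1` rather than `iota`. A minimal sketch of why — starting at 1 keeps the zero value of `Fate` distinct from every real status, so an uninitialized `Fate` is detectably invalid:

```go
package main

import "fmt"

// Fate mirrors the status enum from src/paxos/paxos.go.
type Fate int

const (
	Decided   Fate = iota + 1 // 1
	Pending                   // 2: not yet decided, or unknown
	Forgotten                 // 3: decided, but forgotten via Done()
)

func main() {
	var unset Fate // zero value — never a valid status
	fmt.Println(unset, Decided, Pending, Forgotten) // 0 1 2 3
}
```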
FILE: src/paxos/test_test.go
function randstring (line 14) | func randstring(n int) string {
function port (line 21) | func port(tag string, host int) string {
function ndecided (line 32) | func ndecided(t *testing.T, pxa []*Paxos, seq int) int {
function waitn (line 51) | func waitn(t *testing.T, pxa []*Paxos, seq int, wanted int) {
function waitmajority (line 68) | func waitmajority(t *testing.T, pxa []*Paxos, seq int) {
function checkmax (line 72) | func checkmax(t *testing.T, pxa []*Paxos, seq int, max int) {
function cleanup (line 80) | func cleanup(pxa []*Paxos) {
function noTestSpeed (line 88) | func noTestSpeed(t *testing.T) {
function TestBasic (line 114) | func TestBasic(t *testing.T) {
function TestDeaf (line 174) | func TestDeaf(t *testing.T) {
function TestForget (line 217) | func TestForget(t *testing.T) {
function TestManyForget (line 299) | func TestManyForget(t *testing.T) {
function TestForgetMem (line 371) | func TestForgetMem(t *testing.T) {
function TestDoneMax (line 459) | func TestDoneMax(t *testing.T) {
function TestRPCCount (line 503) | func TestRPCCount(t *testing.T) {
function TestMany (line 578) | func TestMany(t *testing.T) {
function TestOld (line 628) | func TestOld(t *testing.T) {
function TestManyUnreliable (line 665) | func TestManyUnreliable(t *testing.T) {
function pp (line 712) | func pp(tag string, src int, dst int) string {
function cleanpp (line 722) | func cleanpp(tag string, n int) {
function part (line 731) | func part(t *testing.T, tag string, npaxos int, p1 []int, p2 []int, p3 [...
function TestPartition (line 753) | func TestPartition(t *testing.T) {
function TestLots (line 852) | func TestLots(t *testing.T) {
FILE: src/pbservice/client.go
type Clerk (line 10) | type Clerk struct
method Get (line 78) | func (ck *Clerk) Get(key string) string {
method PutAppend (line 97) | func (ck *Clerk) PutAppend(key string, value string, op string) {
method Put (line 116) | func (ck *Clerk) Put(key string, value string) {
method Append (line 124) | func (ck *Clerk) Append(key string, value string) {
function nrand (line 19) | func nrand() int64 {
function MakeClerk (line 26) | func MakeClerk(vshost string, me string) *Clerk {
function call (line 54) | func call(srv string, rpcname string,
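The `nrand` helper indexed above generates the unique client/request IDs that the at-most-once machinery depends on. A self-contained sketch of the usual 6.824 form (a random 62-bit integer from `crypto/rand`) — this repo's body may differ in detail:

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// nrand returns a random 62-bit non-negative integer, used to tag
// client requests so servers can recognize duplicate retransmissions.
func nrand() int64 {
	max := big.NewInt(int64(1) << 62)
	bigx, _ := rand.Int(rand.Reader, max)
	return bigx.Int64()
}

func main() {
	fmt.Println(nrand() >= 0) // true
}
```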
FILE: src/pbservice/common.go
constant OK (line 4) | OK = "OK"
constant ErrNoKey (line 5) | ErrNoKey = "ErrNoKey"
constant ErrWrongServer (line 6) | ErrWrongServer = "ErrWrongServer"
constant Put (line 10) | Put = "Put"
constant Append (line 11) | Append = "Append"
type Err (line 14) | type Err
type PutAppendArgs (line 17) | type PutAppendArgs struct
type PutAppendReply (line 30) | type PutAppendReply struct
type GetArgs (line 34) | type GetArgs struct
type GetReply (line 39) | type GetReply struct
type BackupArgs (line 44) | type BackupArgs struct
type BackupReply (line 50) | type BackupReply struct
FILE: src/pbservice/server.go
constant RoleNull (line 17) | RoleNull = 0
constant RolePrimary (line 18) | RolePrimary = 1
constant RoleBackup (line 19) | RoleBackup = 2
type PBServer (line 22) | type PBServer struct
method LocalOp (line 42) | func (pb *PBServer) LocalOp(args *PutAppendArgs) {
method BackupOp (line 53) | func (pb *PBServer) BackupOp(args *PutAppendArgs, reply *PutAppendRepl...
method LocalPut (line 74) | func (pb *PBServer) LocalPut(key string, value string) {
method LocalAppend (line 78) | func (pb *PBServer) LocalAppend(key string, value string) {
method Backup (line 86) | func (pb *PBServer) Backup(args *BackupArgs, reply *BackupReply) error {
method Get (line 97) | func (pb *PBServer) Get(args *GetArgs, reply *GetReply) error {
method PutAppend (line 128) | func (pb *PBServer) PutAppend(args *PutAppendArgs, reply *PutAppendRep...
method Replicate (line 182) | func (pb *PBServer) Replicate() {
method tick (line 208) | func (pb *PBServer) tick() {
method kill (line 256) | func (pb *PBServer) kill() {
method isdead (line 262) | func (pb *PBServer) isdead() bool {
method setunreliable (line 267) | func (pb *PBServer) setunreliable(what bool) {
method isunreliable (line 275) | func (pb *PBServer) isunreliable() bool {
function StartServer (line 279) | func StartServer(vshost string, me string) *PBServer {
FILE: src/pbservice/test.go
function check (line 18) | func check(ck *Clerk, key string, value string) {
function port (line 25) | func port(tag string, host int) string {
function TestBasicFail (line 36) | func TestBasicFail(t *testing.T) {
function TestAtMostOnce (line 178) | func TestAtMostOnce(t *testing.T) {
function TestFailPut (line 232) | func TestFailPut(t *testing.T) {
function TestConcurrentSame (line 321) | func TestConcurrentSame(t *testing.T) {
function checkAppends (line 419) | func checkAppends(t *testing.T, v string, counts []int) {
function TestConcurrentSameAppend (line 444) | func TestConcurrentSameAppend(t *testing.T) {
function TestConcurrentSameUnreliable (line 551) | func TestConcurrentSameUnreliable(t *testing.T) {
function TestRepeatedCrash (line 666) | func TestRepeatedCrash(t *testing.T) {
function TestRepeatedCrashUnreliable (line 778) | func TestRepeatedCrashUnreliable(t *testing.T) {
function proxy (line 890) | func proxy(t *testing.T, port string, delay *int32) {
function TestPartition1 (line 950) | func TestPartition1(t *testing.T) {
function TestPartition2 (line 1044) | func TestPartition2(t *testing.T) {
FILE: src/raft/config.go
function randstring (line 23) | func randstring(n int) string {
function makeSeed (line 30) | func makeSeed() int64 {
type config (line 37) | type config struct
method crash1 (line 98) | func (cfg *config) crash1(i int) {
method start1 (line 135) | func (cfg *config) start1(i int) {
method checkTimeout (line 217) | func (cfg *config) checkTimeout() {
method cleanup (line 224) | func (cfg *config) cleanup() {
method connect (line 235) | func (cfg *config) connect(i int) {
method disconnect (line 258) | func (cfg *config) disconnect(i int) {
method rpcCount (line 280) | func (cfg *config) rpcCount(server int) int {
method rpcTotal (line 284) | func (cfg *config) rpcTotal() int {
method setunreliable (line 288) | func (cfg *config) setunreliable(unrel bool) {
method setlongreordering (line 292) | func (cfg *config) setlongreordering(longrel bool) {
method checkOneLeader (line 298) | func (cfg *config) checkOneLeader() int {
method checkTerms (line 331) | func (cfg *config) checkTerms() int {
method checkNoLeader (line 347) | func (cfg *config) checkNoLeader() {
method nCommitted (line 359) | func (cfg *config) nCommitted(index int) (int, interface{}) {
method wait (line 385) | func (cfg *config) wait(index int, n int, startTerm int) interface{} {
method one (line 426) | func (cfg *config) one(cmd int, expectedServers int, retry bool) int {
method begin (line 478) | func (cfg *config) begin(description string) {
method end (line 490) | func (cfg *config) end() {
function make_config (line 59) | func make_config(t *testing.T, n int, unreliable bool) *config {
FILE: src/raft/persister.go
type Persister (line 14) | type Persister struct
method Copy (line 24) | func (ps *Persister) Copy() *Persister {
method SaveRaftState (line 33) | func (ps *Persister) SaveRaftState(state []byte) {
method ReadRaftState (line 39) | func (ps *Persister) ReadRaftState() []byte {
method RaftStateSize (line 45) | func (ps *Persister) RaftStateSize() int {
method SaveStateAndSnapshot (line 53) | func (ps *Persister) SaveStateAndSnapshot(state []byte, snapshot []byt...
method ReadSnapshot (line 60) | func (ps *Persister) ReadSnapshot() []byte {
method SnapshotSize (line 66) | func (ps *Persister) SnapshotSize() int {
function MakePersister (line 20) | func MakePersister() *Persister {
FILE: src/raft/raft.go
type ApplyMsg (line 39) | type ApplyMsg struct
type Raft (line 48) | type Raft struct
method GetState (line 62) | func (rf *Raft) GetState() (int, bool) {
method persist (line 76) | func (rf *Raft) persist() {
method readPersist (line 91) | func (rf *Raft) readPersist(data []byte) {
method RequestVote (line 132) | func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteR...
method sendRequestVote (line 165) | func (rf *Raft) sendRequestVote(server int, args *RequestVoteArgs, rep...
method Start (line 185) | func (rf *Raft) Start(command interface{}) (int, int, bool) {
method Kill (line 202) | func (rf *Raft) Kill() {
type RequestVoteArgs (line 117) | type RequestVoteArgs struct
type RequestVoteReply (line 125) | type RequestVoteReply struct
function Make (line 217) | func Make(peers []*labrpc.ClientEnd, me int,
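The `ApplyMsg` type indexed above (raft.go, line 39) is how Raft hands committed log entries back to the service: `Make` receives an apply channel, and each committed entry is sent on it in log order. A toy sketch of that channel handoff, using the standard 6.824 field names (this repo's struct may differ slightly):

```go
package main

import "fmt"

// ApplyMsg follows the standard 6.824 skeleton: Raft delivers each
// committed log entry to the service through a channel of these.
type ApplyMsg struct {
	CommandValid bool
	Command      interface{}
	CommandIndex int
}

func main() {
	applyCh := make(chan ApplyMsg, 1)

	// A Raft peer sends on applyCh once an entry is committed...
	applyCh <- ApplyMsg{CommandValid: true, Command: "put x=1", CommandIndex: 1}

	// ...and the service (e.g. kvraft) applies messages in index order.
	msg := <-applyCh
	fmt.Println(msg.CommandIndex, msg.Command) // 1 put x=1
}
```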
FILE: src/raft/test_test.go
constant RaftElectionTimeout (line 20) | RaftElectionTimeout = 1000 * time.Millisecond
function TestInitialElection2A (line 22) | func TestInitialElection2A(t *testing.T) {
function TestReElection2A (line 50) | func TestReElection2A(t *testing.T) {
function TestBasicAgree2B (line 86) | func TestBasicAgree2B(t *testing.T) {
function TestFailAgree2B (line 109) | func TestFailAgree2B(t *testing.T) {
function TestFailNoAgree2B (line 140) | func TestFailNoAgree2B(t *testing.T) {
function TestConcurrentStarts2B (line 191) | func TestConcurrentStarts2B(t *testing.T) {
function TestRejoin2B (line 292) | func TestRejoin2B(t *testing.T) {
function TestBackup2B (line 330) | func TestBackup2B(t *testing.T) {
function TestCount2B (line 402) | func TestCount2B(t *testing.T) {
function TestPersist12C (line 512) | func TestPersist12C(t *testing.T) {
function TestPersist22C (line 558) | func TestPersist22C(t *testing.T) {
function TestPersist32C (line 604) | func TestPersist32C(t *testing.T) {
function TestFigure82C (line 644) | func TestFigure82C(t *testing.T) {
function TestUnreliableAgree2C (line 700) | func TestUnreliableAgree2C(t *testing.T) {
function TestFigure8Unreliable2C (line 729) | func TestFigure8Unreliable2C(t *testing.T) {
function internalChurn (line 784) | func internalChurn(t *testing.T, unreliable bool) {
function TestReliableChurn2C (line 929) | func TestReliableChurn2C(t *testing.T) {
function TestUnreliableChurn2C (line 933) | func TestUnreliableChurn2C(t *testing.T) {
FILE: src/raft/util.go
constant Debug (line 6) | Debug = 0
function DPrintf (line 8) | func DPrintf(format string, a ...interface{}) (n int, err error) {
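`util.go` above is the debug-gated logger: with `Debug = 0` (as indexed), `DPrintf` is a no-op, so instrumentation can stay in the code without polluting test output. A sketch of the standard skeleton body:

```go
package main

import "log"

const Debug = 0 // raise to 1 to enable debug output

// DPrintf logs only when Debug is enabled; otherwise it returns (0, nil).
func DPrintf(format string, a ...interface{}) (n int, err error) {
	if Debug > 0 {
		log.Printf(format, a...)
	}
	return
}

func main() {
	// With Debug == 0 this prints nothing.
	n, err := DPrintf("leader is %d", 3)
	_, _ = n, err
}
```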
FILE: src/shardkv/client.go
function key2shard (line 22) | func key2shard(key string) int {
function nrand (line 31) | func nrand() int64 {
type Clerk (line 38) | type Clerk struct
method Get (line 68) | func (ck *Clerk) Get(key string) string {
method PutAppend (line 101) | func (ck *Clerk) PutAppend(key string, value string, op string) {
method Put (line 130) | func (ck *Clerk) Put(key string, value string) {
method Append (line 133) | func (ck *Clerk) Append(key string, value string) {
function MakeClerk (line 54) | func MakeClerk(masters []*labrpc.ClientEnd, make_end func(string) *labrp...
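`key2shard` above is the client-side routing function. In the 6.824 skeleton it maps a key to a shard by taking the key's first byte modulo `shardmaster.NShards` (10, per common.go below); this sketch assumes the repo keeps that scheme:

```go
package main

import "fmt"

const NShards = 10 // shardmaster.NShards in this repo

// key2shard picks the shard for a key: first byte mod NShards,
// with the empty key mapping to shard 0.
func key2shard(key string) int {
	shard := 0
	if len(key) > 0 {
		shard = int(key[0])
	}
	return shard % NShards
}

func main() {
	fmt.Println(key2shard("a"), key2shard("b"), key2shard("")) // 7 8 0
}
```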
FILE: src/shardkv/common.go
constant OK (line 13) | OK = "OK"
constant ErrNoKey (line 14) | ErrNoKey = "ErrNoKey"
constant ErrWrongGroup (line 15) | ErrWrongGroup = "ErrWrongGroup"
type Err (line 18) | type Err
type PutAppendArgs (line 21) | type PutAppendArgs struct
type PutAppendReply (line 31) | type PutAppendReply struct
type GetArgs (line 36) | type GetArgs struct
type GetReply (line 41) | type GetReply struct
FILE: src/shardkv/config.go
function randstring (line 20) | func randstring(n int) string {
function makeSeed (line 27) | func makeSeed() int64 {
function random_handles (line 35) | func random_handles(kvh []*labrpc.ClientEnd) []*labrpc.ClientEnd {
type group (line 45) | type group struct
type config (line 53) | type config struct
method checkTimeout (line 72) | func (cfg *config) checkTimeout() {
method cleanup (line 79) | func (cfg *config) cleanup() {
method checklogs (line 88) | func (cfg *config) checklogs() {
method mastername (line 105) | func (cfg *config) mastername(i int) string {
method servername (line 111) | func (cfg *config) servername(gid int, i int) string {
method makeClient (line 115) | func (cfg *config) makeClient() *Clerk {
method deleteClient (line 141) | func (cfg *config) deleteClient(ck *Clerk) {
method ShutdownServer (line 153) | func (cfg *config) ShutdownServer(gi int, i int) {
method ShutdownGroup (line 194) | func (cfg *config) ShutdownGroup(gi int) {
method StartServer (line 201) | func (cfg *config) StartServer(gi int, i int) {
method StartGroup (line 261) | func (cfg *config) StartGroup(gi int) {
method StartMasterServer (line 267) | func (cfg *config) StartMasterServer(i int) {
method shardclerk (line 289) | func (cfg *config) shardclerk() *shardmaster.Clerk {
method join (line 303) | func (cfg *config) join(gi int) {
method joinm (line 307) | func (cfg *config) joinm(gis []int) {
method leave (line 321) | func (cfg *config) leave(gi int) {
method leavem (line 325) | func (cfg *config) leavem(gis []int) {
function make_config (line 335) | func make_config(t *testing.T, n int, unreliable bool, maxraftstate int)...
FILE: src/shardkv/server.go
type Op (line 12) | type Op struct
type ShardKV (line 18) | type ShardKV struct
method Get (line 32) | func (kv *ShardKV) Get(args *GetArgs, reply *GetReply) {
method PutAppend (line 36) | func (kv *ShardKV) PutAppend(args *PutAppendArgs, reply *PutAppendRepl...
method Kill (line 46) | func (kv *ShardKV) Kill() {
function StartServer (line 80) | func StartServer(servers []*labrpc.ClientEnd, me int, persister *raft.Pe...
FILE: src/shardkv/test_test.go
constant linearizabilityCheckTimeout (line 13) | linearizabilityCheckTimeout = 1 * time.Second
function check (line 15) | func check(t *testing.T, ck *Clerk, key string, value string) {
function TestStaticShards (line 25) | func TestStaticShards(t *testing.T) {
function TestJoinLeave (line 89) | func TestJoinLeave(t *testing.T) {
function TestSnapshot (line 142) | func TestSnapshot(t *testing.T) {
function TestMissChange (line 210) | func TestMissChange(t *testing.T) {
function TestConcurrent1 (line 296) | func TestConcurrent1(t *testing.T) {
function TestConcurrent2 (line 377) | func TestConcurrent2(t *testing.T) {
function TestUnreliable1 (line 448) | func TestUnreliable1(t *testing.T) {
function TestUnreliable2 (line 490) | func TestUnreliable2(t *testing.T) {
function TestUnreliable3 (line 553) | func TestUnreliable3(t *testing.T) {
function TestChallenge1Delete (line 653) | func TestChallenge1Delete(t *testing.T) {
function TestChallenge1Concurrent (line 734) | func TestChallenge1Concurrent(t *testing.T) {
function TestChallenge2Unaffected (line 807) | func TestChallenge2Unaffected(t *testing.T) {
function TestChallenge2Partial (line 877) | func TestChallenge2Partial(t *testing.T) {
FILE: src/shardmaster/client.go
type Clerk (line 12) | type Clerk struct
method Query (line 56) | func (ck *Clerk) Query(num int) Config {
method Join (line 72) | func (ck *Clerk) Join(gid int64, servers []string) {
method Leave (line 89) | func (ck *Clerk) Leave(gid int64) {
method Move (line 105) | func (ck *Clerk) Move(shard int, gid int64) {
function MakeClerk (line 16) | func MakeClerk(servers []string) *Clerk {
function call (line 39) | func call(srv string, rpcname string,
FILE: src/shardmaster/common.go
constant NShards (line 21) | NShards = 10
type Config (line 25) | type Config struct
constant OK (line 32) | OK = "OK"
type Err (line 35) | type Err
type JoinArgs (line 37) | type JoinArgs struct
type JoinReply (line 41) | type JoinReply struct
type LeaveArgs (line 46) | type LeaveArgs struct
type LeaveReply (line 50) | type LeaveReply struct
type MoveArgs (line 55) | type MoveArgs struct
type MoveReply (line 60) | type MoveReply struct
type QueryArgs (line 65) | type QueryArgs struct
type QueryReply (line 69) | type QueryReply struct
FILE: src/shardmaster/config.go
function randstring (line 16) | func randstring(n int) string {
function random_handles (line 24) | func random_handles(kvh []*labrpc.ClientEnd) []*labrpc.ClientEnd {
type config (line 34) | type config struct
method checkTimeout (line 47) | func (cfg *config) checkTimeout() {
method cleanup (line 54) | func (cfg *config) cleanup() {
method LogSize (line 67) | func (cfg *config) LogSize() int {
method connectUnlocked (line 80) | func (cfg *config) connectUnlocked(i int, to []int) {
method connect (line 96) | func (cfg *config) connect(i int, to []int) {
method disconnectUnlocked (line 104) | func (cfg *config) disconnectUnlocked(i int, from []int) {
method disconnect (line 124) | func (cfg *config) disconnect(i int, from []int) {
method All (line 130) | func (cfg *config) All() []int {
method ConnectAll (line 138) | func (cfg *config) ConnectAll() {
method partition (line 147) | func (cfg *config) partition(p1 []int, p2 []int) {
method makeClient (line 164) | func (cfg *config) makeClient(to []int) *Clerk {
method deleteClient (line 184) | func (cfg *config) deleteClient(ck *Clerk) {
method ConnectClientUnlocked (line 196) | func (cfg *config) ConnectClientUnlocked(ck *Clerk, to []int) {
method ConnectClient (line 205) | func (cfg *config) ConnectClient(ck *Clerk, to []int) {
method DisconnectClientUnlocked (line 212) | func (cfg *config) DisconnectClientUnlocked(ck *Clerk, from []int) {
method DisconnectClient (line 221) | func (cfg *config) DisconnectClient(ck *Clerk, from []int) {
method ShutdownServer (line 228) | func (cfg *config) ShutdownServer(i int) {
method StartServer (line 260) | func (cfg *config) StartServer(i int) {
method Leader (line 299) | func (cfg *config) Leader() (bool, int) {
method make_partition (line 313) | func (cfg *config) make_partition() ([]int, []int) {
function make_config (line 332) | func make_config(t *testing.T, n int, unreliable bool) *config {
FILE: src/shardmaster/server.go
type ShardMaster (line 20) | type ShardMaster struct
method join (line 73) | func (sm *ShardMaster) join(op *Op) {
method leave (line 160) | func (sm *ShardMaster) leave(op *Op) {
method move (line 242) | func (sm *ShardMaster) move(op *Op) {
method query (line 272) | func (sm *ShardMaster) query(op *Op) interface{} {
method apply (line 314) | func (sm *ShardMaster) apply(op *Op) {
method TryDecide (line 331) | func (sm *ShardMaster) TryDecide(op Op) error {
method Join (line 372) | func (sm *ShardMaster) Join(args *JoinArgs, reply *JoinReply) error {
method Leave (line 385) | func (sm *ShardMaster) Leave(args *LeaveArgs, reply *LeaveReply) error {
method Move (line 397) | func (sm *ShardMaster) Move(args *MoveArgs, reply *MoveReply) error {
method Query (line 409) | func (sm *ShardMaster) Query(args *QueryArgs, reply *QueryReply) error {
method Kill (line 422) | func (sm *ShardMaster) Kill() {
method isdead (line 429) | func (sm *ShardMaster) isdead() bool {
method setunreliable (line 434) | func (sm *ShardMaster) setunreliable(what bool) {
method isunreliable (line 442) | func (sm *ShardMaster) isunreliable() bool {
type GroupShardsNumSlice (line 38) | type GroupShardsNumSlice
method Len (line 50) | func (s GroupShardsNumSlice) Len() int { return len(s) }
method Swap (line 51) | func (s GroupShardsNumSlice) Swap(i, j int) { s[i], s[j] = s[j], ...
method Less (line 52) | func (s GroupShardsNumSlice) Less(i, j int) bool { return s[i].num < s...
type GroupShardsNum (line 40) | type GroupShardsNum struct
type ShardGid (line 45) | type ShardGid struct
type Op (line 54) | type Op struct
constant OP_JOIN (line 66) | OP_JOIN = "join"
constant OP_LEAVE (line 67) | OP_LEAVE = "leave"
constant OP_MOVE (line 68) | OP_MOVE = "move"
constant OP_QUERY (line 69) | OP_QUERY = "query"
function getGroupsSize (line 283) | func getGroupsSize(nShards, groupsNum int) []int {
function copyCfg (line 298) | func copyCfg(lastCfg Config) Config {
function nrand (line 307) | func nrand() int64 {
function StartServer (line 452) | func StartServer(servers []string, me int) *ShardMaster {
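From its signature, `getGroupsSize(nShards, groupsNum int) []int` above looks like the rebalancing target used by `join`/`leave`: split `nShards` as evenly as possible across `groupsNum` groups. A plausible sketch of that intent — a guess from the signature, not a copy of the repo's code — gives the first `nShards % groupsNum` groups one extra shard:

```go
package main

import "fmt"

// getGroupsSize returns per-group shard counts that differ by at most
// one, summing to nShards: the first nShards % groupsNum groups get
// one extra shard on top of the even base.
func getGroupsSize(nShards, groupsNum int) []int {
	if groupsNum <= 0 {
		return nil
	}
	sizes := make([]int, groupsNum)
	base, extra := nShards/groupsNum, nShards%groupsNum
	for i := range sizes {
		sizes[i] = base
		if i < extra {
			sizes[i]++
		}
	}
	return sizes
}

func main() {
	fmt.Println(getGroupsSize(10, 3)) // [4 3 3]
}
```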
FILE: src/shardmaster/test_test.go
function check (line 11) | func check(t *testing.T, groups []int, ck *Clerk) {
function check_same_config (line 55) | func check_same_config(t *testing.T, c1 Config, c2 Config) {
function TestBasic (line 80) | func TestBasic(t *testing.T) {
function TestMulti (line 252) | func TestMulti(t *testing.T) {
FILE: src/viewservice/client.go
type Clerk (line 10) | type Clerk struct
method Ping (line 56) | func (ck *Clerk) Ping(viewnum uint) (View, error) {
method Get (line 72) | func (ck *Clerk) Get() (View, bool) {
method Primary (line 82) | func (ck *Clerk) Primary() string {
function MakeClerk (line 15) | func MakeClerk(me string, server string) *Clerk {
function call (line 39) | func call(srv string, rpcname string,
FILE: src/viewservice/common.go
type View (line 36) | type View struct
constant PingInterval (line 44) | PingInterval = time.Millisecond * 100
constant DeadPings (line 48) | DeadPings = 5
type PingArgs (line 60) | type PingArgs struct
type PingReply (line 65) | type PingReply struct
type GetArgs (line 75) | type GetArgs struct
type GetReply (line 78) | type GetReply struct
FILE: src/viewservice/server.go
type ViewServer (line 12) | type ViewServer struct
method Ping (line 29) | func (vs *ViewServer) Ping(args *PingArgs, reply *PingReply) error {
method Get (line 82) | func (vs *ViewServer) Get(args *GetArgs, reply *GetReply) error {
method tick (line 93) | func (vs *ViewServer) tick() {
method Kill (line 147) | func (vs *ViewServer) Kill() {
method isdead (line 155) | func (vs *ViewServer) isdead() bool {
method GetRPCCount (line 160) | func (vs *ViewServer) GetRPCCount() int32 {
function StartServer (line 164) | func StartServer(me string) *ViewServer {
FILE: src/viewservice/test.go
function check (line 11) | func check(t *testing.T, ck *Clerk, p string, b string, n uint) {
function port (line 27) | func port(suffix string) string {
function Test1 (line 37) | func Test1(t *testing.T) {
Condensed preview — 94 files, each showing path, character count, and a content snippet.
[
{
"path": "Makefile",
"chars": 2001,
"preview": "# This is the Makefile helping you submit the labs. \n# Just create 6.824/api.key with your API key in it, \n# and submit"
},
{
"path": "README.md",
"chars": 1204,
"preview": "# mit6.824 Distributed Systems\n\nSpring 2020. Implemented with Go 1.10.\n\n[https://pdos.csail.mit.edu/6.824/schedule.html]"
},
{
"path": "lab/lab1 MapReduce.md",
"chars": 8831,
"preview": "# 6.824 Lab 1: MapReduce\n\nIn this lab you'll build a **MapReduce library** as an introduction to programming in Go and t"
},
{
"path": "lab/lab2 Raft.md",
"chars": 4004,
"preview": "# 6.824 Lab 2: Raft\n\n> 6.824 - Spring 2018\n\n### Introduction\n\nThis is the first in a series of labs in which you'll buil"
},
{
"path": "lab/lab3 Paxos-based KV Service.md",
"chars": 19020,
"preview": "# 6.824 Lab 3: Paxos-based Key/Value Service\n>Part A Due: Fri Feb 27 11:59pm\n> Part B Due: Fri Mar 13 11:59pm\n\n### Intro"
},
{
"path": "lab/lab4 shared key value service.md",
"chars": 4716,
"preview": "# 6.824 Lab 4: Sharded Key/Value Service\n\n\n### Introduction\n\n\nIn this lab you'll build a **key/value storage system** th"
},
{
"path": "lecture/l01 mapreduce/l01.txt",
"chars": 12932,
"preview": "6.824 2018 Lecture 1: Introduction\n\n6.824: Distributed Systems Engineering\n\nWhat is a distributed system?\n multiple coo"
},
{
"path": "lecture/l02 PRC_threads_crawler_kv/PRC_Threads.md",
"chars": 10462,
"preview": "# 6.824 2018 Lecture 2: Infrastructure: RPC and threads\n\nMost commonly-asked question: \n\n### Why Go?\n 6.824 used C++ fo"
},
{
"path": "lecture/l02 PRC_threads_crawler_kv/crawler.go",
"chars": 3229,
"preview": "package main\n\nimport (\n\t\"fmt\"\n\t\"sync\"\n)\n\n//\n// Several solutions to the crawler exercise from the Go tutorial\n// https:/"
},
{
"path": "lecture/l02 PRC_threads_crawler_kv/kv.go",
"chars": 1936,
"preview": "package main\n\nimport (\n\t\"fmt\"\n\t\"log\"\n\t\"net\"\n\t\"net/rpc\"\n\t\"sync\"\n)\n\n//\n// RPC request/reply definitions\n//\n\nconst (\n\tOK "
},
{
"path": "lecture/l03 GFS/GFS.md",
"chars": 9871,
"preview": "# 6.824 2018 Lecture 3: GFS\n\n[The Google File System - Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003](ht"
},
{
"path": "lecture/l04 more_primary_backup/FDS.md",
"chars": 8817,
"preview": "# 6.824 2014 Lecture 4: FDS Case Study\n\n[Flat Datacenter Storage\nNightingale, Elson, Fan, Hofmann, Howell, Suzue\nOSDI 20"
},
{
"path": "lecture/l06 fault tolerance raft/raft.md",
"chars": 8787,
"preview": "# 6.824 2020 Lecture 6: Raft (1)\n\n> this lecture\n> today: Raft elections and log handling(Lab 2A, 2B)\n> next: Raft pers"
},
{
"path": "lecture/l07 fault tolerance raft2/raft2.md",
"chars": 18308,
"preview": "#### 6.824 2020 Lecture 7: Raft (2)\n\n*** topic: the Raft log (Lab 2B)\n\nas long as the leader stays up:\n clients only in"
},
{
"path": "lecture/l08 zookeeper/zookeeper.md",
"chars": 10621,
"preview": "# 6.824 2020 Lecture 8: Zookeeper Case Study\n\nReading: \"ZooKeeper: wait-free coordination for internet-scale systems\", P"
},
{
"path": "src/diskv/client.go",
"chars": 3545,
"preview": "package diskv\n\nimport \"shardmaster\"\nimport \"net/rpc\"\nimport \"time\"\nimport \"sync\"\nimport \"fmt\"\nimport \"crypto/rand\"\nimpor"
},
{
"path": "src/diskv/common.go",
"chars": 771,
"preview": "package diskv\n\n//\n// Sharded key/value server.\n// Lots of replica groups, each running op-at-a-time paxos.\n// Shardmaste"
},
{
"path": "src/diskv/dist_test.go",
"chars": 723,
"preview": "package shardkv\n\nimport (\n\t\"fmt\"\n\t\"os\"\n\t\"shardmaster\"\n\t\"strconv\"\n)\n\nfunc port(tag string, host int) string {\n\ts := \"/var"
},
{
"path": "src/diskv/server.go",
"chars": 6496,
"preview": "package diskv\n\nimport \"net\"\nimport \"fmt\"\nimport \"net/rpc\"\nimport \"log\"\nimport \"time\"\nimport \"paxos\"\nimport \"sync\"\nimport"
},
{
"path": "src/diskv/test.go",
"chars": 25486,
"preview": "package diskv\n\nimport \"testing\"\nimport \"shardmaster\"\nimport \"runtime\"\nimport \"strconv\"\nimport \"strings\"\nimport \"os\"\nimpo"
},
{
"path": "src/kvpaxos/client.go",
"chars": 2377,
"preview": "package kvpaxos\n\nimport \"net/rpc\"\nimport \"crypto/rand\"\nimport \"math/big\"\n\nimport \"fmt\"\n\ntype Clerk struct {\n\tservers []s"
},
{
"path": "src/kvpaxos/common.go",
"chars": 639,
"preview": "package kvpaxos\n\nconst (\n\tOK = \"OK\"\n\tErrNoKey = \"ErrNoKey\"\n\tErrPending = \"ErrPending\"\n\tErrForgotten = \"E"
},
{
"path": "src/kvpaxos/server.go",
"chars": 5093,
"preview": "package kvpaxos\n\nimport \"net\"\nimport \"fmt\"\nimport \"net/rpc\"\nimport \"log\"\nimport \"paxos\"\nimport \"sync\"\nimport \"sync/atomi"
},
{
"path": "src/kvpaxos/test.go",
"chars": 15208,
"preview": "package kvpaxos\n\nimport \"testing\"\nimport \"runtime\"\nimport \"strconv\"\nimport \"os\"\nimport \"time\"\nimport \"fmt\"\nimport \"math/"
},
{
"path": "src/kvraft/client.go",
"chars": 1591,
"preview": "package raftkv\n\nimport \"labrpc\"\nimport \"crypto/rand\"\nimport \"math/big\"\n\n\ntype Clerk struct {\n\tservers []*labrpc.ClientEn"
},
{
"path": "src/kvraft/common.go",
"chars": 540,
"preview": "package raftkv\n\nconst (\n\tOK = \"OK\"\n\tErrNoKey = \"ErrNoKey\"\n)\n\ntype Err string\n\n// Put or Append\ntype PutAppendArgs "
},
{
"path": "src/kvraft/config.go",
"chars": 10106,
"preview": "package raftkv\n\nimport \"labrpc\"\nimport \"testing\"\nimport \"os\"\n\n// import \"log\"\nimport crand \"crypto/rand\"\nimport \"math/bi"
},
{
"path": "src/kvraft/server.go",
"chars": 2192,
"preview": "package raftkv\n\nimport (\n\t\"labgob\"\n\t\"labrpc\"\n\t\"log\"\n\t\"raft\"\n\t\"sync\"\n)\n\nconst Debug = 0\n\nfunc DPrintf(format string, a .."
},
{
"path": "src/kvraft/test.go",
"chars": 19820,
"preview": "package raftkv\n\nimport \"linearizability\"\n\nimport \"testing\"\nimport \"strconv\"\nimport \"time\"\nimport \"math/rand\"\nimport \"log"
},
{
"path": "src/labgob/labgob.go",
"chars": 3822,
"preview": "package labgob\n\n//\n// trying to send non-capitalized fields over RPC produces a range of\n// misbehavior, including both "
},
{
"path": "src/labgob/test_test.go",
"chars": 2999,
"preview": "package labgob\n\nimport \"testing\"\n\nimport \"bytes\"\n\ntype T1 struct {\n\tT1int0 int\n\tT1int1 int\n\tT1string0 string\n\tT1st"
},
{
"path": "src/labrpc/labrpc.go",
"chars": 13252,
"preview": "package labrpc\n\n//\n// channel-based RPC, for 824 labs.\n//\n// simulates a network that can lose requests, lose replies,\n/"
},
{
"path": "src/labrpc/test_test.go",
"chars": 9822,
"preview": "package labrpc\n\nimport \"testing\"\nimport \"strconv\"\nimport \"sync\"\nimport \"runtime\"\nimport \"time\"\nimport \"fmt\"\n\ntype JunkAr"
},
{
"path": "src/linearizability/bitset.go",
"chars": 1425,
"preview": "package linearizability\n\ntype bitset []uint64\n\n// data layout:\n// bits 0-63 are in data[0], the next are in data[1], etc"
},
{
"path": "src/linearizability/linearizability.go",
"chars": 6643,
"preview": "package linearizability\n\nimport (\n\t\"sort\"\n\t\"sync/atomic\"\n\t\"time\"\n)\n\ntype entryKind bool\n\nconst (\n\tcallEntry entryKind "
},
{
"path": "src/linearizability/model.go",
"chars": 1451,
"preview": "package linearizability\n\ntype Operation struct {\n\tInput interface{}\n\tCall int64 // invocation time\n\tOutput interface{"
},
{
"path": "src/linearizability/models.go",
"chars": 1339,
"preview": "package linearizability\n\n// kv model\n\ntype KvInput struct {\n\tOp uint8 // 0 => get, 1 => put, 2 => append\n\tKey string\n\tVa"
},
{
"path": "src/main/diskvd.go",
"chars": 1790,
"preview": "package main\n\n//\n// start a diskvd server. it's a member of some replica\n// group, which has other members, and it needs"
},
{
"path": "src/main/ii.go",
"chars": 1425,
"preview": "package main\n\nimport \"os\"\nimport \"fmt\"\nimport \"mapreduce\"\n\n// The mapping function is called once for each piece of the "
},
{
"path": "src/main/lockc.go",
"chars": 500,
"preview": "package main\n\n//\n// see comments in lockd.go\n//\n\nimport \"lockservice\"\nimport \"os\"\nimport \"fmt\"\n\nfunc usage() {\n\tfmt.Prin"
},
{
"path": "src/main/lockd.go",
"chars": 657,
"preview": "package main\n\n// export GOPATH=~/6.824\n// go build lockd.go\n// go build lockc.go\n// ./lockd -p a b &\n// ./lockd -b a b &"
},
{
"path": "src/main/mr-challenge.txt",
"chars": 1667,
"preview": "www: 8 pg-being_ernest.txt,pg-dorian_gray.txt,pg-frankenstein.txt,pg-grimm.txt,pg-huckleberry_finn.txt,pg-metamorphosis."
},
{
"path": "src/main/mr-testout.txt",
"chars": 99,
"preview": "that: 7871\nit: 7987\nin: 8415\nwas: 8578\na: 13382\nof: 13536\nI: 14296\nto: 16079\nand: 23612\nthe: 29748\n"
},
{
"path": "src/main/pbc.go",
"chars": 875,
"preview": "package main\n\n//\n// pbservice client application\n//\n// export GOPATH=~/6.824\n// go build viewd.go\n// go build pbd.go\n// "
},
{
"path": "src/main/pbd.go",
"chars": 300,
"preview": "package main\n\n//\n// see directions in pbc.go\n//\n\nimport \"time\"\nimport \"pbservice\"\nimport \"os\"\nimport \"fmt\"\n\nfunc main() "
},
{
"path": "src/main/pg-being_ernest.txt",
"chars": 138885,
"preview": "The Project Gutenberg eBook, The Importance of Being Earnest, by Oscar\nWilde\n\n\nThis eBook is for the use of anyone anywh"
},
{
"path": "src/main/pg-dorian_gray.txt",
"chars": 453168,
"preview": "The Project Gutenberg EBook of The Picture of Dorian Gray, by Oscar Wilde\n\nThis eBook is for the use of anyone anywhere "
},
{
"path": "src/main/pg-frankenstein.txt",
"chars": 441033,
"preview": "Project Gutenberg's Frankenstein, by Mary Wollstonecraft (Godwin) Shelley\n\nThis eBook is for the use of anyone anywhere "
},
{
"path": "src/main/pg-grimm.txt",
"chars": 540174,
"preview": "The Project Gutenberg EBook of Grimms' Fairy Tales, by The Brothers Grimm\n\nThis eBook is for the use of anyone anywhere "
},
{
"path": "src/main/pg-huckleberry_finn.txt",
"chars": 594262,
"preview": "\n\nThe Project Gutenberg EBook of Adventures of Huckleberry Finn, Complete\nby Mark Twain (Samuel Clemens)\n\nThis eBook is "
},
{
"path": "src/main/pg-metamorphosis.txt",
"chars": 139054,
"preview": "The Project Gutenberg EBook of Metamorphosis, by Franz Kafka\nTranslated by David Wyllie.\n\nThis eBook is for the use of a"
},
{
"path": "src/main/pg-sherlock_holmes.txt",
"chars": 581863,
"preview": "Project Gutenberg's The Adventures of Sherlock Holmes, by Arthur Conan Doyle\n\nThis eBook is for the use of anyone anywhe"
},
{
"path": "src/main/pg-tom_sawyer.txt",
"chars": 412665,
"preview": "\nThe Project Gutenberg EBook of The Adventures of Tom Sawyer, Complete by\nMark Twain (Samuel Clemens)\n\nThis eBook is for"
},
{
"path": "src/main/test-ii.sh",
"chars": 468,
"preview": "#!/bin/bash\ngo run ii.go master sequential pg-*.txt\n\n# cause sort to be case sensitive.\n# on Ubuntu (Athena) it's otherw"
},
{
"path": "src/main/test-mr.sh",
"chars": 491,
"preview": "#!/bin/bash\nhere=$(dirname \"$0\")\n[[ \"$here\" = /* ]] || here=\"$PWD/$here\"\nexport GOPATH=\"$here/../../\"\necho \"\"\necho \"==> "
},
{
"path": "src/main/test-wc.sh",
"chars": 326,
"preview": "#!/bin/bash\ngo run wc.go master sequential pg-*.txt\nsort -n -k2 mrtmp.wcseq | tail -10 | diff - mr-testout.txt > diff.ou"
},
{
"path": "src/main/viewd.go",
"chars": 283,
"preview": "package main\n\n//\n// see directions in pbc.go\n//\n\nimport \"time\"\nimport \"viewservice\"\nimport \"os\"\nimport \"fmt\"\n\nfunc main("
},
{
"path": "src/main/wc.go",
"chars": 1385,
"preview": "package main\n\nimport (\n\t\"fmt\"\n\t\"mapreduce\"\n\t\"os\"\n)\n\n//\n// The map function is called once for each file of input. The fi"
},
{
"path": "src/mapreduce/824-mrinput-0.txt",
"chars": 588890,
"preview": "0\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14\n15\n16\n17\n18\n19\n20\n21\n22\n23\n24\n25\n26\n27\n28\n29\n30\n31\n32\n33\n34\n35\n36\n37\n38\n39\n40\n41\n42\n4"
},
{
"path": "src/mapreduce/common.go",
"chars": 1108,
"preview": "package mapreduce\n\nimport (\n\t\"fmt\"\n\t\"strconv\"\n)\n\n// Debugging enabled?\nconst debugEnabled = false\n\n// debug() will only "
},
{
"path": "src/mapreduce/common_map.go",
"chars": 2307,
"preview": "package mapreduce\n\nimport (\n\t\"hash/fnv\"\n)\n\nfunc doMap(\n\tjobName string, // the name of the MapReduce job\n\tmapTask int, /"
},
{
"path": "src/mapreduce/common_reduce.go",
"chars": 1797,
"preview": "package mapreduce\n\nfunc doReduce(\n\tjobName string, // the name of the whole MapReduce job\n\treduceTask int, // which redu"
},
{
"path": "src/mapreduce/common_rpc.go",
"chars": 1929,
"preview": "package mapreduce\n\nimport (\n\t\"fmt\"\n\t\"net/rpc\"\n)\n\n// What follows are RPC types and methods.\n// Field names must start wi"
},
{
"path": "src/mapreduce/master.go",
"chars": 4906,
"preview": "package mapreduce\n\n//\n// Please do not modify this file.\n//\n\nimport (\n\t\"fmt\"\n\t\"net\"\n\t\"sync\"\n)\n\n// Master holds all the s"
},
{
"path": "src/mapreduce/master_rpc.go",
"chars": 1571,
"preview": "package mapreduce\n\nimport (\n\t\"fmt\"\n\t\"log\"\n\t\"net\"\n\t\"net/rpc\"\n\t\"os\"\n)\n\n// Shutdown is an RPC method that shuts down the Ma"
},
{
"path": "src/mapreduce/master_splitmerge.go",
"chars": 1447,
"preview": "package mapreduce\n\nimport (\n\t\"bufio\"\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"log\"\n\t\"os\"\n\t\"sort\"\n)\n\n// merge combines the results of th"
},
{
"path": "src/mapreduce/schedule.go",
"chars": 1158,
"preview": "package mapreduce\n\nimport \"fmt\"\n\n//\n// schedule() starts and waits for all tasks in the given phase (mapPhase\n// or redu"
},
{
"path": "src/mapreduce/test_test.go",
"chars": 4756,
"preview": "package mapreduce\n\nimport (\n\t\"fmt\"\n\t\"testing\"\n\t\"time\"\n\n\t\"bufio\"\n\t\"log\"\n\t\"os\"\n\t\"sort\"\n\t\"strconv\"\n\t\"strings\"\n)\n\nconst (\n\tn"
},
{
"path": "src/mapreduce/worker.go",
"chars": 3722,
"preview": "package mapreduce\n\n//\n// Please do not modify this file.\n//\n\nimport (\n\t\"fmt\"\n\t\"log\"\n\t\"net\"\n\t\"net/rpc\"\n\t\"os\"\n\t\"sync\"\n\t\"ti"
},
{
"path": "src/paxos/paxos.go",
"chars": 16337,
"preview": "package paxos\n\n//\n// Paxos library, to be included in an application.\n// Multiple applications will run, each including\n"
},
{
"path": "src/paxos/test_test.go",
"chars": 19579,
"preview": "package paxos\n\nimport \"testing\"\nimport \"runtime\"\nimport \"strconv\"\nimport \"os\"\nimport \"time\"\nimport \"fmt\"\nimport \"math/ra"
},
{
"path": "src/pbservice/client.go",
"chars": 2899,
"preview": "package pbservice\n\nimport \"viewservice\"\nimport \"net/rpc\"\nimport \"fmt\"\n\nimport \"crypto/rand\"\nimport \"math/big\"\n\ntype Cler"
},
{
"path": "src/pbservice/common.go",
"chars": 867,
"preview": "package pbservice\n\nconst (\n\tOK = \"OK\"\n\tErrNoKey = \"ErrNoKey\"\n\tErrWrongServer = \"ErrWrongServer\"\n)\n\ncon"
},
{
"path": "src/pbservice/server.go",
"chars": 7688,
"preview": "package pbservice\n\nimport \"net\"\nimport \"fmt\"\nimport \"net/rpc\"\nimport \"log\"\nimport \"time\"\nimport \"viewservice\"\nimport \"sy"
},
{
"path": "src/pbservice/test.go",
"chars": 25391,
"preview": "package pbservice\n\nimport \"viewservice\"\nimport \"fmt\"\nimport \"io\"\nimport \"net\"\nimport \"testing\"\nimport \"time\"\nimport \"log"
},
{
"path": "src/raft/config.go",
"chars": 12183,
"preview": "package raft\n\n//\n// support for Raft tester.\n//\n// we will use the original config.go to test your code for grading.\n// "
},
{
"path": "src/raft/persister.go",
"chars": 1438,
"preview": "package raft\n\n//\n// support for Raft and kvraft to save persistent\n// Raft state (log &c) and k/v server snapshots.\n//\n/"
},
{
"path": "src/raft/raft.go",
"chars": 6809,
"preview": "package raft\n\n//\n// this is an outline of the API that raft must expose to\n// the service (or tester). see comments belo"
},
{
"path": "src/raft/test_test.go",
"chars": 19639,
"preview": "package raft\n\n//\n// Raft tests.\n//\n// we will use the original test_test.go to test your code for grading.\n// so, while "
},
{
"path": "src/raft/util.go",
"chars": 181,
"preview": "package raft\n\nimport \"log\"\n\n// Debugging\nconst Debug = 0\n\nfunc DPrintf(format string, a ...interface{}) (n int, err erro"
},
{
"path": "src/shardkv/client.go",
"chars": 3128,
"preview": "package shardkv\n\n//\n// client code to talk to a sharded key/value service.\n//\n// the client first talks to the shardmast"
},
{
"path": "src/shardkv/common.go",
"chars": 886,
"preview": "package shardkv\n\n//\n// Sharded key/value server.\n// Lots of replica groups, each running op-at-a-time paxos.\n// Shardmas"
},
{
"path": "src/shardkv/config.go",
"chars": 9333,
"preview": "package shardkv\n\nimport \"shardmaster\"\nimport \"labrpc\"\nimport \"testing\"\nimport \"os\"\n\n// import \"log\"\nimport crand \"crypto"
},
{
"path": "src/shardkv/server.go",
"chars": 2793,
"preview": "package shardkv\n\n\n// import \"shardmaster\"\nimport \"labrpc\"\nimport \"raft\"\nimport \"sync\"\nimport \"labgob\"\n\n\n\ntype Op struct "
},
{
"path": "src/shardkv/test_test.go",
"chars": 18480,
"preview": "package shardkv\n\nimport \"linearizability\"\n\nimport \"testing\"\nimport \"strconv\"\nimport \"time\"\nimport \"fmt\"\nimport \"sync/ato"
},
{
"path": "src/shardmaster/client.go",
"chars": 2508,
"preview": "package shardmaster\n\n//\n// Shardmaster clerk.\n// Please don't change this file.\n//\n\nimport \"net/rpc\"\nimport \"time\"\nimpor"
},
{
"path": "src/shardmaster/common.go",
"chars": 1501,
"preview": "package shardmaster\n\n//\n// Master shard server: assigns shards to replication groups.\n//\n// RPC interface:\n// Join(serve"
},
{
"path": "src/shardmaster/config.go",
"chars": 8232,
"preview": "package shardmaster\n\nimport \"labrpc\"\nimport \"raft\"\nimport \"testing\"\nimport \"os\"\n\n// import \"log\"\nimport crand \"crypto/ra"
},
{
"path": "src/shardmaster/server.go",
"chars": 11841,
"preview": "package shardmaster\n\nimport crand \"crypto/rand\"\nimport \"errors\"\nimport \"fmt\"\nimport \"log\"\nimport \"math/big\"\nimport \"net\""
},
{
"path": "src/shardmaster/test_test.go",
"chars": 8446,
"preview": "package shardmaster\n\nimport (\n\t\"sync\"\n\t\"testing\"\n)\n\n// import \"time\"\nimport \"fmt\"\n\nfunc check(t *testing.T, groups []int"
},
{
"path": "src/viewservice/client.go",
"chars": 1964,
"preview": "package viewservice\n\nimport \"net/rpc\"\nimport \"fmt\"\n\n//\n// the viewservice Clerk lives in the client\n// and maintains a l"
},
{
"path": "src/viewservice/common.go",
"chars": 2254,
"preview": "package viewservice\n\nimport \"time\"\n\n//\n// This is a non-replicated view service for a simple\n// primary/backup system.\n/"
},
{
"path": "src/viewservice/server.go",
"chars": 5352,
"preview": "package viewservice\n\nimport \"net\"\nimport \"net/rpc\"\nimport \"log\"\nimport \"time\"\nimport \"sync\"\nimport \"fmt\"\nimport \"os\"\nimp"
},
{
"path": "src/viewservice/test.go",
"chars": 5301,
"preview": "package viewservice\n\nimport \"testing\"\nimport \"runtime\"\nimport \"time\"\nimport \"fmt\"\nimport \"os\"\nimport \"strconv\"\n\n\nfunc ch"
}
]