Repository: missing-semester-cn/missing-semester-cn.github.io Branch: master Commit: 8334ef64fdf1 Files: 65 Total size: 603.7 KB Directory structure: gitextract_gfrnrc45/ ├── .editorconfig ├── .github/ │ └── ISSUE_TEMPLATE/ │ └── translation.md ├── .gitignore ├── 404.html ├── CNAME ├── Gemfile ├── README.md ├── _2019/ │ ├── automation.md │ ├── backups.md │ ├── command-line.md │ ├── course-overview.md │ ├── data-wrangling.md │ ├── dotfiles.md │ ├── editors.md │ ├── files/ │ │ ├── example-data.xml │ │ └── example.c │ ├── index.html │ ├── machine-introspection.md │ ├── os-customization.md │ ├── package-management.md │ ├── program-introspection.md │ ├── remote-machines.md │ ├── security.md │ ├── shell.md │ ├── version-control.md │ ├── virtual-machines.md │ └── web.md ├── _2020/ │ ├── command-line.md │ ├── course-shell.md │ ├── data-wrangling.md │ ├── debugging-profiling.md │ ├── editors-notes.txt │ ├── editors.md │ ├── files/ │ │ ├── example-data.xml │ │ └── vimrc │ ├── index.html │ ├── metaprogramming.md │ ├── potpourri.md │ ├── qa.md │ ├── security.md │ ├── shell-tools.md │ └── version-control.md ├── _config.yml ├── _includes/ │ ├── head.html │ ├── nav.html │ ├── scaled_image.html │ ├── scaled_video.html │ └── video.html ├── _layouts/ │ ├── default.html │ ├── lecture.html │ ├── page.html │ └── redirect.html ├── about.md ├── index.md ├── lectures.html ├── license.md ├── robots.txt └── static/ ├── css/ │ ├── main.css │ └── syntax.css └── files/ ├── logger.py ├── sorts.py └── subtitles/ └── 2020/ ├── command-line.sbv ├── debugging-profiling.sbv ├── qa.sbv └── shell-tools.sbv ================================================ FILE CONTENTS ================================================ ================================================ FILE: .editorconfig ================================================ root = true [*] charset = utf-8 end_of_line = lf indent_style = space insert_final_newline = true trim_trailing_whitespace = true [*.md] indent_size = 4 
trim_trailing_whitespace = false [*.{html,xml}] indent_size = 2 [*.yml] indent_size = 2 [*.css] indent_size = 2 ================================================ FILE: .github/ISSUE_TEMPLATE/translation.md ================================================ --- name: translation about: choose the file you plan to translate title: '' labels: trans assignees: '' --- Filename : Estimated time of finish : Note: Please make sure you can finish it within two weeks. ================================================ FILE: .gitignore ================================================ .ruby-version .bundle/ _site/ .jekyll-metadata .claude/ ================================================ FILE: 404.html ================================================ --- layout: default title: "404: Page not found" permalink: /404.html ---
Sorry, the page you were looking for doesn't exist or has been moved.
You can go back to the home page or use the search bar to find what you're looking for.
If you think this is an error, please contact us.
Click on specific topics below to see lecture videos and lecture notes.
We've also shared this class beyond MIT in the hopes that others may benefit from these resources. You can find posts and discussion on
================================================ FILE: _2019/machine-introspection.md ================================================ --- layout: lecture title: "Machine Introspection" presenter: Jon video: aspect: 56.25 id: eNYT2Oq3PF8 --- Sometimes, computers misbehave. And very often, you want to know why. Let's look at some tools that help you do that! But first, let's make sure you're able to do introspection. Often, system introspection requires that you have certain privileges, like being the member of a group (like `power` for shutdown). The `root` user is the ultimate privilege; they can do pretty much anything. You can run a command as `root` (but be careful!) using `sudo`. ## What happened? If something goes wrong, the first place to start is to look at what happened around the time when things went wrong. For this, we need to look at logs. Traditionally, logs were all stored in `/var/log`, and many still are. Usually there's a file or folder per program. Use `grep` or `less` to find your way through them. There's also a kernel log that you can see using the `dmesg` command. This used to be available as a plain-text file, but nowadays you often have to go through `dmesg` to get at it. Finally, there is the "system log", which is increasingly where all of your log messages go. On _most_, though not all, Linux systems, that log is managed by `systemd`, the "system daemon", which controls all the services that run in the background (and much much more at this point). That log is accessible through the somewhat inconvenient `journalctl` tool if you are root, or part of the `admin` or `wheel` groups. For `journalctl`, you should be aware of these flags in particular: - `-u UNIT`: show only messages related to the given systemd service - `--full`: don't truncate long lines (the stupidest feature) - `-b`: only show messages from the latest boot (see also `-b -2`) - `-n100`: only show last 100 entries ## What is happening? 
If something _is_ wrong, or you just want to get a feel for what's going on in your system, you have a number of tools at your disposal for inspecting the currently running system:

First, there's `top`, and the improved version `htop`, which show you various statistics for the currently running processes on the system: CPU use, memory use, process trees, etc. There are lots of shortcuts, but `t` is particularly useful for enabling the tree view. You can also see the process tree with `pstree` (+ `-p` to include PIDs). If you want to know what those programs are doing, you'll often want to tail their log files. `journalctl -f`, `dmesg -w`, and `tail -f` are your friends here.

Sometimes, you want to know more about the resources being used overall on your system. [`dstat`](http://dag.wiee.rs/home-made/dstat/) is excellent for that. It gives you real-time resource metrics for lots of different subsystems like I/O, networking, CPU utilization, context switches, and the like. `man dstat` is the place to start.

If you're running out of disk space, there are two primary utilities you'll want to know about: `df` and `du`. The former shows you the status of all the partitions on your system (try it with `-h`), whereas the latter measures the size of all the folders you give it, including their contents (see also `-h` and `-s`).

To figure out what network connections you have open, `ss` is the way to go. `ss -t` will show all open TCP connections. `ss -tl` will show all listening (i.e., server) ports on your system. `-p` will also include which process is using that connection, and `-n` will give you the raw port numbers.

## System configuration

There are _many_ ways to configure your system, but we'll go through two very common ones: networking and services. Most applications on your system tell you how to configure them in their manpage, and usually it will involve editing files in `/etc`, the system configuration directory.
If you want to configure your network, the `ip` command lets you do that. Its arguments take on a slightly weird form, but `ip COMMAND help` (for example, `ip addr help`) will get you pretty far. `ip addr` shows you information about your network interfaces and how they're configured (IP addresses and such), and `ip route` shows you how network traffic is routed to different network hosts. Network problems can often be resolved purely through the `ip` tool. There's also `iw` for managing wireless network interfaces. `ping` is a handy tool for checking how deeply things are broken. Try pinging a hostname (google.com), an external IP address (1.1.1.1), and an internal IP address (192.168.1.1 or default gw). You may also want to fiddle with `/etc/resolv.conf` to check your DNS settings (how hostnames are resolved to IP addresses).

To configure services, you pretty much have to interact with `systemd` these days, for better or for worse. Most services on your system will have a systemd service file that defines a systemd _unit_. These files define what command to run when that service is started, how to stop it, where to log things, etc. They're usually not too bad to read, and you can find most of them in `/usr/lib/systemd/system/`. You can also define your own in `/etc/systemd/system`.

Once you have a systemd service in mind, you use the `systemctl` command to interact with it. `systemctl enable UNIT` will set the service to start on boot (`disable` removes it again), and `start`, `stop`, and `restart` will do what you expect. If something goes wrong, systemd will let you know, and you can use `journalctl -u UNIT` to see the application's log. You can also use `systemctl status` to see how all your system services are doing. If your boot feels slow, it's probably due to a couple of slow services, and you can use `systemd-analyze` (try it with `blame`) to figure out which ones.

# Exercises

`locate`? `dmidecode`? `tcpdump`? `/boot`? `iptables`? `/proc`?
================================================ FILE: _2019/os-customization.md ================================================ --- layout: lecture title: "OS Customization" presenter: Anish video: aspect: 62.5 id: epSRVqQzeDo --- There is a lot you can do to customize your operating system beyond what is available in the settings menus. # Keyboard remapping Your keyboard probably has keys that you aren't using very much. Instead of having useless keys, you can remap them to do useful things. ## Remapping to other keys The simplest thing is to remap keys to other keys. For example, if you don't use the caps lock key very much, then you can remap it to something more useful. If you are a Vim user, for example, you might want to remap caps lock to escape. On macOS, you can do some remappings through Keyboard settings in System Preferences; for more complicated mappings, you need special software. ## Remapping to arbitrary commands You don't just have to remap keys to other keys: there are tools that will let you remap keys (or combinations of keys) to arbitrary commands. For example, you could make command-shift-t open a new terminal window. # Customizing hidden OS settings ## macOS macOS exposes a lot of useful settings through the `defaults` command. For example, you can make Dock icons of hidden applications translucent: ```shell defaults write com.apple.dock showhidden -bool true ``` There is no single list of all possible settings, but you can find lists of specific customizations online, such as Mathias Bynens' [.macos](https://github.com/mathiasbynens/dotfiles/blob/master/.macos). # Window management ## Tiling window management [Tiling window management](https://en.wikipedia.org/wiki/Tiling_window_manager) is one approach to window management, where you organize windows into non-overlapping frames. 
If you're using a Unix-based operating system, you can install a tiling window manager; if you're using something like Windows or macOS, you can install applications that let you approximate this behavior. ## Screen management You can set up keyboard shortcuts to help you manipulate windows across screens. ## Layouts If there are specific ways you lay out windows on a screen, rather than "executing" that layout manually, you can script it, making instantiating a layout trivial. # Resources - [Hammerspoon](https://www.hammerspoon.org/) - macOS desktop automation - [Rectangle](https://rectangleapp.com/) - macOS window manager - [Karabiner](https://karabiner-elements.pqrs.org/) - sophisticated macOS keyboard remapping - [r/unixporn](https://www.reddit.com/r/unixporn/) - screenshots and documentation of people's fancy configurations # Exercises 1. Figure out how to remap your Caps Lock key to something you use more often (such as Escape or Ctrl or Backspace). 1. Make a custom global keyboard shortcut to open a new terminal window or a new browser window. {% comment %} TODO - Bitbar / Polybar - Clipboard Manager (stack/searchable history) {% endcomment %} ================================================ FILE: _2019/package-management.md ================================================ --- layout: lecture title: "Package Management and Dependency Management" presenter: Anish video: aspect: 56.25 id: tgvt473T8xA --- Software usually builds on (a collection of) other software, which necessitates dependency management. Package/dependency management programs are language-specific, but many share common ideas. # Package repositories Packages are hosted in _package repositories_. There are different repositories for different languages (and sometimes multiple for a particular language), such as [PyPI](https://pypi.org/) for Python, [RubyGems](https://rubygems.org/) for Ruby, and [crates.io](https://crates.io/) for Rust. 
They generally store software (source code and sometimes pre-compiled binaries for specific platforms) for all versions of a package.

# Semantic versioning

Software evolves over time, and we need a way to refer to software versions. Some simple ways could be to refer to software by a sequence number or a commit hash, but we can do better in terms of communicating more information: using version numbers.

There are many approaches; one popular one is [Semantic Versioning](https://semver.org/):

```
x.y.z
^ ^ ^
| | +- patch
| +--- minor
+----- major
```

Increment **major** version when you make incompatible API changes.

Increment **minor** version when you add functionality in a backward-compatible manner.

Increment **patch** when you make backward-compatible bug fixes.

For example, if you depend on a feature introduced in `v1.2.0` of some software, then you can install `v1.x.y` for any minor version `x >= 2` and any patch version `y`. You need to install major version `1` (because `2` can introduce backward-incompatible changes), and you need to install a minor version `>= 2` (because you depend on a feature introduced in that minor version). You can use any newer minor version or patch version because they should not introduce any backward-incompatible changes.

# Lock files

In addition to specifying versions, it can be nice to enforce that the _contents_ of the dependency have not changed to prevent tampering. Some tools use _lock files_ to specify cryptographic hashes of dependencies (along with versions) that are checked on package install.

# Specifying versions

Tools often let you specify versions in multiple ways, such as:

- exact version, e.g. `2.3.12`
- minimum major version, e.g. `>= 2`
- specific major version and minimum minor version, e.g.
`>= 2.3, <3.0` Specifying an exact version can be advantageous to avoid different behaviors based on installed dependencies (this shouldn't happen if all dependencies faithfully follow semver, but sometimes people make mistakes). Specifying a minimum requirement has the advantage of allowing bug fixes to be installed (e.g. patch upgrades). # Dependency resolution Package managers use various dependency resolution algorithms to satisfy dependency requirements. This often gets challenging with complex dependencies (e.g. a package can be indirectly depended on by multiple top-level dependencies, and different versions could be required). Different package managers have different levels of sophistication in their dependency resolution, but it's something to be aware of: you may need to understand this if you are debugging dependencies. # Virtual environments If you're developing multiple software projects, they may depend on different versions of a particular piece of software. Sometimes, your build tool will handle this naturally (e.g. by building a static binary). For other build tools and programming languages, one approach is handling this with virtual environments (e.g. with the [virtualenv](https://docs.python-guide.org/dev/virtualenvs/) tool for Python). Instead of installing dependencies system-wide, you can install dependencies per-project in a virtual environment, and _activate_ the virtual environment that you want to use when you're working on a specific project. # Vendoring Another very different approach to dependency management is _vendoring_. Instead of using a dependency manager or build tool to fetch software, you copy the entire source code for a dependency into your software's repository. This has the advantage that you're always building against the same version of the dependency and you don't need to rely on a package repository, but it is more effort to upgrade dependencies. 
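The compatibility rule from the semantic versioning section (same major version, then at least the required minor and patch) can be sketched in a few lines of bash. This is only an illustration: `semver_satisfies` is a hypothetical helper name, it assumes plain `x.y.z` versions with an optional leading `v`, and it ignores pre-release and build tags. Real package managers implement considerably more than this.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of any package manager):
# succeeds if INSTALLED is compatible with REQUIRED under semver.
# Usage: semver_satisfies INSTALLED REQUIRED
semver_satisfies() {
    local installed="${1#v}" required="${2#v}"   # strip an optional leading "v"
    local imaj imin ipat rmaj rmin rpat
    IFS=. read -r imaj imin ipat <<< "$installed"
    IFS=. read -r rmaj rmin rpat <<< "$required"

    # A different major version may contain incompatible API changes.
    [ "$imaj" -eq "$rmaj" ] || return 1
    # A newer minor version only adds backward-compatible functionality.
    [ "$imin" -gt "$rmin" ] && return 0
    # An older minor version lacks the feature we depend on.
    [ "$imin" -lt "$rmin" ] && return 1
    # Same minor version: we need at least the required patch level.
    [ "$ipat" -ge "$rpat" ]
}

# The example from the text: a feature introduced in v1.2.0.
semver_satisfies 1.4.7 1.2.0 && echo "1.4.7 is compatible"
semver_satisfies 2.0.0 1.2.0 || echo "2.0.0 is not compatible"
```

The two calls at the end mirror the `v1.2.0` example: any `1.x.y` with `x >= 2` is accepted, while a new major version is rejected.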
================================================ FILE: _2019/program-introspection.md ================================================ --- layout: lecture title: "Program Introspection" presenter: Anish video: aspect: 62.5 id: 74MhV-7hYzg --- # Debugging When printf-debugging isn't good enough: use a debugger. Debuggers let you interact with the execution of a program, letting you do things like: - halt execution of the program when it reaches a certain line - single-step through the program - inspect values of variables - many more advanced features ## GDB/LLDB [GDB](https://www.gnu.org/software/gdb/) and [LLDB](https://lldb.llvm.org/). Supports many C-like languages. Let's look at [example.c](/2019/files/example.c). Compile with debug flags: `gcc -g -o example example.c`. Open GDB: `gdb example` Some commands: - `run` - `b {name of function}` - set a breakpoint - `b {file}:{line}` - set a breakpoint - `c` - continue - `step` / `next` / `finish` - step in / step over / step out - `p {variable}` - print value of variable - `watch {expression}` - set a watchpoint that triggers when the value of the expression changes - `rwatch {expression}` - set a watchpoint that triggers when the value is read - `layout` ## PDB [PDB](https://docs.python.org/3/library/pdb.html) is the Python debugger. Insert `import pdb; pdb.set_trace()` where you want to drop into PDB, basically a hybrid of a debugger (like GDB) and a Python shell. ## Web browser Developer Tools Another example of a debugger, this time with a graphical interface. # strace Observe system calls a program makes: `strace {program}`. # Profiling Types of profiling: CPU, memory, etc. Simplest profiler: `time`. 
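As a rough sketch of what `time` tells you, the snippet below times a CPU-bound shell loop and then a `sleep`. The function name `busy_loop` and the iteration count are made up for illustration, and the snippet assumes bash, where `time` is a shell keyword that prints real (wall-clock), user, and sys times to stderr.

```bash
#!/usr/bin/env bash
# Hypothetical CPU-bound workload: count to 100000 in a shell loop.
busy_loop() {
    local i=0
    while [ "$i" -lt 100000 ]; do
        i=$((i + 1))
    done
    echo "$i"
}

# Most of the elapsed (real) time should show up as user CPU time.
time busy_loop

# Sleeping takes wall-clock time but almost no CPU,
# so real should be much larger than user + sys.
time sleep 1
```

Comparing the two reports shows why `real` alone can mislead: a program that waits (on sleep, I/O, or the network) can take long while using very little CPU.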
## Go Run test code with CPU profiler: `go test -cpuprofile=cpu.out` Analyze profile: `go tool pprof -web cpu.out` Run test code with Memory profiler: `go test -memprofile=mem.out` Analyze profile: `go tool pprof -web mem.out` ## Perf Basic performance stats: `perf stat {command}` Run a program with the profiler: `perf record {command}` Analyze profile: `perf report` ================================================ FILE: _2019/remote-machines.md ================================================ --- layout: lecture title: "Remote Machines" presenter: Jose video: aspect: 62.5 id: X5c2Y8BCowM --- It has become more and more common for programmers to use remote servers in their everyday work. If you need to use remote servers in order to deploy backend software or you need a server with higher computational capabilities, you will end up using a Secure Shell (SSH). As with most tools covered, SSH is highly configurable so it is worth learning about it. ## Executing commands An often overlooked feature of `ssh` is the ability to run commands directly. - `ssh foobar@server ls` will execute ls in the home folder of foobar - It works with pipes, so `ssh foobar@server ls | grep PATTERN` will grep locally the remote output of `ls` and `ls | ssh foobar@server grep PATTERN` will grep remotely the local output of `ls`. ## SSH Keys Key-based authentication exploits public-key cryptography to prove to the server that the client owns the secret private key without revealing the key. This way you do not need to reenter your password every time. Nevertheless the private key (e.g. `~/.ssh/id_rsa`) is effectively your password so treat it like so. - Key generation. To generate a pair you can simply run `ssh-keygen -t rsa -b 4096`. If you do not choose a passphrase anyone that gets hold of your private key will be able to access authorized servers so it is recommended to choose one and use `ssh-agent` to manage shell sessions. 
If you have configured pushing to GitHub using SSH keys you have probably done the steps outlined [here](https://help.github.com/articles/connecting-to-github-with-ssh/) and have a valid pair already. To check if you have a passphrase and validate it you can run `ssh-keygen -y -f /path/to/key`.
- Key based authentication. `ssh` will look into `.ssh/authorized_keys` to determine which clients it should let in. To copy a public key over, we can use the following command:

```bash
cat .ssh/id_rsa.pub | ssh foobar@remote 'cat >> ~/.ssh/authorized_keys'
```

A simpler solution can be achieved with `ssh-copy-id` where available.

```bash
ssh-copy-id -i .ssh/id_rsa.pub foobar@remote
```

## Copying files over ssh

There are many ways to copy files over ssh:

- `ssh+tee`, the simplest is to use `ssh` command execution and stdin input by doing `cat localfile | ssh remote_server tee serverfile`
- `scp` when copying large amounts of files/directories, the secure copy `scp` command is more convenient since it can easily recurse over paths. The syntax is `scp path/to/local_file remote_host:path/to/remote_file`
- `rsync` improves upon `scp` by detecting identical files in local and remote and preventing copying them again. It also provides more fine grained control over symlinks, permissions and has extra features like the `--partial` flag that can resume from a previously interrupted copy. `rsync` has a similar syntax to `scp`.

## Backgrounding processes

By default when interrupting an ssh connection, child processes of the parent shell are killed along with it. There are a couple of alternatives:

- `nohup` - the `nohup` tool effectively allows for a process to live when the terminal gets killed. Although this can sometimes be achieved with `&` and `disown`, nohup is a better default. More details can be found [here](https://unix.stackexchange.com/questions/3886/difference-between-nohup-disown-and).
- `tmux`, `screen` - whereas `nohup` effectively backgrounds the process it is not convenient for interactive shell sessions. In that case using a terminal multiplexer like `screen` or `tmux` is a convenient choice since one can easily detach and reattach the associated shells.

Lastly, if you disown a program and want to reattach it to the current terminal, you can look into [reptyr](https://github.com/nelhage/reptyr). `reptyr PID` will grab the process with id PID and attach it to your current terminal.

## Port Forwarding

In many scenarios you will run into software that works by listening to ports in the machine. When this happens in your local machine you can simply do `localhost:PORT` or `127.0.0.1:PORT`, but what do you do with a remote server that does not have its ports directly available through the network/internet? The answer is port forwarding, and it comes in two flavors: Local Port Forwarding and Remote Port Forwarding (see the pictures for more details, credit of the pictures from [this SO post](https://unix.stackexchange.com/questions/115897/whats-ssh-port-forwarding-and-whats-the-difference-between-ssh-local-and-remot)).

**Local Port Forwarding**

**Remote Port Forwarding**

The most common scenario is local port forwarding, where a service in the remote machine listens on a port and you want to link a port in your local machine to forward to the remote port. For example, suppose we execute `jupyter notebook` in the remote server and it listens on port `8888`. To forward that to the local port `9999`, we would run `ssh -L 9999:localhost:8888 foobar@remote_server` and then navigate to `localhost:9999` in our local machine.

## Graphics Forwarding

Sometimes forwarding ports is not enough since we want to run a GUI based program in the server. You can always resort to Remote Desktop Software that sends the entire Desktop Environment (i.e., options like RealVNC, TeamViewer, &c).
However for a single GUI tool, SSH provides a good alternative: Graphics Forwarding.

Using the `-X` flag tells SSH to forward X11 connections. For trusted X11 forwarding the `-Y` flag can be used.

Final note is that for this to work the `sshd_config` on the server must have the following options:

```bash
X11Forwarding yes
X11DisplayOffset 10
```

## Roaming

A common pain when connecting to a remote server is disconnections due to shutting down/sleeping your computer or changing a network. Moreover, if one has a connection with significant lag, using ssh can become quite frustrating. [Mosh](https://mosh.org/), the mobile shell, improves upon ssh, allowing roaming connections, intermittent connectivity and providing intelligent local echo.

Mosh is present in all common distributions and package managers. Mosh requires an ssh server to be working in the server. You do not need to be superuser to install mosh but it does require that UDP ports 60000 through 60010 be open in the server (they usually are since they are not in the privileged range).

A downside of `mosh` is that it does not support roaming port/graphics forwarding so if you use those often `mosh` won't be of much help.

## SSH Configuration

#### Client

We have covered many many arguments that we can pass. A tempting alternative is to create shell aliases that look like `alias my_server="ssh -X -i ~/.ssh/id_rsa -L 9999:localhost:8888 foobar@remote_server"`, however there is a better alternative, using `~/.ssh/config`.

```bash
Host vm
    User foobar
    HostName 172.16.174.141
    Port 22
    IdentityFile ~/.ssh/id_rsa
    RemoteForward 9999 localhost:8888

# Configs can also take wildcards
Host *.mit.edu
    User foobaz
```

An additional advantage of using the `~/.ssh/config` file over aliases is that other programs like `scp`, `rsync`, `mosh`, &c are able to read it as well and convert the settings into the corresponding flags.
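One way to check what a `Host` entry resolves to, without connecting anywhere, is sketched below. It copies part of the `vm` entry from the example configuration into a scratch file (the temporary path is an assumption of this sketch, not something the lecture prescribes) and asks OpenSSH to print the options it would use. It assumes a reasonably recent OpenSSH client that supports `-G`.

```bash
#!/usr/bin/env bash
# Write a scratch config so the real ~/.ssh/config is left untouched.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
Host vm
    User foobar
    HostName 172.16.174.141
    Port 22
EOF

# -F selects the config file; -G prints the resolved options for the
# given host and exits without opening a connection.
ssh -F "$cfg" -G vm | grep -E '^(user|hostname|port) '

# With the entry in ~/.ssh/config, the short name works everywhere
# (not run here, since the host is only an example):
#   ssh vm
#   scp path/to/local_file vm:path/to/remote_file
```

Because the settings live in one file, changing the address or user there updates every tool that reads it.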
Note that the `~/.ssh/config` file can be considered a dotfile, and in general it is fine for it to be included with the rest of your dotfiles. However if you make it public, think about the information that you are potentially providing strangers on the internet: the addresses of your servers, the users you are using, the open ports, &c. This may facilitate some types of attacks so be thoughtful about sharing your SSH configuration.

Warning: Never include your RSA keys (`~/.ssh/id_rsa*`) in a public repository!

#### Server side

Server side configuration is usually specified in `/etc/ssh/sshd_config`. Here you can make changes like disabling password authentication, changing ssh ports, enabling X11 forwarding, &c. You can specify config settings on a per-user basis.

## Remote Filesystem

Sometimes it is convenient to mount a remote folder. [sshfs](https://github.com/libfuse/sshfs) can mount a folder on a remote server locally, and then you can use a local editor.

## Exercises

1. For SSH to work the host needs to be running an SSH server. Install an SSH server (such as OpenSSH) in a virtual machine so you can do the rest of the exercises. To figure out the IP of the machine, run the command `ip addr` and look for the inet field (ignore the `127.0.0.1` entry, that corresponds to the loopback interface).

1. Go to `~/.ssh/` and check if you have a pair of SSH keys there. If not, generate them with `ssh-keygen -t rsa -b 4096`. It is recommended that you use a passphrase and use `ssh-agent`, more info [here](https://www.ssh.com/ssh/agent).

1. Use `ssh-copy-id` to copy the key to your virtual machine. Test that you can ssh without a password. Then, edit your `sshd_config` in the server to disable password authentication by editing the value of `PasswordAuthentication`. Disable root login by editing the value of `PermitRootLogin`.

1. Edit the `sshd_config` in the server to change the ssh port and check that you can still ssh.
If you ever have a public-facing server, a non-default port and key-only login will deter a significant number of malicious attacks.

1. Install mosh in your server/VM, establish a connection and then disconnect the network adapter of the server/VM. Can mosh properly recover from it?

1. Another use of local port forwarding is to tunnel traffic for a certain host through the server. If your network filters some website, for example `reddit.com`, you can tunnel it through the server as follows:

    - Run `ssh remote_server -L 80:reddit.com:80`
    - Set `reddit.com` and `www.reddit.com` to `127.0.0.1` in `/etc/hosts`
    - Check that you are accessing that website through the server
    - If it is not obvious, use a website such as [ipinfo.io](https://ipinfo.io/), whose output will change depending on your host's public IP.

1. Background port forwarding can easily be achieved with a couple of extra flags. Look into what the `-N` and `-f` flags do in `ssh` and figure out what a command such as this `ssh -N -f -L 9999:localhost:8888 foobar@remote_server` does.

## References

- [SSH Hacks](http://matt.might.net/articles/ssh-hacks/)
- [Secure Secure Shell](https://stribika.github.io/2015/01/04/secure-secure-shell.html)

{% comment %}
Lecture notes will be available by the start of lecture.
{% endcomment %}

================================================
FILE: _2019/security.md
================================================
---
layout: lecture
title: "Security and Privacy"
presenter: Jon
video:
  aspect: 56.25
  id: OBx_c-i-M8s
---

The world is a scary place, and everyone's out to get you.

Okay, maybe not, but that doesn't mean you want to flaunt all your secrets. Security (and privacy) is generally all about raising the bar for attackers. Find out what your threat model is, and then design your security mechanisms around that! If the threat model is the NSA or Mossad, you're _probably_ going to have a bad time.

There are _many_ ways to make your technical persona more secure.
We'll touch on a lot of high-level things here, but this is a process, and educating yourself is one of the best things you can do. So: ## Follow the Right People One of the best ways to improve your security know-how is to follow other people who are vocal about security. Some suggestions: - [@TroyHunt](https://twitter.com/TroyHunt) - [@SwiftOnSecurity](https://twitter.com/SwiftOnSecurity) - [@taviso](https://twitter.com/taviso) - [@thegrugq](https://twitter.com/thegrugq) - [@tqbf](https://twitter.com/tqbf) - [@mattblaze](https://twitter.com/mattblaze) - [@moxie](https://twitter.com/moxie) See also [this list](https://heimdalsecurity.com/blog/best-twitter-cybersec-accounts/) for more suggestions. ## General Security Advice Tech Solidarity has a pretty great list of [do's and don'ts for journalists](https://web.archive.org/web/20221123204419/https://techsolidarity.org/resources/basic_security.htm) that has a lot of sane advice, and is decently up-to-date. [@thegrugq](https://medium.com/@thegrugq) also has a good blog post on [travel security advice](https://medium.com/@thegrugq/stop-fabricating-travel-security-advice-35259bf0e869) that's worth reading. We'll repeat much of the advice from those sources here, plus some more. Also, get a [USB data blocker](https://www.amazon.com/dp/B00QRRZ2QM/), because [USB is scary](https://www.bleepingcomputer.com/news/security/heres-a-list-of-29-different-types-of-usb-attacks/). ## Authentication The very first thing you should do, if you haven't already, is download a password manager. Some good ones are: - [1password](https://1password.com/) - [KeePass](https://keepass.info/) - [BitWarden](https://bitwarden.com/) - [`pass`](https://git.zx2c4.com/password-store/about/) If you're particularly paranoid, use one that encrypts the passwords locally on your computer, as opposed to storing them in plain-text at the server. Use it to generate passwords for all the web sites you care about right now. 
Then, switch on two-factor authentication, ideally with a [FIDO/U2F](https://fidoalliance.org/) dongle (a [YubiKey](https://www.yubico.com/quiz/) for example, which has [20% off for students](https://www.yubico.com/why-yubico/for-education/)). TOTP (like Google Authenticator or Duo) will also work in a pinch, but [doesn't protect against phishing](https://twitter.com/taviso/status/1082015009348104192). SMS is pretty much useless unless your threat model only includes random strangers picking up your password in transit.

Also, a note about paper keys. Often, services will give you a "backup key" that you can use as a second factor if you lose your real second factor (btw, always keep a backup dongle somewhere safe!). While you _can_ stick those in your password manager, that means that should someone get access to your password manager, you're totally hosed (but maybe you're okay with that threat model). If you are truly paranoid, print out these paper keys, never store them digitally, and place them in a safe in the real world.

## Private Communication

Use [Signal](https://www.signal.org/) ([setup instructions](https://medium.com/@mshelton/signal-for-beginners-c6b44f76a1f0). [Wire](https://wire.com/en/) is [fine too](https://www.securemessagingapps.com/); WhatsApp is okay; [don't use Telegram](https://twitter.com/bascule/status/897187286554628096)). Desktop messengers are pretty broken (partially due to usually relying on Electron, which is a huge trust stack).

E-mail is particularly problematic, even if PGP signed. It's not generally forward-secure, and the key-distribution problem is pretty severe. [keybase.io](https://keybase.io/) helps, and is useful for a number of other reasons. Also, PGP keys are generally handled on desktop computers, which is one of the least secure computing environments. Relatedly, consider getting a Chromebook, or just work on a tablet with a keyboard.

## File Security

File security is hard, and operates on many levels.
What is it you're trying to secure against?

[](https://xkcd.com/538/)

- Offline attacks (someone steals your laptop while it's off): turn on full disk encryption ([cryptsetup + LUKS](https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_a_non-root_file_system) on Linux, [BitLocker](https://fossbytes.com/enable-full-disk-encryption-windows-10/) on Windows, [FileVault](https://support.apple.com/en-us/HT204837) on macOS). Note that this won't help if the attacker _also_ has you and really wants your secrets.
- Online attacks (someone has your laptop and it's on): use file encryption. There are two primary mechanisms for doing so:
  - Encrypted filesystems: stacked filesystem encryption software encrypts files individually rather than having encrypted block devices. You can "mount" these filesystems by providing the decryption key, and then browse the files inside it freely. When you unmount it, those files are all unavailable. Modern solutions include [gocryptfs](https://github.com/rfjakob/gocryptfs) and [eCryptFS](http://ecryptfs.org/). More detailed comparisons can be found [here](https://nuetzlich.net/gocryptfs/comparison/) and [here](https://wiki.archlinux.org/index.php/disk_encryption#Comparison_table).
  - Encrypted files: encrypt individual files with symmetric encryption (see `gpg -c`) and a secret key. Or, like `pass`, also encrypt the key with your public key so only you can read it back later with your private key. Exact encryption settings matter a lot!
- [Plausible deniability](https://en.wikipedia.org/wiki/Plausible_deniability) (what seems to be the problem officer?): usually lower performance, and easier to lose data. Hard to actually prove that it provides [deniable encryption](https://en.wikipedia.org/wiki/Deniable_encryption)!
See the [discussion here](https://security.stackexchange.com/questions/135846/is-plausible-deniability-actually-feasible-for-encrypted-volumes-disks), and then consider whether you may want to try [VeraCrypt](https://www.veracrypt.fr/en/Home.html) (the maintained fork of good ol' TrueCrypt). - Encrypted backups: use [Tarsnap](https://www.tarsnap.com/) or [Borgbase](https://www.borgbase.com/) - Think about whether an attacker can delete your backups if they get a hold of your laptop! ## Internet Security & Privacy The internet is a _very_ scary place. Open WiFi networks [are](https://www.troyhunt.com/the-beginners-guide-to-breaking-website/) [scary](https://www.troyhunt.com/talking-with-scott-hanselman-on/). Make sure you delete them afterwards, otherwise your phone will happily announce and re-connect to something with the same name later! If you're ever on a network you don't trust, a VPN _may_ be worthwhile, but keep in mind that you're trusting the VPN provider _a lot_. Do you really trust them more than your ISP? If you truly want a VPN, use a provider you're sure you trust, and you should probably pay for it. Or set up [WireGuard](https://www.wireguard.com/) for yourself -- it's [excellent](https://web.archive.org/web/20210526211307/https://latacora.micro.blog/there-will-be/)! There are also secure configuration settings for a lot of internet-enabled applications at [cipherlist.eu](https://cipherlist.eu/). If you're particularly privacy-oriented, [privacytools.io](https://privacytools.io) is also a good resource. Some of you may wonder about [Tor](https://www.torproject.org/). Keep in mind that Tor is _not_ particularly resistant to powerful global attackers, and is weak against traffic analysis attacks. It may be useful for hiding traffic on a small scale, but won't really buy you all that much in terms of privacy. You're better off using more secure services in the first place (Signal, TLS + certificate pinning, etc.). 
## Web Security So, you want to go on the Web too? Jeez, you're really pushing your luck here. Install [HTTPS Everywhere](https://www.eff.org/https-everywhere). SSL/TLS is [critical](https://www.troyhunt.com/ssl-is-not-about-encryption/), and it's _not_ just about encryption, but also about being able to verify that you're talking to the right service in the first place! If you run your own web server, [test it](https://www.ssllabs.com/ssltest/index.html). TLS configuration [can get hairy](https://wiki.mozilla.org/Security/Server_Side_TLS). HTTPS Everywhere will do its very best to never navigate you to HTTP sites when there's an alternative. That doesn't save you, but it helps. If you're truly paranoid, blacklist any SSL/TLS CAs that you don't absolutely need. Install [uBlock Origin](https://github.com/gorhill/uBlock). It is a [wide-spectrum blocker](https://github.com/gorhill/uBlock/wiki/Blocking-mode) that doesn't just stop ads, but all sorts of third-party communication a page may try to do. And inline scripts and such. If you're willing to spend some time on configuration to make things work, go to [medium mode](https://github.com/gorhill/uBlock/wiki/Blocking-mode:-medium-mode) or even [hard mode](https://github.com/gorhill/uBlock/wiki/Blocking-mode:-hard-mode). Those _will_ make some sites not work until you've fiddled with the settings enough, but will also significantly improve your online security. If you're using Firefox, enable [Multi-Account Containers](https://support.mozilla.org/en-US/kb/containers). Create separate containers for social networks, banking, shopping, etc. Firefox will keep the cookies and other state for each of the containers totally separate, so sites you visit in one container can't snoop on sensitive data from the others. In Google Chrome, you can use [Chrome Profiles](https://support.google.com/chrome/answer/2364824) to achieve similar results. Exercises TODO 1. Encrypt a file using PGP 1. 
Use veracrypt to create a simple encrypted volume 1. Enable 2FA for your most data sensitive accounts i.e. GMail, Dropbox, Github, &c ================================================ FILE: _2019/shell.md ================================================ --- layout: lecture title: "Shell and Scripting" presenter: Jon video: aspect: 56.25 id: dbDRfmH5uSI --- The shell is an efficient, textual interface to your computer. The shell prompt: what greets you when you open a terminal. Lets you run programs and commands; common ones are: - `cd` to change directory - `ls` to list files and directories - `mv` and `cp` to move and copy files But the shell lets you do _so_ much more; you can invoke any program on your computer, and command-line tools exist for doing pretty much anything you may want to do. And they're often more efficient than their graphical counterparts. We'll go through a bunch of those in this class. The shell provides an interactive programming language ("scripting"). There are many shells: - You've probably used `sh` or `bash`. - Also shells that match languages: `csh`. - Or "better" shells: `fish`, `zsh`, `ksh`. In this class we'll focus on the ubiquitous `sh` and `bash`, but feel free to play around with others. I like `fish`. Shell programming is a *very* useful tool in your toolbox. Can either write programs directly at the prompt, or into a file. `#!/bin/sh` + `chmod +x` to make shell executable. 
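
As a minimal sketch of the `#!/bin/sh` + `chmod +x` recipe (the file name `hello.sh`, the message, and the scratch directory are made up for this example):

```bash
# work in a scratch directory so we don't clutter anything
cd "$(mktemp -d)"

# write a two-line script into a file; the first line is the shebang
cat > hello.sh <<'EOF'
#!/bin/sh
echo "hello from a script"
EOF

chmod +x hello.sh   # mark the file as executable
./hello.sh          # the #! line says which interpreter runs the file
```

Without the `chmod +x`, `./hello.sh` would fail with a permission error; `sh hello.sh` would still work, since then you are naming the interpreter yourself.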
## Working with the shell Run a command a bunch of times: ```bash for i in $(seq 1 5); do echo hello; done ``` There's a lot to unpack: - `for x in list; do BODY; done` - `;` terminates a command -- equivalent to newline - split `list`, assign each to `x`, and run body - splitting is "whitespace splitting", which we'll get back to - no curly braces in shell, so `do` + `done` - `$(seq 1 5)` - run the program `seq` with arguments `1` and `5` - substitute entire `$()` with the output of that program - equivalent to ```bash for i in 1 2 3 4 5 ``` - `echo hello` - everything in a shell script is a command - in this case, run the `echo` command, which prints its arguments with the argument `hello`. - all commands are searched for in `$PATH` (colon-separated) We have variables: ```bash for f in $(ls); do echo $f; done ``` Will print each file name in the current directory. Can also set variables using `=` (no space!): ```bash foo=bar echo $foo ``` There are a bunch of "special" variables too: - `$1` to `$9`: arguments to the script - `$0` name of the script itself - `$#` number of arguments - `$$` process ID of current shell To only print directories ```bash for f in $(ls); do if test -d $f; then echo dir $f; fi; done ``` More to unpack here: - `if CONDITION; then BODY; fi` - `CONDITION` is a command; if it returns with exit status 0 (success), then `BODY` is run. - can also hook in an `else` or `elif` - again, no curly braces, so `then` + `fi` - `test` is another program that provides various checks and comparisons, and exits with 0 if they're true (`$?`) - `man COMMAND` is your friend: `man test` - can also be invoked with `[` + `]`: `[ -d $f ]` - take a look at `man test` and `which "["` But wait! This is wrong! What if a file is called "My Documents"? - `for f in $(ls)` expands to `for f in My Documents` - first do the test on `My`, then on `Documents` - not what we wanted! 
- biggest source of bugs in shell scripts ## Argument splitting Bash splits arguments by whitespace; not always what you want! - need to use quoting to handle spaces in arguments `for f in "My Documents"` would work correctly - same problem somewhere else -- do you see where? `test -d $f`: if `$f` contains whitespace, `test` will error! - `echo` happens to be okay, because split + join by space but what if a filename contains a newline?! turns into space! - quote all use of variables that you don't want split - but how do we fix our script above? what does `for f in "$(ls)"` do do you think? Globbing is the answer! - bash knows how to look for files using patterns: - `*` any string of characters - `?` any single character - `{a,b,c}` any of these characters - `for f in *`: all files in this directory - when globbing, each matching file becomes its own argument - still need to make sure to quote when _using_: `test -d "$f"` - can make advanced patterns: - `for f in a*`: all files starting with `a` in the current directory - `for f in foo/*.txt`: all `.txt` files in `foo` - `for f in foo/*/p??.txt` all three-letter text files starting with p in subdirs of `foo` Whitespace issues don't stop there: - `if [ $foo = "bar" ]; then` -- see the issue? - what if `$foo` is empty? arguments to `[` are `=` and `bar`... - _can_ work around this with `[ x$foo = "xbar" ]`, but bleh - instead, use `[[`: bash built-in comparator that has special parsing - also allows `&&` instead of `-a`, `||` over `-o`, etc. ## Composability Shell is powerful in part because of composability. Can chain multiple programs together rather than have one program that does everything. The key character is `|` (pipe). 
- `a | b` means run both `a` and `b`,
  send all output of `a` as input to `b`, and
  print the output of `b`

All programs you launch ("processes") have three "streams":

- `STDIN`: when the program reads input, it comes from here
- `STDOUT`: when the program prints something, it goes here
- `STDERR`: a 2nd output the program can choose to use
- by default, `STDIN` is your keyboard, `STDOUT` and `STDERR` are both your terminal. but you can change that!
  - `a | b` makes `STDOUT` of `a` `STDIN` of `b`.
  - also have:
    - `a > foo` (`STDOUT` of `a` goes to the file `foo`)
    - `a 2> foo` (`STDERR` of `a` goes to the file `foo`)
    - `a < foo` (`STDIN` of `a` is read from the file `foo`)
    - hint: `tail -f` will print a file as it's being written
- why is this useful? lets you manipulate output of a program!
  - `ls | grep foo`: all files whose names contain `foo`
  - `ps | grep foo`: all processes whose `ps` line contains `foo`
  - `journalctl | grep -i intel | tail -n5`: last 5 system log messages with the word intel (case insensitive)
  - `who | sendmail -t me@example.com`: send the list of logged-in users to `me@example.com`
- forms the basis for much data-wrangling, as we'll cover later

Bash also provides a number of other ways to compose programs.

You can group commands with `(a; b) | tac`: run `a`, then `b`, and send all their output to `tac`, which prints its input in reverse order.

A lesser-known, but super useful one is _process substitution_. `b <(a)` will run `a`, generate a temporary file-name for its output stream, and pass that file-name to `b`. For example:

```bash
diff <(journalctl -b -1 | head -n20) <(journalctl -b -2 | head -n20)
```

will show you the difference between the first 20 lines of the last boot log and the one before that.

## Job and process control

What if you want to run longer-term things in the background?

- the `&` suffix runs a program "in the background"
  - it will give you back your prompt immediately
  - handy if you want to run two programs at the same time like a server and client: `server & client`
  - note that the running program still has your terminal as `STDOUT`! try: `server > server.log & client`
- see all such processes with `jobs`
  - notice that it shows "Running"
- bring it to the foreground with `fg %JOB` (no argument is latest)
- if you want to background the current program: `^Z` + `bg` (here `^Z` means pressing `Ctrl+Z`)
  - `^Z` stops the current process and makes it a "job"
  - `bg` runs the last job in the background (as if you did `&`)
- background jobs are still tied to your current session, and exit if you log out. `disown` lets you sever that connection. or use `nohup`.
- `$!` is pid of last background process

What about other stuff running on your computer?

- `ps` is your friend: lists running processes
  - `ps -A`: print processes from all users (also `ps ax`)
  - `ps` has *many* arguments: see `man ps`
- `pgrep`: find processes by searching (like `ps -A | grep`)
  - `pgrep -af`: search and display with arguments
- `kill`: send a _signal_ to a process by ID (`pkill` by search + `-f`)
  - signals tell a process to "do something"
  - most common: `SIGKILL` (`-9` or `-KILL`): tell it to exit *now*; it cannot be caught or ignored. no key sends it: `^\` sends `SIGQUIT`, which also ends most programs but can be caught
  - also `SIGTERM` (`-15` or `-TERM`): tell it to exit gracefully. `^C` sends the closely related `SIGINT`

## Flags

Most command line utilities take parameters using **flags**. Flags usually come in short form (`-h`) and long form (`--help`). Usually running `CMD -h` or `man CMD` will give you a list of the flags the program takes. Short flags can usually be combined: running `rm -r -f` is equivalent to running `rm -rf` or `rm -fr`. Some common flags are a de facto standard and you will see them in many applications:

* `-a` commonly refers to all files (i.e.
also including those that start with a period)
* `-f` usually refers to forcing something, like `rm -f`
* `-h` displays the help for most commands
* `-v` usually enables a verbose output
* `-V` usually prints the version of the command

Also, a double dash `--` is used in built-in commands and many other commands to signify the end of command options, after which only positional parameters are accepted. So if you have a file called `-v` (which you can) and want to grep it, `grep pattern -- -v` will work whereas `grep pattern -v` won't. In fact, one way to create such a file is to do `touch -- -v`.

## Exercises

1. If you are completely new to the shell you may want to read a more comprehensive guide about it such as [BashGuide](http://mywiki.wooledge.org/BashGuide). If you want a more in-depth introduction [The Linux Command Line](http://linuxcommand.org/tlcl.php) is a good resource.

1. **PATH, which, type** We briefly discussed that the `PATH` environment variable is used to locate the programs that you run through the command line. Let's explore that a little further.
    - Run `echo $PATH` (or `echo $PATH | tr -s ':' '\n'` for pretty printing) and examine its contents. What locations are listed?
    - The command `which` locates a program in the user PATH. Try running `which` for common commands like `echo`, `ls` or `mv`. Note that `which` is a bit limited since it does not understand shell aliases. Try running `type` and `command -v` for those same commands. How is the output different?
    - Run `PATH=` and try running the previous commands again. Some work and some don't; can you figure out why?

1. **Special Variables**
    - What does the variable `~` expand to? What about `.`? And `..`?
    - What does the variable `$?` do?
    - What does the variable `$_` do?
    - What does the variable `!!` expand to? What about `!!*`? And `!l`?
    - Look for documentation for these options and familiarize yourself with them

1.
**xargs** Sometimes piping doesn't quite work because the command being piped into does not expect the newline separated format. For example, the `file` command tells you properties of a file. Try running `ls | file` and `ls | xargs file`. What is `xargs` doing?

1. **Shebang** When you write a script you can specify to your shell what interpreter should be used to interpret the script by using a [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) line. Write a script called `hello` with the following contents and make it executable with `chmod +x hello`. Then execute it with `./hello`. Then remove the first line and execute it again. How is the shell using that first line?

    ```bash
    #! /usr/bin/python

    print("Hello World!")
    ```

    You will often see programs that have a shebang that looks like `#! /usr/bin/env bash`. This is a more portable solution with its own set of [advantages and disadvantages](https://unix.stackexchange.com/questions/29608/why-is-it-better-to-use-usr-bin-env-name-instead-of-path-to-name-as-my). How is `env` different from `which`? What environment variable does `env` use to decide what program to run?

1. **Pipes, process substitution, subshell** Create a script called `slow_seq.sh` with the following contents and do `chmod +x slow_seq.sh` to make it executable.

    ```bash
    #! /usr/bin/env bash

    for i in $(seq 1 10); do
        echo $i;
        sleep 1;
    done
    ```

    There is a way in which pipes (and process substitution) differ from using subshell execution, i.e. `$()`. Run the following commands and observe the differences:

    - `./slow_seq.sh | grep -P "[3-6]"`
    - `grep -P "[3-6]" <(./slow_seq.sh)`
    - `echo $(./slow_seq.sh) | grep -P "[3-6]"`

1. **Misc**
    - Try running `touch {a,b}{a,b}` then `ls`. What appeared?
    - Sometimes you want to pass STDIN along to STDOUT and still write it to a file. Try running `echo HELLO | tee hello.txt`
    - Try running `cat hello.txt > hello.txt`. What do you expect to happen? What does happen?
    - Run `echo HELLO > hello.txt` and then run `echo WORLD >> hello.txt`.
What are the contents of `hello.txt`? How is `>` different from `>>`? - Run `printf "\e[38;5;81mfoo\e[0m\n"`. How was the output different? If you want to know more, search for ANSI color escape sequences. - Run `touch a.txt` then run `^txt^log` what did bash do for you? In the same vein, run `fc`. What does it do? {% comment %} TODO 1. **parallel** - set -e, set -x - traps {% endcomment %} 1. **Keyboard shortcuts** As with any application you use frequently is worth familiarising yourself with its keyboard shortcuts. Type the following ones and try figuring out what they do and in what scenarios it might be convenient knowing about them. For some of them it might be easier searching online about what they do. (remember that `^X` means pressing `Ctrl+X`) - `^A`, `^E` - `^R` - `^L` - `^C`, `^\` and `^D` - `^U` and `^Y` ================================================ FILE: _2019/version-control.md ================================================ --- layout: lecture title: "Version Control" presenter: Jon video: aspect: 56.25 id: 3fig2Vz8QXs --- Whenever you are working on something that changes over time, it's useful to be able to _track_ those changes. This can be for a number of reasons: it gives you a record of what changed, how to undo it, who changed it, and possibly even why. Version control systems (VCS) give you that ability. They let you _commit_ changes to a set of files, along with a message describing the change, as well as look at and undo changes you've made in the past. Most VCS support sharing the commit history between multiple users. This allows for convenient collaboration: you can see the changes I've made, and I can see the changes you've made. And since the VCS tracks _changes_, it can often (though not always) figure out how to combine our changes as long as they touch relatively disjoint things. 
There are [_a lot_](https://en.wikipedia.org/wiki/Comparison_of_version-control_software) of VCSes out there that differ a lot in what they support, how they function, and how you interact with them. Here, we'll focus on [git](https://git-scm.com/), one of the more commonly used ones, but I recommend you also take a look at [Mercurial](https://www.mercurial-scm.org/).

With that all said -- to the cliffnotes!

## Is git dark magic?

not quite.. you need to understand the data model. we're going to skip over some of the details, but roughly speaking, the _core_ "thing" in git is a commit.

- every commit has a unique name, "revision hash"
  a long hash like `998622294a6c520db718867354bf98348ae3c7e2`
  often shortened to a short (unique-ish) prefix: `9986222`
- commit has author + commit message
- also has the hash of any _ancestor commits_
  usually just the hash of the previous commit
- commit also represents a _diff_, a representation of how you get from the commit's ancestors to the commit (e.g., remove this line in this file, add these lines to this file, rename that file, etc.)
  - in reality, git stores the full before and after state
  - probably don't want to store big files that change!

initially, the _repository_ (roughly: the folder that git manages) has no content, and no commits. let's set that up:

```console
$ git init hackers
$ cd hackers
$ git status
```

the output here actually gives us a good starting point. let's dig in and make sure we understand it all.

first, "On branch master".

- don't want to use hashes all the time.
- branches are names that point to hashes.
- master is traditionally the name for the "latest" commit. every time a new commit is made, the master name will be made to point to the new commit's hash.
- special name `HEAD` refers to "current" name
- you can also make your own names with `git branch` (or `git tag`)
  we'll get back to that

let's skip over "No commits yet" because that's all there is to it.

then, "nothing to commit".

- every commit contains a diff with all the changes you made. but how is that diff constructed in the first place?
- _could_ just always commit _all_ changes you've made since the last commit
  - sometimes you want to only commit some of them (e.g., not `TODO`s)
  - sometimes you want to break up a change into multiple commits to give a separate commit message for each one
- git lets you _stage_ changes to construct a commit
  - add changes to a file or files to the staged changes with `git add`
    - add only some changes in a file with `git add -p`
    - without a path argument, `git add -p` and `git add -u` operate on "all known files"; plain `git add` needs a path
  - remove a file and stage its removal with `git rm`
  - empty the set of staged changes with `git reset`
    - note that this does *not* change any of your files! it *only* means that no changes will be included in a commit
    - to remove only some staged changes: `git reset FILE` or `git reset -p`
  - check staged changes with `git diff --staged`
  - see remaining changes with `git diff`
  - when you're happy with the stage, make a commit with `git commit`
    - if you just want to commit *all* changes: `git commit -a`
  - `git help add` has a bunch more helpful info

while you're playing with the above, try to run `git status` to see what git thinks you're doing -- it's surprisingly helpful!

## A commit you say...

okay, we have a commit, now what?

- we can look at recent changes: `git log` (or `git log --oneline`)
- we can look at the full changes: `git log -p`
- we can show a particular commit: `git show master`
  - or with `-p` for full diff/patch
- we can go back to the state at a commit using `git checkout NAME`
  - if `NAME` is a commit hash, git says we're "detached". this just means there's no `NAME` that refers to this commit, so if we make commits, no-one will know about them.
- we can revert a change with `git revert NAME`
  - applies the diff in the commit at `NAME` in reverse.
- we can compare an older version to this one using `git diff NAME..`
  - `a..b` is a commit _range_.
if either is left out, it means `HEAD`.
- we can show all the commits between using `git log NAME..`
  - `-p` works here too
- we can change `master` to point to a particular commit (effectively undoing everything since) with `git reset NAME`:
  - huh, why? wasn't `reset` to change staged changes? reset has a "second" form (see `git help reset`) which sets `HEAD` to the commit pointed to by the given name.
  - notice that this didn't change any files -- `git diff` now effectively shows `git diff NAME..`.

## What's in a name?

clearly, names are important in git. and they're the key to understanding *a lot* of what goes on in git. so far, we've talked about commit hashes, master, and `HEAD`. but there's more!

- you can make your own branches (like master) with `git branch b`
  - creates a new name, `b`, which points to the commit at `HEAD`
  - you're still "on" master though, so if you make a new commit, master will point to that new commit, `b` will not.
- switch to a branch with `git checkout b`
  - any commits you make will now update the `b` name
  - switch back to master with `git checkout master`
    - all your changes in `b` are hidden away
  - a very handy way to be able to easily test out changes
- tags are other names that never change, and that have their own message. often used to mark releases + changelogs.
- `NAME^` means "the commit before `NAME`"
  - can apply recursively: `NAME^^^`
  - you _most likely_ mean `~` when you use `^`
    - `~` is "temporal", whereas `^` goes by ancestors
    - `~~` is the same as `^^`
    - with `~` you can also write `X~3` for "3 commits older than `X`"
    - you don't want `^3`
  - `git diff HEAD^`
- `-` means "the previous name"
- most commands operate on `HEAD` unless you give another argument

## Clean up your mess

your commit history will _very_ often end up as:

- `add feature x` -- maybe even with a commit message about `x`!
- `forgot to add file` - `fix bug` - `typo` - `typo2` - `actually fix` - `actually actually fix` - `tests pass` - `fix example code` - `typo` - `x` - `x` - `x` - `x` that's _fine_ as far as git is concerned, but is not very helpful to your future self, or to other people who are curious about what has changed. git lets you clean up these things: - `git commit --amend`: fold staged changes into previous commit - note that this _changes_ the previous commit, giving it a new hash! - `git rebase -i HEAD~13` is _magical_. for each commit from past 13, choose what to do: - default is `pick`; do nothing - `r`: change commit message - `e`: change commit (add or remove files) - `s`: combine commit with previous and edit commit message - `f`: "fixup" -- combine commit with previous; discard commit msg - at the end, `HEAD` is made to point to what is now the last commit - often referred to as _squashing_ commits - what it really does: rewind `HEAD` to rebase start point, then re-apply the commits in order as directed. - `git reset --hard NAME`: reset the state of all files to that of `NAME` (or `HEAD` if no name is given). handy for undoing changes. ## Playing with others a common use-case for version control is to allow multiple people to make changes to a set of files without stepping on each other's toes. or rather, to make sure that _if_ they step on each other's toes, they won't just silently overwrite each other's changes. git is a _distributed_ VCS: everyone has a local copy of the entire repository (well, of everything others have chosen to publish). some VCSes are _centralized_ (e.g., subversion): a server has all the commits, clients only have the files they have "checked out". basically, they only have the _current_ files, and need to ask the server if they want anything else. every copy of a git repository can be listed as a "remote". you can copy an existing git repository using `git clone ADDRESS` (instead of `git init`). 
this creates a remote called _origin_ that points to `ADDRESS`. you can fetch names and the commits they point to from a remote with `git fetch REMOTE`. all names at a remote are available to you as `REMOTE/NAME`, and you can use them just like local names. if you have write access to a remote, you can change names at the remote to point to commits you've made using `git push`. for example, let's make the master name (branch) at the remote `origin` point to the commit that our master branch currently points to: - `git push origin master:master` - for convenience, you can set `origin/master` as the default target for when you `git push` from the current branch with `-u` - consider: what does this do? `git push origin master:HEAD^` often you'll use GitHub, GitLab, BitBucket, or something else as your remote. there's nothing "special" about that as far as git is concerned. it's all just names and commits. if someone makes a change to master and updates `github/master` to point to their commit (we'll get back to that in a second), then when you `git fetch github`, you'll be able to see their changes with `git log github/master`. ## Working with others so far, branches seem pretty useless: you can create them, do work on them, but then what? eventually, you'll just make master point to them anyway, right? - what if you had to fix something while working on a big feature? - what if someone else made a change to master in the meantime? inevitably, you will have to _merge_ changes in one branch with changes in another, whether those changes are made by you or someone else. git lets you do this with, unsurprisingly, `git merge NAME`. 
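
a rough sketch of that situation, assuming a throwaway repository in a temporary directory (the branch name `feature`, the file names, and the identity are all made up for illustration):

```bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example"            # placeholder identity, local to this repo
git config user.email "example@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "initial commit"

git checkout -q -b feature                # new name, same commit as master
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "add feature"

git checkout -q master                    # meanwhile, master moves on too
echo "hotfix" > fix.txt
git add fix.txt
git commit -q -m "fix something on master"

git merge -q --no-edit feature            # the two names have diverged, so this makes a merge commit
git --no-pager log --oneline --graph      # the graph shows both lines of history joining
```

since the two branches touched different files, there is nothing to resolve by hand here; both `feature.txt` and `fix.txt` end up in the working directory on master.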
`merge` will:

- look for the latest point where `HEAD` and `NAME` shared a commit ancestor (i.e., where they diverged)
- (try to) apply all the changes made in `NAME` since that point to the current `HEAD`
- produce a commit that contains all those changes, and lists both `HEAD` and `NAME` as its ancestors
- set `HEAD` to that commit's hash

once your big feature has been finished, you can merge its branch into master, and git will ensure that you don't lose any changes from either branch!

if you've used git in the past, you may recognize `merge` by a different name: `pull`. when you do `git pull REMOTE BRANCH`, that is:

- `git fetch REMOTE`
- `git merge REMOTE/BRANCH`
- where, like `push`, `REMOTE` and `BRANCH` are often omitted and use the "tracking" remote branch (remember `-u`?)

this usually works _great_, as long as the changes to the branches being merged are disjoint. if they are not, you get a _merge conflict_. sounds scary...

- a merge conflict is just git telling you that it doesn't know what the final diff should look like
- git pauses and asks you to finish staging the "merge commit"
- open the conflicted file in your editor and look for lots of angle brackets (`<<<<<<<`). the stuff above `=======` is the change made in the `HEAD` since the shared ancestor commit. the stuff below is the change made in the `NAME` since the shared commit.
  - `git mergetool` is pretty handy -- opens a diff editor
- once you've _resolved_ the conflict by figuring out what the file should now look like, stage those changes with `git add`.
- when all the conflicts are resolved, finish with `git commit`
  - you can give up with `git merge --abort`

you've just resolved your first git merge conflict! \o/
now you can publish your finished changes with `git push`

## When worlds collide

when you `push`, git checks that no-one else's work is lost if you update the remote name you're pushing to. it does this by checking that the current commit of the remote name is an ancestor of the commit you are pushing.
if it is, git can safely just update the name; this is called _fast-forwarding_. if it is not, git will refuse to update the remote name, and tell you there have been changes. if your push is rejected, what do you do? - merge remote changes with `git pull` (i.e., `fetch` + `merge`) - force the push with `--force`: this will lose other people's changes! - there's also `--force-with-lease`, which will only force the change if the remote name hasn't changed since the last time you fetched from that remote. much safer! - if you've rebased local commits that you've previously pushed ("history rewriting"; probably don't do this), you'll have to force push. think about why! - try to re-apply your changes "on top of" the changes made remotely - this is a `rebase`! - rewind all local commits since shared ancestor - fast-forward `HEAD` to commit at remote name - apply local commits in-order - may have conflicts you have to manually resolve - `git rebase --continue` or `--abort` - lots more [here](https://git-scm.com/book/en/v2/Git-Branching-Rebasing) - `git pull --rebase` will start this process for you - whether you should merge or rebase is a hot topic! some good reads: - [this](https://www.atlassian.com/git/tutorials/merging-vs-rebasing) - [this](http://web.archive.org/web/20210106220723/https://derekgourlay.com/blog/git-when-to-merge-vs-when-to-rebase/) - [this](https://stackoverflow.com/questions/804115/when-do-you-use-git-rebase-instead-of-git-merge) # Further reading [](https://xkcd.com/1597/) - [Learn git branching](https://learngitbranching.js.org/) - [How to explain git in simple words](https://smusamashah.github.io/blog/2017/10/14/explain-git-in-simple-words) - [Git from the bottom up](https://jwiegley.github.io/git-from-the-bottom-up/) - [Git for computer scientists](http://eagain.net/articles/git-for-computer-scientists/) - [Oh shit, git!](https://ohshitgit.com/) - [The Pro Git book](https://git-scm.com/book/en/v2) # Exercises 1. 
On a repo try modifying an existing file. What happens when you do `git stash`? What do you see when running `git log --all --oneline`? Run `git stash pop` to undo what you did with `git stash`. In what scenario might this be useful? 1. One common mistake when learning git is to commit large files that should not be managed by git or adding sensitive information. Try adding a file to a repository, making some commits and then deleting that file from history (you may want to look at [this](https://help.github.com/articles/removing-sensitive-data-from-a-repository/)). Also if you do want git to manage large files for you, look into [Git-LFS](https://git-lfs.github.com/) 1. Git is really convenient for undoing changes but one has to be familiar even with the most unlikely changes 1. If a file is mistakenly modified in some commit it can be reverted with `git revert`. However if a commit involves several changes `revert` might not be the best option. How can we use `git checkout` to recover a file version from a specific commit? 1. Create a branch, make a commit in said branch and then delete it. Can you still recover said commit? Try looking into `git reflog`. (Note: Recover dangling things quickly, git will periodically automatically clean up commits that nothing points to.) 1. If one is too trigger happy with `git reset --hard` instead of `git reset` changes can be easily lost. However since the changes were staged, we can recover them. (look into `git fsck --lost-found` and `.git/lost-found`) 1. In any git repo look under the folder `.git/hooks` you will find a bunch of scripts that end with `.sample`. If you rename them without the `.sample` they will run based on their name. For instance `pre-commit` will execute before doing a commit. Experiment with them 1. Like many command line tools `git` provides a configuration file (or dotfile) called `~/.gitconfig` . 
Create an alias using `~/.gitconfig` so that when you run `git graph` you get the output of `git log --oneline --decorate --all --graph` (this is a good command to quickly visualize the commit graph).
1. Git also lets you define global ignore patterns in a file such as `~/.gitignore_global` (point git at it with `git config --global core.excludesFile ~/.gitignore_global`). This is useful to prevent common errors like adding RSA keys. Create a `~/.gitignore_global` file and add the pattern `*rsa`, then test that it works in a repo.
1. Once you start to get more familiar with `git`, you will find yourself running into common tasks, such as editing your `.gitignore`. [git extras](https://github.com/tj/git-extras/blob/master/Commands.md) provides a bunch of little utilities that integrate with `git`. For example, `git ignore PATTERN` will add the specified pattern to the `.gitignore` file in your repo and `git ignore-io LANGUAGE` will fetch the common ignore patterns for that language from [gitignore.io](https://www.gitignore.io). Install `git extras` and try using some tools like `git alias` or `git ignore`.
1. Git GUI programs can be a great resource sometimes. Try running [gitk](https://git-scm.com/docs/gitk) in a git repo and explore the different parts of the interface. Then run `gitk --all`; what are the differences?
1. Once you get used to command line applications, GUI tools can feel cumbersome/bloated. A nice compromise between the two are ncurses-based tools, which can be navigated from the command line and still provide an interactive interface. Git has [tig](https://github.com/jonas/tig); try installing it and running it in a repo. You can find some usage examples [here](https://www.atlassian.com/blog/git/git-tig).
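
As a footnote to the push check described under "When worlds collide", here is a minimal Python sketch of the idea: a push can be fast-forwarded only if the remote's current commit is reachable from the commit being pushed. The commit graph, the hashes, and the function names are all made up for illustration; this is not how git stores or walks its history internally.

```python
# Toy commit graph: each (made-up) commit hash maps to its parent hashes.
PARENTS = {
    "a1": [],
    "b2": ["a1"],
    "c3": ["b2"],        # our local work on top of b2
    "d4": ["b2"],        # someone else's work on top of b2
    "m5": ["c3", "d4"],  # merge commit with two parents
}


def is_ancestor(ancestor, commit, parents=PARENTS):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents[current])
    return False


def can_fast_forward(remote_tip, local_tip):
    """A push is a fast-forward when the remote tip is an ancestor of ours."""
    return is_ancestor(remote_tip, local_tip)


print(can_fast_forward("b2", "c3"))  # remote has not moved since we branched
print(can_fast_forward("d4", "c3"))  # remote gained d4, which we do not have
print(can_fast_forward("d4", "m5"))  # after merging d4 into our work
```

In this toy history the second case is the rejected push, and the third shows why merging (or rebasing) first makes the push acceptable again.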
{% comment %} - forced push + `--force-with-lease` - git merge/rebase --abort - git blame - exercise about why rebasing public commits is bad {% endcomment %} ================================================ FILE: _2019/virtual-machines.md ================================================ --- layout: lecture title: "Virtual Machines and Containers" presenter: Anish, Jon video: aspect: 56.25 id: LJ9ki5zq6Ik --- # Virtual Machines Virtual machines are simulated computers. You can configure a guest virtual machine with some operating system and configuration and use it without affecting your host environment. For this class, you can use VMs to experiment with operating systems, software, and configurations without risk: you won't affect your primary development environment. In general, VMs have lots of uses. They are commonly used for running software that only runs on a certain operating system (e.g. using a Windows VM on Linux to run Windows-specific software). They are often used for experimenting with potentially malicious software. ## Useful features - **Isolation**: hypervisors do a pretty good job of isolating the guest from the host, so you can use VMs to run buggy or untrusted software reasonably safely. - **Snapshots**: you can take "snapshots" of your virtual machine, capturing the entire machine state (disk, memory, etc.), make changes to your machine, and then restore to an earlier state. This is useful for testing out potentially destructive actions, among other things. ## Disadvantages Virtual machines are generally slower than running on bare metal, so they may be unsuitable for certain applications. ## Setup - **Resources**: shared with host machine; be aware of this when allocating physical resources. - **Networking**: many options, default NAT should work fine for most use cases. - **Guest addons**: many hypervisors can install software in the guest to enable nicer integration with host system. You should use this if you can. 
## Resources

- Hypervisors
    - [VirtualBox](https://www.virtualbox.org/) (open-source)
    - [Virt-manager](https://virt-manager.org/) (open-source, manages KVM virtual machines and LXC containers)
    - [VMware](https://www.vmware.com/) (commercial, available from IS&T [for MIT students](https://ist.mit.edu/vmware-fusion))

If you are already familiar with popular hypervisors/VMs, you may want to learn how to manage them in a command-line-friendly way. One option is the [libvirt](https://wiki.libvirt.org/page/UbuntuKVMWalkthrough) toolkit, which allows you to manage multiple different virtualization providers/hypervisors.

## Exercises

1. Download and install a hypervisor.
1. Create a new virtual machine and install a Linux distribution (e.g. [Debian](https://www.debian.org/)).
1. Experiment with snapshots. Try things that you've always wanted to try, like running `sudo rm -rf --no-preserve-root /`, and see if you can recover easily.
1. Read what a [fork-bomb](https://en.wikipedia.org/wiki/Fork_bomb) (`:(){ :|:& };:`) is and run it on the VM to see that the resource isolation (CPU, Memory, &c) works.
1. Install guest addons and experiment with different windowing modes, file sharing, and other features.

# Containers

Virtual Machines are relatively heavy-weight; what if you want to spin up machines in an automated fashion? Enter containers!

- Amazon Firecracker
- Docker
- rkt
- lxc

Containers are _mostly_ just an assembly of various Linux security features, like virtual file systems, virtual network interfaces, chroots, virtual memory tricks, and the like, that together give the appearance of virtualization.

Not quite as secure or isolated as a VM, but pretty close and getting better. Usually higher performance, and much faster to start, but not always.

The performance boost comes from the fact that unlike VMs, which run an entire copy of the operating system, containers share the Linux kernel with the host.
However, note that if you are running Linux containers on Windows/macOS, a Linux VM will need to be active as a middle layer between the two.

_Comparison between Docker containers and Virtual Machines. Credit: blog.docker.com_

Containers are handy for when you want to run an automated task in a standardized setup:

- Build systems
- Development environments
- Pre-packaged servers
- Running untrusted programs
    - Grading student submissions
    - (Some) cloud computing
- Continuous integration
    - Travis CI
    - GitHub Actions

Moreover, container software like Docker has also been extensively used as a solution for [dependency hell](https://en.wikipedia.org/wiki/Dependency_hell). If a machine needs to be running many services with conflicting dependencies, they can be isolated using containers.

Usually, you write a file that defines how to construct your container. You start with some minimal _base image_ (like Alpine Linux), and then a list of commands to run to set up the environment you want (install packages, copy files, build stuff, write config files, etc.). Normally, there's also a way to specify any external ports that should be available, and an _entrypoint_ that dictates what command should be run when the container is started (like a grading script).

In a similar fashion to code repository websites (like [GitHub](https://github.com/)), there are container repository websites (like [DockerHub](https://hub.docker.com/)) where many software services have prebuilt images that one can easily deploy.

## Exercises

1. Choose a container software (Docker, LXC, …) and install a simple Linux image. Try SSHing into it.
1.
Search and download a prebuilt container image for a popular web server (nginx, apache, …)

================================================
FILE: _2019/web.md
================================================
---
layout: lecture
title: "Web and Browsers"
presenter: Jose
video:
  aspect: 62.5
  id: XpZO3S8odec
---

Apart from the terminal, the web browser is a tool you will find yourself spending significant amounts of time in. Thus it is worth learning how to use it efficiently.

## Shortcuts

Clicking around in your browser is often not the fastest option; getting familiar with common shortcuts can really pay off in the long run.

- `Middle Button Click` on a link opens it in a new tab
- `Ctrl+T` opens a new tab
- `Ctrl+Shift+T` reopens a recently closed tab
- `Ctrl+L` selects the contents of the search bar
- `Ctrl+F` searches within a webpage. If you do this often, you may benefit from an extension that supports regular expressions in searches.

## Search operators

Web search engines like Google or DuckDuckGo provide search operators to enable more elaborate web searches:

- `"bar foo"` enforces an exact match of bar foo
- `foo site:bar.com` searches for foo within bar.com
- `foo -bar` excludes results containing bar from the search
- `foobar filetype:pdf` searches for files of that extension
- `(foo|bar)` searches for matches that have foo OR bar

More thorough lists are available for popular engines like [Google](https://ahrefs.com/blog/google-advanced-search-operators/) and [DuckDuckGo](https://duck.co/help/results/syntax)

## Searchbar

The searchbar is a powerful tool too. Most browsers can infer search engines from websites and will store them. By editing the keyword assigned to each stored engine, you can search that site straight from the searchbar.

- In Google Chrome they are in [chrome://settings/searchEngines](chrome://settings/searchEngines)
- In Firefox they are in [about:preferences#search](about:preferences#search)

For example, you can make it so that typing `y SOME SEARCH TERMS` searches directly in YouTube.
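
To make the keyword mechanism concrete, here is a minimal Python sketch of roughly what happens when you type `y SOME SEARCH TERMS`: the first word is looked up as a keyword, the rest is URL-encoded, and the result is substituted into a URL template. The `ENGINES` table and the `expand` function are hypothetical names for illustration, and the two URL templates are assumptions about how those sites build their search URLs, not something read out of a browser.

```python
from urllib.parse import quote_plus

# Hypothetical keyword table; browsers store something similar, with %s
# standing in for the search terms.
ENGINES = {
    "y": "https://www.youtube.com/results?search_query=%s",
    "d": "https://duckduckgo.com/?q=%s",
}


def expand(typed: str, default: str = "d") -> str:
    """Turn searchbar input into the URL a keyword search would open."""
    keyword, _, rest = typed.partition(" ")
    if keyword in ENGINES and rest:
        template, terms = ENGINES[keyword], rest
    else:
        # No known keyword: send the whole input to the default engine.
        template, terms = ENGINES[default], typed
    return template.replace("%s", quote_plus(terms))


print(expand("y vim tutorial"))
```

Input that does not start with a known keyword falls through to the default engine, which mirrors how an ordinary search from the searchbar behaves.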
Moreover, if you own a domain you can set up subdomain forwards using your registrar. For instance, I have mapped `https://ht.josejg.com` to this course website. That way I can just type `ht.` and the searchbar will autocomplete. Another good feature of this setup is that, unlike bookmarks, it works in every browser.

## Privacy extensions

Nowadays surfing the web can get quite annoying due to ads and invasive due to trackers. Moreover, a good adblocker not only blocks most ad content but also blocks sketchy and malicious websites, since they are included in the common blacklists. Adblockers can also reduce page load times by reducing the number of requests performed. A couple of recommendations are:

- **uBlock origin** ([Chrome](https://chrome.google.com/webstore/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm), [Firefox](https://addons.mozilla.org/en-US/firefox/addon/ublock-origin/)): blocks ads and trackers based on predefined rules. You should also consider taking a look at the enabled blacklists in settings, since you can enable more based on your region or browsing habits. You can even install filters from [around the web](https://github.com/gorhill/uBlock/wiki/Filter-lists-from-around-the-web)
- **[Privacy Badger](https://privacybadger.org/)**: detects and blocks trackers automatically. For example, when you go from website to website, ad companies track which sites you visit and build a profile of you
- **[HTTPS everywhere](https://www.eff.org/https-everywhere)** is a wonderful extension that redirects to the HTTPS version of a website automatically, if available.

You can find more addons of this kind [here](https://www.privacytools.io/privacy-browser-addons/)

## Style customization

Web browsers are just another piece of software running on _your machine_, and thus you usually have the last say about what they should display or how they should behave. One example of this is custom styles.
Browsers determine how to render the style of a webpage using Cascading Style Sheets, often abbreviated as CSS.

You can access the source code of a website by inspecting it, and change its contents and styles temporarily (this is also a reason why you should never trust webpage screenshots).

If you want to permanently tell your browser to override the style settings for a webpage you will need to use an extension. Our recommendation is **[Stylus](https://github.com/openstyles/stylus)** ([Firefox](https://addons.mozilla.org/en-US/firefox/addon/styl-us/), [Chrome](https://chrome.google.com/webstore/detail/stylus/clngdbkpkpeebahjckkjfobafhncgmne?hl=en)).

For example, we can write the following style for the class website

```css
body {
    background-color: #2d2d2d;
    color: #eee;
    font-family: "Fira Code";
    font-size: 16pt;
}

a:link {
    text-decoration: none;
    color: #0a0;
}
```

Moreover, Stylus can find styles written by other users and published in [userstyles.org](https://userstyles.org/). Most common websites have one or several dark theme stylesheets, for instance. FYI, you should not use Stylish since it was shown to leak user data, more [here](https://arstechnica.com/information-technology/2018/07/stylish-extension-with-2m-downloads-banished-for-tracking-every-site-visit/)

## Functionality Customization

In the same way that you can modify the style, you can also modify the behaviour of a website by writing custom javascript and then loading it using a web browser extension such as [Tampermonkey](https://tampermonkey.net/)

For example, the following script enables vim-like navigation using the J and K keys.

```js
// ==UserScript==
// @name         VIM HT
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  Vim JK for our website
// @author       You
// @match        https://hacker-tools.github.io/*
// @grant        none
// ==/UserScript==

(function() {
    'use strict';

    window.onkeyup = function(e) {
        var key = e.keyCode ?
            e.keyCode : e.which;

        if (key == 74) { // J is key 74
            window.scrollBy(0, 500);
        } else if (key == 75) { // K is key 75
            window.scrollBy(0, -500);
        }
    }
})();
```

There are also script repositories such as [OpenUserJS](https://openuserjs.org/) and [Greasy Fork](https://greasyfork.org/en). However, be warned: installing user scripts from others can be very dangerous since they can do pretty much anything, such as steal your credit card numbers. Never install a script unless you read the whole thing yourself, understand what it does, and are absolutely sure that it isn't doing anything suspicious. Never install a script that contains minified or obfuscated code that you can't read!

## Web APIs

It has become more and more common for web services to offer an application programming interface, aka a web API, so you can interact with the service by making web requests. A more in-depth introduction to the topic can be found [here](https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Client-side_web_APIs/Introduction).

There are [many public APIs](https://github.com/toddmotto/public-apis). Web APIs can be useful for many reasons:

- **Retrieval**. Web APIs can quite easily provide you with information such as maps, weather or your public IP address. For instance, `curl ipinfo.io` will return a JSON object with some details about your public IP, region, location, &c. With proper parsing these tools can even be integrated with command line tools. The following bash function talks to Google's autocompletion API and returns the first ten matches.

```bash
function c() {
    url='https://www.google.com/complete/search?client=hp&hl=en&xhr=t'
    # NB: user-agent must be specified to get back UTF-8 data!
    curl -H 'user-agent: Mozilla/5.0' -sSG --data-urlencode "q=$*" "$url" |
        jq -r ".[1][][0]" |
        sed 's,</\?b>,,g'  # strip the <b>...</b> highlighting tags
}
```

- **Interaction**. Web API endpoints can also be used to trigger actions. These usually require some sort of authentication token that you can obtain through the service.
For example, running `curl -X POST -H 'Content-type: application/json' --data '{"text":"Hello, World!"}' "https://hooks.slack.com/services/$SLACK_TOKEN"` will send a `Hello, World!` message to a channel.
- **Piping**. Since some services with web APIs are rather popular, common web API "gluing" has already been implemented and is offered as a hosted service. This is the case for services like [If This Then That](https://ifttt.com/) and [Zapier](https://zapier.com/)

## Web Automation

Sometimes web APIs are not enough. If only reading is needed you can use an HTML parser like `pup` or a library; for example, Python has BeautifulSoup. However, if interactivity or javascript execution is required, those solutions fall short. In that case you need a WebDriver tool such as Selenium, which drives a real browser.

For example, the following script will save the specified url using the wayback machine, simulating the interaction of typing in the website.

```python
from selenium.webdriver import Firefox
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys


def snapshot_wayback(driver, url):
    driver.get("https://web.archive.org/")
    # Class name of the "save page" input as given in the lecture;
    # the site's markup may have changed since.
    elem = driver.find_element(By.CLASS_NAME, 'web-save-url-input')
    elem.clear()
    elem.send_keys(url)
    elem.send_keys(Keys.RETURN)
    driver.close()


driver = Firefox()
url = 'https://hacker-tools.github.io'
snapshot_wayback(driver, url)
```

## Exercises

1. Edit a keyword search engine that you use often in your web browser
1. Install the mentioned extensions. Look into how uBlock Origin/Privacy Badger can be disabled for a website. What differences do you see? Try doing it on a website with plenty of ads like YouTube.
1. Install Stylus and write a custom style for the class website using the CSS provided. Here are some common programming characters `= == === >= => ++ /= ~=`. What happens to them when changing the font to Fira Code? If you want to know more, search for programming font ligatures.
1. Find a web API to get the weather in your city/area.
1.
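Use WebDriver software like [Selenium](https://docs.seleniumhq.org/) to automate some repetitive manual task that you perform often with your browser.

================================================
FILE: _2020/command-line.md
================================================
---
layout: lecture
title: "Command-line Environment"
date: 2020-01-21
ready: true
sync: true
syncdate: 2025-08-16
video:
  aspect: 56.25
  id: e8BO_dYxk5c
solution:
  ready: true
  url: command-line-solution
---

There are several ways to improve your workflow when working in the shell, and this lecture goes through them.

We have been using the shell for a while now, but so far we have mainly focused on running different commands. We will now see how to run several processes at the same time while keeping track of them, how to stop or pause a specific process, and how to make a process run in the background.

We will also learn about ways to improve your workflow in the shell and in other tools, mainly by defining aliases and by configuring them through dotfiles. Both can save you a lot of time; for example, a few simple commands are enough to use the same configuration on all your machines. We will also look at how to work with remote machines using SSH.

# Job Control

In some cases you will need to interrupt a job while it is executing, for instance when a command takes a long time to complete (say, a `find` over a very large directory structure). Most of the time you can press `Ctrl-C` and the command will stop. But how does this actually work, and why does it sometimes fail to stop the process?

## Killing a process

Your shell uses a UNIX communication mechanism called a _signal_ to communicate with processes. When a process receives a signal it stops its execution, deals with the signal, and potentially changes its flow of execution based on the information the signal delivered. In this sense, signals are _software interrupts_.

In our case, typing `Ctrl-C` prompts the shell to deliver a `SIGINT` signal to the process.

The Python program below shows the basics of capturing `SIGINT` and ignoring it, so the program no longer stops. To stop it we need to use the `SIGQUIT` signal instead, which is sent by typing `Ctrl-\`.

```python
#!/usr/bin/env python
import signal, time

def handler(signum, frame):
    # The second argument is the current stack frame; it is unused here.
    print("\nI got a SIGINT, but I am not stopping")

signal.signal(signal.SIGINT, handler)
i = 0
while True:
    time.sleep(.1)
    print("\r{}".format(i), end="")
    i += 1
```

What happens if we send `SIGINT` to this program twice, followed by `SIGQUIT`? Note that `^` is how `Ctrl` is displayed when typed in the terminal:

```
$ python sigint.py
24^C
I got a SIGINT, but I am not stopping
26^C
I got a SIGINT, but I am not stopping
30^\[1]    39913 quit       python sigint.py
```

While `SIGINT` and `SIGQUIT` are both commonly used to request that a program terminate, `SIGTERM` is a more generic signal for asking a process to exit gracefully. To send this signal we use the [`kill`](https://www.man7.org/linux/man-pages/man1/kill.1.html) command, with the syntax `kill -TERM <PID>`.

You can also access last year's lecture notes and videos.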
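================================================
FILE: _2020/metaprogramming.md
================================================
---
layout: lecture
title: "Metaprogramming"
details: Build systems, dependency management, testing, CI
date: 2020-01-27
ready: true
sync: true
syncdate: 2025-08-16
video:
  aspect: 56.25
  id: _Ms1Z4xfqv4
solution:
  ready: true
  url: metaprogramming-solution
---

What do we mean by "metaprogramming"? Well, it was the best collective term we could come up with for the things covered here, which are more about *process* than about writing code or working more efficiently. In this lecture, we will look at build systems, code testing, and dependency management. These may seem of limited importance while you are a student, but the moment you start an internship or enter the "real world" and meet a larger code base, you will see them everywhere. We should note that "metaprogramming" can also mean "[programs that operate on programs](https://en.wikipedia.org/wiki/Metaprogramming)", which is a completely different concept from what this lecture covers.

# Build systems

If you write a paper in LaTeX, what commands do you need to run to produce your paper? What about the ones that run your benchmarks, plot them, and insert the plot into the paper? Or the ones that compile the code provided in this class and run its tests?

Most projects, whether they contain code or not, have a "build process": some sequence of operations you need to run to go from your inputs to your outputs. Often that process has many steps and many branches. Run this to generate the plot, that to generate the results, and something else to produce the final paper. There is a lot to keep track of, and you are not the first to be annoyed by it; luckily, there are many tools that can help.

These tools are usually called "build systems", and there are quite a few of them. Which one you use depends on the task at hand and the size of the project. At their core, they are all very similar. You define a number of *dependencies*, a number of *targets*, and *rules* for going from one to the other. You tell the build system which target you want, and its job is to find all the dependencies of that target and apply the rules to produce intermediate targets until the final target has been built. Ideally, the build system does not execute rules for targets whose dependencies have not changed and whose results are available from a previous build.

`make` is one of the most common build systems, and you will find it installed on pretty much any UNIX-based computer. It is not perfect, but it works well enough for small and medium projects. When you run `make`, it consults a file called `Makefile` in the current directory. All the targets, their dependencies, and the rules are defined in that file, which looks like this:

```make
paper.pdf: paper.tex plot-data.png
	pdflatex paper.tex

plot-%.png: %.dat plot.py
	./plot.py -i $*.dat -o $@
```

Each directive in this file is a rule for how to produce the left-hand side from the right-hand side. Put differently, the things to the left of the colon are targets, and the things to its right are the dependencies needed to build them. The indented block is the sequence of commands that produces the target from those dependencies. In `make`, the first directive also defines the default goal: if you run `make` with no arguments, this is the target it will build. Alternatively, you can build another target with a command such as `make plot-data.png`.

The `%` in a rule is a pattern, and it matches the same string on the left and on the right. For example, if the target is `plot-foo.png`, `make` will look for `foo.dat` and `plot.py` as dependencies. Now let's see what happens if we run `make` in an empty source directory.

```console
$ make
make: *** No rule to make target 'paper.tex', needed by 'paper.pdf'.  Stop.
```

`make` is telling us that in order to build `paper.pdf` it needs `paper.tex`, and it has no rule telling it how to make that file. Let's create it!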
```console
$ touch paper.tex
$ make
make: *** No rule to make target 'plot-data.png', needed by 'paper.pdf'.  Stop.
```

Hmm, interesting: there **is** a rule to make `plot-data.png`, but it is a pattern rule. Since the source file `data.dat` does not exist, `make` tells you that it cannot make `plot-data.png`. Let's create all the files:

```console
$ cat paper.tex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
\includegraphics[scale=0.65]{plot-data.png}
\end{document}
$ cat plot.py
#!/usr/bin/env python
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-i', type=argparse.FileType('r'))
parser.add_argument('-o')
args = parser.parse_args()

data = np.loadtxt(args.i)
plt.plot(data[:, 0], data[:, 1])
plt.savefig(args.o)
$ cat data.dat
1 1
2 2
3 3
4 4
5 8
```

What happens when we run `make` now?

```console
$ make
./plot.py -i data.dat -o plot-data.png
pdflatex paper.tex
... lots of output ...
```

And look, a PDF!
What if we run `make` again?

```console
$ make
make: 'paper.pdf' is up to date.
```

It didn't do anything! Why not? Well, because it didn't need to. `make` checked that every previously built target is still up to date with respect to its listed dependencies. Let's try modifying `paper.tex` and then re-running `make`:

```console
$ vim paper.tex
$ make
pdflatex paper.tex
...
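```

Notice that `make` did **not** re-run `plot.py`, because that was not necessary; none of `plot-data.png`'s dependencies changed.

# Dependency management

The dependencies of your project may themselves be other projects. You might depend on installed programs (like `python`), system packages (like `openssl`), or libraries within your programming language (like `matplotlib`). These days, most dependencies are available through a **repository** that hosts a large number of them in a single place and provides a simple mechanism for installing them. Examples include the Ubuntu package repositories, which you access through the `apt` tool, RubyGems for Ruby libraries, PyPI for Python libraries, and the Arch User Repository for packages contributed by Arch Linux users.

Since each repository and each tool works a little differently, we won't go into the details of any specific one in this lecture. What we will cover is some of the common terminology, such as *versioning*. Most projects that other projects depend on issue a *version number* with every release, usually something like 8.1.3 or 64.1.20192004. Version numbers are often, but not always, numerical. They serve many purposes, and one of the most important is to keep software working. Imagine that I release a new version of my library in which I have renamed a function. If someone tries to build software that depends on my library after that release, the build may well fail because the function it calls no longer exists. Versioning addresses this by letting a project say that it depends on a particular version, or a range of versions, of some other project. That way, even if the underlying library changes, dependent software can keep building against an older version of it.

That isn't ideal either! What if I issue a security update that does *not* change the public interface of my library (its "API"), and which every project that depends on it should upgrade to immediately? This is where the different groups of numbers in a version come in. The exact meaning of each one varies between projects, but one relatively common standard is [semantic versioning](https://semver.org/). With semantic versioning, every version number has the form major.minor.patch. The rules are:

 - If a new release does not change the API, increase the patch version.
 - If you *add* to your API in a backwards-compatible way, increase the minor version.
 - If you change the API in a backwards-incompatible way, increase the major version.

This has major advantages. If my project depends on your project, it *should* be safe to use the latest release with the same major version as the one I built against, as long as its minor version is at least what it was back then. In other words, if I depend on your version `1.3.7`, then it should be fine to build with `1.3.8`, `1.6.1`, or even `1.3.0`. Version `2.2.4` would probably not be okay, because the major version was increased. Python's version numbers are an example of semantic versioning. You probably know that Python 2 and Python 3 code are not compatible, which is why that was a *major* version bump. Similarly, code written for Python 3.5 will run on Python 3.7, but possibly not on 3.4.

When working with dependency management systems, you may also come across the notion of _lock files_. A lock file lists the exact version you are currently using for each dependency. Usually, you need to explicitly run an update program to upgrade to newer versions of your dependencies. There are many reasons for this, such as avoiding unnecessary recompiles, having reproducible builds, or not automatically updating to the latest version (which may be broken). An extreme version of this kind of dependency locking is _vendoring_, where you copy all the code of your dependencies into your own project. That gives you total control over any changes to it, and lets you introduce your own changes, but it also means you have to pull in updates from the upstream maintainers yourself.

# Continuous integration systems

As you work on larger and larger projects, you will find that there are often additional tasks to do whenever you change the code. You might have to upload a new version of the documentation, upload a compiled version somewhere, release the code to PyPI, run your test suite, and so on. Maybe every time someone pushes code to GitHub, you want their code to be checked for style and to have some benchmarks run? When you have needs like these, it is time to take a look at continuous integration.

Continuous integration, or CI, is an umbrella term (a term that covers a group of terms) for "stuff that runs automatically whenever your code changes", and there are many companies that provide various types of CI, often free for open-source projects. Some of the big ones are Travis CI, Azure Pipelines, and GitHub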
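Actions. They all work in roughly the same way: you add a file to your repository that describes what should happen when the repository changes. By far the most common rule is: when someone pushes code, run the test suite. When the event triggers, the CI provider spins up a virtual machine (or several), runs the commands in your rule, and usually records the results. You can set it up so that you are notified when the test suite fails, or so that a badge appears on your repository page as long as all the tests pass.

The website for this course, which is built with GitHub Pages, is a good example. Pages runs the Jekyll blog software on every push to `master` and makes the built site available on a GitHub domain. This makes updating the website trivial for us: we just make our changes locally, commit them with git, and push. CI takes care of the rest.

## A brief aside on testing

Most large software projects come with a "test suite". You may already be familiar with the general concept of testing, but we thought we'd quickly mention some approaches to testing and testing terminology that you may encounter:

 - Test suite: a collective term for all the tests.
 - Unit test: a "micro-test" that tests a specific feature in isolation.
 - Integration test: a "macro-test" that runs a larger part of the system to check that different features or components work *together*.
 - Regression test: a test that implements a particular pattern that _previously_ caused a bug, to ensure that the bug does not resurface.
 - Mocking: replacing a function, module, or type with a fake implementation to avoid testing unrelated functionality. For example, you might "mock the network" or "mock the disk".

# Exercises

[Solutions]({{site.url}}/{{site.solution_url}}/{{page.solution.url}})

 1. Most makefiles provide a target called `clean`. This isn't intended to produce a file called `clean`, but instead to clean up files so that make can rebuild them. Think of it as a way to "undo" all of the build steps. Implement a `clean` target for the `paper.pdf` makefile above. You will have to make the target [phony](https://www.gnu.org/software/make/manual/html_node/Phony-Targets.html). You may find the [`git ls-files`](https://git-scm.com/docs/git-ls-files) subcommand useful. A number of other useful make targets are listed [here](https://www.gnu.org/software/make/manual/html_node/Standard-Targets.html#Standard-Targets).
 2. There are many ways to specify version requirements; take a look at dependency management in [Rust's build system](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html). Most package repositories support similar syntax. For each kind of requirement (caret, tilde, wildcard, comparison, and multiple), come up with a use-case in which it makes sense.
 3. Git can act as a simple CI system all by itself. In `.git/hooks` inside any git repository, you will find (currently inactive) files that are run as scripts when a particular action happens. Write a [`pre-commit`](https://git-scm.com/docs/githooks#_pre_commit) hook that runs `make paper.pdf` before a commit and refuses the commit if the build fails. This should prevent any commit from containing an unbuildable version of the paper.
 4. Set up a simple auto-published page using [GitHub Pages](https://pages.github.com/). Add a [GitHub Action](https://github.com/features/actions) to the repository to run `shellcheck` on any shell files in that repository (here is [one way to do it](https://github.com/marketplace/actions/shellcheck)).
 5.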
[Build your own](https://help.github.com/en/actions/automating-your-workflow-with-github-actions/building-actions) GitHub action to run [`proselint`](http://proselint.com/) or [`write-good`](https://github.com/btford/write-good) on all the `.md` files in the repository. Enable it in your repository, and commit a file containing an error to check that it works.

================================================
FILE: _2020/potpourri.md
================================================
---
layout: lecture
title: "Potpourri"
date: 2020-01-29
ready: true
sync: true
syncdate: 2025-08-16
video:
  aspect: 56.25
  id: JZDt-PRq0uo
---

## Table of contents

- [Keyboard remapping](#keyboard-remapping)
- [Daemons](#daemons)
- [FUSE](#fuse)
- [Backups](#backups)
- [APIs](#apis)
- [Common command-line flags and patterns](#common-command-line-flags-and-patterns)
- [Window managers](#window-managers)
- [VPN](#vpn)
- [Markdown](#markdown)
- [Hammerspoon (desktop automation on macOS)](#hammerspoon-desktop-automation-on-macos)
    - [Resources](#resources)
- [Booting and Live USBs](#%E5%BC%80%E6%9C%BA%E5%BC%95%E5%AF%BC%E4%BB%A5%E5%8F%8A%20Live%20USB)
- [Docker, Vagrant, VMs, Cloud, OpenStack](#docker-vagrant-vms-cloud-openstack)
- [Notebook programming](#%E4%BA%A4%E4%BA%92%E5%BC%8F%E8%AE%B0%E4%BA%8B%E6%9C%AC%E7%BC%96%E7%A8%8B)
- [GitHub](#github)

## Keyboard remapping

As a programmer, your keyboard is your main input device. Like everything else in your computer it is configurable, and it is worth spending time on.

A very common change is to remap keys. This is usually done by software running on the computer: when a key is pressed, the software intercepts the keypress event and replaces it with another event. For example:

- Remap Caps Lock to Ctrl or Escape. Caps Lock occupies a very convenient spot on the keyboard but is rarely used, so we (the instructors) highly recommend this change.
- Remap PrtSc to Play/Pause. Most operating systems support a play/pause key.
- Swap Ctrl and the Meta key (the Windows logo key, or Command on a Mac).

You can also map keys to arbitrary commands of your choosing. The software listens for a specific key combination and then runs the configured script:

- Open a new terminal or browser window.
- Insert some specific text, e.g. a very long email address or your MIT ID.
- Put the computer or the displays to sleep.

Even more complex modifications are possible:

- Remapping sequences of keys, e.g. pressing Shift five times toggles Caps Lock.
- Remapping tap and hold differently, e.g. Caps Lock maps to Escape when tapped and to Ctrl when held.
- Keeping dedicated mappings for different keyboards or different software.
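
The tap-versus-hold remap above boils down to a small decision made when the key is released. The Python sketch below shows only that decision; the function name, its parameters, and the 200 ms threshold are assumptions chosen for illustration. Real remapping tools make this decision inside their own event handling and expose it through configuration rather than code like this.

```python
def caps_lock_action(held_ms: int, used_as_modifier: bool,
                     tap_threshold_ms: int = 200) -> str:
    """Decide what a dual-role Caps Lock key should count as on release.

    held_ms: how long the key was held down, in milliseconds.
    used_as_modifier: True if another key was pressed while it was held.
    """
    if used_as_modifier or held_ms >= tap_threshold_ms:
        return "Ctrl"    # held, or combined with another key
    return "Escape"      # quick tap on its own


print(caps_lock_action(80, used_as_modifier=False))
print(caps_lock_action(500, used_as_modifier=False))
print(caps_lock_action(80, used_as_modifier=True))
```

Treating "another key was pressed meanwhile" as a hold, regardless of timing, is one possible design choice; it keeps fast `Ctrl` chords from being misread as taps.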
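Here are some software resources for keyboard remapping:

- macOS - [karabiner-elements](https://pqrs.org/osx/karabiner/), [skhd](https://github.com/koekeishiya/skhd) or [BetterTouchTool](https://folivora.ai/)
- Linux - [xmodmap](https://wiki.archlinux.org/index.php/Xmodmap) or [Autokey](https://github.com/autokey/autokey)
- Windows - the Control Panel, [AutoHotkey](https://www.autohotkey.com/) or [SharpKeys](https://www.randyrants.com/category/sharpkeys/)
- QMK - if your keyboard supports custom firmware, [QMK](https://docs.qmk.fm/) can remap keys directly in the keyboard's hardware. Because the mapping lives in the keyboard, you do not have to repeat the configuration on other machines.

## Daemons

Even if the word daemon looks unfamiliar, you probably already have a rough idea of the concept. Most computers have a series of processes that keep running in the background without the user starting them or interacting with them. These processes are daemons. The names of programs that run as daemons usually end with a `d`; for example `sshd`, the SSH server, listens for incoming SSH connection requests and authenticates users.

On Linux, `systemd` (the system daemon) is the most common way to configure and run daemons. Running `systemctl status` lists all the running daemons. Many of them may be unfamiliar, but they are responsible for core parts of the system: managing the network, resolving DNS queries, displaying the graphical interface, and so on. Users interact with `systemd` through the `systemctl` command to `enable`, `disable`, `start`, `stop`, `restart`, or check the `status` of configured daemons and system services.

`systemd` provides a convenient interface for configuring and enabling new daemons or services. The configuration file below runs a simple Python program as a daemon. Its contents are straightforward, so we will not go through them in detail. A detailed guide to `systemd` configuration files is available at [freedesktop.org](https://www.freedesktop.org/software/systemd/man/systemd.service.html).

```ini
# /etc/systemd/system/myapp.service
[Unit]
# Description of this unit
Description=My Custom App
# Start this service after the network is up
After=network.target

[Service]
# User to run the process as
User=foo
# Group to run the process as
Group=foo
# Working directory of the process
WorkingDirectory=/home/foo/projects/mydaemon
# Command that starts the process
ExecStart=/usr/local/bin/python3.7 app.py
# Restart the process when it fails
Restart=on-failure

[Install]
# Roughly the equivalent of start-on-boot on Windows: the service is
# loaded and run even if the GUI has not started
WantedBy=multi-user.target
# If the service should only run while the GUI is active, use instead:
# WantedBy=graphical.target
# graphical.target runs GUI-related services on top of multi-user.target
```

If you only want to run some program periodically, you can use [`cron`](https://www.man7.org/linux/man-pages/man8/cron.8.html) directly. It is a daemon that already ships with your system and runs scheduled tasks.

## FUSE

Modern software systems are usually built out of many modular components. Your operating system can use similar functionality on different filesystems through a common set of operations. For example, when you create a file with `touch`, `touch` makes a system call to the kernel, and the kernel then calls the method specific to the filesystem to create the file. The caveat is that UNIX filesystems are traditionally implemented as kernel modules, so only the kernel can perform filesystem calls.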
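[FUSE](https://en.wikipedia.org/wiki/Filesystem_in_Userspace) (Filesystem in Userspace) allows programs running in user space to implement filesystem calls, and bridges those calls to the kernel interfaces. In practice, this means that users can implement arbitrary functionality behind filesystem calls.

For example, FUSE can be used to implement a virtual filesystem in which every operation is forwarded over SSH to a remote machine, performed there, and the result returned to the local computer. The files in this filesystem are stored on the remote machine, but to software on the local computer they look no different from local files. `sshfs` is a FUSE filesystem that does exactly this.

Some interesting FUSE filesystems are:

- [sshfs](https://github.com/libfuse/sshfs): open files on a remote machine locally over an SSH connection
- [rclone](https://rclone.org/commands/rclone_mount/): mount cloud storage services like Dropbox, Google Drive, Amazon S3, or Google Cloud Storage as a local filesystem
- [gocryptfs](https://nuetzlich.net/gocryptfs/): an encrypted overlay filesystem. Files are stored encrypted on disk, but once the filesystem is mounted they appear as plaintext at the mountpoint
- [kbfs](https://keybase.io/docs/kbfs): a distributed filesystem with end-to-end encryption. It has private, shared, and public folders
- [borgbackup](https://borgbackup.readthedocs.io/en/stable/usage/mount.html): mount your deduplicated, compressed, and encrypted backups for easy browsing

## Backups

Any data that you haven't backed up could disappear forever at any moment. Copying data is easy; backing it up reliably is hard. Here are some of the basics of backups, and some pitfalls that common approaches fall into.

First, a copy of the data on the same disk is not a backup, because the disk is a single point of failure. If the disk fails, all the data may be lost. An external drive kept at home is a weak backup, since it could be lost together with the original data in a fire, a robbery, and so on. The recommended practice is to keep backups in a different location.

Synchronization solutions are not backups either. However convenient Dropbox or Google Drive may be, when data is erased or corrupted locally they may propagate that "change" to the cloud. For the same reason, disk mirroring solutions like RAID are not backups. They do not protect against files being accidentally deleted, corrupted, or encrypted by ransomware.

Some core features of a good backup solution are versioning, deduplication, and security. Versioning backups ensures that you can restore data from any recorded point in its history. Detecting and removing duplicate data, so that only incremental changes are stored, reduces storage overhead. On the security side, you should ask yourself what information or tools someone would need in order to read or completely delete your data and its backups. Lastly, do not blindly trust a backup solution. You should regularly check that your backups can actually be used to restore data.

Backups are not limited to files on your local computer. With the rise of web applications, much of our data is stored only in the cloud. If we lose access to those applications, the webmail, the photos on social networks, the playlists on music streaming services, and the online documents stored there are lost with them. You should have offline copies of this data, and there are projects that help you download and store it.

For more details, see the 2019 [lecture notes](/2019/backups) on backups.

## APIs

We have talked a lot in this class about using your computer more efficiently to accomplish _local_ tasks, but many of these lessons also extend to the wider internet. Most online services offer an API (application programming interface) that lets you access their data programmatically. For example, the US National Weather Service provides an API for getting weather forecasts that you can call from your shell.

Most of these APIs have a similar format. They are structured URLs, often rooted at `api.service.com`, where the path selects the operation you want to perform and the query parameters restrict the results the API returns.

Taking US weather data as an example, to get the weather for a particular location you issue a GET request (with `curl`, for instance) to [`https://api.weather.gov/points/42.3604,-71.094`](https://api.weather.gov/points/42.3604,-71.094). The response contains a set of URLs for fetching specific information such as hourly forecasts or weather station information. Usually these responses are formatted as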
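`JSON`, and you can use a tool like [`jq`](https://stedolan.github.io/jq/) to pick out the parts you care about.

Some APIs require authentication, usually in the form of a secret token that you include with the request. Read the documentation of the API you want to use to find out which method it expects, but [OAuth](https://www.oauth.com/) is a protocol you will often see. OAuth gives you a set of secret tokens that can only be used for specific functions of that API. Because a request carrying a valid OAuth token looks to the API as if it came from you, keep these tokens safe: anyone who obtains one can do whatever the token allows under your identity.

[IFTTT](https://ifttt.com/) is a website that integrates with many APIs and lets an event on one API trigger an action on another. Its full name, If This Then That, says it all: for example, it can detect a new tweet of yours and automatically repost it on another platform. You can combine the APIs it supports however you like, so try setting up whatever you need!

## Common command-line flags/patterns

Command-line tools vary a lot, and reading their `man` pages helps you understand each one. Even so, they share some common features that are worth knowing:

- Most tools support a `--help` flag, or something similar, to display brief usage instructions.
- Tools that can cause irrevocable changes usually provide a "dry run" flag, so you can check what the tool would do when run for real. They often also have an "interactive" flag that prompts you before each irrevocable action.
- `--version` or `-V` makes a tool print its own version (important when reporting bugs).
- Almost all tools have a `--verbose` or `-v` flag to produce more detailed output. Repeating the flag, as in `-vvv`, gives even more detail, which is often used for debugging. Similarly, many tools have a `--quiet` flag that suppresses everything except errors.
- In most tools, `-` in place of an input or output file name means that the tool reads from standard input or writes to standard output.
- Possibly destructive tools are generally not recursive by default, but support a "recursive" flag (often `-r`).
- Sometimes you want to pass something that _looks_ like a flag as a normal argument, for example:
    - removing a file called `-r` with `rm`;
    - running one program through another, as in `ssh machine foo`, and passing a flag to the inner program (`foo`).

    In these cases the special argument `--` makes a program _stop_ processing flags and options (things starting with `-`) in what follows:
    - `rm -- -r` makes `rm` treat `-r` as a file name;
    - in `ssh machine --for-ssh -- foo --for-foo`, the `--` tells `ssh` that `--for-foo` is not one of its own flags.

## Window managers

Most people are used to the "drag and drop" window managers that Windows, macOS and Ubuntu ship by default. Windows just hang there on screen, and you can drag them around, resize them, and let them overlap one another. This floating (or stacking) style is only one type of window manager. There are many others, especially on Linux.

A particularly common alternative is the tiling window manager. As the name suggests, it lays windows out side by side like tiles, so they never overlap. This is similar to how [tmux](https://github.com/tmux/tmux) manages terminal panes. A tiling window manager arranges the open windows according to a predefined layout. If only one window is open, it fills the whole screen. When you open another, the original window shrinks, for example to two thirds or one third of the screen, to make room. Opening more windows makes the existing ones adjust further.

Just like with tmux, a tiling window manager lets you switch between, resize and move windows using only the keyboard, without ever touching the mouse. They are worth a try!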
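## VPN

VPNs are all the rage these days, but it is not clear that this is for [any good reason](https://gist.github.com/joepie91/5a9909939e6ce7d09e29). You should understand what a VPN does and does not get you. As far as the internet is concerned, a VPN is, **in the best case**, just a way to change your internet service provider (ISP). All your traffic appears to come from the VPN provider's network instead of your "real" location, and the network you are actually connected to only sees encrypted traffic.

While that may sound attractive, keep in mind that using a VPN only shifts the trust you placed in your ISP to the VPN provider: whatever your ISP _could_ see, the VPN provider _now sees instead_. If you trust the VPN provider more than your ISP, that is a win. Otherwise, it is not clear that you have gained much. Unencrypted public Wi-Fi at an airport is indeed not trustworthy, but on a home network the difference is much less obvious.

You should also know that most traffic carrying sensitive information is already encrypted through HTTPS, or TLS more generally. In that case it matters little whether the network you are on is "safe": the operator can see which servers you talk to, but not what you exchange with them.

All of this assumes "the best case". There are precedents of VPN providers misconfiguring their software so that encryption was weak or disabled entirely. Some providers are malicious, or at least opportunistic, and will log all your traffic and quite possibly sell that information to third parties. Choosing a bad VPN provider is often worse than not using one in the first place.

MIT runs its own [VPN](https://ist.mit.edu/vpn) for members who need access to campus resources. If you want to set up a VPN yourself, take a look at [WireGuard](https://www.wireguard.com/) and [Algo](https://github.com/trailofbits/algo).

## Markdown

You will very likely write all sorts of documents over the course of your career. In many cases these documents need some markup to be readable: bold or italic text, headers, hyperlinks and code snippets.

Instead of reaching for a heavy tool like Word or LaTeX, consider the lightweight markup language [Markdown](https://commonmark.org/help/). You have probably already seen Markdown or one of its variants. Many environments support and use at least a subset of it.

Markdown is an attempt to codify the conventions people already follow when writing plain text. For example:

- Text surrounded by `*` is emphasized (*italic*), and text surrounded by `**` is strongly emphasized (**bold**).
- Lines starting with `#` are headings, and the number of `#`s is the heading level, for example `## A second-level heading`.
- A line starting with `-` is an item of an unordered list. A number followed by `.` (such as `1.`) is an item of an ordered list.
- Text surrounded by backticks (`` ` ``) is displayed in `code font`. To show a block of code, indent every line by four spaces or surround the whole snippet with triple backticks:

    ```
    like this
    ```
- To add a hyperlink, put the text _to display_ in square brackets, immediately followed by the URL in parentheses: `[display text](link target)`.

Markdown is easy to get started with and very widely used. In fact, the lecture notes and other materials for this class are all written in Markdown. Follow [this link](https://github.com/missing-semester-cn/missing-semester-cn.github.io/blob/master/_2020/potpourri.md) to see the raw Markdown source of this page.

## Hammerspoon (desktop automation on macOS)

[Hammerspoon](https://www.hammerspoon.org/) is a desktop automation framework for macOS. It lets you write Lua scripts that hook into operating system functionality, so you can interact with the keyboard, mouse, windows, filesystem and more.

Some examples of what you can do with Hammerspoon:

- Bind hotkeys that move windows to specific locations
- Create a menu bar button that automatically arranges windows in a specific layout
- Mute your speakers when you arrive at the lab, by detecting the WiFi network you are connected to
- Show a warning when you have accidentally taken a friend's charger

From the user's point of view, Hammerspoon can run arbitrary Lua code bound to menu bar buttons, key presses or events. It provides an extensive library for interacting with the system, so there is very little it cannot do. You can write your own Hammerspoon configuration from scratch, or combine configurations that other people have published to fit your needs.

### Resources

- [Getting Started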
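with Hammerspoon](https://www.hammerspoon.org/go/): the official Hammerspoon tutorial
- [Sample configurations](https://github.com/Hammerspoon/hammerspoon/wiki/Sample-Configurations): official sample configurations
- [Anish's Hammerspoon config](https://github.com/anishathalye/dotfiles-local/tree/mac/hammerspoon): Anish's own Hammerspoon configuration

## Booting + Live USBs

When your computer starts, the [BIOS](https://en.wikipedia.org/wiki/BIOS) or [UEFI](https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface) initializes the hardware before the operating system is loaded. This is called booting. You can configure it by pressing the key combination the computer shows at startup, for example `Press F9 to configure BIOS. Press F12 to enter boot menu`. In the BIOS menu you can change hardware-related settings, and in the boot menu you can choose to load the operating system from a device other than the hard drive, such as a Live USB.

A [Live USB](https://en.wikipedia.org/wiki/Live_USB) is a flash drive containing a complete operating system. Live USBs have many uses, including:

- serving as the installation medium for an operating system;
- running the operating system directly from the Live USB, without installing it to the hard drive;
- repairing the same operating system installed on the hard drive;
- recovering data from the hard drive.

A Live USB is made by _writing_ an operating system image to the flash drive, which is not the same as simply copying the `.iso` file onto it. You can use a Live USB writing tool such as [UNetbootin](https://unetbootin.github.io/) or [Rufus](https://github.com/pbatard/rufus) to create one.

## Docker, Vagrant, VMs, Cloud, OpenStack

[Virtual machines](https://en.wikipedia.org/wiki/Virtual_machine) and similar tools such as containers let you emulate a whole computer system, including the operating system. Virtual machines are useful for creating an isolated test or development environment, or as a sandbox for security testing.

[Vagrant](https://www.vagrantup.com/) is a tool for building and configuring virtual development environments. You describe the machine in a configuration file (operating system, system services, packages to install, and so on), and then run `vagrant up` to start a virtual machine in one of several environments (VirtualBox, KVM, Hyper-V, etc.). [Docker](https://www.docker.com/) is a similar tool built around the concept of containers.

Renting virtual machines in the cloud gives you instant access to:

- a cheap, always-on machine with a public IP address for hosting websites and other services;
- machines with lots of CPU, disk, RAM and GPU resources;
- more machines than you have physical hosts available;
- time-based billing instead of the fixed cost of physical hosts. If you only need a lot of compute for a short time, renting 1000 virtual machines for a few minutes is clearly cheaper.

Popular VPS providers include [Amazon AWS](https://aws.amazon.com/), [Google Cloud](https://cloud.google.com/), [Microsoft Azure](https://azure.microsoft.com/) and [DigitalOcean](https://www.digitalocean.com/).

Members of MIT CSAIL can request free virtual machines for research through the [CSAIL OpenStack instance](https://tig.csail.mit.edu/shared-computing/open-stack/).

## Notebook programming

[Notebook programming environments](https://en.wikipedia.org/wiki/Notebook_interface) help developers do exploratory programming in which they interact with intermediate results. The most popular notebook environment today is probably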
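[Jupyter](https://jupyter.org/), whose name comes from the three core languages it supports: Julia, Python and R. [Wolfram Mathematica](https://www.wolfram.com/mathematica/) is another excellent environment that is often used for scientific computing.

## GitHub

[GitHub](https://github.com/) is one of the most popular platforms for open-source software development. Many of the tools we have talked about in this class, from [vim](https://github.com/vim/vim) to [Hammerspoon](https://github.com/Hammerspoon/hammerspoon), are hosted on GitHub. Contributing to the open-source tools you use every day is easier than you might think. Here are the two ways contributors most often use:

- Create an [issue](https://help.github.com/en/github/managing-your-work-on-github/creating-an-issue). Issues are used to report problems or request new features. Creating one does not require reading or writing code, so it is a lightweight way to contribute. High-quality bug reports are extremely valuable to developers. Commenting on existing issues also helps a project move forward.
- Submit a code change through a [pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests). Because it involves reading and writing code, this is generally more involved than creating an issue. A pull request asks someone else to pull (and merge) your code into their repository. Many open-source projects only let authorized maintainers change the project's code, so you usually [fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the upstream repository, which creates an identical copy under your GitHub account that you control. You can freely create branches in the fork and push code that fixes a problem or implements a feature. Once you are done, go back to the project's GitHub page and [create a pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request).

After you submit the request, the project maintainers will discuss the code with you and give feedback. If everything is fine, your code is merged into the upstream repository. Many large open-source projects provide contribution guides, beginner-friendly issues, and even dedicated mentorship programs to help newcomers get familiar with the project.

================================================
FILE: _2020/qa.md
================================================
---
layout: lecture
title: "Q&A"
date: 2020-01-30
ready: true
sync: true
syncdate: 2025-08-16
video:
  aspect: 56.25
  id: Wz50FvGG6xU
---

For the last lecture, we answered questions that the students submitted:

- [Any recommendations on learning Operating Systems related topics like processes, virtual memory, interrupts, memory management, etc](#any-recommendations-on-learning-operating-systems-related-topics-like-processes-virtual-memory-interrupts-memory-management-etc)
- [What are some of the tools you'd prioritize learning first?](#what-are-some-of-the-tools-youd-prioritize-learning-first)
- [When do I use Python versus Bash scripts versus some other language?](#when-do-i-use-python-versus-bash-scripts-versus-some-other-language)
- [What is the difference between `source script.sh` and `./script.sh`?](#what-is-the-difference-between-source-scriptsh-and-scriptsh)
- [Where are the various packages and tools stored, and how does referencing them work? What even is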
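`/bin` or `/lib`?](#where-are-the-various-packages-and-tools-stored-and-how-does-referencing-them-work-what-even-is-bin-or-lib)
- [Should I use `apt-get install` or `pip install` to install a package?](#should-i-use-apt-get-install-or-pip-install-to-install-a-package)
- [What are the easiest and best profiling tools to use to improve the performance of my code?](#what-are-the-easiest-and-best-profiling-tools-to-use-to-improve-the-performance-of-my-code)
- [What browser plugins do you use?](#what-browser-plugins-do-you-use)
- [What are other useful data wrangling tools?](#what-are-other-useful-data-wrangling-tools)
- [What is the difference between Docker and a Virtual Machine?](#what-is-the-difference-between-docker-and-a-virtual-machine)
- [What are the advantages and disadvantages of each OS and how can we choose between them (e.g. choosing the best Linux distribution for our purposes)?](#what-are-the-advantages-and-disadvantages-of-each-os-and-how-can-we-choose-between-them-eg-choosing-the-best-linux-distribution-for-our-purposes)
- [Vim vs Emacs?](#vim-vs-emacs)
- [Any tips or tricks for Machine Learning applications?](#any-tips-or-tricks-for-machine-learning-applications)
- [Any more Vim tips?](#any-more-vim-tips)
- [What is 2FA and why should I use it?](#what-is-2fa-and-why-should-i-use-it)
- [Any comments on differences between web browsers?](#any-comments-on-differences-between-web-browsers)

## Any recommendations on learning Operating Systems related topics like processes, virtual memory, interrupts, memory management, etc

First, it is unclear whether you actually need to be familiar with these low-level topics. They matter once you start writing lower-level code, such as implementing or modifying a kernel. Otherwise most of them will not be relevant, with the exception of processes and signals, which were briefly covered in other lectures.

Some good resources to learn about this topic:

- [MIT's 6.828 class](https://pdos.csail.mit.edu/6.828/) - a graduate-level class on operating systems. The class materials are publicly available.
- *Modern Operating Systems* (4th ed.) by Andrew S. Tanenbaum - a good overview of many of the concepts mentioned above.
- *The Design and Implementation of the FreeBSD Operating System* - a good resource on the FreeBSD OS (note that FreeBSD is not Linux).
- Other guides like [Writing an OS in Rust](https://os.phil-opp.com/), where people implement a kernel step by step in various languages, mostly for teaching purposes.

## What are some of the tools you'd prioritize learning first?

Some topics worth prioritizing:

- Learn to use your keyboard more and your mouse less. You can get there by making good use of keyboard shortcuts, changing interfaces, and so on.
- Learn your editor well. As a programmer you spend most of your time editing files, so this skill pays off.
- Learn how to automate or simplify repetitive tasks in your workflow, because this saves a huge amount of time.
- Learn a version control tool like Git and how to use it together with GitHub, so you can collaborate on modern software projects.

## When do I use Python versus Bash scripts versus some other language?

In general, Bash scripts work well for short, simple one-off scripts, when you just want to run a specific series of commands. Bash has a set of oddities, however, that make it hard to use for larger programs or scripts:

- Bash is easy to get right for a simple use case, but really hard to get right for all possible inputs. For example, spaces in script arguments have led to countless bugs in Bash scripts.
- Bash is not amenable to code reuse, so it is hard to reuse components of programs you have already written. More generally, Bash has no concept of software libraries.
- Bash relies on magic strings like `$?` or `$@` to refer to specific values, whereas other languages refer to them explicitly, like `exitCode` or `sys.argv`.

Therefore, for larger or more intricate scripts we recommend a more mature scripting language such as Python or Ruby. You can find many online libraries, written in these languages, that solve common problems. If you find a library that implements the specific functionality you need in some language, usually the best thing to do is to just use that language.

## What is the difference between `source script.sh` and `./script.sh`?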
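In both cases `script.sh` is read and executed in a bash session. The difference is which session runs the commands. With `source`, the commands run in your current bash session, so any changes to the current environment, such as changing directories or defining functions, persist in the current session once `source` finishes. When you run the script standalone as `./script.sh`, your current bash session starts a new bash instance that runs the commands in `script.sh`. Thus, if `script.sh` changes directories, the new bash instance changes directories, but once it exits and returns control to the parent bash session, the parent session is still in the same place. Similarly, if `script.sh` defines a function that you want to use in your terminal, you need to `source` it so that the function is defined in your current bash session. If you run `./script.sh` instead, only the new bash process sees the function definition, and your current shell does not.

## Where are the various packages and tools stored, and how does referencing them work? What even is `/bin` or `/lib`?

The programs you run from the command line are all found in the directories listed in your `PATH` environment variable. You can use the `which` command (or the `type` command) to check where your shell found a particular program. In general, there are conventions about where specific kinds of files live. Below are the ones we talked about; see the [Filesystem Hierarchy Standard](https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard) for a more comprehensive list.

- `/bin` - Essential command binaries
- `/sbin` - Essential system binaries, usually run by root
- `/dev` - Device files, usually interfaces to hardware devices
- `/etc` - Host-specific system-wide configuration files
- `/home` - Home directories of the users of the system
- `/lib` - Common libraries for system programs
- `/opt` - Optional application software
- `/sys` - Information about the system and its configuration (covered in the [first lecture](/2020/course-shell/))
- `/tmp` - Temporary files (also `/var/tmp`), usually deleted between reboots
- `/usr/` - Read-only user data
  + `/usr/bin` - Non-essential command binaries
  + `/usr/sbin` - Non-essential system binaries, usually run by root
  + `/usr/local/bin` - Binaries for programs compiled by the user
- `/var` - Variable files such as logs or caches

## Should I use `apt-get install` or `pip install` to install a package?

There is no universal answer to this question. It is part of the more general question of whether to install software with your system's package manager or with a language-specific one. A few things to take into account:

- Common packages are available through both, but less popular or more recent packages might not be in your system package manager. In that case, the language-specific package manager is the better choice.
- Similarly, language-specific package managers usually carry more up-to-date versions of packages than system package managers.
- When you use the system package manager, libraries are installed system wide. If you need different versions of a library for development, the system package manager might not suffice. For this scenario most programming languages provide some sort of isolated or virtual environment, so you can install different versions of libraries with the language-specific package manager without conflicts. For Python there is virtualenv, and for Ruby there is RVM.
- Depending on the operating system and the hardware architecture, some packages come with binaries and others need to be compiled. For instance, on ARM computers like the Raspberry Pi, the system package manager can be the better choice when it ships binaries while the language-specific one would have to compile the package. This depends heavily on your specific setup.

You should pick one solution and not use both, since mixing them can lead to conflicts that are hard to resolve. Our recommendation is to use the language-specific package manager whenever possible, together with isolated environments (like Python's virtualenv) to avoid polluting the global environment.

## What are the easiest and best profiling tools to use to improve the performance of my code?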
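The simplest tool that is still quite useful for profiling is [print timing](/2020/debugging-profiling/#timing). You just manually measure the time spent between different parts of your code. By doing this repeatedly, you can effectively binary search over your code and find the segment that takes the longest.

For more advanced tools, Valgrind's [Callgrind](http://valgrind.org/docs/manual/cl-manual.html) lets you run your program and measure how long everything takes along with all the call stacks, namely which function called which other function. It then produces an annotated version of your source code with the time taken per line. However, it slows your program down by an order of magnitude and does not support threads. For other cases, the [`perf`](http://www.brendangregg.com/perf.html) tool and other language-specific sampling profilers can output useful data very quickly. [Flamegraphs](http://www.brendangregg.com/flamegraphs.html) are a visualization tool for the output of sampling profilers. You should also use tools specific to the programming language or task at hand. For web development, for example, the developer tools built into Chrome and Firefox have excellent profilers.

Sometimes the slowest part of your code is the system waiting for an event such as a disk read or a network packet. In those cases, check that the theoretical speed estimated from the hardware's capabilities does not deviate from the actual measurements. There are also specialized tools for analyzing wait times in system calls, including [eBPF](http://www.brendangregg.com/blog/2019-01-01/learn-ebpf-tracing.html), which performs kernel tracing of user programs. If you need this kind of low-level profiling, [`bpftrace`](https://github.com/iovisor/bpftrace) is worth checking out.

## What browser plugins do you use?

Our favorites are mostly related to security and usability:

- [uBlock Origin](https://github.com/gorhill/uBlock) - a [wide-spectrum](https://github.com/gorhill/uBlock/wiki/Blocking-mode) blocker that does not just stop ads, but also blocks third-party communication a page may attempt, as well as inline scripts and other kinds of resource loading. If you are willing to spend more time on configuration, go to [medium mode](https://github.com/gorhill/uBlock/wiki/Blocking-mode:-medium-mode) or even [hard mode](https://github.com/gorhill/uBlock/wiki/Blocking-mode:-hard-mode). Some sites will stop working until you have tuned the settings, but these modes significantly improve your online security. Otherwise, [easy mode](https://github.com/gorhill/uBlock/wiki/Blocking-mode:-easy-mode) is already a good default that blocks most ads and tracking, and you can define your own rules for blocking website objects.
- [Stylus](https://github.com/openstyles/stylus/) - a fork of Stylish (do not use Stylish, it was shown to [steal users' browsing history](https://www.theregister.co.uk/2018/07/05/browsers_pull_stylish_but_invasive_browser_extension/)) that lets you sideload custom CSS stylesheets into websites. With Stylus you can easily customize the appearance of a website: remove a sidebar, change the background color, or change the text size or font. This can make the sites you visit often much more readable. Stylus can also find styles written by other users and published on [userstyles.org](https://userstyles.org/). Most common websites have one or several dark theme stylesheets.
- Full Page Screen Capture - built into [Firefox](https://screenshots.firefox.com/) and available as a [Chrome extension](https://chrome.google.com/webstore/detail/full-page-screen-capture/fdpohaocaechififmbbbbbknoalclacl?hl=en). It takes a screenshot of a full website, which is often more useful than printing.
-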
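  [Multi Account Containers](https://addons.mozilla.org/en-US/firefox/addon/multi-account-containers/) - lets you separate cookies into "containers", so you can browse the web with different identities and/or make sure that websites cannot share information between them.
- Password manager integration - most password managers have browser extensions that make entering your credentials into websites not only more convenient but also more secure. Compared to simply copy-pasting your user name and password, these tools first check that the website's domain matches the one listed for the entry, which prevents phishing sites that impersonate popular websites from stealing your credentials.

## What are other useful data wrangling tools?

Some data wrangling tools we did not have time to cover during the data wrangling lecture include `jq` and `pup`, which are specialized parsers for JSON and HTML data respectively. The Perl programming language is another good tool for more advanced data wrangling pipelines. Another trick is the `column -t` command, which converts whitespace-separated text (not necessarily aligned) into properly aligned columns.

More generally, two less conventional data wrangling tools are vim and Python. For some complex, multi-line transformations, vim macros are invaluable. You can record a series of actions and repeat them as many times as you need. For instance, in the editors [lecture notes](/2020/editors/#macros) (and last year's [video](/2019/editors/)) there is an example that converts an XML-formatted file into JSON using only vim macros.

For tabular data, often presented as CSV, the Python [pandas](https://pandas.pydata.org/) library is a great tool. It makes it easy to define complex operations such as group by, join or filters, and also to plot different properties of your data. It supports exporting to many table formats, including XLS, HTML and LaTeX. Alternatively, the R programming language (an arguably [bad](http://arrgh.tim-smith.us/) language) has lots of functionality for computing statistics over data, which can be quite useful as the last step of a pipeline. [ggplot2](https://ggplot2.tidyverse.org/) is a great plotting library in R.

## What is the difference between Docker and a Virtual Machine?

Docker is based on a more general concept called containers. The main difference between containers and virtual machines is that a virtual machine executes an entire OS stack, including the kernel, even if that kernel is the same as the host's. Containers, in contrast, avoid running another instance of the kernel and instead share the kernel with the host. On Linux this is achieved through a mechanism called LXC, which uses a series of isolation mechanisms to start a program that believes it is running on its own hardware, while it is actually sharing the hardware and kernel with the host. Containers therefore have lower overhead than a full virtual machine.

On the flip side, containers have weaker isolation and only work if the host runs the same kernel. For instance, if you run Docker on macOS, Docker needs to start a Linux virtual machine to get an initial Linux kernel, and the overhead is then still significant. Lastly, Docker is a specific implementation of containers and is tailored for software deployment. Because of this it has some quirks: for example, by default Docker containers do not persist any form of storage between reboots.

## What are the advantages and disadvantages of each OS and how can we choose between them (e.g. choosing the best Linux distribution for our purposes)?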
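Regarding Linux distributions, even though there are many of them, most behave nearly identically for most use cases. You can learn the features and inner workings of Linux and UNIX on any distribution. A fundamental difference between distributions is how they handle package updates. Some, like Arch Linux, use a rolling update policy with bleeding-edge packages, where things may break. Others, like Debian, CentOS or Ubuntu LTS releases, are much more conservative about updates, so what they ship is more stable at the expense of some newer features. We recommend Debian or Ubuntu for a simple and stable experience on both desktops and servers.

macOS is a good middle point between Windows and Linux, with a nicely polished interface. However, macOS is based on BSD rather than Linux, so some parts of the system and some commands are different. Another alternative worth trying is FreeBSD. Even though some programs will not run on FreeBSD, the BSD ecosystem is much less fragmented and better documented than Linux. We do not recommend Windows unless you are developing Windows applications or need a feature it supports better, such as driver support for gaming.

For dual booting, we think the setup that works best is macOS's Boot Camp. Any other combination can be problematic in the long run, especially when combined with other features like disk encryption.

## Vim vs Emacs?

The three of us use vim as our primary editor. Emacs is also a good choice, and it is worth trying both to see which one suits you better. Emacs does not use vim's modal editing, but that can be enabled through Emacs plugins like [Evil](https://github.com/emacs-evil/evil) or [Doom Emacs](https://github.com/hlissner/doom-emacs). An advantage of Emacs is that extensions can be written in Lisp, which is nicer to work with than vimscript, vim's default scripting language.

## Any tips or tricks for Machine Learning applications?

Some of the lessons from this class apply directly to machine learning applications. As in many scientific disciplines, in machine learning you run a series of experiments and check what worked and what did not. You can use shell tools to search through these experiment results quickly and aggregate them in a sensible way. This could mean selecting all the experiments within a given time frame or those that use a specific dataset. If you log all the relevant parameters of each experiment in a simple JSON file, this becomes extremely easy with the tools we covered in this class. Lastly, if you do not have some sort of cluster to which you submit your GPU jobs, you should look into how to automate that process, since it is a time-consuming task that also drains your energy.

## Any more Vim tips?

A few more tips:

- Plugins - Take your time to explore plugins. There are a lot of good plugins that fix some of vim's shortcomings or add new functionality that composes well with existing vim workflows. Good resources for this are [VimAwesome](https://vimawesome.com/) and other programmers' dotfiles.
- Marks - In vim, you can use `m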
    o <-- o <-- o <-- o <---- o
                ^            /
                 \          v
                  --- o <-- o
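Commits in Git are immutable. This does not mean that mistakes cannot be corrected. It just means that such "edits" actually create entirely new commits, and references (see below) are updated to point to the new ones.

## Data model, as pseudocode

It may be clearer to look at Git's data model written down as pseudocode: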
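The following is a minimal sketch of that idea in Python, not Git's actual implementation. The `Commit` class, the `amend` helper and the `refs` dictionary are hypothetical names introduced only for this illustration. Commits are modeled as frozen (immutable) objects, so "fixing" one means building a replacement and moving the mutable reference to it.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Commit:
    """A simplified, immutable stand-in for a Git commit (hypothetical model)."""
    message: str
    parents: Tuple["Commit", ...] = ()


def amend(refs: Dict[str, Commit], name: str, new_message: str) -> Commit:
    """'Edit' the commit a reference points to by creating a replacement commit."""
    old = refs[name]
    # The old commit is left untouched; a new commit with the same parents is built.
    replacement = Commit(message=new_message, parents=old.parents)
    # Only the reference, which is mutable, is updated.
    refs[name] = replacement
    return replacement


if __name__ == "__main__":
    root = Commit("initial commit")
    typo = Commit("fix tpyo", parents=(root,))
    refs = {"master": typo}
    amend(refs, "master", "fix typo")
    print(typo.message)
    print(refs["master"].message)
```

In this sketch the original commit object still exists after `amend` and keeps its old message; only `refs["master"]` now points somewhere else.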
```
// a file is a bunch of bytes
type blob = array<byte>
```
Licensed under CC BY-NC-SA.
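================================================
FILE: about.md
================================================
---
layout: lecture
title: "Why we are teaching this class"
---

A traditional computer science education already offers plenty of advanced classes and topics, from operating systems and programming languages to machine learning. Yet one essential topic is rarely taught and is instead left for students to pick up on their own: mastery of their tools.

Over the years we have helped teach several classes at MIT, and we have become increasingly aware that many students know very little about the tools available to them. Computers were built to automate tasks, yet students often get stuck doing repetitive work by hand, or fail to take full advantage of powerful tools such as version control and text editors. Inefficiency and wasted time are the lesser problem. In the worst case this leads to data loss or to being unable to complete certain tasks.

These topics are not part of the university curriculum: students are never shown how to use these tools, or at least not how to use them efficiently, and so they waste time and effort on tasks that should be simple. The standard computer science curriculum is missing this critical class that could make computing much simpler.

# The missing semester of your CS education

To help remedy this, we are running a class that covers the topics we consider crucial to being an effective computer scientist or programmer. The class is pragmatic and hands-on, and introduces tools and techniques that you can immediately apply to a wide variety of problems. It ran in January 2020 during MIT's "Independent Activities Period", a one-month term of short, student-run classes. Although the class is aimed at MIT students, we have made all lecture recordings and materials publicly available.

If this sounds like it might be for you, here are some concrete examples of what the class teaches:

## Command shell and shell tools

How to automate common, repetitive tasks with aliases, scripts and build systems. No more copy-pasting commands from a document. No more "run these 15 commands one after the other". No more "you forgot to run this command" or "you forgot to pass that argument".

For example, searching through your history quickly can save a lot of time. In the example below we show several tricks for navigating your shell history for `convert` commands.

## Version control

How to use version control **properly**, and take advantage of it to avoid awkward situations, collaborate with others, and quickly find problematic commits. No more huge blocks of commented-out code. No more digging through all the code to track down a bug. No more "oh no, did we just delete the working code?!". We will also teach you how to contribute to other people's projects with pull requests.

In the example below we use `git bisect` to find which commit broke a unit test, and then fix it with `git revert`.

## Text editing

How to edit files efficiently from the command line, both locally and remotely, and make full use of editor features. No more copying files back and forth. No more repetitive file editing.

Vim macros are one of its best features. In the example below we quickly convert an html table to csv format using a nested vim macro.

## Remote machines

How to stay connected when working on remote machines with SSH keys, and how to multiplex your terminal. No more keeping many terminals open just to run a couple of commands. No more typing your password every time you connect. No more losing all your context because the network dropped or you had to reboot your laptop.

In the example below we use `tmux` to keep sessions alive on remote servers and `mosh` to support network roaming and disconnection.

## Finding files

How to quickly find the files you are looking for. No more clicking through the files in your project until you find the code you need.

In the example below we quickly look for files with `fd` and for code snippets with `rg`. We also use `fasd` to quickly `cd` to and `vim` recent or frequently used files and folders.

## Data wrangling

How to quickly and easily modify, view, parse, plot and compute over data and files directly from the command line. No more copy-pasting from log files. No more computing statistics by hand. No more plotting in spreadsheets.

## Virtual machines

How to use virtual machines to try out new operating systems, isolate unrelated projects, and keep your host machine clean. No more accidentally corrupting your computer while doing a security lab. No more piles of randomly installed packages in differing versions.

## Security

How to be on the internet without giving away your secrets. No more racking your brain for passwords that satisfy insane rules. No more connecting to unsecured, open WiFi networks. No more unencrypted messaging.

# Conclusions

These topics, and more, are covered across the 12 lectures, each of which comes with exercises to help you get familiar with the tools. If you cannot wait until January, you can also take a look at [Hacker Tools](https://hacker-tools.github.io/lectures/), the trial run we taught last year. It is the precursor to this class and covers many of the same topics.

We welcome you to join, whether in person or remotely online.

Happy hacking,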