Repository: missing-semester-cn/missing-semester-cn.github.io
Branch: master
Commit: 8334ef64fdf1
Files: 65
Total size: 603.7 KB
Directory structure:
gitextract_gfrnrc45/
├── .editorconfig
├── .github/
│ └── ISSUE_TEMPLATE/
│ └── translation.md
├── .gitignore
├── 404.html
├── CNAME
├── Gemfile
├── README.md
├── _2019/
│ ├── automation.md
│ ├── backups.md
│ ├── command-line.md
│ ├── course-overview.md
│ ├── data-wrangling.md
│ ├── dotfiles.md
│ ├── editors.md
│ ├── files/
│ │ ├── example-data.xml
│ │ └── example.c
│ ├── index.html
│ ├── machine-introspection.md
│ ├── os-customization.md
│ ├── package-management.md
│ ├── program-introspection.md
│ ├── remote-machines.md
│ ├── security.md
│ ├── shell.md
│ ├── version-control.md
│ ├── virtual-machines.md
│ └── web.md
├── _2020/
│ ├── command-line.md
│ ├── course-shell.md
│ ├── data-wrangling.md
│ ├── debugging-profiling.md
│ ├── editors-notes.txt
│ ├── editors.md
│ ├── files/
│ │ ├── example-data.xml
│ │ └── vimrc
│ ├── index.html
│ ├── metaprogramming.md
│ ├── potpourri.md
│ ├── qa.md
│ ├── security.md
│ ├── shell-tools.md
│ └── version-control.md
├── _config.yml
├── _includes/
│ ├── head.html
│ ├── nav.html
│ ├── scaled_image.html
│ ├── scaled_video.html
│ └── video.html
├── _layouts/
│ ├── default.html
│ ├── lecture.html
│ ├── page.html
│ └── redirect.html
├── about.md
├── index.md
├── lectures.html
├── license.md
├── robots.txt
└── static/
├── css/
│ ├── main.css
│ └── syntax.css
└── files/
├── logger.py
├── sorts.py
└── subtitles/
└── 2020/
├── command-line.sbv
├── debugging-profiling.sbv
├── qa.sbv
└── shell-tools.sbv
================================================
FILE CONTENTS
================================================
================================================
FILE: .editorconfig
================================================
root = true
[*]
charset = utf-8
end_of_line = lf
indent_style = space
insert_final_newline = true
trim_trailing_whitespace = true
[*.md]
indent_size = 4
trim_trailing_whitespace = false
[*.{html,xml}]
indent_size = 2
[*.yml]
indent_size = 2
[*.css]
indent_size = 2
================================================
FILE: .github/ISSUE_TEMPLATE/translation.md
================================================
---
name: translation
about: choose the file you plan to translate
title: ''
labels: trans
assignees: ''
---
Filename:
Estimated completion date:
Note: Please make sure you can finish it within two weeks.
================================================
FILE: .gitignore
================================================
.ruby-version
.bundle/
_site/
.jekyll-metadata
.claude/
================================================
FILE: 404.html
================================================
---
layout: default
title: "404: Page not found"
permalink: /404.html
---
<div class="error-page">
<h1 class="title">404</h1>
<p>Sorry, the page you were looking for doesn't exist or has been moved.</p>
<p>You can go back to the <a href="/">home page</a> or use the search bar to find what you're looking for.</p>
<p>If you think this is an error, please contact us.</p>
</div>
<style>
.error-page {
text-align: center;
padding: 50px;
font-family: Arial, sans-serif;
}
.error-page .title {
font-size: 100px;
color: #ff6f61;
}
.error-page p {
font-size: 18px;
color: #333;
}
.error-page a {
color: #007bff;
text-decoration: none;
}
.error-page a:hover {
text-decoration: underline;
}
</style>
================================================
FILE: CNAME
================================================
missing-semester-cn.github.io
================================================
FILE: Gemfile
================================================
source 'https://rubygems.org'
gem 'github-pages'
================================================
FILE: README.md
================================================
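# The Missing Semester of Your CS Education (Chinese translation)
The English course website for The Missing Semester of Your CS Education is [here](https://missing.csail.mit.edu/)!
This is the [Chinese site](https://missing-semester-cn.github.io) (<span style="float:right"><img src = "https://img.shields.io/badge/最近一次与英文版同步-2021--04--24-green"></span>)
Contributions to this project are welcome! If you would like to edit or add content, please open an issue or submit a pull request.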
## Development and deployment
To build and view the site locally, run:
```bash
bundle install
bundle exec jekyll serve -w
```
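## License
All content of this course, including the website source code, lecture notes, exercises, and lecture videos, is licensed under the [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) International License.
For more information about contributing or translating, see [here](https://missing.csail.mit.edu/license).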
-----------------
## 2026 course status
Please claim tasks and submit translations via the README on the [sync-offical-2026](https://github.com/missing-semester-cn/missing-semester-cn.github.io/tree/sync-offical-2026) branch.
| Lecture | Translator | Status |
| ---- | ---- | ---- |
| [agentic-coding.md](_2026/agentic-coding.md) | [@Lingfeng AI](https://github.com/hanxiaomax) | To be translated |
| [beyond-code.md](_2026/beyond-code.md) | Unassigned | To be translated |
| [code-quality.md](_2026/code-quality.md) | Unassigned | To be translated |
| [command-line-environment.md](_2026/command-line-environment.md) | Unassigned | To be translated |
| [course-shell.md](_2026/course-shell.md) | Unassigned | To be translated |
| [debugging-profiling.md](_2026/debugging-profiling.md) | Unassigned | To be translated |
| [development-environment.md](_2026/development-environment.md) | Unassigned | To be translated |
| [shipping-code.md](_2026/shipping-code.md) | Unassigned | To be translated |
| [version-control.md](_2026/version-control.md) | Unassigned | To be translated |
-----------------
## Project status
To take part in this translation project, please reserve your topic by opening an issue; I will update this table accordingly to avoid duplicated work.
| Lecture | Translator | Status |
| ---- | ---- | ---- |
| [course-shell.md](_2020/course-shell.md) | [@Lingfeng AI](https://github.com/hanxiaomax) | Done |
| [shell-tools.md](_2020/shell-tools.md) | [@Lingfeng AI](https://github.com/hanxiaomax) | Done |
| [editors.md](_2020/editors.md) | [@stechu](https://github.com/stechu) | Done |
| [data-wrangling.md](_2020/data-wrangling.md) | [@Lingfeng AI](https://github.com/hanxiaomax) | Done |
| [command-line.md](_2020/command-line.md) | [@Lingfeng AI](https://github.com/hanxiaomax) | Done |
| [version-control.md](_2020/version-control.md) | [@Lingfeng AI](https://github.com/hanxiaomax) | Done |
| [debugging-profiling.md](_2020/debugging-profiling.md) | [@Lingfeng AI](https://github.com/hanxiaomax) | Done |
| [metaprogramming.md](_2020/metaprogramming.md) | [@Lingfeng AI](https://github.com/hanxiaomax) | Done |
| [security.md](_2020/security.md) | [@catcarbon](https://github.com/catcarbon) | Done |
| [potpourri.md](_2020/potpourri.md) | [@catcarbon](https://github.com/catcarbon) | Done |
| [qa.md](_2020/qa.md) | [@AA1HSHH](https://github.com/AA1HSHH) | Done |
| [about.md](about.md) | [@Binlogo](https://github.com/Binlogo) | Done |
## New project
[Learncpp Chinese edition](https://github.com/hanxiaomax/Learncpp_CN).
================================================
FILE: _2019/automation.md
================================================
---
layout: lecture
title: "Automation"
presenter: Jose
video:
aspect: 56.25
id: BaLlAaHz-1k
---
Sometimes you write a script that does something but you want it to run periodically, say a backup task. You can always write an *ad hoc* solution that runs in the background and comes online periodically. However, most UNIX systems come with the cron daemon, which can run tasks as often as once a minute based on simple rules.
On most UNIX systems the cron daemon, `crond`, will be running by default, but you can always check using `ps aux | grep crond` (on some distributions, such as Debian and Ubuntu, the daemon is called `cron` instead).
## The crontab
You can display your cron configuration by running `crontab -l` and edit it by running `crontab -e`. Each entry consists of five space-separated time fields, followed by the user and the command:
- **minute** - The minute of the hour the command will run on, between 0 and 59.
- **hour** - The hour the command will run on, in 24-hour time; values must be between 0 and 23 (0 is midnight).
- **dom** - The Day of Month you want the command to run on; e.g. to run a command on the 19th of each month, the dom would be 19.
- **month** - The month the command will run on. It may be specified numerically (1-12) or as the name of the month (e.g. May).
- **dow** - The Day of Week you want the command to run on. It can be numeric (0-7, where both 0 and 7 mean Sunday) or the name of the day (e.g. sun).
- **user** - The user who runs the command. This field only appears in system-wide crontabs such as `/etc/crontab`; per-user crontabs edited with `crontab -e` omit it.
- **command** - The command you want to run. This field may contain multiple words and spaces.
An asterisk `*` matches every value, and an asterisk followed by a slash and a number matches every nth value, so `*/5` in the minute field means every five minutes. Some examples:
```shell
*/5 * * * * # Every five minutes
0 * * * * # Every hour, on the hour
0 9 * * * # Every day at 9:00 am
0 9-17 * * * # Every hour between 9:00am and 5:00pm
0 0 * * 5 # Every Friday at 12:00 am
0 0 1 */2 * # Every other month, the first day, 12:00am
```
You can find many more examples of common crontab schedules at [crontab.guru](https://crontab.guru/examples.html).
## Shell environment and logging
A common pitfall when using cron is that it does not load the environment scripts that interactive shells do, such as `.bashrc` or `.zshrc`, and it does not save command output to a log file by default (many cron implementations instead try to email it to the crontab's owner). Combined with the fact that jobs run at most once a minute, this can make cron scripts quite painful to debug at first.
To deal with the environment, use absolute paths in all your scripts and explicitly set the environment variables they need, such as `PATH`, so the script can run successfully. To simplify logging, a good practice is to write your crontab entries in a format like this:
```shell
* * * * * user /path/to/cronscripts/every_minute.sh >> /tmp/cron_every_minute.log 2>&1
```
Keep the script itself in a separate file. Remember that `>>` appends to the file and that `2>&1` redirects `stderr` to `stdout`, so both end up in the same log (you might want to keep them separate, though).
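As a rough illustration of these tips, here is a minimal sketch of what `/path/to/cronscripts/every_minute.sh` might contain. The `PATH` value and the `log` helper are assumptions made for illustration, not something cron requires; set `PATH` to wherever your tools actually live.

```shell
#!/bin/bash
# /path/to/cronscripts/every_minute.sh: a minimal sketch of a cron-friendly script.
# Assumption: the PATH below suits a typical Linux system; adjust it for yours.
set -euo pipefail

# cron starts jobs with a minimal environment and does not read .bashrc/.zshrc,
# so set PATH explicitly instead of relying on your interactive shell's value.
export PATH=/usr/local/bin:/usr/bin:/bin

# Prefix each line with a timestamp so that runs appended to the same log
# file with >> can be told apart later.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
}

log "starting"
# ... the actual work goes here, using absolute paths only ...
log "finished"
```

When testing, you can roughly approximate cron's sparse environment with `env -i /bin/bash /path/to/cronscripts/every_minute.sh`; `env -i` starts with no variables at all, which is even stricter than cron.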
## Anacron
One caveat of using cron is that if the computer is powered off or asleep when a cron job should run, the job is not executed. For frequent tasks this might be fine, but if a task runs less often, you may want to ensure that it is executed. [anacron](https://linux.die.net/man/8/anacron) works similarly to `cron`, except that the frequency is specified in days. Unlike cron, it does not assume that the machine is running continuously, so it can be used on machines that aren't running 24 hours a day to run regular daily, weekly, and monthly jobs.
## Exercises
1. Make a script that looks every minute in your downloads folder for any file that is a picture (you can look into [MIME types](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types) or use a regular expression to match common extensions) and moves them into your Pictures folder.
1. Write a cron script that checks weekly for outdated packages on your system and either prompts you to update them or updates them automatically.
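For the first exercise, one possible starting point is sketched below. It matches files by extension rather than by MIME type, and both the function name `move_pictures` and the extension list are illustrative assumptions, not a complete solution.

```shell
# move_pictures SRC DEST: move image files (matched by extension) from SRC to DEST.
# Assumptions: the extension list is illustrative, not exhaustive, and files are
# matched by name rather than by MIME type.
move_pictures() {
    local src=$1 dest=$2
    mkdir -p "$dest"
    # -maxdepth 1: only look at files directly inside SRC.
    # -iname: case-insensitive match, so photo.JPG is caught too.
    # mv -n: never overwrite a file that already exists in DEST.
    find "$src" -maxdepth 1 -type f \
        \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' -o -iname '*.gif' \) \
        -exec mv -n {} "$dest"/ \;
}
```

To run it every minute, you could call `move_pictures "$HOME/Downloads" "$HOME/Pictures"` from a script and add a crontab entry such as `* * * * * /path/to/cronscripts/move_pictures.sh` (a hypothetical path).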
{% comment %}
- [fswatch](https://github.com/emcrisostomo/fswatch)
- GUI automation (pyautogui) [Automating the boring stuff Chapter 18](https://automatetheboringstuff.com/chapter18/)
- Ansible/puppet/chef
- https://xkcd.com/1205/
- https://xkcd.com/1319/
{% endcomment %}
================================================
FILE: _2019/backups.md
================================================
---
layout: lecture
title: "Backups"
presenter: Jose
video:
aspect: 56.25
id: lrpqYF8tcYQ
---
There are two types of people:
- Those who do backups
- Those who will do backups
Any data you own that you haven't backed up is data that could be gone at any moment, forever. Here we will cover some good backup basics and the pitfalls of some approaches.
## 3-2-1 Rule
The [3-2-1 rule](https://www.us-cert.gov/sites/default/files/publications/data_backup_options.pdf) is a generally recommended strategy for backing up your data. It states that you should have:
- at least **3 copies** of your data
- **2** copies in **different mediums**
- **1** of the copies being **offsite**
The main idea behind this recommendation is not to put all your eggs in one basket. Having 2 different devices/disks ensures that a single hardware failure doesn't take away all your data. Similarly, if you store your only backup at home and the house burns down or gets robbed, you lose everything; that's what the offsite copy is there for. Onsite backups give you availability and speed, while offsite backups give you resilience should a disaster happen.
## Testing your backups
A common pitfall when performing backups is blindly trusting whatever the system says it's doing and not verifying that the data can be properly recovered. Toy Story 2 was almost lost because its backups were not working; [luck](https://www.youtube.com/watch?v=8dhp_20j0Ys) ended up saving it.
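One simple way to act on this, sketched under the assumption of GNU coreutils and findutils (as on most Linux systems), is to record a checksum manifest when you make a backup, then periodically restore into a scratch directory and check every file against it. The function names are hypothetical, and the restore step itself depends on your backup tool.

```bash
# make_manifest DIR MANIFEST: record a SHA-256 checksum for every file under DIR.
# Paths are stored relative to DIR so the manifest can be checked against a restore.
make_manifest() {
    (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) > "$2"
}

# verify_restore RESTORED_DIR MANIFEST: check a restored copy against the manifest.
# Returns non-zero (and prints the offending paths) if any file is missing or differs.
verify_restore() {
    # The manifest is read on stdin so a relative MANIFEST path still works after cd.
    (cd "$1" && sha256sum --quiet -c -) < "$2"
}
```

This only detects files that are missing or altered; files present in the restored copy but absent from the manifest are not reported.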
## Versioning
You should understand that [RAID](https://en.wikipedia.org/wiki/RAID) is not a backup, and in general **mirroring is not a backup solution**. Simply syncing your files somewhere will not help in several scenarios, such as:
- Data corruption
- Malicious software
- Deleting files by mistake
If changes to your data propagate to the backup, then you won't be able to recover in these scenarios. Note that this is the case for a lot of cloud storage solutions like Dropbox, Google Drive, OneDrive, &c. Some of them keep deleted data around for a short time, but their recovery interfaces are usually not something you want to use to restore large numbers of files.
A proper backup system should be versioned in order to prevent this failure mode. By providing different snapshots in time, it lets you easily navigate them to restore whatever was lost. A widely known example of this kind of software is macOS Time Machine.
## Deduplication
However, making several full copies of your data can be extremely costly in terms of disk space. Fortunately, from one version to the next, most data will be identical and need not be transferred again. This is where [data deduplication](https://en.wikipedia.org/wiki/Data_deduplication) comes into play: by keeping track of what has already been stored, one can do **incremental backups**, where only the changes from one version to the next need to be stored. This significantly reduces the amount of space needed for backups beyond the first copy.
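To make the idea concrete, here is a toy sketch of whole-file deduplication using content hashes and hard links (hard links are also the subject of the last exercise). It is an illustration only, not how any particular backup tool works: each file's content is stored once under its SHA-256 digest, and every snapshot is a directory of hard links into that store, so unchanged files take no extra space. It assumes GNU coreutils and file names without newlines; `store_file` and `snapshot` are hypothetical names.

```bash
# store_file STORE FILE: copy FILE into STORE under its SHA-256 digest, unless
# identical content is already stored, then print the stored object's path.
store_file() {
    local store=$1 file=$2 digest
    # Hash via stdin so unusual characters in the file name cannot alter the output.
    digest=$(sha256sum < "$file" | cut -d' ' -f1)
    if [ ! -e "$store/$digest" ]; then
        cp "$file" "$store/$digest"
    fi
    printf '%s\n' "$store/$digest"
}

# snapshot SRC STORE SNAP: create SNAP as a tree of hard links into STORE
# that mirrors the files currently in SRC.
snapshot() {
    local src=$1 store=$2 snap=$3 rel obj
    mkdir -p "$store" "$snap"
    (cd "$src" && find . -type f) | while IFS= read -r rel; do
        obj=$(store_file "$store" "$src/$rel")
        mkdir -p "$snap/$(dirname "$rel")"
        ln -f "$obj" "$snap/$rel"   # a new name for the stored object, not a copy
    done
}
```

Because snapshot files share inodes with the store, editing one in place would silently change every snapshot that contains it, so a scheme like this only works if snapshots are treated as read-only. Real tools such as Borg deduplicate chunks of files rather than whole files.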
## Encryption
Since we might be backing up to untrusted third parties like cloud providers, it is worth considering that if your data is copied *as is*, it could potentially be read by unwanted parties. Documents like your tax returns are sensitive information that should not be backed up in plain text. To prevent this, many backup solutions offer **client-side encryption**, where data is encrypted before being sent to the server. That way the server cannot read the data it is storing, but you can decrypt it with your secret key.
As a side note, if your disk (or home partition) is not encrypted, then anyone who gets hold of your computer can bypass the user access controls and read your data. Modern hardware supports fast and efficient reads and writes of encrypted data, so you might want to consider enabling **full disk encryption**.
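As a sketch of client-side encryption with standard tools, the functions below archive a directory with `tar` and encrypt it symmetrically with GnuPG before it leaves your machine, so whatever you upload is already ciphertext. They assume GnuPG 2.1 or newer (for `--pinentry-mode loopback`) and a passphrase kept in a private file; the function names are hypothetical, and the dedicated backup tools listed under Resources handle encryption (plus deduplication) for you.

```bash
# encrypt_backup DIR OUT PASSFILE: archive DIR and encrypt the archive locally
# with the passphrase stored in PASSFILE, writing the ciphertext to OUT.
encrypt_backup() {
    tar -C "$1" -czf - . |
        gpg --batch --yes --pinentry-mode loopback --passphrase-file "$3" \
            --symmetric --cipher-algo AES256 --output "$2"
}

# decrypt_backup IN DEST PASSFILE: decrypt IN and unpack the archive into DEST.
decrypt_backup() {
    mkdir -p "$2"
    gpg --batch --pinentry-mode loopback --passphrase-file "$3" --decrypt "$1" |
        tar -C "$2" -xzf -
}
```

Alternatively, `gpg --encrypt --recipient KEYID` (with `KEYID` as a placeholder for your key) encrypts to a public key, so the machine producing backups never needs to hold the decryption secret.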
## Append only
The properties reviewed so far focus on hardware failures or user mistakes but fail to address what happens if a malicious agent wants to delete your data. Namely, say someone hacks into your system: are they able to wipe all the copies of the data you care about? If you worry about that scenario, then you need some sort of append-only backup solution. In general, this means having a server that will allow you to send new data but will refuse to delete existing data. Usually users have two keys: an append-only key that supports creating new backups, and a full-access key that also allows deleting old backups that are no longer needed. The latter is stored offline.
Note that this is a quite challenging scenario since you need the ability to make changes whilst still preventing a malicious user from deleting your data. Existing commercial solutions include [Tarsnap](https://www.tarsnap.com/) and [Borgbase](https://www.borgbase.com/).
## Additional considerations
Some other things you may want to look into are:
- **Periodic backups**: outdated backups can become pretty useless. Making backups regularly should be a consideration for your system.
- **Bootable backups**: some programs allow you to clone your entire disk. That way you have an image containing an entire copy of your system that you can boot directly from.
- **Differential backup strategies**: you may not care equally about all your data. You can define different backup policies for different types of data.
- **Append-only backups**: an additional consideration is to enforce append-only operations on your backup repositories in order to prevent malicious agents from deleting them if they get hold of your machine.
## Webservices
Not all the data that you use lives on your hard disk. If you use **webservices**, then some data you care about, such as Google Docs presentations or Spotify playlists, may be stored online. Another example that is easy to forget is email accounts with web access, such as Gmail. Figuring out a backup solution in these cases is somewhat trickier. However, many services allow you to download your data, either directly or via an API. Tools such as [gmvault](https://github.com/gaubert/gmvault) for Gmail are available to download the email files to your computer.
## Webpages
Similarly, some high-quality content can be found online in the form of webpages. If said content is static, one can easily back it up by just saving the website and all of its attachments. Another alternative is the [Wayback Machine](https://archive.org/web/), a massive digital archive of the World Wide Web managed by the [Internet Archive](https://archive.org/), a nonprofit organization focused on the preservation of all sorts of media. The Wayback Machine lets you capture and archive webpages and later retrieve all the snapshots that have been archived for a website. If you find it useful, consider [donating](https://archive.org/donate/) to the project.
## Resources
Some good backup programs and services we have used and can honestly recommend:
- [Tarsnap](https://www.tarsnap.com/) - deduplicated, encrypted online backup service for the truly paranoid.
- [Borg Backup](https://borgbackup.readthedocs.io) - deduplicated backup program that supports compression and authenticated encryption. If you need a cloud provider [rsync.net](https://www.rsync.net/products/borg.html) has special offerings for Borg users.
- [rsync](https://rsync.samba.org/) - a utility that provides fast incremental file transfer. It is not a full backup solution.
- [rclone](https://rclone.org/) - like rsync, but for cloud storage providers such as Amazon S3, Dropbox, Google Drive, rsync.net, &c. Supports client-side encryption of remote folders.
## Exercises
1. Consider how you are (not) backing up your data and look into fixing/improving that.
1. Figure out how to back up your email accounts.
1. Choose a webservice you use often (Spotify, Google Music, etc.) and figure out what your options for backing up its data are. Often people have already built tools based on the available APIs (such as [youtube-dl](https://ytdl-org.github.io/youtube-dl/)).
1. Think of a website you have visited repeatedly over the years and look it up on [archive.org](https://archive.org/web/). How many versions does it have?
1. One way to implement deduplication efficiently is to use hard links. Whereas a symbolic link (also called a soft link or symlink) is a separate file that stores the path of another file or folder, a hard link is an additional name for the same file: it refers to the same inode and therefore to the same data on disk. Thus, if the original file is removed, a symlink stops working whereas a hard link doesn't. However, hard links only work for files (not directories), and only within a single filesystem. Try using the command `ln` to create hard links and compare them to symlinks created with `ln -s`. (On macOS you may need to install GNU coreutils or the hln package.)
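If you want something to compare your results against, the following sketch runs the experiment from the last exercise in a throwaway directory created with `mktemp -d`; the file names are arbitrary.

```bash
# Work in a throwaway directory.
cd "$(mktemp -d)" || exit 1

echo "important data" > original.txt
ln original.txt hard.txt        # hard link: a second name for the same inode
ln -s original.txt soft.txt     # symlink: a separate file storing the path "original.txt"

ls -li    # original.txt and hard.txt show the same inode number and a link count of 2

rm original.txt
cat hard.txt                                      # still prints "important data"
cat soft.txt || echo "soft.txt is now dangling"   # the path it stores no longer exists
```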
================================================
FILE: _2019/command-line.md
================================================
---
layout: lecture
title: "Command-line environment"
presenter: Jose
video:
aspect: 62.5
id: i0rf1gpKL1E
---
## Aliases & Functions
As you can imagine, it can become tiresome to type long commands that involve many flags or verbose options. Fortunately, most shells support **aliasing**. For instance, an alias in bash has the following structure (note there is no space around the `=` sign):
```bash
alias alias_name="command_to_alias"
```
<!-- We can alias common flags for our commands like `alias ll=ls -ltAh`. Alias can be composed -->
Aliases have many convenient features:
```bash
# Aliases can summarize good default flags
alias ll="ls -lh"
# Save a lot of typing for common commands
alias gc="git commit"
# Aliases can override existing commands
alias mv="mv -i"
alias mkdir="mkdir -p"
# Aliases can be composed
alias la="ls -A"
alias lla="la -l"
# To ignore an alias, prefix the command with a backslash
\ls
# Or disable the alias entirely with unalias
unalias la
```
<!--
To get rid of an alias you can run `unalias alias_name` or to ignore alias when running a command you can prepend the command with a backward slash `\alias_name`. This is convenient when an alias is overwriting an existing name. -->
However, in many scenarios aliases can be limiting, especially when you want to chain together commands that take the same arguments, since an alias is a plain text substitution and cannot refer to its arguments. An alternative is **functions**, which are a midpoint between aliases and custom shell scripts.
Here is an example function that makes a directory and moves into it.
```bash
mcd () {
    # Quote "$1" so names containing spaces work, and only cd if mkdir succeeded
    mkdir -p "$1" && cd "$1"
}
```
Aliases and functions do not persist across shell sessions by default. To make them persistent, you need to include them in one of the shell startup files, like `.bashrc` or `.zshrc`. My suggestion is to write them in a separate `.alias` file and `source` that file from your different shell config files.
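For instance, a shared `~/.alias` file might look like the sketch below. `backup_file` is a hypothetical helper that shows what a function can do and an alias cannot, namely use its argument more than once. The line that loads the file is shown as a comment because it belongs in each shell's config file rather than in `~/.alias` itself.

```bash
# ~/.alias: aliases and functions shared by bash and zsh.
# Load it from ~/.bashrc and ~/.zshrc with:
#   [ -f ~/.alias ] && source ~/.alias

alias ll="ls -lh"
alias gc="git commit"

# backup_file FILE: copy FILE to FILE.bak, preserving its mode and timestamps.
# The argument is used twice, which is not possible with an alias.
backup_file() {
    cp -p -- "$1" "$1.bak"
}
```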
<!-- Lastly, if you decide to alias any of these tools with the "improved" version, e.g. `alias bat=cat` it is useful to know that you can tell bash to ignore aliases by doing `\cat` and ignore both aliases and functions by doing `command cat` -->
## Shells & Frameworks
In the shell and scripting lecture we covered the `bash` shell, since it is by far the most ubiquitous shell and many systems use it as the default. Nevertheless, it is not the only option.
For example, the `zsh` shell is largely compatible with `bash` and provides many convenient features out of the box, such as:
- Smarter globbing, `**`
- Inline globbing/wildcard expansion
- Spelling correction
- Better tab completion/selection
- Path expansion (`cd /u/lo/b` will expand as `/usr/local/bin`)
Moreover, many shells can be improved with **frameworks**: popular general-purpose ones like [prezto](https://github.com/sorin-ionescu/prezto) or [oh-my-zsh](https://github.com/robbyrussell/oh-my-zsh), and smaller ones that focus on specific features, such as [zsh-syntax-highlighting](https://github.com/zsh-users/zsh-syntax-highlighting) or [zsh-history-substring-search](https://github.com/zsh-users/zsh-history-substring-search). Other shells like [fish](https://fishshell.com/) include a lot of these user-friendly features by default. Some of these features include:
- Right prompt
- Command syntax highlighting
- History substring search
- Manpage-based flag completions
- Smarter autocompletion
- Prompt themes
One thing to note when using these frameworks is that if the code they run is not properly optimized, or there is simply too much of it, your shell can start slowing down. You can always profile it and disable the features that you do not use often or do not value over speed.
## Terminal Emulators & Multiplexers
Along with customizing your shell, it is worth spending some time figuring out your choice of **terminal emulator** and its settings. There are many, many terminal emulators out there (here is a [comparison](https://anarc.at/blog/2018-04-12-terminal-emulators-1/)).
Since you might be spending hundreds to thousands of hours in your terminal it pays off to look into its settings. Some of the aspects that you may want to modify in your terminal include:
- Font choice
- Color Scheme
- Keyboard shortcuts
- Tab/Pane support
- Scrollback configuration
- Performance (some newer terminals like [Alacritty](https://github.com/jwilm/alacritty) offer GPU acceleration)
It is also worth mentioning **terminal multiplexers** like [tmux](https://github.com/tmux/tmux). `tmux` lets you split multiple shell sessions into panes and tabs. It also supports attaching and detaching, which is a very common use case when you are working on a remote server and want to keep your shell running without having to worry about disowning your current processes (by default, when you log out, your processes are terminated). This way, with `tmux` you can jump into and out of complex terminal layouts. Similar to terminal emulators, `tmux` supports heavy customization by editing the `~/.tmux.conf` file.
## Command-line utilities
The command line utilities that most UNIX-based operating systems have by default are more than enough for most of the things you usually need to do.
In the next few subsections I will cover alternative tools for extremely common shell operations which are more convenient to use. Some of these tools add new improved functionality to the command whereas others just focus on providing a simpler, more intuitive interface with better defaults.
### `fasd` vs `cd`
Even with improved path expansion and tab autocomplete, changing directories can become quite repetitive. [Fasd](https://github.com/clvv/fasd) (or [autojump](https://github.com/wting/autojump)) solves this issue by keeping track of recent and frequent folders you have been to and performing fuzzy matching.
Thus, if I have visited the path `/home/user/awesome_project/code`, running `z code` will `cd` to it. If I have multiple folders called code, I can disambiguate by running `z awe code`, which will be a closer match. Unlike autojump, fasd also provides commands that, instead of performing `cd`, just expand frequent and/or recent files, folders, or both.
### `bat` vs `cat`
Even though `cat` does its job perfectly, [bat](https://github.com/sharkdp/bat) improves on it by providing syntax highlighting, paging, line numbers and git integration.
### `exa`/`ranger` vs `ls`
`ls` is a great command, but some of its defaults can be annoying, such as displaying sizes in raw bytes. [exa](https://github.com/ogham/exa) provides better defaults.
If you need to navigate many folders and/or preview many files, [ranger](https://github.com/ranger/ranger) can be much more efficient than `cd` and `cat` due to its wonderful interface. It is quite customizable, and with the right setup you can even [preview images](https://github.com/ranger/ranger/wiki/Image-Previews) in your terminal.
### `fd` vs `find`
[fd](https://github.com/sharkdp/fd) is a simple, fast and user-friendly alternative to `find`. Its defaults, such as matching on file names without needing the equivalent of `find`'s `-name` flag (which is what you want most of the time), make it easier to use on an everyday basis. It is also `git` aware and will skip files ignored by your `.gitignore` as well as the `.git` folder by default. It also has nice color coding by default.
### `rg/fzf` vs `grep`
`grep` is a great tool, but if you want to grep through many files at once, there are better tools for that purpose. [ack](https://github.com/beyondgrep/ack3), [ag](https://github.com/ggreer/the_silver_searcher) & [rg](https://github.com/BurntSushi/ripgrep) recursively search your current directory for a regex pattern while skipping version-control files (`ag` and `rg` also respect your gitignore rules). They all work pretty similarly, but I favor `rg` due to how fast it can search my entire home directory.
Similarly, it can be easy to find yourself running `CMD | grep PATTERN` over and over again. [fzf](https://github.com/junegunn/fzf) is a command line fuzzy finder that enables you to interactively filter the output of pretty much any command.
### `rsync` vs `cp/scp`
Whereas `cp` and `scp` are perfect for most scenarios, `rsync` is a huge improvement when copying or moving large numbers of files or large files, or when some of the data is already at the destination. `rsync` will skip files that have already been transferred, and with the `--partial` flag it can resume from a previously interrupted copy.
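A typical invocation is sketched below as a small wrapper function; the name `sync_dir` is hypothetical, while the flags are standard `rsync` options. The destination can also be remote, e.g. `user@host:backup/` (a placeholder host), in which case `rsync` runs over `ssh`.

```bash
# sync_dir SRC DEST: copy the contents of SRC into DEST, skipping files that
# are already up to date there.
#   -a          archive mode: recurse, preserve permissions, times and symlinks
#   --partial   keep partially transferred files so an interrupted copy can resume
#   --progress  show per-file progress
sync_dir() {
    # The trailing slash on SRC means "the contents of SRC", not SRC itself.
    rsync -a --partial --progress "$1/" "$2"
}
```

`rsync -P` is shorthand for `--partial --progress`, which is why you will often see `rsync -aP` in the wild.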
### `trash` vs `rm`
`rm` is a dangerous command in the sense that once you delete a file there is no turning back. However, modern operating systems do not behave like that when you delete something in the file explorer: they just move it to the Trash folder, which is only cleared later.
Since the trash is managed differently from OS to OS, there is no single CLI utility. On macOS there is [trash](https://hasseg.org/trash/) and on Linux there is [trash-cli](https://github.com/andreafrancia/trash-cli/), among others.
### `mosh` vs `ssh`
`ssh` is a very handy tool, but if you have a slow connection the lag can become annoying, and if the connection drops you have to reconnect. [mosh](https://mosh.org/) is a handy tool that allows roaming, supports intermittent connectivity, and provides intelligent local echo.
### `tldr` vs `man`
Most of the time, you can figure out what a command does and what options it has using `man` and the `-h`/`--help` flag. However, in some cases these can be a bit daunting to navigate when they are very detailed.
The [tldr](https://github.com/tldr-pages/tldr) command is a community driven documentation system that's available from the command line and gives a few simple illustrative examples of what the command does and the most common argument options.
### `aunpack` vs `tar/unzip/unrar`
As [this xkcd](https://xkcd.com/1168/) references, it can be quite tricky to remember the options for `tar` and sometimes you need a different tool altogether such as `unrar` for .rar files.
The [atool](https://www.nongnu.org/atool/) package provides the `aunpack` command which will figure out the correct options and always put the extracted archives in a new folder.
## Exercises
1. Run `cat .bash_history | sort | uniq -c | sort -rn | head -n 10` from your home directory (or `cat .zhistory | sort | uniq -c | sort -rn | head -n 10` for zsh) to get your 10 most used commands, and consider writing shorter aliases for them.
1. Choose a terminal emulator and figure out how to change the following properties:
- Font choice
- Color scheme. How many colors does a standard scheme have? Why?
- Scrollback history size
1. Install `fasd` or some similar software and write a bash/zsh function called `v` that performs fuzzy matching on the passed arguments and opens up the top result in your editor of choice. Then, modify it so that if there are multiple matches you can select them with `fzf`.
1. Since `fzf` is quite convenient for performing fuzzy searches, and the shell history lends itself to those kinds of searches, investigate how to bind `fzf` to `^R`. You can find some info [here](https://github.com/junegunn/fzf/wiki/Configuring-shell-key-bindings).
1. What does the `--bar` option do in `ack`?
================================================
FILE: _2019/course-overview.md
================================================
---
layout: lecture
title: "Course Overview"
presenter: Anish
video:
aspect: 56.25
id: qw2c6ffSVOM
---
# Motivation
This class is about [hacker](https://en.wikipedia.org/wiki/Hacker_culture)
tools, not [hacker](https://en.wikipedia.org/wiki/Security_hacker) tools.
MIT classes do not cover any of this content in detail. It's hugely beneficial
to be proficient with your tools: it'll save you a lot of time (and the payoff
time is very short).
We want to teach you about new tools, how to make the most of your tools, how
to customize your tools, and how to extend your tools.
# Class structure
We have 6 lectures covering a [variety of topics](/2019/). We have lecture
notes online, but there will be a lot of content covered in class (e.g. in the
form of demos) that may not be in the notes. We will be recording lectures.
Each class is split into two 50-minute lectures with a 10-minute break in
between. Lectures are mostly live demonstrations followed by hands-on
exercises. We might have a short amount of time at the end of each class to get
started on the exercises in an office-hours-style setting.
To make the most of the class, you should go through all the exercises on your
own. We'll inspire you to learn more about your tools, and we'll show you
what's possible and cover some of the basics in detail, but we can't teach you
everything in the time we have.
================================================
FILE: _2019/data-wrangling.md
================================================
---
layout: lecture
title: "Data Wrangling"
presenter: Jon
video:
aspect: 56.25
id: VW2jn9Okjhw
---
Have you ever had a bunch of text and wanted to do something with it?
Good. That's what data wrangling is all about!
Specifically, adapting data from one format to another, until you end up
with exactly what you wanted.
We've already seen basic data wrangling: `journalctl | grep -i intel`.
- find all system log entries that mention Intel (case insensitive)
- really, most of data wrangling is about knowing what tools you have,
and how to combine them.
Let's start from the beginning: we need a data source, and something to
do with it. Logs often make for a good use-case, because you often want
to investigate things about them, and reading the whole thing isn't
feasible. Let's figure out who's trying to log into my server by looking
at my server's log:
```bash
ssh myserver journalctl
```
That's far too much stuff. Let's limit it to ssh stuff:
```bash
ssh myserver journalctl | grep sshd
```
Notice that we're using a pipe to stream a _remote_ file through `grep`
on our local computer! `ssh` is magical. This is still way more stuff
than we wanted though. And pretty hard to read. Let's do better:
```bash
ssh myserver journalctl | grep sshd | grep "Disconnected from"
```
There's still a lot of noise here. There are _a lot_ of ways to get rid
of that, but let's look at one of the most powerful tools in your
toolkit: `sed`.
`sed` is a "stream editor" that builds on top of the old `ed` editor. In
it, you basically give short commands for how to modify the file, rather
than manipulate its contents directly (although you can do that too).
There are tons of commands, but one of the most common ones is `s`:
substitution. For example, we can write:
```bash
ssh myserver journalctl \
 | grep sshd \
 | grep "Disconnected from" \
 | sed 's/.*Disconnected from //'
```
What we just wrote was a simple _regular expression_: a powerful
construct that lets you match text against patterns. The `s` command is
written in the form `s/REGEX/SUBSTITUTION/`, where `REGEX` is the
regular expression you want to search for, and `SUBSTITUTION` is the
text you want to substitute matching text with.
## Regular expressions
Regular expressions are common and useful enough that it's worthwhile to
take some time to understand how they work. Let's start by looking at
the one we used above: `/.*Disconnected from /`. Regular expressions are
usually (though not always) surrounded by `/`. Most ASCII characters
just carry their normal meaning, but some characters have "special"
matching behavior. Exactly which characters do what varies somewhat
between different implementations of regular expressions, which is a
source of great frustration. Very common patterns are:
- `.` means "any single character" except newline
- `*` zero or more of the preceding match
- `+` one or more of the preceding match
- `[abc]` any one of the characters `a`, `b`, or `c`
- `(RX1|RX2)` either something that matches `RX1` or `RX2`
- `^` the start of the line
- `$` the end of the line
`sed`'s default (basic) regular expressions are somewhat weird, and require you to
put a `\` before some of these (in GNU `sed`: `+`, `(`, and `|`) to give them their
special meaning. Alternatively, you can pass `-E` to use extended regular expressions.
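A small comparison on a made-up string, assuming GNU `sed` (BSD `sed` on macOS should behave the same for these particular commands): without `-E`, `+` is an ordinary character and groups need backslashes, as in `\(ab\)`.

```bash
# Made-up input, just to compare the two regex flavours.
input='aaa ababab'

# Basic regex (default): '+' is a literal plus sign, so nothing matches.
echo "$input" | sed 's/a+/X/'          # prints: aaa ababab

# Extended regex (-E): '+' means "one or more", so the leading "aaa" is replaced.
echo "$input" | sed -E 's/a+/X/'       # prints: X ababab

# Grouping: '\(...\)' in basic syntax, plain '(...)' with -E. Both print: aaa Y
echo "$input" | sed 's/ \(ab\)*$/ Y/'
echo "$input" | sed -E 's/ (ab)*$/ Y/'
```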
So, looking back at `/.*Disconnected from /`, we see that it matches
any text that starts with any number of characters, followed by the
literal string "Disconnected from ". Which is what we wanted. But
beware, regular expressions are tricky. What if someone tried to log in
with the username "Disconnected from"? We'd have:
```
Jan 17 03:13:00 thesquareplanet.com sshd[2631]: Disconnected from invalid user Disconnected from 46.97.239.16 port 55920 [preauth]
```
What would we end up with? Well, `*` and `+` are, by default, "greedy".
They will match as much text as they can. So, in the above, we'd end up
with just
```
46.97.239.16 port 55920 [preauth]
```
Which may not be what we wanted. In some regular expression
implementations, you can just suffix `*` or `+` with a `?` to make them
non-greedy, but sadly `sed` doesn't support that. We _could_ switch to
perl's command-line mode though, which _does_ support that construct:
```bash
perl -pe 's/.*?Disconnected from //'
```
We'll stick to `sed` for the rest of this though, because it's by far
the more common tool for these kinds of jobs. `sed` can also do other
handy things like print lines following a given match, do multiple
substitutions per invocation, search for things, etc. But we won't cover
that too much here. `sed` is basically an entire topic in and of itself,
but there are often better tools.
Okay, so we also have a suffix we'd like to get rid of. How might we do
that? It's a little tricky to match just the text that follows the
username, especially if the username can have spaces and such! What we
need to do is match the _whole_ line:
```bash
| sed -E 's/.*Disconnected from (invalid |authenticating )?user .* [^ ]+ port [0-9]+( \[preauth\])?$//'
```
Let's look at what's going on with a [regex
debugger](https://regex101.com/r/qqbZqh/2). Okay, so the start is still
as before. Then, we're matching any of the "user" variants (there are
two prefixes in the logs). Then we're matching on any string of
characters where the username is. Then we're matching on any single word
(`[^ ]+`; any non-empty sequence of non-space characters). Then the word
"port" followed by a sequence of digits. Then possibly the suffix
` [preauth]`, and then the end of the line.
Notice that with this technique, a username of "Disconnected from"
won't confuse us any more. Can you see why?
There is one problem with this though, and that is that the entire log
becomes empty. We want to _keep_ the username after all. For this, we
can use "capture groups". Any text matched by a regex surrounded by
parentheses is stored in a numbered capture group. These are available
in the substitution (and in some engines, even in the pattern itself!)
as `\1`, `\2`, `\3`, etc. So:
```bash
| sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \[preauth\])?$/\2/'
```
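One way to try the capture group without a server is to wrap the `sed` call in a shell function and feed it hand-written lines. The function name `extract_user` and both sample lines are made up for illustration; real `sshd` messages may differ.

```bash
# Hypothetical helper: print the username from sshd "Disconnected from" lines.
extract_user() {
  sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \[preauth\])?$/\2/'
}

# Hand-written sample lines (not real log output).
printf '%s\n' \
  'Jan 17 03:13:00 thesquareplanet.com sshd[2631]: Disconnected from invalid user Disconnected from 46.97.239.16 port 55920 [preauth]' \
  'Jan 17 03:14:10 thesquareplanet.com sshd[2702]: Disconnected from authenticating user root 10.0.0.5 port 40022 [preauth]' \
  | extract_user
# Disconnected from
# root
```

The first line shows why matching the whole line helps: the odd username comes out intact instead of confusing the pattern.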
As you can probably imagine, you can come up with _really_ complicated
regular expressions. For example, here's an article on how you might
match an [e-mail
address](https://www.regular-expressions.info/email.html). It's [not
easy](https://web.archive.org/web/20221223174323/http://emailregex.com/). And there's [lots of
discussion](https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression/1917982).
And people have [written
tests](https://fightingforalostcause.net/content/misc/2006/compare-email-regex.php).
And [test matrices](https://mathiasbynens.be/demo/url-regex). You can
even write a regex for determining if a given number [is a prime
number](https://www.noulakaz.net/2007/03/18/a-regular-expression-to-check-for-prime-numbers/).
Regular expressions are notoriously hard to get right, but they are also
very handy to have in your toolbox!
## Back to data wrangling
Okay, so we now have
```bash
ssh myserver journalctl \
 | grep sshd \
 | grep "Disconnected from" \
 | sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \[preauth\])?$/\2/'
```
We could do it just with `sed`, but why would we? For fun is why.
```bash
ssh myserver journalctl \
 | sed -E \
   -e '/Disconnected from/!d' \
   -e 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \[preauth\])?$/\2/'
```
This shows off some of `sed`'s capabilities. `sed` can also inject text
(with the `i` command), explicitly print lines (with the `p` command),
select lines by index, and lots of other things. Check `man sed`!
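A few of those capabilities, tried on made-up input. `-n` turns off `sed`'s automatic printing, so only lines you print explicitly with `p` appear. This assumes GNU `sed`; in particular, the one-line form of `i` used here is a GNU extension.

```bash
# Made-up input lines.
sample=$'alpha\nDisconnected from bob\ngamma\nDisconnected from eve'

# Print only matching lines (roughly what grep does).
echo "$sample" | sed -n '/Disconnected from/p'

# Select lines by index: print only lines 2 through 3.
echo "$sample" | sed -n '2,3p'

# Insert a line of text before line 1 (GNU one-line 'i' syntax).
echo "$sample" | sed '1i HEADER'
```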
Anyway. What we have now gives us a list of all the usernames that have
attempted to log in. But this is pretty unhelpful. Let's look for common
ones:
```bash
ssh myserver journalctl \
 | grep sshd \
 | grep "Disconnected from" \
 | sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \[preauth\])?$/\2/' \
 | sort | uniq -c
```
`sort` will, well, sort its input. `uniq -c` will collapse consecutive
lines that are the same into a single line, prefixed with a count of the
number of occurrences. We probably want to sort that too and only keep
the most common logins:
```bash
ssh myserver journalctl \
 | grep sshd \
 | grep "Disconnected from" \
 | sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \[preauth\])?$/\2/' \
 | sort | uniq -c \
 | sort -nk1,1 | tail -n10
```
`sort -n` will sort in numeric (instead of lexicographic) order. `-k1,1`
means "sort by only the first whitespace-separated column". The `,n`
part says "sort until the `n`th field", where the default is the end of
the line. In this _particular_ example, sorting by the whole line
wouldn't matter, but we're here to learn!
If we wanted the _least_ common ones, we could use `head` instead of
`tail`. There's also `sort -r`, which sorts in reverse order.
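The counting steps can be tried on a made-up list of usernames. Note that `uniq -c` only merges _adjacent_ duplicates, which is why the first `sort` is needed.

```bash
# Made-up usernames, one per line, in no particular order.
names=$'root\nadmin\nroot\ntest\nadmin\nroot'

# Without sorting first, no two equal lines are adjacent, so every count is 1.
echo "$names" | uniq -c

# Sort, count, order by the count column (ascending), keep the top 2:
# admin (count 2) followed by root (count 3).
echo "$names" | sort | uniq -c | sort -nk1,1 | tail -n2
```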
Okay, so that's pretty cool, but we'd sort of like to only give the
usernames, and maybe not one per line?
```bash
ssh myserver journalctl \
 | grep sshd \
 | grep "Disconnected from" \
 | sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \[preauth\])?$/\2/' \
 | sort | uniq -c \
 | sort -nk1,1 | tail -n10 \
 | awk '{print $2}' | paste -sd,
```
Let's start with `paste`: it lets you combine lines (`-s`) by a given
single-character delimiter (`-d`). But what's this `awk` business?
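Here is `paste` on its own, joining a few made-up lines. Some BSD versions of `paste` (e.g. on older macOS) may need an explicit `-` argument to read standard input, as in `paste -sd, -`.

```bash
# -s joins all input lines into one; -d, puts a comma between them.
printf '%s\n' root admin test | paste -sd,
# root,admin,test
```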
## awk -- another editor
`awk` is a programming language that just happens to be really good at
processing text streams. There is _a lot_ to say about `awk` if you were
to learn it properly, but as with many other things here, we'll just go
through the basics.
First, what does `{print $2}` do? Well, `awk` programs take the form of
an optional pattern plus a block saying what to do if the pattern
matches a given line. The default pattern (which we used above) matches
all lines. Inside the block, `$0` is set to the entire line's contents,
and `$1` through `$n` are set to the `n`th _field_ of that line, when
separated by the `awk` field separator (whitespace by default, change
with `-F`). In this case, we're saying that, for every line, print the
contents of the second field, which happens to be the username!
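A quick way to see fields in action, on made-up input. The second command changes the separator to `:` with `-F`, which is handy for colon-separated files such as `/etc/passwd`; `$NF` is the last field.

```bash
# Default: fields are separated by runs of whitespace.
echo '  3   root   extra' | awk '{print $2}'
# root

# -F sets a different separator; here, ':'.
echo 'root:x:0:0:root:/root:/bin/bash' | awk -F: '{print $1, $NF}'
# root /bin/bash
```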
Let's see if we can do something fancier. Let's compute the number of
single-use usernames that start with `c` and end with `e`:
```bash
| awk '$1 == 1 && $2 ~ /^c[^ ]*e$/ { print $2 }' | wc -l
```
There's a lot to unpack here. First, notice that we now have a pattern
(the stuff that goes before `{...}`). The pattern says that the first
field of the line should be equal to 1 (that's the count from `uniq
-c`), and that the second field should match the given regular
expression. And the block just says to print the username. We then count
the number of lines in the output with `wc -l`.
However, `awk` is a programming language, remember?
```awk
BEGIN { rows = 0 }
$1 == 1 && $2 ~ /^c[^ ]*e$/ { rows += $1 }
END { print rows }
```
`BEGIN` is a pattern that matches the start of the input (and `END`
matches the end). Now, the per-line block just adds the count from the
first field (although it'll always be 1 in this case), and then we print
it out at the end. In fact, we _could_ get rid of `grep` and `sed`
entirely, because `awk` [can do it
all](https://backreference.org/2010/02/10/idiomatic-awk/), but we'll
leave that as an exercise to the reader.
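To run that program, you can pass it to `awk` as a single-quoted argument (or save it to a file and use `awk -f`). Here it is fed some made-up input in the `count username` shape that `uniq -c` produces:

```bash
# Made-up "count username" lines, as produced by `sort | uniq -c`.
counts=$'      1 charlie\n      2 clare\n      1 cache\n      1 bob'

echo "$counts" | awk '
  BEGIN { rows = 0 }
  $1 == 1 && $2 ~ /^c[^ ]*e$/ { rows += $1 }
  END { print rows }
'
# 2  (charlie and cache; clare is skipped because its count is 2)
```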
## Analyzing data
You can do math! Here, `data` stands for any command that prints one number per line:
```bash
| paste -sd+ | bc -l
```
```bash
echo "2*($(data | paste -sd+))" | bc -l
```
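A concrete version, with a hypothetical `data` function standing in for whatever command produces your numbers (this assumes `bc` is installed):

```bash
# Hypothetical stand-in for a command that prints one number per line.
data() { seq 1 5; }

# Join the numbers with '+' to build an expression: 1+2+3+4+5
data | paste -sd+

# Let bc evaluate it.
data | paste -sd+ | bc -l
# 15

echo "2*($(data | paste -sd+))" | bc -l
# 30
```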
You can get stats in a variety of ways.
[`st`](https://github.com/nferraz/st) is pretty neat, but if you already
have R:
```bash
ssh myserver journalctl \
 | grep sshd \
 | grep "Disconnected from" \
 | sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \[preauth\])?$/\2/' \
 | sort | uniq -c \
 | awk '{print $1}' | R --slave -e 'x <- scan(file="stdin", quiet=TRUE); summary(x)'
```
R is another (weird) programming language that's great at data analysis
and [plotting](https://ggplot2.tidyverse.org/). We won't go into too
much detail, but suffice to say that `summary` prints summary statistics
about a vector, and `scan` reads a vector of numbers from the input
stream, so R gives us the statistics we wanted!
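If you have neither R nor `st`, a rough alternative is a few lines of `awk` that track the count, minimum, maximum, and mean of the first column. This is a sketch (the function name `stats` is made up), not a replacement for R's `summary`: it computes no median or quartiles.

```bash
# Hypothetical helper: count, min, max and mean of the first column of numbers.
stats() {
  awk '
    NR == 1 { min = $1; max = $1 }
    { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
    END { if (NR > 0) printf "n=%d min=%s max=%s mean=%.2f\n", NR, min, max, sum / NR }
  '
}

printf '%s\n' 4 1 7 3 | stats
# n=4 min=1 max=7 mean=3.75
```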
If you just want some simple plotting, `gnuplot` is your friend:
```bash
ssh myserver journalctl \
 | grep sshd \
 | grep "Disconnected from" \
 | sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \[preauth\])?$/\2/' \
 | sort | uniq -c \
 | sort -nk1,1 | tail -n10 \
 | gnuplot -p -e 'set boxwidth 0.5; plot "-" using 1:xtic(2) with boxes'
```
## Data wrangling to make arguments
Sometimes you want to do data wrangling to find things to install or
remove based on some longer list. The data wrangling we've talked about
so far + `xargs` can be a powerful combo:
```bash
rustup toolchain list | grep nightly | grep -vE "nightly-x86|01-17" | sed 's/-x86.*//' | xargs rustup toolchain uninstall
```
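`xargs` reads whitespace-separated words from standard input and appends them as arguments to the given command. Before running something destructive like the `uninstall` above, it can help to put `echo` in front so you only see what would be run. The toolchain names below are made up:

```bash
# Made-up toolchain names, one per line.
printf '%s\n' nightly-2019-01-01 nightly-2019-02-01 \
  | xargs echo rustup toolchain uninstall
# rustup toolchain uninstall nightly-2019-01-01 nightly-2019-02-01

# -n1 runs the command once per input word instead of once for all of them.
printf '%s\n' nightly-2019-01-01 nightly-2019-02-01 \
  | xargs -n1 echo rustup toolchain uninstall
```

Because `xargs` splits on whitespace, inputs that may contain spaces need extra care (for example `find -print0 | xargs -0`).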
# Exercises
1. If you are not familiar with Regular Expressions
[here](https://regexone.com/) is a short interactive tutorial that
covers most of the basics
1. How is `sed s/REGEX/SUBSTITUTION/g` different from the regular sed?
What about `/I` or `/m`?
1. To do in-place substitution it is quite tempting to do something like
`sed s/REGEX/SUBSTITUTION/ input.txt > input.txt`. However this is a
bad idea, why? Is this particular to `sed`?
1. Implement a simple grep equivalent tool in a language you are familiar with using regex. If you want the output to be color highlighted like grep is, search for ANSI color escape sequences.
1. Sometimes operations like renaming files can be tricky with raw commands like `mv`. `rename` is a nifty tool to achieve this and has a sed-like syntax. Try creating a bunch of files with spaces in their names and use `rename` to replace them with underscores.
1. Look for boot messages that are _not_ shared between your past three
reboots (see `journalctl`'s `-b` flag). You may want to just mash all
the boot logs together in a single file, as that may make things
easier.
1. Produce some statistics of your system boot time over the last ten
boots using the log timestamp of the messages
```
Logs begin at ...
```
and
```
systemd[577]: Startup finished in ...
```
1. Find the number of words (in `/usr/share/dict/words`) that contain at
least three `a`s and don't have a `'s` ending. What are the three
most common last two letters of those words? `sed`'s `y` command, or
the `tr` program, may help you with case insensitivity. How many
of those two-letter combinations are there? And for a challenge:
which combinations do not occur?
1. Find an online data set like [this
one](https://commons.wikimedia.org/wiki/Data:Wikipedia_statistics/data.tab) or [this
one](https://ucr.fbi.gov/crime-in-the-u.s/2016/crime-in-the-u.s.-2016/topic-pages/tables/table-1).
Maybe another one [from
here](https://www.springboard.com/blog/data-science/free-public-data-sets-data-science-project/).
Fetch it using `curl` and extract out just two columns of numerical
data. If you're fetching HTML data,
[`pup`](https://github.com/EricChiang/pup) might be helpful. For JSON
data, try [`jq`](https://stedolan.github.io/jq/). Find the min and
max of one column in a single command, and the sum of the difference
between the two columns in another.
================================================
FILE: _2019/dotfiles.md
================================================
---
layout: lecture
title: "Dotfiles"
presenter: Anish
video:
aspect: 62.5
id: YSZBWWJw3mI
---
Many programs are configured using plain-text files known as "dotfiles"
(because the file names begin with a `.`, e.g. `~/.gitconfig`, so that they are
hidden from directory listings by `ls` by default).
A lot of the tools you use probably have a lot of settings that can be tuned
pretty finely. Often, tools are customized with specialized languages,
e.g. Vimscript for Vim or the shell's own language for a shell.
Customizing and adapting your tools to your preferred workflow will make you
more productive. We advise you to invest time in customizing your tools yourself
rather than cloning someone else's dotfiles from GitHub.
You probably have some dotfiles set up already. Some places to look:
- `~/.bashrc`
- `~/.emacs`
- `~/.vim`
- `~/.gitconfig`
Some programs don't put their files directly in your home folder and instead put them in a folder under `~/.config`.
Dotfiles are not exclusive to command-line applications; for instance, the [MPV](https://mpv.io/) video player can be configured by editing files under `~/.config/mpv`.
# Learning to customize tools
You can learn about your tool's settings by reading online documentation or
[man pages](https://en.wikipedia.org/wiki/Man_page). Another great way is to
search the internet for blog posts about specific programs, where authors will
tell you about their preferred customizations. Yet another way to learn about
customizations is to look through other people's dotfiles: you can find tons of
[dotfiles
repositories](https://github.com/search?o=desc&q=dotfiles&s=stars&type=Repositories)
on GitHub --- see the most popular one
[here](https://github.com/mathiasbynens/dotfiles) (we advise you not to blindly
copy configurations though).
# Organization
How should you organize your dotfiles? They should be in their own folder,
under version control, and **symlinked** into place using a script. This has
the benefits of:
- **Easy installation**: if you log in to a new machine, applying your
customizations will only take a minute
- **Portability**: your tools will work the same way everywhere
- **Synchronization**: you can update your dotfiles anywhere and keep them all
in sync
- **Change tracking**: you're probably going to be maintaining your dotfiles
for your entire programming career, and version history is nice to have for
long-lived projects
```shell
cd ~/src
mkdir dotfiles
cd dotfiles
git init
touch bashrc
# create a bashrc with some settings, e.g.:
# PS1='\w > '
touch install
chmod +x install
# insert the following into the install script:
# #!/usr/bin/env bash
# BASEDIR=$(dirname "$0")
# cd "$BASEDIR"
#
# ln -s "${PWD}/bashrc" ~/.bashrc
git add bashrc install
git commit -m 'Initial commit'
```
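As the repository grows, listing every `ln -s` by hand gets tedious. Below is a minimal sketch of a loop-based installer, assuming (hypothetically) that each file in the repository is stored without its leading dot and should be linked as `~/.<name>`. The function name `install_dotfiles` is made up; existing regular files are left alone rather than overwritten.

```bash
#!/usr/bin/env bash
# Hypothetical installer: link every regular file in $1 (except the `install`
# script itself) to $2/.<name>. Existing non-symlink files are skipped.
install_dotfiles() {
  local src="$1" dest="$2" file name target
  for file in "$src"/*; do
    [ -f "$file" ] || continue            # skip directories and empty globs
    name=$(basename "$file")
    [ "$name" = install ] && continue     # don't link the installer itself
    target="$dest/.$name"
    if [ -e "$target" ] && [ ! -L "$target" ]; then
      echo "skipping $target: already exists" >&2
      continue
    fi
    # -f replaces an old symlink; -n avoids descending into a linked directory.
    ln -sfn "$file" "$target"
  done
}
```

In the `install` script you might then call `install_dotfiles "$(cd "$(dirname "$0")" && pwd)" "$HOME"`; passing an absolute source path keeps the symlinks valid no matter where you run the script from.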
# Advanced topics
## Machine-specific customizations
Most of the time, you'll want the same configuration across machines, but
sometimes, you'll want a small delta on a particular machine. Here are a couple of
ways you can handle this situation:
### Branch per machine
Use version control to maintain a branch per machine. This approach is
logically straightforward but can be pretty heavyweight.
### If statements
If the configuration file supports it, use the equivalent of if-statements to
apply machine-specific customizations. For example, your shell could have something
like:
```shell
# GNU ls (common on Linux) enables colors with --color
if [[ "$(uname)" == "Linux" ]]; then alias ls='ls --color=auto'; fi
# Darwin is the kernel name that `uname` reports on macOS
if [[ "$(uname)" == "Darwin" ]]; then alias ls='ls -G'; fi
# You can also make it machine-specific
if [[ "$(hostname)" == "myServer" ]]; then PS1='[myServer] \w > '; fi
```
### Includes
If the configuration file supports it, make use of includes. For example,
a `~/.gitconfig` can have a setting:
```
[include]
path = ~/.gitconfig_local
```
And then on each machine, `~/.gitconfig_local` can contain machine-specific
settings. You could even track these in a separate repository for
machine-specific settings.
This idea is also useful if you want different programs to share some configurations. For instance, if you want both `bash` and `zsh` to share the same set of aliases, you can write them in `~/.aliases` and have the following block in both `~/.bashrc` and `~/.zshrc`:
```bash
# Test if ~/.aliases exists and source it
if [ -f ~/.aliases ]; then
source ~/.aliases
fi
```
# Resources
- Your instructors' dotfiles:
[Anish](https://github.com/anishathalye/dotfiles),
[Jon](https://github.com/jonhoo/configs),
[Jose](https://github.com/jjgo/dotfiles)
- [GitHub does dotfiles](http://dotfiles.github.io/): dotfile frameworks,
utilities, examples, and tutorials
- [Shell startup
scripts](https://blog.flowblok.id.au/2013-02/shell-startup-scripts.html): an
explanation of the different configuration files used for your shell
# Exercises
1. Create a folder for your dotfiles and set up [version
control](/2019/version-control/).
1. Add a configuration for at least one program, e.g. your shell, with some
customization (to start off, it can be something as simple as customizing
your shell prompt by setting `$PS1`).
1. Set up a method to install your dotfiles quickly (and without manual effort)
on a new machine. This can be as simple as a shell script that calls `ln -s`
for each file, or you could use a [specialized
utility](http://dotfiles.github.io/utilities/).
1. Test your installation script on a fresh virtual machine.
1. Migrate all of your current tool configurations to your dotfiles repository.
1. Publish your dotfiles on GitHub.
================================================
FILE: _2019/editors.md
================================================
---
layout: lecture
title: "Editors"
presenter: Anish
video:
aspect: 62.5
id: 1vLcusYSrI4
---
# Importance of Editors
As programmers, we spend most of our time editing plain-text files. It's worth
investing time learning an editor that fits your needs.
How do you learn a new editor? You force yourself to use that editor for a
while, even if it temporarily hampers your productivity. It'll pay off soon
enough (two weeks is enough to learn the basics).
We are going to teach you Vim, but we encourage you to experiment with other
editors. It's a very personal choice, and people have [strong
opinions](https://en.wikipedia.org/wiki/Editor_war).
We can't teach you how to use a powerful editor in 50 minutes, so we're going
to focus on teaching you the basics, showing you some of the more advanced
functionality, and giving you the resources to master the tool. We'll teach you
lessons in the context of Vim, but most ideas will translate to any other
powerful editor you use (and if they don't, then you probably shouldn't use
that editor!).

<!-- source: https://blogs.msdn.microsoft.com/steverowe/2004/11/17/code-editor-learning-curves/ -->
The editor learning curves graph is a myth. Learning the basics of a powerful
editor is quite easy (even though it might take years to master).
Which editors are popular today? See this [Stack Overflow
survey](https://insights.stackoverflow.com/survey/2018/#development-environments-and-tools)
(there may be some bias because Stack Overflow users may not be representative
of programmers as a whole).
## Command-line Editors
Even if you eventually settle on using a GUI editor, it's worth learning a
command-line editor for easily editing files on remote machines.
# Nano
Nano is a simple command-line editor.
- Move with arrow keys
- All other shortcuts (save, exit) shown at the bottom
# Vim
Vi/Vim is a powerful text editor. It's a command-line program that's usually
installed everywhere, which makes it convenient for editing files on a remote
machine.
Vim also has graphical versions, such as GVim and
[MacVim](https://macvim-dev.github.io/macvim/). These provide additional
features such as 24-bit color, menus, and popups.
## Philosophy of Vim
- When programming, you spend most of your time reading/editing, not writing
- Vim is a **modal** editor: different modes for inserting text vs manipulating text
- Vim is programmable (with Vimscript and also other languages like Python)
- Vim's interface itself is like a programming language
- Keystrokes (with mnemonic names) are commands
- Commands are composable
- Don't use the mouse: too slow
- Editor should work at the speed you think
## Introductory Vim
### Modes
Vim shows the current mode in the bottom left.
- Normal mode: for moving around a file and making edits
- Spend most of your time here
- Insert mode: for inserting text
- Visual (visual, line, or block) mode: for selecting blocks of text
You change modes by pressing `<ESC>` to switch from any mode back to normal
mode. From normal mode, enter insert mode with `i`, visual mode with `v`,
visual line mode with `V`, and visual block mode with `<C-v>`.
You use the `<ESC>` key a lot when using Vim: consider remapping Caps Lock to
Escape.
### Basics
Vim ex commands are issued through `:{command}` in normal mode.
- `:q` quit (close window)
- `:w` save
- `:wq` save and quit
- `:e {name of file}` open file for editing
- `:ls` show open buffers
- `:help {topic}` open help
- `:help :w` opens help for the `:w` ex command
- `:help w` opens help for the `w` movement
### Movement
Vim is all about efficient movement. Navigate the file in Normal mode.
- Disable arrow keys to avoid bad habits
```vim
nnoremap <Left> :echoe "Use h"<CR>
nnoremap <Right> :echoe "Use l"<CR>
nnoremap <Up> :echoe "Use k"<CR>
nnoremap <Down> :echoe "Use j"<CR>
```
- Basic movement: `hjkl` (left, down, up, right)
- Words: `w` (next word), `b` (beginning of word), `e` (end of word)
- Lines: `0` (beginning of line), `^` (first non-blank character), `$` (end of line)
- Screen: `H` (top of screen), `M` (middle of screen), `L` (bottom of screen)
- File: `gg` (beginning of file), `G` (end of file)
- Line numbers: `:{number}<CR>` or `{number}G` (line {number})
- Misc: `%` (corresponding item)
- Find: `f{character}`, `t{character}`, `F{character}`, `T{character}`
- find/to forward/backward {character} on the current line
- Repeating N times: `{number}{movement}`, e.g. `10j` moves down 10 lines
- Search: `/{regex}`, `n` / `N` for navigating matches
### Selection
Visual modes:
- Visual
- Visual Line
- Visual Block
You can use movement keys to make a selection.
### Manipulating text
Everything that you used to do with the mouse, you now do with keyboards (and
powerful, composable commands).
- `i` enter insert mode
- but for manipulating/deleting text, want to use something more than
backspace
- `o` / `O` insert line below / above
- `d{motion}` delete {motion}
- e.g. `dw` is delete word, `d$` is delete to end of line, `d0` is delete
to beginning of line
- `c{motion}` change {motion}
- e.g. `cw` is change word
- like `d{motion}` followed by `i`
- `x` delete character (equal to `dl`)
- `s` substitute character (equal to `xi`)
- visual mode + manipulation
- select text, `d` to delete it or `c` to change it
- `u` to undo, `<C-r>` to redo
- Lots more to learn: e.g. `~` flips the case of a character
### Resources
- `vimtutor` command-line program to teach you vim
- [Vim Adventures](https://vim-adventures.com/) game to learn Vim
## Customizing Vim
Vim is customized through a plain-text configuration file in `~/.vimrc`
(containing Vimscript commands). There are probably lots of basic settings that
you want to turn on.
Look at people's dotfiles on GitHub for inspiration, but try not to
copy-and-paste people's full configuration. Read it, understand it, and take
what you need.
Some customizations to consider:
- Syntax highlighting: `syntax on`
- Color schemes
- Line numbers: `set nu` / `set rnu`
- Backspacing through everything: `set backspace=indent,eol,start`
## Advanced Vim
Here are a few examples to show you the power of the editor. We can't teach you
all of these kinds of things, but you'll learn them as you go. A good
heuristic: whenever you're using your editor and you think "there must be a
better way of doing this", there probably is: look it up online.
### Search and replace
`:s` (substitute) command ([documentation](http://vim.wikia.com/wiki/Search_and_replace)).
- `:%s/foo/bar/g`
- replace foo with bar globally in file
- `:%s/\[.*\](\(.*\))/\1/g`
- replace named Markdown links with plain URLs
### Multiple windows
- `:sp` / `:vsp` to split windows
- Can have multiple views of the same buffer.
### Mouse support
- `set mouse+=a`
    - can click, scroll, and select
### Macros
- `q{character}` to start recording a macro in register `{character}`
- `q` to stop recording
- `@{character}` replays the macro
- Macro execution stops on error
- `{number}@{character}` executes a macro {number} times
- Macros can be recursive
- first clear the macro with `q{character}q`
- record the macro, with `@{character}` to invoke the macro recursively
(will be a no-op until recording is complete)
- Example: convert xml to json ([file](/2019/files/example-data.xml))
- Array of objects with keys "name" / "email"
- Use a Python program?
- Use sed / regexes
- `g/people/d`
- `%s/<person>/{/g`
- `%s/<name>\(.*\)<\/name>/"name": "\1",/g`
- ...
- Vim commands / macros
- `Gdd`, `ggdd` delete first and last lines
- Macro to format a single element (register `e`)
- Go to line with `<name>`
- `qe^r"f>s": "<ESC>f<C"<ESC>q`
- Macro to format a person
- Go to line with `<person>`
- `qpS{<ESC>j@eA,<ESC>j@ejS},<ESC>q`
- Macro to format a person and go to the next person
- Go to line with `<person>`
- `qq@pjq`
- Execute macro until end of file
- `999@q`
- Manually remove last `,` and add `[` and `]` delimiters
## Extending Vim
There are tons of plugins for extending Vim.
First, get set up with a plugin manager like
[vim-plug](https://github.com/junegunn/vim-plug),
[Vundle](https://github.com/VundleVim/Vundle.vim), or
[pathogen.vim](https://github.com/tpope/vim-pathogen).
Some plugins to consider:
- [ctrlp.vim](https://github.com/kien/ctrlp.vim): fuzzy file finder
- [vim-fugitive](https://github.com/tpope/vim-fugitive): git integration
- [vim-surround](https://github.com/tpope/vim-surround): manipulating "surroundings"
- [gundo.vim](https://github.com/sjl/gundo.vim): navigate undo tree
- [nerdtree](https://github.com/scrooloose/nerdtree): file explorer
- [syntastic](https://github.com/vim-syntastic/syntastic): syntax checking
- [vim-easymotion](https://github.com/easymotion/vim-easymotion): magic motions
- [vim-over](https://github.com/osyo-manga/vim-over): substitute preview
Lists of plugins:
- [Vim Awesome](https://vimawesome.com/)
## Vim-mode in Other Programs
Many other tools support emulating popular editors (e.g. Vim and Emacs).
- Shell
- bash: `set -o vi`
- zsh: `bindkey -v`
- `export EDITOR=vim` (environment variable used by programs like `git`)
- `~/.inputrc`
- `set editing-mode vi`
There are even Vim keybinding extensions for web [browsers](http://vim.wikia.com/wiki/Vim_key_bindings_for_web_browsers); some popular ones are [Vimium](https://chrome.google.com/webstore/detail/vimium/dbepggeogbaibhgnhhndojpepiihcmeb?hl=en) for Google Chrome and [Tridactyl](https://github.com/tridactyl/tridactyl) for Firefox.
## Resources
- [Vim Tips Wiki](http://vim.wikia.com/wiki/Vim_Tips_Wiki)
- [Vim Advent Calendar](https://vimways.org/2018/): various Vim tips
- [Neovim](https://neovim.io/) is a modern fork of Vim with more active development.
- [Vim Golf](http://www.vimgolf.com/): Various Vim challenges
{% comment %}
# Resources
TODO resources for other editors?
{% endcomment %}
# Exercises
1. Experiment with some editors. Try at least one command-line editor (e.g.
Vim) and at least one GUI editor (e.g. Atom). Learn through tutorials like
`vimtutor` (or the equivalents for other editors). To get a real feel for a
new editor, commit to using it exclusively for a couple days while going
about your work.
1. Customize your editor. Look through tips and tricks online, and look through
other people's configurations (often, they are well-documented).
1. Experiment with plugins for your editor.
1. Commit to using a powerful editor for at least a couple of weeks: you should
start seeing the benefits by then. At some point, you should be able to get
your editor to work as fast as you think.
1. Install a linter (e.g. pyflakes for Python), link it to your editor, and test that it is working.
================================================
FILE: _2019/files/example-data.xml
================================================
<people>
<person>
<name>Johnny Zhang Jr.</name>
<email>amyalvarez@cole.com</email>
</person>
<person>
<name>Edward Cook</name>
<email>dsparks@alvarez-dunn.com</email>
</person>
<person>
<name>Stephen Sweeney</name>
<email>dlewis@gmail.com</email>
</person>
<person>
<name>Krystal Riley</name>
<email>jflores@wright.biz</email>
</person>
<person>
<name>Ashley Robinson</name>
<email>robertsmichael@yahoo.com</email>
</person>
<person>
<name>Kimberly Brooks</name>
<email>sharoncunningham@larson.com</email>
</person>
<person>
<name>Brent Proctor</name>
<email>edward86@stewart.com</email>
</person>
<person>
<name>William Roberts</name>
<email>parkertodd@webb.com</email>
</person>
<person>
<name>Amanda Morales</name>
<email>lorizavala@hodges.com</email>
</person>
<person>
<name>Bryan Poole Jr.</name>
<email>carolyn56@gray-campos.net</email>
</person>
<person>
<name>Dale Hall</name>
<email>martinjames@yahoo.com</email>
</person>
<person>
<name>Isabella Reynolds</name>
<email>wbowen@wallace.com</email>
</person>
<person>
<name>Ann Rodriguez</name>
<email>charles37@taylor-riley.biz</email>
</person>
<person>
<name>Bryan Davis</name>
<email>jessica60@hotmail.com</email>
</person>
<person>
<name>Dalton Powell</name>
<email>piercenatasha@yahoo.com</email>
</person>
<person>
<name>Scott Turner</name>
<email>harold68@yahoo.com</email>
</person>
<person>
<name>Nicholas Castillo</name>
<email>dawnstephens@robinson.info</email>
</person>
<person>
<name>Joseph Pierce</name>
<email>lukepatterson@hotmail.com</email>
</person>
<person>
<name>Robyn White</name>
<email>jenniferrobinson@hotmail.com</email>
</person>
<person>
<name>Justin Rice</name>
<email>brandi76@gmail.com</email>
</person>
<person>
<name>Jamie Graham</name>
<email>harrisdavid@yahoo.com</email>
</person>
<person>
<name>Phillip Schmidt</name>
<email>stephanie33@gmail.com</email>
</person>
<person>
<name>John Baker</name>
<email>todd86@hotmail.com</email>
</person>
<person>
<name>Sharon Austin</name>
<email>srivera@yahoo.com</email>
</person>
<person>
<name>Erica Avila</name>
<email>jenniferreed@bowers-wilson.com</email>
</person>
<person>
<name>Jeremy Bass</name>
<email>jdavis@collins.com</email>
</person>
<person>
<name>Joshua Parsons</name>
<email>stephaniecoleman@miller-barker.com</email>
</person>
<person>
<name>Emma Mccoy</name>
<email>taylorjohn@wagner.net</email>
</person>
<person>
<name>Megan Williams</name>
<email>ronnie54@gmail.com</email>
</person>
<person>
<name>Michael Sutton</name>
<email>connie58@mendoza.net</email>
</person>
<person>
<name>Nicholas York</name>
<email>kennedykevin@collins.com</email>
</person>
<person>
<name>Donald Robles</name>
<email>williamsbrandon@gmail.com</email>
</person>
<person>
<name>Melissa Allen</name>
<email>pproctor@ramos-patel.com</email>
</person>
<person>
<name>Shannon Jones</name>
<email>beckkathleen@johnson.com</email>
</person>
<person>
<name>David White</name>
<email>sandra73@thompson.com</email>
</person>
<person>
<name>Jonathan Thomas</name>
<email>johnsonjeremy@gmail.com</email>
</person>
<person>
<name>Rachael Floyd</name>
<email>amanda78@johnson.info</email>
</person>
<person>
<name>Tina Carter</name>
<email>josewells@jones.net</email>
</person>
<person>
<name>Eric Johnson</name>
<email>bowersaustin@hernandez-edwards.com</email>
</person>
<person>
<name>William Kramer</name>
<email>rhunt@johnson.com</email>
</person>
<person>
<name>Nathan Williams</name>
<email>cynthiayoung@hotmail.com</email>
</person>
<person>
<name>Patty Schwartz</name>
<email>salinasdavid@sheppard.biz</email>
</person>
<person>
<name>David Collins</name>
<email>pcalhoun@yahoo.com</email>
</person>
<person>
<name>James Thomas</name>
<email>brianfox@rogers-cruz.com</email>
</person>
<person>
<name>Mark Casey</name>
<email>jerry88@graham.com</email>
</person>
<person>
<name>Robert Galloway</name>
<email>cherylmcgee@hotmail.com</email>
</person>
<person>
<name>Caitlin Dunn</name>
<email>nicholemartin@yahoo.com</email>
</person>
<person>
<name>Nancy Allison</name>
<email>martha33@molina-bullock.com</email>
</person>
<person>
<name>Marvin Burns</name>
<email>wrocha@gmail.com</email>
</person>
<person>
<name>Kimberly Jones</name>
<email>anitamunoz@french-christian.com</email>
</person>
<person>
<name>Caitlin Wood</name>
<email>thomasrandall@bowers-sullivan.org</email>
</person>
<person>
<name>Sara Burton</name>
<email>riosangelica@gmail.com</email>
</person>
<person>
<name>Jessica Roberson</name>
<email>theresa11@hotmail.com</email>
</person>
<person>
<name>Nicole Macias</name>
<email>kevinhodge@martin.biz</email>
</person>
<person>
<name>Christina Williams</name>
<email>shawn35@rice-bailey.org</email>
</person>
<person>
<name>Cody Winters</name>
<email>nicholassmith@barron-wu.com</email>
</person>
<person>
<name>Patricia Miller DDS</name>
<email>pierceraymond@watkins.org</email>
</person>
<person>
<name>Jennifer Lyons</name>
<email>vrivera@gmail.com</email>
</person>
<person>
<name>Jerry Rojas</name>
<email>jacobalexander@yahoo.com</email>
</person>
<person>
<name>Matthew Perez</name>
<email>jrivas@hotmail.com</email>
</person>
<person>
<name>Patrick Hogan</name>
<email>moorelisa@yahoo.com</email>
</person>
<person>
<name>Lisa Howard</name>
<email>stephen90@smith.biz</email>
</person>
<person>
<name>Justin Sloan</name>
<email>edwardsmichael@hotmail.com</email>
</person>
<person>
<name>Suzanne Morrow</name>
<email>shane74@yahoo.com</email>
</person>
<person>
<name>Theresa Lara</name>
<email>maryrichardson@clark.com</email>
</person>
<person>
<name>Christopher Powers</name>
<email>yfowler@davis-lee.net</email>
</person>
<person>
<name>Teresa Howell</name>
<email>amy15@yahoo.com</email>
</person>
<person>
<name>Richard Shelton</name>
<email>ksmith@yahoo.com</email>
</person>
<person>
<name>Jeremy Cole</name>
<email>bleach@gmail.com</email>
</person>
<person>
<name>Melissa Clark</name>
<email>rosejeffrey@yahoo.com</email>
</person>
<person>
<name>Kimberly Mcdaniel</name>
<email>ularson@ross-david.com</email>
</person>
<person>
<name>Kelly Dixon</name>
<email>gatesstephen@hotmail.com</email>
</person>
<person>
<name>Devin Quinn</name>
<email>wjohnson@hotmail.com</email>
</person>
<person>
<name>Kevin Greene</name>
<email>lhanson@hotmail.com</email>
</person>
<person>
<name>Jeffery Wiggins</name>
<email>amy76@gmail.com</email>
</person>
<person>
<name>Latoya Allen</name>
<email>vking@yahoo.com</email>
</person>
<person>
<name>Zachary Walker</name>
<email>diazjames@hotmail.com</email>
</person>
<person>
<name>Alyssa Molina</name>
<email>elizabeth59@gmail.com</email>
</person>
<person>
<name>Heather Miranda</name>
<email>davidturner@cortez-martinez.biz</email>
</person>
<person>
<name>Lori Gardner</name>
<email>murphytaylor@yahoo.com</email>
</person>
<person>
<name>Jessica Simpson</name>
<email>jamesdean@rosales.com</email>
</person>
<person>
<name>Anna Dickerson</name>
<email>abigailmurphy@hotmail.com</email>
</person>
<person>
<name>Molly Oconnor</name>
<email>morrisrhonda@yahoo.com</email>
</person>
<person>
<name>Brandi Braun</name>
<email>ericksonmatthew@jenkins.org</email>
</person>
<person>
<name>Renee Flowers</name>
<email>brownantonio@yang-crosby.org</email>
</person>
<person>
<name>Cassandra Compton</name>
<email>progers@yahoo.com</email>
</person>
<person>
<name>David Gilbert</name>
<email>vickie78@gmail.com</email>
</person>
<person>
<name>Brenda Davis</name>
<email>cynthiajones@thornton.com</email>
</person>
<person>
<name>Nicholas Rivera</name>
<email>longalyssa@yahoo.com</email>
</person>
<person>
<name>Dustin Hodges</name>
<email>sgolden@lee.com</email>
</person>
<person>
<name>Chad Wong</name>
<email>williambernard@mccarty.net</email>
</person>
<person>
<name>Robin Craig</name>
<email>xbyrd@austin.com</email>
</person>
<person>
<name>Heather Parker</name>
<email>allenjoshua@rodriguez.com</email>
</person>
<person>
<name>Jennifer Roberts</name>
<email>manningtravis@gmail.com</email>
</person>
<person>
<name>James Andrews</name>
<email>ginaromero@hotmail.com</email>
</person>
<person>
<name>Dorothy Hines</name>
<email>dsmith@thomas.com</email>
</person>
<person>
<name>Stephen Garcia</name>
<email>hughesbrendan@hotmail.com</email>
</person>
<person>
<name>Alfred Ellis</name>
<email>elizabeth41@crawford.info</email>
</person>
<person>
<name>Marilyn White</name>
<email>victoriaford@hotmail.com</email>
</person>
<person>
<name>Brian Graves</name>
<email>cpatel@gmail.com</email>
</person>
<person>
<name>Elizabeth Wagner</name>
<email>newtonwesley@cohen.com</email>
</person>
<person>
<name>Michelle Flores</name>
<email>shelbygross@duke-thomas.info</email>
</person>
<person>
<name>Larry Russell</name>
<email>richard99@meyer.com</email>
</person>
<person>
<name>Terrence Boyd</name>
<email>markmartin@flores.com</email>
</person>
<person>
<name>Jessica Carroll</name>
<email>eric30@yahoo.com</email>
</person>
<person>
<name>Erin Dean</name>
<email>toddmartin@guerra.biz</email>
</person>
<person>
<name>Craig Hernandez</name>
<email>joshualang@gonzalez.com</email>
</person>
<person>
<name>Amber Choi</name>
<email>doughertynancy@harmon.org</email>
</person>
<person>
<name>Renee Brown</name>
<email>terribeard@archer-gibson.info</email>
</person>
<person>
<name>Curtis Turner</name>
<email>pjohnson@hotmail.com</email>
</person>
<person>
<name>Benjamin Reed</name>
<email>marksmith@austin.net</email>
</person>
<person>
<name>Christina Fernandez</name>
<email>richardjoseph@esparza-peters.com</email>
</person>
<person>
<name>Jasmine Campbell</name>
<email>thomasmatthew@gmail.com</email>
</person>
<person>
<name>Catherine Bond</name>
<email>coreyroberts@gonzalez.com</email>
</person>
<person>
<name>Connie Jones</name>
<email>koneal@riley.com</email>
</person>
<person>
<name>Cody Taylor</name>
<email>kelsey99@hotmail.com</email>
</person>
<person>
<name>Kendra Gray</name>
<email>walkerrussell@hotmail.com</email>
</person>
<person>
<name>Alexander Murray</name>
<email>grossrobert@hotmail.com</email>
</person>
<person>
<name>Arthur Jackson</name>
<email>travis73@hotmail.com</email>
</person>
<person>
<name>Dr. William Vasquez DDS</name>
<email>gonzalezdaniel@hotmail.com</email>
</person>
<person>
<name>April Hampton</name>
<email>desireemorris@mcguire.info</email>
</person>
<person>
<name>Gerald Hunter</name>
<email>justin91@ross-scott.biz</email>
</person>
<person>
<name>Morgan Bolton</name>
<email>erika30@lloyd-smith.biz</email>
</person>
<person>
<name>Angela Barker</name>
<email>daniel17@carr.com</email>
</person>
<person>
<name>Angela Montgomery</name>
<email>jonathangoodwin@smith-perez.com</email>
</person>
<person>
<name>Yolanda Henry</name>
<email>shawnmcguire@gmail.com</email>
</person>
<person>
<name>Susan Hines</name>
<email>sarahbailey@wallace.com</email>
</person>
<person>
<name>Michelle Young</name>
<email>lewismichele@yahoo.com</email>
</person>
<person>
<name>Glen Hood</name>
<email>ljackson@vazquez.com</email>
</person>
<person>
<name>Christopher Wright</name>
<email>evansjulie@walton.com</email>
</person>
<person>
<name>Susan Guzman DDS</name>
<email>medinaelizabeth@gmail.com</email>
</person>
<person>
<name>Barbara Cortez</name>
<email>bchavez@cameron.com</email>
</person>
<person>
<name>Stacey Hammond</name>
<email>nancyturner@stewart.com</email>
</person>
<person>
<name>Amanda Stout</name>
<email>macdonaldlatoya@hotmail.com</email>
</person>
<person>
<name>Lisa Johnson</name>
<email>wnolan@gmail.com</email>
</person>
<person>
<name>Carlos Wyatt</name>
<email>iperez@cohen.com</email>
</person>
<person>
<name>Samantha Brewer</name>
<email>thomas47@hotmail.com</email>
</person>
<person>
<name>Brett Jackson</name>
<email>zpowell@cruz-rivera.com</email>
</person>
<person>
<name>Johnny Guzman</name>
<email>tmerritt@yahoo.com</email>
</person>
<person>
<name>Mary Davis</name>
<email>collinslisa@hotmail.com</email>
</person>
<person>
<name>Willie Mccoy</name>
<email>joshua20@terrell.biz</email>
</person>
<person>
<name>Kelsey Rivera</name>
<email>randy72@gmail.com</email>
</person>
<person>
<name>Melissa Maddox</name>
<email>christopher13@gmail.com</email>
</person>
<person>
<name>Jason Rodriguez</name>
<email>kellypierce@harris.com</email>
</person>
<person>
<name>Donna Walsh</name>
<email>wardraymond@martinez.com</email>
</person>
<person>
<name>Monique Patel</name>
<email>cynthia75@james.net</email>
</person>
<person>
<name>Dr. Lindsay Farrell PhD</name>
<email>brownmaria@gmail.com</email>
</person>
<person>
<name>Ann Ruiz</name>
<email>jeremiah94@pennington.org</email>
</person>
<person>
<name>Mary Alexander</name>
<email>catherineharper@munoz.org</email>
</person>
<person>
<name>Brittany Russell</name>
<email>haileywinters@russell-coffey.net</email>
</person>
<person>
<name>Dominique Rosales</name>
<email>matthewpatterson@carr.com</email>
</person>
<person>
<name>Henry Waters</name>
<email>karen72@logan.com</email>
</person>
<person>
<name>Jared Weaver</name>
<email>karlafletcher@baldwin.org</email>
</person>
<person>
<name>Mr. Thomas Atkins</name>
<email>gboone@gmail.com</email>
</person>
<person>
<name>Carla Cohen</name>
<email>ibarron@gmail.com</email>
</person>
<person>
<name>Tricia Lewis</name>
<email>pperez@hotmail.com</email>
</person>
<person>
<name>Mario Gill</name>
<email>lisa43@brown.org</email>
</person>
<person>
<name>James Olsen</name>
<email>vickie82@hotmail.com</email>
</person>
<person>
<name>Michael Perry</name>
<email>rdavis@yahoo.com</email>
</person>
<person>
<name>Matthew Lucas</name>
<email>joshuagray@carpenter-stanley.com</email>
</person>
<person>
<name>Christine Torres</name>
<email>samanthayoung@smith-aguilar.biz</email>
</person>
<person>
<name>Lindsay Miller</name>
<email>randyevans@yahoo.com</email>
</person>
<person>
<name>Margaret Jones</name>
<email>kevincantu@alexander-carson.org</email>
</person>
<person>
<name>Cameron Mcdonald</name>
<email>deckerjerome@garcia.com</email>
</person>
<person>
<name>Brittany Sanders</name>
<email>dennis55@leonard-turner.com</email>
</person>
<person>
<name>Daniel Patterson</name>
<email>timothy36@novak.com</email>
</person>
<person>
<name>David Chaney</name>
<email>kristen02@hotmail.com</email>
</person>
<person>
<name>Sheri Silva</name>
<email>idawson@alvarez.com</email>
</person>
<person>
<name>Holly Ward</name>
<email>saraallen@dunn-smith.net</email>
</person>
<person>
<name>Bryan Solis</name>
<email>stacey30@lam.biz</email>
</person>
<person>
<name>Diane Carter</name>
<email>paulvargas@gmail.com</email>
</person>
<person>
<name>David Brown</name>
<email>james98@gmail.com</email>
</person>
<person>
<name>Bridget Fritz</name>
<email>beth24@hotmail.com</email>
</person>
<person>
<name>Paul Boyd</name>
<email>johngutierrez@hotmail.com</email>
</person>
<person>
<name>Ernest Baker</name>
<email>phillipwhite@hotmail.com</email>
</person>
<person>
<name>George Myers</name>
<email>frank52@hammond.com</email>
</person>
<person>
<name>Daniel Miller</name>
<email>joshua96@gmail.com</email>
</person>
<person>
<name>Jonathan Ayala</name>
<email>jerryharris@davis.net</email>
</person>
<person>
<name>Jill Stone</name>
<email>pwright@hotmail.com</email>
</person>
<person>
<name>Trevor Richard</name>
<email>mreed@thompson.org</email>
</person>
<person>
<name>Jason Thomas</name>
<email>josephflowers@hotmail.com</email>
</person>
<person>
<name>Arthur Thomas</name>
<email>lnelson@hicks.com</email>
</person>
<person>
<name>Austin Collins</name>
<email>ambermann@barnes.com</email>
</person>
<person>
<name>Jason Diaz</name>
<email>ericreyes@hotmail.com</email>
</person>
<person>
<name>Darryl Hall</name>
<email>faithdixon@barnes-burgess.org</email>
</person>
<person>
<name>Jason Thomas</name>
<email>brittany32@yahoo.com</email>
</person>
<person>
<name>John Sanders</name>
<email>waltontheresa@hotmail.com</email>
</person>
<person>
<name>Lisa Hayes</name>
<email>victor14@hotmail.com</email>
</person>
<person>
<name>Chelsea Wong</name>
<email>iwatkins@williams-solomon.com</email>
</person>
<person>
<name>Joseph Fitzgerald</name>
<email>mary86@hotmail.com</email>
</person>
<person>
<name>Crystal Schroeder</name>
<email>kbarron@wilson-flynn.org</email>
</person>
<person>
<name>Denise Bean</name>
<email>noah23@gmail.com</email>
</person>
<person>
<name>Jamie Atkins</name>
<email>cwebb@hotmail.com</email>
</person>
<person>
<name>Joshua Kim</name>
<email>esmith@ramirez.com</email>
</person>
<person>
<name>Deanna Mooney</name>
<email>jason13@turner.com</email>
</person>
<person>
<name>Jasmine Baker</name>
<email>torresjacob@braun.com</email>
</person>
<person>
<name>Victoria Williams</name>
<email>rwilliams@hotmail.com</email>
</person>
<person>
<name>Sandra Hall</name>
<email>williamsonrichard@gmail.com</email>
</person>
<person>
<name>Miranda Mcpherson</name>
<email>xrussell@barajas.biz</email>
</person>
<person>
<name>Samantha Walton</name>
<email>danielle73@gmail.com</email>
</person>
<person>
<name>Kyle Serrano</name>
<email>stonecassandra@mcfarland.info</email>
</person>
<person>
<name>Mr. Bruce Maldonado DDS</name>
<email>diazmatthew@yahoo.com</email>
</person>
<person>
<name>Amber Fisher</name>
<email>jonesdavid@rubio.info</email>
</person>
<person>
<name>Brett Berry</name>
<email>millerteresa@gmail.com</email>
</person>
<person>
<name>Cory Bradley</name>
<email>umatthews@summers.com</email>
</person>
<person>
<name>Ryan Peters</name>
<email>shepherdmonique@gmail.com</email>
</person>
<person>
<name>Laura Lee</name>
<email>lfleming@higgins.com</email>
</person>
<person>
<name>Christian Smith</name>
<email>johnnymartinez@castro-miller.com</email>
</person>
<person>
<name>Kelly Hanson</name>
<email>velazquezsandra@chavez-malone.info</email>
</person>
<person>
<name>Brian King</name>
<email>hwood@yahoo.com</email>
</person>
<person>
<name>Cynthia Owens</name>
<email>sbrown@hotmail.com</email>
</person>
<person>
<name>Lisa Clark</name>
<email>derek74@bell-martinez.com</email>
</person>
<person>
<name>Brenda Ford</name>
<email>kevin55@hotmail.com</email>
</person>
<person>
<name>Daniel Brady</name>
<email>wbennett@hotmail.com</email>
</person>
<person>
<name>Jake Wilson</name>
<email>lorraine60@solis.biz</email>
</person>
<person>
<name>April Cole</name>
<email>halltyler@yahoo.com</email>
</person>
<person>
<name>Melissa Callahan</name>
<email>cmckenzie@rodriguez.info</email>
</person>
<person>
<name>Taylor Brown</name>
<email>davisadam@gmail.com</email>
</person>
<person>
<name>Patrick Guerrero</name>
<email>hannah48@delgado.net</email>
</person>
<person>
<name>Brian Gonzalez</name>
<email>burchmalik@johnson.com</email>
</person>
<person>
<name>Robert Bailey</name>
<email>debbiemoore@hotmail.com</email>
</person>
<person>
<name>Jesus Maynard</name>
<email>gene45@gmail.com</email>
</person>
<person>
<name>Linda Greer</name>
<email>johnharris@reed-allen.net</email>
</person>
<person>
<name>Travis Thomas</name>
<email>bryantrachel@gmail.com</email>
</person>
<person>
<name>Vicki Mitchell</name>
<email>edaniels@hotmail.com</email>
</person>
<person>
<name>Paula Espinoza</name>
<email>donnameyer@dennis.org</email>
</person>
<person>
<name>James Hoffman</name>
<email>haustin@larson-wiggins.biz</email>
</person>
<person>
<name>Ashlee Perkins</name>
<email>stevenknapp@miller.com</email>
</person>
<person>
<name>Rebecca Leon</name>
<email>smitchell@simpson-johnson.com</email>
</person>
<person>
<name>Jorge Williams</name>
<email>shawn36@peters-meadows.com</email>
</person>
<person>
<name>Bob Flores</name>
<email>kellercourtney@yahoo.com</email>
</person>
<person>
<name>Lisa Miller</name>
<email>johnsoncrystal@gmail.com</email>
</person>
<person>
<name>Brandon Davis</name>
<email>bryanpetersen@hotmail.com</email>
</person>
<person>
<name>Joshua Daugherty</name>
<email>josehayes@carey.com</email>
</person>
<person>
<name>Justin Wise</name>
<email>pamelacosta@simmons-morrow.com</email>
</person>
<person>
<name>Kimberly Johnson</name>
<email>combssandra@deleon.com</email>
</person>
<person>
<name>Toni Stone</name>
<email>eestrada@charles.com</email>
</person>
<person>
<name>Julie Rivers</name>
<email>rwilliams@castillo-nelson.org</email>
</person>
<person>
<name>Kelly Scott</name>
<email>danielsmith@hotmail.com</email>
</person>
<person>
<name>Michael Carr</name>
<email>clarklisa@newman-barrett.com</email>
</person>
<person>
<name>Jonathan Vaughn</name>
<email>dennisrebecca@lawrence-harris.com</email>
</person>
<person>
<name>Erica Lowe</name>
<email>wilsonkelly@hotmail.com</email>
</person>
<person>
<name>Kimberly Clark</name>
<email>jose15@gmail.com</email>
</person>
<person>
<name>Lindsey Robertson</name>
<email>rdickerson@yahoo.com</email>
</person>
<person>
<name>Cindy Anderson</name>
<email>gmorton@daniels.com</email>
</person>
<person>
<name>Tami Barber</name>
<email>harveykaren@hotmail.com</email>
</person>
<person>
<name>Tiffany Wu</name>
<email>jessica90@gmail.com</email>
</person>
<person>
<name>Edward Bowers</name>
<email>hallkathy@gmail.com</email>
</person>
<person>
<name>Shawn Collier</name>
<email>rhondasmith@hotmail.com</email>
</person>
<person>
<name>Michael Cox</name>
<email>usimpson@graham-cunningham.net</email>
</person>
</people>
================================================
FILE: _2019/files/example.c
================================================
#include <stdio.h>
const char *numbers[] = {
"one",
"two",
"three",
"four",
"five",
"six",
"seven",
"eight",
"nine",
"ten"
};
void say(int i)
{
const char *msg = numbers[i-1];
printf("%s\n", msg);
}
int main()
{
for (int i = 1; i <= 10; i++) {
say(i);
}
}
================================================
FILE: _2019/index.html
================================================
---
layout: page
title: "2019 Lectures"
permalink: /2019/
---
<p>Click on specific topics below to see lecture videos and lecture notes.</p>
<h1>Tuesday, 1/15</h1>
<ul>
<li><a href="/2019/course-overview/">Course overview</a></li>
<li><a href="/2019/virtual-machines/">Virtual machines and containers</a></li>
<li><a href="/2019/shell/">Shell and scripting</a></li>
</ul>
<h1>Thursday, 1/17</h1>
<ul>
<li><a href="/2019/command-line/">Command-line environment</a></li>
<li><a href="/2019/data-wrangling/">Data wrangling</a></li>
</ul>
<h1>Tuesday, 1/22</h1>
<ul>
<li><a href="/2019/editors/">Editors</a></li>
<li><a href="/2019/version-control/">Version control</a></li>
</ul>
<h1>Thursday, 1/24</h1>
<ul>
<li><a href="/2019/dotfiles/">Dotfiles</a></li>
<li><a href="/2019/backups/">Backups</a></li>
<li><a href="/2019/automation/">Automation</a></li>
<li><a href="/2019/machine-introspection/">Machine introspection</a></li>
</ul>
<h1>Tuesday, 1/29</h1>
<ul>
<li><a href="/2019/program-introspection/">Program introspection</a></li>
<li><a href="/2019/package-management/">Package/dependency management</a></li>
<li><a href="/2019/os-customization/">OS customization</a></li>
<li><a href="/2019/remote-machines/">Remote machines</a></li>
</ul>
<h1>Thursday, 1/31</h1>
<ul>
<li><a href="/2019/web/">Web and browsers</a></li>
<li><a href="/2019/security/">Security and privacy</a></li>
</ul>
<hr>
<h1>Discussion</h1>
<p>We've also shared this class beyond MIT in the hope that others may
benefit from these resources. You can find posts and discussion on:</p>
<ul>
<li><a href="https://news.ycombinator.com/item?id=19078281">Hacker News</a></li>
<li><a href="https://lobste.rs/s/h6157x/mit_hacker_tools_lecture_series_on">Lobsters</a></li>
<li><a href="https://www.reddit.com/r/learnprogramming/comments/an42uu/mit_hacker_tools_a_lecture_series_on_programmer/">/r/learnprogramming</a></li>
<li><a href="https://www.reddit.com/r/programming/comments/an3xki/mit_hacker_tools_a_lecture_series_on_programmer/">/r/programming</a></li>
<li><a href="https://twitter.com/Jonhoo/status/1091896192332693504">Twitter</a></li>
<li><a href="https://www.youtube.com/playlist?list=PLyzOVJj3bHQuiujH1lpn8cA9dsyulbYRv">YouTube</a></li>
</ul>
================================================
FILE: _2019/machine-introspection.md
================================================
---
layout: lecture
title: "Machine Introspection"
presenter: Jon
video:
aspect: 56.25
id: eNYT2Oq3PF8
---
Sometimes, computers misbehave. And very often, you want to know why.
Let's look at some tools that help you do that!
But first, let's make sure you're able to do introspection. Often,
system introspection requires that you have certain privileges, like
being a member of a group (like `power` for shutdown). The `root` user
has the ultimate privilege; it can do pretty much anything. You can run
a command as `root` (but be careful!) using `sudo`.
## What happened?
If something goes wrong, the first place to start is to look at what
happened around the time when things went wrong. For this, we need to
look at logs.
Traditionally, logs were all stored in `/var/log`, and many still are.
Usually there's a file or folder per program. Use `grep` or `less` to
find your way through them.
There's also a kernel log that you can see using the `dmesg` command.
This used to be available as a plain-text file, but nowadays you often
have to go through `dmesg` to get at it.
Finally, there is the "system log", which is increasingly where all of
your log messages go. On _most_, though not all, Linux systems, that log
is managed by `systemd`, the "system daemon", which controls all the
services that run in the background (and much, much more at this point).
That log is accessible through the somewhat inconvenient `journalctl`
tool if you are root, or part of the `adm`, `wheel`, or `systemd-journal` groups.
For `journalctl`, you should be aware of these flags in particular:
- `-u UNIT`: show only messages related to the given systemd service
- `--full`: don't truncate long lines (the stupidest feature)
- `-b`: only show messages from the latest boot (see also `-b -2`)
- `-n100`: only show last 100 entries
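These flags combine. As an illustration, `unit_log` below is a hypothetical shell function (not part of systemd) that prints the most recent entries for one unit from the current boot, without truncating long lines. The unit name you pass depends on your distribution; the SSH daemon, for instance, is `sshd` on some systems and `ssh` on Debian-based ones.

```shell
# Hypothetical helper: last N journal entries (default 100) for one
# systemd unit, current boot only, long lines shown in full.
# Usage: unit_log sshd      or      unit_log sshd 20
unit_log() {
  journalctl -u "$1" -b --full -n "${2:-100}"
}
```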
## What is happening?
If something _is_ wrong, or you just want to get a feel for what's going
on in your system, you have a number of tools at your disposal for
inspecting the currently running system:
First, there's `top`, and its improved cousin `htop`, which show you
various statistics for the currently running processes on the system,
such as CPU use, memory use, and process trees. Both have lots of
shortcuts; in `htop`, `t` is particularly useful because it toggles the
tree view. You can also see the process tree with `pstree` (+ `-p` to
include PIDs). If you want to know what those programs are doing, you'll
often want to tail their log files. `journalctl -f`, `dmesg -w`, and
`tail -f` are your friends here.
Sometimes, you want to know more about the resources being used overall
on your system. [`dstat`](http://dag.wiee.rs/home-made/dstat/) is
excellent for that. It gives you real-time resource metrics for lots of
different subsystems like I/O, networking, CPU utilization, context
switches, and the like. `man dstat` is the place to start.
If you're running out of disk space, there are two primary utilities
you'll want to know about: `df` and `du`. The former shows you the
status of all the partitions on your system (try it with `-h`), whereas
the latter measures the size of all the folders you give it, including
their contents (see also `-h` and `-s`).
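For instance, to find out where the space went: `biggest` below is a hypothetical function name, the `*` glob skips hidden (dot) entries, and `sort -h` (sorting human-readable sizes such as `4.0K` and `2.0M`) is supported by GNU coreutils but may be missing from older non-GNU `sort` implementations.

```shell
# Free and used space on every mounted filesystem, human-readable
df -h

# Hypothetical helper: the N largest entries (default 10) directly inside
# a directory (default: current directory), printed largest last.
# Usage: biggest ~/Downloads 5
biggest() {
  du -sh -- "${1:-.}"/* 2>/dev/null | sort -h | tail -n "${2:-10}"
}
```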
To figure out what network connections you have open, `ss` is the way to
go. `ss -t` will show your established TCP connections, and `ss -tl` will
show all listening (i.e., server) ports on your system. `-p` will also
include which process is using each connection (you need root to see
processes owned by other users), and `-n` will give you raw port numbers
instead of service names.
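These flags can also be combined with a filter expression. As a sketch, the hypothetical `who_listens` function below asks which process is listening on a given TCP port, using `ss`'s `sport` (source port) filter.

```shell
# Hypothetical helper: show the listening TCP socket (if any) on port $1.
# -t TCP, -l listening only, -n numeric ports, -p owning process
# (run with sudo to see processes that belong to other users).
# Usage: who_listens 22
who_listens() {
  ss -tlnp "sport = :$1"
}
```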
## System configuration
There are _many_ ways to configure your system, but we'll go through
two very common ones: networking and services. Most applications on your
system tell you how to configure them in their manpage, and usually it
will involve editing files in `/etc`, the system configuration
directory.
If you want to configure your network, the `ip` command lets you do
that. Its arguments take on a slightly weird form, but `ip help command`
will get you pretty far. `ip addr` shows you information about your
network interfaces and how they're configured (IP addresses and such),
and `ip route` shows you how network traffic is routed to different
network hosts. Network problems can often be resolved purely through the
`ip` tool. There's also `iw` for managing wireless network interfaces.
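As an example of pulling one fact out of `ip`'s output, the hypothetical `default_gw` function below prints the IPv4 address of your default gateway. It assumes the usual iproute2 route format, `default via ADDRESS dev IFACE ...`; on a system with no default route it prints nothing.

```shell
# Hypothetical helper: print the IPv4 default gateway, parsed from a line like
#   default via 192.168.1.1 dev wlan0 proto dhcp metric 600
default_gw() {
  ip -4 route show default | awk '$1 == "default" && $2 == "via" { print $3; exit }'
}
```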
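`ping` is a handy tool for checking how deeply things are broken. Try
pinging a hostname (google.com), an external IP address (1.1.1.1), and
an internal IP address (e.g. 192.168.1.1 or your default gateway). If the
internal address answers but the external one does not, the problem lies
beyond your local network; if IP addresses answer but hostnames do not,
name resolution is the likely culprit. In that case, inspect (and, if
necessary, edit) `/etc/resolv.conf` to check your DNS settings (how
hostnames are resolved to IP addresses).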
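The pings can be run in one go, from the inside out. The hypothetical `ping_ladder` function below pings each target once and reports which ones answer. It assumes Linux's iputils `ping`, where `-c` is the packet count and `-W` the timeout in seconds (on macOS, `-W` is in milliseconds).

```shell
# Hypothetical helper: ping each argument once and report ok/FAIL per target.
# Usage: ping_ladder 192.168.1.1 1.1.1.1 google.com
ping_ladder() {
  for target in "$@"; do
    if ping -c 1 -W 2 "$target" >/dev/null 2>&1; then
      echo "ok   $target"
    else
      echo "FAIL $target"
    fi
  done
}
```

With the targets ordered as in the usage line, the first `FAIL` roughly marks the layer where things break.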
To configure services, you pretty much have to interact with `systemd`
these days, for better or for worse. Most services on your system will
have a systemd service file that defines a systemd _unit_. These files
define what command to run when that service is started, how to stop
it, where to log things, etc. They're usually not too bad to read, and
you can find most of them in `/usr/lib/systemd/system/`. You can also
define your own in `/etc/systemd/system`.
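To show what such a file looks like, here is a minimal sketch: the hypothetical `make_unit` function prints a simple service unit for a given description and command. The directives used (`Description`, `ExecStart`, `Restart`, `WantedBy`) are standard systemd options, while `/usr/local/bin/myapp` and `myapp.service` in the usage comment are made-up names.

```shell
# Hypothetical helper: print a minimal systemd service unit.
# $1 = human-readable description, $2 = command to run (absolute path).
# Usage:
#   make_unit "My web app" /usr/local/bin/myapp \
#     | sudo tee /etc/systemd/system/myapp.service
make_unit() {
  cat <<EOF
[Unit]
Description=$1

[Service]
ExecStart=$2
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

After writing a new unit file, run `sudo systemctl daemon-reload` so systemd notices it; `sudo systemctl enable --now myapp.service` then starts the service and enables it at boot.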
Once you have a systemd service in mind, you use the `systemctl` command
to interact with it. `systemctl enable UNIT` will set the service to
start on boot (`disable` removes it again), and `start`, `stop`, and
`restart` will do what you expect. If something goes wrong, systemd will
let you know, and you can use `journalctl -u UNIT` to see the
application's log. You can also use `systemctl status` to see how all
your system services are doing. If your boot feels slow, it's probably
due to a couple of slow services, and you can use `systemd-analyze` (try
it with `blame`) to figure out which ones.
# Exercises
`locate`?
`dmidecode`?
`tcpdump`?
`/boot`?
`iptables`?
`/proc`?
================================================
FILE: _2019/os-customization.md
================================================
---
layout: lecture
title: "OS Customization"
presenter: Anish
video:
aspect: 62.5
id: epSRVqQzeDo
---
There is a lot you can do to customize your operating system beyond what is
available in the settings menus.
# Keyboard remapping
Your keyboard probably has keys that you aren't using very much. Instead of
having useless keys, you can remap them to do useful things.
## Remapping to other keys
The simplest thing is to remap keys to other keys. For example, if you don't
use the caps lock key very much, then you can remap it to something more
useful. If you are a Vim user, for example, you might want to remap caps lock
to escape.
On macOS, you can do some remappings through Keyboard settings in System
Preferences; for more complicated mappings, you need special software.
## Remapping to arbitrary commands
You don't just have to remap keys to other keys: there are tools that will let
you remap keys (or combinations of keys) to arbitrary commands. For example,
you could make command-shift-t open a new terminal window.
# Customizing hidden OS settings
## macOS
macOS exposes a lot of useful settings through the `defaults` command. For
example, you can make Dock icons of hidden applications translucent:
```shell
defaults write com.apple.dock showhidden -bool true
```
There is no single list of all possible settings, but you can find lists of
specific customizations online, such as Mathias Bynens'
[.macos](https://github.com/mathiasbynens/dotfiles/blob/master/.macos).
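Many Dock preferences, including the `showhidden` example above, only apply once the Dock restarts. A sketch, assuming macOS: `dock_bool` is a hypothetical helper name, while `defaults write`/`read`/`delete` and `killall Dock` are standard macOS commands (the Dock relaunches automatically after being killed).

```shell
# Hypothetical helper: set a boolean Dock preference, then restart the Dock
# so it takes effect. Usage: dock_bool showhidden true
dock_bool() {
  defaults write com.apple.dock "$1" -bool "$2" && killall Dock
}

# Inspect or undo a setting:
#   defaults read com.apple.dock showhidden     # 1 once enabled
#   defaults delete com.apple.dock showhidden   # back to the default
```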
# Window management
## Tiling window management
[Tiling window management](https://en.wikipedia.org/wiki/Tiling_window_manager)
is one approach to window management, where you organize windows into
non-overlapping frames. If you're using Linux or a similar Unix-like desktop,
you can install a tiling window manager; if you're using something like Windows
or macOS, you can install applications that let you approximate this behavior.
## Screen management
You can set up keyboard shortcuts to help you manipulate windows across
screens.
## Layouts
If there are specific ways you lay out windows on a screen, rather than
"executing" that layout manually, you can script it, making instantiating a
layout trivial.
# Resources
- [Hammerspoon](https://www.hammerspoon.org/) - macOS desktop automation
- [Rectangle](https://rectangleapp.com/) - macOS window manager
- [Karabiner](https://karabiner-elements.pqrs.org/) - sophisticated macOS keyboard remapping
- [r/unixporn](https://www.reddit.com/r/unixporn/) - screenshots and
documentation of people's fancy configurations
# Exercises
1. Figure out how to remap your Caps Lock key to something you use more often
(such as Escape or Ctrl or Backspace).
1. Make a custom global keyboard shortcut to open a new terminal window or a
new browser window.
{% comment %}
TODO
- Bitbar / Polybar
- Clipboard Manager (stack/searchable history)
{% endcomment %}
================================================
FILE: _2019/package-management.md
================================================
---
layout: lecture
title: "Package Management and Dependency Management"
presenter: Anish
video:
  aspect: 56.25
  id: tgvt473T8xA
---
Software usually builds on (a collection of) other software, which necessitates
dependency management.
Package/dependency management programs are often language-specific, but many
share common ideas.
# Package repositories
Packages are hosted in _package repositories_. There are different repositories
for different languages (and sometimes multiple for a particular language),
such as [PyPI](https://pypi.org/) for Python, [RubyGems](https://rubygems.org/)
for Ruby, and [crates.io](https://crates.io/) for Rust. They generally store
software (source code and sometimes pre-compiled binaries for specific
platforms) for all versions of a package.
# Semantic versioning
Software evolves over time, and we need a way to refer to software versions.
Some simple ways could be to refer to software by a sequence number or a commit
hash, but we can do better in terms of communicating more information: using
version numbers.
There are many approaches; one popular one is [Semantic
Versioning](https://semver.org/):
```
x.y.z
^ ^ ^
| | +- patch
| +--- minor
+----- major
```
Increment **major** version when you make incompatible API changes.
Increment **minor** version when you add functionality in a backward-compatible manner.
Increment **patch** when you make backward-compatible bug fixes.
For example, if you depend on a feature introduced in `v1.2.0` of some
software, then you can install `v1.x.y` for any minor version `x >= 2` and any
patch version `y`. You need to install major version `1` (because `2` can
introduce backward-incompatible changes), and you need to install a minor
version `>= 2` (because you depend on a feature introduced in that minor
version). You can use any newer minor version or patch version because
they should not introduce any backward-incompatible changes.
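To make the rule above concrete, here is a minimal bash sketch of such a compatibility check. The function name `compatible` is hypothetical, it relies on bash here-strings (`<<<`), and it assumes plain `x.y.z` version strings: it ignores pre-release tags such as `1.2.0-beta` and the special meaning semver gives to major version `0`, where anything may change.

```bash
#!/usr/bin/env bash
# compatible REQUIRED INSTALLED   (hypothetical helper, plain x.y.z only)
# Exit status 0 if INSTALLED can be used where REQUIRED is needed:
# same major version, and not older than REQUIRED.
compatible() {
    local req_major req_minor req_patch inst_major inst_minor inst_patch
    IFS=. read -r req_major req_minor req_patch <<< "$1"
    IFS=. read -r inst_major inst_minor inst_patch <<< "$2"

    # A different major version may contain incompatible API changes.
    [ "$inst_major" -eq "$req_major" ] || return 1
    # A newer minor version only adds backward-compatible functionality.
    [ "$inst_minor" -gt "$req_minor" ] && return 0
    # Same minor version: the patch level must be at least the required one.
    [ "$inst_minor" -eq "$req_minor" ] && [ "$inst_patch" -ge "$req_patch" ]
}

# Which versions satisfy a dependency on a feature introduced in v1.2.0?
for v in 1.2.0 1.2.7 1.9.1 1.1.5 2.0.0; do
    if compatible 1.2.0 "$v"; then
        echo "$v: ok"
    else
        echo "$v: incompatible"
    fi
done
```

This prints `ok` for 1.2.0, 1.2.7 and 1.9.1, and `incompatible` for 1.1.5 (the feature is missing) and 2.0.0 (a new major version).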
# Lock files
In addition to specifying versions, it can be nice to enforce that the
_contents_ of the dependency have not changed to prevent tampering. Some tools
use _lock files_ to specify cryptographic hashes of dependencies (along with
versions) that are checked on package install.
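The hash check itself can be tried by hand with `sha256sum` from GNU coreutils (on macOS, `shasum -a 256` plays the same role). This is only a toy illustration of the idea, using made-up file names `dep-1.0.0.py` and `deps.lock`; real lock files such as npm's `package-lock.json` have their own formats and are checked by the package manager for you.

```bash
# Work in a scratch directory.
cd "$(mktemp -d)"

# A toy "dependency" and a toy lock file recording its SHA-256 hash.
printf 'print("hello")\n' > dep-1.0.0.py
sha256sum dep-1.0.0.py > deps.lock

# At install time, re-hash the file and compare with the lock file.
sha256sum -c deps.lock              # prints "dep-1.0.0.py: OK"

# If the contents change, even under the same version number, the check fails.
printf 'print("tampered")\n' > dep-1.0.0.py
sha256sum -c deps.lock || echo "hash mismatch: refusing to install"
```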
# Specifying versions
Tools often let you specify versions in multiple ways, such as:
- exact version, e.g. `2.3.12`
- minimum major version, e.g. `>= 2`
- specific major version and minimum minor version, e.g. `>= 2.3, <3.0`
Specifying an exact version can be advantageous to avoid different behaviors
based on installed dependencies (this shouldn't happen if all dependencies
faithfully follow semver, but sometimes people make mistakes). Specifying a
minimum requirement has the advantage of allowing bug fixes to be installed
(e.g. patch upgrades).
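Checking a range such as `>= 2.3, < 3.0` needs a version-aware comparison: compared as plain strings, `2.10.0` would sort before `2.3`. A minimal sketch using GNU `sort -V` (version sort); the helper names `version_ge` and `in_range` are hypothetical.

```bash
# version_ge A B: succeeds if version A >= version B. sort -V orders
# dotted version numbers field by field, numerically.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# in_range V: checks the constraint ">= 2.3, < 3.0".
in_range() {
    version_ge "$1" 2.3 && ! version_ge "$1" 3.0
}

for v in 2.3.12 2.10.0 2.2.9 3.0.0; do
    if in_range "$v"; then
        echo "$v satisfies >= 2.3, < 3.0"
    else
        echo "$v does not"
    fi
done
```

Here 2.3.12 and 2.10.0 satisfy the constraint, while 2.2.9 (too old) and 3.0.0 (new major version) do not.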
# Dependency resolution
Package managers use various dependency resolution algorithms to satisfy
dependency requirements. This often gets challenging with complex dependencies
(e.g. a package can be indirectly depended on by multiple top-level
dependencies, and different versions could be required). Different package
managers have different levels of sophistication in their dependency
resolution, but it's something to be aware of: you may need to understand this
if you are debugging dependencies.
# Virtual environments
If you're developing multiple software projects, they may depend on different
versions of a particular piece of software. Sometimes, your build tool will
handle this naturally (e.g. by building a static binary).
For other build tools and programming languages, one approach is handling this
with virtual environments (e.g. with the
[virtualenv](https://docs.python-guide.org/dev/virtualenvs/) tool for Python).
Instead of installing dependencies system-wide, you can install dependencies
per-project in a virtual environment, and _activate_ the virtual environment
that you want to use when you're working on a specific project.
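A minimal sketch of that workflow, using the `venv` module that ships with Python 3 rather than the `virtualenv` tool linked above (both create similar per-project environments). The directory name `.venv` is a common convention, not a requirement, and `--without-pip` is only there to keep the example offline.

```bash
# Work in a scratch "project" directory.
cd "$(mktemp -d)"

# Create the environment. Normally you would omit --without-pip so that
# `pip install ...` inside the environment installs per-project packages.
python3 -m venv --without-pip .venv

# Activate it: .venv/bin is prepended to $PATH, so `python` now refers
# to the environment's interpreter instead of the system-wide one.
. .venv/bin/activate
command -v python

# Go back to the system-wide environment.
deactivate
```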
# Vendoring
Another very different approach to dependency management is _vendoring_.
Instead of using a dependency manager or build tool to fetch software, you copy
the entire source code for a dependency into your software's repository. This
has the advantage that you're always building against the same version of the
dependency and you don't need to rely on a package repository, but it is more
effort to upgrade dependencies.
================================================
FILE: _2019/program-introspection.md
================================================
---
layout: lecture
title: "Program Introspection"
presenter: Anish
video:
  aspect: 62.5
  id: 74MhV-7hYzg
---
# Debugging
When printf-debugging isn't good enough: use a debugger.
Debuggers let you interact with the execution of a program, letting you do
things like:
- halt execution of the program when it reaches a certain line
- single-step through the program
- inspect values of variables
- many more advanced features
## GDB/LLDB
[GDB](https://www.gnu.org/software/gdb/) and [LLDB](https://lldb.llvm.org/)
both support C and many C-like languages.
Let's look at [example.c](/2019/files/example.c). Compile with debug flags:
`gcc -g -o example example.c`.
Open GDB:
`gdb example`
Some commands:
- `run`
- `b {name of function}` - set a breakpoint
- `b {file}:{line}` - set a breakpoint
- `c` - continue
- `step` / `next` / `finish` - step in / step over / step out
- `p {variable}` - print value of variable
- `watch {expression}` - set a watchpoint that triggers when the value of the expression changes
- `rwatch {expression}` - set a watchpoint that triggers when the value is read
- `layout` - switch to a text UI, e.g. `layout src` shows the source code above the command prompt
## PDB
[PDB](https://docs.python.org/3/library/pdb.html) is the Python debugger.
Insert `import pdb; pdb.set_trace()` (or, since Python 3.7, just `breakpoint()`)
where you want to drop into PDB, which is basically a hybrid of a debugger
(like GDB) and a Python shell.
## Web browser Developer Tools
Another example of a debugger, this time with a graphical interface.
# strace
Observe system calls a program makes: `strace {program}`.
# Profiling
Types of profiling: CPU, memory, etc.
Simplest profiler: `time`.
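In bash, `time` is a shell keyword that runs a command and then reports three figures. A small sketch; the exact numbers will vary from machine to machine.

```bash
# real: elapsed wall-clock time
# user: CPU time spent running the program's own code
# sys:  CPU time spent in the kernel on the program's behalf
#
# `sleep` just waits, so expect real ≈ 1s while user and sys stay near 0.
time sleep 1

# A busy loop keeps the CPU occupied, so most of its time shows up as user.
time (i=0; while [ "$i" -lt 200000 ]; do i=$((i + 1)); done)
```

A large gap between `real` and `user + sys` usually means the program spent its time waiting (on disk, network, or `sleep`) rather than computing.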
## Go
Run test code with CPU profiler: `go test -cpuprofile=cpu.out`
Analyze profile: `go tool pprof -web cpu.out`
Run test code with Memory profiler: `go test -memprofile=mem.out`
Analyze profile: `go tool pprof -web mem.out`
## Perf
Basic performance stats: `perf stat {command}`
Run a program with the profiler: `perf record {command}`
Analyze profile: `perf report`
================================================
FILE: _2019/remote-machines.md
================================================
---
layout: lecture
title: "Remote Machines"
presenter: Jose
video:
  aspect: 62.5
  id: X5c2Y8BCowM
---
It has become more and more common for programmers to use remote servers in their everyday work. If you need to use remote servers in order to deploy backend software, or you need a server with higher computational capabilities, you will likely end up using Secure Shell (SSH). As with most tools covered, SSH is highly configurable, so it is worth learning about it.
## Executing commands
An often overlooked feature of `ssh` is the ability to run commands directly.
- `ssh foobar@server ls` will execute ls in the home folder of foobar
- It works with pipes, so `ssh foobar@server ls | grep PATTERN` will grep locally the remote output of `ls` and `ls | ssh foobar@server grep PATTERN` will grep remotely the local output of `ls`.
## SSH Keys
Key-based authentication exploits public-key cryptography to prove to the server that the client owns the secret private key without revealing the key. This way you do not need to reenter your password every time. Nevertheless, the private key (e.g. `~/.ssh/id_rsa`) is effectively your password, so treat it like one.
- Key generation. To generate a pair you can simply run `ssh-keygen -t rsa -b 4096`. If you do not choose a passphrase, anyone who gets hold of your private key will be able to access authorized servers, so it is recommended to choose one and use `ssh-agent` to hold the unlocked key, so that you do not have to type the passphrase for every connection.
If you have configured pushing to GitHub using SSH keys, you have probably done the steps outlined [here](https://help.github.com/articles/connecting-to-github-with-ssh/) and have a valid pair already. To check whether a key has a passphrase and to validate it, you can run `ssh-keygen -y -f /path/to/key`, which asks for the passphrase (if there is one) and prints the matching public key.
- Key-based authentication. The SSH server looks at `~/.ssh/authorized_keys` on the remote machine to determine which clients it should let in. To copy a public key over, we can use:
```bash
cat ~/.ssh/id_rsa.pub | ssh foobar@remote 'cat >> ~/.ssh/authorized_keys'
```
A simpler solution can be achieved with `ssh-copy-id` where available.
```bash
ssh-copy-id -i ~/.ssh/id_rsa.pub foobar@remote
```
## Copying files over ssh
There are many ways to copy files over ssh:
- `ssh+tee`: the simplest is to use `ssh` command execution and stdin input by doing `cat localfile | ssh remote_server tee serverfile`
- `scp`: when copying large amounts of files/directories, the secure copy command `scp` is more convenient, since it can easily recurse over paths (with `-r`). The syntax is `scp path/to/local_file remote_host:path/to/remote_file`
- `rsync` improves upon `scp` by detecting identical files in local and remote and preventing copying them again. It also provides more fine grained control over symlinks, permissions and has extra features like the `--partial` flag that can resume from a previously interrupted copy. `rsync` has a similar syntax to `scp`.
## Backgrounding processes
By default, when an ssh connection is interrupted, child processes of the remote shell are killed along with it. There are a couple of alternatives:
- `nohup` - the `nohup` tool lets a process keep running after the terminal is closed. Although this can sometimes be achieved with `&` and `disown`, `nohup` is a better default. More details can be found [here](https://unix.stackexchange.com/questions/3886/difference-between-nohup-disown-and).
- `tmux`, `screen` - whereas `nohup` effectively backgrounds the process, it is not convenient for interactive shell sessions. In that case, using a terminal multiplexer like `screen` or `tmux` is a convenient choice, since one can easily detach and reattach the associated shells.
Lastly, if you disown a program and want to reattach it to the current terminal, you can look into [reptyr](https://github.com/nelhage/reptyr). `reptyr PID` will grab the process with id PID and attach it to your current terminal.
## Port Forwarding
In many scenarios you will run into software that works by listening on ports on the machine. When this happens on your local machine you can simply use `localhost:PORT` or `127.0.0.1:PORT`, but what do you do with a remote server that does not have its ports directly available through the network/internet? The answer is port forwarding, and it
comes in two flavors: Local Port Forwarding and Remote Port Forwarding (see the pictures for more details; the pictures are credited to [this SO post](https://unix.stackexchange.com/questions/115897/whats-ssh-port-forwarding-and-whats-the-difference-between-ssh-local-and-remot)).
**Local Port Forwarding**

**Remote Port Forwarding**

The most common scenario is local port forwarding, where a service on the remote machine listens on a port and you want a port on your local machine to forward to that remote port. For example, suppose we run `jupyter notebook` on the remote server and it listens on port `8888`. To forward that to the local port `9999`, we would run `ssh -L 9999:localhost:8888 foobar@remote_server` and then navigate to `localhost:9999` on our local machine.
## Graphics Forwarding
Sometimes forwarding ports is not enough, since we want to run a GUI-based program on the server. You can always resort to Remote Desktop Software that sends the entire Desktop Environment (e.g. RealVNC, TeamViewer, &c.). However, for a single GUI tool, SSH provides a good alternative: Graphics Forwarding.
Using the `-X` flag tells SSH to forward X11 connections, so GUI programs started on the server display their windows on your local machine.
For trusted X11 forwarding, the `-Y` flag can be used.
A final note: for this to work, the `sshd_config` on the server must have the following options
```bash
X11Forwarding yes
X11DisplayOffset 10
```
## Roaming
A common pain when connecting to a remote server is disconnection due to shutting down/sleeping your computer or changing networks. Moreover, if one has a connection with significant lag, using ssh can become quite frustrating. [Mosh](https://mosh.org/), the mobile shell, improves upon ssh, allowing roaming connections, intermittent connectivity and providing intelligent local echo.
Mosh is available in all common distributions and package managers. Mosh requires an SSH server to be running on the server, since it uses SSH to log in and start `mosh-server`. You do not need to be superuser to install mosh, but the server must accept incoming UDP traffic on the ports mosh uses (by default a port between 60000 and 61000). These are outside the privileged range, so no root access is needed to bind them.
A downside of `mosh` is that it does not support port forwarding or graphics forwarding, so if you use those often, `mosh` won't be of much help.
## SSH Configuration
#### Client
We have covered many, many arguments that we can pass. A tempting alternative is to create shell aliases that look like `alias my_server="ssh -X -i ~/.ssh/id_rsa -L 9999:localhost:8888 foobar@remote_server"`; however, there is a better alternative: using `~/.ssh/config`.
```bash
Host vm
    User foobar
    HostName 172.16.174.141
    Port 22
    IdentityFile ~/.ssh/id_rsa
    LocalForward 9999 localhost:8888

# Configs can also take wildcards
Host *.mit.edu
    User foobaz
```
An additional advantage of using the `~/.ssh/config` file over aliases is that other programs like `scp`, `rsync`, `mosh`, &c are able to read it as well and convert the settings into the corresponding flags.
Note that the `~/.ssh/config` file can be considered a dotfile, and in general it is fine for it to be included with the rest of your dotfiles. However if you make it public, think about the information that you are potentially providing strangers on the internet: the addresses of your servers, the users you are using, the open ports, &c. This may facilitate some types of attacks so be thoughtful about sharing your SSH configuration.
Warning: Never include your RSA keys ( `~/.ssh/id_rsa*` ) in a public repository!
#### Server side
Server-side configuration is usually specified in `/etc/ssh/sshd_config`. Here you can make changes like disabling password authentication, changing ssh ports, enabling X11 forwarding, &c. You can also specify config settings on a per-user basis.
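As an illustrative sketch (the values and the user name `foobar` are only examples), per-user settings go in a `Match` block: lines after `Match User foobar` apply only to that user, until the next `Match` line or the end of the file. After editing, `sudo sshd -t` checks the file for errors, and the SSH service must be restarted or reloaded for changes to take effect (the service is often called `ssh` on Debian/Ubuntu and `sshd` elsewhere).

```
# /etc/ssh/sshd_config (excerpt; example values)
X11Forwarding no

# Everything below applies only to connections that log in as foobar.
Match User foobar
    X11Forwarding yes
```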
## Remote Filesystem
Sometimes it is convenient to mount a remote folder. [sshfs](https://github.com/libfuse/sshfs) can mount a folder on a remote server
locally, and then you can use a local editor.
## Exercises
1. For SSH to work, the host needs to be running an SSH server. Install an SSH server (such as OpenSSH) in a virtual machine so you can do the rest of the exercises. To find the IP address of the machine, run the command `ip addr` and look for the `inet` field (ignore the `127.0.0.1` entry, which corresponds to the loopback interface).
1. Go to `~/.ssh/` and check if you have a pair of SSH keys there. If not, generate them with `ssh-keygen -t rsa -b 4096`. It is recommended that you use a passphrase and `ssh-agent`; more info [here](https://www.ssh.com/ssh/agent).
1. Use `ssh-copy-id` to copy the key to your virtual machine. Test that you can ssh without a password. Then, edit your `sshd_config` in the server to disable password authentication by editing the value of `PasswordAuthentication`. Disable root login by editing the value of `PermitRootLogin`.
1. Edit the `sshd_config` in the server to change the ssh port and check that you can still ssh. If you ever have a public-facing server, a non-default port and key-only login will deter a significant amount of malicious attacks.
1. Install mosh in your server/VM, establish a connection and then disconnect the network adapter of the server/VM. Can mosh properly recover from it?
1. Another use of local port forwarding is to tunnel traffic for a certain host through the server. If your network filters a website, for example `reddit.com`, you can tunnel it through the server as follows:
- Run `ssh remote_server -L 80:reddit.com:80`
- Set `reddit.com` and `www.reddit.com` to `127.0.0.1` in `/etc/hosts`
- Check that you are accessing that website through the server
- If it is not obvious, use a website such as [ipinfo.io](https://ipinfo.io/), which shows the public IP address your traffic appears to come from.
1. Background port forwarding can easily be achieved with a couple of extra flags. Look into what the `-N` and `-f` flags do in `ssh` and figure out what a command such as this `ssh -N -f -L 9999:localhost:8888 foobar@remote_server` does.
## References
- [SSH Hacks](http://matt.might.net/articles/ssh-hacks/)
- [Secure Secure Shell](https://stribika.github.io/2015/01/04/secure-secure-shell.html)
{% comment %}
Lecture notes will be available by the start of lecture.
{% endcomment %}
================================================
FILE: _2019/security.md
================================================
---
layout: lecture
title: "Security and Privacy"
presenter: Jon
video:
  aspect: 56.25
  id: OBx_c-i-M8s
---
The world is a scary place, and everyone's out to get you.
Okay, maybe not, but that doesn't mean you want to flaunt all your
secrets. Security (and privacy) is generally all about raising the bar
for attackers. Find out what your threat model is, and then design your
security mechanisms around that! If the threat model is the NSA or
Mossad, you're _probably_ going to have a bad time.
There are _many_ ways to make your technical persona more secure. We'll
touch on a lot of high-level things here, but this is a process, and
educating yourself is one of the best things you can do. So:
## Follow the Right People
One of the best ways to improve your security know-how is to follow
other people who are vocal about security. Some suggestions:
- [@TroyHunt](https://twitter.com/TroyHunt)
- [@SwiftOnSecurity](https://twitter.com/SwiftOnSecurity)
- [@taviso](https://twitter.com/taviso)
- [@thegrugq](https://twitter.com/thegrugq)
- [@tqbf](https://twitter.com/tqbf)
- [@mattblaze](https://twitter.com/mattblaze)
- [@moxie](https://twitter.com/moxie)
See also [this
list](https://heimdalsecurity.com/blog/best-twitter-cybersec-accounts/)
for more suggestions.
## General Security Advice
Tech Solidarity has a pretty great list of [do's and don'ts for
journalists](https://web.archive.org/web/20221123204419/https://techsolidarity.org/resources/basic_security.htm)
that has a lot of sane advice, and is decently up-to-date. [@thegrugq](https://medium.com/@thegrugq)
also has a good blog post on [travel security
advice](https://medium.com/@thegrugq/stop-fabricating-travel-security-advice-35259bf0e869)
that's worth reading. We'll repeat much of the advice from those sources
here, plus some more. Also, get a [USB data
blocker](https://www.amazon.com/dp/B00QRRZ2QM/), because [USB is
scary](https://www.bleepingcomputer.com/news/security/heres-a-list-of-29-different-types-of-usb-attacks/).
## Authentication
The very first thing you should do, if you haven't already, is download
a password manager. Some good ones are:
- [1Password](https://1password.com/)
- [KeePass](https://keepass.info/)
- [Bitwarden](https://bitwarden.com/)
- [`pass`](https://git.zx2c4.com/password-store/about/)
If you're particularly paranoid, use one that encrypts the passwords
locally on your computer, as opposed to storing them in plain-text at
the server. Use it to generate passwords
for all the web sites you care about right now. Then, switch on
two-factor authentication, ideally with a
[FIDO/U2F](https://fidoalliance.org/) dongle (a
[YubiKey](https://www.yubico.com/quiz/) for example, which has [20% off
for students](https://www.yubico.com/why-yubico/for-education/)). TOTP
(like Google Authenticator or Duo) will also work in a pinch, but
[doesn't protect against
phishing](https://twitter.com/taviso/status/1082015009348104192). SMS is
pretty much useless unless your threat model only includes random
strangers picking up your password in transit.
Also, a note about paper keys. Often, services will give you a "backup
key" that you can use as a second factor if you lose your real second
factor (btw, always keep a backup dongle somewhere safe!). While you
_can_ stick those in your password managers, that means that should
someone get access to your password manager, you're totally hosed (but
maybe you're okay with that threat model). If you are truly paranoid,
print out these paper keys, never store them digitally, and place them
in a safe in the real world.
## Private Communication
Use [Signal](https://www.signal.org/) ([setup
instructions](https://medium.com/@mshelton/signal-for-beginners-c6b44f76a1f0)).
[Wire](https://wire.com/en/) is [fine
too](https://www.securemessagingapps.com/); WhatsApp is okay; [don't use
Telegram](https://twitter.com/bascule/status/897187286554628096).
Desktop messengers are pretty broken (partially due to usually relying
on Electron, which is a huge trust stack).
E-mail is particularly problematic, even if PGP signed. It's not
generally forward-secure, and the key-distribution problem is pretty
severe. [keybase.io](https://keybase.io/) helps, and is useful for a
number of other reasons. Also, PGP keys are generally handled on desktop
computers, which is one of the least secure computing environments.
Relatedly, consider getting a Chromebook, or just work on a tablet with
a keyboard.
## File Security
File security is hard, and operates on many levels. What is it you're
trying to secure against?
[xkcd 538: Security](https://xkcd.com/538/)
- Offline attacks (someone steals your laptop while it's off): turn on
full disk encryption. ([cryptsetup +
LUKS](https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_a_non-root_file_system)
on Linux,
[BitLocker](https://fossbytes.com/enable-full-disk-encryption-windows-10/)
on Windows, [FileVault](https://support.apple.com/en-us/HT204837) on
macOS). Note that this won't help if the attacker _also_ has you and
really wants your secrets.
- Online attacks (someone has your laptop and it's on): use file
encryption. There are two primary mechanisms for doing so:
- Encrypted filesystems: stacked filesystem encryption software encrypts files individually rather than encrypting whole block devices. You can "mount" such a filesystem by providing the decryption key, and then browse the files inside it freely. When you unmount it, those files are all unavailable. Modern solutions include [gocryptfs](https://github.com/rfjakob/gocryptfs) and [eCryptFS](http://ecryptfs.org/). More detailed comparisons can be found [here](https://nuetzlich.net/gocryptfs/comparison/) and [here](https://wiki.archlinux.org/index.php/disk_encryption#Comparison_table).
- Encrypted files: encrypt individual files with symmetric
encryption (see `gpg -c`) and a secret key. Or, like `pass`, also
encrypt the key with your public key so only you can read it back
later with your private key. Exact encryption settings matter a
lot!
- [Plausible
deniability](https://en.wikipedia.org/wiki/Plausible_deniability)
(what seems to be the problem officer?): usually lower performance,
and easier to lose data. Hard to actually prove that it provides
[deniable
encryption](https://en.wikipedia.org/wiki/Deniable_encryption)! See
the [discussion
here](https://security.stackexchange.com/questions/135846/is-plausible-deniability-actually-feasible-for-encrypted-volumes-disks),
and then consider whether you may want to try
[VeraCrypt](https://www.veracrypt.fr/en/Home.html) (the maintained
fork of good ol' TrueCrypt).
- Encrypted backups: use [Tarsnap](https://www.tarsnap.com/) or [Borgbase](https://www.borgbase.com/)
- Think about whether an attacker can delete your backups if they
get a hold of your laptop!
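Symmetric file encryption with `gpg -c` (short for `--symmetric`) looks roughly like this. The sketch assumes a recent GnuPG 2.x, and the flags `--batch --pinentry-mode loopback --passphrase` are there only so it runs without a prompt. Passing a passphrase on the command line can expose it to other users through the process list, so in real use drop those flags and let `gpg` ask for it.

```bash
# Work in a scratch directory with a file to protect.
cd "$(mktemp -d)"
printf 'my secret notes\n' > notes.txt

# Encrypt with a passphrase; this writes notes.txt.gpg next to the original.
gpg --batch --yes --pinentry-mode loopback --passphrase 'example passphrase' \
    -c notes.txt

# Decrypt into a new file and compare it with the original.
gpg --batch --yes --pinentry-mode loopback --passphrase 'example passphrase' \
    -o decrypted.txt -d notes.txt.gpg
cmp notes.txt decrypted.txt && echo "round trip OK"
```

Keep in mind that the plaintext `notes.txt` is still on disk afterwards, and a plain `rm` does not overwrite its contents.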
## Internet Security & Privacy
The internet is a _very_ scary place. Open WiFi networks
[are](https://www.troyhunt.com/the-beginners-guide-to-breaking-website/)
[scary](https://www.troyhunt.com/talking-with-scott-hanselman-on/). Make
sure you remove them from your saved networks afterwards; otherwise your phone
will happily announce and re-connect to something with the same name later!
If you're ever on a network you don't trust, a VPN _may_ be worthwhile,
but keep in mind that you're trusting the VPN provider _a lot_. Do you
really trust them more than your ISP? If you truly want a VPN, use a
provider you're sure you trust, and you should probably pay for it. Or
set up [WireGuard](https://www.wireguard.com/) for yourself -- it's
[excellent](https://web.archive.org/web/20210526211307/https://latacora.micro.blog/there-will-be/)!
There are also secure configuration settings for a lot of internet-enabled
applications at [cipherlist.eu](https://cipherlist.eu/). If you're particularly
privacy-oriented, [privacytools.io](https://privacytools.io) is also a good
resource.
Some of you may wonder about [Tor](https://www.torproject.org/). Keep in
mind that Tor is _not_ particularly resistant to powerful global
attackers, and is weak against traffic analysis attacks. It may be
useful for hiding traffic on a small scale, but won't really buy you all
that much in terms of privacy. You're better off using more secure
services in the first place (Signal, TLS + certificate pinning, etc.).
## Web Security
So, you want to go on the Web too?
Jeez, you're really pushing your luck here.
Install [HTTPS Everywhere](https://www.eff.org/https-everywhere).
SSL/TLS is
[critical](https://www.troyhunt.com/ssl-is-not-about-encryption/), and
it's _not_ just about encryption, but also about being able to verify
that you're talking to the right service in the first place! If you run
your own web server, [test it](https://www.ssllabs.com/ssltest/index.html). TLS configuration
[can get hairy](https://wiki.mozilla.org/Security/Server_Side_TLS).
HTTPS Everywhere will do its very best to never navigate you to HTTP
sites when there's an alternative. That doesn't save you, but it helps.
If you're truly paranoid, blacklist any SSL/TLS CAs that you don't
absolutely need.
Install [uBlock Origin](https://github.com/gorhill/uBlock). It is a
[wide-spectrum
blocker](https://github.com/gorhill/uBlock/wiki/Blocking-mode) that
doesn't just stop ads, but all sorts of third-party communication a page
may try to do. And inline scripts and such. If you're willing to spend
some time on configuration to make things work, go to [medium
mode](https://github.com/gorhill/uBlock/wiki/Blocking-mode:-medium-mode)
or even [hard
mode](https://github.com/gorhill/uBlock/wiki/Blocking-mode:-hard-mode).
Those _will_ make some sites not work until you've fiddled with the
settings enough, but will also significantly improve your online
security.
If you're using Firefox, enable [Multi-Account
Containers](https://support.mozilla.org/en-US/kb/containers). Create
separate containers for social networks, banking, shopping, etc. Firefox
will keep the cookies and other state for each of the containers totally
separate, so sites you visit in one container can't snoop on sensitive
data from the others. In Google Chrome, you can use [Chrome
Profiles](https://support.google.com/chrome/answer/2364824) to achieve
similar results.
## Exercises
1. Encrypt a file using PGP
1. Use veracrypt to create a simple encrypted volume
1. Enable 2FA for your most data-sensitive accounts, e.g. Gmail, Dropbox, GitHub, &c.
================================================
FILE: _2019/shell.md
================================================
---
layout: lecture
title: "Shell and Scripting"
presenter: Jon
video:
  aspect: 56.25
  id: dbDRfmH5uSI
---
The shell is an efficient, textual interface to your computer.
The shell prompt: what greets you when you open a terminal.
Lets you run programs and commands; common ones are:
- `cd` to change directory
- `ls` to list files and directories
- `mv` and `cp` to move and copy files
But the shell lets you do _so_ much more; you can invoke any program on
your computer, and command-line tools exist for doing pretty much
anything you may want to do. And they're often more efficient than their
graphical counterparts. We'll go through a bunch of those in this class.
The shell provides an interactive programming language ("scripting").
There are many shells:
- You've probably used `sh` or `bash`.
- Also shells whose syntax mimics a programming language: `csh` (C-like).
- Or "better" shells: `fish`, `zsh`, `ksh`.
In this class we'll focus on the ubiquitous `sh` and `bash`, but feel
free to play around with others. I like `fish`.
Shell programming is a *very* useful tool in your toolbox.
Can either write programs directly at the prompt, or into a file.
`#!/bin/sh` + `chmod +x` to make a shell script executable.
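For example, a minimal script file (the name `hello.sh` is arbitrary):

```bash
# Work in a scratch directory.
cd "$(mktemp -d)"

# The first line (the "shebang") says which interpreter runs the file.
cat > hello.sh <<'EOF'
#!/bin/sh
for i in $(seq 1 3); do echo hello; done
EOF

chmod +x hello.sh    # mark the file as executable
./hello.sh           # prints "hello" three times
```

The `./` prefix is needed because the current directory is normally not in `$PATH`.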
## Working with the shell
Run a command a bunch of times:
```bash
for i in $(seq 1 5); do echo hello; done
```
There's a lot to unpack:
- `for x in list; do BODY; done`
- `;` terminates a command -- equivalent to newline
- split `list`, assign each to `x`, and run body
- splitting is "whitespace splitting", which we'll get back to
- no curly braces in shell, so `do` + `done`
- `$(seq 1 5)`
- run the program `seq` with arguments `1` and `5`
- substitute entire `$()` with the output of that program
- equivalent to
```bash
for i in 1 2 3 4 5
```
- `echo hello`
- everything in a shell script is a command
- in this case, run the `echo` command (which prints its arguments)
with the argument `hello`.
- all commands are searched for in `$PATH` (colon-separated)
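To see that search path, and where a particular command is found:

```bash
# One directory per line, in the order the shell searches them.
echo "$PATH" | tr ':' '\n'

# The file the shell would run for `seq` (e.g. /usr/bin/seq).
command -v seq

# Builtins such as `echo` are handled by the shell itself before $PATH
# is searched, so bash prints just the name here.
command -v echo
```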
We have variables:
```bash
for f in $(ls); do echo $f; done
```
Will print each file name in the current directory.
Can also set variables using `=` (no space!):
```bash
foo=bar
echo $foo
```
There are a bunch of "special" variables too:
- `$1` to `$9`: arguments to the script
- `$0` name of the script itself
- `$#` number of arguments
- `$$` process ID of current shell
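One way to experiment with these at the prompt, without writing a script, is `set --`, which replaces the current shell's positional parameters:

```bash
set -- hello "big world"

echo "$#"    # 2: the quotes kept "big world" together as one argument
echo "$1"    # hello
echo "$2"    # big world
echo "$0"    # the shell's name (e.g. bash) at a prompt, or the script's name
echo "$$"    # process ID of this shell; varies from run to run
```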
To only print directories
```bash
for f in $(ls); do if test -d $f; then echo dir $f; fi; done
```
More to unpack here:
- `if CONDITION; then BODY; fi`
- `CONDITION` is a command; if it returns with exit status 0
(success), then `BODY` is run.
- can also hook in an `else` or `elif`
- again, no curly braces, so `then` + `fi`
- `test` is another program that provides various checks and
comparisons, and exits with 0 if they're true (`$?`)
- `man COMMAND` is your friend: `man test`
- can also be invoked with `[` + `]`: `[ -d $f ]`
- take a look at `man test` and `which "["`
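A quick way to see that `test` answers through its exit status (the path `/no/such/dir` is just a made-up nonexistent directory):

```bash
# `test` prints nothing; its answer is its exit status, which $? holds.
test -d /tmp && echo "exit status $?: /tmp is a directory"
test -d /no/such/dir || echo "exit status $?: no such directory"

# `[` performs the same check under another name; the closing `]` is required.
if [ -d /tmp ]; then echo "/tmp is a directory"; fi
```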
But wait! This is wrong! What if a file is called "My Documents"?
- `for f in $(ls)` expands to `for f in My Documents`
- first do the test on `My`, then on `Documents`
- not what we wanted!
- biggest source of bugs in shell scripts
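The problem is easy to reproduce in an empty scratch directory:

```bash
cd "$(mktemp -d)"
mkdir "My Documents"

# $(ls) is split on whitespace, so the loop runs twice: once for "My"
# and once for "Documents". Neither of those is an existing directory.
for f in $(ls); do
    if test -d $f; then echo "dir $f"; else echo "not a dir: $f"; fi
done
```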
## Argument splitting
Bash splits arguments by whitespace; not always what you want!
- need to use quoting to handle spaces in arguments
`for f in "My Documents"` would work correctly
- same problem somewhere else -- do you see where?
`test -d $f`: if `$f` contains whitespace, `test` will error!
- `echo` happens to be okay, because it joins the split words back
together with single spaces; but runs of spaces collapse into one, and
a newline in a filename turns into a space!
- quote all use of variables that you don't want split
- but how do we fix our script above?
what do you think `for f in "$(ls)"` does?
Globbing is the answer!
- bash knows how to look for files using patterns:
- `*` any string of characters
- `?` any single character
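- `[abc]` any one of these characters
- `{a,b,c}` is related but different: _brace expansion_ produces `a b c`
whether or not such files exist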
- `for f in *`: all files in this directory
- when globbing, each matching file becomes its own argument
- still need to make sure to quote when _using_: `test -d "$f"`
- can make advanced patterns:
- `for f in a*`: all files starting with `a` in the current directory
- `for f in foo/*.txt`: all `.txt` files in `foo`
- `for f in foo/*/p??.txt`
all three-letter text files starting with p in subdirs of `foo`
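To see the problem and the glob-based fix side by side, here is a sketch that runs in a throwaway directory from `mktemp -d` (so it does not touch your own files). The directory `My Documents` and the file `notes.txt` are made-up examples.

```bash
# Work in a scratch directory so nothing real is affected.
demo=$(mktemp -d)
mkdir "$demo/My Documents"
touch "$demo/notes.txt"
cd "$demo"

echo "broken: for f in \$(ls)"
for f in $(ls); do
    # $(ls) is split on whitespace, so "My Documents" becomes two words.
    if test -d $f; then echo "dir $f"; fi
done                                  # prints nothing at all

echo "fixed: for f in *"
for f in *; do
    # The glob yields one argument per file; quoting keeps it intact.
    if test -d "$f"; then echo "dir $f"; fi
done                                  # prints: dir My Documents
```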
Whitespace issues don't stop there:
- `if [ $foo = "bar" ]; then` -- see the issue?
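- what if `$foo` is empty? the arguments to `[` are just `=`, `bar`,
and `]`, which is a syntax error
- quoting fixes this: `[ "$foo" = "bar" ]`; you _can_ also work around
it with `[ x$foo = "xbar" ]`, but bleh
- or use `[[`: a bash keyword with special parsing that does not
word-split variables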
- also allows `&&` instead of `-a`, `||` over `-o`, etc.
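A short sketch of the empty-variable pitfall, reusing the name `foo` from the text. The unquoted `[` version is expected to fail with a syntax error (bash reports something like `unary operator expected`), while the quoted `[` and the `[[` versions simply evaluate to false.

```bash
foo=""

# Unquoted: [ only sees "=", "bar" and "]", which is a syntax error.
if [ $foo = "bar" ]; then echo "match"; else echo "no match (or error)"; fi

# Quoting keeps the empty string as an argument: [ "" = "bar" ].
if [ "$foo" = "bar" ]; then echo "match"; else echo "no match"; fi

# [[ is parsed by bash itself and does not word-split $foo.
if [[ $foo = "bar" ]]; then echo "match"; else echo "no match"; fi

# [[ also accepts && and || between tests.
if [[ -z $foo && $foo != "bar" ]]; then echo "foo is empty"; fi
```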
<!-- TODO: arrays? $@. ${array[@]} vs "${array[@]}". -->
## Composability
Shell is powerful in part because of composability. Can chain multiple
programs together rather than have one program that does everything.
The key character is `|` (pipe).
- `a | b` means run both `a` and `b` at the same time,
send all output of `a` as input to `b`,
and print the output of `b`
All programs you launch ("processes") have three "streams":
- `STDIN`: when the program reads input, it comes from here
- `STDOUT`: when the program prints something, it goes here
- `STDERR`: a 2nd output the program can choose to use
- by default, `STDIN` is your keyboard, `STDOUT` and `STDERR` are both
your terminal. but you can change that!
- `a | b` makes `STDOUT` of `a` `STDIN` of `b`.
- also have:
- `a > foo` (`STDOUT` of `a` goes to the file `foo`)
- `a 2> foo` (`STDERR` of `a` goes to the file `foo`)
- `a < foo` (`STDIN` of `a` is read from the file `foo`)
- hint: `tail -f` will print a file as it's being written
- why is this useful? lets you manipulate output of a program!
- `ls | grep foo`: all files whose names contain `foo`
- `ps -A | grep foo`: all processes whose `ps` line contains `foo`
- `journalctl | grep -i intel | tail -n5`:
last 5 system log messages with the word intel (case insensitive)
- `who | sendmail me@example.com`
send the list of logged-in users to `me@example.com`
- forms the basis for much data-wrangling, as we'll cover later
Bash also provides a number of other ways to compose programs.
You can group commands with `(a; b) | tac`: run `a`, then `b`, and send
all their output to `tac`, which prints its input in reverse order.
A lesser-known, but super useful one is _process substitution_.
`b <(a)` will run `a`, generate a temporary file-name for its output
stream, and pass that file-name to `b`. For example:
```bash
diff <(journalctl -b -1 | head -n20) <(journalctl -b -2 | head -n20)
```
will show you the difference between the first 20 lines of the last boot
log and the one before that.
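If you don't have `journalctl` at hand, the same ideas can be tried on tiny inputs. This sketch groups two `echo`s into one stream and uses process substitution to feed `diff` two "files" that never exist on disk.

```bash
# Group two commands and pipe their combined output to tac (reverses lines).
(echo first; echo second) | tac       # prints: second, then first

# <(...) is replaced by a file name (often something like /dev/fd/63).
echo <(true)

# diff wants two files; process substitution supplies them.
diff <(printf 'a\nb\n') <(printf 'a\nc\n') || echo "(inputs differ, so diff exits with 1)"
# diff prints:
# 2c2
# < b
# ---
# > c
```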
<!-- TODO: exit codes? -->
## Job and process control
What if you want to run longer-term things in the background?
- the `&` suffix runs a program "in the background"
- it will give you back your prompt immediately
- handy if you want to run two programs at the same time
like a server and client: `server & client`
- note that the running program still has your terminal as `STDOUT`!
try: `server > server.log & client`
- see all such processes with `jobs`
- notice that it shows "Running"
- bring it to the foreground with `fg %JOB` (no argument is latest)
- if you want to background the current program: `^Z` + `bg` (Here `^Z` means pressing `Ctrl+Z`)
- `^Z` stops the current process and makes it a "job"
- `bg` runs the last job in the background (as if you did `&`)
- background jobs are still tied to your current session, and exit if
you log out. `disown` lets you sever that connection. or use `nohup`.
- `$!` is pid of last background process
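A minimal sketch of the background-job workflow, using `sleep 30` as a stand-in for a long-running program. It starts the job with `&`, grabs its PID from `$!`, and then stops it early.

```bash
# Start a long-running command in the background; the prompt returns at once.
sleep 30 &
# $! is the PID of the most recent background process.
pid=$!
echo "started sleep with PID $pid"

# List this shell's jobs; sleep shows up as Running.
jobs

# Stop it early. kill sends SIGTERM unless told otherwise.
kill "$pid"

# wait collects the exit status: 143 = 128 + 15, i.e. killed by SIGTERM.
wait "$pid" || echo "sleep exited with status $?"
```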
<!-- TODO: process output control (^S and ^Q)? -->
What about other stuff running on your computer?
- `ps` is your friend: lists running processes
- `ps -A`: print processes from all users (also `ps ax`)
- `ps` has *many* arguments: see `man ps`
- `pgrep`: find processes by searching (like `ps -A | grep`)
- `pgrep -af`: search and display with arguments
- `kill`: send a _signal_ to a process by ID (`pkill` by search + `-f`)
- signals tell a process to "do something"
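- most common: `SIGTERM` (`-15` or `-TERM`): ask it to exit gracefully;
this is what `kill` sends by default
- `SIGKILL` (`-9` or `-KILL`): the process ends *now*; it cannot be
caught or ignored, so the program gets no chance to clean up
- `^C` sends `SIGINT` (interrupt) and `^\` sends `SIGQUIT` to the
foreground process; like `SIGTERM`, both can be caught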
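To see the difference between a catchable and an uncatchable signal, this sketch uses a hypothetical "program" (a `bash -c` loop) that installs a `SIGTERM` handler with `trap`. The one-second pause before each `kill` is an assumption that gives the child time to set up its trap.

```bash
# A stand-in long-running program that catches SIGTERM and cleans up.
prog='trap "echo cleaning up; exit 0" TERM; while true; do sleep 1; done'

bash -c "$prog" &
pid=$!
sleep 1                      # let the child install its trap
kill -TERM "$pid"            # polite request: the handler runs, then it exits
status=0; wait "$pid" || status=$?
echo "after SIGTERM: $status"        # 0, because the handler exited cleanly

bash -c "$prog" &
pid=$!
sleep 1
kill -KILL "$pid"            # cannot be caught: the handler never runs
status=0; wait "$pid" || status=$?
echo "after SIGKILL: $status"        # 137 = 128 + 9
```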
## Flags
Most command line utilities take parameters using **flags**. Flags usually come in short form (`-h`) and long form (`--help`). Usually running `CMD -h` or `man CMD` will give you a list of the flags the program takes.
Short flags can usually be combined: running `rm -r -f` is equivalent to running `rm -rf` or `rm -fr`.
Some common flags are a de facto standard and you will see them in many applications:
* `-a` commonly refers to all files (i.e. also including those that start with a period)
* `-f` usually refers to forcing something, like `rm -f`
* `-h` displays the help for most commands
* `-v` usually enables a verbose output
* `-V` usually prints the version of the command
Also, a double dash `--` is used by built-in commands and many other commands to signify the end of command options, after which only positional parameters are accepted. So if you have a file called `-v` (which you can) and want to grep it, `grep pattern -- -v` will work whereas `grep pattern -v` won't. In fact, one way to create such a file is `touch -- -v`.
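A quick sketch of the `--` trick in a scratch directory, using the same file name `-v` and a made-up pattern `hello`.

```bash
cd "$(mktemp -d)"

touch -- -v             # without --, touch would treat -v as an option
echo hello > -v         # redirection targets are never parsed as options
grep hello -- -v        # searches the file named -v; prints: hello
rm -- -v                # same idea for deleting it (rm ./-v also works)
```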
## Exercises
1. If you are completely new to the shell you may want to read a more comprehensive guide about it such as [BashGuide](http://mywiki.wooledge.org/BashGuide). If you want a more in-depth introduction [The Linux Command Line](http://linuxcommand.org/tlcl.php) is a good resource.
1. **PATH, which, type**
We briefly discussed that the `PATH` environment variable is used to locate the programs that you run through the command line. Let's explore that a little further:
- Run `echo $PATH` (or `echo $PATH | tr -s ':' '\n'` for pretty printing) and examine its contents. What locations are listed?
- The command `which` locates a program in the user PATH. Try running `which` for common commands like `echo`, `ls` or `mv`. Note that `which` is a bit limited since it does not understand shell aliases. Try running `type` and `command -v` for those same commands. How is the output different?
- Run `PATH=` and try running the previous commands again. Some work and some don't; can you figure out why?
1. **Special Variables**
- What does `~` expand to? What do `.` and `..` refer to?
- What does the variable `$?` hold?
- What does the variable `$_` hold?
- What does `!!` expand to? What about `!!*`? And `!l`? (These are history expansions rather than variables.)
- Look for documentation for these features and familiarize yourself with them
1. **xargs**
Sometimes piping doesn't quite work because the command being piped into expects its input as command-line arguments rather than as newline-separated text on STDIN. For example, the `file` command tells you properties of the files named in its arguments.
Try running `ls | file` and `ls | xargs file`. What is `xargs` doing?
1. **Shebang**
When you write a script you can specify to your shell what interpreter should be used to interpret the script by using a [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) line. Write a script called `hello` with the following contents, and make it executable with `chmod +x hello`. Then execute it with `./hello`. Then remove the first line and execute it again. What happens? How is the shell using that first line?
```bash
#! /usr/bin/python3
print("Hello World!")
```
You will often see programs that have a shebang that looks like `#!/usr/bin/env bash`. This is a more portable solution with its own set of [advantages and disadvantages](https://unix.stackexchange.com/questions/29608/why-is-it-better-to-use-usr-bin-env-name-instead-of-path-to-name-as-my). How is `env` different from `which`? What environment variable does `env` use to decide what program to run?
1. **Pipes, process substitution, subshell**
Create a script called `slow_seq.sh` with the following contents and do `chmod +x slow_seq.sh` to make it executable.
```bash
#! /usr/bin/env bash
for i in $(seq 1 10); do
    echo "$i"
    sleep 1
done
```
There is a way in which pipes (and process substitution) differ from command substitution, i.e. `$()`, which runs the command in a subshell. Run the following commands and observe the differences:
- `./slow_seq.sh | grep -P "[3-6]"`
- `grep -P "[3-6]" <(./slow_seq.sh)`
- `echo $(./slow_seq.sh) | grep -P "[3-6]"`
1. **Misc**
- Try running `touch {a,b}{a,b}` then `ls`. What appeared?
- Sometimes you want to see a program's STDOUT and still pipe it to a file. Try running `echo HELLO | tee hello.txt`
- Try running `cat hello.txt > hello.txt`. What do you expect to happen? What does happen?
- Run `echo HELLO > hello.txt` and then run `echo WORLD >> hello.txt`. What are the contents of `hello.txt`? How is `>` different from `>>`?
- Run `printf "\e[38;5;81mfoo\e[0m\n"`. How was the output different? If you want to know more, search for ANSI color escape sequences.
- Run `touch a.txt` then run `^txt^log`. What did bash do for you? In the same vein, run `fc`. What does it do?
{% comment %}
TODO
1. **parallel**
- set -e, set -x
- traps
{% endcomment %}
1. **Keyboard shortcuts**
As with any application you use frequently, it is worth familiarising yourself with its keyboard shortcuts. Type the following ones and try to figure out what they do and in what scenarios they might be convenient. For some of them it might be easier to search online for what they do. (Remember that `^X` means pressing `Ctrl+X`.)
- `^A`, `^E`
- `^R`
- `^L`
- `^C`, `^\` and `^D`
- `^U` and `^Y`
================================================
FILE: _2019/version-control.md
================================================
---
layout: lecture
title: "Version Control"
presenter: Jon
video:
aspect: 56.25
id: 3fig2Vz8QXs
---
Whenever you are working on something that changes over time, it's
useful to be able to _track_ those changes. This can be for a number of
reasons: it gives you a record of what changed, how to undo it, who
changed it, and possibly even why. Version control systems (VCS) give
you that ability. They let you _commit_ changes to a set of files, along
with a message describing the change, as well as look at and undo
changes you've made in the past.
Most VCS support sharing the commit history between multiple users. This
allows for convenient collaboration: you can see the changes I've made,
and I can see the changes you've made. And since the VCS tracks
_changes_, it can often (though not always) figure out how to combine
our changes as long as they touch relatively disjoint things.
There are [_a
lot_](https://en.wikipedia.org/wiki/Comparison_of_version-control_software)
of VCSes out there that differ greatly in what they support, how they
function, and how you interact with them. Here, we'll focus on
[git](https://git-scm.com/), one of the more commonly used ones, but I
recommend you also take a look at
[Mercurial](https://www.mercurial-scm.org/).
With that all said -- to the cliffnotes!
## Is git dark magic?
not quite; you need to understand the data model.
we're going to skip over some of the details, but roughly speaking,
the _core_ "thing" in git is a commit.
- every commit has a unique name, "revision hash"
a long hash like `998622294a6c520db718867354bf98348ae3c7e2`
often shortened to a short (unique-ish) prefix: `9986222`
- commit has author + commit message
- also has the hash of any _ancestor commits_
usually just the hash of the previous commit
- commit also represents a _diff_, a representation of how you get from
the commit's ancestors to the commit (e.g., remove this line in this
file, add these lines to this file, rename that file, etc.)
- in reality, git stores a full snapshot of the files at each commit
(unchanged files are shared between commits) and computes diffs on
demand
- probably don't want to store big files that change!
initially, the _repository_ (roughly: the folder that git manages) has
no content, and no commits. let's set that up:
```console
$ git init hackers
$ cd hackers
$ git status
```
the output here actually gives us a good starting point. let's dig in
and make sure we understand it all.
first, "On branch master".
- don't want to use hashes all the time.
- branches are names that point to hashes.
- master is traditionally the name of the default branch, the one that
tracks the "latest" commit (newer setups often call it `main`).
every time a new commit is made, the master name will be made to
point to the new commit's hash.
- special name `HEAD` refers to "current" name
- you can also make your own names with `git branch` (or `git tag`)
we'll get back to that
let's skip over "No commits yet" because that's all there is to it.
then, "nothing to commit".
- every commit contains a diff with all the changes you made.
but how is that diff constructed in the first place?
- _could_ just always commit _all_ changes you've made since the last
commit
- sometimes you want to only commit some of them (e.g., not `TODO`s)
- sometimes you want to break up a change into multiple commits to
give a separate commit message for each one
- git lets you _stage_ changes to construct a commit
- add changes to a file or files to the staged changes with `git add`
- add only some changes in a file with `git add -p`
- `git add -u` stages changes to all tracked ("known") files;
`git add -A` also picks up new files
- remove a file and stage its removal with `git rm`
- empty the set of staged changes `git reset`
- note that this does *not* change any of your files!
it *only* means that no changes will be included in a commit
- to remove only some staged changes:
`git reset FILE` or `git reset -p`
- check staged changes with `git diff --staged`
- see remaining changes with `git diff`
- when you're happy with the stage, make a commit with `git commit`
- if you just want to commit *all* changes to tracked files:
`git commit -a` (new, untracked files still need `git add`)
- `git help add` has a bunch more helpful info
while you're playing with the above, try to run `git status` to see what
git thinks you're doing -- it's surprisingly helpful!
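A scripted walk through the staging commands above, in a throwaway repository. The user name and email are placeholder values set only for this repository so that `git commit` works on a fresh machine; `hello.txt` is an arbitrary file.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity, set only for this repository.
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > hello.txt
git status --short            # ?? hello.txt  (untracked)
git add hello.txt
git diff --staged --stat      # what the next commit would contain
git commit -q -m "Add hello.txt"

echo "world" >> hello.txt
git status --short            #  M hello.txt  (modified, not staged)
git diff                      # the unstaged change
git commit -q -a -m "Append world"    # -a stages changes to tracked files
git log --oneline
```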
## A commit you say...
okay, we have a commit, now what?
- we can look at recent changes: `git log` (or `git log --oneline`)
- we can look at the full changes: `git log -p`
- we can show a particular commit: `git show master`
- or with `-p` for full diff/patch
- we can go back to the state at a commit using `git checkout NAME`
- if `NAME` is a commit hash, git says we're "detached". this just
means `HEAD` points straight at a commit rather than at a branch name,
so if we make commits, no branch will point to them and no-one will
know about them.
- we can revert a change with `git revert NAME`
- creates a new commit that applies the diff in the commit at `NAME`
in reverse.
- we can compare an older version to this one using `git diff NAME..`
- `a..b` is a commit _range_. if either is left out, it means `HEAD`.
- we can show all the commits between using `git log NAME..`
- `-p` works here too
- we can change `master` to point to a particular commit (effectively
undoing everything since) with `git reset NAME`:
- huh, why? wasn't `reset` to change staged changes?
reset has a "second" form (see `git help reset`) which sets `HEAD`
to the commit pointed to by the given name.
- notice that this didn't change any files -- `git diff` now
effectively shows `git diff NAME..`.
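A sketch of looking back, reverting, and resetting in a scratch repository with two made-up commits, `v1` and `v2`. The identity is again a placeholder. Note how `git reset HEAD~1` moves the branch without touching the file on disk.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo v1 > file.txt && git add file.txt && git commit -q -m "v1"
echo v2 > file.txt && git commit -q -a -m "v2"
git log --oneline              # v2, then v1

git show HEAD~1:file.txt       # the file as of the older commit: v1
git diff HEAD~1..              # everything that changed since then

git revert --no-edit HEAD      # a new commit undoing v2; file.txt is v1 again
cat file.txt

git reset HEAD~1               # move the branch back to "v2"; files untouched
git status --short             #  M file.txt  (working copy still says v1)
```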
## What's in a name?
clearly, names are important in git. and they're the key to
understanding *a lot* of what goes on in git. so far, we've talked about
commit hashes, master, and `HEAD`. but there's more!
- you can make your own branches (like master) with `git branch b`
- creates a new name, `b`, which points to the commit at `HEAD`
- you're still "on" master though, so if you make a new commit,
master will point to that new commit, `b` will not.
- switch to a branch with `git checkout b`
- any commits you make will now update the `b` name
- switch back to master with `git checkout master`
- all your changes in `b` are hidden away
- a very handy way to be able to easily test out changes
- tags are other names that never change; annotated tags also have
their own message. often used to mark releases + changelogs.
- `NAME^` means "the commit before `NAME`" (its first parent)
- can apply recursively: `NAME^^^`
- you _most likely_ mean `~` when you use `^`
- `~N` goes back `N` generations along first parents, whereas `^N`
picks the `N`th parent of a merge commit
- `~~` is the same as `^^`
- with `~` you can also write `X~3` for "3 commits older than `X`"
- you don't want `^3` (that means "the third parent of a merge")
- `git diff HEAD^`
- `-` means "the previous name" (e.g., `git checkout -` switches back
to the branch you were on before)
- most commands operate on `HEAD` unless you give another argument
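A sketch of these names in a scratch repository with three made-up commits and a branch `b`, as in the text. The default branch is read with `git symbolic-ref` because it may be called `master` or `main` depending on your git configuration.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do
    echo "$n" > n.txt
    git add n.txt
    git commit -q -m "commit $n"
done
main=$(git symbolic-ref --short HEAD)   # master or main, depending on config

git rev-parse --short HEAD~2 HEAD^^     # the same commit, printed twice
git log -1 --format=%s HEAD^            # commit 2

git branch b                            # new name pointing at HEAD
git checkout -q b                       # commits now move b, not $main
echo 4 > n.txt && git commit -q -a -m "commit 4 (on b)"
git checkout -q -                       # back to the previous branch
git log --oneline "$main"..b            # commits on b that $main lacks
```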
## Clean up your mess
your commit history will _very_ often end up as:
- `add feature x` -- maybe even with a commit message about `x`!
- `forgot to add file`
- `fix bug`
- `typo`
- `typo2`
- `actually fix`
- `actually actually fix`
- `tests pass`
- `fix example code`
- `typo`
- `x`
- `x`
- `x`
- `x`
that's _fine_ as far as git is concerned, but is not very helpful to
your future self, or to other people who are curious about what has
changed. git lets you clean up these things:
- `git commit --amend`: fold staged changes into previous commit
- note that this _changes_ the previous commit, giving it a new hash!
- `git rebase -i HEAD~13` is _magical_.
for each commit from past 13, choose what to do:
- default is `pick`: keep the commit as-is
- `r`: change commit message
- `e`: change commit (add or remove files)
- `s`: combine commit with previous and edit commit message
- `f`: "fixup" -- combine commit with previous; discard commit msg
- at the end, `HEAD` is made to point to what is now the last commit
- often referred to as _squashing_ commits
- what it really does: rewind `HEAD` to rebase start point, then
re-apply the commits in order as directed.
- `git reset --hard NAME`: reset the state of all files to that of
`NAME` (or `HEAD` if no name is given). handy for undoing changes, but
it throws away uncommitted changes to tracked files, so be careful!
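`git rebase -i` normally opens an editor for the todo list. As a non-interactive sketch, this uses `git commit --fixup` together with `--autosquash`, and sets `GIT_SEQUENCE_EDITOR=true` so the prepared list (with the fixup commit already marked `fixup`) is accepted unedited. That is a substitute for editing the list by hand, not the usual way to run it; the file names and messages are made up.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo base > README && git add README && git commit -q -m "initial"
echo "feature x" > x.txt && git add x.txt && git commit -q -m "add feature x"

# Forgot something: fold it into the previous commit (it gets a new hash).
echo "more x" >> x.txt && git add x.txt && git commit -q --amend --no-edit

# Later, a typo fix that really belongs to "add feature x".
echo "typo fixed" >> x.txt && git add x.txt
git commit -q --fixup=HEAD          # creates "fixup! add feature x"

# Squash it in: --autosquash marks the fixup commit as `f` in the todo list.
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash HEAD~2
git log --oneline                   # just "add feature x" and "initial"
```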
## Playing with others
a common use-case for version control is to allow multiple people to
make changes to a set of files without stepping on each other's toes.
or rather, to make sure that _if_ they step on each other's toes, they
won't just silently overwrite each other's changes.
git is a _distributed_ VCS: everyone has a local copy of the entire
repository (well, of everything others have chosen to publish). some
VCSes are _centralized_ (e.g., subversion): a server has all the
commits, clients only have the files they have "checked out". basically,
they only have the _current_ files, and need to ask the server if they
want anything else.
every copy of a git repository can be listed as a "remote". you can copy
an existing git repository using `git clone ADDRESS` (instead of `git
init`). this creates a remote called _origin_ that points to `ADDRESS`.
you can fetch names and the commits they point to from a remote with
`git fetch REMOTE`. all names at a remote are available to you as
`REMOTE/NAME`, and you can use them just like local names.
if you have write access to a remote, you can change names at the remote
to point to commits you've made using `git push`. for example, let's
make the master name (branch) at the remote `origin` point to the commit
that our master branch currently points to:
- `git push origin master:master`
- for convenience, you can set `origin/master` as the default target
for when you `git push` from the current branch with `-u`
- consider: what does this do? `git push origin master:HEAD^`
often you'll use GitHub, GitLab, BitBucket, or something else as your
remote. there's nothing "special" about that as far as git is concerned.
it's all just names and commits. if someone makes a change to master and
updates `github/master` to point to their commit (we'll get back to
that in a second), then when you `git fetch github`, you'll be able to
see their changes with `git log github/master`.
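You can try remotes without any hosting service: a bare repository on your own disk behaves like one. In this sketch, `server.git`, `alice`, and `bob` are made-up names, and the branch name is read from the clone rather than assumed.

```bash
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git       # stands in for GitHub, GitLab, ...

git clone -q server.git alice       # creates a remote called "origin"
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo hi > hi.txt && git add hi.txt && git commit -q -m "first"
branch=$(git symbolic-ref --short HEAD)
git push -q -u origin "$branch"     # -u: origin/$branch becomes the default target
git remote -v                       # origin points at the server.git repository
cd ..

git clone -q server.git bob         # a second copy of the same history
git -C bob log --oneline
```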
## Working with others
so far, branches seem pretty useless: you can create them, do work on
them, but then what? eventually, you'll just make master point to them
anyway, right?
- what if you had to fix something while working on a big feature?
- what if someone else made a change to master in the meantime?
inevitably, you will have to _merge_ changes in one branch with changes
in another, whether those changes are made by you or someone else. git
lets you do this with, unsurprisingly, `git merge NAME`. `merge` will:
- look for the latest point where `HEAD` and `NAME` shared a commit
ancestor (i.e., where they diverged)
- (try to) apply all changes made on `NAME` since then to the current
`HEAD`
- produce a commit that contains all those changes, and lists both
`HEAD` and `NAME` as its ancestors
- set `HEAD` to that commit's hash
once your big feature has been finished, you can merge its branch into
master, and git will ensure that you don't lose any changes from either
branch!
if you've used git in the past, you may recognize `merge` by a different
name: `pull`. when you do `git pull REMOTE BRANCH`, that is:
- `git fetch REMOTE`
- `git merge REMOTE/BRANCH`
- where, like `push`, `REMOTE` and `BRANCH` are often omitted and use
the "tracking" remote branch (remember `-u`?)
this usually works _great_, as long as the changes to the branches being
merged are disjoint. if they are not, you get a _merge conflict_. sounds
scary...
- a merge conflict is just git telling you that it doesn't know what
the final diff should look like
- git pauses and asks you to finish staging the "merge commit"
- open the conflicted file in your editor and look for lots of angle
brackets (`<<<<<<<`). the stuff above `=======` is the change made in
the `HEAD` since the shared ancestor commit. the stuff below is the
change made in the `NAME` since the shared commit.
- `git mergetool` is pretty handy -- opens a diff editor
- once you've _resolved_ the conflict by figuring out what the file
should now look like, stage those changes with `git add`.
- when all the conflicts are resolved, finish with `git commit`
- you can give up with `git merge --abort`
you've just resolved your first git merge conflict! \o/
now you can publish your finished changes with `git push`
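To practice on a conflict without risking real work, this sketch manufactures one in a scratch repository: two branches change the same line of a made-up `config.txt`. The marker layout shown in the comments assumes git's default conflict style.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "color = blue" > config.txt
git add config.txt && git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)

# Two branches change the same line in different ways.
git checkout -q -b feature
echo "color = green" > config.txt && git commit -q -a -m "green"
git checkout -q "$main"
echo "color = red" > config.txt && git commit -q -a -m "red"

git merge feature || echo "merge stopped: conflict in config.txt"
cat config.txt
# With the default conflict style this shows:
# <<<<<<< HEAD
# color = red
# =======
# color = green
# >>>>>>> feature

# Resolve by writing the content we actually want, then finish the merge.
echo "color = purple" > config.txt
git add config.txt
git commit -q --no-edit
git log --oneline --graph
```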
## When worlds collide
when you `push`, git checks that no-one else's work is lost if you
update the remote name you're pushing to. it does this by checking
that the current commit of the remote name is an ancestor of the commit
you are pushing. if it is, git can safely just update the name; this is
called _fast-forwarding_. if it is not, git will refuse to update the
remote name, and tell you there have been changes.
if your push is rejected, what do you do?
- merge remote changes with `git pull` (i.e., `fetch` + `merge`)
- force the push with `--force`: this will lose other people's changes!
- there's also `--force-with-lease`, which will only force the change
if the remote name hasn't changed since the last time you fetched
from that remote. much safer!
- if you've rebased local commits that you've previously pushed
("history rewriting"; probably don't do this), you'll have to force
push. think about why!
- try to re-apply your changes "on top of" the changes made remotely
- this is a `rebase`!
- rewind all local commits since shared ancestor
- fast-forward `HEAD` to commit at remote name
- apply local commits in-order
- may have conflicts you have to manually resolve
- `git rebase --continue` or `--abort`
- lots more [here](https://git-scm.com/book/en/v2/Git-Branching-Rebasing)
- `git pull --rebase` will start this process for you
- whether you should merge or rebase is a hot topic! some good reads:
- [this](https://www.atlassian.com/git/tutorials/merging-vs-rebasing)
- [this](http://web.archive.org/web/20210106220723/https://derekgourlay.com/blog/git-when-to-merge-vs-when-to-rebase/)
- [this](https://stackoverflow.com/questions/804115/when-do-you-use-git-rebase-instead-of-git-merge)
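A sketch of a rejected push and the `git pull --rebase` recovery, using a local bare repository as the shared remote and two made-up users, Alice and Bob. Everything happens in a scratch directory.

```bash
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare server.git
git clone -q server.git alice
git -C alice config user.name "Alice"
git -C alice config user.email "alice@example.com"
echo a > alice/a.txt && git -C alice add a.txt && git -C alice commit -q -m "a"
branch=$(git -C alice symbolic-ref --short HEAD)
git -C alice push -q origin "$branch"

git clone -q server.git bob
git -C bob config user.name "Bob"
git -C bob config user.email "bob@example.com"

# Both commit on top of "a"; Alice pushes first.
echo b > alice/b.txt && git -C alice add b.txt && git -C alice commit -q -m "alice: b"
git -C alice push -q origin "$branch"
echo c > bob/c.txt && git -C bob add c.txt && git -C bob commit -q -m "bob: c"

# Bob's push is not a fast-forward, so the remote refuses it.
git -C bob push -q origin "$branch" || echo "push rejected"

# Re-apply Bob's commit on top of Alice's, then push again.
git -C bob pull -q --rebase origin "$branch"
git -C bob push -q origin "$branch"
git -C bob log --oneline            # bob: c, alice: b, a (linear history)
```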
# Further reading
[xkcd 1597](https://xkcd.com/1597/)
- [Learn git branching](https://learngitbranching.js.org/)
- [How to explain git in simple words](https://smusamashah.github.io/blog/2017/10/14/explain-git-in-simple-words)
- [Git from the bottom up](https://jwiegley.github.io/git-from-the-bottom-up/)
- [Git for computer scientists](http://eagain.net/articles/git-for-computer-scientists/)
- [Oh shit, git!](https://ohshitgit.com/)
- [The Pro Git book](https://git-scm.com/book/en/v2)
# Exercises
1. In a repo, try modifying an existing file. What happens when you do `git stash`? What do you see when running `git log --all --oneline`? Run `git stash pop` to undo what you did with `git stash`. In what scenario might this be useful?
1. One common mistake when learning git is committing large files that should not be managed by git, or adding sensitive information. Try adding a file to a repository, making some commits and then deleting that file from history (you may want to look at [this](https://help.github.com/articles/removing-sensitive-data-from-a-repository/)). Also, if you do want git to manage large files for you, look into [Git-LFS](https://git-lfs.github.com/).
1. Git is really convenient for undoing changes, but one has to be familiar with how to undo even the most unlikely changes:
1. If a file is mistakenly modified in some commit it can be reverted with `git revert`. However if a commit involves several changes `revert` might not be the best option. How can we use `git checkout` to recover a file version from a specific commit?
1. Create a branch, make a commit in said branch and then delete it. Can you still recover said commit? Try looking into `git reflog`. (Note: recover dangling things quickly; git will periodically clean up commits that nothing points to.)
1. If one is too trigger-happy with `git reset --hard` instead of `git reset`, changes can easily be lost. However, since the changes were staged, we can recover them (look into `git fsck --lost-found` and `.git/lost-found`).
1. In any git repo, look under the folder `.git/hooks`: you will find a bunch of scripts that end with `.sample`. If you rename them to drop the `.sample` suffix, they will run based on their name. For instance, `pre-commit` will execute before doing a commit. Experiment with them.
1. Like many command line tools, `git` provides a configuration file (or dotfile) called `~/.gitconfig`. Create an alias using `~/.gitconfig` so that when you run `git graph` you get the output of `git log --oneline --decorate --all --graph` (this is a good command to quickly visualize the commit graph).
1. Git also lets you define global ignore patterns in a file such as `~/.gitignore_global`; this is useful to prevent common errors like adding RSA keys. Create a `~/.gitignore_global` file, add the pattern `*rsa`, point git at it with `git config --global core.excludesFile ~/.gitignore_global`, and then test that it works in a repo.
1. Once you start to get more familiar with `git`, you will find yourself running into common tasks, such as editing your `.gitignore`. [git extras](https://github.com/tj/git-extras/blob/master/Commands.md) provides a bunch of little utilities that integrate with `git`. For example `git ignore PATTERN` will add the specified pattern to the `.gitignore` file in your repo and `git ignore-io LANGUAGE` will fetch the common ignore patterns for that language from [gitignore.io](https://www.gitignore.io). Install `git extras` and try using some tools like `git alias` or `git ignore`.
1. Git GUI programs can be a great resource sometimes. Try running [gitk](https://git-scm.com/docs/gitk) in a git repo and explore the different parts of the interface. Then run `gitk --all`. What are the differences?
1. Once you get used to command line applications, GUI tools can feel cumbersome/bloated. A nice compromise between the two are ncurses-based tools, which can be navigated from the command line and still provide an interactive interface. Git has [tig](https://github.com/jonas/tig); try installing it and running it in a repo. You can find some usage examples [here](https://www.atlassian.com/blog/git/git-tig).
{% comment %}
- forced push + `--force-with-lease`
- git merge/rebase --abort
- git blame
- exercise about why rebasing public commits is bad
{% endcomment %}
================================================
FILE: _2019/virtual-machines.md
================================================
---
layout: lecture
title: "Virtual Machines and Containers"
presenter: Anish, Jon
video:
aspect: 56.25
id: LJ9ki5zq6Ik
---
# Virtual Machines
Virtual machines are simulated computers. You can configure a guest virtual
machine with some operating system and configuration and use it without
affecting your host environment.
For this class, you can use VMs to experiment with operating systems, software,
and configurations without risk: you won't affect your primary development
environment.
In general, VMs have lots of uses. They are commonly used for running software
that only runs on a certain operating system (e.g. using a Windows VM on Linux
to run Windows-specific software). They are often used for experimenting with
potentially malicious software.
## Useful features
- **Isolation**: hypervisors do a pretty good job of isolating the guest from
the host, so you can use VMs to run buggy or untrusted software reasonably
safely.
- **Snapshots**: you can take "snapshots" of your virtual machine, capturing
the entire machine state (disk, memory, etc.), make changes to your machine,
and then restore to an earlier state. This is useful for testing out
potentially destructive actions, among other things.
## Disadvantages
Virtual machines are generally slower than running on bare metal, so they may
be unsuitable for certain applications.
## Setup
- **Resources**: shared with host machine; be aware of this when allocating
physical resources.
- **Networking**: many options, default NAT should work fine for most use
cases.
- **Guest addons**: many hypervisors can install software in the guest to
enable nicer integration with host system. You should use this if you can.
## Resources
- Hypervisors
- [VirtualBox](https://www.virtualbox.org/) (open-source)
- [Virt-manager](https://virt-manager.org/) (open-source, manages KVM virtual machines and LXC containers)
- [VMware](https://www.vmware.com/) (commercial, available from IS&T [for
MIT students](https://ist.mit.edu/vmware-fusion))
If you are already familiar with popular hypervisors/VMs, you may want to learn how to manage them in a more command-line-friendly way. One option is the [libvirt](https://wiki.libvirt.org/page/UbuntuKVMWalkthrough) toolkit, which allows you to manage multiple different virtualization providers/hypervisors.
## Exercises
1. Download and install a hypervisor.
1. Create a new virtual machine and install a Linux distribution (e.g.
[Debian](https://www.debian.org/)).
1. Experiment with snapshots. Try things that you've always wanted to try, like
running `sudo rm -rf --no-preserve-root /`, and see if you can recover
easily.
1. Read what a [fork-bomb](https://en.wikipedia.org/wiki/Fork_bomb) (`:(){ :|:& };:`) is and run it on the VM to see that the resource isolation (CPU, Memory, &c) works.
1. Install guest addons and experiment with different windowing modes, file
sharing, and other features.
# Containers
Virtual Machines are relatively heavy-weight; what if you want to spin
up machines in an automated fashion? Enter containers!
- Amazon Firecracker
- Docker
- rkt
- lxc
Containers are _mostly_ just an assembly of various Linux kernel
features, like namespaces (separate views of processes, mounts, network
interfaces, and users), cgroups (resource limits), chroot-style
filesystem isolation, and the like, that together give the appearance
of virtualization.
Not quite as secure or isolated as a VM, but pretty close and getting
better. Usually higher performance, and much faster to start, but not
always.
The performance boost comes from the fact that, unlike VMs, which run an entire copy of the operating system, containers share the Linux kernel with the host. However, note that if you are running Linux containers on Windows/macOS, a Linux VM will need to be active as a middle layer between the two.

_Comparison between Docker containers and Virtual Machines. Credit: blog.docker.com_
Containers are handy for when you want to run an automated task in a
standardized setup:
- Build systems
- Development environments
- Pre-packaged servers
- Running untrusted programs
- Grading student submissions
- (Some) cloud computing
- Continuous integration
    - Travis CI
    - GitHub Actions
Moreover, container software like Docker has also been used extensively as a solution for [dependency hell](https://en.wikipedia.org/wiki/Dependency_hell). If a machine needs to run many services with conflicting dependencies, they can be isolated using containers.
Usually, you write a file that defines how to construct your container.
You start with some minimal _base image_ (like Alpine Linux), and then
a list of commands to run to set up the environment you want (install
packages, copy files, build stuff, write config files, etc.). Normally,
there's also a way to specify any external ports that should be
available, and an _entrypoint_ that dictates what command should be run
when the container is started (like a grading script).
Similar to code repository websites (like [GitHub](https://github.com/)), there are container registries (like [Docker Hub](https://hub.docker.com/)) where many software services publish prebuilt images that you can easily deploy.
## Exercises
1. Choose a container platform (Docker, LXC, …) and install a simple Linux image. Try SSHing into it.
1. Search for and download a prebuilt container image for a popular web server (nginx, apache, …).
================================================
FILE: _2019/web.md
================================================
---
layout: lecture
title: "Web and Browsers"
presenter: Jose
video:
  aspect: 62.5
  id: XpZO3S8odec
---
Apart from the terminal, the web browser is a tool you will find yourself spending significant amounts of time in. Thus it is worth learning how to use it efficiently.
## Shortcuts
Clicking around in your browser is often not the fastest option; getting familiar with common shortcuts can really pay off in the long run.
- `Middle Button Click` on a link opens it in a new tab
- `Ctrl+T` Opens a new tab
- `Ctrl+Shift+T` Reopens a recently closed tab
- `Ctrl+L` selects the contents of the search bar
- `Ctrl+F` to search within a webpage. If you do this often, you may benefit from an extension that supports regular expressions in searches.
## Search operators
Web search engines like Google or DuckDuckGo provide search operators to enable more elaborate web searches:
- `"bar foo"` enforces an exact match of bar foo
- `foo site:bar.com` searches for foo within bar.com
- `foo -bar` excludes results containing bar from the search
- `foobar filetype:pdf` searches for files with that extension
- `(foo|bar)` searches for matches that have foo OR bar
More thorough lists are available for popular engines like [Google](https://ahrefs.com/blog/google-advanced-search-operators/) and [DuckDuckGo](https://duck.co/help/results/syntax).
## Searchbar
The searchbar is a powerful tool too. Most browsers can infer search engines from websites and will store them. By editing an engine's keyword, you can trigger that engine directly from the searchbar:
- In Google Chrome they are in [chrome://settings/searchEngines](chrome://settings/searchEngines)
- In Firefox they are in [about:preferences#search](about:preferences#search)
For example, you can set things up so that typing `y SOME SEARCH TERMS` searches YouTube directly.
Moreover, if you own a domain, you can set up subdomain forwards using your registrar. For instance, I have mapped `https://ht.josejg.com` to this course website. That way I can just type `ht.` and the searchbar will autocomplete. Another good feature of this setup is that, unlike bookmarks, these shortcuts work in every browser.
## Privacy extensions
Nowadays surfing the web can get quite annoying due to ads, and invasive due to trackers. Moreover, a good adblocker not only blocks most ad content but also blocks sketchy and malicious websites, since they tend to be included in the common blacklists. Adblockers can also reduce page load times by cutting down the number of requests performed. A couple of recommendations are:
- **uBlock Origin** ([Chrome](https://chrome.google.com/webstore/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm), [Firefox](https://addons.mozilla.org/en-US/firefox/addon/ublock-origin/)): blocks ads and trackers based on predefined rules. You should also take a look at the enabled blacklists in settings, since you can enable more based on your region or browsing habits. You can even install filters from [around the web](https://github.com/gorhill/uBlock/wiki/Filter-lists-from-around-the-web).
- **[Privacy Badger](https://privacybadger.org/)**: detects and blocks trackers automatically, such as the ones ad companies use to follow you from website to website and build a profile of you.
- **[HTTPS everywhere](https://www.eff.org/https-everywhere)** is a wonderful extension that redirects to HTTPS version of a website automatically, if available.
You can find more add-ons of this kind [here](https://www.privacytools.io/privacy-browser-addons/).
## Style customization
Web browsers are just another piece of software running on _your machine_, and thus you usually have the final say about what they should display or how they should behave. One example of this is custom styles. Browsers determine how to render the style of a webpage using Cascading Style Sheets, often abbreviated as CSS.
You can inspect the source code of a website with the browser's developer tools and temporarily change its contents and styles (this is also a reason why you should never trust webpage screenshots).
If you want to permanently tell your browser to override the style settings for a webpage, you will need to use an extension. Our recommendation is **[Stylus](https://github.com/openstyles/stylus)** ([Firefox](https://addons.mozilla.org/en-US/firefox/addon/styl-us/), [Chrome](https://chrome.google.com/webstore/detail/stylus/clngdbkpkpeebahjckkjfobafhncgmne?hl=en)).
For example, we can write the following style for the class website:
```css
body {
    background-color: #2d2d2d;
    color: #eee;
    font-family: Fira Code;
    font-size: 16pt;
}
a:link {
    text-decoration: none;
    color: #0a0;
}
```
Moreover, Stylus can find styles written by other users and published on [userstyles.org](https://userstyles.org/). Most common websites, for instance, have one or several dark theme stylesheets. Note that you should not use Stylish, since it was shown to leak user data ([more here](https://arstechnica.com/information-technology/2018/07/stylish-extension-with-2m-downloads-banished-for-tracking-every-site-visit/)).
## Functionality Customization
In the same way that you can modify the style, you can also modify the behaviour of a website by writing custom JavaScript and then loading it with a browser extension such as [Tampermonkey](https://tampermonkey.net/).
For example, the following script enables vim-like navigation using the J and K keys.
```js
// ==UserScript==
// @name VIM HT
// @namespace http://tampermonkey.net/
// @version 0.1
// @description Vim JK for our website
// @author You
// @match https://hacker-tools.github.io/*
// @grant none
// ==/UserScript==
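(function() {
    'use strict';
    window.addEventListener('keyup', function(e) {
        // Ignore keys typed into text fields so J/K can still be typed there
        var target = e.target;
        if (target.isContentEditable || target.tagName === 'INPUT' || target.tagName === 'TEXTAREA') {
            return;
        }
        var key = e.key.toLowerCase(); // e.keyCode is deprecated; e.key is 'j', 'k', ...
        if (key === 'j') {          // J scrolls down
            window.scrollBy(0, 500);
        } else if (key === 'k') {   // K scrolls up
            window.scrollBy(0, -500);
        }
    });
})();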
```
There are also script repositories such as [OpenUserJS](https://openuserjs.org/) and [Greasy Fork](https://greasyfork.org/en). However, be warned, installing user scripts from others can be very dangerous since they can pretty much do anything such as steal your credit card numbers. Never install a script unless you read the whole thing yourself, understand what it does, and are absolutely sure that you know it isn't doing anything suspicious. Never install a script that contains minified or obfuscated code that you can't read!
## Web APIs
It has become more and more common for web services to offer an application programming interface (a web API) so that you can interact with the service by making web requests.
A more in-depth introduction to the topic can be found [here](https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Client-side_web_APIs/Introduction). There are [many public APIs](https://github.com/toddmotto/public-apis). Web APIs can be useful for many reasons:
- **Retrieval**. Web APIs can easily provide information such as maps, weather, or your public IP address. For instance, `curl ipinfo.io` will return a JSON object with some details about your public IP, region, location, &c. With proper parsing, these APIs can even be integrated into command-line tools. The following bash function talks to Google's autocompletion API and returns the first ten matches:
```bash
function c() {
    url='https://www.google.com/complete/search?client=hp&hl=en&xhr=t'
    # NB: user-agent must be specified to get back UTF-8 data!
    curl -H 'user-agent: Mozilla/5.0' -sSG --data-urlencode "q=$*" "$url" |
        jq -r ".[1][][0]" |
        sed 's,</\?b>,,g'
}
```
- **Interaction**. Web API endpoints can also be used to trigger actions. These usually require some sort of authentication token that you can obtain through the service. For example, running
`curl -X POST -H 'Content-type: application/json' --data '{"text":"Hello, World!"}' "https://hooks.slack.com/services/$SLACK_TOKEN"` will send a `Hello, World!` message to a Slack channel.
- **Piping**. Since some services with web APIs are rather popular, common web API "gluing" has already been implemented and is offered as a hosted service. This is the case for services like [If This Then That](https://ifttt.com/) and [Zapier](https://zapier.com/).
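The `jq` and `sed` stages of the `c` function, and the JSON body of the `curl -X POST` example, can also be handled in a general-purpose language. The Python sketch below is illustrative only: it assumes the autocomplete response is a JSON array whose second element is a list of `[suggestion, ...]` entries (which is what `.[1][][0]` selects), and it only *builds* the webhook request without sending it; the webhook URL is a placeholder.
```python
import json
import re
import urllib.request


def parse_completions(raw_json):
    # Equivalent of: jq -r ".[1][][0]" | sed 's,</\?b>,,g'
    data = json.loads(raw_json)
    return [re.sub(r"</?b>", "", entry[0]) for entry in data[1]]


def build_webhook_request(url, text):
    """Build (but do not send) the JSON POST that the curl example performs."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    sample = '["hel", [["<b>hel</b>lo", 0], ["hel<b>p</b>", 0]]]'
    print(parse_completions(sample))  # ['hello', 'help']
    req = build_webhook_request("https://hooks.slack.com/services/PLACEHOLDER", "Hello, World!")
    print(req.get_method(), req.data)
    # urllib.request.urlopen(req) would actually send it.
```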
## Web Automation
Sometimes web APIs are not enough. If only reading is needed, you can use an HTML parser like `pup` or a library; for example, Python has BeautifulSoup. However, if interactivity or JavaScript execution is required, those solutions fall short. In that case you can use WebDriver, which lets a program control a real browser (tools like [Selenium](https://docs.seleniumhq.org/) implement it).
For example, the following script will save the specified URL to the Wayback Machine by simulating a user typing the address into its save form.
```python
from selenium.webdriver import Firefox
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
def snapshot_wayback(driver, url):
    driver.get("https://web.archive.org/")
    # Selenium 4 removed find_element_by_*; use find_element with a By locator
    elem = driver.find_element(By.CLASS_NAME, 'web-save-url-input')
    elem.clear()
    elem.send_keys(url)
    elem.send_keys(Keys.RETURN)
driver = Firefox()
url = 'https://hacker-tools.github.io'
try:
    snapshot_wayback(driver, url)
finally:
    driver.quit()  # always shut the browser down, even if the lookup fails
```
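For the read-only case mentioned above, where no JavaScript has to run, you do not need a browser at all. As a minimal sketch using only Python's standard-library `html.parser` (a stand-in for tools like `pup` or BeautifulSoup, not an equivalent of them; the class and function names are ours), the following collects every link target from a page's HTML:
```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="/lectures">Lectures</a> <a name="top">top</a></p>'
    print(extract_links(page))  # ['/lectures']
```
To parse a live page, pass it the decoded text of `urllib.request.urlopen(url).read()`; pages that build their links with JavaScript will still need WebDriver.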
## Exercises
1. Edit a keyword search engine that you use often in your web browser
1. Install the mentioned extensions. Look into how uBlock Origin/Privacy Badger can be disabled for a website. What differences do you see? Try doing it in a website with plenty of ads like YouTube.
1. Install Stylus and write a custom style for the class website using the CSS provided. Here are some common programming characters `= == === >= => ++ /= ~=`. What happens to them when changing the font to Fira Code? If you want to know more search for programming font ligatures.
1. Find a web API to get the weather in your city/area.
1. Use a WebDriver software like [Selenium](https://docs.seleniumhq.org/) to automate some repetitive manual task that you perform often with your browser.
================================================
FILE: _2020/command-line.md
================================================
---
layout: lecture
title: "Command-line Environment"
date: 2020-01-21
ready: true
sync: true
syncdate: 2025-08-16
video:
  aspect: 56.25
  id: e8BO_dYxk5c
solution:
  ready: true
  url: command-line-solution
---
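In this lecture we will go through several ways in which you can improve your workflow when using the shell.
We have been working with the shell for a while now, but so far we have mainly focused on executing different commands. We will now see how to run several processes at the same time while keeping track of them, how to stop or pause a specific process, and how to make a process run in the background.
We will also learn about different ways to improve your shell and other tools, by defining aliases and configuring them using dotfiles. Both of these can save you a lot of time, e.g. by letting you use the same configuration on all your machines with just a few simple commands. We will also look at how to work with remote machines using SSH.
# Job Control
In some cases you will need to interrupt a job while it is executing, for instance if a command is taking too long to complete (such as a `find` with a very large directory structure to search through). Most of the time, you can press `Ctrl-C` and the command will stop. But how does this actually work, and why does it sometimes fail to stop the process?
## Killing a process
Your shell uses a UNIX communication mechanism called a _signal_ to communicate information to the process. When a process receives a signal, it stops its execution, deals with the signal, and potentially changes its flow of execution based on the information that the signal delivered. For this reason, signals are _software interrupts_.
In our case, typing `Ctrl-C` prompts the shell to deliver a `SIGINT` signal to the process.
Here's a minimal example of a Python program that captures `SIGINT` and ignores it, so it no longer stops. To kill this program we now have to use the `SIGQUIT` signal instead, which we can send by typing `Ctrl-\`.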
```python
#!/usr/bin/env python
import signal, time
def handler(signum, frame):
    # Runs instead of the default action (terminating the program) on SIGINT
    print("\nI got a SIGINT, but I am not stopping")
signal.signal(signal.SIGINT, handler)
i = 0
while True:
    time.sleep(.1)
    print("\r{}".format(i), end="")
    i += 1
```
What happens if we send `SIGINT` twice to this program, followed by a `SIGQUIT`? Note that `^` is how `Ctrl` is displayed when typed in the terminal:
```
$ python sigint.py
24^C
I got a SIGINT, but I am not stopping
26^C
I got a SIGINT, but I am not stopping
30^\[1]    39913 quit       python sigint.py
```
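While `SIGINT` and `SIGQUIT` are both usually associated with terminal-related requests, a more generic signal for asking a process to exit gracefully is the `SIGTERM` signal. To send this signal we can use the [`kill`](https://www.man7.org/linux/man-pages/man1/kill.1.html) command, with the syntax `kill -TERM <PID>`.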
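A program can treat `SIGTERM` as a request to shut down cleanly. The following sketch is a minimal illustration (the function and flag names are ours, and it assumes a POSIX system): it installs a handler that only records the request, then sends itself `SIGTERM` with `os.kill`, which is what `kill -TERM <PID>` does from the shell.
```python
import os
import signal
import time

stop_requested = False


def on_sigterm(signum, frame):
    # Don't exit inside the handler; just record the request so the main loop can clean up.
    global stop_requested
    stop_requested = True


def run_until_terminated(timeout=2.0):
    """Install the handler, deliver SIGTERM to ourselves, and wait until it is handled."""
    signal.signal(signal.SIGTERM, on_sigterm)
    os.kill(os.getpid(), signal.SIGTERM)  # same as `kill -TERM <our PID>`
    deadline = time.monotonic() + timeout
    while not stop_requested and time.monotonic() < deadline:
        time.sleep(0.01)  # a real program would do its work here
    # A real program would now flush buffers, remove temporary files, etc.
    return stop_requested


if __name__ == "__main__":
    print("graceful shutdown requested:", run_until_terminated())
```
Unlike `SIGKILL`, the process gets a chance to run this cleanup code before exiting.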
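## Pausing and backgrounding processes
Signals can do other things beyond killing a process. For instance, `SIGSTOP` pauses a process. In the terminal, typing `Ctrl-Z` will prompt the shell to send a `SIGTSTP` signal, short for Terminal Stop (i.e. the terminal's version of `SIGSTOP`).
We can then continue the paused job in the foreground or in the background using [`fg`](https://www.man7.org/linux/man-pages/man1/fg.1p.html) or [`bg`](http://man7.org/linux/man-pages/man1/bg.1p.html), respectively.
The [`jobs`](http://man7.org/linux/man-pages/man1/jobs.1p.html) command lists the unfinished jobs associated with the current terminal session. You can refer to those jobs using their pid (you can use [`pgrep`](https://www.man7.org/linux/man-pages/man1/pgrep.1.html) to find that out). More intuitively, you can also refer to a process using the percent symbol followed by its job number (displayed by `jobs`). To refer to the last backgrounded job, you can use the `$!` special parameter, which holds its pid.
One more thing to know is that the `&` suffix in a command will run the command in the background, giving you the prompt back, although it will still use the shell's STDOUT, which can be annoying (use shell redirections in that case).
To background an already running program you can do `Ctrl-Z` followed by `bg`. Note that backgrounded processes are still child processes of your terminal and will die if you close the terminal (this will send yet another signal, `SIGHUP`). To prevent that from happening you can run the program with [`nohup`](https://www.man7.org/linux/man-pages/man1/nohup.1.html) (a wrapper to ignore `SIGHUP`), or use `disown` if the process has already been started. Alternatively, you can use a terminal multiplexer, as we will see in the next section.
Below is a sample session to showcase some of these concepts.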
```
$ sleep 1000
^Z
[1] + 18653 suspended sleep 1000
$ nohup sleep 2000 &
[2] 18745
appending output to nohup.out
$ jobs
[1] + suspended sleep 1000
[2] - running nohup sleep 2000
$ bg %1
[1] - 18653 continued sleep 1000
$ jobs
[1] - running sleep 1000
[2] + running nohup sleep 2000
$ kill -STOP %1
[1] + 18653 suspended (signal) sleep 1000
$ jobs
[1] + suspended (signal) sleep 1000
[2] - running nohup sleep 2000
$ kill -SIGHUP %1
[1] + 18653 hangup sleep 1000
$ jobs
[2] + running nohup sleep 2000
$ kill -SIGHUP %2
$ jobs
[2] + running nohup sleep 2000
$ kill %2
[2] + 18745 terminated nohup sleep 2000
$ jobs
```
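`SIGKILL` is a special signal since it cannot be captured by the process and it will always terminate it immediately. However, it can have bad side effects such as leaving orphaned child processes.
You can learn more about these and other signals [here](<https://en.wikipedia.org/wiki/Signal_(IPC)>) or by typing [`man signal`](https://www.man7.org/linux/man-pages/man7/signal.7.html) or `kill -l`.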
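The signals used in the sample session above (`kill -STOP`, `bg`, `kill`) can also be sent from a program. The sketch below (POSIX only; it assumes a `sleep` binary on `PATH`, and the function name is ours) stops a child, confirms via `waitpid` with `WUNTRACED` that it really is stopped, resumes it with `SIGCONT`, and finally terminates it with `SIGTERM`:
```python
import os
import signal
import subprocess


def stop_continue_terminate():
    child = subprocess.Popen(["sleep", "30"])

    os.kill(child.pid, signal.SIGSTOP)  # like `kill -STOP %1`
    # WUNTRACED makes waitpid also report children that have been stopped.
    _, status = os.waitpid(child.pid, os.WUNTRACED)
    was_stopped = os.WIFSTOPPED(status) and os.WSTOPSIG(status) == signal.SIGSTOP

    os.kill(child.pid, signal.SIGCONT)  # the signal `bg %1` / `fg %1` send
    child.terminate()                   # SIGTERM, like `kill %1`
    child.wait()
    # Popen reports "killed by signal N" as the negative return code -N.
    return was_stopped, child.returncode


if __name__ == "__main__":
    print(stop_continue_terminate())  # (True, -15)
```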
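# Terminal Multiplexers
When using the command line interface you will often want to run more than one thing at once. For instance, you might want to run your editor and your program side by side. Although this can be achieved by opening new terminal windows, using a terminal multiplexer is a more versatile solution.
Terminal multiplexers like [`tmux`](https://www.man7.org/linux/man-pages/man1/tmux.1.html) allow you to multiplex terminal windows using panes and tabs so you can interact with multiple shell sessions.
Moreover, terminal multiplexers let you detach a current terminal session and reattach at some point later in time.
This can make your workflow much better when working with remote machines since it avoids the need to use `nohup` and similar tricks.
The most popular terminal multiplexer these days is [`tmux`](https://www.man7.org/linux/man-pages/man1/tmux.1.html). `tmux` is highly configurable and by using the associated keybindings you can create multiple tabs and panes and quickly navigate through them.
`tmux` expects you to know its keybindings, and they all have the form `<C-b> x`, which means (1) press `Ctrl+b`, (2) release `Ctrl+b`, and then (3) press `x`. `tmux` has the following hierarchy of objects:
- **Sessions** - a session is an independent workspace with one or more windows
    - `tmux` starts a new session.
    - `tmux new -s NAME` starts it with that name.
    - `tmux ls` lists the current sessions
    - Within `tmux`, typing `<C-b> d` detaches the current session
    - `tmux a` attaches the last session. You can use the `-t` flag to specify which
- **Windows** - Equivalent to tabs in editors or browsers, they are visually separate parts of the same session
    - `<C-b> c` Creates a new window. To close it you can just terminate the shell by doing `<C-d>`
    - `<C-b> N` Go to the _N_ th window. Note they are numbered
    - `<C-b> p` Goes to the previous window
    - `<C-b> n` Goes to the next window
    - `<C-b> ,` Rename the current window
    - `<C-b> w` List current windows
- **Panes** - Like vim splits, panes let you have multiple shells in the same visual display.
    - `<C-b> "` Split the current pane horizontally
    - `<C-b> %` Split the current pane vertically
    - `<C-b> <direction>` Move to the pane in the specified _direction_. Direction here means arrow keys.
    - `<C-b> z` Toggle zoom for the current pane
    - `<C-b> [` Start scrollback. You can then press `<space>` to start a selection and `<enter>` to copy that selection.
    - `<C-b> <space>` Cycle through pane arrangements.
Further reading:
[Here](https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/) is a quick tutorial on `tmux` and [this](http://linuxcommand.org/lc3_adv_termmux.php) has a more detailed explanation that covers the original `screen` command. You might also want to familiarize yourself with [`screen`](https://www.man7.org/linux/man-pages/man1/screen.1.html), since it comes installed in most UNIX systems.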
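# Aliases
It can become tiresome typing long commands that involve many flags or verbose options. For this reason, most shells support _aliasing_. A shell alias is a short form for another command that your shell will replace automatically for you. For instance, an alias in bash has the following structure: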
```bash
alias alias_name="command_to_alias arg1 arg2"
```
Note that there is no space around the equal sign `=`, because [`alias`](https://www.man7.org/linux/man-pages/man1/alias.1p.html) is a shell command that takes a single argument.
Aliases have many convenient features:
```bash
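# Make shorthands for common flags
alias ll="ls -lh"
# Save a lot of typing for common commands
alias gs="git status"
alias gc="git commit"
alias v="vim"
# Save you from mistyping
alias sl=ls
# Overwrite existing commands for better defaults
alias mv="mv -i"           # -i prompts before overwrite
alias mkdir="mkdir -p"     # -p make parent dirs as needed
alias df="df -h"           # -h prints human readable format
# Aliases can be composed
alias la="ls -A"
alias lla="la -l"
# To ignore an alias, run it prepended with \
\ls
# Or disable an alias altogether with unalias
unalias la
# To get an alias definition, just call alias with its name
alias ll
# Will print ll='ls -lh'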
```
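Note that aliases do not persist across shell sessions by default. To make an alias persistent you need to include it in shell startup files, like `.bashrc` or `.zshrc`, which we are going to introduce in the next section.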
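To see how composed aliases such as `la` and `lla` above resolve, here is a simplified model of bash's alias expansion written in Python. It is a sketch, not bash's actual algorithm: it only looks at the first word, ignores quoting and the trailing-space rule, and assumes alias values are non-empty. It does capture three real behaviours: expansion repeats while the first word is still an alias, an alias is not expanded again inside its own expansion (which is why `alias mv="mv -i"` does not loop forever), and a leading `\` suppresses expansion.
```python
def expand_alias(command_line, aliases):
    """Expand the first word of command_line using the `aliases` dict."""
    words = command_line.split()
    if not words:
        return command_line
    first, rest = words[0], words[1:]
    if first.startswith("\\"):
        # `\ls` bypasses aliases and runs the command itself
        return " ".join([first[1:]] + rest)
    seen = set()
    while first in aliases and first not in seen:
        seen.add(first)  # never expand the same alias twice
        replacement = aliases[first].split()
        first, rest = replacement[0], replacement[1:] + rest
    return " ".join([first] + rest)


if __name__ == "__main__":
    aliases = {"ls": "ls --color=auto", "la": "ls -A", "lla": "la -l", "mv": "mv -i"}
    print(expand_alias("lla /tmp", aliases))  # ls --color=auto -A -l /tmp
    print(expand_alias("\\ls", aliases))      # ls
```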
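# Dotfiles
Many programs are configured using plain-text files known as _dotfiles_ (because the file names begin with a `.`, e.g. `~/.vimrc`, so that they are hidden in the directory listing `ls` by default).
Shells are one example of programs configured with such files. On startup, your shell will read many files to load its configuration. Depending on the shell, and on whether you are starting a login and/or interactive shell, the entire process can be quite complex. [Here](https://blog.flowblok.id.au/2013-02/shell-startup-scripts.html) is an excellent resource on the topic.
For `bash`, editing your `.bashrc` or `.bash_profile` will work in most systems. Here you can include commands that you want to run on startup, like the aliases we just described or modifications to your environment variables.
In fact, many programs will ask you to include a line like `export PATH="$PATH:/path/to/program/bin"` in your shell configuration file so their binaries can be found.
Some other examples of tools that can be configured through dotfiles are: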
- `bash` - `~/.bashrc`, `~/.bash_profile`
- `git` - `~/.gitconfig`
- `vim` - `~/.vimrc` and the `~/.vim` folder
- `ssh` - `~/.ssh/config`
- `tmux` - `~/.tmux.conf`
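How should you organize your dotfiles? They should be in their own folder, under version control, and **symlinked** into place using a script. This has the benefits of:
- **Easy installation**: if you log in to a new machine, applying your customizations will only take a minute.
- **Portability**: your tools will work the same way everywhere.
- **Synchronization**: you can update your dotfiles anywhere and keep them all in sync.
- **Change tracking**: you're probably going to be maintaining your dotfiles for your entire programming career, and version history is nice to have for long-lived projects.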
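As a minimal sketch of the "symlink into place" step (the `.bak` backup convention and the function name are our own assumptions; dedicated dotfile managers do much more), an installer could look like this in Python:
```python
from pathlib import Path


def install_dotfiles(repo_dir, home_dir, names):
    """Symlink repo_dir/name to home_dir/name for each name.

    An existing file that is not already our symlink is renamed to
    `name.bak` first. Returns the list of links that were created.
    """
    repo, home = Path(repo_dir), Path(home_dir)
    created = []
    for name in names:
        source = (repo / name).resolve()
        target = home / name
        if target.is_symlink() and target.resolve() == source:
            continue  # already installed, so running the script twice is harmless
        if target.exists() or target.is_symlink():
            target.rename(target.with_name(target.name + ".bak"))
        target.symlink_to(source)
        created.append(target)
    return created
```
For example, `install_dotfiles(Path.home() / "dotfiles", Path.home(), [".bashrc", ".vimrc"])` would link two files from a (hypothetical) `~/dotfiles` repository into your home directory. Because it skips links that already point at the repository, the script is safe to rerun on every machine.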
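What should you put in your dotfiles? You can learn about your tool's settings by reading online documentation or [man pages](https://en.wikipedia.org/wiki/Man_page). Another great way is to search the internet for blog posts about specific programs, where authors will tell you about their preferred customizations. Yet another way to learn about customizations is to look through other people's dotfiles: you can find tons of [dotfiles repositories](https://github.com/search?o=desc&q=dotfiles&s=stars&type=Repositories) on GitHub, and the most popular one is [here](https://github.com/mathiasbynens/dotfiles) (we advise you not to blindly copy configurations, though). [Here](https://dotfiles.github.io/) is another good resource on the topic.
All of the class instructors have their dotfiles publicly accessible on GitHub: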
[Anish](https://github.com/anishathalye/dotfiles),
[Jon](https://github.com/jonhoo/configs),
[Jose](https://github.com/jjgo/dotfiles).
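## Portability
A common pain with dotfiles is that the configurations might not work when working with several machines, e.g. if they have different operating systems or shells. Sometimes you also want some configuration to be applied only on a given machine.
There are some tricks for making this easier. If the configuration file supports it, use the equivalent of if-statements to apply machine-specific customizations. For example, your shell could have something like: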
```bash
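# do_something is a placeholder for your own commands
if [[ "$(uname)" == "Linux" ]]; then do_something; fi
# Check before using shell-specific features. $SHELL holds the full path of your
# login shell (e.g. /bin/zsh), so test a variable that only zsh sets instead
if [[ -n "$ZSH_VERSION" ]]; then do_something; fi
# You can also make it machine-specific
if [[ "$(hostname)" == "myServer" ]]; then do_something; fi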
```
If the configuration file supports it, make use of includes. For example, a `~/.gitconfig` can have a setting:
```
[include]
path = ~/.gitconfig_local
```
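And then on each machine, `~/.gitconfig_local` can contain machine-specific settings. You could even track these in a separate repository for machine-specific settings.
This idea is also useful if you want different programs to share some configurations. For instance, if you want both `bash` and `zsh` to share the same set of aliases, you can write them under `.aliases` and have the following block in both: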
```bash
# Test if ~/.aliases exists and source it
if [ -f ~/.aliases ]; then
source ~/.aliases
fi
```
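# Remote Machines
It has become more and more common for programmers to use remote servers in their everyday work. If you need to use remote servers in order to deploy backend software, or you need a server with higher computational capabilities, you will end up using a Secure Shell (SSH). As with most tools covered, SSH is highly configurable, so it is worth learning about it.
To `ssh` into a server you execute a command as follows: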
```bash
ssh foo@bar.mit.edu
```
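Here we are trying to ssh as user `foo` into server `bar.mit.edu`. The server can be specified with a URL (like `bar.mit.edu`) or an IP (something like `foobar@192.168.1.42`). Later we will see that if we modify the ssh config file, you can log in using just something like `ssh bar`.
## Executing commands
An often overlooked feature of `ssh` is the ability to run commands directly.
`ssh foobar@server ls` will execute `ls` in the home folder of foobar.
It works with pipes, so `ssh foobar@server ls | grep PATTERN` will grep locally the remote output of `ls`, and `ls | ssh foobar@server grep PATTERN` will grep remotely the local output of `ls`.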
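When you drive `ssh` from a script, keep in mind that `ssh` joins the remote command's arguments with spaces and hands the result to the remote shell, which splits it again. The hypothetical helpers below sketch one way to keep arguments intact (quoting each word with `shlex.join`, Python 3.8+) and reproduce the `ssh foobar@server ls | grep PATTERN` case, where the filtering happens locally:
```python
import shlex
import subprocess


def ssh_command(host, remote_argv):
    """Build an ssh argv whose remote command survives the remote shell's word splitting."""
    return ["ssh", host, shlex.join(remote_argv)]


def remote_ls_local_filter(host, pattern):
    """Like `ssh host ls | grep -F PATTERN`: ls runs remotely, filtering runs locally."""
    result = subprocess.run(
        ssh_command(host, ["ls"]), capture_output=True, text=True, check=True
    )
    return [line for line in result.stdout.splitlines() if pattern in line]
```
For example, `ssh_command("foobar@server", ["grep", "my pattern", "notes.txt"])` yields `["ssh", "foobar@server", "grep 'my pattern' notes.txt"]`, so the remote `grep` receives `my pattern` as a single argument.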
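## SSH Keys
Key-based authentication exploits public-key cryptography to prove to the server that the client owns the secret private key without revealing the key. This way you do not need to reenter your password every time. Nevertheless, the private key (often `~/.ssh/id_rsa` and more recently `~/.ssh/id_ed25519`) is effectively your password, so treat it like so.
### Key generation
To generate a pair you can run [`ssh-keygen`](http://man7.org/linux/man-pages/man1/ssh-keygen.1.html):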
```bash
ssh-keygen -o -a 100 -t ed25519 -f ~/.ssh/id_ed25519
```
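You should choose a passphrase, to prevent someone who gets hold of your private key from accessing authorized servers. Use [`ssh-agent`](https://www.man7.org/linux/man-pages/man1/ssh-agent.1.html) or [`gpg-agent`](https://linux.die.net/man/1/gpg-agent) so you do not have to type your passphrase every time.
If you have ever configured pushing to GitHub using SSH keys, then you have probably done the steps outlined [here](https://help.github.com/articles/connecting-to-github-with-ssh/) and have a valid key pair already. To check if you have a passphrase and validate it, you can run `ssh-keygen -y -f /path/to/key`.
### Key based authentication
`ssh` will look into `.ssh/authorized_keys` to determine which clients it should let in. To copy a public key over you can use: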
```bash
cat .ssh/id_ed25519.pub | ssh foobar@remote 'cat >> ~/.ssh/authorized_keys'
```
A simpler solution can be achieved with `ssh-copy-id` where available:
```bash
ssh-copy-id -i .ssh/id_ed25519.pub foobar@remote
```
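Conceptually, both commands append your public key as one line of the remote `~/.ssh/authorized_keys`. The following Python sketch shows that step as it might run on the server. It is an approximation (the function name is ours, and the real `ssh-copy-id` performs additional checks): it skips keys that are already present and tightens permissions, since `sshd` may ignore keys in files that other users can write.
```python
import os
from pathlib import Path


def add_authorized_key(pubkey_line, ssh_dir="~/.ssh"):
    """Append one public key to ssh_dir/authorized_keys unless it is already there.

    Returns True if the key was added, False if it was already present.
    """
    ssh_dir = Path(ssh_dir).expanduser()
    ssh_dir.mkdir(parents=True, exist_ok=True)
    auth_file = ssh_dir / "authorized_keys"
    key = pubkey_line.strip()
    content = auth_file.read_text() if auth_file.exists() else ""
    if key in (line.strip() for line in content.splitlines()):
        return False
    with auth_file.open("a") as f:
        if content and not content.endswith("\n"):
            f.write("\n")  # don't glue the new key onto the previous line
        f.write(key + "\n")
    # Only the owner may write these, or sshd (with StrictModes) may ignore the keys.
    os.chmod(ssh_dir, 0o700)
    os.chmod(auth_file, 0o600)
    return True
```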
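## Copying files over SSH
There are many ways to copy files over ssh:
- `ssh+tee`: the simplest is to use `ssh` command execution and STDIN input by doing `cat localfile | ssh remote_server tee serverfile`. Recall that [`tee`](https://www.man7.org/linux/man-pages/man1/tee.1.html) writes what it reads from STDIN into a file (and also echoes it to STDOUT).
- [`scp`](https://www.man7.org/linux/man-pages/man1/scp.1.html): when copying large amounts of files/directories, the secure copy `scp` command is more convenient since it can easily recurse over paths. The syntax is `scp path/to/local_file remote_host:path/to/remote_file`.
- [`rsync`](https://www.man7.org/linux/man-pages/man1/rsync.1.html) improves upon `scp` by detecting identical files in local and remote, and preventing copying them again. It also provides more fine-grained control over symlinks, permissions and has extra features like the `--partial` flag that can resume from a previously interrupted copy. `rsync` has a similar syntax to `scp`.
## Port Forwarding
In many scenarios you will run into software that listens to specific ports in the machine. When this happens on your local machine you can type `localhost:PORT` or `127.0.0.1:PORT`, but what do you do with a remote server that does not have its ports directly available through the network/internet?
This is called _port forwarding_, and it comes in two flavors: Local Port Forwarding and Remote Port Forwarding (see the pictures for more details; the pictures are credited to [this StackOverflow post](https://unix.stackexchange.com/questions/115897/whats-ssh-port-forwarding-and-whats-the-difference-between-ssh-local-and-remot)).
**Local Port Forwarding**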

**Remote Port Forwarding**

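The most common scenario is local port forwarding, where a service on the remote machine listens on a port and you want to link a port on your local machine to forward to the remote port. For example, if we execute `jupyter notebook` on the remote server and it listens on port `8888`, then to forward that to local port `9999` we would run `ssh -L 9999:localhost:8888 foobar@remote_server` and then navigate to `localhost:9999` on our local machine.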
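To make the direction of `-L 9999:localhost:8888` concrete, here is a toy local forwarder in Python: it listens on a local port and relays every connection to a target host and port. This only illustrates the data flow and is not how `ssh` works internally: there is no encryption and no remote end, the target is contacted directly, and all names are ours.
```python
import socket
import threading


def _pump(src, dst):
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            chunk = src.recv(4096)
            if not chunk:
                break
            dst.sendall(chunk)
    except OSError:
        pass  # one side went away; stop forwarding in this direction
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def start_forwarder(local_port, target_host, target_port):
    """Listen on 127.0.0.1:local_port and relay each connection to target_host:target_port."""
    listener = socket.create_server(("127.0.0.1", local_port))

    def accept_loop():
        while True:
            try:
                client, _ = listener.accept()
            except OSError:
                return  # accept() failed (e.g. the listener was closed); stop accepting
            upstream = socket.create_connection((target_host, target_port))
            # One thread per direction, like the two halves of a tunnel.
            threading.Thread(target=_pump, args=(client, upstream), daemon=True).start()
            threading.Thread(target=_pump, args=(upstream, client), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener
```
With a service on port 8888, `start_forwarder(9999, "127.0.0.1", 8888)` makes `localhost:9999` reach it, mirroring what the `ssh -L` command above does through the encrypted connection.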
## SSH Configuration
We have covered many, many arguments that we can pass. A tempting alternative is to create shell aliases that look like:
```bash
alias my_server="ssh -i ~/.ssh/id_ed25519 -p 2222 -L 9999:localhost:8888 foobar@remote_server"
```
However, there is a better alternative using `~/.ssh/config`:
```bash
Host vm
    User foobar
    HostName 172.16.174.141
    Port 2222
    IdentityFile ~/.ssh/id_ed25519
    LocalForward 9999 localhost:8888
# Configs can also take wildcards
Host *.mit.edu
    User foobaz
```
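An additional advantage of using the `~/.ssh/config` file over aliases is that other programs like `scp`, `rsync`, `mosh`, &c are able to read it as well and convert the settings into the corresponding flags.
Note that the `~/.ssh/config` file can be considered a dotfile, and in general it is fine for it to be included with the rest of your dotfiles. However, if you make it public, think about the information that you are potentially providing strangers on the internet: addresses of your servers, users, open ports, &c. This may facilitate some types of attacks, so be thoughtful about sharing your SSH configuration.
Server-side configuration is usually specified in `/etc/ssh/sshd_config`. Here you can make changes like disabling password authentication, changing ssh ports, enabling X11 forwarding, &c. You can also specify config settings on a per-user basis.
## Miscellaneous
A common pain when connecting to a remote server is getting disconnected because your computer shuts down, goes to sleep, or changes networks. Moreover, if the connection has significant lag, using ssh can become quite frustrating. [Mosh](https://mosh.org/), the mobile shell, improves upon ssh, allowing roaming connections, intermittent connectivity and intelligent local echo.
Sometimes it is convenient to mount a remote folder. [sshfs](https://github.com/libfuse/sshfs) can mount a folder on a remote server locally, and then you can use a local editor.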
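# Shells & Frameworks
During the shell tools and scripting lecture we covered the `bash` shell because it is by far the most ubiquitous shell and most systems have it as the default option. Nevertheless, it is not the only option.
For example, the `zsh` shell is largely compatible with `bash` and provides many convenient features out of the box, such as:
- Smarter globbing, `**`
- Inline globbing/wildcard expansion
- Spelling correction
- Better tab completion/selection
- Path expansion (`cd /u/lo/b` will expand as `/usr/local/bin`)
**Frameworks** can improve your shell as well. Some popular general frameworks are [prezto](https://github.com/sorin-ionescu/prezto) or [oh-my-zsh](https://ohmyz.sh/), and smaller ones that focus on specific features such as [zsh-syntax-highlighting](https://github.com/zsh-users/zsh-syntax-highlighting) or [zsh-history-substring-search](https://github.com/zsh-users/zsh-history-substring-search). Shells like [fish](https://fishshell.com/) include many of these user-friendly features by default. Some of these features include:
- Right prompt
- Command syntax highlighting
- History substring search
- Manpage-based flag completions
- Smarter autocompletion
- Prompt themes
One thing to note when using these frameworks is that they may slow down your shell, especially if the code they run is not properly optimized or there is simply too much of it. You can always profile it and disable the features that you do not use often or value less than speed.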
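# Terminal Emulators
Along with customizing your shell, it is worth spending some time figuring out your choice of **terminal emulator** and its settings. There are many, many terminal emulators out there (here is a [comparison](https://anarc.at/blog/2018-04-12-terminal-emulators-1/)).
Since you might be spending hundreds to thousands of hours in your terminal, it pays off to look into its settings. Some of the aspects that you may want to modify in your terminal include:
- Font choice
- Color scheme
- Keyboard shortcuts
- Tab/pane support
- Scrollback configuration
- Performance (some newer terminals like [Alacritty](https://github.com/jwilm/alacritty) or [kitty](https://sw.kovidgoyal.net/kitty/) offer GPU acceleration).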
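# Exercises
[Exercise solutions]({{site.url}}/{{site.solution_url}}/{{page.solution.url}})
## Job control
1. From what we have seen, we can use some `ps aux | grep` commands to get our jobs' pids and then kill them, but there are better ways to do it. Start a `sleep 10000` job in a terminal, background it with `Ctrl-Z` and continue its execution with `bg`. Now use [`pgrep`](https://www.man7.org/linux/man-pages/man1/pgrep.1.html) to find its pid and [`pkill`](https://www.man7.org/linux/man-pages/man1/pgrep.1.html) to kill it without ever typing the pid itself. (Hint: use the `-af` flags.)
2. Say you don't want to start a process until another completes. How would you go about it? In this exercise, our limiting process will always be `sleep 60 &`. One way to achieve this is to use the [`wait`](http://man7.org/linux/man-pages/man1/wait.1p.html) command. Try launching the sleep command and having an `ls` wait until the background process finishes.
   However, this strategy will fail if we start in a different bash session, since `wait` only works for child processes. One feature we did not discuss in the notes is that the `kill` command's exit status will be zero on success and nonzero otherwise. `kill -0` does not send a signal but will give a nonzero exit status if the process does not exist. Write a bash function called `pidwait` that takes a pid and waits until the given process completes. You should use `sleep` to avoid wasting CPU unnecessarily.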
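## Terminal multiplexer
1. Follow this `tmux` [tutorial](https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/) and then learn how to do some basic customizations following [these steps](https://www.hamvocke.com/blog/a-guide-to-customizing-your-tmux-conf/).
## Aliases
1. Create an alias `dc` that resolves to `cd` for when you type it wrongly.
2. Run `history | awk '{$1="";print substr($0,2)}' | sort | uniq -c | sort -n | tail -n 10` to get your top 10 most used commands and consider writing shorter aliases for them. Note: this works for Bash; if you're using ZSH, use `history 1` instead of just `history`.
## Dotfiles
Let's get you up to speed with dotfiles.
1. Create a folder for your dotfiles and set up version control.
2. Add a configuration for at least one program, e.g. your shell, with some customization (to start off, it can be something as simple as customizing your shell prompt by setting `$PS1`).
3. Set up a method to install your dotfiles quickly (and without manual effort) on a new machine. This can be as simple as a shell script that calls `ln -s` for each file, or you could use a [specialized utility](https://dotfiles.github.io/utilities/).
4. Test your installation script on a fresh virtual machine.
5. Migrate all of your current tool configurations to your dotfiles repository.
6. Publish your dotfiles on GitHub.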
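## Remote Machines
Install a Linux virtual machine (or use an already existing one) for this exercise. If you are not familiar with virtual machines, check out [this](https://hibbard.eu/install-ubuntu-virtual-box/) tutorial for installing one.
1. Go to `~/.ssh/` and check if you have a pair of SSH keys there. If not, generate them with `ssh-keygen -o -a 100 -t ed25519`. It is recommended that you use a password and use `ssh-agent`; more info [here](https://www.ssh.com/ssh/agent).
2. Edit `.ssh/config` to have an entry as follows: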
```bash
Host vm
User username_goes_here
HostName ip_goes_here
IdentityFile ~/.ssh/id_ed25519
LocalForward 9999 localhost:8888
```
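3. Use `ssh-copy-id vm` to copy your ssh key to the server.
4. Start a web server in your VM by executing `python -m http.server 8888`. Access the VM web server by navigating to `http://localhost:9999` on your machine.
5. Edit your SSH server config by running `sudo vim /etc/ssh/sshd_config` and disable password authentication by editing the value of `PasswordAuthentication`. Disable root login by editing the value of `PermitRootLogin`. Restart the `ssh` service with `sudo service sshd restart`. Try sshing in again.
6. (Challenge) Install [`mosh`](https://mosh.org/) in the VM and establish a connection. Then disconnect the network adapter of the server/VM. Can mosh properly recover from it?
7. (Challenge) Look into what the `-N` and `-f` flags do in `ssh` and figure out a command to achieve background port forwarding.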
================================================
FILE: _2020/course-shell.md
================================================
---
layout: lecture
title: "Course overview + the shell"
date: 2020-01-13
ready: true
sync: true
syncdate: 2025-08-16
video:
  aspect: 56.25
  id: Z56Jmr9Z34Q
solution:
  ready: true
  url: course-shell-solution
---
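# Motivation
As computer scientists, we know that computers are great at aiding in repetitive tasks.
However, far too often, we forget that this applies just as much to our _use_ of the computer as it does to the computations we want our programs to perform.
We have a vast range of tools available at our fingertips that enable us to be more productive and solve more complex problems when working on any computer-related problem.
Yet many of us utilize only a small fraction of those tools; we only know enough magical incantations by rote to get by,
and blindly copy-paste commands from the internet when we get stuck.
This class is an attempt to address this.
We want to teach you how to make the most of the tools you know, show you new tools to add to your toolbox, and hopefully instill in you some excitement for exploring (and perhaps building) more tools on your own.
This is what we believe to be the missing semester from most Computer Science curricula.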
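# Class structure
The class consists of 11 lectures of about one hour each, each one centering on a [particular topic](/missing-semester/2020/). The lectures are largely independent, though as the semester goes on we will presume that you are familiar with the content from the earlier lectures.
We have lecture notes online, but there will be a lot of content covered in class that may not be in the notes. We will be recording lectures and posting the recordings online.
We are trying to cover a lot of ground over the course of just 11 one-hour lectures, so the lectures are fairly dense. To allow you some time to get familiar with the content at your own pace, each lecture includes a set of exercises that guide you through the lecture's key points.
After each lecture, we are hosting office hours to help answer any questions you might have. If you are attending the class online, you can send us questions at
[missing-semester@mit.edu](mailto:missing-semester@mit.edu).
Due to the limited time we have, we won't be able to cover all the tools in the same level of detail a full-scale class might. Where possible, we will try to point you towards resources for digging further into a tool or topic,
but if something particularly strikes your fancy, don't hesitate to reach out to us and ask for pointers!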
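# Topic 1: The Shell
## What is the shell?
Computers these days have a variety of interfaces for giving them commands; fanciful graphical user interfaces, voice interfaces, and even AR/VR are everywhere.
These are great for 80% of use-cases, but they are often fundamentally restricted in what they allow you to do: you cannot press a button that isn't there or give a voice command that hasn't been programmed.
To take full advantage of the tools your computer provides, we have to go old-school and drop down to a textual interface: The Shell.
Nearly all platforms you can get your hands on have a shell in one form or another, and many of them have several shells for you to choose from. While they may vary in the details, at their core they are all roughly the same: they allow you to run programs, give them input, and inspect their output in a semi-structured way.
In this lecture, we will focus on the Bourne Again SHell, or "bash" for short.
This is one of the most widely used shells, and its syntax is similar to what you will see in many other shells. To open a shell _prompt_ (where you can type commands), you first need a _terminal_. Your device probably shipped with one installed, or you can install one fairly easily.
## Using the shell
When you launch your terminal, you will see a _prompt_ that often looks a little like this: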
```console
missing:~$
```
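This is the main textual interface to the shell. It tells you that you are on the machine `missing` and that your "current working directory", or where you currently are, is `~` (short for "home"). The `$` tells you that you are not the root user (more on that later). At this prompt you can type a _command_, which will then be interpreted by the shell. The most basic command is to execute a program: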
```console
missing:~$ date
Fri 10 Jan 2020 11:49:31 AM EST
missing:~$
```
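Here, we executed the `date` program, which (perhaps unsurprisingly) prints the current date and time. The shell then asks us for another command to execute. We can also execute a command with _arguments_: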
```console
missing:~$ echo hello
hello
```
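In this case, we told the shell to execute the program `echo` with the argument `hello`. The `echo` program simply prints out its arguments.
The shell parses the command by splitting it by whitespace, and then runs the program indicated by the first word, supplying each subsequent word as an argument that the program can access. If you want to provide an argument that contains spaces or other special characters (e.g., a directory named "My Photos"), you can either quote the argument with `'` or `"` (`"My Photos"`), or escape just the relevant characters with `\` (`My\ Photos`).
But how does the shell know how to find the `date` or `echo` programs? Well, the shell is a programming environment, just like Python or Ruby, and so it has variables, conditionals, loops, and functions (next lecture!). When you run commands in your shell, you are really writing a small bit of code that your shell interprets. If the shell is asked to execute a command that doesn't match one of its programming keywords, it consults an _environment variable_ called `$PATH` that lists which directories the shell should search for programs when it is given a command: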
```console
missing:~$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
missing:~$ which echo
/bin/echo
missing:~$ /bin/echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
```
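When we run the `echo` command, the shell sees that it should execute the program `echo`, and then searches through the `:`-separated list of directories in `$PATH` for a file by that name. When it finds it, it runs it (assuming the file is _executable_; more on that later). We can find out which file is executed for a given program name using the `which` program. We can also bypass `$PATH` entirely by giving the _path_ to the file we want to execute.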
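The lookup just described can be sketched in a few lines of Python. This is a simplified model (the real shell first checks aliases, functions and builtins, and caches results); the standard library already ships a complete version as `shutil.which`.
```python
import os


def which(program, path=None):
    """Return the first executable named `program` on the search path, or None.

    `path` defaults to the $PATH environment variable.
    """
    if os.sep in program:
        # An explicit path such as /bin/echo or ./script bypasses the $PATH search.
        is_exe = os.path.isfile(program) and os.access(program, os.X_OK)
        return program if is_exe else None
    search_path = os.environ.get("PATH", "") if path is None else path
    for directory in search_path.split(os.pathsep):  # os.pathsep is ':' on Linux/macOS
        # An empty entry in $PATH means the current directory.
        candidate = os.path.join(directory or ".", program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(which("echo"))  # e.g. /bin/echo or /usr/bin/echo, depending on $PATH
```
Directories are tried in order and the first match wins, which is why putting a directory earlier in `$PATH` lets its programs shadow same-named ones later in the list.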
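## Navigating in the shell
A path on the shell is a delimited list of directories, separated by `/` on Linux and macOS and `\` on Windows. On Linux and macOS, the path `/` is the "root" of the file system, under which all directories and files lie, whereas on Windows there is one root for each disk partition (e.g., `C:\`). We will generally assume that you are using a Linux filesystem in this class. A path that starts with `/` is called an _absolute_ path. Any other path is a _relative_ path. Relative paths are relative to the current working directory, which we can see with the `pwd` command and change with the `cd` command. In a path, `.` refers to the current directory, and `..` to its parent directory: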
```console
missing:~$ pwd
/home/missing
missing:~$ cd /home
missing:/home$ pwd
/home
missing:/home$ cd ..
missing:/$ pwd
/
missing:/$ cd ./home
missing:/home$ pwd
/home
missing:/home$ cd missing
missing:~$ pwd
/home/missing
missing:~$ ../../bin/echo hello
hello
```
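Notice that our shell prompt kept us informed about what our current working directory was. You can configure your prompt to show you all sorts of useful information, which we will cover in a later lecture.
In general, when we run a program, it will operate in the current directory unless we tell it otherwise. For example, it will usually search for files there, and create new files there if it needs to.
To see what lives in a given directory, we use the `ls` command: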
```console
missing:~$ ls
missing:~$ cd ..
missing:/home$ ls
missing
missing:/home$ cd ..
missing:/$ ls
bin
boot
dev
etc
home
...
```
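Unless a directory is given as its first argument, `ls` will print the contents of the current directory. Most commands accept flags and options (flags with values) that start with `-` to modify their behavior. Usually, running a program with the `-h` or `--help` flag will print some help text that tells you what flags and options are available. For example, `ls --help` tells us: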
```
-l use a long listing format
```
```console
missing:~$ ls -l /home
drwxr-xr-x 1 missing users 4096 Jun 15 2019 missing
```
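This gives us a bunch more information about each file or directory present. First, the `d` at the beginning of the line tells us that `missing` is a directory. Then follow three groups of three characters (`rwx`). These indicate what permissions the owner of the file (`missing`), the owning group (`users`), and everyone else respectively have on the relevant item. A `-` indicates that the given principal does not have the given permission. Above, only the owner is allowed to modify (`w`) the `missing` directory (i.e., add/remove files in it). To enter a directory, a user must have "search" (represented by "execute": `x`) permissions on that directory (and its parents). To list its contents, a user must have read (`r`) permissions on that directory. For files, the permissions are as you would expect. Notice that nearly all the files in `/bin` have the `x` permission set for the last group, "everyone else", so that anyone can execute those programs.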
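Python's standard library can produce the same permission string that `ls -l` prints. As a small illustration (the helper name is ours), `stat.filemode` formats the mode bits, and slicing splits them into the type character and the owner/group/other triples described above:
```python
import os
import stat


def describe_permissions(path):
    """Split an `ls -l`-style mode string into its file type and three rwx groups."""
    mode_string = stat.filemode(os.stat(path).st_mode)  # e.g. 'drwxr-xr-x'
    return {
        "type": mode_string[0],  # 'd' directory, '-' regular file, ...
        "owner": mode_string[1:4],
        "group": mode_string[4:7],
        "other": mode_string[7:10],
    }


if __name__ == "__main__":
    print(describe_permissions("/bin"))
```
For a file created with mode `0o754`, for instance, this returns owner `rwx`, group `r-x` and other `r--`.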
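Some other handy programs to know about at this point are `mv` (to rename/move a file), `cp` (to copy a file), and `mkdir` (to make a new directory).
If you ever want _more_ information about a program's arguments, inputs, outputs, or how it works in general, give the `man` program a try. It takes as an argument the name of a program, and shows you its _manual page_. Press `q` to exit.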
```console
missing:~$ man ls
```
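## Connecting programs
In the shell, programs have two primary "streams" associated with them: their input stream and their output stream.
When the program tries to read input, it reads from the input stream, and when it prints something, it prints to its output stream.
Normally, a program's input and output are both your terminal. That is, your keyboard as input and your screen as output.
However, we can also rewire those streams!
The simplest form of redirection is `< file` and `> file`. These let you rewire the input and output streams of a program to a file, respectively: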
```console
missing:~$ echo hello > hello.txt
missing:~$ cat hello.txt
hello
missing:~$ cat < hello.txt
hello
missing:~$ cat < hello.txt > hello2.txt
missing:~$ cat hello2.txt
hello
```
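You can also use `>>` to append to a file. Where this kind of input/output redirection really shines is in the use of _pipes_.
The `|` operator lets you "chain" programs such that the output of one is the input of another: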
```console
missing:~$ ls -l / | tail -n1
drwxr-xr-x 1 root root 4096 Jun 20 2019 var
missing:~$ curl --head --silent google.com | grep --ignore-case content-length | cut --delimiter=' ' -f2
219
```
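A pipe is nothing more than the output stream of one process wired to the input stream of the next. As a sketch of what the shell does for `a | b | c` (the helper name is ours), Python's `subprocess` module can wire processes together the same way:
```python
import subprocess


def run_pipeline(*commands):
    """Run commands like `cmd1 | cmd2 | ...` and return the last one's output as text.

    Each command is an argv list; at least one command is required.
    """
    procs = []
    upstream = None
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE)
        if upstream is not None:
            upstream.close()  # only the child reads it now; lets the writer get SIGPIPE
        upstream = proc.stdout
        procs.append(proc)
    output, _ = procs[-1].communicate()
    for proc in procs[:-1]:
        proc.wait()
    return output.decode()


if __name__ == "__main__":
    print(run_pipeline(["ls", "-l", "/"], ["tail", "-n1"]), end="")
```
Just as with the shell's `|`, none of the programs knows it is part of a pipeline; each one only reads its input and writes its output.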
我们会在数据清理一章中更加详细的探讨如何更好的利用管道。
## 一个功能全面又强大的工具
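On most Unix-like systems, one user is special: the root user.
You may have noticed it in the output above (`root root` in the `ls -l /` listing). The root user is above (almost) all access restrictions and can create, read, update, and delete any file in the system.
You will not usually log in to your system as root, though, because it is too easy to accidentally break something.
Instead, you use the `sudo` command when you need to. As its name implies, it lets you "do" something "as su" (short for "super user", i.e. root).
When you get a permission denied error, it is usually because the operation has to be done as root. Even so, double-check that you really want to do it that way first!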
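One thing you must be root to do is write to the `sysfs` file system, which is mounted under `/sys`. `sysfs` exposes a number of kernel parameters as files, so you can easily reconfigure the kernel on the fly without any specialized tools. **Note that `sysfs` does not exist on Windows or macOS.**
For example, the brightness of your laptop's screen is exposed through a file called `brightness` under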
```
/sys/class/backlight
```
By writing a value to that file, we can change the screen brightness. Your first instinct might be to do something like:
```console
$ sudo find -L /sys/class/backlight -maxdepth 2 -name '*brightness*'
/sys/class/backlight/thinkpad_screen/brightness
$ cd /sys/class/backlight/thinkpad_screen
$ sudo echo 3 > brightness
An error occurred while redirecting file 'brightness'
open: Permission denied
```
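Surprisingly, we still get an error, even though we ran the command with `sudo`! This reveals something important about the shell: operations like `|`, `>`, and `<` are performed *by the shell*, not by the individual program.
`echo` and friends do not "know" about `|`. They just read from their input and write to their output, whatever those happen to be.
In the brightness case above, the _shell_ runs with your user's permissions, not root's. Before it can attach `brightness` as the output of `sudo echo`, the shell tries to open that file for writing. The system refuses because the shell is not running as root, and it reports a permission error.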
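You can observe this without root access. In the bash sketch below, `no_such_command` is deliberately a name that does not exist. The shell still creates the output file, because it sets up the redirection before it even tries to run the command:

```shell
#!/usr/bin/env bash
# Redirections are performed by the shell, before the command itself runs.
# no_such_command is deliberately a name that does not exist on the system.
work=$(mktemp -d)
cd "$work"

no_such_command > out.txt 2> /dev/null
echo "exit status: $?"   # 127: the command was never found...
ls -l out.txt            # ...but the shell had already created out.txt (empty)
```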
Knowing this, we can work around the problem:
```console
$ echo 3 | sudo tee brightness
```
Now the `tee` program opens the `/sys` file for writing. Because `tee` *itself* runs as `root`, the operation succeeds.
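`tee` is not specific to `sudo`. It copies its input both to its output and to the files named as arguments. This bash sketch tries it on an ordinary scratch file, so no root is needed (`level.txt` is an invented name):

```shell
#!/usr/bin/env bash
# tee copies its stdin to stdout *and* to each file named on its command line.
# Tried here on an ordinary scratch file, so no sudo is needed.
set -eu
work=$(mktemp -d)
cd "$work"

echo 3 | tee level.txt               # prints 3 and writes 3 to level.txt
echo 4 | tee -a level.txt            # -a appends instead of truncating
cat level.txt                        # 3, then 4
echo 5 | tee level.txt > /dev/null   # keep the file write, hide the echo

# Another common workaround lets a root shell do the redirection itself:
#   sudo sh -c 'echo 3 > brightness'
```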
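With this technique, you can control many fun and useful things through `/sys`, such as the state of various system LEDs (your path might be different):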
```console
$ echo 1 | sudo tee /sys/class/leds/input6::scrolllock/brightness
```
# Next steps
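At this point, you know enough about the shell to accomplish basic tasks. You should be able to navigate around to find files of interest and use the basic functionality of most programs.
In the next lecture, we will talk about how to perform and automate more complex tasks using the shell and the many handy command-line programs out there.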
# Exercises
[Exercise solutions]({{site.url}}/{{site.solution_url}}/{{page.solution.url}})
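Every class in this course comes with a series of exercises. Some give you a specific task, while others are open-ended, like "try using X and Y". We highly encourage you to try them out.
We have not written solutions for these exercises. If you get stuck on anything in particular, feel free to email us describing what you have tried so far, and we will try to help.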
1. This course requires a Unix-like shell such as Bash or ZSH. If you are on Linux or macOS, you don't have to do anything special. If you are on Windows, do not use cmd or PowerShell; use [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/) or a Linux virtual machine instead. Run `echo $SHELL` to check that your shell is suitable: if it prints something like `/bin/bash` or `/usr/bin/zsh`, you are good to go.
2. Create a new directory called `missing` under `/tmp`.
3. Use `man` to read the manual page for the `touch` program.
4. Use `touch` to create a new file called `semester` in the `missing` directory.
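5. Write the following into the `semester` file, one line at a time:
   ```
   #!/bin/sh
   curl --head --silent https://missing.csail.mit.edu
   ```
   The first line might be tricky to get working. It helps to know that `#` starts a comment in Bash, and that `!` has a special meaning even inside double-quoted (`"`) strings.
   Single-quoted (`'`) strings are treated differently, and they solve the problem here. See the [Bash quoting manual](https://www.gnu.org/software/bash/manual/html_node/Quoting.html) for more information.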
6. Try to execute the file: type the path to the script (`./semester`) into your shell and press Enter. If the program cannot