[
  {
    "path": ".editorconfig",
    "content": "root = true\n\n[*]\ncharset = utf-8\nend_of_line = lf\nindent_style = space\ninsert_final_newline = true\ntrim_trailing_whitespace = true\n\n[*.md]\nindent_size = 4\ntrim_trailing_whitespace = false\n\n[*.{html,xml}]\nindent_size = 2\n\n[*.yml]\nindent_size = 2\n\n[*.css]\nindent_size = 2\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/translation.md",
    "content": "---\nname: translation\nabout: choose the file you plan to translate\ntitle: ''\nlabels: trans\nassignees: ''\n\n---\n\nFilename :\nEstimated time of finish :\n\nNote: Please make sure you can finish it within two weeks.\n"
  },
  {
    "path": ".gitignore",
    "content": ".ruby-version\n.bundle/\n_site/\n.jekyll-metadata\n.claude/"
  },
  {
    "path": "404.html",
    "content": "---\nlayout: default\ntitle: \"404: Page not found\"\npermalink: /404.html\n---\n\n<div class=\"error-page\">\n    <h1 class=\"title\">404</h1>\n    <p>Sorry, the page you were looking for doesn't exist or has been moved.</p>\n    <p>You can go back to the <a href=\"/\">home page</a> or use the search bar to find what you're looking for.</p>\n    <p>If you think this is an error, please contact us.</p>\n</div>\n\n<style>\n    .error-page {\n        text-align: center;\n        padding: 50px;\n        font-family: Arial, sans-serif;\n    }\n\n    .error-page .title {\n        font-size: 100px;\n        color: #ff6f61;\n    }\n\n    .error-page p {\n        font-size: 18px;\n        color: #333;\n    }\n\n    .error-page a {\n        color: #007bff;\n        text-decoration: none;\n    }\n\n    .error-page a:hover {\n        text-decoration: underline;\n    }\n</style>\n"
  },
  {
    "path": "CNAME",
    "content": "missing-semester-cn.github.io\n"
  },
  {
    "path": "Gemfile",
    "content": "source 'https://rubygems.org'\ngem 'github-pages'\n"
  },
  {
    "path": "README.md",
    "content": "# 计算机教育中缺失的一课\n\nThe Missing Semester of Your CS Education 英文课程网站在[这里](https://missing.csail.mit.edu/)！\n\n这是[中文站点](https://missing-semester-cn.github.io)(<span style=\"float:right\"><img src = \"https://img.shields.io/badge/最近一次与英文版同步-2021--04--24-green\"></span>)\n\n\n欢迎为本项目做出贡献！如果您要编辑添加内容，请提出 issue 或提交 pull request。\n\n## 开发部署\n\n要在本地构建并查看网站，请运行：\n\n```bash\nbundle install\nbundle exec jekyll serve -w\n```\n\n## 许可说明\n\n本课程的所有内容，包括网站源代码、讲义、练习题和讲课视频，均按照 [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) 国际许可协议进行许可。\n有关贡献或翻译的更多信息，请参见[这里](https://missing.csail.mit.edu/license)。\n\n-----------------\n\n## 2026 课程状态\n\n\n\n请在[sync-offical-2026](https://github.com/missing-semester-cn/missing-semester-cn.github.io/tree/sync-offical-2026)分支的README中认领任务并提交翻译。\n\n| 讲义 | 翻译者 | 状态 |\n| ---- | ---- | ---- |\n| [agentic-coding.md](_2026/agentic-coding.md) | [@Lingfeng AI](https://github.com/hanxiaomax) | 待翻译 |\n| [beyond-code.md](_2026/beyond-code.md) | 待分配 | 待翻译 |\n| [code-quality.md](_2026/code-quality.md) | 待分配 | 待翻译 |\n| [command-line-environment.md](_2026/command-line-environment.md) | 待分配 | 待翻译 |\n| [course-shell.md](_2026/course-shell.md) | 待分配 | 待翻译 |\n| [debugging-profiling.md](_2026/debugging-profiling.md) | 待分配 | 待翻译 |\n| [development-environment.md](_2026/development-environment.md) | 待分配 | 待翻译 |\n| [shipping-code.md](_2026/shipping-code.md) | 待分配 | 待翻译 |\n| [version-control.md](_2026/version-control.md) | 待分配 | 待翻译 |\n\n-----------------\n\n## 项目状态\n\n想要参与这个翻译项目，请通过创建一个 issue 来预订您的主题，我会相应地更新此表格，以避免重复工作。\n\n\n|  讲义   | 翻译者  | 状态 |\n|  ----  | ----  |----  |\n| [course-shell.md](_2020/course-shell.md)  | [@Lingfeng AI](https://github.com/hanxiaomax) | 完成 |\n| [shell-tools.md](_2020/shell-tools.md)  | [@Lingfeng AI](https://github.com/hanxiaomax) | 完成 |\n| [editors.md](_2020/editors.md)  |  [@stechu](https://github.com/stechu) | 完成 |\n| [data-wrangling.md](_2020/data-wrangling.md)  | [@Lingfeng AI](https://github.com/hanxiaomax) | 
完成 |\n| [command-line.md](_2020/command-line.md)  | [@Lingfeng AI](https://github.com/hanxiaomax) | 完成 |\n| [version-control.md](_2020/version-control.md)  | [@Lingfeng AI](https://github.com/hanxiaomax) | 完成 |\n| [debugging-profiling.md](_2020/debugging-profiling.md)  |[@Lingfeng AI](https://github.com/hanxiaomax)  | 完成  |\n| [metaprogramming.md](_2020/metaprogramming.md)  | [@Lingfeng AI](https://github.com/hanxiaomax) | 完成 |\n| [security.md](_2020/security.md)  | [@catcarbon](https://github.com/catcarbon) | 完成 |\n| [potpourri.md](_2020/potpourri.md) |  [@catcarbon](https://github.com/catcarbon) | 完成 |\n| [qa.md](_2020/qa.md) | [@AA1HSHH](https://github.com/AA1HSHH) | 完成 |\n| [about.md](about.md)  | [@Binlogo](https://github.com/Binlogo)  | 完成 |\n\n\n## 新项目\n\n[Learncpp中文版](https://github.com/hanxiaomax/Learncpp_CN).\n\n"
  },
  {
    "path": "_2019/automation.md",
    "content": "---\nlayout: lecture\ntitle: \"Automation\"\npresenter: Jose\nvideo:\n  aspect: 56.25\n  id: BaLlAaHz-1k\n---\n\nSometimes you write a script that does something but you want for it to run periodically, say a backup task. You can always write an *ad hoc* solution that runs in the background and comes online periodically. However, most UNIX systems come with the cron daemon which can run task with a frequency up to a minute based on simple rules.\n\nOn most UNIX systems the cron daemon, `crond` will be running by default but you can always check using `ps aux | grep crond`.\n\n## The crontab\n\nThe configuration file for cron can be displayed running `crontab -l` edited running `crontab -e` The time format that cron uses are five space separated fields along with the user and command\n\n- **minute** -  What minute of the hour the command will run on,\n     and is between '0' and '59'\n- **hour** -    This controls what hour the command will run on, and is specified in\n         the 24 hour clock, values must be between 0 and 23 (0 is midnight)\n- **dom** - This is the Day of Month, that you want the command run on, e.g. to\n     run a command on the 19th of each month, the dom would be 19.\n- **month** -   This is the month a specified command will run on, it may be specified\n     numerically (0-12), or as the name of the month (e.g. May)\n- **dow** - This is the Day of Week that you want a command to be run on, it can\n     also be numeric (0-7) or as the name of the day (e.g. sun).\n- **user** -    This is the user who runs the command.\n- **command** - This is the command that you want run. This field may contain\n     multiple words or spaces.\n\nNote that using an asterisk `*` means all and using an asterisk followed by a slash and number means every nth value. So `*/5` means every five. 
Some examples are\n\n```shell\n*/5   *    *   *   *       # Every five minutes\n  0   *    *   *   *       # Every hour, on the hour\n  0   9    *   *   *       # Every day at 9:00 am\n  0   9-17 *   *   *       # Every hour, on the hour, from 9:00 am to 5:00 pm\n  0   0    *   *   5       # Every Friday at 12:00 am\n  0   0    1   */2 *       # Every other month, the first day, 12:00 am\n```\nYou can find many more examples of common crontab schedules on [crontab.guru](https://crontab.guru/examples.html).\n\n## Shell environment and logging\n\nA common pitfall when using cron is that it does not load the same environment scripts that common shells do, such as `.bashrc`, `.zshrc`, &c, and it does not log the output anywhere by default. Combined with the maximum frequency being one minute, it can become quite painful to debug cron scripts initially.\n\nTo deal with the environment, make sure that you use absolute paths in all your scripts and modify your environment variables such as `PATH` so the script can run successfully. To simplify logging, a good recommendation is to write your crontab in a format like this\n\n\n```shell\n* * * * *   user  /path/to/cronscripts/every_minute.sh >> /tmp/cron_every_minute.log 2>&1\n```\n\nAnd write the script in a separate file. Remember that `>>` appends to the file and that `2>&1` redirects `stderr` to `stdout` (you might want to keep them separate though).\n\n## Anacron\n\nOne caveat of using cron is that if the computer is powered off or asleep when the cron script should run then it is not executed. For frequent tasks this might be fine, but if a task runs less often, you may want to ensure that it is executed. [anacron](https://linux.die.net/man/8/anacron) works similarly to `cron` except that the frequency is specified in days. Unlike cron, it does not assume that the machine is running continuously. 
Hence, it can be used on machines that aren't running 24 hours a day to run regular jobs such as daily, weekly, and monthly ones.\n\n\n## Exercises\n\n1. Make a script that looks every minute in your downloads folder for any file that is a picture (you can look into [MIME types](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types) or use a regular expression to match common extensions) and moves it into your Pictures folder.\n\n1. Write a cron script that checks weekly for outdated packages on your system and either prompts you to update them or updates them automatically.\n\n\n\n{% comment %}\n\n- [fswatch](https://github.com/emcrisostomo/fswatch)\n- GUI automation (pyautogui) [Automating the boring stuff Chapter 18](https://automatetheboringstuff.com/chapter18/)\n- Ansible/puppet/chef\n\n- https://xkcd.com/1205/\n- https://xkcd.com/1319/\n\n{% endcomment %}\n"
  },
  {
    "path": "_2019/backups.md",
    "content": "---\nlayout: lecture\ntitle: \"Backups\"\npresenter: Jose\nvideo:\n  aspect: 56.25\n  id: lrpqYF8tcYQ\n---\n\nThere are two types of people:\n\n- Those who do backups\n- Those who will do backups\n\nAny data you own that you haven't backed up is data that could be gone at any moment, forever. Here we will cover some good backup basics and the pitfalls of some approaches.\n\n## 3-2-1 Rule\n\nThe [3-2-1 rule](https://www.us-cert.gov/sites/default/files/publications/data_backup_options.pdf) is a general recommended strategy for backing up your data. It state that you should have:\n\n- at least **3 copies** of your data\n- **2** copies in **different mediums**\n- **1** of the copies being **offsite**\n\nThe main idea behind this recommendation is not to put all your eggs in one basket. Having 2 different devices/disks ensures that a single hardware failure doesn't take away all your data. Similarly, if you store your only backup at home and the house burns down or gets robbed you lose everything, that's what the offsite copy is there for. Onsite backups give you availability and speed, offsite give you the resiliency should a disaster happen.\n\n## Testing your backups\n\nA common pitfall when performing backups is blindly trusting whatever the system says it's doing and not verifying that the data can be properly recovered. Toy Story 2 was almost lost and their backups were not working, [luck](https://www.youtube.com/watch?v=8dhp_20j0Ys) ended up saving them.\n\n## Versioning\n\nYou should understand that [RAID](https://en.wikipedia.org/wiki/RAID) is not a backup, and in general **mirroring is not a backup solution**. Simply syncing your files somewhere will not help in several scenarios, such as:\n\n- Data corruption\n- Malicious software\n- Deleting files by mistake\n\nIf the changes on your data propagate to the backup then you won't be able to recover in these scenarios. 
Note that this is the case for a lot of cloud storage solutions like Dropbox, Google Drive, OneDrive, &c. Some of them do keep deleted data around for short amounts of time, but usually the recovery interface is not something you want to be using to recover large numbers of files.\n\nA proper backup system should be versioned in order to prevent this failure mode. By providing different snapshots in time one can easily navigate them to restore whatever was lost. The most widely known software of this kind is macOS Time Machine.\n\n## Deduplication\n\nHowever, making several copies of your data might be extremely costly in terms of disk space. Nevertheless, from one version to the next, most data will be identical and need not be transferred again. This is where [data deduplication](https://en.wikipedia.org/wiki/Data_deduplication) comes into play: by keeping track of what has already been stored one can do **incremental backups** where only the changes from one version to the next need to be stored. This significantly reduces the amount of space needed for backups beyond the first copy.\n\n## Encryption\n\nSince we might be backing up to untrusted third parties like cloud providers, it is worth considering that if your data is copied *as is* then it could potentially be read by unwanted parties. Documents like your taxes are sensitive information that should not be backed up in plain format. To prevent this, many backup solutions offer **client side encryption** where data is encrypted before being sent to the server. That way the server cannot read the data it is storing but you can decrypt it with your secret key.\n\nAs a side note, if your disk (or home partition) is not encrypted, then anyone who gets hold of your computer can override the user access controls and read your data. 
Modern hardware supports fast and efficient reads and writes of encrypted data so you might want to consider enabling **full disk encryption**.\n\n\n## Append only\n\nThe properties reviewed so far focus on hardware failure or user mistakes but fail to address what happens if a malicious agent wants to delete your data. Namely, say someone hacks into your system: are they able to wipe all your copies of the data you care about? If you worry about that scenario then you need some sort of append only backup solution. In general, this means having a server that will allow you to send new data but will refuse to delete existing data. Usually users have two keys, an append only key that supports creating new backups and a full access key that also allows for deleting old backups that are no longer needed. The latter one is stored offline.\n\nNote that this is quite a challenging scenario since you need the ability to make changes whilst still preventing a malicious user from deleting your data. Existing commercial solutions include [Tarsnap](https://www.tarsnap.com/) and [Borgbase](https://www.borgbase.com/).\n\n\n## Additional considerations\n\nSome other things you may want to look into are:\n\n- **Periodic backups**: outdated backups can become pretty useless. Making backups regularly should be a consideration for your system\n- **Bootable backups**: some programs allow you to clone your entire disk. That way you have an image that contains an entire copy of your system you can boot directly from.\n- **Differential backup strategies**: you may not necessarily care the same about all your data. You can define different backup policies for different types of data.\n- **Append only backups**: an additional consideration is to enforce append only operations on your backup repositories in order to prevent malicious agents from deleting them if they get hold of your machine.\n\n\n## Webservices\n\nNot all the data that you use lives on your hard disk. 
If you use **webservices**, then it might be the case that some data you care about, such as Google Docs presentations or Spotify playlists, is stored online. Another example that is easy to forget is email accounts with web access, such as Gmail. Figuring out a backup solution in these cases is somewhat trickier. However, there are many services that allow you to download your data, either directly or via an API. Tools such as [gmvault](https://github.com/gaubert/gmvault) for Gmail are available to download the email files to your computer.\n\n\n## Webpages\n\nSimilarly, some high quality content can be found online in the form of webpages. If said content is static one can easily back it up by just saving the website and all of its attachments. Another alternative is the [Wayback Machine](https://archive.org/web/), a massive digital archive of the World Wide Web managed by the [Internet Archive](https://archive.org/), a non profit organization focused on the preservation of all sorts of media. The Wayback Machine allows you to capture and archive webpages and later retrieve all the snapshots that have been archived for that website. If you find it useful, consider [donating](https://archive.org/donate/) to the project.\n\n\n## Resources\n\nSome good backup programs and services we have used and can honestly recommend:\n\n- [Tarsnap](https://www.tarsnap.com/) - deduplicated, encrypted online backup service for the truly paranoid.\n- [Borg Backup](https://borgbackup.readthedocs.io) - deduplicated backup program that supports compression and authenticated encryption. If you need a cloud provider [rsync.net](https://www.rsync.net/products/borg.html) has special offerings for Borg users.\n- [rsync](https://rsync.samba.org/) is a utility that provides fast incremental file transfer. It is not a full backup solution.\n- [rclone](https://rclone.org/) is like rsync but for cloud storage providers such as Amazon S3, Dropbox, Google Drive, rsync.net, &c. 
Supports client side encryption of remote folders.\n\n## Exercises\n\n1. Consider how you are (not) backing up your data and look into fixing/improving that.\n\n1. Figure out how to back up your email accounts.\n\n1. Choose a webservice you use often (Spotify, Google Music, etc.) and figure out what options there are for backing up your data. Often people have already made tools (such as [youtube-dl](https://ytdl-org.github.io/youtube-dl/)) based on available APIs.\n\n1. Think of a website you have visited repeatedly over the years and look it up in [archive.org](https://archive.org/web/). How many versions does it have?\n\n1. One way to efficiently implement deduplication is to use hardlinks. Whereas a symbolic link (also called a soft link or a symlink) is a file that points to another file or folder, a hardlink is an exact copy of the pointer (it uses the same inode and points to the same place on the disk). Thus if the original file is removed a symlink stops working whereas a hard link doesn't. However, hardlinks only work for files. Try using the command `ln` to create hard links and compare them to symlinks created with `ln -s`. (On macOS you will need to install the GNU coreutils or the hln package).\n"
  },
  {
    "path": "_2019/command-line.md",
    "content": "---\nlayout: lecture\ntitle: \"Command-line environment\"\npresenter: Jose\nvideo:\n  aspect: 62.5\n  id: i0rf1gpKL1E\n---\n\n## Aliases & Functions\n\nAs you can imagine it can become tiresome typing long commands that involve many flags or verbose options. Nevertheless, most shells support **aliasing**. For instance, an alias in bash has the following structure (note there is no space around the `=` sign):\n\n```bash\nalias alias_name=\"command_to_alias\"\n```\n\n<!-- We can alias common flags for our commands like `alias ll=ls -ltAh`. Alias can be composed  -->\n\nAlias have many convenient features\n\n```bash\n# Alias can summarize good default flags\nalias ll=\"ls -lh\"\n\n# Save a lot of typing for common commands\nalias gc=\"git commit\"\n\n# Alias can overwrite existing commands\nalias mv=\"mv -i\"\nalias mkdir=\"mkdir -p\"\n\n# Alias can be composed\nalias la=\"ls -A\"\nalias lla=\"la -l\"\n\n# To ignore an alias run it prepended with \\\n\\ls\n# Or can be disabled using unalias\nunalias la\n\n```\n<!--\nTo get rid of an alias you can run `unalias alias_name` or to ignore alias when running a command you can prepend the command with a backward slash `\\alias_name`. This is convenient when an alias is overwriting an existing name. -->\n\n\nHowever in many scenarios aliases can be limiting, specially when you are trying to write chain commands together that take the same arguments. An alternative exists which is **functions** which are a midpoint between aliases and custom shell scripts.\n\nHere is an example function that makes a directory and move into it.\n\n```bash\nmcd () {\n    mkdir -p $1\n    cd $1\n}\n```\n\nAlias and functions will not persist shell sessions by default. To make an alias persistent you need to include it a one the shell startup script files like `.bashrc` or `.zshrc`. 
My suggestion is to write them separately in a `.alias` and `source` that file from your different shell config files.\n\n<!-- Lastly, if you decide to alias any of these tools with the \"improved\" version, e.g. `alias bat=cat` it is useful to know that you can tell bash to ignore aliases by doing `\\cat` and ignore both aliases and functions by doing `command cat` -->\n\n## Shells & Frameworks\n\nDuring shell and scripting we covered the `bash` shell since it is by far the most ubiquitous shell and most systems have it as the default option. Nevertheless, it is not the only option.\n\nFor example the `zsh` shell is a superset of `bash` and provides many convenient features out of the box such as:\n\n- Smarter globbing, `**`\n- Inline globbing/wildcard expansion\n- Spelling correction\n- Better tab completion/selection\n- Path expansion (`cd /u/lo/b` will expand as `/usr/local/bin`)\n\nMoreover many shells can be improved with **frameworks**, some popular general frameworks like [prezto](https://github.com/sorin-ionescu/prezto) or [oh-my-zsh](https://github.com/robbyrussell/oh-my-zsh), and smaller ones that focus on specific features like for example [zsh-syntax-highlighting](https://github.com/zsh-users/zsh-syntax-highlighting) or [zsh-history-substring-search](https://github.com/zsh-users/zsh-history-substring-search). Other shells like [fish](https://fishshell.com/) include a lot of these user-friendly features by default. Some of these features include:\n\n- Right prompt\n- Command syntax highlighting\n- History substring search\n- manpage based flag completions\n- Smarter autocompletion\n- Prompt themes\n\nOne thing to note when using these frameworks is that if the code they run is not properly optimized or it is too much code, your shell can start slowing down. 
You can always profile it and disable the features that you rarely use or that are not worth the slowdown.\n\n## Terminal Emulators & Multiplexers\n\nAlong with customizing your shell it is worth spending some time figuring out your choice of **terminal emulator** and its settings. There are many terminal emulators out there (here is a [comparison](https://anarc.at/blog/2018-04-12-terminal-emulators-1/)).\n\nSince you might be spending hundreds to thousands of hours in your terminal it pays off to look into its settings. Some of the aspects that you may want to modify in your terminal include:\n\n- Font choice\n- Color Scheme\n- Keyboard shortcuts\n- Tab/Pane support\n- Scrollback configuration\n- Performance (some newer terminals like [Alacritty](https://github.com/jwilm/alacritty) offer GPU acceleration)\n\nIt is also worth mentioning **terminal multiplexers** like [tmux](https://github.com/tmux/tmux). `tmux` allows you to arrange multiple shell sessions in panes and tabs. It also supports attaching and detaching, which is a very common use-case when you are working on a remote server and want to keep your shell running without having to worry about disowning your current processes (by default when you log out your processes are terminated). This way, with `tmux` you can jump into and out of complex terminal layouts. Similar to terminal emulators, `tmux` supports heavy customization by editing the `~/.tmux.conf` file.\n\n\n## Command-line utilities\n\nThe command line utilities that most UNIX based operating systems have by default are more than enough to do 99% of the stuff you usually need to do.\n\n\nIn the next few subsections I will cover alternative tools for extremely common shell operations which are more convenient to use. 
Some of these tools add new, improved functionality to the command whereas others just focus on providing a simpler, more intuitive interface with better defaults.\n\n### `fasd` vs `cd`\n\nEven with improved path expansion and tab autocomplete, changing directories can become quite repetitive. [Fasd](https://github.com/clvv/fasd) (or [autojump](https://github.com/wting/autojump)) solves this issue by keeping track of recent and frequent folders you have been to and performing fuzzy matching.\n\nThus if I have visited the path `/home/user/awesome_project/code`, running `z code` will `cd` to it. If I have multiple folders called code I can disambiguate by running `z awe code`, which will be a closer match. Unlike autojump, fasd also provides commands that instead of performing `cd` just expand frequent and/or recent files, folders, or both.\n\n\n### `bat` vs `cat`\n\nEven though `cat` does its job perfectly, [bat](https://github.com/sharkdp/bat) improves it by providing syntax highlighting, paging, line numbers and git integration.\n\n\n### `exa`/`ranger` vs `ls`\n\n`ls` is a great command but some of the defaults can be annoying, such as displaying the size in raw bytes. [exa](https://github.com/ogham/exa) provides better defaults.\n\nIf you are in need of navigating many folders and/or previewing many files, [ranger](https://github.com/ranger/ranger) can be much more efficient than `cd` and `cat` due to its wonderful interface. It is quite customizable and with a correct setup you can even [preview images](https://github.com/ranger/ranger/wiki/Image-Previews) in your terminal.\n\n### `fd` vs `find`\n\n[fd](https://github.com/sharkdp/fd) is a simple, fast and user-friendly alternative to `find`. Its defaults, such as not having to use the `-name` flag (matching on the name is what you want to do 99% of the time), make it easier to use on an everyday basis. It is also `git` aware and will skip files in your `.gitignore` and `.git` folder by default. 
It also has nice color coding by default.\n\n### `rg/fzf` vs `grep`\n\n`grep` is a great tool but if you want to grep through many files at once, there are better tools for that purpose. [ack](https://github.com/beyondgrep/ack3), [ag](https://github.com/ggreer/the_silver_searcher) & [rg](https://github.com/BurntSushi/ripgrep) recursively search your current directory for a regex pattern while respecting your gitignore rules. They all work pretty similarly but I favor `rg` due to how fast it can search my entire home directory.\n\nSimilarly, it can be easy to find yourself doing `CMD | grep PATTERN` over and over again. [fzf](https://github.com/junegunn/fzf) is a command line fuzzy finder that enables you to interactively filter the output of pretty much any command.\n\n### `rsync` vs `cp/scp`\n\nWhereas `cp`, `mv` and `scp` are perfect for most scenarios, when copying/moving around large numbers of files, large files, or when some of the data is already on the destination, `rsync` is a huge improvement. `rsync` will skip files that have already been transferred and with the `--partial` flag it can resume from a previously interrupted copy.\n\n### `trash` vs `rm`\n\n`rm` is a dangerous command in the sense that once you delete a file there is no turning back. However, modern operating systems do not behave like that when you delete something in the file explorer; they just move it to the Trash folder, which is cleared periodically.\n\nSince how the trash is managed varies from OS to OS there is not a single CLI utility. On macOS there is [trash](https://hasseg.org/trash/) and on Linux there is [trash-cli](https://github.com/andreafrancia/trash-cli/) among others.\n\n### `mosh` vs `ssh`\n\n`ssh` is a very handy tool but if you have a slow connection, the lag can become annoying and if the connection is interrupted you have to reconnect. 
[mosh](https://mosh.org/) is a handy tool that allows roaming, supports intermittent connectivity, and provides intelligent local echo.\n\n### `tldr` vs `man`\n\nYou can figure out what a command does and what options it has using `man` and the `-h`/`--help` flag most of the time. However, in some cases it can be a bit daunting navigating these if they are detailed.\n\nThe [tldr](https://github.com/tldr-pages/tldr) command is a community driven documentation system that's available from the command line and gives a few simple illustrative examples of what the command does and the most common argument options.\n\n\n### `aunpack` vs `tar/unzip/unrar`\n\nAs [this xkcd](https://xkcd.com/1168/) references, it can be quite tricky to remember the options for `tar` and sometimes you need a different tool altogether such as `unrar` for .rar files.\nThe [atool](https://www.nongnu.org/atool/) package provides the `aunpack` command which will figure out the correct options and always put the extracted archives in a new folder.\n\n\n## Exercises\n\n1. Run `cat .bash_history | sort | uniq -c | sort -rn | head -n 10` (or `cat .zhistory | sort | uniq -c | sort -rn | head -n 10` for zsh) to get the top 10 most used commands and consider writing shorter aliases for them\n1. Choose a terminal emulator and figure out how to change the following properties:\n    - Font choice\n    - Color scheme. How many colors does a standard scheme have? Why?\n    - Scrollback history size\n\n1. Install `fasd` or some similar software and write a bash/zsh function called `v` that performs fuzzy matching on the passed arguments and opens up the top result in your editor of choice. Then, modify it so that if there are multiple matches you can select them with `fzf`.\n1. Since `fzf` is quite convenient for performing fuzzy searches and the shell history is quite prone to those kinds of searches, investigate how to bind `fzf` to `^R`. 
You can find some info [here](https://github.com/junegunn/fzf/wiki/Configuring-shell-key-bindings)\n1. What does the `--bar` option do in `ack`?\n"
  },
  {
    "path": "_2019/course-overview.md",
    "content": "---\nlayout: lecture\ntitle: \"Course Overview\"\npresenter: Anish\nvideo:\n  aspect: 56.25\n  id: qw2c6ffSVOM\n---\n\n# Motivation\n\nThis class is about [hacker](https://en.wikipedia.org/wiki/Hacker_culture)\ntools, not [hacker](https://en.wikipedia.org/wiki/Security_hacker) tools.\n\nMIT classes do not cover any of this content in detail. It's hugely beneficial\nto be proficient with your tools: it'll save you a lot of time (and the payoff\ntime is very short).\n\nWe want to teach you about new tools, how to make the most of your tools, how\nto customize your tools, and how to extend your tools.\n\n# Class structure\n\nWe have 6 lectures covering a [variety of topics](/2019/). We have lecture\nnotes online, but there will be a lot of content covered in class (e.g. in the\nform of demos) that may not be in the notes. We will be recording lectures.\n\nEach class is split into two 50-minute lectures with a 10-minute break in\nbetween. Lectures are mostly live demonstrations followed by hands-on\nexercises. We might have a short amount of time at the end of each class to get\nstarted on the exercises in an office-hours-style setting.\n\nTo make the most of the class, you should go through all the exercises on your\nown. We'll inspire you to learn more about your tools, and we'll show you\nwhat's possible and cover some of the basics in detail, but we can't teach you\neverything in the time we have.\n"
  },
  {
    "path": "_2019/data-wrangling.md",
    "content": "---\nlayout: lecture\ntitle: \"Data Wrangling\"\npresenter: Jon\nvideo:\n  aspect: 56.25\n  id: VW2jn9Okjhw\n---\n\nHave you ever had a bunch of text and wanted to do something with it?\nGood. That's what data wrangling is all about!\nSpecifically, adapting data from one format to another, until you end up\nwith exactly what you wanted.\n\nWe've already seen basic data wrangling: `journalctl | grep -i intel`.\n - find all system log entries that mention Intel (case insensitive)\n - really, most of data wrangling is about knowing what tools you have,\n   and how to combine them.\n\nLet's start from the beginning: we need a data source, and something to\ndo with it. Logs often make for a good use-case, because you often want\nto investigate things about them, and reading the whole thing isn't\nfeasible. Let's figure out who's trying to log into my server by looking\nat my server's log:\n\n```bash\nssh myserver journalctl\n```\n\nThat's far too much stuff. Let's limit it to ssh stuff:\n\n```bash\nssh myserver journalctl | grep sshd\n```\n\nNotice that we're using a pipe to stream a _remote_ file through `grep`\non our local computer! `ssh` is magical. This is still way more stuff\nthan we wanted though. And pretty hard to read. Let's do better:\n\n```bash\nssh myserver journalctl | grep sshd | grep \"Disconnected from\"\n```\n\nThere's still a lot of noise here. There are _a lot_ of ways to get rid\nof that, but let's look at one of the most powerful tools in your\ntoolkit: `sed`.\n\n`sed` is a \"stream editor\" that builds on top of the old `ed` editor. In\nit, you basically give short commands for how to modify the file, rather\nthan manipulate its contents directly (although you can do that too).\nThere are tons of commands, but one of the most common ones is `s`:\nsubstitution. 
For example, we can write:\n\n```bash\nssh myserver journalctl |\n  grep sshd |\n  grep \"Disconnected from\" |\n  sed 's/.*Disconnected from //'\n```\n\nWhat we just wrote was a simple _regular expression_, a powerful\nconstruct that lets you match text against patterns. The `s` command is\nwritten in the form `s/REGEX/SUBSTITUTION/`, where `REGEX` is the\nregular expression you want to search for, and `SUBSTITUTION` is the\ntext you want to substitute matching text with.\n\n## Regular expressions\n\nRegular expressions are common and useful enough that it's worthwhile to\ntake some time to understand how they work. Let's start by looking at\nthe one we used above: `/.*Disconnected from /`. Regular expressions are\nusually (though not always) surrounded by `/`. Most ASCII characters\njust carry their normal meaning, but some characters have \"special\"\nmatching behavior. Exactly which characters do what varies somewhat\nbetween different implementations of regular expressions, which is a\nsource of great frustration. Very common patterns are:\n\n - `.` means \"any single character\" except newline\n - `*` zero or more of the preceding match\n - `+` one or more of the preceding match\n - `[abc]` any one character of `a`, `b`, and `c`\n - `(RX1|RX2)` either something that matches `RX1` or `RX2`\n - `^` the start of the line\n - `$` the end of the line\n\n`sed`'s regular expressions are somewhat weird: by default, you need to\nput a `\\` before some of these (in GNU `sed`, `+`, `(`, `)`, and `|`) to\ngive them their special meaning. Or you can pass `-E`.\n\nSo, looking back at `/.*Disconnected from /`, we see that it matches\nany text that starts with any number of characters, followed by the\nliteral string \"Disconnected from \". Which is what we wanted. But\nbeware, regular expressions are tricky. What if someone tried to log in\nwith the username \"Disconnected from\"? 
We'd have:\n\n```\nJan 17 03:13:00 thesquareplanet.com sshd[2631]: Disconnected from invalid user Disconnected from 46.97.239.16 port 55920 [preauth]\n```\n\nWhat would we end up with? Well, `*` and `+` are, by default, \"greedy\".\nThey will match as much text as they can. So, in the above, we'd end up\nwith just\n\n```\n46.97.239.16 port 55920 [preauth]\n```\n\nWhich may not be what we wanted. In some regular expression\nimplementations, you can just suffix `*` or `+` with a `?` to make them\nnon-greedy, but sadly `sed` doesn't support that. We _could_ switch to\nperl's command-line mode though, which _does_ support that construct:\n\n```bash\nperl -pe 's/.*?Disconnected from //'\n```\n\nWe'll stick to `sed` for the rest of this though, because it's by far\nthe more common tool for these kinds of jobs. `sed` can also do other\nhandy things like print lines following a given match, do multiple\nsubstitutions per invocation, search for things, etc. But we won't cover\nthat too much here. `sed` is basically an entire topic in and of itself,\nbut there are often better tools.\n\nOkay, so we also have a suffix we'd like to get rid of. How might we do\nthat? It's a little tricky to match just the text that follows the\nusername, especially if the username can have spaces and such! What we\nneed to do is match the _whole_ line:\n\n```bash\n | sed -E 's/.*Disconnected from (invalid |authenticating )?user .* [^ ]+ port [0-9]+( \\[preauth\\])?$//'\n```\n\nLet's look at what's going on with a [regex\ndebugger](https://regex101.com/r/qqbZqh/2). Okay, so the start is still\nas before. Then, we're matching any of the \"user\" variants (there are\ntwo prefixes in the logs). Then we're matching on any string of\ncharacters where the username is. Then we're matching on any single word\n(`[^ ]+`; any non-empty sequence of non-space characters). Then the word\n\"port\" followed by a sequence of digits. 
Then possibly the suffix\n` [preauth]`, and then the end of the line.\n\nNotice that with this technique, a username of \"Disconnected from\"\nwon't confuse us any more. Can you see why?\n\nThere is one problem with this though: every matched line is replaced\nwith nothing, so the entire log becomes empty. We want to _keep_ the\nusername after all. For this, we\ncan use \"capture groups\". Any text matched by a regex surrounded by\nparentheses is stored in a numbered capture group. These are available\nin the substitution (and in some engines, even in the pattern itself!)\nas `\\1`, `\\2`, `\\3`, etc. So:\n\n```bash\n | sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/'\n```\n\nAs you can probably imagine, you can come up with _really_ complicated\nregular expressions. For example, here's an article on how you might\nmatch an [e-mail\naddress](https://www.regular-expressions.info/email.html). It's [not\neasy](https://web.archive.org/web/20221223174323/http://emailregex.com/). And there's [lots of\ndiscussion](https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression/1917982).\nAnd people have [written\ntests](https://fightingforalostcause.net/content/misc/2006/compare-email-regex.php).\nAnd [test matrices](https://mathiasbynens.be/demo/url-regex). You can\neven write a regex for determining if a given number [is a prime\nnumber](https://www.noulakaz.net/2007/03/18/a-regular-expression-to-check-for-prime-numbers/).\n\nRegular expressions are notoriously hard to get right, but they are also\nvery handy to have in your toolbox!\n\n## Back to data wrangling\n\nOkay, so we now have\n\n```bash\nssh myserver journalctl |\n  grep sshd |\n  grep \"Disconnected from\" |\n  sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/'\n```\n\nWe could do it just with `sed`, but why would we? 
For fun is why.\n\n```bash\nssh myserver journalctl |\n  sed -E \\\n    -e '/Disconnected from/!d' \\\n    -e 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/'\n```\n\nThis shows off some of `sed`'s capabilities. `sed` can also inject text\n(with the `i` command), explicitly print lines (with the `p` command),\nselect lines by index, and lots of other things. Check `man sed`!\n\nAnyway. What we have now gives us a list of all the usernames that have\nattempted to log in. But this is pretty unhelpful. Let's look for common\nones:\n\n```bash\nssh myserver journalctl |\n  grep sshd |\n  grep \"Disconnected from\" |\n  sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/' |\n  sort | uniq -c\n```\n\n`sort` will, well, sort its input. `uniq -c` will collapse consecutive\nlines that are the same into a single line, prefixed with a count of the\nnumber of occurrences. We probably want to sort that too and only keep\nthe most common logins:\n\n```bash\nssh myserver journalctl |\n  grep sshd |\n  grep \"Disconnected from\" |\n  sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/' |\n  sort | uniq -c |\n  sort -nk1,1 | tail -n10\n```\n\n`sort -n` will sort in numeric (instead of lexicographic) order. `-k1,1`\nmeans \"sort by only the first whitespace-separated column\". The `,n`\npart says \"sort until the `n`th field\", where the default is the end of\nthe line. In this _particular_ example, sorting by the whole line\nwouldn't matter, but we're here to learn!\n\nIf we wanted the _least_ common ones, we could use `head` instead of\n`tail`. 
There's also `sort -r`, which sorts in reverse order.\n\nOkay, so that's pretty cool, but we'd sort of like to only give the\nusernames, and maybe not one per line?\n\n```bash\nssh myserver journalctl |\n  grep sshd |\n  grep \"Disconnected from\" |\n  sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/' |\n  sort | uniq -c |\n  sort -nk1,1 | tail -n10 |\n  awk '{print $2}' | paste -sd,\n```\n\nLet's start with `paste`: it lets you combine lines (`-s`) by a given\nsingle-character delimiter (`-d`). But what's this `awk` business?\n\n## awk -- another editor\n\n`awk` is a programming language that just happens to be really good at\nprocessing text streams. There is _a lot_ to say about `awk` if you were\nto learn it properly, but as with many other things here, we'll just go\nthrough the basics.\n\nFirst, what does `{print $2}` do? Well, `awk` programs take the form of\nan optional pattern plus a block saying what to do if the pattern\nmatches a given line. The default pattern (which we used above) matches\nall lines. Inside the block, `$0` is set to the entire line's contents,\nand `$1` through `$n` are set to the `n`th _field_ of that line, when\nseparated by the `awk` field separator (whitespace by default, change\nwith `-F`). In this case, we're saying that, for every line, print the\ncontents of the second field, which happens to be the username!\n\nLet's see if we can do something fancier. Let's compute the number of\nsingle-use usernames that start with `c` and end with `e`:\n\n```bash\n | awk '$1 == 1 && $2 ~ /^c[^ ]*e$/ { print $2 }' | wc -l\n```\n\nThere's a lot to unpack here. First, notice that we now have a pattern\n(the stuff that goes before `{...}`). The pattern says that the first\nfield of the line should be equal to 1 (that's the count from `uniq\n-c`), and that the second field should match the given regular\nexpression. And the block just says to print the username. 
We then count\nthe number of lines in the output with `wc -l`.\n\nHowever, `awk` is a programming language, remember?\n\n```awk\nBEGIN { rows = 0 }\n$1 == 1 && $2 ~ /^c[^ ]*e$/ { rows += $1 }\nEND { print rows }\n```\n\n`BEGIN` is a pattern that matches the start of the input (and `END`\nmatches the end). Now, the per-line block just adds the count from the\nfirst field (although it'll always be 1 in this case), and then we print\nit out at the end. In fact, we _could_ get rid of `grep` and `sed`\nentirely, because `awk` [can do it\nall](https://backreference.org/2010/02/10/idiomatic-awk/), but we'll\nleave that as an exercise to the reader.\n\n## Analyzing data\n\nYou can do math! Given a stream of numbers, one per line, `paste -sd+`\njoins them into a single sum expression that `bc` evaluates:\n\n```bash\n | paste -sd+ | bc -l\n```\n\nThe sum can also feed a larger expression (here `data` is a placeholder\nfor any command that prints one number per line):\n\n```bash\necho \"2*($(data | paste -sd+))\" | bc -l\n```\n\nYou can get stats in a variety of ways.\n[`st`](https://github.com/nferraz/st) is pretty neat, but if you already\nhave R:\n\n```bash\nssh myserver journalctl |\n  grep sshd |\n  grep \"Disconnected from\" |\n  sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/' |\n  sort | uniq -c |\n  awk '{print $1}' | R --slave -e 'x <- scan(file=\"stdin\", quiet=TRUE); summary(x)'\n```\n\nR is another (weird) programming language that's great at data analysis\nand [plotting](https://ggplot2.tidyverse.org/). 
We won't go into too\nmuch detail, but suffice it to say that `summary` prints summary statistics\nabout a vector, and `scan` built a vector from the input stream of\nnumbers, so R gives us the statistics we wanted!\n\nIf you just want some simple plotting, `gnuplot` is your friend:\n\n```bash\nssh myserver journalctl |\n  grep sshd |\n  grep \"Disconnected from\" |\n  sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/' |\n  sort | uniq -c |\n  sort -nk1,1 | tail -n10 |\n  gnuplot -p -e 'set boxwidth 0.5; plot \"-\" using 1:xtic(2) with boxes'\n```\n\n## Data wrangling to make arguments\n\nSometimes you want to do data wrangling to find things to install or\nremove based on some longer list. The data wrangling we've talked about\nso far + `xargs` can be a powerful combo:\n\n```bash\nrustup toolchain list | grep nightly | grep -vE \"nightly-x86|01-17\" | sed 's/-x86.*//' | xargs rustup toolchain uninstall\n```\n\n# Exercises\n\n1. If you are not familiar with regular expressions,\n   [here](https://regexone.com/) is a short interactive tutorial that\n   covers most of the basics.\n1. How is `sed s/REGEX/SUBSTITUTION/g` different from the plain\n   `sed s/REGEX/SUBSTITUTION/`? What about `/I` or `/m`?\n1. To do in-place substitution it is quite tempting to do something like\n   `sed s/REGEX/SUBSTITUTION/ input.txt > input.txt`. However, this is a\n   bad idea. Why? Is this particular to `sed`?\n1. Implement a simple grep equivalent tool in a language you are familiar with using regex. If you want the output to be color highlighted like grep is, search for ANSI color escape sequences.\n1. Some operations, like renaming files, can be tricky with raw commands like `mv`. `rename` is a nifty tool for this and has a sed-like syntax. Try creating a bunch of files with spaces in their names and use `rename` to replace the spaces with underscores.\n1. 
Look for boot messages that are _not_ shared between your past three\n   reboots (see `journalctl`'s `-b` flag). You may want to just mash all\n   the boot logs together in a single file, as that may make things\n   easier.\n1. Produce some statistics of your system boot time over the last ten\n   boots using the log timestamp of the messages\n   ```\n   Logs begin at ...\n   ```\n   and\n   ```\n   systemd[577]: Startup finished in ...\n   ```\n1. Find the number of words (in `/usr/share/dict/words`) that contain at\n   least three `a`s and don't have a `'s` ending. What are the three\n   most common last two letters of those words? `sed`'s `y` command, or\n   the `tr` program, may help you with case insensitivity. How many\n   of those two-letter combinations are there? And for a challenge:\n   which combinations do not occur?\n1. Find an online data set like [this\n   one](https://commons.wikimedia.org/wiki/Data:Wikipedia_statistics/data.tab) or [this\n   one](https://ucr.fbi.gov/crime-in-the-u.s/2016/crime-in-the-u.s.-2016/topic-pages/tables/table-1).\n   Maybe another one [from\n   here](https://www.springboard.com/blog/data-science/free-public-data-sets-data-science-project/).\n   Fetch it using `curl` and extract out just two columns of numerical\n   data. If you're fetching HTML data,\n   [`pup`](https://github.com/EricChiang/pup) might be helpful. For JSON\n   data, try [`jq`](https://stedolan.github.io/jq/). Find the min and\n   max of one column in a single command, and the sum of the difference\n   between the two columns in another.\n"
  },
  {
    "path": "_2019/dotfiles.md",
    "content": "---\nlayout: lecture\ntitle: \"Dotfiles\"\npresenter: Anish\nvideo:\n  aspect: 62.5\n  id: YSZBWWJw3mI\n---\n\nMany programs are configured using plain-text files known as \"dotfiles\"\n(because the file names begin with a `.`, e.g. `~/.gitconfig`, so that they are\nhidden in the directory listing `ls` by default).\n\nA lot of the tools you use probably have a lot of settings that can be tuned\npretty finely. Often times, tools are customized with specialized languages,\ne.g. Vimscript for Vim or the shell's own language for a shell.\n\nCustomizing and adapting your tools to your preferred workflow will make you\nmore productive. We advise you to invest time in customizing your tool yourself\nrather than cloning someone else's dotfiles from GitHub.\n\nYou probably have some dotfiles set up already. Some places to look:\n\n- `~/.bashrc`\n- `~/.emacs`\n- `~/.vim`\n- `~/.gitconfig`\n\nSome programs don't put the files under your home folder directly and instead they put them in a folder under `~/.config`.\n\nDotfiles are not exclusive to command line applications, for instance the [MPV](https://mpv.io/) video player can be configured editing files under `~/.config/mpv`\n\n# Learning to customize tools\n\nYou can learn about your tool's settings by reading online documentation or\n[man pages](https://en.wikipedia.org/wiki/Man_page). Another great way is to\nsearch the internet for blog posts about specific programs, where authors will\ntell you about their preferred customizations. Yet another way to learn about\ncustomizations is to look through other people's dotfiles: you can find tons of\n[dotfiles\nrepositories](https://github.com/search?o=desc&q=dotfiles&s=stars&type=Repositories)\non GitHub --- see the most popular one\n[here](https://github.com/mathiasbynens/dotfiles) (we advise you not to blindly\ncopy configurations though).\n\n# Organization\n\nHow should you organize your dotfiles? 
They should be in their own folder,\nunder version control, and **symlinked** into place using a script. This has\nthe benefits of:\n\n- **Easy installation**: if you log in to a new machine, applying your\ncustomizations will only take a minute\n- **Portability**: your tools will work the same way everywhere\n- **Synchronization**: you can update your dotfiles anywhere and keep them all\nin sync\n- **Change tracking**: you're probably going to be maintaining your dotfiles\nfor your entire programming career, and version history is nice to have for\nlong-lived projects\n\n```shell\ncd ~/src\nmkdir dotfiles\ncd dotfiles\ngit init\ntouch bashrc\n# create a bashrc with some settings, e.g.:\n#     PS1='\\w > '\ntouch install\nchmod +x install\n# insert the following into the install script:\n#     #!/usr/bin/env bash\n#     BASEDIR=$(dirname \"$0\")\n#     cd \"$BASEDIR\"\n#\n#     ln -s \"${PWD}/bashrc\" ~/.bashrc\ngit add bashrc install\ngit commit -m 'Initial commit'\n```\n\n# Advanced topics\n\n## Machine-specific customizations\n\nMost of the time, you'll want the same configuration across machines, but\nsometimes, you'll want a small delta on a particular machine. Here are a couple\nof ways you can handle this situation:\n\n### Branch per machine\n\nUse version control to maintain a branch per machine. This approach is\nlogically straightforward but can be pretty heavyweight.\n\n### If statements\n\nIf the configuration file supports it, use the equivalent of if-statements to\napply machine-specific customizations. For example, your shell could have something\nlike:\n\n```shell\nif [[ \"$(uname)\" == \"Linux\" ]]; then {do_something}; fi\n\n# Darwin is the kernel name that uname reports on macOS systems\nif [[ \"$(uname)\" == \"Darwin\" ]]; then {do_something}; fi\n\n# You can also make it machine-specific\nif [[ \"$(hostname)\" == \"myServer\" ]]; then {do_something}; fi\n```\n\n### Includes\n\nIf the configuration file supports it, make use of includes. 
For example,\na `~/.gitconfig` can have a setting:\n\n```\n[include]\n    path = ~/.gitconfig_local\n```\n\nAnd then on each machine, `~/.gitconfig_local` can contain machine-specific\nsettings. You could even track these in a separate repository for\nmachine-specific settings.\n\nThis idea is also useful if you want different programs to share some configurations. For instance if you want both `bash` and `zsh` to share the same set of aliases you can write them under `.aliases` and have the following block in both.\n\n```bash\n# Test if ~/.aliases exists and source it\nif [ -f ~/.aliases ]; then\n    source ~/.aliases\nfi\n```\n\n# Resources\n\n- Your instructors' dotfiles:\n  [Anish](https://github.com/anishathalye/dotfiles),\n  [Jon](https://github.com/jonhoo/configs),\n  [Jose](https://github.com/jjgo/dotfiles)\n- [GitHub does dotfiles](http://dotfiles.github.io/): dotfile frameworks,\nutilities, examples, and tutorials\n- [Shell startup\n  scripts](https://blog.flowblok.id.au/2013-02/shell-startup-scripts.html): an\n  explanation of the different configuration files used for your shell\n\n# Exercises\n\n1. Create a folder for your dotfiles and set up [version\n   control](/2019/version-control/).\n\n1. Add a configuration for at least one program, e.g. your shell, with some\n   customization (to start off, it can be something as simple as customizing\n   your shell prompt by setting `$PS1`).\n\n1. Set up a method to install your dotfiles quickly (and without manual effort)\n   on a new machine. This can be as simple as a shell script that calls `ln -s`\n   for each file, or you could use a [specialized\n   utility](http://dotfiles.github.io/utilities/).\n\n1. Test your installation script on a fresh virtual machine.\n\n1. Migrate all of your current tool configurations to your dotfiles repository.\n\n1. Publish your dotfiles on GitHub.\n"
  },
  {
    "path": "_2019/editors.md",
    "content": "---\nlayout: lecture\ntitle: \"Editors\"\npresenter: Anish\nvideo:\n  aspect: 62.5\n  id: 1vLcusYSrI4\n---\n\n# Importance of Editors\n\nAs programmers, we spend most of our time editing plain-text files. It's worth\ninvesting time learning an editor that fits your needs.\n\nHow do you learn a new editor? You force yourself to use that editor for a\nwhile, even if it temporarily hampers your productivity. It'll pay off soon\nenough (two weeks is enough to learn the basics).\n\nWe are going to teach you Vim, but we encourage you to experiment with other\neditors. It's a very personal choice, and people have [strong\nopinions](https://en.wikipedia.org/wiki/Editor_war).\n\nWe can't teach you how to use a powerful editor in 50 minutes, so we're going\nto focus on teaching you the basics, showing you some of the more advanced\nfunctionality, and giving you the resources to master the tool. We'll teach you\nlessons in the context of Vim, but most ideas will translate to any other\npowerful editor you use (and if they don't, then you probably shouldn't use\nthat editor!).\n\n![Editor Learning Curves](/2019/files/editor-learning-curves.jpg)\n\n<!-- source: https://blogs.msdn.microsoft.com/steverowe/2004/11/17/code-editor-learning-curves/ -->\n\nThe editor learning curves graph is a myth. Learning the basics of a powerful\neditor is quite easy (even though it might take years to master).\n\nWhich editors are popular today? 
See this [Stack Overflow\nsurvey](https://insights.stackoverflow.com/survey/2018/#development-environments-and-tools)\n(there may be some bias because Stack Overflow users may not be representative\nof programmers as a whole).\n\n## Command-line Editors\n\nEven if you eventually settle on using a GUI editor, it's worth learning a\ncommand-line editor for easily editing files on remote machines.\n\n# Nano\n\nNano is a simple command-line editor.\n\n- Move with arrow keys\n- All other shortcuts (save, exit) shown at the bottom\n\n# Vim\n\nVi/Vim is a powerful text editor. It's a command-line program that's usually\ninstalled everywhere, which makes it convenient for editing files on a remote\nmachine.\n\nVim also has graphical versions, such as GVim and\n[MacVim](https://macvim-dev.github.io/macvim/). These provide additional\nfeatures such as 24-bit color, menus, and popups.\n\n## Philosophy of Vim\n\n- When programming, you spend most of your time reading/editing, not writing\n    - Vim is a **modal** editor: different modes for inserting text vs manipulating text\n- Vim is programmable (with Vimscript and also other languages like Python)\n- Vim's interface itself is like a programming language\n    - Keystrokes (with mnemonic names) are commands\n    - Commands are composable\n- Don't use the mouse: too slow\n- Editor should work at the speed you think\n\n## Introductory Vim\n\n### Modes\n\nVim shows the current mode in the bottom left.\n\n- Normal mode: for moving around a file and making edits\n    - Spend most of your time here\n- Insert mode: for inserting text\n- Visual (visual, line, or block) mode: for selecting blocks of text\n\nYou change modes by pressing `<ESC>` to switch from any mode back to normal\nmode. 
From normal mode, enter insert mode with `i`, visual mode with `v`,\nvisual line mode with `V`, and visual block mode with `<C-v>`.\n\nYou use the `<ESC>` key a lot when using Vim: consider remapping Caps Lock to\nEscape.\n\n### Basics\n\nVim ex commands are issued through `:{command}` in normal mode.\n\n- `:q` quit (close window)\n- `:w` save\n- `:wq` save and quit\n- `:e {name of file}` open file for editing\n- `:ls` show open buffers\n- `:help {topic}` open help\n    - `:help :w` opens help for the `:w` ex command\n    - `:help w` opens help for the `w` movement\n\n### Movement\n\nVim is all about efficient movement. Navigate the file in Normal mode.\n\n- Disable arrow keys to avoid bad habits\n```vim\nnnoremap <Left> :echoe \"Use h\"<CR>\nnnoremap <Right> :echoe \"Use l\"<CR>\nnnoremap <Up> :echoe \"Use k\"<CR>\nnnoremap <Down> :echoe \"Use j\"<CR>\n```\n- Basic movement: `hjkl` (left, down, up, right)\n- Words: `w` (next word), `b` (beginning of word), `e` (end of word)\n- Lines: `0` (beginning of line), `^` (first non-blank character), `$` (end of line)\n- Screen: `H` (top of screen), `M` (middle of screen), `L` (bottom of screen)\n- File: `gg` (beginning of file), `G` (end of file)\n- Line numbers: `:{number}<CR>` or `{number}G` (line {number})\n- Misc: `%` (corresponding item)\n- Find: `f{character}`, `t{character}`, `F{character}`, `T{character}`\n    - find/to forward/backward {character} on the current line\n- Repeating N times: `{number}{movement}`, e.g. 
`10j` moves down 10 lines\n- Search: `/{regex}`, `n` / `N` for navigating matches\n\n### Selection\n\nVisual modes:\n\n- Visual\n- Visual Line\n- Visual Block\n\nYou can use movement keys to make a selection.\n\n### Manipulating text\n\nEverything that you used to do with the mouse, you now do with the keyboard (and\npowerful, composable commands).\n\n- `i` enter insert mode\n    - but for manipulating/deleting text, you want to use something more than\n    backspace\n- `o` / `O` insert line below / above\n- `d{motion}` delete {motion}\n    - e.g. `dw` is delete word, `d$` is delete to end of line, `d0` is delete\n    to beginning of line\n- `c{motion}` change {motion}\n    - e.g. `cw` is change word\n    - like `d{motion}` followed by `i`\n- `x` delete character (equal to `dl`)\n- `s` substitute character (equal to `xi`)\n- visual mode + manipulation\n    - select text, `d` to delete it or `c` to change it\n- `u` to undo, `<C-r>` to redo\n- Lots more to learn: e.g. `~` flips the case of a character\n\n### Resources\n\n- `vimtutor` command-line program to teach you vim\n- [Vim Adventures](https://vim-adventures.com/) game to learn Vim\n\n## Customizing Vim\n\nVim is customized through a plain-text configuration file at `~/.vimrc`\n(containing Vimscript commands). There are probably lots of basic settings that\nyou want to turn on.\n\nLook at people's dotfiles on GitHub for inspiration, but try not to\ncopy-and-paste people's full configuration. Read it, understand it, and take\nwhat you need.\n\nSome customizations to consider:\n\n- Syntax highlighting: `syntax on`\n- Color schemes\n- Line numbers: `set nu` / `set rnu`\n- Backspacing through everything: `set backspace=indent,eol,start`\n\n## Advanced Vim\n\nHere are a few examples to show you the power of the editor. We can't teach you\nall of these kinds of things, but you'll learn them as you go. 
A good\nheuristic: whenever you're using your editor and you think \"there must be a\nbetter way of doing this\", there probably is: look it up online.\n\n### Search and replace\n\n`:s` (substitute) command ([documentation](http://vim.wikia.com/wiki/Search_and_replace)).\n\n- `%s/foo/bar/g`\n    - replace foo with bar globally in file\n- `%s/\\[.*\\](\\(.*\\))/\\1/g`\n    - replace named Markdown links with plain URLs\n\n### Multiple windows\n\n- `sp` / `vsp` to split windows\n- Can have multiple views of the same buffer.\n\n### Mouse support\n\n- `set mouse+=a`\n    - can click, scroll select\n\n### Macros\n\n- `q{character}` to start recording a macro in register `{character}`\n- `q` to stop recording\n- `@{character}` replays the macro\n- Macro execution stops on error\n- `{number}@{character}` executes a macro {number} times\n- Macros can be recursive\n    - first clear the macro with `q{character}q`\n    - record the macro, with `@{character}` to invoke the macro recursively\n    (will be a no-op until recording is complete)\n- Example: convert xml to json ([file](/2019/files/example-data.xml))\n    - Array of objects with keys \"name\" / \"email\"\n    - Use a Python program?\n    - Use sed / regexes\n        - `g/people/d`\n        - `%s/<person>/{/g`\n        - `%s/<name>\\(.*\\)<\\/name>/\"name\": \"\\1\",/g`\n        - ...\n    - Vim commands / macros\n        - `Gdd`, `ggdd` delete first and last lines\n        - Macro to format a single element (register `e`)\n            - Go to line with `<name>`\n            - `qe^r\"f>s\": \"<ESC>f<C\"<ESC>q`\n        - Macro to format a person\n            - Go to line with `<person>`\n            - `qpS{<ESC>j@eA,<ESC>j@ejS},<ESC>q`\n        - Macro to format a person and go to the next person\n            - Go to line with `<person>`\n            - `qq@pjq`\n        - Execute macro until end of file\n            - `999@q`\n        - Manually remove last `,` and add `[` and `]` delimiters\n\n## Extending 
Vim\n\nThere are tons of plugins for extending vim.\n\nFirst, get set up with a plugin manager like\n[vim-plug](https://github.com/junegunn/vim-plug),\n[Vundle](https://github.com/VundleVim/Vundle.vim), or\n[pathogen.vim](https://github.com/tpope/vim-pathogen).\n\nSome plugins to consider:\n\n- [ctrlp.vim](https://github.com/kien/ctrlp.vim): fuzzy file finder\n- [vim-fugitive](https://github.com/tpope/vim-fugitive): git integration\n- [vim-surround](https://github.com/tpope/vim-surround): manipulating \"surroundings\"\n- [gundo.vim](https://github.com/sjl/gundo.vim): navigate undo tree\n- [nerdtree](https://github.com/scrooloose/nerdtree): file explorer\n- [syntastic](https://github.com/vim-syntastic/syntastic): syntax checking\n- [vim-easymotion](https://github.com/easymotion/vim-easymotion): magic motions\n- [vim-over](https://github.com/osyo-manga/vim-over): substitute preview\n\nLists of plugins:\n\n- [Vim Awesome](https://vimawesome.com/)\n\n## Vim-mode in Other Programs\n\nFor many popular editors (e.g. 
vim and emacs), many other tools support editor\nemulation.\n\n- Shell\n    - bash: `set -o vi`\n    - zsh: `bindkey -v`\n    - `export EDITOR=vim` (environment variable used by programs like `git`)\n- `~/.inputrc`\n    - `set editing-mode vi`\n\nThere are even vim keybinding extensions for web [browsers](http://vim.wikia.com/wiki/Vim_key_bindings_for_web_browsers); some popular ones are [Vimium](https://chrome.google.com/webstore/detail/vimium/dbepggeogbaibhgnhhndojpepiihcmeb?hl=en) for Google Chrome and [Tridactyl](https://github.com/tridactyl/tridactyl) for Firefox.\n\n\n## Resources\n\n- [Vim Tips Wiki](http://vim.wikia.com/wiki/Vim_Tips_Wiki)\n- [Vim Advent Calendar](https://vimways.org/2018/): various Vim tips\n- [Neovim](https://neovim.io/): a modern fork of Vim with more active development\n- [Vim Golf](http://www.vimgolf.com/): various Vim challenges\n\n{% comment %}\n# Resources\n\nTODO resources for other editors?\n{% endcomment %}\n\n# Exercises\n\n1. Experiment with some editors. Try at least one command-line editor (e.g.\n   Vim) and at least one GUI editor (e.g. Atom). Learn through tutorials like\n   `vimtutor` (or the equivalents for other editors). To get a real feel for a\n   new editor, commit to using it exclusively for a couple of days while going\n   about your work.\n\n1. Customize your editor. Look through tips and tricks online, and look through\n   other people's configurations (often, they are well-documented).\n\n1. Experiment with plugins for your editor.\n\n1. Commit to using a powerful editor for at least a couple of weeks: you should\n   start seeing the benefits by then. At some point, you should be able to get\n   your editor to work as fast as you think.\n\n1. Install a linter (e.g. pyflakes for Python), link it to your editor, and test that it is working.\n"
  },
  {
    "path": "_2019/files/example-data.xml",
    "content": "<people>\n  <person>\n    <name>Johnny Zhang Jr.</name>\n    <email>amyalvarez@cole.com</email>\n  </person>\n  <person>\n    <name>Edward Cook</name>\n    <email>dsparks@alvarez-dunn.com</email>\n  </person>\n  <person>\n    <name>Stephen Sweeney</name>\n    <email>dlewis@gmail.com</email>\n  </person>\n  <person>\n    <name>Krystal Riley</name>\n    <email>jflores@wright.biz</email>\n  </person>\n  <person>\n    <name>Ashley Robinson</name>\n    <email>robertsmichael@yahoo.com</email>\n  </person>\n  <person>\n    <name>Kimberly Brooks</name>\n    <email>sharoncunningham@larson.com</email>\n  </person>\n  <person>\n    <name>Brent Proctor</name>\n    <email>edward86@stewart.com</email>\n  </person>\n  <person>\n    <name>William Roberts</name>\n    <email>parkertodd@webb.com</email>\n  </person>\n  <person>\n    <name>Amanda Morales</name>\n    <email>lorizavala@hodges.com</email>\n  </person>\n  <person>\n    <name>Bryan Poole Jr.</name>\n    <email>carolyn56@gray-campos.net</email>\n  </person>\n  <person>\n    <name>Dale Hall</name>\n    <email>martinjames@yahoo.com</email>\n  </person>\n  <person>\n    <name>Isabella Reynolds</name>\n    <email>wbowen@wallace.com</email>\n  </person>\n  <person>\n    <name>Ann Rodriguez</name>\n    <email>charles37@taylor-riley.biz</email>\n  </person>\n  <person>\n    <name>Bryan Davis</name>\n    <email>jessica60@hotmail.com</email>\n  </person>\n  <person>\n    <name>Dalton Powell</name>\n    <email>piercenatasha@yahoo.com</email>\n  </person>\n  <person>\n    <name>Scott Turner</name>\n    <email>harold68@yahoo.com</email>\n  </person>\n  <person>\n    <name>Nicholas Castillo</name>\n    <email>dawnstephens@robinson.info</email>\n  </person>\n  <person>\n    <name>Joseph Pierce</name>\n    <email>lukepatterson@hotmail.com</email>\n  </person>\n  <person>\n    <name>Robyn White</name>\n    <email>jenniferrobinson@hotmail.com</email>\n  </person>\n  <person>\n    <name>Justin Rice</name>\n    
<email>brandi76@gmail.com</email>\n  </person>\n  <person>\n    <name>Jamie Graham</name>\n    <email>harrisdavid@yahoo.com</email>\n  </person>\n  <person>\n    <name>Phillip Schmidt</name>\n    <email>stephanie33@gmail.com</email>\n  </person>\n  <person>\n    <name>John Baker</name>\n    <email>todd86@hotmail.com</email>\n  </person>\n  <person>\n    <name>Sharon Austin</name>\n    <email>srivera@yahoo.com</email>\n  </person>\n  <person>\n    <name>Erica Avila</name>\n    <email>jenniferreed@bowers-wilson.com</email>\n  </person>\n  <person>\n    <name>Jeremy Bass</name>\n    <email>jdavis@collins.com</email>\n  </person>\n  <person>\n    <name>Joshua Parsons</name>\n    <email>stephaniecoleman@miller-barker.com</email>\n  </person>\n  <person>\n    <name>Emma Mccoy</name>\n    <email>taylorjohn@wagner.net</email>\n  </person>\n  <person>\n    <name>Megan Williams</name>\n    <email>ronnie54@gmail.com</email>\n  </person>\n  <person>\n    <name>Michael Sutton</name>\n    <email>connie58@mendoza.net</email>\n  </person>\n  <person>\n    <name>Nicholas York</name>\n    <email>kennedykevin@collins.com</email>\n  </person>\n  <person>\n    <name>Donald Robles</name>\n    <email>williamsbrandon@gmail.com</email>\n  </person>\n  <person>\n    <name>Melissa Allen</name>\n    <email>pproctor@ramos-patel.com</email>\n  </person>\n  <person>\n    <name>Shannon Jones</name>\n    <email>beckkathleen@johnson.com</email>\n  </person>\n  <person>\n    <name>David White</name>\n    <email>sandra73@thompson.com</email>\n  </person>\n  <person>\n    <name>Jonathan Thomas</name>\n    <email>johnsonjeremy@gmail.com</email>\n  </person>\n  <person>\n    <name>Rachael Floyd</name>\n    <email>amanda78@johnson.info</email>\n  </person>\n  <person>\n    <name>Tina Carter</name>\n    <email>josewells@jones.net</email>\n  </person>\n  <person>\n    <name>Eric Johnson</name>\n    <email>bowersaustin@hernandez-edwards.com</email>\n  </person>\n  <person>\n    <name>William Kramer</name>\n 
   <email>rhunt@johnson.com</email>\n  </person>\n  <person>\n    <name>Nathan Williams</name>\n    <email>cynthiayoung@hotmail.com</email>\n  </person>\n  <person>\n    <name>Patty Schwartz</name>\n    <email>salinasdavid@sheppard.biz</email>\n  </person>\n  <person>\n    <name>David Collins</name>\n    <email>pcalhoun@yahoo.com</email>\n  </person>\n  <person>\n    <name>James Thomas</name>\n    <email>brianfox@rogers-cruz.com</email>\n  </person>\n  <person>\n    <name>Mark Casey</name>\n    <email>jerry88@graham.com</email>\n  </person>\n  <person>\n    <name>Robert Galloway</name>\n    <email>cherylmcgee@hotmail.com</email>\n  </person>\n  <person>\n    <name>Caitlin Dunn</name>\n    <email>nicholemartin@yahoo.com</email>\n  </person>\n  <person>\n    <name>Nancy Allison</name>\n    <email>martha33@molina-bullock.com</email>\n  </person>\n  <person>\n    <name>Marvin Burns</name>\n    <email>wrocha@gmail.com</email>\n  </person>\n  <person>\n    <name>Kimberly Jones</name>\n    <email>anitamunoz@french-christian.com</email>\n  </person>\n  <person>\n    <name>Caitlin Wood</name>\n    <email>thomasrandall@bowers-sullivan.org</email>\n  </person>\n  <person>\n    <name>Sara Burton</name>\n    <email>riosangelica@gmail.com</email>\n  </person>\n  <person>\n    <name>Jessica Roberson</name>\n    <email>theresa11@hotmail.com</email>\n  </person>\n  <person>\n    <name>Nicole Macias</name>\n    <email>kevinhodge@martin.biz</email>\n  </person>\n  <person>\n    <name>Christina Williams</name>\n    <email>shawn35@rice-bailey.org</email>\n  </person>\n  <person>\n    <name>Cody Winters</name>\n    <email>nicholassmith@barron-wu.com</email>\n  </person>\n  <person>\n    <name>Patricia Miller DDS</name>\n    <email>pierceraymond@watkins.org</email>\n  </person>\n  <person>\n    <name>Jennifer Lyons</name>\n    <email>vrivera@gmail.com</email>\n  </person>\n  <person>\n    <name>Jerry Rojas</name>\n    <email>jacobalexander@yahoo.com</email>\n  </person>\n  <person>\n    
<name>Matthew Perez</name>\n    <email>jrivas@hotmail.com</email>\n  </person>\n  <person>\n    <name>Patrick Hogan</name>\n    <email>moorelisa@yahoo.com</email>\n  </person>\n  <person>\n    <name>Lisa Howard</name>\n    <email>stephen90@smith.biz</email>\n  </person>\n  <person>\n    <name>Justin Sloan</name>\n    <email>edwardsmichael@hotmail.com</email>\n  </person>\n  <person>\n    <name>Suzanne Morrow</name>\n    <email>shane74@yahoo.com</email>\n  </person>\n  <person>\n    <name>Theresa Lara</name>\n    <email>maryrichardson@clark.com</email>\n  </person>\n  <person>\n    <name>Christopher Powers</name>\n    <email>yfowler@davis-lee.net</email>\n  </person>\n  <person>\n    <name>Teresa Howell</name>\n    <email>amy15@yahoo.com</email>\n  </person>\n  <person>\n    <name>Richard Shelton</name>\n    <email>ksmith@yahoo.com</email>\n  </person>\n  <person>\n    <name>Jeremy Cole</name>\n    <email>bleach@gmail.com</email>\n  </person>\n  <person>\n    <name>Melissa Clark</name>\n    <email>rosejeffrey@yahoo.com</email>\n  </person>\n  <person>\n    <name>Kimberly Mcdaniel</name>\n    <email>ularson@ross-david.com</email>\n  </person>\n  <person>\n    <name>Kelly Dixon</name>\n    <email>gatesstephen@hotmail.com</email>\n  </person>\n  <person>\n    <name>Devin Quinn</name>\n    <email>wjohnson@hotmail.com</email>\n  </person>\n  <person>\n    <name>Kevin Greene</name>\n    <email>lhanson@hotmail.com</email>\n  </person>\n  <person>\n    <name>Jeffery Wiggins</name>\n    <email>amy76@gmail.com</email>\n  </person>\n  <person>\n    <name>Latoya Allen</name>\n    <email>vking@yahoo.com</email>\n  </person>\n  <person>\n    <name>Zachary Walker</name>\n    <email>diazjames@hotmail.com</email>\n  </person>\n  <person>\n    <name>Alyssa Molina</name>\n    <email>elizabeth59@gmail.com</email>\n  </person>\n  <person>\n    <name>Heather Miranda</name>\n    <email>davidturner@cortez-martinez.biz</email>\n  </person>\n  <person>\n    <name>Lori Gardner</name>\n    
<email>murphytaylor@yahoo.com</email>\n  </person>\n  <person>\n    <name>Jessica Simpson</name>\n    <email>jamesdean@rosales.com</email>\n  </person>\n  <person>\n    <name>Anna Dickerson</name>\n    <email>abigailmurphy@hotmail.com</email>\n  </person>\n  <person>\n    <name>Molly Oconnor</name>\n    <email>morrisrhonda@yahoo.com</email>\n  </person>\n  <person>\n    <name>Brandi Braun</name>\n    <email>ericksonmatthew@jenkins.org</email>\n  </person>\n  <person>\n    <name>Renee Flowers</name>\n    <email>brownantonio@yang-crosby.org</email>\n  </person>\n  <person>\n    <name>Cassandra Compton</name>\n    <email>progers@yahoo.com</email>\n  </person>\n  <person>\n    <name>David Gilbert</name>\n    <email>vickie78@gmail.com</email>\n  </person>\n  <person>\n    <name>Brenda Davis</name>\n    <email>cynthiajones@thornton.com</email>\n  </person>\n  <person>\n    <name>Nicholas Rivera</name>\n    <email>longalyssa@yahoo.com</email>\n  </person>\n  <person>\n    <name>Dustin Hodges</name>\n    <email>sgolden@lee.com</email>\n  </person>\n  <person>\n    <name>Chad Wong</name>\n    <email>williambernard@mccarty.net</email>\n  </person>\n  <person>\n    <name>Robin Craig</name>\n    <email>xbyrd@austin.com</email>\n  </person>\n  <person>\n    <name>Heather Parker</name>\n    <email>allenjoshua@rodriguez.com</email>\n  </person>\n  <person>\n    <name>Jennifer Roberts</name>\n    <email>manningtravis@gmail.com</email>\n  </person>\n  <person>\n    <name>James Andrews</name>\n    <email>ginaromero@hotmail.com</email>\n  </person>\n  <person>\n    <name>Dorothy Hines</name>\n    <email>dsmith@thomas.com</email>\n  </person>\n  <person>\n    <name>Stephen Garcia</name>\n    <email>hughesbrendan@hotmail.com</email>\n  </person>\n  <person>\n    <name>Alfred Ellis</name>\n    <email>elizabeth41@crawford.info</email>\n  </person>\n  <person>\n    <name>Marilyn White</name>\n    <email>victoriaford@hotmail.com</email>\n  </person>\n  <person>\n    <name>Brian 
Graves</name>\n    <email>cpatel@gmail.com</email>\n  </person>\n  <person>\n    <name>Elizabeth Wagner</name>\n    <email>newtonwesley@cohen.com</email>\n  </person>\n  <person>\n    <name>Michelle Flores</name>\n    <email>shelbygross@duke-thomas.info</email>\n  </person>\n  <person>\n    <name>Larry Russell</name>\n    <email>richard99@meyer.com</email>\n  </person>\n  <person>\n    <name>Terrence Boyd</name>\n    <email>markmartin@flores.com</email>\n  </person>\n  <person>\n    <name>Jessica Carroll</name>\n    <email>eric30@yahoo.com</email>\n  </person>\n  <person>\n    <name>Erin Dean</name>\n    <email>toddmartin@guerra.biz</email>\n  </person>\n  <person>\n    <name>Craig Hernandez</name>\n    <email>joshualang@gonzalez.com</email>\n  </person>\n  <person>\n    <name>Amber Choi</name>\n    <email>doughertynancy@harmon.org</email>\n  </person>\n  <person>\n    <name>Renee Brown</name>\n    <email>terribeard@archer-gibson.info</email>\n  </person>\n  <person>\n    <name>Curtis Turner</name>\n    <email>pjohnson@hotmail.com</email>\n  </person>\n  <person>\n    <name>Benjamin Reed</name>\n    <email>marksmith@austin.net</email>\n  </person>\n  <person>\n    <name>Christina Fernandez</name>\n    <email>richardjoseph@esparza-peters.com</email>\n  </person>\n  <person>\n    <name>Jasmine Campbell</name>\n    <email>thomasmatthew@gmail.com</email>\n  </person>\n  <person>\n    <name>Catherine Bond</name>\n    <email>coreyroberts@gonzalez.com</email>\n  </person>\n  <person>\n    <name>Connie Jones</name>\n    <email>koneal@riley.com</email>\n  </person>\n  <person>\n    <name>Cody Taylor</name>\n    <email>kelsey99@hotmail.com</email>\n  </person>\n  <person>\n    <name>Kendra Gray</name>\n    <email>walkerrussell@hotmail.com</email>\n  </person>\n  <person>\n    <name>Alexander Murray</name>\n    <email>grossrobert@hotmail.com</email>\n  </person>\n  <person>\n    <name>Arthur Jackson</name>\n    <email>travis73@hotmail.com</email>\n  </person>\n  <person>\n    
<name>Dr. William Vasquez DDS</name>\n    <email>gonzalezdaniel@hotmail.com</email>\n  </person>\n  <person>\n    <name>April Hampton</name>\n    <email>desireemorris@mcguire.info</email>\n  </person>\n  <person>\n    <name>Gerald Hunter</name>\n    <email>justin91@ross-scott.biz</email>\n  </person>\n  <person>\n    <name>Morgan Bolton</name>\n    <email>erika30@lloyd-smith.biz</email>\n  </person>\n  <person>\n    <name>Angela Barker</name>\n    <email>daniel17@carr.com</email>\n  </person>\n  <person>\n    <name>Angela Montgomery</name>\n    <email>jonathangoodwin@smith-perez.com</email>\n  </person>\n  <person>\n    <name>Yolanda Henry</name>\n    <email>shawnmcguire@gmail.com</email>\n  </person>\n  <person>\n    <name>Susan Hines</name>\n    <email>sarahbailey@wallace.com</email>\n  </person>\n  <person>\n    <name>Michelle Young</name>\n    <email>lewismichele@yahoo.com</email>\n  </person>\n  <person>\n    <name>Glen Hood</name>\n    <email>ljackson@vazquez.com</email>\n  </person>\n  <person>\n    <name>Christopher Wright</name>\n    <email>evansjulie@walton.com</email>\n  </person>\n  <person>\n    <name>Susan Guzman DDS</name>\n    <email>medinaelizabeth@gmail.com</email>\n  </person>\n  <person>\n    <name>Barbara Cortez</name>\n    <email>bchavez@cameron.com</email>\n  </person>\n  <person>\n    <name>Stacey Hammond</name>\n    <email>nancyturner@stewart.com</email>\n  </person>\n  <person>\n    <name>Amanda Stout</name>\n    <email>macdonaldlatoya@hotmail.com</email>\n  </person>\n  <person>\n    <name>Lisa Johnson</name>\n    <email>wnolan@gmail.com</email>\n  </person>\n  <person>\n    <name>Carlos Wyatt</name>\n    <email>iperez@cohen.com</email>\n  </person>\n  <person>\n    <name>Samantha Brewer</name>\n    <email>thomas47@hotmail.com</email>\n  </person>\n  <person>\n    <name>Brett Jackson</name>\n    <email>zpowell@cruz-rivera.com</email>\n  </person>\n  <person>\n    <name>Johnny Guzman</name>\n    <email>tmerritt@yahoo.com</email>\n  
</person>\n  <person>\n    <name>Mary Davis</name>\n    <email>collinslisa@hotmail.com</email>\n  </person>\n  <person>\n    <name>Willie Mccoy</name>\n    <email>joshua20@terrell.biz</email>\n  </person>\n  <person>\n    <name>Kelsey Rivera</name>\n    <email>randy72@gmail.com</email>\n  </person>\n  <person>\n    <name>Melissa Maddox</name>\n    <email>christopher13@gmail.com</email>\n  </person>\n  <person>\n    <name>Jason Rodriguez</name>\n    <email>kellypierce@harris.com</email>\n  </person>\n  <person>\n    <name>Donna Walsh</name>\n    <email>wardraymond@martinez.com</email>\n  </person>\n  <person>\n    <name>Monique Patel</name>\n    <email>cynthia75@james.net</email>\n  </person>\n  <person>\n    <name>Dr. Lindsay Farrell PhD</name>\n    <email>brownmaria@gmail.com</email>\n  </person>\n  <person>\n    <name>Ann Ruiz</name>\n    <email>jeremiah94@pennington.org</email>\n  </person>\n  <person>\n    <name>Mary Alexander</name>\n    <email>catherineharper@munoz.org</email>\n  </person>\n  <person>\n    <name>Brittany Russell</name>\n    <email>haileywinters@russell-coffey.net</email>\n  </person>\n  <person>\n    <name>Dominique Rosales</name>\n    <email>matthewpatterson@carr.com</email>\n  </person>\n  <person>\n    <name>Henry Waters</name>\n    <email>karen72@logan.com</email>\n  </person>\n  <person>\n    <name>Jared Weaver</name>\n    <email>karlafletcher@baldwin.org</email>\n  </person>\n  <person>\n    <name>Mr. 
Thomas Atkins</name>\n    <email>gboone@gmail.com</email>\n  </person>\n  <person>\n    <name>Carla Cohen</name>\n    <email>ibarron@gmail.com</email>\n  </person>\n  <person>\n    <name>Tricia Lewis</name>\n    <email>pperez@hotmail.com</email>\n  </person>\n  <person>\n    <name>Mario Gill</name>\n    <email>lisa43@brown.org</email>\n  </person>\n  <person>\n    <name>James Olsen</name>\n    <email>vickie82@hotmail.com</email>\n  </person>\n  <person>\n    <name>Michael Perry</name>\n    <email>rdavis@yahoo.com</email>\n  </person>\n  <person>\n    <name>Matthew Lucas</name>\n    <email>joshuagray@carpenter-stanley.com</email>\n  </person>\n  <person>\n    <name>Christine Torres</name>\n    <email>samanthayoung@smith-aguilar.biz</email>\n  </person>\n  <person>\n    <name>Lindsay Miller</name>\n    <email>randyevans@yahoo.com</email>\n  </person>\n  <person>\n    <name>Margaret Jones</name>\n    <email>kevincantu@alexander-carson.org</email>\n  </person>\n  <person>\n    <name>Cameron Mcdonald</name>\n    <email>deckerjerome@garcia.com</email>\n  </person>\n  <person>\n    <name>Brittany Sanders</name>\n    <email>dennis55@leonard-turner.com</email>\n  </person>\n  <person>\n    <name>Daniel Patterson</name>\n    <email>timothy36@novak.com</email>\n  </person>\n  <person>\n    <name>David Chaney</name>\n    <email>kristen02@hotmail.com</email>\n  </person>\n  <person>\n    <name>Sheri Silva</name>\n    <email>idawson@alvarez.com</email>\n  </person>\n  <person>\n    <name>Holly Ward</name>\n    <email>saraallen@dunn-smith.net</email>\n  </person>\n  <person>\n    <name>Bryan Solis</name>\n    <email>stacey30@lam.biz</email>\n  </person>\n  <person>\n    <name>Diane Carter</name>\n    <email>paulvargas@gmail.com</email>\n  </person>\n  <person>\n    <name>David Brown</name>\n    <email>james98@gmail.com</email>\n  </person>\n  <person>\n    <name>Bridget Fritz</name>\n    <email>beth24@hotmail.com</email>\n  </person>\n  <person>\n    <name>Paul Boyd</name>\n    
<email>johngutierrez@hotmail.com</email>\n  </person>\n  <person>\n    <name>Ernest Baker</name>\n    <email>phillipwhite@hotmail.com</email>\n  </person>\n  <person>\n    <name>George Myers</name>\n    <email>frank52@hammond.com</email>\n  </person>\n  <person>\n    <name>Daniel Miller</name>\n    <email>joshua96@gmail.com</email>\n  </person>\n  <person>\n    <name>Jonathan Ayala</name>\n    <email>jerryharris@davis.net</email>\n  </person>\n  <person>\n    <name>Jill Stone</name>\n    <email>pwright@hotmail.com</email>\n  </person>\n  <person>\n    <name>Trevor Richard</name>\n    <email>mreed@thompson.org</email>\n  </person>\n  <person>\n    <name>Jason Thomas</name>\n    <email>josephflowers@hotmail.com</email>\n  </person>\n  <person>\n    <name>Arthur Thomas</name>\n    <email>lnelson@hicks.com</email>\n  </person>\n  <person>\n    <name>Austin Collins</name>\n    <email>ambermann@barnes.com</email>\n  </person>\n  <person>\n    <name>Jason Diaz</name>\n    <email>ericreyes@hotmail.com</email>\n  </person>\n  <person>\n    <name>Darryl Hall</name>\n    <email>faithdixon@barnes-burgess.org</email>\n  </person>\n  <person>\n    <name>Jason Thomas</name>\n    <email>brittany32@yahoo.com</email>\n  </person>\n  <person>\n    <name>John Sanders</name>\n    <email>waltontheresa@hotmail.com</email>\n  </person>\n  <person>\n    <name>Lisa Hayes</name>\n    <email>victor14@hotmail.com</email>\n  </person>\n  <person>\n    <name>Chelsea Wong</name>\n    <email>iwatkins@williams-solomon.com</email>\n  </person>\n  <person>\n    <name>Joseph Fitzgerald</name>\n    <email>mary86@hotmail.com</email>\n  </person>\n  <person>\n    <name>Crystal Schroeder</name>\n    <email>kbarron@wilson-flynn.org</email>\n  </person>\n  <person>\n    <name>Denise Bean</name>\n    <email>noah23@gmail.com</email>\n  </person>\n  <person>\n    <name>Jamie Atkins</name>\n    <email>cwebb@hotmail.com</email>\n  </person>\n  <person>\n    <name>Joshua Kim</name>\n    
<email>esmith@ramirez.com</email>\n  </person>\n  <person>\n    <name>Deanna Mooney</name>\n    <email>jason13@turner.com</email>\n  </person>\n  <person>\n    <name>Jasmine Baker</name>\n    <email>torresjacob@braun.com</email>\n  </person>\n  <person>\n    <name>Victoria Williams</name>\n    <email>rwilliams@hotmail.com</email>\n  </person>\n  <person>\n    <name>Sandra Hall</name>\n    <email>williamsonrichard@gmail.com</email>\n  </person>\n  <person>\n    <name>Miranda Mcpherson</name>\n    <email>xrussell@barajas.biz</email>\n  </person>\n  <person>\n    <name>Samantha Walton</name>\n    <email>danielle73@gmail.com</email>\n  </person>\n  <person>\n    <name>Kyle Serrano</name>\n    <email>stonecassandra@mcfarland.info</email>\n  </person>\n  <person>\n    <name>Mr. Bruce Maldonado DDS</name>\n    <email>diazmatthew@yahoo.com</email>\n  </person>\n  <person>\n    <name>Amber Fisher</name>\n    <email>jonesdavid@rubio.info</email>\n  </person>\n  <person>\n    <name>Brett Berry</name>\n    <email>millerteresa@gmail.com</email>\n  </person>\n  <person>\n    <name>Cory Bradley</name>\n    <email>umatthews@summers.com</email>\n  </person>\n  <person>\n    <name>Ryan Peters</name>\n    <email>shepherdmonique@gmail.com</email>\n  </person>\n  <person>\n    <name>Laura Lee</name>\n    <email>lfleming@higgins.com</email>\n  </person>\n  <person>\n    <name>Christian Smith</name>\n    <email>johnnymartinez@castro-miller.com</email>\n  </person>\n  <person>\n    <name>Kelly Hanson</name>\n    <email>velazquezsandra@chavez-malone.info</email>\n  </person>\n  <person>\n    <name>Brian King</name>\n    <email>hwood@yahoo.com</email>\n  </person>\n  <person>\n    <name>Cynthia Owens</name>\n    <email>sbrown@hotmail.com</email>\n  </person>\n  <person>\n    <name>Lisa Clark</name>\n    <email>derek74@bell-martinez.com</email>\n  </person>\n  <person>\n    <name>Brenda Ford</name>\n    <email>kevin55@hotmail.com</email>\n  </person>\n  <person>\n    <name>Daniel 
Brady</name>\n    <email>wbennett@hotmail.com</email>\n  </person>\n  <person>\n    <name>Jake Wilson</name>\n    <email>lorraine60@solis.biz</email>\n  </person>\n  <person>\n    <name>April Cole</name>\n    <email>halltyler@yahoo.com</email>\n  </person>\n  <person>\n    <name>Melissa Callahan</name>\n    <email>cmckenzie@rodriguez.info</email>\n  </person>\n  <person>\n    <name>Taylor Brown</name>\n    <email>davisadam@gmail.com</email>\n  </person>\n  <person>\n    <name>Patrick Guerrero</name>\n    <email>hannah48@delgado.net</email>\n  </person>\n  <person>\n    <name>Brian Gonzalez</name>\n    <email>burchmalik@johnson.com</email>\n  </person>\n  <person>\n    <name>Robert Bailey</name>\n    <email>debbiemoore@hotmail.com</email>\n  </person>\n  <person>\n    <name>Jesus Maynard</name>\n    <email>gene45@gmail.com</email>\n  </person>\n  <person>\n    <name>Linda Greer</name>\n    <email>johnharris@reed-allen.net</email>\n  </person>\n  <person>\n    <name>Travis Thomas</name>\n    <email>bryantrachel@gmail.com</email>\n  </person>\n  <person>\n    <name>Vicki Mitchell</name>\n    <email>edaniels@hotmail.com</email>\n  </person>\n  <person>\n    <name>Paula Espinoza</name>\n    <email>donnameyer@dennis.org</email>\n  </person>\n  <person>\n    <name>James Hoffman</name>\n    <email>haustin@larson-wiggins.biz</email>\n  </person>\n  <person>\n    <name>Ashlee Perkins</name>\n    <email>stevenknapp@miller.com</email>\n  </person>\n  <person>\n    <name>Rebecca Leon</name>\n    <email>smitchell@simpson-johnson.com</email>\n  </person>\n  <person>\n    <name>Jorge Williams</name>\n    <email>shawn36@peters-meadows.com</email>\n  </person>\n  <person>\n    <name>Bob Flores</name>\n    <email>kellercourtney@yahoo.com</email>\n  </person>\n  <person>\n    <name>Lisa Miller</name>\n    <email>johnsoncrystal@gmail.com</email>\n  </person>\n  <person>\n    <name>Brandon Davis</name>\n    <email>bryanpetersen@hotmail.com</email>\n  </person>\n  <person>\n    
<name>Joshua Daugherty</name>\n    <email>josehayes@carey.com</email>\n  </person>\n  <person>\n    <name>Justin Wise</name>\n    <email>pamelacosta@simmons-morrow.com</email>\n  </person>\n  <person>\n    <name>Kimberly Johnson</name>\n    <email>combssandra@deleon.com</email>\n  </person>\n  <person>\n    <name>Toni Stone</name>\n    <email>eestrada@charles.com</email>\n  </person>\n  <person>\n    <name>Julie Rivers</name>\n    <email>rwilliams@castillo-nelson.org</email>\n  </person>\n  <person>\n    <name>Kelly Scott</name>\n    <email>danielsmith@hotmail.com</email>\n  </person>\n  <person>\n    <name>Michael Carr</name>\n    <email>clarklisa@newman-barrett.com</email>\n  </person>\n  <person>\n    <name>Jonathan Vaughn</name>\n    <email>dennisrebecca@lawrence-harris.com</email>\n  </person>\n  <person>\n    <name>Erica Lowe</name>\n    <email>wilsonkelly@hotmail.com</email>\n  </person>\n  <person>\n    <name>Kimberly Clark</name>\n    <email>jose15@gmail.com</email>\n  </person>\n  <person>\n    <name>Lindsey Robertson</name>\n    <email>rdickerson@yahoo.com</email>\n  </person>\n  <person>\n    <name>Cindy Anderson</name>\n    <email>gmorton@daniels.com</email>\n  </person>\n  <person>\n    <name>Tami Barber</name>\n    <email>harveykaren@hotmail.com</email>\n  </person>\n  <person>\n    <name>Tiffany Wu</name>\n    <email>jessica90@gmail.com</email>\n  </person>\n  <person>\n    <name>Edward Bowers</name>\n    <email>hallkathy@gmail.com</email>\n  </person>\n  <person>\n    <name>Shawn Collier</name>\n    <email>rhondasmith@hotmail.com</email>\n  </person>\n  <person>\n    <name>Michael Cox</name>\n    <email>usimpson@graham-cunningham.net</email>\n  </person>\n</people>\n"
  },
  {
    "path": "_2019/files/example.c",
    "content": "#include <stdio.h>\n\nconst char *numbers[] = {\n    \"one\",\n    \"two\",\n    \"three\",\n    \"four\",\n    \"five\",\n    \"six\",\n    \"seven\",\n    \"eight\",\n    \"nine\",\n    \"ten\"\n};\n\nvoid say(int i)\n{\n    const char *msg = numbers[i-1];\n    printf(\"%s\\n\", msg);\n}\n\nint main()\n{\n    for (int i = 1; i <= 10; i++) {\n        say(i);\n    }\n}\n"
  },
  {
    "path": "_2019/index.html",
    "content": "---\nlayout: page\ntitle: \"2019 Lectures\"\npermalink: /2019/\n---\n\n<p>Click on specific topics below to see lecture videos and lecture notes.</p>\n\n<h1>Tuesday, 1/15</h1>\n<ul>\n  <li><a href=\"/2019/course-overview/\">Course overview</a></li>\n  <li><a href=\"/2019/virtual-machines/\">Virtual machines and containers</a></li>\n  <li><a href=\"/2019/shell/\">Shell and scripting</a></li>\n</ul>\n\n<h1>Thursday, 1/17</h1>\n<ul>\n  <li><a href=\"/2019/command-line/\">Command-line environment</a></li>\n  <li><a href=\"/2019/data-wrangling/\">Data wrangling</a></li>\n</ul>\n\n<h1>Tuesday, 1/22</h1>\n<ul>\n  <li><a href=\"/2019/editors/\">Editors</a></li>\n  <li><a href=\"/2019/version-control/\">Version control</a></li>\n</ul>\n\n<h1>Thursday, 1/24</h1>\n<ul>\n  <li><a href=\"/2019/dotfiles/\">Dotfiles</a></li>\n  <li><a href=\"/2019/backups/\">Backups</a></li>\n  <li><a href=\"/2019/automation/\">Automation</a></li>\n  <li><a href=\"/2019/machine-introspection/\">Machine introspection</a></li>\n</ul>\n\n<h1>Tuesday, 1/29</h1>\n<ul>\n  <li><a href=\"/2019/program-introspection/\">Program introspection</a></li>\n  <li><a href=\"/2019/package-management/\">Package/dependency management</a></li>\n  <li><a href=\"/2019/os-customization/\">OS customization</a></li>\n  <li><a href=\"/2019/remote-machines/\">Remote machines</a></li>\n</ul>\n\n<h1>Thursday, 1/31</h1>\n<ul>\n  <li><a href=\"/2019/web/\">Web and browsers</a></li>\n  <li><a href=\"/2019/security/\">Security and privacy</a></li>\n</ul>\n\n<hr>\n\n<h1>Discussion</h1>\n\n<p>We've also shared this class beyond MIT in the hopes that others may\nbenefit from these resources. 
You can find posts and discussion on</p>\n\n<ul>\n  <li><a href=\"https://news.ycombinator.com/item?id=19078281\">Hacker News</a></li>\n  <li><a href=\"https://lobste.rs/s/h6157x/mit_hacker_tools_lecture_series_on\">Lobsters</a></li>\n  <li><a href=\"https://www.reddit.com/r/learnprogramming/comments/an42uu/mit_hacker_tools_a_lecture_series_on_programmer/\">/r/learnprogramming</a></li>\n  <li><a href=\"https://www.reddit.com/r/programming/comments/an3xki/mit_hacker_tools_a_lecture_series_on_programmer/\">/r/programming</a></li>\n  <li><a href=\"https://twitter.com/Jonhoo/status/1091896192332693504\">Twitter</a></li>\n  <li><a href=\"https://www.youtube.com/playlist?list=PLyzOVJj3bHQuiujH1lpn8cA9dsyulbYRv\">YouTube</a></li>\n</ul>\n"
  },
  {
    "path": "_2019/machine-introspection.md",
    "content": "---\nlayout: lecture\ntitle: \"Machine Introspection\"\npresenter: Jon\nvideo:\n  aspect: 56.25\n  id: eNYT2Oq3PF8\n---\n\nSometimes, computers misbehave. And very often, you want to know why.\nLet's look at some tools that help you do that!\n\nBut first, let's make sure you're able to do introspection. Often,\nsystem introspection requires that you have certain privileges, like\nbeing a member of a group (such as `power` for shutdown). The `root` user\nhas the ultimate privilege; they can do pretty much anything. You can run\na command as `root` (but be careful!) using `sudo`.\n\n## What happened?\n\nIf something goes wrong, the first place to start is to look at what\nhappened around the time when things went wrong. For this, we need to\nlook at logs.\n\nTraditionally, logs were all stored in `/var/log`, and many still are.\nUsually there's a file or folder per program. Use `grep` or `less` to\nfind your way through them.\n\nThere's also a kernel log that you can see using the `dmesg` command.\nThis used to be available as a plain-text file, but nowadays you often\nhave to go through `dmesg` to get at it.\n\nFinally, there is the \"system log\", which is increasingly where all of\nyour log messages go. 
On _most_, though not all, Linux systems, that log\nis managed by `systemd`, the \"system daemon\", which controls all the\nservices that run in the background (and much much more at this point).\nThat log is accessible through the somewhat inconvenient `journalctl`\ntool if you are root, or part of the `admin` or `wheel` groups.\n\nFor `journalctl`, you should be aware of these flags in particular:\n\n - `-u UNIT`: show only messages related to the given systemd service\n - `--full`: don't truncate long lines (the stupidest feature)\n - `-b`: only show messages from the latest boot (see also `-b -2`)\n - `-n100`: only show last 100 entries\n\n## What is happening?\n\nIf something _is_ wrong, or you just want to get a feel for what's going\non in your system, you have a number of tools at your disposal for\ninspecting the currently running system:\n\nFirst, there's `top`, and the improved version `htop`, which show you\nvarious statistics for the currently running processes on the system:\nCPU use, memory use, process trees, etc. There are lots of shortcuts,\nbut `t` is particularly useful for enabling the tree view. You can also\nsee the process tree with `pstree` (add `-p` to include PIDs). If you want\nto know what those programs are doing, you'll often want to tail their\nlog files. `journalctl -f`, `dmesg -w`, and `tail -f` are your friends\nhere.\n\nSometimes, you want to know more about the resources being used overall\non your system. [`dstat`](http://dag.wiee.rs/home-made/dstat/) is\nexcellent for that. It gives you real-time resource metrics for lots of\ndifferent subsystems like I/O, networking, CPU utilization, context\nswitches, and the like. `man dstat` is the place to start.\n\nIf you're running out of disk space, there are two primary utilities\nyou'll want to know about: `df` and `du`. 
The former shows you the\nstatus of all the partitions on your system (try it with `-h`), whereas\nthe latter measures the size of all the folders you give it, including\ntheir contents (see also `-h` and `-s`).\n\nTo figure out what network connections you have open, `ss` is the way to\ngo. `ss -t` will show all open TCP connections. `ss -tl` will show all\nlistening (i.e., server) ports on your system. `-p` will also include\nwhich process is using that connection, and `-n` will give you the raw\nport numbers.\n\n\n## System configuration\n\nThere are _many_ ways to configure your system, but we'll go through\ntwo very common ones: networking and services. Most applications on your\nsystem tell you how to configure them in their manpage, and usually it\nwill involve editing files in `/etc`, the system configuration\ndirectory.\n\nIf you want to configure your network, the `ip` command lets you do\nthat. Its arguments take on a slightly weird form, but `ip help command`\nwill get you pretty far. `ip addr` shows you information about your\nnetwork interfaces and how they're configured (IP addresses and such),\nand `ip route` shows you how network traffic is routed to different\nnetwork hosts. Network problems can often be resolved purely through the\n`ip` tool. There's also `iw` for managing wireless network interfaces.\n`ping` is a handy tool for checking how deeply things are broken. Try\npinging a hostname (google.com), an external IP address (1.1.1.1), and\nan internal IP address (192.168.1.1 or default gw). You may also want to\nfiddle with `/etc/resolv.conf` to check your DNS settings (how hostnames\nare resolved to IP addresses).\n\nTo configure services, you pretty much have to interact with `systemd`\nthese days, for better or for worse. Most services on your system will\nhave a systemd service file that defines a systemd _unit_. These files\ndefine what command to run when that service is started, how to stop\nit, where to log things, etc. 
They're usually not too bad to read, and\nyou can find most of them in `/usr/lib/systemd/system/`. You can also\ndefine your own in `/etc/systemd/system`.\n\nOnce you have a systemd service in mind, you use the `systemctl` command\nto interact with it. `systemctl enable UNIT` will set the service to\nstart on boot (`disable` removes it again), and `start`, `stop`, and\n`restart` will do what you expect. If something goes wrong, systemd will\nlet you know, and you can use `journalctl -u UNIT` to see the\napplication's log. You can also use `systemctl status` to see how all\nyour system services are doing. If your boot feels slow, it's probably\ndue to a couple of slow services, and you can use `systemd-analyze` (try\nit with `blame`) to figure out which ones.\n\n# Exercises\n\n`locate`?\n`dmidecode`?\n`tcpdump`?\n`/boot`?\n`iptables`?\n`/proc`?\n"
  },
  {
    "path": "_2019/os-customization.md",
    "content": "---\nlayout: lecture\ntitle: \"OS Customization\"\npresenter: Anish\nvideo:\n  aspect: 62.5\n  id: epSRVqQzeDo\n---\n\nThere is a lot you can do to customize your operating system beyond what is\navailable in the settings menus.\n\n# Keyboard remapping\n\nYour keyboard probably has keys that you aren't using very much. Instead of\nhaving useless keys, you can remap them to do useful things.\n\n## Remapping to other keys\n\nThe simplest thing is to remap keys to other keys. For example, if you don't\nuse the caps lock key very much, then you can remap it to something more\nuseful. If you are a Vim user, for example, you might want to remap caps lock\nto escape.\n\nOn macOS, you can do some remappings through Keyboard settings in System\nPreferences; for more complicated mappings, you need special software.\n\n## Remapping to arbitrary commands\n\nYou don't just have to remap keys to other keys: there are tools that will let\nyou remap keys (or combinations of keys) to arbitrary commands. For example,\nyou could make command-shift-t open a new terminal window.\n\n# Customizing hidden OS settings\n\n## macOS\n\nmacOS exposes a lot of useful settings through the `defaults` command. For\nexample, you can make Dock icons of hidden applications translucent:\n\n```shell\ndefaults write com.apple.dock showhidden -bool true\n```\n\nThere is no single list of all possible settings, but you can find lists of\nspecific customizations online, such as Mathias Bynens'\n[.macos](https://github.com/mathiasbynens/dotfiles/blob/master/.macos).\n\n# Window management\n\n## Tiling window management\n\n[Tiling window management](https://en.wikipedia.org/wiki/Tiling_window_manager)\nis one approach to window management, where you organize windows into\nnon-overlapping frames. 
If you're using a Unix-based operating system, you can\ninstall a tiling window manager; if you're using something like Windows or\nmacOS, you can install applications that let you approximate this behavior.\n\n## Screen management\n\nYou can set up keyboard shortcuts to help you manipulate windows across\nscreens.\n\n## Layouts\n\nIf there are specific ways you lay out windows on a screen, rather than\n\"executing\" that layout manually, you can script it, making instantiating a\nlayout trivial.\n\n# Resources\n\n- [Hammerspoon](https://www.hammerspoon.org/) - macOS desktop automation\n- [Rectangle](https://rectangleapp.com/) - macOS window manager\n- [Karabiner](https://karabiner-elements.pqrs.org/) - sophisticated macOS keyboard remapping\n- [r/unixporn](https://www.reddit.com/r/unixporn/) - screenshots and\ndocumentation of people's fancy configurations\n\n# Exercises\n\n1. Figure out how to remap your Caps Lock key to something you use more often\n   (such as Escape or Ctrl or Backspace).\n\n1. Make a custom global keyboard shortcut to open a new terminal window or a\n   new browser window.\n\n{% comment %}\n\nTODO\n\n- Bitbar / Polybar\n- Clipboard Manager (stack/searchable history)\n\n{% endcomment %}\n"
  },
  {
    "path": "_2019/package-management.md",
    "content": "---\nlayout: lecture\ntitle: \"Package Management and Dependency Management\"\npresenter: Anish\nvideo:\n  aspect: 56.25\n  id: tgvt473T8xA\n---\n\nSoftware usually builds on (a collection of) other software, which necessitates\ndependency management.\n\nPackage/dependency management programs are language-specific, but many share\ncommon ideas.\n\n# Package repositories\n\nPackages are hosted in _package repositories_. There are different repositories\nfor different languages (and sometimes multiple for a particular language),\nsuch as [PyPI](https://pypi.org/) for Python, [RubyGems](https://rubygems.org/)\nfor Ruby, and [crates.io](https://crates.io/) for Rust. They generally store\nsoftware (source code and sometimes pre-compiled binaries for specific\nplatforms) for all versions of a package.\n\n# Semantic versioning\n\nSoftware evolves over time, and we need a way to refer to software versions.\nSome simple ways could be to refer to software by a sequence number or a commit\nhash, but we can do better in terms of communicating more information: using\nversion numbers.\n\nThere are many approaches; one popular one is [Semantic\nVersioning](https://semver.org/):\n\n```\nx.y.z\n^ ^ ^\n| | +- patch\n| +--- minor\n+----- major\n```\n\nIncrement **major** version when you make incompatible API changes.\n\nIncrement **minor** version when you add functionality in a backward-compatible manner.\n\nIncrement **patch** when you make backward-compatible bug fixes.\n\nFor example, if you depend on a feature introduced in `v1.2.0` of some\nsoftware, then you can install `v1.x.y` for any minor version `x >= 2` and any\npatch version `y`. You need to install major version `1` (because `2` can\nintroduce backward-incompatible changes), and you need to install a minor\nversion `>= 2` (because you depend on a feature introduced in that minor\nversion). 
You can use any newer minor version or patch version because\nthey should not introduce any backward-incompatible changes.\n\n# Lock files\n\nIn addition to specifying versions, it can be nice to enforce that the\n_contents_ of the dependency have not changed to prevent tampering. Some tools\nuse _lock files_ to specify cryptographic hashes of dependencies (along with\nversions) that are checked on package install.\n\n# Specifying versions\n\nTools often let you specify versions in multiple ways, such as:\n\n- exact version, e.g. `2.3.12`\n- minimum major version, e.g. `>= 2`\n- specific major version and minimum minor version, e.g. `>= 2.3, <3.0`\n\nSpecifying an exact version can be advantageous to avoid different behaviors\nbased on installed dependencies (this shouldn't happen if all dependencies\nfaithfully follow semver, but sometimes people make mistakes). Specifying a\nminimum requirement has the advantage of allowing bug fixes to be installed\n(e.g. patch upgrades).\n\n# Dependency resolution\n\nPackage managers use various dependency resolution algorithms to satisfy\ndependency requirements. This often gets challenging with complex dependencies\n(e.g. a package can be indirectly depended on by multiple top-level\ndependencies, and different versions could be required). Different package\nmanagers have different levels of sophistication in their dependency\nresolution, but it's something to be aware of: you may need to understand this\nif you are debugging dependencies.\n\n# Virtual environments\n\nIf you're developing multiple software projects, they may depend on different\nversions of a particular piece of software. Sometimes, your build tool will\nhandle this naturally (e.g. by building a static binary).\n\nFor other build tools and programming languages, one approach is handling this\nwith virtual environments (e.g. 
with the\n[virtualenv](https://docs.python-guide.org/dev/virtualenvs/) tool for Python).\nInstead of installing dependencies system-wide, you can install dependencies\nper-project in a virtual environment, and _activate_ the virtual environment\nthat you want to use when you're working on a specific project.\n\n# Vendoring\n\nAnother very different approach to dependency management is _vendoring_.\nInstead of using a dependency manager or build tool to fetch software, you copy\nthe entire source code for a dependency into your software's repository. This\nhas the advantage that you're always building against the same version of the\ndependency and you don't need to rely on a package repository, but it is more\neffort to upgrade dependencies.\n"
  },
  {
    "path": "_2019/program-introspection.md",
    "content": "---\nlayout: lecture\ntitle: \"Program Introspection\"\npresenter: Anish\nvideo:\n  aspect: 62.5\n  id: 74MhV-7hYzg\n---\n\n# Debugging\n\nWhen printf-debugging isn't good enough: use a debugger.\n\nDebuggers let you interact with the execution of a program, letting you do\nthings like:\n\n- halt execution of the program when it reaches a certain line\n- single-step through the program\n- inspect values of variables\n- many more advanced features\n\n## GDB/LLDB\n\n[GDB](https://www.gnu.org/software/gdb/) and [LLDB](https://lldb.llvm.org/).\nSupports many C-like languages.\n\nLet's look at [example.c](/2019/files/example.c). Compile with debug flags:\n`gcc -g -o example example.c`.\n\nOpen GDB:\n\n`gdb example`\n\nSome commands:\n\n- `run`\n- `b {name of function}` - set a breakpoint\n- `b {file}:{line}` - set a breakpoint\n- `c` - continue\n- `step` / `next` / `finish` - step in / step over / step out\n- `p {variable}` - print value of variable\n- `watch {expression}` - set a watchpoint that triggers when the value of the expression changes\n- `rwatch {expression}` - set a watchpoint that triggers when the value is read\n- `layout`\n\n## PDB\n\n[PDB](https://docs.python.org/3/library/pdb.html) is the Python debugger.\n\nInsert `import pdb; pdb.set_trace()` where you want to drop into PDB, basically\na hybrid of a debugger (like GDB) and a Python shell.\n\n## Web browser Developer Tools\n\nAnother example of a debugger, this time with a graphical interface.\n\n# strace\n\nObserve system calls a program makes: `strace {program}`.\n\n# Profiling\n\nTypes of profiling: CPU, memory, etc.\n\nSimplest profiler: `time`.\n\n## Go\n\nRun test code with CPU profiler: `go test -cpuprofile=cpu.out`\n\nAnalyze profile: `go tool pprof -web cpu.out`\n\nRun test code with Memory profiler: `go test -memprofile=mem.out`\n\nAnalyze profile: `go tool pprof -web mem.out`\n\n## Perf\n\nBasic performance stats: `perf stat {command}`\n\nRun a program with the profiler: 
`perf record {command}`\n\nAnalyze profile: `perf report`\n"
  },
  {
    "path": "_2019/remote-machines.md",
    "content": "---\nlayout: lecture\ntitle: \"Remote Machines\"\npresenter: Jose\nvideo:\n  aspect: 62.5\n  id: X5c2Y8BCowM\n---\n\nIt has become more and more common for programmers to use remote servers in their everyday work. If you need to use remote servers in order to deploy backend software or you need a server with higher computational capabilities, you will end up using a Secure Shell (SSH). As with most tools covered, SSH is highly configurable so it is worth learning about it.\n\n\n## Executing commands\n\nAn often overlooked feature of `ssh` is the ability to run commands directly.\n\n- `ssh foobar@server ls` will execute ls in the home folder of foobar\n- It works with pipes, so `ssh foobar@server ls | grep PATTERN` will grep locally the remote output of `ls` and `ls | ssh foobar@server grep PATTERN` will grep remotely the local output of `ls`.\n\n## SSH Keys\n\nKey-based authentication exploits public-key cryptography to prove to the server that the client owns the secret private key without revealing the key. This way you do not need to reenter your password every time. Nevertheless the private key (e.g. `~/.ssh/id_rsa`) is effectively your password so treat it like so.\n\n- Key generation. To generate a pair you can simply run `ssh-keygen -t rsa -b 4096`. If you do not choose a passphrase anyone that gets hold of your private key will be able to access authorized servers so it is recommended to choose  one and use `ssh-agent` to manage shell sessions.\n\nIf you have configured pushing to Github using SSH keys you have probably done the steps outlined [here](https://help.github.com/articles/connecting-to-github-with-ssh/) and have a valid pair already. To check if you have a passphrase and validate it you can run `ssh-keygen -y -f /path/to/key`.\n\n- Key based authentication. `ssh` will look into `.ssh/authorized_keys` to determine which clients it should let in. 
To copy a public key over, we can append it to that file on the server:\n\n```bash\ncat .ssh/id_rsa.pub | ssh foobar@remote 'cat >> ~/.ssh/authorized_keys'\n```\n\nA simpler solution can be achieved with `ssh-copy-id` where available.\n\n```bash\nssh-copy-id -i .ssh/id_rsa.pub foobar@remote\n```\n\n## Copying files over ssh\n\nThere are many ways to copy files over ssh:\n\n- `ssh+tee`: the simplest is to use `ssh` command execution and stdin input by doing `cat localfile | ssh remote_server tee serverfile`\n- `scp`: when copying large numbers of files/directories, the secure copy `scp` command is more convenient since it can easily recurse over paths. The syntax is `scp path/to/local_file remote_host:path/to/remote_file`\n- `rsync` improves upon `scp` by detecting identical files in local and remote and avoiding copying them again. It also provides more fine-grained control over symlinks and permissions, and has extra features like the `--partial` flag that can resume from a previously interrupted copy. `rsync` has a similar syntax to `scp`.\n\n\n## Backgrounding processes\n\nBy default, when an ssh connection is interrupted, child processes of the parent shell are killed along with it. There are a couple of alternatives:\n\n- `nohup` - the `nohup` tool effectively allows a process to keep living when the terminal gets killed. Although this can sometimes be achieved with `&` and `disown`, nohup is a better default. More details can be found [here](https://unix.stackexchange.com/questions/3886/difference-between-nohup-disown-and).\n\n- `tmux`, `screen` - whereas `nohup` effectively backgrounds the process, it is not convenient for interactive shell sessions. In that case using a terminal multiplexer like `screen` or `tmux` is a convenient choice since one can easily detach and reattach the associated shells.\n\nLastly, if you disown a program and want to reattach it to the current terminal, you can look into [reptyr](https://github.com/nelhage/reptyr). 
`reptyr PID` will grab the process with id PID and attach it to your current terminal.\n\n## Port Forwarding\n\nIn many scenarios you will run into software that works by listening to ports in the machine. When this happens in your local machine you can simply do `localhost:PORT` or `127.0.0.1:PORT`, but what do you do with a remote server that does not have its ports directly available through the network/internet? The solution is called port forwarding, and it\ncomes in two flavors: Local Port Forwarding and Remote Port Forwarding (see the pictures for more details, credit of the pictures from [this SO post](https://unix.stackexchange.com/questions/115897/whats-ssh-port-forwarding-and-whats-the-difference-between-ssh-local-and-remot)).\n\n\n**Local Port Forwarding**\n![Local Port Forwarding](https://i.stack.imgur.com/a28N8.png)\n\n**Remote Port Forwarding**\n![Remote Port Forwarding](https://i.stack.imgur.com/4iK3b.png)\n\n\nThe most common scenario is local port forwarding, where a service in the remote machine listens on a port and you want to link a port in your local machine to forward to the remote port. For example, suppose we execute `jupyter notebook` in the remote server and it listens on port `8888`. To forward that to the local port `9999`, we would do `ssh -L 9999:localhost:8888 foobar@remote_server` and then navigate to `localhost:9999` in our local machine.\n\n## Graphics Forwarding\n\nSometimes forwarding ports is not enough since we want to run a GUI based program in the server. You can always resort to Remote Desktop Software that sends the entire Desktop Environment (i.e. options like RealVNC, Teamviewer, &c). 
However, for a single GUI tool, SSH provides a good alternative: Graphics Forwarding.\n\nUsing the `-X` flag tells SSH to forward X11 connections, so that GUI programs started on the server are displayed on your local machine.\n\nFor trusted X11 forwarding the `-Y` flag can be used.\n\nA final note is that for this to work the `sshd_config` on the server must have the following options:\n\n```bash\nX11Forwarding yes\nX11DisplayOffset 10\n```\n\n## Roaming\n\nA common pain when connecting to a remote server is disconnections due to shutting down/sleeping your computer or changing networks. Moreover, if one has a connection with significant lag, using ssh can become quite frustrating. [Mosh](https://mosh.org/), the mobile shell, improves upon ssh, allowing roaming connections, intermittent connectivity and providing intelligent local echo.\n\nMosh is present in all common distributions and package managers. Mosh requires an ssh server to be working in the server. You do not need to be superuser to install mosh, but it does require ports 60000 through 60010 to be open in the server (they usually are since they are not in the privileged range).\n\nA downside of `mosh` is that it does not support roaming port/graphics forwarding, so if you use those often `mosh` won't be of much help.\n\n## SSH Configuration\n\n#### Client\n\nWe have covered many many arguments that we can pass. 
A tempting alternative is to create shell aliases that look like `alias my_server=\"ssh -X -i ~/.ssh/id_rsa -L 9999:localhost:8888 foobar@remote_server\"`; however, there is a better alternative: using `~/.ssh/config`.\n\n```bash\nHost vm\n    User foobar\n    HostName 172.16.174.141\n    Port 22\n    IdentityFile ~/.ssh/id_rsa\n    LocalForward 9999 localhost:8888\n\n# Configs can also take wildcards\nHost *.mit.edu\n    User foobaz\n```\n\n\nAn additional advantage of using the `~/.ssh/config` file over aliases is that other programs like `scp`, `rsync`, `mosh`, &c are able to read it as well and convert the settings into the corresponding flags.\n\n\nNote that the `~/.ssh/config` file can be considered a dotfile, and in general it is fine for it to be included with the rest of your dotfiles. However if you make it public, think about the information that you are potentially providing strangers on the internet: the addresses of your servers, the users you are using, the open ports, &c. This may facilitate some types of attacks so be thoughtful about sharing your SSH configuration.\n\nWarning: Never include your RSA keys ( `~/.ssh/id_rsa*` ) in a public repository!\n\n#### Server side\n\nServer side configuration is usually specified in `/etc/ssh/sshd_config`. Here you can make changes like disabling password authentication, changing ssh ports, enabling X11 forwarding, &c. You can specify config settings on a per-user basis.\n\n## Remote Filesystem\n\nSometimes it is convenient to mount a remote folder. [sshfs](https://github.com/libfuse/sshfs) can mount a folder on a remote server\nlocally, and then you can use a local editor.\n\n## Exercises\n\n1. For SSH to work the host needs to be running an SSH server. Install an SSH server (such as OpenSSH) in a virtual machine so you can do the rest of the exercises. 
To figure out the IP address of the machine, run the command `ip addr` and look for the inet field (ignore the `127.0.0.1` entry, which corresponds to the loopback interface).\n\n1. Go to `~/.ssh/` and check if you have a pair of SSH keys there. If not, generate them with `ssh-keygen -t rsa -b 4096`. It is recommended that you use a passphrase and use `ssh-agent`, more info [here](https://www.ssh.com/ssh/agent).\n\n1. Use `ssh-copy-id` to copy the key to your virtual machine. Test that you can ssh without a password. Then, edit your `sshd_config` in the server to disable password authentication by editing the value of `PasswordAuthentication`. Disable root login by editing the value of `PermitRootLogin`.\n\n1. Edit the `sshd_config` in the server to change the ssh port and check that you can still ssh. If you ever have a public facing server, a non-default port and key-only login will deter a significant number of malicious attacks.\n\n1. Install mosh in your server/VM, establish a connection and then disconnect the network adapter of the server/VM. Can mosh properly recover from it?\n\n1. Another use of local port forwarding is to tunnel traffic for a certain host through the server. If your network filters some website, for example `reddit.com`, you can tunnel it through the server as follows:\n\n    - Run `ssh remote_server -L 80:reddit.com:80`\n    - Set `reddit.com` and `www.reddit.com` to `127.0.0.1` in `/etc/hosts`\n    - Check that you are accessing that website through the server\n    - If it is not obvious, use a website such as [ipinfo.io](https://ipinfo.io/), which will change depending on your host's public IP.\n\n\n1. Background port forwarding can easily be achieved with a couple of extra flags. 
Look into what the `-N` and `-f` flags do in `ssh` and figure out what a command such as this `ssh -N -f -L 9999:localhost:8888 foobar@remote_server` does.\n\n\n## References\n\n- [SSH Hacks](http://matt.might.net/articles/ssh-hacks/)\n- [Secure Secure Shell](https://stribika.github.io/2015/01/04/secure-secure-shell.html)\n\n{% comment %}\nLecture notes will be available by the start of lecture.\n{% endcomment %}\n"
  },
  {
    "path": "_2019/security.md",
    "content": "---\nlayout: lecture\ntitle: \"Security and Privacy\"\npresenter: Jon\nvideo:\n  aspect: 56.25\n  id: OBx_c-i-M8s\n---\n\nThe world is a scary place, and everyone's out to get you.\n\nOkay, maybe not, but that doesn't mean you want to flaunt all your\nsecrets. Security (and privacy) is generally all about raising the bar\nfor attackers. Find out what your threat model is, and then design your\nsecurity mechanisms around that! If the threat model is the NSA or\nMossad, you're _probably_ going to have a bad time.\n\nThere are _many_ ways to make your technical persona more secure. We'll\ntouch on a lot of high-level things here, but this is a process, and\neducating yourself is one of the best things you can do. So:\n\n## Follow the Right People\n\nOne of the best ways to improve your security know-how is to follow\nother people who are vocal about security. Some suggestions:\n\n - [@TroyHunt](https://twitter.com/TroyHunt)\n - [@SwiftOnSecurity](https://twitter.com/SwiftOnSecurity)\n - [@taviso](https://twitter.com/taviso)\n - [@thegrugq](https://twitter.com/thegrugq)\n - [@tqbf](https://twitter.com/tqbf)\n - [@mattblaze](https://twitter.com/mattblaze)\n - [@moxie](https://twitter.com/moxie)\n\nSee also [this\nlist](https://heimdalsecurity.com/blog/best-twitter-cybersec-accounts/)\nfor more suggestions.\n\n## General Security Advice\n\nTech Solidarity has a pretty great list of [do's and don'ts for\njournalists](https://web.archive.org/web/20221123204419/https://techsolidarity.org/resources/basic_security.htm)\nthat has a lot of sane advice, and is decently up-to-date. [@thegrugq](https://medium.com/@thegrugq)\nalso has a good blog post on [travel security\nadvice](https://medium.com/@thegrugq/stop-fabricating-travel-security-advice-35259bf0e869)\nthat's worth reading. We'll repeat much of the advice from those sources\nhere, plus some more. 
Also, get a [USB data\nblocker](https://www.amazon.com/dp/B00QRRZ2QM/), because [USB is\nscary](https://www.bleepingcomputer.com/news/security/heres-a-list-of-29-different-types-of-usb-attacks/).\n\n## Authentication\n\nThe very first thing you should do, if you haven't already, is download\na password manager. Some good ones are:\n\n - [1password](https://1password.com/)\n - [KeePass](https://keepass.info/)\n - [BitWarden](https://bitwarden.com/)\n - [`pass`](https://git.zx2c4.com/password-store/about/)\n\nIf you're particularly paranoid, use one that encrypts the passwords\nlocally on your computer, as opposed to storing them in plain-text at\nthe server. Use it to generate passwords\nfor all the web sites you care about right now. Then, switch on\ntwo-factor authentication, ideally with a\n[FIDO/U2F](https://fidoalliance.org/) dongle (a\n[YubiKey](https://www.yubico.com/quiz/) for example, which has [20% off\nfor students](https://www.yubico.com/why-yubico/for-education/)). TOTP\n(like Google Authenticator or Duo) will also work in a pinch, but\n[doesn't protect against\nphishing](https://twitter.com/taviso/status/1082015009348104192). SMS is\npretty much useless unless your threat model only includes random\nstrangers picking up your password in transit.\n\nAlso, a note about paper keys. Often, services will give you a \"backup\nkey\" that you can use as a second factor if you lose your real second\nfactor (btw, always keep a backup dongle somewhere safe!). While you\n_can_ stick those in your password manager, that means that should\nsomeone get access to your password manager, you're totally hosed (but\nmaybe you're okay with that threat model). 
If you are truly paranoid,\nprint out these paper keys, never store them digitally, and place them\nin a safe in the real world.\n\n## Private Communication\n\nUse [Signal](https://www.signal.org/) ([setup\ninstructions](https://medium.com/@mshelton/signal-for-beginners-c6b44f76a1f0).\n[Wire](https://wire.com/en/) is [fine\ntoo](https://www.securemessagingapps.com/); WhatsApp is okay; [don't use\nTelegram](https://twitter.com/bascule/status/897187286554628096)).\nDesktop messengers are pretty broken (partially due to usually relying\non Electron, which is a huge trust stack).\n\nE-mail is particularly problematic, even if PGP signed. It's not\ngenerally forward-secure, and the key-distribution problem is pretty\nsevere. [keybase.io](https://keybase.io/) helps, and is useful for a\nnumber of other reasons. Also, PGP keys are generally handled on desktop\ncomputers, which is one of the least secure computing environments.\nRelatedly, consider getting a Chromebook, or just work on a tablet with\na keyboard.\n\n## File Security\n\nFile security is hard, and operates on many levels. What is it you're\ntrying to secure against?\n\n[![$5 wrench](https://imgs.xkcd.com/comics/security.png)](https://xkcd.com/538/)\n\n - Offline attacks (someone steals your laptop while it's off): turn on\n   full disk encryption ([cryptsetup +\n   LUKS](https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_a_non-root_file_system)\n   on Linux,\n   [BitLocker](https://fossbytes.com/enable-full-disk-encryption-windows-10/)\n   on Windows, [FileVault](https://support.apple.com/en-us/HT204837) on\n   macOS). Note that this won't help if the attacker _also_ has you and\n   really wants your secrets.\n - Online attacks (someone has your laptop and it's on): use file\n   encryption. There are two primary mechanisms for doing so:\n    - Encrypted filesystems: stacked filesystem encryption software encrypts files individually rather than having encrypted block devices. 
You can \"mount\" these filesystems by providing the decryption key, and then browse the files inside it freely. When you unmount it, those files are all unavailable.  Modern solutions include [gocryptfs](https://github.com/rfjakob/gocryptfs) and [eCryptFS](http://ecryptfs.org/). More detailed comparisons can be found [here](https://nuetzlich.net/gocryptfs/comparison/) and [here](https://wiki.archlinux.org/index.php/disk_encryption#Comparison_table)\n    - Encrypted files: encrypt individual files with symmetric\n      encryption (see `gpg -c`) and a secret key. Or, like `pass`, also\n      encrypt the key with your public key so only you can read it back\n      later with your private key. Exact encryption settings matter a\n      lot!\n - [Plausible\n   deniability](https://en.wikipedia.org/wiki/Plausible_deniability)\n   (what seems to be the problem officer?): usually lower performance,\n   and easier to lose data. Hard to actually prove that it provides\n   [deniable\n   encryption](https://en.wikipedia.org/wiki/Deniable_encryption)! See\n   the [discussion\n   here](https://security.stackexchange.com/questions/135846/is-plausible-deniability-actually-feasible-for-encrypted-volumes-disks),\n   and then consider whether you may want to try\n   [VeraCrypt](https://www.veracrypt.fr/en/Home.html) (the maintained\n   fork of good ol' TrueCrypt).\n - Encrypted backups: use [Tarsnap](https://www.tarsnap.com/) or [Borgbase](https://www.borgbase.com/)\n    - Think about whether an attacker can delete your backups if they\n      get a hold of your laptop!\n\n## Internet Security & Privacy\n\nThe internet is a _very_ scary place. Open WiFi networks\n[are](https://www.troyhunt.com/the-beginners-guide-to-breaking-website/)\n[scary](https://www.troyhunt.com/talking-with-scott-hanselman-on/). 
Make\nsure you delete them afterwards, otherwise your phone will happily\nannounce and re-connect to something with the same name later!\n\nIf you're ever on a network you don't trust, a VPN _may_ be worthwhile,\nbut keep in mind that you're trusting the VPN provider _a lot_. Do you\nreally trust them more than your ISP? If you truly want a VPN, use a\nprovider you're sure you trust, and you should probably pay for it. Or\nset up [WireGuard](https://www.wireguard.com/) for yourself -- it's\n[excellent](https://web.archive.org/web/20210526211307/https://latacora.micro.blog/there-will-be/)!\n\nThere are also secure configuration settings for a lot of internet-enabled\napplications at [cipherlist.eu](https://cipherlist.eu/). If you're particularly\nprivacy-oriented, [privacytools.io](https://privacytools.io) is also a good\nresource.\n\nSome of you may wonder about [Tor](https://www.torproject.org/). Keep in\nmind that Tor is _not_ particularly resistant to powerful global\nattackers, and is weak against traffic analysis attacks. It may be\nuseful for hiding traffic on a small scale, but won't really buy you all\nthat much in terms of privacy. You're better off using more secure\nservices in the first place (Signal, TLS + certificate pinning, etc.).\n\n## Web Security\n\nSo, you want to go on the Web too?\nJeez, you're really pushing your luck here.\n\nInstall [HTTPS Everywhere](https://www.eff.org/https-everywhere).\nSSL/TLS is\n[critical](https://www.troyhunt.com/ssl-is-not-about-encryption/), and\nit's _not_ just about encryption, but also about being able to verify\nthat you're talking to the right service in the first place! If you run\nyour own web server, [test it](https://www.ssllabs.com/ssltest/index.html). TLS configuration\n[can get hairy](https://wiki.mozilla.org/Security/Server_Side_TLS).\nHTTPS Everywhere will do its very best to never navigate you to HTTP\nsites when there's an alternative. 
That doesn't save you, but it helps.\nIf you're truly paranoid, blacklist any SSL/TLS CAs that you don't\nabsolutely need.\n\nInstall [uBlock Origin](https://github.com/gorhill/uBlock). It is a\n[wide-spectrum\nblocker](https://github.com/gorhill/uBlock/wiki/Blocking-mode) that\ndoesn't just stop ads, but all sorts of third-party communication a page\nmay try to do. And inline scripts and such. If you're willing to spend\nsome time on configuration to make things work, go to [medium\nmode](https://github.com/gorhill/uBlock/wiki/Blocking-mode:-medium-mode)\nor even [hard\nmode](https://github.com/gorhill/uBlock/wiki/Blocking-mode:-hard-mode).\nThose _will_ make some sites not work until you've fiddled with the\nsettings enough, but will also significantly improve your online\nsecurity.\n\nIf you're using Firefox, enable [Multi-Account\nContainers](https://support.mozilla.org/en-US/kb/containers). Create\nseparate containers for social networks, banking, shopping, etc. Firefox\nwill keep the cookies and other state for each of the containers totally\nseparate, so sites you visit in one container can't snoop on sensitive\ndata from the others. In Google Chrome, you can use [Chrome\nProfiles](https://support.google.com/chrome/answer/2364824) to achieve\nsimilar results.\n\nExercises\n\nTODO\n\n1. Encrypt a file using PGP\n1. Use veracrypt to create a simple encrypted volume\n1. Enable 2FA for your most data sensitive accounts i.e. GMail, Dropbox, Github, &c\n"
  },
  {
    "path": "_2019/shell.md",
    "content": "---\nlayout: lecture\ntitle: \"Shell and Scripting\"\npresenter: Jon\nvideo:\n  aspect: 56.25\n  id: dbDRfmH5uSI\n---\n\nThe shell is an efficient, textual interface to your computer.\n\nThe shell prompt: what greets you when you open a terminal.\nLets you run programs and commands; common ones are:\n\n - `cd` to change directory\n - `ls` to list files and directories\n - `mv` and `cp` to move and copy files\n\nBut the shell lets you do _so_ much more; you can invoke any program on\nyour computer, and command-line tools exist for doing pretty much\nanything you may want to do. And they're often more efficient than their\ngraphical counterparts. We'll go through a bunch of those in this class.\n\nThe shell provides an interactive programming language (\"scripting\").\nThere are many shells:\n\n - You've probably used `sh` or `bash`.\n - Also shells that match languages: `csh`.\n - Or \"better\" shells: `fish`, `zsh`, `ksh`.\n\nIn this class we'll focus on the ubiquitous `sh` and `bash`, but feel\nfree to play around with others. 
I like `fish`.\n\nShell programming is a *very* useful tool in your toolbox.\nCan either write programs directly at the prompt, or into a file.\n`#!/bin/sh` + `chmod +x` to make shell executable.\n\n## Working with the shell\n\nRun a command a bunch of times:\n\n```bash\nfor i in $(seq 1 5); do echo hello; done\n```\n\nThere's a lot to unpack:\n\n - `for x in list; do BODY; done`\n   - `;` terminates a command -- equivalent to newline\n   - split `list`, assign each to `x`, and run body\n   - splitting is \"whitespace splitting\", which we'll get back to\n   - no curly braces in shell, so `do` + `done`\n - `$(seq 1 5)`\n   - run the program `seq` with arguments `1` and `5`\n   - substitute entire `$()` with the output of that program\n   - equivalent to\n     ```bash\n     for i in 1 2 3 4 5\n     ```\n - `echo hello`\n   - everything in a shell script is a command\n   - in this case, run the `echo` command, which prints its arguments\n     with the argument `hello`.\n   - all commands are searched for in `$PATH` (colon-separated)\n\nWe have variables:\n```bash\nfor f in $(ls); do echo $f; done\n```\n\nWill print each file name in the current directory.\nCan also set variables using `=` (no space!):\n\n```bash\nfoo=bar\necho $foo\n```\n\nThere are a bunch of \"special\" variables too:\n\n - `$1` to `$9`: arguments to the script\n - `$0` name of the script itself\n - `$#` number of arguments\n - `$$` process ID of current shell\n\nTo only print directories\n\n```bash\nfor f in $(ls); do if test -d $f; then echo dir $f; fi; done\n```\n\nMore to unpack here:\n\n - `if CONDITION; then BODY; fi`\n   - `CONDITION` is a command; if it returns with exit status 0\n     (success), then `BODY` is run.\n   - can also hook in an `else` or `elif`\n   - again, no curly braces, so `then` + `fi`\n - `test` is another program that provides various checks and\n   comparisons, and exits with 0 if they're true (`$?`)\n   - `man COMMAND` is your friend: `man test`\n   - can also be 
invoked with `[` + `]`: `[ -d $f ]`\n     - take a look at `man test` and `which \"[\"`\n\nBut wait! This is wrong! What if a file is called \"My Documents\"?\n\n - `for f in $(ls)` expands to `for f in My Documents`\n - first do the test on `My`, then on `Documents`\n - not what we wanted!\n - biggest source of bugs in shell scripts\n\n## Argument splitting\n\nBash splits arguments by whitespace; not always what you want!\n\n - need to use quoting to handle spaces in arguments\n   `for f in \"My Documents\"` would work correctly\n - same problem somewhere else -- do you see where?\n   `test -d $f`: if `$f` contains whitespace, `test` will error!\n - `echo` happens to be okay, because split + join by space\n   but what if a filename contains a newline?! turns into space!\n - quote all use of variables that you don't want split\n - but how do we fix our script above?\n   what does `for f in \"$(ls)\"` do do you think?\n\nGlobbing is the answer!\n\n - bash knows how to look for files using patterns:\n   - `*` any string of characters\n   - `?` any single character\n   - `{a,b,c}` any of these characters\n - `for f in *`: all files in this directory\n - when globbing, each matching file becomes its own argument\n   - still need to make sure to quote when _using_: `test -d \"$f\"`\n - can make advanced patterns:\n   - `for f in a*`: all files starting with `a` in the current directory\n   - `for f in foo/*.txt`: all `.txt` files in `foo`\n   - `for f in foo/*/p??.txt`\n     all three-letter text files starting with p in subdirs of `foo`\n\nWhitespace issues don't stop there:\n\n - `if [ $foo = \"bar\" ]; then` -- see the issue?\n - what if `$foo` is empty? arguments to `[` are `=` and `bar`...\n - _can_ work around this with `[ x$foo = \"xbar\" ]`, but bleh\n - instead, use `[[`: bash built-in comparator that has special parsing\n   - also allows `&&` instead of `-a`, `||` over `-o`, etc.\n\n<!-- TODO: arrays? $@. ${array[@]} vs \"${array[@]}\". 
-->\n\n## Composability\n\nShell is powerful in part because of composability. Can chain multiple\nprograms together rather than have one program that does everything.\n\nThe key character is `|` (pipe).\n\n - `a | b` means run both `a` and `b`\n   send all output of `a` as input to `b`\n   print the output of `b`\n\nAll programs you launch (\"processes\") have three \"streams\":\n\n - `STDIN`: when the program reads input, it comes from here\n - `STDOUT`: when the program prints something, it goes here\n - `STDERR`: a 2nd output the program can choose to use\n - by default, `STDIN` is your keyboard, `STDOUT` and `STDERR` are both\n   your terminal. but you can change that!\n   - `a | b` makes `STDOUT` of `a` `STDIN` of `b`.\n   - also have:\n     - `a > foo` (`STDOUT` of `a` goes to the file `foo`)\n     - `a 2> foo` (`STDERR` of `a` goes to the file `foo`)\n     - `a < foo` (`STDIN` of `a` is read from the file `foo`)\n     - hint: `tail -f` will print a file as it's being written\n - why is this useful? lets you manipulate output of a program!\n   - `ls | grep foo`: all files that contain the word `foo`\n   - `ps | grep foo`: all processes that contain the word `foo`\n   - `journalctl | grep -i intel | tail -n5`:\n     last 5 system log messages with the word intel (case insensitive)\n   - `who | sendmail -t me@example.com`\n     send the list of logged-in users to `me@example.com`\n   - forms the basis for much data-wrangling, as we'll cover later\n\nBash also provides a number of other ways to compose programs.\n\nYou can group commands with `(a; b) | tac`: run `a`, then `b`, and send\nall their output to `tac`, which prints its input in reverse order.\n\nA lesser-known, but super useful one is _process substitution_.\n`b <(a)` will run `a`, generate a temporary file-name for its output\nstream, and pass that file-name to `b`. 
For example:\n\n```bash\ndiff <(journalctl -b -1 | head -n20) <(journalctl -b -2 | head -n20)\n```\nwill show you the difference between the first 20 lines of the last boot\nlog and the one before that.\n\n<!-- TODO: exit codes? -->\n\n## Job and process control\n\nWhat if you want to run longer-term things in the background?\n\n - the `&` suffix runs a program \"in the background\"\n   - it will give you back your prompt immediately\n   - handy if you want to run two programs at the same time\n     like a server and client: `server & client`\n   - note that the running program still has your terminal as `STDOUT`!\n     try: `server > server.log & client`\n - see all such processes with `jobs`\n   - notice that it shows \"Running\"\n - bring it to the foreground with `fg %JOB` (no argument is latest)\n - if you want to background the current program: `^Z` + `bg` (Here `^Z` means pressing `Ctrl+Z`)\n   - `^Z` stops the current process and makes it a \"job\"\n   - `bg` runs the last job in the background (as if you did `&`)\n - background jobs are still tied to your current session, and exit if\n   you log out. `disown` lets you sever that connection. or use `nohup`.\n - `$!` is pid of last background process\n\n<!-- TODO: process output control (^S and ^Q)? -->\n\nWhat about other stuff running on your computer?\n\n - `ps` is your friend: lists running processes\n   - `ps -A`: print processes from all users (also `ps ax`)\n   - `ps` has *many* arguments: see `man ps`\n - `pgrep`: find processes by searching (like `ps -A | grep`)\n   - `pgrep -af`: search and display with arguments\n - `kill`: send a _signal_ to a process by ID (`pkill` by search + `-f`)\n   - signals tell a process to \"do something\"\n   - most common: `SIGKILL` (`-9` or `-KILL`): tell it to exit *now*\n     (cannot be caught; `^\\` sends the similar but catchable `SIGQUIT`)\n   - also `SIGTERM` (`-15` or `-TERM`): tell it to exit gracefully\n     (`^C` sends the closely related `SIGINT`)\n\n\n## Flags\n\nMost command line utilities take parameters using **flags**. 
Flags usually come in short form (`-h`) and long form (`--help`). Usually running `CMD -h` or `man CMD` will give you a list of the flags the program takes.\nShort flags can usually be combined: running `rm -r -f` is equivalent to running `rm -rf` or `rm -fr`.\nSome common flags are a de facto standard and you will see them in many applications:\n\n* `-a` commonly refers to all files (i.e. also including those that start with a period)\n* `-f` usually refers to forcing something, like `rm -f`\n* `-h` displays the help for most commands\n* `-v` usually enables verbose output\n* `-V` usually prints the version of the command\n\nAlso, a double dash `--` is used in built-in commands and many other commands to signify the end of command options, after which only positional parameters are accepted. So if you have a file called `-v` (which you can) and want to grep it, `grep pattern -- -v` will work whereas `grep pattern -v` won't. In fact, one way to create such a file is to do `touch -- -v`.\n\n## Exercises\n\n1. If you are completely new to the shell you may want to read a more comprehensive guide about it such as [BashGuide](http://mywiki.wooledge.org/BashGuide). If you want a more in-depth introduction [The Linux Command Line](http://linuxcommand.org/tlcl.php) is a good resource.\n\n1. **PATH, which, type**\n\n    We briefly discussed that the `PATH` environment variable is used to locate the programs that you run through the command line. Let's explore that a little further.\n    - Run `echo $PATH` (or `echo $PATH | tr -s ':' '\\n'` for pretty printing) and examine its contents. What locations are listed?\n    - The command `which` locates a program in the user PATH. Try running `which` for common commands like `echo`, `ls` or `mv`. Note that `which` is a bit limited since it does not understand shell aliases. Try running `type` and `command -v` for those same commands. 
How is the output different?\n    - Run `PATH=` and try running the previous commands again. Some work and some don't; can you figure out why?\n\n1. **Special Variables**\n    - What does the variable `~` expand to? What about `.`? And `..`?\n    - What does the variable `$?` do?\n    - What does the variable `$_` do?\n    - What does the variable `!!` expand to? What about `!!*`? And `!l`?\n    - Look for documentation for these options and familiarize yourself with them\n\n1. **xargs**\n\n    Sometimes piping doesn't quite work because the command being piped into does not expect the newline-separated format. For example, the `file` command tells you properties of a file.\n\n    Try running `ls | file` and `ls | xargs file`. What is `xargs` doing?\n\n\n1. **Shebang**\n\n    When you write a script you can specify to your shell what interpreter should be used to interpret the script by using a [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) line. Write a script called `hello` with the following contents and make it executable with `chmod +x hello`. Then execute it with `./hello`. Then remove the first line and execute it again. How is the shell using that first line?\n\n\n    ```bash\n      #! /usr/bin/python\n\n      print(\"Hello World!\")\n    ```\n\n    You will often see programs that have a shebang that looks like `#! /usr/bin/env bash`. This is a more portable solution with its own set of [advantages and disadvantages](https://unix.stackexchange.com/questions/29608/why-is-it-better-to-use-usr-bin-env-name-instead-of-path-to-name-as-my). How is `env` different from `which`? What environment variable does `env` use to decide what program to run?\n\n\n1. **Pipes, process substitution, subshell**\n\n    Create a script called `slow_seq.sh` with the following contents and do `chmod +x slow_seq.sh` to make it executable.\n\n    ```bash\n      #! 
/usr/bin/env bash\n\n      for i in $(seq 1 10); do\n              echo $i;\n              sleep 1;\n      done\n    ```\n\n    There is a way in which pipes (and process substitution) differ from using subshell execution, i.e. `$()`. Run the following commands and observe the differences:\n\n    - `./slow_seq.sh | grep -P \"[3-6]\"`\n    - `grep -P \"[3-6]\" <(./slow_seq.sh)`\n    - `echo $(./slow_seq.sh) | grep -P \"[3-6]\"`\n\n\n1. **Misc**\n    - Try running `touch {a,b}{a,b}` then `ls`. What appeared?\n    - Sometimes you want to keep STDOUT on your terminal and still write it to a file. Try running `echo HELLO | tee hello.txt`\n    - Try running `cat hello.txt > hello.txt`. What do you expect to happen? What does happen?\n    - Run `echo HELLO > hello.txt` and then run `echo WORLD >> hello.txt`. What are the contents of `hello.txt`? How is `>` different from `>>`?\n    - Run `printf \"\\e[38;5;81mfoo\\e[0m\\n\"`. How was the output different? If you want to know more, search for ANSI color escape sequences.\n    - Run `touch a.txt` then run `^txt^log`. What did bash do for you? In the same vein, run `fc`. What does it do?\n\n{% comment %}\n\nTODO\n\n1. **parallel**\n- set -e, set -x\n- traps\n\n{% endcomment %}\n\n1. **Keyboard shortcuts**\n\n    As with any application you use frequently, it is worth familiarising yourself with its keyboard shortcuts. Type the following ones and try figuring out what they do and in what scenarios it might be convenient to know about them. For some of them it might be easier to search online for what they do. (Remember that `^X` means pressing `Ctrl+X`.)\n\n    - `^A`, `^E`\n    - `^R`\n    - `^L`\n    - `^C`, `^\\` and `^D`\n    - `^U` and `^Y`\n"
  },
  {
    "path": "_2019/version-control.md",
    "content": "---\nlayout: lecture\ntitle: \"Version Control\"\npresenter: Jon\nvideo:\n  aspect: 56.25\n  id: 3fig2Vz8QXs\n---\n\nWhenever you are working on something that changes over time, it's\nuseful to be able to _track_ those changes. This can be for a number of\nreasons: it gives you a record of what changed, how to undo it, who\nchanged it, and possibly even why. Version control systems (VCS) give\nyou that ability. They let you _commit_ changes to a set of files, along\nwith a message describing the change, as well as look at and undo\nchanges you've made in the past.\n\nMost VCS support sharing the commit history between multiple users. This\nallows for convenient collaboration: you can see the changes I've made,\nand I can see the changes you've made. And since the VCS tracks\n_changes_, it can often (though not always) figure out how to combine\nour changes as long as they touch relatively disjoint things.\n\nThere [_a\nlot_](https://en.wikipedia.org/wiki/Comparison_of_version-control_software)\nof VCSes out there that differ a lot in what they support, how they\nfunction, and how you interact with them. Here, we'll focus on\n[git](https://git-scm.com/), one of the more commonly used ones, but I\nrecommend you also take a look at\n[Mercurial](https://www.mercurial-scm.org/).\n\nWith that all said -- to the cliffnotes!\n\n## Is git dark magic?\n\nnot quite.. 
you need to understand the data model.\nwe're going to skip over some of the details, but roughly speaking,\nthe _core_ \"thing\" in git is a commit.\n\n - every commit has a unique name, \"revision hash\"\n   a long hash like `998622294a6c520db718867354bf98348ae3c7e2`\n   often shortened to a short (unique-ish) prefix: `9986222`\n - commit has author + commit message\n - also has the hash of any _ancestor commits_\n   usually just the hash of the previous commit\n - commit also represents a _diff_, a representation of how you get from\n   the commit's ancestors to the commit (e.g., remove this line in this\n   file, add these lines to this file, rename that file, etc.)\n   - in reality, git stores the full before and after state\n   - probably don't want to store big files that change!\n\ninitially, the _repository_ (roughly: the folder that git manages) has\nno content, and no commits. let's set that up:\n\n```console\n$ git init hackers\n$ cd hackers\n$ git status\n```\n\nthe output here actually gives us a good starting point. 
let's dig in\nand make sure we understand it all.\n\nfirst, \"On branch master\".\n\n - don't want to use hashes all the time.\n - branches are names that point to hashes.\n - master is traditionally the name for the \"latest\" commit.\n   every time a new commit is made, the master name will be made to\n   point to the new commit's hash.\n - special name `HEAD` refers to \"current\" name\n - you can also make your own names with `git branch` (or `git tag`)\n   we'll get back to that\n\nlet's skip over \"No commits yet\" because that's all there is to it.\n\nthen, \"nothing to commit\".\n\n - every commit contains a diff with all the changes you made.\n   but how is that diff constructed in the first place?\n - _could_ just always commit _all_ changes you've made since the last\n   commit\n   - sometimes you want to only commit some of them (e.g., not `TODO`s)\n   - sometimes you want to break up a change into multiple commits to\n     give a separate commit message for each one\n - git lets you _stage_ changes to construct a commit\n   - add changes to a file or files to the staged changes with `git add`\n     - add only some changes in a file with `git add -p`\n     - without a path argument, `git add -p` and `git add -u` operate on\n       all tracked files (plain `git add` with no path adds nothing)\n   - remove a file and stage its removal with `git rm`\n   - empty the set of staged changes with `git reset`\n     - note that this does *not* change any of your files!\n       it *only* means that no changes will be included in a commit\n     - to remove only some staged changes:\n       `git reset FILE` or `git reset -p`\n   - check staged changes with `git diff --staged`\n   - see remaining changes with `git diff`\n   - when you're happy with the stage, make a commit with `git commit`\n     - if you just want to commit *all* changes to tracked files: `git commit -a`\n     - `git help add` has a bunch more helpful info\n\nwhile you're playing with the above, try to run `git status` to see what\ngit thinks you're doing -- it's surprisingly helpful!\n\n## A 
commit you say...\n\nokay, we have a commit, now what?\n\n - we can look at recent changes: `git log` (or `git log --oneline`)\n - we can look at the full changes: `git log -p`\n - we can show a particular commit: `git show master`\n   - or with `-p` for full diff/patch\n - we can go back to the state at a commit using `git checkout NAME`\n   - if `NAME` is a commit hash, git says we're \"detached\". this just\n     means there's no `NAME` that refers to this commit, so if we make\n     commits, no-one will know about them.\n - we can revert a change with `git revert NAME`\n   - applies the diff in the commit at `NAME` in reverse.\n - we can compare an older version to this one using `git diff NAME..`\n   - `a..b` is a commit _range_. if either is left out, it means `HEAD`.\n - we can show all the commits between using `git log NAME..`\n   - `-p` works here too\n - we can change `master` to point to a particular commit (effectively\n   undoing everything since) with `git reset NAME`:\n   - huh, why? wasn't `reset` to change staged changes?\n     reset has a \"second\" form (see `git help reset`) which sets `HEAD`\n     to the commit pointed to by the given name.\n   - notice that this didn't change any files -- `git diff` now\n     effectively shows `git diff NAME..`.\n\n## What's in a name?\n\nclearly, names are important in git. and they're the key to\nunderstanding *a lot* of what goes on in git. so far, we've talked about\ncommit hashes, master, and `HEAD`. 
but there's more!\n\n - you can make your own branches (like master) with `git branch b`\n   - creates a new name, `b`, which points to the commit at `HEAD`\n   - you're still \"on\" master though, so if you make a new commit,\n     master will point to that new commit, `b` will not.\n   - switch to a branch with `git checkout b`\n     - any commits you make will now update the `b` name\n     - switch back to master with `git checkout master`\n       - all your changes in `b` are hidden away\n     - a very handy way to be able to easily test out changes\n - tags are other names that never change, and that have their own\n   message. often used to mark releases + changelogs.\n - `NAME^` means \"the commit before `NAME`\"\n   - can apply recursively: `NAME^^^`\n   - you _most likely_ mean `~` when you use `^`\n     - `~` is \"temporal\", whereas `^` goes by ancestors\n     - `~~` is the same as `^^`\n     - with `~` you can also write `X~3` for \"3 commits older than `X`\"\n     - you don't want `^3` (that selects the third parent of a merge commit)\n   - `git diff HEAD^`\n - `-` means \"the previous name\"\n - most commands operate on `HEAD` unless you give another argument\n\n## Clean up your mess\n\nyour commit history will _very_ often end up as:\n\n - `add feature x` -- maybe even with a commit message about `x`!\n - `forgot to add file`\n - `fix bug`\n - `typo`\n - `typo2`\n - `actually fix`\n - `actually actually fix`\n - `tests pass`\n - `fix example code`\n - `typo`\n - `x`\n - `x`\n - `x`\n - `x`\n\nthat's _fine_ as far as git is concerned, but is not very helpful to\nyour future self, or to other people who are curious about what has\nchanged. 
git lets you clean up these things:\n\n - `git commit --amend`: fold staged changes into previous commit\n   - note that this _changes_ the previous commit, giving it a new hash!\n - `git rebase -i HEAD~13` is _magical_.\n   for each commit from past 13, choose what to do:\n   - default is `pick`; do nothing\n   - `r`: change commit message\n   - `e`: change commit (add or remove files)\n   - `s`: combine commit with previous and edit commit message\n   - `f`: \"fixup\" -- combine commit with previous; discard commit msg\n   - at the end, `HEAD` is made to point to what is now the last commit\n   - often referred to as _squashing_ commits\n   - what it really does: rewind `HEAD` to rebase start point, then\n     re-apply the commits in order as directed.\n - `git reset --hard NAME`: reset the state of all files to that of\n   `NAME` (or `HEAD` if no name is given). handy for undoing changes.\n\n## Playing with others\n\na common use-case for version control is to allow multiple people to\nmake changes to a set of files without stepping on each other's toes.\nor rather, to make sure that _if_ they step on each other's toes, they\nwon't just silently overwrite each other's changes.\n\ngit is a _distributed_ VCS: everyone has a local copy of the entire\nrepository (well, of everything others have chosen to publish). some\nVCSes are _centralized_ (e.g., subversion): a server has all the\ncommits, clients only have the files they have \"checked out\". basically,\nthey only have the _current_ files, and need to ask the server if they\nwant anything else.\n\nevery copy of a git repository can be listed as a \"remote\". you can copy\nan existing git repository using `git clone ADDRESS` (instead of `git\ninit`). this creates a remote called _origin_ that points to `ADDRESS`.\nyou can fetch names and the commits they point to from a remote with\n`git fetch REMOTE`. 
all names at a remote are available to you as\n`REMOTE/NAME`, and you can use them just like local names.\n\nif you have write access to a remote, you can change names at the remote\nto point to commits you've made using `git push`. for example, let's\nmake the master name (branch) at the remote `origin` point to the commit\nthat our master branch currently points to:\n\n   - `git push origin master:master`\n   - for convenience, you can set `origin/master` as the default target\n     for when you `git push` from the current branch with `-u`\n   - consider: what does this do? `git push origin master:HEAD^`\n\noften you'll use GitHub, GitLab, BitBucket, or something else as your\nremote. there's nothing \"special\" about that as far as git is concerned.\nit's all just names and commits. if someone makes a change to master and\nupdates `github/master` to point to their commit (we'll get back to\nthat in a second), then when you `git fetch github`, you'll be able to\nsee their changes with `git log github/master`.\n\n## Working with others\n\nso far, branches seem pretty useless: you can create them, do work on\nthem, but then what? eventually, you'll just make master point to them\nanyway, right?\n\n - what if you had to fix something while working on a big feature?\n - what if someone else made a change to master in the meantime?\n\ninevitably, you will have to _merge_ changes in one branch with changes\nin another, whether those changes are made by you or someone else. git\nlets you do this with, unsurprisingly, `git merge NAME`. 
`merge` will:\n\n - look for the latest point where `HEAD` and `NAME` shared a commit\n   ancestor (i.e., where they diverged)\n - (try to) apply all the changes made on `NAME` since that point to\n   the current `HEAD`\n - produce a commit that contains all those changes, and lists both\n   `HEAD` and `NAME` as its ancestors\n - set `HEAD` to that commit's hash\n\nonce your big feature has been finished, you can merge its branch into\nmaster, and git will ensure that you don't lose any changes from either\nbranch!\n\nif you've used git in the past, you may recognize `merge` by a different\nname: `pull`. when you do `git pull REMOTE BRANCH`, that is:\n\n - `git fetch REMOTE`\n - `git merge REMOTE/BRANCH`\n - where, like `push`, `REMOTE` and `BRANCH` are often omitted and use\n   the \"tracking\" remote branch (remember `-u`?)\n\nthis usually works _great_, as long as the changes to the branches being\nmerged are disjoint. if they are not, you get a _merge conflict_. sounds\nscary...\n\n - a merge conflict is just git telling you that it doesn't know what\n   the final diff should look like\n - git pauses and asks you to finish staging the \"merge commit\"\n - open the conflicted file in your editor and look for lots of angle\n   brackets (`<<<<<<<`). the stuff above `=======` is the change made in\n   `HEAD` since the shared ancestor commit. the stuff below is the\n   change made in `NAME` since the shared commit.\n - `git mergetool` is pretty handy -- opens a diff editor\n - once you've _resolved_ the conflict by figuring out what the file\n   should now look like, stage those changes with `git add`.\n - when all the conflicts are resolved, finish with `git commit`\n   - you can give up with `git merge --abort`\n\nyou've just resolved your first git merge conflict! \\o/\nnow you can publish your finished changes with `git push`\n\n## When worlds collide\n\nwhen you `push`, git checks that no-one else's work is lost if you\nupdate the remote name you're pushing to. 
it does this by checking\nthat the current commit of the remote name is an ancestor of the commit\nyou are pushing. if it is, git can safely just update the name; this is\ncalled _fast-forwarding_. if it is not, git will refuse to update the\nremote name, and tell you there have been changes.\n\nif your push is rejected, what do you do?\n\n - merge remote changes with `git pull` (i.e., `fetch` + `merge`)\n - force the push with `--force`: this will lose other people's changes!\n   - there's also `--force-with-lease`, which will only force the change\n     if the remote name hasn't changed since the last time you fetched\n     from that remote. much safer!\n   - if you've rebased local commits that you've previously pushed\n     (\"history rewriting\"; probably don't do this), you'll have to force\n     push. think about why!\n - try to re-apply your changes \"on top of\" the changes made remotely\n   - this is a `rebase`!\n     - rewind all local commits since shared ancestor\n     - fast-forward `HEAD` to commit at remote name\n     - apply local commits in-order\n       - may have conflicts you have to manually resolve\n       - `git rebase --continue` or `--abort`\n     - lots more [here](https://git-scm.com/book/en/v2/Git-Branching-Rebasing)\n   - `git pull --rebase` will start this process for you\n   - whether you should merge or rebase is a hot topic! 
some good reads:\n     - [this](https://www.atlassian.com/git/tutorials/merging-vs-rebasing)\n     - [this](http://web.archive.org/web/20210106220723/https://derekgourlay.com/blog/git-when-to-merge-vs-when-to-rebase/)\n     - [this](https://stackoverflow.com/questions/804115/when-do-you-use-git-rebase-instead-of-git-merge)\n\n# Further reading\n\n[![XKCD on git](https://imgs.xkcd.com/comics/git.png)](https://xkcd.com/1597/)\n\n - [Learn git branching](https://learngitbranching.js.org/)\n - [How to explain git in simple words](https://smusamashah.github.io/blog/2017/10/14/explain-git-in-simple-words)\n - [Git from the bottom up](https://jwiegley.github.io/git-from-the-bottom-up/)\n - [Git for computer scientists](http://eagain.net/articles/git-for-computer-scientists/)\n - [Oh shit, git!](https://ohshitgit.com/)\n - [The Pro Git book](https://git-scm.com/book/en/v2)\n\n# Exercises\n\n1. On a repo try modifying an existing file. What happens when you do `git stash`? What do you see when running `git log --all --oneline`? Run `git stash pop` to undo what you did with `git stash`. In what scenario might this be useful?\n\n1. One common mistake when learning git is to commit large files that should not be managed by git or adding sensitive information. Try adding a file to a repository, making some commits and then deleting that file from history (you may want to look at [this](https://help.github.com/articles/removing-sensitive-data-from-a-repository/)). Also if you do want git to manage large  files for you, look into [Git-LFS](https://git-lfs.github.com/)\n\n1. Git is really convenient for undoing changes but one has to be familiar even with the most unlikely changes\n   1. If a file is mistakenly modified in some commit it can be reverted with `git revert`. However if a commit involves several changes `revert` might not be the best option. How can we use `git checkout` to recover a file version from a specific commit?\n   1. 
Create a branch, make a commit in said branch and then delete the branch. Can you still recover said commit? Try looking into `git reflog`. (Note: Recover dangling things quickly: git will periodically automatically clean up commits that nothing points to.)\n   1. If one is too trigger happy with `git reset --hard` instead of `git reset`, changes can be easily lost. However, if the changes were staged, we can recover them. (look into `git fsck --lost-found` and `.git/lost-found`)\n\n1. In any git repo look under the folder `.git/hooks`; you will find a bunch of scripts that end with `.sample`. If you rename them without the `.sample` they will run based on their name. For instance `pre-commit` will execute before doing a commit. Experiment with them.\n\n1. Like many command line tools `git` provides a configuration file (or dotfile) called `~/.gitconfig`. Create an alias using `~/.gitconfig` so that when you run `git graph` you get the output of `git log --oneline --decorate --all --graph` (this is a good command to quickly visualize the commit graph)\n\n1. Git also lets you define global ignore patterns in a file such as `~/.gitignore_global` (point git at it with `git config --global core.excludesFile ~/.gitignore_global`). This is useful to prevent common errors like adding RSA keys. Create a `~/.gitignore_global` file and add the pattern `*rsa`, then test that it works in a repo.\n\n1. Once you start to get more familiar with `git`, you will find yourself running into common tasks, such as editing your `.gitignore`. [git extras](https://github.com/tj/git-extras/blob/master/Commands.md) provides a bunch of little utilities that integrate with `git`. For example `git ignore PATTERN` will add the specified pattern to the `.gitignore` file in your repo and `git ignore-io LANGUAGE` will fetch the common ignore patterns for that language from [gitignore.io](https://www.gitignore.io). Install `git extras` and try using some tools like `git alias` or `git ignore`.\n\n1. Git GUI programs can be a great resource sometimes. 
Try running [gitk](https://git-scm.com/docs/gitk) in a git repo and explore the different parts of the interface. Then run `gitk --all`. What are the differences?\n\n1. Once you get used to command line applications, GUI tools can feel cumbersome/bloated. A nice compromise between the two is ncurses-based tools, which can be navigated from the command line and still provide an interactive interface. For git there is [tig](https://github.com/jonas/tig); try installing it and running it in a repo. You can find some usage examples [here](https://www.atlassian.com/blog/git/git-tig).\n\n\n{% comment %}\n\n - forced push + `--force-with-lease`\n - git merge/rebase --abort\n - git blame\n - exercise about why rebasing public commits is bad\n\n{% endcomment %}\n"
  },
  {
    "path": "_2019/virtual-machines.md",
    "content": "---\nlayout: lecture\ntitle: \"Virtual Machines and Containers\"\npresenter: Anish, Jon\nvideo:\n  aspect: 56.25\n  id: LJ9ki5zq6Ik\n---\n\n# Virtual Machines\n\nVirtual machines are simulated computers. You can configure a guest virtual\nmachine with some operating system and configuration and use it without\naffecting your host environment.\n\nFor this class, you can use VMs to experiment with operating systems, software,\nand configurations without risk: you won't affect your primary development\nenvironment.\n\nIn general, VMs have lots of uses. They are commonly used for running software\nthat only runs on a certain operating system (e.g. using a Windows VM on Linux\nto run Windows-specific software). They are often used for experimenting with\npotentially malicious software.\n\n## Useful features\n\n- **Isolation**: hypervisors do a pretty good job of isolating the guest from\nthe host, so you can use VMs to run buggy or untrusted software reasonably\nsafely.\n\n- **Snapshots**: you can take \"snapshots\" of your virtual machine, capturing\nthe entire machine state (disk, memory, etc.), make changes to your machine,\nand then restore to an earlier state. This is useful for testing out\npotentially destructive actions, among other things.\n\n## Disadvantages\n\nVirtual machines are generally slower than running on bare metal, so they may\nbe unsuitable for certain applications.\n\n## Setup\n\n- **Resources**: shared with host machine; be aware of this when allocating\nphysical resources.\n\n- **Networking**: many options, default NAT should work fine for most use\ncases.\n\n- **Guest addons**: many hypervisors can install software in the guest to\nenable nicer integration with host system. 
You should use this if you can.\n\n## Resources\n\n- Hypervisors\n    - [VirtualBox](https://www.virtualbox.org/) (open-source)\n    - [Virt-manager](https://virt-manager.org/) (open-source, manages KVM virtual machines and LXC containers)\n    - [VMware](https://www.vmware.com/) (commercial, available from IS&T [for\n    MIT students](https://ist.mit.edu/vmware-fusion))\n\nIf you are already familiar with popular hypervisors/VMs, you may want to learn how to manage them in a command-line-friendly way. One option is the [libvirt](https://wiki.libvirt.org/page/UbuntuKVMWalkthrough) toolkit, which allows you to manage multiple different virtualization providers/hypervisors.\n\n## Exercises\n\n1. Download and install a hypervisor.\n\n1. Create a new virtual machine and install a Linux distribution (e.g.\n[Debian](https://www.debian.org/)).\n\n1. Experiment with snapshots. Try things that you've always wanted to try, like\n   running `sudo rm -rf --no-preserve-root /`, and see if you can recover\n   easily.\n\n1. Read what a [fork-bomb](https://en.wikipedia.org/wiki/Fork_bomb) (`:(){ :|:& };:`) is and run it on the VM to see that the resource isolation (CPU, memory, etc.) works.\n\n1. Install guest addons and experiment with different windowing modes, file\n   sharing, and other features.\n\n# Containers\n\nVirtual Machines are relatively heavy-weight; what if you want to spin\nup machines in an automated fashion? Enter containers!\n\n - Amazon Firecracker\n - Docker\n - rkt\n - lxc\n\nContainers are _mostly_ just an assembly of various Linux security\nfeatures, like virtual file systems, virtual network interfaces, chroots,\nvirtual memory tricks, and the like, that together give the appearance\nof virtualization.\n\nNot quite as secure or isolated as a VM, but pretty close and getting\nbetter. 
Usually higher performance, and much faster to start, but not\nalways.\n\nThe performance boost comes from the fact that, unlike VMs, which run an entire copy of the operating system, containers share the Linux kernel with the host. Note, however, that if you are running Linux containers on Windows/macOS, a Linux VM will need to be active as a middle layer between the two.\n\n![Docker vs VM](/2019/files/containers-vs-vms.png)\n_Comparison between Docker containers and Virtual Machines. Credit: blog.docker.com_\n\nContainers are handy for when you want to run an automated task in a\nstandardized setup:\n\n - Build systems\n - Development environments\n - Pre-packaged servers\n - Running untrusted programs\n   - Grading student submissions\n   - (Some) cloud computing\n - Continuous integration\n   - Travis CI\n   - GitHub Actions\n\nMoreover, container software like Docker has also been extensively used as a solution for [dependency hell](https://en.wikipedia.org/wiki/Dependency_hell). If a machine needs to run many services with conflicting dependencies, they can be isolated from each other using containers.\n\nUsually, you write a file that defines how to construct your container.\nYou start with some minimal _base image_ (like Alpine Linux), and then\na list of commands to run to set up the environment you want (install\npackages, copy files, build stuff, write config files, etc.). Normally,\nthere's also a way to specify any external ports that should be\navailable, and an _entrypoint_ that dictates what command should be run\nwhen the container is started (like a grading script).\n\nIn a similar fashion to code repository websites (like [GitHub](https://github.com/)), there are container repository websites (like [Docker Hub](https://hub.docker.com/)) where many software services have prebuilt images that one can easily deploy.\n\n## Exercises\n\n1. Choose a container software (Docker, LXC, …) and install a simple Linux image. Try SSHing into it.\n\n1. 
Search and download a prebuilt container image for a popular web server (nginx, apache, …)\n"
  },
  {
    "path": "_2019/web.md",
    "content": "---\nlayout: lecture\ntitle: \"Web and Browsers\"\npresenter: Jose\nvideo:\n  aspect: 62.5\n  id: XpZO3S8odec\n---\n\nApart from the terminal, the web browser is a tool you will find yourself spending significant amounts of time into. Thus it is worth learning how to use it efficiently and\n\n## Shortcuts\n\nClicking around in your browser is often not the fastest option, getting familiar with common shortcuts can really pay off in the long run.\n\n- `Middle Button Click` in a link opens it in a new tab\n- `Ctrl+T` Opens a new tab\n- `Ctrl+Shift+T` Reopens a recently closed tab\n- `Ctrl+L` selects the contents of the search bar\n- `Ctrl+F` to search within a webpage. If you do this often, you may benefit from an extension that supports regular expressions in searches.\n\n\n## Search operators\n\nWeb search engines like Google or DuckDuckGo provide search operators to enable more elaborate web searches:\n\n- `\"bar foo\"` enforces an exact match of bar foo\n- `foo site:bar.com` searches for foo within bar.com\n- `foo -bar ` excludes the terms containing bar from the search\n- `foobar filetype:pdf` Searches for files of that extension\n- `(foo|bar)` searches for matches that have foo OR bar\n\nMore through lists are available for popular engines like [Google](https://ahrefs.com/blog/google-advanced-search-operators/) and [DuckDuckGo](https://duck.co/help/results/syntax)\n\n\n## Searchbar\n\nThe searchbar is a powerful tool too. Most browsers can infer search engines from websites and will store them. By editing the keyword argument\n\n- In Google Chrome they are in [chrome://settings/searchEngines](chrome://settings/searchEngines)\n- In Firefox they are in [about:preferences#search](about:preferences#search)\n\nFor example you can make so that `y SOME SEARCH TERMS` to directly search in youtube.\n\nMoreover, if you own a domain you can setup subdomain forwards using your registrar. 
For instance, I have mapped `https://ht.josejg.com` to this course website. That way I can just type `ht.` and the searchbar will autocomplete. Another good feature of this setup is that, unlike bookmarks, these forwards work in every browser.\n\n## Privacy extensions\n\nNowadays surfing the web can get quite annoying due to ads and invasive due to trackers. A good adblocker not only blocks most ad content but also blocks sketchy and malicious websites, since they are included in the common blacklists. Adblockers can also reduce page load times by reducing the number of requests performed. A couple of recommendations are:\n\n- **uBlock Origin** ([Chrome](https://chrome.google.com/webstore/detail/ublock-origin/cjpalhdlnbpafiamejdnhcphjbkeiagm), [Firefox](https://addons.mozilla.org/en-US/firefox/addon/ublock-origin/)): blocks ads and trackers based on predefined rules. You should also consider taking a look at the enabled blacklists in settings, since you can enable more based on your region or browsing habits. You can even install filters from [around the web](https://github.com/gorhill/uBlock/wiki/Filter-lists-from-around-the-web).\n\n- **[Privacy Badger](https://privacybadger.org/)**: detects and blocks trackers automatically. For example, when you go from website to website, ad companies track which sites you visit and build a profile of you.\n\n- **[HTTPS Everywhere](https://www.eff.org/https-everywhere)** is a wonderful extension that redirects to the HTTPS version of a website automatically, if available.\n\nYou can find more addons of this kind [here](https://www.privacytools.io/privacy-browser-addons/).\n\n## Style customization\n\nWeb browsers are just another piece of software running on _your machine_, and thus you usually have the last say about what they should display or how they should behave. One example of this is custom styles. 
Browsers determine how to render the style of a webpage using Cascading Style Sheets, often abbreviated as CSS.\n\nYou can inspect the source code of a website and temporarily change its contents and styles (this is also a reason why you should never trust webpage screenshots).\n\nIf you want to permanently tell your browser to override the style settings for a webpage, you will need to use an extension. Our recommendation is **[Stylus](https://github.com/openstyles/stylus)** ([Firefox](https://addons.mozilla.org/en-US/firefox/addon/styl-us/), [Chrome](https://chrome.google.com/webstore/detail/stylus/clngdbkpkpeebahjckkjfobafhncgmne?hl=en)).\n\n\nFor example, we can write the following style for the class website:\n\n\n```css\nbody {\n    background-color: #2d2d2d;\n    color: #eee;\n    font-family: Fira Code;\n    font-size: 16pt;\n}\n\na:link {\n    text-decoration: none;\n    color: #0a0;\n}\n```\n\nMoreover, Stylus can find styles written by other users and published on [userstyles.org](https://userstyles.org/). Most common websites have one or several dark theme stylesheets, for instance. 
FYI, you should not use Stylish since it was shown to leak user data; more [here](https://arstechnica.com/information-technology/2018/07/stylish-extension-with-2m-downloads-banished-for-tracking-every-site-visit/).\n\n\n## Functionality Customization\n\nIn the same way that you can modify the style, you can also modify the behaviour of a website by writing custom JavaScript and then sourcing it using a web browser extension such as [Tampermonkey](https://tampermonkey.net/).\n\nFor example, the following script enables vim-like navigation using the J and K keys.\n\n```js\n// ==UserScript==\n// @name         VIM HT\n// @namespace    http://tampermonkey.net/\n// @version      0.1\n// @description  Vim JK for our website\n// @author       You\n// @match        https://hacker-tools.github.io/*\n// @grant        none\n// ==/UserScript==\n\n\n(function() {\n    'use strict';\n\n    // Scroll the page when J or K is released\n    window.onkeyup = function(e) {\n        var key = e.keyCode ? e.keyCode : e.which;\n\n        if (key === 74) { // J is key 74: scroll down\n            window.scrollBy(0, 500);\n        } else if (key === 75) { // K is key 75: scroll up\n            window.scrollBy(0, -500);\n        }\n    };\n})();\n```\n\nThere are also script repositories such as [OpenUserJS](https://openuserjs.org/) and [Greasy Fork](https://greasyfork.org/en). However, be warned: installing user scripts from others can be very dangerous, since they can do pretty much anything, such as steal your credit card numbers. Never install a script unless you read the whole thing yourself, understand what it does, and are absolutely sure that you know it isn't doing anything suspicious. 
Never install a script that contains minified or obfuscated code that you can't read!\n\n## Web APIs\n\nIt has become more and more common for web services to offer an application programming interface, aka a web API, so you can interact with the service by making web requests.\nA more in-depth introduction to the topic can be found [here](https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Client-side_web_APIs/Introduction). There are [many public APIs](https://github.com/toddmotto/public-apis). Web APIs can be useful for many reasons:\n\n- **Retrieval**. Web APIs can quite easily provide you with information such as maps, weather, or your public IP address. For instance, `curl ipinfo.io` will return a JSON object with some details about your public IP, region, location, etc. With proper parsing, these APIs can even be integrated with command-line tools. The following bash function talks to Google's autocompletion API and returns the first ten matches.\n\n```bash\nfunction c() {\n    url='https://www.google.com/complete/search?client=hp&hl=en&xhr=t'\n    # NB: user-agent must be specified to get back UTF-8 data!\n    curl -H 'user-agent: Mozilla/5.0' -sSG --data-urlencode \"q=$*\" \"$url\" |\n        jq -r \".[1][][0]\" |\n        sed 's,</\\?b>,,g'\n}\n```\n\n- **Interaction**. Web API endpoints can also be used to trigger actions. These usually require some sort of authentication token that you can obtain through the service. For example, running\n`curl -X POST -H 'Content-type: application/json' --data '{\"text\":\"Hello, World!\"}' \"https://hooks.slack.com/services/$SLACK_TOKEN\"` will send a `Hello, World!` message in a channel.\n\n- **Piping**. Since some services with web APIs are rather popular, common web API \"gluing\" has already been implemented and is offered as a hosted service. 
This is the case for services like [If This Then That](https://ifttt.com/) and [Zapier](https://zapier.com/).\n\n\n## Web Automation\n\nSometimes web APIs are not enough. If you only need to read a page, you can use an HTML parser like `pup` or a library; for example, Python has BeautifulSoup. However, if interactivity or JavaScript execution is required, those solutions fall short. In that case, a WebDriver tool such as Selenium can drive a real browser.\n\n\nFor example, the following script will save the specified URL using the Wayback Machine by simulating the interaction of typing the address into the page.\n\n```python\nfrom selenium.webdriver import Firefox\nfrom selenium.webdriver.common.by import By\nfrom selenium.webdriver.common.keys import Keys\n\n\ndef snapshot_wayback(driver, url):\n    # Open the Wayback Machine and submit the URL through its save form\n    driver.get(\"https://web.archive.org/\")\n    elem = driver.find_element(By.CLASS_NAME, 'web-save-url-input')\n    elem.clear()\n    elem.send_keys(url)\n    elem.send_keys(Keys.RETURN)\n    driver.close()\n\n\ndriver = Firefox()\nurl = 'https://hacker-tools.github.io'\nsnapshot_wayback(driver, url)\n```\n\n\n## Exercises\n\n1. Edit the keyword of a search engine that you use often in your web browser.\n1. Install the mentioned extensions. Look into how uBlock Origin/Privacy Badger can be disabled for a website. What differences do you see? Try doing it on a website with plenty of ads, like YouTube.\n1. Install Stylus and write a custom style for the class website using the CSS provided. Here are some common programming characters: `=   ==   ===   >=   =>   ++   /=   ~=`. What happens to them when changing the font to Fira Code? If you want to know more, search for programming font ligatures.\n1. Find a web API to get the weather in your city/area.\n1. Use a WebDriver software like [Selenium](https://docs.seleniumhq.org/) to automate some repetitive manual task that you perform often with your browser.\n\n\n"
  },
  {
    "path": "_2020/command-line.md",
    "content": "---\nlayout: lecture\ntitle: \"命令行环境\"\ndate: 2020-01-21\nready: true\nsync: true\nsyncdate: 2025-08-16\nvideo:\n    aspect: 56.25\n    id: e8BO_dYxk5c\nsolution:\n    ready: true\n    url: command-line-solution\n---\n\n当您使用 shell 进行工作时，可以使用一些方法改善您的工作流，本节课我们就来讨论这些方法。\n\n我们已经使用 shell 一段时间了，但是到目前为止我们的关注点主要集中在使用不同的命令上面。现在，我们将会学习如何同时执行多个不同的进程并追踪它们的状态、如何停止或暂停某个进程以及如何使进程在后台运行。\n\n我们还将学习一些能够改善您的 shell 及其他工具的工作流的方法，这主要是通过定义别名或基于配置文件对其进行配置来实现的。这些方法都可以帮您节省大量的时间。例如，仅需要执行一些简单的命令，我们就可以在所有的主机上使用相同的配置。我们还会学习如何使用 SSH 操作远端机器。\n\n# 任务控制\n\n某些情况下我们需要中断正在执行的任务，比如当一个命令需要执行很长时间才能完成时（假设我们在使用 `find` 搜索一个非常大的目录结构）。大多数情况下，我们可以使用 `Ctrl-C` 来停止命令的执行。但是它的工作原理是什么呢？为什么有的时候会无法结束进程？\n\n## 结束进程\n\n您的 shell 会使用 UNIX 提供的信号机制执行进程间通信。当一个进程接收到信号时，它会停止执行、处理该信号并基于信号传递的信息来改变其执行。就这一点而言，信号是一种 _软件中断_。\n\n在上面的例子中，当我们输入 `Ctrl-C` 时，shell 会发送一个 `SIGINT` 信号到进程。\n\n下面这个 Python 程序向您展示了捕获信号 `SIGINT` 并忽略它的基本操作，它并不会让程序停止。为了停止这个程序，我们需要使用 `SIGQUIT` 信号，通过输入 `Ctrl-\\` 可以发送该信号。\n\n```python\n#!/usr/bin/env python\nimport signal, time\n\ndef handler(signum, time):\n    print(\"\\nI got a SIGINT, but I am not stopping\")\n\nsignal.signal(signal.SIGINT, handler)\ni = 0\nwhile True:\n    time.sleep(.1)\n    print(\"\\r{}\".format(i), end=\"\")\n    i += 1\n```\n\n如果我们向这个程序发送两次 `SIGINT` ，然后再发送一次 `SIGQUIT`，程序会有什么反应？注意 `^` 是我们在终端输入 `Ctrl` 时的表示形式：\n\n```\n$ python sigint.py\n24^C\nI got a SIGINT, but I am not stopping\n26^C\nI got a SIGINT, but I am not stopping\n30^\\[1]    39913 quit       python sigint.pyƒ\n```\n\n尽管 `SIGINT` 和 `SIGQUIT` 都常常用来发出和终止程序相关的请求。`SIGTERM` 则是一个更加通用的、也更加优雅地退出信号。为了发出这个信号我们需要使用 [`kill`](https://www.man7.org/linux/man-pages/man1/kill.1.html) 命令, 它的语法是： `kill -TERM <PID>`。\n\n## 暂停和后台执行进程\n\n信号可以让进程做其他的事情，而不仅仅是终止它们。例如，`SIGSTOP` 会让进程暂停。在终端中，键入 `Ctrl-Z` 会让 shell 发送 `SIGTSTP` 信号，`SIGTSTP` 是 Terminal Stop 的缩写（即 `terminal` 版本的 SIGSTOP）。\n\n我们可以使用 [`fg`](https://www.man7.org/linux/man-pages/man1/fg.1p.html) 或 [`bg`](http://man7.org/linux/man-pages/man1/bg.1p.html) 
命令恢复暂停的工作。它们分别表示在前台继续或在后台继续。\n\n[`jobs`](http://man7.org/linux/man-pages/man1/jobs.1p.html) 命令会列出当前终端会话中尚未完成的全部任务。您可以使用 pid 引用这些任务（也可以用 [`pgrep`](https://www.man7.org/linux/man-pages/man1/pgrep.1.html) 找出 pid）。更加符合直觉的操作是您可以使用百分号 + 任务编号（`jobs` 会打印任务编号）来选取该任务。如果要选择最近的一个任务，可以使用 `$!` 这一特殊参数。\n\n还有一件事情需要掌握，那就是命令中的 `&` 后缀可以让命令在直接在后台运行，这使得您可以直接在 shell 中继续做其他操作，不过它此时还是会使用 shell 的标准输出，这一点有时会比较恼人（这种情况可以使用 shell 重定向处理）。\n\n让已经在运行的进程转到后台运行，您可以键入 `Ctrl-Z` ，然后紧接着再输入 `bg`。注意，后台的进程仍然是您的终端进程的子进程，一旦您关闭终端（会发送另外一个信号 `SIGHUP`），这些后台的进程也会终止。为了防止这种情况发生，您可以使用 [`nohup`](https://www.man7.org/linux/man-pages/man1/nohup.1.html)（一个用来忽略 `SIGHUP` 的封装）来运行程序。针对已经运行的程序，可以使用 `disown` 。除此之外，您可以使用终端多路复用器来实现，下一章节我们会进行详细地探讨。\n\n下面这个简单的会话中展示来了些概念的应用。\n\n```\n$ sleep 1000\n^Z\n[1]  + 18653 suspended  sleep 1000\n\n$ nohup sleep 2000 &\n[2] 18745\nappending output to nohup.out\n\n$ jobs\n[1]  + suspended  sleep 1000\n[2]  - running    nohup sleep 2000\n\n$ bg %1\n[1]  - 18653 continued  sleep 1000\n\n$ jobs\n[1]  - running    sleep 1000\n[2]  + running    nohup sleep 2000\n\n$ kill -STOP %1\n[1]  + 18653 suspended (signal)  sleep 1000\n\n$ jobs\n[1]  + suspended (signal)  sleep 1000\n[2]  - running    nohup sleep 2000\n\n$ kill -SIGHUP %1\n[1]  + 18653 hangup     sleep 1000\n\n$ jobs\n[2]  + running    nohup sleep 2000\n\n$ kill -SIGHUP %2\n\n$ jobs\n[2]  + running    nohup sleep 2000\n\n$ kill %2\n[2]  + 18745 terminated  nohup sleep 2000\n\n$ jobs\n\n```\n\n`SIGKILL` 是一个特殊的信号，它不能被进程捕获并且它会马上结束该进程。不过这样做会有一些副作用，例如留下孤儿进程。\n\n您可以在 [这里](<https://en.wikipedia.org/wiki/Signal_(IPC)>) 或输入 [`man signal`](https://www.man7.org/linux/man-pages/man7/signal.7.html) 或使用 `kill -l` 来获取更多关于信号的信息。\n\n# 终端多路复用\n\n当您在使用命令行时，您通常会希望同时执行多个任务。举例来说，您可以想要同时运行您的编辑器，并在终端的另外一侧执行程序。尽管再打开一个新的终端窗口也能达到目的，使用终端多路复用器则是一种更好的办法。\n\n像 [`tmux`](https://www.man7.org/linux/man-pages/man1/tmux.1.html) 这类的终端多路复用器可以允许我们基于面板和标签分割出多个终端窗口，这样您便可以同时与多个 shell 会话进行交互。\n\n不仅如此，终端多路复用使我们可以分离当前终端会话并在将来重新连接。\n\n这让您操作远端设备时的工作流大大改善，避免了 `nohup` 
和其他类似技巧的使用。\n\n现在最流行的终端多路器是 [`tmux`](https://www.man7.org/linux/man-pages/man1/tmux.1.html)。`tmux` 是一个高度可定制的工具，您可以使用相关快捷键创建多个标签页并在它们间导航。\n\n`tmux` 的快捷键需要我们掌握，它们都是类似 `<C-b> x` 这样的组合，即需要先按下 `Ctrl+b`，松开后再按下 `x`。`tmux` 中对象的继承结构如下：\n\n-   **会话** - 每个会话都是一个独立的工作区，其中包含一个或多个窗口\n\n    -   `tmux` 开始一个新的会话\n    -   `tmux new -s NAME` 以指定名称开始一个新的会话\n    -   `tmux ls` 列出当前所有会话\n    -   在 `tmux` 中输入 `<C-b> d` ，将当前会话分离\n    -   `tmux a` 重新连接最后一个会话。您也可以通过 `-t` 来指定具体的会话\n\n-   **窗口** - 相当于编辑器或是浏览器中的标签页，从视觉上将一个会话分割为多个部分\n\n    -   `<C-b> c` 创建一个新的窗口，使用 `<C-d>` 关闭\n    -   `<C-b> N` 跳转到第 _N_ 个窗口，注意每个窗口都是有编号的\n    -   `<C-b> p` 切换到前一个窗口\n    -   `<C-b> n` 切换到下一个窗口\n    -   `<C-b> ,` 重命名当前窗口\n    -   `<C-b> w` 列出当前所有窗口\n\n-   **面板** - 像 vim 中的分屏一样，面板使我们可以在一个屏幕里显示多个 shell\n    -   `<C-b> \"` 水平分割\n    -   `<C-b> %` 垂直分割\n    -   `<C-b> <方向>` 切换到指定方向的面板，<方向> 指的是键盘上的方向键\n    -   `<C-b> z` 切换当前面板的缩放\n    -   `<C-b> [` 开始往回卷动屏幕。您可以按下空格键来开始选择，回车键复制选中的部分\n    -   `<C-b> <空格>` 在不同的面板排布间切换\n\n扩展阅读：\n[这里](https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/) 是一份 `tmux` 快速入门教程， [而这一篇](http://linuxcommand.org/lc3_adv_termmux.php) 文章则更加详细，它包含了 `screen` 命令。您也许想要掌握 [`screen`](https://www.man7.org/linux/man-pages/man1/screen.1.html) 命令，因为在大多数 UNIX 系统中都默认安装有该程序。\n\n# 别名\n\n输入一长串包含许多选项的命令会非常麻烦。因此，大多数 shell 都支持设置别名。shell 的别名相当于一个长命令的缩写，shell 会自动将其替换成原本的命令。例如，bash 中的别名语法如下：\n\n```bash\nalias alias_name=\"command_to_alias arg1 arg2\"\n```\n\n注意， `=` 两边是没有空格的，因为 [`alias`](https://www.man7.org/linux/man-pages/man1/alias.1p.html) 是一个 shell 命令，它只接受一个参数。\n\n别名有许多很方便的特性:\n\n```bash\n# 创建常用命令的缩写\nalias ll=\"ls -lh\"\n\n# 能够少输入很多\nalias gs=\"git status\"\nalias gc=\"git commit\"\nalias v=\"vim\"\n\n# 手误打错命令也没关系\nalias sl=ls\n\n# 重新定义一些命令行的默认行为\nalias mv=\"mv -i\"           # -i prompts before overwrite\nalias mkdir=\"mkdir -p\"     # -p make parent dirs as needed\nalias df=\"df -h\"           # -h prints human readable format\n\n# 别名可以组合使用\nalias la=\"ls -A\"\nalias lla=\"la -l\"\n\n# 在忽略某个别名\n\\ls\n# 
或者禁用别名\nunalias la\n\n# 获取别名的定义\nalias ll\n# 会打印 ll='ls -lh'\n```\n\n值得注意的是，在默认情况下 shell 并不会保存别名。为了让别名持续生效，您需要将配置放进 shell 的启动文件里，像是 `.bashrc` 或 `.zshrc`，下一节我们就会讲到。\n\n# 配置文件（Dotfiles）\n\n很多程序的配置都是通过纯文本格式的被称作 _点文件_ 的配置文件来完成的（之所以称为点文件，是因为它们的文件名以 `.` 开头，例如 `~/.vimrc`。也正因为此，它们默认是隐藏文件，`ls` 并不会显示它们）。\n\nshell 的配置也是通过这类文件完成的。在启动时，您的 shell 程序会读取很多文件以加载其配置项。根据 shell 本身的不同，您从登录开始还是以交互的方式完成这一过程可能会有很大的不同。关于这一话题，[这里](https://blog.flowblok.id.au/2013-02/shell-startup-scripts.html) 有非常好的资源。\n\n对于 `bash` 来说，在大多数系统下，您可以通过编辑 `.bashrc` 或 `.bash_profile` 来进行配置。在文件中您可以添加需要在启动时执行的命令，例如上文我们讲到过的别名，或者是您的环境变量。\n\n实际上，很多程序都要求您在 shell 的配置文件中包含一行类似 `export PATH=\"$PATH:/path/to/program/bin\"` 的命令，这样才能确保这些程序能够被 shell 找到。\n\n还有一些其他的工具也可以通过 _点文件_ 进行配置：\n\n-   `bash` - `~/.bashrc`, `~/.bash_profile`\n-   `git` - `~/.gitconfig`\n-   `vim` - `~/.vimrc` 和 `~/.vim` 目录\n-   `ssh` - `~/.ssh/config`\n-   `tmux` - `~/.tmux.conf`\n\n我们应该如何管理这些配置文件呢，它们应该在它们的文件夹下，并使用版本控制系统进行管理，然后通过脚本将其 **符号链接** 到需要的地方。这么做有如下好处：\n\n-   **安装简单**: 如果您登录了一台新的设备，在这台设备上应用您的配置只需要几分钟的时间；\n-   **可移植性**: 您的工具在任何地方都以相同的配置工作\n-   **同步**: 在一处更新配置文件，可以同步到其他所有地方\n-   **变更追踪**: 您可能要在整个程序员生涯中持续维护这些配置文件，而对于长期项目而言，版本历史是非常重要的\n\n配置文件中需要放些什么？您可以通过在线文档和 [帮助手册](https://en.wikipedia.org/wiki/Man_page) 了解所使用工具的设置项。另一个方法是在网上搜索有关特定程序的文章，作者们在文章中会分享他们的配置。还有一种方法就是直接浏览其他人的配置文件：您可以在这里找到无数的 [dotfiles 仓库](https://github.com/search?o=desc&q=dotfiles&s=stars&type=Repositories) —— 其中最受欢迎的那些可以在 [这里](https://github.com/mathiasbynens/dotfiles) 找到（我们建议您不要直接复制别人的配置）。[这里](https://dotfiles.github.io/) 也有一些非常有用的资源。\n\n本课程的老师们也在 GitHub 上开源了他们的配置文件：\n[Anish](https://github.com/anishathalye/dotfiles),\n[Jon](https://github.com/jonhoo/configs),\n[Jose](https://github.com/jjgo/dotfiles).\n\n## 可移植性\n\n配置文件的一个常见的痛点是它可能并不能在多种设备上生效。例如，如果您在不同设备上使用的操作系统或者 shell 是不同的，则配置文件是无法生效的。或者，有时您仅希望特定的配置只在某些设备上生效。\n\n有一些技巧可以轻松达成这些目的。如果配置文件支持 if 语句或类似的东西，则您可以借助它针对不同的设备编写不同的配置。例如，您的 shell 可以包含：\n\n```bash\nif [[ \"$(uname)\" == \"Linux\" ]]; then {do_something}; fi\n\n# 使用和 shell 相关的配置时先检查当前 
shell 类型\nif [[ \"$SHELL\" == \"zsh\" ]]; then {do_something}; fi\n\n# 您也可以针对特定的设备进行配置\nif [[ \"$(hostname)\" == \"myServer\" ]]; then {do_something}; fi\n```\n\n如果配置文件支持 include 功能，您也可以多加利用。例如：`~/.gitconfig` 可以这样编写：\n\n```\n[include]\n    path = ~/.gitconfig_local\n```\n\n然后我们可以在日常使用的设备上创建配置文件 `~/.gitconfig_local` 来包含与该设备相关的特定配置。您甚至应该创建一个单独的代码仓库来管理这些与设备相关的配置。\n\n如果您希望在不同的程序之间共享某些配置，该方法也适用。例如，如果您想要在 `bash` 和 `zsh` 中同时启用一些别名，您可以把它们写在 `.aliases` 里，然后在这两个 shell 里应用：\n\n```bash\n# Test if ~/.aliases exists and source it\nif [ -f ~/.aliases ]; then\n    source ~/.aliases\nfi\n```\n\n# 远端设备\n\n对于程序员来说，在他们的日常工作中使用远程服务器已经非常普遍了。如果您需要使用远程服务器来部署后端软件或您需要一些计算能力强大的服务器，您就会用到安全 shell（SSH）。和其他工具一样，SSH 也是可以高度定制的，也值得我们花时间学习它。\n\n通过如下命令，您可以使用 `ssh` 连接到其他服务器：\n\n```bash\nssh foo@bar.mit.edu\n```\n\n这里我们尝试以用户名 `foo` 登录服务器 `bar.mit.edu`。服务器可以通过 URL 指定（例如 `bar.mit.edu`），也可以使用 IP 指定（例如 `foobar@192.168.1.42`）。后面我们会介绍如何修改 ssh 配置文件使我们可以用类似 `ssh bar` 这样的命令来登录服务器。\n\n## 执行命令\n\n`ssh` 的一个经常被忽视的特性是它可以直接远程执行命令。\n`ssh foobar@server ls` 可以直接在用 foobar 的命令下执行 `ls` 命令。\n想要配合管道来使用也可以， `ssh foobar@server ls | grep PATTERN` 会在本地查询远端 `ls` 的输出而 `ls | ssh foobar@server grep PATTERN` 会在远端对本地 `ls` 输出的结果进行查询。\n\n## SSH 密钥\n\n基于密钥的验证机制使用了密码学中的公钥，我们只需要向服务器证明客户端持有对应的私钥，而不需要公开其私钥。这样您就可以避免每次登录都输入密码的麻烦了秘密就可以登录。不过，私钥(通常是 `~/.ssh/id_rsa` 或者 `~/.ssh/id_ed25519`) 等效于您的密码，所以一定要好好保存它。\n\n### 密钥生成\n\n使用 [`ssh-keygen`](http://man7.org/linux/man-pages/man1/ssh-keygen.1.html) 命令可以生成一对密钥：\n\n```bash\nssh-keygen -o -a 100 -t ed25519 -f ~/.ssh/id_ed25519\n```\n\n您可以为密钥设置密码，防止有人持有您的私钥并使用它访问您的服务器。您可以使用 [`ssh-agent`](https://www.man7.org/linux/man-pages/man1/ssh-agent.1.html) 或 [`gpg-agent`](https://linux.die.net/man/1/gpg-agent) ，这样就不需要每次都输入该密码了。\n\n如果您曾经配置过使用 SSH 密钥推送到 GitHub，那么可能您已经完成了 [这里](https://help.github.com/articles/connecting-to-github-with-ssh/) 介绍的这些步骤，并且已经有了一个可用的密钥对。要检查您是否持有密码并验证它，您可以运行 `ssh-keygen -y -f /path/to/key`.\n\n### 基于密钥的认证机制\n\n`ssh` 会查询 `.ssh/authorized_keys` 
来确认那些用户可以被允许登录。您可以通过下面的命令将一个公钥拷贝到这里：\n\n```bash\ncat .ssh/id_ed25519.pub | ssh foobar@remote 'cat >> ~/.ssh/authorized_keys'\n```\n\n如果支持 `ssh-copy-id` 的话，可以使用下面这种更简单的解决方案：\n\n```bash\nssh-copy-id -i .ssh/id_ed25519.pub foobar@remote\n```\n\n## 通过 SSH 复制文件\n\n使用 ssh 复制文件有很多方法：\n\n-   `ssh+tee`, 最简单的方法是执行 `ssh` 命令，然后通过这样的方法利用标准输入实现 `cat localfile | ssh remote_server tee serverfile`。回忆一下，[`tee`](https://www.man7.org/linux/man-pages/man1/tee.1.html) 命令会将标准输出写入到一个文件；\n-   [`scp`](https://www.man7.org/linux/man-pages/man1/scp.1.html) ：当需要拷贝大量的文件或目录时，使用 `scp` 命令则更加方便，因为它可以方便的遍历相关路径。语法如下：`scp path/to/local_file remote_host:path/to/remote_file`；\n-   [`rsync`](https://www.man7.org/linux/man-pages/man1/rsync.1.html) 对 `scp` 进行了改进，它可以检测本地和远端的文件以防止重复拷贝。它还可以提供一些诸如符号连接、权限管理等精心打磨的功能。甚至还可以基于 `--partial` 标记实现断点续传。`rsync` 的语法和 `scp` 类似；\n\n## 端口转发\n\n很多情况下我们都会遇到软件需要监听特定设备的端口。如果是在您的本机，可以使用 `localhost:PORT` 或 `127.0.0.1:PORT`。但是如果需要监听远程服务器的端口该如何操作呢？这种情况下远端的端口并不会直接通过网络暴露给您。\n\n此时就需要进行 _端口转发_。端口转发有两种，一种是本地端口转发和远程端口转发（参见下图，该图片引用自这篇 [StackOverflow 文章](https://unix.stackexchange.com/questions/115897/whats-ssh-port-forwarding-and-whats-the-difference-between-ssh-local-and-remot)）中的图片。\n\n**本地端口转发**\n![Local Port Forwarding](https://i.sstatic.net/a28N8.png)\n\n**远程端口转发**\n![Remote Port Forwarding](https://i.sstatic.net/4iK3b.png)\n\n常见的情景是使用本地端口转发，即远端设备上的服务监听一个端口，而您希望在本地设备上的一个端口建立连接并转发到远程端口上。例如，我们在远端服务器上运行 Jupyter notebook 并监听 `8888` 端口。 然后，建立从本地端口 `9999` 的转发，使用 `ssh -L 9999:localhost:8888 foobar@remote_server` 。这样只需要访问本地的 `localhost:9999` 即可。\n\n## SSH 配置\n\n我们已经介绍了很多参数。为它们创建一个别名是个好想法，我们可以这样做：\n\n```bash\nalias my_server=\"ssh -i ~/.id_ed25519 --port 2222 -L 9999:localhost:8888 foobar@remote_server\"\n```\n\n不过，更好的方法是使用 `~/.ssh/config`.\n\n```bash\nHost vm\n    User foobar\n    HostName 172.16.174.141\n    Port 2222\n    IdentityFile ~/.ssh/id_ed25519\n    LocalForward 9999 localhost:8888\n\n# 在配置文件中也可以使用通配符\nHost *.mit.edu\n    User foobaz\n```\n\n这么做的好处是，使用 `~/.ssh/config` 文件来创建别名，类似 
`scp`、`rsync` 和 `mosh` 的这些命令都可以读取这个配置并将设置转换为对应的命令行选项。\n\n注意，`~/.ssh/config` 文件也可以被当作配置文件，而且一般情况下也是可以被导入其他配置文件的。不过，如果您将其公开到互联网上，那么其他人都将会看到您的服务器地址、用户名、开放端口等等。这些信息可能会帮助到那些企图攻击您系统的黑客，所以请务必三思。\n\n服务器侧的配置通常放在 `/etc/ssh/sshd_config`。您可以在这里配置免密认证、修改 ssh 端口、开启 X11 转发等等。 您也可以为每个用户单独指定配置。\n\n## 杂项\n\n连接远程服务器的一个常见痛点是遇到由关机、休眠或网络环境变化导致的掉线。如果连接的延迟很高也很让人讨厌。[Mosh](https://mosh.org/)（即 mobile shell ）对 ssh 进行了改进，它允许连接漫游、间歇连接及智能本地回显。\n\n有时将一个远端文件夹挂载到本地会比较方便， [sshfs](https://github.com/libfuse/sshfs) 可以将远端服务器上的一个文件夹挂载到本地，然后您就可以使用本地的编辑器了。\n\n# Shell & 框架\n\n在 shell 工具和脚本那节课中我们已经介绍了 `bash` shell，因为它是目前最通用的 shell，大多数的系统都将其作为默认 shell。但是，它并不是唯一的选项。\n\n例如，`zsh` shell 是 `bash` 的超集并提供了一些方便的功能：\n\n-   智能替换, `**`\n-   行内替换/通配符扩展\n-   拼写纠错\n-   更好的 tab 补全和选择\n-   路径展开 (`cd /u/lo/b` 会被展开为 `/usr/local/bin`)\n\n**框架** 也可以改进您的 shell。比较流行的通用框架包括 [prezto](https://github.com/sorin-ionescu/prezto) 或 [oh-my-zsh](https://ohmyz.sh/)。还有一些更精简的框架，它们往往专注于某一个特定功能，例如 [zsh 语法高亮](https://github.com/zsh-users/zsh-syntax-highlighting) 或 [zsh 历史子串查询](https://github.com/zsh-users/zsh-history-substring-search)。像 [fish](https://fishshell.com/) 这样的 shell 已经默认包含了许多这类用户友好的功能，包括：\n\n-   向右对齐\n-   命令语法高亮\n-   历史子串查询\n-   基于手册页面的选项补全\n-   更智能的自动补全\n-   提示符主题\n\n需要注意的是，使用这些框架可能会降低您 shell 的性能，尤其是如果这些框架的代码没有优化或者代码过多。您随时可以测试其性能或禁用某些不常用的功能来实现速度与功能的平衡。\n\n# 终端模拟器\n\n和自定义 shell 一样，花点时间选择适合您的 **终端模拟器** 并进行设置是很有必要的。有许多终端模拟器可供您选择（这里有一些关于它们之间 [比较](https://anarc.at/blog/2018-04-12-terminal-emulators-1/) 的信息）\n\n您会花上很多时间在使用终端上，因此研究一下终端的设置是很有必要的，您可以从下面这些方面来配置您的终端：\n\n-   字体选择\n-   彩色主题\n-   快捷键\n-   标签页/面板支持\n-   回退配置\n-   性能（像 [Alacritty](https://github.com/jwilm/alacritty) 或者 [kitty](https://sw.kovidgoyal.net/kitty/) 这种比较新的终端，它们支持 GPU 加速）。\n\n# 课后练习\n\n[习题解答]({{site.url}}/{{site.solution_url}}/{{page.solution.url}})\n\n## 任务控制\n\n1. 
我们可以使用类似 `ps aux | grep` 这样的命令来获取任务的 pid ，然后您可以基于 pid 来结束这些进程。但我们其实有更好的方法来做这件事。在终端中执行 `sleep 10000` 这个任务。然后用 `Ctrl-Z` 将其切换到后台并使用 `bg` 来继续允许它。现在，使用 [`pgrep`](https://www.man7.org/linux/man-pages/man1/pgrep.1.html) 来查找 pid 并使用 [`pkill`](https://www.man7.org/linux/man-pages/man1/pgrep.1.html) 结束进程而不需要手动输入 pid。(提示：: 使用 `-af` 标记)。\n\n2. 如果您希望某个进程结束后再开始另外一个进程， 应该如何实现呢？在这个练习中，我们使用 `sleep 60 &` 作为先执行的程序。一种方法是使用 [`wait`](http://man7.org/linux/man-pages/man1/wait.1p.html) 命令。尝试启动这个休眠命令，然后待其结束后再执行 `ls` 命令。\n\n    但是，如果我们在不同的 bash 会话中进行操作，则上述方法就不起作用了。因为 `wait` 只能对子进程起作用。之前我们没有提过的一个特性是，`kill` 命令成功退出时其状态码为 0 ，其他状态则是非 0。`kill -0` 则不会发送信号，但是会在进程不存在时返回一个不为 0 的状态码。请编写一个 bash 函数 `pidwait` ，它接受一个 pid 作为输入参数，然后一直等待直到该进程结束。您需要使用 `sleep` 来避免浪费 CPU 性能。\n\n## 终端多路复用\n\n1. 请完成这个 `tmux` [教程](https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/) 参考 [这些步骤](https://www.hamvocke.com/blog/a-guide-to-customizing-your-tmux-conf/) 来学习如何自定义 `tmux`。\n\n## 别名\n\n1. 创建一个 `dc` 别名，它的功能是当我们错误的将 `cd` 输入为 `dc` 时也能正确执行。\n2. 执行 `history | awk '{$1=\"\";print substr($0,2)}' | sort | uniq -c | sort -n | tail -n 10` 来获取您最常用的十条命令，尝试为它们创建别名。注意：这个命令只在 Bash 中生效，如果您使用 ZSH，使用 `history 1` 替换 `history`。\n\n## 配置文件\n\n让我们帮助您进一步学习配置文件：\n\n1. 为您的配置文件新建一个文件夹，并设置好版本控制\n2. 在其中添加至少一个配置文件，比如说您的 shell，在其中包含一些自定义设置（可以从设置 `$PS1` 开始）。\n3. 建立一种在新设备进行快速安装配置的方法（无需手动操作）。最简单的方法是写一个 shell 脚本对每个文件使用 `ln -s`，也可以使用 [专用工具](https://dotfiles.github.io/utilities/)\n4. 在新的虚拟机上测试该安装脚本。\n5. 将您现有的所有配置文件移动到项目仓库里。\n6. 将项目发布到 GitHub。\n\n## 远端设备\n\n进行下面的练习需要您先安装一个 Linux 虚拟机（如果已经安装过则可以直接使用），如果您对虚拟机尚不熟悉，可以参考 [这篇教程](https://hibbard.eu/install-ubuntu-virtual-box/) 来进行安装。\n\n1. 前往 `~/.ssh/` 并查看是否已经存在 SSH 密钥对。如果不存在，请使用 `ssh-keygen -o -a 100 -t ed25519` 来创建一个。建议为密钥设置密码然后使用 `ssh-agent`，更多信息可以参考 [这里](https://www.ssh.com/ssh/agent)；\n2. 在 `.ssh/config` 加入下面内容：\n\n```bash\nHost vm\n    User username_goes_here\n    HostName ip_goes_here\n    IdentityFile ~/.ssh/id_ed25519\n    LocalForward 9999 localhost:8888\n```\n\n3. 
使用 `ssh-copy-id vm` 将您的 ssh 密钥拷贝到服务器。\n4. 使用 `python -m http.server 8888` 在您的虚拟机中启动一个 Web 服务器并通过本机的 `http://localhost:9999` 访问虚拟机上的 Web 服务器\n5. 使用 `sudo vim /etc/ssh/sshd_config` 编辑 SSH 服务器配置，通过修改 `PasswordAuthentication` 的值来禁用密码验证。通过修改 `PermitRootLogin` 的值来禁用 root 登录。然后使用 `sudo service sshd restart` 重启 `ssh` 服务器，然后重新尝试。\n6. (附加题) 在虚拟机中安装 [`mosh`](https://mosh.org/) 并启动连接。然后断开服务器/虚拟机的网络适配器。mosh 可以恢复连接吗？\n7. (附加题) 查看 `ssh` 的 `-N` 和 `-f` 选项的作用，找出在后台进行端口转发的命令是什么？\n"
  },
  {
    "path": "_2020/course-shell.md",
    "content": "---\nlayout: lecture\ntitle: \"课程概览与 shell\"\ndate: 2020-01-13\nready: true\nsync: true\nsyncdate: 2025-08-16\nvideo:\n  aspect: 56.25\n  id: Z56Jmr9Z34Q\nsolution:\n    ready: true\n    url: course-shell-solution\n---\n\n# 动机\n\n作为计算机科学家，我们都知道计算机最擅长帮助我们完成重复性的工作。\n但是我们却常常忘记这一点也适用于我们使用计算机的方式，而不仅仅是利用计算机程序去帮我们求解问题。\n在从事与计算机相关的工作时，我们有很多触手可及的工具可以帮助我们更高效的解决问题。\n但是我们中的大多数人实际上只利用了这些工具中的很少一部分，我们常常只是死记硬背一些如咒语般的命令，\n或是当我们卡住的时候，盲目地从网上复制粘贴一些命令。\n\n本课程意在帮你解决这一问题。\n\n我们希望教会您如何挖掘现有工具的潜力，并向您介绍一些新的工具。也许我们还可以促使您想要去探索（甚至是去开发）更多的工具。\n我们认为这是大多数计算机科学相关课程中缺少的重要一环。\n\n# 课程结构\n\n本课程包含 11 个时长在一小时左右的讲座，每一个讲座都会关注一个 \n[特定的主题](/missing-semester/2020/)。尽管这些讲座之间基本上是各自独立的，但随着课程的进行，我们会假定您已经掌握了之前的内容。\n每个讲座都有在线笔记供查阅，但是课上的很多内容并不会包含在笔记中。因此我们也会把课程录制下来发布到互联网上供大家观看学习。\n\n我们希望能在这 11 个一小时讲座中涵盖大部分必须的内容，因此课程的信息密度是相当大的。为了能帮助您以自己的节奏来掌握讲座内容，每次课程都包含一组练习来帮助您掌握本节课的重点。\n课后我们会安排答疑的时间来回答您的问题。如果您参加的是在线课程，可以发送邮件到\n [missing-semester@mit.edu](mailto:missing-semester@mit.edu) 来联系我们。\n\n由于时长的限制，我们不可能达到那些专门课程一样的细致程度，我们会适时地将您介绍一些优秀的资源，帮助您深入的理解相关的工具或主题。\n但是如果您还有一些特别关注的话题，也请联系我们。\n\n\n# 主题 1: The Shell\n\n##  shell 是什么？\n\n如今的计算机有着多种多样的交互接口让我们可以进行指令的输入，从炫酷的图像用户界面（GUI），语音输入甚至是 AR/VR 都已经无处不在。\n这些交互接口可以覆盖 80% 的使用场景，但是它们也从根本上限制了您的操作方式——你不能点击一个不存在的按钮或者是用语音输入一个还没有被录入的指令。\n为了充分利用计算机的能力，我们不得不回到最根本的方式，使用文字接口：Shell\n\n几乎所有您能够接触到的平台都支持某种形式的 shell，有些甚至还提供了多种 shell 供您选择。虽然它们之间有些细节上的差异，但是其核心功能都是一样的：它允许你执行程序，输入并获取某种半结构化的输出。\n\n本节课我们会使用 Bourne Again SHell, 简称 \"bash\" 。\n这是被最广泛使用的一种 shell，它的语法和其他的 shell 都是类似的。打开 shell _提示符_（您输入指令的地方），您首先需要打开 _终端_ 。您的设备通常都已经内置了终端，或者您也可以安装一个，非常简单。\n\n## 使用 shell\n\n当您打开终端时，您会看到一个提示符，它看起来一般是这个样子的：\n\n```console\nmissing:~$ \n```\n\n这是 shell 最主要的文本接口。它告诉你，你的主机名是 `missing` 并且您当前的工作目录（\"current working directory\"）或者说您当前所在的位置是 `~` (表示 \"home\")。 `$` 符号表示您现在的身份不是 root 用户（稍后会介绍）。在这个提示符中，您可以输入 _命令_ ，命令最终会被 shell 解析。最简单的命令是执行一个程序：\n\n```console\nmissing:~$ date\nFri 10 Jan 2020 11:49:31 AM EST\nmissing:~$ \n```\n\n这里，我们执行了 `date` 这个程序，不出意料地，它打印出了当前的日期和时间。然后，shell 等待我们输入其他命令。我们可以在执行命令的同时向程序传递 _参数_ 
：\n\n```console\nmissing:~$ echo hello\nhello\n```\n上例中，我们让 shell 执行 `echo` ，同时指定参数 `hello`。`echo` 程序将该参数打印出来。\nshell 基于空格分割命令并进行解析，然后执行第一个单词代表的程序，并将后续的单词作为程序可以访问的参数。如果您希望传递的参数中包含空格（例如一个名为 My Photos 的文件夹），您要么用使用单引号，双引号将其包裹起来，要么使用转义符号 `\\` 进行处理（`My\\ Photos`）。\n\n但是，shell 是如何知道去哪里寻找 `date` 或 `echo` 的呢？其实，类似于 Python 或 Ruby，shell 是一个编程环境，所以它具备变量、条件、循环和函数（下一课进行讲解）。当你在 shell 中执行命令时，您实际上是在执行一段 shell 可以解释执行的简短代码。如果你要求 shell 执行某个指令，但是该指令并不是 shell 所了解的编程关键字，那么它会去咨询 _环境变量_  `$PATH`，它会列出当 shell 接到某条指令时，进行程序搜索的路径：\n\n\n```console\nmissing:~$ echo $PATH\n/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\nmissing:~$ which echo\n/bin/echo\nmissing:~$ /bin/echo $PATH\n/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n```\n\n当我们执行 `echo` 命令时，shell 了解到需要执行 `echo` 这个程序，随后它便会在 `$PATH` 中搜索由 `:` 所分割的一系列目录，基于名字搜索该程序。当找到该程序时便执行（假定该文件是 _可执行程序_，后续课程将详细讲解）。确定某个程序名代表的是哪个具体的程序，可以使用\n`which` 程序。我们也可以绕过 `$PATH`，通过直接指定需要执行的程序的路径来执行该程序\n\n## 在 shell 中导航\n\nshell 中的路径是一组被分割的目录，在 Linux 和 macOS 上使用 `/` 分割，而在 Windows 上是 `\\`。路径 `/` 代表的是系统的根目录，所有的文件夹都包括在这个路径之下，在 Windows 上每个盘都有一个根目录（例如：\n`C:\\`）。 我们假设您在学习本课程时使用的是 Linux 文件系统。如果某个路径以 `/` 开头，那么它是一个 _绝对路径_，其他的都是 _相对路径_ 。相对路径是指相对于当前工作目录的路径，当前工作目录可以使用 `pwd` 命令来获取。此外，切换目录需要使用 `cd` 命令。在路径中，`.` 表示的是当前目录，而 `..` 表示上级目录：\n\n```console\nmissing:~$ pwd\n/home/missing\nmissing:~$ cd /home\nmissing:/home$ pwd\n/home\nmissing:/home$ cd ..\nmissing:/$ pwd\n/\nmissing:/$ cd ./home\nmissing:/home$ pwd\n/home\nmissing:/home$ cd missing\nmissing:~$ pwd\n/home/missing\nmissing:~$ ../../bin/echo hello\nhello\n```\n\n注意，shell 会实时显示当前的路径信息。您可以通过配置 shell 提示符来显示各种有用的信息，这一内容我们会在后面的课程中进行讨论。\n\n一般来说，当我们运行一个程序时，如果我们没有指定路径，则该程序会在当前目录下执行。例如，我们常常会搜索文件，并在需要时创建文件。\n\n为了查看指定目录下包含哪些文件，我们使用 `ls` 命令：\n\n```console\nmissing:~$ ls\nmissing:~$ cd ..\nmissing:/home$ ls\nmissing\nmissing:/home$ cd ..\nmissing:/$ ls\nbin\nboot\ndev\netc\nhome\n...\n```\n\n除非我们利用第一个参数指定目录，否则 `ls` 会打印当前目录下的文件。大多数的命令接受标记和选项（带有值的标记），它们以 `-` 开头，并可以改变程序的行为。通常，在执行程序时使用 `-h` 或 `--help` 
标记可以打印帮助信息，以便了解有哪些可用的标记或选项。例如，`ls --help` 的输出如下：\n\n```\n  -l                         use a long listing format\n```\n\n```console\nmissing:~$ ls -l /home\ndrwxr-xr-x 1 missing  users  4096 Jun 15  2019 missing\n```\n\n这个参数可以更加详细地列出目录下文件或文件夹的信息。首先，本行第一个字符 `d` 表示\n`missing` 是一个目录。然后接下来的九个字符，每三个字符构成一组。\n（`rwx`）. 它们分别代表了文件所有者（`missing`），用户组（`users`） 以及其他所有人具有的权限。其中 `-` 表示该用户不具备相应的权限。从上面的信息来看，只有文件所有者可以修改（`w`），`missing` 文件夹 （例如，添加或删除文件夹中的文件）。为了进入某个文件夹，用户需要具备该文件夹以及其父文件夹的“搜索”权限（以“可执行”：`x`）权限表示。为了列出它的包含的内容，用户必须对该文件夹具备读权限（`r`）。对于文件来说，权限的意义也是类似的。注意，`/bin` 目录下的程序在最后一组，即表示所有人的用户组中，均包含 `x` 权限，也就是说任何人都可以执行这些程序。\n\n\n在这个阶段，还有几个趁手的命令是您需要掌握的，例如 `mv`（用于重命名或移动文件）、 `cp`（拷贝文件）以及 `mkdir`（新建文件夹）。\n\n如果您想要知道关于程序参数、输入输出的信息，亦或是想要了解它们的工作方式，请试试 `man` 这个程序。它会接受一个程序名作为参数，然后将它的文档（用户手册）展现给您。注意，使用 `q` 可以退出该程序。\n\n```console\nmissing:~$ man ls\n```\n\n## 在程序间创建连接\n\n在 shell 中，程序有两个主要的“流”：它们的输入流和输出流。\n当程序尝试读取信息时，它们会从输入流中进行读取，当程序打印信息时，它们会将信息输出到输出流中。\n通常，一个程序的输入输出流都是您的终端。也就是，您的键盘作为输入，显示器作为输出。\n但是，我们也可以重定向这些流！\n\n最简单的重定向是 `< file` 和 `> file`。这两个命令可以将程序的输入输出流分别重定向到文件：\n\n```console\nmissing:~$ echo hello > hello.txt\nmissing:~$ cat hello.txt\nhello\nmissing:~$ cat < hello.txt\nhello\nmissing:~$ cat < hello.txt > hello2.txt\nmissing:~$ cat hello2.txt\nhello\n```\n\n您还可以使用 `>>` 来向一个文件追加内容。使用管道（ _pipes_ ），我们能够更好的利用文件重定向。\n`|` 操作符允许我们将一个程序的输出和另外一个程序的输入连接起来： \n\n```console\nmissing:~$ ls -l / | tail -n1\ndrwxr-xr-x 1 root  root  4096 Jun 20  2019 var\nmissing:~$ curl --head --silent google.com | grep --ignore-case content-length | cut --delimiter=' ' -f2\n219\n```\n\n我们会在数据清理一章中更加详细的探讨如何更好的利用管道。\n\n## 一个功能全面又强大的工具\n\n对于大多数的类 Unix 系统，有一类用户是非常特殊的，那就是：根用户（root user）。\n您应该已经注意到了，在上面的输出结果中，根用户几乎不受任何限制，他可以创建、读取、更新和删除系统中的任何文件。\n通常在我们并不会以根用户的身份直接登录系统，因为这样可能会因为某些错误的操作而破坏系统。\n取而代之的是我们会在需要的时候使用 `sudo` 命令。顾名思义，它的作用是让您可以以 su（super user 或 root 的简写）的身份执行一些操作。\n当您遇到拒绝访问（permission denied）的错误时，通常是因为此时您必须是根用户才能操作。然而，请再次确认您是真的要执行此操作。\n\n有一件事情是您必须作为根用户才能做的，那就是向 `sysfs` 文件写入内容。系统被挂载在 `/sys` 下，`sysfs` 
文件则暴露了一些内核（kernel）参数。\n因此，您不需要借助任何专用的工具，就可以轻松地在运行期间配置系统内核。**注意 Windows 和 macOS 没有这个文件**\n\n例如，您笔记本电脑的屏幕亮度写在 `brightness` 文件中，它位于\n\n```\n/sys/class/backlight\n```\n通过将数值写入该文件，我们可以改变屏幕的亮度。现在，蹦到您脑袋里的第一个想法可能是：\n\n\n```console\n$ sudo find -L /sys/class/backlight -maxdepth 2 -name '*brightness*'\n/sys/class/backlight/thinkpad_screen/brightness\n$ cd /sys/class/backlight/thinkpad_screen\n$ sudo echo 3 > brightness\nAn error occurred while redirecting file 'brightness'\nopen: Permission denied\n```\n出乎意料的是，我们还是得到了一个错误信息。毕竟，我们已经使用了\n`sudo` 命令！关于 shell，有件事我们必须要知道。`|`、`>` 和 `<` 是通过 shell 执行的，而不是被各个程序单独执行。\n`echo` 等程序并不知道 `|` 的存在，它们只知道从自己的输入输出流中进行读写。\n回到上面更改屏幕亮度命令执行的报错，为了能让 `sudo echo` 命令输出的亮度值写入 brightness 文件， _shell_ (权限为当前用户) 会先尝试打开 brightness 文件，但此时操作 shell 的不是根（root）用户，所以系统拒绝了这个打开操作，提示无权限。\n\n明白这一点后，我们可以这样操作：\n\n```console\n$ echo 3 | sudo tee brightness\n```\n此时打开 `/sys` 文件的是 `tee` 这个程序，并且该程序以 `root` 权限在运行，因此操作可以进行。\n这样您就可以在 `/sys` 中愉快地玩耍了，例如修改系统中各种 LED 的状态（路径可能会有所不同）：\n\n```console\n$ echo 1 | sudo tee /sys/class/leds/input6::scrolllock/brightness\n```\n\n# 接下来.....\n\n学到这里，您掌握的 shell 知识已经可以完成一些基础的任务了。您应该已经可以查找感兴趣的文件并使用大多数程序的基本功能了。\n在下一场讲座中，我们会探讨如何利用 shell 及其他工具执行并自动化更复杂的任务。\n\n# 课后练习\n[习题解答]({{site.url}}/{{site.solution_url}}/{{page.solution.url}})\n本课程中的每节课都包含一系列练习题。有些题目是有明确目的的，另外一些则是开放题，例如“尝试使用 X 和 Y”，我们强烈建议您一定要动手实践，勇于尝试这些内容。\n此外，我们没有为这些练习题提供答案。如果有任何困难，您可以发送邮件给我们并描述您已经做出的尝试，我们会设法帮您解答。\n\n\n1. 本课程需要使用类 Unix shell，例如 Bash 或 ZSH。如果您在 Linux 或者 MacOS 上面完成本课程的练习，则不需要做任何特殊的操作。如果您使用的是 Windows，则您不应该使用 cmd 或是 Powershell；您可以使用 [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/) 或者是 Linux 虚拟机。使用 `echo $SHELL` 命令可以查看您的 shell 是否满足要求。如果打印结果为 `/bin/bash` 或 `/usr/bin/zsh` 则是可以的。\n2. 在 `/tmp` 下新建一个名为 `missing` 的文件夹。\n3. 用 `man` 查看程序 `touch` 的使用手册。\n4. 用 `touch` 在 `missing` 文件夹中新建一个叫 `semester` 的文件。\n5. 
将以下内容一行一行地写入 `semester` 文件：\n    ```\n    #!/bin/sh\n    curl --head --silent https://missing.csail.mit.edu\n    ```\n    第一行可能有点棘手， `#` 在 Bash 中表示注释，而 `!` 即使被双引号（`\"`）包裹也具有特殊的含义。\n    单引号（`'`）则不一样，此处利用这一点解决输入问题。更多信息请参考  [Bash quoting 手册](https://www.gnu.org/software/bash/manual/html_node/Quoting.html)\n\n6. 尝试执行这个文件。例如，将该脚本的路径（`./semester`）输入到您的 shell 中并回车。如果程序无法执行，请使用 `ls` 命令来获取信息并理解其不能执行的原因。\n7. 查看 `chmod` 的手册(例如，使用 `man chmod` 命令)\n\n8. 使用 `chmod` 命令改变权限，使 `./semester` 能够成功执行，不要使用 `sh semester` 来执行该程序。您的 shell 是如何知晓这个文件需要使用 `sh` 来解析呢？更多信息请参考：[shebang](https://en.wikipedia.org/wiki/Shebang_(Unix))\n\n9. 使用 `|` 和 `>` ，将 `semester` 文件输出的最后更改日期信息，写入主目录下的 `last-modified.txt` 的文件中\n\n10. 写一段命令来从 `/sys` 中获取笔记本的电量信息，或者台式机 CPU 的温度。注意：macOS 并没有 sysfs，所以 Mac 用户可以跳过这一题。\n\n"
  },
  {
    "path": "_2020/data-wrangling.md",
    "content": "---\nlayout: lecture\ntitle: \"数据整理\"\ndate: 2020-01-16\nready: true\nsync: true\nsyncdate: 2025-08-16\nvideo:\n  aspect: 56.25\n  id: sz_dsktIjt4\nsolution:\n    ready: true\n    url: data-wrangling-solution\n---\n\n您是否曾经有过这样的需求，将某种格式存储的数据转换成另外一种格式? 肯定有过，对吧！\n这也正是我们这节课所要讲授的主要内容。具体来讲，我们需要不断地对数据进行处理，直到得到我们想要的最终结果。\n\n在之前的课程中，其实我们已经接触到了一些数据整理的基本技术。可以这么说，每当您使用管道运算符的时候，其实就是在进行某种形式的数据整理。\n\n例如这样一条命令 `journalctl | grep -i intel`，它会找到所有包含 intel（不区分大小写）的系统日志。您可能并不认为这是数据整理，但是它确实将某种形式的数据（全部系统日志）转换成了另外一种形式的数据（仅包含 intel 的日志）。大多数情况下，数据整理需要您能够明确哪些工具可以被用来达成特定数据整理的目的，并且明白如何组合使用这些工具。\n\n让我们从头讲起。既然是学习数据整理，那有两样东西自然是必不可少的：用来整理的数据以及相关的应用场景。日志处理通常是一个比较典型的使用场景，因为我们经常需要在日志中查找某些信息，这种情况下通读日志是不现实的。现在，让我们研究一下系统日志，看看哪些用户曾经尝试过登录我们的服务器：\n\n```bash\nssh myserver journalctl\n```\n\n内容太多了。现在让我们把涉及 sshd 的信息过滤出来：\n\n```bash\nssh myserver journalctl | grep sshd\n```\n\n注意，这里我们使用管道将一个远程服务器上的文件传递给本机的 `grep` 程序！\n`ssh` 太牛了，下一节课我们会讲授命令行环境，届时我们会详细讨论 `ssh` 的相关内容。此时我们打印出的内容，仍然比我们需要的要多得多，读起来也非常费劲。我们来改进一下：\n\n```bash\nssh myserver 'journalctl | grep sshd | grep \"Disconnected from\"' | less\n```\n\n多出来的引号是什么作用呢？这么说吧，我们的日志是一个非常大的文件，把这么大的文件流直接传输到我们本地的电脑上再进行过滤是对流量的一种浪费。因此我们采取另外一种方式，我们先在远端机器上过滤文本内容，然后再将结果传输到本机。 `less` 为我们创建来一个文件分页器，使我们可以通过翻页的方式浏览较长的文本。为了进一步节省流量，我们甚至可以将当前过滤出的日志保存到文件中，这样后续就不需要再次通过网络访问该文件了：\n\n\n```console\n$ ssh myserver 'journalctl | grep sshd | grep \"Disconnected from\"' > ssh.log\n$ less ssh.log\n```\n\n过滤结果中仍然包含不少没用的数据。我们有很多办法可以删除这些无用的数据，但是让我们先研究一下 `sed` 这个非常强大的工具。\n\n`sed` 是一个基于文本编辑器 `ed` 构建的 \"流编辑器\" 。在 `sed` 中，您基本上是利用一些简短的命令来修改文件，而不是直接操作文件的内容（尽管您也可以选择这样做）。相关的命令非常多，但是最常用的是 `s`，即 *替换* 命令，例如我们可以这样写：\n\n```bash\nssh myserver journalctl\n | grep sshd\n | grep \"Disconnected from\"\n | sed 's/.*Disconnected from //'\n```\n\n上面这段命令中，我们使用了一段简单的 *正则表达式*。正则表达式是一种非常强大的工具，可以让我们基于某种模式来对字符串进行匹配。`s` 命令的语法如下：`s/REGEX/SUBSTITUTION/`, 其中 `REGEX` 部分是我们需要使用的正则表达式，而 `SUBSTITUTION` 是用于替换匹配结果的文本。\n\n（您可能会从我们的 Vim [讲座笔记](/2020/editors/#advanced-vim) 中的\"搜索和替换\"部分认出这种语法！实际上，Vim 使用的搜索和替换语法与 `sed` 
的替换命令相似。学习一个工具通常有助于您更熟练地使用其他工具。）\n\n## 正则表达式\n\n正则表达式非常常见也非常有用，值得您花些时间去理解它。让我们从这一句正则表达式开始学习： `/.*Disconnected from /`。正则表达式通常以（尽管并不总是） `/` 开始和结束。大多数的 ASCII 字符都表示它们本来的含义，但是有一些字符确实具有表示匹配行为的“特殊”含义。不同字符所表示的含义，根据正则表达式的实现方式不同，也会有所变化，这一点确实令人沮丧。常见的模式有：\n\n - `.`  除换行符之外的 \"任意单个字符\"\n - `*` 匹配前面字符零次或多次\n - `+` 匹配前面字符一次或多次\n - `[abc]` 匹配 `a`, `b` 和 `c` 中的任意一个\n - `(RX1|RX2)` 任何能够匹配 `RX1` 或 `RX2` 的结果\n - `^` 行首\n - `$` 行尾\n\n`sed` 的正则表达式有些时候是比较奇怪的，它需要你在这些模式前添加 `\\` 才能使其具有特殊含义。或者，您也可以添加 `-E` 选项来支持这些匹配。\n\n回过头我们再看 `/.*Disconnected from /`，我们会发现这个正则表达式可以匹配任何以若干任意字符开头，并接着包含 \"Disconnected from\" 的字符串。这也正是我们所希望的。但是请注意，正则表达式并不容易写对。如果有人将 \"Disconnected from\" 作为自己的用户名会怎样呢？\n\n```\nJan 17 03:13:00 thesquareplanet.com sshd[2631]: Disconnected from invalid user Disconnected from 46.97.239.16 port 55920 [preauth]\n```\n正则表达式会如何匹配？`*` 和 `+` 在默认情况下是贪婪模式，也就是说，它们会尽可能多的匹配文本。因此对上述字符串的匹配结果如下：\n\n```\n46.97.239.16 port 55920 [preauth]\n```\n这可不是我们想要的结果。对于某些正则表达式的实现来说，您可以给 `*` 或 `+` 增加一个 `?` 后缀使其变成非贪婪模式，但是很可惜 `sed` 并不支持该后缀。不过，我们可以切换到\nperl 的命令行模式，该模式支持编写这样的正则表达式：\n\n```bash\nperl -pe 's/.*?Disconnected from //'\n```\n\n让我们回到 `sed` 命令并使用它完成后续的任务，毕竟对于这一类任务，`sed` 是最常见的工具。`sed` 还可以非常方便的做一些事情，例如打印匹配后的内容，一次调用中进行多次替换搜索等。但是这些内容我们并不会在此进行介绍。`sed` 本身是一个非常全能的工具，但是在具体功能上往往能找到更好的工具作为替代品。\n\n好的，我们还需要去掉用户名后面的后缀，应该如何操作呢？\n\n想要匹配用户名后面的文本，尤其是当这里的用户名可以包含空格时，这个问题变得非常棘手！这里我们需要做的是匹配 *一整行*：\n\n```bash\n | sed -E 's/.*Disconnected from (invalid |authenticating )?user .* [^ ]+ port [0-9]+( \\[preauth\\])?$//'\n```\n让我们借助正则表达式在线调试工具 [regex debugger](https://regex101.com/r/qqbZqh/2) 来理解这段表达式。OK，开始的部分和以前是一样的，随后，我们匹配两种类型的“user”（在日志中基于两种前缀区分）。再然后我们匹配属于用户名的所有字符。接着，再匹配任意一个单词（`[^ ]+` 会匹配任意非空且不包含空格的序列）。紧接着后面匹配单“port”和它后面的一串数字，以及可能存在的后缀 `[preauth]`，最后再匹配行尾。\n\n\n注意，这样做的话，即使用户名是“Disconnected from”，对匹配结果也不会有任何影响，您知道这是为什么吗？\n\n问题还没有完全解决，日志的内容全部被替换成了空字符串，整个日志的内容因此都被删除了。我们实际上希望能够将用户名 *保留* 下来。对此，我们可以使用“捕获组（capture groups）”来完成。被圆括号内的正则表达式匹配到的文本，都会被存入一系列以编号区分的捕获组中。捕获组的内容可以在替换字符串时使用（有些正则表达式的引擎甚至支持替换表达式本身），例如 `\\1`、 `\\2`、`\\3` 
等等，因此可以使用如下命令：\n\n```bash\n | sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/'\n```\n\n想必您已经意识到了，为了完成某种匹配，我们最终可能会写出非常复杂的正则表达式。例如，这里有一篇关于如何匹配电子邮箱地址的文章 [e-mail address](https://www.regular-expressions.info/email.html)，匹配电子邮箱可一点 [也不简单](https://emailregex.com/)。网络上还有很多关于如何匹配电子邮箱地址的 [讨论](https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression/1917982)。人们还为其编写了 [测试用例](https://fightingforalostcause.net/content/misc/2006/compare-email-regex.php) 及 [测试矩阵](https://mathiasbynens.be/demo/url-regex)。您甚至可以编写一个用于判断一个数 [是否为质数](https://www.noulakaz.net/2007/03/18/a-regular-expression-to-check-for-prime-numbers/) 的正则表达式。\n\n\n正则表达式是出了名的难以写对，但是它仍然会是您强大的常备工具之一。\n\n## 回到数据整理\n\nOK，现在我们有如下表达式：\n\n```bash\nssh myserver journalctl\n | grep sshd\n | grep \"Disconnected from\"\n | sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/'\n```\n\n`sed` 还可以做很多各种各样有趣的事情，例如文本注入 (使用 `i` 命令)，打印特定的行 (使用 `p` 命令)，基于索引选择特定行等等。详情请见 `man sed`！\n\n现在，我们已经得到了一个包含用户名的列表，列表中的用户都曾经尝试过登录我们的系统。但这还不够，让我们过滤出那些最常出现的用户：\n\n```bash\nssh myserver journalctl\n | grep sshd\n | grep \"Disconnected from\"\n | sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/'\n | sort | uniq -c\n```\n\n`sort` 会对其输入数据进行排序。`uniq -c` 会把连续出现的行折叠为一行并使用出现次数作为前缀。我们希望按照出现次数排序，过滤出最常出现的用户名：\n\n```bash\nssh myserver journalctl\n | grep sshd\n | grep \"Disconnected from\"\n | sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/'\n | sort | uniq -c\n | sort -nk1,1 | tail -n10\n```\n\n`sort -n` 会按照数字顺序对输入进行排序（默认情况下是按照字典序排序）。\n`-k1,1` 则表示“仅基于以空格分割的第一列进行排序”。`,n` 部分表示“仅排序到第 n 个部分”，默认情况是到行尾。就本例来说，针对整个行进行排序也没有任何问题，我们这里主要是为了学习这一用法！\n\n如果我们希望得到登录次数最少的用户，我们可以使用 `head` 来代替 `tail`。或者使用 `sort -r` 来进行倒序排序。\n\n\n相当不错。但我们只想获取用户名，而且不要一行一个地显示。\n\n\n```bash\nssh myserver journalctl\n | grep sshd\n | 
grep \"Disconnected from\"\n | sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/'\n | sort | uniq -c\n | sort -nk1,1 | tail -n10\n | awk '{print $2}' | paste -sd,\n```\n\n如果您使用的是 MacOS：注意这个命令并不能配合 MacOS 系统默认的 BSD `paste` 使用。参考 [课程概览与 shell](https://missing-semester-cn.github.io/2020/course-shell/) 的习题内容获取更多相关信息。\n\n我们可以利用 `paste` 命令来合并行(`-s`)，并指定一个分隔符进行分割 (`-d`)，那 `awk` 的作用又是什么呢？\n\n## awk -- 另外一种编辑器\n\n`awk` 其实是一种编程语言，只不过它碰巧非常善于处理文本。关于 `awk` 可以介绍的内容太多了，限于篇幅，这里我们仅介绍一些基础知识。\n\n首先， `{print $2}` 的作用是什么？ `awk` 程序接受一个模式串（可选）和一个代码块，指定当模式匹配时该做何种操作。我们这里用的是默认的模式串，即匹配所有行。\n在代码块中，`$0` 表示整行的内容，`$1` 到 `$n` 为一行中的 n 个域，域的分割基于 `awk` 的域分隔符（默认是空格，可以通过 `-F` 来修改）。在这个例子中，我们的代码意思是：对于每一行文本，打印其第二个部分，也就是用户名。\n\n让我们康康，还有什么炫酷的操作可以做。让我们统计有多少用户名以 `c` 开头，`e` 结尾，并且仅尝试过一次登录的用户。\n\n```bash\n | awk '$1 == 1 && $2 ~ /^c[^ ]*e$/ { print $2 }' | wc -l\n```\n\n让我们好好分析一下。首先，注意这次我们为 `awk` 指定了一个匹配模式串（也就是 `{...}` 前面的那部分内容）。该匹配要求文本的第一部分需要等于 1（这部分刚好是 `uniq -c` 得到的计数值），然后其第二部分必须满足给定的一个正则表达式。代码块中的内容则表示打印用户名。然后我们使用 `wc -l` 统计输出结果的行数。\n\n不过，既然 `awk` 是一种编程语言，那么则可以这样： \n\n```awk\nBEGIN { rows = 0 }\n$1 == 1 && $2 ~ /^c[^ ]*e$/ { rows += $1 }\nEND { print rows }\n```\n\n\n`BEGIN` 也是一种模式，它会匹配输入的开头（ `END` 则匹配结尾）。然后，对每一行第一个部分进行累加，最后将结果输出。事实上，我们完全可以抛弃 `grep` 和 `sed` ，因为 `awk` 就可以 [解决所有问题](https://backreference.org/2010/02/10/idiomatic-awk)。至于怎么做，就留给读者们做课后练习吧。\n\n\n## 分析数据\n\n想做数学计算也是可以的！例如这样，您可以将每行的数字加起来：\n\n```bash\n | paste -sd+ | bc -l\n```\n\n下面这种更加复杂的表达式也可以：\n\n```bash\necho \"2*($(data | paste -sd+))\" | bc -l\n```\n\n有多种方式获取统计数据。[`st`](https://github.com/nferraz/st) 干净利落，但如果您已经安装了 [R](https://www.r-project.org)：\n\n```bash\nssh myserver journalctl\n | grep sshd\n | grep \"Disconnected from\"\n | sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/'\n | sort | uniq -c\n | awk '{print $1}' | R --slave -e 'x <- scan(file=\"stdin\", quiet=TRUE); summary(x)'\n```\n\nR 也是一种编程语言，它非常适合被用来进行数据分析和 
[绘制图表](https://ggplot2.tidyverse.org/)。这里我们不会讲得特别详细，您只需要知道 `summary` 可以打印某个向量的统计结果。我们将输入的一系列数据存放在一个向量后，利用 R 语言就可以得到我们想要的统计数据。\n\n如果您希望绘制一些简单的图表， `gnuplot` 可以帮助到您：\n\n```bash\nssh myserver journalctl\n | grep sshd\n | grep \"Disconnected from\"\n | sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \\[preauth\\])?$/\\2/'\n | sort | uniq -c\n | sort -nk1,1 | tail -n10\n | gnuplot -p -e 'set boxwidth 0.5; plot \"-\" using 1:xtic(2) with boxes'\n```\n\n## 利用数据整理来确定参数\n\n有时候您要利用数据整理技术从一长串列表里找出您所需要安装或移除的东西。我们之前讨论的相关技术配合 `xargs` 即可实现：\n\n\n```bash\nrustup toolchain list | grep nightly | grep -vE \"nightly-x86\" | sed 's/-x86.*//' | xargs rustup toolchain uninstall\n```\n\n## 整理二进制数据\n\n虽然到目前为止我们的讨论都是基于文本数据，但对于二进制文件其实同样有用。例如我们可以用 ffmpeg 从相机中捕获一张图片，将其转换成灰度图后通过 SSH 将压缩后的文件发送到远端服务器，并在那里解压、存档并显示。\n\n```bash\nffmpeg -loglevel panic -i /dev/video0 -frames 1 -f image2 -\n | convert - -colorspace gray -\n | gzip\n | ssh mymachine 'gzip -d | tee copy.jpg | env DISPLAY=:0 feh -'\n```\n\n# 课后练习\n[习题解答]({{site.url}}/{{site.solution_url}}/{{page.solution.url}})\n\n1. 学习一下这篇简短的 [交互式正则表达式教程](https://regexone.com/)。\n2. 统计 words 文件 (`/usr/share/dict/words`) 中包含至少三个 `a` 且不以 `'s` 结尾的单词个数。这些单词中，出现频率前三的末尾两个字母是什么？ `sed` 的 `y` 命令，或者 `tr` 程序也许可以帮您解决大小写的问题。共存在多少种词尾两字母组合？还有一个很有挑战性的问题：哪个组合从未出现过？\n3. 进行原地替换听上去很有诱惑力，例如：\n   `sed s/REGEX/SUBSTITUTION/ input.txt > input.txt`。但是这并不是一个明智的做法，为什么呢？还是说只有 `sed` 是这样的？查看 `man sed` 来完成这个问题\n\n4. 找出您最近十次开机的开机时间平均数、中位数和最长时间。在 Linux 上需要用到 `journalctl` ，而在 macOS 上使用 `log show`。找到每次启动开始和结束时的时间戳。在 Linux 上，它们看起来类似这样：\n   ```\n   Logs begin at ...\n   ```\n   和\n   ```\n   systemd[577]: Startup finished in ...\n   ```\n   在 macOS 上, [查找](https://eclecticlight.co/2018/03/21/macos-unified-log-3-finding-your-way/):\n\n   ```\n   === system boot:\n   ```\n   和\n   ```\n   Previous shutdown cause: 5\n   ```\n5. 
查看之前三次重启启动信息中不同的部分(参见 `journalctl` 的 `-b` 选项)。将这一任务分为几个步骤，首先获取之前三次启动的启动日志，也许获取启动日志的命令就有合适的选项可以帮助您提取前三次启动的日志，亦或者您可以使用 `sed '0,/STRING/d'` 来删除 `STRING` 匹配到的字符串前面的全部内容。然后，过滤掉每次都不相同的部分，例如时间戳。下一步，重复记录输入行并对其计数(可以使用 `uniq` )。最后，删除所有出现过 3 次的内容（因为这些内容是三次启动日志中的重复部分）。\n6. 在网上找一个类似 [这个](https://stats.wikimedia.org/EN/TablesWikipediaZZ.htm) 或者 [这个](https://ucr.fbi.gov/crime-in-the-u.s/2016/crime-in-the-u.s.-2016/topic-pages/tables/table-1) 的数据集。或者从 [这里](https://www.springboard.com/blog/free-public-data-sets-data-science-project/) 找一些。使用 `curl` 获取数据集并提取其中两列数据，如果您想要获取的是 HTML 数据，那么 [`pup`](https://github.com/EricChiang/pup) 可能会更有帮助。对于 JSON 类型的数据，可以试试 [`jq`](https://stedolan.github.io/jq/)。请使用一条指令来找出其中一列的最大值和最小值，用另外一条指令计算两列之间差的总和。\n"
  },
  {
    "path": "_2020/debugging-profiling.md",
    "content": "---\nlayout: lecture\ntitle: \"调试及性能分析\"\ndate: 2020-01-23\nready: true\nsync: true\nsyncdate: 2025-08-16\nvideo:\n  aspect: 56.25\n  id: l812pUnKxME\nsolution:\n    ready: true\n    url: debugging-profiling-solution\n---\n\n\n代码不能完全按照您的想法运行，它只能完全按照您的写法运行，这是编程界的一条金科玉律。\n\n让您的写法符合您的想法是非常困难的。在这节课中，我们会传授给您一些非常有用技术，帮您处理代码中的 bug 和程序性能问题。\n\n\n# 调试代码\n\n## 打印调试法与日志\n\n\"最有效的 debug 工具就是细致的分析，配合恰当位置的打印语句\" — Brian Kernighan, _Unix 新手入门_。\n\n调试代码的第一种方法往往是在您发现问题的地方添加一些打印语句，然后不断重复此过程直到您获取了足够的信息并找到问题的根本原因。\n\n另外一个方法是使用日志，而不是临时添加打印语句。日志较普通的打印语句有如下的一些优势：\n\n- 您可以将日志写入文件、socket 或者甚至是发送到远端服务器而不仅仅是标准输出；\n- 日志可以支持严重等级（例如 INFO, DEBUG, WARN, ERROR 等），这使您可以根据需要过滤日志；\n- 对于新发现的问题，很可能您的日志中已经包含了可以帮助您定位问题的足够的信息。\n\n\n[这里](/static/files/logger.py) 是一个包含日志的例程序：\n\n```bash\n$ python logger.py\n# Raw output as with just prints\n$ python logger.py log\n# Log formatted output\n$ python logger.py log ERROR\n# Print only ERROR levels and above\n$ python logger.py color\n# Color formatted output\n```\n\n有很多技巧可以使日志的可读性变得更好，我最喜欢的一个是技巧是对其进行着色。到目前为止，您应该已经知道，以彩色文本显示终端信息时可读性更好。但是应该如何设置呢？\n\n`ls` 和 `grep` 这样的程序会使用 [ANSI escape codes](https://en.wikipedia.org/wiki/ANSI_escape_code)，它是一系列的特殊字符，可以使您的 shell 改变输出结果的颜色。例如，执行 `echo -e \"\\e[38;2;255;0;0mThis is red\\e[0m\"` 会打印红色的字符串：`This is red` 。只要您的终端支持 [真彩色](https://gist.github.com/XVilka/8346728#terminals--true-color)。如果您的终端不支持真彩色（例如 MacOS 的 Terminal.app），您可以使用支持更加广泛的 16 色，例如：\"\\e[31; 1mThis is red\\e[0m \"。\n\n下面这个脚本向您展示了如何在终端中打印多种颜色（只要您的终端支持真彩色）\n\n```bash\n#!/usr/bin/env bash\nfor R in $(seq 0 20 255); do\n    for G in $(seq 0 20 255); do\n        for B in $(seq 0 20 255); do\n            printf \"\\e[38;2;${R};${G};${B}m█\\e[0m\";\n        done\n    done\ndone\n```\n\n## 第三方日志系统\n\n如果您正在构建大型软件系统，您很可能会使用到一些依赖，有些依赖会作为程序单独运行。如 Web 服务器、数据库或消息代理都是此类常见的第三方依赖。\n\n和这些系统交互的时候，阅读它们的日志是非常必要的，因为仅靠客户端侧的错误信息可能并不足以定位问题。\n\n幸运的是，大多数的程序都会将日志保存在您的系统中的某个地方。对于 UNIX 系统来说，程序的日志通常存放在 `/var/log`。例如， [NGINX](https://www.nginx.com/) web 服务器就将其日志存放于 
`/var/log/nginx`。\n\n目前，系统开始使用 **system log**，您所有的日志都会保存在这里。大多数（但不是全部的）Linux 系统都会使用 `systemd`，这是一个系统守护进程，它会控制您系统中的很多东西，例如哪些服务应该启动并运行。`systemd` 会将日志以某种特殊格式存放于 `/var/log/journal`，您可以使用 [`journalctl`](http://man7.org/linux/man-pages/man1/journalctl.1.html) 命令显示这些消息。\n\n类似地，在 macOS 系统中，除了有一个 `/var/log/system.log` 之外，越来越多的工具开始使用用 [`log show`](https://www.manpagez.com/man/1/log/) 显示的系统日志。\n\n对于大多数的 UNIX 系统，您也可以使用 [`dmesg`](http://man7.org/linux/man-pages/man1/dmesg.1.html) 命令来读取内核的日志。\n\n如果您希望将日志加入到系统日志中，您可以使用 [`logger`](http://man7.org/linux/man-pages/man1/logger.1.html) 这个 shell 程序。下面这个例子显示了如何使用 `logger` 并且如何找到能够将其存入系统日志的条目。\n\n不仅如此，大多数的编程语言也提供写系统日志的方法。\n\n```bash\nlogger \"Hello Logs\"\n# On macOS\nlog show --last 1m | grep Hello\n# On Linux\njournalctl --since \"1m ago\" | grep Hello\n```\n\n正如我们在数据整理那节课上看到的那样，日志的内容可以非常的多，我们需要对其进行处理和过滤才能得到我们想要的信息。\n\n如果您发现您需要对 `journalctl` 和 `log show` 的结果进行大量的过滤，那么此时可以考虑使用它们自带的选项对其结果先过滤一遍再输出。还有一些像 [`lnav`](http://lnav.org/) 这样的工具，它为日志文件提供了更好的展现和浏览方式。\n\n## 调试器\n\n当通过打印已经不能满足您的调试需求时，您应该使用调试器。\n\n调试器是一种可以允许我们和正在执行的程序进行交互的程序，它可以做到：\n\n- 当到达某一行时将程序暂停；\n- 一次一条指令地逐步执行程序；\n- 程序崩溃后查看变量的值；\n- 满足特定条件时暂停程序；\n- 其他高级功能。\n\n很多编程语言都有自己的调试器。Python 的调试器是 [`pdb`](https://docs.python.org/3/library/pdb.html).\n\n下面对 `pdb` 支持的命令进行简单的介绍：\n\n- **l**(ist) - 显示当前行周围的 11 行，或接着上次显示的，继续往下显示 11 行；\n- **s**(tep) - 执行当前行，并在第一个可能的时机停止（通常指步入函数）；\n- **n**(ext) - 继续执行，直到到达当前函数的下一行或函数返回（通常指步过函数）；\n- **b**(reak) - 设置断点（基于传入的参数）；\n- **p**(rint) - 在当前上下文对表达式求值并打印结果。还有一个 **pp** 命令，它使用 [`pprint`](https://docs.python.org/3/library/pprint.html) 打印；\n- **r**(eturn) - 执行到当前函数返回；\n- **q**(uit) - 退出调试器。\n\n让我们使用 `pdb` 来修复下面的 Python 代码（参考讲座视频）\n\n```python\ndef bubble_sort(arr):\n    n = len(arr)\n    for i in range(n):\n        for j in range(n):\n            if arr[j] > arr[j+1]:\n                arr[j] = arr[j+1]\n                arr[j+1] = arr[j]\n    return arr\n\nprint(bubble_sort([4, 2, 1, 8, 7, 6]))\n```\n\n注意，因为 Python 是一种解释型语言，所以我们可以使用 `pdb` shell 
来执行命令和指令。[`ipdb`](https://pypi.org/project/ipdb/) 是 `pdb` 的增强版，它使用 [`IPython`](https://ipython.org) 作为 REPL 并开启了 tab 补全、语法高亮、更好的回溯和更好的内省，同时保留了与 `pdb` 模块相同的接口。\n\n对于更底层的编程语言，您可能需要了解一下 [`gdb`](https://www.gnu.org/software/gdb/)（及其改进版 [`pwndbg`](https://github.com/pwndbg/pwndbg)）和 [`lldb`](https://lldb.llvm.org/)。\n\n它们针对类 C 语言的调试进行了优化，但也允许您探索几乎任何进程并获取其当前的机器状态，例如：寄存器、栈、程序计数器等。\n\n## 专门工具\n\n即使您需要调试的程序是一个二进制的黑盒程序，仍然有一些工具可以帮助到您。当您的程序需要执行一些只有操作系统内核才能完成的操作时，它需要使用 [系统调用](https://en.wikipedia.org/wiki/System_call)。有一些命令可以帮助您追踪您的程序执行的系统调用。在 Linux 中可以使用 [`strace`](http://man7.org/linux/man-pages/man1/strace.1.html) ，在 macOS 和 BSD 中可以使用 [`dtrace`](http://dtrace.org/blogs/about/)。`dtrace` 用起来可能有些别扭，因为它使用的是它自有的 `D` 语言，但是我们可以使用一个叫做 [`dtruss`](https://www.manpagez.com/man/1/dtruss/) 的封装使其具有和 `strace` (更多信息参考 [这里](https://8thlight.com/blog/colin-jones/2015/11/06/dtrace-even-better-than-strace-for-osx.html))类似的接口。\n\n下面的例子展现了如何使用 `strace` 或 `dtruss` 来追踪 `ls` 执行时的 [`stat`](http://man7.org/linux/man-pages/man2/stat.2.html) 系统调用。若需要深入了解 `strace`，[这篇文章](https://blogs.oracle.com/linux/strace-the-sysadmins-microscope-v2) 值得一读。\n\n```bash\n# On Linux\nsudo strace -e lstat ls -l > /dev/null\n# On macOS\nsudo dtruss -t lstat64_extended ls -l > /dev/null\n```\n\n有些情况下，我们需要查看网络数据包才能定位问题。像 [`tcpdump`](http://man7.org/linux/man-pages/man1/tcpdump.1.html) 和 [Wireshark](https://www.wireshark.org/) 这样的网络数据包分析工具可以帮助您获取网络数据包的内容并基于不同的条件进行过滤。 \n\n对于 web 开发， Chrome/Firefox 的开发者工具非常方便，功能也很强大：\n- 源码 - 查看任意站点的 HTML/CSS/JS 源码；\n- 实时修改 HTML, CSS, JS 代码 - 修改网站的内容、样式和行为用于测试（由此可见网页截图是不可信的）；\n- Javascript shell - 在 JS REPL 中执行命令；\n- 网络 - 分析请求的时间线；\n- 存储 - 查看 Cookies 和本地应用存储。\n\n## 静态分析\n\n有些问题是您不需要执行代码就能发现的。例如，仔细观察一段代码，您就能发现某个循环变量覆盖了某个已经存在的变量或函数名；或是有个变量在被读取之前并没有被定义。\n这种情况下 [静态分析](https://en.wikipedia.org/wiki/Static_program_analysis) 工具就可以帮我们找到问题。静态分析会将程序的源码作为输入然后基于规则对其进行分析并对代码的正确性进行推理。\n\n下面这段 Python 代码中存在几个问题。 首先，我们的循环变量 `foo` 覆盖了之前定义的函数 `foo`。最后一行，我们还把 `bar` 错写成了 `baz`，因此当程序完成 `sleep`  
(一分钟)后，执行到这一行的时候便会崩溃。\n\n```python\nimport time\n\ndef foo():\n    return 42\n\nfor foo in range(5):\n    print(foo)\nbar = 1\nbar *= 0.2\ntime.sleep(60)\nprint(baz)\n```\n静态分析工具可以发现此类的问题。当我们使用 [`pyflakes`](https://pypi.org/project/pyflakes) 分析代码的时候，我们会得到与这两处 bug 相关的错误信息。[`mypy`](http://mypy-lang.org/) 则是另外一个工具，它可以对代码进行类型检查。这里，`mypy` 会警告我们 `bar` 起初是一个 `int` ，然后变成了 `float`。这些问题都可以在不运行代码的情况下被发现。\n\n```bash\n$ pyflakes foobar.py\nfoobar.py:6: redefinition of unused 'foo' from line 3\nfoobar.py:11: undefined name 'baz'\n\n$ mypy foobar.py\nfoobar.py:6: error: Incompatible types in assignment (expression has type \"int\", variable has type \"Callable[[], Any]\")\nfoobar.py:9: error: Incompatible types in assignment (expression has type \"float\", variable has type \"int\")\nfoobar.py:11: error: Name 'baz' is not defined\nFound 3 errors in 1 file (checked 1 source file)\n```\n\n在 shell 工具那一节课的时候，我们介绍了 [`shellcheck`](https://www.shellcheck.net/)，这是一个类似的工具，但它是应用于 shell 脚本的。\n\n多数编辑器和 IDE 都支持在编辑器界面内直接显示这些工具的输出结果，并高亮标出警告和错误的位置。这通常被称为**代码检查（code linting）**，它也可以用来展示其他类型的问题，例如代码风格违规或不安全的代码结构。\n\n在 vim 中，插件 [`ale`](https://vimawesome.com/plugin/ale) 或 [`syntastic`](https://vimawesome.com/plugin/syntastic) 可以帮助您做同样的事情。\n在 Python 中， [`pylint`](https://www.pylint.org) 和 [`pep8`](https://pypi.org/project/pep8/) 是风格检查工具的典型例子，而 [`bandit`](https://pypi.org/project/bandit/) 则是设计用来发现常见安全漏洞的工具。对于其它语言，人们已经整理了非常详尽的静态分析工具列表，例如 [Awesome Static Analysis](https://github.com/mre/awesome-static-analysis)（您可能想去看看其中的 _Writing_ 一节）。代码检查工具 (linters) 则可以参考 [Awesome Linters](https://github.com/caramelomartins/awesome-linters)。\n\n与风格检查相辅相成的是**代码格式化工具（code formatters）**，例如 Python 的 [`black`](https://github.com/psf/black)、Go 语言的 `gofmt`、Rust 的 `rustfmt` 以及 JavaScript、HTML 和 CSS 的 [`prettier`](https://prettier.io/)。这些工具会自动格式化您的代码，使其符合该编程语言通用的风格规范。虽然您可能不太情愿将代码风格的控制权交给工具，但标准化的代码格式不仅有助于他人阅读您的代码，也能让您更轻松地阅读他人（同样经过格式化）的代码。\n\n# 性能分析 (Profiling)\n\n即使您的代码在功能上完全符合预期，但如果它在运行过程中耗尽了所有的 CPU 或内存资源，那也未必合格。算法课程通常会教授大 O 
表示法，却很少教你如何找到程序中的热点 (hot spots)。鉴于[过早优化是万恶之源](http://wiki.c2.com/?PrematureOptimization)，您应该了解一下性能分析器 (profilers) 和监控工具。它们会帮助您找到程序中最耗时、最耗资源的部分，从而让您能够集中精力优化这些特定的部分。\n\n## 计时\n\n与调试类似，多数情况下，只需打印代码从一处运行到另一处的时间，即可发现问题。下面是一个使用 Python [`time`](https://docs.python.org/3/library/time.html) 模块的例子：\n\n```python\nimport time, random\nn = random.randint(1, 10) * 100\n\n# 获取当前时间 \nstart = time.time()\n\n# 做些工作\nprint(\"Sleeping for {} ms\".format(n))\ntime.sleep(n/1000)\n\n# 比较当前时间和起始时间\nprint(time.time() - start)\n\n# 输出\n# Sleeping for 500 ms\n# 0.5713930130004883\n```\n\n不过，墙上时间（wall clock time）也可能会有误导性，因为计算机可能同时在运行其他进程，或者在等待某些事件发生。工具通常会区分实际时间、用户时间和系统时间。通常用户时间加系统时间代表了您的进程在 CPU 上实际消耗了多少时间（更详细的解释可以参考 [这篇文章](https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1)）。\n\n- 真实时间 _Real_ - 程序从开始到结束流逝的墙上时间，包括其他进程使用的时间以及阻塞（例如等待 I/O 或网络）的时间\n- 用户时间 _User_ - CPU 执行用户态代码所花费的时间\n- 系统时间 _Sys_ - CPU 执行内核态代码所花费的时间\n\n例如，试着写一个执行 HTTP 请求的命令，并在命令前加上 [`time`](http://man7.org/linux/man-pages/man1/time.1.html)。网络不好的情况下您可能会看到下面的输出结果。请求花费了 2 秒多才完成，但是进程仅花费了 15 毫秒的 CPU 用户时间和 12 毫秒的 CPU 内核时间。\n\n```bash\n$ time curl https://missing.csail.mit.edu &> /dev/null\nreal    0m2.561s\nuser    0m0.015s\nsys     0m0.012s\n```\n\n## 性能分析工具（profilers）\n\n### CPU\n\n大多数情况下，当人们提及性能分析工具的时候，通常指的是 CPU 性能分析工具。\nCPU 性能分析工具有两种： 追踪分析器（_tracing_）及采样分析器（_sampling_）。\n追踪分析器 会记录程序的每一次函数调用，而采样分析器则只会周期性的监测（通常为每毫秒）您的程序并记录程序堆栈。它们使用这些记录来生成统计信息，显示程序在哪些事情上花费了最多的时间。如果您希望了解更多相关信息，可以参考 [这篇](https://jvns.ca/blog/2017/12/17/how-do-ruby---python-profilers-work-) 介绍性的文章。\n\n\n大多数的编程语言都有一些基于命令行的分析器，我们可以使用它们来分析代码。它们通常可以集成在 IDE 中，但是本节课我们会专注于这些命令行工具本身。\n\n在 Python 中，我们使用 `cProfile` 模块来分析每次函数调用所消耗的时间。 在下面的例子中，我们实现了一个基础的 grep 命令：\n\n```python\n#!/usr/bin/env python\n\nimport sys, re\n\ndef grep(pattern, file):\n    with open(file, 'r') as f:\n        print(file)\n        for i, line in enumerate(f.readlines()):\n            pattern = re.compile(pattern)\n            match = pattern.search(line)\n           
 if match is not None:\n                print(\"{}: {}\".format(i, line), end=\"\")\n\nif __name__ == '__main__':\n    times = int(sys.argv[1])\n    pattern = sys.argv[2]\n    for i in range(times):\n        for file in sys.argv[3:]:\n            grep(pattern, file)\n```\n\n我们可以使用下面的命令来对这段代码进行分析。通过它的输出我们可以知道，IO 消耗了大量的时间，编译正则表达式也比较耗费时间。因为正则表达式只需要编译一次，我们可以将其移动到 for 循环外面来改进性能。\n\n```\n$ python -m cProfile -s tottime grep.py 1000 '^(import|\\s*def)[^,]*$' *.py\n\n[omitted program output]\n\n ncalls  tottime  percall  cumtime  percall filename:lineno(function)\n   8000    0.266    0.000    0.292    0.000 {built-in method io.open}\n   8000    0.153    0.000    0.894    0.000 grep.py:5(grep)\n  17000    0.101    0.000    0.101    0.000 {built-in method builtins.print}\n   8000    0.100    0.000    0.129    0.000 {method 'readlines' of '_io._IOBase' objects}\n  93000    0.097    0.000    0.111    0.000 re.py:286(_compile)\n  93000    0.069    0.000    0.069    0.000 {method 'search' of '_sre.SRE_Pattern' objects}\n  93000    0.030    0.000    0.141    0.000 re.py:231(compile)\n  17000    0.019    0.000    0.029    0.000 codecs.py:318(decode)\n      1    0.017    0.017    0.911    0.911 grep.py:3(<module>)\n\n[omitted lines]\n```\n\n\n关于 Python 的 `cProfile` 分析器（以及其他一些类似的分析器），需要注意的是它显示的是每次函数调用的时间。看上去可能快到反直觉，尤其是如果您在代码里面使用了第三方的函数库，因为内部函数调用也会被看作函数调用。\n\n更加符合直觉的显示分析信息的方式是包括每行代码的执行时间，这也是 *行分析器* 的工作。例如，下面这段 Python 代码会向本课程的网站发起一个请求，然后解析响应返回的页面中的全部 URL：\n\n\n```python\n#!/usr/bin/env python\nimport requests\nfrom bs4 import BeautifulSoup\n\n# 这个装饰器会告诉行分析器 \n# 我们想要分析这个函数\n@profile\ndef get_urls():\n    response = requests.get('https://missing.csail.mit.edu')\n    s = BeautifulSoup(response.content, 'lxml')\n    urls = []\n    for url in s.find_all('a'):\n        urls.append(url['href'])\n\nif __name__ == '__main__':\n    get_urls()\n```\n\n如果我们使用 Python 的 `cProfile` 分析器，我们会得到超过 2500 行的输出结果，即使对其进行排序，我仍然搞不懂时间到底都花在哪了。如果我们使用 
[`line_profiler`](https://github.com/pyutils/line_profiler)，它会基于行来显示时间：\n\n```bash\n$ kernprof -l -v a.py\nWrote profile results to urls.py.lprof\nTimer unit: 1e-06 s\n\nTotal time: 0.636188 s\nFile: a.py\nFunction: get_urls at line 5\n\nLine #  Hits         Time  Per Hit   % Time  Line Contents\n==============================================================\n 5                                           @profile\n 6                                           def get_urls():\n 7         1     613909.0 613909.0     96.5      response = requests.get('https://missing.csail.mit.edu')\n 8         1      21559.0  21559.0      3.4      s = BeautifulSoup(response.content, 'lxml')\n 9         1          2.0      2.0      0.0      urls = []\n10        25        685.0     27.4      0.1      for url in s.find_all('a'):\n11        24         33.0      1.4      0.0          urls.append(url['href'])\n```\n\n### 内存\n\n像 C 或者 C++ 这样的语言，内存泄漏会导致您的程序在使用完内存后不去释放它。为了应对内存类的 Bug，我们可以使用类似 [Valgrind](https://valgrind.org/) 这样的工具来检查内存泄漏问题。\n\n对于 Python 这类具有垃圾回收机制的语言，内存分析器也是很有用的，因为对于某个对象来说，只要有指针还指向它，那它就不会被回收。\n\n下面这个例子及其输出，展示了 [memory-profiler](https://pypi.org/project/memory-profiler/) 是如何工作的（注意装饰器和 `line-profiler` 类似）。\n\n```python\n@profile\ndef my_func():\n    a = [1] * (10 ** 6)\n    b = [2] * (2 * 10 ** 7)\n    del b\n    return a\n\nif __name__ == '__main__':\n    my_func()\n```\n\n```bash\n$ python -m memory_profiler example.py\nLine #    Mem usage  Increment   Line Contents\n==============================================\n     3                           @profile\n     4      5.97 MB    0.00 MB   def my_func():\n     5     13.61 MB    7.64 MB       a = [1] * (10 ** 6)\n     6    166.20 MB  152.59 MB       b = [2] * (2 * 10 ** 7)\n     7     13.61 MB -152.59 MB       del b\n     8     13.61 MB    0.00 MB       return a\n```\n\n### 事件分析\n\n在我们使用 `strace` 调试代码的时候，您可能会希望忽略一些特殊的代码并希望在分析时将其当作黑盒处理。[`perf`](http://man7.org/linux/man-pages/man1/perf.1.html) 命令将 CPU 
的区别进行了抽象，它不会报告时间和内存的消耗，而是报告与您的程序相关的系统事件。\n\n例如，`perf` 可以报告不佳的缓存局部性（poor cache locality）、大量的页错误（page faults）或活锁（livelocks）。下面是关于常见命令的简介：\n\n- `perf list` - 列出可以被 perf 追踪的事件；\n- `perf stat COMMAND ARG1 ARG2` - 收集与某个进程或指令相关的事件；\n- `perf record COMMAND ARG1 ARG2` - 记录命令执行的采样信息并将统计数据储存在 `perf.data` 中；\n- `perf report` - 格式化并打印 `perf.data` 中的数据。\n\n\n### 可视化\n\n使用分析器来分析真实的程序时，由于软件的复杂性，其输出结果中将包含大量的信息。人类是一种视觉动物，非常不善于阅读大量的文字。因此很多工具都提供了可视化分析器输出结果的功能。\n\n对于采样分析器来说，常见的显示 CPU 分析数据的形式是 [火焰图](http://www.brendangregg.com/flamegraphs.html)，火焰图会在 Y 轴显示函数调用关系，并在 X 轴显示其耗时的比例。火焰图同时还是可交互的，您可以深入程序的某一具体部分，并查看其栈追踪（您可以尝试点击下面的图片）。\n\n[![FlameGraph](http://www.brendangregg.com/FlameGraphs/cpu-bash-flamegraph.svg)](http://www.brendangregg.com/FlameGraphs/cpu-bash-flamegraph.svg)\n\n调用图和控制流图可以显示子程序之间的关系，它将函数作为节点并把函数调用作为边。将它们和分析器的信息（例如调用次数、耗时等）放在一起使用时，调用图会变得非常有用，它可以帮助我们分析程序的流程。\n在 Python 中您可以使用 [`pycallgraph`](http://pycallgraph.slowchop.com/en/master/) 来生成这些图片。\n\n![Call Graph](https://upload.wikimedia.org/wikipedia/commons/2/2f/A_Call_Graph_generated_by_pycallgraph.png)\n\n\n## 资源监控\n\n有时候，分析程序性能的第一步是搞清楚它所消耗的资源。程序变慢通常是因为它所需要的资源不够了，例如没有足够的内存，或者网络连接变慢。\n\n有很多很多的工具可以被用来显示不同的系统资源，例如 CPU 占用、内存使用、网络、磁盘使用等。\n\n- **通用监控** - 最流行的工具要数 [`htop`](https://htop.dev/) 了，它是 [`top`](http://man7.org/linux/man-pages/man1/top.1.html) 的改进版。`htop` 可以显示当前运行进程的多种统计信息。`htop` 有很多选项和快捷键，常见的有：`<F6>` 进程排序、 `t` 显示树状结构和 `H` 显示或隐藏线程。 还可以留意一下 [`glances`](https://nicolargo.github.io/glances/) ，它的实现类似但是用户界面更好。如果需要合并测量全部的进程， [`dstat`](http://dag.wiee.rs/home-made/dstat/) 也是一个非常好用的工具，它可以实时地计算不同子系统资源的度量数据，例如 I/O、网络、 CPU 利用率、上下文切换等等；\n- **I/O 操作** - [`iotop`](http://man7.org/linux/man-pages/man8/iotop.8.html) 可以显示实时 I/O 占用信息而且可以非常方便地检查某个进程是否正在执行大量的磁盘读写操作；\n- **磁盘使用** - [`df`](http://man7.org/linux/man-pages/man1/df.1.html) 可以显示每个分区的信息，而 [`du`](http://man7.org/linux/man-pages/man1/du.1.html) 则可以显示当前目录下每个文件的磁盘使用情况（ **d** isk **u** sage）。`-h` 选项可以使命令以对人类（**h** uman）更加友好的格式显示数据；[`ncdu`](https://dev.yorhel.nl/ncdu) 是一个交互性更好的 `du` 
，它可以让您在不同目录下导航、删除文件和文件夹；\n- **内存使用** - [`free`](http://man7.org/linux/man-pages/man1/free.1.html) 可以显示系统当前空闲的内存。内存也可以使用 `htop` 这样的工具来显示；\n- **打开文件** - [`lsof`](http://man7.org/linux/man-pages/man8/lsof.8.html)  可以列出被进程打开的文件信息。 当我们需要查看某个文件是被哪个进程打开的时候，这个命令非常有用；\n- **网络连接和配置** - [`ss`](http://man7.org/linux/man-pages/man8/ss.8.html) 能帮助我们监控网络包的收发统计信息以及网络接口的统计信息。`ss` 常见的一个使用场景是找出占用某个端口的进程。如果要显示路由、网络设备和接口信息，您可以使用 [`ip`](http://man7.org/linux/man-pages/man8/ip.8.html) 命令。注意，`netstat` 和 `ifconfig` 这两个命令已经被前面那些工具所代替了。\n- **网络使用** -  [`nethogs`](https://github.com/raboof/nethogs) 和 [`iftop`](http://www.ex-parrot.com/pdw/iftop/) 是非常好的用于对网络占用进行监控的交互式命令行工具。\n\n如果您希望测试一下这些工具，您可以使用 [`stress`](https://linux.die.net/man/1/stress) 命令来为系统人为地增加负载。\n\n\n### 专用工具\n\n有时候，您只需要对黑盒程序进行基准测试，并依此对软件选择进行评估。\n类似 [`hyperfine`](https://github.com/sharkdp/hyperfine) 这样的命令行工具可以帮您快速进行基准测试。例如，我们在 shell 工具和脚本那一节课中推荐使用 `fd` 来代替 `find`。我们这里可以用 `hyperfine` 来比较一下它们。\n\n在下面的例子中，我们可以看到 `fd` 比 `find` 要快约 20 倍。\n\n```bash\n$ hyperfine --warmup 3 'fd -e jpg' 'find . -iname \"*.jpg\"'\nBenchmark #1: fd -e jpg\n  Time (mean ± σ):      51.4 ms ±   2.9 ms    [User: 121.0 ms, System: 160.5 ms]\n  Range (min … max):    44.2 ms …  60.1 ms    56 runs\n\nBenchmark #2: find . -iname \"*.jpg\"\n  Time (mean ± σ):      1.126 s ±  0.101 s    [User: 141.1 ms, System: 956.1 ms]\n  Range (min … max):    0.975 s …  1.287 s    10 runs\n\nSummary\n  'fd -e jpg' ran\n   21.89 ± 2.33 times faster than 'find . -iname \"*.jpg\"'\n```\n\n和调试一样，浏览器也包含了很多不错的性能分析工具，可以用来分析页面加载，让我们可以搞清楚时间都消耗在什么地方（加载、渲染、脚本等等）。 更多关于 [Firefox](https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Profiling_with_the_Built-in_Profiler) 和 [Chrome](https://developers.google.com/web/tools/chrome-devtools/rendering-tools) 的信息可以点击链接。\n\n# 课后练习\n[习题解答]({{site.url}}/{{site.solution_url}}/{{page.solution.url}})\n## 调试\n1. 使用 Linux 上的 `journalctl` 或 macOS 上的 `log show` 命令来获取最近一天中超级用户的登录信息及其所执行的指令。如果找不到相关信息，您可以执行一些无害的命令，例如 `sudo ls` 然后再次查看。\n\n2. 
学习 [这份](https://github.com/spiside/pdb-tutorial) `pdb` 实践教程并熟悉相关的命令。更深入的信息您可以参考 [这份](https://realpython.com/python-debugging-pdb) 教程。\n\n3. 安装 [`shellcheck`](https://www.shellcheck.net/) 并尝试对下面的脚本进行检查。这段代码有什么问题吗？请修复相关问题。在您的编辑器中安装一个 linter 插件，这样它就可以自动地显示相关警告信息。\n   ```bash\n   #!/bin/sh\n   ## Example: a typical script with several problems\n   for f in $(ls *.m3u)\n   do\n     grep -qi hq.*mp3 $f \\\n       && echo -e 'Playlist $f contains a HQ file in mp3 format'\n   done\n   ```\n\n4. (进阶题) 请阅读 [可逆调试](https://undo.io/resources/reverse-debugging-whitepaper/) 并尝试创建一个可以工作的例子（使用 [`rr`](https://rr-project.org/) 或 [`RevPDB`](https://morepypy.blogspot.com/2016/07/reverse-debugging-for-python.html)）。\n\n## 性能分析\n\n1. [这里](/static/files/sorts.py) 有一些排序算法的实现。请使用 [`cProfile`](https://docs.python.org/3/library/profile.html) 和 [`line_profiler`](https://github.com/pyutils/line_profiler) 来比较插入排序和快速排序的性能。两种算法的瓶颈分别在哪里？然后使用 `memory_profiler` 来检查内存消耗，为什么插入排序更好一些？然后再看看原地排序版本的快排。附加题：使用 `perf` 来查看不同算法的 CPU 周期数以及缓存命中与未命中情况。\n\n2. 这里有一些用于计算斐波那契数列的 Python 代码，它为计算每个数字都定义了一个函数：\n   ```python\n   #!/usr/bin/env python\n   def fib0(): return 0\n\n   def fib1(): return 1\n\n   s = \"\"\"def fib{}(): return fib{}() + fib{}()\"\"\"\n\n   if __name__ == '__main__':\n\n       for n in range(2, 10):\n           exec(s.format(n, n-1, n-2))\n       # from functools import lru_cache\n       # for n in range(10):\n       #     exec(\"fib{} = lru_cache(1)(fib{})\".format(n, n))\n       print(eval(\"fib9()\"))\n   ```\n   将代码拷贝到文件中使其变为一个可执行的程序。首先安装 [`pycallgraph`](http://pycallgraph.slowchop.com/en/master/) 和 [`graphviz`](http://graphviz.org/)(如果您能够执行 `dot`, 则说明已经安装了 GraphViz.)。并使用 `pycallgraph graphviz -- ./fib.py` 来执行代码并查看 `pycallgraph.png` 这个文件。`fib0` 被调用了多少次？我们可以通过记忆化（memoization）来对其进行优化。将注释掉的部分放开，然后重新生成图片。这回每个 `fibN` 函数被调用了多少次？\n3. 
我们经常会遇到的情况是某个我们希望去监听的端口已经被其他进程占用了。让我们学习如何找到该进程的 PID。首先执行 `python -m http.server 4444` 启动一个最简单的 web 服务器来监听 `4444` 端口。在另外一个终端中，执行 `lsof | grep LISTEN` 打印出所有监听端口的进程及相应的端口。找到对应的 PID 然后使用 `kill <PID>` 停止该进程。\n\n4. 限制进程资源也是一个非常有用的技术。执行 `stress -c 3` 并使用 `htop` 对 CPU 消耗进行可视化。现在，执行 `taskset --cpu-list 0,2 stress -c 3` 并可视化。`stress` 占用了 3 个 CPU 吗？为什么没有？阅读 [`man taskset`](http://man7.org/linux/man-pages/man1/taskset.1.html) 来寻找答案。附加题：使用 [`cgroups`](http://man7.org/linux/man-pages/man7/cgroups.7.html) 来实现相同的操作，限制 `stress -m` 的内存使用。\n\n5. (进阶题) `curl ipinfo.io` 命令会执行 HTTP 请求并获取关于您 IP 的信息。打开 [Wireshark](https://www.wireshark.org/) 并抓取 `curl` 发起的请求和收到的回复报文。（提示：可以使用 `http` 进行过滤，只显示 HTTP 报文）\n\n"
  },
  {
    "path": "_2020/editors-notes.txt",
    "content": "I use these notes as a reference when teaching. If you're a student who ended\nup here, you probably want to look at editors.md instead.\n\nwriting words (essays) vs programming\n- programming: more time spent reading, navigating, editing, than writing in a\n  long stream\n- different programs for different purposes\n\nworth mastering an editor\n- it will save you hundreds of hours\n- worth investing time in this, unlike many other things\n\nhow to learn\n- start with tutorial\n- stick with the editor for all code (and ideally word) editing tasks\n- avoid bad habits\n- look things up as you go\n- if it seems like there should be a better way, there probably is\n    - programmers care about their editors, so they're super powerful tools\n- timeline for learning\n    - in a couple hours: will learn basic editor functions (save, quit, ...)\n    - in 20 hours: will be as fast as you were with your old editor\n    - after that: benefits start\n    - you never stop learning (these are very fancy and powerful tools)\n\nwhich editor to learn?\n- people have strong opinions: editor wars\n- https://insights.stackoverflow.com/survey/2019/#development-environments-and-tools\n- VS Code is most popular GUI-based tool\n- Vim is most popular CLI-based tool\n- we are teaching you Vim\n    - all the instructors use Vim\n    - originated from Vi editor (1976), still being developed today\n    - interesting ideas\n    - lots of tools support Vim bindings (e.g. 
Vim emulation for VS Code has\n      1.4m downloads)\n    - has a bit of a learning curve (compared to GUI editor), but worth it\n    - worth learning even if you finally choose to use another editor\n\nthis lecture\n- philosophy of vim: the neat ideas of this editor\n- basics\n- demos\n- exercises, resources to learn more\n    - starting with vimtutor\n    - this lecture: focus on ideas, not details\n\nmodal editing (modal ~ \"modes\")\n- designed around idea that a lot of time is spent reading/navigating/making small edits\n- simplified picture: normal mode <-> insert mode\n- more complex picture: normal <-> {insert, replace, visual, v-line, v-block, command-line}\n- mode shown in bottom left\n- keystrokes have different meanings in different modes: e.g. `x`\n- remapping caps lock to escape\n\nbasics\n- switching between normal mode and insert mode\n- that's all you need to know to get started. insert mode works as you expect.\n- buffers vs tabs vs windows\n    - buffers ~ open files\n    - every tab has one or more windows\n    - a buffer can be open in 0 or more windows\n    - unlike e.g. 
web browser\n- command line\n    - `:` in normal mode\n    - :q, :w, :wq, :e {name of file}, :ls, :help {topic} (:help :w, :help w)\n\nvim normal mode is a programming language (most important idea in vim)\n- once you learn the primitives, you can combine them in interesting ways\n- becomes muscle memory\n- movement\n- selection\n- edits\n- counts\n- modifiers\n\ndemo (broken fizzbuzz)\n- main is never called\n  - `G` end of file\n  - `o` open new line below\n  - type in \"if __name__ ...\" thing\n- starts at 0 instead of 1\n  - search for `/range`\n  - `ww` to move forward 2 words\n  - `i` to insert text, \"1, \"\n  - `ea` to insert after limit, \"+1\"\n- newline for \"fizzbuzz\"\n  - `jj$i` to insert text at end of line\n  - add \", end=''\"\n  - `jj.` to repeat for second print\n  - `jjo` to open line below if\n  - add \"else: print()\"\n- fizz fizz\n  - `ci'` to change fizz\n- command-line argument\n  - `ggO` to open above\n  - \"import sys\"\n  - `/10`\n  - `ci(` to \"int(sys.argv[1])\"\n\ncustomizing vim\n- ~/.vimrc\n- start with our basic config\n- look online for inspiration\n\nplugins\n- no plugin manager necessary: just put plugins in `~/.vim/pack/vendor/start/`\n- recommended plugins\n    - fuzzy file finder: ctrlp.vim\n    - code search: ack.vim\n    - directory navigation: nerdtree\n    - magic motions: vim-easymotion\n- see what your instructors use\n- find more: https://vimawesome.com/\n\nvim bindings in other tools\n- shell (set -o vi / bindkey -v)\n- $EDITOR\n- readline\n- Jupyter notebook\n\nadvanced vim demos\n\nhomework\n"
  },
  {
    "path": "_2020/editors.md",
    "content": "---\nlayout: lecture\ntitle: \"编辑器 (Vim)\"\ndate: 2020-01-15\nready: true\nsync: true\nsyncdate: 2025-08-16\nvideo:\n  aspect: 56.25\n  id: a6Q8Na575qc\nsolution:\n    ready: true\n    url: editors-solution\n---\n\n写作和写代码其实是两项非常不同的活动。当我们编程的时候，会经常在文件间进行切换、阅读、浏览和修改代码，而不是连续编写一大段的文字。因此代码编辑器和文本编辑器是很不同的两种工具（例如微软的 Word 与 Visual Studio Code）。\n\n作为程序员，我们大部分时间都花在代码编辑上，所以花点时间掌握某个适合自己的编辑器是非常值得的。通常学习使用一个新的编辑器包含以下步骤：\n\n- 阅读教程（比如这节课以及我们为您提供的资源）\n- 坚持使用它来完成你所有的编辑工作（即使一开始这会让你的工作效率降低）\n- 随时查阅：如果某个操作看起来像是有更方便的实现方法，一般情况下真的会有\n\n如果您能够遵循上述步骤，并且坚持使用新的编辑器完成您所有的文本编辑任务，那么学习一个复杂的代码编辑器的过程一般是这样的：头两个小时，您会学习到编辑器的基本操作，例如打开和编辑文件、保存与退出、浏览缓冲区。当学习时间累计达到 20 个小时之后，您使用新编辑器的效率应该已经和使用老编辑器一样快。在此之后，其益处开始显现：有了足够的知识和肌肉记忆后，使用新编辑器将大大节省你的时间。而现代文本编辑器都是些复杂且强大的工具，永远有新东西可学：学的越多，效率越高。\n\n# 该学哪个编辑器？\n\n程序员们对自己正在使用的文本编辑器通常有着 [非常强的执念](https://zh.wikipedia.org/wiki/编辑器之战)。\n\n\n现在最流行的编辑器是什么？[Stack Overflow 的调查](https://insights.stackoverflow.com/survey/2019/#development-environments-and-tools)（这个调查可能并不如我们想象的那样客观，因为 Stack Overflow 的用户并不能代表所有程序员）显示，[Visual Studio Code](https://code.visualstudio.com) 是目前最流行的代码编辑器。而 [Vim](https://www.vim.org) 则是最流行的基于命令行的编辑器。\n\n## Vim\n\n这门课的所有教员都使用 Vim 作为编辑器。Vim 有着悠久历史；它始于 1976 年的 Vi 编辑器，到现在还在\n不断开发中。Vim 有很多聪明的设计思想，所以很多其他工具也支持 Vim 模式（比如，140 万人安装了\n[Vim emulation for VS code](https://github.com/VSCodeVim/Vim)）。即使你最后使用\n其他编辑器，Vim 也值得学习。\n\n由于不可能在 50 分钟内教授 Vim 的所有功能，我们会专注于解释 Vim 的设计哲学，教你基础知识，\n并展示一部分高级功能，然后给你掌握这个工具所需要的资源。\n\n# Vim 的哲学\n\n在编程的时候，你会把大量时间花在阅读/编辑而不是在写代码上。所以，Vim 是一个 *多模态* 编辑\n器：它对于插入文字和操纵文字有不同的模式。Vim 是可编程的（可以使用 Vimscript 或者像\nPython 一样的其他程序语言），Vim 的接口本身也是一个程序语言：键入操作（以及其助记名）\n是命令，这些命令也是可组合的。Vim 避免了使用鼠标，因为那样太慢了；Vim 甚至避免用\n上下左右键因为那样需要太多的手指移动。\n\n这样的设计哲学使得 Vim 成为了一个能跟上你思维速度的编辑器。\n\n# 编辑模式\n\nVim 的设计以大多数时间都花在阅读、浏览和进行少量编辑改动为基础，因此它具有多种操作模式：\n\n- **正常模式**：在文件中四处移动光标进行修改\n- **插入模式**：插入文本\n- **替换模式**：替换文本\n- **可视化模式**（一般，行，块）：选中文本块\n- **命令模式**：用于执行命令\n\n在不同的操作模式下，键盘敲击的含义也不同。比如，`x` 在插入模式会插入字母 `x`，但是在正常模式\n会删除当前光标所在的字母，在可视模式下则会删除选中文块。\n\n在默认设置下，Vim 会在左下角显示当前的模式。Vim 
启动时的默认模式是正常模式。通常你会把大部分\n时间花在正常模式和插入模式。\n\n你可以按下 `<ESC>`（退出键）从任何其他模式返回正常模式。在正常模式，键入 `i` 进入插入\n模式，`R` 进入替换模式，`v` 进入可视（一般）模式，`V` 进入可视（行）模式，`<C-v>`\n（Ctrl-V, 有时也写作 `^V`）进入可视（块）模式，`:` 进入命令模式。\n\n因为你会在使用 Vim 时大量使用 `<ESC>` 键，所以可以考虑把大小写锁定键重定义成 `<ESC>` 键（[MacOS 教程](https://vim.fandom.com/wiki/Map_caps_lock_to_escape_in_macOS)）或者创建一个[其他的映射](https://vim.fandom.com/wiki/Avoid_the_escape_key#Mappings)通过简单的按键序列来代替 `<ESC>`。\n\n# 基本操作\n\n## 插入文本\n\n在正常模式，键入 `i` 进入插入模式。现在 Vim 跟很多其他的编辑器一样，直到你键入 `<ESC>`\n返回正常模式。你只需要掌握这一点和上面介绍的所有基础知识就可以使用 Vim 来编辑文件了\n（虽然如果你一直停留在插入模式内不一定高效）。\n\n## 缓存， 标签页， 窗口\n\nVim 会维护一系列打开的文件，称为“缓存”（buffer，也译作“缓冲区”）。一个 Vim 会话包含一系列标签页，每个标签页包含\n一系列窗口（分隔面板）。每个窗口显示一个缓存。跟网页浏览器等其他你熟悉的程序不一样的是，\n缓存和窗口不是一一对应的关系；窗口只是缓存的视图。一个缓存可以在 *多个* 窗口打开，甚至在同一\n个标签页内的多个窗口打开。这个功能其实很好用，比如可以查看同一个文件的不同部分。\n\nVim 默认打开一个标签页，这个标签页包含一个窗口。\n\n## 命令行\n\n在正常模式下键入 `:` 进入命令行模式。 在键入 `:` 后，你的光标会立即跳到屏幕下方的命令行。\n这个模式有很多功能，包括打开，保存，关闭文件，以及\n[退出 Vim](https://twitter.com/iamdevloper/status/435555976687923200)。\n\n- `:q` 退出（关闭窗口）\n- `:w` 保存（写）\n- `:wq` 保存然后退出\n- `:e {文件名}` 打开要编辑的文件\n- `:ls` 显示打开的缓存\n- `:help {标题}` 打开帮助文档\n    - `:help :w` 打开 `:w` 命令的帮助文档\n    - `:help w` 打开 `w` 移动的帮助文档\n\n# Vim 的接口其实是一种编程语言\n\nVim 最重要的设计思想是 Vim 的界面本身是一个程序语言。键入操作（以及它们的助记名）\n本身是命令，这些命令可以组合使用。这使得移动和编辑更加高效，特别是一旦形成肌肉记忆。\n\n## 移动\n\n多数时候你会在正常模式下，使用移动命令在缓存中导航。在 Vim 里面移动也被称为 “名词”，\n因为它们指向文字块。\n\n- 基本移动: `hjkl` （左， 下， 上， 右）\n- 词： `w` （下一个词）， `b` （词初）， `e` （词尾）\n- 行： `0` （行初）， `^` （第一个非空格字符）， `$` （行尾）\n- 屏幕： `H` （屏幕首行）， `M` （屏幕中间）， `L` （屏幕底部）\n- 翻页： `Ctrl-u` （上翻）， `Ctrl-d` （下翻）\n- 文件： `gg` （文件头）， `G` （文件尾）\n- 行数： `:{行数}<CR>` 或者 `{行数}G` ({行数}为行数)\n- 杂项： `%` （找到配对，比如括号或者 /* */ 之类的注释对）\n- 查找： `f{字符}`， `t{字符}`， `F{字符}`， `T{字符}`\n    - 在本行内向前/向后查找{字符}：`f` / `F` 跳到该字符上，`t` / `T` 沿移动方向停在该字符的前一个位置\n    - `,` / `;` 用于导航匹配\n- 搜索: `/{正则表达式}`, `n` / `N` 用于导航匹配\n\n## 选择\n\n可视化模式:\n\n- 可视化：`v`\n- 可视化行： `V`\n- 可视化块：`Ctrl+v`\n\n可以用移动命令来选中。\n\n## 编辑\n\n所有你需要用鼠标做的事， 你现在都可以用键盘：采用编辑命令和移动命令的组合来完成。\n这就是 Vim 的界面开始看起来像一个程序语言的时候。Vim 的编辑命令也被称为 “动词”，\n因为动词可以施动于名词。\n\n- `i` 进入插入模式 \n    - 但是对于操纵/删除文本，我们希望不只是用退格键来完成\n- `O` / `o` 
在之上/之下插入行\n- `d{移动命令}` 删除 {移动命令}\n    - 例如，`dw` 删除词, `d$` 删除到行尾, `d0` 删除到行头。\n- `c{移动命令}` 改变 {移动命令}\n    - 例如，`cw` 改变词\n    - 相当于 `d{移动命令}` 再 `i`\n- `x` 删除字符（等同于 `dl`）\n- `s` 替换字符（等同于 `xi`）\n- 可视化模式 + 操作\n    - 选中文字, `d` 删除 或者 `c` 改变\n- `u` 撤销, `<C-r>` 重做\n- `y` 复制 / \"yank\" （其他一些命令比如 `d` 也会复制）\n- `p` 粘贴\n- 更多值得学习的: 比如 `~` 改变字符的大小写\n\n## 计数\n\n你可以用一个计数来结合“名词”和“动词”，这会执行指定操作若干次。\n\n- `3w` 向后移动三个词\n- `5j` 向下移动 5 行\n- `7dw` 删除 7 个词\n\n## 修饰语\n\n你可以用修饰语改变“名词”的意义。修饰语有 `i`，表示“内部”或者“在内”，和 `a`，\n表示“周围”。\n\n- `ci(` 改变当前括号内的内容\n- `ci[` 改变当前方括号内的内容\n- `da'` 删除一个单引号字符串， 包括周围的单引号\n\n# 演示\n\n这里是一个有问题的 [fizz buzz](https://en.wikipedia.org/wiki/Fizz_buzz)\n实现：\n\n```python\ndef fizz_buzz(limit):\n    for i in range(limit):\n        if i % 3 == 0:\n            print('fizz')\n        if i % 5 == 0:\n            print('fizz')\n        if i % 3 and i % 5:\n            print(i)\n\ndef main():\n    fizz_buzz(10)\n```\n\n我们会修复以下问题：\n\n- 主函数没有被调用\n- 从 0 而不是 1 开始\n- 在 15 的整数倍的时候在不同行打印 \"fizz\" 和 \"buzz\"\n- 在 5 的整数倍的时候打印 \"fizz\"\n- 采用硬编码的参数 10 而不是从命令行读取参数\n\n\n- 主函数没有被调用\n  - `G` 文件尾\n  - `o` 向下打开一个新行\n  - 输入 \"if __name__ ...\" \n- 从 0 而不是 1 开始\n  - 搜索 `/range`\n  - `ww` 向后移动两个词\n  - `i` 插入文字， \"1, \"\n  - `ea` 在 limit 后插入， \"+1\"\n- 在新的一行 \"fizzbuzz\"\n  - `jj$i` 插入文字到行尾\n  - 加入 \", end=''\"\n  - `jj.` 重复第二个打印\n  - `jjo` 在 if 下方打开一行\n  - 加入 \"else: print()\"\n- fizz fizz\n  - `ci'` 修改 fizz\n- 命令行参数\n  - `ggO` 向上打开\n  - \"import sys\"\n  - `/10`\n  - `ci(` 改为 \"int(sys.argv[1])\"\n\n\n展示详情请观看课程视频。比较上面用 Vim 的操作和你可能使用其他程序的操作。\n值得一提的是 Vim 需要很少的键盘操作，允许你编辑的速度跟上你思维的速度。\n\n# 自定义 Vim\n\nVim 通过一个位于 `~/.vimrc` 的文本配置文件（包含 Vim 脚本命令）来进行自定义。你可能会想启用很多基本\n设置。\n\n我们提供一个文档详细的基本设置，你可以用它当作你的初始设置。我们推荐使用这个设置因为它修复了一些 Vim 默认设置的奇怪行为。\n**在 [这儿](/2020/files/vimrc) 下载我们的设置，然后将它保存成 `~/.vimrc`.**\n\nVim 能够被重度自定义，花时间探索自定义选项是值得的。你可以参考其他人在 GitHub\n上共享的设置文件，比如，你的授课人的 Vim 设置\n([Anish](https://github.com/anishathalye/dotfiles/blob/master/vimrc),\n[Jon](https://github.com/jonhoo/configs/blob/master/editor/.config/nvim/init.vim) 
(使用了 [neovim](https://neovim.io/)),\n[Jose](https://github.com/JJGO/dotfiles/blob/master/vim/.vimrc))。\n有很多好的博客文章也聊到了这个话题。尽量不要复制粘贴别人的整个设置文件，\n而是阅读和理解它，然后采用对你有用的部分。\n\n# 扩展 Vim\n\nVim 有很多扩展插件。跟很多互联网上已经过时的建议相反，你 *不* 需要在 Vim 使用一个插件\n管理器（从 Vim 8.0 开始）。你可以使用内置的插件管理系统。只需要创建一个\n`~/.vim/pack/vendor/start/` 的文件夹，然后把插件放到这里（比如通过 `git clone`）。\n\n以下是一些我们最爱的插件：\n\n- [ctrlp.vim](https://github.com/ctrlpvim/ctrlp.vim): 模糊文件查找\n- [ack.vim](https://github.com/mileszs/ack.vim): 代码搜索\n- [nerdtree](https://github.com/scrooloose/nerdtree): 文件浏览器\n- [vim-easymotion](https://github.com/easymotion/vim-easymotion): 魔术操作\n\n\n我们尽量避免在这里提供一份冗长的插件列表。你可以查看讲师们的开源的配置文件 ([Anish](https://github.com/anishathalye/dotfiles), [Jon](https://github.com/jonhoo/configs/blob/master/editor/.config/nvim/init.vim) (使用了 [neovim](https://neovim.io/)), [Jose](https://github.com/JJGO/dotfiles/blob/master/vim/.vimrc)) 来看看我们使用的其他插件。\n浏览 [Vim Awesome](https://vimawesome.com/) 来了解一些很棒的插件。\n这个话题也有很多博客文章：搜索 \"best Vim plugins\"。\n\n# 其他程序的 Vim 模式\n\n\n很多工具提供了 Vim 模式。这些 Vim 模式的质量参差不齐；取决于具体工具，有的提供了\n很多酷炫的 Vim 功能，但是大多数对基本功能支持得很好。\n\n## Shell\n\n如果你是一个 Bash 用户，用 `set -o vi`。如果你用 Zsh：`bindkey -v`。Fish 用\n`fish_vi_key_bindings`。另外，不管使用什么 shell，你可以\n`export EDITOR=vim`。 这个环境变量用来决定当一个程序需要启动编辑器时启动哪一个。\n例如，`git` 会使用这个编辑器来编辑 commit 信息。\n\n## Readline\n\n很多程序使用 [GNU\nReadline](https://tiswww.case.edu/php/chet/readline/rltop.html) 库来作为\n它们的命令行界面。Readline 也支持基本的 Vim 模式，\n可以通过在 `~/.inputrc` 添加如下行开启：\n\n```\nset editing-mode vi\n```\n\n比如，在这个设置下，Python REPL 会支持 Vim 快捷键。\n\n## 其他\n\n甚至有用于\n[网页浏览器](http://vim.wikia.com/wiki/Vim_key_bindings_for_web_browsers)的 Vim 快捷键扩展, 受欢迎的有\n用于 Google Chrome 的\n[Vimium](https://chrome.google.com/webstore/detail/vimium/dbepggeogbaibhgnhhndojpepiihcmeb?hl=en)\n和用于 Firefox 的 [Tridactyl](https://github.com/tridactyl/tridactyl)。\n你甚至可以在 [Jupyter\nnotebooks](https://github.com/lambdalisue/jupyter-vim-binding) 中用 Vim 快捷键。\n[这个列表](https://reversed.top/2016-08-13/big-list-of-vim-like-software) 中列举了支持类 
vim 键位绑定的软件。\n\n\n# Vim 进阶\n\n这里我们提供了一些展示这个编辑器能力的例子。我们无法把所有的这样的事情都教给你，但是你\n可以在使用中学习。一个好的对策是: 当你在使用你的编辑器的时候感觉 “一定有更好的方法来做这个”，\n那么很可能真的有：上网搜寻一下。\n\n## 搜索和替换\n\n`:s` （替换）命令（[文档](http://vim.wikia.com/wiki/Search_and_replace)）。\n\n- `%s/foo/bar/g`\n    - 在整个文件中将 foo 全局替换成 bar\n- `%s/\\[.*\\](\\(.*\\))/\\1/g`\n    - 将有命名的 Markdown 链接替换成简单 URLs\n\n## 多窗口\n\n- 用 `:sp` / `:vsp` 来分割窗口 \n- 同一个缓存可以在多个窗口中显示。\n\n## 宏\n\n- `q{字符}` 来开始在寄存器 `{字符}` 中录制宏\n- `q` 停止录制\n- `@{字符}` 重放宏\n- 宏的执行遇错误会停止\n- `{计数}@{字符}` 执行一个宏{计数}次\n- 宏可以递归\n    - 首先用 `q{字符}q` 清除宏\n    - 录制该宏，用 `@{字符}` 来递归调用该宏\n    （在录制完成之前不会有任何操作）\n- 例子：将 xml 转成 json ([file](/2020/files/example-data.xml))\n    - 一个有 \"name\" / \"email\" 键对象的数组\n    - 用一个 Python 程序？\n    - 用 sed / 正则表达式\n        - `g/people/d`\n        - `%s/<person>/{/g`\n        - `%s/<name>\\(.*\\)<\\/name>/\"name\": \"\\1\",/g`\n        - ...\n    - Vim 命令 / 宏\n        - `ggdd`, `Gdd` 删除第一行和最后一行\n        - 格式化单个元素的宏（存放在 `e` 中）\n            - 转到有 `<name>` 的行\n            - `qe^r\"f>s\": \"<ESC>f<C\"<ESC>q`\n        - 格式化单个人的宏\n            - 转到有 `<person>` 的行\n            - `qpS{<ESC>j@eA,<ESC>j@ejS},<ESC>q`\n        - 格式化单个人然后转到下一个人的宏\n            - 转到有 `<person>` 的行\n            - `qq@pjq`\n        - 执行宏到文件尾\n            - `999@q`\n        - 手动移除最后的 `,` 然后加上 `[` 和 `]` 分隔符\n\n# 扩展资料\n\n- `vimtutor` 是一个 Vim 安装时自带的教程（注：如果你使用的是 vim 9.2 或更高的版本，那么可以在正常模式下使用 `:Tutor` 来进入一个更现代化，互动性更强的教程）\n- [Vim Adventures](https://vim-adventures.com/) 是一个学习使用 Vim 的游戏\n- [Vim Tips Wiki](http://vim.wikia.com/wiki/Vim_Tips_Wiki)\n- [Vim Advent Calendar](https://vimways.org/2019/) 有很多 Vim 小技巧\n- [Vim Golf](http://www.vimgolf.com/) 是用 Vim 的用户界面作为程序语言的 [code golf](https://en.wikipedia.org/wiki/Code_golf)\n- [Vi/Vim Stack Exchange](https://vi.stackexchange.com/)\n- [Vim Screencasts](http://vimcasts.org/)\n- [Practical Vim](https://pragprog.com/titles/dnvim2/)（书籍）\n\n# 课后练习\n[习题解答]({{site.url}}/{{site.solution_url}}/{{page.solution.url}})\n1. 
完成 `vimtutor`。备注：它在一个\n   [80x24](https://en.wikipedia.org/wiki/VT100)（80 列，24 行）\n   终端窗口看起来效果最好。\n2. 下载我们提供的 [vimrc](/2020/files/vimrc)，然后把它保存到 `~/.vimrc`。 通读这个注释详细的文件\n   （用 Vim!）， 然后观察 Vim 在这个新的设置下看起来和使用起来有哪些细微的区别。\n3. 安装和配置一个插件：\n   [ctrlp.vim](https://github.com/ctrlpvim/ctrlp.vim).\n   1. 用 `mkdir -p ~/.vim/pack/vendor/start` 创建插件文件夹\n   2. 下载这个插件： `cd ~/.vim/pack/vendor/start; git clone\n      https://github.com/ctrlpvim/ctrlp.vim`\n   3. 阅读这个插件的\n      [文档](https://github.com/ctrlpvim/ctrlp.vim/blob/master/readme.md)。\n       尝试用 CtrlP 来在一个工程文件夹里定位一个文件，打开 Vim, 然后用 Vim 命令控制行开始\n      `:CtrlP`.\n    4. 自定义 CtrlP：添加\n       [configuration](https://github.com/ctrlpvim/ctrlp.vim/blob/master/readme.md#basic-options)\n       到你的 `~/.vimrc` 来用按 Ctrl-P 打开 CtrlP\n4. 练习使用 Vim, 在你自己的机器上重做 [演示](#demo)。\n5. 下个月用 Vim 完成 *所有的* 文件编辑。每当不够高效的时候，或者你感觉 “一定有一个更好的方式”时，\n   尝试求助搜索引擎，很有可能有一个更好的方式。如果你遇到难题，可以来我们的答疑时间或者给我们发邮件。\n6. 在其他工具中设置 Vim 快捷键 （见上面的操作指南）。\n7. 进一步自定义你的 `~/.vimrc` 和安装更多插件。\n8. （高阶）用 Vim 宏将 XML 转换到 JSON ([例子文件](/2020/files/example-data.xml))。\n   尝试着先完全自己做，但是在你卡住的时候可以查看上面 [宏](#macros) 章节。\n"
  },
  {
    "path": "_2020/files/example-data.xml",
    "content": "<people>\n  <person>\n    <name>Johnny Zhang Jr.</name>\n    <email>amyalvarez@cole.com</email>\n  </person>\n  <person>\n    <name>Edward Cook</name>\n    <email>dsparks@alvarez-dunn.com</email>\n  </person>\n  <person>\n    <name>Stephen Sweeney</name>\n    <email>dlewis@gmail.com</email>\n  </person>\n  <person>\n    <name>Krystal Riley</name>\n    <email>jflores@wright.biz</email>\n  </person>\n  <person>\n    <name>Ashley Robinson</name>\n    <email>robertsmichael@yahoo.com</email>\n  </person>\n  <person>\n    <name>Kimberly Brooks</name>\n    <email>sharoncunningham@larson.com</email>\n  </person>\n  <person>\n    <name>Brent Proctor</name>\n    <email>edward86@stewart.com</email>\n  </person>\n  <person>\n    <name>William Roberts</name>\n    <email>parkertodd@webb.com</email>\n  </person>\n  <person>\n    <name>Amanda Morales</name>\n    <email>lorizavala@hodges.com</email>\n  </person>\n  <person>\n    <name>Bryan Poole Jr.</name>\n    <email>carolyn56@gray-campos.net</email>\n  </person>\n  <person>\n    <name>Dale Hall</name>\n    <email>martinjames@yahoo.com</email>\n  </person>\n  <person>\n    <name>Isabella Reynolds</name>\n    <email>wbowen@wallace.com</email>\n  </person>\n  <person>\n    <name>Ann Rodriguez</name>\n    <email>charles37@taylor-riley.biz</email>\n  </person>\n  <person>\n    <name>Bryan Davis</name>\n    <email>jessica60@hotmail.com</email>\n  </person>\n  <person>\n    <name>Dalton Powell</name>\n    <email>piercenatasha@yahoo.com</email>\n  </person>\n  <person>\n    <name>Scott Turner</name>\n    <email>harold68@yahoo.com</email>\n  </person>\n  <person>\n    <name>Nicholas Castillo</name>\n    <email>dawnstephens@robinson.info</email>\n  </person>\n  <person>\n    <name>Joseph Pierce</name>\n    <email>lukepatterson@hotmail.com</email>\n  </person>\n  <person>\n    <name>Robyn White</name>\n    <email>jenniferrobinson@hotmail.com</email>\n  </person>\n  <person>\n    <name>Justin Rice</name>\n    
<email>brandi76@gmail.com</email>\n  </person>\n  <person>\n    <name>Jamie Graham</name>\n    <email>harrisdavid@yahoo.com</email>\n  </person>\n  <person>\n    <name>Phillip Schmidt</name>\n    <email>stephanie33@gmail.com</email>\n  </person>\n  <person>\n    <name>John Baker</name>\n    <email>todd86@hotmail.com</email>\n  </person>\n  <person>\n    <name>Sharon Austin</name>\n    <email>srivera@yahoo.com</email>\n  </person>\n  <person>\n    <name>Erica Avila</name>\n    <email>jenniferreed@bowers-wilson.com</email>\n  </person>\n  <person>\n    <name>Jeremy Bass</name>\n    <email>jdavis@collins.com</email>\n  </person>\n  <person>\n    <name>Joshua Parsons</name>\n    <email>stephaniecoleman@miller-barker.com</email>\n  </person>\n  <person>\n    <name>Emma Mccoy</name>\n    <email>taylorjohn@wagner.net</email>\n  </person>\n  <person>\n    <name>Megan Williams</name>\n    <email>ronnie54@gmail.com</email>\n  </person>\n  <person>\n    <name>Michael Sutton</name>\n    <email>connie58@mendoza.net</email>\n  </person>\n  <person>\n    <name>Nicholas York</name>\n    <email>kennedykevin@collins.com</email>\n  </person>\n  <person>\n    <name>Donald Robles</name>\n    <email>williamsbrandon@gmail.com</email>\n  </person>\n  <person>\n    <name>Melissa Allen</name>\n    <email>pproctor@ramos-patel.com</email>\n  </person>\n  <person>\n    <name>Shannon Jones</name>\n    <email>beckkathleen@johnson.com</email>\n  </person>\n  <person>\n    <name>David White</name>\n    <email>sandra73@thompson.com</email>\n  </person>\n  <person>\n    <name>Jonathan Thomas</name>\n    <email>johnsonjeremy@gmail.com</email>\n  </person>\n  <person>\n    <name>Rachael Floyd</name>\n    <email>amanda78@johnson.info</email>\n  </person>\n  <person>\n    <name>Tina Carter</name>\n    <email>josewells@jones.net</email>\n  </person>\n  <person>\n    <name>Eric Johnson</name>\n    <email>bowersaustin@hernandez-edwards.com</email>\n  </person>\n  <person>\n    <name>William Kramer</name>\n 
   <email>rhunt@johnson.com</email>\n  </person>\n  <person>\n    <name>Nathan Williams</name>\n    <email>cynthiayoung@hotmail.com</email>\n  </person>\n  <person>\n    <name>Patty Schwartz</name>\n    <email>salinasdavid@sheppard.biz</email>\n  </person>\n  <person>\n    <name>David Collins</name>\n    <email>pcalhoun@yahoo.com</email>\n  </person>\n  <person>\n    <name>James Thomas</name>\n    <email>brianfox@rogers-cruz.com</email>\n  </person>\n  <person>\n    <name>Mark Casey</name>\n    <email>jerry88@graham.com</email>\n  </person>\n  <person>\n    <name>Robert Galloway</name>\n    <email>cherylmcgee@hotmail.com</email>\n  </person>\n  <person>\n    <name>Caitlin Dunn</name>\n    <email>nicholemartin@yahoo.com</email>\n  </person>\n  <person>\n    <name>Nancy Allison</name>\n    <email>martha33@molina-bullock.com</email>\n  </person>\n  <person>\n    <name>Marvin Burns</name>\n    <email>wrocha@gmail.com</email>\n  </person>\n  <person>\n    <name>Kimberly Jones</name>\n    <email>anitamunoz@french-christian.com</email>\n  </person>\n  <person>\n    <name>Caitlin Wood</name>\n    <email>thomasrandall@bowers-sullivan.org</email>\n  </person>\n  <person>\n    <name>Sara Burton</name>\n    <email>riosangelica@gmail.com</email>\n  </person>\n  <person>\n    <name>Jessica Roberson</name>\n    <email>theresa11@hotmail.com</email>\n  </person>\n  <person>\n    <name>Nicole Macias</name>\n    <email>kevinhodge@martin.biz</email>\n  </person>\n  <person>\n    <name>Christina Williams</name>\n    <email>shawn35@rice-bailey.org</email>\n  </person>\n  <person>\n    <name>Cody Winters</name>\n    <email>nicholassmith@barron-wu.com</email>\n  </person>\n  <person>\n    <name>Patricia Miller DDS</name>\n    <email>pierceraymond@watkins.org</email>\n  </person>\n  <person>\n    <name>Jennifer Lyons</name>\n    <email>vrivera@gmail.com</email>\n  </person>\n  <person>\n    <name>Jerry Rojas</name>\n    <email>jacobalexander@yahoo.com</email>\n  </person>\n  <person>\n    
<name>Matthew Perez</name>\n    <email>jrivas@hotmail.com</email>\n  </person>\n  <person>\n    <name>Patrick Hogan</name>\n    <email>moorelisa@yahoo.com</email>\n  </person>\n  <person>\n    <name>Lisa Howard</name>\n    <email>stephen90@smith.biz</email>\n  </person>\n  <person>\n    <name>Justin Sloan</name>\n    <email>edwardsmichael@hotmail.com</email>\n  </person>\n  <person>\n    <name>Suzanne Morrow</name>\n    <email>shane74@yahoo.com</email>\n  </person>\n  <person>\n    <name>Theresa Lara</name>\n    <email>maryrichardson@clark.com</email>\n  </person>\n  <person>\n    <name>Christopher Powers</name>\n    <email>yfowler@davis-lee.net</email>\n  </person>\n  <person>\n    <name>Teresa Howell</name>\n    <email>amy15@yahoo.com</email>\n  </person>\n  <person>\n    <name>Richard Shelton</name>\n    <email>ksmith@yahoo.com</email>\n  </person>\n  <person>\n    <name>Jeremy Cole</name>\n    <email>bleach@gmail.com</email>\n  </person>\n  <person>\n    <name>Melissa Clark</name>\n    <email>rosejeffrey@yahoo.com</email>\n  </person>\n  <person>\n    <name>Kimberly Mcdaniel</name>\n    <email>ularson@ross-david.com</email>\n  </person>\n  <person>\n    <name>Kelly Dixon</name>\n    <email>gatesstephen@hotmail.com</email>\n  </person>\n  <person>\n    <name>Devin Quinn</name>\n    <email>wjohnson@hotmail.com</email>\n  </person>\n  <person>\n    <name>Kevin Greene</name>\n    <email>lhanson@hotmail.com</email>\n  </person>\n  <person>\n    <name>Jeffery Wiggins</name>\n    <email>amy76@gmail.com</email>\n  </person>\n  <person>\n    <name>Latoya Allen</name>\n    <email>vking@yahoo.com</email>\n  </person>\n  <person>\n    <name>Zachary Walker</name>\n    <email>diazjames@hotmail.com</email>\n  </person>\n  <person>\n    <name>Alyssa Molina</name>\n    <email>elizabeth59@gmail.com</email>\n  </person>\n  <person>\n    <name>Heather Miranda</name>\n    <email>davidturner@cortez-martinez.biz</email>\n  </person>\n  <person>\n    <name>Lori Gardner</name>\n    
<email>murphytaylor@yahoo.com</email>\n  </person>\n  <person>\n    <name>Jessica Simpson</name>\n    <email>jamesdean@rosales.com</email>\n  </person>\n  <person>\n    <name>Anna Dickerson</name>\n    <email>abigailmurphy@hotmail.com</email>\n  </person>\n  <person>\n    <name>Molly Oconnor</name>\n    <email>morrisrhonda@yahoo.com</email>\n  </person>\n  <person>\n    <name>Brandi Braun</name>\n    <email>ericksonmatthew@jenkins.org</email>\n  </person>\n  <person>\n    <name>Renee Flowers</name>\n    <email>brownantonio@yang-crosby.org</email>\n  </person>\n  <person>\n    <name>Cassandra Compton</name>\n    <email>progers@yahoo.com</email>\n  </person>\n  <person>\n    <name>David Gilbert</name>\n    <email>vickie78@gmail.com</email>\n  </person>\n  <person>\n    <name>Brenda Davis</name>\n    <email>cynthiajones@thornton.com</email>\n  </person>\n  <person>\n    <name>Nicholas Rivera</name>\n    <email>longalyssa@yahoo.com</email>\n  </person>\n  <person>\n    <name>Dustin Hodges</name>\n    <email>sgolden@lee.com</email>\n  </person>\n  <person>\n    <name>Chad Wong</name>\n    <email>williambernard@mccarty.net</email>\n  </person>\n  <person>\n    <name>Robin Craig</name>\n    <email>xbyrd@austin.com</email>\n  </person>\n  <person>\n    <name>Heather Parker</name>\n    <email>allenjoshua@rodriguez.com</email>\n  </person>\n  <person>\n    <name>Jennifer Roberts</name>\n    <email>manningtravis@gmail.com</email>\n  </person>\n  <person>\n    <name>James Andrews</name>\n    <email>ginaromero@hotmail.com</email>\n  </person>\n  <person>\n    <name>Dorothy Hines</name>\n    <email>dsmith@thomas.com</email>\n  </person>\n  <person>\n    <name>Stephen Garcia</name>\n    <email>hughesbrendan@hotmail.com</email>\n  </person>\n  <person>\n    <name>Alfred Ellis</name>\n    <email>elizabeth41@crawford.info</email>\n  </person>\n  <person>\n    <name>Marilyn White</name>\n    <email>victoriaford@hotmail.com</email>\n  </person>\n  <person>\n    <name>Brian 
Graves</name>\n    <email>cpatel@gmail.com</email>\n  </person>\n  <person>\n    <name>Elizabeth Wagner</name>\n    <email>newtonwesley@cohen.com</email>\n  </person>\n  <person>\n    <name>Michelle Flores</name>\n    <email>shelbygross@duke-thomas.info</email>\n  </person>\n  <person>\n    <name>Larry Russell</name>\n    <email>richard99@meyer.com</email>\n  </person>\n  <person>\n    <name>Terrence Boyd</name>\n    <email>markmartin@flores.com</email>\n  </person>\n  <person>\n    <name>Jessica Carroll</name>\n    <email>eric30@yahoo.com</email>\n  </person>\n  <person>\n    <name>Erin Dean</name>\n    <email>toddmartin@guerra.biz</email>\n  </person>\n  <person>\n    <name>Craig Hernandez</name>\n    <email>joshualang@gonzalez.com</email>\n  </person>\n  <person>\n    <name>Amber Choi</name>\n    <email>doughertynancy@harmon.org</email>\n  </person>\n  <person>\n    <name>Renee Brown</name>\n    <email>terribeard@archer-gibson.info</email>\n  </person>\n  <person>\n    <name>Curtis Turner</name>\n    <email>pjohnson@hotmail.com</email>\n  </person>\n  <person>\n    <name>Benjamin Reed</name>\n    <email>marksmith@austin.net</email>\n  </person>\n  <person>\n    <name>Christina Fernandez</name>\n    <email>richardjoseph@esparza-peters.com</email>\n  </person>\n  <person>\n    <name>Jasmine Campbell</name>\n    <email>thomasmatthew@gmail.com</email>\n  </person>\n  <person>\n    <name>Catherine Bond</name>\n    <email>coreyroberts@gonzalez.com</email>\n  </person>\n  <person>\n    <name>Connie Jones</name>\n    <email>koneal@riley.com</email>\n  </person>\n  <person>\n    <name>Cody Taylor</name>\n    <email>kelsey99@hotmail.com</email>\n  </person>\n  <person>\n    <name>Kendra Gray</name>\n    <email>walkerrussell@hotmail.com</email>\n  </person>\n  <person>\n    <name>Alexander Murray</name>\n    <email>grossrobert@hotmail.com</email>\n  </person>\n  <person>\n    <name>Arthur Jackson</name>\n    <email>travis73@hotmail.com</email>\n  </person>\n  <person>\n    
<name>Dr. William Vasquez DDS</name>\n    <email>gonzalezdaniel@hotmail.com</email>\n  </person>\n  <person>\n    <name>April Hampton</name>\n    <email>desireemorris@mcguire.info</email>\n  </person>\n  <person>\n    <name>Gerald Hunter</name>\n    <email>justin91@ross-scott.biz</email>\n  </person>\n  <person>\n    <name>Morgan Bolton</name>\n    <email>erika30@lloyd-smith.biz</email>\n  </person>\n  <person>\n    <name>Angela Barker</name>\n    <email>daniel17@carr.com</email>\n  </person>\n  <person>\n    <name>Angela Montgomery</name>\n    <email>jonathangoodwin@smith-perez.com</email>\n  </person>\n  <person>\n    <name>Yolanda Henry</name>\n    <email>shawnmcguire@gmail.com</email>\n  </person>\n  <person>\n    <name>Susan Hines</name>\n    <email>sarahbailey@wallace.com</email>\n  </person>\n  <person>\n    <name>Michelle Young</name>\n    <email>lewismichele@yahoo.com</email>\n  </person>\n  <person>\n    <name>Glen Hood</name>\n    <email>ljackson@vazquez.com</email>\n  </person>\n  <person>\n    <name>Christopher Wright</name>\n    <email>evansjulie@walton.com</email>\n  </person>\n  <person>\n    <name>Susan Guzman DDS</name>\n    <email>medinaelizabeth@gmail.com</email>\n  </person>\n  <person>\n    <name>Barbara Cortez</name>\n    <email>bchavez@cameron.com</email>\n  </person>\n  <person>\n    <name>Stacey Hammond</name>\n    <email>nancyturner@stewart.com</email>\n  </person>\n  <person>\n    <name>Amanda Stout</name>\n    <email>macdonaldlatoya@hotmail.com</email>\n  </person>\n  <person>\n    <name>Lisa Johnson</name>\n    <email>wnolan@gmail.com</email>\n  </person>\n  <person>\n    <name>Carlos Wyatt</name>\n    <email>iperez@cohen.com</email>\n  </person>\n  <person>\n    <name>Samantha Brewer</name>\n    <email>thomas47@hotmail.com</email>\n  </person>\n  <person>\n    <name>Brett Jackson</name>\n    <email>zpowell@cruz-rivera.com</email>\n  </person>\n  <person>\n    <name>Johnny Guzman</name>\n    <email>tmerritt@yahoo.com</email>\n  
</person>\n  <person>\n    <name>Mary Davis</name>\n    <email>collinslisa@hotmail.com</email>\n  </person>\n  <person>\n    <name>Willie Mccoy</name>\n    <email>joshua20@terrell.biz</email>\n  </person>\n  <person>\n    <name>Kelsey Rivera</name>\n    <email>randy72@gmail.com</email>\n  </person>\n  <person>\n    <name>Melissa Maddox</name>\n    <email>christopher13@gmail.com</email>\n  </person>\n  <person>\n    <name>Jason Rodriguez</name>\n    <email>kellypierce@harris.com</email>\n  </person>\n  <person>\n    <name>Donna Walsh</name>\n    <email>wardraymond@martinez.com</email>\n  </person>\n  <person>\n    <name>Monique Patel</name>\n    <email>cynthia75@james.net</email>\n  </person>\n  <person>\n    <name>Dr. Lindsay Farrell PhD</name>\n    <email>brownmaria@gmail.com</email>\n  </person>\n  <person>\n    <name>Ann Ruiz</name>\n    <email>jeremiah94@pennington.org</email>\n  </person>\n  <person>\n    <name>Mary Alexander</name>\n    <email>catherineharper@munoz.org</email>\n  </person>\n  <person>\n    <name>Brittany Russell</name>\n    <email>haileywinters@russell-coffey.net</email>\n  </person>\n  <person>\n    <name>Dominique Rosales</name>\n    <email>matthewpatterson@carr.com</email>\n  </person>\n  <person>\n    <name>Henry Waters</name>\n    <email>karen72@logan.com</email>\n  </person>\n  <person>\n    <name>Jared Weaver</name>\n    <email>karlafletcher@baldwin.org</email>\n  </person>\n  <person>\n    <name>Mr. 
Thomas Atkins</name>\n    <email>gboone@gmail.com</email>\n  </person>\n  <person>\n    <name>Carla Cohen</name>\n    <email>ibarron@gmail.com</email>\n  </person>\n  <person>\n    <name>Tricia Lewis</name>\n    <email>pperez@hotmail.com</email>\n  </person>\n  <person>\n    <name>Mario Gill</name>\n    <email>lisa43@brown.org</email>\n  </person>\n  <person>\n    <name>James Olsen</name>\n    <email>vickie82@hotmail.com</email>\n  </person>\n  <person>\n    <name>Michael Perry</name>\n    <email>rdavis@yahoo.com</email>\n  </person>\n  <person>\n    <name>Matthew Lucas</name>\n    <email>joshuagray@carpenter-stanley.com</email>\n  </person>\n  <person>\n    <name>Christine Torres</name>\n    <email>samanthayoung@smith-aguilar.biz</email>\n  </person>\n  <person>\n    <name>Lindsay Miller</name>\n    <email>randyevans@yahoo.com</email>\n  </person>\n  <person>\n    <name>Margaret Jones</name>\n    <email>kevincantu@alexander-carson.org</email>\n  </person>\n  <person>\n    <name>Cameron Mcdonald</name>\n    <email>deckerjerome@garcia.com</email>\n  </person>\n  <person>\n    <name>Brittany Sanders</name>\n    <email>dennis55@leonard-turner.com</email>\n  </person>\n  <person>\n    <name>Daniel Patterson</name>\n    <email>timothy36@novak.com</email>\n  </person>\n  <person>\n    <name>David Chaney</name>\n    <email>kristen02@hotmail.com</email>\n  </person>\n  <person>\n    <name>Sheri Silva</name>\n    <email>idawson@alvarez.com</email>\n  </person>\n  <person>\n    <name>Holly Ward</name>\n    <email>saraallen@dunn-smith.net</email>\n  </person>\n  <person>\n    <name>Bryan Solis</name>\n    <email>stacey30@lam.biz</email>\n  </person>\n  <person>\n    <name>Diane Carter</name>\n    <email>paulvargas@gmail.com</email>\n  </person>\n  <person>\n    <name>David Brown</name>\n    <email>james98@gmail.com</email>\n  </person>\n  <person>\n    <name>Bridget Fritz</name>\n    <email>beth24@hotmail.com</email>\n  </person>\n  <person>\n    <name>Paul Boyd</name>\n    
<email>johngutierrez@hotmail.com</email>\n  </person>\n  <person>\n    <name>Ernest Baker</name>\n    <email>phillipwhite@hotmail.com</email>\n  </person>\n  <person>\n    <name>George Myers</name>\n    <email>frank52@hammond.com</email>\n  </person>\n  <person>\n    <name>Daniel Miller</name>\n    <email>joshua96@gmail.com</email>\n  </person>\n  <person>\n    <name>Jonathan Ayala</name>\n    <email>jerryharris@davis.net</email>\n  </person>\n  <person>\n    <name>Jill Stone</name>\n    <email>pwright@hotmail.com</email>\n  </person>\n  <person>\n    <name>Trevor Richard</name>\n    <email>mreed@thompson.org</email>\n  </person>\n  <person>\n    <name>Jason Thomas</name>\n    <email>josephflowers@hotmail.com</email>\n  </person>\n  <person>\n    <name>Arthur Thomas</name>\n    <email>lnelson@hicks.com</email>\n  </person>\n  <person>\n    <name>Austin Collins</name>\n    <email>ambermann@barnes.com</email>\n  </person>\n  <person>\n    <name>Jason Diaz</name>\n    <email>ericreyes@hotmail.com</email>\n  </person>\n  <person>\n    <name>Darryl Hall</name>\n    <email>faithdixon@barnes-burgess.org</email>\n  </person>\n  <person>\n    <name>Jason Thomas</name>\n    <email>brittany32@yahoo.com</email>\n  </person>\n  <person>\n    <name>John Sanders</name>\n    <email>waltontheresa@hotmail.com</email>\n  </person>\n  <person>\n    <name>Lisa Hayes</name>\n    <email>victor14@hotmail.com</email>\n  </person>\n  <person>\n    <name>Chelsea Wong</name>\n    <email>iwatkins@williams-solomon.com</email>\n  </person>\n  <person>\n    <name>Joseph Fitzgerald</name>\n    <email>mary86@hotmail.com</email>\n  </person>\n  <person>\n    <name>Crystal Schroeder</name>\n    <email>kbarron@wilson-flynn.org</email>\n  </person>\n  <person>\n    <name>Denise Bean</name>\n    <email>noah23@gmail.com</email>\n  </person>\n  <person>\n    <name>Jamie Atkins</name>\n    <email>cwebb@hotmail.com</email>\n  </person>\n  <person>\n    <name>Joshua Kim</name>\n    
<email>esmith@ramirez.com</email>\n  </person>\n  <person>\n    <name>Deanna Mooney</name>\n    <email>jason13@turner.com</email>\n  </person>\n  <person>\n    <name>Jasmine Baker</name>\n    <email>torresjacob@braun.com</email>\n  </person>\n  <person>\n    <name>Victoria Williams</name>\n    <email>rwilliams@hotmail.com</email>\n  </person>\n  <person>\n    <name>Sandra Hall</name>\n    <email>williamsonrichard@gmail.com</email>\n  </person>\n  <person>\n    <name>Miranda Mcpherson</name>\n    <email>xrussell@barajas.biz</email>\n  </person>\n  <person>\n    <name>Samantha Walton</name>\n    <email>danielle73@gmail.com</email>\n  </person>\n  <person>\n    <name>Kyle Serrano</name>\n    <email>stonecassandra@mcfarland.info</email>\n  </person>\n  <person>\n    <name>Mr. Bruce Maldonado DDS</name>\n    <email>diazmatthew@yahoo.com</email>\n  </person>\n  <person>\n    <name>Amber Fisher</name>\n    <email>jonesdavid@rubio.info</email>\n  </person>\n  <person>\n    <name>Brett Berry</name>\n    <email>millerteresa@gmail.com</email>\n  </person>\n  <person>\n    <name>Cory Bradley</name>\n    <email>umatthews@summers.com</email>\n  </person>\n  <person>\n    <name>Ryan Peters</name>\n    <email>shepherdmonique@gmail.com</email>\n  </person>\n  <person>\n    <name>Laura Lee</name>\n    <email>lfleming@higgins.com</email>\n  </person>\n  <person>\n    <name>Christian Smith</name>\n    <email>johnnymartinez@castro-miller.com</email>\n  </person>\n  <person>\n    <name>Kelly Hanson</name>\n    <email>velazquezsandra@chavez-malone.info</email>\n  </person>\n  <person>\n    <name>Brian King</name>\n    <email>hwood@yahoo.com</email>\n  </person>\n  <person>\n    <name>Cynthia Owens</name>\n    <email>sbrown@hotmail.com</email>\n  </person>\n  <person>\n    <name>Lisa Clark</name>\n    <email>derek74@bell-martinez.com</email>\n  </person>\n  <person>\n    <name>Brenda Ford</name>\n    <email>kevin55@hotmail.com</email>\n  </person>\n  <person>\n    <name>Daniel 
Brady</name>\n    <email>wbennett@hotmail.com</email>\n  </person>\n  <person>\n    <name>Jake Wilson</name>\n    <email>lorraine60@solis.biz</email>\n  </person>\n  <person>\n    <name>April Cole</name>\n    <email>halltyler@yahoo.com</email>\n  </person>\n  <person>\n    <name>Melissa Callahan</name>\n    <email>cmckenzie@rodriguez.info</email>\n  </person>\n  <person>\n    <name>Taylor Brown</name>\n    <email>davisadam@gmail.com</email>\n  </person>\n  <person>\n    <name>Patrick Guerrero</name>\n    <email>hannah48@delgado.net</email>\n  </person>\n  <person>\n    <name>Brian Gonzalez</name>\n    <email>burchmalik@johnson.com</email>\n  </person>\n  <person>\n    <name>Robert Bailey</name>\n    <email>debbiemoore@hotmail.com</email>\n  </person>\n  <person>\n    <name>Jesus Maynard</name>\n    <email>gene45@gmail.com</email>\n  </person>\n  <person>\n    <name>Linda Greer</name>\n    <email>johnharris@reed-allen.net</email>\n  </person>\n  <person>\n    <name>Travis Thomas</name>\n    <email>bryantrachel@gmail.com</email>\n  </person>\n  <person>\n    <name>Vicki Mitchell</name>\n    <email>edaniels@hotmail.com</email>\n  </person>\n  <person>\n    <name>Paula Espinoza</name>\n    <email>donnameyer@dennis.org</email>\n  </person>\n  <person>\n    <name>James Hoffman</name>\n    <email>haustin@larson-wiggins.biz</email>\n  </person>\n  <person>\n    <name>Ashlee Perkins</name>\n    <email>stevenknapp@miller.com</email>\n  </person>\n  <person>\n    <name>Rebecca Leon</name>\n    <email>smitchell@simpson-johnson.com</email>\n  </person>\n  <person>\n    <name>Jorge Williams</name>\n    <email>shawn36@peters-meadows.com</email>\n  </person>\n  <person>\n    <name>Bob Flores</name>\n    <email>kellercourtney@yahoo.com</email>\n  </person>\n  <person>\n    <name>Lisa Miller</name>\n    <email>johnsoncrystal@gmail.com</email>\n  </person>\n  <person>\n    <name>Brandon Davis</name>\n    <email>bryanpetersen@hotmail.com</email>\n  </person>\n  <person>\n    
<name>Joshua Daugherty</name>\n    <email>josehayes@carey.com</email>\n  </person>\n  <person>\n    <name>Justin Wise</name>\n    <email>pamelacosta@simmons-morrow.com</email>\n  </person>\n  <person>\n    <name>Kimberly Johnson</name>\n    <email>combssandra@deleon.com</email>\n  </person>\n  <person>\n    <name>Toni Stone</name>\n    <email>eestrada@charles.com</email>\n  </person>\n  <person>\n    <name>Julie Rivers</name>\n    <email>rwilliams@castillo-nelson.org</email>\n  </person>\n  <person>\n    <name>Kelly Scott</name>\n    <email>danielsmith@hotmail.com</email>\n  </person>\n  <person>\n    <name>Michael Carr</name>\n    <email>clarklisa@newman-barrett.com</email>\n  </person>\n  <person>\n    <name>Jonathan Vaughn</name>\n    <email>dennisrebecca@lawrence-harris.com</email>\n  </person>\n  <person>\n    <name>Erica Lowe</name>\n    <email>wilsonkelly@hotmail.com</email>\n  </person>\n  <person>\n    <name>Kimberly Clark</name>\n    <email>jose15@gmail.com</email>\n  </person>\n  <person>\n    <name>Lindsey Robertson</name>\n    <email>rdickerson@yahoo.com</email>\n  </person>\n  <person>\n    <name>Cindy Anderson</name>\n    <email>gmorton@daniels.com</email>\n  </person>\n  <person>\n    <name>Tami Barber</name>\n    <email>harveykaren@hotmail.com</email>\n  </person>\n  <person>\n    <name>Tiffany Wu</name>\n    <email>jessica90@gmail.com</email>\n  </person>\n  <person>\n    <name>Edward Bowers</name>\n    <email>hallkathy@gmail.com</email>\n  </person>\n  <person>\n    <name>Shawn Collier</name>\n    <email>rhondasmith@hotmail.com</email>\n  </person>\n  <person>\n    <name>Michael Cox</name>\n    <email>usimpson@graham-cunningham.net</email>\n  </person>\n</people>\n"
  },
  {
    "path": "_2020/files/vimrc",
    "content": "\" Comments in Vimscript start with a `\"`.\n\n\" If you open this file in Vim, it'll be syntax highlighted for you.\n\n\" Vim is based on Vi. Setting `nocompatible` switches from the default\n\" Vi-compatibility mode and enables useful Vim functionality. This\n\" configuration option turns out not to be necessary for the file named\n\" '~/.vimrc', because Vim automatically enters nocompatible mode if that file\n\" is present. But we're including it here just in case this config file is\n\" loaded some other way (e.g. saved as `foo`, and then Vim started with\n\" `vim -u foo`).\nset nocompatible\n\n\" Turn on syntax highlighting.\nsyntax on\n\n\" Disable the default Vim startup message.\nset shortmess+=I\n\n\" Show line numbers.\nset number\n\n\" This enables relative line numbering mode. With both number and\n\" relativenumber enabled, the current line shows the true line number, while\n\" all other lines (above and below) are numbered relative to the current line.\n\" This is useful because you can tell, at a glance, what count is needed to\n\" jump up or down to a particular line, by {count}k to go up or {count}j to go\n\" down.\nset relativenumber\n\n\" Always show the status line at the bottom, even if you only have one window open.\nset laststatus=2\n\n\" The backspace key has slightly unintuitive behavior by default. For example,\n\" by default, you can't backspace before the insertion point set with 'i'.\n\" This configuration makes backspace behave more reasonably, in that you can\n\" backspace over anything.\nset backspace=indent,eol,start\n\n\" By default, Vim doesn't let you hide a buffer (i.e. have a buffer that isn't\n\" shown in any window) that has unsaved changes. This is to prevent you from \"\n\" forgetting about unsaved changes and then quitting e.g. via `:qa!`. We find\n\" hidden buffers helpful enough to disable this protection. 
See `:help hidden`\n\" for more information on this.\nset hidden\n\n\" This setting makes search case-insensitive when all characters in the string\n\" being searched are lowercase. However, the search becomes case-sensitive if\n\" it contains any capital letters. This makes searching more convenient.\nset ignorecase\nset smartcase\n\n\" Enable searching as you type, rather than waiting till you press enter.\nset incsearch\n\n\" Unbind some useless/annoying default key bindings.\n\" 'Q' in normal mode enters Ex mode. You almost never want this. (The comment\n\" sits on its own line because Vim treats text after a mapping, including a\n\" trailing comment, as part of the mapping itself.)\nnmap Q <Nop>\n\n\" Disable audible bell because it's annoying.\nset noerrorbells visualbell t_vb=\n\n\" Enable mouse support. You should avoid relying on this too much, but it can\n\" sometimes be convenient.\nset mouse+=a\n\n\" Try to prevent bad habits like using the arrow keys for movement. This is\n\" not the only possible bad habit. For example, holding down the h/j/k/l keys\n\" for movement, rather than using more efficient movement commands, is also a\n\" bad habit. The former is enforceable through a .vimrc, while we don't know\n\" how to prevent the latter.\n\" Do this in normal mode...\nnnoremap <Left>  :echoe \"Use h\"<CR>\nnnoremap <Right> :echoe \"Use l\"<CR>\nnnoremap <Up>    :echoe \"Use k\"<CR>\nnnoremap <Down>  :echoe \"Use j\"<CR>\n\" ...and in insert mode\ninoremap <Left>  <ESC>:echoe \"Use h\"<CR>\ninoremap <Right> <ESC>:echoe \"Use l\"<CR>\ninoremap <Up>    <ESC>:echoe \"Use k\"<CR>\ninoremap <Down>  <ESC>:echoe \"Use j\"<CR>\n"
  },
  {
    "path": "_2020/index.html",
    "content": "---\nlayout: page\ntitle: \"2020 Lectures\"\npermalink: /2020/\nphony: true\nexcerpt: '' # work around a bug\n---\n\n<ul class=\"double-spaced\">\n  {% assign lectures = site['2020'] | sort: 'date' %}\n  {% for lecture in lectures %}\n    {% if lecture.phony != true %}\n      <li>\n        <strong>{{ lecture.date | date: '%-m/%d' }}</strong>:\n        {% if lecture.ready %}\n          <a href=\"{{ lecture.url }}\">{{ lecture.title }}</a>\n        {% elsif lecture.noclass %}\n          {{ lecture.title }} [no class]\n        {% else %}\n          {{ lecture.title }} [coming soon]\n        {% endif %}\n        {% if lecture.details %}\n          <br>\n          ({{ lecture.details }})\n        {% endif %}\n      </li>\n    {% endif %}\n  {% endfor %}\n</ul>\n\n讲座视频可以在 <a href=\"https://www.youtube.com/playlist?list=PLyzOVJj3bHQuloKGG59rS43e29ro7I57J\">YouTube</a>上找到。\n\n\n<h2>往期讲座</h2>\n\n<p>您也可以访问<a href=\"/2019/\">去年的讲座笔记和视频</a>。</p>\n"
  },
  {
    "path": "_2020/metaprogramming.md",
    "content": "---\nlayout: lecture\ntitle: \"元编程\"\ndetails: 构建系统、依赖管理、测试、持续集成\ndate: 2020-01-27\nready: true\nsync: true\nsyncdate: 2025-08-16\nvideo:\n  aspect: 56.25\n  id: _Ms1Z4xfqv4\nsolution:\n    ready: true\n    url: metaprogramming-solution\n---\n\n我们这里说的 “元编程（metaprogramming）” 是什么意思呢？好吧，对于本文要介绍的这些内容，这是我们能够想到的最能概括它们的词。因为我们今天要讲的东西，更多是关于 *流程* ，而不是写代码或更高效的工作。本节课我们会学习构建系统、代码测试以及依赖管理。在您还是学生的时候，这些东西看上去似乎对您来说没那么重要，不过当您开始实习或走进社会的时候，您将会接触到大型的代码库，本节课讲授的这些东西也会变得随处可见。必须要指出的是，“元编程” 也有 “[用于操作程序的程序](https://en.wikipedia.org/wiki/Metaprogramming)” 之含义，这和我们今天讲座所介绍的概念是完全不同的。\n\n# 构建系统\n\n如果您使用 LaTeX 来编写论文，您需要执行哪些命令才能编译出您想要的论文呢？执行基准测试、绘制图表然后将其插入论文的命令又有哪些？或者，如何编译本课程提供的代码并执行测试呢？\n\n对于大多数系统来说，不论其是否包含代码，都会包含一个 “构建过程”。有时，您需要执行一系列操作。通常，这一过程包含了很多步骤，很多分支。执行一些命令来生成图表，然后执行另外的一些命令生成结果，然后再执行其他的命令来生成最终的论文。有很多事情需要我们完成，您并不是第一个因此感到苦恼的人，幸运的是，有很多工具可以帮助我们完成这些操作。\n\n这些工具通常被称为 \"构建系统\"，而且这些工具还不少。如何选择工具完全取决于您当前手头上要完成的任务以及项目的规模。从本质上讲，这些工具都是非常类似的。您需要定义 *依赖*、*目标* 和 *规则*。您必须告诉构建系统您具体的构建目标，系统的任务则是找到构建这些目标所需要的依赖，并根据规则构建所需的中间产物，直到最终目标被构建出来。理想的情况下，如果目标的依赖没有发生改动，并且我们可以从之前的构建中复用这些依赖，那么与其相关的构建规则并不会被执行。\n\n`make` 是最常用的构建系统之一，您会发现它通常被安装到了几乎所有基于 UNIX 的系统中。`make` 并不完美，但是对于中小型项目来说，它已经足够好了。当您执行 `make` 时，它会去参考当前目录下名为 `Makefile` 的文件。所有构建目标、相关依赖和规则都需要在该文件中定义，它看上去是这样的：\n\n```make\npaper.pdf: paper.tex plot-data.png\n\tpdflatex paper.tex\n\nplot-%.png: %.dat plot.py\n\t./plot.py -i $*.dat -o $@\n```\n\n这个文件中的指令，即如何使用右侧文件构建左侧文件的规则。或者，换句话说，冒号左侧的是构建目标，冒号右侧的是构建它所需的依赖。缩进的部分是从依赖构建目标时需要用到的一段命令。在 `make` 中，第一条指令还指明了构建的目的，如果您使用不带参数的 `make`，这便是我们最终的构建结果。或者，您可以使用这样的命令来构建其他目标：`make plot-data.png`。\n\n规则中的 `%` 是一种模式，它会匹配其左右两侧相同的字符串。例如，如果目标是 `plot-foo.png`， `make` 会去寻找 `foo.dat` 和 `plot.py` 作为依赖。现在，让我们看看如果在一个空的源码目录中执行 `make` 会发生什么？\n\n```console\n$ make\nmake: *** No rule to make target 'paper.tex', needed by 'paper.pdf'.  Stop.\n```\n\n`make` 会告诉我们，为了构建出 `paper.pdf`，它需要 `paper.tex`，但是并没有一条规则能够告诉它如何构建该文件。让我们构建它吧！\n\n```console\n$ touch paper.tex\n$ make\nmake: *** No rule to make target 'plot-data.png', needed by 'paper.pdf'.  
Stop.\n```\n\n嗯，有意思，我们是 **有** 构建 `plot-data.png` 的规则的，但是这是一条模式规则。因为源文件 `data.dat` 并不存在，因此 `make` 就会告诉您它不能构建 `plot-data.png`，让我们创建这些文件：\n\n```console\n$ cat paper.tex\n\\documentclass{article}\n\\usepackage{graphicx}\n\\begin{document}\n\\includegraphics[scale=0.65]{plot-data.png}\n\\end{document}\n$ cat plot.py\n#!/usr/bin/env python\nimport matplotlib\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport argparse\n\nparser = argparse.ArgumentParser()\nparser.add_argument('-i', type=argparse.FileType('r'))\nparser.add_argument('-o')\nargs = parser.parse_args()\n\ndata = np.loadtxt(args.i)\nplt.plot(data[:, 0], data[:, 1])\nplt.savefig(args.o)\n$ cat data.dat\n1 1\n2 2\n3 3\n4 4\n5 8\n```\n\n当我们执行 `make` 时会发生什么？\n\n```console\n$ make\n./plot.py -i data.dat -o plot-data.png\npdflatex paper.tex\n... lots of output ...\n```\n\n看，生成了一个 PDF！\n\n如果再次执行 `make` 会怎样？\n\n```console\n$ make\nmake: 'paper.pdf' is up to date.\n```\n\n什么事情都没做！为什么？好吧，因为它什么都不需要做。`make` 检查出所有之前构建的目标相对于其列出的依赖项仍然是最新的。让我们试试修改 `paper.tex` 后再重新执行 `make`：\n\n```console\n$ vim paper.tex\n$ make\npdflatex paper.tex\n...\n```\n\n注意 `make` 并 **没有** 重新运行 `plot.py`，因为没必要；`plot-data.png` 的所有依赖都没有发生改变。\n\n\n# 依赖管理\n\n就您的项目来说，它的依赖可能本身也是其他的项目。您也许会依赖某些程序（例如 `python`）、系统包（例如 `openssl`）或相关编程语言的库（例如 `matplotlib`）。现在，大多数的依赖可以通过某些 **软件仓库** 来获取，这些仓库会在一个地方托管大量的依赖，我们则可以通过一套非常简单的机制来安装依赖。例如 Ubuntu 系统下面有 Ubuntu 软件包仓库，您可以通过 `apt` 这个工具来访问，RubyGems 则包含了 Ruby 的相关库，PyPI 包含了 Python 库，Arch Linux 用户贡献的库则可以在 Arch User Repository 中找到。\n\n由于每个仓库、每种工具的运行机制都不太一样，因此我们并不会在本节课深入讲解具体的细节。我们会介绍一些通用的术语，例如 *版本管理（versioning）*。大多数被其他项目所依赖的项目都会在每次发布新版本时创建一个 *版本号*。通常看上去像 8.1.3 或 64.1.20192004。版本号一般是数字构成的，但也并不绝对。版本号有很多用途，其中最重要的作用是保证软件能够运行。试想一下，假如我的库要发布一个新版本，在这个版本里面我重命名了某个函数。如果有人在我的库升级版本后，仍希望基于它构建新的软件，那么很可能构建会失败，因为它希望调用的函数已经不复存在了。有了版本号就可以很好地解决这个问题：我们可以指定当前项目依赖另一个项目的某个特定版本，或某个范围内的版本。这么做的话，即使某个被依赖的库发生了变化，依赖它的软件仍然可以基于其之前的版本进行构建。\n\n这样还并不理想！如果我们发布了一项和安全相关的升级，它并 *没有* 影响到任何公开接口（API），但是出于安全的考虑，依赖它的项目都应该立即升级，那应该怎么做呢？这也是版本号包含多个部分的原因。不同项目所用的版本号其具体含义并不完全相同，但是一个相对比较常用的标准是 
[语义版本号](https://semver.org/)，这种版本号的每一部分都有特定的含义，它的格式是这样的：主版本号.次版本号.补丁号。相关规则有：\n\n - 如果新的版本没有改变 API，请将补丁号递增；\n - 如果您添加了 API 并且该改动是向后兼容的，请将次版本号递增；\n - 如果您修改了 API 但是它并不向后兼容，请将主版本号递增。\n\n这么做有很多好处。现在如果我们的项目是基于您的项目构建的，那么只要最新版本的主版本号没变、次版本号不低于我们之前使用的版本，使用它就是安全的。换句话说，如果我依赖的版本是 `1.3.7`，那么使用 `1.3.8`、`1.6.1`，甚至是 `1.3.0` 都是可以的。如果版本号是 `2.2.4` 就不一定能用了，因为它的主版本号增加了。我们可以将 Python 的版本号作为语义版本号的一个实例。您应该知道，Python 2 和 Python 3 的代码是不兼容的，这也是 Python 主版本号改变的原因。类似的，使用 Python 3.5 编写的代码在 3.7 上可以运行，但是在 3.4 上可能会不行。\n\n使用依赖管理系统的时候，您可能会遇到锁文件（_lock files_）这一概念。锁文件列出了您当前每个依赖所对应的具体版本号。通常，您需要执行升级程序才能更新依赖的版本。这么做的原因有很多，例如避免不必要的重新编译、实现可复现的构建或禁止自动升级到最新版本（可能会包含 bug）。还有一种极端的依赖锁定叫做 _vendoring_，它会把您的依赖中的所有代码直接拷贝到您的项目中，这样您就能够完全掌控代码的任何修改，同时您也可以将自己的修改添加进去，不过这也意味着如果该依赖的维护者更新了某些代码，您也必须要自己去拉取这些更新。\n\n# 持续集成系统\n\n随着您接触到的项目规模越来越大，您会发现修改代码之后还有很多额外的工作要做。您可能需要上传一份新版本的文档、上传编译后的文件到某处、发布代码到 PyPI、执行测试套件等等。或许您希望每次有人提交代码到 GitHub 的时候，他们的代码风格被检查过并执行过某些基准测试？如果您有这方面的需求，那么请花些时间了解一下持续集成。\n\n持续集成（Continuous integration），或者叫做 CI，是一个统称（umbrella term，涵盖了一组术语的术语），它指的是那些“当您的代码变动时，自动运行的东西”。市场上有很多提供各式各样 CI 服务的公司，这些服务通常对开源项目免费。比较大的有 Travis CI、Azure Pipelines 和 GitHub Actions。它们的工作原理都是类似的：您需要在代码仓库中添加一个文件，描述当前仓库发生任何修改时，应该如何应对。目前为止，最常见的规则是：如果有人提交代码，执行测试套件。当这个事件被触发时，CI 提供方会启动一个（或多个）虚拟机，执行您制定的规则，并且通常会记录下相关的执行结果。您可以进行某些设置，这样当测试套件失败时您能够收到通知，或者当测试全部通过时，您的仓库主页会显示一个徽标。\n\n本课程的网站基于 GitHub Pages 构建，这就是一个很好的例子。Pages 在每次 `master` 有代码更新时，会执行 Jekyll 博客软件，然后使您的站点可以通过某个 GitHub 域名来访问。这使得更新网站变得非常简单：我们只需要在本地进行修改，然后使用 git 提交代码并推送到远端，CI 会自动帮我们处理后续的事情。\n\n## 测试简介\n\n多数的大型软件都有“测试套件”。您可能已经对测试的相关概念有所了解，但是我们觉得有些测试方法和测试术语还是应该再次提醒一下：\n\n - 测试套件（Test suite）：所有测试的统称。\n - 单元测试（Unit test）：一种“微型测试”，用于对某个封装的特性进行测试。\n - 集成测试（Integration test）：一种“宏观测试”，针对系统的某一大部分进行，测试其不同的特性或组件是否能 *协同* 工作。\n - 回归测试（Regression test）：一种重现曾经引发 bug 的特定模式的测试，用于保证该 bug 不会再次出现。\n - 模拟（Mocking）：使用一个假的实现来替换函数、模块或类型，屏蔽那些和测试不相关的内容。例如，您可能会“模拟网络连接” 或 “模拟硬盘”。\n\n\n# 课后练习\n[习题解答]({{site.url}}/{{site.solution_url}}/{{page.solution.url}})\n1. 
大多数的 makefiles 都提供了一个名为 `clean` 的构建目标，这并不是说我们会生成一个名为 `clean` 的文件，而是我们可以使用它清理文件，让 make 重新构建。您可以理解为它的作用是“撤销”所有构建步骤。在上面的 makefile 中为 `paper.pdf` 实现一个 `clean` 目标。您需要将构建目标设置为 [phony](https://www.gnu.org/software/make/manual/html_node/Phony-Targets.html)。您也许会发现 [`git ls-files`](https://git-scm.com/docs/git-ls-files) 子命令很有用。其他一些有用的 make 构建目标可以在 [这里](https://www.gnu.org/software/make/manual/html_node/Standard-Targets.html#Standard-Targets) 找到；\n\n2. 指定版本要求的方法很多，让我们学习一下 [Rust 的构建系统](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html) 的依赖管理。大多数的包管理仓库都支持类似的语法。对于每种语法（尖号、波浪号、通配符、比较、多个版本要求），构建一种场景使其具有实际意义；\n\n3. Git 可以作为一个简单的 CI 系统来使用，在任何 git 仓库中的 `.git/hooks` 目录中，您可以找到一些文件（当前处于未激活状态），它们的作用和脚本一样，当某些事件发生时便可以自动执行。请编写一个 [`pre-commit`](https://git-scm.com/docs/githooks#_pre_commit) 钩子，它会在提交前执行 `make paper.pdf` 并在构建失败时拒绝您的提交。这样做可以避免任何提交中包含无法构建的论文版本；\n\n4. 基于 [GitHub Pages](https://pages.github.com/) 创建任意一个可以自动发布的页面。添加一个 [GitHub Action](https://github.com/features/actions) 到该仓库，对仓库中的所有 shell 文件执行 `shellcheck`（[方法之一](https://github.com/marketplace/actions/shellcheck)）；\n\n5. [构建属于您的](https://help.github.com/en/actions/automating-your-workflow-with-github-actions/building-actions) GitHub action，对仓库中所有的 `.md` 文件执行 [`proselint`](http://proselint.com/) 或 [`write-good`](https://github.com/btford/write-good)，在您的仓库中开启这一功能，提交一个包含错误的文件看看该功能是否生效。\n"
  },
  {
    "path": "_2020/potpourri.md",
    "content": "---\nlayout: lecture\ntitle: \"大杂烩\"\ndate: 2020-01-29\nready: true\nsync: true\nsyncdate: 2025-08-16\nvideo:\n  aspect: 56.25\n  id: JZDt-PRq0uo\n---\n\n## 目录\n\n- [修改键位映射](#%E4%BF%AE%E6%94%B9%E9%94%AE%E4%BD%8D%E6%98%A0%E5%B0%84)\n- [守护进程](#%E5%AE%88%E6%8A%A4%E8%BF%9B%E7%A8%8B)\n- [FUSE](#fuse)\n- [备份](#%E5%A4%87%E4%BB%BD)\n- [API（应用程序接口）](#API%EF%BC%88%E5%BA%94%E7%94%A8%E7%A8%8B%E5%BA%8F%E6%8E%A5%E5%8F%A3%EF%BC%89)\n- [常见命令行标志参数及模式](#%E5%B8%B8%E8%A7%81%E5%91%BD%E4%BB%A4%E8%A1%8C%E6%A0%87%E5%BF%97%E5%8F%82%E6%95%B0%E5%8F%8A%E6%A8%A1%E5%BC%8F)\n- [窗口管理器](#%E7%AA%97%E5%8F%A3%E7%AE%A1%E7%90%86%E5%99%A8)\n- [VPN](#vpn)\n- [Markdown](#markdown)\n- [Hammerspoon (macOS 桌面自动化)](#Hammerspoon%20(macOS%E6%A1%8C%E9%9D%A2%E8%87%AA%E5%8A%A8%E5%8C%96))\n  - [资源](#%E8%B5%84%E6%BA%90)\n- [开机引导以及 Live USB](#%E5%BC%80%E6%9C%BA%E5%BC%95%E5%AF%BC%E4%BB%A5%E5%8F%8A%20Live%20USB)\n- [Docker, Vagrant, VMs, Cloud, OpenStack](#docker-vagrant-vms-cloud-openstack)\n- [交互式记事本编程](#%E4%BA%A4%E4%BA%92%E5%BC%8F%E8%AE%B0%E4%BA%8B%E6%9C%AC%E7%BC%96%E7%A8%8B)\n- [GitHub](#github)\n\n## 修改键位映射\n作为一名程序员，键盘是你的主要输入工具。它像计算机里的其他部件一样是可配置的，而且值得你在这上面花时间。\n\n一个很常见的配置是修改键位映射。通常这个功能由在计算机上运行的软件实现。当某一个按键被按下，软件截获键盘发出的按键事件（keypress event）并使用另外一个事件取代。比如：\n- 将 Caps Lock 映射为 Ctrl 或者 Escape：Caps Lock 使用了键盘上一个非常方便的位置而它的功能却很少被用到，所以我们（讲师）非常推荐这个修改；\n- 将 PrtSc 映射为播放/暂停：大部分操作系统支持播放/暂停键；\n- 交换 Ctrl 和 Meta 键（Windows 的徽标键或者 Mac 的 Command 键）。\n\n你也可以将键位映射为任意常用的指令。软件监听到特定的按键组合后会运行设定的脚本。\n- 打开一个新的终端或者浏览器窗口；\n- 输出特定的字符串，比如：一个超长邮件地址或者 MIT ID；\n- 使计算机或者显示器进入睡眠模式。\n\n甚至更复杂的修改也可以通过软件实现：\n- 映射按键顺序，比如：按 Shift 键五下切换大小写锁定；\n- 区别映射单点和长按，比如：单点 Caps Lock 映射为 Escape，而长按 Caps Lock 映射为 Ctrl；\n- 对不同的键盘或软件保存专用的映射配置。\n\n下面是一些修改键位映射的软件：\n- macOS - [karabiner-elements](https://pqrs.org/osx/karabiner/), [skhd](https://github.com/koekeishiya/skhd) 或者 [BetterTouchTool](https://folivora.ai/)\n- Linux - [xmodmap](https://wiki.archlinux.org/index.php/Xmodmap) 或者 [Autokey](https://github.com/autokey/autokey)\n- Windows - 
控制面板，[AutoHotkey](https://www.autohotkey.com/) 或者 [SharpKeys](https://www.randyrants.com/category/sharpkeys/)\n- QMK - 如果你的键盘支持定制固件，[QMK](https://docs.qmk.fm/) 可以直接在键盘的硬件上修改键位映射。保留在键盘里的映射免除了在别的机器上的重复配置。\n\n## 守护进程\n\n即便守护进程（daemon）这个词看上去有些陌生，你应该已经大约明白它的概念。大部分计算机都有一系列在后台保持运行，不需要用户手动运行或者交互的进程。这些进程就是守护进程。以守护进程运行的程序名一般以 `d` 结尾，比如 SSH 服务端 `sshd`，用来监听传入的 SSH 连接请求并对用户进行鉴权。\n\nLinux 中的 `systemd`（the system daemon）是最常用的配置和运行守护进程的方法。运行 `systemctl status` 命令可以看到正在运行的所有守护进程。这里面有很多可能你没有见过，但是掌管了系统的核心部分的进程：管理网络、DNS 解析、显示系统的图形界面等等。用户使用 `systemctl` 命令和 `systemd` 交互来 `enable`（启用）、`disable`（禁用）、`start`（启动）、`stop`（停止）、`restart`（重启）、或者 `status`（检查）配置好的守护进程及系统服务。\n\n`systemd` 提供了一个很方便的界面用于配置和启用新的守护进程或系统服务。下面的配置文件使用了守护进程来运行一个简单的 Python 程序。文件的内容非常直接所以我们不对它详细阐述。`systemd` 配置文件的详细指南可参见 [freedesktop.org](https://www.freedesktop.org/software/systemd/man/systemd.service.html)。\n\n```ini\n# /etc/systemd/system/myapp.service\n[Unit]\n# 配置文件描述\nDescription=My Custom App\n# 在网络服务启动后启动该进程\nAfter=network.target\n\n[Service]\n# 运行该进程的用户\nUser=foo\n# 运行该进程的用户组\nGroup=foo\n# 运行该进程的根目录\nWorkingDirectory=/home/foo/projects/mydaemon\n# 开始该进程的命令\nExecStart=/usr/bin/local/python3.7 app.py\n# 在出现错误时重启该进程\nRestart=on-failure\n\n[Install]\n# 相当于Windows的开机启动。即使GUI没有启动，该进程也会加载并运行\nWantedBy=multi-user.target\n# 如果该进程仅需要在GUI活动时运行，这里应写作：\n# WantedBy=graphical.target\n# graphical.target在multi-user.target的基础上运行和GUI相关的服务\n```\n\n如果你只是想定期运行一些程序，可以直接使用 [`cron`](https://www.man7.org/linux/man-pages/man8/cron.8.html)。它是一个系统内置的，用来执行定期任务的守护进程。\n\n\n## FUSE\n\n现在的软件系统一般由很多模块化的组件构建而成。你使用的操作系统可以通过一系列共同的方式使用不同的文件系统上的相似功能。比如当你使用 `touch` 命令创建文件的时候，`touch` 使用系统调用（system call）向内核发出请求。内核再根据文件系统，调用特有的方法来创建文件。这里的问题是，UNIX 文件系统在传统上是以内核模块的形式实现，导致只有内核可以进行文件系统相关的调用。\n\n[FUSE](https://en.wikipedia.org/wiki/Filesystem_in_Userspace)（用户空间文件系统）允许运行在用户空间上的程序实现文件系统调用，并将这些调用与内核接口联系起来。在实践中，这意味着用户可以在文件系统调用中实现任意功能。\n\nFUSE 可以用于实现如：一个将所有文件系统操作都使用 SSH 转发到远程主机，由远程主机处理后返回结果到本地计算机的虚拟文件系统。这个文件系统里的文件虽然存储在远程主机，对于本地计算机上的软件而言和存储在本地别无二致。`sshfs` 就是一个实现了这种功能的 FUSE 
文件系统。\n\n一些有趣的 FUSE 文件系统包括：\n- [sshfs](https://github.com/libfuse/sshfs)：使用 SSH 连接在本地打开远程主机上的文件\n- [rclone](https://rclone.org/commands/rclone_mount/)：将 Dropbox、Google Drive、Amazon S3、或者 Google Cloud Storage 一类的云存储服务挂载为本地文件系统\n- [gocryptfs](https://nuetzlich.net/gocryptfs/)：覆盖在加密文件上的文件系统。文件以加密形式保存在磁盘里，但该文件系统挂载后用户可以直接从挂载点访问文件的明文\n- [kbfs](https://keybase.io/docs/kbfs)：分布式端到端加密文件系统。在这个文件系统里有私密（private），共享（shared），以及公开（public）三种类型的文件夹\n- [borgbackup](https://borgbackup.readthedocs.io/en/stable/usage/mount.html)：方便用户浏览删除重复数据后的压缩加密备份\n\n## 备份\n\n任何没有备份的数据都可能在一个瞬间永远消失。复制数据很简单，但是可靠地备份数据很难。下面列举了一些关于备份的基础知识，以及一些常见做法容易掉进的陷阱。\n\n首先，复制存储在同一个磁盘上的数据不是备份，因为这个磁盘是一个单点故障（single point of failure）。这个磁盘一旦出现问题，所有的数据都可能丢失。放在家里的外置磁盘因为火灾、抢劫等原因可能会和源数据一起丢失，所以是一个弱备份。推荐的做法是将数据备份到不同的地点存储。\n\n同步方案也不是备份。即使方便如 Dropbox 或者 Google Drive，当数据在本地被抹除或者损坏，同步方案可能会把这些“更改”同步到云端。同理，像 RAID 这样的磁盘镜像方案也不是备份。它不能防止文件被意外删除、损坏、或者被勒索软件加密。\n\n有效备份方案的几个核心特性是：版本控制，删除重复数据，以及安全性。对备份的数据实施版本控制保证了用户可以从任何记录过的历史版本中恢复数据。在备份中检测并删除重复数据，使其仅备份增量变化可以减少存储开销。在安全性方面，作为用户，你应该考虑别人需要有什么信息或者工具才可以访问或者完全删除你的数据及备份。最后一点，不要盲目信任备份方案。用户应该经常检查备份是否可以用来恢复数据。\n\n备份不限制于备份在本地计算机上的文件。云端应用的重大发展使得我们很多的数据只存储在云端。当我们无法登录这些应用，在云端存储的网络邮件，社交网络上的照片，流媒体音乐播放列表，以及在线文档等等都会随之丢失。用户应该有这些数据的离线备份，而且已经有项目可以帮助下载并存储它们。\n\n如果想要了解更多具体内容，请参考本课程 2019 年关于备份的 [课堂笔记](/2019/backups)。\n\n\n## API（应用程序接口）\n\n关于如何使用计算机有效率地完成 _本地_ 任务，我们这堂课已经介绍了很多方法。这些方法在互联网上其实也适用。大多数线上服务提供的 API（应用程序接口）让你可以通过编程方式来访问这些服务的数据。比如，美国国家气象局就提供了一个可以从 shell 中获取天气预报的 API。\n\n这些 API 大多具有类似的格式。它们的结构化 URL 通常使用 `api.service.com` 作为根路径，用户可以访问不同的子路径来访问需要调用的操作，以及添加查询参数使 API 返回符合查询参数条件的结果。\n\n以美国天气数据为例，为了获得某个地点的天气数据，你可以发送一个 GET 请求（比如使用 `curl`）到 [`https://api.weather.gov/points/42.3604,-71.094`](https://api.weather.gov/points/42.3604,-71.094)。返回中会包括一系列用于获取特定信息（比如小时预报、气象观察站信息等）的 URL。通常这些返回都是 `JSON` 格式，你可以使用 [`jq`](https://stedolan.github.io/jq/) 等工具来选取需要的部分。\n\n有些需要认证的 API 通常要求用户在请求中加入某种私密令牌（secret token）来完成认证。请阅读你想访问的 API 所提供的文档来确定它请求的认证方式，但是其实大多数 API 都会使用 [OAuth](https://www.oauth.com/)。OAuth 通过向用户提供一系列仅可用于该 API 
特定功能的私密令牌进行校验。因为使用了有效 OAuth 令牌的请求在 API 看来就是用户本人发出的请求，所以请一定保管好这些私密令牌。否则其他人就可以冒用你的身份进行任何你可以在这个 API 上进行的操作。\n\n[IFTTT](https://ifttt.com/) 这个网站可以将很多 API 整合在一起，让某 API 发生的特定事件触发在其他 API 上执行的任务。IFTTT 的全称 If This Then That 足以说明它的用法，比如在检测到用户的新推文后，自动发布在其他平台。但是你可以对它支持的 API 进行任意整合，所以试着来设置一下任何你需要的功能吧！\n\n## 常见命令行标志参数及模式\n\n命令行工具的用法千差万别，阅读 `man` 页面可以帮助你理解每种工具的用法。即便如此，下面我们将介绍一下命令行工具一些常见的共同功能。\n\n - 大部分工具支持 `--help` 或者类似的标志参数（flag）来显示它们的简略用法。\n - 会造成不可撤回操作的工具一般会提供“空运行”（dry run）标志参数，这样用户可以确认工具真实运行时会进行的操作。这些工具通常也会有“交互式”（interactive）标志参数，在执行每个不可撤回的操作前提示用户确认。\n - `--version` 或者 `-V` 标志参数可以让工具显示它的版本信息（对于提交软件问题报告非常重要）。\n - 几乎所有的工具都支持使用 `--verbose` 或者 `-v` 标志参数来输出详细的运行信息。多次使用这个标志参数，比如 `-vvv`，可以让工具输出更详细的信息（经常用于调试）。同样，很多工具支持 `--quiet` 标志参数来抑制除错误提示之外的其他输出。\n - 大多数工具中，使用 `-` 代替输入或者输出文件名意味着工具将从标准输入（standard input）获取所需内容，或者向标准输出（standard output）输出结果。\n - 会造成破坏性结果的工具一般默认进行非递归的操作，但是支持使用“递归”（recursive）标志参数（通常是 `-r`）。\n - 有的时候你可能需要向工具传入一个 _看上去_ 像标志参数的普通参数，比如：\n   - 使用 `rm` 删除一个叫 `-r` 的文件；\n   - 在通过一个程序运行另一个程序的时候（`ssh machine foo`），向内层的程序（`foo`）传递一个标志参数。\n   \n   这时候你可以使用特殊参数 `--` 让某个程序 _停止处理_ `--` 后面出现的标志参数以及选项（以 `-` 开头的内容）：\n    - `rm -- -r` 会让 `rm` 将 `-r` 当作文件名；\n    - `ssh machine --for-ssh -- foo --for-foo` 的 `--` 会让 `ssh` 知道 `--for-foo` 不是 `ssh` 的标志参数。\n\n## 窗口管理器\n\n大部分人习惯了 Windows、macOS、以及 Ubuntu 默认的“拖拽”式窗口管理器。这些窗口管理器的窗口一般就堆在屏幕上，你可以拖拽改变窗口的位置、缩放窗口、以及让窗口堆叠在一起。这种堆叠式（floating/stacking）管理器只是窗口管理器中的一种。特别在 Linux 中，有很多种其他的管理器。\n\n平铺式（tiling）管理器就是一个常见的替代。顾名思义，平铺式管理器会把不同的窗口像贴瓷砖一样平铺在一起而不和其他窗口重叠。这和 [tmux](https://github.com/tmux/tmux) 管理终端窗口的方式类似。平铺式管理器按照写好的布局显示打开的窗口。如果只打开一个窗口，它会填满整个屏幕。新开一个窗口的时候，原来的窗口会缩小到比如三分之二或者三分之一的大小来腾出空间。打开更多的窗口会让已有的窗口进一步调整。\n\n就像 tmux 那样，平铺式管理器可以让你在完全不使用鼠标的情况下使用键盘切换、缩放、以及移动窗口。它们值得一试！\n\n## VPN\n\nVPN 现在非常火，但我们不清楚这是不是因为 [一些好的理由](https://gist.github.com/joepie91/5a9909939e6ce7d09e29)。你应该了解 VPN 能提供的功能和它的限制。使用了 VPN 的你对于互联网而言，**最好的情况** 下也就是换了一个网络供应商（ISP）。所有你发出的流量看上去来源于 VPN 供应商的网络而不是你的“真实”地址，而你实际接入的网络只能看到加密的流量。\n\n虽然这听上去非常诱人，但是你应该知道使用 VPN 只是把原本对网络供应商的信任放在了 VPN 供应商那里——网络供应商 _能看到的_，VPN 供应商 
_也都能看到_。如果相比网络供应商你更信任 VPN 供应商，那当然很好。反之，则连接 VPN 的价值不明确。机场的不加密公共热点确实不可以信任，但是在家庭网络环境里，这个差异就没有那么明显。\n\n你也应该了解现在大部分包含用户敏感信息的流量已经被 HTTPS 或者 TLS 加密。这种情况下你所处的网络环境是否“安全”不太重要：供应商只能看到你和哪些服务器在交谈，却不能看到你们交谈的内容。\n\n这一切的大前提都是“最好的情况”。曾经发生过 VPN 提供商错误使用弱加密或者直接禁用加密的先例。另外，有些恶意的或者带有投机心态的供应商会记录和你有关的所有流量，并很可能会将这些信息卖给第三方。找错一家 VPN 经常比一开始就不用 VPN 更危险。\n\nMIT 向有访问校内资源需求的成员开放自己运营的 [VPN](https://ist.mit.edu/vpn)。如果你也想自己配置一个 VPN，可以了解一下 [WireGuard](https://www.wireguard.com/) 以及 [Algo](https://github.com/trailofbits/algo)。\n\n## Markdown\n\n你在职业生涯中大概率会编写各种各样的文档。在很多情况下这些文档需要使用标记来增加可读性，比如：插入粗体或者斜体内容，增加页眉、超链接、以及代码片段。\n\n在不使用 Word 或者 LaTeX 等复杂工具的情况下，你可以考虑使用 [Markdown](https://commonmark.org/help/) 这个轻量化的标记语言（markup language）。你可能已经见过 Markdown 或者它的一个变种。很多环境都支持并使用 Markdown 的一些子功能。\n\nMarkdown 致力于将人们编写纯文本时的一些习惯标准化。比如：\n- 用 `*` 包围的文字表示强调（*斜体*），或者用 `**` 表示特别强调（**粗体**）；\n- 以 `#` 开头的行是标题，`#` 的数量表示标题的级别，比如：`## 二级标题`；\n- 以 `-` 开头代表一个无序列表的元素。一个数字加 `.`（比如 `1.`）代表一个有序列表元素；\n- 反引号 `` ` ``（backtick）包围的文字会以 `代码字体` 显示。如果要显示一段代码，可以在每一行前加四个空格缩进，或者使用三个反引号包围整个代码片段：\n\n    ```\n    就像这样\n    ```\n- 如果要添加超链接，将 _需要显示_ 的文字用方括号包围，并在后面紧接着用圆括号包围链接：`[显示文字](指向的链接)`。\n\nMarkdown 不仅容易上手，而且应用非常广泛。实际上本课程的课堂笔记和其他资料都是使用 Markdown 编写的。点击 [这个链接](https://github.com/missing-semester-cn/missing-semester-cn.github.io/blob/master/_2020/potpourri.md) 可以看到本页面的原始 Markdown 内容。\n\n\n\n## Hammerspoon (macOS 桌面自动化)\n\n[Hammerspoon](https://www.hammerspoon.org/) 是面向 macOS 的一个桌面自动化框架。它允许用户编写和操作系统功能挂钩的 Lua 脚本，从而与键盘、鼠标、窗口、文件系统等交互。\n\n下面是 Hammerspoon 的一些示例应用：\n\n- 绑定将窗口移动到特定位置的快捷键\n- 创建可以自动将窗口整理成特定布局的菜单栏按钮\n- 在你到实验室以后，通过检测所连接的 WiFi 网络自动静音扬声器\n- 在你不小心拿了朋友的充电器时弹出警告\n\n从用户的角度，Hammerspoon 可以运行任意 Lua 代码，绑定菜单栏按钮、按键、或者事件。Hammerspoon 提供了一个全面的用于和系统交互的库，因此它能没有限制地实现任何功能。你可以从头编写自己的 Hammerspoon 配置，也可以结合别人公布的配置来满足自己的需求。\n\n### 资源\n\n- [Getting Started with Hammerspoon](https://www.hammerspoon.org/go/)：Hammerspoon 官方教程\n- [Sample configurations](https://github.com/Hammerspoon/hammerspoon/wiki/Sample-Configurations)：Hammerspoon 官方示例配置\n- [Anish's Hammerspoon 
config](https://github.com/anishathalye/dotfiles-local/tree/mac/hammerspoon)：Anish 的 Hammerspoon 配置\n\n## 开机引导以及 Live USB\n\n在你的计算机启动时，[BIOS](https://en.wikipedia.org/wiki/BIOS) 或者 [UEFI](https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface) 会在加载操作系统之前对硬件系统进行初始化，这被称为引导（booting）。你可以通过按下计算机提示的键位组合来配置引导，比如 `Press F9 to configure BIOS. Press F12 to enter boot menu`。在 BIOS 菜单中你可以对硬件相关的设置进行更改，也可以在引导菜单中选择从硬盘以外的其他设备加载操作系统——比如 Live USB。\n\n[Live USB](https://en.wikipedia.org/wiki/Live_USB) 是包含了完整操作系统的闪存盘。Live USB 的用途非常广泛，包括：\n - 作为安装操作系统的启动盘；\n - 在不将操作系统安装到硬盘的情况下，直接运行 Live USB 上的操作系统；\n - 对硬盘上的相同操作系统进行修复；\n - 恢复硬盘上的数据。\n\nLive USB 通过在闪存盘上 _写入_ 操作系统的镜像制作，而写入不是单纯的往闪存盘上复制 `.iso` 文件。你可以使用 [UNetbootin](https://unetbootin.github.io/) 、[Rufus](https://github.com/pbatard/rufus) 等 Live USB 写入工具制作。\n\n## Docker, Vagrant, VMs, Cloud, OpenStack\n\n[虚拟机](https://en.wikipedia.org/wiki/Virtual_machine)（Virtual Machine）以及容器化（containerization）等工具可以帮助你模拟一个包括操作系统的完整计算机系统。虚拟机可以用于创建独立的测试或者开发环境，以及用作安全测试的沙盒。\n\n[Vagrant](https://www.vagrantup.com/) 是一个构建和配置虚拟开发环境的工具。它支持用户在配置文件中写入比如操作系统、系统服务、需要安装的软件包等描述，然后使用 `vagrant up` 命令在各种环境（VirtualBox，KVM，Hyper-V 等）中启动一个虚拟机。[Docker](https://www.docker.com/) 是一个使用容器化概念的类似工具。\n\n租用云端虚拟机可以享受以下资源的即时访问：\n\n- 便宜、常开、且有公共 IP 地址的虚拟机用来托管网站等服务\n- 有大量 CPU、磁盘、内存、以及 GPU 资源的虚拟机\n- 超出用户可以使用的物理主机数量的虚拟机\n  - 相比物理主机的固定开支，虚拟机的开支一般按运行的时间计算。所以如果用户只需要在短时间内使用大量算力，租用 1000 台虚拟机运行几分钟明显更加划算。\n\n受欢迎的 VPS 服务商有 [Amazon AWS](https://aws.amazon.com/)，[Google Cloud](https://cloud.google.com/)、[ Microsoft Azure](https://azure.microsoft.com/) 以及 [DigitalOcean](https://www.digitalocean.com/)。\n\nMIT CSAIL 的成员可以使用 [CSAIL OpenStack instance](https://tig.csail.mit.edu/shared-computing/open-stack/)\n申请免费的虚拟机用于研究。\n\n## 交互式记事本编程\n\n[交互式记事本](https://en.wikipedia.org/wiki/Notebook_interface) 可以帮助开发者进行与运行结果交互等探索性的编程。现在最受欢迎的交互式记事本环境大概是 [Jupyter](https://jupyter.org/)。它的名字来源于所支持的三种核心语言：Julia、Python、R。[Wolfram Mathematica](https://www.wolfram.com/mathematica/) 是另外一个常用于科学计算的优秀环境。\n\n## 
GitHub\n\n[GitHub](https://github.com/) 是最受欢迎的开源软件开发平台之一。我们课程中提到的很多工具，从 [vim](https://github.com/vim/vim) 到\n[Hammerspoon](https://github.com/Hammerspoon/hammerspoon)，都托管在 GitHub 上。向你每天使用的开源工具作出贡献其实很简单，下面是两种贡献者们经常使用的方法：\n\n- 创建一个 [议题（issue）](https://help.github.com/en/github/managing-your-work-on-github/creating-an-issue)。\n议题可以用来反映软件运行的问题或者请求新的功能。创建议题并不需要创建者阅读或者编写代码，所以它是一个轻量化的贡献方式。高质量的问题报告对于开发者十分重要。在现有的议题发表评论也可以对项目的开发作出贡献。\n- 使用 [拉取请求（pull request）](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) 提交代码更改。由于涉及到阅读和编写代码，提交拉取请求总的来说比创建议题更加深入。拉取请求是请求别人把你自己的代码拉取（且合并）到他们的仓库里。很多开源项目仅允许认证的管理者管理项目代码，所以一般需要 [复刻（fork）](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) 这些项目的上游仓库（upstream repository），在你的 GitHub 账号下创建一个内容完全相同但是由你控制的复刻仓库。这样你就可以在这个复刻仓库自由创建新的分支并推送修复问题或者实现新功能的代码。完成修改以后再回到开源项目的 GitHub 页面 [创建一个拉取请求](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request)。\n\n提交请求后，项目管理者会和你交流拉取请求里的代码并给出反馈。如果没有问题，你的代码会和上游仓库中的代码合并。很多大的开源项目会提供贡献指南，容易上手的议题，甚至专门的指导项目来帮助参与者熟悉这些项目。\n"
  },
  {
    "path": "_2020/qa.md",
    "content": "---\nlayout: lecture\ntitle: \"提问&回答\"\ndate: 2020-01-30\nready: true\nsync: true\nsyncdate: 2025-08-16\nvideo:\n  aspect: 56.25\n  id: Wz50FvGG6xU\n---\n\n最后一节课，我们回答学生提出的问题:\n\n\n- [学习操作系统相关内容的推荐，比如进程，虚拟内存，中断，内存管理等](#学习操作系统相关内容的推荐比如进程虚拟内存中断内存管理等)\n- [你会优先学习的工具有那些？](#你会优先学习的工具有那些)\n- [使用 Python VS Bash 脚本 VS 其他语言?](#使用-python-vs-bash-脚本-vs-其他语言)\n- [`source script.sh` 和 `./script.sh` 有什么区别?](#source-scriptsh-和-scriptsh-有什么区别)\n- [各种软件包和工具存储在哪里？引用过程是怎样的? `/bin` 或 `/lib` 是什么？](#各种软件包和工具存储在哪里引用过程是怎样的-bin-或-lib-是什么)\n- [我应该用 `apt-get install` 还是 `pip install` 去下载软件包呢?](#我应该用-apt-get-install-还是-pip-install-去下载软件包呢)\n- [用于提高代码性能，简单好用的性能分析工具有哪些?](#用于提高代码性能简单好用的性能分析工具有哪些)\n- [你使用那些浏览器插件?](#你使用那些浏览器插件)\n- [有哪些有用的数据整理工具？](#有哪些有用的数据整理工具)\n- [Docker 和虚拟机有什么区别?](#docker-和虚拟机有什么区别)\n- [不同操作系统的优缺点是什么，我们如何选择（比如选择最适用于我们需求的 Linux 发行版）?](#不同操作系统的优缺点是什么我们如何选择比如选择最适用于我们需求的-linux-发行版)\n- [使用 Vim 编辑器 VS Emacs 编辑器?](#使用-vim-编辑器-vs-emacs-编辑器)\n- [机器学习应用的提示或技巧?](#机器学习应用的提示或技巧)\n- [还有更多的 Vim 小窍门吗？](#还有更多的-vim-小窍门吗)\n- [2FA 是什么，为什么我需要使用它?](#2fa-是什么为什么我需要使用它)\n- [对于不同的 Web 浏览器有什么评价?](#对于不同的-web-浏览器有什么评价)\n\n\n## 学习操作系统相关内容的推荐，比如进程，虚拟内存，中断，内存管理等\n\n\n\n首先，不清楚你是不是真的需要了解这些更底层的话题。\n当你开始编写更加底层的代码，比如实现或修改内核的时候，这些内容是很重要的。除了其他课程中简要介绍过的进程和信号量之外，大部分话题都不相关。\n\n学习资源：\n\n- [MIT's 6.828 class](https://pdos.csail.mit.edu/6.828/) - 研究生阶段的操作系统课程（课程资料是公开的）。\n- 现代操作系统 第四版（*Modern Operating Systems 4th ed*） - 作者是 Andrew S. 
Tanenbaum，这本书对上述很多概念都有很好的描述。\n- FreeBSD 的设计与实现（*The Design and Implementation of the FreeBSD Operating System*） - 关于 FreeBSD OS 不错的资源（注意，FreeBSD OS 不是 Linux）。\n- 其他的指南例如 [用 Rust 写操作系统](https://os.phil-opp.com/) 这里用不同的语言逐步实现了内核，主要用于教学的目的。\n\n\n## 你会优先学习的工具有那些？\n\n值得优先学习的内容：\n\n- 多去使用键盘，少使用鼠标。这一目标可以通过多加利用快捷键，更换界面等来实现。\n- 学好编辑器。作为程序员你大部分时间都是在编辑文件，因此值得学好这些技能。\n- 学习怎样去自动化或简化工作流程中的重复任务。因为这会节省大量的时间。\n- 学习像 Git 之类的版本控制工具并且知道如何与 GitHub 结合，以便在现代的软件项目中协同工作。\n\n## 使用 Python VS Bash 脚本 VS 其他语言?\n\n通常来说，Bash 脚本对于简短的一次性脚本有效，比如当你想要运行一系列的命令的时候。但是 Bash 脚本有一些比较奇怪的地方，这使得大型程序或脚本难以用 Bash 实现：\n\n- Bash 对于简单的使用情形没什么问题，但是很难对于所有可能的输入都正确。例如，脚本参数中的空格会导致 Bash 脚本出错。\n- Bash 对于代码重用并不友好。因此，重用你先前已经写好的代码很困难。通常 Bash 中没有软件库的概念。\n- Bash 依赖于一些像 `$?` 或 `$@` 的特殊字符指代特殊的值。其他的语言却会显式地引用，比如  `exitCode` 或 `sys.argv`。\n\n因此，对于大型或者更加复杂的脚本我们推荐使用更加成熟的脚本语言例如 Python 和 Ruby。\n你可以找到很多用这些语言编写的，用来解决常见问题的在线库。\n如果你发现某种语言实现了你所需要的特定功能库，最好的方式就是直接去使用那种语言。\n\n## `source script.sh` 和 `./script.sh` 有什么区别?\n\n这两种情况 `script.sh` 都会在 bash 会话中被读取和执行，不同点在于哪个会话执行这个命令。\n对于 `source` 命令来说，命令是在当前的 bash 会话中执行的，因此当 `source` 执行完毕，对当前环境的任何更改（例如更改目录或是定义函数）都会留存在当前会话中。\n单独运行 `./script.sh` 时，当前的 bash 会话将启动新的 bash 会话（实例），并在新实例中运行命令 `script.sh`。\n因此，如果 `script.sh` 更改目录，新的 bash 会话（实例）会更改目录，但是一旦退出并将控制权返回给父 bash 会话，父会话仍然留在先前的位置（不会有目录的更改）。\n同样，如果 `script.sh` 定义了要在终端中访问的函数，需要用 `source` 命令在当前 bash 会话中定义这个函数。否则，如果你运行 `./script.sh`，只有新的 bash 会话（进程）才能执行定义的函数，而当前的 shell 不能。\n\n## 各种软件包和工具存储在哪里？引用过程是怎样的? 
`/bin` 或 `/lib` 是什么？\n\n根据你在命令行中运行的程序，这些包和工具会全部在 `PATH` 环境变量所列出的目录中查找到， 你可以使用 `which` 命令（或是 `type` 命令）来检查你的 shell 在哪里发现了特定的程序。\n一般来说，特定种类的文件存储有一定的规范，[文件系统，层次结构标准（Filesystem, Hierarchy Standard）](https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard) 可以查到我们讨论内容的详细列表。\n\n- `/bin` - 基本命令二进制文件\n- `/sbin` - 基本的系统二进制文件，通常是 root 运行的\n- `/dev` - 设备文件，通常是硬件设备接口文件\n- `/etc` - 主机特定的系统配置文件\n- `/home` - 系统用户的主目录\n- `/lib` - 系统软件通用库\n- `/opt` - 可选的应用软件\n- `/sys` - 包含系统的信息和配置([第一堂课](/2020/course-shell/) 介绍的)\n- `/tmp` - 临时文件( `/var/tmp` ) 通常重启时删除\n- `/usr/` - 只读的用户数据\n  + `/usr/bin` - 非必须的命令二进制文件\n  + `/usr/sbin` - 非必须的系统二进制文件，通常是由 root 运行的\n  + `/usr/local/bin` - 用户编译程序的二进制文件\n- `/var` -变量文件 像日志或缓存\n\n## 我应该用 `apt-get install` 还是 `pip install` 去下载软件包呢?\n\n这个问题没有普遍的答案。这与使用系统程序包管理器还是特定语言的程序包管理器来安装软件这一更笼统的问题相关。需要考虑的几件事：\n\n- 常见的软件包都可以通过这两种方法获得，但是小众的软件包或较新的软件包可能不在系统程序包管理器中。在这种情况下，使用特定语言的程序包管理器是更好的选择。\n- 同样，特定语言的程序包管理器相比系统程序包管理器有更多的最新版本的程序包。\n- 当使用系统软件包管理器时，将在系统范围内安装库。如果出于开发目的需要不同版本的库，则系统软件包管理器可能不能满足你的需要。对于这种情况，大多数编程语言都提供了隔离或虚拟环境，因此你可以用特定语言的程序包管理器安装不同版本的库而不会发生冲突。对于 Python，可以使用  virtualenv，对于 Ruby，使用 RVM 。\n- 根据操作系统和硬件架构，其中一些软件包可能会附带二进制文件或者软件包需要被编译。例如，在树莓派（Raspberry Pi）之类的 ARM 架构计算机中，在软件附带二进制文件和软件包需要被编译的情况下，使用系统包管理器比特定语言包管理器更好。这在很大程度上取决于你的特定设置。\n你应该仅使用一种解决方案，而不同时使用两种方法，因为这可能会导致难以解决的冲突。我们的建议是尽可能使用特定语言的程序包管理器，并使用隔离的环境（例如 Python 的 virtualenv）以避免影响全局环境。\n\n## 用于提高代码性能，简单好用的性能分析工具有哪些?\n\n性能分析方面相当有用和简单工具是 [print timing](/2020/debugging-profiling/#timing)。你只需手动计算代码不同部分之间花费的时间。通过重复执行此操作，你可以有效地对代码进行二分法搜索，并找到花费时间最长的代码段。\n\n对于更高级的工具， Valgrind 的 [Callgrind](http://valgrind.org/docs/manual/cl-manual.html) 可让你运行程序并计算所有的时间花费以及所有调用堆栈（即哪个函数调用了另一个函数）。然后，它会生成带注释的代码版本，其中包含每行花费的时间。但是，它会使程序运行速度降低一个数量级，并且不支持线程。其他的，[ `perf` ](http://www.brendangregg.com/perf.html) 工具和其他特定语言的采样性能分析器可以非常快速地输出有用的数据。[Flamegraphs](http://www.brendangregg.com/flamegraphs.html) 是对采样分析器结果的可视化工具。你还可以使用针对特定编程语言或任务的工具。例如，对于 Web 开发而言，Chrome 和 Firefox 
内置的开发工具具有出色的性能分析器。\n\n有时，代码中最慢的部分是系统等待磁盘读取或网络数据包之类的事件。在这些情况下，需要检查根据硬件性能估算的理论速度是否不偏离实际数值，也有专门的工具来分析系统调用中的等待时间，包括用于用户程序内核跟踪的 [eBPF](http://www.brendangregg.com/blog/2019-01-01/learn-ebpf-tracing.html) 。如果需要低级的性能分析，[ `bpftrace` ](https://github.com/iovisor/bpftrace) 值得一试。\n\n\n## 你使用那些浏览器插件?\n\n我们钟爱的插件主要与安全性与可用性有关：\n- [uBlock Origin](https://github.com/gorhill/uBlock) - 是一个 [用途广泛（wide-spectrum）](https://github.com/gorhill/uBlock/wiki/Blocking-mode) 的拦截器，它不仅可以拦截广告，还可以拦截第三方的页面，也可以拦截内部脚本和其他种类资源的加载。如果你打算花更多的时间去配置，前往 [中等模式（medium mode）](https://github.com/gorhill/uBlock/wiki/Blocking-mode:-medium-mode) 或者 [强力模式（hard mode）](https://github.com/gorhill/uBlock/wiki/Blocking-mode:-hard-mode)。在你调整好设置之前一些网站会停止工作，但是这些配置会显著提高你的网络安全水平。另外， [简易模式（easy mode）](https://github.com/gorhill/uBlock/wiki/Blocking-mode:-easy-mode) 作为默认模式已经相当不错了，可以拦截大部分的广告和跟踪，你也可以自定义规则来拦截网站对象。\n- [Stylus](https://github.com/openstyles/stylus/) - 是 Stylish 的分支（不要使用 Stylish，它会 [窃取浏览记录](https://www.theregister.co.uk/2018/07/05/browsers_pull_stylish_but_invasive_browser_extension/)），这个插件可让你将自定义 CSS 样式加载到网站。使用 Stylus，你可以轻松地自定义和修改网站的外观。可以删除侧边框，更改背景颜色，更改文字大小或字体样式。这可以使你经常访问的网站更具可读性。此外，Stylus 可以找到其他用户编写并发布在 [userstyles.org](https://userstyles.org/) 中的样式。大多数常用的网站都有一个或几个深色主题样式。\n- 全页屏幕捕获 - 内置于 [Firefox](https://screenshots.firefox.com/) 和 [ Chrome 扩展程序](https://chrome.google.com/webstore/detail/full-page-screen-capture/fdpohaocaechififmbbbbbknoalclacl?hl=en) 中。这些插件提供完整的网站截图，通常比打印要好用。\n- [多账户容器](https://addons.mozilla.org/en-US/firefox/addon/multi-account-containers/) - 该插件使你可以将 Cookie 分为“容器”，从而允许你以不同的身份浏览 web 网页并且/或确保网站无法在它们之间共享信息。\n- 密码集成管理器 - 大多数密码管理器都有浏览器插件，这些插件帮你将登录凭据输入网站的过程不仅方便，而且更加安全。与简单复制粘贴用户名和密码相比，这些插件将首先检查网站域是否与列出的条目相匹配，以防止冒充网站的网络钓鱼窃取登录凭据。\n\n## 有哪些有用的数据整理工具？\n\n在数据整理那一节课程中，我们没有时间讨论一些数据整理工具，包括分别用于 JSON 和 HTML 数据的专用解析器， `jq` 和 `pup`。Perl 语言是另一个更高级的可以用于数据整理管道的工具。另一个技巧是使用 `column -t` 命令，可以将空格文本（不一定对齐）转换为对齐的文本。\n\n更一般地讲，还有 vim 和 Python 两个非传统意义上的数据整理工具。对于某些复杂的多行转换，vim 宏是非常有用的工具。你可以记录一系列操作，并根据需要重复执行多次，例如，在“编辑器”一节的 
[讲义](/2020/editors/#macros)（去年 [视频](/2019/editors/)）中，有一个示例是使用 vim 宏将 XML 格式的文件转换为 JSON。\n\n对于通常以 CSV 格式显示的表格数据， Python [pandas](https://pandas.pydata.org/) 库是一个很棒的工具。不仅因为它能让复杂操作的定义（如分组依据，联接或过滤器）变得非常容易，而且还便于根据不同属性绘制数据。它还支持导出多种表格格式，包括 XLS，HTML 或 LaTeX。另外，R 语言(一种有争议的 [不好](http://arrgh.tim-smith.us/) 的语言）具有很多功能，可以计算数据的统计数字，这在管道的最后一步中非常有用。 [ggplot2](https://ggplot2.tidyverse.org/) 是 R 中很棒的绘图库。\n\n## Docker 和虚拟机有什么区别?\n\nDocker 基于容器这个更为概括的概念。关于容器和虚拟机之间最大的不同是，虚拟机会执行整个的 OS 栈，包括内核（即使这个内核和主机内核相同）。与虚拟机不同，容器避免运行其他内核实例，而是与主机分享内核。在 Linux 环境中，有 LXC 机制来实现，并且这能使一系列分离的主机像是在使用自己的硬件启动程序，而实际上是共享主机的硬件和内核。因此容器的开销小于完整的虚拟机。\n\n另一方面，容器的隔离性较弱而且只有在主机运行相同的内核时才能正常工作。例如，如果你在 macOS 上运行 Docker，Docker 需要启动 Linux 虚拟机去获取初始的 Linux 内核，这样的开销仍然很大。最后，Docker 是容器的特定实现，它是为软件部署而定制的。基于这些，它有一些奇怪之处：例如，默认情况下，Docker 容器在重启之间不会有以任何形式的存储。\n\n## 不同操作系统的优缺点是什么，我们如何选择（比如选择最适用于我们需求的 Linux 发行版）?\n\n关于 Linux 发行版，尽管有相当多的版本，但大部分发行版在大多数使用情况下的表现是相同的。\n可以使用任何发行版去学习 Linux 与 UNIX 的特性和其内部工作原理。\n发行版之间的根本区别是发行版如何处理软件包更新。\n某些版本，例如 Arch Linux 采用滚动更新策略，用了最前沿的软件包（bleeding-edge），但软件可能并不稳定。另外一些发行版（如 Debian，CentOS 或 Ubuntu LTS）其更新策略要保守得多，因此更新的内容会更稳定，但会牺牲一些新功能。我们建议你使用 Debian 或 Ubuntu 来获得简单稳定的台式机和服务器体验。\n\nMac OS 是介于 Windows 和 Linux 之间的一个操作系统，它有很漂亮的界面。但是，Mac OS 是基于 BSD 而不是 Linux，因此系统的某些部分和命令是不同的。\n另一种值得体验的是 FreeBSD。虽然某些程序不能在 FreeBSD 上运行，但与 Linux 相比，BSD 生态系统的碎片化程度要低得多，并且说明文档更加友好。\n除了开发 Windows 应用程序或需要使用某些 Windows 系统更好支持的功能（例如对游戏的驱动程序支持）外，我们不建议使用 Windows。\n\n对于双系统，我们认为最有效的是 macOS 的 bootcamp，长期来看，任何其他组合都可能会出现问题，尤其是当你结合了其他功能比如磁盘加密。\n\n## 使用 Vim 编辑器 VS Emacs 编辑器?\n\n我们三个都使用 vim 作为我们的主要编辑器。但是 Emacs 也是一个不错的选择，你可以两者都尝试，看看那个更适合你。Emacs 不使用 vim 的模式编辑，但是这些功能可以通过 Emacs 插件像 [Evil](https://github.com/emacs-evil/evil) 或 [Doom Emacs](https://github.com/hlissner/doom-emacs) 来实现。\nEmacs 的优点是可以用 Lisp 语言进行扩展（Lisp 比 vim 默认的脚本语言 vimscript 要更好用）。\n\n## 机器学习应用的提示或技巧?\n\n课程的一些经验可以直接用于机器学习程序。\n就像许多科学学科一样，在机器学习中，你需要进行一系列实验，并检查哪些数据有效，哪些无效。\n你可以使用 Shell 轻松快速地搜索这些实验结果，并且以合理的方式汇总。这意味着需要在限定时间内或使用特定数据集的情况下，检查所有实验结果。通过使用 JSON 
文件记录实验的所有相关参数，使用我们在本课程中介绍的工具，这件事情可以变得极其简单。\n最后，如果你不使用集群提交你的 GPU 作业，那你应该研究如何使该过程自动化，因为这是一项非常耗时的任务，会消耗你的精力。\n\n## 还有更多的 Vim 小窍门吗？\n\n更多的窍门：\n\n- 插件 - 花时间去探索插件。有很多不错的插件修复了 vim 的缺陷或者增加了能够与现有 vim 工作流结合的新功能。关于这部分内容，资源是 [VimAwesome](https://vimawesome.com/) 和其他程序员的 dotfiles。\n- 标记 - 在 vim 里你可以使用 `m<X>` 为字母 `X` 做标记，之后你可以通过 `'<X>` 回到标记位置。这可以让你快速定位到文件内或文件间的特定位置。\n- 导航 - `Ctrl+O` 和 `Ctrl+I` 命令可以使你在最近访问位置前后移动。\n- 撤销树 - vim 有不错的更改跟踪机制，不同于其他的编辑器，vim 存储变更树，因此即使你撤销后做了一些修改，你仍然可以通过撤销树的导航回到初始状态。一些插件比如 [gundo.vim](https://github.com/sjl/gundo.vim) 和 [undotree](https://github.com/mbbill/undotree) 通过图形化来展示撤销树。\n- 时间撤销 - `:earlier` 和 `:later` 命令使得你可以用时间而非某一时刻的更改来定位文件。\n- [持续撤销](https://vim.fandom.com/wiki/Using_undo_branches#Persistent_undo) - 是一个默认未被开启的 vim 的内置功能，它在 vim 启动之间保存撤销历史，需要配置在 `.vimrc` 目录下的 `undofile` 和 `undodir`，vim 会保存每个文件的修改历史。\n- 热键（Leader Key） - 热键是一个用于用户自定义配置命令的特殊按键。这种模式通常是按下后释放这个按键（通常是空格键）并与其他的按键组合去实现一个特殊的命令。插件也会用这些按键增加它们的功能，例如，插件 UndoTree 使用 `<Leader> U` 去打开撤销树。\n- 高级文本对象 - 文本对象比如搜索也可以用 vim 命令构成。例如，`d/<pattern>` 会删除下一处匹配 pattern 的字符串，`cgn` 可以用于更改上次搜索的关键字。\n\n## 2FA 是什么，为什么我需要使用它?\n\n双因子验证（Two Factor Authentication 2FA）在密码之上为帐户增加了一层额外的保护。为了登录，你不仅需要知道密码，还必须以某种方式“证明”可以访问某些硬件设备。最简单的情形是可以通过接收手机的 SMS 来实现（尽管 SMS 2FA 存在 [已知问题](https://www.kaspersky.com/blog/2fa-practical-guide/24219/)）。我们推荐使用 [YubiKey](https://www.yubico.com/) 之类的 [U2F](https://en.wikipedia.org/wiki/Universal_2nd_Factor) 方案。\n\n## 对于不同的 Web 浏览器有什么评价?\n\n2020 的浏览器现状是，大部分的浏览器都与 Chrome 类似，因为它们都使用同样的引擎(Blink)。Microsoft Edge 同样基于 Blink，而 Safari 则 基于 WebKit(与 Blink 类似的引擎)，这些浏览器仅仅是更糟糕的 Chrome 版本。不管是在性能还是可用性上，Chrome 都是一款很不错的浏览器。如果你想要替代品，我们推荐 Firefox。Firefox 与 Chrome 的在各方面不相上下，并且在隐私方面更加出色。\n有一款目前还没有完成的叫 Flow 的浏览器，它实现了全新的渲染引擎，有望比现有引擎速度更快。\n"
  },
  {
    "path": "_2020/security.md",
    "content": "---\nlayout: lecture\ntitle: \"安全和密码学\"\ndate: 2020-01-28\nready: true\nsync: true\nsyncdate: 2025-08-16\nvideo:\n  aspect: 56.25\n  id: tjwobAmnKTo\nsolution:\n    ready: true\n    url: security-solution\n---\n\n去年的 [这节课](/2019/security/) 我们从计算机 _用户_ 的角度探讨了增强隐私保护和安全的方法。\n今年我们将关注比如散列函数、密钥生成函数、对称/非对称密码体系这些安全和密码学的概念是如何应用于前几节课所学到的工具（Git 和 SSH）中的。\n\n本课程不能作为计算机系统安全 ([6.858](https://css.csail.mit.edu/6.858/)) 或者\n密码学 ([6.857](https://courses.csail.mit.edu/6.857/) 以及 6.875) 的替代。\n如果你不是密码学的专家，请不要 [试图创造或者修改加密算法](https://www.schneier.com/blog/archives/2015/05/amateurs_produc.html)。从事和计算机系统安全相关的工作同理。\n\n这节课将对一些基本的概念进行简单（但实用）的说明。\n虽然这些说明不足以让你学会如何 _设计_ 安全系统或者加密协议，但我们希望你可以对现在使用的程序和协议有一个大概了解。\n\n# 熵\n\n[熵](https://en.wikipedia.org/wiki/Entropy_(information_theory)) (Entropy) 是不确定性的度量，这很有用，可以用来决定密码的强度。\n\n![XKCD 936: Password Strength](https://imgs.xkcd.com/comics/password_strength.png)\n\n正如上面的 [XKCD 漫画](https://xkcd.com/936/) 所描述的，\n\"correcthorsebatterystaple\" 这个密码比 \"Tr0ub4dor&3\" 更安全——可是熵是如何量化安全性的呢？\n\n熵的单位是 _比特_。对于一个均匀分布的随机离散变量，熵等于 `log_2(所有可能的个数，即 n)`。\n扔一次硬币的熵是 1 比特。掷一次（六面）骰子的熵大约为 2.58 比特。\n\n一般我们认为攻击者了解密码的模型（最小长度，最大长度，可能包含的字符种类等），但是不了解某个密码是如何随机选择的——\n比如 [掷骰子](https://en.wikipedia.org/wiki/Diceware)。\n\n使用多少比特的熵取决于应用的威胁模型。 \n上面的 XKCD 漫画告诉我们，大约 40 比特的熵足以对抗在线穷举攻击（受限于网络速度和应用认证机制）。\n而对于离线穷举攻击（主要受限于计算速度）, 一般需要更强的密码 (比如 80 比特或更多)。\n\n# 散列函数\n\n[密码散列函数](https://en.wikipedia.org/wiki/Cryptographic_hash_function)\n(Cryptographic hash function) 可以将任意大小的数据映射为一个固定大小的输出。除此之外，还有一些其他特性。\n一个散列函数的大概规范如下：\n\n```\nhash(value: array<byte>) -> vector<byte, N>  (N对于该函数固定)\n```\n\n[SHA-1](https://en.wikipedia.org/wiki/SHA-1) 是 Git 中使用的一种散列函数，\n它可以将任意大小的输入映射为一个 160 比特（可被 40 位十六进制数表示）的输出。\n下面我们用 `sha1sum` 命令来测试 SHA1 对几个字符串的输出：\n\n```console\n$ printf 'hello' | sha1sum\naaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d\n$ printf 'hello' | sha1sum\naaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d\n$ printf 'Hello' | sha1sum 
\nf7ff9e8b7bb2e09b70935a5d785e0cc5d9d0abf0\n```\n\n抽象地讲，散列函数可以被认为是一个不可逆，且看上去随机（但具确定性）的函数\n（这就是 [散列函数的理想模型](https://en.wikipedia.org/wiki/Random_oracle)）。\n一个散列函数拥有以下特性：\n\n- 确定性：对于不变的输入永远有相同的输出。\n- 不可逆性：对于 `hash(m) = h`，难以通过已知的输出 `h` 来计算出原始输入 `m`。\n- 目标碰撞抵抗性/弱无碰撞：对于一个给定输入 `m_1`，难以找到 `m_2 != m_1` 且 `hash(m_1) = hash(m_2)`。\n- 碰撞抵抗性/强无碰撞：难以找到一组满足 `hash(m_1) = hash(m_2)` 的输入 `m_1, m_2`（该性质严格强于目标碰撞抵抗性）。\n\n注：虽然 SHA-1 还可以用于特定用途，但它已经 [不再被认为](https://shattered.io/) 是一个强密码散列函数。\n你可参照 [密码散列函数的生命周期](https://valerieaurora.org/hash.html) 这个表格了解一些散列函数是何时被发现弱点及破解的。 \n请注意，针对应用推荐特定的散列函数超出了本课程内容的范畴。\n如果选择散列函数对于你的工作非常重要，请先系统学习信息安全及密码学。\n\n\n## 密码散列函数的应用\n\n- Git 中的内容寻址存储(Content-addressed storage)：[散列函数](https://en.wikipedia.org/wiki/Hash_function) 是一个宽泛的概念（存在非密码学的散列函数），那么 Git 为什么要特意使用密码散列函数？\n- 文件的信息摘要(Message digest)：像 Linux ISO 这样的软件可以从非官方的（有时不太可信的）镜像站下载，所以需要设法确认下载的软件和官方一致。\n官方网站一般会在（指向镜像站的）下载链接旁边备注安装文件的哈希值。\n用户从镜像站下载安装文件后可以对照公布的哈希值来确定安装文件没有被篡改。\n- [承诺机制](https://en.wikipedia.org/wiki/Commitment_scheme)(Commitment scheme)：\n假设我希望承诺一个值，但之后再透露它—— \n比如在没有一个可信的、双方可见的硬币的情况下在我的脑海中公平的“扔一次硬币”。\n我可以选择一个值 `r = random()`，并和你分享它的哈希值 `h = sha256(r)`。\n这时你可以开始猜硬币的正反：我们一致同意偶数 `r` 代表正面，奇数 `r` 代表反面。\n你猜完了以后，我告诉你值 `r` 的内容，得出胜负。同时你可以使用 `sha256(r)` 来检查我分享的哈希值 `h` 以确认我没有作弊。\n\n# 密钥生成函数\n\n[密钥生成函数](https://en.wikipedia.org/wiki/Key_derivation_function) (Key Derivation Functions) 作为密码散列函数的相关概念，被应用于包括生成固定长度，可以使用在其他密码算法中的密钥等方面。\n为了对抗穷举法攻击，密钥生成函数通常较慢。\n\n## 密钥生成函数的应用\n\n- 从密码生成可以在其他加密算法中使用的密钥，比如对称加密算法（见下）。\n- 存储登录凭证时不可直接存储明文密码。<br>\n正确的方法是针对每个用户随机生成一个 [盐](https://en.wikipedia.org/wiki/Salt_(cryptography)) `salt = random()`，\n并存储盐，以及密钥生成函数对连接了盐的明文密码生成的哈希值 `KDF(password + salt)`。<br>\n在验证登录请求时，使用输入的密码连接存储的盐重新计算哈希值 `KDF(input + salt)`，并与存储的哈希值对比。\n\n# 对称加密\n\n说到加密，可能你会首先想到隐藏明文信息。对称加密使用以下几个方法来实现这个功能：\n\n```\nkeygen() -> key  (这是一个随机方法)\n\nencrypt(plaintext: array<byte>, key) -> array<byte>  (输出密文)\ndecrypt(ciphertext: array<byte>, key) -> array<byte>  (输出明文)\n```\n\n加密方法 `encrypt()` 输出的密文 `ciphertext` 很难在不知道 
`key` 的情况下得出明文 `plaintext`。<br>\n解密方法 `decrypt()` 有明显的正确性。因为功能要求给定密文及其密钥，解密方法必须输出明文：`decrypt(encrypt(m, k), k) = m`。\n\n[AES](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard) 是现在常用的一种对称加密系统。\n\n## 对称加密的应用\n\n- 加密不信任的云服务上存储的文件。对称加密和密钥生成函数配合起来，就可以使用密码加密文件：\n将密码输入密钥生成函数生成密钥 `key = KDF(passphrase)`，然后存储 `encrypt(file, key)`。\n\n# 非对称加密\n\n非对称加密的“非对称”代表在其环境中，使用两个具有不同功能的密钥：\n一个是私钥(private key)，不向外公布；另一个是公钥(public key)，公布公钥不像公布对称加密的共享密钥那样可能影响加密体系的安全性。<br>\n非对称加密使用以下几个方法来实现加密/解密(encrypt/decrypt)，以及签名/验证(sign/verify)：\n\n```\nkeygen() -> (public key, private key)  (这是一个随机方法)\n\nencrypt(plaintext: array<byte>, public key) -> array<byte>  (输出密文)\ndecrypt(ciphertext: array<byte>, private key) -> array<byte>  (输出明文)\n\nsign(message: array<byte>, private key) -> array<byte>  (生成签名)\nverify(message: array<byte>, signature: array<byte>, public key) -> bool  (验证签名是否是由和这个公钥相关的私钥生成的)\n```\n\n非对称的加密/解密方法和对称的加密/解密方法有类似的特征。<br>\n信息在非对称加密中使用 _公钥_ 加密，\n且输出的密文很难在不知道 _私钥_ 的情况下得出明文。<br>\n解密方法 `decrypt()` 有明显的正确性。\n给定密文及私钥，解密方法一定会输出明文：\n`decrypt(encrypt(m, public key), private key) = m`。\n\n对称加密和非对称加密可以类比为机械锁。\n对称加密就好比一个防盗门：只要是有钥匙的人都可以开门或者锁门。\n非对称加密好比一个可以拿下来的挂锁。你可以把打开状态的挂锁（公钥）给任何一个人并保留唯一的钥匙（私钥）。这样他们将给你的信息装进盒子里并用这个挂锁锁上以后，只有你可以用保留的钥匙开锁。\n\n签名/验证方法具有和书面签名类似的特征。<br>\n在不知道 _私钥_ 的情况下，不管需要签名的信息为何，很难计算出一个可以使\n`verify(message, signature, public key)` 返回为真的签名。<br>\n对于使用私钥签名的信息，验证方法验证和私钥相对应的公钥时一定返回为真： `verify(message,\nsign(message, private key), public key) = true`。\n\n## 非对称加密的应用\n\n- [PGP 电子邮件加密](https://en.wikipedia.org/wiki/Pretty_Good_Privacy)：用户可以将所使用的公钥在线发布，比如：PGP 密钥服务器或\n[Keybase](https://keybase.io/)。任何人都可以向他们发送加密的电子邮件。\n- 聊天加密：像 [Signal](https://signal.org/) 和\n[Keybase](https://keybase.io/) 使用非对称密钥来建立私密聊天。\n- 软件签名：Git 支持用户对提交(commit)和标签(tag)进行 GPG 签名。任何人都可以使用软件开发者公布的签名公钥验证下载的已签名软件。\n\n## 密钥分发\n\n非对称加密面对的主要挑战是，如何分发公钥并对应现实世界中存在的人或组织。\n\nSignal 的信任模型是，信任用户第一次使用时给出的身份(trust on first use)，同时支持用户线下(out-of-band)、面对面交换公钥（Signal 里的 safety number）。\n\nPGP 使用的是 
[信任网络](https://en.wikipedia.org/wiki/Web_of_trust)。简单来说，如果我想加入一个信任网络，则必须让已经在信任网络中的成员对我进行线下验证，比如对比证件。验证无误后，信任网络的成员使用私钥对我的公钥进行签名。这样我就成为了信任网络的一部分。只要我使用签名过的公钥所对应的私钥就可以证明“我是我”。\n\nKeybase 主要使用 [社交网络证明 (social proof)](https://keybase.io/blog/chat-apps-softer-than-tofu)，和一些别的精巧设计。\n\n每个信任模型有它们各自的优点：我们（讲师）更倾向于 Keybase 使用的模型。\n\n# 案例分析\n\n## 密码管理器\n\n每个人都应该尝试使用密码管理器，比如 [KeePassXC](https://keepassxc.org/)、[pass](https://www.passwordstore.org/) 和 [1Password](https://1password.com))。\n\n密码管理器会帮助你对每个网站生成随机且复杂（表现为高熵）的密码，并使用你指定的主密码配合密钥生成函数来对称加密它们。\n\n你只需要记住一个复杂的主密码，密码管理器就可以生成很多复杂度高且不会重复使用的密码。密码管理器通过这种方式降低密码被猜出的可能，并减少网站信息泄露后对其他网站密码的威胁。\n\n## 两步验证（双因子验证）\n\n[两步验证](https://en.wikipedia.org/wiki/Multi-factor_authentication)（2FA）要求用户同时使用密码（“你知道的信息”）和一个身份验证器（“你拥有的物品”，比如 [YubiKey](https://www.yubico.com/)）来消除密码泄露或者 [钓鱼攻击](https://en.wikipedia.org/wiki/Phishing) 的威胁。\n\n\n## 全盘加密\n\n对笔记本电脑的硬盘进行全盘加密是防止因设备丢失而信息泄露的简单且有效方法。\nLinux 的[cryptsetup +\nLUKS](https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_a_non-root_file_system)，\nWindows 的 [BitLocker](https://fossbytes.com/enable-full-disk-encryption-windows-10/)，或者 macOS 的 [FileVault](https://support.apple.com/en-us/HT204837) 都使用一个由密码保护的对称密钥来加密盘上的所有信息。\n\n## 聊天加密\n\n[Signal](https://signal.org/) 和 [Keybase](https://keybase.io/) 使用非对称加密对用户提供端到端 （End-to-end） 安全性。\n\n获取联系人的公钥非常关键。为了保证安全性，应使用线下方式验证 Signal 或者 Keybase 的用户公钥，或者信任 Keybase 用户提供的社交网络证明。\n\n## SSH\n\n我们在 [之前的一堂课](/2020/command-line/#remote-machines) 讨论了 SSH 和 SSH 密钥的使用。那么我们今天从密码学的角度来分析一下它们。\n\n当你运行 `ssh-keygen` 命令，它会生成一个非对称密钥对：公钥和私钥 `(public_key, private_key)`。 \n生成过程中使用的随机数由系统提供的熵决定。这些熵可以来源于硬件事件(hardware events)等。\n公钥最终会被分发，它可以直接明文存储。\n但是为了防止泄露，私钥必须加密存储。`ssh-keygen` 命令会提示用户输入一个密码，并将它输入密钥生成函数\n产生一个密钥。最终，`ssh-keygen` 使用对称加密算法和这个密钥加密私钥。\n\n在实际运用中，当服务器已知用户的公钥（存储在 `.ssh/authorized_keys` 文件中，一般在用户 HOME 目录下），尝试连接的客户端可以使用非对称签名来证明用户的身份——这便是 [挑战应答方式](https://en.wikipedia.org/wiki/Challenge%E2%80%93response_authentication)。\n简单来说，服务器选择一个随机数字发送给客户端。客户端使用用户私钥对这个数字信息签名后返回服务器。\n服务器随后使用 
`.ssh/authorized_keys` 文件中存储的用户公钥来验证返回的信息是否由所对应的私钥所签名。这种验证方式可以有效证明试图登录的用户持有所需的私钥。\n\n{% comment %}\nextra topics, if there's time\n\nsecurity concepts, tips\n- biometrics\n- HTTPS\n{% endcomment %}\n\n# 资源\n\n- [去年的讲稿](/2019/security/): 更注重于计算机用户可以如何增强隐私保护和安全\n- [Cryptographic Right Answers](https://latacora.micro.blog/2018/04/03/cryptographic-right-answers.html): \n解答了在一些应用环境下“应该使用什么加密？”的问题\n\n# 课后练习\n[习题解答]({{site.url}}/{{site.solution_url}}/{{page.solution.url}})\n1. **熵**\n    1. 假设一个密码是由四个小写的单词拼接组成，每个单词都是从一个含有 10 万单词的字典中随机选择，且每个单词选中的概率相同。\n       一个符合这样构造的例子是 `correcthorsebatterystaple`。这个密码有多少比特的熵？\n    2. 假设另一个密码是用八个随机的大小写字母或数字组成。一个符合这样构造的例子是 `rg8Ql34g`。这个密码又有多少比特的熵？\n    3. 哪一个密码更强？\n    4. 假设一个攻击者每秒可以尝试 1 万个密码，这个攻击者需要多久可以分别破解上述两个密码？\n2. **密码散列函数** 从 [Debian 镜像站](https://www.debian.org/CD/http-ftp/) 下载一个光盘映像（比如这个来自阿根廷镜像站的 [映像](http://debian.xfree.com.ar/debian-cd/10.2.0/amd64/iso-cd/debian-10.2.0-amd64-netinst.iso)）。使用 `sha256sum` 命令对比下载映像的哈希值和官方 Debian 站公布的哈希值。如果你下载了上面的映像，官方公布的哈希值可以参考 [这个文件](https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/SHA256SUMS)。\n3. **对称加密** 使用\n   [OpenSSL](https://www.openssl.org/) 的 AES 模式加密一个文件: `openssl aes-256-cbc -salt -in {源文件名} -out {加密文件名}`。\n   使用 `cat` 或者 `hexdump` 对比源文件和加密的文件，再用 `openssl aes-256-cbc -d -in {加密文件名} -out\n   {解密文件名}` 命令解密刚刚加密的文件。最后使用` cmp`命令确认源文件和解密后的文件内容相同。\n4. **非对称加密**\n    1. 在你自己的电脑上使用更安全的 [ED25519 算法](https://wiki.archlinux.org/index.php/SSH_keys#Ed25519) 生成一组[SSH\n       密钥对](https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys--2)。为了确保私钥不使用时的安全，一定使用密码加密你的私钥。\n    2. [配置 GPG](https://www.digitalocean.com/community/tutorials/how-to-use-gpg-to-encrypt-and-sign-messages)。\n    3. 给 Anish 发送一封加密的电子邮件（[Anish 的公钥](https://keybase.io/anish)）。\n    4. 使用 `git commit -S` 命令签名一个 Git 提交，并使用 `git show --show-signature` 命令验证这个提交的签名。或者，使用 `git tag -s` 命令签名一个 Git 标签，并使用 `git tag -v` 命令验证标签的签名。\n"
  },
  {
    "path": "_2020/shell-tools.md",
    "content": "---\nlayout: lecture\ntitle: \"Shell 工具和脚本\"\ndate: 2020-01-14\nready: true\nsync: true\nsyncdate: 2025-08-16\nvideo:\n  aspect: 56.25\n  id: kgII-YWo3Zw\nsolution:\n    ready: true\n    url: shell-tools-solution\n---\n\n在这节课中，我们将会展示 bash 作为脚本语言的一些基础操作，以及几种最常用的 shell 工具。\n\n\n# Shell 脚本\n\n到目前为止，我们已经学习了如何在 shell 中执行命令，并使用管道将命令组合使用。但是，很多情况下我们需要执行一系列的操作并使用条件或循环这样的控制流。\n\n\nshell 脚本的复杂性进一步提高。\n\n\n大多数 shell 都有自己的一套脚本语言，包括变量、控制流和自己的语法。shell 脚本与其他脚本语言不同之处在于，shell 脚本针对 shell 所从事的相关工作进行了优化。因此，创建命令流程（pipelines）、将结果保存到文件、从标准输入中读取输入，这些都是 shell 脚本中的原生操作，这让它比通用的脚本语言更易用。本节中，我们会专注于 bash 脚本，因为它最流行，应用更为广泛。\n\n在 bash 中为变量赋值的语法是 `foo=bar`，访问变量中存储的数值，其语法为 `$foo`。\n需要注意的是，`foo = bar` （使用空格隔开）是不能正确工作的，因为解释器会调用程序 `foo` 并将 `=` 和 `bar` 作为参数。\n总的来说，在 shell 脚本中使用空格会起到分割参数的作用，有时候可能会造成混淆，请务必多加检查。\n\nBash 中的字符串通过 `'` 和 `\"` 分隔符来定义，但是它们的含义并不相同。以 `'` 定义的字符串为原义字符串，其中的变量不会被转义，而 `\"` 定义的字符串会将变量值进行替换。\n\n```bash\nfoo=bar\necho \"$foo\"\n# 打印 bar\necho '$foo'\n# 打印 $foo\n```\n\n和其他大多数的编程语言一样，`bash` 也支持 `if`, `case`, `while` 和 `for` 这些控制流关键字。同样地，\n `bash` 也支持函数，它可以接受参数并基于参数进行操作。下面这个函数是一个例子，它会创建一个文件夹并使用 `cd` 进入该文件夹。\n\n\n```bash\nmcd () {\n    mkdir -p \"$1\"\n    cd \"$1\"\n}\n```\n\n这里 `$1` 是脚本的第一个参数。与其他脚本语言不同的是，bash 使用了很多特殊的变量来表示参数、错误代码和相关变量。下面列举了其中一些变量，更完整的列表可以参考 [这里](https://www.tldp.org/LDP/abs/html/special-chars.html)。\n- `$0` - 脚本名\n- `$1` 到 `$9` - 脚本的参数。 `$1` 是第一个参数，依此类推。\n- `$@` - 所有参数\n- `$#` - 参数个数\n- `$?` - 前一个命令的返回值\n- `$$` - 当前脚本的进程识别码\n- `!!` - 完整的上一条命令，包括参数。常见应用：当你因为权限不足执行命令失败时，可以使用 `sudo !!` 再尝试一次。\n- `$_` - 上一条命令的最后一个参数。如果你正在使用的是交互式 shell，你可以通过按下 `Esc` 之后键入 . 
来获取这个值。\n\n命令通常使用 `STDOUT` 来返回输出值，使用 `STDERR` 来返回错误信息，并通过返回码（Return Code）以更便于脚本处理的方式报告错误。\n返回码或退出状态是脚本/命令之间交流执行状态的方式。返回值 0 表示正常执行，其他所有非 0 的返回值都表示有错误发生。\n\n退出码可以搭配 `&&`（与操作符）和 `||`（或操作符）使用，用来进行条件判断，决定是否执行其他程序。它们都属于 [短路运算符](https://en.wikipedia.org/wiki/Short-circuit_evaluation)（short-circuiting）。同一行的多个命令可以用 `;` 分隔。程序 `true` 的返回码永远是 `0`，`false` 的返回码永远是 `1`。让我们看几个例子：\n\n```bash\nfalse || echo \"Oops, fail\"\n# Oops, fail\n\ntrue || echo \"Will not be printed\"\n#\n\ntrue && echo \"Things went well\"\n# Things went well\n\nfalse && echo \"Will not be printed\"\n#\n\nfalse ; echo \"This will always run\"\n# This will always run\n```\n\n另一个常见的模式是以变量的形式获取一个命令的输出，这可以通过 _命令替换_（_command substitution_）实现。\n\n当您通过 `$( CMD )` 这样的方式来执行 `CMD` 这个命令时，它的输出结果会替换掉 `$( CMD )` 。例如，如果执行 `for file in $(ls)` ，shell 首先将调用 `ls` ，然后遍历得到的这些返回值。还有一个冷门的类似特性是 _进程替换_（_process substitution_）， `<( CMD )` 会执行 `CMD` 并将结果输出到一个临时文件中，并将 `<( CMD )` 替换成临时文件名。这在我们希望返回值通过文件而不是 STDIN 传递时很有用。例如， `diff <(ls foo) <(ls bar)` 会显示文件夹 `foo` 和 `bar` 中文件的区别。\n\n说了很多，现在该看例子了，下面这个例子展示了一部分上面提到的特性。这段脚本会遍历我们提供的参数，使用 `grep` 搜索字符串 `foobar`，如果没有找到，则将其作为注释追加到文件中。\n\n\n```bash\n#!/bin/bash\n\necho \"Starting program at $(date)\" # date会被替换成日期和时间\n\necho \"Running program $0 with $# arguments with pid $$\"\n\nfor file in \"$@\"; do\n    grep foobar \"$file\" > /dev/null 2> /dev/null\n    # 如果模式没有找到，则grep退出状态为 1\n    # 我们将标准输出流和标准错误流重定向到Null，因为我们并不关心这些信息\n    if [[ $? 
-ne 0 ]]; then\n        echo \"File $file does not have any foobar, adding one\"\n        echo \"# foobar\" >> \"$file\"\n    fi\ndone\n```\n\n在条件语句中，我们检查 `$?` 是否不等于 0。\nBash 实现了许多类似的比较操作，您可以查看 [`test 手册`](https://man7.org/linux/man-pages/man1/test.1.html)。\n在 bash 中进行比较时，尽量使用双方括号 `[[ ]]` 而不是单方括号 `[ ]`，这样会降低犯错的几率，尽管这样并不能兼容 `sh`。 更详细的说明参见 [这里](http://mywiki.wooledge.org/BashFAQ/031)。\n\n当执行脚本时，我们经常需要提供形式类似的参数。bash 使我们可以轻松地实现这一操作，它可以通过文件名展开（filename expansion）来展开表达式。这一技术被称为 shell 的 _通配_（_globbing_）。\n\n- 通配符 - 当你想要利用通配符进行匹配时，你可以分别使用 `?` 和 `*` 来匹配一个或任意个字符。例如，对于文件 `foo`, `foo1`, `foo2`, `foo10` 和 `bar`, `rm foo?` 这条命令会删除 `foo1` 和 `foo2` ，而 `rm foo*` 则会删除除了 `bar` 之外的所有文件。\n- 花括号 `{}` - 当你有一系列的指令，其中包含一段公共子串时，可以用花括号来自动展开这些命令。这在批量移动或转换文件时非常方便。\n\n```bash\nconvert image.{png,jpg}\n# 会展开为\nconvert image.png image.jpg\n\ncp /path/to/project/{foo,bar,baz}.sh /newpath\n# 会展开为\ncp /path/to/project/foo.sh /path/to/project/bar.sh /path/to/project/baz.sh /newpath\n\n# 也可以结合通配使用\nmv *{.py,.sh} folder\n# 会移动所有 *.py 和 *.sh 文件\n\nmkdir foo bar\n\n# 下面命令会创建 foo/a, foo/b, ... foo/h, bar/a, bar/b, ... bar/h 这些文件\ntouch {foo,bar}/{a..h}\ntouch foo/x bar/y\n# 比较文件夹 foo 和 bar 中包含文件的不同\ndiff <(ls foo) <(ls bar)\n# 输出\n# < x\n# ---\n# > y\n```\n\n<!-- Lastly, pipes `|` are a core feature of scripting. Pipes connect one program's output to the next program's input. We will cover them more in detail in the data wrangling lecture. 
-->\n\n编写 `bash` 脚本有时候会很别扭和反直觉。例如 [shellcheck](https://github.com/koalaman/shellcheck) 这样的工具可以帮助你定位 sh/bash 脚本中的错误。\n\n注意，脚本并不一定只有用 bash 写才能在终端里调用。比如说，这是一段 Python 脚本，作用是将输入的参数倒序输出：\n\n```python\n#!/usr/local/bin/python\nimport sys\nfor arg in reversed(sys.argv[1:]):\n    print(arg)\n```\n\n内核知道去用 python 解释器而不是 shell 命令来运行这段脚本，是因为脚本的开头第一行的 [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix))。\n\n在 `shebang` 行中使用 [`env`](https://man7.org/linux/man-pages/man1/env.1.html) 命令是一种好的实践，它会利用环境变量中的程序来解析该脚本，这样就提高了您的脚本的可移植性。`env` 会利用我们第一节讲座中介绍过的 `PATH` 环境变量来进行定位。\n例如，使用了 `env` 的 shebang 看上去是这样的 `#!/usr/bin/env python`。\n\n\nshell 函数和脚本有如下一些不同点：\n\n- 函数只能与 shell 使用相同的语言，脚本可以使用任意语言。因此在脚本中包含 `shebang` 是很重要的。\n- 函数仅在定义时被加载，脚本会在每次被执行时加载。这让函数的加载比脚本略快一些，但每次修改函数定义，都要重新加载一次。\n- 函数会在当前的 shell 环境中执行，脚本会在单独的进程中执行。因此，函数可以对环境变量进行更改，比如改变当前工作目录，脚本则不行。使用 [`export`](https://man7.org/linux/man-pages/man1/export.1p.html) 导出的环境变量会以传值的方式传递给脚本。\n- 与其他程序语言一样，函数可以提高代码模块性、代码复用性并创建清晰性的结构。shell 脚本中往往也会包含它们自己的函数定义。\n\n\n\n# Shell 工具\n\n## 查看命令如何使用\n\n看到这里，您可能会有疑问，我们应该如何为特定的命令找到合适的标记呢？例如 `ls -l`, `mv -i` 和 `mkdir -p`。更普遍的是，给您一个命令行，您应该怎样了解如何使用这个命令行并找出它的不同的选项呢？\n一般来说，您可能会先去网上搜索答案，但是，UNIX 可比 StackOverflow 出现的早，因此我们的系统里其实早就包含了可以获取相关信息的方法。\n\n在上一节中我们介绍过，最常用的方法是为对应的命令行添加 `-h` 或 `--help` 标记。另外一个更详细的方法则是使用 `man` 命令。[`man`](https://man7.org/linux/man-pages/man1/man.1.html) 命令是手册（manual）的缩写，它提供了命令的用户手册。\n\n例如，`man rm` 会输出命令 `rm` 的说明，同时还有其标记列表，包括之前我们介绍过的 `-i`。\n事实上，目前我们给出的所有命令的说明链接，都是网页版的 Linux 命令手册。即使是您安装的第三方命令，前提是开发者编写了手册并将其包含在了安装包中。在交互式的、基于字符处理的终端窗口中，一般也可以通过 `:help` 命令或键入 `?` 来获取帮助。\n\n有时候手册内容太过详实，让我们难以在其中查找哪些最常用的标记和语法。\n[TLDR pages](https://tldr.sh/) 是一个很不错的替代品，它提供了一些案例，可以帮助您快速找到正确的选项。\n\n例如，自己就常常在 tldr 上搜索 [`tar`](https://tldr.ostera.io/tar) 和 [`ffmpeg`](https://tldr.ostera.io/ffmpeg) 的用法。\n\n\n## 查找文件\n\n程序员们面对的最常见的重复任务就是查找文件或目录。所有的类 UNIX 系统都包含一个名为 [`find`](https://man7.org/linux/man-pages/man1/find.1.html) 的工具，它是 shell 上用于查找文件的绝佳工具。`find` 命令会递归地搜索符合条件的文件，例如：\n\n```bash\n# 查找所有名称为src的文件夹\nfind . 
-name src -type d\n# 查找所有文件夹路径中包含test的python文件\nfind . -path '*/test/*.py' -type f\n# 查找前一天修改的所有文件\nfind . -mtime -1\n# 查找所有大小在500k至10M的tar.gz文件\nfind . -size +500k -size -10M -name '*.tar.gz'\n```\n除了列出所寻找的文件之外，find 还能对所有查找到的文件进行操作。这能极大地简化一些单调的任务。\n\n\n```bash\n# 删除全部扩展名为.tmp 的文件\nfind . -name '*.tmp' -exec rm {} \\;\n# 查找全部的 PNG 文件并将其转换为 JPG\nfind . -name '*.png' -exec magick {} {}.jpg \\;\n```\n\n尽管 `find` 用途广泛，它的语法却比较难以记忆。例如，为了查找满足模式 `PATTERN` 的文件，您需要执行 `find -name '*PATTERN*'` (如果您希望模式匹配时是不区分大小写，可以使用 `-iname` 选项）\n\n您当然可以使用 alias 设置别名来简化上述操作，但 shell 的哲学之一便是寻找（更好用的）替代方案。\n记住，shell 最好的特性就是您只是在调用程序，因此您只要找到合适的替代程序即可（甚至自己编写）。\n\n例如，[`fd`](https://github.com/sharkdp/fd) 就是一个更简单、更快速、更友好的程序，它可以用来作为 `find` 的替代品。它有很多不错的默认设置，例如输出着色、默认支持正则匹配、支持 unicode 并且我认为它的语法更符合直觉。以模式 `PATTERN` 搜索的语法是 `fd PATTERN`。\n\n大多数人都认为 `find` 和 `fd` 已经很好用了，但是有的人可能想知道，我们是不是可以有更高效的方法，例如不要每次都搜索文件而是通过编译索引或建立数据库的方式来实现更加快速地搜索。\n\n这就要靠 [`locate`](https://man7.org/linux/man-pages/man1/locate.1.html) 了。\n`locate` 使用一个由 [`updatedb`](https://man7.org/linux/man-pages/man1/updatedb.1.html) 负责更新的数据库，在大多数系统中 `updatedb` 都会通过 [`cron`](https://man7.org/linux/man-pages/man8/cron.8.html) 每日更新。这便需要我们在速度和时效性之间作出权衡。而且，`find` 和类似的工具可以通过别的属性比如文件大小、修改时间或是权限来查找文件，`locate` 则只能通过文件名。 [这里](https://unix.stackexchange.com/questions/60205/locate-vs-find-usage-pros-and-cons-of-each-other) 有一个更详细的对比。\n\n\n## 查找代码\n\n查找文件是很有用的技能，但是很多时候您的目标其实是查看文件的内容。一个最常见的场景是您希望查找具有某种模式的全部文件，并找它们的位置。\n\n为了实现这一点，很多类 UNIX 的系统都提供了 [`grep`](https://man7.org/linux/man-pages/man1/grep.1.html) 命令，它是用于对输入文本进行匹配的通用工具。它是一个非常重要的 shell 工具，我们会在后续的数据清理课程中深入的探讨它。\n\n`grep` 有很多选项，这也使它成为一个非常全能的工具。其中我经常使用的有 `-C` ：获取查找结果的上下文（Context）；`-v` 将对结果进行反选（Invert），也就是输出不匹配的结果。举例来说， `grep -C 5` 会输出匹配结果前后五行。当需要搜索大量文件的时候，使用 `-R` 会递归地进入子目录并搜索所有的文本文件。\n\n但是，我们有很多办法可以对 `grep -R` 进行改进，例如使其忽略 `.git` 文件夹，使用多 CPU 等等。\n\n因此也出现了很多它的替代品，包括 [ack](https://beyondgrep.com/), [ag](https://github.com/ggreer/the_silver_searcher) 和 
[rg](https://github.com/BurntSushi/ripgrep)。它们都特别好用，但是功能也都差不多，我比较常用的是 ripgrep (`rg`) ，因为它速度快，而且用法非常符合直觉。例子如下：\n\n```bash\n# 查找所有使用了 requests 库的文件\nrg -t py 'import requests'\n# 查找所有没有写 shebang 的文件（包含隐藏文件）\nrg -u --files-without-match \"^#\\!\"\n# 查找所有的foo字符串，并打印其之后的5行\nrg foo -A 5\n# 打印匹配的统计信息（匹配的行和文件的数量）\nrg --stats PATTERN\n```\n\n与 `find`/`fd` 一样，重要的是你要知道有些问题使用合适的工具就会迎刃而解，而具体选择哪个工具则不是那么重要。\n\n\n## 查找 shell 命令\n\n目前为止，我们已经学习了如何查找文件和代码，但随着你使用 shell 的时间越来越久，您可能想要找到之前输入过的某条命令。首先，按向上的方向键会显示你使用过的上一条命令，继续按上键则会遍历整个历史记录。\n\n\n`history` 命令允许您以程序员的方式来访问 shell 中输入的历史命令。这个命令会在标准输出中打印 shell 中的历史命令。如果我们要搜索历史记录，则可以利用管道将输出结果传递给 `grep` 进行模式搜索。\n`history | grep find` 会打印包含 find 子串的命令。\n\n对于大多数的 shell 来说，您可以使用 `Ctrl+R` 对命令历史记录进行回溯搜索。敲 `Ctrl+R` 后您可以输入子串来进行匹配，查找历史命令行。\n\n反复按下就会在所有搜索结果中循环。在 [zsh](https://github.com/zsh-users/zsh-history-substring-search) 中，使用方向键上或下也可以完成这项工作。\n\n\n`Ctrl+R` 可以配合 [fzf](https://github.com/junegunn/fzf/wiki/Configuring-shell-key-bindings#ctrl-r) 使用。`fzf` 是一个通用的模糊查找工具，它可以和很多命令一起使用。这里我们可以对历史命令进行模糊查找并将结果以赏心悦目的格式输出。\n\n另外一个和历史命令相关的技巧我喜欢称之为 **基于历史的自动补全**。\n这一特性最初是由 [fish](https://fishshell.com/) shell 创建的，它可以根据您最近使用过的开头相同的命令，动态地对当前的 shell 命令进行补全。这一功能在 [zsh](https://github.com/zsh-users/zsh-autosuggestions) 中也可以使用，它可以极大的提高用户体验。\n\n你可以修改 shell history 的行为，例如，如果在命令的开头加上一个空格，它就不会被加进 shell 记录中。当你输入包含密码或是其他敏感信息的命令时会用到这一特性。\n为此你需要在 `.bashrc` 中添加 `HISTCONTROL=ignorespace` 或者向 `.zshrc` 添加 `setopt HIST_IGNORE_SPACE`。\n如果你不小心忘了在前面加空格，可以通过编辑 `.bash_history` 或 `.zhistory` 来手动地从历史记录中移除那一项。\n\n\n\n## 文件夹导航\n\n之前对所有操作我们都默认一个前提，即您已经位于想要执行命令的目录下，但是如何才能高效地在目录间随意切换呢？有很多简便的方法可以做到，比如设置 alias，使用 [ln -s](https://man7.org/linux/man-pages/man1/ln.1.html) 创建符号连接等。而开发者们已经想到了很多更为精妙的解决方案。\n\n由于本课程的目的是尽可能对你的日常习惯进行优化。因此，我们可以使用 [`fasd`](https://github.com/clvv/fasd) 和 [autojump](https://github.com/wting/autojump) 这两个工具来查找最常用或最近使用的文件和目录。\n\nFasd 基于 [_frecency_ ](https://developer.mozilla.org/en-US/docs/Mozilla/Tech/Places/Frecency_algorithm) 
对文件和目录排序，也就是说它会同时针对频率（_frequency_）和时效（_recency_）进行排序。默认情况下，`fasd` 使用命令 `z` 帮助我们快速切换到最常访问的目录。例如，如果您经常访问 `/home/user/files/cool_project` 目录，那么可以直接使用 `z cool` 跳转到该目录。对于 autojump，则使用 `j cool` 代替即可。\n\n还有一些更复杂的工具可以用来概览目录结构，例如 [`tree`](https://linux.die.net/man/1/tree), [`broot`](https://github.com/Canop/broot) 或更加完整的文件管理器，例如 [`nnn`](https://github.com/jarun/nnn) 或 [`ranger`](https://github.com/ranger/ranger)。\n\n# 课后练习\n[习题解答]({{site.url}}/{{site.solution_url}}/{{page.solution.url}})\n\n1. 阅读 [`man ls`](https://man7.org/linux/man-pages/man1/ls.1.html) ，然后使用 `ls` 命令进行如下操作：\n\n    - 所有文件（包括隐藏文件）\n    - 文件大小以人类可以理解的格式输出 (例如，使用 454M 而不是 454279954)\n    - 文件以最近修改顺序排序\n    - 以彩色文本显示输出结果\n\n    典型输出如下：\n\n    ```\n    -rw-r--r--   1 user group 1.1M Jan 14 09:53 baz\n    drwxr-xr-x   5 user group  160 Jan 14 09:53 .\n    -rw-r--r--   1 user group  514 Jan 14 06:42 bar\n    -rw-r--r--   1 user group 106M Jan 13 12:12 foo\n    drwx------+ 47 user group 1.5K Jan 12 18:08 ..\n    ```\n\n2. 编写两个 bash 函数  `marco` 和 `polo` 执行下面的操作。\n   每当你执行 `marco` 时，当前的工作目录应当以某种形式保存，当执行 `polo` 时，无论现在处在什么目录下，都应当 `cd` 回到当时执行 `marco` 的目录。\n   为了方便 debug，你可以把代码写在单独的文件 `marco.sh` 中，并通过 `source marco.sh` 命令（重新）加载函数。\n3. 假设您有一个命令，它很少出错。因此为了在出错时能够对其进行调试，需要花费大量的时间重现错误并捕获输出。\n   编写一段 bash 脚本，运行如下的脚本直到它出错，将它的标准输出和标准错误流记录到文件，并在最后输出所有内容。\n   加分项：报告脚本在失败前共运行了多少次。\n\n    ```bash\n    #!/usr/bin/env bash\n\n    n=$(( RANDOM % 100 ))\n\n    if [[ n -eq 42 ]]; then\n       echo \"Something went wrong\"\n       >&2 echo \"The error was using magic numbers\"\n       exit 1\n    fi\n\n    echo \"Everything went according to plan\"\n    ```\n\n4. 
本节课我们讲解的 `find` 命令中的 `-exec` 参数非常强大，它可以对我们查找的文件进行操作。但是，如果我们要对所有文件进行操作呢？例如创建一个 zip 压缩文件？我们已经知道，命令行可以从参数或标准输入接收输入。在用管道连接命令时，我们将标准输出和标准输入连接起来，但是有些命令，例如 `tar` 则需要从参数接收输入。这里我们可以使用 [`xargs`](https://man7.org/linux/man-pages/man1/xargs.1.html) 命令，它可以使用标准输入中的内容作为参数。\n   例如 `ls | xargs rm` 会删除当前目录中的所有文件。\n\n    您的任务是编写一个命令，它可以递归地查找文件夹中所有的 HTML 文件，并将它们压缩成 zip 文件。注意，即使文件名中包含空格，您的命令也应该能够正确执行（提示：查看 `xargs` 的参数 `-d`，译注：macOS 上的 `xargs` 没有 `-d`，[查看这个 issue](https://github.com/missing-semester/missing-semester/issues/93)）\n\n    {% comment %}\n    find . -type f -name \"*.html\" | xargs -d '\\n'  tar -cvzf archive.tar.gz\n    {% endcomment %}\n    如果您使用的是 macOS，请注意默认的 BSD `find` 与 [GNU coreutils](https://en.wikipedia.org/wiki/List_of_GNU_Core_Utilities_commands) 中的是不一样的。你可以为 `find` 添加 `-print0` 选项，并为 `xargs` 添加 `-0` 选项。作为 Mac 用户，您需要注意 mac 系统自带的命令行工具和 GNU 中对应的工具是有区别的；如果你想使用 GNU 版本的工具，也可以使用 [brew 来安装](https://formulae.brew.sh/formula/coreutils)。\n\n5. （进阶）编写一个命令或脚本递归地查找文件夹中最近修改的文件。更一般地，你可以按照最近的修改时间列出文件吗？\n"
  },
  {
    "path": "_2020/version-control.md",
    "content": "---\nlayout: lecture\ntitle: \"版本控制(Git)\"\ndate: 2020-01-22\nready: true\nsync: true\nsyncdate: 2025-08-16\nvideo:\n  aspect: 56.25\n  id: 2sjqTHE0zok\nsolution:\n    ready: true\n    url: version-control-solution\n---\n\n版本控制系统 (VCSs) 是一类用于追踪源代码（或其他文件、文件夹）改动的工具。顾名思义，这些工具可以帮助我们管理代码的修改历史；不仅如此，它还可以让协作编码变得更方便。VCS 通过一系列的快照将某个文件夹及其内容保存了起来，每个快照都包含了顶级目录中所有的文件或文件夹的完整状态。同时它还维护了快照创建者的信息以及每个快照的相关信息等等。\n\n为什么说版本控制系统非常有用？即使您只是一个人进行编程工作，它也可以帮您创建项目的快照，记录每个改动的目的、基于多分支并行开发等等。和别人协作开发时，它更是一个无价之宝，您可以看到别人对代码进行的修改，同时解决由于并行开发引起的冲突。\n\n现代的版本控制系统可以帮助您轻松地（甚至自动地）回答以下问题：\n\n- 当前模块是谁编写的？\n- 这个文件的这一行是什么时候被编辑的？是谁作出的修改？修改原因是什么呢？\n- 最近的 1000 个版本中，何时/为什么导致了单元测试失败？\n\n尽管版本控制系统有很多， 其事实上的标准则是 **Git** 。而这篇 [XKCD 漫画](https://xkcd.com/1597/) 则反映出了人们对 Git 的评价：\n\n![xkcd 1597](https://imgs.xkcd.com/comics/git.png)\n\n因为 Git 接口的抽象泄漏（leaky abstraction）问题，通过自顶向下的方式（从命令行接口开始）学习 Git 可能会让人感到非常困惑。很多时候您只能死记硬背一些命令行，然后像使用魔法一样使用它们，一旦出现问题，就只能像上面那幅漫画里说的那样去处理了。\n\n尽管 Git 的接口有些丑陋，但是它的底层设计和思想却是非常优雅的。丑陋的接口只能靠死记硬背，而优雅的底层设计则非常容易被人理解。因此，我们将通过一种自底向上的方式向您介绍 Git。我们会从数据模型开始，最后再学习它的接口。一旦您搞懂了 Git 的数据模型，再学习其接口并理解这些接口是如何操作数据模型的就非常容易了。\n\n# Git 的数据模型\n\n进行版本控制的方法很多。Git 拥有一个经过精心设计的模型，这使其能够支持版本控制所需的所有特性，例如维护历史记录、支持分支和促进协作。\n\n## 快照\n\nGit 将顶级目录中的文件和文件夹作为集合，并通过一系列快照来管理其历史记录。在 Git 的术语里，文件被称作 Blob 对象（数据对象），也就是一组数据。目录则被称之为“树”，它将名字与 Blob 对象或树对象进行映射（使得目录中可以包含其他目录）。快照则是被追踪的最顶层的树。例如，一个树看起来可能是这样的：\n\n```\n<root> (tree)\n|\n+- foo (tree)\n|  |\n|  + bar.txt (blob, contents = \"hello world\")\n|\n+- baz.txt (blob, contents = \"git is wonderful\")\n```\n\n这个顶层的树包含了两个元素，一个名为 \"foo\" 的树（它本身包含了一个 blob 对象 \"bar.txt\"），以及一个 blob 对象 \"baz.txt\"。\n\n## 历史记录建模：关联快照\n\n版本控制系统和快照有什么关系呢？线性历史记录是一种最简单的模型，它包含了一组按照时间顺序线性排列的快照。不过出于种种原因，Git 并没有采用这样的模型。\n\n在 Git 中，历史记录是一个由快照组成的有向无环图。有向无环图，听上去似乎是什么高大上的数学名词。不过不要怕，您只需要知道这代表 Git 中的每个快照都有一系列的“父辈”，也就是其之前的一系列快照。注意，快照具有多个“父辈”而非一个，因为某个快照可能由多个父辈而来。例如，经过合并后的两条分支。\n\n在 Git 中，这些快照被称为“提交”。通过可视化的方式来表示这些历史提交记录时，看起来差不多是这样的：\n\n```\no <-- o <-- o <-- o\n            ^\n             \\\n              --- o <-- 
o\n```\n\n上面是一个 ASCII 码构成的简图，其中的 `o` 表示一次提交（快照）。\n\n箭头指向了当前提交的父辈（这是一种“在...之前”，而不是“在...之后”的关系）。在第三次提交之后，历史记录分岔成了两条独立的分支。这可能因为此时需要同时开发两个不同的特性，它们之间是相互独立的。开发完成后，这些分支可能会被合并并创建一个新的提交，这个新的提交会同时包含这些特性。新的提交会创建一个新的历史记录，看上去像这样（最新的合并提交用粗体标记）：\n\n<pre class=\"highlight\">\n<code>\no <-- o <-- o <-- o <---- <strong> o </strong>\n            ^            /\n             \\          v\n              --- o <-- o\n</code>\n</pre>\n\nGit 中的提交是不可改变的。但这并不代表错误不能被修改，只不过这种“修改”实际上是创建了一个全新的提交记录。而引用（参见下文）则被更新为指向这些新的提交。\n\n## 数据模型及其伪代码表示\n\n以伪代码的形式来学习 Git 的数据模型，可能更加清晰：\n\n```\n// 文件就是一组数据\ntype blob = array<byte>\n\n// 一个包含文件和目录的目录\ntype tree = map<string, tree | blob>\n\n// 每个提交都包含一个父辈，元数据和顶层树\ntype commit = struct {\n    parents: array<commit>\n    author: string\n    message: string\n    snapshot: tree\n}\n```\n\n这是一种简洁的历史模型。\n\n\n## 对象和内存寻址\n\nGit 中的对象可以是 blob、树或提交：\n\n```\ntype object = blob | tree | commit\n```\n\nGit 在储存数据时，所有的对象都会基于它们的 [SHA-1 哈希](https://en.wikipedia.org/wiki/SHA-1) 进行寻址。\n\n```\nobjects = map<string, object>\n\ndef store(object):\n    id = sha1(object)\n    objects[id] = object\n\ndef load(id):\n    return objects[id]\n```\n\nBlobs、树和提交都一样，它们都是对象。当它们引用其他对象时，它们并没有真正的在硬盘上保存这些对象，而是仅仅保存了它们的哈希值作为引用。\n\n例如，[上面](#snapshots) 例子中的树（可以通过 `git cat-file -p 698281bc680d1995c5f4caaf3359721a5a58d48d` 来进行可视化），看上去是这样的：\n\n```\n100644 blob 4448adbf7ecd394f42ae135bbeed9676e894af85    baz.txt\n040000 tree c68d233a33c5c06e0340e4c224f0afca87c8ce87    foo\n```\n\n树本身会包含一些指向其他内容的指针，例如 `baz.txt` (blob) 和 `foo`\n(树)。如果我们用 `git cat-file -p 4448adbf7ecd394f42ae135bbeed9676e894af85`，即通过哈希值查看 baz.txt 的内容，会得到以下信息：\n\n```\ngit is wonderful\n```\n\n## 引用\n\n现在，所有的快照都可以通过它们的 SHA-1 哈希值来标记了。但这也太不方便了，谁也记不住一串 40 位的十六进制字符。\n\n针对这一问题，Git 的解决方法是给这些哈希值赋予人类可读的名字，也就是引用（references）。引用是指向提交的指针。与对象不同的是，它是可变的（引用可以被更新，指向新的提交）。例如，`master` 引用通常会指向主分支的最新一次提交。\n\n```\nreferences = map<string, string>\n\ndef update_reference(name, id):\n    references[name] = id\n\ndef read_reference(name):\n    return 
references[name]\n\ndef load_reference(name_or_id):\n    if name_or_id in references:\n        return load(references[name_or_id])\n    else:\n        return load(name_or_id)\n```\n\n这样，Git 就可以使用诸如 \"master\" 这样人类可读的名称来表示历史记录中某个特定的提交，而不需要再使用一长串十六进制字符了。\n\n有一个细节需要我们注意：通常情况下，我们会想要知道“我们当前所在位置”，并将其标记下来。这样当我们创建新的快照的时候，我们就可以知道它的相对位置（如何设置它的“父辈”）。在 Git 中，我们当前的位置有一个特殊的引用，它就是 \"HEAD\"。\n\n## 仓库\n\n最后，我们可以粗略地给出 Git 仓库的定义了：它就是 `对象` 和 `引用` 的集合。\n\n在硬盘上，Git 仅存储对象和引用：因为其数据模型仅包含这些东西。所有的 `git` 命令都对应着对提交图（有向无环图）的操作，例如增加对象，增加或删除引用。\n\n当您输入某个指令时，请思考一下这条命令是如何对底层的图数据结构进行操作的。另一方面，如果您希望以某种方式修改提交图，例如“丢弃未提交的修改并将 ‘master’ 引用指向提交 `5d83f9e`”，那么很可能有一条命令可以完成该操作（针对这个具体问题，您可以使用 `git checkout master; git reset --hard 5d83f9e`）。\n\n# 暂存区\n\nGit 中还包括一个和数据模型完全不相关的概念，但它却是创建提交的接口的一部分。\n\n就上面介绍的快照系统来说，您也许会期望它的实现里包括一个 “创建快照” 的命令，该命令能够基于当前工作目录的当前状态创建一个全新的快照。有些版本控制系统确实是这样工作的，但 Git 不是。我们希望得到干净的快照，而每次都从当前状态创建快照，效果可能并不理想。例如，考虑如下场景，您开发了两个独立的特性，然后您希望创建两个独立的提交，其中第一个提交仅包含第一个特性，而第二个提交仅包含第二个特性。或者，假设您在调试代码时添加了很多打印语句，然后您仅仅希望提交和修复 bug 相关的代码而丢弃所有的打印语句。\n\nGit 处理这些场景的方法是使用一种叫做 “暂存区（staging area）”的机制，它允许您指定下次快照中要包括哪些改动。\n\n# Git 的命令行接口\n\n为了避免重复信息，我们将不会详细解释以下命令行。强烈推荐您阅读 [Pro Git 中文版](https://git-scm.com/book/zh/v2) 或观看本讲座的视频来学习。\n\n\n## 基础\n\n{% comment %}\n\nThe `git init` command initializes a new Git repository, with repository\nmetadata being stored in the `.git` directory:\n\n```console\n$ mkdir myproject\n$ cd myproject\n$ git init\nInitialized empty Git repository in /home/missing-semester/myproject/.git/\n$ git status\nOn branch master\n\nNo commits yet\n\nnothing to commit (create/copy files and use \"git add\" to track)\n```\n\nHow do we interpret this output? \"No commits yet\" basically means our version\nhistory is empty. 
Let's fix that.\n\n```console\n$ echo \"hello, git\" > hello.txt\n$ git add hello.txt\n$ git status\nOn branch master\n\nNo commits yet\n\nChanges to be committed:\n  (use \"git rm --cached <file>...\" to unstage)\n\n        new file:   hello.txt\n\n$ git commit -m 'Initial commit'\n[master (root-commit) 4515d17] Initial commit\n 1 file changed, 1 insertion(+)\n create mode 100644 hello.txt\n```\n\nWith this, we've `git add` ed a file to the staging area, and then `git\ncommit`ed that change, adding a simple commit message \" Initial commit \". If we\ndidn't specify a `-m` option, Git would open our text editor to allow us type a\ncommit message.\n\nNow that we have a non-empty version history, we can visualize the history.\nVisualizing the history as a DAG can be especially helpful in understanding the\ncurrent status of the repo and connecting it with your understanding of the Git\ndata model.\n\nThe `git log` command visualizes history. By default, it shows a flattened\nversion, which hides the graph structure. 
If you use a command like `git log\n--all --graph --decorate`, it will show you the full version history of the\nrepository, visualized in graph form.\n\n```console\n$ git log --all --graph --decorate\n* commit 4515d17a167bdef0a91ee7d50d75b12c9c2652aa (HEAD -> master)\n  Author: Missing Semester <missing-semester@mit.edu>\n  Date:   Tue Jan 21 22:18:36 2020 -0500\n\n      Initial commit\n```\n\nThis doesn't look all that graph-like, because it only contains a single node.\nLet's make some more changes, author a new commit, and visualize the history\nonce more.\n\n```console\n$ echo \"another line\" >> hello.txt\n$ git status\nOn branch master\nChanges not staged for commit:\n  (use \"git add <file>...\" to update what will be committed)\n  (use \"git checkout -- <file>...\" to discard changes in working directory)\n\n        modified:   hello.txt\n\nno changes added to commit (use \"git add\" and/or \"git commit -a\")\n$ git add hello.txt\n$ git status\nOn branch master\nChanges to be committed:\n  (use \"git reset HEAD <file>...\" to unstage)\n\n        modified:   hello.txt\n\n$ git commit -m 'Add a line'\n[master 35f60a8] Add a line\n 1 file changed, 1 insertion(+)\n```\n\nNow, if we visualize the history again, we'll see some of the graph structure:\n\n```\n* commit 35f60a825be0106036dd2fbc7657598eb7b04c67 (HEAD -> master)\n| Author: Missing Semester <missing-semester@mit.edu>\n| Date:   Tue Jan 21 22:26:20 2020 -0500\n|\n|     Add a line\n|\n* commit 4515d17a167bdef0a91ee7d50d75b12c9c2652aa\n  Author: Anish Athalye <me@anishathalye.com>\n  Date:   Tue Jan 21 22:18:36 2020 -0500\n\n      Initial commit\n```\n\nAlso, note that it shows the current HEAD, along with the current branch\n(master).\n\nWe can look at old versions using the `git checkout` command.\n\n```console\n$ git checkout 4515d17  # previous commit hash; yours will be different\nNote: checking out '4515d17'.\n\nYou are in 'detached HEAD' state. 
You can look around, make experimental\nchanges and commit them, and you can discard any commits you make in this\nstate without impacting any branches by performing another checkout.\n\nIf you want to create a new branch to retain commits you create, you may\ndo so (now or later) by using -b with the checkout command again. Example:\n\n  git checkout -b <new-branch-name>\n\nHEAD is now at 4515d17 Initial commit\n$ cat hello.txt\nhello, git\n$ git checkout master\nPrevious HEAD position was 4515d17 Initial commit\nSwitched to branch 'master'\n$ cat hello.txt\nhello, git\nanother line\n```\n\nGit can show you how files have evolved (differences, or diffs) using the `git\ndiff` command:\n\n```console\n$ git diff 4515d17 hello.txt\ndiff --git c/hello.txt w/hello.txt\nindex 94bab17..f0013b2 100644\n--- c/hello.txt\n+++ w/hello.txt\n@@ -1 +1,2 @@\n hello, git\n +another line\n```\n\n{% endcomment %}\n\n- `git help <command>`: 获取 git 命令的帮助信息\n- `git init`: 创建一个新的 git 仓库，其数据会存放在一个名为 `.git` 的目录下\n- `git status`: 显示当前的仓库状态\n- `git add <filename>`: 添加文件到暂存区\n- `git commit`: 创建一个新的提交\n    - 如何编写 [良好的提交信息](https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html)!\n    - 为何要 [编写良好的提交信息](https://chris.beams.io/posts/git-commit/)\n- `git log`: 显示历史日志\n- `git log --all --graph --decorate`: 可视化历史记录（有向无环图）\n- `git diff <filename>`: 显示与暂存区文件的差异\n- `git diff <revision> <filename>`: 显示某个文件两个版本之间的差异\n- `git checkout <revision>`: 更新 HEAD（如果是检出分支则同时更新当前分支）\n\n## 分支和合并\n\n{% comment %}\n\nBranching allows you to \"fork\" version history. It can be helpful for working\non independent features or bug fixes in parallel. The `git branch` command can\nbe used to create new branches; `git checkout -b <branch name>` creates and\nbranch and checks it out.\n\nMerging is the opposite of branching: it allows you to combine forked version\nhistories, e.g. merging a feature branch back into master. 
The `git merge`\ncommand is used for merging.\n\n{% endcomment %}\n\n- `git branch`: 显示分支\n- `git branch <name>`: 创建分支\n- `git checkout -b <name>`: 创建分支并切换到该分支\n    - 相当于 `git branch <name>; git checkout <name>`\n- `git merge <revision>`: 合并到当前分支\n- `git mergetool`: 使用工具来处理合并冲突\n- `git rebase`: 将一系列补丁变基（rebase）为新的基线\n\n## 远端操作\n\n- `git remote`: 列出远端\n- `git remote add <name> <url>`: 添加一个远端\n- `git push <remote> <local branch>:<remote branch>`: 将对象传送至远端并更新远端引用\n- `git branch --set-upstream-to=<remote>/<remote branch>`: 创建本地和远端分支的关联关系\n- `git fetch`: 从远端获取对象/引用\n- `git pull`: 相当于 `git fetch; git merge`\n- `git clone`: 从远端下载仓库\n\n## 撤销\n\n- `git commit --amend`: 编辑提交的内容或信息\n- `git reset HEAD <file>`: 取消暂存文件\n- `git checkout -- <file>`: 丢弃修改\n- `git restore`: 自 Git 2.23 起，可代替 `git reset` 和 `git checkout` 完成许多撤销操作\n\n# Git 高级操作\n\n- `git config`: Git 是一个 [高度可定制的](https://git-scm.com/docs/git-config) 工具\n- `git clone --depth=1`: 浅克隆（shallow clone），不包括完整的版本历史信息\n- `git add -p`: 交互式暂存\n- `git rebase -i`: 交互式变基\n- `git blame`: 查看最后修改某行的人\n- `git stash`: 暂时移除工作目录下的修改内容\n- `git bisect`: 通过二分查找搜索历史记录\n- `.gitignore`: [指定](https://git-scm.com/docs/gitignore) 故意不追踪的文件\n\n# 杂项\n\n- **图形用户界面**: Git 的 [图形用户界面客户端](https://git-scm.com/downloads/guis) 有很多，但是我们自己并不使用这些图形用户界面的客户端，我们选择使用命令行接口\n- **Shell 集成**: 将 Git 状态集成到您的 shell 中会非常方便（[zsh](https://github.com/olivierverdier/zsh-git-prompt)、[bash](https://github.com/magicmonty/bash-git-prompt)）。[Oh My Zsh](https://github.com/ohmyzsh/ohmyzsh) 这样的框架中一般已经集成了这一功能\n- **编辑器集成**: 和上面一条类似，将 Git 集成到编辑器中好处多多。[fugitive.vim](https://github.com/tpope/vim-fugitive) 是 Vim 中集成 Git 的常用插件\n- **工作流**: 我们已经讲解了数据模型与一些基础命令，但还没讨论到进行大型项目时的一些惯例 (\n有 [很多](https://nvie.com/posts/a-successful-git-branching-model/)\n[不同的](https://www.endoflineblog.com/gitflow-considered-harmful)\n[处理方法](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow))\n- **GitHub**: Git 并不等同于 GitHub。 在 GitHub 中您需要使用一个被称作 [拉取请求（pull 
request）](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) 的方法来向其他项目贡献代码\n- **其他 Git 提供商**: GitHub 并不是唯一的。还有像 [GitLab](https://about.gitlab.com/) 和 [BitBucket](https://bitbucket.org/) 这样的平台。\n\n# 资源\n\n- [Pro Git](https://git-scm.com/book/en/v2) ，**强烈推荐**！学习前五章的内容可以教会您流畅使用 Git 的绝大多数技巧，因为您已经理解了 Git 的数据模型。后面的章节提供了很多有趣的高级主题。（[Pro Git 中文版](https://git-scm.com/book/zh/v2)）；\n- [Oh Shit, Git!?!](https://ohshitgit.com/) ，简短的介绍了如何从 Git 错误中恢复；\n- [Git for Computer Scientists](https://eagain.net/articles/git-for-computer-scientists/) ，简短的介绍了 Git 的数据模型，与本文相比包含较少量的伪代码以及大量的精美图片；\n- [Git from the Bottom Up](https://jwiegley.github.io/git-from-the-bottom-up/) 详细的介绍了 Git 的实现细节，而不仅仅局限于数据模型。好奇的同学可以看看；\n- [How to explain git in simple words](https://smusamashah.github.io/blog/2017/10/14/explain-git-in-simple-words)；\n- [Learn Git Branching](https://learngitbranching.js.org/) 通过基于浏览器的游戏来学习 Git ；\n\n\n# 课后练习\n\n[习题解答]({{site.url}}/{{site.solution_url}}/{{page.solution.url}})\n1. 如果您之前从来没有用过 Git，推荐您阅读 [Pro Git](https://git-scm.com/book/en/v2) 的前几章，或者完成像 [Learn Git Branching](https://learngitbranching.js.org/) 这样的教程。重点关注 Git 命令和数据模型相关内容；\n2. 克隆 [本课程网站的仓库](https://github.com/missing-semester-cn/missing-semester-cn.github.io.git)\n    1. 将版本历史可视化并进行探索\n    2. 是谁最后修改了 `README.md` 文件？（提示：使用 `git log` 命令并添加合适的参数）\n    3. 最后一次修改 `_config.yml` 文件中 `collections:` 行时的提交信息是什么？（提示：使用 `git blame` 和 `git show`）\n3. 使用 Git 时的一个常见错误是提交本不应该由 Git 管理的大文件，或是将含有敏感信息的文件提交给 Git 。尝试向仓库中添加一个文件并添加提交信息，然后将其从历史中删除 ( [这篇文章也许会有帮助](https://help.github.com/articles/removing-sensitive-data-from-a-repository/))；\n4. 从 GitHub 上克隆某个仓库，修改一些文件。当您使用 `git stash` 会发生什么？当您执行 `git log --all --oneline` 时会显示什么？通过 `git stash pop` 命令来撤销 `git stash` 操作，什么时候会用到这一技巧？\n5. 与其他的命令行工具一样，Git 也提供了一个名为 `~/.gitconfig` 配置文件 (或 dotfile)。请在 `~/.gitconfig` 中创建一个别名，使您在运行 `git graph` 时，您可以得到 `git log --all --graph --decorate --oneline` 的输出结果；\n6. 
您可以通过执行 `git config --global core.excludesfile ~/.gitignore_global` 来设置全局忽略文件的位置，这会告诉 Git 使用该文件，但您仍需要手动在该路径创建 `~/.gitignore_global` 文件。配置您的全局 gitignore 文件来自动忽略系统或编辑器的临时文件，例如 `.DS_Store`；\n7. Fork [本课程网站的仓库](https://github.com/missing-semester-cn/missing-semester-cn.github.io.git)，找找有没有错别字或其他可以改进的地方，在 GitHub 上发起拉取请求（Pull Request）。\n"
  },
  {
    "path": "_config.yml",
    "content": "# Setup\ntitle: 'the missing semester of your cs education'\nurl: https://missing-semester-cn.github.io\nsolution_url: missing-notes-and-solutions/2020/solutions/\n# Settings\nmarkdown: kramdown\nkramdown:\n  input: GFM\n  hard_wrap: false\nhighlighter: rouge\npermalink: /:title/\nfuture: true\n# safe: true # breaks local rendering if enabled\ntimezone: America/New_York\nanalytics:\n  tracking_id: UA-53167467-11\n\ncollections:\n  '2019':\n    output: true\n  '2020':\n    output: true\n\n# Excludes\nexclude:\n  - README.md\n  - Gemfile\n  - Gemfile.lock\n"
  },
  {
    "path": "_includes/head.html",
    "content": "<head>\n  <meta charset=\"utf-8\">\n\n  <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n  <link rel=\"apple-touch-icon\" sizes=\"180x180\" href=\"/apple-touch-icon.png\">\n  <link rel=\"icon\" type=\"image/png\" sizes=\"32x32\" href=\"/favicon-32x32.png\">\n  <link rel=\"icon\" type=\"image/png\" sizes=\"16x16\" href=\"/favicon-16x16.png\">\n\n  <meta name=\"twitter:card\" content=\"summary_large_image\">\n  <meta name=\"og:url\" content=\"{{ page.url | prepend: site.url }}\">\n  <meta property=\"og:site_name\" content=\"{{ site.title }}\">\n\n  {% if page.description %}\n    {% assign description = page.description | strip_newlines %}\n    <meta name=\"description\" content=\"{{ description }}\">\n    <meta name=\"twitter:description\" content=\"{{ description }}\">\n    <meta property=\"og:description\" content=\"{{ description }}\">\n  {% endif %}\n\n  {% if page.short_title %}\n    {% assign title = page.short_title %}\n  {% elsif page.title %}\n    {% assign title = page.title %}\n  {% else %}\n    {% assign title = site.title %}\n  {% endif %}\n  <meta name=\"twitter:title\" content=\"{{ title }}\">\n  <meta property=\"og:title\" content=\"{{ title }}\">\n\n  {% if page.thumbnail %}\n    <meta name=\"twitter:image:src\" content=\"{{ page.thumbnail | prepend: site.url }}\">\n    <meta property=\"og:image\" content=\"{{ page.thumbnail | prepend: site.url }}\">\n  {% endif %}\n\n  <title>\n    {% if page.title %}\n      {{ page.title }} &middot; {{ site.title }}\n    {% else %}\n      {{ site.title }}\n    {% endif %}\n  </title>\n\n  <link rel=\"stylesheet\" href=\"/static/css/main.css\">\n  <link rel=\"stylesheet\" href=\"/static/css/syntax.css\">\n  <link href='https://fonts.googleapis.com/css?family=Source+Sans+Pro:200,300,400,600,700,900,200italic,300italic,400italic,600italic,700italic,900italic&subset=latin,latin-ext' rel='stylesheet' type='text/css'>\n  <link 
href='https://fonts.googleapis.com/css?family=Source+Code+Pro:200,300,400,500,600,700,900&subset=latin,latin-ext' rel='stylesheet' type='text/css'>\n\n  <script async src=\"https://www.googletagmanager.com/gtag/js?id={{ site.analytics.tracking_id }}\"></script>\n  <script>\n    window.dataLayer = window.dataLayer || [];\n    function gtag(){dataLayer.push(arguments);}\n    gtag('js', new Date());\n\n    gtag('config', '{{ site.analytics.tracking_id }}');\n  </script>\n\n</head>\n"
  },
  {
    "path": "_includes/nav.html",
    "content": "<div id=\"nav-bg\">\n  <nav id=\"top-nav\">\n  <input type=\"checkbox\" id=\"menu-icon\">\n    <label class=\"menu-label\" for=\"menu-icon\"></label>\n    <a href=\"/\" id=\"logo\">./missing-semester</a>\n    \n    <div class=\"trigger\">\n      <div class=\"trigger-child\">\n        <span class=\"nav-link\"><a href=\"/2020/\">讲座列表</a></span>\n        <span class=\"nav-link\"><a href=\"/about/\">关于本课程</a></span>\n        <span class=\"nav-link\"><a href=\"https://missing.csail.mit.edu/\">English</a></span>\n        <span class=\"nav-link\"><a href=\"https://missing-semester-cn.github.io/missing-notes-and-solutions/\">习题解答</a></span>\n        <span style=\"float: right;margin: 3px;padding-right:15px\"><a  href=\"https://github.com/missing-semester-cn/missing-semester-cn.github.io/stargazers\">\n        <img alt=\"GitHub stars\" src=\"https://img.shields.io/github/stars/missing-semester-cn/missing-semester-cn.github.io?style=social\"></a></span>\n     <span style=\"float: right;margin: 3px;padding-right:15px\"> <a  href=\"https://github.com/missing-semester-cn/missing-semester-cn.github.io\">\n     <img alt=\"GitHub forks\" src=\"https://img.shields.io/github/forks/missing-semester-cn/missing-semester-cn.github.io?style=social\"></a></span>\n\n         \n      </div>\n    </div>\n  </nav>\n</div>\n{% comment %} <div class=\"ribbon\">\n  <a href=\"https://github.com/missing-semester-cn/missing-semester-cn.github.io\" target=\"_blank\"  >Edit on GitHub</a>\n</div>  {% endcomment %}\n"
  },
  {
    "path": "_includes/scaled_image.html",
    "content": "<a href=\"{% if include.href %}{{ include.href }}{% else %}{{ include.src }}{% endif %}\">\n  <img src=\"{{ include.src }}\" alt=\"{{ include.alt }}\" style=\"width: 100%; max-width: {{ include.width }}px; max-height: 100%;\">\n</a>\n"
  },
  {
    "path": "_includes/scaled_video.html",
    "content": "<video src=\"{{ include.src }}\" {{ include.options }} controls class=\"{{ include.class }}\" style=\"width: 100%; max-width: {{ include.width }}px; max-height: 100%;\">\n</video>\n"
  },
  {
    "path": "_includes/video.html",
    "content": "<video src=\"{{ include.src }}\" {{ include.options }} controls class=\"{{ include.class }}\">\n</video>\n"
  },
  {
    "path": "_layouts/default.html",
    "content": "<!DOCTYPE html>\n<html lang=\"en\">\n\n  {% include head.html %}\n\n  <body>\n\n    {% include nav.html %}\n\n    <div id=\"content\">\n    {{ content }}\n    </div>\n\n  </body>\n\n</html>\n"
  },
  {
    "path": "_layouts/lecture.html",
    "content": "---\nlayout: default\n---\n\n<h1 class=\"title\">{{ page.title }}{% if page.subtitle %} <span class=\"subtitle\">{{ page.subtitle }}</span>{% endif %}</h1>\n\n{% if page.video.id %}\n  <div class=\"youtube-wrapper\" style=\"padding-bottom: {{ page.video.aspect }}%;\">\n    <iframe src=\"https://www.youtube.com/embed/{{ page.video.id }}\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen></iframe>\n  </div>\n{% elsif page.video %}\n  <p class=\"center gap accent\"><strong>Lecture video coming soon!</strong></p>\n{% endif %}\n\n{{ content }}\n\n<hr>\n\n<div class=\"small center\">\n<p><a href=\"https://github.com/missing-semester-cn/missing-semester-cn/blob/master/{{ page.path }}\">Edit this page</a>.</p>\n<p>Licensed under <a href=\"https://creativecommons.org/licenses/by-nc-sa/4.0\">CC BY-NC-SA</a>.</p>\n</div>\n"
  },
  {
    "path": "_layouts/page.html",
    "content": "---\nlayout: default\n---\n\n<h1 class=\"title\">{{ page.title }}{% if page.subtitle %} <span class=\"subtitle\">{{ page.subtitle }}</span>{% endif %}</h1>\n\n{{ content }}\n"
  },
  {
    "path": "_layouts/redirect.html",
    "content": "---\nlayout: null\n---\n<!DOCTYPE html>\n<html>\n  <head>\n    <meta charset=\"utf-8\">\n    <title>\n      {{ site.title }} -- {{ page.title }}\n    </title>\n    <meta http-equiv=\"refresh\" content=\"0; url={{ page.redirect }}\">\n  </head>\n  <body>\n    <script>\n      window.location = '{{ page.redirect }}';\n    </script>\n    <p>Redirecting you to <a href=\"{{ page.redirect }}\">{{ page.redirect }}</a></p>\n  </body>\n</html>\n"
  },
  {
    "path": "about.md",
    "content": "---\nlayout: lecture\ntitle: \"开设此课程的动机\"\n---\n\n在传统的计算机科学课程中，从操作系统、编程语言到机器学习，这些高大上课程和主题已经非常多了。\n然而有一个至关重要的主题却很少被专门讲授，而是留给学生们自己去探索。 这部分内容就是：精通工具。\n\n这些年，我们在麻省理工学院参与了许多课程的助教活动，过程当中愈发意识到很多学生对于工具的了解知之甚少。\n计算机设计的初衷就是任务自动化，然而学生们却常常陷在大量的重复任务中，或者无法完全发挥出诸如\n版本控制、文本编辑器等工具的强大作用。效率低下和浪费时间还是其次，更糟糕的是，这还可能导致数据丢失或\n无法完成某些特定任务。\n\n这些主题不是大学课程的一部分：学生一直都不知道如何使用这些工具，或者说，至少是不知道如何高效\n地使用，因此浪费了时间和精力在本来可以更简单的任务上。标准的计算机科学课程缺少了这门能让计算\n变得更简捷的关键课程。\n\n# The missing semester of your CS education\n\n为了解决这个问题，我们开设了一个课程，涵盖各项对成为高效率计算机科学家或程序员至关重要的\n主题。这个课程实用且具有很强的实践性，提供了各种能够立即广泛应用解决问题的趁手工具指导。\n该课在 2020 年 1 月“独立活动期”开设，为期一个月，是学生开办的短期课程。虽然该课程针对\n麻省理工学院，但我们公开提供了全部课程的录制视频与相关资料。\n\n如果该课程适合你，那么以下还有一些具体的课程示例：\n\n## 命令行与 shell 工具\n\n如何使用别名、脚本和构建系统来自动化执行通用重复的任务。不再总是从文档中拷贝粘贴\n命令。不要再“逐个执行这 15 个命令”，不要再“你忘了执行这个命令”、“你忘了传那个\n参数”，类似的对话不要再有了。\n\n例如，快速搜索历史记录可以节省大量时间。在下面这个示例中，我们展示了如何通过`convert`命令\n在历史记录中跳转的一些技巧。\n\n<video autoplay=\"autoplay\" loop=\"loop\" controls muted playsinline  oncontextmenu=\"return false;\"  preload=\"auto\"  class=\"demo\">\n  <source src=\"/static/media/demos/history.mp4\" type=\"video/mp4\">\n</video>\n\n## 版本控制\n\n如何**正确地**使用版本控制，利用它避免尴尬的情况发生。与他人协作，并且能够快速定位\n有问题的提交\n不再大量注释代码。不再为解决 bug 而找遍所有代码。不再“我去，刚才是删了有用的代码？！”。\n我们将教你如何通过拉取请求来为他人的项目贡献代码。\n\n下面这个示例中，我们使用`git bisect`来定位哪个提交破坏了单元测试，并且通过`git revert`来进行修复。\n\n<video autoplay=\"autoplay\" loop=\"loop\" controls muted playsinline  oncontextmenu=\"return false;\"  preload=\"auto\"  class=\"demo\">\n  <source src=\"/static/media/demos/git.mp4\" type=\"video/mp4\">\n</video>\n\n## 文本编辑\n\n不论是本地还是远程，如何通过命令行高效地编辑文件，并且充分利用编辑器特性。不再来回复制文件。不再重复编辑文件。\n\nVim 的宏是它最好的特性之一，在下面这个示例中，我们使用嵌套的 Vim 宏快速地将 html 表格转换成了 csv 格式。\n\n<video autoplay=\"autoplay\" loop=\"loop\" controls muted playsinline  oncontextmenu=\"return false;\"  preload=\"auto\"  class=\"demo\">\n  <source src=\"/static/media/demos/vim.mp4\" type=\"video/mp4\">\n</video>\n\n## 远程服务器\n\n使用 SSH 
密钥连接远程机器进行工作时如何保持连接，并且让终端能够复用。不再为了仅执行个别命令\n总是打开许多命令行终端。不再每次连接都总输入密码。不再因为网络断开或必须重启笔记本时\n就丢失全部上下文。\n\n以下示例，我们使用`tmux`来保持远程服务器的会话存在，并使用`mosh`来支持网络漫游和断开连接。\n\n<video autoplay=\"autoplay\" loop=\"loop\" controls muted playsinline  oncontextmenu=\"return false;\"  preload=\"auto\"  class=\"demo\">\n  <source src=\"/static/media/demos/ssh.mp4\" type=\"video/mp4\">\n</video>\n\n## 查找文件\n\n如何快速查找你需要的文件。不再挨个点击项目中的文件，直到找到你所需的代码。\n\n以下示例，我们通过`fd`快速查找文件，通过`rg`找代码片段。我们也用到了`fasd`快速`cd`并`vim`最近/常用的文件/文件夹。\n\n<video autoplay=\"autoplay\" loop=\"loop\" controls muted playsinline  oncontextmenu=\"return false;\"  preload=\"auto\"  class=\"demo\">\n  <source src=\"/static/media/demos/find.mp4\" type=\"video/mp4\">\n</video>\n\n## 数据处理\n\n如何通过命令行直接轻松快速地修改、查看、解析、绘制和计算数据和文件。不再从日志文件拷贝\n粘贴。不再手动统计数据。不再用电子表格画图。\n\n## 虚拟机\n\n如何使用虚拟机尝试新操作系统，隔离无关的项目，并且保持宿主机整洁。不再因为做安全实验而\n意外损坏你的计算机。不再有大量随机安装的不同版本软件包。\n\n## 安全\n\n如何在不泄露隐私的情况下畅游互联网。不再抓破脑袋想符合自己疯狂规则的密码。不再连接不安全的开放 WiFi 网络。不再传输未加密的信息。\n\n# 结论\n\n这 12 节课将包括但不限于以上内容，同时每堂课都提供了能帮助你熟悉这些工具的练手小测验。如果不能\n等到一月，你也可以看下[黑客工具](https://hacker-tools.github.io/lectures/)，这是我们去年的\n试讲。它是本课程的前身，包含许多相同的主题。\n\n无论面对面还是远程在线，欢迎你的参与。\n\nHappy hacking,<br>\nAnish, Jose, and Jon\n"
  },
  {
    "path": "index.md",
    "content": "---\nlayout: page\ntitle: 计算机教育中缺失的一课\n---\n\n# The Missing Semester of Your CS Education 中文版\n\n大学里的计算机课程通常专注于讲授从操作系统到机器学习这些学院派的课程或主题，而对于如何精通工具这一主题则往往会留给学生自行探索。在这个系列课程中，我们讲授命令行、强大的文本编辑器的使用、使用版本控制系统提供的多种特性等等。学生在他们受教育阶段就会和这些工具朝夕相处（在他们的职业生涯中更是这样）。\n\n因此，花时间打磨使用这些工具的能力并能够最终熟练地、流畅地使用它们是非常有必要的。\n\n精通这些工具不仅可以帮助您更快的使用工具完成任务，并且可以帮助您解决在之前看来似乎无比复杂的问题。\n\n关于 [开设此课程的动机](/about/)。\n\n{% comment %}\n\n# Registration\n\nSign up for the IAP 2020 class by filling out this [registration form](https://forms.gle/TD1KnwCSV52qexVt9).\n{% endcomment %}\n\n# 日程 <span style=\"float:right\"><img src = \"https://img.shields.io/badge/文档同步时间-2021--04--24-blue\"></span>\n\n<ul>\n{% assign lectures = site['2020'] | sort: 'date' %}\n{% for lecture in lectures %}\n    {% if lecture.phony != true and lecture.solution !=true  %}\n        <li>\n        <strong>{{ lecture.date | date: '%-m/%d' }}</strong>:\n        {% if lecture.ready%}\n            <a href=\"{{ lecture.url }}\">{{ lecture.title }}</a><span style=\"float:right\"><img src = \"https://img.shields.io/badge/Chinese-✔-green\"></span>\n        {% else %}\n             <a href=\"{{ lecture.url }}\">{{ lecture.title }}  {% if lecture.noclass %}[no class]{% endif %}</a><span style=\"float:right\"><img src = \"https://img.shields.io/badge/Chinese-✘-orange\"></span>\n        {% endif %}\n        {% if lecture.sync %}\n           <span style=\"float:right\"><img src = \"https://img.shields.io/badge/Update-✔-green\"></span>\n        {% else %}\n           <span style=\"float:right\"><img src = \"https://img.shields.io/badge/Update-✘-orange\"></span>\n        {% endif %}\n        {% if lecture.solution.ready%}\n        <span style=\"float:right\"><a href=\"{{site.url}}/{{site.solution_url}}/{{lecture.solution.url}}\"><img src = \"https://img.shields.io/badge/Solution-✔-green\"></a></span>\n            {% else %}\n            <span style=\"float:right\"><img src = \"https://img.shields.io/badge/Solution-✘-orange\"></span>\n           
 {% endif %}\n        </li>\n    {% endif %}\n{% endfor %}\n</ul>\n\n讲座视频可以在 [\nYouTube](https://www.youtube.com/playlist?list=PLyzOVJj3bHQuloKGG59rS43e29ro7I57J) 上找到。\n\n# 关于本课程\n\n**教员**：本课程由 [Anish](https://www.anishathalye.com/)、[Jon](https://thesquareplanet.com/) 和 [Jose](http://josejg.com/) 讲授。\n\n**问题**：请通过 [missing-semester@mit.edu](mailto:missing-semester@mit.edu) 联系我们。\n\n# 在 MIT 之外\n\n我们也将本课程分享到了 MIT 之外，希望其他人也能受益于这些资源。您可以在下面这些地方找到相关文章和讨论。\n\n - [Hacker News](https://news.ycombinator.com/item?id=22226380)\n - [Lobsters](https://lobste.rs/s/ti1k98/missing_semester_your_cs_education_mit)\n - [/r/learnprogramming](https://www.reddit.com/r/learnprogramming/comments/eyagda/the_missing_semester_of_your_cs_education_mit/)\n - [/r/programming](https://www.reddit.com/r/programming/comments/eyagcd/the_missing_semester_of_your_cs_education_mit/)\n - [Twitter](https://twitter.com/jonhoo/status/1224383452591509507)\n - [YouTube](https://www.youtube.com/playlist?list=PLyzOVJj3bHQuloKGG59rS43e29ro7I57J)\n\n# 译文\n\n- [繁体中文](https://missing-semester-zh-hant.github.io/)\n- [Japanese](https://missing-semester-jp.github.io/)\n- [Korean](https://missing-semester-kr.github.io/)\n- [Portuguese](https://missing-semester-pt.github.io/)\n- [Russian](https://missing-semester-rus.github.io/)\n- [Serbian](https://netboxify.com/missing-semester/)\n- [Spanish](https://missing-semester-esp.github.io/)\n- [Turkish](https://missing-semester-tr.github.io/)\n- [Vietnamese](https://missing-semester-vn.github.io/)\n\n注意：上述链接为社区翻译，我们并未验证其内容。\n\n## 致谢\n\n感谢 Elaine Mello, Jim Cain 以及 [MIT Open Learning](https://openlearning.mit.edu/) 帮助我们录制讲座视频。\n\n感谢 Anthony Zolnik 和 [MIT AeroAstro](https://aeroastro.mit.edu/) 提供 A/V 设备。\n\n感谢 Brandi Adams 和 [MIT EECS](https://www.eecs.mit.edu/) 对本课程的支持。\n\n---\n\n<div class=\"small center\">\n<p><a href=\"https://github.com/missing-semester-cn/missing-semester-cn\">Source code</a>.</p>\n<p>Licensed under CC BY-NC-SA.</p>\n<p>See <a href=\"/license\">here</a> 
for contribution &amp; translation guidelines.</p>\n</div>\n"
  },
  {
    "path": "lectures.html",
    "content": "---\nlayout: redirect\nredirect: /2020/\ntitle: Lectures\n---\n"
  },
  {
    "path": "license.md",
    "content": "---\nlayout: default\ntitle: \"License\"\npermalink: /license\n---\n\n# License\n\nAll the content in this course, including the website source code, lecture notes, exercises, and lecture videos is licensed under Attribution-NonCommercial-ShareAlike 4.0 International [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).\n\nThis means that you are free to:\n- **Share** — copy and redistribute the material in any medium or format\n- **Adapt** — remix, transform, and build upon the material\n\nUnder the following terms:\n\n- **Attribution** — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.\n- **NonCommercial** — You may not use the material for commercial purposes.\n- **ShareAlike** — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.\n\nThis is a human-readable summary of (and not a substitute for) the [license](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).\n\n## Contribution guidelines\n\nYou can submit corrections and suggestions to the course material by submitting issues and pull requests on our GitHub [repo](https://github.com/missing-semester/missing-semester). This includes the captions for the video lectures which are also in the repo (see [here](https://github.com/missing-semester/missing-semester/tree/master/static/files/subtitles/2020)).\n\n## Translation guidelines\n\nYou are free to translate the lecture notes and exercises as long as you follow the license terms.\nIf your translation mirrors the course structure, please contact us so we can link your translated version from our page.\n\nFor translating the video captions, please submit your translations as community contributions in YouTube.\n\n"
  },
  {
    "path": "robots.txt",
    "content": "User-agent: *\nDisallow:\n"
  },
  {
    "path": "static/css/main.css",
    "content": "/* Copyright (c) 2017 Anish Athalye */\n@import url(https://fonts.googleapis.com/css?family=Source+Sans+Pro);\n@import url(https://fonts.googleapis.com/css?family=Source+Code+Pro);\n\n/* Basic styling */\n\n* {\n  box-sizing: border-box;\n  margin: 0;\n  padding: 0;\n  text-rendering: geometricPrecision;\n}\n\nhtml {\n/*  font-size: 14px;\n  font-family: \"Source Sans Pro\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;*/\n  font-family: \"Source Sans Pro\", sans-serif;\n  font-size: 14pt;\n  line-height: 1.5;\n}\n\n@media(min-width: 480px) {\n  html {\n    /*font-size: 16px;*/\n    font-size: 14pt;\n  }\n}\n\nbody {\n  margin: 0;\n  color: #000;\n  background-color: #fff;\n  overflow-y: scroll;\n}\n\nh1, h2, h3, h4, h5, h6 {\n  margin-bottom: 1rem;\n  font-weight: bold;\n  /*text-decoration: underline;*/\n  line-height: 1.25;\n  font-size: 1rem;\n}\n\nh1 {\n  margin-top: 1.25rem;\n  font-size: 1.5rem;\n}\n\nh2 {\n  margin-top: 1.25rem;\n  font-size: 1.1rem;\n}\n\np {\n  margin-top: 0;\n  margin-bottom: 1rem;\n}\n\nstrong {\n  font-weight: bold;\n}\n\nem {\n  font-style: italic;\n}\n\nul {\n  list-style-position: inside;\n  padding-left: 1rem;\n}\n\nol {\n  margin-left: 1rem;\n}\n\nli > ul {\n  padding-left: 2rem;\n}\n\nul li {\n  list-style-type: none;\n}\n\nul, ol {\n  margin-bottom: 1rem;\n}\n\nul ul, ol ul, ul ol, ol ol {\n  margin-bottom: inherit;\n}\n\nul li:before {\n  content: \"\\2013  \"; /* note: extra space needed because first is consumed by css parser */\n  position: absolute;\n  margin-left: -1rem;\n}\n\nul.double-spaced li {\n  margin-top: 1rem;\n}\n\npre, code {\n  font-family: \"Source Code Pro\", \"Menlo\", \"DejaVu Sans Mono\", \"Lucida Console\", monospace;\n}\n\ncode {\n  background-color: rgba(27,31,35,.05);\n  border-radius: 3px;\n  padding: 0 0.2rem;\n  /*font-size: 0.9em;*/\n  font-size: 12pt;\n}\n\npre {\n  color: #000;\n  margin: 1rem;\n  padding: 0.5rem 0.7rem;\n  border: 1px dashed #444;\n  /*font-size: .8rem;*/\n  
font-size: 11pt;\n  overflow-x: auto;\n}\n\npre code {\n  color: inherit;\n  background: none;\n  font-size: 100%;\n  padding: 0;\n}\n\na {\n  color: #54008c;\n  text-decoration: underline;\n}\n\na:hover {\n  color: #fff;\n  background-color: #54008c;\n}\n\nimg, video {\n  display: block;\n  margin-left: auto;\n  margin-right: auto;\n  border-radius: 5px;\n  max-width: 100%;\n  max-height: 80vh;\n}\n\nvideo {\n  margin-bottom: 1rem;\n}\n\nsummary {\n  outline: none;\n  user-select: none;\n}\n\nhr {\n  position: relative;\n  margin: 1.5rem 0;\n  border: 0;\n  border-top: 1px solid #eee;\n  border-bottom: 1px solid #fff;\n}\n\n/* Classes */\n\n.title {\n  font-size: 2rem;\n}\n\n.subtitle {\n  font-size: 1.5rem;\n  margin-left: 1rem;\n}\n\n.small {\n  font-size: 0.75rem;\n}\n\n.small p {\n  margin-bottom: 0;\n}\n\n.center {\n  text-align: center;\n}\n\n.gap {\n  margin-top: 4rem;\n  margin-bottom: 4rem;\n}\n\n.accent {\n  color: #8c0038;\n}\n\n.youtube-wrapper {\n  position: relative;\n  height: 0;\n  margin-bottom: 1rem;\n}\n\n.youtube-wrapper iframe {\n  position: absolute;\n  top: 0;\n  left: 0;\n  width: 100%;\n  height: 100%;\n}\n\n/* Elements */\n\n#content {\n  max-width: 35rem;\n  margin: auto;\n  margin-bottom: 2rem;\n  padding: 1rem 1rem 0 1rem;\n}\n\n.demo {\n  margin-top: 2em;\n  margin-bottom: 2em;\n}\n\n#nav-bg {\n  margin: 0;\n  padding: 0.25rem 1rem;\n  font-family: \"Source Code Pro\", \"Menlo\", \"DejaVu Sans Mono\", \"Lucida Console\", monospace;\n  background: #54008c;\n  color: #fff;\n}\n\n#top-nav {\n  max-width: 75rem;\n  /* padding-left:8rem; */\n  margin: auto;\n  text-align: center;\n}\n\n#top-nav a {\n  color: #fff;\n  text-decoration: none;\n}\n\n#top-nav a:hover {\n  color: #000;\n  background-color: #fff;\n}\n\na#logo {\n  color: #f2deff;\n}\n\na:hover#logo {\n  color: #000;\n}\n\n#menu-icon {\n  display: none;\n}\n\n.trigger {\n  display: none;\n}\n\ninput[type=checkbox]:checked ~ .trigger {\n  display:block;\n  margin: 
auto;\n}\n\n.menu-label {\n  font-family: \"Source Code Pro\", \"Menlo\", \"DejaVu Sans Mono\", \"Lucida Console\", monospace;\n}\n\ninput[type=checkbox] ~ .menu-label:after {\n  content: \"(+)\";\n}\n\ninput[type=checkbox]:checked ~ .menu-label:after {\n  content: \"(-)\";\n}\n\n.nav-link {\n  display: block;\n}\n\n.trigger-child {\n  display: inline-block;\n  text-align: initial;\n}\n\n.nav-link:before {\n  content: \"- \";\n}\n\n/* in terms of our fixed-width layout; if smaller than this, we want to\n * collapse the menu */\n@media (min-width: 40rem) {\n  .menu-label {\n    display: none;\n  }\n\n  .trigger {\n    display: inline;\n    padding-top: inherit;\n  }\n\n  .trigger-child {\n    display: inline;\n    text-align: initial;\n  }\n\n  .nav-link {\n    display: initial;\n  }\n\n  .nav-link:before {\n    content: \"| \";\n  }\n}\n\n@media (prefers-color-scheme: dark) {\n\n    body {\n        background-color: #303030;\n        color: #ddd\n    }\n\n    a {\n      color: #66D9EF;\n      /*color: #A6E22E;*/\n      text-decoration: none;\n    }\n\n    a:hover {\n      color: #000;\n      background-color: #66D9EF;\n      text-decoration: none;\n    }\n\n    h1, h2, h3, h4, h5, h6 {\n        color: #eee;\n      }\n\n    #nav-bg, a#logo, #top-nav a {\n      background-color: #A6E22E;\n      color: #202020;\n    }\n\n    a:hover > code  {\n        background-color: #66D9EF;\n    }\n\n    .accent {\n        color: #F92672;\n    }\n}\n\n@media print {\n  #nav-bg, #logo, #top-nav { display: none; }\n  h1.title ~ p.center.gap.accent { display: none; }\n  .youtube-wrapper { display: none; }\n  html { font-size: 1em; font-family: sans-serif; }\n  body { background: none; }\n  #content { max-width: none; }\n  h1.title { text-align: center; }\n  h1, h2, h3, h4, h5, h6 { break-after: avoid-page; page-break-after: avoid; }\n  #content hr:last-of-type { display: none; }\n  #content pre { break-inside: avoid-page; page-break-inside: avoid; }\n  #content div.small:last-of-type 
 { display: none; }\n}\n\n.ribbon {\n  background-color: #8cbcea;\n  overflow: hidden;\n  white-space: nowrap;\n  /* top right corner */\n  position: absolute;\n  right: -50px;\n  top: 40px;\n  /* 45 deg clockwise rotation */\n  -webkit-transform: rotate(45deg);\n     -moz-transform: rotate(45deg);\n      -ms-transform: rotate(45deg);\n       -o-transform: rotate(45deg);\n          transform: rotate(45deg);\n  /* shadow */\n  -webkit-box-shadow: 0 0 10px #888;\n     -moz-box-shadow: 0 0 10px #888;\n          /* box-shadow: 0 0 10px #888; */\n          box-shadow: 0px -1px 20px 0px #562c8c6b;\n}\n.ribbon a {\n  border: 1px solid #000;\n  color: #000;\n  display: block;\n  font: bold 81.25% \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n  margin: 1px 0;\n  padding: 10px 50px;\n  text-align: center;\n  text-decoration: none;\n  /* shadow */\n  /* text-shadow: 0 0 5px #444; */\n}\n"
  },
  {
    "path": "static/css/syntax.css",
    "content": "pre.highlight { background-color: #f9f9f9; background-clip: border-box }\n.highlight .c { color: #999988; font-style: italic } /* Comment */\n.highlight .err { color: #a61717; background-color: #e3d2d2 } /* Error */\n.highlight .k { color: #000000; font-weight: bold } /* Keyword */\n.highlight .o { color: #000000; font-weight: bold } /* Operator */\n.highlight .cm { color: #999988; font-style: italic } /* Comment.Multiline */\n.highlight .cp { color: #999999; font-weight: bold; font-style: italic } /* Comment.Preproc */\n.highlight .c1 { color: #999988; font-style: italic } /* Comment.Single */\n.highlight .cs { color: #999999; font-weight: bold; font-style: italic } /* Comment.Special */\n.highlight .gd { color: #000000; background-color: #ffdddd } /* Generic.Deleted */\n.highlight .ge { color: #000000; font-style: italic } /* Generic.Emph */\n.highlight .gr { color: #aa0000 } /* Generic.Error */\n.highlight .gh { color: #999999 } /* Generic.Heading */\n.highlight .gi { color: #000000; background-color: #ddffdd } /* Generic.Inserted */\n.highlight .go { color: #888888 } /* Generic.Output */\n.highlight .gp { color: #995c8b } /* Generic.Prompt */\n.highlight .gs { font-weight: bold } /* Generic.Strong */\n.highlight .gu { color: #aaaaaa } /* Generic.Subheading */\n.highlight .gt { color: #aa0000 } /* Generic.Traceback */\n.highlight .kc { color: #000000; font-weight: bold } /* Keyword.Constant */\n.highlight .kd { color: #000000; font-weight: bold } /* Keyword.Declaration */\n.highlight .kn { color: #000000; font-weight: bold } /* Keyword.Namespace */\n.highlight .kp { color: #000000; font-weight: bold } /* Keyword.Pseudo */\n.highlight .kr { color: #000000; font-weight: bold } /* Keyword.Reserved */\n.highlight .kt { color: #445588; font-weight: bold } /* Keyword.Type */\n.highlight .m { color: #009999 } /* Literal.Number */\n.highlight .s { color: #d01040 } /* Literal.String */\n.highlight .na { color: #008080 } /* Name.Attribute */\n.highlight 
.nb { color: #0086b3 } /* Name.Builtin */\n.highlight .nc { color: #445588; font-weight: bold } /* Name.Class */\n.highlight .no { color: #008080 } /* Name.Constant */\n.highlight .nd { color: #3c5d5d; font-weight: bold } /* Name.Decorator */\n.highlight .ni { color: #800080 } /* Name.Entity */\n.highlight .ne { color: #990000; font-weight: bold } /* Name.Exception */\n.highlight .nf { color: #990000; font-weight: bold } /* Name.Function */\n.highlight .nl { color: #990000; font-weight: bold } /* Name.Label */\n.highlight .nn { color: #555555 } /* Name.Namespace */\n.highlight .nt { color: #000080 } /* Name.Tag */\n.highlight .nv { color: #008080 } /* Name.Variable */\n.highlight .ow { color: #000000; font-weight: bold } /* Operator.Word */\n.highlight .w { color: #bbbbbb } /* Text.Whitespace */\n.highlight .mf { color: #009999 } /* Literal.Number.Float */\n.highlight .mh { color: #009999 } /* Literal.Number.Hex */\n.highlight .mi { color: #009999 } /* Literal.Number.Integer */\n.highlight .mo { color: #009999 } /* Literal.Number.Oct */\n.highlight .sb { color: #d01040 } /* Literal.String.Backtick */\n.highlight .sc { color: #d01040 } /* Literal.String.Char */\n.highlight .sd { color: #d01040 } /* Literal.String.Doc */\n.highlight .s2 { color: #d01040 } /* Literal.String.Double */\n.highlight .se { color: #d01040 } /* Literal.String.Escape */\n.highlight .sh { color: #d01040 } /* Literal.String.Heredoc */\n.highlight .si { color: #d01040 } /* Literal.String.Interpol */\n.highlight .sx { color: #d01040 } /* Literal.String.Other */\n.highlight .sr { color: #009926 } /* Literal.String.Regex */\n.highlight .s1 { color: #d01040 } /* Literal.String.Single */\n.highlight .ss { color: #990073 } /* Literal.String.Symbol */\n.highlight .bp { color: #999999 } /* Name.Builtin.Pseudo */\n.highlight .vc { color: #008080 } /* Name.Variable.Class */\n.highlight .vg { color: #008080 } /* Name.Variable.Global */\n.highlight .vi { color: #008080 } /* Name.Variable.Instance 
*/\n.highlight .il { color: #009999 } /* Literal.Number.Integer.Long */\n\n\n@media (prefers-color-scheme: dark) {\n    code { background-color: #232323; }\n    pre code { color: #ddd; }\n    pre.highlight { background-color: #232323; }\n    .highlight .hll { background-color: #232323; }\n    .highlight .c { color: #75715e } /* Comment */\n    .highlight .err { color: #960050; background-color: #1e0010 } /* Error */\n    .highlight .k { color: #66d9ef } /* Keyword */\n    .highlight .l { color: #ae81ff } /* Literal */\n    .highlight .n { color: #f8f8f2 } /* Name */\n    .highlight .o { color: #f92672 } /* Operator */\n    .highlight .p { color: #f8f8f2 } /* Punctuation */\n    .highlight .cm { color: #75715e } /* Comment.Multiline */\n    .highlight .cp { color: #75715e } /* Comment.Preproc */\n    .highlight .c1 { color: #75715e } /* Comment.Single */\n    .highlight .cs { color: #75715e } /* Comment.Special */\n    .highlight .ge { font-style: italic } /* Generic.Emph */\n    .highlight .gs { font-weight: bold } /* Generic.Strong */\n    .highlight .kc { color: #66d9ef } /* Keyword.Constant */\n    .highlight .kd { color: #66d9ef } /* Keyword.Declaration */\n    .highlight .kn { color: #f92672 } /* Keyword.Namespace */\n    .highlight .kp { color: #66d9ef } /* Keyword.Pseudo */\n    .highlight .kr { color: #66d9ef } /* Keyword.Reserved */\n    .highlight .kt { color: #66d9ef } /* Keyword.Type */\n    .highlight .ld { color: #e6db74 } /* Literal.Date */\n    .highlight .m { color: #ae81ff } /* Literal.Number */\n    .highlight .s { color: #e6db74 } /* Literal.String */\n    .highlight .na { color: #a6e22e } /* Name.Attribute */\n    .highlight .nb { color: #f8f8f2 } /* Name.Builtin */\n    .highlight .nc { color: #a6e22e } /* Name.Class */\n    .highlight .no { color: #66d9ef } /* Name.Constant */\n    .highlight .nd { color: #a6e22e } /* Name.Decorator */\n    .highlight .ni { color: #f8f8f2 } /* Name.Entity */\n    .highlight .ne { color: #a6e22e } /* 
Name.Exception */\n    .highlight .nf { color: #a6e22e } /* Name.Function */\n    .highlight .nl { color: #f8f8f2 } /* Name.Label */\n    .highlight .nn { color: #f8f8f2 } /* Name.Namespace */\n    .highlight .nx { color: #a6e22e } /* Name.Other */\n    .highlight .py { color: #f8f8f2 } /* Name.Property */\n    .highlight .nt { color: #f92672 } /* Name.Tag */\n    .highlight .nv { color: #f8f8f2 } /* Name.Variable */\n    .highlight .ow { color: #f92672 } /* Operator.Word */\n    .highlight .w { color: #f8f8f2 } /* Text.Whitespace */\n    .highlight .mf { color: #ae81ff } /* Literal.Number.Float */\n    .highlight .mh { color: #ae81ff } /* Literal.Number.Hex */\n    .highlight .mi { color: #ae81ff } /* Literal.Number.Integer */\n    .highlight .mo { color: #ae81ff } /* Literal.Number.Oct */\n    .highlight .sb { color: #e6db74 } /* Literal.String.Backtick */\n    .highlight .sc { color: #e6db74 } /* Literal.String.Char */\n    .highlight .sd { color: #e6db74 } /* Literal.String.Doc */\n    .highlight .s2 { color: #e6db74 } /* Literal.String.Double */\n    .highlight .se { color: #ae81ff } /* Literal.String.Escape */\n    .highlight .sh { color: #e6db74 } /* Literal.String.Heredoc */\n    .highlight .si { color: #e6db74 } /* Literal.String.Interpol */\n    .highlight .sx { color: #e6db74 } /* Literal.String.Other */\n    .highlight .sr { color: #e6db74 } /* Literal.String.Regex */\n    .highlight .s1 { color: #e6db74 } /* Literal.String.Single */\n    .highlight .ss { color: #e6db74 } /* Literal.String.Symbol */\n    .highlight .bp { color: #f8f8f2 } /* Name.Builtin.Pseudo */\n    .highlight .vc { color: #f8f8f2 } /* Name.Variable.Class */\n    .highlight .vg { color: #f8f8f2 } /* Name.Variable.Global */\n    .highlight .vi { color: #f8f8f2 } /* Name.Variable.Instance */\n    .highlight .il { color: #ae81ff } /* Literal.Number.Integer.Long */\n    .highlight .gh { } /* Generic Heading & Diff Header */\n    .highlight .gu { color: #75715e; } /* Generic.Subheading & 
Diff Unified/Comment? */\n    .highlight .gd { color: #f92672; } /* Generic.Deleted & Diff Deleted */\n    .highlight .gi { color: #a6e22e; } /* Generic.Inserted & Diff Inserted */\n}"
  },
  {
    "path": "static/files/logger.py",
    "content": "import logging\nimport random\nimport sys\nimport time\n\n\nclass CustomFormatter(logging.Formatter):\n    \"\"\"Logging formatter that colors each record according to its level\"\"\"\n\n    grey = \"\\x1b[38;21m\"\n    yellow = \"\\x1b[33;21m\"\n    red = \"\\x1b[31;21m\"\n    bold_red = \"\\x1b[31;1m\"\n    reset = \"\\x1b[0m\"\n    # named fmt so it does not shadow the format() method below\n    fmt = \"%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)\"\n\n    FORMATS = {\n        logging.DEBUG: grey + fmt + reset,\n        logging.INFO: grey + fmt + reset,\n        logging.WARNING: yellow + fmt + reset,\n        logging.ERROR: red + fmt + reset,\n        logging.CRITICAL: bold_red + fmt + reset\n    }\n\n    def format(self, record):\n        # pick the colored format string that matches the record's level\n        log_fmt = self.FORMATS.get(record.levelno)\n        formatter = logging.Formatter(log_fmt)\n        return formatter.format(record)\n\n# create a logger named 'Sample'\nlogger = logging.getLogger(\"Sample\")\n\n# create a console handler that lets every level through\nch = logging.StreamHandler()\nch.setLevel(logging.DEBUG)\n\n# first argument selects the output style: 'log' (plain) or 'color'\nif len(sys.argv) > 1:\n    if sys.argv[1] == 'log':\n        ch.setFormatter(logging.Formatter('%(asctime)s : %(levelname)s : %(name)s : %(message)s'))\n    elif sys.argv[1] == 'color':\n        ch.setFormatter(CustomFormatter())\n\n# optional second argument is a level name such as WARNING or ERROR\nif len(sys.argv) > 2:\n    logger.setLevel(getattr(logging, sys.argv[2]))\nelse:\n    logger.setLevel(logging.DEBUG)\n\nlogger.addHandler(ch)\n\n# logger.debug(\"debug message\")\n# logger.info(\"info message\")\n# logger.warning(\"warning message\")\n# logger.error(\"error message\")\n# logger.critical(\"critical message\")\n\n# emit 100 random log messages, one every 0.3 seconds\nfor _ in range(100):\n    i = random.randint(0, 10)\n    if i <= 4:\n        logger.info(\"Value is {} - Everything is fine\".format(i))\n    elif i <= 6:\n        logger.warning(\"Value is {} - System is getting hot\".format(i))\n    elif i <= 8:\n        logger.error(\"Value is {} - Dangerous region\".format(i))\n    else:\n        
logger.critical(\"Maximum value reached\")\n    time.sleep(0.3)\n\n"
  },
  {
    "path": "static/files/sorts.py",
    "content": "import random\n\n\ndef test_sorted(fn, iters=1000):\n    for i in range(iters):\n        l = [random.randint(0, 100) for i in range(0, random.randint(0, 50))]\n        assert fn(l) == sorted(l)\n        # print(fn.__name__, fn(l))\n\n\ndef insertionsort(array):\n\n    for i in range(len(array)):\n        j = i-1\n        v = array[i]\n        while j >= 0 and v < array[j]:\n            array[j+1] = array[j]\n            j -= 1\n        array[j+1] = v\n    return array\n\n\ndef quicksort(array):\n    if len(array) <= 1:\n        return array\n    pivot = array[0]\n    left = [i for i in array[1:] if i < pivot]\n    right = [i for i in array[1:] if i >= pivot]\n    return quicksort(left) + [pivot] + quicksort(right)\n\n\ndef quicksort_inplace(array, low=0, high=None):\n    if len(array) <= 1:\n        return array\n    if high is None:\n        high = len(array)-1\n    if low >= high:\n        return array\n\n    pivot = array[high]\n    j = low-1\n    for i in range(low, high):\n        if array[i] <= pivot:\n            j += 1\n            array[i], array[j] = array[j], array[i]\n    array[high], array[j+1] = array[j+1], array[high]\n    quicksort_inplace(array, low, j)\n    quicksort_inplace(array, j+2, high)\n    return array\n\n\nif __name__ == '__main__':\n    for fn in [quicksort, quicksort_inplace, insertionsort]:\n        test_sorted(fn)\n"
  },
  {
    "path": "static/files/subtitles/2020/command-line.sbv",
    "content": "0:00:00.480,0:00:02.480\nOkay, can everyone hear me okay?\n\n0:00:03.720,0:00:06.160\nOkay, so welcome back.\n\n0:00:06.160,0:00:10.320\nI'm gonna address a couple of items\nin kind of the administratrivia.\n\n0:00:10.640,0:00:13.080\nWith the end of the first week,\n\n0:00:13.179,0:00:16.349\nwe sent an email, notifying you that\n\n0:00:16.600,0:00:20.219\nwe have uploaded the videos for the first\nweek, so you can now find them online.\n\n0:00:20.470,0:00:26.670\nThey have all the screen recordings for the things\nthat we were doing, so you can go back to them.\n\n0:00:26.830,0:00:31.439\nLook if you were confused about if\nwe did something quick and, again,\n\n0:00:31.440,0:00:37.560\nfeel free to ask us any questions if anything in the\nlecture notes is not clear. We also sent you a\n\n0:00:37.880,0:00:42.360\nsurvey so you can give us feedback\nabout what was not clear,\n\n0:00:42.360,0:00:46.280\nwhat items you would want a\nmore thorough explanation of, or\n\n0:00:47.110,0:00:51.749\njust any other item, if you're finding\nthe exercises too hard, too easy,\n\n0:00:52.239,0:00:55.288\ngo into that URL and we'll really\n\n0:00:55.960,0:01:00.040\nappreciate getting that feedback, because\nthat will make the course better\n\n0:01:00.480,0:01:03.800\nfor the remaining lectures and for\nfuture iterations of the course.\n\n0:01:05.080,0:01:07.080\nWith that out of the way\n\n0:01:07.080,0:01:10.840\nOh, and we're gonna try to upload the\nvideos in a more timely manner.\n\n0:01:11.200,0:01:16.040\nWe don't want to kind of wait until the end of\nthe week for that. So stay tuned for that.\n\n0:01:18.760,0:01:19.840\nThat out of the way,\n\n0:01:19.920,0:01:20.800\nnow I'm gonna\n\n0:01:21.120,0:01:24.960\nThis lecture's called command-line\nenvironment and we're\n\n0:01:25.160,0:01:28.440\ngoing to cover a few different topics. 
So the\n\n0:01:28.990,0:01:30.990\nmain topics we're gonna\n\n0:01:32.040,0:01:34.520\ncover, so you can keep track,\n\n0:01:34.680,0:01:36.400\nit's probably better here,\n\n0:01:36.400,0:01:37.720\nkeep track of what I'm talking.\n\n0:01:37.920,0:01:41.560\nThe first is gonna be job control.\n\n0:01:42.040,0:01:44.280\nThe second one is going to be\n\n0:01:44.600,0:01:46.600\nterminal multiplexers.\n\n0:01:51.720,0:01:57.360\nThen I'm going to explain what dotfiles\nare and how to configure your shell.\n\n0:01:57.360,0:02:03.240\nAnd lastly, how to efficiently work with\nremote machines. So if things are not\n\n0:02:05.110,0:02:07.589\nfully clear, kind of keep the structure.\n\n0:02:08.200,0:02:12.320\nThey all kind of interact in some\nway, of how you use your terminal,\n\n0:02:12.880,0:02:17.280\nbut they are somewhat separate\ntopics, so keep that in mind.\n\n0:02:17.600,0:02:23.800\nSo let's go with job control. So far we have\nbeen using the shell in a very, kind of\n\n0:02:24.800,0:02:27.720\nmono-command way. Like, you\nexecute a command and then\n\n0:02:27.840,0:02:31.800\nthe command executes, then you get some output,\nand that's all about what you can do.\n\n0:02:32.200,0:02:36.520\nAnd if you want to run several\nthings, it's not clear\n\n0:02:36.540,0:02:41.099\nhow you will do it. 
Or if you want to stop\nthe execution of a program, it's again,\n\n0:02:41.099,0:02:43.768\nlike how do I know how to stop a program?\n\n0:02:44.650,0:02:47.940\nLet's showcase this with a command called sleep.\n\n0:02:48.160,0:02:50.320\nSleep is a command that takes an argument,\n\n0:02:50.320,0:02:54.360\nand that argument is going to be an\ninteger number, and it will sleep.\n\n0:02:54.360,0:02:58.440\nIt will just kind of be there, on the\nbackground, for that many seconds.\n\n0:02:58.440,0:03:03.539\nSo if we do something like sleep 20, this process\nis gonna be sleeping for 20 seconds.\n\n0:03:03.539,0:03:07.720\nBut we don't want to wait 20 seconds\nfor the command to complete.\n\n0:03:08.040,0:03:10.800\nSo what we can do is type \"Ctrl+C\".\n\n0:03:10.840,0:03:12.580\nBy typing \"Ctrl+C\"\n\n0:03:12.580,0:03:17.840\nWe can see that, here, the terminal let us know,\n\n0:03:18.880,0:03:22.840\nand it's part of the syntax that we covered\nin the editors / Vim lecture,\n\n0:03:23.000,0:03:27.200\nthat we typed \"Ctrl+C\" and it stopped\nthe execution of the process.\n\n0:03:27.640,0:03:29.640\nWhat is actually going on here\n\n0:03:29.880,0:03:34.840\nis that this is using a UNIX communication\nmechanism called signals.\n\n0:03:35.120,0:03:37.360\nWhen we type \"Ctrl+C\",\n\n0:03:37.800,0:03:42.080\nwhat the terminal did for us,\nor the shell did for us,\n\n0:03:42.160,0:03:45.960\nis send a signal called SIGINT,\n\n0:03:45.960,0:03:51.320\nthat stands for SIGnal INTerrupt, that\ntells the program to stop itself.\n\n0:03:51.680,0:03:57.520\nAnd there are many, many, many signals\nof this kind. 
If you do man signal,\n\n0:03:58.880,0:04:05.060\nand just go down a little bit,\nhere you have a list of them.\n\n0:04:05.060,0:04:07.040\nThey all have number identifiers,\n\n0:04:07.520,0:04:10.640\nthey have kind of a short name\nand you can find a description.\n\n0:04:10.960,0:04:16.400\nSo for example, the one I have just\ndescribed is here, number 2, SIGINT.\n\n0:04:16.520,0:04:22.200\nThis is the signal that a terminal will send to a\nprogram when it wants to interrupt its execution.\n\n0:04:22.520,0:04:25.840\nA few more to be familiar with\n\n0:04:26.460,0:04:28.530\nis SIGQUIT, this is\n\n0:04:29.229,0:04:34.409\nagain, if you work from a terminal and you\nwant to quit the execution of a program.\n\n0:04:34.409,0:04:37.720\nFor most programs it will do the same thing,\n\n0:04:37.720,0:04:41.120\nbut we're gonna showcase now a program\nwhich will be different,\n\n0:04:41.440,0:04:43.760\nand this is the signal that will be sent.\n\n0:04:44.680,0:04:49.229\nIt can be confusing sometimes. 
Looking at\nthese signals, for example, the SIGTERM is\n\n0:04:50.080,0:04:54.100\nfor most cases equivalent to SIGINT and SIGQUIT\n\n0:04:54.480,0:04:58.380\nbut it's just when it's not\nsent through a terminal.\n\n0:04:59.680,0:05:01.680\nA few more that we're gonna\n\n0:05:01.900,0:05:06.209\ncover is SIGHUP, it's when there's\nlike a hang-up in the terminal.\n\n0:05:06.210,0:05:10.199\nSo for example, when you are in your\nterminal, if you close your terminal\n\n0:05:10.199,0:05:13.348\nand there are still things\nrunning in the terminal,\n\n0:05:13.480,0:05:17.000\nthat's the signal that the program is gonna send\n\n0:05:17.000,0:05:19.960\nto all the processes to tell\nthat they should close,\n\n0:05:19.960,0:05:25.080\nlike there was a hang-up in the\ncommand line communication\n\n0:05:25.080,0:05:26.800\nand they should close now.\n\n0:05:28.400,0:05:34.260\nSignals can do more things than just stopping, interrupting\nprograms and asking them to finish.\n\n0:05:34.260,0:05:36.840\nYou can for example use the\n\n0:05:37.520,0:05:43.840\nSIGSTOP to pause the execution of the\nprogram, and then you can use the\n\n0:05:44.480,0:05:50.160\nSIGCONT command for continuing, to continue the execution\nof the program at a point later in time.\n\n0:05:51.160,0:05:55.440\nSince all of this might be slightly too\nabstract, let's see a few examples.\n\n0:05:58.040,0:06:00.560\nFirst, let's showcase a\n\n0:06:01.960,0:06:06.240\nPython program. 
I'm going to very\nquickly go through the program.\n\n0:06:06.440,0:06:08.360\nThis is a Python program,\n\n0:06:08.720,0:06:10.760\nthat like most python programs,\n\n0:06:11.520,0:06:13.960\nis importing this signal library and\n\n0:06:14.960,0:06:20.400\nis defining this handler here.\nAnd this handler is writing,\n\n0:06:20.440,0:06:23.040\n\"Oh, I got a SIGINT, but\nI'm not gonna stop here\".\n\n0:06:23.480,0:06:24.960\nAnd after that,\n\n0:06:24.960,0:06:30.720\nwe tell Python that we want this program,\nwhen it gets a SIGINT, to stop.\n\n0:06:31.120,0:06:34.880\nThe rest of the program is a very silly program\nthat is just going to be printing numbers.\n\n0:06:35.060,0:06:37.540\nSo let's see this in action.\n\n0:06:37.560,0:06:39.560\nWe do Python SIGINT.\n\n0:06:39.880,0:06:44.970\nAnd it's counting. We try doing\n\"Ctrl+C\", this sends a SIGINT,\n\n0:06:44.970,0:06:50.000\nbut the program didn't actually stop. This\nis because we have a way in the program of\n\n0:06:50.400,0:06:54.600\ndealing with this exception,\nand we didn't want to exit.\n\n0:06:54.760,0:06:57.600\nIf we send a SIGQUIT, which is done through\n\n0:06:57.800,0:07:03.680\n\"Ctrl+\\\", here, we can see that since the program\ndoesn't have a way of dealing with SIGQUIT,\n\n0:07:03.730,0:07:06.269\nit does the default operation, which is\n\n0:07:06.820,0:07:08.800\nterminate the program.\n\n0:07:09.080,0:07:11.460\nAnd you could use this, for example,\n\n0:07:11.880,0:07:15.880\nif someone Ctrl+C's your program, and your\nprogram is supposed to do something,\n\n0:07:16.040,0:07:19.320\nlike you maybe want to save the intermediate\nstate of your program\n\n0:07:19.320,0:07:21.520\nto a file, so you can recover it for later.\n\n0:07:21.600,0:07:25.640\nThis is how you could write a handler like this.\n\n0:07:29.520,0:07:30.720\nCan you repeat the question?\n\n0:07:30.880,0:07:32.280\nWhat did you type right now, when it stopped?\n\n0:07:32.480,0:07:34.480\nSo 
I...\n\n0:07:34.630,0:07:38.880\nSo what I typed is, I type\n\"Ctrl+C\" to try to stop it\n\n0:07:38.880,0:07:42.869\nbut it didn't, because SIGINT is captured\nby the program. Then I type\n\n0:07:43.120,0:07:48.040\n\"Ctrl+\\\", which sends a SIGQUIT,\nwhich is a different signal,\n\n0:07:49.000,0:07:51.720\nand this signal is not captured by the program.\n\n0:07:52.090,0:07:54.869\nIt's also worth mentioning\nthat there is a couple of\n\n0:07:54.970,0:07:59.970\nsignals that cannot be captured by software.\nThere is a couple of signals\n\n0:08:00.820,0:08:02.820\nlike SIGKILL\n\n0:08:03.940,0:08:06.600\nthat cannot be captured. Like that, it will\n\n0:08:06.660,0:08:09.300\nterminate the execution of the\nprocess, no matter what.\n\n0:08:09.300,0:08:12.000\nAnd it can be sometimes harmful.\nYou do not want to be using it by\n\n0:08:12.000,0:08:16.460\ndefault, because this can leave for example an\norphan child, orphaned children processes.\n\n0:08:16.470,0:08:20.940\nLike if a process has other small children\nprocesses that it started, and you\n\n0:08:21.400,0:08:25.470\nSIGKILL it, all of those will\nkeep running in there,\n\n0:08:25.760,0:08:30.800\nbut they won't have a parent, and you can maybe\nhave a really weird behavior going on.\n\n0:08:32.040,0:08:35.680\nWhat signal is given to the\nprogram if we log off?\n\n0:08:35.800,0:08:37.440\nIf you log off?\n\n0:08:37.920,0:08:41.920\nThat would be... 
so for example, if you're in an\nSSH connection and you close the connection,\n\n0:08:41.920,0:08:45.600\nthat is the hang-up signal,\n\n0:08:45.600,0:08:51.200\nSIGHUP, which I'm gonna cover in an example.\nSo this is what would be sent up.\n\n0:08:51.560,0:08:56.360\nAnd you could write for example, if you want\nthe process to keep working even if you close\n\n0:08:56.960,0:09:02.560\nthat, you can write a wrapper around\nthat to ignore that signal.\n\n0:09:04.720,0:09:09.760\nLet's display what we could do\nwith the stop and continue.\n\n0:09:09.980,0:09:16.389\nSo, for example, we can start a really long process.\nLet's sleep a thousand, we're gonna take forever.\n\n0:09:16.960,0:09:18.920\nWe can control-c,\n\n0:09:18.920,0:09:20.360\n\"Ctrl+Z\", sorry,\n\n0:09:20.360,0:09:25.280\nand if we do \"Ctrl+Z\" we can see that the\nterminal is saying \"it's suspended\".\n\n0:09:25.400,0:09:31.520\nWhat this actually meant is that this process\nwas sent a SIGSTOP signal and now is\n\n0:09:31.900,0:09:36.900\nstill there, you could continue its execution, but right\nnow it's completely stopped and in the background\n\n0:09:38.580,0:09:41.720\nand we can launch a different program.\n\n0:09:41.720,0:09:43.680\nWhen we try to run this program,\n\n0:09:43.680,0:09:46.620\nplease notice that I have included\nan \"&\" at the end.\n\n0:09:46.820,0:09:52.380\nThis tells bash that I want this program\nto start running in the background.\n\n0:09:52.560,0:09:55.660\nThis is kind of related to all these\n\n0:09:55.660,0:09:59.720\nconcepts of running programs in\nthe shell, but backgrounded.\n\n0:10:00.350,0:10:04.359\nAnd what is gonna happen is\nthe program is gonna start\n\n0:10:04.720,0:10:07.580\nbut it's not gonna take over my prompt.\n\n0:10:07.580,0:10:11.540\nIf I just ran this command without\nthis, I could not do anything.\n\n0:10:11.540,0:10:15.820\nI would have no access to the prompt\nuntil the command either finished\n\n0:10:16.060,0:10:19.380\nor I ended it 
abruptly. But if I do this,\n\n0:10:19.520,0:10:23.080\nit's saying \"there's a new\nprocess which is this\".\n\n0:10:23.080,0:10:25.180\nThis is the process identifying number,\n\n0:10:25.180,0:10:26.940\nwe can ignore this for now.\n\n0:10:27.800,0:10:32.919\nIf I type the command \"jobs\", I get the\noutput that I have a suspended job\n\n0:10:32.920,0:10:35.800\nthat is the \"sleep 1000\" job.\n\n0:10:36.040,0:10:38.100\nAnd then I have another running job,\n\n0:10:38.120,0:10:42.200\nwhich is this \"NOHUP sleep 2000\".\n\n0:10:42.640,0:10:45.660\nSay I want to continue the first job.\n\n0:10:45.660,0:10:48.520\nThe first job is suspended,\nit's not executing anymore.\n\n0:10:48.640,0:10:52.600\nI can continue that doing \"BG %1\"\n\n0:10:53.870,0:10:58.359\nThat \"%\" is referring to the fact that\nI want to refer to this specific\n\n0:11:00.280,0:11:04.280\nprocess. And now, if I do that\nand I look at the jobs,\n\n0:11:04.300,0:11:06.460\nnow this job is running again. Now\n\n0:11:06.460,0:11:08.940\nboth of them are running.\n\n0:11:09.300,0:11:13.820\nIf I wanted to stop these all,\nI can use the kill command.\n\n0:11:14.040,0:11:16.060\nThe kill command\n\n0:11:16.220,0:11:18.620\nis for killing jobs,\n\n0:11:19.180,0:11:22.080\nwhich is just stopping them, intuitively,\n\n0:11:22.120,0:11:23.760\nbut actually it's really useful.\n\n0:11:23.860,0:11:28.200\nThe kill command just allows you\nto send any sort of Unix signal.\n\n0:11:28.360,0:11:32.220\nSo here for example, instead\nof killing it completely,\n\n0:11:32.220,0:11:34.640\nwe just send a stop signal.\n\n0:11:34.640,0:11:39.160\nHere I'm gonna send a stop signal, which\nis gonna pause the process again.\n\n0:11:39.160,0:11:41.280\nI still have to include the identifier,\n\n0:11:41.600,0:11:46.480\nbecause without the identifier the shell wouldn't know\nwhether to stop the first one or the second one.\n\n0:11:47.480,0:11:52.480\nNow it's said this has been suspended,\nbecause there was a signal 
sent.\n\n0:11:52.620,0:11:57.360\nIf I do \"jobs\", again, we can see\nthat the second one is running\n\n0:11:57.460,0:12:00.740\nand the first one has been stopped.\n\n0:12:01.420,0:12:04.300\nGoing back to one of the questions,\n\n0:12:04.300,0:12:06.980\nwhat happens when you close\nthe shell, for example,\n\n0:12:06.980,0:12:12.860\nand why sometimes people will say that\nyou should use this NOHUP command\n\n0:12:12.860,0:12:15.960\nbefore you run jobs in a remote session.\n\n0:12:16.220,0:12:23.120\nThis is because if we try to send\na hang-up signal to the first job\n\n0:12:23.560,0:12:27.820\nit's gonna, in a similar fashion\nas the other signals,\n\n0:12:27.820,0:12:32.280\nit's gonna hang it up and that's\ngonna terminate the job.\n\n0:12:32.800,0:12:35.960\nAnd the first job isn't there anymore\n\n0:12:36.320,0:12:39.140\nwhereas we have still the second job running.\n\n0:12:39.400,0:12:42.920\nHowever, if we try to send the\nsignal to the second job\n\n0:12:42.920,0:12:46.060\nwhat will happen if we close\nour terminal right now\n\n0:12:47.040,0:12:48.660\nis it's still running.\n\n0:12:48.660,0:12:52.480\nLike NOHUP, what it's doing\nis kind of encapsulating\n\n0:12:52.480,0:12:54.480\nwhatever command you're executing and\n\n0:12:54.740,0:12:58.720\nignoring whenever you get a hang-up signal,\n\n0:12:58.900,0:13:03.680\nand just ignoring that so it can keep running.\n\n0:13:05.060,0:13:08.500\nAnd if we send the \"kill\"\nsignal to the second job,\n\n0:13:08.500,0:13:12.820\nthat one can't be ignored and that\nwill kill the job, no matter what.\n\n0:13:13.280,0:13:15.780\nAnd we don't have any jobs anymore.\n\n0:13:17.000,0:13:22.540\nThat kind of completes the\nsection on job control.\n\n0:13:22.740,0:13:27.100\nAny questions so far? Anything\nthat wasn't fully clear?\n\n0:13:29.040,0:13:30.400\nWhat does BG do?\n\n0:13:30.960,0:13:31.800\nSo BG...\n\n0:13:31.800,0:13:36.860\nThere are like two commands. 
Whenever you\nhave a command that has been backgrounded\n\n0:13:37.200,0:13:41.820\nand is stopped you can use\nBG (short for background)\n\n0:13:41.820,0:13:44.180\nto continue that process running\nin the background.\n\n0:13:44.440,0:13:47.400\nThat's the equivalent of just kind of sending it\n\n0:13:47.680,0:13:50.820\na continue signal, so it keeps running.\n\n0:13:50.820,0:13:54.820\nAnd then there's another one which\nis called FG, if you want to\n\n0:13:54.860,0:13:59.580\nrecover it to the foreground and you want\nto reattach your standard output.\n\n0:14:04.760,0:14:06.760\nOkay, good.\n\n0:14:07.120,0:14:11.420\nJobs are useful and in general, I\nthink knowing about signals can be\n\n0:14:11.420,0:14:14.360\nreally beneficial when dealing\nwith some part of Unix\n\n0:14:14.360,0:14:19.420\nbut most of the time what you actually want\nto do is something along the lines of\n\n0:14:19.670,0:14:24.099\nhaving your editor on one side and then\nthe program on another, and maybe\n\n0:14:24.720,0:14:28.280\nmonitoring what the resource\nconsumption is in another tab.\n\n0:14:28.680,0:14:33.640\nWe could achieve this using probably\nwhat you have seen a lot of the time,\n\n0:14:33.640,0:14:35.200\nwhich is just opening more windows.\n\n0:14:35.200,0:14:37.200\nWe can keep opening terminal windows.\n\n0:14:37.320,0:14:41.280\nBut the fact is there are kind of more\nconvenient solutions to this and\n\n0:14:41.280,0:14:43.800\nthis is what a terminal multiplexer does.\n\n0:14:44.080,0:14:48.520\nA terminal multiplexer like tmux\n\n0:14:48.840,0:14:52.160\nwill let you create different workspaces\nthat you can work in,\n\n0:14:52.640,0:14:54.280\nand quickly kind of,\n\n0:14:54.280,0:14:56.960\nthis has a huge variety of functionality,\n\n0:14:57.320,0:15:02.760\nIt will let you rearrange the environment and\nit will let you have different sessions.\n\n0:15:03.400,0:15:05.400\nThere's another more...\n\n0:15:05.600,0:15:07.640\nolder command, which is called 
\"screen\",\n\n0:15:07.640,0:15:09.360\nthat might be more readily available.\n\n0:15:09.360,0:15:12.200\nBut I think the concept kind\nof extrapolates to both.\n\n0:15:12.600,0:15:15.400\nWe recommend tmux, that you go and learn it.\n\n0:15:15.400,0:15:17.480\nAnd in fact, we have exercises on it.\n\n0:15:17.480,0:15:20.240\nI'm gonna showcase a different\nscenario right now.\n\n0:15:20.320,0:15:22.000\nSo whenever I talked...\n\n0:15:22.320,0:15:24.880\nOh, let me make a quick note.\n\n0:15:25.200,0:15:28.800\nThere are kind of three core concepts\nin tmux, that I'm gonna go through and\n\n0:15:30.110,0:15:33.130\nthe main idea is that there are what is called\n\n0:15:35.180,0:15:37.180\n\"sessions\".\n\n0:15:37.760,0:15:40.510\nSessions have \"windows\" and\n\n0:15:42.019,0:15:44.019\nwindows have \"panes\".\n\n0:15:45.709,0:15:49.539\nIt's gonna be kind of useful to\nkeep this hierarchy in mind.\n\n0:15:50.760,0:15:57.280\nYou can pretty much equate \"windows\" to what\n\"tabs\" are in other editors and others,\n\n0:15:57.280,0:16:00.720\nlike for example your web browser.\n\n0:16:01.280,0:16:06.440\nI'm gonna go through the features, mainly\nwhat you can do at the different levels.\n\n0:16:07.000,0:16:10.480\nSo first, when we do tmux, that starts a session.\n\n0:16:11.360,0:16:14.960\nAnd here right now it seems like nothing changed\n\n0:16:14.960,0:16:20.360\nbut what's happening right now is we're within a shell\nthat is different from the one we started before.\n\n0:16:20.640,0:16:24.840\nSo in our shell we started\na process, that is tmux\n\n0:16:24.840,0:16:28.840\nand that tmux started a different process,\nwhich is the shell we're currently in.\n\n0:16:28.980,0:16:30.400\nAnd the nice thing about this is that\n\n0:16:30.580,0:16:34.740\nthat tmux process is separate from\nthe original shell process.\n\n0:16:34.860,0:16:36.860\nSo\n\n0:16:40.580,0:16:44.460\nhere, we can do things.\n\n0:16:44.480,0:16:48.600\nWe can do \"ls -la\", for example, to\ntell 
us what is going on in here.\n\n0:16:48.920,0:16:53.960\nAnd then we can start running our program,\nand it will start running in there\n\n0:16:54.160,0:16:57.880\nand we can do \"Ctrl+A d\", for example, to detach\n\n0:17:12.760,0:17:15.960\nto detach from the session.\n\n0:17:16.140,0:17:19.120\nAnd if we do \"tmux a\"\n\n0:17:19.160,0:17:21.560\nthat's gonna reattach us to the session.\n\n0:17:21.560,0:17:22.300\nSo the process,\n\n0:17:22.300,0:17:25.180\nwe abandon the process counting numbers.\n\n0:17:25.180,0:17:28.300\nThis really silly Python program\nthat was just counting numbers,\n\n0:17:28.340,0:17:30.160\nwe left it running there.\n\n0:17:30.200,0:17:31.720\nAnd if we tmux...\n\n0:17:31.720,0:17:33.760\nHey, the process is still running there.\n\n0:17:33.780,0:17:37.820\nAnd we could close this entire\nterminal and open a new one and\n\n0:17:37.880,0:17:41.860\nwe could still reattach because this\ntmux session is still running.\n\n0:17:43.340,0:17:45.340\nAgain, we can...\n\n0:17:46.640,0:17:48.640\nBefore I go any further.\n\n0:17:48.920,0:17:53.740\nPretty much... 
Unlike Vim, where\nyou have this notion of modes,\n\n0:17:53.960,0:17:58.180\ntmux will work in a more emacsy way, which is\n\n0:17:58.180,0:18:04.140\nevery command, pretty much every command in tmux,\n\n0:18:04.220,0:18:06.020\nyou could enter it through the...\n\n0:18:06.020,0:18:08.160\nit has a command line, that we could use.\n\n0:18:08.240,0:18:11.320\nBut I recommend you to get familiar\nwith the key bindings.\n\n0:18:11.880,0:18:15.080\nIt can be somehow non intuitive at first,\n\n0:18:15.300,0:18:17.880\nbut once you get used to them...\n\n0:18:22.140,0:18:23.020\n\"Ctrl+C\", yeah\n\n0:18:24.440,0:18:30.760\nWhen you get familiar with them, you will be much faster\njust using the key bindings than using the commands.\n\n0:18:31.280,0:18:35.980\nOne note about the key bindings: all the\nkey bindings have a form that is like\n\n0:18:36.140,0:18:39.840\nyou type a prefix and then some key.\n\n0:18:40.060,0:18:44.000\nSo for example, to detach we\ndo \"Ctrl+A\" and then \"D\".\n\n0:18:44.160,0:18:50.140\nThis means you press \"Ctrl+A\" first, you release\nthat, and then press \"D\" to detach.\n\n0:18:50.380,0:18:54.200\nOn default tmux, the prefix is \"Ctrl+B\",\n\n0:18:54.200,0:18:58.780\nbut you will find that most people\nwill have this remapped to \"Ctrl+A\"\n\n0:18:58.780,0:19:02.680\nbecause it's a much more ergonomic\ntype on the keyboard.\n\n0:19:02.700,0:19:06.420\nYou can find more about how to do these\nthings in one of the exercises,\n\n0:19:06.960,0:19:12.780\nwhere we link you to the basics and how to do some\nkind of quality of life modifications to tmux.\n\n0:19:13.380,0:19:16.720\nGoing back to the concept of sessions,\n\n0:19:16.960,0:19:22.120\nwe can create a new session just\ndoing something like tmux new\n\n0:19:22.320,0:19:24.540\nand we can give sessions names.\n\n0:19:24.760,0:19:27.220\nSo we can do like \"tmux new -t foobar\"\n\n0:19:27.220,0:19:30.900\nand this is a completely different\nsession, that we have 
started.\n\n0:19:32.240,0:19:36.360\nWe can work here, we can detach from it.\n\n0:19:36.360,0:19:40.000\n\"tmux ls\" will tell us that we\nhave two different sessions:\n\n0:19:40.000,0:19:43.460\nthe first one is named \"0\", because\nI didn't give it a name,\n\n0:19:43.500,0:19:45.820\nand the second one is called \"foobar\".\n\n0:19:46.580,0:19:51.020\nI can attach the foobar session\n\n0:19:51.020,0:19:53.700\nand I can end it.\n\n0:19:54.680,0:19:56.340\nAnd it's really nice because\n\n0:19:56.340,0:20:00.139\nhaving this you can kind of work\nin completely different projects.\n\n0:20:00.140,0:20:04.340\nFor example, having two different\ntmux sessions and different\n\n0:20:04.480,0:20:08.440\neditor sessions, different processes running...\n\n0:20:10.160,0:20:15.100\nWhen you are within a session, we\nstart with the concept of windows.\n\n0:20:15.100,0:20:21.160\nHere we have a single window, but we\ncan use \"Ctrl+A c\" (for \"create\")\n\n0:20:21.160,0:20:23.720\nto open a new window.\n\n0:20:24.000,0:20:26.340\nAnd here nothing is executing.\n\n0:20:26.380,0:20:29.420\nWhat it's doing is, tmux has\nopened a new shell for us\n\n0:20:30.360,0:20:34.840\nand we can start running another\none of these programs here.\n\n0:20:35.460,0:20:42.460\nAnd to quickly jump between the tabs,\nwe can do \"Ctrl+A\" and \"previous\",\n\n0:20:42.460,0:20:44.520\n\"p\" for \"previous\",\n\n0:20:45.220,0:20:48.020\nand that will go up to the previous window.\n\n0:20:48.020,0:20:50.920\n\"Ctrl+A\" \"next\", to go to the next window.\n\n0:20:51.260,0:20:56.060\nYou can also use the numbers. 
So if we\nstart opening a lot of these tabs,\n\n0:20:56.200,0:21:00.160\nwe could use \"Ctrl+A 1\", to\nspecifically jump to the\n\n0:21:00.240,0:21:04.400\nto the window that is number \"1\".\n\n0:21:04.780,0:21:08.620\nAnd, lastly, it's also pretty\nuseful to know sometimes\n\n0:21:08.660,0:21:10.400\nthat you can rename them.\n\n0:21:10.400,0:21:13.380\nFor example here I'm executing\nthis Python process,\n\n0:21:13.580,0:21:16.800\nbut that might not be really\ninformative and I want...\n\n0:21:16.880,0:21:21.160\nI maybe want to have something like\nexecution or something like that and\n\n0:21:21.740,0:21:26.840\nthat will rename the name of that window so\nyou can have this really neatly organized.\n\n0:21:27.080,0:21:33.500\nThis still doesn't solve the need when you want to\nhave two things at the same time in your terminal,\n\n0:21:33.680,0:21:35.740\nlike in the same display.\n\n0:21:35.740,0:21:38.320\nThis is what panes are for. Right now, here\n\n0:21:38.420,0:21:40.420\nwe have a window with a single pane\n\n0:21:40.420,0:21:43.540\n(all the windows that we have opened\nso far have a single pane).\n\n0:21:43.640,0:21:50.800\nBut if we do 'Ctrl+A \"'\n\n0:21:51.040,0:21:56.540\nthis will split the current display\ninto two different panes.\n\n0:21:56.540,0:22:01.400\nSo, you see, the one we open below is a different\nshell from the one we have above,\n\n0:22:01.640,0:22:05.440\nand we can run any process that we want here.\n\n0:22:05.620,0:22:09.900\nWe can keep splitting this, if we do \"Ctrl+A %\"\n\n0:22:10.080,0:22:15.000\nthat will split vertically. 
And you can kind of\n\n0:22:15.000,0:22:18.220\nrearrange these tabs using a\nlot of different commands.\n\n0:22:18.220,0:22:22.620\nOne that I find very useful, when you are\nstarting and it's kind of frustrating,\n\n0:22:23.540,0:22:26.000\nrearranging them.\n\n0:22:26.160,0:22:30.160\nBefore I explain that, to move\nthrough these panes, which is\n\n0:22:30.300,0:22:32.280\nsomething you want to be doing all the time\n\n0:22:32.460,0:22:37.060\nYou just do \"Ctrl+A\" and the arrow\nkeys, and that will let you quickly\n\n0:22:37.460,0:22:43.960\nnavigate through the different\nwindows, and execute again...\n\n0:22:44.340,0:22:46.300\nI'm doing a lot of \"ls -a\"\n\n0:22:47.340,0:22:52.780\nI can do \"HTOP\", that we'll explain in\nthe debugging and profiling lecture.\n\n0:22:53.540,0:22:55.920\nAnd we can just navigate through them, again\n\n0:22:55.920,0:22:59.040\nlike to rearrange there's\nanother slew of commands,\n\n0:22:59.080,0:23:01.080\nyou will go through some in the Exercises\n\n0:23:02.400,0:23:07.160\n\"Ctrl+A\" space is pretty neat, because it\nwill kind of equispace the current ones\n\n0:23:07.160,0:23:10.260\nand let you through different layouts.\n\n0:23:11.480,0:23:14.260\nSome of them are too small for my current\n\n0:23:14.840,0:23:19.220\nterminal config, but that covers,\nI think, most of it.\n\n0:23:19.440,0:23:21.440\nOh, there's also,\n\n0:23:22.660,0:23:29.200\nhere, for example, this Vim execution\nthat we have started,\n\n0:23:29.200,0:23:33.380\nis too small for what the current tmux pane is.\n\n0:23:33.720,0:23:38.240\nSo one of the things that really is\nmuch more convenient to do in tmux,\n\n0:23:39.180,0:23:42.500\nin contrast to having multiple\nterminal windows, is that\n\n0:23:42.560,0:23:48.400\nyou can zoom into this, you can ask\nby doing \"Ctrl+A z\", for \"zoom\".\n\n0:23:48.400,0:23:52.960\nIt will expand the pane to\ntake over all the space,\n\n0:23:52.960,0:23:56.660\nand then \"Ctrl+A z\", again will go back to 
it.\n\n0:24:02.760,0:24:08.080\nAny questions for terminal multiplexers,\nor like, tmux concretely?\n\n0:24:14.140,0:24:16.780\nIs it running all the same thing?\n\n0:24:18.680,0:24:22.700\nLike, is there any difference in execution\nbetween running it in different windows?\n\n0:24:24.880,0:24:28.640\nIs it really just doing it all the\nsame, so that you can see it?\n\n0:24:28.800,0:24:34.900\nYeah, it wouldn't be any different from having\ntwo terminal windows open in your computer.\n\n0:24:34.920,0:24:39.220\nLike both of them are gonna be running.\nOf course, when it gets to the CPU,\n\n0:24:39.220,0:24:41.400\nthis is gonna be multiplexed again.\n\n0:24:41.460,0:24:44.400\nLike there's like a timesharing\nmechanism going there\n\n0:24:44.480,0:24:45.920\nbut there's no difference.\n\n0:24:46.040,0:24:52.260\ntmux is just making this much more convenient\nto use by giving you this visual layout\n\n0:24:52.560,0:24:55.020\nthat you can quickly manipulate through.\n\n0:24:55.020,0:24:59.860\nAnd one of the main advantages will come\nwhen we reach the remote machines\n\n0:24:59.860,0:25:05.300\nbecause you can leave one of these, we can\ndetach from one of these tmux systems,\n\n0:25:05.300,0:25:09.120\nclose the connection and even\nif we close the connection and\n\n0:25:09.120,0:25:11.640\nand the terminal is gonna send a hang-up signal,\n\n0:25:11.680,0:25:15.420\nthat's not gonna close all the\ntmux's that have been started.\n\n0:25:17.110,0:25:19.110\nAny other questions?\n\n0:25:23.620,0:25:27.980\nLet me disable the key-caster.\n\n0:25:33.580,0:25:38.040\nSo now we're gonna move into the topic\nof dotfiles and, in general,\n\n0:25:38.040,0:25:42.460\nhow to kind of configure your shell\nto do the things you want to do\n\n0:25:42.460,0:25:45.580\nand mainly how to do them quicker\nand in a more convenient way.\n\n0:25:46.360,0:25:49.260\nI'm gonna motivate this using aliases first.\n\n0:25:49.380,0:25:51.060\nSo what an alias 
is,\n\n0:25:51.060,0:25:54.260\nis that by now, you might be\nstarting to do something like\n\n0:25:54.920,0:26:01.680\na lot of the time, I just want to LS a directory and\nI want to display all the contents into a list format\n\n0:26:02.180,0:26:05.040\nand in a human readable thing.\n\n0:26:05.260,0:26:07.400\nAnd it's fine. Like it's not\nthat long of a command.\n\n0:26:07.400,0:26:10.300\nBut as you start building longer\nand longer commands,\n\n0:26:10.320,0:26:14.440\nit can become kind of bothersome having\nto retype them again and again.\n\n0:26:14.440,0:26:17.540\nThis is one of the reasons\nwhy aliases are useful.\n\n0:26:17.540,0:26:21.740\nAlias is a command that will\nbe a built-in in your shell,\n\n0:26:21.960,0:26:23.680\nand what it will do is\n\n0:26:23.680,0:26:27.540\nit will remap a short sequence of\ncharacters to a longer sequence.\n\n0:26:27.780,0:26:31.500\nSo if I do, for example, here\n\n0:26:31.500,0:26:36.840\nalias ll=\"ls -lah\"\n\n0:26:37.440,0:26:42.520\nIf I execute this command, this is gonna call\nthe \"alias\" command with this argument\n\n0:26:42.520,0:26:44.320\nand the LS is going to update\n\n0:26:44.540,0:26:49.040\nthe environment in my shell\nto be aware of this mapping.\n\n0:26:49.320,0:26:52.920\nSo if I now do LL,\n\n0:26:52.920,0:26:57.520\nit's executing that command without me\nhaving to type the entire command.\n\n0:26:57.720,0:27:01.180\nIt can be really handy for many, many reasons.\n\n0:27:01.180,0:27:04.740\nOne thing to note before I go any further is that\n\n0:27:05.000,0:27:09.960\nhere, alias is not anything special\ncompared to other commands,\n\n0:27:09.960,0:27:11.400\nit's just taking a single argument.\n\n0:27:11.680,0:27:15.600\nAnd there is no space around\nthis equals and that's\n\n0:27:16.020,0:27:18.720\nbecause alias takes a single argument\n\n0:27:18.720,0:27:21.640\nand if you try doing\n\n0:27:21.960,0:27:25.120\nsomething like this, that's giving\nit more than one 
argument\n\n0:27:25.120,0:27:28.360\nand that's not gonna work because\nthat's not the format it expects.\n\n0:27:29.520,0:27:33.680\nSo other use cases that work for aliases,\n\n0:27:34.720,0:27:36.549\nas I was saying,\n\n0:27:36.549,0:27:39.920\nfor some things it may be much more convenient,\n\n0:27:40.040,0:27:41.020\nlike\n\n0:27:41.020,0:27:43.200\none of my favorites is git status.\n\n0:27:43.200,0:27:47.500\nIt's extremely long, and I don't like typing\nthat long of a command every so often,\n\n0:27:47.560,0:27:48.960\nbecause you end up taking a lot of time.\n\n0:27:49.120,0:27:53.000\nSo GS will stand in for doing the git status\n\n0:27:53.820,0:27:58.620\nYou can also use them to alias\nthings that you mistype often,\n\n0:27:58.620,0:28:01.160\nso you can do \"sl=ls\",\n\n0:28:01.160,0:28:02.540\nthat will work.\n\n0:28:05.800,0:28:10.620\nOther useful mappings are,\n\n0:28:10.680,0:28:15.460\nyou might want to alias a command to itself\n\n0:28:15.740,0:28:17.520\nbut with a default flag.\n\n0:28:17.520,0:28:21.100\nSo here what is going on is I'm creating an alias\n\n0:28:21.100,0:28:23.100\nwhich is an alias for the move command,\n\n0:28:23.300,0:28:29.780\nwhich is MV and I'm aliasing it to the\nsame command but adding the \"-i\" flag.\n\n0:28:29.980,0:28:34.460\nAnd this \"-i\" flag, if you go through the man page\nand look at it, it stands for \"interactive\".\n\n0:28:34.780,0:28:39.880\nAnd what it will do is it will prompt\nme before I do an overwrite.\n\n0:28:39.880,0:28:44.420\nSo once I have executed this,\nI can do something like\n\n0:28:44.700,0:28:47.360\nI want to move \"aliases\" into \"case\".\n\n0:28:47.700,0:28:53.140\nBy default \"move\" won't ask, and if \"case\"\nalready exists, it will be overwritten.\n\n0:28:53.160,0:28:55.780\nThat's fine, I'm going to overwrite\nwhatever is there.\n\n0:28:56.020,0:28:58.580\nBut here it's now expanded,\n\n0:28:58.580,0:29:01.660\n\"move\" has been expanded into this \"move 
-i\"\n\n0:29:01.660,0:29:03.540\nand it's using that to ask me\n\n0:29:03.540,0:29:07.400\n\"Oh, are you sure you want to overwrite this?\"\n\n0:29:07.700,0:29:11.780\nAnd I can say no, I don't want to lose that file.\n\n0:29:12.180,0:29:15.820\nLastly, you can use \"alias move\"\n\n0:29:15.820,0:29:18.520\nto ask for what this alias stands for.\n\n0:29:19.100,0:29:22.060\nSo it will tell you so you can quickly make sure\n\n0:29:22.080,0:29:25.400\nwhat the command that you\nare executing actually is.\n\n0:29:27.040,0:29:31.400\nOne inconvenient part about, for example,\nhaving aliases is how will you go about\n\n0:29:31.760,0:29:35.340\npersisting them into your current environment?\n\n0:29:35.500,0:29:38.120\nLike, if I were to close this terminal now,\n\n0:29:38.280,0:29:40.160\nall these aliases will go away.\n\n0:29:40.160,0:29:43.020\nAnd you don't want to be kind\nof retyping these commands\n\n0:29:43.020,0:29:46.760\nand more generally, if you start configuring\nyour shell more and more,\n\n0:29:46.860,0:29:50.880\nyou want some way of bootstrapping\nall this configuration.\n\n0:29:51.380,0:29:56.780\nYou will find that most shell command programs\n\n0:29:56.880,0:30:01.440\nwill use some sort of text\nbased configuration file.\n\n0:30:01.440,0:30:06.740\nAnd this is what we usually call \"dotfiles\", because\nthey start with a dot for historical reasons.\n\n0:30:07.060,0:30:13.160\nSo for bash in our case, which is a shell,\n\n0:30:13.160,0:30:15.560\nwe can look at the bashrc.\n\n0:30:16.180,0:30:19.840\nFor demonstration purposes,\nhere I have been using ZSH,\n\n0:30:19.900,0:30:24.460\nwhich is a different shell, and I'm going\nto be configuring bash, and starting bash.\n\n0:30:24.640,0:30:29.640\nSo if I create an entry here and I say\n\n0:30:29.940,0:30:31.960\nSL maps to LS\n\n0:30:32.600,0:30:36.020\nAnd I have modified that, and now I start bash.\n\n0:30:36.540,0:30:40.660\nBash is kind of completely unconfigured,\nbut now if I do 
SL...\n\n0:30:41.360,0:30:44.040\nHm, that's unexpected.\n\n0:30:46.280,0:30:48.000\nOh, good. Good getting that.\n\n0:30:48.300,0:30:52.200\nSo it matters where your config file is,\n\n0:30:52.200,0:30:55.260\nyour config file needs to be in your home folder.\n\n0:30:55.640,0:31:00.940\nSo your configuration file for\nbash will live in that \"~\",\n\n0:31:00.940,0:31:03.940\nwhich will expand to your home directory,\n\n0:31:03.940,0:31:05.560\nand then bashrc.\n\n0:31:06.160,0:31:08.840\nAnd here we can create the alias\n\n0:31:12.040,0:31:15.840\nand now we start a bash session and we do SL.\n\n0:31:15.840,0:31:21.500\nNow it has been loaded, and this is\nloaded at the beginning when this\n\n0:31:22.300,0:31:24.300\nbash program is started.\n\n0:31:24.700,0:31:31.200\nAll this configuration is loaded, and it's not only\naliases, it can hold a lot of other configuration.\n\n0:31:31.390,0:31:35.729\nSo for example here, I have a prompt\nwhich is fairly useless.\n\n0:31:35.730,0:31:38.429\nIt has just given me the name\nof the shell, which is bash,\n\n0:31:38.640,0:31:43.820\nand the version, which is 5.0. I don't\nwant this to be displayed and\n\n0:31:44.360,0:31:48.540\nas with many things in your shell, this\nis just an environment variable.\n\n0:31:48.600,0:31:53.120\nSo the \"PS1\" is just the prompt string\n\n0:31:53.710,0:31:55.480\nfor your prompt and\n\n0:31:55.480,0:32:02.520\nwe can actually modify this\nto just be a \"> \" symbol.\n\n0:32:02.520,0:32:08.280\nand now that has been modified, and we have\nthat. But if we exit and call bash again,\n\n0:32:08.620,0:32:15.059\nthat was lost. 
However, if we add this\nentry and say, oh we want \"PS1\"\n\n0:32:15.760,0:32:17.230\nto be\n\n0:32:17.230,0:32:19.179\nthis and\n\n0:32:19.179,0:32:24.689\nwe call bash again, this has been persisted.\nAnd we can keep modifying this configuration.\n\n0:32:25.090,0:32:27.209\nSo maybe we want to include\n\n0:32:27.880,0:32:29.880\nwhat the\n\n0:32:30.370,0:32:32.939\nworking directory that we are in is, and\n\n0:32:34.140,0:32:37.380\nthat's telling us the same information\nthat we had in the other shell.\n\n0:32:37.380,0:32:40.480\nAnd there are many, many options,\n\n0:32:40.780,0:32:45.060\nshells are highly, highly configurable, and\n\n0:32:45.700,0:32:49.920\nit's not only shells that are configured\nthrough these files,\n\n0:32:50.590,0:32:55.740\nthere are many other programs. As we saw for\nexample in the editors lecture, Vim is also\n\n0:32:55.840,0:33:02.900\nconfigured this way. We gave you this vimrc\nfile and told you to put it under your\n\n0:33:03.460,0:33:06.380\nhome/.vimrc\n\n0:33:06.380,0:33:11.800\nand this is the same concept, but just\nfor Vim. It's just giving it a set of\n\n0:33:12.160,0:33:18.340\ninstructions that it should load when it's started,\nso you can keep a configuration that you want.\n\n0:33:19.140,0:33:21.240\nAnd even non...\n\n0:33:21.580,0:33:27.140\nkind of a lot of programs will support this. For instance,\nmy terminal emulator, which is another concept,\n\n0:33:27.260,0:33:30.159\nwhich is the program that is\n\n0:33:30.159,0:33:35.459\nrunning the shell, in a way, and displaying\nthis on the screen of my computer.\n\n0:33:35.950,0:33:38.610\nIt can also be configured this way, so\n\n0:33:39.940,0:33:43.620\nif I modify this I can\n\n0:33:46.510,0:33:53.279\nchange the size of the font. 
Like right now, for\nexample, I have increased the font size a lot\n\n0:33:53.279,0:33:55.768\nfor demonstration purposes, but\n\n0:33:56.440,0:34:00.360\nif I change this entry and make it for example\n\n0:34:01.320,0:34:06.820\n28 and write this value, you see that\nthe size of the font has changed,\n\n0:34:06.820,0:34:12.920\nbecause I edited this text file that specifies\nhow my terminal emulator should work.\n\n0:34:19.480,0:34:20.900\nAny questions so far?\n\n0:34:20.900,0:34:22.280\nWith dotfiles.\n\n0:34:28.040,0:34:35.940\nOkay, it can be a bit daunting knowing that there\nis like this endless wall of configurations,\n\n0:34:35.940,0:34:40.600\nand how do you go about learning\nabout what can be configured?\n\n0:34:42.020,0:34:44.300\nThe good news is that\n\n0:34:44.640,0:34:48.900\nwe have linked you to really good\nresources in the lecture notes.\n\n0:34:48.960,0:34:56.440\nBut the main idea is that a lot of people really like\njust configuring these tools and have uploaded\n\n0:34:56.640,0:35:01.140\ntheir configuration files to GitHub, another\ndifferent kind of repositories online.\n\n0:35:01.140,0:35:03.300\nSo for example, here we are on GitHub,\n\n0:35:03.300,0:35:06.640\nwe search for dotfiles, and\ncan see that there are like\n\n0:35:06.780,0:35:12.540\nthousands of repositories of people sharing\ntheir configuration files. 
We have also...\n\n0:35:12.540,0:35:15.460\nLike, the class instructors\nhave linked our dotfiles.\n\n0:35:15.460,0:35:19.420\nSo if you really want to know how\nany part of our setup is working\n\n0:35:19.420,0:35:22.220\nyou can go through it and try to figure it out.\n\n0:35:22.220,0:35:24.220\nYou can also feel free to ask us.\n\n0:35:24.380,0:35:27.060\nIf we go for example to this repository here\n\n0:35:27.210,0:35:30.649\nwe can see that there are many, many\nfiles that you can configure.\n\n0:35:30.650,0:35:37.520\nFor example, there is one for bash, the first couple of ones\nare for git, that will probably be covered in the\n\n0:35:38.610,0:35:40.819\nversion control lecture tomorrow.\n\n0:35:41.400,0:35:48.500\nIf we go for example to the bash profile, which is\na different form of what we saw in the bashrc,\n\n0:35:49.400,0:35:52.900\nit can be really useful because\nyou can learn through\n\n0:35:53.940,0:35:58.320\njust looking at the manual page, but the\nmanual page is, a lot of the time\n\n0:35:58.480,0:36:03.520\njust kind of like a descriptive explanation\nof all the different options\n\n0:36:03.520,0:36:04.880\nand sometimes it's more helpful\n\n0:36:04.880,0:36:09.600\ngoing through examples of what people have done\nand trying to understand why they did it\n\n0:36:09.600,0:36:12.200\nand how it's helping their workflow.\n\n0:36:12.960,0:36:17.300\nWe can see here that this person has\ndone case-insensitive globbing.\n\n0:36:17.320,0:36:21.220\nWe covered globbing as this\nkind of filename expansion\n\n0:36:22.100,0:36:25.760\ntrick in the shell scripting and tools lecture.\n\n0:36:25.900,0:36:28.800\nAnd here you say no, I don't want this to matter,\n\n0:36:28.800,0:36:30.760\nwhether you're using uppercase or lowercase,\n\n0:36:30.760,0:36:32.760\nand just setting this option in the shell\nfor these things to work this way\n\n0:36:35.360,0:36:38.140\nSimilarly, there are for example aliases.\n\n0:36:38.140,0:36:42.220\nHere you can see a lot of aliases 
that this\nperson is doing. For example, \"d\" for\n\n0:36:44.200,0:36:47.400\n\"d\" for \"Dropbox\", sorry, because\nthat's just much shorter.\n\n0:36:47.400,0:36:49.200\n\"g\" for \"git\"...\n\n0:36:49.740,0:36:54.560\nSay we go, for example, with vimrc. It\ncan be actually very, very informative,\n\n0:36:54.560,0:36:58.860\ngoing through this and trying\nto extract useful information.\n\n0:36:59.000,0:37:06.420\nWe do not recommend just kind of getting one huge blob\nof this and copying this into your config files,\n\n0:37:07.110,0:37:12.439\nbecause maybe things are prettier, but you might\nnot really understand what is going on.\n\n0:37:15.150,0:37:19.579\nLastly one thing I want to mention\nabout dotfiles is that\n\n0:37:20.460,0:37:23.390\npeople not only try to push these\n\n0:37:24.660,0:37:28.849\nfiles into GitHub just so other\npeople can read it, that's\n\n0:37:29.400,0:37:33.319\none reason. They also make really sure they can\n\n0:37:34.140,0:37:39.440\nreproduce their setup. And to do that\nthey use a slew of different tools.\n\n0:37:39.440,0:37:41.280\nOops, went a little too far.\n\n0:37:41.280,0:37:44.840\nSo GNU Stow is, for example, one of them\n\n0:37:45.720,0:37:49.060\nand the trick that they are doing is\n\n0:37:50.280,0:37:54.520\nthey are kind of putting all their\ndotfiles in a folder and they are\n\n0:37:55.200,0:37:59.520\nfaking to the system, using\na tool called symlinks,\n\n0:37:59.520,0:38:02.440\nthat they are actually what\nthey're not. 
I'm gonna\n\n0:38:03.150,0:38:05.150\ndraw really quick what I mean by that.\n\n0:38:05.790,0:38:10.939\nSo a common folder structure might look\nlike you have your home folder and\n\n0:38:11.670,0:38:14.300\nin this home folder you might have your\n\n0:38:16.050,0:38:21.380\nbashrc, that contains your bash configuration,\nyou might have your vimrc and\n\n0:38:22.500,0:38:25.760\nit would be really great if you could\nkeep this under version control.\n\n0:38:26.580,0:38:29.300\nBut the thing is, you might not\nwant to have a git repository,\n\n0:38:29.300,0:38:31.300\nwhich will be covered tomorrow,\n\n0:38:31.300,0:38:32.300\nin your home folder.\n\n0:38:32.300,0:38:37.360\nSo what people usually do is they\ncreate a dotfiles repository,\n\n0:38:38.280,0:38:42.160\nand then they have entries here for their\n\n0:38:43.050,0:38:47.239\nbashrc and their vimrc. And\nthis is where actually\n\n0:38:47.820,0:38:49.820\nthe files are\n\n0:38:50.100,0:38:52.400\nand what they are doing is they're just\n\n0:38:53.460,0:38:56.510\ntelling the OS to forward, whenever anyone\n\n0:38:56.760,0:39:01.849\nwants to read this file or write to this file,\njust forward this to this other file.\n\n0:39:03.000,0:39:05.719\nThis is a concept called symlinks\n\n0:39:06.690,0:39:08.630\nand it's useful in this scenario,\n\n0:39:08.630,0:39:12.600\nbut in general it's a really\nuseful tool in UNIX\n\n0:39:12.700,0:39:14.700\nthat we haven't covered so far in the lectures\n\n0:39:14.960,0:39:16.740\nbut you might be...\n\n0:39:16.740,0:39:18.740\nthat you should be familiar with.\n\n0:39:19.100,0:39:22.840\nAnd in general, the syntax will be \"ln -s\"\n\n0:39:22.840,0:39:29.980\nfor specifying a symbolic link and then\nyou will put the path to the file\n\n0:39:30.570,0:39:33.049\nthat you want to point to and then the\n\n0:39:33.780,0:39:35.780\nsymlink that you want to create.\n\n0:39:39.390,0:39:41.390\nAnd\n\n0:39:41.880,0:39:45.619\nAll these kind of fancy tools\nthat we're 
seeing here listed,\n\n0:39:45.810,0:39:52.159\nthey all amount to doing some sort of this trick, so\nthat you can have all your dotfiles neat and tidy\n\n0:39:52.680,0:39:57.829\ninto a folder, and then they can be\nversion-controlled, and they can be\n\n0:39:58.349,0:40:02.689\nsymlinked so the rest of the programs can\nfind them in their default locations.\n\n0:40:06.720,0:40:09.020\nAny questions regarding dotfiles?\n\n0:40:13.200,0:40:20.200\nDo you need to have the dotfiles in your home folder,\nand then also dotfiles in the version control folder?\n\n0:40:20.780,0:40:24.640\nSo what you will have is,\npretty much every program,\n\n0:40:24.640,0:40:26.180\nfor example bash,\n\n0:40:26.180,0:40:29.560\nwill always look for \"home/.bashrc\".\n\n0:40:29.560,0:40:33.480\nThat's where the program is going to look for.\n\n0:40:33.820,0:40:40.200\nWhat you do when you do a symlink\nis, you place your \"home/.bashrc\"\n\n0:40:40.200,0:40:44.900\nit's just a file that is kind\nof a special file in UNIX,\n\n0:40:45.150,0:40:49.609\nthat says oh, whenever you want to read\nthis file go to this other file.\n\n0:40:51.500,0:40:53.440\nThere's no content, like there is no...\n\n0:40:53.600,0:40:58.099\nyour aliases are not part of this dotfile. That file\nis just kind of like a pointer, saying now you should\n\n0:40:58.100,0:40:59.400\ngo that other way.\n\n0:40:59.400,0:41:02.600\nAnd by doing that you can have your other file\n\n0:41:02.600,0:41:04.400\nin that other folder.\n\n0:41:04.560,0:41:06.360\nIf version controlling is not useful, think about\n\n0:41:06.360,0:41:10.740\nwhat if you want to have them in your Dropbox\nfolder, so they're synced to the cloud,\n\n0:41:10.759,0:41:15.019\nfor example. 
That's kind of another use case\nwhere like symlinks could be really useful\n\n0:41:16.240,0:41:21.040\nSo you don't need the folder dotfiles\nto be in the home directory, right?\n\n0:41:21.040,0:41:23.820\nBecause you can just use the symlink,\nthat points somewhere else.\n\n0:41:23.960,0:41:29.760\nAs long as you have a way for the default path\nto resolve wherever you have it, yeah.\n\n0:41:35.100,0:41:38.000\nLast thing I want to cover in the lecture...\n\n0:41:38.000,0:41:40.380\nOh, sorry, any other questions about dotfiles?\n\n0:41:49.200,0:41:52.580\nLast thing I want to cover in the lecture\nis working with remote machines,\n\n0:41:52.580,0:41:55.549\nwhich is a thing that you will run into,\n\n0:41:55.559,0:41:56.900\nsooner or later.\n\n0:41:56.900,0:42:02.238\nAnd there are a few things that will make your life\nmuch easier when dealing with remote machines\n\n0:42:03.180,0:42:05.180\nif you know about them.\n\n0:42:05.220,0:42:08.380\nRight now maybe because you are\nusing the Athena cluster,\n\n0:42:08.380,0:42:10.740\nbut later on, during your programming career,\n\n0:42:10.740,0:42:11.960\nit's pretty sure that\n\n0:42:11.960,0:42:15.400\nthere is a fairly ubiquitous\nconcept of having your\n\n0:42:15.400,0:42:20.380\nlocal working environment and then having some\nproduction server that is actually running the\n\n0:42:20.970,0:42:23.239\ncode, so it is really good to get familiar\n\n0:42:24.480,0:42:26.749\nabout how to work in/with remote machines.\n\n0:42:27.420,0:42:35.180\nSo the main command for working\nwith remote machines is SSH.\n\n0:42:37.760,0:42:43.900\nSSH is just like a secure shell, it's\njust gonna take the responsibility for\n\n0:42:43.900,0:42:46.540\nreaching wherever we want or tell it to go\n\n0:42:47.560,0:42:50.700\nand trying to open a session there.\n\n0:42:50.700,0:42:52.400\nSo here the syntax is:\n\n0:42:53.130,0:42:56.660\n\"JJGO\" is the user that I want\nto use in the remote machine,\n\n0:42:56.660,0:42:58.430\nand 
this is because the user is\n\n0:42:58.529,0:43:03.460\ndifferent from the one I have on my local machine,\nwhich will be the case a lot of the time,\n\n0:43:03.460,0:43:07.400\nthen the \"@\" is telling the\nterminal that this separates\n\n0:43:07.400,0:43:12.540\nwhat the user is from what the address is.\n\n0:43:12.540,0:43:16.540\nAnd here I'm using an IP address because\nwhat I'm actually doing is\n\n0:43:16.540,0:43:20.500\nI have a virtual machine in my computer,\n\n0:43:20.500,0:43:23.240\nthat is the one that is remote right now.\n\n0:43:23.240,0:43:26.400\nAnd I'm gonna be SSH'ing into it. This is the\n\n0:43:26.580,0:43:27.880\nURL that I'm using,\n\n0:43:27.880,0:43:29.860\nsorry, the IP that I'm using,\n\n0:43:29.860,0:43:32.280\nbut you might also see things like\n\n0:43:32.360,0:43:36.820\noh I want to SSH as \"JJGO\"\n\n0:43:36.820,0:43:39.840\nat \"foobar.mit.edu\"\n\n0:43:39.840,0:43:42.960\nThat's probably something more\ncommon, if you are using some\n\n0:43:42.960,0:43:47.260\nremote server that has a DNS name.\n\n0:43:48.180,0:43:51.860\nSo going back to a regular command,\n\n0:43:53.220,0:43:56.580\nwe try to SSH, it asks us for a password,\n\n0:43:56.580,0:43:58.180\nreally common thing.\n\n0:43:58.190,0:43:59.480\nAnd now we're there. We have...\n\n0:43:59.480,0:44:02.629\nwe're still in our same terminal emulator\n\n0:44:02.630,0:44:09.529\nbut right now SSH is kind of forwarding the\nentire virtual display to display what the\n\n0:44:09.869,0:44:14.358\nremote shell is displaying. And\nwe can execute commands here and\n\n0:44:15.630,0:44:17.630\nwe'll see the remote files\n\n0:44:18.390,0:44:22.819\nA couple of handy things to know about\nSSH, that were briefly covered in the\n\n0:44:23.220,0:44:27.080\ndata wrangling lecture, is that\nSSH is not only good for just\n\n0:44:28.280,0:44:33.760\nopening connections. 
It will also let\nyou just execute commands remotely.\n\n0:44:33.770,0:44:36.979\nSo for example, if I do that, it's gonna ask me\n\n0:44:37.710,0:44:39.020\nwhat is my password?, again.\n\n0:44:39.020,0:44:41.059\nAnd it's executing this command\n\n0:44:41.279,0:44:43.420\nthen coming back to my terminal\n\n0:44:43.420,0:44:47.420\nand piping the output of what that\ncommand was, in the remote machine,\n\n0:44:47.420,0:44:50.480\nthrough the standard output in my current shell.\n\n0:44:50.480,0:44:53.940\nAnd I could have this in...\n\n0:44:58.100,0:45:00.480\nI could have this in a pipe, and\n\n0:45:00.980,0:45:03.580\nthis will work and we'll just\n\n0:45:03.600,0:45:06.100\ndrop all this output and then have a local pipe\n\n0:45:06.100,0:45:07.879\nwhere I can keep working.\n\n0:45:08.640,0:45:12.140\nSo far, it has been kind of inconvenient,\nhaving to type our password.\n\n0:45:12.630,0:45:14.820\nThere's one really good trick for this.\n\n0:45:14.820,0:45:16.880\nWe can use something called \"SSH keys\".\n\n0:45:17.140,0:45:20.660\nSSH keys just use public key encryption\n\n0:45:20.660,0:45:24.980\nto create a pair of SSH keys, a public\nkey and a private key, and then\n\n0:45:25.170,0:45:29.320\nyou can give the server the\npublic part of the key.\n\n0:45:29.320,0:45:32.810\nSo you copy the public key and\nthen whenever you try to\n\n0:45:33.390,0:45:37.129\nauthenticate instead of using your password,\nit's gonna use the private key to\n\n0:45:37.820,0:45:40.800\nprove to the server that you are\nactually who you say you are.\n\n0:45:43.860,0:45:48.020\nWe can quickly showcase how you will go\n\n0:45:48.020,0:45:49.400\nabout doing this.\n\n0:45:49.400,0:45:53.180\nRight now I don't have any SSH keys,\nso I'm gonna create a couple of them.\n\n0:45:53.940,0:45:58.250\nFirst thing, it's just gonna ask\nme where I want this key to live.\n\n0:45:58.980,0:46:00.640\nUnsurprisingly, it's doing this.\n\n0:46:00.640,0:46:04.820\nThis is my home folder and 
then\nit's using this \".ssh\" path,\n\n0:46:05.460,0:46:08.750\nwhich refers back to the same concept\nthat we covered earlier about having\n\n0:46:08.850,0:46:12.439\ndotfiles. Like \".ssh\" is a folder\nthat contains a lot of the\n\n0:46:13.320,0:46:16.540\nconfiguration files for how\nyou want SSH to behave.\n\n0:46:17.060,0:46:19.420\nSo it will ask us for a passphrase.\n\n0:46:19.680,0:46:23.120\nThe passphrase is to encrypt\nthe private part of the key\n\n0:46:23.120,0:46:27.160\nbecause if someone gets your private key,\nif you don't have a password protected\n\n0:46:27.920,0:46:29.580\nprivate key, if they get that key\n\n0:46:29.580,0:46:32.240\nthey can use that key to impersonate\nyou in any server.\n\n0:46:32.310,0:46:34.360\nWhereas if you add a passphrase,\n\n0:46:34.360,0:46:37.640\nthey will have to know what the passphrase\nis to actually use the key.\n\n0:46:40.800,0:46:51.740\nIt has created a key pair. We can check that\nthese two files are now under ssh.\n\n0:46:51.740,0:46:53.920\nAnd we can see...\n\n0:46:57.720,0:47:02.960\nWe have these two files: we have\nthe 25519 and the public key.\n\n0:47:03.320,0:47:06.300\nAnd if we \"cat\" through the output,\n\n0:47:06.300,0:47:09.760\nthat key is actually not like\nany fancy binary file, it's\n\n0:47:15.430,0:47:20.760\njust a text file that has the contents\nof the public key and some\n\n0:47:23.050,0:47:26.729\nalias name for it, so we can\nknow what this public key is.\n\n0:47:26.950,0:47:32.220\nThe way we can tell the server that\nwe're authorized to SSH there\n\n0:47:32.260,0:47:38.400\nis by just actually copying this file,\nlike copying this string into a file,\n\n0:47:38.400,0:47:41.540\nthat is \".ssh/authorized_keys\".\n\n0:47:42.100,0:47:46.160\nSo here what I'm doing is I'm\n\n0:47:46.960,0:47:49.770\ncatting the output of this file\n\n0:47:49.800,0:47:53.920\nwhich is just this line of\ntext that we want to copy\n\n0:47:53.920,0:47:57.440\nand I'm piping that into SSH and then 
remotely\n\n0:47:57.960,0:48:02.080\nI'm asking \"tee\" to dump the contents\nof the standard input\n\n0:48:02.080,0:48:05.220\ninto \".ssh/authorized_keys\".\n\n0:48:05.440,0:48:10.360\nAnd if we do that, obviously it's\ngonna ask us for a password.\n\n0:48:14.800,0:48:18.740\nIt was copied, and now we\ncan check that if we try\n\n0:48:19.690,0:48:21.690\nto SSH again,\n\n0:48:21.960,0:48:24.840\nit's going to first ask us for a passphrase\n\n0:48:24.840,0:48:29.100\nbut you can arrange that so that\nit's saved in the session\n\n0:48:29.460,0:48:34.840\nand we didn't actually have to\ntype the password for the server.\n\n0:48:34.840,0:48:36.840\nAnd I can kind of show that again.\n\n0:48:45.820,0:48:47.540\nMore things that are useful.\n\n0:48:47.540,0:48:49.040\nOh, we can do...\n\n0:48:49.220,0:48:51.880\nIf that command seemed a little bit janky,\n\n0:48:51.980,0:48:55.000\nyou can actually use this command\nthat is built for this,\n\n0:48:55.000,0:49:00.640\nso you don't have to kind of\ncraft this \"ssh tee\" command.\n\n0:49:00.640,0:49:03.800\nThat is just called \"ssh-copy-id\".\n\n0:49:05.000,0:49:08.080\nAnd we can do the same\n\n0:49:08.080,0:49:09.660\nand it's gonna copy the key.\n\n0:49:09.660,0:49:14.280\nAnd now, if we try to SSH,\n\n0:49:14.500,0:49:18.320\nwe can SSH without actually\ntyping any key at all,\n\n0:49:18.860,0:49:20.320\nor any password.\n\n0:49:20.660,0:49:21.520\nMore things.\n\n0:49:21.520,0:49:23.520\nWe will probably want to copy files.\n\n0:49:23.740,0:49:25.310\nYou cannot use \"CP\"\n\n0:49:25.310,0:49:29.720\nbut you can use \"SCP\", for \"SSH copy\".\n\n0:49:29.720,0:49:34.500\nAnd here we can specify that we want\nto copy this local file called notes\n\n0:49:34.500,0:49:36.880\nand the syntax is kind of similar.\n\n0:49:36.880,0:49:39.760\nWe want to copy to this remote and\n\n0:49:39.920,0:49:44.020\nthen we have a colon to separate\nwhat the path is going to be.\n\n0:49:44.020,0:49:45.040\nAnd then we 
have\n\n0:49:45.040,0:49:46.620\noh, we want to copy this as notes\n\n0:49:46.620,0:49:51.000\nbut we could also copy this as foobar.\n\n0:49:51.740,0:49:55.600\nAnd if we do that, it has been executed\n\n0:49:55.780,0:49:59.280\nand it's telling us that all the\ncontents have been copied there.\n\n0:49:59.540,0:50:02.200\nIf you're gonna be copying a lot of files,\n\n0:50:02.200,0:50:05.100\nthere is a better command\nthat you should be using\n\n0:50:05.100,0:50:07.740\nthat is called \"RSYNC\". For example, here\n\n0:50:07.900,0:50:10.780\njust by specifying these three flags,\n\n0:50:10.820,0:50:15.960\nI'm telling RSYNC to kind of preserve\nall the permissions whenever possible\n\n0:50:16.240,0:50:19.740\nto try to check if the file\nhas already been copied.\n\n0:50:19.740,0:50:24.100\nFor example, SCP will try to copy\nfiles that are already there.\n\n0:50:24.200,0:50:26.440\nThis will happen for example\nif you are trying to copy\n\n0:50:26.440,0:50:29.060\nand the connection interrupts\nin the middle of it.\n\n0:50:29.120,0:50:32.060\nSCP will start from the very beginning,\ntrying to copy every file,\n\n0:50:32.080,0:50:36.600\nwhereas RSYNC will continue\nfrom where it stopped.\n\n0:50:37.240,0:50:38.440\nAnd here,\n\n0:50:39.060,0:50:42.760\nwe ask it to copy the entire folder and\n\n0:50:43.780,0:50:46.560\nit's just really quickly\ncopied the entire folder.\n\n0:50:48.080,0:50:54.100\nOne of the other things to know about SSH is that\n\n0:50:54.320,0:50:59.860\nthe equivalent of the dot file\nfor SSH is the \"SSH config\".\n\n0:50:59.860,0:51:06.340\nSo if we edit the SSH config to be\n\n0:51:13.120,0:51:17.940\nIf I edit the SSH config to\nlook something like this,\n\n0:51:17.940,0:51:22.900\ninstead of having to, every\ntime, type \"ssh jjgo\",\n\n0:51:23.040,0:51:27.760\nhaving this really long string so I can\nlike refer to this specific remote,\n\n0:51:27.760,0:51:30.140\nI want to refer, with the specific user name,\n\n0:51:30.140,0:51:32.760\nI 
can have something here that says\n\n0:51:33.160,0:51:35.680\nthis is the username, this\nis the host name, that this\n\n0:51:36.860,0:51:40.540\nhost is referring to and you should\nuse this identity file.\n\n0:51:41.460,0:51:43.960\nAnd if I copy this,\n\n0:51:43.960,0:51:46.100\nthis is right now in my local folder,\n\n0:51:46.100,0:51:49.000\nI can copy this into ssh.\n\n0:51:49.600,0:51:53.520\nNow, instead of having to do this really\nlong command, I can just say\n\n0:51:53.520,0:51:57.100\nI just want to SSH into the host called VM.\n\n0:51:58.260,0:52:03.220\nAnd by doing that, it's grabbing all that\nconfiguration from the SSH config\n\n0:52:03.220,0:52:05.220\nand applying it here.\n\n0:52:05.240,0:52:10.060\nThis solution is much better than something\nlike creating an alias for SSH,\n\n0:52:10.360,0:52:13.360\nbecause other programs like SCP and RSYNC\n\n0:52:13.360,0:52:19.440\nalso know about the dotfiles for SSH and\nwill use them whenever they are there.\n\n0:52:22.820,0:52:30.400\nLast thing I want to cover about remote machines is\nthat here, for example, we'll have tmux and we can,\n\n0:52:31.760,0:52:35.780\nlike I was saying before, we\ncan start editing some file\n\n0:52:39.160,0:52:44.500\nand we can start running some job.\n\n0:52:54.200,0:52:56.180\nFor example, something like HTOP.\n\n0:52:56.180,0:52:58.720\nAnd this is running here, we can\n\n0:52:59.320,0:53:01.320\ndetach from it,\n\n0:53:01.430,0:53:03.430\nclose the connection and\n\n0:53:03.740,0:53:07.780\nthen SSH back. 
And then, if you do \"tmux a\",\n\n0:53:07.780,0:53:11.340\neverything is as you left it, like\nnothing has really changed.\n\n0:53:11.340,0:53:15.220\nAnd if you have things executing there in\nthe background, they will keep executing.\n\n0:53:17.500,0:53:23.300\nI think that, pretty much, ends\nall I have to say for this tool.\n\n0:53:23.300,0:53:26.420\nAny questions related to remote machines?\n\n0:53:32.860,0:53:36.780\nThat's a really good question.\nSo what I do for that,\n\n0:53:38.700,0:53:39.460\nOh, yes, sorry.\n\n0:53:39.460,0:53:44.880\nSo the question is, how do you deal with\ntrying to use tmux in your local machine,\n\n0:53:44.880,0:53:47.640\nand also trying to use tmux\nin the remote machine?\n\n0:53:48.400,0:53:50.760\nThere are a couple of tricks\nfor dealing with that.\n\n0:53:50.760,0:53:53.220\nThe first one is changing the prefix.\n\n0:53:53.360,0:53:55.340\nSo what I do, for example, is\n\n0:53:55.340,0:54:00.020\nin my local machine the prefix I have\nchanged from \"Ctrl+B\" to \"Ctrl+A\" and\n\n0:54:00.220,0:54:02.580\nthen in remote machines this is still \"Ctrl+B\".\n\n0:54:02.800,0:54:05.580\nSo I can kind of swap between,\n\n0:54:05.580,0:54:09.840\nif I want to do things to the\nlocal tmux I will do \"Ctrl+A\"\n\n0:54:09.840,0:54:13.460\nand if I want to do things to the\nremote tmux I would do \"Ctrl+B\".\n\n0:54:15.080,0:54:19.900\nAnother thing is that you\ncan have separate configs,\n\n0:54:20.080,0:54:24.100\nso I can do something like this, and then...\n\n0:54:27.260,0:54:31.040\nAh, because I don't have my own ssh config, yeah.\n\n0:54:32.240,0:54:33.000\nBut if you...\n\n0:54:33.000,0:54:34.420\nUm, I can SSH \"VM\".\n\n0:54:36.820,0:54:38.900\nHere, what you see,\n\n0:54:38.900,0:54:41.000\nthe difference between these\ntwo bars, for example,\n\n0:54:41.000,0:54:43.680\nis because the tmux config is different.\n\n0:54:44.380,0:54:48.500\nAs you will see in the exercises,\nthe tmux configuration is 
in\n\n0:54:50.320,0:54:53.780\nthe tmux.conf\n\n0:54:56.720,0:54:58.140\nAnd in tmux.conf,\n\n0:54:58.140,0:55:02.020\nhere you can do a lot of things like changing\nthe color depending on the host you are\n\n0:55:02.210,0:55:06.879\nso you can get like quick visual\nfeedback about where you are, or\n\n0:55:06.880,0:55:10.240\nif you have a nested session. Also, tmux will,\n\n0:55:10.520,0:55:15.280\nif you're in the same host and you\ntry to tmux within a tmux session,\n\n0:55:15.290,0:55:18.759\nit will kind of prevent you from doing\nit so you don't run into issues.\n\n0:55:21.700,0:55:25.400\nAny other questions related, to kind\nof all the topics we have covered.\n\n0:55:29.100,0:55:32.720\nAnother answer to that question is\nalso, if you type the prefix twice,\n\n0:55:32.880,0:55:35.760\nit sends it once to the underlying shell.\n\n0:55:35.920,0:55:40.100\nSo the local binding is \"Ctrl+A\" and\nthe remote binding is \"Ctrl+A\",\n\n0:55:40.100,0:55:45.260\nYou could type \"Ctrl+A\", \"Ctrl+A\" and then \"D\", for\nexample, detaches from the remote, basically.\n\n0:55:52.480,0:55:59.660\nI think that ends the class for today, there's a bunch\nof exercises related to all these main topics and\n\n0:56:00.380,0:56:05.410\nwe're gonna be holding office hours today, too.\nSo feel free to come and ask us any questions.\n\n"
  },
  {
    "path": "static/files/subtitles/2020/debugging-profiling.sbv",
    "content": "0:00:00.000,0:00:04.200\nSo welcome back. Today we are gonna\ncover debugging and profiling.\n\n0:00:04.720,0:00:09.340\nBefore I get into it we're gonna make another\nreminder to fill in the survey.\n\n0:00:09.520,0:00:14.580\nJust one of the main things we want to get\nfrom you is questions, because the last day\n\n0:00:14.820,0:00:18.080\nis gonna be questions from\nyou guys: about things that\n\n0:00:18.080,0:00:22.020\nwe haven't covered, or like you want\nus to kind of talk more in depth.\n\n0:00:23.350,0:00:26.969\nThe more questions we get, the more interesting\nwe can make that section,\n\n0:00:26.970,0:00:28.900\nso please go on and fill in the survey.\n\n0:00:28.900,0:00:35.660\nSo today's lecture is gonna be a lot of topics.\nAll the topics revolve around the concept of\n\n0:00:35.820,0:00:39.920\nwhat do you do when you have\na program that has some bugs.\n\n0:00:39.920,0:00:42.520\nWhich is most of the time, like when you\nare programming, you're kind of thinking\n\n0:00:42.720,0:00:47.400\nabout how you implement something and there's\nlike a half life of fixing all the issues that\n\n0:00:47.620,0:00:52.140\nthat program has. And even if your program behaves\nlike you want, it might be that it's\n\n0:00:52.390,0:00:55.680\nreally slow, or it's taking a lot\nof resources in the process.\n\n0:00:55.680,0:01:00.569\nSo today we're gonna see a lot of different\napproaches of dealing with these problems.\n\n0:01:01.300,0:01:05.099\nSo first, the first section is on debugging.\n\n0:01:06.159,0:01:08.279\nDebugging can be done in many different ways,\n\n0:01:08.380,0:01:10.119\nthere are all kinds of...\n\n0:01:10.120,0:01:13.640\nThe most simple approach that, pretty much, all\n\n0:01:13.640,0:01:17.140\nCS students will go through, will be just:\nyou have some code, and it's not behaving\n\n0:01:17.160,0:01:20.280\nlike you want, so you probe the code by adding\n\n0:01:20.280,0:01:23.420\nprint statements. 
This is called\n\"printf debugging\" and\n\n0:01:23.440,0:01:24.450\nit works pretty well.\n\n0:01:24.450,0:01:26.680\nLike, I have to be honest,\n\n0:01:26.820,0:01:33.120\nI use it a lot of the time because of how simple it is\nto set up and how quick the feedback can be.\n\n0:01:34.360,0:01:39.320\nOne of the issues with printf debugging\nis that you can get a lot of output\n\n0:01:39.320,0:01:40.740\nand maybe you don't want\n\n0:01:40.800,0:01:43.240\nto get as much output as you're getting.\n\n0:01:43.780,0:01:49.349\nThere has... people have thought of slightly more\ncomplex ways of doing printf debugging and\n\n0:01:53.920,0:01:58.320\none of these ways is what is usually\nreferred to as \"logging\".\n\n0:01:58.420,0:02:04.530\nSo the advantage of doing logging versus doing printf\ndebugging is that, when you're creating logs,\n\n0:02:05.080,0:02:09.780\nyou're not necessarily creating the logs because\nthere's a specific issue you want to fix;\n\n0:02:09.780,0:02:12.460\nit's mostly because you have built a\n\n0:02:12.480,0:02:16.840\nmore complex software system and you\nwant to log when some events happen.\n\n0:02:17.360,0:02:21.560\nOne of the core advantages of using\na logging library is that\n\n0:02:22.180,0:02:27.040\nyou can define severity levels,\nand you can filter based on those.\n\n0:02:27.400,0:02:31.620\nLet's see an example of how we\ncan do something like that.\n\n0:02:32.320,0:02:35.840\nYeah, everything fits here. 
This\nis a really silly example:\n\n0:02:36.340,0:02:37.520\nWe're just gonna\n\n0:02:37.520,0:02:40.980\nsample random numbers and, depending\non the value of the number,\n\n0:02:41.120,0:02:44.720\nthat we can interpret as a kind\nof \"how wrong things are going\".\n\n0:02:44.740,0:02:48.760\nWe're going to log the value\nof the number and then\n\n0:02:49.340,0:02:51.640\nwe can see what is going on.\n\n0:02:52.580,0:02:59.280\nI need to disable these formatters...\n\n0:02:59.620,0:03:03.720\nAnd if we were just to execute the code as it is,\n\n0:03:04.160,0:03:07.420\nwe just get the output and we just\nkeep getting more and more output.\n\n0:03:07.420,0:03:13.599\nBut you have to kind of stare at it and make\nsense of what is going on, and we don't know\n\n0:03:13.600,0:03:19.629\nwhat is the relative timing between printfs, we don't really\nknow whether this is just an information message\n\n0:03:19.630,0:03:22.960\nor a message of whether something went wrong.\n\n0:03:23.810,0:03:25.810\nIf we just go in,\n\n0:03:27.320,0:03:29.780\nand undo, not that one...\n\n0:03:34.220,0:03:37.140\nThat one, we can set that formatter.\n\n0:03:38.620,0:03:41.600\nNow the output looks something more like this\n\n0:03:41.620,0:03:44.840\nSo for example, if you have several different\nmodules that you are programming with,\n\n0:03:44.840,0:03:46.940\nyou can identify them with like different levels.\n\n0:03:46.940,0:03:49.800\nHere, we have, we have debug levels,\n\n0:03:50.330,0:03:51.890\nwe have critical\n\n0:03:51.890,0:03:57.540\ninfo, different levels. And it might be handy because\nhere we might only care about the error messages.\n\n0:03:57.740,0:04:00.640\nLike those are like, the... 
We have been\n\n0:04:00.700,0:04:03.960\nworking on our code, so far so good,\nand suddenly we get some error.\n\n0:04:03.960,0:04:06.540\nWe can log that to identify where it's happening.\n\n0:04:06.580,0:04:11.640\nBut maybe there's a lot of information\nmessages, but we can deal with that\n\n0:04:12.709,0:04:16.809\nby just changing the level to error level.\n\n0:04:17.400,0:04:17.900\nAnd\n\n0:04:18.890,0:04:22.960\nnow if we were to run this again,\nwe are only going to get those\n\n0:04:23.620,0:04:28.160\nerrors in the output, and we can just look through\nthose to make sense of what is going on.\n\n0:04:28.920,0:04:33.320\nAnother really useful tool when\nyou're dealing with logs is\n\n0:04:34.130,0:04:36.670\nAs you kind of look at this,\n\n0:04:36.670,0:04:42.580\nit has become easier because now we have this critical\nand error levels that we can quickly identify.\n\n0:04:43.310,0:04:46.750\nBut since humans are fairly visual creatures,\n\n0:04:48.680,0:04:53.109\none thing that you can do is use\ncolors from your terminal to\n\n0:04:53.630,0:04:57.369\nidentify these things. 
So now,\nchanging the formatter,\n\n0:04:57.369,0:05:03.320\nwhat I've done is slightly change\nhow the output is formatted.\n\n0:05:03.580,0:05:09.340\nWhen I do that, now whenever I get a warning\nmessage, it's color coded by yellow;\n\n0:05:09.340,0:05:10.880\nwhenever I get like an error,\n\n0:05:10.960,0:05:16.140\nfaded red; and when it's critical, I have a\nbold red indicating something went wrong.\n\n0:05:16.280,0:05:22.620\nAnd here it's a really short output, but when you start\nhaving thousands and thousands of lines of log,\n\n0:05:22.620,0:05:26.380\nwhich is not unrealistic and happens\nevery single day in a lot of apps,\n\n0:05:27.140,0:05:32.500\nquickly browsing through them and identifying\nwhere the error or the red patches are\n\n0:05:32.600,0:05:35.320\ncan be really useful.\n\n0:05:35.600,0:05:41.400\nA quick aside is, you might be curious about\nhow the terminal is displaying these colors.\n\n0:05:41.580,0:05:45.320\nAt the end of the day, the terminal\nis only outputting characters.\n\n0:05:47.160,0:05:49.480\nLike, how is this program or how\nare other programs, like LS,\n\n0:05:50.060,0:05:56.050\nthat has all these fancy colors. How are they telling the\nterminal that it should use these different colors?\n\n0:05:56.360,0:05:58.779\nThis is nothing extremely fancy,\n\n0:05:59.440,0:06:03.440\nwhat these tools are doing, is\nsomething along these lines.\n\n0:06:03.740,0:06:04.540\nHere we have...\n\n0:06:05.420,0:06:08.340\nI can clear the rest of the output,\nso we can focus on this.\n\n0:06:08.660,0:06:14.000\nThere's some special characters,\nsome escape characters here,\n\n0:06:14.260,0:06:19.740\nthen we have some text and then we have some other\nspecial characters. 
And if we execute this line\n\n0:06:19.940,0:06:22.360\nwe get a red \"This is red\".\n\n0:06:22.480,0:06:26.640\nAnd you might have picked up on the\nfact that we have a \"255;0;0\" here,\n\n0:06:26.720,0:06:31.400\nthis is just telling the RGB values of\nthe color we want in the terminal.\n\n0:06:31.400,0:06:38.100\nAnd you pretty much can do this in any piece of code that\nyou have, and like that you can color code the output.\n\n0:06:38.100,0:06:42.540\nYour terminal is fairly fancy and supports\na lot of different colors in the output.\n\n0:06:42.550,0:06:45.400\nThis is not even all of them, this\nis like a sixteenth of them.\n\n0:06:46.100,0:06:49.119\nI think it can be fairly useful\nto know about that.\n\n0:06:52.100,0:06:55.960\nAnother thing is maybe you don't\nenjoy or you don't think\n\n0:06:56.200,0:06:58.620\nlogs are really fit for you.\n\n0:06:58.620,0:07:02.480\nThe thing is a lot of other systems that\nyou might start using will use logs.\n\n0:07:02.840,0:07:05.360\nAs you start building larger and larger systems,\n\n0:07:05.360,0:07:10.140\nyou might rely on other dependencies. Common\ndependencies might be web servers or\n\n0:07:10.220,0:07:12.320\ndatabases, it's a really common one.\n\n0:07:12.440,0:07:17.740\nAnd those will be logging their errors\nor exceptions in their own logs.\n\n0:07:17.740,0:07:20.540\nOf course, you will get some client-side error,\n\n0:07:20.620,0:07:25.140\nbut those sometimes are not informative enough\nfor you to figure out what is going on.\n\n0:07:25.900,0:07:33.940\nIn most UNIX systems, the logs are usually\nplaced under a folder called \"/var/log\"\n\n0:07:33.940,0:07:37.980\nand if we list it, we can see there's\na bunch of logs in here.\n\n0:07:42.680,0:07:48.040\nSo we have like the shutdown monitor\nlog, or some weekly logs.\n\n0:07:49.669,0:07:56.199\nThings related to the Wi-Fi, for\nexample. 
And if we output the\n\n0:07:57.560,0:08:00.840\nSystem log, which contains a lot\nof information about the system,\n\n0:08:00.840,0:08:03.940\nwe can get information about what's going on.\n\n0:08:04.120,0:08:06.780\nSimilarly, there are tools that will let you\n\n0:08:07.460,0:08:13.090\nmore sanely go through this output.\nBut here, looking at the system log,\n\n0:08:13.090,0:08:15.520\nI can look at this, and say:\n\n0:08:15.760,0:08:20.040\noh there's some service that is\nexiting with some abnormal code\n\n0:08:20.420,0:08:25.460\nand based on that information, I can go\nand try to figure out what's going on,\n\n0:08:25.510,0:08:27.500\nlike what's going wrong.\n\n0:08:29.020,0:08:32.000\nOne thing to know when you're\nworking with logs is that\n\n0:08:32.000,0:08:35.900\nmore traditionally, every software had their own\n\n0:08:35.920,0:08:42.540\nlog, but it has been increasingly more popular to have\na unified system log where everything is placed.\n\n0:08:43.010,0:08:49.299\nPretty much any application can log into the system\nlog, but instead of being in a plain text format,\n\n0:08:49.300,0:08:52.380\nit will be compressed in some special format.\n\n0:08:52.380,0:08:56.460\nAn example of this, it was what we covered\nin the data wrangling lecture.\n\n0:08:56.520,0:08:59.900\nIn the data wrangling lecture we\nwere using the \"journalctl\",\n\n0:09:00.200,0:09:04.280\nwhich is accessing the log and\noutputting all that output.\n\n0:09:04.340,0:09:07.380\nHere in Mac, now the command is \"log show\",\n\n0:09:07.380,0:09:10.020\nwhich will display a lot of information.\n\n0:09:10.100,0:09:15.760\nI'm gonna just display the last ten seconds,\nbecause logs are really, really verbose and\n\n0:09:17.060,0:09:23.720\njust displaying the last 10 seconds is still\ngonna output a fairly large amount of lines.\n\n0:09:23.900,0:09:28.240\nSo if we go back through what's going on,\n\n0:09:28.240,0:09:33.460\nwe here see that a lot of Apple things\nare going on, since 
this is a macbook.\n\n0:09:33.500,0:09:38.460\nMaybe we could find errors about\nlike some system issue here.\n\n0:09:39.280,0:09:46.920\nAgain they're fairly verbose, so you might want\nto practice your data wrangling techniques here,\n\n0:09:46.920,0:09:50.440\nlike 10 seconds equal to like 500\nlines of logs, so you can kind of\n\n0:09:50.960,0:09:54.960\nget an idea of how many lines\nper second you're getting.\n\n0:09:56.360,0:10:01.060\nThey're not only useful for figuring\nout some other programs' output,\n\n0:10:01.060,0:10:05.619\nthey're also useful for you, if you want to\nlog there instead of into your own file.\n\n0:10:05.779,0:10:11.319\nSo using the \"logger\" command,\nin both linux and mac,\n\n0:10:11.839,0:10:13.480\nYou can say okay\n\n0:10:13.480,0:10:18.880\nI'm gonna log this \"Hello Logs\"\ninto this system log.\n\n0:10:18.880,0:10:21.939\nWe execute the command and then\n\n0:10:22.760,0:10:27.640\nwe can check by going through\nthe last minute of logs,\n\n0:10:27.640,0:10:31.760\nsince it's gonna be fairly recent,\nand grepping for that \"Hello\"\n\n0:10:31.760,0:10:38.260\nwe find our entry. 
Fairly recent entry, that\nwe just created that said \"Hello Logs\".\n\n0:10:39.220,0:10:46.840\nAs you become more and more familiar with\nthese tools, you will find yourself using\n\n0:10:48.800,0:10:51.279\nthe logs more and more often, since\n\n0:10:51.529,0:10:56.349\neven if you have some bug that you haven't detected,\nand the program has been running for a while,\n\n0:10:56.349,0:11:02.240\nmaybe the information is already in the log and can\ntell you enough to figure out what is going on.\n\n0:11:02.800,0:11:08.260\nHowever, printf debugging is not everything.\nSo now I'm going to be covering debuggers.\n\n0:11:08.260,0:11:10.380\nBut first any questions on logs so far?\n\n0:11:11.720,0:11:15.040\nSo what kind of things can you\nfigure out from the logs?\n\n0:11:15.040,0:11:18.800\nlike this Hello Logs says that you did\nsomething with Hello at that time?\n\n0:11:18.940,0:11:25.040\nYeah, like say, for example, I can\nwrite a bash script that detects...\n\n0:11:25.060,0:11:29.480\nWell, that checks every time what\nWi-Fi network I'm connected to.\n\n0:11:29.480,0:11:34.150\nAnd every time it detects that it has changed,\nit makes an entry in the logs and says\n\n0:11:34.150,0:11:37.440\nOh now it looks like we have\nchanged Wi-Fi networks.\n\n0:11:37.440,0:11:41.400\nand then you might go back and parse\nthrough the logs and take like, okay\n\n0:11:41.510,0:11:47.559\nWhen did my computer change from one Wi-Fi network to\nanother. 
And this is just kind of a simple example\n\n0:11:47.560,0:11:50.260\nBut there are many, many ways,\n\n0:11:50.660,0:11:54.020\nmany types of information that\nyou could be logging here.\n\n0:11:54.020,0:11:59.040\nMore commonly, you will probably want to\ncheck if your computer, for example, is\n\n0:11:59.100,0:12:02.540\nentering sleep, for example,\nfor some unknown reason.\n\n0:12:02.680,0:12:04.660\nLike it's on hibernation mode.\n\n0:12:04.820,0:12:09.100\nThere's probably some information in the\nlogs about who asked that to happen,\n\n0:12:09.100,0:12:10.240\nor why it's that happening.\n\n0:12:11.720,0:12:14.880\nAny other questions? Okay.\n\n0:12:14.880,0:12:17.380\nSo when printf debugging is not enough,\n\n0:12:18.320,0:12:22.360\nthe best alternative after that is using...\n\n0:12:23.360,0:12:25.360\n[Exit that]\n\n0:12:28.480,0:12:30.260\nSo, it's using a debugger.\n\n0:12:30.580,0:12:37.620\nSo a debugger is a tool that will wrap around\nyour code and will let you run your code,\n\n0:12:38.120,0:12:40.480\nbut it will kind of keep control over it.\n\n0:12:40.480,0:12:42.500\nSo it will let you step\n\n0:12:42.500,0:12:47.080\nthrough the code and execute\nit and set breakpoints.\n\n0:12:47.080,0:12:50.020\nYou probably have seen debuggers\nin some way, if you have\n\n0:12:50.020,0:12:55.800\never used something like an IDE, because IDEs have this\nkind of fancy: set a breakpoint here, execute, ...\n\n0:12:56.080,0:12:59.040\nBut at the end of the day what\nthese tools are using is just\n\n0:12:59.040,0:13:04.740\nthese command line debuggers and they're just\npresenting them in a really fancy format.\n\n0:13:04.850,0:13:09.969\nHere we have a completely broken bubble\nsort, a simple sorting algorithm.\n\n0:13:10.000,0:13:11.560\nDon't worry about the details,\n\n0:13:11.560,0:13:14.980\nbut we just want to sort this\narray that we have here.\n\n0:13:17.360,0:13:19.460\nWe can try doing that by just doing\n\n0:13:21.340,0:13:23.340\nPython 
bubble.py\n\n0:13:23.500,0:13:28.360\nAnd when we do that... Oh there's some\nindex error, list index out of range.\n\n0:13:28.480,0:13:31.200\nWe could start adding prints\n\n0:13:31.200,0:13:33.740\nbut if we have a really long string,\nwe can get a lot of information.\n\n0:13:33.820,0:13:37.820\nSo how about we go up to the\nmoment that we crashed?\n\n0:13:37.900,0:13:41.020\nWe can go to that moment and examine what the\n\n0:13:41.020,0:13:43.360\ncurrent state of the program was.\n\n0:13:43.520,0:13:49.080\nSo for doing that I'm gonna run the\nprogram using the Python debugger.\n\n0:13:49.080,0:13:53.820\nHere I'm using technically the ipython debugger,\njust because it has nice syntax coloring\n\n0:13:54.060,0:13:59.140\nso it's probably easier for\nboth of us to understand\n\n0:13:59.300,0:14:01.300\nwhat's going on in the output.\n\n0:14:01.310,0:14:04.929\nBut they're pretty much identical anyway.\n\n0:14:05.140,0:14:09.400\nSo we execute this, and now we are given a prompt\n\n0:14:09.400,0:14:13.080\nwhere we're being told that we are here,\nat the very first line of our program.\n\n0:14:13.100,0:14:15.440\nAnd we can...\n\n0:14:15.980,0:14:20.380\n\"L\" stands for \"List\", so as\nwith many of these tools\n\n0:14:21.140,0:14:24.400\nthere's kind of like a language\nof operations that you can do,\n\n0:14:24.400,0:14:28.220\nand they are often mnemonic, as it\nwas the case with VIM or TMUX.\n\n0:14:28.860,0:14:32.940\nSo here, \"L\" is for \"Listing\" the code,\nand we can see the entire code.\n\n0:14:34.540,0:14:38.880\n\"S\" is for \"Step\" and will let us kind of one\n\n0:14:38.880,0:14:42.180\nline at a time, go through the execution.\n\n0:14:42.300,0:14:47.360\nThe thing is we're only triggering\nthe error some time later.\n\n0:14:47.360,0:14:48.710\nSo\n\n0:14:48.710,0:14:55.150\nwe can restart the program and instead of\ntrying to step until we get to the issue,\n\n0:14:55.150,0:15:00.820\nwe can just ask for the program to continue\nwhich is the \"C\" 
command and\n\n0:15:01.480,0:15:04.160\nhey, we reached the issue.\n\n0:15:04.640,0:15:08.080\nWe got to this line where everything crashed,\n\n0:15:08.080,0:15:11.020\nwe're getting this list index out of range.\n\n0:15:11.020,0:15:13.560\nAnd now that we are here we can say, huh?\n\n0:15:14.120,0:15:17.520\nOkay, first, let's print the value of the array.\n\n0:15:18.080,0:15:21.520\nThis is the value of the current array\n\n0:15:23.120,0:15:26.840\nSo we have six items. Okay. What\nis the value of \"J\" here?\n\n0:15:27.200,0:15:31.929\nSo we look at the value of \"J\". \"J\" is 5\nhere, which will be the last element, but\n\n0:15:32.480,0:15:37.119\n\"J\" plus 1 is going to be 6, so that's\ntriggering the out of bounds error.\n\n0:15:37.970,0:15:40.389\nSo what we have to do is\n\n0:15:40.660,0:15:47.660\nthis \"N\", instead of \"N\" has to be \"N minus one\".\nWe have identified that the error lies there.\n\n0:15:47.660,0:15:50.800\nSo we can quit, which is \"Q\".\n\n0:15:52.010,0:15:54.729\nAgain, because it's a post-mortem debugger.\n\n0:15:56.090,0:16:00.219\nWe go back to the code and say okay,\n\n0:16:02.860,0:16:06.180\nwe need to append this \"N minus one\".\n\n0:16:06.760,0:16:11.140\nThat will prevent the list index out of range and\n\n0:16:11.480,0:16:14.260\nif we run this again without the debugger,\n\n0:16:15.020,0:16:18.729\nokay, no errors now. But this\nis not our sorted list.\n\n0:16:18.729,0:16:21.200\nThis is sorted, but it's not our list.\n\n0:16:21.300,0:16:23.000\nWe are missing entries from our list,\n\n0:16:23.160,0:16:27.420\nso there is some behavioral issue\nthat we're reaching here.\n\n0:16:27.920,0:16:32.409\nAgain, we could start using printf\ndebugging but kind of a hunch now\n\n0:16:32.409,0:16:37.940\nis that probably the way we're swapping entries\nin the bubble sort program is wrong.\n\n0:16:38.480,0:16:45.920\nWe can use the debugger for this. 
We can go through\nthem to the moment we're doing a swap and\n\n0:16:46.120,0:16:48.320\ncheck how the swap is being performed.\n\n0:16:48.540,0:16:50.600\nSo a quick overview,\n\n0:16:50.600,0:16:56.590\nwe have two for loops and\nin the most nested loop,\n\n0:16:56.720,0:17:03.220\nwe are checking if the array is larger than the other array.\nThe thing is if we just try to execute until this line,\n\n0:17:03.589,0:17:06.609\nit's only going to trigger\nwhenever we make a swap.\n\n0:17:06.700,0:17:11.640\nSo what we can do is we can set\na breakpoint in the sixth line.\n\n0:17:11.820,0:17:15.520\nWe can create a breakpoint in this line and then\n\n0:17:15.580,0:17:20.820\nthe program will execute and the moment we try to swap\nvariables is when the program is going to stop.\n\n0:17:21.080,0:17:22.940\nSo we create a breakpoint there\n\n0:17:22.940,0:17:27.000\nand then we continue the execution\nof the program. The program halts\n\n0:17:27.000,0:17:30.520\nand says hey, I have executed\nand I have reached this line.\n\n0:17:30.820,0:17:31.860\nNow\n\n0:17:31.920,0:17:39.120\nI can use \"locals()\", which is a Python function\nthat returns a dictionary with all the values\n\n0:17:39.120,0:17:41.220\nto quickly see the entire context.\n\n0:17:43.100,0:17:48.140\nThe string, the array is fine and is\nsix, again, just the beginning and\n\n0:17:48.680,0:17:51.100\nI step, go to the next line.\n\n0:17:51.780,0:17:52.620\nOh,\n\n0:17:52.620,0:17:57.000\nand I identify the issue: I'm swapping one\nitem at a time, instead of simultaneously,\n\n0:17:57.020,0:18:01.840\nso that's what's triggering the fact that\nwe're losing variables as we go through.\n\n0:18:03.200,0:18:06.729\nThat's kind of a very simple example, but\n\n0:18:07.490,0:18:09.050\ndebuggers are really powerful.\n\n0:18:09.050,0:18:13.320\nMost programming languages will\ngive you some sort of debugger,\n\n0:18:13.540,0:18:19.920\nand when you go to more low level debugging\nyou might run into tools 
like...\n\n0:18:19.920,0:18:21.920\nYou might want to use something like\n\n0:18:25.340,0:18:27.340\nGDB.\n\n0:18:31.580,0:18:34.360\nAnd GDB has one nice property:\n\n0:18:34.460,0:18:37.740\nGDB works really well with C/C++\nand all these C-like languages.\n\n0:18:37.780,0:18:42.720\nBut GDB actually lets you work with pretty\nmuch any binary that you can execute.\n\n0:18:42.720,0:18:47.800\nSo for example here we have sleep, which is just\na program that's going to sleep for 20 seconds.\n\n0:18:48.520,0:18:55.340\nIt's loaded and then we can do run, and then we\ncan interrupt this sending an interrupt signal.\n\n0:18:55.340,0:19:02.020\nAnd GDB is displaying for us, here, very low-level\ninformation about what's going on in the program.\n\n0:19:02.030,0:19:06.820\nSo we're getting the stack trace, we're seeing\nwe are in this nanosleep function,\n\n0:19:07.060,0:19:11.660\nwe can see the values of all the hardware\nregisters in your machine. So\n\n0:19:12.300,0:19:17.160\nyou can get a lot of low-level\ndetail using these tools.\n\n0:19:18.560,0:19:22.520\nI think that's all I want to cover for debuggers.\n\n0:19:22.520,0:19:25.540\nAny questions related to that?\n\n0:19:33.520,0:19:39.040\nAnother interesting tool when you're trying to\ndebug is that sometimes you want to debug as if\n\n0:19:39.480,0:19:42.220\nyour program is a black box.\n\n0:19:42.220,0:19:46.059\nSo you, maybe, know what the internals\nof the program but at the same time\n\n0:19:46.430,0:19:52.119\nyour computer knows whenever your program\nis trying to do some operations.\n\n0:19:52.280,0:19:54.729\nSo this is in UNIX systems,\n\n0:19:54.760,0:19:58.060\nthere's this notion of like user\nlevel code and kernel level code.\n\n0:19:58.060,0:20:03.180\nAnd when you try to do some operations like reading\na file or like reading the network connection\n\n0:20:03.340,0:20:06.020\nyou will have to do something\ncalled system calls.\n\n0:20:06.180,0:20:12.560\nYou can get a program and go 
through\nthose operations and ask\n\n0:20:14.000,0:20:18.300\nwhat operations did this software do?\n\n0:20:18.300,0:20:20.920\nSo for example, if you have\nlike a Python function\n\n0:20:20.980,0:20:26.660\nthat is only supposed to do a mathematical operation\nand you run it through this program,\n\n0:20:26.660,0:20:28.460\nand it's actually reading files,\n\n0:20:28.460,0:20:31.940\nWhy is it reading files? It shouldn't\nbe reading files. So, let's see.\n\n0:20:34.520,0:20:37.200\nThis is \"strace\".\n\n0:20:37.200,0:20:38.740\nSo for example, we can do it something like this.\n\n0:20:38.740,0:20:41.260\nSo here we're gonna run the \"LS - L\"\n\n0:20:42.220,0:20:47.900\nAnd then we're ignoring the output of LS, but\nwe are not ignoring the output of STRACE.\n\n0:20:47.900,0:20:49.740\nSo if we execute that...\n\n0:20:52.300,0:20:54.720\nWe're gonna get a lot of output.\n\n0:20:54.920,0:20:58.740\nThis is all the different system calls\n\n0:21:00.520,0:21:02.080\nThat this\n\n0:21:02.090,0:21:07.510\nLS has executed. You will see a bunch\nof OPEN, you will see FSTAT.\n\n0:21:08.150,0:21:14.170\nAnd for example, since it has to list all the properties\nof the files that are in this folder, we can\n\n0:21:15.110,0:21:20.410\ncheck for the LSTAT call. 
So the LSTAT call will\ncheck for the properties of the files and\n\n0:21:21.020,0:21:27.420\nwe can see that, effectively, all the files\nand folders that are in this directory\n\n0:21:27.700,0:21:31.540\nhave been accessed through\na system call, through LS.\n\n0:21:34.120,0:21:43.400\nInterestingly, sometimes you actually\ndon't need to run your code to\n\n0:21:44.360,0:21:47.000\nfigure out that there is something\nwrong with your code.\n\n0:21:47.960,0:21:52.449\nSo far we have seen enough ways of identifying\nissues by running the code,\n\n0:21:52.450,0:21:54.410\nbut what if you...\n\n0:21:54.410,0:21:58.980\nyou can look at a piece of code like this, like\nthe one I have shown right now in this screen,\n\n0:21:58.980,0:22:00.560\nand identify an issue.\n\n0:22:00.560,0:22:02.030\nSo for example here,\n\n0:22:02.030,0:22:06.670\nwe have some really silly piece of code. It\ndefines a function, prints a few variables,\n\n0:22:07.720,0:22:11.780\nmultiplies some variables, it sleeps for\na while and then we try to print BAZ.\n\n0:22:12.020,0:22:14.840\nAnd you could try to look at\nthis and say, hey, BAZ has\n\n0:22:15.500,0:22:20.650\nnever been defined anywhere. This is a new\nvariable. You probably meant to say BAR\n\n0:22:20.650,0:22:22.540\nbut you just mistyped it.\n\n0:22:22.540,0:22:26.480\nThing is, if we try to run this program,\n\n0:22:28.820,0:22:36.820\nit's gonna take 60 seconds, because like we have to wait until\nthis time.sleep function finishes. Here, sleep is just for\n\n0:22:37.790,0:22:42.070\nmotivating the example but in general you may\nbe loading a data set that takes really long\n\n0:22:42.140,0:22:44.740\nbecause you have to copy everything into memory.\n\n0:22:44.740,0:22:48.780\nAnd the thing is, there are programs\nthat will take source code as input,\n\n0:22:49.340,0:22:54.940\nwill process it and will say, oh probably this is\nwrong about this piece of code. 
So in Python,\n\n0:22:55.760,0:23:00.600\nor in general, these are called\nstatic analysis tools.\n\n0:23:00.780,0:23:02.860\nIn Python we have for example pyflakes.\n\n0:23:02.860,0:23:06.640\nIf we get this piece of code\nand run it through pyflakes,\n\n0:23:06.860,0:23:09.820\npyflakes is gonna give us a couple of issues.\n\n0:23:10.040,0:23:15.700\nFirst one is the one.... The second one is the one\nwe identified: here's an undefined name called BAZ.\n\n0:23:15.700,0:23:17.760\nYou probably should be doing\nsomething about that.\n\n0:23:17.760,0:23:22.720\nAnd the other one is like\noh, you're redefining the\n\n0:23:23.060,0:23:27.240\nthe FOO variable name in that line.\n\n0:23:27.540,0:23:31.400\nSo here we have a FOO function\nand then we are kind of\n\n0:23:31.400,0:23:34.620\nshadowing that function by\nusing a loop variable here.\n\n0:23:34.760,0:23:38.460\nSo now that FOO function that we\ndefined is not accessible anymore\n\n0:23:38.470,0:23:41.650\nand then if we try to call it afterwards,\nwe will get into errors.\n\n0:23:43.520,0:23:45.520\nThere are other types of\n\n0:23:46.250,0:23:53.170\nStatic Analysis tools. MYPY is a different one. MYPY\nis gonna report the same two errors, but it's also\n\n0:23:53.840,0:24:00.160\ngoing to complain about type checking. So it's gonna\nsay, oh here you're multiplying an int by a float and\n\n0:24:00.680,0:24:06.320\nif you care about the type checking of your\ncode, you should not be mixing those up.\n\n0:24:07.490,0:24:12.219\nit can be kind of inconvenient, having to run\nthis, look at the line, going back to your\n\n0:24:12.800,0:24:17.409\nVIM or like your editor, and figuring\nout what the error matches to.\n\n0:24:18.380,0:24:21.190\nThere are already solutions for that. 
One\n\n0:24:22.340,0:24:27.069\nway is that you can integrate most\neditors with these tools and here..\n\n0:24:28.279,0:24:34.059\nYou can see there is like some red highlighting on\nthe bash, and it will read the last line here.\n\n0:24:34.059,0:24:36.059\nSo, undefined named 'baz'.\n\n0:24:36.160,0:24:39.080\nSo as I'm editing this piece of Python code,\n\n0:24:39.080,0:24:43.360\nmy editor is gonna give me feedback\nabout what's going wrong with this.\n\n0:24:43.560,0:24:48.480\nOr like here have another one saying\nthe redefinition of unused foo.\n\n0:24:49.849,0:24:51.849\nAnd\n\n0:24:53.080,0:24:56.060\neven, there are some stylistic complaints.\n\n0:24:56.060,0:24:58.060\nSo, oh, I will expect two empty lines.\n\n0:24:58.120,0:25:03.660\nSo like in Python, you should be having two\nempty lines between a function definition.\n\n0:25:05.779,0:25:07.009\nThere are...\n\n0:25:07.009,0:25:09.280\nthere is a resource on the lecture notes\n\n0:25:09.280,0:25:13.160\nabout pretty much static analyzers for a\nlot of different programming languages.\n\n0:25:13.700,0:25:18.460\nThere are even static analyzers for English.\n\n0:25:18.840,0:25:24.260\nSo I have my notes\n\n0:25:24.580,0:25:30.280\nfor the class here, and if I run it through this\nstatic analyzer for English, that is \"writegood\".\n\n0:25:30.409,0:25:33.008\nIt's going to complain about\nsome stylistic properties.\n\n0:25:33.009,0:25:33.489\nSo like, oh,\n\n0:25:33.489,0:25:37.460\nI'm using \"very\", which is a weasel\nword and I shouldn't be using it.\n\n0:25:37.480,0:25:43.080\nOr \"quickly\" can weaken meaning, and you can have\nthis for spell checking, or for a lot of different\n\n0:25:43.600,0:25:48.000\ntypes of stylistic analysis.\n\n0:25:48.760,0:25:52.020\nAny questions so far?\n\n0:25:57.500,0:25:59.490\nOh,\n\n0:25:59.490,0:26:01.490\nI forgot to mention...\n\n0:26:01.640,0:26:07.320\nDepending on the task that you're performing,\nthere will be different types of 
debuggers.\n\n0:26:07.320,0:26:09.740\nFor example, if you're doing web development,\n\n0:26:09.860,0:26:13.520\nboth Firefox and Chrome\n\n0:26:13.740,0:26:20.600\nhave a really really good set of tools\nfor doing debugging for websites.\n\n0:26:20.600,0:26:23.880\nSo here we go and say inspect element,\n\n0:26:23.880,0:26:25.880\nwe can get the... do you know?\nhow to make this larger...\n\n0:26:27.660,0:26:29.220\nWe're getting\n\n0:26:29.220,0:26:33.380\nthe entire source code for\nthe web page for the class.\n\n0:26:35.549,0:26:37.549\nOh, yeah, here we go.\n\n0:26:38.640,0:26:40.640\nIs that better?\n\n0:26:40.799,0:26:47.149\nAnd we can actually go and change properties about\nthe course. So we can say... we can edit the title.\n\n0:26:47.400,0:26:51.280\nSay, this is not a class on\ndebugging and profiling.\n\n0:26:51.620,0:26:53.940\nAnd now the code for the website has changed.\n\n0:26:54.120,0:26:56.000\nThis is one of the reasons\nwhy you should never trust\n\n0:26:56.200,0:27:00.560\nany screenshots of websites, because\nthey can be completely modified.\n\n0:27:01.320,0:27:05.030\nAnd you can also modify this style.\nLike, here I have things\n\n0:27:06.120,0:27:07.559\nusing the\n\n0:27:07.560,0:27:09.500\nthe dark mode preference,\n\n0:27:09.680,0:27:11.900\nbut we can alter that.\n\n0:27:11.900,0:27:16.560\nBecause at the end of the day, the\nbrowser is rendering this for us.\n\n0:27:17.840,0:27:21.780\nWe can check the cookies, but there's\nlike a lot of different operations.\n\n0:27:21.799,0:27:27.619\nThere's also a built-in debugger for JavaScript,\nso you can step through JavaScript code.\n\n0:27:27.620,0:27:34.020\nSo kind of the takeaway is, depending on what you are\ndoing, you will probably want to search for what tools\n\n0:27:34.320,0:27:36.820\nprogrammers have built for them.\n\n0:27:44.880,0:27:47.630\nNow I'm gonna switch gears and\n\n0:27:48.200,0:27:51.800\nstop talking about debugging, which is kind\nof finding issues with the code, 
right?\n\n0:27:51.800,0:27:54.200\nkind of more about the behavior,\nand then start talking\n\n0:27:54.200,0:27:56.860\nabout like how you can use profiling.\n\n0:27:56.860,0:27:59.240\nAnd profiling is how to optimize the code.\n\n0:28:01.100,0:28:05.940\nIt might be because you want to optimize\nthe CPU, the memory, the network, ...\n\n0:28:06.330,0:28:09.889\nThere are many different reasons that\nyou want to be optimizing it.\n\n0:28:10.440,0:28:14.000\nAs it was the case with debugging,\nthe kind of first-order approach\n\n0:28:14.000,0:28:16.680\nthat a lot of people have\nexperience with already is\n\n0:28:16.880,0:28:21.880\noh, let's use just printf profiling,\nso to say, like we can just take...\n\n0:28:22.770,0:28:25.610\nLet me make this larger. We can\n\n0:28:26.130,0:28:28.110\ntake the current time here,\n\n0:28:28.110,0:28:34.610\nthen we can check, we can do some execution\nand then we can take the time again and\n\n0:28:35.060,0:28:37.320\nsubtract it from the original time.\n\n0:28:37.320,0:28:39.320\nAnd by doing this you can kind of narrow down\n\n0:28:39.540,0:28:46.040\nand fence some different parts of your code and try to figure\nout what is the time taken between those two parts.\n\n0:28:47.040,0:28:52.639\nAnd that's good. But sometimes it can be interesting,\nthe results. So here, we're sleeping for\n\n0:28:53.730,0:28:59.809\n0.5 seconds and the output is saying,\noh it's 0.5 plus some extra time,\n\n0:28:59.810,0:29:05.929\nwhich is kind of interesting. 
And if we keep running it,\nwe see there's like some small error and the thing is\n\n0:29:06.240,0:29:11.680\nhere, what we're actually measuring is what\nis usually referred to as the \"real time\".\n\n0:29:12.060,0:29:14.340\nReal time is as if you get\n\n0:29:14.340,0:29:15.930\nlike a\n\n0:29:15.930,0:29:19.249\nclock, and you start it when your program starts,\nand you stop it when your program ends.\n\n0:29:19.500,0:29:23.060\nBut the thing is, in your computer it is\nnot only your program that is running.\n\n0:29:23.060,0:29:27.460\nThere are many other programs running\nat the same time and those might\n\n0:29:27.760,0:29:34.640\nbe the ones that are taking the CPU.\nSo, to try to make sense of that,\n\n0:29:35.790,0:29:39.259\nA lot of... you'll see a lot of programs\n\n0:29:40.620,0:29:43.250\nusing the terminology that is\n\n0:29:44.100,0:29:46.760\nreal time, user time and system time.\n\n0:29:46.760,0:29:51.460\nReal time is what I explained, which is kind of\nthe entire length of time from start to finish.\n\n0:29:51.840,0:29:59.780\nThen there is the user time, which is the amount of time\nyour program spent on the CPU doing user level cycles.\n\n0:29:59.780,0:30:06.100\nSo as I was mentioning, in UNIX, you can be running\nuser level code or kernel level code.\n\n0:30:06.920,0:30:12.940\nSystem is kind of the opposite, it's the amount of CPU, like\nthe amount of time that your program spent on the CPU\n\n0:30:13.500,0:30:18.480\nexecuting kernel mode instructions.\nSo let's show this with an example.\n\n0:30:18.620,0:30:22.180\nHere I'm going to \"time\", which is a command,\n\n0:30:22.460,0:30:27.840\na shell command that's gonna get these three metrics\nfor the following command, and then I'm just\n\n0:30:28.100,0:30:30.560\ngrabbing a URL from\n\n0:30:31.160,0:30:36.760\na website that is hosted in Spain. 
So that's gonna take\nsome extra time to go over there and then go back.\n\n0:30:37.410,0:30:39.499\nIf we see, here, if we were to just...\n\n0:30:39.780,0:30:43.670\nWe have two prints, between the beginning\nand the end of the program.\n\n0:30:43.670,0:30:49.039\nWe could think that this program is taking like\n600 milliseconds to execute, but actually\n\n0:30:49.500,0:30:56.930\nmost of that time was spent just waiting for the\nresponse on the other side of the network and\n\n0:30:57.330,0:31:04.880\nwe actually only spent 16 milliseconds at the user level\nand like 9 seconds, in total 25 milliseconds, actually\n\n0:31:05.280,0:31:08.149\nexecuting CURL code. Everything\nelse was just waiting.\n\n0:31:12.090,0:31:14.480\nAny questions related to timing?\n\n0:31:19.860,0:31:21.860\nOk, so\n\n0:31:21.990,0:31:23.580\ntiming can be\n\n0:31:23.580,0:31:29.480\ncan become tricky, it's also kind of a black box solution.\nOr if you start adding print statements,\n\n0:31:29.660,0:31:35.860\nit's kind of hard to add print statements, with time everywhere.\nSo programmers have figured out better tools.\n\n0:31:36.140,0:31:38.700\nThese are usually referred to as \"profilers\".\n\n0:31:39.980,0:31:44.260\nOne quick note that I'm gonna make, is that\n\n0:31:44.720,0:31:46.720\nprofilers, like usually when people\n\n0:31:46.800,0:31:48.800\nrefer to profilers they usually talk about\n\n0:31:49.050,0:31:55.190\nCPU profilers because they are the most common, at identifying\nwhere like time is being spent on the CPU.\n\n0:31:56.790,0:31:59.180\nProfilers usually come in kind of two flavors:\n\n0:31:59.180,0:32:02.140\nthere's tracing profilers and sampling profilers.\n\n0:32:02.140,0:32:06.380\nand it's kind of good to know the difference\nbecause the output might be different.\n\n0:32:07.640,0:32:10.300\nTracing profilers kind of instrument your code.\n\n0:32:10.680,0:32:15.799\nSo they kind of execute with your code and every\ntime your code enters a function 
call,\n\n0:32:15.800,0:32:20.479\nthey kind of take a note of it. It's like, oh we're entering\nthis function call at this moment in time and\n\n0:32:21.860,0:32:24.860\nthey keep going and, once they\nfinish, they can report\n\n0:32:24.860,0:32:28.300\noh, you spent this much time executing\nin this function and\n\n0:32:28.580,0:32:33.760\nthis much time in this other function. So on, so forth,\nwhich is the example that we're gonna see now.\n\n0:32:34.590,0:32:38.329\nAnother type of tools are tracing,\nsorry, sampling profilers.\n\n0:32:38.430,0:32:44.840\nThe issue with tracing profilers is they add a lot of overhead.\nLike you might be running your code and having these kind of\n\n0:32:46.280,0:32:49.400\nprofiling next to you making all these counts,\n\n0:32:49.400,0:32:54.340\nwill hinder the performance of your program, so\nyou might get counts that are slightly off.\n\n0:32:55.380,0:32:59.450\nA sampling profiler, what it's gonna do\nis gonna execute your program and every\n\n0:32:59.940,0:33:05.239\n100 milliseconds, 10 milliseconds, like some defined period,\nit's gonna stop your program. It's gonna halt it,\n\n0:33:05.580,0:33:12.379\nit's gonna look at the stack trace and say, oh, you're\nright now in this point in the hierarchy, and\n\n0:33:12.630,0:33:15.530\nidentify which function is gonna\nbe executing at that point.\n\n0:33:16.260,0:33:19.760\nThe idea is that as long as you\nexecute this for long enough,\n\n0:33:19.760,0:33:24.290\nyou're gonna get enough statistics to know\nwhere most of the time is being spent.\n\n0:33:25.800,0:33:28.800\nSo, let's see an example of a tracing profiling.\n\n0:33:28.800,0:33:32.340\nSo here we have a piece of\ncode that is just like a\n\n0:33:33.480,0:33:35.540\nreally simple re-implementation of grep\n\n0:33:36.330,0:33:38.330\ndone in Python.\n\n0:33:38.400,0:33:44.030\nWhat we want to check is what is the bottleneck of this\nprogram? 
Like we're just opening a bunch of files,\n\n0:33:44.900,0:33:49.620\ntrying to match this pattern, and then\nprinting whenever we find a match.\n\n0:33:49.620,0:33:52.340\nAnd maybe it's the regex, maybe it's the print...\n\n0:33:52.460,0:33:53.940\nWe don't really know.\n\n0:33:53.940,0:33:59.040\nSo to do this in Python, we have the \"cProfile\".\n\n0:33:59.040,0:34:00.080\nAnd\n\n0:34:00.990,0:34:06.620\nhere I'm just calling this module and saying I want\nto sort this by the total amount of time, that\n\n0:34:06.780,0:34:13.429\nwe're gonna see briefly. I'm calling the\nprogram we just saw in the editor.\n\n0:34:13.429,0:34:18.679\nI'm gonna execute this a thousand times\nand then I want to match (the grep\n\n0:34:18.960,0:34:21.770\nArguments here) is I want to match these regex\n\n0:34:22.919,0:34:27.469\nto all the Python files in here.\nAnd this is gonna output some...\n\n0:34:30.780,0:34:34.369\nThis is gonna produce some output,\nthen we're gonna look at it. First,\n\n0:34:34.369,0:34:38.539\nis all the output from the greps,\nbut at the very end, we're getting\n\n0:34:39.119,0:34:42.979\noutput from the profiler itself. If we go up\n\n0:34:44.129,0:34:46.939\nwe can see that, hey,\n\n0:34:47.730,0:34:55.250\nby sorting we can see that the total number of calls. So we\ndid 8000 calls, because we executed this 1000 times and\n\n0:34:57.360,0:35:03.440\nthis is the total amount of time we spent in this function\n(cumulative time). 
And here we can start to identify\n\n0:35:03.920,0:35:06.040\nwhere the bottleneck is.\n\n0:35:06.050,0:35:11.449\nSo here, this built-in method IO open, is saying that\nwe're spending a lot of the time just waiting for\n\n0:35:12.080,0:35:14.340\nreading from the disk or...\n\n0:35:14.340,0:35:15.680\nThere, we can check, hey,\n\n0:35:15.680,0:35:19.840\na lot of time is also being spent\ntrying to match the regex.\n\n0:35:19.840,0:35:22.640\nWhich is something that you will expect.\n\n0:35:22.640,0:35:26.220\nOne of the caveats of using this\n\n0:35:26.480,0:35:29.540\ntracing profiler is that, as you can see, here\n\n0:35:29.540,0:35:35.239\nwe're seeing our function but we're also seeing\na lot of functions that correspond to built-ins.\n\n0:35:35.240,0:35:35.910\nSo like,\n\n0:35:35.910,0:35:41.899\nfunctions that are third party functions from the libraries.\nAnd as you start building more and more complex code,\n\n0:35:41.900,0:35:43.560\nThis is gonna be much harder.\n\n0:35:44.200,0:35:44.760\nSo\n\n0:35:46.080,0:35:49.720\nhere is another piece of Python code that,\n\n0:35:51.540,0:35:53.779\ndon't read through it, what it's doing is just\n\n0:35:54.420,0:35:57.589\ngrabbing the course website and\nthen it's printing all the...\n\n0:35:58.440,0:36:01.960\nIt's parsing it, and then it's printing\nall the hyperlinks that it has found.\n\n0:36:01.960,0:36:03.520\nSo there are like these two operations:\n\n0:36:03.520,0:36:07.800\ngoing there, grabbing a website, and\nthen parsing it, printing the links.\n\n0:36:07.800,0:36:09.740\nAnd we might want to get a sense of\n\n0:36:09.740,0:36:16.180\nhow those two operations compare to each\nother. 
If we just try to execute the\n\n0:36:16.680,0:36:18.680\ncProfiler here and\n\n0:36:19.260,0:36:24.949\nwe're gonna do the same, this is not gonna print anything.\nI'm using a tool we haven't seen so far,\n\n0:36:24.950,0:36:25.700\nbut I think it's pretty nice.\n\n0:36:25.700,0:36:32.810\nIt's \"TAC\", which is the opposite of \"CAT\", and it is going\nto reverse the output so I don't have to go up and look.\n\n0:36:33.430,0:36:35.430\nSo we do this and...\n\n0:36:36.250,0:36:39.179\nHey, we get some interesting output.\n\n0:36:39.880,0:36:46.200\nwe're spending a bunch of time in this built-in method\nsocket_getaddr_info and like in _imp_create_dynamic and\n\n0:36:46.510,0:36:48.540\nmethod_connect and posix_stat...\n\n0:36:49.210,0:36:55.740\nnothing in my code is directly calling these functions so I\ndon't really know what is the split between the operation of\n\n0:36:56.349,0:37:03.929\nmaking a web request and parsing the output of\nthat web request. So, for that, we can use\n\n0:37:04.900,0:37:07.920\na different type of profiler which is\n\n0:37:09.819,0:37:14.309\na line profiler. 
And the line profiler is\njust going to present the same results\n\n0:37:14.310,0:37:20.879\nbut in a more human-readable way, which is just, for this\nline of code, this is the amount of time things took.\n\n0:37:24.819,0:37:31.079\nSo it knows it has to do that, we have to add a\ndecorator to the Python function, we do that.\n\n0:37:34.869,0:37:36.869\nAnd as we do that,\n\n0:37:37.119,0:37:39.749\nwe now get slightly cropped output,\n\n0:37:39.750,0:37:46.169\nbut the main idea, we can look at the percentage of time and\nwe can see that making this request, get operation, took\n\n0:37:46.450,0:37:52.829\n88% of the time, whereas parsing the\nresponse took only 10.9% of the time.\n\n0:37:54.069,0:38:00.869\nThis can be really informative and a lot of different programming\nlanguages will support this type of a line profiling.\n\n0:38:04.569,0:38:07.439\nSometimes, you might not care about CPU.\n\n0:38:07.440,0:38:15.000\nMaybe you care about the memory or like some other resource.\nSimilarly, there are memory profilers: in Python\n\n0:38:15.000,0:38:21.599\nthere is \"memory_profiler\", for C you will have\n\"Valgrind\". So here is a fairly simple example,\n\n0:38:21.760,0:38:28.530\nwe just create this list with a million elements. 
That's\ngoing to consume like megabytes of space and\n\n0:38:29.200,0:38:33.920\nwe do the same, creating another\none with 20 million elements.\n\n0:38:34.860,0:38:38.180\nTo check, what was the memory allocation?\n\n0:38:38.980,0:38:44.369\nHow it's gonna happen, what's the consumption?\nWe can go through one memory profiler and\n\n0:38:44.950,0:38:46.619\nwe execute it,\n\n0:38:46.620,0:38:51.380\nand it's telling us the total memory\nusage and the increments.\n\n0:38:51.380,0:38:57.980\nAnd we can see that we have some overhead, because\nthis is an interpreted language and when we create\n\n0:38:58.450,0:39:00.599\nthis million,\n\n0:39:03.520,0:39:07.340\nthis list with a million entries, we're gonna\nneed this many megabytes of information.\n\n0:39:07.660,0:39:15.299\nThen we were getting another 150 megabytes. Then, we're freeing\nthis entry and that's decreasing the total amount.\n\n0:39:15.299,0:39:19.169\nWe are not getting a negative increment because\nof a bug, probably in the profiler.\n\n0:39:19.509,0:39:26.549\nBut if you know that your program is taking a huge amount of\nmemory and you don't know why, maybe because you're copying\n\n0:39:26.920,0:39:30.269\nobjects where you should be\ndoing things in place, then\n\n0:39:31.140,0:39:33.320\nusing a memory profiler can be really useful.\n\n0:39:33.320,0:39:37.780\nAnd in fact there's an exercise that will\nkind of walk you through that, comparing\n\n0:39:37.980,0:39:39.980\nan in-place version of quicksort with like a\n\n0:39:40.059,0:39:44.008\nnon-inplace, that keeps making new and new copies.\nAnd if you use the memory profiler\n\n0:39:44.009,0:39:47.909\nyou can get a really good comparison\nbetween the two of them\n\n0:39:51.069,0:39:53.459\nAny questions so far, with profiling?\n\n0:39:53.460,0:39:57.940\nIs the memory profiler running the\nprogram in order to get that?\n\n0:39:58.140,0:40:03.180\nYeah... 
you might be able to figure\nout like just looking at the code.\n\n0:40:03.180,0:40:05.759\nBut as you get more and more complex\n(for this code at least)\n\n0:40:06.009,0:40:10.738\nBut you get more and more complex programs what\nthis is doing is running through the program\n\n0:40:10.739,0:40:16.739\nand for every line, at the very beginning,\nit's looking at the heap and saying\n\n0:40:16.739,0:40:19.319\n\"What are the objects that I have allocated now?\"\n\n0:40:19.319,0:40:22.979\n\"I have seven megabytes of objects\",\nand then goes to the next line,\n\n0:40:23.190,0:40:27.869\nlooks again, \"Oh now I have 50,\nso I have now added 43 there\".\n\n0:40:28.839,0:40:34.709\nAgain, you could do this yourself by asking for those\noperations in your code, every single line.\n\n0:40:34.920,0:40:39.899\nBut that's not how you should be doing things since people\nhave already written these tools for you to use.\n\n0:40:43.089,0:40:46.078\nAs it was the case with...\n\n0:40:51.480,0:40:58.220\nSo as in the case with strace, you can\ndo something similar in profiling.\n\n0:40:58.340,0:41:03.380\nYou might not care about the specific\nlines of code that you have,\n\n0:41:03.440,0:41:08.200\nbut maybe you want to check for outside events.\nLike, you maybe want to check how many\n\n0:41:09.410,0:41:14.469\nCPU cycles your computer program is using,\nor how many page faults it's creating.\n\n0:41:14.469,0:41:19.239\nMaybe you have like bad cache locality\nand that's being manifested somehow.\n\n0:41:19.340,0:41:22.960\nSo for that, there is the \"perf\" command.\n\n0:41:22.960,0:41:27.220\nThe perf command is gonna do this, where it\nis gonna run your program and it's gonna\n\n0:41:28.720,0:41:33.360\nkeep track of all these statistics and report them back\nto you. And this can be really helpful if you are\n\n0:41:33.680,0:41:36.060\nworking at a lower level. 
So\n\n0:41:37.300,0:41:42.840\nwe execute this command, I'm gonna\nexplain briefly what it's doing.\n\n0:41:48.650,0:41:51.639\nAnd this stress program is just\n\n0:41:52.219,0:41:54.698\nrunning in the CPU, and it's\njust a program to just\n\n0:41:54.829,0:41:59.528\nhog one CPU and like test that you can\nhog the CPU. And now if we Ctrl-C,\n\n0:42:00.619,0:42:02.708\nwe can go back and\n\n0:42:03.410,0:42:08.559\nwe get some information about the number of\npage faults that we have or the number of\n\n0:42:09.769,0:42:11.769\nCPU cycles that we utilize, and other\n\n0:42:12.469,0:42:14.329\nuseful\n\n0:42:14.329,0:42:18.968\nmetrics from our code. For some programs you can\n\n0:42:21.469,0:42:25.089\nlook at what the functions\nthat were being used were.\n\n0:42:26.120,0:42:30.140\nSo we can record what this program is doing,\n\n0:42:30.940,0:42:34.920\nwhich we don't know about because it's\na program someone else has written.\n\n0:42:35.240,0:42:37.240\nAnd\n\n0:42:38.180,0:42:42.279\nwe can report what it was doing by looking\nat the stack trace and we can say oh,\n\n0:42:42.279,0:42:44.279\nIt's spending a bunch of time in this\n\n0:42:44.660,0:42:46.640\n__random_r\n\n0:42:46.640,0:42:53.229\nstandard library function. 
And it's mainly because the way of hogging\na CPU is by just creating more and more pseudo-random numbers.\n\n0:42:53.779,0:42:55.779\nThere are some other\n\n0:42:55.819,0:42:58.149\nfunctions that have not been mapped, because they\n\n0:42:58.369,0:43:01.448\nbelong to the program, but if\nyou know about your program\n\n0:43:01.448,0:43:05.140\nyou can display this information\nusing more flags for perf.\n\n0:43:05.140,0:43:10.220\nThere are really good tutorials online\nabout how to use this tool.\n\n0:43:12.010,0:43:14.010\nOh\n\n0:43:14.119,0:43:17.349\nOne more thing regarding\nprofilers is, so far,\n\n0:43:17.350,0:43:20.109\nwe have seen that these profilers\nare really good at\n\n0:43:20.510,0:43:25.419\naggregating all this information and giving\nyou a lot of these numbers so you can\n\n0:43:25.790,0:43:29.739\noptimize your code or you can reason\nabout what is happening, but\n\n0:43:30.560,0:43:31.550\nthe thing is\n\n0:43:31.550,0:43:35.949\nhumans are not really good at making\nsense of lots of numbers and since\n\n0:43:36.080,0:43:39.249\nhumans are more visual creatures, it's much\n\n0:43:39.920,0:43:42.980\neasier to kind of have some\nsort of visualization.\n\n0:43:42.980,0:43:48.700\nAgain, programmers have already thought about\nthis and have come up with solutions.\n\n0:43:49.480,0:43:56.160\nOne of the popular ones is a\nFlameGraph. A FlameGraph is a\n\n0:43:56.780,0:44:00.160\nsampling profiler. So this is just running\nyour code and taking samples\n\n0:44:00.160,0:44:03.280\nAnd then on the y-axis here\n\n0:44:03.280,0:44:10.980\nwe have the depth of the stack so we know that the bash function\ncalled this other function, and this called this other function,\n\n0:44:11.260,0:44:14.480\nso on, so forth. And on the x-axis it's\n\n0:44:14.630,0:44:17.500\nnot time, it's not the timestamps.\n\n0:44:17.500,0:44:23.290\nLike it's not this function ran before, but it's just time\ntaken. 
Because, again, this is a sampling profiler:\n\n0:44:23.290,0:44:28.540\nwe're just getting small glimpses of what was going\non in the program. But we know that, for example,\n\n0:44:29.119,0:44:32.949\nthis main program took the most time because the\n\n0:44:33.530,0:44:35.530\nx-axis is proportional to that.\n\n0:44:36.020,0:44:43.090\nThey are interactive and they can be really useful\nto identify the hot spots in your program.\n\n0:44:44.720,0:44:50.540\nAnother way of displaying information, and there is also\nan exercise on how to do this, is using a call graph.\n\n0:44:50.720,0:44:58.320\nSo a call graph is going to be displaying information, and it's gonna\ncreate a graph of which function called which other function.\n\n0:44:58.620,0:45:00.940\nAnd then you get information about, like,\n\n0:45:00.940,0:45:05.770\noh, we know that \"__main__\" called this\n\"Person\" function ten times and\n\n0:45:06.050,0:45:08.919\nit took this much time. And as you have\n\n0:45:09.080,0:45:13.029\nlarger and larger programs, looking at one of\nthese call graphs can be useful to identify\n\n0:45:14.270,0:45:19.689\nwhat piece of your code is calling this really\nexpensive IO operation, for example.\n\n0:45:24.560,0:45:30.360\nWith that I'm gonna cover the last\npart of the lecture, which is that\n\n0:45:30.360,0:45:36.600\nsometimes, you might not even know what exact\nresource is constrained in your program.\n\n0:45:36.619,0:45:39.019\nLike how do I know how much CPU\n\n0:45:39.380,0:45:44.060\nmy program is using, and I can quickly\nlook in there, or how much memory.\n\n0:45:44.060,0:45:46.680\nSo there are a bunch of really\n\n0:45:46.700,0:45:49.760\nnifty tools for doing that. One of them is\n\n0:45:50.400,0:45:53.270\nHTOP. So HTOP is an\n\n0:45:54.000,0:45:59.810\ninteractive command-line tool and here it's\ndisplaying all the CPUs this machine has,\n\n0:46:00.160,0:46:07.740\nwhich is 12. 
It's displaying the amount of memory, it says I'm\nconsuming almost a gigabyte of the 32 gigabytes my machine has.\n\n0:46:07.740,0:46:11.660\nAnd then I'm getting all the different processes.\n\n0:46:11.730,0:46:13.290\nSo for example we have\n\n0:46:13.290,0:46:20.300\nzsh, mysql and other processes that are running in this\nmachine, and I can sort through the amount of CPU\n\n0:46:20.300,0:46:24.379\nthey're consuming or through the\npriority they're running at.\n\n0:46:25.980,0:46:28.129\nWe can check this, for example. Here\n\n0:46:28.130,0:46:30.230\nwe have the stress command again\n\n0:46:30.230,0:46:31.470\nand we're going to\n\n0:46:31.470,0:46:37.040\nrun it to take over four CPUs and check\nthat we can see that in HTOP.\n\n0:46:37.040,0:46:42.880\nSo we did spot those four CPU\njobs, and now I have seen that\n\n0:46:43.710,0:46:46.429\nbesides the ones we had before,\nnow I have this...\n\n0:46:50.310,0:46:56.119\nLike this \"stress -c\" command running\nand taking a bunch of our CPU.\n\n0:46:56.849,0:47:03.169\nEven though you could use a profiler to get similar information to\nthis, the way HTOP displays this kind of in a live interactive\n\n0:47:03.329,0:47:07.099\nfashion can be much quicker\nand much easier to parse.\n\n0:47:07.890,0:47:09.890\nIn the notes, there's a\n\n0:47:10.160,0:47:15.180\nreally long list of different tools for evaluating\ndifferent parts of your system.\n\n0:47:15.180,0:47:17.180\nSo that might be tools for analyzing the\n\n0:47:17.180,0:47:19.720\nnetwork performance, about looking the\n\n0:47:20.430,0:47:24.530\nnumber of IO operations, so you know\nwhether you're saturating the\n\n0:47:26.040,0:47:28.040\nthe reads from your disks,\n\n0:47:28.829,0:47:31.429\nyou can also look at what is the space usage.\n\n0:47:32.069,0:47:34.369\nWhich, I think, here...\n\n0:47:38.690,0:47:44.829\nSo NCDU... 
There's a tool called \"du\"\nwhich stands for \"disk usage\" and\n\n0:47:45.440,0:47:49.480\nwe have the \"-h\" flag for\n\"human readable output\".\n\n0:47:51.740,0:47:58.959\nWe can do videos and we can get output about\nthe size of all the files in this folder.\n\n0:48:08.059,0:48:10.059\nYeah, there we go.\n\n0:48:10.400,0:48:15.040\nThere are also interactive versions,\nlike HTOP was an interactive version.\n\n0:48:15.280,0:48:21.200\nSo NCDU is an interactive version that will let me navigate\nthrough the folders and I can see quickly that\n\n0:48:21.200,0:48:25.740\noh, we have... This is one of the\nfolders for the video lectures,\n\n0:48:26.329,0:48:29.049\nand we can see there are these four files\n\n0:48:29.690,0:48:36.579\nthat have like almost 9 GB each and I could\nquickly delete them through this interface.\n\n0:48:37.760,0:48:43.839\nAnother neat tool is \"LSOF\" which\nstands for \"LIST OPEN FILES\".\n\n0:48:44.240,0:48:47.500\nAnother pattern that you\nmay encounter is you know\n\n0:48:47.780,0:48:54.609\nsome process is using a file, but you don't know exactly which process\nis using that file. Or, similarly, some process is listening on\n\n0:48:55.400,0:48:59.020\na port, but again, how do you\nfind out which one it is?\n\n0:48:59.020,0:49:00.820\nSo to set an example.\n\n0:49:00.820,0:49:04.280\nWe just run a Python HTTP server on port\n\n0:49:05.210,0:49:06.559\n444\n\n0:49:06.559,0:49:10.899\nRunning there. 
Maybe we don't know that\nthat's running, but then we can\n\n0:49:13.130,0:49:15.130\nuse...\n\n0:49:17.089,0:49:19.089\nwe can use LSOF.\n\n0:49:22.660,0:49:29.200\nYeah, we can use LSOF, and the thing is LSOF\nis gonna print a lot of information.\n\n0:49:30.440,0:49:32.740\nYou need SUDO permissions because\n\n0:49:34.069,0:49:39.219\nthis is gonna ask for who has all these items.\n\n0:49:39.829,0:49:43.929\nSince we only care about the one\nwho is listening in this 444 port\n\n0:49:44.630,0:49:46.369\nwe can ask\n\n0:49:46.369,0:49:47.960\ngrep for that.\n\n0:49:47.960,0:49:55.750\nAnd we can see, oh, there's like this Python process, with\nthis identifier, that is using the port and then we can\n\n0:49:56.660,0:49:58.009\nkill it,\n\n0:49:58.009,0:50:00.969\nand that terminates that process.\n\n0:50:02.299,0:50:06.669\nAgain, there's a lot of different\ntools. There's even tools for\n\n0:50:08.450,0:50:10.569\ndoing what is called benchmarking.\n\n0:50:11.660,0:50:18.789\nSo in the shell tools and scripting lecture, I said\nlike for some tasks \"fd\" is much faster than \"find\"\n\n0:50:18.950,0:50:21.519\nBut like how will you check that?\n\n0:50:22.059,0:50:30.038\nI can test that with \"hyperfine\" and I have here\ntwo commands: one with \"fd\" that is just\n\n0:50:30.500,0:50:34.029\nsearching for JPEG files and\nthe same one with \"find\".\n\n0:50:34.579,0:50:41.079\nIf I execute them, it's gonna benchmark these\nscripts and give me some output about\n\n0:50:41.869,0:50:44.108\nhow much faster \"fd\" is\n\n0:50:45.380,0:50:47.380\ncompared to \"find\".\n\n0:50:47.660,0:50:52.269\nSo I think that kind of concludes...\nyeah, like 23 times for this task.\n\n0:50:52.940,0:50:55.990\nSo that kind of concludes the whole overview.\n\n0:50:56.539,0:51:00.309\nI know that there's like a lot of different\ntopics and there's like a lot of\n\n0:51:00.650,0:51:04.539\nperspectives on doing these things, but\nagain I want to reinforce the 
idea\n\n0:51:04.539,0:51:08.499\nthat you don't need to be a master\nof all these topics but more...\n\n0:51:08.750,0:51:11.229\nTo be aware that all these things exist.\n\n0:51:11.230,0:51:17.559\nSo if you run into these issues you don't reinvent the wheel,\nand you reuse all that other programmers have done.\n\n0:51:18.280,0:51:23.700\nGiven that, I'm happy to take any questions related\nto this last section or anything in the lecture.\n\n0:51:25.900,0:51:30.060\nIs there any way to sort of think about\nhow long a program should take?\n\n0:51:30.060,0:51:33.160\nYou know, if it's taking a while to run\n\n0:51:33.160,0:51:42.840\nyou know, should you be worried? Or depending on your process, let me wait\nanother ten minutes before I start looking at why it's taking so long.\n\n0:51:43.220,0:51:45.220\nOkay, so the...\n\n0:51:46.070,0:51:49.089\nThe task of knowing how long a program\n\n0:51:49.090,0:51:53.920\nshould run is pretty infeasible to figure out.\nIt will depend on the type of program.\n\n0:51:54.290,0:52:01.899\nIt depends on whether you're making HTTP requests or you're\nreading data... one thing that you can do is if you have\n\n0:52:02.390,0:52:02.980\nfor example,\n\n0:52:02.980,0:52:10.689\nif you know you have to read two gigabytes from memory,\nlike from disk, and load that into memory, you can make\n\n0:52:11.510,0:52:16.719\nback-of-the-envelope calculation. So like that shouldn't\ntake longer than like X seconds because this is\n\n0:52:16.940,0:52:20.050\nhow things are set up. Or if you are\n\n0:52:20.840,0:52:27.460\nreading some files from the network and you know kind of what the\nnetwork link is and they are taking say five times longer than\n\n0:52:27.460,0:52:29.460\nwhat you would expect then you could\n\n0:52:29.990,0:52:31.190\ntry to do that.\n\n0:52:31.190,0:52:37.839\nOtherwise, if you don't really know. 
Say you're trying to do some\nmathematical operation in your code and you're not really sure\n\n0:52:37.840,0:52:44.050\nabout how long that will take you can use something\nlike logging and try to kind of print intermediate\n\n0:52:44.570,0:52:50.469\nstages to get a sense of like, oh I need\nto do a thousand operations of this and\n\n0:52:51.800,0:52:53.600\nthree iterations\n\n0:52:53.600,0:53:00.700\ntook ten seconds. Then this is gonna take\nmuch longer than I can handle in my case.\n\n0:53:00.920,0:53:04.599\nSo I think there are there are ways, it\nwill again like depend on the task,\n\n0:53:04.600,0:53:08.800\nbut definitely, given all the tools we've\nseen really, we probably have like\n\n0:53:09.620,0:53:13.150\na couple of really good ways\nto start tackling that.\n\n0:53:14.750,0:53:16.750\nAny other questions?\n\n0:53:16.750,0:53:18.750\nYou can also do things like\n\n0:53:18.750,0:53:21.060\nrun HTOP and see if anything is running.\n\n0:53:22.380,0:53:25.500\nLike if your CPU is at 0%, something\nis probably wrong.\n\n0:53:31.140,0:53:32.579\nOkay.\n\n0:53:32.579,0:53:38.268\nThere's a lot of exercises for all the topics\nthat we have covered in today's class,\n\n0:53:38.269,0:53:41.419\nso feel free to do the ones\nthat are more interesting.\n\n0:53:42.180,0:53:44.539\nWe're gonna be holding office hours again today.\n\n0:53:45.059,0:53:48.979\nJust a reminder, office hours. You can come\nand ask questions about any lecture.\n\n0:53:48.980,0:53:53.510\nLike we're not gonna expect you to kind of\ndo the exercises in a couple of minutes.\n\n0:53:53.510,0:53:57.979\nThey take a really long while to get through\nthem, but we're gonna be there\n\n0:53:58.529,0:54:04.339\nto answer any questions from previous classes, or even not related\nto exercises. Like if you want to know more about how you\n\n0:54:04.619,0:54:09.889\nwould use TMUX in a way to kind of quickly switch\nbetween panes, anything that comes to your mind.\n\n"
  },
  {
    "path": "static/files/subtitles/2020/qa.sbv",
    "content": "0:00:00.000,0:00:06.540\nI guess we should do an intro to to this as well,\n\n0:00:06.540,0:00:09.580\nso this is a just sort of a\n\n0:00:09.581,0:00:14.740\nfree-form Q&A lecture where you, as in\nthe two people sitting here, but also\n\n0:00:14.740,0:00:19.841\neveryone at home who did not come here\nin person get to ask questions and we\n\n0:00:19.841,0:00:22.961\nhave a bunch of questions people asked\nin advance but you can also ask\n\n0:00:22.961,0:00:27.371\nadditional questions during, for the two\nof you who are here, you can do it either\n\n0:00:27.371,0:00:33.611\nby raising your hand or you can submit it on\nthe forum and be anonymous, it's up to you\n\n0:00:33.611,0:00:35.671\nregardless though, what we're gonna\ndo is just go through some of the\n\n0:00:35.681,0:00:40.241\nquestions have been asked and try to\ngive as helpful answers as we can\n\n0:00:40.241,0:00:43.691\nalthough they are unprepared on our side and\n\n0:00:43.791,0:00:45.611\nyeah that's the plan I guess we go\n\n0:00:45.611,0:00:48.911\nfrom popular to least popular\n\n0:00:48.911,0:00:49.991\nfire away\n\n0:00:49.991,0:00:52.091\nall right so for our first question any\n\n0:00:52.091,0:00:55.961\nrecommendations on learning operating\nsystem related topics like processes,\n\n0:00:55.961,0:00:59.861\nvirtual memory, interrupts,\nmemory management, etc\n\n0:00:59.861,0:01:01.811\nso I think this is a\n\n0:01:01.811,0:01:07.181\nis an interesting question because these\nare really low level concepts that often\n\n0:01:07.181,0:01:11.391\ndo not matter, unless you have to\ndeal with this in some capacity,\n\n0:01:11.391,0:01:12.771\nright so\n\n0:01:12.891,0:01:17.671\none instance where this matters is you're\nwriting really low level code like\n\n0:01:17.681,0:01:20.500\nyou're implementing a kernel or something\nlike that, or you want to\n\n0:01:20.500,0:01:22.811\njust hack on the Linux kernel.\n\n0:01:22.811,0:01:24.751\nIt's rare otherwise that you need to work 
with\n\n0:01:24.751,0:01:27.711\nespecially like virtual memory and\ninterrupts and stuff yourself\n\n0:01:27.851,0:01:32.071\nprocesses, I think are a more general concept\nthat we've talked a little bit about in\n\n0:01:32.071,0:01:36.611\nthis class as well and tools like\nhtop, pgrep, kill, and signals and\n\n0:01:36.761,0:01:37.711\nthat sort of stuff\n\n0:01:37.711,0:01:39.311\nin terms of learning it\n\n0:01:39.311,0:01:45.371\nmaybe one of the best ways, is to try to\ntake either an introductory class on the\n\n0:01:45.371,0:01:51.401\ntopic, so for example MIT has a class\ncalled 6.828, which is where\n\n0:01:51.401,0:01:55.091\nyou essentially build and develop your\nown operating system based on some code\n\n0:01:55.091,0:01:58.631\nthat you're given, and all of those labs\nare publicly available and all the\n\n0:01:58.631,0:02:01.601\nresources for the class are publicly available,\nand so that is a good way to\n\n0:02:01.601,0:02:04.001\nreally learn them is by doing them yourself.\n\n0:02:04.001,0:02:05.201\nThere are also various\n\n0:02:05.201,0:02:11.201\ntutorials online that basically guide\nyou through how do you write a kernel\n\n0:02:11.201,0:02:15.431\nfrom scratch. Not necessarily a very\nelaborate one, not one you would want\n\n0:02:15.431,0:02:20.561\nto run any real software on, but just to\nteach you the basics and so that would\n\n0:02:20.561,0:02:21.930\nbe another thing to look up.\n\n0:02:21.930,0:02:24.131\nLike how do I write a kernel in and then your\n\n0:02:24.131,0:02:27.611\nlanguage of choice. 
You will probably not\nfind one that lets you do it in Python\n\n0:02:27.611,0:02:33.612\nbut in like C, C++, Rust, there\nare a bunch of topics like this\n\n0:02:33.612,0:02:36.951\none other note on operating systems\n\n0:02:36.951,0:02:39.931\nso like Jon mentioned MIT has a 6.828 class but\n\n0:02:39.941,0:02:43.391\nif you're looking for a more high-level\noverview, not necessarily programming or\n\n0:02:43.391,0:02:46.001\nan operating system, but just learning about\nthe concepts another good resource\n\n0:02:46.001,0:02:51.331\nis a book called \"Modern Operating\nSystems\" by Andrew Tanenbaum\n\n0:02:51.331,0:02:58.371\nthere's also actually a book called \"The FreeBSD\nOperating System\" which is really good,\n\n0:02:58.371,0:03:03.031\nIt doesn't go through Linux, but it goes\nthrough FreeBSD and the BSD kernel is\n\n0:03:03.031,0:03:07.181\narguably better organized than the Linux\none and better documented and so it\n\n0:03:07.181,0:03:11.591\nmight be a gentler introduction to some of those\ntopics than trying to understand Linux\n\n0:03:11.591,0:03:14.951\nYou want to check it as answered?\n\n0:03:14.951,0:03:16.511\n- Yes + Nice\n\n0:03:16.511,0:03:17.451\nAnswered\n\n0:03:17.451,0:03:19.371\nFor our next question\n\n0:03:19.371,0:03:23.951\nWhat are some of the tools you'd\nprioritize learning first?\n\n0:03:23.951,0:03:29.551\n- Maybe we can all go through and\ngive our opinion on this? + Yeah\n\n0:03:29.551,0:03:31.713\nTools to prioritize learning first?\n\n0:03:31.713,0:03:36.451\nI think learning your editor well,\njust serves you in all capacities\n\n0:03:36.511,0:03:40.511\nlike being efficient at editing files,\nis just like a majority of\n\n0:03:40.511,0:03:45.041\nwhat you're going to spend your time doing.\nAnd in general, just using your\n\n0:03:45.041,0:03:49.211\nkeyboard more and your mouse less. 
It means\nthat you get to spend more of your\n\n0:03:49.311,0:03:53.751\ntime doing useful things and\nless of your time moving\n\n0:03:53.751,0:03:56.251\nI think that would be my top priority,\n\n0:04:04.511,0:04:06.751\nso I would say that for what\n\n0:04:06.760,0:04:09.671\ntool to prioritize will depend\non what exactly you're doing\n\n0:04:09.671,0:04:16.150\nI think the core idea is you should try\nto find the types of tasks that you are\n\n0:04:16.151,0:04:18.371\ndoing repetitively and so\n\n0:04:18.371,0:04:23.791\nif you are doing some sort of like\nmachine learning workload and\n\n0:04:24.011,0:04:27.130\nyou find yourself using Jupyter notebooks,\nlike the one we presented\n\n0:04:27.130,0:04:32.560\nyesterday, a lot. Then again, using\na mouse for that might not be\n\n0:04:32.560,0:04:35.830\nthe best idea and you want to familiarize\nyourself with the keyboard shortcuts\n\n0:04:35.830,0:04:40.750\nand pretty much with anything you will\nend up figuring out that there are some\n\n0:04:40.751,0:04:45.611\nrepetitive tasks, and you're running a\ncomputer, and just trying to figure out\n\n0:04:45.611,0:04:48.311\noh there's probably a better way to do this\n\n0:04:48.431,0:04:50.871\nbe it a terminal, be it an editor\n\n0:04:51.111,0:04:55.891\nAnd it might be really interesting to\nlearn to use some of the topics that\n\n0:04:55.900,0:05:01.121\nwe have covered, but if they're not\nextremely useful on an everyday\n\n0:05:01.121,0:05:05.431\nbasis then it might not be worth prioritizing them\n\n0:05:06.591,0:05:07.451\nout of the topics\n\n0:05:07.531,0:05:11.611\ncovered in this class in my opinion two\nof the most useful things are version\n\n0:05:11.621,0:05:15.220\ncontrol and text editors, and I think they're\na little bit different from each\n\n0:05:15.220,0:05:18.880\nother, in the sense that text editors I\nthink are really useful to learn well\n\n0:05:18.880,0:05:21.970\nbut it was probably the case that before\nwe started using vim and all its 
fancy\n\n0:05:21.970,0:05:25.390\nkeyboard shortcuts you had some other\ntext editor you were using before and\n\n0:05:25.390,0:05:29.890\nyou could edit text just fine maybe a little\nbit inefficiently whereas I think\n\n0:05:29.890,0:05:33.100\nversion control is another really useful\nskill and that's one where if you don't\n\n0:05:33.100,0:05:36.580\nreally know the tool properly, it can actually\nlead to some problems like loss\n\n0:05:36.580,0:05:39.490\nof data or just inability to collaborate\nproperly with people so I\n\n0:05:39.490,0:05:42.730\nthink version control is one of the first\nthings that's worth learning well\n\n0:05:42.730,0:05:46.871\nyeah, I agree with that, I think\nlearning a tool like Git is just\n\n0:05:46.871,0:05:49.691\ngonna save you so much heartache down the line\n\n0:05:49.691,0:05:51.431\nit also, to add on to that\n\n0:05:51.571,0:05:57.310\nIt really helps you collaborate with others\nand Anish touched a little bit on GitHub\n\n0:05:57.310,0:06:01.300\nin the last lecture, and just learning\nto use that tool well in order\n\n0:06:01.300,0:06:05.321\nto work on larger software projects\nthat other people are working on is\n\n0:06:05.321,0:06:06.431\nan invaluable skill\n\n0:06:10.071,0:06:11.391\nFor our next question\n\n0:06:11.391,0:06:12.871\nwhen do I use Python versus a\n\n0:06:12.881,0:06:16.051\nbash script, versus some other language\n\n0:06:16.051,0:06:19.661\nThis is tough, because I think this comes\n\n0:06:19.661,0:06:21.631\ndown to what Jose was saying earlier too\n\n0:06:21.771,0:06:23.731\nthat it really depends on\nwhat you're trying to do\n\n0:06:23.731,0:06:27.155\nFor me, I think for bash scripts in particular\n\n0:06:27.155,0:06:28.791\nbash scripts are for\n\n0:06:28.891,0:06:33.430\nautomating running a bunch of commands,\nyou don't want to write any\n\n0:06:33.430,0:06:35.411\nother like business logic in bash\n\n0:06:35.411,0:06:39.011\nit is just for I want to run 
these\n\n0:06:39.011,0:06:44.110\ncommands, in this order. Maybe with\narguments, but like even that\n\n0:06:44.110,0:06:47.581\nit's unclear do you want to bash script\nonce you start taking arguments\n\n0:06:47.581,0:06:52.691\nSimilarly, once you start doing any\nkind of like text processing or\n\n0:06:52.691,0:06:55.131\nconfiguration, all that\n\n0:06:55.131,0:06:59.111\nreach for a language that is a more serious\n\n0:06:59.111,0:07:01.031\nprogramming language than bash is\n\n0:07:01.091,0:07:03.451\nbash is really for sort of short one-off\n\n0:07:03.461,0:07:10.211\nscripts or ones that have a very well-defined\nuse case on the terminal in\n\n0:07:10.211,0:07:12.851\nthe shell, probably\n\n0:07:12.851,0:07:15.941\nFor a slightly more concrete guideline,\nyou might say write a\n\n0:07:15.941,0:07:19.211\nbash script if it's less than a hundred\nlines of code or so, but once it gets\n\n0:07:19.211,0:07:21.611\nbeyond that point bash is kind of\nunwieldy and it's probably worth\n\n0:07:21.611,0:07:25.091\nswitching to a more serious programming\nlanguage like Python\n\n0:07:25.091,0:07:26.511\nand to add to that\n\n0:07:26.511,0:07:32.211\nI would say that I found myself writing\nsometimes scripts in Python because\n\n0:07:32.211,0:07:36.911\nIf I have already solved some subproblem\nthat covers part of the problem in Python\n\n0:07:36.911,0:07:40.631\nI find it much easier to compose the\nprevious solution that I found out in\n\n0:07:40.631,0:07:45.731\nPython and just try to reuse bash code,\nthat I don't find as reusable as Python\n\n0:07:45.731,0:07:49.600\nAnd in the same way it's kind of nice that\na lot of people have written something\n\n0:07:49.600,0:07:52.631\nlike Python libraries or like Ruby libraries\nto do a lot of these things\n\n0:07:52.631,0:07:58.451\nwhereas in bash is kind of hard\nto have like code reuse\n\n0:07:58.451,0:08:01.720\nAnd in fact,\n\n0:08:01.720,0:08:07.631\nI think to add to that. 
Usually, if you\nfind a library in some language that\n\n0:08:07.631,0:08:12.091\nhelps with the task you're trying to\ndo, use that language for the job\n\n0:08:12.091,0:08:15.671\nAnd in bash there are no libraries, there\nare only the programs on your computer\n\n0:08:15.771,0:08:18.931\nSo you probably don't want to use\nit unless like there's a program\n\n0:08:18.941,0:08:23.741\nyou can just invoke. I do think another\nthing worth remembering about bash:\n\n0:08:23.741,0:08:26.451\nbash is really hard to get right.\n\n0:08:26.451,0:08:30.531\nIt's very easy to get it right for the particular\nuse case you're trying to solve right now\n\n0:08:30.531,0:08:32.471\nbut things like\n\n0:08:32.471,0:08:35.891\nWhat if one of the filenames has a space in it?\n\n0:08:35.891,0:08:38.891\nIt has caused so many bugs and so\n\n0:08:38.891,0:08:43.151\nmany problems in bash scripts and if you\nuse a real programming language then\n\n0:08:43.151,0:08:46.642\nthose problems just go away\n\n0:08:46.651,0:08:50.491\nChecked it\n\n0:08:50.571,0:08:51.571\nFor our next question\n\n0:08:51.571,0:08:56.211\nWhat is the difference between sourcing\na script and executing that script?\n\n0:08:57.071,0:09:02.711\nSo this actually, we got in office\nhours a while back as well which is\n\n0:09:02.871,0:09:06.991\nAren't they the same? 
like aren't they\nboth just running the bash script?\n\n0:09:06.991,0:09:08.051\nand it is true\n\n0:09:08.051,0:09:12.191\nboth of these will end up executing the\nlines of code that are in the script\n\n0:09:12.191,0:09:16.571\nthe ways in which they differ is that\nsourcing a script is telling your\n\n0:09:16.571,0:09:22.991\ncurrent bash script, your current bash\nsession to execute that program\n\n0:09:23.131,0:09:28.911\nwhereas the other one is, start up a new instance\nof bash and run the program there instead\n\n0:09:29.291,0:09:34.931\nAnd this matters for things like imagine that\n\"script.sh\" tries to change directories\n\n0:09:34.931,0:09:37.841\nIf you are running the script\nas in the second invocation\n\n0:09:37.841,0:09:42.761\n\"./script.sh\", then the new\nprocess is going to change\n\n0:09:42.761,0:09:46.891\ndirectories but by the time that script\nexits and returns to your shell\n\n0:09:46.891,0:09:51.831\nyour shell still remains in the same place. However,\nif you do CD in a script and you source it\n\n0:09:51.831,0:09:55.241\nYour current instance of bash is the\none that ends up running it and\n\n0:09:55.241,0:09:57.951\nso it ends up CDing where you are\n\n0:09:57.951,0:10:01.171\nThis is also why if you define functions\n\n0:10:01.171,0:10:04.751\nFor example, that you may want to\nexecute in your shell session\n\n0:10:04.751,0:10:07.011\nYou need to source the script, not run it\n\n0:10:07.011,0:10:10.261\nBecause if you run it, that function\nwill be defined in the\n\n0:10:10.261,0:10:11.931\ninstance of bash\n\n0:10:11.931,0:10:16.831\nIn the bash process that gets launched but it\nwill not be defined in your current shell\n\n0:10:16.831,0:10:22.871\nI think those are two of the biggest\ndifferences between the two\n\n0:10:29.211,0:10:29.711\nNext question,\n\n0:10:29.873,0:10:35.131\nWhat are the places where various packages and tools\nare stored and how does referencing them work?\n\n0:10:35.131,0:10:39.171\nWhat even is /bin or 
/lib?\n\n0:10:39.171,0:10:45.091\nSo as we covered in the first lecture,\nthere is this PATH environment variable\n\n0:10:45.091,0:10:49.551\nwhich is a colon separated\nstring of all the places\n\n0:10:49.551,0:10:55.111\nwhere your shell is gonna look for binaries\nand if you just do something\n\n0:10:55.111,0:10:58.171\nlike \"echo $PATH\", you're gonna get this list\n\n0:10:58.171,0:11:02.251\nand all these places are gonna\nbe consulted in order.\n\n0:11:02.251,0:11:03.601\nIt's gonna go through all of them and in fact\n\n0:11:03.601,0:11:07.011\n- There is already... Did we cover which? + Yeah\n\n0:11:07.211,0:11:10.011\nSo if you run \"which\" and a specific command\n\n0:11:10.021,0:11:14.071\nthe shell is actually gonna tell\nyou where it's finding this\n\n0:11:14.071,0:11:15.391\nBeyond that,\n\n0:11:15.391,0:11:20.431\nthere are like some conventions where a lot\nof programs will install their binaries\n\n0:11:20.431,0:11:24.071\nand they're like /usr/bin (or at\nleast they will include symlinks)\n\n0:11:24.071,0:11:26.051\nin /usr/bin so you can find them\n\n0:11:26.191,0:11:28.211\nThere's also a /usr/local/bin\n\n0:11:28.211,0:11:33.951\nThere are special directories. 
For example,\n/usr/sbin it's only for sudo user and\n\n0:11:33.951,0:11:38.491\nsome of these conventions are slightly\ndifferent between different distros so\n\n0:11:38.491,0:11:47.571\nI know like some distros for example install\nthe user libraries under /opt for example\n\n0:11:51.191,0:11:55.491\nYeah I think one thing just\nto talk a little bit of more\n\n0:11:55.651,0:12:00.631\nabout /bin and then Anish maybe you can\ndo the other folders so when it comes to\n\n0:12:00.631,0:12:02.791\n/bin the convention\n\n0:12:02.791,0:12:10.051\nThere are conventions, and the conventions are\nusually /bin are for essential system utilities\n\n0:12:10.051,0:12:12.531\n/usr/bin are for user programs and\n\n0:12:12.531,0:12:17.431\n/usr/local/bin are for user\ncompiled programs, sort of\n\n0:12:17.431,0:12:21.691\nso things that you installed that you intend\nthe user to run, are in /usr/bin\n\n0:12:21.691,0:12:26.711\nthings that a user has compiled themselves and stuck\non your system, probably goes in /usr/local/bin\n\n0:12:26.711,0:12:29.991\nbut again, this varies a lot from machine\nto machine, and distro to distro\n\n0:12:29.991,0:12:33.971\nOn Arch Linux, for example, /bin\nis a symlink to /usr/bin\n\n0:12:33.971,0:12:40.261\nThey're the same and as Jose mentioned, there's\nalso /sbin which is for programs that are\n\n0:12:40.261,0:12:43.801\nintended to only be run as root, that\nalso varies from distro to distro\n\n0:12:43.801,0:12:47.251\nwhether you even have that directory, and\non many systems like /usr/local/bin\n\n0:12:47.251,0:12:51.151\nmight not even be in your PATH, or\nmight not even exist on your system\n\n0:12:51.151,0:12:55.831\nOn BSD on the other hand /usr/local/bin\nis often used a lot more heavily\n\n0:12:56.731,0:12:57.231\nyeah so\n\n0:12:57.231,0:13:01.111\nWhat we were talking about so far, these\nare all ways that files and folders are\n\n0:13:01.111,0:13:05.071\norganized on Linux things or Linux or\nBSD things vary a little bit 
between\n\n0:13:05.071,0:13:07.151\nthat and macOS or other platforms\n\n0:13:07.151,0:13:09.301\nI think for the specific locations,\n\n0:13:09.301,0:13:11.471\nif you want to know exactly what it's\nused for, you can look it up\n\n0:13:11.471,0:13:17.291\nBut some general patterns to keep in mind are anything\nwith /bin in it has binary executable programs in it,\n\n0:13:17.291,0:13:19.891\nanything with /lib in it, has\nlibraries in it so things that\n\n0:13:19.891,0:13:25.081\nprograms can link against, and then some\nother things that are useful to know are\n\n0:13:25.081,0:13:29.431\nthere's a /etc on many systems, which\nhas configuration files in it and\n\n0:13:29.431,0:13:34.311\nthen there's /home, which underneath that directory\ncontains each user's home directory\n\n0:13:34.311,0:13:38.521\nso like on a linux box my username\nor if it's Anish will\n\n0:13:38.651,0:13:41.351\ncorrespond to a home directory /home/anish\n\n0:13:42.071,0:13:43.351\nYeah I guess there are\n\n0:13:43.351,0:13:47.671\na couple of others like /tmp is usually\na temporary directory that gets\n\n0:13:47.671,0:13:51.351\nerased when you reboot not always but sometimes,\nyou should check on your system\n\n0:13:51.731,0:13:59.211\nThere's a /var which often holds like\nfiles that change over time so\n\n0:13:59.211,0:14:06.151\nthese are usually going to be things\nlike lock files for package managers\n\n0:14:06.151,0:14:12.431\nthey're gonna be things like log files,\nfiles to keep track of process IDs\n\n0:14:12.431,0:14:16.471\nthen there's /dev which shows devices so\n\n0:14:16.471,0:14:20.551\nusually so these are special files that\ncorrespond to devices on your system we\n\n0:14:20.551,0:14:27.391\ntalked about /sys, Anish mentioned /etc\n\n0:14:29.051,0:14:36.031\n/opt is a common one for just like third-party\nsoftware that basically it's usually for\n\n0:14:36.031,0:14:40.951\ncompanies ported their software to Linux\nbut they don't actually understand 
what\n\n0:14:40.951,0:14:45.391\nrunning software on Linux is like, and\nso they just have a directory with all\n\n0:14:45.391,0:14:51.411\ntheir stuff in it and when those get installed\nthey usually get installed into /opt\n\n0:14:51.411,0:14:55.651\nI think those are the ones off the top of my head\n\n0:14:55.651,0:14:57.771\nyeah\n\n0:14:57.771,0:15:02.271\nAnd we will list these in our lecture notes\nwhich will produce after this lecture\n\n0:15:02.271,0:15:04.431\nNext question\n\n0:15:04.431,0:15:07.080\nShould I apt-get install a Python whatever\n\n0:15:07.080,0:15:10.691\npackage or pip install that package\n\n0:15:10.691,0:15:13.890\nso this is a good question that I think at\n\n0:15:13.890,0:15:17.310\na higher level this question is asking\nshould I use my systems package manager\n\n0:15:17.310,0:15:20.850\nto install things or should I use some other\npackage manager. Like in this case\n\n0:15:20.850,0:15:25.021\none that's more specific to a particular\nlanguage. And the answer here is also\n\n0:15:25.021,0:15:28.590\nkind of it depends, sometimes it's nice\nto manage things using a system package\n\n0:15:28.590,0:15:31.950\nmanager so everything can be installed\nand upgraded in a single place but\n\n0:15:31.950,0:15:35.160\nI think oftentimes whatever is available\nin the system repositories the things\n\n0:15:35.160,0:15:37.800\nyou can get via a tool like\napt-get or something similar\n\n0:15:37.800,0:15:41.040\nmight be slightly out of date compared to\nthe more language specific repository\n\n0:15:41.040,0:15:45.060\nso for example a lot of the Python packages\nI use I really want the most\n\n0:15:45.060,0:15:47.771\nup-to-date version and so\nI use pip to install them\n\n0:15:48.551,0:15:51.091\nThen, to extend on that is\n\n0:15:51.091,0:15:57.751\nsometimes the case the system packages\nmight require some other\n\n0:15:57.751,0:16:02.461\ndependencies that you might not have realized\nabout, and it's also might 
be\n\n0:16:02.461,0:16:07.201\nthe case or like for some systems,\nat least for like alpine Linux they\n\n0:16:07.201,0:16:11.221\ndon't have wheels for like a lot of the\nPython packages so it will just take\n\n0:16:11.221,0:16:15.331\nlonger to compile them, it will take more\nspace because they have to compile them\n\n0:16:15.331,0:16:20.761\nfrom scratch. Whereas if you just go\nto pip, pip has binaries for a lot of\n\n0:16:20.761,0:16:23.471\ndifferent platforms and that will probably work\n\n0:16:23.471,0:16:29.191\nYou also should be aware that pip might not do\nthe exact same thing in different computers\n\n0:16:29.191,0:16:33.601\nSo, for example, if you are in a kind of laptop\nor like a desktop that is running like\n\n0:16:33.601,0:16:38.971\na x86 or x86_64 you probably have binaries,\nbut if you're running something\n\n0:16:38.971,0:16:43.471\nlike Raspberry Pi or some other kind of\nembedded device. These are running on a\n\n0:16:43.471,0:16:47.611\ndifferent kind of hardware architecture\nand you might not have binaries\n\n0:16:47.611,0:16:51.841\nI think that's also good to take into account,\nin that case in might be worthwhile to\n\n0:16:51.841,0:16:58.551\nuse the system packages just because they\nwill take much shorter to get them\n\n0:16:58.551,0:17:01.691\nthan to just to compile from scratch\nthe entire Python installation\n\n0:17:01.691,0:17:06.741\nApart from that, I don't think I can think of any exceptions\nwhere I would actually use the system packages\n\n0:17:06.741,0:17:09.251\ninstead of the Python provided ones\n\n0:17:19.011,0:17:20.851\nSo, one other thing to keep in mind is that\n\n0:17:20.861,0:17:26.180\nsometimes you will have more than one\nprogram on your computer and you might\n\n0:17:26.180,0:17:29.961\nbe developing more than one program on\nyour computer and for some reason not\n\n0:17:29.961,0:17:33.861\nall programs are always built with the latest\nversion of things, sometimes they\n\n0:17:33.861,0:17:39.351\nare a 
little bit behind, and when you\ninstall something system-wide you can\n\n0:17:39.351,0:17:44.691\nonly... depends on your exact system,\nbut often you just have one version\n\n0:17:44.691,0:17:49.711\nwhat pip lets you do, especially combined\nwith something like python's virtualenv,\n\n0:17:49.711,0:17:54.531\nand similar concepts exist for other\nlanguages, where you can sort of say\n\n0:17:54.531,0:17:59.660\nI want to (NPM does the same thing as well\nwith its node modules, for example) where\n\n0:17:59.660,0:18:05.991\nI'm gonna compile the dependencies of\nthis package in sort of a subdirectory\n\n0:18:05.991,0:18:10.431\nof its own, and all of the versions that it\nrequires are going to be built in there\n\n0:18:10.431,0:18:13.910\nand you can do this separately for separate\nprojects so there they have\n\n0:18:13.910,0:18:16.910\ndifferent dependencies or the same dependencies\nwith different versions\n\n0:18:16.910,0:18:20.930\nthey still sort of kept separate. And that\nis one thing that's hard to achieve\n\n0:18:20.931,0:18:22.651\nwith system packages\n\n0:18:27.131,0:18:27.851\nNext question\n\n0:18:27.911,0:18:32.771\nWhat's the easiest and best profiling tools\nto use to improve performance of my code?\n\n0:18:34.351,0:18:39.231\nThis is a topic we could talk\nabout for a very long time\n\n0:18:39.231,0:18:42.881\nThe easiest and best is to print stuff using time\n\n0:18:42.881,0:18:48.431\nLike, I'm not joking, very often\nthe easiest thing is in your code\n\n0:18:48.971,0:18:53.751\nAt the top you figure out what the current\ntime is, and then you do sort of\n\n0:18:53.751,0:18:57.920\na binary search over your program of add\na print statement that prints how much\n\n0:18:57.920,0:19:02.511\ntime has elapsed since the start of your\nprogram and then you do that until you\n\n0:19:02.511,0:19:06.320\nfind the segment of code that took the\nlongest. 
And then you go into that\n\n0:19:06.320,0:19:09.531\nfunction and then you do the same thing\nagain and you keep doing this until you\n\n0:19:09.531,0:19:14.031\nfind roughly where the time was spent. It's\nnot foolproof, but it is really easy\n\n0:19:14.031,0:19:16.721\nand it gives you good information quickly\n\n0:19:16.721,0:19:25.361\nif you do need more advanced information\nValgrind has a tool called cache-grind?\n\n0:19:25.361,0:19:29.431\ncall grind? Cache grind? One of the two.\n\n0:19:29.431,0:19:33.310\nand this tool lets you run your program and\n\n0:19:33.310,0:19:38.741\nmeasure how long everything takes and\nall of the call stacks, like which\n\n0:19:38.741,0:19:42.521\nfunction called which function, and what\nyou end up with is a really neat\n\n0:19:42.521,0:19:47.081\nannotation of your entire program source\nwith the heat of every line basically\n\n0:19:47.081,0:19:51.761\nhow much time was spent there. It does\nslow down your program by like an order\n\n0:19:51.761,0:19:56.021\nof magnitude or more, and it doesn't really\nsupport threads but it is really\n\n0:19:56.021,0:20:01.121\nuseful if you can use it. If you can't,\nthen tools like perf or similar tools\n\n0:20:01.121,0:20:05.201\nfor other languages that do usually some\nkind of sampling profiling like we\n\n0:20:05.201,0:20:09.811\ntalked about in the profiler lecture, can\ngive you pretty useful data quickly,\n\n0:20:09.811,0:20:15.160\nbut it's a lot of data around\nthis, but they're a little bit\n\n0:20:15.160,0:20:18.971\nbiased and what kind of things they usually\nhighlight as a problem and it\n\n0:20:18.971,0:20:22.961\ncan sometimes be hard to extract meaningful\ninformation about what should\n\n0:20:22.961,0:20:27.701\nI change in response to them. 
Whereas the\nsort of print approach very quickly\n\n0:20:27.701,0:20:32.171\ngives you like this section\nof code is bad or slow\n\n0:20:32.171,0:20:34.871\nI think would be my answer\n\n0:20:34.871,0:20:40.431\nFlamegraphs are great, they're a good way\nto visualize some of this information\n\n0:20:41.491,0:20:45.550\nYeah I just have one thing to add,\noftentimes programming languages\n\n0:20:45.550,0:20:48.910\nhave language specific tools for profiling\nso to figure out what's the\n\n0:20:48.910,0:20:52.191\nright tool to use for your language like if\nyou're doing JavaScript in the web browser\n\n0:20:52.191,0:20:55.411\nthe web browser has a really nice tool for\ndoing profiling you should just use that\n\n0:20:55.411,0:21:00.471\nor if you are using go, for example, go has a built-in\nprofiler is really good you should just use that\n\n0:21:01.711,0:21:04.251\nA last thing to add to that\n\n0:21:04.251,0:21:09.951\nSometimes you might find that doing this binary\nsearch over time that you're kind of\n\n0:21:09.961,0:21:14.351\nfinding where the time is going, but this\ntime is sometimes happening because\n\n0:21:14.351,0:21:18.461\nyou're waiting on the network, or you're\nwaiting for some file, and in that case\n\n0:21:18.461,0:21:23.440\nyou want to make sure that the time\nthat is, if I want to write\n\n0:21:23.440,0:21:27.310\nlike 1 gigabyte file or like read 1\ngigabyte file and put it into memory\n\n0:21:27.310,0:21:32.260\nyou want to check that the actual time\nthere, is the minimum amount of time\n\n0:21:32.260,0:21:36.221\nyou actually have to wait. 
If it's ten times\nlonger, you should try to use some\n\n0:21:36.221,0:21:39.371\nother tools that we covered in the debugging\nand profiling section to see\n\n0:21:39.371,0:21:45.671\nwhy you're not utilizing all your\nresources because that might...\n\n0:21:50.511,0:21:56.071\nBecause that might be a lot of what's happening\nthing, like for example, in my research\n\n0:21:56.081,0:21:59.410\nin machine learning workloads, a lot of\ntime is loading data and you have to\n\n0:21:59.410,0:22:02.981\nmake sure well like the time it takes to\nload data is actually the minimum amount\n\n0:22:02.981,0:22:07.500\nof time you want to have that happening\n\n0:22:08.040,0:22:13.481\nAnd to build on that, there are actually\nspecialized tools for doing things like\n\n0:22:13.481,0:22:17.351\nanalyzing wait times. Very often when\nyou're waiting for something what's\n\n0:22:17.351,0:22:20.591\nreally happening is you're issuing your\nsystem call, and that system call takes\n\n0:22:20.591,0:22:24.191\nsome amount of time to respond. Like you do\na really large write, or a really large read\n\n0:22:24.191,0:22:28.361\nor you do many of them, and one thing\nthat can be really handy here is\n\n0:22:28.361,0:22:31.841\nto try to get information out of the\nkernel about where your program is\n\n0:22:31.841,0:22:37.000\nspending its time. And so there's (it's\nnot new), but there's a relatively\n\n0:22:37.000,0:22:42.820\nnewly available thing called BPF or eBPF.\nWhich is essentially kernel tracing\n\n0:22:42.820,0:22:48.531\nand you can do some really cool things with\nit, and that includes tracing user programs.\n\n0:22:48.531,0:22:51.760\nIt can be a little bit awkward to\nget started with, there's a tool\n\n0:22:51.760,0:22:56.201\ncalled BPF trace that i would recommend\nyou looking to, if you need to do like\n\n0:22:56.201,0:23:00.040\nthis kind of low-level performance debugging.\nBut it is really good for this\n\n0:23:00.040,0:23:04.601\nkind of stuff. 
You can get things like\nhistograms over how much time was spent\n\n0:23:04.601,0:23:06.671\nin particular system calls\n\n0:23:06.671,0:23:09.721\nIt's a great tool\n\n0:23:12.251,0:23:15.351\nWhat browser plugins do you use?\n\n0:23:16.731,0:23:19.731\nI try to use as few as I can get away with using\n\n0:23:19.731,0:23:25.991\nbecause I don't like things being in\nmy browser, but there are a couple of\n\n0:23:25.991,0:23:30.311\nones that are sort of staples.\nThe first one is uBlock Origin.\n\n0:23:30.311,0:23:36.611\nSo uBlock Origin is one of many ad blockers but\nit's a little bit more than an ad blocker.\n\n0:23:36.611,0:23:42.530\nIt is (a what do they call it?) a\nnetwork filtering tool so it lets\n\n0:23:42.530,0:23:47.331\nyou do more things than just block ads.\nIt also lets you like block connections\n\n0:23:47.331,0:23:51.351\nto certain domains, block connections\nfor certain types of resources\n\n0:23:51.351,0:23:56.031\nSo I have mine set up in what they call\nthe Advanced Mode, where basically\n\n0:23:56.031,0:24:02.451\nyou can disable basically all network requests.\nBut it's not just Network requests,\n\n0:24:02.451,0:24:07.430\nIt's also like I have disabled all inline\nscripts on every page and all\n\n0:24:07.430,0:24:11.540\nthird-party images and resources, and then\nyou can sort of create a whitelist\n\n0:24:11.540,0:24:16.351\nfor every page so it gives you really\nlow-level tools around how to\n\n0:24:16.351,0:24:20.331\nhow to improve the security of your browsing.\nBut you can also set it in not the\n\n0:24:20.331,0:24:23.991\nadvanced mode, and then it does much of\nthe same as a regular ad blocker would\n\n0:24:23.991,0:24:28.101\ndo, although in a fairly efficient way\nif you're looking at an ad blocker it's\n\n0:24:28.101,0:24:31.510\nprobably the one to use and it\nworks on like every browser\n\n0:24:31.511,0:24:34.451\nThat would be my top pick I think,\n\n0:24:39.111,0:24:44.391\nI think probably the one I\nuse like the most 
actively\n\n0:24:44.391,0:24:50.391\nis one called Stylus. It lets you modify\nthe CSS or like the stylesheets\n\n0:24:50.391,0:24:54.560\nthat webpages have. And it's pretty\nneat, because sometimes you're\n\n0:24:54.560,0:24:58.550\nlooking at a website and you want\nto hide some part of the website\n\n0:24:58.550,0:25:04.211\nyou don't care about. Like maybe a ad, maybe\nsome sidebar you're not finding useful\n\n0:25:04.211,0:25:06.290\nThe thing is, at the end of\nthe day these things are\n\n0:25:06.290,0:25:09.591\ndisplaying in your browser, and you\nhave control of what code is\n\n0:25:09.591,0:25:13.131\nexecuting and similar to what Jon was\nsaying, like you can customize this\n\n0:25:13.131,0:25:18.491\nto no end, and what I have for a lot of\nweb pages like hide this this part, or\n\n0:25:18.491,0:25:23.390\nalso trying to make like dark modes for\nthem like you can change pretty much the\n\n0:25:23.390,0:25:26.810\ncolor for every single website. And what\nis actually pretty neat is that there's\n\n0:25:26.810,0:25:31.461\nlike a repository online of people that\nhave contributed this is stylesheets\n\n0:25:31.461,0:25:35.031\nfor the websites. So someone probably\nhas (done) one for GitHub\n\n0:25:35.031,0:25:38.780\nLike I want dark GitHub and someone has\nalready contributed one that makes\n\n0:25:38.780,0:25:44.631\nthat much more pleasing to browse. Apart\nfrom that, one that it's not really\n\n0:25:44.631,0:25:49.491\nfancy, but I have found incredibly helpful\nis one that just takes a screenshot an\n\n0:25:49.491,0:25:53.121\nentire website. 
And It will\nscroll for you and make\n\n0:25:53.121,0:25:57.711\ncompound image of the entire website and that's\nreally great for when you're trying to\n\n0:25:57.711,0:26:00.111\nprint a website and is just terrible.\n\n0:26:00.111,0:26:00.611\n(It's built into Firefox)\n\n0:26:00.611,0:26:02.671\noh interesting\n\n0:26:02.671,0:26:05.751\noh now that you mention builtin to Firefox,\nanother one that I really like about\n\n0:26:05.751,0:26:09.071\nFirefox is the multi account containers\n\n0:26:09.071,0:26:10.831\n(Oh yeah, it's fantastic)\n\n0:26:10.831,0:26:12.291\nWhich kind of lets you\n\n0:26:12.291,0:26:16.670\nBy default a lot of web browsers, like\nfor example Chrome, have this\n\n0:26:16.670,0:26:20.601\nnotion of like there's session that you\nhave, where you have all your cookies\n\n0:26:20.601,0:26:24.560\nand they are kind of all shared from the\ndifferent websites in the sense of\n\n0:26:24.560,0:26:30.811\nyou keep opening new tabs and unless you go into\nincognito you kind of have the same profile\n\n0:26:30.811,0:26:34.190\nAnd that profile is the same for\nall websites, there is this\n\n0:26:34.191,0:26:35.851\nIs it an extension or is it built in?\n\n0:26:35.851,0:26:40.571\n(it's a mix, it's complicated)\n\n0:26:41.091,0:26:46.211\nSo I think you actually have to say you want\nto install it or enable it and again\n\n0:26:46.221,0:26:49.881\nthe name is Multi Account Containers and\nthese let you tell Firefox to have\n\n0:26:49.881,0:26:53.961\nseparate isolated sessions. 
So\nfor example, you want to say\n\n0:26:53.961,0:26:58.851\nI have a separate sessions for whenever I\nvisit to Google or whenever I visit Amazon\n\n0:26:58.851,0:27:01.791\nand that can be pretty neat, because then you can\n\n0:27:01.791,0:27:08.171\nAt a browser level it's ensuring that no information\nsharing is happening between the two of them\n\n0:27:08.171,0:27:11.961\nAnd it's much more convenient than\nhaving to open a incognito window\n\n0:27:11.961,0:27:14.471\nwhere it's gonna clean all the time the stuff\n\n0:27:14.471,0:27:17.311\n(One thing to mention is Stylus vs Stylish)\n\n0:27:17.531,0:27:19.651\nOh yeah, I forgot about that\n\n0:27:19.651,0:27:24.931\nOne important thing is the browser extension\nfor side loading CSS Stylesheets\n\n0:27:24.931,0:27:31.851\nit's called a Stylus and that's different\nfrom the older one that was\n\n0:27:31.851,0:27:37.400\ncalled Stylish, because that one got\nbought at some point by some shady\n\n0:27:37.400,0:27:40.711\ncompany, that started abusing it not only to have\n\n0:27:40.711,0:27:45.780\nthat functionality, but also to read your\nentire browser history and send that\n\n0:27:45.780,0:27:48.491\nback to their servers so they could data mine it.\n\n0:27:48.491,0:27:53.731\nSo, then people just built this open-source alternative\nthat is called Stylus, and that's the one\n\n0:27:53.731,0:27:58.951\nwe recommend. 
That said, I think the repository\nfor styles is the same for the\n\n0:27:58.951,0:28:03.611\ntwo of them, but I would have\nto double check that.\n\n0:28:03.611,0:28:05.951\nDo you have any browser plugins Anish?\n\n0:28:06.071,0:28:09.311\nYes, so I also have some recommendations\nfor browser plugins\n\n0:28:09.311,0:28:13.991\nI also use uBlock Origin and I also use Stylus,\n\n0:28:13.991,0:28:18.511\nbut one other one that I'd recommend is\nintegration with a password manager\n\n0:28:18.511,0:28:21.631\nSo this is a topic that we have in\nthe lecture notes for the security\n\n0:28:21.631,0:28:24.841\nlecture, but we didn't really get to talk\nabout in detail. But basically password\n\n0:28:24.841,0:28:27.810\nmanagers do a really good job of increasing\nyour security when working\n\n0:28:27.810,0:28:31.831\nwith online accounts, and having browser\nintegration with your password manager\n\n0:28:31.831,0:28:34.410\ncan save you a lot of time like you\ncan open up a website then it can\n\n0:28:34.410,0:28:37.381\nautofill your login information for you\nso you don't go and copy and paste it\n\n0:28:37.381,0:28:40.320\nback and forth between a separate program\nif it's not integrated with your\n\n0:28:40.320,0:28:43.410\nweb browser, and it can also, this integration,\ncan save you from certain\n\n0:28:43.410,0:28:47.651\nattacks that would otherwise be possible if\nyou were doing this manual copy pasting.\n\n0:28:47.651,0:28:50.790\nFor example, phishing attacks. So\nyou find a website that looks very\n\n0:28:50.790,0:28:54.211\nsimilar to Facebook and you go to log in\nwith your facebook login credentials and\n\n0:28:54.211,0:28:56.851\nyou go to your password manager and copy\npaste the correct credentials into this\n\n0:28:56.851,0:29:00.060\nfunny web site and now all of a sudden\nit has your password but if you have\n\n0:29:00.060,0:29:03.091\nbrowser integration then the extension\ncan automatically check\n\n0:29:03.091,0:29:06.951\nlike. 
Am I on F A C E B O O K.com, or\nis it some other domain\n\n0:29:06.951,0:29:10.671\nthat maybe looks similar and it will not enter\nthe login information if it's the wrong domain\n\n0:29:10.671,0:29:15.791\nso browser extension for\npassword managing is good\n\n0:29:15.791,0:29:17.930\nYeah I agree\n\n0:29:19.491,0:29:20.711\nNext question\n\n0:29:20.711,0:29:23.991\nWhat are other useful data wrangling tools?\n\n0:29:23.991,0:29:32.421\nSo in yesterday's lecture, I mentioned curl, so\ncurl is a fantastic tool for just making web\n\n0:29:32.421,0:29:35.811\nrequests and dumping them to your terminal.\nYou can also use it for things\n\n0:29:35.811,0:29:41.191\nlike uploading files which is really handy.\n\n0:29:41.191,0:29:48.431\nIn the exercises of that lecture we also talked about\njq and pup which are command line tools that let you\n\n0:29:48.431,0:29:52.991\nbasically write queries over JSON\nand HTML documents respectively\n\n0:29:52.991,0:30:00.391\nthat can be really handy. Other\ndata wrangling tools?\n\n0:30:00.391,0:30:03.821\nAh Perl, the Perl programming language is\n\n0:30:03.821,0:30:08.061\noften referred to as a write only\nprogramming language because it's\n\n0:30:08.061,0:30:13.431\nimpossible to read even if you wrote it.\nBut it is fantastic at doing just like\n\n0:30:13.431,0:30:21.561\nstraight up text processing, like nothing\nbeats it there, so maybe worth learning\n\n0:30:21.561,0:30:24.331\nsome very rudimentary Perl just\nto write some of those scripts\n\n0:30:24.331,0:30:29.371\nIt's easier often than writing some like hacked-up\ncombination of grep and awk and sed,\n\n0:30:29.371,0:30:36.311\nand it will be much faster to just hack something\nup than writing it up in Python, for example\n\n0:30:36.311,0:30:44.031\nbut apart from that, other data wrangling\n\n0:30:44.031,0:30:47.071\nNo, not off the top of my head really\n\n0:30:47.071,0:30:53.661\ncolumn -t, if you pipe any white space separated\n\n0:30:53.661,0:30:58.821\ninput into 
column -t it will align all\nthe white space of the columns so that\n\n0:30:58.821,0:31:05.771\nyou get nicely aligned columns, and\nhead and tail but we talked about those\n\n0:31:09.011,0:31:13.791\nI think a couple of additions to that,\nthat I find myself using commonly\n\n0:31:13.791,0:31:19.881\none is vim. Vim can be pretty useful\nfor like data wrangling by itself\n\n0:31:19.881,0:31:22.461\nSometimes you might find that the operation\nthat you're trying to do is\n\n0:31:22.461,0:31:27.711\nhard to put down in terms of piping\ndifferent operators but if you\n\n0:31:27.711,0:31:32.531\ncan just open the file and just record\n\n0:31:32.531,0:31:37.301\na couple of quick vim macros to do what you\nwant it to do, it might be like much,\n\n0:31:37.301,0:31:42.311\nmuch easier. That's one, and then the other\none, if you're dealing with tabular\n\n0:31:42.311,0:31:46.091\ndata and you want to do more complex operations\nlike sorting by one column,\n\n0:31:46.091,0:31:51.161\nthen grouping and then computing some sort\nof statistic, I think a lot of that\n\n0:31:51.161,0:31:55.951\nworkload I ended up just using Python\nand pandas because it's built for that\n\n0:31:55.951,0:32:00.190\nAnd one of the pretty neat features that\nI find myself also using is that it\n\n0:32:00.190,0:32:03.931\nwill export to many different formats.\nSo this intermediate state\n\n0:32:03.931,0:32:09.221\nhas its own kind of pandas dataframe\nobject but it can\n\n0:32:09.221,0:32:14.171\nexport to HTML, LaTeX, a lot of different\nlike table formats so if your end\n\n0:32:14.171,0:32:19.531\nproduct is some sort of summary table, then pandas\nI think it's a fantastic choice for that\n\n0:32:21.111,0:32:24.791\nI would second the vim and also\nPython I think those are\n\n0:32:24.791,0:32:29.051\ntwo of my most used data wrangling tools.\nFor the vim one, last year we had a demo\n\n0:32:29.051,0:32:31.841\nin the series in the lecture notes, but\nwe didn't cover it in class we had 
a\n\n0:32:31.841,0:32:38.051\ndemo of turning an XML file into a JSON version\nof that same data using only vim macros\n\n0:32:38.051,0:32:40.331\nAnd I think that's actually the\nway I would do it in practice\n\n0:32:40.331,0:32:43.241\nI don't want to go find a tool that does\nthis conversion it is actually simple\n\n0:32:43.241,0:32:45.431\nto encode as a vim macro,\nthen I just do it that way\n\n0:32:45.431,0:32:48.991\nAnd then also Python especially in an interactive\ntool like a Jupyter notebook\n\n0:32:48.991,0:32:51.171\nis a really great way of doing data wrangling\n\n0:32:51.171,0:32:52.951\nA third tool I'd mention which\nI don't remember if we\n\n0:32:52.961,0:32:55.361\ncovered in the data wrangling\nlecture or elsewhere\n\n0:32:55.361,0:32:58.751\nis a tool called pandoc which can do transformations\nbetween different text\n\n0:32:58.751,0:33:02.981\ndocument formats so you can convert from\nplaintext to HTML or HTML to markdown\n\n0:33:02.981,0:33:07.361\nor LaTeX to HTML or many other formats\nit actually it supports a large\n\n0:33:07.361,0:33:10.471\nlist of input formats and a\nlarge list of output formats\n\n0:33:10.471,0:33:16.361\nI think there's one last one which I mentioned briefly\nin the lecture on data wrangling which is\n\n0:33:16.361,0:33:20.441\nthe R programming language, it's\nan awful (I think it's an awful)\n\n0:33:20.441,0:33:25.120\nlanguage to program in. 
And I would never\nuse it in the middle of a data wrangling\n\n0:33:25.120,0:33:30.951\npipeline, but at the end, in order to like produce\npretty plots and statistics R is great\n\n0:33:30.951,0:33:35.581\nBecause R is built for doing\nstatistics and plotting\n\n0:33:35.581,0:33:40.591\nthere's a library for R called\nggplot which is just amazing\n\n0:33:40.591,0:33:46.551\nggplot2 I guess, technically. It's\ngreat, it produces very\n\n0:33:46.551,0:33:51.431\nnice visualizations and it lets you do,\nit does very easily do things like\n\n0:33:51.431,0:33:57.561\nIf you have a data set that has like multiple\nfacets like it's not just X and Y\n\n0:33:57.561,0:34:03.111\nit's like X Y Z and some other variable,\nand then you want to plot like the\n\n0:34:03.111,0:34:07.581\nthroughput grouped by all of those parameters\nat the same time and produce\n\n0:34:07.581,0:34:11.991\na visualization. R very easily lets you\ndo this and I haven't seen anywhere\n\n0:34:11.991,0:34:14.891\nthat lets you do that as easily\n\n0:34:16.971,0:34:17.951\nNext question,\n\n0:34:17.951,0:34:20.511\nWhat's the difference between\nDocker and a virtual machine?\n\n0:34:23.271,0:34:27.731\nWhat's the easiest way to explain this? So docker\n\n0:34:27.741,0:34:31.221\nstarts something called containers and\ndocker is not the only program that\n\n0:34:31.221,0:34:36.561\nstarts containers. 
There are many others\nand usually they rely on some feature of\n\n0:34:36.561,0:34:40.401\nthe underlying kernel in the case of\ndocker they use something called LXC\n\n0:34:40.401,0:34:47.571\nwhich are Linux containers and the basic\npremise there is if you want to start\n\n0:34:47.571,0:34:53.181\nwhat looks like a virtual machine that\nis running roughly the same operating\n\n0:34:53.181,0:34:57.411\nsystem as you are already running on your\ncomputer then you don't really need\n\n0:34:57.411,0:35:04.701\nto run another instance of the kernel\nreally that other virtual machine can\n\n0:35:04.701,0:35:09.951\nshare a kernel. And you can just use the\nkernels built in isolation mechanisms to\n\n0:35:09.951,0:35:13.791\nspin up a program that thinks it's\nrunning on its own hardware but in\n\n0:35:13.791,0:35:18.501\nreality it's sharing the kernel and so this\nmeans that containers can often run\n\n0:35:18.501,0:35:22.611\nwith much lower overhead than a full virtual\nmachine will do but you should\n\n0:35:22.611,0:35:26.391\nkeep in mind that it also has somewhat weaker\nisolation because you are sharing\n\n0:35:26.391,0:35:30.831\na kernel between the two if you spin up\na virtual machine the only thing that's\n\n0:35:30.831,0:35:35.931\nshared is sort of the hardware and to\nsome extent the hypervisor, whereas\n\n0:35:35.931,0:35:40.791\nwith a docker container you're sharing\nthe full kernel and the that is a\n\n0:35:40.791,0:35:44.921\ndifferent threat model that you\nmight have to keep in mind\n\n0:35:47.341,0:35:52.361\nOne another small note there as Jon pointed\nout, to use containers something\n\n0:35:52.361,0:35:55.631\nlike Docker you need the underlying operating\nsystem to be roughly the same\n\n0:35:55.631,0:36:00.071\nas whatever the program that's running\non top of the container expects and so\n\n0:36:00.071,0:36:03.791\nif you're using macOS for example, the\nway you use docker is you run Linux\n\n0:36:03.791,0:36:08.261\ninside a virtual 
machine and then you can\nrun Docker on top of Linux so maybe\n\n0:36:08.261,0:36:11.741\nif you're going for containers in order\nto get better performance you're trading\n\n0:36:11.741,0:36:15.131\nisolation for performance if you're running\non Mac OS that may not work out\n\n0:36:15.131,0:36:17.451\nexactly as expected\n\n0:36:17.451,0:36:21.221\nAnd one last note is that there\nis a slight difference, so\n\n0:36:21.221,0:36:25.721\nwith Docker and containers,\none of the gotchas you have\n\n0:36:25.721,0:36:29.411\nto be familiar with is that containers\nare more similar to virtual\n\n0:36:29.411,0:36:33.071\nmachines in the sense that they will\npersist all the storage that you\n\n0:36:33.071,0:36:35.971\nhave where Docker by default won't have that.\n\n0:36:35.971,0:36:37.791\nLike Docker is supposed to be running\n\n0:36:37.791,0:36:41.771\nSo the main idea is like I want\nto run some software and\n\n0:36:41.771,0:36:45.671\nI get the image and it runs and if you\nwant to have any kind of persistent\n\n0:36:45.671,0:36:50.081\nstorage that links to the host system\nyou have to kind of manually specify\n\n0:36:50.081,0:36:56.051\nthat, whereas a virtual machine is using\nsome virtual disk that is being provided\n\n0:36:56.051,0:37:02.671\nNext question\n\n0:37:02.671,0:37:05.111\nWhat are the advantages of each operating system\n\n0:37:05.111,0:37:08.531\nand how can we choose between them?\nFor example, choosing the best Linux\n\n0:37:08.531,0:37:10.551\ndistribution for our purposes\n\n0:37:14.251,0:37:16.811\nI will say that for many, many tasks the\n\n0:37:16.811,0:37:20.171\nspecific Linux distribution that you're\nrunning is not that important\n\n0:37:20.171,0:37:23.731\nthe thing is, it's just what kind of\n\n0:37:23.731,0:37:27.651\nknowing that there are different types\nor like groups of distributions,\n\n0:37:27.651,0:37:32.251\nSo for example, there are some distributions\nthat have really frequent updates\n\n0:37:32.251,0:37:38.971\nbut they 
kind of break more easily. So for\nexample Arch Linux has a rolling update\n\n0:37:38.971,0:37:43.511\nway of pushing updates, where things might\nbreak but they're fine with the things\n\n0:37:43.511,0:37:47.891\nbeing that way. Where maybe where you\nhave some really important web server\n\n0:37:47.891,0:37:51.401\nthat is hosting all your business\nanalytics you want that thing\n\n0:37:51.401,0:37:55.961\nto have like a much more steady way of\nupdates. So that's for example why you\n\n0:37:55.961,0:37:58.121\nwill see distributions like Debian being\n\n0:37:58.121,0:38:02.951\nmuch more conservative about what they push, or\neven for example Ubuntu makes a difference\n\n0:38:02.951,0:38:07.001\nbetween the Long Term Releases\nthat they are only update every\n\n0:38:07.001,0:38:12.281\ntwo years and the more periodic\nreleases of one there is a\n\n0:38:12.281,0:38:16.661\nit's like two a year that they make.\nSo, kind of knowing that there's the\n\n0:38:16.661,0:38:21.341\ndifference apart from that some distributions\nhave different ways\n\n0:38:21.341,0:38:27.191\nof providing the binaries\nto you and the way they\n\n0:38:27.191,0:38:33.791\nhave the repositories so I think a lot of Red\nHat Linux don't want non free drivers in\n\n0:38:33.791,0:38:37.361\ntheir official repositories where I\nthink Ubuntu is fine with some of\n\n0:38:37.361,0:38:42.491\nthem, apart from that I think like just\na lot of what is core to most Linux\n\n0:38:42.491,0:38:47.411\ndistros is kind of shared between them\nand there's a lot of learning in the\n\n0:38:47.411,0:38:51.431\ncommon ground. So you don't have\nto worry about the specifics\n\n0:38:52.391,0:38:56.351\nKeeping with the theme of this class being somewhat\nopinionated, I'm gonna go ahead and say\n\n0:38:56.351,0:39:00.041\nthat if you're using Linux especially for\nthe first time choose something like\n\n0:39:00.041,0:39:03.851\nUbuntu or Debian. 
So Ubuntu is a\nDebian based distribution but maybe is a\n\n0:39:03.851,0:39:07.421\nlittle bit more friendly, Debian is a little\nbit more minimalist. I use Debian\n\n0:39:07.421,0:39:10.451\non all my servers, for example. And I use\nDebian desktop on my desktop computers\n\n0:39:10.451,0:39:15.431\nthat run Linux if you're going for maybe\ntrying to learn more things and you want\n\n0:39:15.431,0:39:19.391\na distribution that trades stability for\nhaving more up-to-date software maybe\n\n0:39:19.391,0:39:21.911\nat the expense of you having to fix a\nbroken distribution every once in a\n\n0:39:21.911,0:39:26.911\nwhile then maybe you can consider something\nlike Arch Linux or Gentoo\n\n0:39:26.911,0:39:32.681\nor Slackware. Oh man, I'd say that like\nif you're installing Linux and just like\n\n0:39:32.681,0:39:34.891\nwant to get work done Debian is a great choice\n\n0:39:35.911,0:39:38.271\nYeah I think I agree with that.\n\n0:39:38.271,0:39:40.971\nThe other observation is like\nyou could install BSD\n\n0:39:40.971,0:39:46.691\nBSD has gotten, has come a long way from\nwhere it was. There's still a bunch of\n\n0:39:46.691,0:39:50.921\nsoftware you can't really get for BSD but\nit gives you a very well-documented\n\n0:39:50.921,0:39:55.841\nexperience and one thing that's different\nabout BSD compared to Linux is\n\n0:39:55.841,0:40:02.531\nthat in BSD when you install BSD you\nget a full operating system, mostly\n\n0:40:02.651,0:40:07.531\nSo many of the programs are maintained by\nthe same team that maintains the kernel\n\n0:40:07.541,0:40:11.351\nand everything is sort of upgraded together,\nwhich is a little different\n\n0:40:11.351,0:40:13.271\nthan how things work in the Linux world it does\n\n0:40:13.271,0:40:16.751\nmean that things often move a little bit\nslower. I would not use it for things\n\n0:40:16.751,0:40:21.791\nlike gaming either, because driver support\nis meh. 
But it is an interesting\n\n0:40:21.791,0:40:30.661\nenvironment to look at. And then for things\nlike Mac OS and Windows I think\n\n0:40:30.661,0:40:36.041\nIf you are a programmer, I don't know why\nyou are using Windows unless you are\n\n0:40:36.041,0:40:42.401\nbuilding things for Windows; or you want\nto be able to do gaming and stuff\n\n0:40:42.401,0:40:46.891\nbut in that case, maybe try dual booting,\neven though that's a pain too\n\n0:40:46.891,0:40:52.031\nMac OS is a is a good sort of middle point\nbetween the two where you get a system\n\n0:40:52.031,0:40:57.851\nthat is like relatively nicely polished\nfor you. But you still have access to\n\n0:40:57.851,0:41:01.191\nsome of the lower-level bits\nat least to a certain extent.\n\n0:41:01.191,0:41:07.451\nit's also really easy to dual boot Mac OS and Windows\nit is not quite the case with like Mac OS and\n\n0:41:07.451,0:41:09.651\nLinux or Linux and Windows\n\n0:41:13.911,0:41:15.751\nAlright, for the rest of the\nquestions so these are\n\n0:41:15.761,0:41:18.761\nall 0 upvote questions so maybe we can go\nthrough them quickly in the last five\n\n0:41:18.761,0:41:23.471\nor so minutes of class. So the next\none is Vim versus Emacs? Vim!\n\n0:41:23.471,0:41:30.911\nEasy answer, but a more serious answer is like I think\nall three of us use vim as our primary editor\n\n0:41:30.911,0:41:34.931\nI use Emacs for some research specific\nstuff which requires Emacs but\n\n0:41:34.931,0:41:38.681\nat a higher level both editors have interesting\nideas behind them and if you\n\n0:41:38.681,0:41:43.061\nhave the time is worth exploring both\nto see which fits you better and also\n\n0:41:43.061,0:41:46.811\nyou can use Emacs and run it in a vim\nemulation mode. 
I actually know a\n\n0:41:46.811,0:41:49.091\ngood number of people who do that so\nthey get access to some of the cool\n\n0:41:49.091,0:41:52.631\nEmacs functionality and some of the cool\nphilosophy behind that like Emacs is\n\n0:41:52.631,0:41:55.391\nprogrammable through Lisp which is kind of cool.\n\n0:41:55.391,0:41:59.411\nMuch better than vimscript, but people like\nvim's modal editing, so there's an\n\n0:41:59.411,0:42:04.481\nemacs plugin called evil mode which gives\nyou vim modal editing within Emacs so\n\n0:42:04.481,0:42:08.081\nit's not necessarily a binary choice you\ncan kind of combine both tools if you\n\n0:42:08.081,0:42:11.151\nwant to. And it's worth exploring\nboth if you have the time.\n\n0:42:11.151,0:42:12.731\nNext question\n\n0:42:12.731,0:42:15.671\nAny tips or tricks for machine\nlearning applications?\n\n0:42:19.271,0:42:22.351\nI think, like knowing how\n\n0:42:22.361,0:42:24.791\na lot of these tools, mainly the data wrangling\n\n0:42:24.791,0:42:30.041\na lot of the shell tools, it's really\nimportant because it seems a lot\n\n0:42:30.041,0:42:33.851\nof what you're doing as machine learning\nresearcher is trying different things\n\n0:42:33.851,0:42:39.491\nbut I think one core aspect of doing that,\nand like a lot of scientific work is being\n\n0:42:39.491,0:42:44.501\nable to have reproducible results\nand logging them in a sensible way\n\n0:42:44.501,0:42:47.711\nSo for example, instead of trying to come\nup with really hacky solutions of how\n\n0:42:47.711,0:42:51.151\nyou name your folders to make\nsense of the experiments\n\n0:42:51.151,0:42:53.251\nMaybe it's just worth having for example\n\n0:42:53.251,0:42:55.931\nwhat I do is have like a JSON\nfile that describes the\n\n0:42:55.931,0:43:00.371\nentire experiment I know like all the parameters\nthat are within and then I can\n\n0:43:00.371,0:43:05.111\nreally quickly, using the tools that\nwe have covered, query for all the\n\n0:43:05.111,0:43:09.701\nexperiments that have 
some specific\npurpose or use some data set\n\n0:43:09.701,0:43:15.071\nThings like that. Apart from that, the other\nside of this is, if you are running\n\n0:43:15.071,0:43:19.871\nkind of things for training machine\nlearning applications and you\n\n0:43:19.871,0:43:23.981\nare not already using some sort of\ncluster, like university or your\n\n0:43:23.981,0:43:28.301\ncompany is providing and you're just kind\nof manually sshing, like a lot of\n\n0:43:28.301,0:43:31.231\nlabs do, because that's kind of the easy way\n\n0:43:31.231,0:43:36.671\nIt's worth automating a lot of that job\nbecause it might not seem like it but\n\n0:43:36.671,0:43:40.601\nmanually doing a lot of these operations\ntakes away a lot of your time and also\n\n0:43:40.601,0:43:45.031\nkind of your mental energy\nfor running these things\n\n0:43:48.551,0:43:51.691\nAnymore vim tips?\n\n0:43:51.691,0:43:56.771\nI have one. So in the vim lecture we tried\nnot to link you to too many different\n\n0:43:56.771,0:44:00.131\nvim plugins because we didn't want that\nlecture to be overwhelming but I think\n\n0:44:00.131,0:44:02.921\nit's actually worth exploring vim plugins\nbecause there are lots and lots\n\n0:44:02.921,0:44:07.091\nof really cool ones out there.\nOne resource you can use is the\n\n0:44:07.091,0:44:10.571\ndifferent instructors dotfiles like a lot\nof us, I think I use like two dozen\n\n0:44:10.571,0:44:14.321\nvim plugins and I find a lot of them quite\nhelpful and I use them every day\n\n0:44:14.321,0:44:18.311\nwe all use slightly different subsets of\nthem. So go look at what we use or look\n\n0:44:18.311,0:44:22.131\nat some of the other resources we've linked\nto and you might find some stuff useful\n\n0:44:22.791,0:44:26.951\nA thing to add to that is, I don't think\nwe went into a lot detail in the\n\n0:44:27.041,0:44:31.571\nlecture, correct me if I'm wrong. 
It's\ngetting familiar with the leader key\n\n0:44:31.571,0:44:35.021\nWhich is kind of a special key\nthat a lot of programs will\n\n0:44:35.021,0:44:39.081\nespecially plugins, that will link to\nand for a lot of the common operations\n\n0:44:39.081,0:44:44.661\nvim has short ways of doing it, but you\ncan just figure out like quicker\n\n0:44:44.661,0:44:50.031\nversions for doing them. So for example, like\nI know that you can do like semicolon WQ\n\n0:44:50.031,0:44:55.521\nto save and exit or that you\ncan do like capital ZZ but I\n\n0:44:55.521,0:44:59.241\njust actually just do leader (which for\nme is the space) and then W. And I have\n\n0:44:59.241,0:45:04.131\ndone that for a lot of a lot of kind of\ncommon operations that I keep doing all\n\n0:45:04.131,0:45:08.091\nthe time. Because just saving one keystroke\nfor an extremely common operation\n\n0:45:08.091,0:45:11.371\nis just saving thousands a month\n\n0:45:11.371,0:45:12.951\nYeah just to expand a little bit\n\n0:45:12.951,0:45:17.031\non what the leader key is so in vim you\ncan bind some keys I can do like ctrl J\n\n0:45:17.031,0:45:20.481\ndoes something like holding one key and\nthen pressing another I can bind that to\n\n0:45:20.481,0:45:23.781\nsomething or I can bind a single keystroke\nto something. 
What the leader\n\n0:45:23.781,0:45:26.031\nkey lets you do, is bind\n\n0:45:26.031,0:45:28.311\nSo you can assign any key\nto be the leader key and\n\n0:45:28.311,0:45:32.841\nthen you can assign leader followed by\nsome other key to some action so for\n\n0:45:32.841,0:45:36.831\nexample like Jose's leader key is space\nand they can combine space and then\n\n0:45:36.831,0:45:41.601\nreleasing space followed by some other\nkey to an arbitrary vim command so it\n\n0:45:41.601,0:45:45.631\njust gives you yet another way of binding\nlike a whole set of key combinations.\n\n0:45:45.631,0:45:49.751\nLeader key plus kind of any key on\nthe keyboard to some functionality\n\n0:45:49.751,0:45:53.751\nI think I've I forget whether\nwe covered macros in the vim\n\n0:45:53.751,0:45:58.581\nuh sure but like vim macros are worth\nlearning they're not that complicated\n\n0:45:58.581,0:46:03.141\nbut knowing that they're there and knowing\nhow to use them is going to save\n\n0:46:03.141,0:46:09.501\nyou so much time. The other one is something\ncalled marks. So in vim you can\n\n0:46:09.501,0:46:13.491\npress m and then any letter on your keyboard\nto make a mark in that file and\n\n0:46:13.491,0:46:18.021\nthen you can press apostrophe on the\nsame letter to jump back to the same\n\n0:46:18.021,0:46:21.801\nplace. This is really useful if you're\nlike moving back and forth\n\n0:46:21.801,0:46:25.491\nbetween two different parts of your code\nfor example. You can mark one as A and\n\n0:46:25.491,0:46:29.611\none as B and you can then jump between\nthem with tick A and tick B.\n\n0:46:29.611,0:46:34.851\nThere's also Ctrl+O which jumps to the previous\nplace you were in the file no matter\n\n0:46:34.851,0:46:40.611\nwhat caused you to move. So for example\nif I am in a some line and then I jump\n\n0:46:40.611,0:46:45.201\nto B and then I jump to A, Ctrl+O will\ntake me back to B and then back to the\n\n0:46:45.201,0:46:48.831\nplace I originally was. 
This can also be\nhandy for things like if you're doing a\n\n0:46:48.831,0:46:52.671\nsearch then the place that you\nstarted the search is a part of\n\n0:46:52.671,0:46:56.211\nthat stack. So I can do a search I can\nthen like step through the results\n\n0:46:56.211,0:47:00.801\nand like change them and then Ctrl+O\nall the way back up to the search\n\n0:47:00.801,0:47:06.201\nCtrl+O also lets you move across files so\nif I go from one file to somewhere else in\n\n0:47:06.201,0:47:09.681\ndifferent file and somewhere else in the\nfirst file Ctrl+O will move me back\n\n0:47:09.681,0:47:15.261\nthrough that stack and then there's\nCtrl+I to move forward in that\n\n0:47:15.261,0:47:20.841\nstack and so it's not as though you\npop it and it goes away forever\n\n0:47:20.841,0:47:26.541\nThe command colon earlier is really handy.\nSo, colon earlier gives you an earlier\n\n0:47:26.541,0:47:32.870\nversion of the same file and it it does\nthis based on time not based on actions\n\n0:47:32.870,0:47:36.651\nso for example if you press a bunch of like\nundo and redo and make some changes\n\n0:47:36.651,0:47:42.561\nand stuff, earlier will take a literally\nearlier as in time version of your file\n\n0:47:42.561,0:47:46.971\nand restore it to your buffer. This can\nsometimes be good if you like undid and\n\n0:47:46.971,0:47:50.841\nthen rewrote something and then realize\nyou actually wanted the version that was\n\n0:47:50.841,0:47:55.100\nthere before you started undoing earlier\nlet's you do this. 
And there's a plug-in\n\n0:47:55.100,0:48:01.971\ncalled undo tree or something like\nthat There are several of these,\n\n0:48:01.971,0:48:05.781\nthat let you actually explore the full\ntree of undo history the vim keeps\n\n0:48:05.781,0:48:09.201\nbecause it doesn't just keep a linear history\nit actually keeps the full tree\n\n0:48:09.201,0:48:12.771\nand letting you explore that might in\nsome cases save you from having to\n\n0:48:12.771,0:48:16.461\nre-type stuff you typed in the past or\nstuff you just forgot exactly what you\n\n0:48:16.461,0:48:21.081\nhad there that used to work and no longer\nworks. And this is one final one I\n\n0:48:21.081,0:48:26.751\nwant to mention which is, we mentioned\nhow in vim you have verbs and nouns\n\n0:48:26.751,0:48:33.201\nright to your verbs like delete or yank\nand then you have nouns like next of\n\n0:48:33.201,0:48:37.401\nthis character or percent to swap brackets\nand that sort of stuff the\n\n0:48:37.401,0:48:44.571\nsearch command is a noun so you can do\nthings like D slash and then a string\n\n0:48:44.571,0:48:50.261\nand it will delete up to the next match\nof that pattern this is extremely useful\n\n0:48:50.261,0:48:54.251\nand I use it all the time\n\n0:48:58.500,0:49:03.520\nOne another neat addition on the undo stuff\nthat I find incredibly valuable in\n\n0:49:03.520,0:49:08.201\nan everyday basis is that like one of\nthe built-in functionalities of vim\n\n0:49:08.201,0:49:13.510\nis that you can specify an undo directory\nand if you have a specified an\n\n0:49:13.510,0:49:17.620\nundo directory by default vim, if you\ndon't have this enabled, whenever you\n\n0:49:17.620,0:49:23.091\nenter a file your undo history is\nclean, there's nothing in there\n\n0:49:23.091,0:49:26.371\nand as you make changes and then\nundo them you kind of create this\n\n0:49:26.380,0:49:32.800\nhistory but as soon as you exit the\nfile that's lost. Sorry, as soon\n\n0:49:32.800,0:49:37.181\nas you exit vim, that's lost. 
However\nif you have an undodir, vim is\n\n0:49:37.181,0:49:41.651\ngonna persist all those changes into\nthis directory so no matter how many\n\n0:49:41.651,0:49:45.580\ntimes you enter and leave that history\nis persisted and it's incredibly\n\n0:49:45.580,0:49:48.191\nhelpful because even like\n\n0:49:48.191,0:49:50.290\nit can be very helpful for\nsome files that you modify\n\n0:49:50.290,0:49:54.760\noften because then you can kind of keep\nthe flow. But it's also sometimes really\n\n0:49:54.760,0:50:00.010\nhelpful if you modify your bashrc and\nsomething broke like five days later and\n\n0:50:00.010,0:50:03.070\nthen you open it in vim again. Like what actually\ndid I change? If you don't\n\n0:50:03.070,0:50:06.760\nhave say like version control, then\nyou can just check the undos and\n\n0:50:06.760,0:50:10.661\nthat's actually what happened. And\nthe last one, it's also really\n\n0:50:10.661,0:50:14.891\nworth familiarizing yourself with registers\nand what different special\n\n0:50:14.891,0:50:20.380\nregisters vim uses. So for example if\nyou want to copy/paste really that's\n\n0:50:20.380,0:50:26.201\ngoing into a specific register and if you\nwant to for example use the OS copy\n\n0:50:26.201,0:50:30.040\nlike the OS clipboard, you should\nbe copying or yanking\n\n0:50:30.040,0:50:36.250\ncopying and pasting from a different register\nand there's a lot of them and yeah\n\n0:50:36.251,0:50:41.310\nI think that you should explore, there's\na lot of things to know about registers\n\n0:50:42.271,0:50:45.070\nThe next question is asking about two-factor\nauthentication and I'll just give\n\n0:50:45.070,0:50:48.490\na very quick answer to this one in the interest\nof time. So it's worth using two\n\n0:50:48.490,0:50:52.480\nfactor auth for anything security sensitive\nso I use it for my GitHub\n\n0:50:52.480,0:50:56.710\naccount and for my email and stuff like\nthat. And there's a bunch of different\n\n0:50:56.710,0:51:01.360\ntypes of two-factor auth. 
From SMS based\ntwo-factor auth where you get special\n\n0:51:01.360,0:51:04.630\nlike a number texted to you when you try\nto log in you have to type that number\n\n0:51:04.630,0:51:08.710\nand to other tools like universal two\nfactor this is like those Yubikeys\n\n0:51:08.710,0:51:11.350\nthat you plug into your you have\nto tap it every time you log in\n\n0:51:11.350,0:51:18.130\nso not all, (yeah Jon is holding a\nYubikey), not all two-factor auth is\n\n0:51:18.130,0:51:22.240\ncreated equal and you really want to be\nusing something like U2F rather than SMS\n\n0:51:22.240,0:51:25.300\nbased two-factor auth. There's also something\nbased on one-time pass codes that you\n\n0:51:25.300,0:51:28.810\nhave to type in we don't have time to get\ninto the details of why some methods\n\n0:51:28.810,0:51:32.020\nare better than others but at a high\nlevel use U2F and the Internet has\n\n0:51:32.020,0:51:37.560\nplenty of explanations for why other\nmethods are not a great idea\n\n0:51:37.711,0:51:41.851\nLast question, any comments on differences\nbetween web browsers?\n\n0:51:48.171,0:51:50.171\nYes\n\n0:51:54.711,0:52:00.451\nDifferences between web browsers, there\nare fewer and fewer differences between\n\n0:52:00.461,0:52:06.000\nweb browsers these days. At this point\nalmost all web browsers are Chrome\n\n0:52:06.000,0:52:09.580\nEither because you're using Chrome or\nbecause you're using a browser that's\n\n0:52:09.580,0:52:15.550\nusing the same browser engine as Chrome.\nIt's a little bit sad, one might say, but\n\n0:52:15.550,0:52:20.511\nI think these days whether you choose\n\n0:52:20.511,0:52:24.451\nChrome is a great browser for security reasons\n\n0:52:24.451,0:52:28.471\nif you want to have something\nthat's more customizable or\n\n0:52:28.471,0:52:39.490\nyou don't want to be tied to Google then\nuse Firefox, don't use Safari it's a\n\n0:52:39.490,0:52:45.701\nworse version of Chrome. 
The new Internet\nExplorer edge is pretty decent and also\n\n0:52:45.701,0:52:50.820\nuses the same browser engine as\nChrome and that's probably fine\n\n0:52:50.820,0:52:54.641\nalthough avoid it if you can because it\nhas some like legacy modes you don't\n\n0:52:54.641,0:52:58.064\nwant to deal with. I think that's\n\n0:52:58.064,0:53:03.091\nOh, there's a cool new browser called flow\n\n0:53:03.091,0:53:05.500\nthat you can't use for anything useful\nyet but they're actually writing\n\n0:53:05.500,0:53:08.693\ntheir own browser engine and that's really neat\n\n0:53:08.693,0:53:14.951\nFirefox also has this project called servo which is\nthey're really implementing their browser engine\n\n0:53:14.951,0:53:19.570\nin Rust in order to write it to be like\nsuper concurrent and what they've done\n\n0:53:19.570,0:53:24.961\nis they've started to take modules\nfrom that version and port them\n\n0:53:24.961,0:53:29.041\nover to gecko or integrate them with gecko\nwhich is the main browser engine\n\n0:53:29.041,0:53:32.221\nfor Firefox just to get those\nspeed ups there as well\n\n0:53:32.221,0:53:37.031\nand that's a neat neat thing\nyou can be watching out for\n\n0:53:39.231,0:53:41.851\nThat is all the questions, hey we did it. Nice\n\n0:53:41.851,0:53:50.751\nI guess thanks for taking the missing semester\nclass and let's do it again next year\n"
  },
  {
    "path": "static/files/subtitles/2020/shell-tools.sbv",
    "content": "0:00:00.400,0:00:02.860\nOkay, welcome back.\n\n0:00:02.860,0:00:05.920\nToday we're gonna cover a couple separate\n\n0:00:05.920,0:00:07.620\ntwo main topics related to the shell.\n\n0:00:07.620,0:00:11.240\nFirst, we're gonna do some kind of shell\nscripting, mainly related to bash,\n\n0:00:11.240,0:00:14.160\nwhich is the shell that most of you will start\n\n0:00:14.160,0:00:18.520\nin Mac, or like in most Linux systems,\nthat's the default shell.\n\n0:00:18.520,0:00:22.720\nAnd it's also kind of backward compatible through\nother shells like zsh, it's pretty nice.\n\n0:00:22.740,0:00:25.940\nAnd then we're gonna cover some other shell\ntools that are really convenient,\n\n0:00:26.060,0:00:29.320\nso you avoid doing really repetitive tasks,\n\n0:00:29.320,0:00:31.580\nlike looking for some piece of code\n\n0:00:31.580,0:00:33.420\nor for some elusive file.\n\n0:00:33.420,0:00:36.160\nAnd there are already really\nnice built-in commands\n\n0:00:36.160,0:00:40.960\nthat will really help you to do those things.\n\n0:00:40.960,0:00:43.260\nSo yesterday we already kind of introduced\n\n0:00:43.260,0:00:46.160\nyou to the shell and some of it's quirks,\n\n0:00:46.160,0:00:48.720\nand like how you start executing commands,\n\n0:00:48.720,0:00:50.600\nredirecting them.\n\n0:00:50.600,0:00:52.400\nToday, we're going to kind of cover more about\n\n0:00:52.460,0:00:56.120\nthe syntax of the variables, the control flow,\n\n0:00:56.120,0:00:57.720\nfunctions of the shell.\n\n0:00:57.720,0:01:02.700\nSo for example, once you drop\ninto a shell, say you want to\n\n0:01:02.760,0:01:06.360\ndefine a variable, which is\none of the first things you\n\n0:01:06.360,0:01:09.340\nlearn to do in a programming language.\n\n0:01:09.340,0:01:12.740\nHere you could do something like foo equals bar.\n\n0:01:12.860,0:01:18.400\nAnd now we can access the value\nof foo by doing \"$foo\".\n\n0:01:18.460,0:01:21.400\nAnd that's bar, perfect.\n\n0:01:21.400,0:01:24.480\nOne quirk 
that you need to be aware of is that\n\n0:01:24.480,0:01:27.900\nspaces are really critical when\nyou're dealing with bash.\n\n0:01:27.900,0:01:33.380\nMainly because spaces are reserved, and\nthat will be for separating arguments.\n\n0:01:33.380,0:01:36.700\nSo, for example, something like foo = bar\n\n0:01:36.700,0:01:42.000\nwon't work, and the shell is gonna\ntell you why it's not working.\n\n0:01:42.000,0:01:46.280\nIt's because the foo command is not\nworking, like foo is non-existent.\n\n0:01:46.280,0:01:47.780\nAnd here what is actually happening,\nwe're not assigning bar to foo,\n\n0:01:47.780,0:01:52.260\nwhat is happening is we're\ncalling the foo program\n\n0:01:52.260,0:01:57.520\nwith the first argument \"=\" and\nthe second argument \"bar\".\n\n0:01:57.520,0:02:03.880\nAnd in general, whenever you are having\nsome issues, like some files with spaces\n\n0:02:03.880,0:02:06.160\nyou will need to be careful about that.\n\n0:02:06.160,0:02:10.620\nYou need to be careful about quoting strings.\n\n0:02:10.640,0:02:16.480\nSo, going into that, how you do strings in bash.\nThere are two ways that you can define a string:\n\n0:02:16.540,0:02:24.720\nYou can define strings using double quotes\nand you can define strings using single,\n\n0:02:24.720,0:02:26.540\nsorry,\n\n0:02:26.540,0:02:28.880\nusing single quotes.\n\n0:02:29.140,0:02:32.760\nHowever, for literal strings they are equivalent,\n\n0:02:32.760,0:02:35.460\nbut for the rest they are not equivalent.\n\n0:02:35.460,0:02:42.980\nSo, for example, if we do value is $foo,\n\n0:02:43.440,0:02:48.480\nthe $foo has been expanded like\na string, substituted to the\n\n0:02:48.480,0:02:50.820\nvalue of the foo variable in the shell.\n\n0:02:50.960,0:02:58.940\nWhereas if we do this with a single quote,\nwe are just getting the $foo as it is\n\n0:02:58.940,0:03:02.280\nand single quotes won't be replacing. 
Again,\n\n0:03:02.280,0:03:07.290\nit's really easy to write a script, assume that\nthis is kind of like Python, that you might be\n\n0:03:07.290,0:03:10.860\nmore familiar with, and not realize all that.\n\n0:03:10.860,0:03:14.180\nAnd this is the way you will assign variables.\n\n0:03:14.180,0:03:17.849\nThen bash also has control flow\ntechniques that we'll see later,\n\n0:03:17.849,0:03:24.440\nlike for loops, while loops, and one main\nthing is you can define functions.\n\n0:03:24.440,0:03:27.820\nWe can access a function I have defined here.\n\n0:03:28.220,0:03:34.220\nHere we have the MCD function, that\nhas been defined, and the thing is\n\n0:03:34.220,0:03:38.400\nso far, we have just kind of seen how\nto execute several commands by piping\n\n0:03:38.400,0:03:40.720\ninto them, kind of saw that briefly yesterday.\n\n0:03:40.940,0:03:44.980\nBut a lot of times you want to do first\none thing and then another thing.\n\n0:03:44.980,0:03:47.580\nAnd that's kind of like the\n\n0:03:47.740,0:03:50.880\nsequential execution that we get here.\n\n0:03:50.880,0:03:54.260\nHere, for example, we're\ncalling the MCD function.\n\n0:03:56.860,0:03:57.800\nWe, first,\n\n0:03:57.800,0:04:02.960\nare calling the mkdir command,\nwhich is creating this directory.\n\n0:04:02.960,0:04:05.600\nHere, $1 is like a special variable.\n\n0:04:05.600,0:04:07.440\nThis is the way that bash works,\n\n0:04:07.440,0:04:12.160\nwhereas in other scripting languages\nthere will be like argv,\n\n0:04:12.160,0:04:16.620\nthe first item of the array argv\nwill contain the argument.\n\n0:04:16.620,0:04:19.160\nIn bash it's $1. 
And in general, a lot\n\n0:04:19.160,0:04:21.640\nof things in bash will be dollar something\n\n0:04:21.640,0:04:26.680\nand will be reserved, we will\nbe seeing more examples later.\n\n0:04:26.680,0:04:30.290\nAnd once we have created the folder,\nwe CD into that folder,\n\n0:04:30.290,0:04:34.687\nwhich is kind of a fairly common\npattern that you will see.\n\n0:04:34.687,0:04:39.060\nWe will actually type this directly\ninto our shell, and it will work and\n\n0:04:39.120,0:04:45.260\nit will define this function. But sometimes\nit's nicer to write things in a file.\n\n0:04:45.260,0:04:50.040\nWhat we can do is we can source\nthis. And that will\n\n0:04:50.080,0:04:53.960\nexecute this script in our shell and load it.\n\n0:04:53.960,0:04:59.340\nSo now it looks like nothing happened,\nbut now the MCD function has\n\n0:04:59.340,0:05:03.460\nbeen defined in our shell. So\nwe can now for example do\n\n0:05:03.463,0:05:09.150\nMCD test, and now we move from\nthe tools directory to the test\n\n0:05:09.160,0:05:14.200\ndirectory. We both created the\nfolder and we moved into it.\n\n0:05:15.760,0:05:18.820\nWhat else. So a result is...\n\n0:05:18.820,0:05:22.160\nWe can access the first argument with $1.\n\n0:05:22.160,0:05:26.100\nThere's a lot more reserved commands,\n\n0:05:26.100,0:05:30.020\nfor example $0 will be the name of the script,\n\n0:05:30.020,0:05:35.260\n$2 through $9 will be the second\nthrough the ninth arguments\n\n0:05:35.260,0:05:38.070\nthat the bash script takes.\nSome of these reserved\n\n0:05:38.070,0:05:43.080\nkeywords can be directly used\nin the shell, so for example\n\n0:05:43.420,0:05:50.300\n$? will get you the error code\nfrom the previous command,\n\n0:05:50.300,0:05:53.580\nwhich I'll also explain briefly.\n\n0:05:53.580,0:05:58.320\nBut for example, $_ will get\nyou the last argument of the\n\n0:05:58.320,0:06:03.460\nprevious command. 
So another way\nwe could have done this is\n\n0:06:03.460,0:06:07.380\nwe could have said like \"mkdir test\"\n\n0:06:07.380,0:06:12.020\nand instead of rewriting test, we\ncan access that last argument\n\n0:06:12.020,0:06:18.400\nas part of the (previous command), using $_\n\n0:06:18.400,0:06:23.160\nlike, that will be replaced with\ntest and now we go into test.\n\n0:06:25.040,0:06:27.480\nThere are a lot of them, you\nshould familiarize with them.\n\n0:06:27.480,0:06:32.900\nAnother one I often use is called \"bang\nbang\" (\"!!\"), you will run into this\n\n0:06:32.910,0:06:37.300\nwhenever you, for example, are trying\nto create something and you don't have\n\n0:06:37.320,0:06:41.000\nenough permissions. Then, you can do \"sudo !!\"\n\n0:06:41.010,0:06:43.400\nand then that will replace the command in\n\n0:06:43.470,0:06:46.400\nthere and now you can just try doing\n\n0:06:46.440,0:06:48.380\nthat. And now it will prompt you for a password,\n\n0:06:48.380,0:06:50.080\nbecause you have sudo permissions.\n\n0:06:53.800,0:06:57.180\nBefore, I mentioned the, kind\nof the error command.\n\n0:06:57.180,0:06:59.400\nYesterday we saw that, in general, there are\n\n0:06:59.400,0:07:02.400\ndifferent ways a process can communicate\n\n0:07:02.400,0:07:05.091\nwith other processes or commands.\n\n0:07:05.100,0:07:08.420\nWe mentioned the standard\ninput, which also was like\n\n0:07:09.160,0:07:11.380\ngetting stuff through the standard input,\n\n0:07:11.640,0:07:13.840\nputting stuff into the standard output.\n\n0:07:13.840,0:07:16.830\nThere are a couple more interesting\nthings, there's also like a\n\n0:07:16.830,0:07:19.837\nstandard error, a stream where you write errors\n\n0:07:19.837,0:07:23.900\nthat happen with your program and you don't\nwant to pollute the standard output.\n\n0:07:23.900,0:07:27.420\nThere's also the error code,\nwhich is like a general\n\n0:07:27.420,0:07:29.520\nthing in a lot of programming languages,\n\n0:07:29.520,0:07:34.460\nsome way of 
reporting how the\nentire run of something went.\n\n0:07:34.460,0:07:36.060\nSo if we do\n\n0:07:36.060,0:07:41.020\nsomething like echo hello and we\n\n0:07:41.580,0:07:43.920\nquery for the value, it's zero. And it's zero\n\n0:07:43.920,0:07:45.840\nbecause everything went okay and there\n\n0:07:45.840,0:07:49.170\nweren't any issues. And a zero exit code is\n\n0:07:49.170,0:07:50.940\nthe same as you will get in a language\n\n0:07:50.940,0:07:54.980\nlike C, like 0 means everything\nwent fine, there were no errors.\n\n0:07:54.980,0:07:57.600\nHowever, sometimes things won't work.\n\n0:07:57.600,0:08:04.600\nSometimes, like if we try to grep\nfor foobar in our MCD script,\n\n0:08:04.600,0:08:08.130\nand now we check for that\nvalue, it's 1. And that's\n\n0:08:08.130,0:08:10.770\nbecause we tried to search for the foobar\n\n0:08:10.770,0:08:13.620\nstring in the MCD script and it wasn't there.\n\n0:08:13.620,0:08:17.190\nSo grep doesn't print anything, but\n\n0:08:17.190,0:08:19.950\nlet us know that things didn't work by\n\n0:08:19.950,0:08:22.260\ngiving us a 1 error code.\n\n0:08:22.260,0:08:24.420\nThere are some interesting commands like\n\n0:08:24.420,0:08:29.160\n\"true\", for example, will always have a zero\n\n0:08:29.160,0:08:35.060\nerror code, and false will always\nhave a one error code.\n\n0:08:35.060,0:08:37.919\nThen there are like\n\n0:08:37.919,0:08:40.080\nthese logical operators that you can use\n\n0:08:40.080,0:08:43.808\nto do some sort of conditionals.\nFor example, one way...\n\n0:08:43.808,0:08:47.160\nyou also have IF's and ELSE's, that\nwe will see later, but you can do\n\n0:08:47.160,0:08:51.920\nsomething like \"false\", and echo \"Oops fail\".\n\n0:08:51.920,0:08:56.300\nSo here we have two commands connected\nby this OR operator.\n\n0:08:56.300,0:09:00.250\nWhat bash is gonna do here, it's\ngonna execute the first one\n\n0:09:00.250,0:09:04.450\nand if the first one didn't work, then it's\n\n0:09:04.450,0:09:07.380\ngonna execute the 
second one. So here we get it,\n\n0:09:07.380,0:09:12.000\nbecause it's gonna try to do a logical\nOR. If the first one didn't have\n\n0:09:12.000,0:09:15.960\na zero error code, it's gonna try to\ndo the second one. Similarly, if we\n\n0:09:15.960,0:09:19.580\ninstead of use \"false\", we\nuse something like \"true\",\n\n0:09:19.580,0:09:22.180\nsince true will have a zero error code, then the\n\n0:09:22.180,0:09:24.700\nsecond one will be short-circuited and\n\n0:09:24.700,0:09:27.500\nit won't be printed.\n\n0:09:32.560,0:09:36.970\nSimilarly, we have an AND\noperator which will only\n\n0:09:36.970,0:09:39.430\nexecute the second part if the first one\n\n0:09:39.430,0:09:41.440\nran without errors.\n\n0:09:41.440,0:09:44.820\nAnd the same thing will happen.\n\n0:09:44.820,0:09:50.340\nIf the first one fails, then the second\npart of this thing won't be executed.\n\n0:09:50.340,0:09:57.280\nKind of not exactly related to that, but\nanother thing that you will see is\n\n0:10:00.020,0:10:04.120\nthat no matter what you execute,\nthen you can concatenate\n\n0:10:04.120,0:10:07.120\ncommands using a semicolon in the same line,\n\n0:10:07.120,0:10:10.300\nand that will always print.\n\n0:10:10.300,0:10:13.630\nBeyond that, what we haven't\nseen, for example, is how\n\n0:10:13.630,0:10:19.460\nyou go about getting the output\nof a command into a variable.\n\n0:10:19.630,0:10:24.120\nAnd the way we can do that is\ndoing something like this.\n\n0:10:24.120,0:10:29.480\nWhat we're doing here is we're getting\nthe output of the PWD command,\n\n0:10:29.480,0:10:32.720\nwhich is just printing the\npresent working directory\n\n0:10:32.720,0:10:33.740\nwhere we are right now.\n\n0:10:33.740,0:10:37.220\nAnd then we're storing that\ninto the foo variable.\n\n0:10:37.220,0:10:42.279\nSo we do that and then we ask\nfor foo, we view our string.\n\n0:10:42.280,0:10:48.460\nMore generally, we can do this thing\ncalled command substitution\n\n0:10:50.110,0:10:51.500\nby putting it 
into any string.\n\n0:10:51.500,0:10:55.162\nAnd since we're using double quotes\ninstead of single quotes\n\n0:10:55.162,0:10:57.440\nthat thing will be expanded and\n\n0:10:57.440,0:11:02.740\nit will tell us that we are\nin this working folder.\n\n0:11:02.740,0:11:09.240\nAnother interesting thing is, right now,\nwhat this is expanding to is a string\n\n0:11:09.400,0:11:10.300\ninstead of\n\n0:11:11.920,0:11:13.320\nIt's just expanding as a string.\n\n0:11:13.460,0:11:17.640\nAnother nifty and lesser known tool\nis called process substitution,\n\n0:11:17.640,0:11:20.540\nwhich is kind of similar. What it will do...\n\n0:11:24.360,0:11:30.041\nit will, here for example, the \"<(\",\nsome command and another parenthesis,\n\n0:11:30.041,0:11:34.840\nwhat that will do is: that will execute,\nthat will get the output to\n\n0:11:34.840,0:11:39.120\nkind of like a temporary file and it will\ngive the file handle to the command.\n\n0:11:39.120,0:11:42.020\nSo here what we're doing is we're getting...\n\n0:11:42.020,0:11:45.760\nwe're LS'ing the directory, putting\nit into a temporary file,\n\n0:11:45.760,0:11:48.040\ndoing the same thing for the\nparent folder and then\n\n0:11:48.040,0:11:51.310\nwe're concatenating both files. And this\n\n0:11:51.310,0:11:55.520\nwill, may be really handy, because\nsome commands instead of expecting\n\n0:11:55.520,0:11:59.500\nthe input coming from the stdin,\nthey are expecting things to\n\n0:11:59.500,0:12:03.560\ncome from some file that is giving\nsome of the arguments.\n\n0:12:04.700,0:12:07.620\nSo we get both things concatenated.\n\n0:12:12.880,0:12:17.040\nI think so far there's been a lot of\ninformation, let's see a simple,\n\n0:12:17.040,0:12:22.920\nan example script where we\nsee a few of these things.\n\n0:12:23.200,0:12:27.220\nSo for example here we have a string and we\n\n0:12:27.220,0:12:30.327\nhave this $date. 
So $date is a program.\n\n0:12:30.327,0:12:34.540\nAgain there's a lot of programs\nin UNIX you will kind of slowly\n\n0:12:34.540,0:12:36.120\nfamiliarize with a lot of them.\n\n0:12:36.120,0:12:42.820\nDate just prints what the current date is\nand you can specify different formats.\n\n0:12:43.800,0:12:48.700\nThen, we have these $0 here. $0 is the name\n\n0:12:48.700,0:12:50.540\nof the script that we're running.\n\n0:12:50.550,0:12:56.590\nThen we have $#, that's the number\nof arguments that we are giving\n\n0:12:56.590,0:13:01.920\nto the command, and then $$ is the process\nID of this command that is running.\n\n0:13:01.920,0:13:06.160\nAgain, there's a lot of these dollar\nthings, they're not intuitive\n\n0:13:06.160,0:13:07.690\nbecause they don't have like a mnemonic\n\n0:13:07.690,0:13:10.450\nway of remembering, maybe, $#. But\n\n0:13:10.450,0:13:12.880\nit can be... you will just be\n\n0:13:12.880,0:13:14.660\nseeing them and getting familiar with them.\n\n0:13:14.660,0:13:19.200\nHere we have this $@, and that will\nexpand to all the arguments.\n\n0:13:19.200,0:13:21.480\nSo, instead of having to assume that,\n\n0:13:21.490,0:13:25.840\nmaybe say, we have three arguments\nand writing $1, $2, $3,\n\n0:13:25.840,0:13:29.760\nif we don't know how many arguments we\ncan put all those arguments there.\n\n0:13:29.760,0:13:33.670\nAnd that has been given to a\nfor loop. 
And the for loop\n\n0:13:33.670,0:13:39.020\nwill, in time, get the file variable\n\n0:13:39.020,0:13:43.880\nand it will be giving each one of the arguments.\n\n0:13:43.880,0:13:47.529\nSo what we're doing is, for every\none of the arguments we're giving.\n\n0:13:47.529,0:13:51.699\nThen, in the next line we're running the\n\n0:13:51.699,0:13:56.920\ngrep command which is just search for\na substring in some file and we're\n\n0:13:56.920,0:14:01.380\nsearching for the string foobar in the file.\n\n0:14:01.380,0:14:06.490\nHere, we have put the variable\nthat the file took, to expand.\n\n0:14:06.490,0:14:11.559\nAnd yesterday we saw that if we care\nabout the output of a program, we can\n\n0:14:11.560,0:14:15.680\nredirect it to somewhere, to save it\nor to connect it to some other file.\n\n0:14:15.680,0:14:18.939\nBut sometimes you want the opposite.\n\n0:14:18.939,0:14:21.260\nSometimes, here for example, we care...\n\n0:14:21.260,0:14:25.119\nwe're gonna care about the error code. About\nthis script, we're gonna care whether the\n\n0:14:25.120,0:14:28.440\ngrep ran successfully or it didn't.\n\n0:14:28.440,0:14:33.220\nSo we can actually discard\nentirely what the output...\n\n0:14:33.220,0:14:37.480\nlike both the standard output and the\nstandard error of the grep command.\n\n0:14:37.480,0:14:39.970\nAnd what we're doing is we're\n\n0:14:39.970,0:14:43.029\nredirecting the output to /dev/null which\n\n0:14:43.029,0:14:46.540\nis kind of like a special device in UNIX\n\n0:14:46.540,0:14:49.119\nsystems where you can like write and\n\n0:14:49.119,0:14:51.129\nit will be discarded. Like you can\n\n0:14:51.129,0:14:52.869\nwrite no matter how much you want,\n\n0:14:52.869,0:14:57.730\nthere, and it will be discarded.\nAnd here's the \">\" symbol\n\n0:14:57.730,0:15:02.199\nthat we saw yesterday for redirecting\noutput. 
Here you have a \"2>\"\n\n0:15:02.199,0:15:04.689\nand, as some of you might have\n\n0:15:04.689,0:15:06.519\nguessed by now, this is for redirecting the\n\n0:15:06.519,0:15:08.589\nstandard error, because those two\n\n0:15:08.589,0:15:11.709\nstreams are separate, and you kind of have to\n\n0:15:11.709,0:15:14.639\ntell bash what to do with each one of them.\n\n0:15:14.639,0:15:17.529\nSo here, we run, we check if the file has\n\n0:15:17.529,0:15:20.649\nfoobar, and if the file has foobar then it's\n\n0:15:20.649,0:15:22.959\ngoing to have a zero code. If it\n\n0:15:22.959,0:15:24.369\ndoesn't have foobar, it's gonna have a\n\n0:15:24.369,0:15:26.980\nnonzero error code. So that's exactly what we\n\n0:15:26.980,0:15:31.120\ncheck. In this if part of the command we\n\n0:15:31.120,0:15:34.840\nsay \"get me the error code\". Again, this $?\n\n0:15:34.840,0:15:37.240\nAnd then we have a comparison operator\n\n0:15:37.240,0:15:41.590\nwhich is \"-ne\", for \"not equal\". And some\n\n0:15:41.590,0:15:47.650\nother programming languages\nwill have \"==\", \"!=\", these\n\n0:15:47.650,0:15:51.070\nsymbols. In bash there's\n\n0:15:51.070,0:15:53.650\nlike a reserved set of comparisons and\n\n0:15:53.650,0:15:54.970\nit's mainly because there's a lot of\n\n0:15:54.970,0:15:57.520\nthings you might want to test for when\n\n0:15:57.520,0:15:59.080\nyou're in the shell. Here for example\n\n0:15:59.080,0:16:03.970\nwe're just checking for two values, two\ninteger values, being the same. Or for\n\n0:16:03.970,0:16:08.380\nexample here, the \"-f\" check will let\n\n0:16:08.380,0:16:10.420\nus know if a file exists, which is\n\n0:16:10.420,0:16:12.220\nsomething that you will run into very,\n\n0:16:12.220,0:16:17.530\nvery commonly. I'm going back to the\n\n0:16:17.530,0:16:23.020\nexample. 
Then, what happens when we\n\n0:16:24.400,0:16:28.600\nif the file did not have\nfoobar, like there was a\n\n0:16:28.600,0:16:31.990\nnonzero error code, then we print\n\n0:16:31.990,0:16:33.400\n\"this file doesn't have any foobar,\n\n0:16:33.400,0:16:36.400\nwe're going to add one\". And what we do is\n\n0:16:36.400,0:16:40.750\nwe echo this \"# foobar\", hoping this\n\n0:16:40.750,0:16:43.200\nis a comment to the file and then we're\n\n0:16:43.200,0:16:47.620\nusing the operator \">>\" to append at the end of\n\n0:16:47.620,0:16:50.800\nthe file. Here since the file has\n\n0:16:50.800,0:16:54.490\nbeen fed through the script, and we don't\nknow it beforehand, we have to substitute\n\n0:16:54.490,0:17:03.430\nthe variable of the filename. We can\nactually run this. We already have\n\n0:17:03.430,0:17:05.260\ncorrect permissions in this script and\n\n0:17:05.260,0:17:10.540\nwe can give a few examples. We have a\nfew files in this folder, \"mcd\" is the\n\n0:17:10.540,0:17:12.760\none we saw at the beginning for the MCD\n\n0:17:12.760,0:17:15.040\nfunction, some other \"script\" function and\n\n0:17:15.040,0:17:21.700\nwe can even feed the own script to itself\nto check if it has foobar in it.\n\n0:17:21.700,0:17:26.680\nAnd we run it and first we can\nsee that there's different\n\n0:17:26.680,0:17:29.460\nvariables that we saw, that have been\n\n0:17:29.460,0:17:33.400\nsuccessfully expanded. We have the date, that has\n\n0:17:33.400,0:17:36.700\nbeen replaced to the current time, then\n\n0:17:36.700,0:17:39.100\nwe're running this program, with three\n\n0:17:39.100,0:17:44.560\narguments, this randomized PID, and then\n\n0:17:44.560,0:17:46.510\nit's telling us MCD doesn't have any\n\n0:17:46.510,0:17:48.169\nfoobar, so we are adding a new one,\n\n0:17:48.169,0:17:50.450\nand this script file doesn't\n\n0:17:50.450,0:17:52.970\nhave one. 
So now for example let's look at MCD\n\n0:17:52.970,0:17:55.820\nand it has the comment that we were looking for.\n\n0:17:59.000,0:18:05.619\nOne other thing to know when you're\nexecuting scripts is that\n\n0:18:05.619,0:18:07.759\nhere we have like three completely\n\n0:18:07.759,0:18:10.279\ndifferent arguments but very commonly\n\n0:18:10.279,0:18:12.889\nyou will be giving arguments that\n\n0:18:12.889,0:18:16.100\ncan be more succinctly given in some way.\n\n0:18:16.100,0:18:20.179\nSo for example here if we wanted to\n\n0:18:20.179,0:18:25.429\nrefer to all the \".sh\" scripts we\n\n0:18:25.429,0:18:31.120\ncould just do something like \"ls *.sh\"\n\n0:18:31.120,0:18:36.120\nand this is a way of filename expansion\nthat most shells have\n\n0:18:36.120,0:18:38.450\nthat's called \"globbing\". Here, as you\n\n0:18:38.450,0:18:39.919\nmight expect, this is gonna say\n\n0:18:39.919,0:18:42.559\nanything that has any kind of sort of\n\n0:18:42.559,0:18:45.940\ncharacters and ends up with \"sh\".\n\n0:18:45.940,0:18:52.159\nUnsurprisingly, we get \"example.sh\"\nand \"mcd.sh\". We also have these\n\n0:18:52.159,0:18:54.769\n\"project1\" and \"project2\", and if there\n\n0:18:54.769,0:19:00.100\nwere like a... we can do a\n\"project42\", for example\n\n0:19:00.620,0:19:04.220\nAnd now if we just want to refer\nto the projects that have\n\n0:19:04.220,0:19:07.279\na single character, but not two characters\n\n0:19:07.279,0:19:08.720\nafterwards, like any other characters,\n\n0:19:08.720,0:19:13.879\nwe can use the question mark. 
So \"?\"\nwill expand to only a single one.\n\n0:19:13.880,0:19:17.360\nAnd we get, LS'ing, first\n\n0:19:17.360,0:19:21.049\n\"project1\" and then \"project2\".\n\n0:19:21.049,0:19:27.580\nIn general, globbing can be very powerful.\nYou can also combine it.\n\n0:19:31.880,0:19:35.480\nA common pattern is to use what\nis called curly braces.\n\n0:19:35.480,0:19:39.320\nSo let's say we have an image,\nthat we have in this folder\n\n0:19:39.320,0:19:43.620\nand we want to convert this image from PNG to JPG\n\n0:19:43.620,0:19:46.320\nor we could maybe copy it, or...\n\n0:19:46.320,0:19:49.609\nit's a really common pattern, to have\ntwo or more arguments that are\n\n0:19:49.609,0:19:55.240\nfairly similar and you want to do something\nwith them as arguments to some command.\n\n0:19:55.240,0:20:01.290\nYou could do it this way, or more\nsuccinctly, you can just do\n\n0:20:01.290,0:20:08.880\n\"image.{png,jpg}\"\n\n0:20:09.410,0:20:13.590\nAnd here, I'm getting some color feedback,\nbut what this will do, is\n\n0:20:13.590,0:20:17.610\nit'll expand into the line above.\n\n0:20:17.610,0:20:23.990\nActually, I can ask zsh to do that for\nme. And that what's happening here.\n\n0:20:23.990,0:20:26.550\nThis is really powerful. So for example\n\n0:20:26.550,0:20:29.220\nyou can do something like... 
we could do...\n\n0:20:29.220,0:20:34.220\n\"touch\" on a bunch of foo's, and\nall of this will be expanded.\n\n0:20:35.520,0:20:41.880\nYou can also do it at several levels\nand you will do the Cartesian...\n\n0:20:41.880,0:20:49.980\nif we have something like this,\nwe have one group here, \"{1,2}\"\n\n0:20:49.980,0:20:53.310\nand then here there's \"{1,2,3}\",\nand this is going to do\n\n0:20:53.310,0:20:54.990\nthe Cartesian product of these\n\n0:20:54.990,0:20:59.920\ntwo expansions and it will expand\ninto all these things,\n\n0:20:59.960,0:21:03.540\nthat we can quickly \"touch\".\n\n0:21:03.540,0:21:10.520\nYou can also combine the asterisk\nglob with the curly braces glob.\n\n0:21:10.520,0:21:16.840\nYou can even use kind of ranges.\nLike, we can do \"mkdir\"\n\n0:21:16.840,0:21:21.420\nand we create the \"foo\" and the\n\"bar\" directories, and then we\n\n0:21:21.420,0:21:25.680\ncan do something along these lines. This\n\n0:21:25.680,0:21:28.890\nis going to expand to \"fooa\", \"foob\"...\n\n0:21:28.890,0:21:31.430\nlike all these combinations, through \"j\", and\n\n0:21:31.430,0:21:35.250\nthen the same for \"bar\". I haven't\n\n0:21:35.250,0:21:38.610\nreally tested it... but yeah, we're getting\nall these combinations that we\n\n0:21:38.610,0:21:41.850\ncan \"touch\". And now, if we touch something\n\n0:21:41.850,0:21:47.970\nthat is different between these\ntwo [directories], we\n\n0:21:47.970,0:21:55.890\ncan again showcase the process\nsubstitution that we saw\n\n0:21:55.890,0:21:59.610\nearlier. Say we want to check what\nfiles are different between these\n\n0:21:59.610,0:22:03.400\ntwo folders. 
For us it's obvious,\nwe just saw it, it's X and Y,\n\n0:22:03.400,0:22:07.410\nbut we can ask the shell to do\nthis \"diff\" for us between the\n\n0:22:07.410,0:22:10.200\noutput of one LS and the other LS.\n\n0:22:10.200,0:22:12.810\nUnsurprisingly we're getting: X is\n\n0:22:12.810,0:22:14.700\nonly in the first folder and Y is\n\n0:22:14.700,0:22:20.970\nonly in the second folder. What is more\n\n0:22:20.970,0:22:26.519\nis, right now, we have only seen\nbash scripts. If you like other\n\n0:22:26.520,0:22:30.260\nscripts, like for some tasks bash\nis probably not the best,\n\n0:22:30.260,0:22:33.119\nit can be tricky. You can actually\nwrite scripts that\n\n0:22:33.119,0:22:35.700\ninteract with the shell implemented in a lot\n\n0:22:35.700,0:22:39.710\nof different languages. So for\nexample, let's see here a\n\n0:22:39.710,0:22:43.139\nPython script that has a magic line at the\n\n0:22:43.139,0:22:45.539\nbeginning that I'm not explaining for now.\n\n0:22:45.540,0:22:48.330\nThen we have \"import sys\",\n\n0:22:48.330,0:22:53.629\nit's kind of like... Python is not,\nby default, trying to interact\n\n0:22:53.629,0:22:56.999\nwith the shell so you will have to import\n\n0:22:56.999,0:22:58.799\nsome library. And then we're doing a\n\n0:22:58.799,0:23:01.529\nreally silly thing of just iterating\n\n0:23:01.529,0:23:06.440\nover \"sys.argv[1:]\".\n\n0:23:06.440,0:23:12.809\n\"sys.argv\" is kind of similar to what\nin bash we're getting as $0, $1, &c.\n\n0:23:12.809,0:23:16.649\nLike the vector of the arguments, we're\nprinting it in the reversed order.\n\n0:23:16.649,0:23:21.179\nAnd the magic line at the beginning is\n\n0:23:21.179,0:23:23.999\ncalled a shebang and is the way that the\n\n0:23:23.999,0:23:26.159\nshell will know how to run this program.\n\n0:23:26.159,0:23:30.509\nYou can always do something like\n\n0:23:30.509,0:23:34.379\n\"python script.py\", and then \"a b c\" and that\n\n0:23:34.379,0:23:36.659\nwill work, always, like that. 
But\n\n0:23:36.659,0:23:39.119\nwhat if we want to make this to be\n\n0:23:39.119,0:23:41.309\nexecutable from the shell? The way the\n\n0:23:41.309,0:23:44.190\nshell knows that it has to use python as the\n\n0:23:44.190,0:23:48.450\ninterpreter to run this file is using\n\n0:23:48.450,0:23:52.440\nthat first line. And that first line is\n\n0:23:52.440,0:23:56.620\ngiving it the path to where that thing lives.\n\n0:23:58.500,0:23:59.600\nHowever, you might not know.\n\n0:23:59.609,0:24:01.830\nLike, different machines will have probably\n\n0:24:01.830,0:24:04.049\ndifferent places where they put python\n\n0:24:04.049,0:24:06.090\nand you might not want to assume where\n\n0:24:06.090,0:24:08.789\npython is installed, or any other interpreter.\n\n0:24:08.789,0:24:16.379\nSo one thing that you can do is use the\n\n0:24:16.380,0:24:17.720\n\"env\" command.\n\n0:24:18.280,0:24:21.560\nYou can also give arguments in the shebang, so\n\n0:24:21.570,0:24:23.940\nwhat we're doing here is specifying\n\n0:24:23.940,0:24:29.720\nrun the \"env\" command, that is for pretty much every\nsystem, there are some exceptions, but like for\n\n0:24:29.720,0:24:31.550\npretty much every system it is in\n\n0:24:31.550,0:24:33.620\n\"/usr/bin\", where a lot of binaries live,\n\n0:24:33.620,0:24:36.200\nand then we're calling it with the\n\n0:24:36.200,0:24:38.570\nargument \"python\". And then that will make\n\n0:24:38.570,0:24:42.020\nuse of the PATH environment variable\n\n0:24:42.020,0:24:43.580\nthat we saw in the first lecture. It's\n\n0:24:43.580,0:24:45.680\ngonna search in that path for the Python\n\n0:24:45.680,0:24:48.620\nbinary and then it's gonna use that to\n\n0:24:48.620,0:24:50.480\ninterpret this file. 
And that will make\n\n0:24:50.480,0:24:52.490\nthis more portable so it can be run in\n\n0:24:52.490,0:24:57.520\nmy machine, and your machine\nand some other machine.\n\n0:25:08.020,0:25:12.140\nAnother thing is that bash is not\n\n0:25:12.140,0:25:14.300\nreally like modern, it was\n\n0:25:14.300,0:25:16.340\ndeveloped a while ago. And sometimes\n\n0:25:16.340,0:25:18.890\nit can be tricky to debug. By\n\n0:25:18.890,0:25:21.980\ndefault, and the ways it will fail\n\n0:25:21.980,0:25:24.020\nsometimes are intuitive like the way we\n\n0:25:24.020,0:25:26.180\nsaw before of like foo command not\n\n0:25:26.180,0:25:28.610\nexisting, sometimes it's not. So there's\n\n0:25:28.610,0:25:31.280\nlike a really nifty tool that we have\n\n0:25:31.280,0:25:34.310\nlinked in the lecture notes, which is called\n\n0:25:34.310,0:25:37.580\n\"shellcheck\", that will kind of give you\n\n0:25:37.580,0:25:40.010\nboth warnings and syntactic errors\n\n0:25:40.010,0:25:43.250\nand other things that you might\nnot have quoted properly,\n\n0:25:43.250,0:25:46.040\nor you might have misplaced spaces in\n\n0:25:46.040,0:25:50.060\nyour files. So for example for our\nextremely simple \"mcd.sh\"\n\n0:25:50.060,0:25:51.980\nfile we're getting a couple\n\n0:25:51.980,0:25:54.800\nof errors saying hey, surprisingly,\n\n0:25:54.800,0:25:56.090\nwe're missing a shebang, like this\n\n0:25:56.090,0:25:59.060\nmight not interpret it correctly if you're\n\n0:25:59.060,0:26:02.000\nrunning it on a different system. Also, this\n\n0:26:02.000,0:26:05.620\nCD is taking a command and it might not\n\n0:26:05.620,0:26:08.960\nexpand properly so instead of using CD\n\n0:26:08.960,0:26:11.300\nyou might want to use something like CD\n\n0:26:11.300,0:26:14.540\nand then an OR and then an \"exit\". 
We go\n\n0:26:14.540,0:26:16.490\nback to what we explained earlier, what\n\n0:26:16.490,0:26:18.920\nthis will do is like if the\n\n0:26:18.920,0:26:21.860\nCD doesn't end correctly, you cannot CD\n\n0:26:21.860,0:26:23.720\ninto the folder because either you\n\n0:26:23.720,0:26:25.250\ndon't have permissions, it doesn't exist...\n\n0:26:25.250,0:26:28.780\nThat will give a nonzero error\n\n0:26:28.780,0:26:32.420\ncode, so you will execute exit\n\n0:26:32.420,0:26:33.920\nand that will stop the script\n\n0:26:33.920,0:26:35.810\ninstead of continuing to execute as if\n\n0:26:35.810,0:26:37.240\nyou were in a place that you are\n\n0:26:37.240,0:26:42.900\nactually not in. And actually\nI haven't tested, but I\n\n0:26:42.920,0:26:47.179\nthink we can check for \"example.sh\"\n\n0:26:47.179,0:26:50.809\nand here we're getting that we should be\n\n0:26:50.809,0:26:55.070\nchecking the exit code in a\ndifferent way, because it's\n\n0:26:55.070,0:26:57.710\nprobably not the best way, doing it this\n\n0:26:57.710,0:27:01.580\nway. One last remark I want to make\n\n0:27:01.580,0:27:05.090\nis that when you're writing bash scripts\n\n0:27:05.090,0:27:07.159\nor functions for that matter,\n\n0:27:07.159,0:27:09.080\nthere's kind of a difference between\n\n0:27:09.080,0:27:12.590\nwriting bash scripts in isolation like a\n\n0:27:12.590,0:27:14.149\nthing that you're gonna run, and a thing\n\n0:27:14.149,0:27:16.100\nthat you're gonna load into your shell.\n\n0:27:16.100,0:27:19.850\nWe will see some of this in the command\n\n0:27:19.850,0:27:23.090\nline environment lecture, where we will kind of\n\n0:27:23.090,0:27:29.059\nbe tooling with the bashrc and the\nsshrc. But in general, if you make\n\n0:27:29.059,0:27:31.370\nchanges to for example where you are,\n\n0:27:31.370,0:27:34.009\nlike if you CD in a bash script and you\n\n0:27:34.009,0:27:36.919\njust execute that bash script, it won't CD\n\n0:27:36.919,0:27:39.980\nin the shell you are in right now. 
But if you\n\n0:27:39.980,0:27:42.980\nhave loaded the code directly into\n\n0:27:42.980,0:27:45.559\nyour shell, for example you load...\n\n0:27:45.559,0:27:48.440\nyou source the function and then you execute\n\n0:27:48.440,0:27:50.269\nthe function then you will get those\n\n0:27:50.269,0:27:52.000\nside effects. And the same goes for\n\n0:27:52.000,0:27:57.220\ndefining variables into the shell.\n\n0:27:57.220,0:28:03.950\nNow I'm going to talk about some\ntools that I think are nifty when\n\n0:28:03.950,0:28:07.580\nworking with the shell. The first was\n\n0:28:07.580,0:28:09.799\nalso briefly introduced yesterday.\n\n0:28:09.799,0:28:13.309\nHow do you know what flags, or like\n\n0:28:13.309,0:28:15.320\nwhat exact commands are. Like how am I\n\n0:28:15.320,0:28:21.889\nsupposed to know that LS minus L will list\nthe files in a list format, or that\n\n0:28:21.889,0:28:25.789\nif I do \"mv -i\", it's gonna like prompt me\n\n0:28:25.789,0:28:28.639\nfor stuff. For that what you have is the \"man\"\n\n0:28:28.639,0:28:30.730\ncommand. And the man command will kind of\n\n0:28:30.730,0:28:33.590\nhave like a lot of information of how\n\n0:28:33.590,0:28:35.809\nyou will go about... so for example here it\n\n0:28:35.809,0:28:40.340\nwill explain for the \"-i\" flag, there are\n\n0:28:40.340,0:28:43.970\nall these options you can do. That's\n\n0:28:43.970,0:28:45.620\nactually pretty useful and it will work\n\n0:28:45.620,0:28:51.540\nnot only for really simple commands\nthat come packaged with your OS\n\n0:28:51.540,0:28:55.809\nbut will also work with some tools\nthat you install from the internet\n\n0:28:55.809,0:28:58.240\nfor example, if the person that did the\n\n0:28:58.240,0:29:01.390\ninstallation made it so that the man\n\n0:29:01.390,0:29:03.399\npages were also installed. 
So for example\n\n0:29:03.399,0:29:06.490\na tool that we're gonna cover in a bit\n\n0:29:06.490,0:29:12.370\nwhich is called \"ripgrep\" and\nis called with RG, this didn't\n\n0:29:12.370,0:29:14.980\ncome with my system but it has installed\n\n0:29:14.980,0:29:17.230\nits own man page and I have it here and\n\n0:29:17.230,0:29:21.700\nI can access it. For some commands the\n\n0:29:21.700,0:29:25.029\nman page is useful but sometimes it can be\n\n0:29:25.029,0:29:28.270\ntricky to decipher because it's more\n\n0:29:28.270,0:29:30.399\nkind of a documentation and a\n\n0:29:30.399,0:29:32.679\ndescription of all the things the tool\n\n0:29:32.679,0:29:35.860\ncan do. Sometimes it will have\n\n0:29:35.860,0:29:37.720\nexamples but sometimes not, and sometimes\n\n0:29:37.720,0:29:41.620\nthe tool can do a lot of things so a\n\n0:29:41.620,0:29:45.250\ncouple of good tools that I use commonly\n\n0:29:45.250,0:29:50.289\nare \"convert\" or \"ffmpeg\", which deal\nwith images and video respectively and\n\n0:29:50.289,0:29:52.419\nthe man pages are like enormous. So there's\n\n0:29:52.419,0:29:54.850\none neat tool called \"tldr\" that\n\n0:29:54.850,0:29:58.240\nyou can install and you will have like\n\n0:29:58.240,0:30:02.710\nsome nice kind of explanatory examples\n\n0:30:02.710,0:30:05.470\nof how you want to use this command. And you\n\n0:30:05.470,0:30:07.840\ncan always Google for this, but I find\n\n0:30:07.840,0:30:10.120\nmyself saving going into the\n\n0:30:10.120,0:30:12.640\nbrowser, looking about some examples and\n\n0:30:12.640,0:30:14.919\ncoming back, whereas \"tldr\" are\n\n0:30:14.919,0:30:16.870\ncommunity contributed and\n\n0:30:16.870,0:30:19.210\nthey're fairly useful. Then,\n\n0:30:19.210,0:30:23.020\nthe one for \"ffmpeg\" has a lot of\n\n0:30:23.020,0:30:24.940\nuseful examples that are more nicely\n\n0:30:24.940,0:30:26.799\nformatted (if you don't have a huge\n\n0:30:26.799,0:30:30.820\nfont size for recording). 
Or even\n\n0:30:30.820,0:30:33.250\nsimple commands like \"tar\", that have a lot\n\n0:30:33.250,0:30:35.470\nof options that you are combining. So for\n\n0:30:35.470,0:30:37.840\nexample, here you can be combining 2, 3...\n\n0:30:37.840,0:30:41.710\ndifferent flags and it can not be\n\n0:30:41.710,0:30:43.419\nobvious, when you want to combine\n\n0:30:43.419,0:30:48.429\ndifferent ones. That's how you\n\n0:30:48.429,0:30:54.850\nwould go about finding more about these tools.\nOn the topic of finding, let's try\n\n0:30:54.850,0:30:58.690\nlearning how to find files. You can\n\n0:30:58.690,0:31:03.100\nalways go \"ls\", and like you can go like\n\n0:31:03.100,0:31:05.950\n\"ls project1\", and\n\n0:31:05.950,0:31:08.559\nkeep LS'ing all the way through. But\n\n0:31:08.559,0:31:11.740\nmaybe, if we already know that we want\n\n0:31:11.740,0:31:15.450\nto look for all the folders called\n\n0:31:15.450,0:31:19.000\n\"src\", then there's probably a better command\n\n0:31:19.000,0:31:21.400\nfor doing that. And that's \"find\".\n\n0:31:21.460,0:31:26.679\nFind is the tool that, pretty much comes\nwith every UNIX system. And find,\n\n0:31:26.679,0:31:35.230\nwe're gonna give it... here we're\nsaying we want to call find in the\n\n0:31:35.230,0:31:37.510\ncurrent folder, remember that \".\" stands\n\n0:31:37.510,0:31:40.149\nfor the current folder, and we want the\n\n0:31:40.149,0:31:46.539\nname to be \"src\" and we want the type to\nbe a directory. And by typing that it's\n\n0:31:46.539,0:31:49.870\ngonna recursively go through the current\n\n0:31:49.870,0:31:52.330\ndirectory and look for all these files,\n\n0:31:52.330,0:31:58.659\nor folders in this case, that match this\npattern. Find has a lot of useful\n\n0:31:58.659,0:32:01.840\nflags. So for example, you can even test\n\n0:32:01.840,0:32:05.440\nfor the path to be in a way. 
Here we're\n\n0:32:05.440,0:32:08.230\nsaying we want some number of folders,\n\n0:32:08.230,0:32:09.909\nwe don't really care how many folders,\n\n0:32:09.909,0:32:13.179\nand then we care about all the Python\n\n0:32:13.179,0:32:17.830\nscripts, all the things with the extension\n\".py\", that are within a\n\n0:32:17.830,0:32:19.899\ntest folder. And we're also checking, just in\n\n0:32:19.899,0:32:21.519\ncase really, but we're checking just\n\n0:32:21.519,0:32:24.460\nthat it's also a type F, which stands for\n\n0:32:24.460,0:32:28.710\nfile. We're getting all these files.\n\n0:32:28.710,0:32:32.169\nYou can also use different flags for things\n\n0:32:32.169,0:32:34.000\nthat are not the path or the name.\n\n0:32:34.000,0:32:38.160\nYou could check things that have been\n\n0:32:38.160,0:32:42.060\nmodified (\"-mtime\" is for the modification\ntime), things that have been\n\n0:32:42.070,0:32:44.540\nmodified in the last day, which is gonna\n\n0:32:44.559,0:32:46.659\nbe pretty much everything. So this is gonna print\n\n0:32:46.659,0:32:49.029\na lot of the files we created and files\n\n0:32:49.029,0:32:51.850\nthat were already there. You can even\n\n0:32:51.850,0:32:54.960\nuse other things like size, the owner,\n\n0:32:54.960,0:32:59.080\npermissions, you name it. What is even more\n\n0:32:59.080,0:33:01.870\npowerful is, \"find\" can find stuff\n\n0:33:01.870,0:33:04.269\nbut it also can do stuff when you\n\n0:33:04.269,0:33:10.690\nfind those files. So we could look for all\n\n0:33:10.690,0:33:14.080\nthe files that have a TMP\n\n0:33:14.080,0:33:18.160\nextension, which is a temporary extension, and\n\n0:33:18.160,0:33:22.720\nthen, we can tell \"find\" that\nfor every one of those files,\n\n0:33:22.720,0:33:26.350\njust execute the \"rm\" command for them. And\n\n0:33:26.350,0:33:29.050\nthat will just be calling \"rm\" with all\n\n0:33:29.050,0:33:32.350\nthese files. 
So let's first execute it\n\n0:33:32.350,0:33:35.760\nwithout, and then we execute it with it.\n\n0:33:35.760,0:33:38.950\nAgain, as with the command line\n\n0:33:38.950,0:33:41.470\nphilosophy, it looks like nothing\n\n0:33:41.470,0:33:48.070\nhappened. But since we have\na zero error code, something\n\n0:33:48.070,0:33:49.540\nhappened - just that everything went\n\n0:33:49.540,0:33:51.490\ncorrectly and everything is fine. And now,\n\n0:33:51.490,0:33:57.810\nif we look for these files,\nthey aren't there anymore.\n\n0:33:57.810,0:34:02.950\nAnother nice thing about the shell\nin general is that there are\n\n0:34:02.950,0:34:05.890\nthese tools, but people will keep\n\n0:34:05.890,0:34:08.230\nfinding new ways, so alternative\n\n0:34:08.230,0:34:12.220\nways of writing these tools. It's\nnice to know about it. So, for\n\n0:34:12.220,0:34:20.020\nexample, with \"find\", if you just want to match\nthe things that end in \"tmp\"\n\n0:34:20.020,0:34:24.190\nit can be sometimes weird to do this\nthing, it has a long command.\n\n0:34:24.190,0:34:27.760\nThere's things like \"fd\",\n\n0:34:27.760,0:34:32.320\nfor example, that is a shorter command\nthat by default will use regex\n\n0:34:32.320,0:34:34.899\nand will ignore your git files, so you\n\n0:34:34.899,0:34:38.020\ndon't even search for them. It\n\n0:34:38.020,0:34:42.879\nwill color-code, it will have better\nUnicode support... It's nice to\n\n0:34:42.879,0:34:45.040\nknow about some of these tools. But, again,\n\n0:34:45.040,0:34:52.149\nthe main idea is that if you are aware\nthat these tools exist, you can\n\n0:34:52.149,0:34:53.740\nsave yourself a lot of time from doing\n\n0:34:53.740,0:34:57.660\nkind of menial and repetitive tasks.\n\n0:34:57.660,0:35:00.010\nAnother command to bear in mind is like\n\n0:35:00.010,0:35:01.990\n\"find\". 
Some of you may be\n\n0:35:01.990,0:35:04.300\nwondering, \"find\" is probably just\n\n0:35:04.300,0:35:06.520\nactually going through a directory\n\n0:35:06.520,0:35:09.580\nstructure and looking for things but\n\n0:35:09.580,0:35:11.260\nwhat if I'm doing a lot of \"finds\" a day?\n\n0:35:11.260,0:35:12.850\nWouldn't it be better, doing kind of\n\n0:35:12.850,0:35:18.790\na database approach and build an index\nfirst, and then use that index\n\n0:35:18.790,0:35:21.520\nand update it in some way. Well, actually\n\n0:35:21.520,0:35:23.380\nmost Unix systems already do it and\n\n0:35:23.380,0:35:28.170\nthis is through the \"locate\" command and\n\n0:35:28.170,0:35:31.690\nthe way that the locate will\n\n0:35:31.690,0:35:35.470\nbe used... it will just look for paths in\n\n0:35:35.470,0:35:38.680\nyour file system that have the substring\n\n0:35:38.680,0:35:44.710\nthat you want. I actually don't know if it\nwill work... Okay, it worked. Let me try to\n\n0:35:44.710,0:35:49.840\ndo something like \"missing-semester\".\n\n0:35:51.840,0:35:53.950\nIt's gonna take a while but\n\n0:35:53.950,0:35:56.109\nit found all these files that are somewhere\n\n0:35:56.109,0:35:57.730\nin my file system and since it has\n\n0:35:57.730,0:36:01.750\nbuilt an index already on them, it's much\n\n0:36:01.750,0:36:05.680\nfaster. And then, to keep it updated,\n\n0:36:05.680,0:36:11.980\nit's using the \"updatedb\" command\nthat is running through cron,\n\n0:36:13.840,0:36:18.490\nto update this database. Finding files, again, is\n\n0:36:18.490,0:36:23.230\nreally useful. Sometimes you're actually concerned\nabout, not the files themselves,\n\n0:36:23.230,0:36:26.740\nbut the content of the files. For that\n\n0:36:26.740,0:36:31.420\nyou can use the grep command that we\n\n0:36:31.420,0:36:33.880\nhave seen so far. 
So you could do\n\n0:36:33.880,0:36:37.740\nsomething like grep foobar in MCD, it's there.\n\n0:36:37.740,0:36:43.690\nWhat if you want to, again, recursively\nsearch through the current\n\n0:36:43.690,0:36:45.760\nstructure and look for more files, right?\n\n0:36:45.760,0:36:48.700\nWe don't want to do this manually.\n\n0:36:48.700,0:36:51.220\nWe could use \"find\", and the \"-exec\", but\n\n0:36:51.220,0:36:58.920\nactually \"grep\" has the \"-R\" flag\nthat will go through the entire\n\n0:36:58.920,0:37:03.609\ndirectory, here. And it's telling us\n\n0:37:03.609,0:37:06.579\nthat oh we have the foobar line in example.sh\n\n0:37:06.579,0:37:09.279\nat these three places and in\n\n0:37:09.279,0:37:14.589\nthese other two places in foobar. This can be\n\n0:37:14.589,0:37:16.900\nreally convenient. Mainly, the\n\n0:37:16.900,0:37:18.940\nuse case for this is you know you have\n\n0:37:18.940,0:37:21.910\nwritten some code in some programming\n\n0:37:21.910,0:37:23.859\nlanguage, and you know it's somewhere in\n\n0:37:23.859,0:37:26.200\nyour file system but you actually don't\n\n0:37:26.200,0:37:28.599\nknow where. But you can actually quickly search.\n\n0:37:28.600,0:37:32.980\nSo for example, I can quickly search\n\n0:37:35.660,0:37:40.320\nfor all the Python files that I have in my\n\n0:37:40.329,0:37:45.460\nscratch folder where I used the requests library.\n\n0:37:45.460,0:37:47.589\nAnd if I run this, it's giving me\n\n0:37:47.589,0:37:50.890\nthrough all these files, exactly in\n\n0:37:50.890,0:37:53.650\nwhat line it has been found. And here\n\n0:37:53.650,0:37:56.260\ninstead of using grep, which is fine,\n\n0:37:56.260,0:37:58.930\nyou could also do this, I'm using \"ripgrep\",\n\n0:37:58.930,0:38:05.260\nwhich is kind of the same idea but\nagain trying to bring some more\n\n0:38:05.260,0:38:09.730\nniceties like color coding or file\n\n0:38:09.730,0:38:16.480\nprocessing and other things. I think it has,\nalso, Unicode support. 
It's also pretty\n\n0:38:16.480,0:38:22.829\nfast so you are not paying like a\ntrade-off on this being slower and\n\n0:38:22.829,0:38:25.420\nthere's a lot of useful flags. You\n\n0:38:25.420,0:38:27.670\ncan say, oh, I actually want to get some\n\n0:38:27.670,0:38:30.460\ncontext around those results.\n\n0:38:33.040,0:38:36.400\nSo I want to get like five\nlines of context around\n\n0:38:36.400,0:38:42.819\nthat, so you can see where that import\nlives and see code around it.\n\n0:38:42.819,0:38:44.170\nHere in the import it's not really useful\n\n0:38:44.170,0:38:45.819\nbut like if you're looking for where you\n\n0:38:45.819,0:38:49.720\nuse the function, for example, it will\n\n0:38:49.720,0:38:54.010\nbe very handy. We can also do things like\n\n0:38:54.010,0:38:59.170\nwe can search, for example here...\n\n0:38:59.170,0:39:04.839\nA more advanced use, we can say,\n\n0:39:04.840,0:39:11.580\n\"-u\" is for don't ignore hidden files, sometimes\n\n0:39:12.520,0:39:16.359\nyou want to be ignoring hidden\nfiles, except if you want to\n\n0:39:16.359,0:39:23.500\nsearch config files, that are by default\nhidden. Then, instead of printing\n\n0:39:23.500,0:39:28.400\nthe matches, we're asking to do something\nthat would be kind of hard, I think,\n\n0:39:28.400,0:39:31.380\nto do with grep, off the top of my head, which is\n\n0:39:31.390,0:39:34.569\n\"I want you to print all the files that\n\n0:39:34.569,0:39:37.750\ndon't match the pattern I'm giving you\", which\n\n0:39:37.750,0:39:40.030\nmay be a weird thing to ask here but\n\n0:39:40.030,0:39:42.940\nthen we keep going... 
And this pattern here\n\n0:39:42.940,0:39:45.790\nis a small regex which is saying\n\n0:39:45.790,0:39:48.099\nat the beginning of the line I have a\n\n0:39:48.099,0:39:51.190\n\"#\" and a \"!\", and that's a shebang.\n\n0:39:51.190,0:39:53.470\nLike that, we're searching here for all\n\n0:39:53.470,0:39:56.650\nthe files that don't have a shebang\n\n0:39:56.650,0:39:59.369\nand then we're giving it, here,\n\n0:39:59.369,0:40:02.470\na \"-t sh\" to only look for \"sh\"\n\n0:40:02.470,0:40:07.660\nfiles, because maybe all your\nPython or text files are fine\n\n0:40:07.660,0:40:10.000\nwithout a shebang. And here it's telling us\n\n0:40:10.000,0:40:13.020\n\"oh, MCD is obviously missing a shebang\"\n\n0:40:14.760,0:40:16.660\nWe can even... It has like some\n\n0:40:16.660,0:40:19.119\nnice flags, so for example if we\n\n0:40:19.120,0:40:21.360\ninclude the \"stats\" flag\n\n0:40:28.700,0:40:34.119\nit will get all these results but it will\nalso tell us information about all\n\n0:40:34.119,0:40:35.410\nthe things that it searched. For example,\n\n0:40:35.410,0:40:40.390\nthe number of matches that it found,\nthe lines, the file searched,\n\n0:40:40.390,0:40:44.040\nthe bytes that it printed, &c.\n\n0:40:44.040,0:40:47.160\nSimilar as with \"fd\", sometimes\nit's not as useful\n\n0:40:48.400,0:40:50.619\nusing one specific tool or another and\n\n0:40:50.620,0:40:55.780\nin fact, as ripgrep, there are several\nother tools. Like \"ack\",\n\n0:40:55.780,0:40:57.700\nis the original grep alternative that was\n\n0:40:57.700,0:41:00.670\nwritten. Then the silver searcher,\n\n0:41:00.670,0:41:04.089\n\"ag\", was another one... and they're all\n\n0:41:04.089,0:41:05.589\npretty much interchangeable so\n\n0:41:05.589,0:41:07.630\nmaybe you're at a system that has one and\n\n0:41:07.630,0:41:09.670\nnot the other, just knowing that you can\n\n0:41:09.670,0:41:12.040\nuse these things with these tools can be\n\n0:41:12.040,0:41:15.549\nfairly useful. 
Lastly, I want to cover\n\n0:41:15.549,0:41:19.780\nhow you go about, not finding files\nor code, but how you go about\n\n0:41:19.780,0:41:22.540\nfinding commands that you already\n\n0:41:22.540,0:41:30.160\nfigured out at some point. The first, obvious\nway is just using the up arrow,\n\n0:41:30.160,0:41:34.540\nand slowly going through all your history,\nlooking for these matches.\n\n0:41:34.540,0:41:36.490\nThis is actually not very efficient, as\n\n0:41:36.490,0:41:42.579\nyou probably guessed. So bash\nhas ways to do this more easily.\n\n0:41:42.579,0:41:44.619\nThere is the \"history\" command, that will\n\n0:41:44.619,0:41:49.180\nprint your history. Here I'm in zsh and\nit only prints some of my history, but\n\n0:41:49.180,0:41:54.069\nif I say, I want you to print everything\nfrom the beginning of time, it will print\n\n0:41:54.069,0:41:58.220\neverything from the beginning\nof whatever this history is.\n\n0:41:58.220,0:42:00.700\nAnd since this is a lot of results,\n\n0:42:00.700,0:42:02.589\nmaybe we care about the ones where we\n\n0:42:02.589,0:42:08.490\nuse the \"convert\" command to go from some\ntype of file to some other type of file.\n\n0:42:08.490,0:42:12.940\nSome image, sorry. Then, we're getting all\n\n0:42:12.940,0:42:15.849\nthese results here, about all the ones\n\n0:42:15.849,0:42:18.120\nthat match this substring.\n\n0:42:21.280,0:42:24.609\nEven more, pretty much all shells by default will\n\n0:42:24.609,0:42:27.130\nlink \"Ctrl+R\", the keybinding,\n\n0:42:27.130,0:42:29.680\nto do backward search. Here we\n\n0:42:29.680,0:42:31.569\nhave backward search, where we can\n\n0:42:31.569,0:42:34.750\ntype \"convert\" and it's finding the\n\n0:42:34.750,0:42:36.609\ncommand that we just typed. And if we just\n\n0:42:36.609,0:42:38.619\nkeep hitting \"Ctrl+R\", it will\n\n0:42:38.619,0:42:41.740\nkind of go through these matches and\n\n0:42:41.740,0:42:44.260\nit will let you re-execute it\n\n0:42:44.260,0:42:49.240\nin place. 
Another thing that you can do,\n\n0:42:49.240,0:42:51.069\nrelated to that, is you can use this\n\n0:42:51.069,0:42:53.829\nreally nifty tool called \"fzf\", which is\n\n0:42:53.829,0:42:56.280\nlike a fuzzy finder, like it will...\n\n0:42:57.100,0:42:58.480\nIt will let you do kind of\n\n0:42:58.480,0:43:02.200\nlike an interactive grep. We could do\n\n0:43:02.200,0:43:06.369\nfor example this, where we can cat our\n\n0:43:06.369,0:43:10.030\nexample.sh command, that will\n\n0:43:10.030,0:43:11.680\nprint to the standard output, and then we\n\n0:43:11.680,0:43:14.290\ncan pipe it through fzf. It's just getting\n\n0:43:14.290,0:43:18.490\nall the lines and then we can\ninteractively look for the\n\n0:43:18.490,0:43:21.849\nstring that we care about. And the nice\n\n0:43:21.849,0:43:26.349\nthing about fzf is that, if you enable\nthe default bindings, it will bind to\n\n0:43:26.349,0:43:33.670\nyour \"Ctrl+R\" shell execution and now\n\n0:43:33.670,0:43:36.490\nyou can quickly and dynamically like\n\n0:43:36.490,0:43:41.700\nlook for all the times you tried to\nconvert a favicon in your history.\n\n0:43:42.020,0:43:46.375\nAnd it's also like fuzzy matching,\nwhereas like by default in grep\n\n0:43:46.375,0:43:49.420\nor these things you have to write a regex or some\n\n0:43:49.420,0:43:52.360\nexpression that will match within here.\n\n0:43:52.360,0:43:54.609\nHere I'm just typing \"convert\" and \"favicon\" and\n\n0:43:54.609,0:43:57.369\nit's just trying to do the best it can,\n\n0:43:57.369,0:44:01.349\ndoing the match in the lines it has.\n\n0:44:01.349,0:44:06.190\nLastly, a tool that probably you have\nalready seen, that I've been using\n\n0:44:06.190,0:44:08.410\nfor not retyping these extremely long\n\n0:44:08.410,0:44:13.080\ncommands is this \"history\nsubstring search\", where\n\n0:44:13.940,0:44:15.660\nas I type in my shell,\n\n0:44:15.670,0:44:19.630\n(and I failed to mention, but both fish,\n\n0:44:19.630,0:44:22.760\nwhich I think originally 
introduced,\nthis concept, and then\n\n0:44:22.760,0:44:25.760\nzsh has a really nice implementation)\n\n0:44:25.760,0:44:26.800\nwhat it'll let you do is\n\n0:44:26.800,0:44:31.300\nas you type the command, it will\ndynamically search back in your\n\n0:44:31.300,0:44:34.420\nhistory to the same command\nthat has a common prefix,\n\n0:44:34.980,0:44:36.900\nand then, if you...\n\n0:44:39.100,0:44:42.100\nit will change as the match list stops\n\n0:44:42.100,0:44:44.110\nworking and then as you do the\n\n0:44:44.120,0:44:49.760\nright arrow you can select that\ncommand and then re-execute it.\n\n0:45:05.800,0:45:09.920\nWe've seen a bunch of stuff... I think I have\n\n0:45:09.940,0:45:16.180\na few minutes left so I'm going\nto cover a couple of tools to do\n\n0:45:16.180,0:45:20.060\nreally quick directory listing\nand directory navigation.\n\n0:45:20.060,0:45:30.020\nSo you can always use \"ls -R\" to recursively\nlist some directory structure,\n\n0:45:30.020,0:45:35.160\nbut that can be suboptimal, I cannot\nreally make sense of this easily.\n\n0:45:36.340,0:45:44.460\nThere's a tool called \"tree\" that will\nbe a much more friendly form of\n\n0:45:44.460,0:45:47.500\nprinting all the stuff, it will\nalso color code based on...\n\n0:45:47.500,0:45:50.680\nhere for example \"foo\" is blue\nbecause it's a directory and\n\n0:45:50.680,0:45:55.100\nthis is red because it has execute permissions.\n\n0:45:55.100,0:46:00.220\nBut we can go even further than\nthat. 
There's really nice tools\n\n0:46:00.220,0:46:04.580\nlike a recent one called \"broot\" that\nwill do the same thing but here\n\n0:46:04.580,0:46:07.300\nfor example instead of doing\nthis thing of listing\n\n0:46:07.300,0:46:09.160\nevery single file, for example in bar\n\n0:46:09.160,0:46:11.400\nwe have these \"a\" through \"j\" files,\n\n0:46:11.400,0:46:14.260\nit will say \"oh there are more, unlisted here\".\n\n0:46:15.080,0:46:18.200\nI can actually start typing and it will\n\n0:46:18.200,0:46:21.540\nagain fuzzily match to the files that are there\n\n0:46:21.540,0:46:24.800\nand I can quickly select them\nand navigate through them.\n\n0:46:24.800,0:46:28.380\nSo, again, it's good to know that\n\n0:46:28.380,0:46:33.340\nthese things exist so you don't\nlose a large amount of time\n\n0:46:34.240,0:46:36.180\nlooking for these files.\n\n0:46:37.880,0:46:40.500\nThere is also, I think I have it installed,\n\n0:46:40.500,0:46:44.829\nalso something more similar to what\nyou would expect your OS to have,\n\n0:46:44.829,0:46:49.960\nlike Nautilus or one of the Mac\nfinders that have like an\n\n0:46:49.960,0:46:59.260\ninteractive input where you can just use your\nnavigation arrows and quickly explore.\n\n0:46:59.260,0:47:03.849\nIt might be overkill but you'll\nbe surprised how quickly you can\n\n0:47:03.849,0:47:07.839\nmake sense of some directory structure\nby just navigating through it.\n\n0:47:07.840,0:47:12.780\nAnd pretty much all of these tools\nwill let you edit, copy files...\n\n0:47:12.780,0:47:16.880\nif you just look for the options for them.\n\n0:47:17.600,0:47:20.100\nThe last addendum is kind of going places.\n\n0:47:20.100,0:47:24.480\nWe have \"cd\", and \"cd\" is nice, it will get you\n\n0:47:26.120,0:47:30.060\nto a lot of places. But it's pretty handy if\n\n0:47:30.069,0:47:33.190\nyou can like quickly go places,\n\n0:47:33.190,0:47:36.730\neither you have been to recently or that\n\n0:47:36.730,0:47:40.599\nyou go to frequently. 
And you can do this in\n\n0:47:40.599,0:47:42.520\nmany ways there's probably... you can start\n\n0:47:42.520,0:47:44.319\nthinking, oh I can make bookmarks, I can\n\n0:47:44.319,0:47:46.660\nmake... I can make aliases in the shell,\n\n0:47:46.660,0:47:49.020\nthat we will cover at some point,\n\n0:47:49.020,0:47:53.020\nsymlinks... But at this point,\n\n0:47:53.020,0:47:54.910\nprogrammers have like built all these\n\n0:47:54.910,0:47:56.799\ntools, so programmers have already figured\n\n0:47:56.799,0:47:59.520\nout a really nice way of doing this.\n\n0:47:59.520,0:48:01.930\nOne way of doing this is using what is\n\n0:48:01.930,0:48:05.760\ncalled \"autojump\", which I\nthink is not loaded here...\n\n0:48:14.140,0:48:20.100\nOkay, don't worry. I will cover it\nin the command line environment.\n\n0:48:21.960,0:48:25.579\nI think it's because I disabled\nthe \"Ctrl+R\" and that also\n\n0:48:25.579,0:48:31.309\naffected other parts of the script.\nI think at this point if anyone has\n\n0:48:31.309,0:48:35.480\nany questions that are related to this,\nI'll be more than happy to answer\n\n0:48:35.480,0:48:37.509\nthem, if anything was left unclear.\n\n0:48:37.509,0:48:42.859\nOtherwise, there's a bunch of\nexercises that we wrote, kind of\n\n0:48:42.859,0:48:46.549\ntouching on these topics and we\nencourage you to try them and\n\n0:48:46.549,0:48:48.559\ncome to office hours, where we can help\n\n0:48:48.559,0:48:54.569\nyou figure out how to do them, or some\nbash quirks that are not clear.\n\n"
  }
]