The task of building infrastructure and process for
ingesting, processing, and aggregating data so that it can be displayed to users
or made available to data scientists.
Data Science
The practice of using statistics, machine learning, and other
tools to analyze data to discover trends and truths that can be used to provide
business intelligence.
Batch Processing
Processing all of the available data in a single run. This is acceptable
for smaller amounts of data and can be simpler in terms of development and
deployment. Some batch processes can also be useful for "recomputing the world"
when you want to analyze existing data in a new way.
Data Streaming
Processing data in small chunks, one at a time, rather than
processing all data at once. Streaming is necessary for processing infinite
event streams. It's also useful for processing large amounts of data, because it
prevents memory overflows during processing and makes it easier to process data
in a distributed or real-time manner.
Real-time
Analyzing data and delivering results simultaneously so that stream
output is always visible. For example, real-time analytics will mean that the
system is constantly processing events (clicks, purchases, etc.) and displaying
the latest results in a user interface.
Parallel Processing
Performing multiple tasks at the same time, for example
on different cores or processors. Parallel processing is necessary in order to
perform more than one computation at once; common uses are parsing or
aggregation.
Concurrent Processing
Managing multiple ongoing tasks at once without
necessarily processing more than one task in the same exact moment. Concurrent
processing is required in order to perform more than one effect at once, such as
waiting for multiple network requests to complete.
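
The difference between concurrency and parallelism can be sketched with Ruby threads. This is only an illustration: `fetch` is a hypothetical stand-in for a network request, and `sleep` simulates the time spent waiting. In CRuby only one thread runs Ruby code at a time, but the waits still overlap, which is the kind of concurrency described above.

```ruby
# Hypothetical stand-in for a network request; sleep simulates waiting on I/O.
def fetch(name, delay)
  sleep(delay)
  "#{name} done"
end

requests = [["users", 0.1], ["orders", 0.1], ["invoices", 0.1]]

# Start every request before waiting on any of them.
threads = requests.map do |name, delay|
  Thread.new { fetch(name, delay) }
end

# Thread#value waits for the thread to finish and returns its result.
results = threads.map(&:value)
```

Because the three waits overlap, the whole batch should take roughly as long as the slowest request rather than the sum of all three.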
Constant Memory
Processing a stream where the amount of memory required does
not increase with the size of the stream.
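
As a minimal sketch of constant-memory streaming, the example below computes an average while holding only a count and a running total, no matter how many values pass through. The method name `running_average` and the sample data are assumptions made for illustration.

```ruby
# Keeps only two numbers in memory regardless of the stream's length.
def running_average(stream)
  count = 0
  total = 0.0

  stream.each do |value|
    count += 1
    total += value
  end

  count.zero? ? nil : total / count
end

# A lazy enumerator yields one value at a time instead of building an array.
readings = (1..1_000_000).lazy.map { |number| number % 10 }

average = running_average(readings)
```

Collecting the values into an array first would give the same answer, but memory use would then grow with the size of the stream.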
At Least Once Delivery
A guarantee that a given message will be delivered at
least once, but may be delivered more than once. This is achieved in Kafka by
committing an offset only after its message has been fully processed and in
RabbitMQ by acknowledging a message only after fully processing it.
At Most Once Delivery
A guarantee that a message will never be delivered more
than once, but may not be delivered at all. This is achieved in Kafka by
committing an offset before fully processing a message and in RabbitMQ by
acknowledging a message before fully processing it.
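
The two guarantees differ only in whether the acknowledgement happens before or after processing. The simulation below is a rough sketch and does not use Kafka or RabbitMQ: `InMemoryQueue` and `consume` are hypothetical, and the "crash" is simulated by skipping the second step once.

```ruby
# Hypothetical in-memory broker: a message is redelivered until acknowledged.
class InMemoryQueue
  def initialize(messages)
    @pending = messages.dup
  end

  def next_message
    @pending.first
  end

  def acknowledge(message)
    @pending.delete(message)
  end
end

# Runs "acknowledge" and "process" in the chosen order, crashing once
# between the two steps while handling the message named by crash_on.
def consume(queue, ack_first:, crash_on:)
  handled = []
  crashed = false

  while (message = queue.next_message)
    steps = [
      -> { queue.acknowledge(message) },
      -> { handled << message },
    ]
    steps.reverse! unless ack_first

    steps.first.call
    if message == crash_on && !crashed
      crashed = true
      next # simulated crash; a restarted worker reads what is still pending
    end
    steps.last.call
  end

  handled
end

at_most_once = consume(InMemoryQueue.new(%w[a b c]), ack_first: true, crash_on: "b")
at_least_once = consume(InMemoryQueue.new(%w[a b c]), ack_first: false, crash_on: "b")
```

Acknowledging first loses `"b"` when the worker crashes. Acknowledging last processes `"b"` twice, so consumers in an at-least-once system generally need to tolerate duplicates.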
Distributed Data Processing
Breaking up data into partitions so that large
amounts of data can be processed by many machines simultaneously.
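
One common way to break data into partitions is to hash a key, so that records with the same key always land in the same partition. The sketch below assumes a CRC32 hash and the hypothetical helpers `partition_for` and `partition_records`; real systems choose their own hash function and routing.

```ruby
require "zlib"

# A stable hash keeps a given key in the same partition on every run.
def partition_for(key, partition_count)
  Zlib.crc32(key) % partition_count
end

# Groups records by partition number so each group could go to its own machine.
def partition_records(records, partition_count)
  records.group_by { |record| partition_for(record[:key], partition_count) }
end

records = [
  { key: "user-1", value: 10 },
  { key: "user-2", value: 5 },
  { key: "user-1", value: 7 },
]

partitions = partition_records(records, 4)
```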
Cluster
Several computers (or virtual machines) grouped together to perform a
single task.
Scala
A programming language (like Ruby, Python, or JavaScript) which is fast
and has become popular for data-focused tasks. Scala runs on the Java Virtual
Machine, which is a high-performance engine for running languages like Scala
that compile into bytecode.
Type Safety
Languages that provide type safety (such as Scala) check the
program for type errors as part of the compilation process, which allows
developers to catch many kinds of bugs before the code is deployed.
Spark
A distributed computing engine for big data and data streams. Spark is
a Scala-focused framework for data engineering and data science.
Kafka
A distributed commit log for data streams. Many of the large data
systems deployed today use Kafka.
Record Stream
A stream where each message is an independent, unique record
which does not replace a previous record in the stream.
Changelog Stream
A stream where each message represents the latest state for
a particular entity.
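
The same sequence of messages means different things depending on which kind of stream it is. The sketch below uses made-up events to show both readings: as a record stream every message counts, and as a changelog stream only the latest message per key survives.

```ruby
events = [
  { key: "user-1", value: 10 },
  { key: "user-2", value: 5 },
  { key: "user-1", value: 7 },
]

# Record stream: each message is an independent fact, so all of them count.
record_total = events.sum { |event| event[:value] }

# Changelog stream: a later message replaces the earlier state for its key.
latest_state = events.each_with_object({}) do |event, state|
  state[event[:key]] = event[:value]
end
```

Here `record_total` is `22`, while `latest_state` is `{ "user-1" => 7, "user-2" => 5 }`.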
Topic
In Kafka, a partitioned, append-only log of messages which can be
consumed in order by partition.
Partition
In Kafka, a way of breaking the messages of a topic into groups
which can be consumed in parallel by one or more workers.
Queue
In RabbitMQ, messages sent to an exchange are placed on a queue.
Messages on a queue can be consumed in parallel by one or more workers.
Consumer
An application or process that reads from a data stream.
Producer
An application or process that writes to a data stream.
================================================
FILE: email/README.md
================================================
# Email
- Use [SendGrid][] or [Amazon SES][] to deliver email in staging and production
environments.
- Use a tool like [ActionMailer Preview][] to look at each created or updated
mailer view before merging.
[actionmailer preview]: https://guides.rubyonrails.org/action_mailer_basics.html#previewing-and-testing-mailers
[amazon ses]: https://thoughtbot.com/blog/deliver-email-with-amazon-ses-in-a-rails-app
[sendgrid]: https://devcenter.heroku.com/articles/sendgrid
================================================
FILE: erb/README.md
================================================
# ERB
[Sample](sample.html.erb)
- When wrapping long lines, keep the method name on the same line as the ERB
interpolation operator and keep each method argument on its own line.
- Use a trailing comma after each argument in a multi-line method call,
including the last item.
- Prefer double quotes for attributes.
================================================
FILE: erb/sample.html.erb
================================================
<%= short_method_call_that_fits_on_one_line arguments %>
<%= link_to(
some_object_with_a_long_name.title,
parent_object_child_object_path(some_object_with_a_long_name),
) %>
================================================
FILE: general/README.md
================================================
# General Guidelines
Style and best practices that apply to all languages and frameworks.
## Philosophy
- These are not to be blindly followed; strive to understand these and ask when
in doubt.
- Don't duplicate the functionality of a built-in library.
- Don't swallow exceptions or "fail silently."
- Don't write code that guesses at future functionality.
- Exceptions should be exceptional.
- Keep the code simple.
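
As one illustration of "Don't swallow exceptions", the sketch below contrasts a method that hides a parse failure with one that surfaces it. The method names and the `InvalidSettings` class are hypothetical.

```ruby
require "json"

# Swallows the error: callers cannot tell "no settings" from "broken settings".
def load_settings_silently(json)
  JSON.parse(json)
rescue JSON::ParserError
  {}
end

class InvalidSettings < StandardError; end

# Surfaces the failure with context instead of hiding it.
def load_settings(json)
  JSON.parse(json)
rescue JSON::ParserError => error
  raise InvalidSettings, "settings are not valid JSON: #{error.message}"
end
```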
## Code Review
Use a linter to automatically review your GitHub pull requests for style guide
violations.
## Formatting
- Break long lines after 80 characters.
- Delete trailing spaces.
- Don't misspell.
- Use [Unix-style line endings] (`\n`).
- Use spaces around operators, except for unary operators, such as `!`.
[unix-style line endings]: http://unix.stackexchange.com/questions/23903/should-i-end-my-text-script-files-with-a-newline
## Naming
- Avoid abbreviations.
- Avoid object types in names (`user_array`, `email_method`, `CalculatorClass`,
`ReportModule`).
- Prefer naming classes after domain concepts rather than patterns they
implement (e.g. `Guest` vs `NullUser`, `CachedRequest` vs `RequestDecorator`).
- Name the enumeration parameter the singular of the collection (`users.each { |user| greet(user) }`).
- Name variables, methods, and classes to reveal intent. This includes documentation and
examples (e.g. don't use `foo`, `bar`, `baz` in examples).
- Treat acronyms as words in names (`XmlHttpRequest` not `XMLHTTPRequest`), even
if the acronym is the entire name (`class Html` not `class HTML`).
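
A few of these naming guidelines can be seen together in a short sketch. The classes below are invented for illustration and are not taken from any real codebase.

```ruby
# Named after the domain concept (a visitor who is not signed in),
# not after the Null Object pattern it happens to implement.
class Guest
  def name
    "Guest"
  end
end

# The acronym is treated as a word: HtmlReport, not HTMLReport.
class HtmlReport
  def initialize(users)
    @users = users
  end

  def greetings
    # The block parameter is the singular of the collection name.
    @users.map { |user| "Hello, #{user.name}" }
  end
end
```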
## Organization
- Order methods so that caller methods are earlier in the file than the methods
they call.
- Order methods so that methods are as close as possible to other methods they
call.
================================================
FILE: git/README.md
================================================
# Git
A guide for programming within version control.
## Best Practices
- Avoid merge commits by using a [rebase workflow].
- Squash multiple trivial commits into a single commit.
- Write a [good commit message].
[rebase workflow]: https://github.com/thoughtbot/guides/blob/main/git/README.md#merge
[good commit message]: http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
## Maintain a Repo
- Avoid including files in source control that are specific to your development
machine or process.
- Delete local and remote feature branches after merging.
- Perform work in a feature branch.
- Rebase frequently to incorporate upstream changes.
- Use a [pull request] for code reviews.
[pull request]: https://help.github.com/articles/using-pull-requests/
## Write a Feature
Create a local feature branch based off `main`.
```console
git checkout main
git pull
git checkout -b <branch-name>
```
Rebase frequently to incorporate upstream changes.
```console
git fetch origin
git rebase origin/main
```
Resolve conflicts. When the feature is complete and tests pass, stage the changes.
```console
git add --all
```
When you've staged the changes, commit them.
```console
git status
git commit --verbose
```
Write a [good commit message]. Example format:
```text
Present-tense summary under 50 characters
- More information about commit (under 72 characters).
- More information about commit (under 72 characters).
http://project.management-system.com/ticket/123
```
If you've created more than one commit, [use `git rebase` interactively] to
squash them into cohesive commits with good messages:
```console
git rebase -i origin/main
```
Share your branch.
```console
git push origin <branch-name>
```
Submit a [GitHub pull request].
Ask for a code review in the project's chat room.
[use `git rebase` interactively]: https://help.github.com/articles/about-git-rebase/
[github pull request]: https://help.github.com/articles/using-pull-requests/
## Review Code
A team member other than the author reviews the pull request. They follow [Code
Review](/code-review/) guidelines to avoid miscommunication.
They make comments and ask questions directly on lines of code in the GitHub web
interface or in the project's chat room.
For changes which they can make themselves, they check out the branch.
```console
git checkout <branch-name>
./bin/setup
git diff staging/main..HEAD
```
They make small changes right in the branch, test the feature on their machine,
run tests, commit, and push.
When satisfied, they comment on the pull request `Ready to merge.`
## Merge
Rebase interactively. Squash commits like "Fix whitespace" into one or a small
number of valuable commit(s). Edit commit messages to reveal intent. Run tests.
```console
git fetch origin
git rebase -i origin/main
```
Force push your branch. This allows GitHub to automatically close your pull
request and mark it as merged when your commit(s) are pushed to `main`. It also
makes it possible to [find the pull request] that brought in your changes.
```console
git push --force-with-lease origin <branch-name>
```
View a list of new commits. View changed files. Merge branch into `main`.
```console
git log origin/main..<branch-name>
git diff --stat origin/main
git checkout main
git merge <branch-name> --ff-only
git push
```
Delete your remote feature branch.
```console
git push origin --delete <branch-name>
```
Delete your local feature branch.
```console
git branch --delete <branch-name>
```
[find the pull request]: http://stackoverflow.com/a/17819027
================================================
FILE: graphql/README.md
================================================
# GraphQL
A guide for building GraphQL servers and clients.
## Learning
A curated list of resources for learning GraphQL.
- **[Official GraphQL Learning Site]**
- **[How To GraphQL]** Online tutorial for both server and client GraphQL in
multiple programming languages.
- **[Learning GraphQL]** A clear introduction to GraphQL technology and a walk
through of building a GraphQL server.
- **[GraphQL: A Query Language for your API]** Presentation introducing GraphQL
to thoughtbot.
[official graphql learning site]: https://graphql.org/learn/
[how to graphql]: https://www.howtographql.com/
[learning graphql]: http://shop.oreilly.com/product/0636920137269.do
[graphql: a query language for your api]: https://www.dropbox.com/s/svqe68hpdiixf0g/presentation.pdf?dl=0
## Public GraphQL APIs
Publicly available GraphQL APIs allowing you to explore how GraphQL is and can
be used.
- **[GitHub GraphQL API Explorer]**
- **[Star Wars GraphQL]**
[github graphql api explorer]: https://developer.github.com/v4/explorer/
[star wars graphql]: https://graphql.org/swapi-graphql/
## Tools
- **[GraphiQL]** An Electron-based "web IDE" for interacting with GraphQL APIs.
GraphiQL can also be served as a page in an application.
- **[GraphQL Playground]** An Electron-based "web IDE" for interacting with
GraphQL APIs. Intends to expand upon GraphiQL.
- **[Insomnia]** An HTTP client with solid GraphQL support.
- **[Apollo Client Dev Tools]** Chrome Extension offering developer tools for
Apollo projects.
[graphiql]: https://github.com/graphql/graphiql
[graphql playground]: https://github.com/prisma/graphql-playground
[insomnia]: https://insomnia.rest/
[apollo client dev tools]: https://www.apollographql.com/docs/react/features/developer-tooling
## Best Practices
- Follow the latest version of the [GraphQL specification].
- When serving over HTTP, respond with a 200 OK status code to all GraphQL
queries.
- If a client or server error occurs, use the `errors` key in the GraphQL
response.
- If a user-facing error occurs (such as invalid user input), use the `data` key
in the GraphQL response.
- If a mutation can fail because of a user error, use a union type to describe
the possible outcomes.
- If there is an authenticated user, provide the user in the context for the
resolver.
- Provide the updated object as a field in mutations.
- Provide the ID of the deleted object as a field in mutations that delete
objects.
- Use JSON as a default transport format.
- Avoid returning null from operations. [#630]
[graphql specification]: https://graphql.github.io/graphql-spec/
[#630]: https://github.com/thoughtbot/guides/pull/630
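
The split between the `errors` key and the `data` key can be sketched with plain Ruby hashes. Nothing here comes from a GraphQL library: the helper names, the `createUser` mutation, and the `ValidationFailure` type are all assumptions made for illustration.

```ruby
require "json"

# Hypothetical helper: a client or server error goes under the `errors` key,
# still with a 200 status code.
def error_response(message)
  body = { "errors" => [{ "message" => message }] }
  [200, JSON.generate(body)]
end

# Hypothetical helper: invalid user input is an expected outcome, so it is
# returned under `data` as one member of a union type.
def invalid_input_response(field, message)
  body = {
    "data" => {
      "createUser" => {
        "__typename" => "ValidationFailure",
        "field" => field,
        "message" => message,
      },
    },
  }
  [200, JSON.generate(body)]
end
```

A client can then branch on `__typename` to tell a successful result from a validation failure without inspecting the `errors` key.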
================================================
FILE: html/README.md
================================================
# HTML
- Use the [W3C's Markup Validation Service][html-validator] to validate HTML
- Prefer double quotes for attributes.
- Use lowercase text for elements and attributes
- Use double quotes to wrap element attributes
- Use closing tags for all [normal elements]
- Prefer an HTML5 doctype
- Ensure elements are scoped properly
- Elements such as `` and `` must be placed within the page's
`` element
- Elements such as `