Repository: christophergandrud/Rep-Res-Book
Branch: master
Commit: 716e545e0764
Files: 90
Total size: 1.4 MB
Directory structure:
gitextract_fuj2c_ui/
├── .gitignore
├── Old/
│ ├── BookMake.R
│ ├── CoverGraphics/
│ │ └── 2ndEditionCover_v1/
│ │ ├── index.html
│ │ └── main.css
│ ├── EarlyOutline.md
│ ├── README.md
│ ├── Source-v2/
│ │ ├── .gitignore
│ │ ├── Children/
│ │ │ ├── Chapter1/
│ │ │ │ ├── chapter1.Rnw
│ │ │ │ └── chapter1.md
│ │ │ ├── Chapter10/
│ │ │ │ └── chapter10.Rnw
│ │ │ ├── Chapter11/
│ │ │ │ └── chapter11.Rnw
│ │ │ ├── Chapter12/
│ │ │ │ └── chapter12.Rnw
│ │ │ ├── Chapter13/
│ │ │ │ └── chapter13.Rnw
│ │ │ ├── Chapter14/
│ │ │ │ └── chapter14.Rnw
│ │ │ ├── Chapter2/
│ │ │ │ └── chapter2.Rnw
│ │ │ ├── Chapter3/
│ │ │ │ └── chapter3.Rnw
│ │ │ ├── Chapter4/
│ │ │ │ └── chapter4.Rnw
│ │ │ ├── Chapter5/
│ │ │ │ └── chapter5.Rnw
│ │ │ ├── Chapter6/
│ │ │ │ └── chapter6.Rnw
│ │ │ ├── Chapter7/
│ │ │ │ └── chapter7.Rnw
│ │ │ ├── Chapter8/
│ │ │ │ └── chapter8.Rnw
│ │ │ ├── Chapter9/
│ │ │ │ └── chapter9.Rnw
│ │ │ └── FrontMatter/
│ │ │ ├── AdditionalResources/
│ │ │ │ └── AdditionalResources.Rnw
│ │ │ ├── Packages.Rnw
│ │ │ ├── Preface.Rnw
│ │ │ ├── StylisticConventions.md
│ │ │ └── rep-res-PackagesCited.bib
│ │ ├── Rep-Res-Parent.Rnw
│ │ ├── Rep-Res-Parent.toc
│ │ ├── krantz.cls
│ │ └── rep-res-book.bib
│ ├── SourceOld/
│ │ ├── Chapter1/
│ │ │ └── chapter1.Rmd
│ │ ├── Chapter10/
│ │ │ └── chapter10.Rmd
│ │ ├── Chapter11/
│ │ │ └── chapter11.Rmd
│ │ ├── Chapter12/
│ │ │ └── chapter12.Rmd
│ │ ├── Chapter13/
│ │ │ └── chapter13.Rmd
│ │ ├── Chapter14/
│ │ │ └── chapter14.Rmd
│ │ ├── Chapter2/
│ │ │ └── chapter2.Rmd
│ │ ├── Chapter3/
│ │ │ └── chapter3.Rmd
│ │ ├── Chapter4/
│ │ │ └── chapter4.Rmd
│ │ ├── Chapter5/
│ │ │ └── chapter5.Rmd
│ │ ├── Chapter6/
│ │ │ └── chapter6.Rmd
│ │ ├── Chapter7/
│ │ │ └── chapter7.Rmd
│ │ ├── Chapter8/
│ │ │ └── chapter8.Rmd
│ │ └── Chapter9/
│ │ └── chapter9.Rmd
│ └── Writing_Setup/
│ ├── Early_Book_Origins.md
│ ├── HeaderFooter/
│ │ ├── IndvChapterFoot.tex
│ │ └── IndvChapterHead.tex
│ ├── IndvChapter.sh
│ ├── IndvChapter1.Rnw
│ ├── OldScripts/
│ │ ├── ConvertRmdtoRnw.sh
│ │ └── Rmd_Book.sh
│ ├── ProductionNotes.md
│ ├── Rnw_Book.sh
│ └── TableofContentPDF/
│ ├── GandrudRep-Res-Book-TOC.fdb_latexmk
│ ├── GandrudRep-Res-Book-TOC.tex
│ └── krantz.cls
├── README.Rmd
├── README.md
└── rep-res-3rd-edition/
├── .gitignore
├── 01-author.Rmd
├── 01-stylistic-conventions.Rmd
├── 02-additional-resources.Rmd
├── 03-introduction.Rmd
├── 04-getting-started.Rmd
├── 05-start-R.Rmd
├── 06-file-management.Rmd
├── 07-storage.Rmd
├── 08-gather.Rmd
├── 09-clean.Rmd
├── 10-modeling.Rmd
├── 11-tables.Rmd
├── 12-figures.Rmd
├── 13-latex.Rmd
├── 14-web.Rmd
├── 16-conclusion.Rmd
├── 99-references.Rmd
├── LICENSE
├── README.md
├── _bookdown.yml
├── _output.yml
├── book.bib
├── css/
│ └── style.css
├── index.Rmd
├── krantz.cls
├── latex/
│ ├── after_body.tex
│ ├── before_body.tex
│ └── preamble.tex
├── packages.bib
└── rep-res-3rd-edition.Rproj
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
# Ignore the following files from Git version control tracking #
################################################################
.DS_Store
.Rhistory
*.RData
*.aux
*.latexmk
*.log
*.gz
.Rproj.user
package-lock.json
_publish.R
_book
_bookdown_files
rsconnect
================================================
FILE: Old/BookMake.R
================================================
#################
# Make file for the book Reproducible Research with R and RStudio
# Christopher Gandrud
# Updated: 30 March 2015
#################
# This R source code compiles the manuscript for the book Reproducible Research
# with R and RStudio.
# It also updates the main README file.
## Installing required packages
# Note: To install the R packages used to compile the book open the
# Source/Children/FrontMatter/Packages.Rnw.
# Find: doInstall <- FALSE in the code chunk labeled "FrontPackageCitations".
# Change the value `FALSE` to `TRUE` and run the code chunk.
# Load knitr package
library(knitr)
#### Specify working directories. Change as needed. ####
# Rep-Res-Parent.Rnw
ParentDirectory <- "/git_repositories/Rep-Res-Book/Source/"
# README.Rmd
SetupDirectory <- "/git_repositories/Rep-Res-Book/"
##### Create PDF Book Manuscript ####
# Compile the book's parent document
setwd(ParentDirectory)
knitr::knit2pdf(input = "Rep-Res-Parent.Rnw")
# Embed fonts
# This is largely for complete replication purposes only and is not necessary.
## If using Windows please see extrafont set up instructions at
# https://GitHub.com/wch/extrafont
# extrafont::embed_fonts("Rep-Res-Parent.pdf")
#### README ####
setwd(SetupDirectory)
knitr::knit(input = "README.Rmd", output = "README.md")
================================================
FILE: Old/CoverGraphics/2ndEditionCover_v1/index.html
================================================
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8">
<link rel="stylesheet" href="main.css" type="text/css" />
</head>
<body>
<div class="book-logo"></div>
</body>
</html>
================================================
FILE: Old/CoverGraphics/2ndEditionCover_v1/main.css
================================================
.book-logo {
width: 100%;
height: 80%;
padding: 20%;
margin: 5%;
position: relative;
}
.book-logo:before {
content: "";
position: absolute;
top: 0;
left: 0;
width: 1em;
height: 1em;
color: orange;
box-shadow:
7em 1em,
8em 1em,
9em 1em,
10em 1em,
11em 1em,
12em 1em,
13em 1em,
7em 2em,
13em 2em,
7em 3em,
8em 3em,
9em 3em,
10em 3em,
11em 3em,
12em 3em,
13em 3em,
10em 4em,
10em 5em,
10em 6em,
2em 7em,
3em 7em,
4em 7em,
5em 7em,
6em 7em,
7em 7em,
8em 7em,
9em 7em,
10em 7em,
11em 7em,
12em 7em,
13em 7em,
14em 7em,
15em 7em,
16em 7em,
17em 7em,
18em 7em,
2em 8em,
12em 8em,
18em 8em,
2em 9em,
12em 9em,
18em 9em,
2em 10em,
12em 10em,
13em 10em,
14em 10em,
15em 10em,
18em 10em,
19em 10em,
20em 10em,
21em 10em,
0em 11em,
1em 11em,
2em 11em,
3em 11em,
4em 11em,
12em 11em,
14em 11em,
15em 11em,
20em 11em,
21em 11em,
0em 12em,
4em 12em,
5em 12em,
6em 12em,
12em 12em,
0em 13em,
1em 13em,
2em 13em,
3em 13em,
4em 13em,
6em 13em,
12em 13em,
13em 13em,
14em 13em,
15em 13em,
1em 14em,
6em 14em,
7em 14em,
8em 14em,
9em 14em,
12em 14em,
14em 14em,
15em 14em,
1em 15em,
6em 15em,
8em 15em,
9em 15em,
12em 15em,
1em 16em,
2em 16em,
3em 16em,
4em 16em,
6em 16em,
12em 16em,
13em 16em,
14em 16em,
15em 16em,
1em 17em,
3em 17em,
4em 17em,
6em 17em,
7em 17em,
8em 17em,
9em 17em,
14em 17em,
15em 17em,
1em 18em,
8em 18em,
9em 18em,
1em 19em,
2em 19em,
3em 19em,
4em 19em,
1em 20em,
3em 20em,
4em 20em,
1em 21em,
1em 22em,
2em 22em,
3em 22em,
4em 22em,
3em 23em,
4em 23em;
}
================================================
FILE: Old/EarlyOutline.md
================================================
# Reproducible Research with R and RStudio: A workflow for data gathering, analysis, and document creation
## Updated Chapter Outline
### Christopher Gandrud
### 23 June 2012
---
## Part I: Getting Started ###
### 1 Introducing Reproducible Research ###
1.1 What is reproducible research?
1.2 Why should research be reproducible?
> 1.2.1 Benefits for the research community
1.2.2 Benefits for individual researchers
1.3 Who should read this book?
> 1.3.1 Students
1.3.2 Researchers
1.3.3 Industry specialists
1.3.4 Background skills
1.4 Why use **R**/**RStudio** for reproducible research?
> 1.4.1 The advantages of **R** and `knitr`
1.4.2 The advantages of **RStudio**
1.5 Book overview
> 1.5.1 How to read this book
1.5.2 Contents overview
### 2 Getting Started with Reproducible Research ###
2.1 The Big Picture: A workflow for reproducible research
> 2.1.1 Data gathering
2.1.2 Data analysis
2.1.3 Data presentation
2.2 Practical tips for reproducible research
> 2.2.1 Document everything
2.2.2 Everything is a (text) file
2.2.3 All files should tell you what they are
2.2.4 Research projects are many files tied together
2.2.5 Have a plan to organize, store, and make your text files available
2.3 Introduction to the tools of reproducible research covered in this book
> 2.3.1 **R**/**RStudio**
2.3.2 `knitr`
2.3.3 Cloud storage & versioning
2.3.4 The command-line
2.3.5 Markup languages: \\( LaTeX \\) & **Markdown**/HTML
### 3 Getting Started with R/RStudio ###
3.1 Installing **R** and **RStudio**
3.2 Using **R**
> 3.2.1 Objects
3.2.2 Functions, Commands, and Arguments
3.2.3 More resources
3.3 Using **RStudio**
### 4 Getting Started with File Management ###
4.1 Locating and organizing files
> 4.1.1 Working directories
4.1.2 File management with **RStudio** Projects
4.2 Formatting and Commenting: writing **R** code to aid reproducibility
> 4.2.1 Why use a style guide to format **R** code?
4.2.2 Why comment on your code?
4.3 Introduction to `knitr`
> 4.3.1 Code chunks
4.3.2 Global options
4.3.3 `knitr` for the web: **Markdown**/HTML
4.3.4 `knitr` for PDF: \\( LaTeX \\)
4.4 Text editors and **R**/**RStudio**
## Part II: Reproducible Data Gathering & Storage ##
### 5 Gathering Data with R ###
5.1 Importing locally stored data sets
> 5.1.1 Single files
5.1.2 Looping through multiple files
5.2 Importing data sets from the internet
> 5.2.1 Data from non-secure (`http`) URLs
5.2.2 Data from secure (`https`) URLs
5.2.3 Data APIs & feeds
5.3 Basic web scraping
> 5.3.1 Scraping Tables
5.3.2 Gathering and Parsing Text
### 6 Storing, Versioning, Collaborating, and Accessing Files ###
6.1 Saving data in reproducible formats
6.2 Storing data in the cloud
6.3 **Dropbox**
> 6.3.1 Storing
6.3.2 Versioning
6.3.3 Collaborating
6.3.4 Accessing
6.4 **Dataverse**
> 6.4.1 Storing
6.4.2 Versioning
6.4.3 Collaborating
6.4.4 Accessing
6.5 **GitHub**
> 6.5.1 Storing
6.5.2 Versioning
6.5.3 Collaborating
6.5.4 Accessing
6.6 Citing data stored in the cloud
### 7 Preparing Data for Analysis ###
7.1 Cleaning data for merging
7.2 Sorting data
7.3 Merging data sets
7.4 Subsetting data
## Part III: Analysis and Results ##
### 8 Statistical Modelling and `knitr` ###
8.1 Incorporating analyses into the markup
> 8.1.1 Full code in the main document
- \\( LaTeX \\)
- **Markdown**
8.1.2 Sourcing R code from another file
- \\( LaTeX \\)
- **Markdown**
8.2 Saving output objects for future use
8.3 Including highlighted syntax in the output
> - \\( LaTeX \\)
- **Markdown**
8.4 Debugging
### 9 Showing Results with Tables ###
9.1 Table basics
> 9.1.1 Tables in \\( LaTeX \\)
9.1.2 Tables in **Markdown** and HTML
9.1.3 Basic `knitr` syntax for tables
9.2 Creating tables from **R** objects
> 9.2.1 `xtable` basics with supported class objects
- `xtable` for \\( LaTeX \\)
- `xtable` for **Markdown**
9.2.2 `xtable` for non-supported class objects
9.3 Tables with `apsrtable`
### 10 Showing Results with Figures ###
10.1 Basic `knitr` syntax for figures
10.2 Plots with `plot` and `ggplot2`
10.3 Animations
10.4 Motion charts and basic maps with `googleVis`
10.5 Advanced maps with `ggmap`
## Part IV: Final Documents ##
### 11 Presenting with \\( LaTeX \\) ###
11.1 The Basics
> 11.1.1 The Header
11.1.2 Headings
11.1.3 Footnotes and bibliographies
11.1.4 Math
11.1.5 Drawing figures with Ti*k*Z
11.2 Articles
> 11.2.1 Planning reproducible articles
11.2.2 Compiling \\( LaTeX \\) articles in **R** and **RStudio**
11.3 Presentations with Beamer
### 12 Large \\( LaTeX \\) Documents: Theses, books and batch reports ###
12.1 Planning large documents
> 12.1.1 Planning reproducible theses and books
12.1.2 Planning reproducible batch reports
12.2 Combining chapters
> 12.2.1 Presenting in parts
12.2.2 Parent documents
12.2.3 Child documents
- In line output with `\Sexpr{}`
- Custom title pages in \\( LaTeX \\)
12.3 Batch reports with `knitr` and the command-line
> 12.3.1 The data
12.3.2 The markup
12.3.3 The makefile
### 13 Presenting on the Web and Beyond with Markdown/HTML ###
13.1 The Basics
> 13.1.1 Headings
13.1.2 Footnotes and bibliographies with **MultiMarkdown**
13.1.3 Math
13.1.4 Drawing figures with CSS
13.2 Simple webpages
> 13.2.1 RPubs
13.2.2 Hosting webpages with Dropbox
13.3 Presentations with `Slidify`
13.4 Reproducible Websites
> 13.4.1 Blogging with Tumblr
13.4.2 Jekyll-Bootstrap and GitHub
13.4.3 Jekyll and GitHub
13.5 Using **Markdown** for non-HTML output with **Pandoc**
### 14 Other Resources for Reproducible Research ###
================================================
FILE: Old/README.md
================================================
# The Old Directory
This folder contains obsolete files that were used in earlier versions of the book.
================================================
FILE: Old/Source-v2/.gitignore
================================================
# Ignore LaTeX compile byproduct files #
########################################
*.aux
*.bbl
*.blg
cache/*
.DS_Store
figure/*
*.idx
*.ilg
*.ind
*.lof
*.log
*.lot
*.pdf
*.RData
.Rhistory
*.tex
# Don't ignore these files
!Children/FrontMatter/AdditionalResources/imagesExamp/ExampDiagram.tex
!Children/FrontMatter/AdditionalResources/imagesExamp/FileTree.tex
!Children/Chapter2/images2/WorkFlowLinks.tex
!Children/Chapter3/images3/KnitrProcess.tex
!Children/Chapter4/images4/ExampleFilePath.tex
================================================
FILE: Old/Source-v2/Children/Chapter1/chapter1.Rnw
================================================
% Chapter 1 For Reproducible Research in R and RStudio
% Christopher Gandrud
% Created: 16/07/2012 05:45:03 pm CEST
% Updated: 5 May 2015
\chapter{Introducing Reproducible Research}\label{Intro}
Research is often presented in very selective containers: slideshows, journal articles, books, or maybe even websites. These presentation documents announce a project's findings and try to convince us that the results are correct \cite[]{Mesirov2010}. It's important to remember that these documents are not the research. Especially in the computational and statistical sciences, these documents are the ``advertising''. The research is the ``full software environment, code, and data that produced the results'' \cite[385]{Buckheit1995,Donoho2010}. When we separate the research from its advertisement we are making it difficult for others to verify the findings by reproducing them.
This book gives you the tools to dynamically combine your research with the presentation of your findings. The first tool is a workflow for reproducible research that weaves the principles of reproducibility throughout your entire research project, from data gathering to the statistical analysis, and the presentation of results. You will also learn how to use a number of computer tools that make this workflow possible. These tools include:
\begin{itemize}
\item the \textbf{R} statistical language that will allow you to gather data and analyze it;
\item the \textbf{LaTeX} and \textbf{Markdown} markup languages that you can use to create documents--slideshows, articles, books, and webpages--for presenting your findings;
\item the {\emph{knitr}} and \emph{rmarkdown} \textbf{packages} for R and other tools, including \textbf{command-line shell programs} like GNU Make and Git version control, for dynamically tying your data gathering, analysis, and presentation documents together so that they can be easily reproduced;
\item \textbf{RStudio}, a program that brings all of these tools together in one place.
\end{itemize}
%%%%%%%%%%%%%% What is reproducible research? %%%%%%%%%%%%%
\section{What Is Reproducible Research?}
\index{reproducible research|(}
\index{replication|(}
Though there is some debate over what the necessary and sufficient conditions for a replication are \cite[2]{Makel2014}, research results are generally considered \emph{replicable} if there is sufficient information available for independent researchers to make the same findings using the same procedures with new data.\footnote{This is close to what \cite{Lykken1968} calls ``operational replication''.} For research that relies on experiments, this can mean a researcher not involved in the original research being able to rerun the experiment, including sampling, and validate that the new results are comparable to the original ones. In computational and quantitative empirical sciences, results are replicable if independent researchers can recreate findings by following the procedures originally used to gather the data and run the computer code. Of course, it is sometimes difficult to replicate the original data set because of issues such as limited resources to gather new data or because the original study already sampled the full universe of cases. \index{replication|)} So as a next-best standard we can aim for ``\emph{really reproducible research}'' \cite[1226]{Peng2011}.\footnote{The idea of really reproducible computational research was originally conceived and implemented by Jon Claerbout\index{Jon Claerbout} and the Stanford Exploration Project beginning in the 1980s and early 1990s \cite[]{Fomel2009,Donoho2009}. Further seminal advances were made by Jonathan B. Buckheit and David L. Donoho, who created the Wavelab library of MATLAB\index{MATLAB} routines for their research on wavelets in the mid-1990s \cite[]{Buckheit1995}.} In computational sciences\footnote{Reproducibility is important for both quantitative and qualitative research \cite[]{King1994}. Nonetheless, we will focus mainly on methods for reproducibility in quantitative computational research.} this means:
\begin{quote}
the data and code used to make a finding are available and they are sufficient for an independent researcher to recreate the finding.
\end{quote}
In practice, research needs to be {\emph{easy}} for independent researchers to reproduce \cite[]{Ball2012}. If a study is difficult to reproduce it's more likely that no one will reproduce it. If someone does attempt to reproduce this research, it will be difficult for them to tell if any errors they find were in the original research or problems they introduced during the reproduction. In this book you will learn how to avoid these problems.
In particular you will learn tools for dynamically ``{\emph{knitting}}''\index{knit}\footnote{Much of the reproducible computational research and literate programming literatures have traditionally used the term ``weave''\index{weave} to describe the process of combining source code and presentation documents \cite[see][101]{Knuth1992}. In the R community weave is usually used to describe the combination of source code and LaTeX documents. The term ``knit'' reflects the vocabulary of the {\emph{knitr}} R package\index{knitr} (knit + R). It is used more generally to describe weaving with a variety of markup languages. The term is used by RStudio if you are using the \emph{rmarkdown}\index{rmarkdown} package, which is similar to \emph{knitr}. We also cover the \emph{rmarkdown} package in this book. Because of this, I use the term knit rather than weave in this book.} the data and the source code together with your presentation documents. Combined with well-organized source files and clearly and completely commented code, independent researchers will be able to understand how you obtained your results. This will make your computational research easily reproducible.
%%%%%%%%%%%%%% Why should research be reproducible? %%%%%%%%%%%%%
\section{Why Should Research Be Reproducible?}
Reproducible research is one of the main components of science. If that's not enough reason for you to make your research reproducible, consider that the tools of reproducible research also have direct benefits for you as a researcher.
\subsection{For science}
Replicability has been a key part of scientific inquiry since perhaps the 1200s \cite[]{Bacon1267,Nosek2012}. It has even been called the ``demarcation between science and non-science'' \cite[2]{Braude1979}. Why is replication so important for scientific inquiry?
\paragraph{Standard to judge scientific claims}
\emph{Replication} opens claims to scrutiny, allowing us to keep what works and discard what doesn't. Science, according to the American Physical Society, ``is the systematic enterprise of gathering knowledge \ldots organizing and condensing that knowledge into testable laws and theories''. The ``ultimate standard'' for evaluating scientific claims is whether or not the claims can be replicated \cite[]{Peng2011,Kelly2006}. Research findings cannot even really be considered ``genuine contribution[s] to human knowledge'' until they have been verified through replication \cite[38]{Stodden2009}. Replication ``requires the complete and open exchange of data, procedures, and materials''. Scientific conclusions that are not replicable should be abandoned or modified ``when confronted with more complete or reliable \ldots evidence''.\footnote{See the American Physical Society's website at \url{http://www.aps.org/policy/statements/99_6.cfm}. See also \cite{Fomel2009}.}
\emph{Reproducibility enhances replicability}. If other researchers are able to clearly understand how a finding was originally made, then they will be better able to conduct comparable research in meaningful attempts to replicate the original findings. Sometimes strict replicability is not feasible, for example, when it is only possible to gather one data set on a population of interest. In these cases reproducibility is a ``minimum standard'' for judging scientific claims \citep{Peng2011}.
It is important to note that though reproducibility is a minimum standard for judging scientific claims, ``a study can be reproducible and still be wrong'' \citep{Peng2014}. For example, a statistically significant finding in one study may remain statistically significant when reproduced using the original data/code, but when researchers try to replicate it using new data and even methods, they are unable to find a similar result. The original finding could simply have been noise, even though it is fully reproducible.
\paragraph{Avoiding effort duplication \& encouraging cumulative knowledge development}
Not only is reproducibility important for evaluating scientific claims, it can also contribute to the cumulative growth of scientific knowledge \citep{Kelly2006,King1995}. Reproducible research cuts down on the amount of time scientists have to spend gathering data or developing procedures that have already been collected or figured out. Because researchers do not have to discover on their own things that have already been done, they can more quickly build on established findings and develop new knowledge.
\subsection{For you}
Working to make your research reproducible does require extra upfront effort. For example, you need to put effort into learning the tools of reproducible research by doing things such as reading this book. But beyond the clear benefits for science, why should you make this effort? Using reproducible research tools can make your research process more effective and (hopefully) ultimately easier.
\paragraph{Better work habits}
Making a project reproducible from the start encourages you to use better work habits. It can spur you to more effectively plan and organize your research. It should push you to bring your data and source code up to a higher level of quality than you might if you ``thought `no one was looking'\thinspace'' \cite[386]{Donoho2010}. This forces you to root out errors--a ubiquitous part of computational research--earlier in the research process \cite[385]{Donoho2010}. Clear documentation also makes it easier to find errors.\footnote{Of course, it's important to keep in mind that reproducibility is ``neither necessary nor sufficient to prevent mistakes'' \cite[]{Stodden2009b}.}
Reproducible research needs to be stored so that other researchers can actually access the data and source code. By taking steps to make your research accessible for others you are also making it easier for yourself to find your data and methods when you revise your work or begin a new project. You are avoiding personal effort duplication, allowing you to cumulatively build on your own work more effectively.
\paragraph{Better teamwork}
The steps you take to make sure an independent researcher can figure out what you have done also make it easier for your collaborators to understand your work and build on it. This applies not only to current collaborators, but also future collaborators. Bringing new members of a research team up to speed on a cumulatively growing research project is faster if they can easily understand what has been done already \cite[386]{Donoho2010}.
\paragraph{Changes are easier}
A third person may or may not actually reproduce your research even if you make it easy for them to do so. But, {\emph{you will almost certainly reproduce parts or even all of your own research}}. No actual research process is completely linear. You almost never gather data, run analyses, and present your results without going backwards to add variables, make changes to your statistical models, create new graphs, alter results tables in light of new findings, and so on. You will probably try to make these changes long after you last worked on the project and long after you have forgotten the details of how you did it. Whether your changes are prompted by journal reviewers' and conference participants' comments or by the discovery that new and better data has been made available since beginning the project, designing your research to be reproducible from the start makes it much easier to change things later on.
Dynamic reproducible documents in particular can make changing things much easier. Changes made to one part of a research project have a way of cascading through the other parts. For example, adding a new variable to a largely completed analysis requires gathering new data and merging it with existing data sets. If you used data imputation or matching methods you may need to rerun these models. You then have to update your main statistical analyses, and recreate the tables and graphs you used to present the results. Adding a new variable essentially forces you to reproduce large portions of your research. If when you started the project you used tools that make it easier for others to reproduce your research, you also made it easier to reproduce the work yourself. You will have taken steps to have a ``better relationship with [your] future [self]'' \cite[2]{Bowers2011}.
\paragraph{Higher research impact}\index{impact, research}
Reproducible research is more likely to be useful for other researchers than non-reproducible research. Useful research is cited more frequently \cite[]{Donoho2002,Piwowar2007,Vandewalle2012}. Research that is fully reproducible contains more information, i.e. more reasons to use and cite it, than presentation documents merely showing findings. Independent researchers may use the reproducible data or code to look at other, often unanticipated, questions. When they use your work for a new purpose they will (should) cite your work. Because of this, Vandewalle et al. even argue that ``the goal of reproducible research is to have more impact with our research'' \citeyearpar[1253]{Vandewalle2007}.
A reason researchers often avoid making their research fully reproducible is that they are afraid other people will use their data and code to compete with them. I'll let Donoho et al. address this one:
\begin{quote}
\emph{True. But competition means that strangers will read your papers, try to learn from them, cite them, and try to do even better. If you prefer obscurity, why are you publishing?} \citeyearpar[16]{Donoho2009}
\end{quote}
\index{reproducible research|)}
\section{Who Should Read This Book?}
This book is intended primarily for researchers who want to use a systematic workflow that encourages reproducibility as well as practical state-of-the-art computational tools to put this workflow into practice. These people include professional researchers as well as upper-level undergraduate and graduate students working on computational data-driven projects. Hopefully, editors at academic publishers will also find the book useful for improving their ability to evaluate and edit reproducible research.
The more researchers who use the tools of reproducibility the better. So I include enough information in the book for people who have very limited experience with these tools, including limited experience with R, LaTeX, and Markdown. They will be able to start incorporating reproducible research tools into their workflow right away. The book will also be helpful for people who already have general experience using technologies such as R and LaTeX, but would like to know how to tie them together for reproducible research.
\subsection{Academic researchers}
Hopefully so far in this chapter I've convinced you that reproducible research has benefits for you as a member of the scientific community and personally as a computational researcher. This book is intended to be a practical guide for how to actually make your research reproducible. Even if you already use tools such as R and LaTeX you may not be leveraging their full potential. This book will teach you useful ways to get the most out of them as part of a reproducible research workflow.
\subsection{Students}
Upper-level undergraduate and graduate students conducting original computational research should make their research reproducible for the same reasons that professional researchers should. Forcing yourself to clearly document the steps you took will also encourage you to think more clearly about what you are doing and reinforce what you are learning. It will hopefully give you a greater appreciation of research accountability and integrity early in your career \cite[183]{Barr2012,Ball2012}.
Even if you don't have extensive experience with computer languages, this book will teach you specific habits and tools that you can use throughout your student research and hopefully your careers. Learning these things earlier will save you considerable time and effort later.
\subsection{Instructors}
When instructors incorporate the tools of reproducible research into their assignments they not only build students' understanding of research best practice, but are also better able to evaluate and provide meaningful feedback on students' work \cite[183]{Ball2012}. This book provides a resource that you can use with students to put reproducibility into practice.
If you are teaching computational courses, you may also benefit from making your lecture material dynamically reproducible. Your slides will be easier to update for the same reasons that it is easier to update research. Making the methods you used to create the material available to students will give them more information. Clearly documenting how you created lecture material can also pass information on to future instructors.
\subsection{Editors}
Beyond a lack of reproducible research skills among researchers, an impediment to actually creating reproducible research is a lack of infrastructure to publish it \cite[]{Peng2011}. Hopefully, this book will be useful for editors at academic publishers who want to be better at evaluating reproducible research, editing it, and developing systems to make it more widely available. The journal {\emph{Biostatistics}} is a good example of a publication that is encouraging (actually requiring) reproducible research. Since 2009 the journal has had an editor for reproducibility who ensures replication files are available and that results can be replicated using these files \cite[]{Peng2009}. The more editors there are with the skills to work with reproducible research the more likely it is that researchers will do it.
\subsection{Private sector researchers}
Researchers in the private sector may or may not want to make their work easily reproducible outside of their organization. However, that does not mean that significant benefits cannot be gained from using the methods of reproducible research. First, even if public reproducibility is ruled out to guard proprietary information,\footnote{There are ways to enable some public reproducibility without revealing confidential information. See \cite{Vandewalle2007} for a discussion of one approach.} making your research reproducible to members of your organization can spread valuable information about how analyses were done and data was collected. This will help build your organization's knowledge and avoid effort duplication. Just as a lack of reproducibility hinders the spread of information in the scientific community, it can hinder it inside of a private organization. Using the sort of dynamic automated processes run with clearly documented source code we will learn in this book can also help create robust data analysis methods that help your organization avoid errors that may come from cutting-and-pasting data across spreadsheets.\footnote{See this post by David Smith about how the J.P. Morgan\index{JP Morgan} ``London Whale''\index{London Whale} problem may have been prevented with the type of processes covered in this book: \url{http://blog.revolutionanalytics.com/2013/02/did-an-excel-error-bring-down-the-london-whale.html} (posted 11 February 2013).}
Also, the tools of reproducible research covered in this book enable you to create professional standardized reports that can be easily updated or changed when new information is available. In particular, you will learn how to create batch reports based on quantitative data.
%%%%%%%%%%%%%%%%% The Tools of Reproducible Research %%%%%%%%%%%%%%%
\section{The Tools of Reproducible Research}
This book will teach you the tools you need to make your research highly reproducible. Reproducible research involves two broad sets of tools. The first is a {\bf{reproducible research environment}}\index{reproducible research!environment} that includes the statistical tools you need to run your analyses as well as ``the ability to automatically track the provenance of data, analyses, and results and to package them (or pointers to persistent versions of them) for redistribution''. The second set of tools is a {\bf{reproducible research publisher}}\index{reproducible research!publisher}, which prepares dynamic documents for presenting results and is easily linked to the reproducible research environment \cite[415]{Mesirov2010}.
In this book we will focus on learning how to use the widely available and highly flexible reproducible research environment--R/RStudio \cite[]{RLanguage,RStudioCite}.\footnote{The book was created with R version \Sexpr{paste0(version$major, '.', version$minor)} and developer builds of RStudio version 0.99.370.} R/RStudio can be linked to numerous reproducible research publishers such as LaTeX and Markdown with Yihui Xie's {\emph{knitr}} package \citeyearpar{R-knitr} or the related \emph{rmarkdown} package \citep{R-rmarkdown}. The main tools covered in this book include:
\begin{itemize}
\item {\bf{R}}: a programming language primarily for statistics and graphics. It can also be useful for data gathering and creating presentation documents.
\item {\bf{{\emph{knitr} and {\emph{rmarkdown}}}}}: related R packages for literate programming\index{literate programming}. They allow you to combine your statistical analysis and the presentation of the results into one document. They work with R and a number of other languages such as Bash, Python, and Ruby.
\item {\bf{Markup languages}}: instructions for how to format a presentation document. In this book we cover LaTeX, Markdown, and a little HTML.
\item {\bf{RStudio}}: an integrated development environment (IDE)\index{integrated developer environment} for R that tightly combines R, {\emph{knitr}}, \emph{rmarkdown}, and markup languages.
\item {\bf{Cloud storage \& versioning}}: Services such as Dropbox and Git/GitHub that can store data, code, and presentation files, save previous versions of these files, and make this information widely available.
\item {\bf{Unix-like shell programs}}\index{Unix-like shell program}: These tools are useful for working with large research projects.\footnote{In this book I cover the Bash shell for Linux\index{Linux} and Mac as well as Windows PowerShell\index{Windows PowerShell}.} They also allow us to use command-line tools including GNU Make for compiling projects and Pandoc, a program useful for converting documents from one markup language to another.
\end{itemize}
%%%%%%%%%%%%%%%%%%% Why use R, knitr, and RStudio for reproducible research? %%%%%%%%%%%%%%
\section{Why Use R, \emph{knitr}/\emph{rmarkdown}, and RStudio for Reproducible Research?}
\paragraph{Why R?}
Why use a statistical programming language like R for reproducible research? R has a very active development community that is constantly expanding what it is capable of. As we will see in this book, R enables researchers across a wide range of disciplines to gather data and run statistical analyses. Using the {\emph{knitr}} or \emph{rmarkdown} package, you can connect your R-based analyses to presentation documents created with markup languages\index{markup language} such as LaTeX and Markdown. This allows you to dynamically and reproducibly present results in articles, slideshows, and webpages.
The way you interact with R has benefits for reproducible research. In general you interact with R (or any other programming and markup language) by explicitly writing down your steps as source code\index{source code}. This promotes reproducibility more than your typical interactions with Graphical User Interface (GUI)\index{Graphical User Interface}\index{GUI} programs like\index{SPSS} SPSS\footnote{I know you can write scripts in statistical programs like SPSS, but doing so is not encouraged by the program's interface and you often have to learn multiple languages for writing scripts that run analyses, create graphics, and deal with matrices.} and Microsoft Word\index{Microsoft Word}. When you write R code and embed it in presentation documents created using markup languages, you are forced to explicitly state the steps you took to do your research. When you do research by clicking through drop-down menus in GUI programs, your steps are lost, or at least documenting them requires considerable extra effort. Also it is generally more difficult to dynamically embed your analysis in presentation documents created by GUI word processing programs in a way that will be accessible to other researchers both now and in the future. I'll come back to these points in Chapter \ref{GettingStartedRR}.
\paragraph{Why {\normalfont{knitr}} and {\normalfont{rmarkdown}}?}
Literate programming\index{literate programming} is a crucial part of reproducible quantitative research.\footnote{Donald Knuth\index{Donald Knuth} coined the term literate programming in the 1970s to refer to a source file that could be both run by a computer and ``woven'' with a formatted presentation document \cite[]{Knuth1992}.} Being able to directly link your analyses, your results, and the code you used to produce the results makes tracing your steps much easier. There are many different literate programming tools for a number of different programming languages.\footnote{A very interesting tool that is worth taking a look at for the Python\index{Python} programming language is HTML Notebooks\index{HTML Notebook} created with IPython.\index{IPython} For more details see \url{http://ipython.org/ipython-doc/dev/notebook/index.html}.} Previously, one of the most common tools for researchers using R and the LaTeX markup language was \emph{Sweave} \cite[]{Leisch2002}.\index{Sweave} The packages I am going to focus on in this book are newer and have more capabilities. They are called {\emph{knitr}}\index{knitr} and \emph{rmarkdown}. Why are we going to use these tools in this book and not \emph{Sweave} or some other tool?
The simple answer is that they are more capable than \emph{Sweave}. Both \emph{knitr} and \emph{rmarkdown} can work with markup languages other than LaTeX including Markdown and HTML. \emph{rmarkdown} can even output Microsoft Word documents.\index{Microsoft Word} They can work with programming languages other than R. They highlight R code\index{syntax highlighting} in presentation documents making it easier for your readers to follow.\footnote{Syntax highlighting uses different colors and fonts to distinguish different types of text.} They give you better control over the inclusion of graphics and can cache code chunks, i.e. save the output for later.\index{cache} \emph{knitr} has the ability to understand \emph{Sweave}-like syntax, so it will be easy to convert backwards to \emph{Sweave} if you want to.\footnote{Note that the Sweave-style syntax is not identical to actual \emph{Sweave} syntax. See Yihui Xie's discussion of the differences between the two at: \url{http://yihui.name/knitr/demo/sweave/}. \emph{knitr} has a function (\texttt{Sweave2knitr})\index{Sweave2knitr} for converting \emph{Sweave} to \emph{knitr} syntax.} You also have the choice to use much simpler and more straightforward syntax with {\emph{knitr}} and \emph{rmarkdown}.
\emph{knitr} and \emph{rmarkdown} have broadly similar capabilities and syntax. They both are literate programming tools that can produce presentation documents from multiple markup languages. They have almost identical syntax when used in Markdown.\index{Markdown} Their main difference is that they take different approaches to creating presentation documents. \emph{knitr} documents must be written using the markup language associated with the desired output. For example, with \emph{knitr}, LaTeX must be used to create PDF output documents and Markdown or HTML must be used to create webpages. \emph{rmarkdown} builds directly on \emph{knitr}, the key difference being that it uses the straightforward Markdown markup language to generate PDF, HTML, and MS Word documents.\footnote{It does this by relying on a tool called Pandoc \citep{Pandoc2014}.\index{Pandoc}}
Because you write with the simple Markdown syntax, \emph{rmarkdown} is generally easier to use. It has the advantage of being able to take the same markup document and output multiple types of presentation documents. Nonetheless, for complex documents like books and long articles or work that requires custom formatting, \emph{knitr} LaTeX is often preferable and extremely flexible, though the syntax is more complicated.
\paragraph{Why RStudio?}
\index{RStudio}Why use the RStudio integrated development environment for reproducible research? R by itself has the capabilities necessary to gather data, analyze it, and, with a little help from {\emph{knitr}}/\emph{rmarkdown} and markup languages, present results in a way that is highly reproducible. RStudio allows you to do all of these things, but simplifies many of them and allows you to navigate through them more easily. It also is a happy medium between R's text-based interface and a pure GUI.
RStudio not only makes many of the things that R can do easier, it is also a very good standalone editor for writing documents with LaTeX and Markdown. For LaTeX documents it can, for example, insert frequently used commands like \texttt{\textbackslash{}section\{\}} for numbered sections (see Chapter \ref{LatexChapter}).\footnote{If you are more comfortable with a what-you-see-is-what-you-get (WYSIWYG)\index{WYSIWYG} word processor like Microsoft Word, you might be interested in exploring Lyx\index{Lyx}. It is a WYSIWYG-like LaTeX editor that works with {\emph{knitr}}. It doesn't work with the other markup languages covered in this book. For more information see: \url{http://www.lyx.org/}. I give some brief information on using Lyx with \emph{knitr} in Chapter 3's Appendix.} There are many LaTeX editors available, both open source and paid. But RStudio is currently the best program for creating reproducible LaTeX and Markdown documents. It has full syntax highlighting\index{syntax highlighting}. Its syntax highlighting can even distinguish between R code and markup commands in the same document. It can spell check LaTeX and Markdown documents. It handles {\emph{knitr}}/\emph{rmarkdown} code chunks\index{code chunk} beautifully (see Chapter \ref{GettingStartedRKnitr}).
Finally, RStudio not only has tight integration with various markup languages, it also has capabilities for using other tools such as C++, CSS, JavaScript, and a few other programming languages. It is closely integrated with the version control programs Git\index{Git} and SVN\index{SVN}. Both of these programs allow you to keep track of the changes you make to your documents (see Chapter \ref{Storing}). This is important for reproducible research since version control programs can document many of your research steps. It also has a built-in ability to make HTML slideshows from \emph{knitr}/\emph{rmarkdown} documents. Basically, RStudio makes it easy to create and navigate through complex reproducible research documents.
\subsection{Installing the main software}\label{InstallR}
Before you read this book you should install the main software. All of the software programs covered in this book are open source and can be easily downloaded for free. They are available for Windows\index{Windows}, Mac\index{Mac}, and Linux operating systems\index{Linux}. They should run well on most modern computers.
You should install R before installing RStudio. You can download the programs from the following websites:
\begin{itemize}
\item {\bf{R}}: \url{http://www.r-project.org/},
\item {\bf{RStudio Desktop (Open Source License)}}: \url{http://www.rstudio.com/products/rstudio/download/}.
\end{itemize}
\noindent The download webpages for these programs have comprehensive information on how to install them, so please refer to those pages for more information.
After installing R and RStudio you will probably also want to install a number of user-written packages that are covered in this book. To install all of these user-written packages, please see page \pageref{ReqPackages}.
\paragraph{Installing markup languages}\label{InstallMarkup}
If you are planning to create LaTeX documents you need to install a TeX distribution\index{TeX distribution}.\footnote{LaTeX is really a set of macros for the TeX typesetting system.\index{TeX} It is included in all major TeX distributions.} They are available for Windows, Mac, and Linux systems. They can be found at: \url{http://www.latex-project.org/ftp.html}. Please refer to that site for more installation information.
If you want to create Markdown documents you can separately install the {\emph{markdown}} package\index{R package!markdown} in R. You can do this the same way that you install any package in R, with the {\tt{install.packages}} command.\footnote{The exact command is: {\tt{install.packages("markdown")}}.}
\paragraph{GNU Make}
If you are using a Linux computer you already have GNU Make\label{InstallMake}\index{GNU Make} installed.\footnote{To verify this, open the Terminal\index{Terminal} and type: \texttt{make --version} (I used version 3.81 for this book). This should output details about the current version of Make installed on your computer.} Mac users will need to install the command-line developer tools.\index{command-line developer tools} There are two ways to do this. One is to go to the App Store\index{Apple App Store} and download Xcode (it's free).\index{Xcode} Once Xcode is installed, install the command-line tools, which you will find by opening Xcode then clicking on \texttt{Preferences} \textrightarrow \: \texttt{Downloads}. However, Xcode is a very large download and you only need the command-line tools for Make. To install just the command-line tools, open the Terminal\index{Terminal} and try to run Make by typing \texttt{make} and hitting return. A box should appear asking you if you want to install the command-line developer tools. Click \texttt{Install}. Windows users will have Make installed if they have already installed Rtools\index{Rtools} (see page \pageref{RtoolsDownload}). Mac and Windows users will need to install this software not only so that GNU Make runs properly, but also so that other command-line tools work well.
\paragraph{Other Tools}
We will discuss other tools such as Git that can be a useful part of a reproducible research workflow. Installation instructions for these tools will be discussed below.
%%%%%%%%%%%%%% Book Overview %%%%%%%%%%%%%%
\section{Book Overview}
This book describes a workflow for reproducible research that primarily uses R and RStudio, and it is designed to give you the tools you need to apply this workflow to your own research. It is not designed to be a complete reference for R, RStudio, {\emph{knitr}}/\emph{rmarkdown}, Git, or any other program that is a part of this workflow. Instead it shows you how these tools can fit together to make your research more reproducible. To help you get the most out of the individual programs, I will point you along the way to other resources that cover them in more detail.
To that end, I can recommend a number of resources that cover more of the nitty-gritty:\label{OtherBooks}
\begin{itemize}
\item Michael J. Crawley's \citeyearpar{Crawley2013} encyclopaedic R book, appropriately titled \emph{\textbf{The R Book}}, published by Wiley.
\item Hadley Wickham \citeyearpar{Whickham2014book} has a great new book out from Chapman and Hall on \emph{\textbf{Advanced R}}.
\item Yihui Xie's \citeyearpar{Xie2013} book \emph{\textbf{Dynamic Documents with R and knitr}}, published by Chapman and Hall, provides a comprehensive look at how to create documents with \emph{knitr}. It's a good complement to this book's generally more research project--level focus.
\item Norman Matloff's \citeyearpar{Matloff2011} tour through the programming language aspects of R called \emph{\textbf{The Art of R Programming: A Tour of Statistical Software Design}}, published by No Starch Press.
\item Cathy O'Neil and Rachel Schutt \citeyearpar{ONeil2013} give a great introduction to the field of data science generally in \emph{\textbf{Doing Data Science}}, published by O'Reilly Media Inc.
\item For an excellent introduction to the command-line\index{command-line} in Linux and Mac, see William E. Shotts Jr.'s \citeyearpar{ShottsJr2012} book \emph{\textbf{The Linux Command Line: A Complete Introduction}}, also published by No Starch Press. It is also helpful for Windows users running PowerShell (see Chapter \ref{DirectoriesChapter}).
\item The RStudio website (\url{http://www.rstudio.com/ide/docs/}) has a number of useful tutorials on how to use {\emph{knitr}} with LaTeX and Markdown. They also have very good documentation for \emph{rmarkdown} at \url{http://rmarkdown.rstudio.com/}.
\end{itemize}
That being said, my goal is for this book to be {\emph{self-sufficient}}. A reader without a detailed understanding of these programs will be able to understand and use the commands and procedures I cover in this book. While learning how to use R and the other programs I personally often encountered illustrative examples that included commands, variables, and other things that were not well explained in the texts that I was reading. This caused me to waste many hours trying to figure out, for example, what the \texttt{\$} is used for (preview: it's the component selector, see Section \ref{ComponentSelect}). I hope to save you from this wasted time by either providing a brief explanation of possibly frustrating and mysterious things and/or pointing you in the direction of good explanations.
\subsection{How to read this book}
This book gives you a workflow. It has a beginning, middle, and end. So, unlike a reference book, it can and should be read linearly as it takes you through an empirical research process from an empty folder to a completed set of documents that reproducibly showcase your findings.
That being said, readers with more experience using tools like R or LaTeX may want to skip over the nitty-gritty parts of the book that describe how to manipulate data frames or compile LaTeX documents into PDFs. Please feel free to skip these sections.
\paragraph{More-experienced R users}
If you are an experienced R user you may want to skip over the first section of Chapter \ref{GettingStartedRKnitr}: Getting Started with R, RStudio, and \emph{knitr}/\emph{rmarkdown}. But don't skip over the whole chapter. The latter parts contain important information on the {\emph{knitr}}/\emph{rmarkdown} packages. If you are experienced with R data manipulation you may also want to skip all of Chapter \ref{DataClean}.
\paragraph{More-experienced LaTeX users}
If you are familiar with LaTeX you might want to skip the first part of Chapter \ref{LatexChapter}. The second part may be useful as it includes information on how to dynamically create BibTeX bibliographies with \emph{knitr} and how to include \emph{knitr} output in a Beamer slideshow.
\paragraph{Less-experienced LaTeX/Markdown users}
If you do not have experience with LaTeX or Markdown you may benefit from reading, or at least skimming, the introductory chapters on these topics (chapters \ref{LatexChapter} and \ref{MarkdownChapter}) before reading Part III.
\subsection{Reproduce this book}
This book practices what it preaches. It can be reproduced. I wrote the book using the programs and methods that I describe. Full documentation and source files can be found at the book's GitHub\index{GitHub} repository. Feel free to read and even use (within reason and with attribution, of course) the book's source code. You can find it at: \url{https://GitHub.com/christophergandrud/Rep-Res-Book}. This is especially useful if you want to know how to do something in the book that I don't directly cover in the text.
If you notice any errors or places where the book can be improved please report them on the book's GitHub Issues page: \url{https://GitHub.com/christophergandrud/Rep-Res-Book/issues}. Corrections will be posted at: \url{http://christophergandrud.GitHub.io/RepResR-RStudio/errata.htm}.
\subsection{Contents overview}
The book is broken into four parts. The first part (chapters \ref{GettingStartedRR}, \ref{GettingStartedRKnitr}, and \ref{DirectoriesChapter}) gives an overview of the reproducible research workflow as well as the general computer skills that you'll need to use this workflow. Each of the next three parts of the book guides you through the specific skills you will need for each part of the reproducible research process. Part two (chapters \ref{Storing}, \ref{DataGather}, and \ref{DataClean}) covers the data gathering and file storage process. The third part (chapters \ref{StatsModel}, \ref{TablesChapter}, and \ref{FiguresChapter}) teaches you how to dynamically incorporate your statistical analysis, results figures, and tables into your presentation documents. The final part (chapters \ref{LatexChapter}, \ref{LargeDocs}, and \ref{MarkdownChapter}) covers how to create reproducible presentation documents including LaTeX articles, books, slideshows, and batch reports as well as Markdown webpages and slideshows.
================================================
FILE: Old/Source-v2/Children/Chapter1/chapter1.md
================================================
Introducing Reproducible Research {#Intro}
=================================
Research is often presented in very selective containers: slideshows,
journal articles, books, or maybe even websites. These presentation
documents announce a project's findings and try to convince us that the
results are correct [@Mesirov2010]. It's important to remember that
these documents are not the research. Especially in the computational
and statistical sciences, these documents are the "advertising". The
research is the "full software environment, code, and data that produced
the results" [@Buckheit1995; @Donoho2010 385]. When we separate the
research from its advertisement we are making it difficult for others to
verify the findings by reproducing them.
This book gives you the tools to dynamically combine your research with
the presentation of your findings. The first tool is a workflow for
reproducible research that weaves the principles of reproducibility
throughout your entire research project, from data gathering to the
statistical analysis, and the presentation of results. You will also
learn how to use a number of computer tools that make this workflow
possible. These tools include:
- the **R** statistical language that will allow you to gather data
and analyze it;
- the **LaTeX** and **Markdown** markup languages that you can use to
create documents--slideshows, articles, books, and webpages--for
presenting your findings;
- the *knitr* and *rmarkdown* **packages** for R and other tools,
including **command-line shell programs** like GNU Make and Git
version control, for dynamically tying your data gathering,
analysis, and presentation documents together so that they can be
easily reproduced;
- **RStudio**, a program that brings all of these tools together in
one place.
What Is Reproducible Research?
------------------------------
Though there is some debate over what are the necessary and sufficient
conditions for a replication [@Makel2014 2], research results are
generally considered *replicable* if there is sufficient information
available for independent researchers to make the same findings using
the same procedures with new data.[^1] For research that relies on
experiments, this can mean a researcher not involved in the original
research being able to rerun the experiment, including sampling, and
validate that the new results are comparable to the original ones. In
computational and quantitative empirical sciences, results are
replicable if independent researchers can recreate findings by following
the procedures originally used to gather the data and run the computer
code. Of course, it is sometimes difficult to replicate the original
data set because of issues such as limited resources to gather new data
or because the original study already sampled the full universe of
cases. So as a next-best standard we can aim for "*really reproducible
research*" [@Peng2011 1226].[^2] In computational sciences[^3] this
means:
> the data and code used to make a finding are available and they are
> sufficient for an independent researcher to recreate the finding.
In practice, research needs to be *easy* for independent researchers to
reproduce [@Ball2012]. If a study is difficult to reproduce it's more
likely that no one will reproduce it. If someone does attempt to
reproduce this research, it will be difficult for them to tell if any
errors they find were in the original research or problems they
introduced during the reproduction. In this book you will learn how to
avoid these problems.
In particular you will learn tools for dynamically "*knitting*"[^4] the
data and the source code together with your presentation documents.
Combined with well-organized source files and clearly and completely
commented code, independent researchers will be able to understand how
you obtained your results. This will make your computational research
easily reproducible.
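
As a rough illustration of this idea (a minimal sketch, not an example
taken from later chapters), the R code below computes a summary value and
then builds the sentence that reports it. The built-in `mtcars` data set,
the object names, and the wording of the sentence are all assumptions
chosen for illustration. In a *knitr* or *rmarkdown* document the same
computation would sit inside the presentation document itself, so the
reported number is regenerated every time the document is compiled.

```r
# Illustrative only: mtcars ships with R and stands in for real research data
mean_mpg <- mean(mtcars$mpg)

# Build the reporting sentence from the computed value rather than typing
# the number by hand, so the text cannot drift out of sync with the data
report_sentence <- sprintf(
  "The average fuel efficiency in the sample is %.1f miles per gallon.",
  mean_mpg
)

print(report_sentence)
```

If the underlying data change, rerunning the code updates the sentence
without any manual copying of results.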
Why Should Research Be Reproducible?
------------------------------------
Reproducible research is one of the main components of science. If
that's not enough reason for you to make your research reproducible,
consider that the tools of reproducible research also have direct
benefits for you as a researcher.
### For science
Replicability has been a key part of scientific inquiry from perhaps the
1200s [@Bacon1267; @Nosek2012]. It has even been called the "demarcation
between science and non-science" [@Braude1979 2]. Why is replication so
important for scientific inquiry?
##### Standard to judge scientific claims
*Replication* opens claims to scrutiny, allowing us to keep what works
and discard what doesn't. Science, according to the American Physical
Society, "is the systematic enterprise of gathering knowledge
...organizing and condensing that knowledge into testable laws and
theories". The "ultimate standard" for evaluating scientific claims is
whether or not the claims can be replicated [@Peng2011; @Kelly2006].
Research findings cannot even really be considered "genuine
contribution\[s\] to human knowledge" until they have been verified
through replication [@Stodden2009 38]. Replication "requires the
complete and open exchange of data, procedures, and materials".
Scientific conclusions that are not replicable should be abandoned or
modified "when confronted with more complete or reliable
...evidence".[^5]
*Reproducibility enhances replicability*. If other researchers are able
to clearly understand how a finding was originally made, then they will
be better able to conduct comparable research in meaningful attempts to
replicate the original findings. Sometimes strict replicability is not
feasible, for example, when it is only possible to gather one data set
on a population of interest. In these cases reproducibility is a
"minimum standard" for judging scientific claims [@Peng2011].
It is important to note that though reproducibility is a minimum
standard for judging scientific claims, "a study can be reproducible and
still be wrong" [@Peng2014]. For example, a statistically significant
finding in one study may remain statistically significant when
reproduced using the original data/code, but when researchers try to
replicate it using new data and even methods, they are unable to find a
similar result. The original finding could simply have been noise, even
though it is fully reproducible.
##### Avoiding effort duplication & encouraging cumulative knowledge development
Not only is reproducibility important for evaluating scientific claims,
it can also contribute to the cumulative growth of scientific knowledge
[@Kelly2006; @King1995]. Reproducible research cuts down on the amount
of time scientists have to spend gathering data or developing procedures
that have already been collected or figured out. Because researchers do
not have to discover on their own things that have already been done,
they can more quickly build on established findings and develop new
knowledge.
### For you
Working to make your research reproducible does require extra upfront
effort. For example, you need to put effort into learning the tools of
reproducible research by doing things such as reading this book. But
beyond the clear benefits for science, why should you make this effort?
Using reproducible research tools can make your research process more
effective and (hopefully) ultimately easier.
##### Better work habits
Making a project reproducible from the start encourages you to use
better work habits. It can spur you to more effectively plan and
organize your research. It should push you to bring your data and source
code up to a higher level of quality than you might if you "thought 'no
one was looking'" [@Donoho2010 386]. This forces you to root out
errors--a ubiquitous part of computational research--earlier in the
research process [@Donoho2010 385]. Clear documentation also makes it
easier to find errors.[^6]
Reproducible research needs to be stored so that other researchers can
actually access the data and source code. By taking steps to make your
research accessible for others you are also making it easier for
yourself to find your data and methods when you revise your work or
begin a new project. You are avoiding personal effort duplication,
allowing you to cumulatively build on your own work more effectively.
##### Better teamwork
The steps you take to make sure an independent researcher can figure out
what you have done also make it easier for your collaborators to
understand your work and build on it. This applies not only to current
collaborators, but also future collaborators. Bringing new members of a
research team up to speed on a cumulatively growing research project is
faster if they can easily understand what has been done already
[@Donoho2010 386].
##### Changes are easier
A third person may or may not actually reproduce your research even if
you make it easy for them to do so. But, *you will almost certainly
reproduce parts or even all of your own research*. No actual research
process is completely linear. You almost never gather data, run
analyses, and present your results without going backwards to add
variables, make changes to your statistical models, create new graphs,
alter results tables in light of new findings, and so on. You will
probably try to make these changes long after you last worked on the
project and long after you have forgotten the details of how you did it.
Whether your changes are because of journal reviewers' and conference
participants' comments or you discover that new and better data has been
made available since beginning the project, designing your research to
be reproducible from the start makes it much easier to change things
later on.
Dynamic reproducible documents in particular can make changing things
much easier. Changes made to one part of a research project have a way
of cascading through the other parts. For example, adding a new variable
to a largely completed analysis requires gathering new data and merging
it with existing data sets. If you used data imputation or matching
methods you may need to rerun these models. You then have to update your
main statistical analyses, and recreate the tables and graphs you used
to present the results. Adding a new variable essentially forces you to
reproduce large portions of your research. If when you started the
project you used tools that make it easier for others to reproduce your
research, you also made it easier to reproduce the work yourself. You
will have taken steps to have a "better relationship with \[your\]
future \[self\]" [@Bowers2011 2].
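
The cascade described above can be sketched in a few lines of R. This is only a minimal illustration with made-up data: the data frames `main_data` and `new_variable`, their columns, and the helper function `run_analysis` are all hypothetical and do not come from the examples used later in the book.

```r
# Hypothetical existing data set: one row per country
main_data <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  outcome = c(2.1, 3.9, 6.2, 7.8, 10.1),
  x1      = c(1, 2, 3, 4, 5)
)

# Hypothetical newly gathered variable for the same countries
new_variable <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  x2      = c(0, 1, 0, 1, 0)
)

# Keep the whole analysis in one function so that it can be rerun
# whenever the data or the model changes
run_analysis <- function(data, formula) {
  fit <- lm(formula, data = data)
  coef(summary(fit))  # coefficient table used for the results
}

# Original analysis
original_results <- run_analysis(main_data, outcome ~ x1)

# Adding the new variable: merge it in, then rerun the same steps
merged_data <- merge(main_data, new_variable, by = "country")
updated_results <- run_analysis(merged_data, outcome ~ x1 + x2)
```

Because every step is written down as source code, adding `x2` means changing one merge and one formula rather than repeating the analysis by hand.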
##### Higher research impact
Reproducible research is more likely to be useful for other researchers
than non-reproducible research. Useful research is cited more frequently
[@Donoho2002; @Piwowar2007; @Vandewalle2012]. Research that is fully
reproducible contains more information, i.e. more reasons to use and
cite it, than presentation documents merely showing findings.
Independent researchers may use the reproducible data or code to look at
other, often unanticipated, questions. When they use your work for a new
purpose they will (should) cite your work. Because of this, Vandewalle
et al. even argue that "the goal of reproducible research is to have
more impact with our research" [-@Vandewalle2007 1253].
A reason researchers often avoid making their research fully
reproducible is that they are afraid other people will use their data
and code to compete with them. I'll let Donoho et al. address this one:
> *True. But competition means that strangers will read your papers, try
> to learn from them, cite them, and try to do even better. If you
> prefer obscurity, why are you publishing?* [-@Donoho2009 16]
Who Should Read This Book?
--------------------------
This book is intended primarily for researchers who want to use a
systematic workflow that encourages reproducibility as well as practical
state-of-the-art computational tools to put this workflow into practice.
These people include professional researchers as well as upper-level
undergraduate and graduate students working on computational
data-driven projects. Hopefully, editors at academic publishers will
also find the book useful for improving their ability to evaluate and
edit reproducible research.
The more researchers who use the tools of reproducibility the better.
So I include enough information in the book for people who have very
limited experience with these tools, including limited experience with
R, LaTeX, and Markdown. They will be able to start incorporating
reproducible research tools into their workflow right away. The book
will also be helpful for people who already have general experience
using technologies such as R and LaTeX, but would like to know how to
tie them together for reproducible research.
### Academic researchers
Hopefully so far in this chapter I've convinced you that reproducible
research has benefits for you as a member of the scientific community
and personally as a computational researcher. This book is intended to
be a practical guide for how to actually make your research
reproducible. Even if you already use tools such as R and LaTeX you may
not be leveraging their full potential. This book will teach you useful
ways to get the most out of them as part of a reproducible research
workflow.
### Students
Upper-level undergraduate and graduate students conducting original
computational research should make their research reproducible for the
same reasons that professional researchers should. Forcing yourself to
clearly document the steps you took will also encourage you to think
more clearly about what you are doing and reinforce what you are
learning. It will hopefully give you a greater appreciation of research
accountability and integrity early in your career [@Barr2012; @Ball2012
183].
Even if you don't have extensive experience with computer languages,
this book will teach you specific habits and tools that you can use
throughout your student research and hopefully your careers. Learning
these things earlier will save you considerable time and effort later.
### Instructors
When instructors incorporate the tools of reproducible research into
their assignments they not only build students' understanding of
research best practice, but are also better able to evaluate and provide
meaningful feedback on students' work [@Ball2012 183]. This book
provides a resource that you can use with students to put
reproducibility into practice.
If you are teaching computational courses, you may also benefit from
making your lecture material dynamically reproducible. Your slides will
be easier to update for the same reasons that it is easier to update
research. Making the methods you used to create the material available
to students will give them more information. Clearly documenting how you
created lecture material can also pass information on to future
instructors.
### Editors
Beyond a lack of reproducible research skills among researchers, an
impediment to actually creating reproducible research is a lack of
infrastructure to publish it [@Peng2011]. Hopefully, this book will be
useful for editors at academic publishers who want to be better at
evaluating reproducible research, editing it, and developing systems to
make it more widely available. The journal *Biostatistics* is a good
example of a publication that is encouraging (actually requiring)
reproducible research. From 2009 the journal has had an editor for
reproducibility that ensures replication files are available and that
results can be replicated using these files [@Peng2009]. The more
editors there are with the skills to work with reproducible research the
more likely it is that researchers will do it.
### Private sector researchers
Researchers in the private sector may or may not want to make their work
easily reproducible outside of their organization. However, that does
not mean that significant benefits cannot be gained from using the
methods of reproducible research. First, even if public reproducibility
is ruled out to guard proprietary information,[^7] making your research
reproducible to members of your organization can spread valuable
information about how analyses were done and data was collected. This
will help build your organization's knowledge and avoid effort
duplication. Just as a lack of reproducibility hinders the spread of
information in the scientific community, it can hinder it inside of a
private organization. Using the sort of dynamic automated processes run
with clearly documented source code we will learn in this book can also
help create robust data analysis methods that help your organization
avoid errors that may come from cutting-and-pasting data across
spreadsheets.[^8]
Also, the tools of reproducible research covered in this book enable you
to create professional standardized reports that can be easily updated
or changed when new information is available. In particular, you will
learn how to create batch reports based on quantitative data.
The Tools of Reproducible Research
----------------------------------
This book will teach you the tools you need to make your research highly
reproducible. Reproducible research involves two broad sets of tools.
The first is a **reproducible research environment** that includes the
statistical tools you need to run your analyses as well as "the ability
to automatically track the provenance of data, analyses, and results and
to package them (or pointers to persistent versions of them) for
redistribution". The second set of tools is a **reproducible research
publisher**, which prepares dynamic documents for presenting results and
is easily linked to the reproducible research environment [@Mesirov2010
415].
In this book we will focus on learning how to use the widely available
and highly flexible reproducible research environment--R/RStudio
[@RLanguage; @RStudioCite].[^9] R/RStudio can be linked to numerous
reproducible research publishers such as LaTeX and Markdown with Yihui
Xie's *knitr* package [-@R-knitr] or the related *rmarkdown* package
[@R-rmarkdown]. The main tools covered in this book include:
- **R**: a programming language primarily for statistics and graphics.
It can also be useful for data gathering and creating presentation
documents.
- ***knitr* and *rmarkdown***: related R packages for literate
programming. They allow you to combine your statistical analysis and
the presentation of the results into one document. They work with R
and a number of other languages such as Bash, Python, and Ruby.
- **Markup languages**: instructions for how to format a presentation
document. In this book we cover LaTeX, Markdown, and a little HTML.
- **RStudio**: an integrated development environment (IDE) for R that
tightly combines R, *knitr*, *rmarkdown*, and markup languages.
- **Cloud storage & versioning**: Services such as Dropbox and
Git/GitHub that can store data, code, and presentation files, save
previous versions of these files, and make this information widely
available.
- **Unix-like shell programs**: These tools are useful for working
with large research projects.[^10] They also allow us to use
command-line tools including GNU Make for compiling projects and
Pandoc, a program useful for converting documents from one markup
language to another.
Why Use R, *knitr*/*rmarkdown*, and RStudio for Reproducible Research?
----------------------------------------------------------------------
##### Why R?
Why use a statistical programming language like R for reproducible
research? R has a very active development community that is constantly
expanding what it is capable of. As we will see in this book, R enables
researchers across a wide range of disciplines to gather data and run
statistical analyses. Using the *knitr* or *rmarkdown* package, you can
connect your R-based analyses to presentation documents created with
markup languages such as LaTeX and Markdown. This allows you to
dynamically and reproducibly present results in articles, slideshows,
and webpages.
The way you interact with R has benefits for reproducible research. In
general you interact with R (or any other programming and markup
language) by explicitly writing down your steps as source code. This
promotes reproducibility more than your typical interactions with
Graphical User Interface (GUI) programs like SPSS[^11] and Microsoft
Word. When you write R code and embed it in presentation documents
created using markup languages, you are forced to explicitly state the
steps you took to do your research. When you do research by clicking
through drop-down menus in GUI programs, your steps are lost, or at
least documenting them requires considerable extra effort. Also it is
generally more difficult to dynamically embed your analysis in
presentation documents created by GUI word processing programs in a way
that will be accessible to other researchers both now and in the future.
I'll come back to these points in Chapter
[\[GettingStartedRR\]](#GettingStartedRR){reference-type="ref"
reference="GettingStartedRR"}.
##### Why *knitr* and *rmarkdown*?
Literate programming is a crucial part of reproducible quantitative
research.[^12] Being able to directly link your analyses, your results,
and the code you used to produce the results makes tracing your steps
much easier. There are many different literate programming tools for a
number of different programming languages.[^13] Previously, one of the
most common tools for researchers using R and the LaTeX markup language
was *Sweave* [@Leisch2002]. The packages I am going to focus on in this
book are newer and have more capabilities. They are called *knitr* and
*rmarkdown*. Why are we going to use these tools in this book and not
*Sweave* or some other tool?
The simple answer is that they are more capable than *Sweave*. Both
*knitr* and *rmarkdown* can work with markup languages other than LaTeX
including Markdown and HTML. *rmarkdown* can even output Microsoft Word
documents. They can work with programming languages other than R. They
highlight R code in presentation documents making it easier for your
readers to follow.[^14] They give you better control over the inclusion
of graphics and can cache code chunks, i.e. save the output for later.
*knitr* has the ability to understand *Sweave*-like syntax, so it will
be easy to convert backwards to *Sweave* if you want to.[^15] You also
have the choice to use much simpler and more straightforward syntax with
*knitr* and *rmarkdown*.
*knitr* and *rmarkdown* have broadly similar capabilities and syntax.
They both are literate programming tools that can produce presentation
documents from multiple markup languages. They have almost identical
syntax when used in Markdown. Their main difference is that they take
different approaches to creating presentation documents. *knitr*
documents must be written using the markup language associated with the
desired output. For example, with *knitr*, LaTeX must be used to create
PDF output documents and Markdown or HTML must be used to create
webpages. *rmarkdown* builds directly on *knitr*, the key difference
being that it uses the straightforward Markdown markup language to
generate PDF, HTML, and MS Word documents.[^16]
Because you write with the simple Markdown syntax, *rmarkdown* is
generally easier to use. It has the advantage of being able to take the
same markup document and output multiple types of presentation
documents. Nonetheless, for complex documents like books and long
articles or work that requires custom formatting, *knitr* LaTeX is often
preferable and extremely flexible, though the syntax is more
complicated.
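
To make the idea of a literate programming document more concrete, here is a minimal sketch. It writes a tiny R Markdown file to a temporary location and, if *knitr* happens to be installed, knits it into plain Markdown. The file contents and the chunk label `cars-summary` are made up for illustration; `cars` is a data set that ships with R.

```r
# Three backticks open and close a code chunk in R Markdown
fence <- strrep("`", 3)

# Lines of a small R Markdown document: Markdown text plus one R chunk
rmd_lines <- c(
  "# A tiny dynamic document",
  "",
  "The text is written in Markdown. The analysis lives in a code chunk:",
  "",
  paste0(fence, "{r cars-summary}"),
  "summary(cars$speed)",
  fence,
  "",
  "The mean speed is `r mean(cars$speed)`."
)

# Save the document to a temporary .Rmd file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# Knit only if knitr is available; the result is a Markdown file with
# the chunk's output and the inline value filled in
md_file <- tempfile(fileext = ".md")
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::knit(rmd_file, output = md_file, quiet = TRUE)
}
```

The same source file could instead be passed to *rmarkdown* to produce HTML, PDF, or Word output, which is the difference in approach described above.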
##### Why RStudio?
Why use the RStudio integrated development environment for reproducible
research? R by itself has the capabilities necessary to gather data,
analyze it, and, with a little help from *knitr*/*rmarkdown* and markup
languages, present results in a way that is highly reproducible. RStudio
allows you to do all of these things, but simplifies many of them and
allows you to navigate through them more easily. It also is a happy
medium between R's text-based interface and a pure GUI.
RStudio not only makes many of the things that R can do easier,
it is also a very good standalone editor for writing documents
with LaTeX and Markdown. For LaTeX documents it can, for example, insert
frequently used commands like `\section{}` for numbered sections (see
Chapter [\[LatexChapter\]](#LatexChapter){reference-type="ref"
reference="LatexChapter"}).[^17] There are many LaTeX editors available,
both open source and paid. But RStudio is currently the best program for
creating reproducible LaTeX and Markdown documents. It has full syntax
highlighting. Its syntax highlighting can even distinguish between R
code and markup commands in the same document. It can spell check LaTeX
and Markdown documents. It handles *knitr*/*rmarkdown* code chunks
beautifully (see Chapter
[\[GettingStartedRKnitr\]](#GettingStartedRKnitr){reference-type="ref"
reference="GettingStartedRKnitr"}).
Finally, RStudio not only has tight integration with various markup
languages, it also has capabilities for using other tools such as C++,
CSS, JavaScript, and a few other programming languages. It is closely
integrated with the version control programs Git and SVN. Both of these
programs allow you to keep track of the changes you make to your
documents (see Chapter [\[Storing\]](#Storing){reference-type="ref"
reference="Storing"}). This is important for reproducible research since
version control programs can document many of your research steps. It
also has a built-in ability to make HTML slideshows from
*knitr*/*rmarkdown* documents. Basically, RStudio makes it easy to
create and navigate through complex reproducible research documents.
### Installing the main software {#InstallR}
Before you read this book you should install the main software. All of
the software programs covered in this book are open source and can be
easily downloaded for free. They are available for Windows, Mac, and
Linux operating systems. They should run well on most modern computers.
You should install R before installing RStudio. You can download the
programs from the following websites:
- **R**: <http://www.r-project.org/>,
- **RStudio Desktop (Open Source License)**:
<http://www.rstudio.com/products/rstudio/download/>.
The download webpages for these programs have comprehensive information
on how to install them, so please refer to those pages for more
information.
After installing R and RStudio you will probably also want to install a
number of user-written packages that are covered in this book. To
install all of these user-written packages, please see page .
##### Installing markup languages {#InstallMarkup}
If you are planning to create LaTeX documents you need to install a TeX
distribution.[^18] They are available for Windows, Mac, and Linux
systems. They can be found at: <http://www.latex-project.org/ftp.html>.
Please refer to that site for more installation information.
If you want to create Markdown documents you can separately install the
*markdown* package in R. You can do this the same way that you install
any package in R, with the `install.packages` command.[^19]
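
As a rough sketch, the following R code checks which of the packages named so far in this chapter are missing and installs only those. The vector `wanted` and the helper `find_missing` are hypothetical names chosen for this example, and the download step is limited to interactive sessions as a precaution.

```r
# Packages named in this chapter; extend the vector as needed
wanted <- c("knitr", "rmarkdown", "markdown")

# Return the packages in `pkgs` that are not yet installed
find_missing <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- find_missing(wanted)

# Install only in an interactive session to avoid surprise downloads
# when the code is run as part of a script
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```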
##### GNU Make
If you are using a Linux computer you already have GNU
Make[\[InstallMake\]]{#InstallMake label="InstallMake"} installed.[^20]
Mac users will need to install the command-line developer tools. There
are two ways to do this. One is to go to the App Store and download Xcode
(it's free). Once Xcode is installed, install the command-line tools, which
you will find by opening Xcode and then clicking on `Preferences` and then
`Downloads`. However, Xcode is a very large download and you only need
the command-line tools for Make. To install just the command-line tools,
open the Terminal and try to run Make by typing `make` and hitting
return. A box should appear asking you if you want to install the
command-line developer tools. Click `Install`. Windows users will have
Make installed if they have already installed Rtools (see page ). Mac
and Windows users will need to install this software not only so that
GNU Make runs properly, but also so that other command-line tools work
well.
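
If you would rather check from inside R than from the Terminal, something like the following sketch should work. It assumes that a program called `make` would be on your system's search path, and it is only a rough check: it reports where Make was found and the first line of its version information.

```r
# Sys.which() returns the full path to a program, or "" if it is not found
make_path <- Sys.which("make")

if (nzchar(make_path)) {
  message("Make found at: ", make_path)
  # Ask Make to report its version and show the first line of the output
  version_info <- system2("make", "--version", stdout = TRUE)
  message(version_info[1])
} else {
  message("Make was not found on the search path.")
}
```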
##### Other Tools
We will discuss other tools such as Git that can be a useful part of a
reproducible research workflow. Installation instructions for these
tools will be discussed below.
Book Overview
-------------
The purpose of this book is to give you the tools that you will need to
do reproducible research with R and RStudio. It describes a workflow for
reproducible research and shows you how to apply that workflow to your
own research. It is not designed to be a complete reference for R,
RStudio, *knitr*/*rmarkdown*, Git, or any other program that is a part
of this workflow. Instead it shows you how these tools can fit together
to make your research more reproducible. To get the most out of these
individual programs I will along the way point you to other resources
that cover these programs in more detail.
To that end, I can recommend a number of resources that cover more of
the nitty-gritty:[\[OtherBooks\]]{#OtherBooks label="OtherBooks"}
- Michael J. Crawley's [-@Crawley2013] encyclopaedic R book,
appropriately titled ***The R Book***, published by Wiley.
- Hadley Wickham [-@Whickham2014book] has a great new book out from
Chapman and Hall on ***Advanced R***.
- Yihui Xie's [-@Xie2013] book ***Dynamic Documents with R and
knitr***, published by Chapman and Hall, provides a comprehensive
look at how to create documents with *knitr*. It's a good complement
to this book's generally more research project--level focus.
- Norman Matloff's [-@Matloff2011] tour through the programming
language aspects of R called ***The Art of R Programming: A Tour of
Statistical Software Design***, published by No Starch Press.
- Cathy O'Neil and Rachel Schutt [-@ONeil2013] give a great
introduction to the field of data science generally in ***Doing Data
Science***, published by O'Reilly Media Inc.
- For an excellent introduction to the command-line in Linux and Mac,
see William E. Shotts Jr.'s [-@ShottsJr2012] book ***The Linux
Command Line: A Complete Introduction***, also published by No Starch
Press. It is also helpful for Windows users running PowerShell (see
Chapter
[\[DirectoriesChapter\]](#DirectoriesChapter){reference-type="ref"
reference="DirectoriesChapter"}).
- The RStudio website (<http://www.rstudio.com/ide/docs/>) has a
number of useful tutorials on how to use *knitr* with LaTeX and
Markdown. They also have very good documentation for *rmarkdown* at
<http://rmarkdown.rstudio.com/>.
That being said, my goal is for this book to be *self-sufficient*. A
reader without a detailed understanding of these programs will be able
to understand and use the commands and procedures I cover in this book.
While learning how to use R and the other programs I personally often
encountered illustrative examples that included commands, variables, and
other things that were not well explained in the texts that I was
reading. This caused me to waste many hours trying to figure out, for
example, what the `$` is used for (preview: it's the component selector,
see Section [\[ComponentSelect\]](#ComponentSelect){reference-type="ref"
reference="ComponentSelect"}). I hope to save you from this wasted time
by either providing a brief explanation of possibly frustrating and
mysterious things and/or pointing you in the direction of good
explanations.
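
As a quick preview of the `$` example, here is a small sketch using made-up objects. The data frame `survey` and the list `results` are hypothetical and exist only to show what the component selector does.

```r
# A small made-up data frame
survey <- data.frame(
  respondent = c("r1", "r2", "r3"),
  age        = c(25, 40, 31)
)

# `$` pulls one component (here a column) out of an object by name
ages <- survey$age
mean_age <- mean(ages)  # (25 + 40 + 31) / 3 = 32

# It works the same way on lists
results <- list(estimate = 0.5, n = 3)
results$n
```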
### How to read this book
This book gives you a workflow. It has a beginning, middle, and end. So,
unlike a reference book, it can and should be read linearly as it takes
you through an empirical research process from an empty folder to a
completed set of documents that reproducibly showcase your findings.
That being said, readers with more experience using tools like R or
LaTeX may want to skip over the nitty-gritty parts of the book that
describe how to manipulate data frames or compile LaTeX documents into
PDFs. Please feel free to skip these sections.
##### More-experienced R users
If you are an experienced R user you may want to skip over the first
section of Chapter
[\[GettingStartedRKnitr\]](#GettingStartedRKnitr){reference-type="ref"
reference="GettingStartedRKnitr"}: Getting Started with R, RStudio, and
*knitr*/*rmarkdown*. But don't skip over the whole chapter. The latter
parts contain important information on the *knitr*/*rmarkdown* packages.
If you are experienced with R data manipulation you may also want to
skip all of Chapter [\[DataClean\]](#DataClean){reference-type="ref"
reference="DataClean"}.
##### More-experienced LaTeX users
If you are familiar with LaTeX you might want to skip the first part of
Chapter [\[LatexChapter\]](#LatexChapter){reference-type="ref"
reference="LatexChapter"}. The second part may be useful as it includes
information on how to dynamically create BibTeX bibliographies with
*knitr* and how to include *knitr* output in a Beamer slideshow.
##### Less-experienced LaTeX/Markdown users
If you do not have experience with LaTeX or Markdown you may benefit
from reading, or at least skimming, the introductory chapters on these
topics (chapters
[\[LatexChapter\]](#LatexChapter){reference-type="ref"
reference="LatexChapter"} and
[\[MarkdownChapter\]](#MarkdownChapter){reference-type="ref"
reference="MarkdownChapter"}) before reading Part III.
### Reproduce this book
This book practices what it preaches. It can be reproduced. I wrote the
book using the programs and methods that I describe. Full documentation
and source files can be found at the book's GitHub repository. Feel free
to read and even use (within reason and with attribution, of course) the
book's source code. You can find it at:
<https://GitHub.com/christophergandrud/Rep-Res-Book>. This is especially
useful if you want to know how to do something in the book that I don't
directly cover in the text.
If you notice any errors or places where the book can be improved please
report them on the book's GitHub Issues page:
<https://GitHub.com/christophergandrud/Rep-Res-Book/issues>. Corrections
will be posted at:
<http://christophergandrud.GitHub.io/RepResR-RStudio/errata.htm>.
### Contents overview
The book is broken into four parts. The first part (chapters
[\[GettingStartedRR\]](#GettingStartedRR){reference-type="ref"
reference="GettingStartedRR"},
[\[GettingStartedRKnitr\]](#GettingStartedRKnitr){reference-type="ref"
reference="GettingStartedRKnitr"}, and
[\[DirectoriesChapter\]](#DirectoriesChapter){reference-type="ref"
reference="DirectoriesChapter"}) gives an overview of the reproducible
research workflow as well as the general computer skills that you'll
need to use this workflow. Each of the next three parts of the book
guides you through the specific skills you will need for each part of
the reproducible research process. Part two (chapters
[\[Storing\]](#Storing){reference-type="ref" reference="Storing"},
[\[DataGather\]](#DataGather){reference-type="ref"
reference="DataGather"}, and
[\[DataClean\]](#DataClean){reference-type="ref" reference="DataClean"})
covers the data gathering and file storage process. The third part
(chapters [\[StatsModel\]](#StatsModel){reference-type="ref"
reference="StatsModel"},
[\[TablesChapter\]](#TablesChapter){reference-type="ref"
reference="TablesChapter"}, and
[\[FiguresChapter\]](#FiguresChapter){reference-type="ref"
reference="FiguresChapter"}) teaches you how to dynamically incorporate
your statistical analysis, results figures, and tables into your
presentation documents. The final part (chapters
[\[LatexChapter\]](#LatexChapter){reference-type="ref"
reference="LatexChapter"},
[\[LargeDocs\]](#LargeDocs){reference-type="ref" reference="LargeDocs"},
and [\[MarkdownChapter\]](#MarkdownChapter){reference-type="ref"
reference="MarkdownChapter"}) covers how to create reproducible
presentation documents including LaTeX articles, books, slideshows, and
batch reports as well as Markdown webpages and slideshows.
[^1]: This is close to what [@Lykken1968] calls "operational
replication".
[^2]: The idea of really reproducible computational research was
originally thought of and implemented by Jon Claerbout and the
Stanford Exploration Project beginning in the 1980s and early 1990s
[@Fomel2009; @Donoho2009]. Further seminal advances were made by
Jonathan B. Buckheit and David L. Donoho who created the Wavelab
library of MATLAB routines for their research on wavelets in the
mid-1990s [@Buckheit1995].
[^3]: Reproducibility is important for both quantitative and qualitative
research [@King1994]. Nonetheless, we will focus mainly on
methods for reproducibility in quantitative computational research.
[^4]: Much of the reproducible computational research and literate
programming literatures have traditionally used the term "weave" to
describe the process of combining source code and presentation
documents [see @Knuth1992 101]. In the R community weave is usually
used to describe the combination of source code and LaTeX documents.
The term "knit" reflects the vocabulary of the *knitr* R package
(knit + R). It is used more generally to describe weaving with a
variety of markup languages. The term is used by RStudio if you are
using the *rmarkdown* package, which is similar to *knitr*. We also
cover the *rmarkdown* package in this book. Because of this, I use
the term knit rather than weave in this book.
[^5]: See the American Physical Society's website at
<http://www.aps.org/policy/statements/99_6.cfm>. See also
[@Fomel2009].
[^6]: Of course, it's important to keep in mind that reproducibility is
"neither necessary nor sufficient to prevent mistakes"
[@Stodden2009b].
[^7]: There are ways to enable some public reproducibility without
revealing confidential information. See [@Vandewalle2007] for a
discussion of one approach.
[^8]: See this post by David Smith about how the J.P. Morgan "London
Whale" problem may have been prevented with the type of processes
covered in this book:
<http://blog.revolutionanalytics.com/2013/02/did-an-excel-error-bring-down-the-london-whale.html>
(posted 11 February 2013).
[^9]: The book was created with R version and developer builds of
RStudio version 0.99.370.
[^10]: In this book I cover the Bash shell for Linux and Mac as well as
Windows PowerShell.
[^11]: I know you can write scripts in statistical programs like SPSS,
but doing so is not encouraged by the program's interface and you
often have to learn multiple languages for writing scripts that run
analyses, create graphics, and deal with matrices.
[^12]: Donald Knuth coined the term literate programming in the 1970s to
refer to a source file that could be both run by a computer and
"woven" with a formatted presentation document [@Knuth1992].
[^13]: A very interesting tool that is worth taking a look at for the
Python programming language is HTML Notebooks created with IPython.
For more details see
<http://ipython.org/ipython-doc/dev/notebook/index.html>.
[^14]: Syntax highlighting uses different colors and fonts to
distinguish different types of text.
[^15]: Note that the Sweave-style syntax is not identical to actual
*Sweave* syntax. See Yihui Xie's discussion of the differences
between the two at: <http://yihui.name/knitr/demo/sweave/>. *knitr*
has a function (`Sweave2knitr`) for converting *Sweave* to *knitr*
syntax.
[^16]: It does this by relying on a tool called Pandoc [@Pandoc2014].
[^17]: If you are more comfortable with a what-you-see-is-what-you-get
(WYSIWYG) word processor like Microsoft Word, you might be
interested in exploring Lyx. It is a WYSIWYG-like LaTeX editor that
works with *knitr*. It doesn't work with the other markup languages
covered in this book. For more information see:
<http://www.lyx.org/>. I give some brief information on using Lyx
with *knitr* in Chapter 3's Appendix.
[^18]: LaTeX is really a set of macros for the TeX typesetting
system. It is included in all major TeX distributions.
[^19]: The exact command is: `install.packages("markdown")`.
[^20]: To verify this, open the Terminal and type: `make --version` (I
used version 3.81 for this book). This should output details about
the current version of Make installed on your computer.
================================================
FILE: Old/Source-v2/Children/Chapter10/chapter10.Rnw
================================================
% Chapter 10 For Reproducible Research in R and RStudio
% Christopher Gandrud
% Created: 16/07/2012 05:45:03 pm CEST
% Updated: 2 April 2015
<<set-parent10, echo=FALSE, results='hide', cache=FALSE>>=
set_parent('Rep-Res-Parent.Rnw')
@
\chapter{Showing Results with Figures}\label{FiguresChapter}
One of the main reasons that many people use R is to take advantage of its comprehensive and powerful set of data visualization tools. Visually displaying information with graphics is often a much more effective way of presenting both descriptive statistics and analysis results than the tables we covered in the last chapter.\footnote{There are, of course, a number of exceptions to this rule of thumb. \citeauthor{vanBelle2008} \citeyearpar[][Ch. 9]{vanBelle2008} argues that a few numbers should be listed in a sentence, many numbers shown in tables, and relationships between numbers are best shown with graphs. Similarly, \cite{Tufte2001} argues that tables tend to outperform graphics for displaying 20 or fewer numbers. Graphics often outperform tables for showing larger data sets and relationships within the data.}
Nonetheless, dynamically incorporating figures with \emph{knitr}/\emph{rmarkdown} has many of the same benefits as dynamically including tables, especially the ability to have data set or analysis changes automatically cascade into your presentation documents. The basic process for including figures in knitted presentation documents is also very similar to including tables, though there are some important extra considerations we need to make to properly size the figures and be able to include interactive visualizations in our presentation documents.
In this chapter we will first briefly learn how to include non-knitted graphics in LaTeX and Markdown documents before turning to look at how to dynamically knit R graphics into presentation documents. In the remainder of the chapter we will look at how to actually create graphics with R including some of the fundamentals of R's default graphics package, as well as the \emph{ggplot2} \citep{R-ggplot2}\index{ggplot2} and \emph{googleVis} \citep{R-googleVis}\index{googleVis} packages. In each case we will focus on how to include the figures created by these packages in knitted presentation documents.
\section{Including Non-knitted Graphics}
Understanding how \emph{knitr}/\emph{rmarkdown} dynamically include figures is easier if you understand how figures are normally included in LaTeX and Markdown. Unlike in a word processing program like Microsoft Word\index{Microsoft Word}, in LaTeX, Markdown, HTML, and other markup languages you don't copy and paste figures into your document. Instead you link to an image file outside of your markup document. Typically these image files are in formats such as \emph{PDF}, \emph{PNG}, and \emph{JPEG}.\index{PDF}\index{PNG}\index{JPEG}\footnote{PDF: Portable Document Format, PNG: Portable Network Graphics, JPEG: Joint Photographic Experts Group. \\ A quick note about file formats: By default \emph{knitr} creates PDF formatted figure files when knitting R LaTeX documents. These figures, generally built with vector graphics,\index{vector graphics} allow you to zoom in on them by any amount without them becoming pixelated. This means that your images will be crisp in PDF presentation documents. For Markdown documents,\index{Markdown!figure formats} \emph{knitr} creates PNG images. PNG images are usually relatively high quality and can be rendered directly on websites, unlike PDFs. JPEG formatted files usually take up less disk space than PDF and PNG files. However, their quality is also worse and they can often look very pixelated. For more information, Wikipedia has a comprehensive comparison of graphics file formats at: \url{http://en.wikipedia.org/wiki/Comparison_of_graphics_file_formats}.}
There are three advantages to this method of including graphics over cut and paste. The first is that whenever the image files are changed, the changes are updated in the final presentation document when it is compiled, no recopying and pasting. The second advantage is that the images are sized and placed with the markup code rather than pointing and clicking. This is tedious at first, but saves considerable time and frustration when a document becomes larger. It also makes it easy to consistently format multiple images in a document. Finally, because the image is not actually loaded in the markup file, you won't notice any sluggishness while editing the markup document that you get in a traditional word processor if there are many images.
If the image files are in the same directory as the markup document, we don't need to specify the image's file path, only its name. If they are in another directory, we need to include additional file path information.\index{file path} Remember to use relative paths when possible.\index{file path!relative} In this section we will learn how to include graphics files in documents created with LaTeX and Markdown.
\subsection{Including graphics in LaTeX}\index{LaTeX!graphics}
The main way to include graphics (graphs, photos, and so on) in LaTeX documents is to use the \texttt{includegraphics}\index{LaTeX command!includegraphics} command to link to image files. To have the full range of features for \texttt{includegraphics}, make sure to load the \emph{graphicx}\index{LaTeX package!graphicx} package in your document's preamble. Imagine that we wanted to include an image of butterflies\index{butterfly} stored in a file called \emph{HeliconiusMimicry.png} in a LaTeX-produced document.\footnote{The image used here is from \cite{Meyer2006}.} We type:
<<Ch10IncludeGraphics, eval=FALSE, tidy=FALSE>>=
\includegraphics[scale=0.8]{HeliconiusMimicry.png}
@
\noindent In the square brackets you'll notice \verb|scale=0.8|. This formats the image to be included at 80 percent of its actual size. You can use other options such as \texttt{height} to specify the height, \texttt{width} to specify the width, and \texttt{angle} to specify the angle at which to rotate the image. You can add more than one option if they are separated by commas. Rather than hard coding the width in exact centimeters, you can determine its width as a proportion of the text width using \verb|\textwidth|\index{LaTeX!textwidth}.\footnote{Note there are a number of other ways to set the size of a figure relative to a page element. See the LaTeX Wikibook for more details: \url{http://en.wikibooks.org/wiki/LaTeX/Page_Layout}.} For example, to set our image at 80 percent of the text width we can type:
<<Ch10IncludeGraphicsTextWidth, eval=FALSE, tidy=FALSE>>=
\includegraphics[width=0.8\textwidth]{HeliconiusMimicry.png}
@
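As a further illustration, here is a minimal sketch that combines two options separated by a comma. It is an illustrative variation on the example above, not code used to create any figure in this book, and it assumes the same image file:

<<Ch10IncludeGraphicsOptions, eval=FALSE, tidy=FALSE>>=
% Rotate the image 90 degrees, then scale it to half of the text width
\includegraphics[angle=90, width=0.5\textwidth]{HeliconiusMimicry.png}
@

\noindent As far as I am aware, \emph{graphicx} applies the options from left to right, so placing \texttt{angle} before \texttt{width} rotates the image first and then scales the rotated image to the requested width. Check the \emph{graphicx} documentation if the order matters for your document.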
\paragraph{{\tt{figure}} float environment}\index{LaTeX environment!figure}
Most often you will want to include LaTeX figures in a \texttt{figure} float environment. The \emph{figure} environment works almost exactly the same way as the \texttt{table} environment we saw in the last chapter. It allows you to separate the figure from the text, add a caption, and label the figure. We begin the environment with \verb|\begin{figure}[POSITION_SPEC]|. \verb|POSITION_SPEC| can have the same values as we saw earlier with tables (page \pageref{POSITIONSPEC}). We can then include a \texttt{caption}\index{LaTeX!caption} and \texttt{label} command.\index{LaTeX command!label} The environment is closed with \verb|\end{figure}|. For example, to create Figure \ref{ExampleLaTeXFigure}, I used the following code:\footnote{For simplicity, this code does not include the full image's actual file path.}
<<Ch10FigFloat, eval=FALSE, tidy=FALSE>>=
\begin{figure}[ht]
\caption{An Example Figure in LaTeX}
\label{ExampleLaTeXFigure}
\begin{center}
\includegraphics[scale=0.8]{HeliconiusMimicry.png}
\end{center}
{\scriptsize{Source: \cite{Meyer2006}}}
\end{figure}
@
\noindent Notice that after the call to end the \texttt{center} environment we include \verb|{\scriptsize{Source: \cite{Meyer2006}}}|. This simply includes a note in the figure environment giving the image's source. The note moves with the figure and is separate from the text. The \texttt{scriptsize}\index{LaTeX command!scriptsize} command transforms the text to smaller than normal size font. See Chapter \ref{LatexChapter} (Section \ref{FontSize}) for more details on LaTeX font sizes. The command \verb|\cite{Meyer2006}| inserts a citation from the bibliography for \cite{Meyer2006}. We will discuss bibliographies in more detail in the next chapter (Section \ref{BibTeXBib}).
\begin{figure}[ht]
\caption{An Example Figure in LaTeX}
\label{ExampleLaTeXFigure}
\begin{center}
\includegraphics[scale=0.8]{Children/Chapter10/images10/HeliconiusMimicry.png}
\end{center}
{\scriptsize{Source: \cite{Meyer2006}}}
\end{figure}
\subsection{Including graphics in Markdown/HTML}\index{Markdown!graphics}
Markdown has a command similar to LaTeX's \texttt{includegraphics}. It goes like this: \verb|![ALT_TEXT](FILE_PATH)|.\index{![]()} This syntax may seem strange now, but it will hopefully make more sense when we cover Markdown hyperlinks in Chapter \ref{MarkdownChapter} (Section \ref{MarkdownLinks}) as this is what it is intended to imitate. \verb|ALT_TEXT| refers to HTML's \texttt{alt} (alternative text)\index{HTML!alt} attribute. This should be a very short description of the image that will appear if it fails to load in a web browser. \verb|FILE_PATH| specifies the image's file path.\footnote{You can also include a title in quotation marks after the file path. This specifies the HTML \texttt{title} attribute. However, this attribute does not create a title for the image in the way that \texttt{caption} does for LaTeX float figures. Instead it creates a tooltip\index{tooltip}, a small box that appears when you place your cursor over the image. Specifying descriptive alt text is very useful for screen readers that help visually impaired people access web content.} Here is an example using the image we worked with before.\label{TitleAttribute}
<<Ch10MarkdownImage, eval=FALSE, tidy=FALSE>>=
![ButterflyImage](HeliconiusMimicry.png)
@
\noindent Note that the file path can be a URL. You may, for example, store an image in the Dropbox Public folder or on GitHub and use its URL to link to it in the Markdown document.\footnote{For images stored on GitHub\index{GitHub} use the URL for the raw version of the file.}
Markdown does not include ways to resize or re-position an image, so that the syntax would stay simple. If you want to resize or position your image you will have to use HTML\index{HTML} markup. Probably the simplest way to include images with HTML is by using the \texttt{img} (image) element tag.\index{HTML element!img} To create the equivalent of what we just did in Markdown with HTML we type:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{alltt}
<img src="HeliconiusMimicry.png" alt="ButterflyImage">
\end{alltt}
\end{kframe}
\end{knitrout}
\noindent The \texttt{src} (source)\index{HTML attribute!src} attribute specifies the file path. To change the width and height of the image we can use the \texttt{width}\index{HTML attribute!width} and \texttt{height}\index{HTML attribute!height} attributes. For example:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{alltt}
<img src="HeliconiusMimicry.png" alt="ButterflyImage"
width="100px" height="100px">
\end{alltt}
\end{kframe}
\end{knitrout}
\noindent creates an image that is 100 pixels (\texttt{px}) wide by 100 pixels\index{pixel}\index{px} high.\footnote{A pixel is the smallest discrete part of images displayed on a screen. See the ``pixel'' Wikipedia page for more details: \url{http://en.wikipedia.org/wiki/Pixel}.} It is also possible to specify the alignment of figures in Markdown with a custom CSS style file. I don't cover how to do that here.
\section{Basic \emph{knitr}/\emph{rmarkdown} Figure Options}
So far we have looked at how to include images that have already been created into our LaTeX and Markdown documents. \emph{knitr}, and by extension \emph{rmarkdown}, allow us to combine a figure's creation by R with its inclusion in a presentation document. They are tied together and update together. We use \emph{knitr} chunk options to specify how the figure will look in the presentation document and where it will be saved. Let's learn some of the more important chunk options for figures.
\subsection{Chunk options}
\paragraph{{\tt{fig.path}}}\index{knitr option!fig.path}
When you use \emph{knitr} to create and include figures in your presentation documents it (1) runs the code you give it to create the figure, (2) automatically saves it into a particular directory,\footnote{If a code chunk creates more than one figure, \emph{knitr} automatically saves each into its own file in the same directory.} and (3) includes the necessary LaTeX or Markdown code to include the figure in the final presentation document. By default \emph{knitr} saves images into a folder (it creates) called \emph{figure} located in the working directory.\footnote{File names are based on the code chunk label where they were created.} You can tell \emph{knitr} where to save the images with the \texttt{fig.path} option. Simply use the file path naming conventions suitable for your system and include the new path in quotation marks.
\paragraph{{\tt{out.height}}}\index{knitr option!out.height}
To set the height that a figure will be in the final presentation document use the \texttt{out.height} option. In R LaTeX documents you can set the height using centimeters, inches, or as a proportion of a page element. In R Markdown documents you use pixels to set the height. For example, to set a figure's height in an R Markdown document to 200 pixels use \verb|out.height='200px'|.
\paragraph{{\tt{out.width}}}\index{knitr option!out.width}
Similarly, we can set the width of a \emph{knitr} created figure using the \texttt{out.width} option. The same rules apply as with \texttt{out.height}. For example, to have a figure show up at 80 percent of the text width in an R LaTeX document use: \verb|out.width='0.8\\textwidth'|. Notice that there are two backslashes before \texttt{textwidth}.\index{LaTeX!textwidth} As we saw earlier, the LaTeX command only has one. However, all \emph{knitr} code chunk options must be written as they would be in R. We need to escape the backslash with the backslash escape character, i.e. use two backslashes.
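To see why two backslashes are needed, here is a small sketch in plain R. The object name \emph{WidthOption} is only for illustration:

<<Ch10EscapeBackslash, eval=FALSE, tidy=FALSE>>=
# Type the string with two backslashes, as in the chunk option
WidthOption <- "0.8\\textwidth"
# Print the characters R actually stores: a single backslash
cat(WidthOption)
# Count the stored characters: 13, because the escaped
# backslash counts once
nchar(WidthOption)
@

\noindent \texttt{cat} prints the string as R stores it, with a single backslash, which is the form that LaTeX ultimately needs.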
\paragraph{{\tt{fig.align}}}\index{knitr option!fig.align}
You can set a knitted figure's alignment using \texttt{fig.align}. The option can be set to \texttt{left}, \texttt{center}, or \texttt{right}. To center a figure, add \verb|fig.align='center'|.
\paragraph{Other figure chunk options}
The previous options are probably the most commonly used ways of adjusting figures with \emph{knitr}. However, \emph{knitr} has many other chunk options to help you adjust your figures so that they are incorporated into your presentation documents the way that you want. The option \texttt{fig.cap}\index{knitr option!fig.cap} allows you to set a figure's LaTeX caption and \texttt{fig.lp}\index{knitr option!fig.lp} allows you to set the prefix of its label, which \emph{knitr} combines with the chunk label.\footnote{In this chapter we will set these options in the markup rather than the code chunk. I prefer doing this because \emph{knitr} options need to be on the same line and so can sometimes result in very long lists of options that are difficult to read.} As we will see below (page \pageref{DevTalk}), you can use the \verb|dev| option to choose the figure's output file format, e.g. PDF, PNG, JPEG. Please see the official \emph{knitr} code chunk options webpage for more information on figure chunk options: \url{http://yihui.name/knitr/options#chunk_options}.
\subsection{Global options}
If you want all of your figures to share the same options--e.g. same height and alignment--you can set global figure options at the beginning of your document with \verb|opts_chunk$set|.\index{knitr!global chunk options}\index{knitr!opts\_chunk} Imagine that we are making an R LaTeX Sweave-style document and want all of our figures to be center aligned and 80 percent of the text width. We type:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{alltt}
\textless{\textless}include=FALSE\textgreater{\textgreater}=
opts_chunk$\hlkwd{set}(fig.align = \hlstr{"center"},
out.width = \hlstr{"0.8\textbackslash{}\textbackslash{}textwidth"})
@
\end{alltt}
\end{kframe}
\end{knitrout}
\noindent You can also set some global figure options, such as \texttt{fig\_height} and \texttt{fig\_width} in your \emph{rmarkdown} YAML header.\index{rmarkdown!header}
\section{Knitting R's Default Graphics}
R's \emph{graphics} package\index{graphics, R package}--loaded by default--includes commands to create numerous plot types. These include \texttt{hist}\index{hist} for histograms, \texttt{pairs}\index{R function!pairs} for scatterplot matrices, \texttt{boxplot}\index{boxplot} for creating boxplots, and the versatile \texttt{plot}\index{R function!plot} for creating x-y plots--including scatterplots\index{scatterplot} and bar charts\index{bar chart} depending on the data's type.
There are many useful resources for learning how to fully utilize R's default graphics capabilities. These include Paul Murrell's \citeyearpar{Murrell2011} very comprehensive \emph{R Graphics} book. The Cookbook for R\footnote{\url{http://www.cookbook-r.com/Graphs/}} and Quick-R\footnote{\url{http://www.statmethods.net/advgraphs/}} websites are also very helpful. Winston Chang, the maintainer of the Cookbook for R, also has a full book devoted to creating R graphics \citeyearpar{Chang2012}.
In this section we are going to see how to include R's default graphics in our LaTeX and Markdown presentation documents. We will also see an example of how to source the creation of a graph from a segmented analysis file. Most of R's default graphics capabilities create static graphics. They are not animations or interactive. The discussion in this section is exclusively about using static graphics with \emph{knitr}/\emph{rmarkdown}. Later in the chapter we will discuss how to knit interactive graphics.
Let's look at an example we first saw at the end of Chapter \ref{StatsModel} (Section \ref{SourceCarsGraph}). Remember that we accessed an R source code file stored on GitHub to create a simple scatterplot of cars' speed and stopping distances using R's \emph{cars} data set, which is loaded by default. We haven't yet seen the code in the R source file that created the plot. The variable \textbf{speed} contains the cars' speed and \textbf{dist} contains the stopping distances. Here is the code to create the plot:
<<Ch10CarsPlotCode, eval=FALSE, tidy=FALSE>>=
# Create simple scatterplot of cars' speed and stopping distance
plot(x = cars$speed, y = cars$dist,
xlab = "Speed (mph)",
ylab = "Stopping Distance (ft)",
cex.lab = 1.5)
@
\noindent We select the variables from \emph{cars} to plot on the $x$ and $y$ axes of our graph with the component selector (\verb|$|). Then we use the \texttt{xlab}\index{xlab} and \texttt{ylab}\index{ylab} arguments to specify the $x$ and $y$ axis labels. We could have added a title for the plot using the \texttt{main}\index{main} argument. We didn't do this because we will give the plot a title in the LaTeX \texttt{figure} environment. The \texttt{cex.lab}\index{cex.lab} argument increased the labels' font size. The argument specifically determines how to scale the labels relative to the default size. 1.5 means 50 percent larger than the default.
Now let's see how to create this plot with \emph{knitr} and include it in a LaTeX \texttt{figure} environment.\index{LaTeX environment!figure}
{\small
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{alltt}
\textbackslash{}begin\{figure\}[ht]
\textbackslash{}caption\{Example Simple Scatter Plot Using \textbackslash{}texttt\{plot\}\}
\textbackslash{}label\{BasicFigureExample\}
\textless{\textless}echo=FALSE, fig.align='center', out.width='8cm', out.height='8cm'\textgreater{\textgreater}=
\hlkwd{plot}(x = cars$speed, y = cars$dist,
xlab = \hlstr{"Speed (mph)"},
ylab = \hlstr{"Stopping Distance (ft)"},
cex.lab = 1.5)
@
\textbackslash{}end\{figure\}
\end{alltt}
\end{kframe}
\end{knitrout}
}
% Actually create simple scatterplot
\begin{figure}
\caption{Example Simple Scatter Plot Using \texttt{plot}}
\label{BasicFigureExample}
<<Ch10PlotBasic, echo=FALSE, fig.align='center', out.width='8cm', out.height='8cm'>>=
plot(x = cars$speed, y = cars$dist,
xlab = "Speed (mph)",
ylab = "Stopping Distance (ft)",
cex.lab = 1.5)
@
\end{figure}
\noindent This code produces Figure \ref{BasicFigureExample}.\footnote{Note that I did not specify the center environment. This is because it is specified in a \emph{knitr} global chunk option.} If you are familiar with R graphics you will notice that we did not need to tell \emph{knitr} to save the file in a particular format. Instead, behind the scenes it automatically saves the plot as a PDF file in a folder called \emph{figure} that is a child of the current working directory. You can choose the figure file's format with the \texttt{dev} (graphical device) chunk option.\index{knitr option!dev}\label{DevTalk} For example, to save the figure in a PNG formatted file simply add the chunk option \verb|dev='png'|. You can choose any graphical device format supported by R. For a full list of R's graphical devices type \verb|?Devices| into your console.\index{R!graphical device} One reason you might want to change the format is to reduce your presentation document's file size. Using a bitmap format like PNG can create smaller files than PDF, particularly for plots with many points, though the images will be lower quality when enlarged.
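For comparison, here is a rough sketch of the steps you would otherwise carry out by hand with one of R's graphical devices. It is not a description of \emph{knitr}'s internals, and the file name \emph{CarsPlot.pdf} and the use of a temporary directory are assumptions made for the example:

<<Ch10ManualDevice, eval=FALSE, tidy=FALSE>>=
# Hypothetical output file in a temporary directory
PlotFile <- file.path(tempdir(), "CarsPlot.pdf")
# Open a PDF graphical device (sizes are in inches)
pdf(file = PlotFile, width = 7, height = 7)
# Draw the plot onto the open device
plot(x = cars$speed, y = cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping Distance (ft)",
     cex.lab = 1.5)
# Close the device so that the file is written
dev.off()
@

\noindent You would then still need to link to the saved file from your markup document. Letting \emph{knitr} handle the saving and linking means these steps cannot fall out of sync with the plotting code.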
We could, of course, simply link to the original R source code file stored on GitHub\index{GitHub} with the \verb|source_url|\index{R function!source\_url} command. Let's look at an example of this with a different source code file. Remember in Chapter \ref{DataGather} we used a makefile to gather data from three different sources on the internet. The CSV is called \emph{MainData.csv} and is stored on GitHub at: \url{http://bit.ly/V0ldsf}.\footnote{The full version of the URL is: \url{https://raw.githubusercontent.com/christophergandrud/Rep-Res-Examples/master/DataGather_Merge/MainData.csv}} We can download this data into R and make a scatterplot matrix with this code:\index{scatterplot matrix}
<<Ch10ScatterPlotMatrix, eval=FALSE, tidy=FALSE>>=
# Download data
MainData <- repmis::source_data("http://bit.ly/V0ldsf")
# Subset MainData so that it only includes the year 2003
SubData <- subset(MainData, year == 2003)
# Remove iso2c, country, year variables
# Keep reg_4state, disproportionality, FertilizerConsumption
SubData <- SubData[, c("reg_4state",
"disproportionality",
"FertilizerConsumption")]
# Create a scatterplot matrix
pairs(x = SubData)
@
\noindent This\index{R function!source\_data} is a lot of code, but you should be familiar with most of it. You will notice that after downloading the data we cleaned it up in preparation for plotting with the \texttt{pairs} command\index{R function!pairs} by removing data from all years other than 2003 and all of the country-year identifying variables. Finally, we created the scatterplot matrix with \texttt{pairs}.
To dynamically include the plot in our final document, we don't need to include all of this code in a code chunk in our markup document. A file containing the code is available on GitHub.\footnote{See: \url{https://raw.githubusercontent.com/christophergandrud/Rep-Res-Examples/master/Graphs/ScatterPlotMatrix.R}.} So we only need to use \verb|source_url| to link to it. I've shortened the raw source code file's URL to: \url{http://bit.ly/TE0gTc}. Let's look at the syntax for knitting this into an R Markdown file:
{\scriptsize
<<Ch9ScatterplotMatrixMarkdown, eval=FALSE, tidy=FALSE>>=
### Scatterplot Matrix Created from MainData.csv
```{r, echo=FALSE, warning=FALSE, message=FALSE, out.width='500px', out.height='500px'}
# Create scatterplot matrix from MainData.csv
devtools::source_url("http://bit.ly/TE0gTc")
```
@
}
\noindent This code creates the plot that we see in Figure \ref{MarkdownScatterMatrix}. Because we have linked all the way back to the original data set \emph{MainData.csv}, any time it is updated by the makefile, the update will automatically cascade all the way through to our final presentation document the next time we knit it.
\begin{figure}
\caption{Example of a Scatterplot Matrix in a Markdown Document}
\label{MarkdownScatterMatrix}
\begin{center}
\includegraphics[width=0.7\textwidth]{Children/Chapter10/images10/MarkdownScatterMatrix.png}
\end{center}
\end{figure}
\section{Including \emph{ggplot2} Graphics}\index{ggplot2|(}
The \emph{ggplot2} package\footnote{``GG'' stands for grammar of graphics and ``2'' indicates that it is the second major version of the package.} \citep{R-ggplot2}\index{ggplot2} is probably one of the most popular recent developments in R graphics. It greatly expands the aesthetic and substantive tools R has for displaying quantitative information. Figures created with \emph{ggplot2} are (generally) static,\footnote{It is possible to combine a series of figures created with \emph{ggplot2} into an animation. For a nice example of an animation using \emph{ggplot2} see Jerzy Wieczorek's animation of 2012 US presidential campaigning:\index{presidential campaigning} \url{http://bit.ly/UUVKka}.} so they are included in knitted documents the same way as most of R's default graphics.
There are a number of very good resources for learning how to use \emph{ggplot2}. These include Hadley Wickham's \emph{ggplot2} book \citeyearpar{Whickham2009book} and article \citeyearpar{Whickham2010journal}. The official \emph{ggplot2} website\footnote{\url{http://docs.ggplot2.org/current/}} has up-to-date information. I've also found the Cookbook for R website helpful.\footnote{\url{http://wiki.stdout.org/rcookbook/Graphs/}}
Given that there is already extensive good documentation on \emph{ggplot2} we are not going to learn the full details of how to use the package here. Instead, let's look at some examples of how to manipulate a data frame and a regression results object so that they can be graphed with \emph{ggplot2}. First we will create a multi-line time series plot. Then we will create a caterpillar plot of regression results. Along with giving you a general sense of how \emph{ggplot2} works, the examples illuminate how \emph{ggplot2} can be made part of a fully reproducible research workflow.\footnote{Note that everything we do here with \emph{ggplot2} can also be done with R's default graphics, though the appearance will be different.}
Sometimes we may want to show how multiple variables change together over time. For example, imagine we have data on inflation\index{inflation} in the United States along with inflation forecasts made by the US Federal Reserve\index{US Federal Reserve} two quarters beforehand. The data is stored on GitHub at: \url{https://raw.githubusercontent.com/christophergandrud/Rep-Res-Examples/master/Graphs/InflationData.csv}.\footnote{This data is from \cite{GandrudGrafstrom2012}. The example here partially recreates Figure 1 from that paper.} I've loaded the data into R and put it into an object called \emph{InflationData}. It looks like this:
% Load inflation data
<<Ch10LoadInflationData, include=FALSE, message=FALSE>>=
# Create URL object
InflationUrl <- "https://raw.githubusercontent.com/christophergandrud/Rep-Res-Examples/master/Graphs/InflationData.csv"
# Load data
InflationData <- repmis::source_data(InflationUrl)
@
{\small
<<Ch10HeadInflationData>>=
names(InflationData)
@
}
We want to create a plot with \textbf{Quarter} as the $x$ axis, inflation as the $y$ axis, and two lines. One line will represent \textbf{ActualInflation} and the other \textbf{EstimatedInflation}. To do this we need to reshape\index{reshape data} our data so that the inflation variables are in long format\index{long formatted data} like this:
\vspace{0.5cm}
\begin{tabular}{l l l}
\hline
Quarter & Variable & Value \\[0.25cm]
\hline\hline
1969.1 & ActualInflation & \\
1969.1 & EstimatedInflation & \\
1969.2 & ActualInflation & \\
1969.2 & EstimatedInflation & \\
\ldots & & \\
\hline
\end{tabular}
\vspace{0.5cm}
\noindent We can use the \texttt{gather} command from \emph{tidyr}\index{R function!gather}\index{tidyr} that we first saw in Chapter \ref{DataClean} (Section \ref{GatherReshape}) to reshape the data. The variable identifying the observations in this case is \texttt{Quarter}. The \textbf{ActualInflation} and \textbf{EstimatedInflation} variables (in columns two and three) are the variables that we want to gather. So let's gather the data:
<<Ch10GatherInflation, tidy=FALSE>>=
# Load tidyr
library(tidyr)
# Gather InflationData
GatheredInflation <- gather(InflationData, variable,
value, 2:3)
# Show GatheredInflation variables
head(GatheredInflation)
@
\noindent Now we have a data set we can use to create our line graph with \emph{ggplot2}.
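If the reshaping is hard to picture, the following sketch applies the same kind of \texttt{gather} call to a tiny data frame. The values are made up purely for illustration, and \emph{ToyWide} and \emph{ToyLong} are hypothetical names:

<<Ch10GatherToy, eval=FALSE, tidy=FALSE>>=
# Load tidyr
library(tidyr)
# Create made-up wide data (values are for illustration only)
ToyWide <- data.frame(Quarter = c(1969.1, 1969.2),
                      ActualInflation = c(1, 2),
                      EstimatedInflation = c(3, 4))
# Gather columns two and three into key-value pairs
ToyLong <- gather(ToyWide, variable, value, 2:3)
# Two quarters times two variables gives four rows
nrow(ToyLong)
# The long data has one identifier, one key, and one value column
names(ToyLong)
@

\noindent Note that \texttt{gather} stacks the gathered columns one after the other, so the rows are ordered by variable rather than by quarter. This makes no difference for plotting.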
Let's cover a few basic \emph{ggplot2}\index{ggplot2} ideas that will help us understand the following code better. First, plots are composed of layers\index{ggplot2!layers} including the coordinate system, points, labels, and so on. Each layer has aesthetics, including the variables plotted on the $x$ and $y$ axes, label sizes, colors, and shapes. Aesthetic elements are defined by the \texttt{aes}\index{ggplot2!aes}\index{ggplot2!aesthetics} argument. Finally, the main layer types are called geometrics,\index{ggplot2!geometrics} including lines, points, bars, and text. Commands that set geometrics usually begin with \texttt{geom}.\index{ggplot2!geom} For example, the geometric to create lines is \verb|geom_line|.\index{ggplot2!geom\_line}
{\footnotesize
<<Ch10ggplot2Lines, eval=FALSE, tidy=FALSE>>=
# Load ggplot2
library(ggplot2)
# Create plot
LinePlot <- ggplot(data = GatheredInflation, aes(x = Quarter,
y = value,
color = variable,
linetype = variable)) +
geom_line() +
scale_color_discrete(name = "", labels = c("Actual",
"Estimated")) +
scale_linetype(name = "", labels = c("Actual",
"Estimated")) +
xlab("\n Quarter") + ylab("Inflation\n") +
theme_bw(base_size = 15)
# Print plot
print(LinePlot)
@
}
\noindent You can see we set the $x$ and $y$ axes using the \textbf{Quarter} and \textbf{value} variables. We told \emph{ggplot2} that elements in the geometric layer should have lines with different colors and line types (dashed, dotted, and so on) based on the value of \textbf{variable} that they represent. \verb|geom_line| specifies that we want to add a line geometric layer.\footnote{Remember from Chapter \ref{GettingStartedRKnitr} that commands must be followed by parentheses. These layers are commands so they need to be followed by parentheses.} \verb|scale_color_discrete| and \verb|scale_linetype|\index{ggplot2!scale\_color\_discrete}\index{ggplot2!scale\_linetype} are used here to hide the plot's legend title with \verb|name = ""| and customize the legend's labels with \verb|labels = . . .|.\index{ggplot2!labels} You can also use them to determine the specific colors and line types you would like to use. \texttt{xlab} and \texttt{ylab} set the axes' labels. You can add a title with \texttt{ggtitle}.\index{ggplot2!ggtitle} Finally, I added \verb|theme_bw|\index{ggplot2!theme\_bw} so that the plot would use a simple black-and-white theme. We added the argument \verb|base_size = 15| to increase the plot's font size.\index{ggplot2!base\_size}
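As a sketch of choosing specific colors and line types, the following example uses \emph{ggplot2}'s \verb|scale_color_manual| and \verb|scale_linetype_manual| commands in place of the scales used above. The data values, the object names, and the particular colors and line types are all made up for illustration:

<<Ch10ManualScales, eval=FALSE, tidy=FALSE>>=
# Load ggplot2
library(ggplot2)
# Create made-up long-format data (for illustration only)
ToyData <- data.frame(
    Quarter = rep(c(1969.1, 1969.2), times = 2),
    variable = rep(c("ActualInflation", "EstimatedInflation"),
                   each = 2),
    value = c(1, 2, 3, 4))
# Create plot with hand-picked colors and line types
ToyPlot <- ggplot(data = ToyData, aes(x = Quarter,
                                      y = value,
                                      color = variable,
                                      linetype = variable)) +
    geom_line() +
    scale_color_manual(name = "",
                       labels = c("Actual", "Estimated"),
                       values = c("black", "red")) +
    scale_linetype_manual(name = "",
                          labels = c("Actual", "Estimated"),
                          values = c("solid", "dashed")) +
    theme_bw(base_size = 15)
# Print plot
print(ToyPlot)
@

\noindent The values are matched to the levels of \textbf{variable} in order, so the first color and line type go with \emph{ActualInflation}. The inflation plot itself keeps \emph{ggplot2}'s default colors and line types.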
All of the code required to create this graph is on GitHub at: \url{http://bit.ly/VEvGJG}.\footnote{The full URL is: \url{https://raw.githubusercontent.com/christophergandrud/Rep-Res-Examples/master/Graphs/InflationLineGraph.R}.} So to knit the graph like Figure \ref{ggplot2Line} into an R Sweave-style LaTeX document we type:
{\scriptsize
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{alltt}
\textbackslash{}begin\{figure\}[ht]
\textbackslash{}caption\{Example Multi-line Time Series Plot Created with \textbackslash{}emph\{ggplot2\}\}
\textbackslash{}label\{ggplot2Line\}
\textbackslash{}begin\{center\}
\textless{\textless}echo=FALSE, message=FALSE, warning=FALSE, out.width='10cm', out.height='8cm'\textgreater{\textgreater}=
\hlcom{# Create plot}
devtools::\hlkwd{source_url}(\hlstr{"http://bit.ly/VEvGJG"})
@
\textbackslash{}end\{center\}
\textbackslash{}end\{figure\}
\end{alltt}
\end{kframe}
\end{knitrout}
}
% Actually add graph
\begin{figure}
\caption{Example Multi-line Time Series Plot Created with \emph{ggplot2}}
\label{ggplot2Line}
<<Ch10MultiLines, echo=FALSE, message = FALSE, warning=FALSE, out.width='10cm', out.height='8cm'>>=
# Create plot
devtools::source_url("http://bit.ly/VEvGJG")
@
\end{figure}
\noindent The syntax for including this and other \emph{ggplot2} figures in an R Markdown document is the same as we saw for default R graphics.
\subsection{Showing regression results with caterpillar plots}
Many packages that estimate statistical models from data in R have built-in plotting capabilities. For example, the \emph{survival} package \citep{R-survival} has a \texttt{plot.survfit}\index{plot.survfit} command for plotting survival curves created using event history analysis.\index{event history analysis} These plots can, of course, be knitted into presentation documents like the plots we have seen already.
However, sometimes a package doesn't have built-in commands for plotting model results the way you want, or you want to use \emph{ggplot2} to improve the aesthetic quality of the plots it creates by default. In either case you can almost always create the plot that you want by first breaking into the model results object, extracting what you want, then plotting it with \emph{ggplot2}. The process is very similar to what we did in Chapter \ref{TablesChapter} to create custom tables (see Section \ref{NonSupportedClasses}).
To illustrate how this can work, let's create a caterpillar plot, like Figure \ref{CatPlot},\index{caterpillar plot}\index{coefficient} showing the mean coefficient estimates and the uncertainty\index{uncertainty} surrounding them from a Bayesian normal linear regression model\index{Bayesian normal linear regression} using the \emph{swiss} data frame. Here is our model:\index{Zelig}\index{R function!zelig}
<<Ch10SwissModel, message=FALSE, error=FALSE, warning=FALSE,tidy=FALSE>>=
# Load Zelig package
library(Zelig)
# Estimate model
NBModel2 <- zelig(Examination ~ Education + Agriculture +
Catholic + Infant.Mortality,
model = "normal.bayes",
data = swiss, cite = FALSE)
@
\noindent Remember from Chapter \ref{TablesChapter} that we can create an object summarizing\index{R function!summary} our estimation results like this:
<<Ch10ModelSummaryResults, tidy=FALSE>>=
# Create summary object
NBModel2Sum <- summary(NBModel2)
# Create summary data frame
NBSum2DF <- data.frame(NBModel2Sum$summary)
# Show data frame
NBSum2DF
@
\noindent We want to use \emph{ggplot2} to create credibility intervals\index{credibility interval} for each variable with \textbf{X2.5.} as the minimum value and \textbf{X97.5.} as the maximum value. These are the lower and upper bounds of the middle 95 percent of the estimates' marginal posterior distributions, i.e. the 95 percent credibility intervals.\footnote{The procedures used here are also generally applicable for graphing frequentist\index{frequentist} confidence intervals once you have calculated the confidence intervals. One useful command for doing this is \texttt{confint}.\index{R function!confint}\index{confidence interval}} We will also create a point at the \textbf{mean} of each estimate. To do this we will use \emph{ggplot2}'s \verb|geom_pointrange| command.
First we need to do a little tidying up.\label{RowNamesTidy}
<<Ch10SubsetBayes, tidy=FALSE>>=
# Convert row.names to normal variable
NBSum2DF$Variable <- row.names(NBSum2DF)
# Keep only coefficient estimates
## This allows for a more interpretable scale
NBSum2DF <- subset(NBSum2DF, Variable != "(Intercept)")
NBSum2DF <- subset(NBSum2DF, Variable != "sigma2")
@
\noindent The first line of executable code creates a proper variable out of the data frame's row.names\index{row.names} attribute. In this case row.names contains the names of the variables included in the regression. The second and third executable lines remove the estimates \emph{(Intercept)} and \emph{sigma2}. This allows the variables' coefficient estimates to be plotted on a scale that is easier to interpret.
Now we can create our caterpillar plot.
<<Ch10CatPlot, eval=FALSE, tidy=FALSE>>=
# Load ggplot2
library(ggplot2)
# Make caterpillar plot
ggplot(data = NBSum2DF, aes(x = reorder(Variable, X2.5.),
y = Mean,
ymin = X2.5., ymax = X97.5.)) +
geom_pointrange(size = 1.4) +
geom_hline(yintercept = 0, linetype = "dotted") +
xlab("Variable\n") + ylab("\n Coefficient Estimate") +
coord_flip() +
theme_bw(base_size = 20)
@
\noindent There are some new pieces of code in here, so let's take a look. First, the variables are reordered from the highest to lowest value of \textbf{X2.5.} using the \texttt{reorder} command.\index{R function!reorder} This makes the plot easier to read. The middle point of the point range is set with \texttt{y} and the lower and upper bounds with \texttt{ymin}\index{ggplot2!ymin} and \texttt{ymax}.\index{ggplot2!ymax} The \verb|geom_hline|\index{ggplot2!geom\_hline} command used here creates a dotted line at 0, i.e. no effect. \verb|coord_flip|\index{ggplot2!coord\_flip} flips the plot's coordinates so that the variable names are on the $y$ axis and the dotted line is drawn vertically. We can include this plot in a knitted document the same way as before.
\begin{figure}
\caption{An Example Caterpillar Plot Created with \emph{ggplot2}}
\label{CatPlot}
% Actually include plot
<<Ch10PlotCatPlot, echo=FALSE, message=FALSE>>=
# Create plot
devtools::source_url("https://raw.githubusercontent.com/christophergandrud/Rep-Res-Examples/master/Graphs/CaterpillarPlot.R")
@
\end{figure}
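\noindent The same approach works for frequentist\index{frequentist} confidence intervals calculated with \texttt{confint}. The following is only a minimal sketch and not part of the example above. It assumes an ordinary least squares model estimated with \texttt{lm}. The object names \emph{LM1} and \emph{CoefDF} and the column names \textbf{Estimate}, \textbf{Lower}, and \textbf{Upper} are made up for illustration.
<<Ch10ConfintCatSketch, eval=FALSE, tidy=FALSE>>=
# Load ggplot2
library(ggplot2)
# Estimate a linear regression model (illustrative assumption)
LM1 <- lm(Examination ~ Education + Agriculture +
Catholic + Infant.Mortality,
data = swiss)
# Combine the point estimates with their 95% confidence intervals
CoefDF <- data.frame(Variable = names(coef(LM1)),
Estimate = coef(LM1),
confint(LM1))
names(CoefDF) <- c("Variable", "Estimate", "Lower", "Upper")
# Drop the intercept for a more interpretable scale
CoefDF <- subset(CoefDF, Variable != "(Intercept)")
# Make caterpillar plot
ggplot(data = CoefDF, aes(x = reorder(Variable, Lower),
y = Estimate,
ymin = Lower, ymax = Upper)) +
geom_pointrange() +
geom_hline(yintercept = 0, linetype = "dotted") +
xlab("Variable\n") + ylab("\n Coefficient Estimate") +
coord_flip() +
theme_bw()
@
\noindent By default \texttt{confint} returns the bounds of the 95 percent confidence interval, so the resulting plot is read in much the same way as a plot of 95 percent credibility intervals.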
\index{ggplot2|)}
%%%%%%%%%%%%% googleVis
\section{JavaScript Graphs with \emph{googleVis}}\index{googleVis}
Markus Gesmann and Diego de Castillo's \emph{googleVis}\index{googleVis} package \citeyearpar{R-googleVis} allows us to use Google's Visualization API\index{API} from within R to create interactive tables, plots, and maps with Google Chart Tools.\index{Google Chart Tools} Because the visualizations are written in JavaScript\index{JavaScript} they can be included in HTML presentation documents created by R Markdown. Unfortunately, they cannot be directly\footnote{The example in this chapter is from a screenshot.} included in LaTeX-produced PDFs. The \emph{animation}\index{animation} package \citep{R-animation} does have some limited features for including interactive visualizations in PDFs (as well as HTML documents) and is worth investigating if you want to do this.
\paragraph{Basic \emph{googleVis} figures}
Let's briefly look at how to make one type of figure with \emph{googleVis}: a choropleth map.\index{choropleth map} This is created with the \texttt{gvisGeoChart} function.\index{gvisGeoChart}\index{R function!gvisGeoChart} We will use this example to illustrate how to incorporate \emph{googleVis} figures into R Markdown.\footnote{For demonstrations of the full range of plotting functions available, visit the \emph{googleVis} website: \url{http://code.google.com/p/google-motion-charts-with-r/wiki/GadgetExamples#googleVis_Examples}.}
Imagine that we want to map global fertilizer\index{fertilizer} consumption in 2003 using the World Bank\index{World Bank} data we gathered in Chapter \ref{DataGather}. Remember that the data was highly right skewed, so we will actually map the natural logarithm\index{logarithm}\index{R function!log} of the \textbf{FertilizerConsumption} variable.\footnote{You'll notice in the code below that we keep only values of \textbf{FertilizerConsumption} greater than 0.1. This is so that the natural logarithm returns finite values; the logarithm of zero is negative infinity. See Section \ref{Infinity} for more details.} Assuming that we have already loaded the \emph{MainData.csv} data set, here is the code:
<<Ch10GeoMap, eval=FALSE, tidy=FALSE>>=
# Load googleVis
library(googleVis)
# Subset MainData so that it only includes 2003
SubData <- subset(MainData, year == 2003)
# Keep values of FertilizerConsumption greater-than 0.1
SubData <- subset(SubData, FertilizerConsumption > 0.1)
# Find the natural logarithm of FertilizerConsumption.
## Round the results to one decimal digit.
SubData$LogConsumption <- round(log(SubData$FertilizerConsumption),
digits = 1)
# Make a map of Fertilizer Consumption
FCMap <- gvisGeoChart(data = SubData,
locationvar = "iso2c",
colorvar = "LogConsumption",
options = list(
colors = "['#ECE7F2', '#A6BDDB', '#2B8CBE']",
width = "780px",
height = "500px"))
@
\noindent The \texttt{locationvar} argument specifies the variable with information on each observation's location. Google Chart Tools can use ISO\index{ISO} two-letter country codes to determine each country's location. \texttt{colorvar} specifies the variable with the values to map for each country. We can determine other options by creating a list-type\index{R!list} object with arguments specifying characteristics such as the map's width, height, and colors. The colors here are written using hexadecimal values.\index{hexadecimal} This is a commonly used format for specifying colors on websites.\footnote{You can also use hexadecimal values in \emph{ggplot2}. The Color Brewer 2\index{Color Brewer} website (\url{http://colorbrewer2.org/}) is very helpful for picking hexadecimal color palettes,\index{color palettes} among others.}
To view the figure on your computer simply use \emph{googleVis}'s \texttt{plot} command. For example, to view our map we type:
<<Ch10ViewGoogleVisMapPlot, eval=FALSE>>=
plot(FCMap)
@
\noindent Note that you need to be connected to the internet to view figures created by \emph{googleVis}. Otherwise your figure will not be able to access the required JavaScript\index{JavaScript} files from the Google Visualization API.\index{API}
\begin{figure}
\caption{Screenshot of a \emph{googleVis} Geo Chart}
\label{GeoMapImage}
\begin{center}
\includegraphics[width=\textwidth]{Children/Chapter10/images10/GeoChartScreenShot.png}
\end{center}
\end{figure}
\paragraph{Including \emph{googleVis} in knitted documents}
Typing \verb|print(FCMap, tag = "chart")|\index{R function!print} in a knittable document would print the entire JavaScript code needed to create the map. Much like we saw with tables produced with \emph{xtable} and \emph{texreg} in Chapter \ref{TablesChapter}, we need to change the code chunk \texttt{results} option to include the map as a map rather than as JavaScript markup. To have the visualization show up in your HTML output, rather than the code block, simply set the code chunk option\index{knitr option!results} to \verb|results='asis'|.\footnote{You can use \texttt{results='asis'} to include almost any type of JavaScript graphics. For an example using the D3 JavaScript library\index{D3 JavaScript library}\index{JavaScript} and \emph{knitr} see this page by Yihui Xie: \url{http://yihui.name/knitr/demo/javascript/}.} For example, the full code needed to create and print \emph{FCMap} is available at: \url{http://bit.ly/VNnZxS}.\footnote{The full URL is: \url{https://raw.githubusercontent.com/christophergandrud/Rep-Res-Examples/master/Graphs/GoogleVisMap.R}.} To knit the map into an R Markdown document we type:
<<Ch10MapKnit, eval=FALSE, tidy=FALSE>>=
```{r, echo=FALSE, message=FALSE, results='asis'}
# Create and print geo map
devtools::source_url("http://bit.ly/VNnZxS")
```
@
\paragraph{Note for Motion Charts}
You may notice that Google motion charts\footnote{You can use the \texttt{gvisMotionChart}\index{R function!gvisMotionChart} command to make these.}\index{motion chart} do not show up in the RStudio \textbf{Preview HTML}\index{RStudio!Preview HTML} window or even in your web browser when you open the knitted HTML version of the file. You just see a big blank space where you had hoped the chart would be. It will show up, however, if you use the \verb|plot| command on a \verb|gvis| motion chart object in the console. Motion charts can only be displayed when they are hosted on a web server or located in a directory `trusted' by Flash Player.\footnote{This is because motion charts and annotated time line charts rely on Flash,\index{Flash} unlike the other Google visualizations. For more information see Markus Gesmann's blog post at: \url{http://www.magesblog.com/2012/05/interactive-reports-in-r-with-knitr-and.html}.}\index{Flash Player}
The \verb|plot| command opens a local server, but simply opening the HTML file and the RStudio \textbf{Preview HTML} window do not. An easy way to solve this problem is to save the HTML file in your Dropbox\index{Dropbox} \emph{Public} folder\index{Dropbox!Public folder} and access it through the associated public URL link (see Chapter \ref{Storing}). Publishing a motion chart on GitHub Pages\index{GitHub!Pages} also works well (see Chapter \ref{MarkdownChapter}). For information on how to set a directory as `trusted' by Flash Player\index{Flash Player} see: \url{http://www.macromedia.com/support/documentation/en/flashplayer/help/settings_manager04.html}.
\subsection{JavaScript Graphs with \emph{htmlwidgets}-based packages}
The number of tools for creating JavaScript graphs from R that can be knitted into HTML files is growing rapidly. The \emph{htmlwidgets}\index{htmlwidgets} \citep{R-htmlwidgets} framework in particular has made these tools easier to develop. As of this writing there are tools built on \emph{htmlwidgets} for creating maps, network graphs, time series graphs, and interactive tables, among others. Though the syntax of each of these tools differs, they can all easily be included in R Markdown documents. Often you simply run their core functions in a code chunk, without needing an additional call to \texttt{print} or \texttt{plot}.
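For example, the \emph{DT} package is built on \emph{htmlwidgets} and creates interactive tables. As a minimal sketch, assuming that \emph{DT} is installed, the following R Markdown code chunk would place an interactive table of the \emph{swiss} data frame in a knitted HTML document:
<<Ch10HtmlwidgetsSketch, eval=FALSE, tidy=FALSE>>=
```{r, echo=FALSE, message=FALSE}
# Create an interactive table of the swiss data
# Assumes that the DT package is installed
DT::datatable(swiss)
```
@
\noindent Note that the chunk does not need \verb|results='asis'| or a call to \texttt{print}. The widget is included when the object is returned at the top level of the code chunk.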
\subsection*{Chapter summary}
In this chapter we have learned how to take results from our statistical analyses and other information from our data and dynamically present them in figures. In the next three chapters we will learn the details of how to create the LaTeX and Markdown presentation documents we use to present the tables we created in Chapter \ref{TablesChapter} and the figures we created in this chapter.
================================================
FILE: Old/Source-v2/Children/Chapter11/chapter11.Rnw
================================================
% Chapter 11 For Reproducible Research in R and RStudio
% Christopher Gandrud
% Created: 16/07/2012 05:45:03 pm CEST
% Updated: 17 April 2015
<<set-parent11, echo=FALSE, results='hide', cache=FALSE>>=
set_parent('Rep-Res-Parent.Rnw')
@
\chapter{Presenting with \emph{knitr}/LaTeX}\label{LatexChapter}
We have already begun to see how LaTeX works for presenting research results. This chapter gives you a more detailed and comprehensive introduction to basic LaTeX document structures and commands. It is not a complete introduction to all that LaTeX is capable of, but we will cover enough that you will be able to create an entire well-formatted article and slideshow with LaTeX that you can use to dynamically present your results. In the next chapter (Chapter \ref{LargeDocs}) we will build on these skills by learning how to use {\emph{knitr}} to create more complex LaTeX documents.
For basic LaTeX documents, such as short articles or simple presentations, it may often be quicker and simpler to write the markup using an R Markdown document and compile it to PDF with the \emph{rmarkdown} package.\index{rmarkdown} As we will see in Chapter \ref{MarkdownChapter}, Markdown syntax is much simpler than normal LaTeX. However, there are at least two reasons why it is useful to become familiar with LaTeX syntax. First, understanding LaTeX syntax will help you debug issues you might encounter when using \emph{rmarkdown} with LaTeX that would otherwise be mysterious if you were only familiar with Markdown. Second, R Markdown has limited capabilities for creating more complex documents such as books and documents with highly customizable formatting needs. Using \emph{knitr} and LaTeX can be useful in these situations.
In this chapter we will learn about basic LaTeX document structures and syntax as well as how to dynamically create LaTeX bibliographies with BibTeX, R, and \emph{knitr}. Finally, we will look at how to create PDF beamer slideshows with LaTeX and \emph{knitr}.
\textbf{Note:} Chapter \ref{LatexChapter} and the following chapter are unusual for this book in that they do not refer to both \emph{knitr} and \emph{rmarkdown}. Instead they focus on capabilities largely exclusive to \emph{knitr}.
\section{The Basics}
In this section we will look at how to create a LaTeX article. We will cover which editor programs to use, the basic structure of a LaTeX document (the preamble and body), and the LaTeX syntax for creating headings, paragraphs, lines, text formatting, math, lists, footnotes, and cross-references. I will assume that you already have a fully functioning TeX distribution\index{TeX distribution} installed on your computer. See Section \ref{InstallMarkup} for information on how to install TeX.
\subsection{Getting started with LaTeX editors}
As I mentioned earlier, RStudio\index{RStudio!LaTeX editor} is a fully functional LaTeX editor in addition to being an integrated development environment for R. If you want to create a new LaTeX document you can click {\tt{File}} in the menu bar then {\tt{New}} \textrightarrow{} {\tt{R Sweave}}.
\begin{wrapfigure}{r}{0.3\textwidth}
\caption{RStudio TeX Format Options}
\label{TeXFormat}
\begin{center}
\includegraphics[scale=0.6]{Children/Chapter11/images11/TeXFormat.png}
\end{center}
\end{wrapfigure}
Remember from Chapter \ref{GettingStartedRKnitr} that R Sweave\index{R Sweave} files are basically LaTeX files that can include {\emph{knitr}} code chunks. You can use RStudio to knit and compile a document with the click of one button: \textbf{Compile PDF}\index{RStudio!Compile PDF button} (\includegraphics[scale=0.5]{Children/Chapter11/images11/CompilePDF.png}). You can use this button to compile R Sweave files like regular LaTeX files in RStudio even if they do not have code chunks. If you use another program to compile them you might need to change the file extension from {\tt{.Rnw}} to {\tt{.tex}}. You can also insert many of the items we will cover in this section into your documents with RStudio's LaTeX \texttt{TeX Format} button.\index{RStudio!TeX format button} See Figure \ref{TeXFormat}.
There are many other LaTeX editors\index{LaTeX!editors}\footnote{Wikipedia has collated a table that comprehensively compares many of these editors: \url{http://en.wikipedia.org/wiki/List_of_text_editors}.} and many text editors that can be modified to compile LaTeX documents. For example, alongside writing this book in RStudio, I typed much of the LaTeX markup in the Sublime Text\footnote{\url{http://www.sublimetext.com/}} text editor.\index{Sublime Text} None of these options have RStudio's high-level integration with \emph{knitr}, however.\footnote{Andrew Heiss has created a Sublime Text plugin called \emph{KnitrSublime}. It enables some R LaTeX integration. For more details see: \url{https://GitHub.com/andrewheiss/KnitrSublime}.}
If you are new to LaTeX you may be more comfortable using Lyx.\index{Lyx} Lyx has a Microsoft Word-type interface, but creates actual LaTeX documents. It also has \emph{knitr} integration. See Chapter \ref{GettingStartedRKnitr}'s Appendix for how to set up and use \emph{knitr} and Lyx.
\subsection{Basic LaTeX command syntax}\index{LaTeX!basic command syntax}
As you probably noticed in Part III's examples, LaTeX commands start with a backslash (\texttt{\textbackslash{}}). For example, to create a section heading you use the \verb|\section| command.\index{LaTeX command!section} The arguments for LaTeX commands are written inside of curly braces (\verb|{}|) like this:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{verbatim}
\section{My Section Name}
\end{verbatim}
\end{kframe}
\end{knitrout}
\noindent Probably one of the biggest sources of errors when compiling a LaTeX document to PDF\index{PDF}\index{LaTeX!error} is curly brackets that aren't closed, i.e. an open bracket (\verb|{|) is not matched with a subsequent closed bracket (\verb|}|). Watch out for this and use an editor (like RStudio) that highlights brackets' matching pairs. As we will see, unlike in R with parentheses, if your LaTeX command does not have an argument you do not need to include the curly brackets at all.
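As a small illustration (the section name is only a placeholder), the first command below takes an argument in curly brackets, while the second takes no argument and so has no brackets:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{verbatim}
% A command with one argument
\section{My Section Name}
% A command with no argument
\tableofcontents
\end{verbatim}
\end{kframe}
\end{knitrout}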
There are a number of places to find comprehensive lists of LaTeX commands. The Netherlands TeX users group\index{Netherlands TeX users} has compiled one: \url{http://www.ntg.nl/doc/biemesderfer/ltxcrib.pdf}.
\subsection{The LaTeX preamble \& body}\label{LaTeXPreamble}
\index{LaTeX!preamble|(}
All LaTeX documents require a preamble. The preamble goes at the very beginning of the document. The preamble usually starts with the \texttt{documentclass}\index{LaTeX command!documentclass} command. This specifies what type of presentation document you are creating--e.g. an article, a book, a slideshow,\footnote{``Slideshow'' is not a valid class. One slideshow class that we discuss later is called ``beamer''.} and so on. LaTeX refers to these as classes.\index{LaTeX!class} Classes specify a document's formatting. You can add options to \texttt{documentclass} to change the format of the entire document. For example, if we wanted to create an article class document with two columns we would type:
<<Ch11DCOptions, eval=FALSE, tidy=FALSE>>=
\documentclass[twocolumn]{article}
@
In the preamble you can also specify other style options and load any extra packages\index{LaTeX!packages} you may want to use.\footnote{The command to load a package in LaTeX is \texttt{\textbackslash{}usepackage}.\index{LaTeX command!usepackage} For example, if you include \texttt{\textbackslash{}usepackage\{url\}} in the preamble of your document you will be able to specify URL links in the body with the command \texttt{\textbackslash{}url\{SOMEURL\}}.\index{LaTeX package!url}}
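For instance, a minimal sketch of a preamble that combines a class option with a package might look like the following. The choice of the \texttt{url} package is simply an illustration:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{verbatim}
% Two-column article class document
\documentclass[twocolumn]{article}
% Load the url package so that \url{} can be used later
\usepackage{url}
\end{verbatim}
\end{kframe}
\end{knitrout}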
The preamble is followed by the body of your document. It is specified with the \texttt{document} environment.\index{LaTeX environment!document} See Chapter \ref{TablesChapter} (Section \ref{LaTeXEnviron}) for more details about LaTeX environments. You tell LaTeX where the body\index{LaTeX!begin document} of your document starts by typing \verb|\begin{document}|. The very last line of your document is usually \verb|\end{document}|, indicating that your document has ended. When you open a new R Sweave file in RStudio it creates an article class\index{LaTeX!article} document with a very simple preamble and body like this:
<<Ch11FirstOpenDoc, eval=FALSE, tidy=FALSE>>=
\documentclass{article}
\begin{document}
\end{document}
@
\noindent This is all you need to get a very basic article class document working. If you want the document to be of another class, simply change \texttt{article} to something else, \texttt{book} for example.
Let's begin to modify the markup. First we will load two packages in the preamble: \texttt{hyperref}\index{LaTeX package!hyperref} for clickable hyperlinks and \texttt{natbib}\index{LaTeX package!natbib} for bibliography formatting.\index{bibliography} We will discuss \texttt{natbib} in more detail below. Note that in general, and unlike in R, almost all of the LaTeX packages you will use were installed on your computer when you installed the TeX distribution.
\index{LaTeX!preamble|)}
Next, it's often a good idea to include \emph{knitr} code chunks that specify features of the document as a whole. These can include global chunk options\index{knitr!global chunk options} as well as loading data and packages used throughout the document.
Then it's a good idea to specify title information just after the \texttt{document} environment begins.\index{LaTeX environment!document} Use the \texttt{title}\index{LaTeX command!title} command to add a title, the \texttt{author}\index{LaTeX command!author} command to add author information, and \texttt{date}\index{LaTeX command!date} to specify the date.\footnote{In some document classes the current date will automatically be included if you don't specify the date.} Then include the \texttt{maketitle} command.\index{LaTeX command!maketitle} This will place your title and author information in the body of the document. If you are writing an article you may also want to follow \texttt{maketitle} with an abstract. Unsurprisingly, you can use the \texttt{abstract}\index{LaTeX environment!abstract} environment to include this.
Here is a full LaTeX article class document with all of these changes added:
{\scriptsize
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}\color{fgcolor}
\begin{kframe}
\begin{alltt}
%%%%%%%%%%%%%% Article Preamble %%%%%%%%%%%%%%
\textbackslash{}documentclass\{article\}
%%%% Load LaTeX packages
\textbackslash{}usepackage\{hyperref\}
\textbackslash{}usepackage[authoryear]\{natbib\}
%%%% Set knitr global options and gather data
\textless{}\textless{}Global, include=FALSE\textgreater{}\textgreater{}=
\hlcom{#### Set chunk options ####}
opts_chunk$\hlkwd{set}(fig.align=\hlstr{'center'})
\hlcom{#### Load and cite R packages ####}
\hlcom{# Create list of packages}
PackagesUsed <- c(\hlstr{"knitr", "ggplot2", "repmis"})
\hlcom{# Load PackagesUsed and create .bib BibTeX file}
\hlcom{# Note must have repmis package installed.}
repmis::LoadandCite(PackagesUsed, file = \hlstr{"Packages.bib"}, install = FALSE)
\hlcom{#### Gather Democracy data from Pemstein et al. (2010) ####}
\hlcom{# For simplicity, store the URL in an object called 'url'.}
url <- \hlstr{"http://www.unified-democracy-scores.org/files/20140312/z/uds_summary.csv.gz"}
\hlcom{# Create a temporary file called 'temp' to put the compressed file into.}
temp <- \hlkwd{tempfile}()
\hlcom{# Download the compressed file into the temporary file.}
\hlkwd{download.file}(url, temp)
\hlcom{# Decompress the file and convert it into a data frame}
\hlcom{# class object called 'UDSData'.}
UDSData <- \hlkwd{read.csv}(\hlkwd{gzfile}(temp, \hlstr{"uds_summary.csv"}))
\hlcom{# Delete the temporary file.}
\hlkwd{unlink}(temp)
@
%%%% Start document body
\textbackslash{}begin\{document\}
%%%%%%%%%%%%% Create title %%%%%%%%%%%%%%%%%
\textbackslash{}title\{An Example knitr LaTeX Article\}
\textbackslash{}author\{Christopher Gandrud \textbackslash{}\textbackslash{}
Hertie School of Governance\textbackslash{}thanks\{Email: \textbackslash{}href\{mailto:gandrud@hertie-school.org\}
\{gandrud@hertie-school.org\}\}\}
\textbackslash{}date\{January 2015\}
\textbackslash{}maketitle
%%%%%%%%%%%%% Abstract %%%%%%%%%%%%%%%%%%%%
\textbackslash{}begin\{abstract\}
Here is an example of a knittable article class LaTeX document.
\textbackslash{}end\{abstract\}
%%%%%%%%%%% Article Main Text %%%%%%%%%%%%%
\textbackslash{}section\{The Graph\}
I gathered data from \textbackslash{}cite\{Pemstein2010\} on countries' democracy level. They call their
democracy measure the Unified Democracy Score (UDS). Figure \textbackslash{}ref\{DemPlot\} shows the mean
UDS scores over time for all of the countries in their sample.
\textbackslash{}begin\{figure\}
\textbackslash{}caption\{Mean UDS Scores\}
\textbackslash{}label\{DemPlot\}
\textless{}\textless{}echo=FALSE, message=FALSE, warning=FALSE, out.width='7cm', out.height='7cm'\textgreater{}\textgreater{}=
\hlcom{# Graph UDS scores}
\hlkwd{ggplot}(UDSData, \hlkwd{aes}(x = year, y = mean)) +
\hlkwd{geom_point}(alpha = I(0.1)) +
\hlkwd{stat_smooth}(size = 2) +
\hlkwd{ylab}(\hlstr{"Democracy Score"}) + \hlkwd{xlab}(\hlstr{""}) +
\hlkwd{theme\_bw}()
@
\textbackslash{}end\{figure\}
%%%%%%%%%%% Reproducing the Document %%%%%
\textbackslash{}section*\{Appendix: Reproducing the Document\}
This document was created using R version
\textbackslash{}Sexpr\{\hlkwd{paste0}(version$major, ".", version$minor)\}
and the R package \textbackslash{}emph\{knitr\}
\textbackslash{}citep\{R-knitr\}. It also relied on the R packages
\textbackslash{}emph\{ggplot2\} \textbackslash{}citep\{R-ggplot2\} and \textbackslash{}emph\{repmis\} \textbackslash{}citep\{R-repmis\}.
The document can be completely reproduced from
source files available on GitHub at:
\textbackslash{}url\{https://GitHub.com/christophergandrud/Rep-Res-Examples\}.
%%%%%%%%% Bibliography %%%%%%%%%%%%%%%%%%%%
\textbackslash{}bibliographystyle\{apa\}
\textbackslash{}bibliography\{Main.bib,Packages.bib\}
\textbackslash{}end\{document\}
\end{alltt}
\end{kframe}
\end{knitrout}
}
\noindent The \emph{knitr} code chunk\index{knitr!code chunk} syntax should be familiar to you from previous chapters, so let's unpack the LaTeX syntax from just after the first code chunk, including the ``Create Title'' and ``Abstract'' parts. New syntax shown in later parts of this example is discussed in the remainder of this section and the next section on bibliographies.
First, remember that the percent sign (\%) is LaTeX's comment character. Using it to comment your markup can make it easier to read. Second, as we saw in Chapter \ref{TablesChapter} (Section \ref{LaTeXTables}), double backslashes (\verb|\\|),\index{LaTeX!\textbackslash{}\textbackslash{}} like those after the author's name, force a new line in LaTeX. We will discuss the \texttt{emph} command in a moment. Third, using the \texttt{thanks}\index{LaTeX command!thanks} command allows us to create a footnote for author contact information\footnote{Frequently it also includes thank-yous to people who have helped the research.} that is not numbered like the other footnotes (see below). Fourth, you'll notice \verb|\href{mailto: . . . .org}}|.\index{LaTeX command!href}\index{LaTeX command!mailto} This creates a clickable email address in the final document that will open the reader's default email program\index{LaTeX!email program} when clicked.
Finally, you may have noticed the following line:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}\color{fgcolor}
\begin{kframe}
\begin{alltt}
\textbackslash{}Sexpr\{paste0(version$major, ".", version$minor)\}
\end{alltt}
\end{kframe}
\end{knitrout}
\noindent This code finds the current version of R being used and prints the version number into the presentation document.
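The same command can place the result of other short R expressions in the text. As a sketch (the sentence itself is made up for illustration), the following markup would print the number of rows in the \emph{swiss} data frame:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}\color{fgcolor}
\begin{kframe}
\begin{alltt}
The swiss data frame has \textbackslash{}Sexpr\{nrow(swiss)\} observations.
\end{alltt}
\end{kframe}
\end{knitrout}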
\subsection{Headings}\index{LaTeX!headings}
Earlier in the chapter we briefly saw how to create section-level headings with \texttt{section}.\index{LaTeX command!section} There are a number of other sub-section-level headings including \texttt{subsection}, \texttt{subsubsection}, \texttt{paragraph}, and \texttt{subparagraph}.\index{LaTeX command!subsection}\index{LaTeX command!subsubsection}\index{LaTeX command!paragraph}\index{LaTeX command!subparagraph} Headers are numbered automatically by LaTeX.\footnote{The \texttt{paragraph} level does not have numbers.} To have an unnumbered section,\index{LaTeX!unnumbered section} place an asterisk after the command name like this: \verb|\section*{Unnumbered Section}|. In book class documents you can also use \texttt{chapter}\index{LaTeX command!chapter} to create new chapters and \texttt{part} for collections of chapters.\index{LaTeX command!part}
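As a short sketch, the heading levels of an article class document could be laid out as follows. The heading titles are placeholders:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{verbatim}
\section{A Section}
\subsection{A Subsection}
\subsubsection{A Subsubsection}
\paragraph{A Paragraph Heading}
% The asterisk removes the number
\section*{An Unnumbered Section}
\end{verbatim}
\end{kframe}
\end{knitrout}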
\subsection{Paragraphs \& spacing}\index{LaTeX!paragraph}\index{LaTeX!spacing}
In LaTeX, paragraphs are created simply by adding a blank line between lines of text. LaTeX will format the indentation at the beginning of paragraphs based on the document class's rules. As we discussed before, writing tabs in the markup version of your document does nothing in the compiled document. They are generally used just to make the markup easier for people to read.\index{LaTeX!tabs}
Note that adding more blank lines between paragraphs will not add extra space between the paragraphs in the final document. To specify the space following paragraphs (or almost any line) use the \texttt{vspace} (vertical space) command.\index{LaTeX command!vspace} For example, to add three centimeters of vertical space on a page type: \verb|\vspace{3cm}|. This gives us the following space:
\vspace{3cm}
Similarly, adding extra spaces between words in your LaTeX markup won't create extra spaces between words in the compiled document. To add horizontal space use the \texttt{hspace}\index{LaTeX command!hspace} command in the same way as \texttt{vspace}.
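For example, the following sketch, with made-up text and lengths, adds one centimeter of vertical space between two paragraphs and two centimeters of horizontal space inside a sentence:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{verbatim}
This is the first paragraph.

% Add one centimeter of vertical space
\vspace{1cm}

This sentence has \hspace{2cm} extra space in it.
\end{verbatim}
\end{kframe}
\end{knitrout}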
\subsection{Horizontal lines}\index{LaTeX command!hline}\index{LaTeX command!hrulefill}\index{LaTeX!lines}
Use the \texttt{hrulefill} command to create horizontal lines in the text of your document. For example, \verb|\hrulefill| creates:
\vspace{0.2cm}
\hrulefill
\noindent Inside of a \verb|tabular| environment,\index{LaTeX environment!tabular} use the \verb|hline| command rather than \verb|hrulefill|.
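A minimal sketch of a \verb|tabular| environment with horizontal lines is shown below. The cell contents are placeholders:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{verbatim}
\begin{tabular}{l c}
    \hline
    Column A & Column B \\
    \hline
    Cell 1   & Cell 2   \\
    \hline
\end{tabular}
\end{verbatim}
\end{kframe}
\end{knitrout}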
\subsection{Text formatting}
Let's briefly look at how to do some of the more common types of text formatting in LaTeX and how to create some commonly used diacritics and special characters.
\paragraph{Italics \& Bold}\index{LaTeX!italics}\index{LaTeX!emphasis}\index{LaTeX!bold}
To italicize a word in LaTeX use the \texttt{emph} (emphasis) command.\index{LaTeX command!emph} For bold use \texttt{textbf}.\index{LaTeX command!textbf} You can nest commands inside of one another to combine their effect. For example, to \emph{\textbf{italicize and bold}} a word use: \verb|\emph{\textbf{italicize and bold}}|.
\paragraph{Font size}\label{FontSize}\index{LaTeX!font size}
You can specify the base font size of an entire document with a \texttt{documentclass} option. For example, to create an article with 12-point font use: \texttt{\textbackslash{}documentclass[12pt]\{article\}}.
There are a number of commands to set the size of specific pieces of text relative to the base size. See Table \ref{LaTeXFontSize} for the full list. These commands usually use a slightly different syntax: \verb|{\SIZE_COMMAND . . . }|. For example, to use the {\tiny{tiny size}} in your text use: \verb|{\tiny{tiny size}}|.
You can change the size of code chunks that \emph{knitr} places in presentation documents using these commands. Just place the code chunk inside of \verb|{\SIZE_COMMAND . . . }|. This is similar to using the \verb|size| code chunk option.\index{knitr option!size}
\begin{table}
\caption{LaTeX Font Size Commands}
\label{LaTeXFontSize}
\begin{center}
\vspace{0.2cm}
\begin{tabular}{c}
{\Huge \texttt{Huge}} \\
{\huge \texttt{huge}} \\
{\LARGE \texttt{LARGE}} \\
{\Large \texttt{Large}} \\
{\large \texttt{large}} \\
{\normalsize \texttt{normalsize}} \\
{\small \texttt{small}} \\
{\footnotesize \texttt{footnotesize}} \\
{\scriptsize \texttt{scriptsize}} \\
{\tiny \texttt{tiny}}
\vspace{0.2cm}
\end{tabular}
\end{center}
\end{table}
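As a brief sketch of how the size commands in Table \ref{LaTeXFontSize} are used in running text (the sentences are placeholders), you could type:
<<Ch11FontSizeSketch, eval=FALSE, tidy=FALSE>>=
% Placeholder sentences; only the size commands matter here
{\small This sentence is one step smaller than the base size.}
{\Large This sentence is two steps larger than the base size.}
@
\noindent The curly brackets limit the size change to the text inside of them, so the text that follows returns to the base size.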
\paragraph{Diacritics}\index{LaTeX!diacritics}
You cannot directly enter letters with diacritics--e.g. accent marks--into LaTeX. For example, to create a letter c with a cedilla (\c{c}) you need to type \verb|\c{c}|. To create an `a' with an acute accent (\'{a}) type: \verb|\'{a}|. There are many types of diacritics and commands to include them within LaTeX-produced documents. For a comprehensive discussion of the issue and a list of commands see the LaTeX Wikibook page on the topic: \url{http://en.wikibooks.org/wiki/LaTeX/Special_Characters}. If you regularly use non-English alphabets you might also be interested in reading the LaTeX Wikibook page on internationalization: \url{http://en.wikibooks.org/wiki/LaTeX/Internationalization}.\index{LaTeX!internationalization}\index{LaTeX!non-English characters}
\paragraph{Quotation marks}\index{LaTeX!quotation marks}
To specify double left quotation marks (``) use two back ticks (\verb|``|). For double right quotes ('') use two apostrophes (\verb|''|). Single quotes follow the same format (\verb|`'|).
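Here is a short sketch of both kinds of quotation marks in use (the sentences are placeholders):
<<Ch11QuoteSketch, eval=FALSE, tidy=FALSE>>=
% Two back ticks open and two apostrophes close double quotes
``This sentence is in double quotes.''
% One back tick opens and one apostrophe closes single quotes
`This sentence is in single quotes.'
@
\noindent With LaTeX's default font setup, typing the straight double quote key on both sides usually gives you two right-hand quotation marks, which is why the back tick and apostrophe pairs are used instead.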
\subsection{Math}\index{LaTeX!math}\label{MathLaTeX}
LaTeX is particularly popular among quantitative researchers and mathematicians because it is very good at rendering mathematics. A complete listing of every math command would take up quite a bit of space.\footnote{See the Netherlands TeX user group list mentioned earlier for an extensive compilation of math commands.} I am briefly going to discuss how to include math in a LaTeX document. This discussion includes a few math syntax examples.
To include math inline with your text, place the math syntax in between backslashes and parentheses, i.e. \verb|\( . . . \)|. For example, \verb|\( s^{2} = \frac{\sum(x - \bar{x})^2}{n - 1} \)| produces \( s^{2} = \frac{\sum(x - \bar{x})^2}{n - 1} \) in our final document.\footnote{Instead of backslashes and parentheses you can also use a pair of dollar signs (\texttt{\$\ldots \$})\index{LaTeX!\$}.} We can display math separately from the text by placing the math commands inside of backslashes and square brackets: \verb|\[ . . . \]|.\footnote{Equivalently, use two pairs of dollar signs (\texttt{\$\$\ldots \$\$}) or the \texttt{displaymath} environment.\index{LaTeX environment!displaymath} Though it will still work in most cases, the double dollar sign math syntax may cause errors. You can also number display equations using the \texttt{equation} environment.\index{LaTeX environment!equation}} For example,
<<Ch11Math, eval=FALSE, engine='sh'>>=
\[
s^{2} = \frac{\sum(x - \bar{x})^2}{n - 1}
\]
@
\noindent gives us:
\[
s^{2} = \frac{\sum(x - \bar{x})^2}{n - 1}
\]
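The footnote above mentions the \texttt{equation} environment for numbered display equations. As a minimal sketch, assuming a hypothetical label called \verb|SampleVariance|, the same formula can be numbered and then referred to in the text like this:
<<Ch11EquationSketch, eval=FALSE, tidy=FALSE>>=
% SampleVariance is a hypothetical label used for illustration
\begin{equation}
    s^{2} = \frac{\sum(x - \bar{x})^2}{n - 1}
    \label{SampleVariance}
\end{equation}

Equation \ref{SampleVariance} defines the sample variance.
@
\noindent LaTeX assigns the equation number automatically, so the reference stays correct if you add or remove equations earlier in the document.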
\subsection{Lists}\index{LaTeX!lists}
To create bullet lists\index{LaTeX!bullet lists} in LaTeX use the \texttt{itemize}\index{LaTeX environment!itemize} environment. Each list item is delimited with the \texttt{item}\index{LaTeX command!item} command. For example:
<<Ch11Lists1, eval=FALSE, tidy=FALSE>>=
\begin{itemize}
\item The first item.
\item The second item.
\item The third item.
\end{itemize}
@
\noindent gives us:
\begin{itemize}
\item The first item.
\item The second item.
\item The third item.
\end{itemize}
\noindent To create a numbered list use the \texttt{enumerate}\index{LaTeX environment!enumerate} environment instead of \texttt{itemize}. You can create sublists\index{LaTeX!sublists} simply by nesting lists inside of lists like this:
<<Ch11Lists2, eval=FALSE, tidy=FALSE>>=
\begin{itemize}
\item The first item.
\item The second item.
\begin{itemize}
\item A sublist item
\end{itemize}
\item The third item.
\end{itemize}
@
\noindent which gives us:
\begin{itemize}
\item The first item.
\item The second item.
\begin{itemize}
\item A sublist item
\end{itemize}
\item The third item.
\end{itemize}
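\noindent The two list types can also be mixed. As a small sketch using the same placeholder items, here is a numbered list that contains a bulleted sublist:
<<Ch11ListsSketch, eval=FALSE, tidy=FALSE>>=
% A numbered list with a bulleted sublist under the second item
\begin{enumerate}
    \item The first item.
    \item The second item.
    \begin{itemize}
        \item A bulleted sublist item
    \end{itemize}
    \item The third item.
\end{enumerate}
@
\noindent LaTeX numbers the three outer items automatically and indents the sublist under the second item.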
\subsection{Footnotes}\index{LaTeX!footnotes}
Plain, non-bibliographic footnotes are easy to create in LaTeX. Simply place \texttt{\textbackslash{}footnote\{} where you would like the footnote number to appear in the text. Then type the footnote's text. Of course, remember to close the footnote with a \texttt{\}}. LaTeX does the rest, including formatting and numbering.
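For instance, a sentence with a footnote could be marked up like this (both the sentence and the note are placeholders):
<<Ch11FootnoteSketch, eval=FALSE, tidy=FALSE>>=
% The footnote command directly follows the text it belongs to
This sentence has a footnote.\footnote{This is the
footnote's text.}
@
\noindent Placing \verb|\footnote| directly after the punctuation, without a space, keeps the footnote number attached to the sentence.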
\subsection{Cross-references}\index{LaTeX!cross-references}
LaTeX will also automatically format cross-references. We were already partially introduced to cross-references in Chapters \ref{TablesChapter} and \ref{FiguresChapter}. At the place that you would like to refer to, add a \texttt{label} such as \verb|\label{ACrossRefLabel}|.\index{LaTeX command!label} It doesn't really matter what label you choose, as long as each label is used only once in the document. Also, it can be a good idea to use the same conventions that we learned for labeling R objects (see Section \ref{ObjectNames}). Then place a \texttt{ref}\index{LaTeX command!ref} command (e.g. \verb|\ref{ACrossRefLabel}|) at the place in the text where you want the cross-reference to be.
If you place the \texttt{label} on the same line as a heading command, \texttt{ref} will place the heading number. If \texttt{label} is in a \texttt{table} or \texttt{figure} environment you will get the table or figure number. You can also use \texttt{pageref} instead of \texttt{ref} to include the page number. Finally, loading the \emph{hyperref}\index{LaTeX package!hyperref} package makes cross-references (and footnotes) clickable. Clicking on them will take you to the items they refer to.
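Putting these pieces together, here is a minimal sketch of a labeled section heading and a sentence that refers back to it. The heading text and the label \verb|DataSources| are hypothetical and used only for illustration:
<<Ch11CrossRefSketch, eval=FALSE, tidy=FALSE>>=
% DataSources is a hypothetical label
\section{Data Sources}\label{DataSources}

% Later in the document
See Section \ref{DataSources} on page \pageref{DataSources}.
@
\noindent You generally need to compile the document twice before the numbers appear. After the first pass LaTeX prints question marks in their place.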
\section{Bibliographies with BibTeX}\label{BibTeXBib}\index{BibTeX|(}\index{LaTeX!bibliographies|(}\index{bibliography|(}
LaTeX can take advantage of very comprehensive bibliography-making capabilities. All major TeX distributions come with BibTeX. BibTeX is basically a tool for creating databases of citation information. In this section, we are going to see how to incorporate a BibTeX bibliography into your LaTeX documents. Then we will learn how to use R to automatically generate a bibliography of packages used to create a knitted document. For more information on BibTeX syntax see the LaTeX Wikibook page on Bibliography management: \url{http://en.wikibooks.org/wiki/LaTeX/Bibliography_Management}.
\subsection{The \emph{.bib} file}
BibTeX bibliographies are stored in plain-text files with the extension \texttt{.bib}. These files are databases of citations.\footnote{The order of the citations does not matter.} The syntax for each citation goes like this:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}\color{fgcolor}
\begin{kframe}
\begin{alltt}
@DOCUMENT_TYPE\{CITE_KEY,
title = \{TITLE\},
author = \{AUTHOR\},
. . . = \{. . .\}
\}
\end{alltt}
\end{kframe}
\end{knitrout}
\noindent \verb|DOCUMENT_TYPE| specifies what type of document--article, book, webpage, and so on--the citation is for. This determines what items the citation can and needs to include. Then we have the \verb|CITE_KEY|.\index{BibTeX!citation keys} This is the reference's label that you will use to include the citation in your presentation documents. We'll look more at this later in the section. Each citation must have a unique \verb|CITE_KEY|. A common way to write these keys is to use the author's surname and the publication year, e.g. \verb|Donoho2009|. The cite key is followed by the other citation attributes such as \texttt{author}, \texttt{title}, and \texttt{year}. These attributes all follow the same syntax: \verb|ATTRIBUTE = {. . .}|.
It's worth taking a moment to discuss the syntax for the BibTeX author attribute. First, multiple author names are separated by \texttt{and}. Second, BibTeX assumes that the last word for each author is their surname. If you would like multiple words to be taken as the ``surname'' then enclose these words in curly brackets. If we wanted to cite the World Bank\index{World Bank, citing} as an author we write \verb|{World Bank}|; otherwise it will be formatted ``Bank, World'' in the presentation document.
Here is a complete BibTeX entry for \cite{Donoho2009}:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}\color{fgcolor}
\begin{kframe}
\begin{alltt}
@article\{Donoho2009,
author = \{David L Donoho and Arian Maleki and Morteza
Shahram and Inam Ur Rahman and Victoria Stodden\},
title = \{Reproducible research in computational harmonic
analysis\},
journal = \{Computing in Science \& Engineering\},
year = \{2009\},
volume = \{11\},
number = \{1\},
pages = \{8--18\}
\}
\end{alltt}
\end{kframe}
\end{knitrout}
\noindent Each item of the entry must end in a comma, except the last one.\footnote{This is very similar to how we create vectors in R, though in BibTeX you can actually have a comma after the last attribute.}
\subsection{Including citations in LaTeX documents}
When you want to include citations from a BibTeX file in your LaTeX document you first use the \texttt{bibliography}\index{LaTeX command!bibliography} command. For example, if the BibTeX file is called \emph{Main.bib} and it is in the same directory as your markup document, then type: \verb|\bibliography{Main.bib}|. You can use a bibliography stored in another directory; just include the appropriate file path information. Usually \texttt{bibliography} is placed right before \verb|\end{document}| so that it appears at the end of the compiled presentation document.
You can also specify how you would like the references to be formatted using the \texttt{bibliographystyle}\index{LaTeX command!bibliographystyle} command. For example, this book uses the American Psychological Association (APA)\index{APA} style for references. To set this I included \verb|\bibliographystyle{apa}| directly before \texttt{bibliography}. The default style\footnote{It is referred to in LaTeX as the plain style.} is to number citations (e.g. [1]) rather than include author-year information\footnote{This is sometimes referred to as the ``Harvard'' style.} used by the APA. You will need to include the LaTeX package \emph{natbib}\index{LaTeX package!Natbib} in your preamble to be able to use author-year citation styles. This book includes \verb|\usepackage[authoryear]{natbib}| in its preamble.\index{author-year citations}\index{Harvard style citations}
Place the \texttt{cite}\index{LaTeX command!cite} command in your document's text where you want to place a reference. You include the \verb|CITE_KEY| for the reference in this command, e.g. \verb|\cite{Donoho2009}|. You can include multiple citations in \texttt{cite}, just separate the \verb|CITE_KEY|s with commas. You can add options such as the page numbers or other text to a citation using square brackets ([]). For example, if we wanted to cite the tenth page of \cite{Donoho2009} we type: \verb|\cite[10]{Donoho2009}|. The author-year style in-text citation that this produces looks like this: \cite[10]{Donoho2009}. You can add text at the beginning of a citation with another set of square brackets. Typing \verb|\cite[see][10]{Donoho2009}| gives us: \cite[see][10]{Donoho2009}.
If you are using an author-year style you can use a variety of \emph{natbib} commands to change what information is included in the parentheses. For a selection of these commands and examples, see Table \ref{NatbibTable}.
\begin{table}
\caption{A Selection of \emph{natbib} In-text Citation Style Commands}
\label{NatbibTable}
\begin{center}
\begin{tabular}{l r}
\hline
Command Example & Output \\[0.25cm]
\hline\hline
\verb|\cite{Donoho2009}| & \cite{Donoho2009} \\
\verb|\citep{Donoho2009}| & \citep{Donoho2009} \\
\verb|\citeauthor{Donoho2009}| & \citeauthor{Donoho2009} \\
\verb|\citeyear{Donoho2009}| & \citeyear{Donoho2009} \\
\verb|\citeyearpar{Donoho2009}| & \citeyearpar{Donoho2009} \\
\hline
\end{tabular}
\end{center}
\end{table}
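To see how these pieces fit together, here is a minimal sketch of a complete document that uses two of the commands from Table \ref{NatbibTable}. It assumes that a file called \emph{Main.bib} containing the \verb|Donoho2009| entry is in the same directory as the markup file. The sentence itself is a placeholder:
<<Ch11NatbibSketch, eval=FALSE, tidy=FALSE>>=
\documentclass{article}
\usepackage[authoryear]{natbib}

\begin{document}

% Assumes Main.bib contains an entry with the key Donoho2009
\citeauthor{Donoho2009} \citeyearpar{Donoho2009} discuss
reproducible research in computational science.

\bibliographystyle{apa}
\bibliography{Main.bib}

\end{document}
@
\noindent If you compile a document like this by hand rather than through RStudio, you typically run LaTeX once, then BibTeX, and then LaTeX twice more so that the citations and reference list are fully resolved.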
\subsection{Generating a BibTeX file of R package citations}\index{BibTeX!automatic generation}
Researchers are pretty good about citing others' articles and data. However, citation of R packages used in analyses is very inconsistent. This is unfortunate not only because correct attribution is not being given to those who worked to create the packages, but also because it makes reproducibility harder. Not citing packages obscures important steps that were taken in the research process, primarily which package versions were used. Fortunately, there are R tools for quickly and dynamically generating package BibTeX files, including the versions of the packages you are using. They will automatically update the citations each time you compile your document to reflect any changes made to the packages.
You can automatically create citations for R packages using the \texttt{citation}\index{R function!citation} command inside of a code chunk. For example, if you want the citation information for the \emph{xtable}\index{xtable} package you simply type:
{\small
<<Ch11IntroCite>>=
citation("xtable")
@
}
\noindent This gives you both the plain citation as well as the BibTeX version. If you only want the BibTeX version of the citation you can use the \texttt{toBibtex} command.\index{R function!toBibtex}
<<Ch11IntrotoBibtex>>=
toBibtex(citation("xtable"))
@
The {\emph{knitr}} package creates BibTeX bibliographies for R packages with the \verb|write_bib|\index{R function!write\_bib} command. Let's make a BibTeX file called \emph{Packages.bib} containing citation information for the \emph{xtable} package.
<<Ch10OneBib, eval=FALSE, tidy=FALSE>>=
# Create package BibTeX file
knitr::write_bib("xtable",
file = "Packages.bib")
@
\noindent \verb|write_bib| automatically assigns each entry a cite key using the format \verb|R-PACKAGE_NAME|, e.g. \verb|R-xtable|.
\textbf{Warning:} \emph{knitr}'s \verb|write_bib| command currently does not have the ability to append package citations to an existing file, but instead writes them to a new file. If there is already a file with the same name, it will overwrite the file. So, be very careful using this command to avoid accidental deletions. It is a good idea to have \verb|write_bib| always write to a file specifically for automatically generated package citations. You can include more than one bibliography in LaTeX's \texttt{bibliography} command. All you need to do is separate them with a comma.
<<Ch11TwoBibs, eval=FALSE, tidy=FALSE>>=
\bibliography{Main.bib,Packages.bib}
@
We can use these techniques to automatically create a BibTeX file with citation information for all of the packages used in a research project. Simply make a character vector of the names of packages that you would like to include in your bibliography. Then run this through \verb|write_bib|.
You can make sure you are citing all of the key packages used in a knitted document by (a) creating a vector of all of the packages and then (b) using this in the following code to both load the packages and write the bibliography:
<<Ch11LoadCite, eval=FALSE>>=
# Package list
PackagesUsed <- c("ggplot2", "knitr",
"xtable", "Zelig")
# Load packages
lapply(PackagesUsed, library,
character.only = TRUE)
# Create package BibTeX file
knitr::write_bib(PackagesUsed,
file = "Packages.bib")
@
\noindent In the first executable line we just create our list of packages to load and cite. The next command is \texttt{lapply}\index{R function!lapply} (list apply). This applies the function \texttt{library} to all of the items in \emph{PackagesUsed}. \texttt{character.only = TRUE} is a \texttt{library}\index{R function!library} argument that allows us to use character string versions of the package names as R sees them in the \emph{PackagesUsed} vector, rather than as objects (how we have used \texttt{library} up until now). If you include these commands in a code chunk at the beginning of your knitted document, then you can be sure that you will have a BibTeX file with all of your packages.
The full LaTeX document example I showed you earlier uses the \texttt{LoadandCite} command\index{R function!LoadandCite} from the \emph{repmis} package. This simplifies the process of loading and citing R packages.\index{repmis}\footnote{It can also install the packages if you set the option \texttt{install = TRUE}. You can have it install specific package versions by entering the version numbers with the \texttt{versions} argument. This is very useful for enabling the replication of analyses that rely on specific package versions.}
\index{BibTeX|)}\index{LaTeX!bibliographies|)}\index{bibliography|)}
\section{Presentations with LaTeX Beamer}\label{latexBeamer}
\index{beamer|(}\index{LaTeXbeamer|(}
You can make slideshow presentations with LaTeX. Creating a presentation with a markup language can take a bit more effort than using a WYSIWYG program like Microsoft PowerPoint\index{Microsoft PowerPoint} or Apple's Keynote.\index{Apple Keynote} However, combining LaTeX and \emph{knitr} can make fully reproducible presentations that dynamically create and present results. I have found this particularly useful in my teaching as dynamically produced presentations allow me to provide my students with fully replicable examples of how I created a figure on a slide, for example. \emph{knitr} also makes it easy to beautifully present code examples.
One of the most popular LaTeX tools for slideshows is the beamer class. When you compile a beamer class document, a PDF will be created where every page is a different slide (see Figure \ref{BeamerExample}). All major PDF viewer programs have some sort of ``View Full Screen'' option to view beamer PDFs as full screen slideshows. Usually you can navigate through the slides with the forward and back arrows on the keyboard.
In this section we will take a brief look at the basics of creating slideshows with beamer, highlighting special considerations that need to be made when working with beamer and \emph{knitr}. A full example of a knittable beamer presentation with illustrations of many of the points discussed here is printed at the end of the chapter.
\begin{figure}
\caption{Knitted Beamer PDF Example}
\label{BeamerExample}
\begin{center}
\includegraphics[scale=0.5]{Children/Chapter11/images11/BeamerExample.png}
\end{center}
{\scriptsize The presentation in this example was created using a custom beamer theme available at: \url{https://GitHub.com/christophergandrud/Make-Projects/tree/master/Rnw_Lecture}.}
\end{figure}
\subsection{Beamer basics}
{\emph{knitr}} largely works the same way in LaTeX slideshows as it does in article or book class documents. There are a few differences to look out for.
\paragraph{The Beamer preamble}
You use \texttt{documentclass}\index{LaTeX command!documentclass} to set a LaTeX document as a \texttt{beamer} slideshow. You can also include global style information in the preamble by using the commands \texttt{usetheme},\index{LaTeX command!usetheme}\index{LaTeX command!usecolortheme}\index{LaTeX command!useinnertheme}\index{LaTeX command!useoutertheme} \texttt{usecolortheme}, \texttt{useinnertheme}, and \texttt{useoutertheme}. For a fairly comprehensive compilation of beamer themes see Hartwork's Beamer theme matrix: \url{http://www.hartwork.org/beamer-theme-matrix/}.
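As a minimal sketch of a beamer preamble, the following sets the document class and picks a theme and a color theme. \texttt{Madrid} and \texttt{beaver} are simply two of the themes that ship with beamer; they are my choice for the example, not a recommendation. The frame content is a placeholder:
<<Ch11BeamerPreambleSketch, eval=FALSE, tidy=FALSE>>=
\documentclass{beamer}

% Theme names are examples; any installed beamer theme works
\usetheme{Madrid}
\usecolortheme{beaver}

\begin{document}

\begin{frame}
    \frametitle{An example frame}
    Placeholder slide content.
\end{frame}

\end{document}
@
\noindent Changing the slideshow's look then only requires editing the theme names in the preamble. The frames themselves stay the same.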
\paragraph{Slide frames}\index{LaTeX!beamer slides}
After the preamble, you start your document as usual by beginning the \texttt{document} environment.\index{LaTeX environment!document} Then you need to start creating slides. Individual beamer slides are created using the \texttt{frame}\index{LaTeX command!frame}\index{LaTeX environment!frame} environment. Create a frame title using \texttt{frametitle}.\index{LaTeX command!frametitle}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{verbatim}
\frame{
\frametitle{An example frame}
}
\end{verbatim}
\end{kframe}
\end{knitrout}
\noindent Note that you can also use the usual \verb|\begin{frame} . . . \end{frame}| syntax. Unlike in a WYSIWYG slide show program, you will not be able to tell if you have tried to put more information on one slide than it can handle until after you compile the document.\footnote{One way to deal with frames that span multiple slides is to use the \texttt{allowframebreaks} option, i.e. \texttt{\textbackslash{}begin\{frame\}[allowframebreaks]}.\index{LaTeX command!allowframebreaks}}
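As a sketch of the \texttt{allowframebreaks} approach mentioned in the footnote (the frame title and list items are placeholders), a frame whose content may run past one slide can be written like this:
<<Ch11FrameBreakSketch, eval=FALSE, tidy=FALSE>>=
\begin{frame}[allowframebreaks]
    \frametitle{A long list}
    \begin{itemize}
        \item The first item.
        \item The second item.
        % Many more items than fit on one slide would go here
    \end{itemize}
\end{frame}
@
\noindent With this option beamer continues the frame's content on as many additional slides as it needs.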
\paragraph{Title frames}\index{LaTeX!beamer title frames}
One important difference from a regular LaTeX article is that instead of using \texttt{maketitle} to place your title information, in beamer you place the \texttt{titlepage}\index{LaTeX command!titlepage} command inside of a frame by itself.
\paragraph{Sections \& outlines}
We can use section\index{LaTeX command!section} commands in much the same way as we do in other types of LaTeX documents. Section commands do not need to be placed inside of frames. After the title slide, many slideshows have a presentation outline. You can automatically create one from your section headings using the \texttt{tableofcontents}\index{LaTeX command!tableofcontents} command. Like the \texttt{titlepage} command,\index{LaTeX command!titlepage} \texttt{tableofcontents} can go on its own frame, i.e.\index{LaTeX!table of contents}\index{LaTeX!outlines}
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{verbatim}
%%% Title slide
\frame{
\titlepage
}
%% Table of contents slide
\frame{
\frametitle{Outline}
\tableofcontents
}
\end{verbatim}
\end{kframe}
\end{knitrout}
\paragraph{Make list items appear}\index{LaTeX!list appear}
Lists work the same way in beamer as they do in other LaTeX document classes. They do have an added feature in that you can have each item appear as you progress through the slide show. After \verb|\item|, place the number of the order in which the item should appear. Enclose the number in \verb|< ->|. For example,
<<Ch11ItemFadeIn, eval=FALSE>>=
\begin{itemize}
\item<1-> The first item.
\item<2-> The second item.
\item<2-> The third item.
\end{itemize}
@
\noindent In this example the first item will appear before the next two. These two will appear at the same time.
\subsection{\emph{knitr} with LaTeX slideshows}
\emph{knitr} code chunks have the same syntax in LaTeX slideshows as in other LaTeX documents. You do need to make one change to the \texttt{frame} options, however, to include highlighted {\emph{knitr}} code chunks on your slides. You should add the \texttt{fragile} option to the \texttt{frame} command.\footnote{For a detailed discussion of why you need to use the \texttt{fragile} option with the \texttt{verbatim} environment\index{LaTeX environment!verbatim} that {\emph{knitr}} uses to display highlighted text in LaTeX documents see this blog post by Pieter Belmans: \url{http://pbelmans.wordpress.com/2011/02/20/why-latex-beamer-needs-fragile-when-using-verbatim/} (posted 20 February 2011).} Here is an example:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{verbatim}
\begin{frame}[fragile]
\frametitle{An example fragile frame.}
\end{frame}
\end{verbatim}
\end{kframe}
\end{knitrout}
\noindent Here is a complete knittable beamer example:
{\scriptsize
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}\color{fgcolor}
\begin{kframe}
\begin{alltt}
\textbackslash{}documentclass\{beamer\}
\textbackslash{}begin\{document\}
%% Title page information
\textbackslash{}title\{Example Beamer/\textbackslash{}emph\{knitr\} Slideshow\}
\textbackslash{}author\{\textbackslash{}href\{mailto:gandrud@hertie-school.org\}\{Christopher Gandrud\}\}
%%% Title slide
\textbackslash{}frame\{
\textbackslash{}titlepage
\}
%% Table of contents slide
\textbackslash{}frame\{
\textbackslash{}frametitle\{Outline\}
\textbackslash{}tableofcontents
\}
%%% The code
\textbackslash{}section\{Access the code\}
\textbackslash{}begin\{frame\}[fragile]
\textbackslash{}frametitle\{Access the code\}
The code to create the following figure is available online.
To access it we can type:
\textless{}\textless{}eval=FALSE\textgreater{}\textgreater{}=
\hlcom{# Access and run the code to create a caterpillar plot}
devtools::source\_url(\hlstr{"http://bit.ly/VRKphr"})
@
\textbackslash{}end\{frame\}
%%% The figure
\textbackslash{}section\{The Figure\}
\textbackslash{}begin\{frame\}[fragile]
\textbackslash{}frametitle\{The resulting figure\}
\textless{}\textless{}echo=FALSE, message=FALSE, out.width='\textbackslash{}\textbackslash{}textwidth', out.height='0.8\textbackslash{}\textbackslash{}textheight'\textgreater{}\textgreater{}=
\hlcom{# Access and run the figure code}
devtools::source\_url(\hlstr{"http://bit.ly/VRKphr"})
@
\textbackslash{}end\{frame\}
\textbackslash{}end\{document\}
\end{alltt}
\end{kframe}
\end{knitrout}
}
In Chapter \ref{MarkdownChapter} we will see how to use the \emph{rmarkdown} package to create beamer presentations with the much simpler Markdown syntax.
\index{beamer|)}\index{LaTeXbeamer|)}
\subsection*{Chapter summary}
In this chapter we have learned the nitty-gritty of how to create simple LaTeX documents--articles and slideshows--that we can embed our reproducible research in using \emph{knitr}. In the next chapter we look at how to create more complex LaTeX documents, including theses, books, and batch reports.
================================================
FILE: Old/Source-v2/Children/Chapter12/chapter12.Rnw
================================================
% Chapter 12 For Reproducible Research in R and RStudio
% Christopher Gandrud
% Created: 16/07/2012 05:45:03 pm CEST
% Updated: 5 May 2015
<<set-parent12, include=FALSE>>=
set_parent('Rep-Res-Parent.Rnw')
@
\chapter{Large \emph{knitr}/LaTeX Documents: Theses, Books, and Batch Reports}\label{LargeDocs}
In the previous chapter we learned the basics of how to make LaTeX documents to create and present research findings. So far we have only learned how to create short documents, like articles and slideshows. For longer and more complex documents, such as theses and books, a single LaTeX markup file can become very unwieldy very quickly, especially when it includes \emph{knitr} code chunks as well. Ideally we would segment the markup file into individual chapter files and then bring them all together when we compile the whole document. This would allow us to benefit from a modular file structure while producing one presentation document with continuous section and page numbering. To do this we can take advantage of LaTeX and \emph{knitr} to separate markup files into manageable pieces. Like directories, these pieces are called \textbf{child} files, which are combined using a \textbf{parent} document.
Many of these tools can also be used to create batch reports\index{batch reports}: documents that present results for a selected part of a data set. For example, a researcher may want to create individual reports of answers to survey questions from interviewees with a specific age. In the latter part of this chapter we will rely on {\emph{knitr}} and the \emph{brew} package \citep{R-brew} to create batch reports.
In this chapter we will first briefly discuss how to plan a large document's file structure. We will then look at three methods for including child documents into parent documents. The first is very simple and uses the LaTeX command \texttt{input}\index{LaTeX command!input}. The second uses \emph{knitr} to include knittable child documents. The final method is a special case of the \emph{knitr} method that uses the command-line program Pandoc \index{Pandoc} to convert child documents written in non-LaTeX markup languages and include them into a LaTeX parent. After this we will look at how to create batch reports.
\section{Planning Large Documents}
Before discussing the specifics of each of these methods, it's worth taking a moment to carefully plan the structure of our child and parent documents. Books and theses have a natural parent-child structure, i.e. they are single documents comprised of multiple chapters. They often include other child-like features such as title pages, bibliographies, figures, and appendices. You could include most of these features directly into one markup file. But this file would become very large and unwieldy. It would be difficult to find the one part or section that you want to edit. If your presentation markup files are difficult to navigate, they are difficult to reproduce.
Instead of one long markup file, you can break the document at natural division points, like chapters, into multiple child documents.\index{child files}\index{parent document} These can then be combined with a parent document. The parent document acts like the skeleton that organizes the children in a specific order. The parent document can be compiled and all of the children will be in the right place. In LaTeX, a parent document will include the preamble where the document class (\texttt{book} for example\index{LaTeXbook}) is set and all of the necessary LaTeX packages are loaded. It also includes \emph{knitr} global options, the \texttt{maketitle}, \verb|\begin{document}| and \verb|\end{document}|, and the \texttt{bibliography}. When you compile the parent document you will compile the entire document. Notice that if the parent document contains the preamble and so on, the children cannot contain this information as well. This can create some issues if you only want to compile one chapter rather than the whole document. We will see how to overcome this problem with \emph{knitr} later in the chapter.
To make your many child and parent documents manageable, it is a good idea to store your child files in a subdirectory of the folder storing the parent file. This book was created using a knittable parent and child structure, so please see the markup files on GitHub for a complete example of how to use \emph{knitr} with large documents.\footnote{See: \url{https://github.com/christophergandrud/Rep-Res-Book/tree/master/Source}.} When segmenting your presentation documents into parents and children, the remainder of your research project structure can stay largely the same as we have seen so far.
\section{Large Documents with Traditional LaTeX}
Imagine that we are writing a book with three chapters. No part of the document includes \emph{knitr} code chunks. We can split the book into three child documents and place them in a subdirectory of the parent document's folder called \emph{Children}. The child documents should not contain a preamble, \verb|\begin{document}|, or \verb|\end{document}|. Because they are chapters, we will begin the documents simply with the \texttt{chapter} heading.\index{LaTeX command!chapter} For example, this chapter of the book begins with:
<<Ch12ChapterName, eval=FALSE, tidy=FALSE, size='scriptsize'>>=
\chapter{Large \emph{knitr}/LaTeX Documents: Theses, Books, \& Batch Reports}\label{LargeDocs}
@
\noindent As we saw earlier, the \texttt{label}\index{LaTeX command!label} command is used for cross-referencing.
\subsection{Inputting/including children}
Now in the parent document we can place the \texttt{input}\index{LaTeX command!input} command where we would like the child to show up in the final document. If we want there to be a clear page on either side of the included document we should use the \texttt{include}\index{LaTeX command!include} command instead. In the \texttt{input} or \texttt{include} command, we simply place the child document's file path. Here is an example parent document with three child documents (\emph{Chapter1.tex}, \emph{Chapter2.tex}, and \emph{Chapter3.tex}) all located in a subdirectory of the parent document called \emph{Children}:
\begin{knitrout}
\definecolor{shadecolor}{rgb}{1, 1, 1}
\color{fgcolor}
\begin{kframe}
\begin{alltt}
%%%%%%%%%%%%%% Book Preamble %%%%%%%%%%%%%%
\textbackslash{}documentclass\{book\}
%%%% Load LaTeX packages
\textbackslash{}usepackage\{hyperref\}
\textbackslash{}usepackage\{makeidx\}
\textbackslash{}usepackage[authoryear]\{natbib\}
%%%% Start index (only allowed in the preamble)
\textbackslash{}makeindex
%%%% Start document body
\textbackslash{}begin\{document\}
%%%%%%%%%%%%% Create title %%%%%%%%%%%%%%%%%
\textbackslash{}title\{An Example LaTeX Book\}
\textbackslash{}author\{Christopher Gandrud\}
\textbackslash{}maketitle
%%%%%%%%%%%% Frontmatter %%%%%%%%%%%%%%%%%%%
\textbackslash{}tableofcontents
\textbackslash{}listoffigures
\textbackslash{}listoftables
%%%%%%%%%%% Input child documents %%%%%%%%%
%%%% Chapter 1
\textbackslash{}input\{Children/Chapter1.tex\}
%%%% Chapter 2
\textbackslash{}input\{Children/Chapter2.tex\}
%%%% Chapter 3
\textbackslash{}input\{Children/Chapter3.tex\}
%%%%%%%%% Bibliography %%%%%%%%%%%%%%%%%%%%
\textbackslash{}bibliographystyle\{apa\}
\textbackslash{}bibliography\{Main,Packages\}
%%%%%%%%% Index %%%%%%%%%%%%%%%%%%%%%%%%%%
\textbackslash{}clearpage
\textbackslash{}printindex
\textbackslash{}end\{document\}
\end{alltt}
\end{kframe}
\end{knitrout}
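As a minimal sketch of the \texttt{include} alternative described above, the parent's child-loading lines could be written as follows. The \texttt{includeonly} command does not appear in the example parent document; it is a standard LaTeX command, introduced here as an illustration, that restricts compilation to the listed children. The file names are the hypothetical ones from the example above, and the sketch assumes that the full book has been compiled at least once so that the other chapters' \emph{.aux} files already exist.

<<Ch12IncludeOnlySketch, eval=FALSE, tidy=FALSE, size='scriptsize'>>=
\documentclass{book}

% Assumption: while drafting, we only want to recompile Chapter 2.
% Remove this line to compile every chapter.
\includeonly{Children/Chapter2}

\begin{document}

% Each \include starts its child on a new page.
% File paths are given without the .tex extension.
\include{Children/Chapter1}
\include{Children/Chapter2}
\include{Children/Chapter3}

\end{document}
@

\noindent Page numbers and cross-references for the skipped chapters are read from their existing \emph{.aux} files. This selective compilation only applies to children loaded with \texttt{include}; children loaded with \texttt{input} are always compiled.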
\subsection{Other common features of large documents}
There are some other commands in this example parent document that we have not seen before. These commands create the book's front matter\index{front matter}--tables of contents, lists of figures and tables--as well as blank pages and the book's index.
\paragraph{Table of contents}\index{LaTeX!table of contents}
If you are using LaTeX's section headings (e.g. \texttt{chapter}, \texttt{section})\index{LaTeX command!chapter}\index{LaTeX command!section} you can automatically generate a table of contents with the \texttt{tableofcontents}\index{LaTeX command!tableofcontents} command. We saw an example earlier when we created a beamer slideshow. Simply place this command where you want the table of contents to appear. Usually this is after the \texttt{maketitle} command near the beginning of the document.
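If the table of contents becomes too detailed, one option is to limit how many heading levels it lists. The following is a small sketch, not taken from this book's source, that assumes the \texttt{book} class and uses LaTeX's standard \texttt{tocdepth} counter:

<<Ch12TocDepthSketch, eval=FALSE, tidy=FALSE, size='scriptsize'>>=
% Assumption: we only want chapters and sections listed.
% In the book class, 0 = chapters only, 1 = sections, 2 = subsections.
\setcounter{tocdepth}{1}

\tableofcontents
@

\noindent Note that LaTeX builds the table of contents from information saved during the previous run, so you typically need to compile the document twice before new headings show up.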
\paragraph{Lists of figures and tables}\index{LaTeX!list of tables/figures}
It is also common for large documents to include lists of their figures and tables. Usually these are placed after the table of contents. LaTeX will automatically create these lists from the \texttt{caption}s\index{LaTeX command!caption} you place in \texttt{table} and \texttt{figure} environments. To create these lists, use the \texttt{listoffigures} and \texttt{listoftables} commands.\index{LaTeX command!listoftables}\index{LaTeX command!listoffigures}
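Long captions can make these lists hard to read. As a hedged illustration, the sketch below uses \texttt{caption}'s optional argument to give a shorter title for the list of tables. The table contents and the label name are made up for this example.

<<Ch12ShortCaptionSketch, eval=FALSE, tidy=FALSE, size='scriptsize'>>=
\begin{table}
    % The optional short title in square brackets is used in the list of tables.
    % The full caption in curly brackets appears with the table itself.
    \caption[Example chapter lengths]{Example chapter lengths.
             The numbers are made up for illustration.}
    \label{ExampleChapterLengths}
    \centering
    \begin{tabular}{lr}
        Chapter & Pages \\
        \hline
        1       & 20    \\
        2       & 35    \\
    \end{tabular}
\end{table}
@

\noindent The same optional argument works for \texttt{caption}s in \texttt{figure} environments and the list of figures.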
\paragraph{Blank Pages}
Sometimes we want to make sure that an index, a bibliography, or some other item begins on a new page. To do this, simply place the \texttt{clearpage}\index{LaTeX command!clearpage} command directly before the item.
\paragraph{Index}\index{LaTeX!indices}
You can automatically create an index with the \emph{makeidx} (make index) LaTeX package.\index{LaTeX package!makeidx} To set up this package, load it in your preamble. Then, also in the preamble, enable the index by placing \verb|\makeindex|. LaTeX returns an error if this command appears after \verb|\begin{document}|. You will probably want the actual index to be printed near the end of the document. To do this, place \verb|\printindex| after the bibliography or somewhere else before \verb|\end{document}|. Throughout the child documents, you can use \verb|\index{INDEX_KEY}| at places that you would like the index to refer to. For example, to create an index entry for this spot in this book with the \verb|INDEX_KEY| ``indices'' we type: \verb|\index{indices}|.
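Index keys can also be nested and cross-referenced. The following sketch shows three common forms of the \texttt{index} command, assuming the \emph{makeidx} package is loaded. The sentences and the ``indexes'' key are invented for illustration; the subentry pattern is the one used in this book's own source.

<<Ch12IndexEntriesSketch, eval=FALSE, tidy=FALSE, size='scriptsize'>>=
% A simple entry
Large documents often benefit from an index.\index{indices}

% A subentry: "chapter" is listed under "LaTeX command"
Chapters begin with the \texttt{chapter} command.\index{LaTeX command!chapter}

% A cross-reference: points readers from "indexes" to "indices"
\index{indexes|see{indices}}
@

\noindent The entries only appear in the printed index after the \emph{makeindex} program has processed the \emph{.idx} file that LaTeX writes, so the usual sequence is to compile, run \emph{makeindex}, and compile again. Some editors may run this step for you.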
%%%%%%%%%%%%% Knitted Child Documents %%%%%%%%%%%%%%
\section{\emph{knitr} and Large Documents}\index{knitr!large documents}
LaTeX's own parent-child functions are very useful if you are creating plain, non-knittable documents. For knittable documents we need to use \emph{knitr}'s parent-child options. Not only do these allow us to include knittable children in parent documents, they also allow us to \texttt{knit} each child document separately. This can be very useful when working on document drafts, as we don't need to compile the whole document every time we want to look at changes made in one chapter.
\subsection{The parent document}
Like regular LaTeX parent documents, knittable parent documents include commands to create the preamble, front matter, bibliography, and so on. {\emph{knitr}} global chunk options\index{knitr!global chunk options} and packag
Condensed preview: 90 files, each showing path, character count, and a content snippet.
[
{
"path": ".gitignore",
"chars": 260,
"preview": "# Ignore the following files from Git version control tracking #\n#######################################################"
},
{
"path": "Old/BookMake.R",
"chars": 1302,
"preview": "#################\n# Make file for the book Reproducible Research with R and RStudio\n# Christopher Gandrud\n# Updated: 30 "
},
{
"path": "Old/CoverGraphics/2ndEditionCover_v1/index.html",
"chars": 211,
"preview": "<!DOCTYPE html>\n\n<html xmlns=\"http://www.w3.org/1999/xhtml\">\n\n<head>\n\n<meta charset=\"utf-8\">\n<link rel=\"stylesheet\" href"
},
{
"path": "Old/CoverGraphics/2ndEditionCover_v1/main.css",
"chars": 2031,
"preview": ".book-logo {\n width: 100%;\n height: 80%;\n padding: 20%;\n margin: 5%;\n position: relative;\n}\n.book-logo:be"
},
{
"path": "Old/EarlyOutline.md",
"chars": 5963,
"preview": "# Reproducible Research with R and RStudio: A workflow for data gathering, analysis, and document creation\n\n## Updated C"
},
{
"path": "Old/README.md",
"chars": 105,
"preview": "# The Old Directory \n\nThis folder contains obsolete files that were used in earlier versions of the book."
},
{
"path": "Old/Source-v2/.gitignore",
"chars": 495,
"preview": "# Ignore LaTeX compile byproduct files #\n########################################\n\n*.aux\n*.bbl\n*.blg\ncache/*\n.DS_Store\nf"
},
{
"path": "Old/Source-v2/Children/Chapter1/chapter1.Rnw",
"chars": 41951,
"preview": "% Chapter Chapter 1 For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Created: 16/07/2012 05:45:03 pm C"
},
{
"path": "Old/Source-v2/Children/Chapter1/chapter1.md",
"chars": 40742,
"preview": "Introducing Reproducible Research {#Intro}\n=================================\n\nResearch is often presented in very select"
},
{
"path": "Old/Source-v2/Children/Chapter10/chapter10.Rnw",
"chars": 48046,
"preview": "% Chapter Chapter 10 For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Created: 16/07/2012 05:45:03 pm "
},
{
"path": "Old/Source-v2/Children/Chapter11/chapter11.Rnw",
"chars": 46396,
"preview": "% Chapter Chapter 11 For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Created: 16/07/2012 05:45:03 pm "
},
{
"path": "Old/Source-v2/Children/Chapter12/chapter12.Rnw",
"chars": 26860,
"preview": "% Chapter Chapter 12 For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Created: 16/07/2012 05:45:03 pm "
},
{
"path": "Old/Source-v2/Children/Chapter13/chapter13.Rnw",
"chars": 41380,
"preview": "% Chapter Chapter 13 For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Created: 16/07/2012 05:45:03 pm "
},
{
"path": "Old/Source-v2/Children/Chapter14/chapter14.Rnw",
"chars": 16327,
"preview": "% Chapter Chapter 14 For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Created: 16/07/2012 05:45:03 pm "
},
{
"path": "Old/Source-v2/Children/Chapter2/chapter2.Rnw",
"chars": 21610,
"preview": "% Chapter Chapter 2 For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Created: 16/07/2012 05:45:03 pm C"
},
{
"path": "Old/Source-v2/Children/Chapter3/chapter3.Rnw",
"chars": 67890,
"preview": "% Chapter Chapter 3 For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Created: 16/07/2012 05:45:03 pm C"
},
{
"path": "Old/Source-v2/Children/Chapter4/chapter4.Rnw",
"chars": 31025,
"preview": "% Chapter Chapter 4 For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Created: 16/07/2012 05:45:03 pm C"
},
{
"path": "Old/Source-v2/Children/Chapter5/chapter5.Rnw",
"chars": 57445,
"preview": "% Chapter Chapter 5 For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Created: 16/07/2012 05:45:03 pm C"
},
{
"path": "Old/Source-v2/Children/Chapter6/chapter6.Rnw",
"chars": 43529,
"preview": "% Chapter Chapter 6 For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Created: 16/07/2012 05:45:03 pm C"
},
{
"path": "Old/Source-v2/Children/Chapter7/chapter7.Rnw",
"chars": 40223,
"preview": "% Chapter Chapter 7 For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Created: 16/07/2012 05:45:03 pm C"
},
{
"path": "Old/Source-v2/Children/Chapter8/chapter8.Rnw",
"chars": 31673,
"preview": "% Chapter Chapter 8 For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Created: 16/07/2012 05:45:03 pm C"
},
{
"path": "Old/Source-v2/Children/Chapter9/chapter9.Rnw",
"chars": 47087,
"preview": "% Chapter Chapter 9 For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Created: 16/07/2012 05:45:03 pm C"
},
{
"path": "Old/Source-v2/Children/FrontMatter/AdditionalResources/AdditionalResources.Rnw",
"chars": 4385,
"preview": "% Example Project Explanation For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Updated: 20 March 2015\n"
},
{
"path": "Old/Source-v2/Children/FrontMatter/Packages.Rnw",
"chars": 3957,
"preview": "<<set-parentPackages, echo=FALSE, results='hide', cache=FALSE>>=\nset_parent('Rep-Res-Parent.Rnw')\n@\n\n\\chapter*{Required "
},
{
"path": "Old/Source-v2/Children/FrontMatter/Preface.Rnw",
"chars": 6623,
"preview": "<<set-parentPreface, echo=FALSE, results='hide', cache=FALSE>>=\nset_parent('Rep-Res-Parent.Rnw')\n@\n\n\\chapter*{Preface}\n\n"
},
{
"path": "Old/Source-v2/Children/FrontMatter/StylisticConventions.md",
"chars": 1134,
"preview": "%% Stylistic Conventions for Reproducible Research with R and RStudio\n\nI use the following conventions throughout this b"
},
{
"path": "Old/Source-v2/Children/FrontMatter/rep-res-PackagesCited.bib",
"chars": 8864,
"preview": "@Manual{R-animation,\n title = {animation: A gallery of animations in statistics and utilities to create\nanimations},\n "
},
{
"path": "Old/Source-v2/Rep-Res-Parent.Rnw",
"chars": 4599,
"preview": "%%%%%%%%%%%%%%%\n% Parent document for the book Reproducible Research with R and RStudio\n% Christopher Gandrud\n% 17 April"
},
{
"path": "Old/Source-v2/Rep-Res-Parent.toc",
"chars": 24092,
"preview": "\\contentsline {chapter}{Preface}{xiii}\n\\contentsline {chapter}{Stylistic Conventions}{xvii}\n\\contentsline {chapter}{Requ"
},
{
"path": "Old/Source-v2/krantz.cls",
"chars": 59742,
"preview": "%%\n%% This is file `Krantz.cls'\n%%% Created by Shashi Kumar / ITC [August 2008]\n\n\n\\NeedsTeXFormat{LaTeX2e}[1995/12/01]\n\\"
},
{
"path": "Old/Source-v2/rep-res-book.bib",
"chars": 19209,
"preview": "% Main Bibliography For Reproducible Research in R and RStudio\n% Christopher Gandrud\n% Updated: 24 April 2015\n\n@article{"
},
{
"path": "Old/SourceOld/Chapter1/chapter1.Rmd",
"chars": 5325,
"preview": "<!---\n Chapter Chapter 1 For Reproducible Research in R and RStudio\n Christopher Gandrud\n Created 28/06/2012 05:48:16 pm"
},
{
"path": "Old/SourceOld/Chapter10/chapter10.Rmd",
"chars": 165,
"preview": "<!---\n Chapter Chapter 10 For Reproducible Research in R and RStudio\n Christopher Gandrud\n Created 28/06/2012 05:48:16 p"
},
{
"path": "Old/SourceOld/Chapter11/chapter11.Rmd",
"chars": 2904,
"preview": "<!---\n Chapter Chapter 11 For Reproducible Research in R and RStudio\n Christopher Gandrud\n Created 28/06/2012 05:48:16 p"
},
{
"path": "Old/SourceOld/Chapter12/chapter12.Rmd",
"chars": 177,
"preview": "<!---\n Chapter Chapter 12 For Reproducible Research in R and RStudio\n Christopher Gandrud\n Created 28/06/2012 05:48:16 p"
},
{
"path": "Old/SourceOld/Chapter13/chapter13.Rmd",
"chars": 981,
"preview": "<!---\n Chapter Chapter 13 For Reproducible Research in R and RStudio\n Christopher Gandrud\n Created 28/06/2012 05:48:16 p"
},
{
"path": "Old/SourceOld/Chapter14/chapter14.Rmd",
"chars": 165,
"preview": "<!---\n Chapter Chapter 14 For Reproducible Research in R and RStudio\n Christopher Gandrud\n Created 28/06/2012 05:48:16 p"
},
{
"path": "Old/SourceOld/Chapter2/chapter2.Rmd",
"chars": 1499,
"preview": "<!---\n Chapter Chapter 2 For Reproducible Research in R and RStudio\n Christopher Gandrud\n Created 28/06/2012 05:48:16 pm"
},
{
"path": "Old/SourceOld/Chapter3/chapter3.Rmd",
"chars": 163,
"preview": "<!---\n Chapter Chapter 3 For Reproducible Research in R and RStudio\n Christopher Gandrud\n Created 28/06/2012 05:48:16 pm"
},
{
"path": "Old/SourceOld/Chapter4/chapter4.Rmd",
"chars": 163,
"preview": "<!---\n Chapter Chapter 4 For Reproducible Research in R and RStudio\n Christopher Gandrud\n Created 28/06/2012 05:48:16 pm"
},
{
"path": "Old/SourceOld/Chapter5/chapter5.Rmd",
"chars": 1244,
"preview": "<!---\n Chapter Chapter 5 For Reproducible Research in R and RStudio\n Christopher Gandrud\n Created 28/06/2012 05:48:16 pm"
},
{
"path": "Old/SourceOld/Chapter6/chapter6.Rmd",
"chars": 9407,
"preview": "<!---\n Chapter Chapter 6 For Reproducible Research in R and RStudio\n Christopher Gandrud\n Created 28/06/2012 05:48:16 pm"
},
{
"path": "Old/SourceOld/Chapter7/chapter7.Rmd",
"chars": 163,
"preview": "<!---\n Chapter Chapter 7 For Reproducible Research in R and RStudio\n Christopher Gandrud\n Created 28/06/2012 05:48:16 pm"
},
{
"path": "Old/SourceOld/Chapter8/chapter8.Rmd",
"chars": 4757,
"preview": "<!---\n Chapter Chapter 8 For Reproducible Research in R and RStudio\n Christopher Gandrud\n Created 28/06/2012 05:48:16 pm"
},
{
"path": "Old/SourceOld/Chapter9/chapter9.Rmd",
"chars": 3243,
"preview": "<!---\n Chapter Chapter 9 For Reproducible Research in R and RStudio\n Christopher Gandrud\n Created 28/06/2012 05:48:16 pm"
},
{
"path": "Old/Writing_Setup/Early_Book_Origins.md",
"chars": 1103,
"preview": "# Description of the Origins of Reproducible Research in R and RStudio\n\n## Christopher Gandrud\n\n---\n\nThe book began as a"
},
{
"path": "Old/Writing_Setup/HeaderFooter/IndvChapterFoot.tex",
"chars": 112,
"preview": "\\bibliographystyle{plain}\n\\bibliography{/git_repositories/Rep-Res-Book/Source/rep-res-book.bib}\n\n\n\\end{document}"
},
{
"path": "Old/Writing_Setup/HeaderFooter/IndvChapterHead.tex",
"chars": 488,
"preview": "\\documentclass{article}\n\n\\usepackage{amssymb}\n\\usepackage{amsmath}\n\\usepackage{graphicx}\n\\usepackage{subfigure}\n%\\usepac"
},
{
"path": "Old/Writing_Setup/IndvChapter.sh",
"chars": 1154,
"preview": "##########\n# Shell script to create individual chapters that are compilable in LaTeX\n# Christopher Gandrud\n# Updated 18 "
},
{
"path": "Old/Writing_Setup/IndvChapter1.Rnw",
"chars": 600,
"preview": "\\documentclass{article}\n\n\\usepackage{amssymb}\n\\usepackage{amsmath}\n\\usepackage{graphicx}\n\\usepackage{subfigure}\n%\\usepac"
},
{
"path": "Old/Writing_Setup/OldScripts/ConvertRmdtoRnw.sh",
"chars": 516,
"preview": "##########\n# Shell script for Converting Early Chapter Drafts from Markdown to LaTeX\n# Christopher Gandrud\n# Updated 16 "
},
{
"path": "Old/Writing_Setup/OldScripts/Rmd_Book.sh",
"chars": 959,
"preview": "##########\n# Shell script to create directories & files for Reproducible Research in R/RStudio\n# With Markdown\n# Christo"
},
{
"path": "Old/Writing_Setup/ProductionNotes.md",
"chars": 1852,
"preview": "# Reproducible Research for R and RStudio\n\n## Production Notes\n\n### Christopher Gandrud\n\n---\n---\n\n## Shell Script for di"
},
{
"path": "Old/Writing_Setup/Rnw_Book.sh",
"chars": 4766,
"preview": "##########\n# Shell script to create directories & files for Reproducible Research in R/RStudio\n# With LaTeX\n# Christophe"
},
{
"path": "Old/Writing_Setup/TableofContentPDF/GandrudRep-Res-Book-TOC.fdb_latexmk",
"chars": 938,
"preview": "# Fdb version 3\n[\"pdflatex\"] 1359598830 \"GandrudRep-Res-Book-TOC.tex\" \"GandrudRep-Res-Book-TOC.pdf\" \"GandrudRep-Res-Book"
},
{
"path": "Old/Writing_Setup/TableofContentPDF/GandrudRep-Res-Book-TOC.tex",
"chars": 1052,
"preview": "\\documentclass[krantz1]{krantz}\n\n% Load required LaTeX packages\n\\usepackage[authoryear]{natbib}\n\\usepackage{amssymb}\n\\us"
},
{
"path": "Old/Writing_Setup/TableofContentPDF/krantz.cls",
"chars": 59742,
"preview": "%%\n%% This is file `Krantz.cls'\n%%% Created by Shashi Kumar / ITC [August 2008]\n\n\n\\NeedsTeXFormat{LaTeX2e}[1995/12/01]\n\\"
},
{
"path": "README.Rmd",
"chars": 768,
"preview": "---\noutput: github_document \n---\n\n# Reproducible Research with R and RStudio (Third Edition)\n\n[<img src=\"img/re-res-book"
},
{
"path": "README.md",
"chars": 1599,
"preview": "\n# Reproducible Research with R and RStudio (Third Edition)\n\n[<img src=\"img/re-res-book-cover-3rd.png\" align=\"right\" />]"
},
{
"path": "rep-res-3rd-edition/.gitignore",
"chars": 40,
"preview": ".Rproj.user\n.Rhistory\n.RData\n.Ruserdata\n"
},
{
"path": "rep-res-3rd-edition/01-author.Rmd",
"chars": 619,
"preview": "# About the Author {-}\n\n**Christopher Gandrud** is Head of Economics and Experimentation at Zalando SE. He leads teams o"
},
{
"path": "rep-res-3rd-edition/01-stylistic-conventions.Rmd",
"chars": 1612,
"preview": "# Stylistic Conventions {-}\n\nI use the following conventions throughout the book:\n\n- **Abstract variables**: Abstract "
},
{
"path": "rep-res-3rd-edition/02-additional-resources.Rmd",
"chars": 8334,
"preview": "# Additional Resources {-}\n\nYou can freely download additional resources supplementing examples in this book. These reso"
},
{
"path": "rep-res-3rd-edition/03-introduction.Rmd",
"chars": 44491,
"preview": "\\mainmatter\n\n# (PART) Getting Started {-}\n\n# Introducing Reproducible Research{#Intro}\n\nResearch is typically presented "
},
{
"path": "rep-res-3rd-edition/04-getting-started.Rmd",
"chars": 24262,
"preview": "# Getting Started with Reproducible Research {#GettingStartedRR}\n\nResearchers often start thinking about making their wo"
},
{
"path": "rep-res-3rd-edition/05-start-R.Rmd",
"chars": 65194,
"preview": "# Getting Started with R, RStudio, and knitr/R Markdown {#GettingStartedRKnitr}\n\nIf you have rarely or never used R befo"
},
{
"path": "rep-res-3rd-edition/06-file-management.Rmd",
"chars": 31531,
"preview": "# Getting Started with File Management {#DirectoriesChapter}\n\nCareful file management is crucial for reproducible resear"
},
{
"path": "rep-res-3rd-edition/07-storage.Rmd",
"chars": 51845,
"preview": "# (PART) Data Gathering and Storage {-}\n\n# Storing, Collaborating, Accessing Files, and Versioning {#Storing}\n\nIn additi"
},
{
"path": "rep-res-3rd-edition/08-gather.Rmd",
"chars": 34289,
"preview": "# Gathering Data with R {#DataGather}\n\nHow you gather your data directly impacts how reproducible your research\nwill be."
},
{
"path": "rep-res-3rd-edition/09-clean.Rmd",
"chars": 36742,
"preview": "# Preparing Data for Analysis {#DataClean}\n\nOnce we have gathered the raw data that we want to include in our\nstatistica"
},
{
"path": "rep-res-3rd-edition/10-modeling.Rmd",
"chars": 27184,
"preview": "# (PART) Analysis and Results {-}\n\n# Statistical Modeling and knitr/R Markdown {#StatsModel}\n\nWhen you have your data cl"
},
{
"path": "rep-res-3rd-edition/11-tables.Rmd",
"chars": 43888,
"preview": "# Showing Results with Tables {#TablesChapter}\n\nGraphs and other visual methods, discussed in the next chapter, can\nofte"
},
{
"path": "rep-res-3rd-edition/12-figures.Rmd",
"chars": 43447,
"preview": "# Showing Results with Figures {#FiguresChapter}\n\nOne of the main reasons that many people use R is to take advantage of"
},
{
"path": "rep-res-3rd-edition/13-latex.Rmd",
"chars": 37868,
"preview": "# (PART) Presentation Documents {-}\n\n# Presenting with LaTeX {#LatexChapter}\n\nWe have already begun to see how LaTeX wor"
},
{
"path": "rep-res-3rd-edition/14-web.Rmd",
"chars": 34017,
"preview": "# Presenting in a Variety of Formats with R Markdown {#MarkdownChapter}\n\nWhile Markdown started as a simple way to write"
},
{
"path": "rep-res-3rd-edition/16-conclusion.Rmd",
"chars": 14950,
"preview": "# Conclusion {#bookconclusion}\n\n> *Well, we have completed our journey. The only thing left to do now is\n> practice, pra"
},
{
"path": "rep-res-3rd-edition/99-references.Rmd",
"chars": 925,
"preview": "`r if (knitr:::is_html_output()) '# References {-}'`\n\n```{r include=FALSE}\n# Additional packages to cite\npkg_additional "
},
{
"path": "rep-res-3rd-edition/LICENSE",
"chars": 1066,
"preview": "MIT License\n\nCopyright (c) 2016 Yihui Xie\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\n"
},
{
"path": "rep-res-3rd-edition/README.md",
"chars": 320,
"preview": "# Reproducible Research with R and RStudio (Third Edition)\n\nThis version of the book is built on top of the [Bookdown ex"
},
{
"path": "rep-res-3rd-edition/_bookdown.yml",
"chars": 193,
"preview": "book_filename: bookdown\nclean: [packages.bib, bookdown.bbl]\ndelete_merged_file: true\nlanguage:\n label:\n fig: \"FIGURE"
},
{
"path": "rep-res-3rd-edition/_output.yml",
"chars": 847,
"preview": "bookdown::gitbook:\n css: css/style.css\n config:\n toc:\n collapse: none\n before: |\n <li><a href=\"./\""
},
{
"path": "rep-res-3rd-edition/book.bib",
"chars": 21822,
"preview": "\n@book{healy2018data,\n title={Data Visualization: A Practical Introduction},\n author={Healy, Kieran},\n year={2018},\n "
},
{
"path": "rep-res-3rd-edition/css/style.css",
"chars": 315,
"preview": "p.caption {\n color: #777;\n margin-top: 10px;\n}\np code {\n white-space: inherit;\n}\npre {\n word-break: normal;\n word-w"
},
{
"path": "rep-res-3rd-edition/index.Rmd",
"chars": 7965,
"preview": "---\ntitle: \"Reproducible Research with R and RStudio (Third Edition)\"\nauthor: \"Christopher Gandrud\"\ndate: \"`r Sys.Date()"
},
{
"path": "rep-res-3rd-edition/krantz.cls",
"chars": 61050,
"preview": "%% This is file `Krantz.cls'\n%%% Created by Shashi Kumar / ITC [August 2008]\n\n\n\\NeedsTeXFormat{LaTeX2e}[1995/12/01]\n\\Pro"
},
{
"path": "rep-res-3rd-edition/latex/after_body.tex",
"chars": 24,
"preview": "\\backmatter\n\\printindex\n"
},
{
"path": "rep-res-3rd-edition/latex/before_body.tex",
"chars": 489,
"preview": "% you may need to leave a few empty pages before the dedication page\n\n%\\cleardoublepage\\newpage\\thispagestyle{empty}\\nul"
},
{
"path": "rep-res-3rd-edition/latex/preamble.tex",
"chars": 1262,
"preview": "\\usepackage{booktabs}\n\\usepackage{longtable}\n\\usepackage[bf,singlelinecheck=off]{caption}\n\n\\usepackage{framed,color}\n\\de"
},
{
"path": "rep-res-3rd-edition/packages.bib",
"chars": 16100,
"preview": "@Manual{R-animation,\n title = {animation: A Gallery of Animations in Statistics and Utilities to Create\nAnimations},\n "
},
{
"path": "rep-res-3rd-edition/rep-res-3rd-edition.Rproj",
"chars": 215,
"preview": "Version: 1.0\n\nRestoreWorkspace: No\nSaveWorkspace: No\nAlwaysSaveHistory: Default\n\nEnableCodeIndexing: Yes\nUseSpacesForTab"
}
]