Repository: BurntSushi/xsv
Branch: master
Commit: f4304660231d
Files: 65
Total size: 240.2 KB

Directory structure:
gitextract_au2mdo17/

├── .gitignore
├── .travis.yml
├── BENCHMARKS.md
├── COPYING
├── Cargo.toml
├── LICENSE-MIT
├── Makefile
├── README.md
├── UNLICENSE
├── appveyor.yml
├── ci/
│   ├── before_deploy.sh
│   ├── install.sh
│   ├── script.sh
│   └── utils.sh
├── scripts/
│   ├── benchmark-basic
│   ├── build-release
│   ├── github-release
│   └── github-upload
├── session.vim
├── src/
│   ├── cmd/
│   │   ├── cat.rs
│   │   ├── count.rs
│   │   ├── fixlengths.rs
│   │   ├── flatten.rs
│   │   ├── fmt.rs
│   │   ├── frequency.rs
│   │   ├── headers.rs
│   │   ├── index.rs
│   │   ├── input.rs
│   │   ├── join.rs
│   │   ├── mod.rs
│   │   ├── partition.rs
│   │   ├── reverse.rs
│   │   ├── sample.rs
│   │   ├── search.rs
│   │   ├── select.rs
│   │   ├── slice.rs
│   │   ├── sort.rs
│   │   ├── split.rs
│   │   ├── stats.rs
│   │   └── table.rs
│   ├── config.rs
│   ├── index.rs
│   ├── main.rs
│   ├── select.rs
│   └── util.rs
└── tests/
    ├── test_cat.rs
    ├── test_count.rs
    ├── test_fixlengths.rs
    ├── test_flatten.rs
    ├── test_fmt.rs
    ├── test_frequency.rs
    ├── test_headers.rs
    ├── test_index.rs
    ├── test_join.rs
    ├── test_partition.rs
    ├── test_reverse.rs
    ├── test_search.rs
    ├── test_select.rs
    ├── test_slice.rs
    ├── test_sort.rs
    ├── test_split.rs
    ├── test_stats.rs
    ├── test_table.rs
    ├── tests.rs
    └── workdir.rs

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
.*.swp
doc
tags
examples/data/ss10pusa.csv
build
target
ctags.rust
*.csv
*.tsv
main
*.idx
builds


================================================
FILE: .travis.yml
================================================
language: rust
cache: cargo

env:
  global:
    - PROJECT_NAME=xsv
matrix:
  include:
    # Stable channel
    - os: linux
      rust: stable
      env: TARGET=i686-unknown-linux-musl
    - os: osx
      rust: stable
      env: TARGET=x86_64-apple-darwin
    - os: linux
      rust: stable
      env: TARGET=x86_64-unknown-linux-musl
    # Minimum Rust supported channel.
    - os: linux
      rust: 1.28.0
      env: TARGET=x86_64-unknown-linux-gnu
    - os: linux
      rust: 1.28.0
      env: TARGET=x86_64-unknown-linux-musl

before_install:
  - export PATH="$PATH:$HOME/.cargo/bin"

install:
  - bash ci/install.sh

script:
  - bash ci/script.sh

before_deploy:
  - bash ci/before_deploy.sh

deploy:
  provider: releases
  api_key:
    secure: aDT53aTIcl6RLcd4/StnKT55LgJyjiCtsmu1Byy0TIEtP4ZfNhsHwCbqyZT6TLownLJPi5wLM1WRncGKNYQelFDk/mUA8YugcFDfiSN//ZZ8KLAQiI+PX6JCrFYr/ZmP4dJzFWS1hPsr/X0gdbrlb3kuQG7BI9gH3GY4yTsLNiY=
  file_glob: true
  file: ${PROJECT_NAME}-${TRAVIS_TAG}-${TARGET}.*
  # don't delete the artifacts from previous phases
  skip_cleanup: true
  # deploy when a new tag is pushed
  on:
    # channel to use to produce the release artifacts
    # NOTE make sure you only release *once* per target
    # TODO you may want to pick a different channel
    condition: $TRAVIS_RUST_VERSION = stable
    tags: true

branches:
  only:
    # Pushes and PR to the master branch
    - master
    # IMPORTANT Ruby regex to match tags. Required, or travis won't trigger deploys when a new tag
    # is pushed. This regex matches semantic versions like v1.2.3-rc4+2016.02.22
    - /^\d+\.\d+\.\d+.*$/

notifications:
  email:
    on_success: never


================================================
FILE: BENCHMARKS.md
================================================
These are some very basic and unscientific benchmarks of various commands
provided by `xsv`. Please see below for more information.

These benchmarks were run with
[worldcitiespop_mil.csv](https://burntsushi.net/stuff/worldcitiespop_mil.csv),
which is a random 1,000,000 row subset of the world city population dataset
from the [Data Science Toolkit](https://github.com/petewarden/dstkdata).

These benchmarks were run on an Intel i7-6900K (8 CPUs, 16 threads) with 64GB
of memory.

```
count                   0.11 seconds   413.76  MB/sec
flatten                 4.54 seconds   10.02   MB/sec
flatten_condensed       4.45 seconds   10.22   MB/sec
frequency               1.82 seconds   25.00   MB/sec
index                   0.12 seconds   379.28  MB/sec
sample_10               0.18 seconds   252.85  MB/sec
sample_1000             0.18 seconds   252.85  MB/sec
sample_100000           0.29 seconds   156.94  MB/sec
search                  0.27 seconds   168.56  MB/sec
select                  0.14 seconds   325.09  MB/sec
search                  0.13 seconds   350.10  MB/sec
select                  0.13 seconds   350.10  MB/sec
sort                    2.18 seconds   20.87   MB/sec
slice_one_middle        0.08 seconds   568.92  MB/sec
slice_one_middle_index  0.01 seconds   4551.36 MB/sec
stats                   1.09 seconds   41.75   MB/sec
stats_index             0.15 seconds   303.42  MB/sec
stats_everything        1.94 seconds   23.46   MB/sec
stats_everything_index  0.93 seconds   48.93   MB/sec
```

### Details

The purpose of these benchmarks is to provide a rough ballpark estimate of how
fast each command is. My hope is that they can also catch significant
performance regressions.

The `count` command can be viewed as a sort of baseline of the fastest possible
command that parses every record in CSV data.

The benchmarks that end with `_index` are run with indexing enabled.


================================================
FILE: COPYING
================================================
This project is dual-licensed under the Unlicense and MIT licenses.

You may use this code under the terms of either license.


================================================
FILE: Cargo.toml
================================================
[package]
name = "xsv"
version = "0.13.0"  #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = "A high performance CSV command line toolkit."
documentation = "https://burntsushi.net/rustdoc/xsv/"
homepage = "https://github.com/BurntSushi/xsv"
repository = "https://github.com/BurntSushi/xsv"
readme = "README.md"
keywords = ["csv", "tsv", "slice", "command"]
license = "Unlicense/MIT"
autotests = false

[[bin]]
name = "xsv"
test = false
bench = false
doctest = false

[[test]]
name = "tests"

[profile.release]
opt-level = 3
debug = true

[profile.test]
opt-level = 3

[dependencies]
byteorder = "1"
crossbeam-channel = "0.2.4"
csv = "1"
csv-index = "0.1.5"
docopt = "1"
filetime = "0.1"
num_cpus = "1.4"
rand = "0.5"
regex = "1"
serde = "1"
serde_derive = "1"
streaming-stats = "0.2"
tabwriter = "1"
threadpool = "1.3"

[dev-dependencies]
quickcheck = { version = "0.7", default-features = false }
log = "0.4"


================================================
FILE: LICENSE-MIT
================================================
The MIT License (MIT)

Copyright (c) 2015 Andrew Gallant

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.


================================================
FILE: Makefile
================================================
all:
	@echo Nothing to do...

ctags:
	ctags --recurse --options=ctags.rust --languages=Rust

docs:
	cargo doc
	in-dir ./target/doc fix-perms
	rscp ./target/doc/* gopher:~/www/burntsushi.net/rustdoc/

debug:
	cargo build --verbose
	rustc -L ./target/deps/ -g -Z lto --opt-level 3 src/main.rs

push:
	git push home master
	git push origin master

dev:
	cargo build
	cp ./target/xsv ~/bin/bin/xsv

release:
	cargo build --release
	mkdir -p ~/bin/bin
	cp ./target/release/xsv ~/bin/bin/xsv

github:
	./scripts/build-release
	./scripts/github-release
	./scripts/github-upload


================================================
FILE: README.md
================================================
# `xsv` is now unmaintained

In lieu of `xsv`, I'd recommend either
[qsv](https://github.com/dathere/qsv)
or
[xan](https://github.com/medialab/xan).

-------------------------------------------------------------------------------


xsv is a command line program for indexing, slicing, analyzing, splitting
and joining CSV files. Commands should be simple, fast and composable:

1. Simple tasks should be easy.
2. Performance trade offs should be exposed in the CLI interface.
3. Composition should not come at the expense of performance.

This README contains information on how to
[install `xsv`](https://github.com/BurntSushi/xsv#installation), in addition to
a quick tour of several commands.

[![Linux build status](https://api.travis-ci.org/BurntSushi/xsv.svg)](https://travis-ci.org/BurntSushi/xsv)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/xsv?svg=true)](https://ci.appveyor.com/project/BurntSushi/xsv)
[![](https://meritbadge.herokuapp.com/xsv)](https://crates.io/crates/xsv)

Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org).


### Available commands

* **cat** - Concatenate CSV files by row or by column.
* **count** - Count the rows in a CSV file. (Instantaneous with an index.)
* **fixlengths** - Force a CSV file to have same-length records by either
  padding or truncating them.
* **flatten** - A flattened view of CSV records. Useful for viewing one record
  at a time. e.g., `xsv slice -i 5 data.csv | xsv flatten`.
* **fmt** - Reformat CSV data with different delimiters, record terminators
  or quoting rules. (Supports ASCII delimited data.)
* **frequency** - Build frequency tables of each column in CSV data. (Uses
  parallelism to go faster if an index is present.)
* **headers** - Show the headers of CSV data. Or show the intersection of all
  headers between many CSV files.
* **index** - Create an index for a CSV file. This is very quick and provides
  constant time indexing into the CSV file.
* **input** - Read CSV data with exotic quoting/escaping rules.
* **join** - Inner, outer and cross joins. Uses a simple hash index to make it
  fast.
* **partition** - Partition CSV data based on a column value.
* **sample** - Randomly draw rows from CSV data using reservoir sampling (i.e.,
  use memory proportional to the size of the sample).
* **reverse** - Reverse order of rows in CSV data.
* **search** - Run a regex over CSV data. Applies the regex to each field
  individually and shows only matching rows.
* **select** - Select or re-order columns from CSV data.
* **slice** - Slice rows from any part of a CSV file. When an index is present,
  this only has to parse the rows in the slice (instead of all rows leading up
  to the start of the slice).
* **sort** - Sort CSV data.
* **split** - Split one CSV file into many CSV files of N chunks.
* **stats** - Show basic types and statistics of each column in the CSV file.
  (i.e., mean, standard deviation, median, range, etc.)
* **table** - Show aligned output of any CSV data using
  [elastic tabstops](https://github.com/BurntSushi/tabwriter).


### A whirlwind tour

Let's say you're playing with some of the data from the
[Data Science Toolkit](https://github.com/petewarden/dstkdata), which contains
several CSV files. Maybe you're interested in the population counts of each
city in the world. So grab the data and start examining it:

```bash
$ curl -LO https://burntsushi.net/stuff/worldcitiespop.csv
$ xsv headers worldcitiespop.csv
1   Country
2   City
3   AccentCity
4   Region
5   Population
6   Latitude
7   Longitude
```

The next thing you might want to do is get an overview of the kind of data that
appears in each column. The `stats` command will do this for you:

```bash
$ xsv stats worldcitiespop.csv --everything | xsv table
field       type     min            max            min_length  max_length  mean          stddev         median     mode         cardinality
Country     Unicode  ad             zw             2           2                                                   cn           234
City        Unicode   bab el ahmar  Þykkvibaer     1           91                                                  san jose     2351892
AccentCity  Unicode   Bâb el Ahmar  ïn Bou Chella  1           91                                                  San Antonio  2375760
Region      Unicode  00             Z9             0           2                                        13         04           397
Population  Integer  7              31480498       0           8           47719.570634  302885.559204  10779                   28754
Latitude    Float    -54.933333     82.483333      1           12          27.188166     21.952614      32.497222  51.15        1038349
Longitude   Float    -179.983333    180            1           14          37.08886      63.22301       35.28      23.8         1167162
```

The `xsv table` command takes any CSV data and formats it into aligned columns
using [elastic tabstops](https://github.com/BurntSushi/tabwriter). You'll
notice that it even gets alignment right with respect to Unicode characters.

So, this command takes about 12 seconds to run on my machine, but we can speed
it up by creating an index and re-running the command:

```bash
$ xsv index worldcitiespop.csv
$ xsv stats worldcitiespop.csv --everything | xsv table
...
```

Which cuts it down to about 8 seconds on my machine. (And creating the index
takes less than 2 seconds.)

Notably, the same type of "statistics" command in another
[CSV command line toolkit](https://csvkit.readthedocs.io/)
takes about 2 minutes to produce similar statistics on the same data set.

Creating an index gives us more than just faster statistics gathering. It also
makes slice operations extremely fast because *only the sliced portion* has to
be parsed. For example, let's say you wanted to grab the last 10 records:

```bash
$ xsv count worldcitiespop.csv
3173958
$ xsv slice worldcitiespop.csv -s 3173948 | xsv table
Country  City               AccentCity         Region  Population  Latitude     Longitude
zw       zibalonkwe         Zibalonkwe         06                  -19.8333333  27.4666667
zw       zibunkululu        Zibunkululu        06                  -19.6666667  27.6166667
zw       ziga               Ziga               06                  -19.2166667  27.4833333
zw       zikamanas village  Zikamanas Village  00                  -18.2166667  27.95
zw       zimbabwe           Zimbabwe           07                  -20.2666667  30.9166667
zw       zimre park         Zimre Park         04                  -17.8661111  31.2136111
zw       ziyakamanas        Ziyakamanas        00                  -18.2166667  27.95
zw       zizalisari         Zizalisari         04                  -17.7588889  31.0105556
zw       zuzumba            Zuzumba            06                  -20.0333333  27.9333333
zw       zvishavane         Zvishavane         07      79876       -20.3333333  30.0333333
```

These commands are *instantaneous* because they run in time and memory
proportional to the size of the slice (which means they will scale to
arbitrarily large CSV data).

Switching gears a little bit, you might not always want to see every column in
the CSV data. In this case, maybe we only care about the country, city and
population. So let's take a look at 10 random rows:

```bash
$ xsv select Country,AccentCity,Population worldcitiespop.csv \
  | xsv sample 10 \
  | xsv table
Country  AccentCity       Population
cn       Guankoushang
za       Klipdrift
ma       Ouled Hammou
fr       Les Gravues
la       Ban Phadèng
de       Lüdenscheid      80045
qa       Umm ash Shubrum
bd       Panditgoan
us       Appleton
ua       Lukashenkivske
```

Whoops! It seems some cities don't have population counts. How pervasive is
that?

```bash
$ xsv frequency worldcitiespop.csv --limit 5
field,value,count
Country,cn,238985
Country,ru,215938
Country,id,176546
Country,us,141989
Country,ir,123872
City,san jose,328
City,san antonio,320
City,santa rosa,296
City,santa cruz,282
City,san juan,255
AccentCity,San Antonio,317
AccentCity,Santa Rosa,296
AccentCity,Santa Cruz,281
AccentCity,San Juan,254
AccentCity,San Miguel,254
Region,04,159916
Region,02,142158
Region,07,126867
Region,03,122161
Region,05,118441
Population,(NULL),3125978
Population,2310,12
Population,3097,11
Population,983,11
Population,2684,11
Latitude,51.15,777
Latitude,51.083333,772
Latitude,50.933333,769
Latitude,51.116667,769
Latitude,51.133333,767
Longitude,23.8,484
Longitude,23.2,477
Longitude,23.05,476
Longitude,25.3,474
Longitude,23.1,459
```

(The `xsv frequency` command builds a frequency table for each column in the
CSV data. This one only took 5 seconds.)

So it seems that most cities do not have a population count associated with
them at all. No matter—we can adjust our previous command so that it only
shows rows with a population count:

```bash
$ xsv search -s Population '[0-9]' worldcitiespop.csv \
  | xsv select Country,AccentCity,Population \
  | xsv sample 10 \
  | xsv table
Country  AccentCity       Population
es       Barañáin         22264
es       Puerto Real      36946
at       Moosburg         4602
hu       Hejobaba         1949
ru       Polyarnyye Zori  15092
gr       Kandíla          1245
is       Ólafsvík         992
hu       Decs             4210
bg       Sliven           94252
gb       Leatherhead      43544
```

Erk. Which country is `at`? No clue, but the Data Science Toolkit has a CSV
file called `countrynames.csv`. Let's grab it and do a join so we can see which
countries these are:

```bash
curl -LO https://gist.githubusercontent.com/anonymous/063cb470e56e64e98cf1/raw/98e2589b801f6ca3ff900b01a87fbb7452eb35c7/countrynames.csv
$ xsv headers countrynames.csv
1   Abbrev
2   Country
$ xsv join --no-case  Country sample.csv Abbrev countrynames.csv | xsv table
Country  AccentCity       Population  Abbrev  Country
es       Barañáin         22264       ES      Spain
es       Puerto Real      36946       ES      Spain
at       Moosburg         4602        AT      Austria
hu       Hejobaba         1949        HU      Hungary
ru       Polyarnyye Zori  15092       RU      Russian Federation | Russia
gr       Kandíla          1245        GR      Greece
is       Ólafsvík         992         IS      Iceland
hu       Decs             4210        HU      Hungary
bg       Sliven           94252       BG      Bulgaria
gb       Leatherhead      43544       GB      Great Britain | UK | England | Scotland | Wales | Northern Ireland | United Kingdom
```

Whoops, now we have two columns called `Country` and an `Abbrev` column that we
no longer need. This is easy to fix by re-ordering columns with the `xsv
select` command:

```bash
$ xsv join --no-case  Country sample.csv Abbrev countrynames.csv \
  | xsv select 'Country[1],AccentCity,Population' \
  | xsv table
Country                                                                              AccentCity       Population
Spain                                                                                Barañáin         22264
Spain                                                                                Puerto Real      36946
Austria                                                                              Moosburg         4602
Hungary                                                                              Hejobaba         1949
Russian Federation | Russia                                                          Polyarnyye Zori  15092
Greece                                                                               Kandíla          1245
Iceland                                                                              Ólafsvík         992
Hungary                                                                              Decs             4210
Bulgaria                                                                             Sliven           94252
Great Britain | UK | England | Scotland | Wales | Northern Ireland | United Kingdom  Leatherhead      43544
```

Perhaps we can do this with the original CSV data? Indeed we can—because
joins in `xsv` are fast.

```bash
$ xsv join --no-case Abbrev countrynames.csv Country worldcitiespop.csv \
  | xsv select '!Abbrev,Country[1]' \
  > worldcitiespop_countrynames.csv
$ xsv sample 10 worldcitiespop_countrynames.csv | xsv table
Country                      City                   AccentCity             Region  Population  Latitude    Longitude
Sri Lanka                    miriswatte             Miriswatte             36                  7.2333333   79.9
Romania                      livezile               Livezile               26      1985        44.512222   22.863333
Indonesia                    tawainalu              Tawainalu              22                  -4.0225     121.9273
Russian Federation | Russia  otar                   Otar                   45                  56.975278   48.305278
France                       le breuil-bois robert  le Breuil-Bois Robert  A8                  48.945567   1.717026
France                       lissac                 Lissac                 B1                  45.103094   1.464927
Albania                      lumalasi               Lumalasi               46                  40.6586111  20.7363889
China                        motzushih              Motzushih              11                  27.65       111.966667
Russian Federation | Russia  svakino                Svakino                69                  55.60211    34.559785
Romania                      tirgu pancesti         Tirgu Pancesti         38                  46.216667   27.1
```

The `!Abbrev,Country[1]` syntax means, "remove the `Abbrev` column and remove
the second occurrence of the `Country` column." Since we joined with
`countrynames.csv` first, the first `Country` name (fully expanded) is now
included in the CSV data.

This `xsv join` command takes about 7 seconds on my machine. The performance
comes from constructing a very simple hash index of one of the CSV data files
given. The `join` command does an inner join by default, but it also has left,
right and full outer join support too.


### Installation

Binaries for Windows, Linux and macOS are available [from Github](https://github.com/BurntSushi/xsv/releases/latest).

If you're a **macOS Homebrew** user, then you can install xsv
from homebrew-core:

```
$ brew install xsv
```

If you're a **macOS MacPorts** user, then you can install xsv
from the [official ports](https://www.macports.org/ports.php?by=name&substr=xsv):

```
$ sudo port install xsv
```

If you're a **Nix/NixOS** user, you can install xsv from nixpkgs:

```
$ nix-env -i xsv
```

Alternatively, you can compile from source by
[installing Cargo](https://crates.io/install)
([Rust's](https://www.rust-lang.org/) package manager)
and installing `xsv` using Cargo:

```bash
cargo install xsv
```

Compiling from this repository also works similarly:

```bash
git clone git://github.com/BurntSushi/xsv
cd xsv
cargo build --release
```

Compilation will probably take a few minutes depending on your machine. The
binary will end up in `./target/release/xsv`.


### Benchmarks

I've compiled some [very rough
benchmarks](https://github.com/BurntSushi/xsv/blob/master/BENCHMARKS.md) of
various `xsv` commands.


### Motivation

Here are several valid criticisms of this project:

1. You shouldn't be working with CSV data because CSV is a terrible format.
2. If your data is gigabytes in size, then CSV is the wrong storage type.
3. Various SQL databases provide all of the operations available in `xsv` with
   more sophisticated indexing support. And the performance is a zillion times
   better.

I'm sure there are more criticisms, but the impetus for this project was a 40GB
CSV file that was handed to me. I was tasked with figuring out the shape of the
data inside of it and coming up with a way to integrate it into our existing
system. It was then that I realized that every single CSV tool I knew about was
woefully inadequate. They were just too slow or didn't provide enough
flexibility. (Another project I had comprised of a few dozen CSV files. They
were smaller than 40GB, but they were each supposed to represent the same kind
of data. But they all had different column and unintuitive column names. Useful
CSV inspection tools were critical here—and they had to be reasonably fast.)

The key ingredients for helping me with my task were indexing, random sampling,
searching, slicing and selecting columns. All of these things made dealing with
40GB of CSV data a bit more manageable (or dozens of CSV files).

Getting handed a large CSV file *once* was enough to launch me on this quest.
From conversations I've had with others, CSV data files this large don't seem
to be a rare event. Therefore, I believe there is room for a tool that has a
hope of dealing with data that large.


### Naming collision

This project is unrelated to another similar project with the same name:
https://mj.ucw.cz/sw/xsv/


================================================
FILE: UNLICENSE
================================================
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <http://unlicense.org/>


================================================
FILE: appveyor.yml
================================================

# Inspired from https://github.com/habitat-sh/habitat/blob/master/appveyor.yml
cache:
  - c:\cargo\registry
  - c:\cargo\git
  - c:\projects\ripgrep\target

init:
  - mkdir c:\cargo
  - mkdir c:\rustup
  - SET PATH=c:\cargo\bin;%PATH%

environment:
  CARGO_HOME: "c:\\cargo"
  RUSTUP_HOME: "c:\\rustup"
  CARGO_TARGET_DIR: "c:\\projects\\xsv\\target"
  global:
    PROJECT_NAME: xsv
    RUST_BACKTRACE: full
  matrix:
    # Stable channel
    - TARGET: i686-pc-windows-gnu
      CHANNEL: stable
    - TARGET: i686-pc-windows-msvc
      CHANNEL: stable
    - TARGET: x86_64-pc-windows-gnu
      CHANNEL: stable
    - TARGET: x86_64-pc-windows-msvc
      CHANNEL: stable

matrix:
  fast_finish: true

# Install Rust and Cargo
# (Based on from https://github.com/rust-lang/libc/blob/master/appveyor.yml)
install:
  - curl -sSf -o rustup-init.exe https://win.rustup.rs/
  - rustup-init.exe -y --default-host %TARGET% --no-modify-path
  - if defined MSYS2_BITS set PATH=%PATH%;C:\msys64\mingw%MSYS2_BITS%\bin
  - rustc -V
  - cargo -V

# ???
build: false

# Equivalent to Travis' `script` phase
# TODO modify this phase as you see fit
test_script:
  - cargo test --verbose

before_deploy:
  # Generate artifacts for release
  - cargo build --release
  - mkdir staging
  # TODO update this part to copy the artifacts that make sense for your project
  - copy target\release\xsv.exe staging
  - cd staging
    # release zipfile will look like 'rust-everywhere-v1.2.3-x86_64-pc-windows-msvc'
  - 7z a ../%PROJECT_NAME%-%APPVEYOR_REPO_TAG_NAME%-%TARGET%.zip *
  - appveyor PushArtifact ../%PROJECT_NAME%-%APPVEYOR_REPO_TAG_NAME%-%TARGET%.zip

deploy:
  description: 'Windows release'
  # All the zipped artifacts will be deployed
  artifact: /.*\.zip/
  auth_token:
    secure: vv4vBCEosGlyQjaEC1+kraP2P6O4CQSa+Tw50oHWFTGcmuXxaWS0/yEXbxsIRLpw
  provider: GitHub
  # deploy when a new tag is pushed and only on the stable channel
  on:
    # channel to use to produce the release artifacts
    # NOTE make sure you only release *once* per target
    # TODO you may want to pick a different channel
    CHANNEL: stable
    appveyor_repo_tag: true

branches:
  only:
    - appveyor
    - /\d+\.\d+\.\d+/
  except:
    - master


================================================
FILE: ci/before_deploy.sh
================================================
# `before_deploy` phase: here we package the build artifacts

set -ex

. $(dirname $0)/utils.sh

# Generate artifacts for release
mk_artifacts() {
    cargo build --target $TARGET --release
}

mk_tarball() {
    # create a "staging" directory
    local td=$(mktempd)
    local out_dir=$(pwd)

    # TODO update this part to copy the artifacts that make sense for your project
    # NOTE All Cargo build artifacts will be under the 'target/$TARGET/{debug,release}'
    cp target/$TARGET/release/xsv $td

    pushd $td

    # release tarball will look like 'rust-everywhere-v1.2.3-x86_64-unknown-linux-gnu.tar.gz'
    tar czf $out_dir/${PROJECT_NAME}-${TRAVIS_TAG}-${TARGET}.tar.gz *

    popd
    rm -r $td
}

main() {
    mk_artifacts
    mk_tarball
}

main


================================================
FILE: ci/install.sh
================================================
# `install` phase: install stuff needed for the `script` phase

set -ex

. $(dirname $0)/utils.sh

install_c_toolchain() {
    case $TARGET in
        aarch64-unknown-linux-gnu)
            sudo apt-get install -y --no-install-recommends \
                 gcc-aarch64-linux-gnu libc6-arm64-cross libc6-dev-arm64-cross
            ;;
        *)
            # For other targets, this is handled by addons.apt.packages in .travis.yml
            ;;
    esac
}

install_rustup() {
    curl https://sh.rustup.rs -sSf \
      | sh -s -- -y --default-toolchain=$TRAVIS_RUST_VERSION

    rustc -V
    cargo -V
}

install_standard_crates() {
    if [ $(host) != "$TARGET" ]; then
        rustup target add $TARGET
    fi
}

configure_cargo() {
    local prefix=$(gcc_prefix)

    if [ ! -z $prefix ]; then
        # information about the cross compiler
        ${prefix}gcc -v

        # tell cargo which linker to use for cross compilation
        mkdir -p .cargo
        cat >>.cargo/config <<EOF
[target.$TARGET]
linker = "${prefix}gcc"
EOF
    fi
}

main() {
    install_c_toolchain
    install_rustup
    install_standard_crates
    configure_cargo

    # TODO if you need to install extra stuff add it here
}

main


================================================
FILE: ci/script.sh
================================================
# `script` phase: you usually build, test and generate docs in this phase

set -ex

. $(dirname $0)/utils.sh

# NOTE Workaround for rust-lang/rust#31907 - disable doc tests when cross compiling
# This has been fixed in the nightly channel but it would take a while to reach the other channels
disable_cross_doctests() {
    if [ $(host) != "$TARGET" ] && [ "$TRAVIS_RUST_VERSION" = "stable" ]; then
        if [ "$TRAVIS_OS_NAME" = "osx" ]; then
            brew install gnu-sed --default-names
        fi

        find src -name '*.rs' -type f | xargs sed -i -e 's:\(//.\s*```\):\1 ignore,:g'
    fi
}

# TODO modify this function as you see fit
# PROTIP Always pass `--target $TARGET` to cargo commands, this makes cargo output build artifacts
# to target/$TARGET/{debug,release} which can reduce the number of needed conditionals in the
# `before_deploy`/packaging phase
run_test_suite() {
    case $TARGET in
        # configure emulation for transparent execution of foreign binaries
        aarch64-unknown-linux-gnu)
            export QEMU_LD_PREFIX=/usr/aarch64-linux-gnu
            ;;
        arm*-unknown-linux-gnueabihf)
            export QEMU_LD_PREFIX=/usr/arm-linux-gnueabihf
            ;;
        *)
            ;;
    esac

    if [ ! -z "$QEMU_LD_PREFIX" ]; then
        # Run tests on a single thread when using QEMU user emulation
        export RUST_TEST_THREADS=1
    fi

    cargo build --target $TARGET --verbose
    cargo test --target $TARGET

    # sanity check the file type
    file target/$TARGET/debug/xsv
}

main() {
    disable_cross_doctests
    run_test_suite
}

main


================================================
FILE: ci/utils.sh
================================================
mktempd() {
    echo $(mktemp -d 2>/dev/null || mktemp -d -t tmp)
}

host() {
    case "$TRAVIS_OS_NAME" in
        linux)
            echo x86_64-unknown-linux-gnu
            ;;
        osx)
            echo x86_64-apple-darwin
            ;;
    esac
}

gcc_prefix() {
    case "$TARGET" in
        aarch64-unknown-linux-gnu)
            echo aarch64-linux-gnu-
            ;;
        arm*-gnueabihf)
            echo arm-linux-gnueabihf-
            ;;
        *)
            return
            ;;
    esac
}

dobin() {
    [ -z $MAKE_DEB ] && die 'dobin: $MAKE_DEB not set'
    [ $# -lt 1 ] && die "dobin: at least one argument needed"

    local f prefix=$(gcc_prefix)
    for f in "$@"; do
        install -m0755 $f $dtd/debian/usr/bin/
        ${prefix}strip -s $dtd/debian/usr/bin/$(basename $f)
    done
}

architecture() {
    case $1 in
        x86_64-unknown-linux-gnu|x86_64-unknown-linux-musl)
            echo amd64
            ;;
        i686-unknown-linux-gnu|i686-unknown-linux-musl)
            echo i386
            ;;
        arm*-unknown-linux-gnueabihf)
            echo armhf
            ;;
        *)
            die "architecture: unexpected target $TARGET"
            ;;
    esac
}


================================================
FILE: scripts/benchmark-basic
================================================
#!/bin/sh

# This script does some very basic benchmarks with 'xsv' on a city population
# data set (which is a strict subset of the `worldcitiespop` data set). If it
# doesn't exist on your system, it will be downloaded to /tmp for you.
#
# These aren't meant to be overly rigorous, but they should be enough to catch
# significant regressions.
#
# Make sure you're using an `xsv` generated by `cargo build --release`.

set -e

pat="$1"
data=/tmp/worldcitiespop_mil.csv
data_idx=/tmp/worldcitiespop_mil.csv.idx
if [ ! -r "$data" ]; then
  curl -sS https://burntsushi.net/stuff/worldcitiespop_mil.csv > "$data"
fi
data_size=$(stat --format '%s' "$data")

function real_seconds {
  cmd=$(echo $@ "> /dev/null 2>&1")
  t=$(
    $(which time) -p sh -c "$cmd" 2>&1 \
      | grep '^real' \
      | awk '{print $2}')
  if [ $(echo "$t < 0.01" | bc) = 1 ]; then
    t=0.01
  fi
  echo $t
}

function benchmark {
  rm -f "$data_idx"
  t1=$(real_seconds "$@")
  rm -f "$data_idx"
  t2=$(real_seconds "$@")
  rm -f "$data_idx"
  t3=$(real_seconds "$@")
  echo "scale=2; ($t1 + $t2 + $t3) / 3" | bc
}

function benchmark_with_index {
  rm -f "$data_idx"
  xsv index "$data"
  t1=$(real_seconds "$@")
  t2=$(real_seconds "$@")
  t3=$(real_seconds "$@")
  rm -f "$data_idx"
  echo "scale=2; ($t1 + $t2 + $t3) / 3" | bc
}

function run {
  index=
  while true; do
    case "$1" in
      --index) index="yes" && shift ;;
      *) break ;;
    esac
  done
  name="$1"
  shift

  if [ -z "$pat" ] || echo "$name" | grep -E -q "^$pat$"; then
    if [ -z "$index" ]; then
      t=$(benchmark "$@")
    else
      t=$(benchmark_with_index "$@")
    fi
    mb_per=$(echo "scale=2; ($data_size / $t) / 2^20" | bc)
    printf "%s\t%0.02f seconds\t%s MB/sec\n" $name $t $mb_per
  fi
}

run count xsv count "$data"
run flatten xsv flatten "$data"
run flatten_condensed xsv flatten "$data" --condense 50
run frequency xsv frequency "$data"
run index xsv index "$data"
run sample_10 xsv sample 10 "$data"
run sample_1000 xsv sample 1000 "$data"
run sample_100000 xsv sample 100000 "$data"
run search xsv search -s Country "'(?i)us'" "$data"
run select xsv select Country "$data"
run sort xsv sort -s AccentCity "$data"
run slice_one_middle xsv slice -i 500000 "$data"
run --index slice_one_middle_index xsv slice -i 500000 "$data"
run stats xsv stats "$data"
run --index stats_index xsv stats "$data"
run stats_everything xsv stats "$data" --everything
run --index stats_everything_index xsv stats "$data" --everything


================================================
FILE: scripts/build-release
================================================
#!/bin/sh

version=$(git describe --abbrev=0 --tags)
name="xsv-$version-x86_64-unknown-linux-gnu"

mkdir -p ./builds/
cargo build --release
rm -rf "/tmp/$name"
mkdir "/tmp/$name"
cp ./target/release/xsv "/tmp/$name/"
cp ./README.md "/tmp/$name/"
cp ./UNLICENSE "/tmp/$name/"
tar zcf "./builds/$name.tar.gz" -C /tmp $name


================================================
FILE: scripts/github-release
================================================
#!/bin/sh

version=$(git describe --abbrev=0 --tags)
name="xsv-$version-x86_64-unknown-linux-gnu"

github-release release --user BurntSushi --repo xsv --tag $version \
  --name "xsv-$version" --pre-release


================================================
FILE: scripts/github-upload
================================================
#!/bin/sh

version=$(git describe --abbrev=0 --tags)
name="xsv-$version-x86_64-unknown-linux-gnu"

./scripts/build-release
github-release upload --user BurntSushi --repo xsv --tag $version \
  --name "$name.tar.gz" \
  --file "./builds/$name.tar.gz"


================================================
FILE: session.vim
================================================
au BufWritePost *.rs silent!make ctags > /dev/null 2>&1


================================================
FILE: src/cmd/cat.rs
================================================
use csv;

use CliResult;
use config::{Config, Delimiter};
use util;

static USAGE: &'static str = "
Concatenates CSV data by column or by row.

When concatenating by column, the columns will be written in the same order as
the inputs given. The number of rows in the result is always equivalent to to
the minimum number of rows across all given CSV data. (This behavior can be
reversed with the '--pad' flag.)

When concatenating by row, all CSV data must have the same number of columns.
If you need to rearrange the columns or fix the lengths of records, use the
'select' or 'fixlengths' commands. Also, only the headers of the *first* CSV
data given are used. Headers in subsequent inputs are ignored. (This behavior
can be disabled with --no-headers.)

Usage:
    xsv cat rows    [options] [<input>...]
    xsv cat columns [options] [<input>...]
    xsv cat --help

cat options:
    -p, --pad              When concatenating columns, this flag will cause
                           all records to appear. It will pad each row if
                           other CSV data isn't long enough.

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -n, --no-headers       When set, the first row will NOT be interpreted
                           as column names. Note that this has no effect when
                           concatenating columns.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    cmd_rows: bool,
    cmd_columns: bool,
    arg_input: Vec<String>,
    flag_pad: bool,
    flag_output: Option<String>,
    flag_no_headers: bool,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    if args.cmd_rows {
        args.cat_rows()
    } else if args.cmd_columns {
        args.cat_columns()
    } else {
        unreachable!();
    }
}

impl Args {
    fn configs(&self) -> CliResult<Vec<Config>> {
        util::many_configs(&*self.arg_input,
                           self.flag_delimiter,
                           self.flag_no_headers)
             .map_err(From::from)
    }

    fn cat_rows(&self) -> CliResult<()> {
        let mut row = csv::ByteRecord::new();
        let mut wtr = Config::new(&self.flag_output).writer()?;
        for (i, conf) in self.configs()?.into_iter().enumerate() {
            let mut rdr = conf.reader()?;
            if i == 0 {
                conf.write_headers(&mut rdr, &mut wtr)?;
            }
            while rdr.read_byte_record(&mut row)? {
                wtr.write_byte_record(&row)?;
            }
        }
        wtr.flush().map_err(From::from)
    }

    fn cat_columns(&self) -> CliResult<()> {
        let mut wtr = Config::new(&self.flag_output).writer()?;
        let mut rdrs = self.configs()?
            .into_iter()
            .map(|conf| conf.no_headers(true).reader())
            .collect::<Result<Vec<_>, _>>()?;

        // Find the lengths of each record. If a length varies, then an error
        // will occur so we can rely on the first length being the correct one.
        let mut lengths = vec![];
        for rdr in &mut rdrs {
            lengths.push(rdr.byte_headers()?.len());
        }

        let mut iters = rdrs.iter_mut()
                            .map(|rdr| rdr.byte_records())
                            .collect::<Vec<_>>();
        'OUTER: loop {
            let mut record = csv::ByteRecord::new();
            let mut num_done = 0;
            for (iter, &len) in iters.iter_mut().zip(lengths.iter()) {
                match iter.next() {
                    None => {
                        num_done += 1;
                        if self.flag_pad {
                            for _ in 0..len {
                                record.push_field(b"");
                            }
                        } else {
                            break 'OUTER;
                        }
                    }
                    Some(Err(err)) => return fail!(err),
                    Some(Ok(next)) => record.extend(&next),
                }
            }
            // Only needed when `--pad` is set.
            // When not set, the OUTER loop breaks when the shortest iterator
            // is exhausted.
            if num_done >= iters.len() {
                break 'OUTER;
            }
            wtr.write_byte_record(&record)?;
        }
        wtr.flush().map_err(From::from)
    }
}


================================================
FILE: src/cmd/count.rs
================================================
use csv;

use CliResult;
use config::{Delimiter, Config};
use util;

static USAGE: &'static str = "
Prints a count of the number of records in the CSV data.

Note that the count will not include the header row (unless --no-headers is
given).

Usage:
    xsv count [options] [<input>]

Common options:
    -h, --help             Display this message
    -n, --no-headers       When set, the first row will not be included in
                           the count.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    arg_input: Option<String>,
    flag_no_headers: bool,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    let conf = Config::new(&args.arg_input)
        .delimiter(args.flag_delimiter)
        .no_headers(args.flag_no_headers);

    let count =
        match conf.indexed()? {
            Some(idx) => idx.count(),
            None => {
                let mut rdr = conf.reader()?;
                let mut count = 0u64;
                let mut record = csv::ByteRecord::new();
                while rdr.read_byte_record(&mut record)? {
                    count += 1;
                }
                count
            }
        };
    Ok(println!("{}", count))
}


================================================
FILE: src/cmd/fixlengths.rs
================================================
use std::cmp;

use csv;

use CliResult;
use config::{Config, Delimiter};
use util;

static USAGE: &'static str = "
Transforms CSV data so that all records have the same length. The length is
the length of the longest record in the data (not counting trailing empty fields,
but at least 1). Records with smaller lengths are padded with empty fields.

This requires two complete scans of the CSV data: one for determining the
record size and one for the actual transform. Because of this, the input
given must be a file and not stdin.

Alternatively, if --length is set, then all records are forced to that length.
This requires a single pass and can be done with stdin.

Usage:
    xsv fixlengths [options] [<input>]

fixlengths options:
    -l, --length <arg>     Forcefully set the length of each record. If a
                           record is not the size given, then it is truncated
                           or expanded as appropriate.

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    arg_input: Option<String>,
    flag_length: Option<usize>,
    flag_output: Option<String>,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    let config = Config::new(&args.arg_input)
        .delimiter(args.flag_delimiter)
        .no_headers(true)
        .flexible(true);
    let length = match args.flag_length {
        Some(length) => {
            if length == 0 {
                return fail!("Length must be greater than 0.");
            }
            length
        }
        None => {
            if config.is_std() {
                return fail!("<stdin> cannot be used in this command. \
                              Please specify a file path.");
            }
            let mut maxlen = 0usize;
            let mut rdr = config.reader()?;
            let mut record = csv::ByteRecord::new();
            while rdr.read_byte_record(&mut record)? {
                let mut index = 0;
                let mut nonempty_count = 0;
                for field in &record {
                    index += 1;
                    if index == 1 || !field.is_empty() {
                        nonempty_count = index;
                    }
                }
                maxlen = cmp::max(maxlen, nonempty_count);
            }
            maxlen
        }
    };

    let mut rdr = config.reader()?;
    let mut wtr = Config::new(&args.flag_output).writer()?;
    for r in rdr.byte_records() {
        let mut r = r?;
        if length >= r.len() {
            for _ in r.len()..length {
                r.push_field(b"");
            }
        } else {
            r.truncate(length);
        }
        wtr.write_byte_record(&r)?;
    }
    wtr.flush()?;
    Ok(())
}


================================================
FILE: src/cmd/flatten.rs
================================================
use std::borrow::Cow;
use std::io::{self, Write};

use tabwriter::TabWriter;

use CliResult;
use config::{Config, Delimiter};
use util;

static USAGE: &'static str = "
Prints flattened records such that fields are labeled separated by a new line.
This mode is particularly useful for viewing one record at a time. Each
record is separated by a special '#' character (on a line by itself), which
can be changed with the --separator flag.

There is also a condensed view (-c or --condense) that will shorten the
contents of each field to provide a summary view.

Usage:
    xsv flatten [options] [<input>]

flatten options:
    -c, --condense <arg>  Limits the length of each field to the value
                           specified. If the field is UTF-8 encoded, then
                           <arg> refers to the number of code points.
                           Otherwise, it refers to the number of bytes.
    -s, --separator <arg>  A string of characters to write after each record.
                           When non-empty, a new line is automatically
                           appended to the separator.
                           [default: #]

Common options:
    -h, --help             Display this message
    -n, --no-headers       When set, the first row will not be interpreted
                           as headers. When set, the name of each field
                           will be its index.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    arg_input: Option<String>,
    flag_condense: Option<usize>,
    flag_separator: String,
    flag_no_headers: bool,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    let rconfig = Config::new(&args.arg_input)
        .delimiter(args.flag_delimiter)
        .no_headers(args.flag_no_headers);
    let mut rdr = rconfig.reader()?;
    let headers = rdr.byte_headers()?.clone();

    let mut wtr = TabWriter::new(io::stdout());
    let mut first = true;
    for r in rdr.byte_records() {
        if !first && !args.flag_separator.is_empty() {
            writeln!(&mut wtr, "{}", args.flag_separator)?;
        }
        first = false;
        let r = r?;
        for (i, (header, field)) in headers.iter().zip(&r).enumerate() {
            if rconfig.no_headers {
                write!(&mut wtr, "{}", i)?;
            } else {
                wtr.write_all(&header)?;
            }
            wtr.write_all(b"\t")?;
            wtr.write_all(&*util::condense(
                Cow::Borrowed(&*field), args.flag_condense))?;
            wtr.write_all(b"\n")?;
        }
    }
    wtr.flush()?;
    Ok(())
}


================================================
FILE: src/cmd/fmt.rs
================================================
use csv;

use CliResult;
use config::{Config, Delimiter};
use util;

static USAGE: &'static str = "
Formats CSV data with a custom delimiter or CRLF line endings.

Generally, all commands in xsv output CSV data in a default format, which is
the same as the default format for reading CSV data. This makes it easy to
pipe multiple xsv commands together. However, you may want the final result to
have a specific delimiter or record separator, and this is where 'xsv fmt' is
useful.

Usage:
    xsv fmt [options] [<input>]

fmt options:
    -t, --out-delimiter <arg>  The field delimiter for writing CSV data.
                               [default: ,]
    --crlf                     Use '\\r\\n' line endings in the output.
    --ascii                    Use ASCII field and record separators.
    --quote <arg>              The quote character to use. [default: \"]
    --quote-always             Put quotes around every value.
    --escape <arg>             The escape character to use. When not specified,
                               quotes are escaped by doubling them.

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    arg_input: Option<String>,
    flag_out_delimiter: Option<Delimiter>,
    flag_crlf: bool,
    flag_ascii: bool,
    flag_output: Option<String>,
    flag_delimiter: Option<Delimiter>,
    flag_quote: Delimiter,
    flag_quote_always: bool,
    flag_escape: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;

    let rconfig = Config::new(&args.arg_input)
        .delimiter(args.flag_delimiter)
        .no_headers(true);
    let mut wconfig = Config::new(&args.flag_output)
        .delimiter(args.flag_out_delimiter)
        .crlf(args.flag_crlf);

    if args.flag_ascii {
        wconfig = wconfig
            .delimiter(Some(Delimiter(b'\x1f')))
            .terminator(csv::Terminator::Any(b'\x1e'));
    }
    if args.flag_quote_always {
        wconfig = wconfig.quote_style(csv::QuoteStyle::Always);
    }
    if let Some(escape) = args.flag_escape {
        wconfig = wconfig.escape(Some(escape.as_byte())).double_quote(false);
    }
    wconfig = wconfig.quote(args.flag_quote.as_byte());


    let mut rdr = rconfig.reader()?;
    let mut wtr = wconfig.writer()?;
    let mut r = csv::ByteRecord::new();
    while rdr.read_byte_record(&mut r)? {
        wtr.write_byte_record(&r)?;
    }
    wtr.flush()?;
    Ok(())
}


================================================
FILE: src/cmd/frequency.rs
================================================
use std::fs;
use std::io;

use channel;
use csv;
use stats::{Frequencies, merge_all};
use threadpool::ThreadPool;

use CliResult;
use config::{Config, Delimiter};
use index::Indexed;
use select::{SelectColumns, Selection};
use util;

static USAGE: &'static str = "
Compute a frequency table on CSV data.

The frequency table is formatted as CSV data:

    field,value,count

By default, there is a row for the N most frequent values for each field in the
data. The order and number of values can be tweaked with --asc and --limit,
respectively.

Since this computes an exact frequency table, memory proportional to the
cardinality of each column is required.

Usage:
    xsv frequency [options] [<input>]

frequency options:
    -s, --select <arg>     Select a subset of columns to compute frequencies
                           for. See 'xsv select --help' for the format
                           details. This is provided here because piping 'xsv
                           select' into 'xsv frequency' will disable the use
                           of indexing.
    -l, --limit <arg>      Limit the frequency table to the N most common
                           items. Set to '0' to disable a limit.
                           [default: 10]
    -a, --asc              Sort the frequency tables in ascending order by
                           count. The default is descending order.
    --no-nulls             Don't include NULLs in the frequency table.
    -j, --jobs <arg>       The number of jobs to run in parallel.
                           This works better when the given CSV data has
                           an index already created. Note that a file handle
                           is opened for each job.
                           When set to '0', the number of jobs is set to the
                           number of CPUs detected.
                           [default: 0]

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -n, --no-headers       When set, the first row will NOT be included
                           in the frequency table. Additionally, the 'field'
                           column will be 1-based indices instead of header
                           names.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Clone, Deserialize)]
struct Args {
    arg_input: Option<String>,
    flag_select: SelectColumns,
    flag_limit: usize,
    flag_asc: bool,
    flag_no_nulls: bool,
    flag_jobs: usize,
    flag_output: Option<String>,
    flag_no_headers: bool,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    let rconfig = args.rconfig();

    let mut wtr = Config::new(&args.flag_output).writer()?;
    let (headers, tables) = match args.rconfig().indexed()? {
        Some(ref mut idx) if args.njobs() > 1 => args.parallel_ftables(idx),
        _ => args.sequential_ftables(),
    }?;

    wtr.write_record(vec!["field", "value", "count"])?;
    let head_ftables = headers.into_iter().zip(tables.into_iter());
    for (i, (header, ftab)) in head_ftables.enumerate() {
        let mut header = header.to_vec();
        if rconfig.no_headers {
            header = (i+1).to_string().into_bytes();
        }
        for (value, count) in args.counts(&ftab).into_iter() {
            let count = count.to_string();
            let row = vec![&*header, &*value, count.as_bytes()];
            wtr.write_record(row)?;
        }
    }
    Ok(())
}

type ByteString = Vec<u8>;
type Headers = csv::ByteRecord;
type FTable = Frequencies<Vec<u8>>;
type FTables = Vec<Frequencies<Vec<u8>>>;

impl Args {
    fn rconfig(&self) -> Config {
        Config::new(&self.arg_input)
            .delimiter(self.flag_delimiter)
            .no_headers(self.flag_no_headers)
            .select(self.flag_select.clone())
    }

    fn counts(&self, ftab: &FTable) -> Vec<(ByteString, u64)> {
        let mut counts = if self.flag_asc {
            ftab.least_frequent()
        } else {
            ftab.most_frequent()
        };
        if self.flag_limit > 0 {
            counts = counts.into_iter().take(self.flag_limit).collect();
        }
        counts.into_iter().map(|(bs, c)| {
            if b"" == &**bs {
                (b"(NULL)"[..].to_vec(), c)
            } else {
                (bs.clone(), c)
            }
        }).collect()
    }

    fn sequential_ftables(&self) -> CliResult<(Headers, FTables)> {
        let mut rdr = self.rconfig().reader()?;
        let (headers, sel) = self.sel_headers(&mut rdr)?;
        Ok((headers, self.ftables(&sel, rdr.byte_records())?))
    }

    fn parallel_ftables(&self, idx: &mut Indexed<fs::File, fs::File>)
                       -> CliResult<(Headers, FTables)> {
        let mut rdr = self.rconfig().reader()?;
        let (headers, sel) = self.sel_headers(&mut rdr)?;

        if idx.count() == 0 {
            return Ok((headers, vec![]));
        }

        let chunk_size = util::chunk_size(idx.count() as usize, self.njobs());
        let nchunks = util::num_of_chunks(idx.count() as usize, chunk_size);

        let pool = ThreadPool::new(self.njobs());
        let (send, recv) = channel::bounded(0);
        for i in 0..nchunks {
            let (send, args, sel) = (send.clone(), self.clone(), sel.clone());
            pool.execute(move || {
                let mut idx = args.rconfig().indexed().unwrap().unwrap();
                idx.seek((i * chunk_size) as u64).unwrap();
                let it = idx.byte_records().take(chunk_size);
                send.send(args.ftables(&sel, it).unwrap());
            });
        }
        drop(send);
        Ok((headers, merge_all(recv).unwrap()))
    }

    fn ftables<I>(&self, sel: &Selection, it: I) -> CliResult<FTables>
            where I: Iterator<Item=csv::Result<csv::ByteRecord>> {
        let null = &b""[..].to_vec();
        let nsel = sel.normal();
        let mut tabs: Vec<_> =
            (0..nsel.len()).map(|_| Frequencies::new()).collect();
        for row in it {
            let row = row?;
            for (i, field) in nsel.select(row.into_iter()).enumerate() {
                let field = trim(field.to_vec());
                if !field.is_empty() {
                    tabs[i].add(field);
                } else {
                    if !self.flag_no_nulls {
                        tabs[i].add(null.clone());
                    }
                }
            }
        }
        Ok(tabs)
    }

    fn sel_headers<R: io::Read>(&self, rdr: &mut csv::Reader<R>)
                  -> CliResult<(csv::ByteRecord, Selection)> {
        let headers = rdr.byte_headers()?;
        let sel = self.rconfig().selection(headers)?;
        Ok((sel.select(headers).map(|h| h.to_vec()).collect(), sel))
    }

    fn njobs(&self) -> usize {
        if self.flag_jobs == 0 { util::num_cpus() } else { self.flag_jobs }
    }
}

fn trim(bs: ByteString) -> ByteString {
    match String::from_utf8(bs) {
        Ok(s) => s.trim().as_bytes().to_vec(),
        Err(bs) => bs.into_bytes(),
    }
}


================================================
FILE: src/cmd/headers.rs
================================================
use std::io;

use tabwriter::TabWriter;

use CliResult;
use config::Delimiter;
use util;

static USAGE: &'static str = "
Prints the fields of the first row in the CSV data.

These names can be used in commands like 'select' to refer to columns in the
CSV data.

Note that multiple CSV files may be given to this command. This is useful with
the --intersect flag.

Usage:
    xsv headers [options] [<input>...]

headers options:
    -j, --just-names       Only show the header names (hide column index).
                           This is automatically enabled if more than one
                           input is given.
    --intersect            Shows the intersection of all headers in all of
                           the inputs given.

Common options:
    -h, --help             Display this message
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    arg_input: Vec<String>,
    flag_just_names: bool,
    flag_intersect: bool,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    let configs = util::many_configs(
        &*args.arg_input, args.flag_delimiter, true)?;

    let num_inputs = configs.len();
    let mut headers: Vec<Vec<u8>> = vec![];
    for conf in configs.into_iter() {
        let mut rdr = conf.reader()?;
        for header in rdr.byte_headers()?.iter() {
            if !args.flag_intersect
                || !headers.iter().any(|h| &**h == header)
            {
                headers.push(header.to_vec());
            }
        }
    }

    let mut wtr: Box<io::Write> =
        if args.flag_just_names {
            Box::new(io::stdout())
        } else {
            Box::new(TabWriter::new(io::stdout()))
        };
    for (i, header) in headers.into_iter().enumerate() {
        if num_inputs == 1 && !args.flag_just_names {
            write!(&mut wtr, "{}\t", i+1)?;
        }
        wtr.write_all(&header)?;
        wtr.write_all(b"\n")?;
    }
    wtr.flush()?;
    Ok(())
}


================================================
FILE: src/cmd/index.rs
================================================
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

use csv_index::RandomAccessSimple;

use CliResult;
use config::{Config, Delimiter};
use util;

static USAGE: &'static str = "
Creates an index of the given CSV data, which can make other operations like
slicing, splitting and gathering statistics much faster.

Note that this does not accept CSV data on stdin. You must give a file
path. The index is created at 'path/to/input.csv.idx'. The index will be
automatically used by commands that can benefit from it. If the original CSV
data changes after the index is made, commands that try to use it will result
in an error (you have to regenerate the index before it can be used again).

Usage:
    xsv index [options] <input>
    xsv index --help

index options:
    -o, --output <file>    Write index to <file> instead of <input>.idx.
                           Generally, this is not currently useful because
                           the only way to use an index is if it is specially
                           named <input>.idx.

Common options:
    -h, --help             Display this message
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    arg_input: String,
    flag_output: Option<String>,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;

    let pidx = match args.flag_output {
        None => util::idx_path(&Path::new(&args.arg_input)),
        Some(p) => PathBuf::from(&p),
    };

    let rconfig = Config::new(&Some(args.arg_input))
                         .delimiter(args.flag_delimiter);
    let mut rdr = rconfig.reader_file()?;
    let mut wtr = io::BufWriter::new(fs::File::create(&pidx)?);
    RandomAccessSimple::create(&mut rdr, &mut wtr)?;
    Ok(())
}


================================================
FILE: src/cmd/input.rs
================================================
use csv;

use CliResult;
use config::{Config, Delimiter};
use util;

static USAGE: &'static str = "
Read CSV data with special quoting rules.

Generally, all xsv commands support basic options like specifying the delimiter
used in CSV data. This does not cover all possible types of CSV data. For
example, some CSV files don't use '\"' for quotes or use different escaping
styles.

Usage:
    xsv input [options] [<input>]

input options:
    --quote <arg>          The quote character to use. [default: \"]
    --escape <arg>         The escape character to use. When not specified,
                           quotes are escaped by doubling them.
    --no-quoting           Disable quoting completely.

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    arg_input: Option<String>,
    flag_output: Option<String>,
    flag_delimiter: Option<Delimiter>,
    flag_quote: Delimiter,
    flag_escape: Option<Delimiter>,
    flag_no_quoting: bool,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    let mut rconfig = Config::new(&args.arg_input)
        .delimiter(args.flag_delimiter)
        .no_headers(true)
        .quote(args.flag_quote.as_byte());
    let wconfig = Config::new(&args.flag_output);

    if let Some(escape) = args.flag_escape {
        rconfig = rconfig.escape(Some(escape.as_byte())).double_quote(false);
    }
    if args.flag_no_quoting {
        rconfig = rconfig.quoting(false);
    }

    let mut rdr = rconfig.reader()?;
    let mut wtr = wconfig.writer()?;
    let mut row = csv::ByteRecord::new();
    while rdr.read_byte_record(&mut row)? {
        wtr.write_record(&row)?;
    }
    wtr.flush()?;
    Ok(())
}


================================================
FILE: src/cmd/join.rs
================================================
use std::collections::hash_map::{HashMap, Entry};
use std::fmt;
use std::fs;
use std::io;
use std::iter::repeat;
use std::str;

use byteorder::{WriteBytesExt, BigEndian};
use csv;

use CliResult;
use config::{Config, Delimiter};
use index::Indexed;
use select::{SelectColumns, Selection};
use util;

static USAGE: &'static str = "
Joins two sets of CSV data on the specified columns.

The default join operation is an 'inner' join. This corresponds to the
intersection of rows on the keys specified.

Joins are always done by ignoring leading and trailing whitespace. By default,
joins are done case sensitively, but this can be disabled with the --no-case
flag.

The columns arguments specify the columns to join for each input. Columns can
be referenced by name or index, starting at 1. Specify multiple columns by
separating them with a comma. Specify a range of columns with `-`. Both
columns1 and columns2 must specify exactly the same number of columns.
(See 'xsv select --help' for the full syntax.)

Usage:
    xsv join [options] <columns1> <input1> <columns2> <input2>
    xsv join --help

join options:
    --no-case              When set, joins are done case insensitively.
    --left                 Do a 'left outer' join. This returns all rows in
                           first CSV data set, including rows with no
                           corresponding row in the second data set. When no
                           corresponding row exists, it is padded out with
                           empty fields.
    --right                Do a 'right outer' join. This returns all rows in
                           second CSV data set, including rows with no
                           corresponding row in the first data set. When no
                           corresponding row exists, it is padded out with
                           empty fields. (This is the reverse of 'outer left'.)
    --full                 Do a 'full outer' join. This returns all rows in
                           both data sets with matching records joined. If
                           there is no match, the missing side will be padded
                           out with empty fields. (This is the combination of
                           'outer left' and 'outer right'.)
    --cross                USE WITH CAUTION.
                           This returns the cartesian product of the CSV
                           data sets given. The number of rows return is
                           equal to N * M, where N and M correspond to the
                           number of rows in the given data sets, respectively.
    --nulls                When set, joins will work on empty fields.
                           Otherwise, empty fields are completely ignored.
                           (In fact, any row that has an empty field in the
                           key specified is ignored.)

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -n, --no-headers       When set, the first row will not be interpreted
                           as headers. (i.e., They are not searched, analyzed,
                           sliced, etc.)
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

type ByteString = Vec<u8>;

#[derive(Deserialize)]
struct Args {
    arg_columns1: SelectColumns,
    arg_input1: String,
    arg_columns2: SelectColumns,
    arg_input2: String,
    flag_left: bool,
    flag_right: bool,
    flag_full: bool,
    flag_cross: bool,
    flag_output: Option<String>,
    flag_no_headers: bool,
    flag_no_case: bool,
    flag_nulls: bool,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    let mut state = args.new_io_state()?;
    match (
        args.flag_left,
        args.flag_right,
        args.flag_full,
        args.flag_cross,
    ) {
        (true, false, false, false) => {
            state.write_headers()?;
            state.outer_join(false)
        }
        (false, true, false, false) => {
            state.write_headers()?;
            state.outer_join(true)
        }
        (false, false, true, false) => {
            state.write_headers()?;
            state.full_outer_join()
        }
        (false, false, false, true) => {
            state.write_headers()?;
            state.cross_join()
        }
        (false, false, false, false) => {
            state.write_headers()?;
            state.inner_join()
        }
        _ => fail!("Please pick exactly one join operation.")
    }
}

struct IoState<R, W: io::Write> {
    wtr: csv::Writer<W>,
    rdr1: csv::Reader<R>,
    sel1: Selection,
    rdr2: csv::Reader<R>,
    sel2: Selection,
    no_headers: bool,
    casei: bool,
    nulls: bool,
}

impl<R: io::Read + io::Seek, W: io::Write> IoState<R, W> {
    fn write_headers(&mut self) -> CliResult<()> {
        if !self.no_headers {
            let mut headers = self.rdr1.byte_headers()?.clone();
            headers.extend(self.rdr2.byte_headers()?.iter());
            self.wtr.write_record(&headers)?;
        }
        Ok(())
    }

    fn inner_join(mut self) -> CliResult<()> {
        let mut scratch = csv::ByteRecord::new();
        let mut validx = ValueIndex::new(
            self.rdr2, &self.sel2, self.casei, self.nulls)?;
        for row in self.rdr1.byte_records() {
            let row = row?;
            let key = get_row_key(&self.sel1, &row, self.casei);
            match validx.values.get(&key) {
                None => continue,
                Some(rows) => {
                    for &rowi in rows.iter() {
                        validx.idx.seek(rowi as u64)?;

                        validx.idx.read_byte_record(&mut scratch)?;
                        let combined = row.iter().chain(scratch.iter());
                        self.wtr.write_record(combined)?;
                    }
                }
            }
        }
        Ok(())
    }

    fn outer_join(mut self, right: bool) -> CliResult<()> {
        if right {
            ::std::mem::swap(&mut self.rdr1, &mut self.rdr2);
            ::std::mem::swap(&mut self.sel1, &mut self.sel2);
        }

        let mut scratch = csv::ByteRecord::new();
        let (_, pad2) = self.get_padding()?;
        let mut validx = ValueIndex::new(
            self.rdr2, &self.sel2, self.casei, self.nulls)?;
        for row in self.rdr1.byte_records() {
            let row = row?;
            let key = get_row_key(&self.sel1, &row, self.casei);
            match validx.values.get(&key) {
                None => {
                    if right {
                        self.wtr.write_record(pad2.iter().chain(&row))?;
                    } else {
                        self.wtr.write_record(row.iter().chain(&pad2))?;
                    }
                }
                Some(rows) => {
                    for &rowi in rows.iter() {
                        validx.idx.seek(rowi as u64)?;
                        let row1 = row.iter();
                        validx.idx.read_byte_record(&mut scratch)?;
                        if right {
                            self.wtr.write_record(scratch.iter().chain(row1))?;
                        } else {
                            self.wtr.write_record(row1.chain(&scratch))?;
                        }
                    }
                }
            }
        }
        Ok(())
    }

    fn full_outer_join(mut self) -> CliResult<()> {
        let mut scratch = csv::ByteRecord::new();
        let (pad1, pad2) = self.get_padding()?;
        let mut validx = ValueIndex::new(
            self.rdr2, &self.sel2, self.casei, self.nulls)?;

        // Keep track of which rows we've written from rdr2.
        let mut rdr2_written: Vec<_> =
            repeat(false).take(validx.num_rows).collect();
        for row1 in self.rdr1.byte_records() {
            let row1 = row1?;
            let key = get_row_key(&self.sel1, &row1, self.casei);
            match validx.values.get(&key) {
                None => {
                    self.wtr.write_record(row1.iter().chain(&pad2))?;
                }
                Some(rows) => {
                    for &rowi in rows.iter() {
                        rdr2_written[rowi] = true;

                        validx.idx.seek(rowi as u64)?;
                        validx.idx.read_byte_record(&mut scratch)?;
                        self.wtr.write_record(row1.iter().chain(&scratch))?;
                    }
                }
            }
        }

        // OK, now write any row from rdr2 that didn't get joined with a row
        // from rdr1.
        for (i, &written) in rdr2_written.iter().enumerate() {
            if !written {
                validx.idx.seek(i as u64)?;
                validx.idx.read_byte_record(&mut scratch)?;
                self.wtr.write_record(pad1.iter().chain(&scratch))?;
            }
        }
        Ok(())
    }

    fn cross_join(mut self) -> CliResult<()> {
        let mut pos = csv::Position::new();
        pos.set_byte(0);
        let mut row2 = csv::ByteRecord::new();
        for row1 in self.rdr1.byte_records() {
            let row1 = row1?;
            self.rdr2.seek(pos.clone())?;
            if self.rdr2.has_headers() {
                // Read and skip the header row, since CSV readers disable
                // the header skipping logic after being seeked.
                self.rdr2.read_byte_record(&mut row2)?;
            }
            while self.rdr2.read_byte_record(&mut row2)? {
                self.wtr.write_record(row1.iter().chain(&row2))?;
            }
        }
        Ok(())
    }

    fn get_padding(
        &mut self,
    ) -> CliResult<(csv::ByteRecord, csv::ByteRecord)> {
        let len1 = self.rdr1.byte_headers()?.len();
        let len2 = self.rdr2.byte_headers()?.len();
        Ok((
            repeat(b"").take(len1).collect(),
            repeat(b"").take(len2).collect(),
        ))
    }
}

impl Args {
    fn new_io_state(&self)
        -> CliResult<IoState<fs::File, Box<io::Write+'static>>> {
        let rconf1 = Config::new(&Some(self.arg_input1.clone()))
            .delimiter(self.flag_delimiter)
            .no_headers(self.flag_no_headers)
            .select(self.arg_columns1.clone());
        let rconf2 = Config::new(&Some(self.arg_input2.clone()))
            .delimiter(self.flag_delimiter)
            .no_headers(self.flag_no_headers)
            .select(self.arg_columns2.clone());

        let mut rdr1 = rconf1.reader_file()?;
        let mut rdr2 = rconf2.reader_file()?;
        let (sel1, sel2) = self.get_selections(
            &rconf1, &mut rdr1, &rconf2, &mut rdr2)?;
        Ok(IoState {
            wtr: Config::new(&self.flag_output).writer()?,
            rdr1: rdr1,
            sel1: sel1,
            rdr2: rdr2,
            sel2: sel2,
            no_headers: rconf1.no_headers,
            casei: self.flag_no_case,
            nulls: self.flag_nulls,
        })
    }

    fn get_selections<R: io::Read>(
        &self,
        rconf1: &Config, rdr1: &mut csv::Reader<R>,
        rconf2: &Config, rdr2: &mut csv::Reader<R>,
    ) -> CliResult<(Selection, Selection)> {
        let headers1 = rdr1.byte_headers()?;
        let headers2 = rdr2.byte_headers()?;
        let select1 = rconf1.selection(&*headers1)?;
        let select2 = rconf2.selection(&*headers2)?;
        if select1.len() != select2.len() {
            return fail!(format!(
                "Column selections must have the same number of columns, \
                 but found column selections with {} and {} columns.",
                select1.len(), select2.len()));
        }
        Ok((select1, select2))
    }
}

struct ValueIndex<R> {
    // This maps tuples of values to corresponding rows.
    values: HashMap<Vec<ByteString>, Vec<usize>>,
    idx: Indexed<R, io::Cursor<Vec<u8>>>,
    num_rows: usize,
}

impl<R: io::Read + io::Seek> ValueIndex<R> {
    fn new(
        mut rdr: csv::Reader<R>,
        sel: &Selection,
        casei: bool,
        nulls: bool,
    ) -> CliResult<ValueIndex<R>> {
        let mut val_idx = HashMap::with_capacity(10000);
        let mut row_idx = io::Cursor::new(Vec::with_capacity(8 * 10000));
        let (mut rowi, mut count) = (0usize, 0usize);

        // This logic is kind of tricky. Basically, we want to include
        // the header row in the line index (because that's what csv::index
        // does), but we don't want to include header values in the ValueIndex.
        if !rdr.has_headers() {
            // ... so if there are no headers, we seek to the beginning and
            // index everything.
            let mut pos = csv::Position::new();
            pos.set_byte(0);
            rdr.seek(pos)?;
        } else {
            // ... and if there are headers, we make sure that we've parsed
            // them, and write the offset of the header row to the index.
            rdr.byte_headers()?;
            row_idx.write_u64::<BigEndian>(0)?;
            count += 1;
        }

        let mut row = csv::ByteRecord::new();
        while rdr.read_byte_record(&mut row)? {
            // This is a bit hokey. We're doing this manually instead of using
            // the `csv-index` crate directly so that we can create both
            // indexes in one pass.
            row_idx.write_u64::<BigEndian>(row.position().unwrap().byte())?;

            let fields: Vec<_> = sel
                .select(&row)
                .map(|v| transform(v, casei))
                .collect();
            if nulls || !fields.iter().any(|f| f.is_empty()) {
                match val_idx.entry(fields) {
                    Entry::Vacant(v) => {
                        let mut rows = Vec::with_capacity(4);
                        rows.push(rowi);
                        v.insert(rows);
                    }
                    Entry::Occupied(mut v) => {
                        v.get_mut().push(rowi);
                    }
                }
            }
            rowi += 1;
            count += 1;
        }

        row_idx.write_u64::<BigEndian>(count as u64)?;
        let idx = Indexed::open(rdr, io::Cursor::new(row_idx.into_inner()))?;
        Ok(ValueIndex {
            values: val_idx,
            idx: idx,
            num_rows: rowi,
        })
    }
}

impl<R> fmt::Debug for ValueIndex<R> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Sort the values by order of first appearance.
        let mut kvs = self.values.iter().collect::<Vec<_>>();
        kvs.sort_by(|&(_, v1), &(_, v2)| v1[0].cmp(&v2[0]));
        for (keys, rows) in kvs.into_iter() {
            // This is just for debugging, so assume Unicode for now.
            let keys = keys.iter()
                           .map(|k| String::from_utf8(k.to_vec()).unwrap())
                           .collect::<Vec<_>>();
            writeln!(f, "({}) => {:?}", keys.join(", "), rows)?
        }
        Ok(())
    }
}

fn get_row_key(
    sel: &Selection,
    row: &csv::ByteRecord,
    casei: bool,
) -> Vec<ByteString> {
    sel.select(row).map(|v| transform(&v, casei)).collect()
}

fn transform(bs: &[u8], casei: bool) -> ByteString {
    match str::from_utf8(bs) {
        Err(_) => bs.to_vec(),
        Ok(s) => {
            if !casei {
                s.trim().as_bytes().to_vec()
            } else {
                let norm: String =
                    s.trim().chars()
                     .map(|c| c.to_lowercase().next().unwrap()).collect();
                norm.into_bytes()
            }
        }
    }
}


================================================
FILE: src/cmd/mod.rs
================================================
pub mod cat;
pub mod count;
pub mod fixlengths;
pub mod flatten;
pub mod fmt;
pub mod frequency;
pub mod headers;
pub mod index;
pub mod input;
pub mod join;
pub mod partition;
pub mod reverse;
pub mod sample;
pub mod search;
pub mod select;
pub mod slice;
pub mod sort;
pub mod split;
pub mod stats;
pub mod table;


================================================
FILE: src/cmd/partition.rs
================================================
use std::collections::{HashMap, HashSet};
use std::collections::hash_map::Entry;
use std::fs;
use std::io;
use std::path::Path;

use csv;
use regex::Regex;

use CliResult;
use config::{Config, Delimiter};
use select::SelectColumns;
use util::{self, FilenameTemplate};

static USAGE: &'static str = "
Partitions the given CSV data into chunks based on the value of a column

The files are written to the output directory with filenames based on the
values in the partition column and the `--filename` flag.

Usage:
    xsv partition [options] <column> <outdir> [<input>]
    xsv partition --help

partition options:
    --filename <filename>  A filename template to use when constructing
                           the names of the output files.  The string '{}'
                           will be replaced by a value based on the value
                           of the field, but sanitized for shell safety.
                           [default: {}.csv]
    -p, --prefix-length <n>  Truncate the partition column after the
                           specified number of bytes when creating the
                           output file.
    --drop                 Drop the partition column from results.

Common options:
    -h, --help             Display this message
    -n, --no-headers       When set, the first row will NOT be interpreted
                           as column names. Otherwise, the first row will
                           appear in all chunks as the header row.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Clone, Deserialize)]
struct Args {
    arg_column: SelectColumns,
    arg_input: Option<String>,
    arg_outdir: String,
    flag_filename: FilenameTemplate,
    flag_prefix_length: Option<usize>,
    flag_drop: bool,
    flag_no_headers: bool,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    fs::create_dir_all(&args.arg_outdir)?;

    // It would be nice to support efficient parallel partitions, but doing
    // do would involve more complicated inter-thread communication, with
    // multiple readers and writers, and some way of passing buffers
    // between them.
    args.sequential_partition()
}

impl Args {
    /// Configuration for our reader.
    fn rconfig(&self) -> Config {
        Config::new(&self.arg_input)
            .delimiter(self.flag_delimiter)
            .no_headers(self.flag_no_headers)
            .select(self.arg_column.clone())
    }

    /// Get the column to use as a key.
    fn key_column(
        &self,
        rconfig: &Config,
        headers: &csv::ByteRecord,
    ) -> CliResult<usize> {
        let select_cols = rconfig.selection(headers)?;
        if select_cols.len() == 1 {
            Ok(select_cols[0])
        } else {
            fail!("can only partition on one column")
        }
    }

    /// A basic sequential partition.
    fn sequential_partition(&self) -> CliResult<()> {
        let rconfig = self.rconfig();
        let mut rdr = rconfig.reader()?;
        let headers = rdr.byte_headers()?.clone();
        let key_col = self.key_column(&rconfig, &headers)?;
        let mut gen = WriterGenerator::new(self.flag_filename.clone());

        let mut writers: HashMap<Vec<u8>, BoxedWriter> =
            HashMap::new();
        let mut row = csv::ByteRecord::new();
        while rdr.read_byte_record(&mut row)? {
            // Decide what file to put this in.
            let column = &row[key_col];
            let key = match self.flag_prefix_length {
                // We exceed --prefix-length, so ignore the extra bytes.
                Some(len) if len < column.len() => &column[0..len],
                _ => &column[..],
            };
            let mut entry = writers.entry(key.to_vec());
            let wtr = match entry {
                Entry::Occupied(ref mut occupied) => occupied.get_mut(),
                Entry::Vacant(vacant) => {
                    // We have a new key, so make a new writer.
                    let mut wtr = gen.writer(&*self.arg_outdir, key)?;
                    if !rconfig.no_headers {
                        if self.flag_drop {
                            wtr.write_record(headers.iter().enumerate()
                                .filter_map(|(i, e)| if i != key_col { Some(e) } else { None }))?;
                        } else {
                            wtr.write_record(&headers)?;
                        }
                    }
                    vacant.insert(wtr)
                }
            };
            if self.flag_drop {
                wtr.write_record(row.iter().enumerate()
                    .filter_map(|(i, e)| if i != key_col { Some(e) } else { None }))?;
            } else {
                wtr.write_byte_record(&row)?;
            }
        }
        Ok(())
    }
}

type BoxedWriter = csv::Writer<Box<io::Write+'static>>;

/// Generates unique filenames based on CSV values.
struct WriterGenerator {
    template: FilenameTemplate,
    counter: usize,
    used: HashSet<String>,
    non_word_char: Regex,
}

impl WriterGenerator {
    fn new(template: FilenameTemplate) -> WriterGenerator {
        WriterGenerator {
            template: template,
            counter: 1,
            used: HashSet::new(),
            non_word_char: Regex::new(r"\W").unwrap(),
        }
    }

    /// Create a CSV writer for `key`.  Does not add headers.
    fn writer<P>(&mut self, path: P, key: &[u8]) -> io::Result<BoxedWriter>
        where P: AsRef<Path>
    {
        let unique_value = self.unique_value(key);
        self.template.writer(path.as_ref(), &unique_value)
    }

    /// Generate a unique value for `key`, suitable for use in a
    /// "shell-safe" filename.  If you pass `key` twice, you'll get two
    /// different values.
    fn unique_value(&mut self, key: &[u8]) -> String {
        // Sanitize our key.
        let utf8 = String::from_utf8_lossy(key);
        let safe = self.non_word_char.replace_all(&*utf8, "").into_owned();
        let base =
            if safe.is_empty() {
                "empty".to_owned()
            } else {
                safe
            };

        // Now check for collisions.
        if !self.used.contains(&base) {
            self.used.insert(base.clone());
            base
        } else {
            loop {
                let candidate = format!("{}_{}", &base, self.counter);
                self.counter = self.counter.checked_add(1).unwrap_or_else(|| {
                    // We'll run out of other things long before we ever
                    // reach this, but we'll check just for correctness and
                    // completeness.
                    panic!("Cannot generate unique value")
                });
                if !self.used.contains(&candidate) {
                    self.used.insert(candidate.clone());
                    return candidate
                }
            }
        }
    }
}


================================================
FILE: src/cmd/reverse.rs
================================================
use CliResult;
use config::{Config, Delimiter};
use util;

static USAGE: &'static str = "
Reverses rows of CSV data.

Useful for cases when there is no column that can be used for sorting in reverse order,
or when keys are not unique and order of rows with the same key needs to be preserved.

Note that this requires reading all of the CSV data into memory.

Usage:
    xsv reverse [options] [<input>]

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -n, --no-headers       When set, the first row will not be interpreted
                           as headers. Namely, it will be reversed with the rest
                           of the rows. Otherwise, the first row will always
                           appear as the header row in the output.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    arg_input: Option<String>,
    flag_output: Option<String>,
    flag_no_headers: bool,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    let rconfig = Config::new(&args.arg_input)
        .delimiter(args.flag_delimiter)
        .no_headers(args.flag_no_headers);

    let mut rdr = rconfig.reader()?;

    let mut all = rdr.byte_records().collect::<Result<Vec<_>, _>>()?;
    all.reverse();

    let mut wtr = Config::new(&args.flag_output).writer()?;
    rconfig.write_headers(&mut rdr, &mut wtr)?;
    for r in all.into_iter() {
        wtr.write_byte_record(&r)?;
    }
    Ok(wtr.flush()?)
}


================================================
FILE: src/cmd/sample.rs
================================================
use std::io;

use byteorder::{ByteOrder, LittleEndian};
use csv;
use rand::{self, Rng, SeedableRng};
use rand::rngs::StdRng;

use CliResult;
use config::{Config, Delimiter};
use index::Indexed;
use util;

static USAGE: &'static str = "
Randomly samples CSV data uniformly using memory proportional to the size of
the sample.

When an index is present, this command will use random indexing if the sample
size is less than 10% of the total number of records. This allows for efficient
sampling such that the entire CSV file is not parsed.

This command is intended to provide a means to sample from a CSV data set that
is too big to fit into memory (for example, for use with commands like 'xsv
frequency' or 'xsv stats'). It will however visit every CSV record exactly
once, which is necessary to provide a uniform random sample. If you wish to
limit the number of records visited, use the 'xsv slice' command to pipe into
'xsv sample'.

Usage:
    xsv sample [options] <sample-size> [<input>]
    xsv sample --help

sample options:
    --seed <number>        RNG seed.

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -n, --no-headers       When set, the first row will be consider as part of
                           the population to sample from. (When not set, the
                           first row is the header row and will always appear
                           in the output.)
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    arg_input: Option<String>,
    arg_sample_size: u64,
    flag_output: Option<String>,
    flag_no_headers: bool,
    flag_delimiter: Option<Delimiter>,
    flag_seed: Option<usize>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    let rconfig = Config::new(&args.arg_input)
        .delimiter(args.flag_delimiter)
        .no_headers(args.flag_no_headers);
    let sample_size = args.arg_sample_size;

    let mut wtr = Config::new(&args.flag_output).writer()?;
    let sampled = match rconfig.indexed()? {
        Some(mut idx) => {
            if do_random_access(sample_size, idx.count()) {
                rconfig.write_headers(&mut *idx, &mut wtr)?;
                sample_random_access(&mut idx, sample_size)?
            } else {
                let mut rdr = rconfig.reader()?;
                rconfig.write_headers(&mut rdr, &mut wtr)?;
                sample_reservoir(&mut rdr, sample_size, args.flag_seed)?
            }
        }
        _ => {
            let mut rdr = rconfig.reader()?;
            rconfig.write_headers(&mut rdr, &mut wtr)?;
            sample_reservoir(&mut rdr, sample_size, args.flag_seed)?
        }
    };
    for row in sampled.into_iter() {
        wtr.write_byte_record(&row)?;
    }
    Ok(wtr.flush()?)
}

fn sample_random_access<R, I>(
    idx: &mut Indexed<R, I>,
    sample_size: u64,
) -> CliResult<Vec<csv::ByteRecord>>
where R: io::Read + io::Seek, I: io::Read + io::Seek
{
    let mut all_indices = (0..idx.count()).collect::<Vec<_>>();
    let mut rng = ::rand::thread_rng();
    rng.shuffle(&mut *all_indices);

    let mut sampled = Vec::with_capacity(sample_size as usize);
    for i in all_indices.into_iter().take(sample_size as usize) {
        idx.seek(i)?;
        sampled.push(idx.byte_records().next().unwrap()?);
    }
    Ok(sampled)
}

fn sample_reservoir<R: io::Read>(
    rdr: &mut csv::Reader<R>,
    sample_size: u64,
    seed: Option<usize>
) -> CliResult<Vec<csv::ByteRecord>> {
    // The following algorithm has been adapted from:
    // https://en.wikipedia.org/wiki/Reservoir_sampling
    let mut reservoir = Vec::with_capacity(sample_size as usize);
    let mut records = rdr.byte_records().enumerate();
    for (_, row) in records.by_ref().take(reservoir.capacity()) {
        reservoir.push(row?);
    }

    // Seeding rng
    let mut rng: StdRng = match seed {
        None => {
            StdRng::from_rng(rand::thread_rng()).unwrap()
        }
        Some(seed) => {
            let mut buf = [0u8; 32];
            LittleEndian::write_u64(&mut buf, seed as u64);
            SeedableRng::from_seed(buf)
        }
    };

    // Now do the sampling.
    for (i, row) in records {
        let random = rng.gen_range(0, i+1);
        if random < sample_size as usize {
            reservoir[random] = row?;
        }
    }
    Ok(reservoir)
}

fn do_random_access(sample_size: u64, total: u64) -> bool {
    sample_size <= (total / 10)
}


================================================
FILE: src/cmd/search.rs
================================================
use csv;
use regex::bytes::RegexBuilder;

use CliResult;
use config::{Config, Delimiter};
use select::SelectColumns;
use util;

static USAGE: &'static str = "
Filters CSV data by whether the given regex matches a row.

The regex is applied to each field in each row, and if any field matches,
then the row is written to the output. The columns to search can be limited
with the '--select' flag (but the full row is still written to the output if
there is a match).

Usage:
    xsv search [options] <regex> [<input>]
    xsv search --help

search options:
    -i, --ignore-case      Case insensitive search. This is equivalent to
                           prefixing the regex with '(?i)'.
    -s, --select <arg>     Select the columns to search. See 'xsv select -h'
                           for the full syntax.
    -v, --invert-match     Select only rows that did not match

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -n, --no-headers       When set, the first row will not be interpreted
                           as headers. (i.e., They are not searched, analyzed,
                           sliced, etc.)
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    arg_input: Option<String>,
    arg_regex: String,
    flag_select: SelectColumns,
    flag_output: Option<String>,
    flag_no_headers: bool,
    flag_delimiter: Option<Delimiter>,
    flag_invert_match: bool,
    flag_ignore_case: bool,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    let pattern = RegexBuilder::new(&*args.arg_regex)
        .case_insensitive(args.flag_ignore_case)
        .build()?;
    let rconfig = Config::new(&args.arg_input)
        .delimiter(args.flag_delimiter)
        .no_headers(args.flag_no_headers)
        .select(args.flag_select);

    let mut rdr = rconfig.reader()?;
    let mut wtr = Config::new(&args.flag_output).writer()?;

    let headers = rdr.byte_headers()?.clone();
    let sel = rconfig.selection(&headers)?;

    if !rconfig.no_headers {
        wtr.write_record(&headers)?;
    }
    let mut record = csv::ByteRecord::new();
    while rdr.read_byte_record(&mut record)? {
        let mut m = sel.select(&record).any(|f| pattern.is_match(f));
        if args.flag_invert_match {
            m = !m;
        }
        if m {
            wtr.write_byte_record(&record)?;
        }
    }
    Ok(wtr.flush()?)
}


================================================
FILE: src/cmd/select.rs
================================================
use csv;

use CliResult;
use config::{Config, Delimiter};
use select::SelectColumns;
use util;

static USAGE: &'static str = "
Select columns from CSV data efficiently.

This command lets you manipulate the columns in CSV data. You can re-order
them, duplicate them or drop them. Columns can be referenced by index or by
name if there is a header row (duplicate column names can be disambiguated with
more indexing). Finally, column ranges can be specified.

  Select the first and fourth columns:
  $ xsv select 1,4

  Select the first 4 columns (by index and by name):
  $ xsv select 1-4
  $ xsv select Header1-Header4

  Ignore the first 2 columns (by range and by omission):
  $ xsv select 3-
  $ xsv select '!1-2'

  Select the third column named 'Foo':
  $ xsv select 'Foo[2]'

  Re-order and duplicate columns arbitrarily:
  $ xsv select 3-1,Header3-Header1,Header1,Foo[2],Header1

  Quote column names that conflict with selector syntax:
  $ xsv select '\"Date - Opening\",\"Date - Actual Closing\"'

Usage:
    xsv select [options] [--] <selection> [<input>]
    xsv select --help

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -n, --no-headers       When set, the first row will not be interpreted
                           as headers. (i.e., They are not searched, analyzed,
                           sliced, etc.)
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    arg_input: Option<String>,
    arg_selection: SelectColumns,
    flag_output: Option<String>,
    flag_no_headers: bool,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;

    let rconfig = Config::new(&args.arg_input)
        .delimiter(args.flag_delimiter)
        .no_headers(args.flag_no_headers)
        .select(args.arg_selection);

    let mut rdr = rconfig.reader()?;
    let mut wtr = Config::new(&args.flag_output).writer()?;

    let headers = rdr.byte_headers()?.clone();
    let sel = rconfig.selection(&headers)?;

    if !rconfig.no_headers {
        wtr.write_record(sel.iter().map(|&i| &headers[i]))?;
    }
    let mut record = csv::ByteRecord::new();
    while rdr.read_byte_record(&mut record)? {
        wtr.write_record(sel.iter().map(|&i| &record[i]))?;
    }
    wtr.flush()?;
    Ok(())
}


================================================
FILE: src/cmd/slice.rs
================================================
use std::fs;


use CliResult;
use config::{Config, Delimiter};
use index::Indexed;
use util;

static USAGE: &'static str = "
Returns the rows in the range specified (starting at 0, half-open interval).
The range does not include headers.

If the start of the range isn't specified, then the slice starts from the first
record in the CSV data.

If the end of the range isn't specified, then the slice continues to the last
record in the CSV data.

This operation can be made much faster by creating an index with 'xsv index'
first. Namely, a slice on an index requires parsing just the rows that are
sliced. Without an index, all rows up to the first row in the slice must be
parsed.

Usage:
    xsv slice [options] [<input>]

slice options:
    -s, --start <arg>      The index of the record to slice from.
    -e, --end <arg>        The index of the record to slice to.
    -l, --len <arg>        The length of the slice (can be used instead
                           of --end).
    -i, --index <arg>      Slice a single record (shortcut for -s N -l 1).

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -n, --no-headers       When set, the first row will not be interpreted
                           as headers. Otherwise, the first row will always
                           appear in the output as the header row.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    arg_input: Option<String>,
    flag_start: Option<usize>,
    flag_end: Option<usize>,
    flag_len: Option<usize>,
    flag_index: Option<usize>,
    flag_output: Option<String>,
    flag_no_headers: bool,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    match args.rconfig().indexed()? {
        None => args.no_index(),
        Some(idxed) => args.with_index(idxed),
    }
}

impl Args {
    fn no_index(&self) -> CliResult<()> {
        let mut rdr = self.rconfig().reader()?;
        let mut wtr = self.wconfig().writer()?;
        self.rconfig().write_headers(&mut rdr, &mut wtr)?;

        let (start, end) = self.range()?;
        for r in rdr.byte_records().skip(start).take(end - start) {
            wtr.write_byte_record(&r?)?;
        }
        Ok(wtr.flush()?)
    }

    fn with_index(
        &self,
        mut idx: Indexed<fs::File, fs::File>,
    ) -> CliResult<()> {
        let mut wtr = self.wconfig().writer()?;
        self.rconfig().write_headers(&mut *idx, &mut wtr)?;

        let (start, end) = self.range()?;
        if end - start == 0 {
            return Ok(());
        }
        idx.seek(start as u64)?;
        for r in idx.byte_records().take(end - start) {
            wtr.write_byte_record(&r?)?;
        }
        wtr.flush()?;
        Ok(())
    }

    fn range(&self) -> Result<(usize, usize), String> {
        util::range(
            self.flag_start, self.flag_end, self.flag_len, self.flag_index)
    }

    fn rconfig(&self) -> Config {
        Config::new(&self.arg_input)
            .delimiter(self.flag_delimiter)
            .no_headers(self.flag_no_headers)
    }

    fn wconfig(&self) -> Config {
        Config::new(&self.flag_output)
    }
}


================================================
FILE: src/cmd/sort.rs
================================================
use std::cmp;

use CliResult;
use config::{Config, Delimiter};
use select::SelectColumns;
use util;
use std::str::from_utf8;

use self::Number::{Float, Int};

static USAGE: &'static str = "
Sorts CSV data lexicographically.

Note that this requires reading all of the CSV data into memory.

Usage:
    xsv sort [options] [<input>]

sort options:
    -s, --select <arg>     Select a subset of columns to sort.
                           See 'xsv select --help' for the format details.
    -N, --numeric          Compare according to string numerical value
    -R, --reverse          Reverse order

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -n, --no-headers       When set, the first row will not be interpreted
                           as headers. Namely, it will be sorted with the rest
                           of the rows. Otherwise, the first row will always
                           appear as the header row in the output.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    arg_input: Option<String>,
    flag_select: SelectColumns,
    flag_numeric: bool,
    flag_reverse: bool,
    flag_output: Option<String>,
    flag_no_headers: bool,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    let numeric = args.flag_numeric;
    let reverse = args.flag_reverse;
    let rconfig = Config::new(&args.arg_input)
        .delimiter(args.flag_delimiter)
        .no_headers(args.flag_no_headers)
        .select(args.flag_select);

    let mut rdr = rconfig.reader()?;

    let headers = rdr.byte_headers()?.clone();
    let sel = rconfig.selection(&headers)?;

    let mut all = rdr.byte_records().collect::<Result<Vec<_>, _>>()?;
    match (numeric, reverse) {
        (false, false) =>
            all.sort_by(|r1, r2| {
                let a = sel.select(r1);
                let b = sel.select(r2);
                iter_cmp(a, b)
            }),
        (true, false) =>
            all.sort_by(|r1, r2| {
                let a = sel.select(r1);
                let b = sel.select(r2);
                iter_cmp_num(a, b)
            }),
        (false, true) =>
            all.sort_by(|r1, r2| {
                let a = sel.select(r1);
                let b = sel.select(r2);
                iter_cmp(b, a)
            }),
        (true, true) =>
            all.sort_by(|r1, r2| {
                let a = sel.select(r1);
                let b = sel.select(r2);
                iter_cmp_num(b, a)
            }),
    }

    let mut wtr = Config::new(&args.flag_output).writer()?;
    rconfig.write_headers(&mut rdr, &mut wtr)?;
    for r in all.into_iter() {
        wtr.write_byte_record(&r)?;
    }
    Ok(wtr.flush()?)
}

/// Order `a` and `b` lexicographically using `Ord`
pub fn iter_cmp<A, L, R>(mut a: L, mut b: R) -> cmp::Ordering
        where A: Ord, L: Iterator<Item=A>, R: Iterator<Item=A> {
    loop {
        match (a.next(), b.next()) {
            (None, None) => return cmp::Ordering::Equal,
            (None, _   ) => return cmp::Ordering::Less,
            (_   , None) => return cmp::Ordering::Greater,
            (Some(x), Some(y)) => match x.cmp(&y) {
                cmp::Ordering::Equal => (),
                non_eq => return non_eq,
            },
        }
    }
}

/// Try parsing `a` and `b` as numbers when ordering
pub fn iter_cmp_num<'a, L, R>(mut a: L, mut b: R) -> cmp::Ordering
        where L: Iterator<Item=&'a [u8]>, R: Iterator<Item=&'a [u8]> {
    loop {
        match (next_num(&mut a), next_num(&mut b)) {
            (None, None) => return cmp::Ordering::Equal,
            (None, _   ) => return cmp::Ordering::Less,
            (_   , None) => return cmp::Ordering::Greater,
            (Some(x), Some(y)) => match compare_num(x, y) {
                cmp::Ordering::Equal => (),
                non_eq => return non_eq,
            },
        }
    }
}

#[derive(Clone, Copy, PartialEq)]
enum Number {
    Int(i64),
    Float(f64),
}

fn compare_num(n1: Number, n2: Number) -> cmp::Ordering{
    match (n1, n2) {
        (Int(i1), Int(i2)) => i1.cmp(&i2),
        (Int(i1), Float(f2)) => compare_float(i1 as f64, f2),
        (Float(f1), Int(i2)) => compare_float(f1, i2 as f64),
        (Float(f1), Float(f2)) => compare_float(f1, f2),
    }
}

fn compare_float(f1: f64, f2: f64) -> cmp::Ordering {
    f1.partial_cmp(&f2).unwrap_or(cmp::Ordering::Equal)
}


fn next_num<'a, X>(xs: &mut X) -> Option<Number>
        where X: Iterator<Item=&'a [u8]> {
    xs.next()
        .and_then(|bytes| from_utf8(bytes).ok())
        .and_then(|s| {
            if let Ok(i) = s.parse::<i64>() { Some(Number::Int(i)) }
            else if let Ok(f) = s.parse::<f64>() { Some(Number::Float(f)) }
            else { None }
        })
}


================================================
FILE: src/cmd/split.rs
================================================
use std::fs;
use std::io;
use std::path::Path;

use channel;
use csv;
use threadpool::ThreadPool;

use CliResult;
use config::{Config, Delimiter};
use index::Indexed;
use util::{self, FilenameTemplate};

static USAGE: &'static str = "
Splits the given CSV data into chunks.

The files are written to the directory given with the name '{start}.csv',
where {start} is the index of the first record of the chunk (starting at 0).

Usage:
    xsv split [options] <outdir> [<input>]
    xsv split --help

split options:
    -s, --size <arg>       The number of records to write into each chunk.
                           [default: 500]
    -j, --jobs <arg>       The number of spliting jobs to run in parallel.
                           This only works when the given CSV data has
                           an index already created. Note that a file handle
                           is opened for each job.
                           When set to '0', the number of jobs is set to the
                           number of CPUs detected.
                           [default: 0]
    --filename <filename>  A filename template to use when constructing
                           the names of the output files.  The string '{}'
                           will be replaced by a value based on the value
                           of the field, but sanitized for shell safety.
                           [default: {}.csv]

Common options:
    -h, --help             Display this message
    -n, --no-headers       When set, the first row will NOT be interpreted
                           as column names. Otherwise, the first row will
                           appear in all chunks as the header row.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Clone, Deserialize)]
struct Args {
    arg_input: Option<String>,
    arg_outdir: String,
    flag_size: usize,
    flag_jobs: usize,
    flag_filename: FilenameTemplate,
    flag_no_headers: bool,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    if args.flag_size == 0 {
        return fail!("--size must be greater than 0.");
    }
    fs::create_dir_all(&args.arg_outdir)?;

    match args.rconfig().indexed()? {
        Some(idx) => args.parallel_split(idx),
        None => args.sequential_split(),
    }
}

impl Args {
    fn sequential_split(&self) -> CliResult<()> {
        let rconfig = self.rconfig();
        let mut rdr = rconfig.reader()?;
        let headers = rdr.byte_headers()?.clone();

        let mut wtr = self.new_writer(&headers, 0)?;
        let mut i = 0;
        let mut row = csv::ByteRecord::new();
        while rdr.read_byte_record(&mut row)? {
            if i > 0 && i % self.flag_size == 0 {
                wtr.flush()?;
                wtr = self.new_writer(&headers, i)?;
            }
            wtr.write_byte_record(&row)?;
            i += 1;
        }
        wtr.flush()?;
        Ok(())
    }

    fn parallel_split(
        &self,
        idx: Indexed<fs::File, fs::File>,
    ) -> CliResult<()> {
        let nchunks = util::num_of_chunks(
            idx.count() as usize, self.flag_size);
        let pool = ThreadPool::new(self.njobs());
        let (tx, rx) = channel::bounded::<()>(0);
        for i in 0..nchunks {
            let args = self.clone();
            let tx = tx.clone();
            pool.execute(move || {
                let conf = args.rconfig();
                let mut idx = conf.indexed().unwrap().unwrap();
                let headers = idx.byte_headers().unwrap().clone();
                let mut wtr = args
                    .new_writer(&headers, i * args.flag_size)
                    .unwrap();

                idx.seek((i * args.flag_size) as u64).unwrap();
                for row in idx.byte_records().take(args.flag_size) {
                    let row = row.unwrap();
                    wtr.write_byte_record(&row).unwrap();
                }
                wtr.flush().unwrap();
                drop(tx);
            });
        }
        drop(tx);
        rx.recv();
        Ok(())
    }

    fn new_writer(
        &self,
        headers: &csv::ByteRecord,
        start: usize,
    ) -> CliResult<csv::Writer<Box<io::Write+'static>>> {
        let dir = Path::new(&self.arg_outdir);
        let path = dir.join(self.flag_filename.filename(&format!("{}", start)));
        let spath = Some(path.display().to_string());
        let mut wtr = Config::new(&spath).writer()?;
        if !self.rconfig().no_headers {
            wtr.write_record(headers)?;
        }
        Ok(wtr)
    }

    fn rconfig(&self) -> Config {
        Config::new(&self.arg_input)
            .delimiter(self.flag_delimiter)
            .no_headers(self.flag_no_headers)
    }

    fn njobs(&self) -> usize {
        if self.flag_jobs == 0 {
            util::num_cpus()
        } else {
            self.flag_jobs
        }
    }
}


================================================
FILE: src/cmd/stats.rs
================================================
use std::borrow::ToOwned;
use std::default::Default;
use std::fmt;
use std::fs;
use std::io;
use std::iter::{FromIterator, repeat};
use std::str::{self, FromStr};

use channel;
use csv;
use stats::{Commute, OnlineStats, MinMax, Unsorted, merge_all};
use threadpool::ThreadPool;

use CliResult;
use config::{Config, Delimiter};
use index::Indexed;
use select::{SelectColumns, Selection};
use util;

use self::FieldType::{TUnknown, TNull, TUnicode, TFloat, TInteger};

static USAGE: &'static str = "
Computes basic statistics on CSV data.

Basic statistics includes mean, median, mode, standard deviation, sum, max and
min values. Note that some statistics are expensive to compute, so they must
be enabled explicitly. By default, the following statistics are reported for
*every* column in the CSV data: mean, max, min and standard deviation. The
default set of statistics corresponds to statistics that can be computed
efficiently on a stream of data (i.e., constant memory).

Computing statistics on a large file can be made much faster if you create
an index for it first with 'xsv index'.

Usage:
    xsv stats [options] [<input>]

stats options:
    -s, --select <arg>     Select a subset of columns to compute stats for.
                           See 'xsv select --help' for the format details.
                           This is provided here because piping 'xsv select'
                           into 'xsv stats' will disable the use of indexing.
    --everything           Show all statistics available.
    --mode                 Show the mode.
                           This requires storing all CSV data in memory.
    --cardinality          Show the cardinality.
                           This requires storing all CSV data in memory.
    --median               Show the median.
                           This requires storing all CSV data in memory.
    --nulls                Include NULLs in the population size for computing
                           mean and standard deviation.
    -j, --jobs <arg>       The number of jobs to run in parallel.
                           This works better when the given CSV data has
                           an index already created. Note that a file handle
                           is opened for each job.
                           When set to '0', the number of jobs is set to the
                           number of CPUs detected.
                           [default: 0]

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -n, --no-headers       When set, the first row will NOT be interpreted
                           as column names. i.e., They will be included
                           in statistics.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Clone, Deserialize)]
struct Args {
    arg_input: Option<String>,
    flag_select: SelectColumns,
    flag_everything: bool,
    flag_mode: bool,
    flag_cardinality: bool,
    flag_median: bool,
    flag_nulls: bool,
    flag_jobs: usize,
    flag_output: Option<String>,
    flag_no_headers: bool,
    flag_delimiter: Option<Delimiter>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;

    let mut wtr = Config::new(&args.flag_output).writer()?;
    let (headers, stats) = match args.rconfig().indexed()? {
        None => args.sequential_stats(),
        Some(idx) => {
            if args.flag_jobs == 1 {
                args.sequential_stats()
            } else {
                args.parallel_stats(idx)
            }
        }
    }?;
    let stats = args.stats_to_records(stats);

    wtr.write_record(&args.stat_headers())?;
    let fields = headers.iter().zip(stats.into_iter());
    for (i, (header, stat)) in fields.enumerate() {
        let header =
            if args.flag_no_headers {
                i.to_string().into_bytes()
            } else {
                header.to_vec()
            };
        let stat = stat.iter().map(|f| f.as_bytes());
        wtr.write_record(vec![&*header].into_iter().chain(stat))?;
    }
    wtr.flush()?;
    Ok(())
}

impl Args {
    fn sequential_stats(&self) -> CliResult<(csv::ByteRecord, Vec<Stats>)> {
        let mut rdr = self.rconfig().reader()?;
        let (headers, sel) = self.sel_headers(&mut rdr)?;
        let stats = self.compute(&sel, rdr.byte_records())?;
        Ok((headers, stats))
    }

    fn parallel_stats(
        &self,
        idx: Indexed<fs::File, fs::File>,
    ) -> CliResult<(csv::ByteRecord, Vec<Stats>)> {
        // N.B. This method doesn't handle the case when the number of records
        // is zero correctly. So we use `sequential_stats` instead.
        if idx.count() == 0 {
            return self.sequential_stats();
        }

        let mut rdr = self.rconfig().reader()?;
        let (headers, sel) = self.sel_headers(&mut rdr)?;

        let chunk_size = util::chunk_size(idx.count() as usize, self.njobs());
        let nchunks = util::num_of_chunks(idx.count() as usize, chunk_size);

        let pool = ThreadPool::new(self.njobs());
        let (send, recv) = channel::bounded(0);
        for i in 0..nchunks {
            let (send, args, sel) = (send.clone(), self.clone(), sel.clone());
            pool.execute(move || {
                let mut idx = args.rconfig().indexed().unwrap().unwrap();
                idx.seek((i * chunk_size) as u64).unwrap();
                let it = idx.byte_records().take(chunk_size);
                send.send(args.compute(&sel, it).unwrap());
            });
        }
        drop(send);
        Ok((headers, merge_all(recv).unwrap_or_else(Vec::new)))
    }

    fn stats_to_records(&self, stats: Vec<Stats>) -> Vec<csv::StringRecord> {
        let mut records: Vec<_> = repeat(csv::StringRecord::new())
            .take(stats.len())
            .collect();
        let pool = ThreadPool::new(self.njobs());
        let mut results = vec![];
        for mut stat in stats.into_iter() {
            let (send, recv) = channel::bounded(0);
            results.push(recv);
            pool.execute(move || { send.send(stat.to_record()); });
        }
        for (i, recv) in results.into_iter().enumerate() {
            records[i] = recv.recv().unwrap();
        }
        records
    }

    fn compute<I>(&self, sel: &Selection, it: I) -> CliResult<Vec<Stats>>
            where I: Iterator<Item=csv::Result<csv::ByteRecord>> {
        let mut stats = self.new_stats(sel.len());
        for row in it {
            let row = row?;
            for (i, field) in sel.select(&row).enumerate() {
                stats[i].add(field);
            }
        }
        Ok(stats)
    }

    fn sel_headers<R: io::Read>(
        &self,
        rdr: &mut csv::Reader<R>,
    ) -> CliResult<(csv::ByteRecord, Selection)> {
        let headers = rdr.byte_headers()?.clone();
        let sel = self.rconfig().selection(&headers)?;
        Ok((csv::ByteRecord::from_iter(sel.select(&headers)), sel))
    }

    fn rconfig(&self) -> Config {
        Config::new(&self.arg_input)
            .delimiter(self.flag_delimiter)
            .no_headers(self.flag_no_headers)
            .select(self.flag_select.clone())
    }

    fn njobs(&self) -> usize {
        if self.flag_jobs == 0 { util::num_cpus() } else { self.flag_jobs }
    }

    fn new_stats(&self, record_len: usize) -> Vec<Stats> {
        repeat(Stats::new(WhichStats {
            include_nulls: self.flag_nulls,
            sum: true,
            range: true,
            dist: true,
            cardinality: self.flag_cardinality || self.flag_everything,
            median: self.flag_median || self.flag_everything,
            mode: self.flag_mode || self.flag_everything,
        })).take(record_len).collect()
    }

    fn stat_headers(&self) -> csv::StringRecord {
        let mut fields = vec![
            "field", "type", "sum", "min", "max", "min_length", "max_length",
            "mean", "stddev",
        ];
        let all = self.flag_everything;
        if self.flag_median || all { fields.push("median"); }
        if self.flag_mode || all { fields.push("mode"); }
        if self.flag_cardinality || all { fields.push("cardinality"); }
        csv::StringRecord::from(fields)
    }
}

#[derive(Clone, Debug, Eq, PartialEq)]
struct WhichStats {
    include_nulls: bool,
    sum: bool,
    range: bool,
    dist: bool,
    cardinality: bool,
    median: bool,
    mode: bool,
}

impl Commute for WhichStats {
    fn merge(&mut self, other: WhichStats) {
        assert_eq!(*self, other);
    }
}

#[derive(Clone)]
struct Stats {
    typ: FieldType,
    sum: Option<TypedSum>,
    minmax: Option<TypedMinMax>,
    online: Option<OnlineStats>,
    mode: Option<Unsorted<Vec<u8>>>,
    median: Option<Unsorted<f64>>,
    which: WhichStats,
}

impl Stats {
    fn new(which: WhichStats) -> Stats {
        let (mut sum, mut minmax, mut online, mut mode, mut median) =
            (None, None, None, None, None);
        if which.sum { sum = Some(Default::default()); }
        if which.range { minmax = Some(Default::default()); }
        if which.dist { online = Some(Default::default()); }
        if which.mode || which.cardinality { mode = Some(Default::default()); }
        if which.median { median = Some(Default::default()); }
        Stats {
            typ: Default::default(),
            sum: sum,
            minmax: minmax,
            online: online,
            mode: mode,
            median: median,
            which: which,
        }
    }

    fn add(&mut self, sample: &[u8]) {
        let sample_type = FieldType::from_sample(sample);
        self.typ.merge(sample_type);

        let t = self.typ;
        self.sum.as_mut().map(|v| v.add(t, sample));
        self.minmax.as_mut().map(|v| v.add(t, sample));
        self.mode.as_mut().map(|v| v.add(sample.to_vec()));
        match self.typ {
            TUnknown => {}
            TNull => {
                if self.which.include_nulls {
                    self.online.as_mut().map(|v| { v.add_null(); });
                }
            }
            TUnicode => {}
            TFloat | TInteger => {
                if sample_type.is_null() {
                    if self.which.include_nulls {
                        self.online.as_mut().map(|v| { v.add_null(); });
                    }
                } else {
                    let n = from_bytes::<f64>(sample).unwrap();
                    self.median.as_mut().map(|v| { v.add(n); });
                    self.online.as_mut().map(|v| { v.add(n); });
                }
            }
        }
    }

    fn to_record(&mut self) -> csv::StringRecord {
        let typ = self.typ;
        let mut pieces = vec![];
        let empty = || "".to_owned();

        pieces.push(self.typ.to_string());
        match self.sum.as_ref().and_then(|sum| sum.show(typ)) {
            Some(sum) => { pieces.push(sum); }
            None => { pieces.push(empty()); }
        }
        match self.minmax.as_ref().and_then(|mm| mm.show(typ)) {
            Some(mm) => { pieces.push(mm.0); pieces.push(mm.1); }
            None => { pieces.push(empty()); pieces.push(empty()); }
        }
        match self.minmax.as_ref().and_then(|mm| mm.len_range()) {
            Some(mm) => { pieces.push(mm.0); pieces.push(mm.1); }
            None => { pieces.push(empty()); pieces.push(empty()); }
        }

        if !self.typ.is_number() {
            pieces.push(empty()); pieces.push(empty());
        } else {
            match self.online {
                Some(ref v) => {
                    pieces.push(v.mean().to_string());
                    pieces.push(v.stddev().to_string());
                }
                None => { pieces.push(empty()); pieces.push(empty()); }
            }
        }
        match self.median.as_mut().and_then(|v| v.median()) {
            None => {
                if self.which.median {
                    pieces.push(empty());
                }
            }
            Some(v) => { pieces.push(v.to_string()); }
        }
        match self.mode.as_mut() {
            None => {
                if self.which.mode {
                    pieces.push(empty());
                }
                if self.which.cardinality {
                    pieces.push(empty());
                }
            }
            Some(ref mut v) => {
                if self.which.mode {
                    let lossy = |s: Vec<u8>| -> String {
                        String::from_utf8_lossy(&*s).into_owned()
                    };
                    pieces.push(
                        v.mode().map_or("N/A".to_owned(), lossy));
                }
                if self.which.cardinality {
                    pieces.push(v.cardinality().to_string());
                }
            }
        }
        csv::StringRecord::from(pieces)
    }
}

impl Commute for Stats {
    fn merge(&mut self, other: Stats) {
        self.typ.merge(other.typ);
        self.sum.merge(other.sum);
        self.minmax.merge(other.minmax);
        self.online.merge(other.online);
        self.mode.merge(other.mode);
        self.median.merge(other.median);
        self.which.merge(other.which);
    }
}

#[derive(Clone, Copy, PartialEq)]
enum FieldType {
    TUnknown,
    TNull,
    TUnicode,
    TFloat,
    TInteger,
}

impl FieldType {
    fn from_sample(sample: &[u8]) -> FieldType {
        if sample.is_empty() {
            return TNull;
        }
        let string = match str::from_utf8(sample) {
            Err(_) => return TUnknown,
            Ok(s) => s,
        };
        if let Ok(_) = string.parse::<i64>() { return TInteger; }
        if let Ok(_) = string.parse::<f64>() { return TFloat; }
        TUnicode
    }

    fn is_number(&self) -> bool {
        *self == TFloat || *self == TInteger
    }

    fn is_null(&self) -> bool {
        *self == TNull
    }
}

impl Commute for FieldType {
    fn merge(&mut self, other: FieldType) {
        *self = match (*self, other) {
            (TUnicode, TUnicode) => TUnicode,
            (TFloat, TFloat) => TFloat,
            (TInteger, TInteger) => TInteger,
            // Null does not impact the type.
            (TNull, any) | (any, TNull) => any,
            // There's no way to get around an unknown.
            (TUnknown, _) | (_, TUnknown) => TUnknown,
            // Integers can degrate to floats.
            (TFloat, TInteger) | (TInteger, TFloat) => TFloat,
            // Numbers can degrade to Unicode strings.
            (TUnicode, TFloat) | (TFloat, TUnicode) => TUnicode,
            (TUnicode, TInteger) | (TInteger, TUnicode) => TUnicode,
        };
    }
}

impl Default for FieldType {
    // The default is the most specific type.
    // Type inference proceeds by assuming the most specific type and then
    // relaxing the type as counter-examples are found.
    fn default() -> FieldType { TNull }
}

impl fmt::Display for FieldType {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match *self {
            TUnknown => write!(f, "Unknown"),
            TNull => write!(f, "NULL"),
            TUnicode => write!(f, "Unicode"),
            TFloat => write!(f, "Float"),
            TInteger => write!(f, "Integer"),
        }
    }
}

/// TypedSum keeps a rolling sum of the data seen.
///
/// It sums integers until it sees a float, at which point it sums floats.
#[derive(Clone, Default)]
struct TypedSum {
    integer: i64,
    float: Option<f64>,
}

impl TypedSum {
    fn add(&mut self, typ: FieldType, sample: &[u8]) {
        if sample.is_empty() {
            return;
        }
        match typ {
            TFloat => {
                let float: f64 = from_bytes::<f64>(sample).unwrap();
                match self.float {
                    None => {
                        self.float = Some((self.integer as f64) + float);
                    }
                    Some(ref mut f) => {
                        *f += float;
                    }
                }
            }
            TInteger => {
                if let Some(ref mut float) = self.float {
                    *float += from_bytes::<f64>(sample).unwrap();
                } else {
                    self.integer += from_bytes::<i64>(sample).unwrap();
                }
            }
            _ => {}
        }
    }

    fn show(&self, typ: FieldType) -> Option<String> {
        match typ {
            TNull | TUnicode | TUnknown  => None,
            TInteger => Some(self.integer.to_string()),
            TFloat => Some(self.float.unwrap_or(0.0).to_string()),
        }
    }
}

impl Commute for TypedSum {
    fn merge(&mut self, other: TypedSum) {
        match (self.float, other.float) {
            (Some(f1), Some(f2)) => self.float = Some(f1 + f2),
            (Some(f1), None) => self.float = Some(f1 + (other.integer as f64)),
            (None, Some(f2)) => self.float = Some((self.integer as f64) + f2),
            (None, None) => self.integer += other.integer,
        }
    }
}

/// TypedMinMax keeps track of minimum/maximum values for each possible type
/// where min/max makes sense.
#[derive(Clone)]
struct TypedMinMax {
    strings: MinMax<Vec<u8>>,
    str_len: MinMax<usize>,
    integers: MinMax<i64>,
    floats: MinMax<f64>,
}

impl TypedMinMax {
    fn add(&mut self, typ: FieldType, sample: &[u8]) {
        self.str_len.add(sample.len());
        if sample.is_empty() {
            return;
        }
        self.strings.add(sample.to_vec());
        match typ {
            TUnicode | TUnknown | TNull => {}
            TFloat => {
                let n = str::from_utf8(&*sample)
                            .ok()
                            .and_then(|s| s.parse::<f64>().ok())
                            .unwrap();
                self.floats.add(n);
                self.integers.add(n as i64);
            }
            TInteger => {
                let n = str::from_utf8(&*sample)
                            .ok()
                            .and_then(|s| s.parse::<i64>().ok())
                            .unwrap();
                self.integers.add(n);
                self.floats.add(n as f64);
            }
        }
    }

    fn len_range(&self) -> Option<(String, String)> {
        match (self.str_len.min(), self.str_len.max()) {
            (Some(min), Some(max)) => Some((min.to_string(), max.to_string())),
            _ => None,
        }
    }

    fn show(&self, typ: FieldType) -> Option<(String, String)> {
        match typ {
            TNull => None,
            TUnicode | TUnknown => {
                match (self.strings.min(), self.strings.max()) {
                    (Some(min), Some(max)) => {
                        let min = String::from_utf8_lossy(&**min).to_string();
                        let max = String::from_utf8_lossy(&**max).to_string();
                        Some((min, max))
                    }
                    _ => None
                }
            }
            TInteger => {
                match (self.integers.min(), self.integers.max()) {
                    (Some(min), Some(max)) => {
                        Some((min.to_string(), max.to_string()))
                    }
                    _ => None
                }
            }
            TFloat => {
                match (self.floats.min(), self.floats.max()) {
                    (Some(min), Some(max)) => {
                        Some((min.to_string(), max.to_string()))
                    }
                    _ => None
                }
            }
        }
    }
}

impl Default for TypedMinMax {
    fn default() -> TypedMinMax {
        TypedMinMax {
            strings: Default::default(),
            str_len: Default::default(),
            integers: Default::default(),
            floats: Default::default(),
        }
    }
}

impl Commute for TypedMinMax {
    fn merge(&mut self, other: TypedMinMax) {
        self.strings.merge(other.strings);
        self.str_len.merge(other.str_len);
        self.integers.merge(other.integers);
        self.floats.merge(other.floats);
    }
}

fn from_bytes<T: FromStr>(bytes: &[u8]) -> Option<T> {
    str::from_utf8(bytes).ok().and_then(|s| s.parse().ok())
}


================================================
FILE: src/cmd/table.rs
================================================
use std::borrow::Cow;

use csv;
use tabwriter::TabWriter;

use CliResult;
use config::{Config, Delimiter};
use util;

static USAGE: &'static str = "
Outputs CSV data as a table with columns in alignment.

This will not work well if the CSV data contains large fields.

Note that formatting a table requires buffering all CSV data into memory.
Therefore, you should use the 'sample' or 'slice' command to trim down large
CSV data before formatting it with this command.

Usage:
    xsv table [options] [<input>]

table options:
    -w, --width <arg>      The minimum width of each column.
                           [default: 2]
    -p, --pad <arg>        The minimum number of spaces between each column.
                           [default: 2]
    -c, --condense <arg>  Limits the length of each field to the value
                           specified. If the field is UTF-8 encoded, then
                           <arg> refers to the number of code points.
                           Otherwise, it refers to the number of bytes.

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
";

#[derive(Deserialize)]
struct Args {
    arg_input: Option<String>,
    flag_width: usize,
    flag_pad: usize,
    flag_output: Option<String>,
    flag_delimiter: Option<Delimiter>,
    flag_condense: Option<usize>,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    let rconfig = Config::new(&args.arg_input)
        .delimiter(args.flag_delimiter)
        .no_headers(true);
    let wconfig = Config::new(&args.flag_output)
        .delimiter(Some(Delimiter(b'\t')));

    let tw = TabWriter::new(wconfig.io_writer()?)
        .minwidth(args.flag_width)
        .padding(args.flag_pad);
    let mut wtr = wconfig.from_writer(tw);
    let mut rdr = rconfig.reader()?;

    let mut record = csv::ByteRecord::new();
    while rdr.read_byte_record(&mut record)? {
        wtr.write_record(record.iter().map(|f| {
            util::condense(Cow::Borrowed(f), args.flag_condense)
        }))?;
    }
    wtr.flush()?;
    Ok(())
}


================================================
FILE: src/config.rs
================================================
#[allow(deprecated, unused_imports)]
use std::ascii::AsciiExt;
use std::borrow::ToOwned;
use std::env;
use std::fs;
use std::io::{self, Read};
use std::ops::Deref;
use std::path::PathBuf;

use csv;
use index::Indexed;
use serde::de::{Deserializer, Deserialize, Error};

use CliResult;
use select::{SelectColumns, Selection};
use util;


#[derive(Clone, Copy, Debug)]
pub struct Delimiter(pub u8);

/// Delimiter represents values that can be passed from the command line that
/// can be used as a field delimiter in CSV data.
///
/// Its purpose is to ensure that the Unicode character given decodes to a
/// valid ASCII character as required by the CSV parser.
impl Delimiter {
    pub fn as_byte(self) -> u8 {
        self.0
    }
}

impl<'de> Deserialize<'de> for Delimiter {
    fn deserialize<D: Deserializer<'de>>(d: D) -> Result<Delimiter, D::Error> {
        let c = String::deserialize(d)?;
        match &*c {
            r"\t" => Ok(Delimiter(b'\t')),
            s => {
                if s.len() != 1 {
                    let msg = format!("Could not convert '{}' to a single \
                                       ASCII character.", s);
                    return Err(D::Error::custom(msg));
                }
                let c = s.chars().next().unwrap();
                if c.is_ascii() {
                    Ok(Delimiter(c as u8))
                } else {
                    let msg = format!("Could not convert '{}' \
                                       to ASCII delimiter.", c);
                    Err(D::Error::custom(msg))
                }
            }
        }
    }
}

#[derive(Debug)]
pub struct Config {
    path: Option<PathBuf>, // None implies <stdin>
    idx_path: Option<PathBuf>,
    select_columns: Option<SelectColumns>,
    delimiter: u8,
    pub no_headers: bool,
    flexible: bool,
    terminator: csv::Terminator,
    quote: u8,
    quote_style: csv::QuoteStyle,
    double_quote: bool,
    escape: Option<u8>,
    quoting: bool,
}

impl Config {
    pub fn new(path: &Option<String>) -> Config {
        let (path, delim) = match *path {
            None => (None, b','),
            Some(ref s) if s.deref() == "-" => (None, b','),
            Some(ref s) => {
                let path = PathBuf::from(s);
                let delim =
                    if path.extension().map_or(false, |v| v == "tsv" || v == "tab") {
                        b'\t'
                    } else {
                        b','
                    };
                (Some(path), delim)
            }
        };
        Config {
            path: path,
            idx_path: None,
            select_columns: None,
            delimiter: delim,
            no_headers: false,
            flexible: false,
            terminator: csv::Terminator::Any(b'\n'),
            quote: b'"',
            quote_style: csv::QuoteStyle::Necessary,
            double_quote: true,
            escape: None,
            quoting: true,
        }
    }

    pub fn delimiter(mut self, d: Option<Delimiter>) -> Config {
        if let Some(d) = d {
            self.delimiter = d.as_byte();
        }
        self
    }

    pub fn no_headers(mut self, mut yes: bool) -> Config {
        if env::var("XSV_TOGGLE_HEADERS").unwrap_or("0".to_owned()) == "1" {
            yes = !yes;
        }
        self.no_headers = yes;
        self
    }

    pub fn flexible(mut self, yes: bool) -> Config {
        self.flexible = yes;
        self
    }

    pub fn crlf(mut self, yes: bool) -> Config {
        if yes {
            self.terminator = csv::Terminator::CRLF;
        } else {
            self.terminator = csv::Terminator::Any(b'\n');
        }
        self
    }

    pub fn terminator(mut self, term: csv::Terminator) -> Config {
        self.terminator = term;
        self
    }

    pub fn quote(mut self, quote: u8) -> Config {
        self.quote = quote;
        self
    }

    pub fn quote_style(mut self, style: csv::QuoteStyle) -> Config {
        self.quote_style = style;
        self
    }

    pub fn double_quote(mut self, yes: bool) -> Config {
        self.double_quote = yes;
        self
    }

    pub fn escape(mut self, escape: Option<u8>) -> Config {
        self.escape = escape;
        self
    }

    pub fn quoting(mut self, yes: bool) -> Config {
        self.quoting = yes;
        self
    }

    pub fn select(mut self, sel_cols: SelectColumns) -> Config {
        self.select_columns = Some(sel_cols);
        self
    }

    pub fn is_std(&self) -> bool {
        self.path.is_none()
    }

    pub fn selection(
        &self,
        first_record: &csv::ByteRecord,
    ) -> Result<Selection, String> {
        match self.select_columns {
            None => Err("Config has no 'SelectColums'. Did you call \
                         Config::select?".to_owned()),
            Some(ref sel) => sel.selection(first_record, !self.no_headers),
        }
    }

    pub fn write_headers<R: io::Read, W: io::Write>
                        (&self, r: &mut csv::Reader<R>, w: &mut csv::Writer<W>)
                        -> csv::Result<()> {
        if !self.no_headers {
            let r = r.byte_headers()?;
            if !r.is_empty() {
                w.write_record(r)?;
            }
        }
        Ok(())
    }

    pub fn writer(&self)
                 -> io::Result<csv::Writer<Box<io::Write+'static>>> {
        Ok(self.from_writer(self.io_writer()?))
    }

    pub fn reader(&self)
                 -> io::Result<csv::Reader<Box<io::Read+'static>>> {
        Ok(self.from_reader(self.io_reader()?))
    }

    pub fn reader_file(&self) -> io::Result<csv::Reader<fs::File>> {
        match self.path {
            None => Err(io::Error::new(
                io::ErrorKind::Other, "Cannot use <stdin> here",
            )),
            Some(ref p) => fs::File::open(p).map(|f| self.from_reader(f)),
        }
    }

    pub fn index_files(&self)
           -> io::Result<Option<(csv::Reader<fs::File>, fs::File)>> {
        let (csv_file, idx_file) = match (&self.path, &self.idx_path) {
            (&None, &None) => return Ok(None),
            (&None, &Some(_)) => return Err(io::Error::new(
                io::ErrorKind::Other,
                "Cannot use <stdin> with indexes",
                // Some(format!("index file: {}", p.display()))
            )),
            (&Some(ref p), &None) => {
                // We generally don't want to report an error here, since we're
                // passively trying to find an index.
                let idx_file = match fs::File::open(&util::idx_path(p)) {
                    // TODO: Maybe we should report an error if the file exists
                    // but is not readable.
                    Err(_) => return Ok(None),
                    Ok(f) => f,
                };
                (fs::File::open(p)?, idx_file)
            }
            (&Some(ref p), &Some(ref ip)) => {
                (fs::File::open(p)?, fs::File::open(ip)?)
            }
        };
        // If the CSV data was last modified after the index file was last
        // modified, then return an error and demand the user regenerate the
        // index.
        let data_modified = util::last_modified(&csv_file.metadata()?);
        let idx_modified = util::last_modified(&idx_file.metadata()?);
        if data_modified > idx_modified {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                "The CSV file was modified after the index file. \
                 Please re-create the index.",
            ));
        }
        let csv_rdr = self.from_reader(csv_file);
        Ok(Some((csv_rdr, idx_file)))
    }

    pub fn indexed(&self)
                  -> CliResult<Option<Indexed<fs::File, fs::File>>> {
        match self.index_files()? {
            None => Ok(None),
            Some((r, i)) => Ok(Some(Indexed::open(r, i)?)),
        }
    }

    pub fn io_reader(&self) -> io::Result<Box<io::Read+'static>> {
        Ok(match self.path {
                None => Box::new(io::stdin()),
                Some(ref p) => {
                    match fs::File::open(p){
                        Ok(x) => Box::new(x),
                        Err(err) => {
                            let msg = format!(
                                "failed to open {}: {}", p.display(), err);
                            return Err(io::Error::new(
                                io::ErrorKind::NotFound,
                                msg,
                            ));
                        }
                    }
                },
            })
    }

    pub fn from_reader<R: Read>(&self, rdr: R) -> csv::Reader<R> {
        csv::ReaderBuilder::new()
            .flexible(self.flexible)
            .delimiter(self.delimiter)
            .has_headers(!self.no_headers)
            .quote(self.quote)
            .quoting(self.quoting)
            .escape(self.escape)
            .from_reader(rdr)
    }

    pub fn io_writer(&self) -> io::Result<Box<io::Write+'static>> {
        Ok(match self.path {
            None => Box::new(io::stdout()),
            Some(ref p) => Box::new(fs::File::create(p)?),
        })
    }

    pub fn from_writer<W: io::Write>(&self, wtr: W) -> csv::Writer<W> {
        csv::WriterBuilder::new()
            .flexible(self.flexible)
            .delimiter(self.delimiter)
            .terminator(self.terminator)
            .quote(self.quote)
            .quote_style(self.quote_style)
            .double_quote(self.double_quote)
            .escape(self.escape.unwrap_or(b'\\'))
            .buffer_capacity(32 * (1<<10))
            .from_writer(wtr)
    }
}


================================================
FILE: src/index.rs
================================================
use std::io;
use std::ops;

use csv;
use csv_index::RandomAccessSimple;

use CliResult;

/// Indexed composes a CSV reader with a simple random access index.
pub struct Indexed<R, I> {
    csv_rdr: csv::Reader<R>,
    idx: RandomAccessSimple<I>,
}

impl<R, I> ops::Deref for Indexed<R, I> {
    type Target = csv::Reader<R>;
    fn deref(&self) -> &csv::Reader<R> { &self.csv_rdr }
}

impl<R, I> ops::DerefMut for Indexed<R, I> {
    fn deref_mut(&mut self) -> &mut csv::Reader<R> { &mut self.csv_rdr }
}

impl<R: io::Read + io::Seek, I: io::Read + io::Seek> Indexed<R, I> {
    /// Opens an index.
    pub fn open(
        csv_rdr: csv::Reader<R>,
        idx_rdr: I,
    ) -> CliResult<Indexed<R, I>> {
        Ok(Indexed {
            csv_rdr: csv_rdr,
            idx: RandomAccessSimple::open(idx_rdr)?,
        })
    }

    /// Return the number of records (not including the header record) in this
    /// index.
    pub fn count(&self) -> u64 {
        if self.csv_rdr.has_headers() && !self.idx.is_empty() {
            self.idx.len() - 1
        } else {
            self.idx.len()
        }
    }

    /// Seek to the starting position of record `i`.
    pub fn seek(&mut self, mut i: u64) -> CliResult<()> {
        if i >= self.count() {
            let msg = format!(
                "invalid record index {} (there are {} records)",
                i, self.count());
            return fail!(io::Error::new(io::ErrorKind::Other, msg));
        }
        if self.csv_rdr.has_headers() {
            i += 1;
        }
        let pos = self.idx.get(i)?;
        self.csv_rdr.seek(pos)?;
        Ok(())
    }
}


================================================
FILE: src/main.rs
================================================
extern crate byteorder;
extern crate crossbeam_channel as channel;
extern crate csv;
extern crate csv_index;
extern crate docopt;
extern crate filetime;
extern crate num_cpus;
extern crate rand;
extern crate regex;
extern crate serde;
#[macro_use]
extern crate serde_derive;
extern crate stats;
extern crate tabwriter;
extern crate threadpool;

use std::borrow::ToOwned;
use std::env;
use std::fmt;
use std::io;
use std::process;

use docopt::Docopt;

macro_rules! wout {
    ($($arg:tt)*) => ({
        use std::io::Write;
        (writeln!(&mut ::std::io::stdout(), $($arg)*)).unwrap();
    });
}

macro_rules! werr {
    ($($arg:tt)*) => ({
        use std::io::Write;
        (writeln!(&mut ::std::io::stderr(), $($arg)*)).unwrap();
    });
}

macro_rules! fail {
    ($e:expr) => (Err(::std::convert::From::from($e)));
}

macro_rules! command_list {
    () => (
"
    cat         Concatenate by row or column
    count       Count records
    fixlengths  Makes all records have same length
    flatten     Show one field per line
    fmt         Format CSV output (change field delimiter)
    frequency   Show frequency tables
    headers     Show header names
    help        Show this usage message.
    index       Create CSV index for faster access
    input       Read CSV data with special quoting rules
    join        Join CSV files
    partition   Partition CSV data based on a column value
    sample      Randomly sample CSV data
    reverse     Reverse rows of CSV data
    search      Search CSV data with regexes
    select      Select columns from CSV
    slice       Slice records from CSV
    sort        Sort CSV data
    split       Split CSV data into many files
    stats       Compute basic statistics
    table       Align CSV data into columns
"
    )
}

mod cmd;
mod config;
mod index;
mod select;
mod util;

static USAGE: &'static str = concat!("
Usage:
    xsv <command> [<args>...]
    xsv [options]

Options:
    --list        List all commands available.
    -h, --help    Display this message
    <command> -h  Display the command help message
    --version     Print version info and exit

Commands:", command_list!());

#[derive(Deserialize)]
struct Args {
    arg_command: Option<Command>,
    flag_list: bool,
}

fn main() {
    let args: Args = Docopt::new(USAGE)
                            .and_then(|d| d.options_first(true)
                                           .version(Some(util::version()))
                                           .deserialize())
                            .unwrap_or_else(|e| e.exit());
    if args.flag_list {
        wout!(concat!("Installed commands:", command_list!()));
        return;
    }
    match args.arg_command {
        None => {
            werr!(concat!(
                "xsv is a suite of CSV command line utilities.

Please choose one of the following commands:",
                command_list!()));
            process::exit(0);
        }
        Some(cmd) => {
            match cmd.run() {
                Ok(()) => process::exit(0),
                Err(CliError::Flag(err)) => err.exit(),
                Err(CliError::Csv(err)) => {
                    werr!("{}", err);
                    process::exit(1);
                }
                Err(CliError::Io(ref err))
                        if err.kind() == io::ErrorKind::BrokenPipe => {
                    process::exit(0);
                }
                Err(CliError::Io(err)) => {
                    werr!("{}", err);
                    process::exit(1);
                }
                Err(CliError::Other(msg)) => {
                    werr!("{}", msg);
                    process::exit(1);
                }
            }
        }
    }
}

#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
enum Command {
    Cat,
    Count,
    FixLengths,
    Flatten,
    Fmt,
    Frequency,
    Headers,
    Help,
    Index,
    Input,
    Join,
    Partition,
    Reverse,
    Sample,
    Search,
    Select,
    Slice,
    Sort,
    Split,
    Stats,
    Table,
}

impl Command {
    fn run(self) -> CliResult<()> {
        let argv: Vec<_> = env::args().map(|v| v.to_owned()).collect();
        let argv: Vec<_> = argv.iter().map(|s| &**s).collect();
        let argv = &*argv;

        if !argv[1].chars().all(char::is_lowercase) {
            return Err(CliError::Other(format!(
                "xsv expects commands in lowercase. Did you mean '{}'?", 
                argv[1].to_lowercase()).to_string()));
        }
        match self {
            Command::Cat => cmd::cat::run(argv),
            Command::Count => cmd::count::run(argv),
            Command::FixLengths => cmd::fixlengths::run(argv),
            Command::Flatten => cmd::flatten::run(argv),
            Command::Fmt => cmd::fmt::run(argv),
            Command::Frequency => cmd::frequency::run(argv),
            Command::Headers => cmd::headers::run(argv),
            Command::Help => { wout!("{}", USAGE); Ok(()) }
            Command::Index => cmd::index::run(argv),
            Command::Input => cmd::input::run(argv),
            Command::Join => cmd::join::run(argv),
            Command::Partition => cmd::partition::run(argv),
            Command::Reverse => cmd::reverse::run(argv),
            Command::Sample => cmd::sample::run(argv),
            Command::Search => cmd::search::run(argv),
            Command::Select => cmd::select::run(argv),
            Command::Slice => cmd::slice::run(argv),
            Command::Sort => cmd::sort::run(argv),
            Command::Split => cmd::split::run(argv),
            Command::Stats => cmd::stats::run(argv),
            Command::Table => cmd::table::run(argv),
        }
    }
}

pub type CliResult<T> = Result<T, CliError>;

#[derive(Debug)]
pub enum CliError {
    Flag(docopt::Error),
    Csv(csv::Error),
    Io(io::Error),
    Other(String),
}

impl fmt::Display for CliError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match *self {
            CliError::Flag(ref e) => { e.fmt(f) }
            CliError::Csv(ref e) => { e.fmt(f) }
            CliError::Io(ref e) => { e.fmt(f) }
            CliError::Other(ref s) => { f.write_str(&**s) }
        }
    }
}

impl From<docopt::Error> for CliError {
    fn from(err: docopt::Error) -> CliError {
        CliError::Flag(err)
    }
}

impl From<csv::Error> for CliError {
    fn from(err: csv::Error) -> CliError {
        if !err.is_io_error() {
            return CliError::Csv(err);
        }
        match err.into_kind() {
            csv::ErrorKind::Io(v) => From::from(v),
            _ => unreachable!(),
        }
    }
}

impl From<io::Error> for CliError {
    fn from(err: io::Error) -> CliError {
        CliError::Io(err)
    }
}

impl From<String> for CliError {
    fn from(err: String) -> CliError {
        CliError::Other(err)
    }
}

impl<'a> From<&'a str> for CliError {
    fn from(err: &'a str) -> CliError {
        CliError::Other(err.to_owned())
    }
}

impl From<regex::Error> for CliError {
    fn from(err: regex::Error) -> CliError {
        CliError::Other(format!("{:?}", err))
    }
}


================================================
FILE: src/select.rs
================================================
use std::cmp::Ordering;
use std::collections::HashSet;
use std::fmt;
use std::iter::{self, repeat};
use std::ops;
use std::slice;
use std::str::FromStr;

use csv;
use serde::de::{Deserializer, Deserialize, Error};

#[derive(Clone)]
pub struct SelectColumns {
    selectors: Vec<Selector>,
    invert: bool,
}

impl SelectColumns {
    fn parse(mut s: &str) -> Result<SelectColumns, String> {
        let invert =
            if !s.is_empty() && s.as_bytes()[0] == b'!' {
                s = &s[1..];
                true
            } else {
                false
            };
        Ok(SelectColumns {
            selectors: SelectorParser::new(s).parse()?,
            invert: invert,
        })
    }

    pub fn selection(
        &self,
        first_record: &csv::ByteRecord,
        use_names: bool,
    ) -> Result<Selection, String> {
        if self.selectors.is_empty() {
            return Ok(Selection(if self.invert {
                // Inverting everything means we get nothing.
                vec![]
            } else {
                (0..first_record.len()).collect()
            }));
        }

        let mut map = vec![];
        for sel in &self.selectors {
            let idxs = sel.indices(first_record, use_names);
            map.extend(idxs?.into_iter());
        }
        if self.invert {
            let set: HashSet<_> = map.into_iter().collect();
            let mut map = vec![];
            for i in 0..first_record.len() {
                if !set.contains(&i) {
                    map.push(i);
                }
            }
            return Ok(Selection(map));
        }
        Ok(Selection(map))
    }
}

impl fmt::Debug for SelectColumns {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        if self.selectors.is_empty() {
            write!(f, "<All>")
        } else {
            let strs: Vec<_> =
                self.selectors
                    .iter().map(|sel| format!("{:?}", sel)).collect();
            write!(f, "{}", strs.join(", "))
        }
    }
}

impl<'de> Deserialize<'de> for SelectColumns {
    fn deserialize<D: Deserializer<'de>>(
        d: D,
    ) -> Result<SelectColumns, D::Error> {
        let raw = String::deserialize(d)?;
        SelectColumns::parse(&raw).map_err(|e| D::Error::custom(&e))
    }
}

struct SelectorParser {
    chars: Vec<char>,
    pos: usize,
}

impl SelectorParser {
    fn new(s: &str) -> SelectorParser {
        SelectorParser { chars: s.chars().collect(), pos: 0 }
    }

    fn parse(&mut self) -> Result<Vec<Selector>, String> {
        let mut sels = vec![];
        loop {
            if self.cur().is_none() {
                break;
            }
            let f1: OneSelector =
                if self.cur() == Some('-') {
                    OneSelector::Start
                } else {
                    self.parse_one()?
                };
            let f2: Option<OneSelector> =
                if self.cur() == Some('-') {
                    self.bump();
                    Some(if self.is_end_of_selector() {
                        OneSelector::End
                    } else {
                        self.parse_one()?
                    })
                } else {
                    None
                };
            if !self.is_end_of_selector() {
                return Err(format!(
                    "Expected end of field but got '{}' instead.",
                    self.cur().unwrap()));
            }
            sels.push(match f2 {
                Some(end) => Selector::Range(f1, end),
                None => Selector::One(f1),
            });
            self.bump();
        }
        Ok(sels)
    }

    fn parse_one(&mut self) -> Result<OneSelector, String> {
        let name =
            if self.cur() == Some('"') {
                self.bump();
                self.parse_quoted_name()?
            } else {
                self.parse_name()?
            };
        Ok(if self.cur() == Some('[') {
            let idx = self.parse_index()?;
            OneSelector::IndexedName(name, idx)
        } else {
            match FromStr::from_str(&name) {
                Err(_) => OneSelector::IndexedName(name, 0),
                Ok(idx) => OneSelector::Index(idx),
            }
        })
    }

    fn parse_name(&mut self) -> Result<String, String> {
        let mut name = String::new();
        loop {
            if self.is_end_of_field() || self.cur() == Some('[') {
                break;
            }
            name.push(self.cur().unwrap());
            self.bump();
        }
        Ok(name)
    }

    fn parse_quoted_name(&mut self) -> Result<String, String> {
        let mut name = String::new();
        loop {
            match self.cur() {
                None => {
                    return Err("Unclosed quote, missing closing \"."
                               .to_owned());
                }
                Some('"') => {
                    self.bump();
                    if self.cur() == Some('"') {
                        self.bump();
                        name.push('"'); name.push('"');
                        continue;
                    }
                    break
                }
                Some(c) => { name.push(c); self.bump(); }
            }
        }
        Ok(name)
    }

    fn parse_index(&mut self) -> Result<usize, String> {
        assert_eq!(self.cur().unwrap(), '[');
        self.bump();

        let mut idx = String::new();
        loop {
            match self.cur() {
                None => {
                    return Err("Unclosed index bracket, missing closing ]."
                               .to_owned());
                }
                Some(']') => { self.bump(); break; }
                Some(c) => { idx.push(c); self.bump(); }
            }
        }
        FromStr::from_str(&idx).map_err(|err| {
            format!("Could not convert '{}' to an integer: {}", idx, err)
        })
    }

    fn cur(&self) -> Option<char> {
        self.chars.get(self.pos).cloned()
    }

    fn is_end_of_field(&self) -> bool {
        self.cur().map_or(true, |c| c == ',' || c == '-')
    }

    fn is_end_of_selector(&self) -> bool {
        self.cur().map_or(true, |c| c == ',')
    }

    fn bump(&mut self) {
        if self.pos < self.chars.len() { self.pos += 1; }
    }
}

#[derive(Clone)]
enum Selector {
    One(OneSelector),
    Range(OneSelector, OneSelector),
}

#[derive(Clone)]
enum OneSelector {
    Start,
    End,
    Index(usize),
    IndexedName(String, usize),
}

impl Selector {
    fn indices(
        &self,
        first_record: &csv::ByteRecord,
        use_names: bool,
    ) -> Result<Vec<usize>, String> {
        match *self {
            Selector::One(ref sel) => {
                sel.index(first_record, use_names).map(|i| vec![i])
            }
            Selector::Range(ref sel1, ref sel2) => {
                let i1 = sel1.index(first_record, use_names)?;
                let i2 = sel2.index(first_record, use_names)?;
                Ok(match i1.cmp(&i2) {
                    Ordering::Equal => vec!(i1),
                    Ordering::Less => (i1..(i2 + 1)).collect(),
                    Ordering::Greater => {
                        let mut inds = vec![];
                        let mut i = i1 + 1;
                        while i > i2 {
                            i -= 1;
                            inds.push(i);
                        }
                        inds
                    }
                })
            }
        }
    }
}

impl OneSelector {
    fn index(
        &self,
        first_record: &csv::ByteRecord,
        use_names: bool,
    ) -> Result<usize, String> {
        match *self {
            OneSelector::Start => Ok(0),
            OneSelector::End => Ok(
                if first_record.len() == 0 {
                    0
                } else {
                    first_record.len() - 1
                }
            ),
            OneSelector::Index(i) => {
                if i < 1 || i > first_record.len() {
                    Err(format!("Selector index {} is out of \
                                 bounds. Index must be >= 1 \
                                 and <= {}.", i, first_record.len()))
                } else {
                    // Indices given by user are 1-offset. Convert them here!
                    Ok(i-1)
                }
            }
            OneSelector::IndexedName(ref s, sidx) => {
                if !use_names {
                    return Err(format!("Cannot use names ('{}') in selection \
                                        with --no-headers set.", s));
                }
                let mut num_found = 0;
                for (i, field) in first_record.iter().enumerate() {
                    if field == s.as_bytes() {
                        if num_found == sidx {
                            return Ok(i);
                        }
                        num_found += 1;
                    }
                }
                if num_found == 0 {
                    Err(format!("Selector name '{}' does not exist \
                                 as a named header in the given CSV \
                                 data.", s))
                } else {
                    Err(format!("Selector index '{}' for name '{}' is \
                                 out of bounds. Must be >= 0 and <= {}.",
                                 sidx, s, num_found - 1))
                }
            }
        }
    }
}

impl fmt::Debug for Selector {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match *self {
            Selector::One(ref sel) => sel.fmt(f),
            Selector::Range(ref s, ref e) =>
                write!(f, "Range({:?}, {:?})", s, e),
        }
    }
}

impl fmt::Debug for OneSelector {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match *self {
            OneSelector::Start => write!(f, "Start"),
            OneSelector::End => write!(f, "End"),
            OneSelector::Index(idx) => write!(f, "Index({})", idx),
            OneSelector::IndexedName(ref s, idx) =>
                write!(f, "IndexedName({}[{}])", s, idx),
        }
    }
}

#[derive(Clone, Debug)]
pub struct Selection(Vec<usize>);

pub type _GetField =
    for <'c> fn(&mut &'c csv::ByteRecord, &usize) -> Option<&'c [u8]>;

impl Selection {
    pub fn select<'a, 'b>(&'a self, row: &'b csv::ByteRecord)
                 -> iter::Scan<
                        slice::Iter<'a, usize>,
                        &'b csv::ByteRecord,
                        _GetField,
                    > {
        // This is horrifying.
        fn get_field<'c>(row: &mut &'c csv::ByteRecord, idx: &usize)
                        -> Option<&'c [u8]> {
            Some(&row[*idx])
        }
        let get_field: _GetField = get_field;
        self.iter().scan(row, get_field)
    }

    pub fn normal(&self) -> NormalSelection {
        let &Selection(ref inds) = self;
        if inds.is_empty() {
            return NormalSelection(vec![]);
        }

        let mut normal = inds.clone();
        normal.sort();
        normal.dedup();
        let mut set: Vec<_> =
            repeat(false).take(normal[normal.len()-1] + 1).collect();
        for i in normal.into_iter() {
            set[i] = true;
        }
        NormalSelection(set)
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }
}

impl ops::Deref for Selection {
    type Target = [usize];

    fn deref(&self) -> &[usize] {
        &self.0
    }
}

#[derive(Clone, Debug)]
pub struct NormalSelection(Vec<bool>);

pub type _NormalScan<'a, T, I> = iter::Scan<
    iter::Enumerate<I>,
    &'a [bool],
    _NormalGetField<T>,
>;

pub type _NormalFilterMap<'a, T, I> = iter::FilterMap<
    _NormalScan<'a, T, I>,
    fn(Option<T>) -> Option<T>
>;

pub type _NormalGetField<T> =
    fn(&mut &[bool], (usize, T)) -> Option<Option<T>>;

impl NormalSelection {
    pub fn select<'a, T, I>(&'a self, row: I) -> _NormalFilterMap<'a, T, I>
             where I: Iterator<Item=T> {
        fn filmap<T>(v: Option<T>) -> Option<T> { v }
        fn get_field<T>(set: &mut &[bool], t: (usize, T))
                       -> Option<Option<T>> {
            let (i, v) = t;
            if i < set.len() && set[i] { Some(Some(v)) } else { Some(None) }
        }
        let get_field: _NormalGetField<T> = get_field;
        let filmap: fn(Option<T>) -> Option<T> = filmap;
        row.enumerate().scan(&**self, get_field).filter_map(filmap)
    }

    pub fn len(&self) -> usize {
        self.iter().filter(|b| **b).count()
    }
}

impl ops::Deref for NormalSelection {
    type Target = [bool];

    fn deref(&self) -> &[bool] {
        &self.0
    }
}


================================================
FILE: src/util.rs
================================================
use std::borrow::Cow;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::str;
use std::thread;
use std::time;

use csv;
use docopt::Docopt;
use num_cpus;
use serde::de::{Deserializer, Deserialize, DeserializeOwned, Error};

use CliResult;
use config::{Config, Delimiter};

pub fn num_cpus() -> usize {
    num_cpus::get()
}

pub fn version() -> String {
    let (maj, min, pat) = (
        option_env!("CARGO_PKG_VERSION_MAJOR"),
        option_env!("CARGO_PKG_VERSION_MINOR"),
        option_env!("CARGO_PKG_VERSION_PATCH"),
    );
    match (maj, min, pat) {
        (Some(maj), Some(min), Some(pat)) =>
            format!("{}.{}.{}", maj, min, pat),
        _ => "".to_owned(),
    }
}

pub fn get_args<T>(usage: &str, argv: &[&str]) -> CliResult<T>
        where T: DeserializeOwned {
    Docopt::new(usage)
           .and_then(|d| d.argv(argv.iter().map(|&x| x))
                          .version(Some(version()))
                          .deserialize())
           .map_err(From::from)
}

pub fn many_configs(inps: &[String], delim: Option<Delimiter>,
                    no_headers: bool) -> Result<Vec<Config>, String> {
    let mut inps = inps.to_vec();
    if inps.is_empty() {
        inps.push("-".to_owned()); // stdin
    }
    let confs = inps.into_iter()
                    .map(|p| Config::new(&Some(p))
                                    .delimiter(delim)
                                    .no_headers(no_headers))
                    .collect::<Vec<_>>();
    errif_greater_one_stdin(&*confs)?;
    Ok(confs)
}

pub fn errif_greater_one_stdin(inps: &[Config]) -> Result<(), String> {
    let nstd = inps.iter().filter(|inp| inp.is_std()).count();
    if nstd > 1 {
        return Err("At most one <stdin> input is allowed.".to_owned());
    }
    Ok(())
}

pub fn chunk_size(nitems: usize, njobs: usize) -> usize {
    if nitems < njobs {
        nitems
    } else {
        nitems / njobs
    }
}

pub fn num_of_chunks(nitems: usize, chunk_size: usize) -> usize {
    if chunk_size == 0 {
        return nitems;
    }
    let mut n = nitems / chunk_size;
    if nitems % chunk_size != 0 {
        n += 1;
    }
    n
}

pub fn last_modified(md: &fs::Metadata) -> u64 {
    use filetime::FileTime;
    FileTime::from_last_modification_time(md).seconds_relative_to_1970()
}

pub fn condense<'a>(val: Cow<'a, [u8]>, n: Option<usize>) -> Cow<'a, [u8]> {
    match n {
        None => val,
        Some(n) => {
            let mut is_short_utf8 = false;
            if let Ok(s) = str::from_utf8(&*val) {
                if n >= s.chars().count() {
                    is_short_utf8 = true;
                } else {
                    let mut s = s.chars().take(n).collect::<String>();
                    s.push_str("...");
                    return Cow::Owned(s.into_bytes());
                }
            }
            if is_short_utf8 || n >= (*val).len() { // already short enough
                val
            } else {
                // This is a non-Unicode string, so we just trim on bytes.
                let mut s = val[0..n].to_vec();
                s.extend(b"...".iter().cloned());
                Cow::Owned(s)
            }
        }
    }
}

pub fn idx_path(csv_path: &Path) -> PathBuf {
    let mut p = csv_path.to_path_buf().into_os_string().into_string().unwrap();
    p.push_str(".idx");
    PathBuf::from(&p)
}

pub type Idx = Option<usize>;

pub fn range(start: Idx, end: Idx, len: Idx, index: Idx)
            -> Result<(usize, usize), String> {
    match (start, end, len, index) {
        (None, None, None, Some(i)) => Ok((i, i+1)),
        (_, _, _, Some(_)) =>
            Err("--index cannot be used with --start, --end or --len".to_owned()),
        (_, Some(_), Some(_), None) =>
            Err("--end and --len cannot be used at the same time.".to_owned()),
        (_, None, None, None) => Ok((start.unwrap_or(0), ::std::usize::MAX)),
        (_, Some(e), None, None) => {
            let s = start.unwrap_or(0);
            if s > e {
                Err(format!("The end of the range ({}) must be greater than or\n\
                             equal to the start of the range ({}).", e, s))
            } else {
                Ok((s, e))
            }
        }
        (_, None, Some(l), None) => {
            let s = start.unwrap_or(0);
            Ok((s, s + l))
        }
    }
}

/// Create a directory recursively, avoiding the race conditons fixed by
/// https://github.com/rust-lang/rust/pull/39799.
fn create_dir_all_threadsafe(path: &Path) -> io::Result<()> {
    // Try 20 times. This shouldn't theoretically need to be any larger
    // than the number of nested directories we need to create.
    for _ in 0..20 {
        match fs::create_dir_all(path) {
            // This happens if a directory in `path` doesn't exist when we
            // test for it, and another thread creates it before we can.
            Err(ref err) if err.kind() == io::ErrorKind::AlreadyExists => {},
            other => return other,
        }
        // We probably don't need to sleep at all, because the intermediate
        // directory is already created.  But let's attempt to back off a
        // bit and let the other thread finish.
        thread::sleep(time::Duration::from_millis(25));
    }
    // Try one last time, returning whatever happens.
    fs::create_dir_all(path)
}

/// Represents a filename template of the form `"{}.csv"`, where `"{}"` is
/// the splace to insert the part of the filename generated by `xsv`.
#[derive(Clone, Debug)]
pub struct FilenameTemplate {
    prefix: String,
    suffix: String,
}

impl FilenameTemplate {
    /// Generate a new filename using `unique_value` to replace the `"{}"`
    /// in the template.
    pub fn filename(&self, unique_value: &str) -> String {
        format!("{}{}{}", &self.prefix, unique_value, &self.suffix)
    }

    /// Create a new, writable file in directory `path` with a filename
    /// using `unique_value` to replace the `"{}"` in the template.  Note
    /// that we do not output headers; the caller must do that if
    /// desired.
    pub fn writer<P>(&self, path: P, unique_value: &str)
                 -> io::Result<csv::Writer<Box<io::Write+'static>>>
        where P: AsRef<Path>
    {
        let filename = self.filename(unique_value);
        let full_path = path.as_ref().join(filename);
        if let Some(parent) = full_path.parent() {
            // We may be called concurrently, especially by parallel `xsv
            // split`, so be careful to avoid the `create_dir_all` race
            // condition.
            create_dir_all_threadsafe(parent)?;
        }
        let spath = Some(full_path.display().to_string());
        Config::new(&spath).writer()
    }
}

impl<'de> Deserialize<'de> for FilenameTemplate {
    fn deserialize<D: Deserializer<'de>>(
        d: D,
    ) -> Result<FilenameTemplate, D::Error> {
        let raw = String::deserialize(d)?;
        let chunks = raw.split("{}").collect::<Vec<_>>();
        if chunks.len() == 2 {
            Ok(FilenameTemplate {
                prefix: chunks[0].to_owned(),
                suffix: chunks[1].to_owned(),
            })
        } else {
            Err(D::Error::custom(
                "The --filename argument must contain one '{}'."))
        }
    }
}


================================================
FILE: tests/test_cat.rs
================================================
use std::process;

use {Csv, CsvData, qcheck};
use workdir::Workdir;

fn no_headers(cmd: &mut process::Command) {
    cmd.arg("--no-headers");
}

fn pad(cmd: &mut process::Command) {
    cmd.arg("--pad");
}

fn run_cat<X, Y, Z, F>(test_name: &str, which: &str, rows1: X, rows2: Y,
                       modify_cmd: F) -> Z
          where X: Csv, Y: Csv, Z: Csv, F: FnOnce(&mut process::Command) {
    let wrk = Workdir::new(test_name);
    wrk.create("in1.csv", rows1);
    wrk.create("in2.csv", rows2);

    let mut cmd = wrk.command("cat");
    modify_cmd(cmd.arg(which).arg("in1.csv").arg("in2.csv"));
    wrk.read_stdout(&mut cmd)
}

#[test]
fn prop_cat_rows() {
    fn p(rows: CsvData) -> bool {
        let expected = rows.clone();
        let (rows1, rows2) =
            if rows.is_empty() {
                (vec![], vec![])
            } else {
                let (rows1, rows2) = rows.split_at(rows.len() / 2);
                (rows1.to_vec(), rows2.to_vec())
            };
        let got: CsvData = run_cat("cat_rows", "rows",
                                   rows1, rows2, no_headers);
        rassert_eq!(got, expected)
    }
    qcheck(p as fn(CsvData) -> bool);
}

#[test]
fn cat_rows_space() {
    let rows = vec![svec!["\u{0085}"]];
    let expected = rows.clone();
    let (rows1, rows2) =
        if rows.is_empty() {
            (vec![], vec![])
        } else {
            let (rows1, rows2) = rows.split_at(rows.len() / 2);
            (rows1.to_vec(), rows2.to_vec())
        };
    let got: Vec<Vec<String>> =
        run_cat("cat_rows_space", "rows", rows1, rows2, no_headers);
    assert_eq!(got, expected);
}

#[test]
fn cat_rows_headers() {
    let rows1 = vec![svec!["h1", "h2"], svec!["a", "b"]];
    let rows2 = vec![svec!["h1", "h2"], svec!["y", "z"]];

    let mut expected = rows1.clone();
    expected.extend(rows2.clone().into_iter().skip(1));

    let got: Vec<Vec<String>> = run_cat("cat_rows_headers", "rows",
                                        rows1, rows2, |_| ());
    assert_eq!(got, expected);
}

#[test]
fn prop_cat_cols() {
    fn p(rows1: CsvData, rows2: CsvData) -> bool {
        let got: Vec<Vec<String>> = run_cat(
            "cat_cols", "columns", rows1.clone(), rows2.clone(), no_headers);

        let mut expected: Vec<Vec<String>> = vec![];
        let (rows1, rows2) = (rows1.to_vecs().into_iter(),
                              rows2.to_vecs().into_iter());
        for (mut r1, r2) in rows1.zip(rows2) {
            r1.extend(r2.into_iter());
            expected.push(r1);
        }
        rassert_eq!(got, expected)
    }
    qcheck(p as fn(CsvData, CsvData) -> bool);
}

#[test]
fn cat_cols_headers() {
    let rows1 = vec![svec!["h1", "h2"], svec!["a", "b"]];
    let rows2 = vec![svec!["h3", "h4"], svec!["y", "z"]];

    let expected = vec![
        svec!["h1", "h2", "h3", "h4"],
        svec!["a", "b", "y", "z"],
    ];
    let got: Vec<Vec<String>> = run_cat("cat_cols_headers", "columns",
                                        rows1, rows2, |_| ());
    assert_eq!(got, expected);
}

#[test]
fn cat_cols_no_pad() {
    let rows1 = vec![svec!["a", "b"]];
    let rows2 = vec![svec!["y", "z"], svec!["y", "z"]];

    let expected = vec![
        svec!["a", "b", "y", "z"],
    ];
    let got: Vec<Vec<String>> = run_cat("cat_cols_headers", "columns",
                                        rows1, rows2, no_headers);
    assert_eq!(got, expected);
}

#[test]
fn cat_cols_pad() {
    let rows1 = vec![svec!["a", "b"]];
    let rows2 = vec![svec!["y", "z"], svec!["y", "z"]];

    let expected = vec![
        svec!["a", "b", "y", "z"],
        svec!["", "", "y", "z"],
    ];
    let got: Vec<Vec<String>> = run_cat("cat_cols_headers", "columns",
                                        rows1, rows2, pad);
    assert_eq!(got, expected);
}


================================================
FILE: tests/test_count.rs
================================================
use {CsvData, qcheck};
use workdir::Workdir;

/// This tests whether `xsv count` gets the right answer.
///
/// It does some simple case analysis to handle whether we want to test counts
/// in the presence of headers and/or indexes.
fn prop_count_len(name: &str, rows: CsvData,
                  headers: bool, idx: bool) -> bool {
    let mut expected_count = rows.len();
    if headers && expected_count > 0 {
        expected_count -= 1;
    }

    let wrk = Workdir::new(name);
    if idx {
        wrk.create_indexed("in.csv", rows);
    } else {
        wrk.create("in.csv", rows);
    }

    let mut cmd = wrk.command("count");
    if !headers {
        cmd.arg("--no-headers");
    }
    cmd.arg("in.csv");

    let got_count: usize = wrk.stdout(&mut cmd);
    rassert_eq!(got_count, expected_count)
}

#[test]
fn prop_count() {
    fn p(rows: CsvData) -> bool {
        prop_count_len("prop_count", rows, false, false)
    }
    qcheck(p as fn(CsvData) -> bool);
}

#[test]
fn prop_count_headers() {
    fn p(rows: CsvData) -> bool {
        prop_count_len("prop_count_headers", rows, true, false)
    }
    qcheck(p as fn(CsvData) -> bool);
}

#[test]
fn prop_count_indexed() {
    fn p(rows: CsvData) -> bool {
        prop_count_len("prop_count_indexed", rows, false, true)
    }
    qcheck(p as fn(CsvData) -> bool);
}

#[test]
fn prop_count_indexed_headers() {
    fn p(rows: CsvData) -> bool {
        prop_count_len("prop_count_indexed_headers", rows, true, true)
    }
    qcheck(p as fn(CsvData) -> bool);
}


================================================
FILE: tests/test_fixlengths.rs
================================================
use quickcheck::TestResult;

use {CsvRecord, qcheck};
use workdir::Workdir;

fn trim_trailing_empty(it : &CsvRecord) -> Vec<String> {
    let mut cloned = it.clone().unwrap();
    while cloned.len() > 1 && cloned.last().unwrap().is_empty() {
        cloned.pop();
    }
    cloned
}

#[test]
fn prop_fixlengths_all_maxlen() {
    fn p(rows: Vec<CsvRecord>) -> TestResult {
        let expected_len =
            match rows.iter().map(|r| trim_trailing_empty(r).len()).max() {
                None => return TestResult::discard(),
                Some(n) => n,
            };

        let wrk = Workdir::new("fixlengths_all_maxlen").flexible(true);
        wrk.create("in.csv", rows);

        let mut cmd = wrk.command("fixlengths");
        cmd.arg("in.csv");

        let got: Vec<CsvRecord> = wrk.read_stdout(&mut cmd);
        let got_len = got.iter().map(|r| r.len()).max().unwrap();
        for r in got.iter() { assert_eq!(r.len(), got_len) }
        TestResult::from_bool(rassert_eq!(got_len, expected_len))
    }
    qcheck(p as fn(Vec<CsvRecord>) -> TestResult);
}

#[test]
fn fixlengths_all_maxlen_trims() {
    let rows = vec![
        svec!["h1", "h2"],
        svec!["abcdef", "ghijkl", "", ""],
        svec!["mnopqr", "stuvwx", "", ""],
    ];

    let wrk = Workdir::new("fixlengths_all_maxlen_trims").flexible(true);
    wrk.create("in.csv", rows);

    let mut cmd = wrk.command("fixlengths");
    cmd.arg("in.csv");

    let got: Vec<CsvRecord> = wrk.read_stdout(&mut cmd);
    for r in got.iter() { assert_eq!(r.len(), 2) }
}

#[test]
fn fixlengths_all_maxlen_trims_at_least_1() {
    let rows = vec![
        svec![""],
        svec!["", ""],
        svec!["", "", ""],
    ];

    let wrk = Workdir::new("fixlengths_all_maxlen_trims_at_least_1").flexible(true);
    wrk.create("in.csv", rows);

    let mut cmd = wrk.command("fixlengths");
    cmd.arg("in.csv");

    let got: Vec<CsvRecord> = wrk.read_stdout(&mut cmd);
    for r in got.iter() { assert_eq!(r.len(), 1) }
}


#[test]
fn prop_fixlengths_explicit_len() {
    fn p(rows: Vec<CsvRecord>, expected_len: usize) -> TestResult {
        if expected_len == 0 || rows.is_empty() {
            return TestResult::discard();
        }

        let wrk = Workdir::new("fixlengths_explicit_len").flexible(true);
        wrk.create("in.csv", rows);

        let mut cmd = wrk.command("fixlengths");
        cmd.arg("in.csv").args(&["-l", &*expected_len.to_string()]);

        let got: Vec<CsvRecord> = wrk.read_stdout(&mut cmd);
        let got_len = got.iter().map(|r| r.len()).max().unwrap();
        for r in got.iter() { assert_eq!(r.len(), got_len) }
        TestResult::from_bool(rassert_eq!(got_len, expected_len))
    }
    qcheck(p as fn(Vec<CsvRecord>, usize) -> TestResult);
}


================================================
FILE: tests/test_flatten.rs
================================================
use std::process;

use workdir::Workdir;

fn setup(name: &str) -> (Workdir, process::Command) {
    let rows = vec![
        svec!["h1", "h2"],
        svec!["abcdef", "ghijkl"],
        svec!["mnopqr", "stuvwx"],
    ];

    let wrk = Workdir::new(name);
    wrk.create("in.csv", rows);

    let mut cmd = wrk.command("flatten");
    cmd.arg("in.csv");

    (wrk, cmd)
}

#[test]
fn flatten_basic() {
    let (wrk, mut cmd) = setup("flatten_basic");
    let got: String = wrk.stdout(&mut cmd);
    let expected = "\
h1  abcdef
h2  ghijkl
#
h1  mnopqr
h2  stuvwx\
";
    assert_eq!(got, expected.to_string());
}

#[test]
fn flatten_no_headers() {
    let (wrk, mut cmd) = setup("flatten_no_headers");
    cmd.arg("--no-headers");

    let got: String = wrk.stdout(&mut cmd);
    let expected = "\
0   h1
1   h2
#
0   abcdef
1   ghijkl
#
0   mnopqr
1   stuvwx\
";
    assert_eq!(got, expected.to_string());
}

#[test]
fn flatten_separator() {
    let (wrk, mut cmd) = setup("flatten_separator");
    cmd.args(&["--separator", "!mysep!"]);

    let got: String = wrk.stdout(&mut cmd);
    let expected = "\
h1  abcdef
h2  ghijkl
!mysep!
h1  mnopqr
h2  stuvwx\
";
    assert_eq!(got, expected.to_string());
}

#[test]
fn flatten_condense() {
    let (wrk, mut cmd) = setup("flatten_condense");
    cmd.args(&["--condense", "2"]);

    let got: String = wrk.stdout(&mut cmd);
    let expected = "\
h1  ab...
h2  gh...
#
h1  mn...
h2  st...\
";
    assert_eq!(got, expected.to_string());
}


================================================
FILE: tests/test_fmt.rs
================================================
use std::process;

use workdir::Workdir;

fn setup(name: &str) -> (Workdir, process::Command) {
    let rows = vec![
        svec!["h1", "h2"],
        svec!["abcdef", "ghijkl"],
        svec!["mnopqr", "stuvwx"],
    ];

    let wrk = Workdir::new(name);
    wrk.create("in.csv", rows);

    let mut cmd = wrk.command("fmt");
    cmd.arg("in.csv");

    (wrk, cmd)
}

#[test]
fn fmt_delimiter() {
    let (wrk, mut cmd) = setup("fmt_delimiter");
    cmd.args(&["--out-delimiter", "\t"]);

    let got: String = wrk.stdout(&mut cmd);
    let expected = "\
h1\th2
abcdef\tghijkl
mnopqr\tstuvwx";
    assert_eq!(got, expected.to_string());
}

#[test]
fn fmt_weird_delimiter() {
    let (wrk, mut cmd) = setup("fmt_weird_delimiter");
    cmd.args(&["--out-delimiter", "h"]);

    let got: String = wrk.stdout(&mut cmd);
    let expected = "\
\"h1\"h\"h2\"
abcdefh\"ghijkl\"
mnopqrhstuvwx";
    assert_eq!(got, expected.to_string());
}

#[test]
fn fmt_crlf() {
    let (wrk, mut cmd) = setup("fmt_crlf");
    cmd.arg("--crlf");

    let got: String = wrk.stdout(&mut cmd);
    let expected = "\
h1,h2\r
abcdef,ghijkl\r
mnopqr,stuvwx";
    assert_eq!(got, expected.to_string());
}

#[test]
fn fmt_quote_always() {
    let (wrk, mut cmd) = setup("fmt_quote_always");
    cmd.arg("--quote-always");

    let got: String = wrk.stdout(&mut cmd);
    let expected = "\
\"h1\",\"h2\"
\"abcdef\",\"ghijkl\"
\"mnopqr\",\"stuvwx\"";
    assert_eq!(got, expected.to_string());
}


================================================
FILE: tests/test_frequency.rs
================================================
use std::borrow::ToOwned;
use std::collections::hash_map::{HashMap, Entry};
use std::process;

use csv;
use stats::Frequencies;

use {Csv, CsvData, qcheck_sized};
use workdir::Workdir;

fn setup(name: &str) -> (Workdir, process::Command) {
    let rows = vec![
        svec!["h1", "h2"],
        svec!["a", "z"],
        svec!["a", "y"],
        svec!["a", "y"],
        svec!["b", "z"],
        svec!["", "z"],
        svec!["(NULL)", "x"],
    ];

    let wrk = Workdir::new(name);
    wrk.create("in.csv", rows);

    let mut cmd = wrk.command("frequency");
    cmd.arg("in.csv");

    (wrk, cmd)
}

#[test]
fn frequency_no_headers() {
    let (wrk, mut cmd) = setup("frequency_no_headers");
    cmd.args(&["--limit", "0"]).args(&["--select", "1"]).arg("--no-headers");

    let mut got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    got = got.into_iter().skip(1).collect();
    got.sort();
    let expected = vec![
        svec!["1", "(NULL)", "1"],
        svec!["1", "(NULL)", "1"],
        svec!["1", "a", "3"],
        svec!["1", "b", "1"],
        svec!["1", "h1", "1"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn frequency_no_nulls() {
    let (wrk, mut cmd) = setup("frequency_no_nulls");
    cmd.arg("--no-nulls").args(&["--limit", "0"]).args(&["--select", "h1"]);

    let mut got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    got.sort();
    let expected = vec![
        svec!["field", "value", "count"],
        svec!["h1", "(NULL)", "1"],
        svec!["h1", "a", "3"],
        svec!["h1", "b", "1"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn frequency_nulls() {
    let (wrk, mut cmd) = setup("frequency_nulls");
    cmd.args(&["--limit", "0"]).args(&["--select", "h1"]);

    let mut got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    got.sort();
    let expected = vec![
        svec!["field", "value", "count"],
        svec!["h1", "(NULL)", "1"],
        svec!["h1", "(NULL)", "1"],
        svec!["h1", "a", "3"],
        svec!["h1", "b", "1"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn frequency_limit() {
    let (wrk, mut cmd) = setup("frequency_limit");
    cmd.args(&["--limit", "1"]);

    let mut got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    got.sort();
    let expected = vec![
        svec!["field", "value", "count"],
        svec!["h1", "a", "3"],
        svec!["h2", "z", "3"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn frequency_asc() {
    let (wrk, mut cmd) = setup("frequency_asc");
    cmd.args(&["--limit", "1"]).args(&["--select", "h2"]).arg("--asc");

    let mut got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    got.sort();
    let expected = vec![
        svec!["field", "value", "count"],
        svec!["h2", "x", "1"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn frequency_select() {
    let (wrk, mut cmd) = setup("frequency_select");
    cmd.args(&["--limit", "0"]).args(&["--select", "h2"]);

    let mut got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    got.sort();
    let expected = vec![
        svec!["field", "value", "count"],
        svec!["h2", "x", "1"],
        svec!["h2", "y", "2"],
        svec!["h2", "z", "3"],
    ];
    assert_eq!(got, expected);
}

// This tests that a frequency table computed by `xsv` is always the same
// as the frequency table computed in memory.
#[test]
fn prop_frequency() {
    fn p(rows: CsvData) -> bool {
        param_prop_frequency("prop_frequency", rows, false)
    }
    // Run on really small values because we are incredibly careless
    // with allocation.
    qcheck_sized(p as fn(CsvData) -> bool, 2);
}


// This tests that running the frequency command on a CSV file with these two
// rows does not burst in flames:
//
//     \u{FEFF}
//     ""
//
// In this case, the `param_prop_frequency` just ignores this particular test.
// Namely, \u{FEFF} is the UTF-8 BOM, which is ignored by the underlying CSV
// reader.
#[test]
fn frequency_bom() {
    let rows = CsvData {
        data: vec![
            ::CsvRecord(vec!["\u{FEFF}".to_string()]),
            ::CsvRecord(vec!["".to_string()]),
        ],
    };
    assert!(param_prop_frequency("prop_frequency", rows, false))
}

// This tests that a frequency table computed by `xsv` (with an index) is
// always the same as the frequency table computed in memory.
#[test]
fn prop_frequency_indexed() {
    fn p(rows: CsvData) -> bool {
        param_prop_frequency("prop_frequency_indxed", rows, true)
    }
    // Run on really small values because we are incredibly careless
    // with allocation.
    qcheck_sized(p as fn(CsvData) -> bool, 2);
}

fn param_prop_frequency(name: &str, rows: CsvData, idx: bool) -> bool {
    if !rows.is_empty() && rows[0][0].len() == 3 && rows[0][0] == "\u{FEFF}" {
        return true;
    }
    let wrk = Workdir::new(name);
    if idx {
        wrk.create_indexed("in.csv", rows.clone());
    } else {
        wrk.create("in.csv", rows.clone());
    }

    let mut cmd = wrk.command("frequency");
    cmd.arg("in.csv").args(&["-j", "4"]).args(&["--limit", "0"]);

    let stdout = wrk.stdout::<String>(&mut cmd);
    let got_ftables = ftables_from_csv_string(stdout);
    let expected_ftables = ftables_from_rows(rows);
    assert_eq_ftables(&got_ftables, &expected_ftables)
}

type FTables = HashMap<String, Frequencies<String>>;

#[derive(Deserialize)]
struct FRow {
    field: String,
    value: String,
    count: usize,
}

fn ftables_from_rows<T: Csv>(rows: T) -> FTables {
    let mut rows = rows.to_vecs();
    if rows.len() <= 1 {
        return HashMap::new();
    }

    let header = rows.remove(0);
    let mut ftables = HashMap::new();
    for field in header.iter() {
        ftables.insert(field.clone(), Frequencies::new());
    }
    for row in rows.into_iter() {
        for (i, mut field) in row.into_iter().enumerate() {
            field = field.trim().to_owned();
            if field.is_empty() {
                field = "(NULL)".to_owned();
            }
            ftables.get_mut(&header[i]).unwrap().add(field);
        }
    }
    ftables
}

fn ftables_from_csv_string(data: String) -> FTables {
    let mut rdr = csv::Reader::from_reader(data.as_bytes());
    let mut ftables = HashMap::new();
    for frow in rdr.deserialize() {
        let frow: FRow = frow.unwrap();
        match ftables.entry(frow.field) {
            Entry::Vacant(v) => {
                let mut ftable = Frequencies::new();
                for _ in 0..frow.count {
                    ftable.add(frow.value.clone());
                }
                v.insert(ftable);
            }
            Entry::Occupied(mut v) => {
                for _ in 0..frow.count {
                    v.get_mut().add(frow.value.clone());
                }
            }
        }
    }
    ftables
}

fn freq_data<T>(ftable: &Frequencies<T>) -> Vec<(&T, u64)>
        where T: ::std::hash::Hash + Ord + Clone {
    let mut freqs = ftable.most_frequent();
    freqs.sort();
    freqs
}

fn assert_eq_ftables(got: &FTables, expected: &FTables) -> bool {
    for (k, v) in got.iter() {
        assert_eq!(freq_data(v), freq_data(expected.get(k).unwrap()));
    }
    for (k, v) in expected.iter() {
        assert_eq!(freq_data(got.get(k).unwrap()), freq_data(v));
    }
    true
}


================================================
FILE: tests/test_headers.rs
================================================
use std::process;

use workdir::Workdir;

fn setup(name: &str) -> (Workdir, process::Command) {
    let rows1 = vec![svec!["h1", "h2"], svec!["a", "b"]];
    let rows2 = vec![svec!["h2", "h3"], svec!["y", "z"]];

    let wrk = Workdir::new(name);
    wrk.create("in1.csv", rows1);
    wrk.create("in2.csv", rows2);

    let mut cmd = wrk.command("headers");
    cmd.arg("in1.csv");

    (wrk, cmd)
}

#[test]
fn headers_basic() {
    let (wrk, mut cmd) = setup("headers_basic");

    let got: String = wrk.stdout(&mut cmd);
    let expected = "\
1   h1
2   h2";
    assert_eq!(got, expected.to_string());
}

#[test]
fn headers_just_names() {
    let (wrk, mut cmd) = setup("headers_just_names");
    cmd.arg("--just-names");

    let got: String = wrk.stdout(&mut cmd);
    let expected = "\
h1
h2";
    assert_eq!(got, expected.to_string());
}

#[test]
fn headers_multiple() {
    let (wrk, mut cmd) = setup("headers_multiple");
    cmd.arg("in2.csv");

    let got: String = wrk.stdout(&mut cmd);
    let expected = "\
h1
h2
h2
h3";
    assert_eq!(got, expected.to_string());
}

#[test]
fn headers_intersect() {
    let (wrk, mut cmd) = setup("headers_intersect");
    cmd.arg("in2.csv").arg("--intersect");

    let got: String = wrk.stdout(&mut cmd);
    let expected = "\
h1
h2
h3";
    assert_eq!(got, expected.to_string());
}


================================================
FILE: tests/test_index.rs
================================================
use std::fs;

use filetime::{FileTime, set_file_times};

use workdir::Workdir;

#[test]
fn index_outdated() {
    let wrk = Workdir::new("index_outdated");
    wrk.create_indexed("in.csv", vec![svec![""]]);

    let md = fs::metadata(&wrk.path("in.csv.idx")).unwrap();
    set_file_times(
        &wrk.path("in.csv"),
        future_time(FileTime::from_last_modification_time(&md)),
        future_time(FileTime::from_last_access_time(&md)),
    ).unwrap();

    let mut cmd = wrk.command("count");
    cmd.arg("--no-headers").arg("in.csv");
    wrk.assert_err(&mut cmd);
}

fn future_time(ft: FileTime) -> FileTime {
    let secs = ft.seconds_relative_to_1970();
    FileTime::from_seconds_since_1970(secs + 10_000, 0)
}


================================================
FILE: tests/test_join.rs
================================================
use workdir::Workdir;

// This macro takes *two* identifiers: one for the test with headers
// and another for the test without headers.
macro_rules! join_test {
    ($name:ident, $fun:expr) => (
        mod $name {
            use std::process;

            use workdir::Workdir;
            use super::{make_rows, setup};

            #[test]
            fn headers() {
                let wrk = setup(stringify!($name), true);
                let mut cmd = wrk.command("join");
                cmd.args(&["city", "cities.csv", "city", "places.csv"]);
                $fun(wrk, cmd, true);
            }

            #[test]
            fn no_headers() {
                let n = stringify!(concat_idents!($name, _no_headers));
                let wrk = setup(n, false);
                let mut cmd = wrk.command("join");
                cmd.arg("--no-headers");
                cmd.args(&["1", "cities.csv", "1", "places.csv"]);
                $fun(wrk, cmd, false);
            }
        }
    );
}

fn setup(name: &str, headers: bool) -> Workdir {
    let mut cities = vec![
        svec!["Boston", "MA"],
        svec!["New York", "NY"],
        svec!["San Francisco", "CA"],
        svec!["Buffalo", "NY"],
    ];
    let mut places = vec![
        svec!["Boston", "Logan Airport"],
        svec!["Boston", "Boston Garden"],
        svec!["Buffalo", "Ralph Wilson Stadium"],
        svec!["Orlando", "Disney World"],
    ];
    if headers { cities.insert(0, svec!["city", "state"]); }
    if headers { places.insert(0, svec!["city", "place"]); }

    let wrk = Workdir::new(name);
    wrk.create("cities.csv", cities);
    wrk.create("places.csv", places);
    wrk
}

fn make_rows(headers: bool, rows: Vec<Vec<String>>) -> Vec<Vec<String>> {
    let mut all_rows = vec![];
    if headers {
        all_rows.push(svec!["city", "state", "city", "place"]);
    }
    all_rows.extend(rows.into_iter());
    all_rows
}

join_test!(join_inner,
           |wrk: Workdir, mut cmd: process::Command, headers: bool| {
    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = make_rows(headers, vec![
        svec!["Boston", "MA", "Boston", "Logan Airport"],
        svec!["Boston", "MA", "Boston", "Boston Garden"],
        svec!["Buffalo", "NY", "Buffalo", "Ralph Wilson Stadium"],
    ]);
    assert_eq!(got, expected);
});

join_test!(join_outer_left,
           |wrk: Workdir, mut cmd: process::Command, headers: bool| {
    cmd.arg("--left");
    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = make_rows(headers, vec![
        svec!["Boston", "MA", "Boston", "Logan Airport"],
        svec!["Boston", "MA", "Boston", "Boston Garden"],
        svec!["New York", "NY", "", ""],
        svec!["San Francisco", "CA", "", ""],
        svec!["Buffalo", "NY", "Buffalo", "Ralph Wilson Stadium"],
    ]);
    assert_eq!(got, expected);
});

join_test!(join_outer_right,
           |wrk: Workdir, mut cmd: process::Command, headers: bool| {
    cmd.arg("--right");
    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = make_rows(headers, vec![
        svec!["Boston", "MA", "Boston", "Logan Airport"],
        svec!["Boston", "MA", "Boston", "Boston Garden"],
        svec!["Buffalo", "NY", "Buffalo", "Ralph Wilson Stadium"],
        svec!["", "", "Orlando", "Disney World"],
    ]);
    assert_eq!(got, expected);
});

join_test!(join_outer_full,
           |wrk: Workdir, mut cmd: process::Command, headers: bool| {
    cmd.arg("--full");
    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = make_rows(headers, vec![
        svec!["Boston", "MA", "Boston", "Logan Airport"],
        svec!["Boston", "MA", "Boston", "Boston Garden"],
        svec!["New York", "NY", "", ""],
        svec!["San Francisco", "CA", "", ""],
        svec!["Buffalo", "NY", "Buffalo", "Ralph Wilson Stadium"],
        svec!["", "", "Orlando", "Disney World"],
    ]);
    assert_eq!(got, expected);
});

#[test]
fn join_inner_issue11() {
    let a = vec![
        svec!["1", "2"],
        svec!["3", "4"],
        svec!["5", "6"],
    ];
    let b = vec![
        svec!["2", "1"],
        svec!["4", "3"],
        svec!["6", "5"],
    ];

    let wrk = Workdir::new("join_inner_issue11");
    wrk.create("a.csv", a);
    wrk.create("b.csv", b);

    let mut cmd = wrk.command("join");
    cmd.args(&["1,2", "a.csv", "2,1", "b.csv"]);

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![
        svec!["1", "2", "2", "1"],
        svec!["3", "4", "4", "3"],
        svec!["5", "6", "6", "5"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn join_cross() {
    let wrk = Workdir::new("join_cross");
    wrk.create("letters.csv",
               vec![svec!["h1", "h2"], svec!["a", "b"], svec!["c", "d"]]);
    wrk.create("numbers.csv",
               vec![svec!["h3", "h4"], svec!["1", "2"], svec!["3", "4"]]);

    let mut cmd = wrk.command("join");
    cmd.arg("--cross")
       .args(&["", "letters.csv", "", "numbers.csv"]);
    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![
        svec!["h1", "h2", "h3", "h4"],
        svec!["a", "b", "1", "2"],
        svec!["a", "b", "3", "4"],
        svec!["c", "d", "1", "2"],
        svec!["c", "d", "3", "4"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn join_cross_no_headers() {
    let wrk = Workdir::new("join_cross_no_headers");
    wrk.create("letters.csv", vec![svec!["a", "b"], svec!["c", "d"]]);
    wrk.create("numbers.csv", vec![svec!["1", "2"], svec!["3", "4"]]);

    let mut cmd = wrk.command("join");
    cmd.arg("--cross").arg("--no-headers")
       .args(&["", "letters.csv", "", "numbers.csv"]);
    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![
        svec!["a", "b", "1", "2"],
        svec!["a", "b", "3", "4"],
        svec!["c", "d", "1", "2"],
        svec!["c", "d", "3", "4"],
    ];
    assert_eq!(got, expected);
}


================================================
FILE: tests/test_partition.rs
================================================
use std::borrow::ToOwned;

use workdir::Workdir;

macro_rules! part_eq {
    ($wrk:expr, $path:expr, $expected:expr) => (
        assert_eq!($wrk.from_str::<String>(&$wrk.path($path)),
                   $expected.to_owned());
    );
}

fn data(headers: bool) -> Vec<Vec<String>> {
    let mut rows = vec![
        svec!["NY", "Manhatten"],
        svec!["CA", "San Francisco"],
        svec!["TX", "Dallas"],
        svec!["NY", "Buffalo"],
        svec!["TX", "Fort Worth"],
    ];
    if headers { rows.insert(0, svec!["state", "city"]); }
    rows
}

#[test]
fn partition() {
    let wrk = Workdir::new("partition");
    wrk.create("in.csv", data(true));

    let mut cmd = wrk.command("partition");
    cmd.arg("state").arg(&wrk.path(".")).arg("in.csv");
    wrk.run(&mut cmd);

    part_eq!(wrk, "CA.csv", "\
state,city
CA,San Francisco
");
    part_eq!(wrk, "NY.csv", "\
state,city
NY,Manhatten
NY,Buffalo
");
    part_eq!(wrk, "TX.csv", "\
state,city
TX,Dallas
TX,Fort Worth
");
}

#[test]
fn partition_drop() {
    let wrk = Workdir::new("partition");
    wrk.create("in.csv", data(true));

    let mut cmd = wrk.command("partition");
    cmd.arg("--drop").arg("state").arg(&wrk.path(".")).arg("in.csv");
    wrk.run(&mut cmd);

    part_eq!(wrk, "CA.csv", "\
city
San Francisco
");
    part_eq!(wrk, "NY.csv", "\
city
Manhatten
Buffalo
");
    part_eq!(wrk, "TX.csv", "\
city
Dallas
Fort Worth
");
}

#[test]
fn partition_without_headers() {
    let wrk = Workdir::new("partition_without_headers");
    wrk.create("in.csv", data(false));

    let mut cmd = wrk.command("partition");
    cmd.arg("--no-headers").arg("1").arg(&wrk.path(".")).arg("in.csv");
    wrk.run(&mut cmd);

    part_eq!(wrk, "CA.csv", "\
CA,San Francisco
");
    part_eq!(wrk, "NY.csv", "\
NY,Manhatten
NY,Buffalo
");
    part_eq!(wrk, "TX.csv", "\
TX,Dallas
TX,Fort Worth
");
}

#[test]
fn partition_drop_without_headers() {
    let wrk = Workdir::new("partition_without_headers");
    wrk.create("in.csv", data(false));

    let mut cmd = wrk.command("partition");
    cmd.arg("--drop").arg("--no-headers").arg("1").arg(&wrk.path(".")).arg("in.csv");
    wrk.run(&mut cmd);

    part_eq!(wrk, "CA.csv", "\
San Francisco
");
    part_eq!(wrk, "NY.csv", "\
Manhatten
Buffalo
");
    part_eq!(wrk, "TX.csv", "\
Dallas
Fort Worth
");
}

#[test]
fn partition_into_new_directory() {
    let wrk = Workdir::new("partition_into_new_directory");
    wrk.create("in.csv", data(true));

    let mut cmd = wrk.command("partition");
    cmd.arg("state").arg(&wrk.path("out")).arg("in.csv");
    wrk.run(&mut cmd);

    assert!(wrk.path("out/NY.csv").exists());
}

#[test]
fn partition_custom_filename() {
    let wrk = Workdir::new("partition_custom_filename");
    wrk.create("in.csv", data(true));

    let mut cmd = wrk.command("partition");
    cmd.args(&["--filename", "state-{}-partition.csv"])
        .arg("state")
        .arg(&wrk.path("."))
        .arg("in.csv");
    wrk.run(&mut cmd);

    assert!(wrk.path("state-NY-partition.csv").exists());
}

#[test]
fn partition_custom_filename_with_directory() {
    let wrk = Workdir::new("partition_custom_filename_with_directory");
    wrk.create("in.csv", data(true));

    let mut cmd = wrk.command("partition");
    cmd.args(&["--filename", "{}/cities.csv"])
        .arg("state")
        .arg(&wrk.path("."))
        .arg("in.csv");
    wrk.run(&mut cmd);

    // This variation also helps with parallel partition jobs.
    assert!(wrk.path("NY/cities.csv").exists());
}

#[test]
fn partition_invalid_filename() {
    let wrk = Workdir::new("partition_invalid_filename");
    wrk.create("in.csv", data(true));

    let mut cmd = wrk.command("partition");
    cmd.args(&["--filename", "foo.csv"])
        .arg("state")
        .arg(&wrk.path("."))
        .arg("in.csv");
    wrk.assert_err(&mut cmd);

    let mut cmd = wrk.command("partition");
    cmd.args(&["--filename", "{}{}.csv"])
        .arg("state")
        .arg(&wrk.path("."))
        .arg("in.csv");
    wrk.assert_err(&mut cmd);
}

fn tricky_data() -> Vec<Vec<String>> {
    vec![
        svec!["key", "explanation"],
        svec!["", "empty key"],
        svec!["empty", "the string empty"],
        svec!["unsafe _1$!,\"", "unsafe in shell"],
        svec!["collision", "ordinary value"],
        svec!["collision", "in same file"],
        svec!["coll ision", "collides"],
        svec!["collision!", "collides again"],
        svec!["collision_2", "collides with disambiguated"],
    ]
}

#[test]
fn partition_with_tricky_key_values() {
    let wrk = Workdir::new("partition_with_tricky_key_values");
    wrk.create("in.csv", tricky_data());

    let mut cmd = wrk.command("partition");
    cmd.arg("key").arg(&wrk.path(".")).arg("in.csv");
    wrk.run(&mut cmd);

    part_eq!(wrk, "empty.csv", "\
key,explanation
,empty key
");
     part_eq!(wrk, "empty_1.csv", "\
key,explanation
empty,the string empty
");
     part_eq!(wrk, "unsafe_1.csv", r#"key,explanation
"unsafe _1$!,""",unsafe in shell
"#);
     part_eq!(wrk, "collision.csv", "\
key,explanation
collision,ordinary value
collision,in same file
");
    part_eq!(wrk, "collision_2.csv", "\
key,explanation
coll ision,collides
");
    part_eq!(wrk, "collision_3.csv", "\
key,explanation
collision!,collides again
");
    // Tricky! We didn't see this an input, but we did generate it as an
    // output already.
    part_eq!(wrk, "collision_2_4.csv", "\
key,explanation
collision_2,collides with disambiguated
");
}

fn prefix_data() -> Vec<Vec<String>> {
    vec![
        svec!["state", "city"],
        svec!["MA", "Boston"],
        svec!["ME", "Portland"],
        svec!["M", "Too short"],
        svec!["CA", "San Francisco"],
        svec!["CO", "Denver"],
    ]
}

#[test]
fn partition_with_prefix_length() {
    let wrk = Workdir::new("partition_with_prefix_length");
    wrk.create("in.csv", prefix_data());

    let mut cmd = wrk.command("partition");
    cmd
        .args(&["--prefix-length", "1"])
        .arg("state")
        .arg(&wrk.path("."))
        .arg("in.csv");
    wrk.run(&mut cmd);

    part_eq!(wrk, "M.csv", "\
state,city
MA,Boston
ME,Portland
M,Too short
");
    part_eq!(wrk, "C.csv", "\
state,city
CA,San Francisco
CO,Denver
");
}


================================================
FILE: tests/test_reverse.rs
================================================
use workdir::Workdir;

use {Csv, CsvData, qcheck};

fn prop_reverse(name: &str, rows: CsvData, headers: bool) -> bool {
    let wrk = Workdir::new(name);
    wrk.create("in.csv", rows.clone());

    let mut cmd = wrk.command("reverse");
    cmd.arg("in.csv");
    if !headers { cmd.arg("--no-headers"); }

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let mut expected = rows.to_vecs();
    let headers = if headers && !expected.is_empty() {
        expected.remove(0)
    } else {
        vec![]
    };
    expected.reverse();
    if !headers.is_empty() { expected.insert(0, headers); }
    rassert_eq!(got, expected)
}

#[test]
fn prop_reverse_headers() {
    fn p(rows: CsvData) -> bool {
        prop_reverse("prop_reverse_headers", rows, true)
    }
    qcheck(p as fn(CsvData) -> bool);
}

#[test]
fn prop_reverse_no_headers() {
    fn p(rows: CsvData) -> bool {
        prop_reverse("prop_reverse_no_headers", rows, false)
    }
    qcheck(p as fn(CsvData) -> bool);
}


================================================
FILE: tests/test_search.rs
================================================
use workdir::Workdir;

fn data(headers: bool) -> Vec<Vec<String>> {
    let mut rows = vec![
        svec!["foobar", "barfoo"],
        svec!["a", "b"],
        svec!["barfoo", "foobar"],
    ];
    if headers { rows.insert(0, svec!["h1", "h2"]); }
    rows
}

#[test]
fn search() {
    let wrk = Workdir::new("search");
    wrk.create("data.csv", data(true));
    let mut cmd = wrk.command("search");
    cmd.arg("^foo").arg("data.csv");

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![
        svec!["h1", "h2"],
        svec!["foobar", "barfoo"],
        svec!["barfoo", "foobar"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn search_empty() {
    let wrk = Workdir::new("search");
    wrk.create("data.csv", data(true));
    let mut cmd = wrk.command("search");
    cmd.arg("xxx").arg("data.csv");

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![
        svec!["h1", "h2"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn search_empty_no_headers() {
    let wrk = Workdir::new("search");
    wrk.create("data.csv", data(true));
    let mut cmd = wrk.command("search");
    cmd.arg("xxx").arg("data.csv");
    cmd.arg("--no-headers");

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected: Vec<Vec<String>> = vec![];
    assert_eq!(got, expected);
}

#[test]
fn search_ignore_case() {
    let wrk = Workdir::new("search");
    wrk.create("data.csv", data(true));
    let mut cmd = wrk.command("search");
    cmd.arg("^FoO").arg("data.csv");
    cmd.arg("--ignore-case");

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![
        svec!["h1", "h2"],
        svec!["foobar", "barfoo"],
        svec!["barfoo", "foobar"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn search_no_headers() {
    let wrk = Workdir::new("search_no_headers");
    wrk.create("data.csv", data(false));
    let mut cmd = wrk.command("search");
    cmd.arg("^foo").arg("data.csv");
    cmd.arg("--no-headers");

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![
        svec!["foobar", "barfoo"],
        svec!["barfoo", "foobar"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn search_select() {
    let wrk = Workdir::new("search_select");
    wrk.create("data.csv", data(true));
    let mut cmd = wrk.command("search");
    cmd.arg("^foo").arg("data.csv");
    cmd.arg("--select").arg("h2");

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![
        svec!["h1", "h2"],
        svec!["barfoo", "foobar"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn search_select_no_headers() {
    let wrk = Workdir::new("search_select_no_headers");
    wrk.create("data.csv", data(false));
    let mut cmd = wrk.command("search");
    cmd.arg("^foo").arg("data.csv");
    cmd.arg("--select").arg("2");
    cmd.arg("--no-headers");

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![
        svec!["barfoo", "foobar"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn search_invert_match() {
    let wrk = Workdir::new("search_invert_match");
    wrk.create("data.csv", data(false));
    let mut cmd = wrk.command("search");
    cmd.arg("^foo").arg("data.csv");
    cmd.arg("--invert-match");

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![
        svec!["foobar", "barfoo"],
        svec!["a", "b"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn search_invert_match_no_headers() {
    let wrk = Workdir::new("search_invert_match");
    wrk.create("data.csv", data(false));
    let mut cmd = wrk.command("search");
    cmd.arg("^foo").arg("data.csv");
    cmd.arg("--invert-match");
    cmd.arg("--no-headers");

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![
        svec!["a", "b"],
    ];
    assert_eq!(got, expected);
}


================================================
FILE: tests/test_select.rs
================================================
use workdir::Workdir;

macro_rules! select_test {
    ($name:ident, $select:expr, $select_no_headers:expr,
     $expected_headers:expr, $expected_rows:expr) => (
        mod $name {
            use workdir::Workdir;
            use super::data;

            #[test]
            fn headers() {
                let wrk = Workdir::new(stringify!($name));
                wrk.create("data.csv", data(true));
                let mut cmd = wrk.command("select");
                cmd.arg("--").arg($select).arg("data.csv");
                let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);

                let expected = vec![
                    $expected_headers.iter()
                                     .map(|s| s.to_string())
                                     .collect::<Vec<String>>(),
                    $expected_rows.iter()
                                  .map(|s| s.to_string())
                                  .collect::<Vec<String>>(),
                ];
                assert_eq!(got, expected);
            }

            #[test]
            fn no_headers() {
                let wrk = Workdir::new(stringify!($name));
                wrk.create("data.csv", data(false));
                let mut cmd = wrk.command("select");
                cmd.arg("--no-headers")
                   .arg("--").arg($select_no_headers).arg("data.csv");
                let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);

                let expected = vec![
                    $expected_rows.iter()
                                  .map(|s| s.to_string())
                                  .collect::<Vec<String>>(),
                ];
                assert_eq!(got, expected);
            }
        }
    );
}

macro_rules! select_test_err {
    ($name:ident, $select:expr) => (
        #[test]
        fn $name() {
            let wrk = Workdir::new(stringify!($name));
            wrk.create("data.csv", data(true));
            let mut cmd = wrk.command("select");
            cmd.arg($select).arg("data.csv");
            wrk.assert_err(&mut cmd);
        }
    );
}

fn header_row() -> Vec<String> { svec!["h1", "h2", "h[]3", "h4", "h1"] }

fn data(headers: bool) -> Vec<Vec<String>> {
    let mut rows = vec![
        svec!["a", "b", "c", "d", "e"],
    ];
    if headers { rows.insert(0, header_row()) }
    rows
}

select_test!(select_simple, "h1", "1", ["h1"], ["a"]);
select_test!(select_simple_idx, "h1[0]", "1", ["h1"], ["a"]);
select_test!(select_simple_idx_2, "h1[1]", "5", ["h1"], ["e"]);

select_test!(select_quoted, r#""h[]3""#, "3", ["h[]3"], ["c"]);
select_test!(select_quoted_idx, r#""h[]3"[0]"#, "3", ["h[]3"], ["c"]);

select_test!(select_range, "h1-h4", "1-4",
             ["h1", "h2", "h[]3", "h4"], ["a", "b", "c", "d"]);

select_test!(select_range_multi, r#"h1-h2,"h[]3"-h4"#, "1-2,3-4",
             ["h1", "h2", "h[]3", "h4"], ["a", "b", "c", "d"]);
select_test!(select_range_multi_idx, r#"h1-h2,"h[]3"[0]-h4"#, "1-2,3-4",
             ["h1", "h2", "h[]3", "h4"], ["a", "b", "c", "d"]);

select_test!(select_reverse, "h1[1]-h1[0]", "5-1",
             ["h1", "h4", "h[]3", "h2", "h1"], ["e", "d", "c", "b", "a"]);

select_test!(select_not, r#"!"h[]3"[0]"#, "!3",
             ["h1", "h2", "h4", "h1"], ["a", "b", "d", "e"]);
select_test!(select_not_range, "!h1[1]-h2", "!5-2", ["h1"], ["a"]);

select_test!(select_duplicate, "h1,h1", "1,1", ["h1", "h1"], ["a", "a"]);
select_test!(select_duplicate_range, "h1-h2,h1-h2", "1-2,1-2",
             ["h1", "h2", "h1", "h2"], ["a", "b", "a", "b"]);
select_test!(select_duplicate_range_reverse, "h1-h2,h2-h1", "1-2,2-1",
             ["h1", "h2", "h2", "h1"], ["a", "b", "b", "a"]);

select_test!(select_range_no_end, "h4-", "4-", ["h4", "h1"], ["d", "e"]);
select_test!(select_range_no_start, "-h2", "-2", ["h1", "h2"], ["a", "b"]);
select_test!(select_range_no_end_cat, "h4-,h1", "4-,1",
             ["h4", "h1", "h1"], ["d", "e", "a"]);
select_test!(select_range_no_start_cat, "-h2,h1[1]", "-2,5",
             ["h1", "h2", "h1"], ["a", "b", "e"]);

select_test_err!(select_err_unknown_header, "dne");
select_test_err!(select_err_oob_low, "0");
select_test_err!(select_err_oob_high, "6");
select_test_err!(select_err_idx_as_name, "1[0]");
select_test_err!(select_err_idx_oob_high, "h1[2]");
select_test_err!(select_err_idx_not_int, "h1[2.0]");
select_test_err!(select_err_idx_not_int_2, "h1[a]");
select_test_err!(select_err_unclosed_quote, r#""h1"#);
select_test_err!(select_err_unclosed_bracket, r#""h1"[1"#);
select_test_err!(select_err_expected_end_of_field, "a-b-");


================================================
FILE: tests/test_slice.rs
================================================
use std::borrow::ToOwned;
use std::process;

use workdir::Workdir;

macro_rules! slice_tests {
    ($name:ident, $start:expr, $end:expr, $expected:expr) => (
        mod $name {
            use super::test_slice;

            #[test]
            fn headers_no_index() {
                let name = concat!(stringify!($name), "headers_no_index");
                test_slice(name, $start, $end, $expected, true, false, false);
            }

            #[test]
            fn no_headers_no_index() {
                let name = concat!(stringify!($name), "no_headers_no_index");
                test_slice(name, $start, $end, $expected, false, false, false);
            }

            #[test]
            fn headers_index() {
                let name = concat!(stringify!($name), "headers_index");
                test_slice(name, $start, $end, $expected, true, true, false);
            }

            #[test]
            fn no_headers_index() {
                let name = concat!(stringify!($name), "no_headers_index");
                test_slice(name, $start, $end, $expected, false, true, false);
            }

            #[test]
            fn headers_no_index_len() {
                let name = concat!(stringify!($name), "headers_no_index_len");
                test_slice(name, $start, $end, $expected, true, false, true);
            }

            #[test]
            fn no_headers_no_index_len() {
                let name = concat!(stringify!($name),
                                   "no_headers_no_index_len");
                test_slice(name, $start, $end, $expected, false, false, true);
            }

            #[test]
            fn headers_index_len() {
                let name = concat!(stringify!($name), "headers_index_len");
                test_slice(name, $start, $end, $expected, true, true, true);
            }

            #[test]
            fn no_headers_index_len() {
                let name = concat!(stringify!($name), "no_headers_index_len");
                test_slice(name, $start, $end, $expected, false, true, true);
            }
        }
    );
}

fn setup(name: &str, headers: bool, use_index: bool)
        -> (Workdir, process::Command) {
    let wrk = Workdir::new(name);
    let mut data = vec![
        svec!["a"], svec!["b"], svec!["c"], svec!["d"], svec!["e"]
    ];
    if headers { data.insert(0, svec!["header"]); }
    if use_index {
        wrk.create_indexed("in.csv", data);
    } else {
        wrk.create("in.csv", data);
    }

    let mut cmd = wrk.command("slice");
    cmd.arg("in.csv");

    (wrk, cmd)
}

fn test_slice(name: &str, start: Option<usize>, end: Option<usize>,
              expected: &[&str], headers: bool,
              use_index: bool, as_len: bool) {
    let (wrk, mut cmd) = setup(name, headers, use_index);
    if let Some(start) = start {
        cmd.arg("--start").arg(&start.to_string());
    }
    if let Some(end) = end {
        if as_len {
            let start = start.unwrap_or(0);
            cmd.arg("--len").arg(&(end - start).to_string());
        } else {
            cmd.arg("--end").arg(&end.to_string());
        }
    }
    if !headers {
        cmd.arg("--no-headers");
    }

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let mut expected = expected.iter()
                               .map(|&s| vec![s.to_owned()])
                               .collect::<Vec<Vec<String>>>();
    if headers { expected.insert(0, svec!["header"]); }
    assert_eq!(got, expected);
}

fn test_index(name: &str, idx: usize, expected: &str,
              headers: bool, use_index: bool) {
    let (wrk, mut cmd) = setup(name, headers, use_index);
    cmd.arg("--index").arg(&idx.to_string());
    if !headers {
        cmd.arg("--no-headers");
    }

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let mut expected = vec![vec![expected.to_owned()]];
    if headers { expected.insert(0, svec!["header"]); }
    assert_eq!(got, expected);
}

slice_tests!(slice_simple, Some(0), Some(1), &["a"]);
slice_tests!(slice_simple_2, Some(1), Some(3), &["b", "c"]);
slice_tests!(slice_no_start, None, Some(1), &["a"]);
slice_tests!(slice_no_end, Some(3), None, &["d", "e"]);
slice_tests!(slice_all, None, None, &["a", "b", "c", "d", "e"]);

#[test]
fn slice_index() {
    test_index("slice_index", 1, "b", true, false);
}
#[test]
fn slice_index_no_headers() {
    test_index("slice_index_no_headers", 1, "b", false, false);
}
#[test]
fn slice_index_withindex() {
    test_index("slice_index_withindex", 1, "b", true, true);
}
#[test]
fn slice_index_no_headers_withindex() {
    test_index("slice_index_no_headers_withindex", 1, "b", false, true);
}


================================================
FILE: tests/test_sort.rs
================================================
use std::cmp;

use workdir::Workdir;

use {Csv, CsvData, qcheck};

fn prop_sort(name: &str, rows: CsvData, headers: bool) -> bool {
    let wrk = Workdir::new(name);
    wrk.create("in.csv", rows.clone());

    let mut cmd = wrk.command("sort");
    cmd.arg("in.csv");
    if !headers { cmd.arg("--no-headers"); }

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let mut expected = rows.to_vecs();
    let headers = if headers && !expected.is_empty() {
        expected.remove(0)
    } else {
        vec![]
    };
    expected.sort_by(|r1, r2| iter_cmp(r1.iter(), r2.iter()));
    if !headers.is_empty() { expected.insert(0, headers); }
    rassert_eq!(got, expected)
}

#[test]
fn prop_sort_headers() {
    fn p(rows: CsvData) -> bool {
        prop_sort("prop_sort_headers", rows, true)
    }
    qcheck(p as fn(CsvData) -> bool);
}

#[test]
fn prop_sort_no_headers() {
    fn p(rows: CsvData) -> bool {
        prop_sort("prop_sort_no_headers", rows, false)
    }
    qcheck(p as fn(CsvData) -> bool);
}

#[test]
fn sort_select() {
    let wrk = Workdir::new("sort_select");
    wrk.create("in.csv", vec![svec!["1", "b"], svec!["2", "a"]]);

    let mut cmd = wrk.command("sort");
    cmd.arg("--no-headers").args(&["--select", "2"]).arg("in.csv");

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![svec!["2", "a"], svec!["1", "b"]];
    assert_eq!(got, expected);
}

#[test]
fn sort_numeric() {
    let wrk = Workdir::new("sort_numeric");
    wrk.create("in.csv", vec![
        svec!["N", "S"],
        svec!["10", "a"],
        svec!["LETTER", "b"],
        svec!["2", "c"],
        svec!["1", "d"],
    ]);

    let mut cmd = wrk.command("sort");
    cmd.arg("-N").arg("in.csv");

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![
        svec!["N", "S"],
        //Non-numerics should be put first
        svec!["LETTER", "b"],
        svec!["1", "d"],
        svec!["2", "c"],
        svec!["10", "a"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn sort_numeric_non_natural() {
    let wrk = Workdir::new("sort_numeric_non_natural");
    wrk.create("in.csv", vec![
        svec!["N", "S"],
        svec!["8.33", "a"],
        svec!["5", "b"],
        svec!["LETTER", "c"],
        svec!["7.4", "d"],
        svec!["3.33", "e"],
    ]);

    let mut cmd = wrk.command("sort");
    cmd.arg("-N").arg("in.csv");

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![
        svec!["N", "S"],
        //Non-numerics should be put first
        svec!["LETTER", "c"],
        svec!["3.33", "e"],
        svec!["5", "b"],
        svec!["7.4", "d"],
        svec!["8.33", "a"],
    ];
    assert_eq!(got, expected);
}

#[test]
fn sort_reverse() {
    let wrk = Workdir::new("sort_reverse");
    wrk.create("in.csv", vec![
        svec!["R", "S"],
        svec!["1", "b"],
        svec!["2", "a"],
    ]);

    let mut cmd = wrk.command("sort");
    cmd.arg("-R").arg("--no-headers").arg("in.csv");

    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
    let expected = vec![
        svec!["R", "S"],
        svec!["2", "a"],
        svec!["1", "b"],
    ];
    assert_eq!(got, expected);
}

/// Order `a` and `b` lexicographically using `Ord`
pub fn iter_cmp<A, L, R>(mut a: L, mut b: R) -> cmp::Ordering
        where A: Ord, L: Iterator<Item=A>, R: Iterator<Item=A> {
    loop {
        match (a.next(), b.next()) {
            (None, None) => return cmp::Ordering::Equal,
            (None, _   ) => return cmp::Ordering::Less,
            (_   , None) => return cmp::Ordering::Greater,
            (Some(x), Some(y)) => match x.cmp(&y) {
                cmp::Ordering::Equal => (),
                non_eq => return non_eq,
            },
        }
    }
}


================================================
FILE: tests/test_split.rs
================================================
use std::borrow::ToOwned;

use workdir::Workdir;

macro_rules! split_eq {
    ($wrk:expr, $path:expr, $expected:expr) => (
        // assert_eq!($wrk.path($path).into_os_string().into_string().unwrap(),
                   // $expected.to_owned());
        assert_eq!($wrk.from_str::<String>(&$wrk.path($path)),
                   $expected.to_owned());
    );
}

fn data(headers: bool) -> Vec<Vec<String>> {
    let mut rows = vec![
        svec!["a", "b"], svec!["c", "d"],
        svec!["e", "f"], svec!["g", "h"],
        svec!["i", "j"], svec!["k", "l"],
    ];
    if headers { rows.insert(0, svec!["h1", "h2"]); }
    rows
}

#[test]
fn split_zero() {
    let wrk = Workdir::new("split_zero");
    wrk.create("in.csv", data(true));

    let mut cmd = wrk.command("split");
    cmd.args(&["--size", "0"]).arg(&wrk.path(".")).arg("in.csv");
    wrk.assert_err(&mut cmd);
}

#[test]
fn split() {
    let wrk = Workdir::new("split");
    wrk.create("in.csv", data(true));

    let mut cmd = wrk.command("split");
    cmd.args(&["--size", "2"]).arg(&wrk.path(".")).arg("in.csv");
    wrk.run(&mut cmd);

    split_eq!(wrk, "0.csv", "\
h1,h2
a,b
c,d
");
    split_eq!(wrk, "2.csv", "\
h1,h2
e,f
g,h
");
    split_eq!(wrk, "4.csv", "\
h1,h2
i,j
k,l
");
    assert!(!wrk.path("6.csv").exists());
}

#[test]
fn split_idx() {
    let wrk = Workdir::new("split_idx");
    wrk.create_indexed("in.csv", data(true));

    let mut cmd = wrk.command("split");
    cmd.args(&["--size", "2"]).arg(&wrk.path(".")).arg("in.csv");
    wrk.run(&mut cmd);

    split_eq!(wrk, "0.csv", "\
h1,h2
a,b
c,d
");
    split_eq!(wrk, "2.csv", "\
h1,h2
e,f
g,h
");
    split_eq!(wrk, "4.csv", "\
h1,h2
i,j
k,l
");
    assert!(!wrk.path("6.csv").exists());
}

#[test]
fn split_no_headers() {
    let wrk = Workdir::new("split_no_headers");
    wrk.create("in.csv", data(false));

    let mut cmd = wrk.command("split");
    cmd.args(&["--no-headers", "--size", "2"])
       .arg(&wrk.path("."))
       .arg("in.csv");
    wrk.run(&mut cmd);

    split_eq!(wrk, "0.csv", "\
a,b
c,d
");
    split_eq!(wrk, "2.csv", "\
e,f
g,h
");
    split_eq!(wrk, "4.csv", "\
i,j
k,l
");
}

#[test]
fn split_no_headers_idx() {
    let wrk = Workdir::new("split_no_headers_idx");
    wrk.create_indexed("in.csv", data(false));

    let mut cmd = wrk.command("split");
    cmd.args(&["--no-headers", "--size", "2"])
       .arg(&wrk.path("."))
       .arg("in.csv");
    wrk.run(&mut cmd);

    split_eq!(wrk, "0.csv", "\
a,b
c,d
");
    split_eq!(wrk, "2.csv", "\
e,f
g,h
");
    split_eq!(wrk, "4.csv", "\
i,j
k,l
");
}

#[test]
fn split_one() {
    let wrk = Workdir::new("split_one");
    wrk.create("in.csv", data(true));

    let mut cmd = wrk.command("split");
    cmd.args(&["--size", "1"]).arg(&wrk.path(".")).arg("in.csv");
    wrk.run(&mut cmd);

    split_eq!(wrk, "0.csv", "\
h1,h2
a,b
");
    split_eq!(wrk, "1.csv", "\
h1,h2
c,d
");
    split_eq!(wrk, "2.csv", "\
h1,h2
e,f
");
    split_eq!(wrk, "3.csv", "\
h1,h2
g,h
");
    split_eq!(wrk, "4.csv", "\
h1,h2
i,j
");
    split_eq!(wrk, "5.csv", "\
h1,h2
k,l
");
}

#[test]
fn split_one_idx() {
    let wrk = Workdir::new("split_one_idx");
    wrk.create_indexed("in.csv", data(true));

    let mut cmd = wrk.command("split");
    cmd.args(&["--size", "1"]).arg(&wrk.path(".")).arg("in.csv");
    wrk.run(&mut cmd);

    split_eq!(wrk, "0.csv", "\
h1,h2
a,b
");
    split_eq!(wrk, "1.csv", "\
h1,h2
c,d
");
    split_eq!(wrk, "2.csv", "\
h1,h2
e,f
");
    split_eq!(wrk, "3.csv", "\
h1,h2
g,h
");
    split_eq!(wrk, "4.csv", "\
h1,h2
i,j
");
    split_eq!(wrk, "5.csv", "\
h1,h2
k,l
");
}

#[test]
fn split_uneven() {
    let wrk = Workdir::new("split_uneven");
    wrk.create("in.csv", data(true));

    let mut cmd = wrk.command("split");
    cmd.args(&["--size", "4"]).arg(&wrk.path(".")).arg("in.csv");
    wrk.run(&mut cmd);

    split_eq!(wrk, "0.csv", "\
h1,h2
a,b
c,d
e,f
g,h
");
    split_eq!(wrk, "4.csv", "\
h1,h2
i,j
k,l
");
}

#[test]
fn split_uneven_idx() {
    let wrk = Workdir::new("split_uneven_idx");
    wrk.create_indexed("in.csv", data(true));

    let mut cmd = wrk.command("split");
    cmd.args(&["--size", "4"]).arg(&wrk.path(".")).arg("in.csv");
    wrk.run(&mut cmd);

    split_eq!(wrk, "0.csv", "\
h1,h2
a,b
c,d
e,f
g,h
");
    split_eq!(wrk, "4.csv", "\
h1,h2
i,j
k,l
");
}

#[test]
fn split_custom_filename() {
    let wrk = Workdir::new("split");
    wrk.create("in.csv", data(true));

    let mut cmd = wrk.command("split");
    cmd.args(&["--size", "2"])
       .args(&["--filename", "prefix-{}.csv"])
       .arg(&wrk.path(".")).arg("in.csv");
    wrk.run(&mut cmd);

    assert!(wrk.path("prefix-0.csv").exists());
    assert!(wrk.path("prefix-2.csv").exists());
    assert!(wrk.path("prefix-4.csv").exists());
}


================================================
FILE: tests/test_stats.rs
================================================
use std::borrow::ToOwned;
use std::cmp;
use std::process;

use workdir::Workdir;

macro_rules! stats_tests {
    ($name:ident, $field:expr, $rows:expr, $expect:expr) => (
        stats_tests!($name, $field, $rows, $expect, false);
    );
    ($name:ident, $field:expr, $rows:expr, $expect:expr, $nulls:expr) => (
        mod $name {
            use super::test_stats;

            stats_test_headers!($name, $field, $rows, $expect, $nulls);
            stats_test_no_headers!($name, $field, $rows, $expect, $nulls);
        }
    );
}

macro_rules! stats_test_headers {
    ($name:ident, $field:expr, $rows:expr, $expect:expr) => (
        stats_test_headers!($name, $field, $rows, $expect, false);
    );
    ($name:ident, $field:expr, $rows:expr, $expect:expr, $nulls:expr) => (
        #[test]
        fn headers_no_index() {
            let name = concat!(stringify!($name), "_headers_no_index");
            test_stats(name, $field, $rows, $expect, true, false, $nulls);
        }

        #[test]
        fn headers_index() {
            let name = concat!(stringify!($name), "_headers_index");
            test_stats(name, $field, $rows, $expect, true, true, $nulls);
        }
    );
}

macro_rules! stats_test_no_headers {
    ($name:ident, $field:expr, $rows:expr, $expect:expr) => (
        stats_test_no_headers!($name, $field, $rows, $expect, false);
    );
    ($name:ident, $field:expr, $rows:expr, $expect:expr, $nulls:expr) => (
        #[test]
        fn no_headers_no_index() {
            let name = concat!(stringify!($name), "_no_headers_no_index");
            test_stats(name, $field, $rows, $expect, false, false, $nulls);
        }

        #[test]
        fn no_headers_index() {
            let name = concat!(stringify!($name), "_no_headers_index");
            test_stats(name, $field, $rows, $expect, false, true, $nulls);
        }
    );
}

fn test_stats<S>(name: S, field: &str, rows: &[&str], expected: &str,
                 headers: bool, use_index: bool, nulls: bool)
        where S: ::std::ops::Deref<Target=str> {
    let (wrk, mut cmd) = setup(name, rows, headers, use_index, nulls);
    let field_val = get_field_value(&wrk, &mut cmd, field);
    // Only compare the first few bytes since floating point arithmetic
    // can mess with exact comparisons.
    let len = cmp::min(10, cmp::min(field_val.len(), expected.len()));
    assert_eq!(&field_val[0..len], &expected[0..len]);
}

fn setup<S>(name: S, rows: &[&str], headers: bool,
            use_index: bool, nulls: bool) -> (Workdir, process::Command)
        where S: ::std::ops::Deref<Target=str> {
    let wrk = Workdir::new(&name);
    let mut data: Vec<Vec<String>> =
        rows.iter().map(|&s| vec![s.to_owned()]).collect();
    if headers { data.insert(0, svec!["header"]); }
    if use_index {
        wrk.create_indexed("in.csv", data);
    } else {
        wrk.create("in.csv", data);
    }

    let mut cmd = wrk.command("stats");
    cmd.arg("in.csv");
    if !headers { cmd.arg("--no-headers"); }
    if nulls { cmd.arg("--nulls"); }

    (wrk, cmd)
}

fn get_field_value(wrk: &Workdir, cmd: &mut process::Command, field: &str)
                  -> String {
    if field == "median" { cmd.arg("--median"); }
    if field == "cardinality" { cmd.arg("--cardinality"); }
    if field == "mode" { cmd.arg("--mode"); }

    let mut rows: Vec<Vec<String>> = wrk.read_stdout(cmd);
    let headers = rows.remove(0);
    for row in rows.iter() {
        for (h, val) in headers.iter().zip(row.iter()) {
            if &**h == field {
                return val.clone();
            }
        }
    }
    panic!("BUG: Could not find field '{}' in headers '{:?}' \
            for command '{:?}'.", field, headers, cmd);
}

stats_tests!(stats_infer_unicode, "type", &["a"], "Unicode");
stats_tests!(stats_infer_int, "type", &["1"], "Integer");
stats_tests!(stats_infer_float, "type", &["1.2"], "Float");
stats_tests!(stats_infer_null, "type", &[""], "NULL");
stats_tests!(stats_infer_unicode_null, "type", &["a", ""], "Unicode");
stats_tests!(stats_infer_int_null, "type", &["1", ""], "Integer");
stats_tests!(stats_infer_float_null, "type", &["1.2", ""], "Float");
stats_tests!(stats_infer_null_unicode, "type", &["", "a"], "Unicode");
stats_tests!(stats_infer_null_int, "type", &["", "1"], "Integer");
stats_tests!(stats_infer_null_float, "type", &["", "1.2"], "Float");
stats_tests!(stats_infer_int_unicode, "type", &["1", "a"], "Unicode");
stats_tests!(stats_infer_unicode_int, "type", &["a", "1"], "Unicode");
stats_tests!(stats_infer_int_float, "type", &["1", "1.2"], "Float");
stats_tests!(stats_infer_float_int, "type", &["1.2", "1"], "Float");
stats_tests!(stats_infer_null_int_float_unicode, "type",
             &["", "1", "1.2", "a"], "Unicode");

stats_tests!(stats_no_mean, "mean", &["a"], "");
stats_tests!(stats_no_stddev, "stddev", &["a"], "");
stats_tests!(stats_no_median, "median", &["a"], "");
stats_tests!(stats_no_mode, "mode", &["a", "b"], "N/A");

stats_tests!(stats_null_mean, "mean", &[""], "");
stats_tests!(stats_null_stddev, "stddev", &[""], "");
stats_tests!(stats_null_median, "median", &[""], "");
stats_tests!(stats_null_mode, "mode", &[""], "N/A");

stats_tests!(stats_includenulls_null_mean, "mean", &[""], "", true);
stats_tests!(stats_includenulls_null_stddev, "stddev", &[""], "", true);
stats_tests!(stats_includenulls_null_median, "median", &[""], "", true);
stats_tests!(stats_includenulls_null_mode, "mode", &[""], "N/A", true);

stats_tests!(stats_includenulls_mean,
             "mean", &["5", "", "15", "10"], "7.5", true);

stats_tests!(stats_sum_integers, "sum", &["1", "2"], "3");
stats_tests!(stats_sum_floats, "sum", &["1.5", "2.8"], "4.3");
stats_tests!(stats_sum_mixed1, "sum", &["1.5", "2"], "3.5");
stats_tests!(stats_sum_mixed2, "sum", &["2", "1.5"], "3.5");
stats_tests!(stats_sum_mixed3, "sum", &["1.5", "hi", "2.8"], "4.3");
stats_tests!(stats_sum_nulls1, "sum", &["1", "", "2"], "3");
stats_tests!(stats_sum_nulls2, "sum", &["", "1", "2"], "3");

stats_tests!(stats_min, "min", &["2", "1.1"], "1.1");
stats_tests!(stats_max, "max", &["2", "1.1"], "2");
stats_tests!(stats_min_mix, "min", &["2", "a", "1.1"], "1.1");
stats_tests!(stats_max_mix, "max", &["2", "a", "1.1"], "a");
stats_tests!(stats_min_null, "min", &["", "2", "1.1"], "1.1");
stats_tests!(stats_max_null, "max", &["2", "1.1", ""], "2");

stats_tests!(stats_len_min, "min_length", &["aa", "a"], "1");
stats_tests!(stats_len_max, "max_length", &["a", "aa"], "2");
stats_tests!(stats_len_min_null, "min_length", &["", "aa", "a"], "0");
stats_tests!(stats_len_max_null, "max_length", &["a", "aa", ""], "2");

stats_tests!(stats_mean, "mean", &["5", "15", "10"], "10");
stats_tests!(stats_stddev, "stddev", &["1", "2", "3"], "0.816496580927726");
stats_tests!(stats_mean_null, "mean", &["", "5", "15", "10"], "10");
stats_tests!(stats_stddev_null, "stddev", &["1", "2", "3", ""],
             "0.816496580927726");
stats_tests!(stats_mean_mix, "mean", &["5", "15.1", "9.9"], "10");
stats_tests!(stats_stddev_mix, "stddev", &["1", "2.1", "2.9"],
             "0.7788880963698614");

stats_tests!(stats_cardinality, "cardinality", &["a", "b", "a"], "2");
stats_tests!(stats_mode, "mode", &["a", "b", "a"], "a");
stats_tests!(stats_mode_null, "mode", &["", "a", "b", "a"], "a");
stats_tests!(stats_median, "median", &["1", "2", "3"], "2");
stats_tests!(stats_median_null, "median", &["", "1", "2", "3"], "2");
stats_tests!(stats_median_even, "median", &["1", "2", "3", "4"], "2.5");
stats_tests!(stats_median_even_null, "median",
             &["", "1", "2", "3", "4"], "2.5");
stats_tests!(stats_median_mix, "median", &["1", "2.5", "3"], "2.5");

mod stats_infer_nothing {
    // Only test CSV data with headers.
    // Empty CSV data with no headers won't produce any statistical analysis.
    use super::test_stats;
    stats_test_headers!(stats_infer_nothing, "type", &[], "NULL");
}

mod stats_zero_cardinality {
    use super::test_stats;
    stats_test_headers!(stats_zero_cardinality, "cardinality", &[], "0");
}

mod stats_zero_mode {
    use super::test_stats;
    stats_test_headers!(stats_zero_mode, "mode", &[], "N/A");
}

mod stats_zero_mean {
    use super::test_stats;
    stats_test_headers!(stats_zero_mean, "mean", &[], "");
}

mod stats_zero_median {
    use super::test_stats;
    stats_test_headers!(stats_zero_median, "median", &[], "");
}

mod stats_header_fields {
    use super::test_stats;
    stats_test_headers!(stats_header_field_name, "field", &["a"], "header");
    stats_test_no_headers!(stats_header_no_field_name, "field", &["a"], "0");
}


================================================
FILE: tests/test_table.rs
================================================
use workdir::Workdir;

fn data() -> Vec<Vec<String>> {
    vec![
        svec!["h1", "h2", "h3"],
        svec!["abcdefg", "a", "a"],
        svec!["a", "abc", "z"],
    ]
}

#[test]
fn table() {
    let wrk = Workdir::new("table");
    wrk.create("in.csv", data());

    let mut cmd = wrk.command("table");
    cmd.arg("in.csv");

    let got: String = wrk.stdout(&mut cmd);
    assert_eq!(&*got, "\
h1       h2   h3
abcdefg  a    a
a        abc  z\
")
}


================================================
FILE: tests/tests.rs
================================================
#![allow(dead_code)]

#[macro_use]
extern crate log;
#[macro_use]
extern crate serde_derive;

extern crate csv;
extern crate filetime;
extern crate quickcheck;
extern crate rand;
extern crate stats;

use std::fmt;
use std::mem::transmute;
use std::ops;

use quickcheck::{Arbitrary, Gen, QuickCheck, StdGen, Testable};
use rand::{Rng, thread_rng};

macro_rules! svec[
    ($($x:expr),*) => (
        vec![$($x),*].into_iter()
                     .map(|s: &'static str| s.to_string())
                     .collect::<Vec<String>>()
    );
    ($($x:expr,)*) => (svec![$($x),*]);
];

macro_rules! rassert_eq {
    ($given:expr, $expected:expr) => ({assert_eq!($given, $expected); true});
}

mod workdir;

mod test_cat;
mod test_count;
mod test_fixlengths;
mod test_flatten;
mod test_fmt;
mod test_frequency;
mod test_headers;
mod test_index;
mod test_join;
mod test_partition;
mod test_reverse;
mod test_search;
mod test_select;
mod test_slice;
mod test_sort;
mod test_split;
mod test_stats;
mod test_table;

fn qcheck<T: Testable>(p: T) {
    QuickCheck::new().gen(StdGen::new(thread_rng(), 5)).quickcheck(p);
}

fn qcheck_sized<T: Testable>(p: T, size: usize) {
    QuickCheck::new().gen(StdGen::new(thread_rng(), size)).quickcheck(p);
}

pub type CsvVecs = Vec<Vec<String>>;

pub trait Csv {
    fn to_vecs(self) -> CsvVecs;
    fn from_vecs(CsvVecs) -> Self;
}

impl Csv for CsvVecs {
    fn to_vecs(self) -> CsvVecs { self }
    fn from_vecs(vecs: CsvVecs) -> CsvVecs { vecs }
}

#[derive(Clone, Eq, Ord, PartialEq, PartialOrd)]
struct CsvRecord(Vec<String>);

impl CsvRecord {
    fn unwrap(self) -> Vec<String> {
        let CsvRecord(v) = self;
        v
    }
}

impl ops::Deref for CsvRecord {
    type Target = [String];
    fn deref<'a>(&'a self) -> &'a [String] { &*self.0 }
}

impl fmt::Debug for CsvRecord {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let bytes: Vec<_> = self.iter()
                                .map(|s| s.as_bytes())
                                .collect();
        write!(f, "{:?}", bytes)
    }
}

impl Arbitrary for CsvRecord {
    fn arbitrary<G: Gen>(g: &mut G) -> CsvRecord {
        let size = { let s = g.size(); g.gen_range(1, s) };
        CsvRecord((0..size).map(|_| Arbitrary::arbitrary(g)).collect())
    }

    fn shrink(&self) -> Box<Iterator<Item=CsvRecord>+'static> {
        Box::new(self.clone().unwrap()
                     .shrink().filter(|r| r.len() > 0).map(CsvRecord))
    }
}

impl Csv for Vec<CsvRecord> {
    fn to_vecs(self) -> CsvVecs {
        unsafe { transmute(self) }
    }
    fn from_vecs(vecs: CsvVecs) -> Vec<CsvRecord> {
        unsafe { transmute(vecs) }
    }
}

#[derive(Clone, Debug, Eq, Ord, PartialOrd)]
struct CsvData {
    data: Vec<CsvRecord>,
}

impl CsvData {
    fn unwrap(self) -> Vec<CsvRecord> { self.data }

    fn len(&self) -> usize { (&**self).len() }

    fn is_empty(&self) -> bool { self.len() == 0 }
}

impl ops::Deref for CsvData {
    type Target = [CsvRecord];
    fn deref<'a>(&'a self) -> &'a [CsvRecord] { &*self.data }
}

impl Arbitrary for CsvData {
    fn arbitrary<G: Gen>(g: &mut G) -> CsvData {
        let record_len = { let s = g.size(); g.gen_range(1, s) };
        let num_records: usize = g.gen_range(0, 100);
        CsvData{
            data: (0..num_records).map(|_| {
                CsvRecord((0..record_len)
                          .map(|_| Arbitrary::arbitrary(g))
                          .collect())
            }).collect(),
        }
    }

    fn shrink(&self) -> Box<Iterator<Item=CsvData>+'static> {
        let len = if self.is_empty() { 0 } else { self[0].len() };
        let mut rows: Vec<CsvData> =
            self.clone()
                .unwrap()
                .shrink()
                .filter(|rows| rows.iter().all(|r| r.len() == len))
                .map(|rows| CsvData { data: rows })
                .collect();
        // We should also introduce CSV data with fewer columns...
        if len > 1 {
            rows.extend(
                self.clone()
                    .unwrap()
                    .shrink()
                    .filter(|rows|
                        rows.iter().all(|r| r.len() == len - 1))
                    .map(|rows| CsvData { data: rows }));
        }
        Box::new(rows.into_iter())
    }
}

impl Csv for CsvData {
    fn to_vecs(self) -> CsvVecs { unsafe { transmute(self.data) } }
    fn from_vecs(vecs: CsvVecs) -> CsvData {
        CsvData {
            data: unsafe { transmute(vecs) },
        }
    }
}

impl PartialEq for CsvData {
    fn eq(&self, other: &CsvData) -> bool {
        (self.data.is_empty() && other.data.is_empty())
        || self.data == other.data
    }
}


================================================
FILE: tests/workdir.rs
================================================
use std::env;
use std::fmt;
use std::fs;
use std::io::{self, Read};
use std::path::{Path, PathBuf};
use std::process;
use std::str::FromStr;
use std::sync::atomic;
use std::time::Duration;

use csv;

use Csv;

static XSV_INTEGRATION_TEST_DIR: &'static str = "xit";

static NEXT_ID: atomic::AtomicUsize = atomic::ATOMIC_USIZE_INIT;

pub struct Workdir {
    root: PathBuf,
    dir: PathBuf,
    flexible: bool,
}

impl Workdir {
    pub fn new(name: &str) -> Workdir {
        let id = NEXT_ID.fetch_add(1, atomic::Ordering::SeqCst);
        let mut root = env::current_exe().unwrap()
                           .parent()
                           .expect("executable's directory")
                           .to_path_buf();
        if root.ends_with("deps") {
            root.pop();
        }
        let dir = root.join(XSV_INTEGRATION_TEST_DIR)
                      .join(name)
                      .join(&format!("test-{}", id));
        // println!("{:?}", dir);
        if let Err(err) = create_dir_all(&dir) {
            panic!("Could not create '{:?}': {}", dir, err);
        }
        Workdir { root: root, dir: dir, flexible: false }
    }

    pub fn flexible(mut self, yes: bool) -> Workdir {
        self.flexible = yes;
        self
    }

    pub fn create<T: Csv>(&self, name: &str, rows: T) {
        let mut wtr = csv::WriterBuilder::new()
            .flexible(self.flexible)
            .from_path(&self.path(name))
            .unwrap();
        for row in rows.to_vecs().into_iter() {
            wtr.write_record(row).unwrap();
        }
        wtr.flush().unwrap();
    }

    pub fn create_indexed<T: Csv>(&self, name: &str, rows: T) {
        self.create(name, rows);

        let mut cmd = self.command("index");
        cmd.arg(name);
        self.run(&mut cmd);
    }

    pub fn read_stdout<T: Csv>(&self, cmd: &mut process::Command) -> T {
        let stdout: String = self.stdout(cmd);
        let mut rdr = csv::ReaderBuilder::new()
            .has_headers(false)
            .from_reader(io::Cursor::new(stdout));

        let records: Vec<Vec<String>> = rdr
            .records()
            .collect::<Result<Vec<csv::StringRecord>, _>>()
            .unwrap()
            .into_iter()
            .map(|r| r.iter().map(|f| f.to_string()).collect())
            .collect();
        Csv::from_vecs(records)
    }

    pub fn command(&self, sub_command: &str) -> process::Command {
        let mut cmd = process::Command::new(&self.xsv_bin());
        cmd.current_dir(&self.dir).arg(sub_command);
        cmd
    }

    pub fn output(&self, cmd: &mut process::Command) -> process::Output {
        debug!("[{}]: {:?}", self.dir.display(), cmd);
        println!("[{}]: {:?}", self.dir.display(), cmd);
        let o = cmd.output().unwrap();
        if !o.status.success() {
            panic!("\n\n===== {:?} =====\n\
                    command failed but expected success!\
                    \n\ncwd: {}\
                    \n\nstatus: {}\
                    \n\nstdout: {}\n\nstderr: {}\
                    \n\n=====\n",
                   cmd, self.dir.display(), o.status,
                   String::from_utf8_lossy(&o.stdout),
                   String::from_utf8_lossy(&o.stderr))
        }
        o
    }

    pub fn run(&self, cmd: &mut process::Command) {
        self.output(cmd);
    }

    pub fn stdout<T: FromStr>(&self, cmd: &mut process::Command) -> T {
        let o = self.output(cmd);
        let stdout = String::from_utf8_lossy(&o.stdout);
        stdout.trim_matches(&['\r', '\n'][..]).parse().ok().expect(
            &format!("Could not convert from string: '{}'", stdout))
    }

    pub fn assert_err(&self, cmd: &mut process::Command) {
        let o = cmd.output().unwrap();
        if o.status.success() {
            panic!("\n\n===== {:?} =====\n\
                    command succeeded but expected failure!\
                    \n\ncwd: {}\
                    \n\nstatus: {}\
                    \n\nstdout: {}\n\nstderr: {}\
                    \n\n=====\n",
                   cmd, self.dir.display(), o.status,
                   String::from_utf8_lossy(&o.stdout),
                   String::from_utf8_lossy(&o.stderr));
        }
    }

    pub fn from_str<T: FromStr>(&self, name: &Path) -> T {
        let mut o = String::new();
        fs::File::open(name).unwrap().read_to_string(&mut o).unwrap();
        o.parse().ok().expect("fromstr")
    }

    pub fn path(&self, name: &str) -> PathBuf {
        self.dir.join(name)
    }

    pub fn xsv_bin(&self) -> PathBuf {
        self.root.join("xsv")
    }
}

impl fmt::Debug for Workdir {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "path={}", self.dir.display())
    }
}

// For whatever reason, `fs::create_dir_all` fails intermittently on Travis
// with a weird "file exists" error. Despite my best efforts to get to the
// bottom of it, I've decided a try-wait-and-retry hack is good enough.
fn create_dir_all<P: AsRef<Path>>(p: P) -> io::Result<()> {
    let mut last_err = None;
    for _ in 0..10 {
        if let Err(err) = fs::create_dir_all(&p) {
            last_err = Some(err);
            ::std::thread::sleep(Duration::from_millis(500));
        } else {
            return Ok(())
        }
    }
    Err(last_err.unwrap())
}