[
  {
    "path": ".gitignore",
    "content": ".*.swp\ndoc\ntags\nexamples/data/ss10pusa.csv\nbuild\ntarget\nctags.rust\n*.csv\n*.tsv\nmain\n*.idx\nbuilds\n"
  },
  {
    "path": ".travis.yml",
    "content": "language: rust\ncache: cargo\n\nenv:\n  global:\n    - PROJECT_NAME=xsv\nmatrix:\n  include:\n    # Stable channel\n    - os: linux\n      rust: stable\n      env: TARGET=i686-unknown-linux-musl\n    - os: osx\n      rust: stable\n      env: TARGET=x86_64-apple-darwin\n    - os: linux\n      rust: stable\n      env: TARGET=x86_64-unknown-linux-musl\n    # Minimum Rust supported channel.\n    - os: linux\n      rust: 1.28.0\n      env: TARGET=x86_64-unknown-linux-gnu\n    - os: linux\n      rust: 1.28.0\n      env: TARGET=x86_64-unknown-linux-musl\n\nbefore_install:\n  - export PATH=\"$PATH:$HOME/.cargo/bin\"\n\ninstall:\n  - bash ci/install.sh\n\nscript:\n  - bash ci/script.sh\n\nbefore_deploy:\n  - bash ci/before_deploy.sh\n\ndeploy:\n  provider: releases\n  api_key:\n    secure: aDT53aTIcl6RLcd4/StnKT55LgJyjiCtsmu1Byy0TIEtP4ZfNhsHwCbqyZT6TLownLJPi5wLM1WRncGKNYQelFDk/mUA8YugcFDfiSN//ZZ8KLAQiI+PX6JCrFYr/ZmP4dJzFWS1hPsr/X0gdbrlb3kuQG7BI9gH3GY4yTsLNiY=\n  file_glob: true\n  file: ${PROJECT_NAME}-${TRAVIS_TAG}-${TARGET}.*\n  # don't delete the artifacts from previous phases\n  skip_cleanup: true\n  # deploy when a new tag is pushed\n  on:\n    # channel to use to produce the release artifacts\n    # NOTE make sure you only release *once* per target\n    # TODO you may want to pick a different channel\n    condition: $TRAVIS_RUST_VERSION = stable\n    tags: true\n\nbranches:\n  only:\n    # Pushes and PR to the master branch\n    - master\n    # IMPORTANT Ruby regex to match tags. Required, or travis won't trigger deploys when a new tag\n    # is pushed. This regex matches semantic versions like v1.2.3-rc4+2016.02.22\n    - /^\\d+\\.\\d+\\.\\d+.*$/\n\nnotifications:\n  email:\n    on_success: never\n"
  },
  {
    "path": "BENCHMARKS.md",
    "content": "These are some very basic and unscientific benchmarks of various commands\nprovided by `xsv`. Please see below for more information.\n\nThese benchmarks were run with\n[worldcitiespop_mil.csv](https://burntsushi.net/stuff/worldcitiespop_mil.csv),\nwhich is a random 1,000,000 row subset of the world city population dataset\nfrom the [Data Science Toolkit](https://github.com/petewarden/dstkdata).\n\nThese benchmarks were run on an Intel i7-6900K (8 CPUs, 16 threads) with 64GB\nof memory.\n\n```\ncount                   0.11 seconds   413.76  MB/sec\nflatten                 4.54 seconds   10.02   MB/sec\nflatten_condensed       4.45 seconds   10.22   MB/sec\nfrequency               1.82 seconds   25.00   MB/sec\nindex                   0.12 seconds   379.28  MB/sec\nsample_10               0.18 seconds   252.85  MB/sec\nsample_1000             0.18 seconds   252.85  MB/sec\nsample_100000           0.29 seconds   156.94  MB/sec\nsearch                  0.27 seconds   168.56  MB/sec\nselect                  0.14 seconds   325.09  MB/sec\nsearch                  0.13 seconds   350.10  MB/sec\nselect                  0.13 seconds   350.10  MB/sec\nsort                    2.18 seconds   20.87   MB/sec\nslice_one_middle        0.08 seconds   568.92  MB/sec\nslice_one_middle_index  0.01 seconds   4551.36 MB/sec\nstats                   1.09 seconds   41.75   MB/sec\nstats_index             0.15 seconds   303.42  MB/sec\nstats_everything        1.94 seconds   23.46   MB/sec\nstats_everything_index  0.93 seconds   48.93   MB/sec\n```\n\n### Details\n\nThe purpose of these benchmarks is to provide a rough ballpark estimate of how\nfast each command is. My hope is that they can also catch significant\nperformance regressions.\n\nThe `count` command can be viewed as a sort of baseline of the fastest possible\ncommand that parses every record in CSV data.\n\nThe benchmarks that end with `_index` are run with indexing enabled.\n"
  },
  {
    "path": "COPYING",
    "content": "This project is dual-licensed under the Unlicense and MIT licenses.\n\nYou may use this code under the terms of either license.\n"
  },
  {
    "path": "Cargo.toml",
    "content": "[package]\nname = \"xsv\"\nversion = \"0.13.0\"  #:version\nauthors = [\"Andrew Gallant <jamslam@gmail.com>\"]\ndescription = \"A high performance CSV command line toolkit.\"\ndocumentation = \"https://burntsushi.net/rustdoc/xsv/\"\nhomepage = \"https://github.com/BurntSushi/xsv\"\nrepository = \"https://github.com/BurntSushi/xsv\"\nreadme = \"README.md\"\nkeywords = [\"csv\", \"tsv\", \"slice\", \"command\"]\nlicense = \"Unlicense/MIT\"\nautotests = false\n\n[[bin]]\nname = \"xsv\"\ntest = false\nbench = false\ndoctest = false\n\n[[test]]\nname = \"tests\"\n\n[profile.release]\nopt-level = 3\ndebug = true\n\n[profile.test]\nopt-level = 3\n\n[dependencies]\nbyteorder = \"1\"\ncrossbeam-channel = \"0.2.4\"\ncsv = \"1\"\ncsv-index = \"0.1.5\"\ndocopt = \"1\"\nfiletime = \"0.1\"\nnum_cpus = \"1.4\"\nrand = \"0.5\"\nregex = \"1\"\nserde = \"1\"\nserde_derive = \"1\"\nstreaming-stats = \"0.2\"\ntabwriter = \"1\"\nthreadpool = \"1.3\"\n\n[dev-dependencies]\nquickcheck = { version = \"0.7\", default-features = false }\nlog = \"0.4\"\n"
  },
  {
    "path": "LICENSE-MIT",
    "content": "The MIT License (MIT)\n\nCopyright (c) 2015 Andrew Gallant\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in\nall copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\nTHE SOFTWARE.\n"
  },
  {
    "path": "Makefile",
    "content": "all:\n\t@echo Nothing to do...\n\nctags:\n\tctags --recurse --options=ctags.rust --languages=Rust\n\ndocs:\n\tcargo doc\n\tin-dir ./target/doc fix-perms\n\trscp ./target/doc/* gopher:~/www/burntsushi.net/rustdoc/\n\ndebug:\n\tcargo build --verbose\n\trustc -L ./target/deps/ -g -Z lto --opt-level 3 src/main.rs\n\npush:\n\tgit push home master\n\tgit push origin master\n\ndev:\n\tcargo build\n\tcp ./target/xsv ~/bin/bin/xsv\n\nrelease:\n\tcargo build --release\n\tmkdir -p ~/bin/bin\n\tcp ./target/release/xsv ~/bin/bin/xsv\n\ngithub:\n\t./scripts/build-release\n\t./scripts/github-release\n\t./scripts/github-upload\n"
  },
  {
    "path": "README.md",
    "content": "# `xsv` is now unmaintained\n\nIn lieu of `xsv`, I'd recommend either\n[qsv](https://github.com/dathere/qsv)\nor\n[xan](https://github.com/medialab/xan).\n\n-------------------------------------------------------------------------------\n\n\nxsv is a command line program for indexing, slicing, analyzing, splitting\nand joining CSV files. Commands should be simple, fast and composable:\n\n1. Simple tasks should be easy.\n2. Performance trade offs should be exposed in the CLI interface.\n3. Composition should not come at the expense of performance.\n\nThis README contains information on how to\n[install `xsv`](https://github.com/BurntSushi/xsv#installation), in addition to\na quick tour of several commands.\n\n[![Linux build status](https://api.travis-ci.org/BurntSushi/xsv.svg)](https://travis-ci.org/BurntSushi/xsv)\n[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/xsv?svg=true)](https://ci.appveyor.com/project/BurntSushi/xsv)\n[![](https://meritbadge.herokuapp.com/xsv)](https://crates.io/crates/xsv)\n\nDual-licensed under MIT or the [UNLICENSE](https://unlicense.org).\n\n\n### Available commands\n\n* **cat** - Concatenate CSV files by row or by column.\n* **count** - Count the rows in a CSV file. (Instantaneous with an index.)\n* **fixlengths** - Force a CSV file to have same-length records by either\n  padding or truncating them.\n* **flatten** - A flattened view of CSV records. Useful for viewing one record\n  at a time. e.g., `xsv slice -i 5 data.csv | xsv flatten`.\n* **fmt** - Reformat CSV data with different delimiters, record terminators\n  or quoting rules. (Supports ASCII delimited data.)\n* **frequency** - Build frequency tables of each column in CSV data. (Uses\n  parallelism to go faster if an index is present.)\n* **headers** - Show the headers of CSV data. Or show the intersection of all\n  headers between many CSV files.\n* **index** - Create an index for a CSV file. 
This is very quick and provides\n  constant time indexing into the CSV file.\n* **input** - Read CSV data with exotic quoting/escaping rules.\n* **join** - Inner, outer and cross joins. Uses a simple hash index to make it\n  fast.\n* **partition** - Partition CSV data based on a column value.\n* **sample** - Randomly draw rows from CSV data using reservoir sampling (i.e.,\n  use memory proportional to the size of the sample).\n* **reverse** - Reverse order of rows in CSV data.\n* **search** - Run a regex over CSV data. Applies the regex to each field\n  individually and shows only matching rows.\n* **select** - Select or re-order columns from CSV data.\n* **slice** - Slice rows from any part of a CSV file. When an index is present,\n  this only has to parse the rows in the slice (instead of all rows leading up\n  to the start of the slice).\n* **sort** - Sort CSV data.\n* **split** - Split one CSV file into many CSV files of N chunks.\n* **stats** - Show basic types and statistics of each column in the CSV file.\n  (i.e., mean, standard deviation, median, range, etc.)\n* **table** - Show aligned output of any CSV data using\n  [elastic tabstops](https://github.com/BurntSushi/tabwriter).\n\n\n### A whirlwind tour\n\nLet's say you're playing with some of the data from the\n[Data Science Toolkit](https://github.com/petewarden/dstkdata), which contains\nseveral CSV files. Maybe you're interested in the population counts of each\ncity in the world. So grab the data and start examining it:\n\n```bash\n$ curl -LO https://burntsushi.net/stuff/worldcitiespop.csv\n$ xsv headers worldcitiespop.csv\n1   Country\n2   City\n3   AccentCity\n4   Region\n5   Population\n6   Latitude\n7   Longitude\n```\n\nThe next thing you might want to do is get an overview of the kind of data that\nappears in each column. 
The `stats` command will do this for you:\n\n```bash\n$ xsv stats worldcitiespop.csv --everything | xsv table\nfield       type     min            max            min_length  max_length  mean          stddev         median     mode         cardinality\nCountry     Unicode  ad             zw             2           2                                                   cn           234\nCity        Unicode   bab el ahmar  Þykkvibaer     1           91                                                  san jose     2351892\nAccentCity  Unicode   Bâb el Ahmar  ïn Bou Chella  1           91                                                  San Antonio  2375760\nRegion      Unicode  00             Z9             0           2                                        13         04           397\nPopulation  Integer  7              31480498       0           8           47719.570634  302885.559204  10779                   28754\nLatitude    Float    -54.933333     82.483333      1           12          27.188166     21.952614      32.497222  51.15        1038349\nLongitude   Float    -179.983333    180            1           14          37.08886      63.22301       35.28      23.8         1167162\n```\n\nThe `xsv table` command takes any CSV data and formats it into aligned columns\nusing [elastic tabstops](https://github.com/BurntSushi/tabwriter). You'll\nnotice that it even gets alignment right with respect to Unicode characters.\n\nSo, this command takes about 12 seconds to run on my machine, but we can speed\nit up by creating an index and re-running the command:\n\n```bash\n$ xsv index worldcitiespop.csv\n$ xsv stats worldcitiespop.csv --everything | xsv table\n...\n```\n\nWhich cuts it down to about 8 seconds on my machine. 
(And creating the index\ntakes less than 2 seconds.)\n\nNotably, the same type of \"statistics\" command in another\n[CSV command line toolkit](https://csvkit.readthedocs.io/)\ntakes about 2 minutes to produce similar statistics on the same data set.\n\nCreating an index gives us more than just faster statistics gathering. It also\nmakes slice operations extremely fast because *only the sliced portion* has to\nbe parsed. For example, let's say you wanted to grab the last 10 records:\n\n```bash\n$ xsv count worldcitiespop.csv\n3173958\n$ xsv slice worldcitiespop.csv -s 3173948 | xsv table\nCountry  City               AccentCity         Region  Population  Latitude     Longitude\nzw       zibalonkwe         Zibalonkwe         06                  -19.8333333  27.4666667\nzw       zibunkululu        Zibunkululu        06                  -19.6666667  27.6166667\nzw       ziga               Ziga               06                  -19.2166667  27.4833333\nzw       zikamanas village  Zikamanas Village  00                  -18.2166667  27.95\nzw       zimbabwe           Zimbabwe           07                  -20.2666667  30.9166667\nzw       zimre park         Zimre Park         04                  -17.8661111  31.2136111\nzw       ziyakamanas        Ziyakamanas        00                  -18.2166667  27.95\nzw       zizalisari         Zizalisari         04                  -17.7588889  31.0105556\nzw       zuzumba            Zuzumba            06                  -20.0333333  27.9333333\nzw       zvishavane         Zvishavane         07      79876       -20.3333333  30.0333333\n```\n\nThese commands are *instantaneous* because they run in time and memory\nproportional to the size of the slice (which means they will scale to\narbitrarily large CSV data).\n\nSwitching gears a little bit, you might not always want to see every column in\nthe CSV data. In this case, maybe we only care about the country, city and\npopulation. 
So let's take a look at 10 random rows:\n\n```bash\n$ xsv select Country,AccentCity,Population worldcitiespop.csv \\\n  | xsv sample 10 \\\n  | xsv table\nCountry  AccentCity       Population\ncn       Guankoushang\nza       Klipdrift\nma       Ouled Hammou\nfr       Les Gravues\nla       Ban Phadèng\nde       Lüdenscheid      80045\nqa       Umm ash Shubrum\nbd       Panditgoan\nus       Appleton\nua       Lukashenkivske\n```\n\nWhoops! It seems some cities don't have population counts. How pervasive is\nthat?\n\n```bash\n$ xsv frequency worldcitiespop.csv --limit 5\nfield,value,count\nCountry,cn,238985\nCountry,ru,215938\nCountry,id,176546\nCountry,us,141989\nCountry,ir,123872\nCity,san jose,328\nCity,san antonio,320\nCity,santa rosa,296\nCity,santa cruz,282\nCity,san juan,255\nAccentCity,San Antonio,317\nAccentCity,Santa Rosa,296\nAccentCity,Santa Cruz,281\nAccentCity,San Juan,254\nAccentCity,San Miguel,254\nRegion,04,159916\nRegion,02,142158\nRegion,07,126867\nRegion,03,122161\nRegion,05,118441\nPopulation,(NULL),3125978\nPopulation,2310,12\nPopulation,3097,11\nPopulation,983,11\nPopulation,2684,11\nLatitude,51.15,777\nLatitude,51.083333,772\nLatitude,50.933333,769\nLatitude,51.116667,769\nLatitude,51.133333,767\nLongitude,23.8,484\nLongitude,23.2,477\nLongitude,23.05,476\nLongitude,25.3,474\nLongitude,23.1,459\n```\n\n(The `xsv frequency` command builds a frequency table for each column in the\nCSV data. This one only took 5 seconds.)\n\nSo it seems that most cities do not have a population count associated with\nthem at all. 
No matter—we can adjust our previous command so that it only\nshows rows with a population count:\n\n```bash\n$ xsv search -s Population '[0-9]' worldcitiespop.csv \\\n  | xsv select Country,AccentCity,Population \\\n  | xsv sample 10 \\\n  | xsv table\nCountry  AccentCity       Population\nes       Barañáin         22264\nes       Puerto Real      36946\nat       Moosburg         4602\nhu       Hejobaba         1949\nru       Polyarnyye Zori  15092\ngr       Kandíla          1245\nis       Ólafsvík         992\nhu       Decs             4210\nbg       Sliven           94252\ngb       Leatherhead      43544\n```\n\nErk. Which country is `at`? No clue, but the Data Science Toolkit has a CSV\nfile called `countrynames.csv`. Let's grab it and do a join so we can see which\ncountries these are:\n\n```bash\ncurl -LO https://gist.githubusercontent.com/anonymous/063cb470e56e64e98cf1/raw/98e2589b801f6ca3ff900b01a87fbb7452eb35c7/countrynames.csv\n$ xsv headers countrynames.csv\n1   Abbrev\n2   Country\n$ xsv join --no-case  Country sample.csv Abbrev countrynames.csv | xsv table\nCountry  AccentCity       Population  Abbrev  Country\nes       Barañáin         22264       ES      Spain\nes       Puerto Real      36946       ES      Spain\nat       Moosburg         4602        AT      Austria\nhu       Hejobaba         1949        HU      Hungary\nru       Polyarnyye Zori  15092       RU      Russian Federation | Russia\ngr       Kandíla          1245        GR      Greece\nis       Ólafsvík         992         IS      Iceland\nhu       Decs             4210        HU      Hungary\nbg       Sliven           94252       BG      Bulgaria\ngb       Leatherhead      43544       GB      Great Britain | UK | England | Scotland | Wales | Northern Ireland | United Kingdom\n```\n\nWhoops, now we have two columns called `Country` and an `Abbrev` column that we\nno longer need. 
This is easy to fix by re-ordering columns with the `xsv\nselect` command:\n\n```bash\n$ xsv join --no-case  Country sample.csv Abbrev countrynames.csv \\\n  | xsv select 'Country[1],AccentCity,Population' \\\n  | xsv table\nCountry                                                                              AccentCity       Population\nSpain                                                                                Barañáin         22264\nSpain                                                                                Puerto Real      36946\nAustria                                                                              Moosburg         4602\nHungary                                                                              Hejobaba         1949\nRussian Federation | Russia                                                          Polyarnyye Zori  15092\nGreece                                                                               Kandíla          1245\nIceland                                                                              Ólafsvík         992\nHungary                                                                              Decs             4210\nBulgaria                                                                             Sliven           94252\nGreat Britain | UK | England | Scotland | Wales | Northern Ireland | United Kingdom  Leatherhead      43544\n```\n\nPerhaps we can do this with the original CSV data? 
Indeed we can—because\njoins in `xsv` are fast.\n\n```bash\n$ xsv join --no-case Abbrev countrynames.csv Country worldcitiespop.csv \\\n  | xsv select '!Abbrev,Country[1]' \\\n  > worldcitiespop_countrynames.csv\n$ xsv sample 10 worldcitiespop_countrynames.csv | xsv table\nCountry                      City                   AccentCity             Region  Population  Latitude    Longitude\nSri Lanka                    miriswatte             Miriswatte             36                  7.2333333   79.9\nRomania                      livezile               Livezile               26      1985        44.512222   22.863333\nIndonesia                    tawainalu              Tawainalu              22                  -4.0225     121.9273\nRussian Federation | Russia  otar                   Otar                   45                  56.975278   48.305278\nFrance                       le breuil-bois robert  le Breuil-Bois Robert  A8                  48.945567   1.717026\nFrance                       lissac                 Lissac                 B1                  45.103094   1.464927\nAlbania                      lumalasi               Lumalasi               46                  40.6586111  20.7363889\nChina                        motzushih              Motzushih              11                  27.65       111.966667\nRussian Federation | Russia  svakino                Svakino                69                  55.60211    34.559785\nRomania                      tirgu pancesti         Tirgu Pancesti         38                  46.216667   27.1\n```\n\nThe `!Abbrev,Country[1]` syntax means, \"remove the `Abbrev` column and remove\nthe second occurrence of the `Country` column.\" Since we joined with\n`countrynames.csv` first, the first `Country` name (fully expanded) is now\nincluded in the CSV data.\n\nThis `xsv join` command takes about 7 seconds on my machine. The performance\ncomes from constructing a very simple hash index of one of the CSV data files\ngiven. 
The `join` command does an inner join by default, but it also has left,\nright and full outer join support too.\n\n\n### Installation\n\nBinaries for Windows, Linux and macOS are available [from Github](https://github.com/BurntSushi/xsv/releases/latest).\n\nIf you're a **macOS Homebrew** user, then you can install xsv\nfrom homebrew-core:\n\n```\n$ brew install xsv\n```\n\nIf you're a **macOS MacPorts** user, then you can install xsv\nfrom the [official ports](https://www.macports.org/ports.php?by=name&substr=xsv):\n\n```\n$ sudo port install xsv\n```\n\nIf you're a **Nix/NixOS** user, you can install xsv from nixpkgs:\n\n```\n$ nix-env -i xsv\n```\n\nAlternatively, you can compile from source by\n[installing Cargo](https://crates.io/install)\n([Rust's](https://www.rust-lang.org/) package manager)\nand installing `xsv` using Cargo:\n\n```bash\ncargo install xsv\n```\n\nCompiling from this repository also works similarly:\n\n```bash\ngit clone git://github.com/BurntSushi/xsv\ncd xsv\ncargo build --release\n```\n\nCompilation will probably take a few minutes depending on your machine. The\nbinary will end up in `./target/release/xsv`.\n\n\n### Benchmarks\n\nI've compiled some [very rough\nbenchmarks](https://github.com/BurntSushi/xsv/blob/master/BENCHMARKS.md) of\nvarious `xsv` commands.\n\n\n### Motivation\n\nHere are several valid criticisms of this project:\n\n1. You shouldn't be working with CSV data because CSV is a terrible format.\n2. If your data is gigabytes in size, then CSV is the wrong storage type.\n3. Various SQL databases provide all of the operations available in `xsv` with\n   more sophisticated indexing support. And the performance is a zillion times\n   better.\n\nI'm sure there are more criticisms, but the impetus for this project was a 40GB\nCSV file that was handed to me. I was tasked with figuring out the shape of the\ndata inside of it and coming up with a way to integrate it into our existing\nsystem. 
It was then that I realized that every single CSV tool I knew about was\nwoefully inadequate. They were just too slow or didn't provide enough\nflexibility. (Another project I had consisted of a few dozen CSV files. They\nwere smaller than 40GB, but they were each supposed to represent the same kind\nof data. But they all had different and unintuitive column names. Useful\nCSV inspection tools were critical here, and they had to be reasonably fast.)\n\nThe key ingredients for helping me with my task were indexing, random sampling,\nsearching, slicing and selecting columns. All of these things made dealing with\n40GB of CSV data (or dozens of CSV files) a bit more manageable.\n\nGetting handed a large CSV file *once* was enough to launch me on this quest.\nFrom conversations I've had with others, CSV data files this large don't seem\nto be rare. Therefore, I believe there is room for a tool that has a\nhope of dealing with data that large.\n\n\n### Naming collision\n\nThis project is unrelated to another similar project with the same name:\nhttps://mj.ucw.cz/sw/xsv/\n"
  },
  {
    "path": "UNLICENSE",
    "content": "This is free and unencumbered software released into the public domain.\n\nAnyone is free to copy, modify, publish, use, compile, sell, or\ndistribute this software, either in source code form or as a compiled\nbinary, for any purpose, commercial or non-commercial, and by any\nmeans.\n\nIn jurisdictions that recognize copyright laws, the author or authors\nof this software dedicate any and all copyright interest in the\nsoftware to the public domain. We make this dedication for the benefit\nof the public at large and to the detriment of our heirs and\nsuccessors. We intend this dedication to be an overt act of\nrelinquishment in perpetuity of all present and future rights to this\nsoftware under copyright law.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\nEXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF\nMERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.\nIN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR\nOTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,\nARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR\nOTHER DEALINGS IN THE SOFTWARE.\n\nFor more information, please refer to <http://unlicense.org/>\n"
  },
  {
    "path": "appveyor.yml",
    "content": "\n# Inspired by https://github.com/habitat-sh/habitat/blob/master/appveyor.yml\ncache:\n  - c:\\cargo\\registry\n  - c:\\cargo\\git\n  - c:\\projects\\xsv\\target\n\ninit:\n  - mkdir c:\\cargo\n  - mkdir c:\\rustup\n  - SET PATH=c:\\cargo\\bin;%PATH%\n\nenvironment:\n  CARGO_HOME: \"c:\\\\cargo\"\n  RUSTUP_HOME: \"c:\\\\rustup\"\n  CARGO_TARGET_DIR: \"c:\\\\projects\\\\xsv\\\\target\"\n  global:\n    PROJECT_NAME: xsv\n    RUST_BACKTRACE: full\n  matrix:\n    # Stable channel\n    - TARGET: i686-pc-windows-gnu\n      CHANNEL: stable\n    - TARGET: i686-pc-windows-msvc\n      CHANNEL: stable\n    - TARGET: x86_64-pc-windows-gnu\n      CHANNEL: stable\n    - TARGET: x86_64-pc-windows-msvc\n      CHANNEL: stable\n\nmatrix:\n  fast_finish: true\n\n# Install Rust and Cargo\n# (Based on https://github.com/rust-lang/libc/blob/master/appveyor.yml)\ninstall:\n  - curl -sSf -o rustup-init.exe https://win.rustup.rs/\n  - rustup-init.exe -y --default-host %TARGET% --no-modify-path\n  - if defined MSYS2_BITS set PATH=%PATH%;C:\\msys64\\mingw%MSYS2_BITS%\\bin\n  - rustc -V\n  - cargo -V\n\n# ???\nbuild: false\n\n# Equivalent to Travis' `script` phase\n# TODO modify this phase as you see fit\ntest_script:\n  - cargo test --verbose\n\nbefore_deploy:\n  # Generate artifacts for release\n  - cargo build --release\n  - mkdir staging\n  # TODO update this part to copy the artifacts that make sense for your project\n  - copy target\\release\\xsv.exe staging\n  - cd staging\n    # release zipfile will look like 'rust-everywhere-v1.2.3-x86_64-pc-windows-msvc'\n  - 7z a ../%PROJECT_NAME%-%APPVEYOR_REPO_TAG_NAME%-%TARGET%.zip *\n  - appveyor PushArtifact ../%PROJECT_NAME%-%APPVEYOR_REPO_TAG_NAME%-%TARGET%.zip\n\ndeploy:\n  description: 'Windows release'\n  # All the zipped artifacts will be deployed\n  artifact: /.*\\.zip/\n  auth_token:\n    secure: vv4vBCEosGlyQjaEC1+kraP2P6O4CQSa+Tw50oHWFTGcmuXxaWS0/yEXbxsIRLpw\n  provider: GitHub\n  # deploy when a new tag is 
pushed and only on the stable channel\n  on:\n    # channel to use to produce the release artifacts\n    # NOTE make sure you only release *once* per target\n    # TODO you may want to pick a different channel\n    CHANNEL: stable\n    appveyor_repo_tag: true\n\nbranches:\n  only:\n    - appveyor\n    - /\\d+\\.\\d+\\.\\d+/\n  except:\n    - master\n"
  },
  {
    "path": "ci/before_deploy.sh",
    "content": "# `before_deploy` phase: here we package the build artifacts\n\nset -ex\n\n. $(dirname $0)/utils.sh\n\n# Generate artifacts for release\nmk_artifacts() {\n    cargo build --target $TARGET --release\n}\n\nmk_tarball() {\n    # create a \"staging\" directory\n    local td=$(mktempd)\n    local out_dir=$(pwd)\n\n    # TODO update this part to copy the artifacts that make sense for your project\n    # NOTE All Cargo build artifacts will be under the 'target/$TARGET/{debug,release}'\n    cp target/$TARGET/release/xsv $td\n\n    pushd $td\n\n    # release tarball will look like 'rust-everywhere-v1.2.3-x86_64-unknown-linux-gnu.tar.gz'\n    tar czf $out_dir/${PROJECT_NAME}-${TRAVIS_TAG}-${TARGET}.tar.gz *\n\n    popd\n    rm -r $td\n}\n\nmain() {\n    mk_artifacts\n    mk_tarball\n}\n\nmain\n"
  },
  {
    "path": "ci/install.sh",
    "content": "# `install` phase: install stuff needed for the `script` phase\n\nset -ex\n\n. $(dirname $0)/utils.sh\n\ninstall_c_toolchain() {\n    case $TARGET in\n        aarch64-unknown-linux-gnu)\n            sudo apt-get install -y --no-install-recommends \\\n                 gcc-aarch64-linux-gnu libc6-arm64-cross libc6-dev-arm64-cross\n            ;;\n        *)\n            # For other targets, this is handled by addons.apt.packages in .travis.yml\n            ;;\n    esac\n}\n\ninstall_rustup() {\n    curl https://sh.rustup.rs -sSf \\\n      | sh -s -- -y --default-toolchain=$TRAVIS_RUST_VERSION\n\n    rustc -V\n    cargo -V\n}\n\ninstall_standard_crates() {\n    if [ $(host) != \"$TARGET\" ]; then\n        rustup target add $TARGET\n    fi\n}\n\nconfigure_cargo() {\n    local prefix=$(gcc_prefix)\n\n    if [ ! -z $prefix ]; then\n        # information about the cross compiler\n        ${prefix}gcc -v\n\n        # tell cargo which linker to use for cross compilation\n        mkdir -p .cargo\n        cat >>.cargo/config <<EOF\n[target.$TARGET]\nlinker = \"${prefix}gcc\"\nEOF\n    fi\n}\n\nmain() {\n    install_c_toolchain\n    install_rustup\n    install_standard_crates\n    configure_cargo\n\n    # TODO if you need to install extra stuff add it here\n}\n\nmain\n"
  },
  {
    "path": "ci/script.sh",
    "content": "# `script` phase: you usually build, test and generate docs in this phase\n\nset -ex\n\n. $(dirname $0)/utils.sh\n\n# NOTE Workaround for rust-lang/rust#31907 - disable doc tests when cross compiling\n# This has been fixed in the nightly channel but it would take a while to reach the other channels\ndisable_cross_doctests() {\n    if [ $(host) != \"$TARGET\" ] && [ \"$TRAVIS_RUST_VERSION\" = \"stable\" ]; then\n        if [ \"$TRAVIS_OS_NAME\" = \"osx\" ]; then\n            brew install gnu-sed --default-names\n        fi\n\n        find src -name '*.rs' -type f | xargs sed -i -e 's:\\(//.\\s*```\\):\\1 ignore,:g'\n    fi\n}\n\n# TODO modify this function as you see fit\n# PROTIP Always pass `--target $TARGET` to cargo commands, this makes cargo output build artifacts\n# to target/$TARGET/{debug,release} which can reduce the number of needed conditionals in the\n# `before_deploy`/packaging phase\nrun_test_suite() {\n    case $TARGET in\n        # configure emulation for transparent execution of foreign binaries\n        aarch64-unknown-linux-gnu)\n            export QEMU_LD_PREFIX=/usr/aarch64-linux-gnu\n            ;;\n        arm*-unknown-linux-gnueabihf)\n            export QEMU_LD_PREFIX=/usr/arm-linux-gnueabihf\n            ;;\n        *)\n            ;;\n    esac\n\n    if [ ! -z \"$QEMU_LD_PREFIX\" ]; then\n        # Run tests on a single thread when using QEMU user emulation\n        export RUST_TEST_THREADS=1\n    fi\n\n    cargo build --target $TARGET --verbose\n    cargo test --target $TARGET\n\n    # sanity check the file type\n    file target/$TARGET/debug/xsv\n}\n\nmain() {\n    disable_cross_doctests\n    run_test_suite\n}\n\nmain\n"
  },
  {
    "path": "ci/utils.sh",
    "content": "mktempd() {\n    echo $(mktemp -d 2>/dev/null || mktemp -d -t tmp)\n}\n\nhost() {\n    case \"$TRAVIS_OS_NAME\" in\n        linux)\n            echo x86_64-unknown-linux-gnu\n            ;;\n        osx)\n            echo x86_64-apple-darwin\n            ;;\n    esac\n}\n\ngcc_prefix() {\n    case \"$TARGET\" in\n        aarch64-unknown-linux-gnu)\n            echo aarch64-linux-gnu-\n            ;;\n        arm*-gnueabihf)\n            echo arm-linux-gnueabihf-\n            ;;\n        *)\n            return\n            ;;\n    esac\n}\n\ndobin() {\n    [ -z $MAKE_DEB ] && die 'dobin: $MAKE_DEB not set'\n    [ $# -lt 1 ] && die \"dobin: at least one argument needed\"\n\n    local f prefix=$(gcc_prefix)\n    for f in \"$@\"; do\n        install -m0755 $f $dtd/debian/usr/bin/\n        ${prefix}strip -s $dtd/debian/usr/bin/$(basename $f)\n    done\n}\n\narchitecture() {\n    case $1 in\n        x86_64-unknown-linux-gnu|x86_64-unknown-linux-musl)\n            echo amd64\n            ;;\n        i686-unknown-linux-gnu|i686-unknown-linux-musl)\n            echo i386\n            ;;\n        arm*-unknown-linux-gnueabihf)\n            echo armhf\n            ;;\n        *)\n            die \"architecture: unexpected target $TARGET\"\n            ;;\n    esac\n}\n"
  },
  {
    "path": "scripts/benchmark-basic",
    "content": "#!/bin/sh\n\n# This script does some very basic benchmarks with 'xsv' on a city population\n# data set (which is a strict subset of the `worldcitiespop` data set). If it\n# doesn't exist on your system, it will be downloaded to /tmp for you.\n#\n# These aren't meant to be overly rigorous, but they should be enough to catch\n# significant regressions.\n#\n# Make sure you're using an `xsv` generated by `cargo build --release`.\n\nset -e\n\npat=\"$1\"\ndata=/tmp/worldcitiespop_mil.csv\ndata_idx=/tmp/worldcitiespop_mil.csv.idx\nif [ ! -r \"$data\" ]; then\n  curl -sS https://burntsushi.net/stuff/worldcitiespop_mil.csv > \"$data\"\nfi\ndata_size=$(stat --format '%s' \"$data\")\n\nfunction real_seconds {\n  cmd=$(echo $@ \"> /dev/null 2>&1\")\n  t=$(\n    $(which time) -p sh -c \"$cmd\" 2>&1 \\\n      | grep '^real' \\\n      | awk '{print $2}')\n  if [ $(echo \"$t < 0.01\" | bc) = 1 ]; then\n    t=0.01\n  fi\n  echo $t\n}\n\nfunction benchmark {\n  rm -f \"$data_idx\"\n  t1=$(real_seconds \"$@\")\n  rm -f \"$data_idx\"\n  t2=$(real_seconds \"$@\")\n  rm -f \"$data_idx\"\n  t3=$(real_seconds \"$@\")\n  echo \"scale=2; ($t1 + $t2 + $t3) / 3\" | bc\n}\n\nfunction benchmark_with_index {\n  rm -f \"$data_idx\"\n  xsv index \"$data\"\n  t1=$(real_seconds \"$@\")\n  t2=$(real_seconds \"$@\")\n  t3=$(real_seconds \"$@\")\n  rm -f \"$data_idx\"\n  echo \"scale=2; ($t1 + $t2 + $t3) / 3\" | bc\n}\n\nfunction run {\n  index=\n  while true; do\n    case \"$1\" in\n      --index) index=\"yes\" && shift ;;\n      *) break ;;\n    esac\n  done\n  name=\"$1\"\n  shift\n\n  if [ -z \"$pat\" ] || echo \"$name\" | grep -E -q \"^$pat$\"; then\n    if [ -z \"$index\" ]; then\n      t=$(benchmark \"$@\")\n    else\n      t=$(benchmark_with_index \"$@\")\n    fi\n    mb_per=$(echo \"scale=2; ($data_size / $t) / 2^20\" | bc)\n    printf \"%s\\t%0.02f seconds\\t%s MB/sec\\n\" $name $t $mb_per\n  fi\n}\n\nrun count xsv count \"$data\"\nrun flatten xsv flatten \"$data\"\nrun 
flatten_condensed xsv flatten \"$data\" --condense 50\nrun frequency xsv frequency \"$data\"\nrun index xsv index \"$data\"\nrun sample_10 xsv sample 10 \"$data\"\nrun sample_1000 xsv sample 1000 \"$data\"\nrun sample_100000 xsv sample 100000 \"$data\"\nrun search xsv search -s Country \"'(?i)us'\" \"$data\"\nrun select xsv select Country \"$data\"\nrun sort xsv sort -s AccentCity \"$data\"\nrun slice_one_middle xsv slice -i 500000 \"$data\"\nrun --index slice_one_middle_index xsv slice -i 500000 \"$data\"\nrun stats xsv stats \"$data\"\nrun --index stats_index xsv stats \"$data\"\nrun stats_everything xsv stats \"$data\" --everything\nrun --index stats_everything_index xsv stats \"$data\" --everything\n"
  },
  {
    "path": "scripts/build-release",
    "content": "#!/bin/sh\n\nversion=$(git describe --abbrev=0 --tags)\nname=\"xsv-$version-x86_64-unknown-linux-gnu\"\n\nmkdir -p ./builds/\ncargo build --release\nrm -rf \"/tmp/$name\"\nmkdir \"/tmp/$name\"\ncp ./target/release/xsv \"/tmp/$name/\"\ncp ./README.md \"/tmp/$name/\"\ncp ./UNLICENSE \"/tmp/$name/\"\ntar zcf \"./builds/$name.tar.gz\" -C /tmp $name\n\n"
  },
  {
    "path": "scripts/github-release",
    "content": "#!/bin/sh\n\nversion=$(git describe --abbrev=0 --tags)\nname=\"xsv-$version-x86_64-unknown-linux-gnu\"\n\ngithub-release release --user BurntSushi --repo xsv --tag $version \\\n  --name \"xsv-$version\" --pre-release\n"
  },
  {
    "path": "scripts/github-upload",
    "content": "#!/bin/sh\n\nversion=$(git describe --abbrev=0 --tags)\nname=\"xsv-$version-x86_64-unknown-linux-gnu\"\n\n./scripts/build-release\ngithub-release upload --user BurntSushi --repo xsv --tag $version \\\n  --name \"$name.tar.gz\" \\\n  --file \"./builds/$name.tar.gz\"\n"
  },
  {
    "path": "session.vim",
    "content": "au BufWritePost *.rs silent!make ctags > /dev/null 2>&1\n"
  },
  {
    "path": "src/cmd/cat.rs",
    "content": "use csv;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse util;\n\nstatic USAGE: &'static str = \"\nConcatenates CSV data by column or by row.\n\nWhen concatenating by column, the columns will be written in the same order as\nthe inputs given. The number of rows in the result is always equivalent to to\nthe minimum number of rows across all given CSV data. (This behavior can be\nreversed with the '--pad' flag.)\n\nWhen concatenating by row, all CSV data must have the same number of columns.\nIf you need to rearrange the columns or fix the lengths of records, use the\n'select' or 'fixlengths' commands. Also, only the headers of the *first* CSV\ndata given are used. Headers in subsequent inputs are ignored. (This behavior\ncan be disabled with --no-headers.)\n\nUsage:\n    xsv cat rows    [options] [<input>...]\n    xsv cat columns [options] [<input>...]\n    xsv cat --help\n\ncat options:\n    -p, --pad              When concatenating columns, this flag will cause\n                           all records to appear. It will pad each row if\n                           other CSV data isn't long enough.\n\nCommon options:\n    -h, --help             Display this message\n    -o, --output <file>    Write output to <file> instead of stdout.\n    -n, --no-headers       When set, the first row will NOT be interpreted\n                           as column names. Note that this has no effect when\n                           concatenating columns.\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. 
(default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    cmd_rows: bool,\n    cmd_columns: bool,\n    arg_input: Vec<String>,\n    flag_pad: bool,\n    flag_output: Option<String>,\n    flag_no_headers: bool,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    if args.cmd_rows {\n        args.cat_rows()\n    } else if args.cmd_columns {\n        args.cat_columns()\n    } else {\n        unreachable!();\n    }\n}\n\nimpl Args {\n    fn configs(&self) -> CliResult<Vec<Config>> {\n        util::many_configs(&*self.arg_input,\n                           self.flag_delimiter,\n                           self.flag_no_headers)\n             .map_err(From::from)\n    }\n\n    fn cat_rows(&self) -> CliResult<()> {\n        let mut row = csv::ByteRecord::new();\n        let mut wtr = Config::new(&self.flag_output).writer()?;\n        for (i, conf) in self.configs()?.into_iter().enumerate() {\n            let mut rdr = conf.reader()?;\n            if i == 0 {\n                conf.write_headers(&mut rdr, &mut wtr)?;\n            }\n            while rdr.read_byte_record(&mut row)? {\n                wtr.write_byte_record(&row)?;\n            }\n        }\n        wtr.flush().map_err(From::from)\n    }\n\n    fn cat_columns(&self) -> CliResult<()> {\n        let mut wtr = Config::new(&self.flag_output).writer()?;\n        let mut rdrs = self.configs()?\n            .into_iter()\n            .map(|conf| conf.no_headers(true).reader())\n            .collect::<Result<Vec<_>, _>>()?;\n\n        // Find the lengths of each record. 
If a length varies, then an error\n        // will occur so we can rely on the first length being the correct one.\n        let mut lengths = vec![];\n        for rdr in &mut rdrs {\n            lengths.push(rdr.byte_headers()?.len());\n        }\n\n        let mut iters = rdrs.iter_mut()\n                            .map(|rdr| rdr.byte_records())\n                            .collect::<Vec<_>>();\n        'OUTER: loop {\n            let mut record = csv::ByteRecord::new();\n            let mut num_done = 0;\n            for (iter, &len) in iters.iter_mut().zip(lengths.iter()) {\n                match iter.next() {\n                    None => {\n                        num_done += 1;\n                        if self.flag_pad {\n                            for _ in 0..len {\n                                record.push_field(b\"\");\n                            }\n                        } else {\n                            break 'OUTER;\n                        }\n                    }\n                    Some(Err(err)) => return fail!(err),\n                    Some(Ok(next)) => record.extend(&next),\n                }\n            }\n            // Only needed when `--pad` is set.\n            // When not set, the OUTER loop breaks when the shortest iterator\n            // is exhausted.\n            if num_done >= iters.len() {\n                break 'OUTER;\n            }\n            wtr.write_byte_record(&record)?;\n        }\n        wtr.flush().map_err(From::from)\n    }\n}\n"
  },
  {
    "path": "src/cmd/count.rs",
    "content": "use csv;\n\nuse CliResult;\nuse config::{Delimiter, Config};\nuse util;\n\nstatic USAGE: &'static str = \"\nPrints a count of the number of records in the CSV data.\n\nNote that the count will not include the header row (unless --no-headers is\ngiven).\n\nUsage:\n    xsv count [options] [<input>]\n\nCommon options:\n    -h, --help             Display this message\n    -n, --no-headers       When set, the first row will not be included in\n                           the count.\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. (default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    flag_no_headers: bool,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    let conf = Config::new(&args.arg_input)\n        .delimiter(args.flag_delimiter)\n        .no_headers(args.flag_no_headers);\n\n    let count =\n        match conf.indexed()? {\n            Some(idx) => idx.count(),\n            None => {\n                let mut rdr = conf.reader()?;\n                let mut count = 0u64;\n                let mut record = csv::ByteRecord::new();\n                while rdr.read_byte_record(&mut record)? {\n                    count += 1;\n                }\n                count\n            }\n        };\n    Ok(println!(\"{}\", count))\n}\n"
  },
  {
    "path": "src/cmd/fixlengths.rs",
    "content": "use std::cmp;\n\nuse csv;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse util;\n\nstatic USAGE: &'static str = \"\nTransforms CSV data so that all records have the same length. The length is\nthe length of the longest record in the data (not counting trailing empty fields,\nbut at least 1). Records with smaller lengths are padded with empty fields.\n\nThis requires two complete scans of the CSV data: one for determining the\nrecord size and one for the actual transform. Because of this, the input\ngiven must be a file and not stdin.\n\nAlternatively, if --length is set, then all records are forced to that length.\nThis requires a single pass and can be done with stdin.\n\nUsage:\n    xsv fixlengths [options] [<input>]\n\nfixlengths options:\n    -l, --length <arg>     Forcefully set the length of each record. If a\n                           record is not the size given, then it is truncated\n                           or expanded as appropriate.\n\nCommon options:\n    -h, --help             Display this message\n    -o, --output <file>    Write output to <file> instead of stdout.\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. 
(default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    flag_length: Option<usize>,\n    flag_output: Option<String>,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    let config = Config::new(&args.arg_input)\n        .delimiter(args.flag_delimiter)\n        .no_headers(true)\n        .flexible(true);\n    let length = match args.flag_length {\n        Some(length) => {\n            if length == 0 {\n                return fail!(\"Length must be greater than 0.\");\n            }\n            length\n        }\n        None => {\n            if config.is_std() {\n                return fail!(\"<stdin> cannot be used in this command. \\\n                              Please specify a file path.\");\n            }\n            let mut maxlen = 0usize;\n            let mut rdr = config.reader()?;\n            let mut record = csv::ByteRecord::new();\n            while rdr.read_byte_record(&mut record)? {\n                let mut index = 0;\n                let mut nonempty_count = 0;\n                for field in &record {\n                    index += 1;\n                    if index == 1 || !field.is_empty() {\n                        nonempty_count = index;\n                    }\n                }\n                maxlen = cmp::max(maxlen, nonempty_count);\n            }\n            maxlen\n        }\n    };\n\n    let mut rdr = config.reader()?;\n    let mut wtr = Config::new(&args.flag_output).writer()?;\n    for r in rdr.byte_records() {\n        let mut r = r?;\n        if length >= r.len() {\n            for _ in r.len()..length {\n                r.push_field(b\"\");\n            }\n        } else {\n            r.truncate(length);\n        }\n        wtr.write_byte_record(&r)?;\n    }\n    wtr.flush()?;\n    Ok(())\n}\n"
  },
  {
    "path": "src/cmd/flatten.rs",
    "content": "use std::borrow::Cow;\nuse std::io::{self, Write};\n\nuse tabwriter::TabWriter;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse util;\n\nstatic USAGE: &'static str = \"\nPrints flattened records such that fields are labeled separated by a new line.\nThis mode is particularly useful for viewing one record at a time. Each\nrecord is separated by a special '#' character (on a line by itself), which\ncan be changed with the --separator flag.\n\nThere is also a condensed view (-c or --condense) that will shorten the\ncontents of each field to provide a summary view.\n\nUsage:\n    xsv flatten [options] [<input>]\n\nflatten options:\n    -c, --condense <arg>  Limits the length of each field to the value\n                           specified. If the field is UTF-8 encoded, then\n                           <arg> refers to the number of code points.\n                           Otherwise, it refers to the number of bytes.\n    -s, --separator <arg>  A string of characters to write after each record.\n                           When non-empty, a new line is automatically\n                           appended to the separator.\n                           [default: #]\n\nCommon options:\n    -h, --help             Display this message\n    -n, --no-headers       When set, the first row will not be interpreted\n                           as headers. When set, the name of each field\n                           will be its index.\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. 
(default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    flag_condense: Option<usize>,\n    flag_separator: String,\n    flag_no_headers: bool,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    let rconfig = Config::new(&args.arg_input)\n        .delimiter(args.flag_delimiter)\n        .no_headers(args.flag_no_headers);\n    let mut rdr = rconfig.reader()?;\n    let headers = rdr.byte_headers()?.clone();\n\n    let mut wtr = TabWriter::new(io::stdout());\n    let mut first = true;\n    for r in rdr.byte_records() {\n        if !first && !args.flag_separator.is_empty() {\n            writeln!(&mut wtr, \"{}\", args.flag_separator)?;\n        }\n        first = false;\n        let r = r?;\n        for (i, (header, field)) in headers.iter().zip(&r).enumerate() {\n            if rconfig.no_headers {\n                write!(&mut wtr, \"{}\", i)?;\n            } else {\n                wtr.write_all(&header)?;\n            }\n            wtr.write_all(b\"\\t\")?;\n            wtr.write_all(&*util::condense(\n                Cow::Borrowed(&*field), args.flag_condense))?;\n            wtr.write_all(b\"\\n\")?;\n        }\n    }\n    wtr.flush()?;\n    Ok(())\n}\n"
  },
  {
    "path": "src/cmd/fmt.rs",
    "content": "use csv;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse util;\n\nstatic USAGE: &'static str = \"\nFormats CSV data with a custom delimiter or CRLF line endings.\n\nGenerally, all commands in xsv output CSV data in a default format, which is\nthe same as the default format for reading CSV data. This makes it easy to\npipe multiple xsv commands together. However, you may want the final result to\nhave a specific delimiter or record separator, and this is where 'xsv fmt' is\nuseful.\n\nUsage:\n    xsv fmt [options] [<input>]\n\nfmt options:\n    -t, --out-delimiter <arg>  The field delimiter for writing CSV data.\n                               [default: ,]\n    --crlf                     Use '\\\\r\\\\n' line endings in the output.\n    --ascii                    Use ASCII field and record separators.\n    --quote <arg>              The quote character to use. [default: \\\"]\n    --quote-always             Put quotes around every value.\n    --escape <arg>             The escape character to use. When not specified,\n                               quotes are escaped by doubling them.\n\nCommon options:\n    -h, --help             Display this message\n    -o, --output <file>    Write output to <file> instead of stdout.\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. 
(default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    flag_out_delimiter: Option<Delimiter>,\n    flag_crlf: bool,\n    flag_ascii: bool,\n    flag_output: Option<String>,\n    flag_delimiter: Option<Delimiter>,\n    flag_quote: Delimiter,\n    flag_quote_always: bool,\n    flag_escape: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n\n    let rconfig = Config::new(&args.arg_input)\n        .delimiter(args.flag_delimiter)\n        .no_headers(true);\n    let mut wconfig = Config::new(&args.flag_output)\n        .delimiter(args.flag_out_delimiter)\n        .crlf(args.flag_crlf);\n\n    if args.flag_ascii {\n        wconfig = wconfig\n            .delimiter(Some(Delimiter(b'\\x1f')))\n            .terminator(csv::Terminator::Any(b'\\x1e'));\n    }\n    if args.flag_quote_always {\n        wconfig = wconfig.quote_style(csv::QuoteStyle::Always);\n    }\n    if let Some(escape) = args.flag_escape {\n        wconfig = wconfig.escape(Some(escape.as_byte())).double_quote(false);\n    }\n    wconfig = wconfig.quote(args.flag_quote.as_byte());\n\n\n    let mut rdr = rconfig.reader()?;\n    let mut wtr = wconfig.writer()?;\n    let mut r = csv::ByteRecord::new();\n    while rdr.read_byte_record(&mut r)? {\n        wtr.write_byte_record(&r)?;\n    }\n    wtr.flush()?;\n    Ok(())\n}\n"
  },
  {
    "path": "src/cmd/frequency.rs",
    "content": "use std::fs;\nuse std::io;\n\nuse channel;\nuse csv;\nuse stats::{Frequencies, merge_all};\nuse threadpool::ThreadPool;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse index::Indexed;\nuse select::{SelectColumns, Selection};\nuse util;\n\nstatic USAGE: &'static str = \"\nCompute a frequency table on CSV data.\n\nThe frequency table is formatted as CSV data:\n\n    field,value,count\n\nBy default, there is a row for the N most frequent values for each field in the\ndata. The order and number of values can be tweaked with --asc and --limit,\nrespectively.\n\nSince this computes an exact frequency table, memory proportional to the\ncardinality of each column is required.\n\nUsage:\n    xsv frequency [options] [<input>]\n\nfrequency options:\n    -s, --select <arg>     Select a subset of columns to compute frequencies\n                           for. See 'xsv select --help' for the format\n                           details. This is provided here because piping 'xsv\n                           select' into 'xsv frequency' will disable the use\n                           of indexing.\n    -l, --limit <arg>      Limit the frequency table to the N most common\n                           items. Set to '0' to disable a limit.\n                           [default: 10]\n    -a, --asc              Sort the frequency tables in ascending order by\n                           count. The default is descending order.\n    --no-nulls             Don't include NULLs in the frequency table.\n    -j, --jobs <arg>       The number of jobs to run in parallel.\n                           This works better when the given CSV data has\n                           an index already created. 
Note that a file handle\n                           is opened for each job.\n                           When set to '0', the number of jobs is set to the\n                           number of CPUs detected.\n                           [default: 0]\n\nCommon options:\n    -h, --help             Display this message\n    -o, --output <file>    Write output to <file> instead of stdout.\n    -n, --no-headers       When set, the first row will NOT be included\n                           in the frequency table. Additionally, the 'field'\n                           column will be 1-based indices instead of header\n                           names.\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. (default: ,)\n\";\n\n#[derive(Clone, Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    flag_select: SelectColumns,\n    flag_limit: usize,\n    flag_asc: bool,\n    flag_no_nulls: bool,\n    flag_jobs: usize,\n    flag_output: Option<String>,\n    flag_no_headers: bool,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    let rconfig = args.rconfig();\n\n    let mut wtr = Config::new(&args.flag_output).writer()?;\n    let (headers, tables) = match args.rconfig().indexed()? 
{\n        Some(ref mut idx) if args.njobs() > 1 => args.parallel_ftables(idx),\n        _ => args.sequential_ftables(),\n    }?;\n\n    wtr.write_record(vec![\"field\", \"value\", \"count\"])?;\n    let head_ftables = headers.into_iter().zip(tables.into_iter());\n    for (i, (header, ftab)) in head_ftables.enumerate() {\n        let mut header = header.to_vec();\n        if rconfig.no_headers {\n            header = (i+1).to_string().into_bytes();\n        }\n        for (value, count) in args.counts(&ftab).into_iter() {\n            let count = count.to_string();\n            let row = vec![&*header, &*value, count.as_bytes()];\n            wtr.write_record(row)?;\n        }\n    }\n    Ok(())\n}\n\ntype ByteString = Vec<u8>;\ntype Headers = csv::ByteRecord;\ntype FTable = Frequencies<Vec<u8>>;\ntype FTables = Vec<Frequencies<Vec<u8>>>;\n\nimpl Args {\n    fn rconfig(&self) -> Config {\n        Config::new(&self.arg_input)\n            .delimiter(self.flag_delimiter)\n            .no_headers(self.flag_no_headers)\n            .select(self.flag_select.clone())\n    }\n\n    fn counts(&self, ftab: &FTable) -> Vec<(ByteString, u64)> {\n        let mut counts = if self.flag_asc {\n            ftab.least_frequent()\n        } else {\n            ftab.most_frequent()\n        };\n        if self.flag_limit > 0 {\n            counts = counts.into_iter().take(self.flag_limit).collect();\n        }\n        counts.into_iter().map(|(bs, c)| {\n            if b\"\" == &**bs {\n                (b\"(NULL)\"[..].to_vec(), c)\n            } else {\n                (bs.clone(), c)\n            }\n        }).collect()\n    }\n\n    fn sequential_ftables(&self) -> CliResult<(Headers, FTables)> {\n        let mut rdr = self.rconfig().reader()?;\n        let (headers, sel) = self.sel_headers(&mut rdr)?;\n        Ok((headers, self.ftables(&sel, rdr.byte_records())?))\n    }\n\n    fn parallel_ftables(&self, idx: &mut Indexed<fs::File, fs::File>)\n                       -> 
CliResult<(Headers, FTables)> {\n        let mut rdr = self.rconfig().reader()?;\n        let (headers, sel) = self.sel_headers(&mut rdr)?;\n\n        if idx.count() == 0 {\n            return Ok((headers, vec![]));\n        }\n\n        let chunk_size = util::chunk_size(idx.count() as usize, self.njobs());\n        let nchunks = util::num_of_chunks(idx.count() as usize, chunk_size);\n\n        let pool = ThreadPool::new(self.njobs());\n        let (send, recv) = channel::bounded(0);\n        for i in 0..nchunks {\n            let (send, args, sel) = (send.clone(), self.clone(), sel.clone());\n            pool.execute(move || {\n                let mut idx = args.rconfig().indexed().unwrap().unwrap();\n                idx.seek((i * chunk_size) as u64).unwrap();\n                let it = idx.byte_records().take(chunk_size);\n                send.send(args.ftables(&sel, it).unwrap());\n            });\n        }\n        drop(send);\n        Ok((headers, merge_all(recv).unwrap()))\n    }\n\n    fn ftables<I>(&self, sel: &Selection, it: I) -> CliResult<FTables>\n            where I: Iterator<Item=csv::Result<csv::ByteRecord>> {\n        let null = &b\"\"[..].to_vec();\n        let nsel = sel.normal();\n        let mut tabs: Vec<_> =\n            (0..nsel.len()).map(|_| Frequencies::new()).collect();\n        for row in it {\n            let row = row?;\n            for (i, field) in nsel.select(row.into_iter()).enumerate() {\n                let field = trim(field.to_vec());\n                if !field.is_empty() {\n                    tabs[i].add(field);\n                } else {\n                    if !self.flag_no_nulls {\n                        tabs[i].add(null.clone());\n                    }\n                }\n            }\n        }\n        Ok(tabs)\n    }\n\n    fn sel_headers<R: io::Read>(&self, rdr: &mut csv::Reader<R>)\n                  -> CliResult<(csv::ByteRecord, Selection)> {\n        let headers = rdr.byte_headers()?;\n        let sel = 
self.rconfig().selection(headers)?;\n        Ok((sel.select(headers).map(|h| h.to_vec()).collect(), sel))\n    }\n\n    fn njobs(&self) -> usize {\n        if self.flag_jobs == 0 { util::num_cpus() } else { self.flag_jobs }\n    }\n}\n\nfn trim(bs: ByteString) -> ByteString {\n    match String::from_utf8(bs) {\n        Ok(s) => s.trim().as_bytes().to_vec(),\n        Err(bs) => bs.into_bytes(),\n    }\n}\n"
  },
  {
    "path": "src/cmd/headers.rs",
    "content": "use std::io;\n\nuse tabwriter::TabWriter;\n\nuse CliResult;\nuse config::Delimiter;\nuse util;\n\nstatic USAGE: &'static str = \"\nPrints the fields of the first row in the CSV data.\n\nThese names can be used in commands like 'select' to refer to columns in the\nCSV data.\n\nNote that multiple CSV files may be given to this command. This is useful with\nthe --intersect flag.\n\nUsage:\n    xsv headers [options] [<input>...]\n\nheaders options:\n    -j, --just-names       Only show the header names (hide column index).\n                           This is automatically enabled if more than one\n                           input is given.\n    --intersect            Shows the intersection of all headers in all of\n                           the inputs given.\n\nCommon options:\n    -h, --help             Display this message\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. (default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_input: Vec<String>,\n    flag_just_names: bool,\n    flag_intersect: bool,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    let configs = util::many_configs(\n        &*args.arg_input, args.flag_delimiter, true)?;\n\n    let num_inputs = configs.len();\n    let mut headers: Vec<Vec<u8>> = vec![];\n    for conf in configs.into_iter() {\n        let mut rdr = conf.reader()?;\n        for header in rdr.byte_headers()?.iter() {\n            if !args.flag_intersect\n                || !headers.iter().any(|h| &**h == header)\n            {\n                headers.push(header.to_vec());\n            }\n        }\n    }\n\n    let mut wtr: Box<io::Write> =\n        if args.flag_just_names {\n            Box::new(io::stdout())\n        } else {\n            Box::new(TabWriter::new(io::stdout()))\n        };\n    for (i, header) in 
headers.into_iter().enumerate() {\n        if num_inputs == 1 && !args.flag_just_names {\n            write!(&mut wtr, \"{}\\t\", i+1)?;\n        }\n        wtr.write_all(&header)?;\n        wtr.write_all(b\"\\n\")?;\n    }\n    wtr.flush()?;\n    Ok(())\n}\n"
  },
  {
    "path": "src/cmd/index.rs",
    "content": "use std::fs;\nuse std::io;\nuse std::path::{Path, PathBuf};\n\nuse csv_index::RandomAccessSimple;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse util;\n\nstatic USAGE: &'static str = \"\nCreates an index of the given CSV data, which can make other operations like\nslicing, splitting and gathering statistics much faster.\n\nNote that this does not accept CSV data on stdin. You must give a file\npath. The index is created at 'path/to/input.csv.idx'. The index will be\nautomatically used by commands that can benefit from it. If the original CSV\ndata changes after the index is made, commands that try to use it will result\nin an error (you have to regenerate the index before it can be used again).\n\nUsage:\n    xsv index [options] <input>\n    xsv index --help\n\nindex options:\n    -o, --output <file>    Write index to <file> instead of <input>.idx.\n                           Generally, this is not currently useful because\n                           the only way to use an index is if it is specially\n                           named <input>.idx.\n\nCommon options:\n    -h, --help             Display this message\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. (default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_input: String,\n    flag_output: Option<String>,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n\n    let pidx = match args.flag_output {\n        None => util::idx_path(&Path::new(&args.arg_input)),\n        Some(p) => PathBuf::from(&p),\n    };\n\n    let rconfig = Config::new(&Some(args.arg_input))\n                         .delimiter(args.flag_delimiter);\n    let mut rdr = rconfig.reader_file()?;\n    let mut wtr = io::BufWriter::new(fs::File::create(&pidx)?);\n    RandomAccessSimple::create(&mut rdr, &mut wtr)?;\n    Ok(())\n}\n"
  },
  {
    "path": "src/cmd/input.rs",
    "content": "use csv;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse util;\n\nstatic USAGE: &'static str = \"\nRead CSV data with special quoting rules.\n\nGenerally, all xsv commands support basic options like specifying the delimiter\nused in CSV data. This does not cover all possible types of CSV data. For\nexample, some CSV files don't use '\\\"' for quotes or use different escaping\nstyles.\n\nUsage:\n    xsv input [options] [<input>]\n\ninput options:\n    --quote <arg>          The quote character to use. [default: \\\"]\n    --escape <arg>         The escape character to use. When not specified,\n                           quotes are escaped by doubling them.\n    --no-quoting           Disable quoting completely.\n\nCommon options:\n    -h, --help             Display this message\n    -o, --output <file>    Write output to <file> instead of stdout.\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. (default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    flag_output: Option<String>,\n    flag_delimiter: Option<Delimiter>,\n    flag_quote: Delimiter,\n    flag_escape: Option<Delimiter>,\n    flag_no_quoting: bool,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    let mut rconfig = Config::new(&args.arg_input)\n        .delimiter(args.flag_delimiter)\n        .no_headers(true)\n        .quote(args.flag_quote.as_byte());\n    let wconfig = Config::new(&args.flag_output);\n\n    if let Some(escape) = args.flag_escape {\n        rconfig = rconfig.escape(Some(escape.as_byte())).double_quote(false);\n    }\n    if args.flag_no_quoting {\n        rconfig = rconfig.quoting(false);\n    }\n\n    let mut rdr = rconfig.reader()?;\n    let mut wtr = wconfig.writer()?;\n    let mut row = csv::ByteRecord::new();\n    while rdr.read_byte_record(&mut row)? 
{\n        wtr.write_record(&row)?;\n    }\n    wtr.flush()?;\n    Ok(())\n}\n"
  },
  {
    "path": "src/cmd/join.rs",
    "content": "use std::collections::hash_map::{HashMap, Entry};\nuse std::fmt;\nuse std::fs;\nuse std::io;\nuse std::iter::repeat;\nuse std::str;\n\nuse byteorder::{WriteBytesExt, BigEndian};\nuse csv;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse index::Indexed;\nuse select::{SelectColumns, Selection};\nuse util;\n\nstatic USAGE: &'static str = \"\nJoins two sets of CSV data on the specified columns.\n\nThe default join operation is an 'inner' join. This corresponds to the\nintersection of rows on the keys specified.\n\nJoins are always done by ignoring leading and trailing whitespace. By default,\njoins are done case sensitively, but this can be disabled with the --no-case\nflag.\n\nThe columns arguments specify the columns to join for each input. Columns can\nbe referenced by name or index, starting at 1. Specify multiple columns by\nseparating them with a comma. Specify a range of columns with `-`. Both\ncolumns1 and columns2 must specify exactly the same number of columns.\n(See 'xsv select --help' for the full syntax.)\n\nUsage:\n    xsv join [options] <columns1> <input1> <columns2> <input2>\n    xsv join --help\n\njoin options:\n    --no-case              When set, joins are done case insensitively.\n    --left                 Do a 'left outer' join. This returns all rows in\n                           first CSV data set, including rows with no\n                           corresponding row in the second data set. When no\n                           corresponding row exists, it is padded out with\n                           empty fields.\n    --right                Do a 'right outer' join. This returns all rows in\n                           second CSV data set, including rows with no\n                           corresponding row in the first data set. When no\n                           corresponding row exists, it is padded out with\n                           empty fields. 
(This is the reverse of 'left outer'.)\n    --full                 Do a 'full outer' join. This returns all rows in\n                           both data sets with matching records joined. If\n                           there is no match, the missing side will be padded\n                           out with empty fields. (This is the combination of\n                           'left outer' and 'right outer'.)\n    --cross                USE WITH CAUTION.\n                           This returns the cartesian product of the CSV\n                           data sets given. The number of rows returned is\n                           equal to N * M, where N and M correspond to the\n                           number of rows in the given data sets, respectively.\n    --nulls                When set, joins will work on empty fields.\n                           Otherwise, empty fields are completely ignored.\n                           (In fact, any row that has an empty field in the\n                           key specified is ignored.)\n\nCommon options:\n    -h, --help             Display this message\n    -o, --output <file>    Write output to <file> instead of stdout.\n    -n, --no-headers       When set, the first row will not be interpreted\n                           as headers. (i.e., They are not searched, analyzed,\n                           sliced, etc.)\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. 
(default: ,)\n\";\n\ntype ByteString = Vec<u8>;\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_columns1: SelectColumns,\n    arg_input1: String,\n    arg_columns2: SelectColumns,\n    arg_input2: String,\n    flag_left: bool,\n    flag_right: bool,\n    flag_full: bool,\n    flag_cross: bool,\n    flag_output: Option<String>,\n    flag_no_headers: bool,\n    flag_no_case: bool,\n    flag_nulls: bool,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    let mut state = args.new_io_state()?;\n    match (\n        args.flag_left,\n        args.flag_right,\n        args.flag_full,\n        args.flag_cross,\n    ) {\n        (true, false, false, false) => {\n            state.write_headers()?;\n            state.outer_join(false)\n        }\n        (false, true, false, false) => {\n            state.write_headers()?;\n            state.outer_join(true)\n        }\n        (false, false, true, false) => {\n            state.write_headers()?;\n            state.full_outer_join()\n        }\n        (false, false, false, true) => {\n            state.write_headers()?;\n            state.cross_join()\n        }\n        (false, false, false, false) => {\n            state.write_headers()?;\n            state.inner_join()\n        }\n        _ => fail!(\"Please pick exactly one join operation.\")\n    }\n}\n\nstruct IoState<R, W: io::Write> {\n    wtr: csv::Writer<W>,\n    rdr1: csv::Reader<R>,\n    sel1: Selection,\n    rdr2: csv::Reader<R>,\n    sel2: Selection,\n    no_headers: bool,\n    casei: bool,\n    nulls: bool,\n}\n\nimpl<R: io::Read + io::Seek, W: io::Write> IoState<R, W> {\n    fn write_headers(&mut self) -> CliResult<()> {\n        if !self.no_headers {\n            let mut headers = self.rdr1.byte_headers()?.clone();\n            headers.extend(self.rdr2.byte_headers()?.iter());\n            self.wtr.write_record(&headers)?;\n        }\n        Ok(())\n    }\n\n  
  fn inner_join(mut self) -> CliResult<()> {\n        let mut scratch = csv::ByteRecord::new();\n        let mut validx = ValueIndex::new(\n            self.rdr2, &self.sel2, self.casei, self.nulls)?;\n        for row in self.rdr1.byte_records() {\n            let row = row?;\n            let key = get_row_key(&self.sel1, &row, self.casei);\n            match validx.values.get(&key) {\n                None => continue,\n                Some(rows) => {\n                    for &rowi in rows.iter() {\n                        validx.idx.seek(rowi as u64)?;\n\n                        validx.idx.read_byte_record(&mut scratch)?;\n                        let combined = row.iter().chain(scratch.iter());\n                        self.wtr.write_record(combined)?;\n                    }\n                }\n            }\n        }\n        Ok(())\n    }\n\n    fn outer_join(mut self, right: bool) -> CliResult<()> {\n        if right {\n            ::std::mem::swap(&mut self.rdr1, &mut self.rdr2);\n            ::std::mem::swap(&mut self.sel1, &mut self.sel2);\n        }\n\n        let mut scratch = csv::ByteRecord::new();\n        let (_, pad2) = self.get_padding()?;\n        let mut validx = ValueIndex::new(\n            self.rdr2, &self.sel2, self.casei, self.nulls)?;\n        for row in self.rdr1.byte_records() {\n            let row = row?;\n            let key = get_row_key(&self.sel1, &row, self.casei);\n            match validx.values.get(&key) {\n                None => {\n                    if right {\n                        self.wtr.write_record(pad2.iter().chain(&row))?;\n                    } else {\n                        self.wtr.write_record(row.iter().chain(&pad2))?;\n                    }\n                }\n                Some(rows) => {\n                    for &rowi in rows.iter() {\n                        validx.idx.seek(rowi as u64)?;\n                        let row1 = row.iter();\n                        validx.idx.read_byte_record(&mut 
scratch)?;\n                        if right {\n                            self.wtr.write_record(scratch.iter().chain(row1))?;\n                        } else {\n                            self.wtr.write_record(row1.chain(&scratch))?;\n                        }\n                    }\n                }\n            }\n        }\n        Ok(())\n    }\n\n    fn full_outer_join(mut self) -> CliResult<()> {\n        let mut scratch = csv::ByteRecord::new();\n        let (pad1, pad2) = self.get_padding()?;\n        let mut validx = ValueIndex::new(\n            self.rdr2, &self.sel2, self.casei, self.nulls)?;\n\n        // Keep track of which rows we've written from rdr2.\n        let mut rdr2_written: Vec<_> =\n            repeat(false).take(validx.num_rows).collect();\n        for row1 in self.rdr1.byte_records() {\n            let row1 = row1?;\n            let key = get_row_key(&self.sel1, &row1, self.casei);\n            match validx.values.get(&key) {\n                None => {\n                    self.wtr.write_record(row1.iter().chain(&pad2))?;\n                }\n                Some(rows) => {\n                    for &rowi in rows.iter() {\n                        rdr2_written[rowi] = true;\n\n                        validx.idx.seek(rowi as u64)?;\n                        validx.idx.read_byte_record(&mut scratch)?;\n                        self.wtr.write_record(row1.iter().chain(&scratch))?;\n                    }\n                }\n            }\n        }\n\n        // OK, now write any row from rdr2 that didn't get joined with a row\n        // from rdr1.\n        for (i, &written) in rdr2_written.iter().enumerate() {\n            if !written {\n                validx.idx.seek(i as u64)?;\n                validx.idx.read_byte_record(&mut scratch)?;\n                self.wtr.write_record(pad1.iter().chain(&scratch))?;\n            }\n        }\n        Ok(())\n    }\n\n    fn cross_join(mut self) -> CliResult<()> {\n        let mut pos = 
csv::Position::new();\n        pos.set_byte(0);\n        let mut row2 = csv::ByteRecord::new();\n        for row1 in self.rdr1.byte_records() {\n            let row1 = row1?;\n            self.rdr2.seek(pos.clone())?;\n            if self.rdr2.has_headers() {\n                // Read and skip the header row, since CSV readers disable\n                // the header skipping logic after being seeked.\n                self.rdr2.read_byte_record(&mut row2)?;\n            }\n            while self.rdr2.read_byte_record(&mut row2)? {\n                self.wtr.write_record(row1.iter().chain(&row2))?;\n            }\n        }\n        Ok(())\n    }\n\n    fn get_padding(\n        &mut self,\n    ) -> CliResult<(csv::ByteRecord, csv::ByteRecord)> {\n        let len1 = self.rdr1.byte_headers()?.len();\n        let len2 = self.rdr2.byte_headers()?.len();\n        Ok((\n            repeat(b\"\").take(len1).collect(),\n            repeat(b\"\").take(len2).collect(),\n        ))\n    }\n}\n\nimpl Args {\n    fn new_io_state(&self)\n        -> CliResult<IoState<fs::File, Box<io::Write+'static>>> {\n        let rconf1 = Config::new(&Some(self.arg_input1.clone()))\n            .delimiter(self.flag_delimiter)\n            .no_headers(self.flag_no_headers)\n            .select(self.arg_columns1.clone());\n        let rconf2 = Config::new(&Some(self.arg_input2.clone()))\n            .delimiter(self.flag_delimiter)\n            .no_headers(self.flag_no_headers)\n            .select(self.arg_columns2.clone());\n\n        let mut rdr1 = rconf1.reader_file()?;\n        let mut rdr2 = rconf2.reader_file()?;\n        let (sel1, sel2) = self.get_selections(\n            &rconf1, &mut rdr1, &rconf2, &mut rdr2)?;\n        Ok(IoState {\n            wtr: Config::new(&self.flag_output).writer()?,\n            rdr1: rdr1,\n            sel1: sel1,\n            rdr2: rdr2,\n            sel2: sel2,\n            no_headers: rconf1.no_headers,\n            casei: self.flag_no_case,\n            nulls: 
self.flag_nulls,\n        })\n    }\n\n    fn get_selections<R: io::Read>(\n        &self,\n        rconf1: &Config, rdr1: &mut csv::Reader<R>,\n        rconf2: &Config, rdr2: &mut csv::Reader<R>,\n    ) -> CliResult<(Selection, Selection)> {\n        let headers1 = rdr1.byte_headers()?;\n        let headers2 = rdr2.byte_headers()?;\n        let select1 = rconf1.selection(&*headers1)?;\n        let select2 = rconf2.selection(&*headers2)?;\n        if select1.len() != select2.len() {\n            return fail!(format!(\n                \"Column selections must have the same number of columns, \\\n                 but found column selections with {} and {} columns.\",\n                select1.len(), select2.len()));\n        }\n        Ok((select1, select2))\n    }\n}\n\nstruct ValueIndex<R> {\n    // This maps tuples of values to corresponding rows.\n    values: HashMap<Vec<ByteString>, Vec<usize>>,\n    idx: Indexed<R, io::Cursor<Vec<u8>>>,\n    num_rows: usize,\n}\n\nimpl<R: io::Read + io::Seek> ValueIndex<R> {\n    fn new(\n        mut rdr: csv::Reader<R>,\n        sel: &Selection,\n        casei: bool,\n        nulls: bool,\n    ) -> CliResult<ValueIndex<R>> {\n        let mut val_idx = HashMap::with_capacity(10000);\n        let mut row_idx = io::Cursor::new(Vec::with_capacity(8 * 10000));\n        let (mut rowi, mut count) = (0usize, 0usize);\n\n        // This logic is kind of tricky. Basically, we want to include\n        // the header row in the line index (because that's what csv::index\n        // does), but we don't want to include header values in the ValueIndex.\n        if !rdr.has_headers() {\n            // ... so if there are no headers, we seek to the beginning and\n            // index everything.\n            let mut pos = csv::Position::new();\n            pos.set_byte(0);\n            rdr.seek(pos)?;\n        } else {\n            // ... 
and if there are headers, we make sure that we've parsed\n            // them, and write the offset of the header row to the index.\n            rdr.byte_headers()?;\n            row_idx.write_u64::<BigEndian>(0)?;\n            count += 1;\n        }\n\n        let mut row = csv::ByteRecord::new();\n        while rdr.read_byte_record(&mut row)? {\n            // This is a bit hokey. We're doing this manually instead of using\n            // the `csv-index` crate directly so that we can create both\n            // indexes in one pass.\n            row_idx.write_u64::<BigEndian>(row.position().unwrap().byte())?;\n\n            let fields: Vec<_> = sel\n                .select(&row)\n                .map(|v| transform(v, casei))\n                .collect();\n            if nulls || !fields.iter().any(|f| f.is_empty()) {\n                match val_idx.entry(fields) {\n                    Entry::Vacant(v) => {\n                        let mut rows = Vec::with_capacity(4);\n                        rows.push(rowi);\n                        v.insert(rows);\n                    }\n                    Entry::Occupied(mut v) => {\n                        v.get_mut().push(rowi);\n                    }\n                }\n            }\n            rowi += 1;\n            count += 1;\n        }\n\n        row_idx.write_u64::<BigEndian>(count as u64)?;\n        let idx = Indexed::open(rdr, io::Cursor::new(row_idx.into_inner()))?;\n        Ok(ValueIndex {\n            values: val_idx,\n            idx: idx,\n            num_rows: rowi,\n        })\n    }\n}\n\nimpl<R> fmt::Debug for ValueIndex<R> {\n    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {\n        // Sort the values by order of first appearance.\n        let mut kvs = self.values.iter().collect::<Vec<_>>();\n        kvs.sort_by(|&(_, v1), &(_, v2)| v1[0].cmp(&v2[0]));\n        for (keys, rows) in kvs.into_iter() {\n            // This is just for debugging, so assume Unicode for now.\n            let keys = 
keys.iter()\n                           .map(|k| String::from_utf8(k.to_vec()).unwrap())\n                           .collect::<Vec<_>>();\n            writeln!(f, \"({}) => {:?}\", keys.join(\", \"), rows)?\n        }\n        Ok(())\n    }\n}\n\nfn get_row_key(\n    sel: &Selection,\n    row: &csv::ByteRecord,\n    casei: bool,\n) -> Vec<ByteString> {\n    sel.select(row).map(|v| transform(&v, casei)).collect()\n}\n\nfn transform(bs: &[u8], casei: bool) -> ByteString {\n    match str::from_utf8(bs) {\n        Err(_) => bs.to_vec(),\n        Ok(s) => {\n            if !casei {\n                s.trim().as_bytes().to_vec()\n            } else {\n                let norm: String =\n                    s.trim().chars()\n                     .map(|c| c.to_lowercase().next().unwrap()).collect();\n                norm.into_bytes()\n            }\n        }\n    }\n}\n"
  },
  {
    "path": "src/cmd/mod.rs",
    "content": "pub mod cat;\npub mod count;\npub mod fixlengths;\npub mod flatten;\npub mod fmt;\npub mod frequency;\npub mod headers;\npub mod index;\npub mod input;\npub mod join;\npub mod partition;\npub mod reverse;\npub mod sample;\npub mod search;\npub mod select;\npub mod slice;\npub mod sort;\npub mod split;\npub mod stats;\npub mod table;\n"
  },
  {
    "path": "src/cmd/partition.rs",
    "content": "use std::collections::{HashMap, HashSet};\nuse std::collections::hash_map::Entry;\nuse std::fs;\nuse std::io;\nuse std::path::Path;\n\nuse csv;\nuse regex::Regex;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse select::SelectColumns;\nuse util::{self, FilenameTemplate};\n\nstatic USAGE: &'static str = \"\nPartitions the given CSV data into chunks based on the value of a column\n\nThe files are written to the output directory with filenames based on the\nvalues in the partition column and the `--filename` flag.\n\nUsage:\n    xsv partition [options] <column> <outdir> [<input>]\n    xsv partition --help\n\npartition options:\n    --filename <filename>  A filename template to use when constructing\n                           the names of the output files.  The string '{}'\n                           will be replaced by a value based on the value\n                           of the field, but sanitized for shell safety.\n                           [default: {}.csv]\n    -p, --prefix-length <n>  Truncate the partition column after the\n                           specified number of bytes when creating the\n                           output file.\n    --drop                 Drop the partition column from results.\n\nCommon options:\n    -h, --help             Display this message\n    -n, --no-headers       When set, the first row will NOT be interpreted\n                           as column names. Otherwise, the first row will\n                           appear in all chunks as the header row.\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. 
(default: ,)\n\";\n\n#[derive(Clone, Deserialize)]\nstruct Args {\n    arg_column: SelectColumns,\n    arg_input: Option<String>,\n    arg_outdir: String,\n    flag_filename: FilenameTemplate,\n    flag_prefix_length: Option<usize>,\n    flag_drop: bool,\n    flag_no_headers: bool,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    fs::create_dir_all(&args.arg_outdir)?;\n\n    // It would be nice to support efficient parallel partitions, but doing\n    // do would involve more complicated inter-thread communication, with\n    // multiple readers and writers, and some way of passing buffers\n    // between them.\n    args.sequential_partition()\n}\n\nimpl Args {\n    /// Configuration for our reader.\n    fn rconfig(&self) -> Config {\n        Config::new(&self.arg_input)\n            .delimiter(self.flag_delimiter)\n            .no_headers(self.flag_no_headers)\n            .select(self.arg_column.clone())\n    }\n\n    /// Get the column to use as a key.\n    fn key_column(\n        &self,\n        rconfig: &Config,\n        headers: &csv::ByteRecord,\n    ) -> CliResult<usize> {\n        let select_cols = rconfig.selection(headers)?;\n        if select_cols.len() == 1 {\n            Ok(select_cols[0])\n        } else {\n            fail!(\"can only partition on one column\")\n        }\n    }\n\n    /// A basic sequential partition.\n    fn sequential_partition(&self) -> CliResult<()> {\n        let rconfig = self.rconfig();\n        let mut rdr = rconfig.reader()?;\n        let headers = rdr.byte_headers()?.clone();\n        let key_col = self.key_column(&rconfig, &headers)?;\n        let mut gen = WriterGenerator::new(self.flag_filename.clone());\n\n        let mut writers: HashMap<Vec<u8>, BoxedWriter> =\n            HashMap::new();\n        let mut row = csv::ByteRecord::new();\n        while rdr.read_byte_record(&mut row)? 
{\n            // Decide what file to put this in.\n            let column = &row[key_col];\n            let key = match self.flag_prefix_length {\n                // We exceed --prefix-length, so ignore the extra bytes.\n                Some(len) if len < column.len() => &column[0..len],\n                _ => &column[..],\n            };\n            let mut entry = writers.entry(key.to_vec());\n            let wtr = match entry {\n                Entry::Occupied(ref mut occupied) => occupied.get_mut(),\n                Entry::Vacant(vacant) => {\n                    // We have a new key, so make a new writer.\n                    let mut wtr = gen.writer(&*self.arg_outdir, key)?;\n                    if !rconfig.no_headers {\n                        if self.flag_drop {\n                            wtr.write_record(headers.iter().enumerate()\n                                .filter_map(|(i, e)| if i != key_col { Some(e) } else { None }))?;\n                        } else {\n                            wtr.write_record(&headers)?;\n                        }\n                    }\n                    vacant.insert(wtr)\n                }\n            };\n            if self.flag_drop {\n                wtr.write_record(row.iter().enumerate()\n                    .filter_map(|(i, e)| if i != key_col { Some(e) } else { None }))?;\n            } else {\n                wtr.write_byte_record(&row)?;\n            }\n        }\n        Ok(())\n    }\n}\n\ntype BoxedWriter = csv::Writer<Box<io::Write+'static>>;\n\n/// Generates unique filenames based on CSV values.\nstruct WriterGenerator {\n    template: FilenameTemplate,\n    counter: usize,\n    used: HashSet<String>,\n    non_word_char: Regex,\n}\n\nimpl WriterGenerator {\n    fn new(template: FilenameTemplate) -> WriterGenerator {\n        WriterGenerator {\n            template: template,\n            counter: 1,\n            used: HashSet::new(),\n            non_word_char: Regex::new(r\"\\W\").unwrap(),\n        
}\n    }\n\n    /// Create a CSV writer for `key`.  Does not add headers.\n    fn writer<P>(&mut self, path: P, key: &[u8]) -> io::Result<BoxedWriter>\n        where P: AsRef<Path>\n    {\n        let unique_value = self.unique_value(key);\n        self.template.writer(path.as_ref(), &unique_value)\n    }\n\n    /// Generate a unique value for `key`, suitable for use in a\n    /// \"shell-safe\" filename.  If you pass `key` twice, you'll get two\n    /// different values.\n    fn unique_value(&mut self, key: &[u8]) -> String {\n        // Sanitize our key.\n        let utf8 = String::from_utf8_lossy(key);\n        let safe = self.non_word_char.replace_all(&*utf8, \"\").into_owned();\n        let base =\n            if safe.is_empty() {\n                \"empty\".to_owned()\n            } else {\n                safe\n            };\n\n        // Now check for collisions.\n        if !self.used.contains(&base) {\n            self.used.insert(base.clone());\n            base\n        } else {\n            loop {\n                let candidate = format!(\"{}_{}\", &base, self.counter);\n                self.counter = self.counter.checked_add(1).unwrap_or_else(|| {\n                    // We'll run out of other things long before we ever\n                    // reach this, but we'll check just for correctness and\n                    // completeness.\n                    panic!(\"Cannot generate unique value\")\n                });\n                if !self.used.contains(&candidate) {\n                    self.used.insert(candidate.clone());\n                    return candidate\n                }\n            }\n        }\n    }\n}\n"
  },
  {
    "path": "src/cmd/reverse.rs",
    "content": "use CliResult;\nuse config::{Config, Delimiter};\nuse util;\n\nstatic USAGE: &'static str = \"\nReverses rows of CSV data.\n\nUseful for cases when there is no column that can be used for sorting in reverse order,\nor when keys are not unique and order of rows with the same key needs to be preserved.\n\nNote that this requires reading all of the CSV data into memory.\n\nUsage:\n    xsv reverse [options] [<input>]\n\nCommon options:\n    -h, --help             Display this message\n    -o, --output <file>    Write output to <file> instead of stdout.\n    -n, --no-headers       When set, the first row will not be interpreted\n                           as headers. Namely, it will be reversed with the rest\n                           of the rows. Otherwise, the first row will always\n                           appear as the header row in the output.\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. (default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    flag_output: Option<String>,\n    flag_no_headers: bool,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    let rconfig = Config::new(&args.arg_input)\n        .delimiter(args.flag_delimiter)\n        .no_headers(args.flag_no_headers);\n\n    let mut rdr = rconfig.reader()?;\n\n    let mut all = rdr.byte_records().collect::<Result<Vec<_>, _>>()?;\n    all.reverse();\n\n    let mut wtr = Config::new(&args.flag_output).writer()?;\n    rconfig.write_headers(&mut rdr, &mut wtr)?;\n    for r in all.into_iter() {\n        wtr.write_byte_record(&r)?;\n    }\n    Ok(wtr.flush()?)\n}\n"
  },
  {
    "path": "src/cmd/sample.rs",
    "content": "use std::io;\n\nuse byteorder::{ByteOrder, LittleEndian};\nuse csv;\nuse rand::{self, Rng, SeedableRng};\nuse rand::rngs::StdRng;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse index::Indexed;\nuse util;\n\nstatic USAGE: &'static str = \"\nRandomly samples CSV data uniformly using memory proportional to the size of\nthe sample.\n\nWhen an index is present, this command will use random indexing if the sample\nsize is less than 10% of the total number of records. This allows for efficient\nsampling such that the entire CSV file is not parsed.\n\nThis command is intended to provide a means to sample from a CSV data set that\nis too big to fit into memory (for example, for use with commands like 'xsv\nfrequency' or 'xsv stats'). It will however visit every CSV record exactly\nonce, which is necessary to provide a uniform random sample. If you wish to\nlimit the number of records visited, use the 'xsv slice' command to pipe into\n'xsv sample'.\n\nUsage:\n    xsv sample [options] <sample-size> [<input>]\n    xsv sample --help\n\nsample options:\n    --seed <number>        RNG seed.\n\nCommon options:\n    -h, --help             Display this message\n    -o, --output <file>    Write output to <file> instead of stdout.\n    -n, --no-headers       When set, the first row will be consider as part of\n                           the population to sample from. (When not set, the\n                           first row is the header row and will always appear\n                           in the output.)\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. 
(default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    arg_sample_size: u64,\n    flag_output: Option<String>,\n    flag_no_headers: bool,\n    flag_delimiter: Option<Delimiter>,\n    flag_seed: Option<usize>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    let rconfig = Config::new(&args.arg_input)\n        .delimiter(args.flag_delimiter)\n        .no_headers(args.flag_no_headers);\n    let sample_size = args.arg_sample_size;\n\n    let mut wtr = Config::new(&args.flag_output).writer()?;\n    let sampled = match rconfig.indexed()? {\n        Some(mut idx) => {\n            if do_random_access(sample_size, idx.count()) {\n                rconfig.write_headers(&mut *idx, &mut wtr)?;\n                sample_random_access(&mut idx, sample_size)?\n            } else {\n                let mut rdr = rconfig.reader()?;\n                rconfig.write_headers(&mut rdr, &mut wtr)?;\n                sample_reservoir(&mut rdr, sample_size, args.flag_seed)?\n            }\n        }\n        _ => {\n            let mut rdr = rconfig.reader()?;\n            rconfig.write_headers(&mut rdr, &mut wtr)?;\n            sample_reservoir(&mut rdr, sample_size, args.flag_seed)?\n        }\n    };\n    for row in sampled.into_iter() {\n        wtr.write_byte_record(&row)?;\n    }\n    Ok(wtr.flush()?)\n}\n\nfn sample_random_access<R, I>(\n    idx: &mut Indexed<R, I>,\n    sample_size: u64,\n) -> CliResult<Vec<csv::ByteRecord>>\nwhere R: io::Read + io::Seek, I: io::Read + io::Seek\n{\n    let mut all_indices = (0..idx.count()).collect::<Vec<_>>();\n    let mut rng = ::rand::thread_rng();\n    rng.shuffle(&mut *all_indices);\n\n    let mut sampled = Vec::with_capacity(sample_size as usize);\n    for i in all_indices.into_iter().take(sample_size as usize) {\n        idx.seek(i)?;\n        sampled.push(idx.byte_records().next().unwrap()?);\n    }\n    Ok(sampled)\n}\n\nfn sample_reservoir<R: 
io::Read>(\n    rdr: &mut csv::Reader<R>,\n    sample_size: u64,\n    seed: Option<usize>\n) -> CliResult<Vec<csv::ByteRecord>> {\n    // The following algorithm has been adapted from:\n    // https://en.wikipedia.org/wiki/Reservoir_sampling\n    let mut reservoir = Vec::with_capacity(sample_size as usize);\n    let mut records = rdr.byte_records().enumerate();\n    for (_, row) in records.by_ref().take(reservoir.capacity()) {\n        reservoir.push(row?);\n    }\n\n    // Seeding rng\n    let mut rng: StdRng = match seed {\n        None => {\n            StdRng::from_rng(rand::thread_rng()).unwrap()\n        }\n        Some(seed) => {\n            let mut buf = [0u8; 32];\n            LittleEndian::write_u64(&mut buf, seed as u64);\n            SeedableRng::from_seed(buf)\n        }\n    };\n\n    // Now do the sampling.\n    for (i, row) in records {\n        let random = rng.gen_range(0, i+1);\n        if random < sample_size as usize {\n            reservoir[random] = row?;\n        }\n    }\n    Ok(reservoir)\n}\n\nfn do_random_access(sample_size: u64, total: u64) -> bool {\n    sample_size <= (total / 10)\n}\n"
  },
  {
    "path": "src/cmd/search.rs",
    "content": "use csv;\nuse regex::bytes::RegexBuilder;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse select::SelectColumns;\nuse util;\n\nstatic USAGE: &'static str = \"\nFilters CSV data by whether the given regex matches a row.\n\nThe regex is applied to each field in each row, and if any field matches,\nthen the row is written to the output. The columns to search can be limited\nwith the '--select' flag (but the full row is still written to the output if\nthere is a match).\n\nUsage:\n    xsv search [options] <regex> [<input>]\n    xsv search --help\n\nsearch options:\n    -i, --ignore-case      Case insensitive search. This is equivalent to\n                           prefixing the regex with '(?i)'.\n    -s, --select <arg>     Select the columns to search. See 'xsv select -h'\n                           for the full syntax.\n    -v, --invert-match     Select only rows that did not match\n\nCommon options:\n    -h, --help             Display this message\n    -o, --output <file>    Write output to <file> instead of stdout.\n    -n, --no-headers       When set, the first row will not be interpreted\n                           as headers. (i.e., They are not searched, analyzed,\n                           sliced, etc.)\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. 
(default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    arg_regex: String,\n    flag_select: SelectColumns,\n    flag_output: Option<String>,\n    flag_no_headers: bool,\n    flag_delimiter: Option<Delimiter>,\n    flag_invert_match: bool,\n    flag_ignore_case: bool,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    let pattern = RegexBuilder::new(&*args.arg_regex)\n        .case_insensitive(args.flag_ignore_case)\n        .build()?;\n    let rconfig = Config::new(&args.arg_input)\n        .delimiter(args.flag_delimiter)\n        .no_headers(args.flag_no_headers)\n        .select(args.flag_select);\n\n    let mut rdr = rconfig.reader()?;\n    let mut wtr = Config::new(&args.flag_output).writer()?;\n\n    let headers = rdr.byte_headers()?.clone();\n    let sel = rconfig.selection(&headers)?;\n\n    if !rconfig.no_headers {\n        wtr.write_record(&headers)?;\n    }\n    let mut record = csv::ByteRecord::new();\n    while rdr.read_byte_record(&mut record)? {\n        let mut m = sel.select(&record).any(|f| pattern.is_match(f));\n        if args.flag_invert_match {\n            m = !m;\n        }\n        if m {\n            wtr.write_byte_record(&record)?;\n        }\n    }\n    Ok(wtr.flush()?)\n}\n"
  },
  {
    "path": "src/cmd/select.rs",
    "content": "use csv;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse select::SelectColumns;\nuse util;\n\nstatic USAGE: &'static str = \"\nSelect columns from CSV data efficiently.\n\nThis command lets you manipulate the columns in CSV data. You can re-order\nthem, duplicate them or drop them. Columns can be referenced by index or by\nname if there is a header row (duplicate column names can be disambiguated with\nmore indexing). Finally, column ranges can be specified.\n\n  Select the first and fourth columns:\n  $ xsv select 1,4\n\n  Select the first 4 columns (by index and by name):\n  $ xsv select 1-4\n  $ xsv select Header1-Header4\n\n  Ignore the first 2 columns (by range and by omission):\n  $ xsv select 3-\n  $ xsv select '!1-2'\n\n  Select the third column named 'Foo':\n  $ xsv select 'Foo[2]'\n\n  Re-order and duplicate columns arbitrarily:\n  $ xsv select 3-1,Header3-Header1,Header1,Foo[2],Header1\n\n  Quote column names that conflict with selector syntax:\n  $ xsv select '\\\"Date - Opening\\\",\\\"Date - Actual Closing\\\"'\n\nUsage:\n    xsv select [options] [--] <selection> [<input>]\n    xsv select --help\n\nCommon options:\n    -h, --help             Display this message\n    -o, --output <file>    Write output to <file> instead of stdout.\n    -n, --no-headers       When set, the first row will not be interpreted\n                           as headers. (i.e., They are not searched, analyzed,\n                           sliced, etc.)\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. 
(default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    arg_selection: SelectColumns,\n    flag_output: Option<String>,\n    flag_no_headers: bool,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n\n    let rconfig = Config::new(&args.arg_input)\n        .delimiter(args.flag_delimiter)\n        .no_headers(args.flag_no_headers)\n        .select(args.arg_selection);\n\n    let mut rdr = rconfig.reader()?;\n    let mut wtr = Config::new(&args.flag_output).writer()?;\n\n    let headers = rdr.byte_headers()?.clone();\n    let sel = rconfig.selection(&headers)?;\n\n    if !rconfig.no_headers {\n        wtr.write_record(sel.iter().map(|&i| &headers[i]))?;\n    }\n    let mut record = csv::ByteRecord::new();\n    while rdr.read_byte_record(&mut record)? {\n        wtr.write_record(sel.iter().map(|&i| &record[i]))?;\n    }\n    wtr.flush()?;\n    Ok(())\n}\n"
  },
  {
    "path": "src/cmd/slice.rs",
    "content": "use std::fs;\n\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse index::Indexed;\nuse util;\n\nstatic USAGE: &'static str = \"\nReturns the rows in the range specified (starting at 0, half-open interval).\nThe range does not include headers.\n\nIf the start of the range isn't specified, then the slice starts from the first\nrecord in the CSV data.\n\nIf the end of the range isn't specified, then the slice continues to the last\nrecord in the CSV data.\n\nThis operation can be made much faster by creating an index with 'xsv index'\nfirst. Namely, a slice on an index requires parsing just the rows that are\nsliced. Without an index, all rows up to the first row in the slice must be\nparsed.\n\nUsage:\n    xsv slice [options] [<input>]\n\nslice options:\n    -s, --start <arg>      The index of the record to slice from.\n    -e, --end <arg>        The index of the record to slice to.\n    -l, --len <arg>        The length of the slice (can be used instead\n                           of --end).\n    -i, --index <arg>      Slice a single record (shortcut for -s N -l 1).\n\nCommon options:\n    -h, --help             Display this message\n    -o, --output <file>    Write output to <file> instead of stdout.\n    -n, --no-headers       When set, the first row will not be interpreted\n                           as headers. Otherwise, the first row will always\n                           appear in the output as the header row.\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. 
(default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    flag_start: Option<usize>,\n    flag_end: Option<usize>,\n    flag_len: Option<usize>,\n    flag_index: Option<usize>,\n    flag_output: Option<String>,\n    flag_no_headers: bool,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    match args.rconfig().indexed()? {\n        None => args.no_index(),\n        Some(idxed) => args.with_index(idxed),\n    }\n}\n\nimpl Args {\n    fn no_index(&self) -> CliResult<()> {\n        let mut rdr = self.rconfig().reader()?;\n        let mut wtr = self.wconfig().writer()?;\n        self.rconfig().write_headers(&mut rdr, &mut wtr)?;\n\n        let (start, end) = self.range()?;\n        for r in rdr.byte_records().skip(start).take(end - start) {\n            wtr.write_byte_record(&r?)?;\n        }\n        Ok(wtr.flush()?)\n    }\n\n    fn with_index(\n        &self,\n        mut idx: Indexed<fs::File, fs::File>,\n    ) -> CliResult<()> {\n        let mut wtr = self.wconfig().writer()?;\n        self.rconfig().write_headers(&mut *idx, &mut wtr)?;\n\n        let (start, end) = self.range()?;\n        if end - start == 0 {\n            return Ok(());\n        }\n        idx.seek(start as u64)?;\n        for r in idx.byte_records().take(end - start) {\n            wtr.write_byte_record(&r?)?;\n        }\n        wtr.flush()?;\n        Ok(())\n    }\n\n    fn range(&self) -> Result<(usize, usize), String> {\n        util::range(\n            self.flag_start, self.flag_end, self.flag_len, self.flag_index)\n    }\n\n    fn rconfig(&self) -> Config {\n        Config::new(&self.arg_input)\n            .delimiter(self.flag_delimiter)\n            .no_headers(self.flag_no_headers)\n    }\n\n    fn wconfig(&self) -> Config {\n        Config::new(&self.flag_output)\n    }\n}\n"
  },
  {
    "path": "src/cmd/sort.rs",
    "content": "use std::cmp;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse select::SelectColumns;\nuse util;\nuse std::str::from_utf8;\n\nuse self::Number::{Float, Int};\n\nstatic USAGE: &'static str = \"\nSorts CSV data lexicographically.\n\nNote that this requires reading all of the CSV data into memory.\n\nUsage:\n    xsv sort [options] [<input>]\n\nsort options:\n    -s, --select <arg>     Select a subset of columns to sort.\n                           See 'xsv select --help' for the format details.\n    -N, --numeric          Compare according to string numerical value\n    -R, --reverse          Reverse order\n\nCommon options:\n    -h, --help             Display this message\n    -o, --output <file>    Write output to <file> instead of stdout.\n    -n, --no-headers       When set, the first row will not be interpreted\n                           as headers. Namely, it will be sorted with the rest\n                           of the rows. Otherwise, the first row will always\n                           appear as the header row in the output.\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. 
(default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    flag_select: SelectColumns,\n    flag_numeric: bool,\n    flag_reverse: bool,\n    flag_output: Option<String>,\n    flag_no_headers: bool,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    let numeric = args.flag_numeric;\n    let reverse = args.flag_reverse;\n    let rconfig = Config::new(&args.arg_input)\n        .delimiter(args.flag_delimiter)\n        .no_headers(args.flag_no_headers)\n        .select(args.flag_select);\n\n    let mut rdr = rconfig.reader()?;\n\n    let headers = rdr.byte_headers()?.clone();\n    let sel = rconfig.selection(&headers)?;\n\n    let mut all = rdr.byte_records().collect::<Result<Vec<_>, _>>()?;\n    match (numeric, reverse) {\n        (false, false) =>\n            all.sort_by(|r1, r2| {\n                let a = sel.select(r1);\n                let b = sel.select(r2);\n                iter_cmp(a, b)\n            }),\n        (true, false) =>\n            all.sort_by(|r1, r2| {\n                let a = sel.select(r1);\n                let b = sel.select(r2);\n                iter_cmp_num(a, b)\n            }),\n        (false, true) =>\n            all.sort_by(|r1, r2| {\n                let a = sel.select(r1);\n                let b = sel.select(r2);\n                iter_cmp(b, a)\n            }),\n        (true, true) =>\n            all.sort_by(|r1, r2| {\n                let a = sel.select(r1);\n                let b = sel.select(r2);\n                iter_cmp_num(b, a)\n            }),\n    }\n\n    let mut wtr = Config::new(&args.flag_output).writer()?;\n    rconfig.write_headers(&mut rdr, &mut wtr)?;\n    for r in all.into_iter() {\n        wtr.write_byte_record(&r)?;\n    }\n    Ok(wtr.flush()?)\n}\n\n/// Order `a` and `b` lexicographically using `Ord`\npub fn iter_cmp<A, L, R>(mut a: L, mut b: R) -> cmp::Ordering\n        where A: 
Ord, L: Iterator<Item=A>, R: Iterator<Item=A> {\n    loop {\n        match (a.next(), b.next()) {\n            (None, None) => return cmp::Ordering::Equal,\n            (None, _   ) => return cmp::Ordering::Less,\n            (_   , None) => return cmp::Ordering::Greater,\n            (Some(x), Some(y)) => match x.cmp(&y) {\n                cmp::Ordering::Equal => (),\n                non_eq => return non_eq,\n            },\n        }\n    }\n}\n\n/// Try parsing `a` and `b` as numbers when ordering\npub fn iter_cmp_num<'a, L, R>(mut a: L, mut b: R) -> cmp::Ordering\n        where L: Iterator<Item=&'a [u8]>, R: Iterator<Item=&'a [u8]> {\n    loop {\n        match (next_num(&mut a), next_num(&mut b)) {\n            (None, None) => return cmp::Ordering::Equal,\n            (None, _   ) => return cmp::Ordering::Less,\n            (_   , None) => return cmp::Ordering::Greater,\n            (Some(x), Some(y)) => match compare_num(x, y) {\n                cmp::Ordering::Equal => (),\n                non_eq => return non_eq,\n            },\n        }\n    }\n}\n\n#[derive(Clone, Copy, PartialEq)]\nenum Number {\n    Int(i64),\n    Float(f64),\n}\n\nfn compare_num(n1: Number, n2: Number) -> cmp::Ordering{\n    match (n1, n2) {\n        (Int(i1), Int(i2)) => i1.cmp(&i2),\n        (Int(i1), Float(f2)) => compare_float(i1 as f64, f2),\n        (Float(f1), Int(i2)) => compare_float(f1, i2 as f64),\n        (Float(f1), Float(f2)) => compare_float(f1, f2),\n    }\n}\n\nfn compare_float(f1: f64, f2: f64) -> cmp::Ordering {\n    f1.partial_cmp(&f2).unwrap_or(cmp::Ordering::Equal)\n}\n\n\n\nfn next_num<'a, X>(xs: &mut X) -> Option<Number>\n        where X: Iterator<Item=&'a [u8]> {\n    xs.next()\n        .and_then(|bytes| from_utf8(bytes).ok())\n        .and_then(|s| {\n            if let Ok(i) = s.parse::<i64>() { Some(Number::Int(i)) }\n            else if let Ok(f) = s.parse::<f64>() { Some(Number::Float(f)) }\n            else { None }\n        })\n}\n"
  },
  {
    "path": "src/cmd/split.rs",
    "content": "use std::fs;\nuse std::io;\nuse std::path::Path;\n\nuse channel;\nuse csv;\nuse threadpool::ThreadPool;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse index::Indexed;\nuse util::{self, FilenameTemplate};\n\nstatic USAGE: &'static str = \"\nSplits the given CSV data into chunks.\n\nThe files are written to the directory given with the name '{start}.csv',\nwhere {start} is the index of the first record of the chunk (starting at 0).\n\nUsage:\n    xsv split [options] <outdir> [<input>]\n    xsv split --help\n\nsplit options:\n    -s, --size <arg>       The number of records to write into each chunk.\n                           [default: 500]\n    -j, --jobs <arg>       The number of splitting jobs to run in parallel.\n                           This only works when the given CSV data has\n                           an index already created. Note that a file handle\n                           is opened for each job.\n                           When set to '0', the number of jobs is set to the\n                           number of CPUs detected.\n                           [default: 0]\n    --filename <filename>  A filename template to use when constructing\n                           the names of the output files.  The string '{}'\n                           will be replaced by a value based on the value\n                           of the field, but sanitized for shell safety.\n                           [default: {}.csv]\n\nCommon options:\n    -h, --help             Display this message\n    -n, --no-headers       When set, the first row will NOT be interpreted\n                           as column names. Otherwise, the first row will\n                           appear in all chunks as the header row.\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. 
(default: ,)\n\";\n\n#[derive(Clone, Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    arg_outdir: String,\n    flag_size: usize,\n    flag_jobs: usize,\n    flag_filename: FilenameTemplate,\n    flag_no_headers: bool,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    if args.flag_size == 0 {\n        return fail!(\"--size must be greater than 0.\");\n    }\n    fs::create_dir_all(&args.arg_outdir)?;\n\n    match args.rconfig().indexed()? {\n        Some(idx) => args.parallel_split(idx),\n        None => args.sequential_split(),\n    }\n}\n\nimpl Args {\n    fn sequential_split(&self) -> CliResult<()> {\n        let rconfig = self.rconfig();\n        let mut rdr = rconfig.reader()?;\n        let headers = rdr.byte_headers()?.clone();\n\n        let mut wtr = self.new_writer(&headers, 0)?;\n        let mut i = 0;\n        let mut row = csv::ByteRecord::new();\n        while rdr.read_byte_record(&mut row)? 
{\n            if i > 0 && i % self.flag_size == 0 {\n                wtr.flush()?;\n                wtr = self.new_writer(&headers, i)?;\n            }\n            wtr.write_byte_record(&row)?;\n            i += 1;\n        }\n        wtr.flush()?;\n        Ok(())\n    }\n\n    fn parallel_split(\n        &self,\n        idx: Indexed<fs::File, fs::File>,\n    ) -> CliResult<()> {\n        let nchunks = util::num_of_chunks(\n            idx.count() as usize, self.flag_size);\n        let pool = ThreadPool::new(self.njobs());\n        let (tx, rx) = channel::bounded::<()>(0);\n        for i in 0..nchunks {\n            let args = self.clone();\n            let tx = tx.clone();\n            pool.execute(move || {\n                let conf = args.rconfig();\n                let mut idx = conf.indexed().unwrap().unwrap();\n                let headers = idx.byte_headers().unwrap().clone();\n                let mut wtr = args\n                    .new_writer(&headers, i * args.flag_size)\n                    .unwrap();\n\n                idx.seek((i * args.flag_size) as u64).unwrap();\n                for row in idx.byte_records().take(args.flag_size) {\n                    let row = row.unwrap();\n                    wtr.write_byte_record(&row).unwrap();\n                }\n                wtr.flush().unwrap();\n                drop(tx);\n            });\n        }\n        drop(tx);\n        rx.recv();\n        Ok(())\n    }\n\n    fn new_writer(\n        &self,\n        headers: &csv::ByteRecord,\n        start: usize,\n    ) -> CliResult<csv::Writer<Box<io::Write+'static>>> {\n        let dir = Path::new(&self.arg_outdir);\n        let path = dir.join(self.flag_filename.filename(&format!(\"{}\", start)));\n        let spath = Some(path.display().to_string());\n        let mut wtr = Config::new(&spath).writer()?;\n        if !self.rconfig().no_headers {\n            wtr.write_record(headers)?;\n        }\n        Ok(wtr)\n    }\n\n    fn rconfig(&self) -> Config {\n  
      Config::new(&self.arg_input)\n            .delimiter(self.flag_delimiter)\n            .no_headers(self.flag_no_headers)\n    }\n\n    fn njobs(&self) -> usize {\n        if self.flag_jobs == 0 {\n            util::num_cpus()\n        } else {\n            self.flag_jobs\n        }\n    }\n}\n"
  },
  {
    "path": "src/cmd/stats.rs",
    "content": "use std::borrow::ToOwned;\nuse std::default::Default;\nuse std::fmt;\nuse std::fs;\nuse std::io;\nuse std::iter::{FromIterator, repeat};\nuse std::str::{self, FromStr};\n\nuse channel;\nuse csv;\nuse stats::{Commute, OnlineStats, MinMax, Unsorted, merge_all};\nuse threadpool::ThreadPool;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse index::Indexed;\nuse select::{SelectColumns, Selection};\nuse util;\n\nuse self::FieldType::{TUnknown, TNull, TUnicode, TFloat, TInteger};\n\nstatic USAGE: &'static str = \"\nComputes basic statistics on CSV data.\n\nBasic statistics include mean, median, mode, standard deviation, sum, max and\nmin values. Note that some statistics are expensive to compute, so they must\nbe enabled explicitly. By default, the following statistics are reported for\n*every* column in the CSV data: mean, max, min and standard deviation. The\ndefault set of statistics corresponds to statistics that can be computed\nefficiently on a stream of data (i.e., constant memory).\n\nComputing statistics on a large file can be made much faster if you create\nan index for it first with 'xsv index'.\n\nUsage:\n    xsv stats [options] [<input>]\n\nstats options:\n    -s, --select <arg>     Select a subset of columns to compute stats for.\n                           See 'xsv select --help' for the format details.\n                           This is provided here because piping 'xsv select'\n                           into 'xsv stats' will disable the use of indexing.\n    --everything           Show all statistics available.\n    --mode                 Show the mode.\n                           This requires storing all CSV data in memory.\n    --cardinality          Show the cardinality.\n                           This requires storing all CSV data in memory.\n    --median               Show the median.\n                           This requires storing all CSV data in memory.\n    --nulls                Include NULLs in the population size 
for computing\n                           mean and standard deviation.\n    -j, --jobs <arg>       The number of jobs to run in parallel.\n                           This works better when the given CSV data has\n                           an index already created. Note that a file handle\n                           is opened for each job.\n                           When set to '0', the number of jobs is set to the\n                           number of CPUs detected.\n                           [default: 0]\n\nCommon options:\n    -h, --help             Display this message\n    -o, --output <file>    Write output to <file> instead of stdout.\n    -n, --no-headers       When set, the first row will NOT be interpreted\n                           as column names. i.e., They will be included\n                           in statistics.\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. (default: ,)\n\";\n\n#[derive(Clone, Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    flag_select: SelectColumns,\n    flag_everything: bool,\n    flag_mode: bool,\n    flag_cardinality: bool,\n    flag_median: bool,\n    flag_nulls: bool,\n    flag_jobs: usize,\n    flag_output: Option<String>,\n    flag_no_headers: bool,\n    flag_delimiter: Option<Delimiter>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n\n    let mut wtr = Config::new(&args.flag_output).writer()?;\n    let (headers, stats) = match args.rconfig().indexed()? 
{\n        None => args.sequential_stats(),\n        Some(idx) => {\n            if args.flag_jobs == 1 {\n                args.sequential_stats()\n            } else {\n                args.parallel_stats(idx)\n            }\n        }\n    }?;\n    let stats = args.stats_to_records(stats);\n\n    wtr.write_record(&args.stat_headers())?;\n    let fields = headers.iter().zip(stats.into_iter());\n    for (i, (header, stat)) in fields.enumerate() {\n        let header =\n            if args.flag_no_headers {\n                i.to_string().into_bytes()\n            } else {\n                header.to_vec()\n            };\n        let stat = stat.iter().map(|f| f.as_bytes());\n        wtr.write_record(vec![&*header].into_iter().chain(stat))?;\n    }\n    wtr.flush()?;\n    Ok(())\n}\n\nimpl Args {\n    fn sequential_stats(&self) -> CliResult<(csv::ByteRecord, Vec<Stats>)> {\n        let mut rdr = self.rconfig().reader()?;\n        let (headers, sel) = self.sel_headers(&mut rdr)?;\n        let stats = self.compute(&sel, rdr.byte_records())?;\n        Ok((headers, stats))\n    }\n\n    fn parallel_stats(\n        &self,\n        idx: Indexed<fs::File, fs::File>,\n    ) -> CliResult<(csv::ByteRecord, Vec<Stats>)> {\n        // N.B. This method doesn't handle the case when the number of records\n        // is zero correctly. 
So we use `sequential_stats` instead.\n        if idx.count() == 0 {\n            return self.sequential_stats();\n        }\n\n        let mut rdr = self.rconfig().reader()?;\n        let (headers, sel) = self.sel_headers(&mut rdr)?;\n\n        let chunk_size = util::chunk_size(idx.count() as usize, self.njobs());\n        let nchunks = util::num_of_chunks(idx.count() as usize, chunk_size);\n\n        let pool = ThreadPool::new(self.njobs());\n        let (send, recv) = channel::bounded(0);\n        for i in 0..nchunks {\n            let (send, args, sel) = (send.clone(), self.clone(), sel.clone());\n            pool.execute(move || {\n                let mut idx = args.rconfig().indexed().unwrap().unwrap();\n                idx.seek((i * chunk_size) as u64).unwrap();\n                let it = idx.byte_records().take(chunk_size);\n                send.send(args.compute(&sel, it).unwrap());\n            });\n        }\n        drop(send);\n        Ok((headers, merge_all(recv).unwrap_or_else(Vec::new)))\n    }\n\n    fn stats_to_records(&self, stats: Vec<Stats>) -> Vec<csv::StringRecord> {\n        let mut records: Vec<_> = repeat(csv::StringRecord::new())\n            .take(stats.len())\n            .collect();\n        let pool = ThreadPool::new(self.njobs());\n        let mut results = vec![];\n        for mut stat in stats.into_iter() {\n            let (send, recv) = channel::bounded(0);\n            results.push(recv);\n            pool.execute(move || { send.send(stat.to_record()); });\n        }\n        for (i, recv) in results.into_iter().enumerate() {\n            records[i] = recv.recv().unwrap();\n        }\n        records\n    }\n\n    fn compute<I>(&self, sel: &Selection, it: I) -> CliResult<Vec<Stats>>\n            where I: Iterator<Item=csv::Result<csv::ByteRecord>> {\n        let mut stats = self.new_stats(sel.len());\n        for row in it {\n            let row = row?;\n            for (i, field) in sel.select(&row).enumerate() {\n               
 stats[i].add(field);\n            }\n        }\n        Ok(stats)\n    }\n\n    fn sel_headers<R: io::Read>(\n        &self,\n        rdr: &mut csv::Reader<R>,\n    ) -> CliResult<(csv::ByteRecord, Selection)> {\n        let headers = rdr.byte_headers()?.clone();\n        let sel = self.rconfig().selection(&headers)?;\n        Ok((csv::ByteRecord::from_iter(sel.select(&headers)), sel))\n    }\n\n    fn rconfig(&self) -> Config {\n        Config::new(&self.arg_input)\n            .delimiter(self.flag_delimiter)\n            .no_headers(self.flag_no_headers)\n            .select(self.flag_select.clone())\n    }\n\n    fn njobs(&self) -> usize {\n        if self.flag_jobs == 0 { util::num_cpus() } else { self.flag_jobs }\n    }\n\n    fn new_stats(&self, record_len: usize) -> Vec<Stats> {\n        repeat(Stats::new(WhichStats {\n            include_nulls: self.flag_nulls,\n            sum: true,\n            range: true,\n            dist: true,\n            cardinality: self.flag_cardinality || self.flag_everything,\n            median: self.flag_median || self.flag_everything,\n            mode: self.flag_mode || self.flag_everything,\n        })).take(record_len).collect()\n    }\n\n    fn stat_headers(&self) -> csv::StringRecord {\n        let mut fields = vec![\n            \"field\", \"type\", \"sum\", \"min\", \"max\", \"min_length\", \"max_length\",\n            \"mean\", \"stddev\",\n        ];\n        let all = self.flag_everything;\n        if self.flag_median || all { fields.push(\"median\"); }\n        if self.flag_mode || all { fields.push(\"mode\"); }\n        if self.flag_cardinality || all { fields.push(\"cardinality\"); }\n        csv::StringRecord::from(fields)\n    }\n}\n\n#[derive(Clone, Debug, Eq, PartialEq)]\nstruct WhichStats {\n    include_nulls: bool,\n    sum: bool,\n    range: bool,\n    dist: bool,\n    cardinality: bool,\n    median: bool,\n    mode: bool,\n}\n\nimpl Commute for WhichStats {\n    fn merge(&mut self, other: WhichStats) 
{\n        assert_eq!(*self, other);\n    }\n}\n\n#[derive(Clone)]\nstruct Stats {\n    typ: FieldType,\n    sum: Option<TypedSum>,\n    minmax: Option<TypedMinMax>,\n    online: Option<OnlineStats>,\n    mode: Option<Unsorted<Vec<u8>>>,\n    median: Option<Unsorted<f64>>,\n    which: WhichStats,\n}\n\nimpl Stats {\n    fn new(which: WhichStats) -> Stats {\n        let (mut sum, mut minmax, mut online, mut mode, mut median) =\n            (None, None, None, None, None);\n        if which.sum { sum = Some(Default::default()); }\n        if which.range { minmax = Some(Default::default()); }\n        if which.dist { online = Some(Default::default()); }\n        if which.mode || which.cardinality { mode = Some(Default::default()); }\n        if which.median { median = Some(Default::default()); }\n        Stats {\n            typ: Default::default(),\n            sum: sum,\n            minmax: minmax,\n            online: online,\n            mode: mode,\n            median: median,\n            which: which,\n        }\n    }\n\n    fn add(&mut self, sample: &[u8]) {\n        let sample_type = FieldType::from_sample(sample);\n        self.typ.merge(sample_type);\n\n        let t = self.typ;\n        self.sum.as_mut().map(|v| v.add(t, sample));\n        self.minmax.as_mut().map(|v| v.add(t, sample));\n        self.mode.as_mut().map(|v| v.add(sample.to_vec()));\n        match self.typ {\n            TUnknown => {}\n            TNull => {\n                if self.which.include_nulls {\n                    self.online.as_mut().map(|v| { v.add_null(); });\n                }\n            }\n            TUnicode => {}\n            TFloat | TInteger => {\n                if sample_type.is_null() {\n                    if self.which.include_nulls {\n                        self.online.as_mut().map(|v| { v.add_null(); });\n                    }\n                } else {\n                    let n = from_bytes::<f64>(sample).unwrap();\n                    
self.median.as_mut().map(|v| { v.add(n); });\n                    self.online.as_mut().map(|v| { v.add(n); });\n                }\n            }\n        }\n    }\n\n    fn to_record(&mut self) -> csv::StringRecord {\n        let typ = self.typ;\n        let mut pieces = vec![];\n        let empty = || \"\".to_owned();\n\n        pieces.push(self.typ.to_string());\n        match self.sum.as_ref().and_then(|sum| sum.show(typ)) {\n            Some(sum) => { pieces.push(sum); }\n            None => { pieces.push(empty()); }\n        }\n        match self.minmax.as_ref().and_then(|mm| mm.show(typ)) {\n            Some(mm) => { pieces.push(mm.0); pieces.push(mm.1); }\n            None => { pieces.push(empty()); pieces.push(empty()); }\n        }\n        match self.minmax.as_ref().and_then(|mm| mm.len_range()) {\n            Some(mm) => { pieces.push(mm.0); pieces.push(mm.1); }\n            None => { pieces.push(empty()); pieces.push(empty()); }\n        }\n\n        if !self.typ.is_number() {\n            pieces.push(empty()); pieces.push(empty());\n        } else {\n            match self.online {\n                Some(ref v) => {\n                    pieces.push(v.mean().to_string());\n                    pieces.push(v.stddev().to_string());\n                }\n                None => { pieces.push(empty()); pieces.push(empty()); }\n            }\n        }\n        match self.median.as_mut().and_then(|v| v.median()) {\n            None => {\n                if self.which.median {\n                    pieces.push(empty());\n                }\n            }\n            Some(v) => { pieces.push(v.to_string()); }\n        }\n        match self.mode.as_mut() {\n            None => {\n                if self.which.mode {\n                    pieces.push(empty());\n                }\n                if self.which.cardinality {\n                    pieces.push(empty());\n                }\n            }\n            Some(ref mut v) => {\n                if self.which.mode 
{\n                    let lossy = |s: Vec<u8>| -> String {\n                        String::from_utf8_lossy(&*s).into_owned()\n                    };\n                    pieces.push(\n                        v.mode().map_or(\"N/A\".to_owned(), lossy));\n                }\n                if self.which.cardinality {\n                    pieces.push(v.cardinality().to_string());\n                }\n            }\n        }\n        csv::StringRecord::from(pieces)\n    }\n}\n\nimpl Commute for Stats {\n    fn merge(&mut self, other: Stats) {\n        self.typ.merge(other.typ);\n        self.sum.merge(other.sum);\n        self.minmax.merge(other.minmax);\n        self.online.merge(other.online);\n        self.mode.merge(other.mode);\n        self.median.merge(other.median);\n        self.which.merge(other.which);\n    }\n}\n\n#[derive(Clone, Copy, PartialEq)]\nenum FieldType {\n    TUnknown,\n    TNull,\n    TUnicode,\n    TFloat,\n    TInteger,\n}\n\nimpl FieldType {\n    fn from_sample(sample: &[u8]) -> FieldType {\n        if sample.is_empty() {\n            return TNull;\n        }\n        let string = match str::from_utf8(sample) {\n            Err(_) => return TUnknown,\n            Ok(s) => s,\n        };\n        if let Ok(_) = string.parse::<i64>() { return TInteger; }\n        if let Ok(_) = string.parse::<f64>() { return TFloat; }\n        TUnicode\n    }\n\n    fn is_number(&self) -> bool {\n        *self == TFloat || *self == TInteger\n    }\n\n    fn is_null(&self) -> bool {\n        *self == TNull\n    }\n}\n\nimpl Commute for FieldType {\n    fn merge(&mut self, other: FieldType) {\n        *self = match (*self, other) {\n            (TUnicode, TUnicode) => TUnicode,\n            (TFloat, TFloat) => TFloat,\n            (TInteger, TInteger) => TInteger,\n            // Null does not impact the type.\n            (TNull, any) | (any, TNull) => any,\n            // There's no way to get around an unknown.\n            (TUnknown, _) | (_, TUnknown) => 
TUnknown,\n            // Integers can degrade to floats.\n            (TFloat, TInteger) | (TInteger, TFloat) => TFloat,\n            // Numbers can degrade to Unicode strings.\n            (TUnicode, TFloat) | (TFloat, TUnicode) => TUnicode,\n            (TUnicode, TInteger) | (TInteger, TUnicode) => TUnicode,\n        };\n    }\n}\n\nimpl Default for FieldType {\n    // The default is the most specific type.\n    // Type inference proceeds by assuming the most specific type and then\n    // relaxing the type as counter-examples are found.\n    fn default() -> FieldType { TNull }\n}\n\nimpl fmt::Display for FieldType {\n    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {\n        match *self {\n            TUnknown => write!(f, \"Unknown\"),\n            TNull => write!(f, \"NULL\"),\n            TUnicode => write!(f, \"Unicode\"),\n            TFloat => write!(f, \"Float\"),\n            TInteger => write!(f, \"Integer\"),\n        }\n    }\n}\n\n/// TypedSum keeps a rolling sum of the data seen.\n///\n/// It sums integers until it sees a float, at which point it sums floats.\n#[derive(Clone, Default)]\nstruct TypedSum {\n    integer: i64,\n    float: Option<f64>,\n}\n\nimpl TypedSum {\n    fn add(&mut self, typ: FieldType, sample: &[u8]) {\n        if sample.is_empty() {\n            return;\n        }\n        match typ {\n            TFloat => {\n                let float: f64 = from_bytes::<f64>(sample).unwrap();\n                match self.float {\n                    None => {\n                        self.float = Some((self.integer as f64) + float);\n                    }\n                    Some(ref mut f) => {\n                        *f += float;\n                    }\n                }\n            }\n            TInteger => {\n                if let Some(ref mut float) = self.float {\n                    *float += from_bytes::<f64>(sample).unwrap();\n                } else {\n                    self.integer += 
from_bytes::<i64>(sample).unwrap();\n                }\n            }\n            _ => {}\n        }\n    }\n\n    fn show(&self, typ: FieldType) -> Option<String> {\n        match typ {\n            TNull | TUnicode | TUnknown  => None,\n            TInteger => Some(self.integer.to_string()),\n            TFloat => Some(self.float.unwrap_or(0.0).to_string()),\n        }\n    }\n}\n\nimpl Commute for TypedSum {\n    fn merge(&mut self, other: TypedSum) {\n        match (self.float, other.float) {\n            (Some(f1), Some(f2)) => self.float = Some(f1 + f2),\n            (Some(f1), None) => self.float = Some(f1 + (other.integer as f64)),\n            (None, Some(f2)) => self.float = Some((self.integer as f64) + f2),\n            (None, None) => self.integer += other.integer,\n        }\n    }\n}\n\n/// TypedMinMax keeps track of minimum/maximum values for each possible type\n/// where min/max makes sense.\n#[derive(Clone)]\nstruct TypedMinMax {\n    strings: MinMax<Vec<u8>>,\n    str_len: MinMax<usize>,\n    integers: MinMax<i64>,\n    floats: MinMax<f64>,\n}\n\nimpl TypedMinMax {\n    fn add(&mut self, typ: FieldType, sample: &[u8]) {\n        self.str_len.add(sample.len());\n        if sample.is_empty() {\n            return;\n        }\n        self.strings.add(sample.to_vec());\n        match typ {\n            TUnicode | TUnknown | TNull => {}\n            TFloat => {\n                let n = str::from_utf8(&*sample)\n                            .ok()\n                            .and_then(|s| s.parse::<f64>().ok())\n                            .unwrap();\n                self.floats.add(n);\n                self.integers.add(n as i64);\n            }\n            TInteger => {\n                let n = str::from_utf8(&*sample)\n                            .ok()\n                            .and_then(|s| s.parse::<i64>().ok())\n                            .unwrap();\n                self.integers.add(n);\n                self.floats.add(n as f64);\n          
  }\n        }\n    }\n\n    fn len_range(&self) -> Option<(String, String)> {\n        match (self.str_len.min(), self.str_len.max()) {\n            (Some(min), Some(max)) => Some((min.to_string(), max.to_string())),\n            _ => None,\n        }\n    }\n\n    fn show(&self, typ: FieldType) -> Option<(String, String)> {\n        match typ {\n            TNull => None,\n            TUnicode | TUnknown => {\n                match (self.strings.min(), self.strings.max()) {\n                    (Some(min), Some(max)) => {\n                        let min = String::from_utf8_lossy(&**min).to_string();\n                        let max = String::from_utf8_lossy(&**max).to_string();\n                        Some((min, max))\n                    }\n                    _ => None\n                }\n            }\n            TInteger => {\n                match (self.integers.min(), self.integers.max()) {\n                    (Some(min), Some(max)) => {\n                        Some((min.to_string(), max.to_string()))\n                    }\n                    _ => None\n                }\n            }\n            TFloat => {\n                match (self.floats.min(), self.floats.max()) {\n                    (Some(min), Some(max)) => {\n                        Some((min.to_string(), max.to_string()))\n                    }\n                    _ => None\n                }\n            }\n        }\n    }\n}\n\nimpl Default for TypedMinMax {\n    fn default() -> TypedMinMax {\n        TypedMinMax {\n            strings: Default::default(),\n            str_len: Default::default(),\n            integers: Default::default(),\n            floats: Default::default(),\n        }\n    }\n}\n\nimpl Commute for TypedMinMax {\n    fn merge(&mut self, other: TypedMinMax) {\n        self.strings.merge(other.strings);\n        self.str_len.merge(other.str_len);\n        self.integers.merge(other.integers);\n        self.floats.merge(other.floats);\n    }\n}\n\nfn from_bytes<T: 
FromStr>(bytes: &[u8]) -> Option<T> {\n    str::from_utf8(bytes).ok().and_then(|s| s.parse().ok())\n}\n"
  },
  {
    "path": "src/cmd/table.rs",
    "content": "use std::borrow::Cow;\n\nuse csv;\nuse tabwriter::TabWriter;\n\nuse CliResult;\nuse config::{Config, Delimiter};\nuse util;\n\nstatic USAGE: &'static str = \"\nOutputs CSV data as a table with columns in alignment.\n\nThis will not work well if the CSV data contains large fields.\n\nNote that formatting a table requires buffering all CSV data into memory.\nTherefore, you should use the 'sample' or 'slice' command to trim down large\nCSV data before formatting it with this command.\n\nUsage:\n    xsv table [options] [<input>]\n\ntable options:\n    -w, --width <arg>      The minimum width of each column.\n                           [default: 2]\n    -p, --pad <arg>        The minimum number of spaces between each column.\n                           [default: 2]\n    -c, --condense <arg>  Limits the length of each field to the value\n                           specified. If the field is UTF-8 encoded, then\n                           <arg> refers to the number of code points.\n                           Otherwise, it refers to the number of bytes.\n\nCommon options:\n    -h, --help             Display this message\n    -o, --output <file>    Write output to <file> instead of stdout.\n    -d, --delimiter <arg>  The field delimiter for reading CSV data.\n                           Must be a single character. 
(default: ,)\n\";\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_input: Option<String>,\n    flag_width: usize,\n    flag_pad: usize,\n    flag_output: Option<String>,\n    flag_delimiter: Option<Delimiter>,\n    flag_condense: Option<usize>,\n}\n\npub fn run(argv: &[&str]) -> CliResult<()> {\n    let args: Args = util::get_args(USAGE, argv)?;\n    let rconfig = Config::new(&args.arg_input)\n        .delimiter(args.flag_delimiter)\n        .no_headers(true);\n    let wconfig = Config::new(&args.flag_output)\n        .delimiter(Some(Delimiter(b'\\t')));\n\n    let tw = TabWriter::new(wconfig.io_writer()?)\n        .minwidth(args.flag_width)\n        .padding(args.flag_pad);\n    let mut wtr = wconfig.from_writer(tw);\n    let mut rdr = rconfig.reader()?;\n\n    let mut record = csv::ByteRecord::new();\n    while rdr.read_byte_record(&mut record)? {\n        wtr.write_record(record.iter().map(|f| {\n            util::condense(Cow::Borrowed(f), args.flag_condense)\n        }))?;\n    }\n    wtr.flush()?;\n    Ok(())\n}\n"
  },
  {
    "path": "src/config.rs",
    "content": "#[allow(deprecated, unused_imports)]\nuse std::ascii::AsciiExt;\nuse std::borrow::ToOwned;\nuse std::env;\nuse std::fs;\nuse std::io::{self, Read};\nuse std::ops::Deref;\nuse std::path::PathBuf;\n\nuse csv;\nuse index::Indexed;\nuse serde::de::{Deserializer, Deserialize, Error};\n\nuse CliResult;\nuse select::{SelectColumns, Selection};\nuse util;\n\n\n#[derive(Clone, Copy, Debug)]\npub struct Delimiter(pub u8);\n\n/// Delimiter represents values that can be passed from the command line that\n/// can be used as a field delimiter in CSV data.\n///\n/// Its purpose is to ensure that the Unicode character given decodes to a\n/// valid ASCII character as required by the CSV parser.\nimpl Delimiter {\n    pub fn as_byte(self) -> u8 {\n        self.0\n    }\n}\n\nimpl<'de> Deserialize<'de> for Delimiter {\n    fn deserialize<D: Deserializer<'de>>(d: D) -> Result<Delimiter, D::Error> {\n        let c = String::deserialize(d)?;\n        match &*c {\n            r\"\\t\" => Ok(Delimiter(b'\\t')),\n            s => {\n                if s.len() != 1 {\n                    let msg = format!(\"Could not convert '{}' to a single \\\n                                       ASCII character.\", s);\n                    return Err(D::Error::custom(msg));\n                }\n                let c = s.chars().next().unwrap();\n                if c.is_ascii() {\n                    Ok(Delimiter(c as u8))\n                } else {\n                    let msg = format!(\"Could not convert '{}' \\\n                                       to ASCII delimiter.\", c);\n                    Err(D::Error::custom(msg))\n                }\n            }\n        }\n    }\n}\n\n#[derive(Debug)]\npub struct Config {\n    path: Option<PathBuf>, // None implies <stdin>\n    idx_path: Option<PathBuf>,\n    select_columns: Option<SelectColumns>,\n    delimiter: u8,\n    pub no_headers: bool,\n    flexible: bool,\n    terminator: csv::Terminator,\n    quote: u8,\n    quote_style: 
csv::QuoteStyle,\n    double_quote: bool,\n    escape: Option<u8>,\n    quoting: bool,\n}\n\nimpl Config {\n    pub fn new(path: &Option<String>) -> Config {\n        let (path, delim) = match *path {\n            None => (None, b','),\n            Some(ref s) if s.deref() == \"-\" => (None, b','),\n            Some(ref s) => {\n                let path = PathBuf::from(s);\n                let delim =\n                    if path.extension().map_or(false, |v| v == \"tsv\" || v == \"tab\") {\n                        b'\\t'\n                    } else {\n                        b','\n                    };\n                (Some(path), delim)\n            }\n        };\n        Config {\n            path: path,\n            idx_path: None,\n            select_columns: None,\n            delimiter: delim,\n            no_headers: false,\n            flexible: false,\n            terminator: csv::Terminator::Any(b'\\n'),\n            quote: b'\"',\n            quote_style: csv::QuoteStyle::Necessary,\n            double_quote: true,\n            escape: None,\n            quoting: true,\n        }\n    }\n\n    pub fn delimiter(mut self, d: Option<Delimiter>) -> Config {\n        if let Some(d) = d {\n            self.delimiter = d.as_byte();\n        }\n        self\n    }\n\n    pub fn no_headers(mut self, mut yes: bool) -> Config {\n        if env::var(\"XSV_TOGGLE_HEADERS\").unwrap_or(\"0\".to_owned()) == \"1\" {\n            yes = !yes;\n        }\n        self.no_headers = yes;\n        self\n    }\n\n    pub fn flexible(mut self, yes: bool) -> Config {\n        self.flexible = yes;\n        self\n    }\n\n    pub fn crlf(mut self, yes: bool) -> Config {\n        if yes {\n            self.terminator = csv::Terminator::CRLF;\n        } else {\n            self.terminator = csv::Terminator::Any(b'\\n');\n        }\n        self\n    }\n\n    pub fn terminator(mut self, term: csv::Terminator) -> Config {\n        self.terminator = term;\n        self\n    }\n\n    
pub fn quote(mut self, quote: u8) -> Config {\n        self.quote = quote;\n        self\n    }\n\n    pub fn quote_style(mut self, style: csv::QuoteStyle) -> Config {\n        self.quote_style = style;\n        self\n    }\n\n    pub fn double_quote(mut self, yes: bool) -> Config {\n        self.double_quote = yes;\n        self\n    }\n\n    pub fn escape(mut self, escape: Option<u8>) -> Config {\n        self.escape = escape;\n        self\n    }\n\n    pub fn quoting(mut self, yes: bool) -> Config {\n        self.quoting = yes;\n        self\n    }\n\n    pub fn select(mut self, sel_cols: SelectColumns) -> Config {\n        self.select_columns = Some(sel_cols);\n        self\n    }\n\n    pub fn is_std(&self) -> bool {\n        self.path.is_none()\n    }\n\n    pub fn selection(\n        &self,\n        first_record: &csv::ByteRecord,\n    ) -> Result<Selection, String> {\n        match self.select_columns {\n            None => Err(\"Config has no 'SelectColumns'. Did you call \\\n                         Config::select?\".to_owned()),\n            Some(ref sel) => sel.selection(first_record, !self.no_headers),\n        }\n    }\n\n    pub fn write_headers<R: io::Read, W: io::Write>\n                        (&self, r: &mut csv::Reader<R>, w: &mut csv::Writer<W>)\n                        -> csv::Result<()> {\n        if !self.no_headers {\n            let r = r.byte_headers()?;\n            if !r.is_empty() {\n                w.write_record(r)?;\n            }\n        }\n        Ok(())\n    }\n\n    pub fn writer(&self)\n                 -> io::Result<csv::Writer<Box<io::Write+'static>>> {\n        Ok(self.from_writer(self.io_writer()?))\n    }\n\n    pub fn reader(&self)\n                 -> io::Result<csv::Reader<Box<io::Read+'static>>> {\n        Ok(self.from_reader(self.io_reader()?))\n    }\n\n    pub fn reader_file(&self) -> io::Result<csv::Reader<fs::File>> {\n        match self.path {\n            None => Err(io::Error::new(\n                
io::ErrorKind::Other, \"Cannot use <stdin> here\",\n            )),\n            Some(ref p) => fs::File::open(p).map(|f| self.from_reader(f)),\n        }\n    }\n\n    pub fn index_files(&self)\n           -> io::Result<Option<(csv::Reader<fs::File>, fs::File)>> {\n        let (csv_file, idx_file) = match (&self.path, &self.idx_path) {\n            (&None, &None) => return Ok(None),\n            (&None, &Some(_)) => return Err(io::Error::new(\n                io::ErrorKind::Other,\n                \"Cannot use <stdin> with indexes\",\n                // Some(format!(\"index file: {}\", p.display()))\n            )),\n            (&Some(ref p), &None) => {\n                // We generally don't want to report an error here, since we're\n                // passively trying to find an index.\n                let idx_file = match fs::File::open(&util::idx_path(p)) {\n                    // TODO: Maybe we should report an error if the file exists\n                    // but is not readable.\n                    Err(_) => return Ok(None),\n                    Ok(f) => f,\n                };\n                (fs::File::open(p)?, idx_file)\n            }\n            (&Some(ref p), &Some(ref ip)) => {\n                (fs::File::open(p)?, fs::File::open(ip)?)\n            }\n        };\n        // If the CSV data was last modified after the index file was last\n        // modified, then return an error and demand the user regenerate the\n        // index.\n        let data_modified = util::last_modified(&csv_file.metadata()?);\n        let idx_modified = util::last_modified(&idx_file.metadata()?);\n        if data_modified > idx_modified {\n            return Err(io::Error::new(\n                io::ErrorKind::Other,\n                \"The CSV file was modified after the index file. 
\\\n                 Please re-create the index.\",\n            ));\n        }\n        let csv_rdr = self.from_reader(csv_file);\n        Ok(Some((csv_rdr, idx_file)))\n    }\n\n    pub fn indexed(&self)\n                  -> CliResult<Option<Indexed<fs::File, fs::File>>> {\n        match self.index_files()? {\n            None => Ok(None),\n            Some((r, i)) => Ok(Some(Indexed::open(r, i)?)),\n        }\n    }\n\n    pub fn io_reader(&self) -> io::Result<Box<io::Read+'static>> {\n        Ok(match self.path {\n                None => Box::new(io::stdin()),\n                Some(ref p) => {\n                    match fs::File::open(p){\n                        Ok(x) => Box::new(x),\n                        Err(err) => {\n                            let msg = format!(\n                                \"failed to open {}: {}\", p.display(), err);\n                            return Err(io::Error::new(\n                                io::ErrorKind::NotFound,\n                                msg,\n                            ));\n                        }\n                    }\n                },\n            })\n    }\n\n    pub fn from_reader<R: Read>(&self, rdr: R) -> csv::Reader<R> {\n        csv::ReaderBuilder::new()\n            .flexible(self.flexible)\n            .delimiter(self.delimiter)\n            .has_headers(!self.no_headers)\n            .quote(self.quote)\n            .quoting(self.quoting)\n            .escape(self.escape)\n            .from_reader(rdr)\n    }\n\n    pub fn io_writer(&self) -> io::Result<Box<io::Write+'static>> {\n        Ok(match self.path {\n            None => Box::new(io::stdout()),\n            Some(ref p) => Box::new(fs::File::create(p)?),\n        })\n    }\n\n    pub fn from_writer<W: io::Write>(&self, wtr: W) -> csv::Writer<W> {\n        csv::WriterBuilder::new()\n            .flexible(self.flexible)\n            .delimiter(self.delimiter)\n            .terminator(self.terminator)\n            .quote(self.quote)\n  
          .quote_style(self.quote_style)\n            .double_quote(self.double_quote)\n            .escape(self.escape.unwrap_or(b'\\\\'))\n            .buffer_capacity(32 * (1<<10))\n            .from_writer(wtr)\n    }\n}\n"
  },
  {
    "path": "src/index.rs",
    "content": "use std::io;\nuse std::ops;\n\nuse csv;\nuse csv_index::RandomAccessSimple;\n\nuse CliResult;\n\n/// Indexed composes a CSV reader with a simple random access index.\npub struct Indexed<R, I> {\n    csv_rdr: csv::Reader<R>,\n    idx: RandomAccessSimple<I>,\n}\n\nimpl<R, I> ops::Deref for Indexed<R, I> {\n    type Target = csv::Reader<R>;\n    fn deref(&self) -> &csv::Reader<R> { &self.csv_rdr }\n}\n\nimpl<R, I> ops::DerefMut for Indexed<R, I> {\n    fn deref_mut(&mut self) -> &mut csv::Reader<R> { &mut self.csv_rdr }\n}\n\nimpl<R: io::Read + io::Seek, I: io::Read + io::Seek> Indexed<R, I> {\n    /// Opens an index.\n    pub fn open(\n        csv_rdr: csv::Reader<R>,\n        idx_rdr: I,\n    ) -> CliResult<Indexed<R, I>> {\n        Ok(Indexed {\n            csv_rdr: csv_rdr,\n            idx: RandomAccessSimple::open(idx_rdr)?,\n        })\n    }\n\n    /// Return the number of records (not including the header record) in this\n    /// index.\n    pub fn count(&self) -> u64 {\n        if self.csv_rdr.has_headers() && !self.idx.is_empty() {\n            self.idx.len() - 1\n        } else {\n            self.idx.len()\n        }\n    }\n\n    /// Seek to the starting position of record `i`.\n    pub fn seek(&mut self, mut i: u64) -> CliResult<()> {\n        if i >= self.count() {\n            let msg = format!(\n                \"invalid record index {} (there are {} records)\",\n                i, self.count());\n            return fail!(io::Error::new(io::ErrorKind::Other, msg));\n        }\n        if self.csv_rdr.has_headers() {\n            i += 1;\n        }\n        let pos = self.idx.get(i)?;\n        self.csv_rdr.seek(pos)?;\n        Ok(())\n    }\n}\n"
  },
  {
    "path": "src/main.rs",
    "content": "extern crate byteorder;\nextern crate crossbeam_channel as channel;\nextern crate csv;\nextern crate csv_index;\nextern crate docopt;\nextern crate filetime;\nextern crate num_cpus;\nextern crate rand;\nextern crate regex;\nextern crate serde;\n#[macro_use]\nextern crate serde_derive;\nextern crate stats;\nextern crate tabwriter;\nextern crate threadpool;\n\nuse std::borrow::ToOwned;\nuse std::env;\nuse std::fmt;\nuse std::io;\nuse std::process;\n\nuse docopt::Docopt;\n\nmacro_rules! wout {\n    ($($arg:tt)*) => ({\n        use std::io::Write;\n        (writeln!(&mut ::std::io::stdout(), $($arg)*)).unwrap();\n    });\n}\n\nmacro_rules! werr {\n    ($($arg:tt)*) => ({\n        use std::io::Write;\n        (writeln!(&mut ::std::io::stderr(), $($arg)*)).unwrap();\n    });\n}\n\nmacro_rules! fail {\n    ($e:expr) => (Err(::std::convert::From::from($e)));\n}\n\nmacro_rules! command_list {\n    () => (\n\"\n    cat         Concatenate by row or column\n    count       Count records\n    fixlengths  Makes all records have same length\n    flatten     Show one field per line\n    fmt         Format CSV output (change field delimiter)\n    frequency   Show frequency tables\n    headers     Show header names\n    help        Show this usage message.\n    index       Create CSV index for faster access\n    input       Read CSV data with special quoting rules\n    join        Join CSV files\n    partition   Partition CSV data based on a column value\n    sample      Randomly sample CSV data\n    reverse     Reverse rows of CSV data\n    search      Search CSV data with regexes\n    select      Select columns from CSV\n    slice       Slice records from CSV\n    sort        Sort CSV data\n    split       Split CSV data into many files\n    stats       Compute basic statistics\n    table       Align CSV data into columns\n\"\n    )\n}\n\nmod cmd;\nmod config;\nmod index;\nmod select;\nmod util;\n\nstatic USAGE: &'static str = concat!(\"\nUsage:\n    xsv <command> 
[<args>...]\n    xsv [options]\n\nOptions:\n    --list        List all commands available.\n    -h, --help    Display this message\n    <command> -h  Display the command help message\n    --version     Print version info and exit\n\nCommands:\", command_list!());\n\n#[derive(Deserialize)]\nstruct Args {\n    arg_command: Option<Command>,\n    flag_list: bool,\n}\n\nfn main() {\n    let args: Args = Docopt::new(USAGE)\n                            .and_then(|d| d.options_first(true)\n                                           .version(Some(util::version()))\n                                           .deserialize())\n                            .unwrap_or_else(|e| e.exit());\n    if args.flag_list {\n        wout!(concat!(\"Installed commands:\", command_list!()));\n        return;\n    }\n    match args.arg_command {\n        None => {\n            werr!(concat!(\n                \"xsv is a suite of CSV command line utilities.\n\nPlease choose one of the following commands:\",\n                command_list!()));\n            process::exit(0);\n        }\n        Some(cmd) => {\n            match cmd.run() {\n                Ok(()) => process::exit(0),\n                Err(CliError::Flag(err)) => err.exit(),\n                Err(CliError::Csv(err)) => {\n                    werr!(\"{}\", err);\n                    process::exit(1);\n                }\n                Err(CliError::Io(ref err))\n                        if err.kind() == io::ErrorKind::BrokenPipe => {\n                    process::exit(0);\n                }\n                Err(CliError::Io(err)) => {\n                    werr!(\"{}\", err);\n                    process::exit(1);\n                }\n                Err(CliError::Other(msg)) => {\n                    werr!(\"{}\", msg);\n                    process::exit(1);\n                }\n            }\n        }\n    }\n}\n\n#[derive(Debug, Deserialize)]\n#[serde(rename_all = \"lowercase\")]\nenum Command {\n    Cat,\n    Count,\n    
FixLengths,\n    Flatten,\n    Fmt,\n    Frequency,\n    Headers,\n    Help,\n    Index,\n    Input,\n    Join,\n    Partition,\n    Reverse,\n    Sample,\n    Search,\n    Select,\n    Slice,\n    Sort,\n    Split,\n    Stats,\n    Table,\n}\n\nimpl Command {\n    fn run(self) -> CliResult<()> {\n        let argv: Vec<_> = env::args().map(|v| v.to_owned()).collect();\n        let argv: Vec<_> = argv.iter().map(|s| &**s).collect();\n        let argv = &*argv;\n\n        if !argv[1].chars().all(char::is_lowercase) {\n            return Err(CliError::Other(format!(\n                \"xsv expects commands in lowercase. Did you mean '{}'?\", \n                argv[1].to_lowercase()).to_string()));\n        }\n        match self {\n            Command::Cat => cmd::cat::run(argv),\n            Command::Count => cmd::count::run(argv),\n            Command::FixLengths => cmd::fixlengths::run(argv),\n            Command::Flatten => cmd::flatten::run(argv),\n            Command::Fmt => cmd::fmt::run(argv),\n            Command::Frequency => cmd::frequency::run(argv),\n            Command::Headers => cmd::headers::run(argv),\n            Command::Help => { wout!(\"{}\", USAGE); Ok(()) }\n            Command::Index => cmd::index::run(argv),\n            Command::Input => cmd::input::run(argv),\n            Command::Join => cmd::join::run(argv),\n            Command::Partition => cmd::partition::run(argv),\n            Command::Reverse => cmd::reverse::run(argv),\n            Command::Sample => cmd::sample::run(argv),\n            Command::Search => cmd::search::run(argv),\n            Command::Select => cmd::select::run(argv),\n            Command::Slice => cmd::slice::run(argv),\n            Command::Sort => cmd::sort::run(argv),\n            Command::Split => cmd::split::run(argv),\n            Command::Stats => cmd::stats::run(argv),\n            Command::Table => cmd::table::run(argv),\n        }\n    }\n}\n\npub type CliResult<T> = Result<T, 
CliError>;\n\n#[derive(Debug)]\npub enum CliError {\n    Flag(docopt::Error),\n    Csv(csv::Error),\n    Io(io::Error),\n    Other(String),\n}\n\nimpl fmt::Display for CliError {\n    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {\n        match *self {\n            CliError::Flag(ref e) => { e.fmt(f) }\n            CliError::Csv(ref e) => { e.fmt(f) }\n            CliError::Io(ref e) => { e.fmt(f) }\n            CliError::Other(ref s) => { f.write_str(&**s) }\n        }\n    }\n}\n\nimpl From<docopt::Error> for CliError {\n    fn from(err: docopt::Error) -> CliError {\n        CliError::Flag(err)\n    }\n}\n\nimpl From<csv::Error> for CliError {\n    fn from(err: csv::Error) -> CliError {\n        if !err.is_io_error() {\n            return CliError::Csv(err);\n        }\n        match err.into_kind() {\n            csv::ErrorKind::Io(v) => From::from(v),\n            _ => unreachable!(),\n        }\n    }\n}\n\nimpl From<io::Error> for CliError {\n    fn from(err: io::Error) -> CliError {\n        CliError::Io(err)\n    }\n}\n\nimpl From<String> for CliError {\n    fn from(err: String) -> CliError {\n        CliError::Other(err)\n    }\n}\n\nimpl<'a> From<&'a str> for CliError {\n    fn from(err: &'a str) -> CliError {\n        CliError::Other(err.to_owned())\n    }\n}\n\nimpl From<regex::Error> for CliError {\n    fn from(err: regex::Error) -> CliError {\n        CliError::Other(format!(\"{:?}\", err))\n    }\n}\n"
  },
  {
    "path": "src/select.rs",
    "content": "use std::cmp::Ordering;\nuse std::collections::HashSet;\nuse std::fmt;\nuse std::iter::{self, repeat};\nuse std::ops;\nuse std::slice;\nuse std::str::FromStr;\n\nuse csv;\nuse serde::de::{Deserializer, Deserialize, Error};\n\n#[derive(Clone)]\npub struct SelectColumns {\n    selectors: Vec<Selector>,\n    invert: bool,\n}\n\nimpl SelectColumns {\n    fn parse(mut s: &str) -> Result<SelectColumns, String> {\n        let invert =\n            if !s.is_empty() && s.as_bytes()[0] == b'!' {\n                s = &s[1..];\n                true\n            } else {\n                false\n            };\n        Ok(SelectColumns {\n            selectors: SelectorParser::new(s).parse()?,\n            invert: invert,\n        })\n    }\n\n    pub fn selection(\n        &self,\n        first_record: &csv::ByteRecord,\n        use_names: bool,\n    ) -> Result<Selection, String> {\n        if self.selectors.is_empty() {\n            return Ok(Selection(if self.invert {\n                // Inverting everything means we get nothing.\n                vec![]\n            } else {\n                (0..first_record.len()).collect()\n            }));\n        }\n\n        let mut map = vec![];\n        for sel in &self.selectors {\n            let idxs = sel.indices(first_record, use_names);\n            map.extend(idxs?.into_iter());\n        }\n        if self.invert {\n            let set: HashSet<_> = map.into_iter().collect();\n            let mut map = vec![];\n            for i in 0..first_record.len() {\n                if !set.contains(&i) {\n                    map.push(i);\n                }\n            }\n            return Ok(Selection(map));\n        }\n        Ok(Selection(map))\n    }\n}\n\nimpl fmt::Debug for SelectColumns {\n    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {\n        if self.selectors.is_empty() {\n            write!(f, \"<All>\")\n        } else {\n            let strs: Vec<_> =\n                self.selectors\n            
        .iter().map(|sel| format!(\"{:?}\", sel)).collect();\n            write!(f, \"{}\", strs.join(\", \"))\n        }\n    }\n}\n\nimpl<'de> Deserialize<'de> for SelectColumns {\n    fn deserialize<D: Deserializer<'de>>(\n        d: D,\n    ) -> Result<SelectColumns, D::Error> {\n        let raw = String::deserialize(d)?;\n        SelectColumns::parse(&raw).map_err(|e| D::Error::custom(&e))\n    }\n}\n\nstruct SelectorParser {\n    chars: Vec<char>,\n    pos: usize,\n}\n\nimpl SelectorParser {\n    fn new(s: &str) -> SelectorParser {\n        SelectorParser { chars: s.chars().collect(), pos: 0 }\n    }\n\n    fn parse(&mut self) -> Result<Vec<Selector>, String> {\n        let mut sels = vec![];\n        loop {\n            if self.cur().is_none() {\n                break;\n            }\n            let f1: OneSelector =\n                if self.cur() == Some('-') {\n                    OneSelector::Start\n                } else {\n                    self.parse_one()?\n                };\n            let f2: Option<OneSelector> =\n                if self.cur() == Some('-') {\n                    self.bump();\n                    Some(if self.is_end_of_selector() {\n                        OneSelector::End\n                    } else {\n                        self.parse_one()?\n                    })\n                } else {\n                    None\n                };\n            if !self.is_end_of_selector() {\n                return Err(format!(\n                    \"Expected end of field but got '{}' instead.\",\n                    self.cur().unwrap()));\n            }\n            sels.push(match f2 {\n                Some(end) => Selector::Range(f1, end),\n                None => Selector::One(f1),\n            });\n            self.bump();\n        }\n        Ok(sels)\n    }\n\n    fn parse_one(&mut self) -> Result<OneSelector, String> {\n        let name =\n            if self.cur() == Some('\"') {\n                self.bump();\n                
self.parse_quoted_name()?\n            } else {\n                self.parse_name()?\n            };\n        Ok(if self.cur() == Some('[') {\n            let idx = self.parse_index()?;\n            OneSelector::IndexedName(name, idx)\n        } else {\n            match FromStr::from_str(&name) {\n                Err(_) => OneSelector::IndexedName(name, 0),\n                Ok(idx) => OneSelector::Index(idx),\n            }\n        })\n    }\n\n    fn parse_name(&mut self) -> Result<String, String> {\n        let mut name = String::new();\n        loop {\n            if self.is_end_of_field() || self.cur() == Some('[') {\n                break;\n            }\n            name.push(self.cur().unwrap());\n            self.bump();\n        }\n        Ok(name)\n    }\n\n    fn parse_quoted_name(&mut self) -> Result<String, String> {\n        let mut name = String::new();\n        loop {\n            match self.cur() {\n                None => {\n                    return Err(\"Unclosed quote, missing closing \\\".\"\n                               .to_owned());\n                }\n                Some('\"') => {\n                    self.bump();\n                    if self.cur() == Some('\"') {\n                        self.bump();\n                        name.push('\"'); name.push('\"');\n                        continue;\n                    }\n                    break\n                }\n                Some(c) => { name.push(c); self.bump(); }\n            }\n        }\n        Ok(name)\n    }\n\n    fn parse_index(&mut self) -> Result<usize, String> {\n        assert_eq!(self.cur().unwrap(), '[');\n        self.bump();\n\n        let mut idx = String::new();\n        loop {\n            match self.cur() {\n                None => {\n                    return Err(\"Unclosed index bracket, missing closing ].\"\n                               .to_owned());\n                }\n                Some(']') => { self.bump(); break; }\n                Some(c) => { 
idx.push(c); self.bump(); }\n            }\n        }\n        FromStr::from_str(&idx).map_err(|err| {\n            format!(\"Could not convert '{}' to an integer: {}\", idx, err)\n        })\n    }\n\n    fn cur(&self) -> Option<char> {\n        self.chars.get(self.pos).cloned()\n    }\n\n    fn is_end_of_field(&self) -> bool {\n        self.cur().map_or(true, |c| c == ',' || c == '-')\n    }\n\n    fn is_end_of_selector(&self) -> bool {\n        self.cur().map_or(true, |c| c == ',')\n    }\n\n    fn bump(&mut self) {\n        if self.pos < self.chars.len() { self.pos += 1; }\n    }\n}\n\n#[derive(Clone)]\nenum Selector {\n    One(OneSelector),\n    Range(OneSelector, OneSelector),\n}\n\n#[derive(Clone)]\nenum OneSelector {\n    Start,\n    End,\n    Index(usize),\n    IndexedName(String, usize),\n}\n\nimpl Selector {\n    fn indices(\n        &self,\n        first_record: &csv::ByteRecord,\n        use_names: bool,\n    ) -> Result<Vec<usize>, String> {\n        match *self {\n            Selector::One(ref sel) => {\n                sel.index(first_record, use_names).map(|i| vec![i])\n            }\n            Selector::Range(ref sel1, ref sel2) => {\n                let i1 = sel1.index(first_record, use_names)?;\n                let i2 = sel2.index(first_record, use_names)?;\n                Ok(match i1.cmp(&i2) {\n                    Ordering::Equal => vec!(i1),\n                    Ordering::Less => (i1..(i2 + 1)).collect(),\n                    Ordering::Greater => {\n                        let mut inds = vec![];\n                        let mut i = i1 + 1;\n                        while i > i2 {\n                            i -= 1;\n                            inds.push(i);\n                        }\n                        inds\n                    }\n                })\n            }\n        }\n    }\n}\n\nimpl OneSelector {\n    fn index(\n        &self,\n        first_record: &csv::ByteRecord,\n        use_names: bool,\n    ) -> Result<usize, String> 
{\n        match *self {\n            OneSelector::Start => Ok(0),\n            OneSelector::End => Ok(\n                if first_record.len() == 0 {\n                    0\n                } else {\n                    first_record.len() - 1\n                }\n            ),\n            OneSelector::Index(i) => {\n                if i < 1 || i > first_record.len() {\n                    Err(format!(\"Selector index {} is out of \\\n                                 bounds. Index must be >= 1 \\\n                                 and <= {}.\", i, first_record.len()))\n                } else {\n                    // Indices given by user are 1-offset. Convert them here!\n                    Ok(i-1)\n                }\n            }\n            OneSelector::IndexedName(ref s, sidx) => {\n                if !use_names {\n                    return Err(format!(\"Cannot use names ('{}') in selection \\\n                                        with --no-headers set.\", s));\n                }\n                let mut num_found = 0;\n                for (i, field) in first_record.iter().enumerate() {\n                    if field == s.as_bytes() {\n                        if num_found == sidx {\n                            return Ok(i);\n                        }\n                        num_found += 1;\n                    }\n                }\n                if num_found == 0 {\n                    Err(format!(\"Selector name '{}' does not exist \\\n                                 as a named header in the given CSV \\\n                                 data.\", s))\n                } else {\n                    Err(format!(\"Selector index '{}' for name '{}' is \\\n                                 out of bounds. 
Must be >= 0 and <= {}.\",\n                                 sidx, s, num_found - 1))\n                }\n            }\n        }\n    }\n}\n\nimpl fmt::Debug for Selector {\n    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {\n        match *self {\n            Selector::One(ref sel) => sel.fmt(f),\n            Selector::Range(ref s, ref e) =>\n                write!(f, \"Range({:?}, {:?})\", s, e),\n        }\n    }\n}\n\nimpl fmt::Debug for OneSelector {\n    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {\n        match *self {\n            OneSelector::Start => write!(f, \"Start\"),\n            OneSelector::End => write!(f, \"End\"),\n            OneSelector::Index(idx) => write!(f, \"Index({})\", idx),\n            OneSelector::IndexedName(ref s, idx) =>\n                write!(f, \"IndexedName({}[{}])\", s, idx),\n        }\n    }\n}\n\n#[derive(Clone, Debug)]\npub struct Selection(Vec<usize>);\n\npub type _GetField =\n    for <'c> fn(&mut &'c csv::ByteRecord, &usize) -> Option<&'c [u8]>;\n\nimpl Selection {\n    pub fn select<'a, 'b>(&'a self, row: &'b csv::ByteRecord)\n                 -> iter::Scan<\n                        slice::Iter<'a, usize>,\n                        &'b csv::ByteRecord,\n                        _GetField,\n                    > {\n        // This is horrifying.\n        fn get_field<'c>(row: &mut &'c csv::ByteRecord, idx: &usize)\n                        -> Option<&'c [u8]> {\n            Some(&row[*idx])\n        }\n        let get_field: _GetField = get_field;\n        self.iter().scan(row, get_field)\n    }\n\n    pub fn normal(&self) -> NormalSelection {\n        let &Selection(ref inds) = self;\n        if inds.is_empty() {\n            return NormalSelection(vec![]);\n        }\n\n        let mut normal = inds.clone();\n        normal.sort();\n        normal.dedup();\n        let mut set: Vec<_> =\n            repeat(false).take(normal[normal.len()-1] + 1).collect();\n        for i in normal.into_iter() {\n     
       set[i] = true;\n        }\n        NormalSelection(set)\n    }\n\n    pub fn len(&self) -> usize {\n        self.0.len()\n    }\n}\n\nimpl ops::Deref for Selection {\n    type Target = [usize];\n\n    fn deref(&self) -> &[usize] {\n        &self.0\n    }\n}\n\n#[derive(Clone, Debug)]\npub struct NormalSelection(Vec<bool>);\n\npub type _NormalScan<'a, T, I> = iter::Scan<\n    iter::Enumerate<I>,\n    &'a [bool],\n    _NormalGetField<T>,\n>;\n\npub type _NormalFilterMap<'a, T, I> = iter::FilterMap<\n    _NormalScan<'a, T, I>,\n    fn(Option<T>) -> Option<T>\n>;\n\npub type _NormalGetField<T> =\n    fn(&mut &[bool], (usize, T)) -> Option<Option<T>>;\n\nimpl NormalSelection {\n    pub fn select<'a, T, I>(&'a self, row: I) -> _NormalFilterMap<'a, T, I>\n             where I: Iterator<Item=T> {\n        fn filmap<T>(v: Option<T>) -> Option<T> { v }\n        fn get_field<T>(set: &mut &[bool], t: (usize, T))\n                       -> Option<Option<T>> {\n            let (i, v) = t;\n            if i < set.len() && set[i] { Some(Some(v)) } else { Some(None) }\n        }\n        let get_field: _NormalGetField<T> = get_field;\n        let filmap: fn(Option<T>) -> Option<T> = filmap;\n        row.enumerate().scan(&**self, get_field).filter_map(filmap)\n    }\n\n    pub fn len(&self) -> usize {\n        self.iter().filter(|b| **b).count()\n    }\n}\n\nimpl ops::Deref for NormalSelection {\n    type Target = [bool];\n\n    fn deref(&self) -> &[bool] {\n        &self.0\n    }\n}\n"
  },
  {
    "path": "src/util.rs",
    "content": "use std::borrow::Cow;\nuse std::fs;\nuse std::io;\nuse std::path::{Path, PathBuf};\nuse std::str;\nuse std::thread;\nuse std::time;\n\nuse csv;\nuse docopt::Docopt;\nuse num_cpus;\nuse serde::de::{Deserializer, Deserialize, DeserializeOwned, Error};\n\nuse CliResult;\nuse config::{Config, Delimiter};\n\npub fn num_cpus() -> usize {\n    num_cpus::get()\n}\n\npub fn version() -> String {\n    let (maj, min, pat) = (\n        option_env!(\"CARGO_PKG_VERSION_MAJOR\"),\n        option_env!(\"CARGO_PKG_VERSION_MINOR\"),\n        option_env!(\"CARGO_PKG_VERSION_PATCH\"),\n    );\n    match (maj, min, pat) {\n        (Some(maj), Some(min), Some(pat)) =>\n            format!(\"{}.{}.{}\", maj, min, pat),\n        _ => \"\".to_owned(),\n    }\n}\n\npub fn get_args<T>(usage: &str, argv: &[&str]) -> CliResult<T>\n        where T: DeserializeOwned {\n    Docopt::new(usage)\n           .and_then(|d| d.argv(argv.iter().map(|&x| x))\n                          .version(Some(version()))\n                          .deserialize())\n           .map_err(From::from)\n}\n\npub fn many_configs(inps: &[String], delim: Option<Delimiter>,\n                    no_headers: bool) -> Result<Vec<Config>, String> {\n    let mut inps = inps.to_vec();\n    if inps.is_empty() {\n        inps.push(\"-\".to_owned()); // stdin\n    }\n    let confs = inps.into_iter()\n                    .map(|p| Config::new(&Some(p))\n                                    .delimiter(delim)\n                                    .no_headers(no_headers))\n                    .collect::<Vec<_>>();\n    errif_greater_one_stdin(&*confs)?;\n    Ok(confs)\n}\n\npub fn errif_greater_one_stdin(inps: &[Config]) -> Result<(), String> {\n    let nstd = inps.iter().filter(|inp| inp.is_std()).count();\n    if nstd > 1 {\n        return Err(\"At most one <stdin> input is allowed.\".to_owned());\n    }\n    Ok(())\n}\n\npub fn chunk_size(nitems: usize, njobs: usize) -> usize {\n    if nitems < njobs {\n        nitems\n    
} else {\n        nitems / njobs\n    }\n}\n\npub fn num_of_chunks(nitems: usize, chunk_size: usize) -> usize {\n    if chunk_size == 0 {\n        return nitems;\n    }\n    let mut n = nitems / chunk_size;\n    if nitems % chunk_size != 0 {\n        n += 1;\n    }\n    n\n}\n\npub fn last_modified(md: &fs::Metadata) -> u64 {\n    use filetime::FileTime;\n    FileTime::from_last_modification_time(md).seconds_relative_to_1970()\n}\n\npub fn condense<'a>(val: Cow<'a, [u8]>, n: Option<usize>) -> Cow<'a, [u8]> {\n    match n {\n        None => val,\n        Some(n) => {\n            let mut is_short_utf8 = false;\n            if let Ok(s) = str::from_utf8(&*val) {\n                if n >= s.chars().count() {\n                    is_short_utf8 = true;\n                } else {\n                    let mut s = s.chars().take(n).collect::<String>();\n                    s.push_str(\"...\");\n                    return Cow::Owned(s.into_bytes());\n                }\n            }\n            if is_short_utf8 || n >= (*val).len() { // already short enough\n                val\n            } else {\n                // This is a non-Unicode string, so we just trim on bytes.\n                let mut s = val[0..n].to_vec();\n                s.extend(b\"...\".iter().cloned());\n                Cow::Owned(s)\n            }\n        }\n    }\n}\n\npub fn idx_path(csv_path: &Path) -> PathBuf {\n    let mut p = csv_path.to_path_buf().into_os_string().into_string().unwrap();\n    p.push_str(\".idx\");\n    PathBuf::from(&p)\n}\n\npub type Idx = Option<usize>;\n\npub fn range(start: Idx, end: Idx, len: Idx, index: Idx)\n            -> Result<(usize, usize), String> {\n    match (start, end, len, index) {\n        (None, None, None, Some(i)) => Ok((i, i+1)),\n        (_, _, _, Some(_)) =>\n            Err(\"--index cannot be used with --start, --end or --len\".to_owned()),\n        (_, Some(_), Some(_), None) =>\n            Err(\"--end and --len cannot be used at the same 
time.\".to_owned()),\n        (_, None, None, None) => Ok((start.unwrap_or(0), ::std::usize::MAX)),\n        (_, Some(e), None, None) => {\n            let s = start.unwrap_or(0);\n            if s > e {\n                Err(format!(\"The end of the range ({}) must be greater than or\\n\\\n                             equal to the start of the range ({}).\", e, s))\n            } else {\n                Ok((s, e))\n            }\n        }\n        (_, None, Some(l), None) => {\n            let s = start.unwrap_or(0);\n            Ok((s, s + l))\n        }\n    }\n}\n\n/// Create a directory recursively, avoiding the race conditions fixed by\n/// https://github.com/rust-lang/rust/pull/39799.\nfn create_dir_all_threadsafe(path: &Path) -> io::Result<()> {\n    // Try 20 times. This shouldn't theoretically need to be any larger\n    // than the number of nested directories we need to create.\n    for _ in 0..20 {\n        match fs::create_dir_all(path) {\n            // This happens if a directory in `path` doesn't exist when we\n            // test for it, and another thread creates it before we can.\n            Err(ref err) if err.kind() == io::ErrorKind::AlreadyExists => {},\n            other => return other,\n        }\n        // We probably don't need to sleep at all, because the intermediate\n        // directory is already created.  
But let's attempt to back off a\n        // bit and let the other thread finish.\n        thread::sleep(time::Duration::from_millis(25));\n    }\n    // Try one last time, returning whatever happens.\n    fs::create_dir_all(path)\n}\n\n/// Represents a filename template of the form `\"{}.csv\"`, where `\"{}\"` is\n/// the place to insert the part of the filename generated by `xsv`.\n#[derive(Clone, Debug)]\npub struct FilenameTemplate {\n    prefix: String,\n    suffix: String,\n}\n\nimpl FilenameTemplate {\n    /// Generate a new filename using `unique_value` to replace the `\"{}\"`\n    /// in the template.\n    pub fn filename(&self, unique_value: &str) -> String {\n        format!(\"{}{}{}\", &self.prefix, unique_value, &self.suffix)\n    }\n\n    /// Create a new, writable file in directory `path` with a filename\n    /// using `unique_value` to replace the `\"{}\"` in the template.  Note\n    /// that we do not output headers; the caller must do that if\n    /// desired.\n    pub fn writer<P>(&self, path: P, unique_value: &str)\n                 -> io::Result<csv::Writer<Box<io::Write+'static>>>\n        where P: AsRef<Path>\n    {\n        let filename = self.filename(unique_value);\n        let full_path = path.as_ref().join(filename);\n        if let Some(parent) = full_path.parent() {\n            // We may be called concurrently, especially by parallel `xsv\n            // split`, so be careful to avoid the `create_dir_all` race\n            // condition.\n            create_dir_all_threadsafe(parent)?;\n        }\n        let spath = Some(full_path.display().to_string());\n        Config::new(&spath).writer()\n    }\n}\n\nimpl<'de> Deserialize<'de> for FilenameTemplate {\n    fn deserialize<D: Deserializer<'de>>(\n        d: D,\n    ) -> Result<FilenameTemplate, D::Error> {\n        let raw = String::deserialize(d)?;\n        let chunks = raw.split(\"{}\").collect::<Vec<_>>();\n        if chunks.len() == 2 {\n            Ok(FilenameTemplate {\n         
       prefix: chunks[0].to_owned(),\n                suffix: chunks[1].to_owned(),\n            })\n        } else {\n            Err(D::Error::custom(\n                \"The --filename argument must contain one '{}'.\"))\n        }\n    }\n}\n"
  },
  {
    "path": "tests/test_cat.rs",
    "content": "use std::process;\n\nuse {Csv, CsvData, qcheck};\nuse workdir::Workdir;\n\nfn no_headers(cmd: &mut process::Command) {\n    cmd.arg(\"--no-headers\");\n}\n\nfn pad(cmd: &mut process::Command) {\n    cmd.arg(\"--pad\");\n}\n\nfn run_cat<X, Y, Z, F>(test_name: &str, which: &str, rows1: X, rows2: Y,\n                       modify_cmd: F) -> Z\n          where X: Csv, Y: Csv, Z: Csv, F: FnOnce(&mut process::Command) {\n    let wrk = Workdir::new(test_name);\n    wrk.create(\"in1.csv\", rows1);\n    wrk.create(\"in2.csv\", rows2);\n\n    let mut cmd = wrk.command(\"cat\");\n    modify_cmd(cmd.arg(which).arg(\"in1.csv\").arg(\"in2.csv\"));\n    wrk.read_stdout(&mut cmd)\n}\n\n#[test]\nfn prop_cat_rows() {\n    fn p(rows: CsvData) -> bool {\n        let expected = rows.clone();\n        let (rows1, rows2) =\n            if rows.is_empty() {\n                (vec![], vec![])\n            } else {\n                let (rows1, rows2) = rows.split_at(rows.len() / 2);\n                (rows1.to_vec(), rows2.to_vec())\n            };\n        let got: CsvData = run_cat(\"cat_rows\", \"rows\",\n                                   rows1, rows2, no_headers);\n        rassert_eq!(got, expected)\n    }\n    qcheck(p as fn(CsvData) -> bool);\n}\n\n#[test]\nfn cat_rows_space() {\n    let rows = vec![svec![\"\\u{0085}\"]];\n    let expected = rows.clone();\n    let (rows1, rows2) =\n        if rows.is_empty() {\n            (vec![], vec![])\n        } else {\n            let (rows1, rows2) = rows.split_at(rows.len() / 2);\n            (rows1.to_vec(), rows2.to_vec())\n        };\n    let got: Vec<Vec<String>> =\n        run_cat(\"cat_rows_space\", \"rows\", rows1, rows2, no_headers);\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn cat_rows_headers() {\n    let rows1 = vec![svec![\"h1\", \"h2\"], svec![\"a\", \"b\"]];\n    let rows2 = vec![svec![\"h1\", \"h2\"], svec![\"y\", \"z\"]];\n\n    let mut expected = rows1.clone();\n    
expected.extend(rows2.clone().into_iter().skip(1));\n\n    let got: Vec<Vec<String>> = run_cat(\"cat_rows_headers\", \"rows\",\n                                        rows1, rows2, |_| ());\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn prop_cat_cols() {\n    fn p(rows1: CsvData, rows2: CsvData) -> bool {\n        let got: Vec<Vec<String>> = run_cat(\n            \"cat_cols\", \"columns\", rows1.clone(), rows2.clone(), no_headers);\n\n        let mut expected: Vec<Vec<String>> = vec![];\n        let (rows1, rows2) = (rows1.to_vecs().into_iter(),\n                              rows2.to_vecs().into_iter());\n        for (mut r1, r2) in rows1.zip(rows2) {\n            r1.extend(r2.into_iter());\n            expected.push(r1);\n        }\n        rassert_eq!(got, expected)\n    }\n    qcheck(p as fn(CsvData, CsvData) -> bool);\n}\n\n#[test]\nfn cat_cols_headers() {\n    let rows1 = vec![svec![\"h1\", \"h2\"], svec![\"a\", \"b\"]];\n    let rows2 = vec![svec![\"h3\", \"h4\"], svec![\"y\", \"z\"]];\n\n    let expected = vec![\n        svec![\"h1\", \"h2\", \"h3\", \"h4\"],\n        svec![\"a\", \"b\", \"y\", \"z\"],\n    ];\n    let got: Vec<Vec<String>> = run_cat(\"cat_cols_headers\", \"columns\",\n                                        rows1, rows2, |_| ());\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn cat_cols_no_pad() {\n    let rows1 = vec![svec![\"a\", \"b\"]];\n    let rows2 = vec![svec![\"y\", \"z\"], svec![\"y\", \"z\"]];\n\n    let expected = vec![\n        svec![\"a\", \"b\", \"y\", \"z\"],\n    ];\n    let got: Vec<Vec<String>> = run_cat(\"cat_cols_headers\", \"columns\",\n                                        rows1, rows2, no_headers);\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn cat_cols_pad() {\n    let rows1 = vec![svec![\"a\", \"b\"]];\n    let rows2 = vec![svec![\"y\", \"z\"], svec![\"y\", \"z\"]];\n\n    let expected = vec![\n        svec![\"a\", \"b\", \"y\", \"z\"],\n        svec![\"\", \"\", \"y\", \"z\"],\n    ];\n    let got: 
Vec<Vec<String>> = run_cat(\"cat_cols_headers\", \"columns\",\n                                        rows1, rows2, pad);\n    assert_eq!(got, expected);\n}\n"
  },
  {
    "path": "tests/test_count.rs",
    "content": "use {CsvData, qcheck};\nuse workdir::Workdir;\n\n/// This tests whether `xsv count` gets the right answer.\n///\n/// It does some simple case analysis to handle whether we want to test counts\n/// in the presence of headers and/or indexes.\nfn prop_count_len(name: &str, rows: CsvData,\n                  headers: bool, idx: bool) -> bool {\n    let mut expected_count = rows.len();\n    if headers && expected_count > 0 {\n        expected_count -= 1;\n    }\n\n    let wrk = Workdir::new(name);\n    if idx {\n        wrk.create_indexed(\"in.csv\", rows);\n    } else {\n        wrk.create(\"in.csv\", rows);\n    }\n\n    let mut cmd = wrk.command(\"count\");\n    if !headers {\n        cmd.arg(\"--no-headers\");\n    }\n    cmd.arg(\"in.csv\");\n\n    let got_count: usize = wrk.stdout(&mut cmd);\n    rassert_eq!(got_count, expected_count)\n}\n\n#[test]\nfn prop_count() {\n    fn p(rows: CsvData) -> bool {\n        prop_count_len(\"prop_count\", rows, false, false)\n    }\n    qcheck(p as fn(CsvData) -> bool);\n}\n\n#[test]\nfn prop_count_headers() {\n    fn p(rows: CsvData) -> bool {\n        prop_count_len(\"prop_count_headers\", rows, true, false)\n    }\n    qcheck(p as fn(CsvData) -> bool);\n}\n\n#[test]\nfn prop_count_indexed() {\n    fn p(rows: CsvData) -> bool {\n        prop_count_len(\"prop_count_indexed\", rows, false, true)\n    }\n    qcheck(p as fn(CsvData) -> bool);\n}\n\n#[test]\nfn prop_count_indexed_headers() {\n    fn p(rows: CsvData) -> bool {\n        prop_count_len(\"prop_count_indexed_headers\", rows, true, true)\n    }\n    qcheck(p as fn(CsvData) -> bool);\n}\n"
  },
  {
    "path": "tests/test_fixlengths.rs",
    "content": "use quickcheck::TestResult;\n\nuse {CsvRecord, qcheck};\nuse workdir::Workdir;\n\nfn trim_trailing_empty(it : &CsvRecord) -> Vec<String> {\n    let mut cloned = it.clone().unwrap();\n    while cloned.len() > 1 && cloned.last().unwrap().is_empty() {\n        cloned.pop();\n    }\n    cloned\n}\n\n#[test]\nfn prop_fixlengths_all_maxlen() {\n    fn p(rows: Vec<CsvRecord>) -> TestResult {\n        let expected_len =\n            match rows.iter().map(|r| trim_trailing_empty(r).len()).max() {\n                None => return TestResult::discard(),\n                Some(n) => n,\n            };\n\n        let wrk = Workdir::new(\"fixlengths_all_maxlen\").flexible(true);\n        wrk.create(\"in.csv\", rows);\n\n        let mut cmd = wrk.command(\"fixlengths\");\n        cmd.arg(\"in.csv\");\n\n        let got: Vec<CsvRecord> = wrk.read_stdout(&mut cmd);\n        let got_len = got.iter().map(|r| r.len()).max().unwrap();\n        for r in got.iter() { assert_eq!(r.len(), got_len) }\n        TestResult::from_bool(rassert_eq!(got_len, expected_len))\n    }\n    qcheck(p as fn(Vec<CsvRecord>) -> TestResult);\n}\n\n#[test]\nfn fixlengths_all_maxlen_trims() {\n    let rows = vec![\n        svec![\"h1\", \"h2\"],\n        svec![\"abcdef\", \"ghijkl\", \"\", \"\"],\n        svec![\"mnopqr\", \"stuvwx\", \"\", \"\"],\n    ];\n\n    let wrk = Workdir::new(\"fixlengths_all_maxlen_trims\").flexible(true);\n    wrk.create(\"in.csv\", rows);\n\n    let mut cmd = wrk.command(\"fixlengths\");\n    cmd.arg(\"in.csv\");\n\n    let got: Vec<CsvRecord> = wrk.read_stdout(&mut cmd);\n    for r in got.iter() { assert_eq!(r.len(), 2) }\n}\n\n#[test]\nfn fixlengths_all_maxlen_trims_at_least_1() {\n    let rows = vec![\n        svec![\"\"],\n        svec![\"\", \"\"],\n        svec![\"\", \"\", \"\"],\n    ];\n\n    let wrk = Workdir::new(\"fixlengths_all_maxlen_trims_at_least_1\").flexible(true);\n    wrk.create(\"in.csv\", rows);\n\n    let mut cmd = wrk.command(\"fixlengths\");\n 
   cmd.arg(\"in.csv\");\n\n    let got: Vec<CsvRecord> = wrk.read_stdout(&mut cmd);\n    for r in got.iter() { assert_eq!(r.len(), 1) }\n}\n\n\n#[test]\nfn prop_fixlengths_explicit_len() {\n    fn p(rows: Vec<CsvRecord>, expected_len: usize) -> TestResult {\n        if expected_len == 0 || rows.is_empty() {\n            return TestResult::discard();\n        }\n\n        let wrk = Workdir::new(\"fixlengths_explicit_len\").flexible(true);\n        wrk.create(\"in.csv\", rows);\n\n        let mut cmd = wrk.command(\"fixlengths\");\n        cmd.arg(\"in.csv\").args(&[\"-l\", &*expected_len.to_string()]);\n\n        let got: Vec<CsvRecord> = wrk.read_stdout(&mut cmd);\n        let got_len = got.iter().map(|r| r.len()).max().unwrap();\n        for r in got.iter() { assert_eq!(r.len(), got_len) }\n        TestResult::from_bool(rassert_eq!(got_len, expected_len))\n    }\n    qcheck(p as fn(Vec<CsvRecord>, usize) -> TestResult);\n}\n"
  },
  {
    "path": "tests/test_flatten.rs",
    "content": "use std::process;\n\nuse workdir::Workdir;\n\nfn setup(name: &str) -> (Workdir, process::Command) {\n    let rows = vec![\n        svec![\"h1\", \"h2\"],\n        svec![\"abcdef\", \"ghijkl\"],\n        svec![\"mnopqr\", \"stuvwx\"],\n    ];\n\n    let wrk = Workdir::new(name);\n    wrk.create(\"in.csv\", rows);\n\n    let mut cmd = wrk.command(\"flatten\");\n    cmd.arg(\"in.csv\");\n\n    (wrk, cmd)\n}\n\n#[test]\nfn flatten_basic() {\n    let (wrk, mut cmd) = setup(\"flatten_basic\");\n    let got: String = wrk.stdout(&mut cmd);\n    let expected = \"\\\nh1  abcdef\nh2  ghijkl\n#\nh1  mnopqr\nh2  stuvwx\\\n\";\n    assert_eq!(got, expected.to_string());\n}\n\n#[test]\nfn flatten_no_headers() {\n    let (wrk, mut cmd) = setup(\"flatten_no_headers\");\n    cmd.arg(\"--no-headers\");\n\n    let got: String = wrk.stdout(&mut cmd);\n    let expected = \"\\\n0   h1\n1   h2\n#\n0   abcdef\n1   ghijkl\n#\n0   mnopqr\n1   stuvwx\\\n\";\n    assert_eq!(got, expected.to_string());\n}\n\n#[test]\nfn flatten_separator() {\n    let (wrk, mut cmd) = setup(\"flatten_separator\");\n    cmd.args(&[\"--separator\", \"!mysep!\"]);\n\n    let got: String = wrk.stdout(&mut cmd);\n    let expected = \"\\\nh1  abcdef\nh2  ghijkl\n!mysep!\nh1  mnopqr\nh2  stuvwx\\\n\";\n    assert_eq!(got, expected.to_string());\n}\n\n#[test]\nfn flatten_condense() {\n    let (wrk, mut cmd) = setup(\"flatten_condense\");\n    cmd.args(&[\"--condense\", \"2\"]);\n\n    let got: String = wrk.stdout(&mut cmd);\n    let expected = \"\\\nh1  ab...\nh2  gh...\n#\nh1  mn...\nh2  st...\\\n\";\n    assert_eq!(got, expected.to_string());\n}\n"
  },
  {
    "path": "tests/test_fmt.rs",
    "content": "use std::process;\n\nuse workdir::Workdir;\n\nfn setup(name: &str) -> (Workdir, process::Command) {\n    let rows = vec![\n        svec![\"h1\", \"h2\"],\n        svec![\"abcdef\", \"ghijkl\"],\n        svec![\"mnopqr\", \"stuvwx\"],\n    ];\n\n    let wrk = Workdir::new(name);\n    wrk.create(\"in.csv\", rows);\n\n    let mut cmd = wrk.command(\"fmt\");\n    cmd.arg(\"in.csv\");\n\n    (wrk, cmd)\n}\n\n#[test]\nfn fmt_delimiter() {\n    let (wrk, mut cmd) = setup(\"fmt_delimiter\");\n    cmd.args(&[\"--out-delimiter\", \"\\t\"]);\n\n    let got: String = wrk.stdout(&mut cmd);\n    let expected = \"\\\nh1\\th2\nabcdef\\tghijkl\nmnopqr\\tstuvwx\";\n    assert_eq!(got, expected.to_string());\n}\n\n#[test]\nfn fmt_weird_delimiter() {\n    let (wrk, mut cmd) = setup(\"fmt_weird_delimiter\");\n    cmd.args(&[\"--out-delimiter\", \"h\"]);\n\n    let got: String = wrk.stdout(&mut cmd);\n    let expected = \"\\\n\\\"h1\\\"h\\\"h2\\\"\nabcdefh\\\"ghijkl\\\"\nmnopqrhstuvwx\";\n    assert_eq!(got, expected.to_string());\n}\n\n#[test]\nfn fmt_crlf() {\n    let (wrk, mut cmd) = setup(\"fmt_crlf\");\n    cmd.arg(\"--crlf\");\n\n    let got: String = wrk.stdout(&mut cmd);\n    let expected = \"\\\nh1,h2\\r\nabcdef,ghijkl\\r\nmnopqr,stuvwx\";\n    assert_eq!(got, expected.to_string());\n}\n\n#[test]\nfn fmt_quote_always() {\n    let (wrk, mut cmd) = setup(\"fmt_quote_always\");\n    cmd.arg(\"--quote-always\");\n\n    let got: String = wrk.stdout(&mut cmd);\n    let expected = \"\\\n\\\"h1\\\",\\\"h2\\\"\n\\\"abcdef\\\",\\\"ghijkl\\\"\n\\\"mnopqr\\\",\\\"stuvwx\\\"\";\n    assert_eq!(got, expected.to_string());\n}\n"
  },
  {
    "path": "tests/test_frequency.rs",
    "content": "use std::borrow::ToOwned;\nuse std::collections::hash_map::{HashMap, Entry};\nuse std::process;\n\nuse csv;\nuse stats::Frequencies;\n\nuse {Csv, CsvData, qcheck_sized};\nuse workdir::Workdir;\n\nfn setup(name: &str) -> (Workdir, process::Command) {\n    let rows = vec![\n        svec![\"h1\", \"h2\"],\n        svec![\"a\", \"z\"],\n        svec![\"a\", \"y\"],\n        svec![\"a\", \"y\"],\n        svec![\"b\", \"z\"],\n        svec![\"\", \"z\"],\n        svec![\"(NULL)\", \"x\"],\n    ];\n\n    let wrk = Workdir::new(name);\n    wrk.create(\"in.csv\", rows);\n\n    let mut cmd = wrk.command(\"frequency\");\n    cmd.arg(\"in.csv\");\n\n    (wrk, cmd)\n}\n\n#[test]\nfn frequency_no_headers() {\n    let (wrk, mut cmd) = setup(\"frequency_no_headers\");\n    cmd.args(&[\"--limit\", \"0\"]).args(&[\"--select\", \"1\"]).arg(\"--no-headers\");\n\n    let mut got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    got = got.into_iter().skip(1).collect();\n    got.sort();\n    let expected = vec![\n        svec![\"1\", \"(NULL)\", \"1\"],\n        svec![\"1\", \"(NULL)\", \"1\"],\n        svec![\"1\", \"a\", \"3\"],\n        svec![\"1\", \"b\", \"1\"],\n        svec![\"1\", \"h1\", \"1\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn frequency_no_nulls() {\n    let (wrk, mut cmd) = setup(\"frequency_no_nulls\");\n    cmd.arg(\"--no-nulls\").args(&[\"--limit\", \"0\"]).args(&[\"--select\", \"h1\"]);\n\n    let mut got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    got.sort();\n    let expected = vec![\n        svec![\"field\", \"value\", \"count\"],\n        svec![\"h1\", \"(NULL)\", \"1\"],\n        svec![\"h1\", \"a\", \"3\"],\n        svec![\"h1\", \"b\", \"1\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn frequency_nulls() {\n    let (wrk, mut cmd) = setup(\"frequency_nulls\");\n    cmd.args(&[\"--limit\", \"0\"]).args(&[\"--select\", \"h1\"]);\n\n    let mut got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    
got.sort();\n    let expected = vec![\n        svec![\"field\", \"value\", \"count\"],\n        svec![\"h1\", \"(NULL)\", \"1\"],\n        svec![\"h1\", \"(NULL)\", \"1\"],\n        svec![\"h1\", \"a\", \"3\"],\n        svec![\"h1\", \"b\", \"1\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn frequency_limit() {\n    let (wrk, mut cmd) = setup(\"frequency_limit\");\n    cmd.args(&[\"--limit\", \"1\"]);\n\n    let mut got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    got.sort();\n    let expected = vec![\n        svec![\"field\", \"value\", \"count\"],\n        svec![\"h1\", \"a\", \"3\"],\n        svec![\"h2\", \"z\", \"3\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn frequency_asc() {\n    let (wrk, mut cmd) = setup(\"frequency_asc\");\n    cmd.args(&[\"--limit\", \"1\"]).args(&[\"--select\", \"h2\"]).arg(\"--asc\");\n\n    let mut got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    got.sort();\n    let expected = vec![\n        svec![\"field\", \"value\", \"count\"],\n        svec![\"h2\", \"x\", \"1\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn frequency_select() {\n    let (wrk, mut cmd) = setup(\"frequency_select\");\n    cmd.args(&[\"--limit\", \"0\"]).args(&[\"--select\", \"h2\"]);\n\n    let mut got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    got.sort();\n    let expected = vec![\n        svec![\"field\", \"value\", \"count\"],\n        svec![\"h2\", \"x\", \"1\"],\n        svec![\"h2\", \"y\", \"2\"],\n        svec![\"h2\", \"z\", \"3\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n// This tests that a frequency table computed by `xsv` is always the same\n// as the frequency table computed in memory.\n#[test]\nfn prop_frequency() {\n    fn p(rows: CsvData) -> bool {\n        param_prop_frequency(\"prop_frequency\", rows, false)\n    }\n    // Run on really small values because we are incredibly careless\n    // with allocation.\n    qcheck_sized(p as fn(CsvData) -> bool, 2);\n}\n\n\n// This 
tests that running the frequency command on a CSV file with these two\n// rows does not burst into flames:\n//\n//     \\u{FEFF}\n//     \"\"\n//\n// In this case, `param_prop_frequency` simply skips this particular input.\n// Namely, \\u{FEFF} is the UTF-8 BOM, which is ignored by the underlying CSV\n// reader.\n#[test]\nfn frequency_bom() {\n    let rows = CsvData {\n        data: vec![\n            ::CsvRecord(vec![\"\\u{FEFF}\".to_string()]),\n            ::CsvRecord(vec![\"\".to_string()]),\n        ],\n    };\n    assert!(param_prop_frequency(\"prop_frequency\", rows, false))\n}\n\n// This tests that a frequency table computed by `xsv` (with an index) is\n// always the same as the frequency table computed in memory.\n#[test]\nfn prop_frequency_indexed() {\n    fn p(rows: CsvData) -> bool {\n        param_prop_frequency(\"prop_frequency_indexed\", rows, true)\n    }\n    // Run on really small values because we are incredibly careless\n    // with allocation.\n    qcheck_sized(p as fn(CsvData) -> bool, 2);\n}\n\nfn param_prop_frequency(name: &str, rows: CsvData, idx: bool) -> bool {\n    if !rows.is_empty() && rows[0][0].len() == 3 && rows[0][0] == \"\\u{FEFF}\" {\n        return true;\n    }\n    let wrk = Workdir::new(name);\n    if idx {\n        wrk.create_indexed(\"in.csv\", rows.clone());\n    } else {\n        wrk.create(\"in.csv\", rows.clone());\n    }\n\n    let mut cmd = wrk.command(\"frequency\");\n    cmd.arg(\"in.csv\").args(&[\"-j\", \"4\"]).args(&[\"--limit\", \"0\"]);\n\n    let stdout = wrk.stdout::<String>(&mut cmd);\n    let got_ftables = ftables_from_csv_string(stdout);\n    let expected_ftables = ftables_from_rows(rows);\n    assert_eq_ftables(&got_ftables, &expected_ftables)\n}\n\ntype FTables = HashMap<String, Frequencies<String>>;\n\n#[derive(Deserialize)]\nstruct FRow {\n    field: String,\n    value: String,\n    count: usize,\n}\n\nfn ftables_from_rows<T: Csv>(rows: T) -> FTables {\n    let mut rows = rows.to_vecs();\n    if 
rows.len() <= 1 {\n        return HashMap::new();\n    }\n\n    let header = rows.remove(0);\n    let mut ftables = HashMap::new();\n    for field in header.iter() {\n        ftables.insert(field.clone(), Frequencies::new());\n    }\n    for row in rows.into_iter() {\n        for (i, mut field) in row.into_iter().enumerate() {\n            field = field.trim().to_owned();\n            if field.is_empty() {\n                field = \"(NULL)\".to_owned();\n            }\n            ftables.get_mut(&header[i]).unwrap().add(field);\n        }\n    }\n    ftables\n}\n\nfn ftables_from_csv_string(data: String) -> FTables {\n    let mut rdr = csv::Reader::from_reader(data.as_bytes());\n    let mut ftables = HashMap::new();\n    for frow in rdr.deserialize() {\n        let frow: FRow = frow.unwrap();\n        match ftables.entry(frow.field) {\n            Entry::Vacant(v) => {\n                let mut ftable = Frequencies::new();\n                for _ in 0..frow.count {\n                    ftable.add(frow.value.clone());\n                }\n                v.insert(ftable);\n            }\n            Entry::Occupied(mut v) => {\n                for _ in 0..frow.count {\n                    v.get_mut().add(frow.value.clone());\n                }\n            }\n        }\n    }\n    ftables\n}\n\nfn freq_data<T>(ftable: &Frequencies<T>) -> Vec<(&T, u64)>\n        where T: ::std::hash::Hash + Ord + Clone {\n    let mut freqs = ftable.most_frequent();\n    freqs.sort();\n    freqs\n}\n\nfn assert_eq_ftables(got: &FTables, expected: &FTables) -> bool {\n    for (k, v) in got.iter() {\n        assert_eq!(freq_data(v), freq_data(expected.get(k).unwrap()));\n    }\n    for (k, v) in expected.iter() {\n        assert_eq!(freq_data(got.get(k).unwrap()), freq_data(v));\n    }\n    true\n}\n"
  },
  {
    "path": "tests/test_headers.rs",
    "content": "use std::process;\n\nuse workdir::Workdir;\n\nfn setup(name: &str) -> (Workdir, process::Command) {\n    let rows1 = vec![svec![\"h1\", \"h2\"], svec![\"a\", \"b\"]];\n    let rows2 = vec![svec![\"h2\", \"h3\"], svec![\"y\", \"z\"]];\n\n    let wrk = Workdir::new(name);\n    wrk.create(\"in1.csv\", rows1);\n    wrk.create(\"in2.csv\", rows2);\n\n    let mut cmd = wrk.command(\"headers\");\n    cmd.arg(\"in1.csv\");\n\n    (wrk, cmd)\n}\n\n#[test]\nfn headers_basic() {\n    let (wrk, mut cmd) = setup(\"headers_basic\");\n\n    let got: String = wrk.stdout(&mut cmd);\n    let expected = \"\\\n1   h1\n2   h2\";\n    assert_eq!(got, expected.to_string());\n}\n\n#[test]\nfn headers_just_names() {\n    let (wrk, mut cmd) = setup(\"headers_just_names\");\n    cmd.arg(\"--just-names\");\n\n    let got: String = wrk.stdout(&mut cmd);\n    let expected = \"\\\nh1\nh2\";\n    assert_eq!(got, expected.to_string());\n}\n\n#[test]\nfn headers_multiple() {\n    let (wrk, mut cmd) = setup(\"headers_multiple\");\n    cmd.arg(\"in2.csv\");\n\n    let got: String = wrk.stdout(&mut cmd);\n    let expected = \"\\\nh1\nh2\nh2\nh3\";\n    assert_eq!(got, expected.to_string());\n}\n\n#[test]\nfn headers_intersect() {\n    let (wrk, mut cmd) = setup(\"headers_intersect\");\n    cmd.arg(\"in2.csv\").arg(\"--intersect\");\n\n    let got: String = wrk.stdout(&mut cmd);\n    let expected = \"\\\nh1\nh2\nh3\";\n    assert_eq!(got, expected.to_string());\n}\n"
  },
  {
    "path": "tests/test_index.rs",
    "content": "use std::fs;\n\nuse filetime::{FileTime, set_file_times};\n\nuse workdir::Workdir;\n\n#[test]\nfn index_outdated() {\n    let wrk = Workdir::new(\"index_outdated\");\n    wrk.create_indexed(\"in.csv\", vec![svec![\"\"]]);\n\n    let md = fs::metadata(&wrk.path(\"in.csv.idx\")).unwrap();\n    set_file_times(\n        &wrk.path(\"in.csv\"),\n        future_time(FileTime::from_last_modification_time(&md)),\n        future_time(FileTime::from_last_access_time(&md)),\n    ).unwrap();\n\n    let mut cmd = wrk.command(\"count\");\n    cmd.arg(\"--no-headers\").arg(\"in.csv\");\n    wrk.assert_err(&mut cmd);\n}\n\nfn future_time(ft: FileTime) -> FileTime {\n    let secs = ft.seconds_relative_to_1970();\n    FileTime::from_seconds_since_1970(secs + 10_000, 0)\n}\n"
  },
  {
    "path": "tests/test_join.rs",
    "content": "use workdir::Workdir;\n\n// This macro takes a test name and a closure, and generates *two* tests:\n// one with headers and another without headers.\nmacro_rules! join_test {\n    ($name:ident, $fun:expr) => (\n        mod $name {\n            use std::process;\n\n            use workdir::Workdir;\n            use super::{make_rows, setup};\n\n            #[test]\n            fn headers() {\n                let wrk = setup(stringify!($name), true);\n                let mut cmd = wrk.command(\"join\");\n                cmd.args(&[\"city\", \"cities.csv\", \"city\", \"places.csv\"]);\n                $fun(wrk, cmd, true);\n            }\n\n            #[test]\n            fn no_headers() {\n                let n = stringify!(concat_idents!($name, _no_headers));\n                let wrk = setup(n, false);\n                let mut cmd = wrk.command(\"join\");\n                cmd.arg(\"--no-headers\");\n                cmd.args(&[\"1\", \"cities.csv\", \"1\", \"places.csv\"]);\n                $fun(wrk, cmd, false);\n            }\n        }\n    );\n}\n\nfn setup(name: &str, headers: bool) -> Workdir {\n    let mut cities = vec![\n        svec![\"Boston\", \"MA\"],\n        svec![\"New York\", \"NY\"],\n        svec![\"San Francisco\", \"CA\"],\n        svec![\"Buffalo\", \"NY\"],\n    ];\n    let mut places = vec![\n        svec![\"Boston\", \"Logan Airport\"],\n        svec![\"Boston\", \"Boston Garden\"],\n        svec![\"Buffalo\", \"Ralph Wilson Stadium\"],\n        svec![\"Orlando\", \"Disney World\"],\n    ];\n    if headers { cities.insert(0, svec![\"city\", \"state\"]); }\n    if headers { places.insert(0, svec![\"city\", \"place\"]); }\n\n    let wrk = Workdir::new(name);\n    wrk.create(\"cities.csv\", cities);\n    wrk.create(\"places.csv\", places);\n    wrk\n}\n\nfn make_rows(headers: bool, rows: Vec<Vec<String>>) -> Vec<Vec<String>> {\n    let mut all_rows = vec![];\n    if headers {\n        all_rows.push(svec![\"city\", \"state\", \"city\", 
\"place\"]);\n    }\n    all_rows.extend(rows.into_iter());\n    all_rows\n}\n\njoin_test!(join_inner,\n           |wrk: Workdir, mut cmd: process::Command, headers: bool| {\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = make_rows(headers, vec![\n        svec![\"Boston\", \"MA\", \"Boston\", \"Logan Airport\"],\n        svec![\"Boston\", \"MA\", \"Boston\", \"Boston Garden\"],\n        svec![\"Buffalo\", \"NY\", \"Buffalo\", \"Ralph Wilson Stadium\"],\n    ]);\n    assert_eq!(got, expected);\n});\n\njoin_test!(join_outer_left,\n           |wrk: Workdir, mut cmd: process::Command, headers: bool| {\n    cmd.arg(\"--left\");\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = make_rows(headers, vec![\n        svec![\"Boston\", \"MA\", \"Boston\", \"Logan Airport\"],\n        svec![\"Boston\", \"MA\", \"Boston\", \"Boston Garden\"],\n        svec![\"New York\", \"NY\", \"\", \"\"],\n        svec![\"San Francisco\", \"CA\", \"\", \"\"],\n        svec![\"Buffalo\", \"NY\", \"Buffalo\", \"Ralph Wilson Stadium\"],\n    ]);\n    assert_eq!(got, expected);\n});\n\njoin_test!(join_outer_right,\n           |wrk: Workdir, mut cmd: process::Command, headers: bool| {\n    cmd.arg(\"--right\");\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = make_rows(headers, vec![\n        svec![\"Boston\", \"MA\", \"Boston\", \"Logan Airport\"],\n        svec![\"Boston\", \"MA\", \"Boston\", \"Boston Garden\"],\n        svec![\"Buffalo\", \"NY\", \"Buffalo\", \"Ralph Wilson Stadium\"],\n        svec![\"\", \"\", \"Orlando\", \"Disney World\"],\n    ]);\n    assert_eq!(got, expected);\n});\n\njoin_test!(join_outer_full,\n           |wrk: Workdir, mut cmd: process::Command, headers: bool| {\n    cmd.arg(\"--full\");\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = make_rows(headers, vec![\n        svec![\"Boston\", \"MA\", \"Boston\", \"Logan Airport\"],\n        
svec![\"Boston\", \"MA\", \"Boston\", \"Boston Garden\"],\n        svec![\"New York\", \"NY\", \"\", \"\"],\n        svec![\"San Francisco\", \"CA\", \"\", \"\"],\n        svec![\"Buffalo\", \"NY\", \"Buffalo\", \"Ralph Wilson Stadium\"],\n        svec![\"\", \"\", \"Orlando\", \"Disney World\"],\n    ]);\n    assert_eq!(got, expected);\n});\n\n#[test]\nfn join_inner_issue11() {\n    let a = vec![\n        svec![\"1\", \"2\"],\n        svec![\"3\", \"4\"],\n        svec![\"5\", \"6\"],\n    ];\n    let b = vec![\n        svec![\"2\", \"1\"],\n        svec![\"4\", \"3\"],\n        svec![\"6\", \"5\"],\n    ];\n\n    let wrk = Workdir::new(\"join_inner_issue11\");\n    wrk.create(\"a.csv\", a);\n    wrk.create(\"b.csv\", b);\n\n    let mut cmd = wrk.command(\"join\");\n    cmd.args(&[\"1,2\", \"a.csv\", \"2,1\", \"b.csv\"]);\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![\n        svec![\"1\", \"2\", \"2\", \"1\"],\n        svec![\"3\", \"4\", \"4\", \"3\"],\n        svec![\"5\", \"6\", \"6\", \"5\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn join_cross() {\n    let wrk = Workdir::new(\"join_cross\");\n    wrk.create(\"letters.csv\",\n               vec![svec![\"h1\", \"h2\"], svec![\"a\", \"b\"], svec![\"c\", \"d\"]]);\n    wrk.create(\"numbers.csv\",\n               vec![svec![\"h3\", \"h4\"], svec![\"1\", \"2\"], svec![\"3\", \"4\"]]);\n\n    let mut cmd = wrk.command(\"join\");\n    cmd.arg(\"--cross\")\n       .args(&[\"\", \"letters.csv\", \"\", \"numbers.csv\"]);\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![\n        svec![\"h1\", \"h2\", \"h3\", \"h4\"],\n        svec![\"a\", \"b\", \"1\", \"2\"],\n        svec![\"a\", \"b\", \"3\", \"4\"],\n        svec![\"c\", \"d\", \"1\", \"2\"],\n        svec![\"c\", \"d\", \"3\", \"4\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn join_cross_no_headers() {\n    let wrk = 
Workdir::new(\"join_cross_no_headers\");\n    wrk.create(\"letters.csv\", vec![svec![\"a\", \"b\"], svec![\"c\", \"d\"]]);\n    wrk.create(\"numbers.csv\", vec![svec![\"1\", \"2\"], svec![\"3\", \"4\"]]);\n\n    let mut cmd = wrk.command(\"join\");\n    cmd.arg(\"--cross\").arg(\"--no-headers\")\n       .args(&[\"\", \"letters.csv\", \"\", \"numbers.csv\"]);\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![\n        svec![\"a\", \"b\", \"1\", \"2\"],\n        svec![\"a\", \"b\", \"3\", \"4\"],\n        svec![\"c\", \"d\", \"1\", \"2\"],\n        svec![\"c\", \"d\", \"3\", \"4\"],\n    ];\n    assert_eq!(got, expected);\n}\n"
  },
  {
    "path": "tests/test_partition.rs",
    "content": "use std::borrow::ToOwned;\n\nuse workdir::Workdir;\n\nmacro_rules! part_eq {\n    ($wrk:expr, $path:expr, $expected:expr) => (\n        assert_eq!($wrk.from_str::<String>(&$wrk.path($path)),\n                   $expected.to_owned());\n    );\n}\n\nfn data(headers: bool) -> Vec<Vec<String>> {\n    let mut rows = vec![\n        svec![\"NY\", \"Manhatten\"],\n        svec![\"CA\", \"San Francisco\"],\n        svec![\"TX\", \"Dallas\"],\n        svec![\"NY\", \"Buffalo\"],\n        svec![\"TX\", \"Fort Worth\"],\n    ];\n    if headers { rows.insert(0, svec![\"state\", \"city\"]); }\n    rows\n}\n\n#[test]\nfn partition() {\n    let wrk = Workdir::new(\"partition\");\n    wrk.create(\"in.csv\", data(true));\n\n    let mut cmd = wrk.command(\"partition\");\n    cmd.arg(\"state\").arg(&wrk.path(\".\")).arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    part_eq!(wrk, \"CA.csv\", \"\\\nstate,city\nCA,San Francisco\n\");\n    part_eq!(wrk, \"NY.csv\", \"\\\nstate,city\nNY,Manhatten\nNY,Buffalo\n\");\n    part_eq!(wrk, \"TX.csv\", \"\\\nstate,city\nTX,Dallas\nTX,Fort Worth\n\");\n}\n\n#[test]\nfn partition_drop() {\n    let wrk = Workdir::new(\"partition\");\n    wrk.create(\"in.csv\", data(true));\n\n    let mut cmd = wrk.command(\"partition\");\n    cmd.arg(\"--drop\").arg(\"state\").arg(&wrk.path(\".\")).arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    part_eq!(wrk, \"CA.csv\", \"\\\ncity\nSan Francisco\n\");\n    part_eq!(wrk, \"NY.csv\", \"\\\ncity\nManhatten\nBuffalo\n\");\n    part_eq!(wrk, \"TX.csv\", \"\\\ncity\nDallas\nFort Worth\n\");\n}\n\n#[test]\nfn partition_without_headers() {\n    let wrk = Workdir::new(\"partition_without_headers\");\n    wrk.create(\"in.csv\", data(false));\n\n    let mut cmd = wrk.command(\"partition\");\n    cmd.arg(\"--no-headers\").arg(\"1\").arg(&wrk.path(\".\")).arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    part_eq!(wrk, \"CA.csv\", \"\\\nCA,San Francisco\n\");\n    part_eq!(wrk, \"NY.csv\", 
\"\\\nNY,Manhatten\nNY,Buffalo\n\");\n    part_eq!(wrk, \"TX.csv\", \"\\\nTX,Dallas\nTX,Fort Worth\n\");\n}\n\n#[test]\nfn partition_drop_without_headers() {\n    let wrk = Workdir::new(\"partition_without_headers\");\n    wrk.create(\"in.csv\", data(false));\n\n    let mut cmd = wrk.command(\"partition\");\n    cmd.arg(\"--drop\").arg(\"--no-headers\").arg(\"1\").arg(&wrk.path(\".\")).arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    part_eq!(wrk, \"CA.csv\", \"\\\nSan Francisco\n\");\n    part_eq!(wrk, \"NY.csv\", \"\\\nManhatten\nBuffalo\n\");\n    part_eq!(wrk, \"TX.csv\", \"\\\nDallas\nFort Worth\n\");\n}\n\n#[test]\nfn partition_into_new_directory() {\n    let wrk = Workdir::new(\"partition_into_new_directory\");\n    wrk.create(\"in.csv\", data(true));\n\n    let mut cmd = wrk.command(\"partition\");\n    cmd.arg(\"state\").arg(&wrk.path(\"out\")).arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    assert!(wrk.path(\"out/NY.csv\").exists());\n}\n\n#[test]\nfn partition_custom_filename() {\n    let wrk = Workdir::new(\"partition_custom_filename\");\n    wrk.create(\"in.csv\", data(true));\n\n    let mut cmd = wrk.command(\"partition\");\n    cmd.args(&[\"--filename\", \"state-{}-partition.csv\"])\n        .arg(\"state\")\n        .arg(&wrk.path(\".\"))\n        .arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    assert!(wrk.path(\"state-NY-partition.csv\").exists());\n}\n\n#[test]\nfn partition_custom_filename_with_directory() {\n    let wrk = Workdir::new(\"partition_custom_filename_with_directory\");\n    wrk.create(\"in.csv\", data(true));\n\n    let mut cmd = wrk.command(\"partition\");\n    cmd.args(&[\"--filename\", \"{}/cities.csv\"])\n        .arg(\"state\")\n        .arg(&wrk.path(\".\"))\n        .arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    // This variation also helps with parallel partition jobs.\n    assert!(wrk.path(\"NY/cities.csv\").exists());\n}\n\n#[test]\nfn partition_invalid_filename() {\n    let wrk = 
Workdir::new(\"partition_invalid_filename\");\n    wrk.create(\"in.csv\", data(true));\n\n    let mut cmd = wrk.command(\"partition\");\n    cmd.args(&[\"--filename\", \"foo.csv\"])\n        .arg(\"state\")\n        .arg(&wrk.path(\".\"))\n        .arg(\"in.csv\");\n    wrk.assert_err(&mut cmd);\n\n    let mut cmd = wrk.command(\"partition\");\n    cmd.args(&[\"--filename\", \"{}{}.csv\"])\n        .arg(\"state\")\n        .arg(&wrk.path(\".\"))\n        .arg(\"in.csv\");\n    wrk.assert_err(&mut cmd);\n}\n\nfn tricky_data() -> Vec<Vec<String>> {\n    vec![\n        svec![\"key\", \"explanation\"],\n        svec![\"\", \"empty key\"],\n        svec![\"empty\", \"the string empty\"],\n        svec![\"unsafe _1$!,\\\"\", \"unsafe in shell\"],\n        svec![\"collision\", \"ordinary value\"],\n        svec![\"collision\", \"in same file\"],\n        svec![\"coll ision\", \"collides\"],\n        svec![\"collision!\", \"collides again\"],\n        svec![\"collision_2\", \"collides with disambiguated\"],\n    ]\n}\n\n#[test]\nfn partition_with_tricky_key_values() {\n    let wrk = Workdir::new(\"partition_with_tricky_key_values\");\n    wrk.create(\"in.csv\", tricky_data());\n\n    let mut cmd = wrk.command(\"partition\");\n    cmd.arg(\"key\").arg(&wrk.path(\".\")).arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    part_eq!(wrk, \"empty.csv\", \"\\\nkey,explanation\n,empty key\n\");\n     part_eq!(wrk, \"empty_1.csv\", \"\\\nkey,explanation\nempty,the string empty\n\");\n     part_eq!(wrk, \"unsafe_1.csv\", r#\"key,explanation\n\"unsafe _1$!,\"\"\",unsafe in shell\n\"#);\n     part_eq!(wrk, \"collision.csv\", \"\\\nkey,explanation\ncollision,ordinary value\ncollision,in same file\n\");\n    part_eq!(wrk, \"collision_2.csv\", \"\\\nkey,explanation\ncoll ision,collides\n\");\n    part_eq!(wrk, \"collision_3.csv\", \"\\\nkey,explanation\ncollision!,collides again\n\");\n    // Tricky! 
We didn't see this as an input, but we did generate it as an\n    // output already.\n    part_eq!(wrk, \"collision_2_4.csv\", \"\\\nkey,explanation\ncollision_2,collides with disambiguated\n\");\n}\n\nfn prefix_data() -> Vec<Vec<String>> {\n    vec![\n        svec![\"state\", \"city\"],\n        svec![\"MA\", \"Boston\"],\n        svec![\"ME\", \"Portland\"],\n        svec![\"M\", \"Too short\"],\n        svec![\"CA\", \"San Francisco\"],\n        svec![\"CO\", \"Denver\"],\n    ]\n}\n\n#[test]\nfn partition_with_prefix_length() {\n    let wrk = Workdir::new(\"partition_with_prefix_length\");\n    wrk.create(\"in.csv\", prefix_data());\n\n    let mut cmd = wrk.command(\"partition\");\n    cmd\n        .args(&[\"--prefix-length\", \"1\"])\n        .arg(\"state\")\n        .arg(&wrk.path(\".\"))\n        .arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    part_eq!(wrk, \"M.csv\", \"\\\nstate,city\nMA,Boston\nME,Portland\nM,Too short\n\");\n    part_eq!(wrk, \"C.csv\", \"\\\nstate,city\nCA,San Francisco\nCO,Denver\n\");\n}\n"
  },
  {
    "path": "tests/test_reverse.rs",
    "content": "use workdir::Workdir;\n\nuse {Csv, CsvData, qcheck};\n\nfn prop_reverse(name: &str, rows: CsvData, headers: bool) -> bool {\n    let wrk = Workdir::new(name);\n    wrk.create(\"in.csv\", rows.clone());\n\n    let mut cmd = wrk.command(\"reverse\");\n    cmd.arg(\"in.csv\");\n    if !headers { cmd.arg(\"--no-headers\"); }\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let mut expected = rows.to_vecs();\n    let headers = if headers && !expected.is_empty() {\n        expected.remove(0)\n    } else {\n        vec![]\n    };\n    expected.reverse();\n    if !headers.is_empty() { expected.insert(0, headers); }\n    rassert_eq!(got, expected)\n}\n\n#[test]\nfn prop_reverse_headers() {\n    fn p(rows: CsvData) -> bool {\n        prop_reverse(\"prop_reverse_headers\", rows, true)\n    }\n    qcheck(p as fn(CsvData) -> bool);\n}\n\n#[test]\nfn prop_reverse_no_headers() {\n    fn p(rows: CsvData) -> bool {\n        prop_reverse(\"prop_reverse_no_headers\", rows, false)\n    }\n    qcheck(p as fn(CsvData) -> bool);\n}\n"
  },
  {
    "path": "tests/test_search.rs",
    "content": "use workdir::Workdir;\n\nfn data(headers: bool) -> Vec<Vec<String>> {\n    let mut rows = vec![\n        svec![\"foobar\", \"barfoo\"],\n        svec![\"a\", \"b\"],\n        svec![\"barfoo\", \"foobar\"],\n    ];\n    if headers { rows.insert(0, svec![\"h1\", \"h2\"]); }\n    rows\n}\n\n#[test]\nfn search() {\n    let wrk = Workdir::new(\"search\");\n    wrk.create(\"data.csv\", data(true));\n    let mut cmd = wrk.command(\"search\");\n    cmd.arg(\"^foo\").arg(\"data.csv\");\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![\n        svec![\"h1\", \"h2\"],\n        svec![\"foobar\", \"barfoo\"],\n        svec![\"barfoo\", \"foobar\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn search_empty() {\n    let wrk = Workdir::new(\"search\");\n    wrk.create(\"data.csv\", data(true));\n    let mut cmd = wrk.command(\"search\");\n    cmd.arg(\"xxx\").arg(\"data.csv\");\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![\n        svec![\"h1\", \"h2\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn search_empty_no_headers() {\n    let wrk = Workdir::new(\"search\");\n    wrk.create(\"data.csv\", data(true));\n    let mut cmd = wrk.command(\"search\");\n    cmd.arg(\"xxx\").arg(\"data.csv\");\n    cmd.arg(\"--no-headers\");\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected: Vec<Vec<String>> = vec![];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn search_ignore_case() {\n    let wrk = Workdir::new(\"search\");\n    wrk.create(\"data.csv\", data(true));\n    let mut cmd = wrk.command(\"search\");\n    cmd.arg(\"^FoO\").arg(\"data.csv\");\n    cmd.arg(\"--ignore-case\");\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![\n        svec![\"h1\", \"h2\"],\n        svec![\"foobar\", \"barfoo\"],\n        svec![\"barfoo\", \"foobar\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn 
search_no_headers() {\n    let wrk = Workdir::new(\"search_no_headers\");\n    wrk.create(\"data.csv\", data(false));\n    let mut cmd = wrk.command(\"search\");\n    cmd.arg(\"^foo\").arg(\"data.csv\");\n    cmd.arg(\"--no-headers\");\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![\n        svec![\"foobar\", \"barfoo\"],\n        svec![\"barfoo\", \"foobar\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn search_select() {\n    let wrk = Workdir::new(\"search_select\");\n    wrk.create(\"data.csv\", data(true));\n    let mut cmd = wrk.command(\"search\");\n    cmd.arg(\"^foo\").arg(\"data.csv\");\n    cmd.arg(\"--select\").arg(\"h2\");\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![\n        svec![\"h1\", \"h2\"],\n        svec![\"barfoo\", \"foobar\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn search_select_no_headers() {\n    let wrk = Workdir::new(\"search_select_no_headers\");\n    wrk.create(\"data.csv\", data(false));\n    let mut cmd = wrk.command(\"search\");\n    cmd.arg(\"^foo\").arg(\"data.csv\");\n    cmd.arg(\"--select\").arg(\"2\");\n    cmd.arg(\"--no-headers\");\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![\n        svec![\"barfoo\", \"foobar\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn search_invert_match() {\n    let wrk = Workdir::new(\"search_invert_match\");\n    wrk.create(\"data.csv\", data(false));\n    let mut cmd = wrk.command(\"search\");\n    cmd.arg(\"^foo\").arg(\"data.csv\");\n    cmd.arg(\"--invert-match\");\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![\n        svec![\"foobar\", \"barfoo\"],\n        svec![\"a\", \"b\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn search_invert_match_no_headers() {\n    let wrk = Workdir::new(\"search_invert_match\");\n    wrk.create(\"data.csv\", data(false));\n    let mut cmd = 
wrk.command(\"search\");\n    cmd.arg(\"^foo\").arg(\"data.csv\");\n    cmd.arg(\"--invert-match\");\n    cmd.arg(\"--no-headers\");\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![\n        svec![\"a\", \"b\"],\n    ];\n    assert_eq!(got, expected);\n}\n"
  },
  {
    "path": "tests/test_select.rs",
    "content": "use workdir::Workdir;\n\nmacro_rules! select_test {\n    ($name:ident, $select:expr, $select_no_headers:expr,\n     $expected_headers:expr, $expected_rows:expr) => (\n        mod $name {\n            use workdir::Workdir;\n            use super::data;\n\n            #[test]\n            fn headers() {\n                let wrk = Workdir::new(stringify!($name));\n                wrk.create(\"data.csv\", data(true));\n                let mut cmd = wrk.command(\"select\");\n                cmd.arg(\"--\").arg($select).arg(\"data.csv\");\n                let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n\n                let expected = vec![\n                    $expected_headers.iter()\n                                     .map(|s| s.to_string())\n                                     .collect::<Vec<String>>(),\n                    $expected_rows.iter()\n                                  .map(|s| s.to_string())\n                                  .collect::<Vec<String>>(),\n                ];\n                assert_eq!(got, expected);\n            }\n\n            #[test]\n            fn no_headers() {\n                let wrk = Workdir::new(stringify!($name));\n                wrk.create(\"data.csv\", data(false));\n                let mut cmd = wrk.command(\"select\");\n                cmd.arg(\"--no-headers\")\n                   .arg(\"--\").arg($select_no_headers).arg(\"data.csv\");\n                let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n\n                let expected = vec![\n                    $expected_rows.iter()\n                                  .map(|s| s.to_string())\n                                  .collect::<Vec<String>>(),\n                ];\n                assert_eq!(got, expected);\n            }\n        }\n    );\n}\n\nmacro_rules! 
select_test_err {\n    ($name:ident, $select:expr) => (\n        #[test]\n        fn $name() {\n            let wrk = Workdir::new(stringify!($name));\n            wrk.create(\"data.csv\", data(true));\n            let mut cmd = wrk.command(\"select\");\n            cmd.arg($select).arg(\"data.csv\");\n            wrk.assert_err(&mut cmd);\n        }\n    );\n}\n\nfn header_row() -> Vec<String> { svec![\"h1\", \"h2\", \"h[]3\", \"h4\", \"h1\"] }\n\nfn data(headers: bool) -> Vec<Vec<String>> {\n    let mut rows = vec![\n        svec![\"a\", \"b\", \"c\", \"d\", \"e\"],\n    ];\n    if headers { rows.insert(0, header_row()) }\n    rows\n}\n\nselect_test!(select_simple, \"h1\", \"1\", [\"h1\"], [\"a\"]);\nselect_test!(select_simple_idx, \"h1[0]\", \"1\", [\"h1\"], [\"a\"]);\nselect_test!(select_simple_idx_2, \"h1[1]\", \"5\", [\"h1\"], [\"e\"]);\n\nselect_test!(select_quoted, r#\"\"h[]3\"\"#, \"3\", [\"h[]3\"], [\"c\"]);\nselect_test!(select_quoted_idx, r#\"\"h[]3\"[0]\"#, \"3\", [\"h[]3\"], [\"c\"]);\n\nselect_test!(select_range, \"h1-h4\", \"1-4\",\n             [\"h1\", \"h2\", \"h[]3\", \"h4\"], [\"a\", \"b\", \"c\", \"d\"]);\n\nselect_test!(select_range_multi, r#\"h1-h2,\"h[]3\"-h4\"#, \"1-2,3-4\",\n             [\"h1\", \"h2\", \"h[]3\", \"h4\"], [\"a\", \"b\", \"c\", \"d\"]);\nselect_test!(select_range_multi_idx, r#\"h1-h2,\"h[]3\"[0]-h4\"#, \"1-2,3-4\",\n             [\"h1\", \"h2\", \"h[]3\", \"h4\"], [\"a\", \"b\", \"c\", \"d\"]);\n\nselect_test!(select_reverse, \"h1[1]-h1[0]\", \"5-1\",\n             [\"h1\", \"h4\", \"h[]3\", \"h2\", \"h1\"], [\"e\", \"d\", \"c\", \"b\", \"a\"]);\n\nselect_test!(select_not, r#\"!\"h[]3\"[0]\"#, \"!3\",\n             [\"h1\", \"h2\", \"h4\", \"h1\"], [\"a\", \"b\", \"d\", \"e\"]);\nselect_test!(select_not_range, \"!h1[1]-h2\", \"!5-2\", [\"h1\"], [\"a\"]);\n\nselect_test!(select_duplicate, \"h1,h1\", \"1,1\", [\"h1\", \"h1\"], [\"a\", \"a\"]);\nselect_test!(select_duplicate_range, \"h1-h2,h1-h2\", \"1-2,1-2\",\n             
[\"h1\", \"h2\", \"h1\", \"h2\"], [\"a\", \"b\", \"a\", \"b\"]);\nselect_test!(select_duplicate_range_reverse, \"h1-h2,h2-h1\", \"1-2,2-1\",\n             [\"h1\", \"h2\", \"h2\", \"h1\"], [\"a\", \"b\", \"b\", \"a\"]);\n\nselect_test!(select_range_no_end, \"h4-\", \"4-\", [\"h4\", \"h1\"], [\"d\", \"e\"]);\nselect_test!(select_range_no_start, \"-h2\", \"-2\", [\"h1\", \"h2\"], [\"a\", \"b\"]);\nselect_test!(select_range_no_end_cat, \"h4-,h1\", \"4-,1\",\n             [\"h4\", \"h1\", \"h1\"], [\"d\", \"e\", \"a\"]);\nselect_test!(select_range_no_start_cat, \"-h2,h1[1]\", \"-2,5\",\n             [\"h1\", \"h2\", \"h1\"], [\"a\", \"b\", \"e\"]);\n\nselect_test_err!(select_err_unknown_header, \"dne\");\nselect_test_err!(select_err_oob_low, \"0\");\nselect_test_err!(select_err_oob_high, \"6\");\nselect_test_err!(select_err_idx_as_name, \"1[0]\");\nselect_test_err!(select_err_idx_oob_high, \"h1[2]\");\nselect_test_err!(select_err_idx_not_int, \"h1[2.0]\");\nselect_test_err!(select_err_idx_not_int_2, \"h1[a]\");\nselect_test_err!(select_err_unclosed_quote, r#\"\"h1\"#);\nselect_test_err!(select_err_unclosed_bracket, r#\"\"h1\"[1\"#);\nselect_test_err!(select_err_expected_end_of_field, \"a-b-\");\n"
  },
  {
    "path": "tests/test_slice.rs",
    "content": "use std::borrow::ToOwned;\nuse std::process;\n\nuse workdir::Workdir;\n\nmacro_rules! slice_tests {\n    ($name:ident, $start:expr, $end:expr, $expected:expr) => (\n        mod $name {\n            use super::test_slice;\n\n            #[test]\n            fn headers_no_index() {\n                let name = concat!(stringify!($name), \"headers_no_index\");\n                test_slice(name, $start, $end, $expected, true, false, false);\n            }\n\n            #[test]\n            fn no_headers_no_index() {\n                let name = concat!(stringify!($name), \"no_headers_no_index\");\n                test_slice(name, $start, $end, $expected, false, false, false);\n            }\n\n            #[test]\n            fn headers_index() {\n                let name = concat!(stringify!($name), \"headers_index\");\n                test_slice(name, $start, $end, $expected, true, true, false);\n            }\n\n            #[test]\n            fn no_headers_index() {\n                let name = concat!(stringify!($name), \"no_headers_index\");\n                test_slice(name, $start, $end, $expected, false, true, false);\n            }\n\n            #[test]\n            fn headers_no_index_len() {\n                let name = concat!(stringify!($name), \"headers_no_index_len\");\n                test_slice(name, $start, $end, $expected, true, false, true);\n            }\n\n            #[test]\n            fn no_headers_no_index_len() {\n                let name = concat!(stringify!($name),\n                                   \"no_headers_no_index_len\");\n                test_slice(name, $start, $end, $expected, false, false, true);\n            }\n\n            #[test]\n            fn headers_index_len() {\n                let name = concat!(stringify!($name), \"headers_index_len\");\n                test_slice(name, $start, $end, $expected, true, true, true);\n            }\n\n            #[test]\n            fn no_headers_index_len() {\n          
      let name = concat!(stringify!($name), \"no_headers_index_len\");\n                test_slice(name, $start, $end, $expected, false, true, true);\n            }\n        }\n    );\n}\n\nfn setup(name: &str, headers: bool, use_index: bool)\n        -> (Workdir, process::Command) {\n    let wrk = Workdir::new(name);\n    let mut data = vec![\n        svec![\"a\"], svec![\"b\"], svec![\"c\"], svec![\"d\"], svec![\"e\"]\n    ];\n    if headers { data.insert(0, svec![\"header\"]); }\n    if use_index {\n        wrk.create_indexed(\"in.csv\", data);\n    } else {\n        wrk.create(\"in.csv\", data);\n    }\n\n    let mut cmd = wrk.command(\"slice\");\n    cmd.arg(\"in.csv\");\n\n    (wrk, cmd)\n}\n\nfn test_slice(name: &str, start: Option<usize>, end: Option<usize>,\n              expected: &[&str], headers: bool,\n              use_index: bool, as_len: bool) {\n    let (wrk, mut cmd) = setup(name, headers, use_index);\n    if let Some(start) = start {\n        cmd.arg(\"--start\").arg(&start.to_string());\n    }\n    if let Some(end) = end {\n        if as_len {\n            let start = start.unwrap_or(0);\n            cmd.arg(\"--len\").arg(&(end - start).to_string());\n        } else {\n            cmd.arg(\"--end\").arg(&end.to_string());\n        }\n    }\n    if !headers {\n        cmd.arg(\"--no-headers\");\n    }\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let mut expected = expected.iter()\n                               .map(|&s| vec![s.to_owned()])\n                               .collect::<Vec<Vec<String>>>();\n    if headers { expected.insert(0, svec![\"header\"]); }\n    assert_eq!(got, expected);\n}\n\nfn test_index(name: &str, idx: usize, expected: &str,\n              headers: bool, use_index: bool) {\n    let (wrk, mut cmd) = setup(name, headers, use_index);\n    cmd.arg(\"--index\").arg(&idx.to_string());\n    if !headers {\n        cmd.arg(\"--no-headers\");\n    }\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut 
cmd);\n    let mut expected = vec![vec![expected.to_owned()]];\n    if headers { expected.insert(0, svec![\"header\"]); }\n    assert_eq!(got, expected);\n}\n\nslice_tests!(slice_simple, Some(0), Some(1), &[\"a\"]);\nslice_tests!(slice_simple_2, Some(1), Some(3), &[\"b\", \"c\"]);\nslice_tests!(slice_no_start, None, Some(1), &[\"a\"]);\nslice_tests!(slice_no_end, Some(3), None, &[\"d\", \"e\"]);\nslice_tests!(slice_all, None, None, &[\"a\", \"b\", \"c\", \"d\", \"e\"]);\n\n#[test]\nfn slice_index() {\n    test_index(\"slice_index\", 1, \"b\", true, false);\n}\n#[test]\nfn slice_index_no_headers() {\n    test_index(\"slice_index_no_headers\", 1, \"b\", false, false);\n}\n#[test]\nfn slice_index_withindex() {\n    test_index(\"slice_index_withindex\", 1, \"b\", true, true);\n}\n#[test]\nfn slice_index_no_headers_withindex() {\n    test_index(\"slice_index_no_headers_withindex\", 1, \"b\", false, true);\n}\n"
  },
  {
    "path": "tests/test_sort.rs",
    "content": "use std::cmp;\n\nuse workdir::Workdir;\n\nuse {Csv, CsvData, qcheck};\n\nfn prop_sort(name: &str, rows: CsvData, headers: bool) -> bool {\n    let wrk = Workdir::new(name);\n    wrk.create(\"in.csv\", rows.clone());\n\n    let mut cmd = wrk.command(\"sort\");\n    cmd.arg(\"in.csv\");\n    if !headers { cmd.arg(\"--no-headers\"); }\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let mut expected = rows.to_vecs();\n    let headers = if headers && !expected.is_empty() {\n        expected.remove(0)\n    } else {\n        vec![]\n    };\n    expected.sort_by(|r1, r2| iter_cmp(r1.iter(), r2.iter()));\n    if !headers.is_empty() { expected.insert(0, headers); }\n    rassert_eq!(got, expected)\n}\n\n#[test]\nfn prop_sort_headers() {\n    fn p(rows: CsvData) -> bool {\n        prop_sort(\"prop_sort_headers\", rows, true)\n    }\n    qcheck(p as fn(CsvData) -> bool);\n}\n\n#[test]\nfn prop_sort_no_headers() {\n    fn p(rows: CsvData) -> bool {\n        prop_sort(\"prop_sort_no_headers\", rows, false)\n    }\n    qcheck(p as fn(CsvData) -> bool);\n}\n\n#[test]\nfn sort_select() {\n    let wrk = Workdir::new(\"sort_select\");\n    wrk.create(\"in.csv\", vec![svec![\"1\", \"b\"], svec![\"2\", \"a\"]]);\n\n    let mut cmd = wrk.command(\"sort\");\n    cmd.arg(\"--no-headers\").args(&[\"--select\", \"2\"]).arg(\"in.csv\");\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![svec![\"2\", \"a\"], svec![\"1\", \"b\"]];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn sort_numeric() {\n    let wrk = Workdir::new(\"sort_numeric\");\n    wrk.create(\"in.csv\", vec![\n        svec![\"N\", \"S\"],\n        svec![\"10\", \"a\"],\n        svec![\"LETTER\", \"b\"],\n        svec![\"2\", \"c\"],\n        svec![\"1\", \"d\"],\n    ]);\n\n    let mut cmd = wrk.command(\"sort\");\n    cmd.arg(\"-N\").arg(\"in.csv\");\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![\n        
svec![\"N\", \"S\"],\n        //Non-numerics should be put first\n        svec![\"LETTER\", \"b\"],\n        svec![\"1\", \"d\"],\n        svec![\"2\", \"c\"],\n        svec![\"10\", \"a\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn sort_numeric_non_natural() {\n    let wrk = Workdir::new(\"sort_numeric_non_natural\");\n    wrk.create(\"in.csv\", vec![\n        svec![\"N\", \"S\"],\n        svec![\"8.33\", \"a\"],\n        svec![\"5\", \"b\"],\n        svec![\"LETTER\", \"c\"],\n        svec![\"7.4\", \"d\"],\n        svec![\"3.33\", \"e\"],\n    ]);\n\n    let mut cmd = wrk.command(\"sort\");\n    cmd.arg(\"-N\").arg(\"in.csv\");\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![\n        svec![\"N\", \"S\"],\n        //Non-numerics should be put first\n        svec![\"LETTER\", \"c\"],\n        svec![\"3.33\", \"e\"],\n        svec![\"5\", \"b\"],\n        svec![\"7.4\", \"d\"],\n        svec![\"8.33\", \"a\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n#[test]\nfn sort_reverse() {\n    let wrk = Workdir::new(\"sort_reverse\");\n    wrk.create(\"in.csv\", vec![\n        svec![\"R\", \"S\"],\n        svec![\"1\", \"b\"],\n        svec![\"2\", \"a\"],\n    ]);\n\n    let mut cmd = wrk.command(\"sort\");\n    cmd.arg(\"-R\").arg(\"--no-headers\").arg(\"in.csv\");\n\n    let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);\n    let expected = vec![\n        svec![\"R\", \"S\"],\n        svec![\"2\", \"a\"],\n        svec![\"1\", \"b\"],\n    ];\n    assert_eq!(got, expected);\n}\n\n/// Order `a` and `b` lexicographically using `Ord`\npub fn iter_cmp<A, L, R>(mut a: L, mut b: R) -> cmp::Ordering\n        where A: Ord, L: Iterator<Item=A>, R: Iterator<Item=A> {\n    loop {\n        match (a.next(), b.next()) {\n            (None, None) => return cmp::Ordering::Equal,\n            (None, _   ) => return cmp::Ordering::Less,\n            (_   , None) => return cmp::Ordering::Greater,\n            (Some(x), 
Some(y)) => match x.cmp(&y) {\n                cmp::Ordering::Equal => (),\n                non_eq => return non_eq,\n            },\n        }\n    }\n}\n"
  },
  {
    "path": "tests/test_split.rs",
    "content": "use std::borrow::ToOwned;\n\nuse workdir::Workdir;\n\nmacro_rules! split_eq {\n    ($wrk:expr, $path:expr, $expected:expr) => (\n        // assert_eq!($wrk.path($path).into_os_string().into_string().unwrap(),\n                   // $expected.to_owned());\n        assert_eq!($wrk.from_str::<String>(&$wrk.path($path)),\n                   $expected.to_owned());\n    );\n}\n\nfn data(headers: bool) -> Vec<Vec<String>> {\n    let mut rows = vec![\n        svec![\"a\", \"b\"], svec![\"c\", \"d\"],\n        svec![\"e\", \"f\"], svec![\"g\", \"h\"],\n        svec![\"i\", \"j\"], svec![\"k\", \"l\"],\n    ];\n    if headers { rows.insert(0, svec![\"h1\", \"h2\"]); }\n    rows\n}\n\n#[test]\nfn split_zero() {\n    let wrk = Workdir::new(\"split_zero\");\n    wrk.create(\"in.csv\", data(true));\n\n    let mut cmd = wrk.command(\"split\");\n    cmd.args(&[\"--size\", \"0\"]).arg(&wrk.path(\".\")).arg(\"in.csv\");\n    wrk.assert_err(&mut cmd);\n}\n\n#[test]\nfn split() {\n    let wrk = Workdir::new(\"split\");\n    wrk.create(\"in.csv\", data(true));\n\n    let mut cmd = wrk.command(\"split\");\n    cmd.args(&[\"--size\", \"2\"]).arg(&wrk.path(\".\")).arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    split_eq!(wrk, \"0.csv\", \"\\\nh1,h2\na,b\nc,d\n\");\n    split_eq!(wrk, \"2.csv\", \"\\\nh1,h2\ne,f\ng,h\n\");\n    split_eq!(wrk, \"4.csv\", \"\\\nh1,h2\ni,j\nk,l\n\");\n    assert!(!wrk.path(\"6.csv\").exists());\n}\n\n#[test]\nfn split_idx() {\n    let wrk = Workdir::new(\"split_idx\");\n    wrk.create_indexed(\"in.csv\", data(true));\n\n    let mut cmd = wrk.command(\"split\");\n    cmd.args(&[\"--size\", \"2\"]).arg(&wrk.path(\".\")).arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    split_eq!(wrk, \"0.csv\", \"\\\nh1,h2\na,b\nc,d\n\");\n    split_eq!(wrk, \"2.csv\", \"\\\nh1,h2\ne,f\ng,h\n\");\n    split_eq!(wrk, \"4.csv\", \"\\\nh1,h2\ni,j\nk,l\n\");\n    assert!(!wrk.path(\"6.csv\").exists());\n}\n\n#[test]\nfn split_no_headers() {\n    let wrk = 
Workdir::new(\"split_no_headers\");\n    wrk.create(\"in.csv\", data(false));\n\n    let mut cmd = wrk.command(\"split\");\n    cmd.args(&[\"--no-headers\", \"--size\", \"2\"])\n       .arg(&wrk.path(\".\"))\n       .arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    split_eq!(wrk, \"0.csv\", \"\\\na,b\nc,d\n\");\n    split_eq!(wrk, \"2.csv\", \"\\\ne,f\ng,h\n\");\n    split_eq!(wrk, \"4.csv\", \"\\\ni,j\nk,l\n\");\n}\n\n#[test]\nfn split_no_headers_idx() {\n    let wrk = Workdir::new(\"split_no_headers_idx\");\n    wrk.create_indexed(\"in.csv\", data(false));\n\n    let mut cmd = wrk.command(\"split\");\n    cmd.args(&[\"--no-headers\", \"--size\", \"2\"])\n       .arg(&wrk.path(\".\"))\n       .arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    split_eq!(wrk, \"0.csv\", \"\\\na,b\nc,d\n\");\n    split_eq!(wrk, \"2.csv\", \"\\\ne,f\ng,h\n\");\n    split_eq!(wrk, \"4.csv\", \"\\\ni,j\nk,l\n\");\n}\n\n#[test]\nfn split_one() {\n    let wrk = Workdir::new(\"split_one\");\n    wrk.create(\"in.csv\", data(true));\n\n    let mut cmd = wrk.command(\"split\");\n    cmd.args(&[\"--size\", \"1\"]).arg(&wrk.path(\".\")).arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    split_eq!(wrk, \"0.csv\", \"\\\nh1,h2\na,b\n\");\n    split_eq!(wrk, \"1.csv\", \"\\\nh1,h2\nc,d\n\");\n    split_eq!(wrk, \"2.csv\", \"\\\nh1,h2\ne,f\n\");\n    split_eq!(wrk, \"3.csv\", \"\\\nh1,h2\ng,h\n\");\n    split_eq!(wrk, \"4.csv\", \"\\\nh1,h2\ni,j\n\");\n    split_eq!(wrk, \"5.csv\", \"\\\nh1,h2\nk,l\n\");\n}\n\n#[test]\nfn split_one_idx() {\n    let wrk = Workdir::new(\"split_one_idx\");\n    wrk.create_indexed(\"in.csv\", data(true));\n\n    let mut cmd = wrk.command(\"split\");\n    cmd.args(&[\"--size\", \"1\"]).arg(&wrk.path(\".\")).arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    split_eq!(wrk, \"0.csv\", \"\\\nh1,h2\na,b\n\");\n    split_eq!(wrk, \"1.csv\", \"\\\nh1,h2\nc,d\n\");\n    split_eq!(wrk, \"2.csv\", \"\\\nh1,h2\ne,f\n\");\n    split_eq!(wrk, \"3.csv\", \"\\\nh1,h2\ng,h\n\");\n    
split_eq!(wrk, \"4.csv\", \"\\\nh1,h2\ni,j\n\");\n    split_eq!(wrk, \"5.csv\", \"\\\nh1,h2\nk,l\n\");\n}\n\n#[test]\nfn split_uneven() {\n    let wrk = Workdir::new(\"split_uneven\");\n    wrk.create(\"in.csv\", data(true));\n\n    let mut cmd = wrk.command(\"split\");\n    cmd.args(&[\"--size\", \"4\"]).arg(&wrk.path(\".\")).arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    split_eq!(wrk, \"0.csv\", \"\\\nh1,h2\na,b\nc,d\ne,f\ng,h\n\");\n    split_eq!(wrk, \"4.csv\", \"\\\nh1,h2\ni,j\nk,l\n\");\n}\n\n#[test]\nfn split_uneven_idx() {\n    let wrk = Workdir::new(\"split_uneven_idx\");\n    wrk.create_indexed(\"in.csv\", data(true));\n\n    let mut cmd = wrk.command(\"split\");\n    cmd.args(&[\"--size\", \"4\"]).arg(&wrk.path(\".\")).arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    split_eq!(wrk, \"0.csv\", \"\\\nh1,h2\na,b\nc,d\ne,f\ng,h\n\");\n    split_eq!(wrk, \"4.csv\", \"\\\nh1,h2\ni,j\nk,l\n\");\n}\n\n#[test]\nfn split_custom_filename() {\n    let wrk = Workdir::new(\"split_custom_filename\");\n    wrk.create(\"in.csv\", data(true));\n\n    let mut cmd = wrk.command(\"split\");\n    cmd.args(&[\"--size\", \"2\"])\n       .args(&[\"--filename\", \"prefix-{}.csv\"])\n       .arg(&wrk.path(\".\")).arg(\"in.csv\");\n    wrk.run(&mut cmd);\n\n    assert!(wrk.path(\"prefix-0.csv\").exists());\n    assert!(wrk.path(\"prefix-2.csv\").exists());\n    assert!(wrk.path(\"prefix-4.csv\").exists());\n}\n"
  },
  {
    "path": "tests/test_stats.rs",
    "content": "use std::borrow::ToOwned;\nuse std::cmp;\nuse std::process;\n\nuse workdir::Workdir;\n\nmacro_rules! stats_tests {\n    ($name:ident, $field:expr, $rows:expr, $expect:expr) => (\n        stats_tests!($name, $field, $rows, $expect, false);\n    );\n    ($name:ident, $field:expr, $rows:expr, $expect:expr, $nulls:expr) => (\n        mod $name {\n            use super::test_stats;\n\n            stats_test_headers!($name, $field, $rows, $expect, $nulls);\n            stats_test_no_headers!($name, $field, $rows, $expect, $nulls);\n        }\n    );\n}\n\nmacro_rules! stats_test_headers {\n    ($name:ident, $field:expr, $rows:expr, $expect:expr) => (\n        stats_test_headers!($name, $field, $rows, $expect, false);\n    );\n    ($name:ident, $field:expr, $rows:expr, $expect:expr, $nulls:expr) => (\n        #[test]\n        fn headers_no_index() {\n            let name = concat!(stringify!($name), \"_headers_no_index\");\n            test_stats(name, $field, $rows, $expect, true, false, $nulls);\n        }\n\n        #[test]\n        fn headers_index() {\n            let name = concat!(stringify!($name), \"_headers_index\");\n            test_stats(name, $field, $rows, $expect, true, true, $nulls);\n        }\n    );\n}\n\nmacro_rules! 
stats_test_no_headers {\n    ($name:ident, $field:expr, $rows:expr, $expect:expr) => (\n        stats_test_no_headers!($name, $field, $rows, $expect, false);\n    );\n    ($name:ident, $field:expr, $rows:expr, $expect:expr, $nulls:expr) => (\n        #[test]\n        fn no_headers_no_index() {\n            let name = concat!(stringify!($name), \"_no_headers_no_index\");\n            test_stats(name, $field, $rows, $expect, false, false, $nulls);\n        }\n\n        #[test]\n        fn no_headers_index() {\n            let name = concat!(stringify!($name), \"_no_headers_index\");\n            test_stats(name, $field, $rows, $expect, false, true, $nulls);\n        }\n    );\n}\n\nfn test_stats<S>(name: S, field: &str, rows: &[&str], expected: &str,\n                 headers: bool, use_index: bool, nulls: bool)\n        where S: ::std::ops::Deref<Target=str> {\n    let (wrk, mut cmd) = setup(name, rows, headers, use_index, nulls);\n    let field_val = get_field_value(&wrk, &mut cmd, field);\n    // Only compare the first few bytes since floating point arithmetic\n    // can mess with exact comparisons.\n    let len = cmp::min(10, cmp::min(field_val.len(), expected.len()));\n    assert_eq!(&field_val[0..len], &expected[0..len]);\n}\n\nfn setup<S>(name: S, rows: &[&str], headers: bool,\n            use_index: bool, nulls: bool) -> (Workdir, process::Command)\n        where S: ::std::ops::Deref<Target=str> {\n    let wrk = Workdir::new(&name);\n    let mut data: Vec<Vec<String>> =\n        rows.iter().map(|&s| vec![s.to_owned()]).collect();\n    if headers { data.insert(0, svec![\"header\"]); }\n    if use_index {\n        wrk.create_indexed(\"in.csv\", data);\n    } else {\n        wrk.create(\"in.csv\", data);\n    }\n\n    let mut cmd = wrk.command(\"stats\");\n    cmd.arg(\"in.csv\");\n    if !headers { cmd.arg(\"--no-headers\"); }\n    if nulls { cmd.arg(\"--nulls\"); }\n\n    (wrk, cmd)\n}\n\nfn get_field_value(wrk: &Workdir, cmd: &mut process::Command, field: 
&str)\n                  -> String {\n    if field == \"median\" { cmd.arg(\"--median\"); }\n    if field == \"cardinality\" { cmd.arg(\"--cardinality\"); }\n    if field == \"mode\" { cmd.arg(\"--mode\"); }\n\n    let mut rows: Vec<Vec<String>> = wrk.read_stdout(cmd);\n    let headers = rows.remove(0);\n    for row in rows.iter() {\n        for (h, val) in headers.iter().zip(row.iter()) {\n            if &**h == field {\n                return val.clone();\n            }\n        }\n    }\n    panic!(\"BUG: Could not find field '{}' in headers '{:?}' \\\n            for command '{:?}'.\", field, headers, cmd);\n}\n\nstats_tests!(stats_infer_unicode, \"type\", &[\"a\"], \"Unicode\");\nstats_tests!(stats_infer_int, \"type\", &[\"1\"], \"Integer\");\nstats_tests!(stats_infer_float, \"type\", &[\"1.2\"], \"Float\");\nstats_tests!(stats_infer_null, \"type\", &[\"\"], \"NULL\");\nstats_tests!(stats_infer_unicode_null, \"type\", &[\"a\", \"\"], \"Unicode\");\nstats_tests!(stats_infer_int_null, \"type\", &[\"1\", \"\"], \"Integer\");\nstats_tests!(stats_infer_float_null, \"type\", &[\"1.2\", \"\"], \"Float\");\nstats_tests!(stats_infer_null_unicode, \"type\", &[\"\", \"a\"], \"Unicode\");\nstats_tests!(stats_infer_null_int, \"type\", &[\"\", \"1\"], \"Integer\");\nstats_tests!(stats_infer_null_float, \"type\", &[\"\", \"1.2\"], \"Float\");\nstats_tests!(stats_infer_int_unicode, \"type\", &[\"1\", \"a\"], \"Unicode\");\nstats_tests!(stats_infer_unicode_int, \"type\", &[\"a\", \"1\"], \"Unicode\");\nstats_tests!(stats_infer_int_float, \"type\", &[\"1\", \"1.2\"], \"Float\");\nstats_tests!(stats_infer_float_int, \"type\", &[\"1.2\", \"1\"], \"Float\");\nstats_tests!(stats_infer_null_int_float_unicode, \"type\",\n             &[\"\", \"1\", \"1.2\", \"a\"], \"Unicode\");\n\nstats_tests!(stats_no_mean, \"mean\", &[\"a\"], \"\");\nstats_tests!(stats_no_stddev, \"stddev\", &[\"a\"], \"\");\nstats_tests!(stats_no_median, \"median\", &[\"a\"], \"\");\nstats_tests!(stats_no_mode, 
\"mode\", &[\"a\", \"b\"], \"N/A\");\n\nstats_tests!(stats_null_mean, \"mean\", &[\"\"], \"\");\nstats_tests!(stats_null_stddev, \"stddev\", &[\"\"], \"\");\nstats_tests!(stats_null_median, \"median\", &[\"\"], \"\");\nstats_tests!(stats_null_mode, \"mode\", &[\"\"], \"N/A\");\n\nstats_tests!(stats_includenulls_null_mean, \"mean\", &[\"\"], \"\", true);\nstats_tests!(stats_includenulls_null_stddev, \"stddev\", &[\"\"], \"\", true);\nstats_tests!(stats_includenulls_null_median, \"median\", &[\"\"], \"\", true);\nstats_tests!(stats_includenulls_null_mode, \"mode\", &[\"\"], \"N/A\", true);\n\nstats_tests!(stats_includenulls_mean,\n             \"mean\", &[\"5\", \"\", \"15\", \"10\"], \"7.5\", true);\n\nstats_tests!(stats_sum_integers, \"sum\", &[\"1\", \"2\"], \"3\");\nstats_tests!(stats_sum_floats, \"sum\", &[\"1.5\", \"2.8\"], \"4.3\");\nstats_tests!(stats_sum_mixed1, \"sum\", &[\"1.5\", \"2\"], \"3.5\");\nstats_tests!(stats_sum_mixed2, \"sum\", &[\"2\", \"1.5\"], \"3.5\");\nstats_tests!(stats_sum_mixed3, \"sum\", &[\"1.5\", \"hi\", \"2.8\"], \"4.3\");\nstats_tests!(stats_sum_nulls1, \"sum\", &[\"1\", \"\", \"2\"], \"3\");\nstats_tests!(stats_sum_nulls2, \"sum\", &[\"\", \"1\", \"2\"], \"3\");\n\nstats_tests!(stats_min, \"min\", &[\"2\", \"1.1\"], \"1.1\");\nstats_tests!(stats_max, \"max\", &[\"2\", \"1.1\"], \"2\");\nstats_tests!(stats_min_mix, \"min\", &[\"2\", \"a\", \"1.1\"], \"1.1\");\nstats_tests!(stats_max_mix, \"max\", &[\"2\", \"a\", \"1.1\"], \"a\");\nstats_tests!(stats_min_null, \"min\", &[\"\", \"2\", \"1.1\"], \"1.1\");\nstats_tests!(stats_max_null, \"max\", &[\"2\", \"1.1\", \"\"], \"2\");\n\nstats_tests!(stats_len_min, \"min_length\", &[\"aa\", \"a\"], \"1\");\nstats_tests!(stats_len_max, \"max_length\", &[\"a\", \"aa\"], \"2\");\nstats_tests!(stats_len_min_null, \"min_length\", &[\"\", \"aa\", \"a\"], \"0\");\nstats_tests!(stats_len_max_null, \"max_length\", &[\"a\", \"aa\", \"\"], \"2\");\n\nstats_tests!(stats_mean, \"mean\", &[\"5\", \"15\", 
\"10\"], \"10\");\nstats_tests!(stats_stddev, \"stddev\", &[\"1\", \"2\", \"3\"], \"0.816496580927726\");\nstats_tests!(stats_mean_null, \"mean\", &[\"\", \"5\", \"15\", \"10\"], \"10\");\nstats_tests!(stats_stddev_null, \"stddev\", &[\"1\", \"2\", \"3\", \"\"],\n             \"0.816496580927726\");\nstats_tests!(stats_mean_mix, \"mean\", &[\"5\", \"15.1\", \"9.9\"], \"10\");\nstats_tests!(stats_stddev_mix, \"stddev\", &[\"1\", \"2.1\", \"2.9\"],\n             \"0.7788880963698614\");\n\nstats_tests!(stats_cardinality, \"cardinality\", &[\"a\", \"b\", \"a\"], \"2\");\nstats_tests!(stats_mode, \"mode\", &[\"a\", \"b\", \"a\"], \"a\");\nstats_tests!(stats_mode_null, \"mode\", &[\"\", \"a\", \"b\", \"a\"], \"a\");\nstats_tests!(stats_median, \"median\", &[\"1\", \"2\", \"3\"], \"2\");\nstats_tests!(stats_median_null, \"median\", &[\"\", \"1\", \"2\", \"3\"], \"2\");\nstats_tests!(stats_median_even, \"median\", &[\"1\", \"2\", \"3\", \"4\"], \"2.5\");\nstats_tests!(stats_median_even_null, \"median\",\n             &[\"\", \"1\", \"2\", \"3\", \"4\"], \"2.5\");\nstats_tests!(stats_median_mix, \"median\", &[\"1\", \"2.5\", \"3\"], \"2.5\");\n\nmod stats_infer_nothing {\n    // Only test CSV data with headers.\n    // Empty CSV data with no headers won't produce any statistical analysis.\n    use super::test_stats;\n    stats_test_headers!(stats_infer_nothing, \"type\", &[], \"NULL\");\n}\n\nmod stats_zero_cardinality {\n    use super::test_stats;\n    stats_test_headers!(stats_zero_cardinality, \"cardinality\", &[], \"0\");\n}\n\nmod stats_zero_mode {\n    use super::test_stats;\n    stats_test_headers!(stats_zero_mode, \"mode\", &[], \"N/A\");\n}\n\nmod stats_zero_mean {\n    use super::test_stats;\n    stats_test_headers!(stats_zero_mean, \"mean\", &[], \"\");\n}\n\nmod stats_zero_median {\n    use super::test_stats;\n    stats_test_headers!(stats_zero_median, \"median\", &[], \"\");\n}\n\nmod stats_header_fields {\n    use super::test_stats;\n    
stats_test_headers!(stats_header_field_name, \"field\", &[\"a\"], \"header\");\n    stats_test_no_headers!(stats_header_no_field_name, \"field\", &[\"a\"], \"0\");\n}\n"
  },
  {
    "path": "tests/test_table.rs",
    "content": "use workdir::Workdir;\n\nfn data() -> Vec<Vec<String>> {\n    vec![\n        svec![\"h1\", \"h2\", \"h3\"],\n        svec![\"abcdefg\", \"a\", \"a\"],\n        svec![\"a\", \"abc\", \"z\"],\n    ]\n}\n\n#[test]\nfn table() {\n    let wrk = Workdir::new(\"table\");\n    wrk.create(\"in.csv\", data());\n\n    let mut cmd = wrk.command(\"table\");\n    cmd.arg(\"in.csv\");\n\n    let got: String = wrk.stdout(&mut cmd);\n    assert_eq!(&*got, \"\\\nh1       h2   h3\nabcdefg  a    a\na        abc  z\\\n\")\n}\n"
  },
  {
    "path": "tests/tests.rs",
    "content": "#![allow(dead_code)]\n\n#[macro_use]\nextern crate log;\n#[macro_use]\nextern crate serde_derive;\n\nextern crate csv;\nextern crate filetime;\nextern crate quickcheck;\nextern crate rand;\nextern crate stats;\n\nuse std::fmt;\nuse std::mem::transmute;\nuse std::ops;\n\nuse quickcheck::{Arbitrary, Gen, QuickCheck, StdGen, Testable};\nuse rand::{Rng, thread_rng};\n\nmacro_rules! svec[\n    ($($x:expr),*) => (\n        vec![$($x),*].into_iter()\n                     .map(|s: &'static str| s.to_string())\n                     .collect::<Vec<String>>()\n    );\n    ($($x:expr,)*) => (svec![$($x),*]);\n];\n\nmacro_rules! rassert_eq {\n    ($given:expr, $expected:expr) => ({assert_eq!($given, $expected); true});\n}\n\nmod workdir;\n\nmod test_cat;\nmod test_count;\nmod test_fixlengths;\nmod test_flatten;\nmod test_fmt;\nmod test_frequency;\nmod test_headers;\nmod test_index;\nmod test_join;\nmod test_partition;\nmod test_reverse;\nmod test_search;\nmod test_select;\nmod test_slice;\nmod test_sort;\nmod test_split;\nmod test_stats;\nmod test_table;\n\nfn qcheck<T: Testable>(p: T) {\n    QuickCheck::new().gen(StdGen::new(thread_rng(), 5)).quickcheck(p);\n}\n\nfn qcheck_sized<T: Testable>(p: T, size: usize) {\n    QuickCheck::new().gen(StdGen::new(thread_rng(), size)).quickcheck(p);\n}\n\npub type CsvVecs = Vec<Vec<String>>;\n\npub trait Csv {\n    fn to_vecs(self) -> CsvVecs;\n    fn from_vecs(CsvVecs) -> Self;\n}\n\nimpl Csv for CsvVecs {\n    fn to_vecs(self) -> CsvVecs { self }\n    fn from_vecs(vecs: CsvVecs) -> CsvVecs { vecs }\n}\n\n#[derive(Clone, Eq, Ord, PartialEq, PartialOrd)]\nstruct CsvRecord(Vec<String>);\n\nimpl CsvRecord {\n    fn unwrap(self) -> Vec<String> {\n        let CsvRecord(v) = self;\n        v\n    }\n}\n\nimpl ops::Deref for CsvRecord {\n    type Target = [String];\n    fn deref<'a>(&'a self) -> &'a [String] { &*self.0 }\n}\n\nimpl fmt::Debug for CsvRecord {\n    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {\n        let 
bytes: Vec<_> = self.iter()\n                                .map(|s| s.as_bytes())\n                                .collect();\n        write!(f, \"{:?}\", bytes)\n    }\n}\n\nimpl Arbitrary for CsvRecord {\n    fn arbitrary<G: Gen>(g: &mut G) -> CsvRecord {\n        let size = { let s = g.size(); g.gen_range(1, s) };\n        CsvRecord((0..size).map(|_| Arbitrary::arbitrary(g)).collect())\n    }\n\n    fn shrink(&self) -> Box<Iterator<Item=CsvRecord>+'static> {\n        Box::new(self.clone().unwrap()\n                     .shrink().filter(|r| r.len() > 0).map(CsvRecord))\n    }\n}\n\nimpl Csv for Vec<CsvRecord> {\n    fn to_vecs(self) -> CsvVecs {\n        unsafe { transmute(self) }\n    }\n    fn from_vecs(vecs: CsvVecs) -> Vec<CsvRecord> {\n        unsafe { transmute(vecs) }\n    }\n}\n\n#[derive(Clone, Debug, Eq, Ord, PartialOrd)]\nstruct CsvData {\n    data: Vec<CsvRecord>,\n}\n\nimpl CsvData {\n    fn unwrap(self) -> Vec<CsvRecord> { self.data }\n\n    fn len(&self) -> usize { (&**self).len() }\n\n    fn is_empty(&self) -> bool { self.len() == 0 }\n}\n\nimpl ops::Deref for CsvData {\n    type Target = [CsvRecord];\n    fn deref<'a>(&'a self) -> &'a [CsvRecord] { &*self.data }\n}\n\nimpl Arbitrary for CsvData {\n    fn arbitrary<G: Gen>(g: &mut G) -> CsvData {\n        let record_len = { let s = g.size(); g.gen_range(1, s) };\n        let num_records: usize = g.gen_range(0, 100);\n        CsvData{\n            data: (0..num_records).map(|_| {\n                CsvRecord((0..record_len)\n                          .map(|_| Arbitrary::arbitrary(g))\n                          .collect())\n            }).collect(),\n        }\n    }\n\n    fn shrink(&self) -> Box<Iterator<Item=CsvData>+'static> {\n        let len = if self.is_empty() { 0 } else { self[0].len() };\n        let mut rows: Vec<CsvData> =\n            self.clone()\n                .unwrap()\n                .shrink()\n                .filter(|rows| rows.iter().all(|r| r.len() == len))\n                
.map(|rows| CsvData { data: rows })\n                .collect();\n        // We should also introduce CSV data with fewer columns...\n        if len > 1 {\n            rows.extend(\n                self.clone()\n                    .unwrap()\n                    .shrink()\n                    .filter(|rows|\n                        rows.iter().all(|r| r.len() == len - 1))\n                    .map(|rows| CsvData { data: rows }));\n        }\n        Box::new(rows.into_iter())\n    }\n}\n\nimpl Csv for CsvData {\n    fn to_vecs(self) -> CsvVecs { unsafe { transmute(self.data) } }\n    fn from_vecs(vecs: CsvVecs) -> CsvData {\n        CsvData {\n            data: unsafe { transmute(vecs) },\n        }\n    }\n}\n\nimpl PartialEq for CsvData {\n    fn eq(&self, other: &CsvData) -> bool {\n        (self.data.is_empty() && other.data.is_empty())\n        || self.data == other.data\n    }\n}\n"
  },
  {
    "path": "tests/workdir.rs",
    "content": "use std::env;\nuse std::fmt;\nuse std::fs;\nuse std::io::{self, Read};\nuse std::path::{Path, PathBuf};\nuse std::process;\nuse std::str::FromStr;\nuse std::sync::atomic;\nuse std::time::Duration;\n\nuse csv;\n\nuse Csv;\n\nstatic XSV_INTEGRATION_TEST_DIR: &'static str = \"xit\";\n\nstatic NEXT_ID: atomic::AtomicUsize = atomic::ATOMIC_USIZE_INIT;\n\npub struct Workdir {\n    root: PathBuf,\n    dir: PathBuf,\n    flexible: bool,\n}\n\nimpl Workdir {\n    pub fn new(name: &str) -> Workdir {\n        let id = NEXT_ID.fetch_add(1, atomic::Ordering::SeqCst);\n        let mut root = env::current_exe().unwrap()\n                           .parent()\n                           .expect(\"executable's directory\")\n                           .to_path_buf();\n        if root.ends_with(\"deps\") {\n            root.pop();\n        }\n        let dir = root.join(XSV_INTEGRATION_TEST_DIR)\n                      .join(name)\n                      .join(&format!(\"test-{}\", id));\n        // println!(\"{:?}\", dir);\n        if let Err(err) = create_dir_all(&dir) {\n            panic!(\"Could not create '{:?}': {}\", dir, err);\n        }\n        Workdir { root: root, dir: dir, flexible: false }\n    }\n\n    pub fn flexible(mut self, yes: bool) -> Workdir {\n        self.flexible = yes;\n        self\n    }\n\n    pub fn create<T: Csv>(&self, name: &str, rows: T) {\n        let mut wtr = csv::WriterBuilder::new()\n            .flexible(self.flexible)\n            .from_path(&self.path(name))\n            .unwrap();\n        for row in rows.to_vecs().into_iter() {\n            wtr.write_record(row).unwrap();\n        }\n        wtr.flush().unwrap();\n    }\n\n    pub fn create_indexed<T: Csv>(&self, name: &str, rows: T) {\n        self.create(name, rows);\n\n        let mut cmd = self.command(\"index\");\n        cmd.arg(name);\n        self.run(&mut cmd);\n    }\n\n    pub fn read_stdout<T: Csv>(&self, cmd: &mut process::Command) -> T {\n        let stdout: 
String = self.stdout(cmd);\n        let mut rdr = csv::ReaderBuilder::new()\n            .has_headers(false)\n            .from_reader(io::Cursor::new(stdout));\n\n        let records: Vec<Vec<String>> = rdr\n            .records()\n            .collect::<Result<Vec<csv::StringRecord>, _>>()\n            .unwrap()\n            .into_iter()\n            .map(|r| r.iter().map(|f| f.to_string()).collect())\n            .collect();\n        Csv::from_vecs(records)\n    }\n\n    pub fn command(&self, sub_command: &str) -> process::Command {\n        let mut cmd = process::Command::new(&self.xsv_bin());\n        cmd.current_dir(&self.dir).arg(sub_command);\n        cmd\n    }\n\n    pub fn output(&self, cmd: &mut process::Command) -> process::Output {\n        debug!(\"[{}]: {:?}\", self.dir.display(), cmd);\n        println!(\"[{}]: {:?}\", self.dir.display(), cmd);\n        let o = cmd.output().unwrap();\n        if !o.status.success() {\n            panic!(\"\\n\\n===== {:?} =====\\n\\\n                    command failed but expected success!\\\n                    \\n\\ncwd: {}\\\n                    \\n\\nstatus: {}\\\n                    \\n\\nstdout: {}\\n\\nstderr: {}\\\n                    \\n\\n=====\\n\",\n                   cmd, self.dir.display(), o.status,\n                   String::from_utf8_lossy(&o.stdout),\n                   String::from_utf8_lossy(&o.stderr))\n        }\n        o\n    }\n\n    pub fn run(&self, cmd: &mut process::Command) {\n        self.output(cmd);\n    }\n\n    pub fn stdout<T: FromStr>(&self, cmd: &mut process::Command) -> T {\n        let o = self.output(cmd);\n        let stdout = String::from_utf8_lossy(&o.stdout);\n        stdout.trim_matches(&['\\r', '\\n'][..]).parse().ok().expect(\n            &format!(\"Could not convert from string: '{}'\", stdout))\n    }\n\n    pub fn assert_err(&self, cmd: &mut process::Command) {\n        let o = cmd.output().unwrap();\n        if o.status.success() {\n            
panic!(\"\\n\\n===== {:?} =====\\n\\\n                    command succeeded but expected failure!\\\n                    \\n\\ncwd: {}\\\n                    \\n\\nstatus: {}\\\n                    \\n\\nstdout: {}\\n\\nstderr: {}\\\n                    \\n\\n=====\\n\",\n                   cmd, self.dir.display(), o.status,\n                   String::from_utf8_lossy(&o.stdout),\n                   String::from_utf8_lossy(&o.stderr));\n        }\n    }\n\n    pub fn from_str<T: FromStr>(&self, name: &Path) -> T {\n        let mut o = String::new();\n        fs::File::open(name).unwrap().read_to_string(&mut o).unwrap();\n        o.parse().ok().expect(\"fromstr\")\n    }\n\n    pub fn path(&self, name: &str) -> PathBuf {\n        self.dir.join(name)\n    }\n\n    pub fn xsv_bin(&self) -> PathBuf {\n        self.root.join(\"xsv\")\n    }\n}\n\nimpl fmt::Debug for Workdir {\n    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {\n        write!(f, \"path={}\", self.dir.display())\n    }\n}\n\n// For whatever reason, `fs::create_dir_all` fails intermittently on Travis\n// with a weird \"file exists\" error. Despite my best efforts to get to the\n// bottom of it, I've decided a try-wait-and-retry hack is good enough.\nfn create_dir_all<P: AsRef<Path>>(p: P) -> io::Result<()> {\n    let mut last_err = None;\n    for _ in 0..10 {\n        if let Err(err) = fs::create_dir_all(&p) {\n            last_err = Some(err);\n            ::std::thread::sleep(Duration::from_millis(500));\n        } else {\n            return Ok(())\n        }\n    }\n    Err(last_err.unwrap())\n}\n"
  }
]